Disclosure of Invention
The embodiments of the present application provide a data processing method, a data processing apparatus, a computer device, and a storage medium, which can cool data and thereby solve the problem of subsequent data dispersion caused by arranging cold data and hot data together.
The embodiment of the application provides a data processing method, which comprises the following steps:
obtaining a target data block, wherein the target data block comprises data historically written into the data block;
when the target data block is a data block positioned at the head of a data structure queue, adding the target data block to a garbage collection list;
when the number of the data blocks in the garbage collection list reaches a preset number, performing garbage collection on the data blocks in the garbage collection list to obtain a result data block;
marking a target result data block in the result data blocks, and adding the marked target result data block to the tail end of the data structure queue to cool the data in the target result data block.
Correspondingly, an embodiment of the present application provides a data processing apparatus, including:
a first acquisition unit, configured to acquire a target data block, the target data block including data historically written into the data block;
the first adding unit is used for adding the target data block to a garbage collection list when the target data block is a data block positioned at the head of a data structure queue;
the first obtaining unit is used for performing garbage collection on the data blocks in the garbage collection list when the number of the data blocks in the garbage collection list reaches a preset number, so as to obtain a result data block;
the first marking unit is used for marking a target result data block in the result data blocks and adding the marked target result data block to the tail end of the data structure queue so as to cool data in the target result data block.
In an embodiment, the first obtaining unit includes:
the extraction subunit is configured to, when the number of the data blocks in the garbage collection list reaches a preset number, extract data in a valid data page on the data blocks in the garbage collection list to obtain a blank data block and valid data in the valid data page;
and the obtaining subunit is used for obtaining a result data block based on the blank data block and the valid data in the valid data page.
In an embodiment, the obtaining subunit is further configured to determine a target blank data block from the blank data blocks based on the data amount of the valid data; write the valid data into the target blank data block to obtain a target result data block; and take the target result data block, together with the blank data blocks other than the target blank data block, as result data blocks.
In an embodiment, the data processing apparatus further includes:
a second marking unit, configured to not mark the blank data block in the result data block, where the blank data block is used for writing subsequent data.
In an embodiment, the data processing apparatus further includes:
and the processing unit is used for processing the target data block based on the proportion of valid data and invalid data in the target data block when the target data block is not the data block positioned at the head of the data structure queue.
In one embodiment, the processing unit includes:
the adding subunit is configured to add the target data block to a garbage collection list if the ratio of the valid data page to the invalid data page in the target data block does not reach a preset ratio, so that the target data block is subjected to garbage collection;
and the marking subunit is used for not marking the target data block if the proportion of the valid data page to the invalid data page in the target data block reaches a preset proportion.
In an embodiment, the data processing apparatus further includes:
the second acquisition unit is used for acquiring target data to be written into the data block;
a second obtaining unit, configured to write the target data into the blank data block, so as to obtain a target data write block in which the target data is written;
and the third marking unit is used for marking the target data writing block and adding the marked target data writing block to the tail end of the data structure queue so as to cool the target data.
Accordingly, embodiments of the present application further provide a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes steps in the data processing method provided in any of the embodiments of the present application.
Correspondingly, an embodiment of the present application further provides a storage medium, where the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute steps in the data processing method provided in any embodiment of the present application.
In the embodiments of the present application, a target data block is obtained, wherein the target data block comprises data historically written into the data block; when the target data block is a data block positioned at the head of a data structure queue, the target data block is added to a garbage collection list; when the number of the data blocks in the garbage collection list reaches a preset number, garbage collection is performed on the data blocks in the garbage collection list to obtain a result data block; and a target result data block in the result data blocks is marked, and the marked target result data block is added to the tail end of the data structure queue to cool the data in the target result data block. This scheme can cool data and thereby solve the problem of subsequent data dispersion caused by arranging cold data and hot data together.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a data processing method and device, computer equipment and a storage medium. Specifically, the embodiment of the application provides a data processing device suitable for computer equipment. The computer device may be a terminal or a server, the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
Taking the case where the computer device is a terminal as an example, the terminal can acquire a target data block, wherein the target data block comprises data which are historically written into the data block; when the target data block is the data block positioned at the head of the data structure queue, adding the target data block to a garbage collection list; when the number of the data blocks in the garbage collection list reaches a preset number, performing garbage collection on the data blocks in the garbage collection list to obtain a result data block; and marking a target result data block in the result data blocks, and adding the marked target result data block to the tail end of the data structure queue to cool the data in the target result data block.
Therefore, the data can be cooled, and the problem of subsequent data dispersion caused by arranging cold data and hot data together is solved.
The embodiments are described in detail below. The order in which the following embodiments are described is not intended to limit the preferred order of the embodiments.
The embodiment of the application provides a data processing method, which can be executed by a terminal or a server, or can be executed by the terminal and the server together; the embodiment of the present application is described by taking an example in which the data processing method is executed by a terminal, and specifically, is executed by a data processing apparatus integrated in the terminal. As shown in fig. 1, the specific flow of the data processing method may be as follows:
101. A target data block is obtained, the target data block including data historically written to the data block.
The target data block refers to a data block (which may be understood as a storage block) into which data has been written. The target data block contains previously written data, and operations such as writing data and erasing data can be performed on it.
In an embodiment, after the target data block is obtained, the data processing method may further include:
when the target data block is not the data block at the head of the data structure queue, the target data block is processed based on the ratio of valid data to invalid data in the target data block.
The data structure queue refers to a first-in-first-out (FIFO) queue, and in practical application scenarios, depending on device configurations, there may be one or more data structure queues, and each data structure queue has multiple data blocks for writing data.
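As a minimal sketch of such a first-in-first-out queue of data blocks, the Python below keeps block numbers in a `collections.deque`. The class name `BlockQueue` and its method names are hypothetical and chosen only for illustration; the application does not prescribe an implementation.

```python
from collections import deque


class BlockQueue:
    """FIFO queue of data block numbers (hypothetical helper for illustration)."""

    def __init__(self):
        self._blocks = deque()

    def push_tail(self, block_id):
        # Newly written (marked) blocks enter at the tail end.
        self._blocks.append(block_id)

    def head(self):
        # The head is the block that has been in the queue the longest.
        return self._blocks[0] if self._blocks else None

    def pop_head(self):
        return self._blocks.popleft() if self._blocks else None

    def __len__(self):
        return len(self._blocks)


queue = BlockQueue()
for block_id in (7, 3, 9):
    queue.push_tail(block_id)
oldest = queue.pop_head()  # block 7 was pushed first, so it leaves first
```

A device with several data structure queues could simply hold several such objects, one per queue.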
In an example, after the target data block is obtained, whether the target data block is the data block located at the head of the data structure queue may be determined. When the target data block is not the data block located at the head of the data structure queue, whether to add the target data block to the garbage collection list or to leave the target data block unmarked may be determined according to the ratio of valid data to invalid data in the target data block.
In an embodiment, the step "processing the target data block based on a ratio of valid data to invalid data in the target data block" may include:
if the proportion of the valid data pages to the invalid data pages in the target data block does not reach the preset proportion, adding the target data block to a garbage collection list so as to facilitate garbage collection of the target data block;
and if the proportion of the valid data pages to the invalid data pages in the target data block reaches the preset proportion, not marking the target data block.
The valid data includes valid data pages, and the invalid data includes invalid data pages.
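The ratio-based branch above can be sketched as follows. This is only an illustration under stated assumptions: the function name `process_non_head_block`, the return strings, the example preset ratio of 1.0, and the handling of a block with no invalid pages are all hypothetical and not specified by the application.

```python
def process_non_head_block(block_id, valid_pages, invalid_pages,
                           preset_ratio, gc_list):
    """Decide what happens to a block that is not at the head of the queue.

    Returns "collect" if the block was added to gc_list, or "unmarked" if
    its valid-to-invalid page ratio reaches preset_ratio and it is left alone.
    """
    if invalid_pages == 0:
        # Assumption: a block with no invalid pages always reaches the ratio.
        reaches_ratio = True
    else:
        reaches_ratio = valid_pages / invalid_pages >= preset_ratio

    if reaches_ratio:
        return "unmarked"
    gc_list.append(block_id)
    return "collect"


gc_list = []
first = process_non_head_block(12, valid_pages=6, invalid_pages=2,
                               preset_ratio=1.0, gc_list=gc_list)
second = process_non_head_block(13, valid_pages=1, invalid_pages=7,
                                preset_ratio=1.0, gc_list=gc_list)
```

Here block 12 is mostly valid and is left alone, while block 13 is mostly invalid and is queued for garbage collection.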
In one example, data is written into a data block in a flash memory. A flash memory writes and reads data in units of pages, and the data on a page must be erased before new data can be written to that page; however, data is erased in units of blocks, and each data block includes a plurality of data pages. Because of this characteristic, the free pages of a data block are gradually consumed as data is written, and data to be updated can only be written to a new data page, so the data block in which the old data page is located comes to contain invalid data pages.
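To make the page-write and block-erase behaviour concrete, the toy model below tracks page states while logical pages are written and then updated. The class `Flash`, the constant `PAGES_PER_BLOCK`, and the simple logical-to-physical mapping are assumptions made for this sketch; real flash devices have many more pages per block and more elaborate mapping.

```python
PAGES_PER_BLOCK = 4  # assumed for the sketch; real blocks hold far more pages


class Flash:
    """Toy flash model: pages are written once, erasure is per block."""

    def __init__(self, num_blocks):
        # state[b][p] is "free", "valid" or "invalid"
        self.state = [["free"] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.mapping = {}     # logical page -> (block, page)
        self.cursor = (0, 0)  # next free physical page

    def write(self, logical_page):
        block, page = self.cursor
        if block >= len(self.state):
            raise RuntimeError("no free page left; garbage collection needed")
        old = self.mapping.get(logical_page)
        if old is not None:
            # A page cannot be overwritten in place: the old copy goes stale.
            self.state[old[0]][old[1]] = "invalid"
        self.state[block][page] = "valid"
        self.mapping[logical_page] = (block, page)
        page += 1
        if page == PAGES_PER_BLOCK:
            block, page = block + 1, 0
        self.cursor = (block, page)

    def erase(self, block):
        # Erasure always covers the whole block, never a single page.
        if "valid" in self.state[block]:
            raise ValueError("block still holds valid pages; move them first")
        self.state[block] = ["free"] * PAGES_PER_BLOCK


flash = Flash(num_blocks=2)
for logical_page in (0, 1, 2, 3):
    flash.write(logical_page)
flash.write(1)  # update: the old copy in block 0 becomes invalid
flash.write(1)  # update again: the copy in block 1 becomes invalid too
```

After these writes, block 0 holds one invalid page among three valid ones, which is the kind of block that garbage collection later has to clean up.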
102. When the target data block is the data block at the head of the data structure queue, the target data block is added to the garbage collection list.
The data blocks in the garbage collection list are used for garbage collection, and when the number of the data blocks in the garbage collection list reaches a preset number, the data blocks in the garbage collection list are collected.
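The trigger condition can be sketched as below. The threshold value `PRESET_NUMBER = 3`, the function name `add_to_gc_list`, and the `collect` callback are hypothetical; the application leaves the preset number and the collection routine open.

```python
PRESET_NUMBER = 3  # assumed threshold; the application leaves the value open


def add_to_gc_list(gc_list, block_id, collect):
    """Append block_id; once the list is full, hand the batch to collect()."""
    gc_list.append(block_id)
    if len(gc_list) < PRESET_NUMBER:
        return None  # not enough blocks yet, nothing is collected
    batch = list(gc_list)
    gc_list.clear()
    return collect(batch)


gc_list = []
# A stand-in collector that only reports which blocks it was given.
outcomes = [add_to_gc_list(gc_list, block_id, collect=lambda batch: batch)
            for block_id in (4, 8, 15)]
```

Only the third call reaches the preset number, so only that call hands a batch to the collector and empties the list.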
In one example, garbage collection occurs when the number of remaining blank data blocks is insufficient to cope with future data writes. Garbage collection reads out the valid data pages (cold data) scattered across data blocks that contain many invalid data pages and writes them into a new data block (a data block in which cold data is collected), so that the source data blocks, which then hold only invalid data, can be erased. In effect, garbage collection erases the collected data blocks so that they can be used for subsequent data writes.
The data written in real time may be referred to as hot data, and the data written and stored for a period of time without being updated may be referred to as cold data.
The data written in the data block may be various data, for example, after the mobile phone takes a picture, the mobile phone may write the picture data to the flash memory of the mobile phone.
In an example, as shown in fig. 2, a target data block may be obtained, and whether the target data block is a data block into which data was previously written is determined. If so, the target data block is put into the garbage collection list for garbage collection. If not, whether the ratio of valid data to invalid data in the target data block reaches the preset ratio is determined: if the ratio does not reach the preset ratio, the target data block is added to the garbage collection list so that it can be garbage collected; if the ratio reaches the preset ratio, the target data block is not marked.
The data blocks placed in the garbage collection list can be subjected to garbage collection to obtain result data blocks. Whether each result data block contains written data can then be determined. If it does, it is determined to be a target result data block and is added to the tail end of the data structure queue to cool the data; if it does not, the result data block is left unmarked so that it can be used for writing data subsequently.
Whether the target data block is a data block into which data was previously written may be determined by checking whether the target data block was obtained from the head of the data structure queue. If the target data block was obtained from the head of the data structure queue, it is a data block into which data was previously written, and it may be added to the garbage collection list.
In one example, a data block into which data is written may be marked (locked) in a table, and its data block number is pushed to the tail end of a data structure queue of indefinite size. When garbage collection is needed, a data block is taken from the head of the data structure queue (the block that has existed in the queue the longest on the time axis) and unlocked to serve as a move source for garbage collection; the data block is not pushed into the data structure queue again until the data move is completed and new data is written into it. When garbage collection is performed, the source data block list is checked. If the sources include a data block that was locked because data had been written into it, the consolidated data block produced by the garbage collection is locked and pushed to the tail end of the queue. When a data block is taken from the head of the data structure queue in a later garbage collection and it is found to be such a consolidated, locked data block, it is directly unlocked and the next data block is taken from the queue if its valid data reaches a certain proportion; if its valid data is lower than that proportion, it is added to the garbage collection list. This design ensures that data blocks holding hot data stay locked for a longer time and that the locking covers a larger range, and it prevents cold data from being used as a move source for garbage collection.
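One possible reading of this source-selection rule is sketched below. The function name `pick_gc_sources`, the dictionary fields `locked`, `consolidated`, `valid`, and `total`, and the threshold `min_valid_fraction` are all hypothetical; the sketch only illustrates taking blocks from the head of the queue, unlocking them, and skipping previously consolidated blocks that are still dense with valid data.

```python
from collections import deque


def pick_gc_sources(queue, blocks, needed, min_valid_fraction):
    """Take blocks from the queue head until `needed` GC sources are found.

    blocks maps block number -> dict with keys "locked", "consolidated",
    "valid" and "total" (page counts). All field names are hypothetical.
    """
    sources = []
    while queue and len(sources) < needed:
        block_id = queue.popleft()   # oldest block in the queue
        block = blocks[block_id]
        block["locked"] = False      # leaving the queue unlocks the block
        valid_fraction = block["valid"] / block["total"]
        if block["consolidated"] and valid_fraction >= min_valid_fraction:
            continue                 # still dense: skip it, take the next one
        sources.append(block_id)
    return sources


blocks = {
    1: {"locked": True, "consolidated": True, "valid": 60, "total": 64},
    2: {"locked": True, "consolidated": False, "valid": 10, "total": 64},
    3: {"locked": True, "consolidated": True, "valid": 8, "total": 64},
    4: {"locked": True, "consolidated": False, "valid": 30, "total": 64},
}
queue = deque([1, 2, 3, 4])
sources = pick_gc_sources(queue, blocks, needed=2, min_valid_fraction=0.5)
```

In this run block 1 is unlocked but skipped because it is still mostly valid, blocks 2 and 3 become move sources, and block 4 stays locked in the queue.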
103. When the number of data blocks in the garbage collection list reaches the preset number, garbage collection is performed on the data blocks in the garbage collection list to obtain a result data block.
The result data block refers to a data block after garbage collection, and the result data block may include a data block in which new data is written, and may also include a data block in which new data is not written.
In an embodiment, the step of performing garbage collection on the data blocks in the garbage collection list to obtain the result data block when the number of the data blocks in the garbage collection list reaches a preset number may include:
when the number of the data blocks in the garbage collection list reaches a preset number, extracting the data in the valid data pages on the data blocks in the garbage collection list to obtain blank data blocks and the valid data in the valid data pages;
and obtaining a result data block based on the blank data block and the valid data in the valid data page.
In one embodiment, the step of obtaining a result data block based on the blank data block and the valid data in the valid data page may include:
determining a target blank data block from the blank data blocks based on the data amount of the valid data;
writing the valid data into a target blank data block to obtain a target result data block;
and taking the target result data block, together with the blank data blocks other than the target blank data block, as result data blocks.
Here, garbage collection can be understood as reading out the valid data pages (cold data) scattered across data blocks that contain many invalid data pages and writing them into a new data block (a cold block that collects cold data), so that the source data blocks can be erased and used for subsequent data writes. Therefore, when the number of data blocks in the garbage collection list reaches the preset number, the data in the valid data pages on those data blocks is extracted, yielding blank data blocks and the valid data. The valid data is then written into blank data blocks chosen according to the amount of extracted valid data, yielding target result data blocks, into which the valid data is written, and the remaining blank data blocks. The target result data blocks and the remaining blank data blocks are collectively called result data blocks.
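The extraction and rewriting steps can be sketched as below. This is a simplified model under stated assumptions: each block is a Python list of page payloads, `None` stands for an invalid page, the extracted valid data is held in memory while the source blocks are erased, and no block holds more than `pages_per_block` pages. The function name `garbage_collect` is hypothetical.

```python
def garbage_collect(gc_blocks, pages_per_block):
    """Compact the valid pages of gc_blocks into as few blocks as possible.

    gc_blocks is a list of blocks; each block is a list of page payloads in
    which None stands for an invalid page. Returns the pair
    (target_result_blocks, remaining_blank_count).
    """
    # Extract the valid data; every source block can then be erased.
    valid_data = [page for block in gc_blocks for page in block
                  if page is not None]
    blank_count = len(gc_blocks)

    # The amount of valid data decides how many blank blocks become targets.
    target_result_blocks = [valid_data[i:i + pages_per_block]
                            for i in range(0, len(valid_data), pages_per_block)]

    # Blank blocks that were not needed stay blank for later writes.
    remaining_blank_count = blank_count - len(target_result_blocks)
    return target_result_blocks, remaining_blank_count


gc_blocks = [
    ["a", None, None, "b"],
    [None, "c", None, None],
    [None, None, None, None],
]
targets, blanks = garbage_collect(gc_blocks, pages_per_block=4)
```

Three sparse source blocks collapse into one target result data block, leaving two blank data blocks; together these form the result data blocks.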
In an embodiment, when the number of the data blocks in the garbage collection list reaches a preset number, the data blocks in the garbage collection list are subjected to garbage collection, and after a result data block is obtained, the data processing method may further include:
and not marking blank data blocks in the result data blocks, wherein the blank data blocks are used for writing subsequent data.
104. A target result data block among the result data blocks is marked, and the marked target result data block is added to the tail end of the data structure queue to cool the data in the target result data block.
The data structure queue operates independently to protect hot data blocks. After a data block is pushed to the tail end of the data structure queue, no active operation is performed on it; its mark can be cancelled only when it reaches the head of the data structure queue, after which it can be found by garbage collection and placed in the garbage collection list.
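A minimal sketch of this marking step follows, assuming the set `marked` records marked blocks and `free_pool` holds blank blocks awaiting later writes; these names and the function `requeue_results` are hypothetical.

```python
from collections import deque


def requeue_results(target_result_ids, blank_ids, marked, queue, free_pool):
    """Mark blocks that hold collected data and push them to the queue tail.

    Blank result blocks are left unmarked and returned to free_pool.
    """
    for block_id in target_result_ids:
        marked.add(block_id)
        queue.append(block_id)      # tail end: the data starts to cool
    for block_id in blank_ids:
        free_pool.append(block_id)  # unmarked, available for later writes


marked, queue, free_pool = {5}, deque([5]), []
requeue_results([21], [22, 23], marked, queue, free_pool)
```

Block 21 joins the queue behind block 5 and will only become a garbage collection candidate again once it has travelled to the head.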
In an embodiment, the data processing method may further include:
acquiring target data to be written into a data block;
writing the target data into the blank data block to obtain a target data writing block in which the target data is written;
and marking the target data writing block, and adding the marked target data writing block to the tail end of the data structure queue to cool the target data.
In one example, as shown in fig. 3, target data to be written into a data block may be written into the data block, and then the data block into which the target data is written is marked, and then the marked data block may be pushed to the tail end of the data structure queue.
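The write path of fig. 3 can be sketched in the same style. The function `write_and_enqueue`, the `storage` dictionary standing in for the flash contents, and the example payload are assumptions made for illustration only.

```python
from collections import deque


def write_and_enqueue(target_data, blank_blocks, storage, marked, queue):
    """Write target_data to a blank block, mark it, and push it to the tail."""
    if not blank_blocks:
        raise RuntimeError("no blank data block available")
    block_id = blank_blocks.pop(0)
    storage[block_id] = target_data  # the block now holds the target data
    marked.add(block_id)
    queue.append(block_id)
    return block_id


blank_blocks, storage, marked, queue = [30, 31], {}, set(), deque([9])
written = write_and_enqueue(b"photo-bytes", blank_blocks, storage, marked, queue)
```

The freshly written block enters at the tail, so it is the last block in the queue that garbage collection will reach.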
In one example, a data block into which data is written may be marked in a table, and its data block number is pushed to the tail end of a fixed-size data structure queue. When garbage collection is required, a data block may be taken from the head of the data structure queue (which may be understood as the data block that has existed in the queue the longest on the time axis) and unlocked to serve as a move source for garbage collection. The data block is not pushed into the data structure queue again until the data move is completed and new data is written into it. In this way, cold data and hot data can be distinguished within a certain range of operations.
The main aim of the method is to use a cold/hot data distinguishing technique so that a storage block (data block) that has just been used and filled remains in the data structure queue as long as possible and is not selected by the garbage collection mechanism to be merged with other storage blocks (data blocks). This avoids frequently updated hot data being continually mixed, by the garbage collection mechanism, with cold data stored in other storage blocks. The advantage of distinguishing cold data from hot data is that frequently updated hot data is usually overwritten again within a short time, and as soon as data is overwritten, the earlier stored copy can be treated directly as invalid data. When the overwritten hot data fills an entire storage block, that storage block can be erased directly as needed without performing garbage collection; otherwise, the proportion of invalid data in the storage block increases, so that recovery efficiency is improved even when garbage collection has to be started.
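The efficiency argument can be illustrated with simple arithmetic. The metric below, the number of valid pages that must be copied for each page of space reclaimed, is an assumption introduced for this sketch and is not defined by the application; the 64-page block size is likewise only an example.

```python
def copies_per_reclaimed_page(valid_pages, invalid_pages):
    """Valid pages that must be moved for each page of space reclaimed."""
    if invalid_pages == 0:
        raise ValueError("collecting this block reclaims no space")
    return valid_pages / invalid_pages


# A block where hot and cold data are mixed: most pages are still valid.
mixed_block = copies_per_reclaimed_page(valid_pages=48, invalid_pages=16)
# A block of hot data that has been fully overwritten: nothing to move.
hot_block = copies_per_reclaimed_page(valid_pages=0, invalid_pages=64)
```

Under these assumed numbers the mixed block costs three page copies per reclaimed page, while the fully overwritten hot block can be erased with no copying at all.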
Therefore, the data can be cooled, and the problem of subsequent data dispersion caused by arranging cold data and hot data together is solved.
In order to better implement the above method, an embodiment of the present application correspondingly further provides a data processing apparatus, which may be specifically integrated in a terminal. Referring to fig. 4, the data processing apparatus may include a first acquisition unit 201, a first adding unit 202, a first obtaining unit 203, and a first marking unit 204, as follows:
(1) A first acquisition unit 201;
a first acquisition unit 201, configured to acquire a target data block, where the target data block includes data that has been historically written into the data block.
(2) A first adding unit 202;
a first adding unit 202, configured to add the target data block to the garbage collection list when the target data block is a data block located at the head of the data structure queue.
(3) A first obtaining unit 203;
the first obtaining unit 203 is configured to perform garbage collection on the data blocks in the garbage collection list when the number of the data blocks in the garbage collection list reaches a preset number, so as to obtain a result data block.
In an embodiment, as shown in fig. 5, the first obtaining unit 203 includes:
an extracting subunit 2031, configured to, when the number of the data blocks in the garbage collection list reaches a preset number, extract the data in the valid data pages on the data blocks in the garbage collection list, to obtain blank data blocks and the valid data in the valid data pages;
an obtaining subunit 2032, configured to obtain a result data block based on the blank data block and the valid data in the valid data page.
In an embodiment, the obtaining subunit 2032 is further configured to determine a target blank data block from the blank data blocks based on the data amount of the valid data; write the valid data into the target blank data block to obtain a target result data block; and take the target result data block, together with the blank data blocks other than the target blank data block, as result data blocks.
(4) A first marking unit 204;
the first marking unit 204 is configured to mark a target result data block of the result data blocks, and add the marked target result data block to the tail end of the data structure queue to cool the data in the target result data block.
In one embodiment, the data processing apparatus further includes:
and a second marking unit 205, configured to not mark a blank data block in the result data block, where the blank data block is used for writing subsequent data.
In one embodiment, the data processing apparatus further includes:
the processing unit 206 is configured to process the target data block based on a ratio of valid data to invalid data in the target data block when the target data block is not a data block at the head of the data structure queue.
In one embodiment, as shown in fig. 6, the processing unit 206 includes:
an adding subunit 2061, configured to add the target data block to the garbage collection list if the ratio of the valid data page to the invalid data page in the target data block does not reach the preset ratio, so that the target data block is subjected to garbage collection;
the marking subunit 2062 is configured to not mark the target data block if the ratio of the valid data page to the invalid data page in the target data block reaches a preset ratio.
In one embodiment, the data processing apparatus further includes:
a second acquisition unit 207, configured to acquire target data to be written into a data block;
a second obtaining unit 208, configured to write the target data into the blank data block to obtain a target data write block into which the target data is written;
a third marking unit 209, configured to mark the target data write block and add the marked target data write block to the tail end of the data structure queue to cool the target data.
As can be seen from the above, the first acquisition unit 201 of the data processing apparatus of the embodiment of the present application acquires the target data block, where the target data block includes data historically written into the data block; then, when the target data block is the data block at the head of the data structure queue, the first adding unit 202 adds the target data block to the garbage collection list; when the number of the data blocks in the garbage collection list reaches a preset number, the first obtaining unit 203 performs garbage collection on the data blocks in the garbage collection list to obtain a result data block; the first marking unit 204 marks a target result data block of the result data blocks, and adds the marked target result data block to the tail end of the data structure queue to cool the data in the target result data block. This scheme can cool data and thereby solve the problem of subsequent data dispersion caused by arranging cold data and hot data together.
In addition, an embodiment of the present application further provides a computer device, which may be a device such as a terminal or a server. Fig. 7 shows a schematic structural diagram of the computer device according to the embodiment of the present application. Specifically:
the computer device may include components such as a processor 701 of one or more processing cores, memory 702 of one or more storage media, a power supply 703, and an input unit 704. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 7 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 701 is a control center of the computer apparatus, connects various parts of the entire computer apparatus using various interfaces and lines, and performs various functions of the computer apparatus and processes data by running or executing software programs and/or modules stored in the memory 702 and calling data stored in the memory 702, thereby monitoring the computer apparatus as a whole. Optionally, processor 701 may include one or more processing cores; preferably, the processor 701 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 701.
The memory 702 may be used to store software programs and modules, and the processor 701 executes various functional applications and data processing by running the software programs and modules stored in the memory 702. The memory 702 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to use of the computer device, and the like. Further, the memory 702 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 702 may also include a memory controller to provide the processor 701 with access to the memory 702.
The computer device further includes a power supply 703 for supplying power to the various components, and preferably, the power supply 703 is logically connected to the processor 701 through a power management system, so that functions of managing charging, discharging, and power consumption are implemented through the power management system. The power supply 703 may also include any component including one or more of a dc or ac power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The computer device may also include an input unit 704, the input unit 704 being operable to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 701 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 702 according to the following instructions, and the processor 701 runs the application program stored in the memory 702, thereby implementing various functions as follows:
acquiring a target data block, wherein the target data block comprises data historically written into the data block; when the target data block is the data block positioned at the head of the data structure queue, adding the target data block to a garbage collection list; when the number of the data blocks in the garbage collection list reaches a preset number, performing garbage collection on the data blocks in the garbage collection list to obtain a result data block; and marking a target result data block in the result data blocks, and adding the marked target result data block to the tail end of the data structure queue to cool the data in the target result data block.
Therefore, the data can be cooled, and the problem of subsequent data dispersion caused by arranging cold data and hot data together is solved.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be performed by instructions or by instructions controlling associated hardware, and the instructions may be stored in a storage medium and loaded and executed by a processor.
To this end, the present application provides a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the data processing methods provided by the present application. For example, the instructions may perform the steps of:
acquiring a target data block, wherein the target data block comprises data historically written into the data block; when the target data block is the data block positioned at the head of the data structure queue, adding the target data block to a garbage collection list; when the number of the data blocks in the garbage collection list reaches a preset number, performing garbage collection on the data blocks in the garbage collection list to obtain a result data block; and marking a target result data block in the result data blocks, and adding the marked target result data block to the tail end of the data structure queue to cool the data in the target result data block.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
Since the instructions stored in the storage medium can execute the steps in any data processing method provided in the embodiments of the present application, beneficial effects that can be achieved by any data processing method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
According to an aspect of the application, there is provided, among other things, a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the data processing method provided in the summary and the embodiments of the invention.
The foregoing detailed description has provided a data processing method, an apparatus, a computer device, and a storage medium according to embodiments of the present application, and specific examples have been applied in the present application to explain the principles and implementations of the present application, and the descriptions of the foregoing embodiments are only used to help understand the method and the core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.