WO2016192605A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
WO2016192605A1
Authority
WO
WIPO (PCT)
Prior art keywords
confirmation
data block
queue
read
confirmed
Prior art date
Application number
PCT/CN2016/083890
Other languages
English (en)
French (fr)
Inventor
杨天洋
Original Assignee
阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2016192605A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking

Definitions

  • The present invention relates to the field of data processing, and in particular to a data processing method and apparatus.
  • In Internet data processing, processing devices often need to read data blocks from a database or other data storage unit, and to process or confirm the data blocks that have been read.
  • The data in a data storage unit such as a database is generally organized as one or more data block queues, and the reading device reads and confirms the data blocks from these queues in sequence.
  • The data volume of a data block queue is usually very large, so serial or single-threaded reading of the data blocks is too inefficient.
  • Reading devices therefore generally read data blocks in parallel to improve reading efficiency.
  • Parallel reading causes the order in which data blocks are read from the data block queue to differ from the order in which they are confirmed.
  • A read interruption may occur because the processing device restarts or reaches the limit of its processing capability.
  • Because the reading order and the confirmation order differ, the processing device cannot tell which data blocks have been confirmed, which have not been read, and which have been read but not yet confirmed.
  • By persisting each data block that has been read and confirmed, the processing device marks that data block as read and confirmed. That is, because of the persistence operation, the processing device does not re-read persisted data blocks when it resumes reading after an interruption.
  • However, the processing device reads and confirms data blocks in parallel much faster than they can be persisted, so a large number of read-and-confirmed data blocks wait for the persistence service, which puts the persistence service under heavy processing pressure.
  • The present invention provides a data processing method and apparatus so that a persistence operation need not be performed on every data block that has been read and confirmed, while still ensuring that, when reading resumes after an interruption, data blocks in the read data block queue that have not been read and confirmed are not skipped by the processing device. The processing pressure on the persistence service is thereby reduced while data reading accuracy is preserved.
  • A data processing method, comprising:
  • the processing device reads data blocks from a read data block queue;
  • the processing device generates, for each data block, a confirmation object in one-to-one association with that data block, the confirmation object being used to record the confirmation status of its associated data block;
  • the processing device establishes a confirmation queue including N confirmation objects, where the arrangement order of the N confirmation objects in the confirmation queue is the same as the reading order, the reading order is the order in which the N data blocks associated with the N confirmation objects are read from the read data block queue, and the confirmation object at the head of the confirmation queue is the earliest generated of the N confirmation objects;
  • if a data block in the read data block queue is read and confirmed, the processing device changes the confirmation status of the confirmation object associated with that data block from unconfirmed to confirmed;
  • the processing device performs a persistence operation on read-and-confirmed data blocks according to the confirmation statuses of the confirmation objects in the confirmation queue.
  • The processing device performing a persistence operation on read-and-confirmed data blocks according to the confirmation statuses of the confirmation objects in the confirmation queue specifically includes:
  • the processing device identifies the confirmation status of the confirmation object at the queue head; if the status is confirmed, it deletes that confirmation object, places the next confirmation object in the arrangement order at the queue head, and identifies the confirmation status of the new queue head, repeating until the confirmation object at the queue head is unconfirmed; it then saves in the persistence area the location information of the data block associated with the confirmation object that immediately preceded the current queue head in the arrangement order, where the location information is the location of that data block in the read data block queue.
  • The method further includes:
  • the processing device acquires the location information of the most recently saved data block from the persistence area;
  • the processing device continues reading data blocks from the corresponding position in the read data block queue according to the location information of the most recently saved data block.
  • The processing device reads data blocks from the read data block queue using parallel reading.
  • The confirmation queue is saved in a cache.
  • The confirmation object being used to record the confirmation status of its associated data block specifically includes:
  • the confirmation status of the confirmation object is consistent with the confirmation status of the data block corresponding to it.
  • A data processing apparatus, comprising:
  • a reading unit configured to read data blocks from a read data block queue;
  • a generating unit configured to generate, for each data block, a confirmation object in one-to-one association with that data block, the confirmation object being used to record the confirmation status of its associated data block;
  • an establishing unit configured to establish a confirmation queue including N confirmation objects, where the arrangement order of the N confirmation objects in the confirmation queue is the same as the reading order, the reading order is the order in which the N data blocks associated with the N confirmation objects are read from the read data block queue, and the confirmation object at the head of the confirmation queue is the earliest generated of the N confirmation objects;
  • a modifying unit configured to, if a data block in the read data block queue is read and confirmed, change the confirmation status of the confirmation object associated with that data block from unconfirmed to confirmed;
  • a persistence unit configured to perform a persistence operation on read-and-confirmed data blocks according to the confirmation statuses of the confirmation objects in the confirmation queue.
  • The persistence unit is specifically configured to identify the confirmation status of the confirmation object at the queue head; if the status is confirmed, delete that confirmation object, place the next confirmation object in the arrangement order at the queue head, and identify the confirmation status of the new queue head, repeating until the confirmation object at the queue head is unconfirmed; and then save in the persistence area the location information of the data block associated with the confirmation object that immediately preceded the current queue head in the arrangement order, where the location information is the location of that data block in the read data block queue.
  • The apparatus further includes:
  • an obtaining unit configured to obtain the location information of the most recently saved data block from the persistence area;
  • the reading unit is further configured to continue reading data blocks from the corresponding position in the read data block queue according to the location information of the most recently saved data block.
  • The reading unit reads data blocks from the read data block queue using parallel reading.
  • The confirmation queue is saved in a cache.
  • The confirmation object being used to record the confirmation status of its associated data block specifically includes:
  • the confirmation status of the confirmation object is consistent with the confirmation status of the data block corresponding to it.
  • When the processing device reads a data block, it generates a confirmation object, in one-to-one association with the data block, that records the confirmation status of the data block, and it establishes a confirmation queue including N confirmation objects according to the order in which the data blocks are read from the read data block queue.
  • Based on the confirmation statuses of the confirmation objects in the confirmation queue and the arrangement order of the confirmation queue, the processing device performs the persistence operation only on the last read-and-confirmed data block that precedes the earliest-read data block whose confirmation status is unconfirmed.
  • A persistence operation is thus not required for every data block that has been read and confirmed, which reduces the processing pressure on the persistence service while preserving data reading accuracy.
  • FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
  • FIG. 2 is a schematic diagram of establishing a confirmation queue according to an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of a method for continuing to read after a read interruption according to an embodiment of the present invention;
  • FIG. 4 is a structural diagram of a data processing apparatus according to an embodiment of the present invention;
  • FIG. 5 is a structural diagram of a data processing apparatus according to an embodiment of the present invention.
  • In Internet data processing, processing devices often need to read data blocks from a database or other data storage unit, and to process or confirm the data blocks that have been read.
  • The data volume of a data block queue is usually very large, so serial or single-threaded reading of the data blocks is too inefficient.
  • Reading devices therefore generally read data blocks in parallel to improve reading efficiency.
  • Parallel reading causes the order in which data blocks are read from the data block queue to differ from the order in which they are confirmed.
  • A read interruption may occur because the processing device restarts or reaches the limit of its processing capability.
  • Because the reading order and the confirmation order differ, the processing device cannot tell which data blocks have been confirmed, which have not been read, and which have been read but not yet confirmed. To ensure that, when reading resumes after an interruption, data blocks that have already been read are not read again and data blocks that have not been read are not skipped, the usual approach is to persist each data block that has been read and confirmed by saving it to the persistence area.
  • Persistence can be understood as saving data blocks from memory to a database, a file, or the like. By persisting each data block that has been read and confirmed, the processing device marks that data block as read and confirmed. That is, because of the persistence operation, the processing device does not re-read persisted data blocks when it resumes reading after an interruption.
  • However, the processing device reads and confirms data blocks in parallel much faster than they can be persisted, so a huge number of read-and-confirmed data blocks wait for the persistence service, which puts the persistence service under heavy processing pressure.
  • The embodiments of the present invention therefore provide a data processing method and apparatus.
  • When the processing device reads a data block, it generates a confirmation object, in one-to-one association with the data block, that records the confirmation status of the data block, and it establishes a confirmation queue including N confirmation objects according to the order in which the data blocks are read from the read data block queue.
  • Based on the confirmation statuses of the confirmation objects in the confirmation queue and the arrangement order of the confirmation queue, the processing device performs the persistence operation only on the last read-and-confirmed data block that precedes the earliest-read data block whose confirmation status is unconfirmed.
  • A persistence operation is thus not required for every data block that has been read and confirmed, which reduces the processing pressure on the persistence service while preserving the accuracy of data reading.
  • Before continuing to read, the processing device acquires from the persistence area the location information of the most recently saved data block, that is, the data block on which the persistence operation was last performed before the read interruption.
  • The processing device continues reading data blocks from the corresponding position in the read data block queue according to the location information. Because the arrangement order and the reading order are the same, if the location information of data block a is the most recently saved in the persistence area, then data block a corresponds to confirmation object a, the confirmation object immediately preceding confirmation object b, which is currently at the head of the confirmation queue.
  • When the processing device resumes reading data blocks, reading starts directly from data block a, without re-reading the data blocks that precede data block a in the read data block queue.
  • From the confirmation statuses of the confirmation objects corresponding to those earlier data blocks, it can be determined that they have all been read and confirmed. This reduces the number of read-and-confirmed data blocks that may be re-read when reading resumes after an interruption, improving processing efficiency.
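  • The property described above, that every data block before the saved location is already confirmed, can be illustrated with a minimal Python sketch. The function name `safe_resume_position` and the use of 1-based integer positions are assumptions made for illustration and do not come from the patent text.

```python
def safe_resume_position(confirmed_positions, first_position=1):
    """Return the last position of the contiguous confirmed prefix.

    Positions are hypothetical 1-based indexes into the read data block
    queue. Returns first_position - 1 when the first block is unconfirmed.
    """
    confirmed = set(confirmed_positions)
    position = first_position - 1
    # Walk forward in reading order and stop at the first unconfirmed block.
    while position + 1 in confirmed:
        position += 1
    return position


# Blocks 1, 3 and 4 are confirmed out of order; block 2 is still pending.
print(safe_resume_position({1, 3, 4}))     # 1
print(safe_resume_position({1, 2, 3, 4}))  # 4
```

  • Taking the largest confirmed position instead (4 in the first call) would skip the unconfirmed block 2, which is why only the end of the contiguous confirmed prefix is safe to record.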
  • FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention. The method includes:
  • S101: The processing device reads data blocks from the read data block queue.
  • S102: The processing device generates a confirmation object in one-to-one association with each data block, the confirmation object being used to record the confirmation status of its associated data block.
  • The processing device may read the data blocks from the read data block queue using parallel reading.
  • The confirmation object being used to record the confirmation status of its associated data block specifically includes:
  • the confirmation status of the confirmation object is consistent with the confirmation status of the data block corresponding to it.
  • S103: The processing device establishes a confirmation queue including N confirmation objects, where the arrangement order of the N confirmation objects in the confirmation queue is the same as the reading order, the reading order is the order in which the N data blocks associated with the N confirmation objects are read from the read data block queue, and the confirmation object at the head of the confirmation queue is the earliest generated of the N confirmation objects.
  • Each time the processing device reads a data block from the read data block queue, it generates one confirmation object, so data blocks and confirmation objects correspond one to one.
  • The processing device adds the generated confirmation objects to the confirmation queue in sequence, so the order in which the processing device reads the data blocks is the same as the arrangement order of the confirmation objects in the confirmation queue.
  • The confirmation queue is saved in a cache.
  • The cache may be, for example, memory.
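  • Steps S101 to S103 can be sketched as a small data structure. This is only an illustration: the class name `ConfirmationObject`, the helper `build_confirmation_queue`, and the choice of an in-memory `deque` as the cache are assumptions, not details given in the patent.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ConfirmationObject:
    """Records the confirmation status of exactly one data block."""
    position: int            # location of the block in the read data block queue
    confirmed: bool = False  # a block that was just read is unconfirmed


def build_confirmation_queue(positions_in_read_order):
    """Create one confirmation object per block, kept in reading order."""
    queue = deque()  # in-memory stand-in for the cache
    for position in positions_in_read_order:
        queue.append(ConfirmationObject(position))
    return queue


queue = build_confirmation_queue([1, 2, 3, 4, 5])
print(queue[0].position)   # head: the earliest generated object
print(queue[-1].position)  # tail: the most recently generated object
```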
  • FIG. 2 is a schematic diagram of establishing a confirmation queue according to an embodiment of the present invention.
  • The processing device first reads data block 1 from the read data block queue and generates the corresponding confirmation object 1; it then reads data block 2 and generates the corresponding confirmation object 2; and so on, until data block N is read and the corresponding confirmation object N is generated.
  • The order in which the processing device reads the data blocks is consistent with the arrangement order of the corresponding confirmation objects in the confirmation queue.
  • The direction of the arrow in the data block queue of FIG. 2 is the reading order.
  • The leftmost data block was read earliest, read times become later from left to right, and the rightmost data block in the queue is the one most recently read by the processing device.
  • The direction of the arrow in the confirmation queue of FIG. 2 is the arrangement order.
  • The leftmost confirmation object joined the confirmation queue earliest.
  • Confirmation objects further to the right joined the queue later, and the rightmost confirmation object is the one most recently added to the confirmation queue. That is, the processing device reads data block 1 before data block 2, and confirmation object 1, which corresponds to data block 1, also precedes confirmation object 2, which corresponds to data block 2, in the confirmation queue.
  • S104: If a data block in the read data block queue is read and confirmed by the processing device, the processing device changes the confirmation status of the confirmation object associated with that data block from unconfirmed to confirmed.
  • The confirmation status of a confirmation object changes according to the read-confirmation state of its corresponding data block.
  • When the processing device has just read a data block, the data block has not yet been read and confirmed, so the confirmation status of the confirmation object corresponding to that data block is also unconfirmed.
  • Once the data block is read and confirmed, the confirmation status of the confirmation object corresponding to it is changed to confirmed.
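  • The status change in S104 under parallel processing can be sketched as follows. The worker function `process_block`, the simulated processing delay, and the use of `ThreadPoolExecutor` are assumptions chosen for illustration; the patent does not specify how parallel reading is implemented.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


class ConfirmationObject:
    def __init__(self, position):
        self.position = position
        self.confirmed = False  # unconfirmed right after the block is read


def process_block(obj, completion_log):
    """Hypothetical worker: process one block, then mark it confirmed."""
    time.sleep(random.uniform(0, 0.01))  # simulated, variable processing time
    obj.confirmed = True                 # unconfirmed -> confirmed
    completion_log.append(obj.position)  # list.append is atomic in CPython


objects = [ConfirmationObject(p) for p in range(1, 6)]  # reading order 1..5
completion_log = []
# Leaving the with-block waits for all submitted tasks to finish.
with ThreadPoolExecutor(max_workers=4) as pool:
    for obj in objects:
        pool.submit(process_block, obj, completion_log)

print("confirmation order:", completion_log)  # may differ from reading order
print("all confirmed:", all(o.confirmed for o in objects))
```

  • The list of confirmation objects keeps the reading order regardless of the order in which the workers finish, which is the property the persistence step relies on.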
  • S105: The processing device performs a persistence operation on read-and-confirmed data blocks according to the confirmation statuses of the confirmation objects in the confirmation queue.
  • What is persisted is the location information of the data block, which can be used to locate the data block again later, so that when reading resumes after an interruption, sequential reading can restart from that data block in the read data block queue according to the location information.
  • The embodiment of the present invention provides a specific manner of performing the persistence operation on read-and-confirmed data blocks according to the confirmation statuses of the confirmation objects in the confirmation queue, which specifically includes:
  • the processing device identifies the confirmation status of the confirmation object at the queue head; if the status is confirmed, it deletes that confirmation object, places the next confirmation object in the arrangement order at the queue head, and identifies the confirmation status of the new queue head, repeating until the confirmation object at the queue head is unconfirmed; it then saves in the persistence area the location information of the data block associated with the confirmation object that immediately preceded the current queue head in the arrangement order, where the location information is the location of that data block in the read data block queue.
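  • The head-trimming procedure above can be sketched in a few lines of Python. The function name `advance_head`, the `[position, confirmed]` pair representation, and the dictionary standing in for the persistence area are all assumptions for illustration. The sketch also assumes that, if the queue becomes empty, the last deleted position is still saved; the patent text only describes the case where an unconfirmed object remains at the head.

```python
from collections import deque


def advance_head(confirmation_queue, persistence_area):
    """Delete confirmed objects from the head and persist one location.

    confirmation_queue: deque of [position, confirmed] pairs in reading order.
    persistence_area: dict standing in for durable storage (an assumption).
    """
    last_deleted = None
    # Keep deleting while the object at the queue head is confirmed.
    while confirmation_queue and confirmation_queue[0][1]:
        last_deleted = confirmation_queue.popleft()[0]
    # Only the block just before the new head is persisted, if any was deleted.
    if last_deleted is not None:
        persistence_area["last_position"] = last_deleted
    return last_deleted


queue = deque([[1, True], [2, True], [3, False], [4, True]])
area = {}
advance_head(queue, area)
print(area)         # {'last_position': 2}
print(list(queue))  # [[3, False], [4, True]]
```

  • Object 4 stays in the queue even though it is confirmed, because the unconfirmed object 3 is ahead of it; a single write covers blocks 1 and 2.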
  • The processing device does not need to perform a persistence operation on every data block that has been confirmed.
  • The processing device queries the confirmation queue and checks the confirmation statuses of the confirmation objects in it; when the confirmation status of the confirmation object at the queue head is confirmed, the processing device deletes that confirmation object.
  • The confirmation object that follows the deleted one in the arrangement order is then placed at the queue head. For example, as shown in FIG. 2, data block 1, data block 3, and data block 4 have been read and confirmed, so the confirmation statuses of confirmation object 1, confirmation object 3, and confirmation object 4 are changed to confirmed, while data block 2 and data block 5 have not yet been read and confirmed by the processing device.
  • When the processing device queries the confirmation queue, it finds that the confirmation status of confirmation object 1 at the queue head is confirmed, deletes confirmation object 1, and continues with confirmation object 2 as the queue head; the confirmation status of confirmation object 2 is unconfirmed.
  • The processing device therefore saves in the persistence area the location information of data block 1, which corresponds to confirmation object 1.
  • Once data block 2 has been read and confirmed, the processing device deletes confirmation object 2 and continues with confirmation object 3 as the queue head. Because confirmation object 3 and confirmation object 4 are both confirmed, the processing device deletes confirmation object 3 and confirmation object 4 and continues with confirmation object 5 as the queue head.
  • Confirmation object 5 is unconfirmed and is retained.
  • The location information of data block 4, which corresponds to confirmation object 4, is saved in the persistence area.
  • Under the conventional approach, the persistence operation would have to be performed on data block 1, data block 2, data block 3, and data block 4, that is, four times. In this embodiment it only needs to be performed on data block 1 and data block 4, that is, twice, which saves two persistence operations.
  • The processing pressure on the persistence service is effectively reduced, or at least the time that data blocks wait to be persisted is shortened.
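  • The example above can be replayed in code to count the persistence writes. The function `run_example` is hypothetical, and the sketch assumes that the confirmation queue is checked after every confirmation and that the confirmations arrive in the order 1, 3, 4, 2; the patent only states which blocks are confirmed at each point.

```python
from collections import deque


def run_example(confirmation_order, block_count=5):
    """Replay the FIG. 2 example and return the persisted positions."""
    confirmed = {p: False for p in range(1, block_count + 1)}
    queue = deque(range(1, block_count + 1))  # confirmation queue, reading order
    writes = []                               # every write to the persistence area
    for position in confirmation_order:
        confirmed[position] = True
        last_deleted = None
        # Trim confirmed objects from the head of the confirmation queue.
        while queue and confirmed[queue[0]]:
            last_deleted = queue.popleft()
        if last_deleted is not None:
            writes.append(last_deleted)
    return writes


writes = run_example([1, 3, 4, 2])  # block 5 stays unconfirmed
print(writes)       # [1, 4]
print(len(writes))  # 2 persistence operations instead of 4
```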
  • When the processing device reads a data block, it generates a confirmation object, in one-to-one association with the data block, that records the confirmation status of the data block, and it establishes a confirmation queue including N confirmation objects according to the order in which the data blocks are read from the read data block queue.
  • Based on the confirmation statuses of the confirmation objects in the confirmation queue and the arrangement order of the confirmation queue, the processing device performs the persistence operation only on the last read-and-confirmed data block that precedes the earliest-read data block whose confirmation status is unconfirmed.
  • A persistence operation is thus not required for every data block that has been read and confirmed, which reduces the processing pressure on the persistence service while preserving data reading accuracy.
  • FIG. 3 is a schematic diagram of a method for continuing to read after a read interruption according to an embodiment of the present invention. If the processing device's reading of data from the read data block queue is interrupted, the method includes:
  • S301: The processing device acquires the location information of the most recently saved data block from the persistence area.
  • The location information of the most recently saved data block can be understood as the location information last saved to the persistence area before the read interruption, that is, the entry whose save time is closest to the time of the interruption.
  • S302: The processing device continues reading data blocks from the corresponding position in the read data block queue according to the location information of the most recently saved data block.
  • For example, data block 1 and data block 2 have been read and confirmed, data block 4 and data block 5 have also been read and confirmed, and data block 3 has not been read and confirmed.
  • At this point the reading of the processing device is interrupted, and the most recently saved location information in the persistence area should be that of data block 2, the data block read before data block 3.
  • When reading resumes, the processing device obtains the location information of data block 2 through S301.
  • According to that location information, it continues reading from the position corresponding to data block 2 in the read data block queue, so data block 1 and data block 2 are not read again, which reduces the number of data blocks that may be re-read when reading resumes after an interruption. It also ensures that unconfirmed data blocks are read again and not lost.
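  • Steps S301 and S302 can be sketched as follows. The function `resume_reading`, the `last_position` key, and the 1-based locations are assumptions for illustration. The sketch further assumes that reading continues with the block immediately after the saved location, since every block up to and including that location is known to be confirmed.

```python
def resume_reading(block_queue, persistence_area):
    """Return the blocks still to be read after an interruption.

    block_queue: list of blocks; a block's location is its 1-based index.
    persistence_area: dict holding the most recently saved location, if any.
    """
    last_position = persistence_area.get("last_position", 0)  # S301
    # S302: blocks up to last_position are known to be confirmed, so this
    # sketch assumes reading continues with the block right after it.
    return block_queue[last_position:]


blocks = ["block1", "block2", "block3", "block4", "block5"]
area = {"last_position": 2}          # saved before the interruption
print(resume_reading(blocks, area))  # ['block3', 'block4', 'block5']
```

  • Blocks 4 and 5 are read again even though they had been confirmed before the interruption; the scheme reduces repeated reads rather than eliminating them, in exchange for far fewer persistence operations.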
  • Before continuing to read, the processing device acquires from the persistence area the location information of the most recently saved data block, that is, the data block on which the persistence operation was last performed before the interruption.
  • The processing device continues reading data blocks from the corresponding position in the read data block queue according to the location information. Because the arrangement order and the reading order are the same, if the location information of data block a is the most recently saved in the persistence area, then data block a corresponds to confirmation object a, the confirmation object immediately preceding confirmation object b, which is currently at the head of the confirmation queue.
  • When the processing device resumes reading data blocks, reading starts directly from data block a, without re-reading the data blocks that precede data block a in the read data block queue.
  • From the confirmation statuses of the confirmation objects corresponding to those earlier data blocks, it can be determined that they have all been read and confirmed. This reduces the number of read-and-confirmed data blocks that may be re-read when reading resumes after an interruption, improving processing efficiency.
  • FIG. 4 is a structural diagram of a data processing apparatus according to an embodiment of the present invention, including:
  • The reading unit 401 is configured to read data blocks from the read data block queue.
  • The reading unit 401 may read data blocks from the read data block queue using parallel reading in order to increase the reading speed.
  • The generating unit 402 is configured to generate a confirmation object in one-to-one association with each data block, the confirmation object being used to record the confirmation status of its associated data block.
  • The confirmation object being used to record the confirmation status of its associated data block specifically includes:
  • the confirmation status of the confirmation object is consistent with the confirmation status of the data block corresponding to it.
  • The establishing unit 403 is configured to establish a confirmation queue including N confirmation objects, where the arrangement order of the N confirmation objects in the confirmation queue is the same as the reading order, the reading order is the order in which the N data blocks associated with the N confirmation objects are read from the read data block queue, and the confirmation object at the head of the confirmation queue is the earliest generated of the N confirmation objects.
  • Each time the reading unit 401 reads a data block from the read data block queue, the generating unit 402 generates one confirmation object, so data blocks and confirmation objects correspond one to one.
  • The establishing unit 403 adds the generated confirmation objects to the confirmation queue in sequence, so the order in which the reading unit 401 reads the data blocks is the same as the arrangement order of the confirmation objects in the confirmation queue.
  • The confirmation queue is saved in a cache.
  • The cache may be, for example, memory.
  • The processing device first reads data block 1 from the read data block queue and generates the corresponding confirmation object 1; it then reads data block 2 and generates the corresponding confirmation object 2; and so on, until data block N is read and the corresponding confirmation object N is generated.
  • The order in which the processing device reads the data blocks is consistent with the arrangement order of the corresponding confirmation objects in the confirmation queue.
  • The direction of the arrow in the data block queue of FIG. 2 is the reading order. The leftmost data block was read earliest, read times become later from left to right, and the rightmost data block in the queue is the one most recently read by the processing device.
  • The direction of the arrow in the confirmation queue of FIG. 2 is the arrangement order.
  • The leftmost confirmation object joined the confirmation queue earliest.
  • Confirmation objects further to the right joined the queue later, and the rightmost confirmation object is the one most recently added to the confirmation queue. That is, the processing device reads data block 1 before data block 2, and confirmation object 1, which corresponds to data block 1, also precedes confirmation object 2, which corresponds to data block 2, in the confirmation queue.
  • The modifying unit 404 is configured to, if a data block in the read data block queue is read and confirmed, change the confirmation state of the confirmation object associated with that data block from unconfirmed to confirmed.
  • For example, the confirmation state of a confirmation object changes as the read-confirmation state of its data block changes.
  • When the reading unit 401 has just read a data block, the data block has not yet been read and confirmed, so the confirmation state of its confirmation object is also unconfirmed.
  • Once the data block has been read and confirmed, the confirmation state of its confirmation object changes to confirmed.
  • The persistence unit 405 is configured to perform a persistence operation on read-and-confirmed data blocks according to the confirmation states of the confirmation objects in the confirmation queue.
  • For example, the position information can be used to locate the data block again later, so that when reading resumes after an interruption, sequential reading can restart from that data block in the read data block queue.
  • Optionally, the persistence unit 405 is specifically configured to check the confirmation state of the confirmation object at the queue head; if the state is confirmed, to delete that confirmation object and, following the arrangement order, place the next confirmation object in the confirmation queue at the queue head; and to check the confirmation state of the confirmation object at the queue head again. This repeats until the confirmation object at the queue head is unconfirmed, at which point the persistence unit 405 saves to the persistent area the position information of the data block associated with the deleted confirmation object that immediately preceded the current queue-head object in the arrangement order.
  • The position information is the position of the corresponding data block in the read data block queue.
  • For example, the persistence unit 405 does not need to perform the persistence operation on every read-and-confirmed data block.
  • When it examines the confirmation queue, the persistence unit 405 uses the confirmation states of the confirmation objects: if the confirmation object at the queue head is confirmed, it is deleted and the next confirmation object in the arrangement order is placed at the queue head.
  • For example, as shown in FIG. 2, data blocks 1, 3, and 4 have been read and confirmed, so the confirmation states of confirmation objects 1, 3, and 4 change to confirmed, while data blocks 2 and 5 have not yet been read and confirmed by the processing device.
  • The persistence unit 405 finds that confirmation object 1 at the queue head is confirmed, deletes it, and continues the check with confirmation object 2 as the queue head. Because confirmation object 2 is unconfirmed, the persistence unit 405 saves the position information of data block 1, which corresponds to confirmation object 1, to the persistent area.
  • Next, if data block 2 is also read and confirmed by the processing device, the confirmation state of confirmation object 2 changes to confirmed; the persistence unit 405 deletes confirmation object 2 and continues the check with confirmation object 3 as the queue head.
  • Because confirmation objects 3 and 4 are both confirmed, the persistence unit 405 deletes them and continues the check with confirmation object 5 as the queue head. Confirmation object 5 is unconfirmed, so it is retained, and the position information of data block 4, which corresponds to confirmation object 4, is saved to the persistent area.
  • In the same scenario, the traditional persistence method would perform the persistence operation on data blocks 1, 2, 3, and 4, that is, four times, whereas this embodiment performs it only on data blocks 1 and 4, that is, twice, saving two persistence operations.
  • This effectively reduces the processing load on the persistence service and at least shortens the time that data blocks needing persistence spend waiting to be persisted.
  • It can be seen that, when reading data blocks, the processing device generates confirmation objects that are associated one-to-one with the data blocks and record their confirmation states, and establishes a confirmation queue of N confirmation objects in the order in which the data blocks are read from the read data block queue.
  • Based on the confirmation states of the confirmation objects and the order of the confirmation queue, the processing device persists only the read-and-confirmed data block immediately preceding the earliest-read data block whose confirmation state is unconfirmed, rather than persisting every read-and-confirmed data block.
  • This reduces the processing load on the persistence service while preserving data reading accuracy.
  • Building on the embodiment corresponding to FIG. 4, FIG. 5 is a structural diagram of a data processing apparatus according to an embodiment of the present invention. If a read interruption occurs while the reading unit 401 is reading data blocks from the read data block queue, the apparatus further includes:
  • The obtaining unit 501, configured to obtain the position information of the most recently saved data block from the persistent area.
  • The reading unit 401 is further configured to continue reading data blocks from the corresponding position in the read data block queue according to the position information of the most recently saved data block.
  • For example, as shown in FIG. 2, data blocks 1 and 2 have been read and confirmed, data blocks 4 and 5 have also been read and confirmed, and data block 3 has not yet been read and confirmed.
  • If the reading of the reading unit 401 is interrupted at this point, the position information most recently saved in the persistent area should be that of data block 2, the block read just before data block 3.
  • After the interruption, when the reading unit 401 is ready to resume reading from the read data block queue, it uses the position information of data block 2 obtained by the obtaining unit 501 and continues reading from the position corresponding to data block 2 in the read data block queue.
  • The reading unit 401 therefore does not need to re-read and re-confirm data blocks 1 and 2, which reduces the number of data blocks that may be read again when reading resumes after an interruption. It also guarantees that unconfirmed data blocks will be read again, so none are lost.
  • As the above embodiments show, if a read interruption occurs while the processing device is reading data blocks, then before resuming, the processing device obtains from the persistent area the most recently saved position information, that is, the position of the last data block persisted before the interruption, and continues reading data blocks from the corresponding position in the read data block queue.
  • Because the arrangement order is the same as the reading order, if the position information most recently saved in the persistent area is that of data block a, then data block a corresponds to confirmation object a, the object immediately preceding confirmation object b, which is currently at the queue head of the confirmation queue.
  • When the processing device resumes reading, it starts directly from data block a and does not re-read the data blocks that precede data block a in the read data block queue, because the confirmation states of their confirmation objects show that those data blocks have all been read and confirmed. This reduces the number of read-and-confirmed data blocks that might be re-read when the processing device resumes after an interruption, and improves processing efficiency.
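
The apparatus described above can be pictured as a single class whose methods stand in for units 401 to 405 and 501. The following Python sketch is only an illustration under stated assumptions: the class name, the method names, the dictionary-based confirmation objects, and the list used as the persistent area are all hypothetical, and resuming at the block right after the saved position is one possible reading of "continue from the corresponding position".

```python
from collections import deque


class DataProcessingApparatus:
    """Toy model of units 401-405 and 501; all names here are hypothetical."""

    def __init__(self, block_queue, persistent_area):
        self.block_queue = block_queue          # read data block queue (positions)
        self.persistent_area = persistent_area  # list standing in for durable storage
        self.confirmation_queue = deque()       # kept in memory (the cache)
        self.next_index = 0

    def read_next(self):
        """Units 401-403: read a block, create its object, append it in read order."""
        position = self.block_queue[self.next_index]
        self.next_index += 1
        entry = {"position": position, "confirmed": False}
        self.confirmation_queue.append(entry)
        return entry

    def confirm(self, entry):
        """Unit 404: change the associated confirmation object to confirmed."""
        entry["confirmed"] = True

    def persist(self):
        """Unit 405: drop confirmed head objects, save the last dropped position."""
        last = None
        while self.confirmation_queue and self.confirmation_queue[0]["confirmed"]:
            last = self.confirmation_queue.popleft()
        if last is not None:
            self.persistent_area.append(last["position"])

    def resume(self):
        """Units 501 and 401: restart reading after the most recently saved position."""
        self.confirmation_queue.clear()  # the in-memory queue is lost on interruption
        if self.persistent_area:
            # Assumption: resume with the block right after the saved position.
            self.next_index = self.block_queue.index(self.persistent_area[-1]) + 1
        else:
            self.next_index = 0


area = []
device = DataProcessingApparatus([1, 2, 3, 4, 5], area)
entries = [device.read_next() for _ in range(5)]
for i in (0, 1, 3, 4):           # blocks 1, 2, 4 and 5 are confirmed; block 3 is not
    device.confirm(entries[i])
device.persist()
print(area)                      # [2]
device.resume()                  # simulated interruption and restart
print(device.read_next()["position"])  # 3
```

In this sketch blocks 4 and 5 are read again after the restart even though they were confirmed earlier, which matches the text: only blocks up to the saved position are guaranteed not to be repeated.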

Abstract

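Embodiments of the present invention disclose a data processing method and apparatus. A processing device reads data blocks from a read data block queue; generates, for each data block, a confirmation object that is associated one-to-one with the data block and records its confirmation state; establishes a confirmation queue of N confirmation objects; and performs a persistence operation on read-and-confirmed data blocks according to the confirmation states of the confirmation objects in the confirmation queue. Based on those confirmation states and the order of the confirmation queue, the processing device persists only the read-and-confirmed data block immediately preceding the earliest-read data block whose confirmation state is unconfirmed, rather than persisting every read-and-confirmed data block. This reduces the processing load on the persistence service while preserving data reading accuracy.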

Description

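Data processing method and apparatus
This application claims priority to Chinese patent application No. 201510303238.6, filed on June 5, 2015 and entitled "Data processing method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of data processing, and in particular to a data processing method and apparatus.
Background
In Internet data processing, a processing device often needs to read data blocks from a database or other data storage unit and to process, or in other words confirm, the data blocks it has read. The data to be read in such a storage unit generally takes the form of one or more data block queues, from which the reading device reads and confirms data blocks in sequence.
In big data scenarios, the volume of data in a data block queue is very large, and serial (single-threaded) reading of data blocks is too inefficient. Reading devices therefore generally read data blocks in parallel to improve reading efficiency. With parallel reading, the order in which data blocks are read from the queue differs from the order in which they are confirmed. While data blocks are being read, reading may be interrupted, for example because the processing device restarts or reaches a processing capacity limit. When the processing device resumes reading after the interruption, the difference between reading order and confirmation order means it cannot tell which data blocks have been confirmed, which have not yet been read, and which have been read but not yet confirmed. To ensure that resuming after an interruption neither re-reads data blocks that were already read nor skips data blocks that were never read, the processing device usually performs a persistence operation on every data block that has been read and confirmed, saving it to a persistent area. Persistence here can be understood as converting data in memory and saving it to a database, a file, or a similar location. By persisting the read-and-confirmed data blocks, the processing device marks them as already read and confirmed. In other words, because of the persistence operation, the processing device does not read already persisted data blocks when it resumes reading after an interruption.
The problem, however, is that in big data scenarios the processing device reads and confirms data blocks in parallel far faster than they can be persisted, so a massive number of read-and-confirmed data blocks wait for the persistence service, which puts the service under heavy processing load.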
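Summary of the Invention
To solve the above technical problem, the present invention provides a data processing method and apparatus with which, without persisting every read-and-confirmed data block, data blocks in the read data block queue that have not been read and confirmed are still not skipped when reading resumes after an interruption. This reduces the processing load on the persistence service while preserving data reading accuracy.
Embodiments of the present invention disclose the following technical solutions:
A data processing method, the method comprising:
a processing device reads data blocks from a read data block queue;
the processing device generates a confirmation object that is associated one-to-one with the data block, the confirmation object being used to record the confirmation state of the associated data block;
the processing device establishes a confirmation queue comprising N confirmation objects, where the order of the N confirmation objects in the confirmation queue is the same as the reading order, the reading order being the order in which the N data blocks associated with the N confirmation objects were read from the read data block queue, and the confirmation object at the queue head of the confirmation queue is the one generated first among the N confirmation objects;
if a data block in the read data block queue is read and confirmed by the processing device, the processing device changes the confirmation state of the confirmation object associated with that data block from unconfirmed to confirmed;
the processing device performs a persistence operation on read-and-confirmed data blocks according to the confirmation states of the confirmation objects in the confirmation queue.
Preferably, the processing device performing a persistence operation on read-and-confirmed data blocks according to the confirmation states of the confirmation objects in the confirmation queue specifically comprises:
the processing device checks the confirmation state of the confirmation object at the queue head; if the state is confirmed, it deletes that confirmation object and, following the arrangement order, places the next confirmation object in the confirmation queue at the queue head, then checks the confirmation state of the confirmation object at the queue head again; this repeats until the confirmation state of the confirmation object at the queue head is unconfirmed, at which point the processing device saves to a persistent area the position information of the data block associated with the deleted confirmation object that immediately preceded the current queue-head object in the arrangement order, the position information being the position of that data block in the read data block queue.
Preferably, if a read interruption occurs while the processing device is reading data blocks from the read data block queue, the method further comprises:
the processing device obtains the position information of the most recently saved data block from the persistent area;
the processing device continues reading data blocks from the corresponding position in the read data block queue according to the position information of the most recently saved data block.
Preferably,
the processing device reads data blocks from the read data block queue in parallel.
Preferably,
the confirmation queue is saved in a cache.
Preferably, the confirmation object being used to record the confirmation state of the associated data block specifically comprises:
if the data block associated with the confirmation object has not been read and confirmed, the confirmation state of the confirmation object is unconfirmed;
if the data block associated with the confirmation object has been read and confirmed, the confirmation state of the confirmation object is confirmed.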
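A data processing apparatus, comprising:
a reading unit, configured to read data blocks from a read data block queue;
a generating unit, configured to generate a confirmation object that is associated one-to-one with the data block, the confirmation object being used to record the confirmation state of the associated data block;
an establishing unit, configured to establish a confirmation queue comprising N confirmation objects, where the order of the N confirmation objects in the confirmation queue is the same as the reading order, the reading order being the order in which the N data blocks associated with the N confirmation objects were read from the read data block queue, and the confirmation object at the queue head of the confirmation queue is the one generated first among the N confirmation objects;
a modifying unit, configured to, if a data block in the read data block queue is read and confirmed, change the confirmation state of the confirmation object associated with that data block from unconfirmed to confirmed;
a persistence unit, configured to perform a persistence operation on read-and-confirmed data blocks according to the confirmation states of the confirmation objects in the confirmation queue.
Preferably,
the persistence unit is specifically configured to check the confirmation state of the confirmation object at the queue head; if the state is confirmed, to delete that confirmation object and, following the arrangement order, place the next confirmation object in the confirmation queue at the queue head, then check the confirmation state of the confirmation object at the queue head again; and, once the confirmation state of the confirmation object at the queue head is unconfirmed, to save to a persistent area the position information of the data block associated with the deleted confirmation object that immediately preceded the current queue-head object in the arrangement order, the position information being the position of that data block in the read data block queue.
Preferably, if a read interruption occurs while the reading unit is reading data blocks from the read data block queue, the apparatus further comprises:
an obtaining unit, configured to obtain the position information of the most recently saved data block from the persistent area;
the reading unit is further configured to continue reading data blocks from the corresponding position in the read data block queue according to the position information of the most recently saved data block.
Preferably,
the reading unit reads data blocks from the read data block queue in parallel.
Preferably,
the confirmation queue is saved in a cache.
Preferably, the confirmation object being used to record the confirmation state of the associated data block specifically comprises:
if the data block associated with the confirmation object has not been read and confirmed, the confirmation state of the confirmation object is unconfirmed;
if the data block associated with the confirmation object has been read and confirmed, the confirmation state of the confirmation object is confirmed.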
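As the above technical solutions show, when reading data blocks the processing device generates confirmation objects that are associated one-to-one with the data blocks and record their confirmation states, and establishes a confirmation queue of N confirmation objects in the order in which the data blocks are read from the read data block queue. Based on the confirmation states of the confirmation objects and the order of the confirmation queue, the processing device persists only the read-and-confirmed data block immediately preceding the earliest-read data block whose confirmation state is unconfirmed, rather than persisting every read-and-confirmed data block. This reduces the processing load on the persistence service while preserving data reading accuracy.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of establishing a confirmation queue according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a method for resuming reading after a read interruption according to an embodiment of the present invention;
FIG. 4 is a structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a structural diagram of a data processing apparatus according to an embodiment of the present invention.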
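Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In Internet data processing, a processing device often needs to read data blocks from a database or other data storage unit and to process, or in other words confirm, the data blocks it has read. In big data scenarios, the volume of data in a data block queue is very large, and serial (single-threaded) reading of data blocks is too inefficient. Reading devices therefore generally read data blocks in parallel to improve reading efficiency. With parallel reading, the order in which data blocks are read from the queue differs from the order in which they are confirmed. While data blocks are being read, reading may be interrupted, for example because the processing device restarts or reaches a processing capacity limit. When the processing device resumes reading after the interruption, the difference between reading order and confirmation order means it cannot tell which data blocks have been confirmed, which have not yet been read, and which have been read but not yet confirmed. To ensure that resuming after an interruption neither re-reads data blocks that were already read nor skips data blocks that were never read, the processing device usually performs a persistence operation on every data block that has been read and confirmed, saving it to a persistent area. Persistence here can be understood as converting data blocks in memory and saving them to a database, a file, or a similar location. By persisting the read-and-confirmed data blocks, the processing device marks them as already read and confirmed. In other words, because of the persistence operation, the processing device does not read already persisted data blocks when it resumes reading after an interruption.
The problem, however, is that in big data scenarios the processing device reads and confirms data blocks in parallel far faster than they can be persisted, so a massive number of read-and-confirmed data blocks wait for the persistence service, which puts the service under heavy processing load.
For this reason, embodiments of the present invention provide a data processing method and apparatus. When reading data blocks, the processing device generates confirmation objects that are associated one-to-one with the data blocks and record their confirmation states, and establishes a confirmation queue of N confirmation objects in the order in which the data blocks are read from the read data block queue. Based on the confirmation states of the confirmation objects and the order of the confirmation queue, the processing device persists only the read-and-confirmed data block immediately preceding the earliest-read data block whose confirmation state is unconfirmed, rather than persisting every read-and-confirmed data block. This reduces the processing load on the persistence service while preserving data reading accuracy.
With the traditional approach of persisting every read-and-confirmed data block, persistence cannot keep up with the read-and-confirm rate of parallel reading, so a massive number of read-and-confirmed data blocks sit waiting to be persisted, that is, they remain unpersisted. If reading of data blocks is interrupted at this point, then when the processing device resumes reading from the read data block queue it has to re-read this massive number of data blocks that were already read and confirmed, which wastes system resources and lowers the processing efficiency of the processing device.
In the embodiments of the present invention, if a read interruption occurs while the processing device is reading data blocks, then before resuming, the processing device obtains from the persistent area the most recently saved position information, that is, the position of the last data block persisted before the interruption. The processing device then continues reading data blocks from the corresponding position in the read data block queue. Because the arrangement order is the same as the reading order, if the position information most recently saved in the persistent area is that of data block a, then data block a corresponds to confirmation object a, the object immediately preceding confirmation object b, which is currently at the queue head of the confirmation queue. When the processing device resumes reading, it starts directly from data block a and does not re-read the data blocks that precede data block a in the read data block queue, because the confirmation states of their confirmation objects show that those data blocks have all been read and confirmed. This reduces the number of read-and-confirmed data blocks that might be re-read when the processing device resumes after an interruption, and improves processing efficiency.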
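Embodiment 1
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention. The method comprises:
S101: A processing device reads data blocks from a read data block queue.
S102: The processing device generates a confirmation object that is associated one-to-one with the data block, the confirmation object being used to record the confirmation state of the associated data block.
For example, optionally, the processing device may read data blocks from the read data block queue in parallel.
Optionally, the confirmation object being used to record the confirmation state of the associated data block specifically comprises:
if the data block associated with the confirmation object has not been read and confirmed, the confirmation state of the confirmation object is unconfirmed;
if the data block associated with the confirmation object has been read and confirmed, the confirmation state of the confirmation object is confirmed.
That is, the confirmation state of the confirmation object matches the confirmation state of its corresponding data block.
S103: The processing device establishes a confirmation queue comprising N confirmation objects, where the order of the N confirmation objects in the confirmation queue is the same as the reading order, the reading order being the order in which the N data blocks associated with the N confirmation objects were read from the read data block queue, and the confirmation object at the queue head of the confirmation queue is the one generated first among the N confirmation objects.
For example, each time the processing device reads a data block from the read data block queue, it generates a confirmation object; data blocks and confirmation objects are in one-to-one correspondence. As data blocks are read, the processing device appends the generated confirmation objects to the confirmation queue in sequence; that is, the order in which the processing device reads the data blocks is the same as the order of the confirmation objects in the confirmation queue. The confirmation queue is saved in a cache. The cache can be, for example, memory.
FIG. 2 is a schematic diagram of establishing a confirmation queue according to an embodiment of the present invention. In FIG. 2, the processing device first reads data block 1 from the read data block queue and generates the corresponding confirmation object 1, then reads data block 2 and generates the corresponding confirmation object 2, and so on, until data block N is read and the corresponding confirmation object N is generated. The order in which the processing device reads the data blocks is therefore consistent with the order of the corresponding confirmation objects in the confirmation queue. The arrow in the data block queue of FIG. 2 indicates the reading order: the leftmost data block was read earliest, read times become later from left to right, and the rightmost data block is the one most recently read by the processing device. The arrow in the confirmation queue of FIG. 2 indicates the arrangement order: the leftmost confirmation object joined the confirmation queue earliest, objects further to the right joined later, and the rightmost confirmation object joined most recently. That is, the processing device reads data block 1 before data block 2, and confirmation object 1, which corresponds to data block 1, also precedes confirmation object 2, which corresponds to data block 2, in the confirmation queue.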
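The construction of the confirmation queue in S101 to S103 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `ConfirmationObject` class, its field names, and `build_confirmation_queue` are hypothetical, and data blocks are represented only by their positions in the read data block queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ConfirmationObject:
    """Records the confirmation state of exactly one data block (hypothetical name)."""
    position: int            # position of the block in the read data block queue
    confirmed: bool = False  # unconfirmed until the block has been read and confirmed


def build_confirmation_queue(block_positions):
    """Create one confirmation object per block, in the order the blocks are read."""
    queue = deque()
    for position in block_positions:
        queue.append(ConfirmationObject(position))  # the newest object joins the tail
    return queue


confirmation_queue = build_confirmation_queue([1, 2, 3, 4, 5])
print([obj.position for obj in confirmation_queue])  # [1, 2, 3, 4, 5]
print(confirmation_queue[0].position)                # queue head: 1
```

A double-ended queue is used here because the scheme only ever appends at the tail and removes from the head.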
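S104: If a data block in the read data block queue is read and confirmed by the processing device, the processing device changes the confirmation state of the confirmation object associated with that data block from unconfirmed to confirmed.
For example, the confirmation state of a confirmation object changes as the read-confirmation state of its data block changes. When the processing device has just read a data block, the data block has not yet been read and confirmed, so the confirmation state of its confirmation object is also unconfirmed. Once the data block has been read and confirmed by the processing device, the confirmation state of its confirmation object changes to confirmed.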
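The state change in S104 can be sketched as below. The lookup dictionary keyed by block position and the function name `mark_confirmed` are assumptions made for this example; the text only requires that each data block can reach its own confirmation object.

```python
from dataclasses import dataclass


@dataclass
class ConfirmationObject:
    position: int
    confirmed: bool = False


# One confirmation object per block read so far, keyed by block position so that
# a confirmation arriving in any order can find its object.
objects_by_position = {p: ConfirmationObject(p) for p in [1, 2, 3, 4, 5]}


def mark_confirmed(position):
    """Change the associated confirmation object from unconfirmed to confirmed."""
    objects_by_position[position].confirmed = True


# With parallel reading, confirmations may arrive out of reading order.
for position in [3, 1, 4]:
    mark_confirmed(position)

states = {p: obj.confirmed for p, obj in objects_by_position.items()}
print(states)  # {1: True, 2: False, 3: True, 4: True, 5: False}
```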
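S105: The processing device performs a persistence operation on read-and-confirmed data blocks according to the confirmation states of the confirmation objects in the confirmation queue.
For example, the position information can be used to locate the data block again later, so that when reading resumes after an interruption, sequential reading can restart from that data block in the read data block queue.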
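The text leaves the form of the persistent area open (a database, a file, or similar). As one hedged possibility, the sketch below keeps the saved position in a small JSON file; the file layout and the function names `save_position` and `load_position` are assumptions made for this example, not something the patent specifies.

```python
import json
import os
import tempfile


def save_position(path, position):
    """Write the position to a temporary file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump({"position": position}, tmp)
    os.replace(tmp_path, path)  # a reader sees either the old or the new position


def load_position(path):
    """Return the most recently saved position, or None if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)["position"]
    except FileNotFoundError:
        return None


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "checkpoint.json")
    print(load_position(checkpoint))  # None
    save_position(checkpoint, 4)
    print(load_position(checkpoint))  # 4
```

Only the latest position needs to survive, so each save simply replaces the previous one.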
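Optionally, an embodiment of the present invention provides a specific way of performing the persistence operation on read-and-confirmed data blocks according to the confirmation states of the confirmation objects in the confirmation queue, which specifically comprises:
the processing device checks the confirmation state of the confirmation object at the queue head; if the state is confirmed, it deletes that confirmation object and, following the arrangement order, places the next confirmation object in the confirmation queue at the queue head, then checks the confirmation state of the confirmation object at the queue head again; this repeats until the confirmation state of the confirmation object at the queue head is unconfirmed, at which point the processing device saves to a persistent area the position information of the data block associated with the deleted confirmation object that immediately preceded the current queue-head object in the arrangement order, the position information being the position of that data block in the read data block queue.
For example, the processing device does not need to perform the persistence operation on every read-and-confirmed data block. When it examines the confirmation queue, the processing device uses the confirmation states of the confirmation objects: if the confirmation object at the queue head is confirmed, it is deleted and the next confirmation object in the arrangement order is placed at the queue head. For example, as shown in FIG. 2, data blocks 1, 3, and 4 have been read and confirmed, so the confirmation states of confirmation objects 1, 3, and 4 change to confirmed, while data blocks 2 and 5 have not yet been read and confirmed by the processing device. When the processing device examines the confirmation queue, it finds that confirmation object 1 at the queue head is confirmed, deletes it, and continues the check with confirmation object 2 as the queue head. Because confirmation object 2 is unconfirmed, the processing device saves the position information of data block 1, which corresponds to confirmation object 1, to the persistent area. Next, if data block 2 is also read and confirmed by the processing device, the confirmation state of confirmation object 2 changes to confirmed; the processing device deletes confirmation object 2 and continues the check with confirmation object 3 as the queue head. Because confirmation objects 3 and 4 are both confirmed, the processing device deletes them and continues the check with confirmation object 5 as the queue head. Confirmation object 5 is unconfirmed, so it is retained, and the position information of data block 4, which corresponds to confirmation object 4, is saved to the persistent area. In the same scenario, the traditional persistence method would perform the persistence operation on data blocks 1, 2, 3, and 4, that is, four times, whereas this embodiment performs it only on data blocks 1 and 4, that is, twice, saving two persistence operations. This effectively reduces the processing load on the persistence service and at least shortens the time that data blocks needing persistence spend waiting to be persisted.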
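The head-of-queue procedure and the worked example with data blocks 1 to 5 can be traced with the following Python sketch. The names are hypothetical, a plain list stands in for the persistent area, and persisting when the queue becomes completely empty is an assumption of this sketch (the text only describes stopping at an unconfirmed head).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ConfirmationObject:
    position: int
    confirmed: bool = False


def drain_confirmed_head(queue, persisted):
    """Remove confirmed objects from the head; persist only the last one removed."""
    last_removed = None
    while queue and queue[0].confirmed:
        last_removed = queue.popleft()
    if last_removed is not None:
        persisted.append(last_removed.position)  # stands in for the persistent area


queue = deque(ConfirmationObject(p) for p in [1, 2, 3, 4, 5])
persisted = []

for p in (1, 3, 4):              # blocks 1, 3 and 4 are read and confirmed
    queue[p - 1].confirmed = True
drain_confirmed_head(queue, persisted)
print(persisted)                 # [1]

queue[0].confirmed = True        # block 2, now at the queue head, is confirmed
drain_confirmed_head(queue, persisted)
print(persisted)                 # [1, 4]
print([obj.position for obj in queue])  # [5]
```

Two persistence operations are recorded for four confirmed blocks, matching the count given in the example.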
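It can be seen that, when reading data blocks, the processing device generates confirmation objects that are associated one-to-one with the data blocks and record their confirmation states, and establishes a confirmation queue of N confirmation objects in the order in which the data blocks are read from the read data block queue. Based on the confirmation states of the confirmation objects and the order of the confirmation queue, the processing device persists only the read-and-confirmed data block immediately preceding the earliest-read data block whose confirmation state is unconfirmed, rather than persisting every read-and-confirmed data block. This reduces the processing load on the persistence service while preserving data reading accuracy.
Embodiment 2
With the traditional approach of persisting every read-and-confirmed data block, persistence cannot keep up with the read-and-confirm rate of parallel reading, so a massive number of read-and-confirmed data blocks sit waiting to be persisted, that is, they remain unpersisted. If reading of data blocks is interrupted at this point, then when the processing device resumes reading from the read data block queue it has to re-read this massive number of data blocks that were already read and confirmed, which wastes system resources and lowers the processing efficiency of the processing device.
Building on the embodiment corresponding to FIG. 1, FIG. 3 is a schematic diagram of a method for resuming reading after a read interruption according to an embodiment of the present invention. If a read interruption occurs while the processing device is reading data blocks from the read data block queue, the method comprises:
S301: The processing device obtains the position information of the most recently saved data block from the persistent area.
For example, the position information of the most recently saved data block can be understood as the last position saved to the persistent area before the read interruption, or the one whose save time is closest to the time of the interruption.
S302: The processing device continues reading data blocks from the corresponding position in the read data block queue according to the position information of the most recently saved data block.
For example, as shown in FIG. 2, data blocks 1 and 2 have been read and confirmed, data blocks 4 and 5 have also been read and confirmed, and data block 3 has not yet been read and confirmed. If the processing device's reading is interrupted at this point, the position information most recently saved in the persistent area should be that of data block 2, the block read just before data block 3.
After the interruption, when the processing device is ready to resume reading from the read data block queue, it uses the position information of data block 2 obtained in S301 and continues reading from the position corresponding to data block 2 in the read data block queue. The processing device therefore does not need to re-read and re-confirm data blocks 1 and 2, which reduces the number of data blocks that may be read again when reading resumes after an interruption. It also guarantees that unconfirmed data blocks will be read again, so none are lost.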
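The resume step in S301 and S302 can be illustrated with the short Python sketch below. The function name is hypothetical, and resuming with the block right after the saved position is an assumption: the text only says that reading continues from the position corresponding to the saved data block.

```python
def blocks_to_read_after_restart(block_queue, saved_position):
    """Return the blocks to read after an interruption.

    saved_position is the last position stored in the persistent area, or None
    if nothing was stored. Reading resumes with the block right after it
    (an assumption; see the note above).
    """
    if saved_position is None:
        return list(block_queue)
    resume_index = block_queue.index(saved_position) + 1
    return block_queue[resume_index:]


block_queue = [1, 2, 3, 4, 5, 6]
# Blocks 1, 2, 4 and 5 were confirmed; block 3 was not, so position 2 was saved.
print(blocks_to_read_after_restart(block_queue, 2))  # [3, 4, 5, 6]
```

Blocks 4 and 5 are read again even though they were confirmed before the interruption, while the unconfirmed block 3 cannot be skipped.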
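As the above embodiment shows, if a read interruption occurs while the processing device is reading data blocks, then before resuming, the processing device obtains from the persistent area the most recently saved position information, that is, the position of the last data block persisted before the interruption, and continues reading data blocks from the corresponding position in the read data block queue. Because the arrangement order is the same as the reading order, if the position information most recently saved in the persistent area is that of data block a, then data block a corresponds to confirmation object a, the object immediately preceding confirmation object b, which is currently at the queue head of the confirmation queue. When the processing device resumes reading, it starts directly from data block a and does not re-read the data blocks that precede data block a in the read data block queue, because the confirmation states of their confirmation objects show that those data blocks have all been read and confirmed. This reduces the number of read-and-confirmed data blocks that might be re-read when the processing device resumes after an interruption, and improves processing efficiency.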
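Embodiment 3
FIG. 4 is a structural diagram of a data processing apparatus according to an embodiment of the present invention. The apparatus comprises:
A reading unit 401, configured to read data blocks from a read data block queue.
For example, optionally, the reading unit 401 reads data blocks from the read data block queue in parallel, so as to increase reading speed.
A generating unit 402, configured to generate a confirmation object that is associated one-to-one with the data block, the confirmation object being used to record the confirmation state of the associated data block.
For example, optionally, the confirmation object being used to record the confirmation state of the associated data block specifically comprises:
if the data block associated with the confirmation object has not been read and confirmed, the confirmation state of the confirmation object is unconfirmed;
if the data block associated with the confirmation object has been read and confirmed, the confirmation state of the confirmation object is confirmed.
That is, the confirmation state of the confirmation object matches the confirmation state of its corresponding data block.
An establishing unit 403, configured to establish a confirmation queue comprising N confirmation objects, where the order of the N confirmation objects in the confirmation queue is the same as the reading order, the reading order being the order in which the N data blocks associated with the N confirmation objects were read from the read data block queue, and the confirmation object at the queue head of the confirmation queue is the one generated first among the N confirmation objects.
For example, each time the reading unit 401 reads a data block from the read data block queue, the generating unit 402 generates a confirmation object; data blocks and confirmation objects are in one-to-one correspondence. As data blocks are read, the establishing unit 403 appends the generated confirmation objects to the confirmation queue in sequence; that is, the order in which the reading unit 401 reads the data blocks is the same as the order of the confirmation objects in the confirmation queue. The confirmation queue is saved in a cache. The cache can be, for example, memory.
In FIG. 2, the processing device first reads data block 1 from the read data block queue and generates the corresponding confirmation object 1, then reads data block 2 and generates the corresponding confirmation object 2, and so on, until data block N is read and the corresponding confirmation object N is generated. The order in which the processing device reads the data blocks is therefore consistent with the order of the corresponding confirmation objects in the confirmation queue. The arrow in the data block queue of FIG. 2 indicates the reading order: the leftmost data block was read earliest, read times become later from left to right, and the rightmost data block is the one most recently read by the processing device. The arrow in the confirmation queue of FIG. 2 indicates the arrangement order: the leftmost confirmation object joined the confirmation queue earliest, objects further to the right joined later, and the rightmost confirmation object joined most recently. That is, the processing device reads data block 1 before data block 2, and confirmation object 1, which corresponds to data block 1, also precedes confirmation object 2, which corresponds to data block 2, in the confirmation queue.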
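A modifying unit 404, configured to, if a data block in the read data block queue is read and confirmed, change the confirmation state of the confirmation object associated with that data block from unconfirmed to confirmed.
For example, the confirmation state of a confirmation object changes as the read-confirmation state of its data block changes. When the reading unit 401 has just read a data block, the data block has not yet been read and confirmed, so the confirmation state of its confirmation object is also unconfirmed. Once the data block has been read and confirmed, the confirmation state of its confirmation object changes to confirmed.
A persistence unit 405, configured to perform a persistence operation on read-and-confirmed data blocks according to the confirmation states of the confirmation objects in the confirmation queue.
For example, the position information can be used to locate the data block again later, so that when reading resumes after an interruption, sequential reading can restart from that data block in the read data block queue.
Optionally, the persistence unit 405 is specifically configured to check the confirmation state of the confirmation object at the queue head; if the state is confirmed, to delete that confirmation object and, following the arrangement order, place the next confirmation object in the confirmation queue at the queue head, then check the confirmation state of the confirmation object at the queue head again; and, once the confirmation state of the confirmation object at the queue head is unconfirmed, to save to a persistent area the position information of the data block associated with the deleted confirmation object that immediately preceded the current queue-head object in the arrangement order, the position information being the position of that data block in the read data block queue.
For example, the persistence unit 405 does not need to perform the persistence operation on every read-and-confirmed data block. When it examines the confirmation queue, the persistence unit 405 uses the confirmation states of the confirmation objects: if the confirmation object at the queue head is confirmed, it is deleted and the next confirmation object in the arrangement order is placed at the queue head. For example, as shown in FIG. 2, data blocks 1, 3, and 4 have been read and confirmed, so the confirmation states of confirmation objects 1, 3, and 4 change to confirmed, while data blocks 2 and 5 have not yet been read and confirmed by the processing device. When the persistence unit 405 examines the confirmation queue, it finds that confirmation object 1 at the queue head is confirmed, deletes it, and continues the check with confirmation object 2 as the queue head. Because confirmation object 2 is unconfirmed, the persistence unit 405 saves the position information of data block 1, which corresponds to confirmation object 1, to the persistent area. Next, if data block 2 is also read and confirmed by the processing device, the confirmation state of confirmation object 2 changes to confirmed; the persistence unit 405 deletes confirmation object 2 and continues the check with confirmation object 3 as the queue head. Because confirmation objects 3 and 4 are both confirmed, the persistence unit 405 deletes them and continues the check with confirmation object 5 as the queue head. Confirmation object 5 is unconfirmed, so it is retained, and the position information of data block 4, which corresponds to confirmation object 4, is saved to the persistent area. In the same scenario, the traditional persistence method would perform the persistence operation on data blocks 1, 2, 3, and 4, that is, four times, whereas this embodiment performs it only on data blocks 1 and 4, that is, twice, saving two persistence operations. This effectively reduces the processing load on the persistence service and at least shortens the time that data blocks needing persistence spend waiting to be persisted.
It can be seen that, when reading data blocks, the processing device generates confirmation objects that are associated one-to-one with the data blocks and record their confirmation states, and establishes a confirmation queue of N confirmation objects in the order in which the data blocks are read from the read data block queue. Based on the confirmation states of the confirmation objects and the order of the confirmation queue, the processing device persists only the read-and-confirmed data block immediately preceding the earliest-read data block whose confirmation state is unconfirmed, rather than persisting every read-and-confirmed data block. This reduces the processing load on the persistence service while preserving data reading accuracy.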
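Embodiment 4
With the traditional approach of persisting every read-and-confirmed data block, persistence cannot keep up with the read-and-confirm rate of parallel reading, so a massive number of read-and-confirmed data blocks sit waiting to be persisted, that is, they remain unpersisted. If the reading unit 401 is interrupted while reading data blocks at this point, then when the reading unit 401 resumes reading from the read data block queue it has to re-read this massive number of data blocks that were already read and confirmed, which wastes system resources and lowers the processing efficiency of the processing device.
Building on the embodiment corresponding to FIG. 4, FIG. 5 is a structural diagram of a data processing apparatus according to an embodiment of the present invention. If a read interruption occurs while the reading unit 401 is reading data blocks from the read data block queue, the apparatus further comprises:
An obtaining unit 501, configured to obtain the position information of the most recently saved data block from the persistent area.
The reading unit 401 is further configured to continue reading data blocks from the corresponding position in the read data block queue according to the position information of the most recently saved data block.
For example, as shown in FIG. 2, data blocks 1 and 2 have been read and confirmed, data blocks 4 and 5 have also been read and confirmed, and data block 3 has not yet been read and confirmed. If the reading of the reading unit 401 is interrupted at this point, the position information most recently saved in the persistent area should be that of data block 2, the block read just before data block 3.
After the interruption, when the reading unit 401 is ready to resume reading from the read data block queue, it uses the position information of data block 2 obtained by the obtaining unit 501 and continues reading from the position corresponding to data block 2 in the read data block queue. The reading unit 401 therefore does not need to re-read and re-confirm data blocks 1 and 2, which reduces the number of data blocks that may be read again when reading resumes after an interruption. It also guarantees that unconfirmed data blocks will be read again, so none are lost.
As the above embodiment shows, if a read interruption occurs while the processing device is reading data blocks, then before resuming, the processing device obtains from the persistent area the most recently saved position information, that is, the position of the last data block persisted before the interruption, and continues reading data blocks from the corresponding position in the read data block queue. Because the arrangement order is the same as the reading order, if the position information most recently saved in the persistent area is that of data block a, then data block a corresponds to confirmation object a, the object immediately preceding confirmation object b, which is currently at the queue head of the confirmation queue. When the processing device resumes reading, it starts directly from data block a and does not re-read the data blocks that precede data block a in the read data block queue, because the confirmation states of their confirmation objects show that those data blocks have all been read and confirmed. This reduces the number of read-and-confirmed data blocks that might be re-read when the processing device resumes after an interruption, and improves processing efficiency.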
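From the description of the above implementations, a person skilled in the art can clearly understand that all or some of the steps in the methods of the above embodiments may be implemented by software plus a general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.
It should be noted that the embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device and system embodiments are substantially similar to the method embodiments, they are described more briefly; for related details, see the corresponding description of the method embodiments. The device and system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as actually needed to achieve the purpose of the solution of an embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
The above are merely preferred implementations of the present invention and are not intended to limit its scope of protection. It should be noted that a person of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the scope of protection of the present invention.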

Claims (12)

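  1. A data processing method, characterized in that the method comprises:
    a processing device reads data blocks from a read data block queue;
    the processing device generates a confirmation object that is associated one-to-one with the data block, the confirmation object being used to record the confirmation state of the associated data block;
    the processing device establishes a confirmation queue comprising N confirmation objects, where the order of the N confirmation objects in the confirmation queue is the same as the reading order, the reading order being the order in which the N data blocks associated with the N confirmation objects were read from the read data block queue, and the confirmation object at the queue head of the confirmation queue is the one generated first among the N confirmation objects;
    if a data block in the read data block queue is read and confirmed by the processing device, the processing device changes the confirmation state of the confirmation object associated with that data block from unconfirmed to confirmed;
    the processing device performs a persistence operation on read-and-confirmed data blocks according to the confirmation states of the confirmation objects in the confirmation queue.
  2. The method according to claim 1, characterized in that the processing device performing a persistence operation on read-and-confirmed data blocks according to the confirmation states of the confirmation objects in the confirmation queue specifically comprises:
    the processing device checks the confirmation state of the confirmation object at the queue head; if the state is confirmed, it deletes that confirmation object and, following the arrangement order, places the next confirmation object in the confirmation queue at the queue head, then checks the confirmation state of the confirmation object at the queue head again; this repeats until the confirmation state of the confirmation object at the queue head is unconfirmed, at which point the processing device saves to a persistent area the position information of the data block associated with the deleted confirmation object that immediately preceded the current queue-head object in the arrangement order, the position information being the position of that data block in the read data block queue.
  3. The method according to claim 1, characterized in that, if a read interruption occurs while the processing device is reading data blocks from the read data block queue, the method further comprises:
    the processing device obtains the position information of the most recently saved data block from the persistent area;
    the processing device continues reading data blocks from the corresponding position in the read data block queue according to the position information of the most recently saved data block.
  4. The method according to claim 1, characterized in that
    the processing device reads data blocks from the read data block queue in parallel.
  5. The method according to claim 1, characterized in that
    the confirmation queue is saved in a cache.
  6. The method according to any one of claims 1 to 5, characterized in that the confirmation object being used to record the confirmation state of the associated data block specifically comprises:
    if the data block associated with the confirmation object has not been read and confirmed, the confirmation state of the confirmation object is unconfirmed;
    if the data block associated with the confirmation object has been read and confirmed, the confirmation state of the confirmation object is confirmed.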
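  7. A data processing apparatus, characterized by comprising:
    a reading unit, configured to read data blocks from a read data block queue;
    a generating unit, configured to generate a confirmation object that is associated one-to-one with the data block, the confirmation object being used to record the confirmation state of the associated data block;
    an establishing unit, configured to establish a confirmation queue comprising N confirmation objects, where the order of the N confirmation objects in the confirmation queue is the same as the reading order, the reading order being the order in which the N data blocks associated with the N confirmation objects were read from the read data block queue, and the confirmation object at the queue head of the confirmation queue is the one generated first among the N confirmation objects;
    a modifying unit, configured to, if a data block in the read data block queue is read and confirmed, change the confirmation state of the confirmation object associated with that data block from unconfirmed to confirmed;
    a persistence unit, configured to perform a persistence operation on read-and-confirmed data blocks according to the confirmation states of the confirmation objects in the confirmation queue.
  8. The apparatus according to claim 7, characterized in that
    the persistence unit is specifically configured to check the confirmation state of the confirmation object at the queue head; if the state is confirmed, to delete that confirmation object and, following the arrangement order, place the next confirmation object in the confirmation queue at the queue head, then check the confirmation state of the confirmation object at the queue head again; and, once the confirmation state of the confirmation object at the queue head is unconfirmed, to save to a persistent area the position information of the data block associated with the deleted confirmation object that immediately preceded the current queue-head object in the arrangement order, the position information being the position of that data block in the read data block queue.
  9. The apparatus according to claim 7, characterized in that, if a read interruption occurs while the reading unit is reading data blocks from the read data block queue, the apparatus further comprises:
    an obtaining unit, configured to obtain the position information of the most recently saved data block from the persistent area;
    the reading unit is further configured to continue reading data blocks from the corresponding position in the read data block queue according to the position information of the most recently saved data block.
  10. The apparatus according to claim 7, characterized in that
    the reading unit reads data blocks from the read data block queue in parallel.
  11. The apparatus according to claim 7, characterized in that
    the confirmation queue is saved in a cache.
  12. The apparatus according to any one of claims 7 to 11, characterized in that the confirmation object being used to record the confirmation state of the associated data block specifically comprises:
    if the data block associated with the confirmation object has not been read and confirmed, the confirmation state of the confirmation object is unconfirmed;
    if the data block associated with the confirmation object has been read and confirmed, the confirmation state of the confirmation object is confirmed.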
PCT/CN2016/083890 2015-06-05 2016-05-30 Data processing method and apparatus WO2016192605A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510303238.6 2015-06-05
CN201510303238.6A CN106294477B (zh) 2015-06-05 2015-06-05 Data processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2016192605A1 true WO2016192605A1 (zh) 2016-12-08

Family

ID=57440116

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/083890 WO2016192605A1 (zh) 2015-06-05 2016-05-30 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN106294477B (zh)
WO (1) WO2016192605A1 (zh)

Also Published As

Publication number Publication date
CN106294477A (zh) 2017-01-04
CN106294477B (zh) 2019-10-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16802528

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16802528

Country of ref document: EP

Kind code of ref document: A1