CN116414733B - Data processing method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN116414733B
Authority
CN
China
Prior art keywords
median
data
memory
data set
disk
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310204662.XA
Other languages
Chinese (zh)
Other versions
CN116414733A (en)
Inventor
简小云
李洁玮
吴东锴
张文
许颖芯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INSIGMA TECHNOLOGY CO LTD
HONG KONG-ZHUHAI-MACAO BRIDGE AUTHORITY
Original Assignee
INSIGMA TECHNOLOGY CO LTD
HONG KONG-ZHUHAI-MACAO BRIDGE AUTHORITY
Application filed by INSIGMA TECHNOLOGY CO LTD and HONG KONG-ZHUHAI-MACAO BRIDGE AUTHORITY
Priority to CN202310204662.XA
Publication of CN116414733A
Application granted
Publication of CN116414733B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 Cache access modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646 Configuration or reconfiguration
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The present application relates to a data processing method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: determining a median data set in the disk from the ordered data set in the disk; storing the median data set in the disk into a memory, updating the median of the data in the memory when new data is stored in the memory, and taking the change in that median as a median floating value; when the new data in the memory is transferred to the ordered data set, updating the median and the median data set in the disk according to the median floating value, taking the updated median data set as the median data set in the disk, and returning to the step of storing the median data set in the disk into the memory … … and taking the change in the median of the data in the memory as the median floating value, until no new data is stored in the memory. The method improves the efficiency of data processing.

Description

Data processing method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular, to a data processing method, apparatus, computer device, storage medium, and computer program product.
Background
With the development of computer technology, the volume of data held in computers has become massive, and data processing plays an increasingly important role. In data processing, the median is commonly used to describe the overall distribution of the data.
In the conventional data processing method, the data volume is too large to be read into memory and sorted at once. The unordered data is therefore stored on disk; portions of it are read into memory one at a time and sorted, each sorted result is written back to disk as a temporary file, and finally all the temporary files on disk are merged to obtain the complete sorted result, from which the median is taken.
However, in the conventional method, once the temporary files have been written to disk, all of them must still be merged before the median can be obtained; as a result, the conventional data processing method has low processing efficiency.
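The conventional approach described above is, in effect, an external merge sort followed by a walk to the middle element. A minimal Python sketch of that baseline follows, assuming integer data stored one value per line in each temporary file and the lower-median convention used later in this description; the helper names (`_write_run`, `_read_run`, `external_sort_median`) are hypothetical. It makes the cost visible: every value is written to a temporary file and read back through the merge before the median is known.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_chunk: List[int], directory: str) -> str:
    """Write one sorted chunk to a temporary run file, one integer per line."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{x}\n" for x in sorted_chunk)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream a run file back without loading it fully into memory."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort_median(data: Iterable[int], chunk_size: int) -> int:
    """Sort chunk_size items at a time in memory, spill each sorted chunk to a
    temporary file, k-way merge all files, and stop at the middle element."""
    with tempfile.TemporaryDirectory() as tmp:
        runs: List[str] = []
        chunk: List[int] = []
        total = 0
        for x in data:
            chunk.append(x)
            total += 1
            if len(chunk) == chunk_size:
                runs.append(_write_run(sorted(chunk), tmp))
                chunk = []
        if chunk:
            runs.append(_write_run(sorted(chunk), tmp))
        if total == 0:
            raise ValueError("median of empty data")
        # Lower median for even counts (index (n - 1) // 2 in sorted order).
        target = (total - 1) // 2
        readers = [_read_run(p) for p in runs]
        try:
            for i, x in enumerate(heapq.merge(*readers)):
                if i == target:
                    return x
        finally:
            for r in readers:
                r.close()  # release file handles before the directory is removed
    raise AssertionError("unreachable: target index is always within the data")
```

Even with a streaming merge, all temporary files must be produced and merged up to the middle, which is the inefficiency the following disclosure addresses.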
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data processing method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve processing efficiency.
In a first aspect, the present application provides a data processing method. The method comprises the following steps:
transferring the ordered data in the memory to a disk associated with the memory, as an ordered data set in the disk, when the amount of data in the memory reaches a memory capacity threshold;
taking the median of the ordered data set in the disk as the median of a median data set in the disk, the median data set in the disk being determined from the median of the ordered data set together with the data immediately before and after that median;
storing the median data set in the disk into the memory, and, when new data is stored in the memory, updating the median of the data in the memory and taking the change in that median as a median floating value;
and, when the new data in the memory is transferred to the ordered data set in the disk, updating the median of the ordered data set and the median data set in the disk according to the median floating value, taking the updated median data set as the median data set in the disk, and returning to the step of storing the median data set in the disk into the memory, updating the median of the data in the memory when new data is stored, and taking the change in that median as the median floating value, until no new data is stored in the memory.
In one embodiment, updating the median of the data in the memory when new data is stored in the memory, and taking the change in that median as the median floating value, includes:
when new data is stored in the memory, sorting the new data together with the data of the median data set in the memory, and updating the median of the sorted data;
when the total number of the new data in the memory plus the data of the ordered data set in the disk is odd and the new data is greater than the current median in the memory, or when that total is even and the new data is less than the current median in the memory, determining the position difference between the median of the ordered data in the memory and the initial median as the change information of the median of the data in the memory, the initial median being the median of the median data set in the memory;
and taking the change information as the median floating value.
In one embodiment, the position difference between the median of the ordered data and the initial median is obtained as follows:
determining a first data offset of the initial median, namely the offset, counted in data items, of the initial median from the first item of the ordered data in the memory;
determining a second data offset of the median of the ordered data in the memory, namely the offset, counted in data items, of that median from the first item of the ordered data in the memory;
and taking the difference obtained by subtracting the first data offset from the second data offset as the position difference information.
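The embodiments above (sorting new data into the in-memory median data set, the odd/even rule, and the two data offsets) can be sketched as a small tracker class. This is an illustrative reading rather than the patent's code: it assumes the lower median for even counts, assumes that a new value equal to an existing one is ordered after it, and compares each new value against the current median in memory so that several arrivals between transfers are handled. The class and method names are hypothetical. The result is only trustworthy while the absolute floating value does not exceed half the size of the median data set, since each new item moves the median by at most one position.

```python
import bisect
from typing import List


class MedianFloatTracker:
    """Tracks the median floating value of the in-memory median data set.

    Assumptions (illustrative): lower median for even counts; a new value equal
    to an existing one is ordered after it; disk_count is the size of the
    ordered data set on disk when the median data set was loaded. Valid while
    abs(float_value()) <= len(median_data_set) // 2.
    """

    def __init__(self, median_data_set: List[int], disk_count: int) -> None:
        self.values = sorted(median_data_set)         # ordered data in memory
        self.initial_offset = len(self.values) // 2   # first data offset
        self.median_offset = self.initial_offset      # second data offset
        self.total = disk_count                       # disk data + new data so far

    def add(self, x: int) -> int:
        """Store one new value and return the updated median floating value."""
        current = self.values[self.median_offset]
        pos = bisect.bisect_right(self.values, x)  # ties go after equal values
        self.values.insert(pos, x)
        # Inserting in front of an element pushes that element one place right.
        if pos <= self.initial_offset:
            self.initial_offset += 1
        if pos <= self.median_offset:
            self.median_offset += 1
        self.total += 1
        # Odd/even rule: an even total with x below the median moves it one
        # place left; an odd total with x at or above it moves it one place right.
        if self.total % 2 == 0 and x < current:
            self.median_offset -= 1
        elif self.total % 2 == 1 and x >= current:
            self.median_offset += 1
        return self.float_value()

    def float_value(self) -> int:
        # Second data offset minus first data offset.
        return self.median_offset - self.initial_offset

    def median(self) -> int:
        return self.values[self.median_offset]
```

With the median data set {4,5,6} from a nine-item ordered data set, adding 3 yields a floating value of -1 (median 4), and then adding 7 yields 0 (median 5), matching the worked example in step S106.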
In one embodiment, updating the median of the ordered data set and the median data set in the disk according to the median floating value, when the new data in the memory is transferred to the ordered data set in the disk, includes:
when the new data in the memory is transferred to the ordered data set in the disk, determining the median of the ordered data set after the transfer, as the updated median of the ordered data set, from the median floating value and the median of the ordered data set before the transfer;
updating the median data set in the disk from the updated median of the ordered data set after the transfer and the data immediately before and after that updated median;
updating the median of the ordered data set and the median data set in the disk according to the median floating value, when the new data in the memory is transferred to the ordered data set in the disk, further includes:
deleting the median data set in the memory.
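The disk-side update in this embodiment can be sketched as follows, with a sorted Python list standing in for the disk-resident ordered data set (an assumption; a real layout would write into reserved gaps). The floating value is applied starting from the old median's new position, that is, its old index plus the number of new items inserted in front of it, so the median is reached by offset rather than by re-scanning the data. The function name and parameters are hypothetical.

```python
import bisect
from typing import List, Tuple


def apply_float_on_disk(disk: List[int], median_index: int, new_data: List[int],
                        float_value: int, half_width: int) -> Tuple[int, List[int]]:
    """Merge new_data into the ordered data set and relocate its median.

    `disk` is a sorted list standing in for the ordered data set on disk and
    is modified in place. Returns the updated median index and the updated
    median data set (2 * half_width + 1 items centred on the new median).
    """
    old_median = disk[median_index]
    # Values strictly below the old median land in front of it and shift its
    # index; equal values are inserted after it (insort_right).
    shift = sum(1 for x in new_data if x < old_median)
    for x in new_data:
        bisect.insort_right(disk, x)
    # Step from the old median's new position by the median floating value.
    new_index = median_index + shift + float_value
    if new_index - half_width < 0 or new_index + half_width >= len(disk):
        raise ValueError("ordered data set too small for this median data set")
    return new_index, disk[new_index - half_width:new_index + half_width + 1]
```

Applied to the two examples of step S108, this gives median 4 with median data set {3,4,5}, and median 5 with {4,5,6}. After the call, the in-memory copy of the median data set would be discarded, as the last step of this embodiment describes.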
In one embodiment, the method further comprises:
obtaining the total amount of data in the memory from the amount of new data in the memory and the amount of data in the median data set;
and transferring the new data to the ordered data set in the disk when the total amount of data in the memory reaches the memory capacity threshold or when the median floating value reaches a preset floating value threshold.
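A minimal sketch of this transfer trigger, assuming "reaches the threshold" means "is greater than or equal to it"; the function and parameter names are hypothetical. The source does not state a value for the floating value threshold. One natural choice (an assumption, not stated in the source) is half the size of the median data set: each new item moves the median by at most one position, so the median is guaranteed to still lie inside the in-memory data while the absolute floating value stays at or below that half-width.

```python
def should_transfer(new_count: int, median_set_size: int, capacity: int,
                    float_value: int, float_threshold: int) -> bool:
    """Decide whether the new data in memory should be moved to disk now.

    Transfer when the memory is full (new data plus the median data set) or
    when the median has drifted far enough that the in-memory median data set
    may soon no longer contain it.
    """
    memory_total = new_count + median_set_size
    return memory_total >= capacity or abs(float_value) >= float_threshold
```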
In one embodiment, before the median of the ordered data set in the disk is taken as the median of the median data set in the disk, the method further comprises:
when a head identifier exists in the ordered data set in the disk, querying the data of the ordered data set for a tail identifier; the head identifier identifies the first item of the ordered data in the memory, and the tail identifier identifies the last item of the ordered data in the memory;
when the tail identifier exists in the ordered data set in the disk, determining that the ordered data set in the disk was transferred successfully from the memory;
taking the median of the ordered data set in the disk as the median of the median data set in the disk then comprises:
when the ordered data set in the disk is one that was transferred successfully from the memory, taking the median of the ordered data set in the disk as the median of the median data set in the disk.
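The head/tail identifier check can be sketched with a plain-text run file in which a marker line precedes the first value and another follows the last. The marker strings, the file format and the function names are assumptions for illustration only; the source does not specify how the identifiers are encoded. A run whose tail marker is missing is treated as an interrupted transfer and is not used to take the median.

```python
from typing import List, Optional

HEAD_MARK = "#HEAD"  # hypothetical head identifier line
TAIL_MARK = "#TAIL"  # hypothetical tail identifier line


def write_run(path: str, ordered: List[int]) -> None:
    """Write an ordered run framed by head and tail identifiers."""
    with open(path, "w") as f:
        f.write(HEAD_MARK + "\n")
        f.writelines(f"{v}\n" for v in ordered)
        f.write(TAIL_MARK + "\n")


def read_complete_run(path: str) -> Optional[List[int]]:
    """Return the run's values only if it was transferred completely.

    The head identifier is checked first; only then is the tail identifier
    looked for. A missing identifier means the transfer did not finish, so the
    run must not be used to take the median.
    """
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f]
    if not lines or lines[0] != HEAD_MARK:
        return None
    if len(lines) < 2 or lines[-1] != TAIL_MARK:
        return None
    return [int(v) for v in lines[1:-1]]
```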
In a second aspect, the present application also provides a data processing apparatus. The apparatus comprises:
a memory data transfer module, configured to transfer the ordered data in the memory to a disk associated with the memory, as an ordered data set in the disk, when the amount of data in the memory reaches a memory capacity threshold;
a median data confirming module, configured to take the median of the ordered data set in the disk as the median of a median data set in the disk, the median data set in the disk being determined from the median of the ordered data set together with the data immediately before and after that median;
a floating information confirming module, configured to store the median data set in the disk into the memory and, when new data is stored in the memory, to update the median of the data in the memory and take the change in that median as a median floating value;
and a median data updating module, configured to, when the new data in the memory is transferred to the ordered data set in the disk, update the median of the ordered data set and the median data set in the disk according to the median floating value, take the updated median data set as the median data set in the disk, and return to storing the median data set in the disk into the memory, updating the median of the data in the memory when new data is stored, and taking the change in that median as the median floating value, until no new data is stored in the memory.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor that, when executing the computer program, performs the following steps:
transferring the ordered data in the memory to a disk associated with the memory, as an ordered data set in the disk, when the amount of data in the memory reaches a memory capacity threshold;
taking the median of the ordered data set in the disk as the median of a median data set in the disk, the median data set in the disk being determined from the median of the ordered data set together with the data immediately before and after that median;
storing the median data set in the disk into the memory, and, when new data is stored in the memory, updating the median of the data in the memory and taking the change in that median as a median floating value;
and, when the new data in the memory is transferred to the ordered data set in the disk, updating the median of the ordered data set and the median data set in the disk according to the median floating value, taking the updated median data set as the median data set in the disk, and returning to the step of storing the median data set in the disk into the memory, updating the median of the data in the memory when new data is stored, and taking the change in that median as the median floating value, until no new data is stored in the memory.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program that, when executed by a processor, performs the following steps:
transferring the ordered data in the memory to a disk associated with the memory, as an ordered data set in the disk, when the amount of data in the memory reaches a memory capacity threshold;
taking the median of the ordered data set in the disk as the median of a median data set in the disk, the median data set in the disk being determined from the median of the ordered data set together with the data immediately before and after that median;
storing the median data set in the disk into the memory, and, when new data is stored in the memory, updating the median of the data in the memory and taking the change in that median as a median floating value;
and, when the new data in the memory is transferred to the ordered data set in the disk, updating the median of the ordered data set and the median data set in the disk according to the median floating value, taking the updated median data set as the median data set in the disk, and returning to the step of storing the median data set in the disk into the memory, updating the median of the data in the memory when new data is stored, and taking the change in that median as the median floating value, until no new data is stored in the memory.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program that, when executed by a processor, implements the following steps:
transferring the ordered data in the memory to a disk associated with the memory, as an ordered data set in the disk, when the amount of data in the memory reaches a memory capacity threshold;
taking the median of the ordered data set in the disk as the median of a median data set in the disk, the median data set in the disk being determined from the median of the ordered data set together with the data immediately before and after that median;
storing the median data set in the disk into the memory, and, when new data is stored in the memory, updating the median of the data in the memory and taking the change in that median as a median floating value;
and, when the new data in the memory is transferred to the ordered data set in the disk, updating the median of the ordered data set and the median data set in the disk according to the median floating value, taking the updated median data set as the median data set in the disk, and returning to the step of storing the median data set in the disk into the memory, updating the median of the data in the memory when new data is stored, and taking the change in that median as the median floating value, until no new data is stored in the memory.
In the above data processing method, apparatus, computer device, storage medium and computer program product, the ordered data in the memory is first transferred to a disk associated with the memory, as an ordered data set in the disk, when the amount of data in the memory reaches the memory capacity threshold; the median of the ordered data set in the disk is then taken as the median of the median data set in the disk; next, the median data set in the disk is stored into the memory and, when new data is stored in the memory, the median of the data in the memory is updated and its change is taken as the median floating value; finally, when the new data in the memory is transferred to the ordered data set in the disk, the median of the ordered data set and the median data set in the disk are updated according to the median floating value, the updated median data set is taken as the median data set in the disk, and the method returns to the step of storing the median data set in the disk into the memory, updating the median of the data in the memory when new data is stored, and taking the change in that median as the median floating value, until no new data is stored in the memory.
In this way, the median data set in the memory is determined from the ordered data set in the disk, and when new data is stored in the memory, the median floating value of the ordered data set in the disk can be determined from the change in the median of the data in the memory, so that the median of the ordered data set can be updated according to that floating value. This makes good use of the computing strengths of the memory and the storage strengths of the disk: while new data is being stored in the memory, the median floating value, which captures how the new data will shift the median of the ordered data set, is predicted and recorded. When the new data is later stored on the disk, the median can be located quickly from the median floating value instead of being recomputed from all the data on the disk, which improves data processing efficiency.
Drawings
FIG. 1 is a flowchart of a data processing method in one embodiment;
FIG. 2 is a flowchart of the step of taking the change in the median of the data in the memory as the median floating value in one embodiment;
FIG. 3 is a flowchart of a data processing method in another embodiment;
FIG. 4 is a flowchart of a low-resource-consumption median method for massive real-time data in one embodiment;
FIG. 5 is a block diagram of a data processing apparatus in one embodiment;
FIG. 6 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that in the big data era, the value of data lies not only in its volume but also in the information it contains; fully mining the information behind the data can help decision makers. Analyzing and processing big data can describe the characteristics and trends of the things the data relates to, and can even predict how those things will develop. Because the median lies in the middle of a set of data and is not affected by extreme values, it is often used to describe the overall distribution of a set of data.
In an exemplary embodiment, as shown in FIG. 1, a data processing method is provided, and this embodiment is illustrated by applying the method to a server; it will be appreciated that the method may also be applied to a terminal, or to a system comprising a server and a terminal and implemented through interaction between the server and the terminal. The server may be implemented as an independent server or as a server cluster formed by multiple servers; the terminal may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or the like. In this embodiment, the method includes the following steps:
Step S102: when the amount of data in the memory reaches the memory capacity threshold, transfer the ordered data in the memory to a disk associated with the memory as the ordered data set in the disk.
Here, the memory is the internal working space the server uses for data processing, and the disk associated with the memory is the storage space the server uses to store the memory's data processing results. The memory capacity threshold is the upper limit on the amount of data the memory can hold; for ease of illustration, this embodiment uses a memory capacity threshold of 9 data items, though in actual data processing the threshold is much greater than 9.
It should be noted that, as the memory receives data, it can sort the data using its computational advantages; the sorting algorithm may be bubble sort, insertion sort, quicksort, or the like.
It can be appreciated that, to make it easy to transfer later new data into the ordered data set in the disk, the ordered data in the memory needs to be written to the disk in order and with gaps. "In order" means the data in the memory is transferred to the disk in ascending or descending order; "with gaps" means some storage space is reserved between each pair of adjacent data items to hold data transferred later from the memory to the disk.
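A sketch of writing "in order and with gaps", assuming a flat array of slots in which `None` marks reserved space. The gap size, the slot model and the function names are hypothetical; the source does not say how the reserved space is sized or what happens when it runs out (here the sketch simply raises an error).

```python
from typing import List, Optional

Slot = Optional[int]  # None marks a reserved, still-empty slot


def lay_out_with_gaps(ordered: List[int], gap: int) -> List[Slot]:
    """Write ordered data with `gap` empty slots before, between and after items."""
    slots: List[Slot] = [None] * gap
    for v in ordered:
        slots.append(v)
        slots.extend([None] * gap)
    return slots


def insert_into_gap(slots: List[Slot], x: int) -> int:
    """Place x in the free slot right after its predecessor; return the slot index."""
    left = -1  # index of the last occupied slot whose value is <= x
    for i, v in enumerate(slots):
        if v is None:
            continue
        if v > x:
            break
        left = i
    target = left + 1
    if target < len(slots) and slots[target] is None:
        slots[target] = x
        return target
    # No reserved space between the neighbours: a real layout would rewrite or
    # rebalance this region of the disk here.
    raise RuntimeError("no reserved slot left between neighbouring values")
```

Because each new value goes into space already reserved next to its neighbours, the existing data does not have to be rewritten to keep the set ordered, as long as the gaps last.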
Specifically, when the server detects that the number of data items in the memory has reached 9, it transfers the sorted data in the memory to the disk associated with the memory as the ordered data set in the disk. It will be appreciated that, once the ordered data has been transferred successfully from the memory to the disk, no ordered data remains in the memory.
For example, suppose the memory receives {2,6,1,4,5,8,3,7,9} in sequence and sorts the data each time an item arrives. When the number of stored items reaches 9, the ordered data in the memory is {1,2,3,4,5,6,7,8,9}; the server detects that the amount of data in the memory has reached the memory capacity threshold, and therefore moves the ordered data in the memory to the disk as the ordered data set in the disk.
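Step S102 with the example above can be sketched as a buffer that keeps its contents ordered on every arrival (here via `bisect.insort`, that is, insertion into a sorted list) and hands back the whole run once the capacity threshold is reached. The class name and the return convention are assumptions for illustration.

```python
import bisect
from typing import List, Optional


class OrderedMemoryBuffer:
    """Keeps incoming data ordered in memory and hands back the whole ordered
    run once the memory capacity threshold is reached (step S102)."""

    def __init__(self, capacity: int = 9) -> None:
        self.capacity = capacity
        self.items: List[int] = []

    def add(self, x: int) -> Optional[List[int]]:
        bisect.insort(self.items, x)  # insertion-style ordering on every arrival
        if len(self.items) >= self.capacity:
            run, self.items = self.items, []  # memory holds no ordered data afterwards
            return run  # the caller writes this run to disk as the ordered data set
        return None
```

Feeding it 2, 6, 1, 4, 5, 8, 3, 7, 9 returns `None` eight times and then the run [1, 2, 3, 4, 5, 6, 7, 8, 9].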
Step S104: take the median of the ordered data set in the disk as the median of the median data set in the disk.
Here, the median data set in the disk is determined from the median of the ordered data set and the data immediately before and after it. For example, if the ordered data set in the disk is {1,2,3,4,5,6,7,8,9}, its median is 5; if the preset size of the median data set is 3, the server selects {4,5,6}, centred on the median 5, as the median data set, and if the preset size is 5, the server selects {3,4,5,6,7}. It will be appreciated that, for ease of selection, the preset size of the median data set should be odd.
Specifically, the server determines the median of the ordered data set in the disk, takes it as the median of the median data set in the disk, and determines the median data set from that median and the preset size of the median data set.
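Selecting the median data set can be sketched as taking a centred slice of the ordered data set. This assumes the lower-median convention for even-sized sets described in step S106; the function name is hypothetical.

```python
from typing import List, Sequence


def median_data_set(ordered: Sequence[int], size: int) -> List[int]:
    """Return `size` items centred on the (lower) median of an ordered data set."""
    if size % 2 == 0:
        raise ValueError("the median data set size should be odd")
    if size > len(ordered):
        raise ValueError("median data set larger than the ordered data set")
    m = (len(ordered) - 1) // 2  # lower median index
    half = size // 2
    return list(ordered[m - half:m + half + 1])
```

For the ordered data set {1,...,9}, a size of 3 gives [4, 5, 6] and a size of 5 gives [3, 4, 5, 6, 7], as in the example above.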
Step S106: store the median data set in the disk into the memory; when new data is stored in the memory, update the median of the data in the memory and take the change in that median as the median floating value.
The change information of the median of the data in the memory refers to the change in the position of the median of the data in the memory between before and after new data is stored.
Specifically, the server stores the median data set in the disk into the memory; when new data is stored in the memory, it updates the median of the data in the memory and takes the change in that median between before and after the new data is stored as the median floating value.
For example, the median data set in the memory is {4,5,6}. When the memory stores new data 3, the data in the memory becomes {3,4,5,6} and the median changes from 5 to 4; that is, the median moves one position to the left in the set, so the change is recorded as -1 and -1 is taken as the median floating value. If the memory then stores new data 7, the data in the memory becomes {3,4,5,6,7} and the median becomes 5, the same as before any new data was stored, so the change is recorded as 0 and 0 is taken as the median floating value. In this embodiment, when the number of data items is even, the median is defined as the first (smaller) of the two middle values; for example, the median of {3,4,5,6} is 4. Depending on practical requirements, the mean of the two middle values may instead be used as the median, which gives 4.5 for {3,4,5,6}.
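The two even-count conventions in this example correspond to Python's standard `statistics.median_low` and `statistics.median`. The sketch below reproduces the example's floating values by comparing the position of the lower median with the position of the initial median 5. It assumes distinct values (so `list.index` is unambiguous) and an odd number of items on disk, as in this example; with an even number on disk, the odd/even rule of the embodiments has to be applied rather than simply taking the median of the in-memory data. The function name `describe` is hypothetical.

```python
import statistics
from typing import List, Tuple


def describe(data: List[int], initial_median: int) -> Tuple[int, float, int]:
    """Lower median, mean-of-middles median, and the lower median's position
    shift relative to the initial median (values assumed distinct)."""
    ordered = sorted(data)
    low = statistics.median_low(ordered)   # convention used in this embodiment
    mid_mean = statistics.median(ordered)  # alternative: mean of the two middle values
    shift = ordered.index(low) - ordered.index(initial_median)
    return low, mid_mean, shift


for snapshot in ([4, 5, 6], [3, 4, 5, 6], [3, 4, 5, 6, 7]):
    print(snapshot, describe(snapshot, 5))
```

The three snapshots give floating values 0, -1 and 0, and the medians 5, 4 (or 4.5 under the mean convention) and 5.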
It can be understood that storing the median data set from the disk in the memory makes it simpler and faster to determine how new data stored in the memory affects the median of the ordered data set in the disk. For example, if the median data set were not stored in the memory, then each time the memory stored a new item, the server would have to record how that item compares with the median of the ordered data set in the disk; when the next new item arrived, the server would have to record its comparison with that median as well, and then adjust the result based on its comparison with the previous new item (because the previous new item has not yet been stored on the disk). This increases the resource consumption of the memory.
Step S108: when the new data in the memory is transferred to the ordered data set in the disk, update the median of the ordered data set and the median data set in the disk according to the median floating value, take the updated median data set as the median data set in the disk, and return to the step of storing the median data set in the disk into the memory, updating the median of the data in the memory when new data is stored, and taking the change in that median as the median floating value, until no new data is stored in the memory.
Here, "no new data is stored in the memory" means that the memory receives no new data within a period of time after the median data set in the disk is stored into the memory.
Specifically, when the new data in the memory is transferred to the ordered data set in the disk, the server updates the median of the ordered data set in the disk according to the recorded median floating value, and updates the median data set according to the updated median of the ordered data set; it then uses the updated median data set as the median data set in the disk in step S106 and returns to step S106, repeating until the memory receives no new data within a period of time after the median data set in the disk is stored into the memory.
It should be noted that the median data set in the memory is obtained from the median data set in the disk, so its data already exists on the disk; the server therefore only needs to transfer the new data in the memory to the ordered data set in the disk.
For example, suppose the data in the memory is {3,4,5,6}, the new data is 3, and the median floating value is -1. If the server now stores the new data 3 into the ordered data set in the disk, the ordered data set becomes {1,2,3,3,4,5,6,7,8,9}; the median is shifted one position to the left according to the median floating value -1, so the median of the ordered data set changes from 5 to 4. Then, from the new median 4 and the preset median data set size of 3, the server selects {3,4,5} as the new median data set and returns to step S106. As another example, suppose the data in the memory is {3,4,5,6,7}, the new data is {3,7}, and the median floating value is 0. If the server now stores the new data {3,7} into the ordered data set in the disk, the ordered data set becomes {1,2,3,3,4,5,6,7,7,8,9}; according to the median floating value 0, the median does not move, so the median of the ordered data set is still 5.
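Putting steps S102 to S108 together, the loop can be sketched end to end and checked against a brute-force median. This is a hypothetical illustration under stated assumptions, not the patent's implementation: a sorted Python list stands in for the disk, the lower median is used, new values equal to existing ones are ordered after them, and new data is transferred when the memory is full or when the absolute floating value reaches half the size of the median data set. The class name `StreamingMedian` and its parameters are invented for this sketch.

```python
import bisect
from typing import List


class StreamingMedian:
    """Hypothetical end-to-end sketch of steps S102 to S108.

    Assumptions: a sorted list stands in for the ordered data set on disk;
    lower median; ties ordered after existing equal values; transfer when the
    memory is full or abs(floating value) reaches the median set's half-width.
    """

    def __init__(self, capacity: int = 9, window_size: int = 3) -> None:
        if window_size % 2 == 0 or not 3 <= window_size <= capacity:
            raise ValueError("window_size must be odd and between 3 and capacity")
        self.capacity, self.half = capacity, window_size // 2
        self.buffer: List[int] = []   # S102: ordered data before the first transfer
        self.disk: List[int] = []     # stand-in for the ordered data set on disk
        self.disk_median = -1         # index of the disk median; -1 until S102 runs
        self.values: List[int] = []   # in-memory median data set plus new data
        self.new: List[int] = []      # new data not yet transferred to disk
        self.init_off = self.med_off = self.total = 0

    def _load(self, index: int) -> None:
        # S104/S106: copy the median data set centred on disk[index] into memory.
        self.disk_median = index
        self.values = self.disk[index - self.half:index + self.half + 1]
        self.init_off = self.med_off = self.half
        self.new = []
        self.total = len(self.disk)

    def add(self, x: int) -> None:
        if self.disk_median < 0:
            bisect.insort(self.buffer, x)
            if len(self.buffer) >= self.capacity:
                self.disk, self.buffer = self.buffer, []
                self._load((len(self.disk) - 1) // 2)
            return
        current = self.values[self.med_off]
        pos = bisect.bisect_right(self.values, x)
        self.values.insert(pos, x)
        self.new.append(x)
        if pos <= self.init_off:
            self.init_off += 1
        if pos <= self.med_off:
            self.med_off += 1
        self.total += 1
        # Odd/even rule from the embodiments (compared with the current median).
        if self.total % 2 == 0 and x < current:
            self.med_off -= 1
        elif self.total % 2 == 1 and x >= current:
            self.med_off += 1
        drift = self.med_off - self.init_off  # median floating value
        if len(self.values) >= self.capacity or abs(drift) >= self.half:
            self._transfer(drift)

    def _transfer(self, drift: int) -> None:
        # S108: merge the new data into the disk and relocate the median by
        # offset from the old one instead of searching for it again.
        old_median = self.disk[self.disk_median]
        shift = sum(1 for x in self.new if x < old_median)
        for x in self.new:
            bisect.insort_right(self.disk, x)
        self._load(self.disk_median + shift + drift)  # replaces the in-memory copy

    def median(self) -> int:
        if self.disk_median < 0:
            return self.buffer[(len(self.buffer) - 1) // 2]
        return self.values[self.med_off]
```

Feeding it {2,6,1,4,5,8,3,7,9} followed by 3 and 7 ends with the ordered data set {1,2,3,3,4,5,6,7,7,8,9} and median 5, matching the examples above. The list insertions only stand in for writing into reserved gaps on disk; the point illustrated is that each new median is located by offset from the previous one.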
In summary, the data processing method works as follows. First, when the amount of data in the memory reaches the memory capacity threshold, the server transfers the sorted data in the memory to a disk associated with the memory, where it becomes the ordered data set in the disk. Next, the median of the ordered data set in the disk is taken as the median of the median data set in the disk. The median data set in the disk is then loaded into the memory; whenever new data is stored in the memory, the median of the data in the memory is updated and the change in that median is recorded as the median floating value. Finally, when the new data in the memory is transferred to the ordered data set in the disk, the median and the median data set of the ordered data set in the disk are updated according to the median floating value, the updated median data set becomes the median data set in the disk, and the method returns to the step of loading the median data set into the memory. This repeats until no new data is stored in the memory.
In this way, the median data set in the memory is derived from the ordered data set in the disk, and while new data accumulates in the memory, the server uses the change in the memory median to determine the median floating value that the ordered data set in the disk will have once the new data is written to it; the median of the ordered data set is then updated from that value. The method exploits both the computing speed of memory and the storage capacity of the disk: the median floating value, which captures how the new data will shift the median of the ordered data set, is predicted and recorded while the new data is still in memory. When the new data is written to disk, the median can therefore be located directly from the floating value instead of being recomputed from all of the data on the disk, which improves processing efficiency.
In an exemplary embodiment, as shown in FIG. 2, step S106 (updating the median of the data in the memory when new data is stored there, and recording the change in that median as the median floating value) specifically includes the following steps:
Step S202: when new data is stored in the memory, sort the new data together with the data in the median data set in the memory, and update the median of the sorted data.
Step S204: when the sum of the number of new data in the memory and the number of data in the ordered data set in the disk is odd and the new data is greater than the median of the median data set, or when that sum is even and the new data is less than the median of the median data set, determine the position difference between the median of the sorted data and the initial median as the change information of the median of the data in the memory.
Step S206: use the change information as the median floating value.
Here, the initial median is the median of the median data set in the memory.
Specifically, when new data is stored in the memory, the server sorts the new data together with the data of the median data set and obtains the median of the updated sorted data. Then, if the sum of the number of new data in the memory and the number of data in the ordered data set in the disk is odd and the new data is greater than the median of the median data set, or if that sum is even and the new data is less than the median of the median data set, the server determines the position difference between the median of the sorted data and the median of the median data set in the memory and uses it as the change information. Finally, the change information is used as the median floating value.
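Steps S202 to S206 can be illustrated by replaying the worked example that follows. The helper `replay_floats` below is hypothetical, not the patent's implementation. It assumes, as in the example, that the ordered data set on disk holds equally many items on either side of the median data set (so the lower median of the memory data tracks the overall median), that the median of an even-sized set is its lower middle item (as in {3,4,5,6} giving 4), and that a new value equal to an existing one is placed after it.

```python
import bisect


def replay_floats(window: list[int], new_values: list[int]) -> list[int]:
    """After each new value, return the median floating value:
    (index of the current median) - (index of the initial median)."""
    memory = sorted(window)
    initial_pos = (len(memory) - 1) // 2      # index of the initial median
    floats = []
    for v in new_values:
        i = bisect.bisect_right(memory, v)    # ties go after existing equal values
        memory.insert(i, v)
        if i <= initial_pos:                  # inserted left of the initial median
            initial_pos += 1
        current_pos = (len(memory) - 1) // 2  # lower median of the sorted data
        floats.append(current_pos - initial_pos)
    return floats


print(replay_floats([4, 5, 6], [3, 7, 1, 2]))  # [-1, 0, -1, -1]
```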
For example, suppose the median data set in the memory is {4,5,6} and the memory receives new data 3. The sorted data in the memory becomes {3,4,5,6} and its median becomes 4. There is 1 new data item and the ordered data set in the disk holds 9 items, so the sum is 10 (even), and the new data 3 is smaller than the initial median 5. The median of the sorted data, 4, is therefore one position to the left of the initial median 5, and the median floating value obtained by the server is -1.
The memory then receives new data 7. The sorted data becomes {3,4,5,6,7} and the median becomes 5. There are now 2 new data items (3 and 7) and 9 items on the disk, so the sum is 11 (odd), and the new data 7 is greater than the initial median 5. The median of the sorted data, 5, coincides with the initial median 5, so there is no position difference and the median floating value obtained by the server is 0.
The memory then receives new data 1. The sorted data becomes {1,3,4,5,6,7} and the median becomes 4. There are 3 new data items and 9 items on the disk, so the sum is 12 (even), and the new data 1 is smaller than the initial median 5. The median of the sorted data, 4, is again one position to the left of the initial median 5, so the median floating value obtained by the server is -1.
It can be seen that when the sum of the number of new data in the memory and the number of data in the ordered data set in the disk is odd and the new data is smaller than the median of the median data set, the median floating value does not change relative to the value determined for the previous new data. For example, the memory then receives new data 2; the sorted data becomes {1,2,3,4,5,6,7} and the median is 4. There are 4 new data items and 9 items on the disk, so the sum is 13 (odd), and the new data 2 is smaller than the initial median 5. The median 4 is still one position to the left of the initial median 5, so the median floating value obtained by the server remains -1.
Similarly, when that sum is even and the new data is greater than the median of the median data set, the median floating value obtained by the server does not change.
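The two-case rule of step S204 can be checked mechanically. In the hypothetical helper `float_delta` below, the arriving value is compared with the initial median; a value equal to the initial median is treated as lying to its right, which is an assumption about tie handling. The cross-check replays random input against the example's starting state (median data set {4,5,6}, 9 items on disk) and asserts that the rule always matches the direct index-difference computation.

```python
import bisect
import random


def float_delta(total_after: int, value: int, initial_median: int) -> int:
    """Change of the median floating value when `value` arrives and the
    disk-plus-new-data total becomes `total_after`."""
    if total_after % 2 == 1 and value >= initial_median:
        return 1   # odd total, value right of the initial median: median moves right
    if total_after % 2 == 0 and value < initial_median:
        return -1  # even total, value left of the initial median: median moves left
    return 0       # the other two cases leave the floating value unchanged


# Cross-check against direct index tracking on random input.
random.seed(1)
window, disk_count = [4, 5, 6], 9
memory, pos, f = list(window), 1, 0
for _ in range(1000):
    v = random.randint(0, 10)
    i = bisect.bisect_right(memory, v)  # ties go after existing equal values
    memory.insert(i, v)
    if i <= pos:                        # initial median shifted right by one
        pos += 1
    f += float_delta(disk_count + len(memory) - len(window), v, 5)
    assert f == (len(memory) - 1) // 2 - pos
print("rule agrees with index tracking")
```

Because the rule needs only the running total and one comparison, no re-sorting is required to maintain the floating value.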
In this embodiment, by updating the position difference between the median of the sorted data in the memory and the initial median, the server obtains a median floating value that represents the effect the new data will have on the median of the ordered data set in the disk once the new data is moved there. The median can therefore be located quickly when the new data is stored on the disk, improving processing efficiency. In addition, the server divides the determination of the median floating value into two cases according to whether the sum of the number of new data in the memory and the number of data in the ordered data set in the disk is odd or even, and how the new data compares with the initial median: in one case the median floating value changes and must be updated; in the other it does not, so it need not be recomputed. This simplifies determining the median floating value and further improves processing efficiency.
In an exemplary embodiment, the position difference between the median of the sorted data and the initial median in step S204 is obtained as follows:
determine a first data offset, which is the offset (in number of items) of the initial median relative to the first item of the sorted data in the memory; determine a second data offset, which is the offset of the median of the sorted data in the memory relative to the same first item; and take the difference between the second data offset and the first data offset as the position difference information.
Specifically, the server first determines the offset of the initial median relative to the first item of the sorted data in the memory and uses it as the first data offset. It then determines the offset of the median of the sorted data in the memory relative to the same first item and uses it as the second data offset. Finally, it takes the difference between the two offsets as the position difference information.
For example, when the data in the memory is {1,3,4,5,6,7}, the first data offset is 3 and the second data offset is 2, so the difference computed by the server is 2 - 3 = -1, giving position difference information of -1.
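The offset computation above reduces to two list positions. This minimal sketch assumes the memory data is a sorted Python list, that the initial median value occurs only once in it, and that the median of an even-sized list is its lower middle item; `position_difference` is a hypothetical name.

```python
def position_difference(sorted_memory: list[int], initial_median: int) -> int:
    """Second data offset minus first data offset."""
    first_offset = sorted_memory.index(initial_median)  # offset of the initial median
    second_offset = (len(sorted_memory) - 1) // 2       # offset of the current (lower) median
    return second_offset - first_offset


print(position_difference([1, 3, 4, 5, 6, 7], 5))  # -1
```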
In this embodiment, the server can quickly determine the position difference between the first data offset of the initial median and the second data offset of the median of the sorted data in the memory, and thereby obtain the change information of the median and hence the median floating value. When new data later moves to the ordered data set in the disk, the server can quickly locate the median of that ordered data set, further improving processing efficiency.
In an exemplary embodiment, step S108 (updating the median and the median data set of the ordered data set in the disk according to the median floating value when the new data in the memory is transferred to it) specifically includes: when the new data is transferred, determining the median of the ordered data set after the transfer from the median floating value and the median of the ordered data set before the transfer, and using it as the updated median of the ordered data set; and updating the median data set in the disk according to the updated median and the data immediately before and after it in the ordered data set after the transfer.
In step S108, after the median and the median data set of the ordered data set in the disk have been updated according to the median floating value, the method further includes: deleting the median data set in the memory.
Specifically, when the new data in the memory is transferred to the ordered data set in the disk, the server uses the median floating value and the median of the ordered data set before the transfer to locate the median of the ordered data set after the transfer, which becomes the updated median of the ordered data set. It then updates the median data set in the disk from the updated median and the data immediately before and after it, obtaining the updated median data set in the disk. Finally, it deletes the median data set in the memory.
For example, the server transfers the new data {3,7,1,2} in the memory to the ordered data set in the disk, obtaining the ordered data set {1,1,2,2,3,3,4,5,6,7,7,8,9}. From the median floating value -1 and the median 5 of the ordered data set before the transfer, it determines that the median after the transfer is the item immediately to the left of 5, namely 4, which becomes the updated median of the ordered data set. It then updates the median data set in the disk using 4 and the items 3 and 5 immediately before and after it, obtaining the updated median data set {3,4,5}. Finally, the server deletes the median data set {4,5,6} in the memory.
In this embodiment, the server updates the median of the ordered data set after the transfer using the median floating value and the median before the transfer, and updates the median data set in the disk based on the updated median. This yields the median data set that must be loaded into the memory next time to compute the median floating value, making it easy to determine how new data affects the median of the ordered data set in the disk. Meanwhile, by deleting the median data set in the memory, the server ensures that the old median data set does not interfere with new data subsequently stored in the memory.
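The update in step S108 can be sketched as follows. The disk is simulated here by a sorted Python list, and `update_disk` is a hypothetical helper: it inserts each new value, shifts the index of the old median whenever a value lands to its left, adds the median floating value, and slices out the new median data set. Ties are assumed to be inserted after equal existing values.

```python
import bisect


def update_disk(disk: list[int], median_idx: int, new_data: list[int],
                float_value: int, window_size: int) -> tuple[int, list[int]]:
    """Merge new data into the ordered data set (modified in place) and
    relocate the median from the floating value.
    Returns (new median index, new median data set)."""
    for v in new_data:
        i = bisect.bisect_right(disk, v)  # ties land after existing equal values
        disk.insert(i, v)
        if i <= median_idx:               # the old median moved one slot right
            median_idx += 1
    new_idx = median_idx + float_value    # apply the recorded floating value
    half = window_size // 2
    window = disk[max(0, new_idx - half): new_idx + half + 1]
    return new_idx, window


disk = list(range(1, 10))                 # {1,...,9}, median 5 at index 4
idx, window = update_disk(disk, 4, [3, 7, 1, 2], -1, 3)
print(disk[idx], window)                  # 4 [3, 4, 5]
```

With the inputs of the example above, the sketch reproduces the new median 4 and the median data set {3,4,5} without rescanning the merged data.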
In an exemplary embodiment, the data processing method further includes: obtaining the total amount of data in the memory as the amount of new data in the memory plus the amount of data in the median data set; and transferring the new data to the ordered data set in the disk when the total amount of data in the memory reaches the memory capacity threshold, or when the median floating value reaches the preset floating value threshold.
The preset floating value threshold bounds the range of the median floating value and is typically set to one half of the size of the median data set. If the memory keeps accepting new data after the median floating value has reached this threshold, the median floating value determined from subsequent new data can no longer be updated accurately. The new data in the memory should therefore be transferred to the ordered data set in the disk as soon as the median floating value reaches the threshold, that is, as soon as its absolute value exceeds one half of the size of the median data set.
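The two trigger conditions can be expressed as a single predicate. In this sketch the names are hypothetical, "reaching" the memory capacity threshold is interpreted as reaching or exceeding it, and the floating value threshold is taken as half the size of the median data set, compared against the absolute value of the median floating value.

```python
def should_transfer(new_count: int, window_count: int, capacity: int,
                    float_value: int) -> bool:
    """Return True when the new data in memory should be moved to disk."""
    memory_total = new_count + window_count     # new data plus median data set
    if memory_total >= capacity:                # memory capacity threshold reached
        return True
    return abs(float_value) > window_count / 2  # floating value threshold: half the window


print(should_transfer(5, 3, 100, -2))  # True
```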
For example, when the data in the memory is {1,2,3,4,5,6,7}, the median floating value recorded by the server is -1. The memory then receives {1,1,1} in sequence. After the first 1 is stored, the data in the memory becomes {1,1,2,3,4,5,6,7} and the median in the memory is 3, so the median floating value recorded by the server is -2. If the new data {1,1,2,3,7} were transferred to the disk at this point, the new ordered data set would be {1,1,1,2,2,3,3,4,5,6,7,7,8,9}; the updated median located from the median floating value is the second 3, which matches the median in the memory, so the median updated by the server is correct.
After the second 1 is stored, the data in the memory becomes {1,1,1,2,3,4,5,6,7}; the median in the memory is 3 and the median floating value recorded by the server is still -2. If the new data {1,1,1,2,3,7} were transferred to the disk at this point, the new ordered data set would be {1,1,1,1,2,2,3,3,4,5,6,7,7,8,9}; the updated median located from the median floating value is again the second 3, matching the median in the memory, so the median updated by the server is correct.
After the third 1 is stored, the data in the memory becomes {1,1,1,1,2,3,4,5,6,7}; the median in the memory is 2, so the median floating value recorded by the server is -3. If the server now transferred the new data {1,1,1,1,2,3,7} to the disk, the new ordered data set would be {1,1,1,1,1,2,2,3,3,4,5,6,7,7,8,9}; the updated median located from the median floating value would be the first 3, which differs from the median in the memory, so the server can no longer obtain the correct median from the median floating value.
If the server transfers new data to the disk without a correct median, the data may be placed at the wrong position. For example, according to the median 2 in the memory, the new data {1,1,1} should be stored before 2, but because the median located on the disk is the first 3, the new data {1,1,1} would end up stored between 2 and 3 on the disk. The server should therefore transfer the new data in the memory to the disk as soon as the first 1 has been stored, because the absolute value 2 of the median floating value -2 then exceeds 3/2, one half of the median data set size. This prevents the median floating value from growing further, which would otherwise make it impossible to locate the median of the ordered data set on the disk correctly from the floating value and would cause new data to be misplaced.
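The three scenarios above can be reproduced with a short script. `compare_medians` is a hypothetical helper that returns the median floating value, the median of the data in memory, and the median located on the merged disk data via the floating value. It assumes the lower-middle convention for even-sized sets and counts new values strictly smaller than the initial median as landing to its left.

```python
def compare_medians(disk: list[int], median_idx: int, half: int,
                    new_data: list[int]) -> tuple[int, int, int]:
    """Return (floating value, memory median, disk median located via the float)."""
    m0 = disk[median_idx]                               # initial median
    below = sum(1 for v in new_data if v < m0)          # new values left of the initial median
    window = disk[median_idx - half: median_idx + half + 1]
    memory = sorted(window + new_data)
    q = (len(memory) - 1) // 2                          # offset of the memory median
    f = q - (half + below)                              # median floating value
    merged = sorted(disk + new_data)
    return f, memory[q], merged[median_idx + below + f]


disk = list(range(1, 10))
for extra in ([1], [1, 1], [1, 1, 1]):
    print(compare_medians(disk, 4, 1, [3, 7, 1, 2] + extra))
# (-2, 3, 3)
# (-2, 3, 3)
# (-3, 2, 3)
```

The first two rows agree, while the third, with a floating value of -3, shows the two medians diverging as described above.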
Specifically, the server triggers the transfer of new data from the memory to the ordered data set in the disk in two cases: first, when the total amount of data in the memory, that is, the amount of new data plus the amount of data in the median data set, reaches the memory capacity threshold; and second, when the median floating value reaches the preset floating value threshold. For ease of understanding, this embodiment uses small values for the memory capacity threshold and the size of the median data set. In practical applications both are large, so the preset floating value threshold is also large; the server therefore does not trigger a transfer while the memory holds only a small amount of data, and frequent disk reads and writes are avoided.
In this embodiment, the two trigger conditions both prevent data loss caused by the total amount of data in the memory exceeding the memory capacity threshold, and prevent new data from being misplaced when an excessively large median floating value causes the median on the disk to be located incorrectly, thereby improving the accuracy of median determination during data processing.
In an exemplary embodiment, before the median of the ordered data set in the disk is taken as the median of the median data set in the disk in step S104, the method further includes: when a head identifier exists in the ordered data set in the disk, querying the data of the ordered data set for a tail identifier, where the head identifier marks the first item of the sorted data in the memory and the tail identifier marks the last item; and when the tail identifier exists in the ordered data set in the disk, confirming that the ordered data set in the disk was transferred successfully from the memory.
Accordingly, step S104 (taking the median of the ordered data set in the disk as the median of the median data set in the disk) specifically includes: when the ordered data set in the disk is one that was transferred successfully from the memory, taking its median as the median of the median data set in the disk.
The head identifier marks the first item of the sorted data in the memory; it can be understood as a tag, such as head, attached to the first item to indicate that it is the first. The tail identifier marks the last item of the sorted data in the memory; it can be understood as a tag, such as tail, attached to the last item. For example, if the data in the memory is {1,2,3,4,5,6,7,8,9}, the first item, 1, carries the head identifier and the last item, 9, carries the tail identifier.
Specifically, before using the median of the ordered data set in the disk as the median of the median data set in the disk, the server must also determine whether all of the sorted data in the memory was transferred to the disk successfully, as follows. The server first queries whether a head identifier exists in the ordered data set in the disk; if it does, the server then queries whether a tail identifier exists. If the tail identifier also exists, the sorted data in the memory was transferred successfully: the ordered data set in the disk is confirmed as one successfully transferred from the memory, and its median is then taken as the median of the median data set in the disk. If the server cannot find the head identifier or the tail identifier in the ordered data set in the disk, part of the sorted data in the memory failed to transfer and must be transferred again.
In this embodiment, the server determines from the query results for the head and tail identifiers of the ordered data set in the disk whether all of the sorted data in the memory was transferred successfully, which avoids data loss and median errors caused by partially failed transfers and improves the accuracy of median determination during data processing.
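One way to realize the head and tail identifiers is to tag the first and last records of a transferred file. The file layout (one value per line, followed by tab-separated tags) and the tag strings `head` and `tail` are assumptions for illustration only; the check follows the order described above, querying for the head identifier first and then for the tail identifier.

```python
import os
import tempfile

HEAD, TAIL = "head", "tail"  # hypothetical tag strings


def write_with_markers(path: str, ordered: list[int]) -> None:
    """Write one value per line, tagging the first and last values."""
    last = len(ordered) - 1
    with open(path, "w", encoding="utf-8") as f:
        for i, v in enumerate(ordered):
            tags = [t for t, on in ((HEAD, i == 0), (TAIL, i == last)) if on]
            f.write(f"{v}\t{','.join(tags)}\n")


def transfer_complete(path: str) -> bool:
    """Look for the head identifier, then for the tail identifier."""
    with open(path, encoding="utf-8") as f:
        tags = [line.rstrip("\n").partition("\t")[2].split(",") for line in f]
    if not any(HEAD in t for t in tags):
        return False                     # head missing: transfer must be redone
    return any(TAIL in t for t in tags)  # tail missing: transfer was partial


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "part.dat")
    write_with_markers(path, list(range(1, 10)))
    print(transfer_complete(path))       # True
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines[:-1])         # simulate a write that lost the last record
    print(transfer_complete(path))       # False
```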
In an exemplary embodiment, as shown in FIG. 3, another data processing method is provided. Taking its application to a server as an example, the method includes the following steps:
Step S301: when the amount of data in the memory reaches the memory capacity threshold, transfer the sorted data in the memory to the disk associated with the memory as the ordered data set in the disk.
Step S302: when a head identifier exists in the ordered data set in the disk, query its data for a tail identifier; when the tail identifier exists, confirm the ordered data set in the disk as one successfully transferred from the memory; in that case, take the median of the ordered data set in the disk as the median of the median data set in the disk.
Step S303: load the median data set in the disk into the memory; when new data is stored in the memory, sort the new data together with the data in the median data set in the memory, and update the median of the sorted data.
Step S304: when the sum of the number of new data in the memory and the number of data in the ordered data set in the disk is odd and the new data is greater than the median of the median data set, or when that sum is even and the new data is less than the median of the median data set, determine the position difference between the median of the sorted data and the initial median as the change information of the median of the data in the memory, and use the change information as the median floating value.
Here, the initial median is the median of the median data set in the memory.
Step S305: when the total amount of data in the memory reaches the memory capacity threshold, or when the median floating value reaches the preset floating value threshold, transfer the new data to the ordered data set in the disk.
Step S306: when transferring the new data in the memory to the ordered data set in the disk, determine the median of the ordered data set after the transfer from the median floating value and the median of the ordered data set before the transfer, and use it as the updated median of the ordered data set.
Step S307: update the median data set in the disk according to the updated median of the ordered data set after the transfer and the data immediately before and after that median, and delete the median data set in the memory.
Step S308: take the updated median data set in the disk as the median data set in the disk and return to step S303, repeating until no new data is stored in the memory.
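Putting steps S301 to S308 together, the following class is a minimal end-to-end sketch, not the patent's implementation. Assumptions: the "disk" is simulated by an in-memory sorted list, the median of an even-sized set is its lower middle item, the median floating value is maintained with the two-case rule of step S304 (comparing each new value with the initial median, ties counted as larger), the head/tail check of step S302 is omitted, and `window_size` and `capacity` are illustrative parameters. Because the floating value is added to the old median's shifted index, the median is relocated without recounting the whole data set.

```python
import bisect
import heapq
import random
import statistics


class MedianTracker:
    """Hypothetical sketch of steps S301-S308 with a simulated disk."""

    def __init__(self, window_size: int = 3, capacity: int = 8) -> None:
        self.half = window_size // 2     # items kept on each side of the median
        self.capacity = capacity         # memory capacity threshold
        self.disk: list[int] = []        # ordered data set "on disk"
        self.m = -1                      # index of the median in self.disk
        self.window: list[int] = []      # median data set loaded into memory
        self.new: list[int] = []         # new data held (sorted) in memory
        self.float_value = 0             # median floating value

    def add(self, value: int) -> None:
        bisect.insort(self.new, value)   # memory keeps its data ordered
        if self.m >= 0:                  # a median data set is in memory
            initial = self.disk[self.m]  # initial median
            total = len(self.disk) + len(self.new)
            if total % 2 == 1 and value >= initial:
                self.float_value += 1
            elif total % 2 == 0 and value < initial:
                self.float_value -= 1
        in_memory = len(self.window) + len(self.new)
        if in_memory >= self.capacity or abs(self.float_value) > len(self.window) / 2:
            self.flush()                 # either trigger condition of step S305

    def flush(self) -> None:
        """Move new data to disk and relocate the median from the floating value."""
        if not self.new:
            return
        if self.m < 0:                   # first transfer: memory becomes the disk set
            self.disk, self.m = self.new, (len(self.new) - 1) // 2
        else:
            initial = self.disk[self.m]
            left = sum(1 for v in self.new if v < initial)  # new items landing left of it
            self.disk = list(heapq.merge(self.disk, self.new))
            self.m += left + self.float_value  # shifted old median plus the float
        self.window = self.disk[max(0, self.m - self.half): self.m + self.half + 1]
        self.new, self.float_value = [], 0     # drop the old memory state

    def median(self) -> int:
        self.flush()
        return self.disk[self.m]


t = MedianTracker(window_size=3, capacity=9)
for v in [1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 7, 1, 2]:
    t.add(v)
print(t.median(), t.window)  # 4 [3, 4, 5]
```

The usage lines replay the earlier example and yield median 4 with median data set {3,4,5}. Keeping the new data sorted in memory means each transfer is a linear merge rather than a full re-sort.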
In this embodiment, on the one hand, the server obtains the median floating value by updating the position difference between the median of the sorted data in the memory and the initial median, so that when new data is written to the disk, the median can be located quickly from the median floating value, improving processing efficiency. On the other hand, the server divides the determination of the median floating value into two cases according to whether the sum of the number of new data in the memory and the number of data in the ordered data set in the disk is odd or even, and how the new data compares with the initial median: in one case the median floating value changes and must be updated; in the other it does not, so it need not be recomputed. This simplifies the determination of the median floating value and further improves processing efficiency. In addition, the server updates the median of the ordered data set after the new data is transferred by means of the median floating value, and updates the median data set in the disk based on the updated median. This yields the median data set to be loaded into the memory next and makes it easy to determine the effect of further new data on the median of the ordered data set in the disk. Meanwhile, the server deletes the median data set in the memory, so that the old median data set does not affect the processing of new data subsequently stored in the memory.
The server triggers the transfer of new data from the memory to the ordered data set in the disk under two conditions. This prevents data loss caused by the total amount of data in the memory exceeding the memory capacity threshold, and also prevents new data from being misplaced because the median on the disk is located incorrectly after the median floating value has drifted too far, improving the accuracy of median determination during data processing. Furthermore, the server checks the head and tail identifiers of the ordered data set in the disk to decide whether all of the sorted data in the memory was transferred successfully, which avoids data loss and median errors caused by partially failed transfers and further improves that accuracy. Throughout this process, the server makes good use of the computing speed of memory and the storage capacity of the disk: the median floating value, which captures how new data will shift the median of the ordered data set, is predicted and recorded while the new data is held in memory. When the new data is written to the disk, the median can therefore be updated directly from that value instead of being recomputed from all of the data on the disk, which improves processing efficiency.
To illustrate the data processing method provided in the embodiments of the present application more clearly, a specific embodiment is described below. In an exemplary embodiment, the present application further provides a low-resource-consumption median method for massive real-time data, shown in FIG. 4, which includes the following steps:
Step 1: Ingest the data into the memory.
Real-time data is ingested into the memory, which sorts each incoming value. When the data in the memory reaches a certain amount, indicating that the memory can hold no more, the full data set is transferred to the disk.
Step 2: Write the full data set to disk in order.
The server transfers the sorted full data set to the disk, ensuring that the data lands on disk in order. The data is persisted across multiple files on the disk, which makes it easier to locate the median quickly later. At the same time, based on the median recorded in the memory, the server records the storage position of the median on the disk, that is, its offset relative to the first item, as offset_α, and also records the median value itself.
Step 3: Load the median interval into the memory.
According to a user-defined median interval size β, the server loads the data in the offset interval around offset_α on the disk into the memory; the data set stored in the memory is denoted ω1.
Step 4: Track the floating position of the median in the memory.
New real-time data arriving in the memory is recorded as a new data set ω2 and compared with the data set ω1 in the memory, so as to track how far the position of the median in the memory floats up or down.
Step 5: Transfer the new data set ω2 in the memory to the disk.
When the median floating value exceeds one half of the number of items in ω1, so that subsequently ingested data could no longer be used to locate the median on the disk accurately, or when the amount of data in the memory becomes so large that the memory is about to run out of room for new data, the server writes the new data set ω2 in the memory to the disk.
Step 6: Locate the median of the data on the disk.
Using the median floating value, the server quickly updates the position of the median on the disk and obtains a new data set ω1 from the updated median. Steps 2 to 6 are then repeated.
In this embodiment, the server takes data locality into account and makes full use of the high computing performance of memory. It arranges the distribution of data between memory and disk sensibly, writes the full data set to disk in order, and plans memory and disk space as a whole. Because the position of the median is tracked as it floats up or down with newly ingested data, the median can be located quickly and the number of disk reads is reduced, yielding a low-resource-consumption median algorithm and improving the efficiency of median location during data processing. Moreover, because the data files on the disk are ordered, the offsets of the first and last items in each file can be recorded in advance in the file name, so the median can be relocated quickly after a power failure, server crash, program failure, or similar event.
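Recording each file's first and last offsets in its name makes it possible, after a restart, to find the file that holds a given global offset (such as the recorded median offset) with a binary search instead of reading file contents. The naming pattern `part_<first>_<last>.dat` and the helper `locate` are hypothetical illustrations of that idea, not a format specified by the method.

```python
import bisect


def locate(file_names: list[str], offset: int) -> tuple[str, int]:
    """Return (file name, local index) of the item at a global offset.
    File names are assumed to look like 'part_<first>_<last>.dat'."""
    ranges = []
    for name in file_names:
        first, last = name.removesuffix(".dat").split("_")[1:]
        ranges.append((int(first), int(last), name))
    ranges.sort()                                  # order files by first offset
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, offset) - 1    # last file starting at or before offset
    if i < 0 or offset > ranges[i][1]:
        raise ValueError(f"offset {offset} is not covered by any file")
    return ranges[i][2], offset - ranges[i][0]


files = ["part_5_9.dat", "part_0_4.dat", "part_10_12.dat"]
print(locate(files, 6))  # ('part_5_9.dat', 1)
```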
It should be understood that although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a data processing apparatus for implementing the data processing method described above. The solution implemented by the apparatus is similar to that described for the method, so for the specific limitations of the one or more embodiments of the data processing apparatus provided below, reference may be made to the limitations of the data processing method above; they are not repeated here.
In an exemplary embodiment, as shown in fig. 5, a data processing apparatus is provided, including a memory data transfer module 502, a median data confirmation module 504, a floating information confirmation module 506, and a median data update module 508, where:
The memory data transfer module 502 is configured to transfer the ordered data in memory to a disk associated with the memory, as an ordered data set on the disk, when the amount of data in memory meets a memory capacity threshold.
The median data confirmation module 504 is configured to confirm the median of the ordered data set on the disk as the median of the median data set on the disk, the median data set on the disk being determined from the median of the ordered data set and the data before and after that median.
The floating information confirmation module 506 is configured to store the median data set from the disk in memory, update the median of the data in memory when new data is stored in memory, and take the change in that median as the median floating value.
The median data update module 508 is configured to, when the new data in memory is transferred to the ordered data set on the disk, update the median and the median data set of the ordered data set on the disk according to the median floating value, take the updated median data set as the median data set on the disk, and return to storing the median data set from the disk in memory, updating the median of the data in memory when new data is stored, and taking the change in that median as the median floating value, until no new data is stored in memory.
In an exemplary embodiment, the floating information confirmation module 506 is further configured to: when new data is stored in memory, sort the new data together with the data of the median data set in memory and update the median of the sorted data; when the sum of the number of new data items in memory and the number of data items in the ordered data set on disk is odd and the new data is greater than the median of the median data set, or when that sum is even and the new data is less than the median of the median data set, take the position difference between the median of the sorted data and the initial median as the change in the median of the data in memory, where the initial median is the median of the median data set in memory; and use this change as the median floating value.
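The parity condition above matches the behavior of the lower median, that is, the element at sorted index (n - 1) // 2, when n counts the data after the insertion; the patent does not name the convention, so that reading is an assumption. A minimal sketch of the rule (the function name is hypothetical):

```python
def median_shift(total_after: int, new_value, median) -> int:
    """Direction in which one insertion moves the lower median of a sorted sequence.

    total_after counts all data after the insertion (new data plus the ordered data set).
    Returns +1 (moves to the next larger element), -1 (next smaller) or 0 (unchanged).
    """
    if total_after % 2 == 1 and new_value > median:
        return 1   # odd total: a larger value pushes the lower median up one position
    if total_after % 2 == 0 and new_value < median:
        return -1  # even total: a smaller value pulls the lower median down one position
    return 0       # otherwise the old median is still the lower median
```

For example, with data [1, 3, 5] (median 3), inserting 4 gives an even total of 4 with 4 > 3, so the rule reports no change, and the lower median of [1, 3, 4, 5] is indeed still 3.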
In an exemplary embodiment, the floating information confirmation module 506 is further configured to determine a first data offset of the initial median, namely the offset, in number of data items, of the initial median relative to the first item of the sorted data in memory; determine a second data offset, namely the offset, in number of data items, of the median of the sorted data in memory relative to the first item of the sorted data; and take the difference between the first and second data offsets as the position difference information.
In an exemplary embodiment, the median data update module 508 is further configured to, when new data in memory is transferred to the ordered data set on disk, determine the median of the ordered data set after the transfer, as the updated median, from the median floating value and the median of the ordered data set before the transfer; update the median data set on disk from the updated median and the data before and after it in the ordered data set after the transfer; and delete the median data set in memory.
In an exemplary embodiment, the memory data transfer module 502 is further configured to obtain the total amount of data in memory from the amount of new data in memory and the amount of data in the median data set, and to transfer the new data to the ordered data set on disk when the total amount of data in memory meets the memory capacity threshold or when the median floating value meets a preset floating value threshold.
In an exemplary embodiment, the data processing apparatus further includes a data transfer checking module, configured to query the data of the ordered data set on disk for a tail identifier when a head identifier exists in that data set, where the head identifier marks the first item of the ordered data in memory and the tail identifier marks the last item; and, when the tail identifier exists in the ordered data set on disk, to confirm that data set as one successfully transferred from memory.
The median data confirmation module 504 is further configured to confirm the median of the ordered data set on disk as the median of the median data set on disk when the ordered data set on disk is one successfully transferred from memory.
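The transfer check can be sketched as follows. The text format, the marker strings `#HEAD` and `#TAIL`, and the helper names are hypothetical; the patent only states that the first and last items of each transferred run carry head and tail identifiers, modeled here as separate marker lines before the first value and after the last value. A run that has the head marker but no tail marker is treated as an interrupted transfer and is not used to locate the median.

```python
HEAD = "#HEAD"  # hypothetical marker placed before the first value of a run
TAIL = "#TAIL"  # hypothetical marker placed after the last value of a run


def encode_run(values):
    """Serialize one ordered run as text lines framed by head and tail markers."""
    return [HEAD] + [str(v) for v in values] + [TAIL]


def is_complete(lines) -> bool:
    """Accept a run only if it starts with HEAD and ends with a TAIL marker."""
    it = (line.rstrip("\n") for line in lines)
    if next(it, None) != HEAD:
        return False  # no head marker: not a run written by the transfer step
    last = None
    for last in it:   # walk to the final line of the run
        pass
    return last == TAIL  # tail missing: the transfer was interrupted
```

Because a file object iterates over its lines, a run stored on disk can be checked with `with open(path) as f: ok = is_complete(f)`.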
Each module in the above data processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in the form of hardware, or stored in a memory of the computer device in the form of software, so that the processor can call them and perform the operations corresponding to each module.
In an exemplary embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the I/O interface are connected through a system bus, and the communication interface is connected to the system bus through the I/O interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data to be processed. The I/O interface exchanges information between the processor and external devices. The communication interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements a data processing method.
Those skilled in the art will understand that the structure shown in fig. 6 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In an exemplary embodiment, a computer device is further provided, including a memory and a processor. The memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an exemplary embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the above method embodiments.
In an exemplary embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Those of ordinary skill in the art will understand that all or part of the processes of the above methods may be implemented by a computer program. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, a database, or another medium used in the embodiments provided in the present application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided in the present application may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in the present application may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or quantum-computing-based data processing logic units.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and although they are described in specific detail, they should not be construed as limiting the scope of the present application. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A data processing method, the method comprising:
transferring ordered data in a memory to a disk associated with the memory, as an ordered data set on the disk, when the amount of data in the memory meets a memory capacity threshold;
determining the median of the ordered data set on the disk, and determining a median data set on the disk according to the median of the ordered data set, the data before and after that median, and a preset median data set size;
storing the median data set from the disk in the memory; when new data is stored in the memory, updating the median of the data in the memory and taking the change in the median of the data in the memory as a median floating value; and
when the new data in the memory is transferred to the ordered data set on the disk, updating the median and the median data set of the ordered data set on the disk according to the median floating value, taking the updated median data set as the median data set on the disk, and returning to the step of storing the median data set from the disk in the memory, updating the median of the data in the memory when new data is stored in the memory, and taking the change in that median as the median floating value, until no new data is stored in the memory.
2. The method according to claim 1, wherein updating the median of the data in the memory when new data is stored in the memory, and taking the change in the median of the data in the memory as the median floating value, comprises:
when new data is stored in the memory, sorting the new data together with the data of the median data set in the memory, and updating the median of the sorted data;
when the sum of the number of new data items in the memory and the number of data items in the ordered data set on the disk is odd and the new data is greater than the median of the median data set, or when that sum is even and the new data is less than the median of the median data set, determining position difference information between the median of the sorted data and an initial median as the change in the median of the data in the memory, the initial median being the median of the median data set in the memory; and
taking the change information as the median floating value.
3. The method according to claim 2, wherein the position difference information between the median of the sorted data and the initial median is obtained by:
determining a first data offset of the initial median, the first data offset being the offset, in number of data items, of the initial median relative to the first item of the sorted data in the memory;
determining a second data offset of the median of the sorted data in the memory, the second data offset being the offset, in number of data items, of that median relative to the first item of the sorted data in the memory; and
determining the difference between the first data offset and the second data offset as the position difference information.
4. The method according to claim 1, wherein updating the median and the median data set of the ordered data set on the disk according to the median floating value, when the new data in the memory is transferred to the ordered data set on the disk, comprises:
when the new data in the memory is transferred to the ordered data set on the disk, determining the median of the ordered data set after the transfer, as the updated median of the ordered data set, according to the median floating value and the median of the ordered data set before the transfer;
updating the median data set on the disk according to the updated median of the ordered data set after the transfer and the data before and after the updated median in that ordered data set;
and wherein updating the median and the median data set of the ordered data set on the disk according to the median floating value, when the new data in the memory is transferred to the ordered data set on the disk, further comprises:
deleting the median data set in the memory.
5. The method according to claim 1, further comprising:
obtaining the total amount of data in the memory according to the amount of the new data in the memory and the amount of data in the median data set; and
transferring the new data to the ordered data set on the disk when the total amount of data in the memory meets the memory capacity threshold or when the median floating value meets a preset floating value threshold.
6. The method according to any one of claims 1 to 5, wherein before determining the median of the ordered data set on the disk and determining the median data set on the disk according to the median of the ordered data set, the data before and after that median, and the preset median data set size, the method further comprises:
when a head identifier exists in the ordered data set on the disk, querying the data of the ordered data set for a tail identifier, the head identifier being the identifier of the first item of the ordered data in the memory and the tail identifier being the identifier of the last item of the ordered data in the memory;
when the tail identifier exists in the ordered data set on the disk, determining that the ordered data set on the disk is an ordered data set successfully transferred from the memory;
and wherein determining the median of the ordered data set on the disk, and determining the median data set on the disk according to the median of the ordered data set, the data before and after that median, and the preset median data set size, comprises:
when the ordered data set on the disk is an ordered data set successfully transferred from the memory, determining the median of the ordered data set on the disk, and determining the median data set on the disk according to the median of the ordered data set, the data before and after that median, and the preset median data set size.
7. A data processing apparatus, the apparatus comprising:
a memory data transfer module, configured to transfer ordered data in a memory to a disk associated with the memory, as an ordered data set on the disk, when the amount of data in the memory meets a memory capacity threshold;
a median data confirmation module, configured to determine the median of the ordered data set on the disk, and determine a median data set on the disk according to the median of the ordered data set, the data before and after that median, and a preset median data set size;
a floating information confirmation module, configured to store the median data set from the disk in the memory, update the median of the data in the memory when new data is stored in the memory, and take the change in the median of the data in the memory as a median floating value; and
a median data update module, configured to, when the new data in the memory is transferred to the ordered data set on the disk, update the median and the median data set of the ordered data set on the disk according to the median floating value, take the updated median data set as the median data set on the disk, and return to storing the median data set from the disk in the memory, updating the median of the data in the memory when new data is stored in the memory, and taking the change in that median as the median floating value, until no new data is stored in the memory.
8. The apparatus according to claim 7, wherein the floating information confirmation module is further configured to: when new data is stored in the memory, sort the new data together with the data of the median data set in the memory and update the median of the sorted data; when the sum of the number of new data items in the memory and the number of data items in the ordered data set on the disk is odd and the new data is greater than the median of the median data set, or when that sum is even and the new data is less than the median of the median data set, determine position difference information between the median of the sorted data and an initial median as the change in the median of the data in the memory, the initial median being the median of the median data set in the memory; and take the change information as the median floating value.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202310204662.XA 2023-03-03 2023-03-03 Data processing method, device, computer equipment and storage medium Active CN116414733B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310204662.XA CN116414733B (en) 2023-03-03 2023-03-03 Data processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310204662.XA CN116414733B (en) 2023-03-03 2023-03-03 Data processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116414733A CN116414733A (en) 2023-07-11
CN116414733B CN116414733B (en) 2024-02-20

Family

ID=87052337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310204662.XA Active CN116414733B (en) 2023-03-03 2023-03-03 Data processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116414733B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107544753A (en) * 2017-07-26 2018-01-05 Alibaba Group Holding Ltd. Data processing method, device and server
CN110019353A (en) * 2017-09-15 2019-07-16 Beijing Gridsum Technology Co., Ltd. Data processing method and device
CN111694703A (en) * 2019-03-13 2020-09-22 Alibaba Group Holding Ltd. Cache region management method and device and computer equipment
CN112988064A (en) * 2021-02-09 2021-06-18 Huazhong University of Science and Technology Concurrent multitasking-oriented disk image processing method
CN113296686A (en) * 2020-05-29 2021-08-24 Alibaba Group Holding Ltd. Data processing method, device, equipment and storage medium
WO2021179211A1 (en) * 2020-03-11 2021-09-16 Shenzhen Huantai Technology Co., Ltd. Method and apparatus for determining data integrity, and electronic device and storage medium
CN114724725A (en) * 2022-03-31 2022-07-08 Yidu Cloud (Beijing) Technology Co., Ltd. Data processing method and device, electronic equipment and storage medium

Non-Patent Citations (3)

Title
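Novel algorithms for computing medians and other quantiles of disk-resident data; L. Fu et al.; Proceedings 2001 International Database Engineering and Applications Symposium; full text *
Research on in-memory multi-core parallel query optimization techniques; Jiao Min et al.; Chinese Journal of Computers; Vol. 37, No. 9; full text *
Association rule mining algorithm based on disk-table storage of FP-tree; Shen Yan; Song Shunlin; Zhu Yuquan; Journal of Computer Research and Development (06); full text *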

Also Published As

Publication number Publication date
CN116414733A (en) 2023-07-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant