CN113536075B - Data extraction method, device and storage medium - Google Patents

Publication number
CN113536075B
Authority
CN
China
Prior art keywords
filtering
information
data
target column
storage device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110818362.1A
Other languages
Chinese (zh)
Other versions
CN113536075A (en
Inventor
郑宁
王晋强
蔡雪梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ruizhe Hangzhou Technology Co ltd
Original Assignee
Ruizhe Hangzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ruizhe Hangzhou Technology Co ltd filed Critical Ruizhe Hangzhou Technology Co ltd
Priority to CN202110818362.1A priority Critical patent/CN113536075B/en
Publication of CN113536075A publication Critical patent/CN113536075A/en
Application granted granted Critical
Publication of CN113536075B publication Critical patent/CN113536075B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9035Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data extraction method, a device and a storage medium. In the method, a host side first obtains a reference column from a computable storage device, determines filtering information according to the content of the reference column and a selection condition, and then sends a filtering request to the computable storage device. The filtering request carries the filtering information and instructs the computable storage device to filter a target column based on the filtering information. After the filtering is completed, the host side receives the filtered data from the computable storage device. Because the target column is filtered inside the computable storage device and only the filtered data is returned to the host side, the occupation of IO bandwidth and memory bandwidth in the system is reduced, as is the computing pressure on the host side, such as the consumption of the central processing unit (CPU).

Description

Data extraction method, device and storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a data extraction method, apparatus, and storage medium.
Background
With the rapid development of the internet and big data, data is being generated and consumed at an unprecedented speed, and the amount of data that needs to be processed has multiplied. Current data analysis engines, such as various databases, mostly store data in a column-oriented (column-store) format.
Currently, in data analysis on column stores, a host loads all columns participating in the analysis from a storage device into memory, performs column filtering on the host, and then carries out subsequent processing. This mode of data analysis increases the occupation of IO bandwidth and memory bandwidth in the system and puts computing pressure on the CPU at the host side.
Disclosure of Invention
The application provides a data extraction method, a data extraction device and a storage medium, which are used for reducing the occupation of IO bandwidth and memory bandwidth in a system and reducing the calculation pressure of a host side.
In a first aspect, the present application provides a data extraction method, applied to a host side, including:
obtaining a reference column from a computable storage device;
determining filtering information according to the content of the reference column and a selection condition;
sending a filtering request to the computable storage device, wherein the filtering request carries the filtering information and is used for instructing the computable storage device to filter a target column based on the filtering information; and
receiving filtered data from the computable storage device.
In one possible implementation, before sending the filtering request to the computable storage device, the method further includes: querying storage information of the target column in the computable storage device, wherein the storage information includes a logical address of the target column and an address offset and a length of the target column in the logical address; and generating the filtering request according to the storage information and the filtering information.
In one possible implementation, generating the filtering request according to the storage information and the filtering information includes: determining the data volume of the target column after filtering according to the size of the target column and the filtering information; applying, at the host side, for a memory space of the same size as the data volume; and generating the filtering request according to the storage information, the filtering information, and the address and size of the memory space.
In one possible implementation, before receiving the filtered data from the computable storage device, the method further includes: determining the data volume of the target column after filtering according to the size of the target column and the filtering information; and applying, at the host side, for a memory space of the same size as the data volume.
In a second aspect, the present application provides a data extraction method, applied to a computable storage device, including:
receiving a filtering request from a host side, wherein the filtering request carries filtering information and is used for instructing the computable storage device to filter a target column based on the filtering information, and the filtering information is determined according to the content of a reference column and a selection condition;
filtering the target column based on the filtering information; and
sending the filtered data to the host side.
In one possible implementation, the filtering request further includes storage information, wherein the storage information includes a logical address of the target column and an address offset and a length of the target column in the logical address. Before filtering the target column based on the filtering information, the method further includes: determining the data block corresponding to the logical address of the target column; and locating the target column in the data block based on the address offset and length of the target column in the logical address.
In one possible implementation, the filtering request further includes the size of the memory space applied for at the host side. Before the filtered data is sent to the host side, the data extraction method may further include: determining that the size of the filtered data is smaller than or equal to the size of the memory space.
In one possible implementation, the data extraction method may further include: if the size of the filtered data is larger than the size of the memory space, deleting the filtered data and returning an error signal to the host side, wherein the error signal indicates that the memory space applied for by the host side is insufficient.
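The size check in the two implementations above can be illustrated with a minimal Python sketch. The exception name `InsufficientHostMemory` and the function `send_if_fits` are hypothetical names chosen for illustration; the application does not specify how the error signal is encoded, so an exception stands in for it here.

```python
class InsufficientHostMemory(Exception):
    """Hypothetical stand-in for the error signal returned to the host side."""


def send_if_fits(filtered: bytes, host_buffer_size: int) -> bytes:
    """Return the filtered data only if it fits the memory space reserved by the host."""
    if len(filtered) > host_buffer_size:
        # The filtered data is discarded and the host is told its buffer is too small.
        raise InsufficientHostMemory(
            f"need {len(filtered)} bytes, host reserved {host_buffer_size}"
        )
    return filtered


ok = send_if_fits(b"\x00" * 12, host_buffer_size=12)
```

In this sketch a result of exactly the reserved size is accepted, matching the "smaller than or equal to" condition in the text.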
In a third aspect, the present application provides a data extraction device, applied to a host side, including:
an acquisition module, configured to acquire a reference column from a computable storage device;
a determining module, configured to determine filtering information according to the content of the reference column and a selection condition;
a sending module, configured to send a filtering request to the computable storage device, wherein the filtering request carries the filtering information and is used for instructing the computable storage device to filter a target column based on the filtering information; and
a receiving module, configured to receive filtered data from the computable storage device.
In one possible implementation, the determining module is further configured to: query storage information of the target column in the computable storage device, wherein the storage information includes a logical address of the target column and an address offset and a length of the target column in the logical address; and generate the filtering request according to the storage information and the filtering information.
In one possible implementation, the determining module is specifically configured to: determine the data volume of the target column after filtering according to the size of the target column and the filtering information; apply, at the host side, for a memory space of the same size as the data volume; and generate the filtering request according to the storage information, the filtering information, and the address and size of the memory space.
In one possible implementation, the determining module is further configured to: determine the data volume of the target column after filtering according to the size of the target column and the filtering information; and apply, at the host side, for a memory space of the same size as the data volume.
In a fourth aspect, the present application provides a data extraction device, applied to a computable storage device, including:
a receiving module, configured to receive a filtering request from a host side, wherein the filtering request carries filtering information and is used for instructing the computable storage device to filter a target column based on the filtering information, and the filtering information is determined according to the content of a reference column and a selection condition;
a filtering module, configured to filter the target column based on the filtering information; and
a sending module, configured to send the filtered data to the host side.
In one possible implementation, the device further includes a determining module, configured to: determine the data block corresponding to the logical address of the target column; and locate the target column in the data block based on the address offset and length of the target column in the logical address.
In one possible implementation, the filtering request further includes the size of the memory space applied for at the host side, and the determining module is further configured to: determine that the size of the filtered data is smaller than or equal to the size of the memory space.
In one possible implementation, the sending module is further configured to: if the size of the filtered data is larger than the size of the memory space, delete the filtered data and return an error signal to the host side, wherein the error signal indicates that the memory space applied for by the host side is insufficient.
In a fifth aspect, the present application provides a data extraction system comprising:
a computable storage device and a host side, wherein the host side is configured to perform the data extraction method according to the first aspect, and the computable storage device is configured to perform the data extraction method according to the second aspect.
In a sixth aspect, the present application provides an electronic device, comprising:
a memory and a processor;
the memory is used for storing program instructions;
the processor is configured to invoke the program instructions in the memory to perform the data extraction method of the first aspect.
In a seventh aspect, the present application provides a computer readable storage medium, in which computer program instructions are stored, which when executed implement the data extraction method of the first aspect.
In an eighth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the data extraction method of the first aspect.
The application provides a data extraction method, a device and a storage medium. A host side first acquires a reference column from a computable storage device, then determines filtering information according to the content of the reference column and a selection condition, and sends a filtering request to the computable storage device. The filtering request carries the filtering information and instructs the computable storage device to filter a target column based on the filtering information. The host side then receives the filtered data from the computable storage device. Because the target column is filtered inside the computable storage device and only the filtered data is returned to the host side, the occupation of IO bandwidth and memory bandwidth in the system is reduced, as is the computing pressure on the host side, such as the consumption of the central processing unit (CPU).
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions of the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a flowchart of a data extraction method according to an embodiment of the present application;
FIG. 3 is an exemplary diagram of generating filtering information based on a single reference column provided by an embodiment of the present application;
FIG. 4 is an exemplary diagram of generating filtering information based on two reference columns provided by an embodiment of the present application;
FIG. 5 is an exemplary diagram describing target column data information provided by an embodiment of the present application;
FIG. 6 is a flowchart of a data extraction method according to another embodiment of the present application;
FIG. 7 is an exemplary diagram of a computable storage device encapsulating a filtering result in an embodiment of the present application;
FIG. 8 is a flowchart of a data extraction method according to another embodiment of the present application;
FIG. 9 is a schematic diagram of a data extraction device according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a data extraction device according to another embodiment of the present application;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
First, some technical terms related to the present application will be explained:
The central processing unit (CPU) is the operation and control core of a computer system and the final execution unit for information processing and program running. Since its introduction, the CPU has developed greatly in logic structure, operating efficiency and functional extension.
At present, the analysis of column-stored data generally involves multiple columns, while the selection condition usually targets one particular column. That is, the database engine reads every column involved in the analysis into memory, filters according to the selection condition on that particular column, and then extracts the corresponding data from the other columns according to the filtering result. Loading every column in full before filtering increases the occupation of IO bandwidth and memory bandwidth and puts computing pressure on the CPU at the host side.
Based on the above problems, embodiments of the present application provide a data extraction method, device and storage medium, which generate filtering information according to a reference column, use the computing capability of the computable storage device itself to filter the target column inside the computable storage device, and return the filtered result to the host side, thereby reducing the occupation of IO bandwidth and memory bandwidth in the system and reducing CPU consumption at the host side.
FIG. 1 is a schematic view of an application scenario provided in an embodiment of the present application. As shown in FIG. 1, in this application scenario the host side 110 includes a CPU and a memory. When data processing is needed, the host side 110 first extracts a reference column from the computable storage device 120 and stores it in the memory; the CPU then determines filtering information from the reference column and the selection condition; the host side 110 then applies for a memory space for storing the returned result according to the filtering information, and sends a filtering request to the computable storage device 120.
The computable storage device 120 includes a computing unit, a cache, a storage controller, and a storage medium. Specifically, the computing unit is configured to extract data from the target column according to the filtering information; the cache is configured to temporarily store the data extracted and filtered from the target column in the computable storage device 120; and the storage controller is configured to query the target column from the storage medium and send the filtered data to the host side 110. In summary, when data processing is required, the host side 110 sends a filtering request to the computable storage device 120, the computable storage device 120 performs the data filtering, and once the filtering operation is completed the filtered data is returned to the host side 110.
It should be noted that fig. 1 is only a schematic diagram of an application scenario provided by an embodiment of the present application, and the embodiment of the present application does not limit the devices included in fig. 1 or limit the positional relationship between the devices in fig. 1. For example, in the application scenario shown in fig. 1, the computable storage device 120 may be an external memory with respect to the host side 110, or an internal memory integrated into the host side 110.
The technical scheme of the application is described in detail through specific embodiments. It should be noted that the following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 2 is a flowchart of a data extraction method according to an embodiment of the present application. The method is applied to a host side. As shown in fig. 2, the data extraction method includes the steps of:
S201, acquiring a reference column from the computable storage device.
In practical applications, when the host side has a need to obtain multiple columns of data from the computable storage device, for example, data analysis, the host side will first read the reference columns stored in the storage medium of the computable storage device into the memory of the host side.
The reference column is the column whose content is evaluated against the selection condition; the application does not limit how this column is named.
S202, determining filtering information according to the content of the reference column and the selection condition.
The filtering information may be determined at the element level or at the element-group level. Typically, when the amount of data is very large, the filtering information is determined at the element-group level to speed up data filtering. Illustratively, when the filtering information is determined at the element level, each selected element is marked 1 and each unselected element is marked 0; when it is determined at the element-group level, if any element in an element group is selected, all elements of that element group are selected.
Illustratively, FIG. 3 is an exemplary diagram of generating filtering information based on a single reference column provided by an embodiment of the present application. FIG. 3 shows filtering information generated at different granularities (element level and element-group level). The selection condition on the reference column is that the element value is greater than 4; in the generated filtering information, 1 indicates selected and 0 indicates not selected.
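The two granularities can be sketched in Python as follows. This is only an illustration under stated assumptions: the column values, the group size of 2 and the function names are hypothetical and are not taken from FIG. 3; only the selection condition "greater than 4" comes from the text.

```python
from typing import Callable, List


def element_level_filter(column: List[int], predicate: Callable[[int], bool]) -> List[int]:
    """One bit per element: 1 if the element satisfies the selection condition."""
    return [1 if predicate(value) else 0 for value in column]


def group_level_filter(
    column: List[int], predicate: Callable[[int], bool], group_size: int
) -> List[int]:
    """One bit per element group: 1 if any element in the group is selected."""
    bits = []
    for start in range(0, len(column), group_size):
        group = column[start:start + group_size]
        bits.append(1 if any(predicate(value) for value in group) else 0)
    return bits


reference_column = [3, 7, 1, 2, 9, 5, 4, 0]  # hypothetical values


def greater_than_4(value: int) -> bool:
    return value > 4


element_bits = element_level_filter(reference_column, greater_than_4)
group_bits = group_level_filter(reference_column, greater_than_4, group_size=2)
```

Group-level information is shorter (one bit per group instead of one per element), at the cost of also returning unselected elements that share a group with a selected one.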
Optionally, the reference column may be one column or multiple columns. When there are multiple reference columns, the filtering information is generated according to the logical relationship between the per-column selection conditions. For example, the data in a reference column may represent a student number, a student's score in a subject, or the like; when there are multiple reference columns, they may represent a student's scores in multiple subjects. The selection condition may then be a logical combination of conditions on several subjects, such as a mathematics score above 100 and an English score above 110, or a Chinese-language score below 120 and a physics score above 140. In a specific implementation, the selection conditions are determined first, and the specific reference columns are then determined according to the selection conditions.
Illustratively, FIG. 4 is an exemplary diagram of generating filtering information based on two reference columns provided by an embodiment of the present application. FIG. 4 shows element-level filtering information generated from two reference columns under two different multi-column selection conditions, namely "column 1 is greater than 4 and column 2 is less than 8" and "column 1 is greater than 4 or column 2 is less than 5". In this case the filtering information is generated according to the logical relationship among the selection conditions. For instance, column 1 may represent the number of times an employee was rated excellent in one year, and column 2 may represent the employee's overall ranking. Under the first selection condition, the employees screened out are those who were rated excellent more than 4 times in one year and whose overall ranking is numerically below 8.
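A minimal Python sketch of combining per-column conditions is given below. The two selection conditions are the ones quoted in the text; the column values and helper names are hypothetical and are not the values drawn in FIG. 4.

```python
from typing import List

column_1 = [5, 2, 6, 9, 3]  # hypothetical: times rated excellent in one year
column_2 = [3, 4, 9, 7, 6]  # hypothetical: overall ranking


def bits_and(a: List[int], b: List[int]) -> List[int]:
    """Element-wise logical AND of two element-level bitmaps."""
    return [x & y for x, y in zip(a, b)]


def bits_or(a: List[int], b: List[int]) -> List[int]:
    """Element-wise logical OR of two element-level bitmaps."""
    return [x | y for x, y in zip(a, b)]


col1_gt_4 = [1 if v > 4 else 0 for v in column_1]
col2_lt_8 = [1 if v < 8 else 0 for v in column_2]
col2_lt_5 = [1 if v < 5 else 0 for v in column_2]

# "column 1 is greater than 4 and column 2 is less than 8"
and_filter = bits_and(col1_gt_4, col2_lt_8)
# "column 1 is greater than 4 or column 2 is less than 5"
or_filter = bits_or(col1_gt_4, col2_lt_5)
```

Each per-column bitmap is computed independently, so arbitrary combinations of conditions reduce to bitwise operations on bitmaps of equal length.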
S203, sending a filtering request to the computable storage device.
The filtering request carries filtering information, and the filtering request is used for indicating the computable storage device to filter the target column based on the filtering information.
In addition, the target column is a column in which specific contents are required to be extracted according to the filter information, for example, in the above embodiment, the target column is the name of an employee.
In the application, after the host generates the filtering information, the host sends the filtering request to the computable storage device, and the computable storage device completes the filtering.
S204, receiving filtered data from the computable storage device.
In the embodiment of the application, a host side firstly acquires a reference column from a computable storage device, then determines filtering information according to the content and the selection condition of the reference column, and then sends a filtering request to the computable storage device, wherein the filtering request carries the filtering information, the filtering request is used for indicating the computable storage device to filter a target column based on the filtering information, and after filtering, the host side receives filtered data from the computable storage device. By filtering the target column in the computable storage device, the computable storage device returns the filtered data to the host side, so as to reduce the occupation of IO bandwidth and memory bandwidth in the system and reduce the calculation pressure of the host side, such as the consumption of a Central Processing Unit (CPU).
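Steps S201 to S204 can be sketched end to end in Python. The class `FakeComputableStorage` is a hypothetical in-memory stand-in for a computable storage device, and its method names, the column names and the data are assumptions made for illustration; a real device would be reached through a storage interface that the application does not detail.

```python
from typing import Callable, Dict, List


class FakeComputableStorage:
    """Hypothetical in-memory stand-in for a computable storage device."""

    def __init__(self, columns: Dict[str, list]):
        self._columns = columns

    def read_column(self, name: str) -> list:
        # Plain read, used by the host to fetch the reference column.
        return list(self._columns[name])

    def handle_filter_request(self, target: str, filter_bits: List[int]) -> list:
        # Filtering runs "inside the device"; only selected elements are returned.
        return [v for v, bit in zip(self._columns[target], filter_bits) if bit]


def extract(
    device: FakeComputableStorage,
    reference: str,
    target: str,
    condition: Callable[[int], bool],
) -> list:
    reference_column = device.read_column(reference)                 # S201
    filter_bits = [1 if condition(v) else 0 for v in reference_column]  # S202
    return device.handle_filter_request(target, filter_bits)         # S203 and S204


device = FakeComputableStorage(
    {"excellent_count": [5, 2, 6, 1], "name": ["Ann", "Bo", "Cy", "Di"]}
)
result = extract(device, "excellent_count", "name", lambda v: v > 4)
```

Only the reference column and the already-filtered target data cross the host/device boundary in this sketch, which is the source of the bandwidth saving described above.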
On the basis of the foregoing embodiment, further, before the filtering request is sent to the computable storage device in S203, the data extraction method may further include: querying storage information of the target column in the computable storage device, wherein the storage information includes a logical address of the target column and the address offset and length of the target column in the logical address; and generating the filtering request according to the storage information and the filtering information.
As described above, after the filtering information is determined in S202 and before the filtering request is generated, the storage information of the target column in the computable storage device needs to be queried. The storage information includes the address and size of the target column and is sent to the computable storage device as part of the filtering request, so that after receiving the filtering request the computable storage device can determine the specific position of the target column from the storage information and then filter according to the filtering information.
For example, FIG. 5 is an exemplary diagram describing data information of a target column. As shown in FIG. 5, the host side queries the logical addresses occupied by the target column in the storage device, together with the offset and length within those logical addresses, and confirms the data type of the target column. In FIG. 5, the target data is 4-byte integer data, the logical addresses occupied in the storage device (each logical block is 4 KB) are LBA10 to LBA13, and the overall length of the target data is 12 KB. The logical address of the first data logical block is LBA10, the logical address of the fourth data logical block is LBA13, and the address offset of the target column in the first logical data block is 2 KB.
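The arithmetic behind this example can be checked with a short Python sketch. The function name `blocks_spanned` is a hypothetical helper; the block size, starting LBA, offset, length and element width are the values given in the text.

```python
from typing import Tuple

BLOCK_SIZE = 4 * 1024  # each logical block is 4 KB


def blocks_spanned(
    first_lba: int, offset: int, length: int, block_size: int = BLOCK_SIZE
) -> Tuple[int, int]:
    """Return (first LBA, last LBA) touched by `length` bytes starting at `offset`
    bytes into the block at `first_lba`."""
    if length <= 0:
        raise ValueError("length must be positive")
    start_lba = first_lba + offset // block_size
    last_byte = offset + length - 1
    last_lba = first_lba + last_byte // block_size
    return start_lba, last_lba


# Target column: starts 2 KB into LBA10 and is 12 KB long.
first, last = blocks_spanned(first_lba=10, offset=2 * 1024, length=12 * 1024)
element_count = (12 * 1024) // 4  # 4-byte integers
```

The column ends 14 KB after the start of LBA10, which falls inside the fourth block, so the data spans LBA10 to LBA13 even though its own length is only three blocks' worth.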
On the basis of the foregoing embodiment, still further, generating the filtering request according to the storage information and the filtering information may include: determining the data volume of the target column after filtering according to the size of the target column and the filtering information; applying, at the host side, for a memory space of the same size as the data volume; and generating the filtering request according to the storage information, the filtering information, and the address and size of the memory space.
In this method, after the storage information of the target column is queried, the size of the filtered target column data is calculated from the storage information and the filtering information, and the host side applies for a memory space of the corresponding size in its memory. The previously determined storage information of the target column, the filtering information and the applied memory space are then combined into a filtering request, which is sent to the computable storage device for filtering. In addition, in some cases the size of the information contained in a target column is not easy to determine directly. For example, if the target column contains written evaluations of students, the evaluations vary in length and the exact length is known only after filtering; in such cases the memory space has to be applied for at the host side after the filtering request has been sent.
Optionally, based on the foregoing embodiment, in step S204, before receiving the filtered data from the computable storage device, the data extraction method may further include: determining the data quantity after filtering the target column according to the size of the target column and the filtering information; in the host, a memory space with the same size as the data volume is applied.
It should be noted that, the operation of applying for the memory space may be completed after sending the filtering request and before receiving the filtered data, or may be completed before sending the filtering request, which is not limited by the present application.
In the embodiment of the application, the data volume of the target column after filtering is determined according to the size of the target column and the filtering information, and a memory space of the same size as the data volume is applied for at the host side. Whether the memory space is applied for before the filtering request is generated, or after the filtering request is sent and before the filtered data is received, this step reserves space for the filtered data, ensuring that the filtered data can be stored in memory and is available for subsequent use.
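For a fixed-width target column with element-level filtering information, the size of the memory space can be estimated as the number of selected elements multiplied by the element width. The Python sketch below illustrates this; the `FilterRequest` fields are a hypothetical layout (the application names the kinds of information carried but not a concrete format), and `id(buffer)` is only a placeholder for a real memory address.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FilterRequest:
    """Hypothetical request layout: storage information, filtering information,
    and the address and size of the host memory space."""
    first_lba: int
    offset: int
    length: int
    filter_bits: List[int]
    buffer_address: int
    buffer_size: int


def filtered_size(filter_bits: List[int], element_size: int) -> int:
    """Bytes needed for the selected elements of a fixed-width column."""
    return sum(filter_bits) * element_size


bits = [1, 0, 0, 1, 1]
size = filtered_size(bits, element_size=4)   # three 4-byte integers selected
buffer = bytearray(size)                     # memory space reserved at the host side

request = FilterRequest(
    first_lba=10,
    offset=2 * 1024,
    length=len(bits) * 4,
    filter_bits=bits,
    buffer_address=id(buffer),               # placeholder, not a real DMA address
    buffer_size=len(buffer),
)
```

This estimate does not apply to variable-length columns such as free-text evaluations, which is the case where the text defers the allocation until after the request has been sent.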
The operations performed by the host side in the data extraction method provided by the present application are described above, and next, the steps performed by the computable storage device are described through fig. 6.
Fig. 6 is a flowchart of a data extraction method according to another embodiment of the present application. The data extraction method provided by the application can be applied to a computable storage device, as shown in fig. 6, and comprises the following steps:
S601, a filtering request from a host end is received, wherein the filtering request carries filtering information, and the filtering request is used for indicating the computable storage device to filter a target column based on the filtering information, and the filtering information is determined according to the content of a reference column and selection conditions.
S602, filtering the target column based on the filtering information.
The computing unit filters and extracts the target column based on the filtering information previously determined from the reference column. Specifically, each digit in the filtering information corresponds to one entry of the target column: where the digit is 1 the data is extracted from the target column, and where the digit is 0 it is not. For example, if a segment of the filtering information is 10011 and the data stored at the corresponding positions of the target column is 100, 120, 105, 110, 119, then the extracted data after filtering is 100, 110 and 119; 120 and 105 have been filtered out. When the filtering operation on the target data has completed without error, the computing unit may further encapsulate the filtered result (the encapsulation information includes the number of elements (groups) in the result, the length of the result, a CRC check code, etc.).
Illustratively, fig. 7 is an exemplary diagram of a computable storage device encapsulating the filtering result in an embodiment of the application. As shown in fig. 7, the filtered data is encapsulated in the middle, the data header stores information such as the length, the number and the coding mode of the data, and the data tail stores a CRC check code. CRC stands for Cyclic Redundancy Check, a channel coding technique that generates a short, fixed-length check code from data such as a network data packet or a computer file, and is mainly used to detect or check errors that may occur after data transmission or storage. The CRC check code is added to the data tail to check whether the data has been corrupted during storage and subsequent transmission.
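The bitmap-style filtering in this example can be sketched in Python as follows. This is only an illustration: the function name `filter_column` is hypothetical, and the filtering information is assumed to be a string of 0/1 flags with one flag per row.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def filter_column(target_column: Sequence[T], filter_bits: str) -> List[T]:
    """Keep the elements whose flag in the filtering information is '1'."""
    if len(target_column) != len(filter_bits):
        raise ValueError("filtering information must have one flag per row")
    return [value for value, bit in zip(target_column, filter_bits) if bit == "1"]


# The example from the text: flags 10011 applied to five stored values.
extracted = filter_column([100, 120, 105, 110, 119], "10011")
print(extracted)  # [100, 110, 119]
```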
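A minimal sketch of such a header, payload and CRC tail layout is given below. The field widths, byte order and the choice of CRC-32 (via Python's `zlib.crc32`) are assumptions made for illustration; the application does not specify them, and the names `encapsulate` and `verify_and_unpack` are hypothetical.

```python
import struct
import zlib

# Hypothetical header layout (little-endian): element count, payload
# length in bytes, and a one-byte identifier for the coding mode.
HEADER = struct.Struct("<IIB")
TRAILER = struct.Struct("<I")  # CRC-32 computed over header + payload


def encapsulate(payload: bytes, element_count: int, encoding_id: int = 0) -> bytes:
    """Wrap filtered data with a header and a CRC tail."""
    header = HEADER.pack(element_count, len(payload), encoding_id)
    crc = zlib.crc32(header + payload)
    return header + payload + TRAILER.pack(crc)


def verify_and_unpack(packet: bytes) -> bytes:
    """Check the CRC tail and return the payload, or raise ValueError."""
    if len(packet) < HEADER.size + TRAILER.size:
        raise ValueError("packet too short")
    body, tail = packet[:-TRAILER.size], packet[-TRAILER.size:]
    (expected_crc,) = TRAILER.unpack(tail)
    if zlib.crc32(body) != expected_crc:
        raise ValueError("CRC mismatch")
    _count, length, _encoding = HEADER.unpack(body[:HEADER.size])
    payload = body[HEADER.size:]
    if len(payload) != length:
        raise ValueError("length field does not match payload")
    return payload
```

Computing the CRC over both header and payload means that corruption of the length or count fields is detected as well as corruption of the data itself.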
The filtered data is temporarily stored in a cache of the computable storage device. The computing unit in the storage device may be a general-purpose device such as a CPU or a special-purpose device such as an FPGA.
S603, sending the filtered data to a host side.
It should be noted that, when the filtering operation encounters an error, the computing unit abandons the request, clears the part already completed, discards the part not yet completed, and returns corresponding error information to the host side. When the filtering operation is error-free, the storage controller in the computable storage device writes the encapsulated result back into the memory reserved at the host side for storing the result. On the host side, if correct information is returned, a CRC check is performed on the returned data, and after the check passes the database engine carries out subsequent processing; if error information is returned, the host side performs the data filtering operation itself and passes the result to the database engine for subsequent processing. This ensures that the filtering operation is still completed correctly even when it is disrupted by unavoidable factors in the computable storage device.
In the embodiment of the application, the computable storage device receives the filtering request from the host side, filters the target column based on the filtering information, and sends the filtered data to the host side.
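The host-side handling of the two outcomes can be sketched as follows. The response fields (`error_code`, `payload`, `crc`), the use of CRC-32, and the callback `local_filter` are all hypothetical. The text does not say what the host does when the CRC check fails, so raising an exception here is an assumption.

```python
import zlib
from typing import Callable, Optional


def handle_response(
    error_code: Optional[int],
    payload: bytes,
    crc: int,
    local_filter: Callable[[], bytes],
) -> bytes:
    """Return the filtered bytes to hand to the database engine.

    `error_code` is None when the device reports success.
    """
    if error_code is not None:
        # The device abandoned the request: the host filters the data itself.
        return local_filter()
    if zlib.crc32(payload) != crc:
        # Assumption: a failed check is reported as an error to the caller.
        raise ValueError("CRC check failed on returned data")
    return payload
```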
Based on the foregoing embodiment, further, before performing step S602 to filter the target column based on the filtering information, the data extraction method may further include: and determining a data block corresponding to the logical address of the target column, and determining the target column in the data block according to the address offset and the length of the target column in the logical address.
After the storage device receives the filtering request from the host, since the filtering request carries the storage information of the target column, and the storage information includes the logical address of the target column and the address offset and length of the target column within the logical address, the storage controller in the storage device reads the corresponding data block from the storage medium according to the logical address of the target data, and the computing unit in the storage device extracts the corresponding data portion according to the offset and length of the target column in the data block. This data portion is the target column used in the subsequent filtering operation.
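Locating the target column inside a data block can be sketched as below. Modelling the storage medium as a dictionary from logical address to block contents is an assumption made purely for illustration, and the name `read_target_column` is hypothetical.

```python
from typing import Dict


def read_target_column(
    storage_medium: Dict[int, bytes],
    logical_address: int,
    offset: int,
    length: int,
) -> bytes:
    """Read the data block at `logical_address` and slice out the target column."""
    block = storage_medium[logical_address]  # role of the storage controller
    if offset < 0 or length < 0 or offset + length > len(block):
        raise ValueError("target column lies outside the data block")
    return block[offset:offset + length]  # role of the computing unit


medium = {42: b"HEADERcolumn-dataTAIL"}
column = read_target_column(medium, logical_address=42, offset=6, length=11)
```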
In some embodiments, the filtering request may further include the size of the memory space applied at the host side. Before the filtered data is sent to the host, the data extraction method may further include: and determining that the size of the filtered data is smaller than or equal to the size of the memory space.
In a possible implementation manner, the data extraction method may further include: if the size of the filtered data is larger than the size of the memory space, deleting the filtered data, and returning an error signal to the host end, wherein the error signal is used for indicating that the memory space applied by the host end is insufficient.
That is, the size of the memory space is used for comparing with the size of the filtered data, and if the size of the applied memory space is enough, the filtered data is returned to the host end; if the memory space is not enough, an error signal is returned to the host end, and the filtered data is deleted.
In some embodiments, the filtering request may also include the address of the memory space applied for at the host side. When the filtered data needs to be returned to the host, the computable storage device can use the address of the memory space carried in the received filtering request to store the filtered data in the memory space corresponding to that address.
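The size comparison and write-back described above might look like the following sketch. Here a `bytearray` stands in for the host memory space identified by the address and size in the filtering request, and the status strings are hypothetical.

```python
def return_filtered_data(filtered: bytes, host_buffer: bytearray) -> str:
    """Copy filtered data into the host buffer, or report insufficient space."""
    if len(filtered) > len(host_buffer):
        # The filtered data is dropped and an error signal is returned
        # to indicate that the host applied for too little memory.
        return "ERR_INSUFFICIENT_MEMORY"
    host_buffer[:len(filtered)] = filtered
    return "OK"
```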
Fig. 8 is a flowchart of a data extraction method according to another embodiment of the present application. This embodiment describes a data extraction method using a host side and a computable storage device as one system. As shown in fig. 8, the data extraction method includes:
S801, the host reads the reference column, and generates filtering information according to the content of the reference column and filtering conditions.
S802, the host side acquires the data type, the logical address, the offset and the length of the target data in the target column, applies for memory for storing the returned result, and sends this information to the computable storage device together with the filtering information.
S803, the storage controller reads the corresponding data block from the storage medium according to the logical address of the target data, then the computing unit intercepts the target data content, and filters the data according to the filtering information, and the filtered data is temporarily stored in the cache of the storage device.
S804, whether an error is found.
If no error is found, step S805 is executed to package the filtered data in a predetermined format, and return the filtered data to the memory used for storing the result at the host.
If an error is found, step S806 is executed, and the calculable storage device gives up the request, returns a corresponding error code to the host, and the host completes the filtering operation.
S807, judging whether the filtering of the target column is finished.
If the filtering is completed, step S808 is executed, and the filtering is completed, and the host performs the subsequent operation.
Otherwise, if the filtering is not completed, the process returns to step S802. For large batches of data, a multi-pass approach may generally be adopted: each filtering operation filters a portion of the data and returns that portion of the filtered data to the host, after which step S802 is performed again, and this is repeated. After multiple passes, once the filtering is judged to be complete, the data from the last pass is sent to the host and step S802 is not executed again. For data filtered in a single pass, the filtering operation is complete after that pass, so there is no need to return to step S802.
For example, suppose a company needs to reward employees who have worked for 5 years or more and have been late fewer than 4 times. In the database, each row holds the information of the employee corresponding to that row. In this example there are therefore two reference columns: reference column 1 contains the years of service and reference column 2 contains the number of late arrivals. The data to be extracted is the employee name, so the target column in this example is the employee name column. The selection conditions are that reference column 1 is greater than or equal to 5 and reference column 2 is less than 4. Filtering information is generated based on these selection conditions, and the computing unit in the computable storage device extracts the corresponding employee names from the target column according to the filtering information, encapsulates them, and transmits them to the host side for subsequent processing.
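The multi-pass loop for large batches can be sketched as a generator that filters one portion per pass. The batch size and the name `filter_in_batches` are hypothetical, and the filtering information is again assumed to be a string of 0/1 flags.

```python
from typing import Iterator, List, Sequence


def filter_in_batches(
    target_column: Sequence[int], filter_bits: str, batch_size: int
) -> Iterator[List[int]]:
    """Yield the filtered portion of each batch, one pass at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(target_column), batch_size):
        rows = target_column[start:start + batch_size]
        bits = filter_bits[start:start + batch_size]
        # Each pass returns its own portion of filtered data to the host.
        yield [value for value, bit in zip(rows, bits) if bit == "1"]


passes = list(filter_in_batches([100, 120, 105, 110, 119], "10011", batch_size=2))
```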
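How the host might derive the filtering information for this example is sketched below, following the stated requirement of 5 years or more of service and fewer than 4 late arrivals. The employee names and column values are invented sample data, and the function name `build_filter_bits` is hypothetical.

```python
from typing import List, Sequence


def build_filter_bits(work_years: Sequence[int], late_counts: Sequence[int]) -> str:
    """Produce one flag per row: '1' if the row meets both conditions."""
    if len(work_years) != len(late_counts):
        raise ValueError("reference columns must have the same number of rows")
    return "".join(
        "1" if years >= 5 and late < 4 else "0"
        for years, late in zip(work_years, late_counts)
    )


# Invented sample data: two reference columns and the target column.
work_years = [6, 5, 3, 10]
late_counts = [1, 4, 0, 3]
names: List[str] = ["Ann", "Bo", "Cy", "Di"]

bits = build_filter_bits(work_years, late_counts)
rewarded = [name for name, bit in zip(names, bits) if bit == "1"]
```

In the scheme described above, only the first step would run on the host; the extraction of names from the target column would be carried out by the computing unit in the computable storage device.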
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Fig. 9 is a schematic structural diagram of a data extraction device 900 according to an embodiment of the application. The data extraction device is applied to a host side. As shown in fig. 9, the data extraction device includes:
an acquisition module 901, configured to acquire a reference column from a computable storage device;
A determining module 902, configured to determine filtering information according to the content of the reference column and the selection condition;
The sending module 903 is configured to send a filtering request to the calculable storage device, where the filtering request carries filtering information, and the filtering request is used to instruct the calculable storage device to filter the target column based on the filtering information;
a receiving module 904 for receiving the filtered data from the computable storage device.
In a possible implementation, the determining module 902 is further configured to: and inquiring storage information of the target column in the computable storage device, wherein the storage information comprises a logic address of the target column and an address offset and a length of the target column in the logic address, and generating the filtering request according to the storage information and filtering information.
In a possible implementation, the determining module 902 is specifically configured to: and determining the data quantity after filtering the target column according to the size of the target column and the filtering information, applying for a memory space with the same size as the data quantity in the host end, and generating a filtering request according to the storage information, the filtering information and the address and the size of the memory space.
In a possible implementation, the determining module 902 is further configured to: and determining the data volume after filtering the target column according to the size of the target column and the filtering information, and applying for a memory space with the same size as the data volume in a host side.
The device provided in the embodiment of the present application may be used to execute the method of the foregoing embodiment, and its implementation principle and technical effects are similar, and will not be described herein.
Fig. 10 is a schematic structural diagram of a data extraction device 1000 according to another embodiment of the application. The data extraction device provided by the embodiment of the application is applied to the computable storage equipment. As shown in fig. 10, the data extraction device includes:
a receiving module 1001, configured to receive a filtering request from a host, where the filtering request carries filtering information, and the filtering request is used to instruct the calculable storage device to filter a target column based on the filtering information, where the filtering information is determined according to content of a reference column and a selection condition;
A filtering module 1002, configured to filter a target column based on the filtering information;
and the sending module 1003 is configured to send the filtered data to the host.
In a possible implementation manner, the device further comprises a determining module (not shown in the figure) for determining a data block corresponding to the logical address of the target column; the target column is determined in the data block based on the address offset and length of the target column in the logical address.
In a possible implementation manner, the filtering request further includes the size of the memory space applied at the host end, and the determining module is further configured to: and determining that the size of the filtered data is smaller than or equal to the size of the memory space.
In a possible implementation, the sending module 1003 is further configured to: if the size of the filtered data is larger than the size of the memory space, deleting the filtered data, and returning an error signal to the host end, wherein the error signal is used for indicating that the memory space applied by the host end is insufficient.
The device provided in the embodiment of the present application may be used to execute the method of the foregoing embodiment, and its implementation principle and technical effects are similar, and will not be described herein.
It should be understood that the division of the above apparatus into modules is merely a division of logical functions; in an actual implementation the modules may be fully or partially integrated into one physical entity or may be physically separate. These modules may all be implemented as software invoked by a processing element, or all be implemented in hardware; alternatively, some modules may be implemented as software invoked by a processing element while other modules are implemented in hardware. For example, the processing module may be a separately provided processing element, may be integrated in a chip of the above apparatus, or may be stored in a memory of the above apparatus in the form of program code, with its functions called and executed by a processing element of the above apparatus. The implementation of the other modules is similar. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each of the above modules may be carried out by an integrated logic circuit of hardware in the processing element or by instructions in the form of software.
For example, the modules above may be one or more integrated circuits configured to implement the methods above, such as one or more application-specific integrated circuits (ASIC), one or more microprocessors (digital signal processors, DSP), or one or more field-programmable gate arrays (FPGA). For another example, when a module above is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the application. The electronic device may be provided as a server or terminal device, for example. Referring to FIG. 11, electronic device 1100 includes a processing component 1101 that further includes one or more processors, and memory resources represented by memory 1102, for storing instructions, such as application programs, executable by processing component 1101. The application program stored in memory 1102 may include one or more modules each corresponding to a set of instructions. Further, the processing component 1101 is configured to execute instructions to perform any of the method embodiments described above.
The electronic device 1100 may also include a power component 1103 configured to perform power management of the electronic device 1100, a wired or wireless network interface 1104 configured to connect the electronic device 1100 to a network, and an input/output (I/O) interface 1105. The electronic device 1100 may operate based on an operating system stored in the memory 1102, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The application also provides a computer readable storage medium, wherein the computer readable storage medium stores computer execution instructions, and when the processor executes the computer execution instructions, the scheme of the data extraction method is realized.
The application also provides a computer program product comprising a computer program which, when executed by a processor, implements a scheme of the data extraction method as above.
The computer readable storage medium described above may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. A readable storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Alternatively, the readable storage medium may be integral to the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC). The processor and the readable storage medium may also reside as discrete components in the data extraction device.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (7)

1. A data extraction method, which is applied to a host, the data extraction method comprising:
Obtaining a reference column from a computable storage device;
Determining filtering information according to the content of the reference column and the selection condition;
Sending a filtering request to the computable storage device, wherein the filtering request carries the filtering information, and the filtering request is used for indicating the computable storage device to filter a target column based on the filtering information;
the filtered data from the computable storage device is received, CRC check is carried out on the returned data, and after the check is correct, the database engine carries out subsequent processing;
If the returned error information is received, the host end performs data filtering operation, and the operation is finished and then is transmitted to a database engine for subsequent processing;
Before sending the filtering request to the computable storage device, the method further comprises:
querying storage information of a target column in the computable storage device, wherein the storage information comprises a logic address of the target column and an address offset and a length of the target column in the logic address;
generating the filtering request according to the stored information and the filtering information;
The generating the filtering request according to the stored information and the filtering information includes:
determining the data quantity after filtering the target column according to the size of the target column and the filtering information;
applying for a memory space with the same size as the data volume in the host end so as to store the filtered data in the memory space;
And generating the filtering request according to the storage information, the filtering information and the address and the size of the memory space.
2. A data extraction method applied to a computable storage device, the data extraction method comprising:
receiving a filtering request from a host side, wherein the filtering request carries filtering information, and the filtering request is used for indicating the computable storage equipment to filter a target column based on the filtering information, and the filtering information is determined according to the content of the reference column and the selection condition after the host side acquires the reference column from the computable storage equipment;
Filtering the target column based on the filtering information;
when the filtering operation is error-free, sending the filtered data to the host;
When the filtering operation finds errors, the computing unit gives up the request, clears the completed part, and discards the part which is not completed, and simultaneously returns corresponding error information to the host end, so that the host end receives the returned error information, the host end carries out the filtering operation of the data, and the operation is finished and then is transmitted to the database engine for subsequent processing;
The filtering request further includes storage information, the storage information including a logical address of the target column in the computable storage device and an address offset and length of the target column in the logical address; the filtering request is generated according to the storage information, the filtering information and the address and the size of a memory space, wherein the host side determines the data volume after filtering the target column according to the size of the target column and the filtering information, applies for the memory space with the same size as the data volume in the host side, and generates the filtering request according to the storage information, the filtering information and the address and the size of the memory space; before the target column is filtered based on the filtering information, the method further comprises:
Determining a data block corresponding to the logical address of the target column;
the target column is determined in the data block according to the address offset and length of the target column in the logical address.
3. The method of claim 2, wherein the filtering request further includes a size of the memory space applied at the host side, and further comprising, before sending the filtered data to the host side:
and determining that the size of the filtered data is smaller than or equal to the size of the memory space.
4. A data extraction device, applied to a host, comprising:
An acquisition module for acquiring a reference column from the computable storage device;
the determining module is used for determining filtering information according to the content of the reference column and the selection condition;
the sending module is used for sending a filtering request to the calculable storage device, wherein the filtering request carries the filtering information, and the filtering request is used for indicating the calculable storage device to filter a target column based on the filtering information;
The receiving module receives the filtered data from the computable storage device, performs CRC (cyclic redundancy check) on the returned data, and performs subsequent processing by the database engine after the returned data is checked to be correct; if the returned error information is received, the host end performs data filtering operation, and the operation is finished and then is transmitted to a database engine for subsequent processing;
The determining module is further used for inquiring storage information of the target column in the computable storage device, wherein the storage information comprises a logic address of the target column and an address offset and a length of the target column in the logic address; generating the filtering request according to the stored information and the filtering information;
The determining module is further configured to determine, according to the size of the target column and the filtering information, an amount of data after filtering the target column; applying for a memory space with the same size as the data volume in the host end so as to store the filtered data in the memory space; and generating the filtering request according to the storage information, the filtering information and the address and the size of the memory space.
5. A data extraction apparatus for application to a computable storage device, the data extraction apparatus comprising:
the receiving module is used for receiving a filtering request from a host side, wherein the filtering request carries filtering information, the filtering request is used for indicating the computable storage device to filter a target column based on the filtering information, and the filtering information is determined according to the content of the reference column and the selection condition after the host side acquires the reference column from the computable storage device;
the filtering module is used for filtering the target column based on the filtering information;
The sending module is used for sending the filtered data to the host end when the filtering operation is error-free; when the filtering operation finds errors, the computing unit gives up the request, clears the completed part, and discards the part which is not completed, and simultaneously returns corresponding error information to the host end, so that the host end receives the returned error information, the host end carries out the filtering operation of the data, and the operation is finished and then is transmitted to the database engine for subsequent processing;
the filtering request further includes storage information, the storage information including a logical address of the target column in the computable storage device and an address offset and length of the target column in the logical address; the filtering request is generated according to the storage information, the filtering information and the address and the size of a memory space, wherein the host side determines the data volume after filtering the target column according to the size of the target column and the filtering information, applies for the memory space with the same size as the data volume in the host side, and generates the filtering request according to the storage information, the filtering information and the address and the size of the memory space;
the determining module is used for determining a data block corresponding to the logical address of the target column before the target column is filtered based on the filtering information; the target column is determined in the data block according to the address offset and length of the target column in the logical address.
6. An electronic device, comprising: a memory and a processor;
The memory is used for storing program instructions;
the processor is configured to invoke the program instructions to perform the data extraction method of any of claims 1 to 3.
7. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein program instructions, which when executed, implement the data extraction method according to any one of claims 1 to 3.
CN202110818362.1A 2021-07-20 2021-07-20 Data extraction method, device and storage medium Active CN113536075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110818362.1A CN113536075B (en) 2021-07-20 2021-07-20 Data extraction method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110818362.1A CN113536075B (en) 2021-07-20 2021-07-20 Data extraction method, device and storage medium

Publications (2)

Publication Number Publication Date
CN113536075A CN113536075A (en) 2021-10-22
CN113536075B CN113536075B (en) 2024-06-04

Family

ID=78100368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110818362.1A Active CN113536075B (en) 2021-07-20 2021-07-20 Data extraction method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113536075B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102576333A (en) * 2009-10-05 2012-07-11 马维尔国际贸易有限公司 Data caching in non-volatile memory
US8843914B1 (en) * 2011-09-19 2014-09-23 Amazon Technologies, Inc. Distributed update service
CN105144160A (en) * 2013-03-15 2015-12-09 甲骨文国际公司 A method to accelerate queries using dynamically generated alternate data formats in flash cache
CN109885589A (en) * 2017-12-06 2019-06-14 腾讯科技(深圳)有限公司 Data query method, apparatus, computer equipment and storage medium
CN109983431A (en) * 2016-12-15 2019-07-05 甲骨文国际公司 System and method for storing the list retrieval in equipment
CN110337137A (en) * 2019-05-22 2019-10-15 华为技术有限公司 Packet filtering method, apparatus and system
WO2020005337A1 (en) * 2018-06-30 2020-01-02 Western Digital Technologies, Inc. Multi-device storage system with hosted services on peer storage devices
CN110737813A (en) * 2019-09-26 2020-01-31 苏州浪潮智能科技有限公司 Method, equipment and medium for improving crawler efficiency
US10698756B1 (en) * 2017-12-15 2020-06-30 Palantir Technologies Inc. Linking related events for various devices and services in computer log files on a centralized server
CN111679909A (en) * 2020-05-19 2020-09-18 深圳市元征科技股份有限公司 Data processing method and device and terminal equipment
CN112115160A (en) * 2020-08-14 2020-12-22 苏宁云计算有限公司 Query request scheduling method and device and computer system
WO2021036848A1 (en) * 2019-08-26 2021-03-04 华为技术有限公司 Data processing method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102103543B1 (en) * 2013-11-28 2020-05-29 삼성전자 주식회사 All-in-one data storage device having internal hardware filter, method thereof, and system having the data storage device
KR102214511B1 (en) * 2014-02-17 2021-02-09 삼성전자 주식회사 Data storage device for filtering page using 2-steps, system having the same, and operation method thereof
KR102251811B1 (en) * 2015-01-02 2021-05-13 삼성전자주식회사 Data storage device having internal hardware filter, and data processing system having the data storage device
CN113126888B (en) * 2020-01-15 2024-04-19 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for storage management

Also Published As

Publication number Publication date
CN113536075A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN109325009B (en) Log analysis method and device
CN109684607B (en) JSON data analysis method and device, computer equipment and storage medium
CN113835870A (en) Data processing method and system
CN112738216B (en) Equipment adaptation method, device, equipment and computer readable storage medium
CN109597618B (en) Program development method, program development device, computer device, and storage medium
CN112187713B (en) Message conversion method, device, computer equipment and storage medium
CN108241676A (en) Method and apparatus for realizing data synchronization
CN112000589A (en) Data writing method, data reading device and electronic equipment
CN113495728B (en) Dependency relationship determination method, dependency relationship determination device, electronic equipment and medium
CN113177045A (en) Data extraction method and device, computable storage equipment and data request equipment
CN115390847A (en) Log processing method and device, computer readable storage medium and terminal
CN112181430A (en) Code change statistical method and device, electronic equipment and storage medium
CN113391972A (en) Interface testing method and device
CN108776665B (en) Data processing method and device
CN113536075B (en) Data extraction method, device and storage medium
CN112286594B (en) Object serialization and deserialization method and device, electronic device and medium
US10970206B2 (en) Flash data compression decompression method and apparatus
CN110737678B (en) Data searching method, device, equipment and storage medium
US20230289298A1 (en) Method and device for splitting operators, and storage medium
CN111897833A (en) Data processing method and device
CN111507430A (en) Feature coding method, device, equipment and medium based on matrix multiplication
CN114115900B (en) Script compiling method and device and electronic equipment
CN111190896A (en) Data processing method, data processing device, storage medium and computer equipment
CN111061927A (en) Data processing method and device and electronic equipment
CN113656830B (en) Database desensitization grammar parsing method, system, computer and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant