CN109284260B - Big data file reading method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN109284260B
Authority
CN
China
Prior art keywords
data
field
row
target
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811203112.1A
Other languages
Chinese (zh)
Other versions
CN109284260A (en)
Inventor
袁彪
张要伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Securities Co Ltd
Original Assignee
Ping An Securities Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Securities Co Ltd
Priority to CN201811203112.1A
Publication of CN109284260A
Application granted
Publication of CN109284260B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G06F12/0238 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G06F12/0246 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, a device, computer equipment and a storage medium for reading a big data file, which are applied to the technical field of databases and are used to solve the problem that existing ways of reading data from dbf files are inefficient. The method provided by the application comprises the following steps: reading header information of a target dbf file to obtain the field data amount of each row recorded in the header information; mapping a first block of field data of the target dbf file to a specified memory as a current field block; acquiring the field data of each row in the current field block sequentially, row by row, according to the field data amount of each row; parsing the acquired field data to obtain parsed data values; and, as long as not all field data of the target dbf file has been acquired, mapping the next block of field data of the target dbf file to the specified memory as a new current field block and returning to the step of acquiring the field data of each row in the current field block sequentially, row by row, according to the field data amount of each row.

Description

Big data file reading method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of database technologies, and in particular, to a method and apparatus for reading a big data file, a computer device, and a storage medium.
Background
With the advent of the big data era, the storage, reading and migration of data have become increasingly important. Currently, many databases use dbf format files to store data. A dbf file (a database file with the suffix .dbf) is a database format file used by database systems such as dBase and FoxPro. dbf files are often chosen because the amount of data needed in the big data era is very large: a server frequently has to obtain the required data from a data center or elsewhere and store it in a database so that it can respond quickly when a client requests the data. dbf files are widely used because of their good data structure.
When a dbf file is read, the existing approach reads each piece of data from disk in sequence according to the dbf field format, and parses the field data in the dbf file during reading to obtain the value of each field.
However, reading a dbf file with a large data volume, for example a dbf file of more than 10 GB, takes a large amount of time, which makes it difficult to meet the big data era's demand for fast data reading.
Disclosure of Invention
The embodiments of the application provide a method, a device, computer equipment and a storage medium for reading a big data file, which are used to solve the problem that existing ways of reading data from dbf files are inefficient.
A method for reading a big data file, comprising:
reading header information of a target dbf file to obtain the field data amount of each row recorded in the header information, wherein the field data amount of each row refers to the size of each row of field data in the target dbf file;
mapping a first block of field data of the target dbf file to a specified memory as a current field block, wherein the first block of field data refers to field data of a preset data size that is located after, and immediately adjacent to, the header information in the target dbf file;
acquiring the field data of each row in the current field block sequentially, row by row, according to the field data amount of each row;
parsing the acquired field data to obtain parsed data values;
as long as not all field data of the target dbf file has been acquired, mapping the next block of field data of the target dbf file to the specified memory as a new current field block, and returning to the step of acquiring the field data of each row in the current field block sequentially, row by row, according to the field data amount of each row, wherein the next block of field data refers to field data of the preset data size that is located after, and immediately adjacent to, the current field block in the target dbf file.
A big data file reading device, comprising:
a header information reading module, configured to read header information of a target dbf file to obtain the field data amount of each row recorded in the header information, wherein the field data amount of each row refers to the size of each row of field data in the target dbf file;
a field data mapping module, configured to map a first block of field data of the target dbf file to a specified memory as a current field block, wherein the first block of field data refers to field data of a preset data size that is located after, and immediately adjacent to, the header information in the target dbf file;
a row-by-row acquisition module, configured to acquire the field data of each row in the current field block sequentially, row by row, according to the field data amount of each row;
a field data parsing module, configured to parse the acquired field data to obtain parsed data values;
and a loop processing module, configured to, as long as not all field data of the target dbf file has been acquired, map the next block of field data of the target dbf file to the specified memory as a new current field block and trigger the row-by-row acquisition module and the field data parsing module again in sequence, wherein the next block of field data refers to field data of the preset data size that is located after, and immediately adjacent to, the current field block in the target dbf file.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the big data file reading method described above when the computer program is executed.
A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the big data file reading method described above.
With the method, the device, the computer equipment and the storage medium for reading a big data file, the header information of a target dbf file is first read to obtain the field data amount of each row recorded in the header information, wherein the field data amount of each row refers to the size of each row of field data in the target dbf file. Then, a first block of field data of the target dbf file is mapped to a specified memory as a current field block, wherein the first block of field data refers to field data of a preset data size that is located after, and immediately adjacent to, the header information in the target dbf file. The field data of each row in the current field block is acquired sequentially, row by row, according to the field data amount of each row, and the acquired field data is parsed to obtain parsed data values. As long as not all field data of the target dbf file has been acquired, the next block of field data of the target dbf file is mapped to the specified memory as a new current field block, and the step of acquiring the field data of each row in the current field block row by row is executed again, wherein the next block of field data refers to field data of the preset data size that is located after, and immediately adjacent to, the current field block in the target dbf file. Because processing data in memory is much faster than processing it on disk, and file mapping is more efficient than reading data from disk, the overall processing efficiency is higher than that of the existing approach. This improves the efficiency of reading dbf file data, shortens the read time, and meets the big data era's demand for fast data reading.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an application environment of a method for reading a big data file according to an embodiment of the application;
FIG. 2 is a flow chart of a method for reading a big data file according to an embodiment of the application;
FIG. 3 is a flowchart of the step 102 of the method for reading a big data file in an application scenario according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of deleting header information in an application scenario in a big data file reading method according to an embodiment of the application;
FIG. 5 is a flow chart of presetting the preset data size in an application scenario according to a big data file reading method in an embodiment of the application;
FIG. 6 is a schematic diagram of a big data file reading device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a computer device in accordance with an embodiment of the application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The method for reading the big data file provided by the application can be applied to an application environment as shown in fig. 1, wherein a client communicates with a server through a network. The client may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, a method for reading a big data file is provided, and the method is applied to the server in fig. 1, and includes the following steps:
101. Reading header information of a target dbf file to obtain the field data amount of each row recorded in the header information, wherein the field data amount of each row refers to the size of each row of field data in the target dbf file;
In this embodiment, the dbf file is a database format file with a fixed format. For example, a dbf file begins with header information, and the header information records the size of one row of field data in the dbf file, that is, the field data amount of each row. Accordingly, the server can obtain the field data amount of each row by reading the header information of the target dbf file.
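Step 101 can be sketched as follows. The patent does not spell out the header layout, so this sketch assumes the widely documented dBase III layout, in which the first 32 bytes hold the record count at bytes 4-7, the header size at bytes 8-9 and the size of one row (record) at bytes 10-11, all little-endian. The function name `read_dbf_header` and the dictionary keys are hypothetical.

```python
import struct


def read_dbf_header(raw: bytes) -> dict:
    """Decode the fixed prefix of a dbf header (assumed dBase III layout)."""
    if len(raw) < 32:
        raise ValueError("expected at least the 32-byte header prefix")
    # 4 single bytes (version, year, month, day), then a 32-bit record count
    # and two 16-bit sizes, all little-endian.
    version, _yy, _mm, _dd, n_records, header_size, record_size = struct.unpack_from(
        "<4BIHH", raw, 0
    )
    return {
        "version": version,
        "n_records": n_records,      # number of rows in the file
        "header_size": header_size,  # where the field data starts
        "record_size": record_size,  # field data amount of each row
    }
```

Here `record_size` plays the role of the field data amount of each row, and `header_size` tells the later steps where the first block of field data begins.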
102. Mapping a first block of field data of the target dbf file to a specified memory as a current field block, wherein the first block of field data refers to field data of a preset data size that is located after, and immediately adjacent to, the header information in the target dbf file;
It can be appreciated that, because mapping data into memory is much faster than reading it, this solution uses file mapping to map the target dbf file to the specified memory for processing. Because the target dbf file is usually large and the memory resources of the server are limited, it is often difficult to map the entire target dbf file into memory at one time. Therefore, in this embodiment, the server may map the first block of field data of the target dbf file to the specified memory as the current field block, where the first block of field data refers to field data of a preset data size located after, and immediately adjacent to, the header information in the target dbf file. For example, assume that bytes 0-100 of the target dbf file are header information and bytes 101-10000 are field data. When the preset data size is 100 bytes, the first block of field data is bytes 101-200, and the server maps bytes 101-200 of the target dbf file to the specified memory as the current field block.
In practical use, when the target dbf file is mapped to the specified memory, it is difficult to map the first block of field data to the specified memory on its own. To this end, as shown in fig. 3, step 102 may specifically include:
201. mapping data of a preset mapping data size, starting from the header information in the target dbf file, to a specified memory, wherein the preset mapping data size is equal to the sum of the preset data size and the data size of the header information;
202. determining the data other than the header information, among the data mapped to the specified memory, as the current field block.
For step 201, it may be understood that, in order to map the first block of field data to the specified memory successfully, the first block of field data may be mapped together with the header information. That is, data of the preset mapping data size, counted from the start of the header of the target dbf file, is mapped, where the preset mapping data size is equal to the sum of the preset data size and the data size of the header information. Continuing the above example, bytes 0-200 of the target dbf file are mapped to the specified memory.
For step 202, since the data mapped in step 201 includes the header information, and the header information does not contain the field data values required by this solution, the server may directly determine the data other than the header information, among the data mapped to the specified memory, as the current field block. The header information mapped to the specified memory may be erased or ignored, which is not limited in this embodiment.
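Steps 201 and 202 can be illustrated with Python's standard `mmap` module. This is only a sketch under assumptions: the helper name `first_field_block` is hypothetical, the mapping starts at offset 0 so that no offset alignment is needed, and the field block is returned as a copied `bytes` object for simplicity rather than processed in place.

```python
import mmap
import os


def first_field_block(f, header_size: int, block_size: int) -> bytes:
    """Map header + first block from the start of the file, then drop the header."""
    file_size = os.fstat(f.fileno()).st_size
    # Preset mapping data size = header size + preset data size (step 201),
    # capped at the file size because a mapping cannot extend past the end.
    length = min(header_size + block_size, file_size)
    with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ) as mapped:
        # Step 202: everything after the header is the current field block.
        return mapped[header_size:length]
```

One plausible reason the first block is hard to map on its own is that operating systems generally require the mapping offset to be a multiple of an allocation granularity, and a header rarely ends on such a boundary; mapping from offset 0 sidesteps that.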
To solve the above problem that the first block of field data is difficult to map on its own, this embodiment may also use another approach. As shown in fig. 4, before step 102, the method further includes:
301. deleting the header information of the target dbf file to obtain a new target dbf file;
Step 102 then specifically comprises: mapping the field data of the preset data size at the beginning of the new target dbf file to a specified memory as the current field block.
The idea of this approach is to delete the header information from the target dbf file before the file is mapped. When step 102 is executed, only field data remains in the new target dbf file and the header information no longer interferes, so the mapping can be performed directly.
For step 301, the server may delete the header information of the target dbf file to obtain a new target dbf file without header information.
Based on step 301, it may be understood that in step 102 the server may map the field data of the preset data size at the beginning of the target dbf file to the specified memory as the current field block. After the processing in step 301, the header information is no longer in the target dbf file, so field data of the preset data size can be taken directly from the beginning of the file and mapped to the specified memory, and the field data mapped to the specified memory is the current field block.
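One way step 301 could be realized is to copy everything after the header into a new file, leaving the original untouched. This is a sketch, not the patent's stated implementation; `strip_header` and both path parameters are hypothetical names.

```python
import shutil


def strip_header(src_path: str, dst_path: str, header_size: int) -> None:
    """Write a copy of src_path without its first header_size bytes."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(header_size)          # skip the header information
        shutil.copyfileobj(src, dst)   # stream the field data in chunks
```

Note the trade-off: for a very large file this costs one full sequential copy before any mapping starts, whereas the approach of steps 201 and 202 avoids the copy.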
In the dbf file, field data is stored row by row, and the data size of each row is the field data amount of each row. However, the data size of a mapped current field block is the preset data size, so if no relationship is imposed between the preset data size and the field data amount of each row, the last row in the current field block may contain less than one full row of field data. For example, assume that the field data amount of each row is 100, the preset data size is 250, and bytes 0-100 are header information. The first block of field data of the target dbf file is then bytes 101-350 and its last row is bytes 301-350. This last row is shorter than one row of field data, which may make the subsequent parsing of field data difficult. For this reason, this embodiment can avoid the situation by defining the relationship between the preset data size and the field data amount of each row, so that the last row of each current field block is a complete row of field data.
As shown in fig. 5, further, the preset data size may be preset by:
401. acquiring the currently available space of the specified memory;
402. determining a mapping space of the specified memory for mapping field data according to a preset memory usage proportion and the currently available space;
403. dividing the mapping space by the field data amount of each row to obtain a first value;
404. rounding the first value to an integer, and calculating the product of the rounded first value and the field data amount of each row to obtain a second value, which serves as the preset data size.
For step 401, the currently available space is the remaining space of the specified memory that can currently be used. For example, if the specified memory is 4 GB and 2 GB has been used, the currently available space is 2 GB.
For step 402, it is understood that the server may preset a memory usage proportion that indicates how much of the currently available space should be used for mapping data. The memory usage proportion may be set according to actual usage conditions, for example 50%. After the server obtains the currently available space of the specified memory, it can calculate the product of the preset memory usage proportion and the currently available space to obtain the mapping space of the specified memory for mapping field data.
For steps 403 and 404, the aim in this embodiment is to make the preset data size an integer multiple of the field data amount of each row while keeping it as close as possible to the size of the mapping space. Therefore, the mapping space is divided by the field data amount of each row to obtain a first value, and the first value is rounded to an integer. The rounded first value can be regarded as a multiple, so the second value, obtained as the product of the rounded first value and the field data amount of each row, is an integer multiple of the field data amount of each row. The second value is determined as the preset data size.
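The arithmetic of steps 402 to 404 is small enough to sketch directly. The text only says that the first value is rounded; this sketch assumes rounding down, so that the result never exceeds the mapping space. The function and parameter names are hypothetical, and the currently available space (step 401) is passed in rather than queried from the operating system.

```python
def preset_block_size(available_bytes: int, usage_ratio: float, record_size: int) -> int:
    """Largest multiple of record_size that fits in the mapping space."""
    # Step 402: mapping space = available space * memory usage proportion.
    mapping_space = int(available_bytes * usage_ratio)
    # Step 403: first value = mapping space / field data amount of each row.
    # Step 404: round it (assumed: down) and multiply back.
    rows_per_block = mapping_space // record_size
    return rows_per_block * record_size
```

With the figures from the text (2 GB available, a 50% proportion) and a 100-byte row, the mapping space is 1,073,741,824 bytes and the preset data size becomes 1,073,741,800 bytes, that is, 10,737,418 complete rows.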
103. Acquiring the field data of each row in the current field block sequentially, row by row, according to the field data amount of each row;
It can be understood that the field data in the dbf file is arranged and stored row by row, and that within each row the order of the fields and the number of bytes occupied by each field are agreed in advance. Therefore, when the field data in the current field block is acquired, the field data of each row needs to be acquired sequentially, row by row, according to the field data amount of each row, and is then parsed in step 104.
Further, as noted above, the last row of the current field block may contain less than one row of field data. In this case, after step 103, the method may further include: if the data size of the last row of field data of the current field block is less than the field data amount of each row, temporarily storing the last row of field data as continuation data; then, after the next block of field data of the target dbf file has been mapped to the specified memory as a new current field block in step 105, merging the continuation data into the head of the current field block before returning to the step of acquiring the field data of each row in the current field block sequentially, row by row, according to the field data amount of each row. Continuing the above example, assume that the last row of the current field block is bytes 301-350. The server may save this last row in memory. After a new current field block, bytes 351-600, is obtained in step 105, the server merges the saved last row into the head of the current field block, so that the current field block becomes bytes 301-600. In this way, the last row of the current field block is handled properly.
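The row-by-row acquisition of step 103, together with the continuation data handling just described, can be sketched as a pure function over byte strings. The name `split_rows` and its return shape are hypothetical.

```python
def split_rows(block: bytes, record_size: int, carry: bytes = b""):
    """Return (complete rows, continuation data) for one field block.

    carry is the incomplete last row saved from the previous block; it is
    merged into the head of the current block before the rows are cut.
    """
    data = carry + block
    full = len(data) // record_size * record_size  # bytes covered by whole rows
    rows = [data[i:i + record_size] for i in range(0, full, record_size)]
    return rows, data[full:]  # the tail is shorter than one row (possibly empty)
```

For the example in the text (100-byte rows, 250-byte blocks), the first call yields two rows and 50 bytes of continuation data; passing those 50 bytes as `carry` with the next 250-byte block yields three complete rows and nothing left over.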
104. Parsing the acquired field data to obtain parsed data values;
In this embodiment, each time a row of field data in the current field block is acquired, the server may parse the acquired field data to obtain the parsed data values. It can be understood that the fields in each row of field data in the dbf file are arranged in a predetermined order and the number of bytes occupied by each field is also predetermined, so that once the field data has been acquired, the data values can be parsed from it directly according to the predetermined fields.
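Because the field order and widths are agreed in advance, parsing a row reduces to slicing at fixed offsets. The sketch below assumes the common dBase record layout, in which the first byte of each row is a deletion flag and field values are space-padded ASCII text; the field names `CODE` and `PRICE` used in the usage note are invented for illustration.

```python
def parse_row(row: bytes, fields) -> dict:
    """Slice one fixed-width row into named values.

    fields is a sequence of (name, width) pairs in their agreed order.
    """
    values = {}
    pos = 1  # assumed layout: byte 0 is the deletion flag, fields follow
    for name, width in fields:
        values[name] = row[pos:pos + width].decode("ascii").strip()
        pos += width
    return values
```

For instance, with `fields = [("CODE", 10), ("PRICE", 8)]`, a 19-byte row consisting of a space, `600000` padded to 10 bytes and `12.50` right-aligned in 8 bytes parses to `{"CODE": "600000", "PRICE": "12.50"}`.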
105. As long as not all field data of the target dbf file has been acquired, mapping the next block of field data of the target dbf file to the specified memory as a new current field block, and returning to the step of acquiring the field data of each row in the current field block sequentially, row by row, according to the field data amount of each row, wherein the next block of field data refers to field data of the preset data size that is located after, and immediately adjacent to, the current field block in the target dbf file.
In this embodiment, the field data of the entire target dbf file needs to be acquired and parsed. Therefore, after step 104 has been performed, that is, after the current field block has been acquired and parsed, the server may map the next block of field data of the target dbf file to the specified memory as a new current field block and return to steps 103 and 104 to process it. After that block has been processed, the block following it is mapped as the new current field block and is acquired and parsed in turn, and so on until all field data of the target dbf file has been acquired and parsed. At that point the data values of all field data in the target dbf file have been obtained, and the reading of the target dbf file is complete.
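The loop formed by steps 102 to 105 can be sketched end to end with the standard `mmap` module. This is a minimal illustration under assumptions, not the patent's implementation: `iter_rows` is a hypothetical name, each block is copied out of the mapping as `bytes`, and each mapping is started at an aligned offset because `mmap` requires offsets to be multiples of `mmap.ALLOCATIONGRANULARITY`. Any trailing bytes shorter than one row (such as an end-of-file marker) are simply left unyielded.

```python
import mmap
import os


def iter_rows(path: str, header_size: int, record_size: int, block_size: int):
    """Yield every complete row, mapping the file one block at a time."""
    gran = mmap.ALLOCATIONGRANULARITY
    file_size = os.path.getsize(path)
    carry = b""          # continuation data from the previous block
    pos = header_size    # the first block starts right after the header
    with open(path, "rb") as f:
        while pos < file_size:
            length = min(block_size, file_size - pos)
            aligned = pos - pos % gran  # round the offset down to the granularity
            with mmap.mmap(f.fileno(), length + (pos - aligned),
                           access=mmap.ACCESS_READ, offset=aligned) as mapped:
                data = carry + mapped[pos - aligned:]  # current field block
            full = len(data) // record_size * record_size
            for i in range(0, full, record_size):
                yield data[i:i + record_size]
            carry = data[full:]
            pos += length  # the next block is adjacent to the current one
```

Choosing `block_size` as a multiple of `record_size`, as in steps 401 to 404, keeps `carry` empty throughout, so the merge on each iteration costs nothing.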
In the embodiment of the application, the header information of a target dbf file is first read to obtain the field data amount of each row recorded in the header information, wherein the field data amount of each row refers to the size of each row of field data in the target dbf file. Then, a first block of field data of the target dbf file is mapped to a specified memory as a current field block, wherein the first block of field data refers to field data of a preset data size that is located after, and immediately adjacent to, the header information in the target dbf file. The field data of each row in the current field block is acquired sequentially, row by row, according to the field data amount of each row, and the acquired field data is parsed to obtain parsed data values. As long as not all field data of the target dbf file has been acquired, the next block of field data of the target dbf file is mapped to the specified memory as a new current field block, and the step of acquiring the field data of each row in the current field block row by row is executed again, wherein the next block of field data refers to field data of the preset data size that is located after, and immediately adjacent to, the current field block in the target dbf file. Because processing data in memory is much faster than processing it on disk, and file mapping is more efficient than reading data from disk, the overall processing efficiency is higher than that of the existing approach. This improves the efficiency of reading dbf file data, shortens the read time, and meets the big data era's demand for fast data reading.
It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.
In an embodiment, a big data file reading device is provided, which corresponds one-to-one to the big data file reading method in the above embodiment. As shown in fig. 6, the big data file reading device includes a header information reading module 501, a field data mapping module 502, a row-by-row acquisition module 503, a field data parsing module 504, and a loop processing module 505. The functional modules are described in detail as follows:
the header information reading module 501 is configured to read header information of a target dbf file to obtain the field data amount of each row recorded in the header information, where the field data amount of each row refers to the size of each row of field data in the target dbf file;
the field data mapping module 502 is configured to map a first block of field data of the target dbf file to a specified memory as a current field block, where the first block of field data refers to field data of a preset data size that is located after, and immediately adjacent to, the header information in the target dbf file;
the row-by-row acquisition module 503 is configured to acquire the field data of each row in the current field block sequentially, row by row, according to the field data amount of each row;
the field data parsing module 504 is configured to parse the acquired field data to obtain parsed data values;
and the loop processing module 505 is configured to, as long as not all field data of the target dbf file has been acquired, map the next block of field data of the target dbf file to the specified memory as a new current field block and trigger the row-by-row acquisition module and the field data parsing module again in sequence, where the next block of field data refers to field data of the preset data size that is located after, and immediately adjacent to, the current field block in the target dbf file.
Further, the field data mapping module may include:
a first mapping unit, configured to map data of a preset mapping data size, starting from the header information in the target dbf file, to a specified memory, where the preset mapping data size is equal to the sum of the preset data size and the data size of the header information;
and a field block determining unit, configured to determine the data other than the header information, among the data mapped to the specified memory, as the current field block.
Further, the big data file reading device may further include:
a header information deleting module, configured to delete the header information of the target dbf file to obtain a new target dbf file;
the field data mapping module is then specifically configured to map the field data of the preset data size at the beginning of the new target dbf file to a specified memory as the current field block.
Further, the preset data size may be preset by the following modules:
an available space acquisition module, configured to acquire the currently available space of the specified memory;
a mapping space determining module, configured to determine a mapping space of the specified memory for mapping field data according to a preset memory usage proportion and the currently available space;
a first value calculation module, configured to divide the mapping space by the field data amount of each row to obtain a first value;
and a rounding module, configured to round the first value to an integer and calculate the product of the rounded first value and the field data amount of each row to obtain a second value, which serves as the preset data size.
Further, the big data file reading device may further include:
a temporary storage module, configured to temporarily store the last row of field data as continuation data if the data size of the acquired last row of field data of the current field block is smaller than the field data amount of each row;
and a merging module, configured to merge the continuation data into the head of the current field block after the next block of field data of the target dbf file has been mapped to the specified memory as a new current field block and before the row-by-row acquisition module and the field data parsing module are triggered again in sequence.
For specific limitations on the big data file reading device, reference may be made to the limitations on the big data file reading method above, which are not repeated here. Each of the modules in the big data file reading device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in a processor of the computer device or exist independently of it, or may be stored, in software form, in a memory of the computer device so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data involved in the big data file reading method. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a big data file reading method.
In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method for reading a large data file in the above embodiment, such as steps 101 to 105 shown in fig. 2. Alternatively, the processor, when executing a computer program, implements the functions of the modules/units of the big data file reading apparatus in the above embodiments, such as the functions of the modules 501 to 505 shown in fig. 6. In order to avoid repetition, a description thereof is omitted.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the large data file reading method of the above embodiment, such as steps 101 to 105 shown in fig. 2. Alternatively, the computer program when executed by the processor implements the functions of the modules/units of the large data file reading apparatus in the above embodiment, such as the functions of the modules 501 to 505 shown in fig. 6. In order to avoid repetition, a description thereof is omitted.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium; when executed, the program may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A method for reading a large data file, comprising:
reading header information of a target dbf file to obtain data volume of each row of field in the header information, wherein the data volume of each row of field refers to the size of each row of field in the target dbf file;
mapping data with the preset mapping data size from the head information in the target dbf file to a specified memory, wherein the preset mapping data size is equal to the sum of the preset data size and the data size of the head information;
determining data except the head information in the data mapped to the appointed memory as a current field block;
the first block field data refers to field data with a preset data size, which is positioned behind the head information and is close to the head information, in the target dbf file;
acquiring field data of each row in the current field block row by row in sequence according to the field data quantity of each row;
analyzing the acquired field data to obtain an analyzed data value;
before all field data of the target dbf file are acquired, mapping the next field data of the target dbf file into the designated memory to serve as a new current field block, and returning to execute the step of sequentially acquiring the field data of each row in the current field block row by row according to the field data quantity of each row, wherein the next field data refers to the field data which is positioned behind the current field block and is close to the preset data quantity of the current field block in the target dbf file;
the preset data size is preset through the following steps:
acquiring the current available space of the appointed memory;
determining a mapping space of the specified memory for mapping field data according to a preset memory usage proportion and the current available space;
dividing the mapping space by the data volume of each row of field to obtain a first numerical value;
and rounding the first value, and calculating the product of the rounded first value and the data volume of each row of field to obtain a second value serving as a preset data volume.
2. The method for reading a big data file according to claim 1, wherein before mapping the data of the predetermined mapping data size from the header information in the target dbf file to the specified memory, the predetermined mapping data size is equal to a sum of the predetermined data size and the data size of the header information, the method further comprises:
deleting the header information of the target dbf file to obtain a new target dbf file;
the mapping the first block field data of the target dbf file to the specified memory as the current field block specifically includes: and mapping the field data with the preset data size at the head part in the target dbf file to a specified memory to serve as a current field block.
3. The large data file reading method according to any one of claims 1 to 2, characterized by further comprising, after sequentially acquiring the field data of each line in the current field block line by line in accordance with the field data amount of each line:
if the data volume of the field data of the last line of the current field block is less than the data volume of each line of field data, temporarily storing the field data of the last line as continuous data;
after mapping the next field data of the target dbf file to the designated memory as a new current field block, merging the continuing data to the header of the current field block before returning to the step of sequentially obtaining the field data of each row in the current field block row by row according to the field data quantity of each row.
4. A big data file reading apparatus, comprising:
the header information reading module is used for reading header information of the target dbf file to obtain data volume of each row of field in the header information, wherein the data volume of each row of field refers to the size of each row of field in the target dbf file;
a field data mapping module, configured to map data with a preset mapping data size from the header information in the target dbf file to a specified memory, where the preset mapping data size is equal to a sum of the preset data size and the data size of the header information; determining data except the head information in the data mapped to the appointed memory as a current field block; the first block field data refers to field data with a preset data size, which is positioned behind the head information and is close to the head information, in the target dbf file;
a line-by-line acquisition module, configured to sequentially acquire field data of each line in the current field block line by line according to the field data amount of each line;
the field data analysis module is used for analyzing the acquired field data to obtain an analyzed data value;
the circulation processing module is used for mapping next field data of the target dbf file to the designated memory to serve as a new current field block until all field data of the target dbf file are acquired, and returning to trigger the progressive acquisition module and the field data analysis module in sequence, wherein the next field data refers to the field data which is positioned behind the current field block and is close to the current field block in the target dbf file and has the preset data size;
the preset data size can be preset through the following steps:
the available space acquisition module is used for acquiring the current available space of the appointed memory;
the mapping space determining module is used for determining a mapping space of the appointed memory for mapping field data according to a preset memory use proportion and the current available space;
the first numerical value calculation module is used for dividing the mapping space by the data volume of each row of field to obtain a first numerical value;
the rounding module is used for rounding the first numerical value, calculating the product of the rounded first numerical value and the data volume of each row of field, and obtaining a second numerical value serving as the preset data volume.
5. The big data file reading apparatus of claim 4, wherein the field data mapping module comprises:
a first mapping unit, configured to map data with a preset mapping data size from the header information in the target dbf file to a specified memory, where the preset mapping data size is equal to a sum of the preset data size and the data size of the header information;
and the field block determining unit is used for determining data except the header information in the data mapped to the appointed memory as a current field block.
6. The large data file reading apparatus of claim 4, wherein the large data file reading apparatus further comprises:
the head information deleting module is used for deleting the head information of the target dbf file to obtain a new target dbf file;
the field data mapping module is specifically configured to: and mapping the field data with the preset data size at the head part in the target dbf file to a specified memory to serve as a current field block.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the big data file reading method according to any of claims 1 to 3 when the computer program is executed.
8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the big data file reading method according to any of claims 1 to 3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811203112.1A CN109284260B (en) 2018-10-16 2018-10-16 Big data file reading method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109284260A CN109284260A (en) 2019-01-29
CN109284260B CN109284260B (en) 2023-10-13

Family

ID=65177219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811203112.1A Active CN109284260B (en) 2018-10-16 2018-10-16 Big data file reading method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109284260B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111752814B (en) * 2020-05-26 2022-05-31 苏州浪潮智能科技有限公司 Batch processing method and system for RMT (remote metering test) data

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7293032B1 (en) * 2002-12-11 2007-11-06 Ncr Corp. Compressing decimal types
US7543019B1 (en) * 2004-03-31 2009-06-02 Emc Corporation Methods and apparatus providing backward compatibility for applications that access a changing object model
US20140207771A1 (en) * 2013-01-21 2014-07-24 Snap-On Incorporated Methods and Systems for Mapping Repair Orders within a Database
CN105446991A (en) * 2014-07-07 2016-03-30 阿里巴巴集团控股有限公司 Data storage method, query method and device
CN107797883A (en) * 2016-09-06 2018-03-13 南京中兴新软件有限责任公司 A kind of backup of memory database, restoration methods and device
CN108153919A (en) * 2018-02-28 2018-06-12 弘成科技发展有限公司 DBF data export platform and its deriving method
CN108280223A (en) * 2018-02-09 2018-07-13 弘成科技发展有限公司 DBF data for College Enrollment import platform and introduction method

Also Published As

Publication number Publication date
CN109284260A (en) 2019-01-29

Similar Documents

Publication Publication Date Title
CN109597571B (en) Data storage method, data reading method, data storage device, data reading device and computer equipment
CN110795171B (en) Service data processing method, device, computer equipment and storage medium
CN109040191B (en) File downloading method and device, computer equipment and storage medium
CN111949389B (en) Slurm-based information acquisition method and device, server and computer-readable storage medium
CN112613271A (en) Data paging method and device, computer equipment and storage medium
CN109656474B (en) Data storage method and device, computer equipment and storage medium
CN109284260B (en) Big data file reading method and device, computer equipment and storage medium
CN109522273B (en) Method and device for realizing data writing
CN108829345B (en) Data processing method of log file and terminal equipment
CN116225314A (en) Data writing method, device, computer equipment and storage medium
CN113704027A (en) File aggregation compatible method and device, computer equipment and storage medium
CN112800123A (en) Data processing method, data processing device, computer equipment and storage medium
CN112732819A (en) ETL-based data processing method, device, equipment and storage medium
CN109582516B (en) SSD back-end performance analysis method and device, computer equipment and storage medium
CN112783866A (en) Data reading method and device, computer equipment and storage medium
CN111241818B (en) Word slot filling method, device, equipment and storage medium
CN114003576B (en) Method and device for calculating file traversal progress, computer equipment and storage medium
CN113608675B (en) RAID data IO processing method and device, computer equipment and medium
CN114327274B (en) Mapping table loading checking method and device based on solid state disk and computer equipment
CN113535646A (en) Mirror image file uploading method, device, equipment and medium based on cloud platform
CN110147384B (en) Data search model establishment method, device, computer equipment and storage medium
CN117453643B (en) File caching method, device, terminal and medium based on distributed file system
CN112001805B (en) Medical insurance data processing method, device, equipment and medium based on fixed time window
CN114185620B (en) Method and device for realizing acceleration of SSD firmware loading, computer equipment and storage medium
CN110162561B (en) Offline compression method, offline compression device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant