CN107463332B - File segmentation method and device - Google Patents

File segmentation method and device

Info

Publication number
CN107463332B
Authority
CN
China
Prior art keywords
file
segmentation
value
processing
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610390987.1A
Other languages
Chinese (zh)
Other versions
CN107463332A (en
Inventor
傅海雯
陈思羽
吴国钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610390987.1A
Publication of CN107463332A
Application granted
Publication of CN107463332B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1847 File system types specifically adapted to static storage, e.g. adapted to flash memory or SSD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a file segmentation method and a file segmentation device. The method comprises: acquiring server system resources, and calculating a current segmentation threshold according to the system resources; segmenting the file to be processed according to the current segmentation threshold, and performing file processing on the segmentation result; and acquiring and storing the file processing result. Dynamic segmentation of the file to be computed is thereby achieved.

Description

File segmentation method and device
Technical Field
The application belongs to the field of data calculation, and particularly relates to a file segmentation method and device.
Background
When a large file is processed in big-data computation, loading it directly into memory for calculation usually causes a memory overflow in the operating system. To prevent this, the file is segmented before it is loaded. A common practice in the prior art is to segment the large file by a fixed value, for example a fixed file size or a fixed number of lines; in real-time computation, the result is output only after the final calculation has finished.
However, the prior-art segmentation method does not take into account the free memory of the application server or how busy its CPU is. During real-time computation, a segment that is too large can still cause a memory overflow, and because the loaded files are calculated in sequence, a failure in the calculation of any one file means that no staged (intermediate) result can be obtained.
Therefore, a new file segmentation method is urgently needed.
Disclosure of Invention
In view of the above, the present application provides a file splitting method and device.
In order to solve the technical problem, the application discloses a file segmentation method and a file segmentation device.
The file segmentation method comprises the following steps:
acquiring server system resources, and calculating a current segmentation threshold according to the system resources;
segmenting the file to be processed according to the current segmentation threshold, and performing file processing on the segmentation result;
and acquiring and storing the file processing result.
Wherein the system resources specifically include: the idle rate of the CPU of the server and the idle value of the memory.
Wherein calculating the current segmentation threshold according to the system resources specifically comprises: performing a weighted summation of the idle rate of the CPU and the idle value of the memory at the current moment, and taking the result of the weighted summation as the current segmentation threshold.
Wherein segmenting the file to be processed according to the current segmentation threshold specifically comprises: using a command line tool to segment the file to be processed according to the current segmentation threshold.
The application further discloses a file segmentation method comprising the following steps:
segmenting the file to be processed according to a preset segmentation threshold to obtain a segmented part and a remaining part;
performing file processing on the segmented part, acquiring a segmentation correction value according to the system resources of the server at the time of the file processing, and updating the preset segmentation threshold with the segmentation correction value for the next segmentation of the remaining part;
and acquiring and storing the result of the file processing performed on the segmented part.
Wherein the system resources specifically include: the idle rate of the CPU of the server and the idle value of the memory.
Wherein acquiring the segmentation correction value specifically comprises: performing a weighted summation of the idle rate of the CPU and the idle value of the memory, and taking the result of the weighted summation as the segmentation correction value.
Wherein segmenting the file to be processed according to the preset segmentation threshold to obtain the segmented part and the remaining part specifically comprises: using a command line tool to segment the file to be processed according to the segmentation threshold.
Wherein acquiring the segmentation correction value according to the system resources further comprises: when the idle rate of the server CPU is judged to be greater than a preset first threshold and the idle value of the memory is judged to be greater than a preset second threshold, taking the preset segmentation threshold as the segmentation correction value.
The application provides a file segmentation device comprising the following modules:
a preprocessing module, configured to acquire system resources of a server and calculate a current segmentation threshold according to the system resources;
a first segmentation module, configured to segment the file to be processed according to the current segmentation threshold;
a first processing module, configured to perform file processing on the segmentation result;
and a first storage module, configured to acquire and store the file processing result.
Wherein the system resources specifically include: the idle rate of the CPU of the server and the idle value of the memory.
Further, the preprocessing module is specifically configured to: perform a weighted summation of the idle rate of the CPU and the idle value of the memory at the current moment, and take the result of the weighted summation as the current segmentation threshold.
Further, the first segmentation module is specifically configured to: use a command line tool to segment the file to be processed according to the current segmentation threshold.
The application further provides a file segmentation device comprising the following modules:
a second segmentation module, configured to segment the file to be processed according to a preset segmentation threshold to obtain a segmented part and a remaining part;
a second processing module, configured to perform file processing on the segmented part, acquire a segmentation correction value according to the system resources of the server at the time of the file processing, and update the preset segmentation threshold with the segmentation correction value for the next segmentation of the remaining part;
and a second storage module, configured to acquire and store the file processing result.
Wherein the system resources specifically include: the idle rate of the CPU of the server and the idle value of the memory.
Wherein the second processing module is specifically configured to: perform a weighted summation of the idle rate of the CPU and the idle value of the memory, and take the result of the weighted summation as the segmentation correction value.
Wherein the second segmentation module is specifically configured to: use a command line tool to segment the file to be processed according to the segmentation threshold.
Wherein the second processing module is further specifically configured to: when the idle rate of the server CPU is judged to be greater than a preset first threshold and the idle value of the memory is judged to be greater than a preset second threshold, take the preset segmentation threshold as the segmentation correction value.
Compared with the prior art, the application can obtain the following technical effects:
1) the large file to be computed is segmented dynamically according to the free memory and the CPU load of the server, which prevents the server memory from overflowing because an excessively large file is loaded;
2) the calculation results are collected and stored in real time, which avoids the situation where a calculation failure leaves no calculation result available for output.
Of course, it is not necessary for any one product to achieve all of the above-described technical effects simultaneously.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a technical flowchart of a first embodiment of the present application;
FIG. 2 is a technical flowchart of a second embodiment of the present application;
FIG. 3 is a schematic structural diagram of an apparatus according to a third embodiment of the present application;
FIG. 4 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail with reference to the drawings and examples, so that how to implement technical means to solve technical problems and achieve technical effects of the present application can be fully understood and implemented.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As used in the specification and in the claims, certain terms are used to refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names. This specification and the claims do not intend to distinguish between components that differ in name but not in function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion, and thus should be interpreted to mean "include, but not limited to". "Substantially" means within an acceptable error range, within which a person skilled in the art can solve the technical problem and substantially achieve the technical effect. Furthermore, the term "coupled" is intended to encompass any direct or indirect electrical coupling. Thus, if a first device couples to a second device, that connection may be through a direct electrical coupling or through an indirect electrical coupling via other devices and couplings. The description which follows is a preferred embodiment of the present application, but is made for the purpose of illustrating the general principles of the application and not for the purpose of limiting the scope of the application. The protection scope of the present application shall be subject to the definitions of the appended claims.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such product or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the product or system that includes the element.
Fig. 1 is a technical flowchart of a first embodiment of the present application. With reference to Fig. 1, the file segmentation method of the present application specifically includes the following steps:
step 110: acquiring server system resources, and calculating a current segmentation threshold according to the system resources;
step 120: segmenting the file to be processed according to the current segmentation threshold, and performing file processing on the segmentation result;
step 130: acquiring and storing the file processing result.
In step 110, the server refers to any device capable of performing data calculation; its hardware includes at least a central processing unit (CPU), memory that provides working space for data processing, and hard disk space for storing processing results. The system resources of the server are specifically the CPU idle rate of the server and the idle value of the memory.
Specifically, the CPU idle rate is generally expressed by the idle value, that is, the value reported for the System Idle Process; the larger this value, the lower the CPU occupancy. For example, when the "System Idle Process" accounts for 2% of the resources, only 2% of the machine's resources are currently idle; the other 98% is occupied by other programs, or the machine may be infected with a virus. In other words, the larger the share attributed to the System Idle Process, the more resources are available to the system. In this embodiment, a shell script may be used to collect the CPU idle rate of the server in real time; of course, any computer language, software, or hardware capable of collecting the server resource utilization in this embodiment is within the scope of this application.
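As an illustration of collecting the CPU idle rate, the following is a minimal sketch in Python rather than shell. It assumes a Linux server that exposes cumulative CPU tick counters in /proc/stat, which the patent does not specify; the function names are hypothetical. The idle rate is taken as the share of idle ticks between two samples.

```python
import time
from typing import Tuple


def parse_cpu_times(stat_text: str) -> Tuple[int, int]:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            ticks = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
            # guest and guest_nice are already included in user and nice, so only
            # the first eight fields are summed for the total.
            return ticks[3], sum(ticks[:8])
    raise ValueError("no aggregate 'cpu' line found")


def cpu_idle_rate(before: str, after: str) -> float:
    """Fraction of CPU time spent idle between two /proc/stat snapshots (0.0 to 1.0)."""
    idle0, total0 = parse_cpu_times(before)
    idle1, total1 = parse_cpu_times(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("snapshots must be taken at increasing times")
    return (idle1 - idle0) / elapsed


def sample_cpu_idle_rate(interval_s: float = 1.0, path: str = "/proc/stat") -> float:
    """Take two snapshots interval_s apart and return the idle rate (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.read()
    return cpu_idle_rate(before, after)
```

On a Linux host, `sample_cpu_idle_rate()` would return a value such as 0.8 for a CPU that was 80% idle during the sampling interval.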
Meanwhile, in the embodiment of the application, when the CPU idle rate of the server at the current moment is obtained, the remaining memory of the server at the current moment also needs to be obtained. The server memory is mainly used for temporary data and caching: data being processed is placed in memory so that it does not have to be read directly from the hard disk, which speeds up file reads during data processing. The server system itself occupies little memory, and the amount of memory consumed depends mainly on the amount of data the server processes. Therefore, before the file is segmented, the remaining memory is used as the basis for segmentation, so that the size of the loaded file can be adjusted dynamically and a memory overflow caused by loading an excessively large file is avoided.
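The remaining memory can be read in a similar way. The sketch below again assumes a Linux server, where /proc/meminfo reports values in kB; the patent does not name a data source, and the function name is hypothetical. It prefers the MemAvailable field and falls back to MemFree.

```python
def free_memory_mb(meminfo_text: str) -> float:
    """Return the available memory in MB from the text of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])  # /proc/meminfo reports kB
    kb = values.get("MemAvailable", values.get("MemFree"))
    if kb is None:
        raise ValueError("neither MemAvailable nor MemFree found")
    return kb / 1024
```

On a Linux host it could be called as `free_memory_mb(open("/proc/meminfo").read())`.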
Specifically, after the system resources of the server at the current time are acquired, the idle rate of the CPU and the idle value of the memory are weighted and summed, and the result of the weighted summation is used as the segmentation threshold at the current time. The specific weighted-sum formula is as follows:
segmentation threshold = CPU idle rate + (memory idle value / 64) × N, where N is an integer representing the number of memory blocks, and the memory idle value is in MB.
The CPU idle rate has little influence on the calculated segmentation threshold, so the formula may also take the following form:
segmentation threshold = (memory idle value / 64) × N, where N is an integer representing the number of memory blocks, and the memory idle value is in MB.
For example, on an application server whose CPU idle rate at a certain time is 80% and whose free memory is 8000 MB, the file segmentation threshold at that time, taking N = 2 in the first formula, is: 0.8 + 8000/64 × 2 = 250.8 MB. When a file to be processed is smaller than 250.8 MB, the number of processing passes increases without affecting the processing speed; when it is larger than 250.8 MB, the CPU load may become excessive and the memory may overflow.
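The weighted sum can be checked with a few lines of Python. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and N = 2 is inferred from the 250.8 MB figure in the example rather than stated for it.

```python
def segmentation_threshold_mb(cpu_idle_rate: float,
                              free_memory_mb: float,
                              n_blocks: int,
                              include_cpu: bool = True) -> float:
    """Segmentation threshold in MB: CPU idle rate + (free memory / 64) * N.

    With include_cpu=False the simplified form (free memory / 64) * N is used.
    """
    memory_term = free_memory_mb / 64 * n_blocks
    return cpu_idle_rate + memory_term if include_cpu else memory_term


# Example from the text: 80% idle CPU, 8000 MB free memory, N assumed to be 2.
print(round(segmentation_threshold_mb(0.8, 8000, 2), 3))  # 250.8
```

The simplified form differs by less than 1 MB here (250.0 MB), which is consistent with the statement that the CPU idle rate has little influence on the result.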
In step 120, after the current segmentation threshold is obtained, the file is segmented; specifically, a command line tool of the system may be used for the segmentation.
A command line tool works at the prompt that the operating system displays to request command input. The prompt differs from one operating system environment to another. In the Windows environment, the command line program is cmd.exe, a 32-bit command line program; it is the Windows-based command interpreter of the Microsoft Windows system, similar to Microsoft's DOS operating system. cmd.exe executes the commands entered at the prompt; for example, entering shutdown -s shuts the machine down after 30 seconds. In the embodiment of the application, the cut command line tool may be used to segment the file. Because a segmentation tool bundled with the Microsoft Windows system is used, the approach is simple, easy to use, and free, and achieves low-cost file segmentation.
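For illustration only, the sketch below performs the size-based segmentation in Python instead of invoking a command line tool; it is a stand-in for whichever tool an implementation actually uses, and the function name and part-file naming scheme are hypothetical. Each part holds at most `threshold_bytes` bytes.

```python
from pathlib import Path
from typing import List, Union


def split_file(src: Union[str, Path], threshold_bytes: int,
               out_dir: Union[str, Path]) -> List[Path]:
    """Write src as consecutive part files of at most threshold_bytes each."""
    if threshold_bytes <= 0:
        raise ValueError("threshold_bytes must be positive")
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    with src.open("rb") as f:
        index = 0
        while True:
            chunk = f.read(threshold_bytes)  # at most one segment is held in memory
            if not chunk:
                break
            part = out_dir / f"{src.name}.part{index:04d}"
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts
```

A threshold expressed in MB, such as 250.8, would be converted to bytes before the call, for example with `int(250.8 * 1024 * 1024)`.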
In step 120, once the size to be segmented is known, the file can be segmented directly and then processed. The file processing may include any form of processing, such as compressing, computing, storing, transmitting, or playing the file. For example, when the file processing is file calculation, the current segmentation threshold is obtained before the calculation from the system resource usage at the current time; the file to be processed is segmented according to this threshold into a first part to be calculated and a remaining second part, and the first part is loaded into memory for calculation. While the first part is being calculated, the system resource usage continues to be acquired and is used to calculate the segmentation threshold for the next file calculation; once the new threshold is obtained, the second part is segmented according to it. In this way the most reasonable file size is obtained for each file calculation.
It should be noted that, in the embodiment of the present application, the result of each segmentation is the minimum unit of one round of file processing. This unit is limited by system resources, and the current segmentation threshold is not necessarily the same at each time, so the minimum unit of file processing is usually not fixed.
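The varying segment size can be sketched as a Python generator that asks for a fresh threshold before cutting each segment. The names are hypothetical, and the threshold source is passed in as a callable so that any resource-sampling routine can be plugged in; this is an illustration of the idea, not the patented implementation.

```python
import io
from typing import BinaryIO, Callable, Iterator


def dynamic_segments(stream: BinaryIO,
                     current_threshold_bytes: Callable[[], int]) -> Iterator[bytes]:
    """Yield consecutive segments whose size follows the threshold at each step."""
    while True:
        size = max(1, int(current_threshold_bytes()))  # re-evaluated for every segment
        segment = stream.read(size)
        if not segment:
            return
        yield segment


# Usage with a scripted sequence of thresholds (4, 2, then 8 bytes).
sizes = iter([4, 2, 8])
for seg in dynamic_segments(io.BytesIO(b"abcdefghij"), lambda: next(sizes, 8)):
    print(seg)
```

Because the generator is lazy, the threshold for a segment is only requested when the consumer has finished with the previous one, so each segment reflects the resources available at that moment.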
In step 130, an additional thread collects the processing results of the server in real time and stores them on the hard disk, and the complete result is output after all segmented parts have been processed. This avoids the situation where a processing failure or timeout in one stage leaves no processing result available for output.
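A minimal sketch of such a collector thread follows, using Python's standard `threading` and `queue` modules. The function names and the one-result-per-line file format are assumptions made for illustration. Results already written remain on disk even if a later segment fails.

```python
import queue
import threading
from typing import Callable, Iterable

_DONE = object()  # sentinel telling the collector to stop


def start_result_collector(results: "queue.Queue", path: str) -> threading.Thread:
    """Start a thread that appends every queued result to path as it arrives."""
    def run() -> None:
        with open(path, "a", encoding="utf-8") as out:
            while True:
                item = results.get()
                if item is _DONE:
                    break
                out.write(f"{item}\n")
                out.flush()  # hand the staged result to the OS immediately
    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def process_segments(segments: Iterable[bytes],
                     process: Callable[[bytes], object],
                     path: str) -> None:
    """Process segments in order; each result is stored as soon as it exists."""
    results: "queue.Queue" = queue.Queue()
    collector = start_result_collector(results, path)
    try:
        for segment in segments:
            results.put(process(segment))
    finally:
        results.put(_DONE)  # always let the collector drain and exit
        collector.join()
```

If `process` raises on the third segment, the exception still propagates to the caller, but the results of the first two segments are already in the file.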
In this embodiment, the system resources of the server are collected in advance, and the file is segmented for processing according to the remaining free memory and the CPU load of the server, which prevents the server memory from overflowing because the file is too large. At the same time, the processing results are collected and stored in real time, so that, compared with the prior art, the embodiment avoids the situation where a processing failure leaves no processing result available for output.
Fig. 2 is a technical flowchart of a second embodiment of the present application. With reference to Fig. 2, the file segmentation method of the present application specifically includes the following steps:
step S210: segmenting the file to be processed according to a preset segmentation threshold to obtain a segmented part and a remaining part;
step S220: performing file processing on the segmented part, acquiring a segmentation correction value according to the system resources of the server at the time of the file processing, and updating the preset segmentation threshold with the segmentation correction value for the next segmentation of the remaining part;
step S230: acquiring and storing the file processing result.
Specifically, in step S210, the preset segmentation threshold may be set in advance or obtained according to the current system resources of the server. It is obtained in one of two ways: first, before any part of the file to be processed has been processed, it is calculated from the system resources of the server in its current state; second, during processing, while one of the small segments is being processed, a reasonable segment size for the next round of processing is calculated from the system resources remaining at that time.
The segmented part is the small file that currently needs to be processed. The remaining part is what is left of the large file after the small file has been cut off; it is segmented again after the file segmentation threshold has been updated, and then calculated.
Specifically, in step S220, the file processing may include any form of processing, such as compression, calculation, storage, transmission, or playing of the file. The small file obtained by the segmentation in step S210 is processed, and the amount of system resources at the current moment is acquired at the same time. Each segmentation of the file to be processed requires recalculating the threshold to be used for the next round. While the current small file is being processed, the next file segmentation threshold, that is, the segmentation correction value, is calculated in parallel, and the preset segmentation threshold is updated with it. As soon as the current small file has been processed, the remaining part of the file to be processed can therefore be segmented and loaded directly according to the updated threshold, which saves time and improves efficiency.
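The overlap between processing the current segment and calculating the next threshold can be sketched with a single background worker from Python's `concurrent.futures`. All names are hypothetical, and `next_threshold_bytes` stands for whatever routine samples the server resources; this is one possible arrangement, not necessarily the one used in the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Callable, List


def process_with_lookahead(stream: BinaryIO,
                           initial_bytes: int,
                           process: Callable[[bytes], object],
                           next_threshold_bytes: Callable[[], int]) -> List[object]:
    """Process segments while the correction value for the next cut is computed."""
    results: List[object] = []
    threshold = initial_bytes  # the preset segmentation threshold
    with ThreadPoolExecutor(max_workers=1) as pool:
        while True:
            segment = stream.read(threshold)  # segmented part; the rest stays in stream
            if not segment:
                break
            # Sample resources in the background while this segment is processed.
            pending = pool.submit(next_threshold_bytes)
            results.append(process(segment))
            # Update the preset threshold with the correction value.
            threshold = max(1, int(pending.result()))
    return results
```

By the time `process` returns, the correction value is usually already available, so the next cut can start without waiting.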
The system resources of the server are specifically the CPU idle rate of the server and the idle value of the memory.
Specifically, the CPU idle rate is generally expressed by the idle value, that is, the value reported for the System Idle Process; the larger this value, the lower the CPU occupancy. For example, when the "System Idle Process" accounts for 2% of the resources, only 2% of the machine's resources are currently idle; the other 98% is occupied by other programs, or the machine may be infected with a virus. In other words, the larger the share attributed to the System Idle Process, the more resources are available to the system. In this embodiment, a shell script may be used to collect the CPU idle rate of the server in real time; of course, any computer language, software, or hardware capable of collecting the server resource utilization in this embodiment is within the scope of this application.
Meanwhile, in the embodiment of the application, when the CPU idle rate of the server is obtained, the remaining memory of the server also needs to be obtained. The server memory is mainly used for temporary data and caching: data being processed is placed in memory so that it does not have to be read directly from the hard disk, which speeds up file reads during data processing. The server system itself occupies little memory, and the amount of memory consumed depends mainly on the amount of data the server processes. Therefore, before the file is segmented, the remaining memory is used as the basis for segmentation, so that the size of the loaded file can be adjusted dynamically and a memory overflow caused by loading an excessively large file is avoided.
Specifically, after the system resources of the server are acquired, the idle rate of the CPU and the idle value of the memory are weighted and summed, and the result of the weighted summation is used as the file segmentation threshold. The specific weighted-sum formula is as follows:
segmentation threshold = CPU idle rate + (memory idle value / 64) × N, where N is an integer representing the number of memory blocks, and the memory idle value is in MB.
The CPU idle rate has little influence on the calculated file segmentation threshold, so the formula may also take the following form:
segmentation threshold = (memory idle value / 64) × N, where N is an integer representing the number of memory blocks, and the memory idle value is in MB.
For example, on an application server whose CPU idle rate is 80% and whose free memory is 8000 MB, the file segmentation threshold, taking N = 2 in the first formula, is: 0.8 + 8000/64 × 2 = 250.8 MB. When a file to be processed is smaller than 250.8 MB, the number of processing passes increases without affecting the processing speed; when it is larger than 250.8 MB, the CPU load may become excessive and the memory may overflow.
While the segmented part of 250.8 MB is being processed, the current remaining amount of the server system resources continues to be acquired. Assuming that, as the number of processing tasks increases, the CPU idle rate is now 75% and the free memory is 7500 MB, the segmentation correction value is: 0.75 + 7500/64 × 2 = 235.125 MB. That is, the next small file to take part in the file processing is processed most efficiently when it is cut to 235.125 MB.
While the segmented part of 235.125 MB is being processed, the current remaining amount of the server system resources continues to be acquired, and the file segmentation threshold for the next segmentation of the remaining part of the file to be processed is calculated. In this way the file segmentation threshold is updated continuously and dynamically according to the current remaining system resources of the server.
Specifically, in step S230, an additional thread collects the processing results of the server in real time and stores them on the hard disk, and the complete result is output after all segmented parts have been processed. This avoids the situation where a processing failure or timeout in one stage leaves no processing result available for output.
It should be noted that, in this embodiment of the application, the system resources of the server are acquired while file processing is performed on the segmentation result; when the idle rate of the server CPU is judged to be greater than a preset first threshold and the idle value of the memory is judged to be greater than a preset second threshold, the preset segmentation threshold is used as the segmentation correction value. That is, if the memory remaining for file processing is large and fluctuates little, then after the first round of processing the large file can be segmented into small files of the same size using the same file segmentation threshold. This saves the time spent calculating the segmentation correction value and avoids the CPU usage caused by excessive calculation.
It should also be noted that, in the embodiment of the present application, the result of each segmentation is the minimum unit of one round of file processing. This unit is limited by system resources, and the current segmentation threshold is not necessarily the same at each time, so the minimum unit of file processing is usually not fixed.
In this embodiment, the system resources of the server are acquired continuously while a large file is segmented and calculated, and the large file is segmented dynamically according to the free memory and the CPU load of the server, which prevents the server memory from overflowing because an excessively large file is loaded. At the same time, if the remaining system memory is large and changes little, the large file is segmented using a fixed file segmentation threshold, which saves time and improves efficiency.
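The rule for reusing the preset threshold can be written as a small Python function. The names are hypothetical, and the floor values used in the example (0.7 for the CPU idle rate and 7800 MB for the memory) are illustrative assumptions only, since the patent does not give concrete first and second thresholds.

```python
def segmentation_correction_mb(preset_mb: float,
                               cpu_idle_rate: float,
                               free_memory_mb: float,
                               *,
                               cpu_idle_floor: float,
                               memory_floor_mb: float,
                               n_blocks: int) -> float:
    """Return the threshold (in MB) to use for the next segmentation.

    If both resources are above their preset floors, the preset threshold is
    reused unchanged; otherwise the weighted sum is recalculated.
    """
    if cpu_idle_rate > cpu_idle_floor and free_memory_mb > memory_floor_mb:
        return preset_mb
    return cpu_idle_rate + free_memory_mb / 64 * n_blocks


# Plenty of resources: keep the preset value of 250.8 MB.
print(segmentation_correction_mb(250.8, 0.9, 9000,
                                 cpu_idle_floor=0.7, memory_floor_mb=7800, n_blocks=2))
# Resources have dropped: recalculate with the weighted sum.
print(segmentation_correction_mb(250.8, 0.75, 7500,
                                 cpu_idle_floor=0.7, memory_floor_mb=7800, n_blocks=2))
```

With these assumed floors, the first call keeps 250.8 MB and the second falls back to the weighted sum 0.75 + 7500/64 × 2 = 235.125 MB.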
Fig. 3 is a schematic structural diagram of an apparatus according to a third embodiment of the present application. With reference to Fig. 3, the file splitting apparatus of the present application specifically includes:
the preprocessing module 310 is configured to obtain server system resources, and calculate a current segmentation threshold according to the system resources;
a first segmentation module 320, configured to segment the file to be processed according to the current segmentation threshold;
the first processing module 330 is configured to perform file processing on the segmentation result;
the first storage module 340 is configured to obtain and store the result of the file processing.
Wherein the system resources specifically include: the idle rate of the CPU of the server and the idle value of the memory.
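As an illustration of how these two quantities might be obtained, the sketch below parses the text of `/proc/stat` and `/proc/meminfo`. It assumes a Linux server; the source does not say how the idle rate and idle memory are measured, and the function names are hypothetical.

```python
def cpu_idle_rate(stat_before: str, stat_after: str) -> float:
    """Fraction of CPU time spent idle between two /proc/stat snapshots."""

    def idle_and_total(snapshot: str) -> tuple:
        lines = snapshot.splitlines()
        fields = lines[0].split() if lines else []
        if not fields or fields[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line first")
        ticks = [int(value) for value in fields[1:]]
        # Order: user nice system idle iowait irq softirq steal [guest guest_nice].
        # Guest time is already counted in user/nice, so sum only the first eight.
        return ticks[3], sum(ticks[:8])

    idle_0, total_0 = idle_and_total(stat_before)
    idle_1, total_1 = idle_and_total(stat_after)
    elapsed = total_1 - total_0
    if elapsed <= 0:
        raise ValueError("snapshots must be taken at different times")
    return (idle_1 - idle_0) / elapsed


def mem_idle_mb(meminfo: str) -> float:
    """Memory available for new work, in MB, from /proc/meminfo text."""
    for line in meminfo.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024  # the value is reported in kB
    raise ValueError("MemAvailable not found")
```

On a Linux host the inputs would be the contents of `/proc/stat`, read twice a short interval apart, and of `/proc/meminfo`.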
The preprocessing module 310 is specifically configured to perform a weighted summation of the idle rate of the CPU and the idle value of the memory at the current moment, and to take the result of the weighted summation as the current segmentation threshold.
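The weighted summation can be written out as a short sketch. The weights below are placeholders chosen only to keep the arithmetic readable; this passage does not give their values, and the choice of MB as the unit of the result is likewise an assumption.

```python
def segmentation_threshold_mb(
    cpu_idle_rate: float,
    mem_idle_mb: float,
    cpu_weight: float = 64.0,   # hypothetical weight, MB per unit of idle rate
    mem_weight: float = 0.25,   # hypothetical weight, fraction of idle memory
) -> float:
    """Weighted sum of CPU idle rate (0..1) and idle memory (MB)."""
    if not 0.0 <= cpu_idle_rate <= 1.0:
        raise ValueError("cpu_idle_rate must lie between 0 and 1")
    if mem_idle_mb < 0:
        raise ValueError("mem_idle_mb must not be negative")
    return cpu_weight * cpu_idle_rate + mem_weight * mem_idle_mb
```

With these placeholder weights, a half-idle CPU and 1024 MB of idle memory give 64 × 0.5 + 0.25 × 1024 = 288 MB.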
The first segmentation module 320 is specifically configured to use a command line tool to segment the file to be processed according to the current segmentation threshold.
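One possible command line tool is the POSIX `split` utility; this is an assumption, since the passage does not name the tool. The sketch below only builds and runs the command, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def split_command(source: Path, threshold_bytes: int, prefix: Path) -> List[str]:
    """Build a `split` invocation that cuts `source` into threshold-sized pieces."""
    if threshold_bytes <= 0:
        raise ValueError("threshold must be a positive number of bytes")
    # POSIX form: split -b <bytes> <file> <output-name-prefix>
    return ["split", "-b", str(threshold_bytes), str(source), str(prefix)]


def run_split(source: Path, threshold_bytes: int, prefix: Path) -> None:
    """Run the command; raises CalledProcessError if `split` fails."""
    subprocess.run(split_command(source, threshold_bytes, prefix), check=True)
```

Note that `split -b` cuts the whole input into equal pieces, which corresponds to the fixed-threshold case; an adaptive loop would cut off one piece at a time with a freshly computed threshold.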
The embodiment shown in Fig. 3 may perform the method described in the embodiment shown in Fig. 1; its implementation principle and technical effect are not described again here.
Fig. 4 is a schematic structural diagram of a device according to a fourth embodiment of the present application. With reference to Fig. 4, the file splitting device of the present application specifically includes:
a second segmentation module 410, configured to segment the file to be processed according to a preset segmentation threshold to obtain a segmented portion and a remaining portion;
a second processing module 420, configured to perform file processing on the segmented portion, obtain a segmentation correction value according to the system resources of the server at the time of file processing, and update the preset segmentation threshold with the segmentation correction value for the next segmentation of the remaining portion;
the second storage module 430 is configured to obtain and store a result of the file processing.
Wherein the system resources specifically include: the idle rate of the CPU of the server and the idle value of the memory.
The second processing module 420 is specifically configured to perform a weighted summation of the idle rate of the CPU and the idle value of the memory, and to take the result of the weighted summation as the segmentation correction value.
The second segmentation module 410 is specifically configured to use a command line tool to segment the file to be processed according to the segmentation threshold.
The second processing module 420 is further specifically configured to take the preset segmentation threshold as the segmentation correction value when the idle rate of the server CPU is determined to be greater than a preset first threshold and the idle value of the memory is determined to be greater than a preset second threshold.
The embodiment shown in Fig. 4 may perform the method described in the embodiment shown in Fig. 2; its implementation principle and technical effect are not described again here.
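The cooperation of the modules in Fig. 4 amounts to a loop: cut one portion with the current threshold, process it, obtain a correction value, and repeat on the remainder. The generator below is a minimal sketch of that loop under stated assumptions: the file is read as a byte stream, the threshold is in bytes, and `correction` is a hypothetical callback standing in for the measurement of system resources.

```python
import io
from typing import BinaryIO, Callable, Iterator


def adaptive_pieces(
    stream: BinaryIO,
    preset_bytes: int,
    correction: Callable[[], int],
) -> Iterator[bytes]:
    """Yield successive portions of `stream`, resizing after each one."""
    threshold = preset_bytes  # the first split uses the preset threshold
    while True:
        piece = stream.read(threshold)
        if not piece:
            return  # nothing remains: the whole file has been processed
        yield piece
        # Execution resumes here once the caller has processed the piece,
        # so the correction reflects resources at file-processing time.
        threshold = max(1, int(correction()))


if __name__ == "__main__":
    sizes = iter([3, 2, 5, 1])
    data = io.BytesIO(b"abcdefghij")
    for part in adaptive_pieces(data, 4, lambda: next(sizes)):
        print(part)
```

Because only one portion is read at a time, memory use is bounded by the current threshold rather than by the size of the file.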
The foregoing description shows and describes several preferred embodiments of the invention. As noted above, however, the invention is not limited to the forms disclosed herein, and this description is not to be construed as excluding other embodiments; the invention is capable of use in various other combinations, modifications, and environments, and is capable of changes within the scope of the inventive concept expressed herein, commensurate with the above teachings or with the skill or knowledge of the relevant art. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (18)

1. A method of file splitting, comprising:
acquiring server system resources, and calculating a current segmentation threshold according to the system resources; wherein the current segmentation threshold is used to represent a file size of a first part of the file processing;
segmenting the file to be processed according to the current segmentation threshold to obtain a first part for file processing and a remaining second part, and performing file processing on the first part;
updating the current segmentation threshold according to the system resources of the server during the file processing, and circularly executing the actions of file segmentation, file processing and current segmentation threshold updating on the remaining second part according to the updated current segmentation threshold until the file to be processed is processed;
and acquiring and storing the file processing result.
2. The method of claim 1, wherein the system resources specifically include: the idle rate of the CPU of the server and the idle value of the memory.
3. The method of claim 2, wherein computing the current segmentation threshold based on the system resources comprises:
performing a weighted summation of the idle rate of the CPU and the idle value of the memory at the current moment, and taking the result of the weighted summation as the current segmentation threshold.
4. The method of claim 1, wherein segmenting the file to be processed according to the current segmentation threshold specifically comprises:
using a command line tool to segment the file to be processed according to the current segmentation threshold.
5. A method of file splitting, comprising:
segmenting the file to be processed according to a preset segmentation threshold to obtain a segmented portion and a remaining portion; wherein the segmentation threshold represents the file size of the segmented portion;
performing file processing on the segmented portion, obtaining a segmentation correction value according to the system resources of the server during the file processing, updating the preset segmentation threshold with the segmentation correction value, and cyclically executing the actions of file segmentation, file processing and segmentation threshold updating on the remaining portion until the file to be processed has been completely processed;
and acquiring and storing the file processing result.
6. The method of claim 5, wherein the system resources specifically include: the idle rate of the CPU of the server and the idle value of the memory.
7. The method of claim 6, wherein obtaining the segmentation correction value comprises:
performing a weighted summation of the idle rate of the CPU and the idle value of the memory, and taking the result of the weighted summation as the segmentation correction value.
8. The method of claim 5, wherein segmenting the file to be processed according to the preset segmentation threshold to obtain the segmented portion and the remaining portion specifically comprises:
using a command line tool to segment the file to be processed according to the segmentation threshold.
9. The method of claim 6, wherein obtaining the segmentation correction value according to the system resources further comprises:
when it is determined that the idle rate of the server CPU is greater than a preset first threshold and the idle value of the memory is greater than a preset second threshold, taking the preset segmentation threshold as the segmentation correction value.
10. A file splitting apparatus, comprising:
the preprocessing module is used for acquiring system resources of a server and calculating a current segmentation threshold according to the system resources; wherein the current segmentation threshold is used to represent a file size of a first part of the file processing;
the first segmentation module is used for segmenting the file to be processed according to the current segmentation threshold value to obtain a first part for processing the file and a remaining second part;
the first processing module is used for carrying out file processing on the first part;
the first circulation module is used for updating the current segmentation threshold according to the system resources of the server during the file processing, and circularly executing the actions of the file segmentation, the file processing and the current segmentation threshold updating on the remaining second part according to the updated current segmentation threshold until the file to be processed is processed;
and the first storage module is used for acquiring and storing the file processing result.
11. The apparatus of claim 10, wherein the system resources specifically comprise: the idle rate of the CPU of the server and the idle value of the memory.
12. The apparatus of claim 11, wherein the pre-processing module is specifically configured to:
perform a weighted summation of the idle rate of the CPU and the idle value of the memory at the current moment, and take the result of the weighted summation as the current segmentation threshold.
13. The apparatus of claim 10, wherein the first segmentation module is specifically configured to:
use a command line tool to segment the file to be processed according to the current segmentation threshold.
14. A file splitting apparatus, comprising:
the second segmentation module is used for segmenting the file to be processed according to a preset segmentation threshold to obtain a segmented portion and a remaining portion; wherein the segmentation threshold represents the file size of the segmented portion;
the second processing module is used for performing file processing on the segmented portion;
the second circulation module is used for obtaining a segmentation correction value according to the system resources of the server during the file processing, updating the preset segmentation threshold with the segmentation correction value, and cyclically executing the actions of file segmentation, file processing and segmentation threshold updating on the remaining portion until the file to be processed has been completely processed;
and the second storage module is used for acquiring and storing the file processing result.
15. The apparatus of claim 14, wherein the system resources specifically comprise: the idle rate of the CPU of the server and the idle value of the memory.
16. The apparatus of claim 15, wherein the second processing module is specifically configured to:
perform a weighted summation of the idle rate of the CPU and the idle value of the memory, and take the result of the weighted summation as the segmentation correction value.
17. The apparatus of claim 14, wherein the second segmentation module is specifically configured to:
use a command line tool to segment the file to be processed according to the segmentation threshold.
18. The apparatus of claim 15, wherein the second processing module is further specifically configured to:
take the preset segmentation threshold as the segmentation correction value when it is determined that the idle rate of the server CPU is greater than a preset first threshold and the idle value of the memory is greater than a preset second threshold.
CN201610390987.1A 2016-06-03 2016-06-03 File segmentation method and device Active CN107463332B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610390987.1A CN107463332B (en) 2016-06-03 2016-06-03 File segmentation method and device

Publications (2)

Publication Number Publication Date
CN107463332A CN107463332A (en) 2017-12-12
CN107463332B true CN107463332B (en) 2020-12-29

Family

ID=60545791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610390987.1A Active CN107463332B (en) 2016-06-03 2016-06-03 File segmentation method and device

Country Status (1)

Country Link
CN (1) CN107463332B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101531B (en) * 2018-06-22 2022-05-31 联想(北京)有限公司 File processing method, device and system
CN109190094B (en) * 2018-09-05 2023-03-10 盈嘉互联(北京)科技有限公司 Building information model file segmentation method based on IFC standard
CN109413509A (en) * 2018-12-06 2019-03-01 武汉微梦文化科技有限公司 A kind of HD video processing method
CN110515964A (en) * 2019-08-30 2019-11-29 百度在线网络技术(北京)有限公司 A kind of file updating method, device, electronic equipment and medium
CN114490693A (en) * 2022-02-17 2022-05-13 平安普惠企业管理有限公司 Data modification method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101557407A (en) * 2008-04-11 2009-10-14 盛大计算机(上海)有限公司 Transmission and storage method of program-ordering data of high definition media P2P
CN105528371A (en) * 2014-09-30 2016-04-27 北京金山云网络技术有限公司 Method, device, and system for executing writing task

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6463103B2 (en) * 2014-12-02 2019-01-30 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Method for writing a plurality of files, tape device system, and program

Also Published As

Publication number Publication date
CN107463332A (en) 2017-12-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant