CN112732650A - File fragmentation method and device - Google Patents


Info

Publication number
CN112732650A
Authority
CN
China
Prior art keywords
file
record
split
hash value
data file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011633475.6A
Other languages
Chinese (zh)
Inventor
郑宇惟
陈静国
刘轲
朱晓洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202011633475.6A
Publication of CN112732650A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/16File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • G06F16/137Hash-based
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a file fragmentation method and device, which can be used in the financial field or other fields. The method comprises: reading records in a data file to be split, and screening the records according to the key field corresponding to the data file to be split; after a record passes screening, determining a hash value corresponding to the record according to the key field; taking the remainder of the record's hash value divided by a preset number of fragments; and writing the record into a small fragment file in a fragment folder according to the file structure of the data file to be split, wherein the fragment folders correspond to the remainders one to one. The invention uses a hash algorithm to fragment files efficiently, splitting a large file into small fragment files while taking into account factors such as single-file hashing and data validity. It offers high processing efficiency and a low error rate, and provides a high-quality basic service for modern banking systems that frequently process massive data files.

Description

File fragmentation method and device
Technical Field
The present invention relates to the field of file splitting technologies, and in particular, to a file fragmentation method and device.
Background
At present, when a bank system processes files with large data volumes in batches, each file is read into a cache in its entirety, which causes a series of problems such as frequent I/O reads and writes, high memory usage, slow data processing and incorrect data formats. In mild cases, file processing times out and the next process is affected; in severe cases, a large amount of online transaction resources is occupied and daytime business is blocked. Existing file splitting techniques suffer from low processing efficiency and a high probability of data errors.
Disclosure of Invention
Aiming at the problems in the prior art, embodiments of the present invention mainly aim to provide a file fragmentation method and apparatus, which achieve efficient, asynchronous, and effective fragmentation of large data files into fragmented small files.
In order to achieve the above object, an embodiment of the present invention provides a file fragmentation method, where the method includes:
reading a record in a data file to be split, and screening the record according to a key field corresponding to the data file to be split;
after the record screening is passed, determining a hash value corresponding to the record according to the key field;
taking the remainder of the hash value corresponding to the record divided by a preset number of fragments;
writing the record into a small fragment file in a fragment folder according to the file structure of the data file to be split, wherein the fragment folders correspond to the remainders one to one.
Optionally, in an embodiment of the present invention, the method further includes: if the key field is of a numeric type, converting the key field into a character type.
Optionally, in an embodiment of the present invention, determining the hash value corresponding to the record according to the character-type key field includes: taking a byte array of the character-type key field according to the Chinese Internal Code Extension Specification (GBK), and writing the byte array into a character buffer; and performing shift operations and logical operations on the byte array in the character buffer, starting from a preset initial hash value, to obtain the hash value corresponding to the record.
Optionally, in an embodiment of the present invention, the method further includes: receiving a check file and a data file to be split, and verifying the data file to be split; after the data file to be split passes verification, determining a processing mode for the data file to be split according to the correspondence between the check file and the data file to be split, wherein the processing modes include a single-thread processing mode and a multi-thread asynchronous processing mode.
Optionally, in an embodiment of the present invention, writing the record into a small fragment file in a fragment folder includes: if the number of records in the small fragment file has not reached a preset maximum number of records, writing the record into the small fragment file; and if the number of records in the small fragment file has reached the preset maximum number of records, creating a new small fragment file in the fragment folder and writing the record into the newly created small fragment file.
An embodiment of the present invention further provides a file fragmentation device, where the device includes:
the record reading module is used for reading a record in a data file to be split and screening the record according to a key field corresponding to the data file to be split;
the hash value module is used for determining a hash value corresponding to the record according to the key field after the record screening is passed;
a remainder determining module, configured to take the remainder of the hash value corresponding to the record divided by a preset number of fragments;
a record writing module, configured to write the record into a small fragment file in a fragment folder according to the file structure of the data file to be split, wherein the fragment folders correspond to the remainders one to one.
Optionally, in an embodiment of the present invention, the hash value module is further configured to: if the key field is of a numeric type, convert the key field into a character type.
Optionally, in an embodiment of the present invention, the hash value module includes: a character buffer unit, configured to take a byte array of the character-type key field according to the Chinese Internal Code Extension Specification (GBK) and write the byte array into a character buffer; and a hash value unit, configured to perform shift operations and logical operations on the byte array in the character buffer, starting from a preset initial hash value, to obtain the hash value corresponding to the record.
Optionally, in an embodiment of the present invention, the device further includes: a file checking module, configured to receive a check file and a data file to be split and to verify the data file to be split; and a processing mode module, configured to determine, after the data file to be split passes verification, a processing mode for the data file to be split according to the correspondence between the check file and the data file to be split, wherein the processing modes include a single-thread processing mode and a multi-thread asynchronous processing mode.
Optionally, in an embodiment of the present invention, the record writing module is further configured to: if the number of records in the small fragment file has not reached a preset maximum number of records, write the record into the small fragment file; and if the number of records in the small fragment file has reached the preset maximum number of records, create a new small fragment file in the fragment folder and write the record into the newly created small fragment file.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the program.
The present invention also provides a computer-readable storage medium storing a computer program for executing the above method.
The invention uses a hash algorithm to fragment files efficiently, splitting a large file into small fragment files while taking into account factors such as single-file hashing and data validity. It offers high processing efficiency and a low error rate, and provides a high-quality basic service for modern banking systems that frequently process massive data files.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a flowchart of a file fragmentation method according to an embodiment of the present invention;
FIG. 2 is a flow chart of determining a hash value according to an embodiment of the present invention;
FIG. 3 is a flow chart of determining a processing mode in an embodiment of the present invention;
FIG. 4 is a flowchart of file fragmentation in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a file fragmentation device according to an embodiment of the present invention;
FIG. 6 is a block diagram of a hash value module according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention provide a file fragmentation method and device. It should be noted that the method and device can be used in the financial field or in any other field; their field of application is not limited.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, bank systems suffer from slow processing, a large amount of erroneous data, heavy resource occupation and similar problems when processing large data files. The file fragmentation method of the present invention fragments files efficiently based on a hash of the key field, and asynchronously splits multiple large files into corresponding small fragment files. FIG. 1 is a flowchart of a file fragmentation method according to an embodiment of the present invention, where the method includes:
Step S1: reading a record in the data file to be split, and screening the record according to the key field corresponding to the data file to be split.
The key field corresponding to the data file to be split is determined by the type of that file; for example, if the data file to be split is a personal customer file, the key field is the customer number. The records in the data file to be split are read one by one and screened according to the key field, and erroneous records are removed; for example, if a personal customer file mistakenly contains records with corporate customer numbers, those records are removed.
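The screening in step S1 can be sketched as a filter over the record stream. This is only an illustration: the text does not say how a wrong customer number is recognized, so the validity rule is passed in by the caller, and the rule shown here (corporate numbers start with "C") as well as the names `screen_records`, `is_personal_customer` and `cust_no` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

Record = Dict[str, str]


def screen_records(
    records: Iterable[Record],
    key_field: str,
    is_valid_key: Callable[[Optional[str]], bool],
) -> Iterator[Record]:
    """Yield only the records whose key field passes the caller's rule."""
    for record in records:
        if is_valid_key(record.get(key_field)):
            yield record


def is_personal_customer(key: Optional[str]) -> bool:
    # Hypothetical convention for illustration only:
    # corporate customer numbers start with "C".
    return not (key or "").startswith("C")


rows = [{"cust_no": "0001"}, {"cust_no": "C777"}, {"cust_no": ""}]
kept = list(screen_records(rows, "cust_no", is_personal_customer))
```

Empty keys are deliberately let through here, because the hash step described next handles NULL and empty values itself.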
Step S2: after the record passes screening, determining the hash value corresponding to the record according to the key field.
The process of taking the hash value according to the key field is as follows: the value of the key field is examined, and if it is NULL or an empty string, 0 is returned directly; otherwise, the judgment continues: if the key field is of a character type, no conversion is performed; if the key field is of a numeric type, it is converted into a character type after its leading zeros are removed. The byte array of the character-type key field is then taken according to the GBK (Chinese Internal Code Extension Specification) encoding and transferred into a character buffer. The hash value corresponding to the record is obtained by performing shift operations, logical operations and the like on the characters in the character buffer.
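The key preparation described above (NULL or empty check, leading-zero removal for numeric keys, GBK encoding) might look like the following minimal sketch. The function name `normalize_key` and the `is_numeric` flag are assumptions; so is the choice to map an all-zero numeric key to "0", which the text does not address.

```python
from typing import Optional


def normalize_key(value: Optional[str], is_numeric: bool) -> Optional[bytes]:
    """Return the GBK bytes to hash, or None when the hash should be 0."""
    if value is None or value == "":
        return None  # caller returns a hash of 0 for NULL or empty keys
    if is_numeric:
        # Drop leading zeros; keep a single "0" for an all-zero key (assumption).
        value = value.lstrip("0") or "0"
    return value.encode("gbk")


print(normalize_key("000123", is_numeric=True))   # b'123'
print(normalize_key("000123", is_numeric=False))  # b'000123'
```

Character-type keys are encoded unchanged, so a zero-padded text key and the same digits in a numeric field hash differently unless both are declared numeric.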
Step S3: taking the remainder of the hash value corresponding to the record divided by the preset number of fragments.
The number of fragments is predefined and represents the number of folders into which the data file to be split is divided. For example, if the number of fragments for a personal customer file is set to 128 based on prior knowledge and the hash value of a record is 2, the remainder is 2.
Step S4: writing the record into a small fragment file in a fragment folder according to the file structure of the data file to be split, wherein the fragment folders correspond to the remainders one to one.
The fragment folders and the remainders are in one-to-one correspondence, which can be realized by including the remainder in the name of the fragment folder. For example, the fragment folder may be named File-2, corresponding to a remainder of 2. Each fragment folder contains at least one small fragment file, and the maximum number of records that a small fragment file can hold must be predefined. For example, based on prior knowledge, the number of fragments for a personal customer file is 128 and the maximum number of records per small fragment file is 50000; these values have been verified to give the best splitting and processing results.
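As a small illustration of the remainder-to-folder mapping, the sketch below reproduces the File-2 example from the text. The function name `shard_folder_name` and the `prefix` parameter are hypothetical.

```python
def shard_folder_name(hash_value: int, shard_count: int, prefix: str = "File-") -> str:
    """Map a record's hash value to the name of its fragment folder."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    remainder = hash_value % shard_count
    return f"{prefix}{remainder}"


print(shard_folder_name(2, 128))    # File-2
print(shard_folder_name(130, 128))  # File-2
```

In Python the `%` operator never returns a negative result for a positive divisor. In languages where the hash is a signed 64-bit integer, such as Java, a floor modulus (for example `Math.floorMod`) would be needed to keep the remainder in the range 0 to 127.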
Further, when a record is written, it is determined whether the current small fragment file of the fragment folder has reached the maximum number of records; if not, the record is written directly into the current small fragment file according to the file structure of the data file to be split. Specifically, the structure of the small fragment files is kept consistent with that of the source file (the data file to be split). An XML file maintains the field names, field types, field lengths and character set of the source file, with the fields in the same order as in the source file. The source file records are read one by one, and each record is returned as a Map of key-value pairs {'field name': field value}. All field names of the source file are stored in a List, and the field values are taken from the Map by looping over the field names. The extracted field values are stored in a List, matched to the character set of the file, and finally written into the small fragment file in field order. If the number of records in the current small fragment file has reached the maximum number of records, a new small fragment file is created and the record is written into it according to the file structure of the data file to be split. Specifically, the name of the newly created small fragment file may be formed by adding 1 to the name of the previous small fragment file.
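The write-with-rollover behaviour can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the class name `ShardWriter`, the `1.dat`, `2.dat` naming scheme, the `|` field separator and the helper `format_record` are all hypothetical; the text only says that fields are written in source order and that the new file name is the previous one plus 1.

```python
import os
from typing import Dict, List


def format_record(record: Dict[str, str], field_names: List[str], sep: str = "|") -> str:
    """Serialize a record in the field order of the source file."""
    return sep.join(record[name] for name in field_names)


class ShardWriter:
    """Appends lines to numbered small files inside one fragment folder."""

    def __init__(self, folder: str, max_records: int = 50000) -> None:
        self.folder = folder
        self.max_records = max_records
        self.file_no = 1   # hypothetical naming: 1.dat, 2.dat, ...
        self.count = 0     # records already in the current small file
        os.makedirs(folder, exist_ok=True)
        self._fh = open(self._path(), "w", encoding="gbk")

    def _path(self) -> str:
        return os.path.join(self.folder, f"{self.file_no}.dat")

    def write(self, line: str) -> None:
        if self.count >= self.max_records:
            # Current small file is full: start the next one (name plus 1).
            self._fh.close()
            self.file_no += 1
            self.count = 0
            self._fh = open(self._path(), "w", encoding="gbk")
        self._fh.write(line + "\n")
        self.count += 1

    def close(self) -> None:
        self._fh.close()
```

A caller would typically keep one `ShardWriter` per fragment folder, creating it lazily when the first record for that remainder arrives, and close all writers once the source file has been read.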
As an embodiment of the present invention, the method further includes: if the key field is of a numeric type, converting the key field into a character type.
That is, if the key field is numeric, it is converted into a character type after its leading zeros are removed.
In this embodiment, as shown in fig. 2, determining the hash value corresponding to the record according to the key field includes:
Step S21: taking the byte array of the character-type key field according to the Chinese Internal Code Extension Specification, and writing the byte array into a character buffer.
The byte array of the character-type key field is taken according to the GBK (Chinese Internal Code Extension Specification) encoding and transferred into a character buffer in little-endian order.
Step S22: performing shift operations and logical operations on the byte array in the character buffer, starting from a preset initial hash value, to obtain the hash value corresponding to the record.
Specifically, a priori random numbers seed, m and r are taken, and the initial hash value h is derived from seed, m and the length of the character buffer, where m is an initially defined constant with no practical meaning that can be understood as the power of an exponential function in the hash function. Specifically, the value of m may be m = 0xc6a4a7935bd1e995L, and it may be adjusted according to the actual splitting results. The buffer is processed 8 bits at a time, and the hash value is repeatedly multiplied by m, XORed, and circularly shifted right by r bits. seed, m and r are empirical values. The main advantages of this hash algorithm are as follows: for a given hash function, the split file records are hashed uniformly, records whose key fields have the same content are placed together, and the use of character buffering during the computation makes hashing fast.
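The constant quoted for m is the multiplier used by the public-domain MurmurHash64A function, and the multiply, shift and XOR sequence described in this document closely resembles it. The sketch below follows MurmurHash64A as an assumption rather than as the patented algorithm: it takes r = 47 and a seed of 0 (the text gives neither), combines seed and length with XOR for the initial value, processes 8 bytes per step, and uses a plain logical right shift where the translated text says "circular".

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
M = 0xC6A4A7935BD1E995  # value of m quoted in the text
R = 47                  # assumption: shift used by MurmurHash64A
SEED = 0                # assumption: the text gives no seed value


def hash_key(data: bytes, seed: int = SEED) -> int:
    """MurmurHash64A-style 64-bit hash of the key bytes (illustrative)."""
    if not data:
        return 0  # NULL or empty keys hash to 0, as described in the text
    h = (seed ^ (len(data) * M)) & MASK64
    full = len(data) - (len(data) % 8)
    for i in range(0, full, 8):
        # Read one 8-byte block in little-endian order.
        k = int.from_bytes(data[i:i + 8], "little")
        k = (k * M) & MASK64
        k ^= k >> R
        k = (k * M) & MASK64
        h ^= k
        h = (h * M) & MASK64
    tail = data[full:]
    if tail:
        # Fewer than 8 bytes remain: fold them in as one short block.
        h ^= int.from_bytes(tail, "little")
        h = (h * M) & MASK64
    # Final mixing: shift-XOR, multiply, shift-XOR.
    h ^= h >> R
    h = (h * M) & MASK64
    h ^= h >> R
    return h


shard = hash_key("0012345678".encode("gbk")) % 128
```

The masking with `MASK64` emulates 64-bit overflow, which a language with fixed-width integers would get for free. Records with equal key bytes always receive the same hash and therefore the same fragment folder.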
As an embodiment of the present invention, as shown in FIG. 3, the method further includes: receiving a check file and a data file to be split, and verifying the data file to be split; after the data file to be split passes verification, determining the processing mode for the data file to be split according to the correspondence between the check file and the data file to be split, wherein the processing modes include a single-thread processing mode and a multi-thread asynchronous processing mode.
The process for determining the processing mode is as follows: a BIN file (the data file, that is, the data file to be split) and a CHK file (the check file) are received from an upstream system; the size, date, arrival status and other attributes of the BIN file are verified; the correspondence between the CHK file and the BIN files is determined to be either one-to-one or one-to-many; and threads are created in a thread pool according to that correspondence to perform the splitting, with each thread processing one file and multiple files being processed asynchronously with respect to one another.
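The choice between the two processing modes can be sketched with a standard thread pool. The function `split_all`, its `split_one` callback and the `max_workers` default are hypothetical; verification of the BIN file against the CHK file is assumed to have happened already, since the text does not describe the CHK format.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def split_all(
    bin_files: List[str],
    split_one: Callable[[str], T],
    max_workers: int = 4,
) -> Dict[str, T]:
    """Split each verified data file; one worker thread handles one file."""
    if len(bin_files) == 1:
        # One-to-one CHK/BIN correspondence: single-thread mode.
        return {bin_files[0]: split_one(bin_files[0])}
    # One-to-many correspondence: files are split asynchronously.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(split_one, path) for path in bin_files}
        return {path: future.result() for path, future in futures.items()}


# Stand-in callback; a real one would read the file and write fragments.
results = split_all(["a.bin", "b.bin", "c.bin"], lambda path: path.upper())
```

Because each file is handled by exactly one thread, the per-folder record counters of one file need no locking, provided each source file writes to its own set of fragment folders.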
As an embodiment of the present invention, writing the record into a small fragment file in a fragment folder includes: if the number of records in the small fragment file has not reached a preset maximum number of records, writing the record into the small fragment file; and if the number of records in the small fragment file has reached the preset maximum number of records, creating a new small fragment file in the fragment folder and writing the record into the newly created small fragment file.
Each fragment folder contains at least one small fragment file, and the maximum number of records that a small fragment file can hold must be defined in advance. When a record is written, it is first determined whether the current small fragment file has reached the maximum number of records; if not, the record is written directly into the current small fragment file according to the file structure of the data file to be split. If the number of records in the current small fragment file has reached the maximum number of records, a new small fragment file is created and the record is written into it according to the file structure of the data file to be split. Specifically, the name of the newly created small fragment file may be formed by adding 1 to the name of the previous small fragment file.
In an embodiment of the present invention, as shown in FIG. 4, taking the analysis of a personal customer file as an example, the fragmentation process is as follows:
1) Define the number of fragment folders, that is, the number of fragments, and the maximum number of records per small fragment file. For example, based on prior knowledge, the number of fragments for a personal customer file is 128 and the maximum number of records per small fragment file is 50000. These values have been verified to give the best splitting and processing results.
2) Define the key field; for example, the key field of a personal customer file is the customer number.
3) Read the records of the large file one by one.
4) Screen the records according to the key field. For example, records in a personal customer file that mistakenly carry corporate customer numbers are removed.
5) Take the hash value according to the key field; if the key field is of a character type, transcoding is required. The process of taking the hash value according to the key field is as follows: the value of the key field is examined, and 0 is returned directly if the value is NULL or an empty string. Otherwise, the judgment continues: if the key field is of a character type, no conversion is performed; if the key field is of a numeric type, it is converted into a character type after its leading zeros are removed. The byte array of the character-type key field is taken according to the GBK encoding and transferred into a character buffer in little-endian order. A priori random numbers seed, m and r are taken, and the initial hash value h is derived from seed, m and the length of the character buffer. The characters in the character buffer are taken out cyclically, 8 bits at a time, to compute a value k: k is multiplied by m, circularly shifted right by r bits and XORed with itself, and then multiplied by m again. h is XORed with k and then multiplied by m. If the tail of the character buffer is shorter than 8 bits, another 8-bit character buffer is allocated, its value is computed and XORed with h, and the result is multiplied by m. After all character buffers have been traversed, the resulting hash value is circularly shifted right by r bits and XORed with itself, multiplied by m, and again circularly shifted right by r bits and XORed with itself to obtain the final hash value. In summary, the key field is written into the character buffer and, starting from a given initial hash value, the buffer is processed 8 bits at a time while the hash value is repeatedly multiplied by m, XORed, and circularly shifted right by r bits. seed, m and r are empirical values.
The main advantages of this hash algorithm are as follows: for a given hash function, the split file records are hashed uniformly, records whose key fields have the same content are placed together, and the use of character buffering during the computation makes hashing fast.
6) Take the remainder of the obtained hash value divided by the number of fragments. For example, if the hash value of a record obtained in step 5) is 2 and the number of fragments is 128, the remainder is 2.
7) Select the corresponding fragment folder according to the remainder obtained in step 6), and determine whether the current small fragment file in that folder has reached the maximum number of records, that is, the per-file maximum defined in step 1). If the small fragment file is not full, write the record directly according to the structure of the data file to be split; otherwise, create a new small fragment file, with the file name incremented by one, and write the record into it according to the structure of the data file to be split.
8) Repeat the above steps until the data file to be split has been read completely.
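Steps 1) to 8) can be tied together in a compact in-memory sketch. To keep it short, each small fragment file is modelled as a Python list, and `zlib.crc32` is swapped in as a stand-in for the hash function of step 5); it is not the hash described in this document. The function name `split_records` and its parameters are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]


def split_records(
    records: Iterable[Record],
    key_field: str,
    shard_count: int,
    max_records: int,
    is_valid: Callable[[Optional[str]], bool],
    hash_fn: Callable[[bytes], int] = zlib.crc32,  # stand-in hash only
) -> Dict[int, List[List[Record]]]:
    """Return {remainder: [small_file_1, small_file_2, ...]}."""
    shards: Dict[int, List[List[Record]]] = defaultdict(lambda: [[]])
    for record in records:                       # step 3: read one by one
        key = record.get(key_field)
        if not is_valid(key):                    # step 4: screen
            continue
        h = hash_fn(key.encode("gbk")) if key else 0   # step 5: hash
        files = shards[h % shard_count]          # step 6: remainder
        if len(files[-1]) >= max_records:        # step 7: roll over when full
            files.append([])
        files[-1].append(record)
    return dict(shards)                          # step 8: source exhausted


rows = [{"cust_no": k} for k in ["1", "2", "C9", "129", "257"]]
out = split_records(
    rows, "cust_no", shard_count=128, max_records=2,
    is_valid=lambda k: bool(k) and k.isdigit(),
    hash_fn=lambda b: int(b.decode()),  # toy hash so the result is easy to follow
)
```

With the toy hash, keys 1, 129 and 257 all leave remainder 1, so that folder receives two small files (two records, then one), while key 2 goes to folder 2 and the record with key C9 is screened out.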
The invention uses a hash algorithm to fragment files efficiently, splitting a large file into small fragment files. It takes into account factors such as multi-file concurrency, single-file hashing and data validity, and interleaves data checking actions, so that the small fragment files are uniformly hashed and data correctness and validity are improved. It offers high processing efficiency and a low error rate, and provides a high-quality basic service for modern banking systems that frequently process massive data files.
FIG. 5 is a schematic structural diagram of a file fragmentation device according to an embodiment of the present invention, where the device includes:
the record reading module 10 is configured to read a record in a data file to be split, and screen the record according to a key field corresponding to the data file to be split.
The records in the data file to be split are read one by one and screened according to the key field, and erroneous records are removed; for example, if a personal customer file mistakenly contains records with corporate customer numbers, those records are removed.
A hash value module 20, configured to determine, according to the key field, the hash value corresponding to the record after the record passes screening.
The process of taking the hash value according to the key field is as follows: the value of the key field is examined, and if it is NULL or an empty string, 0 is returned directly; otherwise, the judgment continues: if the key field is of a character type, no conversion is performed; if the key field is of a numeric type, it is converted into a character type after its leading zeros are removed. The byte array of the character-type key field is then taken according to the GBK (Chinese Internal Code Extension Specification) encoding and transferred into a character buffer. The hash value corresponding to the record is obtained by performing shift operations, logical operations and the like on the characters in the character buffer.
A remainder determining module 30, configured to take the remainder of the hash value corresponding to the record divided by a preset number of fragments.
The number of fragments is predefined and represents the number of folders into which the data file to be split is divided. For example, if the number of fragments for a personal customer file is set to 128 based on prior knowledge and the hash value of a record is 2, the remainder is 2.
A record writing module 40, configured to write the record into a small fragment file in a fragment folder according to the file structure of the data file to be split, wherein the fragment folders correspond to the remainders one to one.
The fragment folders and the remainders are in one-to-one correspondence, which can be realized by including the remainder in the name of the fragment folder. For example, the fragment folder may be named File-2, corresponding to a remainder of 2. Each fragment folder contains at least one small fragment file, and the maximum number of records that a small fragment file can hold must be predefined. For example, based on prior knowledge, the number of fragments for a personal customer file is 128 and the maximum number of records per small fragment file is 50000; these values have been verified to give the best splitting and processing results.
Further, when a record is written, it is determined whether the current small fragment file of the fragment folder has reached the maximum number of records; if not, the record is written directly into the current small fragment file according to the file structure of the data file to be split. If the number of records in the current small fragment file has reached the maximum number of records, a new small fragment file is created and the record is written into it according to the file structure of the data file to be split. Specifically, the name of the newly created small fragment file may be formed by adding 1 to the name of the previous small fragment file.
As an embodiment of the present invention, the hash value module is further configured to: if the key field is of a numeric type, convert the key field into a character type.
In this embodiment, as shown in FIG. 6, the hash value module 20 includes:
a character buffer unit, configured to take a byte array of the character-type key field according to the Chinese Internal Code Extension Specification (GBK) and write the byte array into a character buffer;
a hash value unit, configured to perform shift operations and logical operations on the byte array in the character buffer, starting from a preset initial hash value, to obtain the hash value corresponding to the record.
In an embodiment of the present invention, the apparatus further comprises:
a file checking module, configured to receive a check file and the data file to be split, and to verify the data file to be split;
and a processing mode module, configured to determine, after the data file to be split passes verification, the processing mode of the data file to be split according to the correspondence between the check file and the data file to be split; the processing modes include a single-thread processing mode and a multi-thread asynchronous processing mode.
In an embodiment of the present invention, the record writing module is further configured to: if it is determined that the number of records in the small fragment file has not reached the preset maximum record count, write the record into the small fragment file; and if it is determined that the number of records in the small fragment file has reached the preset maximum record count, create a new small fragment file in the fragment folder and write the record into the newly created small fragment file.
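The verification and mode selection can be sketched as follows. This section does not say what the check file contains or how the correspondence maps to a mode, so everything specific here is an assumption made for illustration: the check file is modelled as a mapping from data file name to expected record count, verification compares record counts, and a check file that covers exactly one data file selects the single-thread mode while any other case selects the multi-thread asynchronous mode. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def verify(records, expected_count):
    """Assumed check: the data file holds the record count the check file states."""
    return len(records) == expected_count


def choose_mode(check_entries):
    """Assumed rule: one data file per check file selects the single-thread mode."""
    return "single_thread" if len(check_entries) == 1 else "multi_thread_async"


def process(data_files, check_entries, handler):
    """data_files: {name: [records]}; check_entries: {name: expected_count}."""
    for name, expected in check_entries.items():
        if not verify(data_files.get(name, []), expected):
            raise ValueError(f"verification failed for {name}")
    if choose_mode(check_entries) == "single_thread":
        return {name: handler(data_files[name]) for name in check_entries}
    # Multi-thread asynchronous mode: submit every file, then collect results.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(handler, data_files[name])
                   for name in check_entries}
        return {name: fut.result() for name, fut in futures.items()}


print(process({"a.dat": ["r1", "r2"]}, {"a.dat": 2}, len))  # {'a.dat': 2}
```

Here `handler` stands in for the splitting routine that would be applied to each verified data file.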
Based on the same inventive concept as the file fragmentation method, the invention also provides the file fragmentation device. Because the device solves the problem on a principle similar to that of the file fragmentation method, the implementation of the device can refer to the implementation of the method, and the description of the common parts is not repeated.
The invention uses a hash algorithm to fragment files efficiently, splitting a large file into small fragment files. It takes into account factors such as hashing within a single file and data validity, offers high processing efficiency, is not error-prone, and provides a high-quality basic service for modern banking systems that frequently process massive data files.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the program.
The present invention also provides a computer-readable storage medium storing a computer program for executing the above method.
As shown in FIG. 7, the electronic device 600 may further include: a communication module 110, an input unit 120, an audio processor 130, a display 160, and a power supply 170. It is noted that the electronic device 600 does not necessarily include all of the components shown in FIG. 7; furthermore, the electronic device 600 may also comprise components not shown in FIG. 7, for which reference may be made to the prior art.
As shown in fig. 7, the central processor 100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, the central processor 100 receiving input and controlling the operation of the various components of the electronic device 600.
The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store the information relating to the failure, as well as a program for processing that information. The central processor 100 may execute the program stored in the memory 140 to realize information storage, processing, and the like.
The input unit 120 provides input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used to display objects such as images or characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 140 may be a solid state memory such as Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off, can be selectively erased, and can be provided with more data; an example of such a memory is sometimes called an EPROM. The memory 140 may also be some other type of device. The memory 140 includes a buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage section 142, which is used to store application programs and function programs, or the procedures by which the central processor 100 carries out the operations of the electronic device 600.
The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging application, address book application, etc.).
The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a Bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via the audio processor 130 to provide audio output via the speaker 131 and receive audio input from the microphone 132, thereby implementing general telecommunications functions. The audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 130 is also coupled to the central processor 100, so that sound can be recorded locally through the microphone 132 and locally stored sound can be played through the speaker 131.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Specific embodiments are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (12)

1. A file fragmentation method, the method comprising:
reading a record in a data file to be split, and screening the record according to a key field corresponding to the data file to be split;
after the record passes screening, determining a hash value corresponding to the record according to the key field;
taking the hash value corresponding to the record modulo a preset number of fragments to obtain a remainder;
writing the record into a small fragment file in a fragment folder according to the file structure of the data file to be split; wherein the fragment folders correspond to the remainders one to one.
2. The method of claim 1, further comprising: if the key field is of a numeric type, converting the key field into a character type.
3. The method of claim 2, wherein determining the hash value corresponding to the record according to the key field comprises:
obtaining a byte array of the character-type key field according to the Chinese Internal Code Extension Specification (GBK), and writing the byte array into a character buffer;
and performing shift operations and logical operations on the byte array in the character buffer, starting from a preset initial hash value, to obtain the hash value corresponding to the record.
4. The method of claim 1, further comprising:
receiving a check file and the data file to be split, and verifying the data file to be split;
after the data file to be split passes verification, determining a processing mode of the data file to be split according to the correspondence between the check file and the data file to be split; wherein the processing modes include a single-thread processing mode and a multi-thread asynchronous processing mode.
5. The method of claim 1, wherein writing the record into a small fragment file in a fragment folder comprises:
if it is determined that the number of records in the small fragment file has not reached a preset maximum record count, writing the record into the small fragment file;
and if it is determined that the number of records in the small fragment file has reached the preset maximum record count, creating a new small fragment file in the fragment folder and writing the record into the newly created small fragment file.
6. A file fragmentation device, the device comprising:
a record reading module, configured to read a record in a data file to be split and to screen the record according to a key field corresponding to the data file to be split;
a hash value module, configured to determine a hash value corresponding to the record according to the key field after the record passes screening;
a remainder determining module, configured to take the hash value corresponding to the record modulo a preset number of fragments to obtain a remainder;
a record writing module, configured to write the record into a small fragment file in a fragment folder according to the file structure of the data file to be split; wherein the fragment folders correspond to the remainders one to one.
7. The apparatus of claim 6, wherein the hash value module is further configured to: if the key field is of a numeric type, convert the key field into a character type.
8. The apparatus of claim 7, wherein the hash value module comprises:
a character cache unit, configured to obtain a byte array of the character-type key field according to the Chinese Internal Code Extension Specification (GBK), and to write the byte array into a character buffer;
and a hash value unit, configured to perform shift operations and logical operations on the byte array in the character buffer, starting from a preset initial hash value, to obtain the hash value corresponding to the record.
9. The apparatus of claim 6, further comprising:
a file checking module, configured to receive a check file and the data file to be split, and to verify the data file to be split;
a processing mode module, configured to determine, after the data file to be split passes verification, the processing mode of the data file to be split according to the correspondence between the check file and the data file to be split; wherein the processing modes include a single-thread processing mode and a multi-thread asynchronous processing mode.
10. The apparatus of claim 6, wherein the record writing module is further configured to: if it is determined that the number of records in the small fragment file has not reached a preset maximum record count, write the record into the small fragment file; and if it is determined that the number of records in the small fragment file has reached the preset maximum record count, create a new small fragment file in the fragment folder and write the record into the newly created small fragment file.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 5 when executing the program.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 5.
CN202011633475.6A 2020-12-31 2020-12-31 File fragmentation method and device Pending CN112732650A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011633475.6A CN112732650A (en) 2020-12-31 2020-12-31 File fragmentation method and device

Publications (1)

Publication Number Publication Date
CN112732650A true CN112732650A (en) 2021-04-30

Family

ID=75608581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011633475.6A Pending CN112732650A (en) 2020-12-31 2020-12-31 File fragmentation method and device

Country Status (1)

Country Link
CN (1) CN112732650A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019157929A1 (en) * 2018-02-13 2019-08-22 阿里巴巴集团控股有限公司 File processing method, device, and equipment
CN109033295A (en) * 2018-07-13 2018-12-18 成都亚信网络安全产业技术研究院有限公司 The merging method and device of super large data set
CN109977077A (en) * 2019-03-25 2019-07-05 腾讯科技(深圳)有限公司 Model file storage method, device, readable storage medium storing program for executing and computer equipment
CN111243679A (en) * 2020-01-15 2020-06-05 重庆邮电大学 Storage and retrieval method for microbial community species diversity data
CN111382128A (en) * 2020-03-20 2020-07-07 中国邮政储蓄银行股份有限公司 File splitting method and device and computer system
CN111428140A (en) * 2020-04-13 2020-07-17 上海东普信息科技有限公司 High-concurrency data retrieval method, device, equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407492A (en) * 2021-06-18 2021-09-17 中国人民银行清算总中心 File fragment storage method, fragment file recombination method, device and file protection system
CN113407492B (en) * 2021-06-18 2024-03-26 中国人民银行清算总中心 Method and device for storing file fragments and reorganizing file fragments and file protection system
CN113946549A (en) * 2021-09-28 2022-01-18 杭州星犀科技有限公司 Method, system and storage medium for file splitting
CN113641841A (en) * 2021-10-15 2021-11-12 支付宝(杭州)信息技术有限公司 Data encoding method, graph data storage method, graph data query method and device
CN113641841B (en) * 2021-10-15 2022-02-22 支付宝(杭州)信息技术有限公司 Data encoding method, graph data storage method, graph data query method and device
CN114218173A (en) * 2021-12-30 2022-03-22 北京宇信科技集团股份有限公司 Batch processing system, processing method, medium and equipment for account-transfer transaction files
CN117573620A (en) * 2024-01-10 2024-02-20 中电数据产业有限公司 Large file splitting concurrent reading method and system
CN117573620B (en) * 2024-01-10 2024-04-02 中电数据产业有限公司 Large file splitting concurrent reading method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination