CN106708825A - Data file processing method and system - Google Patents


Info

Publication number
CN106708825A
Authority
CN
China
Prior art keywords
data
data file
file
shared memory
directory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510454768.0A
Other languages
Chinese (zh)
Other versions
CN106708825B (en
Inventor
王刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201510454768.0A
Publication of CN106708825A
Application granted
Publication of CN106708825B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a data file processing method and system. The method comprises: obtaining a data file on a disk, wherein the data file is stored in a shared-memory data structure; loading the data file into shared memory by memory mapping, and initializing the shared memory; recording data update information based on the loading process, wherein the data update information comprises the file name of the data file and first time information, the first time information being the time at which the loading process performed the load; and reading the data file according to the data update information. In embodiments of the invention, a data file that would otherwise use a plaintext data structure is stored on disk in a shared-memory data structure and loaded into shared memory, which improves loading efficiency. Load-once, use-many is also supported: multiple processes use the same copy of the data through shared memory, which greatly reduces additional memory usage.

Description

Data file processing method and system
Technical field
The present invention belongs to the field of communication technology, and in particular relates to a data file processing method and system.
Background
With the rapid development of Internet technology, many services need large amounts of data to support decision-making, for example query parsing and error correction in search services, and recommendation services in personalized recommendation businesses. In practice, an online service has to load a large amount of data into the program, and each request requires many table lookups. The data must also be updated at a certain frequency to keep up with change. Updates may be real-time (second level), near real-time (minute level), or scheduled (day level).
In the prior art, each service process is usually responsible for loading and updating its own data: the process reads the data file and builds the in-memory data structure itself. To support updates without stopping the service, the common approach is to start a separate thread for data updates, keep the old data unchanged during the update, and delete the old data once the new data has been loaded.
In researching and practicing the prior art, the inventors found that in-memory data can only be used within a single process. If multiple processes need the same data, each must load and update it separately, which causes additional memory usage. In addition, the data uses a plaintext data structure, so loading takes a long time and loading efficiency is low.
Summary of the invention
An object of the invention is to provide a data file processing method and system, intended to improve the accuracy and recall of data file processing.
To solve the above technical problem, embodiments of the invention provide the following technical solution:
A data file processing method, comprising:
obtaining a data file on a disk, wherein the data file is stored in a shared-memory data structure;
loading the data file into the shared memory by memory mapping, and initializing the shared memory;
recording data update information based on the loading process, wherein the data update information comprises the file name of the data file and first time information, the first time information being the time at which the loading process performed the load;
reading the data file according to the data update information.
To solve the above technical problem, embodiments of the invention further provide the following technical solution:
A data file processing system, comprising:
a data management module, configured to obtain a data file on a disk, wherein the data file is stored in a shared-memory data structure; to load the data file into the shared memory by memory mapping and initialize the shared memory; and to record data update information based on the loading process, wherein the data update information comprises the file name of the data file and first time information, the first time information being the time at which the loading process performed the load;
a data read module, configured to read the data file according to the data update information.
Compared with the prior art, in this embodiment the data file is first stored on disk in a shared-memory data structure, then loaded into the shared memory by memory mapping, and update information is recorded, which accomplishes the loading of the data file. The data file is then updated and loaded according to the data update information, so that processes share the data in shared memory. By storing on disk, in a shared-memory data structure, a data file that would otherwise use a plaintext data structure, and loading it into shared memory, embodiments of the invention improve loading efficiency. Load-once, use-many is supported: through shared memory, multiple processes use the same copy of the shared-memory data, which greatly reduces additional memory usage.
Brief description of the drawings
The technical solution and other beneficial effects of the invention will become apparent from the following detailed description of specific embodiments, taken in conjunction with the accompanying drawings.
Fig. 1a is a schematic diagram of a scenario for the data file processing method provided by an embodiment of the invention;
Fig. 1b is a schematic flowchart of the data file processing method provided by an embodiment of the invention;
Fig. 2 is a schematic flowchart of the data file processing method provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of the data structure of a data file provided by an embodiment of the invention;
Fig. 4 is a schematic structural diagram of the data file processing system provided by an embodiment of the invention;
Fig. 5 is a schematic structural diagram of a server provided by an embodiment of the invention.
Detailed description of the embodiments
Referring to the drawings, in which like reference numerals denote like components, the principles of the invention are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the invention and should not be regarded as limiting other specific embodiments not detailed herein.
In the following description, specific embodiments of the invention are described with reference to steps and symbols performed by one or more computers, unless otherwise stated. These steps and operations are therefore referred to several times as being computer-executed, which here includes manipulation by the computer's processing unit of electronic signals that represent data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well understood by those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the data format. However, although the principles of the invention are described in these terms, this is not meant as a limitation, and those skilled in the art will appreciate that the steps and operations described below may also be implemented in hardware.
The principles of the invention operate with many other general-purpose or special-purpose computing and communication environments or configurations. Well-known examples of computing systems, environments, and configurations suitable for the invention include, but are not limited to, hand-held phones, personal computers, servers, multiprocessor systems, microcomputer-based systems, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The term "module" as used herein may be regarded as a software object executing on the computing system. The different components, modules, engines, and services described herein may be regarded as objects implemented on the computing system. The apparatus and method described herein are preferably implemented in software, but may of course also be implemented in hardware, both of which fall within the scope of the invention.
Embodiments of the invention provide a data file processing method and system.
Referring to Fig. 1a, which is a schematic diagram of a scenario for the data file processing system provided by an embodiment of the invention, the data file processing system may be integrated in a device such as a server. The system may include a data management module, mainly configured to obtain a data file on a disk, wherein the data file is stored in a shared-memory data structure. The shared-memory data structure may include an array, a hash table, a double-array trie, and so on. The data management module then loads the data file into the shared memory by memory mapping and initializes the shared memory. Based on the loading process it records data update information, which comprises the file name of the data file and first time information, the first time information being the time at which the loading process performed the load. The data management module may of course also be used to update the data file.
In addition, the data file processing system may include a data read module, mainly configured to update and load the data file according to the data update information, so that processes share the data in shared memory. The system may further include a peripheral system, also called an update detection module, mainly configured to perform update detection on the data file on disk, so that the data management module performs update and load processing and the data read module updates and loads according to the data update information.
Each is described in detail below.
First embodiment
This embodiment is described from the perspective of data file processing; the data file processing system may be integrated in a device such as a server.
A data file processing method comprises: obtaining a data file on a disk, wherein the data file is stored in a shared-memory data structure; loading the data file into shared memory by memory mapping, and initializing the shared memory; recording data update information based on the loading process, wherein the data update information comprises the file name of the data file and first time information, the first time information being the time at which the loading process performed the load; and reading the data file according to the data update information.
Referring to Fig. 1b, which is a schematic flowchart of the data file processing method provided by the first embodiment of the invention, the method includes:
In step S101, a data file on a disk is obtained, wherein the data file is stored in a shared-memory data structure.
It can be understood that, before the data file is loaded, the data file on disk may first be preprocessed, for example by storing it on disk in a shared-memory data structure. The shared-memory data structure includes but is not limited to an array, a hash table, or a double-array trie, and is not specifically limited here.
In step S102, the data file is loaded into the shared memory by memory mapping, and the shared memory is initialized.
For example, initialization here may refer specifically to initializing the locks in the in-memory data. This mainly concerns in-memory data that is read and written at the same time (for example, when part of the data is modified), for which locks (such as read-write locks or sequence locks) must be designed into the in-memory data structure to ensure correct access.
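A minimal sketch of step S102 follows, assuming a hypothetical 12-byte header (magic, version, lock word) at the start of the data file. The header layout, the file name `table.dat`, and the helper names `load_shared` and `init_lock` are illustrative assumptions; resetting a single lock word only stands in for whatever lock initialization a real structure needs.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical header layout: magic (4 bytes), version (uint32), lock word (uint32).
HEADER = struct.Struct("<4sII")

def load_shared(path):
    """Map the whole data file with a shared, writable mapping."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Length 0 maps the entire file. The default mapping is shared, so
        # other processes that map the same file see the same pages.
        return mmap.mmap(fd, 0)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed

def init_lock(mem):
    """Reset the lock word in the header (a stand-in for real lock setup)."""
    magic, version, _ = HEADER.unpack_from(mem, 0)
    HEADER.pack_into(mem, 0, magic, version, 0)

# Build a small data file that is already laid out in its in-memory format.
path = os.path.join(tempfile.mkdtemp(), "table.dat")
with open(path, "wb") as f:
    f.write(HEADER.pack(b"HTBL", 1, 0xFFFFFFFF) + b"payload")

mem = load_shared(path)
init_lock(mem)
header = HEADER.unpack_from(mem, 0)
payload = bytes(mem[HEADER.size:])
mem.close()
```

Because the file already has the in-memory layout, loading involves no parsing step: the only work after mapping is the in-place initialization.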
Optionally, after the shared memory is initialized, real-time update detection may also be performed on the data file on disk, for example as follows:
(1) when detection determines that a data file needs updating, the data file that needs updating is copied into the update directory;
(2) the data file in the update directory is loaded into shared memory, and the shared memory is initialized;
(3) the data file in the data directory is moved to the backup directory, and the data file in the update directory is moved into the data directory.
After the data file has been loaded and updated, the mapping of the original data file may also be deleted.
It should be noted that, in embodiments of the invention, the update directory, the data directory, and the backup directory are set in advance, and all three are on the same file system. This ensures that the file system index node (inode) of a data file stays unchanged when the file is moved, so that the memory mapping relationship is preserved. The inode stores basic information about files and directories, including time, file name, user, and group.
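The directory rotation described above can be sketched as follows, assuming a POSIX file system and hypothetical directory names `update`, `data`, and `backup` under one temporary root. The sketch checks that a rename within one file system keeps the inode number and that a mapping created before the move remains readable afterwards.

```python
import mmap
import os
import tempfile

root = tempfile.mkdtemp()  # one directory tree, hence one file system
update_dir = os.path.join(root, "update")   # hypothetical directory names
data_dir = os.path.join(root, "data")
backup_dir = os.path.join(root, "backup")
for d in (update_dir, data_dir, backup_dir):
    os.mkdir(d)

old_path = os.path.join(data_dir, "table.dat")
new_path = os.path.join(update_dir, "table.dat")
with open(old_path, "wb") as f:
    f.write(b"old-version")
with open(new_path, "wb") as f:
    f.write(b"new-version")

# Map the new file while it still sits in the update directory.
with open(new_path, "r+b") as f:
    new_map = mmap.mmap(f.fileno(), 0)
inode_before = os.stat(new_path).st_ino

# Rotate: data -> backup, then update -> data.
os.rename(old_path, os.path.join(backup_dir, "table.dat"))
os.rename(new_path, old_path)

inode_after = os.stat(old_path).st_ino
still_readable = bytes(new_map[:])
backup_exists = os.path.exists(os.path.join(backup_dir, "table.dat"))
new_map.close()
```

If the directories were on different file systems, the move would have to copy the file, which creates a new inode, so the existing mapping would no longer refer to the file in the data directory.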
Further optionally, update detection may be performed on the data file by means such as a cyclic redundancy check (CRC) or a message digest algorithm (MD5, Message-Digest Algorithm 5), which is not described in detail here.
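One way such a check might look is sketched below: the candidate file and the file currently in use are fingerprinted with CRC-32 and MD5, and an update is flagged when the fingerprints differ. The function names `fingerprint` and `needs_update` and the comparison policy are assumptions for illustration; the patent does not specify how the checksums are compared.

```python
import hashlib
import os
import tempfile
import zlib

def fingerprint(path, chunk_size=1 << 20):
    """Return (CRC-32, MD5 hex digest) of a file, reading it in chunks."""
    crc = 0
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
            md5.update(chunk)
    return crc & 0xFFFFFFFF, md5.hexdigest()

def needs_update(candidate, current):
    """Assumed policy: differing fingerprints mean the data file changed."""
    return fingerprint(candidate) != fingerprint(current)

root = tempfile.mkdtemp()
current = os.path.join(root, "current.dat")
same = os.path.join(root, "same.dat")
changed = os.path.join(root, "changed.dat")
for path, content in ((current, b"v1"), (same, b"v1"), (changed, b"v2")):
    with open(path, "wb") as f:
        f.write(content)

unchanged_result = needs_update(same, current)
changed_result = needs_update(changed, current)
```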
In step S103, data update information is recorded based on the loading process. The data update information comprises the file name of the data file and first time information, the first time information being the time at which the loading process performed the load.
By changing the data structure of the data file on disk, loading the data file into shared memory, and recording the related data update information, the data management module completes the data loading process.
In step S104, the data file is read according to the data update information.
For example, after the data management module has recorded the data update information, the module that uses the data may read and check the data update information at a preset time interval. If the data update information indicates that the target data file has been updated, that module loads the data file into the shared memory by memory mapping, and records the time at which it mapped and loaded the file as the second time information.
It can be understood that, because initialization has already been performed in step S102, the module that uses the data needs no additional operations. The operating system ensures that the same data file is mapped to the same shared memory, so the module that uses the data also completes its data loading process.
As can be seen from the above, in the data file processing method provided by this embodiment, the data file is first stored on disk in a shared-memory data structure, then loaded into the shared memory by memory mapping, and update information is recorded, which accomplishes the loading of the data file. The data file is then updated and loaded according to the data update information, so that processes share the data in shared memory. By storing on disk, in a shared-memory data structure, a data file that would otherwise use a plaintext data structure, and loading it into shared memory, embodiments of the invention improve loading efficiency. Load-once, use-many is supported: through shared memory, multiple processes use the same copy of the shared-memory data, which greatly reduces additional memory usage.
Second embodiment
The method described in the first embodiment is described in further detail below by way of example.
The data file processing system includes a data management module, a data usage module (that is, the data read module), and an update detection module. First, the data management module maps the data file on disk into shared memory and records the data update information for this process. Next, the data usage module also maps the data file into shared memory according to the data update information. After data loading is complete, the update detection module may further perform real-time update detection on the disk file, so that the data management module and the data read module update and load the data accordingly.
The data involved when the system loads and updates includes: the data file on disk, the shared-memory data, the recorded data update information, the cyclic redundancy check (CRC) file used to verify files, and the flag file that indicates an update.
Each is described in more detail below.
As shown in Fig. 2, the specific flow of a data file processing method may be as follows:
In step S201, the update detection module performs update detection on the data file on disk.
The update detection methods include but are not limited to CRC checks and md5sum checks. After the check succeeds, the step "copy the data file that needs updating into the update directory" is triggered.
It should be noted that the data file is stored on disk in a shared-memory data structure and may also be called the disk file of the stored data. The system needs to regularize the data into a form that can be used directly in shared memory, that is, the on-disk storage form of the shared-memory data.
For example, this may be as follows:
The data is stored on disk as a binary file and mapped into shared memory by memory mapping. Its content may be any shared-memory data structure, including but not limited to an array, a hash table, or a double-array trie. Taking a hash table as an example, its data structure may be as shown in Fig. 3 and may include:
(1) Header (HEADER): stores the metadata of the hash table, for example the data type, the version number, the lock that controls multi-process access, and hash table statistics.
(2) Hash bucket (BUCKET): contains an index into the NODE array.
(3) Node (NODE): contains the index of the next node, the key, and an index into the CHUNK array; for a hash table with variable-length values, it also contains the length of the value.
(4) Data block (CHUNK): a fixed-length hash table stores the value directly; a variable-length hash table additionally stores an index pointing to the next CHUNK.
It should be understood that the hash table data structure is used here only as an example for explanation and does not limit the invention.
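The four regions above can be sketched as a flat binary layout that is queried directly through a memory mapping. Everything concrete here is an assumption for illustration: the field widths, the 16-byte key limit, CRC-32 as the hash function, and fixed-length 64-bit values are not taken from the patent.

```python
import mmap
import os
import struct
import tempfile
import zlib

# Illustrative layouts only; the patent does not fix field widths.
HEADER = struct.Struct("<4sIIII")  # magic, version, lock word, bucket count, node count
BUCKET = struct.Struct("<i")       # index of the first NODE in the chain, -1 if empty
NODE = struct.Struct("<i16si")     # next NODE index, key (NUL-padded), CHUNK index
CHUNK = struct.Struct("<q")        # fixed-length value: one signed 64-bit integer

def bucket_of(key, bucket_count):
    return zlib.crc32(key) % bucket_count

def build(items, bucket_count=4):
    """Serialize a dict of bytes -> int into HEADER/BUCKET/NODE/CHUNK regions."""
    buckets = [-1] * bucket_count
    nodes, chunks = [], []
    for i, (key, value) in enumerate(items.items()):
        b = bucket_of(key, bucket_count)
        nodes.append((buckets[b], key, i))  # new node links to the old chain head
        buckets[b] = i
        chunks.append(value)
    out = bytearray(HEADER.pack(b"HTBL", 1, 0, bucket_count, len(nodes)))
    for b in buckets:
        out += BUCKET.pack(b)
    for n in nodes:
        out += NODE.pack(*n)
    for c in chunks:
        out += CHUNK.pack(c)
    return bytes(out)

def lookup(mem, key):
    """Follow indexes inside the mapped bytes; nothing is parsed up front."""
    _, _, _, bucket_count, node_count = HEADER.unpack_from(mem, 0)
    bucket_base = HEADER.size
    node_base = bucket_base + bucket_count * BUCKET.size
    chunk_base = node_base + node_count * NODE.size
    slot = bucket_base + bucket_of(key, bucket_count) * BUCKET.size
    (idx,) = BUCKET.unpack_from(mem, slot)
    while idx != -1:
        nxt, stored, chunk_idx = NODE.unpack_from(mem, node_base + idx * NODE.size)
        if stored.rstrip(b"\0") == key:
            return CHUNK.unpack_from(mem, chunk_base + chunk_idx * CHUNK.size)[0]
        idx = nxt
    return None

path = os.path.join(tempfile.mkdtemp(), "table.dat")
with open(path, "wb") as f:
    f.write(build({b"apple": 3, b"banana": 7, b"cherry": 11}))
with open(path, "rb") as f:
    mem = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
values = [lookup(mem, k) for k in (b"apple", b"banana", b"cherry", b"durian")]
mem.close()
```

Using array indexes rather than pointers is what lets the same bytes be valid in every process, since each process may map the file at a different address.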
In step S202, when it is determined that a data file needs updating, the update detection module copies that data file into the update directory.
At the same time, the update detection module may write the file name of the data file that needs updating into the flag file, so that the data management module, which periodically checks the flag file, can read the file name from it.
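A small sketch of this hand-off through the flag file follows. The one-name-per-line format, the file name `update.flag`, and clearing the file after it is read are all assumptions; the patent only says that the file name is written to the flag file and read back periodically. The sketch also ignores concurrent writers, which a real implementation would need to handle.

```python
import os
import tempfile

def mark_for_update(flag_path, filename):
    """Update detection side: append one file name per line."""
    with open(flag_path, "a", encoding="utf-8") as f:
        f.write(filename + "\n")

def take_pending(flag_path):
    """Data management side: read pending names, then clear the flag file."""
    if not os.path.exists(flag_path):
        return []
    with open(flag_path, "r+", encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]
        f.seek(0)
        f.truncate()  # assumed policy: consume the names once read
    return names

flag = os.path.join(tempfile.mkdtemp(), "update.flag")
before = take_pending(flag)
mark_for_update(flag, "AA")
mark_for_update(flag, "BB")
pending = take_pending(flag)
after = take_pending(flag)
```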
In step S203, the data management module loads the data file in the update directory into shared memory and initializes the shared memory.
In step S204, the data management module moves the data file in the data directory to the backup directory.
In step S205, the data management module moves the data file in the update directory into the data directory.
It can be understood that steps S203 to S205 are a preferred way for the data management module to update the data file.
It should be noted that, in embodiments of the invention, the update directory, the data directory, and the backup directory are set in advance, and all three are on the same file system. This ensures that the file system index node (inode) of a data file stays unchanged when the file is moved, so that the memory mapping relationship is preserved. The inode stores basic information about files and directories, including time, file name, user, and group.
Further, the data management module may load the data file in the update directory into shared memory by memory mapping. Memory mapping establishes a mapping from a file to a region of memory. Win32 provides a function (CreateFileMapping) that lets an application map a file into a process. A memory-mapped file is somewhat similar to virtual memory: a region of address space is reserved and physical storage is committed to that region, but the physical storage comes from a file that already exists on disk, and the file must be mapped before it can be operated on. When a file on disk is processed through memory mapping, explicit I/O operations on the file are no longer necessary, which makes memory mapping valuable when handling files that contain large amounts of data.
In addition, when the data management module loads the data file in the update directory into shared memory, it is in effect updating the "shared-memory data". Shared-memory data is stored in shared memory and can be used jointly by multiple processes; the meaning of the data is interpreted by the processes. Examples include fixed-length or variable-length hash tables, general key-value storage structures, and tries. Loading is generally done by mapping the "data file" directly into memory and performing the necessary initialization.
In step S206, the data management module deletes the mapping of the data file from before the update, and updates the data update information.
The data update information comprises the file name of the data file and first time information; the first time information is the current time at which the data management module loads the data file into shared memory.
It can be understood that the data update information is stored in shared memory and records the update status of the "shared-memory data". It generally takes the form of the data file name plus the update time.
For example, when the data management module determines that a data file has been updated, it maps the data file into shared memory and records the time of loading as "TT". Because the data management module periodically checks the flag file, it can read that the file name of the currently updated data file is "AA", and it then records the data update information as "AA+TT".
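The "file name + update time" record could be laid out as a fixed-size structure, as in the sketch below. The 64-byte name field, the floating-point timestamp, and the helper names are assumptions for illustration, and a `bytearray` stands in for the shared-memory region so that the example is self-contained.

```python
import struct

# Hypothetical fixed-size record: 64-byte NUL-padded file name,
# followed by the load time in seconds since the epoch.
UPDATE_INFO = struct.Struct("<64sd")

def write_update_info(buf, filename, load_time):
    """Data management side: record which file was loaded and when."""
    UPDATE_INFO.pack_into(buf, 0, filename.encode("utf-8"), load_time)

def read_update_info(buf):
    """Reader side: recover (file name, first time information)."""
    raw_name, load_time = UPDATE_INFO.unpack_from(buf, 0)
    return raw_name.rstrip(b"\0").decode("utf-8"), load_time

# A bytearray stands in for the shared-memory region here.
region = bytearray(UPDATE_INFO.size)
write_update_info(region, "AA", 1700000000.0)
record = read_update_info(region)
```

In a real deployment the same pack and unpack calls would operate on a shared mapping, so every reader process sees the record the data management process wrote.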
In step S207, the data read module reads the data update information at a preset time interval.
In step S208, if the time indicated by the first time information in the data update information that was read is later than the time in the second time information, the data read module determines that the data file has been updated.
In step S209, the data read module maps the updated data file in the data directory into the shared memory and waits.
In step S210, when the waiting time exceeds a preset time threshold, the data read module deletes the mapping of the data file from before the update.
It can be understood that steps S207 to S210 are a preferred way for the data read module to update and load the data file according to the data update information recorded by the data management module.
For example, the data read module periodically checks the data update information and decides whether the data needs updating by reading the time information in it. If the time indicated by the first time information in the data update information is later than the time in the second time information, the data read module determines that the data file has been updated. That is, if the first time information, recorded when the data management module updated the data file, is "8:00", and the second time information, recorded when the data read module last loaded the data file, is "7:00", then the data file has been updated and the data read module needs to reload the updated data file.
The data read module may update and load the data file as follows:
For example, the updated data file in the data directory is mapped into shared memory. At this point two mappings exist at once: the old data file and the new data file are both in shared memory. Because the old data file may still be in use, its memory mapping cannot be released immediately. After loading completes, the module waits for a period, for example 2 to 30 seconds, and then releases the mapping of the old data file, to ensure that the old data file is no longer being used.
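Steps S207 to S210 can be sketched as a reader that remaps when the manager's load time is later than its own, and that releases the old mapping only after a grace period. The `Reader` class, its parameter names, and the use of explicit timestamps instead of a real clock are assumptions made to keep the example deterministic; replacing a mapped file as shown assumes a POSIX file system.

```python
import mmap
import os
import tempfile

class Reader:
    """Holds the current mapping plus old mappings awaiting release."""

    def __init__(self, path, grace_seconds=2.0):
        self.path = path
        self.grace_seconds = grace_seconds
        self.loaded_at = 0.0   # the "second time information"
        self.current = None
        self.retired = []      # (release deadline, old mapping)

    def _map(self):
        with open(self.path, "rb") as f:
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def poll(self, manager_load_time, now):
        """Remap if the manager loaded later than we did; free expired maps."""
        if manager_load_time > self.loaded_at:
            if self.current is not None:
                self.retired.append((now + self.grace_seconds, self.current))
            self.current = self._map()
            self.loaded_at = now
        keep = []
        for deadline, old in self.retired:
            if now >= deadline:
                old.close()    # grace period over: release the old mapping
            else:
                keep.append((deadline, old))
        self.retired = keep

root = tempfile.mkdtemp()
path = os.path.join(root, "table.dat")
with open(path, "wb") as f:
    f.write(b"v1")

reader = Reader(path)
reader.poll(manager_load_time=100.0, now=101.0)
first = bytes(reader.current[:])

# The manager publishes a new file under the same name.
staged = os.path.join(root, "table.dat.new")
with open(staged, "wb") as f:
    f.write(b"v2")
os.replace(staged, path)

reader.poll(manager_load_time=200.0, now=201.0)
second = bytes(reader.current[:])
old_still_mapped = bytes(reader.retired[0][1][:])
pending_during_grace = len(reader.retired)
reader.poll(manager_load_time=200.0, now=204.0)
pending_after_grace = len(reader.retired)
reader.current.close()
```

During the grace period both versions are mapped, so a lookup that started on the old data can finish safely before its mapping is released.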
It should be noted here that each functional module (such as the data management module and the data read module) manages its own shared-memory mappings; only after all functional modules have released the mapping of the old data file can the old data file be reclaimed by the operating system. In the processing system, the data management module is responsible for loading, initializing, and writing updates. These operations are all exclusive and must be performed by a single process. The data read module can then run as another process that reads the data.
It can be understood that this embodiment mainly analyzes the update process of the data management module and the data read module. For parts not described in detail in this embodiment, such as the data management module loading the data file into shared memory and initializing it, refer to the detailed description of the data file processing method in the first embodiment, which is not repeated here.
As can be seen from the above, in the data file processing method provided by this embodiment, the data management module first stores the data file on disk in a shared-memory data structure, loads the data file into the shared memory by memory mapping, and records update information, which accomplishes the updating and loading of the data file. The data read module then updates and loads the data file according to the data update information, so that processes share the data in shared memory. By storing on disk, in a shared-memory data structure, a data file that would otherwise use a plaintext data structure, and loading it into shared memory, embodiments of the invention improve loading efficiency. Load-once, use-many is supported: through shared memory, multiple processes use the same copy of the shared-memory data, which greatly reduces additional memory usage.
Third embodiment
To better implement the data file processing method provided by embodiments of the invention, an embodiment of the invention also provides a system based on the above data file processing method. The terms have the same meanings as in the data file processing method above, and implementation details may be found in the description of the method embodiments.
Referring to Fig. 4, which is a schematic structural diagram of the data file processing system provided by an embodiment of the invention, the system may include a data management module 401 and a data read module 402.
The data management module 401 is configured to map the data file on disk into shared memory and initialize the shared memory, and to record data update information based on the loading process, so as to accomplish the loading of the data file. This may be as follows:
The data management module 401 obtains a data file on a disk, wherein the data file is stored in a shared-memory data structure; loads the data file into the shared memory by memory mapping and initializes the shared memory; and records data update information based on the loading process, wherein the data update information comprises the file name of the data file and first time information, the first time information being the time at which the loading process performed the load.
For example, initialization here may refer specifically to initializing the locks in the in-memory data. This mainly concerns in-memory data that is read and written at the same time (for example, when part of the data is modified), for which locks (such as read-write locks or sequence locks) must be designed into the in-memory data structure to ensure correct access.
The data read module 402 is configured to read the data file according to the data update information. Specifically, the data read module 402 updates and loads the data file according to the data update information, so that processes share the data in shared memory.
For example, after the data management module has recorded the data update information, the data read module 402 may read and check the data update information at a preset time interval. If the data update information indicates that the target data file has been updated, the data read module 402 loads the data file into the shared memory by memory mapping, and records the time at which it mapped and loaded the file as the second time information.
It can be understood that, because the data management module 401 has already performed initialization, the module that uses the data needs no additional operations. The operating system ensures that the same data file is mapped to the same shared memory, so the module that uses the data also completes its data loading process.
Optionally, before obtaining the data file on disk, the data management module 401 may further be configured to store the data file on disk using a shared-memory data structure, where the shared-memory data structures include an array, a hash table, and a double-array trie.
It can be understood that, before the data file is loaded, the data file on disk may first be preprocessed, for example by storing it on disk using a shared-memory data structure. The shared-memory data structures include, but are not limited to, an array, a hash table, and a double-array trie, and are not specifically limited here.
The data file is stored on disk in binary form and is mapped into shared memory by memory mapping. Its content may be any shared-memory data structure, including but not limited to an array, a hash table, or a double-array trie. Taking a hash table as an example, its data structure may be as shown in Fig. 3, and may include:
(1) Header information (HEADER), which stores the metadata of the hash table, for example the data type, the version number, the lock that controls multi-process access, and hash table statistics.
(2) Hash buckets (BUCKET), whose content is an index into the NODE array.
(3) Nodes (NODE), whose content includes the index of the next node, the key, and an index into the CHUNK array; for a hash table with variable-length values, it also includes the length of the value.
(4) Data blocks (CHUNK): a fixed-length hash table stores the value directly, while a variable-length hash table additionally stores an index pointing to the next CHUNK.
It should be understood that the hash table data structure is used here only as an example for explanation and does not limit the present invention.
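To make the HEADER / BUCKET / NODE / CHUNK layout concrete, here is a small sketch of a fixed-length hash table serialized into one flat byte image. The field widths, the CRC-32 hash function, and the names `build_table` and `lookup` are assumptions chosen for illustration; the lock and statistics fields mentioned for the header are omitted. Because every link is an array index rather than a pointer, the image is valid at whatever address it is mapped.

```python
import struct
import zlib

HEADER = struct.Struct("<4sIII")   # data type tag, version, bucket count, node count
BUCKET = struct.Struct("<i")       # index of the first NODE in the bucket, -1 if empty
NODE = struct.Struct("<i16si")     # next NODE index, key (null-padded), CHUNK index
CHUNK = struct.Struct("<32s")      # fixed-length value (null-padded)


def build_table(items, bucket_count=8):
    """Serialize a dict of bytes -> bytes into HEADER + BUCKET[] + NODE[] + CHUNK[]."""
    buckets = [-1] * bucket_count
    nodes, chunks = [], []
    for key, value in items.items():
        slot = zlib.crc32(key) % bucket_count
        chunks.append(CHUNK.pack(value))
        # The new node goes on the front of the bucket's chain.
        nodes.append(NODE.pack(buckets[slot], key, len(chunks) - 1))
        buckets[slot] = len(nodes) - 1
    header = HEADER.pack(b"HASH", 1, bucket_count, len(nodes))
    return (header + b"".join(BUCKET.pack(b) for b in buckets)
            + b"".join(nodes) + b"".join(chunks))


def lookup(buf, key):
    """Follow BUCKET -> NODE chain -> CHUNK using only indices stored in the image."""
    _tag, _version, bucket_count, node_count = HEADER.unpack_from(buf, 0)
    bucket_base = HEADER.size
    node_base = bucket_base + bucket_count * BUCKET.size
    chunk_base = node_base + node_count * NODE.size
    slot = zlib.crc32(key) % bucket_count
    (index,) = BUCKET.unpack_from(buf, bucket_base + slot * BUCKET.size)
    while index != -1:
        next_index, stored_key, chunk_index = NODE.unpack_from(buf, node_base + index * NODE.size)
        if stored_key.rstrip(b"\x00") == key:
            (value,) = CHUNK.unpack_from(buf, chunk_base + chunk_index * CHUNK.size)
            return value.rstrip(b"\x00")
        index = next_index
    return None


# Two buckets for three keys forces at least one collision chain.
image = build_table({b"alpha": b"1", b"beta": b"22", b"gamma": b"333"}, bucket_count=2)
```

The same `lookup` works unchanged on an `mmap` object, since `struct.unpack_from` accepts any buffer.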
Optionally, after the shared memory is initialized, update detection may also be performed on the data file on disk in real time, for example as follows:
As shown in Fig. 4, the system may further include an update detection module 403, configured to perform update detection on the data file on disk; the update detection methods include, but are not limited to, CRC checking and md5sum checking. When it is determined that a data file needs to be updated, the data file that needs to be updated is copied to the update directory;
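Checksum-based update detection of this kind might look like the following sketch. Where the candidate file comes from is not stated in the text, so the `incoming` directory, the file names, and the function `stage_if_updated` are assumptions made for illustration.

```python
import hashlib
import os
import shutil
import tempfile
import zlib


def crc32_of(path, block_size=1 << 20):
    """CRC-32 of a file, computed block by block."""
    crc = 0
    with open(path, "rb") as f:
        while block := f.read(block_size):
            crc = zlib.crc32(block, crc)
    return crc


def md5_of(path, block_size=1 << 20):
    """MD5 digest of a file, equivalent to what md5sum prints."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(block_size):
            digest.update(block)
    return digest.hexdigest()


def stage_if_updated(candidate, current, update_dir, checksum=md5_of):
    """Copy candidate into update_dir when its checksum differs from the current file."""
    if os.path.exists(current) and checksum(candidate) == checksum(current):
        return None
    return shutil.copy2(candidate, os.path.join(update_dir, os.path.basename(current)))


root = tempfile.mkdtemp()
incoming, data_dir, update_dir = (os.path.join(root, n) for n in ("incoming", "data", "update"))
for d in (incoming, data_dir, update_dir):
    os.mkdir(d)


def write(path, payload):
    with open(path, "wb") as f:
        f.write(payload)


candidate = os.path.join(incoming, "table.dat")
current = os.path.join(data_dir, "table.dat")
write(current, b"version-1")
write(candidate, b"version-1")
unchanged = stage_if_updated(candidate, current, update_dir)           # same content: nothing staged
write(candidate, b"version-2")
staged = stage_if_updated(candidate, current, update_dir, checksum=crc32_of)
```

CRC-32 is cheaper to compute, while MD5 makes accidental collisions far less likely; either can be passed as `checksum`.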
The data management module 401 may further be configured to load the data file in the update directory into shared memory and initialize the shared memory; move the data file in the data directory to the backup directory; and move the data file in the update directory to the data directory.
After the updated data file has been loaded, the mapping of the original data file may also be deleted.
For example, after moving the data file in the update directory to the data directory, the data management module 401 may further be configured to delete the data file mapping that existed before the data file was updated. After deleting that mapping, the data management module 401 may further be configured to update the data update information.
It should be noted that, in this embodiment of the present invention, the update directory, the data directory, and the backup directory are set in advance, and all three directories are in the same file system. This ensures that the file system index node (inode) of a data file remains unchanged when the file is moved, so that the memory mapping relationship is preserved. The inode can be used to store basic information about files and directories, including times, file name, user, and group.
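The effect of keeping the three directories on one file system can be demonstrated with a short sketch, assuming a POSIX system (renaming a mapped file may fail on other platforms). The directory layout under a temporary root and the function `swap_in_update` are illustrative assumptions.

```python
import mmap
import os
import tempfile

root = tempfile.mkdtemp()
dirs = {name: os.path.join(root, name) for name in ("update", "data", "backup")}
for path in dirs.values():
    os.mkdir(path)


def write(path, payload):
    with open(path, "wb") as f:
        f.write(payload)


write(os.path.join(dirs["data"], "table.dat"), b"old!")
write(os.path.join(dirs["update"], "table.dat"), b"new!")


def swap_in_update(name):
    """Map the staged file, then rotate data -> backup and update -> data."""
    staged = os.path.join(dirs["update"], name)
    live = os.path.join(dirs["data"], name)
    with open(staged, "r+b") as f:
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)   # load from the update directory
    inode_before = os.stat(staged).st_ino
    os.replace(live, os.path.join(dirs["backup"], name))   # data directory -> backup directory
    os.replace(staged, live)                               # update directory -> data directory
    return mapping, inode_before, os.stat(live).st_ino


mapping, inode_before, inode_after = swap_in_update("table.dat")
with open(os.path.join(dirs["backup"], "table.dat"), "rb") as f:
    backed_up = f.read()
```

Within one file system a rename only changes directory entries, so the inode number is unchanged and the mapping created before the move still refers to the file that now sits in the data directory. A move across file systems would instead copy the data into a new inode.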
Further optionally, update detection may be performed on the data file by means such as a cyclic redundancy check (CRC) or the MD5 message-digest algorithm, which are not described in detail here.
Further, the data management module 401 may load the data file in the update directory into shared memory by memory mapping. Specifically, memory mapping maps a file onto a region of memory. Win32 provides a function (CreateFileMapping) that allows an application to map a file into a process. A memory-mapped file is somewhat similar to virtual memory: it reserves a region of address space and commits physical storage to that region, except that the physical storage for a memory mapping comes from a file that already exists on disk, and the file must be mapped before it can be operated on. When a file stored on disk is processed through memory mapping, explicit I/O operations on the file are no longer needed, which makes memory mapping particularly useful when processing files that hold large amounts of data.
In addition, when the data management module 401 loads the data file in the update directory into shared memory, it is in effect updating the "shared memory data". Shared memory data is data stored in shared memory that can be used jointly by multiple processes; the meaning of the data is interpreted by the processes. Examples include fixed-length and variable-length hash tables, general key-value storage structures, and tries. Loading is generally completed by mapping the "data file" directly into memory and performing the necessary initialization.
At this point, the data reading module 402 may further be configured to read the data update information at a preset time interval; if it determines that the data update information indicates an update for the target data file, it loads the data file into the shared memory by memory mapping and records the time of the load as second time information.
Further, after reading the data update information, the data reading module 402 is further configured to: determine that the data file has been updated if the time in the first time information indicated in the data update information it read is later than the time in the second time information; map the updated data file in the data directory into the shared memory and wait; and, when the waiting time exceeds a preset time threshold, delete the data file mapping that existed before the data file was updated.
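The reader-side logic (compare the first time with the second time, map the new file, wait, then drop the old mapping) can be sketched as below. The class `ReaderState`, the dictionary format of the update information, and the use of anonymous mappings in place of real data files are assumptions that keep the example self-contained; timestamps are passed in explicitly so the behaviour is deterministic.

```python
import mmap


class ReaderState:
    """Tracks the reader's current mapping and old mappings awaiting release."""

    def __init__(self, grace_seconds):
        self.second_time = 0.0     # when this reader last mapped the data file
        self.current = None
        self.retiring = []         # (old mapping, time at which it was superseded)
        self.grace_seconds = grace_seconds

    def poll(self, update_info, map_file, now):
        # The file has been updated if the manager loaded it after our own last load.
        if update_info["first_time"] > self.second_time:
            if self.current is not None:
                self.retiring.append((self.current, now))
            self.current = map_file(update_info["file_name"])
            self.second_time = now
        # Release old mappings only after the preset waiting time has elapsed,
        # so lookups that started on the old data can finish first.
        still_waiting = []
        for mapping, since in self.retiring:
            if now - since > self.grace_seconds:
                mapping.close()
            else:
                still_waiting.append((mapping, since))
        self.retiring = still_waiting
        return self.current


def open_mapping(file_name):
    return mmap.mmap(-1, 16)       # stand-in for mapping the named file in the data directory


state = ReaderState(grace_seconds=5)
info_v1 = {"file_name": "table.dat", "first_time": 100.0}
info_v2 = {"file_name": "table.dat", "first_time": 110.0}

first = state.poll(info_v1, open_mapping, now=101.0)
same = state.poll(info_v1, open_mapping, now=102.0)     # no newer load: mapping is kept
second = state.poll(info_v2, open_mapping, now=111.0)   # newer load: switch to a new mapping
state.poll(info_v2, open_mapping, now=113.0)
closed_early = first.closed                             # still within the waiting time
state.poll(info_v2, open_mapping, now=117.0)
closed_late = first.closed                              # waiting time exceeded: old mapping released
```

In a real reader, `poll` would be called at the preset time interval with the current wall-clock time.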
It should be noted here that each functional module (such as the data management module 401 and the data reading module 402) manages its own shared memory mapping. Only after all functional modules have released the mapping of the old data file can the old data file be reclaimed by the operating system. In this processing system, the data management module is responsible for loading, initialization, and update writes; these operations are exclusive and must be performed by a single process. The data reading module can then run as a separate process that reads the data.
In specific implementations, the above modules may be implemented as independent entities, or may be combined and implemented as one or several entities. See, for example, the second embodiment; for the specific implementation of the above modules, refer to the preceding method embodiments, which are not repeated here.
As can be seen from the above, in the data file processing system provided by this embodiment, the data file is first stored on disk using a shared-memory data structure, is loaded into the shared memory by memory mapping, and update information is recorded, thereby implementing load processing of the data file; the data file is then updated and loaded according to the data update information, so that processes can jointly use the data in the shared memory. By storing on disk, in a shared-memory data structure, a data file that previously used a plaintext data structure and loading it into shared memory, this embodiment of the present invention improves loading efficiency. It also supports loading once and using in many places: through shared memory, multiple processes use the same copy of the shared-memory data, which greatly reduces additional memory usage.
Fourth embodiment
This embodiment of the present invention further provides a server in which the data file processing system of the embodiments of the present invention may be integrated. Fig. 5 shows a schematic structural diagram of the server according to this embodiment. Specifically:
The server may include components such as a processor 501 with one or more processing cores, a memory 502 comprising one or more computer-readable storage media, a radio frequency (RF) circuit 503, a power supply 504, an input unit 505, and a display unit 506. Those skilled in the art will understand that the server structure shown in Fig. 5 does not limit the server, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Specifically:
The processor 501 is the control center of the server. It connects the various parts of the whole server through various interfaces and lines, and performs the server's functions and processes data by running or executing the software programs and/or modules stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the server as a whole. Optionally, the processor 501 may include one or more processing cores. Preferably, the processor 501 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 501.
The memory 502 may be used to store software programs and modules; the processor 501 performs various functional applications and data processing by running the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the server. In addition, the memory 502 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
The RF circuit 503 may be used to receive and send signals in the course of sending and receiving information. In particular, after receiving downlink information from a base station, it hands the information to one or more processors 501 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit 503 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), and a duplexer. The RF circuit 503 may also communicate with networks and other devices by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, and Short Message Service (SMS).
The server also includes a power supply 504 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 501 through a power management system, so that functions such as charging, discharging, and power consumption management are handled by the power management system. The power supply 504 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The server may further include an input unit 505, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The server may further include a display unit 506, which may be used to display information entered by the user or provided to the user, as well as the various graphical user interfaces of the server; these graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof. The display unit 506 may include a display panel, which may optionally be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
Specifically, in this embodiment, the processor 501 in the server loads the executable files corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and runs the application programs stored in the memory 502, thereby implementing various functions, as follows:
Obtain the data file on disk, where the data file is stored using a shared-memory data structure;
Load the data file into the shared memory by memory mapping, and initialize the shared memory;
Record data update information based on the loading process, where the data update information includes the file name of the data file and first time information, the first time information being the time at which the loading process performed the load;
Perform read processing on the data file according to the data update information.
Preferably, the processor 501 may further be configured to, before obtaining the data file on disk:
Store the data file on disk using a shared-memory data structure, where the shared-memory data structures include an array, a hash table, and a double-array trie.
Preferably, the processor 501 may further be configured to, after loading the data file into the shared memory and initializing the shared memory:
Perform update detection on the data file on disk; when it is determined that a data file needs to be updated, copy the data file that needs to be updated to the update directory; load the data file in the update directory into shared memory, and initialize the shared memory; move the data file in the data directory to the backup directory; and move the data file in the update directory to the data directory, where the update directory, the data directory, and the backup directory are in the same file system.
On this basis, the processor 501 may further be configured to delete the data file mapping that existed before the data file was updated.
Preferably, the processor 501 may further be configured to, after deleting the data file mapping that existed before the data file was updated:
Update the data update information; read the data update information at a preset time interval; and, if it is determined that the data update information indicates an update for the target data file, load the data file into the shared memory by memory mapping and record the time of the load as second time information.
On this basis, the processor 501 may further be configured to, after reading the data update information: determine that the data file has been updated if the time in the first time information indicated in the data update information that was read is later than the time in the second time information; map the updated data file in the data directory into the shared memory and wait; and, when the waiting time exceeds a preset time threshold, delete the data file mapping that existed before the data file was updated.
As can be seen from the above, in the server provided by this embodiment, the data file is first stored on disk using a shared-memory data structure, is loaded into the shared memory by memory mapping, and update information is recorded, thereby implementing load processing of the data file; the data file is then updated and loaded according to the data update information, so that processes can jointly use the data in the shared memory. By storing on disk, in a shared-memory data structure, a data file that previously used a plaintext data structure and loading it into shared memory, this embodiment of the present invention improves loading efficiency. It also supports loading once and using in many places: through shared memory, multiple processes use the same copy of the shared-memory data, which greatly reduces additional memory usage.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in a given embodiment, refer to the detailed description of the data file processing method above, which is not repeated here.
The data file processing system provided by the embodiments of the present invention is, for example, a computer, a tablet computer, or a mobile phone with a touch function. The data file processing system and the data file processing method in the foregoing embodiments belong to the same concept; any of the methods provided in the data file processing method embodiments can be run on the data file processing system. For its specific implementation, refer to the data file processing method embodiments, which are not repeated here.
It should be noted that, for the data file processing method of the present invention, a person of ordinary skill in the art can understand that all or part of the flow of the data file processing method described in the embodiments of the present invention can be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, for example in the memory of a terminal, and executed by at least one processor in the terminal; its execution may include the flow of the embodiments of the data file processing method described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
For the data file processing system of the embodiments of the present invention, its functional modules may be integrated in one processing chip, each module may exist physically on its own, or two or more modules may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The data file processing method and system provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those skilled in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (12)

1. A data file processing method, characterized by comprising:
obtaining a data file on disk, wherein the data file is stored using a shared-memory data structure;
loading the data file into the shared memory by memory mapping, and initializing the shared memory;
recording data update information based on the loading process, wherein the data update information comprises the file name of the data file and first time information, the first time information being the time at which the loading process performed the load;
performing read processing on the data file according to the data update information.
2. The data file processing method according to claim 1, characterized in that, before the obtaining of the data file on disk, the method further comprises:
storing the data file on disk using a shared-memory data structure, wherein the shared-memory data structures comprise an array, a hash table, and a double-array trie.
3. The data file processing method according to claim 1, characterized in that, after the loading of the data file into the shared memory and the initializing of the shared memory, the method further comprises:
performing update detection on the data file on disk;
when it is determined that a data file needs to be updated, copying the data file that needs to be updated to an update directory;
loading the data file in the update directory into shared memory, and initializing the shared memory;
moving the data file in a data directory to a backup directory;
moving the data file in the update directory to the data directory, wherein the update directory, the data directory, and the backup directory are in the same file system.
4. The data file processing method according to claim 3, characterized in that, after the moving of the data file in the update directory to the data directory, the method further comprises:
deleting the data file mapping that existed before the data file was updated.
5. The data file processing method according to claim 4, characterized in that, after the deleting of the data file mapping that existed before the data file was updated, the method further comprises:
updating the data update information;
reading the data update information at a preset time interval;
if it is determined that the data update information indicates an update for the target data file, loading the data file into the shared memory by memory mapping, and recording the time of the load as second time information.
6. The data file processing method according to claim 5, characterized in that, after the reading of the data update information, the method further comprises:
if the time in the first time information indicated in the data update information that was read is later than the time in the second time information, determining that the data file has been updated;
mapping the updated data file in the data directory into the shared memory and waiting;
when the waiting time exceeds a preset time threshold, deleting the data file mapping that existed before the data file was updated.
7. A data file processing system, characterized by comprising:
a data management module, configured to obtain a data file on disk, wherein the data file is stored using a shared-memory data structure; load the data file into the shared memory by memory mapping, and initialize the shared memory; and record data update information based on the loading process, wherein the data update information comprises the file name of the data file and first time information, the first time information being the time at which the loading process performed the load;
a data reading module, configured to perform read processing on the data file according to the data update information.
8. The data file processing system according to claim 7, characterized in that the data management module is further configured to, before obtaining the data file on disk, store the data file on disk using a shared-memory data structure, wherein the shared-memory data structures comprise an array, a hash table, and a double-array trie.
9. The data file processing system according to claim 7, characterized in that the system further comprises an update detection module, configured to perform update detection on the data file on disk and, when it is determined that a data file needs to be updated, copy the data file that needs to be updated to an update directory;
the data management module is further configured to load the data file in the update directory into shared memory, and initialize the shared memory; move the data file in a data directory to a backup directory; and move the data file in the update directory to the data directory, wherein the update directory, the data directory, and the backup directory are in the same file system.
10. The data file processing system according to claim 9, characterized in that the data management module is further configured to, after moving the data file in the update directory to the data directory, delete the data file mapping that existed before the data file was updated.
11. The data file processing system according to claim 10, characterized in that the data management module is further configured to, after deleting the data file mapping that existed before the data file was updated, update the data update information;
the data reading module is further configured to read the data update information at a preset time interval and, if it is determined that the data update information indicates an update for the target data file, load the data file into the shared memory by memory mapping and record the time of the load as second time information.
12. The data file processing system according to claim 11, characterized in that the data reading module is further configured to, after reading the data update information: determine that the data file has been updated if the time in the first time information indicated in the data update information that was read is later than the time in the second time information; map the updated data file in the data directory into the shared memory and wait; and, when the waiting time exceeds a preset time threshold, delete the data file mapping that existed before the data file was updated.
CN201510454768.0A 2015-07-29 2015-07-29 A kind of data file processing method and system Active CN106708825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510454768.0A CN106708825B (en) 2015-07-29 2015-07-29 A kind of data file processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510454768.0A CN106708825B (en) 2015-07-29 2015-07-29 A kind of data file processing method and system

Publications (2)

Publication Number Publication Date
CN106708825A true CN106708825A (en) 2017-05-24
CN106708825B CN106708825B (en) 2019-09-27

Family

ID=58894947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510454768.0A Active CN106708825B (en) 2015-07-29 2015-07-29 A kind of data file processing method and system

Country Status (1)

Country Link
CN (1) CN106708825B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101296157A (en) * 2007-04-26 2008-10-29 北京师范大学珠海分校 Multi-network card coordination communication method
CN101082928A (en) * 2007-06-25 2007-12-05 腾讯科技(深圳)有限公司 Method for accessing database and data-base mapping system
CN101551808A (en) * 2009-05-13 2009-10-07 山东中创软件商用中间件股份有限公司 Technology supporting multi-process embedded tree-based databases
CN101986649A (en) * 2010-11-29 2011-03-16 深圳天源迪科信息技术股份有限公司 Shared data center used in telecommunication industry billing system
CN102890679A (en) * 2011-07-20 2013-01-23 中兴通讯股份有限公司 Method and system for processing data version

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908798A (en) * 2017-12-20 2018-04-13 浙江煮艺文化科技有限公司 The processing method and system of a kind of data file
CN108958732A (en) * 2018-06-28 2018-12-07 上海恺英网络科技有限公司 A kind of data load method and equipment based on PHP
CN109359005A (en) * 2018-09-14 2019-02-19 厦门天锐科技股份有限公司 A kind of data acquisition treatment method of striding course
CN109359005B (en) * 2018-09-14 2022-04-19 厦门天锐科技股份有限公司 Cross-process data acquisition and processing method
CN109542911A (en) * 2018-12-03 2019-03-29 郑州云海信息技术有限公司 A kind of metadata organization method, system, equipment and computer readable storage medium
CN109542911B (en) * 2018-12-03 2021-10-29 郑州云海信息技术有限公司 Metadata organization method, system, equipment and computer readable storage medium
CN110716939A (en) * 2019-10-16 2020-01-21 深圳市网心科技有限公司 Data management method, electronic device, system and medium
CN111158611A (en) * 2020-03-26 2020-05-15 长春师范大学 New energy automobile controller memory management method
CN113806593A (en) * 2020-06-17 2021-12-17 新疆金风科技股份有限公司 Communication abnormity detection method and device for wind power plant and plant controller
CN111736973A (en) * 2020-06-24 2020-10-02 北京奇艺世纪科技有限公司 Service starting method, device, server and storage medium
CN113110944A (en) * 2021-03-31 2021-07-13 北京达佳互联信息技术有限公司 Information searching method, device, server, readable storage medium and program product

Also Published As

Publication number Publication date
CN106708825B (en) 2019-09-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant