WO2012083877A1 - Method and device for creating indexes for mass data records - Google Patents



Publication number
WO2012083877A1
Authority
WO
Grant status
Application
Patent type
Prior art keywords
file
identifier
logical
time domain
time
Prior art date
Application number
PCT/CN2011/084518
Other languages
French (fr)
Chinese (zh)
Inventor
王俊
程宁
王冲
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date
Filing date
Publication date


Abstract

Disclosed are a method and device for creating indexes for mass data records. The method includes: obtaining the current system time when a new write file request message is received; generating an index key for the file according to the current system time and the identifier of the file that the write file request message requests to write; and establishing an association between the index key and the file. The present invention enables fast location of records in mass data storage.

Description

Method and device for creating indexes for mass data records

TECHNICAL FIELD
The present invention relates to the fields of computer and communications technology, and in particular to a method and device for creating indexes for mass data records.

BACKGROUND ART
In-memory databases usually use a file attribute as the index key. For mass data storage, however, indexing by file name or by another attribute such as write time is slow, or the index is not unique. For example, current IPTV systems access stored media files through a distributed file system, and the file metadata of that file system is managed by an in-memory database system. The application requires the system to support five million file records and twenty million segment (CHUNK) records, and existing indexing methods cannot locate records quickly enough.

SUMMARY
The present invention provides a method and device for creating indexes for mass data records, to solve at least one of the above problems.
According to one aspect of the present invention, a method for creating indexes for mass data records is provided, comprising: obtaining the current system time upon receiving a new write file request message; generating an index key for the file according to the current system time and the identifier of the file that the write file request message requests to write; and establishing an association between the index key and the file. Preferably, the file is a distributed file.
Preferably, the file identifier comprises a first logical file identifier that identifies the logical file of the distributed file, and generating the index key according to the current system time and the file identifier comprises:
Step A: obtaining a first time-domain parameter from the total duration elapsed from a predetermined time to the current system time;
Step B: generating the first logical file identifier according to a preset configuration policy;
Step C: synthesizing the first time-domain parameter and the first logical file identifier into a lookup key;
Step D: searching the record data area for the lookup key. If the lookup key is not found, or the second time-domain parameter and second logical file identifier indicated by the found lookup key are not both identical to the first time-domain parameter and the first logical file identifier, the first logical file identifier is valid: the combination of the first time-domain parameter and the first logical file identifier is used as the unique ID of the logical file, and the lookup key is used as the index key.
Preferably, the file identifier comprises a first logical file identifier that identifies the logical file of the distributed file and a first segment file identifier that identifies a segment file of the distributed file, and generating the index key according to the current system time and the file identifier comprises:
Step A: obtaining a first time-domain parameter from the total duration elapsed from a predetermined time to the current system time;
Step B: generating the first logical file identifier and the first segment file identifier according to a preset configuration policy;
Step C: synthesizing the first time-domain parameter, the first logical file identifier and the first segment file identifier into a lookup key;
Step D: searching the record data area for the lookup key. If the lookup key is not found, or if the second time-domain parameter and second logical file identifier indicated by the found lookup key are not both identical to the first time-domain parameter and the first logical file identifier, and the second time-domain parameter and second segment file identifier are not both identical to the first time-domain parameter and the first segment file identifier, the combination of the first logical file identifier and the first segment file identifier is valid: the first time-domain parameter, the first logical file identifier and the first segment file identifier are synthesized into the unique ID of the segment file, and the lookup key is used as the index key.
Preferably, if the first time-domain parameter equals the second time-domain parameter and the first segment file identifier equals the second segment file identifier, the method further comprises: modifying the value of the first segment file identifier to generate a new segment file identifier, using the new segment file identifier as the first segment file identifier, and returning to step C.
Preferably, generating the first segment file identifier comprises: obtaining the value of the most recently generated segment file identifier and increasing it by a specified increment to obtain the first segment file identifier, where the number of bits occupied by the segment file identifier is determined by the configuration policy. Modifying the value of the first segment file identifier to generate a new segment file identifier comprises: increasing the value of the first segment file identifier by the specified increment to obtain the new segment file identifier.
Preferably, if the first time-domain parameter equals the second time-domain parameter and the first logical file identifier equals the second logical file identifier, the method further comprises: modifying the value of the first logical file identifier to generate a new logical file identifier, using the new logical file identifier as the first logical file identifier, and returning to step C.
Preferably, generating the first logical file identifier comprises: obtaining the value of the most recently generated logical file identifier and increasing it by the specified increment to obtain the first logical file identifier, where the number of bits occupied by the logical file identifier is determined by the configuration policy. Modifying the value of the first logical file identifier to generate a new logical file identifier comprises: increasing the value of the first logical file identifier by the specified increment to obtain the new logical file identifier.
Preferably, step A comprises: obtaining the value of each bit field of the time domain according to the total duration, where the time-domain bit fields comprise a year field, an hour (or minute) field and a second field. If the total duration is m years, n hours (or minutes) and k seconds, the year field records the value m, the hour (or minute) field records the value n and the second field records the value k, where m, n and k are integers greater than or equal to 0. The bits of the time domain are then mixed according to a configuration policy to obtain the first time-domain parameter. The configuration policy comprises either of the following: aligning the seconds within one hour to the same order of magnitude and shifting the year field and the hour (or minute) field to the low-order bits of the time domain, to obtain the first time-domain parameter; or aligning the number of hours (or minutes) within one year to the same order of magnitude and shifting the field above it to the low-order bits of the time domain, to obtain the first time-domain parameter.
Preferably, after the association between the index key and the file is established, the method further comprises: if the proportion of write requests that are concurrent at the same time exceeds a preset ratio, modifying or configuring the numbers of bits occupied by the logical file identifier and the segment file identifier of the distributed file, and/or modifying or configuring the bit-mixing configuration policy used to obtain the first time-domain parameter, and then returning to step B to rebuild the index keys of the distributed files. Preferably, the lookup key is synthesized by the modulo folding method.
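To make the modulo folding step concrete, here is a minimal Python sketch based on the description of step C in the detailed description: the time-domain parameter and the file identifier(s) are concatenated into a word of at most 64 bits, and the result is reduced modulo the application scale value (the estimated total number of files the system can store). The function name `fold_key`, the example field widths and the choice to place the time-domain parameter in the high-order bits are assumptions for illustration, not the patent's specified layout.

```python
def fold_key(time_value: int, time_bits: int, id_fields, app_scale: int) -> int:
    """Modulo folding (sketch): concatenate the time-domain value and the file IDs
    into one word of at most 64 bits, then reduce it modulo the application scale.

    id_fields: sequence of (value, bit_width) pairs, e.g. [(logical_id, 18)] or
               [(logical_id, 18), (segment_id, 10)]; the widths are illustrative.
    app_scale: estimated total number of files the system can store.
    """
    if not 0 <= time_value < (1 << time_bits):
        raise ValueError("time value does not fit in its field")
    word, used = time_value, time_bits
    for value, bits in id_fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"ID {value} does not fit in {bits} bits")
        word = (word << bits) | value  # append the ID below the bits already placed
        used += bits
    if used > 64:
        raise ValueError("fields exceed 64 bits")
    # Any unused high-order bits of the 64-bit word are simply zero (zero fill).
    return word % app_scale
```

For example, `fold_key(t, 36, [(logical_id, 28)], scale)` folds a 36-bit time value and a 28-bit logical file ID into a key in the range `0 .. scale - 1`.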
Preferably, establishing the association between the index key and the file comprises: applying for an empty record position in the in-memory database for the record of the file, storing the correspondence between the logical file name of the distributed file and the index key at that record position, and adding the index key to the index data area; when the segment files of the distributed file are stored, the index key is used as the actual stored file name of the segment files.
According to another aspect of the present invention, a device for creating indexes for mass data records is provided, comprising: an obtaining module, configured to obtain the current system time upon receiving a new write file request; a generation module, configured to generate an index key for the distributed file according to the current system time and the identifier of the file that the write file request message requests to write; and an establishing module, configured to establish the association between the index key and the file.
Preferably, the file is a distributed file, and the file identifier comprises a first logical file identifier that identifies the logical file of the distributed file. The generation module comprises: an obtaining submodule, configured to obtain a first time-domain parameter from the total duration elapsed from a predetermined time to the current system time; a generating submodule, configured to generate the first logical file identifier according to a preset configuration policy; a synthesis submodule, configured to synthesize the first time-domain parameter and the first logical file identifier into a lookup key; and a lookup submodule, configured to search the record data area for the lookup key and, if the lookup key is not found or the second time-domain parameter and second logical file identifier indicated by the found lookup key are not both identical to the first time-domain parameter and the first logical file identifier, to use the lookup key as the index key.
Preferably, the file identifier further comprises a first segment file identifier that identifies a segment file of the distributed file. The generating submodule is further configured to generate the first segment file identifier according to the preset configuration policy; the synthesis submodule is configured to synthesize the first time-domain parameter, the first logical file identifier and the first segment file identifier into a lookup key; and the lookup submodule is configured to search the record data area for the lookup key and, if the lookup key is not found, or if neither the pair (second time-domain parameter, second logical file identifier) nor the pair (second time-domain parameter, second segment file identifier) indicated by the found lookup key is identical to the corresponding pair of first values, to use the lookup key as the index key.
With the present invention, the index key of a file is composed of the file's write time combined with its file identifier. This solves the prior-art problem of slow location in mass data storage and achieves fast location.

BRIEF DESCRIPTION OF THE DRAWINGS
The drawings described here provide a further understanding of the present invention and constitute a part of this application. The exemplary embodiments of the present invention and their description explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a method for creating indexes for mass data records according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the synthesis of an index key according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another synthesis of an index key according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the distribution of the time-domain bit fields according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the distribution of the time-domain bit fields after mixing on the second field according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the distribution of the time-domain bit fields after mixing on the hour field according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of one bit-field layout of the logical file identifier and segment file identifier of a distributed file according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of another bit-field layout of the logical file identifier and segment file identifier of a distributed file according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of yet another bit-field layout of the logical file identifier and segment file identifier of a distributed file according to an embodiment of the present invention;
FIG. 10 is a flowchart of generating the index key of a distributed file according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a device for creating indexes for mass data records according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of the generation module 20 according to an embodiment of the present invention.

DETAILED DESCRIPTION
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments. Where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with each other.
FIG. 1 is a flowchart of a method for creating indexes for mass data records according to an embodiment of the present invention. As shown in FIG. 1, the method mainly includes the following steps (steps S102 to S106):
Step S102: obtain the current system time upon receiving a new write file request message.
In embodiments of the invention, for each new write file request, obtaining the current system time may consist of obtaining the total duration elapsed from a predetermined time (for example, 00:00 on 1 January 1970) up to the present. This total duration can be expressed as m years, n hours and k seconds, or as m years, n minutes and k seconds.
Step S104: generate an index key for the file according to the current system time and the identifier of the file that the write file request message requests to write.
Take a distributed file as an example: a distributed file is stored as segment files, and its metadata is divided into segment files and logical files. When the index key of the file is generated, the current system time obtained in step S102 may be bit-mixed to obtain a time-domain parameter, and the time-domain parameter is combined with the logical file identifier of the distributed file's logical file to form the index key, as shown in FIG. 2. Alternatively, the time-domain parameter may be combined with both the logical file identifier and the segment file identifier of the distributed file to form the index key of the segment file, as shown in FIG. 3.
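As an illustration of step S102, the following Python sketch splits an elapsed time since 00:00 on 1 January 1970 into the three time-domain values m, n and k. It assumes the reference time is UTC and that "m years" means calendar years, neither of which the text states; `time_fields` is a hypothetical helper name. A 14-bit hour field is large enough for the 8,784 hours of a leap year.

```python
from datetime import datetime, timezone
from typing import Tuple

EPOCH_YEAR = 1970  # predetermined reference time: 1 January 1970 00:00 (UTC assumed)


def time_fields(elapsed_seconds: int, middle_unit: str = "hour") -> Tuple[int, int, int]:
    """Split elapsed seconds into (m, n, k).

    middle_unit="hour":   m years, n hours within the year, k seconds within the hour
    middle_unit="minute": m years, n minutes within the year, k seconds within the minute
    """
    t = datetime.fromtimestamp(elapsed_seconds, tz=timezone.utc)
    m = t.year - EPOCH_YEAR
    day_index = t.timetuple().tm_yday - 1  # whole days already elapsed this year
    if middle_unit == "hour":
        n = day_index * 24 + t.hour
        k = t.minute * 60 + t.second
    elif middle_unit == "minute":
        n = (day_index * 24 + t.hour) * 60 + t.minute
        k = t.second
    else:
        raise ValueError("middle_unit must be 'hour' or 'minute'")
    return m, n, k
```

For instance, 90061 seconds after the reference time is 01:01:01 on 2 January 1970, which gives m = 0, n = 25 hours and k = 61 seconds.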
For example, the index key can be composed in either of the following two ways.
First way: the index key is composed of the time-domain parameter and the logical file identifier. This mainly comprises the following steps:
Step A: obtain the time-domain parameter from the total duration elapsed from the predetermined time to the current system time;
Step B: generate the logical file identifier according to the preset configuration policy;
Step C: synthesize the time-domain parameter and the logical file identifier into a lookup key;
Step D: search the record data area for the lookup key. If the lookup key is not found, or the time-domain parameter and logical file identifier indicated by the found lookup key are not both identical to those obtained in steps A and B, the logical file identifier generated in step B is valid and available. The combination of the time-domain parameter obtained in step A and the logical file identifier generated in step B is then used as the unique ID of the logical file, and the lookup key is used as the index key.
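Steps B to D of the first way, together with the retry rule for an occupied logical file identifier, can be sketched as the loop below. This is a minimal Python sketch under stated assumptions: the record data area is modelled as a `dict` that maps each lookup key to the (time-domain value, logical file ID) pair that owns it, and `next_logical_id` and `synthesize` are hypothetical callables that stand in for the configuration policy and the key synthesis. Following the text literally, a lookup key already held by a *different* pair is still accepted.

```python
from typing import Callable, Dict, Tuple

UniqueId = Tuple[int, int]  # (time-domain value, logical file ID)


def generate_index_key(
    time_value: int,
    next_logical_id: Callable[[], int],
    data_area: Dict[int, UniqueId],
    synthesize: Callable[[int, int], int],
) -> Tuple[int, UniqueId]:
    """Steps B to D of the first way; comments give the matching FIG. 10 steps."""
    logical_id = next_logical_id()                  # step B (1005)
    while True:
        key = synthesize(time_value, logical_id)    # step C (1006)
        found = data_area.get(key)                  # step D (1007): hash lookup
        if found is None or found != (time_value, logical_id):  # 1008
            # Key is free, or held by a different (time, ID) pair:
            # the logical file ID is valid (1010).
            return key, (time_value, logical_id)
        # Same time value and same logical ID already recorded: the ID is
        # occupied, so take the next ID and synthesize again (1009 -> 1006).
        logical_id = next_logical_id()
```

Recording the returned key in `data_area` is left to the caller, because the text performs that step as part of step S106.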
Second way: the index key is composed of the time-domain parameter, the logical file identifier and the segment file identifier. This mainly comprises the following steps:
Step A: obtain the time-domain parameter from the total duration elapsed from the predetermined time to the current system time;
Step B: generate the logical file identifier and the segment file identifier according to the preset configuration policy;
Step C: synthesize the time-domain parameter, the logical file identifier and the segment file identifier into a lookup key;
Step D: search the record data area for the lookup key. If the lookup key is not found, or if the time-domain parameter and logical file identifier indicated by the found lookup key are not both identical to those from steps A and B, and the time-domain parameter and segment file identifier indicated by the found lookup key are not both identical to those from steps A and B, the combination of logical file identifier and segment file identifier generated in step B is valid and available. The time-domain parameter obtained in step A and the logical file identifier and segment file identifier generated in step B are then synthesized into the unique ID of the segment file, and the lookup key is used as the index key.
In step A of both ways, the value of each bit field of the time domain is obtained from the total duration, as shown in FIG. 4. The time domain may include a year field, an hour field and a second field. For example, if the total duration is m years, n hours and k seconds, the year field records the value m, the hour field records the value n and the second field records the value k, where m, n and k are integers greater than or equal to 0. The year field may occupy 8 bits, the hour field 13 to 14 bits, and the second field 13 to 14 bits.
Alternatively, the total duration may be expressed as m years, n minutes and k seconds. In that case the hour field in FIG. 4 is replaced by a minute field, which records the value n of the total duration. The bits of the time domain are then mixed according to the configured policy to obtain a new time-domain parameter. Several bit-mixing policies are possible, for example the following two:

(1) Balancing on the seconds within one hour: the number of seconds, which changes relatively continuously, is aligned to the same order of magnitude as the target data volume expected in one hour and placed in the high-order bits, while the hour field and the fields above it are shifted to the low-order bits. FIG. 5 shows the distribution of the time-domain bit fields after mixing on the second field. Mixing in this way disperses the index keys generated for a large number of files within a short time.

(2) Balancing on the hours (or minutes) within one year: as above, the number of hours or minutes within one year, which changes relatively continuously, is aligned to the same order of magnitude as the target data volume, and the field above it is shifted to the low-order bits. FIG. 6 shows the distribution of the time-domain bit fields after mixing on the hour field. Mixing in this way disperses the index keys generated for a large number of files over a long period.

In the second way above, if the time-domain parameter and segment file identifier indicated by the found lookup key are identical to the time-domain parameter from step A and the segment file identifier from step B, the segment file identifier generated in step B must be modified to obtain a new segment file identifier. The new identifier is used as the segment file identifier of the distributed file, and the method returns to step C to regenerate the index key, until an unoccupied segment file identifier is found. In both the first and the second way, if the time-domain parameter and logical file identifier indicated by the found lookup key are identical to the time-domain parameter from step A and the logical file identifier from step B, the logical file identifier is already occupied for that time. The logical file identifier is then modified to generate a new one, and the method returns to step C to generate a new index key, until an unoccupied logical file identifier is found.
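The two bit-mixing policies can be sketched with plain shifts and masks. This is a hypothetical Python layout, not the exact bit order of FIG. 4 to FIG. 6, which cannot be recovered from the text: it uses 8/14/14-bit year, hour and second fields and does not model the "alignment to the same order of magnitude". It only illustrates the idea that the field which changes fastest for the expected workload is moved into the high-order bits, while the other fields are shifted low.

```python
YEAR_BITS, HOUR_BITS, SEC_BITS = 8, 14, 14  # text: year 8 bits, hour/second 13 to 14 bits


def _fit(value: int, bits: int) -> int:
    """Reject values that would overflow their bit field."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value


def plain_layout(year: int, hour: int, sec: int) -> int:
    # Unmixed layout, assumed order year | hour | second (year in the high bits).
    return ((_fit(year, YEAR_BITS) << (HOUR_BITS + SEC_BITS))
            | (_fit(hour, HOUR_BITS) << SEC_BITS)
            | _fit(sec, SEC_BITS))


def mix_seconds_high(year: int, hour: int, sec: int) -> int:
    # Policy (1): seconds-within-hour on top; year and hour fields shifted low.
    return ((_fit(sec, SEC_BITS) << (YEAR_BITS + HOUR_BITS))
            | (_fit(year, YEAR_BITS) << HOUR_BITS)
            | _fit(hour, HOUR_BITS))


def mix_hours_high(year: int, hour: int, sec: int) -> int:
    # Policy (2): hours-within-year on top; year field shifted to the lowest bits.
    return ((_fit(hour, HOUR_BITS) << (SEC_BITS + YEAR_BITS))
            | (_fit(sec, SEC_BITS) << YEAR_BITS)
            | _fit(year, YEAR_BITS))
```

Under `plain_layout`, two files written in consecutive seconds of the same hour differ only in the lowest bit. Under policy (1) they differ at bit 22, which spreads apart the keys of files written close together in time.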
In embodiments of the present invention, for ease of use, the logical file identifier and the segment file identifier may each be computed from a global incrementing counter. When the logical file identifier and the segment file identifier are generated in step B, the most recently generated value of the logical file identifier or segment file identifier is obtained and increased by a specified increment to give the logical file identifier and segment file identifier of the distributed file. When an identifier is modified as described above, it may likewise be increased by the specified increment (for example, 1) to obtain the new identifier.
In embodiments of the present invention, the numbers of bits occupied by the logical file identifier and the segment file identifier are determined by the sizes of the logical files and segment files. For example, for a small number of large files, the bit-field layout shown in FIG. 7 may be used, in which the segment file identifier occupies more bits. For a large number of small files, the layout shown in FIG. 8 may be used, in which the logical file identifier occupies more bits. When the proportion of large and small stored files cannot be determined, that is, when there are both large files and many small files, the bits may simply be split evenly, using the layout shown in FIG. 9. Selecting different bit-field layouts allows identifiers at the same point in time to be processed with different increments, so the method can be adapted to different scenarios.
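A minimal Python sketch of the global incrementing identifiers and the bit-split choice follows. The layout names and the concrete widths (28 ID bits split 10/18, 18/10 or 14/14) are hypothetical values chosen to illustrate FIG. 7 to FIG. 9. Wrapping a counter inside its bit width is also an assumption, since the text does not say what happens when an identifier reaches its maximum. Modifying an occupied identifier is the same operation as allocating the next one: the increment is simply added again.

```python
from typing import Dict, Tuple

ID_BITS = 28  # hypothetical: bits left for IDs next to a 36-bit time-domain value

# (logical-file-ID bits, segment-file-ID bits); concrete widths are illustrative only
LAYOUTS: Dict[str, Tuple[int, int]] = {
    "large_files": (10, 18),  # FIG. 7 idea: few large files, segment IDs get more bits
    "small_files": (18, 10),  # FIG. 8 idea: many small files, logical IDs get more bits
    "mixed": (14, 14),        # FIG. 9 idea: unknown mix, split evenly
}


class IdAllocator:
    """Global counter: each new ID is the last generated value plus an increment."""

    def __init__(self, bits: int, increment: int = 1, last: int = 0) -> None:
        self.bits = bits
        self.increment = increment
        self.last = last

    def allocate(self) -> int:
        # Wrap inside the configured bit width (an assumption, see above).
        self.last = (self.last + self.increment) % (1 << self.bits)
        return self.last


def make_allocators(layout: str) -> Tuple[IdAllocator, IdAllocator]:
    """Return (logical-ID allocator, segment-ID allocator) for a layout name."""
    logical_bits, segment_bits = LAYOUTS[layout]
    return IdAllocator(logical_bits), IdAllocator(segment_bits)
```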
In step C of the first and second ways, the lookup key may be synthesized by modulo folding. For example, in the first way, the time-domain parameter generated in step A and the logical file ID generated in step B (or the time-domain parameter and the file IDs) are concatenated into a 64-bit value, with any part shorter than 64 bits filled with zeros, to obtain a key. The lookup key is the value of this key modulo the application scale value, where the application scale value is the estimated total number of files that can be stored in the current system scenario.
Step S106: establish the association between the index key and the file.
For example, for distributed file storage, an empty record position may be applied for in the in-memory database, the correspondence between the logical file name and the index key obtained in step S104 is stored at that record position, and the index key is added to the index data area. The storage position referenced by the index key is thus the record of the distributed file's logical file, which completes the association between the index key and the logical file. After this association is established, the index key is used as the actual stored file name when the segment files are physically stored, which provides a mapping between logical file names and physical storage: the physically stored files can be queried by logical file name, and the logical file name can be queried from the name of a physically stored file, that is, from the index key.
The following example uses the generation of a distributed file's index key from its logical file identifier to explain how the index key of a distributed file is generated in an embodiment of the present invention.
As shown in FIG. 10, in an embodiment of the present invention the index key of a distributed file may be generated through the following steps:
Step 1001: upon receiving a write request for the distributed file, obtain the total number of seconds of the system time;
Step 1002: decide whether to generate the time-domain value with the policy balanced on the seconds within one hour or with the policy balanced on the hours within one year;
Step 1003: generate the time-domain value with the policy chosen in step 1002;
Step 1004: decide whether to use the large-file bit-field layout or the small-file bit-field layout, that is, the layout shown in FIG. 7 or the one shown in FIG. 8;
Step 1005: generate the logical file ID of the distributed file with the layout selected in step 1004;
Step 1006: synthesize a new lookup key from the time-domain value generated in step 1003 and the logical file ID generated in step 1005;
Step 1007: use the lookup key synthesized in step 1006 to perform a hash lookup in the record data area; if the lookup key is found, go to step 1008, otherwise go to step 1010;
Step 1008: determine whether the time-domain value generated in step 1003 and the logical file ID generated in step 1005 are the same as those indicated by the found entry; if so, go to step 1009, otherwise go to step 1010;
Step 1009: increase the current logical file ID by the increment to obtain a new logical file ID for the distributed file, and return to step 1006;
Step 1010: use the time-domain value generated in step 1003 and the logical file ID generated in step 1005 as the unique ID of the logical file, and use the current lookup key as the index key of the distributed file.
In an embodiment of the present invention, after the index keys of files have been created, the stored index keys may also be modified according to the application.
For example, with multiple field devices in an IPTV deployment, the data from the devices operating on site can be analyzed to obtain daily, hourly and monthly distribution statistics. Statistical analysis of these data shows that write times are unevenly distributed and that most write requests overlap in time, that is, many write requests arrive within the same second. With one million records, the peak number of concurrent write requests in one second may reach 400; with ten million records, it may exceed 2,000. If the proportion of write requests that are concurrent at the same time exceeds a preset ratio (for example, 80%), the choice of offsets for bit mixing becomes critical. Therefore, in embodiments of the present invention, the numbers of bits occupied by the logical file identifier and the segment file identifier of the distributed file may be modified or configured, and so may the order in which the logical file identifier (and segment file identifier) is mixed with the time-domain value; this effectively prevents data collisions. Similarly, if the statistics are strongly correlated with the time domain, the bit-mixing range and order within the time domain, that is, the time-domain bit-mixing policy, may be modified or configured, which likewise avoids data collisions and improves indexing efficiency. During such a modification, the index query function of the system is temporarily disabled and a new index must be rebuilt; once the new index has been created, the system can serve requests more efficiently.
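One way to measure the condition that triggers reconfiguration is sketched below in Python. The text does not define the ratio precisely. This sketch assumes it is the fraction of write requests that share their one-second slot with at least one other request, compared against the example threshold of 80%. Both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def concurrent_write_ratio(write_seconds: Iterable[int]) -> float:
    """Fraction of write requests that share their one-second slot with another request."""
    counts = Counter(write_seconds)  # second -> number of write requests in that second
    total = sum(counts.values())
    if total == 0:
        return 0.0
    concurrent = sum(c for c in counts.values() if c > 1)
    return concurrent / total


def needs_reconfiguration(write_seconds: Iterable[int], threshold: float = 0.8) -> bool:
    """True when the bit split or the mixing policy should be reconsidered."""
    return concurrent_write_ratio(write_seconds) > threshold
```

A detection step like this would run on the collected write statistics; when it returns `True`, the ID bit widths and/or the bit-mixing policy are reconfigured and the index is rebuilt.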
It should be noted that, although the embodiments of the present invention take distributed file storage as an example, the technical solutions are not limited to it and also apply to other stored files. For example, if a file has no separate logical file and segment files, the time-domain value and the file identifier of the file may be synthesized directly into the index key of the file; the implementation is similar to that for distributed files and is not repeated here.
FIG. 11 is a schematic structural diagram of a device for creating indexes for mass data records according to an embodiment of the present invention. As shown in FIG. 11, the device includes: an obtaining module 10, configured to obtain the current system time upon receiving a new write file request; a generation module 20, configured to generate an index key for the distributed file according to the current system time and the identifier of the file that the write file request message requests to write; and an establishing module 30, configured to establish the association between the index key and the file.
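The association built in step S106 by the establishing module 30 can be modelled with a toy in-memory store. In this Python sketch the class and method names are hypothetical, and appending to a list stands in for applying for an empty record position. Rendering the index key as a 16-digit hexadecimal string to form the stored segment file name is also an assumption. The sketch shows both directions of the mapping the text describes: from logical name to physical name, and from physical name back to logical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataStore:
    """Toy stand-in for the in-memory metadata database (all names hypothetical)."""
    records: List[dict] = field(default_factory=list)          # record positions
    index_area: Dict[int, int] = field(default_factory=dict)   # index key -> record position
    name_to_pos: Dict[str, int] = field(default_factory=dict)  # logical name -> record position

    def associate(self, logical_name: str, index_key: int) -> int:
        pos = len(self.records)  # "apply for" the next empty record position
        self.records.append({"logical_name": logical_name, "index_key": index_key})
        self.index_area[index_key] = pos  # add the index key to the index data area
        self.name_to_pos[logical_name] = pos
        return pos

    @staticmethod
    def stored_name(index_key: int) -> str:
        # The index key doubles as the physical file name (hex rendering is assumed).
        return f"{index_key:016x}"

    def physical_name_of(self, logical_name: str) -> str:
        record = self.records[self.name_to_pos[logical_name]]
        return self.stored_name(record["index_key"])

    def logical_name_of(self, physical_name: str) -> str:
        return self.records[self.index_area[int(physical_name, 16)]]["logical_name"]
```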
For distributed files, the generation module 20 may generate the index key from the first logical file identifier of the distributed file and the current system time. In this case, as shown in FIG. 12, the generation module 20 may include: an obtaining submodule 210, configured to obtain a first time-domain parameter from the total duration elapsed from a predetermined time to the current system time; a generating submodule 220, configured to generate the first logical file identifier according to a preset configuration policy; a synthesis submodule 230, configured to synthesize the first time-domain parameter and the first logical file identifier into a lookup key; and a lookup submodule 240, configured to search the record data area for the lookup key and, if the lookup key is not found or the second time-domain parameter and second logical file identifier indicated by the found lookup key are not both identical to the first time-domain parameter and the first logical file identifier, to use the lookup key as the index key. The obtaining submodule 210 obtains the value of each bit field of the time domain from the total duration. The time domain may include a year field, an hour field and a second field. For example, if the total duration is m years, n hours and k seconds, the year field records m, the hour field records n and the second field records k, where m, n and k are integers greater than or equal to 0. The year field may occupy 8 bits, the hour field 13 to 14 bits and the second field 13 to 14 bits. Alternatively, the total duration may be expressed as m years, n minutes and k seconds, in which case the hour field in FIG. 4 is replaced by a minute field that records the value n.
The obtaining submodule 210 may then mix the bits of the time domain according to a configuration policy to obtain a new time domain parameter. The configuration policy may be based on the number of seconds within a unit hour, or on the number of minutes or hours within a unit year. If the searching submodule 240 finds a value identical to the lookup key in the data area, and the time domain parameter obtained by the obtaining submodule 210 and the logical file identifier generated by the generating submodule 220 are both identical to the time domain parameter and logical file identifier indicated by the value found, the logical file identifier is already occupied at that time. The searching submodule 240 then triggers the generating submodule 220 to modify the logical file identifier and generate a new one; the new logical file identifier is input to the synthesizing submodule 230, which is triggered to synthesize a new lookup key.

In an embodiment of the present invention, the generating module 20 may also generate the index key from the first logical file identifier of the distributed file, the first segment file identifier and the current system time. In this case, the generating submodule 220 is further configured to generate the first segment file identifier according to the preset configuration policy; the synthesizing submodule 230 is configured to synthesize the first time domain parameter, the first logical file identifier and the first segment file identifier into the lookup key; and the searching submodule 240 is configured to search the recorded data area for the lookup key and to use the lookup key as the index key if the lookup key is not found, or if the first time domain parameter and the first logical file identifier are not both identical to the second time domain parameter and the second logical file identifier indicated by the lookup key that is found and the first time domain parameter and the first segment file identifier are not both identical to the second time domain parameter and the second segment file identifier. Likewise, if the searching submodule 240 finds a value identical to the lookup key in the data area, and the time domain parameter obtained by the obtaining submodule 210 and the segment file identifier generated by the generating submodule 220 are identical to the time domain parameter and segment file identifier indicated by the value found, the segment file identifier is already occupied at that time. The searching submodule 240 then triggers the generating submodule 220 to modify the segment file identifier and generate a new one, which is input to the synthesizing submodule 230 so that it synthesizes a new lookup key.

When the establishing module 30 establishes the association between the index key and the file, it may apply for an empty record position in the in-memory database to record the file, store the correspondence between the logical file name of the distributed file and the index key at that record position, and add the index key to the index data area; when the segment files of the distributed file are stored, the index key is used as the name under which each segment file is actually stored. In this way the index key is associated with the file. The apparatus provided in the embodiment of the present invention may further include a detecting module, configured to detect whether the number of concurrent write requests at the same time exceeds a predetermined ratio and, if so, to trigger an updating module. The updating module is configured to modify or configure the numbers of bits occupied by the logical file identifier and the segment file identifier of the distributed file, and/or to modify or configure the configuration policy by which the first time domain parameter is obtained through bit mixing, and then to trigger the generating module 20 to regenerate a new index key for the distributed file.
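The occupancy check and retry for segment file identifiers described above can be sketched as a loop. The sketch assumes, hypothetically, that the lookup key is a plain bit concatenation [time parameter | logical id | segment id], that the recorded data area is modeled as a Python set of issued keys, and that segment identifiers are 8 bits wide and wrap around; none of these details is fixed by the source, and `synthesize_key` and `allocate_segment_key` are illustrative names.

```python
LOGICAL_BITS, SEGMENT_BITS = 16, 8   # hypothetical field widths


def synthesize_key(time_param: int, logical_id: int, segment_id: int) -> int:
    """Concatenate the three components into one integer lookup key."""
    return (time_param << (LOGICAL_BITS + SEGMENT_BITS)) | (logical_id << SEGMENT_BITS) | segment_id


def allocate_segment_key(time_param: int, logical_id: int, last_segment_id: int,
                         data_area: set[int], increment: int = 1) -> tuple[int, int]:
    """Return (index key, segment id), bumping the segment id while its key is occupied."""
    space = 1 << SEGMENT_BITS
    segment_id = (last_segment_id + increment) % space
    for _ in range(space):
        key = synthesize_key(time_param, logical_id, segment_id)
        if key not in data_area:          # key not found: the identifier is free
            data_area.add(key)
            return key, segment_id
        # Identifier already occupied at this time: modify it and synthesize again.
        segment_id = (segment_id + increment) % space
    raise RuntimeError("no free segment identifier for this time parameter and logical id")
```

Because the concatenation here is lossless, a key that is found always carries identical components, so in this sketch only the "occupied" branch can trigger a retry.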
As can be seen from the above description, in the embodiments of the present invention the bit positions of the index key are chosen by bit mixing, and the index parameters are adapted to various application models, so as to optimize index efficiency and ensure fast processing under high concurrency. Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that given here. Alternatively, they may each be made into an individual integrated circuit module, or multiple modules or steps among them may be made into a single integrated circuit module. Thus, the present invention is not limited to any particular combination of hardware and software. The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement, etc. made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for indexing mass data records, comprising:
upon receiving a new write file request message, obtaining the current system time;
generating an index key for a file according to the current system time and the file identifier of the file that the write file request message requests to write; and
establishing an association between the index key and the file.
2. The method according to claim 1, wherein the file is a distributed file.
3. The method according to claim 2, wherein the file identifier comprises a first logical file identifier of the logical file of the distributed file, and generating the index key according to the current system time and the file identifier comprises:
Step A: obtaining a first time domain parameter according to the total duration elapsed from a predetermined time to the current system time;
Step B: generating the first logical file identifier according to a preset configuration policy;
Step C: synthesizing the first time domain parameter and the first logical file identifier into a lookup key;
Step D: searching the recorded data area for the lookup key; if the lookup key is not found, or if the first time domain parameter and the first logical file identifier are not both identical to the second time domain parameter and the second logical file identifier indicated by the lookup key that is found, the first logical file identifier is valid, the lookup key synthesized from the first time domain parameter and the first logical file identifier serves as the unique ID of the logical file, and the lookup key is used as the index key.
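Steps A to D of claim 3, together with the retry of claims 7 and 8, can be sketched as below. To make the second branch of the Step D test reachable, the sketch assumes (hypothetically) that the synthesized key is lossy, here a 24-bit truncation of [time parameter | logical id], and that the recorded data area keeps a list of (time parameter, logical id) entries per key; `synthesize`, `allocate_logical_key`, `KEY_BITS` and `LOGICAL_BITS` are illustrative names, not taken from the source.

```python
LOGICAL_BITS = 16   # hypothetical width of the logical file identifier
KEY_BITS = 24       # hypothetical width of the stored lookup key


def synthesize(time_param: int, logical_id: int) -> int:
    """Step C (sketch): concatenate, then keep only KEY_BITS bits, so keys can collide."""
    return ((time_param << LOGICAL_BITS) | logical_id) % (1 << KEY_BITS)


def allocate_logical_key(time_param: int, last_logical_id: int,
                         data_area: dict[int, list[tuple[int, int]]],
                         increment: int = 1) -> tuple[int, int]:
    """Return (index key, logical id) for time_param, following Steps B to D."""
    space = 1 << LOGICAL_BITS
    logical_id = (last_logical_id + increment) % space            # Step B
    for _ in range(space):
        key = synthesize(time_param, logical_id)                   # Step C
        entries = data_area.setdefault(key, [])                    # Step D: look up the key
        # Valid if the key was absent, or if no entry under it has both the same
        # time parameter and the same logical id.
        if (time_param, logical_id) not in entries:
            entries.append((time_param, logical_id))
            return key, logical_id
        logical_id = (logical_id + increment) % space              # occupied: modify, back to Step C
    raise RuntimeError("no free logical identifier for this time parameter")
```

In this sketch two different files can end up under one key (time parameters 3 and 259 with logical id 1 both truncate to the same 24-bit value); they remain distinguishable only through the stored components.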
4. The method according to claim 2, wherein the file identifier comprises a first logical file identifier of the logical file of the distributed file and a first segment file identifier of a segment file of the distributed file, and generating the index key according to the current system time and the file identifier comprises:
Step A: obtaining a first time domain parameter according to the total duration elapsed from a predetermined time to the current system time;
Step B: generating the first logical file identifier and the first segment file identifier according to a preset configuration policy;
Step C: synthesizing the first time domain parameter, the first logical file identifier and the first segment file identifier into a lookup key;
Step D: searching the recorded data area for the lookup key; if the lookup key is not found, or if the first time domain parameter and the first logical file identifier are not both identical to the second time domain parameter and the second logical file identifier indicated by the lookup key that is found and the first time domain parameter and the first segment file identifier are not both identical to the second time domain parameter and the second segment file identifier, the combination of the first logical file identifier and the first segment file identifier is valid, the lookup key synthesized from the first time domain parameter, the first logical file identifier and the first segment file identifier serves as the unique ID of the segment file, and the lookup key is used as the index key.
5. The method according to claim 4, wherein, if the first time domain parameter is identical to the second time domain parameter and the first segment file identifier is identical to the second segment file identifier, the method further comprises: modifying the value of the first segment file identifier to generate a new segment file identifier, taking the new segment file identifier as the first segment file identifier, and returning to Step C.
6. The method according to claim 5, wherein generating the first segment file identifier comprises: obtaining the value of the segment file identifier generated last time and increasing it by a specified increment to obtain the first segment file identifier, wherein the number of bits occupied by the segment file identifier is determined by the configuration policy;
and modifying the value of the first segment file identifier to generate a new segment file identifier comprises: increasing the value of the first segment file identifier by the specified increment to obtain the new segment file identifier.
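The identifier generation in claims 6 and 8 (last value plus a specified increment, with a bit width set by the configuration policy) can be sketched with a small counter class. `IdentifierGenerator` is a hypothetical name, and wrapping around modulo 2**bits is an assumption; the claims do not say what happens when the field is exhausted.

```python
from dataclasses import dataclass


@dataclass
class IdentifierGenerator:
    """Counter for segment (or logical) file identifiers; bits comes from the configuration policy."""
    bits: int
    increment: int = 1
    last: int = 0          # value of the identifier generated last time

    def next_id(self) -> int:
        """Generate a new identifier: the last value plus the specified increment."""
        self.last = (self.last + self.increment) % (1 << self.bits)
        return self.last

    def modify(self, occupied: int) -> int:
        """Replace an occupied identifier by increasing it by the same increment."""
        self.last = (occupied + self.increment) % (1 << self.bits)
        return self.last
```

The same class serves for logical file identifiers; only `bits` and `increment` would differ according to the configuration.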
7. The method according to claim 3 or 4, wherein, if the first time domain parameter is identical to the second time domain parameter and the first logical file identifier is identical to the second logical file identifier, the method further comprises: modifying the value of the first logical file identifier to generate a new logical file identifier, taking the new logical file identifier as the first logical file identifier, and returning to Step C.
8. The method according to claim 7, wherein generating the first logical file identifier comprises: obtaining the value of the logical file identifier generated last time and increasing it by a specified increment to obtain the first logical file identifier, wherein the number of bits occupied by the logical file identifier is determined by the configuration policy;
and modifying the value of the first logical file identifier to generate a new logical file identifier comprises: increasing the value of the first logical file identifier by the specified increment to obtain the new logical file identifier.
9. The method according to any one of claims 3 to 6, wherein Step A comprises: obtaining the value of each bit field of the time domain according to the total duration, wherein the bit fields of the time domain comprise a year field, an hour field or a minute field, and a second field; when the total duration is m years, n hours (or n minutes) and k seconds, the year field records the value m, the hour field or minute field records the value n, and the second field records the value k, where m, n and k are integers greater than or equal to 0; and mixing the bits of the time domain according to a configuration policy to obtain the first time domain parameter, wherein the configuration policy comprises: aligning to the same order of magnitude on the basis of the number of seconds in a unit hour of the time domain, and shifting the year field and the hour field or minute field below the second field to obtain the first time domain parameter; or aligning to the same order of magnitude on the basis of the number of hours or minutes in a unit year of the time domain, and shifting the year field to the low-order bits of the time domain to obtain the first time domain parameter.
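The two bit-mixing policies of claim 9 can be sketched as a reordering of the fields before packing. The encoding below is our reading of the claim, not the patent's definition: each policy is represented as a field order (most significant first), and the widths reuse the 8/14/14 split given in the description. Moving the slow-changing year field downward presumably spreads successive keys over the key space; the source states only that mixed bit positions support fast processing under high concurrency.

```python
# Field widths from the description: year 8 bits, hour 14 bits, second 14 bits.
WIDTHS = {"year": 8, "hour": 14, "second": 14}

# Hypothetical encodings of the two policies (most significant field first).
UNMIXED = ("year", "hour", "second")
SECOND_ALIGNED = ("second", "year", "hour")  # year and hour shifted below the second field
HOUR_ALIGNED = ("hour", "second", "year")    # year shifted to the lowest bits


def mix(fields: dict[str, int], order: tuple[str, ...]) -> int:
    """Pack the named fields in the given order to form the time domain parameter."""
    value = 0
    for name in order:
        width = WIDTHS[name]
        if not 0 <= fields[name] < 1 << width:
            raise ValueError(f"{name} field out of range")
        value = (value << width) | fields[name]
    return value
```

For example, with 1 year, 2 hours and 5 seconds, `mix(..., SECOND_ALIGNED)` places the seconds value in the top 14 bits, so two writes a second apart differ in high-order bits.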
10. The method according to claim 9, wherein, after establishing the association between the index key and the file, the method further comprises:
detecting that the number of concurrent write requests at the same time exceeds a preset ratio;
modifying or configuring the numbers of bits occupied by the logical file identifier and the segment file identifier of the distributed file, and/or modifying or configuring the configuration policy by which the first time domain parameter is obtained through bit mixing, returning to Step B, and regenerating a new index key for the distributed file.
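Claim 10 does not define the ratio or how the bit allocation changes. The sketch below assumes, purely for illustration, that the ratio is the number of same-moment writes divided by the capacity of the logical file identifier field, and that the update moves one bit from the segment field to the logical field so the total key width stays constant; `BitBudget`, `rebalance` and `threshold` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BitBudget:
    logical_bits: int   # bits occupied by the logical file identifier
    segment_bits: int   # bits occupied by the segment file identifier


def rebalance(budget: BitBudget, concurrent_writes: int, threshold: float = 0.8) -> bool:
    """If same-moment writes exceed threshold of the logical-id capacity, widen that field.

    Returns True when the allocation changed, i.e. when new index keys must be regenerated.
    """
    capacity = 1 << budget.logical_bits
    if concurrent_writes / capacity <= threshold:
        return False
    if budget.segment_bits == 0:
        raise RuntimeError("no spare bits left to reallocate")
    budget.logical_bits += 1
    budget.segment_bits -= 1
    return True
```

A `True` result corresponds to "returning to Step B": the caller would regenerate identifiers and keys under the new widths.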
11. The method according to any one of claims 3 to 6, wherein the lookup key is synthesized by modulo folding.
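The claim does not define "modulo folding". One common reading is fold-shift hashing: cut the wide composite value into fixed-width chunks, add the chunks, and reduce the sum modulo the size of the key space. The sketch below implements that textbook technique as an assumption; `fold_modulo`, `chunk_bits` and `modulus` are hypothetical names.

```python
def fold_modulo(value: int, chunk_bits: int, modulus: int) -> int:
    """Fold-shift hash: sum the chunk_bits-wide pieces of value, then reduce mod modulus."""
    if value < 0 or chunk_bits <= 0 or modulus <= 0:
        raise ValueError("value must be non-negative; chunk_bits and modulus positive")
    mask = (1 << chunk_bits) - 1
    total = 0
    while value:
        total += value & mask   # add the lowest chunk
        value >>= chunk_bits    # move to the next chunk
    return total % modulus
```

Folding is lossy, so two different combinations of time domain parameter and identifiers can map to the same lookup key, which would make the "found but not identical" branch of Step D reachable.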
12. The method according to any one of claims 2 to 6, wherein establishing the association between the index key and the file comprises:
applying for an empty record position in the in-memory database to record the file, storing the correspondence between the logical file name of the distributed file and the index key at that record position, and adding the index key to the index data area;
when storing a segment file of the distributed file, using the index key as the name under which the segment file is actually stored.
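The association steps of claim 12 can be sketched with an in-memory stand-in for the metadata database. `MetadataStore`, its fields, and the choice of a hexadecimal string as the physical segment file name are hypothetical; the claim says only that the index key is used as the stored file name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataStore:
    """Hypothetical stand-in for the in-memory metadata database."""
    records: list[Optional[tuple[str, int]]] = field(default_factory=list)  # None = empty slot
    index_area: set[int] = field(default_factory=set)                       # issued index keys
    segments: dict[str, bytes] = field(default_factory=dict)                # physical name -> data

    def associate(self, logical_name: str, index_key: int) -> int:
        """Record logical_name <-> index_key in an empty slot and add the key to the index area."""
        try:
            slot = self.records.index(None)     # reuse the first empty record position
        except ValueError:
            self.records.append(None)           # none free: allocate a new one
            slot = len(self.records) - 1
        self.records[slot] = (logical_name, index_key)
        self.index_area.add(index_key)
        return slot

    def store_segment(self, index_key: int, data: bytes) -> str:
        """Store a segment file under a name derived from its index key."""
        physical_name = format(index_key, "x")
        self.segments[physical_name] = data
        return physical_name
```

Naming the physical segment file after its index key means a lookup by key leads directly to the stored file without a second name mapping.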
13. An apparatus for indexing mass data records, comprising:
an obtaining module, configured to obtain the current system time upon receiving a new write file request message; a generating module, configured to generate an index key for the file according to the current system time and the file identifier of the file that the write file request message requests to write; and an establishing module, configured to establish an association between the index key and the file.
14. The apparatus according to claim 13, wherein the file is a distributed file, the file identifier comprises a first logical file identifier of the logical file of the distributed file, and the generating module comprises: an obtaining submodule, configured to obtain a first time domain parameter according to the total duration elapsed from a predetermined time to the current system time;
a generating submodule, configured to generate the first logical file identifier according to a preset configuration policy; a synthesizing submodule, configured to synthesize the first time domain parameter and the first logical file identifier into a lookup key;
a searching submodule, configured to search the recorded data area for the lookup key and to use the lookup key as the index key if the lookup key is not found, or if the first time domain parameter and the first logical file identifier are not both identical to the second time domain parameter and the second logical file identifier indicated by the lookup key that is found.
15. The apparatus according to claim 14, wherein the file identifier further comprises a first segment file identifier of a segment file of the distributed file;
the generating submodule is further configured to generate the first segment file identifier according to the preset configuration policy;
the synthesizing submodule is configured to synthesize the first time domain parameter, the first logical file identifier and the first segment file identifier into the lookup key;
the searching submodule is configured to search the recorded data area for the lookup key and to use the lookup key as the index key if the lookup key is not found, or if the first time domain parameter and the first logical file identifier are not both identical to the second time domain parameter and the second logical file identifier indicated by the lookup key that is found and the first time domain parameter and the first segment file identifier are not both identical to the second time domain parameter and the second segment file identifier.
PCT/CN2011/084518 2010-12-24 2011-12-23 Method and device for creating indexes for mass data records WO2012083877A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN 201010606358 CN102024057B (en) 2010-12-24 2010-12-24 Method and device for building index of mass data record
CN201010606358.0 2010-12-24

Publications (1)

Publication Number Publication Date
WO2012083877A1 2012-06-28

Family

ID=43865354

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/084518 WO2012083877A1 (en) 2010-12-24 2011-12-23 Method and device for creating indexes for mass data records

Country Status (2)

Country Link
CN (1) CN102024057B (en)
WO (1) WO2012083877A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102024057B (en) * 2010-12-24 2015-07-01 中兴通讯股份有限公司 Method and device for building index of mass data record
WO2015051499A1 (en) * 2013-10-08 2015-04-16 华为技术有限公司 Method and system for processing content information
CN105005624B (en) * 2015-07-31 2018-05-08 天脉聚源(北京)传媒科技有限公司 A method and apparatus for generating an index of the document id

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101167047A (en) * 2005-04-22 2008-04-23 微软公司 Local thumbnail cache
US20100146004A1 (en) * 2005-07-20 2010-06-10 Siew Yong Sim-Tang Method Of Creating Hierarchical Indices For A Distributed Object System
US7783615B1 (en) * 2005-09-30 2010-08-24 Emc Corporation Apparatus and method for building a file system index
CN102024057A (en) * 2010-12-24 2011-04-20 中兴通讯股份有限公司 Method and device for building index of mass data record

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN101398869B (en) * 2008-10-07 2010-04-14 深圳市蓝韵实业有限公司 Mass data storage means

Also Published As

Publication number Publication date Type
CN102024057B (en) 2015-07-01 grant
CN102024057A (en) 2011-04-20 application

Similar Documents

Publication Publication Date Title
US20110225165A1 (en) Method and system for partitioning search indexes
US7739288B2 (en) Systems and methods of directory entry encodings
Harris et al. 4store: The design and implementation of a clustered RDF store
US20070112795A1 (en) Scalable retrieval of data entries using an array index or a secondary key
CN101866358A (en) Multidimensional interval querying method and system thereof
KR101245994B1 (en) Parallel distributed processing system and method
CN102467570A (en) Connection query system and method for distributed data warehouse
US20110265177A1 (en) Search result presentation
CN1635494A (en) Method for implementing class memory database access and retrieval
CN101976322A (en) Safety metadata management method based on integrality checking
US8959110B2 (en) Dynamic query for external data connections
CN103150394A (en) Distributed file system metadata management method facing to high-performance calculation
US8472289B2 (en) Static TOC indexing system and method
US9519664B1 (en) Index structure navigation using page versions for read-only nodes
US20140280024A1 (en) Joining large database tables
CN102694860A (en) Method, equipment and system for data processing of cloud storage
US7949630B1 (en) Storage of data addresses with hashes in backup systems
CN102739622A (en) Expandable data storage system
US20120215980A1 (en) Restoring data backed up in a content addressed storage (cas) system
CN103973810A (en) Data processing method and device based on IP disk
CN101344893A (en) History data access method and apparatus
Apaydin et al. Approximate encoding for direct access and query processing over compressed bitmaps
Chung et al. An indexing method for wireless broadcast XML data
CN103914544A (en) Method for quickly matching Chinese addresses in multi-level manner on basis of address feature words
US7904488B2 (en) Time stamp methods for unified plant model

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 11851011

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 11851011

Country of ref document: EP

Kind code of ref document: A1