CN102693315A - Method and device for removing URL (uniform resource locator) duplicate on basis of shared memory mapping - Google Patents


Info

Publication number
CN102693315A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012101713168A
Other languages
Chinese (zh)
Inventor
司贺华
孔凡兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI JP ELECTRONIC BUSINESS CO Ltd
Original Assignee
SHANGHAI JP ELECTRONIC BUSINESS CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI JP ELECTRONIC BUSINESS CO Ltd
Priority to CN2012101713168A
Publication of CN102693315A

Abstract

The invention provides a method and a device for URL (uniform resource locator) deduplication based on shared memory mapping. The method comprises the steps of initialization, URL signing, signature indexing, shared memory mapping and bitwise duplicate judging. Through URL signing, each URL is ultimately stored as one bit of a signature file, which greatly compresses the storage space the URL occupies. Through shared memory mapping, operations on internal memory are synchronized to the signature file held in external storage, which speeds up URL deduplication. Because the bit corresponding to each URL is stored in the signature file, the deduplication result is preserved over the long term. The invention further provides a device for URL deduplication based on shared memory mapping. The device comprises a central processing unit, an internal memory, an external memory, a network card and an input-output device, the latter four all being connected to the central processing unit. The central processing unit further comprises a URL capturing module, a URL signing module, a signature indexing module, a shared memory mapping module and a bitwise duplicate judging module, which are connected together.

Description

A URL deduplication method and device based on shared memory mapping
Technical field
The invention belongs to the field of computer network information processing and relates to deduplication methods and products for uniform resource locators (URLs); specifically, it relates to a URL deduplication method and device based on shared memory mapping.
Background art
With the development of Internet technology in recent years, capturing, filtering and deduplicating repeated uniform resource locators (URLs) has become increasingly important. URL deduplication is widely used in network audit systems and search engine systems. If repeated URLs can be identified quickly, the subsequent processing steps for them can be skipped, which greatly improves performance.
URL deduplication has to be considered from two aspects: URL storage space and URL matching speed. URL storage space refers to the maximum number of non-repeated URLs that can be handled and the memory occupied by each URL. URL matching speed is measured by the time needed to judge whether a URL record is a repeated URL.
Many products abroad use URL deduplication, and every crawler system needs this function. The open-source search engine Larbin implements URL deduplication filtering with the Bloom Filter algorithm, which occupies little storage per URL and filters URLs relatively quickly. The Bloom Filter algorithm was proposed by Burton Bloom in 1970 (Burton Bloom. Bloom filter [EB/OL]. http://en.wikipedia.org/wiki/Bloom_filter, 1970). However, the Bloom Filter algorithm has a defect: it misjudges to a certain degree, that is, it may judge a non-repeated URL to be a repeated one.
In research on improving URL matching and deduplication speed, the hash table is an important direction. Because lookups in a hash table are very fast, URLs held in memory are generally stored in a hash-table-based structure, so research on hash tables and hash functions is particularly valuable for URL deduplication. Zheng Weibin et al. of Xi'an Jiaotong University implemented a hash-table-based URL deduplicator and used the bitmap method to improve URL filtering performance (Zheng Weibin, Zhang Deyun, Ding Huining, Li Jihua, Gao Lei. Research on a high-performance URL filter based on hash tables. Mini-Micro Systems, 2005, 26(2): pp. 17-180). Zheng Weibin's URL deduplicator selects a bit hash function suited to hashing character strings, implements simple storage and cache optimizations, and demonstrates experimentally the efficiency of this deduplicator based on the bitmap-method hash table. With hash-table-based URL matching and deduplication, the number of URLs that can be kept depends on the size of the hash table's buckets and is ultimately limited by the size of memory. Moreover, the hash table scheme resides entirely in memory and does not save the final result, so the deduplication records are lost. A hash-table-based URL deduplication device is therefore not well suited to deduplicating massive numbers of URLs.
Besides storing massive numbers of URLs on the hard disk, optimizing the storage of the URLs themselves is also a research topic. Kasom Koht-arsa et al. proposed a method based on delta encoding that uses an AVL tree to improve lookup efficiency and achieves a compression ratio of about 50%, a very noticeable effect (Koht-arsa K, Sanguanpong S. In-memory URL Compression. National Computer Science and Engineering Conference, Chiang Mai, Thailand, November 7-9, 2001: 425-428). Genova et al. proposed using signatures to improve URL lookup speed and reduce storage space (Genova Z, Christensen K. Efficient Summarization of URLs using CRC32 for Implementing URL Switching. Conference on Local Computer Networks, 2002: 102-105). The so-called signature simply uses CRC32 (Cyclic Redundancy Check 32) to encode a URL into a fixed length; the CRC32 code replaces the URL string as the hash table key, which both reduces the storage space of the URLs and speeds up lookup and deduplication. However, storing the CRC32 codes of massive numbers of URLs entirely in memory is still limited by hardware, and matching speed is also a problem.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the present invention is to provide an efficient URL deduplication method based on shared memory mapping (SMM, Shared Memory-Mapped) that overcomes the shortcomings of existing URL deduplication methods. A further object of the invention is to provide a URL deduplication device that implements the method. The device performs URL deduplication, increases deduplication speed, reduces storage space (each URL occupies 1 bit) and preserves the deduplication result, and it can be applied to search engines, network audit systems and the like.
The URL deduplication method and device based on shared memory mapping of the present invention are characterized as follows:
The deduplication method comprises initialization, URL signing, signature indexing, shared memory mapping and bitwise duplicate judging. The device comprises a central processing unit, an internal memory, an external memory, a network card and an input-output device, all of which are connected to the central processing unit; the central processing unit in turn comprises a URL capturing module, a URL signing module, a signature indexing module, a shared memory mapping module and a bitwise duplicate judging module, which are connected together.
Initialization: loading the data required by the device and initializing the shared memory mapping;
The object of the invention is achieved through the following technical solution:
The central processing unit of the device connects and controls the modules of the device and carries out the device's functions;
The network card of the device captures the data packets that carry URL information;
The internal memory of the device is the storage medium for the device's runtime data;
The external memory of the device is the storage medium in which the device saves the deduplication result files;
The URL capturing module of the device parses the network packets obtained through the network card, extracts the corresponding URLs and supplies them to the other modules of the device;
URL signing module: converts a URL of arbitrary length into a fixed-length signature that corresponds uniquely to the URL within a certain range; signing methods include, but are not limited to, CRC32 (Cyclic Redundancy Check 32, 32 bits), SHA1 (Secure Hash Algorithm 1, 160 bits) and MD5 (Message Digest Algorithm 5, 128 bits);
Signature indexing module: given a signature of x bytes (x = x11 + x12), the first x11 bytes are taken as the file name and the remaining x12 bytes as the file content field; if the signature byte count x is large, the signature is split into 3 levels or more (with more than 3 levels, a multi-level directory is used); taking 3 levels as an example, x = x21 + x22 + x23, where the first x21 bytes of the signature serve as the directory name, the middle x22 bytes as the file name and the remaining x23 bytes as the file content field; indexing means finding the corresponding directory and file from the designated bytes of the signature;
Shared memory mapping module: the signatures are stored as files, but reading and writing files through the file system is very expensive, so shared memory mapping is used to put memory and the above signature files in one-to-one correspondence; the device operates directly on memory, and the shared memory mapping module synchronizes memory with the files;
Bitwise duplicate judging module: according to the signature index, finds the position in shared memory that corresponds to the signature's file content field, reads the value in memory and judges whether the URL is a repeat.
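The signing and indexing modules described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `sign_url` and `split_signature`, the UTF-8 encoding of the URL, the big-endian byte order and the upper-case hexadecimal naming of path components are all assumptions made for the example.

```python
import hashlib
import zlib
from typing import List, Sequence, Tuple


def sign_url(url: str, algorithm: str = "crc32") -> bytes:
    """Return a fixed-length signature for a URL of arbitrary length."""
    data = url.encode("utf-8")  # assumption: URLs are signed as UTF-8 bytes
    if algorithm == "crc32":
        return zlib.crc32(data).to_bytes(4, "big")  # 32 bits
    if algorithm == "md5":
        return hashlib.md5(data).digest()           # 128 bits
    if algorithm == "sha1":
        return hashlib.sha1(data).digest()          # 160 bits
    raise ValueError(f"unsupported algorithm: {algorithm}")


def split_signature(signature: bytes,
                    name_lengths: Sequence[int]) -> Tuple[List[str], bytes]:
    """Split a signature into path components and a file content field.

    name_lengths holds the byte count of each leading component: one entry
    for the two-level scheme (file name only), two entries for the
    three-level scheme (directory name, file name), and so on.
    """
    if sum(name_lengths) >= len(signature):
        raise ValueError("no bytes left for the file content field")
    parts, pos = [], 0
    for n in name_lengths:
        parts.append(signature[pos:pos + n].hex().upper())
        pos += n
    return parts, signature[pos:]
```

For a CRC32 signature, `split_signature(sig, [1])` corresponds to x11 = 1 and x12 = 3; for a longer signature such as SHA1, `split_signature(sig, [1, 1])` gives one directory level and one file name, with the remaining bytes addressing the position inside the file.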
The URL deduplication method based on shared memory mapping of the present invention comprises the following steps:
1) Initialization step: according to the predetermined signature index scheme, initialize the files and complete the initialization of the shared memory mapping from files to memory;
2) URL capturing step: identify the URLs in the information obtained from the network card and supply them to the device for deduplication;
3) URL signing step: starting from an obtained URL, convert the URL into fixed-length bytes with the signature algorithm and pass the signature bytes to the signature indexing step; the detailed steps differ according to the signature algorithm;
4) Signature indexing step: select the index scheme according to the chosen number of directory levels; the way it is implemented directly affects how device initialization is implemented;
5) Shared memory mapping step: processes share memory by mapping the same signature file; once the signature file is mapped into a process's address space, the process can access the file as it would ordinary memory, without calling operations such as read() and write(); in other words, the content of the file is mirrored in memory, and memory operations are faster than disk operations;
6) Bitwise duplicate judging step: according to the value of the signature's file content field, locate the corresponding bit and check whether it is 1; if it is 1, the URL is a repeated URL; if it is 0, set the bit to 1 and return the result that the URL is not a repeat.
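Step 6 amounts to a test-and-set on a single bit. The sketch below shows that operation on any byte-addressable buffer; a `bytearray` stands in for the mapped region here, and a Python `mmap` object supports the same integer indexing. The name `check_and_mark` and the choice of least-significant-bit-first ordering inside each byte are assumptions, since the source does not specify a bit order.

```python
def check_and_mark(bitmap, bit_index: int) -> bool:
    """Return True if the bit is already 1 (repeated URL).

    Otherwise set the bit to 1 and return False (first occurrence).
    """
    byte_offset, bit_in_byte = divmod(bit_index, 8)
    mask = 1 << bit_in_byte  # assumption: bit 0 is the least significant bit
    if bitmap[byte_offset] & mask:
        return True
    bitmap[byte_offset] |= mask
    return False
```

Because the check and the update touch one byte only, the cost per URL does not depend on how many URLs have already been recorded.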
The beneficial effects of the invention are as follows: through signing, each URL is ultimately stored as 1 bit in a signature file, which greatly compresses the storage space occupied by URLs; through shared memory mapping, operations on memory are synchronized to the signature file, which increases URL deduplication speed; and because the bit corresponding to each URL is stored in the signature file, the deduplication result is preserved over the long term.
Description of drawings
Fig. 1 is a structural schematic of the present invention;
Fig. 2 is a flow chart of the URL deduplication method based on shared memory mapping;
Fig. 3 is the processing flow of shared memory mapping;
Fig. 4 is the URL deduplication flow based on the bitmap-method hash table;
Detailed description of the embodiments
The present invention is described in further detail below by way of example with reference to the accompanying drawings:
Fig. 1 shows the structure of the URL deduplication device based on shared memory mapping, which comprises:
Device initialization module: loads the data required by the device and initializes the shared memory mapping;
Central processing unit: connects and controls the modules of the device and carries out the device's functions;
Network card: captures the data packets that carry URL information;
Internal memory: the storage medium for the device's runtime data;
External memory: the storage medium in which the device saves the deduplication result files;
URL capturing module: obtains the URLs to be deduplicated;
URL signing module: converts a URL of arbitrary length into a fixed-length signature;
Signature indexing module: indexes hierarchically, by the bytes of the signature, to the corresponding signature file;
Memory mapping module: maps the signature files and memory as shared memory, so that the device's operations on memory are synchronized to the signature files;
Bitwise duplicate judging module: judges whether the URL is a repeat from the value of the corresponding bit in the signature file content.
Embodiment
Fig. 2 shows the flow of a URL deduplication method based on shared memory mapping, taking CRC32 as the example signature algorithm.
1. Overall design
The overall design of the embodiment of the URL deduplication device based on shared memory mapping is as follows. When the device first starts, all URL signature files kept in external storage are opened into memory; any file that does not exist is created with all bits set to 0. The shared memory mapping module performs the shared memory mapping and obtains an SMM pointer, after which the file can be closed. From then on, reads and writes through the SMM pointer are mapped to reads and writes of the URL storage file. Each URL captured by the network card is signed with CRC32 to obtain its CRC32 code. The SMM pointers are all kept in an array, and the first 8 bits of the CRC32 code select the corresponding SMM pointer from this array. The value of the corresponding bit in the memory region behind that SMM pointer is then checked: if it is 1, the URL is a repeat; if it is 0, the URL is not a repeat, and the bit is set to 1 at the same time. Repeating this flow constitutes the implementation flow of the URL deduplication device based on shared memory mapping.
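The overall flow can be sketched end to end with Python's standard `mmap` module. The class name `SmmDeduplicator`, its method names, the UTF-8 encoding of URLs and the bit order inside a byte are hypothetical choices for illustration. The sketch also departs from the embodiment in one respect, labeled in the code: it maps each signature file lazily on first use to keep the demonstration light, whereas the embodiment opens and maps all files at start-up.

```python
import mmap
import os
import zlib

FILE_BITS = 24                      # low 24 bits of the CRC32 select a bit
FILE_BYTES = (1 << FILE_BITS) // 8  # size of one signature file in bytes


class SmmDeduplicator:
    """Illustrative sketch: one mapped bitmap file per leading CRC32 byte."""

    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        self.maps = [None] * 256    # plays the role of the SMM pointer array

    def _map_for(self, prefix: int) -> mmap.mmap:
        # Assumption: lazy mapping; the embodiment maps every file at start-up.
        if self.maps[prefix] is None:
            path = os.path.join(self.directory, f"{prefix:02X}")
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
            try:
                # Extending a new file; on POSIX the new range reads as zeros.
                os.ftruncate(fd, FILE_BYTES)
                self.maps[prefix] = mmap.mmap(fd, FILE_BYTES)
            finally:
                os.close(fd)        # the mapping stays valid after closing
        return self.maps[prefix]

    def is_duplicate(self, url: str) -> bool:
        crc = zlib.crc32(url.encode("utf-8"))
        prefix = crc >> FILE_BITS                   # first 8 bits: which file
        bit_index = crc & ((1 << FILE_BITS) - 1)    # remaining 24 bits
        region = self._map_for(prefix)
        byte_offset, mask = bit_index >> 3, 1 << (bit_index & 7)
        if region[byte_offset] & mask:
            return True
        region[byte_offset] |= mask
        return False

    def close(self) -> None:
        for region in self.maps:
            if region is not None:
                region.flush()      # push pending changes to the file
                region.close()
```

Two URLs whose CRC32 codes collide are indistinguishable in this scheme, which is inherent to storing one bit per signature value rather than a property of the sketch.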
2. Signature file index
This embodiment signs with the CRC32 algorithm. A CRC32 value is 32 bits long, so storing this information in a file requires 2^32 bits in total, that is, 2^32 / 8 bytes = 512 MB, and all of it could be placed in a single 512 MB file. This raises a problem, however: although only one file is ever read and written, addressing within a file of 2^32 bits is bound to be slow. The scheme of this embodiment is therefore to take the first 8 bits of the CRC32 value as the file name, which yields 256 files named 00 to FF, and to store the remaining 24 bits in a file of 2 MB, so that the addressing space of a single file is 2^24.
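The size arithmetic above can be checked with a few lines of Python. The constant names and the helper `locate` are illustrative, and the bit order inside a byte is again an assumption.

```python
SIGNATURE_BITS = 32
PREFIX_BITS = 8
OFFSET_BITS = SIGNATURE_BITS - PREFIX_BITS      # 24 bits address one file

single_file_bytes = (1 << SIGNATURE_BITS) // 8  # one big file: 512 MB
file_count = 1 << PREFIX_BITS                   # 256 files named 00..FF
per_file_bytes = (1 << OFFSET_BITS) // 8        # each split file: 2 MB


def locate(crc: int):
    """Map a CRC32 value to (file name, byte offset, bit within the byte)."""
    file_name = f"{crc >> OFFSET_BITS:02X}"
    bit_index = crc & ((1 << OFFSET_BITS) - 1)
    return file_name, bit_index // 8, bit_index % 8
```

The 256 files of 2 MB each add up to the same 512 MB as the single-file layout, so the split changes only how the space is addressed, not how much is used.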
3. Shared memory mapping
In the URL deduplication method based on shared memory mapping, the processing flow of shared memory mapping is shown in Fig. 3. The module first checks whether the signature file has been created. If it has, the file is opened directly; otherwise the file is created, its size is set and every bit is set to 0. The shared memory mapping is then performed; the SMM must also be synchronized, and the SMM pointer is returned for the deduplication device to call.
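A minimal sketch of this open-or-create flow, using Python's `mmap` module, might look as follows. The function name `open_signature_map` is hypothetical, and the sketch assumes that an existing file already has the expected size (mapping a shorter file fails).

```python
import mmap
import os


def open_signature_map(path: str, size_bytes: int) -> mmap.mmap:
    """Open the signature file, creating and zero-filling it if absent,
    and return a shared, writable mapping of its content."""
    is_new = not os.path.exists(path)
    with open(path, "w+b" if is_new else "r+b") as f:
        if is_new:
            f.write(bytes(size_bytes))  # set the size and all bits to 0
            f.flush()
        # The mapping remains valid after the file object is closed.
        return mmap.mmap(f.fileno(), size_bytes)
```

Calling `flush()` on the returned mapping corresponds to the synchronization step: it asks the operating system to write modified pages back to the file. Bits set through the mapping are then visible when the file is mapped again in a later run.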
4. Experimental results and analysis
1) Experimental procedure
The test data come from a file containing URL information that was stored from an original network packet capture. It contains 75012 URLs, with an average URL length of 50.5 bytes.
The experimental flow is as follows: run the URL deduplication flow on the URLs in this packet file, querying each URL (and inserting the record if the query does not find it) and recording the query time, then read the next record and repeat. After all 75012 records have been queried, read the file again and record the time taken to query all URLs a second time.
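The two-pass procedure can be expressed as a small timing harness. This is a sketch under assumptions: `run_two_passes` is a hypothetical name, `is_duplicate` stands for whichever deduplicator is under test (it must insert the URL when it is not found), and the timings are taken with `time.perf_counter` and reported in microseconds to match the units used in the results.

```python
import time
from typing import Callable, Iterable, Tuple


def run_two_passes(urls: Iterable[str],
                   is_duplicate: Callable[[str], bool]
                   ) -> Tuple[float, float, int, int]:
    """Query every URL twice and time each pass.

    Returns (first pass microseconds, second pass microseconds,
    URLs reported new in pass 1, URLs reported new in pass 2).
    """
    url_list = list(urls)

    start = time.perf_counter()
    new_first = sum(1 for u in url_list if not is_duplicate(u))
    first_us = (time.perf_counter() - start) * 1e6

    start = time.perf_counter()
    new_second = sum(1 for u in url_list if not is_duplicate(u))
    second_us = (time.perf_counter() - start) * 1e6

    return first_us, second_us, new_first, new_second
```

For a deduplicator without false negatives, the second pass should report no new URLs, which gives a simple sanity check alongside the timings.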
Experimental scheme: the test results of this embodiment are compared experimentally with a URL deduplication filter based on the bitmap-method hash table and with a URL deduplicator implemented with the Bloom Filter algorithm, which demonstrates the efficiency of the URL deduplication method of the invention. In addition, for each method the relevant parameters are varied to obtain the corresponding experimental data. Each method is run several times and the results are averaged: the number of runs for each device is 4 or 5, and the mean is taken to demonstrate the experimental effect.
2) Description of the comparison schemes
The test scheme implements the high-performance URL deduplicator based on the bitmap-method hash table proposed by Zheng Weibin et al. of Xi'an Jiaotong University and compares it with the URL deduplication device based on shared memory mapping. The URL deduplication flow based on the bitmap-method hash table is shown in Fig. 4.
Design of the URL deduplication method based on the Bloom Filter algorithm: in an implementation of the Bloom Filter algorithm, the factors that affect processing speed are the bit array size m and the number of hash functions k; when the false positive rate is considered, the number of elements n must of course be taken into account as well. In the Bloom Filter algorithm implemented for this experiment, the k hash values are obtained as follows: the URL string is encoded with CRC32 (yielding 1 unsigned integer value), MD5 (yielding 4 unsigned integer values) and the secure hash standard SHA1 (yielding 5 unsigned integer values), so the values produced by these three encodings are equivalent to the values of 10 hash functions. By choosing combinations of the three encodings, different values of k can be generated, up to a maximum of 10, to obtain different experimental data.
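The comparison Bloom filter described above can be sketched as follows. The names `hash_values`, `BloomFilter` and `add_if_absent` are hypothetical, and several details are assumptions not stated in the source: each digest is split into big-endian 32-bit unsigned integers, each value is reduced modulo m, and URLs are encoded as UTF-8.

```python
import hashlib
import struct
import zlib


def hash_values(url: str, use_crc32: bool = True, use_md5: bool = True,
                use_sha1: bool = True) -> list:
    """Derive up to 10 unsigned 32-bit values from the three encodings."""
    data = url.encode("utf-8")
    values = []
    if use_crc32:
        values.append(zlib.crc32(data))                                   # 1
    if use_md5:
        values.extend(struct.unpack(">4I", hashlib.md5(data).digest()))   # 4
    if use_sha1:
        values.extend(struct.unpack(">5I", hashlib.sha1(data).digest()))  # 5
    return values


class BloomFilter:
    def __init__(self, m: int, **codings: bool):
        if not hash_values("", **codings):
            raise ValueError("at least one encoding must be selected")
        self.m = m
        self.bits = bytearray((m + 7) // 8)
        self.codings = codings

    def add_if_absent(self, url: str) -> bool:
        """Return True if the URL is (probably) already present;
        otherwise record it and return False."""
        present = True
        for h in hash_values(url, **self.codings):
            idx = h % self.m
            byte, mask = idx >> 3, 1 << (idx & 7)
            if not self.bits[byte] & mask:
                present = False
                self.bits[byte] |= mask
        return present
```

Selecting only MD5, for example with `BloomFilter(8000000, use_crc32=False, use_sha1=False)`, gives the k = 4 configuration; enabling all three encodings gives k = 10.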
3) Experimental results
The experimental results of the URL deduplication device based on shared memory mapping are shown in Table 1.
Table 1. Experimental results of the URL deduplication device based on shared memory mapping
[Table 1 is available only as image BSA00000725238600051 in the original publication.]
The experimental results of the URL deduplicator based on the bitmap method are shown in Table 2.
Table 2. Experimental results of the URL deduplicator based on the bitmap-method hash table
[Table 2 is available only as image BSA00000725238600052 in the original publication.]
For the Bloom Filter algorithm, the numbers of hash values are k_crc32 = 1, k_md5 = 4 and k_sha1 = 5. Choosing a combination of the three encodings gives k as the sum of the corresponding values, so several different k values can be formed, with a maximum of 10. In this experiment the Bloom Filter m values are 80000 and 8000000, n = 75012, and k takes the values produced by the different encoding combinations. The experimental results are shown in Table 3.
Table 3. Experimental results of the URL deduplication method based on the Bloom Filter algorithm
[Table 3 is available only as image BSA00000725238600053 in the original publication.]
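To give a feel for how m, n and k interact, the standard textbook approximation of a Bloom filter's false positive rate, p ≈ (1 - e^(-kn/m))^k, can be evaluated for the parameter values used in the experiment. This estimate is not taken from the source and assumes ideal, independent hash functions, which the derived CRC32, MD5 and SHA1 values only approximate; the function name is hypothetical.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Textbook estimate of the false positive rate of a Bloom filter
    with m bits and k hash functions after n insertions."""
    return (1.0 - math.exp(-k * n / m)) ** k


n = 75012
for m in (80000, 8000000):
    for k in (1, 4, 5, 10):
        print(f"m={m:>8} k={k:>2} p~{false_positive_rate(m, n, k):.6f}")
```

With m = 80000 the bit array is barely larger than the number of URLs, so the estimate is close to 1 for every k listed, whereas with m = 8000000 it is small. This is consistent with the source's remark that the false positive rate must be weighed together with n when choosing m and k.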
4) Comparative analysis of the experimental results
With the URL deduplication method based on the bitmap-method hash table and a hash table size of 10000, querying and inserting the 75012 records takes 252472 microseconds, and querying again after all records have been inserted takes 185054 microseconds. For the URL deduplication method based on shared memory mapping, the time including opening the files is 787788.5 microseconds, and the subsequent query time is 167850 microseconds (taking storage in 256 files as the example). On the repeat query, the method of the invention thus outperforms URL deduplication based on the bitmap method. For the URL deduplication method implemented with the Bloom Filter algorithm, the fastest configuration is k = 4, that is, using only the MD5 encoding; even this shortest time is as high as 290936.0 microseconds, still slower than the other two methods. In the Bloom Filter based method the time is spent mainly on generating hash values: the larger k is, the longer the computation takes and the worse the performance. In addition, the performance of the bitmap-method URL deduplication declines as the number of URL records grows, and the false positive rate of the Bloom Filter based method rises as the number of URL records grows, whereas the URL deduplication method based on shared memory mapping has neither problem. Furthermore, when the program is run again after exiting, the records of the method based on shared memory mapping are still in the files, and deduplication resumes from the records in those files; with the other two methods all data reside in memory, so when the program is run again after exiting, the original records are gone and deduplication has to start over.
Taking the above experimental results together, the URL deduplication method based on shared memory mapping is the best scheme and achieves the object of the invention. This example shows that, in terms of both performance and correctness, the invention is suitable for URL duplicate judgment in search engines and network audit systems.
The above is only a preferred embodiment of the present invention and does not limit the invention in any form. Although the invention has been described by way of the preferred embodiment, this is not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the method and technical content described above to make partial changes and adjustments that amount to equivalent embodiments. Any simple modification or adjustment made to the above embodiment according to the technical essence of the invention, without departing from the content of the technical solution, still falls within the scope of the technical solution of the invention.

Claims (6)

1. A URL deduplication method based on shared memory mapping, comprising an initialization module, a URL signing module, a signature indexing module, a shared memory mapping module and a bitwise duplicate judging module;
characterized in that:
Device initialization module: loads the data required by the device and initializes the shared memory mapping;
URL signing module: converts a URL of arbitrary length into a fixed-length signature that corresponds uniquely to the URL within a certain range; signing methods include, but are not limited to, CRC32 (Cyclic Redundancy Check 32, 32 bits), SHA1 (Secure Hash Algorithm 1, 160 bits) and MD5 (Message Digest Algorithm 5, 128 bits);
Signature indexing module: given a signature of x bytes (x = x11 + x12), the first x11 bytes are taken as the file name and the remaining x12 bytes as the file content field; if the signature byte count x is large, the signature is split into 3 levels or more (with more than 3 levels, a multi-level directory is used); taking 3 levels as an example, x = x21 + x22 + x23, where the first x21 bytes of the signature serve as the directory name, the middle x22 bytes as the file name and the remaining x23 bytes as the file content field; indexing simply means looking up by the designated bytes of the signature;
Shared memory mapping module: the signatures are stored as files, but reading and writing files through the file system is very expensive, so shared memory mapping is used to put memory and the above signature files in one-to-one correspondence; the device operates directly on memory, and the shared memory mapping module synchronizes memory with the files;
Bitwise duplicate judging module: according to the signature index, finds the position in shared memory that corresponds to the signature's file content field, reads the value in memory and judges whether the URL is a repeat.
2. The URL deduplication method based on shared memory mapping according to claim 1, characterized by comprising the following steps:
1) Initialization step: according to the predetermined signature index scheme, initialize the files and complete the initialization of the shared memory mapping from files to memory;
2) URL capturing step: identify the URLs in the information obtained from the network card and supply them to the device for deduplication;
3) URL signing step: starting from an obtained URL, convert the URL into fixed-length bytes with the signature algorithm and pass the signature bytes to the signature indexing step; the detailed steps differ according to the signature algorithm;
4) Signature indexing step: select the index scheme according to the chosen number of directory levels; the way it is implemented directly affects how device initialization is implemented;
5) Shared memory mapping step: processes share memory by mapping the same signature file; once the signature file is mapped into a process's address space, the process can access the file as it would ordinary memory, without calling operations such as read() and write(); in other words, the content of the file is mirrored in memory, and memory operations are faster than disk operations;
6) Bitwise duplicate judging step: according to the value of the signature's file content field, locate the corresponding bit and check whether it is 1; if it is 1, the URL is a repeated URL; if it is 0, set the bit to 1 and return the result that the URL is not a repeat.
3. A URL deduplication device made according to the method of claim 1, characterized by comprising a central processing unit, an internal memory, an external memory, a network card and an input-output device, all of which are connected to the central processing unit; the central processing unit in turn comprises a URL capturing module, a URL signing module, a signature indexing module, a shared memory mapping module and a bitwise duplicate judging module, which are connected together.
4. The URL deduplication method based on shared memory mapping according to claim 1, characterized in that the overall design is as follows: when the device first starts, it performs initialization and opens all signature files that store URLs; any file that does not exist is created with all bits set to 0; the shared memory mapping module performs the shared memory mapping and obtains an SMM pointer, after which the file can be closed; from then on, reads and writes through the SMM pointer are mapped to reads and writes of the URL storage file; the SMM pointers are all kept in an array, and the first 8 bits of the CRC32 code select the corresponding SMM pointer from this array; the value of the corresponding bit in the memory region behind that SMM pointer is then checked: if it is 1, the URL is a repeat; if it is 0, the URL is not a repeat, and the bit is set to 1 at the same time; repeating this flow constitutes the implementation of the URL deduplication method based on shared memory mapping.
5. The shared memory mapping according to claim 4, characterized in that it first checks whether the signature file has been created; if it has, the file is opened directly; otherwise the file is created, its size is set and every bit is set to 0; the shared memory mapping is then performed, the SMM is synchronized at the same time, and the SMM pointer is returned for the deduplication device to call.
6. The URL deduplication method based on shared memory mapping according to claim 1, characterized in that the beneficial effects are as follows: through signing, each URL is ultimately stored as 1 bit in a signature file, which greatly compresses the storage space occupied by URLs; through shared memory mapping, operations on memory are synchronized to the signature file, which increases URL deduplication speed; and because the bit corresponding to each URL is stored in the signature file, the deduplication result is preserved over the long term.
CN2012101713168A 2012-05-29 2012-05-29 Method and device for removing URL (uniform resource locator) duplicate on basis of shared memory mapping Pending CN102693315A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012101713168A CN102693315A (en) 2012-05-29 2012-05-29 Method and device for removing URL (uniform resource locator) duplicate on basis of shared memory mapping

Publications (1)

Publication Number Publication Date
CN102693315A true CN102693315A (en) 2012-09-26

Family

ID=46858748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012101713168A Pending CN102693315A (en) 2012-05-29 2012-05-29 Method and device for removing URL (uniform resource locator) duplicate on basis of shared memory mapping

Country Status (1)

Country Link
CN (1) CN102693315A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105900395A (en) * 2014-01-16 2016-08-24 富士通株式会社 Communication apparatus, communication method, and communication program
CN104572973A (en) * 2014-12-31 2015-04-29 上海格尔软件股份有限公司 High-performance memory caching system and method
CN106713479A (en) * 2017-01-06 2017-05-24 南京铱迅信息技术股份有限公司 Cloud-based file duplicate-removing method
CN106713479B (en) * 2017-01-06 2020-04-10 南京铱迅信息技术股份有限公司 Cloud-based file duplicate removal method
CN107357862A (en) * 2017-06-30 2017-11-17 中国联合网络通信集团有限公司 Calling list rearrangement method and device
CN107357862B (en) * 2017-06-30 2020-03-13 中国联合网络通信集团有限公司 Method and device for arranging repeated voice messages
CN111797335A (en) * 2020-07-06 2020-10-20 北京基软科技有限公司 Multi-dimensional information publishing and retrieving system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
DD01 Delivery of document by public notice

Addressee: Shanghai JP Electronic Business Co.,Ltd.

Document name: Notification of before Expiration of Request of Examination as to Substance

DD01 Delivery of document by public notice

Addressee: Shanghai JP Electronic Business Co.,Ltd.

Document name: Notification that Application Deemed to be Withdrawn

C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120926