CN112149416A - Method for detecting hot spot academic research topic in distributed academic data warehouse

Info

Publication number
CN112149416A
Authority
CN
China
Prior art keywords
data
academic
stage
vocabulary
encoding
Prior art date
Legal status
Granted
Application number
CN202010938852.0A
Other languages
Chinese (zh)
Other versions
CN112149416B (en)
Inventor
戴海鹏
陈贵海
李猛
汪笑宇
夏瑞
谢榕彪
于俊
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
2020-09-09
Filing date
2020-09-09
Publication date
2020-12-29
Application filed by Nanjing University
Priority to CN202010938852.0A
Publication of CN112149416A
Application granted
Publication of CN112149416B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method for detecting hot academic research topics in a distributed academic data warehouse comprises a data sampling, compression and encoding stage and a transmission stage in each distributed data warehouse, and a data recovery and detection stage on a central server. The data sampling stage maintains a group of coding cuckoo filters and samples each academic word extracted from the academic documents multiple times to determine whether the word enters each coding cuckoo filter in the group; successfully sampled words enter the data encoding stage. The data compression and encoding stage scans all documents in each distributed data warehouse and extracts academic research vocabulary from them using a word segmenter. The data transmission stage transmits the coding cuckoo filters that record the compressed data of each distributed data warehouse to the central server. In the data recovery and detection stage, the central server decodes and recovers the original vocabulary from the coding cuckoo filters constructed from each distributed data set and estimates the heat of each word.

Description

Method for detecting hot spot academic research topic in distributed academic data warehouse
Technical Field
The invention relates to data mining, and more particularly to a method framework for detecting hot academic research topics in a distributed academic data warehouse.
Background
With the steady growth in the level and scale of domestic academic research in recent years, the number of published academic papers increases day by day. For example, a Science and Engineering Indicators report issued by the U.S. National Science Foundation shows that more than 426,000 academic papers were published in China in 2016, corresponding to 18.6% of the international total; this exceeds the United States and makes China the largest producer of academic papers. However, as the number of published papers grows and academic research directions continue to diverge, it becomes increasingly difficult to grasp current academic hot spots and track the corresponding research progress, which makes it harder for new researchers to follow the research frontier. It also makes it difficult for research management institutions to allocate research projects and funds reasonably.
In recent years, some research work has begun to focus on the detection of academic hot spots, but it has the following limitations: (1) hot topics can only be detected in a centralized academic repository; (2) continuous updating of research documents is not supported; (3) a large amount of network bandwidth and memory is required to support the detection process. Because existing academic warehouses are deployed in a distributed manner, and because the requirement that defines a hot research topic has to be specified in practice, existing work cannot be used directly to detect hot academic research topics in a distributed data warehouse and therefore cannot achieve the goal of the invention.
Therefore, there is an urgent need in the art for a method and a system for detecting hot research topics in a distributed data warehouse that effectively reduce the amount of data transmitted in a distributed environment while ensuring the accuracy of hot topic detection.
Disclosure of Invention
The purpose of the invention is to detect hot research topics in a distributed data warehouse while keeping communication traffic low.
To achieve this purpose, the technical solution of the invention is a method for detecting hot academic research topics in a distributed academic data warehouse, characterized in that the method comprises a data sampling, compression encoding and transmission stage in each distributed data warehouse and a data recovery and detection stage on a central server;
wherein:
the data sampling, compression and encoding stage is responsible for maintaining a group of coding cuckoo filters; each academic word extracted from the academic documents is sampled multiple times to determine whether it enters each coding cuckoo filter in the group, and successfully sampled words enter the data encoding stage;
the data encoding stage is responsible for scanning all documents in each distributed data warehouse, extracting academic research vocabulary from the documents using a word segmenter, compressing and encoding the extracted academic words and their frequencies, and recording the compressed and encoded words and frequencies in the storage structure of a Coding Cuckoo Filter;
the data transmission stage is responsible for transmitting the coding cuckoo filters that record the compressed data of each distributed data warehouse to the central server;
the data recovery and detection stage decodes and recovers the original words on the central server from the coding cuckoo filters constructed from the distributed data sets, estimates the heat (frequency) of each word, and outputs hot research topics according to the heat (frequency) requirement for academic topics given by the user. Based on the coding cuckoo filters sent by the distributed servers, the potential hot research topic words and their heat are recovered from the compressed data stored by each server, the total heat over all distributed data warehouses is calculated, and finally the hot research topics are output according to the total heat.
In the method for detecting hot academic research topics in a distributed academic data warehouse: (1) in the data storage stage, a group of coding cuckoo filters is maintained, and multiple sampling determines, for each academic word, whether it is stored in each filter; (2) in the data encoding and transmission stage, the original data is not stored; instead, its code, fingerprint information and frequency information are stored; (3) in the data encoding stage, the frequency of each academic word is sampled and then recorded together with the code and fingerprint of the word; (4) in the data recovery and detection stage, the codes belonging to the same element (research word) are gathered according to the fingerprint information and then decoded to recover the original data.
In the data encoding and transmission stage, the data is first multi-sampled, then compression-encoded, and then stored in the coding cuckoo filter.
In the data recovery and detection stage, the heat of the academic vocabulary is recovered by maximum likelihood estimation.
The invention provides a method for detecting hot research topics in a distributed academic data warehouse, which comprises: designing a system model for distributed detection; compressing the storage of academic topic words using an encoding technique; further reducing data storage and traffic using a multi-sampling technique; and speeding up data processing by storing data in coding cuckoo filters. Specifically, the invention: 1. designs a hot academic topic detection system model; 2. scans academic documents in a distributed manner, extracts hot words, and compresses, encodes and stores the academic words and their heat; 3. stores the encoded data in a coding cuckoo filter to accelerate data processing.
An academic hot topic is a problem studied by a large number of works in academic research, and it is expressed in the form of words. First, a system model for detecting academic hot topics in a distributed academic data warehouse is provided. Second, in the data sampling stage, a multi-sampling technique reduces the amount of stored data while maintaining high accuracy. In the data encoding stage, each distributed data warehouse uses an encoding technique to compress the topic words contained in the academic documents it records, and then stores the encoded data in a coding cuckoo filter. In the data transmission stage, the compressed data is transmitted to a designated central server, which recovers the hot topics and their occurrence frequencies and finally outputs all hot academic topics according to the topic heat requirement provided by the user. The invention provides, for the first time, a method for detecting hot academic research topics in a distributed academic data warehouse. It effectively reduces the large amount of communication traffic otherwise generated when detecting academic research hot spots in a distributed environment, provides a corresponding theoretical performance guarantee, and can be used to detect academic hot topics and calculate topic heat.
The beneficial effects of the invention are: 1. encoding and compressing the data effectively reduces the amount of data that must be transmitted to detect academic hot topics in a distributed data environment; 2. the multi-sampling technique further reduces the amount of data stored and transmitted; 3. the encoded data is stored in coding cuckoo filters and transmitted as a whole, which greatly speeds up data processing and transmission.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a flow chart of a data sampling phase;
FIG. 3 is a flow chart of data decoding, recovery and detection.
Detailed Description
The system architecture of the invention is shown in FIG. 1 and includes a central server and distributed academic repositories. The invention has two stages: (1) the data sampling, compression encoding and transmission stage, completed in the distributed data warehouses; (2) the data decoding and recovery and hot topic detection stage, completed on the central server. The first stage can be further subdivided into 3 steps: data sampling; data encoding and fingerprint acquisition; and data storage and transmission. The second stage can be divided into 2 steps: data decoding and recovery, and hot word detection.
Stage 1.1: data sampling phase
In the data sampling stage, a group of coding cuckoo filters is maintained for each distributed data warehouse, the academic documents in the data warehouse are scanned, a word segmenter extracts the academic research words they contain, and finally multiple sampling determines how each word is stored. The multi-sampling process is as follows: (1) the sampling probability of the coding cuckoo filters in each group decreases geometrically as the filter index increases, for example: the sampling probability of the first filter is 1%, the second is 0.2%, the third is 0.04%, and so on; (2) each academic word is sampled independently on every coding cuckoo filter in the group according to that filter's preset sampling probability, and the sampling over the several filters together constitutes multiple sampling. Successfully sampled academic words enter the subsequent encoding stage.
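The multi-sampling step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the default ratio of 0.2 (chosen to reproduce the 1%, 0.2%, 0.04% example) and the use of Python's `random` module are all assumptions.

```python
import random


def sampling_probabilities(first=0.01, ratio=0.2, num_filters=3):
    """Geometrically decaying probabilities, e.g. 1%, 0.2%, 0.04% (assumed defaults)."""
    return [first * ratio ** k for k in range(num_filters)]


def multi_sample(probabilities, rng):
    """Return the indices of the filters that accept one occurrence of a word.

    Every filter draws independently, so a single occurrence may be accepted
    by several filters, by exactly one, or by none.
    """
    return [k for k, p in enumerate(probabilities) if rng.random() < p]


def sample_stream(words, probabilities, seed=0):
    """Yield (word, filter_index) for every successful sample in a word stream."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    for word in words:
        for k in multi_sample(probabilities, rng):
            yield word, k
```

Each `(word, filter_index)` pair produced by `sample_stream` would then be handed to the encoding stage of the corresponding filter.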
Stage 1.2: data encoding phase
In the data encoding stage, each academic word is first compressed and its fingerprint is computed; the resulting code and fingerprint are then inserted into the coding cuckoo filter, and the counter at the insertion position is incremented by 1.
Compression process: each academic word has an identification number (ID), which can be obtained directly from its English characters or from the binary representation of its Chinese character encoding. Since this ID is usually long, transmitting it directly causes excessive traffic. To solve the traffic problem, the data is first encoded with a lossy compression code (Raptor code), as follows:
[Formula image BDA0002672912220000041, not reproduced in this text]
Given the Raptor code coding matrix $[a_{ij}]$, $1 \le j \le l$, the vocabulary ID of the corresponding length is
[Formula image BDA0002672912220000044, not reproduced in this text]
and its coded result, in bits, is
[Formula image BDA0002672912220000042, not reproduced in this text]
The calculation process is as follows:
[Formula image BDA0002672912220000043, not reproduced in this text]
Fingerprint acquisition: given the vocabulary ID and a hash function $h_f(\cdot)$, the fingerprint $f$ (of length $p$) is obtained as follows:
$f = h_f(\mathrm{ID}) \,\%\, 2^p$, where % represents the modulo operation.
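As a small worked example of the fingerprint formula, the sketch below computes a $p$-bit fingerprint from an ID. The patent does not name the hash function $h_f$; SHA-256 is used here purely as a stand-in, and the function name and default `p` are hypothetical.

```python
import hashlib


def fingerprint(word_id: bytes, p: int = 12) -> int:
    """Return f = h_f(ID) % 2**p.

    SHA-256 is an assumed stand-in for the unspecified hash function h_f.
    """
    digest = hashlib.sha256(word_id).digest()
    return int.from_bytes(digest, "big") % (2 ** p)
```

Because the result is reduced modulo $2^p$, it always fits in $p$ bits, which is what keeps the per-entry storage in the filter small.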
After the code and the fingerprint are obtained, the code information and the fingerprint information are inserted in the same way as in an ordinary cuckoo filter: (1) the two candidate buckets are computed using two hash functions; (2) if either of the two buckets has an empty slot, the element is inserted directly; (3) if neither bucket has space, an existing element is evicted to free a slot for the insertion, and the evicted element is then re-inserted by repeating the above process.
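The insertion procedure above can be sketched as a small class. This is an illustrative sketch under several assumptions that are not in the patent text: the class and method names are hypothetical, SHA-256 stands in for the hash functions, the second bucket is derived by the partial-key trick of standard cuckoo filters (index XOR hash of the fingerprint) so that an evicted entry can be relocated, and a repeated (fingerprint, code) pair increments its existing counter.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    # SHA-256 is an assumed stand-in for the unspecified hash functions.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CodingCuckooFilter:
    """Buckets hold entries of the form [fingerprint, code, counter]."""

    def __init__(self, num_buckets=64, bucket_size=4, fp_bits=12,
                 max_kicks=500, seed=0):
        if num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.m = num_buckets
        self.bucket_size = bucket_size
        self.fp_bits = fp_bits
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _fingerprint(self, word_id: bytes) -> int:
        return _h(b"fp" + word_id) % (2 ** self.fp_bits)

    def _alt(self, index: int, fp: int) -> int:
        # Partial-key hashing: applying _alt twice returns the original index.
        return (index ^ _h(fp.to_bytes(4, "big"))) & (self.m - 1)

    def insert(self, word_id: bytes, code: bytes) -> bool:
        fp = self._fingerprint(word_id)
        i1 = _h(word_id) & (self.m - 1)
        i2 = self._alt(i1, fp)
        # Already present in a candidate bucket: just increase the counter.
        for i in (i1, i2):
            for entry in self.buckets[i]:
                if entry[0] == fp and entry[1] == code:
                    entry[2] += 1
                    return True
        # Step (2): use a free slot in either candidate bucket.
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append([fp, code, 1])
                return True
        # Step (3): evict an entry and relocate it, repeating as needed.
        i = self.rng.choice((i1, i2))
        entry = [fp, code, 1]
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            entry, self.buckets[i][slot] = self.buckets[i][slot], entry
            i = self._alt(i, entry[0])
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(entry)
                return True
        return False  # filter too full; the last evicted entry is dropped
```

The power-of-two bucket count is what makes the XOR-based alternate index stay in range and remain its own inverse; a production filter would also need a policy for the failure case instead of silently dropping an entry.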
Stage 1.3: data transmission
When the encoding stage of each distributed data warehouse is completed, the encoded cuckoo filter storing the compressed data information needs to be sent to a designated central server.
Stage 2.1: data recovery and detection
After all data sets have been sent to the central server, the compressed data must be extracted from the coding cuckoo filters of the different distributed academic repositories and then decoded to recover the original data and estimate the heat.
Extracting compressed data: after the coding cuckoo filters sent by the servers are obtained, they are arranged and aligned for processing. All data buckets of all cuckoo filters are traversed: the current data bucket is selected and its elements are taken out; then, according to the fingerprint information of each element, the elements from all distributed data warehouses that are at the same insertion position and have the same fingerprint information are extracted to form a same-type encoding group. As shown in FIG. 3, when the traversal encounters element 1, all the remaining elements in the same encoding group are extracted according to element 1.
Decoding: the extracted codes are substituted into equation 1 to decode the original vocabulary ID.
Heat estimation: for each decoded vocabulary ID, an estimated heat value is calculated by maximum likelihood estimation from the value of its counter and the corresponding sampling probability, and the decoded word and its heat are then output.
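The extraction step can be illustrated with a short sketch that follows the text literally: entries are grouped by insertion position (bucket index) and fingerprint across all received filters. The data layout, a list of bucket lists holding `(fingerprint, code, counter)` tuples, and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def group_same_type(filters):
    """Group entries from all warehouses by (bucket_index, fingerprint).

    `filters` is a list with one item per warehouse; each item is a list of
    buckets, and each bucket is a list of (fingerprint, code, counter) tuples.
    """
    groups = defaultdict(list)
    for warehouse_id, buckets in enumerate(filters):
        for bucket_index, bucket in enumerate(buckets):
            for fp, code, counter in bucket:
                groups[(bucket_index, fp)].append((warehouse_id, code, counter))
    return dict(groups)
```

Each resulting group collects the codes and counters that are presumed to belong to the same word, ready for decoding and heat estimation.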
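The patent does not spell out the likelihood model, so the following is only one plausible reading of the heat estimation. It assumes that a word occurring $n$ times is counted in filter $k$ approximately as a Poisson variable with mean $n p_k$, in which case the maximum likelihood estimate is $\hat{n} = \sum_k c_k / \sum_k p_k$ (for a single filter this reduces to $c / p$). The function names are hypothetical.

```python
def estimate_heat(counters, probabilities):
    """MLE of the true frequency under an assumed Poisson sampling model.

    counters[k] is the count recorded in filter k and probabilities[k] is
    that filter's sampling probability.
    """
    if len(counters) != len(probabilities):
        raise ValueError("counters and probabilities must have equal length")
    total_p = sum(probabilities)
    if total_p <= 0:
        raise ValueError("at least one sampling probability must be positive")
    return sum(counters) / total_p


def total_heat(per_warehouse):
    """Sum the estimated heat of one word over all warehouses.

    `per_warehouse` is a list of (counters, probabilities) pairs.
    """
    return sum(estimate_heat(c, p) for c, p in per_warehouse)
```

A word would then be reported as a hot topic when its total estimated heat meets the heat requirement supplied by the user.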

Claims (9)

1. A method for detecting hot academic research topics in a distributed academic data warehouse, characterized by comprising a data sampling, compression and encoding stage and a transmission stage in each distributed data warehouse, and a data recovery and detection stage on a central server;
wherein: the data sampling, compression and encoding stage is responsible for maintaining a group of coding cuckoo filters; each academic word extracted from the academic documents is sampled multiple times to determine whether it enters each coding cuckoo filter in the group, and successfully sampled words enter the data encoding stage;
the data compression and encoding stage is responsible for scanning all documents in each distributed data warehouse, extracting academic research vocabulary from the documents using a word segmenter, compressing and encoding the extracted academic words and their frequencies, and recording the compressed and encoded words and frequencies in the storage structure of a Coding Cuckoo Filter;
the data transmission stage is responsible for transmitting the coding cuckoo filters that record the compressed data of each distributed data warehouse to the central server;
the data recovery and detection stage decodes and recovers the original words on the central server from the coding cuckoo filters constructed from the distributed data sets, estimates the heat (frequency) of each word, and outputs hot research topics according to the heat (frequency) requirement for academic topics given by the user.
2. The method for detecting hot academic research topics as claimed in claim 1, wherein, in detecting hot academic research topics in a distributed academic data warehouse: (1) in the data storage stage, a group of coding cuckoo filters is maintained, and multiple sampling determines, for each academic word, whether it is stored in each filter; (2) in the data encoding and transmission stage, the original data is not stored; instead, its code, fingerprint information and frequency information are stored; (3) in the data encoding stage, the frequency of each academic word is sampled and then recorded together with the code and fingerprint of the word; (4) in the data recovery and detection stage, the codes belonging to the same element (research word) are gathered according to the fingerprint information and then decoded to recover the original data.
3. The method for detecting hot academic research topics as claimed in claim 1, wherein, in the data encoding and transmission stage, the data is first multi-sampled, then compression-encoded, and then stored in the coding cuckoo filter.
4. The method of detecting hot academic research topics as claimed in claim 1, wherein during the data recovery and detection phase, the heat of the academic vocabulary is recovered by means of maximum likelihood estimation.
5. The method of claim 1, wherein, in the data sampling stage, a group of coding cuckoo filters is maintained for each distributed data warehouse, the academic documents in the data warehouse are scanned, a word segmenter extracts the academic research words they contain, and finally multiple sampling determines how each word is stored; the multi-sampling process is as follows: (1) the sampling probability of the coding cuckoo filters in each group decreases geometrically as the filter index increases, for example: the sampling probability of the first filter is 1%, the second is 0.2%, the third is 0.04%, and so on; (2) each academic word is sampled independently on every coding cuckoo filter in the group according to that filter's preset sampling probability, and the sampling over the several filters together constitutes multiple sampling; successfully sampled academic words enter the subsequent encoding stage.
6. The method of claim 1, wherein in the data encoding stage, the academic vocabulary is compressed first, then the vocabulary fingerprint information is acquired, and then the acquired code and fingerprint information are inserted into the encoded cuckoo filter, and the counter of the insertion position is increased by 1.
7. The method for detecting hot academic research topics as claimed in claim 1, wherein the compression encoding process is as follows: each academic word has an identification number (ID), which can be obtained directly from its English characters or from the binary representation of its Chinese character encoding; since this ID is usually long, transmitting it directly causes excessive traffic; to solve the traffic problem, the data is first encoded with a lossy compression code (Raptor code), as follows:
[Formula image FDA0002672912210000021, not reproduced in this text]
given the Raptor code coding matrix $[a_{ij}]$, $1 \le j \le l$, the vocabulary ID of the corresponding length is
[Formula image FDA0002672912210000022, not reproduced in this text]
and its coded result, in bits, is
[Formula image FDA0002672912210000023, not reproduced in this text]
the calculation process is as follows:
[Formula image FDA0002672912210000024, not reproduced in this text]
fingerprint acquisition: given the vocabulary ID and a hash function $h_f(\cdot)$, the fingerprint $f$ (of length $p$) is obtained as follows:
$f = h_f(\mathrm{ID}) \,\%\, 2^p$, where % represents the modulo operation;
after the code and the fingerprint are obtained, the code information and the fingerprint information are inserted in the same way as in an ordinary cuckoo filter: (1) the two candidate buckets are computed using two hash functions; (2) if either of the two buckets has an empty slot, the element is inserted directly; (3) if neither bucket has space, an existing element is evicted to free a slot for the insertion, and the evicted element is then re-inserted by repeating the above process.
8. The method for detecting hot academic research topics as claimed in claim 1, wherein
compressed data is extracted as follows: after the coding cuckoo filters sent by each server are obtained, they are arranged and aligned for processing; all data buckets of all cuckoo filters are traversed: the current data bucket is selected and its elements are taken out; then, according to the fingerprint information of each element, the elements from all distributed data warehouses that are at the same insertion position and have the same fingerprint information are extracted to form a same-type encoding group; when the traversal encounters element 1, all the other elements in the same encoding group are extracted according to element 1.
9. The method for detecting hot academic research topics as claimed in claim 1, wherein, for decoding, the extracted codes are substituted into formula 1 to decode the original vocabulary ID; and, for heat estimation, an estimated heat value is calculated for each decoded vocabulary ID by maximum likelihood estimation from the value of its counter and the corresponding sampling probability, and the decoded word and its heat are then output.
CN202010938852.0A 2020-09-09 2020-09-09 Method for detecting hot academic research topics in distributed academic data warehouse Active CN112149416B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010938852.0A CN112149416B (en) 2020-09-09 2020-09-09 Method for detecting hot academic research topics in distributed academic data warehouse

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010938852.0A CN112149416B (en) 2020-09-09 2020-09-09 Method for detecting hot academic research topics in distributed academic data warehouse

Publications (2)

Publication Number Publication Date
CN112149416A true CN112149416A (en) 2020-12-29
CN112149416B CN112149416B (en) 2023-08-22

Family

ID=73890168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010938852.0A Active CN112149416B (en) 2020-09-09 2020-09-09 Method for detecting hot academic research topics in distributed academic data warehouse

Country Status (1)

Country Link
CN (1) CN112149416B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170235496A1 (en) * 2016-02-11 2017-08-17 Dell Products L.P. Data deduplication with augmented cuckoo filters
CN108494790A (en) * 2018-04-08 2018-09-04 南京大学 A method of detecting sustained network attack in distributed network
CN109815234A (en) * 2018-12-29 2019-05-28 杭州中科先进技术研究院有限公司 A kind of multiple cuckoo filter under streaming computing model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAIPENG DAI et al.: "Finding Persistent Items in Distributed Datasets", IEEE/ACM TRANSACTIONS ON NETWORKING, vol. 28, no. 1, pages 1 - 14, XP011773533, DOI: 10.1109/TNET.2019.2946417 *
MENG LI et al.: "Thresholded Monitoring in Distributed Data Streams", IEEE/ACM TRANSACTIONS ON NETWORKING, vol. 28, no. 3, pages 1033 - 1046, XP011794038, DOI: 10.1109/ICDCS.2019.00030 *

Also Published As

Publication number Publication date
CN112149416B (en) 2023-08-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant