CN102708216A - Word-segmentation organizing method and clustering method for ciphertext search - Google Patents

Word-segmentation organizing method and clustering method for ciphertext search

Info

Publication number
CN102708216A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012102227877A
Other languages
Chinese (zh)
Inventor
陆月明
马良
袁玉宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN2012102227877A
Publication of CN102708216A
Legal status: Pending

Abstract

The invention discloses a word-segmentation organizing method and a clustering method for ciphertext search. The methods address the problem of indexing and searching a user's private documents on a public server. The client performs word segmentation, feature extraction and summarization on a private document and encrypts the segmented words, the document and the summary, which protects the privacy of the user's private documents on the server. The server end stores and clusters the encrypted documents and builds an index to facilitate searching; it then responds to the user's search requests and performs secondary clustering on the ciphertext. The methods are characterized by client-based word segmentation of documents, ciphertext-based storage clustering, and ciphertext-based search clustering.

Description

A word-segmentation organizing method and clustering method for ciphertext search
Technical field
The present invention relates to a word-segmentation organizing method and a clustering method for ciphertext search, applied to search engines, and belongs to the field of computer science and technology.
Background
The volume of data produced by Internet services keeps growing. Typical Internet services (such as microblogs, search engines, community websites and video applications) generate data on a very large scale, and big-data processing technology has emerged in response. At its core, big-data processing is a data-intensive computing technology, of which cloud computing is the typical example.
With the emergence of cloud computing, including cloud storage, cloud search and virtual-machine computing, Internet services built on cloud computing have concentrated on public data. Whether for search engines or microblogs, the data made public is, to some extent, data that is not considered particularly confidential. With the introduction of real-name systems for services such as microblogs, the protection of users' private data has been put on the agenda. Because cloud computing is computation provided by a third party, its security is questioned by users, high-end customers included. The security of cloud data has become one of the greatest challenges, and data-privacy protection is the primary problem that cloud computing needs to solve.
At present, security technologies for computing resources, network resources and storage resources are developing within their respective fields. As cloud computing develops, research on the security of computation and storage becomes more urgent. There are many research directions in computational security and trusted computing, but encrypted computation for cloud computing developed only after 2010. It currently follows two main directions: secure quantum computing and homomorphic encryption. Neither direction has reached practical application yet, mainly because fully homomorphic encryption and quantum computing are not mature and some of the techniques and key problems remain unsolved.
The present invention proposes a word-segmentation organizing method and a clustering method for ciphertext search. It combines two approaches, content understanding and computation on encrypted data: the document content is understood through word segmentation of the plaintext, and privacy protection and information clustering are achieved through computation (clustering) on encrypted data.
Summary of the invention
The present invention, "a word-segmentation organizing method and clustering method for ciphertext search", comprises two parts: a client-side document word-segmentation organizing method and a server-end encrypted-data clustering method.
(1) Client-side document word-segmentation organizing method
Word segmentation (for example, Chinese word segmentation) is a technique that divides articles and paragraphs (referred to here as "documents") into words and phrases on the basis of semantics, and it is an important technique in search engines. Institutions such as the Chinese Academy of Sciences and Fudan University have studied it and obtained good results. In a search engine, the word-segmentation module is placed at the server end (for example, on cloud computing servers such as Baidu's, referred to here as shared devices). That is, after a document (including HTML documents, Microsoft Word files and PDF documents) is fetched from the network, the server end immediately segments it into words and phrases. Segmenting at the server end in this way is an organization designed for public services. Its drawback is that the document is transparent to the server, so the privacy of the document cannot be protected.
In systems that are not public services, such as private-cloud applications hosted in a public cloud, and particularly where the private information of a user or an organization is involved, the privacy of the information must be protected and must not be disclosed to the server providing the public service. A new way of organizing word segmentation is therefore needed.
The client-side document word-segmentation organizing method proposed in this patent performs word segmentation on the client (private access devices such as personal computers and mobile phones). The word-segmentation module resides on the client's private device, which avoids revealing the client's private information during segmentation. To preserve the original information, feature extraction and word segmentation must be performed on the original document.
Unlike public search engines (such as Baidu), a search over private documents mainly covers documents produced by the users themselves. Users are wary of storing these documents on shared devices at the server end, so the plaintext of the documents must not be stored there. To enable storage and search at the server end, however, the client's private device must take on part of the computation: client-side word segmentation.
Client-side word segmentation is a new computing method. The client word-segmentation module is located in the structure shown in Fig. 1 and provides five functions: document word segmentation, encryption of the segmented words, extraction of the document feature vector, document encryption, and encrypted document summarization.
(1) Document word segmentation. The original document is segmented according to semantics; this function is the same as ordinary document word segmentation.
(2) Encryption of segmented words. The segmented words are encrypted so that they can be stored on the server and used for the subsequent clustering and searching. After encryption, the words stored on the server are ciphertext words.
(3) Extraction of the document feature vector. The feature vector is a quantitative description of the document, used to cluster documents. It is extracted from the plaintext and stored on the server.
(4) Document encryption. The document is encrypted with an encryption algorithm so that the document data can be stored on the server.
(5) Document summarization. A summary is generated for the user's document to facilitate document search.
Table 1 describes the main operations of the client word-segmentation module and the form in which each kind of content is held on the client's private terminal device and on the shared device at the server end. To protect the privacy of the content, all segmentation and encryption must be completed on the client's private terminal device.
Table 1. Main operations and content types
Content | Client | Server end | Operation
Document | Plaintext | Ciphertext | Encrypted on the client
Document feature vector | Extracted | Stored | Extracted on the client
Segmented words | Plaintext | Ciphertext | Encrypted on the client
Document summary | Plaintext | Ciphertext | Encrypted on the client
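The text does not say how the document feature vector is computed. A minimal sketch, assuming a binary bag-of-words vector over a fixed vocabulary, is shown below. The vocabulary, the sample sentence and the whitespace-style `segment` function are all hypothetical; Chinese text would need a real word segmenter in place of `segment`.

```python
import re

# Hypothetical fixed vocabulary. The last three entries are invented so that
# the vector also contains zero positions.
VOCABULARY = ["tomato", "red", "round", "vegetable", "fruit", "blue", "square"]

def segment(text):
    """Stand-in segmenter: lowercase word split (not a semantic segmenter)."""
    return re.findall(r"[a-z]+", text.lower())

def feature_vector(words, vocabulary=VOCABULARY):
    """Binary vector: 1 where the vocabulary word occurs in the document."""
    present = set(words)
    return [1 if term in present else 0 for term in vocabulary]

document = "Tomato is a red round vegetable."  # hypothetical plaintext
words = segment(document)
vector = feature_vector(words)
```

Everything in this sketch runs on the client; only the resulting vector would be uploaded in plaintext form, as Table 1 indicates.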
(2) Server-end encrypted-data clustering method
Data clustering groups documents into classes according to their content so that searches can be carried out by category. K-Means is a typical clustering method. Clustering is commonly used in search engine systems, library document retrieval systems and similar applications. The server-end encrypted-data clustering method is a new computing method.
To protect privacy, the shared device at the server end holds only the encrypted segmented words, the feature vectors of the documents, and the encrypted documents. Clustering encrypted data at the server end is purely a mathematical computation (plaintext-based clustering, by contrast, can involve feature extraction from sentences, as in sentiment-based clustering). The method resides in the clustering application module of the server-end shared device and comprises storage clustering and search clustering. The server-end storage clustering method based on ciphertext words is shown in Fig. 2. Storage clustering stores documents in the storage device according to certain rules and serves as an index. The server end computes over the document feature vectors (for example, with the K-Means algorithm) to form multiple classes, and each class is labeled with ciphertext words. The ciphertext words are linked to the ciphertext and summary of each document.
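The storage clustering step can be illustrated with a small K-Means run over feature vectors. This is a sketch under stated assumptions, not the patent's implementation: the four vectors and the ciphertext file names other than "ff.txt" are invented, and the first `k` vectors are used as initial centroids purely to keep the run reproducible.

```python
import math

def kmeans(vectors, k, iterations=10):
    """Plain K-Means; returns one cluster id per input vector."""
    # Assumption: deterministic start from the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical server state: ciphertext document name -> feature vector.
features = {
    "ff.txt": [1, 1, 1, 1, 0, 0, 0],
    "hh.txt": [0, 0, 0, 0, 1, 1, 1],
    "gg.txt": [1, 1, 0, 1, 0, 0, 0],
    "ii.txt": [0, 0, 0, 0, 1, 1, 0],
}
names = list(features)
labels = kmeans([features[n] for n in names], k=2)

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
```

The server never needs the plaintext for this step: it works only on the numeric vectors and refers to documents by their ciphertext names.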
The search clustering method is shown in Fig. 3. The client submits a keyword, computes the keyword's near-synonyms, related words and synonyms, encrypts the keyword together with these words, and passes the ciphertexts to the server end. Using the ciphertexts of the keyword, near-synonyms, related words and synonyms, the search clustering module at the server end collects the classes associated with these ciphertexts and performs secondary clustering. This secondary clustering is called search clustering.
Through search clustering, the server end returns the ciphertext summaries to the client. The client decrypts them to obtain the summary information of the documents, which provides the basis for further full-text retrieval.
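The search flow can be sketched as two small functions, one per side. Several things here are assumptions: the thesaurus, the placeholder ciphertext table that stands in for real encryption, the server-side index contents, and above all the secondary clustering rule, which the text does not define. The sketch simply groups hits by the set of query ciphertexts each document matched.

```python
from collections import defaultdict

# Client side. Both tables are hypothetical: a local thesaurus and a stand-in
# for real encryption that maps each term to a placeholder ciphertext.
THESAURUS = {"tomato": ["red fruit", "tomato plant"]}
PLACEHOLDER_CIPHERTEXT = {"tomato": "aa", "red fruit": "rr", "tomato plant": "pp"}

def build_query(keyword):
    """Expand the keyword locally, then send only the ciphertext of each term."""
    terms = [keyword] + THESAURUS.get(keyword, [])
    return [PLACEHOLDER_CIPHERTEXT[t] for t in terms]

# Server side. It sees ciphertext words, document names and summaries only.
def search(query, docs_by_word, summaries):
    """Group ciphertext summaries by the set of query ciphertexts matched."""
    matched = defaultdict(set)  # document name -> query ciphertexts it matched
    for token in query:
        for doc in docs_by_word.get(token, ()):
            matched[doc].add(token)
    groups = defaultdict(list)  # assumed secondary clusters
    for doc in sorted(matched):
        groups[tuple(sorted(matched[doc]))].append(summaries[doc])
    return dict(groups)

# Hypothetical server index and ciphertext summaries.
docs_by_word = {"aa": {"ff.txt", "gg.txt"}, "rr": {"gg.txt"}, "qq": {"hh.txt"}}
summaries = {"ff.txt": "ee", "gg.txt": "e2", "hh.txt": "e3"}

query = build_query("tomato")
result = search(query, docs_by_word, summaries)
```

The client would then decrypt the returned summaries locally; the grouping key gives it a rough indication of how each result relates to the expanded query.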
Description of drawings
Fig. 1: Client-side word-segmentation organizing method
Fig. 2: Server-end storage clustering method based on ciphertext words
Fig. 3: Server-end search clustering
Fig. 4: Example of the word-segmentation organizing method and clustering method for ciphertext search
Embodiment
The technical solution in the embodiment of the present invention is described clearly and completely below with reference to the accompanying drawings. The described embodiment is only one of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of this embodiment without creative work fall within the scope of protection of the present invention.
To illustrate the method, an example is given of storing and searching a document that describes tomatoes (tomato.txt). The document mainly states that a tomato is red, round and a vegetable. Fig. 4 shows this example of the word-segmentation organizing method and clustering method for ciphertext search.
With reference to the example and Fig. 4, the word-segmentation organizing method and clustering method for ciphertext search proceeds as follows:
(1) Word-segmentation stage. On the client's private device, the client word-segmentation module processes the document "tomato.txt". Segmentation divides the document into the words [tomato, red, round, vegetable, ...] and produces the document's feature vector [1, 1, 1, 1, 0, 0, 0, ...] and the summary "tomato is a red, round vegetable".
(2) Encryption stage. On the client's private device, the encryption module encrypts the segmented words [tomato, red, round, vegetable, ...], the summary "tomato is a red, round vegetable" and the document "tomato.txt" with an encryption algorithm (for example, RSA). This yields the ciphertext words [aa, bb, cc, dd, ...], the ciphertext summary "ee" and the ciphertext document "ff.txt", which are submitted to the clustering module at the server end.
(3) Storage clustering stage. On the server-end shared device, the storage clustering module links the ciphertext words [aa, bb, cc, dd, ...], the ciphertext summary "ee", the ciphertext document "ff.txt" and the feature vector [1, 1, 1, 1, 0, 0, 0, ...] to one another. It then clusters the feature vector [1, 1, 1, 1, 0, 0, 0, ...] with the other feature vectors on the server and names each resulting class after its central word, for example the class "xx" (vegetable) and the class "aa" (tomato).
(4) Search-word input stage. On the client, the word-segmentation module analyzes the search word entered by the user, assumed here to be "tomato". It looks up the near-synonyms (such as "red fruit"), related words (such as "tomato plant") and synonyms (such as "tomato") of the search word and encrypts them. The ciphertexts of the keyword "tomato", the near-synonym "red fruit" and the synonym (such as "tomato") are then sent to the server end.
(5) Search clustering stage. At the server end, the search clustering module analyzes the received ciphertexts of the keyword "tomato", the near-synonym "red fruit" and the synonym (such as "tomato") and looks up the classes of these words. After finding the main documents of these classes, it performs secondary clustering on them and sends the summaries of the clustered documents to the user's client.
(6) User decryption stage. On the client, the decryption module decrypts the search summaries returned in stage (5) and displays the decrypted summaries in the user interface in a suitable form, making it easier for the user to decide.
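Stages (2) and (3) depend on one property that is worth making explicit: the server can only match a query word against stored ciphertext words if the same plaintext word always yields the same ciphertext. The text names RSA as one possible algorithm; RSA as normally deployed uses randomized padding, so this sketch swaps in a keyed HMAC-SHA256 token from the Python standard library instead. That substitution, the key, and the `CiphertextStore` class are assumptions made for illustration. HMAC tokens are one-way, so they suit index words only; the document and summary would need a reversible cipher, which is not shown.

```python
import hashlib
import hmac
from collections import defaultdict

def word_token(key, word):
    """Deterministic keyed token for one segmented word (HMAC-SHA256, hex)."""
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()

class CiphertextStore:
    """Hypothetical server-side links between ciphertext items of a document."""

    def __init__(self):
        self.records = {}                # ciphertext document name -> record
        self.by_word = defaultdict(set)  # ciphertext word -> document names

    def add(self, doc_name, tokens, summary, vector):
        self.records[doc_name] = {
            "words": list(tokens),
            "summary": summary,
            "vector": list(vector),
        }
        for token in tokens:
            self.by_word[token].add(doc_name)

    def documents_for(self, token):
        return sorted(self.by_word.get(token, ()))

key = b"hypothetical-key-kept-on-the-client"
tokens = [word_token(key, w) for w in ["tomato", "red", "round", "vegetable"]]

store = CiphertextStore()
# "ee" and "ff.txt" are the placeholder ciphertexts from the example above.
store.add("ff.txt", tokens, "ee", [1, 1, 1, 1, 0, 0, 0])

# A later query for "tomato" yields the same token, so the server can match it.
hits = store.documents_for(word_token(key, "tomato"))
```

Deterministic tokens also let the server observe which words repeat across documents and queries, a trade-off the design accepts in exchange for ciphertext matching.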
Advantages of the present invention
The present invention proposes a word-segmentation organizing method and a clustering method for ciphertext search, applied to the clustering and search of a user's own data and documents. It differs substantially from the search classification and clustering used by public services: it protects the privacy of user data stored at the server end while using the server's large storage and computing capacity to cluster and search encrypted documents.
The main advantages of the present invention are:
(1) Client-side word segmentation solves the problem that the server end cannot perform word segmentation, feature extraction or summarization on encrypted documents.
(2) Encrypting the segmented words and the documents gives the user's documents a degree of privacy when stored at the server end.
(3) Storage clustering at the server end gives the encrypted documents a good index, which facilitates search and meets the requirements of big-data computing.
(4) Search clustering at the server end enables the user to search the data held at the server end.

Claims (1)

1. The present invention proposes a word-segmentation organizing method and a clustering method for ciphertext search. The client performs word segmentation, feature extraction and summarization on the user's own private documents and encrypts the segmented words, the documents and the summaries. The server end performs storage clustering on the encrypted documents and stores them, responds to the user's search requests, and performs secondary clustering on the ciphertext.
The principal features of the present invention are:
(1) Client-based document word segmentation. Traditional document word segmentation is performed at the server end; the segmentation method of the present invention protects the privacy of the client's documents, which remain unknown to the shared device of the server.
(2) Storage clustering based on ciphertext. Traditional storage clustering clusters plaintext at the server end; the present invention performs storage clustering on ciphertext.
(3) Search secondary clustering based on ciphertext. Traditional search clustering performs secondary clustering on plaintext at the server end; the present invention performs search secondary clustering on ciphertext.
CN2012102227877A 2012-06-28 2012-06-28 Word-segmentation organizing method and clustering method for ciphertext search Pending CN102708216A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012102227877A CN102708216A (en) 2012-06-28 2012-06-28 Word-segmentation organizing method and clustering method for ciphertext search

Publications (1)

Publication Number Publication Date
CN102708216A true CN102708216A (en) 2012-10-03

Family

ID=46900982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012102227877A Pending CN102708216A (en) 2012-06-28 2012-06-28 Word-segmentation organizing method and clustering method for ciphertext search

Country Status (1)

Country Link
CN (1) CN102708216A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1588365A (en) * 2004-08-02 2005-03-02 中国科学院计算机网络信息中心 Ciphertext global search technology
CN101520800A (en) * 2009-03-27 2009-09-02 华中科技大学 Cryptogram-based safe full-text indexing and retrieval system
CN101694672A (en) * 2009-10-16 2010-04-14 华中科技大学 Distributed safe retrieval system
CN102024054A (en) * 2010-12-10 2011-04-20 中国科学院软件研究所 Ciphertext cloud-storage oriented document retrieval method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭利刚: "密文全文检索系统的研究与实现", 《中国优秀硕士学位论文全文数据库(电子期刊)》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103024035B (en) * 2012-12-11 2015-04-15 上海交通大学 Safe and energy-saving encryption searching method based on mobile cloud platform
CN103024035A (en) * 2012-12-11 2013-04-03 上海交通大学 Safe and energy-saving encryption searching method based on mobile cloud platform
CN103595730A (en) * 2013-11-28 2014-02-19 中国科学院信息工程研究所 Ciphertext cloud storage method and system
CN103595730B (en) * 2013-11-28 2016-06-08 中国科学院信息工程研究所 A kind of ciphertext cloud storage method and system
CN103927340A (en) * 2014-03-27 2014-07-16 中国科学院信息工程研究所 Ciphertext retrieval method
CN103927340B (en) * 2014-03-27 2017-06-27 中国科学院信息工程研究所 A kind of cipher text retrieval method
CN108513659B (en) * 2016-12-29 2021-12-28 谷歌有限责任公司 Searching and retrieving keying data maintained using a keying database
CN108513659A (en) * 2016-12-29 2018-09-07 谷歌有限责任公司 The key data that search and retrieval are maintained using key data library
US11372946B2 (en) 2016-12-29 2022-06-28 Google Llc Search and retrieval of keyed data maintained using a keyed database
CN107241182A (en) * 2017-06-29 2017-10-10 电子科技大学 A kind of secret protection hierarchy clustering method based on vectorial homomorphic cryptography
CN109495565B (en) * 2018-11-14 2021-11-30 中国科学院上海微系统与信息技术研究所 High-concurrency service request processing method and device based on distributed ubiquitous computing
CN109495565A (en) * 2018-11-14 2019-03-19 中国科学院上海微系统与信息技术研究所 High concurrent service request processing method and equipment based on distributed ubiquitous computation
CN113239393A (en) * 2021-04-29 2021-08-10 重庆邮电大学 Longitudinal federal k-Means privacy protection method and device and electronic equipment
CN113239393B (en) * 2021-04-29 2022-03-22 重庆邮电大学 Longitudinal federal k-Means privacy protection method and device and electronic equipment


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121003