CN117349879A - Text data anonymization privacy protection method based on continuous bag-of-words model

Publication number
CN117349879A
Authority
CN
China
Prior art keywords
text data
word
text
words
anonymizing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311165334.XA
Other languages
Chinese (zh)
Inventor
吴萍
郭海宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Hankang Dongyou Information Technology Co ltd
Original Assignee
Jiangsu Hankang Dongyou Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-09-11
Filing date
2023-09-11
Publication date
2024-01-05
Application filed by Jiangsu Hankang Dongyou Information Technology Co ltd
Priority to CN202311165334.XA
Publication of CN117349879A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a text data anonymization privacy protection method based on a continuous bag-of-words model, relating to the technical field of data security protection. When an attacker mounts a linking attack on a feature, the selected data are associated with other features in the table, so the attacker has difficulty determining the correspondence between a user and that feature and cannot determine the user's specific identity from a subset of the features. The privacy of the data to be published is thereby protected and anonymization is achieved. The method addresses the problem of user data privacy protection in the technical field of data security and privacy protection.

Description

Text data anonymization privacy protection method based on continuous bag-of-words model
Technical Field
The invention relates to the technical field of data security protection, and in particular to a text data anonymization privacy protection method based on a continuous bag-of-words model.
Background
With the rapid development of information technology, complete data has become a prerequisite for the development of many industries, and data sharing has become one of the popular applications of cloud storage. However, because the data itself is highly valuable, its security during sharing is a growing concern. Malicious users, malicious cloud storage servers, and hackers can pry into users' privacy by various means, most commonly by mining sensitive information from data that users have published. As decision-making becomes more data-driven, people publish and share data more frequently, and the value of data, especially private data, keeps increasing. At the same time, a large number of serious private-data leaks have occurred during data publication for information sharing and data mining.
How to select data features effectively and choose an efficient data anonymization model, so as to achieve effective anonymized privacy protection, is a problem that needs to be solved. To this end, a text data anonymization privacy protection method based on a continuous bag-of-words model is provided.
Disclosure of Invention
The invention aims to provide a text data anonymization privacy protection method based on a continuous bag-of-words model.
The aim of the invention can be achieved by the following technical scheme: a text data anonymization privacy protection method based on a continuous bag-of-words model, comprising the following steps:
step S1: constructing a text feature information base, and establishing a corresponding text definition model according to the constructed text feature information base;
step S2: acquiring the text data to be anonymously protected, and preprocessing the acquired text data to complete its information standardization;
step S3: inputting the standardized text data into the text definition model to complete feature extraction of the text data;
step S4: anonymizing the text data for which feature extraction has been completed.
Further, the text feature information base comprises an English feature information base and a Chinese feature information base, and corresponding ontology construction rules are respectively imported into the English feature information base and the Chinese feature information base.
Further, the ontology construction rules are customized according to user requirements: the user can select the required elements to form a new ontology construction rule, the new rule is recorded as the user-customized construction rule, and a corresponding text definition model is built according to it.
Further, the preprocessing process for the text data comprises the following steps:
marking text data which is required to be anonymously protected by a user, obtaining the type of the text data, and executing corresponding information standardization operation on the text data according to the type of the text data, wherein the type of the text data comprises a Chinese type and an English type.
Further, when the type of the text data is the Chinese type, the information standardization operation performed on the text data is:
setting a stop word list containing a plurality of entries, marking the specific words in the text data that match entries in the stop word list, and removing those words from the text data;
setting a symbol table and a character table, wherein the symbol table contains a plurality of punctuation marks and the character table contains a plurality of special characters;
and removing the punctuation marks and special characters from the text data according to the symbol table and the character table.
Further, when the type of the text data is the English type, the information standardization operation performed on the text data is:
setting an English root list containing a plurality of English roots, and setting an associated word list for each English root according to actual conditions, wherein each associated word list contains at least one associated word;
performing word variant reduction on the obtained text data according to the English root list;
and then converting all uppercase words or letters in the text data to lowercase.
Further, the feature extraction process of the text data includes:
extracting the features of the text data to obtain the corresponding text features, and generating a corresponding word vector dictionary table from the obtained text features;
denoting the number of words contained in the text data as n and numbering each word i, where i = 1, 2, …, n, n is an integer, and n > 0;
defining the text data through the Word Embedding algorithm as $S = \{w_i\}$, where $w_i$ is the word vector representing the word numbered i in the text data; using the embedding matrix $E = \{e_i\}$ of the Word Embedding algorithm, each word $w_i$ in the text data is mapped to a multi-dimensional, continuous-valued word vector $x_i$, giving the word vector matrix $X = \{x_i\}$ of the text data, where $x_i = (E \times w_i) \in \mathbb{R}^d$ and d denotes the dimension of the word vector;
after feature extraction of the text data is completed, the text data is anonymized.
Further, the process of anonymizing the text data includes:
labeling the obtained word vector matrix, selecting word vectors from the word vector matrix with a greedy forward sequential search algorithm according to the label order of the word vectors, marking the selected word vectors as anonymous features, and anonymously protecting those features. Because the selected data features are associated with other features, an attacker who attacks a selected feature has difficulty determining the correspondence between a user and that feature, and therefore cannot determine the user's specific identity from a subset of the features.
Compared with the prior art, the invention has the following beneficial effects: the text data is preprocessed to complete its information standardization, an ontology of the keywords in the text is constructed, the features of the text data are extracted, and anonymous features are finally selected with a greedy forward sequential search algorithm. When an attacker mounts a linking attack on a feature, the selected data are associated with other features in the table, so the correspondence between a user and that feature is difficult to determine and the attacker cannot determine the user's specific identity from a subset of the features. The privacy of the data to be published is thereby protected and anonymization is achieved. The method addresses the problem of user data privacy protection in the technical field of data security and privacy protection.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawing needed for the embodiments is briefly described below. The drawing described below shows only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from it.
FIG. 1 is a flow chart of the present invention.
Detailed Description
As shown in fig. 1, the text data anonymization privacy protection method based on the continuous bag-of-words model comprises the following steps:
step S1: constructing a text feature information base, and establishing a corresponding text definition model according to the constructed text feature information base;
step S2: acquiring text data to be anonymously protected, preprocessing the acquired text data, and completing information standardization of the acquired text data;
step S3: inputting the text data with standardized information into a text definition model, and finishing the feature extraction of the text data;
step S4: anonymizing the text data for which feature extraction has been completed.
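The way steps S1 to S4 feed into one another can be sketched as a simple chain of functions. This is only an illustrative skeleton: the function name `run_pipeline` and the four toy stages passed to it are assumptions made so the example runs, and the masking with `*` is a stand-in for whatever anonymous protection the method actually applies.

```python
def run_pipeline(raw_text, build_model, preprocess, extract, anonymize):
    """Chain steps S1 to S4; each stage is supplied by the caller."""
    model = build_model()                    # S1: text definition model
    normalized = preprocess(raw_text)        # S2: information standardization
    features = extract(model, normalized)    # S3: feature extraction
    return anonymize(features)               # S4: anonymization


# Toy stand-ins, invented purely so the sketch runs end to end.
result = run_pipeline(
    "Alice lives at Main Street",
    # Hypothetical model: a set of words tied to ontology elements.
    build_model=lambda: {"alice", "main", "street"},
    preprocess=lambda text: text.lower().split(),
    extract=lambda model, words: [(w, w in model) for w in words],
    anonymize=lambda feats: ["*" if hit else w for w, hit in feats],
)
print(result)  # ['*', 'lives', 'at', '*', '*']
```

Passing the stages in as callables keeps the order of the steps explicit while leaving each step free to be replaced by the Chinese or English variant described later.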
It should be further noted that, in the implementation process, the text feature information base includes an English feature information base and a Chinese feature information base;
corresponding ontology construction rules are imported into the English feature information base and the Chinese feature information base respectively, each ontology construction rule being expressed in terms of three class sets,
where $C_i$ denotes the ontology concept class set, which consists of a plurality of ontology concept elements; $A_j$ denotes the ontology attribute class set, which consists of a plurality of ontology attribute elements; $R_k$ denotes the ontology relation class set, which consists of a plurality of ontology relation elements; i = 1, 2, …, c; j = 1, 2, …, a; k = 1, 2, …, r; and c, a and r are integers greater than 0;
a corresponding vocabulary list is established for the elements in each set and associated with the corresponding elements; each vocabulary list consists of a plurality of words;
in the specific implementation process, the ontology construction rules are customized according to the specific requirements of the user: the user can select the required elements from each set to form a new ontology construction rule, the new rule is recorded as the user-customized construction rule, and a corresponding text definition model is built according to it;
For example:
suppose the user-customized construction rule is built from the sets
$C_i = [C_1, C_2, C_3]$; $A_j = [A_1, A_2, A_3]$; $R_k = [R_1, R_2, R_3]$;
let the ontology concept elements corresponding to $C_1$, $C_2$, $C_3$ be defined as name, address and telephone number, respectively;
then the ontology concept class set in the Chinese feature information base is $C_i$ = [name, address, telephone number], and the ontology concept class set in the English feature information base is $C_i$ = [Name, Address, TelephoneNumber];
similarly, let the ontology attribute elements corresponding to $A_1$, $A_2$, $A_3$ be the name class attribute, the address class attribute and the telephone class attribute;
then the ontology attribute class set in the Chinese feature information base is $A_j$ = [name class attribute, address class attribute, telephone class attribute], and the ontology attribute class set in the English feature information base is $A_j$ = [NameType, AddressType, TelephoneNumberType];
similarly, let $R_1$, $R_2$, $R_3$ be defined as the friend relationship, the family relationship and the colleague relationship;
then the ontology relation class set in the Chinese feature information base is $R_k$ = [friend relationship, family relationship, colleague relationship], and the ontology relation class set in the English feature information base is $R_k$ = [Friend, Family, Colleague];
the vocabulary associated with each element is then obtained from the elements contained in each set, completing the construction of the text definition model.
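One possible in-memory representation of the customized construction rule in the example above is sketched here. The class name `OntologyRule`, its field names, and the sample vocabulary words are all hypothetical; the description only states that each rule comprises concept, attribute and relation class sets with vocabulary lists associated with their elements.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyRule:
    """A user-customized construction rule made of three class sets."""
    concepts: list[str]      # C_i: ontology concept elements
    attributes: list[str]    # A_j: ontology attribute elements
    relations: list[str]     # R_k: ontology relation elements
    # Vocabulary list associated with each element (element -> words).
    vocabulary: dict[str, list[str]] = field(default_factory=dict)

    def associate(self, element: str, words: list[str]) -> None:
        """Attach words to one element of any of the three sets."""
        if element not in self.concepts + self.attributes + self.relations:
            raise KeyError(f"unknown ontology element: {element}")
        self.vocabulary.setdefault(element, []).extend(words)


# English feature information base, using the element names from the example.
english_rule = OntologyRule(
    concepts=["Name", "Address", "TelephoneNumber"],
    attributes=["NameType", "AddressType", "TelephoneNumberType"],
    relations=["Friend", "Family", "Colleague"],
)
# The word lists below are invented for illustration only.
english_rule.associate("Address", ["street", "road", "avenue"])
english_rule.associate("Family", ["mother", "father", "sister"])
```

A second `OntologyRule` instance with Chinese element names would play the same role for the Chinese feature information base.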
It should be further noted that, in the implementation process, the preprocessing process for the text data includes:
marking the text data that the user needs to protect anonymously, obtaining the type of the text data, and performing the corresponding information standardization operation according to that type; the type of the text data is either the Chinese type or the English type;
when the type of the text data is the Chinese type, the information standardization operation performed on the text data is as follows:
stop word removal and symbol removal are performed on the obtained text data in sequence to complete the information standardization operation;
the stop word removal operation is specifically:
setting a stop word list containing a plurality of entries, marking the specific words in the text data that match entries in the stop word list, and removing those words from the text data; the specific words are words that carry no substantive meaning for the content of the text data in practical applications, such as pronouns, articles, prepositions, conjunctions and modal verbs;
the symbol removal operation is specifically:
setting a symbol table and a character table, wherein the symbol table contains a plurality of punctuation marks and the character table contains a plurality of special characters;
removing the punctuation marks and special characters from the text data according to the symbol table and the character table to complete the information standardization operation;
removing stop words and symbols from Chinese-type text data streamlines its content: the volume of the text data is reduced while its original content and meaning are preserved, which reduces unnecessary computation and improves the computational efficiency of the system;
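The two Chinese-type operations (stop word removal followed by symbol removal) can be sketched as below. The sketch assumes the text has already been segmented into tokens, which the description does not cover, and the entries of `STOP_WORDS`, `SYMBOL_TABLE` and `CHARACTER_TABLE` are hypothetical examples rather than the tables used by the method.

```python
STOP_WORDS = {"的", "了", "和", "在"}          # hypothetical stop word list entries
SYMBOL_TABLE = set("，。！？；：、（）")        # hypothetical punctuation marks
CHARACTER_TABLE = set("#@&*%")                # hypothetical special characters


def normalize_chinese(tokens):
    """Remove stop words, then strip punctuation and special characters."""
    drop = SYMBOL_TABLE | CHARACTER_TABLE
    result = []
    for token in tokens:
        if token in STOP_WORDS:
            continue  # the token matches an entry in the stop word list
        cleaned = "".join(ch for ch in token if ch not in drop)
        if cleaned:  # skip tokens that consisted only of symbols
            result.append(cleaned)
    return result


# Assumes segmentation has been done beforehand (not part of this sketch).
tokens = ["张三", "的", "电话", "在", "合同", "里", "。", "#备注"]
print(normalize_chinese(tokens))  # ['张三', '电话', '合同', '里', '备注']
```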
when the type of the text data is the English type, the information standardization operation performed on the text data is as follows:
setting an English root list containing a plurality of English roots, and setting an associated word list for each English root according to actual conditions, wherein each associated word list contains at least one associated word;
performing word variant reduction on the obtained text data according to the English root list; word variant reduction means that when a word in the text data is an associated word in the associated word list of some English root in the English root list, the word is restored to that English root;
converting all uppercase words or letters in the text data to lowercase;
word variant reduction makes the content format of English-type text data consistent and reduces the volume of the text data while preserving its original content, thereby reducing the computational load on the system;
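A minimal sketch of the English-type operations follows: a root list whose associated word lists drive word variant reduction, then lowercasing. The contents of `ROOT_LIST` are invented for illustration, and matching variants case-insensitively is an assumption of this sketch, since the description does not say how capitalized variants are matched.

```python
# Hypothetical root list: each English root maps to its associated word list.
ROOT_LIST = {
    "run": ["running", "ran", "runs"],
    "address": ["addresses", "addressed"],
    "call": ["called", "calling", "calls"],
}

# Invert the root list once so each associated word points at its root.
VARIANT_TO_ROOT = {
    variant: root for root, variants in ROOT_LIST.items() for variant in variants
}


def normalize_english(text):
    """Reduce word variants to their roots, then lowercase every word."""
    normalized = []
    for word in text.split():
        # Assumption: variants are matched case-insensitively.
        root = VARIANT_TO_ROOT.get(word.lower(), word)
        normalized.append(root.lower())
    return normalized


print(normalize_english("Alice Called two Addresses"))
# ['alice', 'call', 'two', 'address']
```

Inverting the root list up front makes each lookup a single dictionary access instead of a scan over every associated word list.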
the preprocessed text data is imported into the text definition model to complete feature extraction of the text data; the specific process comprises:
extracting the features of the text data to obtain the corresponding text features, and generating a corresponding word vector dictionary table from the obtained text features, wherein the word vector dictionary table has 100 dimensions and each dimension corresponds to one text feature;
denoting the number of words contained in the text data as n and numbering each word i, where i = 1, 2, …, n, n is an integer, and n > 0;
defining the text data through the Word Embedding algorithm as $S = \{w_i\}$, where $w_i$ is the word vector representing the word numbered i in the text data; using the embedding matrix $E = \{e_i\}$ of the Word Embedding algorithm, each word $w_i$ in the text data is mapped to a multi-dimensional, continuous-valued word vector $x_i$, giving the word vector matrix $X = \{x_i\}$ of the text data, where $x_i = (E \times w_i) \in \mathbb{R}^d$, and d denotes the dimension of the word vector and is determined by the user-customized construction rule;
after feature extraction of the text data is completed, the text data is anonymized.
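The mapping $x_i = E \times w_i$ can be illustrated with a small pure-Python sketch. It assumes that each $w_i$ is a one-hot vector over the vocabulary and that E is a d-by-|V| matrix, so the product simply selects one column of E; the description does not state the encoding of $w_i$ explicitly. The embedding values are random placeholders rather than trained weights, and d = 4 is used instead of 100 to keep the example small.

```python
import random


def build_vocab(words):
    """Assign each distinct word an index in order of first appearance."""
    vocab = {}
    for word in words:
        vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(index, size):
    """Return w_i as a one-hot vector (an assumption of this sketch)."""
    return [1.0 if j == index else 0.0 for j in range(size)]


def embed(words, E, vocab):
    """Compute x_i = E x w_i for each word; E has d rows and |V| columns."""
    X = []
    for word in words:
        w = one_hot(vocab[word], len(vocab))
        x = [sum(e_row[j] * w[j] for j in range(len(w))) for e_row in E]
        X.append(x)
    return X


words = ["alice", "call", "two", "address", "call"]
vocab = build_vocab(words)
d = 4  # illustrative only; the description uses a 100-dimensional table
rng = random.Random(0)
# Placeholder embedding matrix with random values, not trained weights.
E = [[rng.uniform(-1, 1) for _ in vocab] for _ in range(d)]
X = embed(words, E, vocab)  # word vector matrix: one d-dimensional row per word
```

Because `call` appears twice, rows 1 and 4 of `X` are identical, which shows that the word vector depends only on the word and not on its position.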
It should be further noted that, in the implementation process, the process of anonymizing the text data includes:
marking the obtained word vector matrix, selecting the word vector in the word vector matrix through a greedy algorithm of forward sequence search according to the label sequence of each word vector in the word vector matrix, marking the selected word vector as an anonymous feature, and anonymously protecting the anonymous feature.
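A greedy forward sequential search can be sketched as follows. The description does not state the selection criterion, so the `score` callable, the budget `k`, and the per-label `sensitivity` values are hypothetical; the sketch shows only the general technique of adding, one at a time, the label that most improves the score, visiting candidates in label order.

```python
def forward_select(labels, score, k):
    """Greedy sequential forward selection of up to k labels.

    Starting from an empty set, each round adds the label whose inclusion
    gives the highest score; ties go to the earlier label in sequence.
    """
    selected = []
    remaining = list(labels)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda lab: score(selected + [lab]))
        if selected and score(selected + [best]) <= score(selected):
            break  # no remaining candidate improves the score
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical sensitivity weight for each labeled word vector.
sensitivity = {1: 0.9, 2: 0.1, 3: 0.7, 4: 0.0}


def score(subset):
    """Hypothetical criterion: total sensitivity of the chosen labels."""
    return sum(sensitivity[label] for label in subset)


anonymous_features = forward_select([1, 2, 3, 4], score, k=2)
print(anonymous_features)  # [1, 3]
```

The word vectors whose labels are returned would then be marked as anonymous features; how they are protected afterwards is outside the scope of this sketch.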
In summary, the text data is preprocessed to complete its information standardization, an ontology of the keywords in the text is constructed, the features of the text data are extracted, and anonymous features are finally selected with a greedy forward sequential search algorithm. When an attacker mounts a linking attack on a feature, the selected data are associated with other features in the table, so the correspondence between a user and that feature is difficult to determine and the attacker cannot determine the user's specific identity from a subset of the features. The privacy of the data to be published is thereby protected and anonymization is achieved. The method addresses the problem of user data privacy protection in the technical field of data security and privacy protection.
The above embodiments are only for illustrating the technical method of the present invention and not for limiting the same, and it should be understood by those skilled in the art that the technical method of the present invention may be modified or substituted without departing from the spirit and scope of the technical method of the present invention.

Claims (8)

1. A text data anonymization privacy protection method based on a continuous bag-of-words model, characterized by comprising the following steps:
step S1: constructing a text feature information base, and establishing a corresponding text definition model according to the constructed text feature information base;
step S2: acquiring text data to be anonymously protected, preprocessing the acquired text data, and completing information standardization of the acquired text data;
step S3: inputting the text data with standardized information into a text definition model, and finishing the feature extraction of the text data;
step S4: anonymizing the text data for which feature extraction has been completed.
2. The privacy protection method for anonymizing text data based on continuous bag-of-words model according to claim 1, wherein the text feature information base comprises an english feature information base and a chinese feature information base, and the corresponding ontology construction rules are respectively imported into the english feature information base and the chinese feature information base.
3. The privacy protection method for anonymizing text data based on continuous bag-of-words models according to claim 2, wherein the ontology construction rules are customized according to user requirements, a user can select required elements according to the requirements to form new ontology construction rules, the formed new ontology construction rules are recorded as user customization construction rules, and a corresponding text definition model is built according to the formed user customization construction rules.
4. The privacy preserving method for anonymizing text data based on continuous bag of words model as claimed in claim 3, wherein the preprocessing procedure for the text data comprises:
marking text data which is required to be anonymously protected by a user, obtaining the type of the text data, and executing corresponding information standardization operation on the text data according to the type of the text data, wherein the type of the text data comprises a Chinese type and an English type.
5. The privacy preserving method of anonymizing text data based on continuous bag of words model as claimed in claim 4, wherein when the type of text data is the Chinese type, the information standardization operation performed on the text data is:
setting a stop word list, wherein a plurality of entries are arranged in the stop word list, marking specific words in text data according to the entries in the stop word list, and eliminating the specific words from the text data;
setting a symbol table and a character table, wherein a plurality of punctuation marks are arranged in the symbol table, and a plurality of characters are arranged in the character table;
and eliminating punctuation marks and special characters in the text data according to the symbol table and the character table.
6. The privacy preserving method for anonymizing text data based on continuous bag of words model as claimed in claim 5, wherein when the type of text data is the English type, the information standardization operation performed on the text data is:
setting an English root list containing a plurality of English roots, and setting an associated word list for each English root according to actual conditions, wherein each associated word list contains at least one associated word;
performing word variant reduction on the obtained text data according to the English root list;
and then converting all capitalized words or letters in the text data into lowercase words or letters.
7. The privacy preserving method for anonymizing text data based on continuous bag of words model as claimed in claim 6, wherein the feature extraction process of the text data comprises:
extracting text data features to obtain corresponding text features, generating a corresponding word vector dictionary table according to the obtained text features,
the number of words contained in the text data is noted as n, and each word is numbered i, wherein i=1, 2, … …, n, n is an integer, and n > 0;
the text data is defined through the Word Embedding algorithm as $S = \{w_i\}$, where $w_i$ is the word vector representing the word numbered i in the text data; using the embedding matrix $E = \{e_i\}$ of the Word Embedding algorithm, each word $w_i$ in the text data is mapped to a multi-dimensional, continuous-valued word vector $x_i$, giving the word vector matrix $X = \{x_i\}$ of the text data, where $x_i = (E \times w_i) \in \mathbb{R}^d$ and d denotes the dimension of the word vector;
after feature extraction of the text data is completed, anonymizing the text data is carried out.
8. The privacy preserving method for anonymizing text data based on continuous bag of words model as recited in claim 7, wherein the anonymizing text data comprises:
marking the obtained word vector matrix, selecting the word vector in the word vector matrix through a greedy algorithm of forward sequence search according to the label sequence of each word vector in the word vector matrix, marking the selected word vector as an anonymous feature, and anonymously protecting the anonymous feature.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311165334.XA CN117349879A (en) 2023-09-11 2023-09-11 Text data anonymization privacy protection method based on continuous word bag model

Publications (1)

Publication Number Publication Date
CN117349879A true CN117349879A (en) 2024-01-05

Family

ID=89356485

Country Status (1)

Country Link
CN (1) CN117349879A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101964034A (en) * 2010-09-30 2011-02-02 浙江大学 Privacy protection method for mode information loss minimized sequence data
CN109960727A (en) * 2019-02-28 2019-07-02 天津工业大学 For the individual privacy information automatic testing method and system of non-structured text
CN111079174A (en) * 2019-11-21 2020-04-28 中国电力科学研究院有限公司 Power consumption data desensitization method and system based on anonymization and differential privacy technology
CN112733186A (en) * 2020-12-31 2021-04-30 上海竞动科技有限公司 User privacy data analysis method and device
CN112507388A (en) * 2021-02-05 2021-03-16 支付宝(杭州)信息技术有限公司 Word2vec model training method, device and system based on privacy protection
CN113743496A (en) * 2021-09-01 2021-12-03 北京工业大学 K-anonymous data processing method and system based on cluster mapping
CN115881257A (en) * 2022-03-03 2023-03-31 杨文宝 User privacy protection method and system applied to big data
CN115858785A (en) * 2022-12-06 2023-03-28 北京安信天行科技有限公司 Sensitive data identification method and system based on big data
CN115712703A (en) * 2022-12-26 2023-02-24 合肥随铥互联网科技有限公司 Decision analysis method and server applied to big data anonymous processing
CN115982765A (en) * 2022-12-28 2023-04-18 中移信息技术有限公司 Data desensitization method, device, equipment and computer readable storage medium
CN116305249A (en) * 2023-02-09 2023-06-23 苏州科技大学 Privacy information protection system and method for text data transmission
CN116502258A (en) * 2023-03-16 2023-07-28 上海梅斯医药科技有限公司 Sensitive information desensitization and recognition system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
(意)吉安卡洛·扎克尼(GIANCARLOZACCONE): "《TensorFlow深度学习》", 29 February 2020, pages: 170 - 172 *
李天舟: "文本文档中敏感信息发现及脱敏方法研究", 《北京交通大学》, 15 March 2021 (2021-03-15) *
王亚欣: "基于文本内容的敏感信息识别", 《兰州大学》, 15 January 2023 (2023-01-15) *
黄天元等: "《文本数据挖掘 基于R语言》", 30 April 2021, pages: 86 - 88 *

Similar Documents

Publication Publication Date Title
CN109815742B (en) Data desensitization method and device
Yadav et al. A novel approach of bulk data hiding using text steganography
Wang et al. A coverless plain text steganography based on character features
CN105426445A (en) Format-preserving data desensitization method
US10706160B1 (en) Methods, systems, and articles of manufacture for protecting data in an electronic document using steganography techniques
CN112541196B (en) Dynamic data desensitization method and system
CN111079386B (en) Address recognition method, device, equipment and storage medium
CN105512523B (en) The digital watermark embedding and extracting method of a kind of anonymization
Khairullah A novel text steganography system using font color of the invisible characters in microsoft word documents
CA2928836A1 (en) Methods and apparatuses of digital data processing
CN112328735A (en) Hot topic determination method and device and terminal equipment
US8750605B2 (en) Searchable color encoded file composing method and searchable color encoded file system
CN107451036A (en) Input reminding method, device and equipment
Kilichev et al. Errors in SMS to hide short messages
Rafat et al. Secure digital steganography for ASCII text documents
CN113129875A (en) Voice data privacy protection method based on countermeasure sample
WO2024066271A1 (en) Database watermark embedding method and apparatus, database watermark tracing method and apparatus, and electronic device
CN117349879A (en) Text data anonymization privacy protection method based on continuous word bag model
CN111881480A (en) Private data encryption method and device, computer equipment and storage medium
CN104363348B (en) Information data processing method and processing device
CN108985759B (en) Address generating method, system, equipment and storage medium for cryptocurrency
CN104750665A (en) Text message processing method and text message processing device
CN112507388B (en) Word2vec model training method, device and system based on privacy protection
CN115712722A (en) Clustering system, method, electronic device and storage medium for multi-language short message text
Granados et al. Is the contextual information relevant in text clustering by compression?

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination