CN112364165A - Automatic classification method based on Chinese privacy policy terms - Google Patents


Info

Publication number
CN112364165A
Authority
CN
China
Prior art keywords
data
privacy policy
data set
word
classification
Prior art date
2020-11-12
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011261262.5A
Other languages
Chinese (zh)
Inventor
朱璋颖
陆亦恬
唐祝寿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Benzhong Information Technology Co ltd
Original Assignee
Shanghai Benzhong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-11-12
Filing date
2020-11-12
Publication date
2021-02-12
Application filed by Shanghai Benzhong Information Technology Co ltd
Priority to CN202011261262.5A
Publication of CN112364165A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an automatic classification method based on Chinese privacy policy terms, belonging to the technical field of natural language processing. The method comprises the following steps: data processing: acquiring the privacy policies of a plurality of applications as a data set, manually labeling the data set to obtain a labeled data set, and then cleaning it to obtain a training sample data set; training data: performing feature selection on the training sample data set, selecting effective features capable of distinguishing different clauses and categories, and establishing a detection model; data detection: classifying the clauses of a received privacy policy text and determining whether the privacy policy text is complete. By automatically classifying privacy policy terms, the method quickly sorts privacy policy content under the attributes of the various classification categories, which makes the policy easier for users to read and understand; it also realizes completeness detection of privacy policy terms, so that users can quickly identify whether a privacy policy is complete.

Description

Automatic classification method based on Chinese privacy policy terms
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to an automatic classification method based on Chinese privacy policy terms.
Background
As the number of app users grows, incidents in which users' private data is leaked keep occurring. An app privacy policy is a statement disclosing how user data is collected, used, shared, and managed, and is a self-regulatory measure taken by app operators that collect user information. However, privacy policies are long, difficult to understand, and time-consuming to read, so most users accept them without reading them; problems hidden in a privacy policy can therefore become weaknesses in users' privacy protection. To help users read Chinese privacy policies and to reflect the quality of those policies, the content of Chinese privacy policy terms needs to be classified automatically.
In the prior art, the content of Chinese privacy policies is analyzed with the content analysis method: the content is classified and counted, the analytical dimensions and their interrelations are summarized, and comparisons are made against the research objectives to draw conclusions about the current state of Chinese privacy policies. Coding is a key step of content analysis, but coding a large amount of content is cumbersome, and errors introduced by manual coding lower the internal validity of the analysis. A simple and effective detection method is therefore needed that finds problems in Chinese privacy policies quickly, accurately, and automatically, and improves their readability for users.
Disclosure of Invention
The object of the invention is to provide an automatic classification method based on Chinese privacy policy terms, so as to solve the technical problems in the prior art that coding a large amount of content is cumbersome and that content analysis has low internal validity.
To achieve the above object, the invention adopts the following technical solution: an automatic classification method based on Chinese privacy policy terms, comprising the following steps:
data processing: acquiring the privacy policies of a plurality of applications as a data set, labeling the clauses of the privacy policies to obtain a labeled data set, and then cleaning the data set to obtain a training sample data set;
training data: performing feature selection on the training sample data set, selecting effective features capable of distinguishing different clauses and categories, training a classifier on the feature vectors of the clauses and categories, and establishing a detection model;
data detection: receiving a privacy policy text through the detection model, classifying the clause content of the privacy policy text under the attributes of the various categories, and determining whether the privacy policy text is complete.
Further, the data processing includes:
acquiring data;
establishing a data annotation standard according to the requirements of laws and regulations, the data annotation standard including all clauses that laws and regulations require a privacy policy to cover;
labeling the data;
removing noise words from the data and performing word segmentation with a word segmentation tool to obtain a labeled, segmented clause data set.
Further, the data annotation standard comprises a plurality of classification categories, the classification categories including at least one of: first-party collection/use; sharing/transfer/disclosure with third parties; data security; user access/editing/deletion methods; term changes; terms targeting specific groups of people; and other general information.
Further, the data annotation standard contains 7 classification categories, 50 attributes and 91 values.
Further, the training data step comprises:
performing feature selection on the training sample data set through a TF-IDF algorithm, wherein the calculation formula is as follows:
$$\text{TF-IDF} = \text{TF} \times \text{IDF}$$
for the i-th word $t_i$, the TF formula is:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the j-th document $d_j$; the denominator is the sum of the occurrence counts of all words in $d_j$, $n_{k,j}$ being the number of occurrences of the k-th word in $d_j$; and $tf_{i,j}$ is the term frequency of $t_i$ in $d_j$;
the IDF formula is:
$$idf_i = \log_{10} \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $idf_i$ is the inverse document frequency of the word $t_i$, $|D|$ is the total number of documents in the corpus, and $|\{j : t_i \in d_j\}|$ is the number of documents containing $t_i$.
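The two formulas above can be computed directly. The following is a minimal Python sketch, assuming each clause has already been segmented into a list of words; the function names are illustrative rather than taken from the patent, and the base-10 logarithm follows the embodiment described later.

```python
import math
from collections import Counter

def tf(word, doc):
    """tf_{i,j}: occurrences of `word` in `doc` divided by the total number of words in `doc`."""
    counts = Counter(doc)
    return counts[word] / sum(counts.values())

def idf(word, corpus):
    """idf_i = log10(|D| / |{j : t_i in d_j}|) over a corpus of segmented documents."""
    containing = sum(1 for doc in corpus if word in doc)
    if containing == 0:
        raise ValueError(f"{word!r} does not occur in the corpus; idf is undefined")
    return math.log10(len(corpus) / containing)

def tf_idf(word, doc, corpus):
    """TF-IDF = TF x IDF for `word` in document `doc`."""
    return tf(word, doc) * idf(word, corpus)

# Toy corpus of segmented clauses (made up for illustration).
corpus = [
    ["收集", "信息", "收集"],
    ["共享", "信息"],
    ["安全", "加密"],
    ["收集", "目的"],
]
print(tf_idf("收集", corpus[0], corpus))  # (2/3) * log10(4/2)
```

A word that occurs in every document gets an IDF of 0, so it contributes nothing as a feature regardless of its term frequency.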
Further, the data detection includes:
calculating classification probabilities: for each category i, using the support vector machine classifier trained for category i to compute the probability that y = i for the privacy policy text, where i = 1, 2, 3, …, k and k is the number of categories;
selecting the category: for a given new input x, the category whose trained classifier gives the highest predicted probability that y = i is taken as the category of x.
Further, the word segmentation tool is a Jieba word segmentation tool.
Further, the noise words in the data are removed using the HIT (Harbin Institute of Technology) stop-word list.
Compared with the prior art, the automatic classification method based on Chinese privacy policy terms provided by the invention has the following beneficial effects: by automatically classifying privacy policy terms, it quickly sorts privacy policy content under the attributes of the various classification categories, which makes the policy easier for users to read and understand; it also realizes completeness detection of privacy policy terms, so that users can quickly identify whether a privacy policy is complete.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an automatic classification method based on Chinese privacy policy terms according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a classification stage of a support vector machine in an automatic classification method based on Chinese privacy policy terms according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the inputs and outputs of the three stages of support vector machine classification in the automatic classification method based on Chinese privacy policy terms according to an embodiment of the present invention.
Detailed Description
To make the technical problems to be solved, the technical solutions, and the beneficial effects of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely illustrative and are not intended to limit the invention.
Referring to FIG. 1 to FIG. 3, the automatic classification method based on Chinese privacy policy terms according to the present invention is now described. The method comprises the following steps:
S1, data processing: acquiring the privacy policies of a plurality of applications as a data set, labeling the clauses of the privacy policies to obtain a labeled data set, and then cleaning the data set to obtain a training sample data set;
the applications are obtained from application markets or from the download channels on the applications' official websites.
Cleaning the data set includes word segmentation and removal of noise words.
The specific implementation of the step can be as follows:
s1.1, acquiring data;
Privacy policies are obtained by crawling mobile application markets and/or websites.
More specifically, the privacy policies of 100 popular applications in the application market are obtained with a web crawler.
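The patent does not describe how the crawled pages are parsed. As a minimal sketch, assuming each privacy policy is served as an HTML page, the standard-library parser below keeps the text of paragraph-like elements and drops `<script>`/`<style>` content. `PolicyTextExtractor`, `extract_paragraphs`, and `fetch_policy` are hypothetical names, and the set of tags treated as paragraph boundaries is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class PolicyTextExtractor(HTMLParser):
    """Collect the text of paragraph-like elements, skipping <script> and <style>."""

    # Assumed paragraph boundaries; adjust to the markup of the target sites.
    BLOCK_TAGS = {"p", "li", "h1", "h2", "h3", "h4", "h5", "h6", "td", "div"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buffer.append(data)

    def flush(self):
        """Emit the buffered text as one paragraph if it is not blank."""
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

def extract_paragraphs(html):
    parser = PolicyTextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text outside a block element
    return parser.paragraphs

def fetch_policy(url, timeout=10):
    """Download one policy page and return its paragraphs (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_paragraphs(resp.read().decode(charset, errors="replace"))
```

The returned paragraphs can then be split into clauses and passed to the labeling step.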
S1.2, establishing a data annotation standard according to the requirements of laws and regulations, the data annotation standard including all clauses that laws and regulations require a privacy policy to cover;
the data annotation standard comprises a plurality of classification categories, the classification categories including at least one of: first-party collection/use; sharing/transfer/disclosure with third parties; data security; user access/editing/deletion methods; term changes; terms targeting specific groups of people; and other general information.
More specifically, the data annotation standard contains 7 classification categories, 50 attributes, and 91 values. A classification category is a top-level class, such as first-party collection/use or sharing with third parties, represented by tags such as First-Party-Collect-Use and Third-Party-Share. An attribute is the specific content under a category; for example, first-party collection/use is further divided into collection purpose, user choice, and so on, represented by tags such as First-Party-Collection-User-Purpose and First-Party-Collection-User-Collection. Each attribute has a designed set of value options; for example, the "interactive" attribute takes the two values "yes" and "no".
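The category, attribute, and value hierarchy can be held in a small data structure. This sketch is illustrative only: `Category`, `Attribute`, `validate_label`, and `schema_size` are hypothetical names, only the tags First-Party-Collect-Use and Third-Party-Share come from the text above, and the `Interactive` attribute name is an assumed stand-in for the yes/no example.

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    tag: str                                          # attribute tag
    values: list[str] = field(default_factory=list)  # allowed value options

@dataclass
class Category:
    tag: str                                                       # e.g. "First-Party-Collect-Use"
    attributes: dict[str, Attribute] = field(default_factory=dict)

    def add(self, attribute: Attribute) -> "Category":
        self.attributes[attribute.tag] = attribute
        return self  # allow chaining

def validate_label(schema, category, attribute, value):
    """True if (category, attribute, value) is allowed by the annotation schema."""
    cat = schema.get(category)
    if cat is None or attribute not in cat.attributes:
        return False
    return value in cat.attributes[attribute].values

def schema_size(schema):
    """(categories, attributes, values); the full standard would give (7, 50, 91)."""
    n_attributes = sum(len(c.attributes) for c in schema.values())
    n_values = sum(len(a.values) for c in schema.values() for a in c.attributes.values())
    return len(schema), n_attributes, n_values

# A two-category fragment; "Interactive" is a hypothetical attribute name.
SCHEMA = {c.tag: c for c in [
    Category("First-Party-Collect-Use").add(Attribute("Interactive", ["yes", "no"])),
    Category("Third-Party-Share"),
]}
```

Checking every annotation against such a schema catches tags and values that fall outside the standard before training.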
S1.3, labeling the data;
after the classification standard has been thoroughly understood, the privacy policies are labeled with the online annotation tool BRAT. Annotation consistency is tested with Cohen's kappa coefficient, confirming that the annotations are reliable.
More specifically, this step yields a data set of 100 Chinese privacy policies containing 11,440 category and attribute labels.
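Cohen's kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The patent does not state how many annotators were involved or the kappa value obtained; the sketch below simply assumes two annotators who each assign one category tag per clause.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same clauses in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # p_o: observed proportion of clauses on which the annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: agreement expected by chance from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one and the same label for every clause
    return (p_o - p_e) / (1 - p_e)

a = ["First-Party-Collect-Use", "First-Party-Collect-Use", "Third-Party-Share", "Third-Party-Share"]
b = ["First-Party-Collect-Use", "First-Party-Collect-Use", "Third-Party-Share", "First-Party-Collect-Use"]
print(cohens_kappa(a, b))  # p_o = 0.75, p_e = 0.5
```

Unlike raw percentage agreement, kappa discounts the agreement two annotators would reach by chance when some categories are much more frequent than others.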
S1.4, removing noise words from the data and performing word segmentation with a word segmentation tool to obtain a labeled, segmented clause data set.
A stop-word list is used to remove noise words from the privacy policy clauses, which reduces noise in the privacy policy and improves the subsequent classification. Illustratively, the stop-word list is the HIT (Harbin Institute of Technology) stop-word list.
The cleaned data are segmented with the Jieba word segmentation tool to obtain a labeled, segmented clause data set. Noise words are words that occur frequently but carry no practical meaning, such as "and" and "even".
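A minimal sketch of this cleaning step, assuming the `jieba` package is installed and the stop-word list is a UTF-8 text file with one word per line (the file format is an assumption). Stop words are filtered after segmentation because they are defined at the word level, which matches the order used in the embodiment below; `tokenize` lets another segmenter be swapped in. All function names are hypothetical.

```python
def load_stopwords(path):
    """Read a stop-word list stored as UTF-8 text, one word per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def clean_clause(text, stopwords, tokenize=None):
    """Segment one clause and drop stop words and whitespace-only tokens."""
    if tokenize is None:
        import jieba  # imported lazily so other segmenters can be used without Jieba
        tokenize = jieba.lcut
    return [tok for tok in tokenize(text) if tok.strip() and tok not in stopwords]

def to_training_text(text, stopwords, tokenize=None):
    """Join the cleaned words with spaces, the form kept in the segmented clause data set."""
    return " ".join(clean_clause(text, stopwords, tokenize))
```

With Jieba installed, `clean_clause("我们会收集您的个人信息", load_stopwords("stopwords.txt"))` would segment and filter a raw clause; the file name here is a placeholder.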
S2, training data: performing feature selection on the training sample data set, selecting effective features capable of distinguishing different clauses and categories, training a classifier on the feature vectors of the clauses of each category, and establishing a detection model;
the specific implementation of the step can be as follows:
selecting features of a training sample data set through a TF-IDF algorithm, and selecting effective features capable of identifying different clause classification categories;
the calculation formula is as follows:
$$\text{TF-IDF} = \text{TF} \times \text{IDF}$$
For the i-th word $t_i$, the TF formula is:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the j-th document $d_j$; the denominator is the sum of the occurrence counts of all words in $d_j$, $n_{k,j}$ being the number of occurrences of the k-th word in $d_j$; and $tf_{i,j}$ is the term frequency of $t_i$ in $d_j$.
The IDF formula is:
$$idf_i = \log_{10} \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $idf_i$ is the inverse document frequency of the word $t_i$, $|D|$ is the total number of documents in the corpus, and $|\{j : t_i \in d_j\}|$ is the number of documents containing $t_i$.
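A worked example with hypothetical counts shows how the two factors interact. Suppose a clause $d_j$ contains 20 words, the word 第三方 ("third party") occurs 2 times in it, and 10 of the 1,000 clauses in the corpus contain it (base-10 logarithm, as in the embodiment):

$$tf_{i,j} = \frac{2}{20} = 0.1, \qquad idf_i = \log_{10}\frac{1000}{10} = 2, \qquad \text{TF-IDF} = 0.1 \times 2 = 0.2$$

A word that occurs in all 1,000 clauses has $idf = \log_{10} 1 = 0$, so its TF-IDF weight is 0 however often it appears, which is why such words are useless as distinguishing features.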
The classification algorithm mainly adopted by the invention is a support vector machine algorithm.
Because the data set is multi-label, unlike an ordinary binary classification problem, the classifier is constructed with a one-vs-all strategy, specifically implemented with OneVsRestClassifier from the scikit-learn toolkit. Given the small number of samples and the large number of features, a linear support vector machine is considered, and a kernel function is used to map the finite-dimensional space into a higher-dimensional space in which the samples become linearly separable.
The specific method is as follows:
one class is marked as positive (y = 1) and all the others as negative; this model is denoted $h_\theta^{(1)}(x)$.
Similarly, the second class is marked as positive (y = 2) and all the others as negative; this model is denoted $h_\theta^{(2)}(x)$.
And so on.
Finally, a series of models is obtained, abbreviated as:
$$h_\theta^{(i)}(x), \quad i = 1, 2, 3, \ldots, k$$
where k is the number of categories.
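The relabeling behind this one-vs-all construction can be sketched as follows: for each category, every training clause becomes a positive (+1) or negative (−1) example. Because the data set is multi-label, a clause may carry several categories and is then positive for each of them. `one_vs_rest_targets` is a hypothetical helper, not part of any library.

```python
def one_vs_rest_targets(labels, classes=None):
    """Build one binary target vector per category for one-vs-all training.

    `labels` holds one entry per clause: a single category tag, or a
    set/list/tuple of tags for a multi-label clause. For category c the
    target is +1 if the clause carries c and -1 otherwise.
    """
    label_sets = [set(y) if isinstance(y, (set, frozenset, list, tuple)) else {y}
                  for y in labels]
    if classes is None:
        classes = sorted(set().union(*label_sets))
    return {c: [1 if c in s else -1 for s in label_sets] for c in classes}

targets = one_vs_rest_targets([{"A", "B"}, "C", "A"])
```

Each of the k target vectors trains one binary classifier $h_\theta^{(i)}$ on the same feature vectors.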
For prediction, all the classifiers are run once, and for each input the output with the highest probability is selected. In summary, a support vector machine classifier is trained for each class according to the one-vs-all strategy:
$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$$
where i ranges over every possible value y = i. To make a prediction, a new input x is fed into every classification model and the i that maximizes $h_\theta^{(i)}(x)$ is selected, that is:
$$\arg\max_i h_\theta^{(i)}(x)$$
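The text names scikit-learn's `OneVsRestClassifier`. A minimal end-to-end sketch under these assumptions: clauses are already segmented and joined with spaces, each training clause carries one category tag (`Data-Security` is a hypothetical tag; the other two come from the annotation standard), and `LinearSVC` serves as the linear SVM, so no kernel mapping is used (`sklearn.svm.SVC` with a kernel could be substituted). `LinearSVC` outputs margin scores rather than probabilities, so prediction takes the arg max over scores instead of over $P(y = i \mid x; \theta)$. scikit-learn's TF-IDF also uses a smoothed natural-log IDF with L2 normalization, so its weights differ from the formulas above. The training clauses are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, made-up training clauses (segmented, words separated by spaces).
train_clauses = [
    "收集 您 的 个人 信息 用于 提供 服务",    # collect personal info to provide services
    "我们 收集 设备 信息 目的 是 改进 产品",  # collect device info to improve products
    "我们 与 第三方 合作伙伴 共享 数据",      # share data with third-party partners
    "未经 同意 不会 向 第三方 转让 或 共享",  # no transfer/sharing without consent
    "采用 加密 技术 保护 数据 安全",          # encryption protects data security
    "建立 安全 管理 制度 防止 数据 泄露",      # security management prevents leaks
]
train_labels = [
    "First-Party-Collect-Use", "First-Party-Collect-Use",
    "Third-Party-Share", "Third-Party-Share",
    "Data-Security", "Data-Security",  # hypothetical tag
]

model = make_pipeline(
    # (?u)\S+ keeps every space-separated word, including single-character
    # words that the default token_pattern would drop.
    TfidfVectorizer(token_pattern=r"(?u)\S+"),
    # One binary linear SVM per category (one-vs-all).
    OneVsRestClassifier(LinearSVC()),
)
model.fit(train_clauses, train_labels)

def classify(clauses):
    """Return, for each clause, the category whose binary SVM scores it highest."""
    return list(model.predict(clauses))

def category_scores(clause):
    """Per-category margin scores for one clause (larger means more likely)."""
    scores = model.decision_function([clause])[0]
    return dict(zip(model.classes_, scores))
```

`classify(["第三方 共享"])` runs all k binary classifiers and keeps the top-scoring category, mirroring the arg-max rule.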
S3, data detection: receiving the privacy policy text through the detection model, classifying the clause content of the privacy policy text under the attributes of the various categories, and determining whether the privacy policy text is complete.
The specific implementation of this step can be as follows:
S3.1, calculating classification probabilities: for each classification category i, using the support vector machine classifier trained for category i to compute the probability that y = i for the privacy policy text, where i = 1, 2, 3, …, k and k is the number of categories;
S3.2, selecting the category: for a given new input x, the category whose trained classifier gives the highest predicted probability that y = i is taken as the category of x.
An integrity check of the privacy policy is then performed on the basis of these classification results.
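The patent does not spell out the integrity rule. One way to turn the per-clause decisions into an integrity check, sketched under the assumption that a policy is complete when every required category is assigned to at least one clause: `select_category` applies the arg-max rule of S3.2 to per-category scores (probabilities or SVM margins), and `integrity_check` reports which required categories were never assigned. Apart from First-Party-Collect-Use and Third-Party-Share, the category tags below are hypothetical placeholders for the seven categories.

```python
# Hypothetical tags for the seven categories of the annotation standard.
REQUIRED_CATEGORIES = (
    "First-Party-Collect-Use", "Third-Party-Share", "Data-Security",
    "User-Access-Edit-Deletion", "Policy-Change", "Specific-Audiences",
    "Other-General-Information",
)

def select_category(scores):
    """S3.2: pick the category whose classifier gave the highest score."""
    if not scores:
        raise ValueError("no category scores given")
    return max(scores, key=scores.get)

def integrity_check(clause_scores, required=REQUIRED_CATEGORIES):
    """Assign each clause its top category and list required categories never assigned."""
    covered = {select_category(scores) for scores in clause_scores}
    missing = [c for c in required if c not in covered]
    return {"complete": not missing, "missing": missing}

clause_scores = [
    {"First-Party-Collect-Use": 0.9, "Third-Party-Share": 0.1},
    {"First-Party-Collect-Use": 0.2, "Third-Party-Share": 0.7},
]
```

Reporting the missing categories, rather than only a yes/no flag, tells the user which required clauses the policy lacks.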
Compared with the prior art, the automatic classification method based on Chinese privacy policy terms provided by the invention quickly sorts privacy policy content under the attributes of the various classification categories by automatically classifying privacy policy terms, which makes the policy easier for users to read and understand; it also realizes completeness detection of privacy policy terms, so that users can quickly identify whether a privacy policy is complete.
A specific embodiment of the invention comprises the following steps:
Chinese privacy policy data are obtained from the Huashi application market, a classification standard for privacy policy terms is defined according to relevant laws and regulations, the content of privacy policy terms is divided into 7 classification categories, and the corresponding attributes under each category are determined. The classification categories are: first-party collection/use, sharing/transfer/disclosure with third parties, data security, user access/editing/deletion methods, term changes, terms targeting specific groups of people, and other general information;
corresponding attributes under each category: for example, first-party collection/use is further divided into collection purpose, user choice, and so on; sharing/transfer/disclosure with third parties into sharing method, constraints on third parties, and so on; data security into security measures, data retention period, and so on; user access/editing/deletion methods into operation channels, operations available to the user, and so on; term changes into reasons for the change, notification method, and so on; terms targeting specific groups of people into user choice, provider actions, and so on; and other general information into the scope of application of the privacy policy, operator information, and so on.
The clauses are manually labeled according to the classification standard to obtain a labeled Chinese privacy policy data set. The data set is divided into two parts: one is used to construct the classifier and the other to test the accuracy of the model.
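A minimal sketch of the split and the accuracy measurement. The patent does not give the split ratio, so the 80/20 default and the fixed random seed below are assumptions; `split_dataset` and `accuracy` are hypothetical helpers.

```python
import random

def split_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle labeled samples reproducibly and split them into (train, test)."""
    if not 0 <= test_ratio < 1:
        raise ValueError("test_ratio must be in [0, 1)")
    items = list(samples)
    random.Random(seed).shuffle(items)  # private RNG: does not disturb global state
    n_test = round(len(items) * test_ratio)
    return items[n_test:], items[:n_test]

def accuracy(predicted, gold):
    """Fraction of test clauses whose predicted category matches the manual label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equally long, non-empty sequences")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Fixing the seed keeps the train/test partition identical across runs, so accuracy figures from different feature or classifier settings stay comparable.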
For the labeled Chinese privacy policy data, the labeled clauses of each classification category are segmented with the Jieba word segmentation tool, and the words are separated by spaces. A stop-word list is applied to the data at the same time, deleting frequent words with no practical meaning such as "the" and "even".
For example, the privacy policy data contain the sentence: "If your personal information is transmitted overseas while you use the overseas transaction service, we will, after obtaining your separate authorization and consent, ensure that the overseas recipient processes your personal information in accordance with this policy and with strict security measures." Word segmentation splits this sentence into space-separated words, and deleting the noise words leaves roughly the following content words: [overseas transaction service, send, information, outbound transfer, authorization, consent, ensure, overseas recipient, information, policy, description, strict, security measures, processing].
The TF-IDF algorithm is used to calculate the weight of each word in the document vector; the words are sorted by weight in descending order to obtain the keywords. Keywords obtained in this way distinguish categories well and can be used as characteristic attributes of the category.
The calculation formula is:
$$\text{TF-IDF} = \text{TF} \times \text{IDF}$$
Here TF (term frequency) is the frequency with which a term appears in document d, and IDF (inverse document frequency) measures the general importance of a word: the fewer documents contain term t, the larger its IDF, and the better t distinguishes between categories. The term frequency (TF) formula for the i-th word $t_i$ is:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the j-th document $d_j$; the denominator is the sum of the occurrence counts of all words in $d_j$, $n_{k,j}$ being the number of occurrences of the k-th word in $d_j$; and $tf_{i,j}$ is the term frequency of $t_i$ in $d_j$.
The inverse document frequency (IDF) is obtained by dividing the total number of documents by the number of documents containing the term and taking the base-10 logarithm of the quotient:
$$idf_i = \log_{10} \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $idf_i$ is the inverse document frequency of the word $t_i$, $|D|$ is the total number of documents in the corpus, and $|\{j : t_i \in d_j\}|$ is the number of documents containing $t_i$.
The TF-IDF algorithm filters out common words and retains important words (keywords); the data are further cleaned with a synonym dictionary.
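The keyword step can be sketched as follows. The embodiment does not say what counts as a document when ranking keywords; this sketch assumes all clauses of one category are concatenated into a single document, so a word gets a high weight when it is frequent in that category and rare in the others. It uses the TF and base-10 IDF formulas above; `category_keywords` is a hypothetical name, and ties are broken by string order for reproducible output.

```python
import math
from collections import Counter

def category_keywords(clauses_by_category, top_k=5):
    """Rank each category's words by TF-IDF, treating the category as one document.

    clauses_by_category maps a category tag to a list of segmented clauses
    (each clause a list of words).
    """
    docs = {cat: [word for clause in clauses for word in clause]
            for cat, clauses in clauses_by_category.items()}
    n_docs = len(docs)  # |D|: one document per category
    # Document frequency: in how many category documents each word occurs.
    df = Counter(word for words in docs.values() for word in set(words))
    keywords = {}
    for cat, words in docs.items():
        counts = Counter(words)
        total = sum(counts.values())
        weights = {w: (c / total) * math.log10(n_docs / df[w]) for w, c in counts.items()}
        # Highest weight first; ties broken by string order.
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords[cat] = [w for w, _ in ranked[:top_k]]
    return keywords

data = {
    "Third-Party-Share": [["第三方", "共享", "数据"], ["第三方", "转让"]],
    "Data-Security": [["加密", "数据"], ["安全", "数据"]],
}
```

In this toy input, 数据 ("data") occurs in both categories, so its weight is 0 and it ranks last, which is the common-word filtering described above.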
The above steps form the preparation stage: the input is part of the privacy policy data, and the output is privacy policy data samples with category/attribute labels and keywords. The data samples are obtained from the application market and classified; keywords are characteristic attributes that can represent a category.
To handle the multi-label classification problem, it is converted into a multi-class problem, and the classifier is constructed with a one-vs-all strategy.
The method comprises the following steps:
one class is marked as positive (y = 1) and all the others as negative; this model is denoted $h_\theta^{(1)}(x)$.
Similarly, the second class is marked as positive (y = 2) and all the others as negative; this model is denoted $h_\theta^{(2)}(x)$.
And so on.
Finally, a series of models is obtained, abbreviated as:
$$h_\theta^{(i)}(x), \quad i = 1, 2, 3, \ldots, k$$
where k is the number of categories.
For prediction, all the classifiers are run once, and for each input the output with the highest probability is selected.
Finally, a support vector machine classifier is trained for each class according to the one-vs-all strategy:
$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$$
where i ranges over every possible value y = i. To make a prediction, a new input x is fed into every classification model and the i that maximizes $h_\theta^{(i)}(x)$ is selected, that is:
$$\arg\max_i h_\theta^{(i)}(x)$$
These calculations are performed automatically by a program, and the output is the classifier.
Finally, the classifier is used to classify the clauses to be classified, yielding the mapping between each clause and its category and, from that, the integrity identification of the clauses.
From the perspective of natural language processing technology, the method collects privacy policies from application markets, obtains characteristic attributes through analysis, and establishes a detection model by training a classifier. Given a privacy policy from an application market, the detection model can quickly and accurately classify each clause, making it immediately clear whether the privacy policy covers all the clauses required by relevant regulations; this realizes completeness detection of the privacy policy and also improves its readability.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. An automatic classification method based on Chinese privacy policy terms is characterized by comprising the following steps:
data processing: acquiring the privacy policies of a plurality of applications as a data set, manually labeling the clauses of the privacy policies to obtain a labeled data set, and then cleaning the data set to obtain a training sample data set;
training data: performing feature selection on the training sample data set, selecting effective features capable of distinguishing different clauses and categories, training a classifier on the feature vectors of the clauses and categories, and establishing a detection model;
data detection: receiving a privacy policy text through the detection model, classifying the clause content of the privacy policy text under the attributes of the various categories, and determining whether the privacy policy text is complete.
2. The method of claim 1, wherein the data processing comprises:
acquiring data;
establishing a data annotation standard according to the requirements of laws and regulations, the data annotation standard including all clauses that laws and regulations require a privacy policy to cover;
labeling the data;
removing noise words from the data and performing word segmentation with a word segmentation tool to obtain a labeled, segmented clause data set.
3. The method of claim 2, wherein the data annotation standard comprises a plurality of classification categories, the classification categories including at least one of: first-party collection/use; sharing/transfer/disclosure with third parties; data security; user access/editing/deletion methods; term changes; terms targeting specific groups of people; and other general information.
4. The method of claim 3, wherein the data annotation standard contains 7 classification categories, 50 attributes, and 91 values.
5. The method of claim 3, wherein the training data step comprises:
performing feature selection on the training sample data set through a TF-IDF algorithm, wherein the calculation formula is as follows:
$$\text{TF-IDF} = \text{TF} \times \text{IDF}$$
for the i-th word $t_i$, the TF formula is:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the j-th document $d_j$; the denominator is the sum of the occurrence counts of all words in $d_j$, $n_{k,j}$ being the number of occurrences of the k-th word in $d_j$; and $tf_{i,j}$ is the term frequency of $t_i$ in $d_j$;
the IDF formula is:
$$idf_i = \log_{10} \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $idf_i$ is the inverse document frequency of the word $t_i$, $|D|$ is the total number of documents in the corpus, and $|\{j : t_i \in d_j\}|$ is the number of documents containing $t_i$.
6. The method of claim 5, wherein the data detection comprises:
calculating classification probabilities: for the privacy policy text, applying the support vector machine classifier trained for each category i to predict the probability that y = i, where i = 1, 2, 3, …, k and k is the number of categories;
selecting the category: for a given new input x, taking as the category of x the category whose trained classifier predicts y = i with the highest probability.
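Claim 6 describes a one-vs-rest scheme: one SVM per category, with the prediction $\hat{y} = \arg\max_{i \in \{1,\dots,k\}} P(y = i \mid x)$. The claims name neither the SVM library, the training algorithm, nor how probabilities are obtained. The sketch below therefore makes three assumptions: a linear SVM trained with Pegasos sub-gradient steps, visiting samples in a fixed order rather than randomly; a bias learned as the weight of a constant feature; and a fixed logistic function mapping decision values to (0, 1) as a stand-in for Platt scaling. All function names are hypothetical.

```python
from __future__ import annotations

import math


def _with_bias(x: list[float]) -> list[float]:
    return list(x) + [1.0]  # constant feature so the bias is learned as a weight


def train_binary_svm(xs, ys, lam=0.1, epochs=100):
    """Linear SVM trained with Pegasos sub-gradient steps; ys are +1 or -1.

    Samples are visited in a fixed order, so training is deterministic.
    """
    w = [0.0] * (len(xs[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            xb = _with_bias(x)
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            eta = 1.0 / (lam * t)
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def decision_value(w, x):
    return sum(wi * xi for wi, xi in zip(w, _with_bias(x)))


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_one_vs_rest(xs, labels, **kwargs):
    """One classifier per category i: samples of category i are +1, the rest -1."""
    return {
        c: train_binary_svm(xs, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def class_probabilities(models, x):
    """Fixed sigmoid of each decision value: a stand-in for Platt scaling, not calibrated."""
    return {c: _sigmoid(decision_value(w, x)) for c, w in models.items()}


def predict(models, x):
    """Pick the category whose classifier gives the highest probability for x."""
    probs = class_probabilities(models, x)
    return max(probs, key=probs.get)
```

In the patent's setting each x would be a clause's TF-IDF vector over the selected vocabulary. Because the sigmoid is monotonic, the argmax here equals the argmax of raw decision values. Proper Platt scaling fits the sigmoid's slope and offset per classifier on held-out data, which can change the ranking across classifiers.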
7. The method of claim 2, wherein the word segmentation tool is the jieba word segmentation tool.
8. The method of claim 2, wherein the noise words in the data are removed using the Harbin Institute of Technology (HIT) stop word list.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011261262.5A CN112364165A (en) 2020-11-12 2020-11-12 Automatic classification method based on Chinese privacy policy terms

Publications (1)

Publication Number Publication Date
CN112364165A (en) 2021-02-12

Family

ID=74515398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011261262.5A Pending CN112364165A (en) 2020-11-12 2020-11-12 Automatic classification method based on Chinese privacy policy terms

Country Status (1)

Country Link
CN (1) CN112364165A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583208A (en) * 2018-12-03 2019-04-05 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Malicious software identification method and system based on mobile application comment data
CN109657207A (en) * 2018-11-29 2019-04-19 爱保科技(横琴)有限公司 Clause formatting processing method and processing device
CN110413789A (en) * 2019-07-31 2019-11-05 广西师范大学 SVM-based automatic exercise classification method
CN110533305A (en) * 2019-08-12 2019-12-03 北京科技大学 Comprehensive prevention method for smelter work-safety accidents
CN110674289A (en) * 2019-07-04 2020-01-10 南瑞集团有限公司 Method, device and storage medium for determining the category of an article based on word segmentation weights
CN110705955A (en) * 2019-08-22 2020-01-17 阿里巴巴集团控股有限公司 Contract detection method and device

Non-Patent Citations (1)

Title
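Xu Lei et al.: "An Analysis of the Availability and Content of Mobile App Privacy Policies" (移动APP隐私条款可获得性及内容分析研究), 《现代情报》 (Modern Information) *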

Cited By (11)

Publication number Priority date Publication date Assignee Title
CN113051607A (en) * 2021-03-11 2021-06-29 天津大学 Privacy policy information extraction method
CN113051607B (en) * 2021-03-11 2022-04-19 天津大学 Privacy policy information extraction method
CN113076538A (en) * 2021-04-02 2021-07-06 北京邮电大学 Method for extracting embedded privacy policy of mobile application APK file
CN113076538B (en) * 2021-04-02 2021-12-14 北京邮电大学 Method for extracting embedded privacy policy of mobile application APK file
CN113220877A (en) * 2021-04-30 2021-08-06 天津大学 Privacy policy compliance detection method
CN113282955A (en) * 2021-06-01 2021-08-20 上海交通大学 Method, system, terminal and medium for extracting privacy information in privacy policy
CN113326536A (en) * 2021-06-02 2021-08-31 支付宝(杭州)信息技术有限公司 Method and device for judging compliance of application program
CN113723085A (en) * 2021-08-26 2021-11-30 北京航空航天大学 Pseudo-fuzzy detection method in privacy policy document
CN113723085B (en) * 2021-08-26 2024-05-24 北京航空航天大学 Pseudo-fuzzy detection method in privacy policy document
CN115080924A (en) * 2022-07-25 2022-09-20 南开大学 Software license clause extraction method based on natural language understanding
CN115080924B (en) * 2022-07-25 2022-11-15 南开大学 Software license clause extraction method based on natural language understanding

Similar Documents

Publication Publication Date Title
CN112364165A (en) Automatic classification method based on Chinese privacy policy terms
CN111274365B (en) Intelligent inquiry method and device based on semantic understanding, storage medium and server
CN107291780B (en) User comment information display method and device
TWI653542B (en) Method, system and device for discovering and tracking hot topics based on network media data flow
JP4920023B2 (en) Inter-object competition index calculation method and system
AU2017200585A1 (en) System and engine for seeded clustering of news events
Im et al. Linked tag: image annotation using semantic relationships between image tags
WO2009134462A2 (en) Method and system to predict the likelihood of topics
US20100153320A1 (en) Method and arrangement for sim algorithm automatic charset detection
CN107193883B (en) Data processing method and system
CN113076735B (en) Target information acquisition method, device and server
CN108228612B (en) Method and device for extracting network event keywords and emotional tendency
CN110287292A (en) A kind of judge's measurement of penalty irrelevance prediction technique and device
JP5098631B2 (en) Mail classification system, mail search system
WO2023273303A1 (en) Tree model-based method and apparatus for acquiring degree of influence of event, and computer device
Wagner Privacy Policies Across the Ages: Content and Readability of Privacy Policies 1996–2021
Omondiagbe et al. Features that predict the acceptability of java and javascript answers on stack overflow
CN116610853A (en) Search recommendation method, search recommendation system, computer device, and storage medium
CN114202443A (en) Policy classification method, device, equipment and storage medium
JP3583631B2 (en) Information mining method, information mining device, and computer-readable recording medium recording information mining program
Mohemad et al. Performance analysis in text clustering using k-means and k-medoids algorithms for Malay crime documents
Wang et al. A collaborative filtering algorithm fusing user-based, item-based and social networks
CN112434126B (en) Information processing method, device, equipment and storage medium
Kotenko et al. The intelligent system for detection and counteraction of malicious and inappropriate information on the Internet
Al Mahmud et al. A New Technique to Classification of Bengali News Grounded on ML and DL Models

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210212