CN109145308A - Secret-related text recognition method based on improved naive Bayes - Google Patents

Secret-related text recognition method based on improved naive Bayes

Info

Publication number
CN109145308A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811134941.9A
Other languages
Chinese (zh)
Other versions
CN109145308B (en)
Inventor
敬思远
杨骏
孙锐
郭肇毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leshan Normal University
Original Assignee
Leshan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-09-28
Filing date
2018-09-28
Publication date
2019-01-04
Application filed by Leshan Normal University
Priority to CN201811134941.9A
Publication of CN109145308A
Application granted
Publication of CN109145308B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Abstract

The invention discloses a secret-related text recognition method based on improved naive Bayes, comprising the following steps: S1. building a naive Bayes model and performing incremental learning; S2. loading the naive Bayes model obtained by incremental learning; S3. reading the text to be identified; S4. identifying the text with the naive Bayes model and marking its corresponding secrecy level. In the invention, the weighted naive Bayes model makes learning more reasonable, and the proposed incremental learning scheme for feature weights can substantially improve the accuracy of secret-related text detection; incremental learning driven by changes in the secret-related feature space simply and effectively handles both the addition of new secret-related features and the decline in secrecy level of existing ones.

Description

Secret-related text recognition method based on improved naive Bayes
Technical field
The present invention relates to secret-related text recognition, and in particular to a secret-related text recognition method based on improved naive Bayes.
Background art
With the development of information technology, information systems capable of handling large volumes of integrated office, research and production business have gradually appeared in social life and work, and these systems store large amounts of sensitive data and information. How to prevent classified information from leaking to the outside world over the Internet is a problem that urgently needs to be solved.
Automatic detection of secret-related text is an effective technical means of solving this problem. Following the Bell-LaPadula model, classified information is currently generally divided into four levels: public, confidential, secret and top secret. When a secret-related text (such as an official document or an e-mail) circulates on a network, the technique can detect the secrecy level to which the text belongs. Once the level has been detected, it is compared with the secrecy label assigned by the user to determine whether the information flow of the text is legal. For example, if the user labels a text as "public" but the automatic detection algorithm detects its level as "secret", the behavior can be judged illegal.
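The comparison between the detected level and the user's label can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the names Level and flow_is_legal are hypothetical, and treating a detected level at or below the user's label as legal is an assumption (the text only states that a "public" label with a detected "secret" level is illegal).

```python
from enum import IntEnum


class Level(IntEnum):
    """The four secrecy levels, ordered from lowest to highest."""
    PUBLIC = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def flow_is_legal(user_label: Level, detected: Level) -> bool:
    """Return False when the detected level exceeds the user's label.

    Assumption: a text labelled at or above its detected level is
    treated as a legal flow; the patent does not discuss that case.
    """
    return detected <= user_label
```

With these definitions, flow_is_legal(Level.PUBLIC, Level.SECRET) evaluates to False, matching the example in the paragraph above.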
Naive Bayes is one of the mainstream methods in the field of text detection. However, automatic detection of secret-related text based on naive Bayes must solve two difficult problems: (1) because of the special nature of confidential documents (they cannot be inspected at will), it is difficult to obtain a complete set of labeled samples for training the naive Bayes model; (2) the secret-related features (secret-related keywords) in text change over time: a keyword that was previously not secret-related may become a new secret-related feature, while the secrecy level of a word that was previously a secret-related feature may gradually decrease over time. No existing method solves this problem.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a secret-related text recognition method based on improved naive Bayes.
The object of the present invention is achieved through the following technical solution: a secret-related text recognition method based on improved naive Bayes, characterized by comprising the following steps:
S1. building a naive Bayes model and performing incremental learning;
S2. loading the naive Bayes model obtained by incremental learning;
S3. reading the text to be identified;
S4. identifying the text with the naive Bayes model and marking its corresponding secrecy level.
Further, the secret-related text recognition method also includes a recognition result uploading step: uploading the recognition result of step S4 to a unified control center.
Further, step S1 comprises the following sub-steps:
S101. building a naive Bayes model and identifying the samples that carry user-annotated labels;
S102. the administrator of the unified control center compares the recognized label with the user-annotated label; if the recognition is wrong, the sample and its correct label are added to the sample database;
S103. building a weighted naive Bayes model;
S104. when a new secret-related feature is added to the secret-related feature space, or the secrecy level of an existing secret-related feature changes, performing incremental learning based on the change in the secret-related feature space;
S105. performing incremental learning according to changes in the sample database and the secret-related feature database;
S106. writing the learned model into the naive Bayes model and notifying the system to reload it.
Further, step S101 comprises:
First, building the naive Bayes model:
Let the sample space D of secret-related text consist of the feature space W = {w_1, w_2, …, w_n} and the class space C = {c_1, c_2, …, c_m}; the features are the words contained in the texts of the sample space D, and the class space C is the set of secrecy levels of secret-related text. For a given text d = {w_1, w_2, …, w_l}, the naive Bayes model computes the posterior probability that the text belongs to each class and uses it to decide the text's class: the class with the largest posterior probability is the detection result for the text. The discriminant is as follows:
where P(c_i) denotes the prior probability of class c_i, and P(w_j|c_i) denotes the probability that feature w_j occurs given class c_i:
where |C|, |D| and |W| denote the sizes of the class space, sample space and feature space, respectively; count(c_i) denotes the number of samples belonging to class c_i, and count(w_j∧c_i) denotes the number of samples in class c_i in which feature w_j occurs;
Second, identifying the samples that carry user-annotated labels with the naive Bayes model to obtain the recognition result of each sample.
Step S103 comprises:
First, building the weighted naive Bayes model:
λ_{j,i} denotes the weight with which the j-th feature in the feature space belongs to the i-th class; following the Bell-LaPadula model, each feature has 4 weights, corresponding to public, confidential, secret and top secret, respectively:
where TF_i(w_j) is the term frequency of text feature w_j in the texts of class c_i, and IDF_i(w_j) is the improved inverse document frequency: the more documents within the class contain the text feature and the fewer documents in other classes contain it, the larger its weight.
Step S104 comprises:
When a new secret-related feature is added to the secret-related feature space or the secrecy level of an existing secret-related feature changes, the case of adding a new feature is handled as follows: first, from the other features belonging to the same class as the new feature, select the feature with the largest value of P(t_j|c_i) and copy all of its information to the new feature, then re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103; next, from the other features belonging to classes different from that of the new feature, select the feature with the smallest value of P(t_j|c_i) and copy all of its information to the new feature, then again re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103;
The case in which the secrecy level of an existing secret-related feature changes is handled in the same way: first, from the other features belonging to the same class as the changed feature, select the feature with the largest value of P(t_j|c_i) and copy all of its information to the changed feature, then re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103; next, from the other features belonging to classes different from that of the changed feature, select the feature with the smallest value of P(t_j|c_i) and copy all of its information to the changed feature, then again re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103.
Step S105 comprises:
The feature weights are learned incrementally along two dimensions, the sample space and the feature space:
where TF'_i(·) and count'(·) denote the statistics computed on the incremental sample set;
The incremental learning of the feature weights yields the incremental learning results for P(c_i) and P(w_j|c_i):
The beneficial effects of the invention are as follows: the weighted naive Bayes model makes learning more reasonable, and the proposed incremental learning scheme for feature weights can substantially improve the accuracy of secret-related text detection; incremental learning driven by changes in the secret-related feature space simply and effectively handles both the addition of new secret-related features and the decline in secrecy level of existing ones.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a flow chart of the incremental learning of the naive Bayes model.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the following.
As shown in Fig. 1, a secret-related text recognition method based on improved naive Bayes comprises the following steps:
S1. building a naive Bayes model and performing incremental learning;
S2. loading the naive Bayes model obtained by incremental learning;
S3. reading the text to be identified;
S4. identifying the text with the naive Bayes model and marking its corresponding secrecy level.
In an embodiment of the present application, the secret-related text recognition method also includes a recognition result uploading step: uploading the recognition result of step S4 to a unified control center.
As shown in Fig. 2, step S1 comprises the following sub-steps:
S101. building a naive Bayes model and identifying the samples that carry user-annotated labels;
S102. the administrator of the unified control center compares the recognized label with the user-annotated label; if the recognition is wrong, the sample and its correct label are added to the sample database;
S103. building a weighted naive Bayes model;
S104. when a new secret-related feature is added to the secret-related feature space, or the secrecy level of an existing secret-related feature changes, performing incremental learning based on the change in the secret-related feature space;
S105. performing incremental learning according to changes in the sample database and the secret-related feature database;
S106. writing the learned model into the naive Bayes model and notifying the system to reload it.
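The feedback part of these sub-steps (S101 and S102) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name collect_misclassified and the representation of a sample as a (tokens, label) pair are hypothetical, and the label in each pair is assumed to be the one the administrator has confirmed as correct.

```python
from typing import Callable, List, Tuple

# A sample is a list of word tokens plus its confirmed secrecy label
# (hypothetical representation, not prescribed by the patent).
Sample = Tuple[List[str], str]


def collect_misclassified(
    samples: List[Sample],
    predict: Callable[[List[str]], str],
) -> List[Sample]:
    """Return the samples whose predicted label differs from the confirmed one.

    These are the samples that step S102 adds to the sample database,
    together with their correct labels.
    """
    return [(tokens, label) for tokens, label in samples
            if predict(tokens) != label]
```

The returned list would then be appended to the sample database and consumed by the incremental learning of step S105.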
Here, step S101 comprises:
First, building the naive Bayes model:
Let the sample space D of secret-related text consist of the feature space W = {w_1, w_2, …, w_n} and the class space C = {c_1, c_2, …, c_m}; the features are the words contained in the texts of the sample space D, and the class space C is the set of secrecy levels of secret-related text. For a given text d = {w_1, w_2, …, w_l}, the naive Bayes model computes the posterior probability that the text belongs to each class and uses it to decide the text's class: the class with the largest posterior probability is the detection result for the text. The discriminant is as follows:
where P(c_i) denotes the prior probability of class c_i, and P(w_j|c_i) denotes the probability that feature w_j occurs given class c_i:
where |C|, |D| and |W| denote the sizes of the class space, sample space and feature space, respectively; count(c_i) denotes the number of samples belonging to class c_i, and count(w_j∧c_i) denotes the number of samples in class c_i in which feature w_j occurs;
Second, identifying the samples that carry user-annotated labels with the naive Bayes model to obtain the recognition result of each sample.
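The formulas referenced in step S101 are not reproduced in this text. The Python sketch below shows one plausible reading, assuming the standard naive Bayes discriminant (the class maximizing P(c_i) times the product of P(w_j|c_i)) and add-one smoothing built from count(c_i), count(w_j∧c_i), |D|, |C| and |W|. The class name NaiveBayes, its method names and the exact smoothing constants are assumptions, not taken from the patent.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Document-count naive Bayes with add-one smoothing (assumed form)."""

    def __init__(self, classes, vocabulary):
        self.classes = list(classes)      # class space C
        self.vocab = set(vocabulary)      # feature space W
        self.n_docs = 0                   # |D|
        self.class_count = Counter()      # count(c_i)
        self.feat_count = defaultdict(Counter)  # feat_count[c][w] = count(w ∧ c)

    def fit(self, samples):
        """Accumulate counts from (tokens, label) pairs."""
        for tokens, label in samples:
            self.n_docs += 1
            self.class_count[label] += 1
            for w in set(tokens) & self.vocab:
                self.feat_count[label][w] += 1
        return self

    def prior(self, c):
        # Assumed smoothing: (count(c) + 1) / (|D| + |C|)
        return (self.class_count[c] + 1) / (self.n_docs + len(self.classes))

    def cond(self, w, c):
        # Assumed smoothing: (count(w ∧ c) + 1) / (count(c) + |W|)
        return (self.feat_count[c][w] + 1) / (self.class_count[c] + len(self.vocab))

    def predict(self, tokens):
        """Return the class with the largest (log) posterior score."""
        def log_posterior(c):
            return math.log(self.prior(c)) + sum(
                math.log(self.cond(w, c)) for w in tokens if w in self.vocab)
        return max(self.classes, key=log_posterior)
```

Working in log space avoids numerical underflow when a text contains many features; it does not change which class attains the maximum.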
Step S103 comprises:
First, building the weighted naive Bayes model:
λ_{j,i} denotes the weight with which the j-th feature in the feature space belongs to the i-th class; following the Bell-LaPadula model, each feature has 4 weights, corresponding to public, confidential, secret and top secret, respectively:
where TF_i(w_j) is the term frequency of text feature w_j in the texts of class c_i, and IDF_i(w_j) is the improved inverse document frequency: the more documents within the class contain the text feature and the fewer documents in other classes contain it, the larger its weight.
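The weight formula itself is not reproduced in this text, so the following Python sketch is purely illustrative. It assumes a weight of the form TF_i(w_j) × IDF_i(w_j), with a hypothetical class-aware IDF that grows with the number of in-class documents containing the feature and shrinks with the number of such documents in other classes, as the description requires. It also assumes the common weighted naive Bayes scoring in which each weight multiplies the log of its conditional probability. The function names and the exact IDF expression are assumptions.

```python
import math


def class_weight(tf_in_class, docs_in_class_with_w, docs_other_with_w):
    """Hypothetical lambda_{j,i} = TF_i(w_j) * IDF_i(w_j).

    The IDF form used here is an assumption chosen only to satisfy the
    stated property: larger when more in-class documents contain the
    feature, smaller when more documents of other classes contain it.
    """
    idf = math.log(1.0 + (docs_in_class_with_w + 1.0) / (docs_other_with_w + 1.0))
    return tf_in_class * idf


def weighted_log_posterior(log_prior, log_conds, weights):
    """Assumed weighted score: log P(c_i) + sum_j lambda_{j,i} * log P(w_j|c_i)."""
    return log_prior + sum(lam * lc for lam, lc in zip(weights, log_conds))
```

Any other expression with the same monotonic behavior would fit the description equally well; the choice above is only one convenient instance.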
Secret-related text detection is a rather special application scenario: as time passes, some keywords that were previously not secret-related may become secret-related features, while the secrecy level of some previously secret-related features gradually decreases. A learning algorithm that can adapt to such changes is therefore needed. It is easy to see that a newly added secret-related feature (for example, the code name of an operation) must carry a specified secrecy level; in other words, the confidence that this text feature belongs to that class is very high. A reduction in the secrecy level of an existing secret-related feature (for example, from secret to confidential) is similar. The present invention therefore proposes a very simple strategy to handle this. Specifically, step S104 comprises:
When a new secret-related feature is added to the secret-related feature space or the secrecy level of an existing secret-related feature changes, the case of adding a new feature is handled as follows: first, from the other features belonging to the same class as the new feature, select the feature with the largest value of P(t_j|c_i) and copy all of its information to the new feature, then re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103; next, from the other features belonging to classes different from that of the new feature, select the feature with the smallest value of P(t_j|c_i) and copy all of its information to the new feature, then again re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103;
The case in which the secrecy level of an existing secret-related feature changes is handled in the same way: first, from the other features belonging to the same class as the changed feature, select the feature with the largest value of P(t_j|c_i) and copy all of its information to the changed feature, then re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103; next, from the other features belonging to classes different from that of the changed feature, select the feature with the smallest value of P(t_j|c_i) and copy all of its information to the changed feature, then again re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103.
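The copy strategy of step S104 can be sketched in Python as follows. This is one interpretation under stated assumptions: "all information" is reduced to per-class document counts, the feature with the largest or smallest conditional probability within a class is taken to be the one with the largest or smallest count (which holds when every feature in a class shares the same smoothing denominator), and the name assign_feature is hypothetical. Re-estimating the weights and probabilities afterwards is left to whatever estimator consumes these counts.

```python
def assign_feature(feat_count, feature, target_class):
    """Give `feature` borrowed statistics in every class.

    feat_count[c][w] holds the number of class-c samples containing w.
    In the target class the statistics of the strongest other feature
    are copied; in every other class those of the weakest other feature.
    """
    for cls, counts in feat_count.items():
        # Candidate donors: every other feature known in this class.
        donors = {w: n for w, n in counts.items() if w != feature}
        if not donors:
            counts[feature] = 0
            continue
        pick = max if cls == target_class else min
        donor = pick(donors, key=donors.get)
        counts[feature] = donors[donor]
    return feat_count
```

The same function covers both cases in the text: calling it for an unseen word adds a new feature, and calling it again with a different target class models a change in the secrecy level of an existing feature.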
Step S105 comprises:
The feature weights are learned incrementally along two dimensions, the sample space and the feature space:
where TF'_i(·) and count'(·) denote the statistics computed on the incremental sample set;
The incremental learning of the feature weights yields the incremental learning results for P(c_i) and P(w_j|c_i):
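The update formulas of step S105 are not reproduced in this text. The Python sketch below illustrates the general idea along the sample dimension only, assuming that the statistics of the incremental sample set (the primed quantities) are simply added to the stored ones and that P(c_i) and P(w_j|c_i) are then recomputed with add-one smoothing. The function names, the dictionary layout and the smoothing form are assumptions.

```python
from collections import Counter


def new_stats():
    """Empty statistics: |D|, count(c_i) and count(w_j ∧ c_i)."""
    return {"n_docs": 0, "class": Counter(), "feature": Counter()}


def update_counts(stats, new_samples, vocabulary):
    """Add the counts of an incremental sample set to the stored counts."""
    for tokens, label in new_samples:
        stats["n_docs"] += 1
        stats["class"][label] += 1
        for w in set(tokens) & vocabulary:
            stats["feature"][(w, label)] += 1
    return stats


def prior(stats, c, n_classes):
    # Assumed smoothing: (count(c) + 1) / (|D| + |C|)
    return (stats["class"][c] + 1) / (stats["n_docs"] + n_classes)


def cond(stats, w, c, vocab_size):
    # Assumed smoothing: (count(w ∧ c) + 1) / (count(c) + |W|)
    return (stats["feature"][(w, c)] + 1) / (stats["class"][c] + vocab_size)
```

Because the model is fully determined by additive counts, updating with an increment gives the same statistics as retraining on the combined sample set, which is what makes this kind of incremental learning inexpensive.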
The most common feature-weight learning method is TF-IDF. However, the traditional TF-IDF weight does not take into account how a text feature is distributed across different classes and within the same class. For example, a secret-related text feature may occur frequently in one class and rarely or never in the others; or the feature may occur frequently in a small number of documents of one class (such as the secret class) while not occurring in the other texts of the same class. The weighted naive Bayes model of the present invention handles this problem better, making the learning of the naive Bayes model more reasonable and substantially improving the accuracy of secret-related text detection. At the same time, the invention allows the feature weights to be learned incrementally along the two dimensions of sample space and feature space, according to changes in the sample database and the secret-related feature database. In addition, the incremental learning based on changes in the secret-related feature space simply and effectively handles both the addition of new secret-related features and the decline in secrecy level of existing ones.
The above is a preferred embodiment of the present invention. It should be understood that the invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments; it can be used in various other combinations, modifications and environments, and can be modified within the scope contemplated herein through the above teachings or the skill or knowledge of the related art. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.

Claims (7)

1. A secret-related text recognition method based on improved naive Bayes, characterized by comprising the following steps:
S1. building a naive Bayes model and performing incremental learning;
S2. loading the naive Bayes model obtained by incremental learning;
S3. reading the text to be identified;
S4. identifying the text with the naive Bayes model and marking its corresponding secrecy level.
2. The secret-related text recognition method based on improved naive Bayes according to claim 1, characterized by further comprising a recognition result uploading step: uploading the recognition result of step S4 to a unified control center.
3. The secret-related text recognition method based on improved naive Bayes according to claim 1, characterized in that step S1 comprises the following sub-steps:
S101. building a naive Bayes model and identifying the samples that carry user-annotated labels;
S102. the administrator of the unified control center compares the recognized label with the user-annotated label; if the recognition is wrong, the sample and its correct label are added to the sample database;
S103. building a weighted naive Bayes model;
S104. when a new secret-related feature is added to the secret-related feature space, or the secrecy level of an existing secret-related feature changes, performing incremental learning based on the change in the secret-related feature space;
S105. performing incremental learning according to changes in the sample database and the secret-related feature database;
S106. writing the learned model into the naive Bayes model and notifying the system to reload it.
4. The secret-related text recognition method based on improved naive Bayes according to claim 3, characterized in that step S101 comprises:
First, building the naive Bayes model:
Let the sample space D of secret-related text consist of the feature space W = {w_1, w_2, …, w_n} and the class space C = {c_1, c_2, …, c_m}; the features are the words contained in the texts of the sample space D, and the class space C is the set of secrecy levels of secret-related text. For a given text d = {w_1, w_2, …, w_l}, the naive Bayes model computes the posterior probability that the text belongs to each class and uses it to decide the text's class: the class with the largest posterior probability is the detection result for the text. The discriminant is as follows:
where P(c_i) denotes the prior probability of class c_i, and P(w_j|c_i) denotes the probability that feature w_j occurs given class c_i:
where |C|, |D| and |W| denote the sizes of the class space, sample space and feature space, respectively; count(c_i) denotes the number of samples belonging to class c_i, and count(w_j∧c_i) denotes the number of samples in class c_i in which feature w_j occurs;
Second, identifying the samples that carry user-annotated labels with the naive Bayes model to obtain the recognition result of each sample.
5. The secret-related text recognition method based on improved naive Bayes according to claim 3, characterized in that step S103 comprises:
First, building the weighted naive Bayes model:
λ_{j,i} denotes the weight with which the j-th feature in the feature space belongs to the i-th class; following the Bell-LaPadula model, each feature has 4 weights, corresponding to public, confidential, secret and top secret, respectively:
where TF_i(w_j) is the term frequency of text feature w_j in the texts of class c_i, and IDF_i(w_j) is the improved inverse document frequency: the more documents within the class contain the text feature and the fewer documents in other classes contain it, the larger its weight.
6. The secret-related text recognition method based on improved naive Bayes according to claim 3, characterized in that step S104 comprises:
When a new secret-related feature is added to the secret-related feature space or the secrecy level of an existing secret-related feature changes, the case of adding a new feature is handled as follows: first, from the other features belonging to the same class as the new feature, select the feature with the largest value of P(t_j|c_i) and copy all of its information to the new feature, then re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103; next, from the other features belonging to classes different from that of the new feature, select the feature with the smallest value of P(t_j|c_i) and copy all of its information to the new feature, then again re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103;
The case in which the secrecy level of an existing secret-related feature changes is handled in the same way: first, from the other features belonging to the same class as the changed feature, select the feature with the largest value of P(t_j|c_i) and copy all of its information to the changed feature, then re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103; next, from the other features belonging to classes different from that of the changed feature, select the feature with the smallest value of P(t_j|c_i) and copy all of its information to the changed feature, then again re-estimate the weights λ_{j,i} and conditional probabilities P(w_j|c_i) of all features under that class according to step S103.
7. The secret-related text recognition method based on improved naive Bayes according to claim 3, characterized in that step S105 comprises:
The feature weights are learned incrementally along two dimensions, the sample space and the feature space:
where TF'_i(·) and count'(·) denote the statistics computed on the incremental sample set;
The incremental learning of the feature weights yields the incremental learning results for P(c_i) and P(w_j|c_i):
CN201811134941.9A 2018-09-28 2018-09-28 Secret-related text recognition method based on improved naive Bayes Active CN109145308B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811134941.9A CN109145308B (en) 2018-09-28 2018-09-28 Secret-related text recognition method based on improved naive Bayes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811134941.9A CN109145308B (en) 2018-09-28 2018-09-28 Secret-related text recognition method based on improved naive Bayes

Publications (2)

Publication Number Publication Date
CN109145308A true CN109145308A (en) 2019-01-04
CN109145308B CN109145308B (en) 2022-07-12

Family

ID=64813077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811134941.9A Active CN109145308B (en) 2018-09-28 2018-09-28 Secret-related text recognition method based on improved naive Bayes

Country Status (1)

Country Link
CN (1) CN109145308B (en)

Also Published As

Publication number Publication date
CN109145308B (en) 2022-07-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant