CN109145308A - Secret-related text recognition method based on improved naive Bayes - Google Patents
Secret-related text recognition method based on improved naive Bayes
- Publication number
- CN109145308A (application CN201811134941.9A / CN201811134941A; also published as CN 109145308 A)
- Authority
- CN
- China
- Prior art keywords
- feature
- secret-related
- text
- naive Bayes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
The invention discloses a secret-related text recognition method based on an improved naive Bayes model, comprising the following steps: S1. build a naive Bayes model and perform incremental learning; S2. load the naive Bayes model obtained by incremental learning; S3. read the text to be recognized; S4. recognize the text with the naive Bayes model and mark its corresponding classification level. In the present invention, the weighted naive Bayes model makes learning more reasonable, and the proposed incremental learning scheme for feature weights substantially improves the accuracy of secret-related text detection; incremental learning driven by changes in the secret-related feature space simply and effectively handles both the addition of new secret-related features and the lowering of the classification level of old ones.
Description
Technical field
The present invention relates to secret-related text recognition, and in particular to a secret-related text recognition method based on an improved naive Bayes model.
Background technique
With the development of information technology, information systems that support large-scale integrated office, research, and production workflows have gradually entered social life and the workplace, and these systems store large amounts of sensitive data and information. How to prevent classified information from leaking to the outside world through the Internet is a problem that urgently needs to be solved.
Automatic detection of secret-related text is an effective technical means to solve the above problem. Following the Bell-LaPadula model, classified information is generally divided into four grades: public, secret, confidential, and top secret. When a secret-related text circulates on the network (for example as an official document or an e-mail), the method can detect the classification level to which the text belongs. Once that level is detected, it is compared with the label declared by the user to determine whether the information flow of the secret-related text is legal. For example, if the user labels a text "public" but the automatic detection algorithm classifies it as "secret", the transfer can be judged illegal.
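The legality check described above can be sketched in a few lines of code; the function and level names here are illustrative assumptions, not part of the patent:

```python
# Hypothetical sketch: compare the user-declared level of a document with the
# level the classifier detects, using the four Bell-LaPadula grades in
# ascending order of sensitivity.
LEVELS = ["public", "secret", "confidential", "top secret"]

def flow_is_legal(user_label: str, detected_label: str) -> bool:
    """A transfer is illegal when the detected level exceeds the declared one."""
    return LEVELS.index(detected_label) <= LEVELS.index(user_label)
```

With the grades ordered from public to top secret, a document the user labeled "public" but the detector classified as "secret" fails this check, matching the example in the text.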
Naive Bayes is currently one of the mainstream approaches in the field of text detection. However, realizing automatic detection of secret-related text with naive Bayes requires solving two hard problems: (1) because of the special nature of classified documents (they cannot be inspected freely), it is difficult to obtain a complete labeled sample set with which to train the naive Bayes model; (2) the secret-related features in a text (secret-related keywords) drift over time: a keyword that was not secret-related before may become a new secret-related feature, while the classification level of a word that used to be a secret-related feature may gradually decline, and no existing method solves this problem.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and provide a secret-related text recognition method based on an improved naive Bayes model.
The object of the present invention is achieved through the following technical solution: a secret-related text recognition method based on an improved naive Bayes model, characterized by comprising the following steps:
S1. build a naive Bayes model and perform incremental learning;
S2. load the naive Bayes model obtained by incremental learning;
S3. read the text to be recognized;
S4. recognize the text with the naive Bayes model and mark its corresponding classification level.
Further, the secret-related text recognition method also includes a result-uploading step: the recognition result of step S4 is uploaded to a unified control center.
Further, step S1 includes the following sub-steps:
S101. build a naive Bayes model and recognize the samples carrying user-annotated labels;
S102. an administrator at the unified control center compares the recognized label with the user-annotated label; if the recognition is wrong, the sample and its correct label are added to the sample database;
S103. build the weighted naive Bayes model;
S104. when a new secret-related feature is added to the secret-related feature space, or the classification level of an old secret-related feature changes, perform incremental learning based on the change of the secret-related feature space;
S105. perform incremental learning according to the changes of the sample database and the secret-related feature database;
S106. write the learned model into the naive Bayes model and notify the system to reload it.
More specifically, step S101 includes:
First, build the naive Bayes model:
Let the sample space D of secret-related texts be composed of a feature space W = {w1, w2, ..., wn} and a category space C = {c1, c2, ..., cm}; the sample space D is the words contained in the texts, and the category space C is the classification levels of secret-related text. For a given text d = {w1, w2, ..., wl}, the naive Bayes model computes the posterior probability of the text belonging to each category and assigns the text to the category with the largest posterior probability; the discriminant is:

c* = argmax(ci ∈ C) P(ci) · ∏(j = 1..l) P(wj|ci)

where P(ci) is the prior probability of the category and P(wj|ci) is the probability of feature wj occurring under category ci, estimated with Laplace smoothing:

P(ci) = (1 + count(ci)) / (|C| + |D|),   P(wj|ci) = (1 + count(wj∧ci)) / (|W| + count(ci))

where |C|, |D| and |W| are the sizes of the category space, sample space and feature space respectively; count(ci) is the number of samples belonging to category ci, and count(wj∧ci) is the number of samples of category ci that contain feature wj.
Second, recognize the samples carrying user-annotated labels with the naive Bayes model and obtain the recognition result of each sample.
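As a concrete illustration of the naive Bayes model of step S101, the following is a minimal sketch assuming Laplace-smoothed estimates of P(ci) and P(wj|ci); all class and variable names are mine, not the patent's:

```python
import math
from collections import Counter, defaultdict

# Minimal sketch of the naive Bayes model of step S101 (names are assumptions).
# Priors and conditionals use the Laplace-smoothed counts count(c_i),
# count(w_j ∧ c_i), |C|, |D|, |W| described in the text.

class NaiveBayes:
    def __init__(self):
        self.doc_count = defaultdict(int)       # count(c_i): docs per class
        self.feat_count = defaultdict(Counter)  # count(w_j ∧ c_i): docs of c_i containing w_j
        self.vocab = set()                      # feature space W
        self.n_docs = 0                         # |D|

    def fit(self, samples):
        """samples: iterable of (word_list, label) pairs."""
        for words, label in samples:
            self.n_docs += 1
            self.doc_count[label] += 1
            for w in set(words):
                self.feat_count[label][w] += 1
                self.vocab.add(w)

    def predict(self, words):
        """Return the class with the largest (log) posterior."""
        n_classes, n_feats = len(self.doc_count), len(self.vocab)

        def log_posterior(c):
            lp = math.log((1 + self.doc_count[c]) / (n_classes + self.n_docs))
            for w in words:
                lp += math.log((1 + self.feat_count[c][w]) / (n_feats + self.doc_count[c]))
            return lp

        return max(self.doc_count, key=log_posterior)
```

Log-probabilities are used instead of the raw product to avoid numerical underflow on long texts; the argmax is unchanged.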
Step S103 includes:
First, build the weighted naive Bayes model:
λj,i denotes the weight with which the j-th feature of the feature space belongs to the i-th category; following the Bell-LaPadula model, each feature has four weights, corresponding to public, secret, confidential, and top secret:

λj,i = TFi(wj) · IDFi(wj)

where TFi(wj) is the term frequency of text feature wj in the texts of category ci, and IDFi(wj) is an improved inverse document frequency: the more documents of the class contain the feature and the fewer documents of other classes contain it, the larger the weight.
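A rough sketch of the per-class weights λj,i follows. The patent's exact "improved IDF" formula is not spelled out in this text, so the idf expression below is an assumption that merely mirrors the stated property: the weight grows with the in-class term frequency TFi(wj) and shrinks with the number of out-of-class documents containing the feature.

```python
import math
from collections import defaultdict

# Sketch of per-class feature weights lambda[j][i] of step S103.
# The idf form below is an ASSUMED variant, not the patent's exact formula.

def class_weights(docs_by_class):
    """docs_by_class: {label: [token lists]} -> {(word, label): weight}."""
    tf = defaultdict(lambda: defaultdict(int))  # TF_i(w_j): term freq per class
    df = defaultdict(lambda: defaultdict(int))  # document freq per class
    for c, docs in docs_by_class.items():
        for words in docs:
            for w in set(words):
                df[w][c] += 1
            for w in words:
                tf[w][c] += 1

    # number of documents outside each class
    n_other = {c: sum(len(d) for cc, d in docs_by_class.items() if cc != c)
               for c in docs_by_class}
    weights = {}
    for w, per_class in tf.items():
        for c, t in per_class.items():
            out_df = sum(df[w][cc] for cc in docs_by_class if cc != c)
            idf = math.log((n_other[c] + 1) / (out_df + 1)) + 1  # assumed form
            weights[(w, c)] = t * idf
    return weights
```

Under this form, a feature that appears often inside one class but in no documents of the other classes receives a markedly larger weight than a feature spread evenly across classes, which is the behavior the text describes.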
Step S104 includes:
When a new secret-related feature is added to the secret-related feature space, or the classification level of an old secret-related feature changes:
For the addition of a new feature: first select, among the other features of the same category as the new feature, the feature with the largest P(tj|ci) value, copy all of its information to the new feature, and re-estimate the weights λj,i and conditional probabilities P(wj|ci) of all features under that category according to step S103; then select, among the features of every other category, the feature with the smallest P(tj|ci) value, copy all of its information to the new feature, and again re-estimate the weights λj,i and conditional probabilities P(wj|ci) of all features under those categories according to step S103.
The case of a change in the classification level of an old secret-related feature is handled in the same way: first select, among the other features of the same category as the changed feature, the feature with the largest P(tj|ci) value, copy all of its information to the changed feature, and re-estimate the weights λj,i and conditional probabilities P(wj|ci) of all features under that category according to step S103; then select, among the features of the other categories, the feature with the smallest P(tj|ci) value, copy all of its information to the changed feature, and again re-estimate according to step S103.
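The copy-and-re-estimate strategy of step S104 can be sketched as follows; the model layout (cond_prob, stats, reestimate) is an assumed data structure, not the patent's:

```python
# Sketch of the copy strategy of step S104 (model layout is an assumption).
# A new feature inherits the statistics of the strongest feature of its own
# class (largest P(t_j|c_i)) and of the weakest feature of every other class
# (smallest P(t_j|c_i)); weights and conditionals are then re-estimated.

def add_new_feature(model, new_feat, target_class):
    for c, probs in model.cond_prob.items():   # {class: {feature: P(w|c)}}
        if c == target_class:
            donor = max(probs, key=probs.get)  # strongest in its own class
        else:
            donor = min(probs, key=probs.get)  # weakest in other classes
        model.stats[(new_feat, c)] = dict(model.stats[(donor, c)])
        probs[new_feat] = probs[donor]
    model.reestimate()  # recompute lambda_{j,i} and P(w_j|c_i) as in step S103
```

The changed-level case of S104 follows the same pattern, with the changed feature taking the place of new_feat.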
Step S105 includes:
The feature weights realize incremental learning along two dimensions, the sample space and the feature space: the statistics TF′i(·) and count′(·) computed on the sample increment set are merged into the base statistics, e.g.

TFi(wj) ← TFi(wj) + TF′i(wj),   count(ci) ← count(ci) + count′(ci)

and the incremental learning based on the feature weights then yields the incremental-learning results for P(ci) and P(wj|ci) by re-deriving them from the merged statistics.
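Step S105's merge of increment statistics into the base statistics might look like this minimal sketch (the dictionary layout is an assumption):

```python
from collections import Counter

# Sketch of step S105 (dictionary layout is an assumption): statistics from
# the sample increment set -- the count'() and TF'() of the text -- are folded
# into the base statistics, so P(c_i) and P(w_j|c_i) can be re-derived from
# the merged counts instead of relearning from scratch.

def merge_increment(base, inc):
    """Both arguments: {'n': |D|, 'docs': Counter per class, 'feat': {class: Counter}}."""
    base['n'] += inc['n']
    base['docs'].update(inc['docs'])  # Counter.update adds counts
    for c, ctr in inc['feat'].items():
        base['feat'].setdefault(c, Counter()).update(ctr)
    return base
```

Because the smoothed estimates in step S101 are pure functions of these counts, re-deriving P(ci) and P(wj|ci) after the merge gives the same result as retraining on the union of the two sample sets, at the cost of scanning only the increment.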
The beneficial effects of the present invention are: the weighted naive Bayes model makes learning more reasonable, and the proposed incremental learning scheme for feature weights substantially improves the accuracy of secret-related text detection; incremental learning based on changes of the secret-related feature space simply and effectively solves the problems of newly added secret-related features and of old secret-related features whose classification level declines.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a flowchart of the incremental learning of the naive Bayes model.
Specific embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to what is described below.
As shown in Fig. 1, a secret-related text recognition method based on an improved naive Bayes model comprises the following steps:
S1. build a naive Bayes model and perform incremental learning;
S2. load the naive Bayes model obtained by incremental learning;
S3. read the text to be recognized;
S4. recognize the text with the naive Bayes model and mark its corresponding classification level.
In embodiments of the present application, the secret-related text recognition method also includes a result-uploading step: the recognition result of step S4 is uploaded to a unified control center.
As shown in Fig. 2, step S1 includes the following sub-steps:
S101. build a naive Bayes model and recognize the samples carrying user-annotated labels;
S102. an administrator at the unified control center compares the recognized label with the user-annotated label; if the recognition is wrong, the sample and its correct label are added to the sample database;
S103. build the weighted naive Bayes model;
S104. when a new secret-related feature is added to the secret-related feature space, or the classification level of an old secret-related feature changes, perform incremental learning based on the change of the secret-related feature space;
S105. perform incremental learning according to the changes of the sample database and the secret-related feature database;
S106. write the learned model into the naive Bayes model and notify the system to reload it.
Wherein, step S101 includes:
First, build the naive Bayes model:
Let the sample space D of secret-related texts be composed of a feature space W = {w1, w2, ..., wn} and a category space C = {c1, c2, ..., cm}; the sample space D is the words contained in the texts, and the category space C is the classification levels of secret-related text. For a given text d = {w1, w2, ..., wl}, the naive Bayes model computes the posterior probability of the text belonging to each category and assigns the text to the category with the largest posterior probability; the discriminant is:

c* = argmax(ci ∈ C) P(ci) · ∏(j = 1..l) P(wj|ci)

where P(ci) is the prior probability of the category and P(wj|ci) is the probability of feature wj occurring under category ci, estimated with Laplace smoothing:

P(ci) = (1 + count(ci)) / (|C| + |D|),   P(wj|ci) = (1 + count(wj∧ci)) / (|W| + count(ci))

where |C|, |D| and |W| are the sizes of the category space, sample space and feature space respectively; count(ci) is the number of samples belonging to category ci, and count(wj∧ci) is the number of samples of category ci that contain feature wj.
Second, recognize the samples carrying user-annotated labels with the naive Bayes model and obtain the recognition result of each sample.
Step S103 includes:
First, build the weighted naive Bayes model:
λj,i denotes the weight with which the j-th feature of the feature space belongs to the i-th category; following the Bell-LaPadula model, each feature has four weights, corresponding to public, secret, confidential, and top secret:

λj,i = TFi(wj) · IDFi(wj)

where TFi(wj) is the term frequency of text feature wj in the texts of category ci, and IDFi(wj) is an improved inverse document frequency: the more documents of the class contain the feature and the fewer documents of other classes contain it, the larger the weight.
Secret-related text detection is a very special application scenario: as time passes, certain keywords that were not secret-related before may become secret-related features, while the classification level of features that used to be secret-related may gradually decline. A learning algorithm that can adapt to this kind of change is therefore needed. Clearly, a newly added secret-related feature must come with a specified classification level (for example the code name of an operation); in other words, the confidence that this text feature belongs to that category is very high. The case where the classification level of an old secret-related feature is lowered (for example downgraded from confidential to secret) is similar. The present invention therefore proposes a very simple strategy; specifically, step S104 includes:
When a new secret-related feature is added to the secret-related feature space, or the classification level of an old secret-related feature changes:
For the addition of a new feature: first select, among the other features of the same category as the new feature, the feature with the largest P(tj|ci) value, copy all of its information to the new feature, and re-estimate the weights λj,i and conditional probabilities P(wj|ci) of all features under that category according to step S103; then select, among the features of every other category, the feature with the smallest P(tj|ci) value, copy all of its information to the new feature, and again re-estimate the weights λj,i and conditional probabilities P(wj|ci) of all features under those categories according to step S103.
The case of a change in the classification level of an old secret-related feature is handled in the same way: first select, among the other features of the same category as the changed feature, the feature with the largest P(tj|ci) value, copy all of its information to the changed feature, and re-estimate the weights λj,i and conditional probabilities P(wj|ci) of all features under that category according to step S103; then select, among the features of the other categories, the feature with the smallest P(tj|ci) value, copy all of its information to the changed feature, and again re-estimate according to step S103.
Step S105 includes:
The feature weights realize incremental learning along two dimensions, the sample space and the feature space: the statistics TF′i(·) and count′(·) computed on the sample increment set are merged into the base statistics, e.g.

TFi(wj) ← TFi(wj) + TF′i(wj),   count(ci) ← count(ci) + count′(ci)

and the incremental learning based on the feature weights then yields the incremental-learning results for P(ci) and P(wj|ci) by re-deriving them from the merged statistics.
The most common feature-weight learning method is TF-IDF, but the traditional TF-IDF weight does not consider how a text feature is distributed across different categories and within the same category. For example, a secret-related text feature may occur many times in one category yet rarely or never in the others; or the feature may occur many times in a small number of documents of one category (such as the secret class) and not at all in the other texts of that category. The weighted naive Bayes model of the present invention handles this better, making the learning of the naive Bayes model more reasonable and substantially improving the accuracy of secret-related text detection. At the same time, the present invention lets the feature weights realize incremental learning along the two dimensions of sample space and feature space according to the changes of the sample database and the secret-related feature database. In addition, the incremental learning based on changes of the secret-related feature space simply and effectively solves the problems of newly added secret-related features and of old secret-related features whose classification level declines.
The above is a preferred embodiment of the present invention. It should be understood that the present invention is not limited to the forms described herein, which should not be viewed as excluding other embodiments; the invention can be used in other combinations, modifications and environments, and can be modified within the scope contemplated herein through the above teachings or through the technology or knowledge of the related field. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the scope of protection of the appended claims.
Claims (7)
1. A secret-related text recognition method based on an improved naive Bayes model, characterized by comprising the following steps:
S1. build a naive Bayes model and perform incremental learning;
S2. load the naive Bayes model obtained by incremental learning;
S3. read the text to be recognized;
S4. recognize the text with the naive Bayes model and mark its corresponding classification level.
2. The secret-related text recognition method based on an improved naive Bayes model according to claim 1, characterized in that it further includes a result-uploading step: the recognition result of step S4 is uploaded to a unified control center.
3. The secret-related text recognition method based on an improved naive Bayes model according to claim 1, characterized in that step S1 includes the following sub-steps:
S101. build a naive Bayes model and recognize the samples carrying user-annotated labels;
S102. an administrator at the unified control center compares the recognized label with the user-annotated label; if the recognition is wrong, the sample and its correct label are added to the sample database;
S103. build the weighted naive Bayes model;
S104. when a new secret-related feature is added to the secret-related feature space, or the classification level of an old secret-related feature changes, perform incremental learning based on the change of the secret-related feature space;
S105. perform incremental learning according to the changes of the sample database and the secret-related feature database;
S106. write the learned model into the naive Bayes model and notify the system to reload it.
4. The secret-related text recognition method based on an improved naive Bayes model according to claim 3, characterized in that step S101 includes:
first, building the naive Bayes model:
let the sample space D of secret-related texts be composed of a feature space W = {w1, w2, ..., wn} and a category space C = {c1, c2, ..., cm}; the sample space D is the words contained in the texts, and the category space C is the classification levels of secret-related text; for a given text d = {w1, w2, ..., wl}, the naive Bayes model computes the posterior probability of the text belonging to each category and assigns the text to the category with the largest posterior probability, the discriminant being

c* = argmax(ci ∈ C) P(ci) · ∏(j = 1..l) P(wj|ci)

where P(ci) is the prior probability of the category and P(wj|ci) is the probability of feature wj occurring under category ci, estimated with Laplace smoothing as

P(ci) = (1 + count(ci)) / (|C| + |D|),   P(wj|ci) = (1 + count(wj∧ci)) / (|W| + count(ci))

where |C|, |D| and |W| are the sizes of the category space, sample space and feature space respectively, count(ci) is the number of samples belonging to category ci, and count(wj∧ci) is the number of samples of category ci that contain feature wj;
second, recognizing the samples carrying user-annotated labels with the naive Bayes model and obtaining the recognition result of each sample.
5. The secret-related text recognition method based on an improved naive Bayes model according to claim 3, characterized in that step S103 includes:
first, building the weighted naive Bayes model:
λj,i denotes the weight with which the j-th feature of the feature space belongs to the i-th category; following the Bell-LaPadula model, each feature has four weights, corresponding to public, secret, confidential, and top secret:

λj,i = TFi(wj) · IDFi(wj)

where TFi(wj) is the term frequency of text feature wj in the texts of category ci, and IDFi(wj) is an improved inverse document frequency: the more documents of the class contain the feature and the fewer documents of other classes contain it, the larger the weight.
6. The secret-related text recognition method based on an improved naive Bayes model according to claim 3, characterized in that step S104 includes:
when a new secret-related feature is added to the secret-related feature space, or the classification level of an old secret-related feature changes:
for the addition of a new feature: first selecting, among the other features of the same category as the new feature, the feature with the largest P(tj|ci) value, copying all of its information to the new feature, and re-estimating the weights λj,i and conditional probabilities P(wj|ci) of all features under that category according to step S103; then selecting, among the features of every other category, the feature with the smallest P(tj|ci) value, copying all of its information to the new feature, and again re-estimating the weights λj,i and conditional probabilities P(wj|ci) of all features under those categories according to step S103;
the case of a change in the classification level of an old secret-related feature being handled in the same way: first selecting, among the other features of the same category as the changed feature, the feature with the largest P(tj|ci) value, copying all of its information to the changed feature, and re-estimating the weights λj,i and conditional probabilities P(wj|ci) of all features under that category according to step S103; then selecting, among the features of the other categories, the feature with the smallest P(tj|ci) value, copying all of its information to the changed feature, and again re-estimating according to step S103.
7. The secret-related text recognition method based on an improved naive Bayes model according to claim 3, characterized in that step S105 includes:
the feature weights realizing incremental learning along the two dimensions of sample space and feature space, the statistics TF′i(·) and count′(·) computed on the sample increment set being merged into the base statistics, e.g.

TFi(wj) ← TFi(wj) + TF′i(wj),   count(ci) ← count(ci) + count′(ci)

after which the incremental learning based on the feature weights yields the incremental-learning results for P(ci) and P(wj|ci).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811134941.9A CN109145308B (en) | 2018-09-28 | 2018-09-28 | Secret-related text recognition method based on improved naive Bayes |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811134941.9A CN109145308B (en) | 2018-09-28 | 2018-09-28 | Secret-related text recognition method based on improved naive Bayes |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109145308A true CN109145308A (en) | 2019-01-04 |
CN109145308B CN109145308B (en) | 2022-07-12 |
Family
ID=64813077
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811134941.9A Active CN109145308B (en) | 2018-09-28 | 2018-09-28 | Secret-related text recognition method based on improved naive Bayes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109145308B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111783873A (en) * | 2020-06-30 | 2020-10-16 | 中国工商银行股份有限公司 | Incremental naive Bayes model-based user portrait method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000026795A1 (en) * | 1998-10-30 | 2000-05-11 | Justsystem Pittsburgh Research Center, Inc. | Method for content-based filtering of messages by analyzing term characteristics within a message |
CN107480123A (en) * | 2017-06-28 | 2017-12-15 | 武汉斗鱼网络科技有限公司 | A kind of recognition methods, device and the computer equipment of rubbish barrage |
CN107908649A (en) * | 2017-10-11 | 2018-04-13 | 北京智慧星光信息技术有限公司 | A kind of control method of text classification |
-
2018
- 2018-09-28 CN CN201811134941.9A patent/CN109145308B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000026795A1 (en) * | 1998-10-30 | 2000-05-11 | Justsystem Pittsburgh Research Center, Inc. | Method for content-based filtering of messages by analyzing term characteristics within a message |
CN107480123A (en) * | 2017-06-28 | 2017-12-15 | 武汉斗鱼网络科技有限公司 | A kind of recognition methods, device and the computer equipment of rubbish barrage |
CN107908649A (en) * | 2017-10-11 | 2018-04-13 | 北京智慧星光信息技术有限公司 | A kind of control method of text classification |
Non-Patent Citations (3)
Title |
---|
HAN JOON KIM et al.: "Integrating Incremental Feature Weighting into Naïve Bayes Text Classifier", 2007 International Conference on Machine Learning and Cybernetics *
HOU KAI: "Research on Chinese Text Classification Based on Weighted Bayesian Incremental Learning", China Master's Theses Full-text Database *
RAO LILI et al.: "Improved Weighted Naive Bayes Classification Algorithm Based on Feature Correlation", Journal of Xiamen University (Natural Science) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111783873A (en) * | 2020-06-30 | 2020-10-16 | 中国工商银行股份有限公司 | Incremental naive Bayes model-based user portrait method and device |
CN111783873B (en) * | 2020-06-30 | 2023-08-25 | 中国工商银行股份有限公司 | User portrait method and device based on increment naive Bayes model |
Also Published As
Publication number | Publication date |
---|---|
CN109145308B (en) | 2022-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200279105A1 (en) | Deep learning engine and methods for content and context aware data classification | |
Hashemi et al. | Query intent detection using convolutional neural networks | |
Sebastiani | Text categorization | |
Liu et al. | Adaptive co-training SVM for sentiment classification on tweets | |
Halgaš et al. | Catching the Phish: Detecting phishing attacks using recurrent neural networks (RNNs) | |
US10637826B1 (en) | Policy compliance verification using semantic distance and nearest neighbor search of labeled content | |
US20110004573A1 (en) | Identifying training documents for a content classifier | |
CN103455545A (en) | Location estimation of social network users | |
CN111758098B (en) | Named entity identification and extraction using genetic programming | |
Akhter et al. | Supervised ensemble learning methods towards automatically filtering Urdu fake news within social media | |
CN110990676A (en) | Social media hotspot topic extraction method and system | |
CN110532390A (en) | A kind of news keyword extracting method based on NER and Complex Networks Feature | |
CN106294861B (en) | Method and system for aggregating and displaying texts in large-scale-data-oriented intelligence channels | |
CN110321707A (en) | A kind of SQL injection detection method based on big data algorithm | |
CN114595689A (en) | Data processing method, data processing device, storage medium and computer equipment | |
CN109145308A (en) | Secret-related text recognition method based on improved naive Bayes | |
CN109543038A (en) | A kind of sentiment analysis method applied to text data | |
US20230281306A1 (en) | System and method for detecting leaked documents on a computer network | |
Prilepok et al. | Spam detection using data compression and signatures | |
Chai et al. | Automatically measuring the quality of user generated content in forums | |
CN116578708A (en) | Paper data name disambiguation algorithm based on graph neural network | |
CN111368092A (en) | Knowledge graph construction method based on trusted webpage resources | |
Jahnavi et al. | A cogitate study on text mining | |
CN107491424B (en) | Chinese document gene matching method based on multi-weight system | |
CN112434126B (en) | Information processing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |