CN103870758A - Classified information security classification affiliation method based on word classification combined judgment and probability statistics - Google Patents

Info

Publication number
CN103870758A
Authority
CN
China
Prior art keywords
word
level
confidentiality
article
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410103973.8A
Other languages
Chinese (zh)
Other versions
CN103870758B (en
Inventor
陈建
欧阳国华
杨兴
李楠
史章军
向音
吕慧芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201410103973.8A priority Critical patent/CN103870758B/en
Publication of CN103870758A publication Critical patent/CN103870758A/en
Application granted granted Critical
Publication of CN103870758B publication Critical patent/CN103870758B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/604Tools and structures for managing or administering access control systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for assigning classification levels to classified information, based on combined word-category judgment and probability statistics. The method simulates how a person learns to assign classification levels: a classification-condition database and a word-category database are built and, following the secrecy regulations, each word category in a combination is treated as a necessary condition, so that the classification conditions can be compared with the word combinations of the article under analysis to determine its level. The computer analyzes the sentence content of the article, ignores grammatical expression, abstracts each sentence into a logical combination of words, and determines the article's level by comparison with the combination conditions of the secrecy regulations. This provides a feasible basis for deciding objectively and quickly whether an article is classified and at what level.

Description

Classification-level attribution method for classified information based on combined word-category judgment and probability statistics
Technical field
The present invention relates to techniques for assigning classification levels to classified information, and specifically to a classification-level attribution method based on combined word-category judgment and probability statistics.
Background
Traditional document classification lacks effective technical means for identifying classification levels. The boundaries between levels are judged imprecisely, and the work is highly subjective. For documents with similar content, different reviewers use different approaches, perspectives, and criteria, so their conclusions may differ, which seriously undermines the rigor and authority of classification-level determination in our armed forces.
After years of construction, China's information infrastructure has reached considerable scale, and most government and military departments have built systems such as WWW, FTP, DNS, email, and OA. In practice, many office staff have formed the habit of writing, saving, and transmitting files in electronic document formats (such as WORD, PPT, and TXT). Electronic files have become an important information carrier and means of transmission for military departments and other organizations. Informatization has clearly made routine work more convenient and greatly improved efficiency. However, along with the convenience that computers bring, information security problems have emerged that now attract wide attention. Because government and the military handle a large amount of classified information in their operations, guaranteeing normal operation and information security requires that classified information be assigned to levels accurately and effectively, so that the scope of circulation of information can be regulated. Using computer technology to overcome the single-method, highly subjective nature of current level determination, to provide a scientific basis for level assessment, to improve its efficiency, and to make classification grading digital, information categorization electronic, and decision support intelligent has become an urgent problem.
Summary of the invention
The technical problem to be solved by the invention is that current classification-level determination relies on a single method and is highly subjective. The aims are to provide a scientific basis for level assessment, to improve its efficiency, and to make classification grading digital and decision support intelligent. To this end, a classification-level attribution method for classified information based on combined word-category judgment and probability statistics is provided.
The classification-level attribution method based on combined word-category judgment and probability statistics is characterized in that the following steps are carried out in order:
First step: build the classification-condition database:
Analyze the secrecy regulations one by one and collect articles relevant to each regulation. Summarize the classified information in each article that corresponds to the regulation into related words and related-word combinations that act as necessary conditions. For each entry, logically associate the related words and their combinations, the condition categories involved, the field involved, and the corresponding regulation number, and enter them into the classification-condition database. Entries are collected separately in three sub-databases: the "top-secret condition database", the "secret condition database", and the "confidential condition database";
Second step: build and enrich the word-category database:
(1) In the articles relevant to each secrecy regulation, tally all combinations of necessary conditions that the regulation involves, and summarize them into a number of major condition classes;
(2) Analyze each major condition class and determine the set of word categories it contains; decompose each word category step by step into subsets until the sets cannot be divided further;
(3) Analyze each terminal subset, list its representative words or phrases, and build the word-category database according to the level-by-level subordination relations;
(4) Read words or phrases from the word-category database and, using a word-grabbing technique, scan existing classified articles based on the representative words or phrases; grab concrete vocabulary according to the categories stored in the database, discard erroneous and misspelled words, and enrich the word-category database;
Third step: preliminary level determination for an article whose level is to be determined:
(1) Scan the paragraphs or sentences of the article; use regular expressions to express the information already categorized in the word-category database and, based on these features, extract the words in the article's sentences that match the database;
(2) Look up the word-category database to determine the category each word belongs to;
(3) Determine the combination of categories in each sentence or paragraph;
(4) Determine whether the combination of categories in a sentence or paragraph fully satisfies any combination condition in the classification-condition database. If it does, the level of that sentence or paragraph is the level of the sub-database containing the condition. The article's level is the highest level among all its sentences and paragraphs, with levels ordered top secret > secret > confidential. If no classified condition is met, the article is not classified;
Fourth step: final level determination. When an article contains multiple passages that meet secret or confidential conditions, whether to upgrade the article's level is decided as follows:
(1) For each classified field i, collect m articles in that field (m ≥ 500) and analyze the classified information directly related to the field. Suppose the level was raised in k of these articles. The minimum number of classified items needed for an article's level to rise in this field is $b_i$,
$$b_i = \min_{1 \le j \le k} \{a_{ij}\},$$
that is, the smallest number of classified items per article among the k upgraded articles in field i, where $a_{ij}$ is the number of classified items in the j-th such article and j runs over 1 to k;
(2) For a classified document whose preliminary level is not top secret, compute
$$\alpha = \frac{c_i}{b_i},$$
where $c_i$ is the number of classified items in field i in the article. When the upgrade condition α ≥ 1 holds, the article's level is judged to be upgraded.
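The two quantities defined in the fourth step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample counts are hypothetical and do not come from the patent; only the definitions of b_i (a minimum over the upgraded articles) and α (the ratio c_i / b_i) follow the text.

```python
from typing import Sequence


def min_upgrade_count(counts_in_upgraded_articles: Sequence[int]) -> int:
    """b_i: the smallest number of classified items found in any of the
    k articles of field i whose level was raised."""
    if not counts_in_upgraded_articles:
        raise ValueError("need at least one upgraded article")
    return min(counts_in_upgraded_articles)


def upgrade_ratio(c_i: int, b_i: int) -> float:
    """alpha = c_i / b_i for one article in field i."""
    if b_i <= 0:
        raise ValueError("b_i must be a positive integer")
    return c_i / b_i


# Hypothetical sample: item counts a_ij for four upgraded articles in one field.
a_i = [4, 2, 5, 3]
b_i = min_upgrade_count(a_i)      # smallest count among upgraded articles
alpha = upgrade_ratio(3, b_i)     # an article with c_i = 3 classified items
is_upgraded = alpha >= 1          # upgrade condition from step (2)
```

In practice the list `a_i` would be derived from the m ≥ 500 manually reviewed articles of the field rather than typed in by hand.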
Further, in the third step, if the preliminary level of an article does not match the level set manually, new classified vocabulary or new word-category combinations must be added to the word-category database; when adding them, the new words, their categories, and the specific secrecy regulation are associated with one another.
In an optimized scheme, an optimal error rate β is set in the fourth step for the case α < 1: when 1 − β ≤ α < 1, the overall level of the article is also treated as upgraded. The value of β is calculated as $\beta = \left(\frac{\sqrt{5}-1}{2}\right)^{4} \approx (0.618)^{4} = 14.58659418\%$.
By analyzing the sentence content of an article by computer, the invention ignores the grammatical expression of sentences, abstracts each sentence into a logical combination of word categories, and compares it against the combination conditions of the secrecy regulations to determine the level of the article's classified information.
Building the classification-condition database simulates how a person learns from existing classified articles according to the secrecy regulations. Classified conditions are abstracted into word combinations that form combination conditions within different fields, and each serves as a necessary condition: a passage whose word categories satisfy the necessary conditions of a combination within a field is judged to meet the classified condition. Combination conditions can be revised manually in the back end.
Whether the classification conditions are met is not decided by simply comparing the article under evaluation with classified keywords. The key is to abstract the secrecy regulations into combinations of conditions within each field. Within a field, each category of necessary condition forms its own data set. When the grabbed words or phrases satisfy every category in a combination of necessary conditions, that combination meets the classified condition. These combination conditions each belong to a particular field, and only a combination satisfied within a single field counts as one classified item. Together these conditions form the classification-condition database. This makes the analysis rigorous and feasible, and overcomes an intelligent-analysis barrier that large foreign software companies have studied for a long time without breaking through.
For articles containing multiple items of secret or confidential information, an evaluation method for upgrading the level is provided, which likewise learns from the manual upgrading process. In each specific classified field, the rule that permits an upgrade is summarized; an article whose number of classified items exceeds the limit set by this rule may be upgraded, and articles within the optimal error rate are treated as upgradable.
The whole evaluation process is rigorous, reliable, fast, and objective, and provides a practicable way for computers to grade large batches of classified articles automatically, objectively, and quickly.
Brief description of the drawings
Fig. 1 is a flowchart of building the classification-condition database and the word-category database,
Fig. 2 is a structural diagram of classified-information attribution in the word-category database,
Fig. 3 is a schematic diagram of entering classified information into the word-category database,
Fig. 4 is a flowchart of determining the classification level of an article,
Fig. 5 is a diagram of the coding and structure of the word-category database,
Fig. 6 is a schematic diagram of classification-condition combinations.
Embodiment
The invention is further described below with reference to an embodiment and the drawings:
The general idea of the invention is "enrich the group with individuals, judge individuals by the group, and determine level attribution based on probability statistics". Regulations are written in highly general, abstract terms. People can understand their clauses because, through earlier contact with classified documents, they have learned by experience how secrecy clauses relate to concrete wording. For a computer to determine the level of an article, this human cognitive process must be converted into a form the computer can process. Achieving this requires the following three stages.
1. Empirical learning stage:
1.1 Based on article titles, collect a large number of articles that fall under a given secrecy regulation.
1.2 Manually analyze why each article was assigned its level, that is, identify its classified information.
1.3 Manually decompose the classified information into words and word combinations.
1.4 Repeat steps 1.1 to 1.3 for each secrecy regulation in turn.
1.5 Categorize the words and word combinations.
1.6 Build the word-category database.
Using the categorized words, build the word-category database, then use computer word-grabbing to find more similar words and enrich the database. Word categorization here means decomposing the word classes of classified information, according to the secrecy regulations, into place, weapon, building, tool, subject, unit, behavior, and direction categories, and so on. The structure of classified-information attribution in the word-category database is shown in Fig. 2, batch entry of classified information in Fig. 3, and the coding and structure of the database in Fig. 5.
Let a = subject-category words, b = behavior-category words, c = place-category words, d = weapon-category words, e = building-category words, f = tool-category words, g = direction-category words, and h = quantity-category words.
1.7 Build the classification-condition database. Using the word-category database, abstract word combinations into word-category combinations, and associate each word-category combination with a specific clause of the secrecy regulations. The initial construction of the word-category database and the classification-condition database is shown in Fig. 1. Classification-condition combinations are shown in Fig. 6.
Let all information combinations in the classification-condition database be $\Phi = g(x)$, $x \in \{x_1, x_2, x_3\}$, where $x_1$ denotes confidential, $x_2$ denotes secret, and $x_3$ denotes top secret. That is, $g(x_1)$ is the set of combination expressions of all confidential information in the database, $g(x_2)$ that of all secret information, and $g(x_3)$ that of all top-secret information.
The specific procedure is:
Step 1: analyze the secrecy regulations, collect relevant articles, and abstract the classified information in each article that corresponds to a regulation into a combination of necessary conditions. Enter these combinations and the corresponding regulation numbers into the classification-condition database and set up the logical associations, as shown in process 1 of Fig. 1. This database contains three sub-databases, the "top-secret condition database", the "secret condition database", and the "confidential condition database", as shown in Fig. 6.
For example, "planning policy for a strategy or campaign" can be parsed into a combination of necessary conditions such as time, place, subject, weapon, and behavior; the regulation and the combination of necessary conditions are then entered into the database.
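One possible in-memory layout for the classification-condition database is sketched below. It is not the patent's actual schema: the class and field names, the regulation number "REG-001", and the choice of top secret for the sample entry are all assumptions made for illustration. Only the three sub-databases and the idea that every category in a combination is a necessary condition are taken from the text.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, FrozenSet, List


class Level(IntEnum):
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


@dataclass(frozen=True)
class Condition:
    regulation_id: str                    # hypothetical regulation number
    field: str                            # classified field the condition belongs to
    required_categories: FrozenSet[str]   # every category is a necessary condition
    level: Level


# Three sub-databases, one per classification level.
condition_db: Dict[Level, List[Condition]] = {level: [] for level in Level}


def add_condition(cond: Condition) -> None:
    """File the condition in the sub-database for its level."""
    condition_db[cond.level].append(cond)


# The "strategy or campaign planning" example; its level here is assumed.
add_condition(Condition(
    regulation_id="REG-001",
    field="strategy and campaign",
    required_categories=frozenset({"time", "place", "subject", "weapon", "behavior"}),
    level=Level.TOP_SECRET,
))
```

Using an ordered enumeration for the levels means the later "highest level wins" rule can be expressed with a plain `max`.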
Step 2: tally the necessary conditions involved in all regulations and abstract them into a number of major condition classes, as shown in processes 2 and 3 of Fig. 1.
For example: time, place, subject, weapon, behavior, quantity, verb, and language categories, as shown in process 1 of Fig. 2.
Step 3: analyze each major condition class and determine the set of word categories it contains, as shown in process 4 of Fig. 1.
For example, the "weapon category" can be decomposed into "land-warfare weapon category", "air-warfare weapon category", "naval-warfare weapon category", and so on, as shown in process 2 of Fig. 2.
Step 4: decompose a word category into subsets of word categories; if a subset can be decomposed further, continue until the sets cannot be divided again, as shown in processes 5, 6, and 7 of Fig. 1.
For example, the "naval-warfare weapon category" can be further decomposed into "submarine category", "destroyer category", "frigate category", "aircraft carrier", "comprehensive carrier", and so on, and the "frigate category" can in turn be subdivided into "Jiangdong class", "Jiangkai I type", and so on, as shown in processes 3 and 4 of Fig. 2.
Step 5: analyze each subset and list its representative words or phrases, as shown in process 8 of Fig. 1.
For example, the representative phrases in "Jiangkai I type" are "529 warship", "054A", or "Zhoushan", as shown in process 5 of Fig. 2.
Step 6: build the word-category database according to the subordination relation "major condition class, word-category set, word-category subset, representative word or phrase", as shown in process 9 of Fig. 1 and in Fig. 5.
Step 7: read the representative words or phrases from the word-category database, as shown in process 1 of Fig. 3.
For example, read "weapon category" → "naval-warfare weapon category" → "frigate category" → "Jiangkai I type" → "529 warship".
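The subordination chain in this example can be represented as a nested mapping whose leaves are lists of representative terms. The sketch below is illustrative only: the nested-dictionary layout and the `find_path` helper are assumptions, and the category names are English renderings of the ones in the example.

```python
from typing import Optional, Tuple, Union

Tree = Union[dict, list]

# Major class -> category set -> subset -> terminal subset -> representative terms.
word_category_db: Tree = {
    "weapon category": {
        "naval-warfare weapon category": {
            "frigate category": {
                "Jiangkai I type": ["529 warship", "054A", "Zhoushan"],
            },
        },
    },
}


def find_path(tree: Tree, term: str, path: Tuple[str, ...] = ()) -> Optional[Tuple[str, ...]]:
    """Return the chain of categories leading to `term`, or None if absent."""
    if isinstance(tree, list):                 # terminal subset: list of terms
        return path if term in tree else None
    for name, subtree in tree.items():         # descend one level of the hierarchy
        found = find_path(subtree, term, path + (name,))
        if found is not None:
            return found
    return None


path_529 = find_path(word_category_db, "529 warship")
```

A depth-first lookup like this is enough for a small hierarchy; a real database would more likely store a parent pointer or a coded path (as Fig. 5 suggests) for each term.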
Step 8: using the word-grabbing technique and starting from the representative words or phrases, scan a large number of existing classified articles with a search engine, grab concrete vocabulary by word category, discard erroneous and misspelled words, and enrich the word-category database, as shown in processes 2, 3, and 4 of Fig. 3.
For example, based on the features of "529 warship", a grabbing pattern "5 warship" (hull numbers beginning with 5) can be set, which captures words with the same features, such as "530 warship", "568 warship", "570 warship", and "569 warship". These concrete terms are then added to the word-category database, as shown in process 6 of Fig. 2.
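A regular expression is one way to express such a grabbing pattern. The sketch below assumes the pattern means "a three-digit number starting with 5, followed by the word warship"; that reading, the sample text, and the function name are assumptions, since the exact pattern syntax is not recoverable from the text. Manual review of the grabbed terms, which the method calls for, is not modelled.

```python
import re
from typing import Iterable, List

# Assumed reading of the grabbing pattern: 5 + two more digits + " warship".
HULL_PATTERN = re.compile(r"\b5\d{2} warship\b")


def grab_new_terms(text: str, known_terms: Iterable[str]) -> List[str]:
    """Return terms matching the pattern that are not yet in the database."""
    found = set(HULL_PATTERN.findall(text))
    return sorted(found - set(known_terms))


# Hypothetical article text used only to exercise the pattern.
sample = ("The 530 warship and the 568 warship left port, "
          "while the 529 warship stayed. The 5300 warship entry is noise.")
new_terms = grab_new_terms(sample, known_terms={"529 warship"})
```

The word boundaries keep four-digit numbers such as "5300" from being captured, which is a simple stand-in for the step of discarding erroneous words.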
2. Preliminary level determination stage:
Preliminary determination of an article's level follows six principles. First, the style of an article is not a factor in determining its level. Second, the article's title, keywords, originating unit, author, and signing personnel are important factors in determining its level. Third, if an article contains only one item of classified information, the article's level is the level of that item. Fourth, if an article contains one item of top-secret information and multiple items at other levels, the article is top secret. Fifth, if an article contains multiple items of secret information and multiple items of confidential information, its level is not lower than secret. Sixth, if an article contains multiple items of confidential information, its level is not lower than confidential.
2.1 Word-category detection. Scan the article's sentences and use the word-category database to determine which word category each piece of classified information belongs to.
2.2 Classified-condition detection. Using the data in the classification-condition database, look up the word-category combinations of the classified information found in the article under test. If the two word-category combinations agree, the passage is classified and has a level. Each condition in the classification-condition database is a word-category combination, and every category in the combination is a necessary condition; satisfying all necessary conditions of a combination satisfies the classified condition.
2.3 Article level determination. Combine the results of the word-category detection module and the semantic intelligent detection module to determine the level of the whole article. The flowchart is shown in Fig. 4.
2.4 Intelligent learning. If an error is made in determining an article's level, new classified vocabulary or new word-category combinations must be added; this can be done manually afterwards. When adding them, specify the word category to which each new word belongs (the new word enters the word-category database) and the specific secrecy regulation associated with each new word-category combination (the new combination enters the classification-condition database), so that the accuracy of level determination keeps improving.
Preliminary determination checks whether the word-category combinations in the article match information in the classification-condition database. Let y be the classified information in the article and let δ = f(y) be its combination expression, that is, one of the various combinations of a, b, c, d, e, f, g, h, and so on. If δ ∈ Φ, the information is assigned the corresponding level.
That is: determine whether δ = f(y) ∈ Φ = g(x). If so, δ is classified and is assigned the corresponding level.
For example, suppose one of the confidential data structures in the classification-condition database is
(subject category: regiment-level unit + behavior category: establishment + behavior category: adjustment + number category: numeral + measure-word category: measure word) ∈ $g(x_1)$
and the article content is "The personnel strength of the 31st Regiment was reduced from 2000 to 1800 people."
Here "31st Regiment" ∈ (subject category: regiment-level unit), "personnel strength" ∈ (behavior category: establishment), "reduced" ∈ (behavior category: adjustment), "2000, 1800" ∈ (number category: numeral), and "people" ∈ (measure-word category: measure word). Thus δ = f(31st Regiment + personnel strength + reduced + 2000 + 1800 + people) and δ ∈ Φ = g($x_1$), so the article content is considered classified and its level is confidential.
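The membership test δ ∈ Φ in this example can be sketched as a set comparison. The lexicon, the substring matching, and the function names below are simplifying assumptions made for illustration; a real implementation would use the word-category database and its extraction patterns. The condition is treated as satisfied when every required category appears in the sentence.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical lexicon: term -> word category, taken from the example sentence.
LEXICON: Dict[str, str] = {
    "31st Regiment": "subject: regiment-level unit",
    "personnel strength": "behavior: establishment",
    "reduced": "behavior: adjustment",
    "2000": "number: numeral",
    "1800": "number: numeral",
    "people": "measure word",
}

# One confidential-level condition: every listed category is necessary.
CONFIDENTIAL_CONDITION: FrozenSet[str] = frozenset({
    "subject: regiment-level unit",
    "behavior: establishment",
    "behavior: adjustment",
    "number: numeral",
    "measure word",
})


def categories_in(sentence: str) -> Set[str]:
    """Categories of all lexicon terms that occur in the sentence."""
    return {category for term, category in LEXICON.items() if term in sentence}


def meets_condition(sentence: str, condition: FrozenSet[str]) -> bool:
    """True when every necessary category of the condition is present."""
    return condition <= categories_in(sentence)


sentence = "The personnel strength of the 31st Regiment was reduced from 2000 to 1800 people."
is_confidential = meets_condition(sentence, CONFIDENTIAL_CONDITION)
```

A sentence that mentions the regiment but none of the other categories fails the test, which is the practical difference from plain keyword matching.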
The specific procedure is:
Step 1: enter the article into the computer and scan its sentences. Use regular expressions to express the information categorized in the word-category database and, based on these features, extract the words in the article's sentences that match the database, as shown in process 1 of Fig. 4.
Step 2: look up the word-category database to determine the word category of each extracted word, as shown in process 2 of Fig. 4.
Step 3: determine the word-category combination in each sentence or paragraph, as shown in process 3 of Fig. 4.
Step 4: determine whether the word-category combination in a sentence or paragraph satisfies a condition in the classification-condition database, as shown in process 4 of Fig. 4. If the meaning expressed by a part of the article satisfies a condition in one of the classified databases, the level of that part is the level of the database containing the condition, as shown in process 5 of Fig. 4. The article's level is the highest level found anywhere in its content, with levels ordered top secret > secret > confidential. If no classified condition is met, the article is not classified.
For example, the sentence "In the middle of next month, Army A plans to attack City Jia from two directions, County B and Mountain C." is parsed according to Fig. 5 as in the following table.
Word or phrase                              Word category
The middle of next month                    1.2
Army A                                      3.1
Two directions, County B and Mountain C     8.5
Attack                                      5.7
City Jia                                    2.4
The word-category combination of these words is 1.2, 3.1, 8.5, 5.7, 2.4. Compared against Fig. 6, it satisfies top-secret condition No. 5, so the level of this sentence is top secret. Even if this is the only top-secret sentence in the full text, the article is top secret no matter how much secret or confidential content the rest of the text contains. Determining that an article is secret or confidential, however, requires further analysis.
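The "highest level wins" rule can be sketched as follows, using the category codes from the example. The condition list is hypothetical: the first entry mirrors the code combination in the table (Fig. 6 itself is not available), and the second entry is invented purely to show a lower-level match.

```python
from enum import IntEnum
from typing import FrozenSet, Iterable, List, Set, Tuple


class Level(IntEnum):
    NONE = 0          # not classified
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


# Hypothetical condition entries: (required category codes, level).
CONDITIONS: List[Tuple[FrozenSet[str], Level]] = [
    (frozenset({"1.2", "3.1", "8.5", "5.7", "2.4"}), Level.TOP_SECRET),
    (frozenset({"3.1", "5.7"}), Level.CONFIDENTIAL),   # invented for illustration
]


def sentence_level(codes: Set[str]) -> Level:
    """Highest level among all conditions whose categories are all present."""
    return max((lvl for req, lvl in CONDITIONS if req <= codes), default=Level.NONE)


def article_level(sentences: Iterable[Set[str]]) -> Level:
    """Preliminary article level: the highest level of any sentence."""
    return max((sentence_level(codes) for codes in sentences), default=Level.NONE)


example_sentence = {"1.2", "3.1", "8.5", "5.7", "2.4"}
level = article_level([example_sentence, {"3.1", "5.7"}])
```

Because the levels are ordered integers, one top-secret sentence dominates any number of lower-level sentences, matching the rule stated above.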
3. Level determination based on probability statistics
Final level determination follows three principles. First, an article containing multiple items of secret information may belong to top secret, and an article containing multiple items of confidential information may belong to secret. Second, analysis of a large number of classified articles in a field shows that the original documents contain certain classified elements; statistically, when the occurrence of these elements satisfies a certain mathematical regularity, the overall level of the article rises. Therefore, any later article in the same field that exhibits this regularity is considered to have an overall level higher than its preliminary level. Third, final determination must identify which field each item of classified information belongs to, for example organizational structure, operational concepts, or guiding policy. Whether the classified information of each field causes the level to rise, and by how many levels, is decided case by case from historical statistics.
3.1 Calculate the condition for raising the classification level. Collect articles from different fields whose classification level was originally assigned manually. After confirming that the assigned level is correct, apply the results of stage two and count the cases in which "article level > preliminary level", so as to obtain statistics on level increases. The fields include organizational structure, strategic campaigns, deployment and transfer, military logistics, and so on, denoted $n_1$, $n_2$, $n_3$, $n_4$, and so on. Take m articles (m ≥ 500) from classified field $n_i$. There is always a natural number $b_i$, the minimum number of items of classified information at which the level rises. If a given article contains $c_i$ items of secret or confidential information that are of the same kind but come from different units, define
$\alpha = \frac{c_i}{b_i}$
The optimal error rate used by the present invention for judging classified information is
$\beta = \left(\frac{\sqrt{5}-1}{2}\right)^{4} \approx (0.618)^{4} = 14.58659418\%$
that is, the accuracy is 85.4%. In classified field $n_i$ it is then held that:
If α ≥ 100%, the overall level of the article rises to secret or top secret.
If 85.4% ≤ α < 100%, the overall level of the article may rise to secret, with likelihood α.
If α < 85.4%, the overall level of the article is unchanged.
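The three-case rule above can be sketched as a small function. This is a minimal illustration and not the patent's implementation: the function name `escalation_outcome` and the outcome labels are hypothetical, and the threshold is computed from the rounded value 0.618 because that is what reproduces the 14.58659418% figure quoted in the text.

```python
# Hypothetical sketch of the level-raising rule; names and labels are illustrative.
BETA = 0.618 ** 4        # optimal error rate quoted in the text (about 14.59%)
THRESHOLD = 1.0 - BETA   # corresponding accuracy (about 85.4%)


def escalation_outcome(c_i: int, b_i: int) -> str:
    """Map alpha = c_i / b_i onto the three cases of the rule."""
    if b_i <= 0:
        raise ValueError("b_i must be a positive integer")
    alpha = c_i / b_i
    if alpha >= 1.0:
        return "escalate"       # alpha >= 100%
    if alpha >= THRESHOLD:
        return "may escalate"   # 85.4% <= alpha < 100%
    return "unchanged"          # alpha < 85.4%
```

Because $c_i$ and $b_i$ are both integers, the middle case can only occur when $b_i$ is fairly large (for instance 6 items against a minimum of 7); for small $b_i$ the rule reduces to a plain comparison of the two counts.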
The concrete implementation is:
First step: analyze the articles in each field whose classification level has already been assigned, and identify the cases in which the true level is higher than the preliminary level.
Second step: determine the final classification level of the article according to the formula.
For example, if the article concerns the organizational structure field ($n_1$), collect 500 classified articles in this field. Suppose it is found that $b_1 = 2$, that is, when two items of organizational structure information of the same kind but from different units appear, the article's level rises from confidential to secret. If an article preliminarily judged confidential has $c_1 = 3$, then α = 3/2 ≥ 100% and the overall level of the article rises to secret.
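The worked example can be reproduced numerically. The sketch below assumes a hypothetical history of manually classified articles, each recorded as a pair (number of classified items, whether the true level exceeded the preliminary level). The helper `min_items_for_escalation` and the sample data are invented for illustration and do not come from the patent.

```python
from typing import Iterable, Tuple


def min_items_for_escalation(history: Iterable[Tuple[int, bool]]) -> int:
    """Return b_i: the smallest item count among articles whose level rose."""
    counts = [n for n, rose in history if rose]
    if not counts:
        raise ValueError("no level-raised articles recorded for this field")
    return min(counts)


# Invented sample for field n_1 (organizational structure); a real corpus
# would hold at least 500 articles, as the text requires.
history_n1 = [(1, False), (2, True), (4, True), (3, True), (1, False)]

b_1 = min_items_for_escalation(history_n1)  # smallest count that led to a rise
c_1 = 3                                     # items found in the new article
alpha = c_1 / b_1
print(b_1, alpha, alpha >= 1.0)             # prints: 2 1.5 True
```

Taking the minimum makes the rule deliberately conservative: a single historical article that was upgraded with few items lowers $b_i$ for the whole field.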

Claims (3)

1. A classified information security classification affiliation method based on word classification combined judgment and probability statistics, characterized in that the following steps are performed in sequence:
First step: establish the classification condition database:
Analyze the security regulations one by one and collect the articles relevant to each regulation. Summarize the classified information in the articles that corresponds to the regulation as related words and related-word combinations expressing a number of necessary conditions. For each entry, comprising the related words and related-word combinations, the condition class involved, the field involved, and the number of the corresponding regulation, establish the logical associations and enter it into the classification condition database. Entries are collected separately in three sub-databases: the "top-secret condition database", the "secret condition database", and the "confidential condition database";
Second step: establish and enrich the word classification database:
(1) In the articles relevant to the corresponding security regulation, tally all combinations of necessary conditions that the regulation involves, and summarize them into a number of major condition classes;
(2) Analyze each major condition class and determine the set of classes it comprises; decompose each word class step by step into subsets of classes until the sets can no longer be divided;
(3) Analyze each terminal subset, list its representative words or phrases, and build the word classification database according to the step-by-step subordination relations;
(4) Read words or phrases from the word classification database and, using word-capture technology, scan existing classified articles for the representative words or phrases; capture concrete vocabulary according to the word classes stored in the word classification database, discard wrong and misspelled words, and enrich the word classification database;
Third step: preliminarily determine the classification level of an article whose level is to be determined:
(1) Scan the paragraphs or sentences of the article; use regular expressions to express the information already classified in the word classification database and, according to the features of this information, extract the words in the article's sentences that match the word classification database;
(2) Look up the word classification database to determine the class to which each word belongs;
(3) Determine the combination of classes in the sentence or paragraph;
(4) Judge whether the combination of classes in the sentence or paragraph fully satisfies any combination condition in the classification condition database. If it does, the level of that sentence or paragraph is the level of the database containing the combination condition. The classification level of the article is the highest level among all of its sentences and paragraphs, the levels being ordered top secret > secret > confidential. If no classification condition is satisfied, the article is not classified;
Fourth step: determine the final classification level: when several passages in the article satisfy the confidential or secret conditions, decide whether to upgrade the article's level as follows:
(1) Across the different fields, take m articles from classified field i, m ≥ 500, and analyze the classified information directly related to this field. Suppose the level rises in k of these articles. The minimum number of items of information required for the level of a classified article in this field to rise is $b_i$:
$b_i = \min_{1 \le j \le k} \{a_{ij}\}$
that is, the minimum of the per-article counts of classified items over the k level-raised articles in classified field i, where $a_{ij}$ is the number of items of classified information in the j-th article of classified field i and j runs from 1 to k;
(2) For a classified document preliminarily judged to be below top secret, compute
$\alpha = \frac{c_i}{b_i}$
where $c_i$ is the number of items of classified information belonging to field i in the article. When the level-raising condition α ≥ 1 holds, the article's level is judged to be upgraded.
2. The classified information security classification affiliation method based on word classification combined judgment and probability statistics according to claim 1, characterized in that: in the third step, if the preliminary level determined for the article does not agree with the manually assigned level, new classified vocabulary or new word-class combinations are added to the word classification database; when adding, the new words, the classes of the new words, and the specific security regulation are associated.
3. The classified information security classification affiliation method based on word classification combined judgment and probability statistics according to claim 1, characterized in that: in the fourth step, when α < 1, the optimal error rate is set to β; when 1 − β ≤ α < 1, the overall level of the article is raised. The value of β is: $\beta = \left(\frac{\sqrt{5}-1}{2}\right)^{4} \approx (0.618)^{4} = 14.58659418\%$.
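The preliminary determination in the third step of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the English toy vocabulary, the class names, the contents of `WORD_CLASS` and `CONDITIONS`, and the sentence splitting on punctuation are all hypothetical stand-ins for the word classification database and the classification condition database.

```python
import re
from typing import Dict, FrozenSet, List, Optional

LEVELS = ["confidential", "secret", "top secret"]  # ascending order

# Hypothetical word classification database: word -> class.
WORD_CLASS: Dict[str, str] = {
    "brigade": "unit", "regiment": "unit",
    "strength": "personnel", "headcount": "personnel",
    "deployment": "movement",
}

# Hypothetical condition database: level -> required class combinations.
CONDITIONS: Dict[str, List[FrozenSet[str]]] = {
    "secret": [frozenset({"unit", "personnel", "movement"})],
    "confidential": [frozenset({"unit", "personnel"})],
}


def sentence_level(sentence: str) -> Optional[str]:
    """Return the highest level whose class combination the sentence satisfies."""
    words = re.findall(r"[a-z]+", sentence.lower())
    classes = {WORD_CLASS[w] for w in words if w in WORD_CLASS}
    for level in reversed(LEVELS):  # check the highest level first
        if any(combo <= classes for combo in CONDITIONS.get(level, [])):
            return level
    return None


def article_level(text: str) -> Optional[str]:
    """The article takes the highest level found among its sentences."""
    sentences = re.split(r"[.!?]\s*", text)
    found = [lv for lv in map(sentence_level, sentences) if lv]
    return max(found, key=LEVELS.index) if found else None
```

Representing each condition as a set of classes keeps the check to a subset test, so adding a word to an existing class extends every condition that uses that class without touching the condition database.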
CN201410103973.8A 2014-03-20 2014-03-20 Classified information security classification affiliation method based on word classification combined judgment and probability statistics Expired - Fee Related CN103870758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410103973.8A CN103870758B (en) 2014-03-20 2014-03-20 Sort out the classified information level of confidentiality affiliation method of combination judgement and probability statistics based on word

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410103973.8A CN103870758B (en) 2014-03-20 2014-03-20 Sort out the classified information level of confidentiality affiliation method of combination judgement and probability statistics based on word

Publications (2)

Publication Number Publication Date
CN103870758A (en) 2014-06-18
CN103870758B (en) 2016-05-11

Family

ID=50909281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410103973.8A Expired - Fee Related CN103870758B (en) 2014-03-20 2014-03-20 Classified information security classification affiliation method based on word classification combined judgment and probability statistics

Country Status (1)

Country Link
CN (1) CN103870758B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912946A (en) * 2016-04-05 2016-08-31 上海上讯信息技术股份有限公司 Document detection method and device
CN106844544A (en) * 2016-12-30 2017-06-13 全民互联科技(天津)有限公司 A kind of contract terms Risk Identification Method and system
CN109815709A (en) * 2018-12-11 2019-05-28 顺丰科技有限公司 Recognition methods, device, equipment and the storage medium that sensitive information illegally copies

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436599A (en) * 2011-10-28 2012-05-02 中国舰船研究设计中心 Secret determination information accounting method based on cascade secret determination information synchronous processing system
CN102819604A (en) * 2012-08-20 2012-12-12 徐亮 Method for retrieving confidential information of file and judging and marking security classification based on content correlation
CN103544446A (en) * 2012-07-16 2014-01-29 航天信息股份有限公司 Method and device for security classification calibration of files

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912946A (en) * 2016-04-05 2016-08-31 上海上讯信息技术股份有限公司 Document detection method and device
CN106844544A (en) * 2016-12-30 2017-06-13 全民互联科技(天津)有限公司 A kind of contract terms Risk Identification Method and system
CN109815709A (en) * 2018-12-11 2019-05-28 顺丰科技有限公司 Recognition methods, device, equipment and the storage medium that sensitive information illegally copies
CN109815709B (en) * 2018-12-11 2023-10-10 顺丰科技有限公司 Method, device, equipment and storage medium for identifying illegal copies of sensitive information

Also Published As

Publication number Publication date
CN103870758B (en) 2016-05-11

Similar Documents

Publication Publication Date Title
Wahono et al. Genetic feature selection for software defect prediction
CN103854063B (en) A kind of prediction of event occurrence risk method for early warning based on internet opening imformation
CN109918505B (en) Network security event visualization method based on text processing
DE112013004082T5 (en) Search system of the emotion entity for the microblog
WO2016177069A1 (en) Management method, device, spam short message monitoring system and computer storage medium
CN112001170B (en) Method and system for identifying deformed sensitive words
CN103870758B (en) Classified information security classification affiliation method based on word classification combined judgment and probability statistics
CN114860882A (en) Fair competition review auxiliary method based on text classification model
CN103198362A (en) Method for coal mine safety evaluation
CN115277180A (en) Block chain log anomaly detection and tracing system
Nakagawa et al. Character-level convolutional neural network for predicting severity of software vulnerability from vulnerability description
Kim et al. Comparative experiment on TTP classification with class imbalance using oversampling from CTI dataset
CN112257425A (en) Power data analysis method and system based on data classification model
CN109635008A (en) A kind of equipment fault detection method based on machine learning
CN109308572A (en) The expected performance evaluation method of project of inviting outside investment based on policy goals guiding
CN114860903A (en) Event extraction, classification and fusion method oriented to network security field
Chu et al. A new feature weighting method based on probability distribution in imbalanced text classification
CN107885725A (en) A kind of method and device for handling recruitment data
Oudni et al. Accelerating effect of attribute variations: Accelerated gradual itemsets extraction
CN112052336B (en) Traffic emergency identification method and system based on social network platform information
Shen Application of Synthetic Data in Artificial Intelligence Trials from the Perspective of Judicial Justice
CN115809834B (en) Ecological environment analysis system based on environmental impact evaluation data
Hu et al. A multi-attribute decision analysis method based on rough sets dealing with uncertain information
Rao Extremism Video Detection In Social Media
Zhao et al. Research on the ideology monitoring system of cyberspace legal security based on Neural Network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160511

Termination date: 20170320
