CN103870758B - Method for determining the classification level of classified information based on word-category combination judgment and probability statistics - Google Patents

Method for determining the classification level of classified information based on word-category combination judgment and probability statistics

Info

Publication number
CN103870758B
Authority
CN
China
Prior art keywords
confidentiality
word
level
article
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410103973.8A
Other languages
Chinese (zh)
Other versions
CN103870758A (en
Inventor
陈建
欧阳国华
杨兴
李楠
史章军
向音
吕慧芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201410103973.8A priority Critical patent/CN103870758B/en
Publication of CN103870758A publication Critical patent/CN103870758A/en
Application granted granted Critical
Publication of CN103870758B publication Critical patent/CN103870758B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/604: Tools and structures for managing or administering access control systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for determining the classification level of classified information based on word-category combination judgment and probability statistics. The method simulates the way a human learns to attribute classification levels: it builds a classification condition database and a word classification database, takes each combination of word categories derived from the secrecy regulations as a necessary condition for classification, compares those combinations with the word-category combinations found in the article under analysis, and determines the classification level. A computer analyzes the content of each statement in the article, ignores its grammatical form, abstracts the statement into a logical combination of word categories, and checks it against the combination conditions derived from the secrecy regulations to determine the classification level of the article. This provides a feasible basis for judging classified articles and their classification levels objectively and quickly.

Description

Method for determining the classification level of classified information based on word-category combination judgment and probability statistics
Technical field
The present invention relates to techniques for attributing classification levels to classified information, and specifically to a method for determining the classification level of classified information based on word-category combination judgment and probability statistics.
Background art
Traditional document classification lacks effective technical means for determining classification levels, so the boundaries between levels are drawn inaccurately and the work is highly subjective. For documents with similar content, different assessors use different methods, perspectives, and bases, so their results may differ, which seriously undermines the rigor and authority of classification determination in our army.
After years of construction, China's information infrastructure has reached a considerable scale, and most government and military departments have built systems such as WWW, FTP, DNS, e-mail, and OA. In practice, many office staff have become accustomed to writing, saving, and transmitting documents with word-processing software (in formats such as Word, PPT, and TXT). Electronic files have become an important information carrier and means of transmission for military departments and other organizations. Informatization clearly makes routine work more convenient and greatly improves efficiency, but the convenience of computers has also brought information security problems that now attract wide attention. Because government and military operations involve a large amount of classified information, it is necessary, in order to ensure the normal operation and information security of all departments, to attribute classification levels to classified information accurately and effectively and thereby regulate the scope within which the information circulates. Using computer technology to overcome the limited methods and strong subjectivity of current classification determination, to provide a scientific basis for the work, to improve its efficiency, and to achieve digital classification, electronic information categorization, and intelligent decision support has become an urgent problem.
Summary of the invention
The technical problem to be solved by the present invention is that current classification determination relies on a single method and is highly subjective. The aim is to provide a scientific basis for classification determination, improve its efficiency, and achieve digital classification and intelligent decision support. To this end, a method for determining the classification level of classified information based on word-category combination judgment and probability statistics is provided.
The method for determining the classification level of classified information based on word-category combination judgment and probability statistics is characterized in that the following steps are carried out in order:
Step 1: establish the classification condition database:
Analyze the secrecy regulations one by one and collect the articles relevant to each regulation. Summarize the classified information in the articles that corresponds to the regulation as related words and related-word combinations that form necessary conditions. Establish logical associations among the related words and related-word combinations, the condition categories involved, the fields involved, and the corresponding regulation numbers, and enter them into the classification condition database. Entries are collected separately in three sub-databases: the "top-secret condition database", the "secret condition database", and the "confidential condition database";
Step 2: establish and enrich the word classification database:
(1) Among the articles relevant to a given secrecy regulation, count the combinations of all necessary conditions that the regulation involves and summarize them into several major condition categories;
(2) Analyze each major condition category and determine the set of word categories it contains; then decompose each set of word categories step by step into subsets until no further division is possible;
(3) Analyze each terminal subset, list the representative words or phrases in it, and build the word classification database according to the step-by-step subordination relationships;
(4) Read words or phrases from the word classification database and, using a word-capture technique driven by the representative words or phrases, scan existing classified articles. Capture specific vocabulary for each word category stored in the database, discard wrongly written and misused words, and thereby enrich the word classification database;
Step 3: make a preliminary determination of the classification level of the article under review:
(1) Scan the paragraphs or statements of the article. Use regular expressions to express the information categorized in the word classification database, and extract from the article's statements the words that match the database according to those features;
(2) Look up the word classification database to determine the category to which each word belongs;
(3) Determine the combination of word categories in each statement or paragraph;
(4) Judge whether the combination of word categories in the statement or paragraph fully satisfies any combination condition in the classification condition database. If it does, the level of that part of the article is the level of the sub-database containing the condition. The classification level of the article is the highest level among all its statements or paragraphs, the levels being ordered top secret > secret > confidential. If no classification condition is satisfied, the article is not classified;
Step 4: determine the final classification level. When several places in the article satisfy secret or confidential conditions, whether the article's level is upgraded is decided as follows:
(1) For each field, take m articles (m ≥ 500) in classified field i and analyze the classified information directly related to that field. Suppose the level was raised in k of these articles. The minimum number of items of classified information needed to raise the level of a classified article in this field is $b_i$:
$b_i = \min_{1 \le j \le k} \{a_{ij}\}$,
that is, the minimum, over the k upgraded articles in classified field i, of the number of items of classified information per article, where $a_{ij}$ is the number of items of classified information in the j-th article of field i and j ranges over 1 to k;
(2) For a classified document whose preliminary level is not top secret, compute
$\alpha = \frac{c_i}{b_i}$,
where $c_i$ is the number of items of classified information in field i within the article. When the upgrade condition α ≥ 1 is satisfied, the article's level is upgraded.
Further, in step 3, if the preliminary level assigned to the article does not agree with the level set manually, new classified vocabulary or new word-category combinations must be added to the word classification database; when adding them, associate each new word with its word category and with the specific secrecy regulation.
In a preferred scheme, in step 4, an optimal error rate β is set for the case α < 1: when 1 − β ≤ α < 1, the overall level of the article is regarded as upgradable. The value of β is $\beta = \left(\frac{\sqrt{5}-1}{2}\right)^{4} \approx (0.618)^{4} = 14.58659418\%$.
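The upgrade rule of step 4 and the preferred scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `escalation_decision` and its three return labels are hypothetical, and only α, β, $b_i$, and $c_i$ come from the text.

```python
# Hypothetical helper; only alpha, beta, b_i and c_i come from the text.
BETA = 0.618 ** 4  # optimal error rate as stated in the text, about 14.59 %


def escalation_decision(c_i: int, b_i: int, beta: float = BETA) -> str:
    """Apply alpha = c_i / b_i and compare it with 1 and 1 - beta."""
    if b_i <= 0:
        raise ValueError("b_i must be a positive count")
    alpha = c_i / b_i
    if alpha >= 1:
        return "upgrade"      # alpha >= 1: the level is upgraded
    if alpha >= 1 - beta:
        return "upgradable"   # 1 - beta <= alpha < 1: level regarded as upgradable
    return "unchanged"        # alpha < 1 - beta: level stays as determined
```

With these definitions, an article with `c_i = 9` in a field where `b_i = 10` falls in the upgradable band, because 0.9 lies between 1 − β (about 0.854) and 1.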
The present invention uses a computer to analyze the content of the statements in an article, ignores their grammatical form, abstracts each statement into a logical combination of word categories, and checks it against the combination conditions derived from the secrecy regulations to determine the classification level of the article.
The classification condition database is built by simulating how a human learns from existing classified articles according to the secrecy regulations. Each classification condition is abstracted into a combination of word categories within a particular field and treated as a necessary condition: a statement that satisfies the required combination of word categories in that field is judged to satisfy the classification condition. The combination conditions can be corrected manually in the back end.
Whether a classification condition is met is not decided simply by comparing the article under evaluation with classified keywords. The key is that each secrecy regulation is abstracted into a combination of conditions on several aspects within a particular field, and the necessary conditions for each aspect form separate data sets by category within that field. When the captured words or phrases satisfy every category in a combination, the combination satisfies the classification condition. Each combination condition belongs to a particular field, and an item counts as classified only when a combination condition is met within a single field. These conditions make up the classification condition database. The analysis is therefore rigorous and feasible, and it breaks through an intelligent-analysis barrier that large foreign software companies have studied for a long time without overcoming.
An evaluation method is provided for upgrading the level of articles that contain several items of secret or confidential information; it likewise follows the process a human uses to upgrade a level. For each specific classified field, a rule that permits upgrading is derived: when the amount of classified information exceeds the limit set by the rule, the level is upgraded, and articles that fall within the optimal error rate are treated as upgradable.
The whole evaluation process is rigorous, reliable, fast, and objective, and it offers a practicable way for computers to process large numbers of classified articles objectively and quickly and to grade them automatically.
Brief description of the drawings
Fig. 1 is a flow chart of establishing the classification condition database and the word classification database;
Fig. 2 is a schematic diagram of the attribution structure of classified information in the word classification database;
Fig. 3 is a schematic diagram of entering classified information into the word classification database;
Fig. 4 is a flow chart of determining the classification level of an article;
Fig. 5 is a schematic diagram of the coding and structure of the word classification database;
Fig. 6 is a schematic diagram of classification condition combinations.
Detailed description of the embodiments
The present invention is further described below with reference to the embodiments and the accompanying drawings:
The general idea of the invention is to "enrich the group with individuals, judge individuals by the group, and determine classification level attribution on the basis of probability statistics". Regulations are written in highly general, abstract terms. People can understand their provisions because they have handled classified documents before and have learned from experience how the provisions relate to concrete wording. For a computer to determine the classification level of an article, this human cognitive process must be converted into a form the computer can process. Achieving this goal requires the following three stages.
1. Empirical learning stage:
1.1 Collect, by article title, a large number of articles that fall under a given secrecy regulation.
1.2 Manually analyze why each article was assigned its level, that is, identify its classified information.
1.3 Manually decompose the classified information into words and word combinations.
1.4 Repeat steps 1.1 to 1.3 for each secrecy regulation in turn.
1.5 Categorize the words and word combinations.
1.6 Establish the word classification database.
Build the word classification database from the categorized words, then use a computer word-capture technique to find more similar words and enrich the database. Here, word classification means decomposing the words of classified information, according to the secrecy regulations, into categories such as place, weapon, building, tool, subject, unit, behavior, and direction. Fig. 2 shows the attribution structure of classified information in the word classification database, Fig. 3 shows the bulk entry of classified information, and Fig. 5 shows the coding and structure of the word classification database.
Let a = subject-category words, b = behavior-category words, c = place-category words, d = weapon-category words, e = building-category words, f = tool-category words, g = direction-category words, and h = quantity-category words.
1.7 Establish the classification condition database. Using the word classification database, abstract word combinations into word-category combinations and associate each word-category combination with the specific clause of the secrecy regulation. Fig. 1 shows how the word classification database and the classification condition database are initially constructed, and Fig. 6 shows the classification condition combinations.
Let all information combinations in the classification condition database be $\Phi = g(x)$, $x \in \{x_1, x_2, x_3\}$, where $x_1$ denotes confidential, $x_2$ denotes secret, and $x_3$ denotes top secret. That is, $g(x_1)$ is the set of expression combinations of all confidential information in the database, $g(x_2)$ that of all secret information, and $g(x_3)$ that of all top-secret information.
The specific procedure is:
Step 1: analyze the secrecy regulations and collect related articles. Abstract the classified information in the articles that corresponds to a regulation into combinations of necessary conditions, enter these combinations and the corresponding regulation numbers into the classification condition database, and establish logical associations, as shown in process 1 of Fig. 1. The database comprises three sub-databases, the "top-secret condition database", the "secret condition database", and the "confidential condition database", as shown in Fig. 6.
For example, "strategic and campaign plans and policies" can be resolved into a combination of necessary conditions such as time, place, subject, weapon, and behavior; the regulation and the combination of necessary conditions are then entered into the database.
Step 2: count the necessary conditions that all the regulations involve and abstract them into several major condition categories, as shown in processes 2 and 3 of Fig. 1.
For example: time, place, subject, weapon, behavior, quantity, verb, and language categories, as shown in process 1 of Fig. 2.
Step 3: analyze each major condition category and determine the set of word categories it contains, as shown in process 4 of Fig. 1.
For example, the "weapon category" can be decomposed into the "land warfare weapon category", the "air warfare weapon category", the "naval warfare weapon category", and so on, as shown in process 2 of Fig. 2.
Step 4: decompose each word category into subsets of word categories; if a subset can be decomposed further, continue until the sets can no longer be divided, as shown in processes 5, 6, and 7 of Fig. 1.
For example, the "naval warfare weapon category" can be further decomposed into "submarine class", "destroyer class", "frigate class", "aircraft carrier", "comprehensive carrier", and so on, and the "frigate class" can be further subdivided into "Jiangdong class", "Jiangkai I type", and so on, as shown in processes 3 and 4 of Fig. 2.
Step 5: analyze each subset and list the representative words or phrases in it, as shown in process 8 of Fig. 1.
For example, the representative phrases in "Jiangkai I type" are "Ship 529", "054A", or "Zhoushan", as shown in process 5 of Fig. 2.
Step 6: build the word classification database according to the subordination relationship "major condition category, word category set, word category subset, representative word or phrase", as shown in process 9 of Fig. 1 and in Fig. 5.
Step 7: read the representative words or phrases from the word classification database, as shown in process 1 of Fig. 3.
For example, read "weapon category" -> "naval warfare weapon category" -> "frigate class" -> "Jiangkai I type" -> "Ship 529".
Step 8: use a word-capture technique. Starting from the representative words or phrases, scan a large number of existing classified articles with a search engine, capture specific vocabulary for each word category, discard wrongly written and misused words, and enrich the word classification database, as shown in processes 2, 3, and 4 of Fig. 3.
For example, from the features of "Ship 529", a capture pattern "Ship 5??" can be set, which captures words with the same features such as "Ship 530", "Ship 568", "Ship 570", and "Ship 569"; these specific terms are added to the word classification database, as shown in process 6 of Fig. 2.
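The pattern-based capture in step 8 can be illustrated with a regular expression. The sketch below is an assumption about how a "5??" hull-number pattern might be expressed; the English token "Ship", the sample text, the dictionary layout, and the function name `capture_terms` are all hypothetical, and the original method operates on Chinese text.

```python
import re

# Hypothetical English rendering of the "5??" capture pattern: the word "Ship"
# followed by a three-digit hull number that starts with 5.
HULL_PATTERN = re.compile(r"\bShip 5\d{2}\b")


def capture_terms(text: str, known: set[str]) -> set[str]:
    """Return captured terms that are not yet stored for this category."""
    return set(HULL_PATTERN.findall(text)) - known


# Category path taken from the example; the dictionary layout is an assumption.
path = ("weapon", "naval warfare weapon", "frigate", "Jiangkai I type")
database = {path: {"Ship 529"}}

sample = "Ship 530 and Ship 568 sailed with Ship 529; Ship 1530 and Ship 5301 did not."
database[path] |= capture_terms(sample, database[path])
```

The word boundaries keep four-digit numbers such as "Ship 1530" or "Ship 5301" out of the result, which stands in for the text's requirement to discard wrong words before enriching the database.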
2. Preliminary determination of the article's classification level:
The preliminary determination follows six principles. First, the style of the article is not a factor in determining its level. Second, the article's title, keywords, source, originator, and signatories are important factors. Third, if the article contains only one item of classified information, the level of the article is the level of that item. Fourth, if the article contains one item of top-secret information and several items at other levels, the article is top secret. Fifth, if the article contains several items of secret information and several items of confidential information, its level is not lower than secret. Sixth, if the article contains several items of confidential information, its level is not lower than confidential.
2.1 Word classification detection. Scan the statements of the article and use the word classification database to determine the word category to which each item of classified information belongs.
2.2 Classification condition detection. Using the data in the classification condition database, look for word-category combinations of classified information in the article under test. If a combination in the article matches one in the database, that text is classified and carries the corresponding level. Every condition in the classification condition database is a word-category combination, and every word category in it is a necessary condition; a statement meets the classification condition only when it satisfies all the necessary conditions of the combination.
2.3 Article-level determination. Combine the results of the word classification detection module and the semantic intelligent detection module to determine the classification level of the whole article. Fig. 4 shows the flow chart for determining the article's level.
2.4 Intelligent learning. If the determination turns out to be wrong, new classified vocabulary or word-category combinations must be added, which can be done manually afterwards. When adding them, specify the word category of each new word (the new word enters the word classification database) and the specific secrecy regulation associated with each new word-category combination (the new combination enters the classification condition database), so that the accuracy of the determination improves continuously.
The preliminary determination checks whether the word-category combinations in the text match the information in the classification condition database. Let y be the classified information in the article and $\delta = f(y)$ the expression combination of that information, that is, the various combinations of a, b, c, d, e, f, g, h, and so on. If $\delta \in \Phi$, the information is judged to be of the corresponding level.
That is, test whether $\delta = f(y) \in \Phi = g(x)$; if so, δ is classified and is attributed to the corresponding level.
For example, one of the confidential-level data structures in the classification condition database is:
(subject category: regiment-level unit + behavior category: establishment + behavior category: adjustment + numeric category: number + measure-word category: measure word) $\in g(x_1)$
The article content is: "The number of officers and soldiers of the 31st Regiment was reduced from 2,000 people to 1,800 people."
Here, "31st Regiment" ∈ (subject category: regiment-level unit), "number of officers and soldiers" ∈ (behavior category: establishment), "reduced" ∈ (behavior category: adjustment), "2,000" and "1,800" ∈ (numeric category: number), and "people" ∈ (measure-word category: measure word). Thus $\delta = f$(31st Regiment + number of officers and soldiers + reduced + 2,000 + 1,800 + people) and $\delta \in \Phi = g(x_1)$, so the content is classified and is attributed to the confidential level.
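The membership test $\delta \in \Phi$ in this example can be sketched as a subset check. The code below is a toy illustration under stated assumptions: the lexicon, the single condition, the English wording of the sentence, and the function names are hypothetical stand-ins for the word classification database and the confidential sub-database.

```python
# Hypothetical toy lexicon: term -> (major category, word category).
LEXICON = {
    "31st Regiment": ("subject", "regiment-level unit"),
    "number of officers and soldiers": ("behavior", "establishment"),
    "reduced": ("behavior", "adjustment"),
    "2,000": ("numeric", "number"),
    "1,800": ("numeric", "number"),
    "people": ("measure word", "measure word"),
}

# g(x1): each condition is the set of word categories that must all be present.
CONFIDENTIAL_CONDITIONS = [
    frozenset({
        ("subject", "regiment-level unit"),
        ("behavior", "establishment"),
        ("behavior", "adjustment"),
        ("numeric", "number"),
        ("measure word", "measure word"),
    }),
]


def categories_in(sentence: str) -> set[tuple[str, str]]:
    """delta = f(y): the word categories found in the sentence."""
    return {category for term, category in LEXICON.items() if term in sentence}


def meets_any(sentence: str, conditions) -> bool:
    """True when every category of at least one condition is present."""
    found = categories_in(sentence)
    return any(condition <= found for condition in conditions)


SENTENCE = ("The number of officers and soldiers of the 31st Regiment "
            "was reduced from 2,000 people to 1,800 people.")
```

Treating each condition as a set makes the "all necessary conditions" requirement explicit: a sentence that mentions only the regiment, without the establishment, adjustment, number, and measure-word categories, does not match.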
The specific procedure is:
Step 1: input the article into the computer and scan its statements. Use regular expressions to express the information categorized in the word classification database, and extract from the article's statements the words that match the database according to those features, as shown in process 1 of Fig. 4.
Step 2: look up the word classification database to determine the word category to which each word belongs, as shown in process 2 of Fig. 4.
Step 3: determine the word-category combination in each statement or paragraph, as shown in process 3 of Fig. 4.
Step 4: judge whether the word-category combination in the statement or paragraph satisfies a condition in the classification condition database, as shown in process 4 of Fig. 4. If the meaning expressed by a part of the article satisfies a condition in one of the sub-databases, the level of that part is the level of the sub-database containing the condition, as shown in process 5 of Fig. 4. The classification level of the article is the highest level found in all its content, the levels being ordered top secret > secret > confidential. If no classification condition is satisfied, the article is not classified.
For example, the statement "In the middle of next month, Army A plans to attack Jia City from two directions, County B and Mountain C." is resolved according to Fig. 5 as shown in the following table.
Word or phrase                                  Word category
In the middle of next month                     1.2
Army A                                          3.1
From two directions, County B and Mountain C    8.5
Attack                                          5.7
Jia City                                        2.4
The word categories of these words combine as 1.2, 3.1, 8.5, 5.7, 2.4. Compared against Fig. 6, this combination satisfies top-secret condition No. 5, so the statement is attributed to top secret. Even if this is the only top-secret statement in the full text, the article is top secret no matter how much secret or confidential content the rest of the text expresses. However, deciding whether an article is secret or confidential requires further analysis.
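The rule that an article takes the highest level found among its statements can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Level` enum, the function names, and the stand-in for top-secret condition No. 5 are assumptions, since the actual contents of Fig. 6 are not given in the text.

```python
from enum import IntEnum


class Level(IntEnum):
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


# Hypothetical stand-in for "top-secret condition No. 5"; the codes are those
# of the example statement, the real condition in Fig. 6 is not reproduced here.
CONDITIONS = {
    Level.TOP_SECRET: [frozenset({"1.2", "3.1", "8.5", "5.7", "2.4"})],
    Level.SECRET: [],
    Level.CONFIDENTIAL: [],
}


def statement_level(codes: set[str]) -> Level:
    """Highest level whose sub-database holds a condition fully met by codes."""
    for level in sorted(CONDITIONS, reverse=True):
        if any(condition <= codes for condition in CONDITIONS[level]):
            return level
    return Level.UNCLASSIFIED


def article_level(statements: list[set[str]]) -> Level:
    """Preliminary article level: the maximum over all statements."""
    return max((statement_level(s) for s in statements),
               default=Level.UNCLASSIFIED)
```

Using an ordered enum lets `max` express the ordering top secret > secret > confidential directly, so a single top-secret statement decides the article's preliminary level.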
3. based on probability statistics, judge level of confidentiality ownership
Final decision article level of confidentiality ownership is followed following three principles: the first, there is the article of the confidential information in many places, and haveBelong to the possibility of top secret. There is the article of many places confidential information, have and belong to confidential possibility. The second, pointAnalyse a large amount of concerning security matters articles in a field, find to contain some concerning security matters key elements in original file, through statistics, when thisWhen the probability of occurrence of a little key elements meets certain mathematical law, the overall level of confidentiality of article can raise, so just think, as long as afterThe article of same domain occurring, has the feature that meets this mathematical law, just thinks that the overall level of confidentiality of this article is than tentatively sentencingFixed level of confidentiality result is high. The second, the judgement of final level of confidentiality will be analyzed many places classified information and belong to respectively which field, for example establishmentPhysique, concept of operations, instruction policy etc., according to historical statistical data, can the classified information in each field affect article and raise closeLevel, several ranks that specifically raise, need to be judged by different situations.
3.1 calculate level of confidentiality rising condition. Collect original article that has manually defined level of confidentiality ownership of different field, through trueRecognize level of confidentiality ownership errorless after, the achievement of operational phase two, counts the situation of " article level of confidentiality > preliminary judgement level of confidentiality ",Statistics level of confidentiality rising situation. The article of different field is: rganizational structure, strategic campaign, deployment transfer, military logistics system etc., respectivelyBe made as n1,n2,n3,n4 Find out concerning security matters field niAn article m section (m >=500), finding always has a natural number bi, at certain sectionThe quantity that article belongs to confidential similar but not commensurate or confidential information is ciSituation under, if establishedBecause the present invention is for judging the level of confidentiality ownership of classified information, departure rate, requiring accuracy rate to be not less thanIn 85% situation, Optimal error rate isBe that accuracy rate is 85.4%,Assert at concerning security matters field niIn:
If α ≥ 100%, the article's overall classification level rises to secret or top secret.
If 85.4% ≤ α < 100%, the article's overall classification level has a likelihood of α of rising to secret.
If α < 85.4%, the article's overall classification level remains unchanged.
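The three-way rule above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `upgrade_decision` and the string labels it returns are assumptions made for this example, and the threshold is derived from the golden-ratio expression for the optimal error rate β stated in claim 3.

```python
import math

# Optimal error rate from the patent: beta = ((sqrt(5) - 1) / 2) ** 4, about 14.6%.
BETA = ((math.sqrt(5) - 1) / 2) ** 4


def upgrade_decision(alpha: float, beta: float = BETA) -> str:
    """Map the upgrade ratio alpha onto the three outcomes listed above.

    The returned labels are hypothetical names chosen for this sketch.
    """
    if alpha >= 1.0:
        return "upgrade"           # alpha >= 100%: the level rises
    if alpha >= 1.0 - beta:
        return "possible_upgrade"  # roughly 85.4% <= alpha < 100%: the level may rise
    return "unchanged"             # alpha below roughly 85.4%: the level stays as is
```

Expressing the lower threshold as `1 - beta` instead of a hard-coded 0.854 keeps the rule tied to the stated error rate, so a different β could be passed in without touching the logic.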
The concrete implementation is:
Step one: analyze the existing classified articles in each field and pick out the cases where the true classification level is higher than the preliminary determination.
Step two: determine the article's final classification level according to the formula $\alpha = c_i / b_i$.
For example, if an article concerns the organizational structure field ($n_1$), collect 500 classified articles in that field. Suppose it is found that $b_1 = 2$, that is, an article's classification level rises to secret once it contains 2 similar pieces of organizational structure information from different units. If an article preliminarily determined to be confidential has $c_1 = 3$, then $\alpha = c_1 / b_1 = 3/2 = 150\% \geq 100\%$, so the article's overall classification level rises to secret.
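As a rough sketch of the two steps and the example above, the following Python computes $b_i$ as the minimum count over historical articles whose level rose, and then the ratio α for a new article. The helper names and the list `history_n1` are hypothetical; the patent gives only $b_1 = 2$ and $c_1 = 3$, not the underlying per-article counts.

```python
from typing import Sequence


def min_info_count(counts_in_upgraded_articles: Sequence[int]) -> int:
    """b_i: the smallest number of classified items found in any article of
    field i whose final level exceeded its preliminary level."""
    if not counts_in_upgraded_articles:
        raise ValueError("no upgraded articles recorded for this field")
    return min(counts_in_upgraded_articles)


def upgrade_ratio(c_i: int, b_i: int) -> float:
    """alpha = c_i / b_i for one article in field i."""
    if b_i <= 0:
        raise ValueError("b_i must be a positive integer")
    return c_i / b_i


# Made-up per-article counts for illustration only; the patent does not list them.
history_n1 = [2, 4, 3, 2, 5]
b_1 = min_info_count(history_n1)
alpha = upgrade_ratio(3, b_1)
print(b_1, alpha)  # prints: 2 1.5
```

With these assumed counts the ratio is 1.5, which is at least 1, matching the conclusion of the example that the article's level rises.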

Claims (3)

1. A method for determining the classification level of classified information based on word-category combination judgment and probability statistics, characterized in that the following steps are carried out in sequence:
Step 1: build the classification condition database:
Analyze the security regulations one by one and collect the articles relevant to each regulation. Summarize the classified information in the articles that corresponds to the regulation into related words and related-word combinations that constitute necessary conditions. For each entry, record the related words and related-word combinations, the condition category involved, the field involved, and the number of the corresponding regulation; establish the logical associations among them; and enter them into the classification condition database. Entries are collected separately in three sub-databases: the "top secret condition database", the "secret condition database", and the "confidential condition database";
Step 2: build and enrich the word classification database:
(1) In the articles relevant to each security regulation, count all combinations of necessary conditions that the regulation involves and summarize them into a number of major condition categories;
(2) Analyze each major condition category and determine the set of word categories it contains; then decompose each set of word categories level by level into subsets, until no further division is possible;
(3) Analyze each terminal subset, list its representative words or phrases, and build the word classification database according to the level-by-level subordination relationships;
(4) Read words or phrases from the word classification database and, using word-capture technology, scan existing classified articles for the representative words or phrases; capture specific vocabulary according to the word categories stored in the database, exclude erroneous and misspelled words, and thereby enrich the word classification database;
Step 3: make a preliminary determination of the classification level of an article whose level is to be determined:
(1) Scan the paragraphs or sentences of the article, use regular expressions to express the information already categorized in the word classification database, and extract from the sentences the words that match the database according to these features;
(2) Look up the word classification database to determine the category to which each word belongs;
(3) Determine the combination of categories in the sentence or paragraph;
(4) Judge whether the combination of categories in the sentence or paragraph fully satisfies any combination condition in the classification condition database. If it does, the classification level of that sentence or paragraph is the level of the database containing the matching combination condition. The classification level of the article is defined as the highest level among all of its sentences or paragraphs, the levels being ordered top secret > secret > confidential. If no classification condition is satisfied, the article is not classified;
Step 4: determine the final classification level: when an article contains multiple places that satisfy the confidential or secret conditions, decide whether to upgrade the article's level as follows:
(1) For each field, take m articles (m ≥ 500) from classified field i and analyze the classified information directly related to that field. Suppose the level is found to rise in k of these articles. The minimum number of pieces of classified information needed for an article's level to rise in this field is $b_i$:
$$b_i = \min\{a_{ij}\},$$
that is, the minimum of the numbers of pieces of classified information in each of the k upgraded articles in classified field i, where $a_{ij}$ denotes the number of pieces of classified information in the j-th such article and $j = 1, \ldots, k$;
(2) For a classified document preliminarily determined to be below top secret, compute
$$\alpha = \frac{c_i}{b_i},$$
where $c_i$ denotes the number of pieces of classified information in field i in the article; when the upgrade condition α ≥ 1 holds, the article's classification level is upgraded.
2. The method for determining the classification level of classified information based on word-category combination judgment and probability statistics according to claim 1, characterized in that: in Step 3, if the preliminary determination for an article does not match the manually assigned level, new classified vocabulary or new word-category combinations are added to the word classification database, and the new words, their categories, and the specific security regulation are associated with one another when they are added.
3. The method for determining the classification level of classified information based on word-category combination judgment and probability statistics according to claim 1, characterized in that: in Step 4, when α < 1, the optimal error rate is set to β; when 1 − β ≤ α < 1, the overall classification level of the article is regarded as one that may rise; the calculated value of β is:
$$\beta = \left(\frac{\sqrt{5}-1}{2}\right)^{4} \approx (0.618)^{4} = 14.58659418\%.$$
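The preliminary determination described in the third step of claim 1 (regular-expression scan, category lookup, category combination, match against the condition database, highest level wins) might be sketched as below. Everything concrete here is an assumption made for illustration: the sample words, the category names, the two combination conditions, and the function names do not come from the patent, and splitting sentences on end punctuation is a simplification.

```python
import re
from typing import Dict, FrozenSet, List, Optional, Tuple

# Classification levels ordered from lowest to highest.
LEVEL_RANK: Dict[str, int] = {"confidential": 1, "secret": 2, "top secret": 3}

# Hypothetical word classification database: representative word -> category.
WORD_CATEGORY: Dict[str, str] = {
    "regiment": "unit",
    "brigade": "unit",
    "headcount": "strength",
    "personnel": "strength",
    "redeploy": "movement",
}

# Hypothetical condition database: (category combination, level of its sub-database).
CONDITIONS: List[Tuple[FrozenSet[str], str]] = [
    (frozenset({"unit", "strength"}), "confidential"),
    (frozenset({"unit", "strength", "movement"}), "secret"),
]

# One regular expression that matches any word stored in the word database.
WORD_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in WORD_CATEGORY) + r")\b",
    re.IGNORECASE,
)


def sentence_level(sentence: str) -> Optional[str]:
    """Return the highest level whose category combination the sentence fully satisfies."""
    categories = {WORD_CATEGORY[w.lower()] for w in WORD_PATTERN.findall(sentence)}
    matched = [level for combo, level in CONDITIONS if combo <= categories]
    return max(matched, key=LEVEL_RANK.__getitem__, default=None)


def preliminary_level(article: str) -> Optional[str]:
    """Article level is the highest level among its sentences; None means not classified."""
    sentences = re.split(r"(?<=[.!?])\s+", article)
    levels = [lvl for lvl in map(sentence_level, sentences) if lvl is not None]
    return max(levels, key=LEVEL_RANK.__getitem__, default=None)
```

A condition counts as met only when every category in its combination appears in the sentence (a subset test), which mirrors the requirement that the combination be fully satisfied. For instance, under these assumed tables, `preliminary_level("The regiment has a headcount of 900. The brigade will redeploy with full personnel next week.")` evaluates the first sentence as confidential and the second as secret, so the article as a whole is secret.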
CN201410103973.8A 2014-03-20 2014-03-20 Sort out the classified information level of confidentiality affiliation method of combination judgement and probability statistics based on word Expired - Fee Related CN103870758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410103973.8A CN103870758B (en) 2014-03-20 2014-03-20 Sort out the classified information level of confidentiality affiliation method of combination judgement and probability statistics based on word


Publications (2)

Publication Number Publication Date
CN103870758A CN103870758A (en) 2014-06-18
CN103870758B true CN103870758B (en) 2016-05-11

Family

ID=50909281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410103973.8A Expired - Fee Related CN103870758B (en) 2014-03-20 2014-03-20 Sort out the classified information level of confidentiality affiliation method of combination judgement and probability statistics based on word

Country Status (1)

Country Link
CN (1) CN103870758B (en)





Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160511

Termination date: 20170320
