CN113886830A - Information security scoring system construction method based on artificial intelligence - Google Patents

Information security scoring system construction method based on artificial intelligence

Info

Publication number
CN113886830A
Authority
CN
China
Prior art keywords
standard
text
standard text
updated
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110958360.2A
Other languages
Chinese (zh)
Inventor
才华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Southern Information Security Research Institute
Original Assignee
Guangdong Southern Information Security Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Southern Information Security Research Institute
Priority to CN202110958360.2A
Publication of CN113886830A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for constructing an information security scoring system based on artificial intelligence, which comprises the following steps: crawling industry standard texts related to information security from the internet; determining standard texts that are suspected to be updates; determining the industry standard text needing to be updated; performing structured extraction on the industry standard text needing to be updated through a preset text matching rule to obtain structured standard data; evaluating, according to the industry standard text needing to be updated, the difference between the new and old standard texts and the difference between the new and old scoring standards; acquiring the evaluation result of an automatic test system and the manual evaluation result of a reviewer; and processing the updated standard text according to the result of comparing the two, so that the problem of updating standard texts is solved more accurately.

Description

Information security scoring system construction method based on artificial intelligence
[ technical field ]
The invention relates to the field of information security scoring, in particular to a method for constructing an information security scoring system based on artificial intelligence.
[ background of the invention ]
With the continuous development of technology, information security is becoming more and more important. When an information system is designed, a corresponding security evaluation system is also designed, whose purpose is to perform security tests on the current information system. The most common is the password complexity test. Password complexity means that the password set by the user must meet a certain complexity, and if it does not, a corresponding prompt is given. A security scoring system is a system comprising various function tests; if a function of the current information system is found not to meet the corresponding industry standard, the security scoring system deducts points accordingly.
Existing security scoring systems depend on manual customization: a practitioner first needs to learn the corresponding industry standard and then manually design the corresponding scoring system. This approach relies heavily on the practitioner's subjective judgment and requires considerable effort to read a large number of detailed industry standard rules, so efficiency is low. It would therefore be desirable to design a system that intelligently learns the existing industry standards and scoring rules and, after a new industry standard appears, intelligently updates the scoring rules and recommends them to practitioners.
[ summary of the invention ]
The invention aims to overcome the defects of the prior art and provides a method for constructing an information security scoring system based on artificial intelligence.
The purpose of the invention can be achieved by adopting the following technical scheme:
A method for constructing an information security scoring system based on artificial intelligence is characterized by comprising the following steps:
crawling industry standard texts related to information security from the internet;
according to the crawled standard texts, taking one piece of standard text data as first standard text data and traversing all other pieces of standard text data, each taken in turn as second standard text data during the traversal; acquiring the directory entries containing the two-character word 'requirement' in the directory of the first standard text data as a first directory, and acquiring the directory entries containing the two-character word 'requirement' in the directory of the second standard text data as a second directory; traversing the rows of the first directory and the second directory and computing their edit distances through the python-Levenshtein toolkit; calculating the proportion of subdirectories with an edit distance smaller than 2 among the subdirectories of the first directory as the directory overlap proportion; and, when the directory overlap proportion exceeds a preset value, taking the second standard text data as a suspected updating standard text of the first standard text data;
determining the industry standard text needing to be updated by a hierarchical similarity calculation method according to the suspected updating standard text;
performing structured extraction on the industry standard text needing to be updated through a preset text matching rule to obtain structured standard data;
manually classifying the structured standard data, and training a binary classifier by a Bayesian classification method;
according to the industry standard text needing to be updated, evaluating the difference between the new and old standard texts and the difference between the new and old scoring standards, and mining the mapping relation between the two differences;
automatically evaluating, through an automatic test system, the network environment currently to be tested according to the structured standard data of the industry standard text needing to be updated to obtain an evaluation score p1; manually evaluating the network environment currently to be tested by a reviewer to obtain an evaluation score p2;
and comparing whether p1 and p2 are consistent, and processing the updated standard text according to the comparison result.
Preferably, the crawling of industry standard texts related to information security from the internet comprises:
extracting target links according to the seed link and putting them into a queue to be crawled; parsing the pages and downloading the standard text data, wherein webmagic parses the html pages by using the Jsoup component; and storing the extracted standard text data in text file format or in a database.
Preferably, the step of determining the industry standard text needing to be updated by a hierarchical similarity calculation method according to the suspected updating standard text comprises:
performing hierarchical similarity calculation on the corresponding multilevel directories, extracting the directories containing specific keywords, filtering out the directories not containing them, and constructing a standard text directory tree by a text analysis method; calculating, by the bm25 similarity calculation method, the sum of the similarities between the content texts under each subdirectory of the first standard text data and the second standard text data, and judging whether the similarity exceeds a threshold; when the similarity is higher than the threshold, taking the standard file with the later release date as the industry standard text needing to be updated for the file with the earlier release date; and when the similarity is not higher than the threshold, in which case it is difficult to determine whether an update is needed, determining the industry standard text needing to be updated by fusing the data of the two files.
Preferably, performing structured extraction on the industry standard text needing to be updated through a preset text matching rule to obtain structured standard data comprises: the structured standard data comprises industry standard field names and the scoring standard contents corresponding to them, and the structured standard data is manually classified into two types, namely a non-network security type and an automatically testable network security type. The unified preset rule for the structured standard is expressed as:
{ [ industry standard field name ], [ corresponding scoring standard content ] };
the non-network security type refers to a type whose security score cannot be tested automatically through the test system;
the automatically testable network security type refers to a type whose security score can be tested automatically through the test system.
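The two-type classification above, together with the Bayesian binary classifier named in the method steps, could be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: the class name `NaiveBayes`, the label strings, the whitespace tokenisation and the four training sentences are all assumptions made for the example (Chinese text would need a word segmenter first).

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, samples: List[Tuple[str, str]]) -> "NaiveBayes":
        self.word_counts = defaultdict(Counter)   # label -> token counts
        self.class_counts = Counter()             # label -> number of samples
        for text, label in samples:
            self.class_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)       # log prior
            for word in text.lower().split():     # add smoothed log likelihoods
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical manually labelled scoring-standard contents.
samples = [
    ("storage medium should be kept in a locked cabinet", "non-network"),
    ("backup tapes should be stored offsite by staff", "non-network"),
    ("audit log should record login time and source address", "auto-testable"),
    ("system should alert on abnormal network traffic", "auto-testable"),
]
clf = NaiveBayes().fit(samples)
label_a = clf.predict("audit log should record source address")
label_b = clf.predict("storage medium locked cabinet")
```

In practice the training samples would be the manually classified structured records, with the field name and scoring content concatenated as the input text.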
Preferably, according to the industry standard text needing to be updated, evaluating the difference between the new and old standard texts and the difference between the new and old scoring standards and mining the mapping relation between the two differences comprises: subdividing the differences into three aspects: numerical values, degree adverbs, and numerical differences.
Preferably, automatically evaluating, through an automatic test system, the network environment currently to be tested according to the structured standard data of the industry standard text needing to be updated to obtain an evaluation score p1, and manually evaluating the network environment currently to be tested by a reviewer to obtain an evaluation score p2, comprises: an automatic software test tool is preset in the automatic test system, and the tool can automatically test and evaluate a network environment according to the industry standard field names in the structured standard of the standard text or the corresponding scoring standard contents to obtain an evaluation score;
when the reviewer manually evaluates the network environment currently to be tested, the mapping relation between the two differences, mined from the structured standard data of the updated standard text and the differences between the new and old scoring standard texts, is pushed to the reviewer to assist the manual evaluation.
Preferably, comparing whether p1 and p2 are consistent and processing the updated standard text according to the comparison result comprises:
if the results are consistent, judging that the updated standard text is accurate, and storing the updated standard text and the corresponding structured standard data in the database;
if the results are inconsistent, judging by default that the scoring difference is caused by fields of the updated standard text having been updated incorrectly, and having the reviewer list the fields with update errors and the reasons for the errors; and updating the structured text matching rule of the standard text according to the fields and reasons listed by the reviewer;
if the reviewer does not list an update error for any specific field, judging that the score cannot be evaluated automatically through the automatic test system because the updated standard document contains non-network security fields; and acquiring the industry standard field names and corresponding scoring standard contents that cause the scoring difference, marking them as the non-network security type, and retraining the binary classifier of step 5.
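The three branches of this comparison can be summarised in a short sketch. The `Review` record, the action names and the `tolerance` parameter are hypothetical; the source only says the two scores are compared for consistency and does not say how close they must be.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Review:
    p1: float                       # score from the automatic test system
    p2: float                       # score from the human reviewer
    # Fields the reviewer lists as wrongly updated (may be empty).
    error_fields: List[str] = field(default_factory=list)


def handle_comparison(review: Review, tolerance: float = 0.0) -> str:
    """Return the follow-up action for one updated standard text."""
    if abs(review.p1 - review.p2) <= tolerance:
        return "store"                    # store text and structured data
    if review.error_fields:
        return "update_matching_rules"    # fix the text matching rules
    return "relabel_and_retrain"          # mark non-network fields, retrain


action_same = handle_comparison(Review(p1=92, p2=92))
action_rule = handle_comparison(Review(p1=92, p2=88, error_fields=["data deletion"]))
action_retrain = handle_comparison(Review(p1=92, p2=88))
```

A non-zero `tolerance` would let small rounding differences between the two scores still count as consistent; whether that is appropriate depends on the scoring scale in use.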
[ description of the drawings ]
Fig. 1 is a flow chart of a method for constructing an artificial intelligence-based information security scoring system according to the present invention.
Fig. 2 is another schematic diagram of the method for constructing an artificial intelligence-based information security scoring system according to the present invention, namely the directory tree of the text information of the industry standard "Financial Industry Network Security Level Protection Evaluation Guideline".
Fig. 3 is another schematic diagram of the method for constructing an artificial intelligence-based information security scoring system according to the present invention, namely the directory tree of the text information of the industry standard "Financial Industry Network Security Level Protection Implementation Guidance Part 2: Basic Requirements".
[ detailed description ]
S1: Crawling industry standard texts related to information security from the internet as raw data through the crawler framework webmagic. The concrete implementation is as follows: first, target links are extracted according to the seed link and put into a queue to be crawled; then the pages are parsed and the standard text data downloaded, with webmagic parsing the html pages by using the Jsoup component; finally, the extracted standard text data is stored in text file format or in a database.
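The seed-link, queue, parse and store cycle of S1 can be illustrated with a small sketch. The patent uses the Java framework webmagic with Jsoup; the version below swaps in Python's standard library purely for illustration, and the `crawl` function, the injected `fetch` callable and the in-memory `PAGES` site are assumptions so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects href targets and visible text from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(seed: str, fetch: Callable[[str], str], max_pages: int = 50) -> Dict[str, str]:
    """Breadth-first crawl from a seed link; returns {url: extracted text}."""
    queue = deque([seed])        # links waiting to be crawled
    seen = {seed}
    store: Dict[str, str] = {}   # stands in for the text files or database
    while queue and len(store) < max_pages:
        url = queue.popleft()
        parser = LinkAndTextParser()
        parser.feed(fetch(url))
        store[url] = "\n".join(parser.text_parts)
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return store


# Hypothetical in-memory site used instead of real HTTP requests.
PAGES = {
    "http://example.invalid/index": '<a href="/std1">s1</a><a href="/std2">s2</a>',
    "http://example.invalid/std1": "<h1>Standard 1</h1><p>6 Second-level requirements</p>",
    "http://example.invalid/std2": "<h1>Standard 2</h1><a href='/std1'>back</a>",
}
result = crawl("http://example.invalid/index", lambda url: PAGES.get(url, ""))
```

Passing `fetch` as a parameter keeps the queue logic separate from the download step, so a real HTTP client could be substituted without changing the traversal.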
S2: According to the standard text data crawled in S1, one piece of standard text data is taken as first standard text data and all other standard text data are traversed, each taken in turn as second standard text data. The directory entries containing the two-character word 'requirement' in the directory of the first standard text data are acquired as a first directory, and those in the directory of the second standard text data as a second directory. The rows of the first directory and the second directory are traversed and their edit distances computed through the python-Levenshtein toolkit; the proportion of subdirectories with an edit distance smaller than 2 among the subdirectories of the first directory is calculated as the directory overlap proportion; and when the directory overlap proportion exceeds a preset value, the second standard text data is taken as a suspected updating standard text of the first standard text data. For example:
First, suppose the crawler has crawled two industry standard documents: the first standard text data, "Financial Industry Network Security Level Protection Evaluation Guideline", and the second standard text data, "Financial Industry Network Security Level Protection Implementation Guidance Part 2: Basic Requirements". The corresponding old industry standard is assumed to be the "Financial Industry Network Security Level Protection Evaluation Guideline".
For each industry standard file, the text information whose second-level directory name contains the keyword 'requirement' is extracted; the rows of the first directory and the second directory are then traversed through the python-Levenshtein toolkit, and the subdirectories with an edit distance smaller than 2 are counted. For example, the text information extracted from the "Financial Industry Network Security Level Protection Evaluation Guideline" comprises all text under the three second-level directories 6 "second-level evaluation requirements", 7 "third-level evaluation requirements" and 8 "fourth-level evaluation requirements"; and the text information extracted from the "Financial Industry Network Security Level Protection Implementation Guidance Part 2: Basic Requirements" comprises all text under the three second-level directories 7 "second-level security requirements", 8 "third-level security requirements" and 9 "fourth-level security requirements".
The second-level requirements contain four items. For the "Financial Industry Network Security Level Protection Implementation Guidance Part 2: Basic Requirements" they are: general security requirements, cloud computing security extension requirements, mobile internet security extension requirements and internet of things security extension requirements. For the "Financial Industry Network Security Level Protection Evaluation Guideline" they are: general security evaluation requirements, cloud computing security evaluation extension requirements, mobile internet security evaluation extension requirements and internet of things security evaluation extension requirements.
The items under "general security requirements" include: secure physical environment, secure communication network, secure zone boundary, etc.
Under "secure physical environment", the text information on physical location selection in the two industry standards is as follows:
7.1.1.1 physical location selection
This requirement includes:
a) the machine room site is selected in a building with the capabilities of shock resistance, wind resistance, rain resistance and the like.
b) The machine room site should be avoided in the top floor or basement of the building, otherwise, water and moisture proofing measures should be enhanced.
6.1.1.1 physical location selection
Evaluation unit (L2-PES1-01)
The evaluation unit comprises the following requirements:
a) Evaluation index: the machine room site should be selected in a building with the capabilities of shock resistance, wind resistance, rain resistance and the like.
b) Evaluation subjects: record and form type documents and the machine room.
c) The evaluation implementation includes the following:
1) It should be checked whether the building in which the system is located has a building earthquake fortification approval document.
2) It should be checked whether the machine room is free of rain water leakage.
3) It should be checked whether the doors and windows of the machine room are free of serious dust caused by wind.
4) It should be checked whether the roof, walls, doors, windows, floor, etc. are free of damage or cracks.
d) Unit judgment: if 1) to 4) are all positive, the index requirements of the evaluation unit are met; otherwise, the index requirements of the evaluation unit are not met or only partially met.
Evaluation unit (L2-PES1-02)
The evaluation unit comprises the following requirements:
a) evaluation indexes are as follows: the machine room site should be avoided being arranged on the top floor or the basement of the building, otherwise, waterproof and damp-proof construction should be strengthened.
b) Evaluation subjects: machine room.
c) Evaluation implementation: whether the machine room is not positioned at the top layer of the building or in the basement or not is checked, and if not, whether waterproof and moistureproof measures are taken or not is checked.
d) Unit judgment: if the evaluation implementation content is positive, the evaluation unit meets the index requirement, otherwise, the evaluation unit does not meet the index requirement.
The requirement standards in the "Financial Industry Network Security Level Protection Evaluation Guideline" thus contain all the content of the "Financial Industry Network Security Level Protection Implementation Guidance Part 2: Basic Requirements", so the directory overlap proportion of the two exceeds the preset value of 80%. As a result, the system judges the two industry standards to be too similar and cannot distinguish them. If no distinction is made, the system parses the "Financial Industry Network Security Level Protection Implementation Guidance Part 2: Basic Requirements", extracts its text information and uses it to update the industry standard, which directly results in an update error.
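The directory overlap proportion used in S2 can be sketched as follows. The patent computes edit distances with the python-Levenshtein toolkit (whose `Levenshtein.distance` function could replace the helper below); a pure-Python edit distance is used here only so the example has no dependencies. The matching rule, under which a first-directory entry counts as overlapping when any second-directory entry lies within the distance limit, is one reading of the text, and the sample titles are invented.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def directory_overlap(first: List[str], second: List[str], max_distance: int = 2) -> float:
    """Share of entries in `first` with a near-identical entry in `second`."""
    if not first:
        return 0.0
    matched = sum(
        1 for title in first
        if any(edit_distance(title, other) < max_distance for other in second)
    )
    return matched / len(first)


# Hypothetical subdirectory titles of two standards.
first = ["physical location selection", "physical access control", "power supply"]
second = ["physical location selection", "physical access controls", "fire prevention"]
ratio = directory_overlap(first, second)
is_suspected_update = ratio > 0.8   # 80% preset value from the example above
```

With these sample titles two of the three first-directory entries find a match, so the pair would not be flagged at the 80% preset value.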
S3: and determining the industry standard text needing to be updated by adopting a hierarchical similarity calculation method according to the doubtful updating standard text obtained in the step S2. The method specifically comprises the following steps: performing hierarchical similarity calculation on the corresponding multilevel directories, extracting the directories containing specific keywords in the hierarchical similarity calculation process, filtering the directories not containing the specific keywords, and constructing a standard text directory tree by a text analysis method; calculating the sum of the similarity between the content texts under each subdirectory in the two standard files by a bm25 similarity calculation method, judging whether the similarity exceeds a threshold value, taking the standard file with the later release date as the industry standard text needing to be updated of the file with the earlier release date when the similarity is higher than the threshold value, and determining whether the industry standard text needing to be updated by a method of fusing the data of the two files, wherein the method specifically comprises the following steps:
building a standard text directory tree by a text analysis method, for example: according to the industry standard of the financial industry network security level protection evaluation guideline and the financial industry network security level protection implementation guideline part 2: basic requirements two text data of industry standard, through text analysis method, using the text information of the industry standard of financial industry network security level protection evaluation guideline as a catalogue tree, and directing the implementation of financial industry network security level protection to part 2: basic requirements the textual information of the industry standard works itself as a tree of contents. As shown in fig. 2 and 3, respectively.
(b) Calculating the sum of the similarity between the content texts under each subdirectory in the two standard files by a bm25 similarity calculation method, judging whether the similarity exceeds a threshold value, taking the file with the later release date as an industry standard file for updating the file with the earlier release date when the similarity is higher than the threshold value, and difficultly determining whether the file needs to be updated when the similarity is not higher than the threshold value, and updating the industry standard file by a method of fusing data of the two files. For example: and (2) calculating the industry standard of financial industry network security level protection evaluation guideline and financial industry network security level protection implementation guidance: basic requirements "distance between content texts under various subdirectories in the industry standard. The part 2 for realizing the text information of the industry standard of the financial industry network security level protection evaluation guideline and the financial industry network security level protection implementation guidance: basic requirements "comparison of similarity between textual information of industry standards. For calculating the similarity between two text messages, a bm25 similarity calculation method is used, the sum of the similarities of all directory data points of the two text messages is calculated, the greater the sum of the similarities is, the higher the similarity of the two standard data is, when the similarity is higher than a threshold value, a file with a later release date replaces a file with an earlier release date to serve as an updated industry standard file, when the similarity is not higher than the threshold value, whether the update is needed to be determined is difficult, and the industry standard file is updated through a method of fusing the two file data.
(c) Updating the industry standard file by fusing the data of the two files. For example: according to fig. 2, each secondary topic under each primary topic in the two directories is regarded as a data point. As can be seen from fig. 2, the data points under the primary topic "second-level evaluation requirements (A)" are: general security evaluation requirements (A2-1), cloud computing security evaluation extension requirements (A2-2), mobile internet security evaluation extension requirements (A2-3) and internet of things security evaluation extension requirements (A2-4). The other secondary topics are divided in the same way. Similarly, the data points under the second-level topic in fig. 3 include: general security requirements (D2-1), cloud computing security extension requirements (D2-2), mobile internet security extension requirements (D2-3), internet of things security extension requirements (D2-4), and so on. To facilitate calculating the distance between data points, each data point is assigned a label, such as (A2-1), (A2-2), (A2-3) and (A2-4) in fig. 2.
(d) Calculating the similarity between data points, matching primary topics with primary topics and secondary topics with secondary topics in the two directories, and merging the two data points with the highest similarity. For example, calculation shows that data point (A2-1) in the "Financial Industry Network Security Level Protection Evaluation Guideline" and data point (D2-1) in the "Financial Industry Network Security Level Protection Implementation Guidance Part 2: Basic Requirements" have the highest similarity, so data point (A2-1) and data point (D2-1) are merged into a new data point New_A2-1. Data point (A2-2) is merged with data point (D2-2) into a new data point New_A2-2. In the same way, the distances between the remaining data points are calculated, and the two data points with the minimum distance are found and merged into a new data point.
(e) Combining the newly merged data points under the corresponding primary topic, following the primary-to-primary correspondence of step (d). For example, the merged data points New_A2-1, New_A2-2, New_A2-3 and New_A2-4 are combined into the data point of a new primary topic, New_A. The other merged data points are combined in the same way to form new primary topics. Finally, all newly generated primary topic data points are combined to generate a new directory tree New_1. The meaning of the new directory tree New_1 is: the original industry standard document "Financial Industry Network Security Level Protection Evaluation Guideline" is text-clustered with the crawled industry standard document "Financial Industry Network Security Level Protection Implementation Guidance Part 2: Basic Requirements" and updated into a new industry standard document New_1, whose directory tree is New_1.
(f) Taking the industry standard file newly generated in step (e) as the industry standard file needing to be updated.
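Steps (b) to (d) can be sketched as follows: each data point of the old standard is scored against every data point of the new standard with BM25, paired with its best match, and the best-match scores are summed. This is only an illustration under assumptions: the function names, the Lucene-style IDF variant, the parameters `k1` and `b`, the whitespace tokenisation and the sample texts are not taken from the patent.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of `query` against every tokenised document."""
    if not docs:
        return []
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def best_matches(old: Dict[str, str], new: Dict[str, str]) -> Tuple[Dict[str, str], float]:
    """Pair each old data point with its most similar new one; also sum the scores."""
    labels = list(new)                       # assumes `new` is non-empty
    docs = [new[label].lower().split() for label in labels]
    pairs: Dict[str, str] = {}
    total = 0.0
    for old_label, text in old.items():
        scores = bm25_scores(text.lower().split(), docs)
        best = max(range(len(labels)), key=scores.__getitem__)
        pairs[old_label] = labels[best]
        total += scores[best]
    return pairs, total


# Hypothetical content texts for two data points of each standard.
old = {"A2-1": "general safety evaluation requirements physical environment",
       "A2-2": "cloud computing safety evaluation extension requirements"}
new = {"D2-1": "general safety requirements physical environment",
       "D2-2": "cloud computing safety extension requirements"}
pairs, total = best_matches(old, new)
merged = {f"New_{a}": (a, d) for a, d in pairs.items()}   # e.g. New_A2-1
```

The returned `total` plays the role of the similarity sum that is compared with the threshold, while `merged` mirrors the New_A2-1 style labels of step (d).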
S4: Performing structured extraction on the industry standard file needing to be updated through a preset text matching rule to obtain structured standard data, wherein the structured standard data comprises industry standard field names and the scoring standard contents corresponding to them, and the structured standard data is manually classified into two types, namely a non-network security type and an automatically testable network security type. The unified preset rule for the structured standard is expressed as: { [ industry standard field name ], [ corresponding scoring standard content ] }.
The non-network security type refers to a type whose security score cannot be tested automatically through the test system.
The automatically testable network security type refers to a type whose security score can be tested automatically through the test system.
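A preset text matching rule of this kind might look like the following sketch, which maps each numbered heading to the lettered requirements beneath it. The two regular expressions, the `extract` function and the abbreviated sample text are assumptions modelled on the standard excerpts quoted earlier, not the patent's actual rules.

```python
import re
from typing import Dict, List, Optional

# Assumed heading form: dotted section number followed by a title.
HEADING = re.compile(r"^(\d+(?:\.\d+)+)\s+(.+)$")
# Assumed item form: a lowercase letter, a closing parenthesis, then the text.
ITEM = re.compile(r"^([a-z])\)\s*(.+)$")


def extract(text: str) -> Dict[str, List[str]]:
    """Map each heading title (field name) to its lettered requirement items."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        heading = HEADING.match(line)
        item = ITEM.match(line)
        if heading:
            current = heading.group(2)
            result[current] = []
        elif item and current is not None:
            result[current].append(item.group(2))
    return result


sample = """7.1.1.1 physical location selection
This requirement includes:
a) the machine room site should be in a building resistant to shock, wind and rain.
b) the machine room site should avoid the top floor or basement of the building.
"""
structured = extract(sample)
```

Lines that match neither pattern, such as "This requirement includes:", are simply skipped, so the resulting dictionary follows the { field name, content } shape described above.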
For example, based on an industry standard text of 'information security technology based on IPv6 high-performance network auditing system product security technical requirement', field information features are extracted from the text information by using a syntax analysis technology; the field information characteristics comprise field types and field contents corresponding to the field types. For this industry standard document, the extracted text message is all the text messages whose titles in the first-level title contain the "requirement" keyword. Other unwanted first-level headline text information is filtered out. If the first-level titles have: "safety function requirement", "environmental adaptability requirement", "performance requirement", "safety guarantee requirement". And for each secondary title, extracting the title of the corresponding last-level sub-theme as one of keys in the characteristic values, and taking the corresponding requirement of the text information as the value corresponding to the key.
Within the 'security function requirements' module, the standard requirement information in the industry standard document includes both the non-network security type and the automatically testable network security type. After the extracted text is vectorized, the five modules 'information acquisition', 'data recovery', 'management control requirements', 'security management', and 'data storage' belong to the non-network security type, and the remaining modules 'audit record statistics', 'audit record analysis processing', 'identification and authentication', and 'audit log' belong to the automatically testable network security type. For the non-network security module 'data storage', the keys extracted from its lowest-level titles are 'storage medium', 'data deletion', 'backup and restore', and so on.
The data format after the text information normalization process is as follows:
{ 'data deletion': [ a) should be able to record basic information about deletion behavior, including time and date, operator, and a description of the deleted content, b) should be able to set a policy to automatically delete data that exceeds the retention time limit ],
'data deletion scoring criteria': [ a) cannot record basic information about deletion behavior, including time and date, operator, and a description of the deleted content: deduct 1 point, b) cannot set a policy to automatically delete data that exceeds the retention time limit: deduct 1 point ] }
{ 'backup and restore': [ the IPv6 network audit product should provide audit record backup and recovery functions ],
'backup and restore scoring criteria': [ the IPv6 network audit product does not provide audit record backup and recovery functions: deduct 1 point ] }
For the automatically testable network security module 'audit record analysis processing', the keys extracted from its lowest-level titles are 'association analysis', 'anomaly analysis', 'response alert', and so on.
The data format after the text information normalization process is as follows:
{ 'anomaly analysis': [ a) should be able to predefine abnormal events and trigger an alarm when the number or frequency of occurrences within a certain time period reaches a threshold, or when certain traffic reaches a threshold, b) should be able to define abnormal behavior based on the results of the association analysis function and trigger an alarm for abnormal system behavior, c) other abnormal situations ],
'anomaly analysis scoring criteria': [ a) cannot predefine abnormal events or trigger an alarm when the number or frequency of occurrences within a certain time period reaches a threshold, or when certain traffic reaches a threshold: deduct 1 point, b) cannot define abnormal behavior based on the results of the association analysis function or trigger an alarm for abnormal system behavior: deduct 1 point, c) other abnormal situations: deduct 1 point ] }
{ 'association analysis': [ a) basic information association, including association of collected information based on time, event, source IP address, source port address, destination IP address, service type, network protocol, etc., b) statistical association, using methods such as data mining algorithms to perform association analysis between different network events ],
'association analysis scoring criteria': [ a) no basic information association, including association of collected information based on time, event, source IP address, source port address, destination IP address, service type, network protocol, etc.: deduct 0.5 points, b) no statistical association using methods such as data mining algorithms to perform association analysis between different network events: deduct 0.5 points ] }
{ 'response alert': [ a) the product should support setting policies that trigger alarms, b) should be able to record alarms, and the content should include: date, time, event subject, event level, event description, number of alarms, and event result, c) the alarm mode should support at least one of mail alarm, SNMP trap alarm, audible and visual alarm, short message alarm, etc. ],
'response alert scoring criteria': [ a) the product cannot support setting policies that trigger alarms: deduct 1 point, b) alarms cannot be recorded with content including date, time, event subject, event level, event description, number of alarms, and event result: deduct 1 point, c) the alarm mode does not support at least one of mail alarm, SNMP trap alarm, audible and visual alarm, short message alarm, etc.: deduct 1 point ] }
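As an illustration of the normalized format above, the following is a minimal Python sketch of how one extracted sub-section might be turned into a record. The function names `split_items` and `build_record`, the dictionary layout, and the uniform per-item deduction are assumptions made for this example; the source does not specify how deduction values are assigned.

```python
import re


def split_items(requirement_text):
    """Split text of the form 'a) ...; b) ...' into a list of requirement strings.

    Text without item markers is returned as a single-element list.
    """
    parts = re.split(r"\s*\b[a-z]\)\s*", requirement_text)
    return [p.strip(" ;,.") for p in parts if p.strip(" ;,.")]


def build_record(field_name, requirement_text, deduction=1.0):
    """Build one structured record: field name -> requirements, plus scoring criteria.

    The uniform `deduction` per unmet requirement is an assumption of this sketch.
    """
    items = split_items(requirement_text)
    return {
        field_name: items,
        f"{field_name} scoring criteria": [
            {"unmet_requirement": item, "deduct": deduction} for item in items
        ],
    }


record = build_record(
    "data deletion",
    "a) should record deletion behavior; b) should automatically delete expired data",
)
```

Each lettered requirement becomes one entry under the field name and one matching entry under the scoring criteria key, mirroring the paired structure shown in the examples.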
S5: Train a binary classifier using a Bayesian classification method on the manually classified structured standard data obtained in step S4. One class is the non-network security type and the other is the automatically testable network security type. The input to the Bayesian classifier comprises the text contained in the field names and field contents of the structured standard data, and its prediction is either the non-network security type or the automatically testable network security type.
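The classification in step S5 could be sketched, under assumptions, as a small multinomial naive Bayes classifier with Laplace smoothing. The source only says "Bayesian classification", so the specific variant, the tokenizer, the class labels, and the four toy training samples below are all illustrative choices rather than the method of the invention.

```python
import math
import re
from collections import Counter, defaultdict

NON_NETWORK = "non_network_security"      # hypothetical label names
AUTO_TESTABLE = "auto_testable_network_security"


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        for text, label in samples:
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set: field name plus field content, manually labeled.
training = [
    ("data storage: storage medium, data deletion, backup and restore", NON_NETWORK),
    ("data recovery: restore backed up data", NON_NETWORK),
    ("audit record analysis processing: association analysis, anomaly analysis, response alert", AUTO_TESTABLE),
    ("audit log: record alarm events with date and time", AUTO_TESTABLE),
]
clf = NaiveBayes().fit(training)
```

With this toy data, `clf.predict("backup and restore of stored data")` falls in the non-network security class, while a field about anomaly analysis and alarms falls in the automatically testable class. A real system would train on many more labeled fields.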
S6: For the industry standard texts needing to be updated obtained in step S3, evaluate the differences between the new and old standard texts and the differences between the new and old scoring standard texts, mine the mapping relation between the two kinds of differences, and model this mapping relation. The differences specifically cover two aspects: (1) the degree of change of the standard text and (2) the degree of change of the scoring standard at different times. In particular, the difference features of the standard text can be further subdivided into three aspects: numerical values, degree adverbs, and quantities. For example:
The numerical description in the standard text changes. Old standard text: the description of the product's security functions should cover 80%. New standard text: the description of the product's security functions should cover 90%. The change lies in the numerical feature, from '80%' to '90%'.
The degree adverb in the standard text changes. Old standard text: partially describe the security functions of the product. New standard text: fully describe the security functions of the product. The change lies in the degree adverb feature, from 'partially' to 'fully'.
The number of items in the standard text changes:
Old standard text: the developer should provide a complete functional specification, which should satisfy the following requirements: a) fully describe the security functions of the product; b) describe the purpose and usage of all security function interfaces; c) identify and describe all parameters related to each security function interface; d) describe the security function implementation behavior related to the security function interfaces; e) describe the direct error messages caused by the processing of security function implementation behavior; f) describe the tracing from the security function requirements to the security function interfaces.
New standard text: the developer should provide a complete functional specification, which should satisfy the following requirements: a) fully describe the security functions of the product; b) describe the purpose and usage of all security function interfaces; c) identify and describe all parameters related to each security function interface; d) describe the security function implementation behavior related to the security function interfaces; e) describe the direct error messages caused by the processing of security function implementation behavior; f) describe the tracing from the security function requirements to the security function interfaces; g) describe all behavior related to the security function interfaces during security function implementation. The change lies in the number of requirements, which increases from 6 to 7.
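The three kinds of difference features illustrated above (numerical values, degree adverbs, and quantities) can be detected with a few simple rules. The sketch below is a simplified illustration; the `DEGREE_WORDS` dictionary, the regular expressions, and the function names are assumptions, not part of the source.

```python
import re

# Hypothetical dictionary of degree adverbs; a real one would be built from samples.
DEGREE_WORDS = {"partially", "fully", "completely", "mostly"}


def numeric_values(text):
    """Return numbers (optionally with a percent sign) in order of appearance."""
    return re.findall(r"\d+(?:\.\d+)?%?", text)


def degree_words(text):
    """Return degree adverbs from the dictionary in order of appearance."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t in DEGREE_WORDS]


def item_count(text):
    """Count lettered list items such as 'a)', 'b)', 'c)'."""
    return len(re.findall(r"\b[a-z]\)", text))


def diff_features(old_text, new_text):
    """Report which of the three feature types differ between two versions."""
    changes = {}
    extractors = (
        ("numeric", numeric_values),
        ("degree", degree_words),
        ("quantity", item_count),
    )
    for name, extract in extractors:
        old_value, new_value = extract(old_text), extract(new_text)
        if old_value != new_value:
            changes[name] = (old_value, new_value)
    return changes
```

For the numeric example in the text, `diff_features` reports a `numeric` change from `['80%']` to `['90%']`; for the functional specification example reduced to its item markers, it reports a `quantity` change.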
S7: Identify the updated industry standard content through a difference evaluation system, based on the standard text obtained in step S3. The difference evaluation system comprises a feature recognition model. Specifically:
The feature recognition model is essentially an entity recognition model, intended to recognize entity words in a given text. For example, in 'Tomorrow is the Mid-Autumn Festival', 'Mid-Autumn Festival' is a festival entity. This step therefore uses an entity recognition model to identify 'feature' entities in a given text. Because the invention concerns only industry standards and scoring standards in the field of information security, and the related texts are highly regular, a rule-based named entity recognition method achieves good results. First, the entities (features) in the training samples are extracted to build a feature word dictionary; next, the sequences are labeled; finally, the entities are identified by regular-expression matching. Take the feature in 'the description of the product's security functions should cover 90%' as an example. The sentence is first segmented into words and then tagged with a part-of-speech tagging tool. The tagging rule is: if a word is in the feature word dictionary it is tagged E, and every other word is tagged with the part of speech given by the tagging tool. The resulting sequence is [VNDETNADVVE]. For a given text, regular-expression matching is then performed against the resulting sequence rules to identify its feature entities. The model is used to perform feature recognition on the industry standard text and the scoring standard text to be processed. Finally, a multilayer perceptron applies a nonlinear transformation to the LSTM-encoded sentences to achieve semantic understanding of the features.
The labeled sequences, as features, and the labels are input into an LSTM network for training: the output dimension is 50 and the activation function is tanh; a Dropout layer with rate 0.5 is then added, followed by a fully connected layer with softmax activation; the optimizer is Adam and the monitored metric is accuracy. Specifically, the feature recognition module identifies the numerical features in the given industry standard text 'the description of the product's security functions should cover 90%' and in the scoring standard text 'if the description of the product's security functions covers only 80%, deduct 1 point; if it covers only 70%, deduct 2 points; if it covers less than 70%, deduct 5 points (the full score is 5 points)'. After the numerical features are identified, the text understanding module performs semantic understanding of the identified features, so as to model the mapping relation between the industry standard text and the scoring standard text. For example, a change in a numerical feature of the industry standard is related to the degree of change of the scoring standard: when the numerical feature of the industry standard is adjusted from '80%' to '90%', the corresponding numerical feature of the scoring standard also changes from '80%' to '90%'; that is, the changes in the two features correspond to each other.
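The rule-based part of the feature recognition step (dictionary tagging followed by regular-expression matching over the tag sequence) might look like the sketch below. It is a simplified illustration only: `FEATURE_DICT` and `POS_LOOKUP` are hypothetical stand-ins for the feature word dictionary and the part-of-speech tagging tool, whitespace splitting replaces real word segmentation, and the LSTM stage is not shown.

```python
import re

# Hypothetical feature word dictionary built from training samples.
FEATURE_DICT = {"80%", "90%", "partially", "fully"}

# Hypothetical stand-in for a part-of-speech tagger; one character per tag.
POS_LOOKUP = {
    "describe": "V", "cover": "V", "should": "D",
    "security": "N", "functions": "N", "product": "N",
    "the": "T", "of": "P",
}


def tag(tokens):
    """Tag dictionary words as E and all other words with a one-letter POS tag.

    Unknown words get the placeholder tag X. Keeping every tag one character
    long lets a regex match position map directly to a token index.
    """
    return "".join(
        "E" if tok in FEATURE_DICT else POS_LOOKUP.get(tok, "X") for tok in tokens
    )


def find_features(sentence, pattern=r"(E)"):
    """Return the tokens whose tag is captured by group 1 of the pattern."""
    tokens = sentence.lower().split()
    tags = tag(tokens)
    return [tokens[m.start(1)] for m in re.finditer(pattern, tags)]


sentence = "The security functions of the product should cover 90%"
features = find_features(sentence)                # every dictionary feature
after_verb = find_features(sentence, r"V(E)")     # only features right after a verb
```

A context pattern such as `V(E)` shows how sequence rules can restrict which dictionary hits count as features; the default pattern simply returns every word tagged E.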
S8: Continuously update the standard texts according to the method of steps S1 to S7.
S9: Using the structured standard data of the updated standard text obtained in steps S1 to S8, automatically evaluate the network environment currently under test through an automatic test system to obtain an evaluation score p1; a reviewer manually evaluates the same network environment to obtain an evaluation score p2.
An automated software testing tool is preset in the automatic test system; the tool can automatically test and evaluate a network environment according to the industry standard field names in the structured standard of the standard text, or the corresponding scoring standard content, to obtain an evaluation score.
The manual evaluation of the network environment currently under test by the reviewer comprises: pushing to the reviewer the structured standard data of the updated standard text obtained in steps S1 to S8, together with the mined mapping relation between the differences of the new and old standard texts and the differences of the new and old scoring standard texts, to assist the reviewer in manually evaluating the network environment.
Compare whether p1 and p2 are consistent, and process the updated standard text according to the comparison result:
If they are consistent, the updated standard text is judged to be accurate, and the updated standard text and the corresponding structured standard data are stored in the database.
If they are inconsistent, it is assumed by default that the scoring difference was caused by an incorrectly updated field of the updated standard text, and the reviewer lists the incorrectly updated fields and the reasons for the errors. The structured text matching rules for the standard text are then updated according to the fields and reasons listed by the reviewer.
If the reviewer does not list any specific field update error, it is judged that the score cannot be evaluated automatically by the automatic test system because the updated standard document contains non-network security fields. The industry standard field names and corresponding scoring standard content that caused the scoring difference are obtained and marked as the non-network security type, and the binary classifier of step S5 is retrained.
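The three-way handling of the comparison between p1 and p2 can be summarized as a small decision function. This is a sketch under assumptions: the function name, the returned action strings, and the `tolerance` parameter are illustrative, and the source only states that the two scores are compared for consistency.

```python
def process_update(p1, p2, error_fields=None, tolerance=0.0):
    """Decide how to handle an updated standard text after scoring.

    p1: score from the automatic test system.
    p2: score from the human reviewer.
    error_fields: fields the reviewer lists as incorrectly updated, if any.
    tolerance: allowed score gap for 'consistent' (an assumption of this sketch).
    """
    if abs(p1 - p2) <= tolerance:
        # Scores agree: accept the update and store it with its structured data.
        return "store_updated_standard"
    if error_fields:
        # Reviewer pinpointed bad fields: fix the text matching rules.
        return "update_matching_rules"
    # No specific field error: treat the offending fields as non-network
    # security type and retrain the binary classifier from step S5.
    return "mark_non_network_security_and_retrain"
```

For example, `process_update(8.0, 8.0)` stores the update, `process_update(8.0, 6.0, ["data deletion"])` triggers a rule update, and `process_update(8.0, 6.0)` triggers relabeling and retraining.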

Claims (7)

1. A method for constructing an information security scoring system based on artificial intelligence, characterized by comprising the following steps:
crawling information-security-related standard texts from industry standard websites;
from the crawled standard texts, taking one piece of standard text data as first standard text data and traversing all other pieces of standard text data, each of which is taken in turn as second standard text data; obtaining the directory entries containing the word 'requirement' in the directory of the first standard text data as a first directory, and the directory entries containing the word 'requirement' in the directory of the second standard text data as a second directory; computing, with the python-Levenshtein toolkit, the edit distance between each directory line of the first directory and of the second directory; calculating the proportion of subdirectories whose edit distance is smaller than 2 among the subdirectories of the first directory as the directory overlap ratio; and, when the directory overlap ratio exceeds a preset value, taking the second standard text data as a suspected updated standard text of the first standard text data;
determining the industry standard text needing to be updated from the suspected updated standard text using a hierarchical similarity calculation method;
performing structured extraction on the industry standard text needing to be updated using preset text matching rules to obtain structured standard data;
manually classifying the structured standard data, and training a binary classifier using a Bayesian classification method;
for the industry standard text needing to be updated, mining the mapping relation between two kinds of differences by evaluating the differences between the new and old standard texts and the differences between the new and old scoring standard texts;
automatically evaluating the network environment currently under test through an automatic test system, according to the structured standard data of the industry standard text needing to be updated, to obtain an evaluation score p1; and having a reviewer manually evaluate the network environment currently under test to obtain an evaluation score p2;
and comparing whether p1 and p2 are consistent, and processing the updated standard text according to the comparison result.
2. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein crawling the information-security-related standard texts from industry standard websites comprises:
extracting target links from seed links and putting them into a queue to be crawled; parsing pages and downloading standard text data, with webmagic parsing the HTML pages using the Jsoup component; and storing the extracted standard text data in text file format or in a database.
3. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein determining the industry standard text needing to be updated from the suspected updated standard text using a hierarchical similarity calculation method comprises:
performing hierarchical similarity calculation on the corresponding multilevel directories, in which directories containing specific keywords are extracted, directories not containing the specific keywords are filtered out, and a standard text directory tree is constructed by text parsing; calculating, by the BM25 similarity calculation method, the sum of the similarities between the first standard text data and the content text under each subdirectory in the first standard text data, and judging whether the similarity exceeds a threshold; when the similarity exceeds the threshold, taking the standard file with the later release date as the industry standard text needing to be updated for the file with the earlier release date; and when the similarity does not exceed the threshold, so that it is difficult to decide whether the standard text needs to be updated, determining the industry standard text needing to be updated by fusing the two pieces of file data.
4. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein performing structured extraction on the industry standard text needing to be updated using preset text matching rules to obtain structured standard data comprises: the structured standard data comprise industry standard field names and the scoring standard content corresponding to each field name, and are manually classified into two types, namely a non-network security type and an automatically testable network security type; the unified preset representation of the structured standard is: { [industry standard field name], [corresponding scoring standard content] };
the non-network security type refers to a type whose security score cannot be tested automatically by the test system;
the automatically testable network security type refers to a type whose security score can be tested automatically by the test system.
5. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein mining the mapping relation between two kinds of differences, by evaluating the differences between the new and old standard texts and the differences between the new and old scoring standard texts for the industry standard text needing to be updated, comprises: subdividing the differences into three aspects: changes in numerical values, changes in degree adverbs, and changes in quantities.
6. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein automatically evaluating the network environment currently under test through the automatic test system, according to the structured standard data of the industry standard text needing to be updated, to obtain the evaluation score p1, and having the reviewer manually evaluate the network environment currently under test to obtain the evaluation score p2, comprises: an automated software testing tool is preset in the automatic test system, and the tool can automatically test and evaluate a network environment according to the industry standard field names in the structured standard of the standard text, or the corresponding scoring standard content, to obtain an evaluation score;
and when the reviewer manually evaluates the network environment currently under test, the structured standard data of the updated standard text, together with the mined mapping relation between the differences of the new and old standard texts and the differences of the new and old scoring standard texts, are pushed to the reviewer to assist the reviewer in manually evaluating the network environment.
7. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein comparing whether p1 and p2 are consistent, and processing the updated standard text according to the comparison result, comprises:
if they are consistent, judging that the updated standard text is accurate, and storing the updated standard text and the corresponding structured standard data in the database;
if they are inconsistent, assuming by default that the scoring difference was caused by an incorrectly updated field of the updated standard text, with the reviewer listing the incorrectly updated fields and the reasons for the errors; and updating the structured text matching rules for the standard text according to the fields and reasons listed by the reviewer;
if the reviewer does not list any specific field update error, judging that the score cannot be evaluated automatically by the automatic test system because the updated standard document contains non-network security fields; and obtaining the industry standard field names and corresponding scoring standard content that caused the scoring difference, marking them as the non-network security type, and retraining the binary classifier.
CN202110958360.2A 2021-08-20 2021-08-20 Information security scoring system construction method based on artificial intelligence Pending CN113886830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110958360.2A CN113886830A (en) 2021-08-20 2021-08-20 Information security scoring system construction method based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110958360.2A CN113886830A (en) 2021-08-20 2021-08-20 Information security scoring system construction method based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN113886830A true CN113886830A (en) 2022-01-04

Family

ID=79010798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110958360.2A Pending CN113886830A (en) 2021-08-20 2021-08-20 Information security scoring system construction method based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN113886830A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114676229A (en) * 2022-04-20 2022-06-28 国网安徽省电力有限公司滁州供电公司 Technical improvement major repair project file management system and management method
CN115174270A (en) * 2022-09-05 2022-10-11 杭州安恒信息技术股份有限公司 Behavior abnormity detection method, device, equipment and medium
CN116595588A (en) * 2023-07-17 2023-08-15 卡斯柯信号(北京)有限公司 Safety analysis method and device for railway signal system development process
CN117240766A (en) * 2023-11-15 2023-12-15 合肥天帷信息安全技术有限公司 Automatic grading and selecting method and system for network security level protection evaluation

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114676229A (en) * 2022-04-20 2022-06-28 国网安徽省电力有限公司滁州供电公司 Technical improvement major repair project file management system and management method
CN114676229B (en) * 2022-04-20 2023-01-24 国网安徽省电力有限公司滁州供电公司 Technical improvement major repair project file management system and management method
CN115174270A (en) * 2022-09-05 2022-10-11 杭州安恒信息技术股份有限公司 Behavior abnormity detection method, device, equipment and medium
CN115174270B (en) * 2022-09-05 2022-11-29 杭州安恒信息技术股份有限公司 Behavior abnormity detection method, device, equipment and medium
CN116595588A (en) * 2023-07-17 2023-08-15 卡斯柯信号(北京)有限公司 Safety analysis method and device for railway signal system development process
CN117240766A (en) * 2023-11-15 2023-12-15 合肥天帷信息安全技术有限公司 Automatic grading and selecting method and system for network security level protection evaluation
CN117240766B (en) * 2023-11-15 2024-02-13 合肥天帷信息安全技术有限公司 Automatic grading and selecting method and system for network security level protection evaluation

Similar Documents

Publication Publication Date Title
CN113886830A (en) Information security scoring system construction method based on artificial intelligence
CN110321371B (en) Log data anomaly detection method, device, terminal and medium
CN106202561B (en) Digitlization contingency management case base construction method and device based on text big data
Zimmeck et al. Privee: An architecture for automatically analyzing web privacy policies
KR101545215B1 (en) system and method for automatically manageing fault events of data center
CN113836381A (en) System scoring coverage degree tuning method
Srinath et al. Privacy at scale: Introducing the PrivaSeer corpus of web privacy policies
CN106055541A (en) News content sensitive word filtering method and system
CN104820629A (en) Intelligent system and method for emergently processing public sentiment emergency
CN103761173A (en) Log based computer system fault diagnosis method and device
CN103154845A (en) Machine learning for power grids
US20170235784A1 (en) System and method for improving performance of unstructured text extraction
CN104765733A (en) Method and device for analyzing social network event
Saravanan et al. Improving legal document summarization using graphical models
CN110990836A (en) Code leakage detection system and method based on natural language processing technology
US20180181559A1 (en) Utilizing user-verified data for training confidence level models
KR102124935B1 (en) Disaster Monitoring System, Method Using Crowd Sourcing, and Computer Program therefor
CN112560031A (en) Lesovirus detection method and system
CN113779261B (en) Quality evaluation method and device of knowledge graph, computer equipment and storage medium
CN114398667A (en) Data security access system and method of computer storage system
Nishioka et al. Analysing the evolution of knowledge graphs for the purpose of change verification
CN115619090B (en) Safety assessment method based on model and data driving
CN109918638B (en) Network data monitoring method
US20180260476A1 (en) Expert stance classification using computerized text analytics
Pochampally et al. Notability determination for Wikipedia

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination