CN113886830A

CN113886830A - Information security scoring system construction method based on artificial intelligence

Info

Publication number: CN113886830A
Application number: CN202110958360.2A
Authority: CN
Inventors: 才华
Original assignee: Guangdong Southern Information Security Research Institute
Current assignee: Guangdong Southern Information Security Research Institute
Priority date: 2021-08-20
Filing date: 2021-08-20
Publication date: 2022-01-04

Abstract

The invention discloses a method for constructing an information security scoring system based on artificial intelligence, which comprises the following steps: crawling industry standard texts related to information security from the Internet; identifying standard texts that are suspected updates of other standard texts; determining the industry standard text needing to be updated; performing structured extraction on the industry standard text needing to be updated through preset text matching rules to obtain structured standard data; mining, according to the industry standard text needing to be updated, the mapping relation between the differences between the new and old standard texts and the differences between the new and old scoring standard texts; obtaining the evaluation result of an automatic test system and the manual evaluation result of a reviewer; and processing the updated standard text according to the comparison of the two results, so that standard texts are updated more accurately.

Description

Information security scoring system construction method based on artificial intelligence

[ technical field ]

The invention relates to the field of information security scoring, in particular to a method for constructing an information security scoring system based on artificial intelligence.

[ background of the invention ]

With the continuous development of technology, information security is becoming more and more important. When an information system is designed, a corresponding security evaluation system is also designed, whose purpose is to perform security tests on the current information system. The most common is the password complexity test: the password set by the user must meet a certain level of complexity, and if it does not, a corresponding prompt is given. A security scoring system is a system comprising various functional tests; if a function of the current information system is found not to meet the corresponding industry standard, the security scoring system deducts points accordingly.

Existing security scoring systems depend on manual customization: a practitioner must first study the corresponding industry standard and then manually design the corresponding scoring system. This approach relies heavily on the practitioner's subjective judgment and requires considerable effort to read a large number of detailed industry standard rules, so its efficiency is low. It would therefore be desirable to design a system that intelligently learns the existing industry standards and scoring rules, so that when a new industry standard appears, the scoring rules are intelligently updated and recommended to practitioners.

[ summary of the invention ]

The invention aims to overcome the defects of the prior art and provides a method for constructing an information security scoring system based on artificial intelligence.

The purpose of the invention can be achieved by adopting the following technical scheme:

a method for constructing an information security scoring system based on artificial intelligence is characterized by comprising the following steps:

crawling industry standard texts related to information security from the Internet;

according to the crawled standard texts, acquiring one piece of standard text data as first standard text data and traversing all other pieces of standard text data, taking one piece of standard text data at a time as second standard text data during the traversal; acquiring the directory entries containing the two-character word "requirement" in the directory of the first standard text data as a first directory, and acquiring the directory entries containing the two-character word "requirement" in the directory of the second standard text data as a second directory; traversing each row of the first directory and the second directory and computing the edit distance with the python-Levenshtein toolkit; calculating the proportion of subdirectories with an edit distance smaller than 2 among the subdirectories of the first directory as the directory overlap proportion; and when the directory overlap proportion exceeds a preset value, taking the second standard text data as a suspected update standard text of the first standard text data;

determining an industry standard text needing to be updated by adopting a hierarchical similarity calculation method according to the suspected updating standard text;

performing structured extraction on the industry standard text needing to be updated through a preset text matching rule to obtain structured standard data;

carrying out manual classification on the structured standard data, and training a binary classifier by a Bayesian classification method;

according to the industry standard text needing to be updated, mining the mapping relation between two kinds of differences by evaluating the differences between the new and old standard texts and the differences between the new and old scoring standard texts;

automatically evaluating, through an automatic test system, the network environment currently needing to be tested according to the structured standard data of the industry standard text needing to be updated, to obtain an evaluation score p1; and having a reviewer manually evaluate the network environment currently needing to be tested, to obtain an evaluation score p2;

and comparing whether the results between the p1 and the p2 are consistent or not, and processing the updated standard text according to the comparison result.

Preferably, the crawling of the industry standard text related to the information security on the internet comprises:

extracting target links according to the seed link and putting them into a queue to be crawled; downloading and parsing standard text data from the pages, with webmagic parsing the HTML pages using the Jsoup component; and storing the extracted standard text data in text file format or in a database.

Preferably, the step of determining the industry standard text needing to be updated by adopting a hierarchical similarity calculation method according to the suspected update standard text includes:

performing hierarchical similarity calculation on the corresponding multilevel directories, extracting the directories containing specific keywords during the hierarchical similarity calculation, filtering out the directories not containing the specific keywords, and constructing a standard text directory tree by a text analysis method; calculating, by the bm25 similarity calculation method, the sum of the similarities between the content texts under each subdirectory in the two pieces of standard text data, and judging whether the similarity exceeds a threshold value; when the similarity is higher than the threshold value, taking the standard file with the later release date as the industry standard text needing to be updated for the file with the earlier release date; and when the similarity is not higher than the threshold value, in which case it is difficult to determine whether an update is needed, determining the industry standard text needing to be updated by a method of fusing the two pieces of file data.
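The summed bm25 comparison described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes whitespace tokenization (Chinese standard texts would need a word segmenter), assumes the subdirectories of the two standards have already been aligned under identical keys, and uses the common Okapi BM25 defaults `k1 = 1.5` and `b = 0.75`. The function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace tokens; Chinese text would need a word segmenter.
    return text.lower().split()


def bm25_score(query: List[str], doc: List[str], corpus: List[List[str]],
               k1: float = 1.5, b: float = 0.75) -> float:
    """Okapi BM25 score of `doc` for `query`, with IDF estimated from `corpus`."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n if n else 0.0
    if avgdl == 0:
        return 0.0
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)  # non-negative IDF variant
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


def total_similarity(first: Dict[str, str], second: Dict[str, str]) -> float:
    """Sum BM25 scores over the subdirectories present in both standards."""
    corpus = [tokenize(text) for text in second.values()]
    return sum(
        bm25_score(tokenize(first[key]), tokenize(second[key]), corpus)
        for key in sorted(first.keys() & second.keys())
    )


def later_replaces_earlier(first: Dict[str, str], second: Dict[str, str],
                           threshold: float) -> bool:
    """True when the summed similarity is above the (hypothetical) threshold."""
    return total_similarity(first, second) > threshold
```

When `later_replaces_earlier` returns `False`, the text above falls back to fusing the two files instead of replacing one with the other.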

Preferably, performing structured extraction on the industry standard text needing to be updated through a preset text matching rule to obtain structured standard data includes: the structured standard data comprises industry standard field names and the scoring standard contents corresponding to the industry standard field names, and the structured standard data is manually classified into two types, namely a non-network-security type and an automatically testable network security type. The unified preset rule for the structured standard is expressed as:

{ [ industry standard field name ], [ corresponding scoring standard content ] };

the non-network-security type refers to a type whose security score cannot be automatically tested by a test system;

the automatically testable network security type refers to a type whose security score can be automatically tested by a test system.

Preferably, mining, according to the industry standard text needing to be updated, the mapping relation between the two kinds of differences by evaluating the differences between the new and old standard texts and the differences between the new and old scoring standard texts includes: subdividing the differences of the standard text into three aspects: numerical values, degree adverbs, and quantities.

Preferably, automatically evaluating, through an automatic test system, the network environment currently needing to be tested according to the structured standard data of the industry standard text needing to be updated to obtain an evaluation score p1, and having a reviewer manually evaluate the network environment currently needing to be tested to obtain an evaluation score p2, includes: an automatic software test tool is preset in the automatic test system, and the automatic test tool can automatically test and evaluate a network environment according to the industry standard field names in the structured standard of the standard text or the corresponding scoring standard contents, to obtain a test and evaluation score;

and when the reviewer manually evaluates the network environment currently needing to be tested, the mapping relation between the two kinds of differences is mined according to the structured standard data of the updated standard text and the differences between the new and old scoring standard texts, and is pushed to the reviewer, so as to assist the reviewer in manually evaluating the network environment.

Preferably, comparing whether the results between p1 and p2 are consistent, and processing the updated standard text according to the comparison result comprises:

if the results are consistent, judging that the updated standard text is accurate, and storing the updated standard text and the corresponding structured standard data in the database;

if the results are inconsistent, judging by default that the scoring difference is caused by a field of the updated standard text having been updated incorrectly, and having the reviewer list the fields with update errors and the reasons for the errors; and updating the structured text matching rules of the standard text according to the fields with update errors and the reasons for the errors listed by the reviewer;

if the reviewer does not list an update error for any specific field, judging that the scoring difference arises because the updated standard document contains non-network-security fields that cannot be automatically evaluated by the automatic test system; acquiring the industry standard field names and the corresponding scoring standard contents that cause the scoring difference, marking them as the non-network-security type, and retraining the binary classifier trained in step 5.
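The three-way handling of the p1/p2 comparison can be summarized in a short sketch. The names `ReviewOutcome`, `handle_comparison`, and the `tolerance` parameter are hypothetical and introduced only for illustration; the source only says that the two results are compared for consistency.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReviewOutcome:
    # One of "store", "update_rules", or "retrain_classifier".
    action: str
    # Fields the follow-up action applies to (empty when storing).
    fields: List[str] = field(default_factory=list)


def handle_comparison(p1: float, p2: float,
                      error_fields: List[str],
                      differing_fields: List[str],
                      tolerance: float = 0.0) -> ReviewOutcome:
    """Decide how to process an updated standard text.

    p1: score from the automatic test system.
    p2: score from the reviewer's manual evaluation.
    error_fields: fields the reviewer listed as wrongly updated (may be empty).
    differing_fields: fields whose scores differ between the two evaluations.
    tolerance: assumed allowance for treating two scores as consistent.
    """
    if abs(p1 - p2) <= tolerance:
        # Consistent: the update is judged accurate and is stored.
        return ReviewOutcome("store")
    if error_fields:
        # Inconsistent with listed errors: fix the text matching rules.
        return ReviewOutcome("update_rules", list(error_fields))
    # Inconsistent with no listed errors: relabel the differing fields as
    # non-network-security and retrain the binary classifier.
    return ReviewOutcome("retrain_classifier", list(differing_fields))
```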

[ description of the drawings ]

Fig. 1 is a flow chart of a method for constructing an artificial intelligence-based information security scoring system according to the present invention.

Fig. 2 is another schematic diagram of the method for constructing the information security scoring system based on artificial intelligence according to the present invention, namely the directory tree of the text information of the industry standard "Financial Industry Network Security Level Protection Evaluation Guide".

Fig. 3 is another schematic diagram of the method for constructing the information security scoring system based on artificial intelligence according to the present invention, namely the directory tree of the text information of the industry standard "Financial Industry Network Security Level Protection Implementation Guide Part 2: Basic Requirements".

[ detailed description of the embodiments ]

S1: Industry standard texts related to information security are crawled from the Internet as raw data using the crawler framework webmagic. The concrete implementation is as follows: first, target links are extracted according to the seed link and put into the queue to be crawled; then, standard text data are downloaded and parsed from the pages, with webmagic parsing the HTML pages using the Jsoup component; finally, the extracted standard text data are stored in text file format or in a database.
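The seed-link, queue, parse, and store flow of S1 can be illustrated with a small sketch. The embodiment uses the Java framework webmagic with Jsoup; the sketch below is not that implementation but a stand-in written with the Python standard library only. The `fetch` callable (which returns the HTML of a URL), the `max_pages` limit, and the in-memory dictionary used as the store are assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects href targets and visible text fragments from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(seed: str, fetch: Callable[[str], str], max_pages: int = 50) -> Dict[str, str]:
    """Breadth-first crawl from a seed link; returns {url: extracted text}."""
    queue = deque([seed])   # queue of links waiting to be crawled
    seen = {seed}           # avoids queueing the same link twice
    store: Dict[str, str] = {}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        parser = LinkAndTextParser()
        parser.feed(fetch(url))
        store[url] = "\n".join(parser.text_parts)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return store
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real crawler would also restrict which target links are followed.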

S2: According to the standard text data crawled in S1, one piece of standard text data is acquired as first standard text data and all other standard text data are traversed, one piece at a time being taken as second standard text data. The directory entries containing the two-character word "requirement" in the directory of the first standard text data are acquired as a first directory, and the directory entries containing the two-character word "requirement" in the directory of the second standard text data are acquired as a second directory. Each row of the first directory and the second directory is traversed and the edit distance is computed with the python-Levenshtein toolkit; the proportion of subdirectories with an edit distance smaller than 2 among the subdirectories of the first directory is calculated as the directory overlap proportion; and when the directory overlap proportion exceeds a preset value, the second standard text data is taken as a suspected update standard text of the first standard text data, for example:

First, suppose the crawler has crawled two industry standard documents: the first standard text data is the "Financial Industry Network Security Level Protection Evaluation Guide" and the second standard text data is the "Financial Industry Network Security Level Protection Implementation Guide Part 2: Basic Requirements". The corresponding old industry standard is assumed to be the "Financial Industry Network Security Level Protection Evaluation Guide".

For each industry standard file, the text information whose secondary directory name contains the keyword "requirement" is extracted; each row of the first directory and the second directory is then traversed, the edit distance is computed with the python-Levenshtein toolkit, and the subdirectories with an edit distance smaller than 2 are counted. For example, the text information extracted from the "Financial Industry Network Security Level Protection Evaluation Guide" includes all text information under the three secondary directories "6 Second-level evaluation requirements", "7 Third-level evaluation requirements", and "8 Fourth-level evaluation requirements"; and the text information extracted from the "Financial Industry Network Security Level Protection Implementation Guide Part 2: Basic Requirements" includes all text information under the three secondary directories "7 Second-level security requirements", "8 Third-level security requirements", and "9 Fourth-level security requirements".

Taking the second-level requirements as an example, four items are included beneath them. For the "Financial Industry Network Security Level Protection Implementation Guide Part 2: Basic Requirements" these are: general security requirements, cloud computing security extension requirements, mobile internet security extension requirements, and Internet of Things security extension requirements. For the "Financial Industry Network Security Level Protection Evaluation Guide" these are: general security evaluation requirements, cloud computing security evaluation extension requirements, mobile internet security evaluation extension requirements, and Internet of Things security evaluation extension requirements.

The entries under "general security requirements" include: secure physical environment, secure communication network, secure area boundary, etc.

Taking the secure physical environment as an example, the text on physical location selection in the two industry standards is as follows:

7.1.1.1 physical location selection

This requirement includes:

a) The machine room site should be selected in a building with shock resistance, wind resistance, rain resistance, and similar capabilities.

b) The machine room site should be avoided in the top floor or basement of the building, otherwise, water and moisture proofing measures should be enhanced.

6.1.1.1 physical location selection

Evaluation unit (L2-PES1-01)

The evaluation unit comprises the following requirements:

a) Evaluation index: the machine room site should be selected in a building with shock resistance, wind resistance, rain resistance, and similar capabilities.

b) Evaluation subjects: records and form-type documents, and the machine room.

c) The evaluation implementation included the following:

1) whether the building in which the system is located has a building earthquake fortification approval document or not is checked.

2) It should be checked whether there is no rain water leakage in the machine room.

3) Whether the doors and windows of the machine room have serious dust caused by wind or not should be checked.

4) It should be checked whether the roof, walls, doors, windows, floor, etc. are free of damage and cracks.

d) Unit judgment: if 1) to 4) are all affirmative, the index requirements of the evaluation unit are met; otherwise, they are not met or only partially met.

Evaluation unit (L2-PES1-02)

The evaluation unit comprises the following requirements:

a) Evaluation index: the machine room site should avoid the top floor or the basement of the building; otherwise, waterproof and moisture-proof measures should be strengthened.

b) Evaluation subjects: machine room.

c) Evaluation implementation: it should be checked whether the machine room is located on the top floor or in the basement of the building, and if so, whether waterproof and moisture-proof measures have been taken.

d) Unit judgment: if the evaluation implementation content is positive, the evaluation unit meets the index requirement, otherwise, the evaluation unit does not meet the index requirement.

The requirement entries in the "Financial Industry Network Security Level Protection Evaluation Guide" contain all of the requirement contents of the "Financial Industry Network Security Level Protection Implementation Guide Part 2: Basic Requirements", so the directory overlap proportion exceeds the preset value of 80%. As a result, the system judges the two industry standards to be too similar to distinguish. If they are not distinguished, the system parses the "Financial Industry Network Security Level Protection Implementation Guide Part 2: Basic Requirements", extracts its text information, and uses it to update the industry standard, which directly results in an update error.
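The directory overlap proportion of S2 can be sketched as follows. The embodiment uses the python-Levenshtein toolkit; to keep the sketch self-contained, a plain dynamic-programming `edit_distance` is used here as a stand-in. Two further assumptions are made: a first-directory entry counts as overlapping when any second-directory entry is within the distance bound, and the default preset value of 0.8 is taken from the 80% figure in the example above.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution
        prev = curr
    return prev[-1]


def directory_overlap(first: List[str], second: List[str],
                      max_distance: int = 2) -> float:
    """Share of first-directory entries with a near-identical entry in second."""
    if not first:
        return 0.0
    matched = sum(
        1 for entry in first
        if any(edit_distance(entry, other) < max_distance for other in second)
    )
    return matched / len(first)


def is_suspected_update(first: List[str], second: List[str],
                        preset: float = 0.8) -> bool:
    """True when the overlap proportion exceeds the preset value."""
    return directory_overlap(first, second) > preset
```

A pair flagged by `is_suspected_update` is only a candidate; as the example shows, two distinct standards can exceed the preset value, which is why S3 applies a further content-level similarity check.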

S3: The industry standard text needing to be updated is determined by a hierarchical similarity calculation method according to the suspected update standard text obtained in step S2. Specifically: hierarchical similarity calculation is performed on the corresponding multilevel directories; during this calculation the directories containing specific keywords are extracted, the directories not containing the specific keywords are filtered out, and a standard text directory tree is constructed by a text analysis method. The sum of the similarities between the content texts under each subdirectory in the two standard files is calculated by the bm25 similarity calculation method, and it is judged whether the similarity exceeds a threshold value. When the similarity is higher than the threshold value, the standard file with the later release date is taken as the industry standard text needing to be updated for the file with the earlier release date; when the similarity is not higher than the threshold value, it is difficult to determine whether an update is needed, and the industry standard text needing to be updated is determined by a method of fusing the data of the two files. The method specifically comprises the following steps:

(a) Building a standard text directory tree by a text analysis method. For example: given the two pieces of text data of the industry standards "Financial Industry Network Security Level Protection Evaluation Guide" and "Financial Industry Network Security Level Protection Implementation Guide Part 2: Basic Requirements", the text information of the "Financial Industry Network Security Level Protection Evaluation Guide" is built into one directory tree by the text analysis method, and the text information of the "Financial Industry Network Security Level Protection Implementation Guide Part 2: Basic Requirements" is built into another directory tree, as shown in Fig. 2 and Fig. 3, respectively.

(b) Calculating the sum of the similarities between the content texts under each subdirectory in the two standard files by the bm25 similarity calculation method and judging whether the similarity exceeds a threshold value. When the similarity is higher than the threshold value, the file with the later release date is taken as the industry standard file that updates the file with the earlier release date; when the similarity is not higher than the threshold value, it is difficult to determine whether an update is needed, and the industry standard file is updated by a method of fusing the data of the two files. For example: the distance between the content texts under the subdirectories of the industry standard "Financial Industry Network Security Level Protection Evaluation Guide" and of the industry standard "Financial Industry Network Security Level Protection Implementation Guide Part 2: Basic Requirements" is calculated, so as to compare the similarity between the text information of the two industry standards. The similarity between the two pieces of text information is calculated with the bm25 similarity calculation method, and the similarities of all directory data points of the two pieces of text information are summed; the greater the sum, the more similar the two pieces of standard data. When the similarity is higher than the threshold value, the file with the later release date replaces the file with the earlier release date as the updated industry standard file; when the similarity is not higher than the threshold value, it is difficult to determine whether an update is needed, and the industry standard file is updated by a method of fusing the data of the two files.

(c) Updating the industry standard file by a method of fusing the data of the two files. For example: according to Fig. 2, each secondary topic under each primary topic in the two directories is regarded as a data point. As can be seen from Fig. 2, the data points under the primary topic "second-level evaluation requirements (A)" are: general security evaluation requirements (A2-1), cloud computing security evaluation extension requirements (A2-2), mobile internet security evaluation extension requirements (A2-3), and Internet of Things security evaluation extension requirements (A2-4). The other secondary topics are divided in the same way. Similarly, the data points under the second-level topic in Fig. 3 include: general security requirements (D2-1), cloud computing security extension requirements (D2-2), mobile internet security extension requirements (D2-3), Internet of Things security extension requirements (D2-4), and so on. To facilitate calculating the distance between data points, each data point is assigned a label, such as (A2-1), (A2-2), (A2-3), and (A2-4) in Fig. 2, and so on.

(d) Calculating the similarity between data points, comparing primary topics with primary topics and secondary topics with secondary topics in the two directories, and merging the two data points with the highest similarity. For example, the calculation shows that data point (A2-1) in the industry standard "Financial Industry Network Security Level Protection Evaluation Guide" and data point (D2-1) in the industry standard "Financial Industry Network Security Level Protection Implementation Guide Part 2: Basic Requirements" have the highest similarity, so data point (A2-1) and data point (D2-1) are merged into a new data point New_A2-1. Data point (A2-2) is merged with data point (D2-2) into a new data point New_A2-2. In the same way, the distances between data points continue to be calculated, and the two data points with the minimum distance are found and merged into a new data point.

(e) Combining the newly merged data points under the corresponding primary topic, following the principle in step (d) that primary topics correspond to primary topics. For example, the merged data points New_A2-1, New_A2-2, New_A2-3, and New_A2-4 are combined into the data point of a new primary topic, New_A. The other merged data points are combined in the same way to form new primary topics. Finally, all newly generated primary topic data points are combined to generate a new directory tree New_1. The meaning of the new directory tree New_1 is as follows: the original industry standard document "Financial Industry Network Security Level Protection Evaluation Guide" is text-clustered with the crawled industry standard document "Financial Industry Network Security Level Protection Implementation Guide Part 2: Basic Requirements" and updated into a new industry standard document New_1, i.e., the directory tree New_1.

(f) Taking the industry standard file newly generated in step (e) as the industry standard file needing to be updated.
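The pairing and merging in steps (c) to (e) can be sketched as a greedy matching over data points. The source does not fix the similarity measure used between data points at this stage, so a token-overlap (Jaccard) similarity is assumed here purely as a placeholder; any other function, such as a bm25-based score, can be passed in instead. The function names are hypothetical, and the `New_` label format follows the example above.

```python
from typing import Callable, Dict, Tuple


def jaccard(a: str, b: str) -> float:
    """Placeholder similarity: overlap of whitespace tokens (an assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def merge_data_points(old: Dict[str, str], new: Dict[str, str],
                      similarity: Callable[[str, str], float] = jaccard
                      ) -> Dict[str, Tuple[str, str]]:
    """Greedily pair the most similar (old, new) data points and merge them.

    Keys are labels such as "A2-1" or "D2-1"; values are the topic text.
    Returns {"New_<old label>": (old label, new label)}.
    """
    # Score every cross pair, then visit pairs from most to least similar.
    pairs = sorted(
        ((similarity(ot, nt), ol, nl)
         for ol, ot in old.items() for nl, nt in new.items()),
        key=lambda p: p[0], reverse=True,
    )
    used_old, used_new = set(), set()
    merged: Dict[str, Tuple[str, str]] = {}
    for _, ol, nl in pairs:
        if ol in used_old or nl in used_new:
            continue  # each data point is merged at most once
        used_old.add(ol)
        used_new.add(nl)
        merged[f"New_{ol}"] = (ol, nl)
    return merged
```

The merged entries would then be grouped under their primary topic (for example `New_A`) to assemble the new directory tree.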

S4: Structured extraction is performed on the industry standard file needing to be updated through a preset text matching rule to obtain structured standard data. The structured standard data comprises industry standard field names and the scoring standard contents corresponding to the industry standard field names, and is manually classified into two types, namely a non-network-security type and an automatically testable network security type. The unified preset rule for the structured standard is expressed as: { [ industry standard field name ], [ corresponding scoring standard content ] }.

The non-network-security type is a type whose security score cannot be automatically tested by a test system.

For example, for the industry standard text "Information Security Technology: Security Technical Requirements for IPv6-Based High-Performance Network Audit System Products", field information features are extracted from the text information using syntactic analysis; the field information features comprise field types and the field contents corresponding to the field types. For this industry standard document, the extracted text information is all text information whose first-level title contains the keyword "requirement"; the text information under other, unneeded first-level titles is filtered out. For example, the first-level titles include: "security function requirements", "environmental adaptability requirements", "performance requirements", and "security assurance requirements". For each secondary title, the title of the corresponding last-level subtopic is extracted as one of the keys in the feature values, and the corresponding requirement text is taken as the value for that key.

Within the "security function requirements" module, the standard requirement information in the industry standard document includes both the non-network-security type and the automatically testable network security type. After the extracted text information is vectorized, the five modules "information acquisition", "data recovery", "management control requirements", "security management", and "data storage" are of the non-network-security type, while the remaining modules "audit record statistics", "audit record analysis processing", "identification and authentication", and "audit log" belong to the automatically testable network security type. For the non-network-security type "data storage", the keys extracted from its last-level titles are: "storage medium", "data deletion", "backup and restore", and the like.

The data format after the text information normalization process is as follows:

{ 'data deletion': [ a) should be able to record basic information of deletion behavior, including time and date, operator, and description of deleted content, b) should be able to set a policy to automatically delete data that exceeds the retention time limit ],

'data deletion scoring criteria': [ a) cannot record basic information of deletion behavior, including time and date, operator, and description of deleted content: deduct 1 point, b) cannot set a policy to automatically delete data that exceeds the retention time limit: deduct 1 point ] }

{ 'backup and restore': [ the IPv6 network audit product should provide audit record backup and recovery functions ],

'backup and restore scoring criteria': [ the IPv6 network audit product does not provide audit record backup and recovery functions: deduct 1 point ] }

For the type of automatically testable network security, "audit record analysis processing", according to its last level header, the extracted keys are: "association analysis", "anomaly analysis", "response alert", and the like.

The data format after the text information normalization process is as follows:

{ 'anomaly analysis': [ a) should be able to predefine abnormal events and trigger an alarm when the number or frequency of occurrences within a certain time period reaches a certain threshold or when certain traffic reaches a threshold, b) should be able to define abnormal behaviors based on the results of the association analysis function and trigger an alarm for abnormal behaviors of the system, c) other abnormal situations ],

'anomaly analysis scoring criteria': [ a) cannot predefine abnormal events or trigger an alarm when the number or frequency of occurrences within a certain time period reaches a certain threshold or when certain traffic reaches a threshold: deduct 1 point, b) cannot define abnormal behaviors based on the results of the association analysis function or trigger an alarm for abnormal behaviors of the system: deduct 1 point, c) other abnormal situations: deduct 1 point ] }

{ 'Association analysis': [ a) basic information association including collecting information based on time, event, source IP address, source port address, destination IP address, service type, network protocol, etc. ], b) statistical association, performing association analysis between different network events by using a data mining algorithm, etc.,

'association analysis scoring criteria': [ a) there is no basic information association, including associating the collected information based on time, event, source IP address, source port address, destination IP address, service type, network protocol, etc., deduct 0.5 points; b) there is no statistical association, that is, no association analysis between different network events using data mining algorithms and similar methods, deduct 0.5 points ] }

{ 'response alert': [ a) the product should support triggering alarms through policy settings; b) the product should be able to record alarms, and the record should include: date, time, event subject, event level, event description, alarm count, and event result; c) the alarm mode should support at least one of mail alarm, SNMP trap alarm, audible and visual alarm, short message alarm, etc. ],

'response alert scoring criteria': [ a) the product cannot trigger alarms through policy settings, deduct 1 point; b) alarms cannot be recorded with content including date, time, event subject, event level, event description, alarm count, and event result, deduct 1 point; c) the alarm mode does not support at least one of mail alarm, SNMP trap alarm, audible and visual alarm, short message alarm, etc., deduct 1 point ] }
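The normalized records above pair each industry standard field name with its deduction rules. As a rough illustration only, such records could be held in memory and totalled as follows. The dictionary layout, the shortened rule wording, and the function name `total_deduction` are assumptions made for this sketch and are not part of the described method.

```python
# Hypothetical in-memory form of the normalized records: each field name maps
# to (item label, abbreviated failure condition, points deducted) tuples.
STRUCTURED_STANDARD = {
    "association analysis": [
        ("a", "no basic information association", 0.5),
        ("b", "no statistical association", 0.5),
    ],
    "response alert": [
        ("a", "policy-triggered alarms not supported", 1.0),
        ("b", "alarms cannot be recorded", 1.0),
        ("c", "no supported alarm mode", 1.0),
    ],
}


def total_deduction(records, failed_items):
    """Sum the deductions for every (field name, item label) pair that failed."""
    total = 0.0
    for field_name, criteria in records.items():
        for label, _condition, points in criteria:
            if (field_name, label) in failed_items:
                total += points
    return total


# Example: item a) of association analysis and item b) of response alert fail.
failed = {("association analysis", "a"), ("response alert", "b")}
deducted = total_deduction(STRUCTURED_STANDARD, failed)
```

Keeping the deduction as a number next to each item avoids re-parsing the scoring text each time a score is computed.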

S5: Using the manually classified structured standard data obtained in step S4, train a binary classifier with a Bayesian classification method. One class is the non-network security type and the other is the automatically testable network security type. The input to the Bayesian classifier is the text contained in the field names and field contents of the structured standard data, and its prediction is either the non-network security type or the automatically testable network security type.
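The classification in step S5 can be illustrated with a small multinomial naive Bayes classifier over the words of the field name and field content. This is a minimal sketch under stated assumptions: the source does not specify the Bayesian variant, the tokenizer, or the smoothing, and the training records and label strings below are invented toy data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with Laplace smoothing (assumed variant)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_docs = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, records, labels):
        # Each record is a (field name, field content) pair.
        for (name, content), label in zip(records, labels):
            tokens = tokenize(name + " " + content)
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, name, content):
        tokens = tokenize(name + " " + content)
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-labeled training data (hypothetical).
AUTO, NON = "auto_testable_network_security", "non_network_security"
records = [
    ("anomaly analysis", "alarm is triggered when event frequency reaches a threshold"),
    ("association analysis", "associate network events by source IP address and protocol"),
    ("functional specification", "the developer should provide a complete functional specification document"),
    ("delivery procedure", "the developer should document the delivery procedure of the product"),
]
clf = NaiveBayesClassifier().fit(records, [AUTO, AUTO, NON, NON])
```

With real data the same interface could be retrained whenever reviewers relabel fields, which is what the later steps rely on.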

S6: For the industry standard texts needing to be updated that were obtained in step S3, evaluate the differences between the new and old standard texts and the differences between the new and old scoring standard texts, mine the mapping relation between the two kinds of difference, and model that mapping. The differences cover two aspects: 1. the degree of change of the standard text; 2. the degree of change of the scoring standard over time. In particular, the difference features of the standard text can be subdivided into three types: numerical values, degree adverbs, and quantities. For example:

The numerical description in the standard text changes: old standard text: the security functions describing the product should cover 80%; new standard text: the security functions describing the product should cover 90%. The difference is a change in the numerical feature from "80%" to "90%".

The degree adverb in the standard text changes: old standard text: partially describe the security functions of the product; new standard text: fully describe the security functions of the product. The difference is a change in the degree adverb feature from "partially" to "fully".

The quantity in the standard text changes:

old standard text: the developer should provide a complete functional specification, which should satisfy the following requirements: a) fully describe the security functions of the product; b) describe the purpose and usage of all security function interfaces; c) identify and describe all parameters related to each security function interface; d) describe the security function implementation behaviors related to the security function interfaces; e) describe the direct error messages caused by the processing of security function implementation behaviors; f) describe the tracing of the security function requirements to the security function interfaces;

new standard text: the developer should provide a complete functional specification, which should satisfy the following requirements: a) fully describe the security functions of the product; b) describe the purpose and usage of all security function interfaces; c) identify and describe all parameters related to each security function interface; d) describe the security function implementation behaviors related to the security function interfaces; e) describe the direct error messages caused by the processing of security function implementation behaviors; f) describe the tracing of the security function requirements to the security function interfaces; g) describe all behaviors related to the security function interfaces during security function implementation.

The difference is that the number of requirements increases from "6" to "7".
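The three difference types illustrated above (numerical value, degree adverb, quantity) can be detected with simple pattern matching. The sketch below is illustrative only: the regular expressions, the small adverb dictionary `DEGREE_ADVERBS`, and the function names are assumptions, not the method's actual rules.

```python
import re

# Hypothetical dictionary of degree adverbs; a real one would be built from the standards.
DEGREE_ADVERBS = {"partially", "fully", "completely", "basically", "mostly"}


def numeric_values(text):
    """Return numbers, with an optional percent sign, in order of appearance."""
    return re.findall(r"\d+(?:\.\d+)?%?", text)


def degree_adverbs(text):
    """Return the dictionary adverbs found in the text, in order of appearance."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w in DEGREE_ADVERBS]


def item_count(text):
    """Count enumerated requirements written as 'a)', 'b)', and so on."""
    return len(re.findall(r"(?:^|\s)[a-z]\)", text))


def diff_features(old, new):
    """Report which of the three feature types differ between two texts."""
    changes = {}
    if numeric_values(old) != numeric_values(new):
        changes["numerical"] = (numeric_values(old), numeric_values(new))
    if degree_adverbs(old) != degree_adverbs(new):
        changes["degree_adverb"] = (degree_adverbs(old), degree_adverbs(new))
    if item_count(old) != item_count(new):
        changes["quantity"] = (item_count(old), item_count(new))
    return changes


numeric_change = diff_features(
    "the security functions describing the product should cover 80%",
    "the security functions describing the product should cover 90%",
)
```

Each key in the returned dictionary names one difference type, and its value holds the old and new feature values.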

S7: Using the standard text acquired in step S3, identify the updated industry standard content through a difference evaluation system, which includes a feature recognition model. Specifically:

The feature recognition model is essentially an entity recognition model, which recognizes entity words in a given text. For example, in "Tomorrow is the Mid-Autumn Festival", "Mid-Autumn Festival" is a festival entity. This step therefore uses an entity recognition model to identify "feature" entities in a given text. Because the invention concerns only industry standards and scoring standards in the field of information security, and such texts are highly regular, a rule-based named entity recognition method can achieve good results. First, the entities (features) in the training samples are extracted to build a feature word dictionary; then the sequences are labeled; finally, the entities are identified by regular-expression matching. Take the feature in "the security functions describing the product should cover 90%" as an example. The sentence is first segmented into words: [ the security functions describing the product should cover 90% ]. A part-of-speech tagging tool is then applied with the following rule: a word found in the feature word dictionary is tagged E, and every other word receives the part-of-speech tag given by the tool. The resulting sequence is [ VNDETNADVVE ]. For a new text, regular-expression matching against the sequence rules obtained in this way identifies its feature entities. The model is used to perform feature recognition on the industry standard text and the scoring standard text to be processed. Finally, a multilayer perceptron applies a nonlinear transformation to the LSTM-encoded sentences to achieve semantic understanding of the features.
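The dictionary tagging and regular-expression matching described for the rule-based recognizer can be sketched as follows. Everything specific here is an assumption made for illustration: the feature dictionary, the toy part-of-speech lookup `TOY_POS` (standing in for a real tagging tool), the single-letter tag set, and the rule "a verb immediately followed by a feature". The tag string it produces therefore differs from the sequence quoted in the text.

```python
import re

# Hypothetical feature word dictionary built from training samples.
FEATURE_DICTIONARY = {"80%", "90%", "partially", "fully"}

# Toy stand-in for a part-of-speech tagging tool (one letter per tag).
TOY_POS = {
    "the": "D", "security": "N", "functions": "N", "product": "N",
    "describing": "V", "should": "V", "cover": "V", "of": "P",
}


def tag_sequence(tokens):
    """Tag dictionary words as E and all other words with their toy POS tag."""
    return "".join(
        "E" if tok.lower() in FEATURE_DICTIONARY else TOY_POS.get(tok.lower(), "X")
        for tok in tokens
    )


def find_features(tokens, pattern=r"V(E)"):
    """Match a tag-sequence rule and return the tokens tagged as features.

    Every tag is one character, so a match position in the tag string is
    also the index of the corresponding token.
    """
    tags = tag_sequence(tokens)
    return [tokens[m.start(1)] for m in re.finditer(pattern, tags)]


tokens = "the security functions describing the product should cover 90%".split()
tags = tag_sequence(tokens)
features = find_features(tokens)
```

Using one character per tag keeps the regular expression simple, because no separate alignment between tags and words is needed.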
The labeled sequence, as the feature, and the label are input into an LSTM network for training. The output dimension is 50 and the activation function is tanh; a Dropout layer with a rate of 0.5 is then added, followed by a fully connected layer with softmax activation. The optimizer is Adam and the monitored metric is accuracy. Specifically, the feature recognition module identifies the numerical features in the given industry standard text "the security functions describing the product should cover 90%" and in the scoring standard text "if the described security functions of the product cover only 80%, deduct 1 point; if they cover only 70%, deduct 2 points; if they cover less than 70%, deduct 5 points (out of a full score of 5)". After the numerical features are identified, a text understanding module performs semantic understanding of the identified features, so that the mapping relation between the industry standard text and the scoring standard text can be modeled. For example, a change in a numerical feature of the industry standard is related to the degree of change in the scoring standard: when the numerical feature of the industry standard is adjusted from "80%" to "90%", the corresponding numerical feature of the scoring standard also changes from "80%" to "90%". The two features thus change in correspondence with each other.
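The correspondence in this example, where a numerical change in the industry standard is mirrored in the scoring standard, can be shown with a deliberately simple substitution. This sketch replaces the learned mapping with a direct lookup, which is an assumption made for illustration; the function name `propagate_numeric_change` and the restriction to percentage values are also hypothetical.

```python
import re

PERCENT = r"\d+(?:\.\d+)?%"


def propagate_numeric_change(old_standard, new_standard, scoring_text):
    """Apply each changed percentage in the standard to the scoring text.

    Percentages are paired by position in the old and new standard texts.
    The substitution is done in a single pass so that one replacement
    cannot be rewritten again by another.
    """
    old_values = re.findall(PERCENT, old_standard)
    new_values = re.findall(PERCENT, new_standard)
    mapping = {o: n for o, n in zip(old_values, new_values) if o != n}
    return re.sub(PERCENT, lambda m: mapping.get(m.group(), m.group()), scoring_text)


updated_scoring = propagate_numeric_change(
    "the security functions describing the product should cover 80%",
    "the security functions describing the product should cover 90%",
    "if the described security functions cover only 80%, deduct 1 point",
)
```

Only values ending in a percent sign are touched, so the deducted points in the scoring text are left unchanged.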

S8: Continuously update the standard text by repeating the method of steps S1-S7.

S9: Using the structured standard data of the updated standard text obtained in steps S1-S8, automatically evaluate the network environment currently to be tested through an automatic testing system to obtain an evaluation score p1; a reviewer manually evaluates the same network environment to obtain an evaluation score p2;

a software automatic test tool is preset in the automatic testing system, and this tool can automatically test and evaluate a network environment according to the industry standard field names in the structured standard data of the standard text, or the corresponding scoring standard content, to obtain an evaluation score;

the manual evaluation by the reviewer of the network environment currently to be tested comprises: pushing to the reviewer the structured standard data of the updated standard text obtained in steps S1-S8, together with the differences between the new and old standard texts, the differences between the new and old scoring standard texts, and the mined mapping relation between the two kinds of difference, so as to assist the reviewer in manually evaluating the network environment;

comparing whether p1 and p2 are consistent, and processing the updated standard text according to the comparison result;

if the results are inconsistent, it is assumed by default that a field of the updated standard text was updated incorrectly, causing the scoring difference, and the reviewer lists the incorrectly updated fields and the reasons for the errors. The structured text matching rules of the standard text are then updated according to the fields and reasons listed by the reviewer.

If the reviewer does not list any specific field update error, it is judged that the updated standard document contains a non-network security field, so that the score cannot be evaluated automatically by the automatic testing system. The industry standard field names and corresponding scoring standard contents that cause the scoring difference are acquired and marked as the non-network security type, and the binary classifier of step S5 is retrained.
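The handling of the comparison between p1 and p2 amounts to a three-way decision, sketched below. The names `ReviewOutcome` and `handle_comparison`, the action strings, and the `tolerance` parameter are assumptions introduced for this illustration; the source only states that the two scores are compared for consistency.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewOutcome:
    """Action to take after comparing the two scores, with the fields involved."""
    action: str
    fields: list = field(default_factory=list)


def handle_comparison(p1, p2, reported_errors, differing_fields, tolerance=0.0):
    """Decide how to process the updated standard text.

    reported_errors maps each field the reviewer lists as wrongly updated to
    the stated reason; differing_fields lists the fields that caused the
    scoring difference.
    """
    if abs(p1 - p2) <= tolerance:
        # Automatic and manual scores agree: nothing to correct.
        return ReviewOutcome("accept")
    if reported_errors:
        # Reviewer identified update errors: revise the text matching rules.
        return ReviewOutcome("update_matching_rules", sorted(reported_errors))
    # No specific error listed: treat the fields as non-network security type
    # and retrain the binary classifier of step S5.
    return ReviewOutcome("relabel_and_retrain", sorted(differing_fields))


outcome = handle_comparison(4.0, 3.0, {}, ["data deletion"])
```

Returning the affected field names with the action lets the caller feed them directly into the rule update or the relabeling step.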

Claims

1. A method for constructing an information security scoring system based on artificial intelligence, characterized by comprising the following steps:

crawling online standard texts related to information security industry standards;

2. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein crawling the online standard texts related to information security industry standards comprises the following steps:

3. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein determining the industry standard text needing to be updated, by applying a hierarchical similarity calculation method to the standard text suspected of being updated, comprises the following steps:

4. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein performing structured extraction on the industry standard text needing to be updated through a preset text matching rule to obtain structured standard data comprises: the structured standard data comprises industry standard field names and the scoring standard content corresponding to each industry standard field name, and the structured standard data is manually classified into two types, namely the non-network security type and the automatically testable network security type; the unified preset rule for the structured standard is expressed as: { [ industry standard field name ], [ corresponding scoring standard content ] };

5. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein mining the mapping relation between the two kinds of difference, by evaluating the differences between the new and old standard texts and the differences between the new and old scoring standard texts according to the industry standard text needing to be updated, comprises: the differences are subdivided into three aspects: changes in numerical values, degree adverbs, and quantities.

6. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein automatically evaluating, through the automatic testing system, the network environment currently to be tested according to the structured standard data of the industry standard text needing to be updated to obtain an evaluation score p1, and manually evaluating the network environment by a reviewer to obtain an evaluation score p2, comprises: a software automatic test tool is preset in the automatic testing system, and this tool can automatically test and evaluate a network environment according to the industry standard field names in the structured standard data of the standard text, or the corresponding scoring standard content, to obtain an evaluation score;

7. The method for constructing the information security scoring system based on artificial intelligence as claimed in claim 1, wherein comparing whether p1 and p2 are consistent, and processing the updated standard text according to the comparison result, comprises the following steps:

if the reviewer does not list any specific field update error, it is judged that the updated standard document contains a non-network security field, so that the score cannot be evaluated automatically by the automatic testing system; the industry standard field names and corresponding scoring standard contents that cause the scoring difference are acquired and marked as the non-network security type, and the binary classifier is retrained.