CN114879936A - Method and system for acquiring safety requirement facing natural language requirement - Google Patents

Method and system for acquiring safety requirement facing natural language requirement

Info

Publication number
CN114879936A
Authority
CN
China
Prior art keywords
requirement
security
entity
safety
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210427177.4A
Other languages
Chinese (zh)
Inventor
沈国华
李广龙
黄志球
李锐
蔡茂东
杨思恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202210427177.4A priority Critical patent/CN114879936A/en
Publication of CN114879936A publication Critical patent/CN114879936A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/10 Requirements analysis; Specification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G06F21/54 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow by adding security routines or objects to programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a system for acquiring security requirements from natural language requirements, and belongs to the technical field of security requirements engineering. First, the entities and action relations contained in each requirement statement of a requirement text are extracted using natural language processing. Next, a deep-learning-based multi-label classification model predicts the set of security objectives for each requirement statement. Then, according to the entities, the action relations, and the security objective set, security requirement templates constructed from the CC standard are matched through matching conditions: one security objective is mapped to a plurality of security requirement templates, and whether the matching conditions of a template are satisfied is judged from the entities and action relations extracted from the requirement statement. Finally, the parameters of each matched security requirement template are filled with the entities, actions, or constructed phrases, thereby instantiating the template. The invention achieves reuse of security knowledge and effectively increases the degree of automation of security requirement acquisition.

Description

Method and system for acquiring safety requirement facing natural language requirement
Technical Field
The invention belongs to the technical field of security requirements engineering, and particularly relates to a method for automatically acquiring security requirements from English natural language requirements by combining natural language processing and deep learning.
Background
Security Requirements Engineering (SRE) comprises the processes and methods for acquiring the security requirements of a software system, and focuses on security analysis early in the requirements phase. The typical security requirement acquisition methods are as follows:
1) Use-case-based methods (e.g., misuse cases): the software system and its security-related elements are modeled by extending use cases, from which security requirements are derived. For example, misuse cases acquire security requirements by describing behaviors that an entity or the system should not exhibit.
2) Common Criteria (CC)-based methods: the CC standard, whose full name is the Common Criteria for Information Technology Security Evaluation, is widely applied in the analysis and acquisition stages of security requirements.
3) Threat/risk-oriented methods: security requirements are obtained by analyzing the threats, risks, or attacks that the software system may face and proposing corresponding mitigation measures.
4) Goal-oriented methods: the security objectives of the software system are analyzed to determine the measures the system must take, which then form the security requirements.
The CC standard is also known as ISO/IEC 15408, and its latest version at present is V3.1. It consists of three parts: Introduction and General Model, Security Functional Components, and Security Assurance Components. Part 2 of the CC is a catalog of security functional components that can be used to specify the security functional requirements of a Target of Evaluation (TOE). The CC defines 11 security classes; each security class contains several security families, each security family contains several security components, and each security component contains several security elements. Each security element is self-contained and becomes a security requirement once instantiated.
Traditional security requirement acquisition methods depend heavily on expert knowledge of the security domain and must be carried out manually; in particular, large volumes of natural language requirement text must be processed by hand. Methods and techniques such as Security Use Cases, Abuse Cases, Misuse Cases, fault trees, and attack trees require the engineer to have expert knowledge of software security. Most methods also involve extensive manual processing of natural language requirements during security analysis, in the form of requirement specification or software modeling. For example, Microsoft's threat modeling tool first requires a data flow diagram of the software, and an attack tree first requires the attack paths to be modeled.
Disclosure of Invention
Object of the invention: in view of the above shortcomings of the prior art, the invention aims to provide a method and a system for acquiring security requirements from natural language requirements, which reduce the dependence of security requirement acquisition on the knowledge of security experts and increase its degree of automation.
Technical solution: to achieve the above object, the invention adopts the following technical solution:
A security requirement acquisition method for natural language requirements comprises the following steps:
extracting the entities and action relations contained in each requirement statement of the requirement text based on natural language processing;
predicting the security objective set of each requirement statement using a deep-learning-based multi-label classification model;
matching, according to the entities, the action relations, and the security objective set, security requirement templates constructed from the CC standard through matching conditions, wherein one security objective is mapped to a plurality of security requirement templates, and whether a matching condition of a security requirement template is satisfied is judged from the entities and action relations extracted from the requirement statement;
and filling the parameters of the matched security requirement template with the entities, actions, or constructed phrases, thereby instantiating the security requirement template.
Preferably, extracting the entities and the action relationships based on natural language processing includes:
preprocessing the requirement text using Stanford CoreNLP, including tokenization, sentence splitting, lemmatization, and part-of-speech tagging;
extracting entities and entity relations using the dependency-type-based heuristic rules constructed by iMER;
extracting entity relations using the openIE annotator of Stanford CoreNLP;
and taking the union of all extraction results, removing stop words, removing relations that contain no verb, and applying the longest-match principle to select the entity with the longest word string and the relation with the longest word string.
Preferably, the entities/relations are selected by the longest-match principle as follows:
establishing an empty set for storing the longest entities/entity relation triples, denoted MaxEntities/MaxER;
comparing each extracted entity/entity relation triple with all entities/entity relation triples in MaxEntities/MaxER: if nesting exists, only the entity/entity relation triple with the longest word string is kept in MaxEntities/MaxER; if no nesting is found, the entity/entity relation triple is added to MaxEntities/MaxER.
Preferably, the multi-label classification model is a BERT-TextCNN model, and the classification labels include "Confidentiality", "Integrity", "Availability", "Identification & Authentication", "Privacy", and "Accountability".
Preferably, each security requirement template is a description of a target security function TSF, which can satisfy one or more security targets; each TSF includes a function name, a function introduction, and one or more security scopes; each security scope includes a set of matching criteria, a security requirement template, and one or more security objectives that the template satisfies.
Preferably, the <subjects> and <actions> in the matching conditions of a security requirement template are the <subjects> and <actions> extracted from the requirement statement; <users> are drawn from the entity set; and <information> is the set of entities that remains after the <system>, <resources>, and <users> entries appearing in <objects> are filtered out.
Preferably, whether the matching conditions of a security requirement template are satisfied is judged as follows: the group of matching conditions corresponding to the template is satisfied if every matching condition in the group is satisfied, and is not satisfied otherwise. Each individual matching condition is judged as follows:
if the matching condition is a single basic parameter type, it is satisfied if information of that parameter type exists in the input requirement statement, and is not satisfied otherwise;
if the matching condition is an assignment statement of a basic parameter, it is satisfied as long as at least one word on the right side of the equals sign exists in the input requirement statement, and is not satisfied if none of the words exist;
if the matching condition joins several basic parameters with "and" (logical AND), it is satisfied if words of all of these parameter types exist in the extracted information, and is not satisfied otherwise;
if the matching condition joins several basic parameters with "or" (logical OR), it is satisfied as long as a word of at least one of the parameter types exists in the extracted information, and is not satisfied if no word of any of the parameter types exists.
Preferably, when a security requirement template is instantiated, the parameters to be instantiated are located in the template with a regular expression and assigned values, and parameters that receive no value are left in their original form. In addition to the parameters that appear in the matching conditions, the parameters that can be filled automatically include <events>, which is assigned a subject-verb-object phrase composed of the subject of the action, the action, and the object of the action.
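The instantiation step described above can be illustrated with a minimal Python sketch. The regular expression, the function names `instantiate` and `build_event`, and the sample template text are all hypothetical and are not taken from the embodiment; the sketch only shows how angle-bracket parameters can be located with a regular expression, filled when a value is available, and otherwise left in their original form.

```python
import re

# Hypothetical pattern: a parameter is a name inside angle brackets,
# optionally padded with spaces, e.g. "<users>" or "< key words >".
PARAM = re.compile(r"<\s*([A-Za-z ]+?)\s*>")


def build_event(subject, action, obj):
    """Build the subject-verb-object phrase assigned to <events>."""
    return f"{subject} {action} {obj}"


def instantiate(template, values):
    """Fill every parameter that has a value; keep the others unchanged."""
    def fill(match):
        name = match.group(1)
        # match.group(0) is the original "<name>" text, kept when no value exists.
        return values.get(name, match.group(0))
    return PARAM.sub(fill, template)


# Illustrative template text only; real templates come from the CC-based catalog.
template = "The TSF shall record <events> performed by <users> with <mechanism>."
values = {
    "events": build_event("doctor", "evaluate", "results of tests"),
    "users": "doctor",
}
result = instantiate(template, values)
```

In this sketch `<mechanism>` receives no value and therefore remains in the output for a requirements engineer to complete by hand.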
A security requirement acquisition system for natural language requirements comprises:
an entity and action relation extraction module, configured to extract the entities and action relations contained in each requirement statement of the requirement text based on natural language processing;
a security objective classification module, configured to predict the security objective set of each requirement statement using a deep-learning-based multi-label classification model;
a security requirement template matching module, configured to match, according to the entities, the action relations, and the security objective set, security requirement templates constructed from the CC standard through matching conditions, wherein one security objective is mapped to a plurality of security requirement templates, and whether a matching condition of a security requirement template is satisfied is judged from the entities and action relations extracted from the requirement statement;
and a security requirement template instantiation module, configured to fill the parameters of the matched security requirement template with the entities, actions, or constructed phrases, thereby instantiating the security requirement template.
A computer system comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the above security requirement acquisition method for natural language requirements.
Advantageous effects: compared with the prior art, the invention establishes a set of parameterized security requirement templates based on the CC standard, thereby achieving reuse of security knowledge; it automatically matches and instantiates the security requirement templates using natural language processing and deep learning, effectively increasing the degree of automation of security requirement acquisition.
Drawings
FIG. 1 is an overall flow diagram of an embodiment of the present invention.
FIG. 2 is a graphical output example diagram of dependency syntax parsing in an embodiment of the present invention.
FIG. 3 is an example of the graphical output of open information extraction in an embodiment of the present invention.
Detailed Description
In order to facilitate understanding of those skilled in the art, the present invention will be further described with reference to the following embodiments and accompanying drawings. Before describing the specific flow of the embodiment of the present invention in detail, the natural language processing technique and the deep learning technique involved will be described first.
Natural Language Processing (NLP) can extract information such as entities and the relations between them from natural language text and give it structure. NLP can therefore be used to process requirement texts written in natural language, and the extracted requirement elements can be used in the security requirements process. The embodiment of the invention mainly involves two NLP techniques: dependency parsing and open-domain information extraction (Open IE). Dependency parsing identifies the dependencies between the words of a sentence. Its output can be represented as a directed acyclic graph over the words, with each edge labeled by a dependency type. An example of a dependency is nsubj(evaluate{4}, doctor{2}), meaning that "doctor" is the subject of the verb "evaluate"; the number inside the braces "{}" is the position of the word in the sentence. Open-domain information extraction extracts open-domain relation triples from a sentence, each consisting of the subject of the relation, the relation, and the object of the relation. A triple is composed of noun phrases and a verb phrase expressing the semantic relation between them, and is usually written in the form <entity 1; relation; entity 2>.
In the embodiment of the invention, the input of the multi-label text classification task is an English natural language requirement, and the output is a plurality of security target label categories of the requirement.
A security objective is a security property that the system is expected to have. By identifying the security objectives expressed or implied by a particular sentence in a document, the intent of the sentence can be understood and the security functions and requirements needed to fulfil that intent can be established. The embodiment of the invention uses six security objectives, which mainly concern the implementation technology of a software system and are defined as follows:
1) Confidentiality (C): the degree to which data is disclosed only as intended.
2) Integrity (I): the degree to which a system or component prevents improper modification or corruption of computer programs or data.
3) Identification and Authentication (IA): the need to establish that a claimed identity is valid for a user, process, or device.
4) Availability (A): the degree to which a system or component is operational and accessible when required for use.
5) Accountability (AY): the degree to which actions affecting software assets can be traced to the actor responsible for them.
6) Privacy (P): the degree to which actors understand and control how their information is used.
As shown in FIG. 1, the method for acquiring security requirements from natural language requirements disclosed in an embodiment of the present invention comprises four main steps: first, the entities and action relations contained in each requirement statement are extracted based on natural language processing; second, the security objective set of each requirement statement is predicted using a deep-learning-based multi-label classification model; third, according to the outputs of the first two steps, security requirement templates constructed from the CC (Common Criteria) standard are matched through matching conditions; finally, the parameters of the matched security requirement templates are filled.
1. Entity and action extraction based on natural language processing
The extraction of entities from the requirement text and the actions between entities can be used not only to match the conditions of the security requirement template in step 3 of the method of the present embodiment, but also to instantiate the security requirement template in step 4. This embodiment extracts entities and Entity relationships based on iMER (Javed M, Lin Y. iMER: Interactive Process of Entity Relationship and Business Process Model Extraction from the Requirements [ J ]. Information and Software Technology,2021,135 (6): 106558) and the Natural language processing tool Stanford CoreNLP, and further screens out desired entities and actions between entities.
The requirement text is first preprocessed with Stanford CoreNLP, including tokenization, sentence splitting, lemmatization, and part-of-speech tagging. The following requirement text R1 serves as the running example throughout the description of the method. FIG. 2 is a graphical representation of the dependencies obtained by processing R1 with Stanford CoreNLP.
R1:"The doctor shall evaluate the results of tests taken over by the patient in the hospital."
Second, entities and entity relations are extracted using the dependency-type-based heuristic rules constructed by iMER. For the requirement text R1, the entity set extracted by iMER is ['tests', 'results', 'doctor', 'hospital', 'patient'], and the entity relation set is [{'subject': 'doctor', 'relationship': 'evaluate', 'object': 'results'}].
Then, entity relations are extracted using the openIE annotator of Stanford CoreNLP. FIG. 3 is a graphical representation of the entity relations obtained by processing R1 with Stanford CoreNLP. The extracted entity set is ['doctor', 'results of tests', 'results'], and the extracted entity relations include {'subject': 'doctor', 'relationship': 'shall evaluate', 'object': 'results'}, {'subject': 'doctor', 'relationship': 'shall evaluate', 'object': 'results of tests'}, and {'subject': 'patient', 'relationship': 'is in', 'object': 'hospital'}.
Finally, the union of all extraction results is taken so that as many potential entities and entity relations as possible are captured. After the union is taken, stop words such as "shall", "should", and "the" are removed. Note that the word string of one extracted entity may be a substring of the word string of another, i.e., entities may be nested: for example, the entity "drugs" is a substring of the entity "list of drugs", and the entity "results" is a substring of the entity "results of tests". To avoid repetition, the embodiment applies the longest-match principle and selects the entity with the longest word string, i.e., "list of drugs" and "results of tests". The algorithm for selecting the longest entity when entities are nested is given as Algorithm 1:
algorithm 1. SelectMaximizeEntites
Input:Entities
Output:MaxEntities
Figure BDA0003610087020000071
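The body of Algorithm 1 is available only as an image, so the following Python sketch is a reconstruction from the textual description of the longest-match principle, not the original listing. It assumes that nesting is tested on whole words (a contiguous word sequence of one entity occurring inside another); the function names `contains_phrase` and `select_max_entities` are hypothetical.

```python
def contains_phrase(longer, shorter):
    """True if the words of `shorter` occur contiguously inside `longer`."""
    lw, sw = longer.split(), shorter.split()
    return any(lw[i:i + len(sw)] == sw for i in range(len(lw) - len(sw) + 1))


def select_max_entities(entities):
    """Keep only the entities that are not nested inside a longer entity."""
    max_entities = []
    for entity in entities:
        # Skip the candidate if an already kept entity covers it.
        if any(contains_phrase(kept, entity) for kept in max_entities):
            continue
        # Drop kept entities that the candidate covers, then store the candidate.
        max_entities = [k for k in max_entities if not contains_phrase(entity, k)]
        max_entities.append(entity)
    return max_entities


# Union of the entities extracted from R1 by the two extractors.
entities = ["tests", "results", "doctor", "hospital", "patient", "results of tests"]
max_entities = select_max_entities(entities)
```

For the R1 entities, "tests" and "results" are absorbed by "results of tests", leaving the four entities reported for the first step.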
Because the subsequent template matching step needs the actions between entities, an entity relation that contains no verb is discarded, for example the relation {'subject': 'patient', 'relationship': 'in', 'object': 'hospital'} in FIG. 3. In addition, when the action relations between entities are screened, two relations may be nested: the 'relationship' of one relation is part of the 'relationship' of the other, or an entity of one relation is part of an entity of the other. For example, the 'object' of the entity relation triple {'subject': 'doctor', 'relationship': 'shall evaluate', 'object': 'results'} is a substring of the 'object' of the entity relation triple {'subject': 'doctor', 'relationship': 'shall evaluate', 'object': 'results of tests'}, i.e., 'results' is a substring of 'results of tests'. To avoid repetition, the embodiment again applies the longest-match principle and selects the relation with the longest word string.
After the first step, the entity set extracted from the requirement text R1 is ['results of tests', 'doctor', 'hospital', 'patient']; this entity set and the extracted action set are also the inputs to the third step of the method of this embodiment.
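The screening of action relations described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the embodiment's implementation: part-of-speech tags are assumed to be available as a word-to-tag dictionary using Penn Treebank tags (verbs start with "VB"), one triple counts as nested in another when each of its three fields is a contiguous word sequence of the corresponding field of the other, and the names `is_part` and `filter_actions` are hypothetical.

```python
FIELDS = ("subject", "relationship", "object")


def is_part(short, long):
    """True if the words of `short` occur contiguously inside `long`."""
    s, l = short.split(), long.split()
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def filter_actions(triples, pos_tags):
    """Drop relations without a verb, then drop relations nested in a longer one."""
    with_verb = [
        t for t in triples
        if any(pos_tags.get(w, "").startswith("VB") for w in t["relationship"].split())
    ]
    kept = []
    for t in with_verb:
        nested = any(
            o != t and all(is_part(t[f], o[f]) for f in FIELDS)
            for o in with_verb
        )
        if not nested and t not in kept:
            kept.append(t)
    return kept


triples = [
    {"subject": "doctor", "relationship": "shall evaluate", "object": "results"},
    {"subject": "doctor", "relationship": "shall evaluate", "object": "results of tests"},
    {"subject": "patient", "relationship": "in", "object": "hospital"},
]
# Assumed tags for the relation words of R1.
pos_tags = {"shall": "MD", "evaluate": "VB", "in": "IN"}
actions = filter_actions(triples, pos_tags)
```

With these inputs, the verbless relation is discarded and only the longer of the two nested "shall evaluate" triples survives.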
2. Deep-learning-based multi-label classification of security objectives
In this step, a deep-learning-based multi-label classification model for security objectives is used to obtain the security objective set of each requirement statement, which may contain zero, one, or several security objectives. The embodiment of the invention uses six typical security objectives: "Confidentiality", "Integrity", "Availability", "Identification & Authentication", "Privacy", and "Accountability". The most common of these are Confidentiality, Integrity, and Availability, which are also known as the "CIA triad".
To obtain good multi-label classification performance for security objectives, we trained a classification model, "BERT-TextCNN": the model uses the pre-trained language model BERT to obtain a word vector for each English word in the requirement statement, and then uses TextCNN as the downstream model to output the probability of each security objective label.
The security objectives predicted by BERT-TextCNN for the requirement text R1 are "Confidentiality" and "Integrity", which matches our expectations: since the doctor is to evaluate the examination results of the patient, the requirement concerns the confidentiality of the patient data as well as the integrity of the examination results.
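The step from per-label probabilities to a security objective set can be sketched as follows. The sketch covers only the final decision, not the BERT-TextCNN network itself, and rests on assumptions that the source does not state: the model is assumed to emit one logit per label, each logit is passed through a sigmoid, and a label is accepted when its probability reaches a threshold of 0.5. The function name `predict_objectives` and the example logits are hypothetical.

```python
import math

LABELS = [
    "Confidentiality",
    "Integrity",
    "Availability",
    "Identification & Authentication",
    "Privacy",
    "Accountability",
]


def predict_objectives(logits, threshold=0.5):
    """Turn one logit per label into a (possibly empty) set of labels."""
    # Each label is decided independently, which is what makes the task multi-label.
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return {label for label, p in zip(LABELS, probs) if p >= threshold}


# Made-up logits chosen so that the result mirrors the prediction reported for R1.
objectives = predict_objectives([2.0, 1.0, -3.0, -2.0, -1.0, -4.0])
```

Because every label has its own threshold test, a statement with no security relevance simply yields an empty set.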
3. Matching of security requirement templates
After the entities of a requirement statement, the actions between the entities, and the security objective set have been obtained, this information is used to match security requirement templates. We therefore first introduce the security requirement templates we have built.
3.1 Security requirement templates
The security requirement templates established in the embodiment of the invention are based on the security functional components in Part 2 of the CC standard. Because the standard is not easy for a requirements engineer without a security background to interpret and apply, we simplified part of the security functional components and finally formed 36 Target Security Functions (TSFs) and 40 security requirement templates. Each TSF corresponds to one security family of the CC security functional components, so the embodiment keeps the name of the CC security family as the unique identifier of the corresponding TSF. Each security requirement template is obtained by screening, simplifying, and combining the security elements under that security family. We chose the security family as the counterpart of the TSF because the security family is the second-level directory, i.e., the middle level, of the security functional components: it is finer-grained than the security class and generalizes better than the security component.
3.1.1 Organizational structure of security requirement templates
Each Security requirement template is a description of a Target Security Function (TSF), i.e., a Security Function that the system should implement, and at the same time, each template may also satisfy a specific Security Target. Each TSF contains a name for a function, a description of the function, and one or more Security scopes (Security scopes). Each security scope includes a set of matching criteria, a security requirement template, and one or more security objectives that the template satisfies. The security scope is defined by the subject, object and action targeted by the TSF, which is represented by the matching condition of the security requirement template it contains. The same TSF may have multiple security scopes, i.e., multiple security requirement templates, due to different choices (different subjects, objects, or actions) in practical applications. We use the matching condition of the security requirement template to represent the security scope, i.e. the matching condition is used to determine whether the current security requirement template is suitable for the input requirement text.
The following is a complete example of a TSF that provides data validation capabilities to determine the validity of data:
Figure BDA0003610087020000091
Figure BDA0003610087020000101
We also present a mapping from security objectives to security requirement templates, as shown in Table 1, where column 1 is the Security Objective (SO). Because security scopes are not explicitly classified and the security requirement templates are organized by security function, we establish the mapping between security objectives and TSFs. A given security objective can thus be mapped to specific TSFs; the security requirement templates are then obtained by traversing the security scopes under each TSF; and finally the templates applicable to the original requirement are screened out through the matching conditions.
TABLE 1 Mapping between security objectives and TSFs
Figure BDA0003610087020000102
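Table 1 is available only as an image, so the following Python sketch uses a purely illustrative mapping to show how candidate templates could be collected from a predicted security objective set. The class names `TSF` and `SecurityScope`, the function `candidate_templates`, and the identifiers "TSF_X" and "TSF_Y" are hypothetical placeholders; they are not entries of the original table or CC family names.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityScope:
    conditions: list   # group of matching conditions (all must hold)
    template: str      # security requirement template text
    objectives: set    # security objectives the template satisfies


@dataclass
class TSF:
    name: str
    description: str
    scopes: list = field(default_factory=list)


def candidate_templates(predicted, objective_to_tsf, tsfs):
    """Collect (TSF name, scope) pairs reachable from the predicted objectives."""
    seen, result = set(), []
    for objective in sorted(predicted):
        for tsf_name in objective_to_tsf.get(objective, []):
            # Traverse every security scope under the mapped TSF.
            for scope in tsfs[tsf_name].scopes:
                key = (tsf_name, scope.template)
                if key not in seen:  # a TSF may be reached from several objectives
                    seen.add(key)
                    result.append((tsf_name, scope))
    return result


tsfs = {
    "TSF_X": TSF("TSF_X", "placeholder function", [
        SecurityScope([], "template 1", {"Confidentiality"}),
        SecurityScope([], "template 2", {"Integrity"}),
    ]),
    "TSF_Y": TSF("TSF_Y", "placeholder function", [
        SecurityScope([], "template 3", {"Integrity"}),
    ]),
}
objective_to_tsf = {"Confidentiality": ["TSF_X"], "Integrity": ["TSF_X", "TSF_Y"]}
candidates = candidate_templates({"Confidentiality", "Integrity"}, objective_to_tsf, tsfs)
```

The candidates returned here are only a superset; each one still has to pass its own group of matching conditions before it is instantiated.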
3.1.2 Matching conditions of security requirement templates
The set of matching conditions may include a plurality of individual matching conditions, and the set of matching conditions is satisfied when each condition is satisfied, that is, the matching of the current requirement template is successful. All matching conditions have eight Basic Parameters (Basic Parameters) and four grammatical forms.
1) Eight basic parameters: all matching conditions share the following eight basic parameters: the method comprises the following steps of < key words >, < subjects >, < objects >, < actions >, < information >, < users >, < system >, < resources >, wherein each basic parameter represents a type of words with specific semantics, and the words need to be extracted from an original requirement statement and then are used for filling a safety requirement template. Among the eight basic parameters, < keyword words > refers to the keyword that should be in the original requirement sentence, and is only applied to the following second syntax form; the objects > and the actions > are relative to one action, the action is a subject initiated by the action and an object carrying the action; information > refers to information for transmission, processing, storage in the system; < users > refer to users interacting with the system; < system > refers to the system as a whole; < resources > refers to "resource", "service".
2) Four grammatical forms: every matching condition takes one of the following four grammatical forms, where "and" denotes logical AND, "or" denotes logical OR, "=" denotes assignment, <A> and <B> each denote one of the eight basic parameters, and W denotes a set of words:
a. <A>: the condition is a single basic parameter; it is satisfied if information of that parameter type exists in the original requirement statement;
b. <A> = W: the condition is an assignment to a basic parameter; it is satisfied if a word on the right-hand side of the equals sign exists in the original requirement statement. The parameters <keywords>, <system>, and <resources> can be used in this form; the assignment of <keywords> varies from template to template, whereas <system> and <resources> have fixed assignments;
c. <A> and <B>: the condition joins several basic parameters with "and"; it is satisfied if the extracted information contains all of these parameter types at the same time;
d. <A> or <B>: the condition joins several basic parameters with "or"; it is satisfied if the extracted information contains at least one of these parameter types.
3.2 Matching of security requirement templates
According to the mapping between security targets and TSFs, a set of security requirement templates can be obtained from the security target set produced in the second step. Whether each of these templates actually applies to the input requirement statement must then be judged against that template's matching conditions.
To determine whether a matching condition is satisfied, the word instances of the eight basic parameters must first be identified in the input requirement text.
As can be seen from the foregoing description of the eight basic parameters, only five of them need to be identified: <subjects>, <objects>, <actions>, <information>, and <users>. The entities and the actions between entities extracted in the first step already provide <subjects>, <objects>, and <actions>. For <users>, the present embodiment applies two extraction rules within the entity set:
1) identification by the English suffixes "-er" and "-or";
2) a list of keywords for user classes, such as job titles or roles specific to a domain (for example, "patient" in the medical field).
For <information>, this embodiment observes that such words usually appear in the <objects> class, and that <users>, <system>, and <resources> also often appear in <objects>. Therefore, after the <system>, <resources>, and <users> instances are filtered out of <objects>, the remaining entities are taken to belong to <information>.
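The two <users> rules and the <information> filter can be sketched as follows. This is only an illustration under stated assumptions: the keyword list, the fixed word sets for <system> and <resources>, and the choice to test the last word of a multi-word entity are all hypothetical, and the function names are introduced here rather than taken from the original.

```python
from typing import Iterable, List

USER_SUFFIXES = ("er", "or")              # rule 1: English suffixes "-er", "-or"
USER_KEYWORDS = {"patient"}               # rule 2: hypothetical domain keyword list
SYSTEM_WORDS = {"system"}                 # assumed fixed assignment of <system>
RESOURCE_WORDS = {"resource", "service"}  # <resources> words named in the text


def is_user(entity: str) -> bool:
    """Apply the suffix rule and the keyword rule to one entity.

    Only the last word of a multi-word entity is tested, which is a
    simplification made for this sketch.
    """
    words = entity.lower().split()
    if not words:
        return False
    last = words[-1]
    return last in USER_KEYWORDS or last.endswith(USER_SUFFIXES)


def extract_users(entities: Iterable[str]) -> List[str]:
    """Select the entities that look like users, preserving input order."""
    return [e for e in entities if is_user(e)]


def extract_information(objects: Iterable[str], users: Iterable[str]) -> List[str]:
    """Treat as <information> whatever remains in <objects> after filtering
    out <users>, <system>, and <resources> instances."""
    excluded = {u.lower() for u in users} | SYSTEM_WORDS | RESOURCE_WORDS
    return [o for o in objects if o.lower() not in excluded]


entities = ["doctor", "results of tests", "patient"]
users = extract_users(entities)                               # doctor, patient
information = extract_information(["results of tests"], users)
```

The suffix rule is deliberately coarse: words such as "server" or "error" would also pass it, so a practical implementation would likely need an exclusion list or a part-of-speech check.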
For the requirement text R1, the present embodiment extracts the following parameters and their word instances from the output of the first step:
1) <subjects>: "doctor";
2) <objects>: "results of tests";
3) <actions>: "evaluate";
4) <information>: "results of tests";
5) <users>: "patient", "doctor".
For the requirement text R1, the security target "Confidentiality" obtained in the second step is mapped to TSF.5 Access control policy, which yields the following security requirement template and matching conditions for "Confidentiality":
Matching conditions: <subjects> and <actions> and <objects>
Security requirement: The TSF shall enforce the [assignment: <access control SFP>] on [assignment: <events>].
Because word instances of <subjects>, <actions>, and <objects> were all extracted from R1, the matching condition holds. The security requirement template is therefore matched successfully and applies to the requirement text R1; this template (hereinafter referred to as SRQ1) is completed in the next step of the method.
The algorithm for determining whether the matching conditions hold is shown in Algorithm 2. Its input "BasicParameters" refers to the eight basic parameters and the word instances identified for them in the input requirement text.
Algorithm 2. JudgeMatchingConditions
Input: MatchingConditions, BasicParameters, ReqSentence
Return: isMatching
Figure BDA0003610087020000121
Figure BDA0003610087020000131
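Since the listing of Algorithm 2 is only available as an image, the following is a sketch reconstructed from the prose description of the four grammatical forms, not the original listing. The textual encoding of conditions, the comma-separated word set in the assignment form, the rejection of conditions that mix "and" with "or", and the helper names are assumptions made for illustration.

```python
import re
from typing import Dict, List

PARAM = re.compile(r"<\s*([^<>]+?)\s*>")       # captures the name inside <...>
CONNECTIVE = re.compile(r">\s*(and|or)\s*<")   # connective between two parameters


def judge_condition(condition: str,
                    basic_parameters: Dict[str, List[str]],
                    req_sentence: str) -> bool:
    """Evaluate one matching condition in one of the four grammatical forms."""
    if "=" in condition:
        # Form b: <A> = W. Holds if any word of W occurs in the sentence.
        _, rhs = condition.split("=", 1)
        words = [w.strip().strip("\"'").lower() for w in rhs.split(",")]
        sentence = req_sentence.lower()
        return any(
            re.search(r"\b" + re.escape(w) + r"\b", sentence) is not None
            for w in words if w
        )
    params = [p.strip() for p in PARAM.findall(condition)]
    if not params:
        raise ValueError(f"no basic parameter found in condition: {condition!r}")
    connectives = set(CONNECTIVE.findall(condition))
    if len(connectives) > 1:
        # Mixed and/or is not described in the text, so it is rejected here.
        raise ValueError(f"mixed connectives in condition: {condition!r}")
    present = [bool(basic_parameters.get(p)) for p in params]
    # Form d uses "or"; forms a and c require every listed parameter.
    return any(present) if connectives == {"or"} else all(present)


def judge_matching_conditions(matching_conditions: List[str],
                              basic_parameters: Dict[str, List[str]],
                              req_sentence: str) -> bool:
    """A set of matching conditions holds only if every condition holds."""
    return all(
        judge_condition(c, basic_parameters, req_sentence)
        for c in matching_conditions
    )


# Parameters extracted for R1; the sentence is a hypothetical stand-in for R1.
r1_parameters = {
    "subjects": ["doctor"],
    "objects": ["results of tests"],
    "actions": ["evaluate"],
    "information": ["results of tests"],
    "users": ["patient", "doctor"],
}
r1_sentence = "The doctor shall evaluate the results of tests of the patient."
is_matching = judge_matching_conditions(
    ["<subjects> and <actions> and <objects>"], r1_parameters, r1_sentence
)
```

With the R1 parameters, the single condition of the access control template evaluates to true, which agrees with the outcome stated for SRQ1.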
4. Instantiation of security requirement templates
The parameters to be instantiated in a security requirement template, namely the parameters in angle brackets inside [assignment: ...], are located with a set of string regular expressions. Parameters that can be assigned are assigned in the form <A> = "string value", and parameters that cannot be assigned are kept as they are. In practice, requirements engineers often lack the relevant security background and may find it difficult to fill in parameters that concern a particular security technology, such as the encryption standard <encryption standard> or the access control policy <access control policy>. These parameter names carry real semantics, however, and leaving them unfilled does not affect how software designers and developers understand the security requirement as a whole, so this embodiment keeps them as they are. Nearly all automatically fillable parameters are basic parameters of the matching conditions. The one exception is <events>, which refers to an event; an instance of <events> is a short subject-predicate-object phrase composed of an action subject, an action, and an action object.
The filled-in security requirements template SRQ1 is as follows:
The TSF shall enforce the [assignment: <access control policy>] on [assignment: <events> = "doctor evaluate results of tests"].
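The instantiation step can be sketched with a single regular expression substitution. This is a minimal illustration, assuming that each placeholder has the shape [assignment: <name>] and using the template text given with the matching conditions for R1; the names `ASSIGNMENT`, `build_event`, and `instantiate` are hypothetical.

```python
import re
from typing import Dict

# Matches "[assignment: <name>]" and captures the parameter name.
ASSIGNMENT = re.compile(r"\[assignment:\s*<([^<>]+)>\s*\]")


def build_event(subject: str, action: str, obj: str) -> str:
    """Compose the short subject-predicate-object phrase used for <events>."""
    return f"{subject} {action} {obj}"


def instantiate(template: str, values: Dict[str, str]) -> str:
    """Fill every parameter that has a value; leave the others untouched."""
    def fill(match: "re.Match[str]") -> str:
        name = match.group(1).strip()
        if name in values:
            return f'[assignment: <{name}> = "{values[name]}"]'
        return match.group(0)  # no value available: keep the placeholder as is
    return ASSIGNMENT.sub(fill, template)


template = (
    "The TSF shall enforce the [assignment: <access control SFP>] "
    "on [assignment: <events>]."
)
srq1 = instantiate(
    template,
    {"events": build_event("doctor", "evaluate", "results of tests")},
)
```

Parameters without a value, such as the access control parameter here, pass through unchanged, which corresponds to the decision in the text to keep security-technology parameters unfilled.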
The method of the present invention is compared with existing security requirement acquisition methods in terms of degree of automation, applicability, effectiveness, description form, and reusability, defined as follows; the results are shown in Table 2.
1) Degree of automation: whether the method is supported by a tool, and whether it is manual, semi-automatic, or automatic.
2) Applicability: the context in which the method is applicable.
3) Effectiveness: whether the acquired security requirements are valid.
4) Description form: the form in which the result of the method is described, such as natural language, formal, or semi-formal.
5) Reusability: whether the result of the method contains resources that can be reused as security knowledge in the domain.
TABLE 2 Comparison of the method of the invention with other methods
Figure BDA0003610087020000141
The method performs multi-label text classification of English natural-language requirement text based on deep learning. Table 3 shows the evaluation results on the English natural-language requirements data set for the model used by the method and for other classification methods, with the best results in bold. The BERT-TextCNN model used by the method achieves the best result on every metric.
TABLE 3 Average experimental results of the multi-label classification model used in the method of the invention and of other reference methods
Figure BDA0003610087020000142
Based on the same inventive concept, the security requirement acquisition system oriented to natural language requirements disclosed by the embodiment of the invention comprises: an entity and action relation extraction module, used for extracting the entities and action relations contained in each requirement statement of the requirement text based on natural language processing; a security target classification module, used for predicting the security target set of each requirement statement by using a multi-label classification model based on deep learning; a security requirement template matching module, used for matching, through matching conditions, the security requirement templates constructed based on the CC standard according to the entities, the action relations, and the security target set, wherein one security target is mapped to a plurality of security requirement templates, and whether the matching conditions of a security requirement template are satisfied is judged according to the entities and action relations extracted from the requirement statement; and a security requirement template instantiation module, used for filling the parameters in the matched security requirement template according to the entities, the actions, or the constructed short phrases, thereby instantiating the security requirement template.
The specific working process of each module described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again. The division of the modules is only one logical functional division, and in actual implementation, there may be another division, for example, a plurality of modules may be combined or may be integrated into another system.
Based on the same inventive concept, the embodiment of the present invention discloses a computer system, which includes a memory, a processor, and a computer program stored in the memory and running on the processor, wherein the computer program implements the method for acquiring a security requirement oriented to a natural language requirement when being loaded into the processor.
It will be understood by those skilled in the art that the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer system (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes any medium capable of storing computer programs, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A security requirement acquisition method oriented to natural language requirements, characterized by comprising the following steps:
extracting the entities and action relations contained in each requirement statement of the requirement text based on natural language processing;
predicting the security target set of each requirement statement by using a multi-label classification model based on deep learning;
matching, through matching conditions, the security requirement templates constructed based on the CC standard according to the entities, the action relations, and the security target set, wherein one security target is mapped to a plurality of security requirement templates, and whether the matching conditions of a security requirement template are satisfied is judged according to the entities and action relations extracted from the requirement statement;
and filling the parameters in the matched security requirement template according to the entities, the actions, or the constructed short phrases, thereby instantiating the security requirement template.
2. The method for acquiring security requirements oriented to natural language requirements of claim 1, wherein extracting entities and action relationships based on natural language processing comprises:
preprocessing the requirement text by using Stanford CoreNLP, the preprocessing comprising tokenization, sentence splitting, stemming, and part-of-speech tagging;
extracting entities and entity relationships by using heuristic rules based on dependency types constructed by iMER;
extracting entity relationship by using an openIE annotator of Stanford CoreNLP;
and taking the union of all the extracted results, removing stop words, removing relationships that do not contain a verb, and applying a longest principle to select the entities with the longest word strings and the relationships with the longest word strings.
3. A natural language requirement oriented security requirement acquisition method according to claim 2, wherein the method of selecting entities/relationships using the longest principle is:
establishing an empty set for storing the longest entity/entity relationship triple, which is marked as: MaxEntites/MaxER;
comparing each extracted entity/entity relationship triple with all entities/entity relationship triples in MaxEntites/MaxER: if entity nesting exists, only the entity/entity relationship triple with the longest word string is kept in MaxEntites/MaxER; if no entity nesting occurs in any comparison, the entity/entity relationship triple is stored in MaxEntites/MaxER.
4. The method of claim 1, wherein the multi-label classification model is a BERT-TextCNN model, and the classification labels include "Confidentiality", "Integrity", "Availability", "Identification & Authentication", "Privacy", and "Accountability".
5. The method for acquiring security requirements oriented to natural language requirements of claim 1, wherein each security requirement template is a description of a Target Security Function (TSF) and can satisfy one or more security targets; each TSF includes a function name, a function introduction, and one or more security scopes; each security scope includes a set of matching criteria, a security requirement template, and one or more security objectives that the template satisfies.
6. The method for acquiring security requirements for natural language requirements according to claim 1, wherein the < subjects >, < actions > in the matching conditions of the security requirements template are < subjects >, < actions > extracted from the requirements statement; < users > are drawn from the entity set, and < information > is the entity set that remains after filtering out < system >, < resources >, < users > that appear inside < objects >.
7. The method for acquiring security requirements oriented to natural language requirements of claim 1, wherein whether the matching conditions of a security requirement template are satisfied is determined as follows: if every matching condition in the set of matching conditions corresponding to the security requirement template is satisfied, the set of matching conditions is satisfied; otherwise, the set of matching conditions is not satisfied; and whether each individual matching condition is satisfied is judged as follows:
if the matching condition is expressed as a single basic parameter type, the matching condition is satisfied if information of that parameter type exists in the input requirement statement; otherwise, the matching condition is not satisfied;
if the matching condition is expressed as an assignment statement of a basic parameter, the matching condition is satisfied as long as one word on the right side of the equals sign exists in the input requirement statement; if none of the words exists, the matching condition is not satisfied;
if the matching condition is expressed as a plurality of basic parameters joined by "and", which denotes the logical AND operation, the matching condition is satisfied if words of all these parameter types exist in the extracted information at the same time; otherwise, the matching condition is not satisfied;
if the matching condition is expressed as a plurality of basic parameters joined by "or", which denotes the logical OR operation, the matching condition is satisfied as long as a word of one of these parameter types exists in the extracted information; if no word of any of these parameter types exists, the matching condition is not satisfied.
8. The method for acquiring security requirements oriented to natural language requirements of claim 1, wherein, when the security requirement template is instantiated, the parameters to be instantiated in the security requirement template are found through regular expressions and assigned, and the parameters that are not assigned are kept as they are; the parameters capable of being automatically filled include <events> in addition to the parameters in the matching conditions, and <events> is assigned a short subject-predicate-object phrase consisting of an action subject, an action, and an action object.
9. A natural language requirements oriented security requirements acquisition system, comprising:
the entity and action relation extraction module is used for extracting the entity and action relation contained in each requirement statement of the requirement text based on natural language processing;
a security target classification module, used for predicting the security target set of each requirement statement by using a multi-label classification model based on deep learning;
a security requirement template matching module, used for matching, through matching conditions, the security requirement templates constructed based on the CC standard according to the entities, the action relations, and the security target set, wherein one security target is mapped to a plurality of security requirement templates, and whether the matching conditions of a security requirement template are satisfied is judged according to the entities and action relations extracted from the requirement statement;
and a security requirement template instantiation module, used for filling the parameters in the matched security requirement template according to the entities, the actions, or the constructed short phrases, thereby instantiating the security requirement template.
10. A computer system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program when loaded into the processor implements the natural language requirements oriented security requirements acquisition method of any of claims 1-8.
CN202210427177.4A 2022-04-22 2022-04-22 Method and system for acquiring safety requirement facing natural language requirement Pending CN114879936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210427177.4A CN114879936A (en) 2022-04-22 2022-04-22 Method and system for acquiring safety requirement facing natural language requirement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210427177.4A CN114879936A (en) 2022-04-22 2022-04-22 Method and system for acquiring safety requirement facing natural language requirement

Publications (1)

Publication Number Publication Date
CN114879936A true CN114879936A (en) 2022-08-09

Family

ID=82672315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210427177.4A Pending CN114879936A (en) 2022-04-22 2022-04-22 Method and system for acquiring safety requirement facing natural language requirement

Country Status (1)

Country Link
CN (1) CN114879936A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117056896A (en) * 2023-10-13 2023-11-14 中国人民解放军军事科学院系统工程研究院 Intelligent control system form verification method and device
CN117056896B (en) * 2023-10-13 2023-12-22 中国人民解放军军事科学院系统工程研究院 Intelligent control system form verification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination