CN117252261A - Knowledge graph construction method, electronic equipment and storage medium - Google Patents

Knowledge graph construction method, electronic equipment and storage medium

Info

Publication number
CN117252261A
CN117252261A (application CN202311245542.0A)
Authority
CN
China
Prior art keywords: test, entity, nodes, similarity, action
Prior art date
Legal status
Pending
Application number
CN202311245542.0A
Other languages
Chinese (zh)
Inventor
蒋雨宁
赵学亮
李日标
邢振昌
申志奇
Current Assignee
WeBank Co Ltd
Nanyang Technological University
Original Assignee
WeBank Co Ltd
Nanyang Technological University
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd, Nanyang Technological University filed Critical WeBank Co Ltd
Priority to CN202311245542.0A
Publication of CN117252261A

Classifications

    • G06N 5/022 Knowledge engineering; Knowledge acquisition
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 40/295 Named entity recognition
    • G06F 40/30 Semantic analysis
    • G06N 3/0442 Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Abstract

The application provides a knowledge graph construction method, an electronic device and a storage medium. The knowledge graph construction method comprises the following steps: dividing test steps according to a dependency syntax tree of a software test text, and labeling semantic roles of the test steps; identifying a test action and a test entity in the test steps according to the semantic roles, and combining the test action and the test entity to obtain a test behavior; acquiring a target named entity and a target entity category of the software test text, and embedding the target named entity and the target entity category into the test entity to obtain a target test entity; generating nodes of a knowledge graph according to the target test entity, the test action and the test behavior, and constructing the knowledge graph of the software test according to the nodes of the knowledge graph. The method and the device can improve the accuracy of the knowledge graph for software testing.

Description

Knowledge graph construction method, electronic equipment and storage medium
Technical Field
The application relates to the technical field of software testing, in particular to a knowledge graph construction method, electronic equipment and a storage medium.
Background
With the increasing complexity and functionality of financial banking systems, software testers spend a great deal of time and effort learning about various aspects of the system. To facilitate software testing, similar test cases may be recommended, or new test cases, vulnerability localization and question answering may be generated, by building a knowledge graph. At present, knowledge graphs are constructed with methods based on subject-predicate-object triples, which extract only named entities and their corresponding relations, so the constructed knowledge graph cannot accurately reflect the relations between entities and other attributes in software testing, and the accuracy of the constructed knowledge graph is poor.
Disclosure of Invention
An object of the present application is to provide a method for constructing a knowledge graph, an electronic device and a storage medium, which aim to improve the accuracy of the constructed knowledge graph for software testing.
According to an aspect of the embodiments of the present application, there is provided a method for constructing a knowledge graph, the method including:
dividing a testing step according to a dependency syntax tree of a software testing text, and marking semantic roles of the testing step;
identifying a test action and a test entity in the test step according to the semantic role, and combining the test action and the test entity to obtain a test behavior;
Acquiring a target named entity and a target entity category of the software test text, and embedding the target named entity and the target entity category into the test entity to obtain a target test entity;
generating a node of a knowledge graph according to the target test entity, the test action and the test behavior, and constructing the knowledge graph of the software test according to the node of the knowledge graph.
In an embodiment, constructing a knowledge graph of a software test according to the nodes of the knowledge graph includes:
extracting test entity nodes, test action nodes and test behavior nodes from the nodes of the knowledge graph;
respectively evaluating the entity similarity between the test entity nodes and the action similarity between the test action nodes;
calculating the behavior similarity between the test behavior nodes according to the entity similarity and the action similarity;
identifying matched nodes among the test entity nodes, the test action nodes and the test behavior nodes according to the entity similarity, the action similarity and the behavior similarity;
and aligning the matched nodes, and constructing a knowledge graph of the software test according to the aligned nodes obtained by alignment.
In an embodiment, the evaluating the entity similarity between the test entity nodes and the action similarity between the test action nodes respectively includes:
respectively taking the test entity node and the test action node as nodes to be evaluated;
acquiring equivalent word dictionary attributes of the nodes to be evaluated and opposite word dictionary attributes of the nodes to be evaluated;
converting the equivalent word dictionary attribute into an equivalent word vector, and converting the opposite word dictionary attribute into an opposite word vector;
calculating cosine similarity between the nodes to be evaluated according to the equivalent word vector and the opposite word vector;
and determining the entity similarity between the test entity nodes and the action similarity between the test action nodes according to the cosine similarity between the nodes to be evaluated.
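By way of illustration only, the cosine-similarity evaluation over the equivalent-word and opposite-word vectors might look like the following Python sketch; the way the two vectors are combined (simple concatenation here) and the toy embeddings are assumptions of the sketch, not something prescribed by this application.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Cosine similarity between two vectors; 0.0 if either vector is all zeros.
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

def node_vector(equivalent_vec: np.ndarray, opposite_vec: np.ndarray) -> np.ndarray:
    # Concatenate the equivalent-word vector and the opposite-word vector of a node,
    # so that both dictionary attributes contribute to the comparison.
    return np.concatenate([equivalent_vec, opposite_vec])

# Toy example with made-up 3-dimensional embeddings.
node_a = node_vector(np.array([0.8, 0.1, 0.3]), np.array([0.0, 0.9, 0.2]))
node_b = node_vector(np.array([0.7, 0.2, 0.4]), np.array([0.1, 0.8, 0.3]))
print(cosine_similarity(node_a, node_b))
```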
In an embodiment, determining the entity similarity between the test entity nodes and the action similarity between the test action nodes according to the cosine similarity between the nodes to be evaluated includes:
acquiring name attributes and category attributes of a plurality of nodes to be evaluated;
evaluating the name similarity between the name attributes according to the number of edit operations between the character strings corresponding to the name attributes;
determining the category similarity between the category attributes according to whether the category attributes are the same;
according to the graph structure similarity of the subgraph formed by each attribute of the node to be evaluated in the knowledge graph, carrying out linear weighting to obtain the subgraph similarity;
linearly weighting the cosine similarity, the name similarity, the category similarity and the sub-graph similarity to obtain the similarity between the nodes to be evaluated;
and determining the entity similarity between the test entity nodes and the action similarity between the test action nodes according to the similarity between the nodes to be evaluated.
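Continuing the illustration, the sketch below combines the cosine similarity with an edit-distance-based name similarity, a category match and a sub-graph similarity through linear weighting; the weight values and the normalization of the edit distance are assumptions chosen only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    # Number of edit operations (insert, delete, substitute) between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def name_similarity(name_a: str, name_b: str) -> float:
    # Map the edit distance into [0, 1]: identical names score 1.0.
    longest = max(len(name_a), len(name_b)) or 1
    return 1.0 - levenshtein(name_a, name_b) / longest

def node_similarity(cosine_sim: float, name_a: str, name_b: str,
                    category_a: str, category_b: str, subgraph_sim: float,
                    weights=(0.4, 0.3, 0.2, 0.1)) -> float:
    # Linear weighting of the four similarity terms; the weights are placeholders.
    category_sim = 1.0 if category_a == category_b else 0.0
    w1, w2, w3, w4 = weights
    return (w1 * cosine_sim + w2 * name_similarity(name_a, name_b)
            + w3 * category_sim + w4 * subgraph_sim)

print(node_similarity(0.85, "payment amount", "pay amount", "entity", "entity", 0.6))
```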
In an embodiment, aligning the matched nodes, and constructing a knowledge graph of a software test according to the aligned nodes obtained by alignment, including:
combining the attributes of the matched nodes into the nodes of the knowledge graph, and connecting the matched nodes with the nodes of the knowledge graph to obtain aligned nodes;
respectively representing the test cases, the test vulnerabilities and the test requirements as reference test nodes of the knowledge graph;
establishing an edge of a knowledge graph according to the semantic relation between the reference test node and the alignment node;
And constructing the knowledge graph of the software test according to the reference test node, the alignment node and the edges of the knowledge graph.
In one embodiment, the step of partitioning the test according to the dependency syntax tree of the software test text comprises:
extracting subtrees in the dependency syntax tree of the software test text;
searching whether the subtree has the nodes corresponding to the parallel conjunctions or not according to the labels of the nodes in the subtree;
if the subtree has the node corresponding to the parallel conjunctions, dividing leaf nodes of the subtree according to the position of the parallel conjunctions in the subtree, and obtaining the testing step;
and if the subtree does not have parallel conjunctions, obtaining the testing step according to the leaf nodes of the subtree.
In one embodiment, labeling the semantic roles of the test steps includes:
according to the characteristics of the software test text, generating a reference semantic role of the software test and a reference attribute corresponding to the reference semantic role; the reference attributes include attributes of the test actions and attributes of the test entities;
and marking the semantic roles of the test steps according to the reference semantic roles and the reference attributes, and obtaining the semantic roles corresponding to the test steps.
In an embodiment, obtaining the target named entity and the target entity class of the software test text includes:
acquiring target service data matched with a service scene where the software test text is located, performing feature engineering analysis on the target service data, and extracting a reference entity category;
extracting a test case and a reference text in a test vulnerability report, and labeling reference entities in the reference text according to the reference entity types;
dividing the reference text into word segments, and adding a start mark and an end mark to the software test text to obtain mark data;
training a deep learning model according to the reference entity and the marking data to obtain an entity identification model;
and identifying the software test text through the entity identification model to obtain the target named entity and the target entity category.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; and storage means for storing one or more programs that, when executed by the one or more processors, cause the electronic device to implement the methods provided in the various alternative implementations described above.
According to an aspect of embodiments of the present application, there is provided a computer program medium having computer readable instructions stored thereon, which, when executed by a processor of a computer, cause the computer to perform the methods provided in the various alternative implementations described above.
According to an aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternative implementations described above.
According to the technical scheme, the test steps are divided according to the dependency syntax tree of the software test text, the semantic roles of the test steps are labeled, the test actions and the test entities in the test steps are identified according to the semantic roles, and the test actions and the test entities are combined to obtain test behaviors, so that the meaning and reference of each test behavior are accurately expressed; the test actions and test entities can be extracted from the software test text, and both the subject-predicate-object triples and the information groups taking the test behavior as the key argument can be accurately extracted. Moreover, through the dependency syntax tree and semantic role labeling, the test actions, the test entities and the associations between them are extracted, the test behaviors are generated as key arguments, and the test process and sequence are embodied through the test steps. Further, the target named entity and the target entity category are embedded into the test entity to obtain the target test entity, so that the accuracy of the test entity can be improved. On this basis, the nodes of the knowledge graph are generated according to the target test entity, the test action and the test behavior, and the knowledge graph of the software test is constructed according to the nodes of the knowledge graph, so that the knowledge graph can more accurately capture the specific associations between the test entities and the test actions in the software test text, and the completeness and accuracy of the knowledge graph in representing the software test scenario are improved.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned in part by the practice of the application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
Fig. 1 shows a flow diagram of a method for constructing a knowledge graph, according to an embodiment of the present application.
FIG. 2 illustrates a flow diagram for building a knowledge-graph based on unstructured data, in accordance with an embodiment of the present application.
FIG. 3 shows a schematic diagram of a dependency syntax tree for "enter online payment system".
FIG. 4 shows a schematic diagram of a dependency syntax tree for "clients do not generate data other than financial risk products".
FIG. 5 shows a schematic diagram of the dependency syntax tree for "optimization objective and payment mode".
FIG. 6 shows a schematic diagram of a dependency syntax tree for "invest and entry and account opening and core application".
Fig. 7 shows a schematic diagram of a sub-graph corresponding to the test procedure in an embodiment of the present application.
Fig. 8 shows a schematic diagram of an application scenario of a knowledge-graph constructed according to an embodiment of the present application.
Fig. 9 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the present application. However, those skilled in the art will recognize that the aspects of the present application may be practiced with one or more of the specific details omitted, or with other methods, components, steps, etc. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
Before introducing embodiments of the present application, the terms referred to in the embodiments of the present application will be explained first:
the large language model (Large Language Model, LLM) refers to a large-scale language model such as GPT-3 (generating Pre-trained Transformer). These models have billions or even billions of parameters that can produce coherent natural language text and achieve excellent performance over a variety of natural language processing tasks.
A knowledge graph (Knowledge Graph, KG) is a graphical structure for organizing, representing and storing knowledge. It is a semantic network consisting of a set of nodes representing entities or concepts and edges connecting the nodes, the edges representing relationships between the entities.
Natural language processing (Natural Language Processing, NLP), a branch of the field of artificial intelligence, involves the ability to process and understand human language. It is a research and application field on how computers interact with human language, understand and generate natural language text. The goal of NLP is to enable a computer to understand, parse, and process human language, thereby enabling efficient communication and interaction with humans. NLP involves a number of tasks and techniques including text classification, named entity recognition, semantic role labeling, syntactic analysis, and the like.
Open information extraction (Open Information Extraction, OpenIE) is a natural language processing technique that aims to extract structured, computer-understandable information from text without the need for predefined templates or domain-specific knowledge.
Named entity recognition (Named Entity Recognition, NER) aims to recognize words in text that represent entities (person names, place names, organizations, etc.).
Relationship identification (Relation Extraction, RE) aimed at identifying relationships between different entities.
Semantic role labeling (Semantic Role Labeling, SRL) aims to identify the semantic roles of different words in a sentence, such as predicates and arguments, in order to understand the semantics of the sentence in depth. Semantic role labeling is somewhat similar to relation extraction, but is more fine-grained: it not only identifies relationships between entities, but also further characterizes the roles and attributes of these entities.
Bidirectional Encoder Representations from Transformers (BERT) is a pre-trained model in natural language processing. The BERT model is pre-trained using a Transformer architecture, where "Bidirectional" indicates that it takes into account the contextual information in the text and predicts all words, subwords or symbols of the text within the same model.
RoBERTa is an improved version of the BERT model, a large neural network model based on a Transformer encoder. RoBERTa uses essentially the same techniques and designs as BERT while removing some of BERT's constraints and drawbacks, including training on longer sequences and more data, adjusting and refining several pre-processing choices, and modifying the masked language modeling scheme used in BERT. These improvements enable RoBERTa to extract better linguistic representations and expressions of text than BERT.
Neo4j is a graph database commonly used in engineering and academia, supporting flexible and extensible graph storage, the standard query language Cypher, fast read-write and transactional access, relationship queries, linked deletion, and so on.
The graph data science (Graph Data Science, GDS) library of Neo4j is an efficient, flexible and scalable graph analysis engine for graph data analysis and machine learning. The embodiments of the present application may use a GDS library to accomplish graph embedding and computation.
The following describes the application background of the embodiments of the present application in detail:
as a software system deployed on a computer, a financial banking system requires software testing to ensure proper operation. When the software test is performed, the knowledge graph can be used for representing the relation between various test related information in the software test, such as user requirements, system design and source code modification, and the relation between test cases, test defects and logs generated in the test process. The constructed knowledge graph can be further used for recommending similar test cases or generating new test cases, vulnerability localization and question and answer. The method enables the testers to quickly locate potential problems according to the relation on the knowledge graph, greatly improves the efficiency and quality of software testing, remarkably reduces the testing time, and can also provide more value for the software testers.
In some implementations, knowledge maps are constructed in dependence upon the identified entities. However, the knowledge graph constructed in this way has a limitation in terms of expression ability, and cannot accurately and completely express specific test behaviors. In particular, a large amount of heterogeneous data is generated during the software testing process, including different descriptions of user and developer reporting problems and solutions, general descriptive language, and a mix of banking and programming language proper terms, etc. Existing methods fail to process these data efficiently.
Also, software test text is typically complex long text in which the sentence structure reflects information such as important test steps. The subject (by default, the tester) is typically omitted from a test document, while a number of verbs are used to represent test actions. These verbs are equally, or even more, important for referring to a specific test behavior, especially when analyzing and comparing the similarity of multiple test documents. If only the extraction of subject-predicate-object triples is considered, important information such as the test steps is ignored; only explicit relationships can be reflected, and implicit relationships cannot be extracted. An explicit relationship is, for example, a test action and the test entity it relates to; implicit relationships include, for example, the conditions, frequency and manner under which the test action is completed.
Furthermore, in some implementations, a knowledge graph is built based on rules, representing knowledge as a series of rules and reasoning with those rules. This requires a large amount of domain knowledge and manually defined rules, and the problem of rules overlapping and conflicting cannot be avoided, which is not conducive to maintenance and upgrading of the knowledge graph. When dealing with complex problems, rule-based approaches may result in intersections and conflicts between rules, thus requiring rule optimization and reconstruction.
In order to solve the above problems, the following embodiments are presented to explain the technical solutions of the present application in detail.
Fig. 1 shows a flowchart of a method for constructing a knowledge graph according to an embodiment of the present application, the method comprising:
step S10: dividing the testing steps according to the dependency syntax tree of the software testing text, and marking the semantic roles of the testing steps.
In the embodiment of the application, the execution subject is an electronic device. The software test text is text related to the software test. Software test text such as text in a software bug report generated in a software test, text in a test case document, text in a test requirements document, and text in a financial banking system document, such as a system description, a function module operating manual. The dependency syntax tree is a tree structure representing the dependency relationship between words in the sentence of the software test text, and is used to describe the dependency relationship (edges) between each word (node) of the sentence, as well as the syntactic role and semantic association of each word. The test steps are operations and verification steps performed during the testing of the software and can be used to check whether the function, performance, stability or other quality attributes of the software meet expected requirements. The test step is used to confirm whether the system is performing as expected and to find potential problems or defects.
In an embodiment, the software test text of unstructured data can be processed to obtain the software test text in a data form of a unified format, so that the software test text can be conveniently processed, and a dependency syntax tree of the software test text is generated according to the grammar relation among words in the software test text of the unified format, so that the search test steps can be conveniently refined.
In one embodiment, one sentence in the software test text corresponds to one dependency syntax tree, and one sentence may contain one or more test steps. Step boundary words in the dependency syntax tree can be identified and used as dividing boundaries to divide the test steps; if no step boundary word is identified, the sentence probably corresponds to a single test step and no division is needed. Step boundary words may be various words, such as "then" or "simultaneously", that indicate boundaries between different test steps. By searching out each test step through the dependency syntax tree, the test steps can be divided more accurately, so that the knowledge graph subsequently obtained based on the test steps can more accurately represent the features of the software test.
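As a rough illustration of boundary-word splitting (not the dependency-tree-based division described later, which is more precise), a sentence could be cut at such boundary words; the boundary-word list below is an assumption made only for this sketch.

```python
import re

# Illustrative only: split a sentence into candidate test steps at step-boundary words.
BOUNDARY_WORDS = ["然后", "接着", "同时"]   # "then", "next", "simultaneously"

def split_on_boundaries(sentence: str):
    pattern = "|".join(map(re.escape, BOUNDARY_WORDS))
    parts = [p.strip(" ,，。") for p in re.split(pattern, sentence)]
    return [p for p in parts if p]

print(split_on_boundaries("登录系统，然后打开支付功能，同时选择支付方式"))
# ['登录系统', '打开支付功能', '选择支付方式']
```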
In one embodiment, labeling the semantic roles of the test steps includes: acquiring defined semantic roles, for example: PREP (prepositional phrase), denoting the semantic role of a prepositional phrase in a sentence; ARG0 (argument 0), representing the initiator or actor of a verb or predicate verb; ARG1 (argument 1), representing the object acted upon; ARGM-ADV, which may represent the state of a test action; and so on. The semantic roles in the test step are then labeled according to the acquired semantic roles.
In an embodiment, semantic role labeling can also be performed on actions, targets, parameters, expected results, and preconditions in the test step. Semantic roles may be added to the test steps by way of labels or annotations.
In an embodiment, text cleaning and normalization techniques may be used to process and convert the original software test data, and further perform feature extraction on the processed and converted software test data, extract key features from heterogeneous data, obtain a key software test text, construct a dependency syntax tree according to the key software test text, and search the test step according to the dependency syntax tree. Therefore, the standard software test text which is convenient to process can be extracted from unstructured data, and accordingly, the dependency syntax tree can be accurately constructed and the test steps can be searched.
Step S200: and identifying the test action and the test entity in the test step according to the semantic role, and combining the test action and the test entity to obtain the test behavior.
According to the semantic roles in the test step, the test action and the test entity are identified from the test step; for example, if the semantic role of a text span in the test step is PREP, the corresponding attribute is identified as a test action, and if the semantic role is ARG0 or ARG1, the corresponding attribute is identified as a test entity. Combining a test action with a test entity means that the test action and the test entity are integrated into a test behavior; the specific data form is not limited here. A test behavior includes a test action and a test entity.
In an embodiment, the test step is "verify the input payment amount"; the test actions identified in the test step according to the semantic roles are: verify and input; the identified test entity is: payment amount. The test behaviors obtained by combining then include: (test action 1: verify; test entity 1: payment amount), (test action 2: input; test entity 2: payment amount). In this way, a test behavior can be extracted as an information group with the test behavior as the key argument, and the subject-predicate-object triple can also be extracted, where the subject of the test step is empty, the predicate is "verify", and the object is "payment amount". Thus the meaning and reference of the test behavior are accurately expressed.
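A minimal sketch of the combining step, following the example above; pairing every action with every entity of the step is an assumption about the combination policy, not a rule fixed by this application.

```python
def combine_into_behaviors(actions, entities):
    # Pair every test action with every test entity of one test step.
    return [{"test action": a, "test entity": e} for a in actions for e in entities]

print(combine_into_behaviors(["verify", "input"], ["payment amount"]))
# [{'test action': 'verify', 'test entity': 'payment amount'},
#  {'test action': 'input', 'test entity': 'payment amount'}]
```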
By analyzing the dependency syntax tree and the semantic role labels, the test actions, the test entities and the associations between them are extracted, the test behaviors are generated as key arguments, and the test steps are retained to embody the test process and sequence. In this way, the embodiment can more accurately capture the specific correlation between the test entity and the test action in the software test text, and the completeness of the knowledge graph with respect to the test scenario can be improved through the subsequent construction of the knowledge graph.
Step S300: and acquiring a target named entity and a target entity category of the software test text, and embedding the target named entity and the target entity category into the test entity to acquire the target test entity.
The target named entity is a named entity corresponding to the software test text in the service field, and the target entity class is an entity class of the entity corresponding to the software test text in the service field.
In one embodiment, the software test text is test text related to a software system in the banking field. In this text, a target named entity refers to a word with specific significance in the banking field, such as "financial risk" or "XX bank", and the target entity category is, for example, "product" for "financial risk" and "organization" for "XX bank".
In an embodiment, the implementation manner of embedding the target named entity and the target entity category into the test entity may be that the content of the target named entity and the content of the target entity category are embedded into the structured data of the test entity, or the target entity category and the target named entity may be represented by using a label and associated with the test entity, so as to obtain the target test entity.
In one embodiment, the target named entity and the target entity category in the software test text may be identified by a deep learning model, such as a BiLSTM-CRF model: a sequence labeling model based on a bidirectional Long Short-Term Memory network (Bidirectional Long Short-Term Memory, BiLSTM) and a Conditional Random Field (Conditional Random Field, CRF). The BiLSTM may be used to capture context information, and the CRF may be used to model dependencies between tags, thereby enabling identification of the target named entity and the target entity category. Other deep learning models or other approaches may also be employed to identify the target named entity and the target entity category.
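For illustration, a minimal BiLSTM-CRF tagger might be sketched in PyTorch as follows; the dimensions, the tag-set size and the use of the third-party pytorch-crf package are assumptions of the sketch rather than details fixed by this application.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party "pytorch-crf" package (an assumption of this sketch)

class BiLSTMCRF(nn.Module):
    """Minimal BiLSTM-CRF sequence tagger sketch for named entity recognition."""

    def __init__(self, vocab_size: int, num_tags: int, embed_dim: int = 128, hidden: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden // 2, bidirectional=True, batch_first=True)
        self.to_tags = nn.Linear(hidden, num_tags)   # emission scores per token
        self.crf = CRF(num_tags, batch_first=True)   # models tag-to-tag dependencies

    def emissions(self, token_ids: torch.Tensor) -> torch.Tensor:
        out, _ = self.bilstm(self.embedding(token_ids))
        return self.to_tags(out)

    def loss(self, token_ids, tags, mask):
        # Negative log-likelihood of the gold tag sequence under the CRF.
        return -self.crf(self.emissions(token_ids), tags, mask=mask, reduction='mean')

    def decode(self, token_ids, mask):
        # Best tag index sequence for each sentence (e.g. B-PRODUCT, I-PRODUCT, O ...).
        return self.crf.decode(self.emissions(token_ids), mask=mask)

# Toy usage: a batch of one sentence with 4 tokens and 7 possible BIO tags.
model = BiLSTMCRF(vocab_size=1000, num_tags=7)
tokens = torch.tensor([[5, 17, 42, 3]])
mask = torch.ones_like(tokens, dtype=torch.bool)
print(model.decode(tokens, mask))
```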
In one embodiment, text features are extracted from the text data of a number of software testing systems by machine learning methods, such as Support Vector Machines (SVM) or Random Forests, and used to compare and rank the texts. These methods use weights to compare and rank the importance of different texts, and can also avoid the problem of overfitting to some extent.
Step S400: generating a node of the knowledge graph according to the target test entity, the test action and the test action, and constructing the knowledge graph of the software test according to the node of the knowledge graph.
In an embodiment, generating the nodes of the knowledge graph according to the target test entity, the test action and the test behavior includes: respectively taking the target test entity, the test action and the test behavior as nodes of the knowledge graph to generate the nodes of the knowledge graph.
In an embodiment, in order to enrich the information of the knowledge graph, more information may be added as nodes of the knowledge graph on the basis of taking the target test entity, the test action and the test behavior as nodes; for example, test cases and test vulnerabilities may each be further added as nodes of the knowledge graph.
In an embodiment, constructing a knowledge-graph of a software test from nodes of the knowledge-graph includes: and establishing the edges of the knowledge graph according to the relation between the nodes of the knowledge graph, and constructing the knowledge graph tested by the software according to the nodes of the knowledge graph and the edges of the knowledge graph.
In an embodiment, constructing the knowledge graph of the software test from the nodes of the knowledge graph includes: aligning the nodes to be aligned of the knowledge graph, where the nodes to be aligned include a plurality of test action nodes, a plurality of test entity nodes and a plurality of test behavior nodes. The alignment mainly evaluates the similarity between the nodes to be aligned; if the similarity is high enough, for example higher than a similarity threshold or within a reference similarity interval, the node attributes of the sufficiently similar nodes are merged and a connection relationship is created to align them, and the knowledge graph of the software test is constructed according to the aligned nodes and the connection relationships between the nodes. By performing the alignment, the accuracy of the knowledge graph can be improved.
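A simplified sketch of such threshold-based alignment is shown below; the data layout of a node, the greedy pairwise strategy and the threshold value are all assumptions made only for this example.

```python
def align_nodes(nodes, similarity, threshold=0.8):
    """Greedy alignment sketch: merge attributes of node pairs whose similarity
    exceeds a threshold and record a connection between them.

    `nodes` is a list of dicts with an 'attributes' dict; `similarity(a, b)`
    returns a score in [0, 1]. The threshold value is a placeholder."""
    connections = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if similarity(nodes[i], nodes[j]) >= threshold:
                # Merge attributes; values of the kept node take precedence.
                nodes[i]["attributes"] = {**nodes[j]["attributes"], **nodes[i]["attributes"]}
                # Create a connection so the aligned nodes stay linked.
                connections.append((nodes[i]["id"], nodes[j]["id"]))
    return nodes, connections

demo = [{"id": 1, "attributes": {"name": "payment amount"}},
        {"id": 2, "attributes": {"name": "pay amount", "category": "entity"}}]
aligned, links = align_nodes(demo, lambda a, b: 0.9)
print(aligned[0]["attributes"], links)
```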
Referring to fig. 2, fig. 2 is a schematic flow chart of building a knowledge-graph based on unstructured data according to an embodiment of the application, and fig. 2 includes steps 1 to 6, wherein:
the method mainly comprises the steps of data acquisition and preprocessing, wherein the acquired data are unstructured data, and the preprocessing process comprises the steps of text cleaning and normalization of the unstructured data to process and convert heterogeneous data into data in a unified format. Such as a software test text in a unified format for subsequent processing and analysis; unstructured data such as test cases, test vulnerability reports, and system configuration files.
Step 2, mainly performing dependency syntax analysis on the software test text obtained by preprocessing to generate a dependency syntax tree, and dividing or searching the test step according to the dependency syntax tree; dependency syntax analysis may be implemented by a rule-based built dependency syntax analysis model.
Step 3 mainly labels the semantic roles of the test steps, identifies or extracts the test actions and the test entities in the test steps according to the semantic roles, and combines the test actions and the test entities to obtain test behaviors. The test behavior is taken as the key argument.
Step 4 is mainly to acquire or identify the target named entity and the target entity category of the software test text. Wherein, the target named entity and the target entity category can be identified by constructing an entity identification model based on deep learning.
And step 5, embedding the target named entity and the target entity category into the test entity to obtain the target test entity. And, the equivalent word dictionary and the opposite word dictionary of the test entity can be further embedded in the test entity to obtain the target test entity. The equivalent word dictionary and the opposite word dictionary of the test action can also be embedded in the test action to expand the test action.
Step 6 mainly generates the nodes of the knowledge graph according to the obtained structured data, including the target test entity, the test action, the test behavior and the like, and constructs the knowledge graph of the software test according to the nodes of the knowledge graph. The target test entities, the test actions and the test behaviors can each be aligned, so that the constructed knowledge graph is more accurate.
In an embodiment, step 1 may specifically include steps 1 a-1 c. Wherein:
and step 1a, collecting original software test data.
Raw software test data such as data related to a financial banking test system mainly comprises vulnerability reports, test case documents, demand documents and financial banking system documents (such as system introduction and functional module operation guide). The vulnerability report and the test case can be directly stored in the test system, and can be exported in an Excel or CSV format. The demand document may be output directly from HTML in JSON format with tags. The system document may be exported in a TXT format.
The following exemplifies a set of software test data in the original software test data. The ID of this set of software test data is: 987654; the name is: [OPS] - online payment system - payment amount input verification; the requirement type is: function related; the tag is: payment security; the test task priority is: P1 - high; the test requirement is: system; the test precondition is: enter the online payment system; the example steps are: (1) select a test account and log in; (2) open the payment function and select a payment method; (3) input an invalid payment amount - input a negative number - input an illegal character (http://123); (4) attempt to make a payment operation; the expected test result is: the system should detect the invalid payment amount and give a corresponding error prompt - the system should not make the payment; the creation time is: 2022/02/02 15:40.
And step 1b, performing data format conversion on the original software test data to obtain the software test data in the target format.
The target format is, for example, JSON format. Through normalization technology, data in different formats are converted and stored into JSON format in a unified mode, so that data analysis and comparison can be performed better. Taking a test case document as an example for illustration, the method comprises the following processing steps:
test cases lacking test steps or completely repeated are removed.
Unify the date format. The date formats in the original documents vary, such as "year/month/day hour:minute", "year/day/month hour:minute" or "year-month-day hour:minute"; they are unified as "year-month-day hour:minute", for example "2023-05-16 16:00".
And establishing a related dictionary of the system elements and unifying naming standards. Such as "OPS" and "online payment system" both refer to "online payment system", the relevant nomenclature in the document is unified as "online payment system".
The test case ID, name, test requirement, requirement type, requirement number, test case type, tag, test task priority, test precondition, test steps, expected test result, test mode and creation time are extracted from the Excel or CSV file and stored sequentially in JSON format.
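As an illustrative sketch of this normalization only: the column names, the CSV layout and the pandas-based approach below are assumptions, not the actual export format of the test system.

```python
import json
import pandas as pd

# Column names mirror the fields listed above but are assumptions about the real export.
FIELDS = ["ID", "name", "test requirement", "requirement type", "requirement number",
          "type", "tag", "task priority", "precondition", "step",
          "expected result", "test mode", "creation time"]

def normalize_test_cases(csv_path: str, json_path: str) -> None:
    df = pd.read_csv(csv_path)
    # Drop test cases that lack test steps or are exact duplicates.
    df = df.dropna(subset=["step"]).drop_duplicates()
    # Unify the various date formats into "year-month-day hour:minute".
    df["creation time"] = pd.to_datetime(df["creation time"], errors="coerce") \
                            .dt.strftime("%Y-%m-%d %H:%M")
    # Unify naming via a small dictionary of system elements.
    df = df.replace({"OPS": "online payment system"}, regex=True)
    records = df[FIELDS].to_dict(orient="records")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```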
And step 1c, preprocessing preset characters in the original software test data to obtain a required software test text.
The preset characters may be some unnecessary characters, which may be called noise characters. For example, some modal particles such as "o" and "have" can be removed according to the rules, so that "selected" is processed as "select"; some unnecessary punctuation marks can be preprocessed according to rules, for example a tab-displayed label of "(none)" is processed as empty, the ":" symbol is processed as "including", and the "+" symbol is processed as "and". An HTML blacklist may also be established, and unnecessary HTML tags are removed based on rule matching.
By the above processing, vulnerability reports, test cases, and requirement documents as shown below can be obtained.
The content of the vulnerability report is, for example: "ID": "308002", "report content": "in making payment, the system does not verify the payment amount entered, resulting in any amount payable", "affected product": "online payment system", "discovery step": "SIT", "vulnerability type": "functional case", "release plan": "2023 Q3", "release version": "1.5.0", "status": "pending", "severity": "high", "solution type": "code repair", "creation time": "2023-05-16 10:00", "repair time": "2023-07-15 09:00", "closing time": "2023-07-20 16:30", "required time": "5 working days", "associated defect ID": "not associated", "associated defect name": "not associated".
The content of the test case document is, for example: "ID": "987654", "name": "payment amount input verification", "test requirement": "online payment system payment security test", "requirement type": "system requirement", "requirement number": "1.0", "type": "functional case", "tag": "payment security", "task priority": "high", "precondition": "enter online payment system", "step": "1. Select a test account and log in; 2. Open the payment function and select a payment method; 3. Enter an invalid payment amount (e.g., negative or illegal character); 4. Attempt to make a payment operation", "expected result": "the online payment system detects the invalid payment amount and gives a corresponding error prompt; the online payment system should not pay", "test mode": "manual", "creation time": "2022-02-02 15:40".
The content of the demand document is, for example: "ID": "149000", "demand name": "online payment system pays for security test", "demand priority": high "," demand description ": "to ensure payment security of an online payment system, comprehensive testing and evaluation is required. The focus of the test is to verify whether the system is able to secure user information and funds when processing payment transactions. The following are test requirements for payment security of an online payment system: 1. a verification function of verifying the payment amount, such as whether the test system can verify the inputted payment amount, preventing any amount from being paid, and confirming that the system can correctly detect and process an abnormal or illegal payment amount; 2. testing user identity verification and authorization mechanisms, such as validation systems, enables accurate verification of user identity to prevent unauthorized payment. "
Through the method, noise in the software test text can be reduced, the text is cleaner, the text can be unified into a more consistent expression form, and semantic matching is easier to carry out when a knowledge graph is constructed.
In one embodiment, the step 2 mainly includes steps 2a to 2e. Wherein,
step 2a: and acquiring the preprocessed software test text, deleting step marks in the software test text, and generating a step sequence.
The software test text may be a long text, with step marks such as serial numbers or markers in the long text. For example, the software test text is: "1. Enter the online payment system 2. The client does not generate data other than financial risk products 3. Optimization objective and payment mode 4. Carry out investment, entry, account opening and core application". After deleting the step marks, the generated step sequence is: ['enter the online payment system', 'the client does not generate data other than financial risk products', 'optimization objective and payment mode', 'carry out investment and entry and account opening and core application'].
And 2b, performing word segmentation and part-of-speech tagging on each step in the step sequence to obtain a word segmentation and part-of-speech tagging result.
The implementation is as follows: load the "tok/fine" model of HanLP and perform the Chinese word segmentation task; input the text into the "tok/fine" model, perform word segmentation prediction on the text, and return the word segmentation result. For the step "enter the online payment system" in the step sequence, word segmentation and part-of-speech tagging give the segmentation: ['enter', 'online', 'payment', 'system'], and the part-of-speech annotation: enter/VV online/JJ payment/NN system/NN.
For the step "the client does not generate data other than financial risk products" in the step sequence, word segmentation and part-of-speech tagging give the segmentation: ['client', 'none', 'generate', 'financial risk', 'product', 'outside', 'data'], and the part-of-speech annotation: client/NN none/VV generate/VV financial risk/NN product/NN outside/LC of/DEG data/NN.
For the step "optimization objective and payment mode" in the step sequence, word segmentation and part-of-speech tagging give the segmentation: ['optimization', 'objective', 'and', 'payment', 'mode'], and the part-of-speech annotation: optimization/VV objective/NN and/CC payment/NN mode/NN.
For the step "carry out investment and entry and account opening and core application" in the step sequence, word segmentation and part-of-speech tagging give the segmentation: ['carry out', 'investment', 'and', 'entry', 'and', 'account opening', 'and', 'core', 'application'], and the part-of-speech annotation: carry out/VV investment/NN and/CC entry/NN and/CC account opening/NN and/CC core/NN application/NN.
Here VV denotes a verb, NN a noun, CC a coordinating conjunction, JJ an adjective, LC a localizer, and DEG the associative particle ("of").
And 2c, obtaining the dependency relationship among the words in the sentence based on the word segmentation and the part-of-speech tagging, and generating a dependency syntax tree.
The dependency syntax analysis is performed as follows: load the "con" model of HanLP and carry out the syntactic analysis task, which analyzes the recursive grammatical composition of a sentence and represents it as a tree structure. The results of word segmentation and part-of-speech tagging are input into the "con" model, and the dependency syntax tree is constructed according to the constituent syntax labels and the dependency relationships.
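For illustration, with HanLP 2.x the segmentation, part-of-speech tagging and tree construction described above might be invoked roughly as follows; the specific pretrained model constant and task names are assumptions that may differ between HanLP versions.

```python
import hanlp

# Assumes a HanLP 2.x multi-task model that provides the 'tok/fine', 'pos/ctb'
# and 'con' tasks; the constant below is one such published model.
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH)

step_text = '优化目标和支付方式'  # "optimization objective and payment mode"
doc = HanLP(step_text, tasks=['tok/fine', 'pos/ctb', 'con'])

print(doc['tok/fine'])  # segmentation, e.g. ['优化', '目标', '和', '支付', '方式']
print(doc['pos/ctb'])   # CTB part-of-speech tags, e.g. ['VV', 'NN', 'CC', 'NN', 'NN']
print(doc['con'])       # syntax tree used afterwards to divide the test steps
```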
The dependency syntax tree of "entering the online payment system" is shown in fig. 3, the dependency syntax tree of "the client does not generate data other than financial risk products" is shown in fig. 4, the dependency syntax tree of "the optimization objective and the payment model" is shown in fig. 5, and the dependency syntax tree of "the investment and the entry and the account opening and the core application" is shown in fig. 6.
In the dependency syntax tree, Tok represents a word in the tree, obtained from the word segmentation result, and PoS represents a part-of-speech tag, obtained from the part-of-speech tagging result. The dependency syntax tree includes nodes and edges; one node is a word (Tok) in the sentence. For example, in the dependency syntax tree shown in FIG. 3, the nodes are "enter", "online", "payment" and "system", and the edges represent the dependency relationships of the nodes, where VP denotes a verb phrase, IP a clause phrase, and NP a noun phrase. In the dependency syntax tree shown in FIG. 4, the nodes are "client", "none", "generate", "financial risk", "product", "outside" and "data", where LCP denotes a localizer phrase and DNP denotes an associative ("of") phrase. In the dependency syntax tree shown in FIG. 5, the nodes are "optimization", "objective", "and", "payment" and "mode". In the dependency syntax tree shown in FIG. 6, the nodes are "carry out", "investment", "and", "entry", "and", "account opening", "and", "core" and "application".
And 2d, dividing the testing step according to the dependency syntax tree of the software test text.
In one embodiment, the step of partitioning the test according to the dependency syntax tree of the software test text comprises:
extracting subtrees in the dependency syntax tree of the software test text; searching whether the subtree has the nodes corresponding to the parallel conjunctions according to the labels of the nodes in the subtree; if the subtree has the nodes corresponding to the parallel conjunctions, dividing leaf nodes of the subtree according to the positions of the parallel conjunctions in the subtree, and obtaining a testing step; if the subtree does not have parallel conjunctions, the testing step is obtained according to the leaf nodes of the subtree.
A subtree represents a partial tree structure consisting of a central node and all of its children directly or indirectly dependent on that node. Parallel conjunctions are coordinating words such as "and". Leaf nodes of a subtree are the nodes in the subtree that have no child nodes. If there are parallel conjunctions, different test steps are present, and a more accurate test step can be obtained by performing the division; if there are no parallel conjunctions, no splitting is needed, and a test step is obtained based on the leaf nodes of the subtree.
For example, in the dependency syntax tree of "optimization objective and payment mode", "optimization" is a verb, and "objective" and "payment mode" are the objects of the verb "optimization" that depend on it; the subtree contains the parallel conjunction "and", so "optimization objective and payment mode" is divided into "optimization objective" and "optimization payment mode" based on the position of "and", which can also be understood as being based on the dependency relationships among "and", "optimization" and "payment mode".
The following embodiment provides a more accurate way to search for test steps: a sub-step search function traverses the dependency syntax tree and generates a sequence of sub-steps.
The sub-step search function is implemented as follows: if the subtree has only one leaf node, return that leaf node; if the subtree has more than one leaf node, initialize a subtree partition list and a leaf node partition list, and traverse each node in the subtree. During traversal, if the label of the next node is CC, i.e. a parallel conjunction, merge the contents of the leaf node partition list into a character string, add the string to the subtree partition list, and clear the leaf node partition list. If the label of the next node is not a parallel conjunction, recursively call the sub-step search function on the next node to obtain a leaf node division result, initialize a new leaf node list, traverse the leaf node partition list together with the leaf node division result to generate the new leaf node list, and update the leaf node partition list. After the traversal, add the contents of the leaf node partition list to the subtree partition list and return the subtree partition list to obtain the sub-steps. The sub-steps obtained above are the test steps to be obtained.
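The sketch below is a simplified, non-recursive illustration of the idea of splitting a subtree's leaves at parallel conjunctions and distributing a leading verb over the resulting segments; it does not reproduce the full recursive function described above, and the tree representation is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """A simplified syntax-tree node: `label` is the POS/constituent tag,
    `word` is set for leaves, `children` for internal nodes."""
    label: str
    word: str = ""
    children: List["Node"] = field(default_factory=list)

def leaves(node: Node) -> List[Node]:
    return [node] if not node.children else [l for c in node.children for l in leaves(c)]

def split_sub_steps(subtree: Node) -> List[str]:
    """Split the leaf sequence at parallel conjunctions (CC) and prepend a leading verb
    to every segment, so 'optimization objective and payment mode' yields two sub-steps."""
    toks = leaves(subtree)
    if len(toks) <= 1:
        return [toks[0].word] if toks else []
    prefix = toks[0].word if toks[0].label == "VV" else ""
    rest = toks[1:] if prefix else toks
    segments, current = [], []
    for tok in rest:
        if tok.label == "CC":            # a parallel conjunction closes the current segment
            segments.append("".join(current))
            current = []
        else:
            current.append(tok.word)
    segments.append("".join(current))
    return [prefix + seg for seg in segments] if len(segments) > 1 \
        else ["".join(t.word for t in toks)]

# Toy example mirroring "优化 目标 和 支付方式" (optimization objective and payment mode).
tree = Node("VP", children=[Node("VV", "优化"), Node("NN", "目标"),
                            Node("CC", "和"), Node("NN", "支付方式")])
print(split_sub_steps(tree))   # ['优化目标', '优化支付方式']
```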
And 2e, forming a list of the obtained testing steps according to the sequence of the steps so as to facilitate the subsequent processing.
For example, the list obtained is: ['enter the online payment system', 'do not generate data other than financial risk products', 'optimization objective', 'optimization payment mode', 'investment', 'entry', 'account opening', 'core application'].
By adopting the mode, the testing steps in the software testing text can be divided, so that the knowledge graph of the software testing is conveniently constructed.
In one embodiment, the step 3 mainly includes steps 3a to 3d.
And 3a, designing specific semantic roles according to the test text characteristics.
The designed semantic roles are exemplified as follows: the semantic role PREP corresponds to the attribute "test action", with corresponding text such as "log in", "create", "execute", "optimize"; the semantic role ARG0 corresponds to the attribute "test entity", with corresponding text such as "management console", "login user"; the semantic role ARG1 corresponds to the attribute "test entity", with corresponding text such as "enterprise name", "mobile phone number", "cooperative enterprise project"; the semantic role ARGM-ADV corresponds to the attribute "state of the test action", with corresponding text such as "success", "first", "all", "leading", "none"; the semantic role ARGM-MNR corresponds to the attribute "manner in which the test action occurs", with corresponding text such as "according to the imported enterprise name"; the semantic role ARGM-TMP corresponds to the attribute "time at which the test action occurs", with corresponding text such as "after test 1 is executed"; the semantic role ARGM-LOC corresponds to the attribute "location where the test action occurs", with corresponding text such as "in the system on the right".
And 3b, marking the semantic roles of the test steps.
In one embodiment, annotating semantic roles of the test steps includes: according to the characteristics of the software test text, generating a reference semantic role of the software test and a reference attribute corresponding to the reference semantic role; the reference attributes include attributes of the test actions and attributes of the test entities; and marking the semantic roles of the test steps according to the reference semantic roles and the reference attributes, and obtaining the semantic roles corresponding to the test steps.
The characteristics of the software test text refer to preset characteristics of text in the software testing field. For example, such text typically contains actions, entities, conditions and expected results, where actions include various verbs, entities include various nouns, conditions include the constraints under which a test is executed, and expected results include the results the test is expected to produce. The reference semantic roles possessed by software testing are generated by combining these characteristics; the reference semantic roles are thus the semantic roles generated for the software testing field. The reference attributes are the attributes of semantic roles in the software testing field, such as the test action, the test entity, the state of the test action, the manner in which the test action occurs, and the like.
By marking the semantic roles of the test steps in the mode, the semantic roles of the test steps in the field of software testing can be obtained, and the constructed knowledge graph can accurately show the relation among elements in the software testing, so that the accuracy of the knowledge graph is improved.
In one embodiment, each step in the step sequence may be annotated with semantic roles. One implementation is as follows: load the 'srl' model of HanLP and perform the semantic role labeling task. The text of each test step in the step sequence is input into the srl model, which labels the text with semantic roles; the labels are then replaced with the designed semantic roles according to the rules, and a semantic role list corresponding to each step is returned.
For example, for the test step "enter online payment system", its semantic role list is: [ [ "enter", "PRED",0,1], [ "online payment system", "ARG1",1,4] ]; for the test step "optimize pay mode", the semantic role list is: [ ('optimized', 'PRED',0, 1), ('pay mode', 'ARG1',1, 3) ].
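For illustration, this labeling step can be sketched with the open-source HanLP Python package as follows; the specific pretrained multi-task model, the structure of its 'srl' output and the role mapping table are assumptions of this sketch.

```python
import hanlp

# Load a multi-task HanLP model that includes a semantic role labeling ('srl') task;
# the exact pretrained model constant is an assumption of this sketch.
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH)

# Hypothetical mapping from raw SRL labels to the designed semantic roles.
ROLE_MAP = {"PRED": "PRED", "ARG0": "ARG0", "ARG1": "ARG1",
            "ARGM-ADV": "ARGM-ADV", "ARGM-MNR": "ARGM-MNR",
            "ARGM-TMP": "ARGM-TMP", "ARGM-LOC": "ARGM-LOC"}

def label_step(step_text: str):
    """Return a [text, role, start, end] list for one test step."""
    doc = HanLP(step_text, tasks="srl")
    roles = []
    # Output format assumed: one list of (text, label, begin, end) spans per predicate.
    for frame in doc["srl"]:
        for text, label, start, end in frame:
            roles.append([text, ROLE_MAP.get(label, label), start, end])
    return roles

print(label_step("进入在线支付系统"))   # e.g. [['进入', 'PRED', 0, 1], ['在线支付系统', 'ARG1', 1, 4]]
```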
And 3c, defining a test entity, a test action and a test behavior.
Defining a test entity includes a plurality of attributes: the entity name, the entity category, other test entities equivalent to the test entity, and the like. For example, "user interface-generic entity-["interface", "window"]" is a test entity. The entity category attribute can further be used to embed a target entity category, and together with the attribute "other test entities equivalent to the test entity" it is used to evaluate the similarity between test entities and to align them.
The test action includes a plurality of attributes: the name of the test action, the state of the test action, the manner in which the test action occurs, the time at which the test action occurs, the position at which the test action occurs, a list of other test actions equivalent to the test action, and a list of other test actions opposite to the test action. For example, "check-success-according to the imported user ID-after entering a valid password-password entry box-["check result"]-["check failure", "reject check"]" is a test action. The two attributes "list of other test actions equivalent to the test action" and "list of other test actions opposite to the test action" may be used to calculate the similarity between test actions and to align the test actions based on that similarity.
The test behavior includes at least one attribute: the test behavior name (denoted as "test action::test entity"). For example, "query: running water status" is a test behavior. The test behavior is a key element that constitutes the graph.
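The three definitions above can be represented, for example, by simple data classes; the field names below paraphrase the attributes listed in the text and are not a fixed schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TestEntity:
    name: str                                             # entity name
    category: str = "n/a"                                 # entity category, e.g. "generic entity"
    equivalents: List[str] = field(default_factory=list)  # other entities equivalent to this one

@dataclass
class TestAction:
    name: str                                             # action name
    state: str = "n/a"                                    # state of the action, e.g. "success"
    manner: str = "n/a"                                   # manner in which the action occurs
    time: str = "n/a"                                     # time at which the action occurs
    location: str = "n/a"                                 # position at which the action occurs
    equivalents: List[str] = field(default_factory=list)  # equivalent actions
    opposites: List[str] = field(default_factory=list)    # opposite actions

@dataclass
class TestBehavior:
    action: TestAction
    entity: TestEntity

    @property
    def name(self) -> str:
        # The text denotes the behavior name as "test action::test entity".
        return f"{self.action.name}::{self.entity.name}"

# e.g. TestBehavior(TestAction("query"), TestEntity("running water status")).name
```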
Step 3d: identify or extract the test action and the test entity in the test step according to the semantic roles, and combine the test action and the test entity to obtain the test behavior, which serves as a key element of the graph.
Traverse the semantic role list generated in the previous step and, combining the defined semantic roles, extract test entities, test actions and test behaviors according to the rules. For attributes for which no relevant information is extracted, "n/a" is used as a filler. For example, for the semantic role list [["enter", "PRED", 0, 1], ["online payment system", "ARG1", 1, 4]], the test entity obtained is "online payment system-n/a-n/a", the test action is "enter-n/a-n/a-n/a-n/a-n/a-n/a", and the test behavior is "enter: online payment system". For the semantic role list [("optimized", "PRED", 0, 1), ("pay mode", "ARG1", 1, 3)], the test entity obtained is "payment mode-n/a-n/a", the test action is "optimization-n/a-n/a-n/a-n/a-n/a-n/a", and the test behavior is "optimization: payment mode".
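A minimal rule-based extraction over the semantic role list might look like the following sketch; the rule table (PRED as the action name, ARG0/ARG1 as the entity name, the ARGM roles as action attributes) follows the design above, missing attributes are filled with "n/a", and the flattened string format mirrors the examples in this step.

```python
def extract_from_roles(role_list):
    """Build the flattened test entity, test action and test behavior strings
    from one [text, role, start, end] semantic role list."""
    fields = {"PRED": "n/a", "ENTITY": "n/a",
              "ARGM-ADV": "n/a", "ARGM-MNR": "n/a", "ARGM-TMP": "n/a", "ARGM-LOC": "n/a"}
    for text, role, _start, _end in role_list:
        if role in ("ARG0", "ARG1"):
            fields["ENTITY"] = text
        elif role in fields:
            fields[role] = text
    entity = f'{fields["ENTITY"]}-n/a-n/a'                       # name-category-equivalents
    action = "-".join([fields["PRED"], fields["ARGM-ADV"], fields["ARGM-MNR"],
                       fields["ARGM-TMP"], fields["ARGM-LOC"], "n/a", "n/a"])
    behavior = f'{fields["PRED"]}: {fields["ENTITY"]}'           # also written "action::entity"
    return entity, action, behavior

print(extract_from_roles([["enter", "PRED", 0, 1], ["online payment system", "ARG1", 1, 4]]))
# ('online payment system-n/a-n/a', 'enter-n/a-n/a-n/a-n/a-n/a-n/a', 'enter: online payment system')
```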
And 4, mainly acquiring a target named entity and a target entity category of the software test text.
In one embodiment, obtaining a target named entity and a target entity class of a software test text includes: acquiring target service data matched with a service scene where a software test text is located, performing feature engineering analysis on the target service data, and extracting a reference entity category; extracting a test case and a reference text in a test vulnerability report, and labeling reference entities in the reference text according to reference entity categories; dividing a reference text into word segments, and adding a start mark and an end mark to a software test text to obtain mark data; training the deep learning model according to the reference entity and the marking data to obtain an entity identification model; and identifying the software test text through the entity identification model to obtain the target named entity and the target entity category.
The business scenario is the scenario of the business to which the software test is actually applied, such as financial banking or electronic commerce. The target service data is the data related to that business scenario. The reference entity categories are the entity categories matched to the business scenario. A particular business scenario has corresponding entities and entity types. For example, in the financial banking field there are various finance-related entities and specific entity types such as institutional organizations, financial products, personnel, bank-specific entities and general entities; in the e-commerce field there are commodity entities, user entities, order entities and payment entities, with specific entity categories such as commodity, brand, product model and order number.
A test case is a normalized description of a set of input values, execution steps, and expected results for verifying the function, performance, or characteristic of a software system or application under specific conditions. A test vulnerability report is a description and record of discovered vulnerabilities or problems. The reference text is text extracted from the test vulnerability report and the test cases, and the reference entity is an entity in the reference text. The start mark is used for indicating the start position of the sentence, and the end mark is used for indicating the end position of the sentence. The addition of the start marker and the end marker can make it easier for the deep learning model to extract the relationships between the structure of the text and the sentences. And generating a pre-training text based on the reference entity and the marking data, training the deep learning model, and taking the deep learning model as an entity recognition model when the training of the deep learning model is completed.
By adopting this mode, the deep learning model can identify the named entity and its category, which are then embedded into the test entity, so that the knowledge graph constructed subsequently is more accurate.
In one embodiment, step 4 includes steps 4a through 4m. Wherein:
step 4a, based on the text and banking content in the banking field, carrying out characteristic engineering analysis, and extracting the entity type to be identified: general entities, bank specific entities, personnel, products, and institutional organizations.
Wherein, general entities such as pages, bank specific entities such as XX advertising platforms, personnel such as clients, managers, products such as financial risks, and organizations such as XX banks.
And 4b, labeling the entity in the vulnerability report and the test case.
The open-source annotation tool slave can be used for data annotation, which improves annotation efficiency and accuracy. The specific steps are as follows: cut the text into sentence granularity to generate a TXT-format document to be annotated. For example, "the system should detect an invalid payment amount and give a corresponding error indication; the system should not pay" is cut into "the system should detect an invalid payment amount and give a corresponding error indication" and "the system should not pay". Then run the slave tool in the computer terminal and, for each input sentence, manually identify the entities in the sentence and mark the corresponding predefined entity types.
In the labeling process, the location and category of each entity are identified within a single sentence. A location such as ((1,0,0), (1,0,3)) means that the entity with start location (0,0) and end location (0,3) in the first sentence is labeled, for example, as a generic entity.
In one embodiment, the labeled sentence is "enter online payment system", and the identified entities are: ((1,0,2), (1,0,7)) - label: general_component (tag: generic entity); ((2,0,0), (2,0,5)) - label: general_component (tag: person).
In one embodiment, the annotated sentence is "the customer does not generate data other than financial risk products"; the identified entities are: ((2,0,6), (2,0,8)) - label: product; ((2,0,14), (2,0,15)) - label: general_component (tag: generic entity).
And 4c, generating a pre-training text according to the marked sentences and the positions and the categories of the identified entities.
The format of the generated pre-training text is {"text": "...", "tag": {"specific tag": {"entity text": [[start location, end location]]}}}.
For example, for the labeled sentence "enter online payment system" with the identified entity at ((1,0,2), (1,0,7)) - tag: generic entity, the generated pre-training text is: {"text": "enter online payment system", "tag": {"generic entity": {"online payment system": [[2, 7]]}}}. For the labeled sentence "customer does not generate data other than financial risk products" with identified entities ((2,0,0), (2,0,1)) - tag: personnel, ((2,0,6), (2,0,8)) - tag: product, and ((2,0,14), (2,0,15)) - tag: generic entity, the generated pre-training text is: {"text": "customer does not generate data other than financial risk products", "tag": {"personnel": {"customer": [[0, 1]]}, "product": {"financial risk": [[6, 8]]}, "generic entity": {"data": [[14, 15]]}}}.
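A small sketch of this conversion is shown below; the tuple layout (sentence index, 0, character offset) and the inclusive end offset are assumptions read off the annotation examples above.

```python
import json

def to_pretraining_record(text, annotations):
    """annotations: list of ((sent, 0, start), (sent, 0, end), tag) tuples for one sentence."""
    tag_dict = {}
    for (_, _, start), (_, _, end), tag in annotations:
        span_text = text[start:end + 1]          # inclusive end offset, as in the examples
        tag_dict.setdefault(tag, {}).setdefault(span_text, []).append([start, end])
    return {"text": text, "tag": tag_dict}

record = to_pretraining_record(
    "进入在线支付系统",
    [((1, 0, 2), (1, 0, 7), "generic entity")])
print(json.dumps(record, ensure_ascii=False))
# {"text": "进入在线支付系统", "tag": {"generic entity": {"在线支付系统": [[2, 7]]}}}
```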
And 4d, dividing the text into word fragments, and adding a start mark at the beginning of the text and an end mark at the end of the text.
The "[ CLS ]" tag may be used as a start tag of a sentence and the "[ SEP ]" tag may be used as an end tag of a sentence. These special tags facilitate the model in extracting the relationships between the structure of the text and the sentences.
For example, the text is "enter online payment system", and [ "[ CLS ]", "enter", "online", "payment", "system", "[ SEP ]" ] is obtained by dividing the text into word segments and adding a start tag and an end tag. The text is: "the customer does not generate data other than financial risk products". The [ "[ CLS ]", "client", "none", "generate", "financial", "risk", "product", "outside", "data", "[ SEP ]" ] is obtained by dividing the text into word segments and adding a start tag and an end tag.
And 4e, establishing a mapping relation from the words to the indexes.
In one embodiment, word-to-index mappings are established using a dictionary pre-trained by the Chinese-RoBERTa-WWM-Ext model. Taking "online payment system" as an example, assume that the index of "online" in the dictionary is 101, the index of "payment" is 102, and the index of "system" is 103. After each word is converted to its corresponding index, an index sequence [101, 102, 103] can be obtained.
And 4f, replacing each index with a corresponding word vector to form a vector sequence with a fixed length, and converting the vector sequence into an input feature matrix suitable for the RoBERTa model.
The input feature matrix is the word vector sequence [[word vector_1, word vector_2, word vector_3, ...]].
For example, for the tagged text ["[CLS]", "enter", "online", "pay", "system", "[SEP]"], indexing each token and replacing the index with its corresponding word vector yields the sequence [[CLS] vector, word vector_1, word vector_2, ..., word vector_n, [SEP] vector]. The tagged text ["[CLS]", "client", "no", "generate", "financial", "risk", "product", "outside", "data", "[SEP]"] is processed in the same way.
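As an illustrative sketch of steps 4d to 4f, the Hugging Face tokenizer for the hfl/chinese-roberta-wwm-ext checkpoint performs the character segmentation, the addition of the [CLS]/[SEP] marks, the word-to-index mapping, and the lookup of the word vectors that form the input feature matrix; the checkpoint name and the use of the transformers library are assumptions of this sketch.

```python
import torch
from transformers import BertTokenizerFast, BertModel

# chinese-roberta-wwm-ext reuses the BERT tokenizer and vocabulary (size 21128).
tokenizer = BertTokenizerFast.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = BertModel.from_pretrained("hfl/chinese-roberta-wwm-ext")

text = "进入在线支付系统"                        # "enter online payment system"
encoded = tokenizer(text, return_tensors="pt")   # adds [CLS]/[SEP] and maps characters to indexes
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0]))
# ['[CLS]', '进', '入', '在', '线', '支', '付', '系', '统', '[SEP]']

with torch.no_grad():
    features = model(**encoded).last_hidden_state   # [1, sequence length, 768] feature matrix
print(features.shape)
```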
And 4g, building a named entity recognition model, loading the pre-training model as a basic model, and adjusting parameters according to the data and the characteristics of the named entity recognition task.
In an embodiment, the named entity recognition model can be built with PyTorch, loading the Chinese-RoBERTa-WWM-Ext pre-trained model as the base model, and the parameters adjusted according to the data and the characteristics of the named entity recognition task are as follows: dropout probability of the attention mechanism: 0.1; directionality: bidirectional; hidden-layer activation function: GELU; dropout in the hidden layers with removal probability 0.1; number of hidden-layer neurons: 768; initializer range: 0.02; number of intermediate-layer neurons: 3072; maximum number of position embeddings: 512; number of attention heads: 12; number of hidden encoder layers: 12; number of tags: 22; output attention weights: false; output hidden states: false; output past states: true; pooler fully connected layer size: 768; number of pooler attention heads: 12; number of pooler fully connected layers: 3; size of each pooler attention head: 128; pooler type: first-token transform; type vocabulary size: 2; vocabulary size: 21128.
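Expressed with the Hugging Face transformers library, the parameters listed above map roughly onto the following configuration sketch; the checkpoint name and the use of BertForTokenClassification are assumptions, and parameters without a direct transformers equivalent (for example the pooler settings) are omitted.

```python
from transformers import BertConfig, BertForTokenClassification

config = BertConfig(
    vocab_size=21128,                    # vocabulary size
    type_vocab_size=2,                   # type vocabulary size
    hidden_size=768,                     # hidden-layer neurons
    num_hidden_layers=12,                # hidden encoder layers
    num_attention_heads=12,              # attention heads
    intermediate_size=3072,              # intermediate-layer neurons
    hidden_act="gelu",                   # hidden-layer activation function
    hidden_dropout_prob=0.1,             # dropout in the hidden layers
    attention_probs_dropout_prob=0.1,    # dropout of the attention mechanism
    max_position_embeddings=512,         # maximum number of position embeddings
    initializer_range=0.02,              # initializer range
    num_labels=22,                       # number of tags
)

# Load the pre-trained weights as the base model for named entity recognition;
# the token-classification head is newly initialized.
model = BertForTokenClassification.from_pretrained(
    "hfl/chinese-roberta-wwm-ext", config=config)
```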
In the above implementation, the BERT-Base-Chinese model is not used; the Chinese-RoBERTa-WWM-Ext model is used instead. In the BERT-Base-Chinese model, Chinese is segmented at character granularity, and the mask task in the BERT pre-training process also operates at character granularity. Chinese-RoBERTa-WWM-Ext fully takes the Chinese word segmentation of traditional natural language processing into account and performs the mask operation at word granularity (whole word masking). Analysis of the evaluation data of the named entity recognition model shows excellent accuracy, recall and F1 values across the different label categories of general entities, bank-specific entities, personnel, products and institutional organizations.
In an embodiment, named entity recognition can alternatively be performed with a large language model in a few-shot or zero-shot setting (few-shot or zero-shot LLM), i.e. low-resource modeling with few training samples, which alleviates the data scarcity problem of building a natural language processing system when sufficient labeled data is not available.
And 4h, adding a conditional random field layer on the pre-training model to obtain the pre-training model with the conditional random field layer.
The conditional random field layer can utilize a conditional random field model to model the dependency relationship among the labels, so that the label prediction accuracy is improved. In addition, a loss function is defined as a cross entropy loss function, and the difference between the model output and the real label is calculated.
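For illustration, a conditional random field layer can be added on top of the pre-trained encoder as sketched below; the use of the third-party pytorch-crf package, the encoder checkpoint name and the tag count of 22 are assumptions of this sketch (the text above alternatively describes a cross-entropy loss on the emissions).

```python
import torch.nn as nn
from torchcrf import CRF                 # pip install pytorch-crf
from transformers import BertModel

class RobertaCrfTagger(nn.Module):
    """Pre-trained encoder + linear emission layer + CRF decoding layer (sketch)."""
    def __init__(self, encoder_name="hfl/chinese-roberta-wwm-ext", num_tags=22):
        super().__init__()
        self.encoder = BertModel.from_pretrained(encoder_name)
        self.emission = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.emission(hidden)
        mask = attention_mask.bool()
        if labels is not None:                        # training: negative log-likelihood loss
            return -self.crf(emissions, labels, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # inference: best tag sequence per sentence
```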
And 4i, copying the training data for a plurality of times, and carrying out different full word mask operations on each word segment according to a certain probability by each copy.
The number of copies is, for example, 10, the probability is, for example, 15%, and the whole word masking operation is, for example, replacing the word with the tag "[MASK]", which is equivalent to applying 10 static masks to the original data set.
For example, if the training data includes ["[CLS]", "entering", "online", "payment", "system", "[SEP]"], then after the above operation, mask 1 may be: ["[CLS]", "entering", "[MASK]", "payment", "system", "[SEP]"]; mask 2: ["[CLS]", "[MASK]", "online", "payment", "system", "[SEP]"]; ...; mask 10: ["[CLS]", "entering", "online", "payment", "[MASK]", "[SEP]"].
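A simplified sketch of this copying step is shown below; it masks individual tokens at random with the stated probability and leaves the [CLS]/[SEP] marks untouched, which approximates rather than exactly reproduces whole-word masking.

```python
import random

def make_static_masks(tokens, num_copies=10, mask_prob=0.15, seed=0):
    """Create num_copies masked variants of a token list, leaving [CLS]/[SEP] untouched."""
    rng = random.Random(seed)
    copies = []
    for _ in range(num_copies):
        masked = [tok if tok in ("[CLS]", "[SEP]") or rng.random() > mask_prob else "[MASK]"
                  for tok in tokens]
        copies.append(masked)
    return copies

tokens = ["[CLS]", "进", "入", "在", "线", "支", "付", "系", "统", "[SEP]"]
for copy in make_static_masks(tokens, num_copies=3):
    print(copy)
```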
And 4j, training the RoBERTa model by using the marked data according to the specific requirements and data characteristics of the test task, and adjusting model training parameters.
For example, the parameters of the model are optimized by minimizing the loss function. Some important parameters are exemplified as follows: epsilon parameter of the Adam optimizer: 1e-08; maximum sequence length in the evaluation phase: 512; learning rate of the conditional random field layer: 5e-05; number of epochs: 14; total number of optimization steps: 924.
and 4k, generating a prediction result output by the model according to the specification of named entity identification and a marking rule BIOS, and identifying the starting position and the category of the named entity.
In the marking rule BIOS, "B" represents the starting position of an entity, appearing on the first word of the entity; "I" means the internal location of an entity, appearing on a non-first word of the entity; "O" means a non-physical word; "S" means an individual entity, i.e., an entity consisting of only one word.
For example, the software test text to be identified is "entering an online payment system", and the starting position and the category of the identified named entity are "OOB-general entity I-general entity"; the software test text to be identified is that 'the client does not generate data except financial risk products', and 'B-personnel I-personnel OOOOB-product I-product OOOOOB-general entity I-general entity'.
And 4l, decoding and post-processing are carried out, and the prediction result is converted into a final entity identification result.
For the software test text "enter online payment system", the entity recognition result is {"id": 1, "tag": {"generic entity": {"online payment system": [[2, 7]]}}}. For the software test text "the customer does not generate data other than financial risk products", the entity recognition result is {"id": 1, "tag": {"personnel": {"customer": [[0, 1]]}, "product": {"financial risk": [[6, 8]]}, "generic entity": {"data": [[14, 15]]}}}.
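Decoding can be sketched as a single pass over the predicted tag sequence that closes an entity span whenever a B-/I- run ends; the "B-category"/"I-category"/"O"/"S-category" tag naming is an assumption based on the BIOS description above.

```python
def decode_bios(text, tags):
    """Convert per-character BIOS tags into {category: {span_text: [[start, end]]}}."""
    result = {}
    start, category = None, None

    def close(end):
        if category is not None:
            span = text[start:end + 1]
            result.setdefault(category, {}).setdefault(span, []).append([start, end])

    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            close(i - 1); start, category = i, tag[2:]
        elif tag.startswith("S-"):                    # single-character entity
            close(i - 1); start, category = i, tag[2:]
            close(i); start, category = None, None
        elif tag == "O" or not tag.startswith("I-"):  # outside a span: close any open entity
            close(i - 1); start, category = None, None
    close(len(tags) - 1)
    return result

text = "进入在线支付系统"
tags = ["O", "O", "B-generic entity"] + ["I-generic entity"] * 5
print({"id": 1, "tag": decode_bios(text, tags)})
# {'id': 1, 'tag': {'generic entity': {'在线支付系统': [[2, 7]]}}}
```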
And 4m, performing entity recognition on the software test text which does not participate in training by using the trained model.
This step further comprises: data format conversion, creation of the test data set and loader, and decoding and post-processing of the prediction results. For example, for the software test text "customer does not purchase financial risk", the entity recognition result is: [{"tag": {"person": {"customer": [[0, 1]]}, "product": {"financial risk": [[5, 7]]}}, "text": "customer does not purchase financial risk"}].
By adopting this mode, a deep-learning-based method can automatically learn the relationships between entities and the knowledge representation from a large amount of data, avoiding the shortcomings of manual definition and rule-based reasoning as well as rule overlap and conflict. For example, the start and end positions of a term can be judged automatically and segmented accurately. The method is also applicable to texts in different languages, which improves the maintainability and upgradeability of the knowledge graph.
In one embodiment, step 5 includes steps 5a through 5f.
And 5a, loading a word vector model.
An already trained Word vector model, such as the Word2Vec or GloVe model, is selected and loaded. And acquiring the vocabulary and the word vectors corresponding to the vocabulary according to the loaded word vector model.
And 5b, aiming at the input test entity, acquiring word vectors corresponding to the test entity according to a word vector model, calculating the similarity between the word vectors, determining similar words by using a predefined similarity threshold, and constructing an equivalent word dictionary of the test entity.
Specifically, if the similarity is greater than a predefined similarity threshold, a word vector greater than the threshold is determined to be a similar word, and an equivalent word dictionary for the test entity is constructed from the similar word.
And 5c, aiming at the input test entity, acquiring word vectors corresponding to the test entity according to a word vector model, calculating the similarity between the word vectors, selecting the vocabulary with the lowest similarity as opposite words, and constructing an opposite word dictionary of the test entity according to the opposite words.
Step 5d, aiming at the input test action, obtaining word vectors corresponding to the test action according to a word vector model, calculating the similarity between the word vectors, determining similar words by using a predefined similarity threshold, and constructing an equivalent word dictionary of the test action; and selecting the vocabulary with the lowest similarity as the opposite words, and constructing an opposite word dictionary of the test action according to the opposite words.
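Steps 5a to 5d can be sketched, for example, with the gensim library as follows; the model file name, the candidate vocabulary, the 0.8 similarity threshold and the choice of the single lowest-similarity word as the opposite word are illustrative assumptions.

```python
from gensim.models import KeyedVectors

# Load an already trained word vector model (path and format are assumptions of this sketch).
wv = KeyedVectors.load_word2vec_format("word_vectors.bin", binary=True)

def build_dictionaries(term, candidates, threshold=0.8):
    """Return (equivalent_words, opposite_words) for one test entity or test action name."""
    scored = [(c, wv.similarity(term, c)) for c in candidates
              if term in wv and c in wv]
    equivalents = [c for c, score in scored if score > threshold]   # similar words
    opposites = []
    if scored:
        opposites = [min(scored, key=lambda x: x[1])[0]]            # lowest-similarity word
    return equivalents, opposites

# e.g. build_dictionaries("优化", ["改进", "提升", "恶化", "取消"])
```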
And 5e, embedding the equivalent word dictionary corresponding to the test entity, the equivalent word dictionary corresponding to the opposite word dictionary and the equivalent word dictionary corresponding to the test action into the test entity and the test action generated before, and obtaining the target test entity and the target test action.
For example, for the test entity "online payment system-n/a-n/a", embedding the equivalent word dictionary gives the test entity "online payment system-n/a-["electronic payment system"]"; for the test action "enter-n/a-n/a-n/a-n/a-n/a-n/a", embedding the equivalent word dictionary and the opposite word dictionary gives "enter-n/a-n/a-n/a-n/a-["log in"]-["log out"]". For the test entity "payment mode-n/a-n/a", embedding the equivalent word dictionary gives the test entity "payment mode-n/a-["payment means", "payment pattern"]"; for the test action "optimization-n/a-n/a-n/a-n/a-n/a-n/a", embedding the equivalent word dictionary and the opposite word dictionary gives the test action "optimization-n/a-n/a-n/a-n/a-["improve", "promote"]-["worsen", "degenerate", "cancel optimization"]".
And 5f, associating the test entity with the existing named entity and entity category.
The association mode is as follows: the existing named entities and entity categories are embedded into the test entity to further improve the accuracy and consistency of the entity.
For example, for the test entity "online payment system-n/a-["electronic payment system"]", embedding the named entity and entity category gives the test entity "online payment system-generic entity-["electronic payment system"]"; for the test entity "payment mode-n/a-["payment means", "payment pattern"]", embedding the named entity and entity category gives the test entity "payment mode-generic entity-["payment means", "payment pattern"]". If the test entity is the same as the named entity, only one is retained.
With this implementation, the names of the test entities and test actions are input into the pre-trained word vector model to generate the dictionaries of equivalent and opposite words, which further supports the alignment of test entities and test actions and improves alignment accuracy. On the basis of the rule-based knowledge graph construction method, a named entity recognition model based on deep learning and a word embedding model based on pre-trained word vectors are established to support the embedding and alignment of the graph. The advantages of natural language processing and deep learning models can thus be fully utilized, and the software testing knowledge graph can be constructed in a more accurate and comprehensive manner.
In an embodiment, step 6 corresponds to step S400 in the above embodiment, that is, it includes: generating nodes of the knowledge graph according to the target test entity, the test action and the test behavior, and constructing the knowledge graph of the software test according to the nodes of the knowledge graph.
In an embodiment, constructing a knowledge graph of the software test from the nodes of the knowledge graph includes: extracting test entity nodes, test action nodes and test behavior nodes from the nodes of the knowledge graph; respectively evaluating the entity similarity between test entity nodes and the action similarity between test action nodes; calculating the behavior similarity between test behavior nodes according to the entity similarity and the action similarity; identifying the matched nodes among the test entity nodes, test action nodes and test behavior nodes according to the entity similarity, the action similarity and the behavior similarity; and aligning the matched nodes, and constructing the knowledge graph of the software test from the aligned nodes obtained by the alignment.
In the process of constructing the knowledge graph, the test entity nodes, test action nodes and test behavior nodes can be aligned in the above manner; the alignment process mainly matches and connects different nodes and relations.
The entity similarity refers to the similarity between the test entity nodes, such as the similarity between some two test entity nodes, and the action similarity refers to the similarity between the test action nodes, such as the similarity between some two test action nodes. When evaluating entity similarity or action similarity, the similarity between two nodes can be calculated through the feature vectors or the attributes of the two nodes. Such as by cosine similarity, euclidean distance, etc.
In an embodiment, in order to evaluate the behavior similarity between two test behavior nodes, the action similarity and the entity similarity between the two test behavior nodes may be obtained, and the behavior similarity may be calculated according to the action similarity and the entity similarity. The calculation may be performed in a linear weighted manner or may be performed in other manners. By adopting the method, the similarity of different test behavior nodes can be accurately estimated, so that the similar test behavior nodes can be aligned later.
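For illustration, the linear weighting and the matching decision might look like the following; the equal weights and the 0.9 threshold are assumptions of the sketch.

```python
def behavior_similarity(action_sim: float, entity_sim: float,
                        w_action: float = 0.5, w_entity: float = 0.5) -> float:
    """Behavior similarity as a linear weighting of action and entity similarity."""
    return w_action * action_sim + w_entity * entity_sim

def behaviors_match(action_sim: float, entity_sim: float, threshold: float = 0.9) -> bool:
    """Two test behavior nodes match when the weighted similarity exceeds the threshold."""
    return behavior_similarity(action_sim, entity_sim) > threshold
```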
In an embodiment, identifying the matched nodes among the test entity nodes, test action nodes and test behavior nodes according to the entity similarity, the action similarity and the behavior similarity includes: if the entity similarity is greater than the entity similarity threshold, acquiring the test entity nodes corresponding to that entity similarity and configuring them as matched nodes; if the action similarity is greater than the action similarity threshold, acquiring the test action nodes corresponding to that action similarity and configuring them as matched nodes; and if the behavior similarity is greater than the behavior similarity threshold, configuring the test behavior nodes corresponding to that behavior similarity as matched nodes. The entity similarity threshold, the action similarity threshold and the behavior similarity threshold may be the same value, for example 90%, or may be different values.
In addition, for the test entity nodes corresponding to the entity similarity less than or equal to the entity similarity threshold, the nodes are not matched; for the test action nodes corresponding to the action similarity less than or equal to the action similarity threshold, the test action nodes are not matched; for test behavior nodes corresponding to the behavior similarity less than or equal to the behavior similarity threshold, the nodes are not matched.
By this method, the matched test action nodes, matched test entity nodes and matched test behavior nodes can be obtained and then aligned, which improves the accuracy of the knowledge graph.
In an embodiment, aligning the matched nodes, and constructing a knowledge graph of a software test according to the aligned nodes obtained by alignment, including: mapping the matched nodes to obtain aligned nodes, and establishing a connection relationship between the aligned nodes to construct a knowledge graph of the software test.
In one embodiment, evaluating the entity similarity between the test entity nodes and the action similarity between the test action nodes, respectively, includes: respectively taking the test entity node and the test action node as nodes to be evaluated; acquiring equivalent word dictionary attributes of the nodes to be evaluated and opposite word dictionary attributes of the nodes to be evaluated; converting the equivalent word dictionary attribute into an equivalent word vector, and converting the opposite word dictionary attribute into an opposite word vector; according to the equivalent word vector and the opposite word vector, calculating cosine similarity between the nodes to be evaluated; and determining the entity similarity between the test entity nodes and the action similarity between the test action nodes according to the cosine similarity between the nodes to be evaluated.
The node to be evaluated is the node needing to evaluate the similarity. And respectively determining the entity similarity between the test entity nodes and the action similarity between the test action nodes by respectively taking the test entity nodes and the test action nodes as nodes to be evaluated.
The equivalent word dictionary attribute of the node to be evaluated comprises an equivalent word dictionary of the test entity or an equivalent word dictionary of the test action. The equivalent word dictionary of the test entity refers to a dictionary constructed from similar words determined from the similarity to the word vector of the test entity. The equivalent word dictionary of the test action refers to a dictionary constructed from similar words determined from the similarity to the word vector of the test action. In the case of higher similarity, the corresponding vocabulary is indicated as a similar word.
The opposite word dictionary attribute of the node under evaluation includes an opposite word dictionary of the test entity or an opposite word dictionary of the test action. The opposite word dictionary of the test entity is a dictionary constructed from opposite words determined from similarity to the word vector of the test entity. The reverse word dictionary of the test action is a dictionary constructed from reverse words determined from the similarity to the word vector of the test action. In the case of a low similarity, the corresponding word is indicated as the opposite word.
In an embodiment, the cosine similarity between the nodes to be evaluated may be directly determined as the entity similarity between test entity nodes or the action similarity between test action nodes, depending on whether the nodes to be evaluated are test entity nodes or test action nodes.
By adopting this mode, the equivalent word dictionary and the opposite word dictionary increase the diversity of the test entities and test actions, so the alignment effect is better and the accuracy of the constructed knowledge graph is improved.
In an embodiment, determining the entity similarity between the test entity nodes and the action similarity between the test action nodes according to the cosine similarity between the nodes to be evaluated includes: acquiring name attributes and category attributes of a plurality of nodes to be evaluated; according to the editing operation times between the character strings corresponding to the name attributes, evaluating the name similarity between the name attributes; determining the category similarity between the category attributes according to whether the category attributes are the same attributes; according to the graph structure similarity of the subgraph formed by each attribute of the node to be evaluated in the knowledge graph, linear weighting is carried out to obtain the subgraph similarity; linearly weighting cosine similarity, name similarity, category similarity and sub-graph similarity to obtain similarity among nodes to be evaluated; and determining the entity similarity between the test entity nodes and the action similarity between the test action nodes according to the similarity between the nodes to be evaluated.
If the category attributes are the same attribute, the attribute similarity is a first preset value, and if the category attributes are not the same attribute, the attribute similarity is a second preset value, and the first preset value is larger than the second preset value. The first preset value is for example 1 and the second preset value is for example 0.
The attributes of the node to be evaluated include all of its attributes, such as the name and category. The sub-graph similarity can be calculated using the Jaccard similarity.
In the implementation manner, the similarity among the nodes is obtained more accurately by combining the name similarity, the attribute similarity and the sub-graph similarity on the basis of the similarity among the nodes estimated based on the equivalent words and the opposite words, so that the nodes are aligned more accurately, and a more accurate knowledge graph is further obtained.
In an embodiment, aligning the matched nodes, and constructing a knowledge graph of a software test according to the aligned nodes obtained by alignment, including: combining the attributes of the matched nodes into the nodes of the knowledge graph, and connecting the matched nodes with the nodes of the knowledge graph to obtain aligned nodes; respectively representing the test cases, the test loopholes and the test requirements as reference test nodes of the knowledge graph; establishing an edge of the knowledge graph according to the semantic relation between the reference test node and the alignment node; and constructing a knowledge graph of the software test according to the reference test node, the alignment node and the edges of the knowledge graph.
The attributes of the matched node are merged into the attributes of the corresponding original node in the knowledge graph, so that the attributes of the original node also include the attributes of the matched node. On this basis, the matched node is connected to the corresponding original node in the knowledge graph, so as to avoid repeated alignment or misoperation. The semantic relationships between the reference test nodes and the alignment nodes include, for example, a test case involving a test entity, a test vulnerability involving a test action, and so on.
By adopting the method, the matched nodes and the original nodes in the knowledge graph are aligned, and edges of the knowledge graph are established according to the test cases, the test holes and the test requirements serving as reference test nodes, so that the relations among the test actions, the test entities, the test behaviors, the test cases, the test holes and the test requirements can be enriched, and the integrity and the accuracy of the knowledge graph are improved.
In one embodiment, step 6 includes steps 6 a-6 e.
And 6a, establishing nodes of the knowledge graph.
In an embodiment, the knowledge graph may be constructed based on a Neo4j graph, which supports knowledge graph visualization; the test entity, the test action and the test behavior may each be represented as nodes of the Neo4j graph. A graph database other than Neo4j may also be used; this embodiment is not limited in this respect.
And 6b, aligning the test entity and the test action.
The alignment of test entities is taken as an example. The alignment function can be implemented as follows: for the set of test entity nodes {C_1, C_2, ..., C_N} and a selected test entity node C_p, the following processing is performed.

Traverse each test entity node C_k (k ≠ p) in the knowledge graph. For the entity name attribute of C_p and the entity name attribute of C_k, measure their degree of difference by calculating the number of editing operations between the two character strings, and thereby evaluate the similarity between them; the Levenshtein distance algorithm can be used for the concrete implementation.

For the entity category attribute of C_p and the entity category attribute of C_k, if the two attributes are the same, their similarity is set to 1; otherwise it is set to 0.

For the equivalent word dictionary attribute and opposite word dictionary attribute of C_p and the equivalent word dictionary attribute and opposite word dictionary attribute of C_k, convert them into word vectors respectively and calculate the cosine similarity between the vectors.

Calculate the graph structure similarity of the subgraphs formed in the knowledge graph by the attributes of C_p and C_k, combining them by linear weighting, for example sim = Σ_i w_i · sim_i, where w_i denotes the weight of the i-th attribute.

Linearly weight the similarities of the above four dimensions to obtain the similarity between C_p and C_k. When this similarity is greater than a set threshold, for example 90%, the two test entity nodes are determined to match. The attribute information of the matched node is merged into the attributes of the original node, a connection relation is created in the knowledge graph, and the matched node is connected to the original node to avoid repeated alignment or misoperation.
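The four-dimensional similarity and the matching decision of step 6b can be sketched as follows; the equal weights, the 0.9 threshold, the dictionary-embedding function and the field names of the node dictionaries are assumptions of this sketch.

```python
import numpy as np

def levenshtein(a: str, b: str) -> int:
    """Edit distance (number of editing operations) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def name_similarity(name_p: str, name_k: str) -> float:
    longest = max(len(name_p), len(name_k)) or 1
    return 1.0 - levenshtein(name_p, name_k) / longest

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def jaccard(sub_p: set, sub_k: set) -> float:
    return len(sub_p & sub_k) / len(sub_p | sub_k) if sub_p | sub_k else 0.0

def entity_similarity(cp, ck, embed, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """cp/ck: dicts with 'name', 'category', 'dictionary' (word list), 'subgraph' (attribute set);
    embed: function mapping a word list to a vector. All field names are illustrative."""
    sims = (
        name_similarity(cp["name"], ck["name"]),                   # edit-distance name similarity
        1.0 if cp["category"] == ck["category"] else 0.0,          # category similarity
        cosine(embed(cp["dictionary"]), embed(ck["dictionary"])),  # dictionary word-vector similarity
        jaccard(cp["subgraph"], ck["subgraph"]),                   # subgraph structure similarity
    )
    return sum(w * s for w, s in zip(weights, sims))

def entities_match(cp, ck, embed, threshold: float = 0.9) -> bool:
    return entity_similarity(cp, ck, embed) > threshold
```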
As shown in fig. 7, fig. 7 is a schematic diagram of a subgraph in one embodiment. The figure shows the subgraph corresponding to the test step ['invest and entry and account opening and core application'], in which "carry out: investment", "carry out: entry", "carry out: account opening" and "carry out: verification application" constitute a collection of test behavior nodes.
Step 6c: and extracting a test entity and a test action in the test behaviors, respectively calculating the similarity of the test entity and the similarity of the test action, and performing linear weighting to obtain the similarity of the test behaviors. When the similarity is greater than the set threshold, the two test behaviors are considered to match and align.
Step 6d: and respectively representing the test cases, the test loopholes and the test requirements as nodes of the knowledge graph on the basis of the aligned test entities, test actions and test behaviors.
The knowledge graph is the Neo4j graph. And representing attribute information in the test text, such as preset conditions and expected results, as a subgraph formed by the nodes of the test step, and converting the attribute information into a character string list. The nodes and their attributes are exemplified as follows:
the node name is: the test entity, the corresponding node attribute is: "ID": "1111", "entity name": "user interface", "entity category": "generic entity", "list of other entities equivalent to the entity": [ "interface", "window" ].
The node name is: testing action, wherein the corresponding node attribute is as follows: "ID": "111223", "action name": "check", "status of action": "success", "manner of occurrence of action": "according to the imported user ID", "occurrence time of action": after a valid password is input, the occurrence position of the action is: "password entry box", "other action list equivalent to the action": [ "check result" ], "other action list opposite to the action": [ "check failed", "refusal check" ].
The node name is: test behavior, and the corresponding node attributes are: "entity name": "running water status", "action name": "query", "test behavior name": "query: running water status".
The node name is: test case, and the corresponding node attributes are: "ID": "987654", "name": ["input: payment amount", "verify: payment amount"], "test requirement": ["test: online payment system payment security"], "requirement type": "system requirement", "requirement number": "1.0", "type": "function case", "tag": "payment security", "task priority": "P1-high", "precondition": ["enter: online payment system"], "step": ["select: one test account", "login: test account", "open: payment function", "select: payment mode", "input: one invalid payment amount (e.g., negative or illegal character)", "make: payment operation"], "expected result": ["detect: invalid payment amount", "give: corresponding error prompt", "the online payment system should detect the invalid payment amount and give a corresponding error prompt; the online payment system should not pay"], "test mode": "manual", "creation time": "2022-02-02 15:40".
The node name is: test vulnerability, and the corresponding node attributes are: "ID": "308002", "report content": ["do: pay", "input: pay amount", "verify (not done): input pay amount", "pay (can): any amount"], "affected product": "online payment system", "discovery step": "SIT", "vulnerability type": "function case", "release plan": "2023 Q3", "release version": "1.5.0", "status": "pending", "severity": "high", "solution type": "code repair", "creation time": "2023-05-16 10:00", "repair time": "2023-07-15 09:00", "closing time": "2023-07-20 16:30", "required time": "5 working days", "associated defect ID": "unassociated", "associated defect name": "unassociated".
The node name is: test requirement, and the corresponding node attributes are: "ID": "149000", "requirement name": ["test: online payment system payment security"], "requirement priority": "high", "requirement description": ["ensure: comprehensively test and evaluate the payment system", "process: payment transaction", "ensure: user information security", "ensure: user funds security", "verify: payment amount verification function", "verify: input payment amount", "prevent: paying an arbitrary amount", "detect: abnormal payment amount", "detect: illegal payment amount", "process correctly: abnormal payment amount", "process correctly: illegal payment amount", "test: user identity verification mechanism", "verify correctly: user identity", "prevent: unauthorized payment behavior"].
And 6e, establishing the edges of the knowledge graph, and constructing the knowledge graph according to the nodes and the edges of the knowledge graph.
The edges of the knowledge graph are set according to semantic relationships and corresponding attributes, which are exemplified as follows: the semantic relation is test_requirement, and the corresponding attribute is the corresponding requirement of the test case; the semantic relation is testcase_has_action, and the corresponding attribute is that the test case relates to a test action; the semantic relationship is: testcase_involves_accept, the corresponding attribute is: the test cases relate to test entities; the semantic relationship is: testcase_has_step, the corresponding attribute is: the test case comprises a test step; the semantic relationship is: bug_has_action, the corresponding attribute is: testing vulnerabilities involves testing actions; the semantic relationship is: bug_involves_accept, the corresponding attributes are: the test vulnerability relates to a test entity; the semantic relationship is: bug_has_step, the corresponding attribute is: the test vulnerability comprises a test step; the semantic relationship is: bug_matches_testcase, the corresponding attributes are: the test loopholes correspond to the test cases.
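A minimal sketch of writing such nodes and edges with the official neo4j Python driver is given below; the connection settings, node labels and property names are assumptions, while the relationship names are taken from the list above.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def build_sample_graph(tx):
    # Nodes: a test entity, a test action and the test behavior combining them.
    tx.run("MERGE (e:TestEntity {name: $e_name, category: $category})",
           e_name="online payment system", category="generic entity")
    tx.run("MERGE (a:TestAction {name: $a_name})", a_name="enter")
    tx.run("MERGE (b:TestBehavior {name: $b_name})", b_name="enter: online payment system")
    # Edges: a test case involves the entity and has the action
    # (relationship names taken from the semantic relations listed above).
    tx.run("""
        MERGE (c:TestCase {id: $case_id})
        WITH c
        MATCH (e:TestEntity {name: $e_name}), (a:TestAction {name: $a_name})
        MERGE (c)-[:testcase_involves_accept]->(e)
        MERGE (c)-[:testcase_has_action]->(a)
        """, case_id="987654", e_name="online payment system", a_name="enter")

with driver.session() as session:
    session.execute_write(build_sample_graph)
driver.close()
```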
Through the above alignment mode, an effective alignment model is established and word vectors corresponding to the test actions and test entities are built, so that the constructed knowledge graph can further be used to support multiple other functions.
For example, in an embodiment, as shown in fig. 8, the knowledge graph constructed as described above may be specifically applied to similar test case recommendation, test question-answering system, and automatic generation of test documents, so as to improve accuracy and convenience of software testing.
By adopting the implementation of steps 1 to 6, natural language processing techniques and deep learning models are used to extract application program concepts, test system characteristics, task workflows, test behaviors and expected test results from the semi-structured, heterogeneous test text, and the extracted text information is converted into structured knowledge, thereby constructing a systematic software test knowledge graph. The knowledge graph can reflect not only explicit relationships but also implicit relationships. Explicit relationships include, for example, a test action and the test entity it involves; implicit relationships include, for example, the condition, frequency and manner under which a test action is completed. Moreover, the knowledge graph can more accurately capture the specific correlation between test entities and test verbs, which improves the completeness and accuracy of the knowledge graph's representation of test scenarios.
An electronic device 50 according to an embodiment of the present application is described below with reference to fig. 9. The electronic device 50 shown in fig. 9 is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments herein.
As shown in fig. 9, the electronic device 50 is in the form of a general purpose computing device. Components of electronic device 50 may include, but are not limited to: the at least one processing unit 510, the at least one memory unit 520, and a bus 530 connecting the various system components, including the memory unit 520 and the processing unit 510.
Wherein the storage unit stores program code that is executable by the processing unit 510 such that the processing unit 510 performs steps according to various exemplary embodiments of the present application described in the description section of the exemplary method described above in the present specification. For example, the processing unit 510 may perform the various steps as shown in fig. 9.
The storage unit 520 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 5201 and/or cache memory unit 5202, and may further include Read Only Memory (ROM) 5203.
The storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 530 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 50 may also communicate with one or more external devices 600 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 50, and/or any device (e.g., router, modem, etc.) that enables the electronic device 50 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 550. An input/output (I/O) interface 550 is connected to the display unit 540. Also, electronic device 50 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 560. As shown, network adapter 560 communicates with other modules of electronic device 50 over bus 530. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 50, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, etc.) to perform the method according to the embodiments of the present application.
In an exemplary embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions, which, when executed by a processor of a computer, cause the computer to perform the method described in the method embodiment section above.
According to an embodiment of the present application, there is also provided a program product for implementing the method in the above method embodiments, which may employ a portable compact disc read only memory (CD-ROM) and comprise program code and may be run on a terminal device, such as a personal computer. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, in accordance with embodiments of the present application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Furthermore, although the various steps of the methods herein are depicted in the accompanying drawings in a particular order, this is not required to either suggest that the steps must be performed in that particular order, or that all of the illustrated steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims (10)

1. The method for constructing the knowledge graph is characterized by comprising the following steps of:
dividing a testing step according to a dependency syntax tree of a software testing text, and marking semantic roles of the testing step;
identifying a test action and a test entity in the test step according to the semantic role, and combining the test action and the test entity to obtain a test behavior;
acquiring a target named entity and a target entity category of the software test text, and embedding the target named entity and the target entity category into the test entity to obtain a target test entity;
generating a node of a knowledge graph according to the target test entity, the test action and the test action, and constructing the knowledge graph of the software test according to the node of the knowledge graph.
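By way of illustration only, and not as a limitation of the claims, the following Python sketch shows one possible data model for the test entities, test actions and test behaviors of claim 1 and how graph nodes could be generated from them; all class names, field names and example values are hypothetical, and the dependency parsing, semantic role labeling and named entity recognition steps are assumed to be performed elsewhere.

# Illustrative sketch only; class and field names are hypothetical, not part of the claimed method.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TestEntity:
    text: str                 # entity mention found in the test step
    named_entity: str = ""    # target named entity embedded into the test entity
    category: str = ""        # target entity category embedded into the test entity

@dataclass
class TestAction:
    verb: str                 # action identified from the semantic role

@dataclass
class TestBehavior:
    action: TestAction        # test behavior = test action combined with a test entity
    entity: TestEntity

def generate_nodes(behaviors: List[TestBehavior]) -> List[Dict]:
    """Generate knowledge graph nodes from target test entities, test actions and test behaviors."""
    nodes = []
    for b in behaviors:
        nodes.append({"type": "entity", "name": b.entity.text,
                      "named_entity": b.entity.named_entity, "category": b.entity.category})
        nodes.append({"type": "action", "name": b.action.verb})
        nodes.append({"type": "behavior", "name": f"{b.action.verb} {b.entity.text}"})
    return nodes

# Example: one test step "click the login button" after role labeling and entity recognition.
entity = TestEntity("login button", named_entity="login button", category="UI control")
print(generate_nodes([TestBehavior(TestAction("click"), entity)]))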
2. The method of claim 1, wherein constructing the knowledge graph of the software test according to the nodes of the knowledge graph comprises:
extracting test entity nodes, test action nodes and test behavior nodes from the nodes of the knowledge graph;
respectively evaluating the entity similarity between the test entity nodes and the action similarity between the test action nodes;
calculating the behavior similarity between the test behavior nodes according to the entity similarity and the action similarity;
identifying matched nodes among the test entity nodes, the test action nodes and the test behavior nodes according to the entity similarity, the action similarity and the behavior similarity;
and aligning the matched nodes, and constructing the knowledge graph of the software test according to the aligned nodes obtained by the alignment.
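The matching flow of claim 2 can be pictured with the minimal Python sketch below, assuming the entity and action similarity functions of claims 3 and 4 are available; the linear combination used for the behavior similarity, the matching threshold and the helper names are assumptions made only for illustration.

# Minimal sketch of claim 2's matching step; threshold, weights and helper names are assumptions.
from itertools import combinations

def behavior_similarity(entity_sim: float, action_sim: float,
                        w_entity: float = 0.5, w_action: float = 0.5) -> float:
    """Behavior similarity calculated from the entity and action similarities (linear combination assumed)."""
    return w_entity * entity_sim + w_action * action_sim

def find_matched_pairs(nodes, similarity, threshold=0.85):
    """Identify pairs of nodes whose similarity exceeds a (hypothetical) matching threshold."""
    return [(a, b) for a, b in combinations(nodes, 2) if similarity(a, b) >= threshold]

# Toy similarity based on shared words in node names, used only to make the example runnable.
def toy_similarity(a, b):
    wa, wb = set(a["name"].split()), set(b["name"].split())
    return len(wa & wb) / max(len(wa | wb), 1)

entity_nodes = [{"name": "login button"}, {"name": "login button control"}, {"name": "password field"}]
print(find_matched_pairs(entity_nodes, toy_similarity, threshold=0.6))
print(behavior_similarity(entity_sim=0.9, action_sim=0.7))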
3. The method of claim 2, wherein evaluating the entity similarity between the test entity nodes and the action similarity between the test action nodes, respectively, comprises:
respectively taking the test entity node and the test action node as nodes to be evaluated;
acquiring equivalent word dictionary attributes of the nodes to be evaluated and opposite word dictionary attributes of the nodes to be evaluated;
converting the equivalent word dictionary attribute into an equivalent word vector, and converting the opposite word dictionary attribute into an opposite word vector;
calculating cosine similarity between the nodes to be evaluated according to the equivalent word vector and the opposite word vector;
and determining the entity similarity between the test entity nodes and the action similarity between the test action nodes according to the cosine similarity between the nodes to be evaluated.
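A minimal sketch of the cosine-similarity evaluation of claim 3 follows; the bag-of-words vectorization of the equivalent word and opposite word dictionary attributes, and the equal weighting of the two resulting cosine values, are assumptions made for illustration and are not prescribed by the claim.

# Sketch of claim 3's cosine-similarity evaluation; vectorization and weighting are illustrative assumptions.
import math
from collections import Counter

def bag_of_words(words):
    return Counter(words)

def cosine(u: Counter, v: Counter) -> float:
    keys = set(u) | set(v)
    dot = sum(u[k] * v[k] for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def node_similarity(node_a, node_b) -> float:
    """Cosine similarity over the equivalent-word and opposite-word dictionary attributes."""
    eq = cosine(bag_of_words(node_a["equivalent"]), bag_of_words(node_b["equivalent"]))
    op = cosine(bag_of_words(node_a["opposite"]), bag_of_words(node_b["opposite"]))
    return 0.5 * (eq + op)   # equal weighting assumed

a = {"equivalent": ["click", "press", "tap"], "opposite": ["release"]}
b = {"equivalent": ["press", "tap", "hit"], "opposite": ["release", "drop"]}
print(round(node_similarity(a, b), 3))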
4. A method according to claim 3, wherein determining the entity similarity between the test entity nodes and the action similarity between the test action nodes based on cosine similarity between the nodes to be evaluated comprises:
acquiring name attributes and category attributes of a plurality of nodes to be evaluated;
evaluating the name similarity between the name attributes according to the number of edit operations between the character strings corresponding to the name attributes;
determining the category similarity between the category attributes according to whether the category attributes are the same attribute or not;
performing linear weighting according to the graph structure similarity of the subgraphs formed by the attributes of the nodes to be evaluated in the knowledge graph to obtain the subgraph similarity;
linearly weighting the cosine similarity, the name similarity, the category similarity and the subgraph similarity to obtain the similarity between the nodes to be evaluated;
and determining the entity similarity between the test entity nodes and the action similarity between the test action nodes according to the similarity between the nodes to be evaluated.
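The name, category and overall similarities of claim 4 can be sketched as follows; the Levenshtein edit distance is used as the count of edit operations, and the normalization of the edit distance and the weight values are illustrative assumptions.

# Sketch of claim 4's weighted similarity; weights and normalization are assumptions for illustration.
def edit_distance(s: str, t: str) -> int:
    """Number of edit operations (Levenshtein distance) between two character strings."""
    m, n = len(s), len(t)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (s[i - 1] != t[j - 1]))
            prev = cur
    return dp[n]

def name_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b), 1)
    return 1.0 - edit_distance(a, b) / longest

def category_similarity(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0

def overall_similarity(cosine_sim, name_sim, category_sim, subgraph_sim,
                       weights=(0.3, 0.3, 0.2, 0.2)) -> float:
    """Linear weighting of the four similarity terms (weight values are hypothetical)."""
    parts = (cosine_sim, name_sim, category_sim, subgraph_sim)
    return sum(w * p for w, p in zip(weights, parts))

print(round(name_similarity("login button", "log-in button"), 3))
print(overall_similarity(0.69, 0.92, 1.0, 0.5))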
5. The method of claim 2, wherein aligning the matched nodes and constructing the knowledge graph of the software test according to the aligned nodes obtained by the alignment comprises:
combining the attributes of the matched nodes into the nodes of the knowledge graph, and connecting the matched nodes with the nodes of the knowledge graph to obtain aligned nodes;
respectively representing the test cases, the test vulnerabilities and the test requirements as reference test nodes of the knowledge graph;
establishing edges of the knowledge graph according to the semantic relations between the reference test nodes and the aligned nodes;
and constructing the knowledge graph of the software test according to the reference test nodes, the aligned nodes and the edges of the knowledge graph.
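By way of example only, the alignment and edge construction of claim 5 can be sketched with the networkx library (a choice made here purely for illustration and not required by the claim); the node identifiers and relation names are hypothetical.

# Sketch of claim 5 using networkx; identifiers and relation names are hypothetical.
import networkx as nx

graph = nx.MultiDiGraph()

def align(graph, kept_node, matched_node, attrs):
    """Merge the matched node's attributes into the kept node and connect the two nodes."""
    if kept_node not in graph:
        graph.add_node(kept_node)
    graph.nodes[kept_node].update(attrs)           # combine attributes into the graph node
    graph.add_edge(matched_node, kept_node, relation="aligned_with")
    return kept_node

# Reference test nodes for test cases, test vulnerabilities and test requirements.
for ref in ("test_case_001", "vulnerability_007", "requirement_042"):
    graph.add_node(ref, type="reference")

aligned = align(graph, "entity:login button", "entity:login button control",
                {"category": "UI control"})

# Edges from reference test nodes to aligned nodes based on (assumed) semantic relations.
graph.add_edge("test_case_001", aligned, relation="covers")
graph.add_edge("vulnerability_007", aligned, relation="found_in")
print(graph.number_of_nodes(), graph.number_of_edges())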
6. The method of claim 1, wherein dividing the test step according to the dependency syntax tree of the software test text comprises:
extracting subtrees in the dependency syntax tree of the software test text;
searching whether a subtree has nodes corresponding to coordinating conjunctions according to the labels of the nodes in the subtree;
if the subtree has a node corresponding to a coordinating conjunction, dividing the leaf nodes of the subtree according to the position of the coordinating conjunction in the subtree to obtain the test step;
and if the subtree does not have a coordinating conjunction, obtaining the test step according to the leaf nodes of the subtree.
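The subtree splitting of claim 6 can be sketched with spaCy, assuming the en_core_web_sm model is installed and treating spaCy's "cc" dependency label as the node label of a coordinating conjunction; the exact split positions depend on the parser and are not prescribed by the claim.

# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def split_test_steps(text: str):
    """Split each sentence of the test text at coordinating conjunctions, if any are present."""
    steps = []
    for sent in nlp(text).sents:                              # each sentence ~ one subtree
        cuts = [tok.i for tok in sent if tok.dep_ == "cc"]    # positions of conjunction nodes
        if not cuts:                                          # no coordinating conjunction: leaves form one step
            steps.append(sent.text)
            continue
        start = sent.start
        for cut in cuts + [sent.end]:                         # divide the leaf tokens at each conjunction
            segment = sent.doc[start:cut].text.strip(" ,")
            if segment:
                steps.append(segment)
            start = cut + 1
    return steps

print(split_test_steps("Open the login page and enter the password."))
# -> ['Open the login page', 'enter the password.']  (approximate; depends on the parser)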
7. The method of claim 1, wherein labeling the semantic roles of the testing step comprises:
according to the characteristics of the software test text, generating reference semantic roles of the software test and reference attributes corresponding to the reference semantic roles; the reference attributes include the attributes of the test actions and the attributes of the test entities;
and marking the semantic roles of the test steps according to the reference semantic roles and the reference attributes, and obtaining the semantic roles corresponding to the test steps.
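A much simplified sketch of the role labeling of claim 7 is shown below; the reference semantic roles, their attributes and the part-of-speech matching rule are hypothetical simplifications, and the test step is assumed to have been tagged elsewhere.

# Sketch of claim 7's role labeling; reference roles and the matching rule are hypothetical.
REFERENCE_ROLES = {
    "test_action": {"pos": "VERB", "attributes": ["verb"]},              # attribute of the test action
    "test_entity": {"pos": "NOUN", "attributes": ["name", "category"]},  # attributes of the test entity
}

def label_roles(tagged_step):
    """Label each (word, pos) pair of a test step with a reference semantic role."""
    labeled = []
    for word, pos in tagged_step:
        role = next((name for name, spec in REFERENCE_ROLES.items() if spec["pos"] == pos), "other")
        labeled.append((word, role))
    return labeled

# A test step that has already been part-of-speech tagged elsewhere.
step = [("click", "VERB"), ("the", "DET"), ("login", "NOUN"), ("button", "NOUN")]
print(label_roles(step))
# [('click', 'test_action'), ('the', 'other'), ('login', 'test_entity'), ('button', 'test_entity')]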
8. The method of claim 1, wherein obtaining the target named entity and target entity category of the software test text comprises:
acquiring target business data matched with the business scenario in which the software test text is located, performing feature engineering analysis on the target business data, and extracting reference entity categories;
extracting a test case and a reference text from a test vulnerability report, and labeling reference entities in the reference text according to the reference entity categories;
dividing the reference text into word segments, and adding a start mark and an end mark to the software test text to obtain marked data;
training a deep learning model according to the reference entities and the marked data to obtain an entity identification model;
and identifying the software test text through the entity identification model to obtain the target named entity and the target entity category.
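The data preparation behind claim 8 can be sketched as follows; the "[START]"/"[END]" marks, the BIO labeling scheme and the downstream sequence-labeling model (for example a BiLSTM or BERT-style tagger) are illustrative assumptions, not a description of the claimed training procedure.

# Sketch of claim 8's data preparation; marks, labels and entity names are illustrative assumptions.
def prepare_marked_data(reference_text: str, reference_entities: dict):
    """Split the text into word segments, add start/end marks and BIO-label the reference entities."""
    tokens = ["[START]"] + reference_text.split() + ["[END]"]
    labels = ["O"] * len(tokens)
    for entity, category in reference_entities.items():
        entity_tokens = entity.split()
        for i in range(len(tokens) - len(entity_tokens) + 1):
            if tokens[i:i + len(entity_tokens)] == entity_tokens:
                labels[i] = f"B-{category}"
                for j in range(1, len(entity_tokens)):
                    labels[i + j] = f"I-{category}"
    return list(zip(tokens, labels))

text = "click the login button on the login page"
entities = {"login button": "UI_CONTROL", "login page": "PAGE"}
for token, label in prepare_marked_data(text, entities):
    print(token, label)
# The (token, label) pairs would then be fed to a sequence-labeling model, which
# claim 8 trains to obtain the entity identification model.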
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the method of any of claims 1-8.
10. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of any of claims 1 to 8.
CN202311245542.0A 2023-09-25 2023-09-25 Knowledge graph construction method, electronic equipment and storage medium Pending CN117252261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311245542.0A CN117252261A (en) 2023-09-25 2023-09-25 Knowledge graph construction method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311245542.0A CN117252261A (en) 2023-09-25 2023-09-25 Knowledge graph construction method, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117252261A true CN117252261A (en) 2023-12-19

Family

ID=89126063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311245542.0A Pending CN117252261A (en) 2023-09-25 2023-09-25 Knowledge graph construction method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117252261A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453576A (en) * 2023-12-25 2024-01-26 企迈科技有限公司 DXM model-based SaaS software test case construction method
CN117453576B (en) * 2023-12-25 2024-04-09 企迈科技有限公司 DXM model-based SaaS software test case construction method
CN117555814A (en) * 2024-01-11 2024-02-13 三六零数字安全科技集团有限公司 Software testing method and device, storage medium and terminal
CN117555814B (en) * 2024-01-11 2024-05-10 三六零数字安全科技集团有限公司 Software testing method and device, storage medium and terminal
CN117574878A (en) * 2024-01-15 2024-02-20 西湖大学 Component syntactic analysis method, device and medium for mixed field
CN117574878B (en) * 2024-01-15 2024-05-17 西湖大学 Component syntactic analysis method, device and medium for mixed field

Similar Documents

Publication Publication Date Title
US10846341B2 (en) System and method for analysis of structured and unstructured data
Jung Semantic vector learning for natural language understanding
US9740685B2 (en) Generation of natural language processing model for an information domain
US20230056987A1 (en) Semantic map generation using hierarchical clause structure
Gaur et al. Semi-supervised deep learning based named entity recognition model to parse education section of resumes
Guarasci et al. Assessing BERT’s ability to learn Italian syntax: A study on null-subject and agreement phenomena
Kashmira et al. Generating entity relationship diagram from requirement specification based on nlp
Qin et al. A survey on text-to-sql parsing: Concepts, methods, and future directions
US20230028664A1 (en) System and method for automatically tagging documents
CN117252261A (en) Knowledge graph construction method, electronic equipment and storage medium
Cui et al. Simple question answering over knowledge graph enhanced by question pattern classification
US20220245361A1 (en) System and method for managing and optimizing lookup source templates in a natural language understanding (nlu) framework
Fuchs Natural language processing for building code interpretation: systematic literature review report
Jayakumar et al. RNN based question answer generation and ranking for financial documents using financial NER
AbuRa’ed et al. Automatic related work section generation: experiments in scientific document abstracting
Chittimalli et al. An approach to mine SBVR vocabularies and rules from business documents
Liu et al. Adaptivepaste: Code adaptation through learning semantics-aware variable usage representations
US20220237383A1 (en) Concept system for a natural language understanding (nlu) framework
US20220229986A1 (en) System and method for compiling and using taxonomy lookup sources in a natural language understanding (nlu) framework
Kiyavitskaya et al. Requirements model generation to support requirements elicitation: the Secure Tropos experience
Ashfaq et al. Natural language ambiguity resolution by intelligent semantic annotation of software requirements
Zhang et al. Constructing covid-19 knowledge graph from a large corpus of scientific articles
Klenner et al. Enhancing coreference clustering
Loukachevitch et al. NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links
Moreira et al. Deepex: A robust weak supervision system for knowledge base augmentation

Legal Events

Date Code Title Description
PB01 Publication