CN117493619A - Event graph-based method and system for predicting closing time of issues - Google Patents

Event graph-based method and system for predicting closing time of issues

Info

Publication number
CN117493619A
Authority
CN
China
Prior art keywords
event
closing time
issue
node
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311851136.9A
Other languages
Chinese (zh)
Other versions
CN117493619B (en)
Inventor
袁水平
裴学良
乔雨
王健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Sigao Intelligent Technology Co ltd
Original Assignee
Anhui Sigao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Sigao Intelligent Technology Co ltd filed Critical Anhui Sigao Intelligent Technology Co ltd
Priority to CN202311851136.9A priority Critical patent/CN117493619B/en
Publication of CN117493619A publication Critical patent/CN117493619A/en
Application granted granted Critical
Publication of CN117493619B publication Critical patent/CN117493619B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an event graph-based method for predicting the closing time of an issue. Event patterns are mined from the event log data of issues in an open source software project; an issue event graph is generated from the event patterns and the event information; and an event graph-based issue closing time prediction model is constructed using a graph attention network, which outputs the issue closing time prediction result. By defining the issue event graph from the perspective of issue event pattern mining and building the prediction model on that graph with a graph attention network, the invention captures the event factors that influence issue closing time and improves prediction capability on the issue closing time prediction task.

Description

Event graph-based method and system for predicting closing time of issues
Technical Field
The invention relates to the technical field of open source software project management, and in particular to an event graph-based method and system for predicting the closing time of an issue.
Background
GitHub, the largest Internet-based code hosting platform, integrates code version control, issue feedback, and a developer community; it has attracted developers worldwide and hosts tens of millions of open source projects. For most open source projects, the GitHub issue tracking system records the evolution of the software process and is important for engaging stakeholders throughout the project life cycle, since it lets a development team track ideas, feedback, tasks, and defects. Studying the closing time of issues in the GitHub issue tracking system can therefore help a team manage and track problems and tasks, improve productivity and quality, and is of considerable value to software development teams.
With the widespread use of graph data, graphs are increasingly applied in software engineering research and have shown clear benefits. However, existing studies either examine linking rules from the perspective of the issue tracking network or mine the issue handling patterns of different stages from the event stream; no feasible approach has yet been reported for accurately predicting issue closing time from the combined perspective of the event stream and graph networks.
Disclosure of Invention
To solve the above problems, the invention starts from the issue event log and the event graph: it constructs an event graph from the rich event information in the issue event log, combines event patterns with a graph network method to exploit the feature-learning strengths of graph data and graph networks, and thereby predicts issue closing time more accurately.
An event graph-based issue closing time prediction method comprises the following steps:
S1: analyzing the main event types of issues based on the event log data of issues in an open source software project, and mining the event patterns of issue closing, wherein the event patterns comprise general event patterns and key event patterns;
S2: constructing an issue event graph based on the event patterns and the event information;
S3: based on the issue event graph data, constructing an event graph-based issue closing time prediction model using a graph attention network (Graph Attention Network, GAT), and outputting the issue closing time prediction result.
Further, in step S1, the main event types of issues are analyzed as follows:
first, based on the issue time data in the open source project data, the event log of each closed issue in the project is obtained, and the count distribution of all 44 event types defined by the GitHub platform is tallied; then the event support, i.e., the proportion of issues that have an event of a given type among all issues, is calculated, and frequently occurring event types are selected as the main event types.
Further, in step S1, the general event patterns are mined as follows:
the frequent item set mining method of the Apriori algorithm is applied to mine frequent sequence item sets on the event pattern data set; a minimum support is set, the event type set is set to the main event types, and the frequent sequence item sets output by the algorithm are the general event patterns;
wherein the frequent sequence item set is defined as follows: the first item in the sequence item set is the first item of the event pattern, the second item is the second item of the event pattern, and so on; a sequence item set consisting of k sequence items is called a k-sequence item set, and a sequence item set whose support in the event pattern data set is greater than the minimum support is called a frequent sequence item set; the support of a sequence item set equals the number of event patterns containing the sequence item set divided by the total number of event patterns in the data set, calculated as follows:
$$\mathrm{support}(X) = \frac{N_X}{N}$$
wherein $N$ is the total number of event patterns in the data set, $N_X$ is the number of event patterns containing the sequence item set $X$, and $\mathrm{support}(X)$ is the support of the sequence item set; an event pattern $[L_1, L_2, \ldots, L_m]$ contains a sequence item set $\{L_1, L_2, \ldots, L_n\}$ if and only if the number of items $n$ of the sequence item set is not greater than the number of items $m$ of the event pattern and the first $n$ items of the event pattern equal the sequence item set, where $m$ and $n$ are positive integers.
Further, in step S1, the key event patterns are mined as follows:
S1.4.1, the distribution of issue closing times is tallied and the closing times are divided into several closing-time categories; according to these categories, the closing time of an issue is classified into one of three closing-time stages, short term T1, medium term T2, and long term T3; the closing-time stage T is appended as the last element of the event pattern of the issue, giving an issue event pattern containing closing-time information, $[L_1, L_2, L_3, \ldots, T]$, wherein T is one of T1, T2, and T3;
S1.4.2, frequent sequence item sets associated with the closing-time stage are mined on the data set using the frequent item set mining method of the Apriori algorithm; the event type set is set to the main event types plus the three closing-time stages, and after the frequent sequence item sets are obtained, those that do not contain T are removed; the support of a sequence item set containing closing-time information equals the number of event patterns in the data set that contain the sequence item set divided by the total number of event patterns in the data set; wherein an event pattern $[L_1, L_2, \ldots, L_m, T]$ contains a sequence item set $\{L_1, L_2, \ldots, L_n, T\}$ if and only if the number of items $n$ of the sequence item set is not greater than the number of items $m$ of the event pattern, the first $n$ items of the event pattern equal the first $n$ items of the sequence item set, and the closing-time stage T is the same;
S1.4.3, on the basis of the frequent sequence item sets obtained in step S1.4.2, a minimum confidence is set and the association rule learning method of the Apriori algorithm is applied to obtain the association rules related to the closing-time stage, namely the key event patterns.
Further, step S2 is implemented as follows:
S2.1, defining the nodes of the issue event graph according to the event pattern, and adding node labels according to the event information:
each sequence item in the event pattern of the issue is defined as a node of the issue event graph; the node label is the event type name, and the basic attributes of the node comprise whether the event initiator is the creator of the issue and the identity type of the event initiator in the project, wherein the identity types comprise member, contributor, collaborator, and owner;
S2.2, constructing edges between nodes according to the event pattern:
directed edges are constructed from the order in which the events of the issue occur; each edge points from the event type node of the preceding event to the event type node of the immediately following event, and the weight of the edge is the number of times that transition occurs among all events of the issue;
S2.3, constructing the issue event graph from its nodes and edges, and obtaining from it the corresponding adjacency matrix and attribute matrix.
Further, the issue closing time prediction model consists of 2 stacked graph attention layers and 1 fully connected layer, and the input of the model is the adjacency matrix and the attribute matrix of the issue event graph; step S3 is implemented as follows:
S3.1, defining the number C of prediction classes of the issue closing time prediction model according to the closing-time categories, i.e., the model predicts within which closing-time category the issue will be closed;
S3.2, constructing 2 graph attention layers:
the input to a graph attention layer is the set $h$ of node attribute feature vectors:
$$h = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}, \quad \vec{h}_i \in \mathbb{R}^{F}$$
wherein $\vec{h}_i$ is the attribute feature of the i-th node in the issue event graph, $i = 1, 2, \ldots, N$, $N$ is the number of nodes in the event graph, $\mathbb{R}^{F}$ denotes the set of F-dimensional real vectors, $\vec{h}_i \in \mathbb{R}^{F}$ means that $\vec{h}_i$ is a real vector of dimension $F$, and $F$ is the number of node attribute features;
the attention coefficient of node i to node j is calculated as follows:
$$e_{ij} = \mathrm{LeakyReLU}\left(\vec{a}^{T}\left[W\vec{h}_i \,\Vert\, W\vec{h}_j\right]\right), \quad j \in \mathcal{N}_i$$
wherein $e_{ij}$ is the attention coefficient of node i to node j, $W$ is the weight matrix applied to the nodes of the issue event graph, $\mathrm{LeakyReLU}(\cdot)$ is the leaky rectified linear unit activation function, $\vec{h}_i$ and $\vec{h}_j$ are the attribute features of the i-th and j-th nodes in the issue event graph, $\mathcal{N}_i$ is the neighborhood of the i-th node, $\vec{a}$ is the weight vector, $T$ denotes transposition, $\Vert$ denotes concatenation, $\mathbb{R}^{2F'}$ denotes the set of real vectors of dimension $2F'$, $\vec{a} \in \mathbb{R}^{2F'}$ means that $\vec{a}$ is a real vector of dimension $2F'$, and $F'$ is the number of new node attribute features;
after the multi-head attention mechanism, a new set $h'$ of node attribute feature vectors is obtained:
$$h' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_{N_1}\}, \quad \vec{h}'_i \in \mathbb{R}^{F'}$$
wherein $\vec{h}'_i$ is the new attribute feature of the i-th node, $i = 1, 2, \ldots, N_1$, $N_1$ is the new number of nodes, $\mathbb{R}^{F'}$ denotes the set of real vectors of dimension $F'$, $\vec{h}'_i \in \mathbb{R}^{F'}$ means that $\vec{h}'_i$ is a real vector of dimension $F'$, and $F'$ is the number of new node attribute features; the attribute feature $\vec{h}'_i$ of the i-th node is expressed as:
$$\vec{h}'_i = \sigma\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} \vec{h}_j\right)$$
wherein $\sigma$ is the softmax function, $\alpha_{ij}^{k}$ is the normalized attention coefficient calculated by the k-th attention mechanism (the coefficients $e_{ij}$ normalized over $j \in \mathcal{N}_i$), $W^{k}$ is the weight matrix of the input features under the k-th attention mechanism, $k = 1, 2, \ldots, K$ indexes the attention mechanisms, $K$ is the number of attention mechanisms, $\vec{h}_j$ is the attribute feature of the j-th node in the issue event graph, and $\mathcal{N}_i$ is the neighborhood of the i-th node;
S3.3, the feature output dimension is set to the number C of issue closing-time categories, and the node attribute features obtained in step S3.2 are processed as the input of the fully connected layer; the output of the fully connected layer is the probability that the issue belongs to each category, and the closing-time stage corresponding to the category with the highest probability is taken as the prediction result for the issue closing time.
An event graph-based issue closing time prediction system comprises a processor and a storage device; the processor loads and executes the instructions and data in the storage device to implement the event graph-based issue closing time prediction method.
A storage device stores instructions and data for implementing the event graph-based issue closing time prediction method.
The technical scheme provided by the invention has the following beneficial effects: starting from the mining of issue event patterns, the invention analyzes the main event types of issues based on the event log data of issues in an open source software project, mines the general event patterns and key event patterns of issue closing, defines the issue event graph, and builds an issue closing time prediction model on the issue event graph data using GAT, thereby capturing the event factors that influence issue closing time and improving prediction capability on the issue closing time prediction task.
Drawings
FIG. 1 is a flowchart of an event graph-based method for predicting the closing time of an issue according to an embodiment of the present invention;
FIG. 2 is a diagram of an attribute matrix of an issue event graph in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the operation of a hardware device in an embodiment of the invention.
Detailed Description
For a clearer understanding of technical features, objects and effects of the present invention, a detailed description of embodiments of the present invention will be made with reference to the accompanying drawings.
The invention provides an event graph-based method for predicting the closing time of an issue. Event patterns are first mined from the event log data of issues in an open source software project; an issue event graph is then generated from the event patterns and the event information; finally, an event graph-based issue closing time prediction model is constructed using GAT, and the issue closing time prediction result is output.
As shown in FIG. 1, the event graph-based method for predicting the closing time of an issue comprises the following steps:
S1, analyzing the main event types of issues based on the event log data of issues in an open source software project, and mining the event patterns of issue closing, wherein the event patterns comprise general event patterns and key event patterns.
S2, constructing an issue event graph based on the event patterns and the event information.
S3, based on the issue event graph data, constructing an event graph-based issue closing time prediction model using a graph attention network (Graph Attention Network, GAT), and outputting the issue closing time prediction result.
Step S1 is implemented as follows:
S1.1, analyzing the main event types of issues. Based on the issue time data in the open source project data, the event log of each closed issue in the project is obtained, and the count distribution of all 44 event types defined by the GitHub platform is tallied. The event support, i.e., the proportion of issues that have an event of a given type among all issues, is then calculated, and event types with support greater than or equal to 0.1 are taken as the main event types of issues.
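The support calculation in S1.1 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `main_event_types`, the dictionary layout of the event logs, and the toy event names are all assumptions made for the example.

```python
from collections import Counter


def main_event_types(event_logs, min_support=0.1):
    """Return {event_type: support} for types whose support >= min_support.

    event_logs maps an issue id to the ordered list of its event type names
    (a hypothetical layout). Support of a type is the number of issues that
    have at least one event of that type divided by the number of issues.
    """
    total = len(event_logs)
    issue_counts = Counter()
    for events in event_logs.values():
        issue_counts.update(set(events))  # count each type once per issue
    return {
        etype: count / total
        for etype, count in issue_counts.items()
        if count / total >= min_support
    }


# Hypothetical toy logs for four closed issues.
toy_logs = {
    101: ["labeled", "commented", "closed"],
    102: ["commented", "commented", "closed"],
    103: ["labeled", "assigned", "closed"],
    104: ["commented", "closed"],
}
print(main_event_types(toy_logs, min_support=0.5))
```

With the threshold raised to 0.5 for this tiny data set, `assigned` (present in one of four issues) is dropped, while repeated `commented` events in issue 102 count only once toward the support.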
S1.2, defining the event pattern. By observing how the main event types occur among the events of an issue, the order in which different event types occur is defined as the event pattern, and each event in this order is defined as a sequence item. The event pattern is written $[L_1, L_2, L_3, \ldots]$: the first item $L_1$ is the event type that occurs first, also called the first sequence item, the second item $L_2$ is the event type that occurs second, and so on. Items in an event pattern must not repeat, i.e., each item represents a different event type. A run of sequence items starting from the first sequence item of the event pattern is defined as a sequence item set, and the number of event types in an event pattern is defined as its number of items. For example, $\{L_1, L_2\}$ and $\{L_1, L_2, L_3\}$ are sequence item sets with 2 and 3 items respectively, whereas $\{L_2, L_3\}$ is not a sequence item set.
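The two definitions in S1.2 translate directly into code. The sketch below assumes an issue's events arrive as an ordered list of type names; the helper names `event_pattern` and `sequence_item_sets` and the sample events are hypothetical.

```python
def event_pattern(events):
    """Collapse an ordered event list into its event pattern.

    The pattern keeps each event type once, in order of first occurrence,
    so no item repeats.
    """
    return list(dict.fromkeys(events))


def sequence_item_sets(pattern):
    """Return every sequence item set of a pattern, i.e. its non-empty prefixes."""
    return [tuple(pattern[:k]) for k in range(1, len(pattern) + 1)]


events = ["created", "labeled", "commented", "labeled", "closed"]
pattern = event_pattern(events)
print(pattern)
print(sequence_item_sets(pattern))
```

Because a sequence item set must start at the first sequence item, `("labeled", "commented")` does not appear in the output even though those two types are adjacent in the pattern.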
S1.3, mining the general event patterns. The frequent item set mining method of the Apriori algorithm is applied to mine frequent sequence item sets on the event pattern data set; the minimum support is set to 0.03 and the event type set is set to the main event types, and the frequent sequence item sets output by the algorithm are the general event patterns.
The frequent sequence item set is defined as follows: the first item in the sequence item set is the first item of the event pattern, the second item is the second item of the event pattern, and so on; a sequence item set consisting of k sequence items is called a k-sequence item set, and a sequence item set whose support in the event pattern data set is greater than the minimum support is called a frequent sequence item set. The support of a sequence item set equals the number of event patterns containing the sequence item set divided by the total number of event patterns in the data set, calculated as follows:
$$\mathrm{support}(X) = \frac{N_X}{N}$$
where $N$ is the total number of event patterns in the data set, $N_X$ is the number of event patterns containing the sequence item set $X$, and $\mathrm{support}(X)$ is the support of the sequence item set. An event pattern $[L_1, L_2, \ldots, L_m]$ contains a sequence item set $\{L_1, L_2, \ldots, L_n\}$ if and only if the number of items $n$ of the sequence item set is not greater than the number of items $m$ of the event pattern and the first $n$ items of the event pattern equal the sequence item set, where $m$ and $n$ are positive integers.
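Under the containment rule above, an event pattern contains a sequence item set exactly when the set is a prefix of the pattern, so the support can be obtained by counting prefixes. The sketch below does that directly instead of running Apriori's level-wise candidate generation; that shortcut, the function name `frequent_sequence_item_sets`, and the toy patterns are assumptions for illustration only.

```python
from collections import Counter


def frequent_sequence_item_sets(patterns, min_support=0.03):
    """Return {sequence_item_set: support} for sets with support > min_support.

    patterns is a list of event patterns (lists of distinct event types).
    A pattern contains a sequence item set iff the set is a prefix of it.
    """
    total = len(patterns)
    counts = Counter()
    for pattern in patterns:
        for k in range(1, len(pattern) + 1):
            counts[tuple(pattern[:k])] += 1
    return {
        items: count / total
        for items, count in counts.items()
        if count / total > min_support
    }


toy_patterns = [
    ["created", "labeled", "closed"],
    ["created", "labeled", "commented", "closed"],
    ["created", "commented", "closed"],
    ["created", "closed"],
]
print(frequent_sequence_item_sets(toy_patterns, min_support=0.4))
```

On real data the minimum support would be the 0.03 stated in S1.3; the toy call uses 0.4 only so that some prefixes are filtered out of four patterns.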
S1.4, the key event patterns are mined as follows:
S1.4.1, the distribution of issue closing times is tallied and the closing times are divided into several closing-time categories, such as closed within 3 hours, within 1 day, within 7 days, within 30 days, and within 120 days. According to these categories, the closing time of an issue is classified into one of three closing-time stages, short term T1, medium term T2, and long term T3. The closing-time stage T is appended as the last element of the event pattern of the issue, giving an issue event pattern containing closing-time information, $[L_1, L_2, L_3, \ldots, T]$, where T is one of T1, T2, and T3.
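A sketch of S1.4.1 follows. The text does not state where the boundaries between T1, T2, and T3 lie, so the 1-day and 30-day limits below are purely assumed values chosen from the listed categories; the function names are likewise hypothetical.

```python
from datetime import datetime, timedelta

# Assumed stage boundaries; the source does not specify them.
SHORT_TERM_LIMIT = timedelta(days=1)
MEDIUM_TERM_LIMIT = timedelta(days=30)


def closing_stage(created_at, closed_at):
    """Map an issue's open duration to a closing-time stage label."""
    elapsed = closed_at - created_at
    if elapsed <= SHORT_TERM_LIMIT:
        return "T1"
    if elapsed <= MEDIUM_TERM_LIMIT:
        return "T2"
    return "T3"


def labelled_pattern(pattern, created_at, closed_at):
    """Append the closing-time stage T as the last element of the pattern."""
    return list(pattern) + [closing_stage(created_at, closed_at)]


print(labelled_pattern(
    ["created", "labeled", "closed"],
    datetime(2023, 1, 1, 9, 0),
    datetime(2023, 1, 10, 9, 0),
))
```

Keeping the limits as module-level constants makes it easy to substitute boundaries derived from the observed closing-time distribution.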
S1.4.2, frequent sequence item sets associated with the closing-time stage T are mined on the data set using the frequent item set mining method of the Apriori algorithm. The minimum support is set to 0.01, the event type set is set to the main event types plus the three closing-time stages, and after the frequent sequence item sets are obtained, those that do not contain T are removed. The support of a sequence item set containing closing-time information equals the number of event patterns in the data set that contain the sequence item set divided by the total number of event patterns in the data set. An event pattern $[L_1, L_2, \ldots, L_m, T]$ contains a sequence item set $\{L_1, L_2, \ldots, L_n, T\}$ if and only if the number of items $n$ of the sequence item set is not greater than the number of items $m$ of the event pattern, the first $n$ items of the event pattern equal the first $n$ items of the sequence item set, and the closing-time stage T is the same.
S1.4.3, on the basis of the frequent sequence item sets obtained in step S1.4.2, the minimum confidence is set to 0.40 and the association rule learning method of the Apriori algorithm is applied to obtain the association rules related to the closing-time stage, namely the key event patterns.
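Steps S1.4.2 and S1.4.3 can be illustrated by treating each key event pattern as a rule "prefix implies stage T". The sketch assumes the usual association-rule definition of confidence, the number of patterns with that prefix and stage divided by the number of patterns with that prefix, and again counts prefixes directly rather than running a full Apriori implementation. The name `key_event_patterns` and the toy data are hypothetical.

```python
from collections import Counter


def key_event_patterns(labelled_patterns, min_support=0.01, min_confidence=0.40):
    """Return {(prefix, stage): (support, confidence)} for qualifying rules.

    Each labelled pattern is [L1, ..., Lm, T] with T in {"T1", "T2", "T3"}.
    """
    total = len(labelled_patterns)
    prefix_counts = Counter()
    joint_counts = Counter()
    for *events, stage in labelled_patterns:
        for k in range(1, len(events) + 1):
            prefix = tuple(events[:k])
            prefix_counts[prefix] += 1
            joint_counts[(prefix, stage)] += 1
    rules = {}
    for (prefix, stage), joint in joint_counts.items():
        support = joint / total
        confidence = joint / prefix_counts[prefix]
        if support > min_support and confidence >= min_confidence:
            rules[(prefix, stage)] = (support, confidence)
    return rules


toy = [
    ["created", "labeled", "closed", "T1"],
    ["created", "labeled", "closed", "T1"],
    ["created", "commented", "closed", "T3"],
    ["created", "labeled", "closed", "T2"],
]
for rule, stats in key_event_patterns(toy).items():
    print(rule, stats)
```

In the toy data, the prefix `("created", "labeled")` leads to T1 in two of its three occurrences, so that rule passes the 0.40 confidence threshold, while `("created",)` leading to T3 (one of four) does not.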
Step S2 is implemented as follows:
S2.1, defining the nodes of the issue event graph (graph nodes) according to the event pattern, and adding node labels (graph node labels) according to the event information. Each sequence item in the event pattern of the issue is defined as a graph node, the graph node label is the event type name, and the basic attributes of a graph node comprise whether the event initiator is the creator of the issue and the identity type of the event initiator in the project (i.e., member, contributor, collaborator, or owner).
S2.2, constructing edges between nodes according to the issue event pattern. Directed edges are constructed from the order in which the events of the issue occur; each edge points from the event type node of the preceding event to the event type node of the immediately following event, and the weight of the edge is the number of times that transition occurs among all events of the issue.
For example, suppose the events of an issue occur in the order [ 'created', 'stacked', 'cross-referenced', 'command', 'reduced', 'command', 'reduced', 'cross-referenced', 'closed' ], where the 1st, 5th, and 6th events are initiated by the issue creator and the initiators of the 2nd, 3rd, 4th, and 12th events are contributors. The event pattern corresponding to the issue is then [ 'created', 'stacked', 'cross-referenced', 'command', 'reduced', 'closed' ]. Each item of the event pattern is taken in turn as a node, and the order in which the different events occur gives the edges, thereby constructing the issue event graph.
S2.3, obtaining the adjacency matrix and attribute matrix corresponding to the issue event graph. An adjacency matrix value of 1 indicates that two events are consecutive items; for example, if a delayed ($v_1$) event occurs after the created ($v_0$) event, the entry in row $v_0$, column $v_1$ of the adjacency matrix is 1. FIG. 2 is a schematic diagram of the attribute matrix Attr, which indicates which attribute features each event node has; for example, a value of 1 in row a4, column $v_0$ means that node $v_0$ has attribute a4, i.e., the initiator of event $v_0$ is the issue creator.
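The graph construction in S2.1 to S2.3 can be sketched as follows, covering nodes, weighted directed edges, and the 0/1 adjacency matrix (the attribute matrix is omitted for brevity). Skipping transitions from an event type to itself is an assumption, since the text does not say how repeated consecutive events are handled; the function name and toy event sequence are also hypothetical.

```python
from collections import Counter


def build_event_graph(events):
    """Build (nodes, edge_weights, adjacency) from an ordered event list.

    nodes: event types in order of first occurrence (the event pattern).
    edge_weights: {(src, dst): number of times dst immediately follows src}.
    adjacency: square 0/1 matrix, rows are source nodes, columns are targets.
    """
    nodes = list(dict.fromkeys(events))
    index = {etype: i for i, etype in enumerate(nodes)}
    edge_weights = Counter()
    for prev, nxt in zip(events, events[1:]):
        if prev != nxt:  # assumption: ignore self-transitions
            edge_weights[(prev, nxt)] += 1
    size = len(nodes)
    adjacency = [[0] * size for _ in range(size)]
    for src, dst in edge_weights:
        adjacency[index[src]][index[dst]] = 1
    return nodes, dict(edge_weights), adjacency


toy_events = ["created", "labeled", "commented", "labeled", "commented", "closed"]
nodes, weights, adjacency = build_event_graph(toy_events)
print(nodes)
print(weights)
for row in adjacency:
    print(row)
```

Here the `labeled` to `commented` transition occurs twice, so its edge weight is 2 while the corresponding adjacency entry stays at 1.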
Step S3 proceeds as follows: the event graph data of open source project issues is taken as the data set, the adjacency matrix and attribute matrix of each issue event graph are taken as inputs, and the issue closing time prediction model is formed by stacking 2 graph attention layers and 1 fully connected layer. Specifically:
S3.1, according to the issue closing-time categories divided in step S1.4, the number C of prediction classes of the issue closing time prediction model is defined, i.e., the model predicts within which closing-time category the issue will be closed.
S3.2, constructing 2 graph attention layers. The input to a graph attention layer is the set $h$ of attribute feature vectors of all nodes:
$$h = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}, \quad \vec{h}_i \in \mathbb{R}^{F}$$
where $F$ is the number of node attribute features and the issue event graph has $N$ nodes; $h$ holds the attribute features of all nodes of the event graph and has size $N \times F$; $\vec{h}_1$ is the attribute feature of the 1st node in the issue event graph, $\vec{h}_2$ that of the 2nd node, and $\vec{h}_i$ that of the i-th node, $i = 1, 2, \ldots, N$; $\mathbb{R}^{F}$ denotes the set of F-dimensional real vectors, and $\vec{h}_i \in \mathbb{R}^{F}$ means that $\vec{h}_i$ is a real vector of dimension $F$.
The attention coefficient of node i to node j is calculated as follows:
$$e_{ij} = a\left(W\vec{h}_i, W\vec{h}_j\right) = \mathrm{LeakyReLU}\left(\vec{a}^{T}\left[W\vec{h}_i \,\Vert\, W\vec{h}_j\right]\right), \quad j \in \mathcal{N}_i$$
where $e_{ij}$ is the attention coefficient of node i to node j; $a(\cdot)$ is the shared attention mechanism, a single-layer feed-forward neural network parameterized by the weight vector $\vec{a}$; $\Vert$ denotes concatenation; $W$ is the weight matrix applied to the event graph nodes; $\vec{h}_i$ and $\vec{h}_j$ are the attribute features of the i-th and j-th nodes in the issue event graph; $\mathcal{N}_i$ is the neighborhood of the i-th node (i.e., the set of nodes adjacent to node i); $T$ denotes transposition; $\mathbb{R}^{2F'}$ denotes the set of real vectors of dimension $2F'$, and $\vec{a} \in \mathbb{R}^{2F'}$ means that $\vec{a}$ is a real vector of dimension $2F'$. After the multi-head attention mechanism, a new set $h'$ of node attribute feature vectors is obtained:
$$h' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_{N_1}\}, \quad \vec{h}'_i \in \mathbb{R}^{F'}$$
where $\vec{h}'_i$ is the new attribute feature of the i-th node, $i = 1, 2, \ldots, N_1$; $N_1$ is the new number of nodes; $h'$ holds the attribute features of all new nodes; $F'$ is the number of new node attribute features; $\mathbb{R}^{F'}$ denotes the set of real vectors of dimension $F'$, and $\vec{h}'_i \in \mathbb{R}^{F'}$ means that $\vec{h}'_i$ is a real vector of dimension $F'$. The attribute feature $\vec{h}'_i$ of the i-th node is expressed as:
$$\vec{h}'_i = \sigma\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} \vec{h}_j\right)$$
where $\sigma$ is the softmax function; $\alpha_{ij}^{k}$ is the normalized attention coefficient calculated by the k-th attention mechanism (the coefficients $e_{ij}$ normalized over $j \in \mathcal{N}_i$); $W^{k}$ is the weight matrix of the input features under the k-th attention mechanism; $k = 1, 2, \ldots, K$ indexes the attention mechanisms and $K$ is their number; $\vec{h}_j$ is the attribute feature of the j-th node in the issue event graph; and $\mathcal{N}_i$ is the neighborhood of the i-th node.
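To make the attention computation concrete, the following pure-Python sketch evaluates one attention head and averages several heads. It is an illustration under stated assumptions rather than the patent's model: the weights are passed in instead of learned, each node's neighborhood is taken to be its successors in the adjacency matrix plus the node itself (a self-loop is assumed so that sink nodes such as a final `closed` event still have a non-empty neighborhood), and the outer activation $\sigma$ is omitted. All function names are hypothetical.

```python
import math


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_head(H, adjacency, W, a):
    """One attention head: returns (alpha, aggregated node features).

    H: N x F node features, W: F' x F weight matrix, a: vector of length 2F'.
    """
    Wh = [matvec(W, h) for h in H]
    n = len(H)
    alpha = [[0.0] * n for _ in range(n)]
    out = []
    for i in range(n):
        # Assumed neighbourhood: successors of i plus i itself.
        neigh = [j for j in range(n) if adjacency[i][j] or j == i]
        scores = [
            leaky_relu(sum(c * v for c, v in zip(a, Wh[i] + Wh[j])))
            for j in neigh
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # stable softmax over neighbours
        norm = sum(exps)
        for j, e in zip(neigh, exps):
            alpha[i][j] = e / norm
        out.append([sum(alpha[i][j] * Wh[j][d] for j in neigh) for d in range(len(W))])
    return alpha, out


def gat_layer(H, adjacency, heads):
    """Average the outputs of K heads; heads is a list of (W, a) pairs."""
    outs = [gat_head(H, adjacency, W, a)[1] for W, a in heads]
    k = len(outs)
    return [
        [sum(o[i][d] for o in outs) / k for d in range(len(outs[0][i]))]
        for i in range(len(H))
    ]


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 0.0, 0.0]
print(gat_layer(H, adjacency, [(W, a), (W, a)]))
```

With a zero attention vector every neighbor receives equal weight, which makes the toy output easy to check by hand; a trained model would learn `W` and `a` per head.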
S3.3, a fully connected layer is used with its feature output dimension set to the number C of issue closing-time categories; it processes the node attribute features obtained in step S3.2 to give the probability that the issue belongs to each category, and the closing-time stage corresponding to the category with the highest probability is taken as the issue closing time prediction result.
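The final classification step can be sketched as below. The text does not say how per-node features are combined into one vector per issue, so mean pooling over nodes is an assumption here, as are the function name `predict_stage`, the hand-set weights, and the three stage labels used as classes.

```python
import math


def predict_stage(node_features, weights, bias, stages=("T1", "T2", "T3")):
    """Pool node features, apply a fully connected layer, and pick the top class.

    node_features: N x F' features from the graph attention layers.
    weights: C x F' matrix and bias: length-C vector of the fully connected layer.
    Returns (predicted stage, class probabilities).
    """
    n = len(node_features)
    pooled = [sum(col) / n for col in zip(*node_features)]  # assumed mean pooling
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]  # numerically stable softmax
    norm = sum(exps)
    probs = [e / norm for e in exps]
    return stages[probs.index(max(probs))], probs


features = [[1.0, 0.0], [3.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
bias = [0.0, 0.0, 0.0]
print(predict_stage(features, weights, bias))
```

The returned probabilities sum to 1, and the predicted stage is simply the class with the largest probability, matching the selection rule described in S3.3.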
Referring to Fig. 3, Fig. 3 is a schematic diagram of the operation of a hardware device according to an embodiment of the present invention. The hardware device specifically includes an event graph-based issue closing time prediction system 301, which comprises a processor 302 and a storage device 303; the storage device 303 stores instructions and data; the processor 302 loads and executes the instructions and data in the storage device 303 to implement the event graph-based issue closing time prediction method.
The beneficial effects of the invention are as follows: starting from the mining of issue event patterns, the invention analyzes the main event types of issues based on the event log data of issues in open source software projects, mines the general event patterns and key event patterns of issue closing, innovatively defines the issue event graph, and builds an issue closing time prediction model from the issue event graph data using GAT, thereby effectively capturing the event factors that influence issue closing time and improving performance on the issue closing time prediction task.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention; any modifications, equivalent replacements, and improvements made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. An event graph-based method for predicting the closing time of issues, characterized by comprising the following steps:
S1: analyzing the main event types of issues based on the event log data of issues in an open source software project, and mining the event patterns of issue closing, wherein the event patterns comprise general event patterns and key event patterns;
S2: constructing an issue event graph based on the event patterns and the event information;
S3: based on the issue event graph, constructing an event graph-based issue closing time prediction model using a graph attention network, and outputting the issue closing time prediction result.
2. The event graph-based method for predicting the closing time of issues according to claim 1, characterized in that: in step S1, the main event types of issues are analyzed as follows:
first, acquiring the event log of each closed issue in the project based on the time data of issues in the open source project data, and counting the quantity distribution of all 44 event types defined by the GitHub platform; then, selecting the frequently occurring event types as the main event types by calculating the event support, i.e., the proportion of issues containing an event of that type among all issues.
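The event support described in claim 2 can be sketched as follows. The event names, the 0.5 threshold, and the use of "greater than or equal to" for the cut-off are illustrative assumptions; the source only states that frequently occurring event types are selected.

```python
from collections import Counter

def event_support(issue_logs):
    """Return {event_type: fraction of issues whose log contains that type}."""
    counts = Counter()
    for events in issue_logs:
        counts.update(set(events))  # count each event type at most once per issue
    total = len(issue_logs)
    return {etype: n / total for etype, n in counts.items()}

def main_event_types(issue_logs, min_support):
    """Select event types whose support reaches min_support (threshold is assumed)."""
    support = event_support(issue_logs)
    return sorted(e for e, s in support.items() if s >= min_support)

# Hypothetical event logs of four closed issues.
logs = [
    ["labeled", "commented", "closed"],
    ["commented", "closed"],
    ["assigned", "commented", "closed"],
    ["labeled", "closed"],
]
selected = main_event_types(logs, min_support=0.5)
print(selected)
```

Counting each type once per issue keeps the support a proportion of issues rather than a proportion of events, which is what the claim describes.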
3. The event graph-based method for predicting the closing time of issues according to claim 1, characterized in that: in step S1, the general event patterns are mined as follows:
applying the frequent item set mining method of the Apriori algorithm to mine frequent sequence item sets on the event pattern data set: with the data set as input, the minimum support is set and the event type set is set to the main event types, and the frequent sequence item sets output by the algorithm are the general event patterns;
wherein the frequent sequence item set is defined as follows: the first item in a sequence item set represents the first item of an event pattern, the second item represents the second item of the event pattern, and so on; a sequence item set consisting of k sequence items is called a k-sequence item set, and a sequence item set whose support in the event pattern data set is greater than the minimum support is called a frequent sequence item set; the support of a sequence item set equals the number of event patterns containing the sequence item set divided by the total number of event patterns in the data set, calculated as follows:
$$\mathrm{support}(X) = \frac{\mathrm{count}(X)}{N}$$
wherein $N$ represents the total number of event patterns in the data set, $\mathrm{count}(X)$ represents the number of event patterns containing the sequence item set $X$, and $\mathrm{support}(X)$ represents the support of the sequence item set $X$; when an event pattern $[L_1, L_2, \ldots, L_m]$ contains a sequence item set $\{L_1, L_2, \ldots, L_n\}$, the number of items $n$ of the sequence item set is not greater than the number of items $m$ of the event pattern, and $m$ and $n$ are positive integers.
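The support definition in claim 3 can be illustrated with a short level-wise search in the spirit of Apriori. This is a sketch under assumptions: containment is taken to mean that the sequence item set equals the first n items of the event pattern (as claim 4 states for its variant), and the function names and example patterns are hypothetical.

```python
def contains(pattern, item_set):
    """True if item_set equals the first len(item_set) items of pattern (assumed)."""
    n = len(item_set)
    return n <= len(pattern) and tuple(pattern[:n]) == tuple(item_set)

def sequence_support(patterns, item_set):
    """Number of patterns containing item_set divided by the total number of patterns."""
    return sum(contains(p, item_set) for p in patterns) / len(patterns)

def frequent_sequence_item_sets(patterns, min_support):
    """Grow item sets one item at a time, keeping those with support > min_support."""
    frequent = {}
    k = 1
    candidates = {tuple(p[:1]) for p in patterns if len(p) >= 1}
    while candidates:
        survivors = {c: sequence_support(patterns, c) for c in candidates}
        survivors = {c: s for c, s in survivors.items() if s > min_support}
        frequent.update(survivors)
        k += 1
        # Only extend item sets whose (k-1)-item prefix is itself frequent.
        candidates = {tuple(p[:k]) for p in patterns
                      if len(p) >= k and tuple(p[:k - 1]) in survivors}
    return frequent

patterns = [
    ["labeled", "commented", "closed"],
    ["labeled", "commented", "closed"],
    ["labeled", "closed"],
    ["commented", "closed"],
]
result = frequent_sequence_item_sets(patterns, min_support=0.4)
```

Extending only frequent prefixes mirrors the Apriori pruning idea: a longer sequence item set cannot be frequent if its prefix is not.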
4. The event graph-based method for predicting the closing time of issues according to claim 3, characterized in that: in step S1, the key event patterns are mined as follows:
S1.4.1: counting the closing time distribution of issues, dividing the closing time of issues into a plurality of closing time categories, and dividing the closing time of issues into three stages according to the closing time categories: short term T1, medium term T2, and long term T3; taking the closing time stage T as the last element and adding it to the event pattern corresponding to the issue, thereby constructing an issue event pattern containing closing time information, $[L_1, L_2, L_3, \ldots, T]$, wherein T is one of T1, T2, and T3;
S1.4.2: mining frequent sequence item sets associated with the closing time stage on the data set using the frequent item set mining method of the Apriori algorithm; setting the event type set to the main event types and the three closing time stages, and, after the frequent sequence item sets are obtained, removing those that do not contain T; the support of a sequence item set containing closing time information equals the number of event patterns containing the sequence item set divided by the total amount of data in the data set; wherein, when an event pattern $[L_1, L_2, \ldots, L_m, T]$ contains a sequence item set $\{L_1, L_2, \ldots, L_n, T\}$, the number of items $n$ of the contained sequence item set is not greater than the number of items $m$ of the event pattern, the first $n$ items of the event pattern are equal to the contained sequence item set, and the closing time stage T is the same;
S1.4.3: setting the minimum confidence on the basis of the frequent sequence item sets obtained in step S1.4.2, and applying the association rule learning method of the Apriori algorithm to obtain the association rules related to the closing time stage, namely the key event patterns.
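The confidence filter of step S1.4.3 can be sketched as below. The source does not spell out the confidence formula, so the standard association rule definition is assumed: the confidence of the rule "event prefix implies closing time stage T" is the number of patterns with that prefix and stage divided by the number of patterns with that prefix. The data and the 0.6 threshold are hypothetical.

```python
def rule_confidence(patterns, prefix, stage):
    """Confidence of the rule prefix -> stage under the standard definition (assumed).

    patterns: list of (event_list, closing_stage) pairs.
    """
    prefix = tuple(prefix)
    # Stages of all patterns whose first len(prefix) events equal the prefix.
    matching = [s for events, s in patterns if tuple(events[:len(prefix)]) == prefix]
    if not matching:
        return 0.0
    return matching.count(stage) / len(matching)

# Hypothetical issue event patterns with their closing time stage.
data = [
    (["labeled", "commented"], "T1"),
    (["labeled", "commented", "assigned"], "T1"),
    (["labeled", "commented"], "T3"),
    (["commented"], "T2"),
]
conf_t1 = rule_confidence(data, ["labeled", "commented"], "T1")
is_key_pattern = conf_t1 >= 0.6  # minimum confidence of 0.6 is an assumed value
```

Here three patterns start with the prefix and two of them close in stage T1, so the rule would pass a minimum confidence of 0.6.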
5. The event graph-based method for predicting the closing time of issues according to claim 4, characterized in that: step S2 is implemented as follows:
S2.1: defining the nodes of the issue event graph according to the event patterns, and adding graph node labels according to the event information:
defining each sequence item in the event pattern of an issue as a node of the issue event graph, wherein the graph node label is the event type name, and the basic attributes of a graph node comprise whether the event initiator is the creator of the issue and the identity type of the event initiator in the project, the identity types comprising member, contributor, collaborator, and owner;
S2.2: constructing edges between nodes according to the event patterns:
constructing directed edges based on the order in which the events of the issue occur, wherein each edge points from the event type node of the preceding event to the event type node of the immediately following event, and the weight of the edge is the number of times that event transition occurs among all events of the issue;
S2.3: constructing the issue event graph from the nodes and edges of the event graph, and obtaining the adjacency matrix and the attribute matrix corresponding to the issue event graph.
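The edge construction of steps S2.2 and S2.3 can be sketched as a weighted adjacency matrix built from one issue's event sequence. It assumes one node per distinct event type, ordered by first appearance; the attribute matrix is omitted because the source does not specify how node attributes are encoded. Event names are illustrative.

```python
import numpy as np

def build_adjacency(events):
    """Return (node_types, adjacency) for one issue's ordered event list.

    adjacency[i, j] counts how often an event of type i is immediately
    followed by an event of type j.
    """
    node_types = list(dict.fromkeys(events))  # distinct types, first-seen order
    index = {t: i for i, t in enumerate(node_types)}
    adj = np.zeros((len(node_types), len(node_types)), dtype=int)
    for prev, nxt in zip(events, events[1:]):
        adj[index[prev], index[nxt]] += 1     # directed edge prev -> nxt
    return node_types, adj

events = ["labeled", "commented", "commented", "labeled", "commented", "closed"]
nodes, A = build_adjacency(events)
```

In this example the transition from "labeled" to "commented" occurs twice, so that edge has weight 2, and a repeated "commented" event produces a self-loop of weight 1.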
6. The event graph-based method for predicting the closing time of issues according to claim 5, characterized in that: the issue closing time prediction model consists of 2 stacked graph attention layers and 1 fully connected layer, and its inputs are the adjacency matrix and the attribute matrix of the issue event graph; step S3 is implemented as follows:
S3.1: defining the number C of prediction categories of the issue closing time prediction model according to the divided issue closing time categories, i.e., predicting within which closing time category the issue will be closed;
S3.2: constructing 2 graph attention layers:
the input to the graph attention layer is the set of node attribute feature vectors $h$:
$$h = \{h_1, h_2, \ldots, h_N\}, \quad h_i \in \mathbb{R}^{F}$$
wherein $h_i$ is the attribute feature of the $i$-th node in the issue event graph, $i = 1, 2, \ldots, N$; $N$ is the number of nodes of the event graph; $\mathbb{R}^{F}$ represents the set of $F$-dimensional real vectors; $h_i \in \mathbb{R}^{F}$ indicates that $h_i$ is a real vector of dimension $F$; and $F$ is the number of node attribute features;
the calculation formula of the attention coefficient of node $i$ to node $j$ is as follows:
wherein $\alpha_{ij}$ represents the attention coefficient of node $i$ to node $j$; $W$ is the weight matrix of the nodes in the issue event graph; $\mathrm{LeakyReLU}(\cdot)$ represents the leaky rectified linear unit activation function; $h_i$ is the attribute feature of the $i$-th node in the issue event graph; $h_j$ is the attribute feature of the $j$-th node in the issue event graph, $j = 1, 2, 3, \ldots, N$; $N_i$ represents the neighborhood of the $i$-th node; $\vec{a}$ represents the weight vector; $T$ represents the transpose operation; $\|$ represents the concatenation operation; $\mathbb{R}^{2F'}$ represents the set of real vectors of dimension $2F'$; $\vec{a} \in \mathbb{R}^{2F'}$ indicates that $\vec{a}$ is a real vector of dimension $2F'$; and $F'$ represents the number of new node attribute features;
after the multi-head attention mechanism, the new attribute feature vector set of the nodes is obtained:
$$h' = \{h'_1, h'_2, \ldots, h'_{N_1}\}, \quad h'_i \in \mathbb{R}^{F'}$$
wherein $h'_i$ represents the attribute feature of the $i$-th node, $i = 1, 2, \ldots, N_1$; $N_1$ is the new number of nodes; $\mathbb{R}^{F'}$ represents the set of real vectors of dimension $F'$; $h'_i \in \mathbb{R}^{F'}$ indicates that $h'_i$ is a real vector of dimension $F'$; $F'$ represents the number of new node attribute features; and the attribute feature $h'_i$ of the $i$-th node is expressed as:
wherein $\sigma$ is the softmax function; $\alpha_{ij}^{k}$ represents the normalized attention coefficient calculated by the $k$-th attention mechanism; $W^{k}$ represents the weight matrix of the input features under the $k$-th attention mechanism; $k$ is the index of the attention mechanism, $k = 1, 2, \ldots, K$; $K$ represents the number of attention mechanisms; $h_j$ is the attribute feature of the $j$-th node in the issue event graph, $j = 1, 2, \ldots, N$; and $N_i$ represents the neighborhood of the $i$-th node;
S3.3: using a fully connected layer whose feature output dimension is set to the number C of issue closing time categories, and taking the node attribute features obtained in step S3.2 as the input of the fully connected layer, wherein the output of the fully connected layer is the probability that each issue belongs to each category, and the closing time stage corresponding to the category with the highest probability is taken as the issue closing time prediction result.
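A single attention head of the kind described in step S3.2 can be sketched in NumPy. This follows the standard graph attention formulation and is an assumption, not the patent's verified implementation: self-loops are added, edge weights are used only as connectivity, row i of the adjacency matrix is taken as the neighborhood of node i, the LeakyReLU slope of 0.2 is assumed, and features are stored as rows, so the product `h @ W` plays the role of $W h_i$.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """LeakyReLU; the slope value is an assumed default."""
    return np.where(x > 0, x, slope * x)

def gat_attention(h, adj, W, a):
    """One attention head.

    h: (N, F) node features, adj: (N, N) adjacency (nonzero = neighbor),
    W: (F, F') weight matrix, a: (2F',) weight vector.
    Returns (alpha, h_new): normalized coefficients (N, N) and new features (N, F').
    """
    z = h @ W                                          # (N, F')
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a[:F'] . z_i + a[F':] . z_j
    scores = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    mask = (adj + np.eye(len(h))) > 0                  # neighbors plus self-loop
    scores = np.where(mask, scores, -np.inf)           # exclude non-neighbors
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    alpha = weights / weights.sum(axis=1, keepdims=True)  # softmax over neighbors
    return alpha, alpha @ z

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
adj = np.array([[0, 2, 0], [1, 1, 1], [0, 0, 0]])
alpha, h_new = gat_attention(h, adj, rng.normal(size=(4, 2)), rng.normal(size=4))
```

A multi-head layer would run K such heads with separate `W` and `a` and combine their outputs; whether the heads are averaged or concatenated is not recoverable from the text here.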
7. An event graph-based issue closing time prediction system, characterized in that the system comprises a processor and a storage device; the processor loads and executes the instructions and data in the storage device to implement the event graph-based method for predicting the closing time of issues according to any one of claims 1 to 6.
8. A storage device, characterized in that the storage device stores instructions and data for implementing the event graph-based method for predicting the closing time of issues according to any one of claims 1 to 6.
CN202311851136.9A 2023-12-29 2023-12-29 Event graph-based method and system for predicting closing time of issues Active CN117493619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311851136.9A CN117493619B (en) 2023-12-29 2023-12-29 Event graph-based method and system for predicting closing time of issues

Publications (2)

Publication Number Publication Date
CN117493619A true CN117493619A (en) 2024-02-02
CN117493619B CN117493619B (en) 2024-03-26

Family

ID=89685360

Country Status (1)

Country Link
CN (1) CN117493619B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111258624A (en) * 2020-01-13 2020-06-09 上海交通大学 Method and system for predicting Issue solution time in open source software development
CN112087370A (en) * 2020-09-04 2020-12-15 北京明略昭辉科技有限公司 Method, system, electronic device and computer-readable storage medium for issuing GitHub Issues
CN114239576A (en) * 2021-12-20 2022-03-25 南京邮电大学 Issue label classification method based on topic model and convolutional neural network
CN115292167A (en) * 2022-07-26 2022-11-04 武汉大学 Life cycle prediction model construction method, device, equipment and readable storage medium
US20220398466A1 (en) * 2021-06-10 2022-12-15 Visa International Service Association System, Method, and Computer Program Product for Event Forecasting Using Graph Theory Based Machine Learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
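RAFAEL KALLIS et al.: "Predicting Issue Types on GitHub", ELSEVIER, 21 July 2021 (2021-07-21) *
余译青; 吴丽兵; 朱庆华: "Research on the sources of conflict in open source software development teams: from the perspective of virtual teams and software engineering" (开源软件开发团队的冲突来源研究), Documentation, Information & Knowledge (图书情报知识), no. 06, 10 November 2018 (2018-11-10) *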

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant