CN116909910A - Research and development efficiency measurement method and system based on machine learning - Google Patents

Research and development efficiency measurement method and system based on machine learning

Info

Publication number
CN116909910A
Authority
CN
China
Prior art keywords
research
development process
development
machine learning
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310917895.4A
Other languages
Chinese (zh)
Inventor
陶嘉驹
陈煜�
张雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangyin Consumer Finance Co ltd
Original Assignee
Hangyin Consumer Finance Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangyin Consumer Finance Co ltd
Priority to CN202310917895.4A
Publication of CN116909910A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3616 Software analysis for verifying properties of programs using software metrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a research and development efficiency measurement method and system based on machine learning. First, research and development process data is acquired, wherein the research and development process data comprises code, documents, requirements, tests, defects and version control. Next, statistics-based feature extraction is performed on the research and development process data to obtain a plurality of research and development process statistical features, wherein the plurality of research and development process statistical features comprise the number of code lines, the number of defects, the submission frequency and the code complexity. Then, a machine learning model is used to perform association analysis on the plurality of research and development process statistical features to obtain a research and development efficiency metric value. In this way, efficiency indexes can be defined from the collected data on the basis of machine learning, and process efficiency can be measured by an algorithm model, so that early warning of possible risks can be given based on the efficiency metric value, thereby improving overall research and development efficiency.

Description

Research and development efficiency measurement method and system based on machine learning
Technical Field
The present application relates to the field of research and development performance metrics, and more particularly, to a method and system for research and development performance metrics based on machine learning.
Background
Research and development efficiency refers to the efficiency and quality that a software development team exhibits in completing project objectives. Research and development efficiency measurement is used to evaluate the working state of a team and to identify problems and areas for improvement, so as to improve research and development efficiency and quality. Traditional research and development efficiency measurement methods mainly rely on manual collection and analysis of various indexes, such as the number of code lines, the number of defects, the submission frequency and the code complexity. These methods suffer from drawbacks such as incomplete data collection, inconsistent indexes and subjective analysis results.
Thus, an optimized development performance metric scheme is desired.
Disclosure of Invention
The present application has been made to solve the above-mentioned technical problems. Embodiments of the application provide a research and development efficiency measurement method and system based on machine learning. Efficiency indexes are defined from collected data on the basis of machine learning, and process efficiency is measured by an algorithm model, so that early warning of possible risks can be given based on the efficiency metric value, thereby improving overall research and development efficiency.
According to one aspect of the present application, there is provided a development effectiveness measurement method based on machine learning, including:
acquiring research and development process data, wherein the research and development process data comprises code, documents, requirements, tests, defects and version control;
performing statistics-based feature extraction on the research and development process data to obtain a plurality of research and development process statistical features, wherein the plurality of research and development process statistical features comprise the number of code lines, the number of defects, the submission frequency and the code complexity; and
performing association analysis on the plurality of research and development process statistical features by using a machine learning model to obtain a research and development efficiency metric value.
According to another aspect of the present application, there is provided a machine learning based development effectiveness metric system, comprising:
a data acquisition module, used for acquiring research and development process data, wherein the research and development process data comprises code, documents, requirements, tests, defects and version control;
a feature extraction module, used for performing statistics-based feature extraction on the research and development process data to obtain a plurality of research and development process statistical features, wherein the plurality of research and development process statistical features comprise the number of code lines, the number of defects, the submission frequency and the code complexity; and
an association analysis module, used for performing association analysis on the plurality of research and development process statistical features by using a machine learning model to obtain a research and development efficiency metric value.
Compared with the prior art, the research and development efficiency measurement method and system based on machine learning provided by the application first acquire research and development process data, wherein the research and development process data comprises code, documents, requirements, tests, defects and version control. Statistics-based feature extraction is then performed on the research and development process data to obtain a plurality of research and development process statistical features, wherein the plurality of research and development process statistical features comprise the number of code lines, the number of defects, the submission frequency and the code complexity. A machine learning model is then used to perform association analysis on the plurality of research and development process statistical features to obtain a research and development efficiency metric value. In this way, efficiency indexes can be defined from the collected data on the basis of machine learning, and process efficiency can be measured by an algorithm model, so that early warning of possible risks can be given based on the efficiency metric value, thereby improving overall research and development efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly introduced below, the following drawings not being drawn to scale with respect to actual dimensions, emphasis instead being placed upon illustrating the gist of the present application.
FIG. 1 is a flow chart of a method for machine learning based development effectiveness metrics in accordance with an embodiment of the present application.
FIG. 2 is a schematic diagram of a machine learning-based development performance metric method according to an embodiment of the application.
FIG. 3 is a flowchart of sub-step S120 of a machine learning based development effectiveness metric method according to an embodiment of the present application.
FIG. 4 is a flowchart of sub-step S130 of a machine learning based development effectiveness metric method according to an embodiment of the present application.
FIG. 5 is a flowchart of training steps further included in a machine learning based development effectiveness metric method according to an embodiment of the present application.
FIG. 6 is a block diagram of a machine learning based development effectiveness metric system in accordance with an embodiment of the present application.
FIG. 7 is an application scenario diagram of a machine learning-based development effectiveness metric method according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are also within the scope of the application.
As used in the specification and in the claims, the terms "a," "an," and/or "the" are not limited to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may include other steps or elements.
Although the present application makes various references to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative, and different aspects of the systems and methods may use different modules.
Flowcharts are used in the present application to describe the operations performed by a system according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Also, other operations may be added to these processes, or one or more operations may be removed from them.
Hereinafter, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Research and development efficiency refers to the ability to achieve maximum output and value with minimum resources and time during research and development. It measures the efficiency and effectiveness of an organization or team in research and development activities. Research and development efficiency can be measured in several ways:
1. Project delivery time: a team with high research and development efficiency delivers projects on time and follows the preset schedule; it organizes and arranges work efficiently, allocates resources reasonably, and avoids delays.
2. Resource utilization: a team with high research and development efficiency makes effective use of resources, including manpower, material resources and financial resources; it plans and manages resources reasonably and avoids wasted or idle resources.
3. Innovation capability: a team with high research and development efficiency has stronger innovation capability; it brings new ideas and solutions, converts them into actual products or services, and continuously carries out technical research and innovation practice to maintain a competitive advantage.
4. Quality and reliability of products or services: a team with high research and development efficiency guarantees the quality and reliability of its products or services; it performs comprehensive testing and verification, fixes vulnerabilities and problems in time, and ensures that the products or services meet the needs and expectations of users.
5. Team cooperation and communication: a team with high research and development efficiency cooperates and communicates well; its members divide work and collaborate effectively, and their professional knowledge and skills are fully utilized.
However, conventional research and development efficiency measurement methods have some defects, mainly for the following reasons:
1. Incomplete data collection: conventional methods require the various index data to be collected manually, but because of limited manpower and time the collected data may be incomplete or inaccurate, which may bias the evaluation result so that it does not fully reflect the actual situation of the team.
2. Inconsistent indexes: different teams may understand research and development efficiency differently and focus on different aspects of it, so the indexes they select may also differ; as a result, the measurement results are not comparable and are difficult to compare and analyze effectively.
3. Subjective analysis results: data analysis in conventional methods generally depends on manual, subjective judgment and may suffer from personal bias, so that objective and accurate evaluation results cannot be obtained and the soundness and effectiveness of improvement decisions are affected.
4. Neglect of details of the software development process: conventional methods usually pay attention only to final product quality and delivery time and neglect the details and intermediate results of the software development process, so that potential problems cannot be found and solved in time, which hinders improvement of research and development efficiency.
In view of the above technical problems, the technical idea of the present application is to define efficiency indexes from collected data on the basis of machine learning and to measure process efficiency by an algorithm model, so that early warning of possible risks can be given based on the efficiency metric value, thereby improving overall research and development efficiency.
FIG. 1 is a flow chart of a method for machine learning based development effectiveness metrics in accordance with an embodiment of the present application. FIG. 2 is a schematic diagram of a machine learning-based development performance metric method according to an embodiment of the application. As shown in FIG. 1 and FIG. 2, the development effectiveness measuring method based on machine learning according to the embodiment of the application includes the steps of:
S110, acquiring research and development process data, wherein the research and development process data comprises code, documents, requirements, tests, defects and version control;
S120, performing statistics-based feature extraction on the research and development process data to obtain a plurality of research and development process statistical features, wherein the plurality of research and development process statistical features comprise the number of code lines, the number of defects, the submission frequency and the code complexity; and
S130, performing association analysis on the plurality of research and development process statistical features by using a machine learning model to obtain a research and development efficiency metric value.
Specifically, in the technical scheme of the application, research and development process data is firstly obtained, wherein the research and development process data comprises codes, documents, requirements, tests, defects and version control. In order to facilitate subsequent feature extraction, the development process data is first cleaned and sorted to obtain pre-processed development process data.
Next, statistics-based feature extraction is performed on the research and development process data to obtain a plurality of research and development process statistical features, wherein the plurality of research and development process statistical features comprise the number of code lines, the number of defects, the submission frequency and the code complexity. Number of code lines: the total number of lines of code written during development, which can be used to gauge the scale and complexity of a project. Number of defects: the number of code defects or errors discovered and recorded during development, which can be used to evaluate code quality and the efficiency of the development team. Submission frequency: the frequency of code submissions in a version control system, which can be used to measure the activity of developers and the progress of a project. Code complexity: the complexity of the structure and logic of the code, which can be evaluated by various metrics, such as cyclomatic complexity and function length, to assess the readability and maintainability of the code.
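As a minimal sketch of how the four statistical features defined above might be computed, the following Python example may be considered. The function names, the input layout (a source string, a list of defect identifiers, a list of commit dates) and the keyword-counting approximation of code complexity are all assumptions made for illustration; the application does not prescribe a data schema or a specific complexity formula.

```python
import re
from datetime import date

# Branching keywords used for a rough complexity proxy (assumption, Python-like code).
BRANCH_PATTERN = re.compile(r"\b(if|elif|for|while|and|or|except)\b")


def count_code_lines(source: str) -> int:
    """Count the non-blank lines of a source text."""
    return sum(1 for line in source.splitlines() if line.strip())


def approximate_complexity(source: str) -> int:
    """Rough complexity proxy: 1 plus the number of branching keywords.

    This also counts keywords inside comments and strings, so it is only
    an approximation of cyclomatic complexity.
    """
    return 1 + len(BRANCH_PATTERN.findall(source))


def commit_frequency(commit_dates: list) -> float:
    """Commits per day over the period spanned by the commit dates."""
    if not commit_dates:
        return 0.0
    span_days = (max(commit_dates) - min(commit_dates)).days + 1
    return len(commit_dates) / span_days


def extract_features(source: str, defects: list, commit_dates: list) -> dict:
    """Collect the four statistical features into one record."""
    return {
        "code_lines": count_code_lines(source),
        "defect_count": len(defects),
        "commit_frequency": commit_frequency(commit_dates),
        "code_complexity": approximate_complexity(source),
    }


sample_source = """def f(x):
    if x > 0 and x < 10:
        return x

    for i in range(3):
        x += i
    return x
"""

features = extract_features(
    sample_source,
    defects=["BUG-1", "BUG-2"],  # hypothetical defect identifiers
    commit_dates=[date(2023, 7, 1), date(2023, 7, 2), date(2023, 7, 4)],
)
print(features)
```

In practice the raw inputs would come from the version control system and the defect tracker, and a dedicated static analysis tool would likely replace the keyword-based complexity proxy.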
It should be appreciated that by performing statistical feature extraction on the research and development process data, specific values of these indexes can be obtained, and the research and development processes of different projects or teams can then be further analyzed and compared. Here, the research and development process statistical features help to evaluate the efficiency of the research and development process, discover potential problems, and make corresponding improvements.
Accordingly, as shown in FIG. 3, performing statistics-based feature extraction on the research and development process data to obtain a plurality of research and development process statistical features includes: S121, cleaning and sorting the research and development process data to obtain preprocessed research and development process data; and S122, performing statistics-based feature extraction on the preprocessed research and development process data to obtain the plurality of research and development process statistical features. In the research and development process, data cleaning and sorting refers to processing and screening the raw data to facilitate subsequent analysis and feature extraction. Its purpose is to remove noise, errors and redundant information from the data, so that the data is more accurate, reliable and consistent. The cleaning and sorting may include the following steps:
1. Data de-duplication: repeated data records are removed, so that the same data is not counted repeatedly in subsequent analysis.
2. Missing value processing: for data with missing values, the records containing the missing values can be deleted, or the missing values can be filled in by interpolation.
3. Abnormal value processing: abnormal values in the data are detected and handled, so that they do not interfere with the results of subsequent analysis and feature extraction.
4. Data format conversion: the data is converted into a format suitable for subsequent statistical analysis and feature extraction.
Cleaning and sorting improves the quality and usability of the data and ensures that subsequent analysis and feature extraction are based on accurate and reliable data, thereby avoiding erroneous results and misleading conclusions in the research and development efficiency measurement process.
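The four cleaning and sorting steps above can be sketched in Python as follows. This is only an illustrative sketch: the record layout (an "id" and a "defects" field), filling missing values with the median, and clipping abnormal values to within k median absolute deviations of the median are assumptions, since the application leaves the concrete interpolation and outlier methods open.

```python
from statistics import median


def clean_records(records, k=3.0):
    """Clean a list of {"id": ..., "defects": ...} records (hypothetical layout)."""
    # Step 1, de-duplication: keep the first record seen for each id.
    seen, unique = set(), []
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            unique.append(dict(rec))

    # Step 4, format conversion: parse numeric strings into floats.
    # It is done early here so that the later steps can work on numbers.
    for rec in unique:
        raw = rec["defects"]
        rec["defects"] = None if raw in (None, "") else float(raw)

    # Step 2, missing values: fill with the median of the observed values.
    observed = [r["defects"] for r in unique if r["defects"] is not None]
    fill = median(observed) if observed else 0.0
    for rec in unique:
        if rec["defects"] is None:
            rec["defects"] = fill

    # Step 3, abnormal values: clip to median +/- k * MAD.
    values = [r["defects"] for r in unique]
    if values:
        centre = median(values)
        mad = median([abs(v - centre) for v in values])
        low, high = centre - k * mad, centre + k * mad
        for rec in unique:
            rec["defects"] = min(max(rec["defects"], low), high)
    return unique


raw_records = [
    {"id": "a", "defects": "3"},
    {"id": "b", "defects": "4"},
    {"id": "b", "defects": "4"},    # duplicate record
    {"id": "c", "defects": None},   # missing value
    {"id": "d", "defects": "5"},
    {"id": "e", "defects": "100"},  # abnormal value
]
cleaned = clean_records(raw_records)
print([(r["id"], r["defects"]) for r in cleaned])
```

Deleting records with missing values, instead of filling them, is the other option named in step 2 and would only require filtering the list before the outlier step.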
Then, a machine learning model is used for carrying out association analysis on the plurality of statistical characteristics of the research and development process to obtain a research and development efficiency metric value. Specifically, a machine learning model is used to capture inter-index correlation pattern information contained in the plurality of statistical features of the development process, and the inter-index correlation pattern information is subjected to decoding regression by a decoder to obtain a decoded value for representing the development efficiency metric value. That is, in the technical solution of the present application, the process of performing the association analysis on the statistical features of the plurality of development processes by using the machine learning model to obtain the development efficiency metric value includes: arranging the plurality of research and development process statistical features into research and development process statistical feature input vectors; extracting local neighborhood associated features of the statistical feature input vector of the research and development process by using the machine learning model to obtain an inter-index associated mode feature vector of the research and development process; and passing the inter-development process index correlation mode feature vector through a decoder to obtain a decoded value, wherein the decoded value is used for representing a development efficiency metric value.
In particular, in one specific example of the present application, the machine learning model is an associated feature extractor based on a one-dimensional convolutional neural network model. Correspondingly, the machine learning model is used for carrying out local neighborhood correlation feature extraction on the research and development process statistical feature input vector to obtain a research and development process inter-index correlation mode feature vector, and the method comprises the following steps: inputting the research and development process statistical feature input vector into the correlation feature extractor based on the one-dimensional convolutional neural network model to obtain the correlation mode feature vector among the research and development process indexes. That is, the one-dimensional convolutional neural network model-based correlation feature extractor is used for carrying out one-dimensional convolutional coding on the research and development process statistical feature input vector so as to obtain the research and development process inter-index correlation mode feature vector.
Accordingly, as shown in FIG. 4, performing association analysis on the plurality of research and development process statistical features by using a machine learning model to obtain a research and development efficiency metric value includes: S131, arranging the plurality of research and development process statistical features into a research and development process statistical feature input vector; S132, performing local neighborhood correlation feature extraction on the research and development process statistical feature input vector by using the machine learning model to obtain a research and development process inter-index correlation mode feature vector; and S133, generating the research and development efficiency metric value based on the research and development process inter-index correlation mode feature vector. The machine learning model is an associated feature extractor based on a one-dimensional convolutional neural network model.
Accordingly, in step S132, the local neighborhood correlation feature extraction is performed on the research and development process statistical feature input vector by using the machine learning model to obtain a research and development process inter-index correlation mode feature vector, including: inputting the research and development process statistical feature input vector into the correlation feature extractor based on the one-dimensional convolutional neural network model to obtain the correlation mode feature vector among the research and development process indexes. It should be appreciated that a one-dimensional convolutional neural network (1D CNN) is a machine learning model for processing sequence data, and is more suitable when processing one-dimensional data (e.g., time series, text series, etc.) than conventional convolutional neural networks. The one-dimensional convolutional neural network extracts features by applying one-dimensional convolutional operations on the input data. It uses a sliding window to convolve over the input sequence and captures local patterns and features in the input sequence by learning the weights of the convolution kernel (filter). The dimension of the feature is then reduced by a pooling operation (e.g., maximum pooling or average pooling) to extract more representative features. The application of the one-dimensional convolutional neural network in research and development efficiency measurement is mainly used as a correlation feature extractor, and the correlation feature extractor can capture the correlation information among different features by learning the local neighborhood correlation modes of the input research and development process statistical features, and feature vectors of the correlation modes can be used for generating research and development efficiency measurement values to help evaluate and improve the research and development efficiency. 
The one-dimensional convolutional neural network is used as an associated feature extractor, and can automatically learn key feature representations in input data without manually designing features; the one-dimensional convolutional neural network can capture local modes and features in an input sequence through local convolutional operation, so that the local relevance of data is better reflected; and the convolution kernel in the one-dimensional convolution neural network can share parameters at different positions of the input sequence, so that the parameter quantity of the model is reduced, and the efficiency and generalization capability of the model are improved. In other words, the one-dimensional convolutional neural network plays an important role in the research and development efficiency measurement as a correlation feature extractor based on convolutional operation, and can help extract correlation pattern features from the statistical features of the research and development process so as to generate a research and development efficiency measurement value.
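The sliding-window convolution and pooling described above can be illustrated with a minimal, dependency-free Python sketch. The kernel values, the kernel width of 2, the ReLU activation, the pooling window and stride, the normalised input values and all function names are assumptions chosen for illustration; the application does not specify the network's hyperparameters, and a practical implementation would normally use a deep-learning library with learned kernels.

```python
def conv1d(vector, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation form): slide the kernel over the vector."""
    width = len(kernel)
    return [
        sum(vector[i + j] * kernel[j] for j in range(width)) + bias
        for i in range(len(vector) - width + 1)
    ]


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2, stride=1):
    """Max pooling over windows of the given size and stride."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, stride)
    ]


def extract_association_features(input_vector, kernels):
    """Apply each kernel, then ReLU and pooling, and concatenate the results."""
    features = []
    for kernel in kernels:
        features.extend(max_pool1d(relu(conv1d(input_vector, kernel))))
    return features


# Normalised statistical features in an assumed fixed order:
# [code lines, defect count, submission frequency, code complexity]
input_vector = [0.5, 0.2, 0.8, 0.4]

# Two hand-picked kernels; in a trained model these weights are learned.
kernels = [[1.0, -1.0], [0.5, 0.5]]

association_vector = extract_association_features(input_vector, kernels)
print(association_vector)
```

Because each kernel is reused at every position of the input vector, the number of parameters depends on the kernel width and the number of kernels, not on the length of the input, which is the parameter sharing mentioned above.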
Accordingly, in step S133, generating the research and development efficiency metric value based on the research and development process inter-index correlation mode feature vector includes: passing the research and development process inter-index correlation mode feature vector through a decoder to obtain a decoded value, wherein the decoded value is used to represent the research and development efficiency metric value. It should be appreciated that the decoder is a component that converts the research and development process inter-index correlation mode feature vector into a decoded value, the decoded value being an output that represents the research and development efficiency metric value. In other words, the decoder performs decoding regression, mapping the encoded feature vector to a metric value. Here, the associated feature extractor based on the one-dimensional convolutional neural network model acts as the encoder, which extracts and encodes features from the research and development process statistical feature input vector, and the decoder decodes the encoded feature vector into the metric value. The design of the decoder may be adjusted according to specific tasks and requirements to ensure that the decoded value accurately represents the research and development efficiency metric value. The decoded values generated by the decoder can be used to measure and evaluate research and development efficiency, and these metric values help teams understand the efficiency and quality of the research and development process and the direction for improvement, thereby guiding decision making and optimizing research and development work. The decoder plays a critical role in research and development efficiency measurement, because it translates abstract feature vectors into interpretable and applicable metric results.
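One simple form such a decoder could take is a linear regression head, sketched below in Python. The choice of a single linear layer, the weight and bias values, and the function name are assumptions for illustration only; the application states that the decoder design may be adjusted and does not fix its structure.

```python
def decode(feature_vector, weights, bias):
    """Linear regression head: weighted sum of the features plus a bias."""
    if len(feature_vector) != len(weights):
        raise ValueError("feature vector and weights must have the same length")
    return sum(f * w for f, w in zip(feature_vector, weights)) + bias


# Hypothetical inter-index correlation mode feature vector and decoder parameters.
association_vector = [0.3, 0.4, 0.5, 0.6]
decoder_weights = [1.0, 2.0, -1.0, 0.5]
decoder_bias = 0.1

metric_value = decode(association_vector, decoder_weights, decoder_bias)
print(metric_value)
```

The scalar returned here plays the role of the decoded value; in a trained system the weights and bias would be learned jointly with the feature extractor.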
Further, the machine learning-based research and development efficiency measurement method further comprises a training step of training the correlation feature extractor based on the one-dimensional convolutional neural network model and the decoder. It should be appreciated that, in the training step, the correlation feature extractor and the decoder are trained on known input and output data so that they learn the correlation patterns and feature representations of the research and development process. The main purpose of the training step is to adjust the parameters and weights of the model iteratively so that the model fits the training data and generalizes to unseen data. Through training, the model learns the mapping among the correlation patterns, the feature representations, and the metric value of the research and development process. The training step specifically comprises:
1. Preparing training data: collecting and preparing the input features and the corresponding target outputs (research and development efficiency metric values) for training.
2. Initializing model parameters: initializing the parameters and weights of the correlation feature extractor and the decoder according to the model structure and the chosen algorithm.
3. Forward propagation: passing the input features through the correlation feature extractor and the decoder to obtain a predicted research and development efficiency metric value.
4. Calculating the loss function: comparing the predicted metric value with the true metric value and calculating the loss function to measure the prediction error.
5. Back propagation: updating the parameters and weights of the model by the back propagation algorithm according to the loss function, so that the loss function gradually decreases.
6. Iterating: repeating forward propagation, loss calculation, and back propagation until the model converges or a preset number of training rounds is reached.
Through the training step, the model learns the patterns and rules of research and development efficiency measurement from the training data and can then predict and measure on unseen data. The purpose of the training step is to improve the accuracy and generalization capability of the model so that research and development efficiency can be effectively evaluated and improved in practical applications.
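The six training steps can be sketched on a deliberately simplified model. The example below assumes a purely linear decoder and a mean squared error loss, neither of which is specified by the source; the function name `train_linear_decoder`, the learning rate, and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_linear_decoder(
    inputs: Sequence[Sequence[float]],
    targets: Sequence[float],
    learning_rate: float = 0.05,
    max_epochs: int = 2000,
    tolerance: float = 1e-10,
) -> Tuple[List[float], float, float]:
    """Fit y = w . x + b by gradient descent on the mean squared error."""
    n = len(inputs)
    weights = [0.0] * len(inputs[0])  # step 2: initialise parameters
    bias = 0.0
    loss = float("inf")
    for _ in range(max_epochs):  # step 6: iterate
        # Step 3: forward propagation.
        predictions = [sum(w * x for w, x in zip(weights, row)) + bias for row in inputs]
        # Step 4: loss between predicted and reference values.
        errors = [p - t for p, t in zip(predictions, targets)]
        loss = sum(e * e for e in errors) / n
        if loss < tolerance:  # treated as convergence
            break
        # Step 5: gradients of the loss, then a gradient descent update.
        grad_w = [
            2.0 * sum(e * row[j] for e, row in zip(errors, inputs)) / n
            for j in range(len(weights))
        ]
        grad_b = 2.0 * sum(errors) / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias, loss


# Step 1: toy training data generated from y = 2*x1 - x2 + 0.5 (hypothetical).
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
train_y = [0.5, 2.5, -0.5, 1.5, 3.5]

weights, bias, final_loss = train_linear_decoder(train_x, train_y)
print(weights, bias, final_loss)
```

On this toy data the learned parameters approach the generating values 2, -1, and 0.5; with a real extractor and decoder the same loop structure applies, but the gradients are obtained by back propagation through all layers.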
It is worth mentioning that back propagation with gradient descent is an optimization procedure for training a neural network: it computes the gradient of the loss function with respect to the network parameters and updates the parameters in the direction opposite to the gradient, thereby minimizing the loss function. Training proceeds in two phases: forward propagation and back propagation. In forward propagation, the input data is passed through the layers of the neural network one by one; each neuron computes its output from its inputs and its activation function and passes that output to the next layer. In back propagation, the gradient of the loss function with respect to the output of the output layer is calculated first; the gradient is then propagated backward, layer by layer, according to the chain rule to obtain the gradient for each layer; finally, each parameter is updated by gradient descent to reduce the loss function. The core idea of the back propagation algorithm is to propagate the gradient backward from the output layer by the chain rule, so that the gradient of the loss function with respect to every parameter is calculated efficiently. In this way, the network updates its parameters step by step and gradually optimizes the performance of the model. The back propagation algorithm enables the neural network to learn a complex mapping between input and output and to keep adjusting its parameters during training to improve the accuracy and performance of the model.
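The chain rule at the heart of back propagation can be checked on a network small enough to differentiate by hand. The sketch below assumes a hypothetical two-parameter network, h = tanh(w1 * x) followed by y = w2 * h, with a squared error loss; it compares the analytically back-propagated gradients with central finite differences. None of these choices come from the source.

```python
import math
from typing import Tuple


def forward(w1: float, w2: float, x: float, target: float) -> Tuple[float, float, float]:
    """Forward propagation: hidden activation, output, and squared error loss."""
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = (y - target) ** 2
    return h, y, loss


def backward(w1: float, w2: float, x: float, target: float) -> Tuple[float, float]:
    """Back propagation: apply the chain rule from the loss back to w2 and w1."""
    h, y, _ = forward(w1, w2, x, target)
    d_y = 2.0 * (y - target)          # dL/dy at the output layer
    d_w2 = d_y * h                    # dL/dw2 = dL/dy * dy/dw2
    d_h = d_y * w2                    # gradient passed back to the hidden layer
    d_w1 = d_h * (1.0 - h * h) * x    # tanh'(z) = 1 - tanh(z)^2, and dz/dw1 = x
    return d_w1, d_w2


def numeric_gradients(w1: float, w2: float, x: float, target: float,
                      eps: float = 1e-6) -> Tuple[float, float]:
    """Central finite differences, used only to verify the analytic gradients."""
    g1 = (forward(w1 + eps, w2, x, target)[2] - forward(w1 - eps, w2, x, target)[2]) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, target)[2] - forward(w1, w2 - eps, x, target)[2]) / (2 * eps)
    return g1, g2


analytic = backward(0.3, -0.7, 1.5, 0.2)
numeric = numeric_gradients(0.3, -0.7, 1.5, 0.2)
print(analytic, numeric)
```

The two gradient pairs agree to within finite-difference error, which is the usual sanity check for a hand-written backward pass.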
In a specific example, as shown in FIG. 5, the training step includes: S210, acquiring training data, wherein the training data includes training research and development process data and a reference value of the research and development efficiency metric value; S220, performing statistics-based feature extraction on the training research and development process data to obtain a plurality of training research and development process statistical features; S230, processing the plurality of training research and development process statistical features with the correlation feature extractor based on the one-dimensional convolutional neural network model to obtain a training inter-index correlation pattern feature vector of the research and development process; S240, passing the training inter-index correlation pattern feature vector through the decoder to obtain a training decoding loss function value; and S250, training the correlation feature extractor based on the one-dimensional convolutional neural network model and the decoder by back propagation with gradient descent, using the training decoding loss function value as the loss function value.
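Steps S230 and S240 can be sketched as a single forward pass that ends in a loss value. The example assumes a single one-dimensional convolution kernel with a ReLU activation as the extractor, a linear decoder, and a squared error against the reference value as the training decoding loss; the source names these components but does not specify them, and every name and number below is hypothetical.

```python
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """'Valid' one-dimensional convolution (cross-correlation form, as used in CNNs)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def training_decoding_loss(
    stat_features: Sequence[float],
    reference_value: float,
    kernel: Sequence[float],
    decoder_weights: Sequence[float],
    decoder_bias: float,
) -> float:
    # S230: local correlation features over neighbouring statistics (assumed ReLU).
    pattern_vector = [max(0.0, v) for v in conv1d(stat_features, kernel)]
    # S240: decode to a scalar and compare with the reference value (assumed squared error).
    decoded = sum(w * v for w, v in zip(decoder_weights, pattern_vector)) + decoder_bias
    return (decoded - reference_value) ** 2


# Hypothetical, already scaled statistics in the order: code lines, defects,
# submission frequency, code complexity.
stats = [1.0, 2.0, 3.0, 4.0]
pattern = conv1d(stats, [0.5, 0.5])
loss = training_decoding_loss(stats, -1.0, [0.5, 0.5], [1.0, 0.0, -1.0], 0.5)
print(pattern, loss)
```

In step S250 this loss value would be differentiated with respect to the kernel and the decoder weights and used for the gradient descent update.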
In the technical solution of the present application, the differences in source data modality among the plurality of training research and development process statistical features are taken into account. Although all of these features are numerical statistics, their feature types and units differ considerably: for example, the number of code lines and the number of defects are absolute counts, the submission frequency is a rate, and the code complexity is a relative degree. Therefore, after the plurality of training research and development process statistical features are arranged into an input vector and passed through the correlation feature extractor based on the one-dimensional convolutional neural network model, the one-dimensional convolution kernel can extract local correlation features, but the resulting training inter-index correlation pattern feature vector still has diverse local feature distributions corresponding to the different statistical feature sources.
Consequently, when the training inter-index correlation pattern feature vector is decoded by the decoder, decoding involves a domain transfer from the feature domain to the decoding target domain, and the diverse feature distributions may differ in transferability; for example, the correlation distribution between statistical features of the same type is considerably more transferable than that between statistical features of different types. It is therefore desirable to adaptively optimize the weight matrix of the decoder with respect to the training inter-index correlation pattern feature vector, so as to improve the training effect of decoding that feature vector, that is, to improve the decoding speed and the accuracy of the decoding result. Accordingly, the applicant of the present application performs cross-domain attention-based transfer optimization on the weight matrix M of the decoder at each iteration of that weight matrix.
In a specific example, training the correlation feature extractor based on the one-dimensional convolutional neural network model and the decoder by back propagation with gradient descent, with the training decoding loss function value as the loss function value, includes: in each iteration of the weight matrix of the decoder, performing a cross-domain attention-based transfer optimization iteration on the weight matrix using the following optimization formula; wherein the optimization formula is:
wherein $M$ is the weight matrix of the decoder, of size $m \times m$; $V_1$ to $V_m$ are the $m$ row vectors of the weight matrix $M$; $\|\cdot\|_2$ denotes the two-norm of a vector; $(\sum_j m_{i,j})$ is the row vector obtained by arranging the sums of the elements of each row vector of the weight matrix $M$; $\mathrm{cov}_1(\cdot)$ and $\mathrm{cov}_2(\cdot)$ each denote a single-layer convolution operation; the product operator in the formula denotes matrix multiplication; $(\cdot)^T$ denotes the transpose operation; and $M'$ denotes the weight matrix of the decoder after the iteration.
Here, the cross-domain attention-based transfer optimization addresses the fact that the feature distribution of the training inter-index correlation pattern feature vector is represented differently in the feature space domain and in the decoding target domain, and that the weight matrix M of the decoder, as a cross-domain representation of these diverse features relative to the feature vector to be decoded, has a correspondingly structured row and column space. By applying convolution operations that give attention to the spatially structured feature distribution of the weight matrix M, the cross-domain transferability of the well-transferring feature distributions is enhanced, while negative transfer from the poorly transferring feature distributions is suppressed. In this way, unsupervised domain-transfer adaptive optimization of the weight matrix M is achieved on the basis of its distribution structure relative to the feature vector to be decoded, which improves the training effect when the training inter-index correlation pattern feature vector is decoded by the decoder.
It should be noted that the weight matrix of the decoder is a parameter of the decoder part in the neural network model, and is used to convert the output of the encoder into the final prediction result or the generated output. In a typical neural network, the decoder is typically made up of a series of fully connected layers. Each fully connected layer has a weight matrix for connecting and computing the input data to the neurons of that layer. In particular, the weight matrix of the decoder is a two-dimensional matrix whose size depends on the input and output dimensions of the layer. Each element represents a connection weight for weighting the input data with the corresponding neuron. The weight matrix of the decoder is updated by a back propagation algorithm during training to enable the decoder to adapt to the characteristics of the input data and generate accurate output results. It should be noted that the weight matrix of the decoder is typically part of the model, rather than a separate matrix, which together with the weight matrices and bias terms of the other layers forms the parameter set of the overall neural network model.
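To make the dependence of the weight matrix size on the layer dimensions concrete, the following sketch lists the weight matrix shape of each fully connected layer and the total parameter count for a given sequence of layer widths. The rows-equal-outputs convention, the function names, and the widths `[8, 4, 1]` are assumptions for illustration, not values taken from the source.

```python
from typing import List, Sequence, Tuple


def layer_shapes(widths: Sequence[int]) -> List[Tuple[int, int]]:
    """Weight matrix shape (outputs, inputs) for each consecutive pair of widths."""
    return [(widths[i + 1], widths[i]) for i in range(len(widths) - 1)]


def parameter_count(widths: Sequence[int]) -> int:
    """Total number of weights plus one bias term per output neuron."""
    return sum(rows * cols + rows for rows, cols in layer_shapes(widths))


# Hypothetical decoder: 8-dimensional feature vector -> 4 hidden units -> 1 decoded value.
decoder_widths = [8, 4, 1]
shapes = layer_shapes(decoder_widths)
total = parameter_count(decoder_widths)
print(shapes, total)
```

Note that a square weight matrix, as in the optimization formula's definition of an m×m matrix M, corresponds to a layer whose input and output widths are equal.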
In summary, the machine learning-based research and development efficiency measurement method according to the embodiments of the present application has been described. The method collects data, defines efficiency indexes based on machine learning, and measures process efficiency with an algorithm model, so that early warning of possible risks can be issued on the basis of the efficiency metrics, thereby improving overall research and development efficiency.
FIG. 6 is a block diagram of a machine learning based development effectiveness metric system 100 in accordance with an embodiment of the present application. As shown in fig. 6, a machine learning based development effectiveness metric system 100 according to an embodiment of the present application includes: a data acquisition module 110 for acquiring development process data, wherein the development process data includes codes, documents, requirements, tests, defects and version control; the feature extraction module 120 is configured to perform statistics-based feature extraction on the development process data to obtain a plurality of development process statistical features, where the plurality of development process statistical features include a code line number, a defect number, a submission frequency, and a code complexity; and a correlation analysis module 130, configured to perform correlation analysis on the plurality of statistical features of the development process using a machine learning model to obtain a development efficiency metric.
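The three-module structure can be sketched as three small classes wired in sequence. This is a simplified illustration under stated assumptions: the `ProcessData` container covers only part of the listed process data, the formulas for the four statistics are one plausible reading of their names, and the placeholder model stands in for the trained extractor and decoder. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ProcessData:
    """Simplified, hypothetical container for collected process data."""
    lines_per_file: List[int]
    defect_ids: List[str]
    commit_days: List[int]               # day index of each submission
    complexity_per_function: List[float]


class DataAcquisitionModule:
    def __init__(self, loader: Callable[[], ProcessData]) -> None:
        self._loader = loader

    def acquire(self) -> ProcessData:
        return self._loader()


class FeatureExtractionModule:
    def extract(self, data: ProcessData) -> List[float]:
        """Return [code lines, defect count, submission frequency, code complexity].

        Assumes every list in `data` is non-empty.
        """
        span_days = max(data.commit_days) - min(data.commit_days) + 1
        return [
            float(sum(data.lines_per_file)),
            float(len(data.defect_ids)),
            len(data.commit_days) / span_days,  # submissions per day
            sum(data.complexity_per_function) / len(data.complexity_per_function),
        ]


class CorrelationAnalysisModule:
    def __init__(self, model: Callable[[Sequence[float]], float]) -> None:
        self._model = model  # stands in for the trained extractor plus decoder

    def analyze(self, features: Sequence[float]) -> float:
        return self._model(features)


def placeholder_model(f: Sequence[float]) -> float:
    """Arbitrary weighted sum, used only so the sketch runs end to end."""
    return 0.001 * f[0] - 0.1 * f[1] + 1.0 * f[2] - 0.05 * f[3]


sample = ProcessData([120, 80], ["D-1", "D-2", "D-3"], [0, 0, 1, 3], [2.0, 4.0])
acquisition = DataAcquisitionModule(lambda: sample)
extraction = FeatureExtractionModule()
analysis = CorrelationAnalysisModule(placeholder_model)

features = extraction.extract(acquisition.acquire())
metric = analysis.analyze(features)
print(features, metric)
```

Keeping the model behind a plain callable means the analysis module does not depend on how the extractor and decoder are implemented or trained.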
Here, it will be appreciated by those skilled in the art that the specific functions and operations of the respective modules in the above-described machine learning-based development efficacy measuring system 100 have been described in detail in the above description of the machine learning-based development efficacy measuring method with reference to fig. 1 to 5, and thus, repetitive descriptions thereof will be omitted.
As described above, the development effectiveness measuring system 100 based on machine learning according to the embodiment of the present application may be implemented in various wireless terminals, for example, a server or the like having a development effectiveness measuring algorithm based on machine learning. In one example, the machine learning based development effectiveness metric system 100 according to embodiments of the present application may be integrated into a wireless terminal as a software module and/or hardware module. For example, the machine learning based development effectiveness metric system 100 may be a software module in the operating system of the wireless terminal or may be an application developed for the wireless terminal; of course, the machine learning based development performance metric system 100 can also be one of a plurality of hardware modules of the wireless terminal.
Alternatively, in another example, the machine learning based development performance metric system 100 and the wireless terminal may be separate devices, and the machine learning based development performance metric system 100 may be connected to the wireless terminal via a wired and/or wireless network and exchange interaction information in accordance with an agreed data format.
Fig. 7 is an application scenario diagram of a machine learning-based development effectiveness metric method according to an embodiment of the present application. As shown in fig. 7, in this application scenario, first, development process data (e.g., D illustrated in fig. 7) including codes, documents, requirements, tests, defects, and version control is acquired, and then, the development process data is input into a server (e.g., S illustrated in fig. 7) in which a machine learning-based development effectiveness metric algorithm is deployed, wherein the server can process the development process data using the machine learning-based development effectiveness metric algorithm to obtain a decoded value representing a development effectiveness metric value.
Furthermore, those skilled in the art will appreciate that the various aspects of the application may be illustrated and described in a number of patentable categories or contexts, including any novel and useful process, machine, product, or material, or any novel and useful improvement thereof. Accordingly, aspects of the application may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the application may take the form of a computer product, comprising computer-readable program code, embodied in one or more computer-readable media.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The foregoing is illustrative of the present application and is not to be construed as limiting thereof. Although a few exemplary embodiments of this application have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this application. Accordingly, all such modifications are intended to be included within the scope of this application as defined in the following claims. It is to be understood that the foregoing is illustrative of the present application and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The application is defined by the claims and their equivalents.

Claims (10)

1. A machine learning based development effectiveness metric method, comprising:
acquiring research and development process data, wherein the research and development process data comprises code, documents, requirements, tests, defects and version control;
performing statistics-based feature extraction on the research and development process data to obtain a plurality of research and development process statistical features, wherein the plurality of research and development process statistical features comprise a number of code lines, a number of defects, a submission frequency and a code complexity; and
performing correlation analysis on the plurality of research and development process statistical features using a machine learning model to obtain a research and development efficiency metric value.
2. The machine learning based development effectiveness metric method of claim 1, wherein performing a statistical-based feature extraction on the development process data to obtain a plurality of development process statistical features comprises:
cleaning and sorting the research and development process data to obtain preprocessed research and development process data; and
performing statistics-based feature extraction on the preprocessed research and development process data to obtain the plurality of research and development process statistical features.
3. The machine learning based development performance metric method of claim 2, wherein performing a correlation analysis on the plurality of development process statistics using a machine learning model to obtain a development performance metric value comprises:
arranging the plurality of research and development process statistical features into a research and development process statistical feature input vector;
performing local neighborhood correlation feature extraction on the research and development process statistical feature input vector using the machine learning model to obtain an inter-index correlation pattern feature vector of the research and development process; and
generating the research and development efficiency metric value based on the inter-index correlation pattern feature vector of the research and development process.
4. The machine learning based development effectiveness metric method of claim 3, wherein the machine learning model is a one-dimensional convolutional neural network model based correlation feature extractor.
5. The machine learning based development effectiveness metric method of claim 4, wherein using the machine learning model to perform local neighborhood correlation feature extraction on the development process statistical feature input vector to obtain a development process inter-index correlation pattern feature vector comprises:
inputting the research and development process statistical feature input vector into the correlation feature extractor based on the one-dimensional convolutional neural network model to obtain the correlation mode feature vector among the research and development process indexes.
6. The machine learning based development performance metric method of claim 5, wherein generating the development performance metric value based on the inter-development process indicator correlation pattern feature vector comprises:
passing the inter-index correlation pattern feature vector of the research and development process through a decoder to obtain a decoded value, wherein the decoded value represents the research and development efficiency metric value.
7. The machine learning based development effectiveness metric method of claim 6, further comprising a training step of training the one-dimensional convolutional neural network model based correlation feature extractor and the decoder.
8. The machine learning based development effectiveness metric method of claim 7, wherein the training step comprises:
acquiring training data, wherein the training data comprises training research and development process data and a reference value of the research and development efficiency metric value;
performing statistics-based feature extraction on the training research and development process data to obtain a plurality of training research and development process statistical features;
processing the plurality of training research and development process statistical features with the correlation feature extractor based on the one-dimensional convolutional neural network model to obtain a training inter-index correlation pattern feature vector of the research and development process;
passing the training inter-index correlation pattern feature vector through the decoder to obtain a training decoding loss function value; and
training the correlation feature extractor based on the one-dimensional convolutional neural network model and the decoder by back propagation with gradient descent, using the training decoding loss function value as the loss function value.
9. The machine learning based development effectiveness metric method of claim 8, wherein training the correlation feature extractor based on the one-dimensional convolutional neural network model and the decoder by back propagation with gradient descent, using the training decoding loss function value as the loss function value, comprises:
in each iteration of the weight matrix of the decoder, performing a cross-domain attention-based transfer optimization iteration on the weight matrix using the following optimization formula;
wherein the optimization formula is:
wherein $M$ is the weight matrix of the decoder, of size $m \times m$; $V_1$ to $V_m$ are the $m$ row vectors of the weight matrix $M$; $\|\cdot\|_2$ denotes the two-norm of a vector; $(\sum_j m_{i,j})$ is the row vector obtained by arranging the sums of the elements of each row vector of the weight matrix $M$; $\mathrm{cov}_1(\cdot)$ and $\mathrm{cov}_2(\cdot)$ each denote a single-layer convolution operation; the product operator in the formula denotes matrix multiplication; $(\cdot)^T$ denotes the transpose operation; and $M'$ denotes the weight matrix of the decoder after the iteration.
10. A machine learning based development effectiveness metric system, comprising:
a data acquisition module for acquiring research and development process data, wherein the research and development process data comprises code, documents, requirements, tests, defects and version control;
a feature extraction module for performing statistics-based feature extraction on the research and development process data to obtain a plurality of research and development process statistical features, wherein the plurality of research and development process statistical features comprise a number of code lines, a number of defects, a submission frequency and a code complexity; and
a correlation analysis module for performing correlation analysis on the plurality of research and development process statistical features using a machine learning model to obtain a research and development efficiency metric value.
CN202310917895.4A 2023-07-25 2023-07-25 Research and development efficiency measurement method and system based on machine learning Pending CN116909910A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310917895.4A CN116909910A (en) 2023-07-25 2023-07-25 Research and development efficiency measurement method and system based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310917895.4A CN116909910A (en) 2023-07-25 2023-07-25 Research and development efficiency measurement method and system based on machine learning

Publications (1)

Publication Number Publication Date
CN116909910A true CN116909910A (en) 2023-10-20

Family

ID=88352921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310917895.4A Pending CN116909910A (en) 2023-07-25 2023-07-25 Research and development efficiency measurement method and system based on machine learning

Country Status (1)

Country Link
CN (1) CN116909910A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116777892A (en) * 2023-07-03 2023-09-19 东莞市震坤行胶粘剂有限公司 Method and system for detecting dispensing quality based on visual detection
CN116777892B (en) * 2023-07-03 2024-01-26 东莞市震坤行胶粘剂有限公司 Method and system for detecting dispensing quality based on visual detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination