WO2022262247A1 - Code defect state determination method, device, equipment, medium and program - Google Patents

Code defect state determination method, device, equipment, medium and program

Info

Publication number
WO2022262247A1
WO2022262247A1 PCT/CN2021/141249 CN2021141249W WO2022262247A1 WO 2022262247 A1 WO2022262247 A1 WO 2022262247A1 CN 2021141249 W CN2021141249 W CN 2021141249W WO 2022262247 A1 WO2022262247 A1 WO 2022262247A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
version
probability
code
defect
Prior art date
Application number
PCT/CN2021/141249
Other languages
English (en)
French (fr)
Inventor
刘珍
赵学亮
余伟
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2022262247A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3692 Test management for test results analysis

Definitions

  • This application relates to the information technology field of financial technology (Fintech), and in particular, but not exclusively, to a code defect state determination method, device, equipment, medium and program.
  • In the related art, the defect state or quality of code data can be evaluated only when the eigenvalues corresponding to the indicators of the code data satisfy a normal distribution.
  • Such evaluation places relatively strict requirements on the distribution of the eigenvalues, so the defect status of arbitrary code data cannot be evaluated and determined.
  • Embodiments of the present application provide a code defect state determination method, device, equipment, medium and program.
  • the method for determining the code defect status provided by the embodiment of the present application can determine the defect status of the code data even when at least one index data of the code data is randomly distributed, thereby realizing the flexible evaluation of the defect status of any code data.
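  • a method for determining a code defect state, the method being executed by an electronic device, the method comprising:
  • determining at least one index data of project data; wherein the project data includes at least one version of code data that realizes project functions, and the index data includes quality defect data of the project data;
  • performing clustering processing on the at least one index data to obtain a clustering result;
  • determining a defect status of the code data based on the clustering result.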
  • the embodiment of the present application also provides a device for determining a code defect state, the device comprising:
  • the first determination module is configured to determine at least one index data of project data; wherein the project data includes at least one version of code data that implements project functions, and the index data includes quality defect data of the project data;
  • a processing module configured to perform clustering processing on the at least one index data to obtain a clustering result;
  • the second determination module is configured to determine the defect state of the code data based on the clustering result.
  • the embodiment of the present application also provides an electronic device, and the electronic device includes:
  • a memory configured to store executable instructions;
  • a processor configured to implement the method for determining a code defect state as described in any one of the preceding items when executing the executable instruction stored in the memory.
  • the embodiment of the present application also provides a computer-readable storage medium, wherein executable instructions are stored in the computer-readable storage medium, and when the executable instructions are executed by a processor, the method for determining a code defect state described in any one of the preceding items can be implemented.
  • the obtained clustering results can still objectively and comprehensively reflect the distribution status between different index data and between different versions of the same index data.
  • on this basis, the defect status of the code data determined based on the clustering results can objectively and accurately reflect the actual distribution of defects in the project data and the change trend of defects across different versions of the code data, and thus objectively and comprehensively reflect the actual defect status of at least one version of the code data. Moreover, since the code defect state determination method provided by the embodiment of the present application does not limit the distribution of the at least one index data, it can evaluate the defect state of any code data and can be applied in a wider range of scenarios.
  • FIG. 1 is a schematic flowchart of a method for determining a code defect state provided in an embodiment of the present application.
  • FIG. 2 is a schematic flow diagram of determining at least one index data of project data provided by the embodiment of the present application.
  • FIG. 3 is a schematic flowchart of performing clustering processing on at least one index data to obtain clustering results provided by the embodiment of the present application.
  • FIG. 4 is a schematic flow diagram of obtaining a clustering result provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of determining the defect state of code data provided by the embodiment of the present application.
  • FIG. 6 is a schematic flow diagram of obtaining the first data set to the nth data set provided by the embodiment of the present application.
  • FIG. 7 is a schematic flow chart for determining the defect probability of code data of the m+1th version provided by the embodiment of the present application.
  • FIG. 8 is another schematic flow chart for determining the defect probability of code data of the m+1th version provided by the embodiment of the present application.
  • FIG. 9 is another schematic flowchart for determining the defect probability of code data of the m+1th version provided by the embodiment of the present application.
  • FIG. 10 is another schematic flowchart of a method for determining a code defect state provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a device for determining a code defect state provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the code defect state determination method provided by the embodiment of the present application includes a series of steps, but is not limited to the steps described.
  • similarly, the code defect state determination device provided by the embodiment of the present application includes a series of modules, but is not limited to the modules explicitly recorded, and may also include modules that need to be configured for obtaining relevant information or for processing based on that information.
  • in the related art, the defect status of code data can only be evaluated when the eigenvalues corresponding to the indicators of the code data satisfy a normal distribution. Such an evaluation method places relatively strict requirements on those eigenvalues, so it cannot accurately evaluate the defect status of arbitrary code data.
  • the quality status assessment of code data in the related art therefore has problems such as many limiting factors that prevent wide adoption, poor flexibility of the assessment method, and insufficient objectivity of the assessment results.
  • an embodiment of the present application provides a method for determining a code defect state, which can be executed by an electronic device.
  • the above-mentioned electronic devices may include terminals and/or servers.
  • the terminals may be thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, programmable consumer electronics products, networked personal computers, small computer systems, and the like.
  • a server may be a small computer system, a mainframe computer system, or a distributed cloud computing environment that includes any of the above, among others.
  • Electronic devices such as servers may include program modules for executing computer instructions.
  • program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computing system storage media including storage devices.
  • the method for determining the code defect state can be implemented by a processor of any of the above electronic devices, and the above processor can be at least one of an application specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field programmable gate array (FPGA), a central processing unit (CPU), a controller, a microcontroller, and a microprocessor. It can be understood that the electronic device implementing the above processor function may also be another device, which is not limited in this embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a method for determining a code defect state provided by an embodiment of the present application. As shown in Figure 1, the method may include steps 101 to 103:
  • Step 101. Determine at least one index data of project data.
  • the project data includes at least one version of code data that realizes the project function; the index data includes quality defect data of the project data.
  • code data may include data written in any programming language.
  • the programming language may include high-level programming languages, such as Java language, C++ language, C language, etc.;
  • the programming language may also include low-level programming languages, such as assembly language;
  • the programming language may also include any scripting language or hardware description language, such as the Very High Speed Integrated Circuit Hardware Description Language (VHDL).
  • the code data may be source code data; for example, the code data may also be executable code data obtained after compiling the source code data.
  • the project function may include at least one of page display function, data upload/download function, data storage function, data query function, and data transmission function, which is not limited in this embodiment of the application.
  • code data of different versions can be managed and counted through version numbers; when there are at least two versions of code data, the code data of the first version and the code data of the second version may differ only in part; for example, if the code data is divided into modules, the difference between the code data of the first version and the code data of the second version may include differences in the code of at least some of the modules.
  • the project data may include code data of at least one version that is in the internal testing stage; it may also include code data of at least one version that has been released.
  • the project data may include code data of at least one version in the research and development process, and may also include code data of at least one version in operation after delivery.
  • the quality defect data may include fault data that occurs during the operation of at least one version of the code data; for example, the quality defect data may be determined by processing at least one version of code data in at least one way, such as internal testing by R&D personnel, testing by professional testers, or business processing; for example, the quality defect data may also include static code scanning bugs of at least one version of code data.
  • the quality defect data may include at least one of static code scanning bugs and defect density.
  • the type and/or quantity of index data may include the type and/or quantity of all or part of the index data of the project data; for example, partial index data may include the index data of some modules in the project data.
  • the index data may include at least one quality defect data of the project data; for example, when the index data includes multiple quality defect data, the index data may be embodied in the form of a matrix. For example, the matrix can be recorded as C_mn, that is, the size of the matrix is m*n, where m and n are both integers greater than or equal to 1; m can correspond to the version number of the code data, and n can correspond to the number of types of index data.
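  • A minimal sketch of the m*n index matrix described above, assuming one row per version (ordered by version number) and one column per index type. The names `INDEX_TYPES` and `build_index_matrix`, the dotted version strings, and all values are hypothetical and only serve the illustration.

```python
from typing import Dict, List

# Hypothetical column order; the text names defect density and static code
# scanning bugs as two possible types of index data.
INDEX_TYPES: List[str] = ["defect_density", "static_scan_bugs"]


def build_index_matrix(per_version: Dict[str, Dict[str, float]]) -> List[List[float]]:
    """Return the matrix as m rows (versions) by n columns (index types)."""
    # Sort numerically so that version "1.10" comes after version "1.9".
    versions = sorted(per_version, key=lambda v: tuple(int(p) for p in v.split(".")))
    return [[per_version[v][t] for t in INDEX_TYPES] for v in versions]


# Made-up index data for three versions.
data = {
    "1.10": {"defect_density": 0.8, "static_scan_bugs": 12.0},
    "1.2": {"defect_density": 1.4, "static_scan_bugs": 20.0},
    "1.9": {"defect_density": 1.1, "static_scan_bugs": 15.0},
}
C = build_index_matrix(data)
```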
  • Step 102. Perform clustering processing on at least one index data to obtain a clustering result.
  • the clustering process may be implemented by a general clustering method.
  • for example, general clustering methods include K-means clustering, mean shift clustering, and density-based clustering.
  • performing clustering processing on at least one type of index data may consist of sorting the index data according to the version number of at least one version of code data, and then performing clustering processing according to the sorting result.
  • for example, the index data of each version can be clustered separately according to the version number of the code data; alternatively, the sorted index data can be clustered together in a unified clustering process.
  • through clustering processing, the distribution characteristics of at least one index data can be extracted even when that index data is randomly distributed. Therefore, even when at least one index data does not satisfy a normal distribution, its distribution state can still be extracted objectively and accurately, which establishes an objective data basis for determining the defect state of the project data.
  • Step 103. Determine the defect state of the code data based on the clustering result.
  • the defect state of the code data may include the severity and/or number of defects of the code data, and may also include the probability that the code data has a defect of a specified severity.
  • the defect state of the code data may be determined from the distribution of the index data in each cluster of the clustering result; for example, the distribution of the index data may represent the number, the distribution density, and so on of the index data in each cluster.
  • the defect state of the code data can be determined according to the version number of the code data and by analyzing and processing the index data in each cluster of the clustering result. In this way, the defect state corresponding to each version of the code data can reflect the change trend of the defect state of the project data along the version number dimension, and thus objectively show, as a whole, how defects change across at least one version of the code data.
  • the defect state of the code data may be reflected by the defect state of any version of the code data.
  • the defect status of any version of the code data may include at least one of the following: the probability of a defect event or failure occurring when that version of the code data implements at least one function, the probability that any module of that version of the code data has a potential failure, and the probability of a failure of a specified level occurring while that version of the code data is running; this is not limited in this embodiment of the present application.
  • the defect status of any version of the code data may also include the number of times that version of the code data is expected to have defects of a specified severity level during operation, the probability that a fault occurring during the operation of that version of the code data is associated with a defect of the specified severity, and so on.
  • the obtained clustering results can still objectively and comprehensively reflect the distribution status between different types of index data and between different versions of the same index data. On this basis, the defect status of the code data determined based on the clustering results can objectively and accurately reflect the actual distribution of defects in the project data and the changing trend of defects across different versions of the code data, and thus objectively and comprehensively reflect how the quality of at least one version of the code data changes. Moreover, since the code defect state determination method provided by the embodiment of the present application does not limit the distribution of the at least one index data, it can evaluate the status and quality of arbitrary code data and can therefore be applied in a wider range of scenarios.
  • determining at least one index data of the project data can be realized through the flow shown in FIG. 2, which is a schematic flowchart of determining at least one index data of project data provided by the embodiment of the present application. As shown in FIG. 2, the flow may include steps 1011 to 1013:
  • Step 1011. Acquire at least two original index data of any type of the project data.
  • the first type index data may be defect density; the second type index data may be static code scanning bug.
  • Step 1012. Determine weight information corresponding to each original index data in at least two original index data of any type.
  • the weight information corresponding to different original index data may be different.
  • the weight information corresponding to the original index data may be set according to the level information of the original index data;
  • the level information of the original index data may be determined according to the degree to which the original index data affects the quality or status of the project data, according to the test target of the project data, or according to the stage of the project data.
  • the levels of the original index data may include five levels from L1 to L5; correspondingly, there may also be five pieces of weight information corresponding to the above five levels.
  • Step 1013. Based on the weight information, perform weighting processing on each original index data to determine the index data of any type.
  • the nth type of index data may be determined by weighted summation of at least two original index data of the nth type.
  • the original index data of the defect density type may include defects of five levels from L1 to L5, where the weight corresponding to the L1 defect level may be 1.6, the weight corresponding to the L2 defect level may be 1.3, the weight corresponding to the L3 defect level may be 1, the weight corresponding to the L4 defect level may be 0.7, and the weight corresponding to the L5 defect level may be 0.4; for example, based on the above defect levels and their corresponding weights, the number of defects at each level can be weighted to determine the index data of the defect density type; for example, the index data of the defect density type can also be determined through formula (1):
  • where I is the number of defect levels, and its value can be 5; a_i is the total number of defects of the i-th defect level; b_i is the weight corresponding to the i-th defect level;
  • P_mn can represent the index data of the n-th type in the code data of the m-th version, namely the index data of the defect density type.
  • the original index data of the defect density type and the total number of test cases can be obtained through the Descon Project Management System (DPMS); for example, through the DPMS, a collection of test cases and defect data associated with at least one code data can be obtained.
  • the static code scanning bugs can be acquired through the SonarQube plug-in embedded in the continuous integration (CI) platform.
  • through the above plug-in, potential or obvious errors in the source code corresponding to the code data can be detected when the code data is built.
  • the above errors can be divided into levels according to severity; for example, static code scanning bugs can be divided into blocker, critical, major, minor, and info levels; for example, the index data of the static code scanning bug type can be determined by formula (2):
  • where X_j is the number of static code scanning bugs at the j-th level;
  • W_j is the weight of static code scanning bugs at the j-th level;
  • J is an integer greater than 1, which represents the number of levels of static code scanning bugs;
  • Q_mn can represent the index data of the n-th type in the code data of the m-th version, that is, the index data of the static code scanning bug type.
  • the above steps only show the process of determining the corresponding type of index data based on two types of original index data.
  • the original index data of the project data can also include other types, which is not limited in this embodiment of the application.
  • for original index data of another type, the weight information corresponding to each level of the original index data of that type can likewise be determined, and each original index data can then be weighted based on the weight information to determine the index data of the corresponding type. That is to say, in the embodiment of the present application, after original index data of any type is obtained, it can be weighted according to its level, so that the result of the weighting process carries the level information of the original index data. In the actual project analysis process, adjusting the weight information of the different levels of original index data enables targeted analysis of some types of original index data, thereby improving the flexibility of determining the defect status of code data.
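  • The level-based weighting of steps 1011 to 1013 can be sketched as a weighted sum of per-level counts. The L1 to L5 weights are the example values given in the text; the function name `weighted_index`, the static-scan weights, the counts, and the division by the total number of test cases are assumptions made for illustration, since formulas (1) and (2) themselves are not reproduced in this text.

```python
from typing import Mapping

# Example weights for defect levels L1..L5 as stated in the text.
DEFECT_LEVEL_WEIGHTS = {"L1": 1.6, "L2": 1.3, "L3": 1.0, "L4": 0.7, "L5": 0.4}

# Hypothetical weights for static code scanning bug levels (not from the text).
SCAN_LEVEL_WEIGHTS = {"blocker": 2.0, "critical": 1.5, "major": 1.0, "minor": 0.5, "info": 0.1}


def weighted_index(counts: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of per-level counts; levels absent from counts count as zero."""
    return sum(weights[level] * counts.get(level, 0) for level in weights)


# Made-up defect counts for one version of code data.
defect_counts = {"L1": 1, "L2": 2, "L3": 5, "L4": 3, "L5": 10}
weighted_defects = weighted_index(defect_counts, DEFECT_LEVEL_WEIGHTS)

# Assumption: the defect density divides the weighted defect count by the
# total number of test cases; the text only says that this total is collected.
total_test_cases = 200
defect_density = weighted_defects / total_test_cases

# Made-up static code scanning bug counts for the same version.
scan_counts = {"blocker": 1, "major": 4}
scan_index = weighted_index(scan_counts, SCAN_LEVEL_WEIGHTS)
```

  • Raising or lowering a single entry in a weight table is enough to emphasize one level in the resulting index data, which is the flexibility the text attributes to the weighting step.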
  • FIG. 3 is a schematic flowchart of performing clustering processing on at least one type of index data to obtain clustering results provided by the embodiments of the present application.
  • the method may include step 1021 to step 1022:
  • Step 1021. Analyze each type of index data in at least one type of index data, and determine initial centroid data of each type of index data.
  • the initial centroid data includes at least two of the maximum value, minimum value, average value, mode and median of each index data.
  • the number of types of index data of each version of code data may be n.
  • the number of each of the maximum value, minimum value, average value, and mode of each type of index data may be at least one.
  • each data in the initial centroid data F_n of the n-th index data set may have two-dimensional coordinate components, namely F_nx and F_ny.
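  • A minimal sketch of step 1021, assuming the index data of one type is available as a list of scalar values. It computes the five candidate statistics named above and picks two of them as initial centroids. The function name and the sample values are hypothetical, and the two-dimensional coordinates mentioned in the text are left out because their axes are not specified here.

```python
import statistics
from typing import Dict, List


def initial_centroid_candidates(values: List[float]) -> Dict[str, float]:
    """Candidate initial centroids for one type of index data."""
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "mode": statistics.mode(values),
        "median": statistics.median(values),
    }


# Made-up index data of one type across five versions.
values = [3.0, 5.0, 5.0, 8.0, 14.0]
candidates = initial_centroid_candidates(values)

# The text requires at least two of the five statistics; minimum and maximum
# are chosen here purely as an example.
initial_centroids = [candidates["min"], candidates["max"]]
```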
  • Step 1022. Based on the initial centroid data, perform clustering processing on each index data to obtain a clustering result.
  • a clustering method may be determined first, and according to the clustering method, each index data may be clustered based on the initial centroid data, so that a clustering result may be obtained.
  • the same clustering method may be used for different types of index data, or different clustering methods may be used, which is not limited in this embodiment of the present application.
  • the clustering result used to determine the defect state of the code data is obtained by clustering at least one index data; through clustering processing, the defect characteristics carried by the at least one index data can be extracted, which provides a data basis for determining the defect status of the code data.
  • clustering each index data based on the initial centroid data to obtain the clustering result can be realized through the flow shown in FIG. 4, which is a schematic flowchart of obtaining a clustering result provided by the embodiment of the present application. As shown in FIG. 4, the flow may include steps A1 to A4:
  • Step A1. Determine the first distance information between any index data in each type of index data and each data of the initial centroid data.
  • the first distance information may represent any one of the Euclidean distance, Manhattan distance, Chebyshev distance, and power distance between any index data in each index data and each data in the initial centroid data.
  • each data in the n-th initial centroid data of the n-th type of index data set may have two-dimensional coordinate components, namely (F_nx, F_ny); correspondingly, any index data in each index data can also have two-dimensional coordinate components. It should be noted that the two-dimensional coordinate system in which the initial centroid data is located can be the same as the one in which any index data is located; if the two coordinate systems differ, a conversion according to the ratio between the two coordinate systems is required.
  • the two-dimensional coordinate components of c_mn may be (c_mnx, c_mny).
  • the first distance information D(c_mn, F_na) between the n-th index data corresponding to the code data of the m-th version and the corresponding initial centroid data can be determined by formula (3):
  • where a is an integer greater than or equal to 1 and less than or equal to A; A is the number of data in the initial centroid data, and A is an integer greater than or equal to 2 and less than or equal to 5;
  • F_na is the a-th centroid data in the n-th initial centroid data F_n; (F_nax, F_nay) are the two-dimensional coordinate components of F_na.
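  • Three of the distance options named for the first distance information can be sketched for two-dimensional points as follows. Which of them formula (3) uses is not stated in this text, so the sketch makes no claim about it; the function names and the sample points are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def euclidean(p: Point, q: Point) -> float:
    """Straight-line distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p: Point, q: Point) -> float:
    """Sum of the absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chebyshev(p: Point, q: Point) -> float:
    """Largest absolute coordinate difference."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


# Made-up coordinates for one index data c_mn and one centroid F_na.
c_mn = (3.0, 4.0)
F_na = (0.0, 0.0)
distances = {
    "euclidean": euclidean(c_mn, F_na),
    "manhattan": manhattan(c_mn, F_na),
    "chebyshev": chebyshev(c_mn, F_na),
}
```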
  • Step A2. Based on each first distance information, perform clustering processing on each index data to obtain an intermediate clustering result.
  • the intermediate clustering result can be obtained in the following manner: examine each first distance information corresponding to the n-th type of index data, select the index data whose first distance information is less than the distance threshold, and group these index data into one cluster; the cluster so obtained is the intermediate clustering result.
  • the above operations may be repeatedly performed, so that multiple intermediate clustering results may be obtained, and each intermediate clustering result may include at least one cluster.
  • the number of index data in the cluster and the centroid of the cluster will also change dynamically.
  • Step A3. Based on the intermediate clustering result, update the first distance information.
  • the number of index data contained in the w-th cluster of the t-th intermediate clustering result may differ from the number of index data contained in the w-th cluster of the (t-1)-th intermediate clustering result. Therefore, the centroid of the w-th cluster changes from the (t-1)-th intermediate clustering result to the t-th intermediate clustering result, and the distance between any index data in the w-th cluster and the centroid, that is, the first distance information, changes as the clustering process progresses. The first distance information can therefore be updated based on the intermediate clustering results.
  • here, t is an integer greater than or equal to 1, and w is an integer greater than or equal to 0.
  • Step A4. If the sum of squared errors of the first distance information has not converged, continue to execute steps A2 to A3; once the sum of squared errors of the first distance information converges, the clustering process is complete and the clustering result is obtained.
  • the sum of squared errors of each first distance information converges, which may mean that the sum of squared errors of each first distance information is relatively stable without significant changes.
  • the sum of squared errors S_n of the first distance information corresponding to the n-th index data of the m versions of code data can be calculated by formula (4):
  • where b is an integer greater than 1;
  • c_bn is the n-th index data in the code data of the b-th version.
  • as the clustering process progresses, if the variation of S_n is less than a preset threshold, it may be determined that the clustering process has ended.
  • the condition for ending the clustering is strictly controlled, so that the clustering effect can be improved.
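  • The loop of steps A1 to A4 can be sketched in the style of K-means, assuming Euclidean distance and two-dimensional index data. Note one substitution: the sketch assigns each index data to its nearest centroid, which is the standard K-means rule, instead of the distance-threshold grouping described above. The function name, the tolerance `tol`, the iteration cap `max_iter`, and the sample points are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def cluster(points: List[Point], centroids: List[Point],
            tol: float = 1e-6, max_iter: int = 100
            ) -> Tuple[List[int], List[Point], float]:
    """Iterate until the sum of squared errors changes by less than tol."""
    centroids = list(centroids)
    labels: List[int] = []
    sse = math.inf
    prev_sse = math.inf
    for _ in range(max_iter):
        # Steps A1/A2: compute distances and assign to the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda a: math.dist(p, centroids[a]))
            for p in points
        ]
        # Step A3: move each centroid to the mean of its current members.
        for a in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == a]
            if members:
                centroids[a] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
        # Step A4: stop once the sum of squared errors is stable.
        sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if abs(prev_sse - sse) < tol:
            break
        prev_sse = sse
    return labels, centroids, sse


# Made-up index data and initial centroids.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centroids, sse = cluster(points, [(1.0, 1.0), (9.0, 8.0)])
```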
  • the at least one index data includes n types of index data;
  • the clustering result includes the first result to the n-th result, where, for i ranging from 1 to n, the i-th result is the clustering result of the i-th index data; the project data includes code data from version 1 to version m; the defect status of the code data includes the defect probability of the code data of version m+1; the defect probability of the code data of version m+1 includes the probability that a defect event of any type occurs in the code data of version m+1 when the quantization result corresponding to the clustering result of the at least one index data appears; n is an integer greater than or equal to 1; m is an integer greater than or equal to 2.
  • the nth type of index data may include multiple index data; correspondingly, the nth result may include at least one cluster.
  • determining the defect state of the project data can be realized through the process shown in FIG. 5.
  • the process may include steps B1 to B3:
  • Step B1. Perform statistics on the clustering results to determine quantization interval information.
  • the quantized interval information represents the interval distribution information of the distance between the index data and the centroid in the first result to the nth result.
  • the clustering results are counted, which means that the clustering results corresponding to the index data containing m versions of code data are counted, that is, the quantization interval information is based on m versions of code data
  • the quantitative interval information determined can statically reflect the defect type of the code data itself, and on the other hand, it can also dynamically reflect the change trend of the index data in the code data with the version number. Therefore, the quantitative interval information determined in the above manner can more objectively reflect the objective distribution state of the index data of the m versions of code data.
  • the interval information carried in the quantized interval information and the quantity of the interval information carried in the quantized interval information may be determined by statistically determining the distance between all index data and the centroid of the cluster.
  • The quantization interval information may include a set of at least one interval; the quantization interval information corresponding to different types of index data may be different or the same, which is not limited in the embodiments of the present application.
  • The distance range covered by any interval in the quantization interval information may be determined according to the stage and/or state of at least one version of the code data. For example, while the developer is self-testing the code data, the quantization interval information can be the first interval information; after the code data is delivered to the testers, it can be the second interval information; and when the code data is released and switched to the running state, it can be the third interval information.
  • The distance range covered by any interval in the quantization interval information may also be determined according to the functions realized by the code data and/or the characteristics of the code data. For example, if the code data mainly realizes underlying functions, the distance ranges covered by the intervals in its quantization interval information may differ from those in the quantization interval information corresponding to upper-layer functions.
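  • A minimal sketch of Step B1 follows. It assumes that the quantization interval information is a list of increasing boundaries derived from the empirical distribution of the distances to a cluster centroid. The equal-frequency quantile rule, the default of five intervals, and the names `distances_to_centroid` and `interval_edges` are illustrative assumptions; the embodiment only states that the intervals are determined statistically.

```python
import math
from statistics import quantiles


def distances_to_centroid(points, centroid):
    """Euclidean distance from each 2-D index-data point to its cluster centroid."""
    return [math.dist(p, centroid) for p in points]


def interval_edges(distances, num_intervals=5):
    """Return num_intervals - 1 increasing cut points that split the distances
    into adjacent intervals (assumed rule: equal-frequency quantiles)."""
    return quantiles(distances, n=num_intervals, method="inclusive")


# Hypothetical index data of one cluster and its centroid
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0), (0.0, 2.0), (0.0, 10.0)]
dists = distances_to_centroid(points, centroid=(0.0, 0.0))
edges = interval_edges(dists)
```

  • A stage-specific or function-specific rule, as described above, could replace the quantile rule by returning a different list of boundaries for each stage.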
  • Step B2. Quantize the first to nth results based on the quantization interval information to obtain the first to nth data sets.
  • the coordinate information of the indicator data in each cluster of the first result to the nth result may be quantized according to the quantization interval information, so as to obtain the first data set to the nth data set.
  • The number of quantization intervals included in the quantization interval information can be adjusted flexibly according to the index data, and it determines the value range of the data in each data set. The quantization result can be the number of the quantization interval into which a result falls: for example, if any of the first to nth results falls into the first quantization interval, its quantization result can take the value 1; correspondingly, if the quantization interval information contains five quantization intervals, the value range of each data in the nth data set may be [1, 5].
  • The first result to the nth result are quantized based on the quantization interval information, and the quantization interval information can be determined according to the state and/or characteristics of the code data itself. The characteristics of the code data are therefore fully considered when the first result to the nth result are quantized, so that the quantization results better fit the distribution of the index data of the code data.
  • Step B3 when n is greater than 1, determine the defect probability of the code data of the m+1th version based on the first data set to the nth data set.
  • The defect probability of the code data of the m+1th version may represent the conditional probability that any type of defect event occurs in the code data of the m+1th version when at least one type of index data appears. That is, it is used not only to evaluate the probability of defect events in the code data of the m+1th version, but also to relate the index characteristics of the code data to the defect events that may occur. The defect probability of the m+1th version can therefore reflect, to a certain extent, the impact of at least one type of index data on any defect event.
  • The defect probability of the code data of the m+1th version may be determined from the first to nth data sets obtained after clustering and quantification of the index data of the code data.
  • In this way, the defect probability of the code data of the m+1th version can be predicted, which facilitates early intervention and reduces the probability of defects while the code data is running.
  • The defect probability determined by the code defect state determination method gives, for a given version of the code data, the probability that any type of defect event occurs when one or several types of index data appear, and it also allows the defect probability of the m+1th version to be predicted from m versions of code data. In other words, the quality or state of the existing code data is determined, and the defect probability of the next version of the code data can be predicted.
  • Quantizing the first result to the nth result to obtain the first data set to the nth data set can be realized through the process shown in FIG. 6, which is a schematic flowchart for obtaining the first data set to the nth data set provided by the embodiment of the present application. As shown in FIG. 6, the process may include steps C1 to C2:
  • Step C1. Analyze the first to nth results, and determine the second distance information between any data in the first to nth results and the centroids of the first to nth results.
  • The nth result may contain at least one cluster. Taking an nth result containing K clusters as an example, the kth cluster may contain $n_k$ data, where K may be an integer greater than 1; k is an integer greater than or equal to 1 and less than K; and $n_k$ may be an integer greater than 1.
  • Each data in the kth cluster of the nth result can be expressed in a two-dimensional coordinate system; correspondingly, the second distance information corresponding to the kth cluster in the nth result can be represented in the form of Euclidean distance.
  • Step C2 based on the quantization interval information and the second distance information, quantize the data in the first result to the nth result to obtain the first data set to the nth data set.
  • The quantization of the data in the first to nth results is described below, taking any data in the kth cluster of the nth result as an example. Denote that data as $C_{mn}$ and its distance from the centroid of the kth cluster in the nth result as d. $C_{mn}$ can then be quantized according to which quantization interval d matches.
  • If d falls within the first quantization interval, the quantization result corresponding to $C_{mn}$ can be 1; if d falls within the second quantization interval, the quantization result can be 2; if d falls within the third quantization interval, the quantization result can be 3; if d falls within the fourth quantization interval, the quantization result can be 4; and if d is greater than the maximum value of the fourth quantization interval, the quantization result can be 5.
  • Here the first to fifth quantization intervals are adjacent to one another and increase monotonically.
  • The first data set to the nth data set can be represented in the form of a matrix. The matrix can be denoted $C'_{mn}$, each of its elements can be denoted $c'_{mn}$, and the value range of each element is $c'_{mn} \in [1, 5]$.
  • In this way, the distribution characteristics of the index data can be extracted from randomly distributed index data, and quantizing the clustering results maps index data of arbitrary magnitude to data sets with a limited value range, which reduces the amount of computation for subsequently calculating the defect state of the code data. Moreover, since the index data is quantized based on the quantization interval information, the first to nth data sets obtained after quantization still carry the index characteristics of the index data, which improves the accuracy of the subsequently determined defect state of the code data.
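  • Steps C1 and C2 can be sketched as follows. The sketch assumes two-dimensional index data, Euclidean distance, and adjacent increasing intervals described by their boundaries; the boundary values and the names `quantize_distance` and `quantize_cluster` are hypothetical.

```python
import math
from bisect import bisect_left


def quantize_distance(d, edges):
    """Map a distance to a 1-based interval number.

    edges are the increasing upper boundaries of every interval except the
    last one, so len(edges) + 1 intervals exist; a distance equal to a
    boundary is assigned to the lower interval.
    """
    return bisect_left(edges, d) + 1


def quantize_cluster(points, centroid, edges):
    """Quantize every point of one cluster by its Euclidean distance to the centroid."""
    return [quantize_distance(math.dist(p, centroid), edges) for p in points]


edges = [1.0, 2.0, 3.0, 4.0]   # hypothetical boundaries of five intervals
cluster = [(0.5, 0.0), (0.0, 1.5), (3.0, 4.0), (2.0, 0.0)]
levels = quantize_cluster(cluster, centroid=(0.0, 0.0), edges=edges)
```

  • Applying `quantize_cluster` to every cluster of every result, one version at a time, would fill the matrix of quantification results row by row.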
  • FIG. 7 is a schematic flowchart of determining the defect probability of code data of the m+1th version provided by the embodiment of the present application. As shown in FIG. 7, the process may include steps D1 to D3:
  • Step D1 from the first data set to the nth data set, obtain the quantification results corresponding to the clustering results of the indicator data of the code data of the mth version.
  • The first data set to the nth data set include the quantification results corresponding to the clustering results of the index data of the m versions of the code data. For example, each index data, its clustering result, and the quantification result corresponding to that clustering result can carry the version information of the code data. In this way, the quantification results of the mth version can be selected from the first data set to the nth data set based on the version information.
  • Step D2 based on the quantification results corresponding to the index data of the code data of the m-th version, determine the quality score of the code data of the m-th version.
  • The quantification result corresponding to the clustering result of the index data of the code data of the mth version may include n data. Several of the n data may be selected, and the quality score of the code data of the mth version determined from them. Exemplarily, a weighted sum of the n data can be used as the quality score of the code data of the mth version; alternatively, the n data can be summed directly.
  • The quality score of the code data of the mth version can be denoted H, which can be calculated by formula (5):
  • where $c'_{mp}$ represents the quantification result of the clustering result of the p-th type of index data of the m-th version.
  • the quality score of the code data of the mth version may be the health score of the code data of the mth version.
  • Step D3 if the quality score is greater than the scoring threshold, determine the defect probability of the m+1th version of the code data based on the first data set to the nth data set.
  • If the quality score is less than or equal to the scoring threshold, the problems with the index data in the code data of the m-th version can be identified and reviewed in order to improve the quality of the code data.
  • The scoring threshold can be determined according to the state of the code data, or according to at least one of the functions implemented by the code data and the probability of defect events or failures in the historical versions of the code data, which is not limited in the embodiment of the present application.
  • The defect probability of the code data of the m+1th version can also be determined as follows: sort the data in the first data set to the nth data set by the version number of the code data, perform statistics on the sorted results to obtain the trend with which the quantification results of the index-data clustering change with the version number, and determine the defect probability according to that trend.
  • Because the defect probability of the m+1th version of the code data is determined from the first data set to the nth data set only when the quality score is greater than the scoring threshold, the conditions for calculating that defect probability can be controlled, and the amount of data involved in the calculation can also be reduced.
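  • Steps D2 and D3 can be sketched as follows, assuming that the quality score is either the direct sum or a weighted sum of the n quantification results of the mth version, the two variants named above. The weights, the threshold values, and the function names are hypothetical.

```python
def quality_score(quantized, weights=None):
    """Quality (health) score H of one version from its n quantification results.

    With weights=None the results are summed directly; otherwise a weighted
    sum is used. Both variants are mentioned as examples in the text.
    """
    if weights is None:
        return sum(quantized)
    if len(weights) != len(quantized):
        raise ValueError("one weight per quantification result is required")
    return sum(w * c for w, c in zip(weights, quantized))


def should_predict_next_version(score, scoring_threshold):
    """Step D3 gate: predict only when the score exceeds the threshold."""
    return score > scoring_threshold


row_m = [1, 3, 2, 5]   # hypothetical quantification results of version m
h = quality_score(row_m)
h_weighted = quality_score(row_m, weights=[0.4, 0.3, 0.2, 0.1])
```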
  • Determining the defect probability of the code data of the m+1th version can also be realized through the process shown in FIG. 8, which is another schematic flowchart for determining the defect probability of the code data of the m+1th version provided by the embodiment of the present application. As shown in FIG. 8, the process may include steps E1 to E5:
  • Step E1 acquiring event type information.
  • the event type information indicates the type information of at least one defect event that occurs during the running of any version of the code data.
  • The event type information may indicate the types of defect events or failures that may occur during the running of each version of the code data; that is, the event type information in the embodiment of the application has universal significance for any version of the code data. For example, the event type information can be expressed as a character string or as a number, which is not limited in the embodiment of this application. The event type information may also include the degree of impact of each defect event or failure on the quality of the code data, that is, the level of the defect event or failure.
  • the event type information can also be embodied in the form of a matrix.
  • The number of event types in the event type information can be w.
  • The kinds of event type information can be the same for every version of the code data.
  • The event type information may include a database (DataBase, DB) class, an application programming interface (Application Programming Interface, API) class, a compatibility class, and the like.
  • The statistics of the w types of event type information appearing in the m versions of code data, together with the quantification results corresponding to the clustering results of the n types of index data, can be shown in Table 1.
  • In Table 1, the numbers 1, 2, ..., m on the left side represent the m iteratively released versions of the code data; the numbers 1, 2, ..., n in the header row represent the quantification results obtained after clustering the n types of index data in the m versions of code data; and the numbers 1, ..., w in the header row represent the w types of events in the m versions of code data. $c'_{mn}$ denotes the quantification result corresponding to the clustering result of the nth type of index data in the mth version of the code data, and $g_{mw}$ indicates the wth event type occurring during the running of the mth version of the code data.
  • Step E2 based on the event type information, determine the first probability of occurrence of each type of defect event during the running of the code data.
  • the first probability may be obtained through statistics of event type information that occurs during multiple runs of m versions of code data.
  • The first probability may be the probability $P_{g_w}$ of occurrence of the wth event type $g_w$, which may be determined by formula (6):
  • where q is an integer greater than or equal to 1 and less than or equal to m;
  • $g_{qw}$ indicates whether the wth event type occurred in the qth version: for example, when an event of the wth type occurs in the qth version, the value of $g_{qw}$ is 1; otherwise it is 0.
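  • A minimal sketch of Step E2 follows. It assumes that the first probability is the relative frequency of the indicator over the m versions, which is one reading consistent with the definitions of q and the 0/1 indicator above. The function name and the sample matrix are hypothetical.

```python
def first_probability(event_matrix):
    """Relative frequency of each event type over the m versions.

    event_matrix[q][w] is the indicator g_qw: 1 if event type w occurred
    while version q was running, otherwise 0.
    """
    m = len(event_matrix)
    if m == 0:
        raise ValueError("at least one version is required")
    num_types = len(event_matrix[0])
    return [sum(row[w] for row in event_matrix) / m for w in range(num_types)]


# Hypothetical indicators for m = 4 versions and w = 3 event types
g = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
p_g = first_probability(g)
```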
  • Step E3 based on the first data set to the nth data set, determine the second probability of occurrence of the quantitative result corresponding to the clustering result of each index data in the m versions of the code data.
  • The second probability can be obtained by counting how often each quantification result appears in the index data of the m versions of code data; for example, among the m versions of code data, the second probability of the quantification result corresponding to the clustering result of the nth type of index data can be determined by formula (7):
  • where $K'_n \in \{c'_{1n}, c'_{2n}, \dots, c'_{qn}, \dots, c'_{mn}\}$, and $c'_{qn}$ represents the quantification result corresponding to the clustering result of the nth type of index data of the qth version of the code data.
  • The probabilities with which the different types of index data appear in any version of the code data are mutually independent.
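  • A minimal sketch of Step E3 follows, assuming that the second probability of a quantification result is its relative frequency within one column of quantification results across the m versions. The function name and the sample column are hypothetical.

```python
from collections import Counter


def second_probability(column):
    """Relative frequency of each quantification value in one index column.

    column holds the quantification results of one type of index data over
    the m versions, one entry per version.
    """
    m = len(column)
    if m == 0:
        raise ValueError("at least one version is required")
    counts = Counter(column)
    return {value: count / m for value, count in counts.items()}


column_n = [1, 2, 2, 5]   # hypothetical quantification results of 4 versions
p_k = second_probability(column_n)
```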
  • Step E4 Determine the third probability based on the event type information and the first to nth data sets.
  • the third probability is the conditional probability of occurrence of the quantitative result corresponding to the clustering result of at least one index data when any type of defect event occurs in the m versions of the code data.
  • The third probability may be obtained by jointly counting, over the m versions of code data, the occurrences of the various types of defect events and the occurrences of the quantification results corresponding to the clustering results of at least one type of index data.
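  • A minimal sketch of Step E4 follows, assuming that the third probability is estimated as a conditional relative frequency: among the versions in which a given event type occurred, the share in which a given quantification result appeared. The function name and sample data are hypothetical.

```python
def third_probability(column, events, value):
    """Estimate the probability of a quantification value given that the event occurred.

    column[q] is the quantification result of one index type in version q,
    events[q] is the 0/1 indicator of one event type in version q. Returns
    None when the event type never occurred, because the conditional
    frequency is then undefined.
    """
    if len(column) != len(events):
        raise ValueError("column and events must cover the same versions")
    with_event = [c for c, g in zip(column, events) if g == 1]
    if not with_event:
        return None
    return with_event.count(value) / len(with_event)


column_n = [1, 2, 2, 5]   # hypothetical quantification results of 4 versions
g_w = [0, 1, 1, 1]        # hypothetical event indicators of the same versions
p = third_probability(column_n, g_w, value=2)
```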
  • Step E5 based on the first probability, the second probability and the third probability, determine the defect probability of the code data of the m+1th version.
  • the defect probability of the m+1th version of the code data may be determined by calculating the first probability, the second probability and the third probability by means of probability theory in statistics.
  • The defect probability of the m+1th version of the code data can also be determined as follows: perform statistics on the event type information and the first to nth data sets to determine the association between the event type information and the different index data, as well as the trend with which that association changes with the version number of the code data, and then determine the defect probability of the m+1th version of the code data according to the association and its trend.
  • The defect probability of the m+1th version of the code data is determined using, as its data basis, the first to nth data sets obtained after clustering and quantification of the index data of at least one version of the code data, together with the event type information that has universal significance in the code data. That is, the process of determining the defect state of the m+1th version of the code data takes into account the correlation between the occurrence of any type of defect event and the appearance of at least one type of index data, so that the defect probability can reflect the quality status of the m+1th version of the code data more finely, from the causal relationship between the index data and the defect events.
  • The defect probability of the code data of the m+1th version is calculated from the event type information in the m versions of code data and the quantification results of the clustering results of at least one type of index data. The defect probability calculated through the above steps can therefore objectively and comprehensively reflect the relationship between the event type information and at least one type of index data.
  • Determining the defect probability of the code data of the m+1th version can be achieved in the following manner:
  • the first probability, the second probability and the third probability are combined to obtain the defect probability of the code data of the m+1th version,
  • where $P_1$ is the first probability, $P_2$ is the second probability, $P_3$ is the third probability, and $P_s$ is the defect probability of the code data of the m+1th version.
  • Exemplarily, $P_1$ may be the first probability, that is, $P_{g_w}$; $P_2$ may be the second probability; $P_3$ may be $P(K'_n \mid g_w)$; and $P(g_w \mid (K'_1, K'_2, \dots, K'_n))$ is $P_s$, which can be calculated by formula (9):
  • The method for determining the code defect state provided by the embodiment of the present application can efficiently determine the defect probability of the code data of the m+1th version. Moreover, because this defect probability is calculated through NBC from the index data and event type information of the code data of the first m versions, it can more objectively and accurately reflect the actual state of the code data of the m+1th version.
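  • A hedged sketch of Step E5 follows. It assumes that NBC refers to a naive Bayes classifier and that, under the independence of the index data stated above, the posterior is the first probability multiplied by the product of the third probabilities and divided by the product of the second probabilities. The function name, the cap at 1, and the numeric inputs are assumptions made for illustration.

```python
import math


def defect_probability(p1, p2_list, p3_list):
    """Combine the three probabilities with Bayes' rule under independence.

    p1      -- first probability of the event type
    p2_list -- second probabilities of the observed quantification results
    p3_list -- third probabilities of the same results given the event type
    """
    if len(p2_list) != len(p3_list):
        raise ValueError("one second and one third probability per index is required")
    evidence = math.prod(p2_list)
    if evidence == 0:
        raise ValueError("observed quantification results have zero probability")
    posterior = p1 * math.prod(p3_list) / evidence
    # Frequency estimates can be mutually inconsistent; cap at 1 (an assumption).
    return min(1.0, posterior)


p_s = defect_probability(p1=0.25, p2_list=[0.5, 0.5], p3_list=[0.5, 0.75])
```

  • Evaluating the function once per event type would give one defect probability per type of defect event for the m+1th version.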
  • the defect probability of the m+1th version of the code data is less than the probability threshold, it can be determined that the m+1th version of the code data is in a stable state and can be released directly.
  • the probability threshold may be determined according to the stage or state of the code data, or may be determined according to the expected state of code data release, which is not limited in the embodiment of the present application.
  • the probability threshold may be 50%.
  • The processing operation to be performed on the code data of the mth version may be determined based on the defect probability of the code data of the m+1th version. For example, the processing operations on the code data of the mth version can include review and walkthrough; for DB event types, this can include reviewing the sqlscan scan results of the change scripts, checking whether the condition fields use indexes, and checking the execution plans with explain.
  • Determining the defect probability of the code data of the m+1th version can also be realized through the process shown in FIG. 9, which is another schematic flowchart for determining the defect probability of the code data of the m+1th version provided by the embodiment of the present application. As shown in FIG. 9, the process may include steps F1 to F4:
  • Step F1 acquiring event type information.
  • the event type information indicates the type information of at least one defect event that occurs during the running of any version of the code data.
  • Step F2 based on the first to nth data sets and event type information, train the decision tree model to obtain a trained decision tree model.
  • The first to nth data sets and the event type information can be divided into a training sample set and a test sample set. The decision tree model is trained on the training sample set to obtain training results, and the training results are tested on the test sample set.
  • The decision tree model is trained continuously on the training sample set until the training results pass the test on the test sample set.
  • The training process of the decision tree model is then ended to obtain the trained decision tree model.
  • the decision tree model may be a composite tree model, such as an XGBoost model.
  • Step F3 obtaining the quantification result corresponding to the clustering result of at least one index data corresponding to the code data of the m+1th version.
  • the quantification result corresponding to the clustering result of at least one index data corresponding to the code data of the m+1th version can be obtained by the same method as the foregoing embodiment, which will not be repeated here.
  • Step F4, based on the trained decision tree model, process the event type information and the quantification results corresponding to the clustering results of at least one type of index data of the code data of the m+1th version, and determine the defect probability of the code data of the m+1th version.
  • The quantification results corresponding to the clustering results of at least one type of index data of the code data of the m+1th version, together with the event type information, can be input into the trained decision tree model for processing, so that the defect probability of the code data of the m+1th version can be determined.
  • The training process of the decision tree model requires a large amount of computation, and overfitting may occur.
  • Moreover, when the probabilities of occurrence of the index data are mutually independent, the accuracy of the defect probability of the code data of the m+1th version calculated by the trained decision tree model may be lower than the accuracy of the defect probability calculated by the NBC method.
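  • A toy sketch of Steps F2 to F4 follows. It uses a depth-one decision tree (a decision stump) written in plain Python as a stand-in for the decision tree model; a real deployment would more likely use a library model such as the XGBoost model mentioned above. The feature layout (one quantification result per index type), the labels, the 0.5 cut-off, and all function names are hypothetical.

```python
def train_stump(rows, labels):
    """Fit a depth-one decision tree: pick the (feature, threshold) split that
    misclassifies the fewest training rows, and store the defect frequency
    observed on each side of the split as the leaf probability."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            p_left, p_right = sum(left) / len(left), sum(right) / len(right)
            errors = sum(min(sum(s), len(s) - sum(s)) for s in (left, right))
            if best is None or errors < best[0]:
                best = (errors, f, t, p_left, p_right)
    if best is None:
        raise ValueError("no valid split: all rows are identical")
    return best[1:]


def predict_probability(stump, row):
    """Defect probability of one version according to the trained stump."""
    f, t, p_left, p_right = stump
    return p_left if row[f] <= t else p_right


# Hypothetical samples: quantification results of two index types per
# version, and whether a defect event occurred (1) or not (0).
train_rows = [[1, 2], [2, 1], [1, 1], [4, 5], [5, 4], [5, 5]]
train_labels = [0, 0, 0, 1, 1, 1]
test_rows, test_labels = [[2, 2], [4, 4]], [0, 1]

stump = train_stump(train_rows, train_labels)
test_hits = sum(
    (predict_probability(stump, r) > 0.5) == bool(y)
    for r, y in zip(test_rows, test_labels)
)
p_next = predict_probability(stump, [5, 3])   # quantified indexes of version m+1
```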
  • When determining the defect probability of the code data of the m+1th version, the method for determining the code defect state provided by the embodiment of the present application makes full use of the data characteristics of at least one type of index data of the m versions. Moreover, clustering the index data yields the distribution of each type of index data, so that the defect probability of the code data of the m+1th version can be determined accurately and objectively even when the index data is randomly distributed. The method can therefore also reduce the difficulty of determining the defect probability of the code data and make the determination of the defect state of the code data more flexible.
  • FIG. 10 is another schematic flowchart of a method for determining a code defect state provided by an embodiment of the present application. As shown in Figure 10, the method may include the following steps:
  • Step 301, acquire at least the index data from the R&D quality platform (Quality Management System, QMS).
  • The R&D quality platform can rely on DPMS, CI, the test platform and the like to collect and provide at least the index data. Exemplarily, for the code data of the first m versions, both the index data and the event type information can be obtained from the QMS.
  • Step 302 cleaning historical data.
  • a common data cleaning method may be used to remove redundant data from historical data.
  • Step 303 acquiring iterative operation data.
  • the iterative operation data may include code data that has been switched to a delivery state and continued to operate for a period of time, and defect event data generated during operation.
  • Step 304 acquire event tags.
  • the event tag is equivalent to the event type information in the foregoing embodiments.
  • the event tags here may include tags of all events generated during the operation of the code data, and may also include tags of some events generated during the operation of the code data.
  • Step 305 acquiring iteration process data.
  • the iterative process data may include original indicator data in at least one version of the code data.
  • Step 306 determining index data.
  • the index data here can be determined by processing the original index data using the methods provided in the foregoing embodiments, and several types of index data can also be selected from at least one type of index data.
  • Step 307, automatically obtain the initial centroid through descriptive statistical analysis.
  • the initial centroid here may be the initial centroid data in the foregoing embodiments.
  • the descriptive statistical analysis may use several key data to describe the overall situation of the indicator data.
  • If the index data belongs to the code data of the historical versions, step 308 may be performed; if the index data belongs to the code data of the new version released iteratively, step 314 may be performed.
  • Step 308 calculating the first probability, the second probability and the third probability.
  • Step 309 calculating the defect probability of the new version.
  • the defect probability of the new version here may be the defect probability of the code data of the m+1th version obtained based on the calculation of the first probability, the second probability and the third probability by NBC.
  • Step 310 judging whether the defect probability is greater than 50%.
  • 50% here may be the probability threshold described in the foregoing embodiments.
  • If the defect probability is greater than 50%, step 311 can be performed; otherwise, step 312 can be performed.
  • Step 311, perform a targeted review and walkthrough of the code data and handle the problems found.
  • After step 311, step 312 can be executed.
  • Step 312 iteratively publishing the new version of the indicator data.
  • the new version of the indicator data can be obtained from the QMS platform. And through steps 305 to 307, the new version of the index data is analyzed, the initial centroid is determined, and clustering is performed.
  • Step 313 acquiring new version of index data.
  • Step 314 generate index scores and calculate project health scores.
  • the project health score here may be the quality score of the code data of the m+1th version in the foregoing embodiments.
  • Step 315 judging whether the health score is less than the target value.
  • the target value here is equivalent to the scoring threshold in the foregoing embodiments. If the health score is less than the target value, go to step 316; otherwise, go to step 309.
  • Step 316 check the non-standard code data to form a new version of the code data.
  • step 309 may also be executed.
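  • The branch logic of steps 310 to 316 can be sketched as follows. The sketch collapses the flow into a single function and assumes that the health score and the defect probability have already been computed; the action strings and the function name are hypothetical labels, not identifiers from the embodiment.

```python
def next_actions(health_score, target_value, defect_probability,
                 probability_threshold=0.5):
    """Return the actions suggested by steps 315, 316, 310, 311 and 312, in order."""
    actions = []
    if health_score < target_value:                  # step 315
        actions.append("check non-standard code")    # step 316
    if defect_probability > probability_threshold:   # step 310
        actions.append("review and walk through")    # step 311
    actions.append("release new version")            # step 312
    return actions


plan_risky = next_actions(health_score=80, target_value=90, defect_probability=0.7)
plan_clean = next_actions(health_score=95, target_value=90, defect_probability=0.2)
```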
  • The code defect state determination method provided by the embodiment of the present application makes full use of the index data and event type tags related to the code data versions obtained from the QMS platform, and applies clustering algorithms and probability-based statistical analysis to these data. In this way, the correlation between the various index data and the event type tags can be obtained, which lays the basis for determining the defect state of a new version of code data accurately and flexibly.
  • FIG. 11 is a schematic structural diagram of an apparatus 4 for determining a code defect state provided by an embodiment of the present application. As shown in Figure 11, the device includes:
  • the first determination module 401 is configured to determine at least one index data of the project data; wherein, the project data includes at least one version of code data that realizes the project function; the index data includes quality defect data of the project data;
  • the processing module 402 is configured to perform clustering processing on at least one index data to obtain a clustering result
  • the second determination module 403 is configured to determine the defect state of the code data based on the clustering result.
  • the processing module 402 is configured to analyze each type of index data in at least one type of index data, and determine the initial centroid data of each type of index data; wherein, the initial centroid data includes at least two of the maximum value, minimum value, mean, mode, and median of each type of index data;
  • the processing module 402 is further configured to perform clustering processing on each index data based on the initial centroid data to obtain a clustering result.
  • the processing module 402 is configured to determine the first distance information between any index data in each index data and each data of the initial centroid data; based on each first distance information, for each index The data is clustered to obtain an intermediate clustering result; based on the intermediate clustering result, the first distance information is updated;
  • the processing module 402 is configured to, when the sum of squared errors of the first distance information does not converge, perform clustering processing on each index data based on each first distance information to obtain an intermediate clustering result, and update the first distance information based on the intermediate clustering result; it is also configured to complete the clustering process and obtain the clustering result when the sum of squared errors of the first distance information converges.
  • the first determination module 401 is configured to obtain at least two original index data of any type of project data
  • the first determination module 401 is also configured to determine the weight information corresponding to each original index data in at least two original index data of any type; based on the weight information, perform weighting processing on each original index data, and determine the nth index data.
  • At least one index data includes n types of index data
  • the clustering result includes the first result to the nth result; for i from 1 to n, the i-th result is the clustering result of the i-th type of index data; the project data includes code data of version 1 to version m; the defect state of the code data includes the defect probability of the (m+1)th version of the code data; the defect probability of the (m+1)th version of the code data includes the probability that any type of defect event occurs in the (m+1)th version of the code data when the quantization result corresponding to the clustering result of at least one type of index data occurs; where n is an integer greater than or equal to 1 and m is an integer greater than or equal to 2;
  • the second determination module 403 is configured to compile statistics on the clustering result and determine quantization interval information; quantize the first to nth results based on the quantization interval information to obtain the first data set to the nth data set; and, when n is greater than 1, determine the defect probability of the (m+1)th version of the code data based on the first data set to the nth data set; wherein the quantization interval information indicates the interval distribution of the distances between the index data and the centroids in the first to nth results.
  • the second determination module 403 is configured to analyze the first to nth results and determine the second distance information between any data in each cluster of the first to nth results and the centroids of the first to nth results; and, based on the quantization interval information and the second distance information, quantize the data in the first to nth results to obtain the first data set to the nth data set.
  • the second determination module 403 is configured to obtain, from the first data set to the nth data set, the quantization result corresponding to the clustering result of the index data of the m-th version of the code data; determine the quality score of the m-th version of the code data based on that quantization result; and, if the quality score is greater than the scoring threshold, determine the defect probability of the (m+1)th version of the code data based on the first data set to the nth data set.
  • the second determination module 403 is configured to obtain event type information; wherein, the event type information represents the type information of at least one defect event occurring during the running of any version of code data;
  • the second determination module 403 is configured to determine, based on the event type information, the first probability that each type of defect event occurs during the running of the code data; determine, based on the first data set to the nth data set, the second probability that the quantization result corresponding to the clustering result of each index data occurs in the m versions of the code data; determine the third probability based on the event type information and the first data set to the nth data set, where the third probability is the conditional probability that the quantization result corresponding to the clustering result of at least one type of index data occurs given that any type of defect event occurs in the m versions of the code data; and determine the defect probability of the (m+1)th version of the code data based on the first probability, the second probability and the third probability.
  • the second determination module 403 is configured to determine the defect probability of the (m+1)th version of the code data by $P_s = \frac{P_1 \cdot P_3}{P_2}$; wherein $P_1$ is the first probability; $P_2$ is the second probability; $P_3$ is the third probability; and $P_s$ is the defect probability of the (m+1)th version of the code data.
  • the second determination module 403 is configured to obtain event type information, wherein the event type information represents the type information of at least one defect event that occurs during the running of any version of code data; train a decision tree model based on the first data set to the nth data set and the event type information to obtain a trained decision tree model; obtain the quantization result corresponding to the clustering result of at least one type of index data of the (m+1)th version of the code data; and, based on the trained decision tree model, process that quantization result and the event type information to determine the defect probability of the (m+1)th version of the code data.
  • the first determination module 501, the processing module 502, and the second determination module 503 can be realized by a processor in an electronic device, and the processor can be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller and microprocessor.
  • FIG. 12 is a schematic structural diagram of the electronic device 5 provided in the embodiment of the present application.
  • the electronic device 5 may include a memory 501 and a processor 502; wherein:
  • memory 501 configured to store executable instructions
  • the processor 502 is configured to implement the method for determining a code defect state as in any preceding embodiment when executing the executable instructions stored in the memory 501.
  • the above-mentioned processor 502 may be at least one of application-specific integrated circuits ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor. It can be understood that the electronic device used to implement the above processor function may also be other, which is not specifically limited in this embodiment of the present invention.
  • the memory 501 can be a volatile memory, such as RAM; or a non-volatile memory, such as ROM, flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or a combination of the above types of memory, and it provides instructions and data to the processor.
  • the embodiments of the present application also provide a computer-readable storage medium in which executable instructions are stored; when executed by a processor, the executable instructions implement the code defect state determination method described above.
  • an embodiment of the present application further provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, the processor of the electronic device implements the code defect state determination method described in the above embodiments.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • in one optional embodiment, the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK), and so on.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disc and other media that can store program code.
  • the embodiment of the present application discloses a method, apparatus, device, medium and program for determining a code defect state.
  • the method includes: determining at least one type of index data of project data, wherein the project data includes at least one version of code data and the index data includes the quality defect data of the project data; performing clustering processing on the at least one type of index data to obtain a clustering result; and determining the defect state of the code data based on the clustering result.
  • the code defect state determination method provided in this application can realize the evaluation of the state and quality of any code data, and thus can be applied in a wider range of scenarios.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

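The present application provides a method, apparatus, device, medium and program for determining a code defect state. The method includes: determining at least one type of index data of project data (101), where the project data includes at least one version of code data that implements a project function and the index data includes quality defect data of the project data; performing clustering processing on the at least one type of index data to obtain a clustering result (102); and determining the defect state of the code data based on the clustering result (103). The code defect state determination method provided by the present application can evaluate the state and quality of arbitrary code data, and can therefore be applied in a wider range of scenarios.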

Description

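Method, apparatus, device, medium and program for determining a code defect state
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202110661540.4, filed on June 15, 2021 by the applicant 深圳前海微众银行股份有限公司 and entitled "Method, apparatus, electronic device and medium for determining a code defect state", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to information technology in financial technology (Fintech), and specifically, but not exclusively, to a method, apparatus, device, medium and program for determining a code defect state.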
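Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology. The security and real-time requirements of the financial industry, however, also place higher demands on these technologies.
In practice, the stable execution of financial services and the strong protection of financial data security both depend on robust code data, so accurately evaluating the defect state or quality of code data is very important.
In the related art, the defect state or quality of code data can be evaluated only when the feature values corresponding to the indexes of the code data follow a normal distribution. This approach places strict requirements on the distribution of those feature values, so it cannot evaluate and determine the defect state of arbitrary code data.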
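Summary
Embodiments of the present application provide a method, apparatus, device, medium and program for determining a code defect state. The method can determine the defect state of code data whatever the distribution of the at least one type of index data of the code data, which allows the defect state of arbitrary code data to be evaluated flexibly.
The technical solutions provided by the embodiments of the present application are implemented as follows:
A code defect state determination method, executed by an electronic device, the method including:
determining at least one type of index data of project data, where the project data includes at least one version of code data that implements a project function, and the index data includes quality defect data of the project data;
performing clustering processing on the at least one type of index data to obtain a clustering result;
determining the defect state of the code data based on the clustering result.
An embodiment of the present application further provides a code defect state determination apparatus, the apparatus including:
a first determination module, configured to determine at least one type of index data of project data, where the project data includes at least one version of code data that implements a project function, and the index data includes quality defect data of the project data;
a processing module, configured to perform clustering processing on the at least one type of index data to obtain a clustering result;
a second determination module, configured to determine the defect state of the code data based on the clustering result.
An embodiment of the present application further provides an electronic device, the electronic device including:
a memory, configured to store executable instructions;
a processor, configured to implement the code defect state determination method described in any of the foregoing when executing the executable instructions stored in the memory.
An embodiment of the present application further provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the code defect state determination method described in any of the foregoing.
In the embodiments of the present application, after at least one type of index data of project data that includes at least one version of code data has been determined, clustering the index data yields a clustering result that, even when the index data follows an arbitrary random distribution, still objectively and comprehensively reflects how the data is distributed across different types of index data and across different versions of the same index data. The defect state of the code data determined from this clustering result therefore objectively and accurately reflects the actual distribution of defects in the project data and the trend of defects across versions, and thus the actual defect state of the at least one version of code data. Moreover, because the method places no restriction on the distribution of the index data, it can evaluate the defect state of arbitrary code data and can be applied in a wider range of scenarios.
To make the above objects, features and advantages of the present application more apparent and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.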
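Brief description of the drawings
FIG. 1 is a schematic flowchart of the code defect state determination method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of determining at least one type of index data of project data provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of clustering at least one type of index data to obtain a clustering result provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of obtaining the clustering result provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of determining the defect state of code data provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of obtaining the first data set to the nth data set provided by an embodiment of the present application;
FIG. 7 is a schematic flowchart of determining the defect probability of the (m+1)th version of the code data provided by an embodiment of the present application;
FIG. 8 is another schematic flowchart of determining the defect probability of the (m+1)th version of the code data provided by an embodiment of the present application;
FIG. 9 is yet another schematic flowchart of determining the defect probability of the (m+1)th version of the code data provided by an embodiment of the present application;
FIG. 10 is yet another schematic flowchart of the code defect state determination method provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of the code defect state determination apparatus provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.
To describe the technical solutions of the embodiments more clearly, the drawings required by the embodiments are briefly introduced here. The drawings are incorporated in and constitute a part of this specification; they show embodiments consistent with the present application and, together with the description, serve to explain its technical solutions. It should be understood that the drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.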
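Detailed description
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. The components of the embodiments described and shown in the drawings can generally be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the claimed scope of the present application, but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the scope of protection of the present application.
Note that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing it need not be defined or explained again in subsequent drawings.
The term "and/or" herein merely describes an association and covers three cases: for example, "A and/or B" can mean A alone, both A and B, or B alone. The term "at least one" herein means any one of several items or any combination of at least two of them; for example, "including at least one of A, B and C" can mean including any one or more elements selected from the set consisting of A, B and C.
For example, the code defect state determination method provided by the embodiments includes a series of steps but is not limited to the steps described; likewise, the code defect state determination apparatus provided by the embodiments includes a series of modules but is not limited to the modules explicitly described, and may also include modules needed to obtain relevant information or to perform processing based on that information.
In practice, the stable execution of financial services and the strong protection of financial data security both depend on robust project data, so accurately evaluating the defect state of code data is extremely important.
In the related art, the defect state of code data can be evaluated only when the feature values corresponding to the indexes of the code data follow a normal distribution. This places strict requirements on those feature values, so the defect state of arbitrary code data cannot be evaluated accurately.
The related art also includes schemes in which project experts evaluate the quality of project data that includes at least one version of code data. Although such schemes can draw on the experts' project experience, project data changes quickly and spans many data dimensions, so evaluation results that rely on the experts' limited experience are coarse-grained. In addition, expert evaluation inevitably introduces subjective factors, so the results lack fairness and stability.
In summary, quality evaluation of code data in the related art has many limiting factors that prevent wide adoption, is inflexible, and yields results that are not sufficiently objective.
In view of these problems, the embodiments of the present application provide a code defect state determination method, which can be executed by an electronic device.
It should be noted that the electronic device may include a terminal and/or a server. The terminal may be a thin client, a thick client, a handheld or laptop device, a microprocessor-based system, a programmable consumer electronics product, a networked personal computer, a minicomputer system, and so on. The server may be a minicomputer system, a mainframe computer system, a distributed cloud computing environment that includes any of the above systems, and so on.
An electronic device such as a server may include program modules for executing computer instructions. Generally, program modules may include routines, programs, object programs, components, logic, data structures and the like, which perform particular tasks or implement particular abstract data types. A computer system/server may be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In such an environment, program modules may be located on local or remote computing system storage media, including storage devices.
By way of example, the code defect state determination method provided by the embodiments may be implemented by the processor of any of the above electronic devices. The processor may be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a central processing unit (CPU), a controller, a microcontroller and a microprocessor. It can be understood that other electronic components may also implement the processor function; the embodiments of the present application do not limit this.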
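FIG. 1 is a schematic flowchart of the code defect state determination method provided by an embodiment of the present application. As shown in FIG. 1, the method may include steps 101 to 103:
Step 101: determine at least one type of index data of project data.
The project data includes at least one version of code data that implements a project function; the index data includes quality defect data of the project data.
In the embodiments of the present application, the code data may include data written in any programming language. By way of example, the programming language may be a high-level language such as Java, C++ or C; a low-level language such as assembly language; or any of a scripting language and a hardware description language such as VHDL (Very High Speed Integrated Circuit Hardware Description Language).
In the embodiments of the present application, the code data may be source code data, or executable code data obtained by compiling the source code data.
In the embodiments of the present application, the project function may include at least one of a page display function, a data upload/download function, a data storage function, a data query function and a data transmission function; the embodiments do not limit this.
In the embodiments of the present application, different versions of code data may be managed and counted by version number. When there are at least two versions, the difference between the first version and the second version of the code data may be a partial difference. For example, if the code data is divided into modules, the difference between the first and second versions may include differences in the code of at least some modules.
In the embodiments of the present application, the project data may include at least one version of code data in an internal testing stage, and may also include at least one version that has already been released. The project data may include at least one version of code data under development, and may also include at least one version that is in operation after delivery.
In the embodiments of the present application, the quality defect data may include fault data that appears while at least one version of the code data is running. For example, the quality defect data may be determined by processing at least one version of the code data through at least one of developers' internal testing, testing by professional testers, and business processing. The quality defect data may also include static code scan bugs of at least one version of the code data; for example, it may include at least one of static code scan bugs and defect density.
In the embodiments of the present application, the type and/or number of index data may cover all or part of the index data of the project data; for example, partial index data may be the index data of some modules of the project data.
In the embodiments of the present application, the index data may include at least one type of quality defect data of the project data. When it includes several types of quality defect data, the index data can be represented as a matrix $C_{mn}$ of size $m \times n$, where $m$ and $n$ are integers greater than or equal to 1; $m$ corresponds to the version number of the code data and $n$ to the number of index data types. $C_{mn}$ can be written as $C_{mn} = \{K_1, \dots, K_p, \dots, K_n\}$, where $K_p = \{c_{1p}, \dots, c_{qp}, \dots, c_{mp}\}$ corresponds to the $p$-th type of index data. Each of the $n$ types of index data may be present in all $m$ versions of the code data; $p$ is an integer greater than 1 and less than $n$, and $q$ is an integer greater than 1 and less than $m$.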
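Step 102: perform clustering processing on the at least one type of index data to obtain a clustering result.
In the embodiments of the present application, the clustering processing may be implemented with a general-purpose clustering method, for example K-means clustering, mean-shift clustering or a density-based clustering method.
In the embodiments of the present application, clustering the at least one type of index data may consist of sorting the index data by the version number of the at least one version of code data and then clustering according to the sorted result. For example, after sorting by version number, the index data of each version may be clustered separately, version by version; alternatively, all the sorted index data may be clustered together.
In the embodiments of the present application, clustering extracts the distribution characteristics of the index data from randomly distributed index data. Even when the index data does not follow a normal distribution, clustering can therefore still extract its distribution objectively and accurately, which gives the determination of the defect state of the project data an objective data basis.
Step 103: determine the defect state of the code data based on the clustering result.
In the embodiments of the present application, the defect state of the code data may include the severity and/or number of defects in the state the code data is in, and may also include how likely that state is to contain defects of a specified severity.
In the embodiments of the present application, the defect state of the code data may be determined from the distribution of the index data in each cluster of the clustering result; for example, the distribution may indicate the amount of index data in each cluster, its density, and so on.
In some implementations, the defect state of the code data may be determined as needed by analyzing the index data in each cluster of the clustering result according to the version number of the code data. The defect states of the individual versions then show how the defect state of the project data changes along the version-number dimension, giving an objective overall picture of how defects evolve across the versions.
In the embodiments of the present application, the defect state of the code data may be expressed through the defect state of any version of the code data.
In the embodiments of the present application, the defect state of any version of the code data may include at least one of: the probability that a defect event or fault occurs when that version implements at least one function; the probability that any module of that version has a latent fault; and the probability that a fault of a specified level occurs while that version is running. The embodiments do not limit this.
In the embodiments of the present application, the defect state of any version of the code data may include the expected number of defects of a specified severity level while that version is running, the probability that a fault occurring while that version is running is related to a defect of a specified severity level, and so on.
As the above steps show, after at least one type of index data of project data that includes at least one version of code data has been determined, clustering the index data yields a clustering result that, even when the index data follows an arbitrary random distribution, still objectively and comprehensively reflects how the data is distributed across different types of index data and across different versions of the same index data. The defect state determined from this clustering result therefore objectively and accurately reflects the actual distribution of defects in the project data and the trend of defects across versions, and thus the quality changes of the at least one version of code data. Moreover, because the method places no restriction on the distribution of the index data, it can evaluate the state and quality of arbitrary code data and can be applied in a wider range of scenarios.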
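Based on the foregoing embodiments, determining at least one type of index data of the project data can be implemented as shown in FIG. 2, which is a schematic flowchart of determining at least one type of index data of project data provided by an embodiment of the present application. As shown in FIG. 2, the flow may include steps 1011 to 1013:
Step 1011: obtain at least two items of original index data of any type of the project data.
In the embodiments of the present application, take as an example original index data that includes a first type of index data and a second type of index data: the first type may be defect density and the second type may be static code scan bugs. Accordingly, the at least two items of original index data of the first type may include the total number of first-level defects and the total number of second-level defects; those of the second type may include the number of third-level bugs and the number of fourth-level bugs.
Step 1012: determine the weight information corresponding to each item of original index data among the at least two items of any type.
In the embodiments of the present application, different original index data may have different weight information. For example, the weight information may be set according to the level information of the original index data; the level information may be determined by how strongly the original index data affects the quality or state of the project data, by the testing goal for the project data, or by the stage the project data is in. Taking defect density as an example, its levels may be L1 to L5, with five corresponding items of weight information.
Step 1013: weight each item of original index data based on the weight information to determine the index data of that type.
In the embodiments of the present application, the nth type of index data may be determined as the weighted sum of the at least two items of original index data of the nth type.
In the embodiments of the present application, taking defect density as an example, the original index data of the defect density type may include the numbers of defects at the five levels L1 to L5, where the weights may be 1.6 for L1, 1.3 for L2, 1 for L3, 0.7 for L4 and 0.4 for L5. Based on these levels and weights, the defect counts of each level can be weighted to determine the defect-density index data, which may also be calculated by formula (1):
$$P_{mn} = \frac{\sum_{i=1}^{I} a_i b_i}{c} \tag{1}$$
where $I$ is the number of defect levels, which may be 5; $a_i$ is the total number of defects of the $i$-th level; $b_i$ is the weight of the $i$-th level; $c$ is the total number of test cases; and $P_{mn}$ denotes the $n$-th type of index data, here the defect-density index data, in the $m$-th version of the code data.
In the embodiments of the present application, the original index data of the defect density type and the total number of test cases may be obtained through the Descon Project Management System (DPMS); for example, DPMS can provide the test case set and defect data associated with at least one item of code data.
In the embodiments of the present application, taking static code scan bugs as an example, they can be obtained through the sonarqube plug-in embedded in a continuous integration (CI) platform. With this plug-in, latent or obvious errors in the source code corresponding to the code data can be detected when the project is built. These errors may be graded by severity; for example, static code scan bugs may be divided into blocker, minor, critical, info and major levels. The index data of the static code scan bug type may be calculated by formula (2):
$$Q_{mn} = \sum_{j=1}^{J} X_j W_j \tag{2}$$
In formula (2), $X_j$ is the number of static code scan bugs of the $j$-th level; $W_j$ is the weight of static code scan bugs of the $j$-th level; $J$ is an integer greater than 1 that denotes the total number of static code scan bug levels; and $Q_{mn}$ denotes the $n$-th type of index data, here the static code scan bug index data, in the $m$-th version of the code data.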
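The weighting in steps 1011 to 1013 can be sketched as follows. This is a minimal illustration, assuming that formula (1) divides the weighted defect count by the total number of test cases and that formula (2) is a plain weighted sum. The L1 to L5 weights are the example values given in the text; the function names are hypothetical and not part of the application.

```python
from typing import Sequence

# Example weights for defect levels L1..L5, as listed in the text.
DEFECT_WEIGHTS = (1.6, 1.3, 1.0, 0.7, 0.4)


def weighted_sum(counts: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of count * weight over all severity levels."""
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")
    return sum(c * w for c, w in zip(counts, weights))


def defect_density(counts: Sequence[float], weights: Sequence[float],
                   total_test_cases: int) -> float:
    """Weighted defect count per test case (assumed reading of formula (1))."""
    if total_test_cases <= 0:
        raise ValueError("total_test_cases must be positive")
    return weighted_sum(counts, weights) / total_test_cases


def static_scan_score(counts: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted number of static-scan bugs (assumed reading of formula (2))."""
    return weighted_sum(counts, weights)
```

Changing the weights changes how strongly each severity level contributes, which is the flexibility the text attributes to the weighting step.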
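The above steps only show how two types of index data are determined from the corresponding original index data; in practice, the project data may have many more types of original index data, and the embodiments of the present application do not limit this.
As the above steps show, after at least two items of original index data of a given type are obtained, the weight information for each level of that type can be determined, and each item of original index data is then weighted to determine the index data of that type. In other words, original index data of any type can be weighted by level, so the weighted result carries the level information. In actual project analysis, adjusting the weights of the different levels makes it possible to analyze particular kinds of original index data in a targeted way, which makes determining the defect state of code data more flexible.
Based on the foregoing embodiments, FIG. 3 is a schematic flowchart of clustering at least one type of index data to obtain a clustering result provided by an embodiment of the present application. As shown in FIG. 3, the flow may include steps 1021 to 1022:
Step 1021: analyze each type of index data among the at least one type of index data and determine the initial centroid data of each type.
The initial centroid data includes at least two of the maximum, minimum, mean, mode and median of each type of index data.
In the embodiments of the present application, each version of the code data may have $n$ types of index data. For each type, there may be at least one maximum, minimum, mean and mode.
In the embodiments of the present application, the set of the $n$-th type of index data over the $m$ versions of the code data can be written as $V_n = \{c_{1n}, c_{2n}, \dots, c_{pn}, \dots, c_{mn}\}$, where $c_{mn}$ is the $n$-th type of index data of the $m$-th version. When the initial centroid data includes all five statistics, it can be written as $F_n = \{V_{n\max}, V_{n\min}, V_{n\,\mathrm{mean}}, V_{n\,\mathrm{mode}}, V_{n\,\mathrm{median}}\}$, whose elements are the maximum, minimum, mean, mode and median of the $n$-th index data set.
In the embodiments of the present application, each item in the initial centroid data $F_n$ of the $n$-th index data set may have two-dimensional coordinate components, $F_{nx}$ and $F_{ny}$.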
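Step 1021 can be illustrated with Python's standard `statistics` module. This is a sketch under assumptions: the index series is treated as a one-dimensional list of values (the text also allows two-dimensional coordinates), the function names are hypothetical, and the de-duplication step is an addition of this sketch, made because identical statistics would give identical starting centroids.

```python
import statistics


def initial_centroid_stats(values):
    """Descriptive statistics of one index series across versions."""
    if not values:
        raise ValueError("values must not be empty")
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.fmean(values),
        "mode": statistics.mode(values),      # first mode if several values tie
        "median": statistics.median(values),
    }


def distinct_centroid_values(values):
    """Unique statistic values in ascending order, usable as starting centroids."""
    return sorted(set(initial_centroid_stats(values).values()))
```

For the series `[1, 2, 2, 3, 7]` the mode and the median coincide, so only four distinct starting values remain.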
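Step 1022: cluster each type of index data based on the initial centroid data to obtain the clustering result.
In the embodiments of the present application, a clustering method may first be chosen, and each type of index data is then clustered with it, starting from the initial centroid data, to obtain the clustering result. The same clustering method may be used for different types of index data, or different methods may be used; the embodiments do not limit this.
As can be seen, the clustering result used to determine the defect state of the code data is obtained by clustering at least one type of index data; clustering extracts the defect characteristics carried by the index data and so provides the data basis for determining the defect state of the code data.
Based on the foregoing embodiments, clustering each type of index data based on the initial centroid data to obtain the clustering result can be implemented as shown in FIG. 4, which is a schematic flowchart of obtaining the clustering result provided by an embodiment of the present application. As shown in FIG. 4, the flow may include steps A1 to A4:
Step A1: determine the first distance information between any item of index data in each type of index data and each item of the initial centroid data.
In the embodiments of the present application, the first distance information may be any of the Euclidean distance, Manhattan distance, Chebyshev distance or power distance between any item of index data in each type and each item of the initial centroid data.
In the embodiments of the present application, each item in the $n$-th initial centroid data of the $n$-th index data set may have two-dimensional coordinate components $(F_{nx}, F_{ny})$; correspondingly, any item of index data in each type may also have two-dimensional coordinate components. The coordinate system of the initial centroid data may be the same as that of the index data; if the two differ, the coordinates must be converted according to the ratio between the two systems. For example, the coordinate components of $c_{mn}$ may be $(c_{mnx}, c_{mny})$.
In the embodiments of the present application, once the coordinate components of the $n$-th initial centroid data and of any item of index data are determined, and when the first distance information is the Euclidean distance, the first distance information $D(c_{mn}, F_{na})$ between the $n$-th index data of the $m$-th version of the code data and the corresponding initial centroid data can be determined by formula (3):
$$D(c_{mn}, F_{na}) = \sqrt{(c_{mnx} - F_{nax})^2 + (c_{mny} - F_{nay})^2} \tag{3}$$
In formula (3), $a$ is an integer with $1 \le a \le A$, where $A$ is the number of items in the initial centroid data and is an integer with $2 \le A \le 5$; $F_{na}$ is the $a$-th centroid in the $n$-th initial centroid data $F_n$; and $(F_{nax}, F_{nay})$ are the two-dimensional coordinate components of $F_{na}$.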
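As a small illustration of step A1, the sketch below computes the first distance between a two-dimensional index data point and a centroid for three of the metrics named in the text. The function name and the `metric` parameter are hypothetical; the power distance is omitted because the text does not give its exponent.

```python
import math


def first_distance(point, centroid, metric="euclidean"):
    """Distance between a 2-D index data point and a 2-D centroid."""
    dx = abs(point[0] - centroid[0])
    dy = abs(point[1] - centroid[1])
    if metric == "euclidean":
        return math.hypot(dx, dy)   # sqrt(dx^2 + dy^2), as in formula (3)
    if metric == "manhattan":
        return dx + dy
    if metric == "chebyshev":
        return max(dx, dy)
    raise ValueError(f"unsupported metric: {metric}")
```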
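Step A2: cluster each type of index data based on each item of first distance information to obtain an intermediate clustering result.
In the embodiments of the present application, the intermediate clustering result may be obtained as follows: examine each item of first distance information for the $n$-th type of index data, group the index data whose first distance information is less than a distance threshold, and assign them to one cluster; the resulting clusters form the intermediate clustering result. Before the clustering result is determined, this operation may be repeated, producing several intermediate clustering results, each of which may include at least one cluster. As clustering proceeds, the amount of index data in a cluster and the centroid of the cluster change dynamically after each intermediate clustering result.
Step A3: update the first distance information based on the intermediate clustering result.
As described in the foregoing embodiments, as clustering proceeds, the amount of index data in the $w$-th cluster of the $t$-th intermediate clustering result may differ from that in the $w$-th cluster of the $(t-1)$-th intermediate clustering result, so the centroid of the $w$-th cluster changes between the two. The distance between any index data in the $w$-th cluster and its centroid, that is, the first distance information, therefore changes as clustering proceeds, and can be updated based on the intermediate clustering result. Here $t$ is an integer greater than or equal to 1 and $w$ is an integer greater than or equal to 0.
Step A4: while the sum of squared errors of the first distance information has not converged, keep executing steps A2 and A3; once it has converged, finish clustering and obtain the clustering result.
In the embodiments of the present application, if the sum of squared errors of the first distance information has not converged, the first distance information must be updated based on the centroid data of the intermediate clustering result, and clustering is executed again using that centroid data and the first distance information, until the sum of squared errors converges.
In the embodiments of the present application, convergence means that the sum of squared errors of the first distance information is stable and no longer changes significantly. The sum of squared errors $S_n$ of the first distance information for the $n$-th type of index data over the $m$ versions of the code data can be calculated by formula (4):
Figure PCTCN2021141249-appb-000004
where $b$ is an integer greater than 1 and $c_{bn}$ is the $n$-th index data in the $b$-th version of the code data.
In the embodiments of the present application, as clustering proceeds, if the change in $S_n$ is smaller than a preset threshold, clustering can be considered finished. The termination condition of clustering is thus strictly controlled, which improves the clustering result.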
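The loop of steps A1 to A4 can be sketched as plain K-means on two-dimensional points. This is an illustration under assumptions, not the application's own procedure: it uses the standard nearest-centroid assignment of K-means rather than the distance-threshold grouping described in step A2, and the names `kmeans`, `tol` and `max_iter` are hypothetical.

```python
import math


def kmeans(points, centroids, tol=1e-6, max_iter=100):
    """K-means on 2-D points, started from the given initial centroids.

    Returns (labels, centroids, sse). Stops when the sum of squared
    errors changes by less than `tol` between two iterations.
    """
    centroids = [tuple(c) for c in centroids]
    prev_sse = None
    labels, sse = [], 0.0
    for _ in range(max_iter):
        # Steps A1/A2: distance to every centroid, assign to the nearest one.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Step A3: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
        # Step A4: sum of squared errors against the updated centroids.
        sse = sum(math.dist(p, centroids[lab]) ** 2
                  for p, lab in zip(points, labels))
        if prev_sse is not None and abs(prev_sse - sse) < tol:
            break
        prev_sse = sse
    return labels, centroids, sse
```

Starting from statistics-based centroids instead of random ones makes the result reproducible for a given index series, which is one plausible reason for the initial centroid choice in step 1021.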
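In the embodiments of the present application, the at least one type of index data includes $n$ types of index data and the clustering result includes a first result to an $n$-th result; for $i$ from 1 to $n$, the $i$-th result is the clustering result of the $i$-th type of index data. The project data includes code data of version 1 to version $m$. The defect state of the code data includes the defect probability of the (m+1)th version of the code data; this defect probability includes the probability that a defect event of any type occurs in the (m+1)th version given that the quantization result corresponding to the clustering result of at least one type of index data occurs. Here $n$ is an integer greater than or equal to 1 and $m$ is an integer greater than or equal to 2.
In the embodiments of the present application, the $n$-th type of index data may include multiple items of index data; correspondingly, the $n$-th result may include at least one cluster.
By way of example, determining the defect state of the project data based on the clustering result can be implemented by the flow shown in FIG. 5, which is a schematic flowchart of determining the defect state of code data provided by an embodiment of the present application. As shown in FIG. 5, the flow may include steps B1 to B3:
Step B1: compile statistics on the clustering result to determine quantization interval information.
The quantization interval information represents the interval distribution of the distances between index data and centroids in the first to $n$-th results.
In the embodiments of the present application, statistics are compiled over the clustering results of the index data of all $m$ versions of the code data; that is, the quantization interval information is derived from the index data of $m$ versions. It therefore reflects, statically, the defect types of the code data itself and, dynamically, how the index data changes with the version number, so it reflects the distribution of the index data of the $m$ versions more objectively.
In one implementation, the intervals carried in the quantization interval information, and their number, may be determined from statistics on the distances between all index data and the centroids of their clusters.
In the embodiments of the present application, the quantization interval information may contain a set of at least one interval; the quantization interval information for different types of index data may be different or the same, and the embodiments do not limit this.
In the embodiments of the present application, the distance range covered by any interval in the quantization interval information may be determined by the stage and/or state of the at least one version of code data. For example, while developers are self-testing the code data, the quantization interval information may be first interval information; after the code data is handed over to testers, it may be second interval information; and once the code data is released and switched to operation, it may be third interval information.
In the embodiments of the present application, the distance range covered by any interval in the quantization interval information may also be determined by the function implemented by the code data and/or the characteristics of the code data. For example, if the code data mainly implements low-level functions, the distance range covered by any interval for those functions may differ from that for upper-layer functions.
Step B2: quantize the first to $n$-th results based on the quantization interval information to obtain the first data set to the $n$-th data set.
In the embodiments of the present application, the coordinate information of the index data in each cluster of the first to $n$-th results may be quantized according to the quantization interval information to obtain the first data set to the $n$-th data set.
In the embodiments of the present application, the number of quantization intervals in the quantization interval information can be adjusted flexibly according to the index data; that number determines the value range of the data in each data set. For example, with five quantization intervals, each quantization result in the $n$-th data set is given by the number of the interval into which the corresponding item of the $n$-th result falls: an item that falls into the first quantization interval has quantization result 1. With five intervals, each value in the $n$-th data set lies in [1, 5].
In the embodiments of the present application, the first to $n$-th results are quantized based on the quantization interval information, which can be determined by the state and/or characteristics of the code data itself. Quantization therefore takes the characteristics of the code data into account, so the quantization results fit the distribution of the index data more closely.
Step B3: when $n$ is greater than 1, determine the defect probability of the (m+1)th version of the code data based on the first data set to the $n$-th data set.
In the embodiments of the present application, the defect probability of the (m+1)th version of the code data may be the conditional probability that a defect event of any type occurs in the (m+1)th version given that at least one type of index data occurs. In other words, it not only evaluates the probability of a defect event in the (m+1)th version but also links the index characteristics of the code data to the defect events that may occur, so to some extent it reflects the influence of the index data on any defect event.
In the embodiments of the present application, the defect probability of the (m+1)th version of the code data may be a prediction of the state of the (m+1)th version, based on the first to $n$-th data sets obtained by clustering and quantizing the $m$ versions of the code data. The defect probability can thus be predicted before the (m+1)th version is released or delivered, which allows early intervention and reduces the probability of defects during operation.
In the related art, whether the quality or state of project data is evaluated by expert experience or under the assumption that the index data follows a normal distribution, the result reflects the quality or state of the project data only as a whole; it cannot give the probability that a defect event of any type occurs in a given version when one or several types of index data occur, so the state and quality of the project data or code data cannot be evaluated at a finer granularity. The defect probability determined by the method of the embodiments gives that probability, and also predicts the defect probability of the (m+1)th version from the $m$ versions. The method therefore both determines code data quality or state at a finer granularity and predicts the defect probability of the next version.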
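In the embodiments of the present application, quantizing the first to $n$-th results based on the quantization interval information to obtain the first data set to the $n$-th data set can be implemented by the flow shown in FIG. 6, which is a schematic flowchart of obtaining the first data set to the $n$-th data set provided by an embodiment of the present application. As shown in FIG. 6, the flow may include steps C1 to C2:
Step C1: analyze the first to $n$-th results and determine the second distance information between any data in the first to $n$-th results and the centroids of the first to $n$-th results.
In the embodiments of the present application, the $n$-th result may contain at least one cluster. Taking $K$ clusters as an example, the $k$-th cluster may contain $n_k$ items of data, where $K$ may be an integer greater than 1, $k$ is an integer with $1 \le k \le K$, and $n_k$ may be an integer greater than 1. Each item in the $k$-th cluster of the $n$-th result can be represented in a two-dimensional coordinate system; the second distance information for the $k$-th cluster can then be expressed as a Euclidean distance.
Step C2: quantize the data in the first to $n$-th results based on the quantization interval information and the second distance information to obtain the first data set to the $n$-th data set.
In the embodiments of the present application, quantization is illustrated with any item in the $k$-th cluster of the $n$-th result. Denote the item by $c_{mn}$ and its distance from the centroid of the $k$-th cluster of the $n$-th result by $d$; $c_{mn}$ can then be quantized according to how $d$ matches the quantization interval information. For example, if $d$ is greater than the minimum and less than or equal to the maximum of the first quantization interval, the quantization result of $c_{mn}$ may be 1; if $d$ is greater than the maximum of the first interval and less than or equal to the maximum of the second interval, the result may be 2; if $d$ is greater than the maximum of the second interval and less than or equal to the maximum of the third interval, the result may be 3; if $d$ is greater than the maximum of the third interval and less than or equal to the maximum of the fourth interval, the result may be 4; and if $d$ is greater than the maximum of the fourth interval, the result may be 5. The first to fifth quantization intervals cover adjacent ranges in increasing order. In any data set obtained this way, each value lies in [1, 5].
In the embodiments of the present application, the first data set to the $n$-th data set can be represented as a matrix, written $C'_{mn}$, in which each element can be written $c'_{mn}$ with $c'_{mn} \in \{1, 2, 3, 4, 5\}$.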
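The quantization in step C2 can be sketched with a binary search over the interval boundaries. This is an illustration under assumptions: the intervals are adjacent, each interval is closed at its upper end, the last interval is open-ended, and a distance of 0 (a point at the centroid) is placed in level 1. The function name and the boundary values in the usage note are hypothetical.

```python
from bisect import bisect_left


def quantize(distance, upper_bounds):
    """Map a centroid distance to a level in 1..len(upper_bounds) + 1.

    `upper_bounds` lists, in increasing order, the upper limit of every
    quantization interval except the last, which is open-ended.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # bisect_left counts the bounds strictly below `distance`, so a value
    # equal to a bound stays in the lower interval.
    return bisect_left(upper_bounds, distance) + 1
```

With the hypothetical bounds `[1.0, 2.0, 3.0, 4.0]`, a distance of 1.5 falls in level 2 and any distance above 4.0 falls in level 5.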
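As can be seen, clustering extracts the distribution characteristics from randomly distributed index data, and quantizing the clustering result maps index data of arbitrary magnitude onto data sets with a limited value range, which reduces the computation needed later to determine the defect state of the code data. Because quantization is based on the quantization interval information, the first to $n$-th data sets still carry the index characteristics of the index data, which improves the accuracy of the defect state determined later.
In the embodiments of the present application, when $n$ is greater than 1, determining the defect probability of the (m+1)th version of the code data based on the first data set to the $n$-th data set can also be implemented by the flow shown in FIG. 7, which is a schematic flowchart of determining the defect probability of the (m+1)th version of the code data provided by an embodiment of the present application. As shown in FIG. 7, the flow may include steps D1 to D3:
Step D1: from the first data set to the $n$-th data set, obtain the quantization results corresponding to the clustering results of the index data of the $m$-th version of the code data.
In the embodiments of the present application, the first to $n$-th data sets include the quantization results corresponding to the clustering results of the index data of the $m$ versions. Each item of index data, its clustering result and the corresponding quantization result may carry the version information of the code data, so the quantization results for the $m$-th version can be filtered out of the first to $n$-th data sets by version.
Step D2: determine the quality score of the $m$-th version of the code data based on the quantization results corresponding to its index data.
In the embodiments of the present application, the quantization results for the $m$-th version may include $n$ values. Several of the $n$ values may be selected and the quality score determined from them; the $n$ values may be combined in a weighted sum; or the $n$ values may simply be summed. Denoting the quality score of the $m$-th version by $H$, it can be calculated by formula (5):
$$H = \sum_{p=1}^{n} c'_{mp} \tag{5}$$
In formula (5), $c'_{mp}$ is the quantization result of the clustering result of the $p$-th index data of the $m$-th version.
In the embodiments of the present application, the quality score of the $m$-th version of the code data may be its health score.
Step D3: when the quality score is greater than a score threshold, determine the defect probability of the (m+1)th version of the code data based on the first data set to the $n$-th data set.
In the embodiments of the present application, if the quality score is less than or equal to the score threshold, the problems in the index data of the $m$-th version can be identified and reviewed in order to improve the quality of the code data.
In the embodiments of the present application, the score threshold may be determined by the state of the code data, or by at least one of the function implemented by the code data and how likely defect events or faults were in its historical versions; the embodiments do not limit this.
In the embodiments of the present application, the defect probability of the (m+1)th version of the code data may be determined by sorting the data in the first to $n$-th data sets by version number, compiling statistics on the sorted result to obtain the trend of the quantization results with version number, and using that trend.
As can be seen, the defect probability of the (m+1)th version is determined from the first to $n$-th data sets only when the quality score of the code data exceeds the score threshold. This controls the conditions under which the defect probability is calculated and reduces the amount of data involved in that calculation.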
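Steps D2 and D3 can be sketched as a score followed by a gate. This is a minimal illustration, assuming formula (5) is the plain sum of the quantization results of one version; the optional `weights` argument covers the weighted-sum variant the text also mentions. The function names are hypothetical.

```python
def quality_score(quantized_row, weights=None):
    """Quality (health) score of one version from its quantization results."""
    if weights is None:
        return sum(quantized_row)            # plain sum, assumed formula (5)
    if len(weights) != len(quantized_row):
        raise ValueError("weights and quantized_row must have the same length")
    return sum(v * w for v, w in zip(quantized_row, weights))


def should_predict(score, score_threshold):
    """Step D3: predict the next version only if the score exceeds the threshold."""
    return score > score_threshold
```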
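In the embodiments of the present application, determining the defect probability of the (m+1)th version of the code data based on the first data set to the $n$-th data set can also be implemented as shown in FIG. 8, which is another schematic flowchart of determining the defect probability of the (m+1)th version of the code data provided by an embodiment of the present application. As shown in FIG. 8, the flow may include steps E1 to E5:
Step E1: obtain event type information.
The event type information represents the type information of at least one defect event that occurs while any version of the code data is running.
In the embodiments of the present application, the event type information may represent the types of defect events or faults that can occur while every version of the code data is running; that is, the event type information applies generally to any version of the code data. It may be expressed as strings or as numeric identifiers, and the embodiments do not limit this. It may also include the degree to which each defect event or fault affects code data quality, that is, the level of the defect event or fault.
In the embodiments of the present application, the event type information may also be expressed as a matrix. For example, with $W$ event types, the event type information of the $m$-th version can be written as $G_{mW} = \{g_{m1}, g_{m2}, \dots, g_{mw}, \dots, g_{mW}\}$ with $g_{mw} \in \{0, 1\}$, where $g_{mw}$ is the $w$-th event type information of the $m$-th version. The kinds of event type information may be the same for every version. For example, event type information may include database (DB), application programming interface (API) and compatibility types.
Table 1
Figure PCTCN2021141249-appb-000006
In the embodiments of the present application, the statistics of the $w$ event types that occur in the $m$ versions of the code data and of the quantization results corresponding to the clustering results of the $n$ types of index data may be as shown in Table 1.
In Table 1, the 1, 2, …, $m$ on the left denote the numbers of the $m$ iteratively released versions of the code data; the 1, 2, …, $n$ in the header row denote the quantization results obtained after clustering the $n$ types of index data of the $m$ versions; the 1, …, $w$ in the header row denote the $w$ event types in the $m$ versions; $c'_{mn}$ denotes the quantization result corresponding to the clustering result of the $n$-th type of index data in the $m$-th version; and $g_{mw}$ denotes the $w$-th event type occurring while the $m$-th version is running.
Step E2: based on the event type information, determine the first probability that each type of defect event occurs while the code data is running.
In the embodiments of the present application, the first probability may be obtained from statistics on the event type information that occurs over multiple runs of the $m$ versions of the code data. For example, the first probability may be the probability $P_{g_w}$ that the $w$-th event type $g_w$ occurs, which can be determined by formula (6):
$$P_{g_w} = \frac{1}{m}\sum_{q=1}^{m} g_{qw} \tag{6}$$
In formula (6), $q$ is an integer with $1 \le q \le m$; $g_{qw}$ represents the probability that the $w$-th event type occurs in the $q$-th version; for example, $g_{qw}$ is 1 when the $w$-th event type occurs in the $q$-th version and 0 otherwise.
Step E3: based on the first data set to the $n$-th data set, determine the second probability that the quantization result corresponding to the clustering result of each index data occurs in the $m$ versions of the code data.
In the embodiments of the present application, the second probability may be obtained from statistics on the occurrence of the index data of the $m$ versions. For example, the second probability $P(K'_n)$ that the quantization result corresponding to the clustering result of the $n$-th type of index data occurs in the $m$ versions can be determined by formula (7):
Figure PCTCN2021141249-appb-000009
where the quantization results corresponding to the clustering results of the $n$-th type of index data are abbreviated $K'_n$, with $K'_n = \{c'_{1n}, c'_{2n}, \dots, c'_{qn}, \dots, c'_{mn}\}$; $c'_{qn}$ denotes the quantization result corresponding to the clustering result of the index data of the $q$-th version.
Note that, in the embodiments of the present application, the probabilities that any type of index data occurs in any version of the code data are mutually independent.
Step E4: determine the third probability based on the event type information and the first data set to the $n$-th data set.
The third probability is the conditional probability that the quantization result corresponding to the clustering result of at least one type of index data occurs given that a defect event of any type occurs in the $m$ versions of the code data.
In the embodiments of the present application, the third probability may be obtained from combined statistics on the occurrence of the various types of defect events in the $m$ versions and on the occurrence of the quantization results corresponding to the clustering results of at least one type of index data.
In the embodiments of the present application, from the data shown in Table 1 and using probability and statistics, the third probability $P(K'_n \mid g_w)$ can be calculated by formula (8) for the case where $g_{mw}$ equals 1:
Figure PCTCN2021141249-appb-000010
Step E5: determine the defect probability of the (m+1)th version of the code data based on the first probability, the second probability and the third probability.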
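In the embodiments of the present application, the defect probability of the (m+1)th version of the code data may be determined by applying probabilistic and statistical methods to the first probability, the second probability and the third probability.
In the embodiments of the present application, the defect probability of the (m+1)th version may be determined as follows: compile statistics on the event type information and the first to $n$-th data sets to determine the association between the event type information and the different index data and the trend of that association with the version number, and then determine the defect probability from the association and the trend.
In the embodiments of the present application, the defect probability of the (m+1)th version is determined from the first to $n$-th data sets, obtained by clustering and quantizing the index data of at least one version of the code data, and from event type information that applies generally to the code data. The determination thus takes full account of the association between the occurrence of any type of defect event and the occurrence of at least one type of index data, so the defect probability reflects the quality state of the (m+1)th version at the finer granularity of the causal relationship between index data and defect events.
As can be seen, the defect probability of the (m+1)th version is calculated from the event type information of the $m$ versions and the quantization results corresponding to the clustering results of at least one type of index data, so it objectively and comprehensively reflects the association between the event type information and the index data.
In the embodiments of the present application, determining the defect probability of the (m+1)th version based on the first probability, the second probability and the third probability can be implemented as follows:
the defect probability of the (m+1)th version of the code data is determined by
$$P_s = \frac{P_1 \cdot P_3}{P_2}$$
where $P_1$ is the first probability; $P_2$ is the second probability; $P_3$ is the third probability; and $P_s$ is the defect probability of the (m+1)th version of the code data.
In the embodiments of the present application, $P_1$ may be the first probability $P_{g_w}$ of the foregoing embodiment; $P_2$ may be the second probability $P(K'_n)$ of the foregoing embodiment; and $P_3$ may be $P(K'_n \mid g_w)$ of the foregoing embodiment.
By way of example, according to naive Bayesian classification (NBC), the defect probability $P(g_w \mid (K'_1, K'_2, \dots, K'_n))$ of the (m+1)th version, that is, $P_s$, can be calculated by formula (9):
$$P(g_w \mid (K'_1, K'_2, \dots, K'_n)) = \frac{P((K'_1, K'_2, \dots, K'_n) \mid g_w)\, P(g_w)}{P(K'_1, K'_2, \dots, K'_n)} \tag{9}$$
and $P((K'_1, K'_2, \dots, K'_n) \mid g_w)$ can be calculated by formula (10):
$$P((K'_1, K'_2, \dots, K'_n) \mid g_w) = P(K'_1 \mid g_w)\, P(K'_2 \mid g_w) \cdots P(K'_n \mid g_w) \tag{10}$$
Each probability on the right-hand side of formula (10) can be calculated from formula (8) and Table 1. Because the occurrence of each index data in any version is independent, the denominator on the right-hand side of formula (9) can be calculated by formula (11):
$$P(K'_1, K'_2, \dots, K'_n) = P(K'_1)\, P(K'_2) \cdots P(K'_n) \tag{11}$$
Through the above steps, the code defect state determination method provided by the embodiments can efficiently determine the defect probability of the (m+1)th version of the code data; and because that probability is determined by NBC from the index data and event type information of the first $m$ versions, it reflects the actual state of the (m+1)th version more objectively and accurately.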
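Formulas (9) to (11) can be sketched on a history table shaped like Table 1. This is an illustration under stated assumptions: the second and third probabilities are estimated here as simple relative frequencies (the text's formulas (7) and (8) are not reproduced in this copy), no smoothing is applied, and the result is clamped to 1 because the independence assumption in formula (11) does not guarantee a value of at most 1. All names are hypothetical.

```python
def defect_probability(history, events, new_row):
    """Naive Bayes estimate of P(g_w | K'_1..K'_n) for one event type.

    history: one row per past version, each row holding n quantization results
    events:  0/1 flag per past version, 1 if the event type occurred
    new_row: the n quantization results of the version being assessed
    """
    m = len(history)
    if m == 0 or len(events) != m:
        raise ValueError("history and events must be non-empty and equal in length")
    n_event = sum(events)
    p1 = n_event / m                          # first probability P(g_w)
    if n_event == 0:
        return 0.0
    p2 = 1.0                                  # product of P(K'_i), formula (11)
    p3 = 1.0                                  # product of P(K'_i | g_w), formula (10)
    for i, value in enumerate(new_row):
        match_all = sum(1 for row in history if row[i] == value)
        match_evt = sum(1 for row, g in zip(history, events)
                        if g == 1 and row[i] == value)
        p2 *= match_all / m
        p3 *= match_evt / n_event
    if p2 == 0:
        raise ValueError("this combination of results never occurred in history")
    return min(1.0, p3 * p1 / p2)             # formula (9), clamped (assumption)
```

With four past versions `[[1, 2], [1, 3], [2, 2], [1, 2]]` and event flags `[1, 0, 0, 1]`, the row `[1, 2]` yields 8/9, while `[2, 2]` yields 0 because that first result never coincided with the event.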
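In the embodiments of the present application, after the defect probability of the (m+1)th version of the code data is determined, the following steps may also be performed:
when the defect probability of the (m+1)th version is greater than or equal to a probability threshold, process the $m$-th version of the code data to obtain the (m+1)th version, and release the (m+1)th version of the code data.
Correspondingly, when the defect probability of the (m+1)th version is less than the probability threshold, the (m+1)th version can be considered stable and can be released directly.
In the embodiments of the present application, the probability threshold may be determined by the stage or state of the code data, or by the expected state of the release; the embodiments do not limit this. For example, the probability threshold may be 50%.
In the embodiments of the present application, the processing applied to the $m$-th version of the code data may be determined by the defect probability of the (m+1)th version; for example, it may include reviews and walkthroughs. For the DB event type, for instance, one may re-check the sqlscan results of the change scripts, check whether the condition fields use indexes, run explain on the execution plan, and so on. Such targeted processing reduces the probability of defects introduced by a version upgrade.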
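The release decision described above reduces to a single comparison. A minimal sketch, assuming the 50% example threshold from the text; the function name and the returned labels are hypothetical.

```python
def release_decision(defect_probability, threshold=0.5):
    """Return "review" when the predicted defect probability reaches the
    threshold, otherwise "release"."""
    if not 0.0 <= defect_probability <= 1.0:
        raise ValueError("defect_probability must lie in [0, 1]")
    return "review" if defect_probability >= threshold else "release"
```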
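In the embodiments of the present application, determining the defect probability of the (m+1)th version of the code data based on the first data set to the $n$-th data set can also be implemented as shown in FIG. 9, which is yet another schematic flowchart of determining the defect probability of the (m+1)th version of the code data provided by an embodiment of the present application. As shown in FIG. 9, the flow may include steps F1 to F4:
Step F1: obtain event type information.
The event type information represents the type information of at least one defect event that occurs while any version of the code data is running.
Step F2: train a decision tree model based on the first data set to the $n$-th data set and the event type information to obtain a trained decision tree model.
In the embodiments of the present application, the first to $n$-th data sets and the event type information may be divided into a training sample set and a test sample set. The decision tree model is trained on the training sample set and the training result is tested on the test sample set. While the difference between the probability output by the model and the actual probability is greater than an expected difference, training on the training sample set continues; once the difference is less than or equal to the expected difference, training ends and the trained decision tree model is obtained.
In the embodiments of the present application, the decision tree model may be a composite tree model, such as an XGBoost model.
Step F3: obtain the quantization results corresponding to the clustering results of at least one type of index data of the (m+1)th version of the code data.
In the embodiments of the present application, these quantization results can be obtained by the same method as in the foregoing embodiments, which is not repeated here.
Step F4: based on the trained decision tree model, process the quantization results corresponding to the clustering results of at least one type of index data of the (m+1)th version and the event type information to determine the defect probability of the (m+1)th version of the code data.
In the embodiments of the present application, the quantization results corresponding to the clustering results of at least one type of index data of the (m+1)th version and the event type information can be input into the trained decision tree model for processing, which determines the defect probability of the (m+1)th version.
In practice, training a decision tree model is computationally expensive and may overfit. When the occurrence probabilities of the index data are mutually independent, the defect probability calculated by the trained decision tree model may be less accurate than that calculated by the NBC method.
As can be seen, when determining the defect probability of the (m+1)th version, the method draws fully on the data characteristics of the index data of the $m$ versions, and clustering reveals the distribution of each type of index data, so the defect probability of the (m+1)th version can be determined accurately and objectively even when the index data is randomly distributed. The method therefore also makes the defect probability of code data easier to determine and makes determining the defect state of code data more flexible.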
图10为本申请实施例提供的代码缺陷状态确定方法的又一流程示意图。如图10所示,该方法可以包括以下步骤:
步骤301、从研发质量平台(Quality Management System,QMS)至少获取指标数据。
示例性的,研发质量平台可以依托于DPMS、CI以及测试平台等,至少收集并提供获取指标数据;示例性的,对于前m个版本的代码数据,可以从QMS获取指标数据以及事件类型信息。
步骤302、清洗历史数据。
示例性的,对于前m个版本的代码数据而言,可以采用通用的数据清洗方式,从历史数据中去除冗余数据。
步骤303、获取迭代运营数据。
示例性的,迭代运营数据,可以包括切换至交付状态并持续运营一段时间的代码数据、在运营过程中所产生的缺陷事件数据。
步骤304、获取事件标签。
示例性的,事件标签,相当于前述实施例中的事件类型信息。示例性的,此处的事件标签,可以包括代码数据在运营过程中所产生的所有事件的标签,也可以包括代码数据在运营过程中所产生的部分事件的标签。
步骤305、获取迭代过程数据。
示例性的,迭代过程数据,可以包括至少一个版本的代码数据中的原始指标数据。
步骤306、确定指标数据。
示例性的,这里的指标数据,可以是采用前述实施例所提供的方法,对原始指标数据进行处理,从而确定指标数据,还可以从至少一种指标数据中选取若干种类的指标数据。
步骤307、描述性统计分析自动获取初始质心。
示例性的,这里的初始质心,可以为前述实施例中的初始质心数据。示例性的,描述性统计分析,可以使用若干关键数据描述指标数据的整体情况。
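Exemplarily, if the indicator data belongs to the code data of a historical version, step 308 may be performed; if the indicator data belongs to the code data of a newly released iteration version, step 314 may be performed.
Step 308: Calculate the first probability, the second probability and the third probability.
Exemplarily, for the method of calculating these three probabilities, refer to the foregoing embodiments; details are not repeated here.
Step 309: Calculate the defect probability of the new version.
Exemplarily, the defect probability of the new version here may be the defect probability of the code data of the (m+1)-th version, obtained by applying NBC to the first probability, the second probability and the third probability.
Step 310: Determine whether the defect probability is greater than 50%.
Exemplarily, the 50% here may be the probability threshold described in the foregoing embodiments.
If the defect probability is greater than 50%, step 311 may be performed; otherwise, step 312 may be performed.
Step 311: Perform targeted recheck and review, and process the code data.
Exemplarily, after the targeted recheck, review and processing of the code data, a new version of the code data is obtained. Step 312 may then be performed.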
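Step 312: Iteratively release the new version of the code data.
After the new version of the code data is iteratively released, the indicator data of the new version may be obtained from the QMS platform. The indicator data of the new version is then parsed through steps 305 to 307 to determine the initial centroids, and clustering is performed.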
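Step 313: Obtain the indicator data of the new version.
Exemplarily, this may be the indicator data of the (m+1)-th version.
Step 314: Generate indicator scores and calculate the project health score.
Exemplarily, the project health score here may be the quality score of the code data of the (m+1)-th version in the foregoing embodiments.
Step 315: Determine whether the health score is less than the target value.
Exemplarily, the target value here corresponds to the score threshold in the foregoing embodiments. If the health score is less than the target value, step 316 is performed; otherwise, step 309 is performed.
Step 316: Inspect the substandard code data to form a new version of the code data.
After step 316, step 309 may also be performed.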
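The branching in steps 310 to 316 can be condensed into a small routing function. The sketch below is an illustration only: the function name `review_actions`, the action strings and the example target score of 60 are assumptions, and it simplifies the flow by taking the defect probability as an input rather than recomputing it after step 316.

```python
def review_actions(defect_probability, health_score=None,
                   probability_threshold=0.5, target_score=None):
    """Return the ordered actions implied by the two threshold checks.

    defect_probability    -- output of step 309
    health_score          -- output of step 314 (None for historical versions)
    probability_threshold -- the 50% threshold of step 310
    target_score          -- the target value of step 315 (assumed input)
    """
    actions = []
    # Step 315: a health score below the target triggers inspection (step 316).
    if (health_score is not None and target_score is not None
            and health_score < target_score):
        actions.append("inspect substandard code (step 316)")
    # Step 310: a defect probability above the threshold triggers review (step 311).
    if defect_probability > probability_threshold:
        actions.append("targeted review (step 311)")
    # Step 312 follows in either case.
    actions.append("release (step 312)")
    return actions


print(review_actions(0.7))
print(review_actions(0.2, health_score=40, target_score=60))
```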
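As can be seen from the above, the code defect state determination method provided by the embodiments of the present application makes full use of the indicator data and event type labels related to code data versions that are obtained from the QMS platform. It also relies on a clustering algorithm and probabilistic statistical methods to analyze these data, so that the association between each kind of indicator data and the event type labels can be obtained. This lays the foundation for determining the defect state of a new version of the code data accurately and flexibly.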
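Based on the foregoing embodiments, an embodiment of the present application further provides a code defect state determination device 4. FIG. 11 is a schematic structural diagram of the code defect state determination device 4 provided by an embodiment of the present application. As shown in FIG. 11, the device includes:
a first determination module 401, configured to determine at least one kind of indicator data of project data, where the project data includes code data of at least one version that implements a project function, and the indicator data includes quality defect data of the project data;
a processing module 402, configured to cluster the at least one kind of indicator data to obtain a clustering result;
a second determination module 403, configured to determine the defect state of the code data based on the clustering result.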
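In some embodiments, the processing module 402 is configured to analyze each kind of indicator data in the at least one kind of indicator data and determine the initial centroid data of each kind of indicator data, where the initial centroid data includes at least two of the maximum, minimum, mean, mode and median of each kind of indicator data;
the processing module 402 is further configured to cluster each kind of indicator data based on the initial centroid data to obtain the clustering result.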
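Deriving initial centroid data from descriptive statistics can be illustrated with Python's standard `statistics` module. This is a sketch under assumptions: the function name `initial_centroids`, the default choice of minimum, median and maximum, and the sample values are all hypothetical; the text above only requires that at least two of the five statistics be used.

```python
import statistics


def initial_centroids(values, keys=("min", "median", "max")):
    """Pick initial centroid candidates for one kind of indicator data."""
    if len(keys) < 2:
        raise ValueError("at least two statistics are required")
    stats = {
        "max": max(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "mode": statistics.mode(values),
        "median": statistics.median(values),
    }
    centroids = []
    for key in keys:
        value = float(stats[key])
        # Two statistics may coincide; keep each candidate only once.
        if value not in centroids:
            centroids.append(value)
    return centroids


defect_density = [0.2, 0.4, 0.4, 1.1, 3.0]  # hypothetical indicator values
print(initial_centroids(defect_density))
```

Seeding from order statistics keeps the starting points inside the observed range, so no random initialization is needed.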
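In some embodiments, the processing module 402 is configured to determine first distance information between any indicator data in each kind of indicator data and each value of the initial centroid data; cluster each kind of indicator data based on each piece of first distance information to obtain an intermediate clustering result; and update the first distance information based on the intermediate clustering result;
the processing module 402 is configured to, when the sum of squared errors of the first distance information has not converged, cluster each kind of indicator data again based on each piece of first distance information to obtain an intermediate clustering result and update the first distance information based on that result; and is further configured to, when the sum of squared errors of the first distance information has converged, complete the clustering and obtain the clustering result.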
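The assign, update and convergence loop described above matches the general shape of k-means. The following one-dimensional sketch is an assumption-laden illustration, not the applicant's code: the name `kmeans_1d`, the absolute-difference distance, the tolerance `tol` and the iteration cap are hypothetical choices.

```python
def kmeans_1d(values, centroids, tol=1e-9, max_iter=100):
    """Cluster scalar indicator values, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    prev_sse = None
    for _ in range(max_iter):
        # Assignment: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            distances = [abs(v - c) for c in centroids]
            clusters[distances.index(min(distances))].append(v)
        # Update: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(members) / len(members) if members else old
                     for members, old in zip(clusters, centroids)]
        # Convergence check on the sum of squared errors.
        sse = sum((v - c) ** 2
                  for c, members in zip(centroids, clusters) for v in members)
        if prev_sse is not None and abs(prev_sse - sse) <= tol:
            break
        prev_sse = sse
    return centroids, clusters


centroids, clusters = kmeans_1d([0.2, 0.4, 0.4, 1.1, 3.0], [0.2, 3.0])
print(centroids, clusters)
```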
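In some embodiments, the first determination module 401 is configured to obtain at least two pieces of raw indicator data of any kind of the project data;
the first determination module 401 is further configured to determine the weight information corresponding to each piece of raw indicator data among the at least two pieces of raw indicator data of that kind, and to weight each piece of raw indicator data based on the weight information to determine the indicator data of that kind.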
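Combining several raw indicators of one kind into a single indicator value can be sketched as a weighted sum. The function name `weighted_indicator` and the sample weights are assumptions; the text does not say whether the weights are normalized, so the sketch applies them as given.

```python
def weighted_indicator(raw_values, weights):
    """Combine at least two raw indicator values of one kind into one value."""
    if len(raw_values) != len(weights):
        raise ValueError("each raw indicator needs exactly one weight")
    if len(raw_values) < 2:
        raise ValueError("at least two raw indicators are required")
    # Plain weighted sum; normalizing the weights is left to the caller.
    return sum(value * weight for value, weight in zip(raw_values, weights))


# Hypothetical example: two raw defect counts weighted equally.
print(weighted_indicator([3.0, 1.0], [0.5, 0.5]))
```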
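In some embodiments, the at least one kind of indicator data includes n kinds of indicator data, and the clustering result includes a 1st result to an n-th result; for i from 1 to n, the i-th result is the clustering result of the i-th kind of indicator data; the project data includes the code data of the 1st version to the m-th version; the defect state of the code data includes the defect probability of the code data of the (m+1)-th version; the defect probability of the code data of the (m+1)-th version includes the probability that a defect event of any type occurs in the code data of the (m+1)-th version given that the quantization result corresponding to the clustering result of the at least one kind of indicator data occurs; where n is an integer greater than or equal to 1, and m is an integer greater than or equal to 2;
the second determination module 403 is configured to perform statistics on the clustering result to determine quantization interval information; quantize the 1st result to the n-th result based on the quantization interval information to obtain a 1st data set to an n-th data set; and, when n is greater than 1, determine the defect probability of the code data of the (m+1)-th version based on the 1st data set to the n-th data set; where the quantization interval information represents the interval distribution of the distances between the indicator data and the centroids in the 1st result to the n-th result.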
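In some embodiments, the second determination module 403 is configured to analyze the 1st result to the n-th result and determine second distance information between any data in each cluster of the 1st result to the n-th result and the centroid of the 1st result to the n-th result; and to quantize the data in the 1st result to the n-th result based on the quantization interval information and the second distance information to obtain the 1st data set to the n-th data set.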
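Quantizing by distance to the centroid can be sketched as a lookup of each distance in a sorted list of interval boundaries. The function name `quantize`, the boundary values and the sample clusters are assumptions; how the interval boundaries are derived from the clustering statistics is not specified here, so they are passed in by hand.

```python
import bisect


def quantize(clusters, centroids, boundaries):
    """Map every value to the index of the distance interval it falls into.

    clusters   -- list of lists of indicator values, one list per cluster
    centroids  -- centroid of each cluster, in the same order
    boundaries -- sorted upper bounds of the distance intervals (assumed input)
    """
    quantized = []
    for members, centroid in zip(clusters, centroids):
        for value in members:
            distance = abs(value - centroid)  # the "second distance"
            level = bisect.bisect_right(boundaries, distance)
            quantized.append((value, level))
    return quantized


print(quantize([[0.0, 1.0, 2.0], [10.0]], [1.0, 10.0], [0.5, 1.5]))
```

Level 0 then means "close to the centroid" and higher levels mean progressively farther away.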
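In some embodiments, the second determination module 403 is configured to obtain, from the 1st data set to the n-th data set, the quantization result corresponding to the clustering result of the indicator data of the code data of the m-th version; determine the quality score of the code data of the m-th version based on that quantization result; and, when the quality score is greater than a score threshold, determine the defect probability of the code data of the (m+1)-th version based on the 1st data set to the n-th data set.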
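In some embodiments, the second determination module 403 is configured to obtain event type information, where the event type information represents the type information of at least one kind of defect event that occurs while the code data of any version is running;
the second determination module 403 is configured to determine, based on the event type information, a first probability that each type of defect event occurs while the code data is running; determine, based on the 1st data set to the n-th data set, a second probability that the quantization result corresponding to the clustering result of each kind of indicator data occurs in the code data of the m versions; determine a third probability based on the event type information and the 1st data set to the n-th data set, where the third probability is the conditional probability that the quantization result corresponding to the clustering result of at least one kind of indicator data occurs given that a defect event of any type occurs in the code data of the m versions; and determine the defect probability of the code data of the (m+1)-th version based on the first probability, the second probability and the third probability.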
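The three probabilities can be estimated by counting over the history of released versions. The sketch below is an illustration under assumptions, not the applicant's implementation: it combines the estimates with Bayes' rule, treats the indicators as independent (the usual naive Bayes assumption), caps the result at 1, and uses hypothetical names (`defect_probability`, `history`, `new_levels`) and made-up data.

```python
from fractions import Fraction


def defect_probability(history, new_levels, event):
    """Estimate P(event | quantized levels of the new version).

    history    -- list of (levels, events) pairs, one per released version;
                  levels is a tuple of quantized indicator levels and
                  events is the set of defect event types observed
    new_levels -- quantized levels of the version being assessed
    event      -- the defect event type of interest
    """
    m = len(history)
    if m == 0:
        raise ValueError("history must not be empty")
    with_event = [levels for levels, events in history if event in events]
    if not with_event:
        return 0.0  # the event type was never observed
    p1 = Fraction(len(with_event), m)  # first probability: P(event)
    p2 = Fraction(1)                   # second probability: P(levels)
    p3 = Fraction(1)                   # third probability: P(levels | event)
    for i, level in enumerate(new_levels):
        p2 *= Fraction(sum(1 for levels, _ in history if levels[i] == level), m)
        p3 *= Fraction(sum(1 for levels in with_event if levels[i] == level),
                       len(with_event))
    if p2 == 0:
        return 0.0  # this combination of levels was never observed
    # The independence assumption can push the ratio above 1, hence the cap.
    return float(min(Fraction(1), p1 * p3 / p2))


history = [((2, 1), {"DB"}), ((2, 0), {"DB"}), ((0, 0), set()), ((2, 1), set())]
print(defect_probability(history, (2, 1), "DB"))
```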
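In some embodiments, the second determination module 403 is configured to determine, through
Figure PCTCN2021141249-appb-000015
the defect probability of the code data of the (m+1)-th version; where P 1 is the first probability, P 2 is the second probability, P 3 is the third probability, and P s is the defect probability of the code data of the (m+1)-th version.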
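The formula itself is contained in the referenced figure and is not reproduced in this text. As a reading aid only, and assuming the calculation follows Bayes' theorem as the NBC-based description suggests, the relationship between the four quantities would take the following form; this is an assumption, not a transcription of the figure:

```latex
P_s = \frac{P_1 \cdot P_3}{P_2}
```

Here P_1 plays the role of the prior probability of the defect event, P_3 the likelihood of the observed quantization results given that event, and P_2 the overall probability of those quantization results.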
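In some embodiments, the second determination module 403 is configured to obtain event type information, where the event type information represents the type information of at least one kind of defect event that occurs while the code data of any version is running; train a decision tree model based on the 1st data set to the n-th data set and the event type information to obtain a trained decision tree model; obtain the quantization result corresponding to the clustering result of at least one kind of indicator data of the code data of the (m+1)-th version; and, based on the trained decision tree model, process that quantization result together with the event type information to determine the defect probability of the code data of the (m+1)-th version.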
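It should be noted that, in practical applications, the first determination module 401, the processing module 402 and the second determination module 403 may be implemented by a processor in electronic equipment, and the processor may be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller and microprocessor.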
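Based on the foregoing embodiments, an embodiment of the present application further provides electronic equipment 5. FIG. 12 is a schematic structural diagram of the electronic equipment 5 provided by an embodiment of the present application. As shown in FIG. 12, the electronic equipment 5 may include a memory 501 and a processor 502, where:
the memory 501 is configured to store executable instructions;
the processor 502 is configured to implement the code defect state determination method of any of the foregoing embodiments when executing the executable instructions stored in the memory 501.
The processor 502 may be at least one of an application-specific integrated circuit (ASIC), DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller and microprocessor. It can be understood that other electronic components may also be used to implement the above processor functions, which is not specifically limited in the embodiments of the present application.
The memory 501 may be a volatile memory, such as RAM; or a non-volatile memory, such as ROM, flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or a combination of the above kinds of memory. The memory provides instructions and data to the processor.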
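Based on the foregoing embodiments, an embodiment of the present application further provides a computer-readable storage medium in which executable instructions are stored; when executed by a processor, the executable instructions implement the code defect state determination method of any of the foregoing embodiments.
Based on the foregoing embodiments, an embodiment of the present application further provides a computer program including computer-readable code; when the computer-readable code runs in electronic equipment, the processor of the electronic equipment is configured to implement the code defect state determination method of any of the foregoing embodiments.
The above computer program may be implemented by hardware, software or a combination thereof. In one optional embodiment, the computer program is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).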
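Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems and devices described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here. In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; likewise, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a processor-executable, non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are only specific implementations of the present application, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features. Such modifications, changes or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.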
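Industrial Applicability
The embodiments of the present application disclose a code defect state determination method, device, equipment, medium and program. The method includes: determining at least one kind of indicator data of project data, where the project data includes code data of at least one version that implements a project function, and the indicator data includes quality defect data of the project data; clustering the at least one kind of indicator data to obtain a clustering result; and determining the defect state of the code data based on the clustering result. The code defect state determination method provided by the present application makes it possible to evaluate the state and quality of arbitrary code data, and can therefore be applied in a wider range of scenarios.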

Claims (23)

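  1. A code defect state determination method, the method being executed by electronic equipment, the method comprising:
    determining at least one kind of indicator data of project data, wherein the project data comprises code data of at least one version that implements a project function, and the indicator data comprises quality defect data of the project data;
    clustering the at least one kind of indicator data to obtain a clustering result;
    determining a defect state of the code data based on the clustering result.
  2. The method according to claim 1, wherein the clustering the at least one kind of indicator data to obtain a clustering result comprises:
    analyzing each kind of indicator data in the at least one kind of indicator data to determine initial centroid data of each kind of indicator data, wherein the initial centroid data comprises at least two of the maximum, minimum, mean, mode and median of each kind of indicator data;
    clustering each kind of indicator data based on the initial centroid data to obtain the clustering result.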
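  3. The method according to claim 2, wherein the clustering each kind of indicator data based on the initial centroid data to obtain the clustering result comprises:
    determining first distance information between any indicator data in each kind of indicator data and each value of the initial centroid data;
    clustering each kind of indicator data based on each piece of the first distance information to obtain an intermediate clustering result;
    updating the first distance information based on the intermediate clustering result;
    when the sum of squared errors of each piece of the first distance information has not converged, clustering each kind of indicator data based on each piece of the first distance information to obtain an intermediate clustering result, and updating the first distance information based on the intermediate clustering result; when the sum of squared errors of each piece of the first distance information has converged, completing the clustering to obtain the clustering result.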
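  4. The method according to claim 1, wherein the determining at least one kind of indicator data of project data comprises:
    obtaining at least two pieces of raw indicator data of any kind of the project data;
    determining weight information corresponding to each piece of the raw indicator data among the at least two pieces of raw indicator data of that kind;
    weighting each piece of the raw indicator data based on the weight information to determine the indicator data of that kind.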
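  5. The method according to claim 1, wherein the at least one kind of indicator data comprises n kinds of indicator data, and the clustering result comprises a 1st result to an n-th result; for i from 1 to n, the i-th result is the clustering result of the i-th kind of indicator data; the project data comprises the code data of a 1st version to an m-th version; the defect state of the code data comprises a defect probability of the code data of an (m+1)-th version; the defect probability of the code data of the (m+1)-th version comprises the probability that a defect event of any type occurs in the code data of the (m+1)-th version given that the quantization result corresponding to the clustering result of the at least one kind of indicator data occurs; wherein n is an integer greater than or equal to 1, and m is an integer greater than or equal to 2; and the determining a defect state of the code data based on the clustering result comprises:
    performing statistics on the clustering result to determine quantization interval information, wherein the quantization interval information represents the interval distribution of the distances between the indicator data and the centroids in the 1st result to the n-th result;
    quantizing the 1st result to the n-th result based on the quantization interval information to obtain a 1st data set to an n-th data set;
    when n is greater than 1, determining the defect probability of the code data of the (m+1)-th version based on the 1st data set to the n-th data set.
  6. The method according to claim 5, wherein the quantizing the 1st result to the n-th result based on the quantization interval information to obtain a 1st data set to an n-th data set comprises:
    analyzing the 1st result to the n-th result to determine second distance information between any data in each cluster of the 1st result to the n-th result and the centroid of the 1st result to the n-th result;
    quantizing the data in the 1st result to the n-th result based on the quantization interval information and the second distance information to obtain the 1st data set to the n-th data set.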
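  7. The method according to claim 5, wherein the determining the defect probability of the code data of the (m+1)-th version based on the 1st data set to the n-th data set comprises:
    obtaining, from the 1st data set to the n-th data set, the quantization result corresponding to the clustering result of the indicator data of the code data of the m-th version;
    determining a quality score of the code data of the m-th version based on the quantization result corresponding to the clustering result of the indicator data of the code data of the m-th version;
    when the quality score is greater than a score threshold, determining the defect probability of the code data of the (m+1)-th version based on the 1st data set to the n-th data set.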
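  8. The method according to claim 5, wherein the determining the defect probability of the code data of the (m+1)-th version based on the 1st data set to the n-th data set comprises:
    obtaining event type information, wherein the event type information represents the type information of at least one kind of defect event that occurs while the code data of any version is running;
    determining, based on the event type information, a first probability that each type of the defect event occurs while the code data is running;
    determining, based on the 1st data set to the n-th data set, a second probability that the quantization result corresponding to the clustering result of each kind of the indicator data occurs in the code data of the m versions;
    determining a third probability based on the event type information and the 1st data set to the n-th data set, wherein the third probability is the conditional probability that the quantization result corresponding to the clustering result of at least one kind of the indicator data occurs given that a defect event of any type occurs in the code data of the m versions;
    determining the defect probability of the code data of the (m+1)-th version based on the first probability, the second probability and the third probability.
  9. The method according to claim 8, wherein the determining the defect probability of the code data of the (m+1)-th version based on the first probability, the second probability and the third probability comprises:
    determining, through
    Figure PCTCN2021141249-appb-100001
    the defect probability of the code data of the (m+1)-th version; wherein P 1 is the first probability, P 2 is the second probability, P 3 is the third probability, and P s is the defect probability of the code data of the (m+1)-th version.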
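  10. The method according to claim 5, wherein the determining the defect probability of the code data of the (m+1)-th version based on the 1st data set to the n-th data set comprises:
    obtaining event type information, wherein the event type information represents the type information of at least one kind of defect event that occurs while the code data of any version is running;
    training a decision tree model based on the 1st data set to the n-th data set and the event type information to obtain a trained decision tree model;
    obtaining the quantization result corresponding to the clustering result of at least one kind of the indicator data of the code data of the (m+1)-th version;
    based on the trained decision tree model, processing the quantization result corresponding to the clustering result of at least one kind of the indicator data of the code data of the (m+1)-th version, together with the event type information, to determine the defect probability of the code data of the (m+1)-th version.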
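  11. A code defect state determination device, the device comprising:
    a first determination module, configured to determine at least one kind of indicator data of project data, wherein the project data comprises code data of at least one version that implements a project function, and the indicator data comprises quality defect data of the project data;
    a processing module, configured to cluster the at least one kind of indicator data to obtain a clustering result;
    a second determination module, configured to determine a defect state of the code data based on the clustering result.
  12. The device according to claim 11, wherein:
    the processing module is configured to analyze each kind of indicator data in the at least one kind of indicator data to determine initial centroid data of each kind of indicator data, wherein the initial centroid data comprises at least two of the maximum, minimum, mean, mode and median of each kind of indicator data;
    the processing module is further configured to cluster each kind of indicator data based on the initial centroid data to obtain the clustering result.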
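  13. The device according to claim 12, wherein:
    the processing module is configured to determine first distance information between any indicator data in each kind of indicator data and each value of the initial centroid data; cluster each kind of indicator data based on each piece of the first distance information to obtain an intermediate clustering result; update the first distance information based on the intermediate clustering result; when the sum of squared errors of each piece of the first distance information has not converged, cluster each kind of indicator data based on each piece of the first distance information to obtain an intermediate clustering result, and update the first distance information based on the intermediate clustering result; and, when the sum of squared errors of each piece of the first distance information has converged, complete the clustering to obtain the clustering result.
  14. The device according to claim 11, wherein:
    the first determination module is configured to obtain at least two pieces of raw indicator data of any kind of the project data; determine weight information corresponding to each piece of the raw indicator data among the at least two pieces of raw indicator data of that kind; and weight each piece of the raw indicator data based on the weight information to determine the indicator data of that kind.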
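  15. The device according to claim 11, wherein the at least one kind of indicator data comprises n kinds of indicator data, and the clustering result comprises a 1st result to an n-th result; for i from 1 to n, the i-th result is the clustering result of the i-th kind of indicator data; the project data comprises the code data of a 1st version to an m-th version; the defect state of the code data comprises a defect probability of the code data of an (m+1)-th version; the defect probability of the code data of the (m+1)-th version comprises the probability that a defect event of any type occurs in the code data of the (m+1)-th version given that the quantization result corresponding to the clustering result of the at least one kind of indicator data occurs; wherein n is an integer greater than or equal to 1, and m is an integer greater than or equal to 2;
    the second determination module is configured to perform statistics on the clustering result to determine quantization interval information, wherein the quantization interval information represents the interval distribution of the distances between the indicator data and the centroids in the 1st result to the n-th result;
    the second determination module is further configured to quantize the 1st result to the n-th result based on the quantization interval information to obtain a 1st data set to an n-th data set; and, when n is greater than 1, determine the defect probability of the code data of the (m+1)-th version based on the 1st data set to the n-th data set.
  16. The device according to claim 15, wherein:
    the second determination module is configured to analyze the 1st result to the n-th result to determine second distance information between any data in each cluster of the 1st result to the n-th result and the centroid of the 1st result to the n-th result; and quantize the data in the 1st result to the n-th result based on the quantization interval information and the second distance information to obtain the 1st data set to the n-th data set.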
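  17. The device according to claim 15, wherein:
    the second determination module is configured to obtain, from the 1st data set to the n-th data set, the quantization result corresponding to the clustering result of the indicator data of the code data of the m-th version; determine a quality score of the code data of the m-th version based on that quantization result; and, when the quality score is greater than a score threshold, determine the defect probability of the code data of the (m+1)-th version based on the 1st data set to the n-th data set.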
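  18. The device according to claim 15, wherein:
    the second determination module is configured to obtain event type information, wherein the event type information represents the type information of at least one kind of defect event that occurs while the code data of any version is running;
    the second determination module is further configured to determine, based on the event type information, a first probability that each type of the defect event occurs while the code data is running; determine, based on the 1st data set to the n-th data set, a second probability that the quantization result corresponding to the clustering result of each kind of the indicator data occurs in the code data of the m versions;
    determine a third probability based on the event type information and the 1st data set to the n-th data set, wherein the third probability is the conditional probability that the quantization result corresponding to the clustering result of at least one kind of the indicator data occurs given that a defect event of any type occurs in the code data of the m versions; and determine the defect probability of the code data of the (m+1)-th version based on the first probability, the second probability and the third probability.
  19. The device according to claim 18, wherein:
    the second determination module is configured to determine, through
    Figure PCTCN2021141249-appb-100002
    the defect probability of the code data of the (m+1)-th version; wherein P 1 is the first probability, P 2 is the second probability, P 3 is the third probability, and P s is the defect probability of the code data of the (m+1)-th version.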
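  20. The device according to claim 15, wherein:
    the second determination module is configured to obtain event type information; train a decision tree model based on the 1st data set to the n-th data set and the event type information to obtain a trained decision tree model; obtain the quantization result corresponding to the clustering result of at least one kind of the indicator data of the code data of the (m+1)-th version; and, based on the trained decision tree model, process that quantization result together with the event type information to determine the defect probability of the code data of the (m+1)-th version; wherein the event type information represents the type information of at least one kind of defect event that occurs while the code data of any version is running.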
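  21. Electronic equipment, the electronic equipment comprising:
    a memory, configured to store executable instructions;
    a processor, configured to implement the code defect state determination method according to any one of claims 1 to 10 when executing the executable instructions stored in the memory.
  22. A computer-readable storage medium in which executable instructions are stored, wherein the executable instructions, when executed by a processor, implement the code defect state determination method according to any one of claims 1 to 10.
  23. A computer program comprising computer-readable code, wherein, when the computer-readable code runs in electronic equipment, a processor of the electronic equipment is configured to implement the code defect state determination method according to any one of claims 1 to 10.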
PCT/CN2021/141249 2021-06-15 2021-12-24 代码缺陷状态确定方法、装置、设备、介质及程序 WO2022262247A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110661540.4 2021-06-15
CN202110661540.4A CN113326198A (zh) 2021-06-15 2021-06-15 一种代码缺陷状态确定方法、装置、电子设备及介质

Publications (1)

Publication Number Publication Date
WO2022262247A1 true WO2022262247A1 (zh) 2022-12-22

Family

ID=77420882

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141249 WO2022262247A1 (zh) 2021-06-15 2021-12-24 代码缺陷状态确定方法、装置、设备、介质及程序

Country Status (2)

Country Link
CN (1) CN113326198A (zh)
WO (1) WO2022262247A1 (zh)

Also Published As

Publication number Publication date
CN113326198A (zh) 2021-08-31

Similar Documents

Publication Publication Date Title
Verenich et al. Survey and cross-benchmark comparison of remaining time prediction methods in business process monitoring
US11256555B2 (en) Automatically scalable system for serverless hyperparameter tuning
CN107633265B (zh) 用于优化信用评估模型的数据处理方法及装置
WO2022262247A1 (zh) 代码缺陷状态确定方法、装置、设备、介质及程序
US20210042590A1 (en) Machine learning system using a stochastic process and method
EP3591586A1 (en) Data model generation using generative adversarial networks and fully automated machine learning system which generates and optimizes solutions given a dataset and a desired outcome
US11810000B2 (en) Systems and methods for expanding data classification using synthetic data generation in machine learning models
CN106030589A (zh) 使用开源数据的疾病预测系统
US11146580B2 (en) Script and command line exploitation detection
US20210374582A1 (en) Enhanced Techniques For Bias Analysis
Ardimento et al. Knowledge extraction from on-line open source bug tracking systems to predict bug-fixing time
US20240013919A1 (en) Supervised machine learning-based modeling of sensitivities to potential disruptions
CN112241494A (zh) 基于用户行为数据的关键信息推送方法及装置
Dasgupta et al. Towards auto-remediation in services delivery: Context-based classification of noisy and unstructured tickets
CN114840531B (zh) 基于血缘关系的数据模型重构方法、装置、设备及介质
Ogunleye The concepts of predictive analytics
US20140244293A1 (en) Method and system for propagating labels to patient encounter data
Gallo et al. Analysis of XDMoD/SUPReMM data using machine learning techniques
Qudsi et al. Predictive data mining of chronic diseases using decision tree: a case study of health insurance company in Indonesia
CN111325350A (zh) 可疑组织发现系统和方法
CN112308294A (zh) 违约概率预测方法及装置
US11886320B2 (en) Diagnosing application problems by learning from fault injections
Wilde et al. Segmentation analysis and the recovery of queuing parameters via the Wasserstein distance: a study of administrative data for patients with chronic obstructive pulmonary disease
US20230197230A1 (en) Hierarchy-aware adverse reaction embeddings for signal detection
Malioutov et al. Heavy Sets with Applications to Interpretable Machine Learning Diagnostics

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21945823

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE