CN118114075A

CN118114075A - Data clustering method and device, computer equipment and storage medium

Info

Publication number: CN118114075A
Application number: CN202211528292.7A
Authority: CN
Inventors: 佘西敏
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-11-29
Filing date: 2022-11-29
Publication date: 2024-05-31

Abstract

The embodiments of the application disclose a data clustering method and device, a computer device, and a storage medium. The method includes: sending an nth access request to be clustered to a tested system, where an object code file of the tested system contains an instrumentation file for acquiring code coverage data, and the code coverage data is code execution state data generated when the tested system plays back the nth access request; using the instrumentation file to acquire the code coverage data of the nth access request played back by the tested system; determining target code coverage data of the nth access request based on the code coverage data of the (n-1)th access request and the code coverage data of the nth access request; and clustering the nth access request based on the target code coverage data to obtain a target request set corresponding to the nth access request, where some of the access requests in the target request set are used to test a new version of the tested system. The application can improve test efficiency and reduce test cost.

Description

Data clustering method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data clustering method, a data clustering device, a computer device, and a storage medium.

Background

With the rapid development of internet technology, more and more application systems are continuously on-line with new versions, and in order to ensure the availability of the new versions of application systems, the new versions of application systems generally need to be tested in a test environment.

To test an application system, a large number of original online access requests are generally recorded, and a new version of the application system is tested by using the original online access requests. However, because the original online access request amount is too large, if the new version of the application system is tested by the full playback mode, higher testing cost (such as server cost, time cost, etc.) will usually be generated, and the testing efficiency is lower.

Disclosure of Invention

The embodiment of the application provides a data clustering method, a data clustering device, computer equipment and a storage medium, which can improve the testing efficiency and reduce the testing cost.

In a first aspect, an embodiment of the present application provides a data clustering method, including:

acquiring an nth access request to be clustered, and sending the nth access request to a tested system; an instrumentation file for acquiring code coverage data exists in an object code file of the tested system, the code coverage data is code execution state data generated when the tested system plays back the nth access request, n ∈ [1, N], N is the total number of access requests to be clustered, and n and N are positive integers;

Acquiring code coverage data of the nth access request played back by the tested system by utilizing the instrumentation file;

Acquiring code coverage data of the (n-1)th access request played back by the tested system, and determining target code coverage data of the nth access request based on the code coverage data of the (n-1)th access request and the code coverage data of the nth access request;

Clustering the nth access request based on the target code coverage data to obtain a target request set corresponding to the nth access request; and the partial access requests in the target request set are used for testing the new version of the tested system.

In a second aspect, an embodiment of the present application provides a data clustering apparatus, including:

The first acquisition unit is used for acquiring an nth access request to be clustered and sending the nth access request to a tested system; an instrumentation file for acquiring code coverage data exists in an object code file of the tested system, the code coverage data is code execution state data generated when the tested system plays back the nth access request, n ∈ [1, N], N is the total number of access requests to be clustered, and n and N are positive integers;

The second acquisition unit is used for acquiring code coverage data of the nth access request played back by the tested system by utilizing the instrumentation file;

a determining unit, configured to obtain code coverage data of the (n-1)th access request played back by the tested system, and determine target code coverage data of the nth access request based on the code coverage data of the (n-1)th access request and the code coverage data of the nth access request;

The clustering unit is used for clustering the nth access request based on the target code coverage data to obtain a target request set corresponding to the nth access request; and the partial access requests in the target request set are used for testing the new version of the tested system.

In a third aspect, an embodiment of the present application provides a computer apparatus, including: a processor and a memory, the processor being configured to perform the method according to the first aspect.

In a fourth aspect, an embodiment of the present application further provides a computer readable storage medium, where program instructions are stored, the program instructions when executed implementing the method according to the first aspect.

In a fifth aspect, embodiments of the present application also provide a computer program product or computer program comprising program instructions which, when executed by a processor, implement the method of the first aspect described above.

In the embodiments of the application, the nth access request to be clustered can be sent to the tested system. The object code file of the tested system can contain an instrumentation file for acquiring code coverage data, where the code coverage data is code execution state data generated when the tested system plays back the nth access request. The instrumentation file can then be used to obtain the code coverage data of the nth access request played back by the tested system, and the target code coverage data of the nth access request can be determined based on the code coverage data of the (n-1)th access request and the code coverage data of the nth access request. Further, the nth access request can be clustered based on the target code coverage data to obtain a target request set corresponding to the nth access request, and some of the access requests in the target request set may be used to test a new version of the tested system. In this way, test efficiency can be improved and test cost can be reduced.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a data clustering system according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of a data clustering method according to an embodiment of the present application;

FIG. 3 is a schematic flow chart of obtaining clustering requirements according to an embodiment of the present application;

FIG. 4 is a schematic flow chart of another data clustering method according to an embodiment of the present application;

FIG. 5a is a schematic flow chart of recording access requests according to an embodiment of the present application;

FIG. 5b is a schematic flow chart of instrumenting a system under test according to an embodiment of the present application;

FIG. 5c is a flowchart illustrating another method for clustering data according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a data clustering device according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources in a wide area network or a local area network to realize the calculation, storage, processing, and sharing of data.

Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like that are applied under the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. Background services of technical network systems, such as video websites, picture websites, and other portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each item may have its own identification mark in the future, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data need strong system backing, which can only be realized through cloud computing.

Cloud computing is a computing model that distributes computing tasks across a resource pool formed by a large number of computers, enabling various application systems to acquire computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". From the user's point of view, resources in the cloud are infinitely expandable and can be acquired at any time, used on demand, expanded at any time, and paid for according to use.

The application can store the data related to the data clustering into the cloud, acquire the data in the cloud at any time according to the requirements and expand the data at any time. For example, access requests to be clustered may be stored in a "cloud", and may be obtained from the "cloud" if it is desired to cluster the access requests. For another example, the access request and the corresponding clustering result (such as scene type) may be stored in the "cloud", and if the access request and the corresponding clustering result need to be utilized, the access request and the corresponding clustering result may be obtained from the "cloud". For example, if an identification model is used for identifying the scene type of the access request, and a corresponding training sample is needed, a large number of training samples can be obtained from the cloud, wherein the access request and the corresponding scene type can be used as a training sample.

The embodiments of the application provide a data clustering scheme. The scheme can cluster (or classify) original online access requests and obtain corresponding clustering results, where a clustering result may refer to the scene type corresponding to an access request, and the scene type may indicate the service scene of the access request in an application system (or simply, a system). Specifically, the principle of the scheme is as follows: the N access requests to be clustered can be clustered in turn to obtain the request set in which each access request is located. Here N is the total number of access requests to be clustered; one request set corresponds to one scene type, i.e., all access requests in one request set have the same scene type; and one request set may include one or more access requests.

The data clustering scheme is described below taking the nth access request to be clustered as an example. First, the nth access request may be sent to the tested system to obtain the code coverage data generated when the tested system plays back the nth access request, where the tested system is the above-mentioned system in a test environment. Optionally, the instrumentation file in the tested system may be used to obtain the code coverage data of the nth access request. The instrumentation file is inserted into the original code file of the tested system in advance through an instrumentation operation, and may be used to obtain the code coverage data generated when the tested system plays back the nth access request. The code coverage data may be code execution state data generated when the tested system plays back the nth access request and may be understood as a representation of how the code executed; the code coverage data may therefore also be referred to as code coverage behavior.

After the code coverage data of the nth access request is obtained, it can be used for clustering to obtain a target request set corresponding to the nth access request. Optionally, the code coverage data of the (n-1)th access request played back by the tested system can be acquired first; then, the target code coverage data of the nth access request may be determined based on the code coverage data of the (n-1)th access request and the code coverage data of the nth access request; finally, the nth access request may be clustered based on the target code coverage data to obtain the target request set corresponding to the nth access request.

By implementing this clustering scheme, code coverage behavior can be used as the feature of the scene type, thereby achieving the goal of clustering the access requests. With code coverage behavior as the feature of the scene type, access requests can be clustered according to differences in code coverage behavior; that is, access requests whose code execution produces inconsistent code coverage behavior are classified into different scene types, which effectively reduces the omission of scene types and improves the clustering effect. In addition, code coverage behavior is usually a more direct reflection of how the program code executes, so access requests with the same code coverage behavior can be effectively grouped together, which effectively improves clustering accuracy.

In one implementation, some of the access requests in the obtained target request set may be used to test a new version of the tested system, that is, to test a new version of the original system in a test environment. Compared with testing the new version of the original system by full playback, the present application can select some of the access requests from each of the one or more request sets obtained by clustering the N access requests. This reduces the number of access requests played back in the new version of the tested system while still covering as many online access scenes as possible, which in turn reduces test cost (such as time cost, the storage cost of the access requests, and the cost of the equipment that performs the test operations, such as servers) and improves test efficiency.
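The selection of a subset of access requests from each request set can be sketched as follows. This is only an illustrative sketch: the function name `select_playback_subset`, the `per_set` parameter, the random sampling strategy, and the sample identifiers are all assumptions made here, since the text does not specify how the subset is chosen.

```python
import random


def select_playback_subset(request_sets, per_set=1, seed=0):
    """Pick up to `per_set` request identifiers from every request set.

    `request_sets` maps a scene type to the identifiers of the access
    requests clustered into it. Random sampling is an assumption; any
    selection rule that keeps at least one request per scene would do.
    """
    rng = random.Random(seed)
    selected = []
    for scene_type in sorted(request_sets):
        members = sorted(request_sets[scene_type])
        k = min(per_set, len(members))
        selected.extend(rng.sample(members, k))
    return selected


# Hypothetical clustering result: three scene types, six recorded requests.
request_sets = {
    "scene_a": ["req-1", "req-4", "req-7"],
    "scene_b": ["req-2"],
    "scene_c": ["req-3", "req-5"],
}
subset = select_playback_subset(request_sets, per_set=1)
```

With one request per set, only three of the six recorded requests would be played back against the new version, while every scene type is still represented.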

In a specific implementation, the above-mentioned data clustering scheme may be performed by a computer device, in particular by a clustering service in the computer device. The clustering service may refer to the mechanism that chains together the entire scene clustering process (i.e., clustering the N access requests by scene type), for example, but not limited to, by program calls (e.g., script files) or pipelines, so as to cluster the N access requests. The computer device may be a terminal or a server. The terminals mentioned herein may include, but are not limited to: smart phones, tablet computers, notebook computers, desktop computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, aircraft, and the like; a wide variety of clients (APPs) may run on the terminal, such as game clients, multimedia playback clients, social clients, browser clients, information-flow clients, educational clients, and so on. The server mentioned herein may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.

In a possible implementation manner, when the computer device is a server, the embodiment of the application provides a data clustering system, as shown in fig. 1, where the data clustering system includes at least one terminal and at least one server; the terminal can acquire the nth access request to be clustered, and upload the acquired nth access request to a server (i.e. computer equipment), so that the computer equipment can acquire the nth access request, and cluster the nth access request, thereby obtaining a target request set corresponding to the nth access request.

It will be appreciated that in the specific embodiments of the present application, related data such as user data, execution data, result data, etc. are referred to, and when the above embodiments of the present application are applied to specific products or technologies, permission or consent of the subject needs to be obtained, and collection, use and processing of related data needs to comply with related laws and regulations and standards of related countries and regions.

Based on the related description of the data clustering scheme, the embodiment of the application provides a data clustering method, and the embodiment of the application mainly uses computer equipment as an execution main body for explanation; referring to fig. 2, the data clustering method may include the following steps S201 to S204:

S201, acquiring an nth access request to be clustered, and sending the nth access request to a tested system.

It should be noted that, in the embodiments of the present application, there may be N access requests to be clustered, where N is the total number of access requests to be clustered and is a positive integer. The N access requests may be clustered sequentially, and the clustering process of each access request is similar, so the embodiments of the present application are described taking the clustering process of one access request as an example; this access request is assumed to be the nth access request, where n ∈ [1, N].

The tested system may be a system that runs in a test environment and is the same as the original system to be tested, where the original system may be a search system, a multimedia playing system, a browser system, an information stream system, and so on. The original system may also be understood as an application, such as a search application, a multimedia playing application, a browser application, an information stream application, and so on. The tested system may be the program in which access requests are clustered by scene type, i.e., the access requests may be clustered by scene type in the tested system, and the program code of the tested system may be written in the Go language or in another programming language.

The access request in the embodiments of the present application may refer to an access request (such as a service request) in a real online environment; for example, the access request may also be referred to as an online access request (or an original online access request). The access request may be one or more of a call request from an upstream server and a direct request from a client. For example, assuming that the tested system is a search application, a direct request from the client may refer to a search request initiated by a user in the search application, and a call request from an upstream server may refer to a call request sent by another server to the server corresponding to the search application.

The instrumentation file for acquiring the code coverage data is inserted into the original code file of the tested system through instrumentation operation, and the original code file containing the instrumentation file may be referred to herein as an object code file, that is, the instrumentation file for acquiring the code coverage data exists in the object code file. The instrumentation operation refers to inserting a probe code segment into different logic branches of an original code program on the basis of ensuring the logic integrity of the original code program, and the probe code segment is used for acquiring required data through a probe, for example, in the embodiment of the application, code coverage data can be acquired through the probe, and the code coverage data can be: code execution state data generated for playback of the nth access request by the system under test.

In one implementation, the nth access request to be clustered may be obtained when there is a clustering requirement for original online access requests.

Optionally, when the computer device obtains a clustering request for the nth access request, it may determine that the clustering requirement for the nth access request has been obtained. The clustering request may be generated when a target object (which may refer to any user) performs a related operation on a user operation interface. If the target object needs to acquire the clustering result corresponding to the nth access request, the target object can perform the related operation on the user operation interface output by the terminal in use, so that the clustering request for the nth access request is sent to the computer device.

For example, referring to fig. 3, a user operation interface may be displayed on the screen of the terminal used by the target object, and the user operation interface may include at least a data input area 301 and a confirmation control 302. If the target object wants to obtain the clustering result corresponding to the nth access request, the relevant information of the nth access request (for example, the nth access request itself or the storage area address corresponding to the nth access request) may be input in the data input area 301; a triggering operation (e.g., a click or press) may then be performed on the confirmation control 302.

After the terminal detects that the confirmation control 302 is triggered, it may generate a clustering request based on the information in the data input area 301 and send the clustering request to the computer device. If the nth access request itself was input in the data input area 301, the clustering request may carry the nth access request; if the storage area address corresponding to the nth access request was input in the data input area 301, the clustering request may carry that storage area address, and after the computer device acquires the clustering request, it may acquire the corresponding nth access request based on the storage area address.
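The two forms of clustering request described above can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary keys `access_request` and `storage_address`, the function name `resolve_access_request`, and the in-memory `storage` mapping are hypothetical names chosen for illustration, not part of the described system.

```python
def resolve_access_request(cluster_request, storage):
    """Return the access request that a clustering request refers to.

    The clustering request either carries the access request directly or
    carries the storage area address under which it was stored.
    """
    if "access_request" in cluster_request:
        return cluster_request["access_request"]
    address = cluster_request["storage_address"]
    return storage[address]


# Hypothetical storage area holding one recorded online request.
storage = {"store://requests/42": "GET /search?q=go"}

inline = resolve_access_request({"access_request": "GET /search?q=rust"}, storage)
by_address = resolve_access_request({"storage_address": "store://requests/42"}, storage)
```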

Optionally, it may be determined that the clustering requirement for the access requests has been obtained when a clustering timing task is triggered. For example, a clustering timing task may be set, and when the triggering condition of the clustering timing task is met, it can be determined that the clustering requirement has been obtained. In one embodiment, a large number of access requests may be stored in a specific storage area, and the triggering condition may be that the current time reaches a preset clustering time, or that the remaining storage space of the specific storage area exceeds a preset remaining storage space, or the like.

S202, the instrumentation file is utilized to obtain code coverage data of the nth access request played back by the tested system.

The code coverage data of the nth access request may include the number of code executions of each logical code block in the object code file after the tested system plays back the nth access request. The number of code executions of a logical code block is the sum of the code executions generated by the tested system playing back n access requests, where the n access requests run from the 1st access request to the nth access request. That is, the number of code executions of one logical code block is the sum of the code executions generated after the tested system plays back each access request from the 1st access request to the nth access request.

For example, assuming n is 3, for the 3rd access request, the number of code executions of one logical code block is the sum of the code executions generated after the tested system plays back 3 access requests, where the 3 access requests are the 1st access request, the 2nd access request, and the 3rd access request.
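The accumulation described above can be written compactly. The notation below is introduced here only for illustration and does not appear in the original text: $e_i(b)$ denotes the executions of logical code block $b$ caused by playing back the $i$-th access request alone, and $c_n(b)$ denotes the count reported by the instrumentation file after the $n$-th playback.

```latex
c_n(b) = \sum_{i=1}^{n} e_i(b), \qquad \text{e.g. } c_3(b) = e_1(b) + e_2(b) + e_3(b)
```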

In one implementation, the instrumentation file may count, in real time, the number of code executions of each logical code block in the object code file after the tested system plays back the nth access request. In general, the instrumentation file has a function of counting the number of code executions of each logical code block, for example, it can be regarded as a counter whose initial value is 0, which is incremented by 1 when a certain logical code block is executed once when an access request is played back, and which remains unchanged when a certain logical code block is not executed when an access request is played back.
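The counter behavior can be illustrated with a small sketch. It is not the instrumentation mechanism of any particular toolchain: the class `CoverageProbe`, the block names, and the toy `handle_request` function are hypothetical, and the probe calls are written by hand here, whereas the described instrumentation operation inserts them automatically.

```python
class CoverageProbe:
    """Keeps one counter per logical code block, starting at 0."""

    def __init__(self, block_ids):
        self.counts = {block_id: 0 for block_id in block_ids}

    def hit(self, block_id):
        # Called by the probe code segment placed in a logical branch.
        self.counts[block_id] += 1

    def snapshot(self):
        # Cumulative counts since instrumentation started.
        return dict(self.counts)


probe = CoverageProbe(["block_1", "block_2", "block_3"])


def handle_request(query):
    """Toy request handler with probes in each logical branch."""
    probe.hit("block_1")
    if query:
        probe.hit("block_2")
        return "results for " + query
    probe.hit("block_3")
    return "empty query"


handle_request("go")             # playback of the 1st access request
after_first = probe.snapshot()
handle_request("")               # playback of the 2nd access request
after_second = probe.snapshot()
```

The second snapshot still includes the executions caused by the first request, which is why the counts are cumulative rather than per-request.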

The code coverage data may further include information related to each logical code block in the tested system, such as: the code file names of the tested system, which may include the name of the code file in the object code file to which each logical code block belongs; the detailed start-stop coordinate information of all logical code blocks of the tested system in the object code file, where the start-stop coordinate information of a logical code block may refer to its starting line number and ending line number in the object code file; the number of lines contained in each logical code block of the tested system; and so on.
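One possible in-memory shape for such a record is sketched below. The field names, the `BlockCoverage` class, the `make_block` helper, and the file name `search.go` are assumptions made for illustration, and deriving the line count as `end_line - start_line + 1` is likewise an assumption rather than something the text states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockCoverage:
    """Coverage record for one logical code block (hypothetical layout)."""

    file_name: str    # code file that contains the block
    start_line: int   # starting line number of the block
    end_line: int     # ending line number of the block
    num_lines: int    # number of lines the block contains
    exec_count: int   # number of code executions recorded for the block


def make_block(file_name, start_line, end_line, exec_count):
    # Assumes the line count spans start_line..end_line inclusive.
    return BlockCoverage(
        file_name, start_line, end_line, end_line - start_line + 1, exec_count
    )


block = make_block("search.go", 10, 14, 3)
```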

S203, acquiring code coverage data of the (n-1)th access request played back by the tested system, and determining target code coverage data of the nth access request based on the code coverage data of the (n-1)th access request and the code coverage data of the nth access request.

As can be seen from the foregoing, the code coverage data that the instrumentation file obtains for each access request is cumulative. Therefore, in order to obtain the code coverage data that corresponds only to the current access request (i.e., the target code coverage data), the code coverage data of the previous access request and the code coverage data of the current access request can be used together.

In one implementation, to determine the target code coverage data of the nth access request, the code coverage data of the (n-1)th access request may be used. In a specific implementation, the code coverage data of the (n-1)th access request played back by the tested system can be acquired first. The code coverage data that the instrumentation file obtains for each access request can be stored in a designated storage area, and when the code coverage data of any access request is needed, it can be acquired from the designated storage area; here, for example, the code coverage data of the (n-1)th access request may be acquired from the designated storage area. After the code coverage data of the (n-1)th access request is obtained, the target code coverage data of the nth access request can be determined based on the code coverage data of the (n-1)th access request and the code coverage data of the nth access request.

As previously described, the code coverage data of the nth access request may include the number of code executions of each logical code block in the object code file after the tested system plays back the nth access request. The target code coverage data can therefore be determined based on the number of code executions of each logical code block corresponding to the (n-1)th access request and the number of code executions of each logical code block corresponding to the nth access request. Specifically, for any logical code block, the difference between its number of code executions after playback of the nth access request and its number of code executions after playback of the (n-1)th access request may be used as the target number of code executions of the nth access request for that logical code block.

In this way, the target number of code executions of the nth access request for each logical code block can be obtained, and these target numbers can be used as the target code coverage data of the nth access request. The target code coverage data may further include other data, namely the above-mentioned information related to each logical code block in the tested system, such as the code file names of the tested system, the detailed start-stop coordinate information of all logical code blocks of the tested system in the object code file, the number of lines contained in each logical code block of the tested system, and so on.

For example, suppose the object code file includes the following 4 logical code blocks: code block 1, code block 2, code block 3, and code block 4. If the numbers of code executions of the 4 logical code blocks corresponding to the (n-1)th access request are k1, k2, k3, and k4 respectively, and the numbers of code executions of the 4 logical code blocks corresponding to the nth access request are m1, m2, m3, and m4 respectively, then the target numbers of code executions of the 4 logical code blocks corresponding to the nth access request are m1-k1, m2-k2, m3-k3, and m4-k4, respectively.

If n is 1, the access request being clustered is the 1st access request and the (n-1)th access request is the 0th access request; in this case, the number of code executions of each logical code block corresponding to the 0th access request may default to 0.
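The subtraction, including the n = 1 case, can be sketched as follows. The function name `target_coverage`, the dictionary representation of the counts, and the sample numbers are assumptions made for illustration.

```python
def target_coverage(current, previous=None):
    """Per-block difference between two cumulative coverage snapshots.

    `current` holds the cumulative counts after the nth playback and
    `previous` those after the (n-1)th playback. For n = 1 there is no
    previous snapshot, so every previous count defaults to 0.
    """
    previous = previous or {}
    return {
        block: count - previous.get(block, 0)
        for block, count in current.items()
    }


# Hypothetical cumulative counts k1..k4 and m1..m4 for 4 logical code blocks.
cumulative_prev = {"block_1": 5, "block_2": 2, "block_3": 0, "block_4": 1}
cumulative_curr = {"block_1": 6, "block_2": 2, "block_3": 1, "block_4": 1}

delta_n = target_coverage(cumulative_curr, cumulative_prev)
delta_first = target_coverage({"block_1": 1, "block_2": 0})
```

Here `delta_n` attributes one execution each of block 1 and block 3 to the nth access request alone, regardless of how many requests were played back before it.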

S204, clustering the nth access request based on the target code coverage data to obtain a target request set corresponding to the nth access request.

In one implementation, the nth access request may be clustered based on the number of target code executions of each logical code block in the target code coverage data to obtain a target request set corresponding to the nth access request.

It can be understood that when the nth access request is clustered, the clustering of the first n-1 access requests has been completed, and each of the first n-1 access requests has a corresponding request set. A request set corresponds to a scene type; each request set may include one or more access requests, or the request identifiers corresponding to the one or more access requests, where one request identifier is used to uniquely indicate one access request. The application is described mainly by taking request sets that contain request identifiers as an example. The target code execution times of the logical code blocks are the same for all access requests in one request set, that is, all access requests in one request set have the same scene type. It should be appreciated that if different access requests have the same target code execution times for every logical code block, these access requests may be determined to be of the same scene type.

Based on the above, when the nth access request is clustered, the target request set corresponding to the nth access request is determined only based on the target code execution times of the logic code blocks corresponding to the respective request sets in the existing request sets.

Optionally, if the target code execution times of each logical code block corresponding to the nth access request are the same as the target code execution times of each logical code block corresponding to a certain request set, that request set may be determined as the target request set. If the target code execution times of each logical code block corresponding to the nth access request differ from those of every existing request set, it can be determined that the access request corresponds to a new scene type, and a request set can be newly created, wherein the newly created request set is the target request set.

After the target request set is determined, the nth access request may be added to it: if the request sets store access requests directly, the nth access request itself may be added to the target request set; if the request sets store the request identifiers of access requests, the request identifier corresponding to the nth access request may be added to the target request set.

For example, there are currently the following 3 request sets: set 1, set 2, set 3, and the scene types corresponding to set 1, set 2, set 3 are type 1, type 2, type 3, respectively; and assuming that the object code file includes 4 logic code blocks, the object code execution times of each logic code block corresponding to the set 1 are a1, a2, a3 and a4, the object code execution times of each logic code block corresponding to the set 2 are b1, b2, b3 and b4, and the object code execution times of each logic code block corresponding to the set 3 are c1, c2, c3 and c4.

If the target code execution times of each logic code block corresponding to the nth access request are b1, b2, b3 and b4, it may be determined that the target request set of the nth access request is set 2, that is, the scene type of the nth access request is the scene type corresponding to set 2 (i.e., type 2), and the nth access request or the request identifier corresponding to the nth access request may be added to set 2.

If the target code execution times of each logic code block corresponding to the nth access request are d1, d2, d3 and d4, it can be seen that the nth access request does not belong to any request set, a request set can be newly established, and can be used as a target request set, for example, the target request set can be called as set 4, and the scene type corresponding to set 4 can be called as type 4; then the nth access request or the request identification corresponding to the nth access request may be added to the set 4.

It should be noted that if n is 1, it indicates that the nth access request of the cluster is the 1 st access request, in this case, it may be directly determined that the 1 st access request corresponds to a scene type, then a request set may be newly created, and the newly created request set is the target request set, and the request identifier of the 1 st access request may be added to the target request set.
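The matching rule of step S204 can be sketched as follows, assuming that each request set is keyed by the tuple of target code execution times shared by its members. The names `cluster_request` and `request_sets` and the sample identifiers are hypothetical, and the sketch stores request identifiers rather than full access requests.

```python
from typing import Dict, List, Tuple

CountVector = Tuple[int, ...]  # target execution times, one entry per logical code block


def cluster_request(
    request_sets: Dict[CountVector, List[str]],
    request_id: str,
    target_counts: CountVector,
) -> List[str]:
    """Add request_id to the set whose count vector equals target_counts,
    creating a new set when no existing set matches (including n == 1)."""
    target_set = request_sets.setdefault(target_counts, [])
    target_set.append(request_id)
    return target_set


request_sets: Dict[CountVector, List[str]] = {}
cluster_request(request_sets, "req-1", (1, 0, 2, 0))  # first request: new set
cluster_request(request_sets, "req-2", (0, 3, 0, 1))  # no match: new set
cluster_request(request_sets, "req-3", (1, 0, 2, 0))  # matches the set of req-1
print(request_sets)
```

Because membership is decided by exact equality of the count vectors, two requests that differ in the count of even one logical code block end up in different request sets.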

In one implementation, a portion of the access requests in the set of target requests may be used to test a new version of the system under test, i.e., a new version of the original system under test in a test environment. That is, embodiments of the present application may select a portion of the access requests from each of one or more request sets obtained by clustering N access requests for testing, e.g., may select one or more access requests. It can be seen that, after a large number of access requests are clustered, it can be ensured that all online access scenes are covered as much as possible in the test, and meanwhile, the playback amount of the access requests in the new version of tested system can be reduced, so that the test cost (such as time cost, storage cost of the access requests, equipment cost and the like) can be reduced, and the test efficiency can be improved.

In the embodiment of the application, the access requests can be acquired and played back one by one in a clustering processing environment (namely, an environment used for playing back online access requests and clustering them, such as the tested system), and the code coverage performance of each access request in the tested system can be acquired and used as the feature of its scene type, so that the access requests can be clustered. By taking the code coverage performance as the feature of the scene type, the access requests can be clustered based on differences in code coverage performance; that is, access requests whose code execution shows inconsistent code coverage performance are classified into different scene types, which effectively reduces omission of scene types and improves the clustering effect. In addition, the code coverage performance is a more direct reflection of how the program code is executed, and access requests with the same code coverage performance can be grouped together by this method, which effectively improves the clustering accuracy. Meanwhile, clustering (classifying) the online access requests based on the code coverage performance requires no manual participation, and the clustering can be performed directly based on the specific code coverage performance of each access request, so that the clustering operation can be automated and made more intelligent.
In addition, the request set obtained after clustering a large number of access requests can be used for testing a new version of tested system, so that the playback amount of the access requests in the new version of tested system can be reduced while all online access scenes are covered as much as possible in the test, the test cost can be further reduced, and the test efficiency can be improved.

Fig. 4 is a schematic flow chart of another data clustering method according to an embodiment of the present application. The embodiment of the application mainly uses computer equipment as an execution main body for explanation; referring to fig. 4, the data clustering method may include the following steps S401 to S407:

S401, acquiring an nth access request to be clustered, and sending the nth access request to a tested system.

In one implementation, before the clustering operation is performed on the N access requests, preparation work may be further performed for the clustering operation; that is, before the stage of performing the clustering operation (which may be simply referred to as the clustering stage), a preparation stage may be further included, where the preparation stage may include a stage of recording the access requests and a stage of instrumenting the tested system. The specific implementation of the two stages is described below.

It will be appreciated that before the access requests are clustered, that is, before step S401 is performed, the access requests that need to be clustered may be recorded, and these access requests may be stored in a storage area in advance, for example, the storage area may be referred to as a request data storage area, so that, when the clustering is performed, the corresponding access requests may be acquired from the request data storage area and clustered later.

In one implementation, the flow of recording an access request is described below in connection with fig. 5 a. The specific implementation manner may include the following steps S11 to S12:

S11, recording a plurality of access requests (online access requests) aiming at the tested system.

The specific recording mode adopted for recording the access requests is not limited. For example, the access requests can be obtained based on the traffic mirroring capability of network hardware devices, captured online in real time by software, or obtained by analyzing the online access request log of the tested system.

S12, obtaining and storing hash values of all access requests.

In one implementation, a hash calculation may be performed on each of the plurality of access requests to obtain a hash value for each access request. The algorithm used for the hash calculation may include, but is not limited to, any one of MD5, SHA1, SHA256 and SHA512, and the calculation may be performed in parallel or serially. After the hash value of each access request is obtained, the hash value may be used as the corresponding request identifier (which may also be referred to as a request ID). Further, each access request may be stored in association with its request identifier in the request data storage area. The storage form may include any one of a file, a database and a cache; when stored, each access request and its request identifier may be stored in a core storage structure for online access requests, for example, as shown in Table 1 below:

TABLE 1

The key information stored in the request data storage area includes the request identifier (i.e., the request ID) and the access request text. The request ID can be understood as a primary key; that is, the access request text is stored with the request ID as the primary key, and because of this associated storage, subsequent operations such as looking up an access request can use the request ID directly as an index. The access request text may include all information contained in the original online access request (i.e., the recorded access request described above), and is not limited to any particular network protocol. For example, if the network protocol of the original online access request is HTTP, the access request text stores information including, but not limited to, the Header, the Body and the URL Parameters.

The representation form of the single access request when stored is not limited, and can be a dictionary data structure, a list data structure and the like. For example, if the storage condition of the access request is represented in a list form, the storage of the access request may be as follows in table 2:

TABLE 2

Request ID	Access request text
Request ID1	Text 1
Request ID2	Text 2
…	…

Wherein, a row in table 2 represents a related data record of an access request, and the record includes a request identifier of the access request and a corresponding access request text.
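Step S12 and the request data storage area can be illustrated with the sketch below. It assumes SHA256 (one of the algorithms listed above) and an in-memory dictionary standing in for the file, database or cache; the function names and the sample request texts are hypothetical.

```python
import hashlib
from typing import Dict, Iterable


def request_id(request_text: str) -> str:
    """Use the SHA256 hex digest of the request text as the request ID."""
    return hashlib.sha256(request_text.encode("utf-8")).hexdigest()


def record_requests(request_texts: Iterable[str]) -> Dict[str, str]:
    """Store each access request text under its request ID (the primary key)."""
    store: Dict[str, str] = {}
    for text in request_texts:
        store[request_id(text)] = text
    return store


recorded = [
    "GET /orders/1 HTTP/1.1\nHost: example.test\n\n",
    "POST /orders HTTP/1.1\nHost: example.test\n\n{\"item\": 7}",
]
request_store = record_requests(recorded)

# Later lookups use the request ID directly as the index.
first_id = request_id(recorded[0])
print(request_store[first_id] == recorded[0])  # True
```

Note that in this sketch two byte-identical request texts hash to the same request ID and therefore occupy a single record.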

In summary, the specific embodiment of acquiring the nth access request to be clustered in step S401 may be that the nth access request to be clustered is acquired from the request data storage area.

It can be understood that, in order for the tested system to be able to acquire code coverage data, the tested system needs to be instrumented in advance; the instrumentation process of the tested system is described below in connection with fig. 5b. The specific implementation may include the following steps S21 to S23:

S21, performing instrumentation processing (an instrumentation operation) on the original code file corresponding to the tested system to obtain a target code file, wherein the target code file comprises an instrumentation file; that is, instrumentation is performed automatically in the code of the tested system. In addition to the instrumentation operation, a new interface may be created by way of code injection so that the interface can subsequently be used to dynamically retrieve the code coverage data (e.g., code execution times) of access requests. The obtained code coverage data can be stored so that it can be obtained directly when needed; the application does not limit the storage form, which may include any one of a file, a database, a cache and the like.

S22, compiling the target code file to obtain the executable file. That is, compiling of the object code file of the tested system may be started to obtain the tested system instrumentation package, which may be understood as the executable file described above. Namely, through compiling, the tested system can be guaranteed to have the function of acquiring code coverage data by using the instrumentation file.

S23, starting the tested system based on the executable file, so that the subsequent clustering stage can be triggered; the started tested system can be used for playing back access requests and obtaining the corresponding code coverage data. The starting mode is not limited; for example, the system may be started through a script file.

The recording access request stage and the tested system instrumentation stage in the preparation stage may be executed by a clustering service in a computer device. After the preparation phase is completed by the clustering service, the subsequent clustering phase can be performed, namely, clustering is performed on each access request in the request data storage area.
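The effect of the instrumentation stage can be pictured with the toy sketch below. The application does not name an instrumentation tool, so everything here is an assumption made for illustration: `probe` stands for a call that instrumentation would insert at the start of each logical code block, `coverage_snapshot` stands for the injected interface that returns cumulative execution counts, and `handle_request` is a hand-instrumented stand-in for the tested system.

```python
from collections import Counter
from typing import Dict

_block_hits: Counter = Counter()  # hypothetical in-process store of execution counts


def probe(block_id: str) -> None:
    """Call that instrumentation would insert at the start of a logical code block."""
    _block_hits[block_id] += 1


def coverage_snapshot() -> Dict[str, int]:
    """Stand-in for the injected interface returning cumulative execution counts."""
    return dict(_block_hits)


def handle_request(path: str) -> str:
    """Toy request handler of a tested system, with probes added by hand."""
    probe("handle_request:entry")
    if path.startswith("/orders"):
        probe("handle_request:orders")
        return "orders"
    probe("handle_request:default")
    return "default"


handle_request("/orders/1")
handle_request("/home")
snapshot = coverage_snapshot()
print(snapshot)
```

Reading `coverage_snapshot()` after each played-back request yields the cumulative counts from which the per-request differences are computed in the clustering stage.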

S402, the instrumentation file is utilized to obtain code coverage data of the nth access request played back by the tested system.

S403, acquiring code coverage data of the (n-1)th access request played back by the tested system, and determining target code coverage data of the nth access request based on the code coverage data of the (n-1)th access request and the code coverage data of the nth access request.

The specific embodiments of steps S401 to S403 may refer to the descriptions related to steps S201 to S203, and are not described herein.

S404, determining the target scene type of the nth access request based on the target code coverage data.

In one implementation, first, a hash calculation may be performed on the target code coverage data of the nth access request to obtain a hash value corresponding to the target code coverage data. The target code coverage data used for the hash calculation here is the target code execution times of all logical code blocks corresponding to the nth access request. The algorithms used for the hash calculation include, but are not limited to, any of MD5, SHA1, SHA256 and SHA512. Then, the hash value corresponding to the target code coverage data may be used as the target type identifier of the target scene type of the nth access request. That is, a scene type can be characterized by the hash value of the target code coverage data of an access request, and the hash value can be used as a scene type identifier (or simply a type identifier); a scene type is represented by its scene type identifier, and one scene type identifier uniquely indicates one scene type. By introducing the type identifier, the corresponding request set can be determined during subsequent clustering directly from the target type identifier of the target scene type, without comparing the target code coverage data itself, so that the amount and complexity of data processing during clustering can be reduced and the clustering speed can be increased.
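A possible way to derive the type identifier is sketched below. The application only states that the target code execution times are hashed; the serialization step (JSON with sorted keys) is an assumption added here so that equal counts always produce identical bytes, and the function name `scene_type_id` is hypothetical.

```python
import hashlib
import json
from typing import Dict


def scene_type_id(target_counts: Dict[str, int]) -> str:
    """Hash the target execution times of all logical code blocks into a type ID."""
    # Sort keys so the same counts always serialize to the same byte string.
    canonical = json.dumps(target_counts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


id_a = scene_type_id({"block1": 1, "block2": 0, "block3": 2})
id_b = scene_type_id({"block3": 2, "block1": 1, "block2": 0})  # same counts, other order
id_c = scene_type_id({"block1": 1, "block2": 1, "block3": 2})  # one count differs

print(id_a == id_b)  # True
print(id_a == id_c)  # False
```

Comparing two fixed-length identifiers replaces a block-by-block comparison of the execution times, which is the reduction in processing described above.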

S405, matching the target scene type with the scene type existing in the scene data storage area.

As can be seen from the foregoing, the specific implementation of step S405 may be: the target type identification is matched with the type identification of the scene type existing in the scene data storage area. If the target type identifier is matched with any type identifier in the type identifiers of the existing scene types, determining that the target scene type is matched with any scene type in the existing scene types; if the target type identifier does not match the type identifier of the existing scene type, it may be determined that the target scene type does not match the existing scene type.

It is to be understood that by constantly clustering access requests in the request data storage area, the clustering result of each access request may be stored in one storage area, which may be referred to as a scene data storage area, for example. The scene data storage area may be configured to store data of scene types associated with the access request, for example, a request set corresponding to each of one or more scene types may be stored, where one request set includes one or more request identifications.

In one implementation, the scene data storage area may be stored in a scene type information storage structure when stored, for example, the scene type information storage structure may be as shown in table 3 below:

TABLE 3

The key information stored in the scene data storage area may include a scene type identifier (which may be referred to as a scene type ID, type identifier, type ID, etc.), the code coverage performance under the scene type, and the request identifier list under the scene type. The scene type ID may be understood as a primary key, i.e., the scene type of an access request is stored with the scene type ID as the primary key. The code coverage performance is what is referred to above as the target code coverage data. For example, the target code coverage data may include four core contents: first, the code file name of the tested system; second, the detailed start-stop coordinate information of all logical code blocks of the tested system in the target code file; third, the number of lines contained in each logical code block of the tested system; and fourth, the difference in the execution counts of all logical code blocks of the tested system before and after a request of the scene type is executed, namely the target code execution times of the logical code blocks. The request identifier list refers to a list composed of the request identifiers of the access requests belonging to the scene type, and may be understood as the request set described above.

The storage form corresponding to the scene data storage area may include any one of a file, a database and a cache. The representation of the code overlay representation after the execution of a single access request at the time of storage is not limited, and may be, for example, a dictionary data structure, a list data structure, or the like. For example, if the storage condition of the scene data storage area is represented in a list form, the storage of the scene data storage area may be as shown in table 4 below:

TABLE 4

Scene type ID	Code coverage performance	Request identification list
Type ID1	Manifestation 1	{request id 1, request id 4, request id 6}
Type ID2	Manifestation 2	{request id 2, request id 5}
…	…	…

Wherein a row in table 4 represents a record, each record being a scene type identifier for indicating a scene type, a code coverage representation under the scene type, and a list of request identifiers attributed to the scene type.

S406, if the target scene type is matched with any scene type in the existing scene types, taking the request set corresponding to the matched scene type as a target request set, and adding a target request identifier of the nth access request into the target request set, wherein the target request identifier is used for indicating the nth access request.

In one implementation, as previously described, if the target type identifier matches any of the type identifiers of the existing scene types, then it may be determined that the target scene type matches any of the existing scene types. If the target scene type is matched with any scene type in the existing scene types, it may be determined that the target scene type and the matched scene type are the same scene type, then the access request in the request set corresponding to the nth access request and the matched scene type may be classified as an access request under one scene type, that is, the request set corresponding to the matched scene type may be used as a target request set, and the target request identifier of the nth access request may be added to the target request set. Wherein the target request identifier may be used to indicate an nth access request, and similarly, a request identifier may be used to indicate an access request.

For example, existing scene types are type 1, type 3; if the target scene type is type 3, the request set corresponding to type 3 can be used as the target request set, and the target request identifier of the nth access request can be added into the target request set. For example, the request set corresponding to type 3 (i.e., the target request set) may be represented as {request id 1, request id 2, request id 3}, where request id 1, request id 2 and request id 3 represent the request identifiers corresponding to 3 access requests, respectively. In summary, the target request identifier corresponding to the nth access request (for example, denoted as request id n) may be added to the target request set, and the target request set after the addition is updated to {request id 1, request id 2, request id 3, request id n}.

S407, if the target scene type is not matched with the existing scene type, a request set is newly built, the newly built request set is used as a target request set, and a target request identifier of the nth access request is added into the target request set.

In one implementation, as previously described, if the target type identifier does not match the type identifier of any existing scene type, it may be determined that the target scene type does not match the existing scene types. In that case, the target scene type may be determined to be a new scene type, a request set may be newly created, the newly created request set may be used as the target request set corresponding to the nth access request, and the scene type corresponding to the newly created request set (i.e., the target request set) is the target scene type. Further, the target request identifier of the nth access request may be added to the target request set.

For example, existing scene types are type 1, type 2, type 3; if the target scene type is type 4, a request set for scene type 4 can be created (the newly created request set is empty at this time), and this request set is the target request set of the nth access request. Further, the target request identifier corresponding to the nth access request (for example, denoted as request identifier n) may be added to the target request set, and the target request set after the addition is updated to {request identifier n}.
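Steps S405 to S407 can be sketched against a scene data storage area shaped like Table 4. The in-memory dictionary, the `SceneRecord` class and the function `assign_to_scene` are hypothetical names chosen for this illustration; the type identifiers are shortened placeholders rather than real hash values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SceneRecord:
    """One row of the scene data storage area, keyed by the scene type ID."""
    coverage: Dict[str, int]  # target code execution times for the scene type
    request_ids: List[str] = field(default_factory=list)  # the request set


def assign_to_scene(
    scene_store: Dict[str, SceneRecord],
    type_id: str,
    coverage: Dict[str, int],
    req_id: str,
) -> bool:
    """Add req_id to the request set of type_id; return True if the scene type is new."""
    record = scene_store.get(type_id)  # S405: match against existing type IDs
    is_new = record is None
    if is_new:  # S407: no match, create an empty request set for the new type
        record = SceneRecord(coverage=dict(coverage))
        scene_store[type_id] = record
    record.request_ids.append(req_id)  # S406 / S407: add the target request ID
    return is_new


scene_store: Dict[str, SceneRecord] = {}
created_first = assign_to_scene(scene_store, "type-3", {"block1": 1}, "request-1")
created_second = assign_to_scene(scene_store, "type-3", {"block1": 1}, "request-n")
created_third = assign_to_scene(scene_store, "type-4", {"block1": 2}, "request-m")
print(scene_store["type-3"].request_ids)  # ['request-1', 'request-n']
```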

It should be noted that, when the availability of a new version of a system is verified by recording online access requests, developers often need to cover all online access scenes; in this case, it may be necessary to store the data corresponding to all online access requests and to play back all of these online access requests to the new version of the tested system. Storing all online access requests may result in a relatively high storage cost, and playing back all online access requests may take a long time, which may block the overall development and verification process or slow down verification.

In the embodiment of the application, after the clustering operation of all the access requests for the scene types is completed, only the data corresponding to part of the access requests under each scene type can be selected for storage, for example, one or more data corresponding to the access requests under each scene type can be selected for storage; therefore, the data storage amount can be effectively reduced, and the storage cost is reduced. In this case, the storage manner of the access request may be specifically as follows:

First, a reference request identifier may be selected from each request set of the one or more request sets obtained by clustering the N access requests, where, for one request set, the selected reference request identifiers may be a part of all request identifiers in the request set; that is, one or more reference request identifiers may be selected, and the specific number is not limited. It should be noted that, in the case that more than one is selected, the number selected may be smaller than the total number of request identifiers in the request set. If there is only one request identifier in a request set, the selected reference request identifier is that request identifier. It is to be appreciated that a request set corresponds to a scene type, one request set includes one or more request identifiers, and one request identifier is used to indicate one access request.

Then, the access request corresponding to the reference request identifier for each scene type may be obtained from the request data storage area, where N access requests and corresponding request identifiers are stored as described above. Finally, the reference request identification under each scene type may be stored in association with the corresponding access request in the target storage area. Compared with storing all access requests, in the embodiment of the application, only partial access requests under each scene type can be selected for storage, so that the data storage amount can be effectively reduced while the coverage of the whole scene types is ensured, and the storage cost is reduced.
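The selection and storage procedure above might look like the following sketch. Which identifiers are chosen as reference identifiers is not specified in the application; taking the first `per_scene` identifiers of each request set is an assumption of this example, as are the function name `build_target_storage` and the sample data.

```python
from typing import Dict, List


def build_target_storage(
    request_sets: Dict[str, List[str]],
    request_store: Dict[str, str],
    per_scene: int = 1,
) -> Dict[str, Dict[str, str]]:
    """For each scene type, keep only the reference request IDs and their requests."""
    keep = max(1, per_scene)  # always keep at least one request per scene type
    target: Dict[str, Dict[str, str]] = {}
    for type_id, ids in request_sets.items():
        reference_ids = ids[:keep]  # assumption: the first IDs serve as references
        target[type_id] = {rid: request_store[rid] for rid in reference_ids}
    return target


request_store = {"id1": "text 1", "id2": "text 2", "id4": "text 4", "id5": "text 5"}
request_sets = {"type1": ["id1", "id4"], "type2": ["id2", "id5"]}

target_storage = build_target_storage(request_sets, request_store, per_scene=1)
print(target_storage)  # {'type1': {'id1': 'text 1'}, 'type2': {'id2': 'text 2'}}
```

In this example four stored access requests shrink to two while both scene types remain represented.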

In one implementation, a portion of the access requests in the set of target requests may be used to test a new version of the system under test; it is known that after the clustering of the N access requests is completed, one or more request sets generated by clustering the N access requests may be obtained, and then a portion of the access requests in each request set of the one or more request sets may be selected for testing a new version of the tested system. Optionally, when detecting that a test requirement for a new version of the tested system exists, the access request corresponding to each scene type may be obtained from the target storage area to perform a test, for example, a part of the access requests may be selected from each request set of one or more request sets obtained by clustering N access requests to perform a test. It can be seen that through clustering operation, the online access requests with inconsistent code execution coverage performance can be classified into different scene types, the condition of missing scene types can be effectively avoided, and full scene coverage is realized. And in the subsequent system test of a new version by using the clustering result, the representative online access request can be selected from all online access requests for playback verification, and the selected online access request can be ensured to accurately cover key code logic of the tested system, so that the reliability and scene coverage rate of the system in the validity verification are improved.

Optionally, when selecting a part of the access requests from each request set of the one or more request sets for testing, one access request may be selected from each request set; that is, only one access request under each scene type needs to be sent to the new version of the tested system, which still covers all online access scenes as far as possible while effectively shortening the playback time. As described above, when testing a new version of the tested system, the required data related to the access requests (i.e., the access requests and the corresponding request identifiers) may be stored in the target storage area in advance; accordingly, when only one access request of each scene type is used for testing, the target storage area only needs to store the data related to one access request under each scene type, so that the storage cost of the access requests can be greatly reduced.

Optionally, when selecting a part of the access requests from each of the one or more request sets to test, one or more access requests from each request set may be selected to test. The number of reference request identifiers to be selected in each request set can be determined based on the importance degree of the scene type corresponding to each request set. In one embodiment, the number of reference request identifiers to be selected in a request set may be positively correlated with the importance of the scene type corresponding to the request set; for example, if the importance of the scene type corresponding to a certain request set is higher, the number of reference request identifiers to be selected from the request set is larger, and if the importance of the scene type corresponding to a certain request set is lower, the number of reference request identifiers to be selected from the request set is smaller.

The importance degree of the scene type may refer to the importance degree of the service scene corresponding to the scene type. Alternatively, the importance level may be preset, for example, for a system, the system may include a plurality of service scenarios, and when a developer develops the system, the importance level of each service scenario under the system may be preset. Optionally, the importance degree of the scene type may be determined based on the number of request identifications in the request set under each scene type; for example, the importance of a scene type may be positively correlated with the number of request identifiers in a request set under the scene type, i.e., the greater the number of request identifiers in a request set under a certain scene type, the higher the importance of the scene type, and the lesser the number of request identifiers in a request set under a certain scene type, the lower the importance of the scene type.

In summary, more access requests may be used to test service scenes of higher importance, which effectively enhances the reliability of the test, while fewer access requests may be used for service scenes of lower importance. In this way, every service scene is still tested while the cost is kept as low as possible.
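The selection policy above can be sketched as follows. This is only an illustration under stated assumptions: importance is derived from the size of each request set relative to the largest set, and the function name `pick_representatives`, the cap `max_per_scene`, and the proportional rule are hypothetical; the application itself only requires a positive correlation between importance and the number of selected request identifiers.

```python
import math

def pick_representatives(request_sets, max_per_scene=3):
    """Pick reference request IDs from each request set.

    request_sets: dict mapping scene type ID -> non-empty list of request IDs.
    The quota per scene grows with the set's size relative to the largest
    set (an assumed importance measure) and is always at least 1.
    """
    if not request_sets:
        return {}
    largest = max(len(ids) for ids in request_sets.values())
    chosen = {}
    for scene_id, ids in request_sets.items():
        importance = len(ids) / largest              # assumed measure, in (0, 1]
        quota = max(1, math.ceil(importance * max_per_scene))
        chosen[scene_id] = ids[:quota]               # first `quota` IDs act as representatives
    return chosen

sets = {"scene-a": ["r1", "r2", "r3", "r4"], "scene-b": ["r5"]}
picked = pick_representatives(sets)
```

With a preset importance table instead of set sizes, only the line computing `importance` would change.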

As described above, the data required for testing the new version of the tested system (i.e., the access requests and their corresponding request identifiers) may be stored in the target storage area in advance. When one or more access requests per scene type are used for testing, the target storage area may accordingly store the data of one or more access requests per scene type. In addition, when a plurality of access requests are stored under one scene type and an abnormal condition occurs while testing with one of them (for example, the return data corresponding to the access request cannot be acquired, or the access request is corrupted), another stored access request of the same scene type can be fetched from the storage and used to test again. This avoids the situation in which an abnormal condition occurs during a test and no other access request is available for retesting.
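The fallback behaviour described above might look like the following sketch. The names `replay_with_fallback` and `fake_replay` are hypothetical, and treating any raised exception as an "abnormal condition" is an assumption made for illustration.

```python
def replay_with_fallback(candidates, replay):
    """Try the stored requests of one scene type in order until one replays cleanly.

    candidates: list of access requests stored for a single scene type.
    replay: callable that sends one request to the new version and returns
            the response, or raises an exception on an abnormal condition.
    Returns (request, response) for the first success, or None if all fail.
    """
    for request in candidates:
        try:
            return request, replay(request)
        except Exception:
            continue  # abnormal condition: fall back to the next stored request
    return None

def fake_replay(request):
    # Hypothetical stand-in for sending a request to the system under test.
    if request == "corrupted":
        raise ValueError("request could not be parsed")
    return "ok:" + request

result = replay_with_fallback(["corrupted", "GET /cart"], fake_replay)
```

If only one request were stored per scene type, the first failure would leave that scene untested, which is the situation the paragraph above is meant to avoid.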

For a better understanding of the data clustering method according to the embodiment of the present application, the method is further described below with reference to fig. 5c, taking a clustering service in a computer device as the execution subject by way of example. As shown in fig. 5c, the data clustering method may include the following steps S31 to S38:

S31, after receiving the signals indicating that the online access request recording flow and the tested system instrumentation flow of the preparation stage have been completed, the clustering service may start the clustering operation.

The online access request recording flow and the tested system instrumentation flow of the preparation stage may be as shown in fig. 5a and fig. 5b, respectively.

S32, the clustering service acquires a piece of request data from the online access request storage (namely the request data storage area); the request data may contain a request ID and the text of the original online access request (namely the access request).

The rule for acquiring a piece of request data is not limited. Optionally, a piece of request data may be obtained randomly from the request data storage area, or obtained according to a target order. To obtain request data according to a target order, the request data contained in the request data storage area may be sorted in advance to obtain a sorting result, and the target order is the order indicated by the sorting result; that is, the pieces of request data are obtained and clustered one after another in that order. Compared with randomly acquiring a piece of request data for each clustering operation, acquiring request data according to the target order makes the clustering more orderly and avoids having to judge, each time a piece of request data is randomly selected, whether it has already been clustered, so the clustering efficiency can be effectively improved.

S33, the clustering service plays back the original online access request acquired in S32 to the instrumented tested system.

S34, the clustering service acquires the code coverage performance of the tested system after the original online access request is played back.

The code coverage performance may be understood as the number of code executions of each logical code block in the code coverage data.

In one implementation, the clustering service may obtain the current code coverage performance through the interface created in the system under test instrumentation flow S31 in the preparation stage after receiving the return data of the system under test. The returned data may refer to data after the tested system responds to the original online access request, that is, after the clustering service receives the returned data, it may indicate that the tested system completes execution of the code corresponding to the original online access request, and at this time, the obtained code coverage performance is also reliable.

S35, the clustering service calculates the difference between the current code coverage performance and the code coverage performance obtained after the previous original online access request was played back, and takes the difference as the code coverage performance of the current original online access request.

In one implementation, for any logical code block, the number of times the logical code block has been executed after the current request (i.e., the current original online access request, which may be the nth access request described above) is played back, minus the number of times the logical code block had been executed after the previous request (i.e., the previous original online access request, which may be the (n-1)th access request described above) was played back, may be used as the target code execution times of the current request for that logical code block. Accordingly, the code coverage performance of the original online access request is the target code execution times of each logical code block.
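A worked sketch of this subtraction, assuming the cumulative counters are available as dictionaries keyed by a logical code block identifier (the name `coverage_delta` and the block names are hypothetical):

```python
def coverage_delta(current, previous):
    """Per-block execution counts attributable to the latest replayed request.

    current / previous: dicts mapping logical code block ID -> cumulative
    execution count read after the nth and (n-1)th replay respectively.
    Blocks absent from `previous` are treated as having count 0.
    """
    return {block: count - previous.get(block, 0) for block, count in current.items()}

after_prev = {"block_1": 2, "block_2": 0, "block_3": 5}   # after the (n-1)th request
after_curr = {"block_1": 3, "block_2": 0, "block_3": 7}   # after the nth request
delta = coverage_delta(after_curr, after_prev)
```

Here `delta` is `{"block_1": 1, "block_2": 0, "block_3": 2}`. For the first request (n = 1) an empty dictionary can be passed as `previous`, so the delta equals the cumulative counts themselves.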

S36, the clustering service judges whether the code coverage performance of the current request corresponds to a new scene type.

In one implementation, the clustering service may take the code coverage performance of the current request (i.e., the target code coverage data) as input and perform hash calculation on it with a hash algorithm to obtain the corresponding hash value. The hash value may be referred to as a scene type identifier (or a type identifier, scene type ID, etc.) and may be used to indicate a scene type.
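As an illustration of deriving a scene type identifier, the sketch below serialises the per-block counts in a canonical order and hashes the result. The choice of SHA-256 and JSON serialisation is an assumption; the application does not name a specific hash algorithm, and `scene_type_id` is a hypothetical helper.

```python
import hashlib
import json

def scene_type_id(target_coverage):
    """Hash the per-block target execution counts into a scene type identifier.

    Keys are sorted before hashing so that two requests with identical
    counts always produce the same identifier, regardless of dict order.
    """
    canonical = json.dumps(target_coverage, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

id_a = scene_type_id({"block_1": 1, "block_2": 0})
id_b = scene_type_id({"block_2": 0, "block_1": 1})   # same counts, different order
id_c = scene_type_id({"block_1": 0, "block_2": 1})   # different counts
```

Sorting the keys matters: without a canonical order, the same coverage could hash to different identifiers and be split across scene types.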

After obtaining the scene type identifier of the current request, the clustering service may determine whether a record of that scene type identifier exists in the scene type information storage (i.e., the scene data storage area described above). If a record exists, the scene type of the current request may be determined to be an existing scene type; if no record exists, it may be determined to be a new scene type. Optionally, the judging process may match the scene type identifier of the current request against the scene type identifiers in the scene type information storage one by one: if any scene type identifier matches, a record exists; if none of them matches, no record exists.

S37, the clustering service stores the current request into the scene type information storage according to its scene type.

In one implementation, if the clustering service determines that the code coverage performance of the current request corresponds to a new scene type, the clustering service may add scene information for that scene type to the scene type information storage. When the scene information is added, the primary key of the stored record may be the scene type identifier, the code coverage performance of the current request may be stored as the data, and the request identifier of the current request may be stored in a request identifier list (namely the request set) under the scene type identifier. If the clustering service determines that the code coverage performance of the current request corresponds to an existing scene type, the clustering service may look up the matching scene type identifier in the scene type information storage and add the request identifier of the current request to the request identifier list under that scene type identifier.
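The two branches of S37 can be sketched with an in-memory dictionary standing in for the scene type information storage. The record layout (`coverage`, `request_ids`) and the function name `store_request` are assumptions made for illustration; a real deployment would likely use a database keyed by the scene type identifier.

```python
def store_request(scene_store, scene_id, coverage, request_id):
    """Record a clustered request; return True if a new scene type was created.

    scene_store: dict keyed by scene type identifier (the primary key).
    Each record keeps the code coverage performance of the scene type and
    the list of request identifiers that belong to it.
    """
    record = scene_store.get(scene_id)
    if record is None:
        # New scene type: create the record with the first request identifier.
        scene_store[scene_id] = {"coverage": coverage, "request_ids": [request_id]}
        return True
    # Existing scene type: append to its request identifier list.
    record["request_ids"].append(request_id)
    return False

store = {}
first = store_request(store, "scene-1", {"block_1": 1}, "req-a")
second = store_request(store, "scene-1", {"block_1": 1}, "req-b")
```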

S38, after the clustering of a piece of request data is completed, the clustering service may go on to judge whether the online access request storage (namely the request data storage area) still contains original online access requests that have not undergone the clustering operation. If so, the process may jump back to S32 to cluster one of the remaining non-clustered original online access requests. If all original online access requests have completed the clustering operation, the clustering stage is complete.

The specific manner in which the clustering service determines whether any original online access requests remain unclustered is not limited; for example, an additional tag field in the storage of the original online access request data may be used to indicate whether clustering has been completed. In a specific implementation, after each original online access request completes the clustering operation, a tag field may be added to that original online access request to indicate that it has completed the clustering operation. Then, after the clustering of a piece of request data is completed, one original online access request may be selected from the untagged original online access requests to continue the clustering operation.

It should be noted that, if the clustering service obtains the original online access requests in the above-mentioned target order for clustering, no additional tag field is needed to indicate whether clustering has been completed; after the clustering operation of each original online access request is completed, the next original online access request is simply obtained according to the target order.

When the clustering stage is completed, the scene type information storage holds the information of the N original online access requests grouped by scene type, and each record in the scene type information storage comprises a scene type identifier indicating the scene type, the code coverage performance under that scene type, and the list of request identifiers belonging to that scene type.
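Steps S32 to S38 can be put together in a small end-to-end sketch. Everything here is illustrative: `FakeSystemUnderTest` is a hypothetical stand-in for an instrumented system exposing cumulative counters, requests are replayed strictly one at a time in a fixed order, and SHA-256 over sorted JSON is an assumed hash.

```python
import hashlib
import json

class FakeSystemUnderTest:
    """Hypothetical stand-in for an instrumented system with cumulative counters."""
    def __init__(self):
        self.counters = {"login": 0, "search": 0}

    def replay(self, request):
        # Pretend each request path exercises exactly one logical code block.
        self.counters[request["path"]] += 1

    def read_coverage(self):
        return dict(self.counters)  # snapshot of the cumulative counts

def cluster_requests(requests, system):
    scene_store = {}
    previous = system.read_coverage()
    for request in requests:                        # S32: fetch in a fixed order
        system.replay(request)                      # S33: replay the request
        current = system.read_coverage()            # S34: cumulative coverage
        delta = {b: c - previous.get(b, 0) for b, c in current.items()}  # S35
        previous = current
        scene_id = hashlib.sha256(
            json.dumps(delta, sort_keys=True).encode("utf-8")
        ).hexdigest()                               # S36: scene type identifier
        record = scene_store.setdefault(
            scene_id, {"coverage": delta, "request_ids": []}
        )
        record["request_ids"].append(request["id"])  # S37: store under the scene type
    return scene_store                              # S38: done when no requests remain

requests = [
    {"id": "r1", "path": "login"},
    {"id": "r2", "path": "search"},
    {"id": "r3", "path": "login"},
]
store = cluster_requests(requests, FakeSystemUnderTest())
groups = sorted(rec["request_ids"] for rec in store.values())
```

In this toy run `r1` and `r3` exercise the same block and land in one request set, while `r2` forms its own. The subtraction in S35 is only meaningful if no other traffic reaches the system between two counter reads, which is why the sketch replays sequentially.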

In the embodiment of the application, the access requests of the system can be recorded and stored, and all recorded access requests are clustered on the basis of their code coverage performance, so that the access requests are grouped according to differences in code coverage performance, which can effectively improve the clustering effect and the clustering accuracy. The clustering result can further be applied to verifying the validity of a new version of the system: one or more access requests can be selected from each scene type as representative access requests, and the representative access requests of all scene types are played back one by one to the system in the test environment (namely the new version of the tested system). In this way, playback verification of the new version of the tested system is performed with the representative access requests, all online access scenes are covered as far as possible during playback verification, and the test reliability is further improved.

Fig. 6 is a schematic structural diagram of a data clustering device according to an embodiment of the present application. The data clustering device described in this embodiment includes:

A first obtaining unit 601, configured to obtain an nth access request to be clustered, and send the nth access request to a tested system; an instrumentation file for obtaining code coverage data exists in a target code file of the tested system, the code coverage data being code execution state data generated when the tested system plays back the nth access request, where n ∈ [1, N], N is the total number of access requests to be clustered, and n and N are positive integers;

A second obtaining unit 602, configured to obtain code coverage data of the nth access request played back by the tested system using the instrumentation file;

A determining unit 603, configured to obtain code coverage data of the (n-1)th access request played back by the tested system, and determine target code coverage data of the nth access request based on the code coverage data of the (n-1)th access request and the code coverage data of the nth access request;

A clustering unit 604, configured to cluster the nth access request based on the target code coverage data, to obtain a target request set corresponding to the nth access request; a part of the access requests in the target request set is used for testing the new version of the tested system.

In one implementation, the code coverage data of the nth access request includes: the code execution times of each logic code block in the target code file after the tested system plays back the nth access request, wherein the code execution times of one logic code block is the sum of the code executions of that logic code block generated by the tested system playing back n access requests, the n access requests comprising the 1st access request through the nth access request; the determining unit 603 is specifically configured to:

For any one of the logic code blocks, taking the difference value between the code execution times of the any one logic code block under the playback of the nth access request and the code execution times of the any one logic code block under the playback of the (n-1) th access request as the target code execution times of the nth access request under the any one logic code block;

and taking the target code execution times of the nth access request under each logic code block as target code coverage data.

In one implementation, the clustering unit 604 is specifically configured to:

Determining a target scene type of the nth access request based on the target code coverage data, and matching the target scene type with the scene type existing in a scene data storage area;

if the target scene type is matched with any scene type in the existing scene types, taking a request set corresponding to the matched scene type as a target request set, and adding a target request identifier of the nth access request into the target request set; the target request identifier is used for indicating the nth access request;

If the target scene type is not matched with the existing scene type, a request set is newly established, the newly established request set is used as a target request set, and the target request identification of the nth access request is added into the target request set.

In one implementation, the clustering unit 604 is specifically configured to:

Carrying out hash calculation on the target code coverage data of the nth access request to obtain a hash value corresponding to the target code coverage data, and taking the hash value corresponding to the target code coverage data as a target type identifier of a target scene type of the nth access request;

Matching the target type identifier with the type identifier of the existing scene type in the scene data storage area;

if the target type identifier is matched with any one of the type identifiers of the existing scene types, determining that the target scene type is matched with any one of the existing scene types;

and if the target type identifier is not matched with the type identifier of the existing scene type, determining that the target scene type is not matched with the existing scene type.

In one implementation, the apparatus further includes a test unit 605, specifically configured to:

Selecting a reference request identifier from each request set of one or more request sets obtained by clustering N access requests; one request set corresponds to one scene type, one request set comprises one or more request identifiers, and one request identifier is used for indicating one access request;

Obtaining access requests corresponding to reference request identifiers under each scene type from a request data storage area, wherein the request data storage area stores the N access requests and the corresponding request identifiers;

storing the reference request identification under each scene type and the corresponding access request in a target storage area in an associated manner;

And when detecting that the test requirement of the tested system aiming at the new version exists, acquiring an access request corresponding to each scene type from the target storage area for testing.

In one implementation, the first obtaining unit 601 is further configured to:

recording a plurality of access requests for the tested system;

Carrying out hash calculation on each access request in the plurality of access requests to obtain a hash value of each access request, wherein the hash value of each access request is used as a corresponding request identifier;

Storing each access request and the corresponding request identifier in a request data storage area in an associated manner;

And acquiring an nth access request to be clustered from the request data storage area.
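The recording steps above can be sketched as follows, assuming each recorded access request is available as text and that SHA-256 is used as the hash (the application does not fix a particular algorithm). The name `record_requests` and the in-memory dictionary standing in for the request data storage area are hypothetical.

```python
import hashlib

def record_requests(raw_requests):
    """Store each recorded request under a hash-derived request identifier.

    Returns a dict mapping request identifier -> request text, which stands
    in for the request data storage area.
    """
    storage = {}
    for raw in raw_requests:
        request_id = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        storage[request_id] = raw   # identical request texts collapse to one entry
    return storage

recorded = record_requests([
    "GET /items?id=1",
    "POST /cart {\"item\": 1}",
    "GET /items?id=1",              # exact duplicate of the first request
])
```

A side effect of content-derived identifiers in this sketch is that byte-identical requests share one identifier and are stored once.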

In one implementation, the first obtaining unit 601 is further configured to:

performing instrumentation processing on an original code file corresponding to the tested system to obtain a target code file, wherein the target code file comprises an instrumentation file;

compiling the target code file to obtain an executable file;

And starting the tested system based on the executable file so that the started tested system is used for playing back the access request.

It will be appreciated that the division of the units in the embodiment of the present application is illustrative, and is merely a logic function division, and other division manners may be actually implemented. The functional units in the embodiment of the application can be integrated in one processing unit, or each unit can exist alone physically, or two or more units are integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device described in the present embodiment includes: a processor 701, a memory 702 and a network interface 703. The processor 701, the memory 702 and the network interface 703 may exchange data with one another.

The processor 701 may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 702 may include read only memory and random access memory and provides program instructions and data to the processor 701. A portion of the memory 702 may also include non-volatile random access memory. Wherein the processor 701, when calling the program instructions, is configured to execute:

In one implementation, the code coverage data of the nth access request includes: the code execution times of each logic code block in the target code file after the tested system plays back the nth access request, wherein the code execution times of one logic code block is the sum of the code executions of that logic code block generated by the tested system playing back n access requests, the n access requests comprising the 1st access request through the nth access request; the processor 701 is specifically configured to:

In one implementation, the processor 701 is specifically configured to:

In one implementation, the processor 701 is further configured to:

recording a plurality of access requests for the tested system;

In one implementation, the processor 701 is further configured to:

compiling the target code file to obtain an executable file;

The embodiment of the application also provides a computer storage medium storing program instructions which, when executed, may carry out part or all of the steps of the data clustering method in the embodiment corresponding to fig. 2 or fig. 4.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of action described, as some steps may be performed in other order or simultaneously according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.

Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer readable storage medium, and the storage medium may include: a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or the like.

Embodiments of the present application also provide a computer program product or computer program comprising program instructions stored in a computer readable storage medium. The program instructions are read from the computer-readable storage medium by a processor of the computer device, and executed by the processor, cause the computer device to perform the steps performed in the embodiments of the methods described above.

The data clustering method, apparatus, computer device and storage medium provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the foregoing embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application in accordance with the ideas of the present application. In view of the above, the contents of this description should not be construed as limiting the present application.

Claims

1. A method of clustering data, the method comprising:

2. The method of claim 1, the code coverage data of the nth access request comprising: the code execution times of each logic code block in the target code file after the tested system plays back the nth access request, wherein the code execution times of one logic code block is the sum of the code executions of that logic code block generated by the tested system playing back n access requests, the n access requests comprising the 1st access request through the nth access request;

The determining the target code coverage data of the nth access request based on the code coverage data of the (n-1)th access request and the code coverage data of the nth access request includes:

3. The method of claim 1, wherein the clustering the nth access request based on the target code coverage data to obtain the target request set corresponding to the nth access request includes:

4. A method according to claim 3, said determining a target scene type of the nth access request based on the target code coverage data and matching the target scene type with existing scene types in a scene data storage area, comprising:

5. The method as recited in claim 1, further comprising:

6. The method as recited in claim 1, further comprising:

recording a plurality of access requests for the tested system;

the obtaining the nth access request to be clustered includes:

7. The method as recited in claim 1, further comprising:

compiling the target code file to obtain an executable file;

8. A data clustering device, comprising:

9. A computer device comprising a processor and a memory, wherein the memory is for storing a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-7.

10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein program instructions, which when executed, are adapted to carry out the method according to any of claims 1-7.