CN118194292A - Test data generation method and device of machine learning framework and electronic equipment

Publication number
CN118194292A
Authority
CN
China
Prior art keywords
data
test
test data
machine learning
template
Prior art date
Legal status
Pending
Application number
CN202410302818.2A
Other languages
Chinese (zh)
Inventor
邹权臣
刘昭
于恬
王旋
张德岳
于泓凯
杨东东
韩东
徐昌凯
Current Assignee
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN202410302818.2A
Publication of CN118194292A

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the application disclose a method and an apparatus for generating test data for a machine learning framework, and an electronic device, relating to the technical field of testing. The method comprises: acquiring basic data related to testing of a machine learning framework to be tested; generating, according to the data structure attributes of an operator in the machine learning framework to be tested and the basic data, a prompt template conforming to at least one data structure attribute; and inputting the prompt template into a semantic model to generate first test data. Acquiring the basic data provides a varied pool of source data for test data generation. The prompt template can yield test data that constrains a single attribute as well as test data of a multi-attribute composite type, which helps ensure the validity of the test data and improves its diversity. Because the first test data is generated by inputting the prompt template into the semantic model, test data can be produced automatically and efficiently from the prompt template, in contrast to manually generated test data.

Description

Test data generation method and device of machine learning framework and electronic equipment
Technical Field
The present application relates to the field of testing technologies, and in particular, to a method and an apparatus for generating test data of a machine learning framework, and an electronic device.
Background
Currently, to test the security of a machine learning framework, test data is used during development of the framework to find its security vulnerabilities.
Current test data generation methods for machine learning frameworks have two problems. First, the generated test data is random and does not satisfy the structural requirements that the machine learning framework places on its inputs, so testing stops at the validity-check stage and never reaches the deeper code where security detection is needed. Second, the diversity of the test data is low and depends on the subjective experience of testers; when a new test project is encountered, the generated test data struggles to meet the test requirements in both diversity and validity, which reduces testing efficiency.
Disclosure of Invention
The embodiments of the application provide a method, an apparatus, a device, and a storage medium for generating test data for a machine learning framework, which address the problem in the related art that generated test data struggles to meet test requirements in diversity and validity. The technical solutions provided by the embodiments of the application are as follows:
In a first aspect, an embodiment of the present application provides a method for generating test data of a machine learning framework, including:
acquiring basic data related to testing of a machine learning framework to be tested;
generating a prompt template conforming to at least one data structure attribute according to the data structure attribute of an operator in the machine learning framework to be tested and the basic data;
and inputting the prompt template into a semantic model to generate first test data.
In an alternative of the first aspect, the basic data is at least one of: historical vulnerability data of the machine learning framework to be tested; each data type in the machine learning framework to be tested and the boundary values corresponding to each data type; and data of a preset type corresponding to the machine learning framework to be tested.
In an optional aspect of the first aspect, after the obtaining basic data related to the test of the machine learning framework to be tested, the method further includes:
and processing the basic data into basic data vectors, and constructing a basic database based on the basic data vectors.
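By way of illustration only, the vectorization step might be sketched as follows. The application does not specify how basic data is converted into vectors, so the sparse term-frequency vectors, the `BasicDatabase` class, the record fields, and the cosine-similarity lookup below are all hypothetical choices made for this sketch and are not the claimed implementation.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Sparse term-frequency vector: token -> count (an assumed encoding)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class BasicDatabase:
    """Hypothetical in-memory store of (vector, record) pairs."""

    def __init__(self) -> None:
        self._rows = []

    def add(self, record: dict) -> None:
        # Each record is vectorized from its textual description.
        self._rows.append((vectorize(record["description"]), record))

    def search(self, query: str, top_k: int = 3) -> list:
        """Return the top_k records most similar to the query text."""
        q = vectorize(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [record for _, record in ranked[:top_k]]


db = BasicDatabase()
db.add({"description": "negative shape dimension triggers heap overflow", "value": -1})
db.add({"description": "empty array triggers null pointer dereference", "value": []})
db.add({"description": "int32 boundary value minimum", "value": -2**31})
```

A call such as `db.search("shape dimension", top_k=1)` then retrieves the record whose description shares the most terms with the query, which is the kind of lookup a later traversal of the basic database would rely on.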
In an alternative of the first aspect, the data structure attributes of the operator include a type, a shape, and a value;
wherein the types include a basic type and a composite type;
the shape includes the dimensions and the number of elements of the operator;
the value is the value of each element in the operator.
In an optional implementation manner of the first aspect, the generating, according to the data structure attribute of the operator in the machine learning framework to be tested and the basic data, a prompt template conforming to at least one data structure attribute includes:
generating a control field based on the data structure attributes of at least one of the operators;
traversing the basic database to obtain basic data related to the control field, and taking the traversal result as a data field;
and generating the prompt template by combining the control field and the data field.
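By way of illustration only, combining a control field and a data field into a prompt template might be sketched as follows. The wording of the prompt, the record layout with an `attribute` tag, and the use of a plain list in place of the basic database are assumptions of this sketch; the application does not prescribe any of them.

```python
def build_control_field(attributes: dict) -> str:
    """Express the selected data structure attributes as constraints."""
    lines = [f"- {name}: {requirement}" for name, requirement in attributes.items()]
    return "Generate one operator argument that satisfies:\n" + "\n".join(lines)


def build_data_field(basic_database: list, attributes: dict) -> str:
    """Traverse the database and keep records tagged with a constrained attribute."""
    related = [r for r in basic_database if r["attribute"] in attributes]
    return "Reference data:\n" + "\n".join(
        f"- {r['description']}: {r['value']!r}" for r in related
    )


def build_prompt_template(basic_database: list, attributes: dict) -> str:
    """Combine the control field and the data field into one prompt template."""
    control = build_control_field(attributes)
    data = build_data_field(basic_database, attributes)
    return control + "\n\n" + data


# Hypothetical basic data records; the "attribute" tag is an assumption.
BASIC_DATABASE = [
    {"attribute": "value", "description": "int32 minimum", "value": -2**31},
    {"attribute": "shape", "description": "empty shape", "value": [0]},
    {"attribute": "type", "description": "special resource type", "value": "DT_RESOURCE"},
]

prompt = build_prompt_template(
    BASIC_DATABASE, {"type": "int32", "value": "boundary value"}
)
```

Because only the `type` and `value` attributes are constrained here, the shape-related record is left out of the data field, while a request that also constrained `shape` would pull it in.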
In an optional implementation manner of the first aspect, the inputting the prompt template into the semantic model to generate first test data includes:
inputting the prompt template into the semantic model, determining the data structure attributes based on the control field of the prompt template, and determining basic data in the data field of the prompt template that conforms to at least one data structure attribute as the first test data.
In an optional implementation manner of the first aspect, the inputting the prompt template into the semantic model to generate first test data includes:
inputting the prompt template into the semantic model, determining the requirements of the data structure attributes based on the control field of the prompt template, determining basic data in the data field of the prompt template that conforms to at least one data structure attribute, modifying that basic data, and generating the first test data based on the modified data.
In an optional implementation manner of the first aspect, after generating the first test data, the method further includes:
And inputting the first test data into an operator test template for testing, and outputting a first test result.
In an optional implementation manner of the first aspect, after generating the first test data and before inputting the first test data into the operator test template for testing and outputting the first test result, the method further includes:
comparing the first test data with test data in a test database;
deleting the first test data if it duplicates test data already in the test database;
wherein the inputting the first test data into the operator test template for testing and outputting the first test result includes:
if the first test data does not duplicate any test data in the test database, inputting the first test data into the operator test template for testing, and outputting the first test result.
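By way of illustration only, the duplicate check against the test database might be sketched as follows. Representing the test database as a set of canonical JSON strings, and the helper names `canonical_key` and `filter_duplicate`, are assumptions of this sketch; the application does not state how test data is compared.

```python
import json
from typing import Optional


def canonical_key(test_data: dict) -> str:
    """Serialize with sorted keys so equal cases map to the same string."""
    return json.dumps(test_data, sort_keys=True)


def filter_duplicate(test_data: dict, test_database: set) -> Optional[dict]:
    """Return None (the case is dropped) if it already exists in the database."""
    if canonical_key(test_data) in test_database:
        return None
    return test_data


# A hypothetical test database holding one previously stored case.
test_database = {canonical_key({"dtype": "int32", "shape": [0], "value": []})}

repeated = filter_duplicate({"shape": [0], "value": [], "dtype": "int32"}, test_database)
fresh = filter_duplicate({"dtype": "int32", "shape": [1], "value": [-1]}, test_database)
```

Here `repeated` is `None` because key order does not affect the canonical form, while `fresh` is returned unchanged and would proceed to the operator test template.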
In an optional implementation manner of the first aspect, after outputting the first test result, the method further includes:
And if the first test result does not comprise error information, storing the first test data into a test database.
In an optional implementation manner of the first aspect, after inputting the first test data into the operator test template for testing and outputting the first test result, the method further includes:
if the first test result includes error information, correcting the error parameters corresponding to the error information, and outputting a correction result.
In an optional implementation manner of the first aspect, the correcting the error parameter corresponding to the error information, and outputting a correction result, includes:
Traversing a vector database of the machine learning framework to be tested to obtain a code segment corresponding to the error information;
and constructing a correction prompt template by combining the error information, the error parameters and the code segments, inputting the correction prompt template into the semantic model, correcting the error parameters through the semantic model, and outputting a correction result.
In an optional implementation manner of the first aspect, after outputting the correction result, the method further includes:
And combining the correction result with the first test data to generate second test data, inputting the second test data into an operator test template for testing, and outputting a second test result.
In an optional aspect of the first aspect, after combining the correction result and the first test data to generate the second test data and before inputting the second test data into the operator test template for testing and outputting the second test result, the method further includes:
comparing the second test data with test data in a test database;
deleting the second test data if it duplicates test data already in the test database;
wherein the inputting the second test data into the operator test template for testing and outputting the second test result includes:
if the second test data does not duplicate any test data in the test database, inputting the second test data into the operator test template for testing, and outputting the second test result.
In an optional implementation manner of the first aspect, the inputting the second test data into the operator test template for testing, after outputting the second test result, includes:
If the second test result does not include error information, the second test data are stored in a test database;
and if the second test result comprises error information, the step of correcting the error parameter corresponding to the error information and outputting a correction result is executed again until the second test result does not comprise error information.
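By way of illustration only, the test, correct, and retest cycle described in the preceding implementation manners might be sketched as follows. The callables `run_test` and `correct` stand in for the operator test template and for the semantic-model correction (including retrieval of the related code segment), and the `max_rounds` cap is a safeguard added in this sketch; none of these names come from the application, which repeats the correction until no error information remains.

```python
from typing import Callable, Optional


def generate_valid_case(
    first_test_data: dict,
    run_test: Callable[[dict], Optional[str]],   # returns error text, or None
    correct: Callable[[dict, str], dict],        # returns corrected parameters
    test_database: list,
    max_rounds: int = 5,                         # assumed cap, not in the source
) -> Optional[dict]:
    candidate = dict(first_test_data)
    for _ in range(max_rounds):
        if candidate in test_database:
            return None                          # duplicate: delete the case
        error = run_test(candidate)
        if error is None:
            test_database.append(candidate)      # no error: store the case
            return candidate
        # Combine the correction result with the previous test data.
        candidate = {**candidate, **correct(candidate, error)}
    return None


def fake_operator_test(case: dict) -> Optional[str]:
    """Stand-in operator that rejects a negative axis."""
    if case["axis"] < 0:
        return "InvalidArgument: axis must be non-negative, got %d" % case["axis"]
    return None


def fake_semantic_model_fix(case: dict, error: str) -> dict:
    """Stand-in for the semantic model: always proposes axis = 0."""
    return {"axis": 0}


stored = []
result = generate_valid_case(
    {"axis": -1, "dtype": "int32"}, fake_operator_test, fake_semantic_model_fix, stored
)
```

In this run the first round reports an error, the stand-in correction replaces the faulty parameter, and the second round passes, so the corrected case is stored in the test database.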
In a second aspect, an embodiment of the present application further provides a test data generating apparatus of a machine learning framework, including:
The basic data module is used for acquiring basic data related to the test of the machine learning framework to be tested;
The test data generation module is used for generating a prompt template conforming to at least one data structure attribute according to the data structure attribute of an operator in the machine learning framework to be tested;
The test data generation module is also used for inputting the prompt template into a semantic model to generate first test data.
In an alternative of the second aspect, the apparatus further includes a test module, the test data generating module inputs the first test data into an operator test template of the test module for testing, and the test module generates a first test result.
In an alternative of the second aspect, the apparatus further includes a data correction module, after the test data generating module inputs the first test data into an operator test template of the test module for testing, if the first test result generated by the test module includes error information, the data correction module corrects an error parameter corresponding to the error information, and outputs a correction result.
In an optional aspect of the second aspect, the data correction module corrects an error parameter corresponding to the error information, and outputs a correction result, including:
The data correction module traverses a vector database of the machine learning framework to be detected to obtain a code segment corresponding to the error information;
the data correction module combines the error information, the error parameters and the code segments to construct a correction prompt template, the correction prompt template is input into the semantic model, the data correction module corrects the error parameters through the semantic model, and a correction result is output.
In an optional aspect of the second aspect, the data correction module corrects an error parameter corresponding to the error information, and after outputting a correction result, the data correction module further includes:
the data correction module combines the correction result and the first test data to generate second test data, the second test data is input into an operator test template of the test module for testing, and the test module outputs a second test result.
In an optional aspect of the second aspect, after the data correction module inputs the second test data into the operator test template of the test module for testing and the test module outputs the second test result, the following is further included:
if the second test result does not include error information, the data correction module stores the second test data to a test database;
and if the second test result comprises error information, the data correction module executes the step of correcting the error parameter corresponding to the error information again and outputs a correction result until the second test result does not comprise error information.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, where the processor implements the method provided by the first aspect or any implementation manner of the first aspect of the embodiment of the present application when the processor executes the program.
In a fourth aspect, the present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method provided by the first aspect of the embodiments or any implementation of the first aspect.
The technical scheme provided by the embodiments of the application has the beneficial effects that at least:
According to the test data generation method for a machine learning framework provided by the embodiments of the application, acquiring basic data related to testing of the machine learning framework to be tested provides a varied pool of source data for test data generation. The prompt template is then generated based on the basic data and at least one data structure attribute of the operator, so that it can be used to generate both single-attribute test data and test data of a multi-attribute composite type; this keeps the generated data valid with respect to the data structure attributes of the operator while improving the diversity of the test data. Relying on the natural language understanding capability of the semantic model, test data can be generated from the prompt template automatically and efficiently, in contrast to manual generation.
Drawings
In order to more clearly illustrate the application or the technical solutions in the related art, the drawings used in the description of the embodiments or the related art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a test data generation system of a machine learning framework according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for generating test data for a machine learning framework according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for generating test data for a machine learning framework according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for generating test data for a machine learning framework according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for generating test data for a machine learning framework according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a test data generating device of a machine learning framework according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "comprising" and "having" and any variations thereof in the description and claims of the application and in the foregoing drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the terms "first" and "second" in the present application merely distinguish similar objects and do not denote a specific order of those objects. It should be understood that objects distinguished by "first" and "second" may be interchanged where appropriate, so that the embodiments of the application described herein can be implemented in sequences other than those described or illustrated herein.
The related art mainly focuses on the functions and performance of machine learning frameworks and on their usability for developers; for example, TensorFlow emphasizes generality, PyTorch emphasizes ease of use, and OneFlow emphasizes high performance. The security of machine learning frameworks, however, has not been fully considered.
Therefore, when a machine learning framework is developed, its security vulnerabilities need to be found and fixed by means of test data, so as to ensure framework security, application security, and user data security.
However, current test data generation methods for machine learning frameworks have two problems. First, the generated test data is random and does not satisfy the structural requirements that the machine learning framework places on its inputs, so testing stops at the validity-check stage and never reaches the deeper code where security detection is needed. Second, the diversity of the test data is low and depends on the subjective experience of testers; when a new test project is encountered, the generated test data struggles to meet the test requirements in both diversity and validity, which reduces testing efficiency.
Based on the above, the embodiments of the application provide a test data generation method for a machine learning framework. Acquiring basic data related to testing of the machine learning framework to be tested provides a varied pool of source data for test data generation. The prompt template is then generated based on the basic data and at least one data structure attribute of the operator, so that it can be used to generate both single-attribute test data and test data of a multi-attribute composite type; this keeps the generated data valid with respect to the data structure attributes of the operator while improving the diversity of the test data. Relying on the natural language understanding capability of the semantic model, test data can be generated from the prompt template automatically and efficiently, in contrast to manual generation.
Reference is next made to fig. 1, which is a schematic diagram illustrating a test data generating system of a machine learning framework according to an exemplary embodiment of the present application. As shown in fig. 1, the test data generation system of the machine learning framework includes: a terminal 110 and a server side 120.
Terminal 110 may include terminals corresponding to one or more users. Client software may be installed in the terminal 110, where the software is configured to obtain basic data related to testing of the machine learning framework to be tested, generate a prompt template according to the data structure attributes of the operator in the machine learning framework to be tested and the basic data, and input the prompt template into the semantic model to generate first test data. The terminal 110 may also connect to the network and establish a data connection with the server side 120 through the network. For example, the terminal 110 may obtain, from the server side 120, basic data related to testing of the machine learning framework to be tested, may input a prompt template into a semantic model on the server side 120, and may receive the output of that semantic model to generate the first test data.
The server side 120 may receive the prompt template from the terminal 110 through the network, input the prompt template into the semantic model deployed on the server side 120, and return the output of the semantic model to the terminal 110. It is understood that the server side 120 may be, but is not limited to, a hardware server, a virtual server, a cloud server, and the like.
The method for generating test data of the machine learning framework is not limited to the server side 120, but may be executed by the terminal 110 alone or by a combination of the terminal 110 and the server side 120, which is not particularly limited in the embodiments of the present application, and the following embodiments will all take the method for generating test data of the machine learning framework executed by the terminal 110 as an example.
It will be appreciated that the semantic model may be provided at the local terminal or at the server. The terminal 110 may be connected to a peripheral device, input necessary basic data through a peripheral device such as a keyboard and a mouse, and may also be connected to an external storage device, so as to directly obtain basic data from the external storage device.
The network may be a medium that provides a communication link between the server side 120 and any one of the terminals 110, or may be the Internet including network devices and transmission media, but is not limited thereto. The transmission medium may be a wired link, such as, but not limited to, coaxial cable, optical fiber, or digital subscriber line (DSL), or a wireless link, such as, but not limited to, wireless fidelity (Wi-Fi), Bluetooth, or a mobile device network.
It will be appreciated that the number of terminals 110 and server sides 120 in the test data generation system of the machine learning framework shown in fig. 1 is merely an example, and in a specific implementation, any number of terminals and server sides may be included in the test data generation system, and embodiments of the present application are not limited in this regard. For example, but not limited to, the server side 120 may be a server cluster formed by a plurality of servers, and the terminal 110 may be a terminal cluster formed by a plurality of terminals.
It will be appreciated that a machine learning framework is a library, interface, or tool that enables developers to build machine learning models more easily and quickly. By defining models with pre-built and optimized component libraries, programmers avoid starting from scratch when creating a specific machine learning application, which makes the development process more efficient.
The present application will be described in detail with reference to specific examples.
Next, referring to fig. 1, taking a test data generating method of a machine learning framework executed by a terminal as an example, a test data generating method of a machine learning framework provided by an embodiment of the present application will be described. Referring specifically to fig. 2, fig. 2 shows a flowchart of a method for generating test data of a machine learning framework according to an embodiment of the present application, where the method includes the following steps:
S201, basic data related to testing of the machine learning framework to be tested is acquired.
It will be appreciated that the machine learning framework to be tested includes, but is not limited to, TensorFlow, PyTorch, MXNet, CNTK, Caffe, and the like, which is not limited in the embodiments of the present application. Because machine learning frameworks are complex, many security risks are hidden in them, and security vulnerabilities in the framework therefore need to be detected through testing.
Specifically, the basic data related to testing of the machine learning framework to be tested can be understood as reserve data for testing the framework. Data generated during historical operation and/or testing of the framework can be analyzed and collected as reserve data, and manually selected data types and values can also be designated as reserve data, thereby providing a basis for generating test data.
Optionally, the basic data may be at least one of: historical vulnerability data of the machine learning framework to be tested; each data type in the machine learning framework to be tested and the boundary values corresponding to each data type; and data of a preset type corresponding to the machine learning framework to be tested.
In particular, the historical vulnerability data can be understood as data that has triggered a vulnerability in the historical usage and/or test records of the machine learning framework. It can be obtained directly from security bulletins issued by the publisher of the machine learning framework, or from the historical usage and/or test records of the framework, and it reflects problems that commonly occur in the framework. By way of example, the historical vulnerability data may include heap out-of-bounds read/write and integer overflow vulnerabilities caused by a parameter that is a negative number or a very large integer, null pointer dereference vulnerabilities caused by an empty array, floating-point exception vulnerabilities caused by a value of 0, and so forth. By analyzing historical vulnerabilities, the key data that triggered them can be obtained, so that the generated test data has a chance of exposing similar vulnerabilities.
Specifically, the machine learning framework to be tested handles data of different data types, each with its own value range. When a value lies on the boundary of its range, out-of-bounds vulnerabilities are easily triggered. The relevant data types and their boundary values can be obtained by analyzing the documentation of the machine learning framework. For example, a 32-bit signed integer has the value range [-2^31, 2^31 - 1]. When the value is -2^31, an operation that converts it to or from a 64-bit signed integer or a 32-bit unsigned integer may change the sign of the value to positive, thereby causing an out-of-bounds vulnerability.
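By way of illustration only, the boundary values of signed integer types and the sign change on conversion to an unsigned type can be worked through as follows. The helper names are hypothetical, and the bit mask models the two's-complement reinterpretation that a C-style cast to an unsigned type of the same width performs.

```python
def signed_range(bits: int) -> tuple:
    """Inclusive value range of a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def reinterpret_as_unsigned(value: int, bits: int) -> int:
    """Model a cast to an unsigned integer of the same width (modulo 2**bits)."""
    return value & ((1 << bits) - 1)


INT32_MIN, INT32_MAX = signed_range(32)

# Boundary values that could be stored as basic data for each integer width.
boundary_values = {bits: signed_range(bits) for bits in (8, 16, 32, 64)}

# -2**31 becomes +2**31 when reinterpreted as a 32-bit unsigned integer.
flipped = reinterpret_as_unsigned(INT32_MIN, 32)
```

The value `flipped` equals 2^31, a large positive number; if such a value is later used as a size or an index, it can reach memory outside the intended buffer.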
Specifically, the preset type of data can be understood as data of specific types selected according to the machine learning framework in question. Different machine learning frameworks have different data types, so a targeted selection is made from the specific types that exist in each framework; data types not covered by the historical vulnerability data and the boundary value data can be selected to further supplement the basic data. The historical vulnerability data, data types, and corresponding boundary values described above only involve simple types of data, such as numbers or character strings, whereas a machine learning framework also contains some special types of data. For example, in TensorFlow, besides simple types such as the integer type int and the floating-point type float, there are special types such as the resource type DT_RESOURCE and the variant type DT_VARIANT.
It may be understood that the basic data may be any combination of the above three categories: (1) historical vulnerability data of the machine learning framework to be tested; (2) each data type in the machine learning framework to be tested and the boundary values corresponding to each data type; and (3) data of a preset type corresponding to the machine learning framework to be tested. The basic data may include any one, any two, or all three of these categories.
Optionally, the obtained basic data related to the test of the machine learning framework to be tested can be stored locally and/or in the cloud.
S202, generating a prompt template conforming to at least one data structure attribute according to the data structure attribute of an operator in the machine learning framework to be tested and the basic data.
Understandably, by parsing the operator document and description file of the machine learning framework to be tested, the data structure attributes of the operator parameters can be obtained, and the operator parameters can include a plurality of data structure attributes.
Specifically, the prompt template is generated according to the data structure attributes of the operator and the basic data, and conforms to at least one data structure attribute. That is, the prompt template defines at least one data structure attribute and also includes the basic data acquired in S201. The data structure attributes in the prompt template may be a single data structure attribute of the operator, or a combination of two or more of them.
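By way of illustration only, the single-attribute and multi-attribute choices available to a prompt template can be enumerated as follows, assuming the three data structure attributes named in this application (type, shape, and value). The function name and the use of unordered combinations are assumptions of this sketch.

```python
from itertools import combinations

ATTRIBUTES = ("type", "shape", "value")


def attribute_combinations(attributes: tuple = ATTRIBUTES) -> list:
    """Every non-empty subset of attributes, smallest subsets first."""
    return [
        combo
        for size in range(1, len(attributes) + 1)
        for combo in combinations(attributes, size)
    ]


combos = attribute_combinations()
single_attribute = [c for c in combos if len(c) == 1]
multi_attribute = [c for c in combos if len(c) > 1]
```

With three attributes this gives three single-attribute templates and four multi-attribute (composite) templates, seven in total.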
In some embodiments tensor is the basic data structure of operations in the machine learning framework, one tensor is defined by three data structure attributes: type, shape, and value; wherein the types include basic types and composite types; the shape includes the dimensions of the operator and the number of elements; the value is the value of each element in the operator.
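Illustratively, the three data structure attributes can be represented as follows. This is a sketch under stated assumptions: the class `TensorDescription` and its methods are hypothetical names, and the consistency check (the number of values equals the product of the dimensions) is an ordinary property of tensors rather than something stated in the embodiment.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class TensorDescription:
    """Hypothetical description of a tensor by type, shape, and value."""
    dtype: str      # type, for example "int32" or "float"
    shape: tuple    # size of each dimension; () describes a scalar
    values: tuple   # flattened element values

    def rank(self) -> int:
        """Number of dimensions."""
        return len(self.shape)

    def num_elements(self) -> int:
        """Number of elements implied by the shape (1 for a scalar)."""
        return prod(self.shape)

    def is_consistent(self) -> bool:
        """True when the number of values matches the shape."""
        return len(self.values) == self.num_elements()


matrix = TensorDescription("int32", (2, 3), (0, 1, 2, 3, 4, 5))
print(matrix.rank(), matrix.num_elements(), matrix.is_consistent())
```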
S203, inputting the prompt template into the semantic model to generate first test data.
It will be appreciated that the semantic model is a large language model with natural language understanding capabilities, and embodiments of the present application do not limit the specific type of the semantic model. Through analysis of and learning from a large amount of data, a large language model acquires the capability to process a large amount of information, so that new data can be generated on the basis of existing data.
Specifically, after the prompt template is input into the semantic model, the meaning of the language text in the prompt template can be understood through the semantic model, so that first test data which accords with the data structure attribute in the prompt template is generated, and the generated first test data also accords with the data structure attribute of the operator.
Specifically, depending on the data structure attributes defined in the prompt template, the first test data may be test data conforming to one of the data structure attributes, or test data of a composite type conforming to multiple attributes.
According to the test data generation method of the machine learning framework provided by the embodiment of the application, the basic data related to the test of the machine learning framework to be tested is obtained, so that various data reserves can be provided for the generation of the test data. The prompt template is further generated based on the basic data and at least one data structure attribute of the operator, so that the prompt template can be used for generating both test data with a single attribute and test data of a multi-attribute composite type; the validity of the test data with respect to the data structure attributes of the operator can thus be ensured, and the diversity of the test data is improved. Compared with manual generation of test data, the test data can be generated automatically and efficiently on the basis of the prompt template by using the natural language understanding capability of the semantic model.
Next, please refer to fig. 3, which is a flowchart illustrating another method for generating test data of a machine learning framework according to an embodiment of the present application. As shown in fig. 3, the method may include:
S301, acquiring basic data related to testing of a machine learning framework to be tested.
Specifically, this step is described in detail in S201 and is not repeated here.
S302, processing the basic data into basic data vectors, and constructing a basic database based on the basic data vectors.
Specifically, after the basic data is acquired, the basic data is processed into vector form, and the resulting basic data vectors are stored in a basic database.
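Illustratively, step S302 can be sketched as follows. The embodiment does not specify how the basic data is vectorized, so the sketch swaps in a toy hashed bag-of-words embedding with cosine similarity purely for illustration; a practical system would more likely use a learned embedding model. The names `embed`, `BaseDatabase`, `DIM`, and the sample descriptions are all assumptions.

```python
import math
import zlib

DIM = 16  # length of the toy vectors (an arbitrary choice)


def embed(text: str) -> list:
    """Map text to a unit-length vector using a hashed bag of words."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is used because it is stable across runs, unlike hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


class BaseDatabase:
    """Hypothetical in-memory store of basic data vectors."""

    def __init__(self):
        self._rows = []  # (vector, description, payload)

    def add(self, description: str, payload) -> None:
        self._rows.append((embed(description), description, payload))

    def search(self, query: str, top_k: int = 3) -> list:
        """Return the payloads whose descriptions are most similar to the query."""
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [payload for _, _, payload in ranked[:top_k]]


db = BaseDatabase()
db.add("int32 boundary value maximum", 2**31 - 1)
db.add("float boundary value minimum", -1.0e38)
print(db.search("int32 boundary value maximum", top_k=1))
```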
Further, generating a prompt template conforming to at least one data structure attribute according to the data structure attribute of an operator in the machine learning framework to be tested and the basic data, and specifically comprising the steps of:
S303, generating a control field based on the data structure attribute of at least one operator; traversing the basic database to obtain basic data related to the control field, taking the traversed result as a data field, and combining the control field and the data field to generate a prompt template.
Specifically, the prompt template includes a control field and a data field.
Specifically, the control field defines the data structure attribute of at least one operator, and the data structure attribute in the control field is described in natural language. When the prompt template is generated, at least one data structure attribute of an operator parameter, described in natural language, is constructed as the control field. Illustratively, for the basic data structure tensor, the control field may be "generate a number of integer type with a value less than 10", in which case the required data type is the integer type among the basic types, the shape (number of elements) is 1, and the value of the element is less than 10. The control field may also be "generate a number less than 10", in which case the shape (number of elements) is 1, the value of the element is less than 10, and there is no requirement on the data type.
Specifically, according to the data structure attribute defined in the control field, all the stored basic data in the basic database established in S302 is traversed, and the data whose data structure attributes meet the requirement of the control field is selected as the data field. Illustratively, if the control field is "generate a number of integer type with a value less than 10", then the basic database is traversed and all numbers of integer type with a value less than 10 are taken as the data field.
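Illustratively, the combination of a control field and a data field in step S303 can be sketched as follows. The sketch is a simplification under stated assumptions: the control field is held in a small structured object (`ControlField`, a hypothetical name) from which the natural language text is rendered, and the traversal of the basic database is reduced to filtering a plain Python list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlField:
    """Hypothetical structured form of a control field."""
    dtype: Optional[type]          # None means no requirement on the data type
    num_elements: int              # shape, expressed as the number of elements
    exclusive_upper_bound: float   # the value must be less than this bound

    def to_text(self) -> str:
        kind = self.dtype.__name__ if self.dtype is not None else "any numeric"
        return (f"Generate {self.num_elements} value(s) of {kind} type "
                f"with a value less than {self.exclusive_upper_bound}.")


def matches(control: ControlField, value) -> bool:
    """Check one item of basic data against the control field."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if control.dtype is not None and type(value) is not control.dtype:
        return False
    return value < control.exclusive_upper_bound


def build_prompt_template(control: ControlField, base_values: list) -> dict:
    """Traverse the basic data and combine the control field with the data field."""
    data_field = [v for v in base_values if matches(control, v)]
    return {"control_field": control.to_text(), "data_field": data_field}


template = build_prompt_template(ControlField(int, 1, 10), [3, 9, 10, 2.5, "7", 42])
print(template)
```

With the integer requirement only 3 and 9 are selected; if `dtype` is `None`, the floating point value 2.5 is selected as well, which corresponds to the control field that places no requirement on the data type.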
Further, inputting the prompt template into the semantic model to generate first test data, which specifically comprises the following steps:
S304, inputting the prompt template into a semantic model, determining data structure attributes based on the control field of the prompt template, and determining basic data conforming to at least one data structure attribute in the data field of the prompt template as first test data.
Specifically, the control field and the data field in the prompt template are analyzed by utilizing the natural language understanding capability of the semantic model. The control field gives the requirements of the data structure attribute, for example, the structure of the test data is determined from three aspects of numerical value, type and shape, and the data field gives the optional range of the basic data for determining the specific content of the test data so as to generate the first test data.
Illustratively, if the control field determines that a positive integer is required and the data field includes 1 and 2, the first test data may be generated as the positive integer 1 and the positive integer 2.
Optionally, inputting the prompt template into the semantic model to generate first test data, which specifically includes the steps of:
S305, inputting the prompting template into a semantic model, determining the requirement of the data structure attribute based on the control field of the prompting template, determining the basic data conforming to at least one data structure attribute in the data field of the prompting template, modifying the basic data conforming to at least one data structure attribute, and generating first test data based on the modified data.
Specifically, the basic data given by the data field can be modified, and new data can be derived from it on the premise that the data validity requirement is met. For example, if the control field determines that the data structure attribute is a positive integer and the data field gives the data 1 and 2, the data can be derived into 3 or other positive integers.
Optionally, modifying the underlying data may include changing the value of the data, the data type, the data sign, etc. Some or all of the underlying data may be modified.
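Illustratively, the difference between step S304 (taking the basic data directly) and step S305 (modifying it) can be sketched as follows. In the embodiment this work is done by the semantic model; the sketch replaces the model with a few deterministic rules solely to make the data flow concrete. The function names and the three derivation rules (changed value, changed sign, changed type) are assumptions.

```python
def derive_variants(value):
    """Yield simple modifications of one numeric item of basic data."""
    yield value + 1                                            # change the value
    yield -value                                               # change the sign
    yield float(value) if type(value) is int else int(value)  # change the type


def generate_first_test_data(data_field, is_valid, modify=False):
    """Select valid basic data (S304) or derive new valid data from it (S305)."""
    if not modify:
        return [v for v in data_field if is_valid(v)]
    derived = []
    for v in data_field:
        if not is_valid(v):
            continue
        for variant in derive_variants(v):
            # Keep only variants that still satisfy the control field.
            if is_valid(variant) and variant not in derived:
                derived.append(variant)
    return derived


def is_positive_integer(v) -> bool:
    return type(v) is int and v > 0


print(generate_first_test_data([1, 2], is_positive_integer))
print(generate_first_test_data([1, 2], is_positive_integer, modify=True))
```

The validity predicate plays the role of the control field, so variants such as a negative number or a floating point value are discarded when a positive integer is required.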
According to the embodiment of the application, vectorizing the basic data and storing it in the basic database makes the basic data convenient to retrieve, so that the data can be quickly searched and referenced. The prompt template is generated based on the basic data and at least one data structure attribute of the operator, so that the prompt template can be used for generating both test data with a single attribute and test data of a multi-attribute composite type; the validity of the test data with respect to the data structure attributes of the operator can thus be ensured, and the diversity of the test data is improved. More data can be generated by taking the basic data directly or by modifying it to derive variants. Compared with manual generation of test data, the test data can be generated automatically and efficiently on the basis of the prompt template by using the natural language understanding capability of the semantic model.
Next, please refer to fig. 4, which is a flowchart illustrating a test data generation of a machine learning framework according to an embodiment of the present application. As shown in fig. 4, the method may include the steps of:
S401, inputting the first test data into an operator test template for testing, and outputting a first test result.
It can be understood that the operator test template library is a template database for testing operators. Different templates for testing operators are stored in the library, which contains operator calling methods, operator parameter types, constraint conditions among operator parameters, and the like, and test data conforming to the data structure attributes of an operator can be generated according to those attributes. Operators can be tested in multiple dimensions through the operator test template library, which can effectively improve the safety of operator computation.
Specifically, the first test data is input into the operator test template through a function interface corresponding to the machine learning framework to be tested.
Specifically, the first test result directly reflects the test success or test failure result after the test data is input to the operator test template, and if the test fails, the first test result feeds back the error information of the test failure. When error information occurs, the corresponding test data cannot be used for testing the machine learning framework to be tested.
Illustratively, the error information may include a data parsing failure, non-standard data processing, and the like, which is not limited by the embodiments of the present application.
Further, according to whether the first test result includes error information, after S401, if the test result does not include error information, the step of S402 is performed, and if the test result includes error information, the step of S403 is performed.
Specifically, if the first test result does not include error information, the method specifically includes:
S402, storing the first test data in a test database.
Specifically, if there is no error information in the first test result, it indicates that the corresponding first test data can be used for testing the corresponding machine learning framework to be tested, the corresponding first test data can be stored in a test database, all test data available for testing is stored in the test database, and the test database can be stored on the local terminal device and/or the cloud server.
Optionally, a label of a corresponding type may be generated based on the type of the operator test template, and a mapping relationship between the label and the test database may be generated, so that the corresponding test data may be conveniently called according to the type of the operator test template.
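Illustratively, steps S401 and S402 can be sketched as follows. The sketch makes several assumptions: the operator test template is reduced to a plain Python callable, `fake_transpose` is a hypothetical stand-in for an operator of the framework to be tested (it is not a TensorFlow function), and the test database is a Python list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperatorTestResult:
    """Hypothetical test result: success, or failure with error information."""
    passed: bool
    error_info: Optional[str] = None


def fake_transpose(rank: int, permutation: list) -> list:
    """Stand-in operator that only validates its parameters."""
    if len(permutation) != rank:
        raise ValueError(
            f"operand rank {rank} does not match permutation size {len(permutation)}"
        )
    return list(permutation)


def run_operator_test(operator_call, test_data: dict) -> OperatorTestResult:
    """S401: call the operator with the test data and capture any error information."""
    try:
        operator_call(**test_data)
    except Exception as exc:
        return OperatorTestResult(False, f"{type(exc).__name__}: {exc}")
    return OperatorTestResult(True)


def run_and_store(operator_call, test_data: dict, test_database: list) -> OperatorTestResult:
    """S402: store the test data only when the result contains no error information."""
    result = run_operator_test(operator_call, test_data)
    if result.passed:
        test_database.append(test_data)
    return result


database = []
ok = run_and_store(fake_transpose, {"rank": 4, "permutation": [0, 1, 2, 3]}, database)
bad = run_and_store(fake_transpose, {"rank": 5, "permutation": [0, 1, 2, 3]}, database)
print(ok.passed, bad.error_info, len(database))
```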
Specifically, if the first test result includes error information, the method includes:
S403, if the first test result comprises error information, correcting error parameters corresponding to the error information, and outputting a correction result.
Specifically, the code segment corresponding to the error information can be obtained by traversing a vector database of the machine learning framework to be tested.
Specifically, the error parameter causing the error may be determined from the error information or from the code segment corresponding to the error information. The code segment includes specific checks on the related parameters and can be used for determining the corresponding error parameter.
Specifically, a correction prompt template can be constructed by combining the error information, the error parameters, and the code segment. The correction prompt template is input into the semantic model, which uses its natural language understanding capability to interpret the error parameters recorded in the correction prompt template, correct them, and output a correction result.
Illustratively, when the TensorFlow framework is running, the error information "TransposeOp operand rank 5 does not match permutation size 4" may occur, which indicates that the operand of the operator Transpose is 5-dimensional and does not match the size 4 of the variable permutation. The parameters of the Transpose operator include a parameter named permutation, so the correction prompt template may include the information that the dimension of the operand of the Transpose operator does not match the size of the error parameter permutation, and the semantic model can interpret the correction prompt template and modify the size of the error parameter permutation accordingly.
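Illustratively, the construction of a correction prompt template can be sketched as follows. The wording of the template, the function names, and the word-matching rule used to locate error parameters are assumptions made for illustration; the code segment is represented by a placeholder string because its retrieval from the vector database is outside the sketch.

```python
import re


def find_error_parameters(error_info: str, parameter_names: list) -> list:
    """Return the operator parameters that are mentioned as whole words in the error."""
    return [
        name for name in parameter_names
        if re.search(rf"\b{re.escape(name)}\b", error_info, re.IGNORECASE)
    ]


def build_correction_prompt(operator_name: str, error_info: str,
                            error_parameters: list, code_segment: str) -> str:
    """Combine error information, error parameters, and a code segment into one prompt."""
    lines = [
        f"The operator {operator_name} failed with the following error information:",
        error_info,
        "Parameters suspected of causing the error: " + ", ".join(error_parameters),
        "Related code segment:",
        code_segment,
        "Return corrected values for the listed parameters only.",
    ]
    return "\n".join(lines)


message = "TransposeOp operand rank 5 does not match permutation size 4"
suspects = find_error_parameters(message, ["input", "permutation"])
prompt = build_correction_prompt(
    "Transpose", message, suspects, "<code segment retrieved from the vector database>"
)
print(prompt)
```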
S404, combining the correction result and the first test data to generate second test data.
Specifically, after the first test data is tested, one or more error parameters may exist, and the correction result may be understood as a correction result of all the error parameters.
Specifically, after the correction result is generated, the correction result needs to be combined with the first test data to replace the original error parameter.
S405, inputting the second test data into an operator test template for testing, and outputting a second test result.
Further, after the second test result is obtained, it is determined whether further modification of the second test data is required according to the second test result.
Specifically, if the second test result does not include error information, the step of S402 is performed, and the second test data is saved to the test database.
Specifically, if the second test result includes error information, the steps of S403, S404, and S405 are executed again until the second test result does not include error information.
Specifically, multiple loop iterations may be performed on the second test data, and if the corrected data still has problems, the process of correcting the test data is repeated.
Optionally, in the process of performing multiple loop iterations on the second test data, an upper limit of the number of loop iterations may be set, and if error information still exists after the iteration exceeding the upper limit of the number of loop iterations, an error parameter corresponding to the error information is discarded.
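Illustratively, the loop formed by steps S403 to S405 with an upper limit on the number of iterations can be sketched as follows. The callables `run_test` (returning error information, or `None` on success) and `correct` (returning corrected parameters) are hypothetical stand-ins for the operator test template and the semantic model, and returning `None` when the upper limit is exceeded is one possible way of expressing that the data is discarded.

```python
def correct_until_valid(first_test_data: dict, run_test, correct, max_iterations: int = 3):
    """Repeat test and correction until no error remains or the limit is reached."""
    current = dict(first_test_data)
    error_info = run_test(current)
    iterations = 0
    while error_info is not None:
        if iterations >= max_iterations:
            return None  # upper limit reached: give up on this test data
        correction = correct(current, error_info)
        # S404: combine the correction result with the test data by
        # replacing the original error parameters.
        current = {**current, **correction}
        error_info = run_test(current)  # S405: test the second test data
        iterations += 1
    return current


def rank_check(data: dict):
    """Hypothetical test: the permutation size must equal the rank."""
    if len(data["permutation"]) != data["rank"]:
        return "operand rank does not match permutation size"
    return None


def fix_permutation(data: dict, error_info: str) -> dict:
    """Hypothetical correction: rebuild the permutation with the right size."""
    return {"permutation": list(range(data["rank"]))}


print(correct_until_valid({"rank": 5, "permutation": [0, 1, 2, 3]}, rank_check, fix_permutation))
```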
According to the embodiment of the application, after the first test data is tested, if the first test result comprises error information, the first test data is corrected, the second test data is further generated according to the correction result, the cost for generating the test data is reduced, and more test data are generated. And generating a correction prompt template by combining the code segment, the error parameter and the error information, and inputting the correction prompt template into the semantic model, so that errors in the test data are intelligently identified and corrected, the cost of correcting the test data is reduced, the accuracy of the test data is improved, and the test efficiency is improved.
Next, please refer to fig. 5, which is a flowchart illustrating a test data generation of a machine learning framework according to an embodiment of the present application. As shown in fig. 5, the method may include the steps of:
S501, inputting a prompt template into a semantic model to generate first test data.
Specifically, this step is described in detail in S203 and is not repeated here.
S502, comparing the first test data with the test data in the test database.
Specifically, the test database is a database for storing test data, and a plurality of test data can be selected according to a preset and stored as the test database, or the test data which does not generate error information can be stored in the test database according to a test result.
Alternatively, the test database may be a collection of multiple test databases.
Specifically, the first test data is compared with all the existing test data in the test database, so that the test data is prevented from being repeated. Wherein, some parameters may be the same in the test data, and repeated test data may be understood as all parameters in the two test data being identical.
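Illustratively, the duplicate check can be sketched as follows. The sketch assumes that each item of test data is a JSON-serializable dictionary of parameters and uses a canonical JSON string as the comparison key; the class name `DeduplicatingStore` is hypothetical.

```python
import json


def canonical_key(test_data: dict) -> str:
    """Serialize the parameters in a key-order-independent way."""
    return json.dumps(test_data, sort_keys=True)


class DeduplicatingStore:
    """Hypothetical test database that rejects exact duplicates."""

    def __init__(self):
        self._keys = set()
        self.items = []

    def contains(self, test_data: dict) -> bool:
        return canonical_key(test_data) in self._keys

    def add(self, test_data: dict) -> bool:
        """Store the test data unless every parameter matches an existing entry."""
        if self.contains(test_data):
            return False
        self._keys.add(canonical_key(test_data))
        self.items.append(test_data)
        return True


store = DeduplicatingStore()
print(store.add({"rank": 4, "permutation": [0, 1, 2, 3]}))  # new entry
print(store.add({"permutation": [0, 1, 2, 3], "rank": 4}))  # same parameters, different order
print(store.add({"rank": 4, "permutation": [0, 1, 3, 2]}))  # only some parameters are the same
```

Two items count as duplicates only when all of their parameters are identical, so the third item is stored even though its rank equals that of the first.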
Further, based on the comparison result, if the first test data duplicates test data in the test database, the following step is executed:
S503, deleting the first test data.
Further, if the first test data and the test database have no repeated test data, the following steps are executed:
S504, inputting the first test data into an operator test template for testing, and outputting a first test result.
Specifically, this step is described in detail in S401 and is not repeated here.
Further, if the first test result does not include error information, the method includes:
S505, the first test data is saved to a test database.
Optionally, the test database is the same as the test database used for comparing the test data.
Specifically, this step is described in detail in S402 and is not repeated here.
Further, if the first test result includes error information, the method includes:
S506, if the first test result comprises error information, correcting error parameters corresponding to the error information, and outputting a correction result.
Specifically, this step is described in detail in S403 and is not repeated here.
S507, combining the correction result with the first test data to generate second test data.
Specifically, this step is described in detail in S404 and is not repeated here.
Further, after the second test data is generated and before the second test data is input into the operator test template for testing and a test result is output, the step of S502 is executed again to compare the second test data with the test data in the test database.
Further, based on the comparison result, if the second test data duplicates test data in the test database, the step of S503 is executed to delete the second test data; if the second test data does not duplicate any test data in the test database, the step of S504 is executed to input the second test data into the operator test template for testing, and the test result is output.
According to the method provided by the embodiment of the application, each time the first test data or the second test data is generated, it is checked for duplicates against the test data in the test database, and test data identical to existing test data in the test database is deleted, so that duplicate test data is avoided, repeated testing with identical test data is prevented, and the test efficiency is improved.
The following are device embodiments of the present application that may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Referring next to fig. 6, a schematic structural diagram of a test data generation device of a machine learning framework is provided according to an exemplary embodiment of the present application. The device may be implemented as all or part of a terminal by software, hardware, or a combination of both, or may be integrated on a server as a separate module. The test data generation device in the embodiment of the present application may be applied to a terminal or a cloud, where the device 6 includes a basic data module 61, a test data generating module 62, a test module 63, and a data correction module 64, specifically:
A basic data module 61, configured to acquire basic data related to a test of a machine learning framework to be tested;
A test data generating module 62, configured to generate a prompt template conforming to at least one data structure attribute according to the data structure attribute of an operator in the machine learning framework to be tested and the basic data;
The test data generating module 62 is further configured to input the prompt template into the semantic model to generate first test data.
Optionally, the basic data acquired by the basic data module 61 is at least one of the following: historical vulnerability data of the machine learning framework to be tested, each data type in the machine learning framework to be tested together with the boundary value corresponding to each data type, and data of a preset type corresponding to the machine learning framework to be tested.
Optionally, the base data module 61 processes the base data into a base data vector after acquiring the base data related to the test of the machine learning framework to be tested, and constructs a base database based on the base data vector.
Specifically, the data structure attributes of the operators include type, shape, and value; wherein the types include basic types and composite types; the shape includes the dimensions of the operators and the number of elements; the value is the value of each element in the operator.
Optionally, the test data generating module 62 inputs the prompt template into the semantic model, determines, through the semantic model, the data structure attribute based on the control field of the prompt template, and determines the basic data conforming to at least one data structure attribute in the data field of the prompt template as the first test data.
Optionally, the test data generating module 62 inputs the prompting template into the semantic model, the test data generating module 62 determines the requirement of the data structure attribute based on the control field of the prompting template through the semantic model, determines the basic data conforming to at least one data structure attribute in the data field of the prompting template, modifies the basic data conforming to at least one data structure attribute, and generates the first test data based on the modified data.
Optionally, after the first test data is generated and before the first test data is input into the operator test template for testing and the first test result is output, the test data generating module 62 is further configured to compare the first test data with the test data in the test database, and to delete the first test data if it duplicates test data in the test database; if the first test data does not duplicate any test data in the test database, the test data generating module 62 inputs the first test data into the operator test template for testing, and a first test result is output.
In some embodiments, the apparatus further includes a test module 63, the test data generating module 62 inputs the first test data into an operator test template of the test module 63 for testing, and the test module 63 generates a first test result.
In some embodiments, the apparatus further includes a data correction module 64, after the test data generating module 62 inputs the first test data into the operator test template of the test module 63 for testing, if the first test result generated by the test module 63 includes error information, the data correction module 64 corrects an error parameter corresponding to the error information, and outputs a correction result.
Specifically, the data correction module 64 corrects the error parameter corresponding to the error information, and outputs a correction result, including:
The data correction module 64 traverses a vector database of the machine learning framework to be tested to obtain a code segment corresponding to the error information;
The data correction module 64 combines the error information, the error parameters, and the code segment to construct a correction prompt template and inputs the correction prompt template into the semantic model; the data correction module 64 corrects the error parameters through the semantic model and outputs a correction result.
Specifically, the data correction module 64 corrects the error parameter corresponding to the error information, and after outputting the correction result, the method further includes:
The data correction module 64 combines the correction result and the first test data to generate second test data, the second test data is input into the operator test template of the test module 63 to be tested, and the test module 63 outputs the second test result.
Optionally, after the data correction module 64 combines the correction result with the first test data to generate the second test data, the second test data is input into the operator test template of the test module 63 for testing, and before outputting the second test result, the method further includes:
The data correction module 64 compares the second test data with the test data in the test database, and deletes the second test data if it duplicates test data in the test database; if the second test data does not duplicate any test data in the test database, the data correction module 64 inputs the second test data into the operator test template of the test module 63 for testing, and a second test result is output.
Specifically, the data correction module 64 inputs the second test data into the operator test template of the test module 63 to perform a test, and after the test module 63 outputs the second test result, the method further includes:
If the second test result does not include error information, the data correction module 64 saves the second test data to the test database;
If the second test result includes error information, the data correction module 64 performs the step of correcting the error parameter corresponding to the error information again, and outputs the correction result until the second test result does not include error information.
It should be noted that, when the apparatus 6 provided in the foregoing embodiment executes the test data generating method, the division into the above functional modules is used only as an example. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the device provided in the above embodiment and the test data generating method embodiments belong to the same concept; the detailed implementation process is described in the method embodiments and is not repeated here.
The embodiment of the application also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the method of any embodiment when executing the program.
Fig. 7 is a block diagram of an electronic device according to an embodiment of the present application.
As shown in fig. 7, the electronic device 700 includes: a processor 701 and a memory 702.
In the embodiment of the present application, the processor 701 is a control center of a computer system, and may be a processor of a physical machine or a processor of a virtual machine. Processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 701 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
The processor 701 may also include a main processor and a coprocessor, wherein the main processor is a processor for processing data in an awake state, and is also called a CPU (Central Processing Unit); a coprocessor is a low-power processor for processing data in a standby state.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments of the application, a non-transitory computer readable storage medium in memory 702 is used to store at least one instruction for execution by processor 701 to implement the methods of embodiments of the application.
In some embodiments, the electronic device 700 further includes: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by a bus or signal lines. The individual peripherals may be connected to the peripheral interface 703 via buses, signal lines, or a circuit board. Specifically, the peripherals include at least one of a display 704, a camera 705, and an audio circuit 706. The peripheral interface 703 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 701 and the memory 702.
In some embodiments of the application, the processor 701, the memory 702, and the peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments of the application, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on separate chips or circuit boards. The embodiment of the present application is not particularly limited thereto.
The display screen 704 is used to display a UI. The UI may include graphics, text, icons, video, and any combination thereof. When the display 704 is a touch display, the display 704 also has the ability to collect touch signals at or above the surface of the display 704. The touch signal may be input to the processor 701 as a control signal for processing. At this point, the display 704 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
In some embodiments of the application, the display 704 may be one, disposed on a front panel of the electronic device 700; in other embodiments of the present application, the display 704 may be at least two, respectively disposed on different surfaces of the electronic device 700 or in a folded design; in still other embodiments of the present application, the display 704 may be a flexible display disposed on a curved surface or a folded surface of the electronic device 700. Even more, the display screen 704 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display 704 may be made of LCD (Liquid CRYSTAL DISPLAY), OLED (Organic Light-Emitting Diode), or other materials.
The camera 705 is used to capture images or video. Optionally, camera 705 includes a front camera and a rear camera. In general, a front camera is disposed on a front panel of an electronic device, and a rear camera is disposed on a rear surface of the electronic device. In some embodiments, the at least two rear cameras are any one of a main camera, a depth camera, a wide-angle camera and a tele camera, so as to realize that the main camera and the depth camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize a panoramic shooting and Virtual Reality (VR) shooting function or other fusion shooting functions. In some embodiments of the application, camera 705 may also include a flash. The flash lamp can be a single-color temperature flash lamp or a double-color temperature flash lamp. The dual-color temperature flash lamp refers to a combination of a warm light flash lamp and a cold light flash lamp, and can be used for light compensation under different color temperatures.
The audio circuit 706 may include a microphone and a speaker. The microphone is used for collecting sound waves from the user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing. For purposes of stereo acquisition or noise reduction, there may be multiple microphones, each disposed at a different location of the electronic device 700. The microphone may also be an array microphone or an omni-directional pickup microphone.
The power supply 707 is used to power the various components in the electronic device 700. The power supply 707 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 707 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
The block diagram of the electronic device shown in the embodiments of the present application does not limit the electronic device 700, and the electronic device 700 may include more or fewer components than illustrated, may combine some of the components, or may employ a different arrangement of components.
The present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the previous embodiments. The computer-readable storage medium may include, among other things, any type of disk including floppy disks, optical disks, DVDs, CD-ROMs, micro-drives, and magneto-optical disks, ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the related art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method of generating test data for a machine learning framework, comprising:
acquiring basic data related to testing of a machine learning framework to be tested;
generating a prompt template conforming to at least one data structure attribute according to the data structure attribute of an operator in the machine learning framework to be tested and the basic data;
and inputting the prompt template into a semantic model to generate first test data.
2. The method of claim 1, wherein the basic data is at least one of: historical vulnerability data of the machine learning framework to be tested, each data type in the machine learning framework to be tested, boundary values corresponding to each data type, and data of a preset type corresponding to the machine learning framework to be tested.
3. The method of claim 1, further comprising, after acquiring the basic data related to testing of the machine learning framework to be tested:
processing the basic data into basic data vectors, and constructing a basic database based on the basic data vectors.
4. The method of claim 1, wherein the operator's data structure attributes include type, shape, and value;
wherein the types include a basic type and a composite type;
the shape includes dimensions and element numbers of the operator;
the value is the value of each element in the operator.
5. The method of claim 3, wherein generating a prompt template conforming to at least one data structure attribute according to the data structure attribute of an operator in the machine learning framework to be tested and the basic data comprises:
generating a control field based on data structure attributes of at least one of the operators;
traversing the basic database to obtain basic data related to the control field, and taking the traversed result as a data field;
and generating the prompt template by combining the control field and the data field.
6. The method of claim 5, wherein inputting the prompt template into a semantic model to generate first test data comprises:
inputting the prompt template into the semantic model, determining data structure attributes based on the control field of the prompt template, and determining basic data conforming to at least one data structure attribute in the data field of the prompt template as the first test data.
7. The method of claim 5, wherein inputting the prompt template into a semantic model to generate first test data comprises:
inputting the prompt template into the semantic model, determining the requirement of the data structure attribute based on the control field of the prompt template, determining the basic data conforming to the at least one data structure attribute in the data field of the prompt template, modifying the basic data conforming to the at least one data structure attribute, and generating the first test data based on the modified data.
8. A test data generation apparatus of a machine learning framework, comprising:
a basic data module, used for acquiring basic data related to testing of the machine learning framework to be tested;
a test data generation module, used for generating a prompt template conforming to at least one data structure attribute according to the data structure attribute of an operator in the machine learning framework to be tested;
the test data generation module being further used for inputting the prompt template into a semantic model to generate first test data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when the program is executed by the processor.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
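
The flow recited in claims 1, 4 and 5 (a control field derived from data structure attributes, a data field obtained by traversing the basic database, and a prompt template combining the two) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the application: the `BaseRecord` and `ControlField` schemas, the bracketed prompt layout, the exact-match lookup on data type (claim 3 describes a vector-based database, which this sketch does not model), and the injected `semantic_model` callable standing in for the semantic model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class BaseRecord:
    """One entry of the basic database (hypothetical schema)."""
    dtype: str   # data type the entry relates to, e.g. "int32"
    source: str  # e.g. "boundary_value" or "historical_vulnerability"
    value: str   # stored value, kept as text so it can be placed in a prompt


@dataclass(frozen=True)
class ControlField:
    """Data structure attributes requested for an operator (hypothetical)."""
    dtype: str
    shape: Tuple[int, ...]
    value_hint: str


def build_data_field(database: List[BaseRecord],
                     control: ControlField) -> List[BaseRecord]:
    # Traverse the basic database and keep the entries related to the
    # control field. Exact dtype matching is an assumption of this sketch.
    return [rec for rec in database if rec.dtype == control.dtype]


def build_prompt_template(control: ControlField,
                          data_field: List[BaseRecord]) -> str:
    # Combine the control field and the data field into one prompt text.
    lines = [
        "[control]",
        f"type={control.dtype}",
        f"shape={list(control.shape)}",
        f"value={control.value_hint}",
        "[data]",
    ]
    lines += [f"{rec.source}: {rec.value}" for rec in data_field]
    return "\n".join(lines)


def generate_test_data(prompt: str,
                       semantic_model: Callable[[str], str]) -> str:
    # The semantic model is injected; any text-in/text-out callable works.
    return semantic_model(prompt)


database = [
    BaseRecord("int32", "boundary_value", "2147483647"),
    BaseRecord("int32", "boundary_value", "-2147483648"),
    BaseRecord("float32", "boundary_value", "inf"),
]
control = ControlField(dtype="int32", shape=(2, 3), value_hint="boundary")
prompt = build_prompt_template(control, build_data_field(database, control))
```

Passing the model in as a callable keeps the sketch independent of any particular model interface; a caller would supply its own function in place of `semantic_model`.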

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410302818.2A CN118194292A (en) 2024-03-15 2024-03-15 Test data generation method and device of machine learning framework and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410302818.2A CN118194292A (en) 2024-03-15 2024-03-15 Test data generation method and device of machine learning framework and electronic equipment

Publications (1)

Publication Number Publication Date
CN118194292A true CN118194292A (en) 2024-06-14

Family

ID=91397877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410302818.2A Pending CN118194292A (en) 2024-03-15 2024-03-15 Test data generation method and device of machine learning framework and electronic equipment

Country Status (1)

Country Link
CN (1) CN118194292A (en)

Similar Documents

Publication Publication Date Title
US11379943B2 (en) Optimizing compilation of shaders
CN110188044B (en) Software error processing method, device, storage medium and equipment
CN106649084A (en) Function call information obtaining method and apparatus, and test device
CN112052008A (en) Code checking method, device, computer equipment and computer readable storage medium
CN112116690B (en) Video special effect generation method, device and terminal
CN115588131B (en) Model robustness detection method, related device and storage medium
CN112494940A (en) User interface manufacturing method and device, storage medium and computer equipment
US11537213B2 (en) Character recommending method and apparatus, and computer device and storage medium
CN111125602A (en) Page construction method, device, equipment and storage medium
CN110188366A (en) A kind of information processing method, device and storage medium
CN112506503A (en) Programming method, device, terminal equipment and storage medium
CN113138996A (en) Statement generation method and device
CN118194292A (en) Test data generation method and device of machine learning framework and electronic equipment
CN115858556A (en) Data processing method and device, storage medium and electronic equipment
CN111240972B (en) Model verification device based on source code
US20160224258A1 (en) Generating computer programs for use with computers having processors with dedicated memory
CN112860261A (en) Static code checking method and device, computer equipment and readable storage medium
CN111563035B (en) Test result display method, device, equipment and storage medium
JP2019144873A (en) Block diagram analyzer
US11914584B2 (en) Method and apparatus for reset command configuration, device and storage medium
CN115391524B (en) Sensitive word detection method and device, computer equipment, storage medium and product
CN115203988B (en) Operation method, device, equipment and storage medium of numerical reservoir simulation example
CN115292194B (en) Method for debugging flow, electronic equipment and computer readable storage medium
CN117633807A (en) Vulnerability prompting method, device, equipment and storage medium
CN117348868A (en) Interactive style template generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication