CN116909889A - Model risk evaluation method, device and equipment - Google Patents

Model risk evaluation method, device and equipment Download PDF

Info

Publication number
CN116909889A
CN116909889A CN202310838388.1A CN202310838388A CN116909889A CN 116909889 A CN116909889 A CN 116909889A CN 202310838388 A CN202310838388 A CN 202310838388A CN 116909889 A CN116909889 A CN 116909889A
Authority
CN
China
Prior art keywords
risk
model
test case
output result
generation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310838388.1A
Other languages
Chinese (zh)
Inventor
杨明洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310838388.1A priority Critical patent/CN116909889A/en
Publication of CN116909889A publication Critical patent/CN116909889A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3684Test management for test design, e.g. generating new test cases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3688Test management for test execution, e.g. scheduling of test suites
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the specification discloses a risk evaluation method (namely a security risk evaluation method of an AIGC model), a device and equipment of a model, wherein the method comprises the following steps: receiving a risk evaluation instruction set aiming at a generated large model, wherein each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the generated large model; acquiring a generation control condition set for performing risk evaluation on the generation type large model based on the set of the requirement information corresponding to the risk evaluation instruction set, and generating a test case set for performing risk evaluation on the generation type large model based on the generation control condition set; providing the test case set for the generated large model, and acquiring an output result set of the generated large model; and determining a risk test result of the generated large model based on the output result set and the risk category corresponding to each test case in the test case set.

Description

Model risk evaluation method, device and equipment
Technical Field
The present document relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for risk evaluation of a model.
Background
In recent years, with the breakthrough of technology, a large generation model (such as a ChatGPT model) is rapidly developed, and the model is applied to a plurality of scenes (such as conversations, text generation, image generation and the like), so that great convenience and welfare are brought to human beings. Meanwhile, at present, laws and regulations related to data privacy security are more mature, users attach more importance to the security of private data, and the generated large model also has phenomena such as facts errors, knowledge blind areas, common sense deviations and the like, and risks such as training data source compliance, data use bias, and generated content security are faced, and the phenomena and risks may cause users to receive improper information and even influence users to generate harmful behaviors, so that the development and application of the generated large model are limited. For this reason, it is required to provide a model security evaluation scheme capable of improving accuracy and reliability of a generative large model so that contents generated by the generative large model are safe, reliable and reliable.
Disclosure of Invention
An object of embodiments of the present specification is to provide a model security evaluation scheme capable of improving accuracy and reliability of a generative large model, so that content generated by the generative large model is safe, reliable and reliable.
In order to achieve the above technical solution, the embodiments of the present specification are implemented as follows:
the embodiment of the specification provides a risk evaluation method of a model, which comprises the following steps: and receiving a risk evaluation instruction set aiming at the generated large model, wherein each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the generated large model. And acquiring a generation control condition set for performing risk evaluation on the generation type large model based on the set of the requirement information corresponding to the risk evaluation instruction set, and generating a test case set for performing risk evaluation on the generation type large model based on the generation control condition set. And providing the test case set for the generated large model, and acquiring an output result set of the generated large model. And determining a risk test result of the generated large model based on the output result set and the risk category corresponding to each test case in the test case set.
The embodiment of the specification provides a risk evaluation device of model, the device includes: the system comprises a use case generation module, a risk evaluation instruction set and a risk evaluation module, wherein the use case generation module receives a risk evaluation instruction set aiming at a large generation model, and each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the large generation model; and acquiring a generation control condition set for performing risk evaluation on the generation type large model based on the set of the requirement information corresponding to the risk evaluation instruction set, and generating a test case set for performing risk evaluation on the generation type large model based on the generation control condition set. And the API reasoning module is used for providing the test case set for the large generation model through an API interface and acquiring an output result set of the large generation model through the API interface. And the risk identification module is used for determining a risk test result of the generated large model based on the output result set and the risk category corresponding to each test case in the test case set.
The embodiment of the specification provides a risk evaluation device of model, the risk evaluation device of model includes: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to: and receiving a risk evaluation instruction set aiming at the generated large model, wherein each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the generated large model. And acquiring a generation control condition set for performing risk evaluation on the generation type large model based on the set of the requirement information corresponding to the risk evaluation instruction set, and generating a test case set for performing risk evaluation on the generation type large model based on the generation control condition set. And providing the test case set for the generated large model, and acquiring an output result set of the generated large model. And determining a risk test result of the generated large model based on the output result set and the risk category corresponding to each test case in the test case set.
The present description also provides a storage medium for storing computer-executable instructions that when executed by a processor implement the following: and receiving a risk evaluation instruction set aiming at the generated large model, wherein each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the generated large model. And acquiring a generation control condition set for performing risk evaluation on the generation type large model based on the set of the requirement information corresponding to the risk evaluation instruction set, and generating a test case set for performing risk evaluation on the generation type large model based on the generation control condition set. And providing the test case set for the generated large model, and acquiring an output result set of the generated large model. And determining a risk test result of the generated large model based on the output result set and the risk category corresponding to each test case in the test case set.
Drawings
For a clearer description of embodiments of the present description or of the solutions of the prior art, the drawings that are required to be used in the description of the embodiments or of the prior art will be briefly described, it being obvious that the drawings in the description below are only some of the embodiments described in the description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art;
FIG. 1 is a schematic diagram of an exemplary embodiment of a risk assessment method for a model of the present disclosure;
FIG. 2A is a schematic diagram of a risk assessment page of a model of the present specification;
FIG. 2B is a schematic structural diagram of a model risk assessment system according to the present disclosure;
FIG. 3 is an embodiment of a risk assessment method for another model of the present disclosure;
FIG. 4 is a schematic diagram of a test case generation process according to the present disclosure;
FIG. 5 is a schematic diagram of a risk identification process according to the present disclosure;
FIG. 6 is a model risk assessment device embodiment of the present disclosure;
fig. 7 is a model risk assessment device embodiment of the present specification.
Detailed Description
The embodiment of the specification provides a risk evaluation method, device and equipment for a model.
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The embodiment of the specification provides a large model safety evaluation mechanism based on dynamic data, in recent years, along with technological breakthroughs, a generated large model (such as a ChatGPT model and the like) is rapidly developed, and the large model is applied to a plurality of scenes (such as conversations, text generation, image generation and the like), so that great convenience and benefit are brought to human beings. Meanwhile, the generated large model also has the phenomena of facts errors, knowledge blind areas, common sense deviations and the like, and risks of training data source compliance, data use prejudice, generated content safety and the like, and the phenomena and risks can lead users to receive improper information and even influence the users to generate harmful behaviors, so that the development and the application of the generated large model are limited. In order to improve accuracy and reliability of the generated large model and enable content generated by the generated large model to be safe, reliable and reliable, the embodiment of the specification provides a safety evaluation mechanism, so that comprehensive, fair and public safety evaluation is carried out on the generated large model through unified standards and specifications and by using effective methods and tools. Specific processing can be seen from the details in the following examples.
Example 1
As shown in fig. 1, the embodiment of the present disclosure provides a risk evaluation method of a model (especially, relates to a security risk evaluation method of an AIGC model), and the execution subject of the method may be a terminal device or a server, where the terminal device may be a mobile terminal device such as a mobile phone, a tablet computer, a computer device such as a notebook computer or a desktop computer, or may be an IoT device (specifically, a smart watch, a vehicle-mounted device, etc.), and the server may be an independent server, or may be a server cluster formed by a plurality of servers, and the server may be a background server such as a financial service or a network shopping service, or may be a background server of an application program. In this embodiment, the execution subject is taken as a server for example for detailed description, and for the case that the execution subject is a terminal device, the following processing of the case of the server may be referred to, and will not be described herein. The method specifically comprises the following steps:
in step S102, a risk assessment instruction set for the generative large model is received, where each risk assessment instruction in the risk assessment instruction set includes requirement information for risk assessment of the generative large model.
The generation type large model can be based on an artificial intelligence mode of generating an countermeasure network, a large pre-training model and the like, generates a large model of related content with proper generalization capability through learning and recognition of existing data, can generate content with certain creative and quality (namely, content which can meet user requirements and has a novel degree higher than a preset threshold) through an artificial intelligence algorithm, can generate content matched with the generated large model (or related to the generated large model) according to input conditions or guidance through learning of the training model and a large amount of data, and can generate text, images, audios and the like matched with the generated large model through input keywords, descriptions or samples. The large generation model may include various types, for example, a large generation model based on a transducer, a large generation model based on a recurrent neural network, and the like, and specifically may be set according to the actual situation, which is not limited in the embodiment of the present specification. The large model may be a model in which the model parameters exceed a preset number threshold and/or the complexity of the model structure exceeds a preset complexity threshold. The risk evaluation may be to evaluate whether the generated large model has a specified security risk, where the risk may include, for example, a risk of disclosure of private data, a risk of mental health, a risk of discrimination, etc., which may be specifically set according to an actual situation, and the embodiment of the present disclosure does not limit this. The requirement information may be related information provided by the evaluation requirement party of the generative large model for performing risk evaluation on the generative large model, and may include one or more of evaluating whether the generative large model has a risk of private data leakage, evaluating whether the generative large model has a risk of discrimination, evaluating whether the generative large model has a risk of mental health, and the like, which may be specifically set according to practical situations, and the embodiment of the present specification is not limited.
In implementation, as shown in fig. 2A, when risk assessment needs to be performed on the generative large model, the assessment requirement party may send a requirement for risk assessment on the generative large model to the assessment party, and the assessment party may initiate a risk assessment instruction for the generative large model through an entry (such as a hyperlink or a key) provided in the risk assessment system according to the requirement of the assessment requirement party, where the risk assessment instruction for the generative large model may be received by the risk assessment system, and the risk assessment instruction may include requirement information provided by the assessment requirement party for risk assessment on the generative large model. Or, as shown in fig. 2B and fig. 2A, the corresponding risk evaluation page may be obtained through an entry (such as a hyperlink or a key) provided by the risk evaluation system provided by an application program installed in a device used by the evaluation demander, a requirement information input box for risk evaluation and a determination key may be included in the risk evaluation page, the evaluation demander may input, according to actual requirements, requirement information for performing risk evaluation on the generated large model in the requirement information input box for risk evaluation in the risk evaluation page, and after the input is completed, the determination key in the risk evaluation page may be clicked, at this time, the risk evaluation system may generate a risk evaluation instruction based on the requirement information for performing risk evaluation on the generated large model input in the requirement information input box for risk evaluation, and may obtain the risk evaluation instruction. By the method, the risk evaluation instruction set aiming at the generated large model can be obtained.
In practical application, each risk evaluation instruction may include, in addition to the above-mentioned requirement information, related information of the generative large model, specifically, an identifier of the generative large model, a model parameter of the generative large model, and the like, and may further include related information of the evaluation requirement party, specifically, an identifier of the evaluation requirement party, device information of a device used by the evaluation requirement party, and the like. In addition, considering that the development of the current large model is in the burst period, many post-organization structures are developing their large models, such as ChatGPT, chatGLM, llaMa, which puts forward a higher requirement on the access standardization of the risk evaluation platform, for this purpose, the evaluating demand party needs to set a corresponding interface according to the access requirement provided by the evaluating party, and the corresponding information can be transmitted through the set interface, which can be specifically set according to the actual situation, and the embodiment of the present specification does not limit the present specification.
In step S104, a set of generation control conditions for performing risk evaluation on the large generation model is obtained based on the set of demand information corresponding to the risk evaluation instruction set, and a set of test cases for performing risk evaluation on the large generation model is generated based on the set of generation control conditions.
The set of generation control conditions may be a set of control conditions for generating a test case for risk evaluation of the large generation model, each generation control condition in the set of generation control conditions may include a plurality of types, for example, the generation control conditions may be constructed by one or more specified keywords, or the generation control conditions may be constructed by one or more sentences, or the like, by which the test case matched with the generation control conditions may be controlled and generated, specifically, for example, the generation control conditions include a keyword a and a keyword B, and then the test case including the keyword a and the keyword B may be controlled and generated, specifically, may be set according to the actual situation, which is not limited by the embodiment of the present specification. The test case set may be a set of related descriptions for performing a test task on a specific item, each test case in the test case set may embody a test scheme, a method, a technology, a policy, and the like, and its content may include a plurality of types, for example, its content may include a test target, a test environment, input data, a test step, an expected result, a test script, and the like, through which a document or a test program may be finally formed, that is, the test case may be a set of information such as test input, execution conditions, and an expected result that is compiled for a certain target to be tested, and is used to verify whether the document or the test program that meets the requirement of the target to be tested is satisfied.
In implementation, when the risk evaluation instruction set is obtained in the above manner, the risk evaluation instruction in the risk evaluation instruction set may be analyzed, and the requirement information of the evaluation requirement party contained in the risk evaluation instruction set may be extracted. The evaluation party may construct a generation control condition for performing risk evaluation on the large generation model according to the requirement information, for example, the requirement information may be whether the large generation model is tested to have risk in terms of data security or whether the large generation model is tested to have risk in terms of personal privacy in terms of data security, etc., and then the corresponding generation control condition may be constructed, where the generation control condition includes keywords such as a user name, account information, identity credentials, a mobile phone number, an address, etc., or may include information such as an attack manipulation, etc., so as to obtain a generation control condition set, which may be specifically set according to actual situations.
The AIGC (Artificial Intelligence Generated Content, generating artificial intelligence) safety scene and attack mode have the characteristics of diversity and complexity, the current static test data set may not cover all cases, loopholes or blind spots are difficult to avoid, so that the generating large model has some content output with potential safety hazards, and therefore the capability of continuously generating the data set is required to realize continuous risk discovery.
By the method, the data set of the test case can be continuously generated, so that the test case is continuously and dynamically generated, and continuous risk discovery of the AIGC model can be realized.
In step S106, the test case set is provided to the generative large model, and an output result set of the generative large model is obtained.
In implementation, since the generative large model is often a model of an external user, in order to protect the generative large model from leakage, each test case may be provided to the generative large model, and the test requirement party may input the test case into the generative large model, or the generative large model may autonomously input the test case into the generative large model, and the generative large model may obtain a corresponding output result, thereby obtaining an output result set of the generative large model.
In the process of interaction between the evaluation party and the evaluation demand party, besides providing the test case set to the generation type large model, other information may be interacted, for example, configuration information such as model parameters of the generation type large model, device information of the device used by the evaluation demand party, etc., which may be specifically set according to actual conditions, and the embodiment of the present specification is not limited thereto.
In step S108, a risk evaluation result of the large model is determined based on the output result set and the risk category corresponding to each test case in the test case set.
The risk types corresponding to the test cases may include various types, for example, may include personal privacy data disclosure risk, discrimination risk, and the like, and may be specifically set according to actual situations, which is not limited in the embodiments of the present disclosure.
In implementation, the output result set may be analyzed, matching analysis may be performed based on the obtained analysis result and the risk category corresponding to each test case, and finally, whether the generated large model has risks matched with the risk category or probabilities of different risks may be determined, so as to obtain a corresponding risk evaluation result.
It should be noted that, the above-mentioned obtained generation control conditions may include one, based on the generation control conditions, one matched test case may be generated, or a plurality of different test cases may be generated, and risk evaluation may be performed on the generated large model by using the plurality of different test cases, specifically, reference may be made to the processing of step S106 to step S108, and then the output result of the generated large model obtained by each test case may be synthesized, and the risk evaluation result of the generated large model may be finally determined by combining the risk categories corresponding to the test cases.
In addition, in the embodiment of the present specification, the risk evaluation is performed for the large generative model, but in practical application, the risk evaluation may be performed for other generative models other than the large generative model by the processing of step S102 to step S108 described above, that is, the processing of step S102 to step S108 in the embodiment of the present specification may be applied to the generative model (i.e., the AIGC model).
According to the risk evaluation method of the model, through receiving a risk evaluation instruction set aiming at the large generative model, each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the large generative model, then, based on a set of the requirement information corresponding to the risk evaluation instruction set, a generation control condition set for performing risk evaluation on the large generative model is obtained, a test case set for performing risk evaluation on the large generative model is generated based on the generation control condition set, then, the test case set can be provided for the large generative model, an output result set of the large generative model is obtained, finally, a risk test result of the large generative model can be determined based on the output result set and a risk category corresponding to each test case in the test case set, so that more test cases for performing risk evaluation on the large generative model can be continuously obtained through automatically generating the test case set of the large generative model, a data set base of the test cases is continuously expanded, potential basis for the large generative model is found, the large generative model is directly provided, the large generative model is directly obtained, the large generative model is output, the reliability of the large generative model is improved, the large generative model is reliably obtained, the quality of the large generative model is guaranteed, the large generative model is reliably tested through the large model is reliably obtained, and the high-level test result of the large generative model is reliably and the risk test case of the large model is reliably obtained, and the risk test case is reliably and the risk is generated, and the risk is evaluated on the large model is generated.
Example two
As shown in fig. 3, the embodiment of the present disclosure provides a risk evaluation method of a model, where an execution subject of the method may be a terminal device or a server, where the terminal device may be a mobile terminal device such as a mobile phone, a tablet computer, or a computer device such as a notebook computer or a desktop computer, or may also be an IoT device (specifically, such as a smart watch, an in-vehicle device, or the like), and where the server may be a separate server, or may be a server cluster formed by a plurality of servers, and the server may be a background server such as a financial service or an online shopping service, or may be a background server of an application program. In this embodiment, the execution subject is taken as a server for example for detailed description, and for the case that the execution subject is a terminal device, the following processing of the case of the server may be referred to, and will not be described herein. The method specifically comprises the following steps:
in step S302, a risk assessment instruction set for the generative large model is received, where each risk assessment instruction in the risk assessment instruction set includes requirement information for risk assessment of the generative large model.
In step S304, a set of generation control conditions for performing risk evaluation on the generation type large model is obtained based on the set of the demand information corresponding to the risk evaluation instruction set.
The generation control condition can be constructed by one or more of keywords, topics and instruction attack methods. The instruction attack method may be a method of attacking an operation instruction or the like of the large generative model, or may be a method of executing an instruction related to risk evaluation of the large generative model, and specifically may be set according to the actual situation, which is not limited in the embodiment of the present specification. The generation control conditions may be only keywords, such as "enterprise", "country", etc., or may be only instruction attack methods, such as "Trojan program", "password cracking", etc., or may be two or three of keywords, subjects, and instruction attack methods.
In implementation, if a certain generation control condition is constructed by a keyword, a theme, and an instruction attack method, the generation control condition may include a condition of information such as the keyword, the theme, and the instruction attack method.
In step S306, a context information set satisfying the generation control conditions is generated by the context processor based on the above-described generation control condition set, and each generation control condition in the generation control condition set includes at least guidance information for guiding and generating a test case capable of evaluating that the generation large model is at risk.
The context processor may be configured to generate context information matched with each generated control condition, input data of the context processor may generate the control conditions (i.e. one or more of keywords, topics and instruction attack methods), output data may be a piece of context information, and the context processor may be configured by using a plurality of different algorithms, for example, the context processor may be configured by using a neural network algorithm, may also be configured by using BERT, etc., and may be specifically set according to practical situations. The guidance information may be constructed based on one or more of the keywords, topics, and instruction attack techniques described above.
In an implementation, as shown in fig. 4, in order to more comprehensively evaluate the risk that the large model is generated and find the potential risk and limitation of the large model, guiding information that can evaluate the test case that the large model is generated and has the risk may be set in the generation control conditions, so as to generate corresponding test cases (at this time, the generated test case and the test case that is generated normally (e.g., the test case that is not generated by the guiding information) have higher resistance (i.e., the resistance between the two is higher than the preset resistance threshold), where the resistance may be that the similarity between the test case that is generated at this time and the test case that is generated normally is higher than the preset threshold, but when the generated test case is identified by the specified model, the generated test case may be identified as a different type (or an identification error)), and further, it may be found that more potential cases that may exist in the large model are generated, in the generation control conditions, in combination with the content that is included in each generation control condition set (e.g., the test case that is not generated by the guiding information), the context may be that satisfies the context, and the risk set may be generated by using the specific information that is generated by the guiding information that is generated by using the high-level information that is generated by the guiding information.
In step S308, a corresponding test case set is generated by a pre-trained problem generation model for generating test cases matching the generation control conditions based on the context information set.
The problem generating model may be a model constructed by a preset algorithm, where the preset algorithm may include a plurality of algorithms, for example, a neural network algorithm, a BERT algorithm, and the like, and may be specifically set according to actual situations, which is not limited in the embodiment of the present disclosure. The generated test cases have strong correlation with the key words, the topics and the instruction attack methods in the generation control conditions.
In implementation, a corresponding algorithm may be obtained and an architecture of the problem-generating model may be constructed based on the algorithm, e.g., an architecture of the problem-generating model may be constructed based on a neural network algorithm. The input data of the problem generating model can be a piece of context information, and the output data can be a test case for risk evaluation of the generating large model. Then, a training sample (i.e., a context information sample) for training the problem generation model may be obtained, the corresponding problem generation model may be trained using the training sample, an objective function may be preset during the training process, and parameters in the problem generation model may be optimized based on the objective function, to finally obtain a trained problem generation model.
As shown in fig. 4, each of the obtained context information sets may be input into a trained problem generation model, and test cases matching the generation control conditions may be generated by the trained problem generation model, thereby obtaining a test case set. In the case where a piece of context information satisfying the guidance information is generated from the guidance information included in the generation control condition, a corresponding test case is generated from the trained problem generation model based on the context information, and the risk of the generated large model can be evaluated with a high probability in the test case.
In practical applications, the problem generating model may be constructed by a truublellm model, based on which the truublellm model may be trained and applied in the following manner, namely: obtaining a sample control condition and a target test sample corresponding to the sample control condition, wherein the sample control condition is used for controlling a pre-training large model to generate a condition of a test sample matched with the sample control condition; inputting the sample control conditions and target test samples corresponding to the sample control conditions into a pre-training large model to obtain output results corresponding to the sample control conditions; based on the output result corresponding to the sample control condition and the sample control condition, determining loss information corresponding to the sample control condition through a preset loss function, and adjusting model parameters of the TroubleLLM model based on the loss information so as to perform model training on the TroubleLLM model until the loss function converges, thereby obtaining the trained TroubleLLM model. The preset loss function at least comprises a first sub-loss function constructed based on the resistance among the generated test cases.
In practical application, the preset loss function further comprises a second sub-loss function constructed based on risk inquiry in risk evaluation of the generated large model. In addition, the preset loss function further includes a third sub-loss function for balancing weights of the first sub-loss function and the second sub-loss function. Specifically, the first sub-loss function may be an RQMF loss function, the second sub-loss function may be an SFT loss function, and the third sub-loss function may be a logarithmic loss function.
As shown in fig. 4, the context information obtained above may be input into a trained TroubleLLM model, and a test case matching the generation control condition may be generated by the trained TroubleLLM model.
In addition, in order to more comprehensively evaluate the possible risk of the generated large model and find more potential risks of the generated large model, a new test case (i.e., an countermeasure case) may be generated by guiding the generated test case, and the similarity between the new test case and the generated test case is higher than a preset threshold, but the new test case and the generated test case may be identified as different types when identified by a specified model, based on which the new test case is the countermeasure case of the generated test case, a guiding policy may be preset according to the actual situation, specifically, based on the test case, the countermeasure case corresponding to the test case may be generated by a preset guiding policy. The guidance policy may be a policy for guiding the generation of the countermeasures.
In step S310, the test case set is provided to the generative large model through a preset API interface, and an output result set of the generative large model is obtained through the API interface.
In implementation, considering that the development of the current large model is in an explosion period, many post-organization structures are developing large models of themselves, such as ChatGPT, chatGLM, llaMa, which puts high demands on the access standardization of the risk evaluation platform, and the evaluation party also needs to mask the access efficiency problem caused by the different models, for this purpose, a standardized large model access mechanism needs to be set, through which a corresponding API interface can be set, including setting access parameters (such as model parameters and device parameters) of a unified API interface, and using a unified signature mode, etc. The test cases can be transferred to the large generative model through the API interface, and the output result of the large generative model is obtained from the large generative model through the API interface.
In addition, in the case of the countermeasure case, the set of countermeasure cases may be provided to a large model to be generated, and a target output result set corresponding to the countermeasure case may be obtained.
The set of countermeasures can be transferred to the large model of the generation formula through the API interface, and the set of target output results corresponding to the countermeasures can be obtained from the large model of the generation formula through the API interface.
Based on the above, since a standardized large model access mechanism is set, and a unified signature manner is used in the mechanism, the processing in step S310 may include: the method comprises the steps that a preset first signing key can be used for signing a test case set to obtain a signed test case set, and the signed test case set is provided for a large generation model through an API interface; and obtaining an output result set of the generated large model subjected to signature processing through an API interface, and performing signature verification processing on the output result set subjected to signature processing to obtain the output result set of the generated large model.
Similarly, the signed counterexample can be obtained by signing the counterexample by using a preset second signing key, and the signed counterexample is provided to the generated large model through the API interface; the target output result corresponding to the counterexample after signature processing is obtained through the API interface, and the target output result after signature processing is subjected to signature verification processing to obtain the target output result corresponding to the counterexample.
Based on the above, because a standardized large model access mechanism is set, the mechanism sets access parameters of a unified API interface, based on the access parameters, model parameters of the large generation model and device parameters of the device running the large generation model are acquired through the API interface, and the model parameters, the device parameters and test cases are stored correspondingly to output results of the large generation model. In practical application, the model parameters, the device parameters and the like can also be contained in the risk evaluation instruction, when the test case is provided for the large generation model, the model parameters, the device parameters and the test case in the risk evaluation instruction can be provided for the large generation model through an API interface, the device for running the large generation model can be matched with the model parameters of the current large generation model and the device parameters of the current device based on the received model parameters and the device parameters, if the received model parameters and the device parameters are matched with the model parameters of the current large generation model, the test case is input into the large generation model, a corresponding output result is obtained, and if the received model parameters and the device parameters are not matched with the model parameters of the current large generation model, the test case is refused to be input into the large generation model.
In step S312, based on the risk category corresponding to each test case in the test case set, a first risk identification policy corresponding to each risk category is obtained from the pre-stored risk identification policies.
In implementation, as shown in fig. 5, the risk identification policy may be preset according to the actual situation, where the preset risk identification policy may be a risk identification policy in a preset policy package, or may be a specific risk identification policy set for different risks or risk categories. The risk category corresponding to the test case can be obtained, and the first risk identification policy corresponding to the risk category can be obtained from the pre-stored risk identification policies (including the risk identification policies and/or the special risk identification policies in the policy package).
The risk categories may be set, for example, by setting three primary risk categories, ten risk categories (or risk sub-categories), as shown in table 1.
TABLE 1
The setting of the risk category is only an optional setting manner, and in practical application, a first-level risk category and a risk category (or a risk sub category) with finer granularity may also be set according to practical situations, which is not limited in the embodiment of the present specification.
In step S314, the output result set is identified based on the obtained first risk identification policy, so as to obtain an identification result corresponding to the output result set.
Based on the processing of step S314 described above, the following processing may be performed: and determining a risk test result of the generated large model based on the identification result corresponding to the output result set.
The above-described processing can also be realized by the processing of step S316 and step S318 described below.
In step S316, the output results in each risk category and the output result set are respectively input into one or more different risk recognition models trained in advance, so as to recognize the output result set, and obtain a recognition result corresponding to each risk recognition model.
The risk recognition model may include various types, for example, the risk recognition model may be constructed by a neural network model, a generation type large model based on a transducer, etc., and in practical application, the risk recognition model may be a model constructed based on RoBERTa, or the risk recognition model may be a model constructed based on GLM, or the risk recognition model may be a model constructed based on LLM.
In implementation, as shown in fig. 5, in order to complete the high-quality, multi-scenario-classified, generative large model risk assessment, a risk identification policy, a risk identification model and a manually audited risk assessment scheme may be fused. Different scenarios may require different risk identification strategies and technical means, and it is therefore desirable to be able to provide more targeted and referential guidance for the security optimization of a generative large model by the classification system of table 1 as described above. And respectively inputting each risk category and the corresponding output result in the output result set into one or more different risk recognition models trained in advance to obtain a recognition result corresponding to each risk recognition model. Therefore, based on the risk identification of the generated large model, the cost of manual participation is reduced, and the efficiency of risk identification is remarkably improved.
In step S318, a risk test result of the generative large model is determined based on the identification result corresponding to the output result set and the identification result corresponding to each risk identification model.
In implementation, as shown in fig. 5, if there are multiple scene recognition modes, the risk test result of the large model can be determined by voting with weights (such as W1, W2, and W3 … Wn in fig. 5), specifically, the recognition result corresponding to the output result and the recognition result corresponding to each risk recognition model can be comprehensively analyzed, and the risk test result of the large model can be determined based on the result of the comprehensive analysis.
In step S320, if the risk test result of the large generative model cannot be determined based on the identification result corresponding to the output result set and the identification result corresponding to each risk identification model, the output result set and the test case set are sent to the management terminal.
In implementation, if the output result cannot be identified through the risk identification strategy and the risk identification model, the risk test result of the large model cannot be determined, and at this time, the output result set and the test case set can be sent to a management terminal for qualitative determination through a manual verification mode.
In step S322, a risk test result of the generative large model is acquired from the management terminal.
In the embodiment of the present disclosure, the risk evaluation is performed on the large generative model, but in practical applications, the risk evaluation may be performed on other generative models except for the large generative model through the processing of step S302 to step S322 described above, that is, the processing of step S302 to step S322 in the embodiment of the present disclosure may be applied to the generative model (i.e., the AIGC model).
According to the risk evaluation method of the model, through receiving a risk evaluation instruction set aiming at the large generative model, each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the large generative model, then, based on a set of the requirement information corresponding to the risk evaluation instruction set, a generation control condition set for performing risk evaluation on the large generative model is obtained, a test case set for performing risk evaluation on the large generative model is generated based on the generation control condition set, then, the test case set can be provided for the large generative model, an output result set of the large generative model is obtained, finally, a risk test result of the large generative model can be determined based on the output result set and a risk category corresponding to each test case in the test case set, so that more test cases for performing risk evaluation on the large generative model can be continuously obtained through automatically generating the test case set of the large generative model, a data set base of the test cases is continuously expanded, potential basis for the large generative model is found, the large generative model is directly provided, the large generative model is directly obtained, the large generative model is output, the reliability of the large generative model is improved, the large generative model is reliably obtained, the quality of the large generative model is guaranteed, the large generative model is reliably tested through the large model is reliably obtained, and the high-level test result of the large generative model is reliably and the risk test case of the large model is reliably obtained, and the risk test case is reliably and the risk is generated, and the risk is evaluated on the large model is generated.
In addition, through designing standardized API interfaces and flexible parameter configuration, differences of accessed generation type models are shielded, and through fusing risk identification strategies, risk identification models and manually-checked risk identification mechanisms, not only is risk evaluation of high-quality and multi-scene classification models completed, but also performance of the generation type large models on different risk problems can be evaluated more comprehensively, and potential risks and limitations of the generation type large models are found.
Example III
The risk evaluation method of the model provided in the embodiment of the present disclosure is based on the same thought, and the embodiment of the present disclosure further provides a risk evaluation device of the model, as shown in fig. 6.
The risk evaluation device of the model comprises: a use case generation module 601, an API reasoning module 602 and a risk identification module 603, wherein:
the use case generation module 601 receives a risk evaluation instruction set aiming at a large generation model, wherein each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the large generation model; acquiring a generation control condition set for performing risk evaluation on the generation type large model based on the set of the requirement information corresponding to the risk evaluation instruction set, and generating a test case set for performing risk evaluation on the generation type large model based on the generation control condition set;
The API reasoning module 602 is used for providing the test case set for the large generation model through an API interface and obtaining an output result set of the large generation model through the API interface;
and the risk identification module 603 determines a risk test result of the generated large model based on the output result set and a risk category corresponding to each test case in the test case set.
In the embodiment of the present disclosure, the use case generating module 601 includes:
a context generation unit that generates, by a context processor, a set of context information satisfying the generation control conditions based on the set of generation control conditions, each of the generation control conditions including at least guidance information for guiding and generating a test case that can evaluate that the generation-type large model is at risk;
and a case generation unit configured to generate the test case set by a pre-trained problem generation model for generating a test case matching the generation control condition, based on the context information set.
In the embodiment of the present specification, each of the set of generation control conditions includes one or more of a keyword, a topic, and an instruction attack method, and the problem generation model is constructed by a TroubleLLM model.
In this embodiment of the present disclosure, the API inference module 602 performs signature processing on the test case set using a preset first signing key to obtain a signed test case set, and provides the signed test case set to the generated large model through the API interface;
the API reasoning module 602 obtains the output result set of the large generated model after signature processing through the API interface, and performs signature verification processing on the output result set after signature processing to obtain the output result set of the large generated model.
In this embodiment of the present disclosure, the risk identification module 603 includes:
the strategy acquisition unit is used for acquiring a first risk identification strategy corresponding to each risk category from prestored risk identification strategies based on the risk category corresponding to each test case in the test case set;
and the risk identification unit is used for identifying the output result set based on the acquired first risk identification strategy to obtain an identification result corresponding to the output result set, and determining a risk test result of the generated large model based on the identification result corresponding to the output result set.
In this embodiment of the present disclosure, the risk recognition unit inputs, to one or more different risk recognition models trained in advance, each of the risk categories and corresponding output results in the output result set, so as to recognize the output result set, and obtain a recognition result corresponding to each risk recognition model; and determining a risk test result of the generated large model based on the identification result corresponding to the output result set and the identification result corresponding to each risk identification model.
In this embodiment of the present disclosure, the risk identification module 603 includes:
the information sending unit is used for sending the output result set and the test case set to a management terminal if the risk test result of the generated large model cannot be determined based on the identification result corresponding to the output result set and the identification result corresponding to each risk identification model;
and the result acquisition unit acquires a risk test result of the generated large model from the management terminal.
In this embodiment of the present disclosure, the risk recognition model is a model constructed based on RoBERTa, or the risk recognition model is a model constructed based on GLM, or the risk recognition model is a model constructed based on LLM.
In this embodiment of the present disclosure, the API inference module 602 obtains, through the API interface, a model parameter of the generative large model and a device parameter of a device running the generative large model, and stores the model parameter, the device parameter, and the test case set in correspondence with an output result set of the generative large model.
In this embodiment of the present disclosure, the API inference module 602 provides configuration information required for developing the API interface for a device to be accessed that runs the generative large model, where the configuration information includes one or more of parameter information of the API interface, message structure information of output data through the API interface, message structure information of input data through the API interface, a data batch processing rule, and access rules of a plurality of different generative large models.
In implementation, the API inference module 602 establishes an API inference unified standard, which includes parameter configuration of the generative large model API (i.e. parameter information of the API interface), standardization of messages of input data and output data (i.e. message structure information of output data through the API interface, message structure information of input data through the API interface), and through the configuration information, the risk evaluation device of the model can support multiple large model quick access (i.e. quick access through access rules of multiple different generative large models) and data unified batch processing (i.e. batch processing through data batch processing rules). The API inference module 602 mainly sets the message structure information (including model parameters, device parameters, etc.) of the input data and the output data of the unified API interface by standardizing the mode of large-model API access, and uses the unified data signature mode, so as to mask the access efficiency problem caused by the large-model difference.
In the embodiment of the present disclosure, the use case generating module 601 generates, based on the test use case set, a countermeasure use case set corresponding to the test use case set through a preset guidance policy;
the API reasoning module 602 is used for providing the counterexample set for the generated large model through an API interface and obtaining a target output result set corresponding to the counterexample set through the API interface;
the risk identification module 603 determines a risk test result of the generated large model based on the output result set, a risk category corresponding to each test case in the test case set, the target output result set, and a risk category corresponding to each countermeasure case in the countermeasure case set.
The embodiment of the specification provides a risk evaluation device of a model, through receiving a risk evaluation instruction set aiming at a large generative model, each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the large generative model, then, based on a set of the requirement information corresponding to the risk evaluation instruction set, a generation control condition set for performing risk evaluation on the large generative model is obtained, a test case set for performing risk evaluation on the large generative model is generated based on the generation control condition set, afterwards, the test case set can be provided for the large generative model, and an output result set of the large generative model is obtained, finally, a risk test result of the large generative model can be determined based on the output result set and a risk category corresponding to each test case in the test case set, so that more test cases for performing risk evaluation on the large generative model can be continuously obtained through automatically generating the test case set of the large generative model, a data set library of the test case is continuously expanded, thereby providing a basis for potential generation of the large generative model, and the large generative model is directly obtained, the large generative model can be provided for the large generative model, the reliability of the large generative model is improved, the reliability of the large generative model is ensured, and the risk test result of the large model is determined, and the risk test result of the large generative model is generated by the test case set is determined, and the test case set of the risk test case of the large model is continuously, and the risk test case of the risk evaluation is continuously generated, and more risk evaluation is required by the risk evaluation, and the risk test case is generated by the risk test case is generated.
In addition, through designing standardized API interfaces and flexible parameter configuration, differences of accessed generation type models are shielded, and through fusing risk identification strategies, risk identification models and manually-checked risk identification mechanisms, not only is risk evaluation of high-quality and multi-scene classification models completed, but also performance of the generation type large models on different risk problems can be evaluated more comprehensively, and potential risks and limitations of the generation type large models are found.
Example IV
The risk evaluation device of the model provided in the embodiment of the present disclosure further provides a risk evaluation device of the model based on the same thought, as shown in fig. 7.
The risk assessment device of the model may provide a terminal device or a server or the like for the above embodiments.
The risk assessment device of the model may have a relatively large difference due to different configurations or performances, and may include one or more processors 701 and a memory 702, where the memory 702 may store one or more stored applications or data. Wherein the memory 702 may be transient storage or persistent storage. The application program stored in memory 702 may include one or more modules (not shown in the figures), each of which may include a series of computer-executable instructions in the risk assessment device for the model. Still further, the processor 701 may be configured to communicate with the memory 702 and execute a series of computer executable instructions in the memory 702 on the risk assessment device of the model. The risk assessment device of the model may also include one or more power supplies 703, one or more wired or wireless network interfaces 704, one or more input/output interfaces 705, and one or more keyboards 706.
In particular, in this embodiment, the risk assessment device of the model includes a memory, and one or more programs, where the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions in the risk assessment device of the model, and execution of the one or more programs by the one or more processors includes computer-executable instructions for:
receiving a risk evaluation instruction set aiming at a generated large model, wherein each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the generated large model;
acquiring a generation control condition set for performing risk evaluation on the generation type large model based on the set of the requirement information corresponding to the risk evaluation instruction set, and generating a test case set for performing risk evaluation on the generation type large model based on the generation control condition set;
providing the test case set for the generated large model, and acquiring an output result set of the generated large model;
And determining a risk test result of the generated large model based on the output result set and the risk category corresponding to each test case in the test case set.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the risk assessment device embodiment of the model, since it is substantially similar to the method embodiment, the description is relatively simple, and reference is made to the partial description of the method embodiment for relevant points.
According to the risk evaluation equipment for the model, a risk evaluation instruction set aiming at the large generative model is received, the risk evaluation instruction set comprises requirement information for performing risk evaluation on the large generative model in each risk evaluation instruction, then, based on the set of the requirement information corresponding to the risk evaluation instruction set, a generation control condition set for performing risk evaluation on the large generative model is obtained, a test case set for performing risk evaluation on the large generative model is generated based on the generation control condition set, then, the test case set can be provided for the large generative model, an output result set of the large generative model is obtained, finally, a risk test result of the large generative model can be determined based on the output result set and a risk category corresponding to each test case in the test case set, so that more test cases for performing risk evaluation on the large generative model can be continuously obtained through the test case set of automatically, a data set library of the test cases is continuously expanded, potential basis for the large generative model is found, the large generative model is directly provided, the large generative model is directly obtained, the large generative model is reliably obtained, the high-quality test case type large model is reliably obtained, the high-quality test result is reliably obtained, the large generative model is reliably obtained, and the risk test result of the large generative model is reliably obtained, and the risk test case of the large model is reliably and the risk evaluation is generated, and more than the risk evaluation is generated.
In addition, through designing standardized API interfaces and flexible parameter configuration, differences of accessed generation type models are shielded, and through fusing risk identification strategies, risk identification models and manually-checked risk identification mechanisms, not only is risk evaluation of high-quality and multi-scene classification models completed, but also performance of the generation type large models on different risk problems can be evaluated more comprehensively, and potential risks and limitations of the generation type large models are found.
Example five
Further, based on the method shown in fig. 1 to 5, one or more embodiments of the present disclosure further provide a storage medium, which is used to store computer executable instruction information, and in a specific embodiment, the storage medium may be a U disc, an optical disc, a hard disk, etc., where the computer executable instruction information stored in the storage medium can implement the following flow when executed by a processor:
receiving a risk evaluation instruction set aiming at a generated large model, wherein each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the generated large model;
acquiring a generation control condition set for performing risk evaluation on the generation type large model based on the set of the requirement information corresponding to the risk evaluation instruction set, and generating a test case set for performing risk evaluation on the generation type large model based on the generation control condition set;
Providing the test case set for the generated large model, and acquiring an output result set of the generated large model;
and determining a risk test result of the generated large model based on the output result set and the risk category corresponding to each test case in the test case set.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for one of the above-described storage medium embodiments, since it is substantially similar to the method embodiment, the description is relatively simple, and reference is made to the description of the method embodiment for relevant points.
The embodiment of the specification provides a storage medium, by receiving a risk evaluation instruction set for a large model of a generation type, wherein each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the large model of the generation type, then, based on a set of the requirement information corresponding to the risk evaluation instruction set, acquiring a generation control condition set for performing risk evaluation on the large model of the generation type, generating a test case set for performing risk evaluation on the large model of the generation type based on the generation control condition set, then, providing a test case set to the large model of the generation type and acquiring an output result set of the large model of the generation type, finally, determining a risk test result of the large model of the generation type based on the output result set and a risk category corresponding to each test case in the test case set, through the automatic generation of the test case set of the large generation model, more test cases for risk evaluation of the large generation model can be continuously obtained, and the database of the test cases is continuously expanded, so that a foundation is provided for the subsequent discovery of potential risks of more large generation models.
In addition, through designing standardized API interfaces and flexible parameter configuration, differences of accessed generation type models are shielded, and through fusing risk identification strategies, risk identification models and manually-checked risk identification mechanisms, not only is risk evaluation of high-quality and multi-scene classification models completed, but also performance of the generation type large models on different risk problems can be evaluated more comprehensively, and potential risks and limitations of the generation type large models are found.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 90 s of the 20 th century, improvements to one technology could clearly be distinguished as improvements in hardware (e.g., improvements to circuit structures such as diodes, transistors, switches, etc.) or software (improvements to the process flow). However, with the development of technology, many improvements of the current method flows can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain corresponding hardware circuit structures by programming improved method flows into hardware circuits. Therefore, an improvement of a method flow cannot be said to be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the programming of the device by a user. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate application-specific integrated circuit chips. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented by using "logic compiler" software, which is similar to the software compiler used in program development and writing, and the original code before the compiling is also written in a specific programming language, which is called hardware description language (Hardware Description Language, HDL), but not just one of the hdds, but a plurality of kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), lava, lola, myHDL, PALASM, RHDL (Ruby Hardware Description Language), etc., VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner, for example, the controller may take the form of, for example, a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable logic controllers, and embedded microcontrollers, examples of which include, but are not limited to, the following microcontrollers: ARC625D, atmel AT91SAM, microchip PIC18F26K20, and Silicone Labs C8051F320, the memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller may thus be regarded as a kind of hardware component, and means for performing various functions included therein may also be regarded as structures within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing one or more embodiments of the present description.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable fraud case serial-to-parallel device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable fraud case serial-to-parallel device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present description may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the present disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (15)

1. A method of risk assessment of a model, the method comprising:
receiving a risk evaluation instruction set aiming at a generated large model, wherein each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the generated large model;
acquiring a generation control condition set for performing risk evaluation on the generation type large model based on the set of the requirement information corresponding to the risk evaluation instruction set, and generating a test case set for performing risk evaluation on the generation type large model based on the generation control condition set;
providing the test case set for the generated large model, and acquiring an output result set of the generated large model;
and determining a risk test result of the generated large model based on the output result set and the risk category corresponding to each test case in the test case set.
2. The method of claim 1, the generating a test case set for risk assessment of the generative large model based on the set of generation control conditions, comprising:
generating, by a context processor, a set of context information satisfying the generation control conditions based on the set of generation control conditions, each of the generation control conditions including at least guidance information for guiding and generating a test case capable of evaluating that the generation-type large model is at risk;
And generating the test case set through a pre-trained problem generation model based on the context information set, wherein the problem generation model is used for generating test cases matched with the generation control conditions.
3. The method of claim 2, the method further comprising:
generating a countermeasure case set corresponding to the test case set through a preset guiding strategy based on the test case set;
providing the counterexample set for the generated large model, and acquiring a target output result set corresponding to the counterexample set;
the determining the risk test result of the generated large model based on the output result set and the risk category corresponding to each test case in the test case set includes:
and determining a risk test result of the generated large model based on the output result set, the risk category corresponding to each test case in the test case set, the target output result set and the risk category corresponding to each countermeasure case in the countermeasure case set.
4. A method as claimed in claim 3, each of the set of generation control conditions comprising one or more of a keyword, a topic and a command attack methodology, the problem generation model being constructed from a truublellm model.
5. The method of claim 3, the providing the test case set to the generative large model and obtaining the output result set of the generative large model comprising:
and providing the test case set for the large generation model through a preset API interface, and acquiring an output result set of the large generation model through the API interface.
6. The method of claim 5, wherein providing the test case set to the generative large model through a preset API interface, and obtaining the output result set of the generative large model through the API interface, comprises:
carrying out signature processing on the test case set by using a preset first signing key to obtain a signed test case set, and providing the signed test case set for the generated large model through the API interface;
and acquiring an output result set of the generated large model subjected to signature processing through the API interface, and performing signature verification processing on the output result set subjected to signature processing to obtain the output result set of the generated large model.
7. The method of any of claims 1-6, the determining the risk test result of the generative large model based on the output result set and a risk category corresponding to each test case in the test case set, comprising:
Based on risk categories corresponding to each test case in the test case set, acquiring a first risk identification strategy corresponding to each risk category from prestored risk identification strategies;
and identifying the output result set based on the acquired first risk identification strategy to obtain an identification result corresponding to the output result set, and determining a risk test result of the generated large model based on the identification result corresponding to the output result set.
8. The method of claim 7, the determining the risk test result of the generative large model based on the recognition result corresponding to the output result set, comprising:
respectively inputting each risk category and corresponding output results in the output result set into one or more different pre-trained risk recognition models to recognize the output result set, so as to obtain recognition results corresponding to each risk recognition model;
and determining a risk test result of the generated large model based on the identification result corresponding to the output result set and the identification result corresponding to each risk identification model.
9. The method of claim 8, wherein the determining the risk test result of the generative large model based on the output result set and the risk category corresponding to each test case in the test case set comprises:
If the risk test result of the generated large model cannot be determined based on the identification result corresponding to the output result set and the identification result corresponding to each risk identification model, the output result set and the test case set are sent to a management terminal;
and acquiring a risk test result of the generated large model from the management terminal.
10. The method of claim 8, the risk recognition model is a RoBERTa-based model, or the risk recognition model is a GLM-based model, or the risk recognition model is a LLM-based model.
11. A risk assessment apparatus for a model, the apparatus comprising:
the system comprises a use case generation module, a risk evaluation instruction set and a risk evaluation module, wherein the use case generation module receives a risk evaluation instruction set aiming at a large generation model, and each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the large generation model; acquiring a generation control condition set for performing risk evaluation on the generation type large model based on the set of the requirement information corresponding to the risk evaluation instruction set, and generating a test case set for performing risk evaluation on the generation type large model based on the generation control condition set;
The API reasoning module is used for providing the test case set for the large generation model through an API interface and obtaining an output result set of the large generation model through the API interface;
and the risk identification module is used for determining a risk test result of the generated large model based on the output result set and the risk category corresponding to each test case in the test case set.
12. The apparatus of claim 11, the API inference module to obtain, through the API interface, model parameters of the generative large model and device parameters of a device running the generative large model, and store the model parameters, the device parameters, the test case set in correspondence with an output result set of the generative large model.
13. The apparatus of claim 11, the API inference module to provide configuration information required to develop the API interface for a device to be accessed running the generative large model, the configuration information including one or more of parameter information of the API interface, message structure information of output data through the API interface, message structure information of input data through the API interface, data batch processing rules, and access rules of a plurality of different generative large models.
14. The apparatus of claim 11, the use case generation module to generate a counterexample set corresponding to the test case set through a preset guidance policy based on the test case set;
the API reasoning module is used for providing the counterexample set for the generated large model through an API interface and acquiring a target output result set corresponding to the counterexample set through the API interface;
and the risk identification module is used for determining a risk test result of the generated large model based on the output result set, the risk category corresponding to each test case in the test case set, the target output result set and the risk category corresponding to each countermeasure case in the countermeasure case set.
15. A risk evaluating apparatus of a model, the risk evaluating apparatus of the model comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
receiving a risk evaluation instruction set aiming at a generated large model, wherein each risk evaluation instruction in the risk evaluation instruction set comprises requirement information for performing risk evaluation on the generated large model;
Acquiring a generation control condition set for performing risk evaluation on the generation type large model based on the set of the requirement information corresponding to the risk evaluation instruction set, and generating a test case set for performing risk evaluation on the generation type large model based on the generation control condition set;
providing the test case set for the generated large model, and acquiring an output result set of the generated large model;
and determining a risk test result of the generated large model based on the output result set and the risk category corresponding to each test case in the test case set.
CN202310838388.1A 2023-07-07 2023-07-07 Model risk evaluation method, device and equipment Pending CN116909889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310838388.1A CN116909889A (en) 2023-07-07 2023-07-07 Model risk evaluation method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310838388.1A CN116909889A (en) 2023-07-07 2023-07-07 Model risk evaluation method, device and equipment

Publications (1)

Publication Number Publication Date
CN116909889A true CN116909889A (en) 2023-10-20

Family

ID=88362157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310838388.1A Pending CN116909889A (en) 2023-07-07 2023-07-07 Model risk evaluation method, device and equipment

Country Status (1)

Country Link
CN (1) CN116909889A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117454142A (en) * 2023-12-26 2024-01-26 北京奇虎科技有限公司 Data generation method and device, storage medium and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117454142A (en) * 2023-12-26 2024-01-26 北京奇虎科技有限公司 Data generation method and device, storage medium and electronic equipment
CN117454142B (en) * 2023-12-26 2024-04-16 北京奇虎科技有限公司 Data generation method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
Naeini et al. Privacy expectations and preferences in an {IoT} world
CN107808098B (en) Model safety detection method and device and electronic equipment
US20150012464A1 (en) Systems and Methods for Creating and Implementing an Artificially Intelligent Agent or System
US11580222B2 (en) Automated malware analysis that automatically clusters sandbox reports of similar malware samples
CN110457578B (en) Customer service demand identification method and device
CN115033676B (en) Intention recognition model training and user intention recognition method and device
US10380261B2 (en) Conversational language and informational response systems and methods
CN110008394B (en) Public opinion information identification method, device and equipment
WO2023231785A1 (en) Data processing method, apparatus, and device
CN116909889A (en) Model risk evaluation method, device and equipment
CN112200132A (en) Data processing method, device and equipment based on privacy protection
CN113516480A (en) Payment risk identification method, device and equipment
US20200202068A1 (en) Computing apparatus and information input method of the computing apparatus
EP4060517A1 (en) System and method for designing artificial intelligence (ai) based hierarchical multi-conversation system
CN114896603A (en) Service processing method, device and equipment
CN114880472A (en) Data processing method, device and equipment
KR102102287B1 (en) Method for crowdsourcing data of chat model for chatbot
CN113221717A (en) Model construction method, device and equipment based on privacy protection
CN113992429B (en) Event processing method, device and equipment
CN113792889B (en) Model updating method, device and equipment
CN115204395A (en) Data processing method, device and equipment
Zhang et al. Focus on the action: Learning to highlight and summarize jointly for email to-do items summarization
CN110942306A (en) Data processing method and device and electronic equipment
CN117076650B (en) Intelligent dialogue method, device, medium and equipment based on large language model
Breve et al. A BERT-based Model for Semantic Consistency Checking of Automation Rules

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination