CN118194358A - Data security risk assessment and management system based on large language model


Publication number
CN118194358A
Authority
CN
China
Prior art keywords
risk
coefficient
data
feature
combined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410605760.9A
Other languages
Chinese (zh)
Other versions
CN118194358B (en)
Inventor
高翔
叶吴彬
吴慧明
邓宙锦
孙义辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Zhongxin Wang 'an Information Technology Co ltd
Original Assignee
Fujian Zhongxin Wang 'an Information Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Abstract

The application relates to the field of data security technology, and in particular to a data security risk assessment and management system based on a large language model. The method comprises: obtaining data feature types and forming a data feature set; randomly combining the data feature types in the data feature set to determine combined feature sets; defining combined effective sets and determining the number of combined features of each; selecting the largest number of combined features, determining a reference risk coefficient from the corresponding combined effective set, and determining the distinct feature types by comparing that combined effective set with the data feature set; determining risk change coefficients from all the distinct feature types, averaging them to determine a mean change coefficient, and determining a comprehensive risk coefficient from the mean change coefficient and the reference risk coefficient; and determining a risk management scheme from the comprehensive risk coefficient and processing the current data according to that scheme. The application facilitates security risk assessment of data types that are not in the large language model.

Description

Data security risk assessment and management system based on large language model
Technical Field
The application relates to the field of data security technology, and in particular to a data security risk assessment and management system based on a large language model.
Background
A large language model is a deep learning model capable of performing a variety of natural language processing tasks. Because it can effectively identify and respond to data security risks such as unfairness, abuse, misleading content, harmful suggestions, and privacy disclosure, it is widely used for data security assessment within enterprises.
In the related art, when a large language model is used to assess data security risk, staff input a large volume of text data selected according to their own business characteristics and security requirements in order to build the large language model, which is then used for subsequent data security risk assessment.
However, because data types are numerous and change quickly, staff cannot train on every type of data when building the large language model. When a data type that is not in the large language model appears, the data security risk cannot be assessed effectively, so there is still room for improvement.
Disclosure of Invention
To facilitate security risk assessment of data types that are not in a large language model, the application provides a data security risk assessment and management system based on a large language model.
In a first aspect, the present application provides a data security risk assessment and management method based on a large language model, which adopts the following technical solution:
A data security risk assessment and management method based on a large language model comprises the following steps:
acquiring the data feature types of the data to be detected;
summarizing the data feature types to form a data feature set;
when the registration sets in the preset large language model do not include the data feature set, randomly combining the data feature types in the data feature set to determine combined feature sets;
defining each combined feature set that is consistent with a registration set as a combined effective set, and counting the data feature types in each combined effective set to determine its number of combined features;
determining the largest number of combined features according to a preset ordering rule, determining a reference risk coefficient from the combined effective set corresponding to that number of combined features, and comparing that combined effective set with the data feature set to determine the distinct feature types;
determining, among the registration sets, the risk change coefficient between each pair of registration sets that differ by all the distinct feature types;
averaging all the risk change coefficients to determine a mean change coefficient, and calculating a comprehensive risk coefficient from the mean change coefficient and the reference risk coefficient;
determining the risk management scheme corresponding to the comprehensive risk coefficient according to a preset risk management matching relation, and processing the current data to be detected according to the risk management scheme.
Optionally, after the number of combined features is determined, the data security risk assessment and management method based on a large language model further comprises:
judging whether at least two combined effective sets share the same, largest number of combined features;
if no two combined effective sets share the largest number of combined features, determining the reference risk coefficient from the unique combined effective set that has the largest number of combined features;
if at least two combined effective sets share the largest number of combined features, determining the distinct feature types for each of those combined effective sets, and forming a distinct set from the distinct feature types;
among the registration sets, defining each registration set that contains the distinct set as a referenceable set, and counting the referenceable sets to determine the referenceable number;
determining the largest referenceable number according to the ordering rule, and determining the reference risk coefficient from the combined effective set corresponding to that referenceable number.
Optionally, after the referenceable number is determined, the data security risk assessment and management method based on a large language model further comprises:
judging whether the largest referenceable number is attained by at least two combined effective sets;
if the largest referenceable number is attained by only one combined effective set, determining the reference risk coefficient from the combined effective set corresponding to that unique referenceable number;
if the largest referenceable number is attained by at least two combined effective sets, calculating a risk deviation coefficient from each risk change coefficient and the mean change coefficient;
determining the compensation quantity corresponding to the risk deviation coefficient according to a preset compensation matching relation, and correcting the referenceable number by the compensation quantity, wherein the compensation matching relation gives the compensation quantity in terms of the risk deviation coefficient and two preset fixed calculation parameters [formula not reproduced in this text].
Optionally, after the risk change coefficients are determined, the data security risk assessment and management method based on a large language model further comprises:
defining the two registration sets used to determine a risk change coefficient as a reference set and an expansion set respectively, and defining the data feature types in the reference set as reference feature types;
defining the combined effective set used to determine the reference risk coefficient as the benchmarking combination, and defining the reference feature types that lie within the benchmarking combination as reliable feature types;
counting the reference feature types to determine a reference number, and counting the reliable feature types to determine a reliable number;
calculating a reliable ratio from the reliable number and the reference number, and rejecting the risk change coefficients determined by any reference set and expansion set whose reliable ratio is smaller than a preset required ratio.
Optionally, after the risk change coefficients are rejected, the data security risk assessment and management method based on a large language model further comprises:
counting the remaining risk change coefficients to determine a coefficient count;
judging whether the coefficient count is greater than a preset reference required number;
if the coefficient count is greater than the reference required number, averaging the remaining risk change coefficients;
if the coefficient count is not greater than the reference required number, defining each reference set corresponding to a rejected risk change coefficient as an alternative set, and defining each alternative set that contains exactly one reliable feature type as a theoretical analysis set;
averaging the risk change coefficients determined by the theoretical analysis sets that share the same reliable feature type to determine an analysis mean coefficient;
calculating a stable difference coefficient from each of those risk change coefficients and the analysis mean coefficient;
defining each reliable feature type whose stable difference coefficient is smaller than a preset standard coefficient as a stable feature type, restoring from the rejected risk change coefficients those determined by reference sets containing a stable feature type, and averaging all the resulting risk change coefficients.
Optionally, after the stable feature types are determined, the data security risk assessment and management method based on a large language model further comprises:
counting, among the rejected risk change coefficients, the reference sets that contain a stable feature type to determine a qualifying number;
calculating the difference between the coefficient count and the reference required number to determine a difference required number;
judging whether the qualifying number is greater than the difference required number;
if the qualifying number is not greater than the difference required number, restoring from the rejected risk change coefficients all those determined by reference sets containing a stable feature type;
if the qualifying number is greater than the difference required number, calculating the difference between each such risk change coefficient and the analysis mean coefficient to determine an excess coefficient;
sorting the excess coefficients in ascending order to form an excess sequence, selecting from the front of the sequence as many excess coefficients as the difference required number, and restoring the corresponding risk change coefficients.
In a second aspect, the present application provides a data security risk assessment and management system based on a large language model, which adopts the following technical solution:
A data security risk assessment and management system based on a large language model comprises:
an acquisition module for acquiring the data feature types of the data to be detected;
a processing module, connected with the acquisition module and the judging module, for storing and processing information;
a judging module, connected with the acquisition module and the processing module, for judging information;
wherein the processing module summarizes the data feature types to form a data feature set;
when the registration sets in the preset large language model do not include the data feature set, the processing module randomly combines the data feature types in the data feature set to determine combined feature sets;
the processing module defines each combined feature set that the judging module judges to be consistent with a registration set as a combined effective set, and counts the data feature types in each combined effective set to determine its number of combined features;
the processing module determines the largest number of combined features according to a preset ordering rule, determines a reference risk coefficient from the combined effective set corresponding to that number of combined features, and compares that combined effective set with the data feature set to determine the distinct feature types;
the processing module determines, among the registration sets, the risk change coefficient between each pair of registration sets that differ by all the distinct feature types;
the processing module averages all the risk change coefficients to determine a mean change coefficient, and calculates a comprehensive risk coefficient from the mean change coefficient and the reference risk coefficient;
the processing module determines the risk management scheme corresponding to the comprehensive risk coefficient according to a preset risk management matching relation, and processes the current data to be detected according to the risk management scheme.
In a third aspect, the present application provides an intelligent terminal, which adopts the following technical solution:
An intelligent terminal comprises a memory and a processor, wherein the memory stores a computer program that can be loaded by the processor to execute any one of the above data security risk assessment and management methods based on a large language model.
In summary, the present application includes at least one of the following beneficial technical effects:
1. when data of a type that has not been learned appears, the original data used to build the large language model can be used to assess the risk of the current data;
2. the risk of the current data is assessed using the original data with the highest relevance, which improves the accuracy of the risk assessment.
Drawings
FIG. 1 is a flowchart of the data security risk assessment and management method based on a large language model.
FIG. 2 is a flowchart of the combined effective set screening method.
FIG. 3 is a flowchart of the referenceable set screening method.
FIG. 4 is a flowchart of the risk change coefficient rejection method.
FIG. 5 is a flowchart of the risk change coefficient recovery method.
FIG. 6 is a flowchart of the risk change coefficient recovery selection method.
FIG. 7 is a block flow diagram of the data security risk assessment and management method based on a large language model.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to FIGS. 1 to 7 and the embodiments. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the scope of the application.
Embodiments of the application are described in further detail below with reference to the drawings.
The embodiment of the application discloses a data security risk assessment and management method based on a large language model. When a data type that has not appeared before is encountered during data security risk assessment, the method selects the data types in the large language model that are most closely associated with the current data type and analyzes them to determine the approximate risk level of the current data type, so that the data can be managed effectively.
Referring to FIG. 1, the data security risk assessment and management method based on a large language model comprises the following steps:
Step S100: acquire the data feature types of the data to be detected.
A data feature type is a characteristic of the data, such as its source, its size, the form of its distribution, or its degree of dispersion; for example, data of different sizes have different data feature types.
Step S101: summarize the data feature types to form a data feature set.
The data feature set is the set of data feature types acquired from a single piece of data.
Step S102: when the registration sets in the preset large language model do not include the data feature set, randomly combine the data feature types in the data feature set to determine combined feature sets.
A registration set is the data feature set of a piece of sample data used to build the large language model. When the registration sets in the large language model do not include the data feature set, the risk level of the current data cannot be obtained directly from the existing data in the large language model, and further analysis is required. A combined feature set is a combination of any number of data feature types selected from the data feature set; for example, if the data feature set is ABC, the combined feature sets are A, B, C, AB, AC, BC, and ABC.
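The ABC example above can be sketched in a few lines of Python. This is only an illustration: the text speaks of random combination, whereas the sketch assumes that every non-empty combination is enumerated, as the example list suggests. The function name `combined_feature_sets` is hypothetical.

```python
from itertools import combinations


def combined_feature_sets(data_feature_set):
    """Return every non-empty combination of the given feature types.

    Assumption: exhaustive enumeration stands in for the "random
    combination" described in the text.
    """
    features = sorted(data_feature_set)
    return [
        frozenset(combo)
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
    ]


combos = combined_feature_sets({"A", "B", "C"})
labels = ["".join(sorted(c)) for c in combos]
print(labels)
```

For a data feature set with n feature types this yields 2^n - 1 combinations, so an implementation handling many feature types would probably need to prune or sample.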
Step S103: a combined feature set that is consistent with the registered set is defined as a combined active set and counted according to the data feature types in the combined active set to determine the combined feature quantity.
The combined effective set can be a combined feature set found from a database of the bottom layer constructed by the large language model, and the number of the combined features is the total number of data feature types in the combined effective set.
Step S104: determining the number of combined features with the largest numerical value according to a preset ordering rule, determining a reference risk coefficient according to a combined effective set corresponding to the number of combined features, and comparing according to the combined effective set and a data feature set to determine different feature types.
The sequencing rule is a method which is set by staff and can sequence the magnitude of the numerical value, such as a bubbling method, and the number of combined features with the largest numerical value can be determined through the sequencing rule, namely the combined effective set is closest to the data feature set of the data to be analyzed currently; taking ABC as an example, if there are A, B and AB combined feature sets in the large language model, the risk level value of the combined feature set is corresponding to the large language model, for example, the risk level value of a is 2%, the risk level value of B is 3%, and the risk level value of AB is 4%; the distinct feature types are the remaining data feature types that are not within the combined active set.
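Steps S103 and S104 can be sketched as follows, using the risk values from the ABC example (in percent). The dictionary `REGISTERED_RISK` and the function `select_reference` are hypothetical names, and the built-in `max` is used in place of an explicit sort such as bubble sort. Ties between equally large combined effective sets are not handled here.

```python
from itertools import combinations

# Registration sets and their stored risk level values, in percent
# (values taken from the ABC example in the text).
REGISTERED_RISK = {
    frozenset("A"): 2.0,
    frozenset("B"): 3.0,
    frozenset("AB"): 4.0,
}


def select_reference(data_feature_set, registered_risk):
    """Pick the largest combined effective set and the distinct feature types."""
    features = sorted(data_feature_set)
    # Combined effective sets: combinations that match a registration set.
    effective = [
        frozenset(combo)
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
        if frozenset(combo) in registered_risk
    ]
    if not effective:
        return None
    best = max(effective, key=len)  # largest number of combined features
    distinct = frozenset(data_feature_set) - best
    return best, registered_risk[best], distinct


best, reference_risk, distinct = select_reference("ABC", REGISTERED_RISK)
print(sorted(best), reference_risk, sorted(distinct))
```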
Step S105: the risk change coefficients between the two registered sets are determined based on all the distinct feature types in the registered sets.
The risk change coefficient is the difference of the risk degrees obtained after different feature types in the registration sets are different, for example, the data feature set is ABCDE, the effective combination set with the largest number of available combination features is ABC, the different feature types are DE, the compared registration sets are for example, A and ADE, F and DEF, and the like, only the two registration sets are required to be different by all different feature types, the risk change coefficient is the difference of the risk change degrees which can occur after all different feature types are added from the original registration set to the other registration set, taking the registration sets of A and ADE as an example, if the risk coefficient value of A stored in a large language model is 2%, the risk coefficient value of ADE is 5%, the corresponding risk change coefficient is 3%, and the risk change coefficient is obtained by subtracting 2%.
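A minimal sketch of step S105 follows. The risk values for A and ADE come from the example; the values for F and DEF are hypothetical and added only so that a second pair exists. The function name `risk_change_coefficients` is also hypothetical, and values are kept in percent so the arithmetic stays exact.

```python
# Stored risk level values in percent. A and ADE follow the example in the
# text; F and DEF are hypothetical values for illustration.
REGISTERED_RISK = {
    frozenset("A"): 2.0,
    frozenset("ADE"): 5.0,
    frozenset("F"): 1.0,
    frozenset("DEF"): 3.0,
}


def risk_change_coefficients(distinct_types, registered_risk):
    """Map each (smaller set, larger set) pair to its risk change coefficient.

    A pair qualifies when the larger registration set equals the smaller
    one plus all the distinct feature types.
    """
    distinct = frozenset(distinct_types)
    changes = {}
    for base, base_risk in registered_risk.items():
        extended = base | distinct
        if base.isdisjoint(distinct) and extended in registered_risk:
            changes[(base, extended)] = registered_risk[extended] - base_risk
    return changes


changes = risk_change_coefficients("DE", REGISTERED_RISK)
for (base, extended), change in changes.items():
    print("".join(sorted(base)), "->", "".join(sorted(extended)), change)
```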
Step S106: average all the risk change coefficients to determine a mean change coefficient, and calculate a comprehensive risk coefficient from the mean change coefficient and the reference risk coefficient.
The mean change coefficient is the average of all the determined risk change coefficients; adding it to the reference risk coefficient gives the comprehensive risk coefficient, which represents the risk level of the current data.
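As a small worked example of step S106, the sketch below adds the mean of the risk change coefficients to the reference risk coefficient. The input values (a reference risk of 4% and changes of 3% and 2%) are hypothetical, and `comprehensive_risk` is an illustrative name.

```python
from statistics import mean


def comprehensive_risk(reference_risk, risk_changes):
    """Reference risk coefficient plus the mean change coefficient."""
    return reference_risk + mean(risk_changes)


# Hypothetical inputs, in percent.
result = comprehensive_risk(4.0, [3.0, 2.0])
print(result)  # 4.0 + (3.0 + 2.0) / 2 = 6.5
```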
Step S107: determine the risk management scheme corresponding to the comprehensive risk coefficient according to a preset risk management matching relation, and process the current data to be detected according to the risk management scheme.
The risk management scheme is a data processing scheme for data with a given comprehensive risk coefficient, such as data security warning, data interception, or data optimization. The risk management matching relation between the two is determined in advance by staff according to their requirements for the data, and the current data to be detected is processed according to the risk management scheme so that the data is managed effectively.
While a computer is in use, external data may be transmitted into the system, and the feature types of that data can be analyzed. If the data is not covered by the original training set, it would otherwise have to be handled without distinction, for example by accepting all of it or rejecting all of it. Accepting everything may let harmful data in and compromise system security, while rejecting everything may block data that is needed and inconvenience users of the system.
Once the comprehensive risk coefficient is introduced, a data risk assessment can be made for any untrained data type, so it is known whether the data can be accepted and how it should be handled after receipt. For example:
when the comprehensive risk coefficient is in the high-risk interval, the corresponding risk management scheme is data interception, so that harmful data cannot enter and compromise the system;
when the comprehensive risk coefficient is in the medium-risk interval, the corresponding risk management scheme is data monitoring, that is, the data may enter the system but is monitored in real time so that it can be optimized promptly if an abnormality occurs;
when the comprehensive risk coefficient is in the low-risk interval, the corresponding risk management scheme is data security warning, that is, only a warning about the data is issued and no other operation is needed;
when the comprehensive risk coefficient is in the risk-free interval, the corresponding risk management scheme is normal reception, that is, the data is received normally.
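The risk management matching relation can be pictured as a lookup from risk intervals to schemes. In the sketch below the interval boundaries (10%, 5%, and 1%) are purely hypothetical, since the text leaves them to be set by staff, and the scheme labels are shorthand for the four schemes described above.

```python
# Hypothetical lower bounds of each risk interval, in percent, checked from
# highest to lowest. The text does not specify these thresholds.
RISK_INTERVALS = [
    (10.0, "intercept"),  # high risk: data interception
    (5.0, "monitor"),     # medium risk: real-time monitoring
    (1.0, "warn"),        # low risk: data security warning
]


def risk_management_scheme(comprehensive_risk_coefficient):
    """Return the scheme whose interval contains the coefficient."""
    for lower_bound, scheme in RISK_INTERVALS:
        if comprehensive_risk_coefficient >= lower_bound:
            return scheme
    return "receive"  # risk-free interval: normal reception


for value in (12.0, 6.5, 2.0, 0.5):
    print(value, risk_management_scheme(value))
```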
Referring to FIG. 2, after the number of combined features is determined, the data security risk assessment and management method based on a large language model further comprises:
Step S200: judge whether at least two combined effective sets share the same, largest number of combined features.
The purpose of this judgment is to learn whether several combined effective sets meet the requirement.
Step S2001: if no two combined effective sets share the largest number of combined features, determine the reference risk coefficient from the unique combined effective set that has the largest number of combined features.
When no two combined effective sets share the largest number of combined features, exactly one combined effective set meets the requirement, and the reference risk coefficient is determined from it.
Step S2002: if at least two combined effective sets share the largest number of combined features, determine the distinct feature types for each of those combined effective sets, and form a distinct set from the distinct feature types.
When at least two combined effective sets share the largest number of combined features, several combined effective sets meet the requirement and further screening is needed. The distinct set is the set formed by all the distinct feature types of a combined effective set.
Step S201: among the registration sets, define each registration set that contains the distinct set as a referenceable set, and count the referenceable sets to determine the referenceable number.
A referenceable set is a registration set that contains the distinct set, and the referenceable number is the total number of referenceable sets.
Step S202: determine the largest referenceable number according to the ordering rule, and determine the reference risk coefficient from the combined effective set corresponding to that referenceable number.
The ordering rule identifies the largest referenceable number, that is, the choice for which the most data is available for obtaining risk change coefficients, which facilitates the subsequent determination of the comprehensive risk coefficient.
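The screening in steps S200 to S202 might look like the sketch below. All data is hypothetical: the data feature set is ABC, the tied combined effective sets are AB and BC, and the registration sets are chosen so that the two distinct sets have different referenceable numbers. The function names are illustrative, and a remaining tie in the referenceable number is not resolved here.

```python
def referenceable_number(data_feature_set, effective_set, registered_sets):
    """Count the registration sets that contain the distinct set."""
    distinct_set = frozenset(data_feature_set) - effective_set
    return sum(1 for registered in registered_sets if distinct_set <= registered)


def break_tie(data_feature_set, tied_effective_sets, registered_sets):
    """Choose the tied combined effective set with the most referenceable sets."""
    return max(
        tied_effective_sets,
        key=lambda s: referenceable_number(data_feature_set, s, registered_sets),
    )


# Hypothetical registration sets.
registered = [frozenset(s) for s in ("AB", "BC", "CD", "CE", "AF")]
tied = [frozenset("AB"), frozenset("BC")]

# Distinct set of AB is {C}: contained in BC, CD, CE.
# Distinct set of BC is {A}: contained in AB, AF.
chosen = break_tie("ABC", tied, registered)
print("".join(sorted(chosen)))
```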
Referring to FIG. 3, after the referenceable number is determined, the data security risk assessment and management method based on a large language model further comprises:
Step S300: judge whether the largest referenceable number is attained by at least two combined effective sets.
The purpose of this judgment is to learn whether several choices meet the requirement.
Step S3001: if the largest referenceable number is attained by only one combined effective set, determine the reference risk coefficient from the combined effective set corresponding to that unique referenceable number.
When the largest referenceable number is attained by only one combined effective set, there is a single choice, and the reference risk coefficient is determined in the normal way from that combined effective set.
Step S3002: if the largest referenceable number is attained by at least two combined effective sets, calculate a risk deviation coefficient from each risk change coefficient and the mean change coefficient.
When the largest referenceable number is attained by at least two combined effective sets, there are several choices and further analysis is required. The risk deviation coefficient is calculated from the mean change coefficient, the n-th risk change coefficient, and a preset fixed calculation parameter [formula not reproduced in this text].
Step S301: determine the compensation quantity corresponding to the risk deviation coefficient according to a preset compensation matching relation, and correct the referenceable number by the compensation quantity, wherein the compensation matching relation gives the compensation quantity in terms of the risk deviation coefficient and two preset fixed calculation parameters [formula not reproduced in this text].
The compensation quantity is a value used to correct the referenceable number. The smaller the risk deviation coefficient, the more stable the influence of the currently determined distinct set, so a larger compensation quantity is determined and added to the referenceable number, which facilitates the subsequent selection of the combined effective set. The compensation matching relation is determined in advance by staff, and its two preset calculation parameters may take the same or different values.
Referring to fig. 4, after the risk change coefficient is determined, the large language model data based security risk assessment and management method further includes:
Step S400: Define the two registration sets used to determine a risk change coefficient as a reference set and an expansion set respectively, and define the data feature types in the reference set as reference feature types.
The reference set is the set that does not contain the difference set, such as A in the above example, and the expansion set is the set that contains the difference set, such as ADE in the above example. The reference feature types are defined so that the different data feature types can be identified for subsequent analysis.
Step S401: Define the combined effective set used to determine the reference risk coefficient as the benchmarking combination, and define the reference feature types that lie within the benchmarking combination as reliable feature types.
The benchmarking combination is, for instance, ABC in the above example. A reliable feature type is a reference feature type that lies within the benchmarking combination, that is, one of A, B and C in the above example. These definitions distinguish the different combined effective sets.
Step S402: Count the reference feature types to determine a reference number, and count the reliable feature types to determine a reliable number.
The reference number is the number of reference feature types in the reference set, and the reliable number is the number of reliable feature types in that reference set.
Step S403: Calculate the reliable ratio from the reliable number and the reference number, and reject the risk change coefficients determined by any reference set and expansion set whose reliable ratio is smaller than the preset required ratio.
The reliable ratio is obtained by dividing the reliable number by the reference number. The required ratio is the minimum reliable ratio, set by the staff, that must be met for the current reference set to be considered sufficiently correlated with the determined benchmarking combination. When the reliable ratio is smaller than the required ratio, the risk change coefficient determined by the current reference set and expansion set may be unable to serve as a reference for the current data, so it is rejected to reduce error.
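As a non-limiting illustration, steps S402 and S403 may be sketched as follows. The ratio is the reliable number divided by the reference number, and a coefficient is rejected only when that ratio is strictly smaller than the required ratio. The function names, the tuple layout, and the single-letter feature labels are hypothetical.

```python
from typing import Iterable, List, Tuple

# (reference set, expansion set, risk change coefficient); layout is an assumption.
Entry = Tuple[str, str, float]


def reliable_ratio(reference_set: Iterable[str], benchmarking_combination: Iterable[str]) -> float:
    """Reliable number divided by reference number (steps S402 and S403)."""
    reference_types = set(reference_set)
    if not reference_types:
        raise ValueError("the reference set must contain at least one feature type")
    reliable_types = reference_types & set(benchmarking_combination)
    return len(reliable_types) / len(reference_types)


def split_by_required_ratio(
    entries: List[Entry],
    benchmarking_combination: Iterable[str],
    required_ratio: float,
) -> Tuple[List[Entry], List[Entry]]:
    """Return (kept, rejected); an entry is rejected when its ratio < required_ratio."""
    benchmark = set(benchmarking_combination)
    kept: List[Entry] = []
    rejected: List[Entry] = []
    for entry in entries:
        if reliable_ratio(entry[0], benchmark) < required_ratio:
            rejected.append(entry)
        else:
            kept.append(entry)
    return kept, rejected


# Illustrative data only: each character stands for one data feature type.
entries = [("A", "ADE", 0.10), ("AD", "ADE", 0.20), ("DE", "DEF", 0.30)]
kept, rejected = split_by_required_ratio(entries, "ABC", required_ratio=0.5)
```

The rejected entries are retained rather than discarded in this sketch, because the later steps may recover some of them.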
Referring to fig. 5, after the risk change coefficients are rejected, the large language model data based security risk assessment and management method further includes:
Step S500: Count the remaining risk change coefficients to determine the coefficient statistical quantity.
The coefficient statistical quantity is the total number of risk change coefficients that currently remain.
Step S501: Judge whether the coefficient statistical quantity is larger than the preset reference demand quantity.
The reference demand quantity is the minimum coefficient statistical quantity, set by the staff, at which a single unusually large or small risk change coefficient no longer has a large effect on the resulting mean change coefficient. The purpose of the judgment is to know whether the mean change coefficient can currently be calculated accurately.
Step S5011: If the coefficient statistical quantity is greater than the reference demand quantity, perform the mean value calculation on the risk change coefficients.
When the coefficient statistical quantity is greater than the reference demand quantity, an accurate and reasonable mean change coefficient can be calculated, and the mean value calculation is performed normally.
Step S5012: If the coefficient statistical quantity is not greater than the reference demand quantity, define the reference sets corresponding to the rejected risk change coefficients as alternative sets, and define each alternative set that has only one reliable feature type as a theoretical analysis set.
When the coefficient statistical quantity is not greater than the reference demand quantity, the risk change coefficients currently available are limited, and a single unusually large or small risk change coefficient may strongly affect the mean change coefficient, so further analysis is needed. The alternative sets are defined to distinguish the different reference sets, and the theoretical analysis sets are defined to analyze the influence that an individual data feature type can have.
Step S502: Perform a mean value calculation on the risk change coefficients determined by the theoretical analysis sets corresponding to the same reliable feature type to determine an analysis mean coefficient.
The analysis mean coefficient is the average of all risk change coefficients determined by the theoretical analysis sets corresponding to the same reliable feature type.
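As a non-limiting illustration, the check of step S501 and the grouping of steps S5012 and S502 may be sketched as follows. A rejected reference set is treated as a theoretical analysis set when exactly one of its feature types lies in the benchmarking combination, and the coefficients are averaged per such feature type. The function names and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_is_reliable(coefficient_count: int, reference_demand: int) -> bool:
    """Step S501: the mean is computed directly only if the count exceeds the demand."""
    return coefficient_count > reference_demand


def analysis_mean_coefficients(
    rejected: List[Tuple[str, float]],
    benchmarking_combination: Iterable[str],
) -> Dict[str, float]:
    """Steps S5012 and S502.

    rejected holds (reference set, risk change coefficient) pairs for the
    rejected coefficients; each character of a reference set stands for one
    data feature type (an assumption of this sketch).
    """
    benchmark = set(benchmarking_combination)
    grouped: Dict[str, List[float]] = defaultdict(list)
    for reference_set, coefficient in rejected:
        reliable_types = set(reference_set) & benchmark
        # A theoretical analysis set has exactly one reliable feature type.
        if len(reliable_types) == 1:
            grouped[next(iter(reliable_types))].append(coefficient)
    return {feature: mean(values) for feature, values in grouped.items()}


# Illustrative data only.
rejected = [("ADE", 0.2), ("AFG", 0.4), ("BDEF", 0.1), ("DE", 0.9), ("ABDEF", 0.3)]
analysis_means = analysis_mean_coefficients(rejected, "ABC")
```

With this illustrative input only the feature types A and B receive an analysis mean coefficient, since DE contains no reliable feature type and ABDEF contains two.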
Step S503: Calculate the stable difference coefficient from each risk change coefficient and the analysis mean coefficient.
The stable difference coefficient is a value reflecting whether the risk change coefficients are stable. The smaller the value, the more stable the corresponding risk change coefficients, and the more stable the influence of the corresponding reliable feature type on the data. The stable difference coefficient is calculated in the same way as the risk deviation coefficient, which is not repeated here.
Step S504: Define each reliable feature type whose stable difference coefficient is smaller than the preset standard coefficient as a stable feature type, recover from the rejected risk change coefficients those determined by reference sets containing a stable feature type, and perform the mean value calculation on all risk change coefficients.
The standard coefficient is the maximum stable difference coefficient, set by the staff, that must be met for the influence of a reliable feature type on the data to be considered stable. The stable feature types are defined to distinguish the different reliable feature types, and the risk change coefficients determined by the corresponding reference sets are recovered so that the mean change coefficient can be calculated.
Referring to fig. 6, after the stable feature type is determined, the large language model data based security risk assessment and management method further includes:
Step S600: Count the rejected risk change coefficients whose reference set contains a stable feature type to determine the requirement-compliant quantity.
The requirement-compliant quantity is the total number of risk change coefficients that can be recovered.
Step S601: Perform a difference calculation on the coefficient statistical quantity and the reference demand quantity to determine the difference demand quantity.
The difference demand quantity is the minimum number of risk change coefficients currently lacking, obtained by subtracting the coefficient statistical quantity from the reference demand quantity.
Step S602: Judge whether the requirement-compliant quantity is larger than the difference demand quantity.
The purpose of the judgment is to know whether more risk change coefficients are recoverable than are needed.
Step S6021: If the requirement-compliant quantity is not greater than the difference demand quantity, recover from the rejected risk change coefficients all those determined by reference sets containing a stable feature type.
When the requirement-compliant quantity is not greater than the difference demand quantity, all recoverable risk change coefficients are recovered normally.
Step S6022: If the requirement-compliant quantity is larger than the difference demand quantity, perform a difference calculation on each risk change coefficient and the analysis mean coefficient to determine an excess coefficient.
When the requirement-compliant quantity is larger than the difference demand quantity, more risk change coefficients are recoverable than are needed, so they can be screened. The excess coefficient is the absolute value of the difference between a risk change coefficient and the corresponding analysis mean coefficient.
Step S603: Sort the excess coefficients from small to large to determine an excess sequence, and select from the front of the excess sequence a number of excess coefficients equal to the difference demand quantity, recovering the corresponding risk change coefficients.
The excess sequence is the sequence of excess coefficients sorted from small to large. The smaller the excess coefficient, the more stable the corresponding risk change coefficient, so that risk change coefficient is recovered. This avoids recovering risk change coefficients whose values are too large or too small, keeps the calculated mean change coefficient accurate, and facilitates the subsequent determination of an accurate comprehensive risk coefficient.
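As a non-limiting illustration, steps S600 to S603 may be sketched as follows. The shortfall is the reference demand quantity minus the current coefficient count, and when more coefficients are recoverable than the shortfall, those closest to their analysis mean coefficient are recovered first. The function name and the pair layout are hypothetical.

```python
from typing import List, Tuple


def select_recoveries(
    candidates: List[Tuple[float, float]],
    coefficient_count: int,
    reference_demand: int,
) -> List[float]:
    """Choose which rejected risk change coefficients to recover.

    candidates holds (risk change coefficient, analysis mean coefficient)
    pairs for rejected coefficients whose reference set contains a stable
    feature type.
    """
    compliant = len(candidates)                        # step S600
    shortfall = reference_demand - coefficient_count   # step S601

    # Step S6021: nothing to screen, recover every candidate.
    if compliant <= shortfall:
        return [coefficient for coefficient, _ in candidates]

    # Steps S6022 and S603: rank by the excess coefficient |coefficient - mean|
    # from small to large and keep only as many as are lacking.
    ranked = sorted(candidates, key=lambda pair: abs(pair[0] - pair[1]))
    return [coefficient for coefficient, _ in ranked[:max(shortfall, 0)]]


# Illustrative data only: four recoverable coefficients, two are lacking.
candidates = [(0.90, 0.5), (0.52, 0.5), (0.30, 0.5), (0.45, 0.5)]
recovered = select_recoveries(candidates, coefficient_count=3, reference_demand=5)
```

In the illustrative call the coefficients 0.52 and 0.45 are recovered, because their distances from the analysis mean coefficient are the two smallest.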
Referring to fig. 7, based on the same inventive concept, an embodiment of the present invention provides a security risk assessment and management system based on large language model data, including:
The acquisition module is used for acquiring the data characteristic type of the requirement detection data;
The processing module is connected with the acquisition module and the judging module and is used for storing and processing information;
the judging module is connected with the acquisition module and the processing module and is used for judging information;
the processing module summarizes the data feature types to form a data feature set;
the processing module randomly combines the data feature types in the data feature set when the registration set in the preset large language model does not contain the data feature set so as to determine a combined feature set;
The processing module defines the combination feature set which is judged by the judging module to be consistent with the registration set as a combination effective set, and counts according to the data feature types in the combination effective set to determine the number of the combination features;
the processing module determines the combination feature quantity with the largest numerical value according to a preset ordering rule, determines a reference risk coefficient according to a combination effective set corresponding to the combination feature quantity, and compares the combination effective set with the data feature set to determine different feature types;
The processing module determines risk change coefficients between the two registration sets according to all different feature types in the registration sets;
The processing module carries out mean value calculation according to all risk change coefficients to determine a mean value change coefficient, and carries out calculation according to the mean value change coefficient and a reference risk coefficient to determine a comprehensive risk coefficient;
The processing module determines a risk management scheme corresponding to the comprehensive risk coefficient according to a preset risk management matching relation, and processes current requirement detection data according to the risk management scheme;
the combined effective set screening module is used for screening the most suitable combined effective sets aiming at a plurality of combined effective sets meeting the requirements;
The referenceable set screening module is used for screening the most suitable referenceable sets aiming at a plurality of referenceable sets meeting requirements;
The risk change coefficient rejection module is used for rejecting the risk change coefficients determined by registration sets with a low degree of association;
the risk change coefficient recovery module is used for recovering, when the number of risk change coefficients is insufficient, risk change coefficients that have a larger degree of influence on some of the features;
and the recovery selection module is used for selecting which risk change coefficients to recover when more are recoverable than are needed.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
Based on the same inventive concept, an embodiment of the invention provides an intelligent terminal comprising a memory and a processor, wherein the memory stores a computer program that can be loaded by the processor to execute the large language model data based security risk assessment and management method.
The foregoing are preferred embodiments of the application and are not intended to limit its scope in any way. Unless expressly stated otherwise, any feature disclosed in this specification (including the abstract and drawings) may be replaced by an alternative feature serving the same or an equivalent purpose. That is, unless expressly stated otherwise, each feature is only one example of a generic series of equivalent or similar features.

Claims (8)

1. A method for security risk assessment and management based on large language model data, comprising:
acquiring a data characteristic type of the demand detection data;
summarizing the data feature types to form a data feature set;
randomly combining the data feature types in the data feature set when the registration set in the preset large language model does not contain the data feature set so as to determine a combined feature set;
Defining a combined feature set consistent with the registration set as a combined effective set, and counting according to the data feature types in the combined effective set to determine the number of combined features;
Determining the number of combined features with the largest numerical value according to a preset ordering rule, determining a reference risk coefficient according to a combined effective set corresponding to the number of combined features, and comparing according to the combined effective set and a data feature set to determine different feature types;
Determining risk change coefficients between two registration sets according to all different feature types in the registration sets;
Calculating the average value according to all risk change coefficients to determine an average value change coefficient, and calculating according to the average value change coefficient and a reference risk coefficient to determine a comprehensive risk coefficient;
And determining a risk management scheme corresponding to the comprehensive risk coefficient according to a preset risk management matching relation, and processing the current requirement detection data according to the risk management scheme.
2. The large language model data based security risk assessment and management method according to claim 1, wherein after the determination of the number of combined features, the large language model data based security risk assessment and management method further comprises:
judging whether there are at least two combined effective sets whose combined feature quantities are equal and largest;
if there are not at least two combined effective sets whose combined feature quantities are equal and largest, determining the reference risk coefficient according to the combined effective set;
if there are at least two combined effective sets whose combined feature quantities are equal and largest, determining the different feature types under each combined effective set, and forming a difference set from the different feature types;
defining, among the registration sets, each registration set that contains the difference set as a referenceable set, and counting the referenceable sets to determine a referenceable quantity;
and determining the referenceable quantity with the largest value according to the ordering rule, and determining the reference risk coefficient according to the combined effective set corresponding to that referenceable quantity.
3. The large language model data based security risk assessment and management method according to claim 2, wherein after the determination of the referenceable quantity, the large language model data based security risk assessment and management method further comprises:
judging whether at least two referenceable quantities are equal and largest;
if there are not at least two referenceable quantities that are equal and largest, determining the reference risk coefficient according to the combined effective set corresponding to the unique largest referenceable quantity;
if at least two referenceable quantities are equal and largest, calculating according to each risk change coefficient and the mean change coefficient to determine a risk deviation coefficient;
determining the compensation quantity corresponding to the risk deviation coefficient according to a preset compensation matching relation, and correcting the referenceable quantity according to the compensation quantity, wherein the compensation matching relation [formula not reproduced in this text] relates the compensation quantity to the risk deviation coefficient through two preset fixed calculation parameters.
4. The large language model data based security risk assessment and management method according to claim 3, wherein after the risk change coefficient is determined, the large language model data based security risk assessment and management method further comprises:
respectively defining two registration sets for determining risk change coefficients as a reference set and an expansion set, and defining the data feature type in the reference set as a reference feature type;
Defining a combined effective set for determining the reference risk coefficient as a benchmarking combination, and defining a reference feature type in the benchmarking combination as a reliable feature type;
counting according to the reference feature types to determine a reference number, and counting according to the reliable feature types to determine a reliable number;
and calculating according to the reliable number and the reference number to determine a reliable ratio, and rejecting the risk change coefficients determined by any reference set and expansion set whose reliable ratio is smaller than a preset required ratio.
5. The method for security risk assessment and management based on large language model data according to claim 4, wherein after the risk change coefficient is eliminated, the method for security risk assessment and management based on large language model data further comprises:
Counting according to the acquired risk variation coefficient to determine coefficient statistics quantity;
judging whether the coefficient statistical quantity is larger than a preset reference demand quantity or not;
If the coefficient statistical quantity is larger than the reference demand quantity, carrying out average value calculation according to the risk change coefficient;
If the coefficient statistical quantity is not greater than the reference demand quantity, defining a reference set corresponding to the eliminated risk change coefficient as an alternative set, and defining an alternative set with only one reliable feature type as a theoretical analysis set;
Carrying out mean value calculation on risk change coefficients determined by a theoretical analysis set corresponding to the same reliable feature type to determine analysis mean value coefficients;
calculating according to each risk change coefficient and the analysis mean value coefficient to determine a stable difference coefficient;
And defining the reliable feature type with the stable difference coefficient smaller than the preset standard coefficient as the stable feature type, recovering the risk change coefficient determined by the reference set containing the stable feature type from the eliminated risk change coefficients, and carrying out average value calculation according to all the risk change coefficients.
6. The method for security risk assessment and management based on large language model data according to claim 5, wherein after the determination of the stable feature type, the method for security risk assessment and management based on large language model data further comprises:
counting the rejected risk change coefficients whose reference set contains a stable feature type to determine a requirement-compliant quantity;
performing a difference calculation according to the coefficient statistical quantity and the reference demand quantity to determine a difference demand quantity;
judging whether the requirement-compliant quantity is larger than the difference demand quantity;
if the requirement-compliant quantity is not greater than the difference demand quantity, recovering from the rejected risk change coefficients all those determined by reference sets containing a stable feature type;
if the requirement-compliant quantity is larger than the difference demand quantity, performing a difference calculation according to the risk change coefficient and the analysis mean coefficient to determine an excess coefficient;
and sorting the excess coefficients from small to large to determine an excess sequence, and selecting excess coefficients of the difference required quantity from front to back according to the excess sequence to recover the corresponding risk change coefficients.
7. A large language model data based security risk assessment and management system, comprising:
The acquisition module is used for acquiring the data characteristic type of the requirement detection data;
The processing module is connected with the acquisition module and the judging module and is used for storing and processing information;
the judging module is connected with the acquisition module and the processing module and is used for judging information;
the processing module generalizes the data characteristic types to form a data characteristic set;
the processing module randomly combines the data feature types in the data feature set when the registration set in the preset large language model does not contain the data feature set so as to determine a combined feature set;
The processing module defines the combination feature set which is judged by the judging module to be consistent with the registration set as a combination effective set, and counts according to the data feature types in the combination effective set to determine the number of the combination features;
the processing module determines the combination feature quantity with the largest numerical value according to a preset ordering rule, determines a reference risk coefficient according to a combination effective set corresponding to the combination feature quantity, and compares the combination effective set with the data feature set to determine different feature types;
The processing module determines risk change coefficients between the two registration sets according to all different feature types in the registration sets;
The processing module carries out mean value calculation according to all risk change coefficients to determine a mean value change coefficient, and carries out calculation according to the mean value change coefficient and a reference risk coefficient to determine a comprehensive risk coefficient;
the processing module determines a risk management scheme corresponding to the comprehensive risk coefficient according to a preset risk management matching relation, and processes the current requirement detection data according to the risk management scheme.
8. An intelligent terminal comprising a memory and a processor, the memory having stored thereon a computer program capable of being loaded by the processor and performing the method according to any of claims 1 to 6.
CN202410605760.9A 2024-05-16 2024-05-16 Data security risk assessment and management system based on large language model Active CN118194358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410605760.9A CN118194358B (en) 2024-05-16 2024-05-16 Data security risk assessment and management system based on large language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410605760.9A CN118194358B (en) 2024-05-16 2024-05-16 Data security risk assessment and management system based on large language model

Publications (2)

Publication Number Publication Date
CN118194358A true CN118194358A (en) 2024-06-14
CN118194358B CN118194358B (en) 2024-08-13

Family

ID=91403896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410605760.9A Active CN118194358B (en) 2024-05-16 2024-05-16 Data security risk assessment and management system based on large language model

Country Status (1)

Country Link
CN (1) CN118194358B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118585632A (en) * 2024-08-06 2024-09-03 浪潮云信息技术股份公司 Large model question-answering method, device, equipment and storage medium for traffic industry

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100145896A1 (en) * 2007-08-22 2010-06-10 Fujitsu Limited Compound property prediction apparatus, property prediction method, and program for implementing the method
CN112527321A (en) * 2020-12-29 2021-03-19 平安银行股份有限公司 Deep learning-based application online method, system, device and medium
CN116883178A (en) * 2023-07-13 2023-10-13 上海栖盟科技有限公司 Asset configuration analysis method based on large language model
CN117726166A (en) * 2023-12-01 2024-03-19 上海臻旷信息科技有限公司 Artificial intelligence enterprise customer risk information analysis and evaluation method and system based on large language model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王飞龙;胡志军;: "大数据在信用风险体系中的应用分析", 金融科技时代, no. 11, 10 November 2018 (2018-11-10) *

Also Published As

Publication number Publication date
CN118194358B (en) 2024-08-13

Similar Documents

Publication Publication Date Title
CN118194358B (en) Data security risk assessment and management system based on large language model
US5845285A (en) Computer system and method of data analysis
CN108009040B (en) Method, system and computer readable storage medium for determining fault root cause
US7756810B2 (en) Software tool for training and testing a knowledge base
CN115409395B (en) Quality acceptance inspection method and system for hydraulic construction engineering
CN115269342B (en) Monitoring cloud platform based on edge computing and monitoring method thereof
CN113591824B (en) Traffic violation data entry anomaly detection method and device
CN109241043B (en) Data quality detection method and device
CN112330474B (en) Nuclear protection wind control monitoring method, device, equipment and storage medium
CN112700158B (en) Multi-dimensional model-based algorithm efficiency evaluation method
CN115062675A (en) Full-spectrum pollution tracing method based on neural network and cloud system
CN112906738A (en) Water quality detection and treatment method
CN113807004A (en) Tool life prediction method, device and system based on data mining
CN111475746A (en) Method and device for mining point of interest, computer equipment and storage medium
CN113435939A (en) Engineering cost progress management system
CN114665986B (en) Bluetooth key testing system and method
CN116797153A (en) Enterprise employee task management application customization method based on big data
CN113050846B (en) Component-based time-space big data visualization configuration method and system
CN113435842A (en) Business process processing method and computer equipment
CN118313667A (en) Enterprise production security risk assessment method, system, device and storage medium
CN107767144A (en) A kind of credit assessment method and device
CN117349780B (en) Warehouse data intelligent identification management and control system and method based on data analysis
CN114389840B (en) Method and system for determining area where network attack source is located based on GLM factorization method
WO2024127478A1 (en) Detection device, detection method, and detection program
CN117669866A (en) Customer call service monitoring method and system of three-party data platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant