CN116796372B

CN116796372B - File information safety supervision method and system based on big data

Info

Publication number: CN116796372B
Application number: CN202311077570.6A
Authority: CN
Inventors: 刘保定; 王全刚; 王星; 郎晓旭
Original assignee: Shenzhen Real Xinda Science And Technology Development Co ltd
Current assignee: Shenzhen Real Xinda Science And Technology Development Co ltd
Priority date: 2023-08-25
Filing date: 2023-08-25
Publication date: 2023-11-28
Anticipated expiration: 2043-08-25
Also published as: CN116796372A

Abstract

The application discloses an archive information safety supervision method and system based on big data, and belongs to the technical field of archive management. Archives are segmented to generate archive information packet sets, and archive information review packet sets are generated by combining the operation log of the archive information; the archive information is marked as being in a readable state or an unreadable state, and in each archive information review packet set the marks are displayed on the archive information originating from the different archive information packet sets; the archive information packet sets are taken as a unified standard data set and the review behavior as a diversified scale data set, and the correlation and the privacy avoidance between archives guided by the review behavior are analyzed; for each review operation, the risk that reviewing one archive affects another is judged, the correlation between archives is calibrated, and privacy class values between archives are analyzed and output; the security of the archive information is thereby ensured while the archive information remains available for sharing.

Description

File information safety supervision method and system based on big data

Technical Field

The application relates to the technical field of archive management, in particular to an archive information safety supervision method and system based on big data.

Background

With the development of modern technology, traditional archive management can no longer meet the requirements of contemporary work; technologies such as big data and the Internet of Things can make archive management work more targeted, and accurate analysis of archive data helps to manage and control vulnerabilities and security risks in the large volumes of data involved in archive management;

in the patent with application publication date 2020.02.11 and application number 201911155247.X, entitled "a blockchain-based archive information security management system and method", the system comprises a unified data management tool, a local archive system database and a blockchain network; the unified data management tool is communicatively connected to the local archive system database and adapts to multiple types of archive databases; the blockchain network comprises multiple nodes that are communicatively connected to each other and exchanges information with the unified data management tool; the unified data management tool acquires the operation log of the local archive system database and unifies it into transaction data that is issued to the blockchain network; the blockchain network performs signature verification and consensus on the received transaction data; the unified data management tool operates on the local archive system database according to the transaction result; efficient, automated consensus on database logs is thereby achieved, supporting archive integrity authentication, operation tracing and archive data backup;

however, although the above patent ensures the integrity of the archive and the traceability of operations, it ignores the usability and privacy security of archives while they are reviewed and shared.

Disclosure of Invention

The application aims to provide an archive information safety supervision method and system based on big data, so as to solve the problems described in the background.

In order to solve the technical problems, the application provides the following technical scheme:

the archive information safety supervision system based on big data comprises: an archive data holder module, an archive state processing module, an archive data analysis module and a privacy class binding module;

the archive data holder module is used for carrying out data segmentation on archives to generate an archive information packet set; generating a file information consulting package set according to the file information package set and combining an operation log of file information;

the archive state processing module is used for marking the archive information in a readable state and in an unreadable state; according to the marking result of the file information, in the file information consulting packet set, respectively adding marking display to the file information from different file information packet sets, and counting the marking display result;

the archive data analysis module is used for taking the archive information packet set as a unified standard data set, taking the consulting behavior as a diversified scale data set and analyzing the correlation among archives guided by the consulting behavior; according to the mark display, analyzing privacy evasiveness among files guided by the consulting behaviors during each consulting operation;

the privacy class binding module judges, for each review operation, the risk that reviewing one archive affects another according to the correlation and the privacy avoidance between archives, and calibrates the correlation between archives according to the judgment result; and analyzes and outputs privacy class values between archives according to the calibration result.

Further, the archive data holder module further comprises a data segmentation unit and an operation log unit;

the data segmentation unit is used for uniformly encoding the archives stored in the archive data holder and denoting any archive as i; the information content of any archive i is segmented to generate an archive information packet set, denoted Ii = {AIi1, AIi2, ..., AIin}, wherein Ii represents the archive information packet set generated for archive i, and AIi1, AIi2, ..., AIin represent the 1st, 2nd, ..., n-th pieces of archive information obtained by segmenting archive i;

the operation log unit is used for retrieving the operation log of the archive information stored in the archive data holder, wherein the operation log records the number of review operations and all archive information fed back by the archive data holder during each review operation; according to the operation log, all archive information fed back by the archive data holder during the x-th review operation is retrieved together to generate an archive information review packet set, denoted Rx = {AIi1, AIi2, ..., AIin, ..., AIj1, AIj2, ..., AIjm}, wherein Rx represents the archive information review packet set generated for the x-th review operation, AIj1, AIj2, ..., AIjm are the 1st, 2nd, ..., m-th pieces of archive information contained in the archive information packet set Ij, j is an archive code, and i ≠ j.
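The two sets defined above can be illustrated with a minimal Python sketch. It is not the patented implementation: the label format (`AIi1`, `AIj2`, ...), the function names and the in-memory list used as an operation log are all assumptions introduced here for illustration.

```python
from typing import List, Set


def segment_archive(archive_id: str, pieces: int) -> Set[str]:
    """Build the packet set Ii as labels AI<archive_id><k> for k = 1..pieces.

    Hypothetical stand-in for real data segmentation: only labels are produced.
    """
    return {f"AI{archive_id}{k}" for k in range(1, pieces + 1)}


def review_packet_set(operation_log: List[List[str]], x: int) -> Set[str]:
    """Build Rx: everything fed back during the x-th review operation (1-indexed)."""
    return set(operation_log[x - 1])


# Two archives, i with n = 4 pieces and j with m = 3 pieces.
packet_sets = {"i": segment_archive("i", 4), "j": segment_archive("j", 3)}

# Assumed log layout: one list of returned pieces per review operation.
operation_log = [
    ["AIi1", "AIi2", "AIj1"],
    ["AIi3", "AIj2", "AIj3"],
]

R1 = review_packet_set(operation_log, 1)
print(sorted(packet_sets["i"] & R1))  # pieces of archive i returned by review 1
```

Representing both Ii and Rx as sets keeps the later intersections Ii∩Rx and Ij∩Rx a single operation.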

Further, the archive state processing module further comprises a state marking unit and a display statistical unit;

the status marking unit is used for marking the file information in the file data holder in a readable status and an unreadable status, wherein the readable status is the status that the file information can be read in the file data holder, and the unreadable status is the status that the file information cannot be read in the file data holder;

the display statistics unit adds, according to the marking result of the archive information, a mark display to the archive information originating from the different archive information packet sets within the archive information review packet set, and, according to the mark display, counts the numbers of pieces of archive information from the archive information packet set Ii that appear in the archive information review packet set Rx in the readable state and in the unreadable state, denoted RS(Ii|Rx) and NRS(Ii|Rx) respectively.
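A minimal sketch of this counting step follows, assuming the readable/unreadable marks are held in a plain dictionary; the `readable` mapping and the function name `count_states` are hypothetical and not taken from the source.

```python
from typing import Dict, Set, Tuple


def count_states(
    packet_set: Set[str], review_set: Set[str], readable: Dict[str, bool]
) -> Tuple[int, int]:
    """Return (RS(Ii|Rx), NRS(Ii|Rx)) for one packet set and one review set.

    Only pieces of the packet set that actually appear in the review set count.
    """
    returned = packet_set & review_set
    rs = sum(1 for piece in returned if readable[piece])
    nrs = len(returned) - rs
    return rs, nrs


I_i = {"AIi1", "AIi2", "AIi3", "AIi4"}
R_x = {"AIi1", "AIi2", "AIi3", "AIj1"}
# Assumed marks: True means readable, False means unreadable.
readable = {"AIi1": True, "AIi2": False, "AIi3": False, "AIi4": True, "AIj1": True}

rs_i, nrs_i = count_states(I_i, R_x, readable)
print(rs_i, nrs_i)  # 1 readable and 2 unreadable pieces of archive i in Rx
```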

Further, the archive data analysis module further comprises a similarity processing unit and an evasiveness processing unit;

the similarity processing unit, according to the archive information packet set and the archive information review packet set, takes the archive information packet set as a unified standard data set and the review behavior as a diversified scale data set, analyzes the correlation between archives guided by the review behavior, and calculates the correlation degree between archives, wherein the specific calculation formula is as follows:

DA(i，j|x)=ln{NUM(Ii∩Rx)/NUM(Ii)÷[NUM(Ij∩Rx)/NUM(Ij)]}

wherein DA(i, j|x) represents the correlation degree between archive i and archive j at the x-th review operation; NUM(Ii), NUM(Ij), NUM(Ii∩Rx) and NUM(Ij∩Rx) represent the numbers of pieces of archive information contained in the archive information packet set Ii, in the archive information packet set Ij, in the intersection of Ii with the archive information review packet set Rx, and in the intersection of Ij with Rx, respectively; and Ii∩Rx ≠ ∅ and Ij∩Rx ≠ ∅;
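The formula for DA(i, j|x) can be sketched in Python as follows. The function name and the choice to raise an error when an intersection is empty are assumptions made here; the source only states that both intersections must be non-empty.

```python
import math
from typing import Set


def correlation_degree(I_i: Set[str], I_j: Set[str], R_x: Set[str]) -> float:
    """DA(i, j|x) = ln{ NUM(Ii∩Rx)/NUM(Ii) ÷ [NUM(Ij∩Rx)/NUM(Ij)] }."""
    hits_i = len(I_i & R_x)
    hits_j = len(I_j & R_x)
    if hits_i == 0 or hits_j == 0:
        # The source requires Ii∩Rx ≠ ∅ and Ij∩Rx ≠ ∅; rejecting other input is our choice.
        raise ValueError("both archives must contribute to the review packet set")
    return math.log((hits_i / len(I_i)) / (hits_j / len(I_j)))


I_i = {"AIi1", "AIi2", "AIi3", "AIi4"}
I_j = {"AIj1", "AIj2", "AIj3"}
R_x = {"AIi1", "AIi2", "AIj1"}

# Coverage is 2/4 for archive i and 1/3 for archive j, so DA = ln(1.5).
da = correlation_degree(I_i, I_j, R_x)
print(round(da, 4))
```

Swapping the two archives flips the sign of the result, since the logarithm of a ratio is antisymmetric in its two arguments.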

the evasiveness processing unit is used for analyzing privacy evasiveness among files guided by the consulting behaviors in the x-th consulting operation according to the mark display, and calculating privacy evasiveness among files, wherein a specific calculation formula is as follows:

PA(i，j|x)=NRS(Ii|Rx)/[RS(Ii|Rx)+NRS(Ii|Rx)]÷{NRS(Ij|Rx)/[RS(Ij|Rx)+NRS(Ij|Rx)]}

wherein PA(i, j|x) represents the privacy avoidance degree between archive i and archive j at the x-th review operation, and RS(Ij|Rx) and NRS(Ij|Rx) represent the numbers of pieces of archive information from the archive information packet set Ij that appear in the archive information review packet set Rx in the readable state and in the unreadable state, respectively.
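As a hedged sketch, PA(i, j|x) is the ratio of the unreadable shares of the two archives within Rx. The source does not say what should happen when archive j has no unreadable piece in Rx or contributes nothing at all; raising an error in those cases is an assumption of this example, as is the function name.

```python
def privacy_avoidance(rs_i: int, nrs_i: int, rs_j: int, nrs_j: int) -> float:
    """PA(i, j|x) = NRS_i/(RS_i + NRS_i) ÷ [NRS_j/(RS_j + NRS_j)]."""
    if rs_i + nrs_i == 0 or rs_j + nrs_j == 0:
        raise ValueError("each archive needs at least one piece in Rx")
    share_i = nrs_i / (rs_i + nrs_i)  # unreadable share of archive i in Rx
    share_j = nrs_j / (rs_j + nrs_j)  # unreadable share of archive j in Rx
    if share_j == 0:
        # Undefined in the source; rejecting this case is an assumption.
        raise ValueError("archive j has no unreadable piece in Rx")
    return share_i / share_j


# Archive i: 1 readable, 2 unreadable; archive j: 1 readable, 1 unreadable.
pa = privacy_avoidance(1, 2, 1, 1)
print(round(pa, 4))  # (2/3) / (1/2) = 4/3
```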

Further, the privacy class binding module further comprises a calibration unit and a privacy binding unit;

the calibration unit judges the risk of influence of the reference among the files in the x-th reference operation according to the correlation among the files and the privacy avoidance among the files, and the specific judgment formula is as follows:

DA(i，j|x)≥PA(i，j|x)

if, during the x-th review operation, the correlation degree and the privacy avoidance degree between the archives satisfy the judgment formula, the correlation between the archives is calibrated so that DA(i, j|x) = CA(i, j|x) = PA(i, j|x); otherwise the correlation between the archives is not calibrated, so that DA(i, j|x) = CA(i, j|x), wherein CA(i, j|x) represents the calibrated value of the correlation between the archives;

the privacy binding unit updates the correlation between the archives for each historical review operation according to the calibration result, and calculates the privacy class value between the archives as P(i|j) = y^(-1) Σ_{x=1}^{y} CA(i, j|x); the privacy class value between the archives is output, and archive i and archive j are bound to each other through the privacy class value P(i|j).
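The calibration rule and the averaging step can be sketched as below. Two readings are assumed here and are not stated outright in the source: the calibrated value CA equals PA when DA ≥ PA and equals DA otherwise, and y is the number of historical review operations being averaged. The function names are hypothetical.

```python
from typing import Sequence


def calibrate(da: float, pa: float) -> float:
    """Return CA(i, j|x): PA when DA >= PA (calibrated), otherwise DA unchanged."""
    return pa if da >= pa else da


def privacy_class_value(da_history: Sequence[float], pa_history: Sequence[float]) -> float:
    """P(i|j) = (1/y) * sum of CA(i, j|x) over x = 1..y (y assumed to be the history length)."""
    if not da_history or len(da_history) != len(pa_history):
        raise ValueError("histories must be non-empty and of equal length")
    calibrated = [calibrate(da, pa) for da, pa in zip(da_history, pa_history)]
    return sum(calibrated) / len(calibrated)


# Three review operations; only the first is calibrated downward (0.9 -> 0.5).
da_history = [0.9, 0.2, 1.5]
pa_history = [0.5, 0.4, 1.5]
p_ij = privacy_class_value(da_history, pa_history)
print(round(p_ij, 4))  # (0.5 + 0.2 + 1.5) / 3
```

Under this reading the calibrated value is simply the smaller of DA and PA, so the correlation degree is never allowed to exceed the privacy avoidance degree.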

An archive information safety supervision method based on big data comprises the following steps:

step S100: data segmentation is carried out on the files to generate file information package sets; generating a file information consulting package set according to the file information package set and combining an operation log of file information;

step S200: two marks of readable state and unreadable state are carried out on the file information; according to the marking result of the file information, in the file information consulting packet set, respectively adding marking display to the file information from different file information packet sets, and counting the marking display result;

step S300: taking the file information package set as a unified standard data set, taking the consulting behavior as a diversified scale data set, and analyzing the correlation among files guided by the consulting behavior; according to the mark display, analyzing privacy evasiveness among files guided by the consulting behaviors during each consulting operation;

step S400: judging, for each review operation, the risk that reviewing one archive affects another according to the correlation and the privacy avoidance between archives, and calibrating the correlation between archives according to the judgment result; and analyzing and outputting privacy class values between archives according to the calibration result.

Further, the specific implementation process of the step S100 includes:

step S101: uniformly encoding the archives stored in the archive data holder, and denoting any archive as i; segmenting the information content of any archive i to generate an archive information packet set, denoted Ii = {AIi1, AIi2, ..., AIin}, wherein Ii represents the archive information packet set generated for archive i, and AIi1, AIi2, ..., AIin represent the 1st, 2nd, ..., n-th pieces of archive information obtained by segmenting archive i;

step S102: retrieving the operation log of the archive information stored in the archive data holder, wherein the operation log records the number of review operations and all archive information fed back by the archive data holder during each review operation; according to the operation log, retrieving together all archive information fed back by the archive data holder during the x-th review operation to generate an archive information review packet set, denoted Rx = {AIi1, AIi2, ..., AIin, ..., AIj1, AIj2, ..., AIjm}, wherein Rx represents the archive information review packet set generated for the x-th review operation, AIj1, AIj2, ..., AIjm are the 1st, 2nd, ..., m-th pieces of archive information contained in the archive information packet set Ij, j is an archive code, and i ≠ j.

Further, the specific implementation process of the step S200 includes:

step S201: in the file data holder, two marks of readable states and unreadable states are carried out on file information, wherein the readable states are states in which the file information can be read in the file data holder, and the unreadable states are states in which the file information cannot be read in the file data holder;

step S202: according to the marking result of the archive information, adding a mark display to the archive information originating from the different archive information packet sets within the archive information review packet set, and, according to the mark display, counting the numbers of pieces of archive information from the archive information packet set Ii that appear in the archive information review packet set Rx in the readable state and in the unreadable state, denoted RS(Ii|Rx) and NRS(Ii|Rx) respectively.

Further, the implementation process of the step S300 includes:

step S301: according to the file information package set and the file information consulting package set, taking the file information package set as a unified standard data set, taking the consulting behavior as a diversified scale data set, analyzing the correlation among files guided by the consulting behavior, and calculating the correlation among files, wherein the specific calculation formula is as follows:

DA(i，j|x)=ln{NUM(Ii∩Rx)/NUM(Ii)÷[NUM(Ij∩Rx)/NUM(Ij)]}

step S302: according to the mark display, analyzing the privacy avoidance between archives guided by the review behavior during the x-th review operation, and calculating the privacy avoidance degree between archives, wherein the specific calculation formula is as follows:

PA(i，j|x)=NRS(Ii|Rx)/[RS(Ii|Rx)+NRS(Ii|Rx)]÷{NRS(Ij|Rx)/[RS(Ij|Rx)+NRS(Ij|Rx)]}

Further, the specific implementation process of the step S400 includes:

step S401: judging the risk of influence of the reference among the files in the x-th reference operation according to the correlation among the files and the privacy avoidance among the files, wherein a specific judgment formula is as follows:

DA(i，j|x)≥PA(i，j|x)

step S402: updating the correlation between the archives for each historical review operation according to the calibration result, and calculating the privacy class value between the archives as P(i|j) = y^(-1) Σ_{x=1}^{y} CA(i, j|x); outputting the privacy class value between the archives;

according to the method, the correlation degree between archives is obtained by analyzing each review behavior: it is calculated from the archive information retrieved and fed back for sharing over the whole review behavior, and it represents how available the archives are for sharing when compared against what the review behavior actually returned; the larger the correlation degree, the greater the availability between the archives; archive information often involves a large amount of private data or content, so archive information has two states, readable and unreadable; when an archive is in the readable state, the content of the file or record can be read and understood easily, may exist as text, numbers, diagrams, pictures and the like, is clear, complete and accurate, is free of obstacles or restrictions, and can conveniently be reviewed and used by the relevant personnel; when an archive is in the unreadable state, the content of the file or record cannot be read and understood directly and must first be unlocked with permission; furthermore, although the correlation, i.e. the availability, between archives can be obtained from the review behavior, the privacy involved between archives is not thereby guaranteed, so the privacy avoidance degree between archives must also be analyzed, so that when one archive is reviewed, the private content of another archive that may be involved cannot be linked and inferred through similarity; the higher the privacy avoidance degree, the greater the privacy disclosure risk involved between the archives; the correlation degree is therefore calibrated by the privacy avoidance degree, so that availability between archives is maintained while the privacy disclosure risk between archives is balanced.

Compared with the prior art, the application has the following beneficial effects: in the archive information safety supervision method and system based on big data, the archive is subjected to data segmentation to generate an archive information package set, and an archive information consulting package set is generated by combining an operation log of archive information; two kinds of marks of readable state and unreadable state are carried out on the file information, and in the file information consulting packet set, marks are respectively added to the file information from different file information packet sets for display; taking the file information package set as a unified standard data set, taking the consulting behavior as a diversified scale data set, and analyzing the correlation and privacy evasion between files guided by the consulting behavior; judging the risk of influence of the consultation between the files in each consultation operation, calibrating the correlation between the files, analyzing and outputting privacy class values between the files; and further, the safety of the archive information is ensured while the sharing availability among the archive information is ensured.

Drawings

The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate the application and together with the embodiments of the application, serve to explain the application.

In the drawings: FIG. 1 is a schematic diagram of an archive information security supervision system based on big data according to the present application;

FIG. 2 is a schematic diagram of the steps of an archive information security supervision method based on big data according to the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Referring to fig. 1-2, the present application provides the following technical solutions:

referring to FIG. 1, in a first embodiment, an archive information security supervision system based on big data is provided, comprising: an archive data holder module, an archive state processing module, an archive data analysis module and a privacy class binding module;

the archive data holder module further comprises a data segmentation unit and an operation log unit;

the data segmentation unit is used for uniformly encoding the archives stored in the archive data holder and denoting any archive as i; the information content of any archive i is segmented to generate an archive information packet set, denoted Ii = {AIi1, AIi2, ..., AIin}, wherein Ii represents the archive information packet set generated for archive i, and AIi1, AIi2, ..., AIin represent the 1st, 2nd, ..., n-th pieces of archive information obtained by segmenting archive i;

the operation log unit is used for retrieving the operation log of the archive information stored in the archive data holder, wherein the operation log records the number of review operations and all archive information fed back by the archive data holder during each review operation; according to the operation log, all archive information fed back by the archive data holder during the x-th review operation is retrieved together to generate an archive information review packet set, denoted Rx = {AIi1, AIi2, ...};

the file state processing module is used for marking file information in a readable state and in an unreadable state; according to the marking result of the file information, in the file information consulting packet set, respectively adding marking display to the file information from different file information packet sets, and counting the marking display result;

the archive state processing module further comprises a state marking unit and a display statistical unit;

the state marking unit is used for marking the file information in the file data holder in a readable state and an unreadable state, wherein the readable state is a state that the file information can be read in the file data holder, and the unreadable state is a state that the file information cannot be read in the file data holder;

a display statistics unit for adding a mark display to the file information from different file information packet sets in the file information reference packet set according to the mark result of the file information, counting the file information in the file information packet set Ii according to the mark display, and recording the number of readable states and unreadable states in the file information reference packet set Rx as RS (Ii|Rx) and NRS (Ii|Rx) respectively;

the archive data analysis module further comprises a similarity processing unit and an evasiveness processing unit;

the similarity processing unit is used for, according to the archive information packet set and the archive information review packet set, taking the archive information packet set as a unified standard data set and the review behavior as a diversified scale data set, analyzing the correlation between archives guided by the review behavior, and calculating the correlation degree between archives, wherein the specific calculation formula is as follows:

DA(i，j|x)=ln{NUM(Ii∩Rx)/NUM(Ii)÷[NUM(Ij∩Rx)/NUM(Ij)]}

wherein PA(i, j|x) represents the privacy avoidance degree between archive i and archive j at the x-th review operation, and RS(Ij|Rx) and NRS(Ij|Rx) represent the numbers of pieces of archive information from the archive information packet set Ij that appear in the archive information review packet set Rx in the readable state and in the unreadable state, respectively;

the privacy class binding module judges, for each review operation, the risk that reviewing one archive affects another according to the correlation and the privacy avoidance between archives, and calibrates the correlation between archives according to the judgment result; and analyzes and outputs privacy class values between archives according to the calibration result;

the privacy class binding module further comprises a calibration unit and a privacy binding unit;

DA(i，j|x)≥PA(i，j|x)

the privacy binding unit updates the correlation between the archives for each historical review operation according to the calibration result, and calculates the privacy class value between the archives as P(i|j) = y^(-1) Σ_{x=1}^{y} CA(i, j|x); the privacy class value between the archives is output, and archive i and archive j are bound to each other through the privacy class value P(i|j).

Referring to fig. 2, in the second embodiment: the archive information safety supervision method based on big data comprises the following steps:

data segmentation is carried out on the files to generate file information package sets; generating a file information consulting package set according to the file information package set and combining an operation log of file information;

uniformly encoding the archives stored in the archive data holder, and denoting any archive as i; segmenting the information content of any archive i to generate an archive information packet set, denoted Ii = {AIi1, AIi2, ..., AIin}, wherein Ii represents the archive information packet set generated for archive i, and AIi1, AIi2, ..., AIin represent the 1st, 2nd, ..., n-th pieces of archive information obtained by segmenting archive i;

retrieving the operation log of the archive information stored in the archive data holder, wherein the operation log records the number of review operations and all archive information fed back by the archive data holder during each review operation; according to the operation log, retrieving together all archive information fed back by the archive data holder during the x-th review operation to generate an archive information review packet set, denoted Rx = {AIi1, AIi2, ..., AIin, ..., AIj1, AIj2, ..., AIjm}, wherein Rx represents the archive information review packet set generated for the x-th review operation, AIj1, AIj2, ..., AIjm are the 1st, 2nd, ..., m-th pieces of archive information contained in the archive information packet set Ij, j is an archive code, and i ≠ j;

two marks of readable state and unreadable state are carried out on the file information; according to the marking result of the file information, in the file information consulting packet set, respectively adding marking display to the file information from different file information packet sets, and counting the marking display result;

in the file data holder, two marks of readable states and unreadable states are carried out on file information, wherein the readable states are states in which the file information can be read in the file data holder, and the unreadable states are states in which the file information cannot be read in the file data holder;

respectively adding mark display to the file information from different file information packet sets in the file information reference packet set according to the mark result of the file information, respectively counting the file information in the file information packet set Ii according to the mark display, and respectively marking the number of readable states and unreadable states in the file information reference packet set Rx as RS (Ii|Rx) and NRS (Ii|Rx);

taking the file information package set as a unified standard data set, taking the consulting behavior as a diversified scale data set, and analyzing the correlation among files guided by the consulting behavior; according to the mark display, analyzing privacy evasiveness among files guided by the consulting behaviors during each consulting operation;

according to the file information package set and the file information consulting package set, taking the file information package set as a unified standard data set, taking the consulting behavior as a diversified scale data set, analyzing the correlation among files guided by the consulting behavior, and calculating the correlation among files, wherein the specific calculation formula is as follows:

DA(i，j|x)=ln{NUM(Ii∩Rx)/NUM(Ii)÷[NUM(Ij∩Rx)/NUM(Ij)]}

the correlation degree between archives in the application is derived from the principle of differential privacy and is calculated with a modified differential privacy formula; in essence, the correlation degree is the privacy budget of differential privacy, which here expresses the probability of similarity when the archive information packet sets Ii and Ij are each compared with the result of the review operation, i.e. the archive information review packet set Rx; the higher the correlation degree, i.e. the higher the privacy budget, the closer the archive information packet sets Ii and Ij each are to the archive information review packet set Rx;
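For comparison, the standard definition of ε-differential privacy and the correlation degree can be written side by side. This is only a sketch of the analogy the text draws: the shorthand p_i and p_j is introduced here, and the numbers in the last line are an illustrative example rather than data from the application.

```latex
% Standard \varepsilon-differential privacy for a mechanism M on neighbouring
% datasets D and D' (the log form holds whenever the denominator is positive):
\Pr[M(D) \in S] \le e^{\varepsilon}\,\Pr[M(D') \in S]
\quad\Longleftrightarrow\quad
\ln\frac{\Pr[M(D) \in S]}{\Pr[M(D') \in S]} \le \varepsilon

% Correlation degree, with the coverage ratios read as empirical probabilities:
p_i = \frac{\mathrm{NUM}(I_i \cap R_x)}{\mathrm{NUM}(I_i)}, \qquad
p_j = \frac{\mathrm{NUM}(I_j \cap R_x)}{\mathrm{NUM}(I_j)}, \qquad
DA(i, j \mid x) = \ln\frac{p_i}{p_j}

% Illustrative numbers: p_i = 2/4 and p_j = 1/3 give
DA(i, j \mid x) = \ln\frac{1/2}{1/3} = \ln 1.5 \approx 0.405
```

In this reading DA(i, j|x) plays the role of the log-ratio that ε bounds, which is why the text refers to it as a privacy budget.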

according to the mark display, the privacy avoidance between archives guided by the review behavior during the x-th review operation is analyzed, and the privacy avoidance degree between archives is calculated, wherein the specific calculation formula is as follows:

PA(i，j|x)=NRS(Ii|Rx)/[RS(Ii|Rx)+NRS(Ii|Rx)]÷{NRS(Ij|Rx)/[RS(Ij|Rx)+NRS(Ij|Rx)]}

according to the application, the privacy avoidance degree is used to further calibrate the privacy budget in the differential privacy formula; adjusting the privacy budget, i.e. the correlation degree, changes the similarity probability, so that availability between archives and privacy disclosure risk are both taken into account;

while archive information is being reviewed, the private content of another archive may be attacked through the review behavior; the application therefore calibrates the similarity between archives by means of the modified differential privacy technique and incorporates the privacy avoidance risk, thereby ensuring both availability between archives and privacy between archives during review;

judging, for each review operation, the risk that reviewing one archive affects another according to the correlation and the privacy avoidance between archives, and calibrating the correlation between archives according to the judgment result; analyzing and outputting privacy class values between archives according to the calibration result;

judging the risk of influence of the reference among the files in the x-th reference operation according to the correlation among the files and the privacy avoidance among the files, wherein a specific judgment formula is as follows:

DA(i，j|x)≥PA(i，j|x)

updating the correlation between the archives for each historical review operation according to the calibration result, and calculating the privacy class value between the archives as P(i|j) = y^(-1) Σ_{x=1}^{y} CA(i, j|x); the privacy class value between the archives is output.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Finally, it should be noted that the foregoing description is only a preferred embodiment of the present application and is not intended to limit the present application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of the technical features thereof. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. An archive information safety supervision method based on big data, characterized by comprising the following steps:

step S400: judging the consultation influence risk between the archives in each consultation operation according to the correlation and the privacy evasion between the archives, and calibrating the correlation between the archives according to the judgment result; analyzing and outputting the privacy class values between the archives according to the calibration result;

the specific implementation process of the step S100 includes:

step S101: uniformly encoding the archives stored in the archive data cloud platform, and denoting any archive as i; performing data segmentation on the information content contained in archive i to generate an archive information package set, denoted $I_i = \{AI_{i1}, AI_{i2}, \ldots, AI_{in}\}$, wherein $I_i$ represents the archive information package set generated for archive i, and $AI_{i1}, AI_{i2}, \ldots, AI_{in}$ respectively represent the 1st, 2nd, ..., n-th archive information items obtained by segmenting archive i;

step S102: retrieving the operation log of the archive information stored in the archive data cloud platform, wherein the operation log includes the number of consultation operations and all archive information fed back by the archive data cloud platform during each consultation operation; according to the operation log, collecting all archive information fed back by the archive data cloud platform during the x-th consultation operation to generate an archive information consultation package set, denoted $R_x = \{AI_{i1}, AI_{i2}, \ldots, AI_{in}, \ldots, AI_{j1}, AI_{j2}, \ldots, AI_{jm}\}$, wherein $R_x$ represents the archive information consultation package set corresponding to the x-th consultation operation, $AI_{j1}, AI_{j2}, \ldots, AI_{jm}$ respectively represent the 1st, 2nd, ..., m-th archive information items in the archive information package set $I_j$, j is an archive code, and $i \ne j$;
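Steps S101 and S102 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the segmentation rule (splitting on line breaks), the `InfoItem` record, the function names, and the in-memory operation log are all hypothetical, since the text fixes neither how an archive is segmented nor how the log is stored.

```python
from typing import NamedTuple


class InfoItem(NamedTuple):
    archive: str  # archive code, e.g. "i" or "j"
    index: int    # 1-based position of the item within its archive
    content: str  # the segmented piece of archive information


def segment_archive(code: str, text: str, sep: str = "\n") -> list[InfoItem]:
    """Build the package set I_code by splitting the archive text on `sep`.

    Splitting on a separator is an assumption, not a rule from the text.
    """
    parts = [p.strip() for p in text.split(sep) if p.strip()]
    return [InfoItem(code, k, p) for k, p in enumerate(parts, start=1)]


def consultation_package_set(log: list[list[InfoItem]], x: int) -> list[InfoItem]:
    """Return R_x, the items fed back during the x-th logged operation (1-based)."""
    if not 1 <= x <= len(log):
        raise IndexError("no such consultation operation in the log")
    return list(log[x - 1])


# Made-up archives and a log containing a single consultation operation.
I_i = segment_archive("i", "name: A\naddress: B\nsalary: C")
I_j = segment_archive("j", "name: D\ndiagnosis: E")
operation_log = [I_i[:2] + I_j]
R_1 = consultation_package_set(operation_log, 1)
```

Each item keeps its archive code, so items in a consultation package set can later be traced back to the package set they came from.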

the specific implementation process of the step S200 includes:

step S202: according to the marking result of the archive information, adding a mark display, within the archive information consultation package set, to the archive information originating from different archive information package sets; according to the mark display, counting the numbers of archive information items of the archive information package set $I_i$ that are in the readable state and in the unreadable state within the archive information consultation package set $R_x$, denoted $RS(I_i \mid R_x)$ and $NRS(I_i \mid R_x)$ respectively;
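The counting in step S202 can be illustrated with a minimal Python sketch. It assumes that each archive information item is identified by an (archive code, index) pair and that the marking result is available as a mapping from item to a boolean; the function name `count_states` and the sample data are hypothetical.

```python
def count_states(package_set, consultation_set, readable):
    """Return (RS, NRS) for items of `package_set` found in `consultation_set`.

    `readable` maps each item to True (readable state) or False
    (unreadable state); it stands in for the marking result.
    """
    shared = set(package_set) & set(consultation_set)
    rs = sum(1 for item in shared if readable[item])
    return rs, len(shared) - rs


# Items are (archive code, index) pairs; all values here are made up.
I_i = {("i", 1), ("i", 2), ("i", 3)}
R_x = {("i", 1), ("i", 2), ("j", 1)}
readable = {("i", 1): True, ("i", 2): False, ("i", 3): True, ("j", 1): True}

rs, nrs = count_states(I_i, R_x, readable)
```

Only items that belong to the package set and also appear in the consultation package set are counted, so the third item of archive i does not contribute to either count.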

the specific implementation process of the step S300 includes:

wherein $DA(i, j \mid x)$ represents the correlation between archive i and archive j at the x-th consultation operation; $NUM(I_i)$, $NUM(I_j)$, $NUM(I_i \cap R_x)$ and $NUM(I_j \cap R_x)$ respectively represent the number of archive information items contained in the archive information package set $I_i$, in the archive information package set $I_j$, in the intersection of $I_i$ with the archive information consultation package set $R_x$, and in the intersection of $I_j$ with $R_x$, and [formula not reproduced] and [formula not reproduced];

wherein $PA(i, j \mid x)$ represents the degree of privacy avoidance between archive i and archive j at the x-th consultation operation, and $RS(I_j \mid R_x)$ and $NRS(I_j \mid R_x)$ respectively represent the numbers of archive information items of the archive information package set $I_j$ that are in the readable state and in the unreadable state within the archive information consultation package set $R_x$.
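The four NUM quantities that enter the correlation DA(i, j|x) are plain set sizes and can be sketched in Python as below. The formulas for DA and PA themselves are not reproduced in this text, so the sketch stops at the counts; the function name `num_counts`, the dictionary keys, and the sample sets are hypothetical.

```python
def num_counts(I_i: set, I_j: set, R_x: set) -> dict:
    """Return the sizes of I_i, I_j, and their intersections with R_x."""
    return {
        "num_i": len(I_i),              # NUM(I_i)
        "num_j": len(I_j),              # NUM(I_j)
        "num_i_in_rx": len(I_i & R_x),  # NUM(I_i ∩ R_x)
        "num_j_in_rx": len(I_j & R_x),  # NUM(I_j ∩ R_x)
    }


# Items are (archive code, index) pairs; all values here are made up.
I_i = {("i", 1), ("i", 2), ("i", 3)}
I_j = {("j", 1), ("j", 2)}
R_x = {("i", 1), ("i", 2), ("j", 1)}

counts = num_counts(I_i, I_j, R_x)
```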

2. The archive information security supervision method based on big data according to claim 1, wherein the specific implementation process of step S400 includes:

$DA(i, j \mid x) \ge PA(i, j \mid x)$

step S402: updating the correlation between the archives at each of the historical consultation operations according to the calibration result, and calculating the privacy class value between the archives, $P(i \mid j) = y^{-1} \sum_{x=1}^{y} CA(i, j \mid x)$; outputting the privacy class values between the archives.

3. An archive information security supervision system based on big data, the system comprising: an archive data cloud platform module, an archive state processing module, an archive data analysis module and a privacy class binding module;

the data segmentation unit is used for uniformly encoding the archives stored in the archive data cloud platform, and denoting any archive as i; performing data segmentation on the information content contained in archive i to generate an archive information package set, denoted $I_i = \{AI_{i1}, AI_{i2}, \ldots, AI_{in}\}$, wherein $I_i$ represents the archive information package set generated for archive i, and $AI_{i1}, AI_{i2}, \ldots, AI_{in}$ respectively represent the 1st, 2nd, ..., n-th archive information items obtained by segmenting archive i;

the operation log unit is configured to retrieve the operation log of the archive information stored in the archive data cloud platform, wherein the operation log includes the number of consultation operations and all archive information fed back by the archive data cloud platform during each consultation operation, and, according to the operation log, to collect all archive information fed back by the archive data cloud platform during the x-th consultation operation and generate an archive information consultation package set, denoted $R_x = \{AI_{i1}, AI_{i2}, \ldots, AI_{in}, \ldots, AI_{j1}, AI_{j2}, \ldots, AI_{jm}\}$, wherein $R_x$ represents the archive information consultation package set corresponding to the x-th consultation operation, $AI_{j1}, AI_{j2}, \ldots, AI_{jm}$ respectively represent the 1st, 2nd, ..., m-th archive information items in the archive information package set $I_j$, j is an archive code, and $i \ne j$;

the display statistics unit adds, according to the marking result of the archive information, a mark display within the archive information consultation package set to the archive information originating from different archive information package sets, and, according to the mark display, counts the numbers of archive information items of the archive information package set $I_i$ that are in the readable state and in the unreadable state within the archive information consultation package set $R_x$, denoted $RS(I_i \mid R_x)$ and $NRS(I_i \mid R_x)$ respectively;

wherein $DA(i, j \mid x)$ represents the correlation between archive i and archive j at the x-th consultation operation; $NUM(I_i)$, $NUM(I_j)$, $NUM(I_i \cap R_x)$ and $NUM(I_j \cap R_x)$ respectively represent the number of archive information items contained in the archive information package set $I_i$, in the archive information package set $I_j$, in the intersection of $I_i$ with the archive information consultation package set $R_x$, and in the intersection of $I_j$ with $R_x$, and [formula not reproduced] and [formula not reproduced];

4. The archive information security supervision system based on big data as claimed in claim 3, wherein the privacy class binding module further comprises a calibration unit and a privacy binding unit;

$DA(i, j \mid x) \ge PA(i, j \mid x)$

the privacy binding unit updates the correlation between the archives at each of the historical consultation operations according to the calibration result, calculates the privacy class value between the archives, $P(i \mid j) = y^{-1} \sum_{x=1}^{y} CA(i, j \mid x)$, outputs the privacy class values between the archives, and binds archive i and archive j to each other through the privacy class value $P(i \mid j)$.