CN116796372B - File information safety supervision method and system based on big data - Google Patents

File information safety supervision method and system based on big data

Info

Publication number
CN116796372B
Authority
CN
China
Prior art keywords
file information
file
files
archive
consulting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311077570.6A
Other languages
Chinese (zh)
Other versions
CN116796372A (en)
Inventor
刘保定
王全刚
王星
郎晓旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Real Xinda Science And Technology Development Co ltd
Original Assignee
Shenzhen Real Xinda Science And Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Real Xinda Science And Technology Development Co ltd
Priority to CN202311077570.6A
Publication of CN116796372A
Application granted
Publication of CN116796372B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Automation & Control Theory (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an archive information safety supervision method and system based on big data, belonging to the technical field of archive management. Archives are segmented into data to generate archive information packet sets, and archive information review packet sets are generated by combining these with the operation log of the archive information. Archive information is marked as being in either a readable or an unreadable state, and within each review packet set the marks are displayed for the archive information originating from different packet sets. The packet sets serve as a unified standard data set and the review behavior as a diversified scale data set, from which the correlation and the privacy avoidance between archives, as guided by the review behavior, are analyzed. For each review operation, the risk that reviewing one archive affects another is judged, the correlation between archives is calibrated, and privacy class values between archives are analyzed and output. The security of archive information is thereby protected while archives remain available for sharing.

Description

File information safety supervision method and system based on big data
Technical Field
The application relates to the technical field of archive management, in particular to an archive information safety supervision method and system based on big data.
Background
With the development of modern technology, traditional archive management can no longer meet the requirements of contemporary work. Technologies such as big data and the Internet of Things can make archive management more targeted, and accurate analysis of archive data helps to control the vulnerabilities and security risks of the large volumes of data involved;
a patent with application publication date 2020.02.11 and application number 201911155247.X, entitled a blockchain-based archive information security management system and method, discloses a system comprising a unified data management tool, a local archive system database, and a blockchain network. The unified data management tool is communicatively connected to the local archive system database and adapts to multiple types of archive databases; the blockchain network comprises multiple nodes that are communicatively connected to each other and exchange information with the unified data management tool. The unified data management tool acquires the operation log of the local archive system database and unifies it into transaction data that is issued to the blockchain network; the blockchain network performs signature verification and consensus on the received transaction data; and the unified data management tool operates on the local archive system database according to the transaction result. This achieves efficient, automated consensus on database logs and supports archive integrity authentication, operation tracing, and archive data backup;
however, although the above patent ensures the integrity of archives and the traceability of operations, it ignores the usability and privacy security of archives when they are reviewed and shared.
Disclosure of Invention
The application aims to provide an archive information safety supervision method and system based on big data, so as to solve the problems described in the background.
In order to solve the technical problems, the application provides the following technical scheme:
the archive information safety supervision system based on big data comprises: an archive data holder module, an archive state processing module, an archive data analysis module and a privacy class binding module;
the archive data holder module is used for carrying out data segmentation on archives to generate an archive information packet set; generating a file information consulting package set according to the file information package set and combining an operation log of file information;
the archive state processing module is used for marking the archive information in a readable state and in an unreadable state; according to the marking result of the file information, in the file information consulting packet set, respectively adding marking display to the file information from different file information packet sets, and counting the marking display result;
the archive data analysis module is used for taking the archive information packet set as a unified standard data set, taking the consulting behavior as a diversified scale data set and analyzing the correlation among archives guided by the consulting behavior; according to the mark display, analyzing privacy evasiveness among files guided by the consulting behaviors during each consulting operation;
the privacy class binding module judges the risk of influence of the consulting among the archives in each consulting operation according to the relativity among the archives and the privacy evasion, and calibrates the relativity among the archives according to the judging result; and analyzing and outputting privacy class values among files according to the calibration result.
Further, the archive data holder module further comprises a data segmentation unit and an operation log unit;
the data segmentation unit is used for uniformly encoding the archives stored in the archive data holder and denoting any archive as i; the information content contained in any archive i is segmented into data to generate an archive information packet set, denoted $I_i=\{AI_{i1}, AI_{i2}, \ldots, AI_{in}\}$, wherein $I_i$ represents the archive information packet set generated for archive i, and $AI_{i1}, AI_{i2}, \ldots, AI_{in}$ represent the 1st, 2nd, ..., n-th pieces of archive information obtained by segmenting archive i;
the operation log unit is used for retrieving the operation log of the archive information stored in the archive data holder, wherein the operation log includes the number of review operations and all archive information fed back by the archive data holder during each review operation; according to the operation log, all archive information fed back by the archive data holder during the x-th review operation is collected to generate an archive information review packet set, denoted $R_x=\{AI_{i1}, AI_{i2}, \ldots, AI_{in}, \ldots, AI_{j1}, AI_{j2}, \ldots, AI_{jm}\}$, wherein $R_x$ represents the archive information review packet set generated for the x-th review operation, $AI_{j1}, AI_{j2}, \ldots, AI_{jm}$ are the 1st, 2nd, ..., m-th pieces of archive information contained in the archive information packet set $I_j$, j is an archive code, and $i \neq j$.
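The two units above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifier scheme `"<archive code>:<segment index>"`, the function names, and the shape of the operation log (one list of returned identifiers per review operation) are all hypothetical and do not come from the application.

```python
from typing import Dict, List, Set


def segment_archive(code: str, segments: List[str]) -> Set[str]:
    """Return the packet set I_i: one identifier per segment of archive `code`."""
    # Hypothetical naming scheme: "A:1" is the 1st piece of archive "A".
    return {f"{code}:{k}" for k in range(1, len(segments) + 1)}


def review_packet_sets(operation_log: List[List[str]]) -> Dict[int, Set[str]]:
    """Map each review operation number x (1-based) to its review packet set R_x."""
    return {x: set(returned) for x, returned in enumerate(operation_log, start=1)}


# Two toy archives and a log of two review operations (illustrative data only).
packet_sets = {
    "A": segment_archive("A", ["name", "address", "diagnosis"]),
    "B": segment_archive("B", ["name", "salary"]),
}
log = [["A:1", "A:2", "B:1"], ["A:3", "B:1", "B:2"]]
review_sets = review_packet_sets(log)

print(sorted(review_sets[1]))  # ['A:1', 'A:2', 'B:1']
```

Representing both the packet sets and the review packet sets as Python sets keeps the later intersection counts a one-line operation.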
Further, the archive state processing module further comprises a state marking unit and a display statistical unit;
the status marking unit is used for marking the file information in the file data holder in a readable status and an unreadable status, wherein the readable status is the status that the file information can be read in the file data holder, and the unreadable status is the status that the file information cannot be read in the file data holder;
the display statistics unit adds, according to the marking result of the archive information, a mark display to the archive information from different archive information packet sets within the archive information review packet set; according to the mark display, it counts, for the archive information of packet set $I_i$ that appears in the review packet set $R_x$, the numbers of pieces in the readable state and in the unreadable state, recorded as $RS(I_i\mid R_x)$ and $NRS(I_i\mid R_x)$ respectively.
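The counting performed by the display statistics unit can be sketched as follows. The `readable` dictionary (identifier to readable flag) and the function name `count_states` are assumptions made for illustration; the application does not prescribe how the marks are stored.

```python
from typing import Dict, Set, Tuple


def count_states(packet_set: Set[str], review_set: Set[str],
                 readable: Dict[str, bool]) -> Tuple[int, int]:
    """Return (RS, NRS) for the pieces of `packet_set` that appear in `review_set`."""
    shared = packet_set & review_set          # pieces of I_i returned in R_x
    rs = sum(1 for piece in shared if readable[piece])
    nrs = len(shared) - rs                    # everything else is unreadable
    return rs, nrs


# Illustrative data: archive A has three pieces, two of which were returned.
I_a = {"A:1", "A:2", "A:3"}
R_1 = {"A:1", "A:2", "B:1"}
readable = {"A:1": True, "A:2": False, "A:3": True, "B:1": True}

print(count_states(I_a, R_1, readable))  # (1, 1)
```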
Further, the archive data analysis module further comprises a similarity processing unit and an evasiveness processing unit;
the similarity processing unit uses, according to the archive information packet sets and the archive information review packet set, the packet sets as a unified standard data set and the review behavior as a diversified scale data set, analyzes the correlation between archives guided by the review behavior, and calculates the correlation between archives with the following formula:
$$DA(i,j\mid x)=\ln\left\{\frac{NUM(I_i\cap R_x)/NUM(I_i)}{NUM(I_j\cap R_x)/NUM(I_j)}\right\}$$
wherein $DA(i,j\mid x)$ represents the correlation between archive i and archive j at the x-th review operation; $NUM(I_i)$, $NUM(I_j)$, $NUM(I_i\cap R_x)$ and $NUM(I_j\cap R_x)$ represent the numbers of pieces of archive information contained in packet set $I_i$, in packet set $I_j$, in the intersection of $I_i$ with review packet set $R_x$, and in the intersection of $I_j$ with $R_x$, respectively; and $I_i\cap R_x\neq\varnothing$ and $I_j\cap R_x\neq\varnothing$;
the evasiveness processing unit is used for analyzing, according to the mark display, the privacy avoidance between archives guided by the review behavior at the x-th review operation, and for calculating the privacy avoidance between archives with the following formula:
$$PA(i,j\mid x)=\frac{NRS(I_i\mid R_x)/[RS(I_i\mid R_x)+NRS(I_i\mid R_x)]}{NRS(I_j\mid R_x)/[RS(I_j\mid R_x)+NRS(I_j\mid R_x)]}$$
wherein $PA(i,j\mid x)$ represents the privacy avoidance between archive i and archive j at the x-th review operation, and $RS(I_j\mid R_x)$ and $NRS(I_j\mid R_x)$ represent the numbers of pieces of archive information from packet set $I_j$ that are in the readable state and in the unreadable state, respectively, within review packet set $R_x$.
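The two formulas of the archive data analysis module translate directly into code. The sketch below is a minimal illustration, not the applicant's implementation: the function names and sample identifiers are hypothetical, and raising `ValueError` when a denominator would be zero is an assumption, since the text only requires the two intersections to be non-empty and does not say what happens when archive j has no unreadable information in the review packet set.

```python
import math
from typing import Set


def correlation(packet_i: Set[str], packet_j: Set[str], review: Set[str]) -> float:
    """DA(i, j | x): log-ratio of the shares of I_i and I_j returned in R_x."""
    hits_i = len(packet_i & review)
    hits_j = len(packet_j & review)
    if hits_i == 0 or hits_j == 0:
        raise ValueError("both archives must contribute to the review packet set")
    return math.log((hits_i / len(packet_i)) / (hits_j / len(packet_j)))


def privacy_avoidance(rs_i: int, nrs_i: int, rs_j: int, nrs_j: int) -> float:
    """PA(i, j | x): ratio of the unreadable shares of archives i and j in R_x."""
    if rs_i + nrs_i == 0 or rs_j + nrs_j == 0:
        raise ValueError("both archives must contribute to the review packet set")
    if nrs_j == 0:
        # Assumption: the source does not define PA when archive j has no
        # unreadable information in R_x, so this sketch refuses to compute it.
        raise ValueError("PA is undefined when NRS(I_j | R_x) is zero")
    return (nrs_i / (rs_i + nrs_i)) / (nrs_j / (rs_j + nrs_j))


# Illustrative data: 4 of 5 pieces of archive A and 2 of 4 pieces of B returned.
I_a = {"A:1", "A:2", "A:3", "A:4", "A:5"}
I_b = {"B:1", "B:2", "B:3", "B:4"}
R_1 = {"A:1", "A:2", "A:3", "A:4", "B:1", "B:2"}

da = correlation(I_a, I_b, R_1)
pa = privacy_avoidance(rs_i=3, nrs_i=1, rs_j=1, nrs_j=1)
print(round(da, 3), pa)  # 0.47 0.5
```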
Further, the privacy class binding module further comprises a calibration unit and a privacy binding unit;
the calibration unit judges, according to the correlation and the privacy avoidance between the archives, the risk that reviewing one archive affects the other at the x-th review operation, with the following judgment formula:
$$DA(i,j\mid x)\geq PA(i,j\mid x)$$
if, at the x-th review operation, the correlation and the privacy avoidance between the archives satisfy the judgment formula, the correlation between the archives is calibrated so that $DA(i,j\mid x)=CA(i,j\mid x)=PA(i,j\mid x)$; otherwise the correlation is not calibrated, so that $DA(i,j\mid x)=CA(i,j\mid x)$, wherein $CA(i,j\mid x)$ represents the calibrated value of the correlation between the archives;
the privacy binding unit updates, according to the calibration result, the correlation between the archives at each historical review operation, and calculates the privacy class value between the archives, $P(i\mid j)=y^{-1}\sum_{x=1}^{y}CA(i,j\mid x)$; it then outputs the privacy class values between the archives and binds archives i and j to each other through the privacy class value $P(i\mid j)$.
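The calibration and binding steps might be sketched as below. Two points are assumptions rather than statements of the application: the chained equality in the calibration rule is read here as "CA takes the value PA when DA is at least PA, and the value DA otherwise", and y is taken to be the number of historical review operations, which the formula implies but the text does not define. The function names and sample values are hypothetical.

```python
from typing import List, Tuple


def calibrate(da: float, pa: float) -> float:
    """Calibrated correlation CA under one reading of the rule (an assumption):
    CA = PA when DA >= PA, otherwise CA = DA."""
    return pa if da >= pa else da


def privacy_class(history: List[Tuple[float, float]]) -> float:
    """P(i | j): mean of CA over the y historical review operations.

    `history` holds one (DA, PA) pair per review operation.
    """
    if not history:
        raise ValueError("at least one review operation is required")
    calibrated = [calibrate(da, pa) for da, pa in history]
    return sum(calibrated) / len(calibrated)


# Illustrative (DA, PA) pairs for two review operations.
history = [(0.47, 0.50), (1.39, 0.20)]
print(privacy_class(history))
```

Under this reading the calibrated value equals the smaller of DA and PA, so a single highly correlated review operation cannot raise the privacy class value above what its privacy avoidance allows.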
An archive information safety supervision method based on big data comprises the following steps:
step S100: data segmentation is carried out on the files to generate file information package sets; generating a file information consulting package set according to the file information package set and combining an operation log of file information;
step S200: two marks of readable state and unreadable state are carried out on the file information; according to the marking result of the file information, in the file information consulting packet set, respectively adding marking display to the file information from different file information packet sets, and counting the marking display result;
step S300: taking the file information package set as a unified standard data set, taking the consulting behavior as a diversified scale data set, and analyzing the correlation among files guided by the consulting behavior; according to the mark display, analyzing privacy evasiveness among files guided by the consulting behaviors during each consulting operation;
step S400: judging the risk of influence of the consulting among the archives in each consulting operation according to the relativity and privacy evasion among the archives, and calibrating the relativity among the archives according to the judging result; and analyzing and outputting privacy class values among files according to the calibration result.
Further, the specific implementation process of the step S100 includes:
step S101: uniformly encoding the archives stored in the archive data holder, and denoting any archive as i; segmenting the information content contained in any archive i into data to generate an archive information packet set, denoted $I_i=\{AI_{i1}, AI_{i2}, \ldots, AI_{in}\}$, wherein $I_i$ represents the archive information packet set generated for archive i, and $AI_{i1}, AI_{i2}, \ldots, AI_{in}$ represent the 1st, 2nd, ..., n-th pieces of archive information obtained by segmenting archive i;
step S102: retrieving the operation log of the archive information stored in the archive data holder, wherein the operation log includes the number of review operations and all archive information fed back by the archive data holder during each review operation; according to the operation log, collecting all archive information fed back by the archive data holder during the x-th review operation to generate an archive information review packet set, denoted $R_x=\{AI_{i1}, AI_{i2}, \ldots, AI_{in}, \ldots, AI_{j1}, AI_{j2}, \ldots, AI_{jm}\}$, wherein $R_x$ represents the archive information review packet set generated for the x-th review operation, $AI_{j1}, AI_{j2}, \ldots, AI_{jm}$ are the 1st, 2nd, ..., m-th pieces of archive information contained in the archive information packet set $I_j$, j is an archive code, and $i \neq j$.
Further, the specific implementation process of the step S200 includes:
step S201: in the file data holder, two marks of readable states and unreadable states are carried out on file information, wherein the readable states are states in which the file information can be read in the file data holder, and the unreadable states are states in which the file information cannot be read in the file data holder;
step S202: according to the marking result of the archive information, adding a mark display to the archive information from different archive information packet sets within the archive information review packet set; according to the mark display, counting, for the archive information of packet set $I_i$ that appears in the review packet set $R_x$, the numbers of pieces in the readable state and in the unreadable state, recorded as $RS(I_i\mid R_x)$ and $NRS(I_i\mid R_x)$ respectively.
Further, the implementation process of the step S300 includes:
step S301: according to the archive information packet sets and the archive information review packet set, taking the packet sets as a unified standard data set and the review behavior as a diversified scale data set, analyzing the correlation between archives guided by the review behavior, and calculating the correlation between archives with the following formula:
$$DA(i,j\mid x)=\ln\left\{\frac{NUM(I_i\cap R_x)/NUM(I_i)}{NUM(I_j\cap R_x)/NUM(I_j)}\right\}$$
wherein $DA(i,j\mid x)$ represents the correlation between archive i and archive j at the x-th review operation; $NUM(I_i)$, $NUM(I_j)$, $NUM(I_i\cap R_x)$ and $NUM(I_j\cap R_x)$ represent the numbers of pieces of archive information contained in packet set $I_i$, in packet set $I_j$, in the intersection of $I_i$ with review packet set $R_x$, and in the intersection of $I_j$ with $R_x$, respectively; and $I_i\cap R_x\neq\varnothing$ and $I_j\cap R_x\neq\varnothing$;
step S302: according to the mark display, analyzing the privacy avoidance between archives guided by the review behavior at the x-th review operation, and calculating the privacy avoidance between archives with the following formula:
$$PA(i,j\mid x)=\frac{NRS(I_i\mid R_x)/[RS(I_i\mid R_x)+NRS(I_i\mid R_x)]}{NRS(I_j\mid R_x)/[RS(I_j\mid R_x)+NRS(I_j\mid R_x)]}$$
wherein $PA(i,j\mid x)$ represents the privacy avoidance between archive i and archive j at the x-th review operation, and $RS(I_j\mid R_x)$ and $NRS(I_j\mid R_x)$ represent the numbers of pieces of archive information from packet set $I_j$ that are in the readable state and in the unreadable state, respectively, within review packet set $R_x$.
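As an illustration of steps S301 and S302, consider hypothetical counts that are not taken from the application: archive i has 5 pieces of archive information, 4 of which appear in the review packet set (3 readable, 1 unreadable), and archive j has 4 pieces, 2 of which appear in the review packet set (1 readable, 1 unreadable). Substituting into the two formulas gives:

```latex
\begin{aligned}
DA(i,j\mid x) &= \ln\left\{\frac{4/5}{2/4}\right\} = \ln 1.6 \approx 0.470,\\
PA(i,j\mid x) &= \frac{1/(3+1)}{1/(1+1)} = \frac{0.25}{0.5} = 0.5.
\end{aligned}
```

With these counts the correlation is smaller than the privacy avoidance. Note also that the privacy avoidance formula has a zero denominator when archive j contributes no unreadable information to the review packet set; the text above does not state how that case is handled.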
Further, the specific implementation process of the step S400 includes:
step S401: judging, according to the correlation and the privacy avoidance between the archives, the risk that reviewing one archive affects the other at the x-th review operation, with the following judgment formula:
$$DA(i,j\mid x)\geq PA(i,j\mid x)$$
if, at the x-th review operation, the correlation and the privacy avoidance between the archives satisfy the judgment formula, the correlation between the archives is calibrated so that $DA(i,j\mid x)=CA(i,j\mid x)=PA(i,j\mid x)$; otherwise the correlation is not calibrated, so that $DA(i,j\mid x)=CA(i,j\mid x)$, wherein $CA(i,j\mid x)$ represents the calibrated value of the correlation between the archives;
step S402: updating, according to the calibration result, the correlation between the archives at each historical review operation, and calculating the privacy class value between the archives, $P(i\mid j)=y^{-1}\sum_{x=1}^{y}CA(i,j\mid x)$; outputting the privacy class values between the archives;
according to the method, the correlation between archives is obtained by analyzing each review behavior: it is calculated from the archive information retrieved and fed back for sharing over the whole course of a review operation, and it expresses how available the archives are for sharing, as measured against what was actually done during the review. The larger the correlation, the greater the availability between the archives. Archive information often involves a great deal of private data or content, so it has two states, readable and unreadable. When archive information is in the readable state, the content of the document or record can be read and understood easily; it may exist as text, numbers, diagrams, pictures and the like, the information is clear, complete and accurate, and there is no obstacle or restriction, so the relevant personnel can conveniently review and use it. When archive information is in the unreadable state, the content of the document or record cannot be read and understood directly and must first be unlocked with permission. Therefore, although the correlation, that is, the availability, between archives can be obtained from the review behavior, the privacy involved between the archives is not thereby guaranteed, and the privacy avoidance between archives must be analyzed as well, so that when one archive is reviewed, the private content of another archive that may be involved is not associated and inferred through similarity. The greater the privacy avoidance, the greater the risk of privacy disclosure between the archives. The correlation is then calibrated by the privacy avoidance, so that the risk of privacy disclosure between archives is balanced while the availability between archives is preserved.
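A small worked example may make the calibration and averaging concrete. The values are hypothetical; y is taken to be the number of historical review operations, which the text does not define explicitly; and the chained equality in step S401 is read as setting CA to PA when the judgment formula holds. Both readings are assumptions. Suppose y = 2, with DA = 0.470 and PA = 0.5 at the first review operation, and DA = 1.386 and PA = 0.2 at the second:

```latex
\begin{aligned}
x=1:&\quad 0.470 < 0.5 \;\Rightarrow\; CA(i,j\mid 1) = DA(i,j\mid 1) = 0.470,\\
x=2:&\quad 1.386 \geq 0.2 \;\Rightarrow\; CA(i,j\mid 2) = PA(i,j\mid 2) = 0.2,\\
P(i\mid j) &= \frac{1}{2}\,(0.470 + 0.2) = 0.335.
\end{aligned}
```

Under this reading the calibrated value never exceeds the privacy avoidance of the same review operation.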
Compared with the prior art, the application has the following beneficial effects: in the archive information safety supervision method and system based on big data, the archive is subjected to data segmentation to generate an archive information package set, and an archive information consulting package set is generated by combining an operation log of archive information; two kinds of marks of readable state and unreadable state are carried out on the file information, and in the file information consulting packet set, marks are respectively added to the file information from different file information packet sets for display; taking the file information package set as a unified standard data set, taking the consulting behavior as a diversified scale data set, and analyzing the correlation and privacy evasion between files guided by the consulting behavior; judging the risk of influence of the consultation between the files in each consultation operation, calibrating the correlation between the files, analyzing and outputting privacy class values between the files; and further, the safety of the archive information is ensured while the sharing availability among the archive information is ensured.
Drawings
The accompanying drawings are included to provide a further understanding of the application and constitute a part of this specification; together with the embodiments of the application, they serve to explain the application.
In the drawings: FIG. 1 is a schematic diagram of an archive information safety supervision system based on big data according to the present application;
FIG. 2 is a schematic diagram of the steps of an archive information safety supervision method based on big data according to the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1-2, the present application provides the following technical solutions:
referring to FIG. 1, in a first embodiment: an archive information safety supervision system based on big data is provided, comprising: an archive data holder module, an archive state processing module, an archive data analysis module and a privacy class binding module;
the archive data holder module is used for carrying out data segmentation on archives to generate an archive information packet set; generating a file information consulting package set according to the file information package set and combining an operation log of file information;
the archive data holder module further comprises a data segmentation unit and an operation log unit;
the data segmentation unit is used for uniformly encoding the archives stored in the archive data holder and denoting any archive as i; the information content contained in any archive i is segmented into data to generate an archive information packet set, denoted $I_i=\{AI_{i1}, AI_{i2}, \ldots, AI_{in}\}$, wherein $I_i$ represents the archive information packet set generated for archive i, and $AI_{i1}, AI_{i2}, \ldots, AI_{in}$ represent the 1st, 2nd, ..., n-th pieces of archive information obtained by segmenting archive i;
the operation log unit is used for retrieving the operation log of the archive information stored in the archive data holder, wherein the operation log includes the number of review operations and all archive information fed back by the archive data holder during each review operation; according to the operation log, all archive information fed back by the archive data holder during the x-th review operation is collected to generate an archive information review packet set, denoted $R_x=\{AI_{i1}, AI_{i2}, \ldots, AI_{in}, \ldots, AI_{j1}, AI_{j2}, \ldots, AI_{jm}\}$, wherein $R_x$ represents the archive information review packet set generated for the x-th review operation, $AI_{j1}, AI_{j2}, \ldots, AI_{jm}$ are the 1st, 2nd, ..., m-th pieces of archive information contained in the archive information packet set $I_j$, j is an archive code, and $i \neq j$;
the file state processing module is used for marking file information in a readable state and in an unreadable state; according to the marking result of the file information, in the file information consulting packet set, respectively adding marking display to the file information from different file information packet sets, and counting the marking display result;
the archive state processing module further comprises a state marking unit and a display statistical unit;
the state marking unit is used for marking the file information in the file data holder in a readable state and an unreadable state, wherein the readable state is a state that the file information can be read in the file data holder, and the unreadable state is a state that the file information cannot be read in the file data holder;
the display statistics unit adds, according to the marking result of the archive information, a mark display to the archive information from different archive information packet sets within the archive information review packet set; according to the mark display, it counts, for the archive information of packet set $I_i$ that appears in the review packet set $R_x$, the numbers of pieces in the readable state and in the unreadable state, recorded as $RS(I_i\mid R_x)$ and $NRS(I_i\mid R_x)$ respectively;
the archive data analysis module is used for taking the archive information packet set as a unified standard data set, taking the consulting behavior as a diversified scale data set and analyzing the correlation among archives guided by the consulting behavior; according to the mark display, analyzing privacy evasiveness among files guided by the consulting behaviors during each consulting operation;
the archive data analysis module further comprises a similarity processing unit and an evasiveness processing unit;
the similarity processing unit uses, according to the archive information packet sets and the archive information review packet set, the packet sets as a unified standard data set and the review behavior as a diversified scale data set, analyzes the correlation between archives guided by the review behavior, and calculates the correlation between archives with the following formula:
$$DA(i,j\mid x)=\ln\left\{\frac{NUM(I_i\cap R_x)/NUM(I_i)}{NUM(I_j\cap R_x)/NUM(I_j)}\right\}$$
wherein $DA(i,j\mid x)$ represents the correlation between archive i and archive j at the x-th review operation; $NUM(I_i)$, $NUM(I_j)$, $NUM(I_i\cap R_x)$ and $NUM(I_j\cap R_x)$ represent the numbers of pieces of archive information contained in packet set $I_i$, in packet set $I_j$, in the intersection of $I_i$ with review packet set $R_x$, and in the intersection of $I_j$ with $R_x$, respectively; and $I_i\cap R_x\neq\varnothing$ and $I_j\cap R_x\neq\varnothing$;
the evasiveness processing unit is used for analyzing, according to the mark display, the privacy avoidance between archives guided by the review behavior at the x-th review operation, and for calculating the privacy avoidance between archives with the following formula:
$$PA(i,j\mid x)=\frac{NRS(I_i\mid R_x)/[RS(I_i\mid R_x)+NRS(I_i\mid R_x)]}{NRS(I_j\mid R_x)/[RS(I_j\mid R_x)+NRS(I_j\mid R_x)]}$$
wherein $PA(i,j\mid x)$ represents the privacy avoidance between archive i and archive j at the x-th review operation, and $RS(I_j\mid R_x)$ and $NRS(I_j\mid R_x)$ represent the numbers of pieces of archive information from packet set $I_j$ that are in the readable state and in the unreadable state, respectively, within review packet set $R_x$;
the privacy class binding module judges the risk of influence of the consulting among the archives in each consulting operation according to the relativity among the archives and the privacy evasion, and calibrates the relativity among the archives according to the judging result; analyzing and outputting privacy class values among files according to the calibration result;
the privacy class binding module further comprises a calibration unit and a privacy binding unit;
the calibration unit judges the risk of influence of the reference among the files in the x-th reference operation according to the correlation among the files and the privacy avoidance among the files, and the specific judgment formula is as follows:
DA(i,j|x)≥PA(i,j|x)
if, in the x-th consulting operation, the correlation between the files and the privacy avoidance between the files satisfy the judgment formula, the correlation between the files is calibrated so that DA(i,j|x)=CA(i,j|x)=PA(i,j|x); otherwise the correlation between the files is not calibrated, so that DA(i,j|x)=CA(i,j|x), wherein CA(i,j|x) represents the calibrated value of the correlation between the files;
the privacy binding unit updates the correlation between archives at each of the historical review operations according to the calibration result, and calculates the privacy class value between archives, P(i|j)=y^(-1)·Σ_{x=1}^{y}CA(i,j|x); the privacy class values between the archives are output, and archive i and archive j are bound to each other through the privacy class value P(i|j).
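As a worked illustration of the three formulas above, consider one consulting operation x with purely hypothetical counts (none of these numbers come from the application): NUM(Ii)=4, NUM(Ii∩Rx)=2, NUM(Ij)=5, NUM(Ij∩Rx)=1, RS(Ii|Rx)=1, NRS(Ii|Rx)=1, RS(Ij|Rx)=0 and NRS(Ij|Rx)=1. Under these assumptions the calculation runs as follows:

```latex
\begin{align*}
DA(i,j|x) &= \ln\frac{NUM(I_i \cap R_x)/NUM(I_i)}{NUM(I_j \cap R_x)/NUM(I_j)}
           = \ln\frac{2/4}{1/5} = \ln 2.5 \approx 0.916,\\
PA(i,j|x) &= \frac{NRS(I_i|R_x)/[RS(I_i|R_x)+NRS(I_i|R_x)]}
                  {NRS(I_j|R_x)/[RS(I_j|R_x)+NRS(I_j|R_x)]}
           = \frac{1/(1+1)}{1/(0+1)} = 0.5,\\
DA(i,j|x) &\ge PA(i,j|x) \;\Rightarrow\; CA(i,j|x) = PA(i,j|x) = 0.5 .
\end{align*}
```

Here the judgment formula holds, so the calibrated value for this operation is the privacy avoidance degree rather than the raw degree of correlation.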
Referring to fig. 2, in the second embodiment: the archive information safety supervision method based on big data comprises the following steps:
data segmentation is carried out on the files to generate file information package sets; generating a file information consulting package set according to the file information package set and combining an operation log of file information;
uniformly encoding the files stored in the file data holder, and denoting any one file as i; performing data segmentation on the information content contained in any one file i to generate a file information packet set, denoted Ii={AIi1, AIi2, ..., AIin}, wherein Ii represents the file information packet set generated for file i, and AIi1, AIi2, ..., AIin respectively represent the 1st, 2nd, ..., n-th pieces of file information obtained by data segmentation of file i;
retrieving the operation log of file information stored in the file data cloud platform, wherein the operation log comprises the number of consulting operations and all file information fed back in the file data cloud platform during each consulting operation; according to the operation log, uniformly retrieving all file information fed back by the file data cloud platform during the x-th consulting operation, and generating a file information consulting packet set, denoted Rx={AIi1, AIi2, ..., AIin, ..., AIj1, AIj2, ..., AIjm}, wherein Rx represents the file information consulting packet set generated for the x-th consulting operation, AIj1, AIj2, ..., AIjm are respectively the 1st, 2nd, ..., m-th pieces of file information contained in file information packet set Ij, j is a file code, and i≠j;
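The segmentation and log-retrieval steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the source does not say how a file is segmented, so fixed-length slicing is used purely for illustration, and the names `Packet`, `segment_archive` and `review_packet_set`, as well as the dictionary layout of the operation log, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """One piece of file information AI_ik (hypothetical representation)."""
    archive_id: str  # file code, e.g. "i" or "j"
    index: int       # 1-based position k of the piece within its file
    content: str


def segment_archive(archive_id: str, text: str, piece_len: int) -> set[Packet]:
    """Build the packet set I_i by slicing the content into pieces.

    Fixed-length slicing is an assumption; the source only states that the
    content is divided into n pieces.
    """
    if piece_len <= 0:
        raise ValueError("piece_len must be positive")
    return {
        Packet(archive_id, k + 1, text[start:start + piece_len])
        for k, start in enumerate(range(0, len(text), piece_len))
    }


def review_packet_set(operation_log: dict[int, list[Packet]], x: int) -> set[Packet]:
    """Build R_x: every packet fed back during the x-th consulting operation."""
    return set(operation_log[x])
```

Because the packets are hashable, the intersections Ii∩Rx and Ij∩Rx used later reduce to ordinary set intersections.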
marking the file information with one of two states, readable and unreadable; according to the marking result of the file information, adding mark display, within the file information consulting packet set, to the file information from different file information packet sets, and counting the mark display results;
in the file data holder, the file information is marked as readable or unreadable, wherein the readable state is the state in which the file information can be read in the file data holder, and the unreadable state is the state in which the file information cannot be read in the file data holder;
according to the marking result of the file information, adding mark display, within the file information reference packet set, to the file information from different file information packet sets; and, according to the mark display, counting the numbers of pieces of file information from file information packet set Ii that are in the readable state and in the unreadable state within the file information reference packet set Rx, denoted RS(Ii|Rx) and NRS(Ii|Rx) respectively;
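The counting step can be sketched as follows. This is an illustrative sketch only: it assumes each piece of file information is identified by a hashable key and that the readable/unreadable marks are held in a dictionary; the function name `count_states` and that data layout are hypothetical.

```python
def count_states(packet_set, review_set, readable):
    """Return (RS, NRS) for the pieces of packet_set that appear in review_set.

    packet_set -- pieces of one file (I_i)
    review_set -- pieces fed back in one consulting operation (R_x)
    readable   -- dict mapping each piece to True (readable) or False (unreadable)
    """
    fed_back = packet_set & review_set          # pieces of I_i present in R_x
    rs = sum(1 for piece in fed_back if readable[piece])
    nrs = len(fed_back) - rs                    # the rest are unreadable
    return rs, nrs
```

For example, with `I_i = {"i1", "i2", "i3", "i4"}`, `R_x = {"i1", "i2", "j1"}` and only `"i2"` marked unreadable among the pieces of file i, the call returns one readable and one unreadable piece.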
taking the file information package set as a unified standard data set, taking the consulting behavior as a diversified scale data set, and analyzing the correlation among files guided by the consulting behavior; according to the mark display, analyzing privacy evasiveness among files guided by the consulting behaviors during each consulting operation;
according to the file information package set and the file information consulting package set, taking the file information package set as a unified standard data set, taking the consulting behavior as a diversified scale data set, analyzing the correlation among files guided by the consulting behavior, and calculating the correlation among files, wherein the specific calculation formula is as follows:
DA(i,j|x)=ln{NUM(Ii∩Rx)/NUM(Ii)÷[NUM(Ij∩Rx)/NUM(Ij)]}
wherein DA(i,j|x) represents the degree of correlation between file i and file j at the x-th reference operation; NUM(Ii), NUM(Ij), NUM(Ii∩Rx) and NUM(Ij∩Rx) respectively represent the amounts of file information contained in file information packet set Ii, in file information packet set Ij, in the intersection between file information packet set Ii and file information reference packet set Rx, and in the intersection between file information packet set Ij and file information reference packet set Rx; and Ii∩Rx≠∅ and Ij∩Rx≠∅;
in the present application, the degree of correlation between files is derived from the principle of differential privacy and is calculated with a modified differential privacy formula; in essence, the degree of correlation plays the role of the privacy budget in differential privacy, which represents the probability of similarity obtained when file information packet sets Ii and Ij are each compared with the result of the reference operation, namely the file information reference packet set Rx; the higher the degree of correlation, that is, the higher the privacy budget, the closer file information packet sets Ii and Ij each are to the file information reference packet set Rx;
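The degree-of-correlation formula can be sketched directly with set operations. This is a minimal sketch assuming the packet sets are Python sets of hashable pieces; the function name `correlation` is hypothetical, and raising an error for an empty intersection is an illustrative way of enforcing the stated conditions Ii∩Rx≠∅ and Ij∩Rx≠∅.

```python
import math


def correlation(I_i, I_j, R_x):
    """DA(i,j|x) = ln{ [NUM(Ii∩Rx)/NUM(Ii)] / [NUM(Ij∩Rx)/NUM(Ij)] }."""
    hits_i = len(I_i & R_x)
    hits_j = len(I_j & R_x)
    if hits_i == 0 or hits_j == 0:
        # The formula is only defined when both intersections are non-empty.
        raise ValueError("both files must contribute pieces to R_x")
    share_i = hits_i / len(I_i)   # fraction of file i fed back in operation x
    share_j = hits_j / len(I_j)   # fraction of file j fed back in operation x
    return math.log(share_i / share_j)
```

The result is zero when both files are fed back in equal proportion, positive when file i is fed back in a larger proportion than file j, and negative in the opposite case.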
according to the mark display, the privacy avoidance between files guided by the consulting behavior in the x-th consulting operation is analyzed, the privacy avoidance between files is calculated, and a specific calculation formula is as follows:
PA(i,j|x)=NRS(Ii|Rx)/[RS(Ii|Rx)+NRS(Ii|Rx)]÷{NRS(Ij|Rx)/[RS(Ij|Rx)+NRS(Ij|Rx)]}
wherein PA(i,j|x) represents the privacy avoidance degree between archive i and archive j at the x-th review operation, and RS(Ij|Rx) and NRS(Ij|Rx) respectively represent the numbers of pieces of archive information from archive information packet set Ij that are in the readable state and in the unreadable state within the archive information review packet set Rx;
in the present application, the privacy avoidance degree is used to further calibrate the privacy budget in the differential privacy formula; adjusting the privacy budget, that is, the degree of correlation, changes the similarity probability, so that both the usability of the files and the risk of privacy disclosure are taken into account;
while file information is being consulted, the consulting behavior can be used to attack the private content of another file; the present application therefore uses the modified differential privacy technique to calibrate the similarity between files and adds the privacy avoidance risk, thereby ensuring both the usability of the files and the privacy between files during consulting;
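The privacy avoidance formula can be sketched from the four readable/unreadable counts. This is an illustrative sketch: the function name `privacy_avoidance` is hypothetical, and the source does not say what happens when file j has no unreadable piece in Rx (the divisor becomes zero), so raising an error in that case is an assumption.

```python
def privacy_avoidance(rs_i, nrs_i, rs_j, nrs_j):
    """PA(i,j|x) = {NRS_i/(RS_i+NRS_i)} / {NRS_j/(RS_j+NRS_j)}.

    rs_*  -- number of readable pieces of the file found in R_x
    nrs_* -- number of unreadable pieces of the file found in R_x
    """
    if rs_i + nrs_i == 0 or rs_j + nrs_j == 0:
        raise ValueError("each file must have at least one piece in R_x")
    if nrs_j == 0:
        # Assumption: the ratio is treated as undefined rather than infinite.
        raise ValueError("file j has no unreadable piece in R_x")
    unreadable_share_i = nrs_i / (rs_i + nrs_i)
    unreadable_share_j = nrs_j / (rs_j + nrs_j)
    return unreadable_share_i / unreadable_share_j
```

A value below 1 means that, in this operation, a smaller share of the fed-back pieces of file i was unreadable than of file j.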
judging the risk of influence of the consulting among the archives in each consulting operation according to the relativity and privacy evasion among the archives, and calibrating the relativity among the archives according to the judging result; analyzing and outputting privacy class values among files according to the calibration result;
judging the risk of influence of the reference among the files in the x-th reference operation according to the correlation among the files and the privacy avoidance among the files, wherein a specific judgment formula is as follows:
DA(i,j|x)≥PA(i,j|x)
if, in the x-th consulting operation, the correlation between the files and the privacy avoidance between the files satisfy the judgment formula, the correlation between the files is calibrated so that DA(i,j|x)=CA(i,j|x)=PA(i,j|x); otherwise the correlation between the files is not calibrated, so that DA(i,j|x)=CA(i,j|x), wherein CA(i,j|x) represents the calibrated value of the correlation between the files;
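The calibration rule can be sketched as a single comparison. This sketch reads the rule as "use PA when DA≥PA, otherwise keep DA", which is one interpretation of the text above; the function name `calibrate` is hypothetical.

```python
def calibrate(da, pa):
    """Return CA(i,j|x) for one consulting operation.

    If DA >= PA the correlation is calibrated down to PA;
    otherwise it is left unchanged.
    """
    if da >= pa:
        return pa   # calibrated: CA = PA
    return da       # not calibrated: CA = DA
```

Under this reading the calibrated value is always the smaller of the two inputs, so the privacy avoidance degree acts as an upper bound on the degree of correlation.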
updating the correlation between the files at each of the historical reference operations according to the calibration result, and calculating the privacy class value between the files, P(i|j)=y^(-1)·Σ_{x=1}^{y}CA(i,j|x); the privacy class values between the files are output.
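The final aggregation can be sketched as an average of the calibrated values. This sketch assumes that y is the number of historical consulting operations covered by the operation log, which this passage does not state explicitly; the function name `privacy_class_value` is hypothetical.

```python
def privacy_class_value(calibrated_values):
    """P(i|j) = (1/y) * sum of CA(i,j|x) for x = 1..y.

    calibrated_values -- sequence of CA(i,j|x), one per consulting operation
    """
    y = len(calibrated_values)
    if y == 0:
        raise ValueError("at least one consulting operation is required")
    return sum(calibrated_values) / y
```

For instance, calibrated values of 0.5, 0.2 and 0.8 over three operations average to a privacy class value of about 0.5.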
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that the foregoing description is only a preferred embodiment of the present application and is not intended to limit the present application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of the technical features thereof. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (4)

1. The archive information safety supervision method based on big data is characterized by comprising the following steps of:
step S100: data segmentation is carried out on the files to generate file information package sets; generating a file information consulting package set according to the file information package set and combining an operation log of file information;
step S200: two marks of readable state and unreadable state are carried out on the file information; according to the marking result of the file information, in the file information consulting packet set, respectively adding marking display to the file information from different file information packet sets, and counting the marking display result;
step S300: taking the file information package set as a unified standard data set, taking the consulting behavior as a diversified scale data set, and analyzing the correlation among files guided by the consulting behavior; according to the mark display, analyzing privacy evasiveness among files guided by the consulting behaviors during each consulting operation;
step S400: judging the risk of influence of the consulting among the archives in each consulting operation according to the relativity and privacy evasion among the archives, and calibrating the relativity among the archives according to the judging result; analyzing and outputting privacy class values among files according to the calibration result;
the specific implementation process of the step S100 includes:
step S101: uniformly encoding the files stored in the file data holder, and denoting any one file as i; performing data segmentation on the information content contained in any one file i to generate a file information packet set, denoted Ii={AIi1, AIi2, ..., AIin}, wherein Ii represents the file information packet set generated for file i, and AIi1, AIi2, ..., AIin respectively represent the 1st, 2nd, ..., n-th pieces of archive information obtained by dividing file i;
step S102: retrieving the operation log of file information stored in the file data cloud platform, wherein the operation log comprises the number of consulting operations and all file information fed back in the file data cloud platform during each consulting operation; according to the operation log, uniformly retrieving all file information fed back by the file data cloud platform during the x-th consulting operation, and generating a file information consulting packet set, denoted Rx={AIi1, AIi2, ..., AIin, ..., AIj1, AIj2, ..., AIjm}, wherein Rx represents the file information consulting packet set generated for the x-th consulting operation, AIj1, AIj2, ..., AIjm are respectively the 1st, 2nd, ..., m-th pieces of archive information contained in file information packet set Ij, j is a file code, and i≠j;
the specific implementation process of the step S200 includes:
step S201: in the file data holder, two marks of readable states and unreadable states are carried out on file information, wherein the readable states are states in which the file information can be read in the file data holder, and the unreadable states are states in which the file information cannot be read in the file data holder;
step S202: according to the marking result of the file information, adding mark display, within the file information reference packet set, to the file information from different file information packet sets; and, according to the mark display, counting the numbers of pieces of file information from file information packet set Ii that are in the readable state and in the unreadable state within the file information reference packet set Rx, denoted RS(Ii|Rx) and NRS(Ii|Rx) respectively;
The specific implementation process of the step S300 includes:
step S301: according to the file information package set and the file information consulting package set, taking the file information package set as a unified standard data set, taking the consulting behavior as a diversified scale data set, analyzing the correlation among files guided by the consulting behavior, and calculating the correlation among files, wherein the specific calculation formula is as follows:
DA(i,j|x)=ln{NUM(Ii∩Rx)/NUM(Ii)÷[NUM(Ij∩Rx)/NUM(Ij)]}
wherein DA(i,j|x) represents the degree of correlation between file i and file j at the x-th review operation; NUM(Ii), NUM(Ij), NUM(Ii∩Rx) and NUM(Ij∩Rx) respectively represent the amounts of file information contained in file information packet set Ii, in file information packet set Ij, in the intersection between file information packet set Ii and file information reference packet set Rx, and in the intersection between file information packet set Ij and file information reference packet set Rx; and Ii∩Rx≠∅ and Ij∩Rx≠∅;
Step S302: according to the mark display, the privacy avoidance between files guided by the consulting behavior in the x-th consulting operation is analyzed, the privacy avoidance between files is calculated, and a specific calculation formula is as follows:
PA(i,j|x)=NRS(Ii|Rx)/[RS(Ii|Rx)+NRS(Ii|Rx)]÷{NRS(Ij|Rx)/[RS(Ij|Rx)+NRS(Ij|Rx)]}
wherein PA(i,j|x) represents the privacy avoidance degree between file i and file j at the x-th review operation, and RS(Ij|Rx) and NRS(Ij|Rx) respectively represent the numbers of pieces of file information from file information packet set Ij that are in the readable state and in the unreadable state within the file information reference packet set Rx.
2. The archive information security supervision method based on big data according to claim 1, wherein the specific implementation process of step S400 includes:
step S401: judging the risk of influence of the reference among the files in the x-th reference operation according to the correlation among the files and the privacy avoidance among the files, wherein a specific judgment formula is as follows:
DA(i,j|x)≥PA(i,j|x)
if, in the x-th consulting operation, the correlation between the files and the privacy avoidance between the files satisfy the judgment formula, the correlation between the files is calibrated so that DA(i,j|x)=CA(i,j|x)=PA(i,j|x); otherwise the correlation between the files is not calibrated, so that DA(i,j|x)=CA(i,j|x), wherein CA(i,j|x) represents the calibrated value of the correlation between the files;
step S402: updating the correlation between the files at each of the historical reference operations according to the calibration result, and calculating the privacy class value between the files, P(i|j)=y^(-1)·Σ_{x=1}^{y}CA(i,j|x); the privacy class values between the files are output.
3. An archive information security supervision system based on big data, characterized in that the system comprises: an archive data holder module, an archive state processing module, an archive data analysis module and a privacy class binding module;
the archive data holder module is used for carrying out data segmentation on archives to generate an archive information packet set; generating a file information consulting package set according to the file information package set and combining an operation log of file information;
the archive state processing module is used for marking the archive information in a readable state and in an unreadable state; according to the marking result of the file information, in the file information consulting packet set, respectively adding marking display to the file information from different file information packet sets, and counting the marking display result;
the archive data analysis module is used for taking the archive information packet set as a unified standard data set, taking the consulting behavior as a diversified scale data set and analyzing the correlation among archives guided by the consulting behavior; according to the mark display, analyzing privacy evasiveness among files guided by the consulting behaviors during each consulting operation;
the privacy class binding module judges the risk of influence of the consulting among the archives in each consulting operation according to the relativity among the archives and the privacy evasion, and calibrates the relativity among the archives according to the judging result; analyzing and outputting privacy class values among files according to the calibration result;
the archive data holder module further comprises a data segmentation unit and an operation log unit;
the data segmentation unit is used for uniformly encoding the files stored in the file data holder and denoting any one file as i; performing data segmentation on the information content contained in any one file i to generate a file information packet set, denoted Ii={AIi1, AIi2, ..., AIin}, wherein Ii represents the file information packet set generated for file i, and AIi1, AIi2, ..., AIin respectively represent the 1st, 2nd, ..., n-th pieces of archive information obtained by dividing file i;
the operation log unit is configured to retrieve the operation log of file information stored in the file data holder, where the operation log includes the number of review operations and all file information fed back in the file data holder during each review operation, and, according to the operation log, to uniformly retrieve all file information fed back by the file data holder during the x-th review operation and generate a file information review packet set, denoted Rx={AIi1, AIi2, ..., AIin, ..., AIj1, AIj2, ..., AIjm}, wherein Rx represents the file information review packet set generated for the x-th review operation, AIj1, AIj2, ..., AIjm are respectively the 1st, 2nd, ..., m-th pieces of archive information contained in file information packet set Ij, j is a file code, and i≠j;
the archive state processing module further comprises a state marking unit and a display statistical unit;
the status marking unit is used for marking the file information in the file data holder in a readable status and an unreadable status, wherein the readable status is the status that the file information can be read in the file data holder, and the unreadable status is the status that the file information cannot be read in the file data holder;
the display statistics unit adds mark display, within the file information consulting packet set, to the file information from different file information packet sets according to the marking result of the file information, and, according to the mark display, counts the numbers of pieces of file information from file information packet set Ii that are in the readable state and in the unreadable state within the file information reference packet set Rx, denoted RS(Ii|Rx) and NRS(Ii|Rx) respectively;
The archive data analysis module further comprises a similarity processing unit and an evasiveness processing unit;
the similarity processing unit, according to the file information packet set and the file information reference packet set, takes the file information packet set as a unified standard data set and the reference behavior as a diversified scale data set, analyzes the correlation between files guided by the reference behavior, and calculates the degree of correlation between files, wherein the specific calculation formula is as follows:
DA(i,j|x)=ln{NUM(Ii∩Rx)/NUM(Ii)÷[NUM(Ij∩Rx)/NUM(Ij)]}
wherein DA(i,j|x) represents the degree of correlation between file i and file j at the x-th review operation; NUM(Ii), NUM(Ij), NUM(Ii∩Rx) and NUM(Ij∩Rx) respectively represent the amounts of file information contained in file information packet set Ii, in file information packet set Ij, in the intersection between file information packet set Ii and file information reference packet set Rx, and in the intersection between file information packet set Ij and file information reference packet set Rx; and Ii∩Rx≠∅ and Ij∩Rx≠∅;
The evasiveness processing unit is used for analyzing privacy evasiveness among files guided by the consulting behaviors in the x-th consulting operation according to the mark display, and calculating privacy evasiveness among files, wherein a specific calculation formula is as follows:
PA(i,j|x)=NRS(Ii|Rx)/[RS(Ii|Rx)+NRS(Ii|Rx)]÷{NRS(Ij|Rx)/[RS(Ij|Rx)+NRS(Ij|Rx)]}
wherein PA(i,j|x) represents the privacy avoidance degree between file i and file j at the x-th review operation, and RS(Ij|Rx) and NRS(Ij|Rx) respectively represent the numbers of pieces of file information from file information packet set Ij that are in the readable state and in the unreadable state within the file information reference packet set Rx.
4. The archive information security supervision system based on big data according to claim 3, wherein the privacy class binding module further comprises a calibration unit and a privacy binding unit;
the calibration unit judges the risk of influence of the reference among the files in the x-th reference operation according to the correlation among the files and the privacy avoidance among the files, and the specific judgment formula is as follows:
DA(i,j|x)≥PA(i,j|x)
if, in the x-th consulting operation, the correlation between the files and the privacy avoidance between the files satisfy the judgment formula, the correlation between the files is calibrated so that DA(i,j|x)=CA(i,j|x)=PA(i,j|x); otherwise the correlation between the files is not calibrated, so that DA(i,j|x)=CA(i,j|x), wherein CA(i,j|x) represents the calibrated value of the correlation between the files;
the privacy binding unit updates the correlation between files at each of the historical reference operations according to the calibration result, and calculates the privacy class value between files, P(i|j)=y^(-1)·Σ_{x=1}^{y}CA(i,j|x); the privacy class values between the archives are output, and archive i and archive j are bound to each other through the privacy class value P(i|j).
CN202311077570.6A 2023-08-25 2023-08-25 File information safety supervision method and system based on big data Active CN116796372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311077570.6A CN116796372B (en) 2023-08-25 2023-08-25 File information safety supervision method and system based on big data

Publications (2)

Publication Number Publication Date
CN116796372A (en) 2023-09-22
CN116796372B (en) 2023-11-28

Family

ID=88048383

Country Status (1)

Country Link
CN (1) CN116796372B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant