CN115438015A

CN115438015A - Computer file storage system and method based on block chain

Info

Publication number: CN115438015A
Application number: CN202211127852.8A
Authority: CN
Inventors: 邓成正
Original assignee: Individual
Current assignee: Individual
Priority date: 2022-09-16
Filing date: 2022-09-16
Publication date: 2022-12-06

Abstract

The invention discloses a computer file storage system and method based on a block chain. The system comprises a file uploading module, a data storage, a file processing module, a blockchain module and a file calling module. The file uploading module authenticates the identity of the user, collects the files uploaded by the user, and sends the collected files to the data storage. The data storage stores all the collected user information and file data. The file processing module generates a file characteristic value and an access link, classifies the files and performs hash processing, and uploads the processed data to the blockchain module. The blockchain module reads the address, encrypts it and records it on the chain, and sends it to the file calling module. After the file calling module identifies the identity of the user, the user inputs keywords and the corresponding file information is extracted. This addresses the problems that enterprise files are large in number, slow to search, easily lost and easily tampered with.

Description

Computer file storage system and method based on block chain

Technical Field

The invention relates to the technical field of computer file storage, in particular to a computer file storage system and method based on a block chain.

Background

From the moment an enterprise is established, it begins to accumulate information, and as the enterprise develops over time the amount of information grows ever larger. Company files are the true record and accumulated knowledge of activities such as research and development, production and operation; office documents, archives and electronic files are important knowledge assets of the enterprise and play a very important role in enterprise operation and management.

However, files are scattered across different computers, servers or systems; their number is huge, unified backup is difficult, and searching is slow and inefficient. Moreover, relatively sensitive documents cannot be monitored for security, so data may be lost or leaked, which threatens the security of the documents. Traditional paper file management has its own problems: paper files are easily lost and hard to preserve, and an important file cannot be restored once damaged.

Therefore, there is a need for a blockchain-based computer file storage system and method that can solve the above problems, monitor file storage and usage through identification, and prevent file tampering through blockchain technology.

Disclosure of Invention

The present invention is directed to a system and method for storing a computer file based on a block chain, so as to solve the problems mentioned in the background art.

In order to solve the above technical problems, the invention provides the following technical solution: a blockchain-based computer file storage system comprising a file uploading module, a data storage, a file processing module, a blockchain module and a file calling module, wherein:

the file uploading module authenticates the identity of the user, collects the files uploaded by the user, and sends the collected files to the data storage;

the data storage stores all the collected user information and file data;

the file processing module generates a file characteristic value and an access link, classifies the files and performs hash processing, and uploads the processed data to the blockchain module;

the blockchain module reads the address, encrypts it and records it on the chain, and sends it to the file calling module;

the file calling module identifies the identity of the user, receives the keywords input by the user, and extracts the corresponding file information.

Further, the file uploading module comprises a user information collecting unit and a file content uploading unit; the user information collecting unit is used for collecting the identity information of a user who uploads a file; the file content uploading unit is used for collecting file information and uploading the file information to the data storage.

Further, the file processing module comprises a file processing unit, a file classification unit and a file hash unit; the file processing unit generates a file characteristic value and a file access link after processing a file; the file classification unit classifies and summarizes all files according to the file characteristic values; the file hash unit maps each file characteristic value and its corresponding file access link to a data storage address by using separate chaining;

the file processing unit comprises a characteristic value generation subunit and an access link generation subunit; the characteristic value generation subunit projects the data to a low-dimensional space by using an LDA (linear discriminant analysis) model and then extracts keywords to obtain the file characteristic value, so that data of the same class are as compact as possible and data of different classes are as dispersed as possible; the access link generation subunit generates a link for each file by using a link generator.

Further, the blockchain module comprises a reading unit, an encryption unit and an on-chain unit; the reading unit is used for reading the address from the file hash unit; the encryption unit encrypts the data by using an asymmetric encryption algorithm and sends the encrypted data to the on-chain unit; and the on-chain unit collects the data by using a blockchain collection node, so as to store the file characteristic value and the file access link.

Further, the file calling module comprises an identity authentication unit and a data extraction unit; the identity authentication unit is used for identifying the identity information of company personnel and narrowing the range of file extraction; and the data extraction unit displays the information content of the file after the user inputs a keyword and clicks the link that appears.

A block chain based computer file storage method comprises the following steps:

S1: performing identity authentication on a user, and collecting the files uploaded by the user;

S2: storing all the collected user information and file data;

S3: generating a file characteristic value and an access link, and carrying out classification and hash processing;

S4: reading the address, encrypting it, and recording it on the chain;

S5: identifying the identity of the user, receiving the input keywords, and extracting the corresponding file information.

Further, in step S1: the identity information of the user, including the employee's name, department and the like, is verified through the employee number to judge whether the user is a company employee; if so, the user can enter the system and upload files; otherwise, the user cannot enter the system.

Further, in step S2: all the collected user identity information and file data are stored in the data storage.
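The identity check in step S1 can be sketched as a lookup against a staff directory. This is a minimal illustration under stated assumptions, not the patented implementation: the `Employee` record, the `STAFF_DIRECTORY` contents and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Employee:
    employee_id: str
    name: str
    department: str


# Hypothetical in-memory staff directory; a real system would query an HR database.
STAFF_DIRECTORY = {
    "E1001": Employee("E1001", "Employee A", "Finance"),
    "E1002": Employee("E1002", "Employee B", "Research"),
}


def authenticate(employee_id: str, name: str, department: str) -> Optional[Employee]:
    """Return the matching employee record, or None if any field does not match."""
    record = STAFF_DIRECTORY.get(employee_id)
    if record is None:
        return None
    if record.name != name or record.department != department:
        return None
    return record


def upload_file(employee_id: str, name: str, department: str,
                filename: str, storage: list) -> bool:
    """Accept the upload only after the identity check passes (step S1)."""
    employee = authenticate(employee_id, name, department)
    if employee is None:
        return False  # not a company employee: access is refused
    storage.append({"uploader": employee.employee_id, "filename": filename})
    return True
```

Recording the uploader's employee number next to each file keeps the stored user information and the file data linked, which is what step S2 relies on.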

Further, in step S3: in order to obtain the file feature value, the data is projected to a low-dimensional space by using an LDA model, and then keyword extraction is performed:

(1) First, the mean and covariance of each group of file data after projection are calculated. Let the data set formed by the file data be $D_j = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i$ is an arbitrary $n$-dimensional vector and $y_i \in \{0, 1\}$ indicates whether the keyword is contained; $D_j$ denotes the data set of the $j$-th file data and $m$ denotes the number of data in the $j$-th file data, such that:

$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \Sigma_j = \sum_{i=1}^{m} (x_i - \mu_j)(x_i - \mu_j)^{T}$$

where $\mu_j$ represents the mean of the $m$ data in the $j$-th file data and $\Sigma_j$ represents the covariance matrix of the $m$ data in the $j$-th file data;

the projection line is set as the vector $\omega$; for any file data $x$, its projection on the line is $\omega^{T} x$, so the mean and covariance of the $j$-th file data after projection are:

$$\omega^{T} \mu_j, \qquad \omega^{T} \Sigma_j \omega$$

(2) Then, the optimization objective function $J$ of the LDA model is calculated:

the uploaded file and the files in the system with the same characteristics are gathered together by means of binary classification. The within-class scatter matrix is defined as $S_w$, which represents the degree of aggregation of the data points within each class of files, and the between-class scatter matrix is defined as $S_b$, which represents the degree of dispersion between files of different classes:

$$S_w = \Sigma_0 + \Sigma_1$$

$$S_b = (\mu_0 - \mu_1)(\mu_0 - \mu_1)^{T}$$

in order to make the projected points of file data of the same class as close as possible, the covariance of the projected points of each class is made as small as possible; in order to keep the projected points of file data of different classes as far apart as possible, the distance between the class centers is made as large as possible. The optimization objective function of the LDA model is therefore defined as:

$$J(\omega) = \frac{\omega^{T} S_b \omega}{\omega^{T} S_w \omega}$$

by fixing $\omega^{T} S_w \omega = 1$, the above expression can be maximized, which makes the data points of the same class more clustered and the data points of different classes more dispersed;

(3) Then, the projection line $\omega$ is calculated:

the objective function is optimized by using a Lagrange function: for $L(\omega) = \omega^{T} S_b \omega - \lambda(\omega^{T} S_w \omega - 1)$, where $\lambda$ is a scalar parameter, taking the derivative with respect to $\omega$ and setting it to zero gives $\omega = S_w^{-1}(\mu_0 - \mu_1)$;

(4) Finally, the projected data points $Y = \omega^{T} X$ are obtained, where $Y$ represents the set of characteristic values of each file.
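The four steps above can be sketched numerically for the two-class case. This is an illustrative sketch only: it assumes NumPy is available, the sample vectors are invented, and how a file is turned into an $n$-dimensional vector is not specified in the text and is left out.

```python
import numpy as np


def fisher_lda_direction(x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """Return w = S_w^{-1} (mu0 - mu1) for two classes given as (samples, features) arrays."""
    mu0 = x0.mean(axis=0)
    mu1 = x1.mean(axis=0)
    # Per-class scatter: sum over samples of (x - mu)(x - mu)^T.
    sigma0 = (x0 - mu0).T @ (x0 - mu0)
    sigma1 = (x1 - mu1).T @ (x1 - mu1)
    s_w = sigma0 + sigma1  # within-class scatter matrix
    # Solve S_w w = (mu0 - mu1) instead of forming the inverse explicitly.
    return np.linalg.solve(s_w, mu0 - mu1)


# Invented toy vectors for two classes of files (illustration only).
class0 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]])
class1 = np.array([[6.0, 5.0], [7.0, 8.0], [8.0, 7.0], [7.0, 6.0]])

w = fisher_lda_direction(class0, class1)
proj0 = class0 @ w  # one-dimensional projections of class 0
proj1 = class1 @ w  # one-dimensional projections of class 1
```

On this toy data the two sets of projected values do not overlap, which is the separation the objective function is meant to produce. `np.linalg.solve` requires the within-class scatter matrix to be non-singular; with very few samples per class a regularization term would be needed, which is outside what the text describes.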

Further, in step S3: all files are classified according to the file characteristic values; files that cluster together are grouped into one class, and the type and department to which that class of files belongs are then determined. The file characteristic value and its corresponding file access link are mapped to a data storage address by using separate chaining, which is a conventional technique for those skilled in the art and is therefore not described in detail.
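Separate chaining resolves hash collisions by keeping a list of entries in each bucket. The sketch below is a hypothetical illustration of mapping a file characteristic value to its access link in this way; the class name, the toy hash function and the bucket count are assumptions, not details from the text.

```python
from typing import List, Optional, Tuple


class ChainedHashTable:
    """Hash table with separate chaining: each bucket holds a list of (key, value) pairs."""

    def __init__(self, bucket_count: int = 8) -> None:
        self._buckets: List[List[Tuple[str, str]]] = [[] for _ in range(bucket_count)]

    def _index(self, key: str) -> int:
        # Deterministic toy hash (sum of code points) so bucket addresses are reproducible.
        return sum(ord(ch) for ch in key) % len(self._buckets)

    def put(self, feature_value: str, access_link: str) -> int:
        """Store or update the link for a feature value; return the bucket address used."""
        index = self._index(feature_value)
        bucket = self._buckets[index]
        for position, (key, _) in enumerate(bucket):
            if key == feature_value:
                bucket[position] = (feature_value, access_link)
                return index
        bucket.append((feature_value, access_link))
        return index

    def get(self, feature_value: str) -> Optional[str]:
        """Walk the chain in the key's bucket and return the stored link, if any."""
        for key, link in self._buckets[self._index(feature_value)]:
            if key == feature_value:
                return link
        return None


table = ChainedHashTable(bucket_count=4)
address_a = table.put("ab", "link-1")
address_b = table.put("ba", "link-2")  # same code-point sum, so the same bucket
```

Both keys land in the same bucket and remain individually retrievable, which is the property that lets many characteristic values share a limited set of storage addresses.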

Further, in step S4: after the address is read from the file hash unit, the data is encrypted by using an asymmetric encryption algorithm, which is a conventional technique for those skilled in the art and is therefore not described in detail; after encryption, the data is collected by a blockchain collection node, so that the file characteristic value and the file access link are stored.
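How recording on a chain makes tampering detectable can be illustrated with a minimal hash-linked ledger. This sketch is an assumption-laden simplification: the asymmetric encryption step is omitted (the payload is treated as already processed), there is no consensus or networking, and the block layout and function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def block_hash(index: int, previous_hash: str, payload: Dict[str, str]) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = json.dumps(
        {"index": index, "previous_hash": previous_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: List[dict], payload: Dict[str, str]) -> dict:
    """Append a block that commits to the hash of the previous block."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {
        "index": index,
        "previous_hash": previous_hash,
        "payload": payload,
        "hash": block_hash(index, previous_hash, payload),
    }
    chain.append(block)
    return block


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every hash and check each link to the preceding block."""
    expected_previous = "0" * 64
    for index, block in enumerate(chain):
        if block["index"] != index or block["previous_hash"] != expected_previous:
            return False
        if block["hash"] != block_hash(index, block["previous_hash"], block["payload"]):
            return False
        expected_previous = block["hash"]
    return True
```

Because each block stores the hash of its predecessor, changing a stored characteristic value or access link in any earlier block makes `verify_chain` fail.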

Further, in step S5: the identity information of company personnel is identified; after verification passes, the files that the user is likely to extract are predicted from the identity information, including name, department and extraction records, so that the range of files is narrowed. The system identifies the keywords input by the user, finds the matching file characteristic values and generates the file access links; the user clicks a link, and the information content of the file is displayed.
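The narrowing and keyword lookup in step S5 might look like the following. This is a hypothetical sketch: the index layout, the rule of restricting results to the user's department, and ranking previously extracted files first are assumptions chosen for illustration, since the text does not say how the prediction is made.

```python
from typing import Dict, List

FileEntry = Dict[str, str]


def narrow_candidates(department: str, history: List[str],
                      index: List[FileEntry]) -> List[FileEntry]:
    """Keep only the user's department and list previously extracted files first."""
    own = [entry for entry in index if entry["department"] == department]
    # sorted() is stable and False sorts before True, so entries found in history come first.
    return sorted(own, key=lambda entry: entry["feature_value"] not in history)


def find_links(keyword: str, department: str, history: List[str],
               index: List[FileEntry]) -> List[str]:
    """Return access links whose characteristic value contains the keyword."""
    needle = keyword.lower()
    return [
        entry["access_link"]
        for entry in narrow_candidates(department, history, index)
        if needle in entry["feature_value"].lower()
    ]


# Invented index entries for illustration.
file_index = [
    {"feature_value": "A to B invoice", "department": "Finance", "access_link": "files/0001"},
    {"feature_value": "Q3 invoice summary", "department": "Finance", "access_link": "files/0002"},
    {"feature_value": "Prototype invoice parser", "department": "Research", "access_link": "files/0003"},
]
links = find_links("invoice", "Finance", ["Q3 invoice summary"], file_index)
```

Filtering by department before matching keywords shrinks the set that has to be searched, which is the efficiency gain the text attributes to identity-based prediction.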

Compared with the prior art, the invention has the following beneficial effects:

The identity information of the user is verified through the employee number, which facilitates monitoring of file security. The data is projected to a low-dimensional space by using an LDA model and keywords are then extracted to obtain the file characteristic value, so that data points of the same class are more clustered and data points of different classes are more dispersed, which facilitates subsequent file classification. The file characteristic value and its corresponding file access link are mapped to a data storage address by using separate chaining, which facilitates the storage of very large files. Blockchain technology helps prevent users from tampering with files. The files that the user is likely to extract are predicted from the user's identity information, which narrows the range of files and improves file extraction efficiency.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a block diagram of a blockchain based computer file storage system of the present invention;

FIG. 2 is a flow chart of the blockchain-based computer file storage method of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

Referring to FIGS. 1-2, the present invention provides a technical solution: a blockchain-based computer file storage system comprising a file uploading module, a data storage, a file processing module, a blockchain module and a file calling module.

Further, the file uploading module comprises a user information collecting unit and a file content uploading unit; the user information collecting unit is used for collecting the identity information of the user uploading the files, including the name of the user and the department to which the user belongs, so as to judge whether the user is a company employee or not; the file content uploading unit is used for collecting file information and uploading the file information to the data storage.

Furthermore, the data storage stores all the collected user identity information and file data, so that the user information can be conveniently recorded and the file information can be conveniently extracted by the user.

Further, the file processing module comprises a file processing unit, a file classification unit and a file hash unit; the file processing unit generates a file characteristic value and a file access link after processing the file, so that large files can subsequently be recorded on the blockchain conveniently; the file classification unit classifies and summarizes all files according to the file characteristic values, so that the files are tidier and better organized; the file hash unit maps each file characteristic value and its corresponding file access link to a data storage address by using separate chaining;

the file processing unit comprises a characteristic value generation subunit and an access link generation subunit; the characteristic value generation subunit projects the data to a low-dimensional space by using an LDA (linear discriminant analysis) model and then extracts keywords to obtain the file characteristic value, which makes the file data easier to store, keeps data of the same class as compact as possible and keeps data of different classes as dispersed as possible; the access link generation subunit generates a link for each file by using the link generator, so that only the link, rather than the file itself, has to be stored, which reduces the storage space required.
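The text does not say how the link generator works. One possible reading, sketched here purely as an assumption, derives the link from a digest of the file content so that the chain only needs to hold a short fixed-length string; the `store://files/` prefix and the function name are hypothetical.

```python
import hashlib


def make_access_link(content: bytes, base: str = "store://files/") -> str:
    """Derive a link from the SHA-256 digest of the file content (hypothetical scheme)."""
    return base + hashlib.sha256(content).hexdigest()


link_a = make_access_link(b"invoice from A to B")
link_b = make_access_link(b"invoice from A to C")
```

A content-derived link has a fixed length regardless of file size, and two different files receive different links with overwhelming probability.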

Further, the blockchain module comprises a reading unit, an encryption unit and an on-chain unit; the reading unit is used for reading the address from the file hash unit; the encryption unit encrypts the data by using an asymmetric encryption algorithm and sends the encrypted data to the on-chain unit; the on-chain unit collects the data by using a blockchain collection node, so that the file characteristic value and the file access link are stored and file tampering is effectively prevented.

Further, the file calling module comprises an identity authentication unit and a data extraction unit; the identity authentication unit is used for identifying the identity information of company personnel, which helps predict the files of interest and thereby narrows the range of file extraction; and the data extraction unit displays the information content of the file after the user inputs a keyword and clicks the link that appears.

A block chain based computer file storage method comprises the following steps:

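S1: performing identity authentication on a user, and collecting the files uploaded by the user;

S2: storing all the collected user information and file data;

S3: generating a file characteristic value and an access link, and carrying out classification and hash processing;

S4: reading the address, encrypting it, and recording it on the chain;

S5: identifying the identity of the user, receiving the input keywords, and extracting the corresponding file information.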

Further, in step S1: the identity information of the user, including the employee's name, department and the like, is verified through the employee number to judge whether the user is a company employee; if so, the user can enter the system and, once verification has passed, upload files; otherwise, the user cannot enter the system.

Further, in step S2: all the collected user identity information and file data are stored through the data storage, so that the user information can be conveniently recorded and the file information can be conveniently extracted by the user.

Further, in step S3: in order to obtain the file characteristic value, the data is projected to a low-dimensional space by using an LDA model, and keywords are then extracted:

(1) First, the mean and covariance of each group of file data after projection are calculated. Let the data set formed by the file data be $D_j = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i$ is an arbitrary $n$-dimensional vector and $y_i \in \{0, 1\}$ indicates whether the keyword is contained; $D_j$ denotes the data set of the $j$-th file data and $m$ denotes the number of data in the $j$-th file data, such that:

$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \Sigma_j = \sum_{i=1}^{m} (x_i - \mu_j)(x_i - \mu_j)^{T}$$

where $\mu_j$ represents the mean of the $m$ data in the $j$-th file data and $\Sigma_j$ represents the covariance matrix of the $m$ data in the $j$-th file data;

the projection line is set as the vector $\omega$; for any file data $x$, its projection on the line is $\omega^{T} x$, so the mean and covariance of the $j$-th file data after projection are $\omega^{T} \mu_j$ and $\omega^{T} \Sigma_j \omega$;

(2) Then, the optimization objective function $J$ of the LDA model is calculated:

the uploaded file and the files in the system with the same characteristics are gathered together by means of binary classification. The within-class scatter matrix is defined as $S_w$, which represents the degree of aggregation of the data points within each class of files, and the between-class scatter matrix is defined as $S_b$, which represents the degree of dispersion between files of different classes:

$$S_w = \Sigma_0 + \Sigma_1$$

$$S_b = (\mu_0 - \mu_1)(\mu_0 - \mu_1)^{T}$$

in order to make the projected points of file data of the same class as close as possible, the covariance of the projected points of each class is made as small as possible; in order to keep the projected points of file data of different classes as far apart as possible, the distance between the class centers is made as large as possible, which facilitates the subsequent classification of files. The optimization objective function of the LDA model is therefore defined as:

$$J(\omega) = \frac{\omega^{T} S_b \omega}{\omega^{T} S_w \omega}$$

(3) Then, the projection line $\omega$ is calculated:

the objective function is optimized by using a Lagrange function: for $L(\omega) = \omega^{T} S_b \omega - \lambda(\omega^{T} S_w \omega - 1)$, where $\lambda$ is a scalar parameter, taking the derivative with respect to $\omega$ and setting it to zero gives $\omega = S_w^{-1}(\mu_0 - \mu_1)$;

(4) Finally, the projected data points $Y = \omega^{T} X$ are obtained, where $Y$ represents the set of characteristic values of each file.
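The step from the Lagrange function to the closed-form projection line can be spelled out as follows. This is the standard two-class linear discriminant analysis argument written with the symbols used in this description; the scalar $c$ is introduced here only for illustration, and $S_w$ is assumed to be invertible.

```latex
\begin{aligned}
\frac{\partial L}{\partial \omega} &= 2 S_b \omega - 2 \lambda S_w \omega = 0
  \;\Longrightarrow\; S_b \omega = \lambda S_w \omega, \\
S_b \omega &= (\mu_0 - \mu_1)(\mu_0 - \mu_1)^{T} \omega = c\,(\mu_0 - \mu_1),
  \qquad c = (\mu_0 - \mu_1)^{T} \omega, \\
\omega &= \frac{c}{\lambda}\, S_w^{-1} (\mu_0 - \mu_1) \;\propto\; S_w^{-1} (\mu_0 - \mu_1).
\end{aligned}
```

Only the direction of $\omega$ affects the ratio $J(\omega)$, so the scalar factor $c/\lambda$ can be dropped, which leaves the stated result.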

Further, in step S3: all files are classified according to the file characteristic values; files that cluster together are grouped into one class, and the type and department to which that class of files belongs are then determined. The file characteristic values and their corresponding file access links are mapped to data storage addresses by using separate chaining, which is a conventional technique for those skilled in the art and is therefore not described in detail.

Further, in step S5: the identity information of company personnel is identified; after verification passes, the files that the user is likely to extract are predicted from the identity information, including name, department and extraction records, so that the range of files the system has to search is narrowed and the operating efficiency of the system is improved. The system identifies the keywords input by the user, finds the matching file characteristic values and generates the file access links; the user clicks a link, and the information content of the file is displayed.

Embodiment 1:

In step S1: the identity information of the user is verified through the employee number; the user is recognized as Zhang San, a member of the finance department and an employee of the company, and is therefore allowed to upload the file.

In step S2: the identity information of the user and the uploaded file data are stored in the data storage.

In step S3: in order to obtain the file characteristic value, the data is projected to a low-dimensional space by using an LDA model, and then keyword extraction is performed:

(1) First, the mean and covariance of each group of file data after projection are calculated. Let the data set formed by the file data be $D_j = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i$ is an arbitrary $n$-dimensional vector and $y_i \in \{0, 1\}$ indicates whether it is a keyword; $D_j$ denotes the data set of the $j$-th file data and $m$ denotes the number of data in the $j$-th file data, such that:

$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \Sigma_j = \sum_{i=1}^{m} (x_i - \mu_j)(x_i - \mu_j)^{T}$$

where $\mu_j$ represents the mean of the $m$ data in the $j$-th file data and $\Sigma_j$ represents the covariance matrix of the $m$ data in the $j$-th file data;

the projection line is set as the vector $\omega$; for any file data $x$, its projection on the line is $\omega^{T} x$, so the mean and covariance of the $j$-th file data after projection are:

$$\omega^{T} \mu_j, \qquad \omega^{T} \Sigma_j \omega$$

(2) Then, the optimization objective function $J$ of the LDA model is calculated:

the uploaded file and the files in the system with the same characteristics are gathered together by means of binary classification. The within-class scatter matrix is defined as $S_w$, which represents the degree of aggregation of the data points within each class of files, and the between-class scatter matrix is defined as $S_b$, which represents the degree of dispersion between files of different classes:

$$S_w = \Sigma_0 + \Sigma_1$$

$$S_b = (\mu_0 - \mu_1)(\mu_0 - \mu_1)^{T}$$

$$J(\omega) = \frac{\omega^{T} S_b \omega}{\omega^{T} S_w \omega}$$

(3) Then, the projection line $\omega$ is calculated:

$$\omega = S_w^{-1}(\mu_0 - \mu_1)$$

(4) Finally, the projected data points $Y = \omega^{T} X$ are obtained, where $Y$ represents the set of characteristic values of each file;

at this point, the obtained file characteristic value is "A → B invoice".

The file is classified according to the characteristic value "A → B invoice", which confirms that it belongs to the "invoice" class of files and to the finance department; the file characteristic value and its corresponding file access link are then mapped to a data storage address by using separate chaining.

In step S4: after the address is read from the file hash unit, the data is encrypted by using an asymmetric encryption algorithm, which is a conventional technique for those skilled in the art and is therefore not described in detail; after encryption, the data is collected by a blockchain collection node, so that the file characteristic value and the file access link are stored.

In step S5: the identity information of company personnel is identified, and the user is recognized as "Liquan", a member of the finance department. After verification passes, the files that the user is likely to extract, such as invoices and financial bills, are predicted from the identity information, including name, department and extraction records, so that the range of files is narrowed. The system recognizes that the keyword input by "Liquan" is "A → B invoice" and automatically presents the file access link; clicking the link displays the specific information content of the invoice.

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described above, or equivalents may be substituted for elements thereof. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A blockchain-based computer file storage system, characterized in that the system comprises a file uploading module, a data storage, a file processing module, a blockchain module and a file calling module;

2. The blockchain-based computer file storage system of claim 1, wherein: the file uploading module comprises a user information collecting unit and a file content uploading unit; the user information collection unit is used for collecting the identity information of a user uploading files; the file content uploading unit is used for collecting file information and uploading the file information to the data storage.

3. The blockchain-based computer file storage system of claim 1, wherein: the file processing module comprises a file processing unit, a file classification unit and a file hash unit; the file processing unit generates a file characteristic value and a file access link after processing a file; the file classification unit classifies and summarizes all files according to the file characteristic values; the file hash unit maps each file characteristic value and its corresponding file access link to a data storage address by using separate chaining;

the file processing unit comprises a characteristic value generation subunit and an access link generation subunit; the characteristic value generation subunit projects the data to a low-dimensional space by using an LDA model and then extracts keywords to obtain the file characteristic value; the access link generation subunit generates a link for each file by using a link generator.

4. The blockchain-based computer file storage system of claim 1, wherein: the blockchain module comprises a reading unit, an encryption unit and an on-chain unit; the reading unit is used for reading the address from the file hash unit; the encryption unit encrypts the data by using an asymmetric encryption algorithm and sends the encrypted data to the on-chain unit; and the on-chain unit collects the data by using a blockchain collection node, so as to store the file characteristic value and the file access link.

5. The blockchain-based computer file storage system of claim 1, wherein: the file calling module comprises an identity authentication unit and a data extraction unit; the identity authentication unit is used for identifying the identity information of company personnel and narrowing the range of file extraction; and the data extraction unit displays the information content of the file after the user inputs a keyword and clicks the link that appears.

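6. A blockchain-based computer file storage method, characterized in that the method comprises the following steps:

S1: performing identity authentication on a user, and collecting the files uploaded by the user;

S2: storing all the collected user information and file data;

S3: generating a file characteristic value and an access link, and carrying out classification and hash processing;

S4: reading the address, encrypting it, and recording it on the chain;

S5: identifying the identity of the user, receiving the input keywords, and extracting the corresponding file information.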

7. The blockchain-based computer file storage method of claim 6, wherein in step S1: the identity information of the user is verified through the employee number to judge whether the user is a company employee; if so, the user can enter the system and upload files; otherwise, the user cannot enter the system.

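8. The blockchain-based computer file storage method of claim 6, wherein in step S3: in order to obtain the file characteristic value, the data is projected to a low-dimensional space by using an LDA model, and keywords are then extracted:

(1) First, the mean and covariance of each group of file data after projection are calculated. Let the data set formed by the file data be $D_j = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i$ is an arbitrary $n$-dimensional vector and $y_i \in \{0, 1\}$ indicates whether the keyword is contained; $D_j$ denotes the data set of the $j$-th file data and $m$ denotes the number of data in the $j$-th file data, such that:

$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \Sigma_j = \sum_{i=1}^{m} (x_i - \mu_j)(x_i - \mu_j)^{T}$$

where $\mu_j$ represents the mean of the $m$ data in the $j$-th file data and $\Sigma_j$ represents the covariance matrix of the $m$ data in the $j$-th file data;

(2) Then, the optimization objective function $J$ of the LDA model is calculated:

the uploaded file and the files in the system with the same characteristics are gathered together by means of binary classification. The within-class scatter matrix is defined as $S_w$, which represents the degree of aggregation of the data points within each class of files, and the between-class scatter matrix is defined as $S_b$, which represents the degree of dispersion between files of different classes:

$$S_w = \Sigma_0 + \Sigma_1$$

$$S_b = (\mu_0 - \mu_1)(\mu_0 - \mu_1)^{T}$$

in order to aggregate files of the same class and disperse files of different classes, the optimization objective function of the LDA model is defined as:

$$J(\omega) = \frac{\omega^{T} S_b \omega}{\omega^{T} S_w \omega}$$

by fixing $\omega^{T} S_w \omega = 1$, the above expression can be maximized;

(3) Then, the projection line $\omega$ is calculated:

$$\omega = S_w^{-1}(\mu_0 - \mu_1)$$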

9. The blockchain-based computer file storage method of claim 8, wherein in step S3: all files are classified according to the file characteristic values; files that cluster together are grouped into one class, and the type and department to which that class of files belongs are then confirmed; and the file characteristic value and its corresponding file access link are mapped to a data storage address by using separate chaining.

10. The blockchain-based computer file storage method of claim 6, wherein in step S5: the identity information of company personnel is identified; after verification passes, the files that the user is likely to extract are predicted from the identity information, including name, department and extraction records, so that the range of files is narrowed; the system identifies the keywords input by the user, finds the matching file characteristic values and generates the file access links; the user clicks a link, and the information content of the file is displayed.