CN112052220A - System and method for searching and sharing files of big data platform - Google Patents
- Publication number
- CN112052220A (application number CN202010892128.9A)
- Authority
- CN
- China
- Prior art keywords
- file
- data
- platform
- big data
- sharing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/176—Support for shared access to files; File sharing support
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2107—File encryption
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Storage Device Security (AREA)
Abstract
The invention discloses a system and a method for retrieving and sharing files on a big data platform. The system comprises a server and a platform system cluster deployed on the server. The platform system cluster comprises: a data acquisition sub-platform system, in which data are collected in real time through a client and transmitted promptly through the flash; a data processing sub-platform system, which, before processing the data acquired by the data acquisition sub-platform system, uses a natural language processing system and a reverse sorting algorithm to enable full-text retrieval of the data contents, and then processes the data; a data storage sub-platform system, which stores different data information in separate partitions; and a sharing and exchange sub-platform system, which shares and exchanges the data information and outputs the data. The invention supports computation over and storage of the full volume of massive data, makes uploading and file retrieval convenient for users, and allows a file to be shared directly after it is retrieved, so that the whole platform is convenient to use.
Description
Technical Field
The invention belongs to the technical field of big data, and particularly relates to a system and a method for retrieving and sharing files of a big data platform.
Background
Around the world, the wave of informatization is sweeping across every sector. Big data, cloud computing, the Internet of Things and related technologies are developing vigorously and have taken the Internet era to a new stage. Big data technology is undoubtedly a hot topic across the information industry and society as a whole, and every industry is discussing how big data can bring change and improvement to its own field, with the ultimate goal of realizing the value concept of "people act, data flows, and the cloud computes".
Big data technology has developed rapidly thanks to its large capacity, fast reading and writing, and wide range of application, and has become the most important data aggregation, information service and management technology at present. It can help technology and industry achieve cross-domain integration, help people discover new knowledge, create new industries, and improve the value and capability of traditional fields. Big data is therefore becoming an important marker of, and tool for, the industrial revolution and collective leap forward of today's society. However, how to share big data retrieval results effectively, so that the retrieved data can be used more conveniently, remains a problem to be solved.
In addition, the big data platform is built on, and deeply optimized from, the open-source distributed framework Hadoop. It integrates a large number of excellent open-source technical frameworks and forms a big data ecosystem covering distributed data acquisition, aggregated transmission of big data, distributed data storage, real-time and offline data analysis and computation, intelligent full-text retrieval, big data visualization, and so on. A traditional database cannot support computation over and storage of the full volume of massive data, and is very inconvenient to use.
Therefore, how to overcome the above drawbacks of the prior art has become a direction of effort for those skilled in the art.
Disclosure of Invention
The invention aims to provide a system and a method for retrieving and sharing files on a big data platform that can solve the above defects of the prior art.
The purpose of the invention is achieved by the following technical solution: a big data platform file retrieval and sharing system and method, comprising:
a server and a platform system cluster deployed on the server, the platform system cluster comprising:
a data acquisition sub-platform system: data are collected in real time through a client and transmitted promptly through the flash;
a data processing sub-platform system: before the data acquired by the data acquisition sub-platform system are processed, a natural language processing system and a reverse sorting algorithm are used to enable full-text retrieval of the data contents, and the data are then processed;
a data storage sub-platform system: different data information is stored in separate partitions;
a sharing and exchange sub-platform system: the data information is shared and exchanged, and the data are output.
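The full-text retrieval performed by the data processing sub-platform can be illustrated with a minimal sketch. It assumes that the "reverse sorting algorithm" refers to an inverted index, and it uses a simple lowercase word tokenizer as a stand-in for the natural language processing system. The names `tokenize`, `build_inverted_index` and `search` are hypothetical and do not come from the disclosure.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokenizer; a stand-in for the NLP system (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample files, used only for illustration.
documents = {
    "f1": "Quarterly sales report for the northern region",
    "f2": "Sales forecast and budget plan",
    "f3": "Maintenance log for the storage cluster",
}
index = build_inverted_index(documents)
print(sorted(search(index, "sales report")))
```

Because a query only touches the posting sets of its own terms, retrieval does not need to scan the stored files themselves.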
In a preferred mode, the data input processing flow comprises the following steps:
Step 1: a user initiates a file upload request to the big data center platform;
Step 2: the big data center platform uploads the file using the file upload module of its data acquisition unit;
Step 3: the content inspection module of the big data center platform performs a security inspection of the file the user has requested to upload and judges whether the content is safe; if the file is unsafe, the upload ends; if the file is safe, the file receiving module receives the file;
Step 4: the data processing unit of the big data center reads the file content, extracts features and builds an index;
Step 5: the file and the index file are stored separately;
Step 6: the sharing and exchange unit of the big data center performs file retrieval, file sharing and secure transmission of the file content.
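The six-step flow above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Platform` class, its keyword-based inspection rule (`blocked_terms`) and the dictionary-backed stores are hypothetical, and a real content inspection module would be far more involved.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Toy big data center platform with separate file and index stores."""
    file_store: dict = field(default_factory=dict)   # file id -> content
    index_store: dict = field(default_factory=dict)  # term -> set of file ids
    blocked_terms: tuple = ("malware",)              # hypothetical inspection rule

    def inspect(self, content):
        """Step 3: content inspection; True means the content is considered safe."""
        lowered = content.lower()
        return not any(term in lowered for term in self.blocked_terms)

    def upload(self, file_id, content):
        """Steps 1 to 5: request, inspect, receive, index, store separately."""
        if not self.inspect(content):
            return False                             # unsafe: the upload ends here
        self.file_store[file_id] = content           # the file goes to the file store
        for term in set(re.findall(r"[a-z0-9]+", content.lower())):
            # The index entries go to a separate index store.
            self.index_store.setdefault(term, set()).add(file_id)
        return True

    def retrieve(self, term):
        """Step 6: look up file ids through the index instead of scanning files."""
        return sorted(self.index_store.get(term.lower(), set()))


platform = Platform()
accepted = platform.upload("a.txt", "Flood monitoring data 2020")
rejected = platform.upload("b.txt", "malware payload")
print(accepted, rejected, platform.retrieve("flood"))
```

Keeping `file_store` and `index_store` apart mirrors the fifth step, in which the file and the index file are stored separately.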
In a preferred mode, in the fourth step, a feature extraction algorithm suited to the file is selected according to the characteristics of the file content that has been read.
In a preferred mode, an encryption module is further provided.
In a preferred mode, a user initiates an encrypted-file query request to the big data center platform. The platform locates the encrypted index through the file retrieval module of its encrypted-file retrieval and sharing system, obtains the index key from the key management module of that system, and decrypts the index. It then searches the decrypted index, locates the files that satisfy the query request, and returns the query result to the user who initiated the request. The user decides from the returned result whether a file needs to be shared; if not, the query ends. If a file needs to be shared, the user initiates a sharing request to the big data center platform, asking it to return the file corresponding to the query result. The platform checks whether the file may be shared with the requesting user and, if so, obtains the file key from the key management module and decrypts the file that is allowed to be shared.
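The encrypted query and sharing flow can be sketched as follows. Everything here is an assumption made for illustration: the `KeyManager` and `EncryptedShare` classes, the per-file access list, and in particular `keystream_xor`, which is a toy XOR stream built from SHA-256 and is NOT secure. A real deployment would use a vetted authenticated cipher such as AES-GCM; the disclosure does not name a cipher.

```python
import hashlib
import json
import secrets


def keystream_xor(key, data):
    """Toy XOR stream cipher from SHA-256; NOT secure, illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


class KeyManager:
    """Hypothetical key management module holding the index key and file keys."""
    def __init__(self):
        self.index_key = secrets.token_bytes(32)
        self.file_keys = {}


class EncryptedShare:
    def __init__(self, keys):
        self.keys = keys
        self.encrypted_files = {}
        self.acl = {}  # file id -> users allowed to receive the file
        self.encrypted_index = keystream_xor(keys.index_key, b"{}")

    def _load_index(self):
        """Decrypt the index with the index key from the key manager."""
        return json.loads(keystream_xor(self.keys.index_key, self.encrypted_index))

    def add_file(self, file_id, text, allowed_users):
        key = secrets.token_bytes(32)
        self.keys.file_keys[file_id] = key
        self.encrypted_files[file_id] = keystream_xor(key, text.encode())
        self.acl[file_id] = set(allowed_users)
        index = self._load_index()
        for term in set(text.lower().split()):
            index.setdefault(term, []).append(file_id)
        self.encrypted_index = keystream_xor(
            self.keys.index_key, json.dumps(index).encode())

    def query(self, term):
        """Search the decrypted index and return the matching file ids."""
        return self._load_index().get(term.lower(), [])

    def share(self, user, file_id):
        """Check sharing permission, then decrypt the file with its file key."""
        if user not in self.acl.get(file_id, set()):
            raise PermissionError(f"{user} may not receive {file_id}")
        key = self.keys.file_keys[file_id]
        return keystream_xor(key, self.encrypted_files[file_id]).decode()


service = EncryptedShare(KeyManager())
service.add_file("report.txt", "annual water quality report", {"alice"})
hits = service.query("water")
text = service.share("alice", hits[0])
print(hits, text)
```

The index key and the file keys are deliberately separate, so a query can be answered without decrypting any file, and a file is decrypted only after the permission check succeeds.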
Further preferably, the big data center platform adopts an XMA integration collaboration platform.
Compared with the prior art, the invention has the following beneficial effects: it provides a system and a method for retrieving and sharing files on a big data platform that support computation over and storage of the full volume of massive data, make uploading and file retrieval convenient for users, and allow a file to be shared directly after it is retrieved, so that the whole platform is convenient to use.
Drawings
FIG. 1 is a schematic diagram of the architecture of the big data platform file retrieval and sharing system of the present invention;
FIG. 2 is a data flow diagram of the big data platform file retrieval and sharing system of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is a big data integration collaboration platform. Its main functions are: extracting shared data from other subsystems and fusing inconsistent data from multiple sources; organizing real-time and historical data on the basis of a data dictionary, to ensure that the relationships between data are correct and understandable and to avoid data redundancy; providing data services in various forms and setting permissions for the various users hierarchically, so that different users obtain the data they each need while the security of data transmission and the interoperability of shared data are ensured; maintaining basic information, dynamic business data and system management configuration parameters; and supporting the operation and maintenance of the system, including its network architecture, information security, network management, process management, database maintenance and backup. First, the platform provides exchange services and routing flow management for basic and shared data, which form the basis of the exchange platform and include static exchange data, dynamic exchange data, graphic data, tables, statistical data and other attribute data. Second, it implements the interfaces between subsystems, so that data are shared and transmitted between subsystems according to predefined specifications and standards. When the central platform is accessed, the system structure is designed according to the system integration requirements, and the various data interfaces follow the system integration specification.
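The hierarchical permission scheme mentioned above can be illustrated with a minimal sketch. The three levels and the sample records are hypothetical; the text only states that permissions are set hierarchically so that different users obtain the data they need.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical authority levels; higher values may see more data."""
    PUBLIC = 1
    DEPARTMENT = 2
    ADMIN = 3


def visible_records(records, user_level):
    """Return names of records whose required level does not exceed the user's."""
    return [r["name"] for r in records if r["level"] <= user_level]


# Hypothetical sample records, used only for illustration.
records = [
    {"name": "city map", "level": Level.PUBLIC},
    {"name": "traffic statistics", "level": Level.DEPARTMENT},
    {"name": "system configuration", "level": Level.ADMIN},
]
print(visible_records(records, Level.DEPARTMENT))
```

An ordered enumeration keeps the check to a single comparison, at the cost of assuming that the levels form a strict hierarchy.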
In the specific implementation:
As shown in FIG. 1 and FIG. 2, a system and method for retrieving and sharing big data platform files comprises a server and a platform system cluster deployed on the server. The platform system cluster comprises: a data acquisition sub-platform system, in which data are collected in real time through a client and transmitted promptly through the flash; a data processing sub-platform system, which, before processing the data acquired by the data acquisition sub-platform system, uses a natural language processing system and a reverse sorting algorithm to enable full-text retrieval of the data contents, and then processes the data; a data storage sub-platform system, which stores different data information in separate partitions; and a sharing and exchange sub-platform system, which shares and exchanges the data information and outputs the data.
The data input processing flow comprises the following steps:
Step 1: a user initiates a file upload request to the big data center platform;
Step 2: the big data center platform uploads the file using the file upload module of its data acquisition unit;
Step 3: the content inspection module of the big data center platform performs a security inspection of the file the user has requested to upload and judges whether the content is safe; if the file is unsafe, the upload ends; if the file is safe, the file receiving module receives the file;
Step 4: the data processing unit of the big data center reads the file content, extracts features and builds an index; a suitable feature extraction algorithm is selected according to the characteristics of the file content that has been read;
Step 5: the file and the index file are stored separately;
Step 6: the sharing and exchange unit of the big data center performs file retrieval, file sharing and secure transmission of the file content.
In addition, an encryption module is provided. A user initiates an encrypted-file query request to the big data center platform. The platform locates the encrypted index through the file retrieval module of its encrypted-file retrieval and sharing system, obtains the index key from the key management module of that system, and decrypts the index. It then searches the decrypted index, locates the files that satisfy the query request, and returns the query result to the user who initiated the request. The user decides from the returned result whether a file needs to be shared; if not, the query ends. If a file needs to be shared, the user initiates a sharing request to the big data center platform, asking it to return the file corresponding to the query result. The platform checks whether the file may be shared with the requesting user and, if so, obtains the file key from the key management module and decrypts the file that is allowed to be shared.
The bottom layer of the XMA integration collaboration platform uses message middleware technology to achieve reliable data transmission. At the application layer, data exchange is implemented on the basis of services and must support data acquisition, data summarization, data distribution, data update notification, data forwarding and data conversion. Real-time, scheduled and on-demand data exchange modes are supported. The platform supports various data sources and provides identity authentication, user authorization, transmission encryption, data integrity, data credibility and data validity. It also supports segmented data transmission, data compression and decompression, data caching and the like. The XMA integration collaboration platform must provide a security mechanism that ensures the integrity and confidentiality of the exchanged information, and must be able to integrate effectively with the security authentication platform. Together, the security authentication platform and the XMA integration collaboration platform protect the content of exchanged information from interception and illegal modification.
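Segmented transmission, compression and an integrity check can be combined as in the following sketch. The message layout and the function names `pack` and `unpack` are hypothetical and say nothing about how the XMA platform is actually built; the SHA-256 digest used here only detects corruption and would need a keyed MAC or a signature to resist deliberate tampering.

```python
import hashlib
import zlib


def pack(payload, segment_size=1024):
    """Compress the payload, split it into segments and attach a SHA-256 digest."""
    compressed = zlib.compress(payload)
    segments = [compressed[i:i + segment_size]
                for i in range(0, len(compressed), segment_size)]
    return {"digest": hashlib.sha256(payload).hexdigest(), "segments": segments}


def unpack(message):
    """Reassemble and decompress the segments, then verify the digest."""
    payload = zlib.decompress(b"".join(message["segments"]))
    if hashlib.sha256(payload).hexdigest() != message["digest"]:
        raise ValueError("integrity check failed")
    return payload


original = b"sensor reading;" * 500
message = pack(original, segment_size=16)
restored = unpack(message)
print(len(original), len(message["segments"]), restored == original)
```

Compressing before segmenting keeps the number of segments small for repetitive data, and the receiver only accepts the payload once the digest of the decompressed bytes matches.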
The collection of public data can be regarded as a data summarization process: the public data of each business department are gathered into the cache database of the data center through the XMA integration collaboration platform. Consistent data are then obtained through comparison, verification and conversion by the data management system. Data distribution is the process of actively providing data to the various data users, seen from the perspective of the data center. Through the public data service, data are distributed from the data center to each data-using department according to the rules of data usage permission, achieving data sharing and information linkage. The data exchange service can convert data from a given database into a standard XML data set. The data conversion module converts heterogeneous data of various kinds into public data that conform to a uniform standard and are consistent and complete.
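The conversion of database rows into an XML data set might look like the sketch below. The element names `dataset`, `record` and `field` are assumptions, since the text does not define the schema of the standard XML data set.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(table, rows):
    """Convert a list of row dictionaries into a simple XML data set."""
    root = ET.Element("dataset", {"table": table})
    for row in rows:
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            ET.SubElement(record, "field", {"name": column}).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical rows, as they might come from a departmental database.
rows = [{"id": 1, "dept": "water"}, {"id": 2, "dept": "traffic"}]
xml_text = rows_to_xml("departments", rows)
print(xml_text)
```

Carrying the column name as an attribute rather than as the tag name avoids problems with column names that are not valid XML element names.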
In addition, the data interface system should be an open system that provides extensible interfaces and secondary development interfaces, so that users can define their own specialized services on top of them.
The platform also provides data service monitoring and management, user permission management, run-log inspection and performance statistics. The details of data exchange can be recorded and tracked through the data service log. The data exchange nodes are managed, and security policy guidance and server security management configuration are provided.
Application integration based on Web Services technology supports the interfaces of the application systems through seamless integration of mainstream Web Services protocols such as SOAP and XML-RPC, provides a Web Services based application system integration adapter, and provides tools and an interface API for quickly integrating Web Services applications.
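As a small illustration of one of the protocols named above, the sketch below encodes and decodes an XML-RPC call with Python's standard `xmlrpc.client` module, without opening a network connection. The method name `dataservice.query` and its parameters are hypothetical and are not defined by the platform.

```python
import xmlrpc.client

# Client side: encode a call to a hypothetical remote method.
request_xml = xmlrpc.client.dumps(
    ("departments", 10), methodname="dataservice.query")

# Adapter side: decode the request, then encode a reply.
params, method = xmlrpc.client.loads(request_xml)
response_xml = xmlrpc.client.dumps(
    ([{"id": 1, "dept": "water"}],), methodresponse=True)

# Client side: decode the reply.
(result,), _ = xmlrpc.client.loads(response_xml)
print(method, params, result)
```

An integration adapter of the kind described would sit where the request is decoded, mapping the method name to a call on the underlying application system.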
The data provider defines a public data service, encapsulating the content and protocols of the data exchange in the form of a service. The data user calls the public data service of the data provider to acquire the required data and updates the data to the local data source according to certain data conversion and data updating rules. Data exchange between a data provider and a data consumer is realized through interaction of a local data service and an open data service.
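The provider and consumer roles described above can be sketched as two small classes. The version-based fetch and the upper-casing conversion rule are assumptions chosen to keep the example short; the text only requires that the consumer apply some conversion and update rules.

```python
class PublicDataService:
    """Provider side: exposes selected records behind a service call."""
    def __init__(self, records):
        self._records = records

    def fetch(self, since_version):
        """Return records newer than the version the consumer already holds."""
        return [r for r in self._records if r["version"] > since_version]


class LocalDataSource:
    """Consumer side: applies conversion and update rules to fetched records."""
    def __init__(self):
        self.rows = {}
        self.version = 0

    def sync(self, service):
        """Pull new records, convert them and update the local store."""
        for record in service.fetch(self.version):
            # Conversion rule (assumption): normalise names to upper case.
            self.rows[record["id"]] = record["name"].upper()
            self.version = max(self.version, record["version"])
        return len(self.rows)


provider = PublicDataService([
    {"id": 1, "name": "pump station", "version": 1},
    {"id": 2, "name": "reservoir", "version": 2},
])
consumer = LocalDataSource()
consumer.sync(provider)
print(consumer.rows, consumer.version)
```

The consumer never reads the provider's storage directly; it sees only what `fetch` returns, which is the encapsulation the public data service is meant to provide.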
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (6)
1. A big data platform file retrieval and sharing system, characterized in that it comprises:
a server and a platform system cluster deployed on the server, the platform system cluster comprising:
a data acquisition sub-platform system: data are collected in real time through a client and transmitted promptly through the flash;
a data processing sub-platform system: before the data acquired by the data acquisition sub-platform system are processed, a natural language processing system and a reverse sorting algorithm are used to enable full-text retrieval of the data contents, and the data are then processed;
a data storage sub-platform system: different data information is stored in separate partitions;
a sharing and exchange sub-platform system: the data information is shared and exchanged, and the data are output.
2. The method of a big data platform file retrieval and sharing system as claimed in claim 1, wherein the data input processing flow comprises the following steps:
Step 1: a user initiates a file upload request to the big data center platform;
Step 2: the big data center platform uploads the file using the file upload module of its data acquisition unit;
Step 3: the content inspection module of the big data center platform performs a security inspection of the file the user has requested to upload and judges whether the content is safe; if the file is unsafe, the upload ends; if the file is safe, the file receiving module receives the file;
Step 4: the data processing unit of the big data center reads the file content, extracts features and builds an index;
Step 5: the file and the index file are stored separately;
Step 6: the sharing and exchange unit of the big data center performs file retrieval, file sharing and secure transmission of the file content.
3. The method of a big data platform file retrieval and sharing system as claimed in claim 2, wherein, in the fourth step, a feature extraction algorithm suited to the file is selected according to the characteristics of the file content that has been read.
4. The method of a big data platform file retrieval and sharing system as claimed in claim 2, wherein an encryption module is further provided.
5. The method of a big data platform file retrieval and sharing system as claimed in claim 4, wherein: a user initiates an encrypted-file query request to the big data center platform; the platform locates the encrypted index through the file retrieval module of its encrypted-file retrieval and sharing system, obtains the index key from the key management module of that system, decrypts the index, searches the decrypted index, locates the files that satisfy the query request, and returns the query result to the user who initiated the request; the user decides from the returned result whether a file needs to be shared, and if not, the query ends; if a file needs to be shared, the user initiates a sharing request to the big data center platform, asking it to return the file corresponding to the query result; the platform checks whether the file may be shared with the requesting user and, if so, obtains the file key from the key management module and decrypts the file that is allowed to be shared.
6. The method of a big data platform file retrieval and sharing system as claimed in claim 2, wherein the big data center platform adopts an XMA integration collaboration platform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010892128.9A CN112052220A (en) | 2020-08-31 | 2020-08-31 | System and method for searching and sharing files of big data platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010892128.9A CN112052220A (en) | 2020-08-31 | 2020-08-31 | System and method for searching and sharing files of big data platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112052220A | 2020-12-08 |
Family
ID=73607081
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010892128.9A Pending CN112052220A (en) | 2020-08-31 | 2020-08-31 | System and method for searching and sharing files of big data platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112052220A (en) |
-
2020
- 2020-08-31 CN CN202010892128.9A patent/CN112052220A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20201208 |