CN112052220A - System and method for searching and sharing files of big data platform - Google Patents


Info

Publication number
CN112052220A
Authority
CN
China
Prior art keywords
file
data
platform
big data
sharing
Prior art date
Legal status
Pending
Application number
CN202010892128.9A
Other languages
Chinese (zh)
Inventor
李丹
Current Assignee
Chengdu Yunye Technology Co ltd
Original Assignee
Chengdu Yunye Technology Co ltd
Priority date
2020-08-31
Filing date
2020-08-31
Publication date
2020-12-08
Application filed by Chengdu Yunye Technology Co ltd
Priority to CN202010892128.9A
Publication of CN112052220A
Pending (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G06F16/17 Details of further file system functions
    • G06F16/176 Support for shared access to files; File sharing support
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107 File encryption

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a system and method for searching and sharing files on a big data platform. The system comprises a server and a platform system cluster deployed on the server. The platform system cluster comprises four parts. A data acquisition sub-platform system collects data in real time through a client and transmits it promptly via flash. A data processing sub-platform system handles the collected data: before processing, it applies a natural language processing system and an inverted (reverse-sorting) index algorithm to enable full-text retrieval of the data contents, and it then processes the data. A data storage sub-platform system stores different data in separate partitions. A sharing and exchange sub-platform system shares, exchanges and outputs the data. With the invention, massive data can be computed and stored in full, and files are easy to upload and easy for users to retrieve. Retrieved files can also be shared directly, so the whole platform is convenient to use.

Description

System and method for searching and sharing files of big data platform
Technical Field
The invention belongs to the technical field of big data, and particularly relates to a system and a method for retrieving and sharing files of a big data platform.
Background
A wave of informatization is sweeping the world, and big data, cloud computing, the Internet of Things and related technologies are developing vigorously, taking the Internet era to a new stage. Big data technology is undoubtedly a hot topic across the information industry and indeed society as a whole. Every industry is discussing how big data can bring change and progress to its own field, and the highest goal is to realize the value concept of 'people are doing, turning numbers and computing clouds'.
Big data technology has developed rapidly thanks to its large capacity, fast reads and writes, and broad applicability. It has become today's most important technology for data aggregation and for information services and management. It helps technologies and industries converge, helps people discover new knowledge, creates new industries, and raises the value and capability of traditional fields. Big data is therefore becoming an important hallmark and enabling tool of the current industrial revolution and collective leap forward. However, how to share big data retrieval results effectively, so that the retrieved data can be used more conveniently, remains a problem to be solved.
In addition, the big data platform is built on the open-source distributed framework Hadoop with deep optimization, and it integrates many mature open-source technical frameworks. Together these form a big data ecosystem covering:
distributed data acquisition;
big data aggregation and transmission;
distributed data storage;
real-time and offline data analysis and computation;
intelligent full-text retrieval;
big data visualization.
Traditional databases cannot support computing over and storing such massive data in full, and they are inconvenient to use.
Therefore, overcoming the above drawbacks of the prior art is a goal pursued by those skilled in the art.
Disclosure of Invention
The invention aims to provide a system and method for searching and sharing files on a big data platform that fully address the above shortcomings of the prior art.
The object of the invention is achieved by the following technical solution: a big data platform file retrieval and sharing system, and a method thereof, comprising:
a server and a platform system cluster disposed on the server, the platform system cluster comprising:
a data acquisition sub-platform system, which collects data in real time through a client and transmits it promptly via flash;
a data processing sub-platform system, which applies a natural language processing system and an inverted (reverse-sorting) index algorithm to enable full-text retrieval of the data contents before processing the data collected by the data acquisition sub-platform system, and then processes the data;
a data storage sub-platform system, which stores different data in separate partitions; and
a sharing and exchange sub-platform system, which shares, exchanges and outputs the data.
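The source calls its full-text retrieval technique a "reverse sorting algorithm", which most likely corresponds to an inverted index. That reading is an assumption. The minimal sketch below maps each term to the set of files containing it. A naive lowercase regex tokenizer stands in for the unspecified natural language processing system. The names `tokenize`, `InvertedIndex`, `add` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Naive tokenizer standing in for the NLP system (assumption):
    lowercase the text and keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Maps each term to the set of document IDs that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        # Record doc_id in the posting list of every term in the text.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        # AND semantics: a document matches only if it contains every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("f1", "Quarterly sales report for 2020")
index.add("f2", "Sales forecast and budget")
print(sorted(index.search("sales report")))  # ['f1']
```

Intersecting posting lists (AND semantics) is only one design choice. A ranked OR query that scores documents by term frequency would be a common alternative.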
In a preferred mode, the data input processing flow comprises the following steps:
First step: a user initiates a file upload request to the big data center platform;
Second step: the big data center platform uses the file upload module in its data acquisition unit to control the file upload;
Third step: the content inspection module of the big data center platform performs a security check on the content of the file the user has requested to upload and determines whether it is safe; if the file is unsafe, the upload is terminated; if it is safe, the file receiving module receives the file;
Fourth step: the data processing unit of the big data center reads the file content, extracts features and builds an index;
Fifth step: the file and the index file are stored separately;
Sixth step: the sharing and exchange unit of the big data center performs file retrieval, file sharing and secure transmission of the file content.
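The six-step input flow can be sketched as a pair of functions. This is a hedged illustration, not the patented implementation. The following are all hypothetical:

* the keyword-based content check in the third step;
* the in-memory `file_store` and `index_store` dictionaries, which stand in for the separate storage of files and index files in the fifth step;
* every function name.

```python
import re
from collections import defaultdict

# Hypothetical security rule for the third step: reject content containing these words.
BLOCKED_WORDS = {"malware", "virus"}

file_store: dict[str, bytes] = {}                    # fifth step: file storage
index_store: dict[str, set[str]] = defaultdict(set)  # fifth step: separate index storage


def is_content_safe(text: str) -> bool:
    """Third-step stand-in: a trivial keyword check (assumption)."""
    return not (set(re.findall(r"\w+", text.lower())) & BLOCKED_WORDS)


def upload_file(file_id: str, data: bytes) -> bool:
    """Steps one to five for one upload request; returns False if the upload is terminated."""
    # Decode once; the text is used by both the safety check and the indexing.
    text = data.decode("utf-8", errors="replace")
    if not is_content_safe(text):          # third step: unsafe -> end the upload
        return False
    file_store[file_id] = data             # third step: the receiving module accepts the file
    for term in set(re.findall(r"\w+", text.lower())):  # fourth step: features = distinct terms
        index_store[term].add(file_id)     # fourth/fifth steps: build and store the index
    return True


def retrieve(term: str) -> dict[str, bytes]:
    """Sixth-step stand-in: look up the index, then fetch files from file storage."""
    return {fid: file_store[fid] for fid in index_store.get(term.lower(), set())}


print(upload_file("a.txt", b"Annual budget plan"))  # True
print(upload_file("b.txt", b"contains a virus"))    # False
print(sorted(retrieve("budget")))                   # ['a.txt']
```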
In a preferred mode, in the fourth step, a feature extraction algorithm suited to the characteristics of the file content that was read is selected.
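One way to read "select a feature extraction algorithm according to the content characteristics" is a dispatch on the detected content type. The source does not name any specific algorithm. In this sketch the following are all hypothetical:

* the type detection, which uses the file extension;
* the two extractors: term frequencies for text, and non-empty value counts per column for CSV;
* the `EXTRACTORS` registry.

```python
import csv
import io
import re
from collections import Counter
from typing import Callable

Features = dict[str, int]


def text_features(data: str) -> Features:
    """Plain text: term frequencies of lowercase word tokens."""
    return dict(Counter(re.findall(r"\w+", data.lower())))


def csv_features(data: str) -> Features:
    """Tabular text: number of non-empty values per column, keyed by header name."""
    rows = list(csv.reader(io.StringIO(data)))
    if not rows:
        return {}
    header = [name.strip().lower() for name in rows[0]]
    counts = dict.fromkeys(header, 0)
    for row in rows[1:]:
        for name, value in zip(header, row):
            if value.strip():
                counts[name] += 1
    return counts


# Hypothetical registry: detected content type -> feature extraction algorithm.
EXTRACTORS: dict[str, Callable[[str], Features]] = {
    "text": text_features,
    "csv": csv_features,
}


def detect_type(filename: str) -> str:
    """Crude content-type detection by file extension (assumption)."""
    return "csv" if filename.lower().endswith(".csv") else "text"


def extract_features(filename: str, data: str) -> Features:
    """Select the extractor suited to the file's content, then apply it."""
    return EXTRACTORS[detect_type(filename)](data)
```

A registry keeps the selection logic open. Supporting a new content type means registering one more extractor rather than editing the dispatch.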
In a preferred mode, an encryption module is further provided.
In a preferred mode, the encrypted query and sharing flow runs as follows.
1. A user initiates an encrypted file query request to the big data center platform.
2. The platform locates the encrypted index through the file retrieval module of its encrypted file retrieval and sharing system.
3. The platform obtains the index key from the key management module of that system and decrypts the index.
4. It searches the decrypted index, locates the files matching the query, and returns the query result to the requesting user.
5. Based on the returned result, the user decides whether a file needs to be shared. If it does not, the query ends.
6. If sharing is needed, the user sends a sharing request to the platform, asking for the file that corresponds to the query result.
7. The platform checks whether that file may be shared with the requesting user.
8. If sharing is allowed, the platform obtains the file key from the key management module and decrypts the file to be shared.
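The encrypted query-and-share flow can be sketched as follows. Everything here is an assumption made for illustration:

* `KeyManager`, `EncryptedStore` and their methods are hypothetical names;
* sharing permission is a simple per-file allow-list;
* `toy_cipher` is a SHA-256 counter-mode keystream, used only so the example runs on the standard library. It is unauthenticated and is no substitute for a vetted cipher such as AES-GCM.

```python
import hashlib
import json
import secrets
from typing import Optional


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR `data` with a SHA-256(key || nonce || counter) keystream.

    The same call encrypts and decrypts. ILLUSTRATION ONLY: unauthenticated,
    not reviewed, and no substitute for a vetted cipher such as AES-GCM.
    """
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ k for b, k in zip(data[start:start + 32], block))
    return bytes(out)


class KeyManager:
    """Hypothetical key management module: one index key plus one key per file."""

    def __init__(self) -> None:
        self.index_key = secrets.token_bytes(32)
        self._file_keys: dict[str, bytes] = {}

    def file_key(self, file_id: str) -> bytes:
        return self._file_keys.setdefault(file_id, secrets.token_bytes(32))


class EncryptedStore:
    """Hypothetical store holding encrypted files and an encrypted term index."""

    def __init__(self, keys: KeyManager) -> None:
        self.keys = keys
        self.files: dict[str, tuple[bytes, bytes]] = {}  # file_id -> (nonce, ciphertext)
        self.share_acl: dict[str, set[str]] = {}         # file_id -> users allowed to receive it
        self.index = self._seal_index({})                # (nonce, encrypted JSON index)

    def _seal_index(self, index: dict[str, list[str]]) -> tuple[bytes, bytes]:
        nonce = secrets.token_bytes(16)
        return nonce, toy_cipher(self.keys.index_key, nonce, json.dumps(index).encode())

    def _open_index(self) -> dict[str, list[str]]:
        # Decrypt the index with the index key obtained from the key manager.
        nonce, blob = self.index
        return json.loads(toy_cipher(self.keys.index_key, nonce, blob))

    def put(self, file_id: str, text: str, allowed: set[str]) -> None:
        nonce = secrets.token_bytes(16)
        cipher = toy_cipher(self.keys.file_key(file_id), nonce, text.encode())
        self.files[file_id] = (nonce, cipher)
        index = self._open_index()
        for term in set(text.lower().split()):
            ids = index.setdefault(term, [])
            if file_id not in ids:
                ids.append(file_id)
        self.index = self._seal_index(index)  # re-encrypt with a fresh nonce
        self.share_acl[file_id] = set(allowed)

    def query(self, term: str) -> list[str]:
        """Decrypt the index, search it, and return matching file IDs."""
        return sorted(self._open_index().get(term.lower(), []))

    def share(self, user: str, file_id: str) -> Optional[str]:
        """Check sharing permission first; only then fetch the file key and decrypt."""
        if user not in self.share_acl.get(file_id, set()):
            return None
        nonce, cipher = self.files[file_id]
        return toy_cipher(self.keys.file_key(file_id), nonce, cipher).decode()
```

The permission check runs before the file key is requested. This mirrors the order in the text, where the file is decrypted only after sharing has been allowed.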
Further preferably, the big data center platform adopts the XMA integration collaboration platform.
Compared with the prior art, the invention has the following beneficial effects. It provides a system and method for searching and sharing files on a big data platform that compute and store massive data in full. Users can upload and retrieve files conveniently, and retrieved files can be shared directly, so the whole platform is convenient to use.
Drawings
FIG. 1 is a schematic diagram of the architecture of the big data platform file retrieval and sharing system of the present invention;
FIG. 2 is a data flow diagram of the big data platform file retrieval and sharing system of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The invention is a big data integration collaboration platform. Its main functions are:
extracting shared data from other subsystems and fusing inconsistent, multi-source data;
organizing real-time and historical data based on a data dictionary, so that the relationships between data are correct and understandable and redundancy is avoided;
providing data services in various forms and assigning permissions to users hierarchically, so that each user obtains the data it needs while the security of data transmission and the interoperability of shared data are ensured;
maintaining basic information, dynamic service data and system management configuration parameters;
supporting the operation and maintenance of the system's network architecture, information security, network management, process management, database maintenance, backup and so on.
First, the platform provides exchange services and routing and flow management for basic data and shared data, which form the foundation of the exchange platform. This data includes static exchange data, dynamic exchange data, graphic data, tables, statistical data and other attribute data. Next, the platform implements the interfaces between subsystems and carries out data sharing and transmission between them according to pre-established specifications and standards. When connecting to the central platform, the system structure is designed according to the system integration requirements, and all data interfaces follow the system integration specification.
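The hierarchical assignment of user permissions can be illustrated with ordered access levels: a user may read any data classified at or below its own level. The source does not define the hierarchy. The level names, the `DataItem` type and the `visible_to` helper below are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical permission hierarchy; a higher value grants more access."""
    PUBLIC = 0
    DEPARTMENT = 1
    CENTER_ADMIN = 2


@dataclass(frozen=True)
class DataItem:
    name: str
    level: Level  # minimum level required to read this item


def visible_to(user_level: Level, items: list[DataItem]) -> list[str]:
    """Return the names of items the user may read (item level <= user level)."""
    return [item.name for item in items if item.level <= user_level]


catalog = [
    DataItem("road_map", Level.PUBLIC),
    DataItem("dept_budget", Level.DEPARTMENT),
    DataItem("system_config", Level.CENTER_ADMIN),
]
print(visible_to(Level.DEPARTMENT, catalog))  # ['road_map', 'dept_budget']
```

A strict total order is the simplest hierarchy. Per-department scopes would need a richer model, such as role sets or attribute-based rules.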
In a specific implementation:
as shown in FIG. 1 and FIG. 2, a big data platform file retrieval and sharing system and method comprise a server and a platform system cluster deployed on the server. The platform system cluster comprises four parts. A data acquisition sub-platform system collects data in real time through a client and transmits it promptly via flash. A data processing sub-platform system handles the collected data: before processing, it applies a natural language processing system and an inverted (reverse-sorting) index algorithm to enable full-text retrieval of the data contents, and it then processes the data. A data storage sub-platform system stores different data in separate partitions. A sharing and exchange sub-platform system shares, exchanges and outputs the data.
The data input processing flow comprises the following steps:
First step: a user initiates a file upload request to the big data center platform;
Second step: the big data center platform uses the file upload module in its data acquisition unit to control the file upload;
Third step: the content inspection module of the big data center platform performs a security check on the content of the file the user has requested to upload and determines whether it is safe; if the file is unsafe, the upload is terminated; if it is safe, the file receiving module receives the file;
Fourth step: the data processing unit of the big data center reads the file content, extracts features and builds an index; a suitable feature extraction algorithm is selected according to the characteristics of the file content that was read;
Fifth step: the file and the index file are stored separately;
Sixth step: the sharing and exchange unit of the big data center performs file retrieval, file sharing and secure transmission of the file content.
In addition, an encryption module is provided, and the encrypted query and sharing flow runs as follows.
1. A user initiates an encrypted file query request to the big data center platform.
2. The platform locates the encrypted index through the file retrieval module of its encrypted file retrieval and sharing system.
3. The platform obtains the index key from the key management module of that system and decrypts the index.
4. It searches the decrypted index, locates the files matching the query, and returns the query result to the requesting user.
5. Based on the returned result, the user decides whether a file needs to be shared. If it does not, the query ends.
6. If sharing is needed, the user sends a sharing request to the platform, asking for the file that corresponds to the query result.
7. The platform checks whether that file may be shared with the requesting user.
8. If sharing is allowed, the platform obtains the file key from the key management module and decrypts the file to be shared.
At its bottom layer, the XMA integration collaboration platform uses message middleware to achieve reliable data transmission. At the application layer, data exchange is implemented on the basis of services, and it must support:
data acquisition;
data aggregation;
data distribution;
data update notification;
data forwarding;
data conversion.
Real-time, scheduled and on-demand exchange modes are supported. The platform supports multiple data sources and provides identity authentication, user authorization, transmission encryption, data integrity, data trustworthiness and data validity. It also supports segmented data transmission, data compression and decompression, data caching and the like. The XMA integration collaboration platform must provide a security mechanism that guarantees the integrity and confidentiality of exchanged information, and it must integrate effectively with the security authentication platform. Together, the security authentication platform and the XMA integration collaboration platform protect exchanged content from interception and unauthorized modification.
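Segmented transmission, compression and integrity protection can be combined as in the sketch below. It illustrates the general technique under stated assumptions and is not the XMA platform's actual mechanism:

* each segment is compressed with zlib;
* each segment is tagged with an HMAC-SHA256 over its sequence number, the total segment count and its payload, using a pre-shared key;
* the segment size and the function names are hypothetical.

Confidentiality (transmission encryption) is not shown.

```python
import hashlib
import hmac
import zlib

SEGMENT_SIZE = 1024  # hypothetical segment size in bytes
Segment = tuple[int, int, bytes, bytes]  # (sequence number, total segments, payload, tag)


def _tag(key: bytes, seq: int, total: int, payload: bytes) -> bytes:
    """HMAC-SHA256 over the segment header and the compressed payload."""
    header = seq.to_bytes(8, "big") + total.to_bytes(8, "big")
    return hmac.new(key, header + payload, hashlib.sha256).digest()


def pack(data: bytes, key: bytes) -> list[Segment]:
    """Split data into segments, compress each one, and attach an integrity tag."""
    chunks = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)] or [b""]
    total = len(chunks)
    segments = []
    for seq, chunk in enumerate(chunks):
        payload = zlib.compress(chunk)
        segments.append((seq, total, payload, _tag(key, seq, total, payload)))
    return segments


def unpack(segments: list[Segment], key: bytes) -> bytes:
    """Verify every tag, check that no segment is missing, then decompress in order."""
    ordered = sorted(segments, key=lambda s: s[0])
    for seq, total, payload, tag in ordered:
        if not hmac.compare_digest(tag, _tag(key, seq, total, payload)):
            raise ValueError(f"segment {seq} failed integrity check")
    totals = {s[1] for s in ordered}
    if len(totals) != 1 or [s[0] for s in ordered] != list(range(totals.pop())):
        raise ValueError("missing, duplicate or mixed segments")
    return b"".join(zlib.decompress(s[2]) for s in ordered)
```

Including the total count in every authenticated header lets the receiver detect a dropped final segment, which a sequence check alone would miss.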
Collecting public data can be regarded as a data aggregation process. The public data of each business department is gathered through the XMA integration collaboration platform into the cache database of the data center. Consistent data is then obtained through comparison, verification and conversion in the data management system. Data distribution is the process by which the data center actively provides data to its users. Through the public data service, data is distributed from the data center to each data-using department according to data access-permission rules, achieving data sharing and information linkage. The data exchange service can convert data from a given database into a standard XML data set. The data conversion module converts heterogeneous data into public data that conforms to a unified standard and is consistent and complete.
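Converting database rows into a standard XML data set might look like the sketch below, which uses Python's built-in `sqlite3` and `xml.etree.ElementTree`. The source does not specify the XML schema. The table, the element names (`dataset`, `record`, one child element per column) and the `rows_to_xml` helper are hypothetical. Column names must also be valid XML names for this mapping to work.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn: sqlite3.Connection, table: str) -> str:
    """Export all rows of `table` as <dataset><record><column>value</column>...</record></dataset>."""
    # Table names cannot be bound as SQL parameters; `table` must come from a trusted list.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [desc[0] for desc in cursor.description]
    root = ET.Element("dataset", {"source": table})
    for row in cursor:
        record = ET.SubElement(root, "record")
        for name, value in zip(columns, row):
            field = ET.SubElement(record, name)
            field.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Planning"), (2, "Finance")])
print(rows_to_xml(conn, "dept"))
```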
In addition, the data interface system should be an open system. It should provide extensible interfaces and secondary-development interfaces so that users can define their own custom services on top of them.
The platform also provides the following management functions:
data service monitoring and management;
user permission management;
operation log viewing;
performance statistics;
management of data exchange nodes;
security policy guidance and server security management configuration.
The details of each data exchange can be recorded and tracked through the data service log.
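Recording exchange details in a data service log and deriving performance statistics from it could look like the sketch below. The record fields, the `ExchangeLog` class and its `track` and `stats` methods are hypothetical; the source does not describe the log format.

```python
import time
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class ExchangeRecord:
    node: str       # data exchange node that handled the request
    user: str       # requesting user, kept for permission auditing
    dataset: str    # data set that was exchanged
    ok: bool        # whether the exchange finished without an exception
    seconds: float  # wall-clock duration of the exchange


@dataclass
class ExchangeLog:
    """Hypothetical data service log: records every exchange for tracking and statistics."""
    records: list[ExchangeRecord] = field(default_factory=list)

    def track(self, node: str, user: str, dataset: str, action: Callable[[], T]) -> T:
        start = time.perf_counter()
        ok = False
        try:
            result = action()
            ok = True
            return result
        finally:
            # Runs on success and on failure, so every exchange is logged.
            self.records.append(
                ExchangeRecord(node, user, dataset, ok, time.perf_counter() - start))

    def stats(self) -> dict[str, float]:
        """Simple performance statistics over all tracked exchanges."""
        if not self.records:
            return {"count": 0, "failures": 0, "mean_seconds": 0.0}
        return {
            "count": len(self.records),
            "failures": sum(not r.ok for r in self.records),
            "mean_seconds": mean(r.seconds for r in self.records),
        }
```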
Application integration based on Web Services technology supports the interfaces of each application system by seamlessly integrating mainstream Web Services protocols such as SOAP and XML-RPC. It provides a Web Services-based integration adapter for application systems, together with tools and API interfaces for quickly integrating Web Services applications.
The data provider defines a public data service that encapsulates the content and protocol of the data exchange as a service. The data user calls the provider's public data service to obtain the required data, then updates its local data source according to defined data conversion and update rules. Data exchange between provider and user is thus achieved through the interaction of the local data service and the open data service.
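The provider/user exchange can be sketched with Python's standard `xmlrpc.client.dumps` and `xmlrpc.client.loads`, which marshal calls and responses in the XML-RPC format named above. The transport is simulated in-process with no network. The following are hypothetical choices made for illustration:

* the service name `get_records` and its `since` version parameter;
* the upper-casing conversion rule;
* the upsert-by-`id` update rule.

```python
import xmlrpc.client

# Provider side: data behind a public data service (hypothetical sample records).
PROVIDER_DATA = [
    {"id": 1, "name": "Station A", "version": 2},
    {"id": 2, "name": "Station B", "version": 5},
]


def provider_handle(request_xml: str) -> str:
    """Decode an XML-RPC call, run the named public service, and encode the response."""
    params, method = xmlrpc.client.loads(request_xml)
    if method != "get_records":
        raise ValueError(f"unknown service {method!r}")
    (since,) = params
    result = [r for r in PROVIDER_DATA if r["version"] > since]
    return xmlrpc.client.dumps((result,), methodresponse=True)


def consumer_sync(local: dict[int, dict], since: int) -> int:
    """User side: call the public service, then apply conversion and update rules locally."""
    request_xml = xmlrpc.client.dumps((since,), methodname="get_records")
    response_xml = provider_handle(request_xml)  # stands in for an HTTP POST
    (records,), _ = xmlrpc.client.loads(response_xml)
    for rec in records:
        # Conversion rule (assumption): normalise names to upper case.
        # Update rule (assumption): upsert into the local data source by id.
        local[rec["id"]] = {**rec, "name": rec["name"].upper()}
    return len(records)
```

Keeping the wire format in standard XML-RPC means either side can later be swapped for `xmlrpc.server` or another XML-RPC implementation without changing the data contract.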
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A big data platform file retrieval and sharing system, characterized in that it comprises:
a server and a platform system cluster deployed on the server, the platform system cluster comprising:
a data acquisition sub-platform system, which collects data in real time through a client and transmits it promptly via flash;
a data processing sub-platform system, which applies a natural language processing system and an inverted (reverse-sorting) index algorithm to enable full-text retrieval of the data contents before processing the data collected by the data acquisition sub-platform system, and then processes the data;
a data storage sub-platform system, which stores different data in separate partitions; and
a sharing and exchange sub-platform system, which shares, exchanges and outputs the data.
2. The method of the big data platform file retrieval and sharing system according to claim 1, characterized in that the data input processing flow comprises the following steps:
First step: a user initiates a file upload request to the big data center platform;
Second step: the big data center platform uses the file upload module in its data acquisition unit to control the file upload;
Third step: the content inspection module of the big data center platform performs a security check on the content of the file the user has requested to upload and determines whether it is safe; if the file is unsafe, the upload is terminated; if it is safe, the file receiving module receives the file;
Fourth step: the data processing unit of the big data center reads the file content, extracts features and builds an index;
Fifth step: the file and the index file are stored separately;
Sixth step: the sharing and exchange unit of the big data center performs file retrieval, file sharing and secure transmission of the file content.
3. The method of the big data platform file retrieval and sharing system according to claim 2, characterized in that, in the fourth step, a feature extraction algorithm suited to the characteristics of the file content that was read is selected.
4. The method of the big data platform file retrieval and sharing system according to claim 2, characterized in that an encryption module is further provided.
5. The method of the big data platform file retrieval and sharing system according to claim 4, characterized in that: a user initiates an encrypted file query request to the big data center platform; the platform locates the encrypted index through the file retrieval module of its encrypted file retrieval and sharing system, obtains the index key from the key management module of that system, and decrypts the index; the platform searches the decrypted index, locates the files matching the query, and returns the query result to the requesting user; based on the returned result, the user decides whether a file needs to be shared, and if it does not, the query ends; if sharing is needed, the user sends a sharing request to the platform asking for the file that corresponds to the query result; the platform checks whether that file may be shared with the requesting user; and if sharing is allowed, the platform obtains the file key from the key management module and decrypts the file to be shared.
6. The method of the big data platform file retrieval and sharing system according to claim 2, characterized in that the big data center platform adopts the XMA integration collaboration platform.
CN202010892128.9A 2020-08-31 2020-08-31 System and method for searching and sharing files of big data platform Pending CN112052220A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010892128.9A CN112052220A (en) 2020-08-31 2020-08-31 System and method for searching and sharing files of big data platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010892128.9A CN112052220A (en) 2020-08-31 2020-08-31 System and method for searching and sharing files of big data platform

Publications (1)

Publication Number Publication Date
CN112052220A (en) 2020-12-08

Family

ID=73607081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010892128.9A Pending CN112052220A (en) 2020-08-31 2020-08-31 System and method for searching and sharing files of big data platform

Country Status (1)

Country Link
CN (1) CN112052220A (en)

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201208