CN111324782A - Big data storage system - Google Patents

Publication number
CN111324782A
Authority
CN
China
Prior art keywords
data
module
classification
target data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010189556.5A
Other languages
Chinese (zh)
Inventor
林波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202010189556.5A
Publication of CN111324782A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/906: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data storage system comprising: a target data acquisition module, which acquires target data through multiple ports and sends the acquired data to a data preprocessing module; the data preprocessing module, which removes redundant data based on the ISODATA algorithm; a data feature extraction module, which extracts feature data from the target data using an attribute reduction algorithm based on attribute importance; a data classification module, which classifies the target data according to the feature data using a naive Bayes classification algorithm; and a data positioning module, which, according to the classification result of the data classification module, finds a suitable position in the database for the target data, finds similar data points for the target data, and establishes a relation between those similar data points and the target data point. The invention automatically organizes and stores big data and converts large volumes of heterogeneous data into usable data with informational and commercial value.

Description

Big data storage system
Technical Field
The invention relates to the field of big data, in particular to a big data storage system.
Background
With the explosive growth and accumulation of information, the era of big data has arrived. The basic characteristics of big data are large volume, wide variety, low value density, high velocity, and high timeliness. A common conclusion of the big data era is that, for big data, correlation matters more than causation. The task is therefore to analyze the correlations between certain types of information within a mass of information, uncover the implicit value behind that information, and reflect the value of the data at a higher and deeper level. Faced with such massive data, however, it is very difficult to analyze the associations between data quickly and accurately.
At present, traditional data storage systems generally suffer from defects such as offering only a single function and requiring the value of the data to be mined and classified manually at a later stage.
Disclosure of Invention
In order to solve the above problems, the present invention provides a big data storage system that automatically organizes and stores big data while converting large volumes of heterogeneous data into usable data with informational and commercial value.
To achieve this purpose, the invention adopts the following technical scheme:
A big data storage system, comprising:
a target data acquisition module, which acquires target data through multiple ports and sends the acquired data to the data preprocessing module;
a data preprocessing module, which removes redundant data based on the ISODATA algorithm;
a data feature extraction module, which extracts feature data from the target data using an attribute reduction algorithm based on attribute importance;
a data classification module, which classifies the target data according to the feature data using a naive Bayes classification algorithm; and
a data positioning module, which, according to the classification result of the data classification module, finds a suitable position in the database for the target data, finds similar data points for the target data, and establishes a relation between the similar data points and the target data point.
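The text does not say which attribute reduction variant the data feature extraction module uses. As a minimal sketch, assuming a rough-set style greedy reduction in which the importance of an attribute is the gain in dependency it brings to the class label, the step could look like the following. The function names and the sample records are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Fraction of rows whose equivalence class carries a single label."""
    consistent = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)


def reduce_attributes(rows, labels, all_attrs):
    """Greedily add the most important attribute until the selected set
    explains the labels as well as the full attribute set does."""
    selected = []
    target = dependency(rows, labels, all_attrs)
    current = dependency(rows, labels, selected)
    while current < target:
        # Importance of a candidate = dependency reached once it is added.
        best = max(
            (a for a in all_attrs if a not in selected),
            key=lambda a: dependency(rows, labels, selected + [a]),
        )
        selected.append(best)
        current = dependency(rows, labels, selected)
    return selected


# Hypothetical sample: the label depends only on the "format" attribute.
rows = [
    {"source": "sensor", "format": "csv", "size": "small"},
    {"source": "sensor", "format": "json", "size": "small"},
    {"source": "camera", "format": "csv", "size": "large"},
    {"source": "camera", "format": "json", "size": "large"},
]
labels = ["tabular", "document", "tabular", "document"]
reduct = reduce_attributes(rows, labels, ["source", "format", "size"])
print(reduct)  # ['format']
```

The reduced attribute set is what a downstream classifier, such as the naive Bayes classifier named above, would then consume as feature data.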
Further, the system also comprises:
a format standardization module, which calls the format standardization algorithm corresponding to the classification result of the data classification module to standardize the format of the target data.
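One way to picture the selection of a format standardization algorithm by class is a dispatch table keyed by the classification label. The sketch below is illustrative only: the class labels, the two normalizers, and the input formats are assumptions, since the patent does not name any concrete standardization algorithm.

```python
import json
from datetime import datetime


def standardize_timestamp(value):
    """Assumed input format day/month/year; output is ISO 8601."""
    return datetime.strptime(value, "%d/%m/%Y").date().isoformat()


def standardize_record(value):
    """Serialize a dict as compact JSON with sorted keys."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


# Hypothetical mapping from classification result to normalizer.
STANDARDIZERS = {
    "timestamp": standardize_timestamp,
    "record": standardize_record,
}


def standardize(class_label, value):
    """Call the normalizer registered for the given classification result."""
    try:
        normalizer = STANDARDIZERS[class_label]
    except KeyError:
        raise ValueError(f"no standardization algorithm for class {class_label!r}") from None
    return normalizer(value)


print(standardize("timestamp", "18/03/2020"))
print(standardize("record", {"b": 1, "a": 2}))
```

Keeping the mapping in one table means a new data class only needs a new entry, not a change to the calling code.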
Further, the system also comprises:
a data access authority limiting module, which restricts access to the target data according to the classification result of the data classification module and controls access authority by means of biometric data and a specific encryption algorithm.
Further, the system also comprises:
a picture data identification module, which identifies picture data and pre-stores the identified picture data in the corresponding database through a picture data transmission channel, using the data source information (the model of the target data acquisition module) and the acquisition time of the picture data as the picture name when pre-storing.
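The naming rule for pre-stored pictures (data source information plus acquisition time) can be sketched as below. The separator, the timestamp layout, the default file extension, and the replacement of unsafe characters are all assumptions made for illustration; the patent only states which two pieces of information form the name.

```python
from datetime import datetime


def picture_name(source_model, acquired_at, extension="png"):
    """Build a picture name from the acquisition module model and the acquisition time."""
    # Replace characters that are unsafe in file names with an underscore.
    safe_model = "".join(c if c.isalnum() or c in "-_" else "_" for c in source_model)
    stamp = acquired_at.strftime("%Y%m%dT%H%M%S")
    return f"{safe_model}_{stamp}.{extension}"


print(picture_name("CAM-01/A", datetime(2020, 3, 18, 9, 30, 0)))
# CAM-01_A_20200318T093000.png
```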
Further, the system also comprises:
a picture data processing module, which marks the format-standardized data onto the corresponding picture as a hyperlink, so that the format-standardized data can be accessed directly by clicking the hyperlink.
Further, the system also comprises:
an operation state monitoring module, which is deployed on the server as a static JAR package, records the user's operation state by script recording, monitors the operating state of each module in real time, compares the recorded operation state data with the behavior data in an abnormal behavior database for similarity, and, when the comparison result falls within a preset threshold, starts the automatic short message editing and sending module and sends the comparison result to a specified mobile terminal for display.
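The similarity comparison against the abnormal behavior database could be sketched as follows. Several points are assumptions: recorded operations are modeled as lists of operation names, similarity is measured with the standard library's `difflib.SequenceMatcher`, "falls within a preset threshold" is read as "similarity at or above the threshold", and `send_sms` stands in for the automatic short message editing and sending module. The patent itself is written around a Java deployment and names none of these details.

```python
from difflib import SequenceMatcher


def most_similar_abnormal(recorded_ops, abnormal_db):
    """Return (pattern name, similarity) of the closest abnormal pattern."""
    best_name, best_score = None, 0.0
    for name, pattern in abnormal_db.items():
        score = SequenceMatcher(None, recorded_ops, pattern).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def check_and_alert(recorded_ops, abnormal_db, threshold, send_sms):
    """Send an alert when the recorded operations resemble an abnormal pattern."""
    name, score = most_similar_abnormal(recorded_ops, abnormal_db)
    if name is not None and score >= threshold:
        send_sms(f"Abnormal behaviour '{name}' matched, similarity {score:.2f}")
        return True
    return False


# Hypothetical abnormal behavior database and recorded session.
abnormal_db = {
    "bulk_export_then_delete": ["login", "export", "delete"],
    "repeated_failed_login": ["login_failed", "login_failed", "login_failed"],
}
outbox = []
check_and_alert(["login", "export", "export", "delete"], abnormal_db, 0.8, outbox.append)
print(outbox)
```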
Further, the system also comprises:
a database memory monitoring module, which monitors the database memory and, when the database memory reaches a preset threshold, starts the automatic short message editing and sending module, sends the monitoring result to a specified mobile terminal for display, and at the same time starts the data caching module to cache the target data.
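The threshold behavior of the database memory monitoring module can be illustrated with a small in-memory model. This is a sketch under stated assumptions: usage is tracked as a byte counter rather than read from a real database, `send_sms` is a hypothetical callback standing in for the short message module, and a `deque` stands in for the data caching module.

```python
from collections import deque


class DatabaseMemoryMonitor:
    """Route records to the database or to a cache depending on memory usage."""

    def __init__(self, threshold_bytes, send_sms):
        self.threshold_bytes = threshold_bytes
        self.send_sms = send_sms
        self.used_bytes = 0
        self.cache = deque()

    def store(self, record):
        """Store the record, or alert and cache it once the threshold is reached."""
        if self.used_bytes >= self.threshold_bytes:
            self.send_sms(
                f"Database memory at {self.used_bytes} bytes, "
                f"threshold {self.threshold_bytes} bytes"
            )
            self.cache.append(record)
            return "cached"
        self.used_bytes += len(record)
        return "stored"


alerts = []
monitor = DatabaseMemoryMonitor(threshold_bytes=80, send_sms=alerts.append)
print(monitor.store(b"x" * 50), monitor.store(b"x" * 30), monitor.store(b"x" * 10))
```

In this model the third record arrives when usage has already reached the threshold, so it is cached and one alert is sent.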
The invention has the following beneficial effects:
1) The database is updated in real time while the data is automatically organized and stored, which greatly facilitates later retrieval of the data.
2) The efficiency of analyzing large-scale data is improved: large volumes of heterogeneous data can be converted into usable data with informational and commercial value, so that data mining is completed automatically.
Drawings
FIG. 1 is a system block diagram of a big data storage system according to an embodiment of the present invention.
Detailed Description
In order that the objects and advantages of the invention will be more clearly understood, the invention is further described in detail below with reference to examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIG. 1, an embodiment of the present invention provides a big data storage system, comprising:
a target data acquisition module, which acquires target data through multiple ports and sends the acquired data to the data preprocessing module;
a data preprocessing module, which removes redundant data based on the ISODATA algorithm;
a data feature extraction module, which extracts feature data from the target data using an attribute reduction algorithm based on attribute importance;
a data classification module, which classifies the target data according to the feature data using a naive Bayes classification algorithm;
a data positioning module, which, according to the classification result of the data classification module, finds a suitable position in the database for the target data, finds similar data points for the target data, and establishes a relation between the similar data points and the target data point. The data positioning module performs data positioning based on facet technology, locating data accurately by calculating the facet distance between different data terms. When data is positioned, the corresponding terms are selected under the constraints of the known facets, which completes the description of the required data. If the selection succeeds, the corresponding data is returned; if it fails, the system calculates the similarity of terms according to a synonym dictionary and a concept distance map to form new positioning information;
a format standardization module, which calls the format standardization algorithm corresponding to the classification result of the data classification module to standardize the format of the target data;
a data access authority limiting module, which restricts access to the target data according to the classification result of the data classification module and controls access authority by means of biometric data and a specific encryption algorithm;
a picture data identification module, which identifies picture data and pre-stores the identified picture data in the corresponding database through a picture data transmission channel, using the data source information (the model of the target data acquisition module) and the acquisition time of the picture data as the picture name when pre-storing;
a picture data processing module, which marks the format-standardized data onto the corresponding picture as a hyperlink, so that the format-standardized data can be accessed directly by clicking the hyperlink;
an operation state monitoring module, which is deployed on the server as a static JAR package, records the user's operation state by script recording, monitors the operating state of each module in real time, compares the recorded operation state data with the behavior data in an abnormal behavior database for similarity, and, when the comparison result falls within a preset threshold, starts the automatic short message editing and sending module and sends the comparison result to a specified mobile terminal for display;
a database memory monitoring module, which monitors the database memory and, when the database memory reaches a preset threshold, starts the automatic short message editing and sending module, sends the monitoring result to a specified mobile terminal for display, and at the same time starts the data caching module to cache the target data; and
a central processing unit, which coordinates the work of the modules.
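The facet-based positioning described for the data positioning module can be sketched roughly as follows. The embodiment does not define how facet distance is computed, so this example makes several assumptions: each stored item is described by one term per facet, the synonym dictionary maps a term to a canonical term, the concept distance map gives a distance for unordered term pairs, and a term may be rewritten only if its distance is at most `max_distance`. All names and sample data are hypothetical.

```python
def term_distance(a, b, synonyms, concept_distance):
    """0 for identical or synonymous terms, otherwise the mapped concept distance."""
    a, b = synonyms.get(a, a), synonyms.get(b, b)
    if a == b:
        return 0
    return concept_distance.get(frozenset((a, b)), float("inf"))


def locate(query, catalog, synonyms, concept_distance, max_distance=1):
    """Return (matching data ids, positioning information actually used)."""

    def matches(q):
        return sorted(
            data_id
            for data_id, desc in catalog.items()
            if all(desc.get(facet) == term for facet, term in q.items())
        )

    hits = matches(query)
    if hits:
        return hits, query  # selection under the known facets succeeded

    # Selection failed: rewrite each term to its nearest known term per facet.
    new_query = {}
    for facet, term in query.items():
        known = {desc[facet] for desc in catalog.values() if facet in desc}
        best = min(
            known,
            key=lambda k: (term_distance(term, k, synonyms, concept_distance), k),
            default=None,
        )
        if best is None or term_distance(term, best, synonyms, concept_distance) > max_distance:
            return [], query
        new_query[facet] = best
    return matches(new_query), new_query


catalog = {
    "doc-1": {"domain": "finance", "type": "report"},
    "doc-2": {"domain": "finance", "type": "invoice"},
    "doc-3": {"domain": "logistics", "type": "report"},
}
synonyms = {"bill": "invoice"}
concept_distance = {frozenset(("banking", "finance")): 1}

print(locate({"domain": "banking", "type": "bill"}, catalog, synonyms, concept_distance))
```

Here the query terms "banking" and "bill" match nothing directly, so they are rewritten to "finance" and "invoice" before the lookup is repeated.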
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (7)

1. A big data storage system, characterized by comprising:
a target data acquisition module, which acquires target data through multiple ports and sends the acquired data to the data preprocessing module;
a data preprocessing module, which removes redundant data based on the ISODATA algorithm;
a data feature extraction module, which extracts feature data from the target data using an attribute reduction algorithm based on attribute importance;
a data classification module, which classifies the target data according to the feature data using a naive Bayes classification algorithm; and
a data positioning module, which, according to the classification result of the data classification module, finds a suitable position in the database for the target data, finds similar data points for the target data, and establishes a relation between the similar data points and the target data point.
2. The big data storage system of claim 1, further comprising:
a format standardization module, which calls the format standardization algorithm corresponding to the classification result of the data classification module to standardize the format of the target data.
3. The big data storage system of claim 1, further comprising:
a data access authority limiting module, which restricts access to the target data according to the classification result of the data classification module and controls access authority by means of biometric data and a specific encryption algorithm.
4. The big data storage system of claim 1, further comprising:
a picture data identification module, which identifies picture data and pre-stores the identified picture data in the corresponding database through a picture data transmission channel, using the data source information and the acquisition time of the picture data as the picture name when pre-storing.
5. The big data storage system of claim 1, further comprising:
a picture data processing module, which marks the format-standardized data onto the corresponding picture as a hyperlink, so that the format-standardized data can be accessed directly by clicking the hyperlink.
6. The big data storage system of claim 1, further comprising:
an operation state monitoring module, which is deployed on the server as a static JAR package, records the user's operation state by script recording, monitors the operating state of each module in real time, compares the recorded operation state data with the behavior data in an abnormal behavior database for similarity, and, when the comparison result falls within a preset threshold, starts the automatic short message editing and sending module and sends the comparison result to a specified mobile terminal for display.
7. The big data storage system of claim 1, further comprising:
a database memory monitoring module, which monitors the database memory and, when the database memory reaches a preset threshold, starts the automatic short message editing and sending module, sends the monitoring result to a specified mobile terminal for display, and starts the data caching module to cache the target data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010189556.5A CN111324782A (en) 2020-03-18 2020-03-18 Big data storage system

Publications (1)

Publication Number Publication Date
CN111324782A (en) 2020-06-23

Family

ID=71173320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010189556.5A Pending CN111324782A (en) 2020-03-18 2020-03-18 Big data storage system

Country Status (1)

Country Link
CN (1) CN111324782A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150026114A1 (en) * 2013-07-18 2015-01-22 Dania M. Triff System and method of automatically extracting data from plurality of data sources and loading the same to plurality of target databases
CN104317827A (en) * 2014-10-09 2015-01-28 深圳码隆科技有限公司 Picture navigation method of commodity
CN109145556A (en) * 2018-07-28 2019-01-04 江苏经贸职业技术学院 A kind of Computer Intelligent Control System
CN109857784A (en) * 2019-02-12 2019-06-07 吉林师范大学 A kind of big data statistical analysis system
CN109857782A (en) * 2019-01-28 2019-06-07 中国石油大学胜利学院 A kind of Monitor of Logging Data Processing System

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111766811A (en) * 2020-07-01 2020-10-13 泰州市柯普尼通讯设备有限公司 Ship satellite vat information classification storage system and method
CN111766811B (en) * 2020-07-01 2021-12-03 泰州市柯普尼通讯设备有限公司 Ship satellite vat information classification storage system and method
CN112052366A (en) * 2020-09-08 2020-12-08 河南工业职业技术学院 Computer big data storage system
CN112559742A (en) * 2020-12-08 2021-03-26 北京伟杰东博信息科技有限公司 Classified storage method and system thereof
CN112632156A (en) * 2021-01-29 2021-04-09 赵琰 Big data-based computer data analysis and management system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200623