CN114610833A - Data center unstructured data access method - Google Patents
Data center unstructured data access method
- Publication number
- CN114610833A (application number CN202210070479.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- module
- unstructured
- backup
- structured
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/116—Details of conversion of file system types or formats
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data center unstructured data access method comprising the following steps: a control module sends an acquisition instruction to a data acquisition module, the data acquired by the data acquisition module is analyzed and processed by a data analysis module, and the data is verified by a data quality inspection and verification module. The beneficial effects are as follows: the data is acquired by the data acquisition module and then checked by the data quality inspection and verification module, which improves data processing efficiency; the data is further processed by the data conversion module and the data compression module, which improves the stability of the operation flow; and data received in real time is processed and then pushed directly to the client by the data pushing module, so that the data is updated in real time, the client's real-time requirement on the data is met, and user experience and friendliness are improved.
Description
Technical Field
The invention belongs to the field of unstructured data processing, and particularly relates to an unstructured data access method for a data center.
Background
Unstructured data is data whose structure is irregular or incomplete, which has no predefined data model, and which is inconvenient to represent in the two-dimensional logical tables of a database. It includes office documents in all formats, text, pictures, XML, HTML, various reports, images, audio/video information and the like. Data in a computer information system is divided into structured data and unstructured data. Unstructured data is highly diverse in format and standard, and is technically more difficult to standardize and understand than structured information. Its storage, retrieval, distribution and utilization therefore require more intelligent IT technologies, such as mass storage, intelligent retrieval, knowledge mining, content protection, and value-added development and utilization of information. Unstructured data can be obtained almost anywhere: from a company's mail, chat records and collected survey results, from comments on personal websites, from comments in a customer relationship management system, or from text fields in the applications that individuals use.
Disclosure of Invention
The present invention aims to provide a data center unstructured data access method that solves the problems mentioned in the Background.
To solve the above problems, the present invention provides the following technical solution:
A data center unstructured data access method comprises the following steps:
S1, sending an acquisition instruction to the data acquisition module through the control module, and analyzing and processing the data acquired by the data acquisition module through the data analysis module;
S2, verifying the data through the data quality inspection and verification module; when the data passes verification, converting the data type through the data conversion module, and when the data fails verification, cleaning the unqualified data through the data cleaning module;
S3, compressing the data processed by the data conversion module through the data compression module, then centrally storing the data through the data storage module, and backing up the data through the data backup module;
S4, performing security protection and management on the data through the data security management module, and then pushing the data directly to the client through the data pushing module, so that the data is updated in real time.
Preferably, the data acquisition module determines, from the multi-source file servers, the target file server corresponding to the acquisition task, obtains the access path information of the target file server from the changed structured data, and accesses the target file server based on that access path information to collect the unstructured data stored there. The collected unstructured data is loaded into a visual analysis system, which provides a unified view and organizes the data into a final business theme presented on a display screen.
Preferably, the data conversion module uses semi-structured data as a transition: unstructured data is converted step by step into semi-structured data and then into structured data, which completes the format conversion of the process data. In the conversion from unstructured data to semi-structured data, WORD documents are the data source; different data extraction strategies are constructed for different card types and formats, and on output a matching XML template is selected so that the data is written in a specified XML document format. In the conversion from semi-structured data to structured data, the XML document is parsed and a mapping relation is established between the XML document and the data information, which completes the conversion of the unstructured data into structured data.
Preferably, when the data compression module compresses data, the first sampled frame is used as the base sample. The compression comparison module compares the second frame with the first frame to obtain the variables that changed between the two frames and the change value of each such variable. Likewise, when the Nth sampled frame and the (N-1)th sampled frame are processed, the two frames are compared to obtain the changed variables; each changed variable is mapped into a fixed memory space by a hash algorithm, and the change difference corresponding to the variable is stored in that memory space.
Preferably, the data storage module identifies the unstructured data and generates a primary label, and stores the unstructured data in blocks based on the primary label. It generates secondary labels based on mined features, searches each block storage area based on the secondary labels to generate a mapping, and stores the mapping relation in a second storage area.
Preferably, the data backup module calculates the performance weight of each production server from that server's performance indicators. When one of the production servers receives a backup task for backing up data in shared storage to a back-end server, a backup process is created for the task and a corresponding backup strategy is generated. The backup strategy is decomposed into a plurality of sub-processes, which are distributed among the production servers according to their performance weights. Each production server executes its sub-processes according to their respective backup strategies, and each backs up its share of the data in shared storage to the back-end server.
Preferably, the data security management module encapsulates the communication protocol between the user terminal and the file transmission module of the message service module, thereby improving the security of information storage.
Preferably, the data pushing module processes the data received in real time and then pushes it directly to the client, so that the data is updated in real time, the client's real-time requirement on the data is met, and user experience and friendliness are improved.
Preferably, the data analysis module, the data acquisition module, the data quality inspection and verification module, the data conversion module, the data compression module, the data storage module, the data backup module, the data security management module, the data pushing module and the data cleaning module are all electrically connected with the control module.
The beneficial effects of the invention are as follows: the data is acquired by the data acquisition module and then checked by the data quality inspection and verification module, which improves data processing efficiency; the data is further processed by the data conversion module and the data compression module, which improves the stability of the operation flow; and data received in real time is processed and then pushed directly to the client by the data pushing module, so that the data is updated in real time, the client's real-time requirement on the data is met, and user experience and friendliness are improved.
Drawings
For ease of illustration, the invention is described in detail through the following detailed description and the accompanying drawings.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a block diagram of the present invention.
Detailed Description
As shown in FIGS. 1-2, this embodiment of the present invention adopts the following technical solution:
Example:
A data center unstructured data access method comprises the following steps:
S1, sending an acquisition instruction to the data acquisition module through the control module, and analyzing and processing the data acquired by the data acquisition module through the data analysis module;
S2, verifying the data through the data quality inspection and verification module; when the data passes verification, converting the data type through the data conversion module, and when the data fails verification, cleaning the unqualified data through the data cleaning module;
S3, compressing the data processed by the data conversion module through the data compression module, then centrally storing the data through the data storage module, and backing up the data through the data backup module;
S4, performing security protection and management on the data through the data security management module, and then pushing the data directly to the client through the data pushing module, so that the data is updated in real time.
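The routing in steps S1 and S2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Record` type, the `analyze` tagging step and the "payload is not blank" quality rule are all assumptions introduced here only to show how qualified data goes on to conversion while unqualified data goes to cleaning.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Record:
    """One acquired item; the field names are illustrative assumptions."""
    source: str
    payload: str
    tags: List[str] = field(default_factory=list)


def analyze(record: Record) -> Record:
    # S1: stand-in for the data analysis module; here it only tags the record.
    record.tags.append("analyzed")
    return record


def passes_quality_check(record: Record) -> bool:
    # S2: hypothetical rule; a record is qualified if its payload is not blank.
    return bool(record.payload.strip())


def run_pipeline(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Route each record to conversion (qualified) or cleaning (unqualified)."""
    to_convert: List[Record] = []
    to_clean: List[Record] = []
    for record in records:
        record = analyze(record)
        if passes_quality_check(record):
            to_convert.append(record)
        else:
            to_clean.append(record)
    return to_convert, to_clean
```

In this sketch the first returned list would feed the conversion, compression, storage and backup stages of S3, and the second would feed the data cleaning module.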
The data acquisition module determines, from the multi-source file servers, the target file server corresponding to the acquisition task, obtains the access path information of the target file server from the changed structured data, and accesses the target file server based on that access path information to collect the unstructured data stored there. The collected data is loaded into a visual analysis system, which provides a unified view and organizes the data into a final business theme presented on a display screen.
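As a rough illustration of the lookup described above, the following Python sketch resolves an access path for an acquisition task from a small catalog of structured rows. The row layout (`task_id`, `server`, `share`, `folder`) and the function name are hypothetical; the patent does not specify how the access path information is stored.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical rows of the "changed structured data": one per acquisition task.
CatalogRow = Dict[str, str]


def resolve_access_path(task_id: str, catalog: List[CatalogRow]) -> PurePosixPath:
    """Find the target file server for a task and build its access path."""
    for row in catalog:
        if row["task_id"] == task_id:
            # Assumed path layout: /<server>/<share>/<folder>
            return PurePosixPath("/", row["server"], row["share"], row["folder"])
    raise KeyError(f"no file server registered for task {task_id!r}")
```

A caller would then open the resolved path on the target file server and read the unstructured files stored under it.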
The data conversion module uses semi-structured data as a transition: unstructured data is converted step by step into semi-structured data and then into structured data, which completes the format conversion of the process data. In the conversion from unstructured data to semi-structured data, WORD documents are the data source; different data extraction strategies are constructed for different card types and formats, and on output a matching XML template is selected so that the data is written in a specified XML document format. In the conversion from semi-structured data to structured data, the XML document is parsed and a mapping relation is established between the XML document and the data information, which completes the conversion of the unstructured data into structured data.
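The two-stage conversion can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration under stated assumptions: the text is assumed to have already been extracted from the WORD document, the extraction strategy is a hypothetical "Label: value" line pattern for one card type, and the `<card>`/`<field>` XML layout and the column mapping are invented for the example.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical extraction strategy for one card type: "Label: value" lines.
FIELD_PATTERN = re.compile(r"^\s*(?P<label>[^:]+):\s*(?P<value>.+?)\s*$")


def text_to_xml(card_type: str, text: str) -> str:
    """Stage 1: unstructured text -> semi-structured XML (assumed template)."""
    root = ET.Element("card", attrib={"type": card_type})
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:  # lines that do not fit the strategy are skipped
            item = ET.SubElement(root, "field", attrib={"name": match["label"].strip()})
            item.text = match["value"]
    return ET.tostring(root, encoding="unicode")


def xml_to_row(xml_text: str, mapping: Dict[str, str]) -> Dict[str, str]:
    """Stage 2: XML -> structured row, using a field-name-to-column mapping."""
    root = ET.fromstring(xml_text)
    row = {"card_type": root.get("type", "")}
    for item in root.iter("field"):
        column = mapping.get(item.get("name", ""))
        if column:
            row[column] = item.text or ""
    return row
```

For example, `xml_to_row(text_to_xml("process", "Part No: A-17"), {"Part No": "part_no"})` yields a row with a `part_no` column, ready for insertion into a table.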
When the data compression module compresses data, the first sampled frame is used as the base sample. The compression comparison module compares the second frame with the first frame to obtain the variables that changed between the two frames and the change value of each such variable. Likewise, when the Nth sampled frame and the (N-1)th sampled frame are processed, the two frames are compared to obtain the changed variables; each changed variable is mapped into a fixed memory space by a hash algorithm, and the change difference corresponding to the variable is stored in that memory space.
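The frame-difference scheme can be sketched as follows. The patent does not name the hash algorithm or the size of the fixed memory space, so this Python example assumes `zlib.crc32` as a stand-in hash and 64 slots; it also assumes that frames are dictionaries of numeric variables and that variables are never removed between frames.

```python
import zlib
from typing import Dict, List, Tuple

Frame = Dict[str, float]
Slots = Dict[int, List[Tuple[str, float]]]

SLOT_COUNT = 64  # assumed size of the fixed memory space


def slot_for(variable: str) -> int:
    # crc32 is a stand-in hash, chosen because it is stable across runs.
    return zlib.crc32(variable.encode("utf-8")) % SLOT_COUNT


def diff_frames(previous: Frame, current: Frame) -> Slots:
    """Keep only the variables that changed, stored under their hashed slot."""
    slots: Slots = {}
    for name, value in current.items():
        if name not in previous or value != previous[name]:
            change = value - previous.get(name, 0.0)
            slots.setdefault(slot_for(name), []).append((name, change))
    return slots


def apply_diff(previous: Frame, slots: Slots) -> Frame:
    """Rebuild the current frame from the previous frame plus stored changes."""
    restored = dict(previous)
    for entries in slots.values():
        for name, change in entries:
            restored[name] = restored.get(name, 0.0) + change
    return restored
```

Only the first frame is kept in full; every later frame is represented by its changed variables, which is where the saving comes from when most variables stay constant. Note that adding floating-point differences back can introduce rounding error, so a production design would need to account for that.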
The data storage module identifies the unstructured data and generates a primary label, and stores the unstructured data in blocks based on the primary label. It generates secondary labels based on mined features, searches each block storage area based on the secondary labels to generate a mapping, and stores the mapping relation in a second storage area.
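One way to picture the two-level labelling is the Python sketch below. The labelling rules are assumptions made for illustration: the primary label is taken to be the file extension, and the "mined features" are reduced to matching a fixed keyword set. The `LabeledStore` class and its method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def primary_label(filename: str) -> str:
    # Assumed rule: the file extension serves as the primary label.
    return filename.rsplit(".", 1)[-1].lower() if "." in filename else "unknown"


def secondary_labels(content: str, keywords: Set[str]) -> Set[str]:
    # Assumed "mined features": known keywords that occur in the content.
    return set(content.lower().split()) & keywords


class LabeledStore:
    def __init__(self, keywords: Iterable[str]) -> None:
        self.keywords = {k.lower() for k in keywords}
        # First storage area: blocks of (filename, content) per primary label.
        self.blocks: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        # Second storage area: secondary label -> (block, position) mapping.
        self.index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)

    def put(self, filename: str, content: str) -> None:
        label = primary_label(filename)
        block = self.blocks[label]
        block.append((filename, content))
        for tag in sorted(secondary_labels(content, self.keywords)):
            self.index[tag].append((label, len(block) - 1))

    def find(self, tag: str) -> List[str]:
        hits = self.index.get(tag.lower(), [])
        return [self.blocks[label][position][0] for label, position in hits]
```

Keeping the mapping separate from the blocks means a lookup by secondary label touches only the index and the matching entries, not every block.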
The data backup module calculates the performance weight of each production server from that server's performance indicators. When one of the production servers receives a backup task for backing up data in shared storage to a back-end server, a backup process is created for the task and a corresponding backup strategy is generated. The backup strategy is divided into a plurality of sub-processes, which are distributed among the production servers according to their performance weights. Each production server executes its sub-processes according to their respective backup strategies, and each backs up its share of the data in shared storage to the back-end server.
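The weighted distribution of sub-processes might look like the following Python sketch. The patent does not say how performance indicators become weights or how fractional shares are rounded; this example assumes a single numeric score per server, normalizes the scores into weights, and uses the largest-remainder method for rounding. Server names and sub-task names are hypothetical.

```python
from typing import Dict, List


def performance_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-server performance scores so the weights sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("performance scores must sum to a positive value")
    return {server: score / total for server, score in scores.items()}


def distribute_subtasks(subtasks: List[str], weights: Dict[str, float]) -> Dict[str, List[str]]:
    """Split sub-tasks across servers in proportion to their weights."""
    servers = sorted(weights)
    exact = {s: weights[s] * len(subtasks) for s in servers}
    quota = {s: int(exact[s]) for s in servers}
    # Largest-remainder rounding: hand leftover tasks to the biggest fractions.
    leftover = len(subtasks) - sum(quota.values())
    by_remainder = sorted(servers, key=lambda s: exact[s] - quota[s], reverse=True)
    for server in by_remainder[:leftover]:
        quota[server] += 1
    plan: Dict[str, List[str]] = {}
    start = 0
    for server in servers:
        plan[server] = subtasks[start:start + quota[server]]
        start += quota[server]
    return plan
```

With scores of 3 and 1 for two servers and five sub-tasks, the stronger server receives four sub-tasks and the weaker one receives one.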
The data security management module encapsulates the communication protocol between the user terminal and the file transmission module of the message service module, which improves the security of information storage.
The data pushing module processes the data received in real time and then pushes it directly to the client, so that the data is updated in real time, the client's real-time requirement on the data is met, and user experience and friendliness are improved.
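The push behaviour can be illustrated with a small in-process publish/subscribe sketch in Python. It is an assumption-laden stand-in: `PushHub` is a hypothetical name, the "processing" is reduced to trimming whitespace, and a real deployment would push over a network channel rather than call a local callback.

```python
from typing import Callable, Dict


class PushHub:
    """Hypothetical in-process stand-in for the data pushing module."""

    def __init__(self) -> None:
        self._clients: Dict[str, Callable[[str], None]] = {}

    def subscribe(self, client_id: str, on_data: Callable[[str], None]) -> None:
        # Register the callback that receives pushed data for this client.
        self._clients[client_id] = on_data

    def publish(self, raw: str) -> int:
        """Process incoming data, push it to every client, return the count."""
        processed = raw.strip()  # assumed processing step
        for on_data in self._clients.values():
            on_data(processed)
        return len(self._clients)
```

Because clients are notified as soon as data arrives, they do not need to poll the data center for updates.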
The data analysis module, the data acquisition module, the data quality inspection and verification module, the data conversion module, the data compression module, the data storage module, the data backup module, the data security management module, the data pushing module and the data cleaning module are all electrically connected with the control module.
While there have been shown and described what are at present considered to be the fundamental principles of the invention and its essential features and advantages, it will be understood by those skilled in the art that the invention is not limited by the embodiments described above, which are merely illustrative of the principles of the invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.
Claims (9)
1. A data center unstructured data access method is characterized by comprising the following steps:
S1, sending an acquisition instruction to the data acquisition module through the control module, and analyzing and processing the data acquired by the data acquisition module through the data analysis module;
S2, verifying the data through the data quality inspection and verification module; when the data passes verification, converting the data type through the data conversion module, and when the data fails verification, cleaning the unqualified data through the data cleaning module;
S3, compressing the data processed by the data conversion module through the data compression module, then centrally storing the data through the data storage module, and backing up the data through the data backup module;
S4, performing security protection and management on the data through the data security management module, and then pushing the data directly to the client through the data pushing module, so that the data is updated in real time.
2. The method as claimed in claim 1, wherein the data collection module determines a target file server corresponding to the collection task from a source file server, obtains access path information of the target file server from the changed structured data, accesses the target file server based on the access path information to collect unstructured data stored in the target file server, collects the unstructured data into a visual analysis system, provides a uniform view, and organizes the data into a final service theme to be presented on a display screen.
3. The data center unstructured data access method according to claim 1, characterized in that the data conversion module takes semi-structured data as transition, and adopts a mode of gradually converting unstructured data to semi-structured data and then to structured data to finally complete format conversion of process data, in the process of converting unstructured data to semi-structured data, a WORD document is used as a data source, different data extraction strategies are constructed for different card types and formats, and when data is output, different XML templates are selected, and the data is output in a specified XML document format; in the process of converting the semi-structured data into the structured data, the unstructured data is converted into the structured data by analyzing the XML document and establishing a mapping relation between the XML document and the data information.
4. The method as claimed in claim 1, wherein when the data compression module performs data compression processing, the sampled first frame data is used as a basic sample, the second frame data is compared with the first frame data by the compression comparison module to obtain the variables that change between the two frames of data and the change values of the corresponding variables, and similarly, when the Nth frame of sampled data and the (N-1)th frame of sampled data are processed, the two frames of data are compared to obtain the changed variables, the changed variables are mapped into a fixed memory space by a Hash algorithm, and the change difference value corresponding to each variable is stored in the corresponding memory space.
5. The data center unstructured data access method of claim 1, wherein the data storage module identifies unstructured data and generates a primary label, blocks the unstructured data based on the primary label and stores the unstructured data, generates a secondary label based on mining characteristics, retrieves and generates a mapping based on the secondary label in each block of storage area, and stores the mapping relationship to a second storage area.
6. The method according to claim 1, wherein the data backup module calculates and determines the performance weight corresponding to each production server according to the performance index of each production server, when one of the production servers receives a backup task for backing up data in the shared storage to a backend server, creates a backup process for the backup task to generate a corresponding backup policy, decomposes the backup policy into a plurality of subprocesses for execution, and allocates all the subprocesses to each production server according to the performance weight of each production server, and all the subprocesses in each production server are executed according to their own backup policies and respectively back up the data in the shared storage to the backend server.
7. The method according to claim 1, wherein the data security management module encapsulates a communication protocol between the user terminal and the file transfer module of the message service module, so as to improve the storage security of the information.
8. The method for accessing the unstructured data of the data center according to claim 1, wherein the data pushing module directly pushes the data received in real time to the client after processing the data, so as to realize real-time update of the data, meet the real-time requirement of the client on the data, and improve user experience and friendliness.
9. The method according to claim 1, wherein the data analysis module, the data acquisition module, the data quality inspection and verification module, the data conversion module, the data compression module, the data storage module, the data backup module, the data security management module, the data push module and the data cleaning module are electrically connected to the control module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210070479.0A CN114610833A (en) | 2022-01-21 | 2022-01-21 | Data center unstructured data access method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210070479.0A CN114610833A (en) | 2022-01-21 | 2022-01-21 | Data center unstructured data access method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114610833A (en) | 2022-06-10 |
Family
ID=81858240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210070479.0A Pending CN114610833A (en) | 2022-01-21 | 2022-01-21 | Data center unstructured data access method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114610833A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||