CN109669965A - Acquisition and analysis system and method supporting unstructured data - Google Patents
- Publication number
- CN109669965A (application CN201811345099.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- database
- unstructured
- table structure
- structured
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an acquisition and analysis system supporting unstructured data, comprising: a data acquisition module, configured to obtain data uploaded from a Web page as first data, verify the first data to obtain second data, and store the second data in an unstructured database; a data processing module, configured to extract the second data from the unstructured database, perform data cleaning on the second data to obtain third data, and store the third data in a structured database; and a data analysis module, configured to analyze and process the data in the unstructured database or the data in the structured database. The invention stores raw data in the unstructured database and then gradually processes the raw data into structured data, which is stored in the structured database, so that the system helps reduce the load on the Web application and improves the efficiency of unstructured data processing. The invention is widely applicable to data processing technology.
Description
Technical field
The present invention relates to data processing technology, and in particular to an acquisition and analysis system and method supporting unstructured data.
Background art
At present, driven by the rapid development of big data technology, more and more Web applications are in use, and the data volume of Web applications is growing rapidly. As the throughput of Web applications increases, traditional data storage methods can no longer meet current demands, which has prompted the emergence of large-volume data storage technologies. However, most current Web analysis systems use structured databases. These systems cannot conveniently store and analyze unstructured data, so the Web application is required to preprocess the collected data and output it in a format matching the database structure, which increases the load on the Web application. The prior art therefore needs to be improved.
Summary of the invention
To solve the above technical problem, the object of the present invention is to provide an acquisition and analysis system and method supporting unstructured data.
The first technical solution adopted by the present invention is:
An acquisition and analysis system supporting unstructured data, comprising:
a data acquisition module, configured to obtain data uploaded from a Web page as first data, verify the first data to obtain second data, and store the second data in an unstructured database;
a data processing module, configured to extract the second data from the unstructured database, perform data cleaning on the second data to obtain third data, and store the third data in a structured database;
a data analysis module, configured to analyze and process the data in the unstructured database or the data in the structured database.
Further, the unstructured database is a MongoDB database, and the structured database is a MySQL database.
Further, the data acquisition module includes a data entry unit, and the data entry unit is configured to collect data entered by a user as the first data.
Further, the unstructured database uses the document as its unit of storage.
Further, performing data cleaning on the second data to obtain the third data specifically includes:
obtaining a first table structure of the structured database;
obtaining a second table structure of the second data;
deleting, according to the first table structure, the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
Further, the data processing module extracts the second data from the unstructured database at a set period.
The second technical solution adopted by the present invention is:
An acquisition and analysis method supporting unstructured data, comprising the following steps:
obtaining data uploaded from a Web page as first data;
verifying the first data to obtain second data;
storing the second data in an unstructured database;
extracting the second data from the unstructured database;
performing data cleaning on the second data to obtain third data;
storing the third data in a structured database.
Further, the unstructured database is a MongoDB database, and the structured database is a MySQL database.
Further, performing data cleaning on the second data to obtain the third data specifically includes:
obtaining a first table structure of the structured database;
obtaining a second table structure of the second data;
deleting, according to the first table structure, the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
Further, extracting the second data from the unstructured database is specifically:
extracting the second data from the unstructured database at a set period.
The beneficial effects of the present invention are as follows: the invention stores raw data in an unstructured database, then gradually processes the raw data into structured data and stores it in a structured database, so that the system helps reduce the load on the Web application and improves the efficiency of unstructured data processing.
Brief description of the drawings
Fig. 1 is a module block diagram of an acquisition and analysis system supporting unstructured data according to a specific embodiment of the present invention;
Fig. 2 is a flowchart of an acquisition and analysis method supporting unstructured data according to a specific embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, this embodiment discloses an acquisition and analysis system supporting unstructured data, the system comprising:
A data acquisition module, configured to obtain data uploaded from a Web page as first data, verify the first data to obtain second data, and store the second data in an unstructured database. In this module, because the data uploaded from the Web page may contain problems such as type errors or missing data, the first data must be verified before being stored in the unstructured database. Whenever data is uploaded, the system can obtain related information such as the upload time and the data type, and the system needs to normalize this data before storing it. Of course, if the system finds problems in the data uploaded from the Web page, the data can also be corrected manually. The unstructured database stores data in units of documents. In most cases the data in the unstructured database cannot be used directly, so it needs to be structured gradually.
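The verification step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `verify_upload`, the `required_fields` parameter, and the metadata keys `_uploaded_at` and `_field_types` are all hypothetical, and a plain dictionary stands in for an uploaded record.

```python
from datetime import datetime, timezone


def verify_upload(first_data, required_fields=("name",)):
    """Verify an uploaded record (first data) and return the second data.

    Raises ValueError when the record is malformed or a required field is
    missing; otherwise returns a copy carrying upload metadata.
    """
    if not isinstance(first_data, dict):
        raise ValueError("uploaded data must be a mapping of field -> value")
    # Treat None and empty strings as missing data.
    missing = [f for f in required_fields if first_data.get(f) in (None, "")]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    second_data = dict(first_data)
    # Record the upload time and the observed type of every uploaded field.
    second_data["_uploaded_at"] = datetime.now(timezone.utc).isoformat()
    second_data["_field_types"] = {
        key: type(value).__name__ for key, value in first_data.items()
    }
    return second_data
```

A record that fails verification raises an error here; in the system described, such a record could instead be queued for manual correction.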
A data processing module, configured to extract the second data from the unstructured database, perform data cleaning on the second data to obtain third data, and store the third data in a structured database. This module is mainly used to gradually extract data from the unstructured database, structure the extracted data, and store the structured data in the structured database. Extraction in this embodiment may be regular extraction, for example processing 10 documents at a time in the order in which the documents were stored.
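Processing 10 documents at a time in storage order can be sketched with a small batching generator. The name `iter_batches` is hypothetical, and the `documents` iterable stands in for a cursor over the unstructured database that already yields documents in the order they were stored.

```python
from itertools import islice


def iter_batches(documents, batch_size=10):
    """Yield lists of up to batch_size documents, preserving input order."""
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Because the generator consumes its input lazily, only one batch needs to be held in memory at a time.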
A data analysis module, configured to analyze and process the data in the unstructured database or the data in the structured database. The analysis and processing includes reading, statistics, mining, classification, and so on. In this module, if the system needs formatted data for statistics, it can extract the data directly from the structured database. If the system needs to perform data mining on the raw data, it can also retrieve the data directly from the unstructured database.
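The two analysis paths can be illustrated with a small sketch. Both function names are hypothetical, and lists of dictionaries stand in for rows read from the structured database and documents read from the unstructured database; real mining or classification would be considerably more involved.

```python
from collections import Counter


def count_by_field(structured_rows, field):
    """Statistics on formatted data: count rows per value of one column."""
    return Counter(row[field] for row in structured_rows)


def field_frequency(raw_documents):
    """Exploration of raw data: how often each field name occurs."""
    frequency = Counter()
    for document in raw_documents:
        frequency.update(document.keys())
    return frequency
```

The first function relies on every row having the same columns, which only the structured side guarantees; the second makes no such assumption and so suits raw documents.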
As a preferred embodiment, the unstructured database is a MongoDB database, and the structured database is a MySQL database. Because the first data are often Excel files that do not follow a standard format, a structured database cannot handle such files effectively. This embodiment uses a MongoDB database as the unstructured database, which can store such unstructured data files more efficiently. Given the weak data schema of MongoDB, adding a new field has no effect on existing tables. For unstructured data, MongoDB processes data very quickly, so using a MongoDB database allows flexible horizontal scaling of the data storage layer.
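The effect of a weak data schema can be shown without a database at all. In the sketch below a Python list of dictionaries stands in for a document collection; it is not a MongoDB client, and `insert_one` here is a hypothetical local helper, not the driver call of the same name.

```python
collection = []  # stand-in for a document collection, not a real MongoDB client


def insert_one(document):
    """Store a copy of the document as-is, whatever fields it carries."""
    collection.append(dict(document))


insert_one({"name": "sensor-1", "value": 3.5})
# A later upload introduces a new field; earlier documents are left untouched.
insert_one({"name": "sensor-2", "value": 4.1, "unit": "mm"})

# Readers supply a default for fields that older documents do not have.
units = [document.get("unit") for document in collection]
```

No migration step is needed when the `unit` field first appears, which is the property the embodiment relies on.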
As a preferred embodiment, the data acquisition module includes a data entry unit, and the data entry unit is configured to collect data entered by a user as the first data. Of course, in this embodiment the system can also accept manually imported data as an additional data source.
As a preferred embodiment, performing data cleaning on the second data to obtain the third data specifically includes:
obtaining a first table structure of the structured database;
obtaining a second table structure of the second data;
deleting, according to the first table structure, the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
This embodiment removes the invalid fields from the unstructured data so that the unstructured data conforms to the data format of the structured database.
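The cleaning rule above amounts to filtering a document's fields against the column list of the target table. A minimal sketch, assuming the first table structure is available as a list of column names and the second data is a dictionary; the function name `clean_document` is hypothetical.

```python
def clean_document(second_data, first_table_structure):
    """Drop fields absent from the structured table and return the third data.

    Also returns the list of invalid fields that were removed.
    """
    valid_fields = set(first_table_structure)
    # The second table structure is simply the document's own field list.
    second_table_structure = list(second_data.keys())
    invalid_fields = [f for f in second_table_structure if f not in valid_fields]
    third_data = {k: v for k, v in second_data.items() if k in valid_fields}
    return third_data, invalid_fields
```

For example, cleaning `{"id": 1, "name": "a", "note": "x"}` against the columns `["id", "name", "created_at"]` keeps `id` and `name` and reports `note` as invalid. Columns that the document lacks, such as `created_at`, are not filled in by this sketch.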
As a preferred embodiment, the data processing module extracts the second data from the unstructured database at a set period. In this embodiment, ETL is used to periodically synchronize the data in the unstructured database into the structured database, and this synchronization includes the cleaning process. Regular processing ensures that the unstructured data in the unstructured database is processed into structured data in a timely manner.
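Periodic synchronization can be sketched as a loop that runs a job and then waits for the set period. The names `run_periodically` and `max_runs`, and the injectable `sleep` parameter, are assumptions made so the sketch terminates and can be exercised quickly; a deployed system would more likely rely on an external scheduler such as cron.

```python
import time


def run_periodically(job, period_seconds, max_runs, sleep=time.sleep):
    """Call job() max_runs times, waiting period_seconds between calls."""
    results = []
    for run in range(max_runs):
        results.append(job())
        # No need to wait after the final run.
        if run < max_runs - 1:
            sleep(period_seconds)
    return results
```

Here `job` would be one extract, clean, and store pass over the next batch of documents.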
Referring to Fig. 2, an acquisition and analysis method supporting unstructured data comprises the following steps:
S1. Obtain data uploaded from a Web page as first data.
S2. Verify the first data to obtain second data.
S3. Store the second data in an unstructured database.
The main function of steps S1 to S3 is to verify the raw data and then store the verified data in the unstructured database. Because the data uploaded from the Web page may contain problems such as type errors or missing data, the first data must be verified before being stored in the unstructured database. Whenever data is uploaded, the system can obtain related information such as the upload time and the data type, and the system needs to normalize this data before storing it. Of course, if the system finds problems in the data uploaded from the Web page, the data can also be corrected manually. The unstructured database generally stores data in units of documents. In most cases the data in the unstructured database cannot be used directly, so it needs to be structured gradually.
S4. Extract the second data from the unstructured database.
S5. Perform data cleaning on the second data to obtain third data.
S6. Store the third data in a structured database.
The main function of steps S4 to S6 is to gradually extract data from the unstructured database, structure the extracted data, and store the structured data in the structured database. Extraction in this embodiment may be regular extraction, for example processing 10 documents at a time in the order in which the documents were stored.
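Steps S4 to S6 can be sketched for a single batch as follows. This is an illustration under stated assumptions: Python's built-in `sqlite3` stands in for the structured database (the embodiment names MySQL), a list of dictionaries stands in for documents already extracted from the unstructured database, and the names `sync_batch` and `readings` are hypothetical. Storing NULL for columns a document lacks is a choice made for this sketch, not something the text specifies.

```python
import sqlite3


def sync_batch(raw_documents, conn, table, columns):
    """Run S4-S6 for one batch and return the number of rows stored.

    table and columns are interpolated into the SQL text, so they must be
    trusted identifiers; only the values are passed as bound parameters.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = []
    for document in raw_documents:  # S4: second data extracted as a batch
        # S5: keep only the table's columns; absent ones become NULL.
        rows.append(tuple(document.get(column) for column in columns))
    conn.executemany(sql, rows)  # S6: store the third data
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (name TEXT, value REAL)")
stored = sync_batch(
    [{"name": "s1", "value": 1.5, "note": "raw"}, {"name": "s2"}],
    conn,
    "readings",
    ["name", "value"],
)
```

The `note` field of the first document is dropped because the table has no such column.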
As a preferred embodiment, the unstructured database is a MongoDB database, and the structured database is a MySQL database.
As a preferred embodiment, step S5 specifically includes:
S51. Obtain a first table structure of the structured database;
S52. Obtain a second table structure of the second data;
S53. According to the first table structure, delete the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
This embodiment removes the invalid fields from the unstructured data so that the unstructured data conforms to the data format of the structured database.
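How the two table structures in S51 and S52 might be obtained is sketched below. SQLite's `PRAGMA table_info` is used as a stand-in because it runs without a server; with MySQL, the column list could instead be read from `information_schema.COLUMNS`. The function names and the `readings` table are hypothetical.

```python
import sqlite3


def first_table_structure(conn, table):
    """S51: column names of the structured table (table must be trusted)."""
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def second_table_structure(second_data_documents):
    """S52: field names seen across the documents, in first-seen order."""
    fields = []
    for document in second_data_documents:
        for key in document:
            if key not in fields:
                fields.append(key)
    return fields


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (name TEXT, value REAL)")
first = first_table_structure(conn, "readings")
second = second_table_structure([{"name": "s1", "note": "x"}, {"value": 2.0}])
# S53 would delete the data of these fields from the second data.
invalid = [field for field in second if field not in first]
```

Comparing the two lists yields exactly the invalid fields defined in S53.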
As a preferred embodiment, extracting the second data from the unstructured database is specifically:
extracting the second data from the unstructured database at a set period.
The step numbers in the above method embodiment are provided only for ease of explanation and do not limit the order of the steps in any way. The execution order of the steps in the embodiment may be adjusted as understood by those skilled in the art.
The above describes preferred implementations of the present invention, but the invention is not limited to the embodiments described. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the invention, and all such equivalent variations or substitutions fall within the scope defined by the claims of this application.
Claims (10)
1. An acquisition and analysis system supporting unstructured data, characterized by comprising:
a data acquisition module, configured to obtain data uploaded from a Web page as first data, verify the first data to obtain second data, and store the second data in an unstructured database;
a data processing module, configured to extract the second data from the unstructured database, perform data cleaning on the second data to obtain third data, and store the third data in a structured database;
a data analysis module, configured to analyze and process the data in the unstructured database or the data in the structured database.
2. The acquisition and analysis system supporting unstructured data according to claim 1, characterized in that: the unstructured database is a MongoDB database, and the structured database is a MySQL database.
3. The acquisition and analysis system supporting unstructured data according to claim 1, characterized in that: the data acquisition module includes a data entry unit, and the data entry unit is configured to collect data entered by a user as the first data.
4. The acquisition and analysis system supporting unstructured data according to claim 1, characterized in that: the unstructured database uses the document as its unit of storage.
5. The acquisition and analysis system supporting unstructured data according to claim 1, characterized in that: performing data cleaning on the second data to obtain the third data specifically includes:
obtaining a first table structure of the structured database;
obtaining a second table structure of the second data;
deleting, according to the first table structure, the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
6. The acquisition and analysis system supporting unstructured data according to claim 1, characterized in that: the data processing module extracts the second data from the unstructured database at a set period.
7. An acquisition and analysis method supporting unstructured data, characterized by comprising the following steps:
obtaining data uploaded from a Web page as first data;
verifying the first data to obtain second data;
storing the second data in an unstructured database;
extracting the second data from the unstructured database;
performing data cleaning on the second data to obtain third data;
storing the third data in a structured database.
8. The acquisition and analysis method supporting unstructured data according to claim 7, characterized in that: the unstructured database is a MongoDB database, and the structured database is a MySQL database.
9. The acquisition and analysis method supporting unstructured data according to claim 7, characterized in that: performing data cleaning on the second data to obtain the third data specifically includes:
obtaining a first table structure of the structured database;
obtaining a second table structure of the second data;
deleting, according to the first table structure, the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
10. The acquisition and analysis method supporting unstructured data according to claim 7, characterized in that: extracting the second data from the unstructured database is specifically:
extracting the second data from the unstructured database at a set period.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811345099.3A CN109669965A (en) | 2018-11-13 | 2018-11-13 | Acquisition and analysis system and method supporting unstructured data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811345099.3A CN109669965A (en) | 2018-11-13 | 2018-11-13 | Acquisition and analysis system and method supporting unstructured data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109669965A (en) | 2019-04-23 |
Family
ID=66141727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811345099.3A Pending CN109669965A (en) | Acquisition and analysis system and method supporting unstructured data | 2018-11-13 | 2018-11-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109669965A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111737268A (en) * | 2020-08-17 | 2020-10-02 | 南京百敖软件有限公司 | Data processing method based on document database |
WO2021102888A1 (en) * | 2019-11-29 | 2021-06-03 | 京东方科技集团股份有限公司 | Data processing device and method, and computer-readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103106270A (en) * | 2013-02-02 | 2013-05-15 | 深圳先进技术研究院 | Method and system of cloud data fusion |
CN104282140A (en) * | 2014-09-22 | 2015-01-14 | 同济大学 | Large-scale real-time traffic index service method and system based on distributed framework |
CN105843860A (en) * | 2016-03-17 | 2016-08-10 | 山东大学 | Microblog attention recommendation method based on parallel item-based collaborative filtering algorithm |
CN108763562A (en) * | 2018-06-04 | 2018-11-06 | 广东京信软件科技有限公司 | A kind of construction method based on big data skill upgrading data exchange efficiency |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190423 |