CN109669965A - Acquisition and analysis system and method supporting unstructured data - Google Patents

Info

Publication number
CN109669965A
Authority
CN
China
Prior art keywords
data
database
unstructured
table structure
structured
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811345099.3A
Other languages
Chinese (zh)
Inventor
颜文德
徐�明
叶祖锋
王华松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Ois Mdt Infotech Ltd
Original Assignee
Guangzhou Ois Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Ois Mdt Infotech Ltd
Priority to CN201811345099.3A
Publication of CN109669965A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an acquisition and analysis system supporting unstructured data, comprising: a data acquisition module, configured to obtain data uploaded from a Web page as first data, verify the first data to obtain second data, and store the second data in an unstructured database; a data processing module, configured to extract the second data from the unstructured database, perform data cleaning on the second data to obtain third data, and then store the third data in a structured database; and a data analysis module, configured to analyze the data in the unstructured database or the data in the structured database. The invention stores the original data in an unstructured database, then gradually processes the original data into structured data and stores it in a structured database, so that the system helps reduce the load on the Web application and improves the efficiency of unstructured data processing. The invention can be widely applied in data processing technology.

Description

Acquisition and analysis system and method supporting unstructured data
Technical field
The present invention relates to data processing technology, and in particular to an acquisition and analysis system and method supporting unstructured data.
Background
With the rapid development of big data technology, more and more Web applications are in use, and the volume of data they handle is growing rapidly. As Web application throughput increases, traditional data storage methods can no longer meet current demand, which has driven the emergence of storage technologies for large data volumes. However, most Web analysis systems today use structured databases. These systems cannot conveniently store and analyze unstructured data, so the Web application must preprocess the collected data and output it in a format that matches the database structure, which increases the load on the Web application. The prior art therefore needs improvement.
Summary of the invention
To solve the above technical problem, the object of the present invention is to provide an acquisition and analysis system and method supporting unstructured data.
The first technical solution adopted by the present invention is:
An acquisition and analysis system supporting unstructured data, comprising:
a data acquisition module, configured to obtain data uploaded from a Web page as first data, verify the first data to obtain second data, and store the second data in an unstructured database;
a data processing module, configured to extract the second data from the unstructured database, perform data cleaning on the second data to obtain third data, and then store the third data in a structured database;
a data analysis module, configured to analyze the data in the unstructured database or the data in the structured database.
Further, the unstructured database is a MongoDB database, and the structured database is a MySQL database.
Further, the data acquisition module includes a data entry unit, and the data entry unit is used to collect data entered by a user as the first data.
Further, the unstructured database uses the document as its unit of storage.
Further, performing data cleaning on the second data to obtain the third data specifically includes:
obtaining a first table structure of the structured database;
obtaining a second table structure of the second data;
according to the first table structure, deleting the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
Further, the data processing module extracts the second data from the unstructured database at a set period.
The second technical solution adopted by the present invention is:
An acquisition and analysis method supporting unstructured data, comprising the following steps:
obtaining data uploaded from a Web page as first data;
verifying the first data to obtain second data;
storing the second data in an unstructured database;
extracting the second data from the unstructured database;
performing data cleaning on the second data to obtain third data;
storing the third data in a structured database.
Further, the unstructured database is a MongoDB database, and the structured database is a MySQL database.
Further, performing data cleaning on the second data to obtain the third data specifically includes:
obtaining a first table structure of the structured database;
obtaining a second table structure of the second data;
according to the first table structure, deleting the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
Further, extracting the second data from the unstructured database is specifically:
extracting the second data from the unstructured database at a set period.
The beneficial effects of the present invention are as follows: the invention stores the original data in an unstructured database, then gradually processes the original data into structured data and stores it in a structured database, so that the system helps reduce the load on the Web application and improves the efficiency of unstructured data processing.
Brief description of the drawings
Fig. 1 is a module block diagram of an acquisition and analysis system supporting unstructured data according to a specific embodiment of the present invention;
Fig. 2 is a flowchart of an acquisition and analysis method supporting unstructured data according to a specific embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, this embodiment discloses an acquisition and analysis system supporting unstructured data, the system comprising:
a data acquisition module, configured to obtain data uploaded from a Web page as first data, verify the first data to obtain second data, and store the second data in an unstructured database. Because data uploaded from a Web page may contain problems such as type errors or missing data, the first data must be verified before it is stored in the unstructured database. Each time data is uploaded, the system can obtain related information such as the upload time and the data type, and the system normalizes the data before storing it. If the system finds a problem in the data uploaded from the Web page, the data can also be corrected manually. The unstructured database stores data with the document as the unit. In most cases the data held in the unstructured database cannot be used directly, so it needs to be structured step by step.
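The verification step can be illustrated with a minimal sketch. The patent does not specify the verification rules, so the rule set `EXPECTED_TYPES`, the function names, and the use of plain Python lists as stand-ins for the unstructured database and the manual-correction queue are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Hypothetical rule set (not from the patent): field name -> expected type.
EXPECTED_TYPES = {"name": str, "age": int}

def verify(first_data, expected_types=EXPECTED_TYPES):
    """Return a list of problems (type errors, missing data) in one upload."""
    problems = []
    for field, expected in expected_types.items():
        if field not in first_data:
            problems.append(f"missing field: {field}")
        elif not isinstance(first_data[field], expected):
            problems.append(f"wrong type for {field}")
    return problems

def to_second_data(first_data, uploaded_at=None):
    """Normalize a verified upload, recording upload time and data types."""
    uploaded_at = uploaded_at or datetime.now(timezone.utc)
    return {
        "payload": dict(first_data),
        "uploaded_at": uploaded_at.isoformat(),
        "field_types": {k: type(v).__name__ for k, v in first_data.items()},
    }

def acquire(first_data, store, review_queue):
    """Store valid uploads as documents; route invalid ones to manual review."""
    problems = verify(first_data)
    if problems:
        review_queue.append((first_data, problems))
        return False
    store.append(to_second_data(first_data))
    return True
```

Fields that are not covered by the rule set are kept as they are, which reflects the idea that the document store accepts records whose shape is not fixed in advance.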
a data processing module, configured to extract the second data from the unstructured database, perform data cleaning on the second data to obtain third data, and then store the third data in a structured database. This module gradually extracts data from the unstructured database, structures the extracted data, and then stores the structured data in the structured database. Extraction in this embodiment may follow a rule, for example processing 10 documents at a time in the order in which the documents were stored.
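The rule of processing a fixed number of documents at a time, in stored order, could be sketched as follows. The function name `iter_batches` is hypothetical, and any iterable of documents stands in for a cursor over the unstructured database.

```python
from itertools import islice

def iter_batches(documents, batch_size=10):
    """Yield lists of at most batch_size documents, preserving stored order."""
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With 25 stored documents, this yields two batches of 10 followed by one batch of 5.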
a data analysis module, configured to analyze the data in the unstructured database or the data in the structured database. The analysis includes reading, statistics, mining, classification, and the like. In this module, if the system needs formatted data for statistics, it can extract the data directly from the structured database. If the system needs to perform data mining on the original data, it can retrieve the data directly from the unstructured database.
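As a rough illustration of the two analysis paths, the sketch below computes a statistic over formatted rows and a simple exploratory count over raw documents. Both function names and the choice of analyses are assumptions; the patent does not prescribe specific algorithms.

```python
from collections import Counter

def count_by_field(structured_rows, field):
    """Statistics on formatted data: number of rows per value of one column."""
    return Counter(row[field] for row in structured_rows)

def field_frequency(raw_documents):
    """Exploration of raw data: how often each field name occurs in documents."""
    return Counter(key for doc in raw_documents for key in doc)
```

The second function only makes sense on the raw documents, because fields that are absent from the structured table are dropped during cleaning.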
As a preferred embodiment, the unstructured database is a MongoDB database, and the structured database is a MySQL database. Because the first data is often not a well-formed Excel file, a structured database cannot handle such files effectively. This embodiment uses a MongoDB database as the unstructured database, which can store such unstructured data files more efficiently. Because MongoDB has a weak data schema, adding a new field has no effect on existing tables. For unstructured data, MongoDB processes data very quickly, so using a MongoDB database allows the data storage layer to scale horizontally in a flexible way.
As a preferred embodiment, the data acquisition module includes a data entry unit, and the data entry unit is used to collect data entered by a user as the first data. In this embodiment, the system can also accept manually imported data as an additional data source.
As a preferred embodiment, performing data cleaning on the second data to obtain the third data specifically includes:
obtaining a first table structure of the structured database;
obtaining a second table structure of the second data;
according to the first table structure, deleting the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
This embodiment removes the invalid fields from the unstructured data so that the unstructured data conforms to the data format of the structured database.
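The cleaning rule described above amounts to a set difference between the fields of a document and the columns of the target table. A minimal sketch, assuming the first table structure is given as a list of column names and each document is a flat Python dict (the function name `clean` is hypothetical):

```python
def clean(second_data, first_table_structure):
    """Delete fields that do not exist in the structured table.

    Returns the cleaned record (third data) and the set of invalid fields.
    """
    allowed = set(first_table_structure)
    second_table_structure = set(second_data)  # fields present in the document
    invalid_fields = second_table_structure - allowed
    third_data = {
        key: value
        for key, value in second_data.items()
        if key not in invalid_fields
    }
    return third_data, invalid_fields
```

For example, cleaning `{"name": "Li", "age": 30, "note": "x"}` against the columns `["name", "age"]` drops `note` and keeps the other two fields unchanged.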
As a preferred embodiment, the data processing module extracts the second data from the unstructured database at a set period. In this embodiment, an ETL process periodically synchronizes the data in the unstructured database into the structured database, and the synchronization includes the cleaning process. Processing the data periodically ensures that the unstructured data in the unstructured database is converted into structured data in a timely manner.
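One pass of such a periodic synchronization might look like the sketch below. To keep the example self-contained, an in-memory list of dicts stands in for MongoDB and Python's built-in `sqlite3` stands in for MySQL; the `_synced` marker, the table name `records`, and the function names are all assumptions, not details from the patent.

```python
import sqlite3

def table_columns(conn, table):
    """Read the first table structure (column names) of the structured store."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def etl_cycle(documents, conn, table="records"):
    """One sync pass: clean each unprocessed document and insert it as a row."""
    columns = table_columns(conn, table)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    synced = 0
    for doc in documents:
        if doc.get("_synced"):
            continue  # already converted in an earlier cycle
        # Only table columns are copied, so invalid fields are dropped here.
        conn.execute(sql, [doc.get(col) for col in columns])
        doc["_synced"] = True
        synced += 1
    conn.commit()
    return synced
```

A scheduler (for example a loop that sleeps for the configured period, or a cron job) would call `etl_cycle` repeatedly; marking documents as synced keeps each cycle from inserting the same record twice.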
Referring to Fig. 2, an acquisition and analysis method supporting unstructured data comprises the following steps:
S1, obtaining data uploaded from a Web page as first data.
S2, verifying the first data to obtain second data.
S3, storing the second data in an unstructured database.
The main function of steps S1 to S3 is to verify the original data and then store the verified data in the unstructured database. Because data uploaded from a Web page may contain problems such as type errors or missing data, the first data must be verified before it is stored in the unstructured database. Each time data is uploaded, the system can obtain related information such as the upload time and the data type, and the system normalizes the data before storing it. If the system finds a problem in the data uploaded from the Web page, the data can also be corrected manually. The unstructured database generally stores data with the document as the unit. In most cases the data held in the unstructured database cannot be used directly, so it needs to be structured step by step.
S4, extracting the second data from the unstructured database.
S5, performing data cleaning on the second data to obtain third data.
S6, storing the third data in a structured database.
The main function of steps S4 to S6 is to gradually extract data from the unstructured database, structure the extracted data, and then store the structured data in the structured database. Extraction in this embodiment may follow a rule, for example processing 10 documents at a time in the order in which the documents were stored.
As a preferred embodiment, the unstructured database is a MongoDB database, and the structured database is a MySQL database.
As a preferred embodiment, step S5 specifically includes:
S51, obtaining a first table structure of the structured database;
S52, obtaining a second table structure of the second data;
S53, according to the first table structure, deleting the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
This embodiment removes the invalid fields from the unstructured data so that the unstructured data conforms to the data format of the structured database.
As a preferred embodiment, extracting the second data from the unstructured database is specifically:
extracting the second data from the unstructured database at a set period.
The step numbers in the above method embodiment are provided only for ease of explanation and place no restriction on the order of the steps; the execution order of the steps in the embodiment may be adjusted as understood by those skilled in the art.
The above describes preferred implementations of the present invention, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the invention, and all such equivalent variations or substitutions fall within the scope defined by the claims of this application.

Claims (10)

1. An acquisition and analysis system supporting unstructured data, characterized by comprising:
a data acquisition module, configured to obtain data uploaded from a Web page as first data, verify the first data to obtain second data, and store the second data in an unstructured database;
a data processing module, configured to extract the second data from the unstructured database, perform data cleaning on the second data to obtain third data, and then store the third data in a structured database;
a data analysis module, configured to analyze the data in the unstructured database or the data in the structured database.
2. The acquisition and analysis system supporting unstructured data according to claim 1, characterized in that the unstructured database is a MongoDB database and the structured database is a MySQL database.
3. The acquisition and analysis system supporting unstructured data according to claim 1, characterized in that the data acquisition module includes a data entry unit, and the data entry unit is used to collect data entered by a user as the first data.
4. The acquisition and analysis system supporting unstructured data according to claim 1, characterized in that the unstructured database uses the document as its unit of storage.
5. The acquisition and analysis system supporting unstructured data according to claim 1, characterized in that performing data cleaning on the second data to obtain the third data specifically includes:
obtaining a first table structure of the structured database;
obtaining a second table structure of the second data;
according to the first table structure, deleting the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
6. The acquisition and analysis system supporting unstructured data according to claim 1, characterized in that the data processing module extracts the second data from the unstructured database at a set period.
7. An acquisition and analysis method supporting unstructured data, characterized by comprising the following steps:
obtaining data uploaded from a Web page as first data;
verifying the first data to obtain second data;
storing the second data in an unstructured database;
extracting the second data from the unstructured database;
performing data cleaning on the second data to obtain third data;
storing the third data in a structured database.
8. The acquisition and analysis method supporting unstructured data according to claim 7, characterized in that the unstructured database is a MongoDB database and the structured database is a MySQL database.
9. The acquisition and analysis method supporting unstructured data according to claim 7, characterized in that performing data cleaning on the second data to obtain the third data specifically includes:
obtaining a first table structure of the structured database;
obtaining a second table structure of the second data;
according to the first table structure, deleting the data of the invalid fields in the second table structure to obtain the third data, where an invalid field is a field that does not exist in the first table structure.
10. The acquisition and analysis method supporting unstructured data according to claim 7, characterized in that extracting the second data from the unstructured database is specifically:
extracting the second data from the unstructured database at a set period.
CN201811345099.3A, priority date 2018-11-13, filing date 2018-11-13: Acquisition and analysis system and method supporting unstructured data. Status: Pending. Publication: CN109669965A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811345099.3A, CN109669965A (en) | 2018-11-13 | 2018-11-13 | Acquisition and analysis system and method supporting unstructured data

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811345099.3A, CN109669965A (en) | 2018-11-13 | 2018-11-13 | Acquisition and analysis system and method supporting unstructured data

Publications (1)

Publication Number | Publication Date
CN109669965A (en) | 2019-04-23

Family

ID=66141727

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201811345099.3A (Pending), CN109669965A (en) | Acquisition and analysis system and method supporting unstructured data | 2018-11-13 | 2018-11-13

Country Status (1)

Country | Link
CN (1) | CN109669965A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111737268A * | 2020-08-17 | 2020-10-02 | 南京百敖软件有限公司 | Data processing method based on document database
WO2021102888A1 * | 2019-11-29 | 2021-06-03 | 京东方科技集团股份有限公司 | Data processing device and method, and computer-readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103106270A * | 2013-02-02 | 2013-05-15 | 深圳先进技术研究院 | Method and system of cloud data fusion
CN104282140A * | 2014-09-22 | 2015-01-14 | 同济大学 | Large-scale real-time traffic index service method and system based on distributed framework
CN105843860A * | 2016-03-17 | 2016-08-10 | 山东大学 | Microblog attention recommendation method based on parallel item-based collaborative filtering algorithm
CN108763562A * | 2018-06-04 | 2018-11-06 | 广东京信软件科技有限公司 | A kind of construction method based on big data skill upgrading data exchange efficiency

Similar Documents

Publication Publication Date Title
CN109726657B (en) Deep learning scene text sequence recognition method
US9361343B2 (en) Method for parallel mining of temporal relations in large event file
CN106611015B (en) Label processing method and device
CN103761236A (en) Incremental frequent pattern increase data mining method
CN110245697B (en) Surface contamination detection method, terminal device and storage medium
CN111582401B (en) Sunflower seed sorting method based on double-branch convolutional neural network
CN105608135A (en) Data mining method and system based on Apriori algorithm
CN109669965A (en) A kind of acquisition analysis system that supporting unstructured data and method
CN108073687B (en) Random walk, random walk method based on cluster, random walk device and equipment
CN109271987A (en) A kind of digital electric meter number reading method, device, system, computer equipment and storage medium
CN112653928B (en) Video filtering method, system and equipment based on same content
CN108595211B (en) Method and apparatus for outputting data
CN108595593B (en) Topic model-based conference research hotspot and development trend information analysis method
CN109214519B (en) Data processing system, method and device
CN102654875B (en) Method and device for automatically processing inner link of web text
CN109255771B (en) Image filtering method and device
CN109600428A (en) A kind of automation uploads attachment and matches associated method and apparatus
CN105512237A (en) Data introduction system with complex structure
CN115904970A (en) Regression testing method and equipment
CN111369489A (en) Image identification method and device and terminal equipment
CN113468906B (en) Graphic code extraction model construction method, identification device, equipment and medium
CN115114805A (en) Information interaction pair discrete simulation method of autonomous traffic system architecture
Mehdi et al. Optimized word segmentation for the word based cursive handwriting recognition
CN104182396A (en) Terminal as well as device and method of optimizing description of format document content
CN113850265A (en) PDF document analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190423