CN104424309A - Unstructured data processing method based on technological media cloud computing - Google Patents
Unstructured data processing method based on technological media cloud computing
- Publication number
- CN104424309A (application CN201310399024.4A)
- Authority
- CN
- China
- Prior art keywords
- unstructured data
- scientific
- cloud computing
- technological media
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an unstructured data processing method based on technological media cloud computing. The method comprises: (1) acquiring technological media information data; (2) storing the data in distributed cloud storage according to the characteristics of each data type; (3) retrieving the unstructured data stored in the cloud in step (2), processing it offline (cleaning, deduplication, association, filtering, keyword extraction and intelligent classification), and writing the processed unstructured data back to cloud storage. The method provides a cloud-computing-based unstructured data solution for the vertical field of technological media. Because the target industry is precisely defined and its frequently used keywords are analyzed in depth, the precision of information can be improved, some noise words can be removed, and data processing efficiency can be improved.
Description
Technical field
The present invention relates to the field of microcomputer data processing, and in particular to an unstructured data processing method based on technological media cloud computing.
Background art
Cloud computing is a model for adding, using and delivering Internet-based services, and usually involves providing dynamically scalable and often virtualized resources over the Internet. In the narrow sense, cloud computing refers to the delivery and usage model of IT infrastructure, in which the required resources are obtained over a network on demand and in an easily scalable way. In the broad sense, it refers to the delivery and usage model of services, in which the required services are obtained over a network on demand and in an easily scalable way. Such services may relate to IT, software or the Internet, or may be other services; this means that computing power can also circulate over the Internet as a commodity.
Unstructured data management challenges the theory and methods of the conventional information field and has become an important new research direction. Unstructured data comes in rich data types with complex structure and is not constrained by any clear, uniformly defined data structure. Combined with its massive scale, highly dynamic nature, varied application scenarios and the requirement for unified, federated access, this makes unstructured data management extremely challenging. Because the kinds of unstructured data differ from one another and each data type has its own distinctive operations, the object data model is extended to support effective operations on the different kinds of unstructured data. On this basis, major companies have defined and implemented type-specific operations for different types of unstructured data and, in combination with their application fields, have built unstructured data management systems.
The main problems of unstructured data management based on the object model include the following. Current systems lack an optimized execution mechanism for object methods, so data processing efficiency is hard to guarantee in massive-data environments. Systems focus on the differing requirements of specific object types and therefore have difficulty handling unified data queries. Some systems are built on relational databases and are limited by the relational framework; they must deal with issues such as strict concurrency control, which further reduces the efficiency of unstructured data processing. Data integration techniques emphasize the sharing and querying of heterogeneous data; in an unstructured data management system they can reduce storage space cost and improve the quality of query results. However, schema matching, query rewriting and similar operations in data integration make the costs of building the system and processing queries prohibitive. Dataspaces overcome some of the problems of data integration, but the internal model of a dataspace is too complex and does not support the management of massive data. Moreover, data integration systems do not address keyword-based querying or a distributed management architecture for massive data.
In view of the above analysis, existing cloud-computing-based techniques for processing unstructured data still have a broad scope and do not achieve sufficient data precision. In addition, existing cloud-computing-based unstructured data processing covers only the implementation method and does not offer a complete solution from hardware and software configuration through to implementation. Effective innovation is therefore needed in these respects.
Summary of the invention
The object of the present invention is to provide an unstructured data processing method based on technological media cloud computing that combines unstructured data processing technology with cloud computing and covers the full process of hardware configuration, system architecture, data processing and result feedback, so as to overcome the deficiencies of the prior art.
The object of the present invention is achieved by the following technical solution:
An unstructured data processing method based on technological media cloud computing, mainly comprising the following steps:
(1) acquiring technological media information data to obtain the unstructured data to be processed;
(2) storing the unstructured data in distributed cloud storage according to the characteristics of each data type;
(3) retrieving the unstructured data stored in the cloud in step (2) and processing it offline, the offline processing comprising cleaning, deduplication, association, filtering, keyword extraction and intelligent classification, and then writing the processed unstructured data back to cloud storage;
(4) responding to received information retrieval requests according to the characteristics of the unstructured data, and displaying the sequence of retrieval results according to those characteristics.
In step (1), technological media information data is acquired through two channels: manual entry and Internet crawling.
In step (3), the retrieval of the unstructured data and its subsequent offline processing are performed by a large-scale distributed computing platform.
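As an illustration of how offline processing might be split across a distributed computing platform, the following minimal Python sketch counts keywords in a map phase per data partition and merges the partial counts in a reduce phase. The patent does not specify the computation model; the function names `map_partition` and `reduce_counts`, the sample partitions, and the use of plain in-process functions in place of cluster workers are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def map_partition(docs: Iterable[str]) -> Counter:
    """Map phase (hypothetical): count word occurrences within one partition."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce phase (hypothetical): merge per-partition counts into one total."""
    return reduce(lambda left, right: left + right, partials, Counter())


# Each inner list stands in for the documents held by one storage node.
partitions = [
    ["cloud storage for media", "cloud computing"],
    ["media data processing", "cloud data"],
]
total = reduce_counts(map_partition(part) for part in partitions)
```

On a real platform each call to `map_partition` would run on the node holding that partition, and only the small partial counts would travel over the network.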
In step (4), the sequence of retrieval results is also stored in a cache.
In step (4), the sequence of retrieval results in the cache is either written directly to cloud storage or written to cloud storage after offline processing.
The beneficial effects of the unstructured data processing method based on technological media cloud computing according to the present invention are as follows. The method is a cloud-computing-based unstructured data solution targeted at the vertical field of technological media. Because the industry is precisely targeted and its frequently used keywords are analyzed in depth, the precision of information can be improved, some noise words can be excluded, and the efficiency of data processing is improved at the same time. Specifically:
First, it adopts an architecture of loosely coupled sub-storage systems: a cloud storage system for unstructured source data, a cloud storage system for the feature data of non-text unstructured data, and a cloud storage system for the feature data of text unstructured data;
Second, independently deployable query processing modules, of which multiple instances can be deployed, schedule the underlying sub-storage systems and the multi-type feature extraction submodules, thereby associating the source data of unstructured data with its feature data;
Third, management functions such as storage, acquisition and querying of multiple kinds of unstructured data, covering both source data and feature data, are provided in a unified manner;
The resulting system architecture and the content it manages are both highly scalable.
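The loosely coupled sub-storage architecture described above can be pictured with a small Python sketch. It assumes three in-memory stand-ins for the source-data store, the text feature store and the non-text feature store, plus a query module that routes writes by data type and joins source data with feature data on lookup. The class names `SubStore` and `QueryModule`, the method names, and the `kind` parameter are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class SubStore:
    """Hypothetical in-memory stand-in for one cloud sub-storage system."""
    name: str
    records: dict = field(default_factory=dict)

    def put(self, doc_id, value):
        self.records[doc_id] = value

    def get(self, doc_id):
        return self.records.get(doc_id)


class QueryModule:
    """Hypothetical query processing module that schedules the sub-stores."""

    def __init__(self):
        self.source = SubStore("source")
        self.text_features = SubStore("text_features")
        self.non_text_features = SubStore("non_text_features")

    def store(self, doc_id, raw, kind, features):
        # Source data always goes to the source store; features are routed by type.
        self.source.put(doc_id, raw)
        target = self.text_features if kind == "text" else self.non_text_features
        target.put(doc_id, features)

    def lookup(self, doc_id):
        # Associate the source data with whichever feature store holds its features.
        features = self.text_features.get(doc_id)
        if features is None:
            features = self.non_text_features.get(doc_id)
        return {"source": self.source.get(doc_id), "features": features}


qm = QueryModule()
qm.store("a1", "Cloud storage article", "text", {"keywords": ["cloud"]})
qm.store("img1", b"\x89PNG", "image", {"width": 640})
```

Because each sub-store is reached only through `put` and `get`, any one of them could be replaced by a different backend without touching the others, which is the point of the loose coupling.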
Brief description of the drawings
The present invention is described in further detail below with reference to the drawing and an embodiment.
Fig. 1 is a flowchart of the unstructured data processing method based on technological media cloud computing according to an embodiment of the present invention.
Embodiment
As shown in Fig. 1, the unstructured data processing method based on technological media cloud computing according to an embodiment of the present invention mainly comprises the following steps:
(1) acquiring technological media information data to obtain the unstructured data to be processed;
(2) storing the unstructured data in distributed cloud storage according to the characteristics of each data type; this step requires a Hadoop + HBase architecture that supports large capacity and high performance;
(3) retrieving the unstructured data stored in the cloud in step (2) and processing it offline, the offline processing comprising cleaning, deduplication, association, filtering, keyword extraction and intelligent classification, and then writing the processed unstructured data back to cloud storage;
(4) responding to received information retrieval requests according to the characteristics of the unstructured data, and displaying the sequence of retrieval results according to those characteristics, each result in the sequence being linked to its corresponding data source.
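Step (4) can be sketched as follows, under the assumption that the "characteristics" used for display are keyword frequencies extracted offline and that results are ordered by a simple summed score. The patent does not define a ranking formula, so the scoring rule, the `Document` type, the `search` function and the example URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    keywords: dict      # keyword -> frequency, assumed to come from offline processing
    source_url: str     # link back to the data source (example URLs only)


def search(query_terms, documents):
    """Return matching documents ordered by summed keyword frequency (assumed rule)."""
    results = []
    for doc in documents:
        score = sum(doc.keywords.get(term.lower(), 0) for term in query_terms)
        if score > 0:
            results.append({"doc_id": doc.doc_id, "score": score, "link": doc.source_url})
    # Highest score first; ties are broken by document id for a stable order.
    results.sort(key=lambda r: (-r["score"], r["doc_id"]))
    return results


docs = [
    Document("d2", {"cloud": 1}, "https://example.com/d2"),
    Document("d1", {"cloud": 3, "storage": 1}, "https://example.com/d1"),
    Document("d3", {"chip": 2}, "https://example.com/d3"),
]
hits = search(["Cloud", "storage"], docs)
```

Each entry in `hits` carries a `link` field, mirroring the requirement that every result in the sequence points to its data source.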
In step (1), technological media information data is acquired through two channels: manual entry and Internet crawling.
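One way to handle the two acquisition channels is to normalize both into a common record before storage, as in the Python sketch below. The `RawRecord` type and the functions `from_manual_entry` and `from_crawled_page` are hypothetical; the sketch parses HTML that has already been downloaded and performs no network access.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class RawRecord:
    channel: str   # "manual" or "crawl"
    origin: str    # author name or page URL
    text: str


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page (script and style text included)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def from_manual_entry(author, text):
    """Wrap text typed in by an editor."""
    return RawRecord("manual", author, text.strip())


def from_crawled_page(url, html):
    """Wrap the visible text of a page fetched by a crawler."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return RawRecord("crawl", url, " ".join(parser.chunks))


manual = from_manual_entry("editor01", "  New chip announced.  ")
crawled = from_crawled_page(
    "https://example.com/news",
    "<html><body><h1>New chip</h1><p>Launch today.</p></body></html>",
)
```

With both channels reduced to the same record shape, the later storage and offline processing steps do not need to know where an item came from.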
In step (3), the retrieval of the unstructured data and its subsequent offline processing are performed by a large-scale distributed computing platform.
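The offline processing chain of step (3) might look like the following single-machine Python sketch, which covers cleaning, deduplication, noise-word filtering, keyword extraction and a rule-based classification. The association step is omitted. The noise-word list, the category vocabularies, the hash-based exact-match deduplication and all function names are assumptions for illustration; the patent does not disclose the concrete algorithms.

```python
import hashlib
import re
from collections import Counter

# Assumed noise words and category vocabularies, for illustration only.
NOISE_WORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}
CATEGORIES = {
    "cloud": {"cloud", "storage", "hadoop"},
    "hardware": {"chip", "processor", "cpu"},
}


def clean(text):
    """Strip markup tags, collapse whitespace and lowercase the text."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(texts):
    """Drop exact duplicates by content hash, keeping first occurrences."""
    seen, unique = set(), []
    for text in texts:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def extract_keywords(text, top_n=3):
    """Filter noise words, then keep the most frequent remaining words."""
    words = [w for w in re.findall(r"[a-z0-9]+", text) if w not in NOISE_WORDS]
    return [word for word, _ in Counter(words).most_common(top_n)]


def classify(keywords):
    """Pick the category whose vocabulary overlaps the keywords the most."""
    scores = {name: len(vocab & set(keywords)) for name, vocab in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


def process_offline(raw_texts):
    """Run the whole chain and return one record per unique document."""
    records = []
    for text in deduplicate([clean(t) for t in raw_texts]):
        keywords = extract_keywords(text)
        records.append({"text": text, "keywords": keywords, "category": classify(keywords)})
    return records


processed = process_offline([
    "<p>Cloud storage  and cloud computing</p>",
    "cloud storage and cloud computing",
    "The new chip is a fast processor chip",
])
```

Cleaning runs before deduplication so that two copies of the same article that differ only in markup or spacing collapse into one record.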
In step (4), the sequence of retrieval results is also stored in a cache.
In step (4), the sequence of retrieval results in the cache is either written directly to cloud storage or written to cloud storage after offline processing. In this way, when the same information retrieval request is received before the relevant data has been updated, the sequence of retrieval results can be sent directly to the requester without performing the cloud computation again.
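The caching behavior described above can be sketched in Python as a small wrapper around the retrieval function. The class `CachedSearch`, its method names and the rule of clearing the whole cache on any data update are assumptions; the patent only states that repeated requests are served from the cache until the relevant data is updated.

```python
class CachedSearch:
    """Hypothetical cache in front of an expensive cloud-side search function."""

    def __init__(self, compute):
        self._compute = compute      # callable that performs the actual retrieval
        self._cache = {}
        self.compute_calls = 0       # counts how often the cloud search really ran

    def query(self, request):
        key = request.strip().lower()
        if key not in self._cache:
            self.compute_calls += 1
            self._cache[key] = self._compute(key)
        return self._cache[key]

    def on_data_update(self):
        # Simplification: any update invalidates every cached result sequence.
        self._cache.clear()


cached = CachedSearch(lambda q: [q.upper()])
first = cached.query("Cloud Storage")
second = cached.query("cloud storage ")
calls_before_update = cached.compute_calls
cached.on_data_update()
third = cached.query("cloud storage")
```

A finer-grained design could invalidate only the cached sequences that contain the updated documents, at the cost of tracking which documents each cached result depends on.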
Claims (5)
1. An unstructured data processing method based on technological media cloud computing, characterized in that it mainly comprises the following steps:
(1) acquiring technological media information data to obtain the unstructured data to be processed;
(2) storing the unstructured data in distributed cloud storage according to the characteristics of each data type;
(3) retrieving the unstructured data stored in the cloud in step (2) and processing it offline, the offline processing comprising cleaning, deduplication, association, filtering, keyword extraction and intelligent classification, and then writing the processed unstructured data back to cloud storage;
(4) responding to received information retrieval requests according to the characteristics of the unstructured data, and displaying the sequence of retrieval results according to those characteristics.
2. The unstructured data processing method based on technological media cloud computing as claimed in claim 1, characterized in that: in step (1), technological media information data is acquired through two channels, namely manual entry and Internet crawling.
3. The unstructured data processing method based on technological media cloud computing as claimed in claim 1, characterized in that: in step (3), the retrieval of the unstructured data and its subsequent offline processing are performed by a large-scale distributed computing platform.
4. The unstructured data processing method based on technological media cloud computing as claimed in claim 1, characterized in that: in step (4), the sequence of retrieval results is also stored in a cache.
5. The unstructured data processing method based on technological media cloud computing as claimed in claim 4, characterized in that: in step (4), the sequence of retrieval results in the cache is either written directly to cloud storage or written to cloud storage after offline processing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310399024.4A CN104424309A (en) | 2013-09-05 | 2013-09-05 | Unstructured data processing method based on technological media cloud computing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310399024.4A CN104424309A (en) | 2013-09-05 | 2013-09-05 | Unstructured data processing method based on technological media cloud computing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104424309A true CN104424309A (en) | 2015-03-18 |
Family
ID=52973286
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310399024.4A Pending CN104424309A (en) | 2013-09-05 | 2013-09-05 | Unstructured data processing method based on technological media cloud computing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104424309A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101908191A (en) * | 2010-08-03 | 2010-12-08 | 深圳市她秀时尚电子商务有限公司 | Data analysis method and system for e-commerce |
CN102012912B (en) * | 2010-11-19 | 2012-08-22 | 清华大学 | Management method for unstructured data based on cloud computing environment |
CN102521767A (en) * | 2011-12-13 | 2012-06-27 | 亿赞普(北京)科技有限公司 | Method and system for publishing network advertising information |
CN102937976A (en) * | 2012-10-17 | 2013-02-20 | 北京奇虎科技有限公司 | Drop-down prompting method and apparatus based on input prefix |
CN103268336A (en) * | 2013-05-13 | 2013-08-28 | 刘峰 | Fast data and big data combined data processing method and system |
Non-Patent Citations (1)
Title |
---|
张贤善主编: "《工业企业采购供应管理》", 31 October 2009 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105224563A (en) * | 2014-06-26 | 2016-01-06 | 清控科创控股股份有限公司 | A kind of scientific and technological media cloud computing unstructured data solution |
CN106649298A (en) * | 2015-07-22 | 2017-05-10 | 中国科学院微电子研究所 | Method and system for carrying out interdisciplinary association establishment on the basis of the Internet of Things |
CN106649298B (en) * | 2015-07-22 | 2021-01-22 | 中国科学院微电子研究所 | Cross-domain association establishment method and system based on Internet of things |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109446279A (en) | Based on neo4j big data genetic connection management method, system, equipment and storage medium | |
CN109063196B (en) | Data processing method and device, electronic equipment and computer readable storage medium | |
CN108536778B (en) | Data application sharing platform and method | |
CN103646073A (en) | Condition query optimizing method based on HBase table | |
CN107729399B (en) | Data processing method and device | |
CN106126601A (en) | A kind of social security distributed preprocess method of big data and system | |
CN103440288A (en) | Big data storage method and device | |
CN103942330A (en) | Method and system for processing big data | |
Mishra et al. | Structured and unstructured big data analytics | |
CN111723161A (en) | Data processing method, device and equipment | |
CN104239470A (en) | Distributed environment-oriented space data compound processing system and method | |
CN112506887B (en) | Vehicle terminal CAN bus data processing method and device | |
CN108319604B (en) | Optimization method for association of large and small tables in hive | |
Castro-Medina et al. | Application of data fragmentation and replication methods in the cloud: a review | |
CN113810466A (en) | Middleware for multi-source heterogeneous data, system and method applying middleware | |
CN104424309A (en) | Unstructured data processing method based on technological media cloud computing | |
WO2014180411A1 (en) | Distributed index generation method and device | |
Gupta et al. | Efficient query analysis and performance evaluation of the NoSQL data store for bigdata | |
KR20170130178A (en) | In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment | |
CN111241142A (en) | Scientific and technological achievement conversion pushing system and method | |
MahmoudiNasab et al. | AdaptRDF: adaptive storage management for RDF databases | |
CN113590651B (en) | HQL-based cross-cluster data processing system and method | |
US10657126B2 (en) | Meta-join and meta-group-by indexes for big data | |
Bharti et al. | A Review on Big Data Analytics Tools in Context with Scalability | |
CN106446039B (en) | Aggregation type big data query method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150318 |