CN104424309A - Unstructured data processing method based on technological media cloud computing - Google Patents


Info

Publication number
CN104424309A
Authority
CN
China
Prior art keywords
unstructured data
scientific
cloud computing
technological media
data
Prior art date
2013-09-05
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310399024.4A
Other languages
Chinese (zh)
Inventor
渠继永
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TTTH HOLDINGS Co Ltd
Original Assignee
TTTH HOLDINGS Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-09-05
Filing date
2013-09-05
Publication date
2015-03-18
Application filed by TTTH HOLDINGS Co Ltd
Priority to CN201310399024.4A
Publication of CN104424309A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an unstructured data processing method based on science and technology media cloud computing. The method comprises: (1) acquiring science and technology media information data; (2) storing the data in distributed cloud storage according to the characteristics of the different data types; and (3) retrieving the unstructured data held in the cloud storage of step (2), processing it offline (cleaning, de-duplication, association, filtering, keyword extraction and intelligent classification), and updating the processed data in the cloud storage. The method provides a cloud-computing-based solution for unstructured data in the vertical field of science and technology media. Because it is targeted precisely at this industry and analyzes frequently used keywords in depth, it can improve the accuracy of information, remove some noise words, and improve data processing efficiency.

Description

An unstructured data processing method based on science and technology media cloud computing
Technical field
The present invention relates to the field of microcomputer data processing, and in particular to an unstructured data processing method based on science and technology media cloud computing.
Background art
Cloud computing is a model for the addition, use and delivery of Internet-based services, and usually involves providing dynamically scalable and often virtualized resources over the Internet. In the narrow sense, cloud computing is a delivery and usage model for IT infrastructure: the required resources are obtained over the network on demand and in an easily scalable way. In the broad sense, cloud computing is a delivery and usage model for services: the required services are obtained over the network on demand and in an easily scalable way. These services may relate to IT, software or the Internet, or may be other services. This means that computing power can itself be traded over the Internet as a commodity.
Unstructured data management challenges the theories and methods of the traditional information field and has become an important new research direction. Unstructured data comes in many types and has complex structures with no clear, uniformly defined structural constraints; combined with its massive scale, highly dynamic nature, diverse application scenarios and the demand for unified, integrated access, this makes unstructured data management very challenging. Because the kinds of unstructured data differ from one another and each data type has its own specific operations, the object data model is extended to support efficient operations on the different kinds of unstructured data. With this in mind, major companies have defined and implemented type-specific operations for different unstructured data types and, combined with their application domains, have built unstructured data management systems.
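To make the idea of extending an object data model concrete, the following Python sketch gives each unstructured data type its own subclass with a type-specific operation. It illustrates the general technique described in the background, not any particular company's system; the class names, fields and feature sets are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class UnstructuredObject(ABC):
    """Base object type: every unstructured item has an id and a raw payload."""
    object_id: str
    payload: bytes

    @abstractmethod
    def extract_features(self) -> dict:
        """Type-specific operation that each concrete data type must implement."""


@dataclass
class TextDocument(UnstructuredObject):
    encoding: str = "utf-8"

    def extract_features(self) -> dict:
        # Text-specific operation: decode and count characters and words.
        text = self.payload.decode(self.encoding)
        return {"type": "text", "length": len(text), "word_count": len(text.split())}


@dataclass
class ImageFile(UnstructuredObject):
    mime_type: str = "image/png"

    def extract_features(self) -> dict:
        # Real image features (e.g. colour histograms) would need an imaging library;
        # this hypothetical version records only size and MIME type.
        return {"type": "image", "size_bytes": len(self.payload), "mime": self.mime_type}
```

Callers can then treat a mixed collection uniformly, for example `[obj.extract_features() for obj in items]`, while each type keeps its own implementation.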
The main problems of object-model-based unstructured data management are as follows. Current systems lack an optimized execution mechanism for object methods, so data processing efficiency is hard to guarantee in massive-data environments. Systems focus on the differing requirements of handling special objects, which makes unified data queries difficult. Some systems are built on relational databases and are constrained by the relational architecture, so issues such as concurrency control must be handled strictly, which further reduces the efficiency of unstructured data processing. Data integration techniques focus on sharing and querying heterogeneous data; they can reduce space costs in unstructured data management systems and improve the quality of query results, but schema matching, query rewriting and similar steps make system construction and query processing very expensive. Dataspaces overcome some of the problems of data integration, but their internal models are too complex and do not support the management of massive data. Moreover, data integration systems do not address keyword queries or distributed management frameworks for massive data.
In summary, existing cloud-computing-based unstructured data processing techniques still cover too broad a scope and do not achieve sufficient data precision. In addition, existing cloud-based unstructured data processing addresses only the implementation method and does not offer a complete solution spanning software and hardware configuration through to implementation. Effective innovation in these respects is therefore needed.
Summary of the invention
The object of the present invention is to provide an unstructured data processing method based on science and technology media cloud computing that combines unstructured data processing with cloud computing and covers the full workflow, including hardware configuration, system architecture, data processing and result feedback, so as to overcome the shortcomings of the prior art.
The object of the present invention is achieved by the following technical solution:
An unstructured data processing method based on science and technology media cloud computing, consisting mainly of the following steps:
(1) acquiring science and technology media information data to obtain the unstructured data to be processed;
(2) storing the unstructured data in distributed cloud storage according to the characteristics of the different data types;
(3) retrieving the unstructured data stored in the cloud storage in step (2) and processing it offline, the offline processing comprising cleaning, de-duplication, association, filtering, keyword extraction and intelligent classification, and then writing the processed unstructured data back to the cloud storage;
(4) responding to a received information retrieval request according to the characteristics of the unstructured data, and displaying the retrieval result sequence according to those characteristics.
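The offline processing of step (3) can be pictured as a chain of small functions. The Python sketch below is a minimal, single-machine illustration under stated assumptions: the noise-word list, the category vocabularies, and the frequency-based keyword extraction and rule-based classification are hypothetical stand-ins, since the method does not specify the algorithms, and the association step is omitted.

```python
import re
from collections import Counter

# Hypothetical noise-word list; a real deployment would maintain a domain-specific one.
NOISE_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

# Hypothetical category vocabularies for a simple rule-based classifier.
CATEGORIES = {
    "cloud": {"cloud", "hadoop", "hbase", "storage"},
    "ai": {"neural", "learning", "model", "classification"},
}


def clean(text: str) -> str:
    """Cleaning: strip HTML tags, lower-case, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(docs: list[str]) -> list[str]:
    """De-duplication: drop exact duplicates (after cleaning), keeping first-seen order."""
    seen, unique = set(), []
    for doc in docs:
        if doc not in seen:
            seen.add(doc)
            unique.append(doc)
    return unique


def keywords(doc: str, top_n: int = 3) -> list[str]:
    """Filtering + keyword extraction: remove noise words, keep the most frequent terms."""
    terms = [t for t in re.findall(r"[a-z0-9]+", doc) if t not in NOISE_WORDS]
    return [term for term, _ in Counter(terms).most_common(top_n)]


def classify(kws: list[str]) -> str:
    """Classification: pick the category whose vocabulary overlaps the keywords most."""
    scores = {cat: len(vocab.intersection(kws)) for cat, vocab in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def offline_process(raw_docs: list[str]) -> list[dict]:
    """Run the chain; the result would then be written back to cloud storage."""
    docs = deduplicate([clean(d) for d in raw_docs])
    results = []
    for doc in docs:
        kws = keywords(doc)
        results.append({"text": doc, "keywords": kws, "category": classify(kws)})
    return results
```

Because cleaning runs before de-duplication, two copies of an article that differ only in markup or spacing collapse into one record.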
In step (1), science and technology media information data is acquired through two channels: manual entry and Internet crawling.
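A minimal sketch of how records from the two acquisition channels might be normalized into one format before storage, assuming a content hash is used as the record identifier. The RawRecord type and its fields are hypothetical, and the actual page fetching is not shown.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RawRecord:
    """One acquired item, tagged with the channel it came from."""
    channel: str               # "manual" or "crawler"
    source_url: Optional[str]  # crawled items keep their URL; manual entries may have none
    content: str
    acquired_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def record_id(self) -> str:
        # Hash of the content: identical content yields the same id whichever channel supplied it.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()[:16]


def from_manual_entry(text: str) -> RawRecord:
    """Channel 1: an editor types or pastes the item in."""
    return RawRecord(channel="manual", source_url=None, content=text)


def from_crawler(url: str, page_text: str) -> RawRecord:
    """Channel 2: a crawler has already fetched page_text from url."""
    return RawRecord(channel="crawler", source_url=url, content=page_text)
```

A shared identifier of this kind also gives the later de-duplication step a cheap first check.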
In step (3), the retrieval of the unstructured data and the subsequent offline processing are performed by a large-scale distributed computing platform.
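The method does not name the distributed computing platform at this point. As an assumption, the sketch below shows the MapReduce pattern such a platform typically uses, applied to term counting as a building block for keyword extraction. It runs every phase in one process; on a real cluster the framework would run mappers and reducers on separate nodes and perform the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id: str, text: str):
    """Mapper: emit (term, 1) for every term in one document."""
    for term in text.lower().split():
        yield term, 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, counts: list[int]) -> tuple[str, int]:
    """Reducer: sum the partial counts for one term."""
    return term, sum(counts)


def run_job(corpus: dict[str, str]) -> dict[str, int]:
    """Sequential stand-in for a distributed job over {doc_id: text}."""
    mapped = chain.from_iterable(map_phase(i, t) for i, t in corpus.items())
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Since each mapper sees only one document and each reducer only one key, both phases can be parallelized without shared state, which is what makes the pattern suit a large-scale platform.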
In step (4), the retrieval result sequence is also stored in a cache.
In step (4), the retrieval result sequence in the cache is either written directly to the cloud storage or written to the cloud storage after offline processing.
The beneficial effects of the unstructured data processing method based on science and technology media cloud computing of the present invention are as follows. The method is a cloud-computing-based unstructured data solution targeted at the vertical field of science and technology media. Because it is positioned precisely for this industry and analyzes frequently used keywords in depth, it can improve the precision of information and exclude some noise words, while also improving data processing efficiency. Specifically:
First, it adopts an architecture of loosely coupled storage subsystems, such as a cloud storage system for unstructured source data, a cloud storage system for the features of non-text unstructured data, and a cloud system for the features of text unstructured data;
Second, independent query processing modules, of which multiple instances can be deployed, schedule the underlying storage subsystems and the multi-type feature extraction submodules, thereby associating the source data of the unstructured data with its feature data;
Third, management functions such as storage, acquisition and querying of multiple kinds of unstructured data are provided for both source data and feature data through a unified model;
Both the resulting system architecture and the managed content are highly scalable.
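The loosely coupled architecture can be sketched as separate stores for source data and for the features of each data kind, plus a stateless query module that calls feature-extraction submodules and associates features with source data through a shared item id. Everything here is hypothetical and in-memory; each KeyValueStore merely stands in for an independent cloud storage subsystem.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple


class KeyValueStore:
    """Stand-in for one independent storage subsystem (in-memory, hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[str, object] = {}

    def put(self, key: str, value: object) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[object]:
        return self._rows.get(key)

    def scan(self) -> Iterator[Tuple[str, object]]:
        return iter(list(self._rows.items()))


class QueryModule:
    """Holds no data of its own, so any number of instances can share the stores."""

    def __init__(self, source_store: KeyValueStore,
                 feature_stores: Dict[str, KeyValueStore],
                 extractors: Dict[str, Callable[[object], dict]]) -> None:
        self.source_store = source_store
        self.feature_stores = feature_stores  # one store per data kind, e.g. "text", "media"
        self.extractors = extractors          # one feature-extraction submodule per kind

    def ingest(self, item_id: str, kind: str, payload: object) -> None:
        """Write source data and extracted features to separate subsystems under one id."""
        self.source_store.put(item_id, {"kind": kind, "payload": payload})
        self.feature_stores[kind].put(item_id, self.extractors[kind](payload))

    def find(self, kind: str, predicate: Callable[[dict], bool]) -> List[dict]:
        """Match on features, then join each hit back to its source data by id."""
        return [
            {"id": item_id, "features": feats, "source": self.source_store.get(item_id)}
            for item_id, feats in self.feature_stores[kind].scan()
            if predicate(feats)
        ]
```

Because the query module keeps no state, a second instance constructed over the same stores sees everything the first one ingested, which is the property that allows multiple deployments.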
Brief description of the drawings
The present invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flowchart of the unstructured data processing method based on science and technology media cloud computing according to an embodiment of the present invention.
Embodiment
As shown in Fig. 1, the unstructured data processing method based on science and technology media cloud computing according to an embodiment of the present invention consists mainly of the following steps:
(1) acquiring science and technology media information data to obtain the unstructured data to be processed;
(2) storing the unstructured data in distributed cloud storage according to the characteristics of the different data types; this step requires a high-capacity, high-performance Hadoop + HBase architecture;
(3) retrieving the unstructured data stored in the cloud storage in step (2) and processing it offline, the offline processing comprising cleaning, de-duplication, association, filtering, keyword extraction and intelligent classification, and then writing the processed unstructured data back to the cloud storage;
(4) responding to a received information retrieval request according to the characteristics of the unstructured data, and displaying the retrieval result sequence according to those characteristics, each result in the retrieval result sequence being linked to its corresponding data source.
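The embodiment requires a Hadoop + HBase architecture for step (2) but gives no schema. One possible layout, offered only as an assumption, routes each item to a table for its data type and uses a salted row key, a common HBase practice for spreading sequential writes across region servers. The table names, bucket count and column families below are hypothetical, and no HBase client is called.

```python
import zlib

# Hypothetical table names, one per data type, following step (2).
TABLE_BY_TYPE = {"text": "st_media_text", "image": "st_media_image", "video": "st_media_video"}
SALT_BUCKETS = 16  # hypothetical bucket count


def row_key(item_id: str) -> str:
    """Salted row key: a deterministic hash prefix spreads keys across regions."""
    bucket = zlib.crc32(item_id.encode("utf-8")) % SALT_BUCKETS
    return f"{bucket:02d}|{item_id}"


def to_hbase_put(item_id: str, data_type: str, payload: str,
                 source_url: str) -> tuple[str, str, dict[str, str]]:
    """Route an item to its type-specific table and build family:qualifier cells."""
    table = TABLE_BY_TYPE[data_type]
    cells = {
        "raw:content": payload,          # column family "raw" holds the source data
        "meta:source_url": source_url,   # kept so search results can link to the source
        "meta:type": data_type,
    }
    return table, row_key(item_id), cells
```

The returned cells follow HBase's family:qualifier column naming; a Python client such as happybase takes a row key and a dict of such cells in `Table.put`, typically as bytes rather than str.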
In step (1), science and technology media information data is acquired through two channels: manual entry and Internet crawling.
In step (3), the retrieval of the unstructured data and the subsequent offline processing are performed by a large-scale distributed computing platform.
In step (4), the retrieval result sequence is also stored in a cache.
In step (4), the retrieval result sequence in the cache is either written directly to the cloud storage or written to the cloud storage after offline processing. In this way, as long as the relevant data has not been updated, an identical information retrieval request can be answered by sending the cached retrieval result sequence directly to the requester, without performing the cloud computation again.
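The cache behaviour described above can be sketched as follows, assuming a single global data version that is incremented whenever relevant data in cloud storage changes; a cached result sequence is reused only while its recorded version is still current. The search backend and documents are hypothetical, and each result carries its source link as in step (4) of the embodiment. Writing the cache back to cloud storage is not shown.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical documents; each keeps a link to its data source.
DOCS = [
    {"title": "Hadoop cluster tuning", "source": "https://example.com/a"},
    {"title": "HBase schema design", "source": "https://example.com/b"},
]


def toy_search(query: str) -> List[dict]:
    """Hypothetical backend: substring match on titles; every hit keeps its source link."""
    return [d for d in DOCS if query in d["title"].lower()]


class RetrievalCache:
    """Serve repeated requests from cache until the underlying data changes."""

    def __init__(self, search: Callable[[str], List[dict]]) -> None:
        self._search = search  # the expensive cloud-side search
        self._data_version = 0
        self._cache: Dict[str, Tuple[int, List[dict]]] = {}
        self.backend_calls = 0

    def mark_data_updated(self) -> None:
        """Call whenever relevant data in cloud storage is updated."""
        self._data_version += 1

    def query(self, request: str) -> List[dict]:
        key = request.strip().lower()  # treat trivially different requests as identical
        cached = self._cache.get(key)
        if cached is not None and cached[0] == self._data_version:
            return cached[1]  # no cloud computation needed
        self.backend_calls += 1
        results = self._search(key)
        self._cache[key] = (self._data_version, results)
        return results
```

A version counter avoids having to find and delete every affected cache entry on each update; stale entries are simply ignored and overwritten on their next use.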

Claims (5)

1. An unstructured data processing method based on science and technology media cloud computing, characterized in that it consists mainly of the following steps:
(1) acquiring science and technology media information data to obtain the unstructured data to be processed;
(2) storing the unstructured data in distributed cloud storage according to the characteristics of the different data types;
(3) retrieving the unstructured data stored in the cloud storage in step (2) and processing it offline, the offline processing comprising cleaning, de-duplication, association, filtering, keyword extraction and intelligent classification, and then writing the processed unstructured data back to the cloud storage;
(4) responding to a received information retrieval request according to the characteristics of the unstructured data, and displaying the retrieval result sequence according to those characteristics.
2. The unstructured data processing method based on science and technology media cloud computing according to claim 1, characterized in that, in step (1), science and technology media information data is acquired through two channels: manual entry and Internet crawling.
3. The unstructured data processing method based on science and technology media cloud computing according to claim 1, characterized in that, in step (3), the retrieval of the unstructured data and the subsequent offline processing are performed by a large-scale distributed computing platform.
4. The unstructured data processing method based on science and technology media cloud computing according to claim 1, characterized in that, in step (4), the retrieval result sequence is also stored in a cache.
5. The unstructured data processing method based on science and technology media cloud computing according to claim 4, characterized in that, in step (4), the retrieval result sequence in the cache is either written directly to the cloud storage or written to the cloud storage after offline processing.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310399024.4A CN104424309A (en) 2013-09-05 2013-09-05 Unstructured data processing method based on technological media cloud computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310399024.4A CN104424309A (en) 2013-09-05 2013-09-05 Unstructured data processing method based on technological media cloud computing

Publications (1)

Publication Number Publication Date
CN104424309A true CN104424309A (en) 2015-03-18

Family

ID=52973286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310399024.4A Pending CN104424309A (en) 2013-09-05 2013-09-05 Unstructured data processing method based on technological media cloud computing

Country Status (1)

Country Link
CN (1) CN104424309A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908191A (en) * 2010-08-03 2010-12-08 深圳市她秀时尚电子商务有限公司 Data analysis method and system for e-commerce
CN102012912B (en) * 2010-11-19 2012-08-22 清华大学 Management method for unstructured data based on cloud computing environment
CN102521767A (en) * 2011-12-13 2012-06-27 亿赞普(北京)科技有限公司 Method and system for publishing network advertising information
CN102937976A (en) * 2012-10-17 2013-02-20 北京奇虎科技有限公司 Drop-down prompting method and apparatus based on input prefix
CN103268336A (en) * 2013-05-13 2013-08-28 刘峰 Fast data and big data combined data processing method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Xianshan (ed.): 《工业企业采购供应管理》 (Industrial Enterprise Procurement and Supply Management), 31 October 2009 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224563A (en) * 2014-06-26 2016-01-06 清控科创控股股份有限公司 A kind of scientific and technological media cloud computing unstructured data solution
CN106649298A (en) * 2015-07-22 2017-05-10 中国科学院微电子研究所 Method and system for carrying out interdisciplinary association establishment on the basis of the Internet of Things
CN106649298B (en) * 2015-07-22 2021-01-22 中国科学院微电子研究所 Cross-domain association establishment method and system based on Internet of things

Similar Documents

Publication Publication Date Title
CN109446279A (en) Based on neo4j big data genetic connection management method, system, equipment and storage medium
CN109063196B (en) Data processing method and device, electronic equipment and computer readable storage medium
CN108536778B (en) Data application sharing platform and method
CN103646073A (en) Condition query optimizing method based on HBase table
CN107729399B (en) Data processing method and device
CN106126601A (en) A kind of social security distributed preprocess method of big data and system
CN103440288A (en) Big data storage method and device
CN103942330A (en) Method and system for processing big data
Mishra et al. Structured and unstructured big data analytics
CN111723161A (en) Data processing method, device and equipment
CN104239470A (en) Distributed environment-oriented space data compound processing system and method
CN112506887B (en) Vehicle terminal CAN bus data processing method and device
CN108319604B (en) Optimization method for association of large and small tables in hive
Castro-Medina et al. Application of data fragmentation and replication methods in the cloud: a review
CN113810466A (en) Middleware for multi-source heterogeneous data, system and method applying middleware
CN104424309A (en) Unstructured data processing method based on technological media cloud computing
WO2014180411A1 (en) Distributed index generation method and device
Gupta et al. Efficient query analysis and performance evaluation of the NoSQL data store for bigdata
KR20170130178A (en) In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment
CN111241142A (en) Scientific and technological achievement conversion pushing system and method
MahmoudiNasab et al. AdaptRDF: adaptive storage management for RDF databases
CN113590651B (en) HQL-based cross-cluster data processing system and method
US10657126B2 (en) Meta-join and meta-group-by indexes for big data
Bharti et al. A Review on Big Data Analytics Tools in Context with Scalability
CN106446039B (en) Aggregation type big data query method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150318