CN106055557A - Method and system for classification and pre-processing of big data under Internet environment - Google Patents

Method and system for classification and pre-processing of big data under Internet environment

Info

Publication number
CN106055557A
Authority
CN
China
Prior art keywords
module
pretreatment
video
internet
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610308773.5A
Other languages
Chinese (zh)
Inventor
张晓丹
梁冰
王莉
白海燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INSTITUTE OF SCIENCE AND TECHNOLOGY INFORMATION OF CHINA
Original Assignee
INSTITUTE OF SCIENCE AND TECHNOLOGY INFORMATION OF CHINA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by INSTITUTE OF SCIENCE AND TECHNOLOGY INFORMATION OF CHINA
Publication of CN106055557A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The invention relates to a method and system for classification and pre-processing of big data, and in particular to a method for the classification and pre-processing of big data in an Internet environment. The method and system belong to the field of data mining. In the method provided by the invention, multiple types of network data on the Internet are used to compose a complete basic dataset for pre-processing, and the data are simplified through operations such as dimension reduction; the different types of data in the dataset are then analyzed and pre-processed separately to obtain a dataset used for classification, thereby preparing the data for subsequent classification.

Description

A method and system for classification preprocessing of big data in an Internet environment
Technical field
The present invention relates to a method and system for classification preprocessing of big data, and in particular to a data classification preprocessing method in an Internet environment. It belongs to the field of data mining.
Background
With the continuous progress of modern society, and especially the rapid development of the Internet, network resources of all kinds have become enormous in quantity, highly varied, and fast-changing. The Internet has entered the era of big data. Big data in the current Internet application environment is not only large in volume: the proportion of unstructured data keeps growing, and the quantity of resources increases linearly. Among this multitude of Internet resources, only 10% of the data can actually be put to use. Quickly locating valid data and classifying resources automatically is therefore one of the key approaches to solving this problem. However, traditional storage and classification algorithms cannot meet the classification requirements of big data in the Internet application environment. How to classify big data in the Internet application environment automatically, quickly, and accurately has become a focus of current data technology research, and preprocessing technology is the basis for solving the big data classification problem.
This patent studies the preprocessing problem for automatic classification of big data in the Internet application environment, focusing on preprocessing technology for such data based on the Hadoop platform. The research in this patent not only enables classification of big data in the Internet application environment, but also provides an effective basic technology for information retrieval and mining of big data in that environment.
Summary of the invention
The purpose of the present invention is to propose a method and system for classification preprocessing of big data in an Internet environment.
The purpose of the invention is achieved through the following technical solutions.
The method for classification preprocessing of big data in an Internet environment proposed by the present invention is characterized in that it comprises the following operating steps:
Step 1: data acquisition for the big data classification preprocessing method in the Internet environment.
Different types of network data on the Internet are collected and subjected to dimensionality reduction.
Step 2: preprocessing for the big data classification preprocessing method in the Internet environment, forming data that the system can process directly.
The preprocessing includes denoising.
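The dimensionality reduction in Step 1 is not tied to any particular algorithm in the text. As a minimal sketch, assuming the collected data have already been turned into numeric feature rows, the following drops near-constant feature columns (variance-threshold feature selection, which is only one simple form of dimensionality reduction). The function name `reduce_dimensions` and the parameter `min_variance` are hypothetical and do not come from the patent.

```python
from statistics import pvariance


def reduce_dimensions(records, min_variance=1e-9):
    """Drop feature columns whose population variance is <= min_variance.

    `records` is a list of equal-length numeric rows. Returns the reduced
    rows together with the indices of the columns that were kept.
    This is an illustrative stand-in; the patent names no specific method.
    """
    if not records:
        return [], []
    columns = list(zip(*records))  # transpose rows into columns
    kept = [i for i, col in enumerate(columns) if pvariance(col) > min_variance]
    reduced = [[row[i] for i in kept] for row in records]
    return reduced, kept


if __name__ == "__main__":
    samples = [
        [1.0, 5.0, 0.2],
        [1.0, 3.0, 0.9],
        [1.0, 4.0, 0.4],
    ]
    reduced, kept = reduce_dimensions(samples)
    print(kept)
    print(reduced)
```

In the sample above the first column is constant, so it carries no information for classification and is the only one removed. A projection method such as principal component analysis could be substituted where features are correlated rather than constant.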
A system for classification preprocessing of big data in an Internet environment comprises: a data acquisition module, an information extraction module, a text preprocessing module, an image preprocessing module, a video preprocessing module, and an audio preprocessing module.
The main function of the data acquisition module is to collect different types of network data on the Internet and perform dimensionality reduction;
The main function of the information extraction module is to extract text information, image information, video information, and audio information from the input Internet data;
The main function of the text preprocessing module is to perform preprocessing such as word segmentation, feature extraction, and weight calculation on the text information;
The main function of the image preprocessing module is to perform preprocessing such as image transformation, enhancement, edge detection, restoration, and segmentation on the image information;
The main function of the video preprocessing module is to perform feature extraction on the video information, build a video library, and perform preprocessing such as multidimensional analysis on the video data;
The main function of the audio preprocessing module is to perform preprocessing such as front-end preprocessing, feature extraction, and recognition on the audio information.
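The text preprocessing chain named above (word segmentation, feature extraction, weight calculation) can be sketched as follows. The patent does not name a weighting scheme, so TF-IDF is used here as an assumed, commonly used choice, and the functions `segment` and `tf_idf` are hypothetical names. The regular-expression split is only a placeholder for whitespace-delimited text; Chinese text would need a dedicated word segmenter.

```python
import math
import re
from collections import Counter


def segment(text):
    """Naive word segmentation: lowercase, then split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Term frequency is count / document length; inverse document frequency
    is log(N / df). A term that appears in every document gets weight 0.
    """
    tokenized = [segment(doc) for doc in documents]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = ["big data classification", "big data preprocessing"]
    for doc_weights in tf_idf(docs):
        print(doc_weights)
```

Terms shared by all documents ("big", "data") receive zero weight, so only the distinguishing terms remain as features for a later classifier.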
The connections are as follows:
The output of the data acquisition module is connected to the inputs of the information extraction module, the text preprocessing module, the image preprocessing module, the video preprocessing module, and the audio preprocessing module, respectively. The output of the information extraction module is connected to the inputs of the text preprocessing module, the image preprocessing module, the video preprocessing module, and the audio preprocessing module, respectively. The output of the text preprocessing module is connected to the input of the text analysis module in the external device. The output of the image preprocessing module is connected to the input of the image analysis module in the external device. The output of the video preprocessing module is connected to the input of the video analysis module in the external device. The output of the audio preprocessing module is connected to the input of the audio analysis module in the external device.
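The module connections described above can be pictured as a small dispatch pipeline: an extraction stage sorts collected resources by media type, and each group is handed to the preprocessor for that type. This is only a sketch under stated assumptions. The extension table `MEDIA_TYPES`, the functions `extract_information` and `run_pipeline`, and the idea of typing resources by file extension are all hypothetical and are not specified in the patent.

```python
import os
from collections import defaultdict

# Hypothetical mapping from file extension to media type (an assumption).
MEDIA_TYPES = {
    ".txt": "text", ".html": "text",
    ".jpg": "image", ".png": "image",
    ".mp4": "video",
    ".mp3": "audio", ".wav": "audio",
}


def extract_information(resource_names):
    """Information extraction stage: group resource names by media type.

    Resources with an unknown extension are skipped.
    """
    groups = defaultdict(list)
    for name in resource_names:
        suffix = os.path.splitext(name)[1].lower()
        media = MEDIA_TYPES.get(suffix)
        if media is not None:
            groups[media].append(name)
    return dict(groups)


def run_pipeline(resource_names, preprocessors):
    """Route each extracted group to the preprocessor registered for its type.

    `preprocessors` maps a media type to a callable that takes a list of
    resources and returns the preprocessed result for the analysis stage.
    """
    extracted = extract_information(resource_names)
    return {
        media: preprocessors[media](items)
        for media, items in extracted.items()
        if media in preprocessors
    }


if __name__ == "__main__":
    collected = ["a.txt", "b.JPG", "c.mp4", "d.wav", "e.bin"]
    print(run_pipeline(collected, {"text": len, "image": len}))
```

Keeping the preprocessors in a dictionary mirrors the one-output-to-many-inputs wiring in the text: adding a new media type only requires registering another callable.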
Beneficial effects
Compared with existing methods and systems, the method and system for classification preprocessing of big data in an Internet environment proposed by the present invention have the following innovation: network data of multiple categories on the Internet are used to form a more complete basic dataset for preprocessing. The data are first simplified through operations such as dimensionality reduction; the different types of data in the dataset are then analyzed and preprocessed separately to obtain a dataset for classification, which prepares the data for subsequent classification.
Brief description of the drawings
Fig. 1 is a front view of the equipment steering wheel (6) to be detected in the specific embodiment of the invention.
Detailed description of the embodiments
In order to further illustrate the objects and advantages of the present invention, the invention is described below with reference to the accompanying drawing and a specific embodiment.
The method for classification preprocessing of big data in an Internet environment in this embodiment comprises the following operating steps:
Step 1: data acquisition for the big data classification preprocessing method in the Internet environment.
Different types of network data on the Internet are collected and subjected to dimensionality reduction.
Step 2: preprocessing for the big data classification preprocessing method in the Internet environment, forming data that the system can process directly.
The preprocessing includes denoising.
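The embodiment states only that preprocessing includes denoising, without naming a filter. As one illustrative possibility, assuming the data can be viewed as a one-dimensional numeric sequence, a sliding median filter suppresses isolated spikes. The function `median_filter` and its `window` parameter are hypothetical and are not taken from the patent.

```python
from statistics import median


def median_filter(signal, window=3):
    """Replace each sample with the median of its neighbourhood.

    `window` must be a positive odd integer. Near the ends of the sequence
    the neighbourhood is truncated rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(signal)
    return [
        median(signal[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


if __name__ == "__main__":
    noisy = [1, 1, 9, 1, 1]  # a single spike at index 2
    print(median_filter(noisy))
```

A median filter is chosen here because it removes impulse noise without smearing it into neighbouring samples, which a moving average would do.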
The preprocessing system based on the above method for classification preprocessing of big data in an Internet environment has the structural framework shown in Fig. 1 and comprises: a data acquisition module, an information extraction module, a text preprocessing module, an image preprocessing module, a video preprocessing module, and an audio preprocessing module.
The main function of the data acquisition module is to collect different types of network data on the Internet and perform dimensionality reduction;
The main function of the information extraction module is to extract text information, image information, video information, and audio information from the input Internet data;
The main function of the text preprocessing module is to perform preprocessing such as word segmentation, feature extraction, and weight calculation on the text information;
The main function of the image preprocessing module is to perform preprocessing such as image transformation, enhancement, edge detection, restoration, and segmentation on the image information;
The main function of the video preprocessing module is to perform feature extraction on the video information, build a video library, and perform preprocessing such as multidimensional analysis on the video data;
The main function of the audio preprocessing module is to perform preprocessing such as front-end preprocessing, feature extraction, and recognition on the audio information.
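Of the image preprocessing operations listed above, edge detection is the easiest to show in a few lines. The sketch below applies the standard 3x3 Sobel operator to a grayscale image stored as a list of rows. The patent does not say which edge detector is used, so Sobel is an assumption, as are the function name `sobel_edges` and the `threshold` parameter.

```python
def sobel_edges(image, threshold=100):
    """Return a binary edge map for a 2-D list of grayscale values.

    A pixel is marked 1 when its Sobel gradient magnitude is >= threshold.
    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


if __name__ == "__main__":
    # Left half dark, right half bright: a vertical edge in the middle.
    picture = [[0, 0, 255, 255] for _ in range(4)]
    for row in sobel_edges(picture):
        print(row)
```

The resulting binary map could serve as one input feature for a later image classifier; in practice an image library would replace the nested loops for speed.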
The specific description above further details the purpose, technical solution, and beneficial effects of the invention. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (2)

1. A method for classification preprocessing of big data in an Internet environment, characterized in that it comprises the following operating steps:
Step 1: data acquisition for the big data classification preprocessing method in the Internet environment;
different types of network data on the Internet are collected and subjected to dimensionality reduction;
Step 2: preprocessing for the big data classification preprocessing method in the Internet environment, forming data that the system can process directly; the preprocessing includes denoising.
2. A system for classification preprocessing of big data in an Internet environment, characterized in that it comprises: a data acquisition module, an information extraction module, a text preprocessing module, an image preprocessing module, a video preprocessing module, and an audio preprocessing module;
the main function of the data acquisition module is to collect different types of network data on the Internet and perform dimensionality reduction;
the main function of the information extraction module is to extract text information, image information, video information, and audio information from the input Internet data;
the main function of the text preprocessing module is to perform preprocessing such as word segmentation, feature extraction, and weight calculation on the text information;
the main function of the image preprocessing module is to perform preprocessing such as image transformation, enhancement, edge detection, restoration, and segmentation on the image information;
the main function of the video preprocessing module is to perform feature extraction on the video information, build a video library, and perform preprocessing such as multidimensional analysis on the video data;
the main function of the audio preprocessing module is to perform preprocessing such as front-end preprocessing, feature extraction, and recognition on the audio information;
the connections are as follows:
the output of the data acquisition module is connected to the inputs of the information extraction module, the text preprocessing module, the image preprocessing module, the video preprocessing module, and the audio preprocessing module, respectively; the output of the information extraction module is connected to the inputs of the text preprocessing module, the image preprocessing module, the video preprocessing module, and the audio preprocessing module, respectively; the output of the text preprocessing module is connected to the input of the text analysis module in the external device; the output of the image preprocessing module is connected to the input of the image analysis module in the external device; the output of the video preprocessing module is connected to the input of the video analysis module in the external device; the output of the audio preprocessing module is connected to the input of the audio analysis module in the external device.
CN201610308773.5A 2015-12-25 2016-05-12 Method and system for classification and pre-processing of big data under Internet environment Pending CN106055557A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510988528 2015-12-25
CN2015109885289 2015-12-25

Publications (1)

Publication Number Publication Date
CN106055557A (en) 2016-10-26

Family

ID=57176211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610308773.5A Pending CN106055557A (en) 2015-12-25 2016-05-12 Method and system for classification and pre-processing of big data under Internet environment

Country Status (1)

Country Link
CN (1) CN106055557A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112158692A (en) * 2020-09-09 2021-01-01 北京明略昭辉科技有限公司 Method and device for acquiring flow of target object in elevator

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1588879A (en) * 2004-08-12 2005-03-02 复旦大学 Internet content filtering system and method
CN101937445A (en) * 2010-05-24 2011-01-05 中国科学技术信息研究所 Automatic file classification system
CN104376406A (en) * 2014-11-05 2015-02-25 上海计算机软件技术开发中心 Enterprise innovation resource management and analysis system and method based on big data
CN104731852A (en) * 2014-12-16 2015-06-24 芜湖乐锐思信息咨询有限公司 Big data system

Similar Documents

Publication Publication Date Title
CN103544255B (en) Text semantic relativity based network public opinion information analysis method
Rizzo et al. NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud.
CN104268160A (en) Evaluation object extraction method based on domain dictionary and semantic roles
CN104933113A (en) Expression input method and device based on semantic understanding
CN106778851B (en) Social relationship prediction system and method based on mobile phone evidence obtaining data
CN102542061B (en) Intelligent product classification method
CN104504024A (en) Method and system for mining keywords based on microblog content
CN104182465A (en) Network-based big data processing method
CN104281694A (en) Analysis system of emotional tendency of text
CN105808722A (en) Information discrimination method and system
CN111507083A (en) Text analysis method, device, equipment and storage medium
CN104866606A (en) MapReduce parallel big data text classification method
CN101794378A (en) Rubbish image filtering method based on image encoding
CN110675121A (en) Method for collecting picture type file material
Wilkinson et al. A novel word segmentation method based on object detection and deep learning
CN106326335A (en) Big data classification method based on significant attribute selection
CN106055557A (en) Method and system for classification and pre-processing of big data under Internet environment
CN103218420A (en) Method and device for extracting page titles
CN104268214A (en) Micro-blog user relationship based user gender identification method and system
Sueno et al. Converting text to numerical representation using modified Bayesian vectorization technique for multi-class classification
CN110895548A (en) Method and apparatus for processing information
CN103870567A (en) Automatic identifying method for webpage collecting template of vertical search engine in cloud computing
CN107291952B (en) Method and device for extracting meaningful strings
Kim et al. Main content extraction from web documents using text block context
CN103778210A (en) Method and device for judging specific file type of file to be analyzed

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161026
