CN106055557A - Method and system for classification and pre-processing of big data under Internet environment - Google Patents
Method and system for classification and pre-processing of big data under Internet environment
- Publication number: CN106055557A
- Application number: CN201610308773.5A
- Authority
- CN
- China
- Prior art keywords
- module
- pretreatment
- video
- internet
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Abstract
The invention relates to a method and system for the classification and pre-processing of big data, and in particular to a method for the classification and pre-processing of big data under an Internet environment. The method and system belong to the field of data mining. In the method provided by the invention, multiple types of network data from the Internet are used to compose a relatively complete basic dataset for pre-processing, and the data are simplified through operations such as dimension reduction; the different types of data in the dataset are then analyzed and pre-processed separately to obtain a dataset for classification, thereby preparing the data for subsequent classification.
Description
Technical field
The present invention relates to a method and system for the classification and pre-processing of big data, and in particular to a method for the classification and pre-processing of big data under an Internet environment. It belongs to the field of data mining.
Background technology
With the continuous progress of modern society, and especially the rapid development of the Internet, network resources of all kinds have become enormous in quantity, highly varied and fast-changing. The Internet has entered the big data era. At present, big data in Internet application environments is not only large in volume: unstructured data accounts for a growing share, and the quantity of resources increases linearly. Among this multitude of network resources, only 10% of the data can actually be put to use. Quickly locating valid data and classifying resources automatically is therefore one of the key methods for solving this problem. However, traditional storage and classification algorithms cannot meet the classification requirements of big data in Internet application environments. Achieving fast and accurate automatic classification of big data in Internet application environments has become a focus of current data technology research, and pre-processing technology is the foundation for solving the big data classification problem.
This patent studies the pre-processing problem in the automatic classification of big data in Internet application environments, with a focus on pre-processing techniques for such data based on the Hadoop platform. The work in this patent not only enables the classification of big data in Internet application environments, but also provides an effective basic technology for information retrieval and mining of big data in those environments.
Summary of the invention
The purpose of the present invention is to propose a method and system for the classification and pre-processing of big data under an Internet environment.
The purpose of the present invention is achieved through the following technical solution.
The method for the classification and pre-processing of big data under an Internet environment proposed by the present invention is characterized in that it comprises the following operating steps:
Step one: data acquisition.
Different types of network data on the Internet are collected and subjected to dimension reduction.
Step two: pre-processing, which forms data that the system can process directly.
The pre-processing includes denoising.
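The two steps above can be illustrated with a minimal sketch. The patent does not name a specific dimension-reduction or denoising technique, so the choices here are assumptions made purely for illustration: a variance threshold stands in for dimension reduction, and a sliding median filter stands in for denoising. The function names `drop_low_variance` and `median_denoise` are hypothetical.

```python
from statistics import median, pvariance


def drop_low_variance(rows, threshold=0.0):
    """Keep only the feature columns whose population variance exceeds `threshold`."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    return [[row[i] for i in keep] for row in rows], keep


def median_denoise(values, window=3):
    """Replace each value by the median of a sliding window centred on it."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Step one (assumed technique): the middle feature is constant, so it is dropped.
samples = [[1.0, 5.0, 0.2], [2.0, 5.0, 0.1], [3.0, 5.0, 0.4]]
reduced, kept = drop_low_variance(samples)

# Step two (assumed technique): the isolated spike is treated as noise.
signal = [1.0, 1.0, 90.0, 1.0, 1.0]
clean = median_denoise(signal)
```

A real system would likely choose the reduction and denoising methods per data type; the sketch only shows where each step sits in the flow.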
A system for the classification and pre-processing of big data under an Internet environment comprises: a data acquisition module, an information extraction module, a text pre-processing module, an image pre-processing module, a video pre-processing module and an audio pre-processing module.
The main function of the data acquisition module is to collect different types of network data on the Internet and to perform dimension reduction;
The main function of the information extraction module is to extract text information, image information, video information and audio information from the input Internet data;
The main function of the text pre-processing module is to perform pre-processing such as word segmentation, feature extraction and weight calculation on the text information;
The main function of the image pre-processing module is to perform pre-processing such as image transformation, enhancement, edge detection, restoration and segmentation on the image information;
The main function of the video pre-processing module is to perform feature extraction on the video information, build a video library, and perform pre-processing such as multidimensional analysis on the video data;
The main function of the audio pre-processing module is to perform pre-processing such as front-end processing, feature extraction and recognition on the audio information.
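As an illustration of the text pre-processing module, the sketch below chains word segmentation, feature extraction and weight calculation. The patent does not specify the weighting scheme, so TF-IDF is an assumption here, and the regular-expression tokenizer is only a stand-in for a real segmenter (Chinese text in particular would need a dedicated one). The names `segment` and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter


def segment(text):
    """Split text into lowercase word tokens (a stand-in for a real segmenter)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [segment(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "big data classification",
    "big data pre-processing",
    "video feature extraction",
]
weights = tfidf(docs)
```

Terms that occur in fewer documents receive higher weights, so in the first document "classification" outweighs "big", which also appears in the second document.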
The modules are connected as follows:
The output of the data acquisition module is connected to the inputs of the information extraction module, the text pre-processing module, the image pre-processing module, the video pre-processing module and the audio pre-processing module, respectively; the output of the information extraction module is connected to the inputs of the text pre-processing module, the image pre-processing module, the video pre-processing module and the audio pre-processing module, respectively; the output of the text pre-processing module is connected to the input of the text analysis module in an external device; the output of the image pre-processing module is connected to the input of the image analysis module in the external device; the output of the video pre-processing module is connected to the input of the video analysis module in the external device; the output of the audio pre-processing module is connected to the input of the audio analysis module in the external device.
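The connection relationship can be pictured as a routing pipeline. The sketch below is a simplified assumption rather than the patented implementation: it models only the path through the information extraction module (not the direct links from the data acquisition module to each pre-processor), and the record format, the placeholder pre-processors and the `run_pipeline` function are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Item = Tuple[str, str]  # (media type, payload): a hypothetical record format


def extract(raw_items: List[Item]) -> Dict[str, List[str]]:
    """Information extraction: group payloads by media type, ignoring unknown types."""
    grouped: Dict[str, List[str]] = {"text": [], "image": [], "video": [], "audio": []}
    for kind, payload in raw_items:
        if kind in grouped:
            grouped[kind].append(payload)
    return grouped


# Placeholder pre-processors; each would hold the real per-type logic.
PREPROCESSORS: Dict[str, Callable[[str], str]] = {
    "text": lambda p: f"text-pre({p})",
    "image": lambda p: f"image-pre({p})",
    "video": lambda p: f"video-pre({p})",
    "audio": lambda p: f"audio-pre({p})",
}


def run_pipeline(raw_items: List[Item], analyzers: Dict[str, Callable[[str], None]]) -> None:
    """Route each extracted item through its pre-processor to its external analyzer."""
    for kind, payloads in extract(raw_items).items():
        for payload in payloads:
            analyzers[kind](PREPROCESSORS[kind](payload))


# Stand-ins for the analysis modules in the external device: they just record input.
received: Dict[str, List[str]] = {kind: [] for kind in PREPROCESSORS}
analyzers = {kind: received[kind].append for kind in PREPROCESSORS}
run_pipeline([("text", "page.html"), ("image", "logo.png"), ("pdf", "ignored")], analyzers)
```

Keeping the pre-processors in a dispatch table mirrors the one-to-one wiring between each pre-processing module and its analysis module.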
Beneficial effect
Compared with existing methods and systems, the method and system for the classification and pre-processing of big data under an Internet environment proposed by the present invention offer the following innovation: multiple categories of network data from the Internet are used to form a relatively complete basic dataset for pre-processing; the data are first simplified through operations such as dimension reduction; the different types of data in the dataset are then analyzed and pre-processed separately to obtain a dataset for classification. This prepares the data for subsequent classification.
Brief description of the drawings
Fig. 1 is the structural framework of the pre-processing system in the specific embodiment of the invention;
Detailed description of the invention
To further illustrate the purpose and advantages of the present invention, the invention is described below with reference to the accompanying drawing and a specific embodiment.
The method for the classification and pre-processing of big data under an Internet environment in this embodiment comprises the following operating steps:
Step one: data acquisition.
Different types of network data on the Internet are collected and subjected to dimension reduction.
Step two: pre-processing, which forms data that the system can process directly.
The pre-processing includes denoising.
The structural framework of the pre-processing system based on the above method is shown in Fig. 1. The system comprises: a data acquisition module, an information extraction module, a text pre-processing module, an image pre-processing module, a video pre-processing module and an audio pre-processing module.
The main function of the data acquisition module is to collect different types of network data on the Internet and to perform dimension reduction;
The main function of the information extraction module is to extract text information, image information, video information and audio information from the input Internet data;
The main function of the text pre-processing module is to perform pre-processing such as word segmentation, feature extraction and weight calculation on the text information;
The main function of the image pre-processing module is to perform pre-processing such as image transformation, enhancement, edge detection, restoration and segmentation on the image information;
The main function of the video pre-processing module is to perform feature extraction on the video information, build a video library, and perform pre-processing such as multidimensional analysis on the video data;
The main function of the audio pre-processing module is to perform pre-processing such as front-end processing, feature extraction and recognition on the audio information.
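The information extraction module listed above could, for web pages, be sketched with Python's standard `html.parser`. This is an assumption for illustration only: the patent does not state the input format, and the class name `MediaExtractor` and the sample page are hypothetical. The sketch collects visible text and the `src` URLs of image, video and audio elements so that each kind can be handed to its own pre-processing module.

```python
from html.parser import HTMLParser


class MediaExtractor(HTMLParser):
    """Collect visible text and the src URLs of image, video and audio tags."""

    def __init__(self):
        super().__init__()
        self.text, self.images, self.videos, self.audios = [], [], [], []
        self._media = None  # enclosing <video>/<audio> tag, if any
        self._skip = 0      # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in ("video", "audio"):
            self._media = tag
        if src is None:
            return
        if tag == "img":
            self.images.append(src)
        elif tag == "video" or (tag == "source" and self._media == "video"):
            self.videos.append(src)
        elif tag == "audio" or (tag == "source" and self._media == "audio"):
            self.audios.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in ("video", "audio"):
            self._media = None

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only runs between tags.
        if not self._skip and data.strip():
            self.text.append(data.strip())


page = """<html><body><p>Big data news</p><img src="a.png">
<video src="b.mp4"></video><audio><source src="c.mp3"></audio>
<script>var x = 1;</script></body></html>"""
extractor = MediaExtractor()
extractor.feed(page)
extractor.close()
```

After parsing, `extractor.text`, `extractor.images`, `extractor.videos` and `extractor.audios` hold the four kinds of information named in the module description.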
The specific description above further details the purpose, technical solution and beneficial effects of the invention. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (2)
1. A method for the classification and pre-processing of big data under an Internet environment, characterized in that it comprises the following operating steps:
Step one: data acquisition;
different types of network data on the Internet are collected and subjected to dimension reduction;
Step two: pre-processing, which forms data that the system can process directly; the pre-processing includes denoising.
2. A system for the classification and pre-processing of big data under an Internet environment, characterized in that it comprises: a data acquisition module, an information extraction module, a text pre-processing module, an image pre-processing module, a video pre-processing module and an audio pre-processing module;
The main function of the data acquisition module is to collect different types of network data on the Internet and to perform dimension reduction;
The main function of the information extraction module is to extract text information, image information, video information and audio information from the input Internet data;
The main function of the text pre-processing module is to perform pre-processing such as word segmentation, feature extraction and weight calculation on the text information;
The main function of the image pre-processing module is to perform pre-processing such as image transformation, enhancement, edge detection, restoration and segmentation on the image information;
The main function of the video pre-processing module is to perform feature extraction on the video information, build a video library, and perform pre-processing such as multidimensional analysis on the video data;
The main function of the audio pre-processing module is to perform pre-processing such as front-end processing, feature extraction and recognition on the audio information;
The modules are connected as follows:
The output of the data acquisition module is connected to the inputs of the information extraction module, the text pre-processing module, the image pre-processing module, the video pre-processing module and the audio pre-processing module, respectively; the output of the information extraction module is connected to the inputs of the text pre-processing module, the image pre-processing module, the video pre-processing module and the audio pre-processing module, respectively; the output of the text pre-processing module is connected to the input of the text analysis module in an external device; the output of the image pre-processing module is connected to the input of the image analysis module in the external device; the output of the video pre-processing module is connected to the input of the video analysis module in the external device; the output of the audio pre-processing module is connected to the input of the audio analysis module in the external device.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510988528 | 2015-12-25 | ||
CN2015109885289 | 2015-12-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106055557A (en) | 2016-10-26 |
Family
ID=57176211
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610308773.5A Pending CN106055557A (en) | 2015-12-25 | 2016-05-12 | Method and system for classification and pre-processing of big data under Internet environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106055557A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112158692A (en) * | 2020-09-09 | 2021-01-01 | 北京明略昭辉科技有限公司 | Method and device for acquiring flow of target object in elevator |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1588879A (en) * | 2004-08-12 | 2005-03-02 | 复旦大学 | Internet content filtering system and method |
CN101937445A (en) * | 2010-05-24 | 2011-01-05 | 中国科学技术信息研究所 | Automatic file classification system |
CN104376406A (en) * | 2014-11-05 | 2015-02-25 | 上海计算机软件技术开发中心 | Enterprise innovation resource management and analysis system and method based on big data |
CN104731852A (en) * | 2014-12-16 | 2015-06-24 | 芜湖乐锐思信息咨询有限公司 | Big data system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20161026 |