CN103577160A - Characteristic extraction parallel-processing method for big data - Google Patents


Info

Publication number
CN103577160A
Authority
CN
China
Prior art keywords
characteristic
data
parallel
feature extraction
parallel processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310487250.8A
Other languages
Chinese (zh)
Inventor
刘镇
焦弘杰
吕超
钱萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-10-17
Filing date
2013-10-17
Publication date
2014-02-12
Application filed by Jiangsu University of Science and Technology
Priority to CN201310487250.8A
Publication of CN103577160A

Abstract

The invention discloses a characteristic extraction parallel-processing method for big data. Based on CUDA (Compute Unified Device Architecture), the method uses the parallel computing capacity of a GPU (Graphics Processing Unit) to process big data. When the big data is processed, the data is handled by concurrently executing threads using a parallelizable matrix array processing method, which greatly increases the speed of characteristic extraction. In the parallel matrix array processing method, each character of the task data is matched in turn against each character of the characteristic data to form a '01' array, and the '01' array is then processed in parallel according to the length of the characteristic data to obtain the correct matching result. The method exploits the properties of the matrix array, parallelizes well, allows the data processing to be fully parallelized, and is particularly suitable for rapid characteristic extraction from big data.

Description

A feature extraction parallel-processing method for big data
Technical field
The invention belongs to the technical field of big data processing and relates to a feature extraction method, and more specifically to a feature extraction parallel-processing method for big data.
Technical background
With the arrival of the big data era, how to process big data quickly and extract useful information from it has become a frontier research topic in the IT industry. "Big data" refers to a data set that is very large in volume, diverse in data type, and demanding in processing speed; the content of such a data set cannot be extracted and managed with traditional database tools.
According to a search of existing patent documents, current methods for processing big data mainly include increasing the number of CPU cores, building distributed cluster systems, and optimizing parallel algorithms. However, these methods rely solely on the computing power of the CPU and are constrained by the limited number of CPU cores and the relatively high cost of building a distributed cluster system, so the methods and capabilities for processing big data still await further innovation and improvement.
At present, feature extraction technology is used more and more widely in image processing, pattern recognition, network intrusion detection and other fields. Under a big data environment in particular, the efficiency of feature extraction has become the bottleneck that restricts the ability to process data quickly.
Summary of the invention
The object of the invention is to address the current situation in which, under a big data environment, a traditional computer mainly relies on the CPU to perform feature extraction on data serially, and to propose a feature extraction parallel-processing method for big data that makes feature extraction faster and gives the computer greater processing capacity.
To achieve this object, the technical solution of the invention is a feature extraction parallel-processing method for big data. When big data is processed within the range that the hardware allows, the method builds, from the task data to be processed and the feature data (the characteristic pattern to be matched), a matrix array that can be operated on in parallel; by processing the array in parallel, multiple threads concurrently perform feature matching on the data, the data that match the feature are extracted, and the number of successful extractions is counted.
In the above technical solution, the parallel-processing method adopted by the invention is based on the CUDA architecture and is implemented using the parallel computing capability of the GPU.
The task data needs to be transferred from the CPU to the storage unit of the GPU so that the GPU can perform the parallel computation.
For the above parallel computation under a big data environment, the speed at which feature extraction is performed in real time on the data in the buffer is greater than or equal to the transmission rate of the data stream, and the concurrency width of feature extraction is adjusted adaptively according to the transmission rate of the data stream, so that the concurrency of dynamic data stream processing remains controllable.
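The text does not say how the concurrency width is derived from the stream rate. One plausible reading, offered only as a sketch, is to pick the smallest number of concurrent workers whose combined throughput is at least the stream rate. The function name and all three parameters below (`stream_rate`, `per_worker_rate`, `max_width`) are assumptions made for illustration and do not appear in the patent.

```python
import math

def concurrency_width(stream_rate, per_worker_rate, max_width):
    """Smallest number of workers whose combined rate covers the stream rate.

    All parameters are hypothetical: both rates use the same unit
    (for example bytes per second) and max_width is the hardware limit.
    """
    if per_worker_rate <= 0 or max_width < 1:
        raise ValueError("per_worker_rate must be positive and max_width >= 1")
    needed = math.ceil(stream_rate / per_worker_rate)
    # Keep at least one worker and never exceed what the hardware allows.
    return max(1, min(needed, max_width))

print(concurrency_width(1000, 300, 16))  # prints 4
```

If the result is clipped to `max_width`, the extraction rate may fall below the stream rate; the source does not describe how that case is handled.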
In the above feature extraction parallel-processing method, taking the hardware characteristics of the GPU into account and staying within its processing capability, the matching algorithm processes the data with a parallelizable matrix array in the following two steps, both of which are executed in parallel.
Step 1: each character of the task data is matched in turn, in parallel, against each character of the feature data to form a valid matrix array.
Step 2: according to the length of the feature data, the valid array is processed in parallel to obtain the correct matching result, that is, the number of successful feature matches.
In the above extraction process, to reduce the number of times the feature data is repeatedly read while the program runs and thus further increase computing speed, the feature data key is stored in constant memory; the feature data needs to be transferred from the CPU to the constant memory of the GPU. Access to constant memory is restricted to read-only. After the feature data has been read for the first time from an address in constant memory, other threads requesting the same address read the feature data directly from the cache, which saves time.
In the above step of matching each character of the task data with each character of the feature data to form a valid matrix array, a "01" matrix array of size KEYLEN*STRLEN is formed according to the task data length STRLEN and the feature data length KEYLEN: the i-th row of the matrix array is obtained by comparing the i-th character of the feature data with each character of the task data, with "1" recorded where the characters are identical and "0" where they differ.
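The construction of the "01" matrix can be sketched on the CPU as follows. This is a sequential Python illustration of the logic only, not the CUDA kernel the patent describes; the function name `build_match_matrix` and the sample strings are assumptions made for the example. Row i corresponds to the i-th character of the feature data and column j to the j-th character of the task data.

```python
def build_match_matrix(task, key):
    """Return the KEYLEN x STRLEN "01" matrix: row i marks where key[i] occurs in task."""
    return [[1 if t == k else 0 for t in task] for k in key]

for row in build_match_matrix("abcab", "ab"):
    print(row)
# prints [1, 0, 0, 1, 0]
# prints [0, 1, 0, 0, 1]
```

Every cell depends only on one character of each input, which is why the cells can be computed independently of one another.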
According to the feature data length KEYLEN, the valid array is processed in parallel as follows: the (STRLEN-KEYLEN+1) small arrays of size KEYLEN*KEYLEN are processed in parallel, and each is tested for whether the values on its diagonal are all "1". The first value on the diagonal of the small array is tested first. If it is not "1" (that is, it is "0"), the remaining values need not be tested and processing moves directly to the next small array. If it is "1", the next value on the diagonal is tested, and so on; when all values on the diagonal are "1", one feature extraction has succeeded and one successful match is recorded.
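The diagonal test can be sketched as a loop over the STRLEN-KEYLEN+1 windows of a "01" matrix. This is a sequential Python illustration, assuming that window p covers columns p to p+KEYLEN-1 and that the matrix is laid out with one row per feature character; the function name and the sample matrix are hypothetical. On a GPU each window would typically be handled by its own thread.

```python
def count_diagonal_matches(matrix):
    """Count KEYLEN x KEYLEN windows of a "01" matrix whose main diagonal is all 1."""
    keylen = len(matrix)
    if keylen == 0:
        return 0
    strlen = len(matrix[0])
    total = 0
    # There are STRLEN-KEYLEN+1 windows; window p covers columns p .. p+KEYLEN-1.
    for p in range(strlen - keylen + 1):
        for i in range(keylen):
            if matrix[i][p + i] != 1:
                break  # first 0 on the diagonal: skip to the next window
        else:
            total += 1  # the loop never broke, so the whole diagonal is 1
    return total

# Matrix for task data "abcab" and feature data "ab" (rows = feature characters).
example = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
]
print(count_diagonal_matches(example))  # prints 2
```

The early `break` mirrors the rule that a "0" on the diagonal ends the test for that small array immediately.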
Description of the drawings
Figure 1 is a flow chart of the feature extraction algorithm for a big data environment in the invention.
Figure 2 is a flow chart of an embodiment of the feature extraction algorithm for a big data environment in the invention.
Figure 3 is a schematic diagram of matching the characters of the task data against the feature data in the invention.
Figure 4 is a schematic diagram of processing the "01" matrix array in parallel by dividing it into small arrays in the invention.
Figure 5 is a flow chart of the algorithm for processing the matrix array in parallel in the invention.
Embodiment
The content of the invention is further described below with reference to the drawings.
1. The overall flow of the feature extraction parallel-processing method for big data in the invention is as follows: when big data is processed, within the processing capability of the hardware, a matrix array that can be operated on in parallel is built from the task data to be processed and the feature data; by processing the array in parallel, multiple threads concurrently perform feature matching on the data, the data that match the feature are extracted, and the number of correct feature extractions is counted (see Figure 1).
2. The known feature data and the task data are each transferred from the CPU to the storage space of the GPU; the feature data is stored in constant memory (Constant Memory) and the task data is stored in global memory (Global Memory) (see Figure 2).
3. The GPU kernel function (kernel) under the CUDA architecture is called to perform the parallel computation. The detailed process is as follows:
(1) Each character of the task data is matched in turn, in parallel, against each character of the feature data to form a valid matrix array. According to the task data length STRLEN and the feature data length KEYLEN, a "01" matrix array of size KEYLEN*STRLEN is formed: the i-th row of the matrix array is obtained by comparing the i-th character of the feature data with each character of the task data, with "1" recorded where the characters are identical and "0" where they differ (see Figure 3).
(2) According to the feature data length KEYLEN, the (STRLEN-KEYLEN+1) small arrays of size KEYLEN*KEYLEN are processed in parallel (see Figure 4), and each is tested for whether the values on its diagonal are all "1". The test proceeds as follows (see Figure 5):
① Take the first value on the diagonal of the small array.
② Test whether this value is "1". If it is "1", go to step ③; otherwise go to step ⑥.
③ Test whether this value is the last one on the diagonal of the small array. If it is, go to step ⑤; otherwise go to step ④.
④ Take the next value on the diagonal and go to step ②.
⑤ The match succeeds, and the counting variable sum is incremented by 1.
⑥ Move on to the next small array.
Depending on the characteristics of the compiler, the above steps can be carried out with a single conditional statement, which saves time and improves the speed-up ratio. For example, if the feature data has 3 characters and the three diagonal values are denoted a[0], a[1] and a[2], it is only necessary to test the condition ((a[0]==1) && (a[1]==1) && (a[2]==1)) to determine whether the match succeeds.
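The matrix, the diagonal test and the final count can be written compactly as follows. This is a restatement under the assumption of 0-based indices (as in the a[0], a[1], a[2] example); the symbols M, match and p are introduced here for illustration and are not the patent's notation.

```latex
\[
M_{i,j} =
\begin{cases}
1, & \text{if feature character } i \text{ equals task character } j,\\
0, & \text{otherwise,}
\end{cases}
\qquad 0 \le i < \mathrm{KEYLEN},\quad 0 \le j < \mathrm{STRLEN}
\]
\[
\mathrm{match}(p) = \prod_{i=0}^{\mathrm{KEYLEN}-1} M_{i,\,p+i},
\qquad 0 \le p \le \mathrm{STRLEN}-\mathrm{KEYLEN}
\]
\[
\mathrm{sum} = \sum_{p=0}^{\mathrm{STRLEN}-\mathrm{KEYLEN}} \mathrm{match}(p)
\]
```

For KEYLEN = 3 and a fixed p, setting a[i] = M_{i,p+i} gives match(p) = a[0]·a[1]·a[2], which equals 1 exactly when the single condition in the example above holds.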
4. The threads in the kernel are synchronized to ensure that all parallel computation on the GPU has completed, and the result computed by the GPU is then transferred back to host memory (see Figure 2).
5. The memory allocated on the GPU for the task data and the feature data is released, and the result of the feature extraction is displayed on the host (see Figure 2).
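The flow of the embodiment (one unit of work per small array, a synchronization point, then collection of the result) can be imitated on the CPU with a thread pool. This is only an analogy in Python, not the CUDA implementation: the names `window_matches` and `extract_features` and the `workers` parameter are hypothetical, the characters are compared directly (equivalent to reading the diagonal entries of the "01" matrix), and Python threads illustrate the decomposition rather than the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def window_matches(task, key, p):
    """Check one window: all() stops at the first mismatch, like a chained && test."""
    return all(task[p + i] == key[i] for i in range(len(key)))

def extract_features(task, key, workers=4):
    """Return (count, offsets) of the windows of task that match key."""
    if not key or len(key) > len(task):
        return 0, []
    offsets = range(len(task) - len(key) + 1)  # STRLEN-KEYLEN+1 windows
    # Leaving the with-block waits for every worker, standing in for the
    # synchronization that precedes copying results back to the host.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(lambda p: window_matches(task, key, p), offsets))
    hits = [p for p, ok in zip(offsets, flags) if ok]
    return len(hits), hits

print(extract_features("abcabcab", "abc"))  # prints (2, [0, 3])
```

Returning the matching offsets alongside the count reflects the two outputs named in the text: the data that match the feature and the number of successful extractions.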

Claims (8)

1. A feature extraction parallel-processing method for big data, characterized in that, within the range of the hardware's processing capability, the method comprises the following steps:
Step 1: allocate storage space on the GPU for the task data and the feature data;
Step 2: when processing big data, build in parallel, from the task data to be processed and the feature data, a matrix array with good parallelism;
Step 3: by processing the matrix array in parallel, perform feature matching on the data concurrently with multiple threads;
Step 4: extract the data that match the feature, and count the number of successful extractions.
2. The feature extraction parallel-processing method for big data according to claim 1, characterized in that the method of processing the matrix array in parallel is based on the CUDA architecture and is implemented using the parallel computing capability of the GPU.
3. The feature extraction parallel-processing method for big data according to claim 1, characterized in that the task data needs to be transferred from the CPU to the storage unit of the GPU so that the GPU can perform the parallel computation.
4. The feature extraction parallel-processing method for big data according to claim 1, characterized in that, in extracting the data that match the feature under a big data environment, the speed at which feature extraction is performed in real time on the data in the buffer is greater than or equal to the transmission rate of the data stream, and the concurrency width of feature extraction is adjusted adaptively according to the transmission rate of the data stream, so that the concurrency of dynamic data stream processing remains controllable.
5. The feature extraction parallel-processing method for big data according to claim 1, characterized in that, in performing feature matching on the data concurrently with multiple threads, taking the hardware characteristics of the GPU into account and staying within its processing capability, the matching algorithm processes the data with a parallelizable matrix array in the following two steps, both of which are executed in parallel:
Step 1: each character of the task data is matched in turn, in parallel, against each character of the feature data to form a valid matrix array;
Step 2: according to the length of the feature data, the valid array is processed in parallel to obtain the correct matching result, that is, the number of successful feature matches.
6. The feature extraction parallel-processing method for big data according to claims 1 and 5, characterized in that the feature data needs to be transferred from the CPU to the constant memory of the GPU, and the feature data key is stored in constant memory; access to constant memory is restricted to read-only, and after the feature data has been read for the first time from an address in constant memory, other threads requesting the same address read the feature data directly from the cache.
7. The feature extraction parallel-processing method for big data according to claim 5, characterized in that the parallel matching of step 1 is as follows: according to the task data length STRLEN and the feature data length KEYLEN, each character of the task data is matched in turn, in parallel, against each character of the feature data to form a "01" matrix array of size KEYLEN*STRLEN; the i-th row of the matrix array is obtained by comparing the i-th character of the feature data with each character of the task data, with "1" recorded where the characters are identical and "0" where they differ.
8. The feature extraction parallel-processing method for big data according to claim 5, characterized in that, in the parallel processing according to the feature data length KEYLEN, the (STRLEN-KEYLEN+1) small arrays of size KEYLEN*KEYLEN are processed in parallel and each is tested for whether the values on its diagonal are all "1"; the first value on the diagonal of the small array is tested first; if it is not "1" (that is, it is "0"), the remaining values need not be tested and processing moves directly to the next small array; if it is "1", the next value on the diagonal is tested, and so on, and when all values on the diagonal are "1", one feature extraction has succeeded and one successful match is recorded.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310487250.8A CN103577160A (en) 2013-10-17 2013-10-17 Characteristic extraction parallel-processing method for big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310487250.8A CN103577160A (en) 2013-10-17 2013-10-17 Characteristic extraction parallel-processing method for big data

Publications (1)

Publication Number Publication Date
CN103577160A true CN103577160A (en) 2014-02-12

Family

ID=50049017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310487250.8A Pending CN103577160A (en) 2013-10-17 2013-10-17 Characteristic extraction parallel-processing method for big data

Country Status (1)

Country Link
CN (1) CN103577160A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100175538A1 (en) * 2009-01-15 2010-07-15 Ryoichi Yagi Rhythm matching parallel processing apparatus in music synchronization system of motion capture data and computer program thereof
CN103324698A (en) * 2013-06-08 2013-09-25 北京航空航天大学 Large-scale humming melody matching system based on data level paralleling and graphic processing unit (GPU) acceleration
CN103345382A (en) * 2013-07-15 2013-10-09 郑州师范学院 CPU+GPU group nuclear supercomputer system and SIFT feature matching parallel computing method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
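Vincent Garcia et al.: "K-nearest neighbor search: fast GPU-based implementations and application to high-dimensional feature matching", Proceedings of 2010 IEEE 17th International Conference on Image Processing *
Li Jianjiang et al.: "Parallel algorithm for grayscale image matching under the CUDA architecture", Journal of University of Electronic Science and Technology of China *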

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106918300A (en) * 2017-01-11 2017-07-04 江苏科技大学 A kind of large-sized object three-dimensional Measured data connection method based on many three-dimensional tracking devices
CN106918300B (en) * 2017-01-11 2019-04-05 江苏科技大学 A kind of large-sized object three-dimensional Measured data connection method based on more three-dimensional tracking devices
CN106874510A (en) * 2017-03-01 2017-06-20 深圳市博信诺达经贸咨询有限公司 It is applied to the statistical method and system of big data
CN109033203A (en) * 2018-06-29 2018-12-18 大连交通大学 A kind of feature extraction method for parallel processing towards big data
CN110414534A (en) * 2019-07-01 2019-11-05 深圳前海达闼云端智能科技有限公司 Image feature extraction method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140212