CN103577160A - Characteristic extraction parallel-processing method for big data - Google Patents
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a characteristic extraction parallel-processing method for big data. The method is based on CUDA (Compute Unified Device Architecture) and uses the parallel computing capability of a GPU (Graphics Processing Unit) to process big data. During processing, a parallelizable matrix array method is applied to the data with many concurrently executing threads, which greatly increases the speed of characteristic extraction. In this method, each character of the task data is matched in turn against each character of the characteristic data to form a "01" array, and the "01" array is then processed in parallel according to the length of the characteristic data to obtain the correct matching result. The method exploits the properties of the matrix array, parallelizes well, and is particularly suitable for fast characteristic extraction from big data.
Description
Technical field
The invention belongs to the technical field of big data processing and relates to a feature extraction method, and more specifically to a parallel feature extraction method for big data.
Technical background
With the arrival of the big data era, how to process big data quickly and extract useful information from it has become a frontier research topic in the IT industry. "Big data" refers to a data set that is very large in volume, spans many data categories, and requires sufficiently fast processing; the content of such a data set cannot be extracted and managed with traditional database tools.
According to a search of existing patent literature, current approaches to processing big data mainly include increasing the number of CPU cores, building distributed cluster systems, and optimizing parallel algorithms. However, these approaches all rely solely on the computing capability of the CPU, the number of CPU cores is limited, and distributed cluster systems are comparatively expensive to build, so the methods and capabilities for processing big data still need further innovation and improvement.
At present, feature extraction is used more and more widely in image processing, pattern recognition, network intrusion detection, and similar areas. In big data environments in particular, the efficiency of feature extraction has become a bottleneck that limits how quickly data can be processed.
Summary of the invention
In a big data environment, conventional computers mainly rely on the CPU to perform feature extraction on data serially. In view of this, the object of the invention is to provide a parallel feature extraction method for big data that makes feature extraction faster and gives the computer greater processing capability.
To achieve this object, the technical solution of the invention is a parallel feature extraction method for big data. When big data is processed within the range the hardware permits, the method builds, from the task data to be processed and the characteristic data, a matrix array that can be operated on in parallel. By processing the array in parallel, multiple threads concurrently perform feature matching on the data, extract the data that match the feature, and count the number of successful extractions.
In this technical solution, the parallel processing is based on the CUDA architecture and is implemented using the parallel computing capability of the GPU.
The task data must be transferred from the CPU to the memory of the GPU so that the GPU can perform the parallel computation.
For this parallel computation in a big data environment, the rate at which features are extracted in real time from the data in the buffer is greater than or equal to the transmission rate of the data stream, and the concurrency width of feature extraction is adjusted adaptively according to the transmission rate of the data stream, so that the concurrency of dynamic data stream processing remains controllable.
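The patent does not give a rule for adjusting the concurrency width. As one possible reading, the sketch below picks the smallest number of workers whose combined throughput is at least the stream rate, capped by what the hardware allows. The function name `choose_concurrency_width`, its parameters, and the assumption that throughput scales linearly with the number of workers are all illustrative assumptions, not part of the source.

```python
import math


def choose_concurrency_width(stream_rate, per_worker_rate, max_width):
    """Pick a worker count so that extraction keeps up with the stream.

    stream_rate     -- incoming data rate (e.g. bytes per second)
    per_worker_rate -- assumed throughput of one worker, same unit
    max_width       -- upper bound imposed by the hardware

    Assumes throughput grows linearly with the number of workers,
    which is a simplification made for this sketch.
    """
    if per_worker_rate <= 0 or max_width < 1:
        raise ValueError("per_worker_rate must be > 0 and max_width >= 1")
    needed = math.ceil(stream_rate / per_worker_rate)
    # Never go below one worker or above the hardware limit.
    return max(1, min(max_width, needed))


print(choose_concurrency_width(1050, 100, 64))  # 11
```

When the required width exceeds `max_width`, the extraction rate can no longer match the stream rate, so a real system would need buffering or back-pressure in addition to this calculation.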
In the above method, taking into account the hardware characteristics of the GPU and staying within its processing capability, the matching algorithm processes the data with a parallelizable matrix array in the following two steps, both of which are executed in parallel.
Step 1: Match each character of the task data against each character of the characteristic data in parallel to form a valid matrix array.
Step 2: According to the length of the characteristic data, process the valid array in parallel to obtain the correct matching result, that is, the number of successful feature matches.
During feature extraction, to reduce the number of times the characteristic data is read while the program runs and so further increase computation speed, the characteristic data key is stored in constant memory; the characteristic data must therefore be transferred from the CPU to the constant memory of the GPU. Constant memory is read-only. After characteristic data has been read from an address in constant memory for the first time, other threads that request the same address read the characteristic data directly from the cache, which saves time.
The valid matrix array is formed as follows. Given the task data length STRLEN and the characteristic data length KEYLEN, each character of the task data is matched in parallel against each character of the characteristic data to form a "01" matrix array of size KEYLEN*STRLEN. Row i of the matrix array holds the result of comparing the task data with the i-th character of the characteristic data: identical characters are recorded as "1" and different characters as "0".
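The construction of the "01" matrix array can be illustrated with a minimal CPU-side sketch. It is plain Python rather than the CUDA kernel the invention relies on, and the function name `build_match_matrix` and the sample strings are illustrative assumptions. Each entry depends only on one character of the characteristic data and one character of the task data, which is why the step parallelizes well.

```python
def build_match_matrix(task, key):
    """Return a KEYLEN x STRLEN list of lists of 0/1.

    Entry [i][j] is 1 when the i-th character of the characteristic
    data (key) equals the j-th character of the task data, else 0.
    """
    return [[1 if k == t else 0 for t in task] for k in key]


# Sample data chosen for illustration only.
matrix = build_match_matrix("abcab", "ab")
for row in matrix:
    print(row)
# [1, 0, 0, 1, 0]
# [0, 1, 0, 0, 1]
```

On a GPU, each of the KEYLEN*STRLEN comparisons could be assigned to its own thread; the nested comprehension here only mirrors that independence sequentially.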
Given the characteristic data length KEYLEN, the valid array is processed in parallel as follows. The (STRLEN-KEYLEN+1) small arrays of size KEYLEN*KEYLEN are processed in parallel, and each is checked to determine whether the values on its diagonal are all "1". First, the first value on the diagonal of the small array is checked. If it is not "1" (that is, it is "0"), the following values need not be checked, and processing moves directly to the next small array. If it is "1", the next value on the diagonal is checked in the same way. When all values on the diagonal are "1", one feature extraction has succeeded and one successful match is recorded.
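The diagonal check can be sketched sequentially in Python as follows. The function name `count_diagonal_matches` is an illustrative assumption, and the sample matrix is the "01" array for task data "abcab" and characteristic data "ab". The small array starting at column s covers columns s to s+KEYLEN-1, so its diagonal consists of the entries [i][s+i].

```python
def count_diagonal_matches(matrix):
    """Count the KEYLEN x KEYLEN windows whose diagonal is all 1s.

    matrix is a KEYLEN x STRLEN list of lists of 0/1.
    """
    keylen = len(matrix)
    if keylen == 0:
        return 0
    strlen = len(matrix[0])
    count = 0
    # One window per start column; on a GPU each window could be
    # handled by a separate thread.
    for start in range(strlen - keylen + 1):
        for i in range(keylen):
            if matrix[i][start + i] != 1:
                break  # a "0" on the diagonal: skip to the next window
        else:
            count += 1  # the loop finished without break: full match
    return count


sample = [[1, 0, 0, 1, 0],
          [0, 1, 0, 0, 1]]
print(count_diagonal_matches(sample))  # 2
```

The two matches correspond to the occurrences of "ab" at offsets 0 and 3 of the task data.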
Brief description of the accompanying drawings
Accompanying drawing 1 is a flowchart of the feature extraction algorithm for a big data environment according to the invention.
Accompanying drawing 2 is a flowchart of an embodiment of the feature extraction algorithm for a big data environment according to the invention.
Accompanying drawing 3 is a schematic diagram of matching the characters of the task data against the characteristic data according to the invention.
Accompanying drawing 4 is a schematic diagram of processing the "01" matrix array in parallel by dividing it into small arrays according to the invention.
Accompanying drawing 5 is a flowchart of the algorithm for processing the matrix array in parallel according to the invention.
Embodiment
The content of the invention is described in further detail below with reference to the accompanying drawings.
1. The overall flow of the parallel feature extraction method for big data is as follows: when big data is processed within the processing capability of the hardware, a matrix array that can be operated on in parallel is built from the task data to be processed and the characteristic data; by processing the array in parallel, multiple threads concurrently perform feature matching on the data, extract the data that match the feature, and count the number of correct feature extractions (see accompanying drawing 1).
2. The known characteristic data and the task data are each transferred from the CPU to the memory space of the GPU: the characteristic data is stored in constant memory (Constant Memory) and the task data is stored in global memory (Global Memory) (see accompanying drawing 2).
3. The GPU kernel function (kernel) is called under the CUDA architecture to perform the parallel computation. The detailed process is as follows:
(1) Each character of the task data is matched in parallel against each character of the characteristic data to form a valid matrix array. Given the task data length STRLEN and the characteristic data length KEYLEN, the matching produces a "01" matrix array of size KEYLEN*STRLEN. Row i of the matrix array holds the result of comparing the task data with the i-th character of the characteristic data: identical characters are recorded as "1" and different characters as "0" (see accompanying drawing 3).
(2) According to the characteristic data length KEYLEN, the (STRLEN-KEYLEN+1) small arrays of size KEYLEN*KEYLEN are processed in parallel (see accompanying drawing 4), and each is checked to determine whether the values on its diagonal are all "1". The check proceeds as follows (see accompanying drawing 5):
① Extract the first value on the diagonal of the small array.
② Check whether this value is "1". If it is "1", go to step ③; otherwise go to step ⑥.
③ Check whether this position is the last one on the diagonal of the small array. If it is, go to step ⑤; otherwise go to step ④.
④ Extract the next value on the diagonal and go to step ②.
⑤ The match succeeds; add 1 to the counting variable sum.
⑥ Move on to the next small array.
Depending on the behaviour of the compiler, the above steps can be carried out with a single conditional statement, which saves time and improves the speedup. For example, if the characteristic data has 3 characters and the three corresponding diagonal values are denoted a[0], a[1], a[2], it is only necessary to test the condition ((a[0]==1) && (a[1]==1) && (a[2]==1)) to determine whether the match succeeds.
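The single-condition form can be mirrored in Python, whose `and` operator short-circuits in the same way as `&&` in C: evaluation stops at the first value that is not 1, which reproduces the early exit of the step-by-step check. The function names below are illustrative assumptions; the second function generalizes the three-value condition to any diagonal length using the built-in `all`, which also stops at the first failing element.

```python
def diagonal_matches_3(a):
    """Three-value case from the text: a[0], a[1], a[2] must all be 1."""
    return a[0] == 1 and a[1] == 1 and a[2] == 1


def diagonal_matches(diagonal):
    """Same test for a diagonal of any length; all() short-circuits."""
    return all(value == 1 for value in diagonal)


print(diagonal_matches_3([1, 1, 1]))  # True
print(diagonal_matches_3([1, 0, 1]))  # False
print(diagonal_matches([1, 1, 1, 1]))  # True
```

Whether the single expression is actually faster than an explicit loop on a GPU depends on the compiler and on branch divergence between threads, so the benefit stated in the text should be measured rather than assumed.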
4. The threads in the kernel are synchronized; once all parallel computation on the GPU is guaranteed to have completed, the result computed by the GPU is transferred back to host memory (see accompanying drawing 2).
5. The memory allocated on the GPU for the task data and the characteristic data is released, and the result of the feature extraction is displayed on the host (see accompanying drawing 2).
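The overall flow of steps 1 to 5 can be imitated on the CPU with a thread pool standing in for the GPU threads. This is only a structural sketch under stated assumptions: it uses Python's standard `concurrent.futures` module instead of CUDA, it has no counterpart to constant or global memory, and the names `match_row`, `window_matches`, and `count_feature_matches` are hypothetical. In CPython the threads do not run the comparisons truly in parallel, so the sketch shows how the work is divided, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def match_row(args):
    """Build one row of the "01" matrix for one characteristic character."""
    key_char, task = args
    return [1 if key_char == t else 0 for t in task]


def window_matches(args):
    """Return 1 if the diagonal of the window starting at `start` is all 1s."""
    matrix, start = args
    return int(all(matrix[i][start + i] == 1 for i in range(len(matrix))))


def count_feature_matches(task, key, workers=4):
    """Count occurrences of key in task via the "01" matrix method."""
    if not key or len(key) > len(task):
        return 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: one unit of work per row of the matrix.
        matrix = list(pool.map(match_row, [(k, task) for k in key]))
        # Stage 2: one unit of work per KEYLEN x KEYLEN window.
        starts = range(len(task) - len(key) + 1)
        hits = pool.map(window_matches, [(matrix, s) for s in starts])
        # Leaving the `with` block waits for all workers, loosely
        # analogous to synchronizing before copying the result back.
        return sum(hits)


print(count_feature_matches("abcab", "ab"))  # 2
```

Overlapping occurrences are counted separately, because every start column gets its own window; for example, "aa" occurs three times in "aaaa" under this scheme.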
Claims (8)
1. A parallel feature extraction method for big data, characterized in that, within the processing capability of the hardware, the method comprises the following steps:
Step 1: allocate memory on the GPU for the task data and the characteristic data;
Step 2: when processing big data, build in parallel, from the task data to be processed and the characteristic data, a matrix array with good parallelism;
Step 3: by processing the matrix array in parallel, perform feature matching on the data with multiple concurrent threads;
Step 4: extract the data that match the feature, and count the number of successful extractions.
2. The parallel feature extraction method for big data according to claim 1, characterized in that the parallel processing of the matrix array is based on the CUDA architecture and is implemented using the parallel computing capability of the GPU.
3. The parallel feature extraction method for big data according to claim 1, characterized in that the task data must be transferred from the CPU to the memory of the GPU so that the GPU can perform the parallel computation.
4. The parallel feature extraction method for big data according to claim 1, characterized in that, in extracting the data that match the feature in a big data environment, the rate at which features are extracted in real time from the data in the buffer is greater than or equal to the transmission rate of the data stream, and the concurrency width of feature extraction is adjusted adaptively according to the transmission rate of the data stream, so that the concurrency of dynamic data stream processing remains controllable.
5. The parallel feature extraction method for big data according to claim 1, characterized in that, in performing feature matching on the data with multiple concurrent threads, taking into account the hardware characteristics of the GPU and staying within its processing capability, the matching algorithm processes the data with a parallelizable matrix array in the following two steps, both of which are executed in parallel:
Step 1: match each character of the task data against each character of the characteristic data in parallel to form a valid matrix array;
Step 2: according to the length of the characteristic data, process the valid array in parallel to obtain the correct matching result, that is, the number of successful feature matches.
6. The parallel feature extraction method for big data according to claims 1 and 5, characterized in that the characteristic data must be transferred from the CPU to the constant memory of the GPU, and the characteristic data key is stored in constant memory; constant memory is read-only, and after characteristic data has been read from an address in constant memory for the first time, other threads that request the same address read the characteristic data directly from the cache.
7. The parallel feature extraction method for big data according to claim 5, characterized in that the parallel matching of step 1 is as follows: given the task data length STRLEN and the characteristic data length KEYLEN, each character of the task data is matched in parallel against each character of the characteristic data to form a "01" matrix array of size KEYLEN*STRLEN; row i of the matrix array holds the result of comparing the task data with the i-th character of the characteristic data, identical characters being recorded as "1" and different characters as "0".
8. The parallel feature extraction method for big data according to claim 5, characterized in that, in the parallel processing according to the characteristic data length KEYLEN, the (STRLEN-KEYLEN+1) small arrays of size KEYLEN*KEYLEN are processed in parallel and each is checked to determine whether the values on its diagonal are all "1": first, the first value on the diagonal of the small array is checked; if it is not "1" (that is, it is "0"), the following values need not be checked and processing moves directly to the next small array; if it is "1", the next value on the diagonal is checked in the same way; when all values on the diagonal are "1", one feature extraction has succeeded and one successful match is recorded.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310487250.8A CN103577160A (en) | 2013-10-17 | 2013-10-17 | Characteristic extraction parallel-processing method for big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103577160A true CN103577160A (en) | 2014-02-12 |
Family
ID=50049017
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310487250.8A Pending CN103577160A (en) | 2013-10-17 | 2013-10-17 | Characteristic extraction parallel-processing method for big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103577160A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106918300A (en) * | 2017-01-11 | 2017-07-04 | 江苏科技大学 | A kind of large-sized object three-dimensional Measured data connection method based on many three-dimensional tracking devices |
CN106918300B (en) * | 2017-01-11 | 2019-04-05 | 江苏科技大学 | A kind of large-sized object three-dimensional Measured data connection method based on more three-dimensional tracking devices |
CN106874510A (en) * | 2017-03-01 | 2017-06-20 | 深圳市博信诺达经贸咨询有限公司 | It is applied to the statistical method and system of big data |
CN109033203A (en) * | 2018-06-29 | 2018-12-18 | 大连交通大学 | A kind of feature extraction method for parallel processing towards big data |
CN110414534A (en) * | 2019-07-01 | 2019-11-05 | 深圳前海达闼云端智能科技有限公司 | Image feature extraction method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20140212 |