CN110188320A - Second-order blind source separation parallel optimization method and system based on a multi-core platform - Google Patents


Info

Publication number
CN110188320A
Authority
CN
China
Prior art keywords
matrix
blind source
core platform
order blind
source separating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910329707.XA
Other languages
Chinese (zh)
Inventor
刘卫国 (Liu Weiguo)
刘美洋 (Liu Meiyang)
殷泽坤 (Yin Zekun)
徐晓明 (Xu Xiaoming)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201910329707.XA priority Critical patent/CN110188320A/en
Publication of CN110188320A publication Critical patent/CN110188320A/en
Pending legal-status Critical Current


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 — Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 — Complex mathematical operations
    • G06F 17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The invention discloses a second-order blind source separation parallel optimization method and system based on a multi-core platform, comprising the following steps: receiving environment-variable parameters and setting CPU thread-core affinity; receiving the signal to be processed and performing multi-threaded parallel preprocessing on it; merging multiple parallelizable computation regions and performing joint approximate diagonalization; and outputting the separation matrix and the source matrix. By exploiting the characteristics of a multi-core platform, the invention greatly accelerates the processing speed of second-order blind source separation.

Description

Second-order blind source separation parallel optimization method and system based on a multi-core platform
Technical field
The invention belongs to the field of signal processing technology, and in particular relates to a second-order blind source separation parallel optimization method and system based on a multi-core platform.
Background art
In fields such as neural signal processing and statistical analysis, the collected observation data are often mixed with errors; many of these errors are machine errors and are difficult to avoid. A fundamental problem is therefore how to find, by appropriate methods, a suitable representation of the source data underlying the observations. Blind source separation (BSS) is the process of recovering the original signals, which cannot be observed directly, from several observed mixed signals. The second-order blind identification algorithm (SOBI) is based on the principle of time-delayed cross-correlation matrices: it performs joint approximate diagonalization on a batch of covariance matrices to achieve blind source separation, and is a robust blind source separation method. SOBI uses simple second-order statistics and can estimate the source signal components from a relatively small number of data points. It does not need to assume that the source signals follow a Gaussian distribution, thereby avoiding any judgment of the Gaussian characteristics of the source probability density functions, and it can separate multiple Gaussian noise sources. It is currently one of the mainstream blind source separation algorithms.
The second-order blind source separation algorithm (SOBI) has a simple computation process and good separation performance, and is widely used in fields such as biomedical signal processing, array signal processing, speech signal recognition, image processing, and mobile communication. However, the inventors found in practice that existing SOBI implementations run rather slowly; their speed must be improved before the algorithm can be widely applied offline and support real-time neural signal processing and feedback.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a second-order blind source separation parallel optimization method and system based on a multi-core platform, which accelerates the execution of second-order blind source separation by exploiting the parallel processing capability, math kernel libraries, and instruction sets of a multi-core platform.
To achieve the above object, one or more embodiments of the invention provide the following technical solution:
A second-order blind source separation parallel optimization method based on a multi-core platform, comprising the following steps:
receiving environment-variable parameters and setting CPU thread-core affinity;
receiving the signal to be processed and performing multi-threaded parallel preprocessing on it;
merging multiple parallelizable computation regions and performing joint approximate diagonalization;
outputting the separation matrix and the source matrix.
One or more embodiments provide a second-order blind source separation parallel optimization system based on a multi-core platform, comprising:
a CPU affinity configuration module, for receiving environment-variable parameters and setting CPU thread-core affinity;
a data reception module, for receiving the signal to be processed;
a data preprocessing module, for performing multi-threaded parallel preprocessing on the signal to be processed;
a diagonalization module, for merging multiple parallelizable computation regions and performing joint approximate diagonalization;
a processing result output module, for outputting the separation matrix and the source matrix.
One or more embodiments provide a computing device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the program, implements the above second-order blind source separation parallel optimization method based on a multi-core platform.
One or more embodiments provide a computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the above second-order blind source separation parallel optimization method based on a multi-core platform.
The above one or more technical solutions have the following beneficial effects:
Based on a multi-core platform, the present invention first sets thread-core affinity through environment variables so that, at run time, each CPU accesses only its directly attached memory; memory access time is thus greatly reduced, ensuring a performance gain for the entire second-order blind source separation process. The data preprocessing stage is accelerated through the parallel processing features of the multi-core platform. In the joint approximate diagonalization stage, because the regions that can be accelerated with multiple threads are extremely scattered, multiple parallel regions are merged to achieve acceleration, thereby greatly improving the operational efficiency of second-order blind source separation.
Brief description of the drawings
The accompanying drawings, which constitute a part of the invention, are provided for further understanding of the invention; the exemplary embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.
Fig. 1 is a flow chart of a conventional second-order blind source separation method;
Fig. 2 is an overall flow chart of the second-order blind source separation parallel optimization method based on a multi-core platform in one or more embodiments of the invention;
Fig. 3 and Fig. 4 are, respectively, a run-time diagram and a speedup diagram of each stage when the optimization method of one or more embodiments of the invention is applied.
Detailed description of the embodiments
It should be pointed out that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the invention belongs.
It should be noted that the terms used herein are merely for describing specific embodiments and are not intended to limit the exemplary embodiments of the invention. As used herein, unless the context clearly indicates otherwise, singular forms are also intended to include plural forms. Furthermore, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Provided there is no conflict, the embodiments of the invention and the features of the embodiments may be combined with each other.
Intel multi-core processors have multiple cores, providing higher computing power and multithreading support; combined with their high memory bandwidth, they provide higher data transfer capability. Through instructions such as fused multiply-add (FMA) in the AVX instruction set supported by Intel, single instruction, multiple data (SIMD) execution can be realized, exploiting the peak hardware performance of Intel CPUs. These characteristics can accelerate the SOBI algorithm well, resolve the run-time bottleneck of second-order blind source separation, and make it better suited to real-time signal processing systems.
Embodiment one
This embodiment discloses a second-order blind source separation parallel optimization method based on a multi-core platform, comprising the following steps:
Step 1: receive environment-variable parameters and set CPU thread-core affinity.
Intel multi-core processors use a NUMA architecture: although memory is attached directly to the CPUs, it is divided evenly among the CPU dies. Only when a CPU accesses physical addresses in its own directly attached memory does it get the shortest response time; when it must access memory attached to another CPU, the access goes through an interconnect channel and the response time becomes noticeably slower. To avoid this situation, we set thread-core affinity through environment variables at program start, so that each CPU accesses only its directly attached memory at run time; the memory access time is thus greatly reduced and program performance improves considerably.
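In practice the affinity described above is usually set from outside the program via runtime environment variables such as OMP_PROC_BIND/OMP_PLACES (or KMP_AFFINITY for Intel's OpenMP runtime). As a minimal illustrative sketch — not the patent's implementation — the same pinning can be done programmatically on Linux; the helper name `pin_to_first_core` is invented here:

```python
import os

def pin_to_first_core():
    """Pin the current process to its lowest-numbered allowed CPU.

    Returns the resulting affinity set, or None on platforms (e.g. macOS)
    where the Linux sched_*affinity calls are unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # CPUs the process may currently run on
    target = {min(allowed)}             # keep only the first one
    os.sched_setaffinity(0, target)     # 0 = the calling process
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    print(pin_to_first_core())
```

For an OpenMP program such as the one in this embodiment, the environment-variable route is preferable, since the runtime then places every worker thread consistently.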
Step 2: receive the signal to be processed and, with the help of a math kernel library, perform multi-threaded parallel preprocessing on it.
The data preprocessing part performs a large number of matrix operations when computing the delay matrices, whitening the data, and computing the sample covariance matrices, including matrix multiplication, matrix transposition, and computing matrix eigenvalues and eigenvectors; in practical applications the matrices are large. Large-scale matrix operations involve problems such as discontiguous memory access and have always been a classic hot spot in high-performance computing. We accelerate them with OpenMP multithreading and the Intel Math Kernel Library (Intel MKL). The MKL is well adapted to Intel's computing units and achieves the best acceleration; it is an excellent set of high-performance mathematical libraries.
The preprocessing in step 2 specifically includes: computing the delay matrices, whitening the data, and computing the sample covariance matrices.
The specific method of data whitening is as follows:
The observation data X(t) are whitened by formula (1) so that the covariance matrix of Y(t) is the identity matrix, removing the second-order correlation between components; W is the m × n whitening matrix:
Y(t) = W X(t)   (1)
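The patent does not spell out how W is obtained; a common construction (assumed here) takes W = Λ^(−1/2) Eᵀ from the eigendecomposition of the sample covariance of X, which makes the covariance of Y the identity. A small numpy sketch with a square (m × m) W and a made-up mixing matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10000))          # m = 3 channels, T = 10000 samples
X = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.4, 1.0]]) @ X          # mix the channels

# Whitening matrix from the eigendecomposition of the sample covariance of X
C = np.cov(X)                                # m x m sample covariance
eigvals, eigvecs = np.linalg.eigh(C)
W = np.diag(eigvals ** -0.5) @ eigvecs.T     # W = Lambda^{-1/2} E^T
Y = W @ X                                    # whitened data, cov(Y) ~= I
print(np.allclose(np.cov(Y), np.eye(3), atol=1e-8))
```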
The specific method of computing the sample covariance matrices is as follows:
For a set of fixed delays τ ∈ {τ_j | j = 1, 2, …, k}, compute the sample covariance matrices of the whitened data:
R(τ) = E[Y(t + τ) Y^T(t)] = A R_Y(τ) A^T   (2)
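A sketch of estimating the delayed sample covariance matrices of formula (2) for a batch of lags, assuming zero-mean whitened data; the helper name and lag values are illustrative (implementations often also symmetrize each estimate as 0.5·(R + Rᵀ) before joint diagonalization):

```python
import numpy as np

def delayed_covariance(Y, tau):
    """Sample estimate of R(tau) = E[Y(t+tau) Y(t)^T] for zero-mean data Y (m x T)."""
    T = Y.shape[1]
    return Y[:, tau:] @ Y[:, :T - tau].T / (T - tau)

rng = np.random.default_rng(1)
Y = rng.standard_normal((3, 5000))              # stand-in whitened data
taus = [1, 2, 5, 10]                            # illustrative lag set
Rs = [delayed_covariance(Y, t) for t in taus]   # the batch fed to joint diagonalization
print(Rs[0].shape)
```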
Step 3: merge multiple parallelizable computation regions and perform joint approximate diagonalization.
During joint approximate diagonalization, thread-level parallelism and instruction-set parallelism are used together, and memory access is optimized, to raise the algorithm's execution speed. Joint approximate diagonalization is the most time-consuming part of the SOBI algorithm; it involves Givens rotations, computing eigenvalues and eigenvectors of small matrices, and small matrix-matrix multiplications. More importantly, it is an iterative process and takes very long, accounting for more than 90% of the total run time. For joint approximate diagonalization we mainly take the following measures:
Thread-level parallel acceleration is performed with OpenMP, and multiple parallel regions are merged to reduce multithreading overhead. As is well known, a single thread can only process computing tasks sequentially, finishing one before starting the next, whereas multiple threads can process tasks concurrently — in simple terms, completing several computing tasks at the same time. Nevertheless, a multi-threaded parallel programming model does not necessarily save time, because starting threads, synchronizing data, and tearing threads down all incur extra time overhead; parallel computation only yields acceleration when the computation time it saves exceeds this intrinsic overhead. During joint approximate diagonalization, the regions that can be accelerated with multiple threads are extremely scattered; naively applying multithreading would require frequent synchronization, so the gains of parallel computation would largely be cancelled by the overhead and the acceleration would be insignificant. Therefore, by adjusting the algorithm code, we merge the original 3 scattered parallelizable regions into 1 parallel region, reducing the multithreading overhead to 1/3 of the original and reducing the number of data synchronizations between threads, so that multi-threaded computation brings a clear speedup. For the multithreaded programming model we use OpenMP.
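OpenMP region merging cannot be shown directly in Python, but the cost structure is analogous to thread-pool churn: below, three scattered "parallel regions" each spin up their own pool, versus one merged region that reuses a single pool, paying the startup/teardown overhead once. The phase names and workload are placeholders, not the patent's code:

```python
from concurrent.futures import ThreadPoolExecutor

items = list(range(8))

# Naive: one thread pool per scattered parallel region (3x setup/teardown cost).
naive = []
for name in ("rotate", "update_M", "update_U"):
    with ThreadPoolExecutor(max_workers=4) as pool:
        naive.append(list(pool.map(lambda x: x * x, items)))

# Merged: a single pool reused across all three phases, paying the overhead once.
with ThreadPoolExecutor(max_workers=4) as pool:
    merged = [list(pool.map(lambda x: x * x, items)) for _ in range(3)]

print(naive == merged)   # same results, less pool churn
```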
The joint approximate diagonalization in step 3 specifically includes two important links: performing Givens rotations and iteratively updating the M and U matrices.
The specific method of the Givens rotation is: compute the Givens rotation matrix using trigonometric functions.
During joint approximate diagonalization, Givens rotations must be performed. The Givens rotation used by the traditional algorithm is realized by computing matrix eigenvalues and eigenvectors: writing the Givens rotation matrix as

G = [ c  −s* ]
    [ s   c  ]

c and s are computed from the principal eigenvector v of a small matrix, with c obtained from its first component and s = 0.5 × (v₂ − j × v₃)/c, where j is the imaginary unit. However, the time cost of computing matrix eigenvalues and eigenvectors is considerable.
Instead, we realize a new Givens rotation using sine and cosine: c = cos α, s = sin α. In actual tests, using the sine/cosine Givens rotation does not affect the overall result of the SOBI algorithm, and the program run time drops significantly, because evaluating a trigonometric function requires only a single instruction and no additional memory access, whereas computing matrix eigenvalues and eigenvectors requires not only many instructions but also additional memory reads and writes.
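A sketch of the trigonometric construction: the rotation matrix comes straight from cos/sin, and choosing α = ½·arctan2(2·m₀₁, m₀₀ − m₁₁) — the standard Jacobi angle, assumed here rather than taken from the patent — annihilates the off-diagonal of a symmetric 2 × 2 block:

```python
import numpy as np

def givens_trig(alpha):
    """Givens rotation matrix built directly from trigonometric functions."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s],
                     [s,  c]])

G = givens_trig(0.3)                  # hypothetical rotation angle
# A Givens rotation is orthogonal: G^T G = I, det G = 1.
print(np.allclose(G.T @ G, np.eye(2)), np.isclose(np.linalg.det(G), 1.0))

# Applied as G^T M G it zeros the off-diagonal of a symmetric 2x2 block
# when alpha = 0.5 * arctan2(2*m01, m00 - m11):
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
alpha = 0.5 * np.arctan2(2 * M[0, 1], M[0, 0] - M[1, 1])
G = givens_trig(alpha)
D = G.T @ M @ G
print(abs(D[0, 1]) < 1e-12)
```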
Iteratively updating the M and U matrices specifically includes: computing the value of s in the above Givens rotation matrix and comparing it with the threshold set by the algorithm; if it is not below the threshold, update the M and U matrices and perform the Givens rotation again; if it is, compute the separation matrix and the source matrix.
The orthogonal matrix U is computed as follows: for all R(τ_j), use the joint approximate diagonalization algorithm to obtain an orthogonal matrix U satisfying formula (3), where {D_j} is a set of diagonal matrices:
U^T R(τ_j) U = D_j   (3)
The separation matrix W is computed as follows: from the above steps we obtain Y(t) = U^T W X(t) and the mixing matrix A = W⁺U. After obtaining the decorrelated source signals Y(t), the unwanted independent source components are removed and the signal is reconstructed as follows:
X_r(t) = W⁺ Y_r(t)   (4)
where X_r(t) is the reconstructed observation signal vector; Y_r(t) is the new independent source matrix obtained after setting the unwanted source components in Y(t) to zero; and W⁺ is the pseudo-inverse of the separation matrix W.
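A minimal sketch of formula (4): one estimated component is zeroed and the observations are rebuilt through the pseudo-inverse W⁺. The separation matrix here is a random square stand-in, not one produced by SOBI:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((3, 3))        # stand-in separation matrix (square here)
X = rng.standard_normal((3, 1000))     # observations
Y = W @ X                              # decorrelated source estimates

Yr = Y.copy()
Yr[1, :] = 0.0                         # zero one unwanted component (e.g. an artifact)

W_pinv = np.linalg.pinv(W)             # W+, equals inv(W) for a square full-rank W
Xr = W_pinv @ Yr                       # reconstructed observations, formula (4)
print(Xr.shape)
```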
The mixing source signal matrix A can then be computed from the separation matrix W.
Besides using OpenMP for thread-level parallel acceleration, we also make full use of the vector registers and the AVX instruction set on Intel multi-core processors to realize instruction-level and data-level parallel computation. In the traditional computing mode, a single instruction fetches one number at a time and operates on it; this mode is called single instruction, single data (SISD). However, the vast majority of current processors support single instruction, multiple data (SIMD), in which a single instruction fetches several numbers at a time and operates on them simultaneously. Intel multi-core platforms currently support vector registers of up to 512 bits, and Intel provides the AVX instruction set to make full use of them. Through the vector registers and the corresponding AVX instructions, SIMD can be realized: with 512-bit vector registers we can operate on 16 single-precision or 8 double-precision numbers at once, achieving instruction-level and data-level parallelism. For example, the addition, subtraction, multiplication, and division involved in the second-order blind source separation algorithm (SOBI) are all accelerated in parallel using equivalent AVX instructions. In addition, we use the fused multiply-add instructions (FMA) in the AVX instruction set, which compute ±(a × b) ± c in a single instruction. Traditionally, ±(a × b) ± c requires at least two computing instructions, a multiplication and an addition (or subtraction); with FMA, the two operations are combined into one, so the run time can be reduced by as much as half and the floating-point operations per second (FLOPS) doubled. Moreover, since an Intel multi-core processor is equipped with 2 FMA units, the peak FLOPS doubles again. Furthermore, because the intermediate result a × b is not rounded, FMA is more accurate than separate multiply (MUL) and add (ADD) instructions. FMA can improve the performance and accuracy of many floating-point computations, such as matrix multiplication.
In C/C++, two-dimensional arrays are stored in memory in row-major order, so accessing a two-dimensional array row by row yields contiguous memory accesses. Contiguous memory access makes full use of the memory hierarchy of modern computers, exploiting the cache to improve program performance: the time a processor spends accessing the cache is only about one hundredth of a memory access. Conversely, when a two-dimensional array is accessed column by column, almost nothing in the cache is hit, severe cache misses occur, and the program run time increases greatly. In the original algorithm, updating M and U during joint approximate diagonalization accesses two-dimensional arrays by column, causing a large number of discontiguous memory accesses and poor program locality. Therefore, in this embodiment the matrices to be accessed are transposed at the beginning of joint approximate diagonalization, realizing contiguous memory access, creating good program locality, and giving full play to the performance of the modern CPU-cache-memory architecture.
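The row-major argument can be made concrete with numpy, whose default array order matches C: the strides show that a column step jumps a whole row, and that transposing (then compacting) turns the former column walk into a contiguous one:

```python
import numpy as np

M = np.arange(12, dtype=np.float64).reshape(3, 4)   # C (row-major) order by default

# In row-major layout, stepping along a row moves 8 bytes (one float64),
# while stepping down a column jumps a whole row (4 * 8 = 32 bytes).
print(M.strides)            # (32, 8)

# Transposing and compacting makes the former column walk contiguous --
# the trick the text applies before updating M and U.
Mt = np.ascontiguousarray(M.T)
print(Mt.strides)           # (24, 8): rows of the transpose are now contiguous
```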
In C/C++, if an array variable is updated infrequently, every use of an array element still causes the processor to access memory directly, whereas a floating-point scalar (float/double) can be accessed through a register, avoiding memory access; the time a processor spends accessing a register is only about one thousandth of a memory access. Replacing floating-point arrays with floating-point scalar variables can therefore significantly reduce processor memory accesses. During joint approximate diagonalization, when M and U are updated, array variables are often used as temporaries to store intermediate results, and these array temporaries are updated infrequently. Operations on such array variables actually read and write memory addresses directly, bringing a huge memory read/write overhead, which is very time-consuming compared with register access. We therefore replace a series of array temporaries with floating-point scalar temporaries that can be kept in registers; during updates no extra memory reads or writes are needed, only register accesses, which greatly improves program performance.
Updating M and U during joint approximate diagonalization involves matrix multiplication, but the matrices at this point are relatively small. If the Intel Math Kernel Library (MKL) were used, as in the whitening-matrix computation of step 1, the actual acceleration would be insignificant. Our tests show that Intel MKL performs poorly on small matrix multiplications, because calling MKL incurs function-call overhead; a hand-written small-scale matrix multiplication is faster.
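The patent's hand-written kernel is C code; the triple loop below is a direct Python transcription of such a kernel, checked against numpy. In C the scalar accumulator would live in a register and the call would avoid MKL's dispatch overhead — here it only illustrates the structure:

```python
import numpy as np

def small_matmul(A, B):
    """Naive triple-loop matrix multiply for small operands.

    Sketches the hand-written kernel preferred over an MKL call for tiny
    matrices; in C this shape avoids library-call overhead.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            acc = 0.0                    # scalar accumulator (register-friendly in C)
            for p in range(k):
                acc += A[i, p] * B[p, j]
            C[i, j] = acc
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
print(np.allclose(small_matmul(A, B), A @ B))   # → True
```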
Step 4: output the separation matrix and the source matrix.
Those skilled in the art will understand that pthreads could be used to manage threads instead of the OpenMP used here, achieving the same effect.
This embodiment uses the Intel® Xeon™ processor E5 product family (codename "Haswell EP"), a dual-socket platform based on Intel's latest microarchitecture at the time. The product has 18 cores, a 3.6 GHz clock frequency, a 55 MB cache, and 76.8 GB/s of memory bandwidth, and supports the AVX instruction set and vector registers in hardware, which markedly improves application performance. Starting from the most popular Matlab implementation of SOBI, each optimization measure was applied in turn to the optimized code, and the run time and speedup were measured, as shown in Figs. 3-4. As the figures show, with all of the above optimization measures applied, SOBI speeds up dramatically: the program goes from an initial 180 s to a final 4.5 s, a 39× speedup.
The blind source separation parallel acceleration method described in this embodiment can be applied in fields such as biomedical signal processing, array signal processing, speech signal recognition, image processing, and mobile communication.
In biomedical signal processing, the second-order blind source separation algorithm (SOBI) can quickly and effectively remove artifact signals, laying a foundation for the online processing of electroencephalography (EEG) signals in brain-computer interfaces (BCI). The algorithm is also used in the extraction and independent component analysis of body-surface mapping signals for atrial fibrillation arrhythmia.
In power systems, nonlinear equipment connected to the grid degrades power quality indices, affecting the use of high-precision electronic equipment. A primary operation in power quality governance is assigning responsibility for harmonic pollution. The second-order blind source separation algorithm (SOBI) separates the source signals through second-order statistics, exploiting the independence between sources.
In image analysis, fault-signal extraction, condition monitoring, and similar research, the second-order blind source separation algorithm is often adopted; under different system dampings and different signal-to-noise ratios it exhibits high robustness and high identification accuracy.
The second-order blind source separation algorithm is also a core algorithm of sound source identification in noise diagnosis, and can be used in research on sound source identification methods in the noise diagnosis of mechanical equipment.
Embodiment two
Based on the method of embodiment one, this embodiment aims to provide a second-order blind source separation parallel optimization system based on a multi-core platform, comprising:
a CPU affinity configuration module, for receiving environment-variable parameters and setting CPU thread-core affinity;
a data reception module, for receiving the signal to be processed;
a data preprocessing module, for performing multi-threaded parallel preprocessing on the signal to be processed;
a diagonalization module, for merging multiple parallelizable computation regions and performing joint approximate diagonalization;
a processing result output module, for outputting the separation matrix and the source matrix.
Embodiment three
This embodiment aims to provide a computing device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the program, implements:
receiving environment-variable parameters and setting CPU thread-core affinity;
receiving the signal to be processed and performing multi-threaded parallel preprocessing on it;
merging multiple parallelizable computation regions and performing joint approximate diagonalization;
outputting the separation matrix and the source matrix.
Example IV
This embodiment aims to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements:
receiving environment-variable parameters and setting CPU thread-core affinity;
receiving the signal to be processed and performing multi-threaded parallel preprocessing on it;
merging multiple parallelizable computation regions and performing joint approximate diagonalization;
outputting the separation matrix and the source matrix.
The steps involved in the devices of embodiments two, three, and four correspond to those of method embodiment one; for specific implementations, refer to the relevant description of embodiment one. The term "computer-readable storage medium" should be understood to include a single medium or multiple media storing one or more instruction sets; it should also be understood to include any medium capable of storing, encoding, or carrying an instruction set for execution by a processor so as to cause the processor to execute any of the methods of the present invention.
The above one or more embodiments have the following technical effects:
Run-time parameters are set first to control the number of threads and thread-core affinity, improving the overall execution efficiency of second-order blind source separation.
In the data preprocessing stage, thread-level parallel acceleration is performed with OpenMP multithreading and the Intel Math Kernel Library.
During joint approximate diagonalization, thread-level parallelism and instruction-set parallelism are used together and memory access is optimized to raise the algorithm's execution speed: 1) thread-level parallel acceleration is performed with OpenMP; since the parallelizable regions in joint approximate diagonalization are extremely scattered, multiple parallel regions are merged to reduce multithreading overhead; 2) a new Givens rotation based on sine and cosine reduces the number of computing instructions and memory reads/writes; 3) vector registers and the AVX instruction set realize instruction-level and data-level parallel acceleration, optimizing pipelined execution through single instruction, multiple data; 4) in the iterative update of the orthogonal matrix, matrix transposition reduces discontiguous memory accesses and improves the cache hit rate; floating-point scalar temporaries replace array temporaries, reducing memory reads/writes; and an efficient small-scale matrix multiplication avoids the call overhead of the Intel Math Kernel Library.
Those skilled in the art will understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device; alternatively, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, or they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the invention may be modified and varied in various ways. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in its protection scope.
Although the specific embodiments of the present invention are described above with reference to the accompanying drawings, they do not limit the protection scope of the present invention; those skilled in the art should understand that, on the basis of the technical solutions of the present invention, various modifications or variations that can be made without creative labor still fall within the protection scope of the present invention.

Claims (10)

1. A second-order blind source separation parallel optimization method based on a multi-core platform, characterized by comprising the following steps:
receiving environment variable parameters and setting CPU thread-core affinity;
receiving a signal to be processed and performing multi-threaded parallel preprocessing on the signal to be processed;
merging multiple regions amenable to parallel computation and performing joint approximate diagonalization;
outputting a separation matrix and a source matrix.
2. The second-order blind source separation parallel optimization method based on a multi-core platform according to claim 1, characterized in that preprocessing the signal to be processed comprises: computing a time-delay matrix, performing data whitening, and computing sample covariance matrices.
3. The second-order blind source separation parallel optimization method based on a multi-core platform according to claim 2, characterized in that the preprocessing process is accelerated by a math kernel function library.
4. The second-order blind source separation parallel optimization method based on a multi-core platform according to claim 1, characterized in that the joint approximate diagonalization comprises: Givens rotation and iterative updating of an orthogonal matrix.
5. The second-order blind source separation parallel optimization method based on a multi-core platform according to claim 4, characterized in that the Givens rotation matrix is solved through trigonometric functions.
6. The second-order blind source separation parallel optimization method based on a multi-core platform according to claim 4, characterized in that, during the iterative updating of the orthogonal matrix, the matrix to be accessed is transposed, and floating-point scalar temporary variables are used in place of a series of array-typed temporary variables.
7. The second-order blind source separation parallel optimization method based on a multi-core platform according to claim 4, characterized in that, for the mathematical computations involving addition, subtraction, multiplication, and division in the joint approximate diagonalization, equivalent AVX instructions are used for parallel acceleration.
8. A second-order blind source separation parallel optimization system based on a multi-core platform, characterized by comprising:
a CPU affinity configuration module, configured to receive environment variable parameters and set CPU thread-core affinity;
a data receiving module, configured to receive a signal to be processed;
a data preprocessing module, configured to perform multi-threaded parallel preprocessing on the signal to be processed;
a diagonalization module, configured to merge multiple regions amenable to parallel computation and perform joint approximate diagonalization;
a processing result output module, configured to output the separation matrix and the source matrix.
9. A computing device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the second-order blind source separation parallel optimization method based on a multi-core platform according to any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the second-order blind source separation parallel optimization method based on a multi-core platform according to any one of claims 1-7.
CN201910329707.XA 2019-04-23 2019-04-23 Second order blind source separating parallel optimization method and system based on multi-core platform Pending CN110188320A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910329707.XA CN110188320A (en) 2019-04-23 2019-04-23 Second order blind source separating parallel optimization method and system based on multi-core platform


Publications (1)

Publication Number Publication Date
CN110188320A true CN110188320A (en) 2019-08-30

Family

ID=67714992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910329707.XA Pending CN110188320A (en) 2019-04-23 2019-04-23 Second order blind source separating parallel optimization method and system based on multi-core platform

Country Status (1)

Country Link
CN (1) CN110188320A (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103426436A (en) * 2012-05-04 2013-12-04 索尼电脑娱乐公司 Source separation by independent component analysis in conjuction with optimization of acoustic echo cancellation


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ADEL BELOUCHRANI et al.: "A Blind Source Separation Technique Using Second-Order Statistics", IEEE Transactions on Signal Processing *
LEI SHAN et al.: "Accelerating Nyström Kernel Independent Component Analysis with Many Integrated Core Architecture", Communications in Computer and Information Science *
LIU Xin et al.: "Large-scale application characteristics analysis and exascale scalability research on the Sunway TaihuLight computer system", Chinese Journal of Computers *
WU Wenwei et al.: "Controller Design and Implementation in Active Control", Harbin Engineering University Press, 30 April 2017 *
CHEN Yongjian: "Research on OpenMP Compilation and Optimization Techniques", China Master's and Doctoral Dissertations Full-text Database, Information Science and Technology *
LEI Hong: "Multi-core Heterogeneous Parallel Computing with OpenMP 4.5: C/C++", Metallurgical Industry Press, 30 April 2018 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705602A (en) * 2019-09-06 2020-01-17 平安科技(深圳)有限公司 Large-scale data clustering method and device and computer readable storage medium
CN113094646A (en) * 2021-03-25 2021-07-09 电子科技大学 Matrix data processing system and method based on matrix joint approximate diagonalization
CN113094646B (en) * 2021-03-25 2023-04-28 电子科技大学 Matrix data processing system and method based on matrix joint approximate diagonalization
CN113704691A (en) * 2021-08-26 2021-11-26 中国科学院软件研究所 Small-scale symmetric matrix parallel three-diagonalization method of Shenwei many-core processor
CN113704691B (en) * 2021-08-26 2023-04-25 中国科学院软件研究所 Small-scale symmetric matrix parallel tri-diagonalization method of Shenwei many-core processor
CN114080953A (en) * 2021-11-05 2022-02-25 山东省农业机械科学研究院 Illumination management method and system for mushroom house

Similar Documents

Publication Publication Date Title
CN110188320A (en) Second order blind source separating parallel optimization method and system based on multi-core platform
Li et al. Quantum supremacy circuit simulation on Sunway TaihuLight
Dong et al. Dnnmark: A deep neural network benchmark suite for gpus
Betkaoui et al. Comparing performance and energy efficiency of FPGAs and GPUs for high productivity computing
Yang et al. An efficient parallel algorithm for longest common subsequence problem on gpus
CN107451097B (en) High-performance implementation method of multi-dimensional FFT on domestic Shenwei 26010 multi-core processor
Schmidt et al. Parallel programming: concepts and practice
Chen et al. Brain big data processing with massively parallel computing technology: challenges and opportunities
Dong et al. Characterizing the microarchitectural implications of a convolutional neural network (cnn) execution on gpus
Li et al. Automatic generation of high-performance fft kernels on arm and x86 cpus
Zhao et al. Combined kernel for fast GPU computation of Zernike moments
Pratas et al. Fine-grain parallelism using multi-core, Cell/BE, and GPU systems
Zhang et al. Performance analysis and optimization for SpMV based on aligned storage formats on an ARM processor
CN102902657A (en) Method for accelerating FFT (Fast Fourier Transform) by using GPU (Graphic Processing Unit)
Xu et al. Optimizing finite volume method solvers on Nvidia GPUs
Zhang et al. NUMA-Aware DGEMM based on 64-bit ARMv8 multicore processors architecture
Lan et al. Accelerating large-scale biological database search on Xeon Phi-based neo-heterogeneous architectures
Gan et al. Scaling and analyzing the stencil performance on multi-core and many-core architectures
Chen et al. Performance evaluation of convolutional neural network on Tianhe-3 prototype
Wu et al. A vectorized k-means algorithm for intel many integrated core architecture
Haghi et al. WFA-FPGA: An efficient accelerator of the wavefront algorithm for short and long read genomics alignment
Kaliszan et al. HPC processors benchmarking assessment for global system science applications
Wang et al. Observer-controller stabilization of a class of manipulators with a single flexible link
Cao et al. Critique of “A parallel framework for constraint-based Bayesian network learning via Markov blanket discovery” by SCC team from Tsinghua University
Gugnani et al. MPI-LiFE: Designing high-performance linear fascicle evaluation of brain connectome with MPI

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
