CN108664348B - Fast variable point detection method and device based on CUDA (compute unified device architecture) and storage medium - Google Patents


Info

Publication number
CN108664348B
CN108664348B CN201810432528.4A CN201810432528A CN108664348B CN 108664348 B CN108664348 B CN 108664348B CN 201810432528 A CN201810432528 A CN 201810432528A CN 108664348 B CN108664348 B CN 108664348B
Authority
CN
China
Prior art keywords
detected
sample data
point detection
cuda
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810432528.4A
Other languages
Chinese (zh)
Other versions
CN108664348A (en
Inventor
李香银
徐维超
陈昌润
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201810432528.4A
Publication of CN108664348A
Application granted
Publication of CN108664348B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0721 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU]
    • G06F11/0724 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU] in a multiprocessor or a multi-core unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fast change point detection method, device, and storage medium based on CUDA. The method comprises the following steps: building a CUDA environment for change point detection in advance; collecting sample data to be detected and storing it on the CPU side; allocating storage space for the sample data on the GPU side and transferring the sample data from the CPU side to the video memory of the GPU side; controlling the GPU side to compute, in parallel, the AUC estimates corresponding to the sample data; and transferring the AUC estimates to the CPU side for change point detection. Because the method runs in a CUDA environment and the GPU processes the sample data in parallel, yielding many AUC estimates at once, the time required for change point detection is greatly shortened and real-time detection is ensured. The benefit is most pronounced when the amount of sample data is large.

Description

Fast variable point detection method and device based on CUDA (compute unified device architecture) and storage medium
Technical Field
The invention relates to the technical field of change point detection, and in particular to a fast change point detection method and device based on CUDA (Compute Unified Device Architecture) and a storage medium.
Background
In statistics, change point detection has long been an active research direction. In recent years, research on change point detection has advanced rapidly in both theory and application. A series of change point detection algorithms have been proposed, such as the least squares method, the maximum likelihood method, Bayesian methods, and the CUSUM control chart, and the technique is widely applied in meteorological monitoring, signal tracking, sonar detection, financial data analysis, industrial quality inspection, network security, and other fields. A change point is a point in a sequence or process before and after which the statistical properties of the samples (mean, variance, covariance, etc.) differ. In practical applications, detecting in real time whether a change point has occurred allows it to be handled promptly and avoids unnecessary damage. For example, when typhoon weather is monitored, the monitoring data fluctuate; a change point detection algorithm can determine in real time whether a fluctuation is normal or abnormal, so that the weather can be monitored and precautions taken in time.
The area under the curve (AUC) is the area under the receiver operating characteristic (ROC) curve and is commonly used to analyze two-class problems. In statistics, the AUC is equivalent to the well-known Mann-Whitney U statistic (MWUS), from which the AUC estimate for two classes of samples can be obtained. A change point detection method commonly used in the prior art therefore detects change points by means of the area under the curve. When the distribution of the sample data is unknown, this method can still detect change points, shows excellent robustness under noise interference, and is highly practical. In a specific implementation, however, computing the area under the curve requires a large number of operations on the sample data, and in practical applications the sample data often must be examined as they are generated, which requires the change point detection to be fast. When the amount of sample data is large, processing the data serially on the CPU alone takes a long time, which makes it difficult to guarantee real-time change point detection.
Therefore, how to improve the detection speed when detecting change points with the area under the curve is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a fast change point detection method, device, and storage medium based on CUDA (Compute Unified Device Architecture), which are used to improve the detection speed when detecting change points with the area under the curve.
In order to solve the above technical problem, the invention provides a fast change point detection method based on CUDA, comprising the following steps:
building a CUDA environment for change point detection in advance;
collecting sample data to be detected, and storing the sample data to be detected on the CPU side;
allocating storage space for the sample data to be detected on the GPU side, and transferring the sample data to be detected from the CPU side to the video memory of the GPU side;
controlling the GPU side to compute, in parallel, the AUC estimates corresponding to the sample data to be detected;
and transferring the AUC estimates to the CPU side for change point detection.
Preferably, controlling the GPU side to compute, in parallel, the AUC estimates corresponding to the sample data to be detected specifically includes:
defining a kernel function that runs on the GPU side to perform parallel computation on multiple pairs of windows corresponding to the samples to be detected;
calling an API function on the CPU side to launch the kernel function on the GPU side and obtain the AUC estimates;
sorting the AUC estimates.
Preferably, defining a kernel function that runs on the GPU side to perform parallel computation on multiple pairs of windows corresponding to the samples to be detected specifically includes:
determining the number of window pairs in the parallel computation according to the dimension of the sample to be detected, and thereby the number of threads required;
computing the grid parameter of the kernel function for the parallel computation from the preset thread block parameter and the number of threads;
and determining the position of each thread in the grid, which serves as the global index with which the kernel function accesses each pair of windows.
Preferably, the number of window pairs is N - m - n + 1, and the number of threads is N - m - n + 1;
where N is the dimension of the sample data to be detected, and m and n are the widths of the two windows in each pair.
Preferably, the AUC estimates are sorted by means of the Thrust programming library in CUDA.
Preferably, after the sorting, the method further comprises:
storing the sorting result in a predefined first array and a predefined second array;
where the first array stores the sorted AUC estimates, and the second array stores the original positions, in the sample data to be detected, that correspond to those AUC estimates.
Preferably, when the change point detection result indicates that a change point exists, the method further comprises:
outputting the specific position of the change point.
In order to solve the above technical problem, the present invention further provides a fast change point detection device based on CUDA, including:
a building unit, configured to build a CUDA environment for change point detection in advance;
an acquisition unit, configured to collect sample data to be detected and store the sample data to be detected on the CPU side;
a transmission unit, configured to allocate storage space for the sample data to be detected on the GPU side and transfer the sample data to be detected from the CPU side to the video memory of the GPU side;
a control unit, configured to control the GPU side to compute, in parallel, the AUC estimates corresponding to the sample data to be detected;
and a detection unit, configured to transfer the AUC estimates to the CPU side for change point detection.
In order to solve the above technical problem, the present invention further provides a fast change point detection device based on CUDA, which includes a memory for storing a computer program;
and a processor, configured to implement the steps of the CUDA-based fast change point detection method when executing the computer program.
In order to solve the above technical problem, the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps of the CUDA-based fast change point detection method are implemented.
The CUDA-based fast change point detection method provided by the invention comprises: building a CUDA environment for change point detection in advance; collecting sample data to be detected and storing it on the CPU side; allocating storage space for the sample data on the GPU side and transferring the sample data from the CPU side to the video memory of the GPU side; controlling the GPU side to compute, in parallel, the AUC estimates corresponding to the sample data; and transferring the AUC estimates to the CPU side for change point detection. Because the method runs in a CUDA environment and the GPU processes the sample data in parallel, yielding many AUC estimates at once, the time required for change point detection is greatly shortened and real-time detection is ensured. The benefit is most pronounced when the amount of sample data is large. In addition, the invention provides a CUDA-based fast change point detection device and a computer storage medium corresponding to the method, with the same effects as described above.
Drawings
In order to illustrate the embodiments of the present invention more clearly, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a CUDA-based fast change point detection method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the windowing of sample data to be detected according to an embodiment of the present invention;
Fig. 3 is a distribution diagram of the AUC estimates obtained during change point detection for the sample data to be detected in Fig. 2, according to an embodiment of the present invention;
Fig. 4 is a structural diagram of a CUDA-based fast change point detection apparatus according to an embodiment of the present invention;
Fig. 5 is a structural diagram of another CUDA-based fast change point detection apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without any creative work belong to the protection scope of the present invention.
The core of the invention is to provide a fast change point detection method, device, and storage medium based on CUDA, which are used to improve the detection speed when detecting change points with the area under the curve.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flowchart of a CUDA-based fast change point detection method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
S10: Build a CUDA environment for change point detection in advance.
CUDA (Compute Unified Device Architecture) is a parallel programming model developed by NVIDIA for general-purpose computing on GPUs, in which the CPU acts as the host and the GPU as the device, or coprocessor. The CPU is mainly responsible for logic-heavy transaction processing and serial computation, while the GPU is dedicated to highly threaded parallel processing tasks. A CUDA program consists of a host-side (CPU) program and a device-side (GPU) program. The GPU has strong parallel computing capability and very high floating-point throughput, so if the change point detection process can be parallelized on the GPU under CUDA, detection efficiency can be greatly improved and a large amount of time saved.
The environment for change point detection comprises a hardware environment and a software environment. The hardware environment includes a CPU and a GPU that supports the CUDA programming model; the software environment includes a C/C++ compiler and CUDA. In other words, this step installs C/C++ compilation software capable of parallel computing on a computer equipped with a GPU graphics card (every computer necessarily has a CPU). Since those skilled in the art know how to build a CUDA environment, and since this step is not the core inventive point of the present invention but merely uses an existing CUDA environment for the subsequent operations, it is not described further here.
S11: Collect sample data to be detected, and store the sample data to be detected on the CPU side.
Data acquisition is generally performed on the CPU, and the data are stored in CPU memory; specifically, the CPU stores the acquired data dynamically by calling the malloc function. Change point detection on the collected data in a CPU-only environment is a serial operation and is therefore slow. In the prior art, the AUC estimate is computed on the CPU side as follows:
Fig. 2 is a schematic diagram of the windowing of sample data to be detected according to an embodiment of the present invention. Fig. 3 is a distribution diagram of the AUC estimates obtained during change point detection for the sample data in Fig. 2 according to an embodiment of the present invention. As shown in Fig. 2, two adjacent windows X and Y are applied to the sample data to be detected, with sizes (widths) m and n, respectively. The sample data in window X are X_1, ..., X_m, and the sample data in window Y are Y_1, ..., Y_n. All of the data are processed by sliding the two windows step by step and recording, at each position, the AUC estimate for the sample data in the two windows; the distribution of these estimates is shown in Fig. 3. The AUC estimate is calculated as:
[Formula image BDA0001653795200000051: AUC estimate; not reproduced in this text]
where H(t) in the above formula is a step function, i.e.
[Formula image BDA0001653795200000061: definition of H(t); not reproduced in this text]
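The two formula images are not reproduced in this text. Since the Background equates the AUC with the Mann-Whitney U statistic, the estimate and the step function can plausibly be written in the standard form below. The argument order X_i - Y_j and the tie value 1/2 are conventions assumed here and are not confirmed by the source.

```latex
\widehat{\mathrm{AUC}} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} H(X_i - Y_j),
\qquad
H(t) =
\begin{cases}
1, & t > 0 \\
\tfrac{1}{2}, & t = 0 \\
0, & t < 0
\end{cases}
```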
The AUC estimate is used as the test statistic. When windows X and Y both lie within samples of the same kind, the corresponding AUC value is close to 0.5. When the AUC value is 0 or 1, the samples in windows X and Y are completely separated into two classes. These extremes appear as the two distinct peaks in Fig. 3, which are the change points to be detected.
As described above, the windows must be slid and evaluated serially, which inevitably lengthens the change point detection process. The invention therefore computes the AUC estimates in parallel on the GPU, which greatly improves detection efficiency.
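The serial prior-art procedure described above can be sketched in Python as a reference baseline. This is a minimal illustration, not the patent's code: the function names are hypothetical, and the step function convention (H(X - Y), with ties counted as 0.5) is assumed from the standard Mann-Whitney form.

```python
def step(t):
    """Step function H(t). The tie value 0.5 is an assumed convention."""
    if t > 0:
        return 1.0
    if t == 0:
        return 0.5
    return 0.0


def window_auc(x_win, y_win):
    """AUC estimate for one pair of windows: normalized double sum of H."""
    total = 0.0
    for xi in x_win:
        for yj in y_win:
            total += step(xi - yj)
    return total / (len(x_win) * len(y_win))


def sliding_auc(data, m, n):
    """Serial baseline: slide adjacent windows X (width m) and Y (width n)."""
    count = len(data) - m - n + 1  # number of window-pair positions
    return [
        window_auc(data[i:i + m], data[i + m:i + m + n])
        for i in range(count)
    ]


if __name__ == "__main__":
    # Hypothetical series whose level jumps from 0 to 10 at index 6.
    series = [0.0] * 6 + [10.0] * 6
    print(sliding_auc(series, 3, 3))
```

Every window position is evaluated one after another, which is the serial cost that the GPU version removes by giving each position its own thread.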
S12: Allocate storage space for the sample data to be detected on the GPU side, and transfer the sample data to be detected from the CPU side to the video memory of the GPU side.
For the GPU to process the samples in parallel, the collected data must first be copied to the GPU. The purpose of this step is to transfer the data collected in S11 to the GPU for parallel computation; the transfer copies the data stored in CPU memory into the GPU's own memory (also called video memory). The parallel computation is then carried out by loading the GPU program and caching data on the chip.
Specifically, the CPU side allocates storage space for the data by calling the CUDA API function cudaMalloc; for example, cudaMalloc(&x, N * sizeof(float)) allocates space for N float values for the variable x. The CPU side then calls the CUDA API function cudaMemcpy to transfer the data from CPU memory to the video memory of the GPU for the subsequent parallel processing.
S13: Control the GPU side to compute, in parallel, the AUC estimates corresponding to the sample data to be detected.
As in the prior art, computing the AUC estimates in this step requires windows. Unlike the prior art, however, a pair of windows is set up for every position, so there are multiple pairs of windows. Instead of moving one pair of windows and computing once per move, each thread handles one pair of windows, and multiple threads process the multiple pairs of windows in parallel.
As a preferred embodiment, S13 specifically includes:
S130: Define a kernel function that runs on the GPU side to perform parallel computation on multiple pairs of windows corresponding to the samples to be detected.
Defining the kernel function specifically comprises the following steps:
determining the number of window pairs in the parallel computation according to the dimension of the sample to be detected, and thereby the number of threads required;
computing the grid parameter of the kernel function for the parallel computation from the preset thread block parameter and the number of threads;
determining the position of each thread in the grid, which serves as the global index with which the kernel function accesses each pair of windows.
The kernel function is applied as follows:
The sample data to be detected, transferred from CPU memory to the GPU, are denoted x and stored in global memory on the GPU side; the parallel computation is implemented by a kernel function. A kernel function is not a complete program but the step of the overall CUDA program that can be executed in parallel. It must be declared with the __global__ function type qualifier specified by CUDA, and it specifies that the threads perform the same operation on different data. A kernel function offers two levels of parallelism: the thread level and the thread block level.
A thread is a sequential execution unit that runs on a streaming processor (SP). A large number of threads can execute the same code at the same time, which constitutes the fine-grained level of parallelism. A thread block consists of several closely related threads and constitutes the coarse-grained level of parallelism. The thread blocks that execute in parallel form a grid, and together they complete the task specified by the kernel function.
In the kernel function of the invention, each thread uses its thread block index blockIdx and its thread index threadIdx to access the data to be processed in one pair of windows, and thereby obtains the AUC estimate for that pair. Because many threads run concurrently, the AUC estimates for multiple pairs of windows are obtained simultaneously, which improves the efficiency of the AUC computation. The formula for the AUC estimate is as given above.
Because threads are organized and executed in warps, the thread block parameter (size), denoted BlockDim, is generally chosen as an integer multiple of 32. When change point detection is performed on sample data x of dimension N, AUC values must be computed for N - m - n + 1 window pairs, so the parallel computation requires N - m - n + 1 threads in total, where N is the dimension of the sample data to be detected and m and n are the widths of the two windows in each pair. From the selected BlockDim value, the grid parameter (size), denoted GridDim, is obtained as GridDim = (N - m + BlockDim) / BlockDim, using integer division. This launches at least N - m - n + 1 threads, and any surplus threads are excluded by a range check. In the kernel function, each thread forms its global index within the whole grid as i = blockIdx.x * blockDim.x + threadIdx.x; the value of i also gives the starting position of window X in the sample data x. An if statement then checks whether the thread's global index lies within the valid range, and the body of the if statement defines the operations each thread performs on the data in its pair of windows. First, two arrays x1 and x2 are defined. Using its global index i, each thread directly accesses the designated region of global memory: the data in the range [i, i + m - 1] of x are loaded into x1, and the data in the range [i + m, i + m + n - 1] of x are loaded into x2.
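The launch arithmetic in the paragraph above can be checked with a short sequential emulation. The sketch below is Python rather than CUDA, and the function names and the default block size of 256 are assumptions for illustration; only the GridDim expression and the index formula come from the text.

```python
def launch_config(N, m, n, block_dim=256):
    """Return (threads_needed, grid_dim) for one thread per window pair.

    block_dim would normally be a multiple of the warp size (32).
    """
    threads_needed = N - m - n + 1
    if threads_needed <= 0:
        raise ValueError("series is shorter than one pair of windows")
    # Integer division, following the GridDim expression in the text.
    grid_dim = (N - m + block_dim) // block_dim
    return threads_needed, grid_dim


def active_indices(N, m, n, block_dim=256):
    """Emulate i = blockIdx.x * blockDim.x + threadIdx.x plus the range check."""
    threads_needed, grid_dim = launch_config(N, m, n, block_dim)
    active = []
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx
            if i < threads_needed:  # surplus threads do no work
                active.append(i)
    return active
```

For example, with N = 1000, m = 20, n = 30 and a block size of 256, this gives 951 required threads and 4 blocks, so 1024 threads are launched and the last 73 fail the range check.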
To speed up the program, the data are loaded through pointers. A nested pair of for loops then computes the accumulated sum given by the AUC formula (shown as formula image BDA0001653795200000081 in the original), and this accumulated sum is the AUC estimate for one pair of windows. All threads execute the same kernel program in parallel. An array d_auc of size N - m - n + 1 is defined to store the AUC estimate computed by each thread, so that the AUC estimates for all window pairs are obtained.
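The per-thread work described above can be illustrated with a sequential Python emulation in which an ordinary loop stands in for the GPU threads. It is a sketch under stated assumptions rather than the patent's kernel: the names kernel_thread and run_kernel are hypothetical, and the normalization by m * n and the tie value 0.5 follow the standard Mann-Whitney form.

```python
def H(t):
    """Step function. The tie value 0.5 is an assumed convention."""
    return 1.0 if t > 0 else (0.5 if t == 0 else 0.0)


def kernel_thread(i, x, m, n, d_auc):
    """Work of the thread with global index i, emulated on the CPU."""
    if i >= len(d_auc):  # range check: surplus threads return at once
        return
    x1 = x[i:i + m]          # window X: x[i .. i+m-1]
    x2 = x[i + m:i + m + n]  # window Y: x[i+m .. i+m+n-1]
    acc = 0.0
    for a in x1:             # nested for loops over both windows
        for b in x2:
            acc += H(a - b)
    d_auc[i] = acc / (m * n)


def run_kernel(x, m, n, block_dim=32):
    """Visit every (block, thread) pair in turn; a GPU runs them concurrently."""
    d_auc = [0.0] * (len(x) - m - n + 1)
    grid_dim = (len(x) - m + block_dim) // block_dim
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel_thread(block_idx * block_dim + thread_idx, x, m, n, d_auc)
    return d_auc
```

Each call to kernel_thread writes only its own slot of d_auc, which is why the threads can run independently without synchronization.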
Note that the values of m and n can be set according to the actual situation. The sizes of the two sliding windows strongly affect the accuracy of the detection result: within a certain range, accuracy improves as the windows grow, but larger windows slow detection down. An appropriate window size must therefore be chosen for the actual environment, so that detection accuracy is ensured while parallel processing still improves detection speed.
S131: Call an API function on the CPU side to launch the kernel function on the GPU side and obtain the AUC estimates.
The preceding step describes how the kernel function executes. In a specific implementation, the kernel function runs only when it is invoked from the CPU. Specifically, the CPU calls the CUDA API function cudaDeviceReset to reset the GPU to a clean initial state, and then calls the kernel function, which executes on the GPU side and produces the AUC estimates for all window pairs. The execution configuration parameters of the kernel function are GridDim and BlockDim, which define the dimensions of the grid and of each block; they are used only when the kernel function is launched.
S132: the AUC estimates are ranked.
The sequencing in the step is to quickly determine whether a change point exists after an AUC estimated value is obtained on the CPU. As a preferred embodiment, the AUC estimates are ranked in particular by the cluster programming library in CUDA.
As another preferred embodiment, the sorting result is stored into a first array and a second array which are defined in advance;
the first array is used for storing the sequenced AUC estimated values, and the second array is used for storing the original positions of the sample data to be detected corresponding to the AUC estimated values.
The Standard Template Library (STL) -based Thrust programming library provides a generic parallel algorithm based on a generic, high-performance parallel computation can be realized with minimum programming cost. According to the threst provided by the thread library, the sort _ by _ key function can sort the elements in the d _ auc array according to a given standard, obtain the position of each sorted array element in the original d _ auc array, and define a first array key and a second array value.
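The effect of the key-value sort on the two arrays can be illustrated with a CPU stand-in. The Python function below is a hypothetical emulation, not the Thrust API: it assumes ascending order, uses the AUC estimates as keys and their original indices as values, and returns new lists, whereas the Thrust function reorders its input ranges in place.

```python
def sort_by_key(keys, values):
    """Sort keys ascending and apply the same permutation to values."""
    order = sorted(range(len(keys)), key=lambda k: keys[k])
    return [keys[k] for k in order], [values[k] for k in order]


# Hypothetical AUC estimates for six window pairs.
d_auc = [0.5, 0.33, 0.0, 0.17, 0.5, 1.0]
key, value = sort_by_key(d_auc, list(range(len(d_auc))))
# key holds the sorted estimates; value holds their original positions.
```

After sorting, the smallest and largest estimates sit at the two ends of key, and the matching entries of value give the window positions they came from.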
S14: Transfer the AUC estimates to the CPU side for change point detection.
After the AUC estimates have been computed in parallel in step S13, they are transferred to the CPU for change point detection. This step is the same as in the prior art, so how change point detection is performed once the AUC estimates are available is not described again here.
The CUDA-based fast change point detection method provided by this embodiment comprises: building a CUDA environment for change point detection in advance; collecting sample data to be detected and storing it on the CPU side; allocating storage space for the sample data on the GPU side and transferring the sample data from the CPU side to the video memory of the GPU side; controlling the GPU side to compute, in parallel, the AUC estimates corresponding to the sample data; and transferring the AUC estimates to the CPU side for change point detection. Because the method runs in a CUDA environment and the GPU processes the sample data in parallel, yielding many AUC estimates at once, the time required for change point detection is greatly shortened and real-time detection is ensured. The benefit is most pronounced when the amount of sample data is large.
On the basis of the above embodiment, when the change point detection result indicates that a change point exists, the method further includes:
outputting the specific position of the change point.
When the sample data to be detected contain a change point, the method provided in the above embodiment finds it quickly. To make the position of the change point easier to determine, this embodiment adds a step of outputting its specific position. The position may be presented in various forms, for example as a graph or as a number, which are not detailed in this embodiment.
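The source leaves the decision rule to the prior art, so the following Python sketch shows only one plausible way to turn the AUC estimates into a reported position. The rule itself (largest deviation from 0.5, compared against a threshold) and the threshold value 0.45 are assumptions, as is the function name. The returned position i + m follows from the window layout, since window Y starts at index i + m.

```python
def locate_change_point(d_auc, m, threshold=0.45):
    """Return the sample index of the most likely change point, or None.

    threshold is a hypothetical tuning value: the minimum |AUC - 0.5|
    that counts as a detection.
    """
    if not d_auc:
        return None
    best = max(range(len(d_auc)), key=lambda i: abs(d_auc[i] - 0.5))
    if abs(d_auc[best] - 0.5) < threshold:
        return None
    return best + m  # first sample of window Y, the boundary between X and Y
```

With the estimates [0.5, 0.3, 0.1, 0.0, 0.1, 0.3, 0.5] and m = 3, the extreme value lies at window position 3, so the reported change point is sample index 6.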
The embodiments above describe the CUDA-based fast change point detection method in detail. The present invention also provides a corresponding CUDA-based fast change point detection device. The device embodiments are described mainly in terms of functional modules, but are essentially the same as the method embodiments. Fig. 4 is a structural diagram of a CUDA-based fast change point detection device according to an embodiment of the present invention. As shown in Fig. 4, the device includes:
a building unit 10, configured to build a CUDA environment for change point detection in advance;
an acquisition unit 11, configured to collect sample data to be detected and store the sample data at the CPU end;
a transmission unit 12, configured to allocate storage space for the sample data to be detected at the GPU end and transfer the sample data from the CPU end to the video memory of the GPU end;
a control unit 13, configured to control the GPU end to compute in parallel the AUC estimates corresponding to the sample data to be detected;
and a detection unit 14, configured to transfer the AUC estimates to the CPU end for change point detection.
Since the device embodiments correspond to the method embodiments, refer to the description of the method embodiments for details of the device embodiments, which are not repeated here.
It will be appreciated that the units described as separate components may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The CUDA-based fast change point detection device provided by this embodiment builds a CUDA environment for change point detection in advance; collects the sample data to be detected and stores it at the CPU end; allocates storage space for the sample data at the GPU end and transfers the sample data from the CPU end to the video memory of the GPU end; controls the GPU end to compute, in parallel, the AUC estimates corresponding to the sample data; and transfers the AUC estimates to the CPU end for change point detection. Because the device operates in a CUDA environment and uses the GPU to process the sample data in parallel, multiple AUC estimates are obtained simultaneously, which greatly shortens the time required for change point detection and helps ensure its real-time performance. The benefit is most pronounced when the amount of sample data to be detected is large.
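The parallel computation triggered by the control unit assigns one thread to each window pair, as the claims describe: the number of threads follows from the number of window pairs, the grid size follows from a preset thread block size, and each thread's position in the grid serves as its global index into the window pairs. The Python sketch below emulates that arithmetic on the CPU. It assumes a one-dimensional launch and a block size of 256 threads, neither of which is stated in the source; the global index formula is the usual CUDA expression `blockIdx.x * blockDim.x + threadIdx.x`, and the function names are hypothetical.

```python
def launch_config(N, m, n, threads_per_block=256):
    """Return (number of window pairs, number of blocks) for a 1-D launch."""
    num_pairs = N - m - n + 1  # one thread per window pair
    if num_pairs <= 0:
        raise ValueError("data too short for the chosen window widths")
    # Round up so that every window pair gets a thread.
    blocks = (num_pairs + threads_per_block - 1) // threads_per_block
    return num_pairs, blocks


def active_global_indices(blocks, threads_per_block, num_pairs):
    """Emulate blockIdx.x * blockDim.x + threadIdx.x with a bounds check."""
    active = []
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            gid = block_idx * threads_per_block + thread_idx
            if gid < num_pairs:  # surplus threads in the last block do nothing
                active.append(gid)
    return active
```

For N = 1000 samples and window widths m = n = 20, `launch_config` gives 961 window pairs and 4 blocks of 256 threads; the 63 surplus threads in the last block fail the bounds check and stay idle.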
In addition, an embodiment of the present invention provides another CUDA-based fast change point detection device, described mainly in terms of hardware entities but essentially the same as the method embodiments. Fig. 5 is a structural diagram of this other CUDA-based fast change point detection device. As shown in Fig. 5, the device includes a memory 20 for storing a computer program;
and a processor 21, configured to implement the steps of the CUDA-based fast change point detection method described above when executing the computer program.
Since the device embodiments correspond to the method embodiments, refer to the description of the method embodiments for details of the device embodiments, which are not repeated here.
The CUDA-based fast change point detection device provided by this embodiment builds a CUDA environment for change point detection in advance; collects the sample data to be detected and stores it at the CPU end; allocates storage space for the sample data at the GPU end and transfers the sample data from the CPU end to the video memory of the GPU end; controls the GPU end to compute, in parallel, the AUC estimates corresponding to the sample data; and transfers the AUC estimates to the CPU end for change point detection. Because the device operates in a CUDA environment and uses the GPU to process the sample data in parallel, multiple AUC estimates are obtained simultaneously, which greatly shortens the time required for change point detection and helps ensure its real-time performance. The benefit is most pronounced when the amount of sample data to be detected is large.
Finally, the present invention also provides an embodiment of a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the CUDA-based fast change point detection method of the above embodiments.
If the above units are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product that is stored in a storage medium and includes instructions for causing a mobile terminal (which may be a mobile phone, a tablet computer, or a handheld device) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The computer-readable storage medium provided by this embodiment is used to execute a CUDA-based fast change point detection method, which comprises: building a CUDA environment for change point detection in advance; collecting the sample data to be detected and storing it at the CPU end; allocating storage space for the sample data at the GPU end and transferring the sample data from the CPU end to the video memory of the GPU end; controlling the GPU end to compute, in parallel, the AUC estimates corresponding to the sample data; and transferring the AUC estimates to the CPU end for change point detection. Because the method runs in a CUDA environment and uses the GPU to process the sample data in parallel, multiple AUC estimates are obtained simultaneously, which greatly shortens the time required for change point detection and helps ensure its real-time performance. The benefit is most pronounced when the amount of sample data to be detected is large.
The CUDA-based fast change point detection method, device, and storage medium provided by the present invention have been described in detail above. The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the parts that are the same or similar can be cross-referenced among the embodiments. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, it is described only briefly, and the relevant details can be found in the description of the method. It should be noted that those skilled in the art may make various improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present invention.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (8)

1. A rapid variable point detection method based on CUDA is characterized by comprising the following steps:
pre-building a CUDA environment for variable point detection;
collecting sample data to be detected, and storing the sample data to be detected at a CPU end;
distributing a storage space for the sample data to be detected at a GPU end, and transmitting the sample data to be detected from the CPU end to a video memory of the GPU end;
controlling the GPU end to calculate AUC estimated values corresponding to the sample data to be detected in parallel; the controlling the GPU terminal to calculate the AUC estimated value corresponding to the sample data to be detected in parallel specifically includes: defining a kernel function running at the GPU end to perform parallel computation on a plurality of pairs of windows corresponding to the samples to be detected; calling an API function at the CPU end to control the kernel function to run on the GPU end to obtain the AUC estimated value; ranking said AUC estimates;
the defining a kernel function running at the GPU end to perform parallel computation on a plurality of pairs of windows corresponding to the samples to be detected specifically includes: determining the number of window pairs in parallel computing according to the dimensionality of the sample to be detected so as to determine the number of threads required; calculating to obtain grid parameters corresponding to the kernel function in parallel calculation according to the parameters of the preset thread blocks and the number of the threads; determining the position of each thread in the grid to serve as the kernel function to access the global index of each pair of windows;
and transmitting the AUC estimated value to the CPU end for variable point detection.
2. The CUDA-based fast change point detection method according to claim 1, wherein the number of the window pairs is N-m-n+1, and the number of the threads is N-m-n+1;
wherein N is the dimension of the sample data to be detected, and m and n are the widths of the two windows in each pair of windows, respectively.
3. The CUDA-based rapid change point detection method of claim 1, wherein the ranking of the AUC estimates is specifically performed by a Thrust programming library in the CUDA.
4. A CUDA-based fast change point detection method according to any of claims 1-3, further comprising after sorting:
storing the sorting result into a first array and a second array which are defined in advance;
the first array is used for storing the sequenced AUC estimated values, and the second array is used for storing the original positions of the sample data to be detected corresponding to the AUC estimated values.
5. The CUDA-based fast change point detection method of claim 4, wherein when there is a change point in the change point detection result, further comprising:
and outputting the specific position of the change point.
6. A quick change point detection device based on CUDA, characterized by comprising:
the building unit is used for building a CUDA environment for variable point detection in advance;
the acquisition unit is used for acquiring sample data to be detected and storing the sample data to be detected at a CPU end;
the transmission unit is used for distributing a storage space for the sample data to be detected at the GPU end and transmitting the sample data to be detected from the CPU end to a video memory of the GPU end;
the control unit is used for controlling the GPU end to calculate AUC estimated values corresponding to the sample data to be detected in parallel; the controlling the GPU terminal to calculate the AUC estimated value corresponding to the sample data to be detected in parallel specifically includes: defining a kernel function running at the GPU end to perform parallel computation on a plurality of pairs of windows corresponding to the samples to be detected; calling an API function at the CPU end to control the kernel function to run on the GPU end to obtain the AUC estimated value; ranking said AUC estimates;
the defining a kernel function running at the GPU end to perform parallel computation on a plurality of pairs of windows corresponding to the samples to be detected specifically includes: determining the number of window pairs in parallel computing according to the dimensionality of the sample to be detected so as to determine the number of threads required; calculating to obtain grid parameters corresponding to the kernel function in parallel calculation according to the parameters of the preset thread blocks and the number of the threads; determining the position of each thread in the grid to serve as the kernel function to access the global index of each pair of windows;
and the detection unit is used for transmitting the AUC estimated value to the CPU end for variable point detection.
7. A fast change point detection device based on CUDA is characterized by comprising a memory for storing a computer program;
a processor for implementing the steps of the CUDA based fast change point detection method according to any of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the CUDA-based fast change point detection method according to any one of claims 1 to 5.
CN201810432528.4A 2018-05-08 2018-05-08 Fast variable point detection method and device based on CUDA (compute unified device architecture) and storage medium Active CN108664348B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810432528.4A CN108664348B (en) 2018-05-08 2018-05-08 Fast variable point detection method and device based on CUDA (compute unified device architecture) and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810432528.4A CN108664348B (en) 2018-05-08 2018-05-08 Fast variable point detection method and device based on CUDA (compute unified device architecture) and storage medium

Publications (2)

Publication Number Publication Date
CN108664348A (en) 2018-10-16
CN108664348B (en) 2021-08-27

Family

ID=63778972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810432528.4A Active CN108664348B (en) 2018-05-08 2018-05-08 Fast variable point detection method and device based on CUDA (compute unified device architecture) and storage medium

Country Status (1)

Country Link
CN (1) CN108664348B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111024078B (en) * 2019-11-05 2021-03-16 广东工业大学 Unmanned aerial vehicle vision SLAM method based on GPU acceleration
CN112229989A (en) * 2020-10-19 2021-01-15 广州吉源生物科技有限公司 Biological sample identification equipment of GPU (graphics processing Unit) technology
CN113918356B (en) * 2021-12-13 2022-02-18 广东睿江云计算股份有限公司 Method and device for quickly synchronizing data based on CUDA (compute unified device architecture), computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7199841B2 (en) * 2001-12-28 2007-04-03 Lg Electronics Inc. Apparatus for automatically generating video highlights and method thereof
CN103871021A (en) * 2014-02-27 2014-06-18 电子科技大学 CPU (central processing unit)-GPU (graphic processing unit) cooperative work target track initializing method
CN104021519A (en) * 2014-06-17 2014-09-03 电子科技大学 Maneuvering multi-target tracking algorithm under dense clutter condition based on GPU architecture

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
《Defect detection in textured materials using Gabor filters 》;Ajay Kumar等;《Conference Record of the 2000 IEEE Industry Applications Conference》;20001012;全文 *
《基于AUC的变点检测》;吴学龙;《中国优秀硕士学位论文全文数据库》;20151015;全文 *
《基于AUC的非参数快速变点检测算法》;吴学龙,徐维超等;《计算机与现代化》;20150731;全文 *
《基于CUDA的Gabor滤波无纺布疵点在线检测》;刘海平;《计算机与数字工程》;20150930;全文 *

Also Published As

Publication number Publication date
CN108664348A (en) 2018-10-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant