WO2023000251A1 - Method and apparatus for constructing kernel density estimator, and electronic device and medium

Info

Publication number: WO2023000251A1
Application number: PCT/CN2021/107837
Authority: WO
Inventors: 何玉林; 黄德发
Original assignee: 深圳大学
Priority date: 2021-07-22
Filing date: 2021-07-22
Publication date: 2023-01-26

Abstract

Disclosed in the present application are a method and apparatus for constructing a kernel density estimator, and an electronic device and a medium. The method comprises: constructing K data block pairs on the basis of an original data set, wherein each data block pair comprises a training block and a verification block, and the original data set comprises N data samples, K and N both being natural numbers greater than 1; constructing a Gaussian kernel density estimator for each training block, so as to obtain a Gaussian kernel density estimator corresponding to each training block; calculating, by using the Gaussian kernel density estimator corresponding to each training block, an average kernel density corresponding to each verification block; and constructing a final kernel density estimator on the basis of the average kernel densities corresponding to all the verification blocks and a pre-determined structural risk item function. By means of the embodiments of the present application, the stability and accuracy of kernel density estimation can be improved.

Description

Method and apparatus for constructing a kernel density estimator, and electronic device and medium

Technical field

The embodiments of the present application relate to the technical field of data mining, for example, to a method and apparatus for constructing a kernel density estimator, an electronic device, and a medium.

Background

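Estimating the probability density function of a data set drawn from an unknown distribution is an important research direction in machine learning and data mining. How to effectively improve the stability and accuracy of the classical kernel density estimator is the core issue of probability density function estimation. For a data set $\{x_1, x_2, \ldots, x_N\}$, the kernel density estimator is a probability density estimation function of the form:

$$\hat{f}_h(x) = \frac{1}{Nh} \sum_{n=1}^{N} K\left(\frac{x - x_n}{h}\right)$$

where h is the window width (bandwidth) and $K(\cdot)$ is the kernel function. The kernel function commonly used by the classical kernel density estimator is the Gaussian function:

$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$$

The classical cross-validation kernel density estimator (UCV-KDE) uses the unbiased cross-validation (UCV) method, based on the integrated squared error (ISE) loss function, to find the optimal window width h with which the kernel density estimator (KDE) is constructed. The expression of UCV is:

$$\mathrm{UCV}(h) = \int \hat{f}_h(x)^2 \, dx - 2\, E\left[\hat{f}_h(x)\right]$$

For the Gaussian kernel, the first integral is calculated as:

$$\int \hat{f}_h(x)^2 \, dx = \frac{1}{2\sqrt{\pi}\, N^2 h} \sum_{i=1}^{N} \sum_{j=1}^{N} \exp\left(-\frac{(x_i - x_j)^2}{4h^2}\right)$$

In the second term, $E[\hat{f}_h(x)]$ is the expected value of the kernel density estimate, which the UCV method obtains by leave-one-out (one-sample) cross-validation:

$$E\left[\hat{f}_h(x)\right] = \frac{1}{N} \sum_{n=1}^{N} \hat{f}_{h,-n}(x_n), \qquad \hat{f}_{h,-n}(x_n) = \frac{1}{(N-1)h} \sum_{m \neq n} K\left(\frac{x_n - x_m}{h}\right)$$

The standard deviation of the kernel density estimate can be obtained at the same time, from the spread of the N leave-one-out values $\hat{f}_{h,-n}(x_n)$ about this expected value.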

However, when the UCV method is used to solve for the expected value of the kernel density estimate, the estimate is unstable and the corresponding standard deviation is relatively large. The KDE finally constructed also deviates from the true distribution, so its stability and accuracy need to be further improved.
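
The classic UCV criterion described above can be sketched in a few lines of Python. This is a minimal illustration rather than the applicant's implementation: the function names are hypothetical, the data are one-dimensional, and the closed form used for the integral of the squared estimate holds only for the Gaussian kernel.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Kernel density estimate at point x with window width h."""
    return sum(gaussian_kernel((x - s) / h) for s in samples) / (len(samples) * h)


def loo_densities(samples, h):
    """Leave-one-out density of every sample (one-sample cross-validation)."""
    n = len(samples)
    result = []
    for i, xi in enumerate(samples):
        total = sum(gaussian_kernel((xi - xj) / h)
                    for j, xj in enumerate(samples) if j != i)
        result.append(total / ((n - 1) * h))
    return result


def integral_of_squared_kde(samples, h):
    """Closed form of the integral of the squared Gaussian KDE."""
    n = len(samples)
    total = sum(math.exp(-((xi - xj) ** 2) / (4.0 * h * h))
                for xi in samples for xj in samples)
    return total / (2.0 * math.sqrt(math.pi) * n * n * h)


def ucv(samples, h):
    """UCV(h): integral of the squared KDE minus twice the leave-one-out mean."""
    densities = loo_densities(samples, h)
    expected = sum(densities) / len(densities)
    return integral_of_squared_kde(samples, h) - 2.0 * expected
```

Each call to `ucv` costs on the order of N² kernel evaluations, so a search over many candidate window widths is quadratic in the data set size per candidate.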

Summary of the invention

The present application provides a method and apparatus for constructing a kernel density estimator, an electronic device, and a medium, which can improve the stability and accuracy of kernel density estimation.

In the first aspect, the embodiment of the present application provides a method for constructing a kernel density estimator, the method comprising:

Construct K data block pairs based on the original data set; wherein, each data block pair includes a training block and a verification block; the original data set includes N data samples; K and N are both natural numbers greater than 1;

Construct a Gaussian kernel density estimator for each training block, and obtain the Gaussian kernel density estimator corresponding to each training block;

Use the Gaussian kernel density estimator corresponding to each training block to calculate the average kernel density corresponding to each verification block;

Based on the average kernel density corresponding to all verification blocks and the predetermined structural risk term function, the final kernel density estimator is constructed.

In the second aspect, the embodiment of the present application also provides an apparatus for constructing a kernel density estimator, the apparatus including: a construction module and a calculation module; wherein,

The construction module is configured to construct K data block pairs based on the original data set, wherein each data block pair includes a training block and a verification block, the original data set includes N data samples, and K and N are both natural numbers greater than 1; and to construct a Gaussian kernel density estimator for each training block, obtaining the Gaussian kernel density estimator corresponding to each training block;

The calculation module is configured to calculate the average kernel density corresponding to each verification block by using the Gaussian kernel density estimator corresponding to each training block;

The construction module is further configured to construct a final kernel density estimator based on the average kernel densities corresponding to all verification blocks and a predetermined structural risk term function.

In a third aspect, the embodiment of the present application provides an electronic device, including:

one or more processors;

memory for storing one or more programs,

When the one or more programs are executed by the one or more processors, the one or more processors are made to implement the method for constructing a kernel density estimator described in any embodiment of the present application.

In a fourth aspect, an embodiment of the present application provides a storage medium on which a computer program is stored, and when the program is executed by a processor, the method for constructing a kernel density estimator described in any embodiment of the present application is implemented.

Description of the drawings

Fig. 1 is a first schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application;

Fig. 2 is a second schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application;

Fig. 3 is a third schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application;

Fig. 4 is a comparison chart of the standard deviation of the kernel density estimate calculated by the UCV method and the EUCV method on normally distributed data sets;

Fig. 5 is a comparison chart of the standard deviation of the kernel density estimate calculated by the UCV method and the EUCV method on Beta-distributed data sets;

Fig. 6 is a comparison chart of the standard deviation of the kernel density estimate calculated by the UCV method and the EUCV method on F-distributed data sets;

Fig. 7 is a schematic diagram of the window width h as a function of K, calculated on a normally distributed data set;

Fig. 8 is a schematic diagram of the MAE as a function of K, calculated on the original normally distributed data set;

Fig. 9 is a schematic diagram of the MAE as a function of K, calculated on a normally distributed test data set;

Fig. 10 is a schematic diagram of the window width h as a function of K, calculated on a Beta-distributed data set;

Fig. 11 is a schematic diagram of the MAE as a function of K, calculated on the original Beta-distributed data set;

Fig. 12 is a schematic diagram of the MAE as a function of K, calculated on a Beta-distributed test data set;

Fig. 13 is a schematic diagram of the window width h as a function of K, calculated on an F-distributed data set;

Fig. 14 is a schematic diagram of the MAE as a function of K, calculated on the original F-distributed data set;

Fig. 15 is a schematic diagram of the MAE as a function of K, calculated on an F-distributed test data set;

Fig. 16 is a schematic structural diagram of an apparatus for constructing a kernel density estimator provided by an embodiment of the present application;

Fig. 17 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

The application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, but not to limit the present application. In addition, it should be noted that, for the convenience of description, only some structures related to the present application are shown in the drawings but not all structures.

Embodiment one

Fig. 1 is a first schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application. The method can be executed by an apparatus for constructing a kernel density estimator or by an electronic device; the apparatus or electronic device can be implemented in software and/or hardware and can be integrated into any smart device with a network communication function. As shown in Fig. 1, the method for constructing a kernel density estimator may include the following steps:

S101. Construct K data block pairs based on the original data set; wherein, each data block pair includes a training block and a verification block; the original data set includes N data samples; K and N are both natural numbers greater than 1.

In this step, the electronic device can construct K data block pairs based on the original data set, wherein each data block pair includes a training block and a verification block, the original data set includes N data samples, and K and N are both natural numbers greater than 1. The electronic device can first randomly extract X data samples from the N data samples in the original data set and use them as the training block of the current data block pair; it then uses the remaining Y data samples in the original data set as the verification block of the current data block pair. The training block and the verification block together form one of the K data block pairs. These operations are repeated until K data block pairs have been constructed, where N = X + Y and both X and Y are natural numbers greater than or equal to 1 and less than N.

For example, assume the original data set includes 10 data samples. When constructing the first data block pair, 3 samples are randomly selected from the 10 data samples and used as the training block of the first data block pair, and the remaining 7 data samples are used as its verification block; the training block of 3 data samples and the verification block of 7 data samples from this first extraction form the first data block pair. When constructing the second data block pair, 3 samples are again randomly selected from the 10 data samples and used as the training block, and the remaining 7 data samples are used as the verification block; the training block and verification block from this second extraction form the second data block pair. This continues until K data block pairs have been constructed.
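
The sampling procedure of S101 can be sketched as follows. This is a minimal sketch, assuming the data set is held in a Python list; the function name `build_block_pairs` and the `seed` parameter are hypothetical and do not appear in the application.

```python
import random


def build_block_pairs(data, num_pairs, train_size, seed=None):
    """Build num_pairs (training block, verification block) pairs.

    Each pair is produced by one round of simple random sampling without
    replacement: train_size samples (X) form the training block and the
    remaining len(data) - train_size samples (Y) form the verification block.
    """
    n = len(data)
    if not 1 <= train_size < n:
        raise ValueError("train_size must satisfy 1 <= train_size < len(data)")
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        chosen = set(rng.sample(range(n), train_size))
        training = [data[i] for i in range(n) if i in chosen]
        verification = [data[i] for i in range(n) if i not in chosen]
        pairs.append((training, verification))
    return pairs
```

With the numbers from the example, `build_block_pairs(samples, K, 3)` on a 10-sample list returns K pairs, each holding a 3-sample training block and a 7-sample verification block. Each round samples from the full data set again, so different pairs may share samples.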

S102. Construct a Gaussian kernel density estimator for each training block, and obtain a Gaussian kernel density estimator corresponding to each training block.

In this step, the electronic device may construct a Gaussian kernel density estimator for each training block, obtaining the Gaussian kernel density estimator corresponding to each training block. The electronic device can first construct a Gaussian kernel function corresponding to each data sample in each training block, and then obtain the Gaussian kernel density estimator corresponding to each training block based on those Gaussian kernel functions and the predetermined number of data samples in each training block.

S103. Using the Gaussian kernel density estimator corresponding to each training block, calculate the average kernel density corresponding to each verification block.

In this step, the electronic device may use the Gaussian kernel density estimator corresponding to each training block to calculate the average kernel density corresponding to each verification block. The electronic device can first use the Gaussian kernel density estimator corresponding to each training block to calculate the kernel density of each data sample in the corresponding verification block, and then calculate the average kernel density of each verification block from the kernel densities of its data samples.

S104. Construct a final kernel density estimator based on the average kernel density corresponding to all verification blocks and the predetermined structural risk term function.

In this step, the electronic device can construct a final kernel density estimator based on the average kernel densities corresponding to all verification blocks and the predetermined structural risk term function. The electronic device can take the mean of the average kernel densities of the K verification blocks; the application can also obtain the standard deviation of those K average kernel densities. This yields the improved UCV, namely the ensemble unbiased cross-validation (EUCV) method. In addition, to improve the accuracy of the algorithm, this application adds a structural risk term function to the improved UCV and obtains the EUCV expression based on that function; the EUCV expression is then used to find the optimal window width, and this optimal window width is finally used to build the final kernel density estimator.

The method for constructing a kernel density estimator proposed in the embodiment of the present application first constructs K data block pairs based on the original data set, wherein each data block pair includes a training block and a verification block; it then constructs a Gaussian kernel density estimator for each training block, obtaining the Gaussian kernel density estimator corresponding to each training block; next it uses the Gaussian kernel density estimator corresponding to each training block to calculate the average kernel density corresponding to each verification block; finally it constructs the final kernel density estimator based on the average kernel densities corresponding to all verification blocks and a predetermined structural risk term function. In other words, the technical solution of this application combines cross-validation with simple random sampling: the original data set is divided by repeated simple random sampling and multiple kernel density estimators are constructed, so that cross-validation obtains the expected value of the kernel density estimate with a smaller standard deviation and greater stability. By introducing a structural risk term function, a better window width is found, which improves the accuracy of the final kernel density estimator. Therefore, compared with the related art, the method proposed in the embodiment of the present application can improve the stability and accuracy of kernel density estimation; moreover, the technical solution is simple to implement, easy to popularize, and widely applicable.

Embodiment two

Fig. 2 is a second schematic flow chart of the construction method of the kernel density estimator provided by the embodiment of the present application. The description and expansion are made based on the above technical solutions, and may be combined with the above optional implementation modes. As shown in Figure 2, the construction method of the kernel density estimator may include the following steps:

S201. Construct K data block pairs based on the original data set; wherein, each data block pair includes a training block and a verification block; the original data set includes N data samples; K and N are both natural numbers greater than 1.

S202. Construct a Gaussian kernel function corresponding to each data sample in each training block.

In this step, the electronic device may construct a Gaussian kernel function corresponding to each data sample in each training block. The Gaussian kernel function commonly used by the classical kernel density estimator can be expressed as:

$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$$

S203. Based on the Gaussian kernel function corresponding to each data sample in each training block and the predetermined number of data samples in each training block, obtain a Gaussian kernel density estimator corresponding to each training block.

In this step, the electronic device may obtain the Gaussian kernel density estimator corresponding to each training block based on the Gaussian kernel function corresponding to each data sample in each training block and the predetermined number of data samples in each training block. For the kth training block $\tilde{X}^{(k)}$, k = 1, 2, ..., K, the Gaussian kernel density estimator is built according to the following formula:

$$\hat{f}^{(k)}(x) = \frac{1}{\tilde{N} h} \sum_{n=1}^{\tilde{N}} K\left(\frac{x - \tilde{x}_n^{(k)}}{h}\right)$$

In the above formula, $\hat{f}^{(k)}(x)$ denotes the Gaussian kernel density estimator corresponding to the kth training block; $\tilde{N}$ denotes the number of data samples in the training block, and its value is the same as that of X; $\tilde{x}_n^{(k)}$ denotes the nth data sample in the kth training block; n denotes the index of each data sample within its training block, with $1 \le n \le \tilde{N}$; and $\tilde{X}^{(k)}$ denotes the kth training block. In this way, K Gaussian kernel density estimators $\hat{f}^{(1)}(x), \hat{f}^{(2)}(x), \ldots, \hat{f}^{(K)}(x)$ are obtained from the K training blocks.
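
Steps S202 and S203 can be illustrated by a small factory function that turns one training block into a callable estimator. This is a sketch under stated assumptions: one-dimensional samples, a shared window width h for all blocks, and a hypothetical function name `make_gaussian_kde`.

```python
import math


def make_gaussian_kde(training_block, h):
    """Return the Gaussian kernel density estimator of one training block.

    The returned function sums one Gaussian kernel per training sample and
    divides by (number of training samples) * h.
    """
    count = len(training_block)
    if count == 0 or h <= 0:
        raise ValueError("need a non-empty training block and h > 0")
    norm = 1.0 / (count * h * math.sqrt(2.0 * math.pi))

    def estimator(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2)
                          for s in training_block)

    return estimator
```

Given a list `training_blocks` of K training blocks, `[make_gaussian_kde(block, h) for block in training_blocks]` yields the K estimators, one per block.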

S204. Calculate the kernel density of each data sample in each verification block by using the Gaussian kernel density estimator corresponding to each training block.

S205. Based on the kernel density of each data sample in each verification block, calculate an average kernel density corresponding to each verification block.

In a specific embodiment of this application, given a window width h, for the kth verification block $X^{(k)}$, k = 1, 2, ..., K, the kth Gaussian kernel density estimator $\hat{f}^{(k)}(x)$, built on the corresponding training block $\tilde{X}^{(k)}$, is used to compute the kernel density of each data sample in the verification block, and the average is taken. The specific calculation is as follows:

Step 1) For each data sample $x_m^{(1)}$, m = 1, 2, ..., N, in the first verification block $X^{(1)}$, compute the Gaussian kernel function between it and each data sample in the first training block according to the predetermined Gaussian kernel function formula:

$$K\left(\frac{x_m^{(1)} - \tilde{x}_n^{(1)}}{h}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\left(x_m^{(1)} - \tilde{x}_n^{(1)}\right)^2}{2h^2}\right), \qquad n = 1, 2, \ldots, \tilde{N}$$

In the above formula, the left-hand side denotes the Gaussian kernel function constructed from the mth data sample in the first verification block and the nth data sample in the first training block; N denotes the number of data samples in the verification block, and its value is the same as that of Y; $\tilde{N}$ denotes the number of data samples in the training block, and its value is the same as that of X; $x_m^{(1)}$ denotes the mth data sample in the first verification block; $\tilde{x}_n^{(1)}$ denotes the nth data sample in the first training block; and $\tilde{X}^{(1)}$ denotes the first training block.

Step 2) After step 1), each data sample of the first verification block has $\tilde{N}$ Gaussian kernel functions. These $\tilde{N}$ Gaussian kernel functions are summed and divided by $\tilde{N}h$ to give the kernel density of each data sample:

$$\hat{f}^{(1)}\left(x_m^{(1)}\right) = \frac{1}{\tilde{N} h} \sum_{n=1}^{\tilde{N}} K\left(\frac{x_m^{(1)} - \tilde{x}_n^{(1)}}{h}\right)$$

Step 3) The N kernel densities $\hat{f}^{(1)}(x_1^{(1)}), \ldots, \hat{f}^{(1)}(x_N^{(1)})$ obtained in step 2) are averaged according to the following formula:

$$\bar{f}^{(1)} = \frac{1}{N} \sum_{m=1}^{N} \hat{f}^{(1)}\left(x_m^{(1)}\right)$$

This average is the average kernel density of the first verification block $X^{(1)}$.

Step 4) For the other verification blocks, the same method is used to obtain their average kernel densities. Finally, the average kernel densities $\bar{f}^{(1)}, \bar{f}^{(2)}, \ldots, \bar{f}^{(K)}$ of the K verification blocks are obtained.
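
Steps 1) to 4) above can be traced in code as follows. This is a minimal sketch for one-dimensional data; the function names are hypothetical, and the intermediate list of kernel values is kept explicit only to mirror the steps.

```python
import math


def kernel_values(x, training_block, h):
    """Step 1: one Gaussian kernel value per training sample for point x."""
    return [math.exp(-0.5 * ((x - s) / h) ** 2) / math.sqrt(2.0 * math.pi)
            for s in training_block]


def kernel_density(x, training_block, h):
    """Step 2: sum the kernel values and divide by (training size * h)."""
    return sum(kernel_values(x, training_block, h)) / (len(training_block) * h)


def average_kernel_density(verification_block, training_block, h):
    """Step 3: mean kernel density over all verification samples."""
    densities = [kernel_density(x, training_block, h) for x in verification_block]
    return sum(densities) / len(densities)


def all_average_kernel_densities(pairs, h):
    """Step 4: repeat for every (training block, verification block) pair."""
    return [average_kernel_density(verification, training, h)
            for training, verification in pairs]
```

The cost per pair is X × Y kernel evaluations, so the total cost for K pairs at one window width is K × X × Y.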

S206. Construct a final kernel density estimator based on the average kernel density corresponding to all verification blocks and the predetermined structural risk term function.

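In a specific embodiment of the present application, the average kernel densities $\bar{f}^{(1)}, \bar{f}^{(2)}, \ldots, \bar{f}^{(K)}$ of the K verification blocks obtained in steps 1) to 4) above are averaged to give the expected value of the kernel density estimate:

$$E_{\mathrm{EUCV}}\left[\hat{f}(x)\right] = \frac{1}{K} \sum_{k=1}^{K} \bar{f}^{(k)}$$

where $\bar{f}^{(k)}$ is the average kernel density of the kth verification block.

At the same time, the standard deviation of the kernel density estimate can be obtained from the spread of the K average kernel densities about this expected value.

The improved UCV, that is, the ensemble unbiased cross-validation method, is then obtained by using this expected value as the second term of UCV:

$$\mathrm{EUCV}(h) = \int \hat{f}_h(x)^2 \, dx - 2\, E_{\mathrm{EUCV}}\left[\hat{f}(x)\right]$$

where $\hat{f}_h(x)$ is the kernel density estimator with window width h.

Optionally, in a specific embodiment of this application, in order to improve the accuracy of the algorithm, a structural risk term ξg(h) can be added to the expression of EUCV(h):

$$\mathrm{EUCV}(h) = \int \hat{f}_h(x)^2 \, dx - 2\, E_{\mathrm{EUCV}}\left[\hat{f}(x)\right] + \xi\, g(h)$$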

Among them, ξ is called the structural risk coefficient, and satisfies ξ≥0. When ξ=0, it is equivalent to not introducing structural risk items. g(h) is called the structural risk function, and an appropriate convex function can be selected. The structural risk function that can be used in this application can be expressed as:

At this point, the present application has completed the modification of the UCV method and obtained the expression of the new EUCV method. Next, for a given sampling rate r, number of sampling rounds K, and structural risk coefficient ξ, a search over a suitable range of h > 0 finds the value of h that minimizes the EUCV expression; this value is the optimal window width. For example, the range from 0.01 to 1.00 is searched with a step size of 0.001 to find the h value that minimizes the EUCV expression. Finally, the final kernel density estimator is constructed by substituting this window width into the expression of the kernel density estimator.
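
The window-width search described above can be sketched end to end as follows. Several points are assumptions and not taken from the application: the sampling rate r is read as the fraction of samples drawn into each training block, the EUCV criterion is taken to be the integral of the squared full-data estimate minus twice the mean of the block averages plus ξg(h), and `placeholder_risk` (g(h) = h²) is a hypothetical convex stand-in because the application's own structural risk function is not reproduced in this text. All function names are illustrative.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)


def build_block_pairs(data, num_pairs, rate, seed=None):
    """Draw num_pairs (training, verification) splits by simple random sampling."""
    n = len(data)
    train_size = int(round(rate * n))
    if not 1 <= train_size < n:
        raise ValueError("rate must leave both blocks non-empty")
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        chosen = set(rng.sample(range(n), train_size))
        pairs.append(([data[i] for i in range(n) if i in chosen],
                      [data[i] for i in range(n) if i not in chosen]))
    return pairs


def density(x, block, h):
    """Gaussian kernel density of x estimated from the samples in block."""
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in block)
    return total / (len(block) * h * SQRT_2PI)


def squared_kde_integral(data, h):
    """Closed-form integral of the squared Gaussian KDE built on all data."""
    n = len(data)
    total = sum(math.exp(-((a - b) ** 2) / (4.0 * h * h))
                for a in data for b in data)
    return total / (2.0 * math.sqrt(math.pi) * n * n * h)


def placeholder_risk(h):
    """Hypothetical convex g(h); NOT the function used in the application."""
    return h * h


def eucv(data, pairs, h, xi=0.0, g=placeholder_risk):
    """Assumed EUCV criterion for one window width h."""
    block_means = [sum(density(x, train, h) for x in valid) / len(valid)
                   for train, valid in pairs]
    expected = sum(block_means) / len(block_means)
    return squared_kde_integral(data, h) - 2.0 * expected + xi * g(h)


def search_window_width(data, pairs, xi=0.0, g=placeholder_risk,
                        start=0.01, stop=1.0, step=0.001):
    """Grid search for the h in [start, stop] that minimizes eucv."""
    count = int(round((stop - start) / step)) + 1
    candidates = [start + i * step for i in range(count)]
    return min(candidates, key=lambda h: eucv(data, pairs, h, xi, g))
```

The block pairs are built once and reused for every candidate h, so the criterion values are comparable across the grid. The default grid has 991 candidates and each evaluation is quadratic in the data set size, so a coarser step may be preferable for a first pass on large data sets.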

Embodiment three

Fig. 3 is a third schematic flow chart of the construction method of the kernel density estimator provided by the embodiment of the present application. The description and expansion are made based on the above technical solutions, and may be combined with the above optional implementation modes. As shown in Figure 3, the construction method of the kernel density estimator may include the following steps:

S301. Construct K data block pairs based on the original data set; wherein, each data block pair includes a training block and a verification block; the original data set includes N data samples; K and N are both natural numbers greater than 1.

S302. Construct a Gaussian kernel function corresponding to each data sample in each training block.

S303. Based on the Gaussian kernel function corresponding to each data sample in each training block and the predetermined number of data samples in each training block, obtain a Gaussian kernel density estimator corresponding to each training block.

S304. Based on the Gaussian kernel density estimator corresponding to each training block, calculate the Gaussian kernel function between each data sample in each verification block and each data sample in its corresponding training block, obtaining X × Y Gaussian kernel functions.

In a specific embodiment of the application, the value of X is the number of data samples in each training block, and the value of Y is the number of data samples N in each verification block.

S305. Obtain Y kernel densities corresponding to each verification block based on the X Gaussian kernel functions corresponding to each data sample of each verification block and the predetermined number of data samples in each training block.

S306. Based on the kernel density of each data sample in each verification block, calculate an average kernel density corresponding to each verification block.

S307. Construct a final kernel density estimator based on the average kernel density corresponding to all verification blocks and the predetermined structural risk term function.

This application combines the cross-validation method with simple random sampling: the original data set is divided by repeated simple random sampling and multiple kernel density estimators are constructed, so that cross-validation obtains the expected value of the kernel density estimate with a smaller standard deviation and greater stability. By introducing the structural risk term ξg(h), a better window width is found, which improves the accuracy of the final kernel density estimator.

The stability and accuracy experiments are compared below. In the stability experiment, the UCV method and the EUCV method were each used to calculate the standard deviation of the kernel density estimate on 100 data sets of each probability distribution. For the EUCV method, the sampling rate r is set to 0.7, the number of sampling rounds K is set to 200, and the structural risk coefficient ξ = 0. Each probability distribution therefore yields 100 standard deviations from the UCV method and 100 from the EUCV method, and these 200 standard deviations are compared for each probability distribution.
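
The stability comparison for a single data set can be sketched as follows. The sketch makes two assumptions that are not confirmed by this text: the UCV standard deviation is taken over the N leave-one-out densities, and the EUCV standard deviation is taken over the K block averages, both as population standard deviations (`statistics.pstdev`). The function names are hypothetical.

```python
import math
import random
import statistics

SQRT_2PI = math.sqrt(2.0 * math.pi)


def density(x, block, h):
    """Gaussian kernel density of x estimated from the samples in block."""
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in block)
    return total / (len(block) * h * SQRT_2PI)


def ucv_std(data, h):
    """Spread of the leave-one-out densities (one-sample cross-validation)."""
    loo = [density(x, data[:i] + data[i + 1:], h) for i, x in enumerate(data)]
    return statistics.pstdev(loo)


def eucv_std(data, h, num_pairs, rate, rng):
    """Spread of the average kernel densities of num_pairs verification blocks."""
    n = len(data)
    train_size = int(round(rate * n))
    means = []
    for _ in range(num_pairs):
        chosen = set(rng.sample(range(n), train_size))
        train = [data[i] for i in range(n) if i in chosen]
        valid = [data[i] for i in range(n) if i not in chosen]
        means.append(sum(density(x, train, h) for x in valid) / len(valid))
    return statistics.pstdev(means)
```

Under these assumptions the EUCV spread is smaller largely because each of its K values is already an average over a whole verification block, whereas each UCV value is the density at a single held-out sample.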

Figure 4 is a comparison chart of the standard deviation of kernel density estimates calculated by the UCV method and the EUCV method on a normally distributed data set. The information of the normal distribution data set is as follows: μ=0, σ=1; the number of data sets is 100; the number of samples N of each data set is 1000. It can be seen from the above experimental comparison figure that the standard deviation of the kernel density estimation obtained by the EUCV method is significantly lower than that of the kernel density estimation obtained by the UCV method, that is, the EUCV method is more stable in kernel density estimation.

Figure 5 is a comparison chart of the standard deviation of the kernel density estimation calculated by the UCV method and the EUCV method on the Beta distribution data set. The information of the data set of Beta distribution is as follows: α=2, β=2; the number of data sets is 100; the number of samples N of each data set is 1000. It can be seen from the above experimental comparison figure that the standard deviation of the kernel density estimation obtained by the EUCV method is significantly lower than that of the kernel density estimation obtained by the UCV method, that is, the EUCV method is more stable in kernel density estimation.

Figure 6 is a comparison chart of the standard deviation of the kernel density estimation calculated by the UCV method and the EUCV method on the F-distributed data set. The information of the data set of F distribution is as follows: n₁ = 20, n₂ = 20; the number of data sets is 100; the number of samples N of each data set is 1000. It can be seen from the above experimental comparison figure that the standard deviation of the kernel density estimation obtained by the EUCV method is significantly lower than that of the kernel density estimation obtained by the UCV method, that is, the EUCV method is more stable in kernel density estimation.

In the accuracy experiment, one data set is selected under each probability distribution and a UCV-KDE is constructed; the probability density of each sample in the data set is then estimated and compared with the true probability density to calculate the mean absolute error (MAE). In addition, another data set from the same distribution is selected as the test set, the probability density of each sample in the test set is estimated, and the MAE against the true probability density is calculated. Afterwards, with the sampling rate r set to 0.7, the number of sampling rounds K = 5, 10, 15, ..., 200, and the structural risk coefficient ξ = 0.0001, the EUCV-KDE is constructed and the MAE is calculated in the same way. The experiment is repeated 5 times to explore how the MAE and the window width h change with the number of sampling rounds K.
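
The MAE calculation used in the accuracy experiment can be sketched as follows for the normal-distribution case. The function names are hypothetical, and the window width h is passed in directly so that the same code can score a window width from either selection method.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """True probability density of a normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kde(x, samples, h):
    """Gaussian kernel density estimate at x with window width h."""
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return total / (len(samples) * h * math.sqrt(2.0 * math.pi))


def mean_absolute_error(estimated, reference):
    """Mean of the absolute differences between two equal-length sequences."""
    if len(estimated) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


def kde_mae(train_samples, eval_points, h, true_pdf=normal_pdf):
    """MAE between the KDE built on train_samples and the true density."""
    estimated = [kde(x, train_samples, h) for x in eval_points]
    reference = [true_pdf(x) for x in eval_points]
    return mean_absolute_error(estimated, reference)
```

Passing the original data set as both `train_samples` and `eval_points` gives the MAE on the original data; passing a separate test set as `eval_points` gives the test MAE.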

Fig. 7 is a schematic diagram of the window width h as a function of K, calculated on a normally distributed data set. Fig. 8 is a schematic diagram of the MAE as a function of K, calculated on the original normally distributed data set. Fig. 9 is a schematic diagram of the MAE as a function of K, calculated on a normally distributed test data set. In the legend, UCV-KDE represents the result of the unbiased cross-validation method; EUCV-KDE (mean) represents the mean of the results of five runs of the ensemble unbiased cross-validation method; EUCV-KDE (std) represents the standard deviation of the results of those five runs.

It can be seen from Figure 8 and Figure 9 that under normal distribution, the MAE curve of EUCV-KDE is lower than the MAE horizontal line of UCV-KDE, indicating that the error of EUCV-KDE is smaller, that is, the accuracy is higher.

Fig. 10 is a schematic diagram of the variation of h calculated with K on the data set of Beta distribution. Figure 11 is a schematic diagram of the variation of MAE calculated on the original data set of Beta distribution with K. Fig. 12 is a schematic diagram of the MAE calculated on the test data set of Beta distribution as a function of K. It can be seen from Figure 11 and Figure 12 that under the Beta distribution, the MAE curve of EUCV-KDE is lower than the MAE horizontal line of UCV-KDE, indicating that the error of EUCV-KDE is smaller, that is, the accuracy is higher.

Fig. 13 is a schematic diagram of the variation of h calculated on the F-distributed data set with K. Fig. 14 is a schematic diagram of the MAE calculated on the original data set of the F distribution as a function of K. Fig. 15 is a schematic diagram of MAE calculated on the test data set of F distribution as a function of K. It can be seen from Figure 14 and Figure 15 that under the F distribution, the MAE curve of EUCV-KDE is lower than the MAE horizontal line of UCV-KDE, indicating that the error of EUCV-KDE is smaller, that is, the accuracy is higher.

In the accuracy experiment figures above, the dotted line represents the result of UCV-KDE; because UCV-KDE does not depend on the number of sampling rounds K, it is drawn as a horizontal line. The solid black line represents the average result of the 5 EUCV-KDE experiments, and the gray shaded area is the interval of 1 standard deviation; because the random sampling differs between the 5 experiments, the results fluctuate within a range.

Embodiment four

FIG. 16 is a schematic structural diagram of an apparatus for constructing a kernel density estimator provided in an embodiment of the present application. As shown in Figure 16, the construction device 1600 of the kernel density estimator includes: a construction module 1601 and a calculation module 1602; wherein,

The construction module 1601 is configured to construct K data block pairs based on the original data set, wherein each data block pair includes a training block and a verification block, the original data set includes N data samples, and K and N are both natural numbers greater than 1; and to construct a Gaussian kernel density estimator for each training block, obtaining the Gaussian kernel density estimator corresponding to each training block;

The calculation module 1602 is configured to use the Gaussian kernel density estimator corresponding to each training block to calculate the average kernel density corresponding to each verification block;

The construction module 1601 is further configured to construct a final kernel density estimator based on the average kernel density corresponding to all verification blocks and the predetermined structural risk term function.

The construction module 1601 is specifically configured to randomly extract X data samples from the N data samples of the original data set; use the X data samples as the training block in the current data block pair; use the remaining Y data samples in the original data set as the verification block in the current data block pair; construct one of the K data block pairs from the training block and the verification block in the current data block pair; and repeat the above operations until K data block pairs are constructed; wherein N=X+Y, and X and Y are both natural numbers greater than or equal to 1 and less than N.
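
The sampling procedure described above can be sketched as follows. This is a minimal illustration only, not the applicant's implementation: the function name `build_block_pairs`, the `seed` parameter and the use of Python's standard `random` module are assumptions introduced here, and sampling without replacement is inferred from the condition N=X+Y.

```python
import random


def build_block_pairs(data, k, x, seed=None):
    """Return k (training_block, verification_block) pairs.

    Hypothetical helper: each pair draws x samples without replacement
    as the training block and keeps the remaining N - x samples as the
    verification block, so that N = X + Y holds for every pair.
    """
    n = len(data)
    if k < 2:
        raise ValueError("K must be a natural number greater than 1")
    if not 1 <= x < n:
        raise ValueError("X must satisfy 1 <= X < N")
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        # Indices of the X samples chosen for the current training block.
        train_idx = set(rng.sample(range(n), x))
        train = [data[i] for i in range(n) if i in train_idx]
        valid = [data[i] for i in range(n) if i not in train_idx]
        pairs.append((train, valid))
    return pairs


data = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0]
pairs = build_block_pairs(data, k=3, x=4, seed=0)
```

Each sampling is independent, so different pairs may share training samples; only the split within a single pair is disjoint.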

The construction module 1601 is specifically configured to construct a Gaussian kernel function corresponding to each data sample in each training block; and, based on the Gaussian kernel function corresponding to each data sample in each training block and the predetermined number of data samples in each training block, obtain the Gaussian kernel density estimator corresponding to each training block.
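
As a rough sketch of this step, a Gaussian kernel density estimator for one training block can be built by summing one Gaussian kernel per training sample and dividing by the number of samples. The code below assumes one-dimensional data and the standard form f(x) = 1/(X·h) · Σ K((x − x_i)/h) with a fixed, positive bandwidth h; the names `gaussian_kernel` and `make_gaussian_kde` are illustrative and do not come from the application.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def make_gaussian_kde(train_block, h):
    """Return a density function built from one training block.

    Assumption: 1-D samples and a single bandwidth h > 0 supplied by the
    caller; how h is chosen is outside the scope of this sketch.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    x_count = len(train_block)  # number of samples X in the training block

    def kde(x):
        # One kernel per training sample, averaged and scaled by 1/h.
        total = sum(gaussian_kernel((x - xi) / h) for xi in train_block)
        return total / (x_count * h)

    return kde


kde = make_gaussian_kde([0.0, 1.0, 2.0], h=0.5)
density_at_one = kde(1.0)
```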

The calculation module 1602 is specifically configured to use the Gaussian kernel density estimator corresponding to each training block to calculate the kernel density of each data sample in each verification block; and, based on the kernel density of each data sample in each verification block, calculate the average kernel density corresponding to each verification block.

The calculation module 1602 is specifically configured to calculate, based on the Gaussian kernel density estimator corresponding to each training block, the Gaussian kernel function of each data sample in each verification block and each data sample in its corresponding training block, so as to obtain X×Y Gaussian kernel functions; and, based on the X Gaussian kernel functions corresponding to each data sample of each verification block and the predetermined number of data samples in each training block, obtain the Y kernel densities corresponding to each verification block.
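
The two calculation steps above can be illustrated together: evaluate X×Y Gaussian kernel values, reduce each group of X values to one kernel density per verification sample, and then average the Y densities. The sketch below assumes one-dimensional data, a fixed bandwidth h, and an arithmetic mean for the "average kernel density"; the function name `verification_densities` is hypothetical.

```python
import math


def verification_densities(train_block, valid_block, h):
    """Return (kernel_values, densities, average) for one data block pair.

    kernel_values is a Y-by-X table: row j holds the X Gaussian kernel
    values between verification sample j and every training sample.
    """
    x_count = len(train_block)
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * h)
    kernel_values = [
        [norm * math.exp(-0.5 * ((v - t) / h) ** 2) for t in train_block]
        for v in valid_block
    ]
    # Y kernel densities: each row of X kernel values divided by X.
    densities = [sum(row) / x_count for row in kernel_values]
    # Assumed here: the block average is the arithmetic mean over Y samples.
    average = sum(densities) / len(densities)
    return kernel_values, densities, average


kernel_values, densities, average = verification_densities(
    train_block=[0.0, 1.0], valid_block=[0.0, 1.0, 0.5], h=1.0
)
```

Keeping the full Y-by-X table is only for clarity; in practice the row sums could be accumulated directly without storing every kernel value.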

The above-mentioned device for constructing a kernel density estimator can execute the method provided by any embodiment of the present application, and has corresponding functional modules for executing the method. For technical details not exhaustively described in this embodiment, refer to the method for constructing a kernel density estimator provided in any embodiment of this application.

Embodiment five

FIG. 17 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. FIG. 17 shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present application. The electronic device 12 shown in FIG. 17 is only an example, and should not limit the functions and scope of use of the embodiments of the present application.

As shown in FIG. 17, electronic device 12 takes the form of a general-purpose computing device. Components of electronic device 12 may include, but are not limited to, one or more processors or processing units 16, system memory 28, bus 18 connecting various system components including system memory 28 and processing unit 16.

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.

Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12 and include both volatile and nonvolatile media, removable and non-removable media.

System memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Electronic device 12 may include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 17, commonly referred to as a "hard drive"). Although not shown in FIG. 17, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as well as an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM) or other optical media). In these cases, each drive may be connected to bus 18 via one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of various embodiments of the present application.

Program/utility 40, having a set (at least one) of program modules 42, may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described herein.

The electronic device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (e.g., network card, modem, etc.) that enables the electronic device 12 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 22. Moreover, the electronic device 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network, such as the Internet) through the network adapter 20. As shown, network adapter 20 communicates with other modules of electronic device 12 via bus 18. It should be appreciated that, although not shown in FIG. 17, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.

The processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example, implementing the method for constructing a kernel density estimator provided in the embodiments of the present application.

Embodiment six

Embodiment 6 of the present application provides a computer storage medium.

The computer-readable storage medium in the embodiments of the present application may use any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (non-exhaustive list) of computer readable storage media include: electrical connections with one or more leads, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), Erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a data signal carrying computer readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for performing the operations of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as "C" or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

Claims

A method for constructing a kernel density estimator, comprising:

Construct K data block pairs based on the original data set; wherein, each data block pair includes a training block and a verification block; the original data set includes N data samples; K and N are both natural numbers greater than 1;

Construct a Gaussian kernel density estimator for each training block, and obtain the Gaussian kernel density estimator corresponding to each training block;

Use the Gaussian kernel density estimator corresponding to each training block to calculate the average kernel density corresponding to each verification block;

Based on the average kernel density corresponding to all verification blocks and the predetermined structural risk term function, the final kernel density estimator is constructed.
The method according to claim 1, wherein said constructing K data block pairs based on the original data set comprises:

Randomly extract X data samples from the N data samples in the original data set; use the X data samples as the training block in the current data block pair; use the remaining Y data samples in the original data set as the verification block in the current data block pair; construct one of the K data block pairs from the training block and the verification block in the current data block pair; repeat the above operations until K data block pairs are constructed; wherein N=X+Y, and X and Y are both natural numbers greater than or equal to 1 and less than N.
The method according to claim 1, wherein, said constructing a Gaussian kernel density estimator for each training block, obtaining a Gaussian kernel density estimator corresponding to each training block, comprising:

Construct the Gaussian kernel function corresponding to each data sample in each training block;

Based on the Gaussian kernel function corresponding to each data sample in each training block and the predetermined number of data samples in each training block, a Gaussian kernel density estimator corresponding to each training block is obtained.
The method according to claim 1, wherein said use of the Gaussian kernel density estimator corresponding to each training block to calculate the average kernel density corresponding to each verification block includes:

Calculate the kernel density of each data sample in each verification block using the Gaussian kernel density estimator corresponding to each training block;

Based on the kernel density of each data sample in each verification block, an average kernel density corresponding to each verification block is calculated.
The method according to claim 4, wherein said use of a Gaussian kernel density estimator corresponding to each training block to calculate the kernel density of each data sample in each verification block includes:

Based on the Gaussian kernel density estimator corresponding to each training block, calculate the Gaussian kernel function of each data sample in each verification block and each data sample in its corresponding training block, and obtain X×Y Gaussian kernel functions;

Based on the X Gaussian kernel functions corresponding to each data sample of each verification block and the predetermined number of data samples in each training block, the corresponding Y kernel densities of each verification block are obtained.
A construction device of a kernel density estimator, comprising: a construction module and a calculation module; wherein,

The building module is configured to construct K data block pairs based on the original data set, wherein each data block pair includes a training block and a verification block, the original data set includes N data samples, and K and N are both natural numbers greater than 1; and to construct a Gaussian kernel density estimator for each training block, so as to obtain a Gaussian kernel density estimator corresponding to each training block;

The calculation module is used to calculate the average kernel density corresponding to each verification block by using the Gaussian kernel density estimator corresponding to each training block;

The building module is further configured to construct a final kernel density estimator based on the average kernel density corresponding to all verification blocks and a predetermined structural risk term function.
The device according to claim 6, wherein the building module is specifically configured to randomly extract X data samples from the N data samples in the original data set; use the X data samples as the training block in the current data block pair; use the remaining Y data samples in the original data set as the verification block in the current data block pair; construct one of the K data block pairs from the training block and the verification block in the current data block pair; and repeat the above operations until K data block pairs are constructed; wherein N=X+Y, and X and Y are both natural numbers greater than or equal to 1 and less than N.
The device according to claim 6, wherein the building module is specifically configured to construct a Gaussian kernel function corresponding to each data sample in each training block; and, based on the Gaussian kernel function corresponding to each data sample in each training block and the predetermined number of data samples in each training block, obtain a Gaussian kernel density estimator corresponding to each training block.
The device according to claim 6, wherein the calculation module is specifically configured to use the Gaussian kernel density estimator corresponding to each training block to calculate the kernel density of each data sample in each verification block; and, based on the kernel density of each data sample in each verification block, calculate the average kernel density corresponding to each verification block.
The device according to claim 6, wherein the calculation module is specifically configured to calculate, based on the Gaussian kernel density estimator corresponding to each training block, the Gaussian kernel function of each data sample in each verification block and each data sample in its corresponding training block, so as to obtain X×Y Gaussian kernel functions; and, based on the X Gaussian kernel functions corresponding to each data sample of each verification block and the predetermined number of data samples in each training block, obtain the Y kernel densities corresponding to each verification block.
An electronic device comprising:

one or more processors;

memory for storing one or more programs,

When the one or more programs are executed by the one or more processors, the one or more processors are made to implement the method for constructing a kernel density estimator according to any one of claims 1-5.
A storage medium, on which a computer program is stored, wherein, when the program is executed by a processor, the method for constructing a kernel density estimator according to any one of claims 1 to 5 is realized.