WO2023000251A1 - Method and apparatus for constructing kernel density estimator, and electronic device and medium - Google Patents

Info

Publication number
WO2023000251A1
WO2023000251A1 PCT/CN2021/107837 CN2021107837W WO2023000251A1 WO 2023000251 A1 WO2023000251 A1 WO 2023000251A1 CN 2021107837 W CN2021107837 W CN 2021107837W WO 2023000251 A1 WO2023000251 A1 WO 2023000251A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
kernel density
data
training
verification
Prior art date
Application number
PCT/CN2021/107837
Other languages
French (fr)
Chinese (zh)
Inventor
何玉林
黄德发
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学
Priority to PCT/CN2021/107837
Publication of WO2023000251A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models

Definitions

  • The embodiments of the present application relate to the technical field of data mining, for example to a method and apparatus for constructing a kernel density estimator, and to an electronic device and a medium.
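  • the kernel density estimator is a probability density estimation function; for data samples $x_1, \dots, x_N$ its standard expression is: $\hat{f}_h(x) = \frac{1}{Nh}\sum_{i=1}^{N}\phi\left(\frac{x - x_i}{h}\right)$
  • where h is the window width and $\phi(\cdot)$ is the kernel function.
  • the commonly used kernel function of the classic kernel density estimator is the Gaussian function: $\phi(u) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^2}{2}\right)$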
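A minimal Python sketch of a Gaussian kernel density estimator with window width h may help make the definition concrete. It assumes the standard textbook form, in which the kernel values are summed and divided by N·h; the function names `gaussian_kernel` and `kde` are illustrative and do not come from the application.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Standard Gaussian kernel: exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x: float, samples: Sequence[float], h: float) -> float:
    """Kernel density estimate at point x, built from `samples` with window width h."""
    if h <= 0:
        raise ValueError("window width h must be positive")
    if not samples:
        raise ValueError("at least one data sample is required")
    n = len(samples)
    # Sum one kernel per data sample, then normalize by N * h.
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)
```

A larger h spreads each kernel over a wider interval and gives a smoother estimate; a smaller h follows the individual samples more closely.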
  • The classic cross-validation kernel density estimator (UCV-KDE) uses the unbiased cross-validation (UCV) method, based on the integrated squared error (ISE) loss function, to find the optimal window width h and construct the kernel density estimator (KDE).
  • the calculation method of the first integral is:
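For reference, the textbook form of the UCV objective, and the closed form of its first integral under a Gaussian kernel, can be written as below. The notation is an assumption and may differ from that of the application: $\hat{f}_h$ is the kernel density estimator with window width h, and $\hat{f}_{h,-i}$ is the leave-one-out estimator built without sample $x_i$.

```latex
\[
\mathrm{UCV}(h) = \int \hat{f}_h(x)^2 \, dx \;-\; \frac{2}{N} \sum_{i=1}^{N} \hat{f}_{h,-i}(x_i)
\]
\[
\int \hat{f}_h(x)^2 \, dx
  = \frac{1}{2\sqrt{\pi}\, N^2 h} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \exp\!\left( -\frac{(x_i - x_j)^2}{4h^2} \right)
\]
```

The closed form follows because the product of two Gaussian kernels centered at $x_i$ and $x_j$ integrates to a Gaussian in $x_i - x_j$ with twice the variance.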
  • However, the kernel density estimate obtained in this way is not stable: the standard deviation obtained by the above formula is relatively large, and the KDE finally constructed deviates from the true distribution, so its stability and accuracy need to be further improved.
  • the present application provides a method and apparatus for constructing a kernel density estimator, and an electronic device and a medium, which can improve the stability and accuracy of kernel density estimation.
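  • the embodiment of the present application provides a method for constructing a kernel density estimator, the method comprising:
  • constructing K data block pairs based on an original data set; wherein each data block pair includes a training block and a verification block; the original data set includes N data samples; K and N are both natural numbers greater than 1;
  • constructing a Gaussian kernel density estimator for each training block, to obtain the Gaussian kernel density estimator corresponding to each training block;
  • calculating the average kernel density corresponding to each verification block by using the Gaussian kernel density estimator corresponding to each training block;
  • constructing the final kernel density estimator based on the average kernel densities corresponding to all verification blocks and a predetermined structural risk term function.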
  • the embodiment of the present application also provides an apparatus for constructing a kernel density estimator, the apparatus including a construction module and a calculation module; wherein,
  • the construction module is configured to construct K data block pairs based on the original data set, wherein each data block pair includes a training block and a verification block, the original data set includes N data samples, and K and N are both natural numbers greater than 1; and to construct a Gaussian kernel density estimator for each training block, obtaining the Gaussian kernel density estimator corresponding to each training block;
  • the calculation module is configured to calculate the average kernel density corresponding to each verification block by using the Gaussian kernel density estimator corresponding to each training block;
  • the construction module is further configured to construct a final kernel density estimator based on the average kernel densities corresponding to all verification blocks and a predetermined structural risk term function.
  • the embodiment of the present application provides an electronic device, including:
  • one or more processors;
  • a memory configured to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the method for constructing a kernel density estimator described in any embodiment of the present application.
  • an embodiment of the present application provides a storage medium on which a computer program is stored, and when the program is executed by a processor, the method for constructing a kernel density estimator described in any embodiment of the present application is implemented.
  • Fig. 1 is a first schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application;
  • Fig. 2 is a second schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application;
  • Fig. 3 is a third schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application;
  • Fig. 4 is a comparison chart of the standard deviations of the kernel density estimates calculated by the UCV method and the EUCV method on a normally distributed data set;
  • Fig. 5 is a comparison chart of the standard deviations of the kernel density estimates calculated by the UCV method and the EUCV method on a Beta-distributed data set;
  • Fig. 6 is a comparison chart of the standard deviations of the kernel density estimates calculated by the UCV method and the EUCV method on an F-distributed data set;
  • Fig. 7 is a schematic diagram of how the calculated h changes with K on a normally distributed data set;
  • Fig. 8 is a schematic diagram of how the MAE calculated on the original normally distributed data set changes with K;
  • Fig. 9 is a schematic diagram of how the MAE calculated on a normally distributed test data set changes with K;
  • Fig. 10 is a schematic diagram of how the calculated h changes with K on a Beta-distributed data set;
  • Fig. 11 is a schematic diagram of how the MAE calculated on the original Beta-distributed data set changes with K;
  • Fig. 12 is a schematic diagram of how the MAE calculated on a Beta-distributed test data set changes with K;
  • Fig. 13 is a schematic diagram of how the calculated h changes with K on an F-distributed data set;
  • Fig. 14 is a schematic diagram of how the MAE calculated on the original F-distributed data set changes with K;
  • Fig. 15 is a schematic diagram of how the MAE calculated on an F-distributed test data set changes with K;
  • Fig. 16 is a schematic structural diagram of an apparatus for constructing a kernel density estimator provided by an embodiment of the present application;
  • Fig. 17 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Fig. 1 is a first schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application. The method can be executed by an apparatus for constructing a kernel density estimator or by an electronic device; the apparatus or electronic device can be implemented by software and/or hardware, and can be integrated into any smart device with a network communication function.
  • the construction method of the kernel density estimator may include the following steps:
  • constructing K data block pairs based on the original data set; wherein each data block pair includes a training block and a verification block; the original data set includes N data samples; K and N are both natural numbers greater than 1.
  • in this step, the electronic device constructs the K data block pairs from the original data set by repeated random sampling.
  • the electronic device can first randomly extract X data samples from the N data samples in the original data set and use these X data samples as the training block in the current data block pair; it then uses the remaining Y data samples in the original data set as the verification block in the current data block pair; these operations are repeated until K data block pairs are constructed.
  • For example, assume the original data set includes 10 data samples. When constructing the first data block pair, 3 samples are randomly selected from the 10 data samples and used as the training block in the first data block pair; the remaining 7 data samples are used as the verification block in the first data block pair; the training block of 3 data samples and the verification block of 7 data samples extracted the first time form the first data block pair. When constructing the second data block pair, 3 samples are again randomly selected from the 10 data samples and used as the training block in the second data block pair; the remaining 7 data samples are used as the verification block in the second data block pair; the training block and verification block extracted the second time form the second data block pair. And so on, until K data block pairs are constructed.
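The block-pair construction in this example can be sketched in Python as follows. This is a minimal sketch rather than the applicant's implementation: the function name `build_block_pairs` and the `seed` parameter are assumptions added for illustration, and each training block is drawn without replacement so that the remaining samples form the verification block.

```python
import random
from typing import List, Sequence, Tuple


def build_block_pairs(
    data: Sequence[float], k: int, x: int, seed: int = 0
) -> List[Tuple[List[float], List[float]]]:
    """Build k (training block, verification block) pairs by simple random sampling.

    Each pair is drawn independently from the full data set: x samples go to
    the training block and the remaining len(data) - x to the verification block.
    """
    n = len(data)
    if k <= 1 or n <= 1:
        raise ValueError("K and N must both be greater than 1")
    if not 0 < x < n:
        raise ValueError("training block size must satisfy 0 < X < N")
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    pairs = []
    for _ in range(k):
        train_idx = set(rng.sample(range(n), x))
        training = [data[i] for i in range(n) if i in train_idx]
        verification = [data[i] for i in range(n) if i not in train_idx]
        pairs.append((training, verification))
    return pairs
```

With 10 samples and `x=3`, every pair holds a 3-sample training block and a 7-sample verification block, matching the example above.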
  • the electronic device may construct a Gaussian kernel density estimator for each training block, and obtain a Gaussian kernel density estimator corresponding to each training block.
  • the electronic device can first construct a Gaussian kernel function corresponding to each data sample in each training block, and then obtain the Gaussian kernel density estimator corresponding to each training block based on the Gaussian kernel functions of the data samples in that training block and the predetermined number of data samples in that training block.
  • the electronic device may use the Gaussian kernel density estimator corresponding to each training block to calculate the average kernel density corresponding to each verification block.
  • the electronic device can first use the Gaussian kernel density estimator corresponding to each training block to calculate the kernel density of each data sample in the corresponding verification block, and then calculate the average kernel density of each verification block from the kernel densities of its data samples.
  • the electronic device can construct a final kernel density estimator based on the average kernel density corresponding to all verification blocks and the predetermined structural risk term function.
  • the electronic device can take the mean of the average kernel densities corresponding to the K verification blocks; the application can also obtain the standard deviation of these K average kernel densities. This gives the improved UCV, namely the ensemble unbiased cross-validation (EUCV) method.
  • this application then adds a structural risk term function to the improved UCV and obtains the expression of EUCV including the structural risk term; the EUCV expression is used to obtain the optimal window width, and this optimal window width is finally used to build the final kernel density estimator.
  • the method for constructing a kernel density estimator proposed in the embodiment of the present application first constructs K data block pairs based on the original data set, wherein each data block pair includes a training block and a verification block; then constructs a Gaussian kernel density estimator for each training block, obtaining the Gaussian kernel density estimator corresponding to each training block; then uses the Gaussian kernel density estimator corresponding to each training block to calculate the average kernel density corresponding to each verification block; and finally constructs the final kernel density estimator based on the average kernel densities corresponding to all verification blocks and a predetermined structural risk term function.
  • in other words, the cross-validation method is combined with simple random sampling: the original data set is divided by multiple rounds of simple random sampling and multiple kernel density estimators are constructed, so that cross-validation obtains the expected value of the kernel density estimate with a smaller standard deviation and greater stability.
  • in addition, by introducing a structural risk term function, a better window width is found, which improves the accuracy of the final kernel density estimator. Therefore, compared with the related art, the method proposed in the embodiment of the present application can improve the stability and accuracy of kernel density estimation; moreover, the technical solution of the embodiment of the present application is simple to implement, easy to popularize, and widely applicable.
  • Fig. 2 is a second schematic flow chart of the construction method of the kernel density estimator provided by the embodiment of the present application. The description and expansion are made based on the above technical solutions, and may be combined with the above optional implementation modes. As shown in Figure 2, the construction method of the kernel density estimator may include the following steps:
  • constructing K data block pairs based on the original data set; wherein each data block pair includes a training block and a verification block; the original data set includes N data samples; K and N are both natural numbers greater than 1.
  • the electronic device may construct a Gaussian kernel function corresponding to each data sample in each training block.
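  • the Gaussian kernel function commonly used by the classic kernel density estimator can be expressed as: $\phi(u) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^2}{2}\right)$, evaluated at $u = (x - x_i)/h$ for a data sample $x_i$ and window width h.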
  • the electronic device may obtain the Gaussian kernel density estimator corresponding to each training block based on the Gaussian kernel function corresponding to each data sample in each training block and the predetermined number of data samples in each training block.
  • the electronic device can calculate the Gaussian kernel density estimator corresponding to each training block by building, for the k-th training block, a Gaussian kernel density estimator from the Gaussian kernel functions of its data samples. The average kernel density of each verification block is then obtained as follows:
  • Step 1) For each data sample in the first verification block X^(1), compute the Gaussian kernel function between it and each data sample in the first training block, according to the predetermined Gaussian kernel function formula.
  • Step 2) For each data sample of the first verification block, the Gaussian kernel functions obtained in step 1) are summed and normalized by the number of data samples in the training block, giving the kernel density of that data sample.
  • Step 3) The N kernel densities obtained in step 2) are averaged; this average is the average kernel density of the first verification block X^(1).
  • Step 4) For the other verification blocks, the same method is used to obtain their corresponding average kernel densities. Finally, the average kernel densities of the K verification blocks are obtained.
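Steps 1) to 4) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the kernel density is normalized by the training block size times h (the standard form), and the names `kde`, `average_kernel_density` and `average_densities_for_pairs` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kde(x: float, training: Sequence[float], h: float) -> float:
    """Gaussian kernel density at x, estimated from one training block."""
    norm = len(training) * h * math.sqrt(2.0 * math.pi)
    # Steps 1) and 2): one Gaussian kernel per training sample, summed and normalized.
    return sum(math.exp(-0.5 * ((x - t) / h) ** 2) for t in training) / norm


def average_kernel_density(
    training: Sequence[float], verification: Sequence[float], h: float
) -> float:
    """Step 3): mean kernel density over the samples of one verification block."""
    densities = [kde(v, training, h) for v in verification]
    return sum(densities) / len(densities)


def average_densities_for_pairs(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]], h: float
) -> List[float]:
    """Step 4): repeat for every (training block, verification block) pair."""
    return [average_kernel_density(train, verif, h) for train, verif in pairs]
```

The result is a list of K average kernel densities, one per verification block, for a given window width h.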
  • the average kernel densities corresponding to the K verification blocks, obtained in the above steps, are averaged to give the expected value of the kernel density estimate.
  • the application can also obtain the standard deviation of the kernel density estimate from the same K average kernel densities.
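The mean and standard deviation over the K verification blocks can be computed as in the sketch below. Whether the application uses the population or the sample standard deviation is not recoverable from the text; the population form (`statistics.pstdev`) is an assumption here, and the function name `summarize_block_densities` is illustrative.

```python
import statistics
from typing import Sequence, Tuple


def summarize_block_densities(avg_densities: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard deviation) of the K average kernel densities."""
    if len(avg_densities) < 2:
        raise ValueError("K must be greater than 1")
    mean = statistics.fmean(avg_densities)
    # Assumption: population standard deviation (divide by K, not K - 1).
    std = statistics.pstdev(avg_densities)
    return mean, std
```

A smaller standard deviation across the K blocks indicates a more stable estimate for the chosen window width.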
  • in order to improve the accuracy of the algorithm, this application can add a structural risk term λg(h) to the expression of EUCV(h).
  • λ is called the structural risk coefficient, and satisfies λ ≥ 0.
  • λ = 0 is equivalent to not introducing the structural risk term.
  • g(h) is called the structural risk function, and an appropriate convex function can be selected for it.
  • the structural risk function that can be used in this application can be expressed as:
  • the present application has completed the modification of the UCV method and obtained the expression of the new EUCV method.
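Adding the structural risk term to an objective can be sketched as below. The specific EUCV expression and the specific g(h) used by the application are not reproduced in this text, so both are passed in as callables; the name `with_structural_risk`, the coefficient name `lam`, and any concrete g such as `lambda h: h * h` are assumptions for illustration only.

```python
from typing import Callable


def with_structural_risk(
    eucv: Callable[[float], float],
    lam: float,
    g: Callable[[float], float],
) -> Callable[[float], float]:
    """Return the objective h -> eucv(h) + lam * g(h), with lam >= 0."""
    if lam < 0:
        raise ValueError("the structural risk coefficient must be non-negative")

    def objective(h: float) -> float:
        # lam == 0 reduces to the objective without a structural risk term.
        return eucv(h) + lam * g(h)

    return objective
```

Keeping g as a parameter makes it easy to try different convex functions without touching the rest of the search.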
  • a search over h is performed with a step size of 0.001 to find the value of h that minimizes the EUCV expression; this value is taken as the optimal window width;
  • using this window width, the final kernel density estimator is constructed according to the expression of the kernel density estimator.
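The search with step size 0.001 can be sketched as a plain grid search. The search interval is not stated in this text, so `h_min` and `h_max` below are hypothetical parameters, as is the function name `grid_search_window_width`; `objective` stands for whatever EUCV expression is being minimized.

```python
from typing import Callable


def grid_search_window_width(
    objective: Callable[[float], float],
    h_min: float = 0.001,   # hypothetical lower bound of the search interval
    h_max: float = 1.0,     # hypothetical upper bound of the search interval
    step: float = 0.001,    # step size given in the description
) -> float:
    """Return the grid point h in [h_min, h_max] that minimizes objective(h)."""
    if step <= 0 or h_min <= 0 or h_max < h_min:
        raise ValueError("require step > 0 and 0 < h_min <= h_max")
    n_steps = int(round((h_max - h_min) / step))
    best_h, best_val = h_min, objective(h_min)
    for i in range(1, n_steps + 1):
        h = h_min + i * step  # recompute from the index to avoid accumulated drift
        val = objective(h)
        if val < best_val:
            best_h, best_val = h, val
    return best_h
```

The returned h would then be used as the window width of the final kernel density estimator.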
  • Fig. 3 is a third schematic flow chart of the construction method of the kernel density estimator provided by the embodiment of the present application. The description and expansion are made based on the above technical solutions, and may be combined with the above optional implementation modes. As shown in Figure 3, the construction method of the kernel density estimator may include the following steps:
  • constructing K data block pairs based on the original data set; wherein each data block pair includes a training block and a verification block; the original data set includes N data samples; K and N are both natural numbers greater than 1.
  • the value of X is the same as The value of is the same; the value of Y is the same as that of N.
  • This application combines the cross-validation method with simple random sampling: the original data set is divided by multiple rounds of simple random sampling and multiple kernel density estimators are constructed, so that cross-validation obtains the expected value of the kernel density estimate with a smaller standard deviation and greater stability; and by introducing the structural risk term λg(h), a better window width is found, which improves the accuracy of the final kernel density estimator.
  • for each probability distribution, 100 standard deviations are obtained with the UCV method and 100 with the EUCV method, and the 200 standard deviations of each probability distribution are compared.
  • Figure 4 is a comparison chart of the standard deviation of kernel density estimates calculated by the UCV method and the EUCV method on a normally distributed data set.
  • Figure 5 is a comparison chart of the standard deviation of the kernel density estimation calculated by the UCV method and the EUCV method on the Beta distribution data set.
  • Figure 6 is a comparison chart of the standard deviation of the kernel density estimation calculated by the UCV method and the EUCV method on the F-distributed data set.
  • a data set is selected under each probability distribution and the UCV-KDE is constructed; the probability density of each sample in the data set is then estimated and compared with the true probability density, and the mean absolute error (MAE) is calculated.
  • the EUCV-KDE is then constructed and its MAE is calculated by the same method; the experiment is repeated 5 times to explore how the MAE changes with the number of samplings K, and to observe how the window width h changes with K.
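The MAE between estimated and true densities can be computed as in the sketch below. The helper `normal_pdf` is included only as an example of a true density for the normally distributed case; both function names are illustrative and not taken from the application.

```python
import math
from typing import Sequence


def mean_absolute_error(estimated: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute difference between estimated and true density values."""
    if len(estimated) != len(true) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(estimated)


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """True density of a normal distribution, used as the reference value."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

For each sample, the estimator's density is compared with `normal_pdf` at the same point, and the absolute differences are averaged.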
  • Fig. 7 is a schematic diagram of the variation of h calculated with K on a normally distributed data set.
  • Fig. 8 is a schematic diagram of MAE calculated on the original data set with normal distribution as a function of K.
  • Fig. 9 is a schematic diagram of MAE calculated on a normally distributed test data set as a function of K.
  • UCV-KDE represents the result of the unbiased cross-validation method
  • EUCV-KDE (mean) represents the mean of the results of five runs of the ensemble unbiased cross-validation method
  • EUCV-KDE (std) represents the standard deviation of the results of the five runs of the ensemble unbiased cross-validation method.
  • Fig. 10 is a schematic diagram of the variation of h calculated with K on the data set of Beta distribution.
  • Figure 11 is a schematic diagram of the variation of MAE calculated on the original data set of Beta distribution with K.
  • Fig. 12 is a schematic diagram of the MAE calculated on the test data set of Beta distribution as a function of K. It can be seen from Figure 11 and Figure 12 that under the Beta distribution, the MAE curve of EUCV-KDE is lower than the MAE horizontal line of UCV-KDE, indicating that the error of EUCV-KDE is smaller, that is, the accuracy is higher.
  • Fig. 13 is a schematic diagram of the variation of h calculated on the F-distributed data set with K.
  • Fig. 14 is a schematic diagram of the MAE calculated on the original data set of the F distribution as a function of K.
  • Fig. 15 is a schematic diagram of MAE calculated on the test data set of F distribution as a function of K. It can be seen from Figure 14 and Figure 15 that under the F distribution, the MAE curve of EUCV-KDE is lower than the MAE horizontal line of UCV-KDE, indicating that the error of EUCV-KDE is smaller, that is, the accuracy is higher.
  • the dotted line represents the result of UCV-KDE; because UCV-KDE does not depend on the number of samplings K, it is drawn as a horizontal line.
  • the black solid line represents the average result of 5 EUCV-KDE experiments, and the gray shaded area is the interval of 1 standard deviation; because the random sampling results of the 5 experiments are different, there will be a fluctuation range.
  • FIG. 16 is a schematic structural diagram of an apparatus for constructing a kernel density estimator provided in an embodiment of the present application.
  • the construction device 1600 of the kernel density estimator includes: a construction module 1601 and a calculation module 1602; wherein,
  • the construction module 1601 is configured to construct K data block pairs based on the original data set, wherein each data block pair includes a training block and a verification block, the original data set includes N data samples, and K and N are both natural numbers greater than 1; and to construct a Gaussian kernel density estimator for each training block, obtaining the Gaussian kernel density estimator corresponding to each training block;
  • the calculation module 1602 is configured to use the Gaussian kernel density estimator corresponding to each training block to calculate the average kernel density corresponding to each verification block;
  • the construction module 1601 is further configured to construct a final kernel density estimator based on the average kernel density corresponding to all verification blocks and the predetermined structural risk term function.
  • the construction module 1601 is specifically configured to construct a Gaussian kernel function corresponding to each data sample in each training block, and to obtain the Gaussian kernel density estimator corresponding to each training block based on the Gaussian kernel functions of the data samples in that training block and the predetermined number of data samples in that training block.
  • the calculation module 1602 is specifically configured to use the Gaussian kernel density estimator corresponding to each training block to calculate the kernel density of each data sample in each verification block; based on the kernel density of each data sample in each verification block, calculate The average kernel density corresponding to each verification block.
  • the calculation module 1602 is specifically configured to calculate, based on the Gaussian kernel density estimator corresponding to each training block, the Gaussian kernel function between each data sample in each verification block and each data sample in its corresponding training block, obtaining X × Y Gaussian kernel functions; and to obtain the Y kernel densities of each verification block based on the X Gaussian kernel functions corresponding to each data sample of the verification block and the predetermined number of data samples in each training block.
  • the above-mentioned apparatus for constructing a kernel density estimator can execute the method provided by any embodiment of the present application, and has the corresponding functional modules for executing the method.
  • for technical details not described in this embodiment, reference may be made to the method for constructing a kernel density estimator provided in any embodiment of this application.
  • FIG. 17 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 17 shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present application.
  • the electronic device 12 shown in FIG. 17 is only an example, and should not limit the functions and scope of use of the embodiments of the present application.
  • electronic device 12 takes the form of a general-purpose computing device.
  • Components of electronic device 12 may include, but are not limited to, one or more processors or processing units 16, system memory 28, bus 18 connecting various system components including system memory 28 and processing unit 16.
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures.
  • these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
  • Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12 and include both volatile and nonvolatile media, removable and non-removable media.
  • System memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32 .
  • Electronic device 12 may include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • storage system 34 may be used to read and write to non-removable, non-volatile magnetic media (not shown in FIG. 17, commonly referred to as a "hard drive").
  • a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM) or other optical media).
  • each drive may be connected to bus 18 via one or more data media interfaces.
  • Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of various embodiments of the present application.
  • Program/utility 40 may be stored, for example, in memory 28 as a set (at least one) of program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include implementations of network environments.
  • the program modules 42 generally perform the functions and/or methods of the embodiments described herein.
  • the electronic device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (e.g., network card, modem, etc.) that enables the electronic device 12 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 22.
  • the electronic device 12 can also communicate with one or more networks (such as a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN) and/or a public network, such as the Internet) through the network adapter 20.
  • network adapter 20 communicates with other modules of electronic device 12 via bus 18.
  • other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, redundant arrays of independent disks (RAID) systems, tape drives, and data backup storage systems.
  • the processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example, implementing the method for constructing a kernel density estimator provided in the embodiments of the present application.
  • Embodiment 6 of the present application provides a computer storage medium.
  • the computer-readable storage medium in the embodiments of the present application may use any combination of one or more computer-readable media.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a data signal carrying computer readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Algebra (AREA)
  • Complex Calculations (AREA)

Abstract

Disclosed in the present application are a method and apparatus for constructing a kernel density estimator, and an electronic device and a medium. The method comprises: constructing K data block pairs on the basis of an original data set, wherein each data block pair comprises a training block and a verification block, and the original data set comprises N data samples, K and N both being natural numbers greater than 1; constructing a Gaussian kernel density estimator for each training block, so as to obtain a Gaussian kernel density estimator corresponding to each training block; calculating, by using the Gaussian kernel density estimator corresponding to each training block, an average kernel density corresponding to each verification block; and constructing a final kernel density estimator on the basis of the average kernel densities corresponding to all the verification blocks and a pre-determined structural risk item function. By means of the embodiments of the present application, the stability and accuracy of kernel density estimation can be improved.

Description

Method and apparatus for constructing kernel density estimator, and electronic device and medium
Technical Field
The embodiments of the present application relate to the technical field of data mining, for example, to a method and apparatus for constructing a kernel density estimator, an electronic device, and a medium.
Background
Estimating the probability density function of a data set with an unknown distribution is one of the important research directions in machine learning and data mining. How to effectively improve the stability and accuracy of the classical kernel density estimator is the core problem of probability density function estimation. The kernel density estimator is a probability density estimation function whose expression is:
Figure PCTCN2021107837-appb-000001
where h is the window width, and
Figure PCTCN2021107837-appb-000002
is the kernel function. The kernel function commonly used by the classical kernel density estimator is the Gaussian function:
Figure PCTCN2021107837-appb-000003
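The estimator and the Gaussian kernel described above can be sketched in a few lines of Python. The equation images are not reproduced in this text, so the sketch assumes the standard textbook forms, f_h(x) = (1/(N·h))·Σ K((x − x_i)/h) with K(u) = exp(−u²/2)/√(2π). The names `gaussian_kernel` and `kde` and the use of NumPy are illustrative choices, not taken from the application.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    u = np.asarray(u, dtype=float)
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def kde(x, samples, h):
    """Evaluate f_h(x) = 1 / (N * h) * sum_i K((x - x_i) / h).

    x is a scalar or 1-D array of query points; samples is the 1-D data set;
    h is the window width (assumed form, see the lead-in).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    # Pairwise scaled differences, shape (len(x), len(samples)).
    u = (x[:, None] - samples[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (samples.size * h)
```

A small h makes the estimate follow individual samples closely, while a large h smooths them out, which is why the choice of window width is the central problem in the rest of the description.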
The classical cross-validation kernel density estimator (UCV-KDE) uses the unbiased cross-validation (UCV) method, which is based on the integrated squared error (ISE) loss function, to find the optimal window width h and thereby construct the kernel density estimator (KDE). The expression of UCV is as follows:
Figure PCTCN2021107837-appb-000004
The integral in the first term is computed as:
Figure PCTCN2021107837-appb-000005
In the second term,
Figure PCTCN2021107837-appb-000006
is the expected value of the kernel density estimate. In the UCV method, this expected value is obtained by single-sample (leave-one-out) cross-validation:
Figure PCTCN2021107837-appb-000007
The corresponding standard deviation of the kernel density estimate can be obtained as well:
Figure PCTCN2021107837-appb-000008
However, when the UCV method is used to obtain the expected value of the kernel density estimate, the estimate is not very stable: the standard deviation given by the above formula is relatively large. In addition, the KDE that is finally constructed deviates from the true distribution, so its stability and accuracy need to be further improved.
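The classical UCV criterion summarised above can be sketched as follows. The exact expressions are in equation images that are not reproduced here, so this sketch assumes the standard form UCV(h) = ∫ f_h(x)² dx − (2/N)·Σ f_{h,−i}(x_i) with a Gaussian kernel, for which the integral has a closed form. The population form of the standard deviation (dividing by N) and all function names are assumptions.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def integral_of_squared_kde(samples, h):
    """Closed form of the integral of f_h(x)^2 for a Gaussian kernel."""
    n = samples.size
    d = samples[:, None] - samples[None, :]
    # Two Gaussian bumps of width h integrate to one bump of width sqrt(2) * h.
    return np.exp(-d ** 2 / (4.0 * h ** 2)).sum() / (n ** 2 * h * 2.0 * np.sqrt(np.pi))


def leave_one_out_densities(samples, h):
    """Density at each x_i estimated from the other N - 1 samples."""
    n = samples.size
    k = gaussian_kernel((samples[:, None] - samples[None, :]) / h)
    np.fill_diagonal(k, 0.0)  # drop the i == j term
    return k.sum(axis=1) / ((n - 1) * h)


def ucv(samples, h):
    """Return (UCV score, mean, std) of the leave-one-out densities."""
    samples = np.asarray(samples, dtype=float)
    loo = leave_one_out_densities(samples, h)
    score = integral_of_squared_kde(samples, h) - 2.0 * loo.mean()
    return score, loo.mean(), loo.std()  # std divides by N (assumption)
```

The returned standard deviation is the quantity that the description says tends to be large for single-sample cross-validation.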
Summary
The present application provides a method and apparatus for constructing a kernel density estimator, an electronic device, and a medium, which can improve the stability and accuracy of kernel density estimation.
In a first aspect, an embodiment of the present application provides a method for constructing a kernel density estimator, the method comprising:
constructing K data block pairs on the basis of an original data set, wherein each data block pair comprises a training block and a verification block, the original data set comprises N data samples, and K and N are both natural numbers greater than 1;
constructing a Gaussian kernel density estimator for each training block, so as to obtain the Gaussian kernel density estimator corresponding to each training block;
calculating, by using the Gaussian kernel density estimator corresponding to each training block, the average kernel density corresponding to each verification block; and
constructing a final kernel density estimator on the basis of the average kernel densities corresponding to all the verification blocks and a predetermined structural risk term function.
In a second aspect, an embodiment of the present application further provides an apparatus for constructing a kernel density estimator, the apparatus comprising a construction module and a calculation module, wherein:
the construction module is configured to construct K data block pairs on the basis of an original data set, wherein each data block pair comprises a training block and a verification block, the original data set comprises N data samples, and K and N are both natural numbers greater than 1; and to construct a Gaussian kernel density estimator for each training block, so as to obtain the Gaussian kernel density estimator corresponding to each training block;
the calculation module is configured to calculate, by using the Gaussian kernel density estimator corresponding to each training block, the average kernel density corresponding to each verification block; and
the construction module is further configured to construct a final kernel density estimator on the basis of the average kernel densities corresponding to all the verification blocks and a predetermined structural risk term function.
In a third aspect, an embodiment of the present application provides an electronic device, comprising:
one or more processors; and
a memory configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for constructing a kernel density estimator described in any embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for constructing a kernel density estimator described in any embodiment of the present application.
Description of Drawings
Fig. 1 is a first schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application;
Fig. 2 is a second schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application;
Fig. 3 is a third schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application;
Fig. 4 is a comparison of the standard deviations of the kernel density estimate calculated by the UCV method and the EUCV method on a normally distributed data set;
Fig. 5 is a comparison of the standard deviations of the kernel density estimate calculated by the UCV method and the EUCV method on a Beta-distributed data set;
Fig. 6 is a comparison of the standard deviations of the kernel density estimate calculated by the UCV method and the EUCV method on an F-distributed data set;
Fig. 7 is a schematic diagram of how h, calculated on a normally distributed data set, varies with K;
Fig. 8 is a schematic diagram of how the MAE, calculated on the original normally distributed data set, varies with K;
Fig. 9 is a schematic diagram of how the MAE, calculated on the normally distributed test data set, varies with K;
Fig. 10 is a schematic diagram of how h, calculated on a Beta-distributed data set, varies with K;
Fig. 11 is a schematic diagram of how the MAE, calculated on the original Beta-distributed data set, varies with K;
Fig. 12 is a schematic diagram of how the MAE, calculated on the Beta-distributed test data set, varies with K;
Fig. 13 is a schematic diagram of how h, calculated on an F-distributed data set, varies with K;
Fig. 14 is a schematic diagram of how the MAE, calculated on the original F-distributed data set, varies with K;
Fig. 15 is a schematic diagram of how the MAE, calculated on the F-distributed test data set, varies with K;
Fig. 16 is a schematic structural diagram of the apparatus for constructing a kernel density estimator provided by an embodiment of the present application;
Fig. 17 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and not to limit it. It should also be noted that, for ease of description, the drawings show only the structures related to the present application rather than all structures.
Embodiment 1
Fig. 1 is a first schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application. The method may be executed by an apparatus for constructing a kernel density estimator or by an electronic device; the apparatus or electronic device may be implemented in software and/or hardware and may be integrated into any smart device with a network communication function. As shown in Fig. 1, the method may include the following steps:
S101. Construct K data block pairs on the basis of the original data set, wherein each data block pair includes a training block and a verification block, the original data set includes N data samples, and K and N are both natural numbers greater than 1.
In this step, the electronic device may first randomly draw X data samples from the N data samples of the original data set and use them as the training block of the current data block pair; it then uses the remaining Y data samples of the original data set as the verification block of the current data block pair. The training block and the verification block together form one of the K data block pairs. These operations are repeated until K data block pairs have been constructed, where N = X + Y, and X and Y are both natural numbers greater than or equal to 1 and less than N. For example, suppose the original data set includes 10 data samples. To construct the first data block pair, 3 of the 10 data samples are drawn at random and used as the training block of the first pair, and the remaining 7 data samples are used as its verification block. To construct the second data block pair, 3 of the 10 data samples are again drawn at random and used as the training block of the second pair, and the remaining 7 data samples are used as its verification block. This continues until K data block pairs have been constructed.
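The block-pair construction in S101 can be sketched as below, assuming one-dimensional samples held in a NumPy array. The function name `make_block_pairs` and the parameters `train_size` (X in the text) and `seed` are hypothetical. Each round draws a fresh simple random sample without replacement, so the same sample may appear in the training blocks of different pairs.

```python
import numpy as np


def make_block_pairs(data, k, train_size, seed=None):
    """Build k (training block, verification block) pairs by repeated random splits.

    train_size plays the role of X; the remaining N - X samples form the
    verification block (Y samples), so N = X + Y holds for every pair.
    """
    data = np.asarray(data)
    n = data.shape[0]
    if not (1 <= train_size < n):
        raise ValueError("train_size must satisfy 1 <= train_size < N")
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(k):
        order = rng.permutation(n)  # fresh simple random sample each round
        train_idx, valid_idx = order[:train_size], order[train_size:]
        pairs.append((data[train_idx], data[valid_idx]))
    return pairs
```

With `make_block_pairs(np.arange(10), k=4, train_size=3)`, each pair has 3 training samples and 7 verification samples, matching the worked example in the text.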
S102. Construct a Gaussian kernel density estimator for each training block, so as to obtain the Gaussian kernel density estimator corresponding to each training block.
In this step, the electronic device may first construct the Gaussian kernel function corresponding to each data sample in each training block, and then obtain the Gaussian kernel density estimator corresponding to each training block on the basis of those Gaussian kernel functions and the predetermined number of data samples in each training block.
S103. Use the Gaussian kernel density estimator corresponding to each training block to calculate the average kernel density corresponding to each verification block.
In this step, the electronic device may first use the Gaussian kernel density estimator corresponding to each training block to calculate the kernel density of each data sample in the corresponding verification block, and then calculate the average kernel density of each verification block from the kernel densities of its data samples.
S104. Construct the final kernel density estimator on the basis of the average kernel densities corresponding to all the verification blocks and the predetermined structural risk term function.
In this step, the electronic device may compute the mean of the average kernel densities corresponding to the K verification blocks; the standard deviation of those K average kernel densities can be obtained as well. This yields the improved UCV, namely the ensemble unbiased cross-validation (EUCV) method. In addition, to improve the accuracy of the algorithm, the present application adds a structural risk term function to the improved UCV and obtains the EUCV expression on that basis. The EUCV expression is then used to find the optimal window width, and the optimal window width is finally used to construct the final kernel density estimator.
In the method for constructing a kernel density estimator proposed in this embodiment of the present application, K data block pairs are first constructed on the basis of the original data set, each comprising a training block and a verification block; a Gaussian kernel density estimator is then constructed for each training block; the Gaussian kernel density estimator corresponding to each training block is used to calculate the average kernel density corresponding to each verification block; and finally the final kernel density estimator is constructed on the basis of the average kernel densities corresponding to all the verification blocks and the predetermined structural risk term function. In other words, the technical solution of the present application combines cross-validation with simple random sampling: the original data set is split by repeated simple random sampling and multiple kernel density estimators are constructed, so that cross-validation obtains the expected value of the kernel density estimate with a smaller standard deviation, that is, under more stable conditions; and by introducing the structural risk term function, a better window width is found, which improves the accuracy of the final kernel density estimator. Therefore, compared with the related art, the method proposed in this embodiment can improve the stability and accuracy of kernel density estimation; moreover, the technical solution of this embodiment is simple and convenient to implement, easy to popularize, and widely applicable.
Embodiment 2
Fig. 2 is a second schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application. This embodiment explains and extends the above technical solution and may be combined with each of the above optional implementations. As shown in Fig. 2, the method may include the following steps:
S201. Construct K data block pairs on the basis of the original data set, wherein each data block pair includes a training block and a verification block, the original data set includes N data samples, and K and N are both natural numbers greater than 1.
S202. Construct the Gaussian kernel function corresponding to each data sample in each training block.
In this step, the electronic device may construct the Gaussian kernel function corresponding to each data sample in each training block. The Gaussian kernel function commonly used by the classical kernel density estimator can be expressed as:
Figure PCTCN2021107837-appb-000009
Figure PCTCN2021107837-appb-000010
S203. On the basis of the Gaussian kernel function corresponding to each data sample in each training block and the predetermined number of data samples in each training block, obtain the Gaussian kernel density estimator corresponding to each training block.
In this step, the electronic device may calculate the Gaussian kernel density estimator corresponding to each training block according to the following formula. For the k-th training block
Figure PCTCN2021107837-appb-000011
the Gaussian kernel density estimator is constructed as:
Figure PCTCN2021107837-appb-000012
In the above formula,
Figure PCTCN2021107837-appb-000013
denotes the Gaussian kernel density estimator corresponding to the k-th training block;
Figure PCTCN2021107837-appb-000014
denotes the number of data samples in the training block; the value of
Figure PCTCN2021107837-appb-000015
is the same as the value of X;
Figure PCTCN2021107837-appb-000016
denotes the n-th data sample in the k-th training block; n is the index of a data sample within its training block, with n greater than or equal to 1 and less than or equal to
Figure PCTCN2021107837-appb-000017
Figure PCTCN2021107837-appb-000018
denotes the k-th training block. In this way, K Gaussian kernel density estimators are obtained from the K training blocks, namely:
Figure PCTCN2021107837-appb-000019
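Turning K training blocks into K Gaussian kernel density estimators can be sketched with closures. The formula image is not reproduced in this text, so the sketch assumes the standard Gaussian KDE over the samples of one training block, with a single window width h shared by all blocks. `make_gaussian_kde` and `make_block_estimators` are hypothetical names.

```python
import numpy as np


def make_gaussian_kde(train_block, h):
    """Return a function x -> estimated density, built from one training block."""
    train = np.asarray(train_block, dtype=float)

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        # One Gaussian kernel per (query point, training sample) pair.
        u = (x[:, None] - train[None, :]) / h
        kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
        # Sum over the training samples and normalise by (block size * h).
        return kernels.sum(axis=1) / (train.size * h)

    return density


def make_block_estimators(train_blocks, h):
    """One estimator per training block, in the same order as the blocks."""
    return [make_gaussian_kde(block, h) for block in train_blocks]
```

Keeping the estimators in block order makes it straightforward to pair the k-th estimator with the k-th verification block in the following steps.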
S204. Use the Gaussian kernel density estimator corresponding to each training block to calculate the kernel density of each data sample in each verification block.
S205. On the basis of the kernel density of each data sample in each verification block, calculate the average kernel density corresponding to each verification block.
In a specific embodiment of the present application, for a given window width h and for the k-th verification block X^(k), k = 1, 2, ..., K, the k-th Gaussian kernel density estimator
Figure PCTCN2021107837-appb-000020
is used to calculate the kernel density of each sample in that verification block, and the average is taken. The specific calculation is as follows:
Step 1) For each data sample
Figure PCTCN2021107837-appb-000021
in the first verification block X^(1), compute the Gaussian kernel function between it and each data sample in the first training block according to the predetermined Gaussian kernel function formula:
Figure PCTCN2021107837-appb-000022
In the above formula,
Figure PCTCN2021107837-appb-000023
denotes the Gaussian kernel function constructed from the m-th data sample in the first verification block and the n-th data sample in the first training block; N here denotes the number of data samples in the verification block, and its value is the same as the value of Y;
Figure PCTCN2021107837-appb-000024
denotes the m-th data sample in the first verification block;
Figure PCTCN2021107837-appb-000025
denotes the n-th data sample in the first training block;
Figure PCTCN2021107837-appb-000026
Figure PCTCN2021107837-appb-000027
denotes the first training block.
Step 2) After the calculation in step 1), each data sample of the first verification block has
Figure PCTCN2021107837-appb-000028
Gaussian kernel function values. These
Figure PCTCN2021107837-appb-000029
values are summed and divided by
Figure PCTCN2021107837-appb-000030
to obtain the kernel density of each data sample:
Figure PCTCN2021107837-appb-000031
Step 3) The N kernel densities
Figure PCTCN2021107837-appb-000032
obtained in step 2) are averaged according to the following formula:
Figure PCTCN2021107837-appb-000033
This average is the average kernel density of the first verification block X^(1):
Figure PCTCN2021107837-appb-000034
Step 4) For the other verification blocks, the corresponding average kernel densities are obtained in the same way. Finally, the average kernel densities of the K verification blocks are obtained:
Figure PCTCN2021107837-appb-000035
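Steps 1) to 4) can be sketched as one vectorised computation per block pair. The formulas themselves are in images that are not reproduced here, so the sketch assumes the standard Gaussian KDE normalisation, with the factor 1/h folded into each kernel value. The function names and the `(training block, verification block)` tuple layout are hypothetical.

```python
import numpy as np


def block_average_density(train_block, valid_block, h):
    """Average kernel density of one verification block under its training block."""
    train = np.asarray(train_block, dtype=float)
    valid = np.asarray(valid_block, dtype=float)
    # Step 1: Gaussian kernel for every (verification, training) sample pair.
    u = (valid[:, None] - train[None, :]) / h
    kernels = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h)
    # Step 2: sum over the training samples and divide by the block size.
    densities = kernels.sum(axis=1) / train.size
    # Step 3: average over the samples of the verification block.
    return densities.mean()


def all_block_averages(pairs, h):
    """Step 4: repeat for every block pair, giving K average kernel densities."""
    return np.array([block_average_density(t, v, h) for t, v in pairs])
```

The intermediate `kernels` array has one row per verification sample and one column per training sample, which mirrors the index pair (m, n) used in the text.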
S206. Construct the final kernel density estimator on the basis of the average kernel densities corresponding to all the verification blocks and the predetermined structural risk term function.
In a specific embodiment of the present application, the average kernel densities corresponding to the K verification blocks obtained in the above steps are averaged, and the result is used as the expected value of the kernel density estimate
Figure PCTCN2021107837-appb-000036
where
Figure PCTCN2021107837-appb-000037
Figure PCTCN2021107837-appb-000038
同时,本申请可以得出核密度估计的标准差
Figure PCTCN2021107837-appb-000039
其中,
At the same time, the application can obtain the standard deviation of the kernel density estimate
Figure PCTCN2021107837-appb-000039
in,
Figure PCTCN2021107837-appb-000040
Figure PCTCN2021107837-appb-000040
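The expected value and the standard deviation over the K verification blocks can be sketched as follows. The function name is hypothetical, and dividing the squared deviations by K (rather than K - 1) is an assumption, since the source gives the exact expressions only as formula images.

```python
import math


def expected_value_and_std(block_averages):
    """Mean and standard deviation of the K per-block average kernel densities.

    Assumption: the standard deviation uses the population form (divide by K).
    """
    k = len(block_averages)
    if k == 0:
        raise ValueError("at least one verification block is required")
    mean = sum(block_averages) / k
    variance = sum((a - mean) ** 2 for a in block_averages) / k
    return mean, math.sqrt(variance)
```

A smaller standard deviation across the K blocks indicates that the estimate depends less on how the data set happened to be split.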
The improved UCV, namely the ensemble unbiased cross-validation (EUCV) method, is then obtained:
Figure PCTCN2021107837-appb-000041
Optionally, in a specific embodiment of the present application, a structural risk term ξg(h) may be added to the expression of EUCV(h) to improve the accuracy of the algorithm:
Figure PCTCN2021107837-appb-000042
Here ξ is called the structural risk coefficient and satisfies ξ ≥ 0; when ξ = 0, no structural risk term is introduced. g(h) is called the structural risk function, for which a suitable convex function may be chosen. The structural risk function that may be used in the present application can be expressed as:
Figure PCTCN2021107837-appb-000043
At this point the modification of the UCV method is complete, and the expression of the new EUCV method has been obtained. Next, given the sampling rate r, the number of samplings K, and the structural risk coefficient ξ, a traversal search over a suitable range of h > 0 finds the value of h that minimises the EUCV expression; this value is the optimal window width. For example, the search may cover the range 0.01 to 1.00 with a step size of 0.001. Finally, the final kernel density estimator is constructed from the expression of the kernel density estimator using this window width.
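The traversal search can be sketched as a plain grid search. In this sketch, `objective` is a hypothetical callable standing in for the EUCV expression with r, K, and ξ already fixed; the expression itself is given in the source only as a formula image and is not reproduced here. The default range and step size are the example values from the text.

```python
def search_window_width(objective, h_min=0.01, h_max=1.00, step=0.001):
    """Return (best_h, best_value) minimising objective(h) on a regular grid.

    The grid is indexed by integers so that the step does not accumulate
    floating-point drift; ties keep the smallest h because the comparison
    is strict.
    """
    if h_min <= 0 or h_max < h_min or step <= 0:
        raise ValueError("require 0 < h_min <= h_max and step > 0")
    n_steps = int(round((h_max - h_min) / step))
    best_h, best_value = None, float("inf")
    for i in range(n_steps + 1):
        # Rounding removes binary floating-point noise from the reported h.
        h = round(h_min + i * step, 10)
        value = objective(h)
        if value < best_value:
            best_h, best_value = h, value
    return best_h, best_value
```

With the example settings the objective is evaluated 991 times, so any expensive part of the objective that does not depend on h (for example, the K block pairs) is best computed once outside the search.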
The method for constructing a kernel density estimator proposed in the embodiments of the present application first constructs K data block pairs from the original data set, where each data block pair includes a training block and a verification block; then constructs a Gaussian kernel density estimator for each training block; then uses the Gaussian kernel density estimator of each training block to calculate the average kernel density of the corresponding verification block; and finally constructs the final kernel density estimator based on the average kernel densities of all verification blocks and a predetermined structural risk term function. In other words, the technical solution of the present application combines the cross-validation method with simple random sampling: the original data set is split by repeated simple random sampling and multiple kernel density estimators are constructed, so that cross-validation obtains the expected value of the kernel density estimate under a smaller standard deviation and more stable conditions; and by introducing the structural risk term function, a better window width is found, which improves the accuracy of the final kernel density estimator. Therefore, compared with the related art, the method proposed in the embodiments of the present application can improve the stability and accuracy of kernel density estimation; moreover, the technical solution is simple and convenient to implement, easy to popularize, and widely applicable.
Embodiment 3
FIG. 3 is a third schematic flowchart of the method for constructing a kernel density estimator provided by an embodiment of the present application. This embodiment elaborates on and extends the above technical solutions and may be combined with the optional implementations described above. As shown in FIG. 3, the method for constructing a kernel density estimator may include the following steps:
S301. Construct K data block pairs based on the original data set, where each data block pair includes a training block and a verification block, the original data set includes N data samples, and K and N are both natural numbers greater than 1.
S302. Construct the Gaussian kernel function corresponding to each data sample in each training block.
S303. Based on the Gaussian kernel function corresponding to each data sample in each training block and the predetermined number of data samples in each training block, obtain the Gaussian kernel density estimator corresponding to each training block.
S304. Based on the Gaussian kernel density estimator corresponding to each training block, calculate the Gaussian kernel function of each data sample in each verification block with each data sample in its corresponding training block, obtaining X multiplied by Y Gaussian kernel functions.
In a specific embodiment of the present application, the value of X is the same as the value of
Figure PCTCN2021107837-appb-000044
and the value of Y is the same as the value of N.
S305. Based on the X Gaussian kernel functions corresponding to each data sample of each verification block and the predetermined number of data samples in each training block, obtain the Y kernel densities corresponding to each verification block.
S306. Based on the kernel density of each data sample in each verification block, calculate the average kernel density corresponding to each verification block.
S307. Construct the final kernel density estimator based on the average kernel densities of all verification blocks and the predetermined structural risk term function.
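Step S301 can be sketched as repeated simple random sampling without replacement: X samples are drawn as the training block and the remaining Y = N - X samples form the verification block. The function name and the `seed` parameter are hypothetical, and deriving X as the integer part of r multiplied by N is an assumption, because the source states the training-block size only as a formula image.

```python
import random


def build_block_pairs(data, k, r, seed=None):
    """Build K (training block, verification block) pairs from one data set.

    Assumption: the training block holds X = int(r * N) samples, where r is
    the sampling rate; the verification block holds the remaining N - X.
    """
    n = len(data)
    x = int(r * n)
    if not 1 <= x < n:
        raise ValueError("sampling rate must leave both blocks non-empty")
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        # Draw X distinct indices; the rest form the verification block.
        train_idx = set(rng.sample(range(n), x))
        train = [data[i] for i in range(n) if i in train_idx]
        verification = [data[i] for i in range(n) if i not in train_idx]
        pairs.append((train, verification))
    return pairs
```

Each of the K draws is independent, so the same sample may appear in the training block of one pair and in the verification block of another.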
The present application combines the cross-validation method with simple random sampling: the original data set is split by repeated simple random sampling and multiple kernel density estimators are constructed, so that cross-validation, under a smaller standard deviation and more stable conditions, obtains the expected value of the kernel density estimate
Figure PCTCN2021107837-appb-000045
Moreover, by introducing the structural risk term ξg(h), a better window width is found, which improves the accuracy of the final kernel density estimator.
The stability experiment and the accuracy experiment are compared separately below. In the stability experiment, the UCV method and the EUCV method are each used to compute the standard deviation of the kernel density estimate on 100 data sets of each probability distribution. For the EUCV method, the sampling rate r is set to 0.7, the number of samplings K to 200, and the structural risk coefficient ξ to 0. Each probability distribution thus yields 100 standard deviations from the UCV method and 100 from the EUCV method, and these 200 standard deviations are compared for each distribution.
FIG. 4 compares the standard deviations of the kernel density estimates calculated by the UCV method and the EUCV method on normally distributed data sets. The normally distributed data sets are specified as follows: μ = 0, σ = 1; the number of data sets is 100; the number of samples N in each data set is 1000. The comparison shows that the standard deviation of the kernel density estimate obtained by the EUCV method is significantly lower than that obtained by the UCV method, that is, the EUCV method is more stable for kernel density estimation.
FIG. 5 compares the standard deviations of the kernel density estimates calculated by the UCV method and the EUCV method on Beta-distributed data sets. The Beta-distributed data sets are specified as follows: α = 2, β = 2; the number of data sets is 100; the number of samples N in each data set is 1000. The comparison shows that the standard deviation of the kernel density estimate obtained by the EUCV method is significantly lower than that obtained by the UCV method, that is, the EUCV method is more stable for kernel density estimation.
FIG. 6 compares the standard deviations of the kernel density estimates calculated by the UCV method and the EUCV method on F-distributed data sets. The F-distributed data sets are specified as follows: n1 = 20, n2 = 20; the number of data sets is 100; the number of samples N in each data set is 1000. The comparison shows that the standard deviation of the kernel density estimate obtained by the EUCV method is significantly lower than that obtained by the UCV method, that is, the EUCV method is more stable for kernel density estimation.
In the accuracy experiment, one data set is selected under each probability distribution and a UCV-KDE is constructed on it. The probability density of each sample in that data set is then estimated and compared with the true probability density, and the mean absolute error (MAE) is calculated. In addition, another data set drawn from the same distribution is selected as a test set; the probability density of each sample in the test set is estimated and compared with the true probability density, and the MAE is calculated. After that, an EUCV-KDE is constructed with the sampling rate r = 0.7, the number of samplings K = 5, 10, 15, …, 200, and the structural risk coefficient ξ = 0.0001, and the MAE is calculated in the same way. The experiment is repeated 5 times to examine how the MAE varies with the number of samplings K, and at the same time how the window width h varies with K.
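The MAE used in the accuracy experiment can be sketched as follows. The function names are hypothetical; `standard_normal_pdf` covers only the normal case with μ = 0 and σ = 1 from the experiment and serves as the reference ("true") density.

```python
import math


def standard_normal_pdf(x):
    """True density of the normal distribution with mu = 0 and sigma = 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mean_absolute_error(estimated, true):
    """Mean absolute difference between estimated and true densities."""
    if len(estimated) != len(true) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)
```

For a given data set, `estimated` would hold the kernel density estimate at each sample and `true` the reference density at the same points; a lower MAE means the estimator follows the true density more closely.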
FIG. 7 is a schematic diagram of h as a function of K, calculated on a normally distributed data set. FIG. 8 is a schematic diagram of the MAE as a function of K, calculated on the original normally distributed data set. FIG. 9 is a schematic diagram of the MAE as a function of K, calculated on the normally distributed test data set. In the legends, UCV-KDE denotes the result of the unbiased cross-validation method; EUCV-KDE (mean) denotes the mean of the results of 5 runs of the ensemble unbiased cross-validation method; and EUCV-KDE (std) denotes the standard deviation of the results of those 5 runs.
As FIG. 8 and FIG. 9 show, under the normal distribution the MAE curve of EUCV-KDE lies below the horizontal MAE line of UCV-KDE, which indicates that EUCV-KDE has a smaller error, that is, higher accuracy.
FIG. 10 is a schematic diagram of h as a function of K, calculated on a Beta-distributed data set. FIG. 11 is a schematic diagram of the MAE as a function of K, calculated on the original Beta-distributed data set. FIG. 12 is a schematic diagram of the MAE as a function of K, calculated on the Beta-distributed test data set. As FIG. 11 and FIG. 12 show, under the Beta distribution the MAE curve of EUCV-KDE lies below the horizontal MAE line of UCV-KDE, which indicates that EUCV-KDE has a smaller error, that is, higher accuracy.
FIG. 13 is a schematic diagram of h as a function of K, calculated on an F-distributed data set. FIG. 14 is a schematic diagram of the MAE as a function of K, calculated on the original F-distributed data set. FIG. 15 is a schematic diagram of the MAE as a function of K, calculated on the F-distributed test data set. As FIG. 14 and FIG. 15 show, under the F distribution the MAE curve of EUCV-KDE lies below the horizontal MAE line of UCV-KDE, which indicates that EUCV-KDE has a smaller error, that is, higher accuracy.
In the schematic diagrams of the accuracy experiment above, the dashed line shows the UCV-KDE result; because UCV-KDE does not depend on the number of samplings K, it is drawn as a horizontal line. The solid black line shows the average result of the 5 EUCV-KDE experiments, and the grey shaded area is its one-standard-deviation interval; because the random sampling differs between the 5 experiments, the results fluctuate within a range.
The method for constructing a kernel density estimator proposed in the embodiments of the present application first constructs K data block pairs from the original data set, where each data block pair includes a training block and a verification block; then constructs a Gaussian kernel density estimator for each training block; then uses the Gaussian kernel density estimator of each training block to calculate the average kernel density of the corresponding verification block; and finally constructs the final kernel density estimator based on the average kernel densities of all verification blocks and a predetermined structural risk term function. In other words, the technical solution of the present application combines the cross-validation method with simple random sampling: the original data set is split by repeated simple random sampling and multiple kernel density estimators are constructed, so that cross-validation obtains the expected value of the kernel density estimate under a smaller standard deviation and more stable conditions; and by introducing the structural risk term function, a better window width is found, which improves the accuracy of the final kernel density estimator. Therefore, compared with the related art, the method proposed in the embodiments of the present application can improve the stability and accuracy of kernel density estimation; moreover, the technical solution is simple and convenient to implement, easy to popularize, and widely applicable.
Embodiment 4
FIG. 16 is a schematic structural diagram of an apparatus for constructing a kernel density estimator provided by an embodiment of the present application. As shown in FIG. 16, the apparatus 1600 for constructing a kernel density estimator includes a construction module 1601 and a calculation module 1602, where:
The construction module 1601 is configured to construct K data block pairs based on an original data set, where each data block pair includes a training block and a verification block, the original data set includes N data samples, and K and N are both natural numbers greater than 1; and to construct a Gaussian kernel density estimator for each training block, obtaining the Gaussian kernel density estimator corresponding to each training block.
The calculation module 1602 is configured to use the Gaussian kernel density estimator corresponding to each training block to calculate the average kernel density corresponding to each verification block.
The construction module 1601 is further configured to construct the final kernel density estimator based on the average kernel densities of all verification blocks and a predetermined structural risk term function.
The construction module 1601 is specifically configured to randomly extract X data samples from the N data samples of the original data set; take the X data samples as the training block of the current data block pair; take the remaining Y data samples of the original data set as the verification block of the current data block pair; construct one of the K data block pairs from the training block and the verification block of the current data block pair; and repeat the above operations until K data block pairs have been constructed; where N = X + Y, and X and Y are both natural numbers greater than or equal to 1 and less than N.
The construction module 1601 is specifically configured to construct the Gaussian kernel function corresponding to each data sample in each training block; and, based on the Gaussian kernel function corresponding to each data sample in each training block and the predetermined number of data samples in each training block, obtain the Gaussian kernel density estimator corresponding to each training block.
The calculation module 1602 is specifically configured to use the Gaussian kernel density estimator corresponding to each training block to calculate the kernel density of each data sample in each verification block; and, based on the kernel density of each data sample in each verification block, calculate the average kernel density corresponding to each verification block.
The calculation module 1602 is specifically configured to calculate, based on the Gaussian kernel density estimator corresponding to each training block, the Gaussian kernel function of each data sample in each verification block with each data sample in its corresponding training block, obtaining X × Y Gaussian kernel functions; and, based on the X Gaussian kernel functions corresponding to each data sample of each verification block and the predetermined number of data samples in each training block, obtain the Y kernel densities corresponding to each verification block.
The above apparatus for constructing a kernel density estimator can execute the method provided by any embodiment of the present application and has the functional modules corresponding to the executed method. For technical details not described exhaustively in this embodiment, refer to the method for constructing a kernel density estimator provided by any embodiment of the present application.
Embodiment 5
FIG. 17 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. FIG. 17 shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present application. The electronic device 12 shown in FIG. 17 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 17, the electronic device 12 takes the form of a general-purpose computing device. The components of the electronic device 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The electronic device 12 typically includes a variety of computer system readable media. These media may be any available media that can be accessed by the electronic device 12, including volatile and non-volatile media and removable and non-removable media.
System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The electronic device 12 may include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 17, commonly referred to as a "hard disk drive"). Although not shown in FIG. 17, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") may be provided, as well as an optical disk drive for reading from and writing to a removable non-volatile optical disk (for example, a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to bus 18 through one or more data media interfaces. System memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
The electronic device 12 may also communicate with one or more external devices 14 (for example, a keyboard, a pointing device, a display 24, and so on), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (for example, a network card, a modem, and so on) that enables the electronic device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. The electronic device 12 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the electronic device 12 through bus 18. It should be understood that, although not shown in FIG. 17, other hardware and/or software modules may be used in conjunction with the electronic device 12, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, redundant arrays of independent disks (RAID) systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example implementing the method for constructing a kernel density estimator provided by the embodiments of the present application.
Embodiment 6
Embodiment 6 of the present application provides a computer storage medium.
The computer-readable storage medium of the embodiments of the present application may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in conjunction with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), and the like, or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Claims (12)

  1. A method for constructing a kernel density estimator, comprising:
    constructing K data block pairs based on an original data set, wherein each data block pair comprises one training block and one verification block, the original data set comprises N data samples, and K and N are both natural numbers greater than 1;
    constructing a Gaussian kernel density estimator for each training block, to obtain the Gaussian kernel density estimator corresponding to each training block;
    calculating, using the Gaussian kernel density estimator corresponding to each training block, an average kernel density corresponding to each verification block; and
    constructing a final kernel density estimator based on the average kernel densities corresponding to all of the verification blocks and a predetermined structural risk term function.
  2. The method according to claim 1, wherein constructing the K data block pairs based on the original data set comprises:
    randomly extracting X data samples from the N data samples of the original data set; taking the X data samples as the training block of a current data block pair; taking the remaining Y data samples of the original data set as the verification block of the current data block pair; constructing one of the K data block pairs from the training block and the verification block of the current data block pair; and repeating the above operations until K data block pairs have been constructed; wherein N = X + Y, and X and Y are both natural numbers greater than or equal to 1 and less than N.
  3. The method according to claim 1, wherein constructing a Gaussian kernel density estimator for each training block, to obtain the Gaussian kernel density estimator corresponding to each training block, comprises:
    constructing a Gaussian kernel function corresponding to each data sample in each training block; and
    obtaining the Gaussian kernel density estimator corresponding to each training block based on the Gaussian kernel function corresponding to each data sample in that training block and the predetermined number of data samples in that training block.
  4. The method according to claim 1, wherein calculating, using the Gaussian kernel density estimator corresponding to each training block, the average kernel density corresponding to each verification block comprises:
    calculating, using the Gaussian kernel density estimator corresponding to each training block, the kernel density of each data sample in each verification block; and
    calculating the average kernel density corresponding to each verification block based on the kernel density of each data sample in that verification block.
  5. The method according to claim 4, wherein calculating, using the Gaussian kernel density estimator corresponding to each training block, the kernel density of each data sample in each verification block comprises:
    calculating, based on the Gaussian kernel density estimator corresponding to each training block, the Gaussian kernel function of each data sample in each verification block with each data sample in its corresponding training block, to obtain X×Y Gaussian kernel functions; and
    obtaining the Y kernel densities corresponding to each verification block based on the X Gaussian kernel functions corresponding to each data sample of that verification block and the predetermined number of data samples in each training block.
  6. An apparatus for constructing a kernel density estimator, comprising a construction module and a calculation module, wherein:
    the construction module is configured to construct K data block pairs based on an original data set, wherein each data block pair comprises one training block and one verification block, the original data set comprises N data samples, and K and N are both natural numbers greater than 1; and to construct a Gaussian kernel density estimator for each training block, to obtain the Gaussian kernel density estimator corresponding to each training block;
    the calculation module is configured to calculate, using the Gaussian kernel density estimator corresponding to each training block, an average kernel density corresponding to each verification block; and
    the construction module is further configured to construct a final kernel density estimator based on the average kernel densities corresponding to all of the verification blocks and a predetermined structural risk term function.
  7. The apparatus according to claim 6, wherein the construction module is specifically configured to: randomly extract X data samples from the N data samples of the original data set; take the X data samples as the training block of a current data block pair; take the remaining Y data samples of the original data set as the verification block of the current data block pair; construct one of the K data block pairs from the training block and the verification block of the current data block pair; and repeat the above operations until K data block pairs have been constructed; wherein N = X + Y, and X and Y are both natural numbers greater than or equal to 1 and less than N.
  8. The apparatus according to claim 6, wherein the construction module is specifically configured to: construct a Gaussian kernel function corresponding to each data sample in each training block; and obtain the Gaussian kernel density estimator corresponding to each training block based on the Gaussian kernel function corresponding to each data sample in that training block and the predetermined number of data samples in that training block.
  9. The apparatus according to claim 6, wherein the calculation module is specifically configured to: calculate, using the Gaussian kernel density estimator corresponding to each training block, the kernel density of each data sample in each verification block; and calculate the average kernel density corresponding to each verification block based on the kernel density of each data sample in that verification block.
  10. The apparatus according to claim 6, wherein the calculation module is specifically configured to: calculate, based on the Gaussian kernel density estimator corresponding to each training block, the Gaussian kernel function of each data sample in each verification block with each data sample in its corresponding training block, to obtain X×Y Gaussian kernel functions; and obtain the Y kernel densities corresponding to each verification block based on the X Gaussian kernel functions corresponding to each data sample of that verification block and the predetermined number of data samples in each training block.
  11. An electronic device, comprising:
    one or more processors; and
    a memory configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method for constructing a kernel density estimator according to any one of claims 1 to 5.
  12. A storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for constructing a kernel density estimator according to any one of claims 1 to 5.
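
The steps recited in claims 2 to 5 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: it assumes one-dimensional data samples, a single window width h shared by every training block, and the standard Gaussian kernel K(u) = exp(-u²/2)/√(2π). The names make_block_pairs, gaussian_kernel, kde, and average_kernel_density are hypothetical. The structural risk term function of claim 1 is not specified in the claims, so the sketch stops at the average kernel density of each verification block.

```python
import math
import random


def make_block_pairs(data, k, x, seed=0):
    """Build k (training block, verification block) pairs by random splitting.

    Each pair takes x randomly chosen samples as the training block and the
    remaining y = n - x samples as the verification block (claim 2).
    """
    n = len(data)
    if not 1 <= x < n:
        raise ValueError("x must satisfy 1 <= x < len(data)")
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        train_idx = set(rng.sample(range(n), x))
        train = [data[i] for i in range(n) if i in train_idx]
        verify = [data[i] for i in range(n) if i not in train_idx]
        pairs.append((train, verify))
    return pairs


def gaussian_kernel(u):
    """Standard Gaussian kernel (assumed form)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(train, h):
    """Return the Gaussian KDE of one training block as a callable (claim 3)."""
    x_count = len(train)

    def density(point):
        # Sum of the x_count kernels centred on the training samples,
        # normalised by the number of training samples and the window width.
        return sum(gaussian_kernel((point - t) / h) for t in train) / (x_count * h)

    return density


def average_kernel_density(train, verify, h):
    """Average kernel density of a verification block (claims 4 and 5)."""
    density = kde(train, h)
    # One kernel density per verification sample, each built from len(train) kernels.
    values = [density(v) for v in verify]
    return sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(42)
    samples = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic data, N = 200
    for train, verify in make_block_pairs(samples, k=5, x=150):
        print(len(train), len(verify), average_kernel_density(train, verify, h=0.4))
```

With N = 200 and X = 150, each pair evaluates X×Y = 150×50 kernel functions, which matches the count recited in claim 5.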

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/107837 WO2023000251A1 (en) 2021-07-22 2021-07-22 Method and apparatus for constructing kernel density estimator, and electronic device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/107837 WO2023000251A1 (en) 2021-07-22 2021-07-22 Method and apparatus for constructing kernel density estimator, and electronic device and medium

Publications (1)

Publication Number Publication Date
WO2023000251A1 true WO2023000251A1 (en) 2023-01-26

Family

ID=84980328

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107837 WO2023000251A1 (en) 2021-07-22 2021-07-22 Method and apparatus for constructing kernel density estimator, and electronic device and medium

Country Status (1)

Country Link
WO (1) WO2023000251A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063335A (en) * 2018-08-03 2018-12-21 深圳大学 Generation method, device and the computer readable storage medium of increment Density Estimator device
CN109063128A (en) * 2018-08-02 2018-12-21 深圳大学 Integrated Density Estimator device window parameter optimization method, device and terminal device
CN109388784A (en) * 2018-09-12 2019-02-26 深圳大学 Minimum entropy Density Estimator device generation method, device and computer readable storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21950506

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE