CN113222632A

CN113222632A - Object mining method and device

Info

Publication number: CN113222632A
Application number: CN202010079932.5A
Authority: CN
Inventors: 黄倩
Original assignee: Beijing Jingdong Zhenshi Information Technology Co Ltd
Current assignee: Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority date: 2020-02-04
Filing date: 2020-02-04
Publication date: 2021-08-06

Abstract

The invention discloses a method and a device for object mining, and relates to the technical field of computers. One embodiment of the method comprises: acquiring feature data of an object to be mined, calculating the prediction capability of each feature, and performing a first selection of features according to their prediction capabilities; performing correlation analysis on the first-selected features and performing a second selection among them; compressing and reducing the dimensions of the second-selected features; and performing model training with the compressed, dimension-reduced features to obtain an object prediction model, and using the object prediction model to predict whether the object to be mined is a potential object. The embodiment can mine potential objects with a high conversion probability in a more targeted way, reduce the investment of sales resources, and improve the success rate of object mining.

Description

Object mining method and device

Technical Field

The invention relates to the technical field of computers, in particular to a method and a device for object mining.

Background

Mining potential customers is one of the important tasks in a company's development. The methods that companies currently use to mine potential customers are generally as follows:

1) telephone prospecting: sales staff contact customers directly, one by one, by telephone;

2) referral by acquaintances: finding potential customers through introductions from existing customers or friends;

3) seminars: attracting customers by holding seminars.

In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:

none of the three methods is supported by a clear sales strategy, so they consume manpower, money, and time, and the mining success rate is low.

Disclosure of Invention

In view of this, the embodiments of the present invention provide an object mining method and apparatus that can mine potential objects with a high conversion probability in a more targeted way, reduce the investment of sales resources, and improve the success rate of object mining.

To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of object mining.

A method of object mining, comprising: acquiring feature data of an object to be mined, calculating the prediction capability of each feature, and performing a first selection of features according to their prediction capabilities; performing correlation analysis on the first-selected features and performing a second selection among them; compressing and reducing the dimensions of the second-selected features; and performing model training with the compressed, dimension-reduced features to obtain an object prediction model, and predicting the object to be mined with the object prediction model to judge whether the object to be mined is a potential object.

Optionally, calculating the prediction capability of each feature comprises: judging the feature type of each feature, wherein the feature types comprise categorical (classified) features and numerical features; if the feature is a numerical feature, discretizing it to obtain a corresponding categorical feature and then calculating the prediction capability of that categorical feature; and if the feature is a categorical feature, calculating its prediction capability directly.

Optionally, performing correlation analysis on the first-selected features comprises: calculating the chi-square statistic between features to obtain the degree of correlation between the first-selected features.

Optionally, compressing and reducing the dimensions of the second-selected features comprises: compressing and reducing the dimensions of the second-selected features by principal component analysis.

Optionally, performing model training with the compressed, dimension-reduced features comprises: combining the compressed, dimension-reduced features by a binary tree algorithm, and inputting the feature combinations of the leaf nodes of the binary tree into a logistic regression model for model training.

Optionally, after judging whether the object to be mined is a potential object, the method further comprises: determining data segmentation points for all potential objects by a data fitting algorithm, and classifying the potential objects according to the data segmentation points; and clustering each class of potential objects separately and determining the common characteristics of each class from the clustering results, wherein the number of clusters is determined by the ratio of the between-class distance to the within-class distance.

According to another aspect of the embodiments of the present invention, there is provided an apparatus for object mining.

An apparatus for object mining, comprising: a first selection module, configured to acquire feature data of an object to be mined, calculate the prediction capability of each feature, and perform a first selection of features according to their prediction capabilities; a second selection module, configured to perform correlation analysis on the first-selected features and perform a second selection among them; a feature dimension reduction module, configured to compress and reduce the dimensions of the second-selected features; and a training prediction module, configured to perform model training with the compressed, dimension-reduced features to obtain an object prediction model, and to predict the object to be mined with the object prediction model to judge whether the object to be mined is a potential object.

Optionally, the first selection module is further configured to: judge the feature type of each feature, wherein the feature types comprise categorical (classified) features and numerical features; if the feature is a numerical feature, discretize it to obtain a corresponding categorical feature and then calculate the prediction capability of that categorical feature; and if the feature is a categorical feature, calculate its prediction capability directly.

Optionally, the second selection module is further configured to: calculate the chi-square statistic between features to obtain the degree of correlation between the first-selected features.

Optionally, the feature dimension reduction module is further configured to: compress and reduce the dimensions of the second-selected features by principal component analysis.

Optionally, the training prediction module is further configured to: combine the compressed, dimension-reduced features by a binary tree algorithm, and input the feature combinations of the leaf nodes of the binary tree into a logistic regression model for model training.

Optionally, the apparatus further comprises a cluster analysis module, configured to: after judging whether the object to be mined is a potential object, determine data segmentation points for all potential objects by a data fitting algorithm, and classify the potential objects according to the data segmentation points; and cluster each class of potential objects separately and determine the common characteristics of each class from the clustering results, wherein the number of clusters is determined by the ratio of the between-class distance to the within-class distance.

According to yet another aspect of the embodiments of the present invention, there is provided an electronic device for object mining.

An electronic device for object mining, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of object mining provided by the embodiment of the invention.

According to yet another aspect of embodiments of the present invention, a computer-readable medium is provided.

A computer readable medium, on which a computer program is stored, which when executed by a processor implements the method of object mining provided by an embodiment of the invention.

One embodiment of the above invention has the following advantages or benefits. Feature data of an object to be mined is acquired, the prediction capability of each feature is calculated, and a first selection of features is made according to their prediction capabilities; correlation analysis is performed on the first-selected features and a second selection is made among them; the second-selected features are then compressed and reduced in dimension; finally, model training is performed with the compressed, dimension-reduced features to obtain an object prediction model, which is used to predict whether the object to be mined is a potential object. The behavior features and other characteristics of the object to be mined are thus analyzed by big data mining to predict the probability that the object becomes a potential object, so that potential objects with a high conversion probability can be mined in a more targeted way, the investment of sales resources is reduced, and the success rate of object mining is improved. In addition, model training fuses the binary tree model xgboost with a logistic regression model: the features are first processed with xgboost, and the final model is then trained with logistic regression, which compensates for the insensitivity of logistic regression to nonlinear relationships and improves the overall prediction accuracy.

Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of the main steps of a method of object mining according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an implementation of feature combination using a binary tree according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the main modules of an apparatus for object mining according to an embodiment of the present invention;

FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 5 is a schematic block diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a schematic diagram of main steps of a method of object mining according to an embodiment of the present invention. As shown in fig. 1, the method for object mining according to the embodiment of the present invention mainly includes the following steps S101 to S104.

Step S101: acquiring feature data of an object to be mined, calculating the prediction capability of each feature, and then selecting the features for the first time according to the prediction capabilities of the features.

In the embodiment of the invention, the object to be mined is, for example, a customer to be mined. The acquisition of feature data is introduced with the following scenario. Suppose a logistics company needs to mine potential objects (i.e. potential customers) from the merchants that have opened stores on an e-commerce platform. Some of these merchants already use the company's logistics service and others do not. The historical data of the merchants is used to establish a functional relationship between the merchants' basic conditions and behavior features (the predictor variables) and whether they use the logistics service (the response variable), so as to predict whether a merchant will become a potential customer of the logistics company in the future.

The logistics company first acquires basic data of the objects to be mined (namely the merchants that have opened stores on the e-commerce platform). After the basic data is acquired, it can be cleaned and converted, including preprocessing such as filling missing values and checking data quality. Useful feature data is then extracted from the basic data, for example: the basic attributes of the merchant, the operating capability of the merchant, the merchant's sensitivity to delivery timeliness, and the merchant's sensitivity to price. Feature selection is then performed on the acquired feature data, that is, the features with stronger prediction capability are selected for subsequent modeling, which requires calculating the prediction capability of each feature.

According to an embodiment of the present invention, when calculating the prediction capability of each feature, the following steps may be specifically performed:

judging the feature type of each feature, wherein the feature types comprise categorical (classified) features and numerical features;

if the feature is a numerical feature, discretizing it to obtain a corresponding categorical feature and then calculating the prediction capability of that categorical feature;

if the feature is a categorical feature, calculating its prediction capability directly.

In the embodiment of the invention, the feature fields whose prediction capability is to be analyzed are divided into categorical features and numerical features; numerical features are also called continuous features. A feature with few distinct values can be processed either as a categorical feature or as a continuous feature. Which fields are analyzed is determined by the business and the available data. A field is classified mainly by the nature of its values. For example, the field "merchant grade" takes the values A, B, C, and D, so it is categorical; the field "merchant sales amount" takes specific sales figures such as 10.1, 1000.5, and 20.3, so it is numerical.

The advantage of treating a feature as categorical is that the distribution of each value can be seen; in the next step it can be segmented in the same way as a continuous feature, merging the target data of some values. The prediction capability must be calculated for both categorical and numerical features. A numerical (continuous) feature must be segmented before its prediction capability is calculated, that is, the continuous data is transformed into several discrete classes, the principle being to make the relationship between the feature and the response variable linear. After discretization, the prediction capability is calculated in the same way as for a categorical feature. For example, the "merchant sales amount" is continuous, so a new variable "amount class" is defined: its value is 1 when the sales amount is less than 10, 2 when it is between 10 and 100, 3 when it is between 100 and 1000, and 4 when it is more than 1000. The amount class is then the discretized sales amount, and the two variables have a roughly linear relationship.
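The discretization in the sales-amount example can be sketched as follows. This is a minimal illustration only: the function name `amount_class` is hypothetical, and because the text does not say which class a boundary value such as exactly 10 belongs to, the sketch assumes that a boundary value falls into the higher class.

```python
from bisect import bisect_right

# Bin edges taken from the sales-amount example in the text.
# Assumption: a value exactly on an edge goes to the higher class.
SALES_EDGES = [10, 100, 1000]


def amount_class(sales_amount: float) -> int:
    """Map a continuous sales amount to the discrete amount class 1..4."""
    # bisect_right counts how many edges are <= sales_amount.
    return bisect_right(SALES_EDGES, sales_amount) + 1


if __name__ == "__main__":
    # The sample sales figures quoted in the text.
    for amount in (10.1, 1000.5, 20.3):
        print(amount, "->", amount_class(amount))
```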

When the prediction capability of a categorical feature is calculated, the feature X is assumed to have n classes, and the prediction capability of X is calculated by the following formula:

wherein $a_i/a_T$ is the proportion of the merchants in the i-th class that use the logistics service among all merchants in the sample that use it, and $n_i/n_T$ is the proportion of the merchants in the i-th class that do not use the logistics service among all merchants in the sample that do not use it.

By calculating the prediction capability of each feature, setting a prediction capability threshold value and taking the feature with the prediction capability higher than the prediction capability threshold value as the feature selected for the first time, the feature with strong prediction capability can be selected for further processing and model training.
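The formula itself is not reproduced in this text. The two proportions it is defined on match the inputs of the widely used information value (IV), so the sketch below assumes IV as the prediction capability measure; that choice, the smoothing constant `eps`, the threshold 0.1, and all function names are assumptions for illustration rather than details confirmed by the source.

```python
import math
from collections import Counter


def information_value(categories, used_logistics, eps=0.5):
    """Information value of one categorical feature against a binary response.

    categories     -- class label of each merchant for this feature
    used_logistics -- True if the merchant already uses the logistics service
    eps            -- smoothing count (an assumption) that avoids log(0)
    """
    pos = Counter(c for c, y in zip(categories, used_logistics) if y)
    neg = Counter(c for c, y in zip(categories, used_logistics) if not y)
    classes = set(pos) | set(neg)
    a_total = sum(pos.values()) + eps * len(classes)
    n_total = sum(neg.values()) + eps * len(classes)
    iv = 0.0
    for c in classes:
        p = (pos[c] + eps) / a_total  # smoothed a_i / a_T
        q = (neg[c] + eps) / n_total  # smoothed n_i / n_T
        iv += (p - q) * math.log(p / q)
    return iv


def first_selection(feature_columns, used_logistics, threshold=0.1):
    """Return the names of the features whose IV exceeds the threshold."""
    return [name for name, column in feature_columns.items()
            if information_value(column, used_logistics) > threshold]
```

Each term of the sum is non-negative, so a feature whose classes are distributed identically among users and non-users scores 0 and is dropped by any positive threshold.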

Step S102: performing correlation analysis on the first-selected features and performing a second selection among them.

To better reflect the influence of the different features on the model during training, the correlation between the selected features should be as weak as possible. The first-selected features therefore need to undergo correlation analysis followed by a second selection.

When correlation analysis is performed on the first-selected features, the degree of correlation between them can be obtained by calculating the chi-square statistic between features. The chi-square statistic measures the difference between the observed distribution of the data and a selected expected or hypothesized distribution. Suppose H0: the row classification variable is not associated with the column classification variable; H1: the row classification variable is associated with the column classification variable. Then:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

wherein $f_e$ is the expected frequency, $f_o$ is the observed frequency, and the degree of freedom of $\chi^2$ is $(\gamma-1)(c-1)$, where $\gamma$ is the number of rows and $c$ is the number of columns.

After the chi-square statistic between two features is obtained, the features can be selected a second time. The chi-square statistic $\chi^2$ describes how far the observed frequencies deviate from the frequencies expected under H0. The larger the value of $\chi^2$, the stronger the evidence that the two features are associated; in that case screening is required, and one of the two can be retained at random. In general, whether $\chi^2$ is large or small can be judged against the critical value of the chi-square distribution at a significance level of 0.05 with $(\gamma-1)(c-1)$ degrees of freedom. If the correlation between the two features is not strong, both may be retained.
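A minimal sketch of the chi-square computation for two categorical features follows. The function name `chi_square` is hypothetical. The constant 3.841 is the standard critical value of the chi-square distribution at a significance level of 0.05 with one degree of freedom; other degrees of freedom need their own critical value from a table or a statistics library. A statistic above the critical value rejects H0 (no association), which marks the pair as correlated.

```python
from collections import Counter


def chi_square(feature_a, feature_b):
    """Return (chi2, degrees_of_freedom) for two categorical features."""
    n = len(feature_a)
    observed = Counter(zip(feature_a, feature_b))  # f_o for each cell
    row_totals = Counter(feature_a)
    col_totals = Counter(feature_b)
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # f_e under H0 (independence)
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Critical value for significance level 0.05 and 1 degree of freedom.
CRITICAL_0_05_DOF_1 = 3.841


def is_correlated_2x2(feature_a, feature_b):
    """True if two binary features are associated at the 0.05 level."""
    chi2, dof = chi_square(feature_a, feature_b)
    if dof != 1:
        raise ValueError("this helper only covers 2x2 contingency tables")
    return chi2 > CRITICAL_0_05_DOF_1
```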

Step S103: compressing and reducing the dimensions of the second-selected features. In the embodiment of the invention, this is done by principal component analysis, which further eliminates the linear correlation among the features. Principal component analysis (PCA), also called principal components analysis, uses the idea of dimension reduction to convert multiple indexes into a few comprehensive indexes (i.e. principal components), wherein each principal component reflects most of the information of the original variables and the information contained in the principal components does not overlap.

The principal components $Y_1, Y_2, \ldots, Y_p$ are expressed as linear combinations of the original feature parameters $X_1, X_2, \ldots, X_p$, written in algebraic form as:

$$Y_i = u_{i1}X_1 + u_{i2}X_2 + \cdots + u_{ip}X_p, \quad i = 1, 2, \ldots, p$$

wherein $Y_i = u_i'X$ is the i-th principal component of the original feature parameters and $u_i = (u_{i1}, u_{i2}, \ldots, u_{ip})'$ is its coefficient vector. The linear combinations are subject to the following constraints:

1. $u_i'u_i = 1$;

2. when $i \neq j$, $Y_i$ and $Y_j$ are mutually orthogonal (uncorrelated);

3. $Y_1$ is the linear combination of $X_1, X_2, \ldots, X_p$ with the largest variance; $Y_2$ is the linear combination with the largest variance among those orthogonal to $Y_1$; and so on, until $Y_p$ is the linear combination with the largest variance among those orthogonal to $Y_1, \ldots, Y_{p-1}$.
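The constraints above lead to a standard result that the text does not spell out: the coefficient vectors are eigenvectors of the covariance matrix of the features. The derivation below is a sketch in which $\Sigma$ (the covariance matrix of $X$) and $\lambda_i$ (its eigenvalues in descending order) are symbols introduced here, not taken from the source.

```latex
\begin{aligned}
\operatorname{Var}(Y_i) &= \operatorname{Var}(u_i' X) = u_i' \Sigma u_i \\
L(u_1, \lambda) &= u_1' \Sigma u_1 - \lambda\,(u_1' u_1 - 1) \\
\frac{\partial L}{\partial u_1} = 0 \;&\Longrightarrow\; \Sigma u_1 = \lambda u_1 \\
\operatorname{Var}(Y_1) &= u_1' \Sigma u_1 = \lambda\, u_1' u_1 = \lambda_1 \\
\text{share of variance kept by the first } m \text{ components} &= \frac{\lambda_1 + \cdots + \lambda_m}{\lambda_1 + \cdots + \lambda_p}
\end{aligned}
```

Maximizing the variance under the unit-length constraint therefore selects the eigenvector with the largest eigenvalue as $u_1$, and the later components follow from the remaining eigenvectors in descending order of eigenvalue. The last line gives one common way to decide how many components to keep.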

According to the above steps S101 to S103, the model input data can be prepared by performing the feature processing (feature selection, feature compression dimension reduction, etc.).

Step S104: performing model training with the compressed, dimension-reduced features to obtain an object prediction model, and predicting the object to be mined with the object prediction model to judge whether the object to be mined is a potential object.

When the features after the compression and dimension reduction are used for model training, the features after the compression and dimension reduction can be specifically combined through a binary tree algorithm, and the feature combinations of leaf nodes of the binary tree are input into a logistic regression model for model training.

Binary classification problems generally use a logistic regression model. Logistic regression is a generalized linear model; a sigmoid function is applied so that its output lies within [0, 1] and can be regarded as the probability of an event. However, logistic regression does not handle nonlinear relationships well. To solve this problem, the invention uses feature combination. Suppose the features "merchant price sensitivity" and "merchant timeliness sensitivity" are each nonlinearly related to the final prediction, while the combination "merchant price sensitivity + merchant timeliness sensitivity" is linearly related to the prediction result. For example, an AND operation over "merchant price sensitivity = 1" and "merchant timeliness sensitivity = 1" yields one combination of feature values. In the embodiment of the present invention, feature values are represented with one-hot encoding (intuitively, a code with as many bits as there are states, in which exactly one bit is 1 and all others are 0), so that, for example, "price sensitivity = 1" becomes "price sensitivity + timeliness sensitivity = 1".
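As a small illustration of how logistic regression turns a linear score into a probability, the sketch below applies the sigmoid function to a weighted sum of feature values. The names `sigmoid` and `predict_proba` and the example weights are hypothetical; in practice the weights come from training.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable sigmoid: maps any real score to the range [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


if __name__ == "__main__":
    # Hypothetical weights for two one-hot encoded combined features.
    print(predict_proba([2.0, -1.0], 0.0, [1, 0]))
```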

The difficulty of feature value combination, which is also part of feature engineering, lies in deciding which combinations of feature values are effective. The features to be combined must first be selected, and a tree model is usually used to select the most important feature values. In embodiments of the present invention, the tree model performs the feature combination. Take the CART tree as an example: because it is a binary tree, each internal node has two branches, and each leaf node serves as an output.

Fig. 2 is a schematic diagram of the implementation principle of feature combination using a binary tree according to an embodiment of the present invention. As shown in Fig. 2, if sample x finally falls on the leaf node corresponding to "girl", then "age < 15 and not male" can be considered to have a linear relationship with the final result. Each leaf node can thus be regarded as a combined feature value that has a linear relationship with the result.
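The idea that each leaf is one combined feature value can be sketched with a hand-written two-level tree. Fig. 2 is not reproduced in this text, so the tree shape, the leaf numbering, and the function names below are assumptions; only the "age < 15" and "male" conditions come from the example.

```python
def leaf_index(age: float, is_male: bool) -> int:
    """Route a sample through a two-level binary tree and return its leaf.

    Hypothetical leaf numbering:
      0: age < 15 and male        1: age < 15 and not male
      2: age >= 15 and male       3: age >= 15 and not male
    """
    if age < 15:
        return 0 if is_male else 1
    return 2 if is_male else 3


def one_hot(index: int, size: int) -> list:
    """Encode a leaf index as a vector with exactly one bit set to 1."""
    return [1 if i == index else 0 for i in range(size)]


if __name__ == "__main__":
    # A 10-year-old girl lands in leaf 1, i.e. "age < 15 and not male".
    print(one_hot(leaf_index(10, False), 4))
```

The one-hot vector is what a downstream logistic regression model would receive in place of the raw age and gender values.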

In the embodiment of the invention, an eXtreme Gradient Boosting model (hereinafter xgboost) is used to discover such linear relationships: the feature combinations corresponding to the leaf nodes of xgboost are taken as the selected feature combinations and input into a logistic regression model as new features for model training and prediction. The selected feature combinations may also be encoded before being input into the logistic regression model, for example with one-hot encoding. Model training thus fuses the binary tree model xgboost with the logistic regression model: the features are processed with xgboost, and the final model is then trained with logistic regression.

During model training, the trained model can be evaluated and the model parameters adjusted according to the evaluation result. Specifically, the model can be evaluated with the ROC curve, the KS curve, the lift chart, the Gini coefficient, and the like. Once the evaluation is passed, the object prediction model is obtained. The model is then used to predict all current objects to be mined, giving for each the probability that it will next be converted into a potential object. The objects to be mined are sorted in descending order of this probability, and those whose probability exceeds a given threshold are taken as potential objects.
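The final ranking and thresholding step can be sketched as below. The function name, the merchant identifiers, and the default threshold of 0.5 are hypothetical; the text only speaks of "a given probability threshold".

```python
def select_potential_objects(probabilities, threshold=0.5):
    """Rank objects by predicted probability and keep those above the threshold.

    probabilities -- dict mapping an object id to its predicted probability
    threshold     -- hypothetical cut-off value
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [(obj, p) for obj, p in ranked if p > threshold]


if __name__ == "__main__":
    scores = {"merchant_1": 0.2, "merchant_2": 0.9, "merchant_3": 0.6}
    print(select_potential_objects(scores))
```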

According to another embodiment of the present invention, after determining whether the object to be mined is a potential object, the method may further include:

determining data segmentation points of all potential objects by using a data fitting algorithm, and classifying the potential objects according to the data segmentation points;

and clustering each class of potential objects separately and determining the common characteristics of each class from the clustering results, wherein the number of clusters is determined by the ratio of the between-class distance to the within-class distance.

From the historical sales volumes and product unit prices of a large number of merchants, the data segmentation points can be determined with a data fitting algorithm, and the potential objects are then divided into four categories: high-value merchants (high sales volume, high unit price), core merchants (low sales volume, high unit price), key merchants (high sales volume, low unit price), and merchants for long-term tracking (low sales volume, low unit price).
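Once the two segmentation points are known, the four-way classification is a simple quadrant lookup. The sketch below takes the segmentation points as inputs, because the text does not specify the data fitting algorithm that produces them; the function name and the choice to treat a value equal to a segmentation point as "high" are assumptions.

```python
def merchant_category(sales_volume, unit_price, sales_cut, price_cut):
    """Assign a potential object to one of the four categories in the text.

    sales_cut, price_cut -- segmentation points, assumed to be supplied by a
                            separate data fitting step not shown here
    """
    high_sales = sales_volume >= sales_cut  # assumption: boundary counts as high
    high_price = unit_price >= price_cut
    if high_sales and high_price:
        return "high-value merchant"
    if high_price:
        return "core merchant"
    if high_sales:
        return "key merchant"
    return "long-term tracking merchant"
```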

Each class of potential objects is clustered on its existing dimensions, such as logistics timeliness, logistics price, and logistics service. The clustering mainly uses the Euclidean distance, namely:

$$d(x, y) = \sqrt{\sum_{k} (x_k - y_k)^2}$$

where $x_k$ and $y_k$ are the values of the k-th feature for the two potential objects.

When clustering, candidate numbers of clusters can be screened using the ratio of the between-class distance to the within-class distance: the larger the ratio, the more reasonable the number of clusters. Other distance measures, such as the Mahalanobis distance, can also be used for clustering.
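One way to compute such a ratio is sketched below. The text does not define the two distances precisely, so this sketch assumes the between-class distance is the mean Euclidean distance between cluster centroids and the within-class distance is the mean Euclidean distance from each point to its own centroid. All function names are hypothetical, and the input must contain at least two clusters whose points are not all identical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def distance_ratio(clusters):
    """Mean between-class distance divided by mean within-class distance.

    clusters -- list of clusters, each a list of feature vectors
    """
    centers = [centroid(cluster) for cluster in clusters]
    within = [euclidean(point, centers[i])
              for i, cluster in enumerate(clusters) for point in cluster]
    between = [euclidean(centers[i], centers[j])
               for i in range(len(centers)) for j in range(i + 1, len(centers))]
    return (sum(between) / len(between)) / (sum(within) / len(within))
```

Running a clustering algorithm for several candidate cluster counts and keeping the count with the largest ratio would follow the screening rule described above.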

Finally, the clustering results are used to summarize what each class of potential objects focuses on when using logistics, that is, the common characteristics of each class, so that a corresponding marketing strategy can be applied during sales.

Fig. 3 is a schematic diagram of the main modules of an apparatus for object mining according to an embodiment of the present invention. As shown in fig. 3, the apparatus 300 for object mining according to the embodiment of the present invention mainly includes a first selection module 301, a second selection module 302, a feature dimension reduction module 303, and a training prediction module 304.

The first selection module 301 is configured to obtain feature data of an object to be mined, calculate a prediction capability of each feature, and perform first selection of the features according to the prediction capabilities of the features;

a second selecting module 302, configured to perform correlation analysis on the first selected feature and perform second selection on the first selected feature;

the feature dimension reduction module 303 is configured to perform compression dimension reduction on the features selected for the second time;

and the training prediction module 304 is configured to perform model training using the features after the compression and the dimension reduction to obtain an object prediction model, and predict the object to be mined using the object prediction model to determine whether the object to be mined is a potential object.

According to an embodiment of the present invention, the first selecting module 301 may further be configured to:

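judge the feature type of each feature, wherein the feature types comprise categorical (classified) features and numerical features; if the feature is a numerical feature, discretize it to obtain a corresponding categorical feature and then calculate the prediction capability of that categorical feature; and if the feature is a categorical feature, calculate its prediction capability directly.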

According to another embodiment of the present invention, the second selecting module 302 may further be configured to:

calculate the chi-square statistic between features to obtain the degree of correlation between the first-selected features.

According to yet another embodiment of the invention, the feature dimension reduction module 303 may be further configured to:

compress and reduce the dimensions of the second-selected features by principal component analysis.

According to yet another embodiment of the invention, the training prediction module 304 may be further configured to:

combine the compressed, dimension-reduced features by a binary tree algorithm, and input the feature combinations of the leaf nodes of the binary tree into a logistic regression model for model training.

According to another embodiment of the present invention, the apparatus 300 for object mining may further include a cluster analysis module (not shown in the figure) for:

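after judging whether the object to be mined is a potential object, determining data segmentation points for all potential objects by a data fitting algorithm, and classifying the potential objects according to the data segmentation points;

and clustering each class of potential objects separately and determining the common characteristics of each class from the clustering results, wherein the number of clusters is determined by the ratio of the between-class distance to the within-class distance.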

According to the technical scheme of the embodiment of the invention, feature data of an object to be mined is acquired, the prediction capability of each feature is calculated, and a first selection of features is made according to their prediction capabilities; correlation analysis is performed on the first-selected features and a second selection is made among them; the second-selected features are then compressed and reduced in dimension; finally, model training is performed with the compressed, dimension-reduced features to obtain an object prediction model, which is used to predict whether the object to be mined is a potential object. The behavior features and other characteristics of the object to be mined are thus analyzed by big data mining to predict the probability that the object becomes a potential object, so that potential objects with a high conversion probability can be mined in a more targeted way, the investment of sales resources is reduced, and the success rate of object mining is improved. In addition, model training fuses the binary tree model xgboost with a logistic regression model: the features are first processed with xgboost, and the final model is then trained with logistic regression, which compensates for the insensitivity of logistic regression to nonlinear relationships and improves the overall prediction accuracy.

Fig. 4 illustrates an exemplary system architecture 400 to which the method of object mining or the apparatus of object mining of an embodiment of the present invention may be applied.

As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 401, 402, 403 to interact with the server 405 over the network 404 to receive or send messages and the like. The terminal devices 401, 402, 403 may have various communication client applications installed thereon, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only).

The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.

The server 405 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 401, 402, 403. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example, target push information or product information, by way of example only) to the terminal device.

It should be noted that the method for object mining provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the apparatus for object mining is generally disposed in the server 405.

It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to fig. 5, a block diagram of a computer system 500 suitable for implementing a terminal device or server of an embodiment of the invention is shown. The terminal device or server shown in fig. 5 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the system 500. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules described in the embodiments of the present invention may be implemented by software, or may be implemented by hardware. The described units or modules may also be provided in a processor, and may be described as: a processor comprises a first selection module, a second selection module, a feature dimension reduction module and a training prediction module. The names of the units or modules do not limit the units or modules, for example, the first selection module may be further described as a module that obtains feature data of an object to be mined, calculates the prediction capability of each feature, and then performs the first selection of the features according to the prediction capabilities of the features.

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform: acquiring feature data of an object to be mined, calculating the prediction capability of each feature, and then selecting the features for the first time according to the prediction capabilities of the features; performing correlation analysis on the first selected features and performing second selection on the first selected features; compressing and reducing dimensions of the features selected for the second time; and performing model training by using the features after the compression and dimension reduction to obtain an object prediction model, and predicting the object to be mined by using the object prediction model to judge whether the object to be mined is a potential object.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of object mining, comprising:

acquiring feature data of an object to be mined, calculating the prediction capability of each feature, and then selecting the features for the first time according to the prediction capabilities of the features;

performing correlation analysis on the first selected features and performing second selection on the first selected features;

compressing and reducing dimensions of the features selected for the second time;

and performing model training by using the features after the compression and dimension reduction to obtain an object prediction model, and predicting the object to be mined by using the object prediction model to judge whether the object to be mined is a potential object.

2. The method of claim 1, wherein computing the predictive power of each feature comprises:

3. The method of claim 1, wherein performing a correlation analysis on the first selected feature comprises:

4. The method of claim 1, wherein performing a compressed dimensionality reduction on the second selected feature comprises:

5. The method of claim 1, wherein model training using the compressed dimensionality reduced features comprises:

6. The method of claim 1, wherein after determining whether the object to be mined is a potential object, further comprising:

7. An apparatus for object mining, comprising:

the first selection module is used for acquiring feature data of an object to be mined, calculating the prediction capability of each feature, and then performing first selection of the features according to the prediction capability of the features;

the second selection module is used for performing correlation analysis on the first selected features and performing second selection on the first selected features;

the feature dimension reduction module is used for compressing and reducing dimensions of the features selected for the second time;

and the training prediction module is used for performing model training by using the features after the compression and dimension reduction to obtain an object prediction model, and predicting the object to be mined by using the object prediction model to judge whether the object to be mined is a potential object.

8. The apparatus of claim 7, wherein the first selecting module is further configured to:

9. An electronic device for object mining, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.

10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-6.