CN112785090A - Model training method, type prediction method, device and computing equipment


Publication number
CN112785090A
Authority
CN
China
Prior art keywords
index data
business object
data
type
decision tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110233777.2A
Other languages
Chinese (zh)
Inventor
周茜
浦婧蕾
王毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110233777.2A
Publication of CN112785090A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • Educational Administration (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Technology Law (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this specification disclose a model training method, a type prediction method, corresponding apparatuses, and a computing device. The type prediction method comprises: acquiring a plurality of index data of a business object in a historical period; and inputting the index data into a decision-tree-based ensemble learning classification model to obtain the type of the business object in a future period. The methods, apparatuses, and computing device of the embodiments can improve the accuracy of the prediction result.

Description

Model training method, type prediction method, device and computing equipment
Technical Field
The embodiments of this specification relate to the field of computer technology, and in particular to a model training method, a type prediction method, corresponding apparatuses, and a computing device.
Background
With the development of science and technology, artificial intelligence brings many conveniences to daily life. In some scenarios, the type of a business object needs to be predicted. For example, it may be desirable to predict whether a stock is a newly entered heavily held stock.
In the related art, a logistic regression model may be used to predict the type of a business object. However, the logistic regression model is relatively simple and is prone to underfitting when it processes a large number of features, which makes the prediction result inaccurate.
Disclosure of Invention
The embodiments of this specification provide a model training method, a type prediction method, corresponding apparatuses, and a computing device, so as to improve the accuracy of the prediction result. The technical solutions of the embodiments are as follows.
In a first aspect of embodiments of the present specification, there is provided a model training method, including:
acquiring a plurality of index data and labels of the business object, wherein the labels are used for representing the type of the business object;
screening target index data from the plurality of index data;
and training the integrated learning classification model based on the decision tree according to the target index data and the label.
In a second aspect of embodiments of the present specification, there is provided a type prediction method including:
acquiring a plurality of index data of a business object in a historical period;
and inputting the index data into an integrated learning classification model based on the decision tree to obtain the type of the business object in the future period.
In a third aspect of embodiments of the present specification, there is provided a model training apparatus including:
the acquisition unit is used for acquiring a plurality of index data and labels of business objects, wherein the labels are used for representing the types of the business objects;
the screening unit is used for screening target index data from the index data;
and the training unit is used for training the integrated learning classification model based on the decision tree according to the target index data and the label.
In a fourth aspect of embodiments of the present specification, there is provided a type prediction apparatus including:
the acquisition unit is used for acquiring a plurality of index data of the business object in a historical period;
and the input unit is used for inputting the index data into the integrated learning classification model based on the decision tree to obtain the type of the business object in the future time period.
In a fifth aspect of embodiments of the present specification, there is provided a computing device comprising:
at least one processor;
a memory storing program instructions configured to be suitable for execution by the at least one processor, the program instructions comprising instructions for performing the method of the first or second aspect.
According to the technical scheme provided by the embodiment of the specification, the integrated learning classification model based on the decision tree can be trained by using the index data and the labels of the business objects. In addition, the types of the business objects can be predicted by using the integrated learning classification model based on the decision tree, so that the accuracy of the prediction result is improved.
Drawings
To illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of this specification; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a model training method in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a model training process in an embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating a type prediction method according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a type prediction apparatus in an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a computing device in an embodiment of the present specification.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
Considering that the integrated learning classification model based on the decision tree has high accuracy, high efficiency and strong interpretability, the embodiment of the specification predicts the type of the business object by using the integrated learning classification model based on the decision tree.
Please refer to FIG. 1. The embodiment of the specification provides a model training method. The model training method can be applied to a server. The server may be a single server, a server cluster including a plurality of servers, or a server deployed in the cloud. The model training method can be used to train a decision-tree-based ensemble learning classification model, that is, an ensemble learning model built from decision trees. The model may be an XGBoost model. Of course, it may also be another model, such as GBDT (Gradient Boosting Decision Tree).
The model training method may include the following steps.
Step S11: obtaining a plurality of index data and labels of the business objects, wherein the labels are used for representing the types of the business objects.
In some embodiments, the indicator data may include market indicator data for the business object and financial indicator data for a business associated with the business object. The index data can comprise market condition index data and financial index data, so that the index data are more comprehensive, and the training effect can be improved. The tag may be used to indicate the type of business object.
The business object may include a stock. The label may be used to indicate whether the stock is a newly entered heavily held stock. A heavily held stock may be a stock held in large quantities by institutions. Specifically, for example, a fund heavily held stock may be a stock held by fund companies in an amount exceeding 20% of its circulating market value. A newly entered heavily held stock may be a stock that was not a heavily held stock in the previous quarter and becomes one in the current quarter. The index data may include market index data of the stock and financial index data of the enterprise associated with the stock. The market index data can reflect the valuation, price, trading volume, and the like of the stock. The financial index data can reflect the operation of the company in terms of profitability, operating capacity, cash flow, and the like.
For example, a plurality of index data of a business object can be as shown in table 1 below.
TABLE 1
[Table 1 appears as images (BDA0002959800930000031, BDA0002959800930000041) in the original publication; its contents are not reproduced here.]
Of course, in practice, the business object may be other financial objects such as futures or bonds.
In some embodiments, the server may collect metrics data and tags for one or more business objects. In practice, for each business object, the server may collect a plurality of index data of the business object and tags corresponding to the index data. For example, the server may collect a plurality of index data of the business object and tags corresponding to the plurality of index data from the internet. Specifically, for example, the server may collect, from the internet, data on the financial index of the listed company for each quarter during 2016 to 2020, data on the market quotation index of the stocks issued by the listed company for each quarter, and tags of the stocks issued by the listed company for each quarter.
Step S13: and screening target index data from the plurality of index data.
In some embodiments, in order to improve the training effect of the decision tree-based ensemble learning classification model, the server may filter the index data to obtain target index data.
The server can analyze the correlation among the index data to screen target index data from the plurality of index data. Specifically, the server may determine a correlation coefficient between every two of the plurality of index data, and may screen out a plurality of target index data such that the correlation coefficient between every two of the target index data satisfies a first condition. The correlation coefficient between two index data represents how strongly they are correlated: the larger the coefficient, the stronger the correlation; the smaller the coefficient, the weaker the correlation. In practice, the correlation coefficient between two index data may be an empirical value, or it may be obtained by calculation. For example, the server may calculate a consistency coefficient (coefficient of consistency) between every two index data and use it as their correlation coefficient. The first condition may include: the correlation coefficient is less than or equal to a first threshold. The selected target index data are therefore only weakly correlated with one another.
For example, the plurality of metric data may include metric data A, B, C, D, E. The correlation coefficient between the index data a and the index data B is 0.5, the correlation coefficient between the index data a and the index data C is 0.6, the correlation coefficient between the index data a and the index data D is 0.75, the correlation coefficient between the index data a and the index data E is 0.85, the correlation coefficient between the index data B and the index data C is 0.8, the correlation coefficient between the index data B and the index data D is 0.86, the correlation coefficient between the index data B and the index data E is 0.9, the correlation coefficient between the index data C and the index data D is 0.2, the correlation coefficient between the index data C and the index data E is 0.3, and the correlation coefficient between the index data D and the index data E is 0.88. The first condition may include: the correlation coefficient is less than or equal to 0.8. Then the server may select metric data A, C, D from the metric data A, B, C, D, E as target metric data.
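The first condition in the example above can be checked mechanically. The following is a minimal sketch using only the coefficients listed in the example; the helper names `satisfies_first_condition` and `largest_valid_subsets` are hypothetical, and the exhaustive search is an assumption made for illustration, since the specification does not state how the server chooses among subsets that satisfy the condition.

```python
from itertools import combinations

# Pairwise correlation coefficients copied from the example above.
CORR = {
    ("A", "B"): 0.5, ("A", "C"): 0.6, ("A", "D"): 0.75, ("A", "E"): 0.85,
    ("B", "C"): 0.8, ("B", "D"): 0.86, ("B", "E"): 0.9,
    ("C", "D"): 0.2, ("C", "E"): 0.3, ("D", "E"): 0.88,
}


def satisfies_first_condition(subset, corr, threshold):
    """Return True if every pair in `subset` has a coefficient <= threshold."""
    return all(corr[pair] <= threshold
               for pair in combinations(sorted(subset), 2))


def largest_valid_subsets(names, corr, threshold):
    """Hypothetical helper: list all largest subsets meeting the first condition."""
    for size in range(len(names), 0, -1):
        found = [subset for subset in combinations(sorted(names), size)
                 if satisfies_first_condition(subset, corr, threshold)]
        if found:
            return found
    return []


candidates = largest_valid_subsets("ABCDE", CORR, threshold=0.8)
print(candidates)
```

With these coefficients the selection A, C, D from the example satisfies the condition, and so does A, B, C, so the condition alone does not pick a unique subset. The exhaustive search grows exponentially with the number of index data and is only practical for small sets.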
Alternatively, the server may further determine a correlation coefficient between the metric data and the tag; index data having a correlation coefficient satisfying the second condition may be screened out from the plurality of index data as target index data. The correlation coefficient between the index data and the label may be used to represent the correlation between the index data and the label. The correlation coefficient between the index data and the label may be an empirical value. Alternatively, the correlation coefficient between the index data and the label may be obtained by calculation. For example, the server may calculate a shrarp value between the index data and the tag as a correlation coefficient between the index data and the tag. The second condition may include: the correlation coefficient is greater than or equal to a second threshold. Therefore, the screened target index data can be index data with large influence on the label.
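Screening by the second condition can be sketched as follows. The sketch substitutes the absolute Pearson correlation for the measure named in the text, which is an assumption made purely for illustration; the column names, values, and threshold are invented, and `screen_by_label` is a hypothetical helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def screen_by_label(columns, labels, second_threshold):
    """Keep index data whose |correlation with the label| >= second_threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, labels)) >= second_threshold]


# Invented example values: four samples of two index data and their labels.
columns = {
    "turnover": [1.0, 2.0, 3.0, 4.0],
    "pe_ratio": [5.0, 1.0, 4.0, 2.0],
}
labels = [0, 0, 1, 1]
selected = screen_by_label(columns, labels, second_threshold=0.5)
print(selected)
```

Taking the absolute value treats strong negative correlation as informative too; that choice is not stated in the specification.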
In some embodiments, there may be some data missing from the multiple metric data of the business object. For example, a company has missing certain index data for a certain quarter. To this end, the server may fill in missing data. For example, the server may populate the missing data with an average or mode.
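A minimal sketch of the filling step, assuming missing entries are represented as `None` (the specification does not say how missing data is encoded) and using a hypothetical helper `fill_missing`:

```python
from statistics import mean, mode


def fill_missing(values, strategy="mean"):
    """Replace None entries with the mean or mode of the observed values."""
    if strategy not in ("mean", "mode"):
        raise ValueError("strategy must be 'mean' or 'mode'")
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot fill a column with no observed values")
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]


filled_mean = fill_missing([1.0, None, 3.0])
filled_mode = fill_missing([2, 2, None, 5], strategy="mode")
print(filled_mean, filled_mode)
```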
In some embodiments, the server may normalize the metric data. Specifically, the server can adopt a Z-score method or a Min-Max method to carry out normalization processing on the index data.
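The two normalization methods named above can be sketched as follows. The handling of a constant column (mapping every value to 0.0) and the use of the population standard deviation are assumptions, not details taken from the specification.

```python
from statistics import fmean, pstdev


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # assumption: constant columns map to 0
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Min-Max normalization: rescale linearly into the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: constant columns map to 0
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max([10.0, 20.0, 30.0])
standardized = z_score([1.0, 2.0, 3.0])
print(scaled, standardized)
```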
Step S15: and training the integrated learning classification model based on the decision tree according to the target index data and the label.
In some embodiments, the decision tree based ensemble learning classification model may include an XGBoost model.
Please refer to FIG. 2. The XGBoost model may be an additive model. Specifically, the XGBoost model may be expressed as

$$f_M(x) = \sum_{m=1}^{M} T(x, \theta_m)$$

where $M$ denotes the number of decision trees in the XGBoost model, $f_M(x)$ denotes the prediction result of the XGBoost model, $T(x, \theta_m)$ denotes the prediction result of the $m$-th decision tree, $\theta_m$ denotes the parameters of the $m$-th decision tree, and $x$ denotes the input of the XGBoost model.
The training process of the XGBoost model can be implemented with a forward stagewise algorithm. Specifically, during training, one decision tree may be added to the XGBoost model in each iteration, so that the model can be expressed as $f_m(x) = f_{m-1}(x) + T(x, \theta_m)$, where $f_{m-1}(x)$ denotes the prediction result of the current XGBoost model, $T(x, \theta_m)$ denotes the prediction result of the newly added $m$-th decision tree, and $f_m(x)$ denotes the prediction result of the XGBoost model after the $m$-th decision tree is added. The optimization goal of each iteration may be to minimize

$$\sum_{i=1}^{N} L\big(y_i,\ f_{m-1}(x_i) + T(x_i, \theta_m)\big)$$

The parameters of the XGBoost model can be obtained by solving

$$\hat{\theta}_m = \arg\min_{\theta_m} \sum_{i=1}^{N} L\big(y_i,\ f_{m-1}(x_i) + T(x_i, \theta_m)\big)$$

where $x_i$ and $y_i$ denote the input and the label of the $i$-th training sample, $N$ denotes the number of training samples, $L$ denotes the loss function, and $\arg\min$ denotes the value of $\theta_m$ at which the sum attains its minimum.
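The forward stagewise procedure can be illustrated with a deliberately simplified sketch. It is not XGBoost: it assumes a single numeric feature, depth-1 trees (stumps), and a squared loss, under which the tree that minimizes the loss of the updated model is exactly the tree that best fits the current residuals. All function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (threshold, left value, right value)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def stump_predict(stump, x):
    """Evaluate T(x, theta_m) for one stump."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_additive_model(xs, ys, num_trees):
    """Forward stagewise fitting: f_m(x) = f_{m-1}(x) + T(x, theta_m)."""
    trees = []
    predictions = [0.0] * len(xs)  # f_0(x) = 0
    for _ in range(num_trees):
        # With squared loss, minimizing L(y, f_{m-1} + T) means fitting T
        # to the residuals y - f_{m-1}(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        predictions = [p + stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return trees


def predict(trees, x):
    """f_M(x): the sum of the predictions of all trees."""
    return sum(stump_predict(tree, x) for tree in trees)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.0, 1.0, 1.0]
trees = fit_additive_model(xs, ys, num_trees=3)
print(predict(trees, 1.5), predict(trees, 3.5))
```

A real XGBoost model differs in several respects, including multi-feature trees, a regularization term, and second-order approximation of the loss; the sketch only mirrors the additive structure and the per-iteration minimization described above.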
Of course, the integrated learning classification model based on the Decision Tree may also be other models, such as GBDT (Gradient Boosting Decision Tree).
In some embodiments, the target metric data and the labels of the business objects may constitute training data. For example, the server may collect, from the internet, quarterly financial index data for the listed companies, quarterly market quotation index data for stocks issued by the listed companies, and quarterly labels for stocks issued by the listed companies over 2016-2020. Then, for each business object, the server may construct 20 training data for the business object according to the target index data for 20 quarters during 2016-2020 for the business object and the tags for 20 quarters.
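The construction of 20 training samples per business object can be sketched as below. The quarter labels such as `2016Q1`, the dictionary layout, and the placeholder feature and label values are all assumptions for illustration; `quarters` and `build_training_data` are hypothetical helpers.

```python
def quarters(start_year, end_year):
    """List quarter identifiers such as '2016Q1' for the given year range."""
    return [f"{year}Q{q}"
            for year in range(start_year, end_year + 1)
            for q in range(1, 5)]


def build_training_data(target_index_data, labels, periods):
    """Build one (features, label) sample per period for one business object."""
    return [(target_index_data[p], labels[p]) for p in periods]


periods = quarters(2016, 2020)

# Placeholder values standing in for screened index data and labels.
target_index_data = {p: [float(i), float(i) * 0.5]
                     for i, p in enumerate(periods)}
labels = {p: i % 2 for i, p in enumerate(periods)}

samples = build_training_data(target_index_data, labels, periods)
print(len(samples))
```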
The server may train the decision tree based ensemble learning classification model according to the training data. The server specifically can adopt a gradient descent method or a Newton method to train the integrated learning classification model based on the decision tree.
The model training method in the embodiment of the specification can acquire a plurality of index data and labels of the business object, wherein the labels are used for representing the type of the business object; target index data can be screened from a plurality of index data; the decision tree-based ensemble learning classification model can be trained according to target index data and labels. Therefore, the integrated learning classification model based on the decision tree can be trained by using the index data and the labels of the business objects, and a basis is provided for predicting the types of the business objects by using the integrated learning classification model based on the decision tree.
Please refer to FIG. 3. The embodiment of the specification provides a type prediction method. The type prediction method may be applied to a server. The server may be a single server, a server cluster including a plurality of servers, or a server deployed in the cloud.
The type prediction method may include the following steps.
Step S21: and acquiring a plurality of index data of the business object in a historical period.
In some embodiments, the indicator data may include market indicator data for the business object and financial indicator data for a business associated with the business object. The index data can comprise market situation index data and financial index data, so that the index data for predicting the stock types is more comprehensive, and the accuracy of prediction is improved. For example, the business object may include stocks. The index data may include market index data for the stock, and financial index data for a business associated with the stock. Wherein the market index data can reflect valuation, stock price, volume of trades, and the like of the stocks. The financial index data can reflect the operation of the company in terms of profitability, operational capacity, cash flow, and the like.
Of course, in practice, the business object may be other financial objects such as futures or bonds.
In some embodiments, the server may obtain a plurality of metric data of the business object over a historical period. The length of the historical period may be one quarter or one year. For example, the server may obtain a plurality of metric data for the business object over the last quarter. Wherein, the type of the index data can be a specified type. For example, the specified type may be a type of the target index data in the embodiment corresponding to fig. 1.
Step S23: and inputting the index data into an integrated learning classification model based on the decision tree to obtain the type of the business object in the future period.
In some embodiments, the integrated learning classification model based on the decision tree may be obtained by training based on the model training method of the embodiment corresponding to fig. 1. The decision tree based ensemble learning classification model may include an XGBoost model.
In some embodiments, the type of a business object may differ from one period to another. For example, the business object may be a stock that is a newly entered heavily held stock in one quarter and is not in another quarter. The server can input the index data into the decision-tree-based ensemble learning classification model to obtain the type of the business object in a future period. The length of the future period may be one quarter or one year. The type may be selected from a first type and a second type, whose meanings depend on the business object. Taking a stock as an example, the first type may indicate that the stock is a newly entered heavily held stock, and the second type may indicate that it is not.
For example, the server may obtain a plurality of index data of the business object in the last quarter; the index data may be input to a decision tree based ensemble learning classification model to obtain the type of the business object in the next quarter.
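The prediction step can be sketched as follows. The trained model is replaced by a toy stand-in that sums two hand-written stumps; the index names `turnover` and `roe`, the split points, the leaf values, the sigmoid mapping, and the 0.5 threshold are all assumptions for illustration and are not taken from the specification.

```python
from math import exp

FIRST_TYPE = "first type"
SECOND_TYPE = "second type"


def toy_model_score(index_data):
    """Stand-in for a trained ensemble: the sum of two hand-written stumps."""
    tree_1 = 0.8 if index_data["turnover"] > 0.1 else -0.8
    tree_2 = 0.5 if index_data["roe"] > 0.15 else -0.5
    return tree_1 + tree_2


def predict_type(model_score, index_data, threshold=0.5):
    """Map the additive score to a probability, then to one of the two types."""
    probability = 1.0 / (1.0 + exp(-model_score(index_data)))
    return FIRST_TYPE if probability >= threshold else SECOND_TYPE


# Invented index data of one business object for the last quarter.
last_quarter = {"turnover": 0.2, "roe": 0.05}
next_quarter_type = predict_type(toy_model_score, last_quarter)
print(next_quarter_type)
```

The index data passed in must be of the same kinds as the target index data used during training, as noted for step S21.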
The type prediction method of the embodiment of the specification can acquire a plurality of index data of the business object in a historical period, and can input the index data into a decision-tree-based ensemble learning classification model to obtain the type of the business object in a future period. Predicting the type of the business object with a decision-tree-based ensemble learning classification model can therefore improve the accuracy of the prediction result.
Please refer to fig. 4. The embodiment of the specification provides a model training device, and the device can comprise the following units.
An obtaining unit 31, configured to obtain a plurality of index data and a tag of a business object, where the tag is used to indicate a type of the business object;
a screening unit 33 configured to screen target index data from the plurality of index data;
and the training unit 35 is configured to train the ensemble learning classification model based on the decision tree according to the target index data and the label.
Please refer to FIG. 5. The embodiment of the present specification provides a type prediction apparatus, which may include the following units.
An obtaining unit 41, configured to obtain a plurality of index data of the business object in a history period;
and the input unit 43 is used for inputting the index data into the integrated learning classification model based on the decision tree to obtain the type of the business object in the future time period.
Please refer to fig. 6. The embodiment of the specification also provides a computing device.
The computing device may include a memory and a processor.
In the present embodiment, the Memory includes, but is not limited to, a Dynamic Random Access Memory (DRAM), a Static Random Access Memory (SRAM), and the like. The memory may be used to store computer instructions.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The processor may be configured to execute the computer instructions to implement the embodiments corresponding to fig. 1 or fig. 3.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and the same or similar parts in each embodiment may be referred to each other, and each embodiment focuses on differences from other embodiments. In particular, for the apparatus embodiment and the computing device embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and reference may be made to some descriptions of the method embodiment for relevant points. In addition, it is understood that one skilled in the art, after reading this specification document, may conceive of any combination of some or all of the embodiments listed in this specification without the need for inventive faculty, which combinations are also within the scope of the disclosure and protection of this specification.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, as technology has advanced, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized with hardware modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer "integrates" a digital system onto a PLD by programming it, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present specification may be essentially or partially implemented in the form of software products, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.
This specification can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
This specification may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
While the specification has been described with examples, those skilled in the art will appreciate that there are numerous variations and permutations of the specification that do not depart from the spirit of the specification, and it is intended that the appended claims include such variations and modifications that do not depart from the spirit of the specification.

Claims (10)

1. A model training method, comprising:
acquiring a plurality of index data and a label of a business object, wherein the label represents the type of the business object;
screening target index data from the plurality of index data; and
training a decision-tree-based ensemble learning classification model according to the target index data and the label.
2. The method of claim 1, wherein the index data comprises market index data of the business object and financial index data of an enterprise associated with the business object; the business object comprises a stock; and
the decision-tree-based ensemble learning classification model comprises an XGBoost model.
3. The method of claim 1, wherein screening the target index data from the plurality of index data comprises:
determining a correlation coefficient between every two of the plurality of index data; and
screening a plurality of target index data from the plurality of index data, wherein the correlation coefficient between every two of the plurality of target index data meets a first condition.
4. The method of claim 1, wherein screening the target index data from the plurality of index data comprises:
determining a correlation coefficient between each index data and the label; and
screening out, from the plurality of index data, the index data whose correlation coefficient meets a second condition as the target index data.
5. A type prediction method, comprising:
acquiring a plurality of index data of a business object in a historical period;
and inputting the index data into a decision-tree-based ensemble learning classification model to obtain the type of the business object in a future period.
6. The method of claim 5, wherein the business object comprises a stock and the type is selected from a first type and a second type, the first type indicating that the stock is a newly added heavily held stock and the second type indicating that the stock is not a newly added heavily held stock.
7. The method of claim 5, wherein the decision-tree-based ensemble learning classification model is trained by the method of any one of claims 1-4, and the decision-tree-based ensemble learning classification model comprises an XGBoost model.
8. A model training apparatus comprising:
an acquisition unit, configured to acquire a plurality of index data and a label of a business object, wherein the label represents the type of the business object;
a screening unit, configured to screen target index data from the plurality of index data; and
a training unit, configured to train a decision-tree-based ensemble learning classification model according to the target index data and the label.
9. A type prediction apparatus comprising:
an acquisition unit, configured to acquire a plurality of index data of a business object in a historical period; and
an input unit, configured to input the index data into a decision-tree-based ensemble learning classification model to obtain the type of the business object in a future period.
10. A computing device, comprising:
at least one processor;
a memory storing program instructions configured for execution by the at least one processor, the program instructions comprising instructions for performing the method of any of claims 1-7.
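
As an illustration of the screening recited in claims 3 and 4, the following minimal Python sketch filters index data first by correlation with the label and then by pairwise correlation. The claims do not fix the first and second conditions, so the thresholds (`min_abs_corr`, `max_abs_corr`), the use of the Pearson coefficient, the greedy selection order, and the indicator names and values are all assumptions made for illustration only. The surviving columns would then be passed, together with the labels, to a decision-tree-based ensemble classifier such as XGBoost; that step is omitted here to keep the sketch free of third-party dependencies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / sqrt(vx * vy)


def screen_by_label(indicators, labels, min_abs_corr=0.1):
    """Claim 4 style: keep index data whose |correlation with the label|
    reaches a threshold (a hypothetical 'second condition')."""
    return [name for name, col in indicators.items()
            if abs(pearson(col, labels)) >= min_abs_corr]


def screen_pairwise(indicators, names, labels, max_abs_corr=0.9):
    """Claim 3 style: keep index data so that every kept pair has
    |correlation| <= a threshold (a hypothetical 'first condition').
    Candidates are visited in descending |correlation with the label|,
    which is an assumed tie-breaking policy, not taken from the claims."""
    ordered = sorted(names, key=lambda n: -abs(pearson(indicators[n], labels)))
    kept = []
    for name in ordered:
        if all(abs(pearson(indicators[name], indicators[k])) <= max_abs_corr
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical index data for six stocks; 1 marks the first type, 0 the second.
indicators = {
    "pe_ratio":      [10, 12, 30, 35, 11, 33],
    "pe_ratio_copy": [20, 24, 60, 70, 22, 66],  # redundant with pe_ratio
    "turnover":      [5, 1, 4, 2, 3, 6],
    "volume":        [1, 3, 2, 2, 2, 2],        # uncorrelated with the label
}
labels = [0, 0, 1, 1, 0, 1]

by_label = screen_by_label(indicators, labels)            # drops "volume"
selected = screen_pairwise(indicators, by_label, labels)  # drops one redundant column
print(by_label)
print(selected)
```

Applying the label-based screen before the pairwise screen is one possible ordering; the claims present the two screens as alternatives that depend only on claim 1, so either may also be used on its own.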
CN202110233777.2A 2021-03-03 2021-03-03 Model training method, type prediction method, device and computing equipment Pending CN112785090A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110233777.2A CN112785090A (en) 2021-03-03 2021-03-03 Model training method, type prediction method, device and computing equipment

Publications (1)

Publication Number Publication Date
CN112785090A true CN112785090A (en) 2021-05-11

Family

ID=75762230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110233777.2A Pending CN112785090A (en) 2021-03-03 2021-03-03 Model training method, type prediction method, device and computing equipment

Country Status (1)

Country Link
CN (1) CN112785090A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665175A (en) * 2018-05-16 2018-10-16 阿里巴巴集团控股有限公司 A kind of processing method, device and the processing equipment of insurance business risk profile
CN109360089A (en) * 2018-11-20 2019-02-19 四川大学 Credit risk prediction technique and device
CN110084400A (en) * 2019-03-21 2019-08-02 平安直通咨询有限公司上海分公司 Information forecasting method, device, computer equipment and storage medium
CN110364257A (en) * 2019-07-18 2019-10-22 泰康保险集团股份有限公司 People's vehicle Risk Forecast Method, device, medium and electronic equipment

Similar Documents

Publication Publication Date Title
US11663409B2 (en) Systems and methods for training machine learning models using active learning
CN110995459B (en) Abnormal object identification method, device, medium and electronic equipment
CN111783873A (en) Incremental naive Bayes model-based user portrait method and device
CN111611390B (en) Data processing method and device
CN113934851A (en) Data enhancement method and device for text classification and electronic equipment
Aralikatte et al. Fault in your stars: an analysis of android app reviews
CN109299007A (en) A kind of defect repair person's auto recommending method
CN108229572B (en) Parameter optimization method and computing equipment
CN110851600A (en) Text data processing method and device based on deep learning
CN113705201B (en) Text-based event probability prediction evaluation algorithm, electronic device and storage medium
CN115905293A (en) Switching method and device of job execution engine
CN116226373A (en) Industry classification model training method and enterprise industry classification method
CN112785090A (en) Model training method, type prediction method, device and computing equipment
US20220292393A1 (en) Utilizing machine learning models to generate initiative plans
CN111858899B (en) Statement processing method, device, system and medium
CN112560463A (en) Text multi-labeling method, device, equipment and storage medium
Mukherjee et al. Determining standard occupational classification codes from job descriptions in immigration petitions
Ghosh et al. Understanding Machine Learning
CN107451662A (en) Optimize method and device, the computer equipment of sample vector
US20240119470A1 (en) Systems and methods for generating a forecast of a timeseries
Sharma Identifying Factors Contributing to Lead Conversion Using Machine Learning to Gain Business Insights
US20200357049A1 (en) Tuning hyperparameters for predicting spending behaviors
Nguyen et al. Predicting bankruptcy using machine learning algorithms
CN113554184A (en) Model training method and device, electronic equipment and storage medium
CN117391367A (en) Policy task allocation method and device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination