CN114493379B - Enterprise evaluation model automatic generation method, device and system based on government affair data - Google Patents

Info

Publication number
CN114493379B
Authority
CN
China
Prior art keywords
machine learning
hyper
data
data set
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210362892.4A
Other languages
Chinese (zh)
Other versions
CN114493379A (en)
Inventor
范晓忻
曹鸿强
赵鹏
冷巍
王俊
凌艳
闫萌
何大伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
3golden Beijing Information Technology Co ltd
Original Assignee
3golden Beijing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 3golden Beijing Information Technology Co ltd
Priority to CN202210362892.4A
Publication of CN114493379A
Application granted
Publication of CN114493379B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method, device and system for automatically generating an enterprise evaluation model based on government affair data. A given government affair data set, comprising at least one of credit data, industrial and commercial data, tax data, intellectual property data and judicial data of an enterprise, is copied and sampled to obtain a first data set; based on the first data set, a first number of categories of first machine learning models and a corresponding second number of groups of hyper-parameters are screened from a preset model set and a preset hyper-parameter set; the first data set is input into the first machine learning models adopting the hyper-parameters, and these models are trained, verified and integrated to obtain a target enterprise evaluation model. Because the various models are run on only a small amount of data sampled from the government affair data set, a suitable model can be selected quickly; the approach also increases the diversity of the data set, reduces overfitting, improves model performance, and shortens model training time while maintaining accuracy.

Description

Enterprise evaluation model automatic generation method, device and system based on government affair data
Technical Field
The invention relates to the technical field of computers, in particular to an enterprise evaluation model automatic generation method, device and system based on government affair data.
Background
Government affair data is important data used by the government to evaluate and supervise enterprises. It covers credit data, industrial and commercial data, tax data, intellectual property data, judicial data and the like of an enterprise, and can come from related departments such as a development and reform commission, a market supervision agency, a tax agency, an intellectual property agency, a court and the like. Governments generally need to construct enterprise evaluation models and use them to evaluate enterprises from government affair data, thereby supervising and managing the enterprises.
Referring to fig. 1, which is a schematic flow chart of a prior-art method for constructing an enterprise evaluation model, the prior-art method mainly performs data preprocessing, feature engineering, model training, model evaluation and the like, in sequence, on government affair data composed of an enterprise's credit data, industrial and commercial data, tax data, intellectual property data and judicial judgment data, so as to construct an enterprise evaluation model. In this prior-art process, many steps depend on expert experience and manual intervention, so the construction efficiency and the automation level of the enterprise evaluation model are low.
In recent years, with the development of machine learning, automated machine learning techniques can be used to generate enterprise evaluation models. However, generating enterprise evaluation models with prior-art automated machine learning faces the following challenges: (1) as the complexity of model integration keeps rising, so does the running cost of the models, and the time overhead of automated machine learning keeps increasing; (2) the continuously growing hyper-parameter space also increases the time overhead of automated machine learning; (3) limited by the scale of the training data set, optimal generalization performance of the hyper-parameters cannot be guaranteed.
Disclosure of Invention
The invention provides an enterprise evaluation model automatic generation method, device and system based on government affair data, which are used for overcoming the defects that the time overhead of generating an enterprise evaluation model by adopting automatic machine learning is continuously increased and the generalization capability is poor in the prior art, and can improve the efficiency of generating the enterprise evaluation model and the performance of the enterprise evaluation model.
In a first aspect, the invention provides a method for automatically generating an enterprise evaluation model based on government affair data, which comprises the following steps:
copying and sampling a given government affair data set to obtain a first data set; wherein the government data set comprises at least one of credit data, business data, tax data, intellectual property data and judicial data of an enterprise;
based on the first data set, screening a first machine learning model of a first number of categories and a corresponding hyper-parameter of a second number of groups in a preset model set and a preset hyper-parameter set;
inputting the first data set into the first machine learning model adopting the hyper-parameters, and training, verifying and integrating the first machine learning model adopting the hyper-parameters to obtain a target enterprise evaluation model.
Further, the screening, based on the first data set, a first number of classes of first machine learning models and a corresponding second number of sets of hyper-parameters in a preset model set and a preset hyper-parameter set includes:
analyzing the first data set to obtain data set characteristics of the first data set;
and based on the data set characteristics, screening the first machine learning model and the hyper-parameters in the preset model set and the preset hyper-parameter set.
Further, after analyzing the first data set to obtain the data set characteristics of the first data set, the method further includes:
screening the data set characteristics to obtain screened data set characteristics;
and converting the screened data set characteristics into numerical values or category characteristics.
Further, before the filtering the first machine learning model of the first number of categories and the corresponding second number of sets of hyper-parameters in the preset model set and the preset hyper-parameter set based on the first data set, the method further includes:
determining the model set and the hyper-parameter set based on a result of a second data set running on the determined second machine learning model; wherein the second data set is determined based on various types of government data sets, and the second machine learning model is determined among various machine learning models based on the second data set.
Further, the determining of the model set and the hyper-parameter set based on a result of running the second data set on the determined second machine learning model comprises:
inputting the second data set into the second machine learning model to obtain the operation result;
and taking the set of the second machine learning models of which the operation results meet preset conditions as the model set, and taking the set of the hyper-parameters of the second machine learning models meeting the preset conditions as the hyper-parameter set.
Further, the inputting the first data set into the first machine learning model using the hyper-parameters, and training, verifying and integrating the first machine learning model using the hyper-parameters to obtain a target enterprise evaluation model includes:
inputting the first data set into the first machine learning model adopting the hyper-parameters, and training and verifying the first machine learning model adopting the hyper-parameters to obtain a first verification result;
inputting the first verification result into the corresponding first machine learning model adopting the hyper-parameters, and training and verifying the first machine learning model adopting the hyper-parameters to obtain a second verification result;
performing weighted calculation on the second verification result based on the category of the first machine learning model to obtain the first number of optimized machine learning models;
and integrating the optimized machine learning model to obtain the target enterprise evaluation model.
In a second aspect, the present invention further provides an apparatus for automatically generating an enterprise evaluation model based on government affair data, including:
a data set acquisition module, used for copying and sampling a given government affair data set to obtain a first data set; wherein the government data set comprises at least one of credit data, business data, tax data, intellectual property data and judicial data of an enterprise;
the screening module is used for screening the first machine learning models of the first number of categories and the corresponding hyper-parameters of the second number of groups in a preset model set and a preset hyper-parameter set based on the first data set;
and the integration module is used for inputting the first data set into the first machine learning model adopting the hyper-parameters, and training, verifying and integrating the first machine learning model adopting the hyper-parameters to obtain a target enterprise evaluation model.
In a third aspect, the present invention further provides an automatic generation system of an enterprise evaluation model based on government affair data, including:
the sample sampler is used for copying and sampling a given government affair data set to obtain a first data set; wherein the government data set comprises at least one of credit data, business data, tax data, intellectual property data and judicial data of an enterprise;
a data set portrayal device for analyzing the first data set to obtain data set characteristics of the first data set;
the model selector is used for screening the first machine learning models of the first number of categories and the corresponding hyper-parameters of the second number of groups in a preset model set and a preset hyper-parameter set based on the characteristics of the data set;
and the model integrator is used for inputting the first data set into the first machine learning model adopting the hyper-parameters, and training, verifying and integrating the first machine learning model adopting the hyper-parameters to obtain a target enterprise evaluation model.
In a fourth aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method for automatically generating an enterprise evaluation model based on government affairs data according to the first aspect.
In a fifth aspect, the embodiments of the present invention further provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for automatically generating an enterprise evaluation model based on government data according to the first aspect.
In a sixth aspect, embodiments of the present invention further provide a computer program product, on which executable instructions are stored, and the instructions, when executed by a processor, cause the processor to implement the steps of the method for automatically generating an enterprise evaluation model based on government data according to the first aspect.
According to the method, device and system for automatically generating an enterprise evaluation model based on government affair data provided by the invention, a given government affair data set comprising at least one of credit data, business data, tax data, intellectual property data and judicial data of an enterprise is copied and sampled to obtain a first data set; based on the first data set, a first number of categories of first machine learning models and a corresponding second number of groups of hyper-parameters are screened from a preset model set and a preset hyper-parameter set; the first data set is input into the first machine learning models adopting the hyper-parameters, and these models are trained, verified and integrated to obtain a target enterprise evaluation model. Because the given government affair data set is copied and sampled, the various models are run on only a small amount of data sampled from the government affair data set in order to select the machine learning models suited to it. This speeds up machine learning model selection, increases the diversity of the data set, reduces overfitting, and improves the performance of the enterprise evaluation model; while maintaining the accuracy of the enterprise evaluation model, it significantly reduces training time and improves the efficiency and automation level of generating the enterprise evaluation model.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for constructing an enterprise evaluation model according to the prior art;
FIG. 2 is a schematic flow chart of an automatic generation method of an enterprise evaluation model based on government affair data according to the present invention;
FIG. 3 is a schematic flow chart illustrating a process of screening a first machine learning model and corresponding hyper-parameters according to the present invention;
FIG. 4 is a schematic flow chart of determining a model set and a hyper-parameter set according to the present invention;
FIG. 5 is a schematic flow chart illustrating a process for generating a target enterprise evaluation model based on a first machine learning model and corresponding hyper-parameters, according to the present invention;
FIG. 6 is a schematic flow diagram of an application scenario of the generation of the target enterprise evaluation model of FIG. 5;
FIG. 7 is a schematic structural diagram of the composition of an automatic enterprise evaluation model generation device based on government affair data according to the present invention;
FIG. 8 is a schematic diagram of the structure of an enterprise evaluation model automatic generation system based on government affairs data according to the present invention;
FIG. 9 is a schematic physical structure diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 2 is a schematic flow chart of an automatic generation method of an enterprise evaluation model based on government affair data according to the present invention. As shown in fig. 2, the method for automatically generating an enterprise evaluation model based on government affairs data may include the following steps:
S201, copying and sampling a given government affair data set to obtain a first data set; wherein the government affair data set comprises at least one of credit data, industrial and commercial data, tax data, intellectual property data and judicial data of the enterprise.
In step S201, the number of times of copying a given government affair data set may be 10 times, or may also be 20 times, which is not limited by the embodiment of the present invention. The given government affair data set is copied to obtain a copied data set, and then the copied data set can be sampled based on a K-fold cross validation method to obtain a first data set.
S202, based on the first data set, screening, from the preset model set and the preset hyper-parameter set, a first number of categories of first machine learning models and a second number of groups of hyper-parameters corresponding to them.
In step S202, the first number of categories may be 5, or may be 10, which is not limited in this embodiment of the present invention. Similarly, the second number of groups may be 3, or may be 5, which is not limited in the embodiment of the present invention. For example, 5 categories of first machine learning models may be screened from the preset model set based on the first data set, and then 3 groups of hyper-parameters corresponding to the 5 categories of first machine learning models may be screened from the preset hyper-parameter set. Each category comprises one first machine learning model; that is, the number of first machine learning models equals the number of categories.
S203, inputting the first data set into the first machine learning model adopting the hyper-parameters, and training, verifying and integrating the first machine learning model adopting the hyper-parameters to obtain the target enterprise evaluation model.
In step S203, inputting the first data set into the first machine learning model using the screened hyper-parameters, and training the first machine learning model using the screened hyper-parameters based on the training set in the first data set to obtain a trained first machine learning model; and then verifying the trained first machine learning model based on a verification set in the first data set, and if the verification result meets the set requirement, integrating the trained first machine learning model based on the verification result to obtain a target enterprise evaluation model.
According to the method for automatically generating an enterprise evaluation model based on government affair data provided by the embodiment of the invention, a given government affair data set comprising at least one of credit data, industrial and commercial data, tax data, intellectual property data and judicial data of an enterprise is copied and sampled to obtain a first data set; based on the first data set, a first number of categories of first machine learning models and a corresponding second number of groups of hyper-parameters are screened from a preset model set and a preset hyper-parameter set; the first data set is input into the first machine learning models adopting the hyper-parameters, and these models are trained, verified and integrated to obtain a target enterprise evaluation model. Because the given government affair data set is copied and sampled, the various models are run on only a small amount of data sampled from the government affair data set in order to select the machine learning models suited to it. This speeds up machine learning model selection, increases the diversity of the data set, reduces overfitting, and improves the performance of the enterprise evaluation model; while maintaining the accuracy of the enterprise evaluation model, it significantly reduces training time and improves the efficiency and automation level of generating the enterprise evaluation model.
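The copy-and-sample operation of step S201 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `copy_and_kfold_sample`, the row representation, and the choice to shuffle each copy independently before splitting it into K folds are all hypothetical. The description only states that the data set is copied (for example 10 or 20 times) and that the copies may be sampled by K-fold cross validation.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]  # one enterprise record; field names are illustrative


def copy_and_kfold_sample(
    rows: Sequence[Row], n_copies: int = 10, k: int = 5, seed: int = 0
) -> List[Tuple[List[Row], List[Row]]]:
    """Copy the data set n_copies times and split every copy into k folds.

    Returns one (training_rows, validation_rows) pair per copy-and-fold
    combination, i.e. n_copies * k pairs in total.
    """
    if k < 2 or k > len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    rng = random.Random(seed)
    splits = []
    for _ in range(n_copies):
        copy = list(rows)   # an independent copy of the given data set
        rng.shuffle(copy)   # assumption: each copy gets its own random order
        folds = [copy[i::k] for i in range(k)]
        for held_out in range(k):
            validation = folds[held_out]
            training = [
                row
                for index, fold in enumerate(folds)
                if index != held_out
                for row in fold
            ]
            splits.append((training, validation))
    return splits
```

Shuffling each copy separately is what makes the copies differ from one another; with identical ordering, copying would add no diversity to the folds.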
FIG. 3 is a schematic flow chart illustrating a process of screening a first machine learning model and corresponding hyper-parameters according to the present invention. As shown in fig. 3, based on the first data set, the step of filtering the first machine learning model of the first number of categories and the corresponding second number of sets of hyper-parameters in the preset model set and the preset hyper-parameter set may include the steps of:
S301, analyzing the first data set to obtain data set characteristics of the first data set.
In step S301, the data set features may be of time, space, category, numerical value, and ID types, which is not limited in this embodiment of the present invention. Analyzing the first data set yields its number of rows and columns, together with the data type, cardinality and missing rate of each feature in the first data set.
S302, based on the characteristics of the data set, a first machine learning model and a hyper-parameter are screened in a preset model set and a preset hyper-parameter set.
For step S302, refer to the description of step S202, which is not repeated here.
According to the embodiment of the invention, the first machine learning models and the corresponding hyper-parameters are screened using the data set characteristics of the first data set, so that the speed and the accuracy of the screening can be improved.
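The data set analysis of step S301 can be illustrated with a small profiling routine. The sketch below assumes the data set is held as a list of dictionaries with `None` marking a missing value; the function name `profile_dataset` and the output layout are hypothetical and are chosen only to show the quantities named in the description (row count, column count, and per-feature data type, cardinality and missing rate).

```python
from typing import Any, Dict, List, Optional

Table = List[Dict[str, Optional[Any]]]  # rows keyed by column name; None = missing


def profile_dataset(rows: Table) -> Dict[str, Any]:
    """Summarise shape plus per-column data type, cardinality and missing rate."""
    columns = sorted({name for row in rows for name in row})
    profile: Dict[str, Any] = {
        "n_rows": len(rows),
        "n_columns": len(columns),
        "columns": {},
    }
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        type_names = sorted({type(v).__name__ for v in present})
        if not type_names:
            dtype = "unknown"   # column is entirely missing
        elif len(type_names) == 1:
            dtype = type_names[0]
        else:
            dtype = "mixed"
        profile["columns"][name] = {
            "dtype": dtype,
            "cardinality": len(set(present)),  # number of distinct non-missing values
            "missing_rate": 1 - len(present) / len(rows),
        }
    return profile
```

A model selector could then key its choice of model categories on this summary, for example preferring simpler models when `n_rows` is small.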
In some optional embodiments, after analyzing the first data set to obtain the data set characteristics of the first data set, the method may further include: screening the data set characteristics to obtain screened data set characteristics; and converting the screened data set characteristics into numerical values or category characteristics.
Data set features whose type is time, space, category, numerical value or ID are retained. If the missing rate of a column of the first data set is greater than a first preset value, the column is deleted; if the missing rate of a row of the first data set is greater than a second preset value, the row is deleted; if the ratio of a feature's cardinality to the number of rows in the data set is greater than 0.8, the feature is deleted; if the cardinality of a feature of the first data set is less than 0.1, the feature is deleted. Time features are converted into numerical features and spatial features into category features, so that all data set features are finally converted into numerical or category features.
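The deletion rules above can be sketched as a filter over rows and columns. In this illustration the two preset values default to 0.5, which is purely an assumption since the description does not give them; the 0.8 cardinality ratio comes from the text. The rule involving the 0.1 bound is deliberately left out because the description does not make clear what quantity it is measured against. The function name `screen_features` is hypothetical.

```python
from typing import Any, Dict, List, Optional

Table = List[Dict[str, Optional[Any]]]  # None marks a missing value


def screen_features(
    rows: Table,
    max_column_missing: float = 0.5,     # "first preset value" (assumed)
    max_row_missing: float = 0.5,        # "second preset value" (assumed)
    max_cardinality_ratio: float = 0.8,  # stated in the description
) -> Table:
    """Drop sparse columns, then sparse rows, then near-unique features."""
    if not rows:
        return []
    columns = sorted({name for row in rows for name in row})

    def missing_rate(values: List[Optional[Any]]) -> float:
        return sum(v is None for v in values) / len(values)

    # 1. Delete columns whose missing rate exceeds the first preset value.
    kept_columns = [
        c for c in columns
        if missing_rate([r.get(c) for r in rows]) <= max_column_missing
    ]
    if not kept_columns:
        return []
    # 2. Delete rows whose missing rate exceeds the second preset value.
    kept_rows = [
        r for r in rows
        if missing_rate([r.get(c) for c in kept_columns]) <= max_row_missing
    ]
    if not kept_rows:
        return []
    # 3. Delete features whose cardinality / row count exceeds the ratio.
    final_columns = []
    for c in kept_columns:
        distinct = {r.get(c) for r in kept_rows if r.get(c) is not None}
        if len(distinct) / len(kept_rows) <= max_cardinality_ratio:
            final_columns.append(c)
    return [{c: r.get(c) for c in final_columns} for r in kept_rows]
```

The order of the three passes is itself a design choice: removing sparse columns first prevents a mostly empty column from causing otherwise complete rows to be discarded.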
In some optional embodiments, before filtering the first machine learning model of the first number of categories and the corresponding second number of sets of hyper-parameters in the preset model set and the preset hyper-parameter set based on the first data set, the method may further include: based on the results of the second data set running on the determined second machine learning model, a model set and a hyper-parameter set are determined. Wherein the second data set is determined based on the various types of government affairs data sets, and the second machine learning model is determined among the various machine learning models based on the second data set.
The second data set may be input to a second machine learning model to obtain an operational result; and taking the set of the second machine learning models of which the operation results meet the preset conditions as a model set, and taking the set of the hyper-parameters of the second machine learning models meeting the preset conditions as a hyper-parameter set. The specific process can be as shown in fig. 4:
The second data set is copied n times to obtain n copied data sets. Because the model set is empty at this point, n machine learning models are selected based on the n copied data sets, each copied data set corresponding to one machine learning model. Specifically: it is judged whether the size of the copied data set is larger than a preset value and, if so, up-sampling processing is performed; a complex machine learning model is selected for a large copied data set, and a simple machine learning model for a small one. A group of hyper-parameters is randomly selected from the optimized hyper-parameter set of the selected machine learning model in a meta-knowledge base as that model's initial hyper-parameters. Each copied data set is input into its corresponding machine learning model and run; machine learning models that produce errors or abnormal results are deleted based on the operation results; the N categories of machine learning models that perform well are selected as the model set, and the M groups of hyper-parameters corresponding to those N categories are selected as the hyper-parameter set.
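The final selection in the FIG. 4 process, keeping the N best-performing model categories and M hyper-parameter groups for each, can be sketched as below. The `Run` record, the function name, and the ranking rule (a category is ranked by its best score, and a higher score is better) are assumptions made for illustration; the description only says that erroneous or abnormal models are deleted and well-performing ones are kept.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Run(NamedTuple):
    category: str            # model category, e.g. "gbdt" (illustrative name)
    hyper_params: Tuple      # one group of hyper-parameters, as a hashable tuple
    score: Optional[float]   # validation score; None if the run errored


def build_model_and_hyper_sets(
    runs: List[Run], n_categories: int, m_groups: int
) -> Tuple[List[str], Dict[str, List[Tuple]]]:
    """Return (model_set, hyper_parameter_set) from the recorded runs."""
    by_category: Dict[str, List[Run]] = defaultdict(list)
    for run in runs:
        if run.score is not None:  # delete erroneous or abnormal runs
            by_category[run.category].append(run)

    # Rank categories by their best score (assumption: higher is better).
    ranked = sorted(
        by_category,
        key=lambda c: max(r.score for r in by_category[c]),
        reverse=True,
    )
    model_set = ranked[:n_categories]

    # Keep the m best hyper-parameter groups of every retained category.
    hyper_set = {
        c: [
            r.hyper_params
            for r in sorted(by_category[c], key=lambda r: r.score, reverse=True)
        ][:m_groups]
        for c in model_set
    }
    return model_set, hyper_set
```

Storing the result per category mirrors the later screening step, where hyper-parameter groups are looked up for each selected model category.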
FIG. 5 is a schematic flow chart illustrating a process of generating a target enterprise evaluation model based on a first machine learning model and corresponding hyper-parameters, according to the present invention. As shown in fig. 5, inputting the first data set into the first machine learning model using the hyper-parameters, and training, verifying and integrating the first machine learning model using the hyper-parameters to obtain the target enterprise evaluation model may include the following steps:
S501, inputting the first data set into a first machine learning model adopting the hyper-parameters, and training and verifying the first machine learning model adopting the hyper-parameters to obtain a first verification result.
S502, inputting the first verification result into the corresponding first machine learning model adopting the hyper-parameters, and training and verifying the first machine learning model adopting the hyper-parameters to obtain a second verification result.
In steps S501 and S502, a first data set is input into a network of first machine learning models using the filtered hyper-parameters, the network comprising two layers, each layer comprising all of the first machine learning models. For example, the first machine learning models have 3 types, and each type of the first machine learning model corresponds to one first machine learning model, so the number of the first machine learning models is 3. The first machine learning model is numbered as a No. 1 first machine learning model, a No. 2 first machine learning model and a No. 3 first machine learning model. Inputting a first data set into a No. 1 first machine learning model, a No. 2 first machine learning model and a No. 3 first machine learning model of a first layer in the network, and respectively training and verifying the three models to respectively obtain first verification results of the No. 1 first machine learning model, the No. 2 first machine learning model and the No. 3 first machine learning model; the first verification results of the No. 1 first machine learning model, the No. 2 first machine learning model and the No. 3 first machine learning model are respectively input into the No. 1 first machine learning model, the No. 2 first machine learning model and the No. 3 first machine learning model, and the three first machine learning models are respectively trained and verified to obtain second verification results of the No. 1 first machine learning model, the No. 2 first machine learning model and the No. 3 first machine learning model.
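One possible reading of the two-layer network in steps S501 and S502 is sketched below with two toy regression models. Everything here is an assumption made for illustration: the models `MeanModel` and `LineModel`, the use of mean squared error as the verification result, and in particular the choice to train each second-layer model on its own first-layer predictions. The description does not specify how the first verification result is encoded as input to the second layer.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Data = Tuple[List[float], List[float]]  # (feature values, target values)


class MeanModel:
    """Predicts the training-set mean regardless of the input."""

    def fit(self, xs: List[float], ys: List[float]) -> "MeanModel":
        self.value = mean(ys)
        return self

    def predict(self, xs: List[float]) -> List[float]:
        return [self.value for _ in xs]


class LineModel:
    """One-feature least-squares line y = a * x + b."""

    def fit(self, xs: List[float], ys: List[float]) -> "LineModel":
        mx, my = mean(xs), mean(ys)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.a = cov / var if var else 0.0
        self.b = my - self.a * mx
        return self

    def predict(self, xs: List[float]) -> List[float]:
        return [self.a * x + self.b for x in xs]


def mse(pred: List[float], truth: List[float]) -> float:
    return mean([(p - t) ** 2 for p, t in zip(pred, truth)])


def two_layer_validate(
    factories: Dict[str, Callable[[], object]], train: Data, valid: Data
) -> Dict[str, Tuple[float, float]]:
    """Return {model name: (first-layer error, second-layer error)}."""
    (tx, ty), (vx, vy) = train, valid
    results = {}
    for name, make in factories.items():
        layer1 = make().fit(tx, ty)                  # S501: train on the data set
        first_train = layer1.predict(tx)
        first_valid = layer1.predict(vx)             # first verification output
        layer2 = make().fit(first_train, ty)         # S502: same category, fed layer-1 output
        second_valid = layer2.predict(first_valid)   # second verification output
        results[name] = (mse(first_valid, vy), mse(second_valid, vy))
    return results
```

For example, `two_layer_validate({"mean": MeanModel, "line": LineModel}, train, valid)` returns one pair of validation errors per model category, which later steps could use as weights.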
S503, performing weighted calculation on the second verification result based on the category of the first machine learning model to obtain a first number of optimized machine learning models.
And S504, integrating the optimized machine learning model to obtain a target enterprise evaluation model.
In steps S503 and S504, continuing the example of steps S501 and S502, there are 3 categories of first machine learning model. The second verification results are weighted-averaged by category to obtain 3 optimized machine learning models, and the 3 optimized machine learning models are integrated to obtain the target enterprise evaluation model.
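A possible reading of the weighting and integration in steps S503 and S504 is sketched below. The source only states that a weighted calculation is performed per category; the use of a per-result weight (for example a validation score), the unweighted mean used for integration, and the names `weighted_average_by_category` and `integrate` are assumptions made for illustration. The numbers are placeholders, not results.

```python
from collections import defaultdict

def weighted_average_by_category(second_results):
    """Collapse the results of each model category into one weighted-average output."""
    grouped = defaultdict(list)
    for category, weight, scores in second_results:
        grouped[category].append((weight, scores))
    optimized = {}
    for category, members in grouped.items():
        total = sum(weight for weight, _ in members)
        n_samples = len(members[0][1])
        optimized[category] = [
            sum(weight * scores[i] for weight, scores in members) / total
            for i in range(n_samples)
        ]
    return optimized

def integrate(optimized):
    """Assumed integration rule: unweighted mean across the optimized models."""
    return [sum(col) / len(col) for col in zip(*optimized.values())]

# (category, weight, predicted scores for two enterprises); placeholder values.
second_results = [
    ("model_1", 1.0, [0.0, 1.0]),
    ("model_1", 3.0, [1.0, 1.0]),
    ("model_2", 1.0, [0.5, 0.5]),
    ("model_2", 1.0, [0.5, 0.0]),
    ("model_3", 2.0, [0.25, 0.75]),
]
optimized = weighted_average_by_category(second_results)  # one entry per category
final_scores = integrate(optimized)                        # output of the integrated model
```

With three categories in the input, `optimized` contains three entries, matching the "3 optimized machine learning models" of the example.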
According to this embodiment of the invention, the first machine learning models in the network, each using the screened hyper-parameters, are cross-trained and verified multiple times on the first data set to obtain the optimized machine learning models, and integrating the optimized machine learning models quickly yields a target enterprise evaluation model with high performance and high accuracy.
FIG. 6 is a flowchart illustrating an application scenario of the target enterprise evaluation model generated by the method of FIG. 5. As shown in FIG. 6, the application scenario may include the following steps:
The data set is copied n-1 times to obtain n data sets, which are labeled 0 to n-1. Random k-fold sampling is performed on each copy to obtain k data sets, which are labeled 0 to k-1. There are then n × k data sets, denoted data set 00, data set 01, ..., data set (n-1)(k-1), and n × k corresponding models, denoted model 00, model 01, ..., model (n-1)(k-1). N categories of models are selected from the n × k models, the selected models are aggregated by model category to obtain the optimized machine learning models, and the optimized machine learning models are integrated to obtain the target enterprise evaluation model.
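The copy-and-sample step can be sketched as follows. It is assumed here that "random k-fold sampling" means shuffling each copy and splitting it into k disjoint folds; the function name `replicate_and_split` and the `seed` parameter are hypothetical.

```python
import random

def replicate_and_split(dataset, n, k, seed=0):
    """Return {(copy_label, fold_label): rows} with n * k entries."""
    rng = random.Random(seed)
    subsets = {}
    for copy_label in range(n):
        shuffled = list(dataset)      # one copy of the data set
        rng.shuffle(shuffled)         # a different random order per copy
        for fold_label in range(k):
            # every k-th row starting at fold_label forms one fold
            subsets[(copy_label, fold_label)] = shuffled[fold_label::k]
    return subsets

dataset = list(range(10))             # stand-in for enterprise records
subsets = replicate_and_split(dataset, n=3, k=5)
```

The key `(i, j)` plays the role of the label "data set ij" in the text, so one model per key gives the n × k models.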
FIG. 7 is a schematic structural diagram of the device for automatically generating an enterprise evaluation model based on government affair data according to the present invention. As shown in FIG. 7, the device for automatically generating an enterprise evaluation model based on government affair data includes:
a data set obtaining module 701, configured to copy and sample a given government affair data set to obtain a first data set; wherein the government affair data set comprises at least one of credit data, industrial and commercial data, tax data, intellectual property data and judicial data of the enterprise.
A screening module 702, configured to screen, based on the first data set, a first number of categories of first machine learning models and a corresponding second number of groups of hyper-parameters from a preset model set and a preset hyper-parameter set.
The integration module 703 is configured to input the first data set into the first machine learning model using the hyper-parameters, and train, verify, and integrate the first machine learning model using the hyper-parameters to obtain the target enterprise evaluation model.
Optionally, the screening module 702 includes:
an analysis unit, configured to analyze the first data set to obtain data set characteristics of the first data set;
and a first screening unit, configured to screen the first machine learning models and the hyper-parameters from the preset model set and the preset hyper-parameter set based on the data set characteristics.
Optionally, the screening module 702 further includes:
a second screening unit, configured to screen the data set characteristics to obtain screened data set characteristics;
and a conversion unit, configured to convert the screened data set characteristics into numerical features or category features.
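As an illustration of what the analysis, second screening and conversion units might do, the sketch below computes a few data set characteristics, keeps a chosen subset, and encodes each one as a number or a category code. The specific characteristics (`n_rows`, `missing_ratio`, `minority_ratio`, `task`) and the function names are assumptions; the source does not enumerate them.

```python
def profile(rows, labels):
    """Compute a few example characteristics of a data set (None marks a missing value)."""
    n_rows, n_cols = len(rows), len(rows[0])
    missing = sum(v is None for r in rows for v in r)
    smallest_class = min(labels.count(c) for c in set(labels))
    return {
        "n_rows": n_rows,
        "n_features": n_cols,
        "missing_ratio": missing / (n_rows * n_cols),
        "minority_ratio": smallest_class / n_rows,
        "task": "binary" if len(set(labels)) == 2 else "multiclass",
    }

def screen_and_convert(characteristics, keep, categories):
    """Keep the selected characteristics and encode each as a number or a category code."""
    converted = {}
    for name in keep:
        value = characteristics[name]
        if isinstance(value, str):
            converted[name] = categories[name].index(value)  # category feature
        else:
            converted[name] = float(value)                   # numerical feature
    return converted

# Made-up toy rows with two missing values.
rows = [[1.0, None], [2.0, 3.0], [4.0, 5.0], [None, 1.0]]
labels = [0, 0, 0, 1]
features = screen_and_convert(
    profile(rows, labels),
    keep=["n_rows", "missing_ratio", "minority_ratio", "task"],
    categories={"task": ["binary", "multiclass"]},
)
```

The resulting flat dictionary of numbers is the kind of input a rule or lookup over the preset model set could consume when screening models.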
Optionally, the apparatus for automatically generating an enterprise evaluation model based on government affair data further includes:
a model set and hyper-parameter set determining module for determining a model set and a hyper-parameter set based on a result of the second data set running on the determined second machine learning model; wherein the second data set is determined based on the various types of government affairs data sets, and the second machine learning model is determined among the various machine learning models based on the second data set.
Optionally, the model set and hyper-parameter set determining module includes:
an input unit, configured to input the second data set into the second machine learning model to obtain an operation result;
and an output unit, configured to take the set of second machine learning models whose operation results meet a preset condition as the model set, and to take the set of hyper-parameters of the second machine learning models meeting the preset condition as the hyper-parameter set.
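The construction of the model set and hyper-parameter set can be sketched as a simple filter over run results. The preset condition is assumed here to be a minimum score threshold (`min_score`), which the source does not specify; the model names, hyper-parameters and scores are illustrative placeholders, not measured results.

```python
def build_sets(run_results, min_score=0.8):
    """Keep the models and hyper-parameters whose run result meets the preset condition."""
    model_set = []
    hyper_set = {}
    for model, params, score in run_results:
        if score >= min_score:                     # assumed preset condition
            if model not in model_set:
                model_set.append(model)
            hyper_set.setdefault(model, []).append(params)
    return model_set, hyper_set

# (second machine learning model, hyper-parameters, run result on the second data set)
run_results = [
    ("logistic_regression", {"C": 1.0}, 0.83),
    ("logistic_regression", {"C": 0.1}, 0.78),
    ("random_forest", {"n_estimators": 200}, 0.88),
    ("svm", {"kernel": "rbf"}, 0.71),
]
model_set, hyper_set = build_sets(run_results)
```

Keeping the hyper-parameters keyed by model preserves the correspondence between each retained model and the hyper-parameter groups that qualified it.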
Optionally, the integration module 703 includes:
a first processing unit, configured to input the first data set into the first machine learning models adopting the hyper-parameters, and to train and verify the first machine learning models adopting the hyper-parameters to obtain a first verification result;
a second processing unit, configured to input the first verification result into the corresponding first machine learning models adopting the hyper-parameters, and to train and verify the first machine learning models adopting the hyper-parameters to obtain a second verification result;
a calculation unit, configured to perform weighted calculation on the second verification result based on the category of the first machine learning models to obtain a first number of optimized machine learning models;
and an integration unit, configured to integrate the optimized machine learning models to obtain the target enterprise evaluation model.
Fig. 8 is a schematic structural diagram of the automatic generation system of an enterprise evaluation model based on government affair data according to the present invention. As shown in fig. 8, the system for automatically generating an enterprise evaluation model based on government affairs data includes:
a sample sampler, configured to copy and sample a given government affair data set to obtain a first data set; wherein the government affair data set comprises at least one of credit data, industrial and commercial data, tax data, intellectual property data and judicial data of the enterprise;
a data set portrayal device, configured to analyze the first data set to obtain data set characteristics of the first data set;
a model selector, configured to screen, based on the data set characteristics, a first number of categories of first machine learning models and a corresponding second number of groups of hyper-parameters from a preset model set and a preset hyper-parameter set;
and a model integrator, configured to input the first data set into the first machine learning models adopting the hyper-parameters, and to train, verify and integrate the first machine learning models adopting the hyper-parameters to obtain the target enterprise evaluation model.
The sample sampler supports K-fold sampling or hold-out sampling. The data set portrayal device analyzes the first data set to obtain its characteristics, which serve as the basis for model selection. The model selector first performs a preliminary screening of the candidate models in the model set, based on the portrayal result of the sample data set combined with prior experience of matching data sets to models, to obtain an alternative model set; it then samples the sample data set, performs trial runs of the alternative models in the alternative model set, and ranks the alternative models according to the model evaluation indexes. The model integrator combines the models selected by the model selector into an integrated model, trains and evaluates the integrated model, and outputs it. The learning knowledge base stores the performance and the optimized hyper-parameters of the various models on the various government affair data sets.
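Two of the component behaviors described above, hold-out sampling and ranking trial-run models by an evaluation index, can be sketched as follows. The function names, the 25% hold-out ratio and the example scores are assumptions for illustration only.

```python
import random

def hold_out(rows, test_ratio=0.25, seed=0):
    """Shuffle the rows and split them once into a training part and a held-out part."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def rank_models(trial_scores, top_n):
    """Order alternative models by their evaluation index and keep the best top_n."""
    ranked = sorted(trial_scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]

train, test = hold_out(list(range(8)))
# Placeholder trial-run scores for three hypothetical alternative models.
selected = rank_models({"model_a": 0.81, "model_b": 0.90, "model_c": 0.76}, top_n=2)
```

Hold-out sampling uses a single split and is cheaper for trial runs, whereas K-fold sampling reuses every row for validation at the cost of k training runs.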
FIG. 9 illustrates a schematic physical structure diagram of an electronic device. As shown in FIG. 9, the electronic device may include: a processor 901, a communication interface 902, a memory 903 and a communication bus 904, wherein the processor 901, the communication interface 902 and the memory 903 communicate with each other through the communication bus 904. The processor 901 may invoke logic instructions in the memory 903 to perform the following method for automatically generating an enterprise evaluation model based on government affair data:
copying and sampling a given government affair data set to obtain a first data set; wherein the government data set comprises at least one of credit data, business data, tax data, intellectual property data and judicial data of an enterprise;
based on the first data set, screening a first machine learning model of a first number of categories and a corresponding hyper-parameter of a second number of groups in a preset model set and a preset hyper-parameter set;
inputting the first data set into the first machine learning model adopting the hyper-parameters, and training, verifying and integrating the first machine learning model adopting the hyper-parameters to obtain a target enterprise evaluation model.
In addition, the logic instructions in the memory 903 may be implemented in the form of software functional units and, when sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is capable of executing the method for automatically generating an enterprise evaluation model based on government affairs data, provided by the foregoing embodiments:
copying and sampling a given government affair data set to obtain a first data set; wherein the government data set comprises at least one of credit data, business data, tax data, intellectual property data and judicial data of an enterprise;
based on the first data set, screening a first machine learning model of a first number of categories and a corresponding hyper-parameter of a second number of groups in a preset model set and a preset hyper-parameter set;
inputting the first data set into the first machine learning model adopting the hyper-parameters, and training, verifying and integrating the first machine learning model adopting the hyper-parameters to obtain a target enterprise evaluation model.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the method for automatically generating an enterprise evaluation model based on government affair data provided in the foregoing embodiments:
copying and sampling a given government affair data set to obtain a first data set; wherein the government data set comprises at least one of credit data, business data, tax data, intellectual property data and judicial data of an enterprise;
based on the first data set, screening a first machine learning model of a first number of categories and a corresponding hyper-parameter of a second number of groups in a preset model set and a preset hyper-parameter set;
inputting the first data set into the first machine learning model adopting the hyper-parameters, and training, verifying and integrating the first machine learning model adopting the hyper-parameters to obtain a target enterprise evaluation model.
The apparatus embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. An enterprise evaluation model automatic generation method based on government affair data is characterized by comprising the following steps:
copying and sampling a given government affair data set to obtain a first data set; wherein the government data set comprises at least one of credit data, business data, tax data, intellectual property data and judicial data of an enterprise;
based on the first data set, screening a first machine learning model of a first number of categories and a corresponding hyper-parameter of a second number of groups in a preset model set and a preset hyper-parameter set;
inputting the first data set into the first machine learning model adopting the hyper-parameters, and training, verifying and integrating the first machine learning model adopting the hyper-parameters to obtain a target enterprise evaluation model;
wherein, before the screening of the first machine learning model of the first number of categories and the corresponding hyper-parameter of the second number of groups in the preset model set and the preset hyper-parameter set based on the first data set, the method further comprises:
determining the model set and the hyper-parameter set based on a result of a second data set running on a determined second machine learning model; wherein the second data set is determined based on various types of government data sets, and the second machine learning model is determined among various machine learning models based on the second data set;
determining the model set and the hyper-parameter set based on a result of the second data set running on the determined second machine learning model, including:
inputting the second data set into the second machine learning model to obtain the operation result;
taking a set of second machine learning models of which the operation results meet preset conditions as the model set, and taking a set of hyper-parameters of the second machine learning models meeting the preset conditions as the hyper-parameter set;
wherein inputting the first data set into the first machine learning model adopting the hyper-parameters, and training, verifying and integrating the first machine learning model adopting the hyper-parameters to obtain the target enterprise evaluation model comprises:
inputting the first data set into the first machine learning model adopting the hyper-parameters, and training and verifying the first machine learning model adopting the hyper-parameters to obtain a first verification result;
inputting the first verification result into the corresponding first machine learning model adopting the hyper-parameters, and training and verifying the first machine learning model adopting the hyper-parameters to obtain a second verification result;
performing weighted calculation on the second verification result based on the category of the first machine learning model to obtain the first number of optimized machine learning models;
and integrating the optimized machine learning model to obtain the target enterprise evaluation model.
2. The method according to claim 1, wherein said screening a first number of categories of first machine learning models and a corresponding second number of sets of hyper-parameters in a preset model set and a preset hyper-parameter set based on said first data set comprises:
analyzing the first data set to obtain data set characteristics of the first data set;
and based on the data set characteristics, screening the first machine learning model and the hyper-parameters in the preset model set and the preset hyper-parameter set.
3. The method according to claim 2, wherein after analyzing the first data set to obtain the data set characteristics of the first data set, the method further comprises:
screening the data set characteristics to obtain screened data set characteristics;
and converting the screened data set characteristics into numerical values or category characteristics.
4. An enterprise evaluation model automatic generation device based on government affair data is characterized by comprising:
a data set acquisition module, configured to copy and sample a given government affair data set to obtain a first data set; wherein the government data set comprises at least one of credit data, business data, tax data, intellectual property data and judicial data of an enterprise;
the screening module is used for screening the first machine learning models of the first number of categories and the corresponding hyper-parameters of the second number of groups in a preset model set and a preset hyper-parameter set based on the first data set;
the integration module is used for inputting the first data set into the first machine learning model adopting the hyper-parameters, and training, verifying and integrating the first machine learning model adopting the hyper-parameters to obtain a target enterprise evaluation model;
a model set and hyper-parameter set determination module for determining the model set and the hyper-parameter set based on a result of a second data set running on the determined second machine learning model; wherein the second data set is determined based on various types of government data sets, and the second machine learning model is determined among various machine learning models based on the second data set;
the model set and hyper-parameter set determination module comprises:
the input unit is used for inputting the second data set into the second machine learning model to obtain the operation result;
the output unit is used for taking a set of second machine learning models of which the operation results meet preset conditions as the model set, and taking a set of hyper-parameters of the second machine learning models meeting the preset conditions as the hyper-parameter set;
the integration module comprises:
the first processing unit is used for inputting the first data set into the first machine learning model adopting the hyper-parameters, and training and verifying the first machine learning model adopting the hyper-parameters to obtain a first verification result;
the second processing unit is used for inputting the first verification result into the corresponding first machine learning model adopting the hyper-parameters, and training and verifying the first machine learning model adopting the hyper-parameters to obtain a second verification result;
a calculating unit, configured to perform weighted calculation on the second verification result based on the category of the first machine learning model to obtain the first number of optimized machine learning models;
and the integration unit is used for integrating the optimized machine learning model to obtain the target enterprise evaluation model.
5. An enterprise evaluation model automatic generation system based on government affair data is characterized by comprising:
the sample sampler is used for copying and sampling a given government affair data set to obtain a first data set; wherein the government data set comprises at least one of credit data, business data, tax data, intellectual property data and judicial data of an enterprise;
a data set portrayal device for analyzing the first data set to obtain data set characteristics of the first data set;
the model selector is used for screening the first machine learning models of the first number of categories and the corresponding hyper-parameters of the second number of groups in a preset model set and a preset hyper-parameter set based on the characteristics of the data set;
the model integrator is used for inputting the first data set into the first machine learning model adopting the hyper-parameters, and training, verifying and integrating the first machine learning model adopting the hyper-parameters to obtain a target enterprise evaluation model;
a model set and hyper-parameter set determination module for determining the model set and the hyper-parameter set based on a result of a second data set running on the determined second machine learning model; wherein the second data set is determined based on various types of government data sets, and the second machine learning model is determined among various machine learning models based on the second data set;
the model set and hyper-parameter set determination module comprises:
the input unit is used for inputting the second data set into the second machine learning model to obtain the operation result;
the output unit is used for taking a set of second machine learning models of which the operation results meet preset conditions as the model set, and taking a set of hyper-parameters of the second machine learning models meeting the preset conditions as the hyper-parameter set;
the model integrator comprises:
the first processing unit is used for inputting the first data set into the first machine learning model adopting the hyper-parameters, and training and verifying the first machine learning model adopting the hyper-parameters to obtain a first verification result;
the second processing unit is used for inputting the first verification result into the corresponding first machine learning model adopting the hyper-parameters, and training and verifying the first machine learning model adopting the hyper-parameters to obtain a second verification result;
a calculating unit, configured to perform weighted calculation on the second verification result based on the category of the first machine learning model to obtain the first number of optimized machine learning models;
and the integration unit is used for integrating the optimized machine learning model to obtain the target enterprise evaluation model.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for automatically generating an enterprise assessment model based on government data according to any one of claims 1 to 3 when executing the program.
7. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method for automatically generating a government data based enterprise assessment model according to any one of claims 1-3.
CN202210362892.4A 2022-04-08 2022-04-08 Enterprise evaluation model automatic generation method, device and system based on government affair data Active CN114493379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210362892.4A CN114493379B (en) 2022-04-08 2022-04-08 Enterprise evaluation model automatic generation method, device and system based on government affair data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210362892.4A CN114493379B (en) 2022-04-08 2022-04-08 Enterprise evaluation model automatic generation method, device and system based on government affair data

Publications (2)

Publication Number Publication Date
CN114493379A CN114493379A (en) 2022-05-13
CN114493379B true CN114493379B (en) 2022-09-20

Family

ID=81487538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210362892.4A Active CN114493379B (en) 2022-04-08 2022-04-08 Enterprise evaluation model automatic generation method, device and system based on government affair data

Country Status (1)

Country Link
CN (1) CN114493379B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472880A (en) * 2019-08-20 2019-11-19 李峰 Evaluate the method, apparatus and storage medium of collaborative problem resolution ability
CN111724083A (en) * 2020-07-21 2020-09-29 腾讯科技(深圳)有限公司 Training method and device for financial risk recognition model, computer equipment and medium
CN113822542A (en) * 2021-08-30 2021-12-21 天元大数据信用管理有限公司 Enterprise credit investigation platform construction method based on government affair big data

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027579A (en) * 2018-10-10 2020-04-17 百度在线网络技术(北京)有限公司 Method, device, equipment and medium for determining hyper-parameters
US11392859B2 (en) * 2019-01-11 2022-07-19 Microsoft Technology Licensing, Llc Large-scale automated hyperparameter tuning
CN110390434A (en) * 2019-07-22 2019-10-29 新奥数能科技有限公司 The method and device of Electric Price Forecasting
CN112700319A (en) * 2020-12-16 2021-04-23 中国建设银行股份有限公司 Enterprise credit line determination method and device based on government affair data
CN113240272B (en) * 2021-05-12 2024-04-12 平安科技(深圳)有限公司 Enterprise ESG index determination method and related products
CN113590807B (en) * 2021-08-05 2023-07-25 苏州工业园区企业发展服务中心 Scientific and technological enterprise credit evaluation method based on big data mining
CN114139720A (en) * 2021-11-16 2022-03-04 广西中科曙光云计算有限公司 Government affair big data processing method and device based on machine learning

Also Published As

Publication number Publication date
CN114493379A (en) 2022-05-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant