CN108764915B - Model training method, data type identification method and computer equipment - Google Patents


Info

Publication number
CN108764915B
Authority
CN
China
Prior art keywords
classification model
variables
variable
migration
historical data
Prior art date
Legal status
Active
Application number
CN201810386283.6A
Other languages
Chinese (zh)
Other versions
CN108764915A (en)
Inventor
曾利彬
Current Assignee
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN201810386283.6A
Publication of CN108764915A
Application granted
Publication of CN108764915B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 20/00 Payment architectures, schemes or protocols
    • G06Q 20/38 Payment protocols; Details thereof
    • G06Q 20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q 20/401 Transaction verification
    • G06Q 20/4016 Transaction verification involving fraud or risk level assessment in transaction processing

Abstract

The embodiments of this specification provide a model training method, a data type identification method, and a computer device. The model training method comprises: determining migration variables and disparate variables, where the migration variables represent feature information common to the historical data of a source region and a target region, and the disparate variables represent feature information unique to the historical data of each of the two regions; training a first classification model, constructed from the migration variables and the disparate variables, on the historical data of the source region; and training a second classification model, constructed from the migration variables and the disparate variables, on the historical data of the target region together with the training result of the first classification model. The second classification model includes a difference constraint term, which constrains the difference in the weights of the migration variables between the first and second classification models.

Description

Model training method, data type identification method and computer equipment
Technical Field
The embodiments of this specification relate to the field of computer technology, and in particular to a model training method, a data type identification method, and a computer device.
Background
In actual business it is often necessary to identify target business data from a large amount of business data, for example, to identify transaction data involving illegal content such as fraud from a large amount of transaction data. To this end, a classification model may be trained and then used to identify the target business data from the large amount of business data.
Limited by factors such as how recently a service came online, some regions have only a small amount of historical data. A classification model trained for such a region on its own historical data alone therefore has weak discriminative power, and is prone to misidentifying the type of business data coming from that region.
Disclosure of Invention
The embodiments of this specification aim to provide a model training method, a data type identification method, and a computer device that improve the discriminative power of the trained classification model.
To this end, an embodiment of this specification provides a model training method, comprising: determining migration variables and disparate variables, where the migration variables represent feature information common to the historical data of a source region and a target region, and the disparate variables represent feature information unique to the historical data of each of the two regions; training a first classification model, constructed from the migration variables and the disparate variables, on the historical data of the source region; and training a second classification model, constructed from the migration variables and the disparate variables, on the historical data of the target region and the training result of the first classification model, where the second classification model includes a difference constraint term that constrains the difference in the weights of the migration variables between the first and second classification models.
To this end, an embodiment of this specification provides a computer device, comprising: a determining unit configured to determine migration variables and disparate variables, where the migration variables represent feature information common to the historical data of a source region and a target region, and the disparate variables represent feature information unique to the historical data of each of the two regions; a first training unit configured to train a first classification model, constructed from the migration variables and the disparate variables, on the historical data of the source region; and a second training unit configured to train a second classification model, constructed from the migration variables and the disparate variables, on the historical data of the target region and the training result of the first classification model, where the second classification model includes a difference constraint term that constrains the difference in the weights of the migration variables between the first and second classification models.
To this end, an embodiment of this specification provides a computer device comprising a memory and a processor, the memory storing computer instructions and the processor executing those instructions to implement the following steps: determining migration variables and disparate variables, where the migration variables represent feature information common to the historical data of a source region and a target region, and the disparate variables represent feature information unique to the historical data of each of the two regions; training a first classification model, constructed from the migration variables and the disparate variables, on the historical data of the source region; and training a second classification model, constructed from the migration variables and the disparate variables, on the historical data of the target region and the training result of the first classification model, where the second classification model includes a difference constraint term that constrains the difference in the weights of the migration variables between the first and second classification models.
To this end, an embodiment of this specification provides a data type identification method, comprising: identifying the type of business data from a target region using a classification model trained for the target region.
To this end, an embodiment of this specification provides a computer device, comprising: an identification unit configured to identify the type of business data from a target region using a classification model trained for the target region.
To this end, an embodiment of this specification provides a computer device comprising a memory and a processor, the memory storing computer instructions and the processor executing those instructions to implement the following step: identifying the type of business data from a target region using a classification model trained for the target region.
As can be seen from the technical solutions provided in the embodiments of this specification, the computer device may determine migration variables and disparate variables; train, on the historical data of a source region, a first classification model constructed from those variables; and train, on the historical data of a target region and the training result of the first classification model, a second classification model constructed from the same variables, where the second classification model includes a difference constraint term that constrains the difference in the weights of the migration variables between the two models. In this way, the second classification model can learn, from the historical data of the source region, the feature information common to the historical data of the two regions, and, from the historical data of the target region, the feature information unique to the target region. The computer device thus trains the classification model for the target region on the historical data of both regions at once, improving the discriminative power of the model trained for the target region.
Drawings
To illustrate the embodiments of this specification or the prior art more clearly, the drawings needed in their description are briefly introduced below. The drawings described below are only some of the embodiments in this specification; those skilled in the art can derive other drawings from them without any creative effort.
FIG. 1 is a flow chart of a model training method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a data type identification method according to an embodiment of the present disclosure;
FIG. 3 is a functional block diagram of a computer device according to an embodiment of the present disclosure;
FIG. 4 is a functional block diagram of a computer device according to an embodiment of the present disclosure;
FIG. 5 is a functional structure diagram of a computer device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of this specification are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this specification; all other embodiments obtained by a person skilled in the art from them without any inventive step shall fall within the scope of protection of this specification.
Please refer to FIG. 1. This embodiment of the specification provides a model training method. The method may be executed by a computer device, including but not limited to a server, an industrial control computer, a personal computer (PC), an all-in-one machine, and the like. The model training method may include the following steps.
Step S10: determine the migration variables and the disparate variables.
In this embodiment, the computer device may select the migration variables and the disparate variables from a plurality of preset variables, based on at least one piece of historical data of the source region and at least one piece of historical data of the target region.
In this embodiment, the size of the source region may be set flexibly according to business needs; it may be, for example, a street, a business district, a city, a country, or a region composed of multiple countries. The same applies to the target region. Each piece of historical data of the source region or the target region can be data of any type, such as transaction data, product review data, or chat data, and may carry feature information in multiple dimensions. The dimensions may differ depending on the type of historical data. For transaction data, for example, the dimensions may include transaction channel, transaction scenario, transaction time, transaction amount, payment account, transaction device identifier, transaction network address, and so on. The transaction channel may include wireless payment, PC payment, agreement payment, and the like; the transaction scenario may include on-the-spot payment, batch deduction, housing loan repayment, credit card repayment, and the like. As a concrete example, the historical data of the source region may include DATA_A1 and DATA_A2, and the historical data of the target region may include DATA_B1 and DATA_B2, as shown in Table 1 below.
TABLE 1
[Table 1 is rendered as an image in the original; it lists the feature values of DATA_A1, DATA_A2, DATA_B1, and DATA_B2 across the dimensions described above.]
Taking DATA_A1 in Table 1 as an example, its feature information for the dimensions transaction channel, transaction scenario, transaction time, transaction amount, payment accounts, transaction device identifier, and transaction network address may be, respectively, wireless payment, on-the-spot payment, 20180430, 1000, Account1 and Account2, ID1, and 222.92.xxx.xxx.
Each piece of historical data of the source region and of the target region may be tagged data, that is, data labeled with a type tag. The type tag may be a first type or a second type: the first type is the type of the target business data to be identified, and the second type covers all other types. For example, if the target business data is transaction data involving illegal content such as fraud, the first type may be a risk type and the second type a normal type. As another example, if the target business data is product review data with negative sentiment, the first type may be the negative-sentiment type and the second type may cover the positive and neutral sentiment types.
The at least one piece of historical data of the source region and the at least one piece of historical data of the target region may jointly be used to train a classification model for the target region. Owing to factors such as how long a service has been online, the source region typically has a large amount of historical data while the target region has little. Training on the target region's historical data alone therefore rarely yields a good classification effect, because the data are too few; training on the source region's historical data alone rarely does either, because the target and source regions differ in business logic and in the population the business covers. Training on the historical data of both regions at once makes full use of the source region's abundant historical data while accounting for those differences, so the trained classification model can achieve a good classification effect.
In this embodiment, the preset plurality of variables may be attributed to at least one variable group. The variable groups may differ depending on the type of historical data. For transaction data, for example, the variable groups may include a transaction amount group, a transaction count group, a transaction time group, and so on; the variables in these groups may be as shown in Table 2 below.
TABLE 2
[Table 2 is rendered as images in the original; it lists the preset variables belonging to the transaction amount, transaction count, and transaction time variable groups.]
Each of the preset variables may be used to characterize one item of feature information of the historical data of the source region and of the target region. The migration variables characterize feature information that is common to the historical data of the two regions, i.e., feature information with a high degree of similarity between them; the disparate variables characterize feature information that is unique to each region's historical data, i.e., feature information with a low degree of similarity between them.
In this embodiment, the computer device may obtain the historical data of the source region and of the target region over a specified time interval; take the set formed by the obtained historical data of the source region as a first historical data set, and the set formed by the obtained historical data of the target region as a second historical data set; calculate a first characteristic value for each of the preset variables based on the two sets; and, based on the first characteristic values, select at least one variable as a migration variable and at least one variable as a disparate variable. The specified time interval may be of any length, for example 1 month, 1 quarter, 6 months, or 1 year.
The first characteristic value may include a mutual information (MI) value. The mutual information value of a variable represents how similar the feature information it characterizes in the first historical data set is to the feature information it characterizes in the second historical data set; the larger the value, the higher the similarity. For example, the computer device may calculate the mutual information value of a variable as

    MI = Σ_{x∈X} Σ_{y∈Y} p(x, y) · log( p(x, y) / ( p(x) · p(y) ) )

where X denotes the set of values the variable takes in the first historical data set; Y denotes the set of values it takes in the second historical data set; x denotes a value in X and y a value in Y; p(x, y) denotes the joint probability that the variable takes value x in the first set and value y in the second set; p(x) denotes the probability that the variable takes value x in the first set; and p(y) denotes the probability that it takes value y in the second set. This formula is only an example; other formulas or methods may be used to calculate the mutual information value of a variable.
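As an illustrative sketch (not the patent's own code), the mutual information value above can be computed empirically. This sketch treats the i-th value of the variable in the first historical data set and the i-th value in the second set as one paired observation (x, y); the patent does not spell out how the two sets are paired, so this pairing is an assumption.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information between the values a variable takes
    in the first historical data set (xs) and in the second (ys).
    xs[i] and ys[i] are treated as one paired observation (an assumption;
    the pairing scheme is not specified in the text)."""
    n = len(xs)
    p_xy = Counter(zip(xs, ys))   # joint counts
    p_x = Counter(xs)             # marginal counts in the first set
    p_y = Counter(ys)             # marginal counts in the second set
    mi = 0.0
    for (x, y), c in p_xy.items():
        joint = c / n
        mi += joint * math.log(joint / ((p_x[x] / n) * (p_y[y] / n)))
    return mi
```

A variable whose value distribution moves in lockstep across the two sets yields a high MI, while values that are independent across the sets yield an MI of 0.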
The computer device may select, from the preset variables, at least one variable whose mutual information value is greater than or equal to a first preset threshold as a migration variable, and at least one of the remaining variables as a disparate variable; the threshold can be set flexibly according to actual needs. Alternatively, the computer device may select a first preset number of variables with the largest mutual information values as migration variables, and at least one of the remaining variables as a disparate variable; the number can likewise be set flexibly according to actual needs.
Those skilled in the art will appreciate that the first characteristic value may also comprise other information values, based on which the computer device may select migration variables and disparate variables from the preset variables.
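The two selection strategies above (threshold-based and top-k) might be sketched as follows; the function name and the dictionary input mapping variable names to MI scores are illustrative assumptions, not from the patent.

```python
def split_variables(mi_scores, threshold=None, top_k=None):
    """Split preset variables into migration and disparate sets, either by
    an MI threshold (first strategy) or by keeping the top_k variables
    with the largest MI values (second strategy).

    mi_scores: dict mapping variable name -> mutual information value."""
    ranked = sorted(mi_scores, key=mi_scores.get, reverse=True)
    if threshold is not None:
        migration = [v for v in ranked if mi_scores[v] >= threshold]
    else:
        migration = ranked[:top_k]
    disparate = [v for v in ranked if v not in migration]
    return migration, disparate
```

Variables not selected as migration variables become candidates for the disparate set, matching the "remaining variables" wording above.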
In one implementation of this embodiment, the computer device may calculate a second characteristic value for each of the preset variables based on the first and second historical data sets, and select at least one variable as a representative variable based on the second characteristic values.
The second characteristic value may include an information value (IV). The information value of a variable represents the amount of information the variable carries in the first and second historical data sets: the larger it is, the more information the variable carries, and the greater the variable's contribution to identifying the type of business data. For example, the computer device may calculate IV = IV1 + IV2 with

    IV1 = (G1/GT - B1/BT) · ln( (G1/GT) / (B1/BT) )
    IV2 = (G2/GT - B2/BT) · ln( (G2/GT) / (B2/BT) )

where G1 denotes the number of first-type historical data, summed over the values of the variable, in the first historical data set; B1 the corresponding number of second-type historical data in the first set; G2 and B2 the corresponding numbers of first-type and second-type historical data in the second set; GT the total number of first-type historical data across both sets; and BT the total number of second-type historical data across both sets. These formulas are only an example; other formulas or methods may be used to calculate the information value of a variable.
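Under the reading of G1, B1, G2, B2, GT, and BT given above (per-set counts of first-type and second-type data and their cross-set totals), the information value might be computed as below. The original formulas appear only as images, so this standard WOE-style form is an assumption consistent with the surrounding description.

```python
import math

def information_value(g1, b1, g2, b2):
    """IV = IV1 + IV2, where g1/b1 are the first-type/second-type counts
    in the first historical data set, g2/b2 those in the second set, and
    the totals gt/bt are taken across both sets (assumed form)."""
    gt, bt = g1 + g2, b1 + b2
    def part(g, b):
        return (g / gt - b / bt) * math.log((g / gt) / (b / bt))
    return part(g1, b1) + part(g2, b2)
```

If a variable's type composition is identical in both sets the IV is 0; the more the composition diverges between the sets, the larger the IV.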
The computer device may select, from the preset variables, at least one variable whose information value is greater than or equal to a second preset threshold as a representative variable; the threshold can be set flexibly according to actual needs. Alternatively, the computer device may select a second preset number of variables with the largest information values as representative variables; the number can likewise be set flexibly according to actual needs.
Those skilled in the art will appreciate that the second characteristic value may also comprise other information values, based on which the computer device may select representative variables from the preset variables.
Step S12: train, on the historical data of the source region, a first classification model constructed from the migration variables and the disparate variables.
In this embodiment, a classification model is a mathematical model that assigns unclassified business data to known types. It may be, for example, a Bayesian classification model, a support vector machine (SVM) classification model, or a convolutional neural network (CNN) classification model. Functionally, it may be a risk classification model (classifying business data by degree of risk), a sentiment classification model (classifying by sentiment), a topic classification model (classifying by the topic expressed), or the like.
The first classification model may be a classification model constructed from the migration variables and the disparate variables; for example, it may take the form

    J(u1, vs) = L(u1·m + vs·n) + α||u1||²

where J denotes the objective function, which represents how close the predicted values are to the actual values during machine learning; the goal of training is to optimize it, and it may be any suitable function (L here denotes its loss term over the historical data of the source region). m may denote the matrix formed by the disparate variables, e.g. [m1 m2 … mi …], each mi being one disparate variable. u1 may denote the matrix formed by the weights of the disparate variables in the first classification model, e.g. [u11 u12 … u1i …], each u1i being the weight of one disparate variable. n may denote the matrix formed by the migration variables, e.g. [n1 n2 … ni …], each ni being one migration variable. vs may denote the matrix formed by the weights of the migration variables in the first classification model, e.g. [vs1 vs2 … vsi …], each vsi being the weight of one migration variable. α may be an empirical value, for example 0.1, 0.3, or 0.6.
In one implementation of this embodiment, the first classification model may include a first weight constraint term, which constrains the weights of the disparate variables in the first classification model so that, during training, the model learns as much as possible of the feature information common to the historical data of the source and target regions. Continuing the example above, the first weight constraint term may be the data term α||u1||² in the first classification model.
In this embodiment, the computer device may train the first classification model in any manner. Continuing the example above, training the first classification model can be understood as solving the optimization problem

    min over u1, vs of J(u1, vs)

Specifically, the computer device may use an algorithm such as stochastic gradient descent (SGD) to solve for the matrices u1 and vs at which the objective function is optimized.
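As a minimal numeric sketch of training the first model, the code below assumes a linear predictor with a mean squared error loss (the patent leaves the loss unspecified) and plain gradient descent in place of stochastic gradient descent; only the disparate-variable weights u1 carry the L2 penalty α||u1||², per the first weight constraint term.

```python
import numpy as np

def train_first_model(M, N, y, alpha=0.1, lr=0.05, epochs=5000, seed=0):
    """Fit u1 (disparate-variable weights) and vs (migration-variable
    weights) on source-region data by minimizing
    (1/2n)*||M@u1 + N@vs - y||^2 + alpha*||u1||^2  (assumed loss form).
    M: samples x disparate features, N: samples x migration features."""
    rng = np.random.default_rng(seed)
    u1 = rng.normal(scale=0.01, size=M.shape[1])
    vs = rng.normal(scale=0.01, size=N.shape[1])
    n = len(y)
    for _ in range(epochs):
        err = M @ u1 + N @ vs - y                     # prediction residual
        u1 -= lr * (M.T @ err / n + 2 * alpha * u1)   # penalized update
        vs -= lr * (N.T @ err / n)                    # unpenalized update
    return u1, vs
```

Because only u1 is penalized, fitting weight is pushed onto the migration variables wherever they can explain the labels, which is what lets the first model carry shared structure over to the target region.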
Step S14: train, on the historical data of the target region and the training result of the first classification model, a second classification model constructed from the migration variables and the disparate variables.
In this embodiment, the second classification model may likewise be constructed from the migration variables and the disparate variables, but it differs from the first classification model. Specifically, the first classification model may be adjusted, for example by adding data terms, and the adjusted model used as the second classification model. For example, the second classification model may take the form

    J(u2, v) = L(u2·m + v·n) + α||u2||² + β||v - vs||

where J denotes the objective function; m the matrix formed by the disparate variables; u2 the matrix formed by the weights of the disparate variables in the second classification model; n the matrix formed by the migration variables; v the matrix formed by the weights of the migration variables in the second classification model; vs the matrix formed by the weights of the migration variables in the trained first classification model; and α and β empirical values.
In one implementation of this embodiment, the second classification model may include a second weight constraint term, which constrains the weights of the disparate variables in the second classification model. Continuing the example above, the second weight constraint term may be the data term α||u2||² in the second classification model.
In one implementation of this embodiment, the second classification model may include a difference constraint term, which constrains the difference in the weights of the migration variables between the first and second classification models so that, during training, the second classification model learns as much as possible of the feature information unique to the historical data of the target region. Continuing the example above, the difference constraint term may be the data term β||v - vs|| in the second classification model.
In this embodiment, the computer device may train the second classification model in any manner, and the trained second classification model may be used to identify the type of service data from the target region. Continuing with the previous example, the process of training the second classification model by the computer device can be understood as solving the optimization problem
min over u2 and v: J = Loss(m·u2 + n·v) + α‖u2‖² + β‖v − v_s‖
The computer device may specifically solve for the matrices u2 and v using an algorithm such as stochastic gradient descent, so that J is minimized.
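The optimization described above can be sketched numerically. The following is a minimal illustration rather than the patent's actual implementation: it assumes a logistic loss as the classification loss, uses plain full-batch gradient descent instead of stochastic gradient descent, and squares the difference penalty for differentiability; the names train_second_model, m, n, u2, v, v_s, alpha, and beta are hypothetical and merely mirror the symbols above.

```python
import numpy as np

def train_second_model(m, n, y, v_s, alpha=0.1, beta=1.0, lr=0.05, steps=500):
    """Train the second classification model by gradient descent.

    m   : (samples, k) matrix of the mutually different variables
    n   : (samples, p) matrix of the migration variables
    y   : (samples,) binary labels (0/1) from target-region historical data
    v_s : (p,) migration-variable weights learned by the first model
    """
    rng = np.random.default_rng(0)
    u2 = rng.normal(scale=0.01, size=m.shape[1])  # weights of mutually different variables
    v = v_s.copy()                                # start from the first model's weights
    for _ in range(steps):
        z = m @ u2 + n @ v                        # linear score
        p = 1.0 / (1.0 + np.exp(-z))              # sigmoid probability
        g = p - y                                 # gradient of the logistic loss w.r.t. z
        grad_u2 = m.T @ g / len(y) + 2 * alpha * u2       # plus second weight constraint term
        grad_v = n.T @ g / len(y) + 2 * beta * (v - v_s)  # plus difference constraint term
        u2 -= lr * grad_u2
        v -= lr * grad_v
    return u2, v
```

Initializing v at v_s and penalizing the difference from v_s keeps the migration-variable weights close to those learned from the source region, while u2 is free to fit the target region's distinctive features.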
In this embodiment, the computer device may determine migration variables and mutually different variables; may train, based on historical data of a source region, a first classification model constructed based on the migration variables and the mutually different variables; and may train, based on historical data of a target region and the training result of the first classification model, a second classification model constructed based on the migration variables and the mutually different variables, where the second classification model may include a difference constraint term used to constrain the difference in the weights of the migration variables between the first classification model and the second classification model. In this way, the second classification model can learn, from the historical data of the source region, the common feature information of the historical data shared by the source region and the target region, and can learn, from the historical data of the target region, the distinctive feature information of the historical data of the target region. Thus, the computer device can use the historical data of both the source region and the target region to train a classification model for the target region, improving the discriminating capability of the classification model trained for the target region.
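For completeness, the first training stage recapped above can be sketched in the same style. This is again an illustrative assumption, not the patent's implementation: a logistic loss, full-batch gradient descent, and a first weight constraint term on the mutually different variables; the names train_first_model, m_src, n_src, u1, and v_s are hypothetical.

```python
import numpy as np

def train_first_model(m_src, n_src, y_src, alpha=0.1, lr=0.05, steps=500):
    """Train the first classification model on source-region historical data.

    m_src : (samples, k) matrix of the mutually different variables
    n_src : (samples, p) matrix of the migration variables
    y_src : (samples,) binary labels (0/1)
    Returns u1 (mutually-different-variable weights) and v_s
    (migration-variable weights), the latter being reused by the second stage.
    """
    rng = np.random.default_rng(0)
    u1 = rng.normal(scale=0.01, size=m_src.shape[1])
    v_s = rng.normal(scale=0.01, size=n_src.shape[1])
    for _ in range(steps):
        z = m_src @ u1 + n_src @ v_s
        p = 1.0 / (1.0 + np.exp(-z))     # sigmoid probability
        g = p - y_src                    # gradient of the logistic loss w.r.t. z
        u1 -= lr * (m_src.T @ g / len(y_src) + 2 * alpha * u1)  # first weight constraint term
        v_s -= lr * (n_src.T @ g / len(y_src))  # no difference constraint in the first model
    return u1, v_s
```

The returned v_s plays the role of the first classification model's migration-variable weights that the difference constraint term of the second classification model refers to.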
Please refer to fig. 2. The embodiment of the specification provides a data type identification method. The data type identification method may be performed by a computer device. Such computer devices include, but are not limited to, servers, industrial control computers, personal computers (PCs), all-in-one machines, and the like. The data type identification method may include the following steps.
Step S20: identifying a type of service data from a target region using a classification model trained for the target region.
In this embodiment, the classification model may be obtained by training based on a model training method in an embodiment of this specification. The service data may be any type of data, for example, transaction data, product review data, or chat data.
In this embodiment, the computer device may identify the type of service data from the target region using the classification model trained for the target region. Specifically, the computer device may calculate a characteristic value of the service data from the target region using the classification model trained for the target region, and may identify the type of the service data based on the characteristic value. For example, the service data may be transaction data. When the characteristic value of the service data is greater than or equal to a preset threshold, the computer device may identify the type of the service data as a risk type; when the characteristic value of the service data is smaller than the preset threshold, the computer device may identify the type of the service data as a normal type.
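The thresholding step can be illustrated with a small sketch. The scoring function below is an assumption for illustration (a sigmoid over the linear model from the earlier training example); the names classify_transaction, x_diff, x_mig, u2, v, and threshold are hypothetical, and the threshold value of 0.5 is arbitrary.

```python
import numpy as np

def classify_transaction(x_diff, x_mig, u2, v, threshold=0.5):
    """Score one piece of service data and map the score to a type.

    x_diff : feature values of the mutually different variables
    x_mig  : feature values of the migration variables
    u2, v  : weights learned for the second (target-region) model
    """
    # characteristic value: sigmoid of the linear score
    score = 1.0 / (1.0 + np.exp(-(np.dot(x_diff, u2) + np.dot(x_mig, v))))
    # at or above the preset threshold -> risk type; below -> normal type
    return ("risk" if score >= threshold else "normal"), score
```

The same comparison generalizes to any monotone characteristic value; only the preset threshold needs to be calibrated on target-region historical data.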
In this embodiment, the computer device may identify the type of service data from the target region using the classification model trained for the target region. This embodiment can reduce the false recognition rate of the service data.
Please refer to fig. 3. The embodiment of the specification provides a computer device. The computer device may comprise the following elements.
A determination unit 30, configured to determine migration variables and mutually different variables; the migration variables are used for representing common feature information of historical data between a source region and a target region; the mutually different variables are used for respectively representing the distinctive feature information of the historical data of the source region and of the target region;
a first training unit 32, configured to train, based on historical data of the source region, a first classification model constructed based on the migration variables and the mutually different variables;
a second training unit 34, configured to train, based on historical data of the target region and a training result of the first classification model, a second classification model constructed based on the migration variables and the mutually different variables; the second classification model includes a difference constraint term; the difference constraint term is used for constraining the difference in the weights of the migration variables between the first classification model and the second classification model.
Please refer to fig. 4. The embodiment of the specification provides another computer device. The computer device may include a memory and a processor.
In this embodiment, the memory may be implemented in any suitable manner. For example, the memory may be a read-only memory, a mechanical hard disk, a solid-state disk, a USB flash drive, or the like. The memory may be used to store computer instructions.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so on. The processor may execute the computer instructions to perform the following steps: determining migration variables and mutually different variables; the migration variables are used for representing common feature information of historical data between a source region and a target region; the mutually different variables are used for respectively representing the distinctive feature information of the historical data of the source region and of the target region; training, based on historical data of the source region, a first classification model constructed based on the migration variables and the mutually different variables; training, based on historical data of the target region and a training result of the first classification model, a second classification model constructed based on the migration variables and the mutually different variables; the second classification model includes a difference constraint term; the difference constraint term is used for constraining the difference in the weights of the migration variables between the first classification model and the second classification model.
Please refer to fig. 5. The embodiment of the specification provides a computer device. The computer device may comprise the following elements.
An identifying unit 50, configured to identify a type of service data from a target region using a classification model trained for the target region. The classification model may be obtained by training based on a model training method in an embodiment of this specification.
Please refer to fig. 4. The embodiment of the specification provides another computer device. The computer device may include a memory and a processor.
In this embodiment, the memory may be implemented in any suitable manner. For example, the memory may be a read-only memory, a mechanical hard disk, a solid-state disk, a USB flash drive, or the like. The memory may be used to store computer instructions.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so on. The processor may execute the computer instructions to perform the following steps: identifying a type of service data from a target region using a classification model trained for the target region; the classification model may be obtained by training based on a model training method in an embodiment of this specification.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and the same or similar parts in each embodiment may be referred to each other, and each embodiment focuses on differences from other embodiments. In particular, for the embodiment of the computer device, since it is substantially similar to the embodiment of the model training method, the description is simple, and relevant points can be referred to the partial description of the embodiment of the model training method.
In addition, it is understood that one skilled in the art, after reading this specification document, may conceive of any combination of some or all of the embodiments listed in this specification without the need for inventive faculty, which combinations are also within the scope of the disclosure and protection of this specification.
In the 1990s, improvements to a technology could be clearly distinguished as improvements in hardware (for example, improvements in circuit structures such as diodes, transistors, and switches) or improvements in software (improvements in method flows). However, as technology develops, many of today's improvements in method flows can be regarded as direct improvements in hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement in a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by a user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating an integrated circuit chip, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); currently, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained by merely performing slight logic programming on the method flow using the above hardware description languages and programming it into an integrated circuit.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present specification may be essentially or partially implemented in the form of software products, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.
The description is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
While the specification has been described with reference to examples, those skilled in the art will appreciate that the specification is susceptible to numerous variations and modifications without departing from its spirit, and it is intended that the appended claims cover such variations and modifications.

Claims (15)

1. A model training method, comprising:
determining migration variables and mutually different variables; the migration variables are used for representing common feature information of historical data between a source region and a target region; the mutually different variables are used for respectively representing the distinctive feature information of the historical data of the source region and of the target region;
training, based on historical data of the source region, a first classification model constructed based on the migration variables and the mutually different variables;
training, based on historical data of the target region and a training result of the first classification model, a second classification model constructed based on the migration variables and the mutually different variables; the second classification model includes a difference constraint term; the difference constraint term is used for constraining the difference in the weights of the migration variables between the first classification model and the second classification model.
2. The method of claim 1, wherein the determining migration variables and mutually different variables comprises:
selecting the migration variables and the mutually different variables from a plurality of preset variables based on the historical data of the source region and the target region.
3. The method of claim 2, wherein the selecting the migration variables and the mutually different variables from the plurality of preset variables comprises:
calculating a first characteristic value of at least one variable among the plurality of preset variables based on the historical data of the source region and the target region;
and selecting the migration variables and the mutually different variables from the plurality of preset variables based on the first characteristic values of the variables.
4. The method of claim 3, wherein the first characteristic value comprises a mutual information value.
5. The method of claim 2, wherein before selecting the migration variables and the mutually different variables from the plurality of preset variables, the method further comprises:
screening a plurality of representative variables from the plurality of preset variables based on the historical data of the source region and the target region;
correspondingly, the selecting the migration variables and the mutually different variables from the plurality of preset variables comprises:
selecting the migration variables and the mutually different variables from the plurality of representative variables based on the historical data of the source region and the target region.
6. The method of claim 5, wherein the screening the plurality of representative variables from the plurality of preset variables comprises:
calculating a second characteristic value of at least one variable among the plurality of preset variables based on the historical data of the source region and the target region;
and screening the plurality of representative variables from the plurality of preset variables based on the second characteristic values of the variables.
7. The method of claim 6, the second characteristic value comprising an informational value.
8. The method of claim 1, wherein the first classification model comprises a first weight constraint term; the first weight constraint term is used for constraining the weights of the mutually different variables in the first classification model.
9. The method of claim 1, wherein the training result of the first classification model comprises the weights of the migration variables and the mutually different variables in the first classification model; accordingly, the training the second classification model constructed based on the migration variables and the mutually different variables comprises:
training, based on the historical data of the target region, the second classification model constructed based on the migration variables and the mutually different variables, with the weights of the migration variables in the first classification model as the initial weights of the migration variables in the second classification model, and the weights of the mutually different variables in the first classification model as the initial weights of the mutually different variables in the second classification model.
10. The method of claim 1, wherein the second classification model comprises a second weight constraint term; the second weight constraint term is used for constraining the weights of the mutually different variables in the second classification model.
11. A computer device, comprising:
a determining unit, configured to determine migration variables and mutually different variables; the migration variables are used for representing common feature information of historical data between a source region and a target region; the mutually different variables are used for respectively representing the distinctive feature information of the historical data of the source region and of the target region;
a first training unit, configured to train, based on historical data of the source region, a first classification model constructed based on the migration variables and the mutually different variables;
a second training unit, configured to train, based on historical data of the target region and a training result of the first classification model, a second classification model constructed based on the migration variables and the mutually different variables; the second classification model includes a difference constraint term; the difference constraint term is used for constraining the difference in the weights of the migration variables between the first classification model and the second classification model.
12. A computer device comprising a memory and a processor;
the memory to store computer instructions;
the processor, configured to execute the computer instructions to implement the following steps: determining migration variables and mutually different variables; the migration variables are used for representing common feature information of historical data between a source region and a target region; the mutually different variables are used for respectively representing the distinctive feature information of the historical data of the source region and of the target region; training, based on historical data of the source region, a first classification model constructed based on the migration variables and the mutually different variables; training, based on historical data of the target region and a training result of the first classification model, a second classification model constructed based on the migration variables and the mutually different variables; the second classification model includes a difference constraint term; the difference constraint term is used for constraining the difference in the weights of the migration variables between the first classification model and the second classification model.
13. A data type identification method, comprising:
identifying a type of service data from a target region using a classification model trained for the target region; the classification model is obtained by training based on the method of any one of claims 1 to 10.
14. A computer device, comprising:
an identification unit, configured to identify a type of service data from a target region using a classification model trained for the target region; the classification model is obtained by training based on the method of any one of claims 1 to 10.
15. A computer device comprising a memory and a processor;
the memory to store computer instructions;
the processor, configured to execute the computer instructions to implement the following steps: identifying a type of service data from a target region using a classification model trained for the target region; the classification model is obtained by training based on the method of any one of claims 1 to 10.
CN201810386283.6A 2018-04-26 2018-04-26 Model training method, data type identification method and computer equipment Active CN108764915B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810386283.6A CN108764915B (en) 2018-04-26 2018-04-26 Model training method, data type identification method and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810386283.6A CN108764915B (en) 2018-04-26 2018-04-26 Model training method, data type identification method and computer equipment

Publications (2)

Publication Number Publication Date
CN108764915A CN108764915A (en) 2018-11-06
CN108764915B true CN108764915B (en) 2021-07-30

Family

ID=64011898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810386283.6A Active CN108764915B (en) 2018-04-26 2018-04-26 Model training method, data type identification method and computer equipment

Country Status (1)

Country Link
CN (1) CN108764915B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598414B (en) * 2018-11-13 2023-04-21 创新先进技术有限公司 Risk assessment model training, risk assessment method and device and electronic equipment
CN109947816A (en) * 2018-12-17 2019-06-28 阿里巴巴集团控股有限公司 Model parameter calculation method, data type recognition methods, device and server
CN113055208B (en) * 2019-12-27 2023-01-13 中移信息技术有限公司 Method, device and equipment for identifying information identification model based on transfer learning
CN111523995B (en) * 2020-04-20 2023-03-17 支付宝(杭州)信息技术有限公司 Method, device and equipment for determining characteristic value of model migration
CN113917364B (en) * 2021-10-09 2024-03-08 广东电网有限责任公司东莞供电局 High-resistance grounding identification method and device for power distribution network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105628383A (en) * 2016-02-01 2016-06-01 东南大学 Bearing fault diagnosis method and system based on improved LSSVM transfer learning
CN107491792A (en) * 2017-08-29 2017-12-19 东北大学 Feature based maps the electric network fault sorting technique of transfer learning
CN107832711A (en) * 2017-11-13 2018-03-23 常州大学 A kind of recognition methods again of the pedestrian based on transfer learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8719192B2 (en) * 2011-04-06 2014-05-06 Microsoft Corporation Transfer of learning for query classification

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105628383A (en) * 2016-02-01 2016-06-01 东南大学 Bearing fault diagnosis method and system based on improved LSSVM transfer learning
CN107491792A (en) * 2017-08-29 2017-12-19 东北大学 Feature based maps the electric network fault sorting technique of transfer learning
CN107832711A (en) * 2017-11-13 2018-03-23 常州大学 A kind of recognition methods again of the pedestrian based on transfer learning

Also Published As

Publication number Publication date
CN108764915A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108764915B (en) Model training method, data type identification method and computer equipment
CN108629687B (en) Anti-money laundering method, device and equipment
CN108305158B (en) Method, device and equipment for training wind control model and wind control
CN110020938B (en) Transaction information processing method, device, equipment and storage medium
CN107391545B (en) Method for classifying users, input method and device
CN114202370A (en) Information recommendation method and device
CN107807958B (en) Personalized article list recommendation method, electronic equipment and storage medium
CN108596410B (en) Automatic wind control event processing method and device
CN110633989B (en) Determination method and device for risk behavior generation model
CN108346107B (en) Social content risk identification method, device and equipment
CN110263157A (en) A kind of data Risk Forecast Method, device and equipment
CN110674188A (en) Feature extraction method, device and equipment
CN113643119A (en) Model training method, business wind control method and business wind control device
Aralikatte et al. Fault in your stars: an analysis of android app reviews
CN113221717A (en) Model construction method, device and equipment based on privacy protection
CN111582565A (en) Data fusion method and device and electronic equipment
CN116308738B (en) Model training method, business wind control method and device
CN108763209B (en) Method, device and equipment for feature extraction and risk identification
CN111538925A (en) Method and device for extracting Uniform Resource Locator (URL) fingerprint features
CN113010562B (en) Information recommendation method and device
CN110738562B (en) Method, device and equipment for generating risk reminding information
CN112967044B (en) Payment service processing method and device
CN114511376A (en) Credit data processing method and device based on multiple models
CN109389157B (en) User group identification method and device and object group identification method and device
US11379929B2 (en) Advice engine

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201022

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201022

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: Greater Cayman, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant