CN111309706A - Model training method and device, readable storage medium and electronic equipment - Google Patents

Model training method and device, readable storage medium and electronic equipment

Info

Publication number
CN111309706A
CN111309706A
Authority
CN
China
Prior art keywords
data
model
sample set
training
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010066345.2A
Other languages
Chinese (zh)
Inventor
徐浩然
陈秀坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202010066345.2A priority Critical patent/CN111309706A/en
Publication of CN111309706A publication Critical patent/CN111309706A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/211Schema design and management
    • G06F16/212Schema design and management with details for data modelling support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases

Abstract

The application discloses a model training method and device, a readable storage medium and an electronic device. The method comprises the following steps: obtaining a data set comprising data records of a plurality of persons, the data records comprising data of at least one activity of known persons; detecting whether an attribute of the data records in the data set changes, wherein the attribute comprises the type of the data and the quantity of each type of data in the data records per unit time; if the attribute of the data records in the data set changes, acquiring a training sample set, a verification sample set and a test sample set corresponding to the changed attribute from the data set; and inputting the training sample set, the verification sample set and the test sample set into a pre-configured model in sequence for model training to obtain a target model. The scheme provided by the application can automatically update the model and ensures consistency between the model and the environment.

Description

Model training method and device, readable storage medium and electronic equipment
Technical Field
The application relates to the technical field of model training, in particular to a model training method and device, a readable storage medium and electronic equipment.
Background
Big data contains a large amount of information and has great value. By analyzing big data, uninteresting information is filtered out so that the information of interest can be obtained.
In big data processing, a model is usually trained first so that the relevant data processing can be performed by the model. At present, when training a model, a professional typically constructs a network structure and then carries out the training. However, as technology develops, the environment in which data is generated changes constantly; the value, quantity and quality of different types of data may influence the results of data analysis differently at different times, and the accuracy of the model changes as the environment changes, which places higher requirements on model updating.
Disclosure of Invention
In order to overcome at least the above-mentioned deficiencies in the prior art, it is an object of the present application to provide a model training method, the method comprising:
obtaining a data set comprising data records of a plurality of persons, the data records comprising data of at least one activity of known persons;
detecting whether the attribute of the data record in the data set changes or not, wherein the attribute comprises the type of the data and the number of each type of data corresponding to the data record in unit time;
if the attribute of the data record in the data set changes, acquiring a training sample set, a verification sample set and a test sample set corresponding to the changed attribute from the data set;
and inputting the training sample set, the verification sample set and the test sample set into a pre-configured model in sequence for model training to obtain a target model.
Optionally, the method further comprises: obtaining a pre-configured model training rule, wherein the model training rule comprises an algorithm adopted by model training;
the step of inputting the training sample set, the verification sample set and the test sample set into a pre-configuration model in sequence for model training to obtain a target model comprises the following steps:
and inputting the training sample set, the verification sample set and the test sample set into a pre-configured model in sequence, and carrying out model training according to a pre-configured model training rule to obtain the target model.
Optionally, the method further comprises:
obtaining a model training method determined by a user, wherein the model training method comprises at least one of linear regression, gradient descent, polynomial regression, learning curve, linear model regularization and logistic regression.
Optionally, the step of sequentially inputting the training sample set, the verification sample set, and the test sample set into a preconfigured model, and performing model training according to a preconfigured model training rule to obtain the target model includes:
inputting each training sample set, each verification sample set and each test sample set into the pre-configuration model in sequence, and training by adopting the algorithms in the model training rules respectively to obtain the sub-models corresponding to each algorithm;
and obtaining a target model according to the sub-model corresponding to each algorithm.
Optionally, the training sample set, the verification sample set, and the test sample set are sequentially input into a preconfigured model for model training, and before obtaining the target model, the method further includes:
if the data type of the data changes, acquiring the proportion of each type of data;
obtaining the weight coefficient of each type of data according to the proportion of each type of data;
and adjusting the corresponding network parameters in the pre-configured model according to the weight coefficient to obtain a new pre-configured model.
Optionally, the step of sequentially inputting the training sample set, the verification sample set, and the test sample set into a preconfigured model for model training to obtain a target model includes:
detecting the state of hardware resources and/or acquiring the current time;
judging whether the state of the hardware resource meets a preset hardware condition and/or whether the current time reaches a preset starting time;
and if the state of the hardware resource meets a preset hardware condition and/or the current time reaches a preset starting time, sequentially inputting the training sample set, the verification sample set and the test sample set into a pre-configuration model for model training to obtain a target model.
Optionally, the step of detecting whether the attribute of the data record in the data set changes includes:
acquiring the attribute of a data record generated in the current time period as a first attribute;
acquiring the attribute of the generated data record in a preset time period before the current time period as a second attribute;
judging whether the first attribute is consistent with the second attribute;
and if the first attribute is inconsistent with the second attribute, determining that the attribute of the data record in the data set is changed.
It is another object of the present application to provide a model training apparatus, the apparatus comprising:
a first acquisition module for acquiring a data set comprising data records of a plurality of persons, the data records comprising at least one item of data describing a known person's behavior;
the detection module is used for detecting whether the attribute of the data record in the data set changes or not, wherein the attribute comprises the type of the data and the number of each type of data corresponding to the data record in unit time;
the second acquisition module is used for acquiring a training sample set, a verification sample set and a test sample set corresponding to changed attributes from a data set when the attributes of data records in the data set change;
and the training module is used for inputting the training sample set, the verification sample set and the test sample set into a pre-configuration model in sequence for model training to obtain a target model.
It is also an object of the present application to provide a readable storage medium storing an executable program which, when executed by a processor, implements a method according to any of the present applications.
Another object of the present application is to provide an electronic device, which includes a memory and a processor, the memory is electrically connected to the processor, the memory stores an executable program, and the processor, when executing the executable program, implements the method according to any of the present application.
Compared with the prior art, the method has the following beneficial effects:
according to the model training method, the device, the readable storage medium and the electronic equipment, the change condition of the data in the environment is detected, so that the training sample set, the verification sample set and the test sample set are obtained for the data according to the change condition of the data, and the model training is performed according to the obtained training sample set, the verification sample set and the test sample set to obtain the target model. According to the method and the device, different training sample sets, verification sample sets and test sample sets can be selected according to the change of the environment to carry out model training, so that the trained target model is consistent with the environment, and the accuracy of model identification can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a block diagram schematically illustrating a structure of an electronic device according to an embodiment of the present disclosure;
FIG. 2 is a first flowchart illustrating a model training method according to an embodiment of the present disclosure;
FIG. 3 is a second flowchart illustrating a model training method according to an embodiment of the present disclosure;
FIG. 4 is a third schematic flowchart of a model training method provided in the embodiment of the present application;
fig. 5 is a block diagram schematically illustrating a structure of a model training apparatus according to an embodiment of the present application.
Reference numerals: 100-an electronic device; 110-a model training device; 111-a first acquisition module; 112-a detection module; 113-a second acquisition module; 114-a training module; 120-a memory; 130-a processor; 140-a communication unit.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present application, it is further noted that, unless expressly stated or limited otherwise, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.
Massive data, such as public security big data, contain great value. This value is often reflected in the relations among the data, and the required results can be obtained by statistically analyzing data of different dimensions (types). When analyzing massive data, the data can be processed by a model so that an intuitive result is output. If the model is required to output accurate results, a lengthy training process is necessary.
In practice, the training of a model is usually done manually, and because both the data and the application scenarios of the model change constantly, obtaining an accurate model again after the data or the application scenario changes requires considerable manpower and material resources.
In order to solve the above problem, the present embodiment provides a model training scheme, please refer to fig. 1, fig. 1 is a schematic block diagram of a structure of an electronic device 100 provided in the embodiments of the present application, where the electronic device 100 includes a model training apparatus 110, a memory 120 and a processor 130, and the memory 120 and the processor 130 are electrically connected to each other directly or indirectly for implementing data interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The model training apparatus 110 includes at least one software function module which can be stored in the memory 120 in the form of software or Firmware (Firmware) or solidified in an Operating System (OS) of the electronic device 100. The processor 130 is used for executing executable modules stored in the memory 120, such as software functional modules and computer programs included in the model training device 110.
In this embodiment, the electronic device 100 may further include a communication unit 140, where the communication unit 140 is communicatively connected to the processor 130 and the memory 120.
Referring to fig. 2, an embodiment of the present application further provides a model training method applicable to the electronic device 100, where the method includes steps S110 to S140.
Step S110, a data set comprising data records of a plurality of persons is obtained, the data records comprising data of at least one behavior of known persons.
Because people exhibit different behaviors in daily life, and these behaviors indirectly reflect their living habits and other information, data records of a large number of people can be collected to form a data set, and the data records in the data set can serve as the basis of model training.
The data in this embodiment may be determined according to the specific function of the model. For example, when a person's eating habits need to be analyzed, the data may be the type and flavor of the dishes the person orders for delivery; when it is necessary to analyze whether a person is involved in a particular violation, the analysis can be based on the chat tools the person uses, such as WeChat, QQ, etc.
Step S120, detecting whether an attribute of the data record in the data set changes, where the attribute includes a type of the data and a quantity of each type of data corresponding to a unit time.
For a specific person, the type of behavior, the frequency of different behaviors, etc. may be different in different periods, and these changes of behavior can reflect changes of the person's living habits, preferences, etc., wherein the type of behavior corresponds to the type of data, and the change of the type of behavior means that the type of corresponding data of the person changes.
In this embodiment, when it is detected that the attribute of the data records has not changed, the detection process may be executed repeatedly in a loop. In this embodiment, a change in the attribute of the data records means that the environment in which the data is generated has changed.
Step S130, if the attribute of the data record in the data set changes, a training sample set, a verification sample set and a test sample set corresponding to the changed attribute are obtained from the data set.
When the attribute of the data records in the data set changes, the influence of different data in the data records on the output of the model may also change. Therefore, in this embodiment, the training sample set, the verification sample set and the test sample set need to be obtained again, and the newly obtained sample sets should be consistent with the changed attribute; for example, each sample (training sample, verification sample or test sample) should contain data whose types correspond to the changed attribute. In this embodiment, the training sample set includes a plurality of training samples, the test sample set includes a plurality of test samples, and the verification sample set includes a plurality of verification samples. The training samples, test samples and verification samples have the same data composition, that is, they contain the same types of data. Of course, in this embodiment, a training sample may also include the data record of one person together with a corresponding label, a verification sample may include the data record of one person, and a test sample may include the data record of one person.
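Purely as an illustration of step S130 (not part of the original disclosure), the following Python sketch shows one possible way to build the three sample sets so that every sample only draws on records whose data types match the changed attribute; the record format (a dict mapping data types to data) and all function names are assumptions made for this example.

```python
import random

def build_sample_sets(data_set, current_types, ratios=(0.7, 0.15, 0.15), seed=0):
    """Select records consistent with the changed attribute and split them.

    data_set      : list of dicts, each mapping a data type (str) to its data
    current_types : set of data types present after the attribute change
    ratios        : fractions for the training / verification / test sample sets
    """
    # Keep only records whose data types cover the changed attribute.
    consistent = [r for r in data_set if current_types.issubset(r.keys())]

    rng = random.Random(seed)
    rng.shuffle(consistent)

    n = len(consistent)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])

    train_set = consistent[:n_train]
    valid_set = consistent[n_train:n_train + n_valid]
    test_set = consistent[n_train + n_valid:]
    return train_set, valid_set, test_set
```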
And step S140, performing model training on the pre-configured model to obtain a target model.
Specifically, the training sample set, the verification sample set and the test sample set are sequentially input into a pre-configuration model for model training, and a target model is obtained. In this embodiment, the pre-configured model is a model pre-configured in the electronic device 100, and each network parameter in the model may be an initial value that has been determined in advance, or may not have been determined.
In this embodiment, the attributes of the data records in the data set are detected, and the training samples, the verification samples and the test samples are determined according to the attribute conditions of the data records, so as to trigger the model training process and train the pre-configured model.
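As a non-authoritative, high-level sketch of how steps S110 to S140 could be orchestrated on such a platform, the following Python generator ties the detection, re-sampling and training steps together; the helper callables and their behaviour are placeholders assumed for illustration, not specified by the disclosure.

```python
import time

def run_training_loop(load_data_set, attribute_changed, build_sample_sets,
                      train_model, poll_seconds=3600, max_iterations=None):
    """Detect attribute changes and retrain the pre-configured model (steps S110-S140)."""
    previous_records = load_data_set()
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        time.sleep(poll_seconds)                   # wait one detection period
        current_records = load_data_set()          # step S110: obtain the data set
        if attribute_changed(current_records, previous_records):              # step S120
            train_set, valid_set, test_set = build_sample_sets(current_records)  # step S130
            yield train_model(train_set, valid_set, test_set)                    # step S140
            previous_records = current_records
        iteration += 1
```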
Optionally, in this embodiment, the method further includes obtaining a preconfigured model training rule, where the model training rule includes an algorithm used for model training.
Step S140 includes inputting the training sample set, the verification sample set and the test sample set into a pre-configured model in sequence, and performing model training according to a pre-configured model training rule to obtain the target model.
In this embodiment, the algorithms used to train the model may include Linear Regression, where linear regression may involve at least one of the analytic (closed-form) solution of the linear regression model and its Computational Complexity; Gradient Descent, including Batch Gradient Descent (BGD), Stochastic Gradient Descent (SGD) and Mini-Batch Gradient Descent; Polynomial Regression; Learning Curves; linear model regularization (Regularized Linear Models), which may include at least one of Ridge Regression, Lasso Regression, Elastic Net and Early Stopping; and Logistic Regression, which may include at least one of estimating probabilities, the training and cost function, decision boundaries and Softmax Regression.
It is understood that, in the present embodiment, when the model is trained, other algorithms for training the model may also be used, and are not listed here. In this embodiment, the algorithm may be pre-stored in the electronic device 100, or may be automatically introduced into the electronic device 100 by the user as needed when the user needs to train the model.
In this embodiment, a model training rule is set, and model training is performed according to an algorithm in the model training rule, so that models trained in different scenes are more consistent. In the embodiment, the user automatically introduces the algorithm to train the model, so that the trained target model can better meet the requirements of the user.
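For illustration only, a model training rule of this kind could be represented as a mapping from algorithm names to model factories; the sketch below uses scikit-learn estimators as stand-ins for the listed algorithms and is an assumption made for this example, not the implementation described by the disclosure.

```python
from sklearn.linear_model import (LinearRegression, SGDRegressor, Ridge,
                                  Lasso, ElasticNet, LogisticRegression)
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# A hypothetical "model training rule": the algorithms the platform may use.
TRAINING_RULE = {
    "linear_regression": lambda: LinearRegression(),
    "sgd": lambda: SGDRegressor(max_iter=1000),
    "polynomial_regression": lambda: make_pipeline(PolynomialFeatures(degree=2),
                                                   LinearRegression()),
    "ridge": lambda: Ridge(alpha=1.0),
    "lasso": lambda: Lasso(alpha=0.1),
    "elastic_net": lambda: ElasticNet(alpha=0.1, l1_ratio=0.5),
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
}

def build_models(rule):
    # Instantiate one untrained sub-model per algorithm in the rule.
    return {name: factory() for name, factory in rule.items()}
```

An algorithm imported by the user could simply be registered as an additional entry of such a rule before the sub-models are instantiated.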
Optionally, in this embodiment, the model training rule may include a hardware condition and/or a time condition.
In an embodiment, the model training rule may include a hardware condition, and at this time, the training sample set, the verification sample set, and the test sample set are sequentially input into the preconfigured model for model training, and the step of obtaining the target model includes detecting a state of a hardware resource, and when the state of the hardware resource satisfies a preset hardware condition, the training sample set, the verification sample set, and the test sample set are sequentially input into the preconfigured model for model training, so as to obtain the target model.
In another embodiment, the model training rule may include a time condition. In this case, the step of sequentially inputting the training sample set, the verification sample set and the test sample set into the pre-configured model for model training to obtain the target model includes: obtaining the current time, judging whether the current time reaches a preset start time, and, if it does, sequentially inputting the training sample set, the verification sample set and the test sample set into the pre-configured model for model training to obtain the target model. When the model training rule only includes a time condition, the time condition may be set to the point in time at which a change in the attribute of the data records is detected, that is, the training process may be started immediately after such a change is detected. Alternatively, the time condition may be set to any time point after the change of the attribute is detected, any time point within a preset time length, or the time point after a preset time length has elapsed, that is, the training process may be performed some time after the change of the attribute is detected.
In another embodiment, the model training rules include both hardware and time conditions. At this time, the step of inputting the training sample set, the verification sample set and the test sample set into a pre-configuration model in sequence for model training to obtain a target model comprises the steps of detecting the state of hardware resources and obtaining the current time; judging whether the state of the hardware resource meets a preset hardware condition and whether the current time reaches a preset starting time; and if the state of the hardware resource meets the preset hardware condition and the current time reaches the preset starting time, sequentially inputting the training sample set, the verification sample set and the test sample set into a pre-configuration model for model training to obtain a target model.
In this case, the training process of the model is started only when the model satisfies the hardware condition and the time condition.
In this embodiment, the hardware condition refers to requirements on hardware resources, including requirements on the following hardware: the Central Processing Unit (CPU), the memory, the video memory and the hard disk storage. Specifically, the hardware condition may include, but is not limited to, the CPU usage rate, the number of CPU cores, the memory occupancy rate, the video memory size, the hard disk storage size, and the like.
In the embodiment, hardware conditions are added in the model training, so that the requirements on resources in the model training process can be met, and the model training process is ensured to be smoothly carried out. In addition, model training can be performed under the condition that hardware resources are enough, influence on other processes is avoided, and the hardware resources are utilized more reasonably. And a time condition is added in the model training, so that the interference of the model training process to other processes can be avoided.
In this embodiment, the model training rule may further include the number of training iterations; by setting the number of iterations, the duration of the training process can be bounded while sufficient accuracy is still achieved.
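As a minimal sketch (not taken from the disclosure) of how the hardware and time conditions could be checked before training is started, the following Python function uses the third-party psutil package; the thresholds and the fixed start time are assumed example values.

```python
import datetime
import psutil  # third-party package, assumed available on the training platform

def training_allowed(max_cpu_percent=50.0,
                     max_memory_percent=70.0,
                     min_free_disk_gb=10.0,
                     start_time=datetime.time(hour=1)):
    """Return True only when the (assumed) hardware and time conditions hold."""
    cpu_ok = psutil.cpu_percent(interval=1.0) <= max_cpu_percent      # CPU usage rate
    mem_ok = psutil.virtual_memory().percent <= max_memory_percent    # memory occupancy
    disk_ok = psutil.disk_usage("/").free / 2**30 >= min_free_disk_gb # free disk space
    time_ok = datetime.datetime.now().time() >= start_time            # preset start time
    return cpu_ok and mem_ok and disk_ok and time_ok
```

In such a sketch, step S140 would be entered only after training_allowed() returns True, for example by polling it periodically.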
Optionally, in this embodiment, the method further includes obtaining a model training method determined by a user, where the model training method includes at least one of linear regression, gradient descent, polynomial regression, learning curve, linear model regularization, and logistic regression.
It is understood that, in the present embodiment, the model training method determined by the user may also be other algorithms that can be used for model training.
When the model training method determined by the user is specifically obtained, the model training method can be obtained according to the specific operation of the user. For example, if the algorithm required for model training is pre-configured, for example, in the electronic device 100, the user may directly input the identification information of the corresponding algorithm and the algorithm selection command, and then obtain the algorithm corresponding to the identification information of the algorithm according to the identification information of the algorithm and the algorithm selection command of the user.
If the algorithm needed to be used for model training is not pre-configured, the user can manually import the algorithm, so that the algorithm imported by the user is directly acquired when the algorithm used for model training is acquired.
After the model training method determined by the user is obtained, the step of sequentially inputting the training sample set, the verification sample set and the test sample set into the pre-configured model and performing model training according to the pre-configured model training rule to obtain the target model can be executed. The specific execution process of this step includes: sequentially inputting the training sample set, the verification sample set and the test sample set into the pre-configured model, training with each algorithm in the model training rule respectively to obtain a sub-model corresponding to each algorithm, and then obtaining the target model from the sub-models corresponding to the algorithms.
In this embodiment, each algorithm is adopted to train the pre-configured model, so that a sub-model corresponding to each algorithm can be obtained, and model fusion is performed on the sub-models corresponding to each algorithm, so that a target model can be obtained.
For example, a corresponding weight may be determined for each sub-model, and the output of each sub-model may be multiplied by its weight to obtain the target model. When setting the weight of each sub-model, the weight can be chosen according to the characteristics of the sub-model of each algorithm. Because the target model is obtained from all the sub-models, its output is influenced by the output of every sub-model, which makes the output of the target model more accurate.
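A minimal sketch of this weighted fusion of sub-models follows, assuming each sub-model exposes a scikit-learn-style predict() method and that the per-sub-model weights have already been chosen; both assumptions go beyond the original text.

```python
import numpy as np

class WeightedEnsemble:
    """Target model obtained by weighting the output of each sub-model."""

    def __init__(self, sub_models, weights):
        # sub_models: dict mapping name -> fitted model with a predict() method
        # weights   : dict mapping name -> weight chosen from the sub-model's characteristics
        assert set(sub_models) == set(weights)
        total = sum(weights.values())
        self.sub_models = sub_models
        self.weights = {name: w / total for name, w in weights.items()}  # normalise

    def predict(self, X):
        X = np.asarray(X)
        # Output of the target model = weighted sum of the sub-model outputs.
        return sum(self.weights[name] * model.predict(X)
                   for name, model in self.sub_models.items())
```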
Referring to fig. 3, optionally, in this embodiment, before inputting the training sample set, the verification sample set, and the test sample set into the preconfigured model for model training, the method further includes steps S210 to S230.
Step S210, if the data type of the data changes, the proportion of each type of data is obtained.
In step S220, the weighting coefficients of the data of each type are obtained according to the proportion of the data of each type.
Step S230, adjusting the corresponding network parameters in the preconfigured model according to the weight coefficients, to obtain a new preconfigured model.
In this embodiment, the data types may be each type obtained by classifying according to a data classification rule. The data classification rules can be classified according to the source channel of the data, or can be classified according to other classification methods.
For example, when training a model for identifying unlawful persons, such as persons involved in pornography-related offenses, the channels of their activities used to be relatively limited, usually certain physical venues. However, as new platforms emerge, unlawful persons also leave traces on these new carriers. For example, their communication channels may develop from the original telephone and text messages to various social software, and further to live-streaming platforms, friend-making software and the like. Unlawful persons may also develop more payment methods beyond the original cash payment, such as WeChat, Alipay, online transfer and virtual currency. In this case, if the model is trained only with data recorded before the attribute of the data records changed, the accuracy of the model in identifying unlawful persons will inevitably drop after the attribute changes. For example, after the communication tools change from WeChat and QQ to WeChat, QQ, a live-streaming platform and friend-making software, if unlawful persons are still identified only from the data collected on WeChat and QQ, the results for unlawful persons who mainly act on the live-streaming platform and friend-making software will inevitably be inaccurate.
For example, suppose the data types are classified according to the chat software used: if a person initially uses only two chat applications, WeChat and QQ, and later switches to using WeChat, QQ, a live-streaming platform and other friend-making software, the network parameters of the part of the pre-configured model corresponding to the data collected from QQ and WeChat can be reduced accordingly. In this way, a new pre-configured model is obtained.
In this embodiment, the weight coefficient of each type of data is determined according to the proportions of the different types of data, and the corresponding network parameters in the pre-configured model are adjusted according to the weight coefficients; the adjusted pre-configured model is then trained, so that the trained target model can be more accurate.
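For illustration of steps S210 to S230, the following sketch derives a weight coefficient for each data type from its proportion in the records and scales the network parameters associated with that type; how the pre-configured model groups its parameters by data type, and the use of the raw proportion as the coefficient, are assumptions made only for this example.

```python
from collections import Counter

def type_weight_coefficients(records):
    """Steps S210-S220 (illustrative): proportion of each data type -> weight coefficient."""
    counts = Counter(t for record in records for t in record.keys())
    total = sum(counts.values())
    # Here the weight coefficient is simply the proportion of each type;
    # the original disclosure leaves the exact mapping open.
    return {t: c / total for t, c in counts.items()}

def adjust_preconfigured_model(params_by_type, coefficients):
    """Step S230 (illustrative): scale the network parameters tied to each data type.

    params_by_type : dict mapping a data type to a list of parameter values
                     (a simplifying assumption about how the pre-configured
                     model groups its parameters).
    """
    return {t: [p * coefficients.get(t, 0.0) for p in params]
            for t, params in params_by_type.items()}
```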
Referring to fig. 4, optionally, in this embodiment, the step S120 of detecting whether the attribute of the data record in the data set changes includes substeps S121-substep S125.
In step S121, an attribute of the data record generated in the current time period is acquired as a first attribute.
Step S122, obtaining an attribute of the generated data record in a preset time period before the current time period as a second attribute.
Step S123, determining whether the first attribute is consistent with the second attribute.
Step S124, if the first attribute is consistent with the second attribute, determining that the attribute of the data record has not changed.
Step S125, if the first attribute is inconsistent with the second attribute, determining that the attribute of the data record in the data set changes.
In this embodiment, the first attribute and the second attribute may include quality, type (dimension), number per unit time, value, and the like of the data. When judging whether the first attribute is consistent with the second attribute, the judgment can be carried out through at least one of quality, type, quantity or value, and as long as any one of the quality, the type, the quantity or the value is changed, the first attribute is considered to be inconsistent with the second attribute, and at the moment, the attribute of the data record is judged to be changed.
In this embodiment, the quality of a data record may be determined from the amount of data it contains that has no effect on the recognition result. For example, if the proportion of data that can affect the recognition result was originally ninety percent of the data generated per unit time but later drops to fifty percent, the quality of the data has changed. As another example, in the above scheme for identifying a person's eating habits, the collected data includes the types of dishes the person orders; if the person is at location X while a friend of the person is at location Y, the quality of the takeout-order data generated by the friend at location Y is low.
In this embodiment, the type of data is the category to which the data belongs. For example, when data types are classified according to the communication tools used, if a person at first communicates only through WeChat and QQ and later also communicates through a live-streaming platform and the like, the data types have changed.
In this embodiment, the quantity per unit time refers to the quantity of a certain type of data per unit time. For example, in the above scheme for determining a person's eating habits, if the person orders takeout thirty times in the first month and twenty times in the second month, the quantity of data per unit time in that person's data record has decreased.
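A minimal sketch of sub-steps S121 to S125 follows, assuming each data record is a dict keyed by data type: the attribute (types present and quantity of each type per unit time) is computed for the current period and the preceding period, and the two are compared. The tolerance used for the quantity comparison is an assumed example parameter, not part of the disclosure.

```python
from collections import Counter

def record_attribute(records, period_days):
    """Attribute of a set of data records: the data types present and the
    quantity of each type of data per unit time (here: per day)."""
    counts = Counter(t for record in records for t in record.keys())
    per_day = {t: c / period_days for t, c in counts.items()}
    return {"types": set(counts), "quantity_per_day": per_day}

def attribute_changed(current_records, previous_records, period_days,
                      quantity_tolerance=0.1):
    """Steps S121-S125 (illustrative): compare the first and second attribute."""
    first = record_attribute(current_records, period_days)    # step S121
    second = record_attribute(previous_records, period_days)  # step S122

    if first["types"] != second["types"]:        # the type of data changed
        return True
    for t in first["types"]:                     # the quantity per unit time changed
        old = second["quantity_per_day"].get(t, 0.0)
        new = first["quantity_per_day"][t]
        if old == 0.0 or abs(new - old) / old > quantity_tolerance:
            return True
    return False                                 # first and second attribute consistent
```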
In this embodiment, when detecting whether the attribute of the data record in the data set changes, the data may be acquired in different manners according to different situations. The following describes in detail the data acquisition method in different cases.
In one embodiment, the data in the data records is imported into the data set in real time. In this case, the attribute of the data records imported into the data set during the current period of time (first period) may be obtained as the first attribute, and the attribute of the data records imported during a period of time before the current period (second period) may be obtained as the second attribute; the quality, type (dimension), quantity per unit time, value and so on of the two periods are then compared, so that whether the attribute of the data records has changed is determined from the comparison result. When the two periods differ in length, the quantities may be scaled accordingly; for example, when the first period is twice as long as the second period, the scaling threshold may be set to 2.
In another embodiment, the data included in the data record is imported into the data set at regular time, in this case, an attribute of the data record imported into the data set at a time point (first time point) where the data was imported most recently may be used as a first attribute, an attribute of the data record imported at a time point (second time point) before the first time point by a preset time may be used as a second attribute, and then the quality, type (dimension), number per unit time, and value of the data imported at the first time point are compared with the quality, type (dimension), number per unit time, and value of the data imported at the second time point, so as to determine whether the attribute of the data record changes according to the comparison result.
In another embodiment, the data for comparison may be determined according to the output time of the target model, and the specific principle is similar to that when the data included in the data record is periodically imported into the data set, and will not be described herein again.
In this embodiment, the method may be performed on a model training platform pre-configured on the electronic device 100. The model training platform may include a component for importing data, and the data in the data records may come from an application (APP), a web page (WEB), a database (Database), a log (Log), and the like. Specifically, data from applications may be acquired from an agent server (Agent), data from web pages can likewise be obtained from proxy servers (agents), data from databases can be obtained from snapshot systems, and data from logs can be obtained from log systems (flash). The data acquired from each channel forms a message queue in the message queue cluster, and the data import platform classifies the data obtained from the message queue according to a plurality of types (dimensions) so as to obtain the data set.
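Purely as an assumed illustration of the data import part of such a platform (the message format and field names are invented for this example and are not taken from the disclosure), the following sketch drains a message queue and groups the records by person and data type to form the data set.

```python
import json
from collections import defaultdict
from queue import Queue

def import_data_set(message_queue: Queue):
    """Illustrative stand-in for the data import step: drain a message queue
    and group the records by data type (dimension) to form the data set.
    The JSON message format with 'person_id', 'type' and 'payload' fields
    is an assumption made only for this example."""
    data_set = defaultdict(lambda: defaultdict(list))  # person_id -> type -> payloads
    while not message_queue.empty():
        message = json.loads(message_queue.get())
        data_set[message["person_id"]][message["type"]].append(message["payload"])
    return data_set
```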
Referring to fig. 5, an embodiment of the present application further provides a model training apparatus 110, which includes a first obtaining module 111, a detecting module 112, a second obtaining module 113, and a training module 114. The model training apparatus 110 includes a software function module which can be stored in the memory 120 in the form of software or firmware or solidified in an Operating System (OS) of the electronic device 100.
A first obtaining module 111 for obtaining a data set comprising data records of a plurality of persons, the data records comprising at least one item of data describing a known person's behavior.
The first obtaining module 111 in this embodiment is configured to execute step S110, and for a detailed description of the first obtaining module 111, reference may be made to the description of step S110.
A detecting module 112, configured to detect whether an attribute of the data record in the data set changes, where the attribute includes a type of the data and a quantity of each type of data corresponding to a unit time.
The detection module 112 in this embodiment is configured to perform the step S120, and the detailed description about the detection module 112 may refer to the description about the step S120.
The second obtaining module 113 is configured to, when an attribute of a data record in a data set changes, obtain, from the data set, a training sample set, a verification sample set, and a test sample set corresponding to the changed attribute.
The second obtaining module 113 in this embodiment is configured to perform the step S130, and the detailed description about the second obtaining module 113 may refer to the description about the step S130.
And the training module 114 is configured to sequentially input the training sample set, the verification sample set, and the test sample set into a pre-configured model for model training, so as to obtain a target model.
The training module 114 in this embodiment is used to execute step S140, and the detailed description about the training module 114 may refer to the description about step S140.
The present embodiment also provides a readable storage medium, which stores an executable program, and when executing the executable program, the processor 130 implements the method according to any one of the embodiments.
The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of model training, the method comprising:
obtaining a data set comprising data records of a plurality of persons, the data records comprising data of at least one activity of known persons;
detecting whether the attribute of the data record in the data set changes or not, wherein the attribute comprises the type of the data and the number of each type of data corresponding to the data record in unit time;
if the attribute of the data record in the data set changes, acquiring a training sample set, a verification sample set and a test sample set corresponding to the changed attribute from the data set;
and inputting the training sample set, the verification sample set and the test sample set into a pre-configured model in sequence for model training to obtain a target model.
2. The method of claim 1, further comprising: obtaining a pre-configured model training rule, wherein the model training rule comprises an algorithm adopted by model training;
the step of inputting the training sample set, the verification sample set and the test sample set into a pre-configuration model in sequence for model training to obtain a target model comprises the following steps:
and inputting the training sample set, the verification sample set and the test sample set into a pre-configured model in sequence, and carrying out model training according to a pre-configured model training rule to obtain the target model.
3. The method of claim 2, further comprising:
obtaining a model training method determined by a user, wherein the model training method comprises at least one of linear regression, gradient descent, polynomial regression, learning curve, linear model regularization and logistic regression.
4. The method of claim 3, wherein the step of sequentially inputting the training sample set, the validation sample set, and the test sample set into a preconfigured model and performing model training according to preconfigured model training rules to obtain the target model comprises:
inputting each training sample set, each verification sample set and each test sample set into the pre-configuration model in sequence, and training by adopting the algorithms in the model training rules respectively to obtain the sub-models corresponding to each algorithm;
and obtaining a target model according to the sub-model corresponding to each algorithm.
5. The method according to any one of claims 1-4, wherein the training sample set, the validation sample set, and the test sample set are sequentially input into a pre-configured model for model training, and before obtaining the target model, the method further comprises:
if the data type of the data changes, acquiring the proportion of each type of data;
obtaining the weight coefficient of each type of data according to the proportion of each type of data;
and adjusting the corresponding network parameters in the pre-configured model according to the weight coefficient to obtain a new pre-configured model.
6. The method of claim 5, wherein the step of inputting the training sample set, the validation sample set and the test sample set into the preconfigured model in sequence for model training to obtain the target model comprises:
detecting the state of hardware resources and/or acquiring the current time;
judging whether the state of the hardware resource meets a preset hardware condition and/or whether the current time reaches a preset starting time;
and if the state of the hardware resource meets a preset hardware condition and/or the current time reaches a preset starting time, sequentially inputting the training sample set, the verification sample set and the test sample set into a pre-configuration model for model training to obtain a target model.
7. The method of claim 1, wherein the step of detecting whether the attributes of the data records in the data set have changed comprises:
acquiring the attribute of a data record generated in the current time period as a first attribute;
acquiring the attribute of the generated data record in a preset time period before the current time period as a second attribute;
judging whether the first attribute is consistent with the second attribute;
and if the first attribute is inconsistent with the second attribute, determining that the attribute of the data record in the data set is changed.
8. A model training apparatus, the apparatus comprising:
a first acquisition module for acquiring a data set comprising data records of a plurality of persons, the data records comprising at least one item of data describing a known person's behavior;
the detection module is used for detecting whether the attribute of the data record in the data set changes or not, wherein the attribute comprises the type of the data and the number of each type of data corresponding to the data record in unit time;
the second acquisition module is used for acquiring a training sample set, a verification sample set and a test sample set corresponding to changed attributes from a data set when the attributes of data records in the data set change;
and the training module is used for inputting the training sample set, the verification sample set and the test sample set into a pre-configuration model in sequence for model training to obtain a target model.
9. A readable storage medium, characterized in that the readable storage medium stores an executable program, which when executed by a processor implements the method according to any one of claims 1-7.
10. An electronic device, comprising a memory and a processor, the memory and the processor being electrically connected, the memory having stored therein an executable program, the processor, when executing the executable program, implementing the method of any one of claims 1-7.
CN202010066345.2A 2020-01-20 2020-01-20 Model training method and device, readable storage medium and electronic equipment Pending CN111309706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010066345.2A CN111309706A (en) 2020-01-20 2020-01-20 Model training method and device, readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010066345.2A CN111309706A (en) 2020-01-20 2020-01-20 Model training method and device, readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111309706A true CN111309706A (en) 2020-06-19

Family

ID=71156456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010066345.2A Pending CN111309706A (en) 2020-01-20 2020-01-20 Model training method and device, readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111309706A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114469089A (en) * 2021-12-30 2022-05-13 北京津发科技股份有限公司 Multi-mode data compression resistance evaluation method and system based on virtual reality technology

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718490A (en) * 2014-12-04 2016-06-29 阿里巴巴集团控股有限公司 Method and device for updating classifying model
US20170177924A1 (en) * 2014-07-17 2017-06-22 Nec Solution Innovators, Ltd. Attribute factor analysis method, device, and program
CN108875963A (en) * 2018-06-28 2018-11-23 北京字节跳动网络技术有限公司 Optimization method, device, terminal device and the storage medium of machine learning model
CN108898504A (en) * 2018-07-09 2018-11-27 北京精友世纪软件技术有限公司 The intelligent training and improving method of loss assessment system are surveyed in a kind of movement
CN109978062A (en) * 2019-03-28 2019-07-05 北京九章云极科技有限公司 A kind of model on-line monitoring method and system
CN110610415A (en) * 2019-09-26 2019-12-24 北京明略软件系统有限公司 Method and device for updating model
CN110705717A (en) * 2019-09-30 2020-01-17 支付宝(杭州)信息技术有限公司 Training method, device and equipment of machine learning model executed by computer


Similar Documents

Publication Publication Date Title
US10943186B2 (en) Machine learning model training method and device, and electronic device
CN108121795B (en) User behavior prediction method and device
JP6771751B2 (en) Risk assessment method and system
CN110162717B (en) Method and device for recommending friends
US20150278706A1 (en) Method, Predictive Analytics System, and Computer Program Product for Performing Online and Offline Learning
US9462313B1 (en) Prediction of media selection consumption using analysis of user behavior
US20130268520A1 (en) Incremental Visualization for Structured Data in an Enterprise-level Data Store
JP6547070B2 (en) Method, device and computer storage medium for push information coarse selection sorting
CN110442712B (en) Risk determination method, risk determination device, server and text examination system
WO2019061664A1 (en) Electronic device, user's internet surfing data-based product recommendation method, and storage medium
CN110929799A (en) Method, electronic device, and computer-readable medium for detecting abnormal user
CN112114986A (en) Data anomaly identification method and device, server and storage medium
CN111258593B (en) Application program prediction model building method and device, storage medium and terminal
CN112329816A (en) Data classification method and device, electronic equipment and readable storage medium
CN111460384A (en) Policy evaluation method, device and equipment
CN111340233B (en) Training method and device of machine learning model, and sample processing method and device
CN110602207A (en) Method, device, server and storage medium for predicting push information based on off-network
CN112784168B (en) Information push model training method and device, information push method and device
CN111435369A (en) Music recommendation method, device, terminal and storage medium
CN111309706A (en) Model training method and device, readable storage medium and electronic equipment
TW202111592A (en) Learning model application system, learning model application method, and program
US20230069999A1 (en) Method and apparatus for updating recommendation model, computer device and storage medium
CN114697127B (en) Service session risk processing method based on cloud computing and server
CN110717653A (en) Risk identification method and device and electronic equipment
CN111368864A (en) Identification method, availability evaluation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200619