CN113869544A - Reflow user prediction model establishing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113869544A
Authority
CN
China
Prior art keywords
user
reflow
field set
prediction model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010615016.9A
Other languages
Chinese (zh)
Inventor
梁彩燕
南添
吴修权
黄志豪
王建宏
刘忱
涂锋
汤嘉铭
戚玉雷
赖柯明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Guangdong Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Guangdong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Guangdong Co Ltd
Priority to CN202010615016.9A
Publication of CN113869544A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling

Abstract

The application discloses a method and a device for establishing a reflow user prediction model, an electronic device and a storage medium, and relates to the technical field of network communication. The method comprises: collecting, for each user labelled in advance with a reflow label, a feature field set that characterizes the user's internet access attributes in multiple dimensions; optimizing that feature field set to obtain an effective feature field set; and inputting the effective feature field set of each labelled user, as a training sample, into a training network model for training, so as to establish a reflow user prediction model. The reflow user prediction model can then be used to locate users with a strong reflow tendency accurately, so that recovery of the located users succeeds more often and marketing resources are saved.

Description

Reflow user prediction model establishing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of network communication technologies, and in particular, to a method and an apparatus for building a reflow user prediction model, an electronic device, and a storage medium.
Background
With the development of the communication industry, the customer has more and more leeway to select a communication carrier. In the case of a market that tends to saturate, the cost of developing new customers is much greater than the cost of recovering customers. To achieve more market share, the recovery of lost customers is an important marketing strategy for operators. Therefore, how to accurately predict and locate customers with a large reflow tendency and recover lost customers from a large number of lost customers is an important topic at present.
In the prior art, lost users are mainly recovered by statistically analysing the characteristics of lost users and treating the users who meet a preset standard in the statistical result as customers with a strong reflow tendency. This approach locates users with a strong reflow tendency inaccurately, so the marketing resources spent on recovering them are wasted.
Disclosure of Invention
The embodiment of the application provides a method and a device for establishing a reflow user prediction model, electronic equipment and a storage medium, so as to solve the problem that the positioning of users with large reflow tendency is not accurate enough.
In a first aspect, an embodiment of the present application provides a method for building a reflow user prediction model, including:
collecting a characteristic field set representing user internet access attributes of a user with a reflow label identified in advance under multiple dimensions;
optimizing a feature field set representing user internet access attributes under multiple dimensions to obtain an effective feature field set;
and inputting the effective characteristic field set corresponding to each user marked with the reflow label as a training sample into a training network model for training so as to establish a reflow user prediction model.
In a second aspect, an embodiment of the present application further provides a device for building a reflow user prediction model, including:
the data acquisition unit is configured to acquire a characteristic field set representing user internet surfing attributes of a user which is marked with a reflow label in advance under multiple dimensions;
the data processing unit is configured to optimize a characteristic field set representing the internet access attribute of the user under multiple dimensions to obtain an effective characteristic field set;
and the model establishing unit is configured to input the effective characteristic field set corresponding to each user identified with the reflow label into the training network model as a training sample for training so as to establish a reflow user prediction model.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the reflow user prediction model building method according to the first aspect of the embodiments of the present application.
In a fourth aspect, embodiments of the present application further provide a storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the reflow user prediction model building method according to the first aspect of the embodiments of the present application.
At least one technical scheme adopted by the embodiments of the application can achieve the following beneficial effects: a feature field set that characterizes, in multiple dimensions, the internet access attributes of users labelled in advance with a reflow label is collected; the feature field set is then optimized to obtain an effective feature field set; finally, the effective feature field set of each labelled user is input, as a training sample, into a training network model for training, so as to establish a reflow user prediction model. The reflow user prediction model can therefore be used to locate users with a strong reflow tendency accurately, recovery of the located users succeeds more often, and marketing resources are saved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic interaction diagram of an electronic device and a plurality of user terminals according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for building a reflow user prediction model according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for building a reflow user prediction model according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for building a reflow user prediction model according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for building a reflow user prediction model according to an embodiment of the present application;
FIG. 6 is a functional block diagram of a reflow user prediction model building apparatus according to an embodiment of the present application;
FIG. 7 is a functional block diagram of a reflow user prediction model building apparatus according to an embodiment of the present application;
FIG. 8 is a functional block diagram of a reflow user prediction model building apparatus according to an embodiment of the present application;
FIG. 9 is a circuit connection block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 2, an embodiment of the present application provides a method for building a reflow user prediction model, which is applied to an electronic device 100; the electronic device 100 may be a server. As shown in fig. 1, the electronic device 100 is communicatively connected with a plurality of user terminals 200 for data interaction. The method comprises the following steps:
S11: collecting a feature field set that characterizes, in multiple dimensions, the internet access attributes of users labelled in advance with a reflow label.
Optionally, the multiple dimensions include at least one of a user basic information dimension, a user consumption information dimension, a user location information dimension, and a user terminal information dimension. In this embodiment, the multiple dimensions comprise user basic information, user consumption information, user location information, and user terminal information. The feature fields under user basic information include information fields such as name, age, sex, network age, and VIP grade. The feature fields under user consumption information include information fields such as month, total flow, total consumption, voice consumption, and data consumption. The feature fields under user location information include information fields such as the user's resident cell, whether the resident cell belongs to a rural area or a town, and the type of area (e.g., school, factory, business district) to which the resident address belongs. The feature fields under user terminal information include information fields such as terminal brand, model, system, and operating system. It is to be understood that the above feature fields may be collected from the plurality of user terminals 200.
S12: optimizing the feature field set that characterizes the user internet access attributes in multiple dimensions, to obtain an effective feature field set.
Generally, the feature field set that characterizes the user internet access attributes in multiple dimensions contains a very large number of feature fields, some of which are of no use for predicting reflow users. These invalid feature fields therefore need to be removed or converted into valid feature fields.
S13: inputting the effective feature field set corresponding to each user labelled with the reflow label, as a training sample, into a training network model for training, so as to establish a reflow user prediction model.
Optionally, the training network model is a random forest algorithm model, a decision tree algorithm model, or a neural network model. The training process is described below, taking a random forest algorithm model as the training network model. A random forest is an ensemble learning method composed of multiple decision tree classifiers and can be used for both classification and regression; the splitting criterion used by the decision tree classifiers is the Gini coefficient. According to the principle of minimum Gini impurity, the feature xi with the best classification effect is selected from the m feature variables, using the following formula:
$$\mathrm{Gini} = 1 - \sum_{i} p(i)^{2}$$
where p(i) represents the proportion of samples that belong to class i, and the different training sample classes are weighted according to the following equation:
Figure BDA0002563447170000051
where W_{h,maj} denotes the weighting value. According to the above formula, each decision tree h has a different weight for each training sample class, and nMAG denotes the number of training samples in the training set. After the classification weight of each decision tree is obtained, the voting value of each training sample is calculated based on the weights, which gives the final classification result.
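The split criterion described above can be illustrated with a short Python sketch. It assumes the standard Gini impurity (1 minus the sum of the squared class proportions p(i)) and a simple threshold split on each numeric feature. The function names, the toy feature names and the data are hypothetical and are not taken from the application; the class-weighting step is not reproduced because its formula is not legible in the source.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum_i p(i)^2, where p(i) is the share of class i."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity of the two parts of a split at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

def best_feature(columns, labels):
    """Return (feature name, threshold, impurity) of the lowest-impurity split."""
    best = None
    for name, values in columns.items():
        # The largest value is skipped: splitting there leaves one side empty.
        for threshold in sorted(set(values))[:-1]:
            score = split_impurity(values, labels, threshold)
            if best is None or score < best[2]:
                best = (name, threshold, score)
    return best

# Hypothetical toy data: 1 = reflow user, 0 = non-reflow user.
labels = [1, 1, 0, 0]
columns = {
    "total_consumption": [80, 90, 20, 30],  # separates the two classes cleanly
    "network_age": [1, 5, 2, 6],            # does not
}
print(gini_impurity(labels))          # 0.5
print(best_feature(columns, labels))  # ('total_consumption', 30, 0.0)
```

In a random forest each tree would normally evaluate only a random subset of the m features at every split; the sketch scans all features to keep the example short.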
The final classification result is obtained by simple majority voting, as follows. First, different training sets are constructed to increase the differences among the classification models, which improves the extrapolation prediction capability of the combined classification model. The mtry parameter of the random forest algorithm model is taken from [15, 30, 45, 60, 75] and the ntree parameter from [50, 100, 200, 300, 500], and these two parameters are used for the subsequent modeling. Each combination of the two parameters is repeated 3 times, and a different sample set is drawn by the bootstrap method before each repetition. This yields a sequence of classification models, which forms a multi-classifier system; finally, simple majority voting is applied, with the following formula:
$$H(x) = \arg\max_{Y} \sum_{i} I\big(h_i(x) = Y\big)$$
where H(x) represents the combined classification model (whether a sample point is classified as reflow), h_i(x) is the i-th individual classification model, Y represents the output variable (target variable), and I(·) is the indicator function. The quality of the current model is evaluated by its accuracy on test samples: 20% of the samples are randomly taken as test samples and 80% as training samples, the model is built from the training samples, the average accuracy over the 3 repetitions is computed for each parameter combination, and the H(x) combination with the highest accuracy is selected. The algorithm monitors the performance of the whole model; if the performance no longer improves significantly, the algorithm stops and outputs the final classification result.
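The voting rule and the parameter selection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the 80/20 split is made first and only the training part is resampled by bootstrap (the source does not fix the order), and `train_and_score` is a hypothetical callback that would fit a forest with the given mtry and ntree and return its test accuracy. `toy_scorer` is a made-up stand-in so that the sketch runs without a machine learning library.

```python
import itertools
import random
from collections import Counter

MTRY_GRID = [15, 30, 45, 60, 75]
NTREE_GRID = [50, 100, 200, 300, 500]

def majority_vote(predictions):
    """H(x): the class that the largest number of base classifiers predict."""
    return Counter(predictions).most_common(1)[0][0]

def select_best_combination(samples, train_and_score, repeats=3, seed=0):
    """Return the (mtry, ntree) pair with the highest mean test accuracy."""
    rng = random.Random(seed)
    mean_accuracy = {}
    for mtry, ntree in itertools.product(MTRY_GRID, NTREE_GRID):
        scores = []
        for _ in range(repeats):
            shuffled = samples[:]
            rng.shuffle(shuffled)
            cut = int(0.8 * len(shuffled))      # 80% training, 20% test
            train, test = shuffled[:cut], shuffled[cut:]
            # Bootstrap: resample the training part with replacement.
            train = [rng.choice(train) for _ in train]
            scores.append(train_and_score(train, test, mtry, ntree))
        mean_accuracy[(mtry, ntree)] = sum(scores) / repeats
    best = max(mean_accuracy, key=mean_accuracy.get)
    return best, mean_accuracy

def toy_scorer(train, test, mtry, ntree):
    """Hypothetical stand-in for 'fit a forest, return its test accuracy'."""
    return 1.0 - abs(mtry - 45) / 100 - abs(ntree - 200) / 1000

samples = list(range(100))              # placeholder sample identifiers
best, table = select_best_combination(samples, toy_scorer)
print(majority_vote([1, 0, 1]), best)   # 1 (45, 200)
```

Splitting before resampling keeps copies of one record from landing on both sides of the split, which would otherwise inflate the measured accuracy.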
In the method for establishing the reflow user prediction model, a feature field set that characterizes, in multiple dimensions, the internet access attributes of users labelled in advance with a reflow label is collected; the feature field set is then optimized to obtain an effective feature field set; finally, the effective feature field set of each labelled user is input, as a training sample, into a training network model for training, so as to establish a reflow user prediction model. The reflow user prediction model can therefore be used to locate users with a strong reflow tendency accurately, recovery of the located users succeeds more often, and marketing resources are saved.
Optionally, as one of the implementation manners, S12 includes:
screening the feature fields that characterize the user internet access attributes in multiple dimensions according to preset screening rules to obtain an effective feature field set, wherein the screening rules comprise one or more of the following:
deleting, from the feature field set, the feature fields whose proportion of missing values is greater than a preset first threshold;
The preset first threshold may be 70%, 65%, or 80%. For example, the data source of the old and rural network is frequently missing, so 95% of users have no matching value for the "old and rural network MOU" field; the field has no modeling significance and is removed.
deleting, from the feature field set, the categorical feature fields (fields whose recorded values are selected from several options) in which the same recorded value accounts for a proportion greater than a preset second threshold;
The preset second threshold may be 90%, 85%, or 80%. For example, for 95% of users the recorded value of the categorical field "whether the customer is billed" is "yes"; the field has little discriminating power and is removed.
deleting, from the feature field set, the feature fields whose coefficient of variation is smaller than a preset third threshold;
The preset third threshold may be 0.1, 0.2, or 0.15. For example, the coefficient of variation of the "international roaming calling MOU" field is less than 0.1, which indicates that the field fluctuates little, carries little information, and can hardly distinguish samples, so it is removed.
and deleting, from the feature field set, the feature fields whose correlation coefficient is smaller than a preset fourth threshold.
The preset fourth threshold may be 0.7, 0.75, or 0.8. For example, the correlation coefficient between the "refund" field and the user label is only 0.5, which indicates that the field is only weakly correlated with the user label, so it is removed.
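The four screening rules above can be sketched in Python as follows. This is an illustrative sketch only: it assumes that missing values are represented as None, that the dominant-value rule applies to categorical fields, that the coefficient-of-variation and correlation rules apply to numeric fields, and that the correlation coefficient is the absolute Pearson correlation with a 0/1 reflow label. The field names and values are hypothetical; the default thresholds are example values from the text.

```python
import math
import statistics

def missing_ratio(values):
    """Share of records in which the field is missing (None)."""
    return sum(v is None for v in values) / len(values)

def dominant_ratio(values):
    """Share of the most frequent recorded value among the present records."""
    present = [v for v in values if v is not None]
    return max(present.count(v) for v in set(present)) / len(present)

def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    return statistics.pstdev(present) / abs(mean) if mean else math.inf

def label_correlation(values, labels):
    """Absolute Pearson correlation between a numeric field and the label."""
    pairs = [(v, y) for v, y in zip(values, labels) if v is not None]
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return abs(cov) / math.sqrt(var_x * var_y) if var_x and var_y else 0.0

def screen_fields(fields, labels, t1=0.7, t2=0.9, t3=0.1, t4=0.7):
    """fields maps name -> (kind, values); returns (kept names, drop reasons)."""
    kept, dropped = [], {}
    for name, (kind, values) in fields.items():
        if missing_ratio(values) > t1:
            dropped[name] = "missing ratio"
        elif kind == "categorical" and dominant_ratio(values) > t2:
            dropped[name] = "dominant value"
        elif kind == "numeric" and coefficient_of_variation(values) < t3:
            dropped[name] = "low variation"
        elif kind == "numeric" and label_correlation(values, labels) < t4:
            dropped[name] = "low correlation"
        else:
            kept.append(name)
    return kept, dropped

labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # 1 = reflow user (hypothetical data)
fields = {
    "rural_network_mou": ("numeric", [3.0] + [None] * 9),
    "is_billed": ("categorical", ["yes"] * 10),
    "intl_roaming_mou": ("numeric", [100, 101, 100, 99, 100, 101, 100, 99, 100, 100]),
    "refund": ("numeric", [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]),
    "total_flow": ("numeric", [9, 8, 9, 10, 9, 1, 2, 1, 2, 1]),
}
kept, dropped = screen_fields(fields, labels)
print(kept)   # ['total_flow']
```

Each rule removes one of the four hypothetical noise fields, and only the field that varies and tracks the label survives.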
Alternatively, as shown in fig. 3, as another embodiment, S12 includes:
S31: standardizing the feature field set that characterizes, in multiple dimensions, the internet access attributes of the users labelled in advance with the reflow label, to obtain normalized vectors.
Specifically, the standardization may be performed using the formula
$$y_i = \frac{x_i - \bar{x}}{\sigma_x}$$
where y_i is the normalized vector, x_i is the original feature field, \bar{x} is the mean of the feature fields of the same class, and \sigma_x is the standard deviation of the feature fields of the same class. The purpose of the standardization is to bring all data onto a common scale, so that a small number of samples with very large or very small values do not dominate the training and impair the generalization capability of the model to be built.
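As a brief illustration of the standardization in S31, the following Python sketch applies the mean and standard deviation scaling to one feature field. It assumes the population standard deviation and represents missing values as None so that they stay missing for the later filling steps; the field name and values are hypothetical.

```python
import statistics

def standardize(column):
    """Apply y_i = (x_i - mean) / std to one feature field; None stays None."""
    present = [x for x in column if x is not None]
    mean = statistics.fmean(present)
    std = statistics.pstdev(present)   # population standard deviation (assumption)
    if std == 0:
        # A constant field carries no spread; map every present value to 0.
        return [None if x is None else 0.0 for x in column]
    return [None if x is None else (x - mean) / std for x in column]

total_consumption = [20.0, 40.0, 60.0, None]   # hypothetical field values
print(standardize(total_consumption))
```

After this step every present field has mean 0 and standard deviation 1, so distances between vectors are not dominated by fields with large raw values.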
S32: finding, according to a proximity algorithm, the second normalized vector that is closest to a first normalized vector in which a missing field exists.
Specifically, the distance may be calculated according to
Figure BDA0002563447170000073
and the second normalized vector that is closest to the first normalized vector with the missing field is found.
S33: a weighting value is determined based on the distance, wherein the weighting value decreases as the distance increases.
Specifically, the weighting value may be determined from the distance using a Gaussian kernel function.
S34: weighting the target field in the second normalized vector that corresponds to the missing field according to the weighting value, to obtain a padding field.
S35: filling the missing field with the padding field.
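Steps S32 to S35 can be sketched as below. The sketch follows a literal reading of the steps and rests on several assumptions that the source does not confirm: the distance is Euclidean and computed over the fields present in both vectors (the distance formula itself is not legible in the source), the Gaussian kernel has a hypothetical `bandwidth` parameter, and the padding field is the neighbour's value multiplied by the weight.

```python
import math

def distance(a, b):
    """Euclidean distance over the fields present in both vectors (assumed metric)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))

def gaussian_weight(dist, bandwidth=1.0):
    """Gaussian kernel: 1 at distance 0, decreasing as the distance grows."""
    return math.exp(-(dist ** 2) / (2 * bandwidth ** 2))

def fill_missing(target, candidates, bandwidth=1.0):
    """Fill each missing field of `target` from its nearest neighbour."""
    filled = list(target)
    for idx, value in enumerate(target):
        if value is not None:
            continue
        # S32: only vectors that have the target field can donate a value.
        donors = [c for c in candidates if c[idx] is not None]
        if not donors:
            continue
        nearest = min(donors, key=lambda c: distance(target, c))
        # S33: the weight shrinks as the neighbour gets farther away.
        weight = gaussian_weight(distance(target, nearest), bandwidth)
        # S34 and S35: weight the neighbour's field and fill the gap with it.
        filled[idx] = weight * nearest[idx]
    return filled

first = [0.5, None, -1.0]                        # normalized vector with a gap
others = [[0.5, 2.0, -1.0], [3.0, -1.0, 2.0]]    # hypothetical complete vectors
print(fill_missing(first, others))               # [0.5, 2.0, -1.0]
```

Because the fields are standardized to mean 0, scaling the neighbour's value by a weight between 0 and 1 pulls the filled value toward the field mean when the nearest neighbour is far away; this is one plausible reading of why the weight decreases with distance.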
Optionally, as shown in fig. 4, the method further includes:
S41: feeding new training samples to the reflow user prediction model for training.
S42: updating the network parameters of the reflow user prediction model.
Through S41-S42, the reflow user prediction model can be continuously optimized through reinforcement learning.
Optionally, as shown in fig. 5, before S11, the method further includes:
S51: for a login account that was present on the target network during a first target time period, was logged out during a second time period after the first target time period, and resumed use during a third time period after the second time period, labelling the corresponding user with the reflow label.
For example, the first target time period may be from May 1 to May 31, the second time period from June 1 to June 10, and the third time period from June 11 to June 30. It is understood that each user's login status on the target network may be collected from the plurality of user terminals 200.
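The labelling rule of S51 can be sketched in Python using the example periods above. The sketch treats an account as present in a period if it has at least one login date in that period and as logged out if it has none, which is an assumed interpretation; the year 2020 and the function names are hypothetical.

```python
from datetime import date

# Example periods from the text; the year is a hypothetical choice.
FIRST = (date(2020, 5, 1), date(2020, 5, 31))
SECOND = (date(2020, 6, 1), date(2020, 6, 10))
THIRD = (date(2020, 6, 11), date(2020, 6, 30))

def active_in(login_dates, period):
    """True if the account logged in to the target network within the period."""
    start, end = period
    return any(start <= d <= end for d in login_dates)

def has_reflow_label(login_dates):
    """Present in the first period, absent in the second, back in the third."""
    return (active_in(login_dates, FIRST)
            and not active_in(login_dates, SECOND)
            and active_in(login_dates, THIRD))

returned = [date(2020, 5, 3), date(2020, 6, 15)]
never_left = [date(2020, 5, 3), date(2020, 6, 5), date(2020, 6, 15)]
lost = [date(2020, 5, 3)]
print([has_reflow_label(d) for d in (returned, never_left, lost)])
# [True, False, False]
```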
Referring to fig. 6, an embodiment of the present application further provides an apparatus 600 for building a reflow user prediction model, which is applied to the electronic device 100; the electronic device 100 may be a server. As shown in fig. 1, the electronic device 100 is communicatively connected with a plurality of user terminals 200 for data interaction. It should be noted that the basic principle and the technical effect of the reflow user prediction model building apparatus 600 provided in this embodiment are the same as those of the above embodiment; for brevity, where this embodiment omits details, reference may be made to the corresponding contents of the above embodiment. The reflow user prediction model building apparatus 600 includes a data acquisition unit 601, a data processing unit 602, and a model establishing unit 603, wherein
the data acquisition unit 601 is configured to acquire a feature field set representing user internet access attributes of a user who is identified with a reflow tag in advance in multiple dimensions.
The plurality of dimensions may include at least one of a user base information dimension, a user consumption information dimension, a user location information dimension, and a user terminal information dimension.
The reflow label may be assigned as follows: for a login account that was present on the target network during a first target time period, was logged out during a second time period after the first target time period, and resumed use during a third time period after the second time period, the corresponding user is labelled with the reflow label.
The data processing unit 602 is configured to perform optimization processing on a feature field set representing a user internet access attribute in multiple dimensions, so as to obtain an effective feature field set.
And a model establishing unit 603 configured to input the valid feature field set corresponding to each user identified with the reflow label as a training sample into the training network model for training so as to establish a reflow user prediction model.
The training network model may be, but is not limited to, a random forest algorithm model, a decision tree algorithm model, or a neural network model.
When executed, the reflow user prediction model building apparatus 600 may implement the following functions: a feature field set that characterizes, in multiple dimensions, the internet access attributes of users labelled in advance with a reflow label is collected; the feature field set is then optimized to obtain an effective feature field set; finally, the effective feature field set of each labelled user is input, as a training sample, into a training network model for training, so as to establish a reflow user prediction model. The reflow user prediction model can therefore be used to locate users with a strong reflow tendency accurately, recovery of the located users succeeds more often, and marketing resources are saved.
Specifically, the data processing unit 602 is configured to screen, according to preset screening rules, the feature fields that characterize the user internet access attributes in multiple dimensions, so as to obtain an effective feature field set, where the screening rules comprise one or more of the following:
deleting, from the feature field set, the feature fields whose proportion of missing values is greater than a preset first threshold;
deleting, from the feature field set, the categorical feature fields (fields whose recorded values are selected from several options) in which the same recorded value accounts for a proportion greater than a preset second threshold;
deleting, from the feature field set, the feature fields whose coefficient of variation is smaller than a preset third threshold;
and deleting, from the feature field set, the feature fields whose correlation coefficient is smaller than a preset fourth threshold.
Specifically, as another embodiment, as shown in fig. 7, the data processing unit 602 includes:
The vector generation module 701 is configured to perform normalization processing on a feature field set representing user internet access attributes of a user, which is identified with a reflow tag in advance, in multiple dimensions to obtain a normalized vector.
A vector lookup module 702 configured to find a second normalized vector closest to the first normalized vector where the missing field exists according to a proximity algorithm.
A weighting value determining module 703 configured to determine a weighting value according to the distance, wherein the weighting value decreases as the distance increases.
And a padding field generating module 704 configured to weight the target field corresponding to the missing field in the second normalized vector according to the weighted value, so as to obtain a padding field.
A data padding module 705 configured to pad the padding field to the missing field.
Optionally, as shown in fig. 8, the apparatus 600 further includes:
a model training unit 801 configured to feed new training samples into the reflow user prediction model for training.
A parameter updating unit 802 configured to update network parameters of the reflow user prediction model.
It should be noted that the steps of the method provided in the above method embodiment may all be executed by the same device, or different devices may execute different steps. For example, steps S11 and S12 may be executed by device 1 and step S13 by device 2; for another example, step S11 may be executed by device 1 and steps S12 and S13 by device 2; and so on.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 9, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a Random-Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry standard architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry standard architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
The memory is used for storing programs. In particular, a program may include program code comprising computer operating instructions. The memory may include both internal memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the reflow user prediction model building device at the logical level. The processor is used for executing the program stored in the memory, and is specifically used for executing the following operations:
collecting a characteristic field set representing user internet access attributes of a user with a reflow label identified in advance under multiple dimensions;
optimizing a feature field set representing user internet access attributes under multiple dimensions to obtain an effective feature field set;
and inputting the effective characteristic field set corresponding to each user marked with the reflow label as a training sample into a training network model for training so as to establish a reflow user prediction model.
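The three operations above can be illustrated with a minimal sketch. It is not the implementation of the present application: the field names (`monthly_fee`, `data_gb`, `region`), the rule used in `optimize` (drop fields that never vary), and the one-level decision tree used as the training model are all assumptions chosen to keep the example self-contained.

```python
from typing import Dict, List, Tuple

Sample = Dict[str, float]


def collect(records: List[dict]) -> Tuple[List[Sample], List[int]]:
    """Split labelled user records into feature field sets and reflow labels."""
    features = [{k: v for k, v in r.items() if k != "reflow"} for r in records]
    labels = [r["reflow"] for r in records]
    return features, labels


def optimize(features: List[Sample]) -> List[str]:
    """Keep only fields whose value differs between users (stand-in rule)."""
    fields = sorted(features[0])
    return [f for f in fields if len({s[f] for s in features}) > 1]


def train_stump(features: List[Sample], labels: List[int],
                fields: List[str]) -> Tuple[str, float]:
    """Fit a one-level decision tree: pick the field and threshold with the
    fewest training errors when predicting 1 for value >= threshold."""
    best = None
    for f in fields:
        for t in sorted({s[f] for s in features}):
            preds = [1 if s[f] >= t else 0 for s in features]
            errors = sum(p != y for p, y in zip(preds, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]


def predict(model: Tuple[str, float], sample: Sample) -> int:
    """Return 1 (predicted reflow user) or 0 for one feature field set."""
    field, threshold = model
    return 1 if sample[field] >= threshold else 0
```

A caller would chain the three steps as `features, labels = collect(records)`, `fields = optimize(features)`, and `model = train_stump(features, labels, fields)`, then pass new feature field sets to `predict`.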
The method executed by the reflow user prediction model establishing device according to the embodiment shown in Fig. 1 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may also execute the method shown in Fig. 1 and implement the functions of the reflow user prediction model establishing device in the embodiment shown in Fig. 1, which are not described again here.
Of course, in addition to a software implementation, the electronic device of the present application does not exclude other implementations, such as a logic device or a combination of software and hardware. That is, the execution subject of the processing flow is not limited to the individual logic units and may also be hardware or a logic device.
Embodiments of the present application also provide a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by a portable electronic device that includes a plurality of application programs, enable the portable electronic device to perform the method of the embodiment shown in Fig. 1, and specifically to perform the following operations:
collecting a characteristic field set representing user internet access attributes of a user with a reflow label identified in advance under multiple dimensions;
optimizing a feature field set representing user internet access attributes under multiple dimensions to obtain an effective feature field set;
and inputting the effective characteristic field set corresponding to each user marked with the reflow label as a training sample into a training network model for training so as to establish a reflow user prediction model.
In short, the above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner. For the parts that are the same or similar among the embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for the relevant points, reference may be made to the corresponding description of the method embodiment.

Claims (10)

1. A method for building a reflow user prediction model is characterized by comprising the following steps:
collecting a characteristic field set representing user internet access attributes of a user with a reflow label identified in advance under multiple dimensions;
optimizing a feature field set representing user internet access attributes under multiple dimensions to obtain an effective feature field set;
and inputting the effective characteristic field set corresponding to each user marked with the reflow label as a training sample into a training network model for training so as to establish a reflow user prediction model.
2. The method according to claim 1, wherein the optimizing the feature field set characterizing the internet access attribute of the user in multiple dimensions to obtain an effective feature field set comprises:
screening the characteristic fields representing the internet surfing attributes of the user under multiple dimensions according to preset screening rules to obtain an effective characteristic field set, wherein the screening rules at least comprise one or more of the following combinations:
deleting the characteristic fields with the missing value ratio larger than a preset first threshold value in the characteristic field set;
deleting the characteristic fields of which the occupation ratios of the same record values are greater than a preset second threshold value from a plurality of selected record values belonging to a certain category in the characteristic field set;
deleting the characteristic fields of which the variation coefficients are smaller than a preset third threshold in the characteristic field set;
and deleting the characteristic fields of which the correlation coefficients are smaller than a preset fourth threshold value in the characteristic field set.
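The four screening rules of claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the threshold defaults are arbitrary, missing values are represented as `None`, the correlation coefficient is taken to be the absolute Pearson correlation between a field and the reflow label, and for brevity every rule is applied to every field (the claim applies the same-value rule to categorical fields and allows any combination of rules).

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Optional

Column = List[Optional[float]]


def missing_ratio(col: Column) -> float:
    """Fraction of records in which the field has no value."""
    return sum(v is None for v in col) / len(col)


def dominant_ratio(col: Column) -> float:
    """Share of the most frequent recorded value among present values."""
    present = [v for v in col if v is not None]
    return Counter(present).most_common(1)[0][1] / len(present)


def variation_coefficient(col: Column) -> float:
    """Standard deviation divided by the absolute mean."""
    present = [v for v in col if v is not None]
    m = mean(present)
    if m == 0:
        return float("inf")  # undefined; treated as "do not drop" here
    return pstdev(present) / abs(m)


def label_correlation(col: Column, labels: List[int]) -> float:
    """Absolute Pearson correlation between a field and the reflow label."""
    pairs = [(v, y) for v, y in zip(col, labels) if v is not None]
    xs, ys = zip(*pairs)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in pairs)
    return abs(cov / (sx * sy))


def screen(table: Dict[str, Column], labels: List[int],
           t_missing: float = 0.5, t_same: float = 0.9,
           t_cv: float = 0.05, t_corr: float = 0.1) -> List[str]:
    """Return the names of the fields that survive all four rules."""
    kept = []
    for name, col in table.items():
        if missing_ratio(col) > t_missing:          # first threshold
            continue
        if dominant_ratio(col) > t_same:            # second threshold
            continue
        if variation_coefficient(col) < t_cv:       # third threshold
            continue
        if label_correlation(col, labels) < t_corr:  # fourth threshold
            continue
        kept.append(name)
    return kept
```

The missing-value rule runs first so that the later statistics are only computed on fields that still have recorded values.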
3. The method according to claim 1, wherein the optimizing the feature fields representing the internet access attributes of the user in multiple dimensions to obtain an effective feature field set comprises:
standardizing a characteristic field set representing user internet access attributes of a user with a reflow label marked in advance under multiple dimensions to obtain a standardized vector;
finding, according to a proximity algorithm, the second standardized vector that is closest in distance to a first standardized vector having a missing field;
determining a weighting value from the distance, wherein the weighting value decreases as the distance increases;
weighting a target field corresponding to the missing field in a second standardized vector according to the weighted value to obtain a filling field;
and filling the filling field into the missing field.
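The filling procedure of claim 3 can be sketched as below. The sketch reads the claim literally (a single nearest neighbour whose target field is multiplied by a weight) and makes several assumptions that the claim does not specify: missing fields are `None`, the distance is Euclidean over the fields both vectors have, and the weight is `1 / (1 + distance)`. Because the vectors are standardized, a value of 0 corresponds to the field mean, so a smaller weight pulls the filled value toward the mean.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional

Row = List[Optional[float]]


def standardize(rows: List[Row]) -> List[Row]:
    """Z-score each field using only the values that are present.
    Assumes every field has at least one recorded value."""
    stats = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        stats.append((mean(present), pstdev(present) or 1.0))
    return [[None if v is None else (v - m) / s
             for v, (m, s) in zip(row, stats)] for row in rows]


def distance(a: Row, b: Row, skip: int) -> float:
    """Euclidean distance over fields, other than `skip`, present in both."""
    return math.sqrt(sum((x - y) ** 2
                         for i, (x, y) in enumerate(zip(a, b))
                         if i != skip and x is not None and y is not None))


def fill_missing(rows: List[Row]) -> List[Row]:
    """Fill each missing field from the nearest vector that has that field."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = [(distance(row, other, j), other)
                          for k, other in enumerate(rows)
                          if k != i and other[j] is not None]
            if not candidates:
                continue  # no vector can supply this field
            d, nearest = min(candidates, key=lambda c: c[0])
            weight = 1.0 / (1.0 + d)  # decreases as the distance grows
            filled[i][j] = weight * nearest[j]
    return filled
```

Distances are always computed on the original vectors, so a value filled for one user does not influence the value filled for another.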
4. The method of claim 1, wherein after the valid feature field sets corresponding to the plurality of users identified with reflow labels are input into a training network model as training samples to be trained to build a reflow user prediction model, the method further comprises:
feeding a new training sample to the reflow user prediction model for training;
and updating the network parameters of the reflow user prediction model.
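The update of claim 4 can be illustrated with a minimal sketch in which a logistic-regression model stands in for the trained prediction model. The class name `ReflowModel`, the learning rate, and the use of a single gradient step per new sample are assumptions for illustration; the claim does not prescribe an update rule.

```python
import math
from typing import List


class ReflowModel:
    """Logistic-regression stand-in for a trained reflow prediction model."""

    def __init__(self, n_features: int, lr: float = 0.1):
        self.w = [0.0] * n_features  # network parameters (weights)
        self.b = 0.0                 # network parameter (bias)
        self.lr = lr

    def predict_proba(self, x: List[float]) -> float:
        """Estimated probability that the user is a reflow user."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x: List[float], y: int) -> None:
        """Feed one new labelled sample and take a gradient step on log-loss."""
        error = self.predict_proba(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error
```

Calling `update` repeatedly with newly labelled users moves the parameters without retraining from scratch, which is one way to read the "feed and update" steps of the claim.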
5. The method of claim 1, wherein the plurality of dimensions comprise at least one of a user basic information dimension, a user consumption information dimension, a user location information dimension, and a user terminal information dimension.
6. The method of claim 1, wherein before the collecting the set of feature fields characterizing user surfing attributes of the user who is pre-identified with the reflow tag in multiple dimensions, the method further comprises:
identifying with a reflow label a user whose login account is on a target network in a first target time period, is logged out in a second time period after the first time period, and is restored to use in a third time period after the second time period.
7. The method of claim 1, wherein the training network model is a random forest algorithm model, a decision tree algorithm model, or a neural network model.
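The labelling condition of claim 6 can be sketched as a simple predicate over three time periods. The function and parameter names, the representation of activity as lists of dates, and the inclusive period boundaries are assumptions for illustration.

```python
from datetime import date
from typing import List, Tuple

Period = Tuple[date, date]  # (start, end), both inclusive


def in_period(day: date, period: Period) -> bool:
    """True if `day` falls inside the period."""
    return period[0] <= day <= period[1]


def is_reflow_user(active_days: List[date], logout_days: List[date],
                   first: Period, second: Period, third: Period) -> bool:
    """On the network in the first period, logged out in the second,
    and back in use in the third."""
    was_active = any(in_period(d, first) for d in active_days)
    logged_out = any(in_period(d, second) for d in logout_days)
    came_back = any(in_period(d, third) for d in active_days)
    return was_active and logged_out and came_back
```

Users for whom the predicate is true would receive the reflow label; those who logged out and did not return in the third period would not.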
8. A reflow user prediction model creation apparatus, comprising:
the data acquisition unit is configured to acquire a characteristic field set representing user internet surfing attributes of a user which is marked with a reflow label in advance under multiple dimensions;
the data processing unit is configured to optimize a characteristic field set representing the internet access attribute of the user under multiple dimensions to obtain an effective characteristic field set;
and the model establishing unit is configured to input the effective characteristic field set corresponding to each user identified with the reflow label into the training network model as a training sample for training so as to establish a reflow user prediction model.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the reflow user prediction model building method of any of claims 1 to 7.
10. A storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the reflow user prediction model building method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010615016.9A CN113869544A (en) 2020-06-30 2020-06-30 Reflow user prediction model establishing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010615016.9A CN113869544A (en) 2020-06-30 2020-06-30 Reflow user prediction model establishing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113869544A true CN113869544A (en) 2021-12-31

Family

ID=78981297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010615016.9A Pending CN113869544A (en) 2020-06-30 2020-06-30 Reflow user prediction model establishing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113869544A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114679390A (en) * 2022-03-30 2022-06-28 中国联合网络通信集团有限公司 Method and device for determining backspacing account and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination