CN111190967A - User multi-dimensional data processing method and device and electronic equipment - Google Patents

User multi-dimensional data processing method and device and electronic equipment

Info

Publication number
CN111190967A
Authority
CN
China
Prior art keywords
user
data
characteristic
machine learning
generate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911290330.8A
Other languages
Chinese (zh)
Other versions
CN111190967B (en
Inventor
董静
常富洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qilu Information Technology Co Ltd
Original Assignee
Beijing Qilu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qilu Information Technology Co Ltd filed Critical Beijing Qilu Information Technology Co Ltd
Priority to CN201911290330.8A priority Critical patent/CN111190967B/en
Priority claimed from CN201911290330.8A external-priority patent/CN111190967B/en
Publication of CN111190967A publication Critical patent/CN111190967A/en
Application granted granted Critical
Publication of CN111190967B publication Critical patent/CN111190967B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The disclosure relates to a user multi-dimensional data processing method and device, electronic equipment, and a computer readable medium. The method comprises: splitting multi-dimensional data of a user according to dimensions to generate a plurality of data units; determining a plurality of user characteristic sub-models based on the plurality of data units; inputting each of the plurality of data units into its corresponding user characteristic sub-model to generate a plurality of characteristic values; and determining a user characteristic based on the plurality of characteristic values. By determining the machine learning model that corresponds to the user data according to the characteristics of that data, the method, device, electronic equipment, and computer readable medium can obtain accurate user characteristic analysis results.

Description

User multi-dimensional data processing method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer information processing, and in particular, to a method and an apparatus for processing user multidimensional data, an electronic device, and a computer readable medium.
Background
Machine learning has advanced greatly across the fields of artificial intelligence research, and in most current application scenarios a machine learning model suited to a given problem is easy to find. In the typical workflow, a user first chooses a machine learning model of a certain category or algorithm and then supplies data specific to the problem to be solved; a specific task is established, the model is trained on that data, and the result is a machine learning model suited to that specific task. In general, even when the same algorithm is used, machine learning models trained on different data are completely different.
Feature engineering is the process of using knowledge of the data domain to create features that allow machine learning algorithms to achieve their best performance. For user feature extraction, a common feature engineering approach is to acquire raw user data and then use data processing techniques to obtain, process, and extract meaningful features and attributes from it, generating multi-dimensional user data. Once the feature engineering variables have been extracted, the multi-dimensional user data are input into a machine learning model, which operates on them. The processing time and accuracy of the feature engineering variables greatly affect the computation time and accuracy of the whole machine learning model. Moreover, for the same batch of user data, different feature engineering processes generate different multi-dimensional user data, so the resulting machine learning models differ, and so do the final user feature results. How to minimize the differences in results caused by different feature engineering is one of the problems currently to be solved.
The above information disclosed in this background section is only for enhancement of understanding of the background of the disclosure and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the present disclosure provides a method and an apparatus for processing user multidimensional data, an electronic device, and a computer readable medium, which determine a machine learning model corresponding to user data according to characteristics of the user data, and can obtain an accurate user characteristic analysis result.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, a method for processing multidimensional data of a user is provided, the method including: splitting multi-dimensional data of a user according to dimensions to generate a plurality of data units; determining a plurality of user characteristic sub-models based on the plurality of data units; inputting each of the plurality of data units into its corresponding user characteristic sub-model to generate a plurality of characteristic values; and determining a user characteristic based on the plurality of characteristic values.
Optionally, the method further comprises: generating the plurality of user feature sub-models based on multi-dimensional data of historical users and a plurality of machine learning models.
Optionally, generating the plurality of user feature sub-models based on the multi-dimensional data of the historical user and a plurality of machine learning models comprises: splitting multi-dimensional data of a historical user according to dimensions to generate a plurality of historical data units; determining a plurality of machine learning models based on features of the plurality of historical data units; and training the plurality of machine learning models through the plurality of historical data units respectively to generate the plurality of user characteristic sub-models.
Optionally, determining a plurality of machine learning models based on the features of the plurality of historical data units comprises: analyzing the data characteristics of the historical data unit to determine a corresponding machine learning model; and/or analyzing the data distribution of the historical data unit to determine a corresponding machine learning model; and/or analyzing the data state of the historical data unit to determine its corresponding machine learning model.
Optionally, splitting the multidimensional data of the user according to the dimensions to generate a plurality of data units, including: splitting multi-dimensional data of a user according to dimensions to generate a plurality of user data; and performing data processing on the plurality of user data to generate the plurality of data units.
Optionally, determining a plurality of user characteristic sub-models based on the plurality of data units comprises: determining a plurality of attribute characteristics of the plurality of data units, and screening the plurality of user characteristic submodels from a characteristic model library based on the attribute characteristics.
Optionally, inputting each of the plurality of data units into its corresponding user feature sub-model, and generating a plurality of characteristic values, includes: inputting the data unit into the corresponding user characteristic sub-model; and calculating the user characteristic submodel based on the data in the data unit to generate a characteristic value corresponding to the data unit.
Optionally, determining the user characteristic based on the plurality of characteristic values comprises: inputting the plurality of feature values into a user feature model to determine a user feature.
Optionally, the method further comprises: the user feature model is generated through a plurality of feature values of historical users and a machine learning model.
Optionally, generating the user feature model by a plurality of feature values of a historical user and a machine learning model comprises: taking a plurality of characteristic values corresponding to users who have historically undergone landing conversion as positive label data; taking a plurality of characteristic values corresponding to users who have not historically undergone landing conversion as negative label data; and training a machine learning model through the positive label data and the negative label data to generate the user feature model.
According to an aspect of the present disclosure, a user multidimensional data processing apparatus is provided, the apparatus comprising: a splitting module for splitting the multi-dimensional data of the user according to dimensions to generate a plurality of data units; a unit module for determining a plurality of user characteristic sub-models based on the plurality of data units; an input module for inputting each of the plurality of data units into its corresponding user characteristic sub-model to generate a plurality of characteristic values; and a feature module for determining a user feature based on the plurality of characteristic values.
Optionally, the apparatus further comprises: a sub-model training module for generating the plurality of user characteristic sub-models based on the multi-dimensional data of historical users and a plurality of machine learning models.
Optionally, the submodel training module is further configured to split the multidimensional data of the historical user according to the dimensionality to generate a plurality of historical data units; determining a plurality of machine learning models based on features of the plurality of historical data units; and training the plurality of machine learning models through the plurality of historical data units respectively to generate the plurality of user characteristic sub-models.
Optionally, the sub-model training module includes: a model determining unit for analyzing the data characteristics of the historical data unit to determine a corresponding machine learning model; and/or analyzing the data distribution of the historical data unit to determine a corresponding machine learning model; and/or analyzing the data state of the historical data unit to determine its corresponding machine learning model.
Optionally, the splitting module includes: a splitting unit for splitting the multi-dimensional data of the user according to dimensions to generate a plurality of user data; and a processing unit for performing data processing on the plurality of user data to generate the plurality of data units.
Optionally, the unit module includes: a screening unit for determining a plurality of attribute characteristics of the plurality of data units and screening the plurality of user characteristic sub-models from a characteristic model library based on the attribute characteristics.
Optionally, the input module includes: an input unit for inputting the data unit into the corresponding user characteristic sub-model; and a calculating unit for performing computation with the user characteristic sub-model on the data in the data unit to generate a characteristic value corresponding to the data unit.
Optionally, the feature module is further configured to input the plurality of feature values into a user feature model to determine a user feature.
Optionally, the apparatus further comprises: a model training unit for generating the user feature model through a plurality of feature values of historical users and a machine learning model.
Optionally, the model training unit is further configured to take a plurality of feature values corresponding to users who have historically undergone landing conversion as positive label data; take a plurality of feature values corresponding to users who have not historically undergone landing conversion as negative label data; and train a machine learning model through the positive label data and the negative label data to generate the user feature model.
According to an aspect of the present disclosure, an electronic device is provided, the electronic device including: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as above.
According to an aspect of the disclosure, a computer-readable medium is proposed, on which a computer program is stored, which program, when being executed by a processor, carries out the method as above.
According to the user multi-dimensional data processing method, device, electronic equipment, and computer readable medium of the present disclosure, the multi-dimensional data of the user are split according to dimensions to generate a plurality of data units; a plurality of user characteristic sub-models are determined based on the plurality of data units; each of the plurality of data units is input into its corresponding user characteristic sub-model to generate a plurality of characteristic values; and a user characteristic is determined based on the plurality of characteristic values. In this way, the machine learning model corresponding to the user data is determined according to the characteristics of that data, and an accurate user characteristic analysis result can be obtained.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are merely some embodiments of the present disclosure, and other drawings may be derived from those drawings by those of ordinary skill in the art without inventive effort.
Fig. 1 is a system block diagram illustrating a method and apparatus for processing multidimensional data of a user according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method for user multidimensional data processing in accordance with an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a method for user multidimensional data processing in accordance with another exemplary embodiment.
FIG. 4 is a schematic diagram illustrating a method of user multidimensional data processing in accordance with another exemplary embodiment.
FIG. 5 is a block diagram illustrating a user multidimensional data processing apparatus in accordance with an exemplary embodiment.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
FIG. 7 is a block diagram illustrating a computer-readable medium in accordance with an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another. Thus, a first component discussed below may be termed a second component without departing from the teachings of the disclosed concept. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It is to be understood by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or processes shown in the drawings are not necessarily required to practice the present disclosure and are, therefore, not intended to limit the scope of the present disclosure.
Fig. 1 is a system block diagram illustrating a method and apparatus for processing multidimensional data of a user according to an exemplary embodiment.
As shown in fig. 1, the system architecture 10 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a financial services application, a shopping application, a web browser application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server analyzing user multidimensional data stored on the terminal devices 101, 102, 103. The background management server can analyze and process the received multidimensional data of the user, and feed back a processing result (user characteristics) to an administrator.
The server 105 may, for example, split the multidimensional data of the user by dimensions, generating a plurality of data units; the server 105 may, for example, determine a plurality of user characteristic sub-models based on the plurality of data units; the server 105 may, for example, input each of the plurality of data units into its corresponding user characteristic sub-model, generating a plurality of characteristic values; and the server 105 may, for example, determine a user characteristic based on the plurality of characteristic values.
Server 105 may also generate the plurality of user feature sub-models, for example, based on the multi-dimensional data of the historical users and the plurality of machine learning models.
The server 105 may also generate the user feature model, for example, from a plurality of feature values and a machine learning model of the historical user.
The server 105 may be a single physical server or may be composed of a plurality of servers. It should be noted that the user multidimensional data processing method provided by the embodiments of the present disclosure may be executed by the server 105, and accordingly the user multidimensional data processing apparatus may be disposed in the server 105. The web page through which a user browses the financial service platform is generally located on the terminal devices 101, 102, 103.
FIG. 2 is a flow diagram illustrating a method for user multidimensional data processing in accordance with an exemplary embodiment. The user multidimensional data processing method 20 includes at least steps S202 to S208.
As shown in fig. 2, in S202, the multidimensional data of the user is divided according to the dimensions, and a plurality of data units are generated. The multi-dimensional data of the user can comprise basic information of the user, behavior data of the user, state data of the user and the like, wherein in the basic information of the user, multiple dimensions of the user such as age, occupation, income, address and the like can be included; furthermore, the behavior data of the user can include the operation behavior data of the user on different platforms; the status data of the user may include a registration status of the user, a borrowing status of the user, a repayment status of the user, a debt status of the user, and the like.
In one embodiment, multi-dimensional data of a user can be split according to dimensions to generate a plurality of user data; and performing data processing on the plurality of user data to generate the plurality of data units. Each user data can be used as a dimension of the user data, and the multi-dimensional data of the user is divided into a plurality of data units.
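The splitting step described above can be illustrated with a minimal Python sketch. It is only one possible reading of S202: the dimension names, the field names, and the choice of "data processing" (dropping missing values and trimming strings) are assumptions made for illustration and are not specified by the disclosure.

```python
from typing import Any, Dict, Tuple

# Hypothetical grouping of raw fields into dimensions; these names are
# illustrative assumptions and do not come from the disclosure.
DIMENSION_FIELDS: Dict[str, Tuple[str, ...]] = {
    "basic_info": ("age", "occupation", "income", "address"),
    "behavior": ("logins_last_30d", "pages_viewed"),
    "status": ("registered", "borrowing", "repayment", "debt"),
}


def split_by_dimension(user_record: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split one user's flat record into per-dimension user data."""
    return {
        dim: {f: user_record[f] for f in fields if f in user_record}
        for dim, fields in DIMENSION_FIELDS.items()
    }


def to_data_unit(user_data: Dict[str, Any]) -> Dict[str, Any]:
    """Assumed 'data processing' step: drop missing values, trim strings."""
    unit = {}
    for key, value in user_data.items():
        if value is None:
            continue
        unit[key] = value.strip() if isinstance(value, str) else value
    return unit


record = {"age": 31, "occupation": " engineer ", "income": None,
          "logins_last_30d": 12, "registered": True, "debt": 0}
data_units = {dim: to_data_unit(d)
              for dim, d in split_by_dimension(record).items()}
```

Each entry of `data_units` then plays the role of one data unit that is routed to its own sub-model.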
The multidimensional data of the user can be acquired from a third-party data source, or can be accumulated from user information on a booking platform; the disclosure is not limited in this respect.
In S204, a plurality of user characteristic submodels are determined based on the plurality of data units. The method comprises the following steps: determining a plurality of attribute characteristics of the plurality of data units, and screening the plurality of user characteristic submodels from a characteristic model library based on the attribute characteristics.
The plurality of user characteristic sub-models may be generated based on the multi-dimensional data of historical users and a plurality of machine learning models.
Each user characteristic submodel in the plurality of user characteristic submodels has a characteristic identification thereof, each data unit also has a characteristic identification thereof, and the corresponding user characteristic submodel is determined for each data unit based on the characteristic identification.
In one embodiment, the feature identification may include an "interest feature," an "age feature," an "occupation feature," a "credit feature," a "liveness feature," and so forth.
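As a rough sketch of how sub-models might be screened from a feature model library by feature identification, the following Python example keeps a dictionary keyed by identification. The library contents, the scoring lambdas standing in for trained sub-models, and the field names are all hypothetical.

```python
from typing import Callable, Dict, Iterable

SubModel = Callable[[Dict[str, float]], float]

# Hypothetical feature model library keyed by feature identification.
# The lambdas are stand-ins for trained user characteristic sub-models.
FEATURE_MODEL_LIBRARY: Dict[str, SubModel] = {
    "age feature": lambda unit: min(unit.get("age", 0) / 100.0, 1.0),
    "credit feature": lambda unit: 1.0 if unit.get("overdue_count", 0) == 0 else 0.0,
    "liveness feature": lambda unit: min(unit.get("logins_last_30d", 0) / 30.0, 1.0),
}


def screen_sub_models(unit_ids: Iterable[str]) -> Dict[str, SubModel]:
    """Return the sub-model whose identification matches each data unit."""
    unit_ids = list(unit_ids)
    missing = [fid for fid in unit_ids if fid not in FEATURE_MODEL_LIBRARY]
    if missing:
        raise KeyError(f"no sub-model registered for: {missing}")
    return {fid: FEATURE_MODEL_LIBRARY[fid] for fid in unit_ids}


selected = screen_sub_models(["age feature", "liveness feature"])
```

Matching on a shared identification keeps the routing explicit: a data unit without a registered sub-model fails loudly instead of being scored by the wrong model.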
In S206, each of the plurality of data units is input into its corresponding user feature sub-model to generate a plurality of characteristic values. This step includes: inputting the data unit into the corresponding user characteristic sub-model; and performing, by the user characteristic sub-model, computation on the data in the data unit to generate the characteristic value corresponding to the data unit.
In S208, a user characteristic is determined based on the plurality of characteristic values. The method comprises the following steps: inputting the plurality of feature values into a user feature model to determine a user feature. Wherein the user feature model may be generated by a machine learning model and a plurality of feature values of the historical user.
In one embodiment, generating the user feature model from a plurality of feature values of a historical user and a machine learning model comprises: taking a plurality of characteristic values corresponding to users who have historically undergone landing conversion as positive label data; taking a plurality of characteristic values corresponding to users who have not historically undergone landing conversion as negative label data; and training a machine learning model through the positive label data and the negative label data to generate the user feature model.
It is worth mentioning that the user characteristic model can also be used for auxiliary training through other characteristics of the user, and when the landing conversion of the user is concerned, the landing conversion data of the user can be introduced for collaborative training; user credit data may be introduced into model training when user credit is of interest, and the disclosure is not limited thereto.
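The positive and negative labelling described above can be sketched as a small binary classifier. The disclosure does not name the algorithm used for the user feature model, so logistic regression trained by batch gradient descent is an assumption here, as are the toy feature values, the learning rate, and the epoch count.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs: Sequence[Sequence[float]], ys: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy sub-model feature values (assumed): converted users are positive
# label data, non-converted users are negative label data.
positive = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95]]
negative = [[0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
xs = positive + negative
ys = [1] * len(positive) + [0] * len(negative)
w, b = train_logistic(xs, ys)
```

After training, `predict_proba(w, b, feature_values)` gives an estimated probability that a new user with those sub-model feature values belongs to the positive class.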
According to the user multi-dimensional data processing method of the present disclosure, multi-dimensional data of a user are split according to dimensions to generate a plurality of data units; a plurality of user characteristic sub-models are determined based on the plurality of data units; each of the plurality of data units is input into its corresponding user characteristic sub-model to generate a plurality of characteristic values; and a user characteristic is determined based on the plurality of characteristic values. In this way, the machine learning model corresponding to the user data is determined according to the characteristics of that data, and an accurate user characteristic analysis result can be obtained.
It should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
FIG. 3 is a flow diagram illustrating a method for user multidimensional data processing in accordance with another exemplary embodiment. The flow shown in fig. 3 is a detailed description of "generating the plurality of user feature sub-models based on multi-dimensional data of historical users and a plurality of machine learning models".
As shown in fig. 3, in S302, the multi-dimensional data of the historical user is divided according to the dimensions, and a plurality of historical data units are generated.
In S304, a plurality of machine learning models are determined based on the features of the plurality of historical data units.
In one embodiment, this step may include: analyzing the data characteristics of the historical data unit to determine a corresponding machine learning model; and/or analyzing the data distribution of the historical data unit to determine a corresponding machine learning model; and/or analyzing the data state of the historical data unit to determine its corresponding machine learning model.
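One way to picture this model determination is a small rule-based selector. The disclosure does not give concrete selection rules, so the profile fields, the threshold of 50 dimensions, and the returned model names below are purely illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class UnitProfile:
    """Hypothetical summary of a historical data unit."""
    n_dimensions: int      # data characteristic: number of columns
    is_sequential: bool    # data state: ordered states over time
    has_labels: bool       # whether supervised training is possible
    n_classes: int = 2     # number of label classes, if labelled


def choose_model(profile: UnitProfile) -> str:
    """Map a data unit profile to a model family using assumed rules."""
    if profile.is_sequential:
        return "markov"            # predict later states from earlier ones
    if not profile.has_labels:
        return "kmeans"            # unsupervised clustering
    if profile.n_dimensions <= 50:
        return "random_forest"     # modest dimensionality, little tuning
    if profile.n_classes == 2:
        return "svm"               # binary classification
    return "gbdt"


choice = choose_model(UnitProfile(n_dimensions=20, is_sequential=False, has_labels=True))
```

In practice the rules would be derived from the analysis of data characteristics, distribution, and state; the fixed thresholds here only show the shape of such a mapping.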
Machine learning models are classified into supervised learning and unsupervised learning according to the types of data that can be used.
1. Supervised learning mainly includes models for classification and for regression:
1) Classification: linear classifiers (e.g., LR), support vector machines (SVM), naive Bayes (NB), K-nearest neighbors (KNN), decision trees (DT), ensemble models (RF/GBDT, etc.)
2) Regression: linear regression, support vector machines (SVM), K-nearest neighbors (KNN), regression trees (DT), ensemble models (ExtraTrees/RF/GBDT)
2. Unsupervised learning mainly includes: data clustering (K-means), data dimensionality reduction (PCA), and the like.
In S306, the plurality of machine learning models are trained by the plurality of historical data units, respectively, to generate the plurality of user feature sub-models.
The corresponding machine learning model may be determined for different data units based on their characteristics.
For example, a random forest model can be trained on user data units whose attribute is "user state feature". A random forest is a random ensemble of decision trees, which to some extent mitigates the decision trees' vulnerability to attack. It is suitable when the data dimensionality is not too high (dozens of dimensions) and relatively high accuracy is required. It does not need many parameters to be tuned and is a suitable first choice when it is unclear which method to use.
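To make the idea of a random ensemble of decision trees concrete, the following sketch bags depth-one trees (decision stumps), each fitted on a bootstrap sample and a randomly chosen feature, and combines them by majority vote. This is a deliberately simplified stand-in for a full random forest, and the toy "user state" data are assumptions; a production system would more likely use an established library implementation.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# (feature index, threshold, label if <= threshold, label if > threshold)
Stump = Tuple[int, float, int, int]


def majority(labels: Sequence[int], default: int = 0) -> int:
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(xs: Sequence[Sequence[float]], ys: Sequence[int], feature: int) -> Stump:
    """Pick the threshold on one feature that misclassifies the fewest samples."""
    best = None
    for threshold in sorted({x[feature] for x in xs}):
        left = [y for x, y in zip(xs, ys) if x[feature] <= threshold]
        right = [y for x, y in zip(xs, ys) if x[feature] > threshold]
        l_label = majority(left)
        r_label = majority(right, default=l_label)
        errors = sum(y != l_label for y in left) + sum(y != r_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, (feature, threshold, l_label, r_label))
    return best[1]


def fit_forest(xs: Sequence[Sequence[float]], ys: Sequence[int],
               n_trees: int = 25, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]   # bootstrap sample
        feature = rng.randrange(len(xs[0]))          # random feature choice
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx], feature))
    return forest


def predict(forest: Sequence[Stump], x: Sequence[float]) -> int:
    votes = [l if x[f] <= t else r for f, t, l, r in forest]
    return majority(votes)


# Toy "user state" units (assumed): [overdue_count, active_loans], label 1 = risky.
xs = [[0, 1], [0, 0], [1, 1], [4, 5], [5, 4], [6, 6]]
ys = [0, 0, 0, 1, 1, 1]
forest = fit_forest(xs, ys)
```

Because each stump sees a different resample and feature, a single poorly fitted tree is outvoted by the rest, which is the property the ensemble relies on.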
For example, a support vector machine model can be trained on user data units whose attribute is "user credit characteristics". A support vector machine is a generalized linear classifier that performs binary classification of data by supervised learning; its decision boundary is the maximum-margin hyperplane solved for the learning samples, which keeps the samples as far from the boundary as possible, and it is strongly resistant to attack.
The Markov model may be trained, for example, by a user data element having an attribute of "user operating characteristics". The markov model is primarily used to predict the later states from the previous states.
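A first-order Markov chain is one simple way to predict later states from earlier ones. The sketch below estimates transition probabilities from sequences of user operations and returns the most likely next state; the operation names and sessions are made-up examples, and the disclosure does not specify the order or form of the Markov model.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence


def fit_markov(sequences: Sequence[Sequence[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for prev, nexts in counts.items()
    }


def predict_next(transitions: Dict[str, Dict[str, float]], state: str) -> Optional[str]:
    """Return the most probable next state, or None for an unseen state."""
    nexts = transitions.get(state)
    if not nexts:
        return None
    return max(nexts, key=nexts.get)


# Hypothetical operation logs, one list per user session.
sessions = [
    ["login", "browse", "apply", "logout"],
    ["login", "browse", "browse", "logout"],
    ["login", "apply", "logout"],
]
transitions = fit_markov(sessions)
```

The transition probability of the predicted next state could then serve as the characteristic value produced by this sub-model.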
The training may also be performed by other machine learning models to generate a plurality of user feature sub-models, which is not limited by the present disclosure.
FIG. 4 is a schematic diagram illustrating a method of user multidimensional data processing in accordance with another exemplary embodiment.
As shown in the framework of fig. 4, the multi-dimensional data of the user is first split by dimension, and each resulting data unit is input into its corresponding user feature submodel. Each user feature submodel computes the sub feature value corresponding to the user data of that dimension, and the sub feature values are then input into the user feature model to determine the final user feature value.
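The two-stage flow of fig. 4 can be sketched as follows. This is an illustrative outline under simplifying assumptions: submodels and the final user feature model are represented as plain callables, and the dimension names, field names, and the averaging rule are hypothetical placeholders rather than the models of the disclosure.

```python
from typing import Callable, Dict, List

DataUnit = Dict[str, float]
SubModel = Callable[[DataUnit], float]
FeatureModel = Callable[[List[float]], float]


def user_feature_value(user_data: Dict[str, DataUnit],
                       sub_models: Dict[str, SubModel],
                       feature_model: FeatureModel) -> float:
    """Score each dimension with its own submodel, then combine the results."""
    # Iterate in sorted order so the final model always sees a fixed layout.
    sub_values = [sub_models[dim](user_data[dim]) for dim in sorted(user_data)]
    return feature_model(sub_values)


# Hypothetical submodels, one per dimension.
sub_models: Dict[str, SubModel] = {
    "state": lambda unit: unit["active_days"] / 30,
    "credit": lambda unit: unit["score"] / 1000,
}


def mean(values: List[float]) -> float:
    """Placeholder for the trained user feature model."""
    return sum(values) / len(values)


user_data = {"state": {"active_days": 15.0}, "credit": {"score": 500.0}}
value = user_feature_value(user_data, sub_models, mean)
```

Keeping the submodels in a dictionary keyed by dimension means a new dimension can be supported by registering one more entry, without touching the combining step.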
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments may be implemented as a computer program executed by a CPU. When executed by the CPU, the program performs the functions defined by the above-described methods provided by the present disclosure. The program may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
FIG. 5 is a block diagram illustrating a user multidimensional data processing apparatus in accordance with an exemplary embodiment. As shown in fig. 5, the user multidimensional data processing apparatus 50 includes: a splitting module 502, a unit module 504, an input module 506, a feature module 508, a sub-model training module 510, and a model training module 512.
The splitting module 502 is configured to split multidimensional data of a user according to dimensions to generate a plurality of data units. The splitting module 502 includes: a dimension unit configured to split the multi-dimensional data of the user according to the dimensions to generate a plurality of user data; and a processing unit configured to perform data processing on the plurality of user data to generate the plurality of data units.
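A possible shape for the dimension unit and the processing unit is sketched below. It assumes a flat user record and a hypothetical `FIELD_DIMENSION` table that assigns each raw field to a dimension. The "data processing" shown (filling missing values and casting to float) is just one plausible choice, since the disclosure does not fix the processing steps.

```python
from typing import Dict, Optional

RawRecord = Dict[str, Optional[float]]

# Hypothetical mapping from raw field name to dimension.
FIELD_DIMENSION = {
    "age": "state",
    "city_tier": "state",
    "credit_score": "credit",
    "overdue_count": "credit",
    "clicks_7d": "operation",
}


def split_by_dimension(record: RawRecord) -> Dict[str, RawRecord]:
    """Dimension unit: group the fields of one user record by dimension."""
    groups: Dict[str, RawRecord] = {}
    for field, value in record.items():
        dim = FIELD_DIMENSION.get(field)
        if dim is not None:  # fields with no known dimension are skipped
            groups.setdefault(dim, {})[field] = value
    return groups


def process(groups: Dict[str, RawRecord],
            default: float = 0.0) -> Dict[str, Dict[str, float]]:
    """Processing unit: fill missing values and cast everything to float."""
    return {
        dim: {f: float(default if v is None else v) for f, v in fields.items()}
        for dim, fields in groups.items()
    }


record = {"age": 31, "city_tier": 2, "credit_score": 640,
          "overdue_count": None, "clicks_7d": 12, "nickname_len": 5}
units = process(split_by_dimension(record))
```

The result contains one data unit per dimension (`state`, `credit`, `operation`), with the missing `overdue_count` filled by the default and the unmapped `nickname_len` dropped.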
The unit module 504 is configured to determine a plurality of user characteristic submodels based on the plurality of data units. The unit module 504 includes: a screening unit configured to determine a plurality of attribute characteristics of the plurality of data units and to screen the plurality of user characteristic submodels from the characteristic model library based on the attribute characteristics.
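One way to picture the screening unit is a lookup in a model library keyed by attribute. The class name `FeatureModelLibrary`, its methods, and the placeholder scoring functions below are hypothetical; the disclosure does not describe how the characteristic model library is stored.

```python
from typing import Callable, Dict, Iterable

DataUnit = Dict[str, float]
SubModel = Callable[[DataUnit], float]


class FeatureModelLibrary:
    """Holds trained submodels, keyed by the attribute they were trained for."""

    def __init__(self) -> None:
        self._models: Dict[str, SubModel] = {}

    def register(self, attribute: str, model: SubModel) -> None:
        self._models[attribute] = model

    def screen(self, attributes: Iterable[str]) -> Dict[str, SubModel]:
        """Return the submodels matching the given attributes."""
        wanted = list(attributes)
        missing = [a for a in wanted if a not in self._models]
        if missing:
            raise KeyError(f"no submodel registered for: {missing}")
        return {a: self._models[a] for a in wanted}


library = FeatureModelLibrary()
library.register("user state feature", lambda unit: unit["age"] / 100)
library.register("user credit characteristics", lambda unit: unit["score"] / 1000)
library.register("user operating characteristics", lambda unit: unit["clicks"] / 50)

selected = library.screen(["user state feature", "user credit characteristics"])
```

Raising an error for an attribute with no registered submodel makes a missing model visible at selection time rather than later, when a data unit is scored.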
The input module 506 is configured to input each of the plurality of data units into its corresponding user feature submodel, so as to generate a plurality of feature values. The input module 506 includes: an input unit configured to input the data unit into the corresponding user feature submodel; and a calculating unit configured to evaluate the user feature submodel on the data in the data unit and generate the feature value corresponding to the data unit.
The feature module 508 is configured to determine a user feature based on the plurality of feature values. The feature module 508 is also configured to input the plurality of feature values into a user feature model to determine the user feature.
The submodel training module 510 is configured to generate the plurality of user feature submodels based on the multi-dimensional data of the historical users and the plurality of machine learning models. The submodel training module 510 is further configured to: split the multi-dimensional data of the historical user according to dimensions to generate a plurality of historical data units; determine a plurality of machine learning models based on features of the plurality of historical data units; and train the plurality of machine learning models with the plurality of historical data units respectively to generate the plurality of user feature submodels.
The submodel training module 510 comprises a model determining unit configured to: analyze the data characteristics of the historical data unit to determine its corresponding machine learning model; and/or analyze the data distribution of the historical data unit to determine its corresponding machine learning model; and/or analyze the data state of the historical data unit to determine its corresponding machine learning model.
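The model determining unit could, for instance, apply simple rules to the shape of a historical data unit. The rules below are purely illustrative assumptions loosely based on the remarks in the method description (random forests for data with at most a few dozen dimensions, Markov models for state sequences). The `max_rf_dims` threshold and the fallback to a support vector machine are hypothetical.

```python
from typing import List, Sequence, Union

Row = Sequence[Union[float, str]]


def choose_model(rows: List[Row], is_sequence: bool,
                 max_rf_dims: int = 50) -> str:
    """Pick a model family name from coarse properties of a historical data unit."""
    if is_sequence:
        # Ordered state sequences: predict later states from earlier ones.
        return "markov"
    n_dims = len(rows[0]) if rows else 0
    # Hypothetical rule: moderate dimensionality -> random forest, else SVM.
    return "random_forest" if n_dims <= max_rf_dims else "svm"


state_unit = [[0.1] * 10, [0.3] * 10]
wide_unit = [[0.0] * 200]
operation_unit = [["login", "browse", "apply"]]

choices = (
    choose_model(state_unit, is_sequence=False),
    choose_model(wide_unit, is_sequence=False),
    choose_model(operation_unit, is_sequence=True),
)
```

In practice such rules would more likely be informed by validation results on historical data. The sketch only shows where the decision sits in the flow.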
The model training module 512 is configured to generate the user feature model by using a plurality of feature values of the historical user and a machine learning model. The model training module 512 is further configured to: take a plurality of feature values corresponding to users who have historically undergone landing conversion as positive label data; take a plurality of feature values corresponding to users who have not historically undergone landing conversion as negative label data; and train a machine learning model with the positive label data and the negative label data to generate the user feature model.
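The labelling and training step can be sketched with a small logistic regression written in plain Python. Logistic regression is chosen here only as a convenient example, since the disclosure does not name the machine learning model used for the user feature model. The feature values, learning rate, and epoch count are made up.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]


def build_training_set(converted: Sequence[Sequence[float]],
                       not_converted: Sequence[Sequence[float]]) -> List[Example]:
    """Label feature values of converted users 1 and of the others 0."""
    return ([(list(v), 1) for v in converted]
            + [(list(v), 0) for v in not_converted])


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that the user converts."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(data: List[Example], lr: float = 0.5,
                   epochs: int = 200) -> Tuple[List[float], float]:
    """Fit weights by per-sample gradient descent on the log loss."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


# Hypothetical sub feature values (e.g. state score, credit score) per user.
converted = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.9]]
not_converted = [[0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]

data = build_training_set(converted, not_converted)
w, b = train_logistic(data)
```

After training, `predict(w, b, [0.9, 0.9])` is above 0.5 and `predict(w, b, [0.1, 0.1])` is below 0.5, which is the behavior expected of a model that separates the two label groups.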
According to the user multi-dimensional data processing device of the present disclosure, multi-dimensional data of a user is split according to dimensions to generate a plurality of data units; a plurality of user characteristic sub-models are determined based on the plurality of data units; each data unit of the plurality of data units is input into its corresponding user characteristic sub-model to generate a plurality of characteristic values; and a user characteristic is determined based on the plurality of characteristic values. In this way, a corresponding machine learning model can be determined according to the characteristics of the user data, and an accurate user characteristic analysis result can be obtained.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
An electronic device 600 according to this embodiment of the disclosure is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
The storage unit 620 stores program code executable by the processing unit 610 to cause the processing unit 610 to perform the steps according to various exemplary embodiments of the present disclosure described in the user multi-dimensional data processing method section of this specification. For example, the processing unit 610 may perform the steps shown in fig. 2 and fig. 3.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 600' (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, as shown in fig. 7, the technical solution according to the embodiment of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above method according to the embodiment of the present disclosure.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the functions of: splitting multi-dimensional data of a user according to dimensions to generate a plurality of data units; determining a plurality of user characteristic sub-models based on the plurality of data units; inputting each data unit of the plurality of data units into its corresponding user characteristic sub-model respectively to generate a plurality of characteristic values; and determining a user characteristic based on the plurality of characteristic values.
Those skilled in the art will appreciate that the modules described above may be distributed in the apparatus as described in the embodiments, or may be located, with corresponding changes, in one or more apparatuses different from those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that the present disclosure is not limited to the precise arrangements or instrumentalities described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A user multi-dimensional data processing method is characterized by comprising the following steps:
splitting multi-dimensional data of a user according to dimensions to generate a plurality of data units;
determining a plurality of user characteristic sub-models based on the plurality of data units;
inputting each data unit of the plurality of data units into a corresponding user characteristic model respectively to generate a plurality of characteristic values; and
determining a user characteristic based on the plurality of characteristic values.
2. The method of claim 1, further comprising:
generating the plurality of user feature sub-models based on multi-dimensional data of historical users and a plurality of machine learning models.
3. The method of claims 1-2, wherein generating the plurality of user feature sub-models based on multi-dimensional data of historical users and a plurality of machine learning models comprises:
splitting multi-dimensional data of a historical user according to dimensions to generate a plurality of historical data units;
determining a plurality of machine learning models based on features of the plurality of historical data units;
and training the plurality of machine learning models through the plurality of historical data units respectively to generate the plurality of user characteristic sub-models.
4. The method of claims 1-3, wherein determining a plurality of machine learning models based on features of the plurality of units of historical data comprises:
analyzing the data characteristics of the historical data unit to determine a corresponding machine learning model; and/or
analyzing the data distribution of the historical data unit to determine a corresponding machine learning model; and/or
analyzing the data state of the historical data unit to determine a corresponding machine learning model.
5. The method of claims 1-4, wherein splitting multidimensional data of a user by dimensions to generate a plurality of data units comprises:
splitting multi-dimensional data of a user according to dimensions to generate a plurality of user data; and
performing data processing on the plurality of user data to generate the plurality of data units.
6. The method of claims 1-5, wherein determining a plurality of user characteristic submodels based on the plurality of data units comprises:
determining a plurality of attribute characteristics of the plurality of data units; and
screening the plurality of user characteristic submodels from the characteristic model library based on the attribute characteristics.
7. The method of claims 1-6, wherein inputting each of the plurality of data units into its corresponding user characteristic submodel, generating a plurality of characteristic values comprises:
inputting the data unit into the corresponding user characteristic sub-model; and
evaluating the user characteristic submodel on the data in the data unit to generate a characteristic value corresponding to the data unit.
8. A user multidimensional data processing apparatus, comprising:
a splitting module for splitting multi-dimensional data of a user according to dimensions to generate a plurality of data units;
a unit module for determining a plurality of user characteristic sub-models based on the plurality of data units;
an input module for inputting each data unit of the plurality of data units into a corresponding user characteristic model respectively to generate a plurality of characteristic values; and
a feature module for determining a user characteristic based on the plurality of characteristic values.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN201911290330.8A 2019-12-16 User multidimensional data processing method and device and electronic equipment Active CN111190967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911290330.8A CN111190967B (en) 2019-12-16 User multidimensional data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911290330.8A CN111190967B (en) 2019-12-16 User multidimensional data processing method and device and electronic equipment

Publications (2)

Publication Number    Publication Date
CN111190967A (en)     2020-05-22
CN111190967B (en)     2024-04-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant