CN111191677B - User characteristic data generation method and device and electronic equipment - Google Patents

Publication number
CN111191677B
Authority
CN
China
Prior art keywords
user
word vector
word
information
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911263161.9A
Other languages
Chinese (zh)
Other versions
CN111191677A (en)
Inventor
李达
张彤彤
苏绥绥
常富洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qilu Information Technology Co Ltd
Original Assignee
Beijing Qilu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qilu Information Technology Co Ltd
Priority to CN201911263161.9A
Publication of CN111191677A
Application granted
Publication of CN111191677B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Abstract

The present disclosure relates to a user characteristic data generation method, apparatus, electronic device, and computer-readable medium. The method comprises the following steps: acquiring a terminal application list of a user, wherein the terminal application list comprises installed application information; respectively inputting application information in the terminal application list into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors; information fusion is carried out on the plurality of first word vectors and the plurality of second word vectors, and word vector information of the user is generated; and generating feature data of the user through the word vector information. The user characteristic data generation method, the device, the electronic equipment and the computer readable medium can accurately analyze the characteristics of the user from multiple dimensions to generate data for accurately describing the characteristics of the user, and can perform more comprehensive risk analysis on the user through the user characteristic data.

Description

User characteristic data generation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer information processing, and in particular, to a method, an apparatus, an electronic device, and a computer readable medium for generating user feature data.
Background
Individual and business users often borrow from financial services institutions, and this borrowing is likely to pose a risk to those institutions. Currently, financial risk is usually assessed by analyzing a user's basic information and behavior information, where the basic information may include, for example, the user's age, sex, occupation, and region, and the behavior information may include the user's borrowing, repayment, and default records. How to mine more information that reflects some aspect of a user's characteristics, so that the user's financial risk can be analyzed and judged more comprehensively, is currently a subject of wide attention.
In the prior art, risk perception based on app information relies on app classification information and on manual experience. Each time a new case appears, a reviewer has to define a scope and then check the cases within it. This consumes excessive manual labor, and reviewer fatigue can lead to errors. Traditional statistical models likewise depend on manual experience: the user's app information must be analyzed in fine detail and abnormal information must be expressed by hand, which is time-consuming and labor-intensive.
Accordingly, there is a need for a new user characteristic data generation method, apparatus, electronic device, and computer-readable medium.
The above information disclosed in the background section is only for enhancement of understanding of the background of the disclosure and therefore it may include information that does not form the prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the present disclosure provides a method, an apparatus, an electronic device, and a computer readable medium for generating user feature data, which can accurately analyze features of a user from multiple dimensions, generate data that accurately describes features of the user, and perform more comprehensive risk analysis on the user through the user feature data.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to an aspect of the present disclosure, a user feature data generating method is provided, the method including: acquiring a terminal application list of a user, wherein the terminal application list comprises installed application information; respectively inputting application information in the terminal application list into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors; information fusion is carried out on the plurality of first word vectors and the plurality of second word vectors, and word vector information of the user is generated; and generating feature data of the user through the word vector information.
Optionally, the method further comprises: and inputting the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and a corresponding risk probability thereof.
Optionally, the method further comprises: generating a first word vector model from the terminal application lists of historical users by a fast text classification (fastText) method; and/or generating a second word vector model from the terminal application lists of historical users by a word vector conversion (word2vec) method.
Optionally, inputting the application information in the terminal application list into a first word vector model and a second word vector model respectively, and generating a plurality of first word vectors and a plurality of second word vectors, including: generating a word vector dictionary as the second word vector model through a word vector conversion method; and inputting application information in the terminal application list into the word vector dictionary to generate the second word vector.
Optionally, information fusion is performed on the plurality of first word vectors and the plurality of second word vectors, so as to generate word vector information of the user, including: generating a plurality of application word vectors by a plurality of first word vectors and the plurality of second word vectors; and carrying out information fusion on the plurality of application word vectors to generate word vector information of the user.
Optionally, generating a plurality of application word vectors from the plurality of first word vectors and the plurality of second word vectors includes: acquiring a first word vector and a second word vector corresponding to single application information; information fusion is carried out on the first word vector and the second word vector to generate an application word vector; and generating a plurality of application word vectors through a plurality of first word vectors and a plurality of second word vectors corresponding to all application information in the terminal application list.
Optionally, performing information fusion on the first word vector and the second word vector to generate an application word vector, including: and carrying out information fusion on the first word vector and the second word vector in a weighted average mode to generate the application word vector.
Optionally, the method further comprises: and training the multi-layer perceptron model through the risk classification identification of the historical user and the terminal application list to generate the user risk classification model.
Optionally, inputting the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and a corresponding risk probability thereof, and further includes: when the risk classification identifier of the user is an unknown identifier, determining a target classification identifier for the user; and retraining the multi-layer perceptron model through the terminal application list of the user and the target classification identifier to update the user risk classification model.
Optionally, determining a target classification identifier for the user includes: and determining target classification identification for the user through other risk classification models.
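The handling of an unknown risk classification identifier described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the `UNKNOWN` marker, the `resolve_label` helper, the callable `other_model`, and the in-memory `training_set` are all hypothetical names introduced here, and the actual retraining of the multi-layer perceptron is left out.

```python
UNKNOWN = "unknown"  # hypothetical marker for an unrecognized risk class


def resolve_label(app_list, predicted_label, other_model, training_set):
    """Return (label, needs_retraining) for one user.

    other_model is any callable that maps an app list to a risk label;
    it stands in for the "other risk classification models" in the text.
    """
    if predicted_label != UNKNOWN:
        # The classifier recognized the sample, so nothing else is needed.
        return predicted_label, False
    # Ask the other model for a target classification identifier.
    target_label = other_model(app_list)
    # Store the newly labeled sample so the classifier can be retrained on it.
    training_set.append((list(app_list), target_label))
    return target_label, True


# Example with made-up app names and a stand-in for the other model.
training_set = []
label, needs_retraining = resolve_label(
    ["app_x", "app_y"], UNKNOWN, lambda apps: "gambling", training_set
)
```

When `needs_retraining` is true, the caller would retrain the user risk classification model on the extended training set.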
According to an aspect of the present disclosure, there is provided a user characteristic data generating apparatus, including: the list module is used for acquiring a terminal application list of a user, wherein the terminal application list comprises installed application information; the vector module is used for inputting the application information in the terminal application list into a first word vector model and a second word vector model respectively to generate a plurality of first word vectors and a plurality of second word vectors; the fusion module is used for carrying out information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and a feature module for generating feature data of the user through the word vector information.
Optionally, the apparatus further comprises: a model module for inputting the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and its corresponding risk probability.
Optionally, the apparatus further comprises: a first training module for generating a first word vector model from the terminal application lists of historical users by a fast text classification (fastText) method; and/or a second training module for generating a second word vector model from the terminal application lists of historical users by a word vector conversion (word2vec) method.
Optionally, the second training module includes: a dictionary unit for generating a word vector dictionary as the second word vector model by a word vector conversion method; and an input unit for inputting application information in the terminal application list into the word vector dictionary to generate the second word vector.
Optionally, the fusion module includes: a computing unit configured to generate a plurality of application word vectors from a plurality of first word vectors and the plurality of second word vectors; and the fusion unit is used for carrying out information fusion on the plurality of application word vectors to generate word vector information of the user.
Optionally, the fusion unit is further configured to obtain a first word vector and a second word vector corresponding to the single application information; information fusion is carried out on the first word vector and the second word vector to generate an application word vector; and generating a plurality of application word vectors through a plurality of first word vectors and a plurality of second word vectors corresponding to all application information in the terminal application list.
Optionally, the fusion unit is further configured to perform information fusion on the first word vector and the second word vector in a weighted average manner to generate the application word vector.
Optionally, the apparatus further comprises: a third training module for training a multi-layer perceptron model with the risk classification identifiers and terminal application lists of historical users to generate the user risk classification model.
Optionally, the third training module further includes: the model updating unit is used for determining a target classification identifier for the user when the risk classification identifier of the user is an unknown identifier; and retraining the multi-layer perceptron model through the terminal application list of the user and the target classification identifier to update the user risk classification model.
Optionally, the model updating unit is further configured to determine a target classification identifier for the user through another risk classification model.
According to an aspect of the present disclosure, there is provided an electronic device including: one or more processors; a storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the methods as described above.
According to an aspect of the present disclosure, a computer-readable medium is presented, on which a computer program is stored, which program, when being executed by a processor, implements a method as described above.
According to the user characteristic data generation method, apparatus, electronic device, and computer-readable medium of the present disclosure, a terminal application list of a user is acquired, wherein the terminal application list comprises installed application information; the application information in the terminal application list is input into a first word vector model and a second word vector model respectively to generate a plurality of first word vectors and a plurality of second word vectors; information fusion is performed on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and feature data of the user is generated from the word vector information. In this way, the features of the user can be accurately analyzed from multiple dimensions, data that accurately describes those features can be generated, and a more comprehensive risk analysis can be performed on the user based on the user feature data.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are merely examples of the present disclosure and other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a system block diagram illustrating a method and apparatus for generating user characteristic data according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a user characteristic data generation method according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating a user characteristic data generation method according to another exemplary embodiment.
Fig. 4 is a schematic diagram illustrating a user characteristic data generation method according to another exemplary embodiment.
Fig. 5 is a flowchart illustrating a user characteristic data generation method according to another exemplary embodiment.
Fig. 6 is a block diagram illustrating a user characteristic data generating apparatus according to an exemplary embodiment.
Fig. 7 is a block diagram illustrating a user characteristic data generating apparatus according to another exemplary embodiment.
Fig. 8 is a block diagram of an electronic device, according to an example embodiment.
Fig. 9 is a block diagram of a computer-readable medium shown according to an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the disclosed aspects may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another element. Accordingly, a first component discussed below could be termed a second component without departing from the teachings of the concepts of the present disclosure. As used herein, the term "and/or" includes any one of the associated listed items and all combinations of one or more.
Those skilled in the art will appreciate that the drawings are schematic representations of example embodiments and that the modules or flows in the drawings are not necessarily required to practice the present disclosure, and therefore, should not be taken to limit the scope of the present disclosure.
With the development of internet information technology, smartphones have become an integral part of people's daily lives. Various apps implement different functions and bring convenience and enjoyment to people's lives. The apps installed on a mobile phone are closely tied to the user's personal preferences; in other words, a person's app installation profile can be regarded as a descriptive feature of that person, which helps to better understand a client's personal characteristics, perceive client risk, and infer the client's preferences.
The inventors of the present disclosure have found that there are currently two main approaches to feature mining of app installation information. The first is to compute classification statistics for individual apps under a second- or third-level category. This classification information amounts to observing each app at a coarser granularity and using the result as a feature of the client. With this approach, apart from categories with strong financial attributes or categories associated with fraud, general app categories are often not informative enough for ascertaining client risk. The second is to analyze and aggregate event tracking ("buried point") data on how a client uses a single app in detail. Because the event tracking data within a single app is private and hard to obtain, only the vendor of that specific app can access it.
Therefore, the present disclosure analyzes the app installation list as a whole and uses the list in its entirety to describe and infer the client's preferences, which characterizes those preferences more accurately than classification information does. In addition, because the app installation list can be acquired when the user registers or applies, it has a wider range of application than event tracking data. The user characteristic data generation method of the present disclosure is described in detail below with reference to specific embodiments.
FIG. 1 is a system block diagram illustrating a method and apparatus for generating user characteristic data according to an exemplary embodiment.
As shown in fig. 1, the system architecture 10 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as financial service class applications, shopping class applications, web browser applications, instant messaging tools, mailbox clients, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server providing support for financial service-like websites browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze the received user data and feed back the processing result (e.g., user feature data) to an administrator of the financial service website.
The server 105 may, for example, obtain a list of terminal applications for the user, including installed application information therein; the server 105 may, for example, input the application information in the terminal application list into a first word vector model and a second word vector model respectively, to generate a plurality of first word vectors and a plurality of second word vectors; server 105 may, for example, perform information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information for the user; the server 105 may generate feature data of the user, for example, by the word vector information.
Server 105 may also, for example, input the word vector data for the user into a user risk classification model to generate a risk classification identification for the user and its corresponding risk probability.
The server 105 may also, for example, generate a first word vector model from the terminal application lists of historical users by a fast text classification (fastText) method; and/or generate a second word vector model from the terminal application lists of historical users by a word vector conversion (word2vec) method.
The server 105 may be a single physical server or may, for example, be composed of a plurality of servers. A part of the server 105 may, for example, be used to generate the feature data of the user from the word vector information; a part of the server 105 may, for example, be used to input the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and its corresponding risk probability; and a part of the server 105 may also, for example, be used to generate a first word vector model from the terminal application lists of historical users by a fast text classification (fastText) method, and/or to generate a second word vector model from the terminal application lists of historical users by a word vector conversion (word2vec) method.
It should be noted that the user feature data generation method provided in the embodiments of the present disclosure may be executed by the server 105, and accordingly the user feature data generation apparatus may be disposed in the server 105. The web page through which the user browses the financial service platform is generally located on the terminal devices 101, 102, 103.
Fig. 2 is a flowchart illustrating a user characteristic data generation method according to an exemplary embodiment. The user characteristic data generation method 20 includes at least steps S202 to S208.
As shown in fig. 2, in S202, a terminal application list of a user is acquired, the terminal application list including installed application information. The terminal application list records the apps installed on the user's mobile terminal.
In S204, the application information in the terminal application list is input into a first word vector model and a second word vector model, respectively, to generate a plurality of first word vectors and a plurality of second word vectors.
In one embodiment, the method further includes: generating a first word vector model from the terminal application lists of historical users by a fast text classification (fastText) method; and/or generating a second word vector model from the terminal application lists of historical users by a word vector conversion (word2vec) method.
More specifically, the first word vector model may be generated by the fast text (fastText) method. fastText is a text classifier open-sourced by Facebook AI Research, and its defining characteristic is speed. Compared with other text classification models, such as SVM, logistic regression, and neural network models, fastText greatly shortens training time while maintaining classification quality.
More specifically, the second word vector model may be generated by word vector conversion (word2vec). word2vec is a tool for computing word vectors that can be trained efficiently on dictionaries on the order of millions of entries and on data sets on the order of billions of entries; the word vectors (word embeddings) it produces measure the similarity between words well.
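As a brief illustration of how word vectors can measure similarity, the sketch below computes cosine similarity, a common measure for comparing embeddings. The disclosure does not name a specific similarity measure, so this choice is an assumption; the app names and the 3-dimensional vectors are made up for the example and are not the output of a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical vectors for three made-up app names.
vectors = {
    "loan_app_a": [0.9, 0.1, 0.0],
    "loan_app_b": [0.8, 0.2, 0.1],
    "game_app_c": [0.0, 0.1, 0.9],
}
sim_loans = cosine_similarity(vectors["loan_app_a"], vectors["loan_app_b"])
sim_mixed = cosine_similarity(vectors["loan_app_a"], vectors["game_app_c"])
```

With these illustrative values, the two loan apps score much closer to 1 than the loan app and the game app do, which is the behavior one would hope for from embeddings trained on real installation lists.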
In one embodiment, inputting the application information in the terminal application list into a first word vector model and a second word vector model respectively, and generating a plurality of first word vectors and a plurality of second word vectors includes: generating a word vector dictionary as the second word vector model through a word vector conversion method; and inputting application information in the terminal application list into the word vector dictionary to generate the second word vector.
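Using a word vector dictionary as the second word vector model amounts to a lookup from app name to vector. The following is a minimal sketch under stated assumptions: the dictionary contents and the `lookup_vectors` helper are hypothetical, and mapping apps that are absent from the dictionary to a zero vector is a choice made here for illustration, since the disclosure does not say how unseen apps are handled.

```python
def lookup_vectors(app_list, vector_dict, dim):
    """Return one vector per app, in list order.

    Apps that are missing from the dictionary get a zero vector
    (an assumption of this sketch, not stated in the source).
    """
    zero = [0.0] * dim
    return [list(vector_dict.get(app, zero)) for app in app_list]


# Hypothetical 2-dimensional dictionary, standing in for a trained word2vec model.
word_vector_dict = {"app_x": [0.2, 0.4], "app_y": [0.6, 0.0]}
second_vectors = lookup_vectors(
    ["app_x", "app_unseen", "app_y"], word_vector_dict, dim=2
)
```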
In S206, information fusion is performed on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user. This includes: generating a plurality of application word vectors from the plurality of first word vectors and the plurality of second word vectors; and performing information fusion (meta-embedding) on the plurality of application word vectors to generate the word vector information of the user.
The information fusion is a new scheme for fusing user word vectors. It exploits the complementarity between different embeddings (an embedding being a method of representing discrete data as continuous vectors) by using multiple embeddings simultaneously.
In one embodiment, generating a plurality of application word vectors from a plurality of first word vectors and the plurality of second word vectors includes: acquiring a first word vector and a second word vector corresponding to single application information; information fusion is carried out on the first word vector and the second word vector to generate an application word vector; and generating a plurality of application word vectors through a plurality of first word vectors and a plurality of second word vectors corresponding to all application information in the terminal application list.
"information fusion is performed on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user" will be described in detail in the embodiment corresponding to fig. 3.
In S208, feature data of the user is generated from the word vector information.
In one embodiment, further comprising: and inputting the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and a corresponding risk probability thereof.
According to the user characteristic data generation method of the present disclosure, inputting the word vector data of the user into the user risk classification model is equivalent to automatically generating a meta-embedding vector from the user's app information and feeding that vector into a multi-layer perceptron, so that samples can be classified. The classes correspond to the risk categories, such as overdue, gambling, and multi-head lending, and the classification can be understood as the perception of known risks. Once the model has judged the risk of a user's app information and sample, it can be connected to the existing review system so that reviews are automated in batches, further improving the efficiency and accuracy of the review system.
According to the user characteristic data generation method of the present disclosure, a terminal application list of a user is acquired, wherein the terminal application list comprises installed application information; the application information in the terminal application list is input into a first word vector model and a second word vector model respectively to generate a plurality of first word vectors and a plurality of second word vectors; information fusion is performed on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and feature data of the user is generated from the word vector information. In this way, the features of the user can be accurately analyzed from multiple dimensions, data that accurately describes those features can be generated, and a more comprehensive risk analysis can be performed on the user based on the user feature data.
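To make the classification step concrete, the sketch below runs the forward pass of a tiny multi-layer perceptron over a user vector and returns a risk label together with its probability. Everything numeric here is hypothetical: the weights in `PARAMS` are hand-picked rather than trained, the network has a single ReLU hidden layer with a softmax output (an architecture assumed for illustration), and the label strings simply reuse the risk categories named in the text.

```python
import math


def relu(xs):
    return [max(0.0, x) for x in xs]


def dense(xs, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, bias)]


def softmax(zs):
    m = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def classify(user_vector, params, labels):
    """Return (risk_label, risk_probability) for one user vector."""
    hidden = relu(dense(user_vector, params["w1"], params["b1"]))
    probs = softmax(dense(hidden, params["w2"], params["b2"]))
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


LABELS = ["overdue", "gambling", "multi_head_lending"]
# Hand-picked illustrative weights, not the result of any training.
PARAMS = {
    "w1": [[1.0, 0.0], [0.0, 1.0]],
    "b1": [0.0, 0.0],
    "w2": [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
    "b2": [0.0, 0.0, 0.0],
}
label, prob = classify([1.0, 0.0], PARAMS, LABELS)
```

In practice the weights would come from training on the risk classification identifiers and terminal application lists of historical users, as the text describes.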
It should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
Fig. 3 is a flowchart illustrating a user characteristic data generation method according to another exemplary embodiment. The process shown in fig. 3 is a detailed description of "fusing the plurality of first word vectors and the plurality of second word vectors to generate the word vector information" in the process shown in fig. 2 at S206.
As shown in fig. 3, in S302, a first word vector and a second word vector corresponding to a single piece of application information are acquired. The single app is passed through fastText and word2vec to obtain its fastText vector representation and its word2vec vector representation.
In S304, information fusion is performed on the first word vector and the second word vector to generate an application word vector. The application word vector may be generated, for example, by fusing the first word vector and the second word vector by means of weighted averaging. That is, the two word embedding vectors of a single app are weighted-averaged to obtain the meta-embedding representation of that app.
Applying meta-embedding to each piece of app information yields the meta-embedding vector of each app.
In S306, a plurality of application word vectors are generated from the plurality of first word vectors and the plurality of second word vectors corresponding to all application information in the terminal application list.
In S308, information fusion is performed on the plurality of application word vectors to generate word vector information of the user. The word vector information of the user may be generated, for example, by fusing the plurality of application word vectors by means of weighted averaging. The meta-embedding vectors of all apps may, for example, be weighted-averaged to obtain the final word vector information of the client.
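Steps S302 to S308 can be sketched as two nested weighted averages: one across the two embedding models for each app, and one across all apps of the user. The function names `weighted_average` and `fuse_user`, the equal default weights, and the toy two-dimensional vectors are assumptions for illustration; the disclosure does not state how the weights are chosen.

```python
from typing import List, Optional, Sequence


def weighted_average(
    vectors: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the weighted average of equally sized vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    if weights is None:
        weights = [1.0] * len(vectors)  # assumption: equal weights by default
    if len(weights) != len(vectors):
        raise ValueError("one weight per vector is required")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    dim = len(vectors[0])
    averaged = [0.0] * dim
    for vector, weight in zip(vectors, weights):
        if len(vector) != dim:
            raise ValueError("all vectors must share one dimension")
        for i, value in enumerate(vector):
            averaged[i] += weight * value / total
    return averaged


def fuse_user(
    first_vectors: Sequence[Sequence[float]],
    second_vectors: Sequence[Sequence[float]],
    model_weights: Optional[Sequence[float]] = None,
    app_weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-app embeddings (S302-S306), then fuse across apps (S308)."""
    if len(first_vectors) != len(second_vectors):
        raise ValueError("each app needs one vector from each model")
    app_vectors = [
        weighted_average([first, second], model_weights)
        for first, second in zip(first_vectors, second_vectors)
    ]
    return weighted_average(app_vectors, app_weights)


# Toy vectors for two apps; real vectors would come from the trained models.
first = [[1.0, 0.0], [0.0, 1.0]]   # e.g. fastText output per app
second = [[0.0, 1.0], [0.0, 1.0]]  # e.g. word2vec output per app
user_vector = fuse_user(first, second)
```

Both models must emit vectors of the same dimension for an element-wise average to be defined, which this sketch checks explicitly.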
Fig. 4 is a schematic diagram illustrating a user characteristic data generation method according to another exemplary embodiment. As shown in fig. 4, the client's app information is first converted into a list-level app meta-embedding vector. The pre-training models are then trained with existing app information data to obtain a fastText model and a word2vec model. App data is input into the trained word2vec and fastText models to obtain vectors. A single app is passed through fastText and word2vec to obtain its fastText vector representation and its word2vec vector representation, and the two word embedding representations of the single app are then weighted-averaged to obtain the information fusion representation of that app. Information fusion is performed on each piece of app information to obtain the fusion vector of each app. The fusion vectors of all apps are weighted-averaged to obtain the final customer feature data.
Fig. 5 is a flowchart illustrating a user characteristic data generation method according to another exemplary embodiment.
As shown in fig. 5, in S502, the multi-layer perceptron model is trained through risk classification identifiers of historical users and a terminal application list, and the user risk classification model is generated.
In S504, the word vector data of the user is input into a user risk classification model to generate a risk classification identifier of the user and a risk probability corresponding to the risk classification identifier.
In S506, when the risk classification identifier of the user is an unknown identifier, a target classification identifier is determined for the user. This may include determining the target classification identifier for the user through other risk classification models.
In S508, the multi-layer perceptron model is retrained with the user's terminal application list and the target classification identifier to update the user risk classification model.
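The control flow of S504 to S508 can be sketched as follows. The classifiers are represented as plain callables, the `"unknown"` label string and the name `classify_or_relabel` are hypothetical, and retraining is reduced to queuing the newly labelled sample; how and when the multi-layer perceptron is actually retrained is not specified in the disclosure.

```python
from typing import Callable, List, Sequence, Tuple

UNKNOWN = "unknown"  # hypothetical marker for an unknown risk identifier

Classifier = Callable[[Sequence[float]], Tuple[str, float]]
Sample = Tuple[List[str], str]  # (terminal application list, target label)


def classify_or_relabel(
    app_list: Sequence[str],
    user_vector: Sequence[float],
    primary: Classifier,
    fallback: Classifier,
    retrain_queue: List[Sample],
) -> Tuple[str, float]:
    """S504: classify; S506: fall back on unknown; S508: queue for retraining."""
    label, probability = primary(user_vector)
    if label != UNKNOWN:
        return label, probability
    # S506: another risk classification model supplies the target identifier.
    target_label, target_probability = fallback(user_vector)
    # S508: keep the app list and target label so the MLP can be retrained.
    retrain_queue.append((list(app_list), target_label))
    return target_label, target_probability


# Stub classifiers standing in for the MLP and for another risk model.
def stub_mlp(vector: Sequence[float]) -> Tuple[str, float]:
    return (UNKNOWN, 0.0) if sum(vector) == 0 else ("overdue", 0.8)


def stub_other_model(vector: Sequence[float]) -> Tuple[str, float]:
    return ("gambling", 0.6)


queue: List[Sample] = []
known = classify_or_relabel(["app_a"], [0.3, 0.1], stub_mlp, stub_other_model, queue)
relabelled = classify_or_relabel(["app_b"], [0.0, 0.0], stub_mlp, stub_other_model, queue)
```

Queuing samples rather than retraining on every unknown user is one possible design; it lets retraining run in batches.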
The risk of the user is classified using a multi-layer perceptron (MLP). The main idea is to analyze the meta-embedding of the user's app information and correlate it with known risk classifications, obtaining the information representation of the user under different risk classifications and thereby enabling early prediction and classification of risk.
The risk of the user may be classified, for example, as follows. Because a neural network can provide an approximate confidence level for each user, the multi-layer perceptron yields a different confidence value for each user; for example, 0.1 may represent a risk of about 10% for the user, and 0.9 a risk of about 90%. Based on the multi-layer perceptron's prediction of the user's risk confidence, users can be divided into high, medium, and low risk classes. High-risk users are placed under control, low-risk users are passed and proceed downstream, and medium-risk users are further graded and classified through mechanisms such as review.
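The grading described above can be sketched as a simple thresholding of the confidence value. The cut-off values 0.3 and 0.7 and the names `risk_tier` and `ACTIONS` are assumptions for illustration; the disclosure gives only 0.1 and 0.9 as example confidence values and does not state where the class boundaries lie.

```python
def risk_tier(risk_probability: float,
              low_cutoff: float = 0.3,
              high_cutoff: float = 0.7) -> str:
    """Map a confidence value in [0, 1] to a low, medium, or high risk class."""
    if not 0.0 <= risk_probability <= 1.0:
        raise ValueError("risk_probability must lie in [0, 1]")
    if low_cutoff >= high_cutoff:
        raise ValueError("low_cutoff must be below high_cutoff")
    if risk_probability >= high_cutoff:
        return "high"
    if risk_probability < low_cutoff:
        return "low"
    return "medium"


# Handling per class, following the description above.
ACTIONS = {
    "high": "control",
    "low": "pass downstream",
    "medium": "further review",
}

tiers = [risk_tier(p) for p in (0.1, 0.5, 0.9)]
```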
When the risk classification identifier of the user is an unknown identifier, the new abnormal risk is detected in other ways and given a new classification label. After the new classification label has been added to the training of the MLP model, the MLP or other models continue to be used to determine abnormal risks.
According to the user characteristic data generation method of the present disclosure, existing risks can be learned and judged accurately. The method can replace manual analysis of existing risk tactics, thereby obtaining an accurate perception of those tactics.
According to the user characteristic data generation method of the present disclosure, experience-based manual processing of app information can be replaced; vectorizing the user's app information avoids time-consuming and labor-intensive manual analysis.
According to the user characteristic data generation method of the present disclosure, in addition to basic risk perception, meta-embedding can be used to form a high-level representation of user information, and fusing different word embedding methods further yields a description of the user's risk behavior.
Those skilled in the art will appreciate that all or part of the steps implementing the above-described embodiments may be implemented as a computer program executed by a CPU. When the computer program is executed by the CPU, the functions defined by the above-described methods provided by the present disclosure are performed. The program may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
Furthermore, it should be noted that the above-described figures are merely illustrative of the processes involved in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 6 is a block diagram illustrating a user characteristic data generating apparatus according to an exemplary embodiment. As shown in fig. 6, the user characteristic data generating device 60 includes: list module 602, vector module 604, fusion module 606, feature module 608.
The list module 602 is configured to obtain a terminal application list of a user, where the terminal application list includes installed application information;
the vector module 604 is configured to input application information in the terminal application list into a first word vector model and a second word vector model, respectively, to generate a plurality of first word vectors and a plurality of second word vectors;
the fusion module 606 is configured to perform information fusion on the plurality of first word vectors and the plurality of second word vectors, and generate word vector information of the user; the fusion module 606 includes: a computing unit configured to generate a plurality of application word vectors from a plurality of first word vectors and the plurality of second word vectors; and the fusion unit is used for carrying out information fusion on the plurality of application word vectors to generate word vector information of the user. The fusion unit is further used for acquiring a first word vector and a second word vector corresponding to the single application information; information fusion is carried out on the first word vector and the second word vector to generate an application word vector; and generating a plurality of application word vectors through a plurality of first word vectors and a plurality of second word vectors corresponding to all application information in the terminal application list. And the fusion unit is also used for carrying out information fusion on the first word vector and the second word vector in a weighted average mode to generate the application word vector.
The feature module 608 is configured to generate feature data of the user according to the word vector information.
Fig. 7 is a block diagram illustrating a user characteristic data generating apparatus according to another exemplary embodiment. As shown in fig. 7, the user characteristic data generating device 70 includes: model module 702, first training module 704, second training module 706, and third training module 708.
The model module 702 is configured to input the word vector data of the user into a risk classification model of the user to generate a risk classification identifier of the user and a risk probability corresponding to the risk classification identifier.
The first training module 704 is configured to generate a first word vector model through a terminal application list and a quick text classification method of the historical user; and/or
The second training module 706 is configured to generate a second word vector model through a terminal application list of the history user and a word vector conversion method. The second training module 706 includes: the dictionary unit is used for generating a word vector dictionary serving as the second word vector model through a word vector conversion method; and an input unit for inputting application information in the terminal application list into the word vector dictionary to generate the second word vector.
The third training module 708 is configured to train the multi-layer perceptron model through the risk classification identifier of the historical user and the terminal application list, and generate the user risk classification model. The third training module 708 further comprises: the model updating unit is used for determining a target classification identifier for the user when the risk classification identifier of the user is an unknown identifier; and retraining the multi-layer perceptron model through the terminal application list of the user and the target classification identifier to update the user risk classification model. The model updating unit is further used for determining target classification identifiers for the users through other risk classification models.
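The dictionary unit and input unit of the second training module 706 can be sketched as a lookup from application names to stored vectors. The function name `lookup_app_vectors`, the toy dictionary, and in particular the zero-vector fallback for apps missing from the dictionary are assumptions; the disclosure does not say how unseen apps are handled.

```python
from typing import Dict, List, Sequence


def lookup_app_vectors(
    app_names: Sequence[str],
    dictionary: Dict[str, Sequence[float]],
    dim: int,
) -> List[List[float]]:
    """Return one second word vector per app, in list order."""
    vectors: List[List[float]] = []
    for name in app_names:
        stored = dictionary.get(name)
        if stored is None:
            # Assumption: apps missing from the dictionary map to a zero vector.
            vectors.append([0.0] * dim)
        elif len(stored) != dim:
            raise ValueError(f"vector for {name!r} has the wrong dimension")
        else:
            vectors.append(list(stored))
    return vectors


# Toy dictionary; a real one would be produced by the word vector conversion method.
toy_dictionary = {"app_a": [0.1, 0.9], "app_b": [0.7, 0.3]}
second_vectors = lookup_app_vectors(["app_a", "app_c"], toy_dictionary, dim=2)
```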
According to the user characteristic data generating device of the present disclosure, a terminal application list of a user is obtained, wherein the terminal application list comprises installed application information; the application information in the terminal application list is input into a first word vector model and a second word vector model, respectively, to generate a plurality of first word vectors and a plurality of second word vectors; information fusion is performed on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and feature data of the user is generated from the word vector information. In this way, the user's features can be analyzed accurately from multiple dimensions, data that accurately describes those features can be generated, and a more comprehensive risk analysis of the user can be performed using the feature data.
Fig. 8 is a block diagram of an electronic device, according to an example embodiment.
An electronic device 800 according to such an embodiment of the present disclosure is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 8, the electronic device 800 is embodied in the form of a general purpose computing device. Components of electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one memory unit 820, a bus 830 that connects the different system components (including memory unit 820 and processing unit 810), a display unit 840, and the like.
Wherein the storage unit stores program code that is executable by the processing unit 810, such that the processing unit 810 performs the steps according to various exemplary embodiments of the present disclosure described in the user characteristic data generation method section of this specification above. For example, the processing unit 810 may perform the steps shown in fig. 2, 3, and 5.
The storage unit 820 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 830 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 800' (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 800, and/or any device (e.g., router, modem, etc.) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 850. Also, electronic device 800 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 860. Network adapter 860 may communicate with other modules of electronic device 800 via bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 800, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, as shown in fig. 9, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, or a network device, etc.) to perform the above-described method according to the embodiments of the present disclosure.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable storage medium may also be any readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The computer-readable medium carries one or more programs, which when executed by one of the devices, cause the computer-readable medium to perform the functions of: acquiring a terminal application list of a user, wherein the terminal application list comprises installed application information; respectively inputting application information in the terminal application list into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors; information fusion is carried out on the plurality of first word vectors and the plurality of second word vectors, and word vector information of the user is generated; and generating feature data of the user through the word vector information.
Those skilled in the art will appreciate that the modules may be distributed across several devices as described in the embodiments, or may, with corresponding changes, be located in one or more devices different from those of the embodiments. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in combination with the necessary hardware. Thus, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and include several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that this disclosure is not limited to the particular arrangements, instrumentalities and methods of implementation described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (14)

1. A user characteristic data generation method, comprising:
acquiring a terminal application list of a user, wherein the terminal application list comprises installed application information converted into a vector of a list layer;
generating a first word vector model through a terminal application list of the historical user and a quick text classification method, and generating a second word vector model through the terminal application list of the historical user and a word vector conversion method;
respectively inputting application information in the terminal application list into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors;
information fusion is carried out on the plurality of first word vectors and the plurality of second word vectors, and word vector information of the user is generated, wherein the generation comprises the following steps: acquiring a first word vector and a second word vector corresponding to single application information, performing information fusion on the first word vector and the second word vector to generate application word vectors, and generating a plurality of application word vectors through a plurality of first word vectors and a plurality of second word vectors corresponding to all application information in the terminal application list; information fusion is carried out on a plurality of application word vectors through weighted average to generate word vector information of the user;
and generating characteristic data of the user through the word vector information, and inputting the word vector information of the user into a user risk classification model to generate a risk classification identifier of the user and its corresponding risk probability.
2. The method of claim 1, wherein inputting the application information in the terminal application list into the first word vector model and the second word vector model, respectively, generates a plurality of first word vectors and a plurality of second word vectors, comprising:
generating a word vector dictionary as the second word vector model through a word vector conversion method; and
and inputting application information in the terminal application list into the word vector dictionary to generate the second word vector.
3. The method of claim 1, wherein information fusing the first word vector and the second word vector to generate an application word vector comprises:
and carrying out information fusion on the first word vector and the second word vector in a weighted average mode to generate the application word vector.
4. The method as recited in claim 1, further comprising:
and training the multi-layer perceptron model through the risk classification identification of the historical user and the terminal application list to generate the user risk classification model.
5. The method of claim 4, wherein inputting the user's word vector data into a user risk classification model generates a risk classification identification for the user and its corresponding risk probability, comprising:
when the risk classification identifier of the user is an unknown identifier, determining a target classification identifier for the user;
and retraining the multi-layer perceptron model through the terminal application list of the user and the target classification identifier to update the user risk classification model.
6. The method of claim 5, wherein determining a target classification identity for the user comprises:
and determining target classification identification for the user through other risk classification models.
7. A user characteristic data generation apparatus, comprising:
the list module is used for acquiring a terminal application list of a user, wherein the terminal application list comprises installed application information converted into a vector of a list layer;
the first training module is used for generating a first word vector model through a terminal application list of the historical user and a quick text classification method;
the second training module is used for generating a second word vector model through a terminal application list of the historical user and a word vector conversion method;
the vector module is used for inputting the application information in the terminal application list into a first word vector model and a second word vector model respectively to generate a plurality of first word vectors and a plurality of second word vectors;
the fusion module is used for carrying out information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user, and comprises: a calculation unit used for acquiring the first word vector and the second word vector corresponding to single application information, carrying out information fusion on the first word vector and the second word vector to generate an application word vector, and generating a plurality of application word vectors through the plurality of first word vectors and the plurality of second word vectors corresponding to all application information in the terminal application list; and a fusion unit used for carrying out information fusion on the plurality of application word vectors through weighted average to generate word vector information of the user;
the feature module is used for generating feature data of the user through the word vector information; and
the model module is used for inputting the word vector information of the user into a user risk classification model to generate a risk classification identifier of the user and a corresponding risk probability thereof.
8. The apparatus of claim 7, wherein the second training module comprises:
a dictionary unit for generating a word vector dictionary as the second word vector model by a word vector conversion method; and
and the input unit is used for inputting the application information in the terminal application list into the word vector dictionary to generate the second word vector.
9. The apparatus of claim 7, wherein the fusion unit is further configured to generate the application word vector by information fusion of the first word vector and the second word vector by means of weighted averaging.
10. The apparatus as recited in claim 7, further comprising:
and the third training module is used for training the multi-layer perceptron model through the risk classification identification of the historical user and the terminal application list to generate the user risk classification model.
11. The apparatus of claim 10, wherein the third training module further comprises:
the model updating unit is used for determining a target classification identifier for the user when the risk classification identifier of the user is an unknown identifier; and retraining the multi-layer perceptron model through the terminal application list of the user and the target classification identifier to update the user risk classification model.
12. The apparatus of claim 11, wherein the model updating unit is further configured to determine a target classification identifier for the user through other risk classification models.
13. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
14. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.
CN201911263161.9A 2019-12-11 2019-12-11 User characteristic data generation method and device and electronic equipment Active CN111191677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911263161.9A CN111191677B (en) 2019-12-11 2019-12-11 User characteristic data generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111191677A CN111191677A (en) 2020-05-22
CN111191677B true CN111191677B (en) 2023-09-26

Family

ID=70707771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911263161.9A Active CN111191677B (en) 2019-12-11 2019-12-11 User characteristic data generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111191677B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666382A (en) * 2020-06-19 2020-09-15 中信银行股份有限公司 User feature extraction method and device, electronic equipment and readable storage medium
CN112183630B (en) * 2020-09-28 2023-09-26 中国平安人寿保险股份有限公司 Embedding vector generation method, device, equipment and medium based on embedded point level
CN111966730A (en) * 2020-10-23 2020-11-20 北京淇瑀信息科技有限公司 Risk prediction method and device based on permanent premises and electronic equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850662A (en) * 2015-06-08 2015-08-19 浙江每日互动网络科技有限公司 User portrait based mobile terminal intelligent message pushing method, server and system
CN106651057A (en) * 2017-01-03 2017-05-10 有米科技股份有限公司 Mobile terminal user age prediction method based on installation package sequence table
CN107705156A (en) * 2017-10-16 2018-02-16 深圳大宇无限科技有限公司 User feature analysis method and device
CN108416663A (en) * 2018-01-18 2018-08-17 阿里巴巴集团控股有限公司 The method and device of the financial default risk of assessment
KR20180121466A (en) * 2017-04-06 2018-11-07 네이버 주식회사 Personalized product recommendation using deep learning
CN108845986A (en) * 2018-05-30 2018-11-20 中兴通讯股份有限公司 A kind of sentiment analysis method, equipment and system, computer readable storage medium
CN109582796A (en) * 2018-12-05 2019-04-05 深圳前海微众银行股份有限公司 Generation method, device, equipment and the storage medium of enterprise's public sentiment event network
CN110134948A (en) * 2019-04-23 2019-08-16 北京淇瑀信息科技有限公司 A kind of Financial Risk Control method, apparatus and electronic equipment based on text data
CN110134793A (en) * 2019-05-28 2019-08-16 电子科技大学 Text sentiment classification method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9811830B2 (en) * 2013-07-03 2017-11-07 Google Inc. Method, medium, and system for online fraud prevention based on user physical location data

Also Published As

Publication number Publication date
CN111191677A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN109992710B (en) Click rate estimation method, system, medium and computing device
CN111210335B (en) User risk identification method and device and electronic equipment
CN111191677B (en) User characteristic data generation method and device and electronic equipment
CN111210336A (en) User risk model generation method and device and electronic equipment
CN111583018A (en) Credit granting strategy management method and device based on user financial performance analysis and electronic equipment
CN111783039A (en) Risk determination method, risk determination device, computer system and storage medium
CN111198967A (en) User grouping method and device based on relational graph and electronic equipment
CN111191893B (en) Wind control text processing method and device and electronic equipment
CN113297287B (en) Automatic user policy deployment method and device and electronic equipment
CN114358147A (en) Training method, identification method, device and equipment of abnormal account identification model
CN113610625A (en) Overdue risk warning method and device and electronic equipment
CN113610366A (en) Risk warning generation method and device and electronic equipment
US11074486B2 (en) Query analysis using deep neural net classification
US11893132B2 (en) Discovery of personal data in machine learning models
CN111178687B (en) Financial risk classification method and device and electronic equipment
CN113568739A (en) User resource limit distribution method and device and electronic equipment
CN113612777A (en) Training method, traffic classification method, device, electronic device and storage medium
CN111582648A (en) User policy generation method and device and electronic equipment
CN111178687A (en) Financial risk classification method and device and electronic equipment
CN112348661B (en) Service policy distribution method and device based on user behavior track and electronic equipment
CN111626438B (en) Model migration-based user policy allocation method and device and electronic equipment
CN117172632B (en) Enterprise abnormal behavior detection method, device, equipment and storage medium
CN112016793B (en) Resource allocation method and device based on target user group and electronic equipment
US11477236B2 (en) Trend-aware combo-squatting detection
US20220253602A1 (en) Systems and methods for increasing accuracy in categorizing characters in text string

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant