CN111191677A - User characteristic data generation method and device and electronic equipment - Google Patents

User characteristic data generation method and device and electronic equipment

Info

Publication number
CN111191677A
Authority
CN
China
Prior art keywords
user
word vector
word
information
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911263161.9A
Other languages
Chinese (zh)
Other versions
CN111191677B (en)
Inventor
李达
张彤彤
苏绥绥
常富洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qilu Information Technology Co Ltd
Original Assignee
Beijing Qilu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qilu Information Technology Co Ltd filed Critical Beijing Qilu Information Technology Co Ltd
Priority to CN201911263161.9A priority Critical patent/CN111191677B/en
Publication of CN111191677A publication Critical patent/CN111191677A/en
Application granted granted Critical
Publication of CN111191677B publication Critical patent/CN111191677B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance

Abstract

The disclosure relates to a user feature data generation method, a user feature data generation device, an electronic device and a computer readable medium. The method comprises the following steps: acquiring a terminal application list of a user, wherein the terminal application list comprises installed application information; respectively inputting the application information in the terminal application list into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors; performing information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and generating feature data of the user through the word vector information. According to the user characteristic data generation method, the user characteristic data generation device, the electronic equipment and the computer readable medium, the characteristics of the user can be accurately analyzed from multiple dimensions, data which accurately describe the characteristics of the user can be generated, and more comprehensive risk analysis can be performed on the user through the characteristic data of the user.

Description

User characteristic data generation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer information processing, and in particular, to a method and an apparatus for generating user characteristic data, an electronic device, and a computer-readable medium.
Background
Individual and enterprise users often borrow from financial service institutions, and such borrowing may expose those institutions to risk. Currently, financial risk is usually assessed by analyzing a user's basic information and behavior information. For example, the basic information may include the user's age, sex, occupation and region, and the behavior information may include the user's borrowing, repayment and default records. How to mine additional information that reflects other aspects of a user, so that the user's financial risk can be analyzed and judged more comprehensively, is currently a subject of wide attention.
In the prior art, risk perception based on app information mostly relies on app classification information and on manual experience. Each time a new type of case appears, it must first be defined by auditing staff before it can be checked. This consumes considerable manpower, and fatigue may lead to errors. Traditional statistical models also depend on manual experience: the user's app information must be analyzed in fine detail and abnormal information must be expressed by hand, which is time-consuming and labor-intensive.
Therefore, a new user feature data generation method, device, electronic device and computer readable medium are needed.
The above information disclosed in this background section is only for enhancement of understanding of the background of the disclosure and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the present disclosure provides a user feature data generation method, an apparatus, an electronic device, and a computer readable medium, which can accurately analyze features of a user from multiple dimensions, generate data that accurately describes the features of the user, and perform more comprehensive risk analysis on the user through user characteristic data.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, a method for generating user feature data is provided, where the method includes: acquiring a terminal application list of a user, wherein the terminal application list comprises installed application information; respectively inputting the application information in the terminal application list into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors; performing information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and generating feature data of the user through the word vector information.
Optionally, the method further comprises: inputting the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and a risk probability corresponding to the risk classification identifier.
Optionally, the method further comprises: generating the first word vector model from terminal application lists of historical users by a fast text classification method; and/or generating the second word vector model from the terminal application lists of the historical users by a word vector conversion method.
Optionally, respectively inputting the application information in the terminal application list into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors includes: generating a word vector dictionary as the second word vector model by a word vector conversion method; and inputting the application information in the terminal application list into the word vector dictionary to generate the second word vectors.
Optionally, performing information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user includes: generating a plurality of application word vectors from the plurality of first word vectors and the plurality of second word vectors; and performing information fusion on the plurality of application word vectors to generate the word vector information of the user.
Optionally, generating a plurality of application word vectors from the plurality of first word vectors and the plurality of second word vectors comprises: acquiring a first word vector and a second word vector corresponding to a single piece of application information; performing information fusion on the first word vector and the second word vector to generate an application word vector; and generating the plurality of application word vectors from the first word vectors and second word vectors corresponding to all application information in the terminal application list.
Optionally, performing information fusion on the first word vector and the second word vector to generate an application word vector includes: performing information fusion on the first word vector and the second word vector by weighted averaging to generate the application word vector.
Optionally, the method further comprises: training a multilayer perceptron model with the risk classification identifiers and terminal application lists of historical users to generate the user risk classification model.
Optionally, inputting the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and a risk probability corresponding to the risk classification identifier further comprises: when the risk classification identifier of the user is an unknown identifier, determining a target classification identifier for the user; and retraining the multilayer perceptron model with the terminal application list of the user and the target classification identifier to update the user risk classification model.
Optionally, determining a target classification identifier for the user includes: determining the target classification identifier for the user through other risk classification models.
According to an aspect of the present disclosure, a user feature data generating apparatus is provided, the apparatus including: a list module, configured to acquire a terminal application list of a user, wherein the terminal application list comprises installed application information; a vector module, configured to respectively input the application information in the terminal application list into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors; a fusion module, configured to perform information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and a feature module, configured to generate feature data of the user through the word vector information.
Optionally, the apparatus further comprises: a model module, configured to input the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and a risk probability corresponding to the risk classification identifier.
Optionally, the apparatus further comprises: a first training module, configured to generate the first word vector model from terminal application lists of historical users by a fast text classification method; and/or a second training module, configured to generate the second word vector model from the terminal application lists of the historical users by a word vector conversion method.
Optionally, the second training module comprises: a dictionary unit, configured to generate a word vector dictionary as the second word vector model by a word vector conversion method; and an input unit, configured to input the application information in the terminal application list into the word vector dictionary to generate the second word vectors.
Optionally, the fusion module includes: a calculation unit, configured to generate a plurality of application word vectors from the plurality of first word vectors and the plurality of second word vectors; and a fusion unit, configured to perform information fusion on the plurality of application word vectors to generate the word vector information of the user.
Optionally, the fusion unit is further configured to: acquire a first word vector and a second word vector corresponding to a single piece of application information; perform information fusion on the first word vector and the second word vector to generate an application word vector; and generate the plurality of application word vectors from the first word vectors and second word vectors corresponding to all application information in the terminal application list.
Optionally, the fusion unit is further configured to perform information fusion on the first word vector and the second word vector by weighted averaging to generate the application word vector.
Optionally, the apparatus further comprises: a third training module, configured to train a multilayer perceptron model with the risk classification identifiers and terminal application lists of historical users to generate the user risk classification model.
Optionally, the third training module further includes: a model updating unit, configured to determine a target classification identifier for the user when the risk classification identifier of the user is an unknown identifier, and to retrain the multilayer perceptron model with the terminal application list of the user and the target classification identifier to update the user risk classification model.
Optionally, the model updating unit is further configured to determine the target classification identifier for the user through other risk classification models.
According to an aspect of the present disclosure, an electronic device is provided, the electronic device including: one or more processors; and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.
According to an aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, carries out the method described above.
According to the user characteristic data generation method, apparatus, electronic device and computer-readable medium of the present disclosure, a terminal application list of a user is acquired, wherein the terminal application list comprises installed application information; the application information in the terminal application list is respectively input into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors; information fusion is performed on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and feature data of the user is generated through the word vector information. In this way, the features of the user can be accurately analyzed from multiple dimensions, data that accurately describes the features of the user can be generated, and a more comprehensive risk analysis can be performed on the user through the user's feature data.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are merely some embodiments of the present disclosure, and other drawings may be derived from those drawings by those of ordinary skill in the art without inventive effort.
Fig. 1 is a system block diagram illustrating a user characteristic data generation method and apparatus according to an exemplary embodiment.
Fig. 2 is a flow chart illustrating a method of user characteristic data generation according to an exemplary embodiment.
Fig. 3 is a flow chart illustrating a method of user characteristic data generation according to another exemplary embodiment.
Fig. 4 is a schematic diagram illustrating a user characteristic data generation method according to another exemplary embodiment.
Fig. 5 is a flow chart illustrating a method of user characteristic data generation according to another exemplary embodiment.
Fig. 6 is a block diagram illustrating a user characteristic data generating apparatus according to an example embodiment.
Fig. 7 is a block diagram illustrating a user characteristic data generating apparatus according to another exemplary embodiment.
Fig. 8 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Fig. 9 is a block diagram illustrating a computer-readable medium in accordance with an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another. Thus, a first component discussed below may be termed a second component without departing from the teachings of the disclosed concept. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It is to be understood by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or processes shown in the drawings are not necessarily required to practice the present disclosure and are, therefore, not intended to limit the scope of the present disclosure.
With the development of internet information technology, smartphones have become an indispensable part of daily life. Various apps provide different functions and bring convenience and enjoyment to people's lives. The apps installed on a mobile phone are closely tied to the user's personal preferences; in other words, a person's app installation profile can be regarded as a descriptive feature of that person, which helps in understanding the customer, perceiving customer risk, and inferring customer preferences.
The inventors of the present disclosure have found that there are currently two main approaches to mining features from app installation information. The first is classification statistics, in which each app is assigned to a second-level or third-level category. This classification observes each app at a coarser granularity and uses the result as a customer feature. Apart from categories with strong financial attributes or fraud-related categories, however, general app categories are often not effective enough for identifying customer risk. The second is to analyze and aggregate event-tracking ("buried point") data that records a customer's detailed usage within a single app. Such data is relatively private and hard to obtain, and is available only to the merchant that operates that app.
The present disclosure therefore analyzes the app installation list as a whole and uses the list in its entirety to describe and infer customer preferences, which characterizes those preferences more accurately than classification information does. In addition, this data can be collected when the user registers or submits an application, so it has broader applicability than event-tracking data. The user characteristic data generation method of the present disclosure is described in detail below with reference to specific embodiments.
Fig. 1 is a system block diagram illustrating a user characteristic data generation method and apparatus according to an exemplary embodiment.
As shown in Fig. 1, the system architecture 10 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a financial services application, a shopping application, a web browser application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as a background management server that supports the financial services websites browsed by users on the terminal devices 101, 102, and 103. The background management server may analyze the received user data and feed back the processing result (e.g., user characteristic data) to the administrator of the financial services website.
The server 105 may, for example, obtain a terminal application list of the user, where the terminal application list includes installed application information; the server 105 may, for example, input the application information in the terminal application list into the first word vector model and the second word vector model, respectively, and generate a plurality of first word vectors and a plurality of second word vectors; the server 105 may, for example, perform information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; the server 105 may generate feature data of the user, for example, from the word vector information.
Server 105 may also generate a risk classification identification for the user and its corresponding risk probability, for example, by inputting the user's word vector data into a user risk classification model.
The server 105 may also, for example, generate the first word vector model from terminal application lists of historical users by a fast text classification method, and/or generate the second word vector model from the terminal application lists of the historical users by a word vector conversion method.
The server 105 may be a single physical server or may be composed of a plurality of servers. For example, part of the server 105 may be used to generate the feature data of the user through the word vector information; part of the server 105 may be used to input the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and a risk probability corresponding to the risk classification identifier; and part of the server 105 may be used to generate the first word vector model from terminal application lists of historical users by a fast text classification method, and/or to generate the second word vector model from the terminal application lists of the historical users by a word vector conversion method.
It should be noted that the user characteristic data generation method provided by the embodiments of the present disclosure may be executed by the server 105; accordingly, the user characteristic data generation apparatus may be disposed in the server 105. The web page through which the user browses the financial service platform is generally located on the terminal devices 101, 102 and 103.
Fig. 2 is a flow chart illustrating a method of user characteristic data generation according to an exemplary embodiment. The user characteristic data generating method 20 includes at least steps S202 to S208.
As shown in Fig. 2, in S202, a terminal application list of the user is obtained, where the terminal application list includes installed application information. The terminal application list records the apps installed on the user's mobile terminal.
In S204, the application information in the terminal application list is respectively input into the first word vector model and the second word vector model, and a plurality of first word vectors and a plurality of second word vectors are generated.
In one embodiment, the method further comprises: generating the first word vector model from terminal application lists of historical users by a fast text classification method; and/or generating the second word vector model from the terminal application lists of the historical users by a word vector conversion method.
More specifically, the first word vector model may be generated by a fast text classification method (fastText). fastText is a text classifier open-sourced by Facebook AI Research in 2016, and its main characteristic is speed. Compared with other text classification models, such as SVM, logistic regression and neural network models, fastText greatly shortens training time while maintaining classification performance.
More specifically, the second word vector model may be generated by a word vector conversion method (word2vec). word2vec is a tool for computing word vectors that can be trained efficiently on vocabularies of millions of words and datasets of billions of words; the word vectors (word embeddings) it produces measure the similarity between words well.
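As a hedged illustration of how word vectors can measure similarity, the sketch below computes the cosine similarity between app vectors. Cosine similarity is a common choice and is an assumption here; the disclosure does not name a specific similarity measure. The app names and three-dimensional vectors are hypothetical toy values, not the output of a trained word2vec model.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings for three apps (assumed values for illustration).
bank_app = [0.9, 0.1, 0.0]
loan_app = [0.8, 0.3, 0.1]
game_app = [0.0, 0.2, 0.9]

sim_bank_loan = cosine_similarity(bank_app, loan_app)
sim_bank_game = cosine_similarity(bank_app, game_app)
```

With these toy values, the two finance-related apps score as more similar to each other than the banking app and the game app do.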
In one embodiment, respectively inputting the application information in the terminal application list into the first word vector model and the second word vector model to generate a plurality of first word vectors and a plurality of second word vectors includes: generating a word vector dictionary as the second word vector model by a word vector conversion method; and inputting the application information in the terminal application list into the word vector dictionary to generate the second word vectors.
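The dictionary lookup described above might be sketched as follows. This is a minimal illustration under stated assumptions: `WORD_VECTOR_DICT` is a hypothetical hand-written stand-in for a dictionary exported from a trained word2vec model, the app names are invented, and skipping apps that are missing from the dictionary is one possible policy that the disclosure does not specify.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical word vector dictionary (app name -> vector); in practice it
# would be produced by training word2vec on historical users' app lists.
WORD_VECTOR_DICT: Dict[str, Vector] = {
    "bank_app": [0.9, 0.1, 0.0],
    "loan_app": [0.8, 0.3, 0.1],
    "game_app": [0.0, 0.2, 0.9],
}


def lookup_second_word_vectors(
    app_list: List[str], dictionary: Dict[str, Vector]
) -> Dict[str, Vector]:
    """Return the vector of every app in app_list that the dictionary contains.

    Apps absent from the dictionary are skipped (an assumed policy).
    """
    return {app: dictionary[app] for app in app_list if app in dictionary}


user_apps = ["bank_app", "game_app", "unknown_app"]
second_vectors = lookup_second_word_vectors(user_apps, WORD_VECTOR_DICT)
```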
In S206, the plurality of first word vectors and the plurality of second word vectors are subjected to information fusion to generate word vector information of the user. This step includes: generating a plurality of application word vectors from the plurality of first word vectors and the plurality of second word vectors; and performing information fusion (meta-embedding) on the plurality of application word vectors to generate the word vector information of the user.
Information fusion here is a new scheme for fusing user word vectors. It exploits the complementarity among different embeddings (methods for mapping discrete data to continuous vectors) by using several embeddings for information fusion at the same time.
In one embodiment, generating a plurality of application word vectors from the plurality of first word vectors and the plurality of second word vectors comprises: acquiring a first word vector and a second word vector corresponding to a single piece of application information; performing information fusion on the first word vector and the second word vector to generate an application word vector; and generating the plurality of application word vectors from the first word vectors and second word vectors corresponding to all application information in the terminal application list.
The step of "performing information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user" is described in detail in the embodiment corresponding to Fig. 3.
In S208, feature data of the user is generated through the word vector information.
In one embodiment, the method further comprises: inputting the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and a risk probability corresponding to the risk classification identifier.
According to the user characteristic data generation method of the present disclosure, the word vector data of the user is input into the user risk classification model. That is, a meta-embedding vector is automatically generated from the user's app information and then fed into a multilayer perceptron, which classifies the sample. Classification over risk categories such as overdue repayment, gambling and multi-head lending (borrowing from multiple lenders) can be understood as the perception of known risks. Once the model can judge the risk of a sample from the user's app information, it can be connected to the existing auditing system to enable automatic batch auditing, further improving the efficiency and precision of the auditing system.
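The classification step might look roughly like the forward pass below. It is a sketch under stated assumptions, not the trained model of the disclosure: the network has a single hidden layer with ReLU activation and a softmax output, the weights are hand-set toy values rather than learned parameters, and the label names in `RISK_LABELS` are illustrative.

```python
import math
from typing import List, Tuple

# Illustrative label set; the actual identifiers are not given in the text.
RISK_LABELS = ["overdue", "gambling", "multi_head_lending"]


def dense(inputs: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify_user(user_vector, w1, b1, w2, b2) -> Tuple[str, float]:
    """Return the most likely risk label and its probability."""
    hidden = relu(dense(user_vector, w1, b1))
    probs = softmax(dense(hidden, w2, b2))
    best = max(range(len(probs)), key=probs.__getitem__)
    return RISK_LABELS[best], probs[best]


# Toy parameters (assumed, not trained): 2 inputs -> 2 hidden units -> 3 classes.
W1 = [[1.0, 0.0], [0.0, 1.0]]
B1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
B2 = [0.0, 0.0, 0.0]

label, probability = classify_user([0.2, 2.0], W1, B1, W2, B2)
```

In a real system the weights would be learned from historical users' risk classification identifiers and app lists, as the embodiment describes.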
According to the user characteristic data generation method of the present disclosure, a terminal application list of a user is acquired, wherein the terminal application list comprises installed application information; the application information in the terminal application list is respectively input into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors; information fusion is performed on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and feature data of the user is generated through the word vector information. In this way, the features of the user can be accurately analyzed from multiple dimensions, data that accurately describes the features of the user can be generated, and a more comprehensive risk analysis can be performed on the user through the user's feature data.
It should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
Fig. 3 is a flow chart illustrating a method of user characteristic data generation according to another exemplary embodiment. The flow shown in fig. 3 is a detailed description of S206 "performing information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user" in the flow shown in fig. 2.
As shown in Fig. 3, in S302, a first word vector and a second word vector corresponding to a single piece of application information are obtained. For a single app, fastText and word2vec are each used for prediction, yielding the fastText vector representation and the word2vec vector representation of that app.
In S304, information fusion is performed on the first word vector and the second word vector to generate an application word vector. The application word vector may be generated, for example, by fusing the first word vector and the second word vector by weighted average. That is, the two word-embedding vectors of a single app are fused to obtain the meta-embedding representation of that app.
Performing meta-embedding on each piece of app information in this way yields a meta-embedding vector for each app.
In S306, the plurality of application word vectors are generated from the plurality of first word vectors and the plurality of second word vectors corresponding to all application information in the terminal application list.
In S308, information fusion is performed on the plurality of application word vectors to generate the word vector information of the user. The word vector information of the user may be generated, for example, by fusing the plurality of application word vectors by weighted average; that is, the final word vector information for the client can be obtained by taking a weighted average of the meta-embedding vectors of all apps.
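The two-level fusion of S302 to S308 can be sketched as follows. The sketch assumes that the first and second word vectors of an app have the same dimension, that the two models are weighted equally, and that all apps are weighted equally by default; the disclosure only states that weighted averaging may be used, so the function names and default weights here are illustrative.

```python
def weighted_average(vectors, weights):
    """Element-wise weighted average of equally sized vectors."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per vector and at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]


def fuse_app(first_vec, second_vec, first_weight=0.5):
    """S304: fuse the two word vectors of a single app into one application word vector."""
    return weighted_average([first_vec, second_vec],
                            [first_weight, 1.0 - first_weight])


def fuse_user(first_vectors, second_vectors, app_weights=None):
    """S306 and S308: build one vector per app, then average them into the user vector."""
    app_vectors = [fuse_app(f, s) for f, s in zip(first_vectors, second_vectors)]
    if app_weights is None:
        app_weights = [1.0] * len(app_vectors)  # assumption: equal weight per app
    return weighted_average(app_vectors, app_weights)


# Toy vectors for two apps (e.g. fastText-style and word2vec-style outputs).
first = [[1.0, 0.0], [0.0, 2.0]]
second = [[3.0, 0.0], [0.0, 4.0]]
user_vector = fuse_user(first, second)  # [1.0, 1.5]
```

If the two word vector models produce vectors of different dimensions, a weighted average is not directly applicable and some projection or concatenation step would be needed; the source does not describe that case.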
Fig. 4 is a schematic diagram illustrating a user characteristic data generation method according to another exemplary embodiment. As shown in Fig. 4, the app information of the client is converted into a list-level app meta-embedding vector as follows. First, pre-trained models are trained using the existing app information data to obtain a fastText model and a word2vec model. The app data is then input into the trained word2vec and fastText models to obtain vectors: for each single app, fastText and word2vec are used for prediction to obtain the fastText vector representation and the word2vec vector representation of that app, and the two word-embedding representations are then combined by weighted average to obtain the information-fusion representation of that app. Performing this information fusion on each piece of app information yields a fusion vector for each app. Finally, the fusion vectors of all apps are combined by weighted average to obtain the final feature data of the client.
Fig. 5 is a flow chart illustrating a method of user characteristic data generation according to another exemplary embodiment.
As shown in fig. 5, in S502, the multi-layered perceptron model is trained through the risk classification identifier of the historical user and the terminal application list, so as to generate the user risk classification model.
In S504, the word vector data of the user is input into the user risk classification model to generate a risk classification identifier of the user and a risk probability corresponding to the risk classification identifier.
In S506, when the risk classification identifier of the user is an unknown identifier, a target classification identifier is determined for the user. This may include determining the target classification identifier for the user through other risk classification models.
In S508, the multi-layered perceptron model is retrained through the terminal application list of the user and the target classification identifier to update the user risk classification model.
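The control flow of S506 and S508 can be sketched as below. This is a hedged illustration only: the `UNKNOWN` marker, the helper names and the keyword-based fallback model are hypothetical, and the actual retraining of the multi-layer perceptron is left out; the sketch only shows how a sample with an unknown identifier might receive a target classification identifier from other models and be queued for retraining.

```python
UNKNOWN = "unknown"  # hypothetical marker for an unknown risk classification identifier


def resolve_target_label(app_list, fallback_models):
    """Ask each fallback model in turn; return the first label that is not unknown."""
    for model in fallback_models:
        label = model(app_list)
        if label != UNKNOWN:
            return label
    return UNKNOWN


def collect_retraining_sample(training_set, app_list, predicted_label, fallback_models):
    """Append (app_list, target_label) to training_set when the prediction was unknown.

    Returns True when a new sample was added, i.e. when retraining is warranted.
    """
    if predicted_label != UNKNOWN:
        return False
    target = resolve_target_label(app_list, fallback_models)
    if target == UNKNOWN:
        return False
    training_set.append((list(app_list), target))
    return True


def keyword_rule_model(app_list):
    """Hypothetical stand-in for 'other risk classification models'."""
    return "gambling" if any("casino" in app for app in app_list) else UNKNOWN


training_set = []
added = collect_retraining_sample(
    training_set, ["casino_royale", "mail"], UNKNOWN, [keyword_rule_model])
```

After enough new samples accumulate, the multi-layer perceptron would be trained again on the extended training set to update the user risk classification model.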
A multi-layer perceptron (MLP) is used to grade the risk of users. The main idea is to analyze the meta-embedding of the user's app information and associate the resulting meta-embedding with known risk classifications, thereby obtaining information representations of users under different risk classifications and enabling risks to be predicted and classified in advance.
The risk of the user may be graded, for example, as follows. A neural network can approximately provide a confidence level for each user, so different users obtain different confidence values from the multi-layer perceptron; for example, 0.1 indicates that the user has about a 10% probability of being risky, and 0.9 indicates about a 90% probability. Based on the multi-layer perceptron's predicted risk confidence, users can be divided into high, medium and low risk grades. High-risk users are subjected to management and control, low-risk users are allowed to pass, and medium-risk users are passed downstream, where mechanisms such as verification grade and classify the medium risks further.
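A minimal sketch of this grading is given below. The threshold values 0.3 and 0.7 and the action names are assumptions made for illustration; the disclosure does not state where the boundaries between low, medium and high risk lie.

```python
def grade_risk(probability, low_threshold=0.3, high_threshold=0.7):
    """Map a risk confidence in [0, 1] to a grade; thresholds are hypothetical."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= high_threshold:
        return "high"
    if probability < low_threshold:
        return "low"
    return "medium"


# Handling per grade, following the text: control, pass, or send downstream for verification.
ACTIONS = {
    "high": "control",
    "medium": "downstream_verification",
    "low": "pass",
}

grade = grade_risk(0.9)
action = ACTIONS[grade]
```

In practice the thresholds would be tuned on historical data so that the share of users sent to downstream verification stays manageable.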
When the risk classification identifier of the user is an unknown identifier, other means are used to detect the new abnormal risk. The new risk is given a new classification label, the new classification label is added to the training of the MLP model, and the MLP or other models then continue to be used to identify abnormal risks.
According to the user characteristic data generation method of the present disclosure, existing risks can be learned and judged accurately. The method can replace the existing manual analysis of risk methods, thereby achieving accurate perception of those risk methods.
According to the user characteristic data generation method disclosed by the invention, the processing of the app information by manual experience can be replaced, and the time-consuming and labor-consuming manual analysis is avoided by vectorizing the user app information.
According to the user characteristic data generation method of the present disclosure, in addition to basic risk perception, meta-embedding can be used to produce a high-level representation of user information, and fusing different word embeddings further yields a depiction of the user's risk behavior.
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments may be implemented as computer programs executed by a CPU. When executed by the CPU, the programs perform the functions defined by the above-described methods provided by the present disclosure. The programs may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 6 is a block diagram illustrating a user characteristic data generating apparatus according to an example embodiment. As shown in fig. 6, the user feature data generation device 60 includes: list module 602, vector module 604, fusion module 606, feature module 608.
The list module 602 is configured to obtain a terminal application list of a user, where the terminal application list includes installed application information;
the vector module 604 is configured to input the application information in the terminal application list into a first word vector model and a second word vector model, respectively, and generate a plurality of first word vectors and a plurality of second word vectors;
the fusion module 606 is configured to perform information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user. The fusion module 606 includes: a calculation unit for generating a plurality of application word vectors from the plurality of first word vectors and the plurality of second word vectors; and a fusion unit for performing information fusion on the plurality of application word vectors to generate the word vector information of the user. The fusion unit is further used for acquiring a first word vector and a second word vector corresponding to a single piece of application information; performing information fusion on the first word vector and the second word vector to generate an application word vector; and generating the plurality of application word vectors from the plurality of first word vectors and the plurality of second word vectors corresponding to all application information in the terminal application list. The fusion unit is further configured to perform information fusion on the first word vector and the second word vector by weighted average to generate the application word vector.
The feature module 608 is configured to generate feature data of the user through the word vector information.
Fig. 7 is a block diagram illustrating a user characteristic data generating apparatus according to another exemplary embodiment. As shown in fig. 7, the user feature data generation device 70 includes: a model module 702, a first training module 704, a second training module 706, and a third training module 708.
The model module 702 is configured to input the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and a risk probability corresponding to the risk classification identifier.
The first training module 704 is used for generating a first word vector model through a terminal application list of a historical user and a fast text classification method; and/or
The second training module 706 is configured to generate a second word vector model through a terminal application list of the historical user and a word vector conversion method. The second training module 706 comprises: a dictionary unit for generating a word vector dictionary as the second word vector model by the word vector conversion method; and an input unit for inputting the application information in the terminal application list into the word vector dictionary to generate the second word vector.
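The dictionary unit and the input unit can be pictured as a plain lookup table from app name to vector. The sketch below assumes the vectors have already been produced by a trained word2vec-style model; the app names and numbers are toy values, and mapping unseen apps to a zero vector is an assumption, since the disclosure does not say how apps missing from the dictionary are handled.

```python
def build_word_vector_dictionary(app_names, vectors):
    """Dictionary unit: pair each known app name with its trained word vector."""
    if len(app_names) != len(vectors):
        raise ValueError("need exactly one vector per app name")
    return {name: list(vec) for name, vec in zip(app_names, vectors)}


def lookup_second_word_vectors(app_list, dictionary, dim):
    """Input unit: look up each installed app; unseen apps get a zero vector (assumption)."""
    zero = [0.0] * dim
    return [list(dictionary.get(app, zero)) for app in app_list]


# Toy dictionary standing in for the output of a trained word vector model.
dictionary = build_word_vector_dictionary(
    ["bank_app", "game_app"], [[0.2, 0.8], [0.9, 0.1]])
vectors = lookup_second_word_vectors(["game_app", "unseen_app"], dictionary, dim=2)
```

A subword-based model such as fastText can produce vectors for names it has not seen, which is one possible reason for pairing it with a dictionary-style model.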
The third training module 708 is configured to train the multi-layer perceptron model through the risk classification identifier of the historical user and the terminal application list, and generate the user risk classification model. The third training module 708 further comprises: the model updating unit is used for determining a target classification identifier for the user when the risk classification identifier of the user is an unknown identifier; and training the multilayer perceptron model again through the terminal application list of the user and the target classification identifier so as to update the user risk classification model. And the model updating unit is also used for determining a target classification identifier for the user through other risk classification models.
According to the user characteristic data generation device of the present disclosure, a terminal application list of a user is obtained, wherein the terminal application list includes installed application information; the application information in the terminal application list is input into a first word vector model and a second word vector model, respectively, to generate a plurality of first word vectors and a plurality of second word vectors; information fusion is performed on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and feature data of the user is generated from the word vector information. In this way, the features of the user can be analyzed accurately from multiple dimensions, data that accurately describes the features of the user can be generated, and a more comprehensive risk analysis of the user can be performed using the user's feature data.
FIG. 8 is a block diagram illustrating an electronic device in accordance with an example embodiment.
An electronic device 800 according to this embodiment of the disclosure is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 8, electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one memory unit 820, a bus 830 connecting the various system components (including the memory unit 820 and the processing unit 810), a display unit 840, and the like.
The storage unit stores program code that can be executed by the processing unit 810, so that the processing unit 810 performs the steps according to various exemplary embodiments of the present disclosure described in the user characteristic data generation method section above in this specification. For example, the processing unit 810 may perform the steps shown in Figs. 2, 3 and 5.
The memory unit 820 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read only memory unit (ROM) 8203.
The memory unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 800' (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 860. The network adapter 860 may communicate with other modules of the electronic device 800 via the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, as shown in fig. 9, the technical solution according to the embodiment of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above method according to the embodiment of the present disclosure.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following functions: acquiring a terminal application list of a user, wherein the terminal application list comprises installed application information; respectively inputting the application information in the terminal application list into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors; performing information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and generating feature data of the user through the word vector information.
Those skilled in the art will appreciate that the modules described above may be distributed in the apparatus as described in the embodiments, or may be modified accordingly and located in one or more apparatuses different from those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that the present disclosure is not limited to the precise arrangements or instrumentalities described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for generating user characteristic data, comprising:
acquiring a terminal application list of a user, wherein the terminal application list comprises installed application information;
respectively inputting the application information in the terminal application list into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors;
performing information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and
generating feature data of the user through the word vector information.
2. The method of claim 1, further comprising:
inputting the word vector data of the user into a user risk classification model to generate a risk classification identifier of the user and a risk probability corresponding to the risk classification identifier.
3. The method of claims 1-2, further comprising:
generating a first word vector model through a terminal application list of a historical user and a rapid text classification method; and/or
generating a second word vector model through a terminal application list of the historical user and a word vector conversion method.
4. The method of claims 1-3, wherein inputting application information in the terminal application list into a first word vector model and a second word vector model, respectively, to generate a plurality of first word vectors and a plurality of second word vectors, comprises:
generating a word vector dictionary as the second word vector model by a word vector conversion method; and
inputting the application information in the application terminal list into the word vector dictionary to generate the second word vector.
5. The method of claims 1-4, wherein fusing the first plurality of word vectors and the second plurality of word vectors to generate word vector information for the user comprises:
generating a plurality of application word vectors by a plurality of first word vectors and the plurality of second word vectors; and
performing information fusion on the plurality of application word vectors to generate word vector information of the user.
6. The method of claims 1-5, wherein generating a plurality of application word vectors from a plurality of first word vectors and the plurality of second word vectors comprises:
acquiring a first word vector and a second word vector corresponding to single application information;
performing information fusion on the first word vector and the second word vector to generate an application word vector; and
generating the plurality of application word vectors through a plurality of first word vectors and a plurality of second word vectors corresponding to all application information in the terminal application list.
7. The method of claims 1-6, wherein fusing information of the first word vector and the second word vector to generate an application word vector comprises:
performing information fusion on the first word vector and the second word vector by weighted average to generate the application word vector.
8. A user characteristic data generation apparatus, comprising:
a list module for acquiring a terminal application list of a user, wherein the terminal application list comprises installed application information;
a vector module for respectively inputting the application information in the terminal application list into a first word vector model and a second word vector model to generate a plurality of first word vectors and a plurality of second word vectors;
a fusion module for performing information fusion on the plurality of first word vectors and the plurality of second word vectors to generate word vector information of the user; and
a feature module for generating the feature data of the user through the word vector information.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN201911263161.9A 2019-12-11 2019-12-11 User characteristic data generation method and device and electronic equipment Active CN111191677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911263161.9A CN111191677B (en) 2019-12-11 2019-12-11 User characteristic data generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111191677A true CN111191677A (en) 2020-05-22
CN111191677B CN111191677B (en) 2023-09-26

Family

ID=70707771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911263161.9A Active CN111191677B (en) 2019-12-11 2019-12-11 User characteristic data generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111191677B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850662A (en) * 2015-06-08 2015-08-19 浙江每日互动网络科技有限公司 User portrait based mobile terminal intelligent message pushing method, server and system
US20160140561A1 (en) * 2013-07-03 2016-05-19 Google Inc. Fraud prevention based on user activity data
CN106651057A (en) * 2017-01-03 2017-05-10 有米科技股份有限公司 Mobile terminal user age prediction method based on installation package sequence table
CN107705156A (en) * 2017-10-16 2018-02-16 深圳大宇无限科技有限公司 User feature analysis method and device
CN108416663A (en) * 2018-01-18 2018-08-17 阿里巴巴集团控股有限公司 The method and device of the financial default risk of assessment
KR20180121466A (en) * 2017-04-06 2018-11-07 네이버 주식회사 Personalized product recommendation using deep learning
CN108845986A (en) * 2018-05-30 2018-11-20 中兴通讯股份有限公司 A kind of sentiment analysis method, equipment and system, computer readable storage medium
CN109582796A (en) * 2018-12-05 2019-04-05 深圳前海微众银行股份有限公司 Generation method, device, equipment and the storage medium of enterprise's public sentiment event network
CN110134948A (en) * 2019-04-23 2019-08-16 北京淇瑀信息科技有限公司 A kind of Financial Risk Control method, apparatus and electronic equipment based on text data
CN110134793A (en) * 2019-05-28 2019-08-16 电子科技大学 Text sentiment classification method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666382A (en) * 2020-06-19 2020-09-15 中信银行股份有限公司 User feature extraction method and device, electronic equipment and readable storage medium
CN112183630A (en) * 2020-09-28 2021-01-05 中国平安人寿保险股份有限公司 Embedded vector generation method, device, equipment and medium based on embedded point hierarchy
CN112183630B (en) * 2020-09-28 2023-09-26 中国平安人寿保险股份有限公司 Embedding vector generation method, device, equipment and medium based on embedded point level
CN111966730A (en) * 2020-10-23 2020-11-20 北京淇瑀信息科技有限公司 Risk prediction method and device based on permanent premises and electronic equipment

Also Published As

Publication number Publication date
CN111191677B (en) 2023-09-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant