WO2021093320A1 - Method and apparatus for outputting information - Google Patents

Method and apparatus for outputting information

Info

Publication number
WO2021093320A1
Authority
WO
WIPO (PCT)
Prior art keywords
values
variable
value
characteristic
discrete
Prior art date
Application number
PCT/CN2020/095193
Other languages
English (en)
French (fr)
Inventor
刘昊骋
李原
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Priority to EP20887795.1A (patent EP3901789A4)
Priority to KR1020217022835A (patent KR20210097204A)
Priority to JP2021541618A (patent JP7288062B2)
Publication of WO2021093320A1
Priority to US17/379,781 (patent US20210349920A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Definitions

  • the embodiments of the present application relate to the field of computer technology, and in particular to methods and devices for outputting information.
  • Big data brings high data dimensionality and large data volume to finance-related businesses. How to mine the credit characteristics of the relevant financial service objects from big data is an important research topic at present.
  • the embodiments of the present application propose methods and devices for outputting information.
  • an embodiment of the present application provides a method for outputting information, including: acquiring characteristic data of a user, the characteristic data including a user ID, the values of characteristic variables, and a label value corresponding to the user ID; determining the discrete feature variables and the continuous feature variables among the characteristic variables; determining a set of discrete feature variable values corresponding to different label values and a set of continuous feature variable values corresponding to different label values; determining, according to the set of discrete feature variable values and the set of continuous feature variable values corresponding to different label values, a set of feature variable values corresponding to different label values; and outputting the set of feature variable values corresponding to different label values.
  • determining the discrete feature variables and the continuous feature variables among the feature variables includes: for each feature variable, performing the following judgment steps: counting a first number of values of the feature variable and a second number of distinct values; determining the ratio of the second number to the first number; if the second number is greater than a preset number threshold and the ratio is greater than a preset ratio threshold, identifying the feature variable as a continuous feature variable; if the second number is not greater than the preset number threshold or the ratio is not greater than the preset ratio threshold, identifying the feature variable as a discrete feature variable.
  • determining the set of discrete feature variable values corresponding to different label values includes: training a first binary classification model using the values of the discrete feature variables and the label values corresponding to the user IDs; determining the weight of each discrete feature variable according to the first binary classification model; extracting some of the discrete feature variables according to the weight of each discrete feature variable; determining the weight-of-evidence value of each value of the extracted discrete feature variables according to the label values corresponding to the user IDs and a preset weight-of-evidence calculation formula; and determining, according to the obtained weight-of-evidence values, the set of discrete feature variable values corresponding to different label values.
  • determining the set of continuous feature variable values corresponding to different label values includes: training a second binary classification model using the values of the continuous feature variables and the label values corresponding to the user IDs; and determining, according to the decision paths of the second binary classification model, the set of continuous feature variable values corresponding to different label values.
  • determining the set of feature variable values corresponding to different label values according to the set of discrete feature variable values and the set of continuous feature variable values corresponding to different label values includes: for each single label value, determining the intersection or union of the set of discrete feature variable values and the set of continuous feature variable values corresponding to that label value, to obtain the set of feature variable values corresponding to that label value.
  • an embodiment of the present application provides an apparatus for outputting information, including: a data acquiring unit configured to acquire characteristic data of a user, the characteristic data including a user identifier, the values of characteristic variables, and a label value corresponding to the user identifier; a variable classification unit configured to determine the discrete feature variables and the continuous feature variables among the characteristic variables; a first set determining unit configured to determine a set of discrete feature variable values corresponding to different label values and a set of continuous feature variable values corresponding to different label values; a second set determining unit configured to determine, according to the set of discrete feature variable values and the set of continuous feature variable values corresponding to different label values, a set of feature variable values corresponding to different label values; and a set output unit configured to output the set of feature variable values corresponding to different label values.
  • the variable classification unit is further configured to: for each characteristic variable, perform the following judgment steps: count a first number of values of the characteristic variable and a second number of distinct values; determine the ratio of the second number to the first number; if the second number is greater than a preset number threshold and the ratio is greater than a preset ratio threshold, identify the characteristic variable as a continuous characteristic variable; if the second number is not greater than the preset number threshold or the ratio is not greater than the preset ratio threshold, identify the characteristic variable as a discrete characteristic variable.
  • the first set determining unit is further configured to: train a first binary classification model using the values of the discrete feature variables and the label values corresponding to the user identifiers; determine the weight of each discrete feature variable according to the first binary classification model; extract some of the discrete feature variables according to the weight of each discrete feature variable; determine the weight-of-evidence value of each value of the extracted discrete feature variables according to the label values corresponding to the user identifiers and a preset weight-of-evidence calculation formula; and determine, according to the obtained weight-of-evidence values, the set of discrete feature variable values corresponding to different label values.
  • the first set determining unit is further configured to: train a second binary classification model using the values of the continuous feature variables and the label values corresponding to the user identifiers; and determine, according to the decision paths of the second binary classification model, the set of continuous feature variable values corresponding to different label values.
  • the second set determining unit is further configured to: for each single label value, determine the intersection or union of the set of discrete feature variable values and the set of continuous feature variable values corresponding to that label value, to obtain the set of feature variable values corresponding to that label value.
  • an embodiment of the present application provides a server, including: one or more processors; and a storage device on which one or more programs are stored, where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any one of the embodiments of the first aspect.
  • an embodiment of the present application provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the method as described in any embodiment of the first aspect is implemented.
  • the method and device for outputting information provided by the above-mentioned embodiments of the present application may first obtain the characteristic data of the user.
  • the above-mentioned characteristic data may include the user identification, the values of the characteristic variables, and the label value corresponding to the user identification.
  • the above-mentioned characteristic variables are divided, and the discrete characteristic variables and continuous characteristic variables among them are determined.
  • the set of feature variables corresponding to different label values is determined.
  • the set of characteristic variables corresponding to different label values is output.
  • the method of this embodiment can mine the tag value corresponding to the user from the big data, and realizes efficient and automated information mining.
  • Fig. 1 is an exemplary system architecture diagram in which an embodiment of the present application can be applied;
  • Fig. 2 is a flowchart of an embodiment of a method for outputting information according to the present application;
  • Fig. 3 is a schematic diagram of an application scenario of the method for outputting information according to the present application;
  • Fig. 4 is a flowchart of another embodiment of a method for outputting information according to the present application.
  • Fig. 5 is a schematic structural diagram of an embodiment of an apparatus for outputting information according to the present application.
  • Fig. 6 is a schematic structural diagram of a computer system suitable for implementing a server according to an embodiment of the present application.
  • FIG. 1 shows an exemplary system architecture 100 to which an embodiment of the method for outputting information or the apparatus for outputting information of the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
  • the user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • Various communication client applications such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software, may be installed on the terminal devices 101, 102, and 103.
  • the terminal devices 101, 102, 103 may be hardware or software.
  • When the terminal devices 101, 102, 103 are hardware, they can be various electronic devices, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, and so on.
  • When the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. There is no specific limitation here.
  • the server 105 may be a server that provides various services, for example, a back-end server that processes the characteristic data generated by the user through the terminal devices 101, 102, and 103.
  • the back-end server can analyze and process the acquired feature data, and feed back the processing results (for example, the sets of feature variable values corresponding to different label values) to the terminal devices 101, 102, 103.
  • the server 105 may be hardware or software.
  • When the server 105 is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
  • When the server 105 is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. There is no specific limitation here.
  • the method for outputting information provided by the embodiment of the present application is generally executed by the server 105.
  • a device for outputting information is generally provided in the server 105.
  • terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • a process 200 of an embodiment of the method for outputting information according to the present application includes the following steps:
  • Step 201 Acquire characteristic data of the user.
  • the executor of the method for outputting information may obtain the characteristic data of the user through a wired connection or a wireless connection.
  • the above-mentioned users may be users who have registered on a certain website.
  • the above-mentioned characteristic data may include the user identification, the value of the characteristic variable, and the label value corresponding to the user identification.
  • the user identification may be the user's registration ID on the website.
  • the characteristic variable may be the user's age, education, monthly income, monthly consumption amount, and so on.
  • the aforementioned characteristic variables may include discrete characteristic variables and continuous characteristic variables. A discrete characteristic variable is one whose values can only be counted in natural-number or integer units. Conversely, a variable that can take any value within a certain interval is called a continuous characteristic variable.
  • the tag value corresponding to the user can include 0 or 1. Different tag values can indicate different user qualities. For example, a tag value of 0 indicates that the user's credit is poor, and a tag value of 1 indicates that the user's credit is good. Alternatively, a tag value of 0 indicates that the user has the ability to repay, and a tag value of 1 indicates that the user does not have the ability to repay.
  • the executive body can obtain the user's characteristic data from the backend server used to support a certain website, or from the database used to store the user's characteristic data.
  • Step 202 Determine discrete feature variables and continuous feature variables in the feature variables.
  • the executive body can analyze the characteristic variables and determine the discrete characteristic variables and continuous characteristic variables among them. Specifically, the executive body can determine whether a characteristic variable is a discrete characteristic variable or a continuous characteristic variable based on the number of different values of the characteristic variable.
  • the execution body can determine, for each feature variable, whether it is a discrete feature variable or a continuous feature variable through the following judgment steps not shown in FIG. 2: count the first number of values of the feature variable and the second number of distinct values; determine the ratio of the second number to the first number; if the second number is greater than the preset number threshold and the ratio is greater than the preset ratio threshold, identify the feature variable as a continuous feature variable; if the second number is not greater than the preset number threshold or the ratio is not greater than the preset ratio threshold, identify the feature variable as a discrete feature variable.
  • the execution body can count the first number of values of each characteristic variable and the second number of distinct values.
  • For example, the characteristic variable is age.
  • the values of age may include 20, 25, 22, 29, 25, 22, and 26.
  • the first number of values of age is then 7, and the second number of distinct values is 5 (25 and 22 each appear twice).
  • the executive body can calculate the ratio of the second quantity to the first quantity. For the previous example, the above ratio is 5/7. If the second number is greater than the preset number threshold, and the ratio is greater than the preset ratio threshold, then the characteristic variable is identified as a continuous characteristic variable. Otherwise, the characteristic variable is regarded as a discrete characteristic variable.
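  • As an illustration only, the judgment steps above can be sketched in Python as follows. The function name `classify_feature` and the threshold values used in the example calls are assumptions made for this sketch; the application does not specify concrete thresholds.

```python
def classify_feature(values, count_threshold, ratio_threshold):
    """Classify one feature variable as 'continuous' or 'discrete'.

    values          -- all observed values of the feature variable
    count_threshold -- preset number threshold (hypothetical value chosen by the caller)
    ratio_threshold -- preset ratio threshold (hypothetical value chosen by the caller)
    """
    first_number = len(values)        # total number of values
    second_number = len(set(values))  # number of distinct values
    ratio = second_number / first_number if first_number else 0.0
    if second_number > count_threshold and ratio > ratio_threshold:
        return "continuous"
    return "discrete"


# Age values from the example in the text: 7 values, 5 distinct, ratio 5/7.
ages = [20, 25, 22, 29, 25, 22, 26]
result_low_threshold = classify_feature(ages, count_threshold=4, ratio_threshold=0.5)
result_high_threshold = classify_feature(ages, count_threshold=10, ratio_threshold=0.5)
```

  • With the illustrative thresholds above, the same age data is identified as continuous when the number threshold is 4 and as discrete when it is 10, which shows that the outcome depends on how the preset thresholds are chosen.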
  • Step 203 Determine a set of discrete feature variable values corresponding to different label values and a set of continuous feature variable values corresponding to different label values.
  • the executive body can respectively determine the set of discrete feature variable values corresponding to different label values and the set of continuous feature variable values corresponding to different label values. Specifically, the executive body may perform statistics on the characteristic data of a large number of users, and determine the value of the discrete characteristic variable and the value of the continuous characteristic variable that are common among multiple users with the same label value. Then, according to the statistical results, a set of discrete feature variable values and a set of continuous feature variable values corresponding to different label values are obtained.
  • For example, the execution body performs statistics on the characteristic data of 1,000 users and finds that the 780 users with a label value of 1 all have the educational background "postgraduate and above", an age "between 25 and 35", a monthly income of "more than 15,000 yuan", and a monthly consumption of "less than 8,000 yuan". The execution body can then determine that the set of discrete feature variable values corresponding to the label value 1 includes the elements: educational background "postgraduate and above" and age "between 25 and 35". The set of continuous feature variable values corresponding to the label value 1 includes the elements: monthly income "more than 15,000 yuan" and monthly consumption "less than 8,000 yuan".
  • Step 204 Determine the value set of the characteristic variable corresponding to the different label value according to the set of discrete characteristic variable values and the set of continuous characteristic variable values corresponding to different label values.
  • After determining the set of discrete feature variable values and the set of continuous feature variable values corresponding to different label values, the execution body can determine the set of feature variable values corresponding to different label values based on the two.
  • the execution body can determine the set of feature variable values corresponding to different label values according to the following step not shown in FIG. 2: for each single label value, determine the intersection or union of the set of discrete feature variable values and the set of continuous feature variable values corresponding to that label value, to obtain the set of feature variable values corresponding to that label value.
  • the executive body can take the intersection or union of the set of discrete feature variable values corresponding to a single tag value and the set of continuous feature variable values to obtain the set of feature variable values corresponding to a single tag value . It is understandable that you can choose whether to perform the intersection operation or the union operation on the two sets according to the specific situation of the business.
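  • A minimal sketch of this combination step is given below. It assumes that each set element is represented as a (variable, condition) pair and that the sets are stored per label value in dictionaries; the function name `combine_value_sets` and this representation are assumptions for illustration, not part of the application.

```python
def combine_value_sets(discrete_sets, continuous_sets, mode="union"):
    """Combine per-label sets of discrete and continuous feature variable values.

    discrete_sets / continuous_sets -- dict mapping a label value to a set of elements
    mode -- "union" or "intersection", chosen according to the business situation
    """
    if mode not in ("union", "intersection"):
        raise ValueError("mode must be 'union' or 'intersection'")
    combined = {}
    for label in discrete_sets.keys() | continuous_sets.keys():
        d = discrete_sets.get(label, set())
        c = continuous_sets.get(label, set())
        combined[label] = d | c if mode == "union" else d & c
    return combined


# Elements taken from the example for label value 1 in the text.
discrete = {1: {("education", "postgraduate and above"), ("age", "between 25 and 35")}}
continuous = {1: {("monthly income", "more than 15,000 yuan"),
                  ("monthly consumption", "less than 8,000 yuan")}}
feature_value_sets = combine_value_sets(discrete, continuous, mode="union")
```

  • With the literal element representation used in this sketch, the union collects all four conditions for label value 1, while the intersection is non-empty only when the same element appears in both sets.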
  • Step 205 Output a set of characteristic variable values corresponding to different label values.
  • FIG. 3 is a schematic diagram of an application scenario of the method for outputting information according to this embodiment.
  • the server obtains the characteristic data of the user on a financial website.
  • the feature data is processed in steps 201 to 204, and the features corresponding to a label value of 1 (users with good credit quality) are determined to be: 25-40 years old, undergraduate degree or above, monthly income of more than 8,000 yuan, deposits of more than 50,000 yuan, and consumption of less than 10,000 yuan.
  • the features with a label value of 0 (users with poor credit quality) are high school education, monthly income of less than 8,000 yuan, deposits of less than 50,000 yuan, and consumption of more than 10,000 yuan.
  • the characteristic data of the user may be acquired first.
  • the above-mentioned characteristic data may include the user identification, the values of the characteristic variables, and the label value corresponding to the user identification.
  • the above-mentioned characteristic variables are divided, and the discrete characteristic variables and continuous characteristic variables among them are determined.
  • the set of feature variables corresponding to different label values is determined.
  • the set of characteristic variables corresponding to different label values is output.
  • the method of this embodiment can mine the tag value corresponding to the user from the big data, and realizes efficient and automated information mining.
  • FIG. 4 shows a flow 400 of another embodiment of the method for outputting information according to the present application.
  • the method for outputting information in this embodiment may include the following steps:
  • Step 401 Acquire characteristic data of the user.
  • Step 402 Determine discrete feature variables and continuous feature variables in the feature variables.
  • Step 4031 for discrete feature variables, perform steps 4031a to 4031e.
  • Step 4031a Use the values of the discrete feature variables and the label values corresponding to the user IDs to train a first binary classification model.
  • the execution body may use the values of the discrete feature variables and the label values corresponding to the user IDs as training samples to train the first binary classification model.
  • the execution body may obtain the first binary classification model from the values of the discrete feature variables and the label values corresponding to the user IDs by using XGBoost with multiple rounds of training and parameter optimization.
  • XGBoost (eXtreme Gradient Boosting) is derived from the Boosting ensemble learning algorithm, and it has incorporated advantages of the Bagging ensemble learning method as it evolved. Support for custom loss functions through the Gradient Boosting framework improves the algorithm's ability to solve general problems. As a result, XGBoost is used very frequently in academic competitions and in industry, and can be effectively applied to specific scenarios such as classification, regression, and ranking.
  • Step 4031b Determine the weight of each discrete feature variable according to the first binary classification model.
  • After the first binary classification model is trained, the weight of each discrete feature variable can also be obtained.
  • The weight is obtained by adding up the predicted scores of each discrete feature variable over all trees.
  • Step 4031c according to the weight of each discrete feature variable, extract part of the discrete feature variable.
  • the executive body can sort the discrete feature variables according to the weight of each discrete feature variable, and extract the top 10% of the discrete feature variables in the ranking as feature variables for further discussion.
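  • The extraction of the top-ranked discrete feature variables can be sketched as follows. The weights are assumed to be already available as a dictionary; the function name `top_percent_by_weight`, the rounding up, the rule of always keeping at least one variable, and the example weights are all assumptions made for this sketch.

```python
def top_percent_by_weight(weights, percent=10):
    """Return the names of the feature variables ranked in the top `percent` by weight.

    weights -- dict mapping a feature variable name to its weight
    percent -- share of variables to keep, as an integer percentage
    """
    # Sort variable names by descending weight.
    ranked = sorted(weights, key=weights.get, reverse=True)
    # Ceiling division with integers; keeping at least one variable is an assumption.
    keep = max(1, -(-len(ranked) * percent // 100))
    return ranked[:keep]


# Hypothetical weights for 20 discrete feature variables.
example_weights = {f"feature_{i}": float(i) for i in range(20)}
selected = top_percent_by_weight(example_weights, percent=10)
```

  • For the 20 hypothetical variables above, the top 10% corresponds to the two variables with the largest weights.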
  • Step 4031d according to the label value corresponding to the user ID and the preset evidence weight calculation formula, determine the evidence weight value of the value of the extracted partial discrete feature variable.
  • the executive body can also calculate the weight of evidence (WOE) value of each discrete feature variable extracted according to the label value corresponding to the user identification and the preset weight of evidence calculation formula.
  • The weight of evidence (WOE) is calculated as follows:
  • $\mathrm{WOE} = \ln\left(\dfrac{p_1}{p_0}\right) \times 100\%$
  • where $p_1$, the proportion of users with a label of 1, is the number of users with a label of 1 divided by the total number of users,
  • and $p_0$, the proportion of users with a label of 0, is the number of users with a label of 0 divided by the total number of users.
  • Step 4031e Determine the value set of discrete feature variables corresponding to different label values according to the obtained evidence weight value.
  • the execution body can determine the set of discrete feature variable values corresponding to different label values. For example, the execution body can add the values of discrete feature variables with a WOE value > 0 to the set of discrete feature variable values corresponding to the label value 1, and add the values of discrete feature variables with a WOE value ≤ 0 to the set of discrete feature variable values corresponding to the label value 0.
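  • The WOE calculation and the assignment to label-specific sets can be sketched as follows. This sketch assumes that the user counts in the formula are taken among the users who share a given value of the discrete feature variable, that values for which one of the two counts is zero are skipped because the logarithm is undefined, and that a WOE value of exactly 0 is assigned to the label value 0. The function names are hypothetical. The factor of 100% in the formula only expresses the result as a percentage and does not change its sign, so it is omitted here.

```python
import math
from collections import Counter


def woe_by_value(values, labels):
    """Compute WOE = ln(p1 / p0) for each distinct value of one discrete feature variable.

    values -- the feature value of each user
    labels -- the label value (0 or 1) of each user, in the same order
    """
    totals = Counter(values)
    ones = Counter(v for v, y in zip(values, labels) if y == 1)
    result = {}
    for value, total in totals.items():
        n1 = ones.get(value, 0)
        n0 = total - n1
        if n1 == 0 or n0 == 0:
            continue  # ln is undefined here; skipping is an assumption of this sketch
        p1, p0 = n1 / total, n0 / total
        result[value] = math.log(p1 / p0)
    return result


def split_by_woe(woe):
    """Assign values with WOE > 0 to label 1 and the remaining values to label 0."""
    label_1 = {value for value, w in woe.items() if w > 0}
    label_0 = {value for value, w in woe.items() if w <= 0}
    return label_1, label_0


# Hypothetical education values and labels for six users.
education = ["postgraduate", "postgraduate", "postgraduate",
             "high school", "high school", "high school"]
user_labels = [1, 1, 0, 0, 0, 1]
woe = woe_by_value(education, user_labels)
set_for_label_1, set_for_label_0 = split_by_woe(woe)
```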
  • Step 4032 for continuous feature variables, perform steps 4032a to 4032b.
  • Step 4032a using the value of the continuous feature variable and the label value corresponding to the user ID to train to obtain the second binary classification model.
  • the execution body can use the values of each continuous feature variable and the label values corresponding to the user IDs to perform multiple rounds of decision tree training and obtain the decision tree split-point structure, that is, the second binary classification model.
  • Step 4032b Determine the set of continuous feature variable values corresponding to different label values according to the decision paths of the second binary classification model.
  • the set of continuous feature variable values corresponding to the label value 1 can be obtained from the decision paths in the second binary classification model that lead to the label value 1. Likewise, the set of continuous feature variable values corresponding to the label value 0 can be obtained from the decision paths that lead to the label value 0.
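  • The reading of decision paths can be illustrated with the sketch below. It does not train a model; instead it assumes a small, hand-written tree stored as nested dictionaries, where an internal node holds a feature name, a threshold, and `left` (feature ≤ threshold) and `right` (feature > threshold) children, and a leaf holds a label. This representation, the function name `paths_by_label`, and the example thresholds are assumptions for illustration only.

```python
def paths_by_label(node, conditions=()):
    """Collect the root-to-leaf condition lists of a tree, grouped by leaf label."""
    if "label" in node:
        # Leaf: the conditions accumulated so far form one decision path.
        return {node["label"]: [list(conditions)]}
    feature, threshold = node["feature"], node["threshold"]
    branches = (
        ("left", f"{feature} <= {threshold}"),
        ("right", f"{feature} > {threshold}"),
    )
    grouped = {}
    for side, condition in branches:
        child_paths = paths_by_label(node[side], conditions + (condition,))
        for label, paths in child_paths.items():
            grouped.setdefault(label, []).extend(paths)
    return grouped


# Hypothetical split-point structure over two continuous feature variables.
tree = {
    "feature": "monthly_income", "threshold": 15000,
    "left": {"label": 0},
    "right": {
        "feature": "monthly_consumption", "threshold": 8000,
        "left": {"label": 1},
        "right": {"label": 0},
    },
}
paths = paths_by_label(tree)
```

  • In this hypothetical tree, the single path leading to the label value 1 combines a monthly income above 15,000 with a monthly consumption of at most 8,000, and these interval conditions would form the set of continuous feature variable values for that label.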
  • Step 404 For each single label value, determine the intersection or union of the set of discrete feature variable values and the set of continuous feature variable values corresponding to that label value, to obtain the set of feature variable values corresponding to that label value.
  • Step 405 Output a set of characteristic variable values corresponding to different label values.
  • after obtaining the sets of feature variable values corresponding to different label values, the executing body can formulate corresponding rules. For example, according to the set of feature variable values corresponding to label value 1, it can determine the rule "users who are 25 to 40 years old, have a bachelor's degree or above, have a monthly income of more than 8,000 yuan, have deposits of more than 50,000 yuan, and spend less than 10,000 yuan are high-credit-quality users".
  • the method for outputting information provided by the above embodiment of this application can use binary classification models to mine user feature data, which makes the mined information more credible.
  • with further reference to FIG. 5, as an implementation of the methods shown in the above figures, this application provides an embodiment of an apparatus for outputting information.
  • the apparatus embodiment corresponds to the method embodiment shown in FIG. 2.
  • the apparatus can be applied to various electronic devices.
  • the apparatus 500 for outputting information in this embodiment includes: a data acquisition unit 501, a variable classification unit 502, a first set determination unit 503, a second set determination unit 504, and a set output unit 505.
  • the data acquisition unit 501 is configured to acquire characteristic data of the user.
  • the characteristic data includes the user ID, the value of the characteristic variable, and the label value corresponding to the user ID.
  • the variable classification unit 502 is configured to determine discrete feature variables and continuous feature variables among the feature variables.
  • the first set determining unit 503 is configured to determine a set of discrete feature variable values corresponding to different label values and a set of continuous feature variable values corresponding to different label values.
  • the second set determining unit 504 is configured to determine a set of values of characteristic variables corresponding to different label values according to a set of discrete characteristic variable values corresponding to different label values and a set of continuous characteristic variable values.
  • the set output unit 505 is configured to output a set of characteristic variable values corresponding to different label values.
  • in some optional implementations of this embodiment, the variable classification unit 502 may be further configured to perform the following judgment steps for each characteristic variable: count a first quantity of values of the characteristic variable and a second quantity of distinct values; determine the ratio of the second quantity to the first quantity; if the second quantity is greater than a preset quantity threshold and the ratio is greater than a preset ratio threshold, identify the characteristic variable as a continuous characteristic variable; if the second quantity is not greater than the preset quantity threshold or the ratio is not greater than the preset ratio threshold, identify the characteristic variable as a discrete characteristic variable.
  • in some optional implementations of this embodiment, the first set determining unit 503 may be further configured to: use the values of the discrete feature variables and the label values corresponding to the user IDs to train a first binary classification model; determine the weight of each discrete feature variable according to the first binary classification model; extract some of the discrete feature variables according to their weights; determine the weight-of-evidence values of the values of the extracted discrete feature variables according to the label values corresponding to the user IDs and a preset weight-of-evidence formula; and determine the sets of discrete feature variable values corresponding to different label values according to the obtained weight-of-evidence values.
  • in some optional implementations of this embodiment, the first set determining unit 503 may be further configured to: use the values of the continuous feature variables and the label values corresponding to the user IDs to train a second binary classification model; and determine the sets of continuous feature variable values corresponding to different label values according to the decision paths of the second binary classification model.
  • in some optional implementations of this embodiment, the second set determining unit 504 may be further configured to: for each single label value, determine the intersection or union of the corresponding set of discrete feature variable values and the corresponding set of continuous feature variable values, to obtain the set of characteristic variable values corresponding to that label value.
  • the units 501 to 505 recorded in the apparatus 500 for outputting information respectively correspond to the steps in the method described with reference to FIG. 2. Therefore, the operations and features described above for the method for outputting information are also applicable to the device 500 and the units contained therein, and will not be repeated here.
  • FIG. 6 shows a schematic structural diagram of an electronic device (for example, the server in FIG. 1) 600 suitable for implementing the embodiments of the present disclosure.
  • the server shown in FIG. 6 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the electronic device 600 may include a processing device (such as a central processing unit or a graphics processor) 601, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the electronic device 600.
  • the processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • the following devices can be connected to the I/O interface 605: input devices 606 including, for example, touch screens, touch pads, keyboards, mice, cameras, microphones, accelerometers, and gyroscopes; output devices 607 including, for example, liquid crystal displays (LCD), speakers, and vibrators; storage devices 608 including, for example, magnetic tapes and hard disks; and a communication device 609.
  • the communication device 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data.
  • although FIG. 6 shows an electronic device 600 having various devices, it should be understood that it is not required to implement or have all of the illustrated devices. More or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 6 may represent one device, or may represent multiple devices as needed.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602.
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
  • the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device .
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, they cause the electronic device to: acquire characteristic data of the user, where the characteristic data includes the user ID, the values of the characteristic variables, and the label value corresponding to the user ID; determine the discrete characteristic variables and continuous characteristic variables among the characteristic variables; determine the sets of discrete characteristic variable values corresponding to different label values and the sets of continuous characteristic variable values corresponding to different label values; determine the sets of characteristic variable values corresponding to different label values according to the sets of discrete characteristic variable values and the sets of continuous characteristic variable values corresponding to different label values; and output the sets of characteristic variable values corresponding to different label values.
  • the computer program code for performing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • in the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
  • in some alternative implementations, the functions marked in the blocks may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and any combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner.
  • the described units may also be provided in a processor.
  • for example, it may be described as: a processor includes a data acquisition unit, a variable classification unit, a first set determination unit, a second set determination unit, and a set output unit.
  • in some cases, the names of these units do not constitute a limitation on the units themselves.
  • for example, the data acquisition unit can also be described as a "unit for acquiring characteristic data of a user".

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • Technology Law (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

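A method and apparatus for outputting information. The method comprises: acquiring feature data of users (201), the feature data comprising user IDs, values of feature variables, and label values corresponding to the user IDs; determining discrete feature variables and continuous feature variables among the feature variables (202); determining sets of discrete feature variable values corresponding to different label values and sets of continuous feature variable values corresponding to different label values (203); determining, according to the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values, sets of feature variable values corresponding to different label values (204); and outputting the sets of feature variable values corresponding to different label values (205). The method can mine the label values corresponding to users from big data, achieving efficient and automated information mining.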

Description

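Method and apparatus for outputting information
This patent application claims priority to Chinese patent application No. 201911106997.8, filed on November 13, 2019 by Beijing Baidu Netcom Science and Technology Co., Ltd. and entitled "Method and apparatus for outputting information", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for outputting information.
Background
At present, with the development of China's financial industry, the coverage of financial services is gradually expanding. For users who have taken out bank loans or hold personal credit cards issued by commercial banks, the central bank keeps their credit records, such as loan amounts, number of loans, whether repayments were made on time, and the repayment of credit card spending and overdrafts. Commercial banks can pay to retrieve these credit records, but for financial service customers who have never held a credit card and have no loan record, the relevant credit information is lacking.
Big data confronts finance-related businesses with high data dimensionality and large data volume. How to mine the credit characteristics of financial service customers from big data is currently an important research topic.
Summary
The embodiments of the present application propose a method and apparatus for outputting information.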
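In a first aspect, an embodiment of the present application provides a method for outputting information, comprising: acquiring feature data of users, the feature data including user IDs, values of feature variables, and label values corresponding to the user IDs; determining discrete feature variables and continuous feature variables among the feature variables; determining sets of discrete feature variable values corresponding to different label values and sets of continuous feature variable values corresponding to different label values; determining, according to the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values, sets of feature variable values corresponding to different label values; and outputting the sets of feature variable values corresponding to different label values.
In some embodiments, determining the discrete feature variables and continuous feature variables among the feature variables comprises performing the following judgment steps for each feature variable: counting a first quantity of values of the feature variable and a second quantity of distinct values; determining the ratio of the second quantity to the first quantity; if the second quantity is greater than a preset quantity threshold and the ratio is greater than a preset ratio threshold, identifying the feature variable as a continuous feature variable; and if the second quantity is not greater than the preset quantity threshold or the ratio is not greater than the preset ratio threshold, identifying the feature variable as a discrete feature variable.
In some embodiments, determining the sets of discrete feature variable values corresponding to different label values comprises: training a first binary classification model using the values of the discrete feature variables and the label values corresponding to the user IDs; determining the weight of each discrete feature variable according to the first binary classification model; extracting some of the discrete feature variables according to their weights; determining the weight-of-evidence values of the values of the extracted discrete feature variables according to the label values corresponding to the user IDs and a preset weight-of-evidence formula; and determining the sets of discrete feature variable values corresponding to different label values according to the obtained weight-of-evidence values.
In some embodiments, determining the sets of continuous feature variable values corresponding to different label values comprises: training a second binary classification model using the values of the continuous feature variables and the label values corresponding to the user IDs; and determining the sets of continuous feature variable values corresponding to different label values according to the decision paths of the second binary classification model.
In some embodiments, determining the sets of feature variable values corresponding to different label values according to the sets of discrete feature variable values and the sets of continuous feature variable values comprises: for each single label value, determining the intersection or union of the corresponding set of discrete feature variable values and the corresponding set of continuous feature variable values, to obtain the set of feature variable values corresponding to that label value.
In a second aspect, an embodiment of the present application provides an apparatus for outputting information, comprising: a data acquisition unit configured to acquire feature data of users, the feature data including user IDs, values of feature variables, and label values corresponding to the user IDs; a variable classification unit configured to determine discrete feature variables and continuous feature variables among the feature variables; a first set determination unit configured to determine sets of discrete feature variable values corresponding to different label values and sets of continuous feature variable values corresponding to different label values; a second set determination unit configured to determine, according to the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values, sets of feature variable values corresponding to different label values; and a set output unit configured to output the sets of feature variable values corresponding to different label values.
In some embodiments, the variable classification unit is further configured to perform the following judgment steps for each feature variable: counting a first quantity of values of the feature variable and a second quantity of distinct values; determining the ratio of the second quantity to the first quantity; if the second quantity is greater than a preset quantity threshold and the ratio is greater than a preset ratio threshold, identifying the feature variable as a continuous feature variable; and if the second quantity is not greater than the preset quantity threshold or the ratio is not greater than the preset ratio threshold, identifying the feature variable as a discrete feature variable.
In some embodiments, the first set determination unit is further configured to: train a first binary classification model using the values of the discrete feature variables and the label values corresponding to the user IDs; determine the weight of each discrete feature variable according to the first binary classification model; extract some of the discrete feature variables according to their weights; determine the weight-of-evidence values of the values of the extracted discrete feature variables according to the label values corresponding to the user IDs and a preset weight-of-evidence formula; and determine the sets of discrete feature variable values corresponding to different label values according to the obtained weight-of-evidence values.
In some embodiments, the first set determination unit is further configured to: train a second binary classification model using the values of the continuous feature variables and the label values corresponding to the user IDs; and determine the sets of continuous feature variable values corresponding to different label values according to the decision paths of the second binary classification model.
In some embodiments, the second set determination unit is further configured to: for each single label value, determine the intersection or union of the corresponding set of discrete feature variable values and the corresponding set of continuous feature variable values, to obtain the set of feature variable values corresponding to that label value.
In a third aspect, an embodiment of the present application provides a server, comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any embodiment of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any embodiment of the first aspect.
The method and apparatus for outputting information provided by the above embodiments of the present application may first acquire feature data of users. The feature data may include user IDs, values of feature variables, and the label values corresponding to the user IDs. The feature variables are then divided to determine the discrete feature variables and the continuous feature variables among them. The sets of discrete feature variable values corresponding to different label values and the sets of continuous feature variable values corresponding to different label values are determined. According to the resulting correspondence between label values and sets, the sets of feature variable values corresponding to different label values are determined. Finally, the sets of feature variable values corresponding to different label values are output. The method of this embodiment can mine the label values corresponding to users from big data, achieving efficient and automated information mining.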
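Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
FIG. 1 is a diagram of an exemplary system architecture to which an embodiment of the present application can be applied;
FIG. 2 is a flowchart of an embodiment of the method for outputting information according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the method for outputting information according to the present application;
FIG. 4 is a flowchart of another embodiment of the method for outputting information according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the apparatus for outputting information according to the present application;
FIG. 6 is a schematic structural diagram of a computer system of a server suitable for implementing the embodiments of the present application.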
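Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.
It should be noted that the embodiments of the present application and the features of the embodiments can be combined with each other provided there is no conflict. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method or apparatus for outputting information of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
Users can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browsers, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers. When they are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example a back-end server that processes the feature data generated by users through the terminal devices 101, 102, 103. The back-end server can analyze and otherwise process the acquired feature data and feed the processing results (for example, the sets of feature variable values corresponding to different label values) back to the terminal devices 101, 102, 103.
It should be noted that the server 105 may be hardware or software. When the server 105 is hardware, it can be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server 105 is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be noted that the method for outputting information provided by the embodiments of the present application is generally executed by the server 105. Accordingly, the apparatus for outputting information is generally provided in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.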
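With continued reference to FIG. 2, a flow 200 of an embodiment of the method for outputting information according to the present application is shown. The method for outputting information of this embodiment includes the following steps:
Step 201: acquire feature data of users.
In this embodiment, the executing body of the method for outputting information (for example, the server 105 shown in FIG. 1) can acquire the feature data of users through a wired or wireless connection. The users may be users registered on a certain website. The feature data may include user IDs, values of feature variables, and label values corresponding to the user IDs.
The user ID may be the user's registration ID on the website. The feature variables may be the user's age, education, monthly income, monthly spending, and so on. The feature variables may include discrete feature variables and continuous feature variables. A discrete feature variable is one whose values can only be counted in natural numbers or integer units. Conversely, a variable that can take any value within a certain interval is called a continuous feature variable. The label value corresponding to a user may be 0 or 1. Different label values can represent different user qualities. For example, a label value of 0 indicates that the user has poor credit, and a label value of 1 indicates that the user has good credit. Alternatively, a label value of 0 indicates that the user is able to repay, and a label value of 1 indicates that the user is not able to repay.
The executing body may acquire the feature data of users from the back-end server that supports a certain website, or from a database that stores the feature data of users.
Step 202: determine the discrete feature variables and continuous feature variables among the feature variables.
After acquiring the feature data, the executing body can analyze the feature variables to determine which are discrete and which are continuous. Specifically, the executing body can determine whether a feature variable is discrete or continuous according to the number of its distinct values.
In some optional implementations of this embodiment, the executing body may determine the discrete and continuous feature variables by performing the following judgment steps, not shown in FIG. 2, for each feature variable: count a first quantity of values of the feature variable and a second quantity of distinct values; determine the ratio of the second quantity to the first quantity; if the second quantity is greater than a preset quantity threshold and the ratio is greater than a preset ratio threshold, identify the feature variable as a continuous feature variable; if the second quantity is not greater than the preset quantity threshold or the ratio is not greater than the preset ratio threshold, identify the feature variable as a discrete feature variable.
In this implementation, the executing body counts, for each feature variable, the first quantity of its values and the second quantity of its distinct values. For example, suppose the feature variable is age and its values are 20, 25, 22, 29, 25, 22, 26. The first quantity is then 7, and the second quantity is 5 (after removing the duplicate 25 and 22). The executing body then computes the ratio of the second quantity to the first quantity, which is 5/7 in this example. If the second quantity is greater than the preset quantity threshold and the ratio is greater than the preset ratio threshold, the feature variable is identified as a continuous feature variable. Otherwise, it is identified as a discrete feature variable.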
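The judgment steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `classify_feature` and the default thresholds (3 distinct values, ratio 0.5) are hypothetical and were chosen only so that the age example is classified as continuous.

```python
from typing import Hashable, Sequence


def classify_feature(values: Sequence[Hashable],
                     count_threshold: int = 3,      # assumed preset quantity threshold
                     ratio_threshold: float = 0.5   # assumed preset ratio threshold
                     ) -> str:
    """Return "continuous" or "discrete" for one feature variable."""
    first_quantity = len(values)        # number of observed values
    second_quantity = len(set(values))  # number of distinct values
    ratio = second_quantity / first_quantity if first_quantity else 0.0
    if second_quantity > count_threshold and ratio > ratio_threshold:
        return "continuous"
    return "discrete"


ages = [20, 25, 22, 29, 25, 22, 26]  # 7 values, 5 distinct, ratio 5/7
education = ["bachelor", "master", "bachelor", "bachelor",
             "high school", "master", "bachelor"]  # 7 values, 3 distinct

print(classify_feature(ages))       # continuous
print(classify_feature(education))  # discrete
```

In practice both thresholds would be tuned to the data set; with only a handful of samples, as here, almost any numeric column passes the ratio test.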
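Step 203: determine the sets of discrete feature variable values corresponding to different label values and the sets of continuous feature variable values corresponding to different label values.
After determining the discrete and continuous feature variables, the executing body can separately determine the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values. Specifically, the executing body can compile statistics over the feature data of a large number of users and determine the discrete and continuous feature variable values shared by users that have the same label value. From these statistics it obtains the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values. For example, the executing body compiles statistics over the feature data of 1,000 users and finds that, for the 780 users with label value 1, the shared value of the discrete feature variable education is "postgraduate or above", the age is "between 25 and 35", the monthly income is "more than 15,000 yuan", and the monthly spending is "less than 8,000 yuan". The executing body can then determine that the set of discrete feature variable values corresponding to label value 1 includes the elements: education "postgraduate or above" and age "between 25 and 35". It determines that the set of continuous feature variable values corresponding to label value 1 includes the elements: monthly income "more than 15,000 yuan" and monthly spending "less than 8,000 yuan".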
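The statistical approach described above can be illustrated with a short Python sketch. It is deliberately simplified and rests on assumptions: it handles only categorical values compared by equality (ranges such as "more than 15,000 yuan" would first need binning), and the function name `common_values_by_label` and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping, Tuple


def common_values_by_label(
    records: Iterable[Tuple[int, Mapping[str, Hashable]]]
) -> Dict[int, Dict[str, Hashable]]:
    """For each label, keep the features on which all users with that label agree."""
    seen = defaultdict(lambda: defaultdict(set))  # label -> feature -> values seen
    for label, features in records:
        for name, value in features.items():
            seen[label][name].add(value)
    # A feature is "shared" when exactly one value was observed for that label.
    return {
        label: {name: next(iter(values))
                for name, values in feats.items() if len(values) == 1}
        for label, feats in seen.items()
    }


records = [
    (1, {"education": "postgraduate", "city": "A"}),
    (1, {"education": "postgraduate", "city": "B"}),
    (0, {"education": "high school", "city": "A"}),
]
shared = common_values_by_label(records)
print(shared[1])  # {'education': 'postgraduate'}
```

Requiring unanimous agreement is the strictest reading of "shared"; a real data set would more plausibly use a frequency cutoff, which the source does not specify.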
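Step 204: determine the sets of feature variable values corresponding to different label values according to the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values.
After determining the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values, the executing body can determine from the two the sets of feature variable values corresponding to different label values.
In some optional implementations of this embodiment, the executing body may determine the sets of feature variable values corresponding to different label values through the following step, not shown in FIG. 2: for each single label value, determine the intersection or union of the corresponding set of discrete feature variable values and the corresponding set of continuous feature variable values, to obtain the set of feature variable values corresponding to that label value.
In this implementation, the executing body takes the intersection or the union of the set of discrete feature variable values and the set of continuous feature variable values corresponding to a single label value, and obtains the set of feature variable values corresponding to that label value. Whether to intersect or unite the two sets can be chosen according to the specific business situation.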
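One possible reading of the intersection/union step is sketched below in Python. The interpretation is an assumption, since the source does not define the set elements formally: each set is modeled as named conditions on a user record, "intersection" means a user must satisfy both sets, and "union" means satisfying either set is enough. The name `combine_sets` and the sample conditions are hypothetical.

```python
from typing import Callable, Dict, Mapping

Condition = Callable[[Mapping[str, object]], bool]


def combine_sets(discrete: Dict[str, Condition],
                 continuous: Dict[str, Condition],
                 mode: str = "intersection") -> Condition:
    """Build one predicate from the discrete and continuous condition sets."""
    def satisfies(conditions: Dict[str, Condition], user: Mapping[str, object]) -> bool:
        return all(check(user) for check in conditions.values())

    if mode == "intersection":   # user must match both sets
        return lambda user: satisfies(discrete, user) and satisfies(continuous, user)
    if mode == "union":          # matching either set is enough
        return lambda user: satisfies(discrete, user) or satisfies(continuous, user)
    raise ValueError("mode must be 'intersection' or 'union'")


discrete_label1 = {"education": lambda u: u["education"] in {"bachelor", "master"}}
continuous_label1 = {"monthly_income": lambda u: u["monthly_income"] > 8000}

strict = combine_sets(discrete_label1, continuous_label1, "intersection")
loose = combine_sets(discrete_label1, continuous_label1, "union")

user = {"education": "bachelor", "monthly_income": 5000}
print(strict(user), loose(user))  # False True
```

Intersection gives a narrower, higher-precision rule and union a broader one, which matches the source's remark that the choice depends on the business situation.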
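Step 205: output the sets of feature variable values corresponding to different label values.
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for outputting information according to this embodiment. In the application scenario of FIG. 3, the server acquires the feature data of users of a financial website and processes it through steps 201 to 204. It determines that the features for label value 1 (high-credit-quality users) are: age 25 to 40, a bachelor's degree or above, monthly income of more than 8,000 yuan, deposits of more than 50,000 yuan, and spending of less than 10,000 yuan. The features for label value 0 (low-credit-quality users) are: high school education, monthly income of less than 8,000 yuan, deposits of less than 50,000 yuan, and spending of more than 10,000 yuan.
The method for outputting information provided by the above embodiment of the present application may first acquire feature data of users. The feature data may include user IDs, values of feature variables, and the label values corresponding to the user IDs. The feature variables are then divided to determine the discrete feature variables and the continuous feature variables among them. The sets of discrete feature variable values corresponding to different label values and the sets of continuous feature variable values corresponding to different label values are determined. According to the resulting correspondence between label values and sets, the sets of feature variable values corresponding to different label values are determined. Finally, the sets of feature variable values corresponding to different label values are output. The method of this embodiment can mine the label values corresponding to users from big data, achieving efficient and automated information mining.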
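With continued reference to FIG. 4, a flow 400 of another embodiment of the method for outputting information according to the present application is shown. As shown in FIG. 4, the method for outputting information of this embodiment may include the following steps:
Step 401: acquire feature data of users.
Step 402: determine the discrete feature variables and continuous feature variables among the feature variables.
Step 4031: for discrete feature variables, perform steps 4031a to 4031e.
Step 4031a: train a first binary classification model using the values of the discrete feature variables and the label values corresponding to the user IDs.
In this embodiment, the executing body can use the values of each discrete feature variable and the label values corresponding to the user IDs as training samples to train the first binary classification model. Specifically, the executing body can obtain the first binary classification model by applying XGBoost with multiple rounds of training and parameter optimization to these values and label values. XGBoost (eXtreme Gradient Boosting) is an ensemble learning algorithm proposed by Tianqi Chen in 2015. The algorithm derives from the Boosting family of ensemble learning methods and, as it evolved, also incorporated advantages of the Bagging ensemble approach. Its Gradient Boosting framework supports user-defined loss functions, which improves its ability to handle general problems. As a result, XGBoost is used very frequently in academic competitions and in industry, and can be applied effectively to classification, regression, ranking, and other concrete scenarios.
Step 4031b: determine the weight of each discrete feature variable according to the first binary classification model.
After the first binary classification model is trained, a weight for each discrete feature variable can also be obtained. The weight is obtained by summing the prediction scores that each tree assigns to the discrete feature variable.
Step 4031c: extract some of the discrete feature variables according to the weights of the discrete feature variables.
The executing body can sort the discrete feature variables by weight and extract those ranked in the top 10% as the feature variables for further analysis.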
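The ranking and extraction in step 4031c can be sketched as follows. The sketch assumes the per-feature weights have already been obtained from the trained model and are supplied as a plain dictionary; the function name `top_fraction`, the rounding rule (round up, keep at least one feature), and the sample weights are assumptions not stated in the source.

```python
import math
from typing import Dict, List


def top_fraction(weights: Dict[str, float], fraction: float = 0.10) -> List[str]:
    """Return the feature names ranked in the top `fraction` by weight."""
    if not weights:
        return []
    ranked = sorted(weights, key=weights.get, reverse=True)  # highest weight first
    keep = max(1, math.ceil(len(ranked) * fraction))         # assumed rounding rule
    return ranked[:keep]


# Hypothetical weights for ten discrete feature variables.
weights = {"education": 9.5, "marital_status": 4.1, "city_tier": 3.3,
           "gender": 0.8, "phone_brand": 2.2, "occupation": 6.7,
           "has_car": 1.9, "has_house": 5.0, "channel": 0.4, "device_os": 0.6}

print(top_fraction(weights))        # ['education']
print(top_fraction(weights, 0.25))  # ['education', 'occupation', 'has_house']
```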
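Step 4031d: determine the weight-of-evidence values of the values of the extracted discrete feature variables according to the label values corresponding to the user IDs and a preset weight-of-evidence formula.
The executing body can also compute the weight-of-evidence (WOE) value of each value of the extracted discrete feature variables according to the label values corresponding to the user IDs and a preset weight-of-evidence formula. The preset formula may be as follows:
WOE = ln(proportion of users with label 1 / proportion of users with label 0) × 100%,
where the proportion of users with label 1 = number of users with label 1 / total number of users, and the proportion of users with label 0 = number of users with label 0 / total number of users.
Step 4031e: determine the sets of discrete feature variable values corresponding to different label values according to the obtained weight-of-evidence values.
After determining the WOE value of each value of the extracted discrete feature variables, the executing body can determine the sets of discrete feature variable values corresponding to different label values. For example, the executing body can add the discrete feature variable values with WOE > 0 to the set corresponding to label value 1, and add the discrete feature variable values with WOE ≤ 0 to the set corresponding to label value 0.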
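Steps 4031d and 4031e can be sketched in Python as below. The source formula leaves open what "total number of users" refers to, so this sketch assumes the conventional per-value definition: for each value, the share of label-1 users having that value is divided by the share of label-0 users having it. That reading, the names `woe_by_value` and `split_by_woe`, and the choice to skip values that occur under only one label (where the logarithm is undefined) are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Set, Tuple


def woe_by_value(pairs: Iterable[Tuple[Hashable, int]]) -> Dict[Hashable, float]:
    """Compute WOE for every value of one discrete feature from (value, label) pairs."""
    pairs = list(pairs)
    ones = Counter(value for value, label in pairs if label == 1)
    zeros = Counter(value for value, label in pairs if label == 0)
    n_ones, n_zeros = sum(ones.values()), sum(zeros.values())
    woe = {}
    for value in set(ones) | set(zeros):
        if ones[value] == 0 or zeros[value] == 0:
            continue  # WOE undefined without smoothing; skipped in this sketch
        woe[value] = math.log((ones[value] / n_ones) / (zeros[value] / n_zeros))
    return woe


def split_by_woe(woe: Dict[Hashable, float]) -> Dict[int, Set[Hashable]]:
    """WOE > 0 goes to the label-1 set, WOE <= 0 to the label-0 set."""
    return {1: {v for v, w in woe.items() if w > 0},
            0: {v for v, w in woe.items() if w <= 0}}


samples = ([("bachelor", 1)] * 3 + [("bachelor", 0)] +
           [("high school", 1)] + [("high school", 0)] * 3)
woe = woe_by_value(samples)
print(split_by_woe(woe))  # {1: {'bachelor'}, 0: {'high school'}}
```

The factor "× 100%" in the source formula equals 1 and is therefore omitted here.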
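Step 4032: for continuous feature variables, perform steps 4032a to 4032b.
Step 4032a: train a second binary classification model using the values of the continuous feature variables and the label values corresponding to the user IDs.
The executing body can use the values of each continuous feature variable and the label values corresponding to the user IDs to perform multiple rounds of decision-tree training and obtain the split-point structure of the decision tree, that is, the second binary classification model.
Step 4032b: determine the sets of continuous feature variable values corresponding to different label values according to the decision paths of the second binary classification model.
After the second binary classification model is obtained, the set of continuous feature variable values corresponding to label value 1 can be derived from the decision paths in the model that lead to label value 1. Likewise, the set of continuous feature variable values corresponding to label value 0 can be derived from the decision paths that lead to label value 0.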
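Reading value ranges off decision paths can be illustrated with a hand-built tree. This sketch does not train a model; it assumes a tree has already been fitted and is represented as nested dictionaries, with the left branch meaning "value <= threshold". That representation, the function name `paths_by_label`, and the thresholds in the sample tree are hypothetical.

```python
from typing import Dict, List, Tuple

Cond = Tuple[str, str, float]  # (feature name, comparison operator, threshold)


def paths_by_label(node: dict, prefix: Tuple[Cond, ...] = ()) -> Dict[int, List[List[Cond]]]:
    """Collect, for each leaf label, the list of root-to-leaf condition paths."""
    if "label" in node:  # leaf node
        return {node["label"]: [list(prefix)]}
    result: Dict[int, List[List[Cond]]] = {}
    for branch, op in (("left", "<="), ("right", ">")):
        cond = (node["feature"], op, node["threshold"])
        for label, paths in paths_by_label(node[branch], prefix + (cond,)).items():
            result.setdefault(label, []).extend(paths)
    return result


# Hypothetical fitted tree over two continuous feature variables.
tree = {
    "feature": "monthly_income", "threshold": 8000,
    "left": {"label": 0},
    "right": {
        "feature": "monthly_spending", "threshold": 10000,
        "left": {"label": 1},
        "right": {"label": 0},
    },
}

paths = paths_by_label(tree)
print(paths[1])
# [[('monthly_income', '>', 8000), ('monthly_spending', '<=', 10000)]]
```

Each path for label 1 is a conjunction of range conditions, which corresponds to statements such as "monthly income of more than 8,000 yuan and spending of less than 10,000 yuan" in the source example.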
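Step 404: for each single label value, determine the intersection or union of the corresponding set of discrete feature variable values and the corresponding set of continuous feature variable values, to obtain the set of feature variable values corresponding to that label value.
Step 405: output the sets of feature variable values corresponding to different label values.
After obtaining the sets of feature variable values corresponding to different label values, the executing body can formulate corresponding rules. For example, according to the set of feature variable values corresponding to label value 1, it can determine the rule "users who are 25 to 40 years old, have a bachelor's degree or above, have a monthly income of more than 8,000 yuan, have deposits of more than 50,000 yuan, and spend less than 10,000 yuan are high-credit-quality users".
The method for outputting information provided by the above embodiment of the present application can use binary classification models to mine user feature data, which makes the mined information more credible.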
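With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for outputting information. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 5, the apparatus 500 for outputting information of this embodiment includes: a data acquisition unit 501, a variable classification unit 502, a first set determination unit 503, a second set determination unit 504, and a set output unit 505.
The data acquisition unit 501 is configured to acquire feature data of users. The feature data includes user IDs, values of feature variables, and label values corresponding to the user IDs.
The variable classification unit 502 is configured to determine the discrete feature variables and continuous feature variables among the feature variables.
The first set determination unit 503 is configured to determine the sets of discrete feature variable values corresponding to different label values and the sets of continuous feature variable values corresponding to different label values.
The second set determination unit 504 is configured to determine the sets of feature variable values corresponding to different label values according to the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values.
The set output unit 505 is configured to output the sets of feature variable values corresponding to different label values.
In some optional implementations of this embodiment, the variable classification unit 502 may be further configured to perform the following judgment steps for each feature variable: count a first quantity of values of the feature variable and a second quantity of distinct values; determine the ratio of the second quantity to the first quantity; if the second quantity is greater than a preset quantity threshold and the ratio is greater than a preset ratio threshold, identify the feature variable as a continuous feature variable; if the second quantity is not greater than the preset quantity threshold or the ratio is not greater than the preset ratio threshold, identify the feature variable as a discrete feature variable.
In some optional implementations of this embodiment, the first set determination unit 503 may be further configured to: train a first binary classification model using the values of the discrete feature variables and the label values corresponding to the user IDs; determine the weight of each discrete feature variable according to the first binary classification model; extract some of the discrete feature variables according to their weights; determine the weight-of-evidence values of the values of the extracted discrete feature variables according to the label values corresponding to the user IDs and a preset weight-of-evidence formula; and determine the sets of discrete feature variable values corresponding to different label values according to the obtained weight-of-evidence values.
In some optional implementations of this embodiment, the first set determination unit 503 may be further configured to: train a second binary classification model using the values of the continuous feature variables and the label values corresponding to the user IDs; and determine the sets of continuous feature variable values corresponding to different label values according to the decision paths of the second binary classification model.
In some optional implementations of this embodiment, the second set determination unit 504 may be further configured to: for each single label value, determine the intersection or union of the corresponding set of discrete feature variable values and the corresponding set of continuous feature variable values, to obtain the set of feature variable values corresponding to that label value.
It should be understood that the units 501 to 505 of the apparatus 500 for outputting information correspond to the respective steps of the method described with reference to FIG. 2. Therefore, the operations and features described above for the method for outputting information also apply to the apparatus 500 and the units it contains, and are not repeated here.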
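The correspondence between units 501 to 505 and the method steps can be pictured as a simple pipeline. The following Python sketch only illustrates the wiring: the class name `OutputApparatus`, its field names, and the stub callables are hypothetical, and each stub stands in for logic the source describes only in prose.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class OutputApparatus:
    acquire: Callable[[], Any]              # data acquisition unit 501
    classify: Callable[[Any], Any]          # variable classification unit 502
    first_sets: Callable[[Any, Any], Any]   # first set determination unit 503
    second_sets: Callable[[Any], Any]       # second set determination unit 504
    output: Callable[[Any], Any]            # set output unit 505

    def run(self) -> Any:
        data = self.acquire()                      # step 201
        kinds = self.classify(data)                # step 202
        per_kind = self.first_sets(data, kinds)    # step 203
        combined = self.second_sets(per_kind)      # step 204
        return self.output(combined)               # step 205


# Stub units that only show how data flows between the five stages.
apparatus = OutputApparatus(
    acquire=lambda: [{"age": 30, "label": 1}],
    classify=lambda data: {"age": "continuous"},
    first_sets=lambda data, kinds: {"discrete": set(), "continuous": {"age > 25"}},
    second_sets=lambda sets: sets["discrete"] | sets["continuous"],
    output=lambda combined: sorted(combined),
)
print(apparatus.run())  # ['age > 25']
```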
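Referring now to FIG. 6, a schematic structural diagram of an electronic device (for example, the server in FIG. 1) 600 suitable for implementing the embodiments of the present disclosure is shown. The server shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device 600 may include a processing device (such as a central processing unit or a graphics processor) 601, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices can be connected to the I/O interface 605: input devices 606 including, for example, touch screens, touch pads, keyboards, mice, cameras, microphones, accelerometers, and gyroscopes; output devices 607 including, for example, liquid crystal displays (LCD), speakers, and vibrators; storage devices 608 including, for example, magnetic tapes and hard disks; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 6 shows an electronic device 600 having various devices, it should be understood that it is not required to implement or have all of the illustrated devices. More or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 6 may represent one device or, as needed, multiple devices.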
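In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 609, installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above-mentioned functions defined in the method of the embodiments of the present disclosure are executed. It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or it may exist alone without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire feature data of users, the feature data including user IDs, values of feature variables, and label values corresponding to the user IDs; determine the discrete feature variables and continuous feature variables among the feature variables; determine the sets of discrete feature variable values corresponding to different label values and the sets of continuous feature variable values corresponding to different label values; determine the sets of feature variable values corresponding to different label values according to the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values; and output the sets of feature variable values corresponding to different label values.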
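The computer program code for performing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for realizing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present disclosure can be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor includes a data acquisition unit, a variable classification unit, a first set determination unit, a second set determination unit, and a set output unit. In some cases, the names of these units do not constitute a limitation on the units themselves. For example, the data acquisition unit can also be described as a "unit for acquiring feature data of users".
The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features that have similar functions and are disclosed in (but not limited to) the embodiments of the present disclosure.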

Claims (12)

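  1. A method for outputting information, the method comprising:
    acquiring feature data of users, the feature data comprising user IDs, values of feature variables, and label values corresponding to the user IDs;
    determining discrete feature variables and continuous feature variables among the feature variables;
    determining sets of discrete feature variable values corresponding to different label values and sets of continuous feature variable values corresponding to different label values;
    determining, according to the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values, sets of feature variable values corresponding to different label values; and
    outputting the sets of feature variable values corresponding to different label values.
  2. The method according to claim 1, wherein determining the discrete feature variables and continuous feature variables among the feature variables comprises:
    performing the following judgment steps for each feature variable:
    counting a first quantity of values of the feature variable and a second quantity of distinct values;
    determining the ratio of the second quantity to the first quantity;
    if the second quantity is greater than a preset quantity threshold and the ratio is greater than a preset ratio threshold, identifying the feature variable as a continuous feature variable; and
    if the second quantity is not greater than the preset quantity threshold or the ratio is not greater than the preset ratio threshold, identifying the feature variable as a discrete feature variable.
  3. The method according to claim 1, wherein determining the sets of discrete feature variable values corresponding to different label values comprises:
    training a first binary classification model using the values of the discrete feature variables and the label values corresponding to the user IDs;
    determining the weight of each discrete feature variable according to the first binary classification model;
    extracting some of the discrete feature variables according to the weights of the discrete feature variables;
    determining weight-of-evidence values of the values of the extracted discrete feature variables according to the label values corresponding to the user IDs and a preset weight-of-evidence formula; and
    determining the sets of discrete feature variable values corresponding to different label values according to the obtained weight-of-evidence values.
  4. The method according to claim 1, wherein determining the sets of continuous feature variable values corresponding to different label values comprises:
    training a second binary classification model using the values of the continuous feature variables and the label values corresponding to the user IDs; and
    determining the sets of continuous feature variable values corresponding to different label values according to the decision paths of the second binary classification model.
  5. The method according to claim 1, wherein determining the sets of feature variable values corresponding to different label values according to the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values comprises:
    for each single label value, determining the intersection or union of the corresponding set of discrete feature variable values and the corresponding set of continuous feature variable values, to obtain the set of feature variable values corresponding to that label value.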
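  6. An apparatus for outputting information, comprising:
    a data acquisition unit configured to acquire feature data of users, the feature data comprising user IDs, values of feature variables, and label values corresponding to the user IDs;
    a variable classification unit configured to determine discrete feature variables and continuous feature variables among the feature variables;
    a first set determination unit configured to determine sets of discrete feature variable values corresponding to different label values and sets of continuous feature variable values corresponding to different label values;
    a second set determination unit configured to determine, according to the sets of discrete feature variable values and the sets of continuous feature variable values corresponding to different label values, sets of feature variable values corresponding to different label values; and
    a set output unit configured to output the sets of feature variable values corresponding to different label values.
  7. The apparatus according to claim 6, wherein the variable classification unit is further configured to:
    perform the following judgment steps for each feature variable:
    count a first quantity of values of the feature variable and a second quantity of distinct values;
    determine the ratio of the second quantity to the first quantity;
    if the second quantity is greater than a preset quantity threshold and the ratio is greater than a preset ratio threshold, identify the feature variable as a continuous feature variable; and
    if the second quantity is not greater than the preset quantity threshold or the ratio is not greater than the preset ratio threshold, identify the feature variable as a discrete feature variable.
  8. The apparatus according to claim 6, wherein the first set determination unit is further configured to:
    train a first binary classification model using the values of the discrete feature variables and the label values corresponding to the user IDs;
    determine the weight of each discrete feature variable according to the first binary classification model;
    extract some of the discrete feature variables according to the weights of the discrete feature variables;
    determine weight-of-evidence values of the values of the extracted discrete feature variables according to the label values corresponding to the user IDs and a preset weight-of-evidence formula; and
    determine the sets of discrete feature variable values corresponding to different label values according to the obtained weight-of-evidence values.
  9. The apparatus according to claim 6, wherein the first set determination unit is further configured to:
    train a second binary classification model using the values of the continuous feature variables and the label values corresponding to the user IDs; and
    determine the sets of continuous feature variable values corresponding to different label values according to the decision paths of the second binary classification model.
  10. The apparatus according to claim 6, wherein the second set determination unit is further configured to:
    for each single label value, determine the intersection or union of the corresponding set of discrete feature variable values and the corresponding set of continuous feature variable values, to obtain the set of feature variable values corresponding to that label value.
  11. A server, comprising:
    one or more processors; and
    a storage device storing one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
  12. A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-5.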
PCT/CN2020/095193 2019-11-13 2020-06-09 Method and apparatus for outputting information WO2021093320A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP20887795.1A EP3901789A4 (en) 2019-11-13 2020-06-09 METHOD AND DEVICE FOR ISSUEING INFORMATION
KR1020217022835A KR20210097204A (ko) 2019-11-13 2020-06-09 Method and apparatus for outputting information
JP2021541618A JP7288062B2 (ja) 2019-11-13 2020-06-09 Method and apparatus for outputting information, electronic device, storage medium, and computer program
US17/379,781 US20210349920A1 (en) 2019-11-13 2021-07-19 Method and apparatus for outputting information

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911106997.8 2019-11-13
CN201911106997.8A CN110795638A (zh) 2019-11-13 2019-11-13 Method and apparatus for outputting information

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/379,781 Continuation US20210349920A1 (en) 2019-11-13 2021-07-19 Method and apparatus for outputting information

Publications (1)

Publication Number Publication Date
WO2021093320A1 true WO2021093320A1 (zh) 2021-05-20

Family

ID=69444459

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/095193 WO2021093320A1 (zh) 2019-11-13 2020-06-09 Method and apparatus for outputting information

Country Status (6)

Country Link
US (1) US20210349920A1 (zh)
EP (1) EP3901789A4 (zh)
JP (1) JP7288062B2 (zh)
KR (1) KR20210097204A (zh)
CN (1) CN110795638A (zh)
WO (1) WO2021093320A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795638A (zh) * 2019-11-13 2020-02-14 北京百度网讯科技有限公司 Method and apparatus for outputting information
CN113536107B (zh) * 2020-10-06 2022-07-29 西安创业天下网络科技有限公司 Blockchain-based big data decision-making method and system, and cloud service center

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160125297A1 (en) * 2014-10-30 2016-05-05 Umm Al-Qura University System and method for solving spatiotemporal-based problems
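CN106651574A (zh) * 2016-12-30 2017-05-10 苏州大学 Personal credit evaluation method and apparatus
CN107545360A (zh) * 2017-07-28 2018-01-05 浙江邦盛科技有限公司 Decision-tree-based method and system for deriving intelligent risk-control rules
CN108154430A (zh) * 2017-12-28 2018-06-12 上海氪信信息技术有限公司 Credit score construction method based on machine learning and big data technology
CN110795638A (zh) * 2019-11-13 2020-02-14 北京百度网讯科技有限公司 Method and apparatus for outputting information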

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0261769A (ja) * 1988-08-29 1990-03-01 Fujitsu Ltd Classification decision tree generation device
JPH0696050A (ja) * 1992-09-16 1994-04-08 Yaskawa Electric Corp Method for creating a decision tree
US20150220951A1 (en) * 2009-01-21 2015-08-06 Truaxis, Inc. Method and system for inferring an individual cardholder's demographic data from shopping behavior and external survey data using a bayesian network
US20130085965A1 (en) * 2011-10-04 2013-04-04 Hui Dai Method and Apparatus of Investment Strategy Formulation and Evaluation
CN103136247B (zh) * 2011-11-29 2015-12-02 阿里巴巴集团控股有限公司 Attribute data interval division method and device
EP2688264B1 (en) * 2012-07-16 2016-08-24 Alcatel Lucent Method and apparatus for privacy protected clustering of user interest profiles
US9384571B1 (en) * 2013-09-11 2016-07-05 Google Inc. Incremental updates to propagated social network labels
CN105591972B (zh) * 2015-12-22 2018-09-11 桂林电子科技大学 Ontology-based network traffic classification method
US10997672B2 (en) * 2017-05-31 2021-05-04 Intuit Inc. Method for predicting business income from user transaction data
CN107590735A (zh) * 2017-09-04 2018-01-16 深圳市华傲数据技术有限公司 Data mining method and device for credit evaluation
CN110266510B (zh) * 2018-03-21 2022-05-24 腾讯科技(深圳)有限公司 Network control policy generation method and device, network control method, and storage medium
CN110210218B (zh) * 2018-04-28 2023-04-14 腾讯科技(深圳)有限公司 Virus detection method and related device
CN110210884B (zh) * 2018-05-29 2023-05-05 腾讯科技(深圳)有限公司 Method, apparatus, computer device, and storage medium for determining user feature data
CN109685574A (zh) * 2018-12-25 2019-04-26 拉扎斯网络科技(上海)有限公司 Data determination method and apparatus, electronic device, and computer-readable storage medium
CN110147821A (zh) * 2019-04-15 2019-08-20 中国平安人寿保险股份有限公司 Target user group determination method and apparatus, computer device, and storage medium
US20210097424A1 (en) * 2019-09-26 2021-04-01 Microsoft Technology Licensing, Llc Dynamic selection of features for training machine learning models

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3901789A4

Also Published As

Publication number Publication date
CN110795638A (zh) 2020-02-14
KR20210097204A (ko) 2021-08-06
JP7288062B2 (ja) 2023-06-06
JP2022534160A (ja) 2022-07-28
EP3901789A1 (en) 2021-10-27
EP3901789A4 (en) 2022-09-21
US20210349920A1 (en) 2021-11-11

Similar Documents

Publication Publication Date Title
Ma et al. A new aspect on P2P online lending default prediction using meta-level phone usage data in China
US20200074565A1 (en) Automated enterprise transaction data aggregation and accounting
US11270375B1 (en) Method and system for aggregating personal financial data to predict consumer financial health
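CN110378786B (zh) Model training method, default-contagion risk identification method, device, and storage medium
CN110135901A (zh) Enterprise user profile construction method, system, medium, and electronic equipment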
US10037194B2 (en) Systems and methods for visual data management
US20180018734A1 (en) Method and system for automatically categorizing financial transaction data
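CN110795568A (zh) Risk assessment method and device based on a user information knowledge graph, and electronic equipment
WO2017133568A1 (zh) Method and device for mining target feature data
CN112541817A (zh) Marketing response processing method and system for potential personal consumer loan customers
WO2021093320A1 (zh) Method and apparatus for outputting information
CN110197426B (zh) Method and device for building a credit scoring model, and readable storage medium
CN110119415A (zh) Data analysis method, system, medium, and electronic equipment based on channel delivery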
US11093528B2 (en) Automated data supplementation and verification
CN111553487B (zh) Business object identification method and device
CN112950359A (zh) User identification method and device
US20170300937A1 (en) System and method for inferring social influence networks from transactional data
CN117033431A (zh) Work order processing method, device, electronic equipment, and medium
CN111177653A (zh) Credit assessment method and device
US10313854B2 (en) Listing service registrations through a mobile number
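CN114493853A (zh) Credit rating evaluation method, device, electronic equipment, and storage medium
CN114066603A (zh) Post-loan risk early-warning method, device, electronic equipment, and computer-readable medium
CN110895564A (zh) Potential customer data processing method and device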
Kang Fraud Detection in Mobile Money Transactions Using Machine Learning
US20230419344A1 (en) Attribute selection for matchmaking

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20887795; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021541618; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 20217022835; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2020887795; Country of ref document: EP; Effective date: 20210719)
NENP Non-entry into the national phase (Ref country code: DE)