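WO2019091177A1 - Risk identification model construction and risk identification method, apparatus, and device - Google Patents
Risk identification model construction and risk identification method, apparatus, and device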
- Publication number
- WO2019091177A1 (PCT/CN2018/100989)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- sequence
- user state
- state
- risk identification
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
Definitions
- The present specification relates to the field of data mining technology, and in particular to a risk identification model construction method, a risk identification method, and corresponding apparatuses and devices.
- The embodiments of the present specification provide a risk identification model construction method, a risk identification method, and corresponding apparatuses and devices. The technical solutions are as follows:
- A method for constructing a risk identification model, including:
- forming risk identification sample data from the obtained sequence feature (as the feature value) and the event risk type (as the label value), and constructing the risk identification model from at least one piece of sample data.
- A risk identification method, including:
- using the sequence feature as the input of a pre-built risk identification model, and outputting a risk identification result.
- A risk identification model construction apparatus, including:
- an extracting unit that extracts, from data of a given user, user status records within a preset duration before a specific event occurs, the user status records including operation behaviors of the user and/or system events, where the specific event is an event whose risk type has been determined;
- a sequence generating unit that sorts the extracted user status records by time of occurrence and generates a user state sequence for the specific event according to the sorting result;
- a feature conversion unit that converts the generated user state sequence into a sequence feature;
- a model construction unit that uses the obtained sequence feature as the feature value and the event risk type as the label value to form risk identification sample data, and constructs the risk identification model from at least one piece of sample data.
- A risk identification apparatus, including:
- an acquiring unit that extracts, from data of a target user, user status records of the target user within a given time period, the user status records including operation behaviors of the user and/or system events;
- a sequence generating unit that sorts the user status records by time of occurrence and generates a user state sequence according to the sorting result;
- a feature conversion unit that converts the generated user state sequence into a sequence feature;
- an output unit that uses the sequence feature as the input of a pre-built risk identification model and outputs a risk identification result.
- A computer device, including:
- a processor, and a memory for storing processor-executable instructions;
- the processor being configured to:
- form risk identification sample data from the obtained sequence feature (as the feature value) and the event risk type (as the label value), and construct the risk identification model from at least one piece of sample data.
- A computer device, including:
- a processor, and a memory for storing processor-executable instructions;
- the processor being configured to:
- use the sequence feature as the input of a pre-built risk identification model, and output a risk identification result.
- The risk identification model is constructed by mining the potential relationship between user state sequences and risk events. In the risk identification stage, the risk of a specific event performed by the user to be identified, or of that user, can then be predicted based on the risk identification model and the extracted user state sequence of that user, which improves the ability of the risk control system to identify risks.
- No single embodiment of the present specification needs to achieve all of the above effects.
- FIG. 1 is a schematic flow chart of a risk model construction method according to an embodiment of the present specification.
- FIG. 2 is a schematic flow chart of a risk identification method according to an embodiment of the present specification.
- FIG. 3 is a schematic structural diagram of a risk model construction apparatus according to an embodiment of the present specification.
- FIG. 4 is a schematic structural diagram of a risk identification apparatus according to an embodiment of the present specification.
- FIG. 5 is a schematic structural diagram of a device for configuring the apparatus of an embodiment of the present specification.
- In the financial field, risk-related static features can include a user's asset information, authentication information, social relationship information, and so on.
- Behavioral features can include the user's behaviors on the platform, such as clicks, browsing, transfers, and deposit and withdrawal operations.
- Behavioral features are often more suitable for risk prediction than static features. For example, an ordinary user and a fraudster may be hard to distinguish effectively by static features alone. However, because a fraudster often carries out preparatory activities before committing fraud, which operations are typically performed beforehand, how many times, and how frequently can to some extent characterize and express the fraudster's motive.
- A user's behavioral features may also contain time information associated with these events. For example, the behavior sequences of two users in the past hour may be, respectively, User 1: A→B→C→D and User 2: B→C→A→D.
- The embodiments provide a risk identification scheme that includes two phases: a construction phase of a risk identification model, and a phase of risk identification using that model. The construction phase is introduced first.
- A risk identification model construction method may include the following steps 101 to 104, wherein:
- Step 101: Extract, from data of a given user, user status records within a preset duration before a specific event occurs.
- Model construction is a machine learning process based on sample data, so collecting sample data is the first step.
- The "given user" is a sample user. Usually, the larger the number of samples, the better the resulting model.
- Sample users can be selected based on specific events performed by users.
- The specific event may be an event whose risk type has been determined. For example, for a transfer that has already occurred, if it is later determined that the transfer was made by an account thief, the type of the transfer event may be determined as “high risk”; if it is later determined that the transfer was made by the user himself or herself, the type of the transfer event may be determined as “low risk”.
- User status records within a preset duration (e.g., 1 hour) before the specific event occurs can then be extracted from the data of the sample user.
- The user status records include operation behaviors of the user (e.g., clicks, browsing, transfers, deposit and withdrawal operations) and/or system events, where system events may include events caused by user behavior as well as events not caused by user behavior. That is, the user status records reflect which behaviors the user performed and/or which system events occurred over a period of time, and also record the time of occurrence of each user behavior and/or system event.
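The extraction in step 101 can be sketched as a simple time-window filter. This is only an illustration under stated assumptions: the `StatusRecord` type, its field names, and the `extract_records` helper are hypothetical and do not come from the specification, and the half-open window (records strictly before the event) is one possible reading of "before a specific event occurs".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class StatusRecord:
    # Hypothetical record layout: who, what state, and when it occurred.
    user_id: str
    state: str  # an operation behavior or a system event, e.g. "A"
    occurred_at: datetime


def extract_records(records, user_id, event_time, window=timedelta(hours=1)):
    """Return the user's records that fall in [event_time - window, event_time)."""
    start = event_time - window
    return [
        r for r in records
        if r.user_id == user_id and start <= r.occurred_at < event_time
    ]


event_time = datetime(2018, 1, 1, 12, 0)  # time of the labeled transfer event
log = [
    StatusRecord("u1", "A", datetime(2018, 1, 1, 11, 50)),
    StatusRecord("u1", "B", datetime(2018, 1, 1, 11, 40)),
    StatusRecord("u1", "C", datetime(2018, 1, 1, 10, 30)),  # outside the window
    StatusRecord("u2", "D", datetime(2018, 1, 1, 11, 55)),  # another user
]
sample_records = extract_records(log, "u1", event_time)
```

The extracted records are not yet ordered; ordering them by `occurred_at` is the job of step 102.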
- Step 102: Sort the extracted user status records by time of occurrence, and generate a user state sequence for the specific event according to the sorting result.
- Suppose the extracted user status records contain four states A, B, C, and D (that is, behaviors or system events), together with the time at which each state occurred.
- The sorting result is then, for example, B→A→C→D.
- For two users, the sorting results may be exactly the same while the interval between two adjacent states differs. In practice, when the intervals between states differ, the user motives they reflect may also be completely different. For example, a distinguishing feature of a thief may be that the interval between the two states B and A is short, whereas for an ordinary user it is the opposite.
- Accordingly, the process of generating the user state sequence may specifically be:
- the i-th user state is converted into a user state carrying interval duration information, according to the interval duration between the i-th (i ≥ 1) user state and the (i+1)-th user state in the sorting result.
- For example, the first user state is B and the second user state is A. If the interval between the two user states B and A is t1, B is converted into a state that carries t1.
- A and C in the sequence can be converted in the same way.
- Alternatively, the interval can be divided into two categories, “long” and “short” (according to a set duration threshold), and each converted state then carries the category rather than the raw duration.
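A minimal sketch of this interval-aware conversion follows. The `(state, label)` tuple representation, the 60-second threshold, and the timestamps are assumptions made for illustration; the specification does not fix a notation or a threshold.

```python
def to_interval_sequence(records, threshold_seconds=60.0):
    """Sort (state, timestamp_seconds) pairs by time and attach to each state
    the gap to the next state, bucketed as 'short' or 'long'."""
    ordered = sorted(records, key=lambda r: r[1])
    sequence = []
    for i, (state, ts) in enumerate(ordered):
        if i + 1 < len(ordered):
            gap = ordered[i + 1][1] - ts
            label = "short" if gap <= threshold_seconds else "long"
            sequence.append((state, label))
        else:
            sequence.append((state, None))  # the last state has no successor
    return sequence


# Same order of states (B, A, C, D) but different B-to-A intervals.
thief = [("A", 10.0), ("B", 0.0), ("C", 400.0), ("D", 500.0)]
ordinary = [("B", 0.0), ("A", 300.0), ("C", 400.0), ("D", 500.0)]

thief_sequence = to_interval_sequence(thief)
ordinary_sequence = to_interval_sequence(ordinary)
```

The two users produce the same plain sequence B→A→C→D, but the converted sequences differ in the label attached to B, which is the distinction the text describes.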
- The process of generating the user state sequence may also specifically be as follows:
- the i-th user state is converted into a user state carrying evaluation result information, according to an evaluation result obtained in advance for the i-th user state.
- The evaluation result reflects how risky the user state is.
- For example, the evaluation result is a score between 0 and 1; the higher the score, the less likely the state is to lead to a fraud event.
- A user state can be evaluated by collecting a large number of user status records that contain it. For example, 100 records containing user state A are collected; of these 100 records, 10 actually led to fraud events. The evaluation result of user state A can therefore be 0.9.
- After the conversion, each state in the final user state sequence carries its evaluation result.
- Alternatively, the evaluation result of a user state may be classified into two categories, “high risk” and “low risk”, and each converted state then carries the category.
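The evaluation described above can be sketched as follows. The score formula reproduces the worked example (100 records, 10 leading to fraud, score 0.9); the 0.5 cut-off between “high risk” and “low risk” and the counts for states other than A are hypothetical.

```python
def state_score(total_records, fraud_records):
    """Share of records containing the state that did NOT lead to fraud."""
    return 1.0 - fraud_records / total_records


def annotate(sequence, scores, threshold=0.5):
    """Attach a coarse risk category to every state in the sequence."""
    return [
        (state, "low risk" if scores[state] >= threshold else "high risk")
        for state in sequence
    ]


scores = {
    "A": state_score(100, 10),  # the example from the text: 0.9
    "B": state_score(100, 70),  # hypothetical counts
    "C": state_score(100, 20),  # hypothetical counts
    "D": state_score(100, 5),   # hypothetical counts
}
annotated = annotate(["B", "A", "C", "D"], scores)
```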
- The process of generating the user state sequence may also specifically be:
- the user states in the sorting result are filtered according to a preset filtering rule.
- The filtering rule at least defines which user states should be filtered out.
- Step 103: Convert the generated user state sequence into a sequence feature.
- The purpose of step 103 is to express the user state sequence as mathematical features, forming sequence features that can be used for identification.
- Step 103 may specifically include encoding the generated user state sequence as a sequence vector. Further, encoding the generated user state sequence as a sequence vector may include:
- Step 131: Encode each state in the user state sequence into a state vector according to a first encoding rule (e.g., word2vec). For example, for B→A→C→D, B is encoded as 000101.
- Step 132: Encode the sequence formed by the state vectors into a sequence vector using a neural network.
- The neural network includes, but is not limited to, a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN).
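Steps 131 and 132 can be illustrated with a deliberately small sketch. It substitutes one-hot vectors for the word2vec-style first encoding rule and uses an untrained Elman-style recurrence with random weights, so it shows only the shape of the computation (state vectors in, one fixed-length sequence vector out), not a trained encoder. The vocabulary, hidden size, and seed are assumptions.

```python
import math
import random


def one_hot(state, vocabulary):
    """Stand-in for the first encoding rule: one position per known state."""
    return [1.0 if state == v else 0.0 for v in vocabulary]


def rnn_encode(state_vectors, w_in, w_rec):
    """Elman-style recurrence h_t = tanh(W_in x_t + W_rec h_{t-1});
    the final hidden state is returned as the sequence vector."""
    hidden = [0.0] * len(w_rec)
    for x in state_vectors:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
    return hidden


vocabulary = ["A", "B", "C", "D"]
hidden_size = 3
rng = random.Random(0)  # fixed seed so the untrained weights are reproducible
w_in = [[rng.uniform(-1, 1) for _ in vocabulary] for _ in range(hidden_size)]
w_rec = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(hidden_size)]


def encode(sequence):
    return rnn_encode([one_hot(s, vocabulary) for s in sequence], w_in, w_rec)


vector_1 = encode(["A", "B", "C", "D"])
vector_2 = encode(["B", "C", "A", "D"])
```

Because the recurrence processes states in order, two sequences with the same states in a different order generally map to different sequence vectors. In practice the weights would be learned, for example jointly with the risk identification model.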
- The method may further include steps 10 and 20, wherein:
- Step 10: Mine a frequent sequence item set from a plurality of user state sequences.
- A frequent sequence item set is a set of several frequent sequence items, and a frequent sequence item is a sequence that occurs relatively frequently.
- For example, a number of user state sequences can be collected and the number of occurrences of each counted; the sequences are then ranked by occurrence count from high to low, and a certain number of top-ranked user state sequences are taken as frequent sequence items. Alternatively, sequences that co-occur across multiple user state sequences can be mined as frequent sequence items.
- Step 20: For each frequent sequence item in the frequent sequence item set, determine the feature value corresponding to that item according to the distribution of black and white samples (that is, risky and normal samples) corresponding to it.
- The feature value corresponding to a frequent sequence item may be determined according to the risk probability corresponding to that item.
- For example, the risk probability corresponding to the frequent sequence item “A→B→C” is determined and used as its feature value.
- Alternatively, the feature value can be obtained with the softmax function.
- In essence, the softmax function maps an arbitrary K-dimensional real vector to another K-dimensional real vector in which every element lies in (0, 1) and the elements sum to 1; it is commonly used for multi-class classification problems.
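For reference, the standard softmax function can be written in a few lines. The input values below are arbitrary; subtracting the maximum before exponentiating is a common numerical-stability measure and does not change the result.

```python
import math


def softmax(values):
    """Map K real numbers to K values in (0, 1) that sum to 1."""
    shift = max(values)  # avoids overflow for large inputs
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
```

Larger inputs receive larger outputs, so the ordering of the inputs is preserved.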
- Step 103 may then include the following steps 133 and 134, wherein:
- Step 133: Based on the frequent sequence item set, determine the frequent sequence items included in the generated user state sequence.
- Step 134: Determine the sequence feature corresponding to the user state sequence according to the frequent sequence items it includes and the predetermined feature value corresponding to each frequent sequence item.
- For example, the sequence feature corresponding to the user state sequence A→B→D→F→C→E may be obtained by summing the feature values of the frequent sequence items it hits and using the sum as the sequence feature, and so on.
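Steps 20, 133, and 134 can be sketched as below. Several points are assumptions rather than statements of the specification: a frequent sequence item is treated as "hit" when its states appear in the user state sequence in the same order (not necessarily adjacent), the risk probability is taken as the share of black samples, and all sample counts are invented for illustration.

```python
def contains_in_order(sequence, item):
    """True if the states of `item` appear in `sequence` in the same order."""
    remaining = iter(sequence)
    return all(state in remaining for state in item)


def risk_probability(black_count, white_count):
    """Share of black (risky) samples among samples containing the item."""
    return black_count / (black_count + white_count)


# Hypothetical frequent sequence items with made-up black/white sample counts.
feature_values = {
    ("A", "B", "C"): risk_probability(8, 2),
    ("D", "E"): risk_probability(1, 9),
    ("C", "A"): risk_probability(5, 5),
}


def sequence_feature(sequence, feature_values):
    """Sum the feature values of every frequent sequence item that is hit."""
    return sum(
        value for item, value in feature_values.items()
        if contains_in_order(sequence, item)
    )


feature = sequence_feature(["A", "B", "D", "F", "C", "E"], feature_values)
```

Here the sequence hits A→B→C and D→E but not C→A, so the feature is the sum of the first two feature values. Requiring adjacency instead would be an equally plausible matching rule and would change which items are hit.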
- Step 104: Use the obtained sequence feature as the feature value and the event risk type as the label value to form risk identification sample data, and construct the risk identification model from at least one piece of sample data.
- The feature value is the input of the model to be trained, and the label value (e.g., one label value for high risk and a label value of 0 for low risk) serves as the desired output of the model.
- Other feature variables may also be introduced, for example conventional non-sequence model variables. All variables are then used together to train the model and to identify risks online.
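As a minimal sketch of step 104, the snippet below fits a one-feature logistic regression by batch gradient descent. The specification does not name a model type, so logistic regression is a stand-in; the label encoding (1 for high risk, 0 for low risk), the sample values, and the hyperparameters are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, epochs=2000, learning_rate=0.5):
    """samples: list of (sequence_feature, label); label 1 = high risk, 0 = low risk."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / len(samples)
        bias -= learning_rate * grad_b / len(samples)
    return weight, bias


def predict(model, x):
    """Return the estimated probability that the event is high risk."""
    weight, bias = model
    return sigmoid(weight * x + bias)


# Hypothetical sample data: (sequence feature, label value).
samples = [(0.9, 1), (0.8, 1), (0.7, 1), (0.3, 0), (0.2, 0), (0.1, 0)]
model = train_logistic(samples)
```

With vector-valued sequence features or additional non-sequence variables, the same idea applies with one weight per input dimension.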
- A risk identification method may include steps 201 to 204, wherein:
- Step 201: Extract, from data of a target user, user status records of the target user within a given time period, the user status records including operation behaviors of the user and/or system events.
- Case 1: When it is recognized that the target user is about to perform a certain event (such as a transfer), extraction of the target user's status records within a given time period before that event begins. If a risk is ultimately identified, appropriate action is taken, for example, the transfer is not allowed.
- Case 2: After the target user has performed a certain event, extraction of the target user's status records within a given time period before that event begins. If a risk is ultimately identified, the account used by the fraudster can be frozen.
- Step 202: Sort the user status records by time of occurrence, and generate a user state sequence according to the sorting result.
- Step 203: Convert the generated user state sequence into a sequence feature.
- Step 204: Use the sequence feature as the input of a pre-built risk identification model, and output the risk identification result. Here, the output of the model can indicate how likely the current specific event is to be risky.
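Steps 202 to 204 and the decision in Case 1 can be strung together as in the sketch below. The feature converter and model are passed in as functions; `toy_feature` and `toy_model` are hypothetical stand-ins for a trained converter and model, and the 0.5 decision threshold is an assumption.

```python
def identify_risk(records, to_feature, model, threshold=0.5):
    """records: list of (state, timestamp). Returns (risk_score, decision)."""
    ordered = sorted(records, key=lambda r: r[1])
    sequence = [state for state, _ in ordered]  # step 202
    feature = to_feature(sequence)              # step 203
    score = model(feature)                      # step 204
    return score, ("block" if score >= threshold else "allow")


def toy_feature(sequence):
    # Stand-in feature: 1.0 when state B occurs immediately before state A.
    pairs = zip(sequence, sequence[1:])
    return 1.0 if any(a == "B" and b == "A" for a, b in pairs) else 0.0


def toy_model(feature):
    # Stand-in for a pre-built risk identification model.
    return 0.9 if feature > 0.5 else 0.1


score, decision = identify_risk(
    [("A", 20), ("B", 10), ("D", 40), ("C", 30)], toy_feature, toy_model
)
```

In Case 1 a "block" decision would stop the pending transfer; in Case 2 the same score could instead trigger freezing the account after the fact.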
- The risk identification model is constructed by mining the potential relationship between user state sequences and risk events. In the risk identification stage, the risk of a specific event performed by the user to be identified, or of that user, can then be predicted based on the risk identification model and the extracted user state sequence of that user. This improves the risk identification ability and the defensive robustness of the risk control system, and makes its strategies and models harder for thieves to bypass.
- In addition, behavior sequence information intuitively reflects a thief's modus operandi, which helps strategy analysts analyze misappropriation cases conveniently and improves work efficiency.
- The embodiments of the present specification further provide a risk identification model construction apparatus and a risk identification apparatus.
- A risk identification model construction apparatus 300 may include:
- an extracting unit 301 configured to extract, from data of a given user, user status records within a preset duration before a specific event occurs, the user status records including operation behaviors of the user and/or system events, where the specific event is an event whose risk type has been determined;
- a sequence generating unit 302 configured to sort the extracted user status records by time of occurrence and generate a user state sequence for the specific event according to the sorting result;
- a feature conversion unit 303 configured to convert the generated user state sequence into a sequence feature;
- a model construction unit 304 configured to form risk identification sample data using the obtained sequence feature as the feature value and the event risk type as the label value, and to construct the risk identification model from at least one piece of sample data.
- The sequence generating unit 302 can be configured to:
- convert the i-th user state into a user state carrying interval duration information, according to the interval duration between the i-th user state and the (i+1)-th user state.
- The sequence generating unit 302 can be configured to:
- convert the i-th user state into a user state carrying evaluation result information, according to an evaluation result obtained in advance for the i-th user state.
- The sequence generating unit 302 can be configured to:
- filter the user states in the sorting result according to a preset filtering rule.
- A risk identification apparatus 400 can include:
- an obtaining unit 401 configured to extract, from data of a target user, user status records of the target user within a given time period, the user status records including operation behaviors of the user and/or system events;
- a sequence generating unit 402 configured to sort the user status records by time of occurrence and generate a user state sequence according to the sorting result;
- a feature conversion unit 403 configured to convert the generated user state sequence into a sequence feature;
- an output unit 404 configured to use the sequence feature as the input of a pre-built risk identification model and output the risk identification result.
- The sequence generating unit 402 can be configured to:
- convert the i-th user state into a user state carrying interval duration information, according to the interval duration between the i-th user state and the (i+1)-th user state.
- The sequence generating unit 402 can be configured to:
- convert the i-th user state into a user state carrying evaluation result information, according to an evaluation result obtained in advance for the i-th user state.
- The sequence generating unit 402 can be configured to:
- filter the user states in the sorting result according to a preset filtering rule.
- The embodiments of the present specification further provide a computer device including at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the foregoing method when executing the program.
- The risk identification model construction method at least includes:
- forming risk identification sample data from the obtained sequence feature (as the feature value) and the event risk type (as the label value), and constructing the risk identification model from at least one piece of sample data.
- The risk identification method at least includes:
- using the sequence feature as the input of a pre-built risk identification model, and outputting a risk identification result.
- FIG. 5 is a schematic diagram of the hardware structure of a more specific computing device provided by an embodiment of the present specification.
- The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050.
- The processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively connected to each other within the device through the bus 1050.
- The processor 1010 can be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided by the embodiments of the present specification.
- The memory 1020 can be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like.
- The memory 1020 can store an operating system and other applications.
- When the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the related program code is stored in the memory 1020 and is called and executed by the processor 1010.
- The input/output interface 1030 is used to connect an input/output module to implement information input and output.
- The input/output module can be configured as a component in the device (not shown) or externally connected to the device to provide the corresponding function.
- The input device may include a keyboard, a mouse, a touch screen, a microphone, various types of sensors, and the like, and the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
- The communication interface 1040 is used to connect a communication module (not shown) to implement communication between the device and other devices.
- The communication module can communicate by wired means (such as USB or a network cable) or by wireless means (such as a mobile network, Wi-Fi, or Bluetooth).
- The bus 1050 includes a path for transferring information between the components of the device, such as the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040.
- Although the above device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050, in a specific implementation the device may also include other components necessary for normal operation.
- The above device may also include only the components necessary for implementing the embodiments of the present specification, and does not necessarily include all the components shown in the drawings.
- The embodiments of the present specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments may, in essence, be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments of the present specification or in parts of those embodiments.
- The system, apparatus, module, or unit illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
- A typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, or a game console.
- The embodiments in the specification are described in a progressive manner; the same or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments.
- Where an embodiment is substantially similar to a method embodiment, its description is relatively brief, and the relevant parts can be found in the description of the method embodiment.
- The apparatus embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and the functions of the modules may be implemented in one or more pieces of software and/or hardware when implementing the embodiments of the present specification. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Economics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- General Engineering & Computer Science (AREA)
- Technology Law (AREA)
- Tourism & Hospitality (AREA)
- Operations Research (AREA)
- Game Theory and Decision Science (AREA)
- Educational Administration (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
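Disclosed are a risk identification model construction method, a risk identification method, and corresponding apparatuses and devices. The risk identification method includes: obtaining user status records of a target user within a given time period, the user status records including operation behaviors of the user and/or system events; sorting the user status records by time of occurrence and generating a user state sequence according to the sorting result; converting the generated user state sequence into a sequence feature; and using the sequence feature as the input of a pre-built risk identification model to output a risk identification result.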
Description
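The present specification relates to the field of data mining technology, and in particular to a risk identification model construction method, a risk identification method, and corresponding apparatuses and devices.
In the era of big data, data mining and machine learning techniques are increasingly applied in various fields to solve practical problems. For example, a model is built from a large amount of real user data or event data in order to predict various unknown aspects of new users or new events.
Taking the risk control scenario of the financial industry as an example, by analyzing the users involved in risk events whose nature has already been determined, the potential relationships between various user features and risk events can be mined to form a model of the relationship between user features and risk, that is, to predict "what kind of user or user behavior is more likely to lead to a risk event", so that measures can be taken in advance to avoid or reduce the occurrence of risk events.
Summary of the Invention
In view of the above technical problem, the embodiments of the present specification provide a risk identification model construction method, a risk identification method, and corresponding apparatuses and devices. The technical solutions are as follows: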
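According to a first aspect of the embodiments of the present specification, a risk identification model construction method is provided, including:
extracting, from data of a given user, user status records within a preset duration before a specific event occurs, the user status records including operation behaviors of the user and/or system events, where the specific event is an event whose risk type has been determined;
sorting the extracted user status records by time of occurrence, and generating a user state sequence for the specific event according to the sorting result;
converting the generated user state sequence into a sequence feature;
using the obtained sequence feature as the feature value and the event risk type as the label value to form risk identification sample data, and constructing a risk identification model from at least one piece of sample data.
According to a second aspect of the embodiments of the present specification, a risk identification method is provided, including:
extracting, from data of a target user, user status records of the target user within a given time period, the user status records including operation behaviors of the user and/or system events;
sorting the user status records by time of occurrence, and generating a user state sequence according to the sorting result;
converting the generated user state sequence into a sequence feature;
using the sequence feature as the input of a pre-built risk identification model, and outputting a risk identification result.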
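According to a third aspect of the embodiments of the present specification, a risk identification model construction apparatus is provided, including:
an extracting unit that extracts, from data of a given user, user status records within a preset duration before a specific event occurs, the user status records including operation behaviors of the user and/or system events, where the specific event is an event whose risk type has been determined;
a sequence generating unit that sorts the extracted user status records by time of occurrence and generates a user state sequence for the specific event according to the sorting result;
a feature conversion unit that converts the generated user state sequence into a sequence feature;
a model construction unit that uses the obtained sequence feature as the feature value and the event risk type as the label value to form risk identification sample data, and constructs a risk identification model from at least one piece of sample data.
According to a fourth aspect of the embodiments of the present specification, a risk identification apparatus is provided, including:
an obtaining unit that extracts, from data of a target user, user status records of the target user within a given time period, the user status records including operation behaviors of the user and/or system events;
a sequence generating unit that sorts the user status records by time of occurrence and generates a user state sequence according to the sorting result;
a feature conversion unit that converts the generated user state sequence into a sequence feature;
an output unit that uses the sequence feature as the input of a pre-built risk identification model and outputs a risk identification result.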
根据本说明书实施例的第五方面,提供一种计算机设备,包括:
处理器;
用于存储处理器可执行指令的存储器;
所述处理器被配置为:
从给定用户的数据中,提取特定事件发生前预设时长内的用户状态记录,所述用户状态记录包括:用户的操作行为和/或系统事件,所述特定事件为已确定风险类型的事件;
按照发生时间对所提取到的用户状态记录进行排序,根据排序结果生成针对所述特定事件的用户状态序列;
将所生成的用户状态序列转换为序列特征;
以所得到的序列特征作为特征值、所述事件风险类型作为标签值,构成风险识别样本数据,并利用至少一条样本数据构建风险识别模型。
根据本说明书实施例的第六方面,提供一种计算机设备,包括:
处理器;
用于存储处理器可执行指令的存储器;
所述处理器被配置为:
从目标用户的数据中,提取该目标用户在给定时长内的用户状态记录,所述用户状态记录包括:用户的操作行为和/或系统事件;
按照发生时间对所述用户状态记录进行排序,根据排序结果生成用户状态序列;
将所生成的用户状态序列转换为序列特征;
将所述序列特征作为预先构建的风险识别模型的输入,输出风险识别结果。
本说明书实施例所提供的技术方案所产生的效果包括:
通过挖掘用户状态序列和风险事件之间的潜在关系,来构建风险识别模型,在风险识别阶段,则可基于所述风险识别模型和提取到的待识别用户的用户状态序列,来预测待识别用户所作的特定事件或该用户本身的风险,提高了风控体系对于风险的识别能力。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本说明书实施例。
此外,本说明书实施例中的任一实施例并不需要达到上述的全部效果。
为了更清楚地说明本说明书实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本说明书实施例中记载的一些实施例,对于本领域普通技术人员来讲,还可以根据这些附图获得其他的附图。
图1是本说明书实施例的风险模型构建方法的流程示意图;
图2是本说明书实施例的风险识别方法的流程示意图;
图3是本说明书实施例的风险模型构建装置的结构示意图;
图4是本说明书实施例的风险识别装置的结构示意图;
图5是用于配置本说明书实施例装置的一种设备的结构示意图。
为了使本领域技术人员更好地理解本说明书实施例中的技术方案,下面将结合本说明书实施例中的附图,对本说明书实施例中的技术方案进行详细地描述,显然,所描述的实施例仅仅是本说明书的一部分实施例,而不是全部的实施例。基于本说明书中的实施例,本领域普通技术人员所获得的所有其他实施例,都应当属于保护的范围。
为了实现风险识别,可以利用大量的已定性事件作为样本,训练得到“用户特征-风险分值”的对应关系。常用的用户特征一般可以分为静态特征和行为特征两类,在金融领域,与风险相关的静态特征可以包括用户的资产信息、认证信息、社交关系信息等等,而行为特征则可以包括用户在平台上的各种行为,例如点击、浏览、转账、存取操作等等。
相对于静态特征而言,行为特征往往更适合进行风险预测。例如,一名普通用户和一名欺诈者,通过静态特征可能并不容易进行有效区别。但是,由于在欺诈行为之前,欺诈者往往会进行一些准备活动,因此在欺诈行为发生前经常会做哪些操作、操作的次数、频率等等,都能够在一定程度上刻画和表达出欺诈者的动机。
更进一步讲,用户的行为特征除了包含行为事件本身(即用户做过哪些事)之外,还可以包含与这些事件相关联的时间信息,例如,两名用户在过去1小时内的行为序列 分别为:
用户1:A→B→C→D
用户2:B→C→A→D
可以看出,虽然两个序列中都包含了同样的行为事件,但因发生顺序不同而形成两种不同的行为模式,实际应用中,两种不同的行为模式所导致的后续结果也可能是完全不同的。因此,随着当今盗用和反盗用之间攻防的不断升级,对盗用行为特征的刻画能力提出了新要求,在风险预测时,可以将用户行为的时间特征纳入风险预测模型的训练,以进一步提升风控效果。
出于以上考虑,本说明书实施例提供一种风险识别方案,该方案包括两个阶段:风险识别模型的构建阶段和运用风险识别模型进行风险识别的阶段。首先介绍风险识别模型的构建阶段,参见图1所示,一种风险识别模型构建方法可以包括以下步骤101~104,其中:
步骤101,从给定用户的数据中,提取特定事件发生前预设时长内的用户状态记录。
模型的构建是基于样本数据所进行的机器学习过程,所以样本数据的收集是第一步。“给定用户”便是样本用户,通常样本数越大,构建出的模型效果越好。一般地,样本用户的选择可以基于用户所做出的特定事件来进行的。其中,特定事件可以是已确定风险类型的事件,例如,对于已经发生的一笔转账,若后续确定该笔转账是盗用者所为,可将该转账事件的类型被确定为“高风险”,若后续确定该笔转账是用户本人所为,则可将该转账事件的类型被确定为“低风险”。
在业务运营的过程中,可以将每一用户的历史数据沉淀下来(存储到数据库中),也可以依赖于对线上数据的采集,这些数据可以包括静态数据和动态数据。在筛选出大量样本之后,针对每一样本,都可以从样本对应的数据中,提取在特定事件发生之前预设时长(如:1个小时)内的用户状态记录。其中,所述用户状态记录包括:用户的操作行为(如:点击、浏览、转账、存取操作等)和/或系统事件,所述系统事件可以包括:用户行为导致的事件、非用户行为导致的事件。也就是说,用户状态记录反映了用户在一段时长内做了哪些行为和/或发生了哪些系统事件,并且还记录了每一用户行为和/或系统事件的发生时刻。
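The extraction in step 101 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the record layout (dicts with the hypothetical keys `state` and `time`) and the helper name `extract_state_records` are assumptions, and the one-hour window follows the example in the text.

```python
from datetime import datetime, timedelta


def extract_state_records(records, event_time, window=timedelta(hours=1)):
    """Return the records that occurred within `window` before `event_time`."""
    start = event_time - window
    return [r for r in records if start <= r["time"] < event_time]


# Hypothetical records for one sample user; the labeled event happens at 10:00.
records = [
    {"state": "A", "time": datetime(2017, 11, 10, 9, 40)},
    {"state": "B", "time": datetime(2017, 11, 10, 9, 15)},
    {"state": "C", "time": datetime(2017, 11, 10, 8, 30)},  # outside the window
    {"state": "D", "time": datetime(2017, 11, 10, 9, 55)},
]
event_time = datetime(2017, 11, 10, 10, 0)
window_records = extract_state_records(records, event_time)
```

Only the records from the hour before the event are kept; ordering them is left to step 102.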
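Step 102: sort the extracted user state records by time of occurrence, and generate a user state sequence for the specific event from the sorted result.
For example, suppose the user state records extracted for user 1 contain four states (behaviors or system events): A, B, C, and D. Because the time of occurrence of each state is recorded, sorting them in chronological order gives, for example:
B→A→C→D
In practice, using the sorted result directly as the final user state sequence may not be reasonable enough. For example, the sorted results for user 1 and user 2 may be identical while the intervals between adjacent states differ between the two users. In practical applications, different intervals between states may reflect completely different user motives. For instance, a distinguishing characteristic of an account thief may be that the interval between states B and A is short, whereas for an ordinary user it is long.
In view of this, in one embodiment, generating the user state sequence can specifically be:
In the sorted result, according to the interval between the i-th (i≥1) user state and the (i+1)-th user state, converting the i-th user state into a user state carrying the interval information.
Continuing the above example, let i=1, so that the first user state is B and the second is A. If the interval between user states B and A is t1, the converted result is, for example:
B(t1)→A→C→D
The states A and C in the sequence can be converted in the same way.
As another example, intervals can be divided into two classes, "long" and "short" (according to a set duration threshold), in which case the converted result is, for example:
B(long)→A(short)→C(long)→D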
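The sorting and interval annotation of step 102 can be sketched in Python as follows. The timestamps (in seconds), the 300-second threshold, and the function name `build_sequence_with_gaps` are illustrative assumptions; the text only says that a set duration threshold separates "long" from "short".

```python
def build_sequence_with_gaps(records, threshold_seconds=300):
    """Sort records by time and tag each state with the gap to the next state."""
    ordered = sorted(records, key=lambda r: r["time"])
    sequence = []
    for current, following in zip(ordered, ordered[1:]):
        gap = following["time"] - current["time"]
        label = "long" if gap > threshold_seconds else "short"
        sequence.append((current["state"], label))
    if ordered:
        # The last state has no successor, so it carries no interval label.
        sequence.append((ordered[-1]["state"], None))
    return sequence


# Hypothetical timestamps in seconds, deliberately given out of order.
records = [
    {"state": "A", "time": 700},
    {"state": "B", "time": 0},
    {"state": "D", "time": 1500},
    {"state": "C", "time": 760},
]
sequence = build_sequence_with_gaps(records)
```

With these inputs the result corresponds to B(long)→A(short)→C(long)→D, the same shape as the example in the text.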
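In another aspect, so that the final user state sequence better expresses the user's motive, generating the user state sequence can specifically be:
In the sorted result, according to a prior evaluation result for the i-th user state, converting the i-th user state into a user state carrying the evaluation result information.
Any user state can be evaluated, and the evaluation result reflects how benign the state is. For example, the evaluation result can be a score between 0 and 1, where a higher score indicates a lower likelihood of leading to a fraud event. In practice, a state can be evaluated by collecting a large number of user state records that contain it. For example, if 100 records containing user state A are collected and 10 of them ultimately led to a fraud event, the evaluation result for user state A can be 0.9.
Continuing the above example, the final converted user state sequence can be, for example:
B(0.3)→A(0.9)→C(0.6)→D(0.99)
As another example, the evaluation results can be divided into two classes, "high risk" and "low risk", in which case the converted result is, for example:
B(low risk)→A(high risk)→C(low risk)→D(high risk)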
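The state evaluation described above can be sketched in Python as follows, under the assumption that the score is simply one minus the observed fraud rate, which reproduces the 0.9 in the text's example. The function names and the score table for B, C, and D are illustrative.

```python
def evaluate_state(outcomes):
    """Score a state as 1 minus the share of its records that led to fraud.

    `outcomes` holds one boolean per collected record containing the state:
    True if that record ultimately led to a fraud event.
    """
    fraud_count = sum(1 for led_to_fraud in outcomes if led_to_fraud)
    return 1 - fraud_count / len(outcomes)


def annotate_with_scores(sequence, scores):
    """Attach the precomputed evaluation score to every state in the sequence."""
    return [(state, scores[state]) for state in sequence]


# 100 records containing state A, 10 of which led to fraud (as in the text).
score_a = evaluate_state([True] * 10 + [False] * 90)

# Scores for the other states are taken from the example sequence in the text.
scores = {"B": 0.3, "A": score_a, "C": 0.6, "D": 0.99}
annotated = annotate_with_scores(["B", "A", "C", "D"], scores)
```

A higher score means the state is less likely to lead to fraud, matching the convention stated in the text.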
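In practice it has been found that certain user states (for example, the user clicking a pushed advertisement) have no real significance for risk identification and therefore need not appear in the user state sequence. To this end, generating the user state sequence can specifically be:
Filtering the user states in the sorted result according to a preset filtering rule, where the filtering rule at least defines which user states should be filtered out.
Continuing the above example, for the sorted result B→A→C→D, if the filtering rule determines that C is a state that should be filtered out, the final user state sequence is B→A→D.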
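A filtering rule of this kind can be sketched in Python as a set of states to drop. Representing the rule as a plain set is an assumption; the text only requires that the rule define which states are filtered out.

```python
def filter_states(sequence, filtered_out):
    """Drop every state that the filtering rule marks as irrelevant."""
    return [state for state in sequence if state not in filtered_out]


# Rule from the text's example: state C carries no risk signal.
filtered = filter_states(["B", "A", "C", "D"], {"C"})
```

The relative order of the remaining states is preserved, so the result is B→A→D.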
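Of course, feasible sequence conversions are not limited to the examples above, and they are not listed exhaustively here.
Step 103: convert the generated user state sequence into a sequence feature.
The purpose of step 103 is to express the user state sequence as mathematical features, forming a sequence feature that can be used for identification.
In one embodiment, step 103 specifically includes encoding the generated user state sequence as a sequence vector. Further, this encoding can include:
Step 131: encode each state in the user state sequence as a state vector according to a first encoding rule (for example, word2vec). For example, for B→A→C→D, B is encoded as 000101.
Step 132: use a neural network to encode the sequence of state vectors as a sequence vector. The neural network includes, but is not limited to, a recurrent neural network (RNN) and a convolutional neural network (CNN). In practice, the state vectors are fed into the neural network, and the vector finally output by the network is taken as the sequence vector.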
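Steps 131 and 132 can be illustrated with the following Python sketch. It rests on several assumptions: one-hot vectors stand in for the word2vec-style state encoding, the network is a tiny Elman-style RNN written in plain Python, and the weights are random and untrained. It shows only how a sequence of state vectors is folded into one fixed-length vector, not how such a network would be trained.

```python
import math
import random

STATES = ["A", "B", "C", "D"]  # hypothetical state vocabulary


def one_hot(state):
    """Stand-in for the first encoding rule: one-hot instead of word2vec."""
    return [1.0 if s == state else 0.0 for s in STATES]


def make_matrix(rows, cols, rng):
    """Random weight matrix; a real model would learn these values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_encode(state_vectors, w_in, w_rec):
    """Run a simple recurrent cell over the sequence; return the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in state_vectors:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    return hidden


rng = random.Random(0)
hidden_size = 3
w_in = make_matrix(hidden_size, len(STATES), rng)
w_rec = make_matrix(hidden_size, hidden_size, rng)

# The two behavior sequences from the earlier example: same events, different order.
vec_user1 = rnn_encode([one_hot(s) for s in ["A", "B", "C", "D"]], w_in, w_rec)
vec_user2 = rnn_encode([one_hot(s) for s in ["B", "C", "A", "D"]], w_in, w_rec)
```

Because the hidden state depends on the order of its inputs, the two sequences map to different sequence vectors even though they contain the same events.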
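In another embodiment, the method can further include step 10 and step 20:
Step 10: mine a frequent sequence itemset from multiple user state sequences.
A frequent sequence itemset is a set of frequent sequence items, and a frequent sequence item is a sequence that occurs relatively frequently. In one aspect, a number of user state sequences can be collected, the occurrences of each distinct user state sequence counted, and the sequences sorted by occurrence count from high to low, with a certain number of top-ranked sequences taken as frequent sequence items. In another aspect, sequences that co-occur in multiple user state sequences can be mined as frequent sequence items. For example:
User state sequence 1: A→B→D→F→C→E
User state sequence 2: A→B→F→C
User state sequence 3: A→B→D→C→E
The frequent sequence item mined from these three user state sequences is:
A→B→C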
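One way to reproduce the example above is to look for the longest subsequence, not necessarily contiguous, that is shared by all the sequences. The Python sketch below takes that reading, which is an assumption: the text does not name a mining algorithm, and production systems would normally use a dedicated sequential pattern mining method with a support threshold. The brute-force enumeration here is only practical for short sequences.

```python
from itertools import combinations


def is_subsequence(candidate, sequence):
    """True if `candidate` appears in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    return all(state in remaining for state in candidate)


def longest_common_subsequences(sequences):
    """Return the longest subsequences shared by every input sequence."""
    shortest = min(sequences, key=len)
    for length in range(len(shortest), 0, -1):
        found = {
            candidate
            for candidate in combinations(shortest, length)
            if all(is_subsequence(candidate, seq) for seq in sequences)
        }
        if found:
            return sorted(found)
    return []


sequences = [
    ["A", "B", "D", "F", "C", "E"],
    ["A", "B", "F", "C"],
    ["A", "B", "D", "C", "E"],
]
frequent_items = longest_common_subsequences(sequences)
```

For the three sequences in the text, the only longest shared subsequence is A→B→C.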
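Step 20: for each frequent sequence item in the frequent sequence itemset, determine the feature value of the item according to the distribution of black and white samples corresponding to it.
For example, for the frequent sequence item "A→B→C", 100 samples containing the sequence are selected (a sample is considered to contain a sequence if the sequence occurs in the sample within the given duration). Each sample can be characterized as a white sample (no risk event occurred) or a black sample (a risk event occurred) according to whether it ultimately led to a risk event such as account theft. If the 100 samples include 90 black samples and 10 white samples, the risk probability (the probability of leading to a risk event) of the frequent sequence item "A→B→C" can be determined as 0.9.
Here, the feature value of a frequent sequence item can be determined from its risk probability. For example, the risk probability of the frequent sequence item "A→B→C" is taken as its feature value. Alternatively, for any frequent sequence item, the feature value can be obtained through the softmax function. In essence, softmax compresses (maps) an arbitrary K-dimensional real vector into another K-dimensional real vector whose elements each lie in (0,1) and sum to 1; it is commonly used for multi-class classification problems.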
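The two ways of obtaining a feature value can be sketched in Python as follows. The risk probability follows the 90-black, 10-white example directly. The text does not say which vector softmax is applied to, so the `softmax` function below is shown only as the standard definition with arbitrary illustrative scores.

```python
import math


def risk_probability(black_count, white_count):
    """Share of samples containing the item that led to a risk event."""
    return black_count / (black_count + white_count)


def softmax(values):
    """Map a real vector to positive values that sum to 1."""
    peak = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


feature_abc = risk_probability(black_count=90, white_count=10)
normalized = softmax([1.0, 2.0, 3.0])  # illustrative raw scores
```

Larger raw scores receive larger softmax outputs, and every output lies strictly between 0 and 1.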
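Accordingly, step 103 can include the following steps 133 and 134:
In step 133, the frequent sequence items contained in the generated user state sequence are determined based on the frequent sequence itemset.
In step 134, the sequence feature corresponding to the user state sequence is determined according to the frequent sequence items contained in the user state sequence and the predetermined feature value of each frequent sequence item.
For example, suppose a generated user state sequence is:
A→B→D→F→C→E
Based on the previously mined frequent sequence itemset, the frequent sequence items contained in this user state sequence (that is, the previously mined frequent sequence items that the sequence hits) are determined to be:
A→B→D and D→F→C
Suppose the feature value of A→B→D is x1 and the feature value of D→F→C is x2. The sequence feature of the user state sequence A→B→D→F→C→E can then be computed from the feature values x1 and x2 of the frequent sequence items that were hit, for example by summing the feature values of the hit items and taking the sum as the sequence feature.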
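Steps 133 and 134 can be sketched in Python as follows. The feature values standing in for x1 and x2, the extra non-matching item, and the choice to treat a "hit" as an in-order subsequence match are all illustrative assumptions; summation is the aggregation given as an example in the text.

```python
def is_subsequence(candidate, sequence):
    """True if `candidate` appears in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    return all(state in remaining for state in candidate)


def sequence_feature(sequence, item_values):
    """Find the frequent items hit by the sequence and sum their feature values."""
    hits = [item for item in item_values if is_subsequence(item, sequence)]
    return hits, sum(item_values[item] for item in hits)


# Hypothetical feature values: x1 = 0.9 and x2 = 0.4.
item_values = {
    ("A", "B", "D"): 0.9,
    ("D", "F", "C"): 0.4,
    ("B", "A", "C"): 0.7,  # not contained in the sequence below
}
hits, feature = sequence_feature(["A", "B", "D", "F", "C", "E"], item_values)
```

Only A→B→D and D→F→C are hit, so the sequence feature is x1 + x2.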
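Step 104: form risk identification sample data with the obtained sequence feature as the feature value and the risk type of the event as the label value, and build a risk identification model using at least one piece of sample data.
The feature value is the input of the model to be trained, and the label value (for example, 1 for high risk and 0 for low risk) serves as the expected output of the model. In actual model training, other feature variables, such as conventional non-sequence model variables, can be introduced in addition to the sequence features. All the variables are used together to train the model and to identify risk online.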
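Step 104 can be illustrated with a very small supervised model. The sketch below trains a one-feature logistic regression by gradient descent in plain Python. The choice of logistic regression, the toy samples, and the hyperparameters are assumptions; the text does not prescribe a model type.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, learning_rate=1.0, epochs=2000):
    """Fit weight and bias by batch gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


# Toy samples: sequence feature as input, 1 = high risk, 0 = low risk.
features = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
weight, bias = train_logistic(features, labels)


def predict(x):
    """Estimated probability that a sample with sequence feature x is high risk."""
    return sigmoid(weight * x + bias)
```

After training, samples with a large sequence feature receive a higher predicted risk than samples with a small one. Additional non-sequence variables would simply become extra input dimensions.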
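Referring to FIG. 2, based on the model built above, a risk identification method can include steps 201 to 204:
Step 201: extract, from data of a target user, user state records of the target user within a given duration, where the user state records include operation behaviors of the user and/or system events.
Regarding when step 201 should extract the user state records, the following cases are included:
Case 1: when it is recognized that the target user is about to perform a specific event (such as a transfer), start extracting the target user's state records within the given duration before that event. If risk is ultimately identified, a corresponding measure is taken, such as disallowing the transfer.
Case 2: after the target user has performed a specific event, start extracting the target user's state records within the given duration before that event. If risk is ultimately identified, the account used by the fraudster can be frozen.
Step 202: sort the user state records by time of occurrence, and generate a user state sequence from the sorted result.
Step 203: convert the generated user state sequence into a sequence feature.
Step 204: use the sequence feature as the input of a pre-built risk identification model, and output a risk identification result. The output of the model can represent the likelihood that the current specific event is risky.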
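Steps 201 to 204 can be strung together as in the following Python sketch, which corresponds to Case 1 (deciding before a transfer is allowed). Everything specific in it is hypothetical: the frequent-item feature table, the stand-in model with fixed coefficients, the 0.5 threshold, and the `block`/`allow` decision labels.

```python
import math


def is_subsequence(candidate, sequence):
    """True if `candidate` appears in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    return all(state in remaining for state in candidate)


def identify_risk(records, item_values, model, threshold=0.5):
    """Sort the records, build the sequence feature, and score it with the model."""
    ordered = [r["state"] for r in sorted(records, key=lambda r: r["time"])]
    feature = sum(
        value for item, value in item_values.items() if is_subsequence(item, ordered)
    )
    score = model(feature)
    return score, ("block" if score >= threshold else "allow")


def toy_model(feature):
    """Stand-in for a pre-built model; the coefficients are made up."""
    return 1.0 / (1.0 + math.exp(-(4.0 * feature - 2.0)))


item_values = {("B", "A", "D"): 0.9}  # hypothetical frequent item and feature value

risky_records = [
    {"state": "A", "time": 10},
    {"state": "B", "time": 0},
    {"state": "D", "time": 20},
]
benign_records = [{"state": "A", "time": 0}, {"state": "C", "time": 5}]

risky_score, risky_decision = identify_risk(risky_records, item_values, toy_model)
benign_score, benign_decision = identify_risk(benign_records, item_values, toy_model)
```

The first user's sequence hits the risky pattern and is blocked; the second user's sequence hits nothing and is allowed.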
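The effects produced by the method provided in the embodiments of this specification include:
A risk identification model is built by mining the latent relationship between user state sequences and risk events. In the risk identification stage, the risk of a specific event performed by a user to be identified, or the risk of that user, can then be predicted from the model and the extracted user state sequence of that user. This improves the ability of the risk control system to identify risk and makes its defenses more robust, so that strategies and models are harder for account thieves to bypass. In addition, behavior sequence information directly reflects an account thief's methods, which helps strategy analysts analyze the methods used in theft cases and improves their efficiency.
Corresponding to the above method embodiments, embodiments of this specification further provide an apparatus for building a risk identification model and a risk identification apparatus.
Referring to FIG. 3, an apparatus 300 for building a risk identification model can include:
An extraction unit 301, configured to: extract, from data of a given user, user state records within a preset duration before a specific event occurs, the user state records including operation behaviors of the user and/or system events, and the specific event being an event whose risk type has been determined.
A sequence generation unit 302, configured to: sort the extracted user state records by time of occurrence, and generate a user state sequence for the specific event from the sorted result.
A feature conversion unit 303, configured to: convert the generated user state sequence into a sequence feature.
A model building unit 304, configured to: form risk identification sample data with the obtained sequence feature as the feature value and the risk type of the event as the label value, and build a risk identification model using at least one piece of sample data.
In one embodiment, the sequence generation unit 302 can be configured to:
In the sorted result, according to the interval between the i-th user state and the (i+1)-th user state, convert the i-th user state into a user state carrying the interval information.
In one embodiment, the sequence generation unit 302 can be configured to:
In the sorted result, according to a prior evaluation result for the i-th user state, convert the i-th user state into a user state carrying the evaluation result information.
In one embodiment, the sequence generation unit 302 can be configured to:
Filter the user states in the sorted result according to a preset filtering rule.
Referring to FIG. 4, a risk identification apparatus 400 can include:
An obtaining unit 401, configured to: extract, from data of a target user, user state records of the target user within a given duration, the user state records including operation behaviors of the user and/or system events.
A sequence generation unit 402, configured to: sort the user state records by time of occurrence, and generate a user state sequence from the sorted result.
A feature conversion unit 403, configured to: convert the generated user state sequence into a sequence feature.
An output unit 404, configured to: use the sequence feature as the input of a pre-built risk identification model, and output a risk identification result.
In one embodiment, the sequence generation unit 402 can be configured to:
In the sorted result, according to the interval between the i-th user state and the (i+1)-th user state, convert the i-th user state into a user state carrying the interval information.
In one embodiment, the sequence generation unit 402 can be configured to:
In the sorted result, according to a prior evaluation result for the i-th user state, convert the i-th user state into a user state carrying the evaluation result information.
In one embodiment, the sequence generation unit 402 can be configured to:
Filter the user states in the sorted result according to a preset filtering rule.
For details of how the functions and roles of the modules in the above apparatuses are implemented, see the implementation of the corresponding steps in the above methods; they are not repeated here.
Embodiments of this specification further provide a computer device that includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the foregoing method when executing the program. In one embodiment, the method includes at least:
extracting, from data of a given user, user state records within a preset duration before a specific event occurs, the user state records including operation behaviors of the user and/or system events, and the specific event being an event whose risk type has been determined;
sorting the extracted user state records by time of occurrence, and generating a user state sequence for the specific event from the sorted result;
converting the generated user state sequence into a sequence feature;
forming risk identification sample data with the obtained sequence feature as the feature value and the risk type of the event as the label value, and building a risk identification model using at least one piece of sample data.
In another embodiment, the method includes at least:
extracting, from data of a target user, user state records of the target user within a given duration, the user state records including operation behaviors of the user and/or system events;
sorting the user state records by time of occurrence, and generating a user state sequence from the sorted result;
converting the generated user state sequence into a sequence feature;
using the sequence feature as the input of a pre-built risk identification model, and outputting a risk identification result.
FIG. 5 shows a more specific schematic diagram of the hardware structure of a computing device provided in the embodiments of this specification. The device can include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively connected to one another inside the device through the bus 1050.
The processor 1010 can be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute the relevant programs to implement the technical solutions provided in the embodiments of this specification.
The memory 1020 can be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 can store an operating system and other application programs. When the technical solutions provided in the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and is called and executed by the processor 1010.
The input/output interface 1030 is used to connect input/output modules for information input and output. An input/output module can be configured in the device as a component (not shown in the figure) or can be externally connected to the device to provide the corresponding functions. Input devices can include a keyboard, a mouse, a touchscreen, a microphone, and various sensors; output devices can include a display, a speaker, a vibrator, and indicator lights.
The communication interface 1040 is used to connect a communication module (not shown in the figure) so that the device can communicate with other devices. The communication module can communicate in a wired manner (for example, USB or a network cable) or in a wireless manner (for example, a mobile network, Wi-Fi, or Bluetooth).
The bus 1050 includes a path that transfers information between the components of the device (for example, the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040).
It should be noted that although only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050 are shown for the above device, in a specific implementation the device can also include other components necessary for normal operation. In addition, a person skilled in the art will understand that the device can also include only the components necessary to implement the solutions of the embodiments of this specification, and need not include all the components shown in the figure.
From the above description of the implementations, a person skilled in the art can clearly understand that the embodiments of this specification can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which can be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments, or in certain parts of the embodiments, of this specification.
The systems, apparatuses, modules, or units described in the above embodiments can be implemented by a computer chip or an entity, or by a product with a certain function. A typical implementation device is a computer, which can take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments in this specification are described progressively. For parts that are the same or similar across embodiments, the embodiments can be cross-referenced, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiments are basically similar to the method embodiments, they are described relatively briefly; for related details, see the description of the method embodiments. The apparatus embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and when the solutions of the embodiments of this specification are implemented, the functions of the modules can be implemented in one or more pieces of software and/or hardware. Some or all of the modules can also be selected according to actual needs to achieve the purpose of the solution of an embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
The above are only specific implementations of the embodiments of this specification. It should be noted that a person of ordinary skill in the art can make several improvements and refinements without departing from the principles of the embodiments of this specification, and these improvements and refinements shall also be regarded as falling within the scope of protection of the embodiments of this specification.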
Claims (21)
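- A method for building a risk identification model, comprising: extracting, from data of a given user, user state records within a preset duration before a specific event occurs, the user state records including operation behaviors of the user and/or system events, and the specific event being an event whose risk type has been determined; sorting the extracted user state records by time of occurrence, and generating a user state sequence for the specific event from the sorted result; converting the generated user state sequence into a sequence feature; and forming risk identification sample data with the obtained sequence feature as the feature value and the risk type of the event as the label value, and building a risk identification model using at least one piece of sample data.
- The method according to claim 1, wherein generating a user state sequence for the specific event from the sorted result comprises: in the sorted result, according to the interval between the i-th user state and the (i+1)-th user state, converting the i-th user state into a user state carrying the interval information.
- The method according to claim 1, wherein generating a user state sequence for the specific event from the sorted result comprises: in the sorted result, according to a prior evaluation result for the i-th user state, converting the i-th user state into a user state carrying the evaluation result information.
- The method according to claim 1, wherein generating a user state sequence for the specific event from the sorted result comprises: filtering the user states in the sorted result according to a preset filtering rule.
- The method according to claim 1, wherein converting the generated user state sequence into a sequence feature comprises: encoding the generated user state sequence as a sequence vector.
- The method according to claim 5, wherein encoding the generated user state sequence as a sequence vector comprises: encoding each state in the user state sequence as a state vector according to a first encoding rule; and using a neural network to encode the sequence of state vectors as a sequence vector.
- The method according to claim 5, further comprising, before converting the generated user state sequence into a sequence feature: mining a frequent sequence itemset from multiple user state sequences; and, for each frequent sequence item in the frequent sequence itemset, determining the feature value of the frequent sequence item according to the distribution of black and white samples corresponding to it; wherein converting the generated user state sequence into a sequence feature comprises: determining, based on the frequent sequence itemset, the frequent sequence items contained in the generated user state sequence; and determining the sequence feature corresponding to the user state sequence according to the frequent sequence items contained in the user state sequence and the predetermined feature value of each frequent sequence item.
- A risk identification method, comprising: extracting, from data of a target user, user state records of the target user within a given duration, the user state records including operation behaviors of the user and/or system events; sorting the user state records by time of occurrence, and generating a user state sequence from the sorted result; converting the generated user state sequence into a sequence feature; and using the sequence feature as the input of a pre-built risk identification model, and outputting a risk identification result.
- The method according to claim 8, wherein generating a user state sequence from the sorted result comprises: in the sorted result, according to the interval between the i-th user state and the (i+1)-th user state, converting the i-th user state into a user state carrying the interval information.
- The method according to claim 8, wherein generating a user state sequence from the sorted result comprises: in the sorted result, according to a prior evaluation result for the i-th user state, converting the i-th user state into a user state carrying the evaluation result information.
- The method according to claim 8, wherein generating a user state sequence from the sorted result comprises: filtering the user states in the sorted result according to a preset filtering rule.
- An apparatus for building a risk identification model, comprising: an extraction unit, which extracts, from data of a given user, user state records within a preset duration before a specific event occurs, the user state records including operation behaviors of the user and/or system events, and the specific event being an event whose risk type has been determined; a sequence generation unit, which sorts the extracted user state records by time of occurrence and generates a user state sequence for the specific event from the sorted result; a feature conversion unit, which converts the generated user state sequence into a sequence feature; and a model building unit, which forms risk identification sample data with the obtained sequence feature as the feature value and the risk type of the event as the label value, and builds a risk identification model using at least one piece of sample data.
- The apparatus according to claim 12, wherein the sequence generation unit is configured to: in the sorted result, according to the interval between the i-th user state and the (i+1)-th user state, convert the i-th user state into a user state carrying the interval information.
- The apparatus according to claim 12, wherein the sequence generation unit is configured to: in the sorted result, according to a prior evaluation result for the i-th user state, convert the i-th user state into a user state carrying the evaluation result information.
- The apparatus according to claim 12, wherein the sequence generation unit is configured to: filter the user states in the sorted result according to a preset filtering rule.
- A risk identification apparatus, comprising: an obtaining unit, which extracts, from data of a target user, user state records of the target user within a given duration, the user state records including operation behaviors of the user and/or system events; a sequence generation unit, which sorts the user state records by time of occurrence and generates a user state sequence from the sorted result; a feature conversion unit, which converts the generated user state sequence into a sequence feature; and an output unit, which uses the sequence feature as the input of a pre-built risk identification model and outputs a risk identification result.
- The apparatus according to claim 16, wherein the sequence generation unit is configured to: in the sorted result, according to the interval between the i-th user state and the (i+1)-th user state, convert the i-th user state into a user state carrying the interval information.
- The apparatus according to claim 16, wherein the sequence generation unit is configured to: in the sorted result, according to a prior evaluation result for the i-th user state, convert the i-th user state into a user state carrying the evaluation result information.
- The apparatus according to claim 16, wherein the sequence generation unit is configured to: filter the user states in the sorted result according to a preset filtering rule.
- A computer device, comprising: a processor; and a memory for storing processor-executable instructions; the processor being configured to: extract, from data of a given user, user state records within a preset duration before a specific event occurs, the user state records including operation behaviors of the user and/or system events, and the specific event being an event whose risk type has been determined; sort the extracted user state records by time of occurrence, and generate a user state sequence for the specific event from the sorted result; convert the generated user state sequence into a sequence feature; and form risk identification sample data with the obtained sequence feature as the feature value and the risk type of the event as the label value, and build a risk identification model using at least one piece of sample data.
- A computer device, comprising: a processor; and a memory for storing processor-executable instructions; the processor being configured to: extract, from data of a target user, user state records of the target user within a given duration, the user state records including operation behaviors of the user and/or system events; sort the user state records by time of occurrence, and generate a user state sequence from the sorted result; convert the generated user state sequence into a sequence feature; and use the sequence feature as the input of a pre-built risk identification model, and output a risk identification result.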
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18876802.2A EP3648023A4 (en) | 2017-11-10 | 2018-08-17 | DEVICE, DEVICE AND METHOD FOR BUILDING UP A RISK IDENTIFICATION MODEL AND DEVICE, DEVICE AND METHOD FOR RISK IDENTIFICATION |
SG11202000861PA SG11202000861PA (en) | 2017-11-10 | 2018-08-17 | Risk identification model building method, apparatus, and device and risk identification method, apparatus, and device |
US16/805,141 US10977739B2 (en) | 2017-11-10 | 2020-02-28 | Risk identification model building and risk identification |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711106115.9A CN107886243A (zh) | 2017-11-10 | 2017-11-10 | Risk identification model building and risk identification method, apparatus, and device |
CN201711106115.9 | 2017-11-10 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/805,141 Continuation US10977739B2 (en) | 2017-11-10 | 2020-02-28 | Risk identification model building and risk identification |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019091177A1 (zh) | 2019-05-16 |
Family
ID=61780066
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/100989 WO2019091177A1 (zh) | 2017-11-10 | 2018-08-17 | Risk identification model building and risk identification method, apparatus, and device |
Country Status (6)
Country | Link |
---|---|
US (1) | US10977739B2 (zh) |
EP (1) | EP3648023A4 (zh) |
CN (1) | CN107886243A (zh) |
SG (1) | SG11202000861PA (zh) |
TW (1) | TWI688917B (zh) |
WO (1) | WO2019091177A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
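CN112016788A (zh) * | 2020-07-14 | 2020-12-01 | Beijing Qiyu Information Technology Co., Ltd. | Risk control strategy generation and risk control method and apparatus, and electronic device |
CN113570204A (zh) * | 2021-07-06 | 2021-10-29 | Beijing Qiyu Information Technology Co., Ltd. | User behavior prediction method, system, and computer device |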
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9516053B1 (en) * | 2015-08-31 | 2016-12-06 | Splunk Inc. | Network security threat detection by user/user-entity behavioral analysis |
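CN106845999A (zh) * | 2017-02-20 | 2017-06-13 | Baidu Online Network Technology (Beijing) Co., Ltd. | Risk user identification method, apparatus, and server |
CN107316198A (zh) * | 2016-04-26 | 2017-11-03 | Alibaba Group Holding Limited | Account risk identification method and apparatus |
CN107886243A (zh) * | 2017-11-10 | 2018-04-06 | Alibaba Group Holding Limited | Risk identification model building and risk identification method, apparatus, and device |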
-
2017
- 2017-11-10 CN CN201711106115.9A patent/CN107886243A/zh active Pending
-
2018
- 2018-08-17 WO PCT/CN2018/100989 patent/WO2019091177A1/zh unknown
- 2018-08-17 EP EP18876802.2A patent/EP3648023A4/en not_active Ceased
- 2018-08-17 SG SG11202000861PA patent/SG11202000861PA/en unknown
- 2018-09-10 TW TW107131693A patent/TWI688917B/zh active
-
2020
- 2020-02-28 US US16/805,141 patent/US10977739B2/en active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP3648023A4 * |
Also Published As
Publication number | Publication date |
---|---|
US10977739B2 (en) | 2021-04-13 |
TW201923685A (zh) | 2019-06-16 |
SG11202000861PA (en) | 2020-02-27 |
TWI688917B (zh) | 2020-03-21 |
EP3648023A4 (en) | 2020-08-12 |
CN107886243A (zh) | 2018-04-06 |
US20200202449A1 (en) | 2020-06-25 |
EP3648023A1 (en) | 2020-05-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18876802; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2018876802; Country of ref document: EP; Effective date: 20200131 |
| NENP | Non-entry into the national phase | Ref country code: DE |