CN113919905A - Risk user identification method, system, equipment and storage medium - Google Patents
- Publication number
- CN113919905A (CN202111143506.4A)
- Authority
- CN
- China
- Prior art keywords
- vector
- sequence
- layer
- user
- request
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0609—Buyer or seller confidence or verification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The application discloses a risk user identification method comprising: a sequence splicing step of generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence; a vector conversion step of converting the new request behavior sequence into a computable vector; and an identification step of inputting the computable vector into a risk user identification model to complete risk identification of the user to be identified. The method identifies risk users by using deep learning to build an analysis model of users' request behavior sequences.
Description
Technical Field
The present application relates to the field of information query, and in particular, to a method, a system, a computer device, and a computer-readable storage medium for identifying a risky user.
Background
The development of e-commerce platforms for high-speed rail and passenger transport has made travel far more convenient. People can buy tickets through many channels, such as the 12306 website, mobile clients, telephone, and self-service machines, which has gradually eased the former difficulty of buying tickets. However, the current train and coach ticketing systems still have the following problems: passenger flow peaks at every holiday, tickets remain a scarce social resource, and various ticket-grabbing systems have emerged, causing unreasonable allocation of resources as well as undesirable practices such as ticket scalping and reselling.
During holidays, tickets for popular train numbers and popular routes are in short supply, so some passengers resort to malicious ticket grabbing. The frequent ticket refreshing performed by ticket-grabbing systems puts heavy pressure on the railway's regular ticketing system and affects its stable operation. Therefore, to maintain a fair and stable ticket-buying environment, ticket purchases by risk users need to be identified and controlled. When purchasing tickets, a risk user issues a series of requests, such as login and remaining-ticket queries, which form an access sequence. A user's request behavior sequence reflects the user's operating habits, so analyzing it is an effective way to uncover risk users.
Traditional methods for identifying abnormal users mostly build a pattern library; when a new user request arrives, the request sequence is matched against the request behaviors in the library to discover abnormal users. With this approach, every request of a user must be matched against the whole pattern library, which is inefficient; the matching quality depends on the number of abnormal access patterns in the library; and the library must be accumulated over time and judged by experts, so building it takes a long time.
To solve the problem of risk user identification, the invention provides a solution that combines deep learning with behavior sequences, and discloses a method for identifying risk users by using deep learning to build an analysis model of users' request behavior sequences.
Disclosure of Invention
The embodiment of the application provides a solution for solving the problem of risk user identification based on deep learning and behavior sequence combination.
In a first aspect, an embodiment of the present application provides a method for identifying a risky user, including:
a sequence splicing step: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence;
a vector conversion step: converting the new request behavior sequence into a computable vector;
an identification step: inputting the computable vector into the risk user identification model to complete risk identification of the user to be identified.
In some embodiments, the risk user identification model comprises: one-dimensional convolutional layers, LSTM layers, and attention layers.
In some embodiments, the vector conversion step comprises:
a text conversion step: treating the new request behavior sequence as text and performing index conversion and padding to generate a request sequence text;
an embedding conversion step: converting the request sequence text into a computable vector through an embedding layer.
In some embodiments, the identifying step comprises:
a feature extraction step: performing feature extraction on the computable vector using the one-dimensional convolution layer, the LSTM layer, and the attention layer;
a feature classification step: passing the extracted features through a fully connected layer, outputting the classification probability of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
In some embodiments, the feature extracting step includes:
a one-dimensional convolution step: using a one-dimensional convolution layer to extract local pattern features from the computable vector and output a one-dimensional convolution vector;
an LSTM step: using an LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer step: using an attention layer to extract features from the LSTM vector, weighted by semantic contribution, and output an attention weight vector.
In a second aspect, an embodiment of the present application provides a risk user identification system, which adopts the above risk user identification method and includes:
a sequence splicing module: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence;
a vector conversion module: converting the new request behavior sequence into a computable vector;
an identification module: inputting the computable vector into the risk user identification model to complete risk identification of the user to be identified.
In some embodiments, the risk user identification model comprises: one-dimensional convolutional layers, LSTM layers, and attention layers.
In some embodiments, the vector conversion module comprises:
a text conversion module: treating the new request behavior sequence as text and performing index conversion and padding to generate a request sequence text;
an embedding conversion module: converting the request sequence text into a computable vector through an embedding layer.
In some embodiments, the identification module comprises:
a feature extraction module: performing feature extraction on the computable vector using the one-dimensional convolution layer, the LSTM layer, and the attention layer;
a feature classification module: passing the extracted features through a fully connected layer, outputting the classification probability of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
In some embodiments, the feature extraction module comprises:
a one-dimensional convolution module: using a one-dimensional convolution layer to extract local pattern features from the computable vector and output a one-dimensional convolution vector;
an LSTM module: using an LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer module: using an attention layer to extract features from the LSTM vector, weighted by semantic contribution, and output an attention weight vector.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the method for identifying a risky user according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for identifying an at-risk user according to the first aspect.
Compared with the related prior art, the method has the following outstanding advantages:
1) the method identifies risk users based on their request behavior sequences, provides useful data support for the railway to precisely crack down on ticket purchasing by risk users, and better protects passengers' ticket-purchasing rights and interests;
2) the invention provides a risk user identification method that combines deep learning with behavior sequences, and mainly solves the problem of identifying risk users from their request sequences by deep learning;
3) the method combines the user's request behavior sequence with the time interval sequence, converts the combined sequence into a computable vector using a text analysis method, and builds a risk user identification model using one-dimensional convolution and LSTM deep learning techniques.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a method of risk user identification of the present invention;
FIG. 2 is a diagram of a one-dimensional convolution and LSTM based risky user identification model architecture;
FIG. 3 is a flow chart of data change according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the risk user identification system of the present invention;
FIG. 5 is a hardware structure diagram of a computer device according to an embodiment of the present application.
In the above figures:
100 risk user identification system
10 sequence splicing module
20 vector conversion module
30 identification module
80 bus
81 processor
82 memory
83 communication interface
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words used throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "And/or" describes an association relationship of associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
The invention aims to provide a risk user identification method based on deep learning and behavior sequence combination, and mainly solves the problem of identifying risk users by using a deep learning method based on a user request sequence. The method comprises the steps of organically combining a request behavior sequence and a time interval sequence of a user, converting the request behavior sequence and the time interval sequence into a vector capable of being calculated by using a text analysis method, and constructing a risk user identification model by using one-dimensional convolution and an LSTM deep learning technology.
The method of the invention comprises the steps of firstly obtaining a request behavior sequence of a user, generating a corresponding request time interval sequence, combining the two sequences of the user, and then converting the combined sequence into a computable vector by a text analysis method and an Embedding method to be used as the input of a model. And finally constructing a risk user identification model for the request behavior sequence by using the one-dimensional convolution, the LSTM and the attention layer.
FIG. 1 is a flow chart of the risk user identification method of the present invention. As shown in FIG. 1, the method includes:
sequence splicing step S10: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence;
vector conversion step S20: converting the new request behavior sequence into a computable vector;
identification step S30: inputting the computable vector into the risk user identification model to complete risk identification of the user to be identified.
Wherein the risk user identification model comprises: one-dimensional convolutional layers, LSTM layers, and attention layers.
Wherein the vector conversion step S20 includes:
a text conversion step: treating the new request behavior sequence as text and performing index conversion and padding to generate a request sequence text;
an embedding conversion step: converting the request sequence text into a computable vector through an embedding layer.
Wherein the identifying step S30 includes:
a feature extraction step: performing feature extraction on the computable vector using the one-dimensional convolution layer, the LSTM layer, and the attention layer;
a feature classification step: passing the extracted features through a fully connected layer, outputting the classification probability of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
Further, the feature extraction step includes:
a one-dimensional convolution step: using a one-dimensional convolution layer to extract local pattern features from the computable vector and output a one-dimensional convolution vector;
an LSTM step: using an LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer step: using an attention layer to extract features from the LSTM vector, weighted by semantic contribution, and output an attention weight vector.
The following detailed description of specific embodiments of the invention refers to the accompanying drawings in which:
FIG. 2 is an architecture diagram of the risk user identification model based on one-dimensional convolution and LSTM. As shown in FIG. 2, a new sequence is formed by splicing the request sequence and the time interval sequence. The model structure mainly includes an Embedding layer, a one-dimensional convolution layer, an LSTM layer, an Attention layer, and a Dense layer; for ease of description, the model is abbreviated below as Conv1D_LSTM_At.
(1) Method for combining request sequence and time interval sequence
Because the request sequence and the time intervals are different types of data, they cannot be spliced directly. The time interval is therefore treated as a 'no-operation' request inserted into the request sequence, with the number of 'no-operation' requests determined by the length of the interval. Because time intervals take large values over a wide range, they need to be discretized, which reduces the noise caused by sample deviation and effectively weakens the influence of extreme values and outliers. Unsupervised discretization is applied to the time interval variable; that is, the sample data is divided into several bins, each identified by a discrete value. Unsupervised discretization algorithms include the Equal-Distance Discretization method (ED), the Equal-Frequency Discretization method (EF), the Approximate Equal-Frequency Discretization method (AEFD), discretization based on local density, discretization based on clustering, and so on. Here, a discretization method based on local density is used to divide the user request time intervals into 6 levels, as shown in Table 1.
TABLE 1 request time interval ranking for users
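The levels in Table 1 come from a local-density discretization whose cut points are not reproduced here. As a rough illustration of unsupervised discretization in general, the sketch below uses the simpler equal-frequency method (EF) named in the text instead. The function names, the sample intervals, and the resulting cut points are assumptions for illustration only, not values from the patent.

```python
from bisect import bisect_right


def equal_frequency_edges(intervals, n_levels=6):
    """Return n_levels - 1 cut points so each level holds about the same number of samples."""
    ordered = sorted(intervals)
    n = len(ordered)
    # Cut point k sits at the k / n_levels quantile of the sorted sample.
    return [ordered[(k * n) // n_levels] for k in range(1, n_levels)]


def to_level(interval, edges):
    """Map one time interval to a level in 1 .. len(edges) + 1."""
    return bisect_right(edges, interval) + 1


# Hypothetical request intervals in seconds (illustrative data only).
sample_intervals = [0.2, 0.5, 1, 2, 3, 5, 8, 13, 30, 60, 300, 1800]
edges = equal_frequency_edges(sample_intervals)
levels = [to_level(x, edges) for x in sample_intervals]
```

Intervals beyond the largest cut point all fall into the top level, which is how discretization limits the influence of extreme values.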
Each level gives the number of 'no-operation' requests to insert, so that differences in interval length are expressed by different levels; for example, if the level is 3, then 3 'no-operation' requests are inserted between the corresponding requests. If the request sequence is [login, add common contact, submit order] and the corresponding interval levels are [1, 3], the new sequence is [login, no operation, add common contact, no operation, no operation, no operation, submit order]. The final request sequence string is treated as text and used as the input to the model.
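The splicing rule can be sketched in a few lines of Python. This is a minimal illustration, assuming one interval level per gap between consecutive requests; the function name `splice` and the token string are hypothetical.

```python
NO_OP = "no operation"


def splice(requests, levels, no_op=NO_OP):
    """Insert `level` no-operation tokens after each request except the last."""
    if len(levels) != len(requests) - 1:
        raise ValueError("expected one interval level per gap between requests")
    spliced = []
    for position, request in enumerate(requests):
        spliced.append(request)
        if position < len(levels):
            # A longer interval (higher level) becomes more no-operation tokens.
            spliced.extend([no_op] * levels[position])
    return spliced


new_sequence = splice(["login", "add common contact", "submit order"], [1, 3])
```

With levels [1, 3], one token is inserted after the login request and three after the add-contact request, so the spliced sequence has 7 entries.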
(2) Word vector representation
The request sequence is treated as a string of text, and each operation request is treated as a word. The method mainly analyzes 17 ticket-purchasing requests plus the 'no-operation' request, 18 in total, so each sentence is composed of some of these 18 words. A dictionary for the request sequences is generated with Dictionary from gensim.corpora, the words are converted into their indexes with the doc2idx function to produce an index list for each sample, and sequences that are too short are padded with 0. Taking [login, no operation, add common contact, no operation, no operation, no operation, submit order] as an example, the generated dictionary is [login, no operation, add common contact, submit order], the indexes of these words are [1, 3, 2, 4], and the index list of the sequence is [1, 3, 2, 3, 3, 3, 4]. Assuming a maximum length of 10, the index list needs to be padded. Because the time of the user's last access is taken as the end point and the request sequence before that point is counted, a sequence that is too short is padded at the front, giving [0, 0, 0, 1, 3, 2, 3, 3, 3, 4], which is the input to the Embedding layer.
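The index conversion and front padding can be sketched without any library. The vocabulary below hard-codes the indexes from the example in the text rather than calling gensim, and truncating over-long sequences to the most recent requests is an assumption that the text does not state; all names are hypothetical.

```python
PAD = 0

# Indexes copied from the worked example; 0 is reserved for padding.
vocab = {"login": 1, "add common contact": 2, "no operation": 3, "submit order": 4}


def to_indices(sequence, vocab):
    """Replace each request name by its dictionary index."""
    return [vocab[token] for token in sequence]


def pad_left(indices, max_len, pad=PAD):
    """Pad at the front so the most recent request stays at the end."""
    # Assumption: if the sequence is too long, keep only the latest requests.
    tail = indices[-max_len:]
    return [pad] * (max_len - len(tail)) + tail


sequence = ["login", "no operation", "add common contact",
            "no operation", "no operation", "no operation", "submit order"]
model_input = pad_left(to_indices(sequence, vocab), max_len=10)
```

Padding at the front keeps the last access aligned at the same position for every user, which matches the choice of the last access time as the end point.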
In the Embedding layer, the user's index sequence is converted into the corresponding vectors. If the user's request sequence has length 10 and the word vector dimension set in the Embedding layer is 5, then each sample yields a 10 × 5 two-dimensional matrix after the Embedding layer, which is the input to the feature extraction layer. An Embedding layer is typically used to compress high-dimensional sparse data; here the number of distinct words is small, so its main function is to capture contextual relationships.
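An embedding layer is, in effect, a lookup table with one row per index. The sketch below shows only the shape change from a length-10 index list to a 10 × 5 matrix; the table is filled with random numbers, whereas in a trained model its rows would be learned. The vocabulary size of 19 (18 words plus the padding index 0) and the function names are assumptions.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """Build a vocab_size x dim table of small random values (untrained)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]


def embed(indices, table):
    """Look up one row of the table per index."""
    return [list(table[index]) for index in indices]


table = make_embedding_table(vocab_size=19, dim=5)
embedded = embed([0, 0, 0, 1, 3, 2, 3, 3, 3, 4], table)  # 10 rows of 5 values
```

Positions that hold the same index, such as the repeated 'no-operation' index 3, receive identical rows.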
(3) Feature extraction and classification
FIG. 3 is a flow chart of data change according to an embodiment of the present invention. As shown in FIG. 3, the feature extraction layer has three parts: a one-dimensional convolution layer, an LSTM layer, and an Attention layer. Because a one-dimensional CNN is good at extracting local pattern features, a one-dimensional convolution layer is applied first. The string formed by a request sequence has the properties of text and is also time-series data with a definite temporal order; LSTM is good at extracting features from time-series data and has achieved good results in speech and text analysis, so an LSTM layer is added after the one-dimensional convolution for feature extraction.
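To make the "local pattern" idea concrete, here is a minimal pure-Python one-dimensional convolution over the time axis. The kernel size of 3, the absence of padding, and the ReLU activation are assumptions, since the text does not specify them; the function name is hypothetical.

```python
def conv1d_relu(inputs, kernels, biases):
    """Slide each kernel along the time axis of a T x D input.

    inputs:  T rows of D values
    kernels: F kernels, each K rows of D weights
    returns: (T - K + 1) rows of F activations
    """
    kernel_size = len(kernels[0])
    outputs = []
    for start in range(len(inputs) - kernel_size + 1):
        window = inputs[start:start + kernel_size]  # K consecutive requests
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for kernel_row, input_row in zip(kernel, window):
                total += sum(w * x for w, x in zip(kernel_row, input_row))
            row.append(max(0.0, total))  # ReLU
        outputs.append(row)
    return outputs


# A 10 x 5 input and 3 kernels of size 3 give an 8 x 3 output.
toy_input = [[1.0] * 5 for _ in range(10)]
toy_kernels = [[[1.0] * 5 for _ in range(3)],
               [[-1.0] * 5 for _ in range(3)],
               [[0.0] * 5 for _ in range(3)]]
conv_out = conv1d_relu(toy_input, toy_kernels, biases=[0.0, 0.0, 1.0])
```

Each output row summarizes a window of consecutive requests, and the resulting shorter sequence is what an LSTM layer would then read in order.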
When the features extracted from the request sequence are classified, the features extracted from different requests contribute differently to the final classification; in particular, 'no-operation' requests and normal service requests contribute differently. An Attention mechanism is therefore used to weight the features of important requests, according to their semantic contribution, into the sentence-level feature, so that the real semantics of the request sequence are expressed more deeply. Let the output of the LSTM network be $[h_1, h_2, \dots, h_n]$. Each LSTM output feature $h_i$ is encoded by formula (4-7), which applies one nonlinear transformation with the tanh activation function to obtain $u_i$. Formula (4-8) then applies the softmax function to obtain the attention weight vector $W = (\beta_1, \beta_2, \dots, \beta_n)$ over the components $u_i$. Finally, formula (4-9) takes the weighted sum of the LSTM outputs $[h_1, h_2, \dots, h_n]$ to obtain the sentence feature vector $S$.
$u_i = \tanh(W_h h_i + b_h)$ (4-7)
$\beta_i = \mathrm{softmax}(W_\beta u_i)$ (4-8)
$S = \sum_{i=1}^{n} \beta_i h_i$ (4-9)
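The attention pooling in formulas (4-7) and (4-8), followed by the weighted sum of the LSTM outputs described in the text, can be sketched in pure Python. The parameter shapes are assumptions: W_h is taken as an A × H matrix, b_h as a length-A vector, and W_β as a length-A vector that turns each u_i into a scalar score before the softmax over positions. The function name is hypothetical.

```python
import math


def attention_pool(h, W_h, b_h, w_beta):
    """Return (beta, S) for LSTM outputs h, a list of n vectors of size H."""
    scores = []
    for h_i in h:
        # (4-7): u_i = tanh(W_h h_i + b_h)
        u_i = [math.tanh(sum(w * x for w, x in zip(row, h_i)) + b)
               for row, b in zip(W_h, b_h)]
        # Scalar score W_beta u_i for this position.
        scores.append(sum(w * u for w, u in zip(w_beta, u_i)))
    # (4-8): softmax over the n positions, shifted by the max for stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    beta = [e / total for e in exps]
    # Weighted sum of the LSTM outputs gives the sentence feature vector S.
    S = [sum(b * h_i[d] for b, h_i in zip(beta, h)) for d in range(len(h[0]))]
    return beta, S


beta, S = attention_pool(h=[[1.0, 0.0], [0.0, 1.0]],
                         W_h=[[1.0, 0.0]], b_h=[0.0], w_beta=[10.0])
```

In this toy call the first position scores higher than the second, so it receives almost all of the attention weight and dominates S.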
Finally, a fully connected layer is added after the feature extraction layer, and a softmax layer outputs the probability of each category to produce the final user classification. The probability of category 1 in the result is taken as the user's final risk score. Taking the 10 × 5 word vector output as an example, with 3 convolution kernels and hidden_size = 8, the flow of data change is shown in FIG. 3.
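The classification head can be sketched as a fully connected layer followed by softmax, with the probability of category 1 read off as the risk score. Two output categories (0 for normal, 1 for risk) are an assumption consistent with the text; the weights below are toy values, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def risk_score(features, weights, biases):
    """Fully connected layer plus softmax; returns the probability of category 1."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)[1]


# Toy weights: category 0 ignores the features, category 1 sums them.
score = risk_score([1.0, 2.0], weights=[[0.0, 0.0], [1.0, 1.0]], biases=[0.0, 0.0])
```

Because the two probabilities sum to 1, a single threshold on this score is enough to turn it into a normal or risk decision.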
FIG. 4 is a schematic diagram of the risk user identification system of the present invention. As shown in FIG. 4, the present invention provides a risk user identification system 100, which adopts the above risk user identification method and includes:
sequence splicing module 10: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence;
vector conversion module 20: converting the new request behavior sequence into a computable vector;
identification module 30: inputting the computable vector into the risk user identification model to complete risk identification of the user to be identified.
Wherein the risk user identification model comprises: one-dimensional convolutional layers, LSTM layers, and attention layers.
Wherein, the vector conversion module includes:
a text conversion module: treating the new request behavior sequence as text and performing index conversion and padding to generate a request sequence text;
an embedding conversion module: converting the request sequence text into a computable vector through an embedding layer.
Wherein, the identification module includes:
a feature extraction module: performing feature extraction on the computable vector using the one-dimensional convolution layer, the LSTM layer, and the attention layer;
a feature classification module: passing the extracted features through a fully connected layer, outputting the classification probability of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
Wherein, the feature extraction module includes:
a one-dimensional convolution module: using a one-dimensional convolution layer to extract local pattern features from the computable vector and output a one-dimensional convolution vector;
an LSTM module: using an LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer module: using an attention layer to extract features from the LSTM vector, weighted by semantic contribution, and output an attention weight vector.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the method for identifying a risky user according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for identifying an at-risk user according to the first aspect.
In addition, the risk user identification method of the embodiment of the present application described in conjunction with fig. 1 may be implemented by a computer device. Fig. 5 is a hardware structure diagram of a computer device according to an embodiment of the present application.
The computer device may comprise a processor 81 and a memory 82 in which computer program instructions are stored.
Specifically, the processor 81 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.
The processor 81 implements any of the above-described embodiments of the method of risk user identification by reading and executing computer program instructions stored in the memory 82.
In some of these embodiments, the computer device may also include a communication interface 83 and a bus 80. As shown in fig. 5, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
The communication interface 83 is used for implementing communication between modules, devices, units and/or equipment in the embodiments of the present application. The communication interface 83 may also carry out data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
The computer device may implement the risky user identification method described in connection with fig. 1 based on the usage relationship.
Compared with the prior art, the present application identifies risk users based on the user request behavior sequence, provides useful data support for the railway to precisely crack down on ticket purchasing by risk users, and better protects passengers' ticket-purchasing rights and interests. The invention provides a risk user identification method that combines deep learning with behavior sequences, and mainly addresses the problem of identifying risk users from a user request sequence by means of deep learning. The method combines the user's request behavior sequence with the request time interval sequence, converts the combined sequence into a calculable vector using a text analysis method, and constructs a risk user identification model using one-dimensional convolution and LSTM deep learning techniques.
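The sequence splicing and text-style index conversion summarized above might look roughly like the following sketch. The exact splicing rule is not given in this passage, so interleaving each request with a bucketed time-gap token is an assumption, as are the bucket boundaries, the request names, and the helper names `interval_token`, `splice`, `build_vocab` and `to_indices`.

```python
PAD_INDEX = 0  # assumed: index 0 is reserved for padding
UNK_INDEX = 1  # assumed: index 1 is reserved for unseen tokens


def interval_token(seconds):
    """Map a time gap to a coarse token (bucket edges are hypothetical)."""
    if seconds < 1:
        return "gap_lt_1s"
    if seconds < 10:
        return "gap_lt_10s"
    if seconds < 60:
        return "gap_lt_60s"
    return "gap_ge_60s"


def splice(requests):
    """Interleave request names with the gap since the previous request.

    requests: list of (timestamp_in_seconds, request_name), in time order.
    """
    spliced = []
    previous = None
    for timestamp, name in requests:
        if previous is not None:
            spliced.append(interval_token(timestamp - previous))
        spliced.append(name)
        previous = timestamp
    return spliced


def build_vocab(sequences):
    """Assign an integer index to every distinct token, in order of appearance."""
    vocab = {"<pad>": PAD_INDEX, "<unk>": UNK_INDEX}
    for sequence in sequences:
        for token in sequence:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_indices(sequence, vocab, max_len):
    """Index conversion followed by truncation or right-padding to max_len."""
    ids = [vocab.get(token, UNK_INDEX) for token in sequence][:max_len]
    return ids + [PAD_INDEX] * (max_len - len(ids))


# Hypothetical request log for one user.
requests = [(0.0, "login"), (0.4, "query_ticket"), (0.9, "query_ticket"), (30.0, "submit_order")]
spliced = splice(requests)
vocab = build_vocab([spliced])
ids = to_indices(spliced, vocab, max_len=10)
```

The resulting fixed-length index list is the kind of input an embedding layer would then map to dense vectors.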
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (12)
1. A risk user identification method, comprising:
a sequence splicing step: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence of the user to be identified to generate a new request behavior sequence;
a vector conversion step: converting the new request behavior sequence into a calculable vector;
an identification step: inputting the calculable vector into a risk user identification model to complete risk identification of the user to be identified.
2. The risk user identification method according to claim 1, wherein the risk user identification model comprises: a one-dimensional convolution layer, an LSTM layer, and an attention layer.
3. The risk user identification method according to claim 2, wherein the vector conversion step comprises:
a text conversion step: treating the new request behavior sequence as text and performing index conversion and padding to generate a request sequence text;
an embedding conversion step: converting the request sequence text into the calculable vector through an embedding layer.
4. The risk user identification method according to claim 2, wherein the identification step comprises:
a feature extraction step: completing feature extraction of the calculable vector based on the one-dimensional convolution layer, the LSTM layer and the attention layer;
a feature classification step: passing the calculable vector after feature extraction through a fully connected layer, outputting the classification probability of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
5. The risk user identification method according to claim 4, wherein the feature extraction step comprises:
a one-dimensional convolution step: using the one-dimensional convolution layer to extract local pattern features from the calculable vector and output a one-dimensional convolution vector;
an LSTM step: using the LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer step: using the attention layer to extract features from the LSTM vector by weighting according to semantic contribution, and output an attention weight vector.
6. A risk user identification system using the risk user identification method according to any one of claims 1 to 5, comprising:
a sequence splicing module: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence;
a vector conversion module: converting the new request behavior sequence into a calculable vector;
an identification module: inputting the calculable vector into a risk user identification model to complete risk identification of the user to be identified.
7. The risk user identification system according to claim 6, wherein the risk user identification model comprises: a one-dimensional convolution layer, an LSTM layer, and an attention layer.
8. The risk user identification system according to claim 7, wherein the vector conversion module comprises:
a text conversion module: treating the new request behavior sequence as text and performing index conversion and padding to generate a request sequence text;
an embedding conversion module: converting the request sequence text into the calculable vector through an embedding layer.
9. The risk user identification system according to claim 7, wherein the identification module comprises:
a feature extraction module: completing feature extraction of the calculable vector based on the one-dimensional convolution layer, the LSTM layer and the attention layer;
a feature classification module: passing the calculable vector after feature extraction through a fully connected layer, outputting the classification probability of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
10. The risk user identification system according to claim 9, wherein the feature extraction module comprises:
a one-dimensional convolution module: using the one-dimensional convolution layer to extract local pattern features from the calculable vector and output a one-dimensional convolution vector;
an LSTM module: using the LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer module: using the attention layer to extract features from the LSTM vector by weighting according to semantic contribution, and output an attention weight vector.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the risk user identification method according to any one of claims 1 to 5 when executing the computer program.
12. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the risk user identification method according to any one of claims 1 to 5.
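For orientation, the feature classification recited in claims 4 and 9 (fully connected layer, softmax layer, risk score) can be sketched as below. Taking the risk score to be the softmax probability of an assumed "risk" class at index 1 is an interpretation, not something the claims state, and the weights shown are placeholders rather than trained values.

```python
import math


def dense(features, weights, biases):
    """Fully connected layer: one logit per row of `weights`."""
    return [sum(f * w for f, w in zip(features, row)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def risk_score(features, weights, biases, risk_class=1):
    """Return the probability assigned to the assumed risk class."""
    return softmax(dense(features, weights, biases))[risk_class]


# Placeholder feature vector and parameters for a two-class (normal, risk) output.
features = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
score = risk_score(features, weights, biases)
```

A deployment would compare such a score against a threshold chosen by the operator; no threshold is given in the claims.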
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111143506.4A CN113919905A (en) | 2021-09-28 | 2021-09-28 | Risk user identification method, system, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113919905A true CN113919905A (en) | 2022-01-11 |
Family
ID=79236635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111143506.4A Pending CN113919905A (en) | 2021-09-28 | 2021-09-28 | Risk user identification method, system, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113919905A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023007921A1 (en) * | 2021-07-30 | 2023-02-02 | 株式会社Nttドコモ | Time-series data processing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||