CN113919905A - Risk user identification method, system, equipment and storage medium - Google Patents


Publication number
CN113919905A
Authority
CN
China
Prior art keywords
vector
sequence
layer
user
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111143506.4A
Other languages
Chinese (zh)
Inventor
杨立鹏
樊春美
景辉
朱建生
阎志远
戴琳琳
梅巧玲
王拓
李雯
游雪松
张智
朱颖婷
王思宇
谢泽
李琪
徐东平
郝晓培
纪宇宣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Original Assignee
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Application filed by China Academy of Railway Sciences Corp Ltd CARS, Institute of Computing Technologies of CARS, Beijing Jingwei Information Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application discloses a risk user identification method comprising: a sequence splicing step of generating a corresponding request time interval sequence based on the acquired request behavior sequence of a user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence; a vector conversion step of converting the new request behavior sequence into a computable vector; and an identification step of inputting the computable vector into a risk user identification model to complete risk identification of the user to be identified. The method identifies risk users by using deep learning to build a behavior analysis model of users' request sequences.

Description

Risk user identification method, system, equipment and storage medium
Technical Field
The present application relates to the field of information query, and in particular, to a method, a system, a computer device, and a computer-readable storage medium for identifying a risky user.
Background
The development of e-commerce platforms for high-speed rail and passenger transport has made travel much more convenient. People can buy tickets through many channels, such as the 12306 website, mobile clients, telephone, and self-service machines, which has gradually eased the past difficulty of buying tickets. However, current train and coach ticketing systems still have the following problems: during holiday passenger peaks, tickets remain a scarce social resource, which has given rise to various ticket-grabbing systems, causes unreasonable allocation of resources, and is accompanied by undesirable practices such as ticket scalping and resale.
Tickets for popular trains and popular routes are in short supply during holidays, so some passengers obtain tickets by malicious means. The frequent ticket-refresh requests issued by ticket-grabbing systems put enormous pressure on the railway's regular ticketing system and affect its stable operation. To maintain a fair and stable ticket-purchasing environment, ticket purchases by risk users therefore need to be identified and controlled. When purchasing a ticket, a risk user issues a series of requests, such as login and remaining-ticket queries, which form an access sequence. A user's request behavior sequence reflects the user's operating habits, so analyzing it is an effective way to uncover risk users.
Traditional methods for identifying abnormal users mostly build a pattern library; when a new user request arrives, the request sequence is matched against the request behaviors in the library to find abnormal users. With this approach, every user request must be matched against the entire pattern library, which is inefficient; the matching quality depends on the number of abnormal access patterns in the library; and building the library requires accumulation over time and expert judgment, so the cycle is long.
The invention provides a solution to the risk user identification problem that combines deep learning with behavior sequences: a deep learning method is used to build a behavior analysis model of users' request sequences, and the model is used to identify risk users.
Disclosure of Invention
The embodiment of the application provides a solution for solving the problem of risk user identification based on deep learning and behavior sequence combination.
In a first aspect, an embodiment of the present application provides a method for identifying a risky user, including:
a sequence splicing step: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence;
a vector conversion step: converting the new request behavior sequence into a computable vector by vector conversion;
an identification step: inputting the computable vector into the risk user identification model to complete risk identification of the user to be identified.
In some embodiments, the risk user identification model comprises: a one-dimensional convolution layer, an LSTM layer, and an attention layer.
In some embodiments, the vector conversion step comprises:
a text conversion step: treating the new request behavior sequence as text, and performing index conversion and padding to generate a request sequence text;
an embedding conversion step: converting the request sequence text into a computable vector through an embedding layer.
In some embodiments, the identification step comprises:
a feature extraction step: performing feature extraction on the computable vector using the one-dimensional convolution layer, the LSTM layer, and the attention layer;
a feature classification step: passing the feature-extracted computable vector through a fully connected layer, outputting the classification probabilities of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
In some embodiments, the feature extraction step comprises:
a one-dimensional convolution step: using the one-dimensional convolution layer to extract local pattern features from the computable vector and output a one-dimensional convolution vector;
an LSTM step: using the LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer step: using the attention layer to weight the features of the LSTM vector by their semantic contribution and output an attention weight vector.
In a second aspect, an embodiment of the present application provides a risk user identification system that adopts the above risk user identification method and includes:
a sequence splicing module: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence;
a vector conversion module: converting the new request behavior sequence into a computable vector by vector conversion;
an identification module: inputting the computable vector into the risk user identification model to complete risk identification of the user to be identified.
In some embodiments, the risk user identification model comprises: a one-dimensional convolution layer, an LSTM layer, and an attention layer.
In some embodiments, the vector conversion module comprises:
a text conversion module: treating the new request behavior sequence as text, and performing index conversion and padding to generate a request sequence text;
an embedding conversion module: converting the request sequence text into a computable vector through an embedding layer.
In some embodiments, the identification module comprises:
a feature extraction module: performing feature extraction on the computable vector using the one-dimensional convolution layer, the LSTM layer, and the attention layer;
a feature classification module: passing the feature-extracted computable vector through a fully connected layer, outputting the classification probabilities of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
In some embodiments, the feature extraction module comprises:
a one-dimensional convolution module: using the one-dimensional convolution layer to extract local pattern features from the computable vector and output a one-dimensional convolution vector;
an LSTM module: using the LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer module: using the attention layer to weight the features of the LSTM vector by their semantic contribution and output an attention weight vector.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the method for identifying a risky user according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for identifying an at-risk user according to the first aspect.
Compared with the related prior art, the application has the following notable advantages:
1) the method identifies risk users based on the user's request behavior sequence, provides useful data support for the railway operator to accurately crack down on ticket purchases by risk users, and better protects passengers' ticket-purchasing rights and interests;
2) the invention provides a risk user identification method that combines deep learning with behavior sequences, mainly addressing the identification of risk users from user request sequences by a deep learning method;
3) the method combines the user's request behavior sequence with the time interval sequence, converts the combined sequence into a computable vector using a text analysis method, and builds the risk user identification model using one-dimensional convolution and LSTM deep learning techniques.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a method of risk user identification of the present invention;
FIG. 2 is a diagram of a one-dimensional convolution and LSTM based risky user identification model architecture;
FIG. 3 is a flow chart of data change according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the risk user identification system of the present invention;
FIG. 5 is a hardware structure diagram of a computer device according to an embodiment of the present application.
In the above figures:
100 risk user identification system
10 sequence splicing module
20 vector conversion module
30 identification module
80 bus
81 processor
82 memory
83 communication interface
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
The invention aims to provide a risk user identification method that combines deep learning with behavior sequences, mainly addressing the identification of risk users from user request sequences by a deep learning method. The method combines the user's request behavior sequence with the time interval sequence, converts the combined sequence into a computable vector using a text analysis method, and builds a risk user identification model using one-dimensional convolution and LSTM deep learning techniques.
The method first obtains the user's request behavior sequence and generates the corresponding request time interval sequence. The two sequences are combined, and the combined sequence is converted into a computable vector by a text analysis method and an Embedding method to serve as the model input. Finally, a risk user identification model for the request behavior sequence is built from a one-dimensional convolution layer, an LSTM layer, and an attention layer.
FIG. 1 is a flow chart of the risk user identification method of the invention. As shown in FIG. 1, the method includes:
a sequence splicing step S10: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence;
a vector conversion step S20: converting the new request behavior sequence into a computable vector by vector conversion;
an identification step S30: inputting the computable vector into the risk user identification model to complete risk identification of the user to be identified.
Wherein the risk user identification model comprises: a one-dimensional convolution layer, an LSTM layer, and an attention layer.
Wherein the vector conversion step S20 includes:
a text conversion step: treating the new request behavior sequence as text, and performing index conversion and padding to generate a request sequence text;
an embedding conversion step: converting the request sequence text into a computable vector through an embedding layer.
Wherein the identification step S30 includes:
a feature extraction step: performing feature extraction on the computable vector using the one-dimensional convolution layer, the LSTM layer, and the attention layer;
a feature classification step: passing the feature-extracted computable vector through a fully connected layer, outputting the classification probabilities of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
Further, the feature extraction step includes:
a one-dimensional convolution step: using the one-dimensional convolution layer to extract local pattern features from the computable vector and output a one-dimensional convolution vector;
an LSTM step: using the LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer step: using the attention layer to weight the features of the LSTM vector by their semantic contribution and output an attention weight vector.
The following detailed description of specific embodiments of the invention refers to the accompanying drawings in which:
FIG. 2 is a diagram of the risk user identification model based on one-dimensional convolution and LSTM. As shown in FIG. 2, the request sequence and the time interval sequence are spliced to form a new sequence, and the model structure mainly includes an Embedding layer, a one-dimensional convolution layer, an LSTM layer, an Attention layer, and a Dense layer. For convenience in the following description, the model is abbreviated as Conv1D_LSTM_At.
(1) Method for combining request sequence and time interval sequence
Because the request sequence and the time intervals are different types of data, they cannot be spliced directly. Each time interval is therefore treated as one or more "no operation" requests inserted into the request sequence, with the number of "no operation" requests determined by the size of the interval. Because time interval values are large and widely distributed, they need to be discretized, which reduces noise caused by sample deviation and effectively weakens the influence of extreme values and outliers. Unsupervised discretization is applied to the time interval variable: the sample data is divided into several bins, each identified by a discrete value. Unsupervised discretization algorithms include the Equal-Distance Discretization Method (ED), the Equal-Frequency Discretization Method (EF), the Approximate Equal-Frequency Discretization Method (AEFD), discretization based on local density, discretization based on clustering, and others. Here, a discretization method based on local density is used to divide user request time intervals into 6 levels, as shown in Table 1.
TABLE 1 request time interval ranking for users
[Table 1 is an image in the original publication (BDA0003284866270000081).]
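The idea of mapping raw time intervals to a small number of levels can be sketched as follows. This is only an illustration: it uses equal-frequency discretization (one of the unsupervised methods named above), not the local-density method the application actually uses, and the sample intervals, the function names, and the 1 to 6 level numbering are all assumptions.

```python
from bisect import bisect_right


def equal_frequency_cuts(samples, n_levels):
    """Return n_levels - 1 cut points that split the samples into bins of roughly equal size."""
    ordered = sorted(samples)
    return [ordered[len(ordered) * k // n_levels] for k in range(1, n_levels)]


def interval_level(interval, cuts):
    """Map one interval to a level in 1..len(cuts) + 1 (assumed numbering)."""
    return bisect_right(cuts, interval) + 1


# Hypothetical gaps, in seconds, between consecutive requests.
intervals = [0.2, 0.5, 1, 2, 3, 5, 8, 13, 30, 60, 120, 600]
cuts = equal_frequency_cuts(intervals, n_levels=6)
levels = [interval_level(x, cuts) for x in intervals]
print(cuts)
print(levels)
```

With cut points learned once from historical data, a new interval can be assigned a level by a single binary search.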
Each level represents a number of "no operation" requests, so differences in interval length are reflected by different levels. For example, a level of 3 inserts 3 "no operation" requests between the corresponding requests. If the request sequence is [login, add common contact, submit order] and the corresponding interval levels are [1, 3], the new sequence is [login, no operation, add common contact, no operation, no operation, no operation, submit order]. The final request sequence string is treated as text and used as the input to the model.
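A minimal sketch of the splicing rule, following the stated rule that a level of k inserts k "no operation" requests between the two neighboring requests. The function name `splice` and the English token strings are illustrative assumptions.

```python
NO_OP = "no operation"


def splice(requests, interval_levels):
    """Insert interval_levels[i] NO_OP tokens between requests[i] and requests[i + 1]."""
    if len(interval_levels) != max(len(requests) - 1, 0):
        raise ValueError("need exactly one interval level between each pair of requests")
    spliced = []
    for i, request in enumerate(requests):
        spliced.append(request)
        if i < len(interval_levels):
            spliced.extend([NO_OP] * interval_levels[i])
    return spliced


new_sequence = splice(["login", "add common contact", "submit order"], [1, 3])
print(new_sequence)
```

For interval levels [1, 3] this produces one "no operation" after login and three before the order submission, seven tokens in total.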
(2) Word vector representation
The request sequence is treated as a text string and each operation request as a word. The method mainly analyzes 17 ticket-purchasing requests of the user plus the "no operation" request, 18 in total, so each sentence is composed of words drawn from a vocabulary of 18 words. A dictionary for the request sequences is built with the Dictionary class in gensim.corpora, the words are converted to their indexes with the doc2idx function to produce an index list for each sample, and sequences shorter than the maximum length are padded with 0. Taking [login, no operation, add common contact, no operation, no operation, no operation, submit order] as an example, the generated dictionary is [login, no operation, add common contact, submit order], the indexes of these words are [1, 3, 2, 4], and the index list for the sequence is [1, 3, 2, 3, 3, 3, 4]. Assuming a maximum length of 10, the index list must be padded. Because the user's last access time is taken as the end point and the requests before it are counted, padding is added at the front of the sequence, giving [0, 0, 0, 1, 3, 2, 3, 3, 3, 4], which is used as the input to the Embedding layer.
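The index conversion and front padding can be sketched in plain Python. The sketch replaces the gensim dictionary with a hand-written mapping that reproduces the indexes in the example, so `vocab`, `to_indices`, and `pad_front` are illustrative names rather than part of the application. Truncating to the most recent requests when a sequence exceeds the maximum length is an assumption.

```python
def to_indices(tokens, vocab):
    """Look up the integer index of every token."""
    return [vocab[token] for token in tokens]


def pad_front(indices, max_len, pad_value=0):
    """Keep the most recent max_len entries and left-pad with pad_value."""
    trimmed = indices[-max_len:]
    return [pad_value] * (max_len - len(trimmed)) + trimmed


# Index 0 is reserved for padding; the other indexes follow the example in the text.
vocab = {"login": 1, "add common contact": 2, "no operation": 3, "submit order": 4}
tokens = ["login", "no operation", "add common contact",
          "no operation", "no operation", "no operation", "submit order"]

index_list = to_indices(tokens, vocab)
padded = pad_front(index_list, max_len=10)
print(index_list)
print(padded)
```

Padding at the front keeps the last request aligned with the final position of every sample.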
In the Embedding layer, the user's index sequence is converted into corresponding vectors. If the user's request sequence has length 10 and the word-vector dimension of the Embedding layer is set to 5, each sample yields a 10 × 5 two-dimensional matrix after the Embedding layer, which serves as the input to the feature extraction layer. An Embedding layer is typically used to compress high-dimensional sparse data; here the vocabulary is small, so its main function is to capture contextual relationships.
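An embedding layer is essentially a lookup table from index to vector. The sketch below shows only the shape bookkeeping for the 10 × 5 example; the table is randomly initialized rather than trained, and the vocabulary size of 19 (18 words plus the padding index), the zero vector for padding, and the function names are assumptions.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """Build a vocab_size x dim table of small random values; row 0 is the padding vector."""
    rng = random.Random(seed)
    table = [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]
    table[0] = [0.0] * dim  # assumption: the padding index maps to a zero vector
    return table


def embed(indices, table):
    """Replace every index by its row in the table."""
    return [table[i] for i in indices]


table = make_embedding_table(vocab_size=19, dim=5)
matrix = embed([0, 0, 0, 1, 3, 2, 3, 3, 3, 4], table)
print(len(matrix), len(matrix[0]))
```

Every occurrence of the same request maps to the same row, so repeated "no operation" tokens share one vector.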
(3) Feature extraction and classification
FIG. 3 is a flow chart of data changes according to an embodiment of the invention. As shown in FIG. 3, the feature extraction layer has three parts: a one-dimensional convolution layer, an LSTM layer, and an Attention layer. Because a one-dimensional CNN is good at extracting local pattern features, a one-dimensional convolution layer is applied first. The text string formed by a request sequence has the properties of text and is also time-series data with a definite temporal order. LSTM is good at extracting features from time-series data and has achieved good results in speech and text analysis, so an LSTM layer is added after the one-dimensional convolution for further feature extraction.
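To make "local pattern" concrete, the sketch below slides each kernel over windows of consecutive time steps of the embedded sequence. The kernel width of 3, the absence of padding, the ReLU activation, and the constant weights are assumptions chosen for illustration; the text specifies only that one-dimensional convolution is used.

```python
def conv1d_relu(sequence, kernels, biases):
    """Unpadded 1D convolution over time followed by ReLU.

    sequence: T x D list of lists; kernels: list of K x D weight blocks.
    Returns a (T - K + 1) x len(kernels) feature map.
    """
    width = len(kernels[0])
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for kernel_row, input_row in zip(kernel, window):
                total += sum(w * x for w, x in zip(kernel_row, input_row))
            row.append(max(0.0, total))
        outputs.append(row)
    return outputs


# A 10 x 5 input of ones and 3 kernels of width 3 with constant weights.
embedded = [[1.0] * 5 for _ in range(10)]
kernels = [[[value] * 5 for _ in range(3)] for value in (0.1, -0.1, 0.0)]
feature_map = conv1d_relu(embedded, kernels, biases=[0.0, 0.0, 0.0])
print(len(feature_map), len(feature_map[0]))
```

Under these assumptions the 10 × 5 input becomes an 8 × 3 feature map, one column per kernel, which would then be fed to the LSTM layer.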
When the features extracted from the request sequence are classified, the features extracted from different requests contribute differently to the final classification; in particular, "no operation" requests and normal service requests contribute differently. An Attention mechanism is therefore used to weight the features of important requests, according to their semantic contribution, into the sentence-level feature, so that the real semantics of the request sequence are expressed more deeply. Let the output vectors of the LSTM network be denoted $[h_1, h_2, \ldots, h_n]$. Each LSTM output feature $h_i$ is encoded using formula (4-7), a nonlinear transformation with the tanh activation function, to obtain $u_i$. The attention weight vector $W = (\beta_1, \beta_2, \ldots, \beta_n)$ over the components $u_i$ is then obtained with the softmax function according to formula (4-8). Finally, the LSTM output vectors $[h_1, h_2, \ldots, h_n]$ are weighted and summed according to formula (4-9) to obtain the sentence feature vector representation $S$.
$$u_i = \tanh(W_h h_i + b_h) \tag{4-7}$$
$$\beta_i = \mathrm{softmax}(W_\beta u_i) \tag{4-8}$$
$$S = \sum_{i=1}^{n} \beta_i h_i \tag{4-9}$$
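The attention pooling described by formulas (4-7) to (4-9) can be sketched in plain Python. The sketch assumes that $W_\beta$ is a single weight vector producing one scalar score per time step and that the softmax is taken across time steps; the function name and the toy values are illustrative.

```python
import math


def attention_pool(h, W_h, b_h, w_beta):
    """Return (beta, S) for LSTM outputs h, following (4-7), (4-8), (4-9)."""
    # (4-7): u_i = tanh(W_h h_i + b_h)
    u = [[math.tanh(sum(w * x for w, x in zip(row, h_i)) + b)
          for row, b in zip(W_h, b_h)] for h_i in h]
    # (4-8): scalar score per step, then softmax over the steps
    scores = [sum(w * x for w, x in zip(w_beta, u_i)) for u_i in u]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    beta = [e / total for e in exps]
    # (4-9): S = sum_i beta_i * h_i
    S = [sum(b_i * h_i[d] for b_i, h_i in zip(beta, h)) for d in range(len(h[0]))]
    return beta, S


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three time steps, hidden size 2
beta, S = attention_pool(h, W_h=[[1.0, 0.0], [0.0, 1.0]], b_h=[0.0, 0.0], w_beta=[1.0, 1.0])
print(beta, S)
```

The weights sum to 1, so S is a convex combination of the LSTM outputs with the same dimension as each $h_i$.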
Finally, a fully connected layer is added after the feature extraction layer, and a softmax layer outputs the probability of each category to produce the final user classification. The probability of category 1 in the result is taken as the user's final risk score. Taking the 10 × 5 word-vector output as an example, with 3 convolution kernels and hidden_size = 8, the flow of data changes is shown in FIG. 3.
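The classification head can be sketched as a fully connected layer followed by softmax, with the probability of category 1 returned as the risk score. The two-class setup, the feature length of 8 (matching hidden_size = 8), and the weights are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def risk_score(features, weights, biases):
    """Fully connected layer plus softmax; returns the probability of category 1."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)[1]


# Hypothetical sentence feature vector S of length 8 and hand-picked weights.
S = [1.0] * 8
weights = [[0.0] * 8, [0.5] * 8]  # row 0: category 0, row 1: category 1
score = risk_score(S, weights, biases=[0.0, 0.0])
print(round(score, 4))
```

With two categories the score reduces to a logistic function of the difference between the two logits.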
FIG. 4 is a schematic diagram of the risk user identification system of the invention. As shown in FIG. 4, the invention provides a risk user identification system 100 that adopts the above risk user identification method and includes:
a sequence splicing module 10: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence;
a vector conversion module 20: converting the new request behavior sequence into a computable vector by vector conversion;
an identification module 30: inputting the computable vector into the risk user identification model to complete risk identification of the user to be identified.
Wherein the risk user identification model comprises: a one-dimensional convolution layer, an LSTM layer, and an attention layer.
Wherein the vector conversion module includes:
a text conversion module: treating the new request behavior sequence as text, and performing index conversion and padding to generate a request sequence text;
an embedding conversion module: converting the request sequence text into a computable vector through an embedding layer.
Wherein the identification module includes:
a feature extraction module: performing feature extraction on the computable vector using the one-dimensional convolution layer, the LSTM layer, and the attention layer;
a feature classification module: passing the feature-extracted computable vector through a fully connected layer, outputting the classification probabilities of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
Wherein the feature extraction module includes:
a one-dimensional convolution module: using the one-dimensional convolution layer to extract local pattern features from the computable vector and output a one-dimensional convolution vector;
an LSTM module: using the LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer module: using the attention layer to weight the features of the LSTM vector by their semantic contribution and output an attention weight vector.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the method for identifying a risky user according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for identifying an at-risk user according to the first aspect.
In addition, the risk user identification method of the embodiment of the present application described in conjunction with fig. 1 may be implemented by a computer device. Fig. 5 is a hardware structure diagram of a computer device according to an embodiment of the present application.
The computer device may comprise a processor 81 and a memory 82 in which computer program instructions are stored.
Specifically, the processor 81 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 82 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 82 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 82 may include removable or non-removable (or fixed) media, where appropriate. The memory 82 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 82 is a non-volatile memory. In particular embodiments, memory 82 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode DRAM (FPM DRAM), an Extended Data Out DRAM (EDO DRAM), a Synchronous DRAM (SDRAM), and the like.
The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.
The processor 81 implements any of the above-described embodiments of the method of risk user identification by reading and executing computer program instructions stored in the memory 82.
In some of these embodiments, the computer device may also include a communication interface 83 and a bus 80. As shown in fig. 5, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
The communication interface 83 is used for implementing communication between modules, devices, units and/or equipment in the embodiments of the present application. The communication interface 83 may also carry out data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
Bus 80 includes hardware, software, or both, coupling the components of the computer device to each other. Bus 80 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 80 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The computer device may implement the risky user identification method described in connection with fig. 1 based on the usage relationship.
Compared with the prior art, the method identifies risk users based on the user request behavior sequence, provides useful data support for railways to precisely target ticket purchasing by risk users, and better protects passengers' ticket purchasing rights and interests. The invention provides a risk user identification method that combines deep learning with behavior sequences, and mainly addresses the problem of identifying risk users from a user request sequence by a deep learning method. The method combines the user's request behavior sequence and time interval sequence, converts them into a computable vector using a text analysis method, and constructs a risk user identification model using one-dimensional convolution and LSTM deep learning techniques.
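The combination of a request behavior sequence with its time interval sequence can be illustrated with a minimal sketch. The source does not state how the two sequences are spliced, so the interleaving scheme, the bucket thresholds (1 second and 10 seconds), the token names, and the function names below are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def interval_token(seconds: float) -> str:
    """Bucket a time gap into a coarse token (thresholds are illustrative assumptions)."""
    if seconds < 1.0:
        return "t_fast"
    if seconds < 10.0:
        return "t_medium"
    return "t_slow"


def splice(requests: List[Tuple[float, str]]) -> List[str]:
    """Interleave request names with a token for the gap since the previous request."""
    ordered = sorted(requests)  # tuples sort by timestamp first
    spliced: List[str] = []
    for i, (timestamp, name) in enumerate(ordered):
        if i > 0:
            spliced.append(interval_token(timestamp - ordered[i - 1][0]))
        spliced.append(name)
    return spliced


# Hypothetical (timestamp in seconds, request name) pairs.
new_sequence = splice([(0.0, "login"), (0.5, "query"), (20.0, "order")])
```

Representing intervals as tokens keeps the spliced result a single token sequence, which is what allows it to be treated as text in the subsequent vector conversion.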
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application, and although their description is specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (12)

1. A method for identifying an at-risk user, comprising:
a sequence splicing step: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence of the user to be identified to generate a new request behavior sequence;
a vector conversion step: converting the new request behavior sequence into a computable vector;
an identification step: inputting the computable vector into a risk user identification model to complete risk identification of the user to be identified.
2. The risky user identification method according to claim 1, wherein the risky user identification model comprises: a one-dimensional convolutional layer, an LSTM layer, and an attention layer.
3. The method of claim 2, wherein the vector conversion step comprises:
a text conversion step: treating the new request behavior sequence as text and performing index conversion and padding to generate a request sequence text;
an embedding conversion step: converting the request sequence text into the computable vector through an embedding layer.
4. The method for identifying a risky user according to claim 2, wherein the identifying step comprises:
a feature extraction step: completing feature extraction of the computable vector based on the one-dimensional convolutional layer, the LSTM layer and the attention layer;
a feature classification step: passing the computable vector after feature extraction through a fully connected layer, outputting the classification probability of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
5. The risky user identification method according to claim 4, wherein the feature extraction step comprises:
a one-dimensional convolution step: using the one-dimensional convolutional layer to extract local pattern features from the computable vector and output a one-dimensional convolution vector;
an LSTM step: using the LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer step: using the attention layer to extract features from the LSTM vector by weighting according to semantic contribution, and output an attention weight vector.
6. A risky user identification system using the risky user identification method according to any one of claims 1 to 5, comprising:
a sequence splicing module: generating a corresponding request time interval sequence based on the acquired request behavior sequence of the user to be identified, and splicing the request behavior sequence and the request time interval sequence to generate a new request behavior sequence;
a vector conversion module: converting the new request behavior sequence into a computable vector;
an identification module: inputting the computable vector into a risk user identification model to complete risk identification of the user to be identified.
7. The risky user identification system of claim 6, wherein the risky user identification model comprises: a one-dimensional convolutional layer, an LSTM layer, and an attention layer.
8. The at risk user identification system of claim 7, wherein the vector conversion module comprises:
a text conversion module: treating the new request behavior sequence as text and performing index conversion and padding to generate a request sequence text;
an embedding conversion module: converting the request sequence text into the computable vector through an embedding layer.
9. The at risk user identification system of claim 7, wherein the identification module comprises:
a feature extraction module: completing feature extraction of the computable vector based on the one-dimensional convolutional layer, the LSTM layer and the attention layer;
a feature classification module: passing the computable vector after feature extraction through a fully connected layer, outputting the classification probability of the user to be identified using a softmax layer, and generating the final risk score of the user to be identified.
10. The at risk user identification system of claim 9, wherein the feature extraction module comprises:
a one-dimensional convolution module: using the one-dimensional convolutional layer to extract local pattern features from the computable vector and output a one-dimensional convolution vector;
an LSTM module: using the LSTM layer to extract time-series features from the one-dimensional convolution vector and output an LSTM vector;
an attention layer module: using the attention layer to extract features from the LSTM vector by weighting according to semantic contribution, and output an attention weight vector.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of at risk user identification according to any of claims 1 to 5 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method for at risk user identification according to any one of claims 1 to 5.
CN202111143506.4A 2021-09-28 2021-09-28 Risk user identification method, system, equipment and storage medium Pending CN113919905A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111143506.4A CN113919905A (en) 2021-09-28 2021-09-28 Risk user identification method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111143506.4A CN113919905A (en) 2021-09-28 2021-09-28 Risk user identification method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113919905A true CN113919905A (en) 2022-01-11

Family

ID=79236635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111143506.4A Pending CN113919905A (en) 2021-09-28 2021-09-28 Risk user identification method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113919905A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023007921A1 (en) * 2021-07-30 2023-02-02 株式会社Nttドコモ Time-series data processing device

Similar Documents

Publication Publication Date Title
CN108363790A (en) For the method, apparatus, equipment and storage medium to being assessed
CN108733682B (en) Method and device for generating multi-document abstract
CN110162620B (en) Method and device for detecting black advertisements, server and storage medium
CN111539197B (en) Text matching method and device, computer system and readable storage medium
CN103577452A (en) Website server and method and device for enriching content of website
CN113392209B (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN111143547B (en) Big data display method based on knowledge graph
CN111984792A (en) Website classification method and device, computer equipment and storage medium
CN108108468A (en) A kind of short text sentiment analysis method and apparatus based on concept and text emotion
CN112800292A (en) Cross-modal retrieval method based on modal specificity and shared feature learning
Selamat et al. Word-length algorithm for language identification of under-resourced languages
CN114936266A (en) Multi-modal fusion rumor early detection method and system based on gating mechanism
CN113836938A (en) Text similarity calculation method and device, storage medium and electronic device
CN112232070A (en) Natural language processing model construction method, system, electronic device and storage medium
CN113535912B (en) Text association method and related equipment based on graph rolling network and attention mechanism
CN113204624B (en) Multi-feature fusion text emotion analysis model and device
CN113919905A (en) Risk user identification method, system, equipment and storage medium
CN110162769B (en) Text theme output method and device, storage medium and electronic device
CN113220964B (en) Viewpoint mining method based on short text in network message field
CN110909247B (en) Text information pushing method, electronic equipment and computer storage medium
WO2022262632A1 (en) Webpage search method and apparatus, and storage medium
Aouchiche et al. Authorship attribution in twitter: a comparative study of machine learning and deep learning approaches
CN114936282A (en) Financial risk cue determination method, apparatus, device and medium
CN113536773A (en) Commodity comment sentiment analysis method and system, electronic equipment and storage medium
Hung et al. Aafndl-an accurate fake information recognition model using deep learning for the vietnamese language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination