CN111523832B - Merchant risk inspection method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN111523832B
Authority
CN
China
Prior art keywords
sequence
risk data
modal
data
risk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010632364.7A
Other languages
Chinese (zh)
Other versions
CN111523832A (en)
Inventor
高睿哲
李超
汲小溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010632364.7A
Publication of CN111523832A
Application granted
Publication of CN111523832B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification disclose a merchant risk inspection method and device, electronic equipment, and a storage medium. The method comprises: obtaining multi-modal risk data of an online merchant; for each group of modal risk data in the multi-modal risk data, performing bidirectional matching fusion on the group according to its time sequence information to obtain a first bidirectional matching fusion sequence and a second bidirectional matching fusion sequence corresponding to the group; aggregating the first and second bidirectional matching fusion sequences obtained for the multi-modal risk data to obtain a risk feature vector; and predicting, according to the risk feature vector, whether the online merchant has illegal platform risk.

Description

Merchant risk inspection method and device, electronic equipment and storage medium
Technical Field
The embodiment of the specification relates to a data processing technology, in particular to a merchant risk inspection method, a merchant risk inspection device, electronic equipment and a storage medium.
Background
With the development of the internet economy, the number of merchants contracted to e-commerce platforms grows day by day, and more and more users trade with merchants through online transaction platforms to purchase products or services. Some merchants use an online transaction platform for illegal or non-compliant purposes. To improve the security of internet transactions, risk inspection of merchants is required. Meanwhile, as human-computer interaction modes keep evolving, the data describing a given scene or subject now takes many forms of expression.
Disclosure of Invention
The embodiment of the specification provides a method and a device for merchant risk inspection, electronic equipment and a storage medium, so as to improve the accuracy of predicting whether merchants have illegal platform risks.
In a first aspect, an embodiment of the present specification provides a merchant risk inspection method, including: obtaining multi-modal risk data of an online merchant; for each group of modal risk data in the multi-modal risk data, performing bidirectional matching fusion on the group according to the time sequence information of the group to obtain a first bidirectional matching fusion sequence and a second bidirectional matching fusion sequence corresponding to the group; aggregating the first bidirectional matching fusion sequences and the second bidirectional matching fusion sequences obtained for the multi-modal risk data to obtain a risk feature vector; and predicting, according to the risk feature vector, whether the online merchant has illegal platform risk.
In a second aspect, an embodiment of the present specification provides a merchant risk inspection device, including: a risk data acquisition unit, configured to obtain multi-modal risk data of an online merchant; a data matching and fusing unit, configured to perform, for each group of modal risk data in the multi-modal risk data, bidirectional matching fusion on the group according to the time sequence information of the group to obtain a first bidirectional matching fusion sequence and a second bidirectional matching fusion sequence corresponding to the group; a risk data aggregation unit, configured to aggregate the first bidirectional matching fusion sequences and the second bidirectional matching fusion sequences obtained for the multi-modal risk data to obtain a risk feature vector; and a risk prediction unit, configured to predict, according to the risk feature vector, whether the online merchant has illegal platform risk.
In a third aspect, an embodiment of the present specification provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method according to the first aspect when executing the program.
In a fourth aspect, embodiments of the present specification provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method according to the first aspect.
One or more technical solutions provided in the embodiments of the present description at least achieve the following technical effects or advantages:
According to the technical solution provided by the embodiments of this specification, when predicting whether an online merchant has illegal platform risk, the time sequence information of the multi-modal risk data is taken into account and is fused bidirectionally, so that the information across modalities is utilized and fused more fully and loss of risk feature information can be avoided. This can improve the accuracy of identifying whether an online merchant has illegal platform risk, so that such risk is identified to a greater extent. In turn, the inspection of illegal transactions, illegal investment and financing, and sales of prohibited goods by online merchants can be carried out more effectively, which improves the supervision of online merchants.
Drawings
In order to illustrate the embodiments of the present specification or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only embodiments of the present specification, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a merchant risk inspection method in an embodiment of the present specification;
Fig. 2 is a functional block diagram of a merchant risk inspection device in an embodiment of the present specification;
Fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the present specification.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step are within the scope of the present specification.
In a first aspect, an embodiment of the present specification provides a merchant risk inspection method which, as shown in Fig. 1, includes the following steps:
S100, obtaining multi-modal risk data of the online merchant.
Specifically, the risk data of each modality in the multi-modal risk data is serialized data. Serialized data is data that carries time sequence information, so features can be extracted from it time step by time step. For example, text, audio, behavior sequences, pictures, and video are all serialized data.
For the business scenario of merchant risk inspection, obtaining multi-modal risk data of an online merchant specifically means obtaining more than one of the following serialized risk data of the online merchant: merchant web page content, merchant transaction data, and complaint information about the online merchant. The data modality of the merchant web page content includes one or more of an audio modality, a text modality, and a video modality; the data modality of the merchant transaction data includes one or more of an audio modality, a text modality, and a video modality; and the data modality of the complaint information likewise includes one or more of an audio modality, a text modality, and a video modality.
S102, aiming at each group of modal risk data in the multi-modal risk data, performing bidirectional matching fusion on the group of modal risk data according to the time sequence information of the group of modal risk data to obtain a first bidirectional matching fusion sequence and a second bidirectional matching fusion sequence corresponding to the group of modal risk data.
For the multi-modal risk data obtained for the online merchant, the merchant web page content, merchant transaction data, and complaint information in their various data modalities are combined in pairs to obtain at least one group of modal risk data.
In this embodiment of the present specification, each group of modal risk data consists of two different serialized risk data. These may be two serialized risk data of the same data modality, for example text-modality merchant transaction data and text-modality merchant web page content, or two serialized risk data of different data modalities, for example audio-modality complaint information and text-modality complaint information.
Each group of modal risk data is processed by a multi-modal fusion model to obtain a risk feature vector for identifying whether the online merchant has illegal platform risk. If the multi-modal risk data contains only two different serialized risk data, a single group of modal risk data is formed, for example: audio-modality complaint information and text-modality merchant transaction data. As another example, if the multi-modal risk data contains audio-modality complaint information, text-modality complaint information, text-modality merchant transaction data, and video-modality merchant web page content, pairwise combination yields the following six groups of modal risk data:
(1) audio-modality complaint information - text-modality complaint information;
(2) audio-modality complaint information - text-modality merchant transaction data;
(3) audio-modality complaint information - video-modality merchant web page content;
(4) text-modality complaint information - text-modality merchant transaction data;
(5) text-modality complaint information - video-modality merchant web page content;
(6) text-modality merchant transaction data - video-modality merchant web page content.
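The pairwise grouping described above can be sketched in a few lines. This is only an illustration: the string labels and the helper name `build_modal_groups` are hypothetical and do not come from the specification.

```python
from itertools import combinations

# Hypothetical labels for the four serialized risk data items in the example.
risk_items = [
    "complaint information (audio)",
    "complaint information (text)",
    "merchant transaction data (text)",
    "merchant web page content (video)",
]


def build_modal_groups(items):
    """Return every unordered pair of distinct serialized risk data items."""
    return list(combinations(items, 2))


groups = build_modal_groups(risk_items)
for index, (first, second) in enumerate(groups, start=1):
    print(f"group {index}: {first} - {second}")
```

With four items this produces the six groups listed above; with only two items it produces a single group.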
In the embodiment of the present specification, the multi-modal fusion model comprises five layers connected in sequence: a word representation layer (Word Representation Layer), a context representation layer (Context Representation Layer), a matching fusion layer (Matching Layer), an aggregation layer (Aggregation Layer), and a prediction layer (Prediction Layer). Step S102 is completed by the word representation layer, the context representation layer, and the matching fusion layer. Specifically, for each group of modal risk data formed from the multi-modal risk data, step S102 obtains the first bidirectional matching fusion sequence and the second bidirectional matching fusion sequence corresponding to the group by processing the group through the word representation layer, the context representation layer, and the matching fusion layer of the multi-modal fusion model in sequence.
Specifically, each group of modal risk data consists of first modal risk data and second modal risk data. It should be noted that these two terms only distinguish the two different serialized risk data within the same group; across different groups, the first modal risk data and the second modal risk data may both differ, or only one of them may differ.
Specifically, for the first modal risk data and the second modal risk data in each group of modal risk data, performing bidirectional matching fusion on the group of modal risk data according to the time sequence information of the group of modal risk data through the following steps 1021 to 1023 to obtain a first bidirectional matching fusion sequence and a second bidirectional matching fusion sequence corresponding to the group of modal risk data:
Step 1021, performing vector characterization on each dimension unit of risk data in the first modal risk data to obtain a first characterization sequence corresponding to the first modal risk data; and performing vector characterization on each dimension unit of risk data in the second modal risk data to obtain a second characterization sequence corresponding to the second modal risk data.
Specifically, step 1021 is completed by the word representation layer, which uses a word representation model matching the data modality to characterize the first modal risk data and the second modal risk data unit by unit, thereby completing the preliminary processing of both.
Taking as an example first modal risk data represented as a first modal risk data sequence P of sequence length M, vector characterization of each dimension unit of risk data in P yields the corresponding first characterization sequence P: $[p_1, p_2, \ldots, p_M]$. Likewise, taking second modal risk data represented as a second modal risk data sequence Q of sequence length N, vector characterization of each dimension unit of risk data in Q yields the corresponding second characterization sequence Q: $[q_1, q_2, \ldots, q_N]$.
For example, if the first modal risk data sequence P is a text sequence P, vector representation may be performed on the risk data of each dimension unit in the text sequence P by using a pre-trained word representation model (e.g., pre-trained word2vec or GloVe), so as to obtain a corresponding text representation sequence. Of course, for other modal risk data, corresponding word representation models need to be used for vector representation, and the used word representation models are not listed one by one here.
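As a rough illustration of step 1021 for a text sequence, the sketch below looks up each token in a small embedding table. The table, its tokens, and its 3-dimensional vectors are invented for the example; a real system would load a pre-trained model such as word2vec or GloVe instead.

```python
# Toy embedding table standing in for a pre-trained word representation model.
# The tokens and 3-dimensional vectors are made up for illustration.
EMBEDDINGS = {
    "refund": [0.2, -0.1, 0.7],
    "delayed": [0.5, 0.3, -0.2],
    "gambling": [-0.6, 0.9, 0.1],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback for tokens missing from the table


def characterize(tokens):
    """Map each unit of a risk data sequence to its vector, keeping order."""
    return [list(EMBEDDINGS.get(token, UNKNOWN)) for token in tokens]


# First characterization sequence P = [p_1, ..., p_M] for a 3-token text.
P = characterize(["refund", "delayed", "unknown-token"])
print(P)
```

The output has one vector per input unit, so a sequence of length M yields a characterization sequence of length M.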
Step 1022, inputting the first characterization sequence and the second characterization sequence into the time sequence information extraction model respectively, to obtain a first bidirectional context sequence corresponding to the first characterization sequence and a second bidirectional context sequence corresponding to the second characterization sequence.
Specifically, the first bidirectional context sequence consists of a first forward context sequence and a first reverse context sequence; the second bidirectional context sequence consists of a second forward context sequence and a second reverse context sequence.
Specifically, step 1022 is performed by the context representation layer, which extracts the time sequence information of the first characterization sequence and the second characterization sequence through a time sequence information extraction model. In a specific implementation, the time sequence information extraction model is a Bi-directional Long Short-Term Memory (BiLSTM) model. Taking the first characterization sequence P: $[p_1, p_2, \ldots, p_M]$ as an example, the time sequence information is extracted as follows:
\[ \overrightarrow{h}_i^{p} = \overrightarrow{\mathrm{LSTM}}\left(\overrightarrow{h}_{i-1}^{p},\, p_i\right), \quad i = 1, \ldots, M \]
\[ \overleftarrow{h}_i^{p} = \overleftarrow{\mathrm{LSTM}}\left(\overleftarrow{h}_{i+1}^{p},\, p_i\right), \quad i = M, \ldots, 1 \]
where M is the sequence length of the first modal risk data sequence P and $p_i$ is the vector characterization of the i-th unit of risk data in P. The model outputs the first bidirectional context sequence corresponding to P: the forward context sequence $[\overrightarrow{h}_1^{p}, \ldots, \overrightarrow{h}_M^{p}]$ and the reverse context sequence $[\overleftarrow{h}_1^{p}, \ldots, \overleftarrow{h}_M^{p}]$.
The first bidirectional context sequence contains a context vector for each time step. In the same way, the time sequence information extraction model outputs the second bidirectional context sequence corresponding to the second modal risk data sequence Q: the forward context sequence $[\overrightarrow{h}_1^{q}, \ldots, \overrightarrow{h}_N^{q}]$ and the reverse context sequence $[\overleftarrow{h}_1^{q}, \ldots, \overleftarrow{h}_N^{q}]$.
Through the above steps, the context information of the first modal risk data sequence P is fused into each time step of P, and the context information of the second modal risk data sequence Q is fused into each time step of Q.
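The bidirectional scan of step 1022 can be illustrated as follows. To keep the sketch short and dependency-free, a toy `tanh` recurrence (`simple_cell`, a hypothetical name with made-up weights) is swapped in for the LSTM cell; it is not an LSTM, and only the forward and reverse traversal pattern is meant to match the description.

```python
import math


def simple_cell(prev_state, x, w_state=0.5, w_input=1.0):
    """Toy recurrent update standing in for an LSTM cell (not a real LSTM)."""
    return [math.tanh(w_state * h + w_input * v) for h, v in zip(prev_state, x)]


def bidirectional_context(sequence):
    """Return (forward, reverse) context sequences, one vector per time step."""
    dim = len(sequence[0])

    forward, state = [], [0.0] * dim
    for x in sequence:              # time steps 1..M
        state = simple_cell(state, x)
        forward.append(state)

    reverse, state = [], [0.0] * dim
    for x in reversed(sequence):    # time steps M..1
        state = simple_cell(state, x)
        reverse.append(state)
    reverse.reverse()               # realign so reverse[i] belongs to step i

    return forward, reverse


P = [[0.2, -0.1], [0.5, 0.3], [-0.6, 0.9]]
forward_ctx, reverse_ctx = bidirectional_context(P)
print(forward_ctx)
print(reverse_ctx)
```

Each direction yields one context vector per time step, so both outputs have the same length as the input sequence.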
Step 1023, performing information interaction between the first bidirectional context sequence and the second bidirectional context sequence to generate a first bidirectional matching fusion sequence corresponding to the first bidirectional context sequence and a second bidirectional matching fusion sequence corresponding to the second bidirectional context sequence.
It should be noted that, in step 1023, the matching fusion layer implements the information interaction between the first bidirectional context sequence and the second bidirectional context sequence using an Attentive-Matching strategy, that is, a matching strategy based on an attention mechanism.
Specifically, the information interaction between the first bidirectional context sequence and the second bidirectional context sequence is carried out in both directions based on the Attentive-Matching strategy, that is, matching fusion is performed in the following two directions:
Matching and fusing the first bidirectional context sequence to the second bidirectional context sequence: for the context vector of the i-th time step in the first bidirectional context sequence, fusing the context vector of the i-th time step into the second bidirectional context sequence to obtain the matching fusion vector representation corresponding to the i-th time step in the first bidirectional context sequence, where i runs from 1 to M and M is the sequence length of the first bidirectional context sequence; and forming the first bidirectional matching fusion sequence from the M matching fusion vector representations obtained for time steps 1 to M in the first bidirectional context sequence.
More specifically, fusing the context vector of the i-th time step in the first bidirectional context sequence into the second bidirectional context sequence means matching and fusing the context vector of the i-th time step in the first bidirectional context sequence with the context vectors of all time steps in the second bidirectional context sequence, which yields the matching fusion vector representation corresponding to the i-th time step in the first bidirectional context sequence.
Matching and fusing the second bidirectional context sequence to the first bidirectional context sequence: for the context vector of the j-th time step in the second bidirectional context sequence, fusing the context vector of the j-th time step into the first bidirectional context sequence to obtain the matching fusion vector representation corresponding to the j-th time step in the second bidirectional context sequence, where j runs from 1 to N and N is the sequence length of the second bidirectional context sequence; and forming the second bidirectional matching fusion sequence from the N matching fusion vector representations obtained for time steps 1 to N in the second bidirectional context sequence.
More specifically, fusing the context vector of the j-th time step in the second bidirectional context sequence into the first bidirectional context sequence means matching and fusing the context vector of the j-th time step in the second bidirectional context sequence with the context vectors of all time steps in the first bidirectional context sequence, which yields the matching fusion vector representation of the j-th time step in the second bidirectional context sequence.
Next, the process of obtaining the matching fusion vector representation of the i-th time step in the first bidirectional matching fusion sequence includes the following steps A1 to A3:
Step A1, calculating the cosine similarity between the context vector of the i-th time step in the first bidirectional context sequence and the context vectors of all time steps in the second bidirectional context sequence as the attention coefficient of the i-th time step. Calculating the attention coefficient of the i-th time step includes calculating a forward attention coefficient and a reverse attention coefficient of the i-th time step:
calculating the cosine similarity between the context vector of the i-th time step in the first forward context sequence and the context vectors of all time steps in the second forward context sequence as the forward attention coefficient of the i-th time step; and calculating the cosine similarity between the context vector of the i-th time step in the first reverse context sequence and the context vectors of all time steps in the second reverse context sequence as the reverse attention coefficient of the i-th time step. The specific formulas are as follows:
\[ \overrightarrow{\alpha}_{i,j} = \operatorname{cosine}\left(\overrightarrow{h}_i^{p},\, \overrightarrow{h}_j^{q}\right), \quad j = 1, \ldots, N \]
\[ \overleftarrow{\alpha}_{i,j} = \operatorname{cosine}\left(\overleftarrow{h}_i^{p},\, \overleftarrow{h}_j^{q}\right), \quad j = 1, \ldots, N \]
where $\overrightarrow{h}_i^{p}$ and $\overleftarrow{h}_i^{p}$ are the context vectors of the i-th time step in the first forward and first reverse context sequences, $\overrightarrow{h}_j^{q}$ and $\overleftarrow{h}_j^{q}$ are the context vectors of the j-th time step in the second forward and second reverse context sequences, $\overrightarrow{\alpha}_{i,j}$ is the forward attention coefficient of the i-th time step, and $\overleftarrow{\alpha}_{i,j}$ is the reverse attention coefficient of the i-th time step.
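Step A1 for one direction can be sketched as below. The function names and the 2-dimensional toy context vectors are assumptions for illustration; the same routine would be applied once to the forward context sequences and once to the reverse ones.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attention_coefficients(context_i, other_sequence):
    """Attention coefficients of one time step against every step of the other sequence."""
    return [cosine(context_i, h_j) for h_j in other_sequence]


# Toy forward context sequences (M = 2 and N = 3), invented for the example.
first_forward = [[1.0, 0.0], [0.0, 1.0]]
second_forward = [[1.0, 0.0], [1.0, 1.0], [0.0, -2.0]]

alpha_1 = attention_coefficients(first_forward[0], second_forward)
print(alpha_1)
```

The result holds one coefficient per time step of the second sequence, so its length is N.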
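Next, step A2 is performed: weighting the second bidirectional context sequence by the calculated forward attention coefficient and reverse attention coefficient of the i-th time step to obtain the attention vectors (attentive vectors) of the i-th time step, namely a forward attention vector and a reverse attention vector, expressed as follows:
\[ \overrightarrow{h}_i^{\mathrm{mean}} = \frac{\sum_{j=1}^{N} \overrightarrow{\alpha}_{i,j}\, \overrightarrow{h}_j^{q}}{\sum_{j=1}^{N} \overrightarrow{\alpha}_{i,j}} \]
\[ \overleftarrow{h}_i^{\mathrm{mean}} = \frac{\sum_{j=1}^{N} \overleftarrow{\alpha}_{i,j}\, \overleftarrow{h}_j^{q}}{\sum_{j=1}^{N} \overleftarrow{\alpha}_{i,j}} \]
where $\overrightarrow{\alpha}_{i,j}$ and $\overleftarrow{\alpha}_{i,j}$ are the forward and reverse attention coefficients of the i-th time step with respect to the j-th time step of the second bidirectional context sequence, and $\overrightarrow{h}_j^{q}$ and $\overleftarrow{h}_j^{q}$ are the context vectors of the j-th time step in the second forward and second reverse context sequences.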
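Step A2 can be sketched as a coefficient-weighted combination of the other sequence's context vectors. Normalizing by the sum of the coefficients (a weighted mean) is an assumption of this sketch, as are the function name and the toy values; because cosine coefficients can be negative, the zero-sum case is guarded explicitly.

```python
def attentive_vector(coefficients, other_sequence):
    """Weighted mean of the other sequence's context vectors (assumed form)."""
    total = sum(coefficients)
    dim = len(other_sequence[0])
    if total == 0:
        return [0.0] * dim  # guard: coefficients cancel out or are all zero
    return [
        sum(a * h[k] for a, h in zip(coefficients, other_sequence)) / total
        for k in range(dim)
    ]


# Toy second forward context sequence (N = 3) and coefficients for step i.
second_forward = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
alpha_i = [0.5, 0.5, 0.0]

h_mean = attentive_vector(alpha_i, second_forward)
print(h_mean)
```

Time steps with a coefficient of zero contribute nothing, so the third context vector has no effect on the result here.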
Finally, step A3 is performed: calculating the cosine similarity between the context vector of the i-th time step in the first bidirectional context sequence and the attention vector of the i-th time step, and using it as the matching fusion vector representation of the i-th time step in the first bidirectional matching fusion sequence, namely a forward matching fusion vector representation and a reverse matching fusion vector representation. The specific formulas are as follows:
\[ \overrightarrow{m}_i^{\mathrm{att}} = f_m\left(\overrightarrow{h}_i^{p},\, \overrightarrow{h}_i^{\mathrm{mean}};\, W\right) \]
\[ \overleftarrow{m}_i^{\mathrm{att}} = f_m\left(\overleftarrow{h}_i^{p},\, \overleftarrow{h}_i^{\mathrm{mean}};\, W\right) \]
\[ m_k = \operatorname{cosine}\left(W_k \circ v_1,\, W_k \circ v_2\right) \]
where W is a parameter matrix, $\circ$ denotes the element-wise product (element-wise multiplication), and $f_m$ is the cosine similarity function with the parameter matrix; $v_1$ and $v_2$ are the two input vectors of $f_m$, $W_k$ is the k-th row of W, and $m_k$ is the k-th component of the output of $f_m$.
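A cosine similarity function with a parameter matrix can be sketched as follows. The sketch assumes that each row of W reweights both input vectors element-wise before the cosine is taken, giving one output component per row; that reading, the function name `f_m`, and the toy values of W are assumptions rather than details confirmed by the text.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def f_m(v1, v2, W):
    """One cosine per row of W, after element-wise reweighting of both inputs."""
    return [
        cosine([w * a for w, a in zip(row, v1)], [w * b for w, b in zip(row, v2)])
        for row in W
    ]


# Toy parameter matrix with two rows (made-up values, not trained weights).
W = [[1.0, 1.0], [1.0, 0.0]]
context_i = [1.0, 0.0]      # context vector of time step i
attention_i = [1.0, 1.0]    # attention vector of time step i

m_i = f_m(context_i, attention_i, W)
print(m_i)
```

The second row of W masks out the second dimension, so the two vectors look identical from that perspective and the corresponding component is 1.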
It should be noted that the resulting first bidirectional matching fusion sequence includes a first forward matching fusion sequence and a first reverse matching fusion sequence, and the resulting second bidirectional matching fusion sequence includes a second forward matching fusion sequence and a second reverse matching fusion sequence. In a specific implementation, matching and fusing the second bidirectional context sequence to the first bidirectional context sequence proceeds in the same way as matching and fusing the first bidirectional context sequence to the second bidirectional context sequence, and is not described again here for brevity.
S104, aggregating the first bidirectional matching fusion sequences and the second bidirectional matching fusion sequences obtained for the multi-modal risk data to obtain a risk feature vector.
It should be noted that step S104 is completed by the aggregation layer, which may aggregate the first bidirectional matching fusion sequences and the second bidirectional matching fusion sequences obtained for the multi-modal risk data based on a BiLSTM model. The specific implementation is as follows:
extracting the matching fusion vector representation of the last time step in the first bidirectional matching fusion sequence obtained for the k-th group of modal risk data and the matching fusion vector representation of the last time step in the second bidirectional matching fusion sequence obtained for the k-th group of modal risk data, where k runs from 1 to G and G is the number of groups of modal risk data; and concatenating the last-time-step matching fusion vector representations obtained for groups 1 to G of modal risk data to obtain the risk feature vector.
For each first bidirectional matching fusion sequence, the matching fusion vector representation of the last time step in the first forward matching fusion sequence and that of the last time step in the first reverse matching fusion sequence are extracted; for each second bidirectional matching fusion sequence, the matching fusion vector representation of the last time step in the second forward matching fusion sequence and that of the last time step in the second reverse matching fusion sequence are extracted. Therefore, 4 last-time-step matching fusion vector representations are obtained for each group of modal risk data. Taking the six groups of modal risk data obtained above as an example, a total of 6 × 4 = 24 Bi-LSTM outputs are used, and the last-time-step feature vectors obtained from each of the 24 are concatenated to obtain a risk feature vector of 24 dimensions.
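The counting and concatenation in this step can be sketched as below. The dictionary keys, the helper names, and the toy data are hypothetical; each sequence is assumed to be stored in its own processing order, so its last element is its last time step, and each last-step representation is reduced to a single component so the result has 6 × 4 = 24 entries.

```python
KEYS = ("first_forward", "first_reverse", "second_forward", "second_reverse")


def aggregate(groups):
    """Concatenate the last-time-step vector of every matching fusion sequence."""
    feature = []
    for group in groups:
        for key in KEYS:
            feature.extend(group[key][-1])  # last time step of this sequence
    return feature


def make_group(g):
    """Toy group: four sequences, two time steps each, one component per step."""
    return {key: [[0.0], [g + offset / 10]] for offset, key in enumerate(KEYS)}


groups = [make_group(g) for g in range(6)]   # six groups of modal risk data
risk_feature_vector = aggregate(groups)
print(len(risk_feature_vector))
```

If each last-step representation had more than one component, the concatenated vector would be correspondingly longer than 24 entries.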
S106, predicting whether the online merchant has illegal platform risk according to the risk feature vector.
The risk feature vector output by the aggregation layer is input into the final prediction layer, which predicts whether the online merchant has illegal platform risk according to the risk feature vector.
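The text does not specify the internal form of the prediction layer. As one possible illustration only, the sketch below uses a single logistic unit over the risk feature vector; the weights, bias, threshold, and the 3-component feature vector are all made-up values, not trained parameters.

```python
import math


def predict_risk(feature, weights, bias=0.0, threshold=0.5):
    """Return (probability, flag) from a logistic unit over the feature vector."""
    score = sum(w * x for w, x in zip(weights, feature)) + bias
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability, probability >= threshold


# Shortened toy feature vector and weights, invented for the example.
probability, is_risky = predict_risk(
    [0.9, 0.1, 0.8], [2.0, -1.0, 1.5], bias=-1.0
)
print(round(probability, 4), is_risky)
```

In practice the weights would be learned jointly with the other layers, and the threshold would be chosen to balance missed risks against false alarms.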
Because the time sequence information of the multi-modal risk data is taken into account when predicting whether the online merchant has an illegal platform risk, and the time sequence information in the sequence data of the multi-modal risk data is fused bidirectionally, the information across all modalities can be used and fused more fully, which avoids the loss of risk information. The accuracy of identifying whether an online merchant has an illegal platform risk can therefore be improved, so that illegal platform risks of online merchants are identified to a greater extent, and the inspection of illegal transactions, illegal investment and financing, and sales of prohibited goods by online merchants can be completed better, thereby optimizing the supervision of online merchants.
Further, in order to improve supervision efficiency and automate supervision, after the illegal platform risk prediction is performed for the online merchant according to the risk feature vector, if the online merchant has an illegal platform risk, the online merchant is processed and/or early warning information is sent to a target platform. Specifically, processing the online merchant includes: sending a violation notice to the online merchant and applying penalties to the online merchant (such as freezing the account or restricting transaction behavior).
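The prediction and follow-up handling can be sketched as follows. The description does not fix the form of the prediction layer, so a single logistic unit is assumed here purely for illustration; the function names, the 0.5 threshold and the action labels are likewise hypothetical.

```python
import numpy as np

def predict_risk(risk_feature_vector, weights, bias):
    """Assumed prediction layer: a single logistic unit over the feature vector."""
    score = float(risk_feature_vector @ weights + bias)
    return 1.0 / (1.0 + np.exp(-score))

def handle_merchant(merchant_id, probability, threshold=0.5):
    """Map a predicted probability to follow-up actions (action names are hypothetical)."""
    actions = []
    if probability >= threshold:
        # Processing the merchant and/or warning the target platform.
        actions = ["send_violation_notice", "apply_penalty", "send_early_warning"]
    return {"merchant": merchant_id, "probability": probability, "actions": actions}

risk_feature_vector = np.ones(24)   # placeholder 24-dimensional feature vector
weights = np.full(24, 0.1)          # placeholder parameters, not trained values
probability = predict_risk(risk_feature_vector, weights, bias=-1.0)
result = handle_merchant("merchant-001", probability)
print(result["actions"])
```

In practice the parameters would come from training the whole model, and the threshold would be chosen according to how costly missed risks are relative to false alarms.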
In a second aspect, based on the same inventive concept as the foregoing merchant risk inspection method, an embodiment of the present specification further provides a merchant risk inspection device, which is shown in fig. 2 and includes:
a risk data acquiring unit 201, configured to acquire multi-modal risk data of an online merchant;
the data matching and fusing unit 202 is configured to perform bidirectional matching and fusing on each group of modal risk data in the multi-modal risk data according to the time sequence information of the group of modal risk data to obtain a first bidirectional matching and fusing sequence and a second bidirectional matching and fusing sequence corresponding to the group of modal risk data;
a risk data aggregation unit 203, configured to aggregate a first bi-directional matching fusion sequence and a second bi-directional matching fusion sequence, which are obtained by corresponding to the multi-modal risk data, to obtain a risk feature vector;
and a risk prediction unit 204, configured to predict, according to the risk feature vector, whether the online merchant has an illegal platform risk.
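The cooperation of units 201 to 204 can be pictured as a simple pipeline. The sketch below is only a structural illustration: the class name `MerchantRiskInspector`, its field names and the stand-in callables are assumptions, and the real units would wrap the model layers described in the method embodiment.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

@dataclass
class MerchantRiskInspector:
    acquire: Callable[[str], List[Any]]                 # role of unit 201
    match_and_fuse: Callable[[Any], Tuple[Any, Any]]    # role of unit 202
    aggregate: Callable[[List[Tuple[Any, Any]]], Any]   # role of unit 203
    predict: Callable[[Any], bool]                      # role of unit 204

    def inspect(self, merchant_id: str) -> bool:
        groups = self.acquire(merchant_id)
        # One (first, second) fused pair per group of modal risk data.
        fused = [self.match_and_fuse(group) for group in groups]
        feature = self.aggregate(fused)
        return self.predict(feature)

# Stand-in callables so the pipeline can be exercised end to end.
inspector = MerchantRiskInspector(
    acquire=lambda merchant_id: [[1, 2], [3]],
    match_and_fuse=lambda group: (sum(group), len(group)),
    aggregate=lambda fused: [value for pair in fused for value in pair],
    predict=lambda feature: sum(feature) > 5,
)
flagged = inspector.inspect("merchant-001")
print(flagged)
```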
In a specific embodiment, the risk data acquiring unit 201 is specifically configured to acquire more than one of the following kinds of serialized risk data of the online merchant: merchant webpage content of the online merchant, merchant transaction data, and complaint information about the online merchant; the data modality of each kind of serialized risk data among the merchant webpage content, the merchant transaction data and the complaint information about the online merchant includes one or more of an audio modality, a text modality and a video modality.
In a specific embodiment, each set of modal risk data includes first modal risk data and second modal risk data; the data matching and fusing unit 202 includes:
the risk data characterization subunit 2021 is configured to perform vector characterization on the risk data of each dimension unit in the first modal risk data to obtain a first characterization sequence corresponding to the first modal risk data, and to perform vector characterization on the risk data of each dimension unit in the second modal risk data to obtain a second characterization sequence corresponding to the second modal risk data;
the time sequence information extraction subunit 2022 is configured to input the first characterization sequence and the second characterization sequence into corresponding time sequence information extraction models, respectively, to obtain a first bidirectional context sequence corresponding to the first characterization sequence and a second bidirectional context sequence corresponding to the second characterization sequence;
the information interaction subunit 2023 is configured to perform information interaction on the first bidirectional context sequence and the second bidirectional context sequence, and generate a first bidirectional matching fusion sequence corresponding to the first bidirectional context sequence and a second bidirectional matching fusion sequence corresponding to the second bidirectional context sequence.
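The characterization and time sequence information extraction performed by subunits 2021 and 2022 can be sketched as follows. For brevity, a plain tanh recurrent pass is used in place of an LSTM cell, and an embedding lookup stands in for vector characterization; these substitutions, along with all names, sizes and random parameters, are assumptions made for illustration only.

```python
import numpy as np

def rnn_pass(inputs, w_in, w_rec):
    """Run a plain tanh recurrent pass; returns one hidden vector per time step."""
    hidden = np.zeros(w_rec.shape[0])
    outputs = []
    for x in inputs:
        hidden = np.tanh(w_in @ x + w_rec @ hidden)
        outputs.append(hidden)
    return np.stack(outputs)

def bidirectional_context(char_seq, fwd_params, bwd_params):
    """Return forward and backward context sequences, both aligned to input order."""
    forward = rnn_pass(char_seq, *fwd_params)
    # The backward pass reads the sequence in reverse, then is flipped back.
    backward = rnn_pass(char_seq[::-1], *bwd_params)[::-1]
    return forward, backward

rng = np.random.default_rng(0)
vocab_size, d_in, d_hidden = 10, 8, 4
embedding = rng.normal(size=(vocab_size, d_in))   # stand-in characterization table
unit_ids = np.array([1, 4, 4, 7, 2])              # one id per dimension unit
char_seq = embedding[unit_ids]                    # characterization sequence, (5, 8)

fwd_params = (rng.normal(size=(d_hidden, d_in)), rng.normal(size=(d_hidden, d_hidden)))
bwd_params = (rng.normal(size=(d_hidden, d_in)), rng.normal(size=(d_hidden, d_hidden)))
forward, backward = bidirectional_context(char_seq, fwd_params, bwd_params)
print(forward.shape, backward.shape)  # (5, 4) (5, 4)
```

Each modality would have its own parameters, which is one way to read "corresponding time sequence information extraction models" above.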
In a specific embodiment, the information interaction subunit 2023 includes:
the first information fusion subunit is configured to, for the context vector of each i-th time step in the first bidirectional context sequence, fuse the context vector of the i-th time step into the second bidirectional context sequence to obtain a matching fusion vector representation corresponding to the i-th time step, wherein i takes 1 to M in sequence, and M is the sequence length of the first bidirectional context sequence; and to form the first bidirectional matching fusion sequence from the M matching fusion vector representations obtained for the 1st to M-th time steps in the first bidirectional context sequence;
the second information fusion subunit is configured to, for the context vector of each j-th time step in the second bidirectional context sequence, fuse the context vector of the j-th time step into the first bidirectional context sequence to obtain a matching fusion vector representation corresponding to the j-th time step, wherein j takes 1 to N in sequence, and N is the sequence length of the second bidirectional context sequence; and to form the second bidirectional matching fusion sequence from the N matching fusion vector representations obtained for the 1st to N-th time steps in the second bidirectional context sequence.
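The two-way fusion performed by the two subunits can be sketched as follows. This section does not specify the fusion operator, so soft attention over cosine similarities is assumed here as one plausible choice; the function names and the concatenation used to form each matching fusion vector are illustrative assumptions.

```python
import numpy as np

def cosine_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T

def match_fuse(src, other):
    """For each time step of src, fuse its context vector against all of `other`."""
    sim = cosine_matrix(src, other)                 # (len_src, len_other)
    weights = np.exp(sim)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over `other` steps
    attended = weights @ other                      # (len_src, d)
    # Assumed fusion: own vector, attended vector, and their elementwise product.
    return np.concatenate([src, attended, src * attended], axis=1)

rng = np.random.default_rng(0)
M, N, d = 5, 7, 4
first_ctx = rng.normal(size=(M, d))    # first bidirectional context sequence
second_ctx = rng.normal(size=(N, d))   # second bidirectional context sequence

first_fused = match_fuse(first_ctx, second_ctx)    # M matching fusion vectors
second_fused = match_fuse(second_ctx, first_ctx)   # N matching fusion vectors
print(first_fused.shape, second_fused.shape)  # (5, 12) (7, 12)
```

The two calls mirror the two subunits: the sequence lengths M and N are preserved, and only the direction of matching is swapped.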
In a specific embodiment, the risk data aggregation unit 203 is specifically configured to: for the k-th group of modal risk data in the multi-modal risk data, extract the matching fusion vector representation of the last time step in the first bidirectional matching fusion sequence obtained for the k-th group of modal risk data and the matching fusion vector representation of the last time step in the second bidirectional matching fusion sequence obtained for the k-th group of modal risk data, wherein k takes 1 to G in sequence, and G is the number of groups of modal risk data in the multi-modal risk data; and connect the last-time-step matching fusion vector representations obtained for the 1st to G-th groups of modal risk data to obtain a risk feature vector.
In a specific embodiment, the apparatus further comprises:
and the processing unit 205 is configured to, if the prediction result of the risk prediction unit indicates that the online merchant has an illegal platform risk, process the online merchant and/or send early warning information to the target platform.
With regard to the above-mentioned apparatus, the specific functions of each unit have been described in detail in the method embodiment provided in the first aspect of the embodiment of the present invention, and will not be described in detail here, and the specific implementation process may refer to the method embodiment provided in the first aspect.
In a third aspect, based on the same inventive concept as that of the foregoing embodiment of the merchant risk inspection method, an embodiment of this specification further provides an electronic device, as shown in fig. 3, including a memory 304, a processor 302, and a computer program stored in the memory 304 and operable on the processor 302, where the processor 302 implements the steps of the foregoing embodiment of the merchant risk inspection method when executing the program.
In fig. 3, a bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors, represented by processor 302, and memory, represented by memory 304. The bus 300 may also link together various other circuits such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface 306 provides an interface between the bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used for storing data used by the processor 302 in performing operations.
In a fourth aspect, based on the same inventive concept as the embodiment of the merchant risk inspection method, the embodiment of the present specification further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the embodiment of the merchant risk inspection method.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (12)

1. A merchant risk inspection method comprises the following steps:
obtaining multi-modal risk data of an online merchant, the multi-modal risk data comprising: merchant webpage content of the online merchant, merchant transaction data, and complaint information about the online merchant, wherein the risk data of each modality in the multi-modal risk data is serialized data, and serialized data refers to data carrying time sequence information;
combining the webpage content of the online merchants with various data modes, the transaction data of the merchants with various data modes and the complaint information with various data modes with each other to obtain a plurality of sets of modal risk data, wherein each set of modal risk data is composed of two different serialized risk data or two serialized risk data with different data modes;
respectively processing each group of modal risk data through a multi-modal fusion model to obtain a risk feature vector for identifying whether an illegal platform risk exists at an online merchant, wherein for each group of modal risk data, performing bidirectional matching fusion on the group of modal risk data according to time sequence information of the group of modal risk data through sequential processing of a word representation layer, a context content representation layer and a matching fusion layer of the multi-modal fusion model to obtain a first bidirectional matching fusion sequence and a second bidirectional matching fusion sequence corresponding to the group of modal risk data, and aggregating the first bidirectional matching fusion sequence and the second bidirectional matching fusion sequence obtained corresponding to the multi-modal risk data through an aggregation layer of the multi-modal fusion model to obtain the risk feature vector;
and predicting, through a prediction layer of the multi-modal fusion model and according to the risk feature vector, whether the online merchant has an illegal platform risk, and if the prediction result indicates that the online merchant has an illegal platform risk, applying penalties to the online merchant and/or sending early warning information to a target platform.
2. The method of claim 1, the data modalities of the merchant web content, the merchant transaction data, and each serialized risk data in the complaint information for the online merchant comprising one or more of an audio modality, a text modality, and a video modality.
3. The method of claim 1, each set of modal risk data comprising first modal risk data and second modal risk data; the bidirectional matching fusion is performed on the set of modal risk data according to the time sequence information of the set of modal risk data to obtain a first bidirectional matching fusion sequence and a second bidirectional matching fusion sequence corresponding to the set of modal risk data, and the bidirectional matching fusion method includes:
performing vector characterization on each dimension unit risk data in the first modal risk data to obtain a first characterization sequence corresponding to the first modal risk data;
performing vector characterization on each dimension unit risk data in the second modal risk data to obtain a second characterization sequence corresponding to the second modal risk data;
inputting the first characterization sequence and the second characterization sequence into corresponding time sequence information extraction models respectively to obtain a first bidirectional context sequence corresponding to the first characterization sequence and a second bidirectional context sequence corresponding to the second characterization sequence;
and performing information interaction on the first bidirectional context sequence and the second bidirectional context sequence to generate a first bidirectional matching fusion sequence corresponding to the first bidirectional context sequence and a second bidirectional matching fusion sequence corresponding to the second bidirectional context sequence.
4. The method of claim 3, wherein the interacting the first bi-directional context sequence with the second bi-directional context sequence to generate a first bi-directional matching fused sequence corresponding to the first bi-directional context sequence and a second bi-directional matching fused sequence corresponding to the second bi-directional context sequence comprises:
for the context vector of each i-th time step in the first bidirectional context sequence, fusing the context vector of the i-th time step into the second bidirectional context sequence to obtain a matching fusion vector representation corresponding to the i-th time step, wherein i sequentially takes 1 to M, and M is the sequence length of the first bidirectional context sequence;
forming the first bidirectional matching fusion sequence according to M matching fusion vector representations correspondingly obtained from 1 st to M th time steps in the first bidirectional context sequence;
for the context vector of the jth time step in the second bidirectional context sequence, fusing the context vector of the jth time step into the first bidirectional context sequence to obtain a matching fusion vector representation corresponding to the jth time step, wherein j sequentially takes 1 to N, and N is the sequence length of the second bidirectional context sequence;
and forming the second bidirectional matching fusion sequence according to N matching fusion vector representations correspondingly obtained from 1 st to N time steps in the second bidirectional context sequence.
5. The method of claim 1, wherein aggregating the obtained first bi-directionally matched fused sequence and second bi-directionally matched fused sequence corresponding to the multi-modal risk data to obtain a risk feature vector comprises:
extracting, for a kth group of modal risk data in the multi-modal risk data, a matching fusion vector representation of a last time step in a first bi-directional matching fusion sequence obtained corresponding to the kth group of modal risk data and a matching fusion vector representation of a last time step in a second bi-directional matching fusion sequence obtained corresponding to the kth group of modal risk data, where k is 1 to G, and G is the group number of the multi-modal risk data;
and connecting the last-time-step matching fusion vector representations obtained for the 1st to G-th groups of modal risk data to obtain the risk feature vector.
6. A merchant risk inspection device, includes:
a risk data obtaining unit, configured to obtain multi-modal risk data of an online merchant, where the multi-modal risk data includes: merchant webpage content of the online merchant, merchant transaction data, and complaint information about the online merchant, wherein the risk data of each modality in the multi-modal risk data is serialized data, and serialized data refers to data carrying time sequence information;
combining the webpage content of the online merchants with various data modes, the transaction data of the merchants with various data modes and the complaint information with various data modes with each other to obtain a plurality of sets of modal risk data, wherein each set of modal risk data is composed of two different serialized risk data or two serialized risk data with different data modes;
the data matching and fusing unit is used for respectively processing each group of modal risk data through a multi-modal fusion model to obtain a risk feature vector for identifying whether an online merchant has an illegal platform risk, wherein for each group of modal risk data, the group of modal risk data is subjected to bidirectional matching and fusing through sequential processing of a word representation layer, a context content representation layer and a matching and fusing layer of the multi-modal fusion model according to time sequence information of the group of modal risk data to obtain a first bidirectional matching and fusing sequence and a second bidirectional matching and fusing sequence corresponding to the group of modal risk data;
a risk data aggregation unit, configured to aggregate, through an aggregation layer of the multi-modal fusion model, a first bi-directional matching fusion sequence and a second bi-directional matching fusion sequence that are obtained by the multi-modal risk data, so as to obtain a risk feature vector;
a risk prediction unit, configured to predict, through a prediction layer of the multi-modal fusion model and according to the risk feature vector, whether the online merchant has an illegal platform risk;
and a processing unit, configured to apply penalties to the online merchant and/or send early warning information to a target platform if the prediction result indicates that the online merchant has an illegal platform risk.
7. The apparatus of claim 6, the data modalities of the merchant web content, the merchant transaction data, and each serialized risk data in the complaint information for the online merchant comprising one or more of an audio modality, a text modality, and a video modality.
8. The apparatus of claim 6, each set of modal risk data comprising first modal risk data and second modal risk data; the data matching and fusing unit comprises:
the risk data characterization subunit is used for performing vector characterization on each dimension of unit risk data in the first modal risk data to obtain a first characterization sequence corresponding to the first modal risk data; performing vector characterization on each dimension unit risk data in the second modal risk data to obtain a second characterization sequence corresponding to the second modal risk data;
a time sequence information extraction subunit, configured to input the first characterization sequence and the second characterization sequence into corresponding time sequence information extraction models respectively, so as to obtain a first bidirectional context sequence corresponding to the first characterization sequence and a second bidirectional context sequence corresponding to the second characterization sequence;
and the information interaction subunit is configured to perform information interaction on the first bidirectional context sequence and the second bidirectional context sequence, and generate a first bidirectional matching fusion sequence corresponding to the first bidirectional context sequence and a second bidirectional matching fusion sequence corresponding to the second bidirectional context sequence.
9. The apparatus of claim 8, the information interaction subunit comprising:
a first information fusion subunit, configured to fuse, for the context vector of each i-th time step in the first bidirectional context sequence, the context vector of the i-th time step into the second bidirectional context sequence, to obtain a matching fusion vector representation corresponding to the i-th time step, where i sequentially takes 1 to M, and M is the sequence length of the first bidirectional context sequence; and to form the first bidirectional matching fusion sequence according to M matching fusion vector representations correspondingly obtained from the 1st to M-th time steps in the first bidirectional context sequence;
a second information fusion subunit, configured to fuse, for the context vector of each j-th time step in the second bidirectional context sequence, the context vector of the j-th time step into the first bidirectional context sequence, so as to obtain a matching fusion vector representation corresponding to the j-th time step, where j sequentially takes 1 to N, and N is the sequence length of the second bidirectional context sequence; and to form the second bidirectional matching fusion sequence according to N matching fusion vector representations correspondingly obtained from the 1st to N-th time steps in the second bidirectional context sequence.
10. The apparatus of claim 6, wherein the risk data aggregation unit is specifically configured to:
extracting, for a kth group of modal risk data in the multi-modal risk data, a matching fusion vector representation of a last time step in a first bi-directional matching fusion sequence obtained corresponding to the kth group of modal risk data and a matching fusion vector representation of a last time step in a second bi-directional matching fusion sequence obtained corresponding to the kth group of modal risk data, where k is 1 to G, and G is the group number of the multi-modal risk data;
and connecting the last-time-step matching fusion vector representations obtained for the 1st to G-th groups of modal risk data to obtain the risk feature vector.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any of claims 1-5 when executing the program.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
CN202010632364.7A 2020-07-03 2020-07-03 Merchant risk inspection method and device, electronic equipment and storage medium Active CN111523832B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010632364.7A CN111523832B (en) 2020-07-03 2020-07-03 Merchant risk inspection method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111523832A CN111523832A (en) 2020-08-11
CN111523832B true CN111523832B (en) 2021-02-26

Family

ID=71911624


Country Status (1)

Country Link
CN (1) CN111523832B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant