CN116402546A - Store risk attribution method and device, equipment, medium and product thereof - Google Patents


Info

Publication number
CN116402546A
Authority
CN
China
Prior art keywords
store
risk
target
type
contribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310332189.3A
Other languages
Chinese (zh)
Inventor
刘涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huanju Shidai Information Technology Co Ltd
Original Assignee
Guangzhou Huanju Shidai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huanju Shidai Information Technology Co Ltd
Publication of CN116402546A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application relates to a store risk attribution method and a corresponding apparatus, device, medium and product. The method comprises: acquiring a store portrait of a target store; invoking a risk classification model to predict the risk label corresponding to the store portrait of the target store, and determining the reputation grade whose value band contains the classification probability that the risk label is of the high-risk type; calculating the feature contribution degree of each type of feature data in the store portrait, and determining the contribution grade whose value band contains each feature contribution degree; and, according to the reputation grade and the contribution grade of each type, screening out from the feature data of each type the root-cause feature data that cause the target store to hit the high-risk risk label. Because root-cause feature data are determined from the store portrait by combining the reputation grade with the contribution grades, the method is more accurate and efficient and helps maintain the operating order of online stores on an e-commerce platform.

Description

Store risk attribution method and device, equipment, medium and product thereof
Technical Field
The present application relates to e-commerce information processing technology, and in particular to a store risk attribution method and a corresponding apparatus, device, medium and product.
Background
An e-commerce platform usually hosts a large number of online stores, each actually run by a different operating entity; different operators and their business practices give the corresponding stores different risk levels. The risk level of an online store is an abstract description of its operating health and is generally reflected in its operating credit: an online store with a low risk level has good operating credit, whereas an online store with a high risk level, i.e., a risk store, has relatively poor operating credit.
The risk levels of its online stores therefore largely determine an e-commerce platform's viability. If risk stores are too concentrated on a platform, the platform struggles to develop healthily: risk stores seize its resources, the operating pressure on well-run stores rises, and in this vicious circle the platform is exposed to great risk. How to identify and deal with risk stores effectively, accurately and quickly, and so reduce their adverse effect on the platform, is therefore a major difficulty in e-commerce platform risk control.
In many situations it is difficult to analyze effectively the root cause that makes each risk store risky. In particular, when a deep learning model is used to predict risk stores, its black-box nature is a limitation, and attribution based only on the model's weight parameters is evidently very difficult. Other machine learning models are similar: it is generally hard to analyze the root cause of a risk store from the model's own mechanism, and the specific features that lead an online store to become a risk store cannot be identified.
There is therefore a need to improve the related technology so as to optimize the store risk attribution capability of e-commerce platforms and ensure their healthy operation.
Disclosure of Invention
It is an object of the present application to solve the above-mentioned problems and to provide a store risk attribution method and corresponding apparatus, device, non-volatile readable storage medium, and computer program product.
According to one aspect of the present application, there is provided a store risk attribution method comprising the steps of:
acquiring a store portrait of a target store, the store portrait comprising multiple types of feature data;
invoking a risk classification model to predict the risk label corresponding to the store portrait of the target store, and determining the reputation grade corresponding to the reputation-grade value band that contains the classification probability of the risk label being of the high-risk type;
calculating the feature contribution degree, to that classification probability, of each type of feature data in the store portrait of the target store, and determining the contribution grade corresponding to the contribution-grade value band that contains the feature contribution degree of each type of feature data;
and screening out, from the feature data of each type and according to the reputation grade of the target store and the contribution grade of each type of feature data, the root-cause feature data that cause the target store to hit the high-risk risk label.
According to another aspect of the present application, there is provided a store risk attribution apparatus, comprising:
a portrait acquisition module, configured to acquire a store portrait of a target store, the store portrait comprising multiple types of feature data;
a reputation prediction module, configured to invoke a risk classification model to predict the risk label corresponding to the store portrait of the target store, and to determine the reputation grade corresponding to the reputation-grade value band that contains the classification probability of the risk label being of the high-risk type;
a contribution analysis module, configured to calculate the feature contribution degree, to that classification probability, of each type of feature data in the store portrait of the target store, and to determine the contribution grade corresponding to the contribution-grade value band that contains the feature contribution degree of each type of feature data;
and an attribution analysis module, configured to screen out, from the feature data of each type and according to the reputation grade of the target store and the contribution grade of each type of feature data, the root-cause feature data that cause the target store to hit the high-risk risk label.
According to another aspect of the present application, there is provided a store risk attribution device comprising a central processing unit and a memory, the central processing unit being configured to invoke and run a computer program stored in the memory to perform the steps of the store risk attribution method described herein.
According to another aspect of the present application, there is provided a non-transitory readable storage medium storing, in the form of computer-readable instructions, a computer program that implements the store risk attribution method; when invoked and run by a computer, the program performs the steps included in the method.
According to another aspect of the present application, there is provided a computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the method as described in any of the embodiments of the present application.
Compared with the prior art, the present application builds a store portrait from multiple types of feature data and first uses the risk classification model to determine the classification probability of the high-risk risk label, matching that probability to the reputation grade of the target store. It then quantifies how strongly each type of feature data drives the store toward the high-risk risk label, determining the feature contribution degree of each type and its corresponding contribution grade. On this basis, combining the reputation grade of the target store with the contribution grade of each type of feature data, it screens out the root-cause feature data that cause the target store to hit the high-risk risk label. The causes of the target store's risk can thus be located, which helps improve the management of store credit risk and maintain the business order of the e-commerce platform.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an exemplary deployment, in an e-commerce platform, of the network architecture corresponding to the store risk control system of the present application;
FIG. 2 is a flow diagram of one embodiment of a store risk attribution method of the present application;
FIG. 3 is a schematic flow chart of determining root cause feature data and alerting in an embodiment of the present application;
FIG. 4 is a flow chart of assigning the corresponding grading value bands to the reputation grades and contribution grades in an embodiment of the present application;
FIG. 5 is a flow diagram of constructing a sample dataset in an embodiment of the present application;
FIG. 6 is a flow chart of configuring risk policies according to a sample dataset in an embodiment of the present application;
FIG. 7 is a flowchart illustrating a specific process of configuring a risk policy in an embodiment of the present application;
FIG. 8 is a flow chart of updating the risk policy through a self-learning mechanism in an embodiment of the present application;
FIG. 9 is a schematic block diagram of the store risk attribution apparatus of the present application;
FIG. 10 is a schematic structural view of a store risk attribution device used in the present application.
Detailed Description
Referring to fig. 1, an exemplary network architecture suitable for an e-commerce platform scenario includes a terminal device 80, a security server 81, and a store server 82.
The security server 81 can act as the main executing entity of the store risk control system of the present application. According to a risk policy, the store risk control system evaluates the various types of feature data in the store portrait of an online store on the e-commerce platform and predicts whether that store is a risk store; the risk policy can be determined and configured from a target rule set derived from the risk classification model of the present application. The store risk attribution method can be programmed as a computer program product and embedded in the store risk control system. When the system predicts that the store portrait of an online store corresponds to the high-risk risk label, the store risk attribution method can be used to perform root-cause analysis on that store, determining from its store portrait the root-cause feature data that made it a risk store, so that the specific reasons can be found conveniently and quickly and then addressed.
The store server 82 can be used to deploy one or more online stores of an Internet platform, and network requests to access these stores are answered by the corresponding application services it provides. Each online store has various types of feature data suitable for building its store portrait, such as basic feature data, transaction feature data, complaint feature data, behavioral feature data and risk merchandise feature data, any of which can be used to construct the store portrait. Each type of feature data may in turn include one or more specific features.
In addition, each online store accumulates over its history various evaluation index data, such as the customer complaint rate arising from complaint events in different periods and the positive review rate given by users who purchased its merchandise items. These evaluation index data can be used to determine the risk label corresponding to the store portrait of each online store, thereby constructing a store sample per online store for training the risk classification model of the present application. When the risk classification model is a decision-tree-based model, it can also be used to determine the target rule set.
The terminal device 80 can be used to trigger network requests for the various application services of the store server 82, for example to browse an online store and carry out the ordering process for merchandise items.
The security server 81 can read the relevant feature data of online stores from the store server 82 and use them to construct or update the store portraits. The store risk control system then determines, by rule matching between the various types of feature data in a store portrait and its risk policy, whether the risk label of the online store is of the high-risk or low-risk type, thereby identifying the store's risk level.
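As a minimal sketch of the rule-matching step, assume each high-risk rule is a conjunction of threshold conditions, such as one root-to-leaf path of a decision tree would give. The source does not specify how rules are encoded, so the `Condition` class, the `risk_label` function, the feature names and the thresholds below are hypothetical illustrations, not the system's actual policy format.

```python
import operator
from dataclasses import dataclass

# Comparison operators a tree split can produce (hypothetical encoding).
OPS = {"<=": operator.le, ">": operator.gt}


@dataclass(frozen=True)
class Condition:
    feature: str      # feature name in the store portrait (hypothetical)
    op: str           # "<=" or ">"
    threshold: float

    def holds(self, portrait: dict) -> bool:
        return OPS[self.op](portrait[self.feature], self.threshold)


def risk_label(portrait: dict, high_risk_rules) -> str:
    """Return "high_risk" if every condition of at least one rule holds."""
    for rule in high_risk_rules:
        if all(cond.holds(portrait) for cond in rule):
            return "high_risk"
    return "low_risk"


# Illustrative rule only; the thresholds are made up for this example.
RULES = [
    (Condition("complaint_count_30d", ">", 3), Condition("order_count_30d", "<=", 50)),
]
print(risk_label({"complaint_count_30d": 5, "order_count_30d": 20}, RULES))  # high_risk
```

Rules are checked in order and evaluation stops at the first full match; a production policy would also have to decide how to treat features missing from a portrait.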
Through a self-learning mechanism, the store risk control system running on the security server 81 can continuously acquire feature data from each online store to update its store portrait, keep identifying the risk label of each store, iteratively update the decision tree model with a training data set built from the store portraits and risk labels, update the target rule set with the decision tree model, and update its risk policy from the updated target rule set. Relying on this continuously updated risk policy, the system steadily improves its ability to identify whether each online store is a risk store, and thus the e-commerce platform's ability to control risk stores.
Referring to FIG. 2, in one embodiment the store risk attribution method provided by the present application includes the following steps:
Step S1100: acquiring a store portrait of a target store, the store portrait comprising multiple types of feature data;
When the store risk control system of the present application is running normally, the store portrait of each online store on the e-commerce platform can be updated periodically or aperiodically according to preset business logic, so that it can be determined from the store portrait whether the online store is a risk store that hits the high-risk risk label.
To this end, an online store on the e-commerce platform is taken as the target store and its store portrait is acquired.
In one embodiment, the store portrait includes multiple preset types of feature data, for example, but not limited to, any of the following for the target store: basic feature data, transaction feature data, complaint feature data, behavioral feature data and risk merchandise feature data.
The composition of each type of feature data is illustrated below through an exemplary collection scheme:
The basic feature data may include, for example, the store's status, registration days, registration country, payment days, subscription status and subscription package, as well as features of the corresponding seller such as the seller's registration days and the number of stores the seller owns in the industry; they describe the registration, operation and subscription status of the online store.
The transaction feature data include, for the online store and its corresponding seller, features such as the historical order count, historical order amount, order count in the last x days, order amount in the last x days, number of trading days in the last x days, maximum/minimum/average discount rate of orders in the last x days, and average logistics cost ratio of orders in the last x days, where x can take the values 1/3/5/7/15/30/60/90/180/365 (unit: day) according to time slicing; they describe the transactions of the store/seller over different time windows.
The complaint feature data include, for the store and its corresponding seller, the historical customer complaint count, historical customer complaint amount, complaint count in the last x days, complaint amount in the last x days, days since the most recent complaint, days since the first complaint, and changes in complaint count/amount, where x can take the values 1/3/5/7/15/30/60/90/180/365 (unit: day) according to time slicing; they describe the complaints against the store/seller over different time windows.
The behavioral feature data are derived from the store's event-tracking (buried point) data and describe the overall behavior of the store and seller. They include the number of store logins in the last x days, the number of product listings put on shelves in the last x days, the number of abandoned orders in the last x days, the number of newly added payment methods in the last x days, the number of distinct login countries/cities in the last x days, the time of the store's last login, and the most frequent login country and city, where x can take the values 1/3/5/7/15/30/60/90/180/365 (unit: day) according to time slicing; they describe the behavior of the store/seller over different time windows.
The risk merchandise feature data include, for the store and its corresponding seller, the historical count/amount of merchandise items, historical count/amount of ultra-low-price items, historical count/amount of prohibited items, categories of items sold in the last x days, average price of items sold in the last x days, number of items complained about in the last x days, and number of items added to favorites in the last x days, where x can take the values 1/3/5/7/15/30/60/90/180/365 (unit: day) according to time slicing; they describe the store's merchandise over different time windows.
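The "last x days" features above are sliding-window aggregates. A minimal sketch, assuming a plain list of order records that each carry a date and an amount, computes the order count and amount for every window x; the `Order` type, the `windowed_order_features` function and the output key names are hypothetical, not taken from the source.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Slicing windows x, in days, as listed in the text.
WINDOWS = (1, 3, 5, 7, 15, 30, 60, 90, 180, 365)


@dataclass
class Order:
    """Hypothetical order record: the day it was placed and its amount."""
    day: date
    amount: float


def windowed_order_features(orders, today: date, windows=WINDOWS) -> dict:
    """Order count and amount over the last x days (today included) for each x."""
    features = {}
    for x in windows:
        start = today - timedelta(days=x)  # exclusive lower bound, so the window spans x days
        recent = [o for o in orders if start < o.day <= today]
        features[f"order_count_{x}d"] = len(recent)
        features[f"order_amount_{x}d"] = sum(o.amount for o in recent)
    return features
```

The complaint, behavioral and merchandise window features follow the same pattern with a different record type and aggregate.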
As the above exemplary collection scheme shows, the feature composition of the store portrait can be defined flexibly as needed. The general aim is to select, by type, feature data that describe the overall operation of an online store and organize them into corresponding feature sets, which together serve as the store portrait. Each type of feature data may be made up of any number of specific features chosen as required; the specific features and their number are not limited.
This way of constructing a store portrait applies not only to the target store of the present application but also to the store portraits of other online stores processed by the store risk control system, and likewise to the store portraits of the store samples in the sample data set used to train the risk classification model.
Step S1200: invoking a risk classification model to predict the risk label corresponding to the store portrait of the target store, and determining the reputation grade corresponding to the reputation-grade value band that contains the classification probability of the risk label being of the high-risk type;
The risk classification model is trained in advance on the store samples of a corresponding sample data set so that, given the store portrait of an online store, it outputs the classification probabilities of the two risk labels, high-risk type and low-risk type. When the classification probability of the high-risk type is high, the online store can be judged a risk store; otherwise it is a low-risk or risk-free store.
The risk classification model may be a machine learning model or a deep learning model. In one embodiment, a tree-based machine learning model such as LightGBM is recommended; other common algorithms such as decision trees, GBDT, XGBoost, random forests and neural networks can also serve as the risk classification model.
For each online store, the classification probability obtained by mapping its store portrait to the high-risk type indicates the degree to which the store is a risk store. Accordingly, several reputation grades are preset, and a classification probability band is assigned to each reputation grade as its reputation-grade value band, so that every reputation grade has a corresponding value band.
For example, with 5 reputation grades, and given that the risk classification model normalizes the classification probability to the interval [0, 1], the value band of each reputation grade can be set as in the following table:
Reputation grade | Band lower limit | Band upper limit
1 | 0.0000 | 0.2000
2 | 0.2000 | 0.4000
3 | 0.4000 | 0.6000
4 | 0.6000 | 0.8000
5 | 0.8000 | 1.0000
As the table shows, the higher the reputation grade, the higher its value band.
Thus, the store portrait of the target store is input into the risk classification model, which maps it to the high-risk classification label to obtain the corresponding classification probability; matching this probability against the value band of each reputation grade determines the reputation grade of the target store.
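Matching a probability to its grade is a band lookup. The sketch below assumes that a value lying exactly on a shared boundary (e.g. 0.2000, which closes band 1 and opens band 2) belongs to the lower grade; the table does not say which side wins, so this is an assumption. `grade_for` and `UPPER_LIMITS` are hypothetical names.

```python
import bisect

# Upper limits of the five value bands in the table above (grades 1..5).
UPPER_LIMITS = (0.2, 0.4, 0.6, 0.8, 1.0)


def grade_for(value: float, upper_limits=UPPER_LIMITS) -> int:
    """Return the 1-based grade whose band contains value.

    Assumption: a value on a shared boundary goes to the lower grade, so
    grade g covers (upper_limits[g-2], upper_limits[g-1]] and grade 1 includes 0.
    """
    if not 0.0 <= value <= upper_limits[-1]:
        raise ValueError(f"value {value} is outside [0, {upper_limits[-1]}]")
    # bisect_left finds the first upper limit >= value.
    return bisect.bisect_left(upper_limits, value) + 1
```

Because only the band limits are parameters, the same lookup works for any banded grade with normalized values in [0, 1].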
Step S1300: calculating the feature contribution degree, to the classification probability, of each type of feature data in the store portrait of the target store, and determining the contribution grade corresponding to the contribution-grade value band that contains the feature contribution degree of each type of feature data;
For the feature contribution degree of each type of feature data, multiple contribution-grade value bands can be set, so that the feature contribution degree of each type hits one band, which determines that type's contribution grade. Similarly, when the feature contribution degree is normalized to a specific interval such as [0, 1], the mapping between contribution grades and their value bands can be set as in the following table:
Contribution grade | Band lower limit | Band upper limit
1 | 0.0000 | 0.2000
2 | 0.2000 | 0.4000
3 | 0.4000 | 0.6000
4 | 0.6000 | 0.8000
5 | 0.8000 | 1.0000
As the table shows, the higher the contribution grade, the higher its value band.
The feature contribution degree of each type of feature data in the store portrait can be calculated as follows:
Consider the risk classification model m, and assume that the store portrait x of the target store consists of three types of feature data, a, b and c. First, the concepts of feature "absence" and feature "introduction" are defined. An intuitive way to understand feature contribution is to compare the prediction of model m with and without certain types of feature data. A type of feature data is usually made "absent" by replacing it with other random values and then comparing the predictions of model m, where the prediction generally refers to the classification probability of the high-risk risk label. "Introducing" a type of feature data means using its actual value from the store portrait and observing the classification probability output by model m.
Next, the marginal contribution of each type of feature data is examined when it is introduced at different stages. That is, with three types a, b and c, the contribution of type a is calculated from its marginal contribution when it is introduced as the first, the second and the third type of feature data. Writing a for the introduced feature and a' for the absent one, the calculation is as follows:
Introduced as the first feature: the marginal contribution is $m(a, b', c') - m(a', b', c')$.
Introduced as the second feature: the marginal contribution is $\frac{1}{2}\left[m(a, b, c') - m(a', b, c')\right] + \frac{1}{2}\left[m(a, b', c) - m(a', b', c)\right]$. There are two cases in which a is introduced second, so each is weighted by 1/2.
Introduced as the third feature: the marginal contribution is $m(a, b, c) - m(a', b, c)$.
Finally, the three marginal contributions are each multiplied by 1/3 and summed to give the total contribution of the type-a feature data, i.e., its feature contribution degree.
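Written out in full, the weights above (1/3 for first position, 1/6 for each of the two second-position cases, 1/3 for third position) are exactly the Shapley value of type a in a three-player setting. The general form for a set N of n feature types, with v(S) the model output when only the types in S are present, is shown for reference.

```latex
\phi_a = \tfrac{1}{3}\bigl[m(a,b',c') - m(a',b',c')\bigr]
       + \tfrac{1}{6}\bigl[m(a,b,c') - m(a',b,c')\bigr]
       + \tfrac{1}{6}\bigl[m(a,b',c) - m(a',b',c)\bigr]
       + \tfrac{1}{3}\bigl[m(a,b,c) - m(a',b,c)\bigr]

\phi_i = \sum_{S \subseteq N \setminus \{i\}}
         \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr]
```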
The feature contribution degrees of the other types, b and c, are calculated in the same way. Summed together, the feature contribution degrees of the three types equal the risk classification model's final prediction, i.e., the classification probability it outputs for the high-risk type, minus its prediction when all three types are absent.
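The procedure above can be sketched as an exact group-level Shapley computation. This is a minimal sketch under stated assumptions: the model is any callable mapping a dict of features to the high-risk probability, and a type is made "absent" by overwriting its features with values from a list of reference portraits and averaging the predictions, a deterministic stand-in for the random replacement described above. `group_shapley`, `toy_model` and all parameter names are hypothetical. Exact enumeration needs 2^n coalition values, which is cheap for the five feature types listed earlier.

```python
from itertools import combinations
from math import factorial


def group_shapley(model, portrait, background, groups):
    """Exact Shapley contribution of each feature group to model(portrait).

    model:      callable, dict of features -> high-risk probability (hypothetical interface)
    portrait:   the target store's features, {feature_name: value}
    background: non-empty list of reference portraits; an "absent" group takes
                their values and the predictions are averaged (assumption)
    groups:     {type_name: [feature_name, ...]}
    """
    names = list(groups)
    n = len(names)
    cache = {}

    def v(present):
        # Mean prediction when only the groups in `present` use the target's values.
        key = frozenset(present)
        if key not in cache:
            total = 0.0
            for ref in background:
                x = dict(ref)
                for g in key:
                    for f in groups[g]:
                        x[f] = portrait[f]
                total += model(x)
            cache[key] = total / len(background)
        return cache[key]

    phi = {}
    for g in names:
        others = [h for h in names if h != g]
        contribution = 0.0
        for k in range(n):  # k = number of groups introduced before g
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                contribution += weight * (v(set(subset) | {g}) - v(subset))
        phi[g] = contribution
    return phi


def toy_model(x):
    # Stand-in for the risk classification model: additive terms plus an a*c interaction.
    return 0.1 + 0.3 * x["a"] + 0.2 * x["b"] + 0.2 * x["a"] * x["c"]


phi = group_shapley(toy_model, {"a": 1, "b": 1, "c": 1}, [{"a": 0, "b": 0, "c": 0}],
                    {"a": ["a"], "b": ["b"], "c": ["c"]})
```

In this toy model the interaction term 0.2·a·c is split evenly between a and c, giving contributions of about 0.4, 0.2 and 0.1, which sum to the full prediction 0.8 minus the all-absent baseline 0.1.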
Following this process, the feature contribution degree of each type of feature data can be calculated for a store portrait with multiple types of feature data, and for each type the contribution grade is determined by matching its feature contribution degree against the contribution-grade value bands.
As for the value ranges of the contribution-grade value bands, in one embodiment they can be divided using the risk classification model: the model predicts on the store portraits of multiple store samples in a sample data set, and the bands are set according to the data distribution of the resulting high-risk classification probabilities. In another embodiment, they can be set manually from practical experience alone.
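One way to divide bands "according to the data distribution" is equal-frequency (quantile) binning, which places the cut points so that each grade holds roughly the same share of the sample. The source does not name a method, so this is an assumption. The sketch uses Python's `statistics.quantiles`; `band_upper_limits` is a hypothetical helper whose output has the same shape as the upper limits in the tables above.

```python
import statistics


def band_upper_limits(values, n_bands=5):
    """Upper limits of n_bands equal-frequency bands over the observed values.

    The first n_bands - 1 limits are the quantile cut points; the last band
    is closed by the largest observed value.
    """
    cuts = statistics.quantiles(values, n=n_bands, method="inclusive")
    return cuts + [max(values)]
```

On uniformly spread values this reproduces the even 0.2-wide bands of the tables; on skewed model outputs it yields narrower bands where values are dense.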
In one embodiment, each type of feature data may, as needed, be further divided into finer-grained subtypes. The feature contribution degree of each subtype is determined by the same process, and finer-grained feature data can then be extracted according to the contribution degrees of the subtypes.
Step S1400: according to the public praise grade of the target store and the contribution grade of each type of feature data, screen out, from the feature data of each type, the root-cause feature data that causes the target store to hit the high-risk type of risk tag.
Through the above process, the public praise grade of the target store and the contribution grade of each type of feature data are determined from the target store's store portrait, so that some types of feature data can be extracted as root-cause feature data as actually required.
In one embodiment, it is first determined whether the public praise grade of the target store has reached a preset grade, for example the highest grade, which indicates that the target store has been identified as a risk store and that its root-cause feature data therefore needs to be determined. The contribution grades of the various types of feature data in the store portrait are then examined, and the types whose contribution grade is higher than a preset grade are screened out. Because these types have high feature contribution degrees, they point to the main reasons the target store was identified as a risk store; they are taken as root-cause feature data, and analyzing the specific feature data in them reveals why the target store hit the high-risk type of risk tag.
In another embodiment, the public praise grade is used as a first coefficient and the contribution grade of each type of feature data as a second coefficient. The product of the two coefficients is the evaluation score of that type, and the types whose evaluation score is higher than a preset threshold are screened out as root-cause feature data, with the same effect as the previous embodiment.
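A minimal sketch of the product-score embodiment, assuming integer grades where larger means higher; the feature-type names and the threshold of 12 are hypothetical values chosen for illustration.

```python
def root_causes_by_score(praise_grade, contribution_grades, threshold):
    """Evaluation score = public praise grade (first coefficient)
    x contribution grade (second coefficient); keep the feature-data
    types whose score exceeds the threshold."""
    scores = {t: praise_grade * g for t, g in contribution_grades.items()}
    return sorted(t for t, s in scores.items() if s > threshold)


grades = {"basic": 1, "behavior": 2, "risk_commodity": 3,
          "transaction": 4, "complaint": 5}
print(root_causes_by_score(5, grades, threshold=12))
# ['complaint', 'risk_commodity', 'transaction']
```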
In the embodiment in which each type of feature data is subdivided into subtypes with their own feature contribution degrees, the root-cause feature data determined above can be refined further: subtypes with low feature contribution degrees are filtered out and those with high contribution degrees are retained, purifying the root-cause feature data and helping to improve the accuracy of root-cause investigation.
As the above embodiments show, the risk classification model first determines, from a store portrait composed of multiple types of feature data, the classification probability of the high-risk type of risk tag, and the public praise grade of the target store is matched from that probability. The effect of each type of feature data on hitting the high-risk tag is then quantified as a feature contribution degree, from which its contribution grade is determined. Combining the public praise grade of the target store with the contribution grades of the feature data types, the root-cause feature data that causes the target store to hit the high-risk tag is screened out. This realizes risk attribution for the target store, improves the efficiency of investigating the business credit risk of online stores, and helps maintain the business order of the e-commerce platform.
On the basis of any embodiment of the present application, referring to fig. 3, screening out, according to the public praise grade of the target store and the contribution grade of each type of feature data, the root-cause feature data that causes the target store to hit the high-risk type of risk tag includes:
Step S1410: determine whether the public praise grade of the target store is a target grade, and if so, screen out, from the types of feature data of the target store, the types belonging to the highest contribution grade as root-cause feature data;
A target store whose classification probability for the high-risk type of risk tag is too low, for example below 0.5000, is usually regarded as a low-risk or risk-free store. Therefore, taking public praise grades divided into 5 equal-frequency bands as an example, only the highest and second-highest public praise grades may be designated as target grades, and root-cause feature data is determined only when the target store falls into one of these two grades.
Accordingly, it is judged whether the public praise grade of the target store is a target grade, and if it is, root-cause feature data is further screened according to the feature contribution degrees of the target store's feature data types.
Likewise, when screening root-cause feature data, only the types with the highest contribution grade among them may be kept, and their feature data is used as the root-cause feature data that makes the target store a risk store.
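A minimal sketch of Step S1410, assuming integer grades where larger means higher and interpreting "highest contribution grade" as the highest grade present among the store's feature-data types; the target grades follow the 5-band example above, and the type names are hypothetical.

```python
def screen_root_causes(praise_grade, contribution_grades, target_grades):
    """Return the feature-data types in the highest contribution grade,
    but only when the store's public praise grade is a target grade."""
    if praise_grade not in target_grades or not contribution_grades:
        return []
    top = max(contribution_grades.values())
    return sorted(t for t, g in contribution_grades.items() if g == top)


TARGET_GRADES = {4, 5}  # highest and second-highest of 5 public praise grades
grades = {"basic": 1, "transaction": 5, "complaint": 5, "behavior": 3}
print(screen_root_causes(4, grades, TARGET_GRADES))  # ['complaint', 'transaction']
print(screen_root_causes(2, grades, TARGET_GRADES))  # []
```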
Step S1420: store the target store and its root-cause feature data as mapping relationship data, and send corresponding alarm information to a preset communication interface.
After the root-cause feature data of the target store is determined, it is combined with the target store, in particular the target store's identifier, into mapping relationship data that is saved in a relevant log file. An alarm message is also constructed and sent to the preset communication interface, which is generally one used by the e-commerce platform's management users, such as an instant-messaging interface, a system-message interface, or an e-mail interface. Sending the alarm message provides a prompt warning, so that the causes affecting the target store's business credit can be further checked manually and the business order can be effectively maintained.
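A minimal sketch of Step S1420, assuming the mapping relationship data is written as one JSON line per store and that delivery to the communication interface is handled by a caller-supplied `send` function; the record field names and message format are hypothetical.

```python
import io
import json
from datetime import datetime, timezone


def record_and_alert(store_id, root_causes, log_file, send):
    """Append a store -> root-cause mapping record to a log and send an alert.

    log_file -- any writable text stream (e.g. an opened log file)
    send     -- hypothetical callable delivering a message to the preset
                communication interface (IM, system message, e-mail, ...)
    """
    record = {
        "store_id": store_id,
        "root_cause_features": root_causes,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    log_file.write(json.dumps(record, ensure_ascii=False) + "\n")
    send(f"Risk alert: store {store_id}; root causes: {', '.join(root_causes)}")
    return record


log = io.StringIO()
sent = []
record_and_alert("store-001", ["complaint", "transaction"], log, sent.append)
print(sent[0])  # Risk alert: store store-001; root causes: complaint, transaction
```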
According to the above embodiment, the root-cause feature data explaining why the target store is a risk store can be determined efficiently from the public praise grade and the contribution grades of the various types of feature data. This makes investigating the causes of risk stores convenient, facilitates subsequent correction by both the platform and the merchant, and effectively helps maintain the business order of online stores on the e-commerce platform.
On the basis of any embodiment of the present application, referring to fig. 4, before the store portrait of an online store is acquired, the method includes:
Step S2100: acquire a sample data set containing a plurality of store samples, each comprising the store portrait of a single online store and its risk tag, the risk tag being labeled as the high-risk type or the low-risk type;
The sample data set may be prepared in advance so that it contains a large number of store samples, each constructed for one online store. The online stores used may belong to the current e-commerce platform or to a third-party e-commerce platform, as long as they can provide the feature data required to construct the store samples; in general, collecting store samples from online stores in the current e-commerce platform is preferred.
A store sample includes the store portrait of the corresponding online store and its risk tag. The store portrait is a set of feature data extracted from the online store that directly or indirectly reflects its business credit. For how the various types of feature data are acquired, refer to the earlier description of acquiring the feature data of the target store; it can be set flexibly according to actual conditions and is not repeated here.
A store portrait is acquired for each online store by collecting the feature data of several types according to a unified acquisition scheme, and together these feature data comprehensively reflect the business credit of that online store. For each online store, its store portrait is then mapped to its risk tag to form the store sample of that store.
The risk tags in the sample data set can come from at least two sources. In the first, when the sample data set is initialized, risk tags are determined from certain evaluation indexes of the online stores. In the second, the store risk-control system subsequently identifies a risk tag from an online store's store portrait, and that store portrait and risk tag form a new store sample in the sample data set or replace the store's existing sample.
In one embodiment, risk tags are divided into two types, the high-risk type and the low-risk type. A high-risk tag indicates an online store with low business credit, i.e. a risk store that relatively fails to meet the business specifications of the e-commerce platform; a low-risk tag indicates an online store with high business credit, i.e. a normal store (also called a risk-free or low-risk store) that relatively meets those specifications. In other embodiments, risk tags may in principle be divided into more than two types, which a person skilled in the art can implement as needed following the principles given in this application.
For intuitive understanding, a store sample whose risk tag is of the high-risk type can be regarded as a black sample, and one whose risk tag is of the low-risk type as a white sample.
Step S2200: use the risk classification model trained on the sample data set to predict, for each store portrait in the sample data set, the classification probability of the high-risk type of risk tag;
As described above, the risk classification model of the present application is iteratively trained on the store samples of the sample data set until it converges, and is then put into use. In this embodiment, the trained model is applied to every store sample in the sample data set to determine, for each store portrait, the classification probability of the high-risk type of risk tag.
For each store portrait, the feature contribution degree of each type of feature data can also be determined one by one with the risk classification model, in the manner described above, for later use.
Step S2300: divide and set the public praise grading value bands corresponding to a plurality of public praise grades according to the data distribution of the classification probabilities of the store portraits in the sample data set;
Once the risk classification model has predicted the high-risk classification probability of every store portrait in the sample data set, these probabilities form a data distribution. A person skilled in the art can examine this distribution and divide the public praise grading value bands for the plurality of public praise grades according to its characteristics, so that the bands follow the data distribution and public praise grades determined from them are more accurate.
Step S2400: divide and set the contribution grading value bands corresponding to a plurality of contribution grades according to the data distribution of the feature contribution degrees of each type of feature data in the store portraits of the sample data set.
As described above, the feature contribution degree of each type of feature data can be determined for the store portrait of every store sample in the sample data set, so the contribution degrees of each type also form a data distribution over all store samples. Likewise, a person skilled in the art can examine this distribution and divide the contribution grading value bands for the plurality of contribution grades according to its characteristics, so that the bands follow the data distribution and contribution grades later determined from them are more accurate.
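One way to divide value bands from a data distribution is the equal-frequency split mentioned earlier in the 5-band example. A minimal sketch using Python's `statistics.quantiles`, assuming equal-frequency banding is the chosen rule (the application leaves the division to the practitioner); the probability list is a hypothetical stand-in for the model's predictions on the sample data set, and the same function applies to the feature contribution degrees of any type.

```python
from statistics import quantiles


def equal_frequency_cuts(values, n_bands=5):
    """Cut points dividing the sample distribution into n_bands bands
    that each hold roughly the same number of samples."""
    return quantiles(values, n=n_bands, method="inclusive")


# Hypothetical high-risk probabilities predicted for the sample data set.
probabilities = [i / 100 for i in range(101)]
cuts = equal_frequency_cuts(probabilities)
print(cuts)  # [0.2, 0.4, 0.6, 0.8]
```

The returned cut points are the boundaries between consecutive grades; a skewed distribution yields unevenly spaced cuts, which is the point of dividing bands from the data rather than from fixed intervals.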
In this embodiment, both the public praise grading value bands and the contribution grading value bands are set in advance from the predictions that the risk classification model makes on the store portraits of the same sample data set. Both kinds of grades are therefore closely tied to the sample data set, so that when root-cause feature data is later screened for a target store, the large-scale statistics of the sample data set are taken into account, giving a more accurate and more effective root-cause analysis.
On the basis of any embodiment of the present application, referring to fig. 5, acquiring the sample data set includes:
Step S2110: determine, in the e-commerce platform, the target online stores from which store samples are to be extracted;
An e-commerce platform hosts a large number of online stores in varied operating conditions. For example, the feature data of a store that has been closed for a long time cannot effectively reflect day-to-day operation, so such stores may be excluded when preparing store samples.
In any event, a person skilled in the art can determine a set of target online stores from the e-commerce platform, which may include all online stores, and prepare the corresponding store samples from them.
Step S2120: set the risk tag of each target online store to the high-risk type or the low-risk type according to whether its evaluation index is higher than a preset threshold;
A risk tag is generated automatically for each target online store as the labeling information of its store sample, which avoids the extra implementation cost of labeling risk tags manually.
To identify the evaluation index automatically, the definition of a risk store is first fixed, i.e. the evaluation index is unified. For example, if a store whose customer complaint rate over the last 3 months exceeds 10% is defined as a risk store, the 3-month customer complaint rate is calculated for every target online store; stores whose rate exceeds 10% receive the high-risk tag with value 1, and all others receive the low-risk tag with value 0.
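A minimal sketch of this automatic labeling rule, assuming the customer complaint rate is complaints divided by orders over the same 3-month window (the application does not define the denominator) and treating a store with no orders as low-risk; both assumptions are hypothetical.

```python
def risk_label(complaints_3m, orders_3m, threshold=0.10):
    """Return 1 (high-risk) if the 3-month customer complaint rate exceeds
    the threshold, else 0 (low-risk).

    Assumes rate = complaints / orders; a store with no orders gets 0.
    """
    rate = complaints_3m / orders_3m if orders_3m else 0.0
    return 1 if rate > threshold else 0


print(risk_label(12, 100))  # 1  (12% > 10%)
print(risk_label(10, 100))  # 0  (exactly 10% does not exceed the threshold)
```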
Step S2130: acquire multiple types of feature data of each target online store as the store portrait of that store;
Following the method of acquiring the store portrait of an online store described in the previous embodiments, a store portrait is constructed for each target online store by acquiring its various types of feature data. In this embodiment, five types of feature data are acquired for each target online store, namely basic feature data, transaction feature data, complaint feature data, behavioral feature data, and risk commodity feature data, to construct its store portrait. Every target online store can determine its store portrait according to the same principle.
Step S2140: construct the store portrait of each target online store and its risk tag into a store sample and store it in the sample data set.
After the store portrait and risk tag of each target online store are determined as above, they are mapped to each other and stored in the sample data set as a store sample, which can be used both to train the risk classification model and to determine the grading value bands of the public praise grades and contribution grades.
According to this embodiment, the store portrait of an online store is built from feature data of multiple types and rich dimensions, depicting the store's risk level from several directions and capturing potential risk points more accurately. Because the portrait retains the feature data of both the store and its seller, it covers the store comprehensively; these rich features help improve the robustness of the risk classification model and provide it with a comprehensive pool of candidate features.
On the basis of any embodiment of the present application, referring to fig. 6, after the risk classification model trained on the sample data set is used to predict the classification probability of the high-risk type of risk tag for each store portrait in the sample data set, the method includes:
Step S3100: use the risk classification model trained on the sample data set, which is a decision tree model, to classify all store samples in the sample data set into a plurality of sample subsets, and obtain the rule set corresponding to each sample subset;
To extract from the sample data set the rules for identifying the risk tag of an online store, a decision tree model is trained on the sample data set as the risk classification model of the present application. A decision tree is a simple, efficient, and highly interpretable prediction model that is usually built from top to bottom: each decision or event (i.e. state of nature) may lead to two or more further events with different outcomes, and when these decision branches are drawn they resemble the branches of a tree, hence the name. The tree diagram represents the expected values of all decisions, and the decision with the maximum benefit and minimum cost is finally selected by calculation.
With the decision tree model trained on the sample data set, all store samples in the sample data set are classified and collected at the leaf nodes, each leaf holding a sample subset that contains part of the store samples. The path from the root node to each leaf node forms a classification path: the features and split conditions of the internal nodes on the path form the conditions of a rule, and the class label of the leaf forms the rule's conclusion, so each classification path yields one rule set.
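The path-to-rule correspondence can be sketched as a recursive walk over a binary tree. The `Node` structure below is a hypothetical, library-independent representation (a trained library model would need its own traversal), and the feature names and split values are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Condition = Tuple[str, str, float]  # (feature, operator, split value)


@dataclass
class Node:
    feature: Optional[str] = None      # split feature (internal nodes)
    threshold: Optional[float] = None  # samples with value <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None        # class label (leaf nodes): 1 black, 0 white


def extract_rules(node: Node,
                  path: Tuple[Condition, ...] = ()) -> List[Tuple[List[Condition], int]]:
    """One rule per root-to-leaf path: the internal-node splits become the
    rule's conditions and the leaf's class label becomes its conclusion."""
    if node.label is not None:
        return [(list(path), node.label)]
    return (extract_rules(node.left, path + ((node.feature, "<=", node.threshold),))
            + extract_rules(node.right, path + ((node.feature, ">", node.threshold),)))


# Hypothetical depth-2 tree.
tree = Node("complaint_rate", 0.05,
            left=Node(label=0),
            right=Node("refund_rate", 0.2, left=Node(label=0), right=Node(label=1)))
for conditions, label in extract_rules(tree):
    print(conditions, "->", label)
```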
Common decision tree models include, but are not limited to, CLS, ID3, C4.5, CART, and random forest (RF), any of which may be used as the decision tree model of the present application.
In one embodiment, after the type of decision tree model is chosen, it may be prepared and used according to the following principles:
first, training the decision tree model:
A binary classification model is built with the chosen decision tree algorithm to form the decision tree model of the application. Its parameters are set roughly according to the distribution of store samples in the sample data set and can later be fine-tuned or tuned by grid search. Features can also be screened iteratively by variable importance, keeping the most important features and removing the least important ones, until the model's technical indexes stabilize or the retained features meet the requirements for going online.
Secondly, evaluating the effect of the decision tree model:
The technical evaluation indexes of the decision tree model are generally AUC (the area under the ROC curve) and KS (the Kolmogorov-Smirnov statistic, which equals the intercept of the slope-1 tangent line to the ROC curve, i.e. the maximum of TPR minus FPR). Both measure the model's ability to distinguish black samples from white samples, and larger AUC and KS values indicate a better model. To ensure model stability, cross-time (out-of-time) sample validation is preferably added to the evaluation.
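Both indexes can be computed from one sweep over the ROC curve. The sketch below is a plain-Python illustration (in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used for AUC); it assumes labels of 1 for black and 0 for white samples and uses trapezoidal integration, with tied scores crossing the threshold together.

```python
def auc_ks(labels, scores):
    """AUC (trapezoidal area under the ROC curve) and KS = max(TPR - FPR).

    labels -- 1 for black (high-risk) samples, 0 for white samples
    scores -- predicted high-risk probabilities
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both black and white samples")
    pairs = sorted(zip(scores, labels), reverse=True)  # highest score first
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = ks = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        ks = max(ks, tpr - fpr)
        prev_tpr, prev_fpr = tpr, fpr
    return auc, ks


print(auc_ks([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6]))  # (0.75, 0.5)
```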
Then, extracting rule sets generated by the decision tree model:
As described above, rule sets can be extracted from the decision tree model. Because the decision tree is a binary tree, a tree of depth 4 generates at most 16 rule sets (in general at most $2^h$ rules, where h is the tree depth), and all store samples are divided into at most 16 sample subsets. This yields each sample subset produced by classifying all store samples in the sample data set with the decision tree model, together with its rule set.
Step S3200: screen out a plurality of rule sets as target rule sets according to the statistical indexes of their corresponding sample subsets;
The sample subsets produced by the decision tree model correspond one-to-one with the rule sets, but their classification effects differ, which can be measured by statistical indexes of the subsets; commonly used indexes include accuracy (precision) and recall. Taking accuracy as an example, the rule sets are screened as follows to select representative ones as target rule sets.
Assume the total number of store samples is C, with A black samples and B white samples. Taking a decision tree of depth 4 as an example, at most 16 rule sets are extracted, each containing at most 4 features, where $X_i$ denotes a policy feature and $c_i$ the split point of that feature. The 16 rule sets divide the C samples into mutually disjoint sample subsets $C_1$ to $C_{16}$, in each of which the number of black samples $A_i$ and white samples $B_i$ can be counted. The accuracy of subset i is the proportion of black samples within the subset, $p_i = A_i / (A_i + B_i)$, and its recall is the proportion of all black samples in the sample data set that fall into the subset, $r_i = A_i / A$.
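These two statistics per sample subset can be sketched as follows, assuming each subset is given as the list of its samples' labels (1 black, 0 white) keyed by a hypothetical rule-set identifier, and that the total black count A is taken over the whole sample data set.

```python
def subset_statistics(subsets, total_black):
    """subsets maps a rule-set id to the labels (1 black, 0 white) of the
    samples in its sample subset. Returns {id: (accuracy p_i, recall r_i)}."""
    stats = {}
    for rule_id, labels in subsets.items():
        a_i = sum(labels)     # black samples A_i
        size = len(labels)    # A_i + B_i
        p_i = a_i / size if size else 0.0
        r_i = a_i / total_black if total_black else 0.0
        stats[rule_id] = (p_i, r_i)
    return stats


subsets = {"R1": [1, 1, 1, 0], "R2": [0, 0, 1]}
print(subset_statistics(subsets, total_black=4))
```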
The calculation according to the above principles is illustrated schematically below:
[Figure: schematic of the per-rule-set statistics calculation]
As the schematic shows, each rule set has its decision features and the rule conditions on those features. Once the samples of each rule set are counted, statistical indexes such as accuracy and recall can be determined, the rule sets can be ranked by any of these indexes, and the top-ranked rule sets can be selected as target rule sets, i.e. the rules that best reflect classification capability. Prioritizing accuracy favors rule sets that judge black and white samples precisely; prioritizing recall favors rule sets with stronger search capability that generalize to identify more of the target samples.
In one embodiment, target rule sets may be selected by applying a preset threshold to one statistical index, for example filtering by accuracy with the threshold set to 90%.
In another embodiment, both accuracy and recall are used. First, for the sample subset of each rule set produced by the decision tree model, the accuracy and recall of the risk tag of the target type are calculated as statistical indexes, in the same way as in the previous embodiments (not repeated here); the target type is the one indicating black samples, i.e. the high-risk type corresponding to risk stores. Then the sample subsets whose accuracy is higher than a first preset threshold and whose recall is higher than a second preset threshold are screened out, and their rule sets are taken as target rule sets. Specifically, accuracy is the priority index: from all rule sets determined by the decision tree model, those whose sample subsets have accuracy above the first preset threshold, for example 90%, are kept, and among these, the ones whose recall exceeds the second preset threshold are selected as target rule sets. The accuracy index ensures that the store risk-control system identifies risk stores precisely, and the recall index ensures that the search capability of the selected sample subsets covers business requirements.
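A minimal sketch of the two-threshold screening, taking per-rule-set (accuracy, recall) pairs as input. The 90% first threshold comes from the example above; the 5% second threshold and the rule-set identifiers are hypothetical.

```python
def select_target_rules(stats, min_accuracy=0.90, min_recall=0.05):
    """Accuracy first, then recall: keep rule sets whose sample subset has
    accuracy above the first threshold and recall above the second."""
    accurate = {r: (p, rec) for r, (p, rec) in stats.items() if p > min_accuracy}
    return sorted(r for r, (p, rec) in accurate.items() if rec > min_recall)


stats = {"R1": (0.95, 0.30), "R2": (0.92, 0.02), "R3": (0.80, 0.50)}
print(select_target_rules(stats))  # ['R1']
```

R2 passes the accuracy filter but covers too few black samples, while R3 has high recall but fails the priority accuracy filter.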
This embodiment adapts to specific business requirements: with accuracy prioritized, the target rule sets are further refined by recall, so that the risk policies generated from them both meet business requirements and identify the risk tags of online stores accurately, making the store risk-control system more trustworthy.
Step S3300: configure the risk policies of the store risk-control system with the target rule sets, so that the system identifies the risk tag corresponding to the store portrait of any online store according to those risk policies.
Determining the target rule sets amounts to determining several preferred rules for deciding whether the store portrait of an online store is a black or a white sample, i.e. whether the online store carries the high-risk or the low-risk type of risk tag. These rules can therefore be used to configure the risk policies of the store risk-control system of the present application, which may hold multiple risk policies, for example one per target rule set.
In other embodiments, additional features may be introduced manually on top of a target rule set, with corresponding rule conditions added to constrain it, forming a risk policy revised from that target rule set; a person skilled in the art can implement this as desired.
Once the risk policies are configured, the store risk-control system applies them: after obtaining the store portrait of an online store, it matches the feature data in the portrait against one or more risk policies and determines the risk tag from the matching result, i.e. whether the online store is of the high-risk or the low-risk type. A high-risk result indicates that the online store is a risk store, which can then be labeled accordingly.
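Rule matching against a store portrait can be sketched as below, assuming each risk policy is a list of (feature, operator, split value) conditions that must all hold, and assuming the store is tagged high-risk if any policy matches; the application does not specify how multiple policies are combined, and the feature names are hypothetical.

```python
import operator

OPERATORS = {"<=": operator.le, ">": operator.gt}


def matches(policy, portrait):
    """A policy matches when every (feature, op, threshold) condition holds."""
    return all(OPERATORS[op](portrait[feature], threshold)
               for feature, op, threshold in policy)


def identify_risk_tag(portrait, policies):
    """Return 1 (high-risk) if any risk policy matches, else 0 (low-risk)."""
    return 1 if any(matches(p, portrait) for p in policies) else 0


policies = [[("complaint_rate", ">", 0.05), ("refund_rate", ">", 0.2)]]
print(identify_risk_tag({"complaint_rate": 0.08, "refund_rate": 0.3}, policies))  # 1
print(identify_risk_tag({"complaint_rate": 0.08, "refund_rate": 0.1}, policies))  # 0
```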
In one embodiment, after the store risk-control system has matched an online store's store portrait against the risk policies and identified its risk tag, the store portrait and risk tag can be constructed into a store sample of that online store and stored in the sample data set, expanding the sample data set. The expanded sample data set is then used to iteratively retrain the decision tree model, which generates new target rule sets to update the risk policies of the store risk-control system. This gives the system a self-learning mechanism, so that it keeps iterating and updating and its ability to identify risk stores continually improves.
The above embodiments offer a number of technical advantages, including but not limited to the following:
First, a decision tree model is trained on store samples, each consisting of the store portrait of an online store on an e-commerce platform and that store's risk tag. The model classifies each store sample, which determines the rule set and sample subset for each classification path. Target rule sets are then selected according to the statistical indexes of the sample subsets and used to configure the risk policies of the store risk-control system. As a result, the system can adapt to changes in online stores and, according to its risk policies, accurately identify whether each online store on the e-commerce platform is a risk store.
Second, the key input for deciding whether an online store is a risk store is its store portrait. The portrait is usually derived from the store's transaction, risk, behavior, and similar data, all of which are the platform's own first-party data. The scheme therefore gives the platform's store risk-control system an endogenous mechanism for improving its risk-control capability, and gives the platform a technically sound means of keeping its own operation healthy.
In addition, a risk classification model is built on the decision tree model, and the risk policies for identifying risk stores are mined from store portraits. This keeps the policies accurate and makes them more interpretable when combined with feature contribution analysis. It also lets the policies go online faster, which suits the rapidly changing business risks of e-commerce.
On the basis of any embodiment of the present application, and referring to Fig. 7, the step of configuring the risk policy of the store risk-control system with the target rule set, so that the system identifies the risk tag corresponding to the store portrait of any online store according to the risk policy, comprises:
Step S3310: determining a policy feature table according to the data features in the target rule set;
Suppose that screening the rule sets generated by the decision tree model leaves N target rule sets that meet the standard for going online. For the features used in these N rule sets, offline or real-time features may be further developed to enrich the original target rule sets. A policy feature table is generated accordingly for online policy configuration, and the corresponding risk policies are created from it in the store risk-control system.
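The following sketch shows one way a policy feature table might be derived. For every feature referenced by the N target rule sets, the function records which rule sets use the feature and with what condition; those are the features that would need offline or real-time implementations. The tuple format of a rule set and all names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]  # (feature, operator, threshold), assumed format


def build_policy_feature_table(
        target_rule_sets: Dict[str, List[Condition]]
) -> Dict[str, List[Tuple[str, str, float]]]:
    """Map each feature to the (rule_set_id, operator, threshold) entries that use it."""
    table = defaultdict(list)
    for rule_set_id, conditions in target_rule_sets.items():
        for feature, op, threshold in conditions:
            table[feature].append((rule_set_id, op, threshold))
    # Sort features and entries so the table is stable across runs.
    return {feature: sorted(uses) for feature, uses in sorted(table.items())}


# Two hypothetical target rule sets.
rule_sets = {
    "R1": [("refund_rate_30d", ">", 0.3), ("complaints_30d", ">", 5)],
    "R2": [("refund_rate_30d", ">", 0.5), ("store_age_days", "<=", 14)],
}
for feature, uses in build_policy_feature_table(rule_sets).items():
    print(feature, uses)
```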
Step S3320: configuring the policy feature table as a risk policy in the store risk-control system;
Specifically, the policy feature table developed in the previous step is connected to the risk policy library of the store risk-control system. Nodes are then split according to the features of the target rule set, and the risk policy is configured, so that whenever an online store hits the risk policy, an alert record is submitted to the risk-control center.
Step S3330: controlling the store risk-control system to enable the risk policies, scanning all online stores on the e-commerce platform, and identifying the risk tag corresponding to the store portrait of each online store according to the risk policies.
Once the risk policies are configured, the store risk-control system can be instructed to enable the new policies. It then evaluates the business credit of any online store against each risk policy in the freshly updated risk-control policy library. From the store portrait, it identifies whether the online store is a risk store and determines the store's risk tag.
In one embodiment, because the risk policies of the store risk-control system have been updated, the system can be instructed to re-identify, one by one, the risk tags of the store portraits of all online stores on the e-commerce platform. This keeps the identified business credit of each online store up to date.
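A full rescan after a policy update could look like the following sketch. The `policy_hit` callable is a hypothetical stand-in for matching against the updated risk policies. The function reports only the stores whose risk tag changed, which is the information needed to refresh each store's business-credit result.

```python
from typing import Callable, Dict, Optional, Tuple


def rescan_all_stores(
        portraits: Dict[str, dict],
        old_tags: Dict[str, str],
        policy_hit: Callable[[dict], bool],
) -> Dict[str, Tuple[Optional[str], str]]:
    """Re-identify every online store under the updated policies.

    Returns {store_id: (old_tag, new_tag)} for stores whose tag changed;
    old_tag is None for stores that had no tag before.
    """
    changed = {}
    for store_id, portrait in portraits.items():
        new_tag = "high" if policy_hit(portrait) else "low"
        old_tag = old_tags.get(store_id)
        if old_tag != new_tag:
            changed[store_id] = (old_tag, new_tag)
    return changed


# Hypothetical data: the updated policy flags refund rates above 0.3.
portraits = {"s1": {"refund_rate_30d": 0.4},
             "s2": {"refund_rate_30d": 0.1},
             "s3": {"refund_rate_30d": 0.5}}
old_tags = {"s1": "low", "s2": "low"}
print(rescan_all_stores(portraits, old_tags,
                        lambda p: p.get("refund_rate_30d", 0.0) > 0.3))
```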
In this embodiment, when the risk policies of the store risk-control system are configured from the target rule sets, the features can be enriched further on the basis of those rule sets. The risk tags of all online stores on the e-commerce platform are then promptly re-identified under the updated risk policies. This enables platform-wide screening for risk stores, gives the formulated policies stronger identification capability, and improves the system's overall ability to identify risk stores, which helps maintain the operating order of the e-commerce platform.
On the basis of any embodiment of the present application, and referring to Fig. 8, after the risk policy of the store risk-control system has been configured with the target rule set and the system has identified, according to the risk policy, the risk tag corresponding to the store portrait of any online store, the method comprises:
Step S4100: constructing the store portrait of an online store identified by the store risk-control system, together with its risk tag, into a new store sample, and adding the new store sample to the sample data set;
The underlying data of online stores change dynamically, so their store portraits change as well. The store risk-control system therefore runs in real time. It continuously acquires the store portraits of all online stores on the e-commerce platform, dynamically identifies their risk tags, and promptly discovers risk stores whose business credit is abnormal.
When the store risk-control system identifies the risk tag corresponding to the store portrait of an online store, the store portrait and its risk tag can be constructed into a new store sample and added to the sample data set.
In one embodiment, new store samples are appended to the sample data set. Expanding the data set in this way generalizes its sample characteristics, makes the decision tree model easier to converge when it is later retrained, and improves the model's feature generalization capability.
In another embodiment, a new store sample replaces the sample previously determined for the same online store, so that each online store's sample stays current.
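The two ways of adding a new store sample, appending it or replacing the store's previous sample, can be sketched as follows. The dictionary layout of a sample and the `mode` parameter are assumptions made for illustration.

```python
from typing import Dict, List


def add_store_sample(dataset: List[Dict], store_id: str, portrait: Dict,
                     risk_tag: str, mode: str = "append") -> None:
    """Add a newly identified store sample to the sample data set in place."""
    sample = {"store_id": store_id, "portrait": dict(portrait), "tag": risk_tag}
    if mode == "append":
        # Keep every historical sample of the store (expands the data set).
        dataset.append(sample)
    elif mode == "replace":
        # Drop earlier samples of the same store, then keep only the newest one.
        dataset[:] = [s for s in dataset if s["store_id"] != store_id]
        dataset.append(sample)
    else:
        raise ValueError(f"unknown mode: {mode!r}")


appended, replaced = [], []
for tag in ("high", "low"):
    add_store_sample(appended, "s1", {"complaints_30d": 9}, tag, mode="append")
    add_store_sample(replaced, "s1", {"complaints_30d": 9}, tag, mode="replace")
print(len(appended), len(replaced), replaced[0]["tag"])  # 2 1 low
```

Appending keeps a store's history, which favors generalization. Replacing keeps exactly one up-to-date sample per store. Which mode suits a given platform is a design choice.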
In one embodiment, after the store risk-control system identifies the risk tag of an online store, a back-office user may confirm the tag before the sample is added to the sample data set. This manual review mechanism matters for two reasons. First, once a policy has been online for a while, new risk patterns may appear that the original policy set cannot cover. Second, if manual review confirms that a flagged store is actually normal, the sample is a policy mis-hit, and the risk policy needs to be corrected for it. In both cases, the self-learning mechanism reconstructs the reviewed data into store samples and adds them to the sample data set. Policies can then be mined and updated in time, which improves the accuracy and effectiveness of the risk policies.
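Below is a minimal sketch of folding manual review results back into the sample data set. It assumes that each review record names the policy that fired and states whether the reviewer confirmed the risk. Confirmed hits become high-risk samples, and rejected hits (policy mis-hits) become low-risk samples. A per-policy mis-hit rate is also computed to flag policies that may need correction. The record schema is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def feedback_from_reviews(reviews: List[Dict]) -> Tuple[List[Dict], Dict[str, float]]:
    """Turn reviewed policy hits into store samples and per-policy mis-hit rates."""
    samples = []
    hits, mis_hits = Counter(), Counter()
    for r in reviews:
        hits[r["policy"]] += 1
        if not r["confirmed"]:
            mis_hits[r["policy"]] += 1  # reviewer judged the store normal
        samples.append({
            "store_id": r["store_id"],
            "portrait": dict(r["portrait"]),
            "tag": "high" if r["confirmed"] else "low",
        })
    rates = {policy: mis_hits[policy] / n for policy, n in hits.items()}
    return samples, rates


reviews = [
    {"store_id": "s1", "portrait": {"complaints_30d": 9}, "policy": "P1", "confirmed": True},
    {"store_id": "s2", "portrait": {"complaints_30d": 6}, "policy": "P1", "confirmed": False},
    {"store_id": "s3", "portrait": {"orders_7d": 900}, "policy": "P2", "confirmed": True},
]
samples, rates = feedback_from_reviews(reviews)
print([s["tag"] for s in samples], rates)  # ['high', 'low', 'high'] {'P1': 0.5, 'P2': 0.0}
```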
Through this step, the store samples in the sample data set are continually expanded and/or updated.
Step S4200: in response to a timed arrival event triggered by a timed task, retraining the decision tree model to a convergent state with the sample data set;
Because the sample data set is upgraded continually, a timed task can control when the self-learning mechanism of the store risk-control system performs learning. The trigger period of the timed arrival event can be set as needed, for example daily, weekly, or monthly. Each time a period expires, the corresponding timed arrival event is triggered.
In response to the timed arrival event, training of the decision tree model is restarted on the upgraded sample data set, and the model is retrained to a convergent state using each store sample and its risk tag.
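The timed trigger can be sketched as a small task object. It is polled with the current time and fires a retraining callback whenever the configured period has elapsed. The class, the polling style, and the callback are all assumptions; a production system would more likely rely on an existing scheduler.

```python
from datetime import datetime, timedelta
from typing import Callable


class TimedRetrainTask:
    """Fires a retraining callback whenever the configured period has elapsed."""

    def __init__(self, period: timedelta, retrain: Callable[[], None],
                 start: datetime) -> None:
        self.period = period
        self.retrain = retrain
        self.next_due = start + period

    def tick(self, now: datetime) -> bool:
        """Poll with the current time; return True if a timed arrival event fired."""
        if now < self.next_due:
            return False
        self.retrain()  # e.g. retrain the decision tree on the current sample data set
        # Skip over any missed periods so a long pause triggers only one retrain.
        while self.next_due <= now:
            self.next_due += self.period
        return True


calls = []
task = TimedRetrainTask(timedelta(days=1), lambda: calls.append("retrain"),
                        start=datetime(2023, 1, 1))
print(task.tick(datetime(2023, 1, 1, 12)))  # False: period not yet elapsed
print(task.tick(datetime(2023, 1, 2)))      # True: timed arrival event
print(task.tick(datetime(2023, 1, 5)))      # True: one retrain despite missed days
print(len(calls), task.next_due)            # 2 2023-01-06 00:00:00
```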
Step S4300: re-determining the target rule sets corresponding to the sample data set with the decision tree model retrained to a convergent state, and updating the risk policies of the store risk-control system.
Retraining the decision tree model to convergence improves its classification capability, so rule sets can be formulated more accurately. The updated model re-predicts and reclassifies the sample data set into several sample subsets, and the rule set corresponding to each sample subset is determined from them.
From the rule sets re-determined by the updated model and their sample subsets, the target rule sets with better statistical indexes can be selected using the statistical indexes of each sample subset. These target rule sets then update the risk policies in the store risk-control system. Once the risk policies are upgraded, the system identifies risk stores among online stores more accurately. Repeating this cycle realizes a self-learning mechanism that continuously upgrades the store risk-control system.
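One possible screening rule is sketched below. It assumes that the statistical indexes are the size of each rule set's sample subset (its support) and the share of high-risk samples in that subset (its precision). The source does not fix which indexes or thresholds are used, so `min_precision` and `min_support` are hypothetical parameters.

```python
from typing import Dict, List, Optional, Tuple


def screen_target_rule_sets(
        subset_stats: Dict[str, Tuple[int, int]],
        min_precision: float = 0.8,
        min_support: int = 30,
        top_k: Optional[int] = None,
) -> List[Tuple[str, float, int]]:
    """Select target rule sets from per-rule-set sample-subset statistics.

    subset_stats maps rule_set_id -> (number of samples, number of high-risk samples).
    Returns (rule_set_id, precision, support) tuples, best first.
    """
    kept = []
    for rule_set_id, (n, n_high) in subset_stats.items():
        if n < min_support:
            continue  # too few samples for a reliable statistic
        precision = n_high / n
        if precision >= min_precision:
            kept.append((rule_set_id, precision, n))
    # Rank by precision, then by support, then by id for determinism.
    kept.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return kept if top_k is None else kept[:top_k]


stats = {"R1": (100, 92), "R2": (20, 20), "R3": (200, 120), "R4": (50, 45)}
print([rule_set_id for rule_set_id, _, _ in screen_target_rule_sets(stats)])  # ['R1', 'R4']
```

In this example, R2 is rejected for having too few samples even though all of them are high risk, and R3 is rejected for low precision.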
In this embodiment, a self-learning mechanism is introduced into the risk-policy mining flow of the store risk-control system. For store samples that a policy mis-hit, sample labels can be updated periodically, the decision tree model retrained, and the policy rules corrected. Meanwhile, new risk patterns that the original risk policies cannot cover emerge over time. The self-learning mechanism periodically learns the new distribution of store risk data and mines new rule sets, which keeps the online rule sets accurate and improves the generalization of the store risk policies. The risk policies in the store risk-control system can thus be maintained so that they cover sudden risks as well as recurring ones.
Referring to Fig. 9, a store risk attribution apparatus according to one aspect of the present application includes an image acquisition module 1100, a public praise prediction module 1200, a contribution analysis module 1300, and an attribution analysis module 1400. The image acquisition module 1100 is configured to acquire the store portrait of a target store, the store portrait containing multiple types of feature data. The public praise prediction module 1200 is configured to invoke a risk classification model to predict the risk tag corresponding to the store portrait of the target store, and to determine the public praise (reputation) grade of the public praise grading value interval into which the classification probability of the high-risk tag falls. The contribution analysis module 1300 is configured to calculate the feature contribution degree of each type of feature data in the store portrait of the target store to that classification probability, and to determine the contribution level of the contribution grading value interval into which each feature contribution degree falls. The attribution analysis module 1400 is configured to screen, from the feature data of each type and according to the public praise grade of the target store and the contribution level of each type of feature data, the root-cause feature data that causes the target store to be given a high-risk tag.
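The following sketch illustrates the computations behind modules 1200 and 1300 under stated assumptions. The source does not specify how feature contribution degrees are computed, so a decision-tree path decomposition (the "Saabas" method) is used here as a stand-in. In that method, the leaf's high-risk probability equals the root's base rate plus the change in node probability at each split along the path. Each change is credited to the split feature, and the credits are summed per feature type. Grades are then looked up from ascending cut points with `bisect`. The tree, the feature-to-type mapping, the cut points, and the grade names are all hypothetical.

```python
import bisect
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def path_contributions(tree: Dict, portrait: Dict[str, float],
                       feature_type: Dict[str, str]) -> Tuple[float, float, Dict[str, float]]:
    """Decompose the leaf probability into base rate + per-type contributions.

    Each node stores "p_high", the share of high-risk samples that reach it;
    internal nodes also have "feature", "threshold", "left", "right".
    """
    contrib = defaultdict(float)
    node = tree
    while "feature" in node:
        f = node["feature"]
        child = node["left"] if portrait[f] <= node["threshold"] else node["right"]
        # Credit the probability change at this split to the feature's type.
        contrib[feature_type[f]] += child["p_high"] - node["p_high"]
        node = child
    return node["p_high"], tree["p_high"], dict(contrib)


def grade_of(value: float, cut_points: List[float], grades: List[str]) -> str:
    """Return the grade of the interval containing value (len(grades) == len(cuts) + 1)."""
    return grades[bisect.bisect_right(cut_points, value)]


# Hypothetical two-level tree and store portrait.
tree = {"p_high": 0.2, "feature": "complaints_30d", "threshold": 5,
        "left": {"p_high": 0.05},
        "right": {"p_high": 0.6, "feature": "refund_rate_30d", "threshold": 0.3,
                  "left": {"p_high": 0.35}, "right": {"p_high": 0.9}}}
feature_type = {"complaints_30d": "complaint", "refund_rate_30d": "transaction"}
portrait = {"complaints_30d": 9, "refund_rate_30d": 0.42}

prob, base, contrib = path_contributions(tree, portrait, feature_type)
assert math.isclose(base + sum(contrib.values()), prob)  # decomposition is exact

praise = grade_of(prob, [0.3, 0.6, 0.85], ["excellent", "good", "fair", "poor"])
levels = {t: grade_of(c, [0.1, 0.35], ["low", "medium", "high"])
          for t, c in contrib.items()}
print(prob, praise, levels)  # 0.9 poor {'complaint': 'high', 'transaction': 'medium'}
```

With this decomposition, the contributions of all types add up exactly to the gap between the base rate and the classification probability. Each type's contribution level can therefore be read as that type's share in moving the store to its predicted risk.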
On the basis of any embodiment of the present application, the store portrait includes any plurality of the following: basic feature data, transaction feature data, complaint feature data, behavior feature data, and risk commodity feature data of the target store.
On the basis of any embodiment of the present application, the attribution analysis module 1400 includes: a screening processing unit configured to determine whether the public praise grade of the target store is a target grade and, if it is, to select from the various types of feature data of the target store the feature data of the type at the highest contribution level as the root-cause feature data; and an alarm processing unit configured to store the target store and its root-cause feature data as mapping relationship data and to send corresponding alarm information to a preset communication interface.
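The screening and alarm units can be sketched as below. The sketch assumes that "highest contribution level" means the highest level actually reached by any feature type of the store; in a tie, all such types are kept. Because the preset communication interface is not specified, the sketch only builds the alarm message as JSON and does not send it. Grade and level names are hypothetical.

```python
import json
from typing import Dict, List, Optional, Set, Tuple


def screen_root_causes(store_id: str, praise_grade: str,
                       type_levels: Dict[str, str], target_grades: Set[str],
                       level_order: List[str]) -> Tuple[List[str], Optional[str]]:
    """Return (root-cause feature types, alarm message) for one target store.

    level_order lists contribution levels from lowest to highest.
    """
    if praise_grade not in target_grades or not type_levels:
        return [], None
    top = max(type_levels.values(), key=level_order.index)
    roots = sorted(t for t, level in type_levels.items() if level == top)
    # Mapping relationship data: target store -> its root-cause feature types.
    alarm = json.dumps({"store_id": store_id,
                        "public_praise_grade": praise_grade,
                        "root_cause_types": roots})
    return roots, alarm


levels = {"complaint": "high", "transaction": "medium", "behavior": "low"}
roots, alarm = screen_root_causes("s1", "poor", levels, {"poor"},
                                  ["low", "medium", "high"])
print(roots, alarm)
print(screen_root_causes("s2", "good", levels, {"poor"}, ["low", "medium", "high"]))
```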
On the basis of any embodiment of the present application, the store risk attribution apparatus includes: a sample acquisition module configured to acquire a sample data set containing multiple store samples, each comprising the store portrait of a single online store and its risk tag, where the risk tag is labeled as high-risk or low-risk type; a sample inference module configured to use the risk classification model trained on the sample data set to predict, for each store portrait in the sample data set, the classification probability of its high-risk tag; a public praise segmentation module configured to divide and set the public praise grading value intervals corresponding to multiple public praise grades according to the distribution of the classification probabilities of the store portraits in the sample data set; and a contribution segmentation module configured to divide and set the contribution grading value intervals corresponding to multiple contribution levels according to the distribution of the feature contribution degrees of each type of feature data of the store portraits in the sample data set.
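Value intervals could be divided "according to the data distribution" using empirical quantiles, as in the sketch below: cut points are chosen so that each interval holds roughly the same number of samples. Quantile binning is an assumption here and not necessarily the source's method. The same function could be applied to the classification probabilities (for public praise grades) and to the pooled feature contribution degrees (for contribution levels).

```python
import bisect
from collections import Counter
from typing import List, Sequence


def quantile_cut_points(values: Sequence[float], n_bands: int) -> List[float]:
    """Ascending cut points splitting values into n_bands roughly equal groups.

    A value equal to a cut point belongs to the upper interval (bisect_right).
    Duplicate cut points caused by tied values are merged, so fewer bands may result.
    """
    ordered = sorted(values)
    if n_bands < 2 or not ordered:
        return []
    cuts = [ordered[(k * len(ordered)) // n_bands] for k in range(1, n_bands)]
    return sorted(set(cuts))


def band_index(value: float, cuts: List[float]) -> int:
    """0-based index of the interval that value falls into."""
    return bisect.bisect_right(cuts, value)


scores = list(range(10))  # stand-in for classification probabilities
cuts = quantile_cut_points(scores, 4)
print(cuts, Counter(band_index(v, cuts) for v in scores))
```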
On the basis of any embodiment of the present application, the sample acquisition module includes: a store determination unit configured to determine the target online stores on the e-commerce platform from which store samples are extracted; a labeling processing unit configured to set the risk tag of each target online store to high-risk or low-risk type according to whether its evaluation index is higher than a preset threshold; an image construction unit configured to acquire multiple types of feature data of each target online store as that store's portrait; and a sample construction unit configured to construct the store portrait of each target online store and its risk tag into a store sample stored in the sample data set.
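The labeling and sample-construction units might look like the sketch below. The source only says that the tag depends on whether the evaluation index exceeds a preset threshold, so the sketch assumes that a higher index means a riskier store. The index values and feature names are hypothetical.

```python
from typing import Dict, List


def build_store_samples(portraits: Dict[str, Dict[str, float]],
                        eval_index: Dict[str, float],
                        threshold: float) -> List[Dict]:
    """Label each target online store and pair its store portrait with the tag."""
    samples = []
    for store_id, portrait in portraits.items():
        # Assumption: an index above the threshold marks the store as high risk.
        tag = "high" if eval_index[store_id] > threshold else "low"
        samples.append({"store_id": store_id, "portrait": dict(portrait), "tag": tag})
    return samples


portraits = {
    "s1": {"complaints_30d": 9, "refund_rate_30d": 0.42},
    "s2": {"complaints_30d": 0, "refund_rate_30d": 0.03},
}
eval_index = {"s1": 0.40, "s2": 0.05}  # hypothetical evaluation index values
samples = build_store_samples(portraits, eval_index, threshold=0.2)
print([(s["store_id"], s["tag"]) for s in samples])  # [('s1', 'high'), ('s2', 'low')]
```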
On the basis of any embodiment of the present application, the store risk attribution apparatus includes: a rule set acquisition module configured to use the risk classification model trained on the sample data set, which is a decision tree model, to classify all store samples in the sample data set into multiple sample subsets and obtain the rule set corresponding to each sample subset; a rule set screening module configured to select several rule sets as target rule sets according to the statistical indexes of their corresponding sample subsets; and a rule set configuration module configured to configure the risk policy of the store risk-control system with the target rule sets, so that the system identifies, according to the risk policy, the risk tag corresponding to the store portrait of any online store.
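The rule set acquisition module can be made concrete with the sketch below. It enumerates the root-to-leaf paths of a fitted decision tree as rule sets, then routes each store sample to its leaf, which gives the sample subset of every rule set. The nested-dictionary tree format and the sample layout are assumptions; a real tree library exposes its own structure.

```python
from typing import Dict, Iterator, List, Tuple

Rule = Tuple[Tuple[str, str, float], ...]  # conjunction of (feature, op, threshold)


def extract_rule_sets(node: Dict, path: Rule = ()) -> Iterator[Rule]:
    """Yield the rule set (root-to-leaf conditions) for every leaf."""
    if "feature" not in node:
        yield path
        return
    f, t = node["feature"], node["threshold"]
    yield from extract_rule_sets(node["left"], path + ((f, "<=", t),))
    yield from extract_rule_sets(node["right"], path + ((f, ">", t),))


def route(node: Dict, portrait: Dict[str, float]) -> Rule:
    """Follow the tree for one store portrait and return the rule set it satisfies."""
    path: Rule = ()
    while "feature" in node:
        f, t = node["feature"], node["threshold"]
        if portrait[f] <= t:
            path, node = path + ((f, "<=", t),), node["left"]
        else:
            path, node = path + ((f, ">", t),), node["right"]
    return path


def rule_set_subsets(tree: Dict, samples: List[Dict]) -> Dict[Rule, List[Dict]]:
    """Group store samples into the sample subset of each rule set."""
    subsets = {rule: [] for rule in extract_rule_sets(tree)}
    for s in samples:
        subsets[route(tree, s["portrait"])].append(s)
    return subsets


# Hypothetical fitted tree (leaves are empty dicts) and store samples.
tree = {"feature": "complaints_30d", "threshold": 5,
        "left": {},
        "right": {"feature": "refund_rate_30d", "threshold": 0.3,
                  "left": {}, "right": {}}}
samples = [
    {"store_id": "s1", "portrait": {"complaints_30d": 9, "refund_rate_30d": 0.42}, "tag": "high"},
    {"store_id": "s2", "portrait": {"complaints_30d": 1, "refund_rate_30d": 0.50}, "tag": "low"},
    {"store_id": "s3", "portrait": {"complaints_30d": 7, "refund_rate_30d": 0.10}, "tag": "low"},
]
for rule, subset in rule_set_subsets(tree, samples).items():
    print(rule, [s["store_id"] for s in subset])
```

The per-subset counts of samples and high-risk samples are the kind of statistics a rule set screening module would consume.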
Another embodiment of the present application further provides a store risk attribution device, whose internal structure is shown schematically in Fig. 10. The device includes a processor, a computer-readable storage medium, a memory, and a network interface, all connected by a system bus. The computer-readable non-volatile storage medium of the device stores an operating system, a database, and computer-readable instructions. The database may store sequences of information, and the computer-readable instructions, when executed by the processor, cause the processor to implement a store risk attribution method.
The processor of the store risk attribution device provides the computing and control capabilities that support the operation of the entire device. The memory of the device may store computer-readable instructions that, when executed by the processor, cause the processor to perform the store risk attribution method of the present application. The network interface of the device is used to connect to and communicate with terminals.
Those skilled in the art will appreciate that the structure shown in Fig. 10 is merely a block diagram of the part of the structure relevant to the present solution. It does not limit the store risk attribution device to which the solution is applied: a particular device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In this embodiment, the processor performs the specific functions of each module in Fig. 9, and the memory stores the program code and the data required to execute those modules or sub-modules. The network interface transmits data to and from user terminals or servers. The non-volatile readable storage medium in this embodiment stores the program code and data needed to execute all modules of the store risk attribution apparatus of the present application, and the server can invoke this program code and data to perform the functions of all modules.
The present application also provides a non-transitory readable storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the store risk attribution method of any embodiment of the present application.
The present application also provides a computer program product comprising computer programs/instructions which when executed by one or more processors implement the steps of the method described in any of the embodiments of the present application.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored in a non-transitory readable storage medium, and that the program, when executed, may perform the steps of the above method embodiments. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
In summary, the present application takes the store portrait of a target store that the risk classification model predicts to be of the high-risk type and quantifies the target store's public praise grade under that high-risk prediction. It then calculates the feature contribution degree of each type of feature data in the store portrait to determine the contribution level of each type, and combines the public praise grade with the contribution levels to determine the root-cause feature data from the store portrait. This makes risk attribution more accurate and efficient and helps maintain the operating order of online stores on the e-commerce platform.

Claims (10)

1. A store risk attribution method, comprising:
acquiring a store portrait of a target store, wherein the store portrait comprises multiple types of feature data;
invoking a risk classification model to predict a risk tag corresponding to the store portrait of the target store, and determining the public praise grade of the public praise grading value interval into which the classification probability of the risk tag being of the high-risk type falls;
calculating the feature contribution degree of each type of feature data in the store portrait of the target store to the classification probability, and determining the contribution level of the contribution grading value interval into which the feature contribution degree of each type of feature data falls;
and screening, from the feature data of each type and according to the public praise grade of the target store and the contribution level of each type of feature data, the root-cause feature data that causes the target store to be given a high-risk tag.
2. The store risk attribution method according to claim 1, wherein in the step of acquiring a store portrait of a target store, the store portrait includes any plurality of the following: basic feature data, transaction feature data, complaint feature data, behavior feature data, and risk commodity feature data of the target store.
3. The store risk attribution method according to claim 1, wherein screening, from the feature data of each type and according to the public praise grade of the target store and the contribution level of each type of feature data, the root-cause feature data that causes the target store to be given a high-risk tag comprises:
determining whether the public praise grade of the target store is a target grade, and, when it is, selecting from the various types of feature data of the target store the feature data of the type at the highest contribution level as the root-cause feature data;
and storing the target store and its root-cause feature data as mapping relationship data, and sending corresponding alarm information to a preset communication interface.
4. The store risk attribution method according to claim 1, comprising, before acquiring the store portrait of an online store:
acquiring a sample data set, wherein the sample data set comprises multiple store samples, each store sample comprises the store portrait of a single online store and its risk tag, and the risk tag is labeled as a high-risk type or a low-risk type;
predicting, with a risk classification model trained on the sample data set, the classification probability of the high-risk tag for each store portrait in the sample data set;
dividing and setting the public praise grading value intervals corresponding to multiple public praise grades according to the distribution of the classification probabilities of the store portraits in the sample data set;
and dividing and setting the contribution grading value intervals corresponding to multiple contribution levels according to the distribution of the feature contribution degrees of each type of feature data of the store portraits in the sample data set.
5. The store risk attribution method according to claim 4, wherein acquiring the sample data set comprises:
determining target online stores on an e-commerce platform from which store samples are extracted;
setting the risk tag of each target online store to a high-risk type or a low-risk type according to whether its evaluation index is higher than a preset threshold;
acquiring multiple types of feature data of each target online store as the store portrait of that target online store;
and constructing the store portrait of each target online store and its risk tag into a corresponding store sample stored in the sample data set.
6. The store risk attribution method according to claim 4, comprising, after predicting, with the risk classification model trained on the sample data set, the classification probability of the high-risk tag for each store portrait in the sample data set:
classifying all store samples in the sample data set into multiple sample subsets with the risk classification model trained on the sample data set, to obtain the rule set corresponding to each sample subset, wherein the risk classification model is a decision tree model;
selecting several rule sets as target rule sets according to the statistical indexes of the sample subsets corresponding to the rule sets;
and configuring a risk policy of a store risk-control system with the target rule sets, so that the store risk-control system identifies, according to the risk policy, the risk tag corresponding to the store portrait of any online store.
7. The store risk attribution method according to claim 6, wherein configuring the risk policy of the store risk-control system with the target rule sets, so that the store risk-control system identifies, according to the risk policy, the risk tag corresponding to the store portrait of any online store, comprises:
determining a policy feature table according to the data features in the target rule sets;
configuring the policy feature table as a risk policy in the store risk-control system;
and controlling the store risk-control system to enable the risk policy, scanning each online store on the e-commerce platform, and identifying, according to the risk policy, the risk tag corresponding to the store portrait of each online store.
8. A store risk attribution apparatus, comprising:
an image acquisition module configured to acquire a store portrait of a target store, the store portrait including multiple types of feature data;
a public praise prediction module configured to invoke a risk classification model to predict a risk tag corresponding to the store portrait of the target store, and to determine the public praise grade of the public praise grading value interval into which the classification probability of the risk tag being of the high-risk type falls;
a contribution analysis module configured to calculate the feature contribution degree of each type of feature data in the store portrait of the target store to the classification probability, and to determine the contribution level of the contribution grading value interval into which the feature contribution degree of each type of feature data falls;
and an attribution analysis module configured to screen, from the feature data of each type and according to the public praise grade of the target store and the contribution level of each type of feature data, the root-cause feature data that causes the target store to be given a high-risk tag.
9. A store risk attribution device, comprising a central processing unit and a memory, wherein the central processing unit is configured to invoke and execute a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 7.
10. A non-transitory readable storage medium, storing, in the form of computer-readable instructions, a computer program implemented according to the method of any one of claims 1 to 7, wherein the computer program, when invoked by a computer, performs the steps of the corresponding method.
CN202310332189.3A 2023-01-12 2023-03-29 Store risk attribution method and device, equipment, medium and product thereof Pending CN116402546A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310064032 2023-01-12
CN2023100640327 2023-01-12

Publications (1)

Publication Number Publication Date
CN116402546A true CN116402546A (en) 2023-07-07

Family

ID=87013675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310332189.3A Pending CN116402546A (en) 2023-01-12 2023-03-29 Store risk attribution method and device, equipment, medium and product thereof

Country Status (1)

Country Link
CN (1) CN116402546A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664016A (en) * 2023-07-27 2023-08-29 北京中关村科金技术有限公司 Screening method and device of ESG (electronic service guide) sub-topics, electronic equipment and readable storage medium
CN116664016B (en) * 2023-07-27 2023-09-26 北京中关村科金技术有限公司 Screening method and device of ESG (electronic service guide) sub-topics, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN110837931B (en) Customer churn prediction method, device and storage medium
CN110704572B (en) Suspected illegal fundraising risk early warning method, device, equipment and storage medium
CN107908606A (en) Method and system based on different aforementioned sources automatic report generation
CN106649890A (en) Data storage method and device
CN103154991A (en) Credit risk mining
CN111402061A (en) Asset management method and system
CN109859052A (en) A kind of intelligent recommendation method, apparatus, storage medium and the server of investment tactics
CN110147389B (en) Account processing method and device, storage medium and electronic device
JP7017149B2 (en) Information processing equipment, information processing method and information processing program using deep learning
CN110310114A (en) Object classification method, device, server and storage medium
CN111222994A (en) Client risk assessment method, device, medium and electronic equipment
CN114186626A (en) Abnormity detection method and device, electronic equipment and computer readable medium
CN111369344A (en) Method and device for dynamically generating early warning rule
CN116402546A (en) Store risk attribution method and device, equipment, medium and product thereof
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN115630221A (en) Terminal application interface display data processing method and device and computer equipment
CN114693409A (en) Product matching method, device, computer equipment, storage medium and program product
CN102496126A (en) Custody asset transaction data monitoring equipment
CN116911994B (en) External trade risk early warning system
CN112231299B (en) Method and device for dynamically adjusting feature library
CN111445139A (en) Business process simulation method and device, storage medium and electronic equipment
CN115965464A (en) Empty shell enterprise identification method and device, storage medium and electronic device
CN115600818A (en) Multi-dimensional scoring method and device, electronic equipment and storage medium
Mohamed et al. A review of machine learning methods for predicting churn in the telecom sector
CN113821542B (en) Automatic significant feature recommendation system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination