CN116308444A - Data processing method and device, electronic equipment, storage medium and program product - Google Patents

Info

Publication number
CN116308444A
Authority
CN
China
Prior art keywords
aggregation
characteristic information
abnormal
information
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111472689.4A
Other languages
Chinese (zh)
Inventor
张宇康
张洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111472689.4A
Publication of CN116308444A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Abstract

The application discloses a data processing method and device, electronic equipment, storage medium and program product. The method comprises the following steps: performing content analysis on the characteristic information of the target object according to a preset content abnormality judgment rule to obtain a content analysis result corresponding to the characteristic information; performing association analysis on characteristic information of which the content analysis result indicates that the content is normal according to a preset abnormal information set to obtain an association analysis result corresponding to the characteristic information; performing aggregation anomaly analysis on the characteristic information of which the association analysis result indicates to be non-intra-set information, and obtaining an aggregation analysis result corresponding to the characteristic information; and inputting the characteristic information of the target object and the analysis parameters corresponding to the characteristic information into an abnormal scoring model to obtain an abnormal evaluation value output by the abnormal scoring model, wherein the analysis parameters corresponding to the characteristic information comprise at least one of content analysis results, association analysis results and aggregation analysis results. The method and the device can obtain accurate abnormal evaluation values.

Description

Data processing method and device, electronic equipment, storage medium and program product
Technical Field
The present application relates to the field of big data, and in particular, to a data processing method and apparatus, an electronic device, a computer readable storage medium, and a computer program product.
Background
With the rapid development of the internet and communication technology, electronic commerce has gradually become an industry closely tied to people's daily lives. A merchant usually needs to register a merchant account on the internet and submit identification materials when registering. After the identification materials are reviewed by the relevant personnel, the merchant is allowed to operate on the internet through the merchant account, and the risk of the merchant account is then identified by relying on transaction data generated during operation. As a result, the risk cannot be discovered in advance, and potential losses are difficult to avoid.
Disclosure of Invention
To solve the above technical problems, embodiments of the present application provide a data processing method and apparatus, an electronic device, a computer readable storage medium, and a computer program product.
According to an aspect of the embodiments of the present application, there is provided a data processing method, including: performing content analysis on the characteristic information of the target object according to a preset content abnormality judgment rule to obtain a content analysis result corresponding to the characteristic information; performing association analysis on characteristic information of which the content analysis result indicates that the content is normal according to a preset abnormal information set to obtain an association analysis result corresponding to the characteristic information; performing aggregation anomaly analysis on the characteristic information of which the association analysis result indicates to be non-intra-set information, and obtaining an aggregation analysis result corresponding to the characteristic information; and inputting the characteristic information of the target object and analysis parameters corresponding to the characteristic information into an abnormal evaluation model to obtain an abnormal evaluation value output by the abnormal evaluation model, wherein the analysis parameters corresponding to the characteristic information comprise at least one of the content analysis result, the association analysis result and the aggregation analysis result.
According to an aspect of an embodiment of the present application, there is provided a data processing apparatus including: the content analysis module is configured to perform content analysis on the characteristic information of the target object according to a preset content abnormality judgment rule to obtain a content analysis result corresponding to the characteristic information; the association analysis module is configured to perform association analysis on the characteristic information of which the content analysis result indicates that the content is normal according to a preset abnormal information set, so as to obtain an association analysis result corresponding to the characteristic information; the aggregation analysis module is configured to perform aggregation anomaly analysis on the characteristic information of which the associated analysis result indicates to be non-intra-set information, and obtain an aggregation analysis result corresponding to the characteristic information; the abnormality evaluation module is configured to input the characteristic information of the target object and analysis parameters corresponding to the characteristic information into an abnormality evaluation model to obtain an abnormality evaluation value output by the abnormality evaluation model, wherein the analysis parameters corresponding to the characteristic information comprise at least one of the content analysis result, the association analysis result and the aggregation analysis result.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the data processing method as described above.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions, which when executed by a processor of a computer, cause the computer to perform a data processing method as described above.
According to an aspect of the embodiments of the present application, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements a data processing method as described above.
In the technical solution provided by the embodiments of the present application, on the one hand, the abnormality evaluation value is obtained by processing the characteristic information of the target object. If the target object is implemented as a merchant account, the characteristic information of the target object correspondingly comprises the relevant information in the identification materials submitted when the merchant account is registered, so the corresponding risk can be evaluated in advance from the characteristic information without relying on transaction data; risks can thus be discovered early and potential risks avoided. On the other hand, the characteristic information of the target object is analyzed along three combined dimensions, namely content abnormality, association abnormality, and aggregation abnormality, so the resulting analysis parameters can comprehensively represent the degree of abnormality of the target object, and an accurate abnormality evaluation value can then be obtained through the abnormality evaluation model. If the target object is implemented as a merchant account, the obtained abnormality evaluation value can accurately reflect the risk level of the merchant account.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 is a schematic illustration of one implementation environment to which the present application relates;
FIG. 2 is a flow chart of a data processing method shown in an exemplary embodiment of the present application;
FIG. 3 is a flow chart of step S150 in the embodiment shown in FIG. 2 in an exemplary embodiment;
FIG. 4 is a flow chart of another data processing method proposed on the basis of the embodiment shown in FIG. 3;
FIG. 5 is a flow chart of step S250 in the embodiment of FIG. 4 in an exemplary embodiment;
FIG. 6 is a flow chart of a data processing method shown in another exemplary embodiment of the present application;
FIG. 7 is a flow chart of data processing in an exemplary application scenario;
FIG. 8 is a block diagram of a data processing apparatus shown in an exemplary embodiment of the present application;
FIG. 9 is a schematic diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application as detailed in the accompanying claims.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Reference to "a plurality" in this application means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
In the embodiments of the present application, data related to users, such as feature information of target objects, is required to obtain user permission or consent when the embodiments are applied to specific products or technologies, and the collection, use and processing of the related data is required to comply with relevant laws and regulations and standards of relevant countries and regions.
As described above, a merchant usually needs to register a merchant account on the internet and submit identification materials when registering, such as business registration information (e.g., the merchant's registered address, unified social credit code or organization code, legal representative's certificate number, and beneficial owner's certificate number), bank account information, and contact information (e.g., the merchant contact's mobile phone number, the merchant contact's email address, and merchant administrator information). After the identification materials are reviewed by the relevant personnel, the merchant is allowed to operate on the internet through the merchant account, and the risk of the merchant account is then identified by relying on transaction data generated during operation. This effectively confines risk control to the in-process stage, so risks cannot be discovered early and potential losses are difficult to avoid.
To solve the above technical problems, embodiments of the present application provide a data processing method, a data processing apparatus, an electronic device, a computer readable storage medium, and a computer program product, respectively, and these embodiments will be described in detail below.
Referring first to fig. 1, fig. 1 is a schematic diagram of an implementation environment according to the present application. The implementation environment comprises a server 10 and a plurality of user terminals 20 (only 2 are shown in fig. 1), each user terminal 20 establishing a wired or wireless communication connection with the server 10 such that the server 10 can obtain the data to be processed of a target object associated with the user terminal 20 based on the communication connection.
In different application scenarios, the target object and the data to be processed of the target object may take different forms. For example, if the target object is implemented as the merchant account, the data to be processed of the target object includes the relevant information in the identification materials submitted when the merchant account is registered. For another example, if the target object is implemented as a vehicle-mounted user account, the data to be processed of the target object includes information related to the vehicle-mounted user account, such as basic user information and payment account information, which is not limited herein.
The server 10 is configured to process data to be processed of a target object associated in the user terminal 20 to obtain an abnormality evaluation value of the target object, so as to characterize an abnormality risk of the target object by the obtained abnormality evaluation value. The detailed data processing procedure is described in the following embodiments, and is not described herein.
The server 10 may be, for example, an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, which is not limited herein.
The user terminal 20 may be a smart phone, a tablet, a notebook computer, a vehicle-mounted terminal, a smart television, a smart watch, etc., but is not limited thereto, and may be selected according to a specific application scenario. For example, when the user terminal 20 specifically includes an in-vehicle terminal, a dedicated service, such as a precise driving service, may be provided to a corresponding driver according to an in-vehicle user account registered in the in-vehicle terminal.
FIG. 2 is a flow chart illustrating a data processing method according to an exemplary embodiment of the present application. The data processing method may be adapted to the implementation environment shown in fig. 1 and is specifically executed by the server 10.
As shown in fig. 2, the data processing method includes steps S110 to S170, which are described in detail as follows:
Step S110, performing content analysis on the characteristic information of the target object according to a preset content abnormality judgment rule to obtain a content analysis result corresponding to the characteristic information.
It should first be noted that the target object is typically a service account logged in on the user terminal, for example, the merchant account described above. Accordingly, the characteristic information of the target object is information about the identity of the service account, for example, the information contained in the identification materials submitted when registering the merchant account, as described above.
The preset content abnormality judgment rule evaluates the characteristic information of the target object at the content compliance level. The content compliance level can be understood as the compliance standards that regulatory requirements or management specifications impose on the identity of the target object; if the identity information of the target object does not meet those standards, the characteristic information is determined to have a content abnormality. For example, where the characteristic information of the target object is the identification material submitted when registering a merchant account, the preset content abnormality judgment rules may specify that the characteristic information has a content abnormality if the merchant's legal representative is under 16 years of age, or if the merchant's unified social credit code or organization code cannot be found in the business administration system. Further rules are not enumerated here.
The target object usually has multiple pieces of feature information, and there are correspondingly multiple preset content abnormality judgment rules. In this embodiment, performing content analysis on the feature information according to the preset content abnormality judgment rules means scanning each piece of feature information against the rules one by one; the judgment result output for a piece of feature information is its content analysis result. For example, the content analysis result of each piece of feature information is a 0/1 variable: 0 indicates that no content abnormality judgment rule is hit, meaning the feature information has no content abnormality, and 1 indicates that at least one content abnormality judgment rule is hit, meaning the feature information has a content abnormality.
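The rule scan described above can be sketched as follows. This is a minimal illustration only: the feature names, the rule table `CONTENT_RULES`, and the `KNOWN_CREDIT_CODES` set (a stand-in for a lookup against the business administration system) are assumptions and do not come from the application.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a lookup against the business administration system.
KNOWN_CREDIT_CODES = {"91440300MA5EXAMPLE"}

# Hypothetical rule table: feature name -> list of predicates.
# A predicate returns True when the rule is hit (the content is abnormal).
CONTENT_RULES: Dict[str, List[Callable[[object], bool]]] = {
    "legal_representative_age": [lambda age: age < 16],
    "unified_social_credit_code": [lambda code: code not in KNOWN_CREDIT_CODES],
}


def content_analysis(features: Dict[str, object]) -> Dict[str, int]:
    """Scan every feature against its rules and return a 0/1 result per feature.

    0 means no rule was hit (content normal); 1 means at least one rule was hit.
    Features without any configured rule are treated as content normal.
    """
    results: Dict[str, int] = {}
    for name, value in features.items():
        rules = CONTENT_RULES.get(name, [])
        results[name] = 1 if any(rule(value) for rule in rules) else 0
    return results
```

Keeping the rules in a table keyed by feature name lets new judgment rules be added without changing the scanning loop.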
Step S130, carrying out association analysis on characteristic information of which the content analysis result indicates that the content is normal according to a preset abnormal information set, and obtaining an association analysis result corresponding to the characteristic information.
The preset abnormal information set collects multiple pieces of feature information known to be abnormal. For example, the abnormal information set may collect abnormal feature information published by authorities such as banks and payment clearing associations, and may also collect abnormal feature information maintained internally by the payment institution, which is not limited herein.
In this embodiment, performing association analysis according to the preset abnormal information set means checking, one by one, each piece of feature information whose content analysis result indicates normal content (that is, whose content analysis result is 0) against the abnormal information set. If the feature information matches no abnormal feature information in the set, 0 is output, indicating that the feature information has no association abnormality. If the feature information matches any abnormal feature information in the set, 1 is output, indicating that the feature information has an association abnormality. The value output for a piece of feature information is its association analysis result.
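The association analysis can be sketched as a set-membership check. The representation of the abnormal information set as `(feature name, value)` pairs and all sample values are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def association_analysis(
    features: Dict[str, str],
    content_results: Dict[str, int],
    abnormal_info_set: Set[Tuple[str, str]],
) -> Dict[str, int]:
    """Check content-normal features against a set of known abnormal information.

    Only features whose content analysis result is 0 are examined.
    Returns 1 for a feature found in the set and 0 otherwise.
    """
    results: Dict[str, int] = {}
    for name, value in features.items():
        if content_results.get(name) != 0:
            continue  # content-abnormal features do not enter this stage
        results[name] = 1 if (name, value) in abnormal_info_set else 0
    return results
```

Features that were already flagged in the content analysis are simply absent from the returned mapping, which mirrors the gating between the two steps.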
Step S150, performing aggregation anomaly analysis on the characteristic information that the association analysis result indicates is not in the set, to obtain an aggregation analysis result corresponding to the characteristic information.
Aggregation anomaly analysis is performed on the feature information that is not in the set, that is, the feature information whose association analysis result is 0, to further determine whether the feature information is abnormal at the aggregation level. This helps guard against abnormal situations such as feature information being sold and used to create internet accounts in batches. For example, take feature information that is the mobile phone number of the merchant's legal representative: if the mobile phone number has been used many times to register different merchant accounts, it may have been maliciously stolen, or purchased and then used in bulk. Merchant accounts registered with that mobile phone number carry a relatively high risk, so the mobile phone number needs to be identified as abnormal feature information.
In the obtained aggregation analysis result, 0 indicates that the feature information has no abnormality, and 1 indicates that the feature information has an abnormality. It should be noted that although this embodiment represents the normal case with 0 and the abnormal case with 1, other embodiments may instead represent the normal case with 1 and the abnormal case with 0; this can be decided according to actual development requirements and is not limited herein.
Step S170, inputting the characteristic information of the target object and the analysis parameters corresponding to the characteristic information into an abnormal evaluation model to obtain an abnormal evaluation value output by the abnormal evaluation model, wherein the analysis parameters corresponding to the characteristic information comprise at least one of a content analysis result, a correlation analysis result and an aggregation analysis result.
Based on the analysis processes performed on the feature information in steps S110 to S150, analysis parameters corresponding to the feature information can be obtained, and the analysis parameters include at least one of content analysis results, association analysis results, and aggregation analysis results, for example.
The anomaly evaluation model is an artificial intelligence model that evaluates the likelihood that the target object is abnormal according to its feature information. The feature information of the target object and the analysis parameters corresponding to the feature information are fed into the anomaly evaluation model as inputs, and the anomaly evaluation value output by the model represents the likelihood that the target object is abnormal.
The anomaly evaluation model can adopt a logistic regression model, a machine learning model and the like, and can be selected according to actual application requirements. The machine learning model may be a neural network model, and the neural network model includes a classification network layer, where the classification network layer outputs a first probability value that the target object belongs to an abnormal type label and a second probability value that the target object belongs to a normal type label, and uses the first probability value as an abnormal evaluation value of the target object.
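The two-probability output of the classification network layer can be sketched as follows. The use of a softmax over two logits is an assumption; the application only states that the layer outputs a first probability for the abnormal label and a second probability for the normal label. The function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def anomaly_value_from_logits(abnormal_logit: float, normal_logit: float) -> float:
    """Return the first probability (abnormal label) as the anomaly evaluation value.

    Assumes the classification layer produces one logit per label and that the
    two probabilities are obtained with a softmax (an illustrative choice).
    """
    p_abnormal, _p_normal = softmax([abnormal_logit, normal_logit])
    return p_abnormal
```

With this choice the two probabilities always sum to 1, so the abnormal probability alone is enough to rank target objects by risk.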
If the anomaly evaluation model adopts a logistic regression model, the obtained anomaly evaluation value corresponds to one of a plurality of anomaly grades by inputting the characteristic information of the target object and the analysis parameters corresponding to the characteristic information into the logistic regression model, and the plurality of anomaly grades comprise a first anomaly grade, a second anomaly grade and a third anomaly grade in sequence, wherein the anomaly degree of the target object represented by the three anomaly grades is gradually increased. The process of obtaining the abnormality evaluation score by the logistic regression model may be expressed as the following formula:
$$P = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$
Here, $P$ represents the abnormality evaluation value; $X_1, \dots, X_k$ represent $k$ independent variables, which can be understood to comprise the characteristic information and the analysis parameters corresponding to the characteristic information; and $\beta_0, \dots, \beta_k$ represent the coefficients of the independent variables, that is, the weights assigned to the independent variables, which are obtained through training.
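The formula above can be evaluated as in the following sketch. It follows the linear combination exactly as written in the source; a conventional logistic regression would additionally pass this sum through a sigmoid. The coefficient values and the level thresholds `(0.3, 0.6)` are hypothetical and only illustrate how an evaluation value might be mapped to the three anomaly levels.

```python
from typing import Sequence, Tuple


def anomaly_evaluation_value(x: Sequence[float], beta0: float, betas: Sequence[float]) -> float:
    """Compute P = beta0 + sum(beta_i * x_i) for k independent variables."""
    if len(x) != len(betas):
        raise ValueError("x and betas must have the same length")
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


def anomaly_level(p: float, thresholds: Tuple[float, float] = (0.3, 0.6)) -> int:
    """Map an evaluation value to level 1, 2, or 3 (increasing abnormality).

    The thresholds are illustrative assumptions, not values from the source.
    """
    low, high = thresholds
    if p < low:
        return 1
    if p < high:
        return 2
    return 3
```

In practice the coefficients would come from training, and the independent variables would include numerically encoded feature information alongside the 0/1 analysis parameters.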
As described above, in the technical solution provided in the embodiments of the present application, on the one hand, the abnormality evaluation value is obtained by processing the feature information of the target object. If the target object is implemented as a merchant account, the corresponding risk can be evaluated from the feature information submitted when the merchant account is registered, so risks can be discovered early and potential risks avoided. On the other hand, the feature information of the target object is analyzed in three successive stages, namely content abnormality, association abnormality, and aggregation abnormality, so the resulting analysis parameters can more comprehensively represent the degree of abnormality of the target object, and an accurate abnormality evaluation value can then be obtained through the abnormality evaluation model. If the target object is implemented as a merchant account, the obtained abnormality evaluation value can accurately reflect the risk level of the merchant account.
In other application scenarios, the target object may be implemented as a consumer account, and when registering the consumer account, the user may submit some material information proving the identity of the user, and use the material information as feature information of the target object, and then based on analysis processing of the feature information, the obtained abnormal evaluation score may represent the risk degree of the corresponding consumer account. The whole analysis and processing process does not use the transaction data of the consumer account, and the characteristic information of the consumer account can be obtained before the transaction is started, so that the early discovery of risks is realized, and the potential risks are avoided.
It should be understood that the target object may have different implementation manners under different application scenarios, which is not limited by the embodiment.
In addition, it is worth noting that the feature information of the target object is analyzed along three combined dimensions: content abnormality, association abnormality, and aggregation abnormality. The content abnormality analysis is the foundation of the process; it ensures that the feature information relied on by the association abnormality analysis and the aggregation abnormality analysis is real, complete, and reliable. The association abnormality analysis extends outward from the feature information of the target object: by associating it with the abnormal information set, it maximizes the coverage of information that can indicate abnormality and thereby secures the range of information used to evaluate the degree of abnormality of the target object. The aggregation analysis is the core of the whole process; it reflects the degree of abnormality of the target object by analyzing the number of times the feature information is used in the scene system where the target object is located. For details of this analysis, refer to the subsequent embodiments. The three analyses advance step by step, so the analysis parameters obtained for the feature information capture, from multiple angles, the abnormalities the feature information may cause, and the abnormality evaluation value finally obtained from the feature information and its analysis parameters can accurately reflect the degree of abnormality of the target object.
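The step-by-step progression of the three analyses can be sketched as a gated cascade for a single piece of feature information. The three predicates are passed in as hypothetical callables, and the use of `None` for a stage that was not reached is an illustrative convention, not something stated in the application.

```python
from typing import Callable, Dict, Optional


def analyze_feature(
    value: str,
    content_hit: Callable[[str], bool],
    in_abnormal_set: Callable[[str], bool],
    aggregation_hit: Callable[[str], bool],
) -> Dict[str, Optional[int]]:
    """Run content, association, and aggregation analysis in order.

    A later stage runs only when the earlier stage reported 0 (normal), so the
    returned analysis parameters contain at least one of the three results.
    """
    params: Dict[str, Optional[int]] = {
        "content": None,
        "association": None,
        "aggregation": None,
    }
    params["content"] = 1 if content_hit(value) else 0
    if params["content"] == 1:
        return params  # content abnormal: later stages are skipped
    params["association"] = 1 if in_abnormal_set(value) else 0
    if params["association"] == 1:
        return params  # found in the abnormal information set
    params["aggregation"] = 1 if aggregation_hit(value) else 0
    return params
```

The resulting dictionary corresponds to the analysis parameters that are passed, together with the feature information, to the anomaly evaluation model.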
Fig. 3 is a flow chart of step S150 in the embodiment shown in fig. 2 in an exemplary embodiment. As shown in fig. 3, the process of performing the aggregation anomaly analysis on the feature information, where the association analysis result indicates that the feature information is not intra-set information, to obtain the aggregation analysis result corresponding to the feature information may include steps S151 to S153, and the detailed analysis is as follows:
Step S151, obtaining an aggregation anomaly threshold that matches the feature information indicated by the association analysis result as not being in the set, and determining the number of times that this feature information is used by other objects.
It should first be noted that the aggregation anomaly threshold is a critical value for evaluating whether feature information is abnormal at the aggregation level, and it reflects the overall abnormality level of the target object. The critical value is obtained through sampling and statistics; the detailed procedure is described in the following embodiments. An aggregation anomaly threshold can be obtained in advance for each piece of feature information, yielding a set of aggregation anomaly thresholds. The threshold matching a given piece of feature information can then be obtained by looking up, in this set, the feature information indicated by the association analysis result as not being in the set.
The other objects mentioned in this embodiment refer to other objects except for the target object in the scene system where the target object is located, for example, if the target object is implemented as a merchant account in an e-commerce scene, the other objects refer to other merchant accounts except for the target object. The degree of abnormality of the target object is reflected by determining the number of times the feature information, which is indicated as non-intra-set information by the association analysis result, is used by other objects.
Step S153, the number of times is compared with the aggregation anomaly threshold, and the aggregation analysis result corresponding to the characteristic information is determined according to the comparison result.
In a specific comparison process, if the number of times corresponding to the feature information is greater than the aggregation anomaly threshold of the feature information, the number of times that the feature information is used by other objects exceeds the critical value for evaluating whether the feature information is anomalous at the aggregation level, and the feature information is considered to carry an anomaly risk at the aggregation level. An aggregation analysis result indicating that the feature information is anomalous is therefore obtained, for example represented by the variable value 1. Otherwise, if the number of times corresponding to the feature information is smaller than or equal to the aggregation anomaly threshold of the feature information, the number of times that the feature information is used by other objects does not exceed the critical value, and the feature information is not anomalous at the aggregation level. An aggregation analysis result indicating that the feature information carries no anomaly risk is therefore obtained, which can be represented by the variable value 0.
As can be seen from the foregoing, this embodiment acquires the aggregation anomaly threshold corresponding to the feature information and the number of times the feature information is used by other objects, and evaluates quantitatively, based on these two values, whether the feature information carries an anomaly risk at the aggregation level. Quantified parameters of this kind are convenient for subsequent data processing.
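Steps S151 to S153 can be illustrated with a minimal Python sketch. The data layout (a dictionary from object identifier to a set of `(information name, information value)` pairs, and a dictionary of thresholds keyed by information name) and the function names are assumptions made for illustration and do not appear in the original text. The comparison is strict, following the paragraph above.

```python
from typing import Dict, Set, Tuple

Feature = Tuple[str, str]  # (information name, information value); hypothetical layout


def count_uses_by_others(feature: Feature, target_id: str,
                         objects: Dict[str, Set[Feature]]) -> int:
    """Count how many objects other than the target use this feature."""
    return sum(1 for oid, feats in objects.items()
               if oid != target_id and feature in feats)


def aggregation_result(feature: Feature, target_id: str,
                       objects: Dict[str, Set[Feature]],
                       thresholds: Dict[str, int]) -> int:
    """Return 1 if the usage count exceeds the matching threshold, else 0."""
    threshold = thresholds[feature[0]]  # step S151: look up the matching threshold
    times = count_uses_by_others(feature, target_id, objects)  # step S151: count uses
    return 1 if times > threshold else 0  # step S153: compare


# Hypothetical data: three merchant accounts share one contact phone number.
objects = {
    "m1": {("phone", "138")},
    "m2": {("phone", "138")},
    "m3": {("phone", "138")},
    "m4": {("phone", "139")},
}
thresholds = {"phone": 1}
```

With this data, the phone number of `m1` is used by two other accounts, which exceeds the threshold of 1, so the result is 1; the phone number of `m4` is used by no other account, so the result is 0.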
Fig. 4 is a flow chart of another data processing method proposed on the basis of the embodiment shown in fig. 3. As shown in fig. 4, the data processing method further includes steps S210 to S250 before step S151, and is described in detail as follows:
Step S210, an information sample set composed of feature information of a plurality of sample objects is acquired.
First, the sample objects are obtained by sampling a large number of account objects in the scene system where the target object is located. Specific sampling modes include random sampling, cluster sampling, stratified sampling and the like, which can be selected according to actual application requirements; this embodiment places no limitation on the choice. The characteristic information of the plurality of sample objects obtained by sampling constitutes the information sample set.
For example, an information sample set formed by characteristic information of a plurality of sample objects may be randomly acquired from a database in which characteristic information of a large number of sample objects is stored in advance; the characteristic information of the sample objects may be collected from the internet. In addition, the aggregation anomaly threshold reflects the overall anomaly level, and the overall anomaly level generally changes as the internet environment changes. To keep the aggregation anomaly threshold accurate, the characteristic information of the sample objects stored in the database therefore needs to be updated according to a preset period, and the preset period should reflect the period over which anomalies change in the scene system where the target object is located. Here the scene system where the target object is located can be understood as the internet environment of the field to which the target object belongs.
Step S230, the aggregation values of the feature information samples contained in the information sample set are counted, where an aggregation value characterizes the number of times a feature information sample appears in the information sample set.
Repeated feature information may exist among a plurality of sample objects, and it is precisely such repetition that makes a piece of feature information more likely to be anomalous. It is therefore necessary to count the number of times the feature information repeatedly appears.
In this embodiment, by counting the number of times each feature information sample appears in the information sample set, the aggregate value of the feature information samples contained in the information sample set may be calculated. It should be understood that, in general, a characteristic information sample is composed of an information name and an information value, the information name corresponds to a field name at the data level, the information value corresponds to a field value, and different sample objects may have characteristic information samples with the same information name, but the corresponding information values may be different. Based on this, the feature information sample corresponding to the same information name may have a plurality of aggregation values, but the information values corresponding to different aggregation values are different.
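The counting in step S230 can be sketched as follows. This is a minimal illustration that assumes each sample object is represented as a dictionary mapping information name to information value; that representation and the function name are assumptions, not part of the original text.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def aggregation_values(sample_set: Iterable[Dict[str, str]]) -> Counter:
    """Count occurrences of each (information name, information value) pair."""
    counts: Counter = Counter()
    for features in sample_set:
        for name, value in features.items():
            counts[(name, value)] += 1
    return counts


# Hypothetical sample set of three sample objects.
samples = [
    {"phone": "138", "address": "A"},
    {"phone": "138", "address": "B"},
    {"phone": "139", "address": "A"},
]
counts = aggregation_values(samples)
```

Here the information name `phone` has two aggregation values, one per distinct information value: 2 for `138` and 1 for `139`.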
Step S250, calculating aggregation abnormality thresholds of different characteristic information samples according to the aggregation values of the characteristic information samples.
Because the aggregation anomaly threshold reflects the overall anomaly level, the aggregation values of the characteristic information samples are further integrated to obtain the aggregation anomaly threshold meeting the requirements.
Referring to fig. 5, fig. 5 is a flowchart of step S250 in the embodiment shown in fig. 4 in an exemplary embodiment, where the process of calculating the aggregation anomaly threshold value for different feature information samples according to the aggregation value of the feature information samples may include steps S251 to S255, which are described in detail below:
Step S251, the aggregation values of the feature information samples are grouped to obtain a plurality of aggregation value groups. As described above, the feature information samples corresponding to the same information name have a plurality of aggregation values, and a plurality of aggregation value groups can be obtained by grouping these aggregation values. For example, after the aggregation values are sorted from large to small, the sorted sequence may be grouped using two parameters, namely a group spacing value and an initial group lower limit value. The group spacing value characterizes the distance between two adjacent aggregation value groups, and the initial group lower limit value characterizes the minimum aggregation value in the first aggregation value group. Together they fix a starting point and the distance between adjacent groups, so that the aggregation value sequence can be divided into a plurality of aggregation value groups.
Step S253, sampling the plurality of aggregation value groups respectively to obtain sample objects respectively contained in the respective aggregation value groups.
This embodiment may sample the plurality of aggregation value groups using stratified sampling, in which the population is divided into a plurality of disjoint portions, each referred to as a layer, and samples are then drawn from each layer according to a certain proportion. Each aggregation value group in this embodiment may be regarded as one layer. A preset number of aggregation values are randomly sampled in each layer, and the sample objects corresponding to the sampled aggregation values are taken as the sampled sample objects contained in the corresponding aggregation value group.
Step S255, the degree of abnormality of each aggregation value group is determined according to the abnormal proportion of the sampled sample objects in that group, and if the abnormal proportion is greater than or equal to a preset ratio, the minimum aggregation value in the aggregation value group is determined to be the aggregation anomaly threshold of the characteristic information sample represented by the aggregation value group.
The abnormal proportion of the sampled sample objects in an aggregation value group is the ratio of the number of sampled sample objects found to be abnormal in that group to the total number of sampled sample objects in that group.
Whether a sampled sample object in an aggregation value group is abnormal can be determined from the type label associated with the sample object. The type label indicates whether the sample object is abnormal and can be added after a manual decision. Alternatively, in some embodiments, abnormal sampled sample objects may be identified by scanning the characteristic information samples of the sampled sample objects with preset anomaly decision rules, which comprise a plurality of rules for judging, from the characteristic information samples, whether a sampled sample object is abnormal. In either case the abnormal proportion is derived from the overall anomaly condition of the sampled sample objects, so the aggregation anomaly threshold finally determined from the abnormal proportion reflects that overall condition.
In some embodiments, a case may also occur in which feature information samples having the same information name obtained through the above processing have different minimum aggregation values, in which case, the minimum aggregation value with the lowest value may be used as an aggregation abnormality threshold of the feature information sample, or an average value between a plurality of minimum aggregation values may be used as an aggregation abnormality threshold of the feature information sample, which is not limited herein.
As can be seen from the foregoing, this embodiment groups and samples the aggregation values of the feature information samples, determines the abnormal proportion of the sampled sample objects in each aggregation value group, and derives the aggregation anomaly threshold of the feature information samples from that proportion. The resulting threshold reflects the overall anomaly condition of the sampled sample objects. This improves the accuracy of the aggregation anomaly threshold and of the aggregation analysis results obtained from it, and because these results are inputs from which the anomaly evaluation model produces the anomaly evaluation score, it also helps ensure the accuracy of that score.
In some exemplary embodiments, the preset ratio used in step S255 includes a first preset ratio and a second preset ratio, the first preset ratio being greater than the second preset ratio. If the abnormal proportion of the sampled sample objects in an aggregation value group is greater than or equal to the first preset ratio, the sampled sample objects are determined to correspond to a first early warning level, and the minimum aggregation value in the aggregation value group is taken as the aggregation anomaly threshold corresponding to the first early warning level. If the abnormal proportion of the sampled sample objects in an aggregation value group is greater than or equal to the second preset ratio and smaller than the first preset ratio, the sampled sample objects are determined to correspond to a second early warning level, and the minimum aggregation value in the aggregation value group is taken as the aggregation anomaly threshold corresponding to the second early warning level.
In this embodiment, two levels of aggregation anomaly thresholds are set. In step S153, the number of times that the feature information indicated as non-intra-set information by the association analysis result is used by other objects is compared with the aggregation anomaly thresholds of the different levels. If the number of times is greater than or equal to the aggregation anomaly threshold corresponding to the first early warning level, an aggregation analysis result representing the first early warning level is generated; if the number of times is greater than or equal to the aggregation anomaly threshold corresponding to the second early warning level and less than the aggregation anomaly threshold corresponding to the first early warning level, an aggregation analysis result representing the second early warning level is generated.
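Steps S251 to S255 can be sketched in Python as follows. The sketch rests on several assumptions that the original text does not fix: groups are taken as the half-open intervals starting at the initial group lower limit and spaced by the group spacing value, aggregation values below the initial lower limit are ignored, each aggregation value carries a flag saying whether its associated sample object is abnormal, and when several groups qualify the lowest minimum aggregation value is used. All names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, bool]  # (aggregation value, associated sample object is abnormal)


def aggregation_threshold(entries: List[Entry], spacing: int, lower_limit: int,
                          preset_ratio: float, per_group: int,
                          rng: random.Random) -> Optional[int]:
    """Return the aggregation anomaly threshold, or None if no group qualifies."""
    # Step S251: assign each aggregation value to a group by its interval index.
    groups: Dict[int, List[Entry]] = defaultdict(list)
    for value, abnormal in entries:
        if value >= lower_limit:
            groups[(value - lower_limit) // spacing].append((value, abnormal))

    candidates = []
    for members in groups.values():
        # Step S253: random sampling inside each group (stratified sampling).
        sampled = rng.sample(members, min(per_group, len(members)))
        # Step S255: abnormal proportion of the sampled sample objects.
        proportion = sum(1 for _, abnormal in sampled if abnormal) / len(sampled)
        if proportion >= preset_ratio:
            candidates.append(min(value for value, _ in members))
    return min(candidates) if candidates else None


# Hypothetical aggregation values for one information name.
entries = [(20, False), (25, False), (31, True), (35, True),
           (38, False), (42, True), (47, True)]
threshold = aggregation_threshold(entries, spacing=10, lower_limit=20,
                                  preset_ratio=0.6, per_group=10,
                                  rng=random.Random(0))
```

Because `per_group` is larger than every group here, every member is sampled and the outcome does not depend on the random seed: the group covering 30 to 39 has an abnormal proportion of 2/3 and the group covering 40 to 49 has a proportion of 1, so both qualify and the lower minimum, 31, is returned.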
The degree of abnormality represented by the first early warning level is higher than that of the second early warning level, and the degree of abnormality represented by the aggregation analysis result corresponding to the first early warning level is also higher than that represented by the aggregation analysis result corresponding to the second early warning level.
It can be seen that, in this embodiment, the degree of abnormality represented by the aggregate analysis result is further refined, so that the degree of abnormality represented by the subsequently obtained abnormality evaluation value is also more accurate.
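The two-level comparison can be read as the following minimal sketch. It assumes that the threshold for the first early warning level is not lower than the threshold for the second early warning level, and it returns `None` when neither level is reached; the function name and the return convention are illustrative assumptions.

```python
from typing import Optional


def early_warning_level(times: int, first_threshold: int,
                        second_threshold: int) -> Optional[int]:
    """Map a usage count to early warning level 1, level 2, or None."""
    if times >= first_threshold:
        return 1  # higher degree of abnormality
    if times >= second_threshold:
        return 2  # lower degree of abnormality
    return None   # no early warning at the aggregation level
```

For example, with a first-level threshold of 50 and a second-level threshold of 30, a count of 60 maps to level 1, a count of 35 maps to level 2, and a count of 10 produces no warning.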
Fig. 6 is a flow chart of a data processing method shown in another exemplary embodiment of the present application. As shown in fig. 6, the data processing method further includes steps S310 to S330 before step S110, and is described in detail as follows:
Step S310, data cleaning is performed on the feature information of the target object.
Data cleaning refers to the final procedure for finding and correcting identifiable errors in a data file, including checking data consistency and handling invalid values, missing values and the like. In this embodiment, cleaning the characteristic information of the target object may include removing error values, abnormal values, null values, special values and the like. The cleaned characteristic information improves the efficiency of data processing and meets application requirements.
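A minimal sketch of such a cleaning pass is shown below. The set of placeholder strings treated as null or special values is an assumption chosen for illustration; the original text does not enumerate them.

```python
from typing import Dict, Optional

# Hypothetical placeholders treated as null or special values.
PLACEHOLDERS = {"", "null", "none", "n/a", "unknown"}


def clean_features(raw: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Trim whitespace and drop null or placeholder values."""
    cleaned: Dict[str, str] = {}
    for name, value in raw.items():
        if value is None:
            continue  # missing value
        text = str(value).strip()
        if text.lower() in PLACEHOLDERS:
            continue  # empty or special value
        cleaned[name] = text
    return cleaned


# Hypothetical raw characteristic information of a merchant account.
raw = {"phone": " 138 ", "email": None, "address": "N/A", "code": "91X"}
cleaned = clean_features(raw)
```

Only `phone` (trimmed to `138`) and `code` survive this pass; the missing email and the placeholder address are removed.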
Step S330, classifying the feature information after data cleaning, and adding a type label to each piece of feature information according to the obtained classification result, wherein the type label is used for selecting a content judgment rule or an abnormal information set matched with the type label of each piece of feature information.
In this embodiment, the feature information after data cleaning can be classified by a machine learning model, so that the type of each piece of feature information is obtained quickly; the machine learning model is a supervised classification model. For example, in a scenario in which the target object is implemented as a merchant account, the types of feature information may include business information, such as a merchant registration address, a unified social credit code or organization code, a legal person license number and a revenue owner license number, and contact information, such as a merchant contact mobile phone number, a merchant contact email address and merchant administrator information. The types are not listed exhaustively here.
In this embodiment, the feature information after data cleaning is classified and a type label is added to each piece of feature information according to the classification result. On the one hand, the data resources required during data processing, such as the content judgment rules for content anomaly analysis and the anomaly information sets for association anomaly analysis, can be obtained quickly based on the added type label. On the other hand, because the data resources prepared in the preprocessing stage (such as the content judgment rules and the anomaly information sets) are organized by type, the efficiency of the preprocessing stage is improved.
In a further embodiment (not shown in the figures), the data processing method further comprises the following step after step S170:
determining an abnormality grade of the target object according to the abnormality evaluation value;
if the abnormality level of the target object is the first abnormality level, adding a mark to the target object;
if the abnormal level of the target object is the second abnormal level, sending notification information for checking the identity of the target object;
and if the abnormality level of the target object is the third abnormality level, disabling business transactions for the target object, and adding the characteristic information of the target object into the anomaly information set.
In this embodiment, the degrees of abnormality characterized by the first, second and third abnormality levels increase in that order. If the abnormality level of the target object is the first abnormality level, the target object is unlikely to be abnormal, so no restriction is placed on it, and a mark is merely added to indicate that the target object has completed abnormality evaluation. If the abnormality level of the target object is the second abnormality level, the likelihood that the target object is abnormal is moderate, so notification information for verifying the identity of the target object is sent to the relevant personnel, who manually verify the identity of the target object after receiving it, thereby ensuring the security of the target object's business operation in the internet environment. If the abnormality level of the target object is the third abnormality level, the target object is highly likely to be abnormal, so business transactions are disabled for the target object, for example by revoking its payment authority, to ensure operational security, and the characteristic information of the target object is added into the anomaly information set, thereby actively updating the anomaly information set.
Therefore, it can be seen that, according to the embodiment, the corresponding processing operation is performed on the target object according to the abnormal level of the target object, so that possible abnormal situations can be flexibly handled, and the security of service operation of the target object in the internet environment is ensured.
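The level-dependent handling can be sketched as a simple dispatch. The cut-off scores used to map an abnormality evaluation value to a level, the assumption that a higher score means a higher degree of abnormality, and the action strings are all hypothetical; the original text does not specify them.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class AbnormalityLevel(Enum):
    FIRST = 1
    SECOND = 2
    THIRD = 3


def grade(score: float, low: float = 0.3, high: float = 0.7) -> AbnormalityLevel:
    """Map an evaluation score to a level using hypothetical cut-offs."""
    if score < low:
        return AbnormalityLevel.FIRST
    if score < high:
        return AbnormalityLevel.SECOND
    return AbnormalityLevel.THIRD


def handle(target_id: str, features: Dict[str, str], level: AbnormalityLevel,
           anomaly_set: Set[Tuple[str, str]]) -> List[str]:
    """Return the actions taken for the target object at the given level."""
    if level is AbnormalityLevel.FIRST:
        return [f"mark:{target_id}"]
    if level is AbnormalityLevel.SECOND:
        return [f"notify_identity_check:{target_id}"]
    # Third level: disable transactions and actively update the anomaly set.
    anomaly_set.update(features.items())
    return [f"disable_transactions:{target_id}",
            f"anomaly_set_updated:{target_id}"]
```

In a real deployment the returned action strings would be replaced by calls into the marking, notification and payment-control services; they are kept as plain strings here so the sketch stays self-contained.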
In order to facilitate understanding of the technical solution disclosed in the embodiments of the present application, the following description is provided by a specific application scenario. In this exemplary application scenario, the target object is implemented as a merchant account for the anomaly to be evaluated, and the detailed data processing procedure is shown in fig. 7.
Firstly, content anomaly analysis, association anomaly analysis and aggregation anomaly analysis are sequentially carried out on characteristic information of a target object, content analysis results, association analysis results and aggregation analysis results are correspondingly obtained, and the analysis results are used as analysis parameters corresponding to the characteristic information.
And then inputting each piece of characteristic information and analysis parameters corresponding to the characteristic information into an abnormal evaluation model, correspondingly obtaining an abnormal evaluation value output by the abnormal evaluation model, and determining specific treatment measures according to the abnormal evaluation value output by the abnormal evaluation model.
Specifically, if the abnormality grade of the target object is determined to be the first abnormality grade according to the abnormality evaluation value, the possibility of occurrence of abnormality of the target object is low, so that the merchant account is determined to be a normal merchant, the target object is not limited, and only a mark is added to the target object to represent that the abnormality evaluation of the target object is completed.
If the abnormality level of the target object is the second abnormality level, which indicates that the possibility of occurrence of the abnormality of the target object is at a medium level, notification information for verifying the identity of the target object is sent to related personnel, so that the related personnel can manually verify the identity of the target object after receiving the notification information, and if no abnormality is found through manual verification, the monitoring strength of the target object can be enhanced, for example, the abnormality evaluation frequency of the target object can be increased, so as to ensure the security of service operation of the target object in the internet environment.
If the abnormality level of the target object is the third abnormality level, the possibility of occurrence of abnormality of the target object is higher, so that the target object is controlled to be closed for business transaction, for example, the payment right of the target object is closed, so as to ensure the operation safety of the target object, and the characteristic information of the target object is added into the abnormality information set, so that the active update of the abnormality information set is realized, and the subsequent abnormality evaluation accuracy of other objects to be evaluated is ensured.
The pieces of feature information and analysis parameters corresponding to the feature information input into the abnormality evaluation model may also be referred to as arguments of the abnormality evaluation model, and these arguments may include data information shown in table 1 below, for example.
[Table 1 is reproduced as images in the original publication: Figures RE-GDA0003504897390000141, RE-GDA0003504897390000151 and RE-GDA0003504897390000161.]
TABLE 1
In table 1, items such as the legal identity document, legal age and merchant unified social credit code refer to the characteristic information of the target object; the regulatory rule is the content anomaly determination rule described in the foregoing embodiments, the blacklist is the anomaly information set described in the foregoing embodiments, and the anomaly threshold is the aggregation anomaly threshold described in the foregoing embodiments. The variables Li through Bs are content analysis results obtained by performing content anomaly analysis on the relevant characteristic information; the variables Lb through Tb are association analysis results obtained by performing association anomaly analysis on the relevant characteristic information; the variables Lca through Tca are aggregation analysis results obtained by performing aggregation anomaly analysis on the relevant characteristic information. These variables are binary, and the corresponding analysis results are quantified as 0/1 values. The variables age through Tc in table 1 are continuous variables, formed from the distribution of the information values of the relevant characteristic information.
In the process of performing aggregation anomaly analysis on the characteristic information, an information sample set formed by the characteristic information of a plurality of sample objects is first obtained, the aggregation values of the characteristic information samples contained in the information sample set are counted, and the aggregation anomaly thresholds of the different characteristic information samples are then calculated from those aggregation values. It should be understood that a characteristic information sample is, in essence, characteristic information; the different name merely distinguishes it from the characteristic information of the target object. The aggregation anomaly threshold obtained for a characteristic information sample is therefore the aggregation anomaly threshold of the corresponding characteristic information. If the number of times the characteristic information is used by other objects is greater than or equal to the aggregation anomaly threshold corresponding to the characteristic information, the aggregation analysis result is represented by the variable value 1; otherwise it is represented by the variable value 0.
The process of calculating the aggregation anomaly threshold value of the characteristic information sample is as follows:
firstly, grouping the aggregation values of the characteristic information samples to obtain a plurality of aggregation value groups, wherein the group spacing value adopted by the grouping is represented as K, and the lower limit value in the initial group is represented as T;
then, sampling the plurality of aggregation value groups respectively to obtain sampling sample objects correspondingly contained in each aggregation value group;
and finally, determining the abnormal proportion F of the sampled sample objects in each aggregation value group, and if the abnormal proportion F is greater than or equal to a preset ratio, determining the minimum aggregation value in the aggregation value group as the aggregation anomaly threshold of the characteristic information sample represented by the aggregation value group.
The preset ratio may comprise a first preset ratio A and a second preset ratio B, the first preset ratio A being greater than the second preset ratio B. If the abnormal proportion F is greater than or equal to the first preset ratio A, the sampled sample objects are determined to correspond to the first early warning level, and the minimum aggregation value in the aggregation value group is used as the aggregation anomaly threshold corresponding to the first early warning level. If the abnormal proportion F is greater than or equal to the second preset ratio B and less than the first preset ratio A, the sampled sample objects are determined to correspond to the second early warning level, and the minimum aggregation value in the aggregation value group is used as the aggregation anomaly threshold corresponding to the second early warning level, so that the degree of abnormality represented by the aggregation anomaly threshold is refined.
The values of the group spacing value K, the initial group lower limit value T, the first preset ratio A and the second preset ratio B can be chosen with reference to table 2 below:
Parameter    Parameter interpretation                          Suggested value
K            Group spacing                                     >=10
T            Initial group lower limit                         >=20
A            Abnormal proportion of first early warning level  >=70%
B            Abnormal proportion of second early warning level >=50%
TABLE 2
The values of the first preset ratio A and the second preset ratio B corresponding to different feature information are shown in table 3 below:
[Table 3 is reproduced as images in the original publication: Figures RE-GDA0003504897390000171 and RE-GDA0003504897390000181.]
TABLE 3
As can be seen from table 3 above, across the different kinds of characteristic information the first preset ratio A is suggested to be 70% and the second preset ratio B is suggested to be 50%. It should be noted that the values of the first preset ratio A and the second preset ratio B set for different feature information may also differ (not shown in table 3 above).
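Using the suggested values A = 70% and B = 50%, the derivation of the two level-specific thresholds can be sketched as below. Each group is summarized by its minimum aggregation value and its abnormal proportion F; this summary form, the function name, and the choice of the lowest qualifying minimum when several groups fall into the same level are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def level_thresholds(groups: List[Tuple[int, float]],
                     a: float = 0.70, b: float = 0.50) -> Dict[int, Optional[int]]:
    """groups holds (minimum aggregation value, abnormal proportion F) pairs."""
    first = [minimum for minimum, f in groups if f >= a]        # F >= A
    second = [minimum for minimum, f in groups if b <= f < a]   # B <= F < A
    return {
        1: min(first) if first else None,
        2: min(second) if second else None,
    }


# Hypothetical groups with spacing K = 10 and initial lower limit T = 20.
groups = [(20, 0.10), (30, 0.55), (40, 0.60), (50, 0.80), (60, 0.90)]
thresholds_by_level = level_thresholds(groups)
```

For these hypothetical groups the first early warning level receives the threshold 50 and the second early warning level receives the threshold 30.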
Based on the above, not only accurate abnormality degree evaluation for the target object is realized, but also specific treatment measures are further determined according to the abnormality evaluation value of the target object, so that the business operation safety of the target object is ensured. And it can be seen that the whole data processing process is completely independent of transaction data, and can be realized only based on the characteristic information of the merchant account, wherein the characteristic information can be obtained usually in the registration stage of the merchant account, so that the abnormal risk of the merchant account can be identified in advance based on the data to be processed, and the safety and reliability of the Internet commerce environment are ensured.
Fig. 8 is a block diagram of a data processing apparatus according to an exemplary embodiment of the present application. As shown in fig. 8, the apparatus includes:
the content analysis module 410 is configured to perform content analysis on the feature information of the target object according to a preset content abnormality determination rule, so as to obtain a content analysis result corresponding to the feature information;
the association analysis module 430 is configured to perform association analysis on the characteristic information of which the content analysis result indicates that the content is normal according to a preset abnormal information set, so as to obtain an association analysis result corresponding to the characteristic information;
the aggregation analysis module 450 is configured to perform aggregation anomaly analysis on the characteristic information, the association analysis result of which indicates that the characteristic information is not intra-set information, so as to obtain an aggregation analysis result corresponding to the characteristic information;
the anomaly evaluation module 470 is configured to input feature information of the target object and analysis parameters corresponding to each piece of feature information into the anomaly evaluation model to obtain an anomaly evaluation value output by the anomaly evaluation model, where the analysis parameters corresponding to each piece of feature information include at least one of a content analysis result, a correlation analysis result, and an aggregate analysis result.
The data processing apparatus provided in this embodiment has at least the following technical effects. On the one hand, the anomaly evaluation score is obtained by processing the characteristic information of the target object; if the target object is implemented as a merchant account, the corresponding risk can be evaluated from the characteristic information submitted when the merchant account is registered, so that risks can be discovered early and potential risks avoided. On the other hand, because the characteristic information of the target object undergoes a combined analysis of content anomalies, association anomalies and aggregation anomalies, the resulting analysis parameters comprehensively represent the degree of abnormality of the target object, and an accurate anomaly evaluation score can then be obtained through the anomaly evaluation model; if the target object is implemented as a merchant account, the score accurately reflects the risk level of the merchant account.
In another exemplary embodiment, the aggregate analysis module 450 includes:
a parameter acquisition unit configured to acquire an aggregation anomaly threshold matching the characteristic information whose association analysis result indicates that it is not in the abnormal information set, and to determine the number of times that this characteristic information is used by other objects;
and a parameter comparison unit configured to compare that number of times with the aggregation anomaly threshold, and to determine the aggregation analysis result corresponding to the characteristic information according to the comparison result.
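The two units above can be sketched as follows. The mapping features_by_object (object identifier to the set of characteristic information it uses) and both function names are hypothetical, and treating a count equal to the threshold as abnormal is an assumption.

```python
from typing import Dict, Set


def times_used_by_others(
    value: str, target_id: str, features_by_object: Dict[str, Set[str]]
) -> int:
    """Count the objects other than the target that use the same value."""
    return sum(
        1
        for object_id, features in features_by_object.items()
        if object_id != target_id and value in features
    )


def aggregation_result(times: int, threshold: int) -> str:
    """Compare the usage count with the aggregation anomaly threshold."""
    return "abnormal" if times >= threshold else "normal"
```

For example, a phone number shared by the target and three other accounts yields a count of 3, which is abnormal under a threshold of 3 and normal under a threshold of 4.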
In another exemplary embodiment, the aggregate analysis module 450 further includes:
a sample set acquisition unit configured to acquire an information sample set constituted by characteristic information of a plurality of sample objects;
the aggregation value statistics unit is configured to count aggregation values of the characteristic information samples contained in the information sample set, and the aggregation values represent the occurrence times of the characteristic information samples in the information sample set;
an aggregation abnormality threshold value acquisition unit configured to calculate an aggregation abnormality threshold value of different feature information samples from the aggregation values of the feature information samples.
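A minimal sketch of the aggregation value statistics step, assuming each sample object is represented as a dictionary from a feature type to its value (a hypothetical layout). The aggregation value of a characteristic information sample is simply how often that (type, value) pair occurs in the sample set.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_aggregation_values(
    sample_set: Iterable[Dict[str, str]]
) -> Counter:
    """Return the occurrence count of every (feature type, value) pair."""
    counts: Counter = Counter()
    for sample_object in sample_set:
        for feature_type, value in sample_object.items():
            counts[(feature_type, value)] += 1
    return counts
```

Keying by feature type as well as value keeps, say, a phone number and an identical-looking account number from being counted together.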
In another exemplary embodiment, the sample set acquisition unit is configured to randomly acquire, from a database, an information sample set formed by the characteristic information of a plurality of sample objects, where the characteristic information of the sample objects stored in the database is updated according to a preset period, and the preset period characterizes the period over which anomalies change in the scenario system in which the target object is located.
In another exemplary embodiment, the aggregation abnormality threshold acquiring unit includes:
an aggregate value grouping subunit configured to group the aggregate values of the feature information samples to obtain a plurality of aggregate value groups;
an aggregate value group sampling subunit configured to sample each of the plurality of aggregate value groups to obtain the sample objects contained in each aggregate value group;
and an aggregation anomaly threshold determining subunit configured to determine the anomaly ratio, that is, the proportion of abnormal objects among the sampled sample objects in each aggregate value group, and, if the anomaly ratio is greater than or equal to a preset ratio, to determine the minimum aggregation value in the aggregate value group as the aggregation anomaly threshold of the characteristic information sample represented by that group.
In another exemplary embodiment, the aggregation anomaly threshold value determination subunit is further configured to use, if the same feature information sample corresponds to a plurality of minimum aggregation values, the minimum aggregation value with the lowest numerical value as the aggregation anomaly threshold value of the feature information sample, or use an average value between the plurality of minimum aggregation values as the aggregation anomaly threshold value of the feature information sample.
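The grouping, sampling and threshold selection described above can be sketched as follows. Several details are assumptions: groups are formed by fixed-width buckets of the aggregation value, each record carries a known abnormal/normal label for its sample object, and the minimum is taken over the whole group rather than over the sampled objects only. All function names are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Optional, Tuple

# (aggregation value of the object's feature, object labelled abnormal?)
Record = Tuple[int, bool]


def group_by_bucket(records: List[Record], width: int) -> Dict[int, List[Record]]:
    """Group records into fixed-width ranges of the aggregation value."""
    groups: Dict[int, List[Record]] = {}
    for record in records:
        groups.setdefault(record[0] // width, []).append(record)
    return groups


def candidate_thresholds(
    records: List[Record],
    width: int,
    sample_size: int,
    preset_ratio: float,
    rng: random.Random,
) -> List[int]:
    """Minimum aggregation value of every group whose sampled anomaly
    ratio reaches the preset ratio."""
    candidates = []
    for _, group in sorted(group_by_bucket(records, width).items()):
        sampled = rng.sample(group, min(sample_size, len(group)))
        ratio = sum(1 for _, abnormal in sampled if abnormal) / len(sampled)
        if ratio >= preset_ratio:
            candidates.append(min(value for value, _ in group))
    return candidates


def resolve_threshold(
    candidates: List[int], strategy: str = "lowest"
) -> Optional[float]:
    """Pick the lowest candidate, or average them, when several exist."""
    if not candidates:
        return None
    return min(candidates) if strategy == "lowest" else mean(candidates)
```

Choosing the lowest candidate is the more conservative option because it flags more objects, while averaging trades some recall for fewer false alarms.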
In another exemplary embodiment, the aggregation anomaly threshold determining subunit is further configured to: if the anomaly ratio (the proportion of abnormal objects among the sampled sample objects) is greater than or equal to a first preset ratio, determine that the sampled sample objects correspond to a first early warning level, and take the minimum aggregation value in the aggregate value group as the aggregation anomaly threshold corresponding to the first early warning level; if the anomaly ratio is greater than or equal to a second preset ratio and smaller than the first preset ratio, determine that the sampled sample objects correspond to a second early warning level, and take the minimum aggregation value in the aggregate value group as the aggregation anomaly threshold corresponding to the second early warning level.
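A short sketch of deriving one threshold per early warning level. It assumes each aggregate value group has already been reduced to its sampled anomaly ratio and its list of aggregation values, and that when several groups fall into the same level the lowest minimum is kept. The function names and the input layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def level_for_ratio(
    anomaly_ratio: float, first_ratio: float, second_ratio: float
) -> Optional[int]:
    """Map an anomaly ratio to early warning level 1, 2, or None."""
    if anomaly_ratio >= first_ratio:
        return 1
    if anomaly_ratio >= second_ratio:
        return 2
    return None


def thresholds_by_level(
    groups: List[Tuple[float, List[int]]],
    first_ratio: float,
    second_ratio: float,
) -> Dict[int, int]:
    """groups: (sampled anomaly ratio, aggregation values in the group)."""
    thresholds: Dict[int, int] = {}
    for ratio, values in groups:
        level = level_for_ratio(ratio, first_ratio, second_ratio)
        if level is None:
            continue
        candidate = min(values)
        # Keep the lowest minimum if several groups share a level.
        thresholds[level] = min(candidate, thresholds.get(level, candidate))
    return thresholds
```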
In another exemplary embodiment, the parameter comparison unit includes:
a threshold comparison subunit configured to compare the number of times with the aggregation anomaly threshold corresponding to the first early warning level and with the aggregation anomaly threshold corresponding to the second early warning level, respectively;
an aggregation analysis result generation subunit configured to: if the number of times is greater than or equal to the aggregation anomaly threshold corresponding to the first early warning level, generate an aggregation analysis result representing the first early warning level; if the number of times is greater than or equal to the aggregation anomaly threshold corresponding to the second early warning level and smaller than the aggregation anomaly threshold corresponding to the first early warning level, generate an aggregation analysis result representing the second early warning level.
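The comparison can be sketched as below, under the assumption that the first-level threshold is not lower than the second-level threshold, so that the second early warning level covers counts from the second-level threshold up to, but not including, the first-level threshold. That ordering is an interpretation rather than something stated explicitly, and the function name is hypothetical.

```python
from typing import Optional


def warning_level(
    times: int, first_threshold: int, second_threshold: int
) -> Optional[str]:
    """Return "first", "second", or None for a usage count."""
    if first_threshold < second_threshold:
        # The sketch only covers the assumed ordering of the thresholds.
        raise ValueError("expected first_threshold >= second_threshold")
    if times >= first_threshold:
        return "first"
    if times >= second_threshold:
        return "second"
    return None
```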
In another exemplary embodiment, the data processing apparatus further includes:
a data cleaning module configured to perform data cleaning on the characteristic information of the target object;
a data classification module configured to classify the cleaned characteristic information and to add a type label to each piece of characteristic information according to the classification result, where the type label is used to select the content abnormality determination rule or the abnormal information set that matches that piece of characteristic information.
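A minimal sketch of cleaning, classifying and labelling, followed by a label-keyed rule lookup. The three labels ("email", "phone", "text"), the regular expressions, and the example rules in CONTENT_RULES are all illustrative assumptions and are not taken from the embodiments.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple


def clean(value: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def classify(value: str) -> str:
    """Assign a coarse type label to a cleaned value."""
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value):
        return "email"
    if re.fullmatch(r"\+?\d[\d\- ]{6,}\d", value):
        return "phone"
    return "text"


# Hypothetical content rules, selected by type label.
CONTENT_RULES: Dict[str, Callable[[str], bool]] = {
    "email": lambda v: v.endswith("@example.invalid"),
    "phone": lambda v: len(set(re.sub(r"\D", "", v))) == 1,  # e.g. 1111111
    "text": lambda v: len(v) < 2,
}


def label_features(raw_values: Iterable[str]) -> List[Tuple[str, str]]:
    """Clean each value, drop empty ones, and attach a type label."""
    labelled = []
    for raw in raw_values:
        value = clean(raw)
        if value:
            labelled.append((value, classify(value)))
    return labelled
```

An abnormal information set could be selected the same way, with one set per type label.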
In another exemplary embodiment, the data processing apparatus further includes:
an anomaly level determination module configured to determine an anomaly level of the target object based on the anomaly evaluation score;
an exception handling module configured to: if the anomaly level of the target object is a first anomaly level, add a mark to the target object; if the anomaly level of the target object is a second anomaly level, send notification information for verifying the identity of the target object; and if the anomaly level of the target object is a third anomaly level, disable the target object from conducting business transactions and add the characteristic information of the target object to the abnormal information set.
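The level determination and handling can be sketched as follows. The score cut-offs in bounds, the assumption that a higher score means a higher level, the dictionary layout of the target object, and the outbox list standing in for a notification channel are all hypothetical.

```python
from typing import Dict, List, Sequence, Set


def anomaly_level(score: float, bounds: Sequence[float] = (0.3, 0.6, 0.9)) -> int:
    """Return 0 (no action) to 3, counting how many cut-offs are reached."""
    return sum(1 for bound in bounds if score >= bound)


def handle_target(
    target: Dict, level: int, abnormal_set: Set[str], outbox: List[str]
) -> None:
    """Apply the handling measure for the given anomaly level."""
    if level == 1:
        target["marked"] = True
    elif level == 2:
        outbox.append(f"verify identity of {target['id']}")
    elif level == 3:
        target["trading_enabled"] = False
        # Feed the features back into the set used by association analysis.
        abnormal_set.update(target["features"])
```

Adding the features of a third-level object to the abnormal information set means that later objects reusing them are caught at the association analysis stage.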
It should be noted that, the data processing apparatus provided in the foregoing embodiments and the data processing method provided in the foregoing embodiments belong to the same concept, and a specific manner in which each module and unit perform an operation has been described in detail in the method embodiment, which is not described herein again. In practical application, the data processing apparatus provided in the foregoing embodiment may allocate the functions to different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above, which is not limited herein.
An embodiment of the present application also provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the data processing method provided in the above embodiments.
Fig. 9 shows a schematic diagram of a computer system suitable for use in implementing the electronic device of the embodiments of the present application. It should be noted that, the computer system 800 of the electronic device shown in fig. 9 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 9, the computer system 800 includes a central processing unit (Central Processing Unit, CPU) 801 that can perform various appropriate actions and processes, such as performing the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 802 or a program loaded from a storage section 808 into a random access Memory (Random Access Memory, RAM) 803. In the RAM 803, various programs and data required for system operation are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An Input/Output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a Local Area Network (LAN) card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809 and/or installed from the removable medium 811. When executed by the Central Processing Unit (CPU) 801, the computer program performs the various functions defined in the system of the present application.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with a computer-readable computer program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be provided in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.
Another aspect of the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data processing method as described above. The computer-readable storage medium may be included in the electronic device described in the above embodiment or may exist alone without being incorporated in the electronic device.
Another aspect of the present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the data processing method provided in the above-described respective embodiments.
The foregoing is merely a preferred exemplary embodiment of the present application and is not intended to limit the embodiments of the present application, and those skilled in the art may make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A method of data processing, comprising:
performing content analysis on the characteristic information of the target object according to a preset content abnormality judgment rule to obtain a content analysis result corresponding to the characteristic information;
performing association analysis on characteristic information of which the content analysis result indicates that the content is normal according to a preset abnormal information set to obtain an association analysis result corresponding to the characteristic information;
performing aggregation anomaly analysis on the characteristic information of which the association analysis result indicates to be non-intra-set information, and obtaining an aggregation analysis result corresponding to the characteristic information;
inputting the characteristic information of the target object and the analysis parameters corresponding to the characteristic information into an abnormal evaluation model to obtain an abnormal evaluation value output by the abnormal evaluation model, wherein the analysis parameters corresponding to the characteristic information comprise at least one of the content analysis result, the association analysis result and the aggregation analysis result.
2. The method of claim 1, wherein the performing the aggregate anomaly analysis on the feature information indicated as the non-intra-set information by the association analysis result to obtain an aggregate analysis result corresponding to the feature information includes:
acquiring an aggregation abnormal threshold value matched with the characteristic information of which the association analysis result indicates non-intra-set information, and determining the number of times that the characteristic information of which the association analysis result indicates non-intra-set information is used by other objects;
and comparing the number of times with the aggregation abnormal threshold value, and determining an aggregation analysis result corresponding to the characteristic information according to the obtained comparison result.
3. The method of claim 2, wherein prior to the obtaining an aggregate anomaly threshold that matches the characteristic information indicated as non-intra-set by the association analysis result and determining a number of times the characteristic information indicated as non-intra-set by the association analysis result is used by other objects, the method further comprises:
acquiring an information sample set formed by characteristic information of a plurality of sample objects;
counting the aggregation value of the characteristic information samples contained in the information sample set, wherein the aggregation value represents the occurrence times of each characteristic information sample in the information sample set;
and calculating the aggregation abnormal threshold value of different characteristic information samples according to the aggregation value of the characteristic information samples.
4. A method according to claim 3, wherein said obtaining an information sample set of characteristic information of a plurality of sample objects comprises:
randomly acquiring, from a database, an information sample set formed by the characteristic information of a plurality of sample objects, wherein the characteristic information of the sample objects stored in the database is updated according to a preset period, and the preset period represents an abnormal change period of a scene system where the target object is located.
5. A method according to claim 3, wherein said calculating an aggregate anomaly threshold for different characteristic information samples from the aggregate value of the characteristic information samples comprises:
grouping the aggregate values of the characteristic information samples to obtain a plurality of aggregate value groups;
sampling each of the plurality of aggregate value groups to obtain a sample object corresponding to each of the aggregate value groups;
and determining an anomaly ratio of the sampled sample objects in each aggregation value group, the anomaly ratio being the proportion of abnormal objects among the sampled sample objects, and if the anomaly ratio is greater than or equal to a preset ratio, determining the minimum aggregation value in the aggregation value group as an aggregation abnormal threshold value of the characteristic information sample represented by the aggregation value group.
6. The method of claim 5, wherein the method further comprises:
if the same characteristic information sample corresponds to a plurality of minimum aggregation values, taking the minimum aggregation value with the lowest value as the aggregation abnormal threshold value of the characteristic information sample, or taking an average value of the plurality of minimum aggregation values as the aggregation abnormal threshold value of the characteristic information sample.
7. The method according to claim 5, wherein the determining the anomaly ratio of the sampled sample objects in each aggregation value group, and if the anomaly ratio is greater than or equal to a preset ratio, determining the minimum aggregation value in the aggregation value group as the aggregation abnormal threshold value of the characteristic information sample represented by the aggregation value group comprises:
if the anomaly ratio is greater than or equal to a first preset ratio, determining that the sampled sample objects correspond to a first early warning level, and taking the minimum aggregation value in the aggregation value group as an aggregation abnormal threshold corresponding to the first early warning level;
if the anomaly ratio is greater than or equal to a second preset ratio and smaller than the first preset ratio, determining that the sampled sample objects correspond to a second early warning level, and taking the minimum aggregation value in the aggregation value group as an aggregation abnormal threshold corresponding to the second early warning level.
8. The method according to claim 7, wherein comparing the number of times with the aggregation anomaly threshold value in magnitude, and determining the aggregation analysis result corresponding to the feature information according to the obtained comparison result comprises:
comparing the number of times with the aggregation abnormal threshold corresponding to the first early warning level and with the aggregation abnormal threshold corresponding to the second early warning level, respectively;
if the number of times is greater than or equal to the aggregation abnormal threshold corresponding to the first early warning level, generating an aggregation analysis result for representing the first early warning level;
and if the number of times is greater than or equal to the aggregation abnormal threshold corresponding to the second early warning level and less than the aggregation abnormal threshold corresponding to the first early warning level, generating an aggregation analysis result for representing the second early warning level.
9. The method according to any one of claims 1 to 8, wherein before the content analysis is performed on the feature information of the target object according to the preset content abnormality determination rule, the method further comprises:
carrying out data cleaning on the characteristic information of the target object;
and classifying the feature information after data cleaning, and adding type labels for each piece of feature information according to the obtained classification result, wherein the type labels are used for selecting content judgment rules or abnormal information sets matched with the type labels of each piece of feature information.
10. The method according to any one of claims 1 to 8, further comprising, after the inputting of the feature information of the target object and the analysis parameters corresponding to the respective pieces of feature information into an abnormality evaluation model to obtain an abnormality evaluation score output by the abnormality evaluation model:
determining an abnormal level of the target object according to the abnormality evaluation score;
if the abnormal level of the target object is the first abnormal level, adding a mark to the target object;
if the abnormal level of the target object is the second abnormal level, sending notification information for verifying the identity of the target object;
and if the abnormal level of the target object is the third abnormal level, disabling the target object from conducting business transactions, and adding the characteristic information of the target object into the abnormal information set.
11. A data processing apparatus, comprising:
the content analysis module is configured to perform content analysis on the characteristic information of the target object according to a preset content abnormality judgment rule to obtain a content analysis result corresponding to the characteristic information;
the association analysis module is configured to perform association analysis on the characteristic information of which the content analysis result indicates that the content is normal according to a preset abnormal information set, so as to obtain an association analysis result corresponding to the characteristic information;
the aggregation analysis module is configured to perform aggregation anomaly analysis on the characteristic information of which the associated analysis result indicates to be non-intra-set information, and obtain an aggregation analysis result corresponding to the characteristic information;
the abnormality evaluation module is configured to input the characteristic information of the target object and the analysis parameters corresponding to the characteristic information into an abnormality evaluation model to obtain an abnormality evaluation value output by the abnormality evaluation model, wherein the analysis parameters corresponding to the characteristic information comprise at least one of the content analysis result, the association analysis result and the aggregation analysis result.
12. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the data processing method of any of claims 1 to 10.
13. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor of a computer, cause the computer to perform the data processing method of any of claims 1 to 10.
14. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the data processing method of any of claims 1 to 10.
CN202111472689.4A 2021-12-03 2021-12-03 Data processing method and device, electronic equipment, storage medium and program product Pending CN116308444A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111472689.4A CN116308444A (en) 2021-12-03 2021-12-03 Data processing method and device, electronic equipment, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111472689.4A CN116308444A (en) 2021-12-03 2021-12-03 Data processing method and device, electronic equipment, storage medium and program product

Publications (1)

Publication Number Publication Date
CN116308444A true CN116308444A (en) 2023-06-23

Family

ID=86791020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111472689.4A Pending CN116308444A (en) 2021-12-03 2021-12-03 Data processing method and device, electronic equipment, storage medium and program product

Country Status (1)

Country Link
CN (1) CN116308444A (en)

Similar Documents

Publication Publication Date Title
CN112581259B (en) Account risk identification method and device, storage medium and electronic equipment
CN111931047B (en) Artificial intelligence-based black product account detection method and related device
CN114139931A (en) Enterprise data evaluation method and device, computer equipment and storage medium
CN111524614B (en) Epidemic situation information notification system
CN112950359A (en) User identification method and device
CN115115369A (en) Data processing method, device, equipment and storage medium
CN116739605A (en) Transaction data detection method, device, equipment and storage medium
CN111582722A (en) Risk identification method and device, electronic equipment and readable storage medium
CN114363082B (en) Network attack detection method, device, equipment and computer readable storage medium
CN114817518B (en) License handling method, system and medium based on big data archive identification
CN107871213B (en) Transaction behavior evaluation method, device, server and storage medium
CN116228312A (en) Processing method and device for large-amount point exchange behavior
CN116308444A (en) Data processing method and device, electronic equipment, storage medium and program product
CN115205026A (en) Credit evaluation method, device, equipment and computer storage medium
CN112712270B (en) Information processing method, device, equipment and storage medium
CN110458707B (en) Behavior evaluation method and device based on classification model and terminal equipment
CN114189585A (en) Crank call abnormity detection method and device and computing equipment
CN113452648A (en) Method, device, equipment and computer readable medium for detecting network attack
CN110675136A (en) Information processing method, device and equipment
US9996691B1 (en) Using signals from developer clusters
CN116506333B (en) Transaction system production inversion detection method and equipment
CN115511428A (en) Data processing method and device, computer equipment and storage medium
CN117010892A (en) Payment risk detection method, device, electronic equipment and readable medium
CN111681041A (en) Electronic coupon issuing method and device, electronic equipment and storage medium
CN117172700A (en) Real estate asset supervision method, system and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40088344; Country of ref document: HK)