CN115601042A - Information identification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115601042A
Authority
CN
China
Prior art keywords
information
identified
model
fraud
preset
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211317594.XA
Other languages
Chinese (zh)
Inventor
尤丽
王加正
胡宝龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Application filed by Agricultural Bank of China
Priority to CN202211317594.XA
Publication of CN115601042A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an information identification method and device, an electronic device and a storage medium. The method comprises: acquiring information to be identified and determining the features to be identified corresponding to the information to be identified; inputting the features to be identified into a pre-trained target anti-fraud model to obtain an information identification result corresponding to the information to be identified, wherein the target anti-fraud model is trained on sample information and the expected identification results corresponding to the sample information, and the training process of the target anti-fraud model includes performing model compression; and determining, based on the information identification result, whether the information to be identified is fraud information. The invention improves the accuracy of the information identification result for the information to be identified.

Description

Information identification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of anti-fraud technologies, and in particular, to an information identification method and apparatus, an electronic device, and a storage medium.
Background
In recent years, with the rapid development of financial technology, the transformation of the industry has continued to accelerate. However, as financial businesses have grown rapidly, so have their risks, and internet financial institutions face increasingly serious fraud challenges. In this context, anti-fraud has become an indispensable link in financial systems.
Currently, machine learning is widely applied in anti-fraud scenarios: a neural network algorithm performs similarity calculations on application data to determine the fraud risk of that data. However, current neural network algorithms often ignore the correlations among the variables in the application data, so the accuracy of the resulting fraud-risk identification is low.
Disclosure of Invention
The invention provides an information identification method and device, an electronic device and a storage medium, aiming to solve the technical problem of low accuracy in identifying the fraud risk of application data.
According to an aspect of the present invention, there is provided an information identification method, the method comprising:
acquiring information to be identified, and determining features to be identified corresponding to the information to be identified;
inputting the features to be identified into a pre-trained target anti-fraud model to obtain an information identification result corresponding to the information to be identified, wherein the target anti-fraud model is trained on sample information and expected identification results corresponding to the sample information, and the training process of the target anti-fraud model includes performing model compression; and
determining, based on the information identification result, whether the information to be identified is fraud information.
According to another aspect of the present invention, there is provided an information identification apparatus, comprising:
a feature extraction module, configured to acquire information to be identified and determine features to be identified corresponding to the information to be identified;
a model processing module, configured to input the features to be identified into a pre-trained target anti-fraud model to obtain an information identification result corresponding to the information to be identified, wherein the target anti-fraud model is trained on sample information and expected identification results corresponding to the sample information, and the training process of the target anti-fraud model includes performing model compression; and
an information identification module, configured to determine, based on the information identification result, whether the information to be identified is fraud information.
According to another aspect of the present invention, there is provided an electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the information identification method according to any embodiment of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement the information identification method according to any one of the embodiments of the present invention when the computer instructions are executed.
According to the technical solution of the embodiments of the present invention, the information to be identified is acquired and the corresponding features to be identified are determined; associating the preset features in the information to be identified enriches the basis for identifying its risk. The features to be identified are input into a pre-trained target anti-fraud model to obtain the corresponding information identification result, where the target anti-fraud model is obtained by performing model compression on an initial anti-fraud model that is trained on sample information and the corresponding expected identification results; this reduces the consumption of server computing resources and increases the computation speed of the model. Finally, whether the information to be identified is fraud information is determined based on the information identification result, which improves the accuracy of the information identification result.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of an information identification method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of an information identification method according to a second embodiment of the present invention;
Fig. 3 is a flowchart of an information identification method according to a third embodiment of the present invention;
Fig. 4 is a flowchart of an information identification method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an information identification apparatus according to a fourth embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device implementing the information identification method according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be appreciated that the data involved in this disclosure, including but not limited to the data itself and its acquisition and use, should comply with the requirements of applicable laws, regulations, and related rules. For example, the user's basic information may include information that the user has set as public.
Example one
Fig. 1 is a flowchart of an information identification method according to the first embodiment of the present invention. This embodiment is applicable to anti-fraud scenarios of internet financial institutions. The method may be executed by an information identification apparatus, which may be implemented in hardware and/or software and may be configured in a computer. As shown in Fig. 1, the method includes:
S110, acquiring information to be identified, and determining features to be identified corresponding to the information to be identified.
The information to be identified can be understood as information for which it must be determined whether a fraud risk exists. Optionally, the information to be identified may be network information that may carry a fraud risk; for example, it may be platform application information submitted to a network platform. Optionally, acquiring the information to be identified includes: receiving platform application information for a target platform and using it as the information to be identified.
The features to be identified can be understood as features constructed by analyzing the information to be identified. Optionally, they may be parameter features associated with the information to be identified. For example, the information to be identified may include multiple types of sub-information, and the features to be identified may be a feature vector constructed from the feature values corresponding to each type of sub-information.
Optionally, determining the features to be identified corresponding to the information to be identified includes: performing feature extraction on the information to be identified based on a preset feature extraction algorithm corresponding to the features to be identified; and/or performing feature extraction on the information to be identified based on a pre-trained feature extraction model. The feature extraction model is obtained by training a pre-established neural network model on sample identification information and the expected identification features corresponding to that sample identification information.
A neural network model can be understood as a network that imitates the network formed by neurons in the human brain. Optionally, the neural network may be formed by interconnected units, each of which takes numerical inputs and produces a numerical output; the output may be a real number or a linear combination function of the inputs. It should be understood that a neural network model must be trained according to a learning criterion before it can work. A trained neural network model can reduce the probability of decision errors. Neural network models have strong generalization and nonlinear mapping capabilities, can model systems for which little information is available, and support parallel processing, so information propagates through them quickly.
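To illustrate the "units with numerical inputs and outputs" described above, the following Python sketch computes one fully connected layer: each unit forms a linear combination of its inputs plus a bias and passes it through an activation. The function names and the default sigmoid activation are assumptions for illustration; the source does not specify the architecture of the feature extraction model.

```python
import math
from typing import Callable, List, Sequence


def sigmoid(z: float) -> float:
    """Logistic activation mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(
    inputs: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    activation: Callable[[float], float] = sigmoid,
) -> List[float]:
    """Compute one layer of units; unit j outputs activation(w_j . x + b_j)."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        # Linear combination of the inputs for this unit.
        z = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs
```

Passing `activation=lambda z: z` yields a purely linear unit; stacking several such layers, with weights learned from training data, gives a multilayer network.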
S120, inputting the features to be identified into a pre-trained target anti-fraud model to obtain an information identification result corresponding to the information to be identified.
The target anti-fraud model is obtained by training based on sample information and an expected recognition result corresponding to the sample information, and the training process of the target anti-fraud model comprises an operation of executing model compression processing.
The target anti-fraud model can be understood as a model for identifying whether the information to be identified is fraud information.
The model compression processing may include pruning and/or quantization.
The information identification result can be understood as the result that the target anti-fraud model determines for the information to be identified from the input features to be identified. Optionally, it may be a result used to determine whether the information to be identified is fraud information; for example, it may be a fraud score corresponding to the information to be identified.
The sample information can be understood as the information used to train the initial anti-fraud model. Optionally, it may be historical information of the same type as the information to be identified; for example, historical application information. The expected identification result can be understood as the identification result that the initial anti-fraud model is expected to produce for the sample information. Optionally, it may be the true identification result of the sample information, for example, fraud information or non-fraud information.
Specifically, a pre-established fully connected model is trained on the sample information and the corresponding expected identification results to obtain a tuned fully connected model; the tuned model is then pruned to obtain a compressed fully connected model; the compressed model is further trained on the training set, and quantization is applied to it during this training to obtain the target anti-fraud model. In the embodiment of the present invention, pruning and quantization are performed alternately during the training of the fully connected model, and training stops once a single round of pruning or quantization has a significant impact on the information identification result, thereby obtaining the target anti-fraud model.
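The alternating schedule described above can be sketched as the control loop below. This is a minimal sketch under stated assumptions: the weights are a flat list of floats, and `prune_step`, `quantize_step`, `fine_tune` and `evaluate` are hypothetical caller-supplied functions (for example, magnitude pruning, uniform quantization, further training on the training set, and accuracy on a validation set). "Significant impact" is assumed to mean a drop in the evaluation score larger than `max_drop`; the source does not quantify it.

```python
from typing import Callable, List

Weights = List[float]
Step = Callable[[Weights], Weights]


def alternate_compress(
    weights: Weights,
    prune_step: Step,
    quantize_step: Step,
    fine_tune: Step,
    evaluate: Callable[[Weights], float],
    max_drop: float = 0.02,
    max_rounds: int = 10,
) -> Weights:
    """Alternately prune and quantize, fine-tuning after each step.

    Stops as soon as a single compression step lowers the evaluation score
    by more than `max_drop` and returns the weights from before that step.
    """
    steps = [prune_step, quantize_step]
    current_score = evaluate(weights)
    for round_idx in range(max_rounds):
        step = steps[round_idx % 2]  # even rounds prune, odd rounds quantize
        # Apply one compression step, then continue training on the training set.
        candidate = fine_tune(step(list(weights)))
        candidate_score = evaluate(candidate)
        if current_score - candidate_score > max_drop:
            break  # this single step hurt the identification result too much
        weights, current_score = candidate, candidate_score
    return weights
```

Comparing each step against the score just before it, rather than against the uncompressed baseline, follows the text's criterion that a single pruning or quantization step is what triggers the stop.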
S130, determining whether the information to be identified is fraud information or not based on the information identification result.
The information identification result may be a fraud score of the information to be identified. Fraud information can be understood as information to be identified that carries a fraud risk.
Specifically, a fraud score threshold may be preset: information to be identified whose information identification result is greater than or equal to the threshold is determined to be fraud information, and information to be identified whose result is less than the threshold is determined to be non-fraud information. The fraud score threshold may be preset according to scenario requirements and is not specifically limited herein; for example, it may be 50, 60, or 70.
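The threshold rule above amounts to a single comparison per score. The sketch assumes scores on the same scale as the example thresholds (for instance 0 to 100) and uses 60 as a default only because it is one of the example values; the function name and string labels are illustrative.

```python
from typing import Iterable, List


def classify_by_score(scores: Iterable[float], threshold: float = 60.0) -> List[str]:
    """Label each fraud score: 'fraud' if score >= threshold, else 'non-fraud'."""
    return ["fraud" if score >= threshold else "non-fraud" for score in scores]
```

Note that a score exactly equal to the threshold is labeled fraud, matching the "greater than or equal to" wording.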
According to the technical solution of this embodiment, the information to be identified is acquired and its features to be identified are determined; associating the preset features in the information to be identified enriches the basis for identifying its risk. The features to be identified are input into a pre-trained target anti-fraud model to obtain the information identification result, where the target anti-fraud model is obtained by compressing an initial anti-fraud model trained on sample information and the corresponding expected identification results, which reduces server computing resource consumption and speeds up model computation. Whether the information to be identified is fraud information is then determined based on the information identification result, improving the accuracy of that result.
Example two
Fig. 2 is a flowchart of an information identification method according to the second embodiment of the present invention. This embodiment refines the step, in the foregoing embodiment, of determining the features to be identified corresponding to the information to be identified. As shown in Fig. 2, the method includes:
S210, acquiring information to be identified.
S220, determining a plurality of preset features corresponding to the information to be identified.
A preset feature can be understood as a feature of the information to be identified that may indicate fraudulent activity. Optionally, there may be a first preset feature, a second preset feature, a third preset feature, and so on. In the embodiment of the present invention, the preset features may be set according to scenario requirements and are not specifically limited herein; for example, they may be information features of different dimensions of the information to be identified, such as the information type of each piece of sub-information in it.
S230, aiming at each preset feature, determining the information quantity of the associated information corresponding to the information to be identified based on the preset feature, and obtaining a feature parameter corresponding to the preset feature based on the information quantity.
The associated information can be understood as information associated with the information to be identified. Optionally, it may be comparison information for the information to be identified; for example, other information that has the same preset feature as the information to be identified. Specifically, if a piece of first application information has the same first preset feature as the information to be identified but a different second preset feature, the first application information may be used as associated information of the information to be identified.
The information quantity can be understood as the number of pieces of associated information corresponding to the information to be identified. The historical application information may or may not contain associated information for the information to be identified, so the information quantity may be zero or a positive integer.
The characteristic parameter may be understood as a parameter corresponding to the preset characteristic obtained based on the information amount. Optionally, the characteristic parameter may be a numerical value corresponding to the preset characteristic based on the information amount. In the embodiment of the present invention, the manner of obtaining the feature parameter corresponding to the preset feature based on the information amount may be preset according to a scene requirement, which is not specifically limited herein.
Optionally, obtaining the feature parameter corresponding to the preset feature based on the information quantity includes: using the information quantity directly as the feature parameter corresponding to the preset feature; or using a first value as the feature parameter when the information quantity is zero, and a second value when the information quantity is greater than zero.
For example, when the number of pieces of associated information corresponding to the information to be identified is 0, the feature parameter corresponding to the preset feature is 0; when it is 1, the feature parameter is 1; and when it is 5, the feature parameter is 5.
Alternatively, when the number of pieces of associated information is 0, the feature parameter corresponding to the preset feature is the first value, and when it is 1, 2, or 5, the feature parameter is the second value. In the embodiment of the present invention, the first value and the second value may be preset according to scenario requirements and are not specifically limited herein; for example, the first value may be 0, 3, or 4, and the second value may be 1, 3, or 5. It will be understood that the first value and the second value should be different from each other.
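The counting rule for associated information and the two options for turning a count into a feature parameter can be sketched as follows. Records are assumed, for illustration only, to be dictionaries mapping hypothetical feature names to string values; `shared_key` plays the role of the first preset feature (must match) and `differing_key` the second (must differ).

```python
from typing import Dict, List, Optional

Record = Dict[str, str]


def count_associated(
    target: Record, history: List[Record], shared_key: str, differing_key: str
) -> int:
    """Count historical records that share the target's value for `shared_key`
    but have a different value for `differing_key`."""
    return sum(
        1
        for record in history
        if record[shared_key] == target[shared_key]
        and record[differing_key] != target[differing_key]
    )


def feature_parameter(
    count: int,
    first_value: Optional[float] = None,
    second_value: Optional[float] = None,
) -> float:
    """Map an associated-information count to a feature parameter.

    Without explicit values the raw count is used (first option); otherwise
    the count is binarized to `first_value` (count == 0) or `second_value`
    (count > 0) (second option).
    """
    if first_value is None or second_value is None:
        return float(count)
    if first_value == second_value:
        raise ValueError("first_value and second_value must differ")
    return first_value if count == 0 else second_value
```

For example, `feature_parameter(5)` returns `5.0`, while `feature_parameter(5, 0.0, 1.0)` returns `1.0`.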
S240, analyzing the information to be identified based on a plurality of preset features, and respectively obtaining feature parameters corresponding to each preset feature.
Specifically, each preset feature in the information to be identified is analyzed, and a feature parameter corresponding to each preset feature is determined. It can be understood that, in the information to be identified, the feature parameters corresponding to each of the preset features may be the same or different.
S250, constructing the feature to be identified corresponding to the information to be identified based on the feature parameter corresponding to each preset feature.
The feature to be recognized may be understood as a feature constructed based on the feature parameter corresponding to each preset feature in the information to be recognized. Optionally, the feature to be recognized may be a vector constructed based on the feature parameter corresponding to each of the preset features.
Optionally, constructing the feature to be identified corresponding to the information to be identified based on the feature parameter corresponding to each preset feature includes: normalizing the feature parameter corresponding to each preset feature, and using the normalized feature parameters as the vector elements of the feature to be identified, where the vector elements are arranged in a preset order.
The vector elements can be understood as the elements obtained by normalizing the feature parameters. Specifically, the feature parameters may be mapped into the range [0, 1], and the mapped values used as the vector elements of the feature to be identified.
The preset order can be understood as the order in which the vector elements of the feature to be identified are arranged. In the embodiment of the present invention, the preset order may be set according to scenario requirements and is not specifically limited herein. Specifically, an order of the preset features may be preset, and the vector elements are then arranged according to this order to construct the feature to be identified corresponding to the information to be identified.
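One way to realize the normalization and ordering described above is sketched below. The text only says the parameters are mapped into [0, 1]; min-max scaling with preset per-feature bounds, and clipping of values outside those bounds, are assumptions made here for illustration, as are the function and argument names.

```python
from typing import Dict, List, Sequence, Tuple


def build_feature_vector(
    params: Dict[str, float],
    order: Sequence[str],
    bounds: Dict[str, Tuple[float, float]],
) -> List[float]:
    """Min-max normalize each feature parameter into [0, 1] and arrange by `order`."""
    vector = []
    for name in order:  # the preset order of the preset features
        low, high = bounds[name]
        if high <= low:
            vector.append(0.0)  # degenerate range: nothing to scale
            continue
        scaled = (params[name] - low) / (high - low)
        vector.append(min(1.0, max(0.0, scaled)))  # clip values outside the preset bounds
    return vector
```

Fixing the order up front ensures that the same position in every feature vector always refers to the same preset feature, which the downstream model relies on.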
S260, inputting the features to be identified into a pre-trained target anti-fraud model to obtain an information identification result corresponding to the information to be identified.
S270, determining whether the information to be identified is fraud information or not based on the information identification result.
According to the technical solution of this embodiment, a plurality of preset features corresponding to the information to be identified are determined; for each preset feature, the information quantity of the associated information corresponding to the information to be identified is determined based on that preset feature, and a feature parameter corresponding to the preset feature is obtained from the information quantity; the information to be identified is analyzed based on the plurality of preset features to obtain the feature parameter of each preset feature; and the features to be identified corresponding to the information to be identified are constructed from these feature parameters. Constructing the features to be identified in this way establishes relationships among the preset features of the information to be identified and improves the accuracy of the corresponding information identification result.
Example three
Fig. 3 is a flowchart of an information identification method according to the third embodiment of the present invention. On the basis of the foregoing embodiments, this embodiment adds the steps of obtaining the target anti-fraud model before the features to be identified are input into it. As shown in Fig. 3, the method includes:
S310, acquiring information to be identified, and determining the features to be identified corresponding to the information to be identified.
S320, training a pre-established full-connection model based on the sample information and the expected identification result corresponding to the sample information to obtain an initial anti-fraud model.
The fully connected model can be understood as a multi-layer perceptron, i.e., a model capable of finding the most reasonable and robust separating hyperplane between classes. Optionally, a Support Vector Machine (SVM) algorithm may be used as the fully connected model. In the embodiment of the present invention, the pre-established fully connected model may be trained based on the sample information and the expected recognition result corresponding to the sample information, so as to obtain the initial anti-fraud model.
S330, performing model compression processing on the initial anti-fraud model to obtain a target anti-fraud model.
The model compression processing comprises at least pruning processing and/or quantization processing.
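The following rough sketch illustrates the fully connected model. It trains a tiny multi-layer perceptron in plain Python using stochastic gradient descent on a binary cross-entropy loss. The layer sizes are an assumption: the 15 × 5 architecture mentioned in the overall flow is read as 15 input features feeding 5 ReLU hidden neurons, with a single sigmoid output neuron added to produce the fraud score. The class name `TinyMLP`, the weight initialization and the learning rate are hypothetical choices, not the patent's.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """15 inputs -> 5 ReLU hidden neurons -> 1 sigmoid output (fraud score)."""

    def __init__(self, n_in: int = 15, n_hidden: int = 5, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.3) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.3) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], List[float], float]:
        # Each layer's output is the next layer's input.
        z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.w1, self.b1)]
        h = [max(0.0, z) for z in z1]
        return z1, h, sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def predict(self, x: Sequence[float]) -> float:
        return self._forward(x)[2]

    def train(self, xs: List[Sequence[float]], ys: List[int],
              epochs: int = 100, lr: float = 0.05) -> None:
        for _ in range(epochs):
            for x, t in zip(xs, ys):
                z1, h, p = self._forward(x)
                d2 = p - t  # dLoss/dz2 for sigmoid + binary cross-entropy
                for j in range(len(h)):
                    # Back-propagate through w2[j] before it is updated.
                    d1 = d2 * self.w2[j] if z1[j] > 0 else 0.0
                    self.w2[j] -= lr * d2 * h[j]
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d1 * xi
                    self.b1[j] -= lr * d1
                self.b2 -= lr * d2
```

Calling `train` on the preprocessed training set and then `predict` on a feature vector yields a score that can be thresholded to decide whether the information is fraud information.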
It should be understood that, in the embodiment of the present invention, many weights of the initial anti-fraud model may be close to zero. Such weights have little influence on the model's information identification result, but they still consume substantial computing resources during computation, which slows the model down. Therefore, the initial anti-fraud model can be pruned to remove neurons whose weights are close to zero and so increase the computation speed of the model.
Optionally, pruning the initial anti-fraud model includes: acquiring a plurality of weights of the initial anti-fraud model, determining the weights to be pruned based on their magnitudes, and setting the weights to be pruned to zero.
Specifically, the weights of the initial anti-fraud model may be obtained and sorted in ascending order of magnitude; the weights below a preset threshold are taken as the weights to be pruned and are set to zero. The preset threshold may be set according to scenario requirements and is not specifically limited here. Optionally, the preset threshold may be expressed as a pruning rate, for example 70%, i.e., the smallest 70% of the weights are pruned.
Further, the weights of the pruned initial anti-fraud model may be sparsified and/or discretized, that is, quantized, which substantially reduces the computing and memory resources the model requires.
Optionally, quantizing the initial anti-fraud model includes: converting the weights of the initial anti-fraud model from floating-point real numbers into integers.
Here, a floating-point real number is a real number that may have a fractional part, and an integer is a number without a fractional part; optionally, the integer may be an 8-bit fixed-point integer, i.e., an INT8 weight. Converting the weights of the initial anti-fraud model from floating-point real numbers to integers lowers the precision of the weights and thereby reduces the consumption of computing resources.
Specifically, pruning and quantization may be performed alternately and repeatedly, and model compression stops when one more pruning step or one more reduction of the weight bit width causes a large change in the information identification result. The initial anti-fraud model after pruning and/or quantization can then be used as the target anti-fraud model.
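Magnitude-based pruning at a 70% pruning rate can be sketched as below. The sketch treats the weights as one flat list. It interprets the pruning rate as the fraction of weights, smallest absolute value first, that are set to zero. Both are simplifying assumptions: a real model would prune each weight matrix (or the whole network) and typically retrain afterwards. The function name `magnitude_prune` is hypothetical.

```python
from typing import List


def magnitude_prune(weights: List[float], rate: float = 0.7) -> List[float]:
    """Set the `rate` fraction of weights with the smallest magnitude to zero."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    k = round(len(weights) * rate)  # number of weights to prune
    # Indices ordered by ascending |w|; the first k are the weights to be pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]
```

For example, `magnitude_prune([0.9, -0.01, 0.3, 0.02, -0.5, 0.001, 0.7, -0.05, 0.2, 0.04])` keeps only `0.9`, `-0.5` and `0.7` and zeroes the other seven weights.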
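One common way to convert FP32 weights into INT8 is symmetric per-tensor linear quantization. A single scale maps the largest absolute weight to 127, and every weight is rounded to the nearest integer step. The source does not specify its quantization scheme, so this choice and the helper names `quantize_int8` and `dequantize` are assumptions for illustration.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of real-valued weights to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate real values, e.g. to measure the accuracy loss."""
    return [v * scale for v in q]
```

Weights that were pruned to zero stay exactly zero after quantization, so the two compression steps compose. The rounding error of every other weight is at most half a step (`scale / 2`).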
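The alternating procedure can be sketched as a loop. The loop takes turns raising the pruning rate and lowering the weight bit width, and it stops as soon as one step degrades a quality metric by more than a tolerance. This is a simplified sketch with several assumptions:

- `evaluate` is a hypothetical callback standing in for the information identification metric (for example, recall or precision on validation data).
- The retraining that the source performs after each pruning step is omitted.
- Quantization is simulated by quantizing and immediately dequantizing the weights.
- The step sizes and the bit widths (full precision, 16, 8) are assumed values.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def magnitude_prune(weights: List[float], rate: float) -> List[float]:
    """Zero the `rate` fraction of weights with the smallest magnitude."""
    k = round(len(weights) * rate)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def fake_quantize(weights: List[float], bits: Optional[int]) -> List[float]:
    """Symmetric quantize-then-dequantize; bits=None keeps full precision."""
    if bits is None:
        return list(weights)
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def alternate_compress(
    weights: List[float],
    evaluate: Callable[[List[float]], float],
    tolerance: float = 0.02,
    rate_step: float = 0.1,
    bit_widths: Sequence[Optional[int]] = (None, 16, 8),
) -> Tuple[List[float], float, Optional[int]]:
    """Alternate pruning and quantization steps until one step costs more than `tolerance`."""
    baseline = evaluate(weights)
    rate, bits_idx, accepted = 0.0, 0, list(weights)
    turn, stalled = 0, 0
    while stalled < 2:  # stop once neither direction can move any further
        if turn % 2 == 0:  # pruning turn: prune a larger fraction of weights
            cand_rate, cand_idx = min(1.0, rate + rate_step), bits_idx
        else:  # quantization turn: move to the next, narrower bit width
            cand_rate, cand_idx = rate, min(bits_idx + 1, len(bit_widths) - 1)
        turn += 1
        if (cand_rate, cand_idx) == (rate, bits_idx):
            stalled += 1
            continue
        stalled = 0
        candidate = fake_quantize(magnitude_prune(weights, cand_rate), bit_widths[cand_idx])
        if baseline - evaluate(candidate) > tolerance:
            break  # this step changes the result too much: keep the previous model
        rate, bits_idx, accepted = cand_rate, cand_idx, candidate
    return accepted, rate, bit_widths[bits_idx]
```

In the flow described by the source, the pruned model is retrained before the next step. In this sketch, that retraining would sit inside or before `evaluate`.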
S340, inputting the features to be recognized into a pre-trained target anti-fraud model to obtain information recognition results corresponding to the information to be recognized.
S350, determining whether the information to be identified is fraud information based on the information identification result.
According to the technical solution of this embodiment of the invention, an initial anti-fraud model is obtained by training a pre-established fully connected model based on sample information and the expected identification result corresponding to the sample information, and model compression processing, comprising at least pruning processing and/or quantization processing, is performed on the initial anti-fraud model to obtain a target anti-fraud model. Pruning removes the neurons of the initial anti-fraud model whose weights are close to zero, which increases the computation speed of the model; quantization further sparsifies and discretizes the weights, which reduces the computing and storage resources the model requires. The result is a target anti-fraud model that computes quickly and uses computing and storage resources efficiently.
FIG. 4 is a flowchart of the overall information identification method according to an embodiment of the present invention. As shown in FIG. 4, the overall flow of the information identification method may include: determining the features to be identified of the application information; training a fully connected model based on sample information to obtain an initial anti-fraud model; pruning the initial anti-fraud model to obtain a target anti-fraud model; and testing, with test information, whether the information identification result determined by the target anti-fraud model is accurate.
At present, in anti-fraud application scenarios, each internet platform has its own anti-fraud policy and uses its own application information as the data source to determine whether the application information carries a fraud risk. Machine learning is widely applied in anti-fraud scenarios. An enterprise can build its own data platform that provides the large computing power machine learning schemes require, collect the data of each data platform for unified processing, and return the results to the data platform for fraud scoring. On the basis of the fraud score, similarity calculation is then performed on the application information for the specific application scenario to determine whether the application information carries a fraud risk.
However, under an architecture that separates the front, middle and back office, the data platform cannot obtain the full set of application information. Application information is normally pushed to the data platform in T+1 mode, so the data source of the data platform lags behind. The fraud risk of applications submitted on the current day still has to be assessed, but because large computing power is not available, only some of the preset features that may indicate fraud are extracted for similarity calculation, and a threshold is set so that a fraud risk is flagged when the threshold is exceeded. This approach usually ignores the correlation between the preset features, even though in real fraud scenarios the preset features are strongly correlated.
Specifically, the overall process of the information identification method may be:
1. Determine the features to be identified of the application information. The application information is obtained from the big data platform and compared with the platform's full data set; for each preset feature, the amount of associated information in the application information is extracted, and these values are assembled into the feature to be identified, which serves as the input of the network model. The output of the network model is the fraud score of the application information.
Illustratively, the preset features of the application information may include, but are not limited to: different unit names for the same unit telephone; different unit addresses for the same unit telephone; the same mobile phone number; different unit telephones for the same full unit name; different unit addresses for the same unit name; different unit telephones for the same unit address; different full unit names for the same unit address; the same home address; different contact names for the same contact ID number; different contact mobile phone numbers for the same contact ID number; different contact names for the same contact mobile phone number; different contact ID numbers for the same contact; the same marketer name; different mobile phone numbers for the same e-mail address; and the number of personal or unit application forms with the same promoter.
The data is then preprocessed. Because the amount of application information is limited, after the above processing most applications have small values for the 15 features and only a small fraction of application forms have large values. To prevent severe sample imbalance from degrading model training and prediction, a threshold can be chosen to separate these two groups of data, which are then sampled. The values of the 15 preset features are normalized to reduce the influence of outliers on model training and result evaluation. The preprocessed data are divided into a training set and a test set in a given proportion.
2. Train the fully connected model based on the sample information to obtain an initial anti-fraud model. The initial anti-fraud model is trained based on the sample information and the expected recognition result corresponding to the sample information, using the training set to train the fully connected neural network model. In the embodiment of the invention, the fully connected network architecture is 15 × 5; the number of neurons in each layer and the network depth can be adjusted according to the number of inputs and the model precision required by different business scenarios.
3. In a neural network model, some weights are close to zero; they have little influence on the prediction result but consume a large amount of computing resources. Therefore, in the embodiment of the invention, the neural network model is pruned: the network output is regularized and subjected to a normalization constraint, unnecessary weights are pruned, and the required weights can grow and take over the freed capacity. Pruning the neurons whose weights are close to zero speeds up model computation. Specifically, pruning and quantization are applied alternately to the initial anti-fraud model to obtain the target anti-fraud model. First, the model is pruned; in the embodiment of the invention, magnitude-based pruning may be adopted, i.e., the weights of the trained initial anti-fraud model are taken out, the smallest 70% of the weights in ascending order are set to 0, and the model is then retrained. Next, the weights of the pruned and retrained model, which may be 32-bit floating-point real numbers (FP32), are quantized to INT8. Pruning and quantization are repeated alternately until one more pruning step or one more reduction of the weight bit width causes a large change in recall and precision, at which point quantization and pruning stop. In the embodiment of the present invention, because the number of neurons in a quantization unit may not be fixed during pruning, the activation function (ReLU) may be a quantized activation function, i.e., QuantizedReLU.
4. Test, using the test information, whether the information identification result determined by the target anti-fraud model is accurate. The computation path of the network model is traced: the test information is input into the target anti-fraud model and computed layer by layer, with the output of each layer serving as the input of the next, until the final fraud score is obtained. When the accuracy of the fraud scores output by the target anti-fraud model reaches an expected value, the target anti-fraud model can be considered fully trained and can then be deployed to the application side.
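The preprocessing in step 1 could look roughly like the sketch below. It covers three operations: threshold-based resampling of the many low-valued and few high-valued applications, min-max normalization of the feature values, and a train/test split. The grouping criterion (the sum of the raw feature values compared with `threshold`), the keep ratio, the 80/20 split and all function names are assumptions. The source leaves the threshold and the split proportion open.

```python
import random
from typing import List, Tuple

Row = List[float]


def resample_by_threshold(rows: List[Row], labels: List[int], threshold: float,
                          ratio: float = 3.0, seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Keep all 'high' rows (feature sum > threshold) and at most ratio x as many 'low' rows."""
    low = [i for i, r in enumerate(rows) if sum(r) <= threshold]
    high = [i for i, r in enumerate(rows) if sum(r) > threshold]
    if not high:  # nothing to balance against
        return list(rows), list(labels)
    kept_low = random.Random(seed).sample(low, min(len(low), int(ratio * len(high))))
    keep = sorted(kept_low + high)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def min_max_normalize(rows: List[Row]) -> List[Row]:
    """Scale each feature column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)] for r in rows]


def train_test_split(rows: List[Row], labels: List[int], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Row], List[int], List[Row], List[int]]:
    """Shuffle once and hold out `test_fraction` of the samples for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    train, test = idx[n_test:], idx[:n_test]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])
```

A variant could fit the normalization bounds on the training split only and reuse them on the test split. That avoids leaking test statistics into training.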
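Step 3 mentions a quantized activation function. One standard formulation uses affine INT8 quantization, where real value = `scale * (q - zero_point)`. Under that scheme, ReLU can be applied directly to the integer codes by clamping them at `zero_point`, because real 0 maps exactly to that code. The sketch below assumes this scheme and uses hypothetical helper names. The source does not define how its QuantizedReLU is implemented.

```python
from typing import List


def quantize(xs: List[float], scale: float, zero_point: int) -> List[int]:
    """Affine quantization of real values to INT8 codes in [-128, 127]."""
    return [max(-128, min(127, round(x / scale) + zero_point)) for x in xs]


def dequantize(qs: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate real values from INT8 codes."""
    return [(q - zero_point) * scale for q in qs]


def quantized_relu(qs: List[int], zero_point: int) -> List[int]:
    """ReLU on INT8 codes: every code below zero_point represents a negative value."""
    return [max(q, zero_point) for q in qs]
```

For example, with `scale = 0.1` and `zero_point = -10`, `quantize([-1.0, -0.2, 0.0, 0.3, 2.0], 0.1, -10)` gives `[-20, -12, -10, -7, 10]`. `quantized_relu` turns that into `[-10, -10, -10, -7, 10]`, which dequantizes to approximately `[0.0, 0.0, 0.0, 0.3, 2.0]`, and the activation never leaves the integer domain.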
The technical scheme of the embodiment of the invention has the following beneficial effects:
1) A neural network model is trained with historical application information as the data source to identify application information that may involve fraud. Combined with pruning and quantization, the nonlinear weights obtained by training the neural network model are turned into a low-precision quantized network model. This reduces the consumption of server computing resources, so that machine learning can be applied broadly to the rapid identification of application information.
2) Computing the preset features jointly (superposition calculation) strengthens the correlation between them and addresses the problem in related anti-fraud models that a threshold yields only a hard zero-or-one decision; the correlation between the preset features is fully taken into account, and the lag of fraud information is compensated for.
3) Pruning and quantizing the neural network model reduces its computational cost, relieves the data lag and computing-power bottleneck of the big data platform, and increases the speed at which the target anti-fraud model identifies the information to be identified. By tracing the computation path of the initial anti-fraud model and implementing the nonlinear superposition of the weights in code, the server's dependence on a cloud computing platform is removed, and the information identification result determined by the target anti-fraud model for the information to be identified is more accurate.
EXAMPLE IV
Fig. 5 is a schematic structural diagram of an information identification apparatus according to a fourth embodiment of the present invention. As shown in fig. 5, the apparatus includes: a feature extraction module 410, a model processing module 420, and an information identification module 430.
The feature extraction module 410 is configured to obtain information to be identified, and determine a feature to be identified corresponding to the information to be identified; the model processing module 420 is configured to input the feature to be recognized into a target anti-fraud model that is trained in advance, and obtain an information recognition result corresponding to the information to be recognized, where the target anti-fraud model is obtained by training based on sample information and an expected recognition result corresponding to the sample information, and a training process of the target anti-fraud model includes an operation of performing model compression processing; an information identification module 430, configured to determine whether the information to be identified is fraud information based on the information identification result.
According to the technical scheme of the embodiment of the invention, the information to be identified is obtained, the characteristics to be identified corresponding to the information to be identified are determined, the preset characteristics in the information to be identified are associated, and the risk identification basis of the information to be identified is increased; inputting the features to be recognized into a target anti-fraud model which is trained in advance to obtain an information recognition result corresponding to the information to be recognized, wherein the target anti-fraud model is obtained by model compression processing of an initial anti-fraud model, and the initial anti-fraud model is obtained by training based on sample information and an expected recognition result corresponding to the sample information, so that the consumption of server computing resources is reduced, and the model computing speed is improved; whether the information to be identified is fraud information or not is determined based on the information identification result, and the accuracy of the information identification result is improved.
Optionally, the feature extraction module 410 includes: a preset feature determining submodule, an information quantity determining submodule, an information analysis submodule, and a to-be-identified feature construction submodule.
The preset feature determining submodule is used for determining a plurality of preset features corresponding to the information to be identified;
the information quantity determining submodule is used for determining the information quantity of the associated information corresponding to the information to be identified based on the preset features aiming at each preset feature, and obtaining the feature parameters corresponding to the preset features based on the information quantity;
the information analysis submodule is used for analyzing the information to be identified based on a plurality of preset features to respectively obtain a feature parameter corresponding to each preset feature;
the to-be-identified feature construction submodule is used for constructing the feature to be identified corresponding to the information to be identified based on the feature parameter corresponding to each preset feature.
Optionally, the information quantity determining sub-module is configured to:
taking the information quantity as the feature parameter corresponding to the preset feature; or,
when the information quantity is zero, adopting a first value as the feature parameter corresponding to the preset feature, and when the information quantity is greater than zero, adopting a second value as the feature parameter corresponding to the preset feature.
Optionally, the to-be-identified feature construction submodule is configured to:
normalizing the feature parameter corresponding to each preset feature, and constructing the feature to be identified corresponding to the information to be identified by taking the normalized feature parameters as the vector elements of the feature to be identified, wherein the vector elements of the feature to be identified are arranged in a preset sequence.
Optionally, the information identification apparatus further includes an initial anti-fraud model acquisition module and a model compression processing module, which operate before the feature to be recognized is input into the pre-trained target anti-fraud model:
the initial anti-fraud model obtaining module is used for training a pre-established full-connection model based on sample information and an expected identification result corresponding to the sample information to obtain an initial anti-fraud model;
and the model compression processing module is used for performing model compression processing on the initial anti-fraud model to obtain a target anti-fraud model, wherein the model compression processing comprises at least pruning processing and/or quantization processing.
Optionally, the model compression processing module is configured to:
acquiring a plurality of weights corresponding to the initial anti-fraud model, determining the weights to be pruned based on their magnitudes, and setting the weights to be pruned to zero.
Optionally, the model compression processing module is configured to:
converting the weights corresponding to the initial anti-fraud model from floating-point real numbers into integers.
The information identification device provided by the embodiment of the invention can execute the information identification method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
EXAMPLE V
FIG. 6 illustrates a block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The processor 11 performs the various methods and processes described above, such as an information identification method.
In some embodiments, the information identification method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the information identification method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the information identification method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, partly on a machine as a stand-alone software package and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An information identification method, comprising:
acquiring information to be identified, and determining characteristics to be identified corresponding to the information to be identified;
inputting the features to be recognized into a pre-trained target anti-fraud model to obtain an information recognition result corresponding to the information to be recognized, wherein the target anti-fraud model is obtained by training based on sample information and an expected recognition result corresponding to the sample information, and the training process of the target anti-fraud model comprises an operation of executing model compression processing;
and determining whether the information to be identified is fraud information or not based on the information identification result.
2. The method according to claim 1, wherein the determining the feature to be identified corresponding to the information to be identified comprises:
determining a plurality of preset features corresponding to the information to be identified;
for each preset feature, determining the information quantity of the associated information corresponding to the information to be identified based on the preset feature, and obtaining a feature parameter corresponding to the preset feature based on the information quantity;
analyzing the information to be identified based on a plurality of preset features to respectively obtain a feature parameter corresponding to each preset feature;
and constructing the feature to be identified corresponding to the information to be identified based on the feature parameter corresponding to each preset feature.
3. The method according to claim 2, wherein the obtaining of the feature parameter corresponding to the preset feature based on the information amount comprises:
taking the information quantity as the feature parameter corresponding to the preset feature; or,
when the information quantity is zero, adopting a first value as the feature parameter corresponding to the preset feature, and when the information quantity is greater than zero, adopting a second value as the feature parameter corresponding to the preset feature.
4. The method according to claim 2, wherein the constructing the feature to be identified corresponding to the information to be identified based on the feature parameter corresponding to each preset feature comprises:
and normalizing the characteristic parameters corresponding to each preset characteristic, and constructing the characteristic to be identified corresponding to the information to be identified by taking the normalized characteristic parameters as vector elements of the characteristic to be identified, wherein the vector elements of the characteristic to be identified are arranged according to a preset sequence.
5. The method according to claim 1, further comprising, before inputting the feature to be identified into the pre-trained target anti-fraud model:
training a pre-established fully connected model based on sample information and the expected recognition result corresponding to the sample information to obtain an initial anti-fraud model; and
performing model compression processing on the initial anti-fraud model to obtain the target anti-fraud model, wherein the model compression processing comprises at least pruning processing and/or quantization processing.
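The training step of claim 5 can be illustrated with the smallest possible fully connected model: a single dense layer with a sigmoid output, trained by batch gradient descent on binary cross-entropy. The claim does not specify depth, width, activation, loss, or optimizer, so every one of those choices here is an assumption, as are the toy samples and labels (1 = fraud).

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_dense_layer(samples: Sequence[Sequence[float]], labels: Sequence[int],
                      epochs: int = 500, lr: float = 0.5,
                      seed: int = 0) -> Tuple[List[float], float]:
    """Train one fully connected layer (sigmoid output) with batch gradient descent."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Fraud score in (0, 1) for one normalized feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical normalized samples and expected recognition results (1 = fraud).
X = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = [0, 0, 1, 1]
w, b = train_dense_layer(X, y)
print([round(predict(w, b, x), 2) for x in X])
```

The resulting floating-point weights are what the pruning and quantization steps in claims 6 and 7 would then operate on.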
6. The method according to claim 5, wherein pruning the initial anti-fraud model comprises:
acquiring a plurality of weights of the initial anti-fraud model, determining the weights to be pruned based on the magnitudes of the weights, and setting the weights to be pruned to zero.
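Claim 6 describes magnitude pruning: weights with small absolute values are zeroed. How many weights are pruned is not stated; the sketch below assumes a fixed sparsity ratio (the fraction of weights to zero), applied to a flat list of weights from one layer.

```python
from typing import List, Sequence


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Zero the `sparsity` fraction of weights with the smallest absolute value.

    The sparsity-ratio criterion is an assumption; a fixed magnitude
    threshold would be an equally valid reading of the claim.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(order[:n_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]


print(magnitude_prune([0.8, -0.05, 0.3, 0.01, -0.6], sparsity=0.4))
# [0.8, 0.0, 0.3, 0.0, -0.6]
```

In practice the pruned model is usually fine-tuned afterwards to recover accuracy; the claim does not say whether that is done.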
7. The method according to claim 5, wherein quantizing the initial anti-fraud model comprises:
converting the weights of the initial anti-fraud model from floating-point real numbers into integers.
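Claim 7 only says the weights become integers. A common way to do this, assumed here rather than taken from the patent, is symmetric linear int8 quantization: one per-tensor scale maps the largest-magnitude weight to 127, and each weight is rounded to the nearest integer step. The scale is kept so the weights can be approximately recovered.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization (assumed scheme).

    Returns integer weights in [-127, 127] and the scale such that
    weight ~= q * scale.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0] * len(weights), 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map integer weights back to approximate floating-point values."""
    return [v * scale for v in q]


q, scale = quantize_int8([0.5, -1.27, 0.0, 0.254])
print(q)  # [50, -127, 0, 25]
```

Rounding bounds the per-weight error by half a quantization step (`scale / 2`), which is the accuracy cost traded for the smaller integer representation.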
8. An information identifying apparatus, comprising:
a feature extraction module, configured to acquire information to be identified and determine the feature to be identified corresponding to the information to be identified;
a model processing module, configured to input the feature to be identified into a pre-trained target anti-fraud model to obtain an information recognition result corresponding to the information to be identified, wherein the target anti-fraud model is obtained by training based on sample information and an expected recognition result corresponding to the sample information, and the training process of the target anti-fraud model comprises performing model compression processing; and
an information identification module, configured to determine, based on the information recognition result, whether the information to be identified is fraud information.
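The apparatus of claim 8 chains three modules. The sketch below wires them together with stand-in components; the class names mirror the claim, but the assumption that the recognition result is a fraud score compared against a 0.5 threshold, the feature keys, and the averaging "model" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class FeatureExtractionModule:
    """Turns raw information into the feature to be identified."""
    extract: Callable[[dict], List[float]]

    def run(self, info: dict) -> List[float]:
        return self.extract(info)


@dataclass
class ModelProcessingModule:
    """Feeds the feature into the trained (and compressed) model."""
    model: Callable[[Sequence[float]], float]

    def run(self, features: Sequence[float]) -> float:
        return self.model(features)


@dataclass
class InformationIdentificationModule:
    """Decides fraud / not fraud; assumes the recognition result is a score."""
    threshold: float = 0.5  # hypothetical decision threshold

    def run(self, score: float) -> bool:
        return score >= self.threshold


@dataclass
class InformationIdentifyingApparatus:
    feature_extraction: FeatureExtractionModule
    model_processing: ModelProcessingModule
    information_identification: InformationIdentificationModule

    def identify(self, info: dict) -> bool:
        features = self.feature_extraction.run(info)
        score = self.model_processing.run(features)
        return self.information_identification.run(score)


# Stand-in components: the mean of the features plays the role of the model.
apparatus = InformationIdentifyingApparatus(
    FeatureExtractionModule(lambda d: [d["blacklist_hits"], d["shared_device"]]),
    ModelProcessingModule(lambda f: sum(f) / len(f)),
    InformationIdentificationModule(threshold=0.5),
)
print(apparatus.identify({"blacklist_hits": 0.9, "shared_device": 0.7}))  # True
print(apparatus.identify({"blacklist_hits": 0.1, "shared_device": 0.0}))  # False
```

Injecting each module as a callable keeps the pipeline independent of how the model was trained or compressed.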
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the at least one processor to perform the information identification method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the information identification method according to any one of claims 1 to 7.
CN202211317594.XA 2022-10-26 2022-10-26 Information identification method and device, electronic equipment and storage medium Pending CN115601042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211317594.XA CN115601042A (en) 2022-10-26 2022-10-26 Information identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211317594.XA CN115601042A (en) 2022-10-26 2022-10-26 Information identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115601042A true CN115601042A (en) 2023-01-13

Family

ID=84851194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211317594.XA Pending CN115601042A (en) 2022-10-26 2022-10-26 Information identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115601042A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116205664A (en) * 2023-04-28 2023-06-02 成都新希望金融信息有限公司 Intermediary fraud identification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN114282670A (en) Neural network model compression method, device and storage medium
US20220374678A1 (en) Method for determining pre-training model, electronic device and storage medium
CN112949767A (en) Sample image increment, image detection model training and image detection method
CN114580916A (en) Enterprise risk assessment method and device, electronic equipment and storage medium
CN115601042A (en) Information identification method and device, electronic equipment and storage medium
CN117011025A (en) Credit risk prediction method, apparatus, device, storage medium and program product
CN116757476A (en) Method and device for constructing risk prediction model and method and device for risk prevention and control
CN116342164A (en) Target user group positioning method and device, electronic equipment and storage medium
CN114443896B (en) Data processing method and method for training predictive model
CN115759283A (en) Model interpretation method and device, electronic equipment and storage medium
CN114997419A (en) Updating method and device of rating card model, electronic equipment and storage medium
CN114999665A (en) Data processing method and device, electronic equipment and storage medium
CN113807391A (en) Task model training method and device, electronic equipment and storage medium
CN111429257A (en) Transaction monitoring method and device
CN116862020A (en) Training method of text classification model, text classification method and device
CN115482422A (en) Deep learning model training method, image processing method and device
CN118134590A (en) Information transmission method, device, equipment and storage medium
CN117609723A (en) Object identification method and device, electronic equipment and storage medium
CN115758142A (en) Deep learning model training method, data processing method and device
CN115034893A (en) Deep learning model training method, risk assessment method and device
CN116192608A (en) Cloud mobile phone fault prediction method, device and equipment
CN117522143A (en) Method, device, equipment and storage medium for determining risk level
CN114912541A (en) Classification method, classification device, electronic equipment and storage medium
CN116521977A (en) Product recommendation method, device, equipment and medium
CN115983445A (en) PUE prediction method, and training method, device and equipment of PUE prediction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination