CN111651500A - User identity recognition method, electronic device and storage medium - Google Patents

User identity recognition method, electronic device and storage medium

Info

Publication number
CN111651500A
Authority
CN
China
Prior art keywords
data
sample data
user
negative sample
positive sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010476727.2A
Other languages
Chinese (zh)
Inventor
张惠玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010476727.2A
Publication of CN111651500A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A user identification method comprises the following steps: receiving a personal declaration request which is input by a user and carries original data; extracting a plurality of first characteristic data which meet the declaration requirement from the original data; mining the plurality of first characteristic data to obtain a plurality of newly added second characteristic data; inputting the plurality of first characteristic data and the plurality of second characteristic data into a plurality of fusion models trained in advance to obtain an output result of each fusion model, wherein the fusion models are used for performing binary classification on the legality of the personal declaration request; and predicting the legality of the personal declaration request according to the plurality of output results, so as to identify the user. The invention also provides an electronic device and a storage medium. The invention can accurately identify the identity of the user.

Description

User identity recognition method, electronic device and storage medium
Technical Field
The invention relates to the technical field of intelligent terminals, in particular to a user identity identification method, electronic equipment and a storage medium.
Background
At present, with the development and popularization of network technology, work in various industries increasingly depends on the network. Typically, a user needs to fill in information online and submit relevant materials so that the information can be transmitted over the network, and various industries exchange information through the network.
However, due to the complexity of the network, when a user submits various types of data online, the identity of the user (a legal user or an illegal user) cannot be accurately identified, and if the data submitted by the illegal user is transmitted on the network, adverse effects are easily caused.
Therefore, how to accurately identify the identity of the user is an urgent technical problem to be solved.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a user identification method, an electronic device and a storage medium, which can accurately identify the identity of a user.
A first aspect of the present invention provides a user identity recognition method, including:
receiving a personal declaration request which is input by a user and carries original data;
extracting a plurality of first characteristic data which meet the declaration requirement from the original data;
mining a plurality of first characteristic data to obtain a plurality of newly added second characteristic data;
inputting the plurality of first characteristic data and the plurality of second characteristic data into a plurality of fusion models trained in advance to obtain an output result of each fusion model, wherein the fusion models are used for performing binary classification on the legality of the personal declaration request;
and according to the output results, carrying out validity prediction on the personal declaration request so as to identify the user.
In a possible implementation manner, the mining the plurality of first feature data, and obtaining a plurality of newly added second feature data includes:
determining the personal declaration project required by the personal declaration request;
acquiring key index parameters matched with the personal declaration project;
and mining a plurality of first feature data based on the key index parameters to obtain newly added second feature data matched with the key index parameters.
In a possible implementation manner, the mining the plurality of first feature data, and obtaining a plurality of newly added second feature data includes:
classifying the plurality of first feature data according to a preset dimension to obtain the first feature data of a plurality of dimensions;
acquiring a data mining algorithm of each dimension aiming at the first characteristic data of the dimension;
and mining the first feature data of the dimension according to the data mining algorithm to obtain newly added second feature data.
In a possible implementation manner, the inputting the plurality of first feature data and the plurality of second feature data into a plurality of fusion models trained in advance, and obtaining an output result of each fusion model includes:
inputting a plurality of first feature data and a plurality of second feature data into a plurality of fusion models trained in advance;
judging the validity of the original data according to the plurality of first characteristic data and the plurality of second characteristic data through each fusion model;
if the original data is valid, outputting an output result for representing that the personal declaration request is legal; or
if the original data is invalid, outputting an output result for indicating that the personal declaration request is illegal.
In a possible implementation manner, the predicting the validity of the personal declaration request according to the plurality of output results to identify the user includes:
judging whether an output result for representing that the personal declaration request belongs to an illegal request exists in the output results;
if the output results which are used for representing that the personal declaration request belongs to an illegal request exist in the output results, determining that the user is an illegal user;
and if no output result among the plurality of output results indicates that the personal declaration request belongs to an illegal request, determining that the user is a legal user.
In a possible implementation manner, before receiving a personal declaration request carrying original data input by a user, the user identification method further includes:
acquiring positive sample data of a legal user and negative sample data of an illegal user;
judging whether the proportional value of the quantity of the positive sample data and the quantity of the negative sample data is greater than or equal to a preset threshold value;
if the proportional value of the number of the positive sample data and the number of the negative sample data is greater than or equal to a preset threshold value, adopting an over-sampling negative sample strategy to repeatedly sample the negative sample data so as to keep the number of the repeatedly sampled negative sample data consistent with the number of the positive sample data;
and respectively inputting the positive sample data and the repeatedly sampled negative sample data into a plurality of initial model frames of different types for training to obtain a plurality of trained fusion models.
In a possible implementation manner, before receiving a personal declaration request carrying original data input by a user, the user identification method further includes:
acquiring positive sample data of a legal user and negative sample data of an illegal user;
judging whether the proportional value of the quantity of the positive sample data and the quantity of the negative sample data is greater than or equal to a preset threshold value;
if the proportional value of the number of the positive sample data and the number of the negative sample data is greater than or equal to a preset threshold value, randomly sampling the positive sample data by adopting an under-sampling positive sample strategy so as to keep the number of the randomly sampled positive sample data consistent with the number of the negative sample data;
and respectively inputting the positive sample data and the negative sample data after random sampling into a plurality of initial model frames of different types for training to obtain a plurality of trained fusion models.
In a possible implementation manner, before receiving a personal declaration request carrying original data input by a user, the user identification method further includes:
acquiring positive sample data of a legal user and negative sample data of an illegal user;
judging whether the proportional value of the quantity of the positive sample data and the quantity of the negative sample data is greater than or equal to a preset threshold value;
if the proportional value of the number of the positive sample data and the number of the negative sample data is greater than or equal to a preset threshold value, determining a positive sample weight and a negative sample weight according to the number of the positive sample data and the number of the negative sample data;
setting the positive sample weights and the negative sample weights in a plurality of different types of initial model frames;
and respectively inputting the positive sample data and the negative sample data into a plurality of initial model frames with weights set for training to obtain a plurality of trained fusion models.
A third aspect of the present invention provides an electronic device comprising a processor and a memory, wherein the processor is configured to implement the user identification method when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the user identification method.
According to the above technical scheme, after the original data input by the user is received, the first characteristic data can be extracted from the original data and then mined to obtain newly added second characteristic data. This increases the amount of characteristic data, so that sufficient characteristic data is available to the models. Inputting the first characteristic data and the second characteristic data into a plurality of fusion models increases the accuracy of model identification, so that the identity of the user can be identified accurately.
Drawings
Fig. 1 is a flowchart of a method for identifying a user identity according to a preferred embodiment of the present invention.
Fig. 2 is a functional block diagram of a user identification apparatus according to a preferred embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device implementing a user identification method according to a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order, nor should they be construed to indicate or imply the relative importance thereof or the number of technical features indicated. It will be appreciated that the data so used are interchangeable under appropriate circumstances such that the embodiments described herein are capable of operation in sequences other than those illustrated or otherwise described herein, and that the features defined as "first" and "second" may explicitly or implicitly include at least one such feature.
Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
Fig. 1 is a flowchart of a method for identifying a user identity according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted.
S11: receiving a personal declaration request which is input by a user and carries original data.
The original data may include personal basic information of the user (such as user name, user age) and declaration information matched with the personal declaration request, such as: the personal declaration request is used for requesting to declare the personal allowance, and the declaration information can be data related to the personal allowance; for another example: the personal declaration request is for requesting declaration of personal authentication, and the declaration information may be material related to the personal authentication.
S12: extracting a plurality of first characteristic data which meet the declaration requirement from the original data.
Some of the original data input by the user may meet the requirements and some may not. The original data can be screened item by item against the declaration requirements, and the first feature data that meets the declaration requirements is extracted from the original data. For example, if the personal declaration request is used for requesting to declare a personal allowance, the first characteristic data meeting the requirements may be the individual's occupation, work location, annual income, expenses, and the like.
Optionally, missing value processing, abnormal value processing, data type conversion, new variable combination, data standardization processing and the like may be performed on the original data.
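The optional preprocessing named above can be illustrated with a minimal sketch. It assumes a single numeric field (hypothetical monthly income values), fills missing entries with the mean of the observed values, and applies z-score standardization; the source does not specify which imputation or standardization scheme is used, so both choices and all function names are assumptions.

```python
from statistics import mean, pstdev


def fill_missing(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Z-score standardization; returns all zeros when the values are constant."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw field with one missing entry.
incomes = [5000.0, None, 7000.0]
filled = fill_missing(incomes)
scaled = standardize(filled)
```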
S13: mining the plurality of first characteristic data to obtain a plurality of newly added second characteristic data.
The original data input by the user is limited, and the extracted first feature data is more limited. In order to identify the identity of the user more accurately, more implicit feature data needs to be mined out based on limited feature data. The first characteristic data can be directly calculated to obtain new information which is not mined in the original data and is used as the newly added second characteristic data. Alternatively, the second feature data related to the first feature data may be acquired from the network by web crawler technology based on the first feature data.
For example, if the first feature data is the individual's income over the last several months, the average of those values can be calculated and used as the second feature data. As another example, if the first feature data is the individual's occupation and work location, data such as the salary range and average salary for that occupation in the region of the work location can be mined from the network and used as the second feature data.
It should be noted that, the newly added second feature data belongs to the information implicit in the first feature data, and the deficiency of the first feature data can be made up by mining the plurality of first feature data.
Specifically, the mining the plurality of first feature data to obtain a plurality of newly added second feature data includes:
determining the personal declaration project required by the personal declaration request;
acquiring key index parameters matched with the personal declaration project;
and mining a plurality of first feature data based on the key index parameters to obtain newly added second feature data matched with the key index parameters.
Generally, some key index parameters may be preset for a personal declaration project, namely the parameters that determine whether the declaration succeeds. For example, the key index parameter may be set as the individual's average annual income. If the plurality of first feature data are the user's annual incomes in recent years, the average of the plurality of first feature data may be calculated to obtain the user's average annual income in recent years, and this average annual income is used as the second feature data.
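The key-index-driven mining described above can be sketched as follows. The project name, the parameter name, the lookup table, and the feature layout are all hypothetical; only the idea of deriving an average annual income from several annual incomes comes from the text.

```python
from statistics import mean

# Hypothetical mapping from a declaration project to its preset key index parameters.
KEY_INDEX_PARAMETERS = {
    "personal_allowance": ["average_annual_income"],
}


def mine_by_key_index(project, first_features):
    """Derive second feature data matching the project's key index parameters."""
    second_features = {}
    for parameter in KEY_INDEX_PARAMETERS.get(project, []):
        if parameter == "average_annual_income":
            second_features[parameter] = mean(first_features["annual_incomes"])
    return second_features


# Hypothetical first feature data: annual incomes for recent years.
first = {"annual_incomes": [80000, 90000, 100000]}
second = mine_by_key_index("personal_allowance", first)
```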
Specifically, the mining the plurality of first feature data to obtain a plurality of newly added second feature data includes:
classifying the plurality of first feature data according to a preset dimension to obtain the first feature data of a plurality of dimensions;
acquiring a data mining algorithm of each dimension aiming at the first characteristic data of the dimension;
and mining the first feature data of the dimension according to the data mining algorithm to obtain newly added second feature data.
A plurality of preset dimensions, such as a time dimension, a region dimension, and the like, may be preset. Assuming that data which changes with time exists in the plurality of first feature data, for example, data of the latest months of the user, and assuming that data of different regions exist in the plurality of first feature data, the plurality of first feature data may be classified according to preset dimensions, and a corresponding data mining algorithm may be obtained, for example, variance calculation may be performed on the first feature data of the time dimension, mean calculation may be performed on the first feature data of the region dimension, and the calculated result may be used as the newly added second feature data.
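As a minimal sketch of the dimension-based mining above: each first feature is tagged with a dimension, and a per-dimension function is looked up and applied. The text names variance for the time dimension and mean for the region dimension; the use of population variance, the feature names, and the output naming scheme are assumptions.

```python
from statistics import mean, pvariance

# Per-dimension mining algorithm; population variance is an assumed choice.
MINING_ALGORITHMS = {
    "time": pvariance,
    "region": mean,
}


def mine_by_dimension(first_features):
    """first_features maps a feature name to (dimension, list of numeric values)."""
    second_features = {}
    for name, (dimension, values) in first_features.items():
        algorithm = MINING_ALGORITHMS[dimension]
        second_features[f"{name}_{algorithm.__name__}"] = algorithm(values)
    return second_features


# Hypothetical first feature data in two dimensions.
first = {
    "monthly_income": ("time", [5000.0, 5200.0, 4800.0]),
    "regional_salary": ("region", [6000.0, 8000.0]),
}
second = mine_by_dimension(first)
```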
S14: inputting the plurality of first feature data and the plurality of second feature data into a plurality of fusion models trained in advance to obtain an output result of each fusion model, wherein the fusion models are used for performing binary classification on the legality of the personal declaration request.
The output result of each fusion model can be represented by a binary class identifier. For example, the identifier "1" indicates that the personal declaration request is an illegal request and the user is an illegal user; the identifier "0" indicates that the personal declaration request is a legal request and the user is a legal user.
Specifically, the inputting the plurality of first feature data and the plurality of second feature data into a plurality of fusion models trained in advance, and obtaining an output result of each fusion model includes:
inputting a plurality of first feature data and a plurality of second feature data into a plurality of fusion models trained in advance;
judging the validity of the original data according to the plurality of first characteristic data and the plurality of second characteristic data through each fusion model;
if the original data is valid, outputting an output result for representing that the personal declaration request is legal; or
if the original data is invalid, outputting an output result for indicating that the personal declaration request is illegal.
Here, deliberately falsified data, fabricated data that was never actually generated, and other data recognized by authorities as illegal information are considered invalid data; correspondingly, the personal declaration request is considered an illegal request.
For example, assume that the first feature data includes the individual's occupation, work location, monthly income, and monthly expenses, and that data such as the salary range, average salary, and living expenses for that occupation in the region of the work location has been mined from the network as the second feature data. The fusion model compares the first feature data with the second feature data. If the two are seriously inconsistent, for example the personal monthly income in the first feature data far exceeds the salary range in the second feature data, the original data corresponding to the first feature data is invalid; conversely, if the first feature data is consistent with the second feature data, the original data corresponding to the first feature data is valid.
After the plurality of first feature data and the plurality of second feature data are input into the plurality of fusion models trained in advance, the output results of the fusion models may be the same or different, because the analysis logic of each fusion model is different. If any one of the fusion models, analyzing the plurality of first characteristic data and the plurality of second characteristic data, judges that the first characteristic data does not meet the requirements, the original data corresponding to the first characteristic data also does not meet the requirements, that is, the original data is invalid.
According to the scheme, the validity of the original data can be identified by extracting the first characteristic data from the original data, mining the second characteristic data according to the first characteristic data and combining the two-classification judgment of the fusion model, so that the validity of the personal declaration request can be determined, and the validity of the identity of the user who makes the personal declaration request can be effectively inferred.
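The data flow of step S14 can be illustrated with a small sketch. The two "models" below are hand-written rules standing in for trained fusion models, not the trained classifiers the text describes; the feature names and the factor of 3 in the expense rule are purely hypothetical. Each stand-in returns 1 for an illegal request and 0 for a legal one, following the identifiers above.

```python
def salary_range_model(features):
    """Stand-in model: flags income outside the mined salary range."""
    low, high = features["salary_range"]
    return 0 if low <= features["monthly_income"] <= high else 1


def expense_model(features):
    """Stand-in model: flags expenses far above the mined living expense.

    The factor of 3 is an arbitrary illustrative threshold.
    """
    return 1 if features["monthly_expense"] > 3 * features["living_expense"] else 0


def run_models(models, first_features, second_features):
    """Feed first and second feature data to every model; collect the 0/1 outputs."""
    features = {**first_features, **second_features}
    return [model(features) for model in models]


# Hypothetical first feature data (from the user) and second feature data (mined).
first = {"monthly_income": 90000, "monthly_expense": 4000}
second = {"salary_range": (5000, 20000), "living_expense": 3000}
outputs = run_models([salary_range_model, expense_model], first, second)
```

Here the reported income lies far outside the mined salary range, so the first stand-in flags the request while the second does not, which shows how different models can disagree on the same input.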
As an optional implementation manner, before step S11, the user identification method further includes:
acquiring positive sample data of a legal user and negative sample data of an illegal user;
judging whether the proportional value of the quantity of the positive sample data and the quantity of the negative sample data is greater than or equal to a preset threshold value;
if the proportional value of the number of the positive sample data and the number of the negative sample data is greater than or equal to a preset threshold value, adopting an over-sampling negative sample strategy to repeatedly sample the negative sample data so as to keep the number of the repeatedly sampled negative sample data consistent with the number of the positive sample data;
and respectively inputting the positive sample data and the repeatedly sampled negative sample data into a plurality of initial model frames of different types for training to obtain a plurality of trained fusion models.
In this alternative embodiment, a preset threshold may be preset, and the preset threshold is used to measure the balance between the numbers of positive and negative sample data. If the ratio of the number of the positive sample data to the number of the negative sample data is greater than or equal to a preset threshold value, it is indicated that the number of the positive sample data far exceeds the number of the negative sample data, that is, the number of the positive sample data and the number of the negative sample data are in a state of serious imbalance.
The strategy of oversampling negative samples can be adopted so that the number of the two is balanced. Specifically, the negative sample data may be repeatedly sampled, that is, the negative sample data may be repeatedly copied several times to obtain more negative sample data, so that the number of the repeatedly sampled negative sample data is consistent with the number of the positive sample data, and thus, the positive sample data and the repeatedly sampled negative sample data are respectively input into multiple different types of initial model frames for training, and the problem of inaccurate trained model due to serious imbalance between the number of the positive sample and the number of the negative sample does not occur.
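A minimal sketch of the oversampling strategy, assuming samples are held in plain lists and that extra negatives are drawn with replacement until the counts match. The function name, the fixed seed, and the example threshold are assumptions.

```python
import random


def oversample_negatives(positives, negatives, threshold, seed=0):
    """Repeat negative samples until they match the positive count.

    Oversampling is applied only when len(positives) / len(negatives)
    is greater than or equal to the preset threshold.
    """
    if not negatives:
        raise ValueError("at least one negative sample is required")
    if len(positives) / len(negatives) < threshold:
        return list(negatives)
    rng = random.Random(seed)
    shortfall = len(positives) - len(negatives)
    extra = [rng.choice(negatives) for _ in range(shortfall)]
    return list(negatives) + extra


# Hypothetical data: 100 positive samples, 5 negative samples, threshold 10.
positives = list(range(100))
negatives = ["n1", "n2", "n3", "n4", "n5"]
balanced_negatives = oversample_negatives(positives, negatives, threshold=10)
```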
As an optional implementation manner, before step S11, the user identification method further includes:
acquiring positive sample data of a legal user and negative sample data of an illegal user;
judging whether the proportional value of the quantity of the positive sample data and the quantity of the negative sample data is greater than or equal to a preset threshold value;
if the proportional value of the number of the positive sample data and the number of the negative sample data is greater than or equal to a preset threshold value, randomly sampling the positive sample data by adopting an under-sampling positive sample strategy so as to keep the number of the randomly sampled positive sample data consistent with the number of the negative sample data;
and respectively inputting the positive sample data and the negative sample data after random sampling into a plurality of initial model frames of different types for training to obtain a plurality of trained fusion models.
In this alternative embodiment, a preset threshold may be preset, and the preset threshold is used to measure the balance between the numbers of positive and negative sample data. If the ratio of the number of the positive sample data to the number of the negative sample data is greater than or equal to a preset threshold value, it is indicated that the number of the positive sample data far exceeds the number of the negative sample data, that is, the number of the positive sample data and the number of the negative sample data are in a state of serious imbalance.
A strategy of undersampling the positive samples may be employed such that the number of the two is balanced. Specifically, the positive sample data may be randomly sampled, that is, a part of data may be randomly selected from the positive sample data to reduce the number of the positive sample data, so that the number of the positive sample data after random sampling is consistent with the number of the negative sample data, and thus, the positive sample data and the negative sample data after random sampling are respectively input into multiple different types of initial model frames for training, and the problem of inaccurate trained model due to serious imbalance between the number of the positive sample data and the number of the negative sample data does not occur.
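A minimal sketch of the undersampling strategy, assuming samples are held in plain lists and that positives are drawn at random without replacement. The function name, the fixed seed, and the example threshold are assumptions.

```python
import random


def undersample_positives(positives, negatives, threshold, seed=0):
    """Randomly keep as many positive samples as there are negatives.

    Undersampling is applied only when len(positives) / len(negatives)
    is greater than or equal to the preset threshold.
    """
    if not negatives:
        raise ValueError("at least one negative sample is required")
    if len(positives) / len(negatives) < threshold:
        return list(positives)
    rng = random.Random(seed)
    return rng.sample(positives, len(negatives))


# Hypothetical data: 100 positive samples, 5 negative samples, threshold 10.
positives = list(range(100))
negatives = ["n1", "n2", "n3", "n4", "n5"]
reduced_positives = undersample_positives(positives, negatives, threshold=10)
```

Compared with oversampling, this discards most of the positive data, which keeps training sets small but loses information.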
As an optional implementation manner, before step S11, the user identification method further includes:
acquiring positive sample data of a legal user and negative sample data of an illegal user;
judging whether the proportional value of the quantity of the positive sample data and the quantity of the negative sample data is greater than or equal to a preset threshold value;
if the proportional value of the number of the positive sample data and the number of the negative sample data is greater than or equal to a preset threshold value, determining a positive sample weight and a negative sample weight according to the number of the positive sample data and the number of the negative sample data;
setting the positive sample weights and the negative sample weights in a plurality of different types of initial model frames;
and respectively inputting the positive sample data and the negative sample data into a plurality of initial model frames with weights set for training to obtain a plurality of trained fusion models.
In this alternative embodiment, a preset threshold may be preset, and the preset threshold is used to measure the balance between the numbers of positive and negative sample data. If the ratio of the number of the positive sample data to the number of the negative sample data is greater than or equal to a preset threshold value, it is indicated that the number of the positive sample data far exceeds the number of the negative sample data, that is, the number of the positive sample data and the number of the negative sample data are in a state of serious imbalance.
Because the numbers of positive and negative sample data are seriously imbalanced, a positive sample weight and a negative sample weight need to be set in the loss functions of the plurality of different types of initial model frames during model training, according to the number of the positive sample data and the number of the negative sample data. Specifically, if the number of the positive sample data is far greater than the number of the negative sample data, the positive sample weight is set far smaller than the negative sample weight; conversely, if the number of the positive sample data is far smaller than the number of the negative sample data, the positive sample weight is set far greater than the negative sample weight. With this setting, model training better reflects the actual data, so that the finally trained model is more accurate.
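One common way to derive such weights is inverse class frequency, sketched below. The source only states that the weights depend on the two counts and that the majority class receives the smaller weight; the specific formula here is an assumption.

```python
def class_weights(n_positive, n_negative):
    """Inverse-frequency weights: each class contributes equal total weight.

    This formula is an assumed choice; the text only requires the
    majority class to receive the smaller weight.
    """
    if n_positive <= 0 or n_negative <= 0:
        raise ValueError("both classes must have at least one sample")
    total = n_positive + n_negative
    positive_weight = total / (2 * n_positive)
    negative_weight = total / (2 * n_negative)
    return positive_weight, negative_weight


# Hypothetical counts: 900 positive samples, 100 negative samples.
positive_weight, negative_weight = class_weights(900, 100)
```

With these counts the negative weight is 5.0 and the positive weight is about 0.56, so each class carries the same total weight in the loss.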
The plurality of different types of initial model frameworks may be, for example, xgboost, gbdt, lr and scorecard. xgboost allows a custom loss function as long as the function is first-order and second-order differentiable, so by adjusting the form and parameter values of the cost function, the initial model framework can weight different sample categories differently during model training. gbdt also allows a custom loss function as long as the function is first-order differentiable. The lr (logistic regression) algorithm generally uses the logarithmic loss function. The scorecard model is in essence a linear regression model fitted after the variables are binned, and different loss functions such as squared loss and absolute loss can likewise be selected for it.
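To make the differentiability requirement concrete, the following is a minimal sketch of a class-weighted logarithmic loss together with its first-order and second-order derivatives with respect to the raw model score. It is written in plain Python and does not use the API of any particular framework; the function names and the weight values are hypothetical.

```python
import math
from typing import Tuple


def sigmoid(z: float) -> float:
    """Map a raw score to a probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_logloss(y: int, z: float, w_pos: float, w_neg: float) -> float:
    """Log loss for one sample, scaled by the weight of its class (y is 0 or 1)."""
    p = sigmoid(z)
    w = w_pos if y == 1 else w_neg
    return -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))


def weighted_logloss_grad_hess(
    y: int, z: float, w_pos: float, w_neg: float
) -> Tuple[float, float]:
    """First and second derivatives of the weighted log loss with respect to z."""
    p = sigmoid(z)
    w = w_pos if y == 1 else w_neg
    grad = w * (p - y)          # first-order derivative
    hess = w * p * (1.0 - p)    # second-order derivative
    return grad, hess


# Hypothetical weights: the minority (negative) class is weighted more heavily.
print(weighted_logloss_grad_hess(y=0, z=0.0, w_pos=0.5, w_neg=5.0))
```

Because both derivatives exist everywhere, a loss of this form satisfies the first-order and second-order conditions mentioned above; a framework that only needs a first-order derivative would use the gradient alone.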
S15, predicting the legality of the personal declaration request according to the plurality of output results, so as to identify the user.
Specifically, the predicting the validity of the personal declaration request according to the plurality of output results to identify the user includes:
judging whether an output result for representing that the personal declaration request belongs to an illegal request exists in the output results;
if an output result indicating that the personal declaration request is an illegal request exists among the plurality of output results, determining that the user is an illegal user;
and if no output result indicating that the personal declaration request is an illegal request exists among the plurality of output results, determining that the user is a legal user.
Among the plurality of output results, if at least one output result indicates that the personal declaration request is an illegal request (for example, 1, 2, 3, or all 4 of 4 output results indicate an illegal request), the user may be determined to be an illegal user. Otherwise, if no output result indicates that the personal declaration request is an illegal request (for example, all 4 output results indicate a legal request), the user may be determined to be a legal user.
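The decision rule above amounts to flagging the user as soon as any single model flags the request. A minimal Python sketch, assuming each fusion model reports one of the two hypothetical labels `"legal"` and `"illegal"`:

```python
from typing import Sequence

LEGAL = "legal"
ILLEGAL = "illegal"


def identify_user(outputs: Sequence[str]) -> str:
    """Return ILLEGAL if any model output is ILLEGAL, otherwise LEGAL."""
    if not outputs:
        raise ValueError("at least one model output is required")
    return ILLEGAL if any(o == ILLEGAL for o in outputs) else LEGAL


# One of four hypothetical model outputs flags the request.
print(identify_user([LEGAL, LEGAL, ILLEGAL, LEGAL]))
```

This rule favors catching illegal requests over avoiding false alarms, since a single dissenting model is enough to reject the user.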
In the scheme, the validity of the personal declaration request is predicted through a plurality of output results of a plurality of fusion models, so that the prediction error caused by the defect of one fusion model can be avoided, and the prediction accuracy can be improved.
In the method flow described in fig. 1, after the original data input by the user is received, the first feature data can be extracted from the original data and then mined to obtain newly added second feature data. This increases the amount of feature data, so that sufficient feature data is available for detection by the models.
The above description is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and it will be apparent to those skilled in the art that modifications may be made without departing from the inventive concept of the present invention, and these modifications are within the scope of the present invention.
Fig. 2 is a functional block diagram of a user identification apparatus according to a preferred embodiment of the present invention.
In some embodiments, the user identification apparatus runs in an electronic device. The user identification apparatus may comprise a plurality of functional modules composed of program code segments. The program code of each program segment in the user identification apparatus may be stored in the memory and executed by the at least one processor to perform some or all of the steps of the user identification method described in fig. 1. For details, refer to the related description of fig. 1, which is not repeated here.
In this embodiment, the user identification apparatus may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a receiving module 201, an extracting module 202, a mining module 203, an input module 204 and a predicting module 205. A module as referred to herein is a series of computer program segments that are stored in the memory, can be executed by at least one processor, and perform a fixed function.
The receiving module 201 is configured to receive a personal declaration request that is input by a user and carries original data.
The extracting module 202 is configured to extract a plurality of first feature data meeting the declaration requirement from the original data.
The mining module 203 is configured to mine the plurality of first feature data to obtain a plurality of newly added second feature data.
The input module 204 is configured to input the plurality of first feature data and the plurality of second feature data into a plurality of fusion models trained in advance and obtain an output result of each fusion model, where the fusion models are used to perform binary classification on the legality of the personal declaration request.
The predicting module 205 is configured to perform validity prediction on the personal declaration request according to the plurality of output results, so as to identify the user.
In the user identification apparatus described in fig. 2, after the original data input by the user is received, the first feature data can be extracted from the original data and then mined to obtain newly added second feature data. This increases the amount of feature data, so that sufficient feature data is available for detection by the models.
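One possible way to arrange the five functional modules in code is sketched below in Python. The class name `UserIdentifier`, the dictionary-based request format, and the representation of miners and models as plain callables are assumptions made for illustration and are not taken from the embodiment.

```python
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]
Miner = Callable[[Features], Features]  # derives second feature data
Model = Callable[[Features], bool]      # True means "illegal request"


class UserIdentifier:
    """Hypothetical arrangement of modules 201 to 205 as methods."""

    def __init__(self, required: Sequence[str], miners: Sequence[Miner],
                 models: Sequence[Model]) -> None:
        self.required = list(required)
        self.miners = list(miners)
        self.models = list(models)

    def receive(self, request: Dict[str, Features]) -> Features:
        # Receiving module 201: take the original data out of the request.
        return request["raw_data"]

    def extract(self, raw: Features) -> Features:
        # Extracting module 202: keep only fields the declaration requires.
        return {name: raw[name] for name in self.required if name in raw}

    def mine(self, first: Features) -> Features:
        # Mining module 203: derive newly added second feature data.
        second: Features = {}
        for miner in self.miners:
            second.update(miner(first))
        return second

    def run_models(self, first: Features, second: Features) -> List[bool]:
        # Input module 204: feed both feature sets to every model.
        merged = {**first, **second}
        return [model(merged) for model in self.models]

    def predict(self, outputs: Sequence[bool]) -> str:
        # Predicting module 205: any illegal output makes the user illegal.
        return "illegal" if any(outputs) else "legal"

    def identify(self, request: Dict[str, Features]) -> str:
        first = self.extract(self.receive(request))
        second = self.mine(first)
        return self.predict(self.run_models(first, second))
```

In use, the trained fusion models would be wrapped as the `models` callables, and each mining rule would be supplied as one entry in `miners`.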
Fig. 3 is a schematic structural diagram of an electronic device implementing a user identification method according to a preferred embodiment of the present invention. The electronic device 3 comprises a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.
Those skilled in the art will appreciate that the schematic diagram shown in fig. 3 is merely an example of the electronic device 3 and does not limit it. The electronic device 3 may include more or fewer components than those shown, may combine some components, or may use different components; for example, the electronic device 3 may further include an input/output device, a network access device, and the like.
The electronic device 3 is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and the hardware thereof includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud computing (cloud computing) based cloud consisting of a large number of hosts or network servers. The user device includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), or the like.
The at least one Processor 32 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The processor 32 may be a microprocessor or the processor 32 may be any conventional processor or the like, and the processor 32 is a control center of the electronic device 3 and connects various parts of the whole electronic device 3 by various interfaces and lines.
The memory 31 may be used to store the computer program 33 and/or the module/unit, and the processor 32 may implement various functions of the electronic device 3 by running or executing the computer program and/or the module/unit stored in the memory 31 and calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data) created according to the use of the electronic device 3, and the like. Further, the memory 31 may include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other non-volatile solid state storage device.
With reference to fig. 1, the memory 31 of the electronic device 3 stores a plurality of instructions to implement a user identification method, and the processor 32 executes the plurality of instructions to implement:
receiving a personal declaration request which is input by a user and carries original data;
extracting a plurality of first characteristic data which meet the declaration requirement from the original data;
mining a plurality of first characteristic data to obtain a plurality of newly added second characteristic data;
inputting the first characteristic data and the second characteristic data into a plurality of fusion models trained in advance to obtain an output result of each fusion model, wherein the fusion models are used for carrying out binary classification on the legality of the personal declaration request;
and according to the output results, carrying out validity prediction on the personal declaration request so as to identify the user.
Specifically, for the way in which the processor 32 implements the above instructions, refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here.
In the electronic device 3 described in fig. 3, after the original data input by the user is received, the first feature data can be extracted from the original data and then mined to obtain newly added second feature data. This increases the amount of feature data, so that sufficient feature data is available for detection by the models.
The integrated modules/units of the electronic device 3 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, and Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A user identification method is characterized by comprising the following steps:
receiving a personal declaration request which is input by a user and carries original data;
extracting a plurality of first characteristic data which meet the declaration requirement from the original data;
mining a plurality of first characteristic data to obtain a plurality of newly added second characteristic data;
inputting the first characteristic data and the second characteristic data into a plurality of fusion models trained in advance to obtain an output result of each fusion model, wherein the fusion models are used for carrying out binary classification on the legality of the personal declaration request;
and according to the output results, carrying out validity prediction on the personal declaration request so as to identify the user.
2. The method according to claim 1, wherein the mining the plurality of first feature data to obtain a plurality of newly added second feature data comprises:
determining the personal declaration project required by the personal declaration request;
acquiring key index parameters matched with the personal declaration project;
and mining a plurality of first feature data based on the key index parameters to obtain newly added second feature data matched with the key index parameters.
3. The method according to claim 1, wherein the mining the plurality of first feature data to obtain a plurality of newly added second feature data comprises:
classifying the plurality of first feature data according to a preset dimension to obtain the first feature data of a plurality of dimensions;
acquiring a data mining algorithm of each dimension aiming at the first characteristic data of the dimension;
and mining the first feature data of the dimension according to the data mining algorithm to obtain newly added second feature data.
4. The method according to claim 1, wherein the inputting the plurality of first feature data and the plurality of second feature data into a plurality of fusion models trained in advance, and obtaining the output result of each fusion model comprises:
inputting a plurality of first feature data and a plurality of second feature data into a plurality of fusion models trained in advance;
judging the validity of the original data according to the plurality of first characteristic data and the plurality of second characteristic data through each fusion model;
if the original data is valid, outputting an output result for representing that the personal declaration request is legal; or
if the original data is invalid, outputting an output result for indicating that the personal declaration request is illegal.
5. The method of claim 1, wherein the predicting the validity of the personal declaration request according to the plurality of output results to identify the user comprises:
judging whether an output result for representing that the personal declaration request belongs to an illegal request exists in the output results;
if an output result indicating that the personal declaration request is an illegal request exists among the plurality of output results, determining that the user is an illegal user;
and if no output result indicating that the personal declaration request is an illegal request exists among the plurality of output results, determining that the user is a legal user.
6. The method according to any one of claims 1 to 5, wherein before receiving the personal declaration request carrying the original data input by the user, the method further comprises:
acquiring positive sample data of a legal user and negative sample data of an illegal user;
judging whether the proportional value of the quantity of the positive sample data and the quantity of the negative sample data is greater than or equal to a preset threshold value;
if the proportional value of the number of the positive sample data and the number of the negative sample data is greater than or equal to a preset threshold value, adopting an over-sampling negative sample strategy to repeatedly sample the negative sample data so as to keep the number of the repeatedly sampled negative sample data consistent with the number of the positive sample data;
and respectively inputting the positive sample data and the repeatedly sampled negative sample data into a plurality of initial model frames of different types for training to obtain a plurality of trained fusion models.
7. The method according to any one of claims 1 to 5, wherein before receiving the personal declaration request carrying the original data input by the user, the method further comprises:
acquiring positive sample data of a legal user and negative sample data of an illegal user;
judging whether the proportional value of the quantity of the positive sample data and the quantity of the negative sample data is greater than or equal to a preset threshold value;
if the proportional value of the number of the positive sample data and the number of the negative sample data is greater than or equal to a preset threshold value, randomly sampling the positive sample data by adopting an under-sampling positive sample strategy so as to keep the number of the randomly sampled positive sample data consistent with the number of the negative sample data;
and respectively inputting the positive sample data and the negative sample data after random sampling into a plurality of initial model frames of different types for training to obtain a plurality of trained fusion models.
8. The method according to any one of claims 1 to 5, wherein before receiving the personal declaration request carrying the original data input by the user, the method further comprises:
acquiring positive sample data of a legal user and negative sample data of an illegal user;
judging whether the proportional value of the quantity of the positive sample data and the quantity of the negative sample data is greater than or equal to a preset threshold value;
if the proportional value of the number of the positive sample data and the number of the negative sample data is greater than or equal to a preset threshold value, determining a positive sample weight and a negative sample weight according to the number of the positive sample data and the number of the negative sample data;
setting the positive sample weights and the negative sample weights in a plurality of different types of initial model frames;
and respectively inputting the positive sample data and the negative sample data into a plurality of initial model frames with weights set for training to obtain a plurality of trained fusion models.
9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the user identification method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the user identification method according to any one of claims 1 to 8.
CN202010476727.2A 2020-05-29 2020-05-29 User identity recognition method, electronic device and storage medium Pending CN111651500A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010476727.2A CN111651500A (en) 2020-05-29 2020-05-29 User identity recognition method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010476727.2A CN111651500A (en) 2020-05-29 2020-05-29 User identity recognition method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN111651500A true CN111651500A (en) 2020-09-11

Family

ID=72346875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010476727.2A Pending CN111651500A (en) 2020-05-29 2020-05-29 User identity recognition method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN111651500A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732846A (en) * 2021-01-27 2021-04-30 深圳市科荣软件股份有限公司 Water affair operation analysis system, method, electronic equipment and storage medium
CN112784888A (en) * 2021-01-12 2021-05-11 中国银联股份有限公司 User identification method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109936561A (en) * 2019-01-08 2019-06-25 平安科技(深圳)有限公司 User request detection method and device, computer equipment and storage medium
CN110163242A (en) * 2019-04-03 2019-08-23 阿里巴巴集团控股有限公司 Risk Identification Method, device and server
CN110555717A (en) * 2019-07-29 2019-12-10 华南理工大学 method for mining potential purchased goods and categories of users based on user behavior characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination