Disclosure of Invention
The embodiments of the present application aim to provide a scheme for extracting device features in risk control management.
To solve the above technical problem, the embodiments of the present application are implemented as follows:
In a first aspect, the embodiments of the present specification provide a feature extraction method for a device, including:
acquiring device information of a device and environment information of the device;
performing numerical conversion on the device information to generate corresponding device information values;
aggregating the device information values to generate a device identifier, where the device identifier is used to identify devices of the same type;
generating a feature vector containing the device identifier and the environment information, and determining the service processing count corresponding to the feature vector within the same time window;
and determining the service processing count corresponding to the feature vector as a feature value of the training sample corresponding to the device, so as to train a device risk identification model.
In another aspect, the embodiments of the present specification further provide a risk identification method for a device based on the device risk identification model, including:
acquiring device information of a device to be detected and environment information of the device;
performing numerical conversion on the device information to generate corresponding device information values;
aggregating the device information values to generate a device identifier, where the device identifier is used to identify devices of the same type;
generating a feature vector containing the device identifier and the environment information, obtaining the service processing count corresponding to the feature vector within the same time window, and determining that count as the feature value of the feature vector of the device to be detected;
and evaluating the risk degree of the device to be detected with the device risk identification model, based on the feature value of the feature vector of the device to be detected.
Correspondingly, the embodiments of the present specification further provide a feature extraction apparatus for a device, including:
an acquisition module, which acquires device information of a device and environment information of the device;
a conversion module, which performs numerical conversion on the device information to generate corresponding device information values;
an aggregation module, which aggregates the device information values to generate a device identifier, where the device identifier is used to identify devices of the same type;
a generating module, which generates a feature vector containing the device identifier and the environment information and determines the service processing count corresponding to the feature vector within the same time window;
and a feature value determining module, which determines the service processing count corresponding to the feature vector as a feature value of the training sample corresponding to the device, so as to train a device risk identification model.
Corresponding to the other aspect, the embodiments of the present specification further provide an apparatus for identifying device risk based on a device risk identification model, including:
an acquisition module, which acquires device information of a device to be detected and environment information of the device;
a conversion module, which performs numerical conversion on the device information to generate corresponding device information values;
an aggregation module, which aggregates the device information values to generate a device identifier, where the device identifier is used to identify devices of the same type;
a generating module, which generates a feature vector containing the device identifier and the environment information, obtains the service processing count corresponding to the feature vector within the same time window, and determines that count as the feature value of the feature vector of the device to be detected;
and a risk identification module, which evaluates the risk degree of the device to be detected with the device risk identification model, based on the feature value of the feature vector of the device to be detected.
With the solution provided by the embodiments of the present specification, multiple pieces of device information and environment information that are not easily tampered with are collected to jointly form a feature vector of the device, and the feature value of that feature vector is generated from the service processing count. The feature values can then be used as training samples for model training and risk identification. This improves the accuracy with which the risk identification model identifies devices, prevents the device-dimension information from being defeated at a single point, and improves the overall stability and accuracy of device risk identification.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the embodiments herein.
Moreover, not every embodiment of the present specification needs to achieve all of the effects described above.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present specification, the technical solutions in the embodiments of the present specification will be described in detail below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, rather than all embodiments.
When marketers run business promotions, there are often many promotional activities, such as issuing red packets, cash, and vouchers. In this process, a risk control system commonly identifies devices by device fingerprint: for example, the local MAC address, International Mobile Equipment Identity (IMEI), International Mobile Subscriber Identity (IMSI), baseband, version number, and so on are used as the device fingerprint to locate a device.
With this approach, black-market groups often use tampering tools to modify key phone parameters so that the device fingerprint changes. Because changing even a single character is enough, a black-market group can disguise a small number of devices as an unlimited number of devices through continuous tampering, thereby bypassing interception by the risk control system, claiming marketing rewards without limit, and causing losses. The core problem is that the device features in the risk identification model rely on too few dimensions and are therefore easily bypassed.
Based on this, the embodiments of the present specification provide a feature extraction method for a device, which is used for training a risk identification model. As shown in FIG. 1, which is a schematic flowchart of the feature extraction method for a device, the method specifically includes the following steps:
S101, acquiring device information of a device and environment information of the device.
As previously mentioned, the device may be a user terminal such as a cell phone, tablet, personal computer, etc.
In the embodiments of the present specification, the device information may include strong device information such as the aforementioned IMEI, IMSI, baseband, and version number, and may also include weak device information such as device brand, device model, processor frequency, ring volume, call volume, alarm volume, remaining battery capacity, remaining device memory capacity, or remaining memory card capacity. In other words, the acquired device information is information that users do not easily or frequently modify.
The environment information of the device may include the Internet Protocol (IP) address of the device, the media access control (MAC) address of the device, or the real physical address of the device (e.g., latitude and longitude coordinates obtained by a positioning module of the device).
S103, performing numerical conversion on the device information to generate corresponding device information values.
Specifically, for non-variable device information such as device brand, device model, and processor frequency (here, "non-variable" means the information does not change naturally while the user uses the device), the conversion may be performed with a preset mapping table. Table 1 is a mapping table between device brands and device information values provided by the embodiments of the present specification.
| Device brand | Device information value |
| --- | --- |
| Apple | 1 |
| Huawei | 2 |
| Honor | 3 |
| Vivo | 4 |
| …… | …… |
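A minimal sketch of the Table 1 lookup, assuming the preset mapping table is held in memory as a dictionary. Only the rows shown in Table 1 are used, and the fallback value `UNKNOWN_VALUE` for brands missing from the table is a hypothetical choice that the specification does not define.

```python
# Preset mapping table for non-variable device information, using the rows
# listed in Table 1; the elided rows ("……") are deliberately not filled in.
BRAND_TABLE: dict[str, int] = {"Apple": 1, "Huawei": 2, "Honor": 3, "Vivo": 4}

# Hypothetical value for brands missing from the table (not defined in the text).
UNKNOWN_VALUE = 0


def map_fixed_info(raw_value: str, table: dict[str, int]) -> int:
    """Convert a non-variable device attribute into its device information value."""
    return table.get(raw_value.strip(), UNKNOWN_VALUE)


print(map_fixed_info("Huawei", BRAND_TABLE))  # 2
```

The same function would serve device model or processor frequency if each were given its own preset table.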
For variable device information such as ring volume, call volume, alarm volume, remaining battery capacity, remaining device memory capacity, or remaining memory card capacity, the numerical conversion may be performed with a preset algorithm based on the current value of the information.
For example, for a variable item of device information, the percentage coefficient of that item on the device is obtained; the percentage coefficient describes the remaining available proportion of the item. The coefficient interval to which the percentage coefficient belongs is then determined, and the device information value corresponding to that coefficient interval is determined according to a preset correspondence between intervals and values.
For example, suppose the remaining battery proportion from 0 to 100% is divided in advance into five equal intervals that correspond, in order, to the values 1 to 5. If the remaining battery level of a device is 50%, the percentage coefficient 0.5 falls in the coefficient interval [0.4, 0.6], so the corresponding device information value is 3.
Mapping variable device information to interval values in this way reduces the deviation caused by small fluctuations in the information and improves the stability of the sample features.
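The interval mapping can be sketched as follows, assuming the five equal battery intervals from the example. The text writes the middle interval as [0.4, 0.6]; this sketch assumes half-open intervals, so a coefficient that lands exactly on a boundary goes to the upper interval. The names `BATTERY_BOUNDS` and `interval_value` are illustrative only.

```python
from bisect import bisect_right

# Upper bounds of five equal intervals over [0, 1] for the battery example:
# [0, 0.2) -> 1, [0.2, 0.4) -> 2, [0.4, 0.6) -> 3, [0.6, 0.8) -> 4, [0.8, 1] -> 5
BATTERY_BOUNDS = (0.2, 0.4, 0.6, 0.8)


def interval_value(coefficient: float, bounds: tuple[float, ...] = BATTERY_BOUNDS) -> int:
    """Map a percentage coefficient (remaining proportion in [0, 1]) to a 1-based interval value."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("percentage coefficient must lie in [0, 1]")
    # bisect_right returns how many bounds are <= coefficient, i.e. the 0-based interval index.
    return bisect_right(bounds, coefficient) + 1


print(interval_value(0.50))  # 3, as in the 50% battery example
print(interval_value(0.48), interval_value(0.53))  # 3 3: small fluctuations map to the same value
```

Using `bisect` keeps the lookup correct for unequal interval widths as well; only the bounds tuple changes.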
S105, aggregating the device information values to generate a device identifier, where the device identifier is used to identify devices of the same type.
The aggregation may be performed by splicing the device information values in a specified order to generate a character string containing the device information values, which serves as the device identifier. For example, if the spliced string is "112141336", it represents, in order, device brand 1, device model 1, processor frequency 2, ring volume 1, call volume 4, alarm volume 1, remaining battery capacity 3, remaining device memory capacity 3, and remaining memory card capacity 6.
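A sketch of the splicing step, assuming the nine values from the example are stored under hypothetical field names and spliced in a fixed order:

```python
# Hypothetical field names for the nine attributes, in the specified splicing order.
FIELD_ORDER = (
    "brand", "model", "cpu_frequency", "ring_volume", "call_volume",
    "alarm_volume", "battery", "memory", "memory_card",
)


def device_identifier(values: dict[str, int], order: tuple[str, ...] = FIELD_ORDER) -> str:
    """Splice device information values in a fixed order into an identifier string."""
    return "".join(str(values[field]) for field in order)


example = dict(zip(FIELD_ORDER, (1, 1, 2, 1, 4, 1, 3, 3, 6)))
print(device_identifier(example))  # 112141336
```

Plain splicing is unambiguous only while every value is a single digit. If a preset table ever produced values of 10 or more, padding each value to a fixed width (for example `f"{v:02d}"`) would be one way to keep identifiers unambiguous; that is a design choice of this sketch, not something the text specifies.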
Alternatively, different device information values may be processed further in different ways before aggregation: for example, the values of non-variable device information may be encoded individually, while the values of variable device information are further generalized, and the results are then aggregated to obtain the device identifier.
It is easy to see that although the generated device identifier can already characterize a device, in practice the devices of other users often have the same or similar device information and may therefore yield the same device identifier.
S107, generating a feature vector containing the device identifier and the environment information, and determining the service processing count corresponding to the feature vector within the same time window.
Specifically, the IP address of the device can be obtained and a first feature vector devicetag_ip_variable_category containing the device identifier and the IP address generated; or the MAC address of the device can be obtained and a second feature vector devicetag_mac_variable_category containing the device identifier and the MAC address generated; or the real physical address of the device can be obtained and a third feature vector devicetag_lbs_variable_category containing the device identifier and the real physical address generated.
For a given device, one or more of the three feature vectors may be included in the training sample corresponding to that device.
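One possible representation of the three feature vectors, assuming each is simply a (device identifier, environment value) pair keyed by its name; the `Environment` type and `feature_vectors` function are hypothetical:

```python
from typing import NamedTuple, Optional


class Environment(NamedTuple):
    """Environment information; any field may be missing for a given device."""
    ip: Optional[str] = None
    mac: Optional[str] = None
    lbs: Optional[tuple[float, float]] = None  # (latitude, longitude)


def feature_vectors(device_id: str, env: Environment) -> dict[str, tuple]:
    """Build one feature vector per available environment attribute."""
    candidates = {
        "devicetag_ip_variable_category": env.ip,
        "devicetag_mac_variable_category": env.mac,
        "devicetag_lbs_variable_category": env.lbs,
    }
    return {name: (device_id, value) for name, value in candidates.items() if value is not None}


print(feature_vectors("112141336", Environment(ip="ip1")))
# {'devicetag_ip_variable_category': ('112141336', 'ip1')}
```

Raw latitude and longitude values would rarely repeat exactly, so a practical version would probably round or grid the coordinates before building devicetag_lbs_variable_category; the text does not say how this is done.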
Further, for a given feature vector, the service processing count corresponding to the feature vector within the same time window can be obtained, where the service processing count includes the number of transactions, the number of rewards claimed, the number of accounts, and so on. The time window can be preset, for example, as the 24 hours preceding the current time.
For example, for the first feature vector devicetag_ip_variable_category, if the feature vector takes the form "(112141336, ip1)", then the number of rewards claimed N1, the number of transactions N2, or the number of accounts N3 under the device identifier "112141336" and the IP address "ip1" needs to be obtained from the full sample data (usually historical data within a certain period).
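A sketch of counting service processing events per feature vector within a 24-hour window, assuming historical records are available as (timestamp, device_id, ip, event_type) tuples. The record layout, event names, and sample data are illustrative assumptions:

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical record layout: (timestamp, device_id, ip, event_type).


def window_counts(records, event_type, now, window=timedelta(hours=24)):
    """Count events of one type per (device_id, ip) feature vector within [now - window, now]."""
    start = now - window
    return Counter(
        (device_id, ip)
        for ts, device_id, ip, etype in records
        if etype == event_type and start <= ts <= now
    )


now = datetime(2024, 1, 2, 12, 0)
records = [
    (datetime(2024, 1, 2, 9, 0), "112141336", "ip1", "reward_claim"),
    (datetime(2024, 1, 2, 10, 0), "112141336", "ip1", "reward_claim"),
    (datetime(2024, 1, 1, 8, 0), "112141336", "ip1", "reward_claim"),  # 28 h earlier: outside window
    (datetime(2024, 1, 2, 11, 0), "112141336", "ip1", "transaction"),
    (datetime(2024, 1, 2, 11, 30), "112141336", "ip2", "reward_claim"),
]
n1 = window_counts(records, "reward_claim", now)[("112141336", "ip1")]
print(n1)  # 2
```

This covers event counts such as N1 and N2. The number of accounts N3 would instead count distinct account IDs per feature vector, which this sketch does not model.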
S109, determining the service processing count corresponding to the feature vector as the feature value of the training sample corresponding to the device, so as to train the device risk identification model.
As described above, the corresponding feature value N1 can be determined, and the feature value N1 corresponding to the first feature vector "(112141336, ip1)" is used as the first feature value of the training sample corresponding to the device.
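Assembling a training sample from these counts might look like the following, assuming a fixed column order over the three feature vector types and a value of 0 for a feature vector the device does not have (both assumptions, not from the text):

```python
from collections import Counter

FEATURE_NAMES = (
    "devicetag_ip_variable_category",
    "devicetag_mac_variable_category",
    "devicetag_lbs_variable_category",
)


def training_row(vectors: dict[str, tuple], counts: dict[str, Counter]) -> list[int]:
    """Return one feature value per feature vector type, in FEATURE_NAMES order.

    vectors: feature name -> feature vector, e.g. ("112141336", "ip1")
    counts:  feature name -> service processing counts per feature vector
    A feature vector the device does not have is given the value 0 (an assumption).
    """
    return [
        counts.get(name, Counter())[vectors[name]] if name in vectors else 0
        for name in FEATURE_NAMES
    ]


vectors = {"devicetag_ip_variable_category": ("112141336", "ip1")}
counts = {"devicetag_ip_variable_category": Counter({("112141336", "ip1"): 2})}
print(training_row(vectors, counts))  # [2, 0, 0]
```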
The device identifier identifies a type of device, but once it is combined with environment information, the resulting feature vector can identify an individual device. In practice, the device information that the embodiments of this specification obtain from different devices can be considered sufficiently distinct under a given environment condition: for example, under the same IP address or the same latitude and longitude coordinates, the device information of two different devices is essentially never identical. The feature vector can therefore serve as a sample feature in risk identification and take part in model training and scoring.
Further, the embodiments of the present specification can also perform model training based on the feature values of the training samples corresponding to the aforementioned devices.
That is, in the embodiments shown, it can be determined in advance whether certain devices are risky (i.e., whether they are black-market devices whose device information has been modified), and feature values can be extracted from these black-market samples through the foregoing steps and used as negative samples in training. In supervised learning, each training sample can thus be given a corresponding label (black-market or not), and a first device risk identification model can be trained on the feature values of the training samples to evaluate whether a device is a black-market device.
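The specification does not name a supervised model. As one illustrative choice, the sketch below trains a logistic regression by plain gradient descent on the feature values. Two assumptions are built in: counts are scaled with log1p, and black-market devices are encoded as label 1 so that the output reads as a risk probability (the text calls them negative samples; which class gets which label is only a convention). The training data is made up for illustration.

```python
import math


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.1, epochs=5000):
    """Full-batch gradient descent for logistic regression on log1p-scaled feature values.

    rows:   feature values per training sample, e.g. [[N1], [N1], ...]
    labels: 1 = risky (black-market) device, 0 = normal device
    """
    xs = [[math.log1p(v) for v in row] for row in rows]  # damp very large counts
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(xs, labels):
            err = _sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(xs)
    return weights, bias


def risk_score(model, row) -> float:
    """Estimated probability that the device behind these feature values is risky."""
    weights, bias = model
    return _sigmoid(sum(w * math.log1p(v) for w, v in zip(weights, row)) + bias)


# Illustrative data only: reward-claim counts N1 for four normal and four black-market devices.
model = train_logistic([[1], [2], [1], [3], [40], [55], [60], [80]], [0, 0, 0, 0, 1, 1, 1, 1])
print(risk_score(model, [100]) > 0.5, risk_score(model, [1]) < 0.5)
```

A production system would more likely use an established library and a held-out evaluation set; the hand-written loop only keeps the sketch dependency-free.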
Alternatively, when the sample devices are not labeled, the embodiments of the present specification may perform unsupervised clustering model training based on the feature values of the sample features. Clustering groups devices with similar features together, yielding a second device risk identification model for classification, which is used to evaluate whether a device is a black-market device.
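For the unsupervised variant, k-means is used here as a stand-in clustering method (the text does not name one). The sketch uses deterministic farthest-point initialisation so the result is reproducible; deciding which cluster corresponds to black-market devices is left to analysis of the clusters and is not shown. The sample points are made up for illustration.

```python
from math import dist


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialisation.

    points: feature-value tuples, one per sample device, e.g. (N1, N2)
    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
    return centroids, labels


samples = [(1, 0), (2, 1), (1, 1), (50, 40), (60, 45), (55, 42)]
_, labels = kmeans(samples, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```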
In another aspect, after the device risk identification model has been trained, the embodiments of the present specification further provide a device risk identification method based on that model. As shown in FIG. 2, which is a schematic flowchart of the device risk identification method provided by the present specification, the method includes:
S201, acquiring device information of a device to be detected and environment information of the device;
S203, performing numerical conversion on the device information to generate corresponding device information values;
S205, aggregating the device information values to generate a device identifier, where the device identifier is used to identify devices of the same type;
S207, generating a feature vector containing the device identifier and the environment information, obtaining the service processing count corresponding to the feature vector within the same time window, and determining that count as the feature value of the feature vector of the device to be detected;
S209, evaluating the risk degree of the device to be detected with the device risk identification model, based on the feature value of the feature vector of the device to be detected.
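Steps S201 to S209 can be strung together as in the sketch below. It is heavily simplified and assumes: only two device attributes (brand via a table, battery via intervals), only the IP feature vector, counts already computed for the time window, and a one-feature logistic model whose `weight` and `bias` are hypothetical stand-ins for a trained device risk identification model.

```python
import math
from bisect import bisect_right
from collections import Counter

# Hypothetical, heavily reduced inputs: one table-mapped field and one interval-mapped field.
BRAND_TABLE = {"Apple": 1, "Huawei": 2, "Honor": 3, "Vivo": 4}
BATTERY_BOUNDS = (0.2, 0.4, 0.6, 0.8)


def identify_risk(info, ip, window_counts, weight, bias, threshold=0.5):
    """Run S203-S209 for one device and return (device_id, feature value, score, is_risky).

    info:          acquired device information (S201), e.g. {"brand": ..., "battery": ...}
    window_counts: service processing counts per (device_id, ip) within the time window
    weight, bias:  parameters of an already trained logistic model (hypothetical)
    """
    values = [BRAND_TABLE.get(info["brand"], 0),                  # S203: table mapping
              bisect_right(BATTERY_BOUNDS, info["battery"]) + 1]  # S203: interval mapping
    device_id = "".join(str(v) for v in values)                   # S205: splice
    feature_value = window_counts[(device_id, ip)]                # S207: count in window
    score = 1.0 / (1.0 + math.exp(-(weight * math.log1p(feature_value) + bias)))  # S209
    return device_id, feature_value, score, score >= threshold


counts = Counter({("23", "ip1"): 60})
print(identify_risk({"brand": "Huawei", "battery": 0.5}, "ip1", counts, weight=2.0, bias=-6.0))
```

Because the counts are held in a `Counter`, a feature vector never seen in the window simply gets the feature value 0 instead of raising an error.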
Correspondingly, the embodiments of the present specification further provide a feature extraction apparatus for a device. As shown in FIG. 3, which is a schematic structural diagram of the feature extraction apparatus for a device provided in the embodiments of the present specification, the apparatus includes:
an obtaining module 301, configured to obtain device information of the device and environment information of the device;
a conversion module 303, which performs numerical conversion on the device information to generate corresponding device information values;
an aggregation module 305, which aggregates the device information values to generate a device identifier, where the device identifier is used to identify devices of the same type;
a generating module 307, configured to generate a feature vector containing the device identifier and the environment information and to determine the service processing count corresponding to the feature vector within the same time window;
and a feature value determination module 309, which determines the service processing count corresponding to the feature vector as the feature value of the training sample corresponding to the device, so as to train the device risk identification model.
Further, the device information includes at least one of device brand, device model, processor frequency, ring volume, call volume, alarm volume, remaining battery capacity, remaining device memory capacity, or remaining memory card capacity. The conversion module 303 obtains the percentage coefficient of the ring volume, call volume, alarm volume, remaining battery capacity, remaining device memory capacity, or remaining memory card capacity, determines the coefficient interval to which the percentage coefficient belongs, and determines the device information value corresponding to that coefficient interval according to a preset correspondence between intervals and values.
Further, the aggregation module 305 splices the device information values in a specified order to generate a character string containing the device information values, and determines the character string as the device identifier.
Further, the generating module 307 obtains the IP address of the device and generates a first feature vector containing the device identifier and the IP address; or obtains the MAC address of the device and generates a second feature vector containing the device identifier and the MAC address; or obtains the real physical address of the device and generates a third feature vector containing the device identifier and the real physical address.
Further, the apparatus also includes a model training module 311, which performs supervised or unsupervised model training according to the feature values of the training samples corresponding to the devices to generate the device risk identification model.
Corresponding to the other aspect, the embodiments of the present specification further provide a device risk identification apparatus based on the aforementioned device risk identification model. As shown in FIG. 4, which is a schematic structural diagram of the device risk identification apparatus provided in the embodiments of the present specification, the apparatus includes:
an acquiring module 401, which acquires device information of the device to be detected and environment information of the device;
a conversion module 403, which performs numerical conversion on the device information to generate corresponding device information values;
an aggregation module 405, which aggregates the device information values to generate a device identifier, where the device identifier is used to identify devices of the same type;
a generating module 407, configured to generate a feature vector containing the device identifier and the environment information, obtain the service processing count corresponding to the feature vector within the same time window, and determine that count as the feature value of the feature vector of the device to be detected;
and a risk identification module 409, configured to evaluate the risk degree of the device to be detected with the device risk identification model, based on the feature value of the feature vector of the device to be detected.
The embodiments of the present specification further provide a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the feature extraction method shown in FIG. 1 when executing the program.
The embodiments of the present specification further provide a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the device risk identification method shown in FIG. 2 when executing the program.
FIG. 5 is a more specific schematic diagram of the hardware structure of a computing device. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050, where the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively connected to one another within the device through the bus 1050.
The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented as ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and invoked and executed by the processor 1010.
The input/output interface 1030 is used to connect input/output modules for inputting and outputting information. An input/output module may be configured in the device as a component (not shown in the figure) or may be external to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and the like; output devices may include a display, a speaker, a vibrator, an indicator light, and the like.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
The bus 1050 includes a path that carries information between the components of the device, such as the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The embodiments of the present specification further provide a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the feature extraction method shown in FIG. 1.
The embodiments of the present specification further provide a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the device risk identification method shown in FIG. 2.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
Based on such an understanding, the technical solutions of the embodiments of the present specification, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The software product can be stored in a storage medium such as a ROM/RAM, magnetic disk, or optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present specification.
A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or any combination of these devices.
The apparatus embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and when the embodiments of the present disclosure are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
The foregoing describes only specific implementations of the embodiments of the present disclosure. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principles of the embodiments of the present disclosure, and such improvements and refinements shall also fall within the protection scope of the embodiments of the present disclosure.