CN113254919B - Abnormal device identification method, electronic device, and computer-readable storage medium

Info

Publication number
CN113254919B
Authority
CN
China
Prior art keywords
abnormal
equipment
sample
group
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110792412.3A
Other languages
Chinese (zh)
Other versions
CN113254919A (en
Inventor
朱金星
张静雅
胡磊
张波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yunxinzhice Technology Co ltd
Original Assignee
Hangzhou Yunxinzhice Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yunxinzhice Technology Co ltd
Priority to CN202110792412.3A
Publication of CN113254919A
Application granted
Publication of CN113254919B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/44 Program or device authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an abnormal device identification method, an electronic device, and a computer-readable storage medium. The method comprises: acquiring feature information of a target device; performing expansion processing on the feature information of the target device, wherein the expansion processing comprises applying a log transformation to the feature information, forming an FM decomposition matrix from the log-transformed pieces of feature information, and crossing any two elements of the FM decomposition matrix to obtain expanded feature information; and determining whether the target device is an abnormal device based on the expanded feature information and an abnormal device identification model. With this technical solution, whether the target device is abnormal can be identified automatically and quickly by the abnormal device identification model, which reduces labor cost while ensuring the reliability of the identification result, facilitates rapid detection of abnormal devices, and protects the information and property associated with the target device.

Description

Abnormal device identification method, electronic device, and computer-readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an abnormal device identification method, an electronic device, and a computer-readable storage medium.
Background
With the development of science and technology, new security risks have emerged. Electronic devices such as mobile phones, which people use every day, may be used in illegal fields or put to improper uses that threaten the information and property security of third parties. However, the number of electronic devices is large, and their reliability cannot be thoroughly checked by manual means alone.
Therefore, how to conveniently identify abnormal devices that are used in illegal fields or for improper purposes is a technical problem to be solved.
Disclosure of Invention
Embodiments of the invention provide an abnormal device identification method, an electronic device, and a computer-readable storage medium, to solve the technical problem that abnormal device identification in the related art is inconvenient.
In a first aspect, an embodiment of the present invention provides an abnormal device identification method, including: acquiring feature information of a target device; performing expansion processing on the feature information of the target device, wherein the expansion processing comprises applying a log transformation to the feature information, forming an FM decomposition matrix from the log-transformed pieces of feature information, and crossing any two elements of the FM decomposition matrix to obtain expanded feature information; and determining whether the target device is an abnormal device based on the expanded feature information and an abnormal device identification model, wherein the abnormal device identification model is trained with expanded feature information as input and the device type of each sample device as output, and the sample devices comprise sample abnormal devices and sample non-abnormal devices. The training of the abnormal device identification model comprises: acquiring the sample abnormal devices and the sample non-abnormal devices; selecting, multiple times, a first device group that accounts for a first specified percentage of the sample abnormal devices, and selecting, multiple times, a second device group that accounts for a second specified percentage of the sample non-abnormal devices; after each selection of a first device group and a second device group, determining backup model parameters of the abnormal device identification model based on the currently selected first device group and second device group; and selecting, from the multiple sets of backup model parameters corresponding to the multiple selections, the backup model parameters with the highest accuracy as the target model parameters of the abnormal device identification model.
In a second aspect, an embodiment of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform the method of any of the first aspects above.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions for executing the method of any one of the first aspect.
This technical solution addresses the inconvenience of abnormal device identification in the related art: the abnormal device identification model identifies automatically and quickly whether the target device is abnormal, which reduces labor cost, ensures the reliability of the identification result, facilitates rapid detection of abnormal devices, and protects the information and property associated with the target device.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a flowchart of an abnormal device identification method according to an embodiment of the present invention.
Detailed Description
For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Embodiment one
As shown in Fig. 1, the flow of an abnormal device identification method according to an embodiment of the present invention includes:
Step 102, obtaining feature information of the target device.
The target device is the electronic device to be checked for abnormality. The feature information reflects the actual condition of the target device and can be used to judge whether it is an abnormal device. An abnormal device is an electronic device that meets conditions preset by the user.
Specifically, the feature information of the target device is information that is related, to varying degrees, to whether an electronic device is abnormal, and preferably includes, but is not limited to: user information of the target device, APP activity information of the target device, and location information of the target device.
The user information of the target device preferably includes, but is not limited to, the age, sex, occupation, and work and rest schedule of the user to whom the target device belongs, and may be represented as, for example, a character string.
The APP activity information of the target device preferably includes, but is not limited to, APP-related information such as specific APPs and/or random APPs, APP types, the APP installation list, usage duration, usage frequency, and usage time intervals. Optionally, it may further include a list of APPs uninstalled within a time period, obtained by intersecting the APP installation lists at different time points.
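The uninstall list mentioned above can be sketched as follows. This is a minimal illustration, assuming installation lists are plain collections of package names; the function name `uninstalled_apps` and the sample package names are hypothetical and do not come from the patent.

```python
def uninstalled_apps(installed_earlier, installed_later):
    """Return apps present at the earlier time point but missing at the later one."""
    earlier = set(installed_earlier)
    later = set(installed_later)
    # Apps found in both lists are still installed (the intersection);
    # whatever remains of the earlier list was uninstalled in between.
    still_installed = earlier & later
    return sorted(earlier - still_installed)


# Hypothetical installation lists captured at two time points.
day_1 = ["com.example.bank", "com.example.chat", "com.example.game"]
day_7 = ["com.example.chat", "com.example.maps"]
print(uninstalled_apps(day_1, day_7))
```

Apps that appear only in the later list are new installations and are deliberately ignored here.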
The location information of the target device preferably includes, but is not limited to, location information, with or without a timestamp, reported by the target device through the SDK.
Step 104, performing expansion processing on the feature information of the target device, where the expansion processing includes: applying a log transformation to the feature information, forming an FM decomposition matrix from the log-transformed pieces of feature information, and crossing any two elements of the FM decomposition matrix to obtain the expanded feature information.
When the magnitude of the feature information is too large, the accuracy of the calculation result suffers. The magnitude of all feature information can therefore be reduced by log transformation so that it stays within a range that does not affect accuracy. The log-transformed feature information of multiple types is then decomposed into an FM decomposition matrix using the FM (Factorization Machine) algorithm, where each element of the FM decomposition matrix corresponds to one type of feature information, and any two elements are crossed to obtain the expanded feature information. The crossing described here may optionally be a multiplication.
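The expansion step can be illustrated with a short sketch. It assumes numeric, non-negative feature values, uses `log1p` as the log transformation (the patent does not say which logarithm is used), and treats each log-transformed feature as one element that is crossed with every other element by multiplication. A full Factorization Machine would additionally learn a latent vector per feature and weight each pair by the dot product of the two vectors; that part is omitted. The function and feature names are hypothetical.

```python
import math
from itertools import combinations


def expand_features(features):
    """Log-transform each feature, then cross every pair by multiplication."""
    # log1p(x) = log(1 + x) keeps zero-valued features finite (an assumption;
    # the source only says "log transformation").
    logged = {name: math.log1p(value) for name, value in features.items()}
    expanded = dict(logged)
    # Cross any two distinct elements; multiplication is the optional choice
    # of crossing named in the description.
    for (name_a, val_a), (name_b, val_b) in combinations(logged.items(), 2):
        expanded[f"{name_a}*{name_b}"] = val_a * val_b
    return expanded


# Hypothetical raw feature values for one device.
raw = {"app_count": 120.0, "daily_use_seconds": 36000.0, "location_reports": 0.0}
expanded = expand_features(raw)
print(len(expanded))  # 3 log-transformed features + 3 pairwise crosses
```

With n input features this produces n + n(n-1)/2 expanded features, so the pairwise step grows quadratically with the number of feature types.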
Step 106, determining whether the target device is an abnormal device based on the expanded feature information and an abnormal device identification model, where the abnormal device identification model is trained with expanded feature information as input and the device type of each sample device as output, and the sample devices include sample abnormal devices and sample non-abnormal devices.
The abnormal device identification model reflects the association between a device's expanded feature information and its device type. Therefore, once the expanded feature information of the target device is obtained, the device type of the target device, abnormal or non-abnormal, can be determined from that information and the association.
On this basis, step 106 may include: determining the abnormality degree of the target device based on the expanded feature information and the abnormal device identification model, and determining that the target device is an abnormal device when its abnormality degree is greater than or equal to a specified value.
The abnormal device identification model may be a logistic regression classification model. Processing the expanded feature information with this model yields the abnormality degree of the target device, which reflects the probability that the target device is used in an illegal field or put to improper use. The specified value is the lowest abnormality degree that an abnormal device may have. Therefore, when the abnormality degree of the target device is greater than or equal to the specified value, the target device is determined to be an abnormal device; otherwise, it is determined to be a non-abnormal device.
Optionally, the abnormality degree takes a value in the interval [0, 1], and the specified value is likewise a value in the interval [0, 1].
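A minimal sketch of the scoring and thresholding in step 106, assuming a logistic regression model whose parameters are a weight per expanded feature plus a bias. The weights, bias, feature names, and the default specified value of 0.5 are hypothetical illustration values, not taken from the patent.

```python
import math


def abnormality_degree(expanded, weights, bias):
    """Logistic regression score in [0, 1] for one device."""
    # Linear combination of the expanded features; unknown features get weight 0.
    z = bias + sum(weights.get(name, 0.0) * value for name, value in expanded.items())
    # The sigmoid maps the linear score to a probability-like abnormality degree.
    return 1.0 / (1.0 + math.exp(-z))


def is_abnormal(degree, specified_value=0.5):
    """A device is abnormal when its degree is greater than or equal to the specified value."""
    return degree >= specified_value


# Hypothetical expanded features and model parameters.
features = {"app_count": 4.8, "app_count*daily_use_seconds": 50.3}
weights = {"app_count": -0.4, "app_count*daily_use_seconds": 0.05}
degree = abnormality_degree(features, weights, bias=-1.0)
print(is_abnormal(degree, specified_value=0.5))
```

Raising the specified value trades fewer false alarms for more missed abnormal devices, so in practice it would be tuned on labeled samples.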
With this technical solution, whether the target device is abnormal can be identified automatically and quickly by the abnormal device identification model, which reduces labor cost, ensures the reliability of the identification result, enables rapid detection of abnormal devices, and protects the information and property associated with the target device.
On the basis of the first embodiment, the step of training the abnormal device identification model includes:
Step 202, acquiring sample abnormal devices and sample non-abnormal devices.
The abnormal device identification model is in essence a classification model, so training it requires samples for each class involved. The present application involves two classes, abnormal devices and non-abnormal devices, so both sample abnormal devices and sample non-abnormal devices must be obtained to train the model and ensure the reliability of its classification results.
Step 204, selecting, multiple times, a first device group from the sample abnormal devices, where the first device group accounts for a first specified percentage of the sample abnormal devices, and selecting, multiple times, a second device group from the sample non-abnormal devices, where the second device group accounts for a second specified percentage of the sample non-abnormal devices.
If the number of sample devices is very large, training the model directly on the feature information of all sample devices incurs a heavy computational load and is time-consuming and laborious. A preset number threshold may therefore be set for the number of sample abnormal devices, the number of sample non-abnormal devices, or their total. The preset number threshold is the smallest count at which the number of sample abnormal devices, the number of sample non-abnormal devices, or their total is large enough to affect the efficiency of model parameter training.
Before the first device group and the second device group are selected multiple times in step 204, the method further includes: judging whether the total number of sample abnormal devices and sample non-abnormal devices, the number of sample abnormal devices, or the number of sample non-abnormal devices is greater than or equal to the preset number threshold; and if so, proceeding to the steps of selecting the first device group multiple times and selecting the second device group multiple times. In this way, when the number of sample devices is large enough to affect computational efficiency, efficiency can be improved by reducing the number of samples used in a single training run.
In one possible design, the multiple selections in step 204 are made by dividing the sample abnormal devices into multiple groups and selecting one group as the first device group each time, and dividing the sample non-abnormal devices into multiple groups and selecting one group as the second device group each time.
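The partition-based design can be sketched as below. This is an illustration only: the shuffle before splitting, the fixed seed, and the function name `split_into_groups` are assumptions, since the patent does not say how the groups are formed.

```python
import random


def split_into_groups(devices, num_groups, seed=0):
    """Shuffle the devices and deal them into num_groups disjoint groups."""
    shuffled = list(devices)
    random.Random(seed).shuffle(shuffled)
    # Striding by num_groups yields groups whose sizes differ by at most one.
    return [shuffled[i::num_groups] for i in range(num_groups)]


# Hypothetical device identifiers.
abnormal_ids = [f"a{i}" for i in range(100)]
normal_ids = [f"n{i}" for i in range(1000)]

first_groups = split_into_groups(abnormal_ids, 10)
second_groups = split_into_groups(normal_ids, 10)

# Selection i uses first_groups[i] together with second_groups[i].
group_pairs = list(zip(first_groups, second_groups))
print(len(group_pairs), len(group_pairs[0][0]), len(group_pairs[0][1]))
```

Because the groups are disjoint, every sample device is used in exactly one training run under this design.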
In another possible design, the multiple selections in step 204 are made by randomly selecting from the sample abnormal devices and the sample non-abnormal devices multiple times, yielding multiple first device groups and second device groups respectively. The random selections can be performed in parallel to further save time and improve training efficiency.
Step 206, after each selection of a first device group and a second device group, determining backup model parameters of the abnormal device identification model based on the currently selected first device group and second device group.
Selecting only part of the sample abnormal devices and part of the sample non-abnormal devices each time reduces the computation of a single training run. To obtain the most reliable model parameters for the abnormal device identification model, the set with the highest accuracy among the model parameters obtained over the multiple runs can be selected as the target model parameters. This improves training efficiency and yields more reliable model parameters.
In one possible design, model parameters are trained separately on the multiple pairs of first and second device groups, and the training runs can be performed in parallel to further save time and improve training efficiency. For example, multiple parallel threads are created, a first device group and a second device group are assigned to each thread, and each thread determines backup model parameters based on its assigned groups.
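The thread-per-group-pair idea can be sketched with Python's standard `concurrent.futures` module. The trainer below is a hypothetical stand-in that only averages values; a real implementation would fit the logistic regression model on each pair of groups.

```python
from concurrent.futures import ThreadPoolExecutor


def train_backup_parameters(group_pair):
    """Hypothetical stand-in for fitting model parameters on one group pair."""
    first_group, second_group = group_pair
    # Placeholder "parameters": the mean feature value of each group.
    return {
        "abnormal_mean": sum(first_group) / len(first_group),
        "normal_mean": sum(second_group) / len(second_group),
    }


def train_all(group_pairs, max_workers=4):
    """Train one set of backup parameters per group pair, in parallel threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so result i belongs to group pair i.
        return list(pool.map(train_backup_parameters, group_pairs))


# Hypothetical one-dimensional feature values for two group pairs.
pairs = [([0.9, 0.8], [0.1, 0.2]), ([0.7, 0.95], [0.3, 0.05])]
backup_parameters = train_all(pairs)
print(len(backup_parameters))
```

In CPython, threads do not speed up CPU-bound pure-Python code because of the global interpreter lock; `ProcessPoolExecutor` offers the same `map` interface if the training itself is the bottleneck.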
Preferably, the first specified percentage and the second specified percentage are both set to 10%. In the first round, 10% of the sample abnormal devices and 10% of the sample non-abnormal devices are randomly selected from all sample abnormal devices and all sample non-abnormal devices as the first device group and the second device group, respectively, and a set of backup model parameters of the abnormal device identification model is obtained by training on them. After that, 10% of the sample abnormal devices are again randomly selected from all sample abnormal devices, 10% of the sample non-abnormal devices are again randomly selected from all sample non-abnormal devices, and a new set of backup model parameters is calculated. This is repeated until ten sets of backup model parameters are obtained, and the final target model parameters of the abnormal device identification model are determined from among the ten sets.
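The ten rounds of 10% random selection can be sketched as follows. The helper names, the fixed seed, and the sample sizes are hypothetical; each round draws from the full sample pools, as the paragraph above describes.

```python
import random


def draw_group(devices, percentage, rng):
    """Randomly pick the given share of devices (at least one); assumes a non-empty pool."""
    devices = list(devices)
    size = max(1, round(len(devices) * percentage))
    return rng.sample(devices, size)


def draw_rounds(abnormal, normal, first_pct=0.10, second_pct=0.10, rounds=10, seed=0):
    """Return one (first device group, second device group) pair per round."""
    rng = random.Random(seed)
    return [
        (draw_group(abnormal, first_pct, rng), draw_group(normal, second_pct, rng))
        for _ in range(rounds)
    ]


# Hypothetical device identifiers.
abnormal_ids = [f"a{i}" for i in range(50)]
normal_ids = [f"n{i}" for i in range(200)]
selections = draw_rounds(abnormal_ids, normal_ids)
print(len(selections), len(selections[0][0]), len(selections[0][1]))
```

Because every round samples from all devices, the same device can appear in several rounds, unlike in the partition-based design.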
In one possible design, the first device group and the second device group are each selected a predetermined number of times; the product of the first specified percentage and the predetermined number of times is a first specified value, and the product of the second specified percentage and the predetermined number of times is a second specified value.
Specifically, the higher the specified percentage used in a single selection, that is, the larger the share of all sample devices that a single first or second device group represents, the more samples are selected and the more accurate the resulting backup model parameters. When the backup model parameters are already accurate, the number of times they are obtained can be reduced to lower the computational load. Reducing the number of times backup model parameters are obtained means reducing the number of selections of the first and second device groups. Consequently, if the first and second device groups are both selected the predetermined number of times, the first specified percentage is inversely proportional to the predetermined number of times, and so is the second specified percentage.
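The inverse proportionality can be written out explicitly. The symbols below are introduced here for illustration and do not appear in the source: p_1 and p_2 are the first and second specified percentages, n is the predetermined number of times, and V_1 and V_2 are the first and second specified values.

```latex
p_1 \cdot n = V_1 \;\Longrightarrow\; p_1 = \frac{V_1}{n},
\qquad
p_2 \cdot n = V_2 \;\Longrightarrow\; p_2 = \frac{V_2}{n}
```

As an illustrative check, ten selections at 10% each give V_1 = 0.10 × 10 = 1; holding V_1 = 1 fixed and halving the number of selections to n = 5 raises the percentage to p_1 = 1/5 = 20%.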
Specifically, the product of the first specified percentage and the predetermined number of times may be set as the first specified value, and the first specified value may be set to the expression given in the original formula image (Figure 186271DEST_PATH_IMAGE002, not reproduced in this text), where K is the number of sample abnormal devices. The larger K is, the smaller the first specified value and, for a fixed predetermined number of times, the smaller the first specified percentage. In other words, the more candidate samples there are, the smaller the proportion of sample abnormal devices selected each time. The size of the device groups used for training can thus be controlled adaptively based on the number of candidate samples, which avoids wasting computing resources, reduces the computational load, and improves training efficiency. Similarly, the second specified value may be set to the expression given in the original formula image (Figure 891226DEST_PATH_IMAGE004, not reproduced in this text), where J is the number of sample non-abnormal devices.
Step 208, selecting, from the multiple sets of backup model parameters corresponding to the multiple selections, the backup model parameters with the highest accuracy as the target model parameters of the abnormal device identification model.
The higher the accuracy of a set of backup model parameters, the more reliable the classification results obtained when those parameters are substituted into the abnormal device identification model. Specifically, step 208 includes: obtaining a third device group from the sample abnormal devices and a fourth device group from the sample non-abnormal devices; performing abnormal device identification on the devices in the third and fourth device groups with each set of backup model parameters corresponding to the multiple selections, to obtain multiple sets of identification results; determining the confidence of each set of identification results based on the third device group, the fourth device group, and the multiple sets of identification results, where the confidence indicates how well the device types given by the identification results match the actual device types of the devices in the third and fourth device groups; and taking the confidence of an identification result as the accuracy of the corresponding backup model parameters, and selecting the backup model parameters with the highest accuracy as the target model parameters of the abnormal device identification model.
The third and fourth device groups consist of devices whose abnormal or non-abnormal status is known. They are therefore used as validation groups: the results obtained by checking the third and fourth device groups with a set of backup model parameters are compared against the known status of the devices in those groups to determine whether the backup model parameters are reliable.
In one possible design, the percentage of the sample abnormal devices that the third device group accounts for is greater than the first specified percentage, and the percentage of the sample non-abnormal devices that the fourth device group accounts for is greater than the second specified percentage, so that larger samples are drawn from the sample abnormal devices and the sample non-abnormal devices as the basis for validating the multiple sets of backup model parameters.
In one possible design, the sample device coincidence rate between the third device group and any first device group is lower than a first predetermined coincidence rate, and the sample device coincidence rate between the fourth device group and any second device group is lower than a second predetermined coincidence rate. If the third device group and a first device group contain many of the same sample abnormal devices, and the fourth device group and a second device group contain many of the same sample non-abnormal devices, then validating backup model parameters trained on those highly overlapping first and second device groups with the third and fourth device groups yields a result with no reference value. The third device group therefore must not overlap excessively with the first device group it is compared against, and the fourth device group must not overlap excessively with the second device group it is compared against. To this end, a first predetermined coincidence rate and a second predetermined coincidence rate may be set: the first is the lowest coincidence rate at which the overlap between the third device group and the compared first device group is enough to make the validation result unreliable, and the second is the lowest coincidence rate at which the overlap between the fourth device group and the compared second device group is enough to make the validation result unreliable.
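The overlap check can be sketched as below. The patent does not define how the coincidence rate is computed; this sketch assumes it is the share of the validation group's devices that also appear in a training group. The function names and device identifiers are hypothetical.

```python
def coincidence_rate(validation_group, training_group):
    """Share of validation devices that also appear in the training group (assumed definition)."""
    validation = set(validation_group)
    if not validation:
        return 0.0
    return len(validation & set(training_group)) / len(validation)


def validation_group_is_usable(validation_group, training_groups, max_rate):
    """True when the overlap with every training group stays below max_rate."""
    return all(
        coincidence_rate(validation_group, group) < max_rate
        for group in training_groups
    )


# Hypothetical third device group checked against two first device groups.
third_group = ["a", "b", "c", "d"]
first_groups = [["a", "x"], ["y", "z"]]
print(coincidence_rate(third_group, first_groups[0]))
print(validation_group_is_usable(third_group, first_groups, max_rate=0.3))
```

The same two functions apply unchanged to the fourth device group and the second device groups, with the second predetermined coincidence rate as `max_rate`.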
The devices in the third and fourth device groups are classified using any one set of backup model parameters to obtain an identification result, which gives the detected device types. Since the actual device types of the devices in the third and fourth device groups are known, comparing the identification result with the known types gives the degree of matching between them. The higher the degree of matching, the more reliable the identification result, that is, the higher its confidence.
Further, the confidence of an identification result is used as the accuracy of the model parameters that produced it. In other words, the accuracy of each set of model parameters is determined by the confidence of the identification result obtained by classifying the devices of the third and fourth device groups with those parameters. Finally, the backup model parameters with the highest accuracy among the multiple sets are selected as the target model parameters of the abnormal device identification model.
With this technical solution, model parameters that are more accurate and reliable can be obtained than by training the abnormal device identification model directly on the feature information of all sample devices, so that the classification results of the resulting model are more reliable and the accuracy of abnormal device identification is improved.
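The selection of target model parameters can be sketched as follows, assuming the confidence is measured as plain classification accuracy on the validation devices. The toy predictor and the candidate parameter sets are hypothetical; in the patent's setting the predictor would be the logistic regression model evaluated with each set of backup parameters.

```python
def accuracy(predict, parameters, labeled_devices):
    """Fraction of validation devices whose predicted type matches the known type."""
    correct = sum(
        1 for features, label in labeled_devices
        if predict(parameters, features) == label
    )
    return correct / len(labeled_devices)


def select_target_parameters(predict, backup_parameter_sets, labeled_devices):
    """Return the backup parameter set with the highest validation accuracy."""
    return max(
        backup_parameter_sets,
        key=lambda params: accuracy(predict, params, labeled_devices),
    )


def threshold_predict(parameters, score):
    """Toy predictor: a device is abnormal when its score reaches the threshold."""
    return score >= parameters["threshold"]


# Third and fourth device groups merged into (score, is_abnormal) pairs.
validation = [(0.9, True), (0.8, True), (0.4, False), (0.2, False)]
candidates = [{"threshold": 0.1}, {"threshold": 0.5}, {"threshold": 0.95}]
best = select_target_parameters(threshold_predict, candidates, validation)
print(best)
```

When two parameter sets tie, `max` returns the first one encountered; a production system might break ties with a secondary metric instead.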
Embodiment two
On the basis of the first embodiment, when the abnormal device identification model is trained with the feature information of the sample devices as input samples, the input samples can be expanded to increase the sample size, so that the resulting model reflects the association between a device's feature information and its device type more accurately and reliably. An information processing method according to an embodiment of the present application includes:
Step 302, obtaining feature information of a sample device.
Specifically, the feature information of the sample device is information that is related, to varying degrees, to whether an electronic device is abnormal, and preferably includes, but is not limited to: user information of the sample device, APP activity information of the sample device, and location information of the sample device.
The user information of the sample device preferably includes, but is not limited to, the age, sex, occupation, and work and rest schedule of the user to whom the sample device belongs, and may be represented as, for example, a character string.
The APP activity information of the sample device preferably includes, but is not limited to, APP-related information such as specific APPs and/or random APPs, APP types, the APP installation list, usage duration, usage frequency, and usage time intervals. Optionally, it may further include a list of APPs uninstalled within a time period, obtained by intersecting the APP installation lists at different time points.
The location information of the sample device preferably includes, but is not limited to, location information, with or without a timestamp, reported by the sample device through the SDK.
Step 304, preprocessing the characteristic information of the sample device based on a predetermined sample expansion mode.
Step 306, training an abnormal device identification model by taking the preprocessed feature information of the sample device as input and the device type to which the sample device belongs as output, wherein the abnormal device identification model is used for determining, based on the feature information of a target device, whether the target device is an abnormal device.
The predetermined sample expansion method in step 304 includes performing three steps of category expansion, decision tree screening and binning screening on the feature information of the sample device in sequence.
First, the kinds of feature information of the sample device are expanded. Specifically, new kinds of feature information are generated by combining different kinds of existing feature information. Optionally, at least two kinds of feature information are selected from the multiple kinds of feature information of the sample device and combined to obtain expanded information; a feature value of the expanded information is determined based on the feature values of the selected kinds of feature information; and the expanded information is added to the feature information of the sample device.
In one possible embodiment, the characteristic information of the sample devices is presented in the form of a data table, for example, with the unique identifier of the sample device as a row label, the characteristic information of each sample device as a row, and one category of the characteristic information as a column. Each sample device then has a corresponding location under any characteristic information, which may be a characteristic value or may be empty.
For example, with 10000 sample devices, the data table of feature information has 10000 rows, and the first column of each row holds the unique identifier of that row's sample device. Suppose the feature information includes: the age, sex, and occupation of the user to whom the sample device belongs, whether the sample device has a specified APP installed, the usage duration of the specified APP, and the usage frequency of the specified APP. The data table then has 8 columns, corresponding to the unique identifier of the sample device and the 7 kinds of feature information.
It should be understood that in an actual scenario, the number of sample devices may be arbitrarily set based on actual needs, the variety of feature information may be as many as several hundreds or even thousands, and the numbers given in the context of this application are only examples and are not practical limitations. And, in various examples in the context of the present application, the rows and columns of the data table may be swapped based on actual needs, without limitation.
Based on this, if at least two kinds of feature information are selected from the multiple kinds of feature information of the sample device and combined to obtain the expanded information, the expanded information forms a new column, and the cell of each sample device in that column can be obtained from that device's feature values for the at least two kinds of feature information.
In one possible design, the multiple kinds of feature information of the sample device may be decomposed into a matrix by an FM (Factorization Machine) algorithm, where each element of the matrix corresponds to one kind of feature information, and then any two elements of the matrix are crossed to obtain the expanded information. The crossing described here may optionally be multiplication.
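The crossing step alone can be sketched in a few lines of Python. This is not a full factorization machine (which would also learn latent vectors for each feature); it only shows the pairwise multiplication that produces the new columns. The function name `cross_features` and the feature names are hypothetical.

```python
from itertools import combinations
from typing import Dict


def cross_features(row: Dict[str, float]) -> Dict[str, float]:
    """Return the row plus one new feature per pair of original features (their product)."""
    expanded = dict(row)
    for a, b in combinations(sorted(row), 2):
        expanded[f"{a}*{b}"] = row[a] * row[b]
    return expanded


# One sample device with three illustrative numeric features.
row = {"age": 30.0, "app_minutes": 120.0, "app_opens": 4.0}
expanded = cross_features(row)
```

Three original features yield three crossed features, so the number of columns grows quadratically with the number of kinds of feature information, which is why screening is needed afterwards.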
In another possible design, the expanded information may also be derived from a single kind of feature information. Specifically, when the order of magnitude of a kind of feature information is greater than a predetermined order of magnitude, log transformation is performed on the feature information to obtain the corresponding expanded information. The predetermined order of magnitude is the largest order of magnitude at which the feature information remains convenient to compute with; a larger order of magnitude is too large and affects the accuracy of the calculation result. The order of magnitude of the expanded information is lower than that of the original feature information; in other words, log transformation reduces the order of magnitude of the feature information and thereby improves the accuracy of the calculation result.
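A minimal Python sketch of this check is given below. It assumes non-negative feature values, measures the order of magnitude with a base-10 logarithm, and uses `log1p` so that zero values stay defined; the predetermined order of magnitude of 3 and the helper names are illustrative assumptions, not values from the source.

```python
import math
from typing import List, Optional


def order_of_magnitude(x: float) -> int:
    """Base-10 order of magnitude of a value (0 for a zero value)."""
    return math.floor(math.log10(abs(x))) if x != 0 else 0


def log_expand(values: List[float], predetermined_magnitude: int = 3) -> Optional[List[float]]:
    """Return a log-transformed copy of the column if it is too large, else None."""
    if max(order_of_magnitude(v) for v in values) > predetermined_magnitude:
        return [math.log1p(v) for v in values]
    return None


usage_seconds = [120.0, 45000.0, 3.0e6]   # largest value has order of magnitude 6
small_column = [1.0, 20.0, 300.0]         # largest value has order of magnitude 2

expanded_usage = log_expand(usage_seconds)
expanded_small = log_expand(small_column)
```

Only the first column is transformed; the second stays within the predetermined order of magnitude, so no expanded information is produced for it.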
In yet another possible design, it may be detected whether the magnitude of each feature information is greater than a predetermined magnitude, and if so, the feature information is log-transformed. After the orders of magnitude of all the characteristic information are set in the range convenient for calculation, various kinds of characteristic information of the sample equipment are decomposed into a matrix through an FM algorithm, each element of the matrix corresponds to one kind of characteristic information, and then any two elements of the matrix are crossed to obtain the expanded information.
In yet another possible design, it may be detected whether the magnitude of each feature information is greater than a predetermined magnitude, and if so, the feature information is log-transformed. After the orders of magnitude of all the characteristic information are set in the range convenient for calculation, various kinds of characteristic information of the sample equipment are decomposed into a matrix through an FM algorithm, each element of the matrix corresponds to one kind of characteristic information, and then any two elements of the matrix are crossed to obtain the expanded information. And then, carrying out order judgment on the expansion information obtained by crossing any two elements so as to further determine whether log transformation needs to be carried out on the expansion information.
After the feature information is expanded by the above technical solution, the number of kinds of feature information increases greatly and far exceeds the initial number of kinds, so screening the expanded feature information is an essential part of the expansion process.
Next, a preliminary screening is first performed using a decision tree. Each item of feature information is used as an initial node of the decision tree with an initial gain, and the initial gain represents the information quantity or the information value of the feature information. Through the splitting function of the decision tree, the initial node is finally split into a plurality of leaf nodes, each leaf node has a corresponding gain, and the gain of the leaf node represents the information amount or the information value of the leaf node.
During decision tree splitting, the initial gain of the initial node of a kind of feature information differs from the sum of the gains of the leaf nodes finally generated. The difference between the initial gain and the sum of the leaf-node gains represents how much the information quantity or information value of the feature information changes during splitting; the greater the change, the greater the influence of the feature information and the greater its contribution to the subsequently trained model. This difference may therefore be used as the weight of the feature information, and a specified weight may be set that represents the minimum such difference at which the feature information contributes sufficiently to the subsequently trained model. Then, each kind of feature information is taken as an initial node of a decision tree and its weight is determined, where the weight is the difference between the initial gain of the initial node and the sum of the gains of the leaf nodes obtained by splitting that node. For each kind of feature information, if its weight is greater than or equal to the specified weight, the feature information is retained; otherwise, it is deleted.
Then, if the weight of the feature information is greater than or equal to the specified weight, it indicates that the feature information has sufficient contribution to the subsequently trained model, in other words, sufficient contribution to identifying whether the device is an abnormal device, and needs to be retained. On the contrary, if the weight of the feature information of this kind is smaller than the designated weight, it indicates that the feature information does not sufficiently contribute to the model trained subsequently, in other words, does not sufficiently contribute to identifying whether the device is an abnormal device, and the feature information can be deleted.
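The retain-or-delete rule can be sketched as follows, assuming the initial gain and the leaf-node gains of each kind of feature information have already been computed by some decision tree implementation. Taking the absolute value of the difference is an assumption made here, since the text only speaks of a "difference"; all gain values and names are made up for illustration.

```python
from typing import Dict, List, Tuple


def feature_weight(initial_gain: float, leaf_gains: List[float]) -> float:
    """Weight = size of the change between the initial gain and the summed leaf gains."""
    return abs(initial_gain - sum(leaf_gains))


def screen_by_weight(gains: Dict[str, Tuple[float, List[float]]],
                     specified_weight: float) -> List[str]:
    """Keep the features whose weight is greater than or equal to the specified weight."""
    return [
        name
        for name, (initial_gain, leaf_gains) in gains.items()
        if feature_weight(initial_gain, leaf_gains) >= specified_weight
    ]


# Hypothetical gains: (initial gain, gains of the leaf nodes after splitting).
gains = {
    "age": (0.90, [0.20, 0.25]),
    "gender": (0.50, [0.24, 0.25]),
    "app_minutes": (0.80, [0.10, 0.30]),
}

kept = screen_by_weight(gains, specified_weight=0.1)
```

Here `gender` changes by only about 0.01 during splitting and is dropped, while the other two features are retained.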
After the decision tree screening is finished, binning screening is performed on the feature information retained by the decision tree screening.
In one possible design, binning screening is performed on feature information that is continuous data. Any such feature information is divided into a plurality of ranges; for example, for the age of the user to whom the sample device belongs, ages that have the same or similar influence on the required model can be placed in the same bin based on actual needs. For example, the ages are divided into four bins: 0 to 25 years, 26 to 35 years, 36 to 45 years, and over 45 years. It can then be assumed that users of any age within 0 to 25 years, such as an 18-year-old user and a 22-year-old user, correspond to sample devices that contribute the same or similarly to the abnormal device identification model in the age dimension; further, when all feature information other than age is the same, the abnormal device identification model sees no difference between the sample device of the 18-year-old user and that of the 22-year-old user. Thus, a specified feature value can be set for each bin, and for every sample device, the feature value in the age column is replaced by the specified feature value of the bin into which that age falls.
In addition, the feature information may contain abnormal values. For example, if the user age recorded for a sample device is 1000, which is obviously unreasonable, using 1000 directly as the feature value would make the calculation result inaccurate and ultimately affect the reliability of the resulting model. With binning, this value is simply assigned to the bin for ages over 45, and its feature value is replaced by the specified feature value of that bin. The influence of abnormal values on model training is thereby effectively reduced.
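The age example can be written as a short Python sketch. The bin boundaries follow the four ranges given in the text; the specified feature values 0 to 3 and the names `UPPER_BOUNDS`, `BIN_VALUES`, and `age_bin_value` are assumptions chosen for illustration.

```python
import bisect

# Inclusive upper bounds of the first three bins: 0-25, 26-35, 36-45; everything above is "over 45".
UPPER_BOUNDS = [25, 35, 45]
# Hypothetical specified feature value for each of the four bins.
BIN_VALUES = [0, 1, 2, 3]


def age_bin_value(age: int) -> int:
    """Replace a raw age with the specified feature value of the bin it falls into."""
    return BIN_VALUES[bisect.bisect_left(UPPER_BOUNDS, age)]


ages = [18, 22, 26, 45, 46, 1000]
binned = [age_bin_value(a) for a in ages]
```

The ages 18 and 22 receive the same value, and the unreasonable age of 1000 lands in the last bin instead of distorting later calculations.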
After the characteristic values of all sample devices under the characteristic information are used as independent variables for carrying out binning, determining whether the independent variables have monotonicity or not based on positive and negative sample difference values in a binning result; and when the independent variable has monotonicity, keeping the characteristic information, otherwise, deleting the characteristic information.
The positive and negative sample difference value in the binning result is the WOE (Weight of Evidence). Each bin has its own positive and negative sample difference value, which reflects the degree of difference, or separation, between the feature values of abnormal devices and the feature values of non-abnormal devices within that bin. Binning takes the feature values of all sample devices under one kind of feature information as the independent variable. When the independent variable has monotonicity, i.e., is monotonically increasing or monotonically decreasing, its change follows a certain rule, and because of this regularity the feature information contributes sufficiently to identifying whether a device is an abnormal device. Thus, when the independent variable has monotonicity, the feature information is retained; otherwise, this kind of feature information does not contribute sufficiently to identifying whether a device is an abnormal device, and it can be deleted.
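A minimal Python sketch of the monotonicity check follows. It assumes the commonly used definition WOE = ln((abnormal share in the bin) / (non-abnormal share in the bin)), which the text does not spell out, and it assumes every bin contains at least one device of each type; the counts and function names are illustrative.

```python
import math
from typing import List


def woe_per_bin(abnormal_counts: List[int], normal_counts: List[int]) -> List[float]:
    """WOE of each bin: log of the abnormal share divided by the non-abnormal share."""
    total_abnormal = sum(abnormal_counts)
    total_normal = sum(normal_counts)
    return [
        math.log((a / total_abnormal) / (n / total_normal))
        for a, n in zip(abnormal_counts, normal_counts)
    ]


def is_monotonic(values: List[float]) -> bool:
    """True if the sequence is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(x <= y for x, y in pairs) or all(x >= y for x, y in pairs)


# Device counts per bin, ordered from the lowest bin to the highest.
regular_woe = woe_per_bin([10, 20, 30, 40], [40, 30, 20, 10])
irregular_woe = woe_per_bin([10, 40, 20, 30], [40, 10, 30, 20])

keep_regular = is_monotonic(regular_woe)
keep_irregular = is_monotonic(irregular_woe)
```

The first feature has WOE values that rise steadily across the bins and would be retained; the second zigzags and would be deleted.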
In the binning screening, for each kind of feature information, a predictive capability value, i.e., the Information Value (IV), may be determined based on the positive and negative sample difference values in the binning result. The IV reflects the degree to which the independent variable contributes to the dependent variable, that is, the value of the feature information to the required abnormal device identification model. A predetermined predictive capability value represents the minimum IV at which the feature information contributes sufficiently to identifying whether a device is an abnormal device.
If the predictive capability value of a kind of feature information is greater than or equal to the predetermined predictive capability value, this kind of feature information contributes sufficiently to identifying whether a device is an abnormal device and is retained; otherwise, the feature information is deleted.
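The IV screening can be sketched in Python as below, assuming the commonly used formula IV = Σ (abnormal share − non-abnormal share) × WOE over all bins and non-empty bins for both device types. The predetermined value of 0.02 and the counts are illustrative assumptions rather than values from the source.

```python
import math
from typing import List


def information_value(abnormal_counts: List[int], normal_counts: List[int]) -> float:
    """Sum over bins of (abnormal share - non-abnormal share) * WOE of the bin."""
    total_abnormal = sum(abnormal_counts)
    total_normal = sum(normal_counts)
    iv = 0.0
    for a, n in zip(abnormal_counts, normal_counts):
        share_abnormal = a / total_abnormal
        share_normal = n / total_normal
        iv += (share_abnormal - share_normal) * math.log(share_abnormal / share_normal)
    return iv


def keep_feature(abnormal_counts: List[int], normal_counts: List[int],
                 predetermined_iv: float) -> bool:
    """Retain the feature only if its IV reaches the predetermined predictive capability value."""
    return information_value(abnormal_counts, normal_counts) >= predetermined_iv


strong_iv = information_value([10, 20, 30, 40], [40, 30, 20, 10])
weak_iv = information_value([25, 25, 25, 25], [24, 26, 25, 25])

keep_strong = keep_feature([10, 20, 30, 40], [40, 30, 20, 10], predetermined_iv=0.02)
keep_weak = keep_feature([25, 25, 25, 25], [24, 26, 25, 25], predetermined_iv=0.02)
```

A feature whose bins separate abnormal from non-abnormal devices clearly gets a large IV and is retained, while a feature with nearly identical distributions gets an IV close to zero and is deleted.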
In summary, preprocessing the feature information of the sample device based on the predetermined sample expansion method mainly includes three steps. First, new feature information is obtained by combining and/or transforming existing feature information, so that the kinds of feature information are expanded. Second, the large amount of expanded feature information is preliminarily screened by decision tree screening. Third, among the feature information retained by the decision tree screening, the feature information that is continuous data is further processed by binning: after binning, each feature value under the feature information is replaced with the specified feature value of its bin, feature information without monotonicity is deleted, and feature information with an insufficient predictive capability value is deleted.
In addition, before the characteristic information of the sample device is preprocessed based on the predetermined sample expansion mode, the characteristic information may be preliminarily screened in another mode, or after the characteristic information of the sample device is preprocessed based on the predetermined sample expansion mode, the screened characteristic information may be screened again.
In one possible design, for any sample device, if the number of pieces of feature information with empty contents in all pieces of feature information of the sample device is greater than or equal to a specified number, all pieces of feature information of the sample device are deleted. That is, if there is too much empty information in all feature information corresponding to any sample device, the sample device cannot be used as a valid sample because of the excessive loss of the feature information, and at this time, all the feature information of the sample device should be deleted. Optionally, the row in which the sample device is located is deleted from the table.
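A minimal Python sketch of this row-level screening is shown below. Empty feature information is represented by `None`, the table is a dictionary keyed by device identifier, and the specified number of 2 is an arbitrary illustrative choice; all names are hypothetical.

```python
from typing import Dict, Optional

Table = Dict[str, Dict[str, Optional[float]]]


def drop_sparse_devices(table: Table, specified_number: int) -> Table:
    """Delete every device (row) whose count of empty features reaches the specified number."""
    kept: Table = {}
    for device_id, features in table.items():
        empty = sum(1 for value in features.values() if value is None)
        if empty < specified_number:
            kept[device_id] = features
    return kept


table: Table = {
    "dev-1": {"age": 30, "gender": 1, "app_minutes": 120},
    "dev-2": {"age": None, "gender": None, "app_minutes": 15},
    "dev-3": {"age": 41, "gender": None, "app_minutes": 60},
}

cleaned = drop_sparse_devices(table, specified_number=2)
```

The device with two empty features is removed as an invalid sample, while the device with a single empty feature is kept.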
In another possible design, for any kind of feature information, if the saturation degree of the feature information reaches a specified saturation degree, the feature information is deleted from all the feature information of all the sample devices, wherein the saturation degree is the maximum ratio of the same feature value in the feature values of all the sample devices under the feature information.
For example, if 90% of all sample devices have the value female under the feature information of user gender, then almost every device in an actual scenario shares that value, which indicates that user gender has little influence on model classification; the feature information of user gender may therefore be deleted. Optionally, the user gender column is deleted from the table.
In another possible design, if the specified feature information of any sample device is within a preset abnormal range, all the feature information of that sample device is deleted; alternatively, if the specified feature information of any sample device is within the preset abnormal range, the specified feature information of all sample devices is deleted. That is, the preset abnormal range is the range of feature values of the specified feature information that triggers screening: when the feature value of the specified feature information of a sample device falls within the preset abnormal range, the screening action for the specified feature information is triggered.
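The saturation check can be sketched in Python as follows. Saturation is computed as defined above, the largest share held by any single feature value in a column; the column-oriented layout, the specified saturation of 0.9, and the names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, List


def saturation(values: List[Hashable]) -> float:
    """Largest share of any single feature value among all sample devices."""
    return Counter(values).most_common(1)[0][1] / len(values)


def drop_saturated_features(columns: Dict[str, List[Hashable]],
                            specified_saturation: float) -> Dict[str, List[Hashable]]:
    """Delete every feature (column) whose saturation reaches the specified saturation."""
    return {
        name: values
        for name, values in columns.items()
        if saturation(values) < specified_saturation
    }


# Ten sample devices: nine female users, one male user, and varied age bins.
columns = {
    "gender": ["F"] * 9 + ["M"],
    "age_bin": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1],
}

kept_columns = drop_saturated_features(columns, specified_saturation=0.9)
```

The gender column reaches the specified saturation and is removed, while the age column, whose most common value covers only 30% of devices, is kept.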
Specifically, all the feature information of the sample device may be deleted, that is, a row where a single sample device whose feature value of the specified feature information is within a preset abnormal range is located may be deleted, or the specified feature information of all the sample devices may be deleted, that is, a column where the specified feature information is located may be deleted.
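Both variants, deleting the affected rows or deleting the whole column, can be sketched in Python. The abnormal range for age (150 and above), the `mode` parameter, and the function name `screen_abnormal` are hypothetical choices for illustration.

```python
from typing import Dict, Tuple

Table = Dict[str, Dict[str, float]]


def screen_abnormal(table: Table, feature: str,
                    abnormal_range: Tuple[float, float], mode: str = "row") -> Table:
    """Delete rows whose value lies in the abnormal range, or the whole column if any does."""
    low, high = abnormal_range
    hits = {dev for dev, feats in table.items() if low <= feats[feature] <= high}
    if not hits:
        # Nothing is in the abnormal range, so screening is not triggered.
        return {dev: dict(feats) for dev, feats in table.items()}
    if mode == "row":
        return {dev: dict(feats) for dev, feats in table.items() if dev not in hits}
    if mode == "column":
        return {
            dev: {k: v for k, v in feats.items() if k != feature}
            for dev, feats in table.items()
        }
    raise ValueError("mode must be 'row' or 'column'")


table: Table = {
    "dev-1": {"age": 30, "app_minutes": 120},
    "dev-2": {"age": 1000, "app_minutes": 15},
    "dev-3": {"age": 52, "app_minutes": 60},
}
abnormal_age = (150, float("inf"))

rows_removed = screen_abnormal(table, "age", abnormal_age, mode="row")
column_removed = screen_abnormal(table, "age", abnormal_age, mode="column")
```

Row mode discards only the device with the unreasonable age, whereas column mode keeps every device but drops the age feature for all of them.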
An electronic device of an embodiment of the invention includes at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform the scheme of any of the embodiments described above. Therefore, the electronic device has the same technical effects as any of the above embodiments, which are not described herein again.
The electronic device of embodiments of the present invention exists in a variety of forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capabilities and whose primary purpose is to provide voice and data communications. Such terminals include smart phones (e.g., iPhones), multimedia phones, feature phones, and low-end phones, among others.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as iPads.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players (e.g., iPods), handheld game consoles, e-book readers, as well as smart toys and portable car navigation devices.
(4) Servers, whose architecture is similar to that of a general-purpose computer but which, because they must provide highly reliable services, have higher requirements for processing capability, stability, reliability, security, scalability, manageability, and the like.
(5) Other electronic devices with data interaction functions.
In addition, an embodiment of the present invention provides a computer-readable storage medium, which stores computer-executable instructions for executing the method flow described in any of the above embodiments.
The technical scheme of the invention is described in detail in the above with reference to the attached drawings, and through the technical scheme of the invention, whether the target equipment is abnormal equipment can be automatically and quickly identified directly through the abnormal equipment identification model, so that the labor cost is reduced, the reliability of the identification result is ensured, the quick inspection of the abnormal equipment is facilitated, and the guarantee is provided for the safety of information and property related to the target equipment.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships may exist; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
It should be understood that although the terms first, second, etc. may be used to describe groups of devices in embodiments of the present invention, these groups of devices should not be limited by these terms. These terms are only used to distinguish groups of devices from one another. For example, a first device group may also be referred to as a second device group, and similarly, a second device group may also be referred to as a first device group, without departing from the scope of embodiments of the present invention.
The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. An abnormal device identification method, comprising:
acquiring characteristic information of target equipment;
and performing expansion processing on the feature information of the target equipment, wherein the expansion processing mode comprises the following steps: log transformation is carried out on the characteristic information, an FM decomposition matrix is formed by each piece of characteristic information after log transformation, and any two elements of the FM decomposition matrix are crossed to obtain expanded characteristic information;
determining whether the target equipment belongs to abnormal equipment or not based on the expanded characteristic information and an abnormal equipment identification model, wherein the abnormal equipment identification model is obtained by training with the expanded characteristic information as input and the equipment type of sample equipment as output, and the sample equipment comprises sample abnormal equipment and sample non-abnormal equipment;
wherein the training step of the abnormal equipment identification model comprises the following steps:
acquiring the sample abnormal device and the sample non-abnormal device;
selecting a first group of devices from the sample anomalous device that accounts for a first specified percentage of the sample anomalous device a plurality of times while selecting a second group of devices from the sample non-anomalous device that accounts for a second specified percentage of the sample non-anomalous device a plurality of times;
after the first equipment group and the second equipment group are selected each time, determining standby model parameters of the abnormal equipment identification model based on the first equipment group and the second equipment group selected at the current time;
selecting a standby model parameter with the highest accuracy as a target model parameter of the abnormal equipment identification model from multiple groups of corresponding standby model parameters selected for multiple times;
the number of times the first device group and the second device group are selected is both a predetermined number of times,
the product of the first specified percentage and the predetermined number of times is a first specified value, and the first specified value is given by the formula in Figure 605276DEST_PATH_IMAGE001 (formula image not reproduced in this text), wherein K is the number of the sample abnormal devices;
the product of the second specified percentage and the predetermined number of times is a second specified value given by the formula in Figure 13255DEST_PATH_IMAGE002 (formula image not reproduced in this text), wherein J is the number of the sample non-abnormal devices;
selecting the backup model parameter with the highest accuracy as the target model parameter of the abnormal equipment identification model in the multiple sets of corresponding backup model parameters selected for multiple times, wherein the selecting comprises the following steps:
acquiring a third device group in the sample abnormal device and a fourth device group in the sample non-abnormal device, wherein the coincidence rate of the third device group and any one of the first device groups is lower than a first preset coincidence rate, and the coincidence rate of the fourth device group and any one of the second device groups is lower than a second preset coincidence rate;
respectively carrying out abnormal equipment identification on the equipment in the third equipment group and the fourth equipment group based on the multiple selection of the corresponding standby model parameters to obtain multiple groups of identification results;
respectively determining the confidence degrees of the recognition results of each group based on the third device group, the fourth device group and the multiple groups of recognition results, wherein the confidence degrees show the matching degrees of the device types shown by the recognition results and the actual device types of the devices in the third device group and the fourth device group;
and taking the confidence degree of the recognition result as the accuracy of the backup model parameter corresponding to the recognition result, and selecting the backup model parameter with the highest accuracy as the target model parameter of the abnormal equipment recognition model.
2. The abnormal device identification method according to claim 1, wherein the determining whether the target device belongs to an abnormal device based on the augmented feature information and an abnormal device identification model comprises:
and determining the abnormality degree of the target equipment based on the expanded characteristic information and the abnormal equipment identification model, wherein when the abnormality degree of the target equipment is larger than or equal to a specified value, the target equipment is determined to belong to abnormal equipment.
3. The abnormal device identification method according to claim 1, further comprising, before the steps of selecting the first device group a plurality of times and selecting the second device group a plurality of times:
judging whether the total number of the abnormal sample devices and the non-abnormal sample devices, the number of the abnormal sample devices or the number of the non-abnormal sample devices is larger than or equal to a preset number threshold value or not;
and if the total number, the number of the abnormal sample devices or the number of the non-abnormal sample devices is larger than or equal to the preset number threshold, the steps of selecting the first device group for multiple times and selecting the second device group for multiple times are carried out.
4. The abnormal device identification method according to claim 1 or 2, wherein the characteristic information of the target device includes: the user information of the target device, the APP active information of the target device and the position information of the target device.
5. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform the method of any of the preceding claims 1 to 4.
6. A computer-readable storage medium having stored thereon computer-executable instructions for performing the method flow of any of claims 1-4.
CN202110792412.3A 2021-07-14 2021-07-14 Abnormal device identification method, electronic device, and computer-readable storage medium Active CN113254919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110792412.3A CN113254919B (en) 2021-07-14 2021-07-14 Abnormal device identification method, electronic device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110792412.3A CN113254919B (en) 2021-07-14 2021-07-14 Abnormal device identification method, electronic device, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN113254919A CN113254919A (en) 2021-08-13
CN113254919B true CN113254919B (en) 2021-10-12

Family

ID=77191166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110792412.3A Active CN113254919B (en) 2021-07-14 2021-07-14 Abnormal device identification method, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113254919B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114334696B (en) * 2021-12-30 2024-03-05 中国电信股份有限公司 Quality detection method and device, electronic equipment and computer readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108364106A (en) * 2018-02-27 2018-08-03 平安科技(深圳)有限公司 A kind of expense report Risk Forecast Method, device, terminal device and storage medium
CN109583904B (en) * 2018-11-30 2023-04-07 深圳市腾讯计算机系统有限公司 Training method of abnormal operation detection model, abnormal operation detection method and device
CN109858625A (en) * 2019-02-01 2019-06-07 北京奇艺世纪科技有限公司 Model training method and equipment, prediction technique and equipment, data processing equipment, medium
CN111651760B (en) * 2020-08-04 2020-11-20 北京志翔科技股份有限公司 Method for comprehensively analyzing equipment safety state and computer readable storage medium
CN112118551B (en) * 2020-10-16 2022-09-09 同盾控股有限公司 Equipment risk identification method and related equipment

Also Published As

Publication number Publication date
CN113254919A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN113254918B (en) Information processing method, electronic device, and computer-readable storage medium
CN110275965B (en) False news detection method, electronic device and computer readable storage medium
CN105787025B (en) Network platform public account classification method and device
CN108648000B (en) Method and device for evaluating user retention life cycle and electronic equipment
CN105354210A (en) Mobile game payment account behavior data processing method and apparatus
CN112328909B (en) Information recommendation method and device, computer equipment and medium
CN110728543B (en) Abnormal account identification method and device
CN109558384B (en) Log classification method, device, electronic equipment and storage medium
CN108932646B (en) User tag verification method and device based on operator and electronic equipment
CN112749973A (en) Authority management method and device and computer readable storage medium
CN113254919B (en) Abnormal device identification method, electronic device, and computer-readable storage medium
CN115174250A (en) Network asset safety assessment method and device, electronic equipment and storage medium
CN111444930B (en) Method and device for determining prediction effect of binary classification model
CN111027065B (en) Leucavirus identification method and device, electronic equipment and storage medium
CN112183052A (en) Document repetition degree detection method, device, equipment and medium
CN110717787A (en) User classification method and device
CN110674632A (en) Method and device for determining security level, storage medium and equipment
CN114817518B (en) License handling method, system and medium based on big data archive identification
CN109213924B (en) Popularization task allocation method and device and computer equipment
CN110717817A (en) Pre-loan approval method and device, electronic equipment and computer-readable storage medium
CN113593546B (en) Terminal equipment awakening method and device, storage medium and electronic device
CN111625720B (en) Method, device, equipment and medium for determining execution strategy of data decision item
CN114334696A (en) Quality detection method and device, electronic equipment and computer readable storage medium
CN114943479A (en) Risk identification method, device and equipment of business event and computer readable medium
CN110825717B (en) Data normalization method, device and medium for identity recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant