CN109977150B - Classification method based on physical characteristics and implicit style characteristics of data - Google Patents


Info

Publication number
CN109977150B
Authority
CN
China
Prior art keywords
node
data
influence
social network
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910205905.5A
Other languages
Chinese (zh)
Other versions
CN109977150A (en
Inventor
顾苏杭
王惠宇
高佳琴
王士同
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Vocational Institute of Light Industry
Original Assignee
Changzhou Vocational Institute of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Vocational Institute of Light Industry filed Critical Changzhou Vocational Institute of Light Industry
Priority to CN201910205905.5A priority Critical patent/CN109977150B/en
Publication of CN109977150A publication Critical patent/CN109977150A/en
Application granted granted Critical
Publication of CN109977150B publication Critical patent/CN109977150B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24143 Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical fields of pattern recognition, artificial intelligence and machine learning, and in particular to a classification method based on physical characteristics and implicit style characteristics of data. The method comprises the following steps: (1) mapping the data set into a social network containing C sub-networks by using a K-nearest-neighbor algorithm; (2) mining the implicit style characteristics of the data, namely authority and influence, in the constructed social network; (3) calculating the double-layer structure efficiency between each test sample and each node in the social network according to the data distance characteristic and the authority style characteristic, and determining the allowed connection set of the test sample for each sub-network; (4) calculating the sum of the influence of all nodes in each allowed connection set; (5) assigning the test sample the label type of the sub-network whose allowed connection set has the largest sum of node influence.

Description

Classification method based on physical characteristics and implicit style characteristics of data
Technical Field
The invention relates to the technical fields of pattern recognition, artificial intelligence and machine learning, in particular to a classification method based on physical characteristics and implicit style characteristics of data.
Background
Data classification has long been an active research topic in machine learning, pattern recognition, data mining and related fields. Combining it with practical applications such as intelligent medical treatment, face recognition, intelligent traffic monitoring and market dynamics analysis has further promoted its development and widened its application prospects in military, civilian and other fields. The key to data classification is to select suitable data characteristics and to construct a high-accuracy data classification model through a classification method.
Traditional classification methods, such as support vector machines, K-nearest neighbors, random forests, Bayesian classifiers, decision trees, artificial neural networks and Takagi-Sugeno-Kang (TSK) fuzzy classifiers, train data classification models using the physical characteristics of the data (such as distance, color and similarity). However, in most practical data sets there are implicit associations between data samples, and each class of samples exhibits its own implicit style characteristics. Typical data sets include: (1) epileptic electroencephalogram (EEG) signals: the EEG waveforms of healthy people differ markedly from those of people with epilepsy; (2) handwriting data sets: the handwriting style of each author differs markedly from that of other authors; (3) vowel recognition: each English vowel is pronounced differently from the others. At present, traditional classification methods consider only the physical characteristics of the data when training a classification model and do not involve its implicit style characteristics, and no published literature, domestic or foreign, describes a data classification method that mines the implicit style characteristics of the data and uses both the physical characteristics and the implicit style characteristics for classification. Existing data classification methods therefore do not reflect the fact that data sets contain implicit style characteristics.
Disclosure of Invention
The invention aims to provide a social-network-based classification method that uses both the physical characteristics and the implicit style characteristics of data. The method reflects the fact that most practical data sets contain implicit style characteristics, and it mines those characteristics through a social network to improve data classification behavior and classification accuracy. In addition, the method does not need to generate a data classification model in the training stage; it can enter the classification stage as soon as the implicit style characteristics of the nodes in the social network have been determined.
In order to achieve the above object, an embodiment of the present invention provides a classification method based on physical characteristics and implicit style characteristics of data, including the following steps: mapping the data set into a social network containing C sub-networks by using a K-nearest-neighbor algorithm; mining the implicit style characteristics of the data, namely authority and influence, in the constructed social network; calculating the double-layer structure efficiency between each test sample and each node in the social network according to the data distance characteristic and the authority style characteristic, and determining the allowed connection set of the test sample for each sub-network; calculating the sum of the influence of all nodes in each allowed connection set; and assigning the test sample the label type of the sub-network whose allowed connection set has the largest sum of node influence.
In the above technical solution, for a given data set $X = [x_1, x_2, \ldots, x_N]^T$, where $x_i \in \mathbb{R}^d$, with label set $Y = [y_1, y_2, \ldots, y_N]^T$, mapping the given data set X into a social network G by using a K-nearest-neighbor algorithm further comprises: mapping X into a social network $G = \{g_1, g_2, \ldots, g_Q\}$, where Q equals the number of classes contained in X and each sample $x_i$ in X corresponds to a node $v_i$ of G. According to the K-nearest-neighbor algorithm, if any two nodes $v_i$ and $v_j$ in G satisfy the following two conditions: (1) node $v_j$ is a neighbor node of node $v_i$, and (2) node $v_j$ and node $v_i$ have the same label, then a directed edge $e_{ij}$ is established between them, with $v_i$ as its start node and $v_j$ as its end node. In the social network G so established, each sub-network corresponds to one data class in X, any two sub-networks $g_p$ and $g_q$ are independent of each other, and all nodes in a sub-network share the same label, which is that of the corresponding data class.
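The edge rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_social_network`, the use of Euclidean distance for the neighbor search, the parameter `k`, and the toy data are all chosen here for demonstration.

```python
import numpy as np

def build_social_network(X, y, k):
    """Return directed edges (i, j) where j is one of the k nearest
    neighbours of i (Euclidean distance) and has the same label as i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    edges = set()
    for i in range(n):
        neighbours = np.argsort(dist[i], kind="stable")[:k]
        for j in neighbours:
            if y[j] == y[i]:          # condition (2): same label
                edges.add((i, int(j)))
    return edges

# Toy data (assumed): two well-separated classes.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]]
y = [0, 0, 0, 1, 1]
edges = build_social_network(X, y, k=2)
```

Because an edge is only created between samples with the same label, no edge crosses class boundaries, which is why the resulting sub-networks are independent of each other.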
According to the invention, mining two implicit style characteristics of the data, namely data authority and data influence, in the constructed social network G further comprises: first mining the authority $a_i$ of each node $v_i$ and the authority of each sub-network $g_q$ in the social network G constructed as described above, where the nodes are the data samples in the data set X and the sub-networks are the data classes in X. The authority $a_i$ of node $v_i$ is calculated from the in-degree, out-degree and degree of $v_i$ in the social network G, so that the node authority is computed comprehensively. Accordingly, in the social network G, the more other nodes are connected to a node, the higher its authority; likewise, the more other nodes a node connects to, the higher its authority. The authority $a_i$ of node $v_i$ is calculated by formula (1),
in which the two component terms are each calculated by their own formula.
In formulas (1) to (4), the degree symbols, the last of which is $d_i$, respectively denote the out-degree, in-degree and degree of node $v_i$. $\xi$ denotes a very small positive value that keeps outlier or noise samples in the data set X from affecting the classification performance of the classification method.
The authority of node $v_i$ is then fuzzified by a fuzzification method to obtain the fuzzy weight $\omega_i$ of node $v_i$, which is calculated by formula (5),
where N denotes the total number of samples contained in the data set X. From formula (5), the fuzzy weight $\omega_i$ takes values in (0, 1), and the higher the authority of node $v_i$, the higher the corresponding fuzzy weight.
Once the authority of every node has been determined, the authority of sub-network $g_q$ can be calculated by formula (6),
in which the node count of sub-network $g_q$ equals the number of samples in the data class corresponding to $g_q$, and $v_m$ denotes the m-th node contained in $g_q$.
Once the fuzzy weight of every node $v_i$ has been determined, the influence of node $v_i$ can be calculated by formula (7),
in which the iterated quantity is the influence of the i-th node $v_i$ in the h-th iteration, and $\alpha$ denotes the damping coefficient of the social network, typically set to $\alpha = 0.85$. $\rho_j$ denotes the node density in the social network G and is calculated by formula (8),
where $d_{jk}$ denotes the Euclidean distance between nodes $v_j$ and $v_k$, and $d_c$ denotes the cut-off distance, which can be set manually so that the number of nodes surrounding node $v_j$ is 1% to 2% of the number of all nodes in the social network G. $\chi(\cdot)$ denotes an indicator function: $\chi(\cdot) = 1$ if $d_{jk} - d_c < 0$, and $\chi(\cdot) = 0$ otherwise.
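The cut-off density described above can be sketched in Python as follows. This is a hedged illustration rather than the patent's own formula, which is not reproduced in this text: it assumes the density of a node is the count of other nodes whose distance to it is below the cut-off, and the function name `node_density`, the exclusion of a node from its own count, and the toy data are assumptions.

```python
import numpy as np

def node_density(X, dc):
    """rho_j = number of other nodes k with d_jk - dc < 0."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances d_jk.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    inside = dist - dc < 0            # indicator chi as a boolean matrix
    np.fill_diagonal(inside, False)   # assumption: a node does not count itself
    return inside.sum(axis=1)

# Toy one-dimensional data (assumed): three close points and one outlier.
X = [[0.0], [1.0], [2.0], [10.0]]
rho = node_density(X, dc=1.5)
```

In this toy run the isolated point receives density 0, which illustrates how samples in sparse regions contribute little density to the influence calculation.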
The iteration of formula (7) terminates when the number of iterations reaches the maximum value H or when the convergence condition of formula (9) is satisfied,
where $\|\cdot\|_2$ denotes the 2-norm and $\theta$ denotes a small threshold that can be set manually, e.g. $\theta = 10^{-4}$.
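The stopping rule can be sketched in Python as follows. The convergence condition itself is not reproduced in this text, so the sketch assumes it compares the 2-norm of the change between successive influence vectors with the threshold. The function `iterate_until_stable`, the placeholder update `halve`, and the toy values are hypothetical and stand in for the influence update, which is not shown here.

```python
import numpy as np

def iterate_until_stable(update, start, max_iter, theta=1e-4):
    """Apply `update` repeatedly; stop after max_iter rounds or once the
    2-norm of the change between successive vectors drops below theta."""
    current = np.asarray(start, dtype=float)
    for h in range(1, max_iter + 1):
        nxt = np.asarray(update(current), dtype=float)
        if np.linalg.norm(nxt - current, ord=2) < theta:
            return nxt, h
        current = nxt
    return current, max_iter

def halve(v):
    # Hypothetical contraction with fixed point 2.0, used only to
    # exercise the stopping rule; it is NOT the influence update.
    return 0.5 * v + 1.0

result, rounds = iterate_until_stable(halve, [0.0, 0.0], max_iter=100)
```

Passing the update as a function keeps the two termination criteria (iteration cap H and threshold) separate from whatever influence formula is plugged in.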
As can be seen from formula (7), the node influence is calculated using the density of the nodes in the social network G, i.e. according to the actual distribution of the samples in the data set. Because node density is propagated continuously during the iteration, the node influence has dynamic characteristics. In addition, node authority and node influence are linked through the node fuzzy weight, so that the two are positively correlated: the higher the authority of a node, the higher its influence.
For a test set $T = [t_1, t_2, \ldots, t_M]^T$, where $t_m \in \mathbb{R}^d$, calculating the allowed connection set between each test sample in the test set and each sub-network in the social network G according to the physical characteristics and the implicit style characteristics of the data further comprises: when a test sample t of the test set T is embedded into the social network G, first calculating the double-layer structure efficiency $\Lambda_{t,j}$ by formula (10),
where $v_j$ denotes the j-th node in sub-network $g_q$, i.e. the j-th sample of the data class corresponding to $g_q$, and $d_{tj}$ denotes the Euclidean distance between test sample t and node $v_j$. $\gamma$ denotes a balance coefficient: the higher its value, the greater the role played by node authority; the lower its value, the greater the role played by the physical characteristics of the data. As formula (10) shows, the double-layer structure efficiency $\Lambda_{t,j}$ is determined jointly by the physical characteristics and the implicit style characteristics of the data. Using the double-layer structure efficiency, the allowed connection set can be determined, which is then used to calculate the sum of influence of the allowed connection set between test sample t and each sub-network. The criterion for determining the allowed connection set is expressed by formula (11),
where the double-layer structure efficiency $\Lambda_{t,j}$ is a function of test sample t and node $v_j$ that checks whether establishing a connecting edge between t and $v_j$ raises or lowers the double-layer structure efficiency. Accordingly, the generation of the allowed connection set can be described as follows:
1) If there is a node $v_j$ such that $\Lambda_{t,j} \geq 1$, i.e. the double-layer structure efficiency is improved, node $v_j$ is added to the allowed connection set.
2) If no node $v_j$ satisfies 1), i.e. the double-layer structure efficiency is reduced, the node whose $\Lambda_{t,j}$ is closest to 1 is added to the allowed connection set, which in this case contains only one node.
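The two generation rules above can be sketched in Python as follows. The efficiency values are taken as given inputs, since the efficiency formula is not reproduced in this text. The function name `allowed_connection_set` and the dictionary layout are assumptions made for illustration.

```python
def allowed_connection_set(efficiency):
    """efficiency: dict mapping node id -> double-layer structure
    efficiency of that node for one test sample and one sub-network."""
    # Rule 1: keep every node whose efficiency is at least 1.
    improving = [j for j, lam in efficiency.items() if lam >= 1.0]
    if improving:
        return improving
    # Rule 2: no node improves the efficiency, so keep only the
    # single node whose efficiency is closest to 1.
    closest = min(efficiency, key=lambda j: abs(efficiency[j] - 1.0))
    return [closest]

# Assumed example values for two sub-networks.
set_a = allowed_connection_set({0: 1.2, 1: 0.9, 2: 1.0})
set_b = allowed_connection_set({3: 0.4, 4: 0.8})
```

Rule 2 guarantees that every sub-network contributes a non-empty allowed connection set, so each class always has an influence sum to compare.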
In a further improvement of the invention, calculating the sum of influence of the nodes in each allowed connection set from the generated allowed connection sets further comprises: calculating, for each allowed connection set, the sum of the influence of its nodes, which is used to determine the maximum influence sum. The sum of influence of the nodes in each allowed connection set is calculated by formula (12).
Judging the label type of the test sample to be the sub-network label type corresponding to the maximum sum of node influence, according to the sums of node influence of the allowed connection sets, further comprises: determining the maximum influence sum from the influence sums of the nodes in the allowed connection sets, calculated by formula (13).
According to the maximum influence sum, the label type of the test sample is identified as the label type of the sub-network corresponding to that sum.
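The final decision can be sketched in Python as follows: sum the node influence inside each allowed connection set and pick the sub-network label with the largest sum. The function name `classify`, the dictionary layout, and the numeric influence values are assumptions made for illustration only.

```python
def classify(allowed_sets, influence):
    """allowed_sets: dict label -> node ids in that sub-network's allowed
    connection set; influence: dict node id -> influence value."""
    # Sum of node influence per allowed connection set.
    totals = {label: sum(influence[j] for j in nodes)
              for label, nodes in allowed_sets.items()}
    # The predicted label is the one with the largest influence sum.
    best = max(totals, key=totals.get)
    return best, totals

# Assumed example: two sub-networks with labels 0 and 1.
allowed_sets = {0: [0, 1], 1: [5]}
influence = {0: 0.2, 1: 0.3, 5: 0.4}
label, totals = classify(allowed_sets, influence)
```

Here label 0 wins because its two connected nodes together outweigh the single, individually stronger node of label 1.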
The beneficial effects of the invention are as follows: the method reflects the fact that most practical data sets contain implicit style characteristics of the data, and it mines these characteristics through the social network to improve data classification behavior and data classification accuracy.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flowchart of an algorithm of the present invention;
FIG. 3 is a schematic diagram of a social network of the present invention mapped from a data set containing two data classes;
FIG. 4 is a diagram showing node attributes in a social network in accordance with the present invention;
FIG. 5 is a schematic diagram of node authority in a social network in accordance with the present invention;
FIG. 6 is a schematic diagram of the influence of nodes in a social network according to the present invention;
FIG. 7 is a schematic diagram of an allowed connection set generated in accordance with the efficiency of the bilayer structure of the present invention;
FIG. 8 is a schematic diagram of the present invention for predicting test sample tag types based on the sum of maximum node impact of allowed connection sets.
Detailed Description
The present invention will be further described in detail with reference to the drawings and examples, which are only for the purpose of illustrating the invention and are not to be construed as limiting the scope of the invention.
The technical solution of the invention is further and fully described below with reference to specific embodiments and the accompanying drawings. The specific embodiments shown in the drawings merely illustrate the technical solution of the invention and do not limit its specific application.
As shown in FIG. 1, the social-network-based classification method using physical characteristics and implicit style characteristics of data comprises the following steps:
Step one: for a given data set $X = [x_1, x_2, \ldots, x_N]^T$, where $x_i \in \mathbb{R}^d$, with label set $Y = [y_1, y_2, \ldots, y_N]^T$, map the given data set X into a social network $G = \{g_1, g_2, \ldots, g_Q\}$ by using a K-nearest-neighbor algorithm, where Q equals the number of classes contained in X and each sample $x_i$ in X corresponds to a node $v_i$ of G.
Further, in step one, according to the K-nearest-neighbor algorithm, if any two nodes $v_i$ and $v_j$ in the social network G satisfy the following two conditions: node $v_j$ is a neighbor node of node $v_i$, and node $v_j$ and node $v_i$ have the same label, then a directed edge $e_{ij}$ is established between them, with $v_i$ as its start node and $v_j$ as its end node.
Further, in step one, in the social network G so established, each sub-network corresponds to one data class in the data set X, any two sub-networks $g_p$ and $g_q$ are independent of each other, and all nodes in a sub-network share the same label, which is that of the corresponding data class.
Step two: mine two implicit style characteristics of the data, namely data authority and data influence, in the social network G constructed in step one. First, the authority $a_i$ of each node $v_i$ and the authority of each sub-network $g_q$ are mined in the social network G constructed in step one, where the nodes are the data samples in the data set X and the sub-networks are the data classes in X.
The authority $a_i$ of node $v_i$ is calculated from the in-degree, out-degree and degree of $v_i$ in the social network G, so that the node authority is computed comprehensively. Accordingly, in the social network G, the more other nodes are connected to a node, the higher its authority; likewise, the more other nodes a node connects to, the higher its authority. The authority $a_i$ of node $v_i$ is calculated by formula (1),
in which the two component terms are each calculated by their own formula.
In the above formulas, the degree symbols, the last of which is $d_i$, respectively denote the out-degree, in-degree and degree of node $v_i$. $\xi$ denotes a very small positive value that keeps outlier or noise samples in the data set X from affecting the classification performance of the classification method.
The authority of node $v_i$ is then fuzzified by a fuzzification method to obtain the fuzzy weight $\omega_i$ of node $v_i$, which is calculated by formula (5),
where N denotes the total number of samples contained in the data set X. From formula (5), the fuzzy weight $\omega_i$ takes values in (0, 1), and the higher the authority of node $v_i$, the higher the corresponding fuzzy weight.
Once the authority of every node has been determined, the authority of sub-network $g_q$ can be calculated by formula (6),
in which the node count of sub-network $g_q$ equals the number of samples in the data class corresponding to $g_q$, and $v_m$ denotes the m-th node contained in $g_q$.
Once the fuzzy weight of every node $v_i$ has been determined, the influence of node $v_i$ can be calculated by formula (7),
in which the iterated quantity is the influence of the i-th node $v_i$ in the h-th iteration, and $\alpha$ denotes the damping coefficient of the social network, typically set to $\alpha = 0.85$. $\rho_j$ denotes the node density in the social network G and is calculated by formula (8),
where $d_{jk}$ denotes the Euclidean distance between nodes $v_j$ and $v_k$, and $d_c$ denotes the cut-off distance, which can be set manually so that the number of nodes surrounding node $v_j$ is 1% to 2% of the number of all nodes in the social network G. $\chi(\cdot)$ denotes an indicator function: $\chi(\cdot) = 1$ if $d_{jk} - d_c < 0$, and $\chi(\cdot) = 0$ otherwise. The iteration of formula (7) terminates when the number of iterations reaches the maximum value H or when the convergence condition of formula (9) is satisfied,
where $\|\cdot\|_2$ denotes the 2-norm and $\theta$ denotes a small threshold that can be set manually, e.g. $\theta = 10^{-4}$.
As can be seen from formula (7), the node influence is calculated using the density of the nodes in the social network G, i.e. according to the actual distribution of the samples in the data set. Because node density is propagated continuously during the iteration, the node influence has dynamic characteristics. In addition, node authority and node influence are linked through the node fuzzy weight, so that the two are positively correlated: the higher the authority of a node, the higher its influence.
Step three: for a test set $T = [t_1, t_2, \ldots, t_M]^T$, where $t_m \in \mathbb{R}^d$, calculate the allowed connection set between each test sample in the test set and each sub-network in the social network G according to the physical characteristics and the implicit style characteristics of the data. Specifically, when a test sample t of the test set T is embedded into the social network G, the double-layer structure efficiency $\Lambda_{t,j}$ is first calculated by formula (10),
where $v_j$ denotes the j-th node in sub-network $g_q$, i.e. the j-th sample of the data class corresponding to $g_q$, and $d_{tj}$ denotes the Euclidean distance between test sample t and node $v_j$. $\gamma$ denotes a balance coefficient: the higher its value, the greater the role played by node authority; the lower its value, the greater the role played by the physical characteristics of the data. The double-layer structure efficiency $\Lambda_{t,j}$ is thus determined jointly by the physical characteristics and the implicit style characteristics of the data. Using the double-layer structure efficiency, the allowed connection set can be determined, which is then used to calculate the sum of influence of the allowed connection set between test sample t and each sub-network. The criterion for determining the allowed connection set is expressed by formula (11),
where the double-layer structure efficiency $\Lambda_{t,j}$ is a function of test sample t and node $v_j$ that checks whether establishing a connecting edge between t and $v_j$ raises or lowers the double-layer structure efficiency. Accordingly, the generation of the allowed connection set can be described as follows:
1) If there is a node $v_j$ such that $\Lambda_{t,j} \geq 1$, i.e. the double-layer structure efficiency is improved, node $v_j$ is added to the allowed connection set.
2) If no node $v_j$ satisfies 1), i.e. the double-layer structure efficiency is reduced, the node whose $\Lambda_{t,j}$ is closest to 1 is added to the allowed connection set, which in this case contains only one node.
Step four: from the allowed connection sets, calculate the sum of the influence of the nodes in each allowed connection set, which is used to determine the maximum influence sum. The sum of influence of the nodes in each allowed connection set is calculated by formula (12).
Step five: determine the maximum influence sum from the influence sums of the nodes in the allowed connection sets, calculated by formula (13).
Further, according to the maximum influence sum, the label type of the test sample is identified as the label type of the sub-network corresponding to that sum.
As shown in FIG. 2, the data set X and the test data set T are taken as input. In the training stage, the data set X is mapped into the social network G by the K-nearest-neighbor algorithm of step one, and the authority and influence of each node and the authority of each sub-network in G are then mined by step two. Specifically, the density of each node in G is calculated in step two, the authority of each node and of each sub-network is calculated using formulas (1) to (4) of step two, and the influence of each node is calculated using formulas (7) and (9) of step two. In the classification stage, when a test sample t is input, the allowed connection set between t and each sub-network is first established according to formulas (10) and (11) of step three; the sum of node influence in each allowed connection set is then calculated according to formula (12) of step four; finally, the allowed connection set with the maximum sum of node influence is determined according to formula (13) of step five, and the label type of the test sample is judged to be the label type corresponding to that allowed connection set.
As shown in FIG. 3, the input data set X contains two classes of data, labeled 0 and 1 respectively, so that after X is mapped into the social network G the network contains two mutually independent sub-networks, drawn with two different node markers (the second being "■"), whose label types are likewise 0 and 1, respectively.
As shown in FIG. 4, the degrees of some of the nodes and the Euclidean distances between some of the nodes are shown in the social network G. The in-degree and out-degree of a node are determined by the directed edges established between the nodes.
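Reading degrees off the directed edges can be sketched in Python as follows. The function name `node_degrees` and the edge list are illustrative assumptions, and the sketch further assumes that the degree of a node is the sum of its in-degree and out-degree, which this text does not state explicitly.

```python
from collections import Counter

def node_degrees(edges):
    """Return {node: (in_degree, out_degree, degree)} for directed
    edges given as (start, end) pairs."""
    out_deg = Counter(i for i, _ in edges)   # edges leaving a node
    in_deg = Counter(j for _, j in edges)    # edges arriving at a node
    nodes = set(out_deg) | set(in_deg)
    # Assumption: degree = in-degree + out-degree.
    return {v: (in_deg[v], out_deg[v], in_deg[v] + out_deg[v]) for v in nodes}

# Assumed toy edge list.
degrees = node_degrees([(0, 1), (0, 2), (1, 2)])
```

Nodes that appear in no edge are absent from the result; a caller that needs isolated nodes would have to add them with degree zero.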
As shown in FIG. 5, the authoritativeness of all nodes and subnetworks is shown in social network G, wherein the authoritativeness of a subnetwork is calculated from the authoritativeness of all nodes in the subnetwork.
As shown in FIG. 6, the influence of all nodes is shown in the social network G. During the iterative calculation of node influence, the density of each node in the social network is propagated, i.e. the actual distribution of the samples in the data set is taken into account, which gives the node influence its dynamic characteristics.
As shown in FIG. 7, when a test sample is embedded into the established social network G, the allowed connection set between the test sample and each sub-network is determined through the double-layer structure efficiency. The test sample is denoted by "o", and within the double-layer structure efficiency the physical characteristics and the implicit style characteristics of the data act together through the balance coefficient.
As shown in FIG. 8, because the sum of node influence of the allowed connection set between the test sample and the sub-network labeled 0 is greater than the sum of node influence of the allowed connection set between the test sample and the sub-network "■", the label type of the test sample is determined to be "0".
The foregoing has outlined and described the basic principles, features and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, that the above embodiments and descriptions merely illustrate the principles of the invention, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (6)

1. A classification method based on physical characteristics and implicit style characteristics of data, characterized by comprising the following steps:
step one: for a given data set X = [x_1, x_2, …, x_N]^T, where x_i ∈ R^d, with label set Y = [y_1, y_2, …, y_N]^T, mapping the given data set X into a social network G by using a K-nearest-neighbor algorithm, wherein the given data set is an epileptic electroencephalogram signal data set, a handwriting data set, or a vowel recognition data set;
step two: mining two implicit style characteristics of the data, namely data authority and data influence, in the constructed social network G;
step three: for a test set T = [t_1, t_2, …, t_M]^T, where t_m ∈ R^d, calculating an allowed connection set between each test sample in the test set and each sub-network in the social network G according to the physical characteristics and the implicit style characteristics of the data;
step four: calculating, from the allowed connection sets generated in step three, the sum of the influence of the nodes in each allowed connection set;
step five: according to the sums calculated in step four, determining the label type of the test sample to be the label type of the sub-network whose allowed connection set has the maximum sum of node influence;
wherein mining the two implicit style characteristics, data authority and data influence, in the constructed social network G further comprises:
first mining, in the constructed social network G, the authority a_i of each node v_i and the authority of each sub-network g_q, wherein the nodes are the data samples in the data set X and the sub-networks are the data classes in the data set X;
the authority a_i of the node v_i is calculated comprehensively from the in-degree and out-degree of the node v_i in the social network G, and is given by the formula
wherein the two terms of the formula are respectively calculated as
In the above formulas, the degree terms respectively represent the in-degree and out-degree of the node v_i, and ξ represents a very small positive value, chosen so that outliers or noise samples in the data set X do not affect the classification performance of the classification method;
the authority of the node v_i is fuzzified by a fuzzification method to obtain the fuzzy weight ω_i of the node v_i, calculated by the formula
wherein N represents the total number of samples contained in the data set X;
once the authority of every node is determined, the authority of the sub-network g_q is calculated by the formula
wherein the formula uses the number of nodes contained in the sub-network g_q, that is, the number of samples in the data class corresponding to the sub-network g_q, and v_m represents the m-th node contained in the sub-network g_q;
once the fuzzy weight of every node v_i is determined, the influence of the node v_i is calculated by the formula
wherein the influence term represents the influence of the i-th node v_i in the h-th iteration; α represents the damping coefficient of the social network, with value α = 0.85; and ρ_j represents the node density in the social network G, calculated by the formula
ρ_j = Σ_k χ(d_jk − dc)
wherein d_jk represents the distance between the nodes v_j and v_k, which is the Euclidean distance; dc represents the cutoff distance, whose value is set manually so that the number of nodes surrounding the node v_j accounts for 1-2% of all nodes in the social network G; χ(·) represents a judgment function, that is, χ(·) = 1 if d_jk − dc < 0 and χ(·) = 0 otherwise; the iteration of formula (7) terminates when the number of iterations reaches a maximum value H or when the following condition is satisfied;
wherein ‖·‖_2 represents the 2-norm and θ represents a threshold.
2. The classification method based on physical characteristics and implicit style characteristics of data according to claim 1, wherein, for the given data set X = [x_1, x_2, …, x_N]^T, where x_i ∈ R^d, with label set Y = [y_1, y_2, …, y_N]^T, mapping the given data set X into the social network G by using the K-nearest-neighbor algorithm further comprises:
mapping the given data set X into a social network G = {g_1, g_2, …, g_Q} by using the K-nearest-neighbor algorithm, wherein Q equals the number of classes contained in the data set X, and each sample x_i in the data set X corresponds to a node v_i of the social network G;
according to the K-nearest-neighbor algorithm, any two nodes v_i and v_j in the social network G satisfy the following condition:
if the node v_j is a neighbor node of the node v_i and the node v_j has the same label as the node v_i, a directed edge e_ij with v_i as its start node and v_j as its end node is established between the nodes v_i and v_j.
3. The classification method based on physical characteristics and implicit style characteristics of data according to claim 2, wherein, in the social network G, each sub-network corresponds to one data class in the data set X, any two sub-networks g_p and g_q are independent of each other, and all nodes in a sub-network have the same label, which is the label of the corresponding data class.
4. The classification method based on physical characteristics and implicit style characteristics of data according to claim 3, wherein, for the test set T = [t_1, t_2, …, t_M]^T, where t_m ∈ R^d, calculating the allowed connection set between each test sample in the test set and each sub-network in the social network G according to the physical characteristics and the implicit style characteristics of the data further comprises:
when a test sample t in the test set T is embedded into the social network G, first calculating the double-layer structure efficiency Λ_{t,j} by the formula
wherein v_j represents the j-th node in the sub-network g_q, that is, the j-th sample in the data class corresponding to the sub-network g_q; d_tj represents the distance between the test sample t and the node v_j, which is the Euclidean distance; and γ represents a balance coefficient: the higher its value, the greater the role played by the node authority, and conversely, the greater the role played by the physical characteristics of the data;
the criterion for determining the allowed connection set is expressed as follows
wherein the double-layer structure efficiency Λ_{t,j} is a function of the test sample t and the node v_j, used to check whether establishing a connecting edge between the test sample t and the node v_j increases or decreases the value of Λ_{t,j}.
5. The classification method based on physical characteristics and implicit style characteristics of data according to claim 4, wherein calculating, from the generated allowed connection sets, the sum of the influence of the nodes in each allowed connection set further comprises:
calculating, from the allowed connection sets, the sum of the influence of the nodes in each allowed connection set, for use in determining the maximum influence sum, wherein the sum of the influence of the nodes in each allowed connection set is calculated by the formula
6. The classification method based on physical characteristics and implicit style characteristics of data according to claim 5, wherein determining the label type of the test sample to be the label type of the sub-network corresponding to the maximum sum of node influence, according to the sum of the influence of the nodes in each allowed connection set, further comprises:
determining the maximum influence sum from the sums of the influence of the nodes in the allowed connection sets, by the formula
according to the maximum influence sum, identifying the label type of the test sample as the label type of the sub-network corresponding to that sum.
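The graph construction recited in claims 2 and 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names `build_class_knn_graph` and `in_out_degrees` are hypothetical, and the sketch reads "neighbor node" as one of the K nearest nodes among all samples, keeping a directed edge only when both nodes share a label. The in-degree and out-degree it returns are the inputs that claim 1 names for the node authority, whose formula is not reproduced here.

```python
import numpy as np

def build_class_knn_graph(X, y, k):
    """Directed edges (i, j): v_j is one of the k nearest neighbours of v_i
    and carries the same label as v_i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbour
    edges = []
    for i in range(len(X)):
        for j in np.argsort(dist[i])[:k]:
            if y[j] == y[i]:  # edges never cross class boundaries
                edges.append((i, int(j)))
    return edges

def in_out_degrees(edges, n):
    """In-degree and out-degree of each of the n nodes."""
    in_deg = np.zeros(n, dtype=int)
    out_deg = np.zeros(n, dtype=int)
    for i, j in edges:
        out_deg[i] += 1
        in_deg[j] += 1
    return in_deg, out_deg

# Two well-separated classes on a line.
X = [[0.0], [1.0], [2.0], [10.0], [11.0]]
y = [0, 0, 0, 1, 1]
edges = build_class_knn_graph(X, y, k=2)
in_deg, out_deg = in_out_degrees(edges, len(X))
```

Because an edge is kept only between nodes with the same label, the resulting graph splits into one independent sub-network per class, which matches the structure described in claim 3.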
CN201910205905.5A 2019-03-18 2019-03-18 Classification method based on physical characteristics and implicit style characteristics of data Active CN109977150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910205905.5A CN109977150B (en) 2019-03-18 2019-03-18 Classification method based on physical characteristics and implicit style characteristics of data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910205905.5A CN109977150B (en) 2019-03-18 2019-03-18 Classification method based on physical characteristics and implicit style characteristics of data

Publications (2)

Publication Number Publication Date
CN109977150A (en) 2019-07-05
CN109977150B (en) 2023-11-10

Family

ID=67079384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910205905.5A Active CN109977150B (en) 2019-03-18 2019-03-18 Classification method based on physical characteristics and implicit style characteristics of data

Country Status (1)

Country Link
CN (1) CN109977150B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101370055A (en) * 2008-09-24 2009-02-18 中国电信股份有限公司 Propelling movement method, platform and system for recommending information of personalized ring back tone
CN103678669B (en) * 2013-12-25 2017-02-08 福州大学 Evaluating system and method for community influence in social network
CN103955451B (en) * 2014-05-15 2017-04-19 北京优捷信达信息科技有限公司 Method for judging emotional tendentiousness of short text
US10824674B2 (en) * 2016-06-03 2020-11-03 International Business Machines Corporation Label propagation in graphs
CN108564479B (en) * 2017-12-20 2022-02-11 重庆邮电大学 System and method for analyzing hot topic propagation trend based on hidden link

Also Published As

Publication number Publication date
CN109977150A (en) 2019-07-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 213164 No.28, Mingxin Middle Road, Wujin District, Changzhou City, Jiangsu Province

Applicant after: Changzhou Polytechnic

Address before: 213164 No.28, Mingxin Middle Road, Wujin District, Changzhou City, Jiangsu Province

Applicant before: Changzhou Institute of Industry Technology

GR01 Patent grant