CN107679584A - Data classification method and system based on integrated resource allocation network RAN - Google Patents


Info

Publication number
CN107679584A
CN107679584A
Authority
CN
China
Legal status (assumed; not a legal conclusion): Pending
Application number
CN201711022308.6A
Other languages
Chinese (zh)
Inventor
张安国
Current Assignee (the listed assignee may be inaccurate)
Ruijie Networks Co Ltd
Original Assignee
Ruijie Networks Co Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Ruijie Networks Co Ltd filed Critical Ruijie Networks Co Ltd
Priority to CN201711022308.6A priority Critical patent/CN107679584A/en
Publication of CN107679584A publication Critical patent/CN107679584A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413: Classification techniques relating to the classification model, based on distances to training or reference patterns
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods


Abstract

An embodiment of the present invention provides a data classification method and system based on an integrated resource allocation network (RAN), to solve the prior-art technical problem that training a RAN demands a large number of training samples. The method includes: processing each training sample in a training sample set based on a preset sample input weight set to obtain a training sample set to be input; training P RAN classifiers to be trained in the RAN based on the training sample set to be input, to obtain P trained RAN classifiers; and classifying the data to be classified based on the P trained RAN classifiers and outputting a classification result.

Description

Data classification method and system based on integrated resource allocation network RAN
Technical Field
The invention relates to the technical field of computers, in particular to a data classification method and system based on an integrated resource allocation network (RAN).
Background
In recent years, artificial intelligence, particularly machine learning, has gained wide attention and application in academia and industry. Machine learning means exactly that: an algorithm dynamically adjusts model structure or parameters using data, so that the model gains the ability to process and analyze such data. In model learning, the quantity and quality of the training data have a great influence on the model, so the acquisition and processing of data become particularly important. However, in many practical projects data collection is time-consuming and labor-intensive, and it is difficult to collect a sufficient amount of data or sufficiently accurate data values. For example, for certain disease data and failure data of aircraft engines, acquiring enough data is very difficult because such events occur rarely. In addition, due to problems with the measuring sensor or the measuring method, the acquired data may contain considerable noise. Against this background, how to obtain a stable machine learning model with high accuracy from a limited amount of noisy data has become a current research hotspot.
A Resource Allocation Network (RAN) is an implementation model for incremental learning: the number of hidden-layer nodes can be dynamically increased according to the complexity of the training samples to achieve higher computing power, so the RAN has attracted wide application and exploration in industry. Meanwhile, ensemble learning can significantly improve the robustness and computational accuracy of a machine learning model on a small-scale training data set. However, existing RAN-based ensemble learning methods have some problems.
In the prior art, each RAN is usually trained using only training samples of a single class or pattern, so only a small part of the training samples is used to train each single RAN. As a result, the total demand for training samples is large, and the approach cannot be applied to small sample data sets.
In summary, the prior art suffers from the technical problem that training a RAN requires a large number of training samples.
Disclosure of Invention
The embodiment of the invention provides a data classification method and system based on an integrated resource allocation network (RAN), which are used for solving the prior-art technical problem that the demand for training samples is large when training a RAN.
In a first aspect, an embodiment of the present invention provides a data classification method based on an integrated resource allocation network RAN, including:
processing each training sample in the training sample set based on a preset sample input weight set to obtain a training sample set to be input; the preset sample input weight set comprises P M-dimensional sample input weight vectors, each training sample in the training sample set comprises an M-dimensional sample feature vector, the M-dimensional sample feature vector correspondingly indicates M sample characteristics of a corresponding training sample, one to-be-input training sample in the to-be-input training sample set is determined by one M-dimensional sample input weight vector and one training sample, each to-be-input training sample comprises M sample components, and P and M are each integers greater than or equal to 1;
training P RAN classifiers to be trained in the RAN based on the training sample set to be input to obtain P trained RAN classifiers;
classifying the data to be classified based on the trained P RAN classifiers, and outputting a classification result.
In a possible implementation manner, the processing each training sample in the training sample set based on the preset sample input weight set to obtain a training sample set to be input includes:
processing each training sample in the training sample set based on a preset sample input weight set to obtain a plurality of training samples to be input; when one training sample to be input in the plurality of training samples to be input is obtained, the following operations are executed: determining a kth sample component on the M sample components of the training sample to be input based on a kth sample input weight component on an M-dimensional sample input weight vector in the preset sample input weight set and a kth sample feature component on an M-dimensional sample feature vector of a training sample in the training sample set; wherein k is an integer from 1 to M in sequence; determining a training sample composed of the M sample components as the training sample to be input;
and determining a set formed by the plurality of training samples to be input as the set of training samples to be input.
In a possible implementation manner, the training P RAN classifiers to be trained in the RAN based on the training sample set to be input to obtain P trained RAN classifiers includes:
training any RAN classifier to be trained in P RAN classifiers to be trained based on at least one training sample to be input in the training sample set to be input, and obtaining a trained RAN classifier;
determining a plurality of the trained RAN classifiers as P trained RAN classifiers.
In a possible implementation manner, the classifying data to be classified based on the trained P RAN classifiers and outputting a classification result includes:
classifying the data to be classified by adopting each RAN classifier in the trained P RAN classifiers to obtain P output results; wherein each of the P output results is used to indicate a probability of a sample characteristic that is emphasized when the data to be classified is classified by a corresponding RAN classifier;
outputting the classification result based on the P output results.
In one possible implementation manner, the outputting the classification result based on the P output results includes:
respectively counting the probability of the same output result in the P output results;
determining the output result with the highest probability of the same output result as the classification result;
and outputting the classification result.
In a second aspect, an embodiment of the present invention provides a data classification system, including:
the processing module is used for processing each training sample in the training sample set based on a preset sample input weight set to obtain a training sample set to be input; the preset sample input weight set comprises P M-dimensional sample input weight vectors, each training sample in the training sample set comprises an M-dimensional sample feature vector, the M-dimensional sample feature vector correspondingly indicates M sample characteristics of a corresponding training sample, one to-be-input training sample in the to-be-input training sample set is determined by one M-dimensional sample input weight vector and one training sample, each to-be-input training sample comprises M sample components, and P and M are each integers greater than or equal to 1;
the training module is used for training P RAN classifiers to be trained in the RAN based on the training sample set to be input to obtain P trained RAN classifiers;
and the classification module is used for classifying the data to be classified based on the trained P RAN classifiers and outputting a classification result.
In one possible implementation, the processing module is configured to:
processing each training sample in the training sample set based on a preset sample input weight set to obtain a plurality of training samples to be input; when one training sample to be input in the plurality of training samples to be input is obtained, the following operations are executed: determining a kth sample component on the M sample components of the training sample to be input based on a kth sample input weight component on an M-dimensional sample input weight vector in the preset sample input weight set and a kth sample feature component on an M-dimensional sample feature vector of a training sample in the training sample set; wherein k is an integer from 1 to M in sequence; determining a training sample composed of the M sample components as the training sample to be input;
and determining a set formed by the plurality of training samples to be input as the set of training samples to be input.
In one possible implementation, the training module is configured to:
training any RAN classifier to be trained in P RAN classifiers to be trained based on at least one training sample to be input in the training sample set to be input, and obtaining a trained RAN classifier;
determining a plurality of the trained RAN classifiers as P trained RAN classifiers.
In one possible implementation, the classification module is configured to:
classifying the data to be classified by adopting each RAN classifier in the trained P RAN classifiers to obtain P output results; wherein each of the P output results is used to indicate a probability of a sample characteristic that is emphasized when the data to be classified is classified by a corresponding RAN classifier;
outputting the classification result based on the P output results.
In a possible implementation manner, the classification module is specifically configured to:
respectively counting the probability of the same output result in the P output results;
determining the output result with the highest probability of the same output result as the classification result;
and outputting the classification result.
In a third aspect, an embodiment of the present invention provides a computer apparatus, including:
at least one processor, and
a memory communicatively coupled to the at least one processor, and a communication interface;
wherein the memory stores instructions executable by the at least one processor, the at least one processor performing the method of the first aspect with the communication interface by executing the instructions stored by the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, including:
the computer readable storage medium stores computer instructions which, when executed on a computer, cause the computer to perform the method of the first aspect.
The embodiment of the invention provides a data classification method based on an integrated resource allocation network (RAN). Each training sample in a training sample set is processed using a preset sample input weight set to obtain a training sample set to be input; P RAN classifiers to be trained in the RAN are then trained in parallel on the training sample set to be input, yielding P trained RAN classifiers; finally, the P trained RAN classifiers are used to classify the data to be classified, and the classification result is output. Because any two to-be-input training samples in the processed set differ from each other, sufficiently diverse inputs can be obtained even from the training samples of a small data set, so the demand for training samples is low.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a system architecture diagram of a data classification system in an embodiment of the present invention;
FIG. 2 is a model diagram of a RAN classifier in an embodiment of the invention;
fig. 3 is a flowchart illustrating a data classification method based on an integrated resource allocation network RAN according to an embodiment of the present invention;
FIG. 4 is a block diagram of a data classification system according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
First, a system architecture to which the embodiments of the present invention are applied will be described for a person skilled in the art to understand.
Referring to fig. 1, a system architecture diagram of a data classification system to which the overall design of the present invention is applied is shown. The respective structures and symbols in fig. 1 will be briefly described from left to right.
In fig. 1, x may represent a training sample set in the embodiment of the present invention, and may include one or more training samples, where the number of the training samples may be determined according to an actual application, and the embodiment of the present invention is not limited specifically. The training samples may be pixel points and corresponding pixel values, text data, and the like.
The set formed by w1 through wP is the preset sample input weight set in the embodiment of the invention, where each wi (taking w1 as an example), when considered alone, is recorded as a vector, i.e. an M-dimensional sample input weight vector.
The RAN may include P subnets, that is, RAN1 through RANp shown in fig. 1, and each of RAN1 through RANp is referred to as a RAN classifier in the embodiment of the present invention.
Each o(i) among o(1) to o(p) may correspond to an output result set representing the multiple output results of one RAN, and each y(i) among y(1) to y(p) may represent the set obtained by normalizing the output results in the corresponding output result set, where p is an integer greater than or equal to 1, and i is an integer from 1 to p. Voting (VOTE) means that the final output results y(1) to y(p) can be voted on to obtain the final classification result, i.e. the classification result in FIG. 1.
For convenience of understanding, a model of a RAN classifier in the embodiment of the present invention is briefly introduced below, and please refer to fig. 2, which is a schematic diagram of a model of a RAN classifier in the embodiment of the present invention.
As can be seen from fig. 2, each RAN classifier can be regarded as a RAN neural network having a three-layer structure. Assuming the input-layer and output-layer dimensions are M and L respectively, the number N of hidden-layer nodes, i.e. c1 to cN, may be increased as novel input training samples occur during training; b(1), b(2), ..., b(L) represent the respective components of the output-layer offset.
For example, the input set of T training samples is X = {x_1, x_2, ..., x_T}, where each training sample x_i is an M-dimensional sample feature vector corresponding to M sample characteristics of the respective training sample, and may be mathematically expressed as:
x_i = [x_i^(1), x_i^(2), ..., x_i^(M)]′
where [·]′ denotes the transpose operation, so that x_i is a column vector of M rows by 1 column, and each component may be a real number (the same applies below).
Then, the center vector of the jth hidden-layer node can be expressed as:
c_j = [c_j^(1), c_j^(2), ..., c_j^(M)]′
The center width vector of the jth hidden-layer node can be expressed as:
σ_j = [σ_j^(1), σ_j^(2), ..., σ_j^(M)]′
The offset of the output layer can be expressed as:
b = [b^(1), b^(2), ..., b^(L)]′
Thus, the network output corresponding to the input x_i is:
o(x_i) = [o^(1)(x_i), o^(2)(x_i), ..., o^(L)(x_i)]′
In the calculation process, the state of the jth hidden-layer node can be represented as:
φ_j(x_i) = exp( − Σ_{k=1}^{M} (x_i^(k) − c_j^(k))² / (σ_j^(k))² )
where ‖·‖ denotes a modulo (norm) operation, x_i^(k) represents the kth component of the training sample, c_j^(k) the kth component of the jth node's center vector, and σ_j^(k) the kth component of the jth node's center width vector.
The output of the lth output-layer node may be expressed as:
o^(l)(x_i) = Σ_{j=1}^{N} w_j^(l) φ_j(x_i) + b^(l)
For the multi-classification problem, classification is performed using the softmax algorithm:
y^(l)(x_i) = exp(o^(l)(x_i)) / Σ_{m=1}^{L} exp(o^(m)(x_i))
and the final classification output is the label l for which y^(l)(x_i) is largest.
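The forward computation above (Gaussian hidden-node states, a linear output layer, then softmax) can be sketched as follows. This is an illustrative reconstruction under the per-component width convention described above, not the patented implementation; the function and parameter names (`ran_forward`, `centers`, `widths`) are assumptions.

```python
import numpy as np

def ran_forward(x, centers, widths, w_out, b):
    """Forward pass of a single trained RAN classifier (sketch).

    x       : (M,)   input feature vector x_i
    centers : (N, M) hidden-node center vectors c_j
    widths  : (N, M) hidden-node center width vectors sigma_j
    w_out   : (L, N) hidden-to-output weights w_j^(l)
    b       : (L,)   output-layer offsets b^(l)
    Returns the L softmax class probabilities y^(l)(x_i).
    """
    # Hidden-node states: phi_j = exp(-sum_k (x^(k) - c_j^(k))^2 / (sigma_j^(k))^2)
    phi = np.exp(-np.sum(((x - centers) / widths) ** 2, axis=1))
    # Linear output layer: o^(l) = sum_j w_j^(l) * phi_j + b^(l)
    o = w_out @ phi + b
    # softmax over the L output nodes (shifted for numerical stability)
    e = np.exp(o - o.max())
    return e / e.sum()
```

The final classification output would then be `np.argmax(ran_forward(...))`.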
preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Example one
Referring to fig. 3, an embodiment of the present invention provides a data classification method based on an integrated resource allocation network RAN, which may be applied to the data classification system shown in fig. 1. An implementation process of the method may be described as follows:
s101: processing each training sample in the training sample set based on a preset sample input weight set to obtain a training sample set to be input; the preset sample input weight set comprises P M-dimensional sample input weight vectors, each training sample in the training sample set comprises an M-dimensional sample feature vector, the M-dimensional sample feature vectors correspondingly indicate M sample characteristics of corresponding training samples, one to-be-input training sample in the to-be-input training sample set is determined by one M-dimensional sample input weight vector and one training sample, each to-be-input training sample comprises M sample components, and P and M are each integers greater than or equal to 1;
s102: training P RAN classifiers to be trained in the RAN based on a training sample set to be input to obtain P trained RAN classifiers;
s103: classifying the data to be classified based on the trained P RAN classifiers, and outputting a classification result.
In the embodiment of the present invention, in order to obtain different RAN classifiers focusing on classifying different sample characteristics of data to be classified, a preset sample input weight set may be set, which is used to process each training sample in a training sample set to obtain a training sample set to be input, and then the obtained training sample set to be input may be used to train a RAN classifier in a RAN network to obtain a plurality of different RAN classifiers.
In S101, the preset sample input weight set may include w1 through wP shown in fig. 1, that is, P M-dimensional sample input weight vectors, each of which includes M weight components whose dimensions correspond to those of the sample feature vector of each training sample.
The preset sample input weight set may be obtained by the data classification system through random initialization according to relevant parameters of the training samples, such as their number and richness, and may remain fixed during the subsequent training of the RAN classifiers.
Alternatively, the preset sample input weight set may be set by an engineer in a self-defined manner. The number of sample input weight vectors included in the preset sample input weight set corresponds to the number of RANs to be trained included in the RAN, that is, one sample input weight vector corresponds to one RAN to be trained.
The training sample set may include one or more training samples, and the number of training samples may be determined according to the actual application. For example, initially the number of available training samples may be small, so the training sample set may be small; as training data accumulates, the number of training samples in the set grows accordingly.
A training sample can be regarded as an M-dimensional sample feature vector, such as the training sample x_i described above. For example, assuming a training sample is a 10 × 10 picture, i.e. 100 pixel points, the training sample can be regarded as a 100-dimensional sample feature vector; since the pixel values corresponding to the pixel points may differ, the 100-dimensional sample feature vector may correspond to 100 sample characteristics indicative of the training sample.
For example, 100 pixel points and the pixel value corresponding to each pixel point may form a picture containing the letter A; alternatively, they may form a picture containing the letter B, and so on. The 100 sample characteristics of the picture containing the letter A and those of the picture containing the letter B may be partially identical or completely different.
In a possible implementation manner, each training sample in the training sample set may be processed according to a preset sample input weight set, so as to obtain a training sample set to be input.
Taking the example of obtaining one to-be-input training sample in the to-be-input training sample set as follows:
a training sample to be input can be determined according to an M-dimensional sample input weight vector in a preset sample input weight set and a training sample in a training sample set.
Because each RAN classifier to be trained corresponds to one M-dimensional sample input weight vector, each training sample in the training sample set can be processed before the training sample to be input is input into the RAN classifier.
Let the ith training sample to be processed in the training sample set be x_i, including 100 pixel points, with the mathematical expression:
x_i = [x_i^(1), x_i^(2), ..., x_i^(100)]′
where x_i^(1) through x_i^(100) respectively correspond to the 1st through 100th pixel points, each pixel point corresponding to a pixel value; x_i is a column vector of 100 rows by 1 column, and each component may be a real number.
Suppose the RAN classifier to be trained is RAN1 in fig. 1, and the corresponding M-dimensional sample input weight vector is w_1, with the mathematical expression:
w_1 = [w_1^(1), w_1^(2), ..., w_1^(100)]′
Therefore, the weight vector w_1 can be used to process the training sample x_i to obtain a training sample to be input, which can be mathematically expressed as the component-wise product:
[w_1^(1) x_i^(1), w_1^(2) x_i^(2), ..., w_1^(100) x_i^(100)]′
In a possible implementation manner, as can be seen from the above expression, determining a training sample to be input may be performed by determining the kth of the M sample components of the training sample to be input from the kth sample input weight component w_1^(k) of the M-dimensional sample input weight vector and the kth sample feature component x_i^(k) of the training sample's M-dimensional sample feature vector, where k is an integer from 1 to M in sequence; for example, the kth sample component may be represented as:
w_1^(k) · x_i^(k)
then, the M sample components determined in the above manner may constitute one training sample to be input.
After the training sample set to be input is obtained by the method, S102 may be entered, that is, P RAN classifiers to be trained in the RAN may be trained according to the training sample set to be input, so as to obtain P trained RAN classifiers.
After the training sample set is processed as above, the versions of any to-be-input sample delivered to the P RAN classifiers to be trained differ from one another. Training of any two of the P RAN classifiers to be trained is therefore carried out independently, and each trained RAN classifier can independently be used to classify data to be classified.
Moreover, because the inputs to the P RAN classifiers to be trained differ, each trained RAN classifier emphasizes at least one sample characteristic of the data to be classified when classifying it.
For example, any one of the P trained RAN classifiers focuses on classifying certain sample characteristics of the data to be classified. Taking RAN1-RANp in fig. 1 as an example, if P is 26, RAN1-RANp may respectively focus on classifying the 26 letters A-Z. If instead P is 10, then among RAN1-RAN10, RAN1 may focus on classifying the letters A, B and C, and the corresponding emphasis weights for A, B and C may be the same or different; RAN2 may focus on classifying the letters B and D, with emphasis weights for B and D that may likewise be the same or different; RAN3-RAN10 are similar and are not described again in the embodiment of the present invention.
In a possible implementation manner, when any RAN classifier to be trained in the P RAN classifiers to be trained is trained, at least one training sample to be input in the training sample set to be input may be adopted to obtain a corresponding trained RAN classifier.
After the training sample set is processed as above, any two to-be-input training samples in the resulting set may differ. When the to-be-input training sample set is used to train any one of the P RAN classifiers to be trained, one or more to-be-input training samples from the set may be used, so the resulting P trained RAN classifiers differ from one another, which improves the classification accuracy of the RAN on the data to be classified.
In the prior art, a RAN ensemble learning method based on Long-Term Memory (LTM) is generally adopted, using the AdaBoost.M1 algorithm to calculate the input sample weights of the next RAN network from the output error of the current RAN network. The calculation accuracy of this method is superior to that of a single-classifier model. However, because the method needs the output error of the current RAN to train the next RAN classifier, only a single RAN classifier can be trained at any one time; the whole RAN ensemble cannot be trained in parallel, which consumes a lot of training time.
Meanwhile, a RAN network ensemble method based on the Bagging technique has also been proposed in the literature. To ensure that the ensembled RAN networks differ, each RAN network is trained with sample data of only N-1 classes, where N is the number of classes of the total sample labels. For a given RAN subnetwork in the ensemble, if the distance between an input sample and the nearest node in the network exceeds a certain threshold, the input sample's label does not belong to the label space on which that network was trained. The choice of this threshold is therefore critical for each RAN subnetwork, and it depends strongly on the N-1 classes of training samples used and on the feature-value distribution of the remaining class. Selecting the threshold can thus be difficult and tedious, or even infeasible, and the training efficiency is low.
In the embodiment of the present invention, each RAN classifier may be trained in parallel; that is, the RAN1-RANp classifiers in fig. 1 may be trained simultaneously, so the whole training process completes in a short time, thereby improving the training efficiency.
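The parallel training scheme can be sketched as follows. This is a minimal illustration, not the patent's implementation: `train_one_ran` is a hypothetical placeholder (the actual RAN training procedure, such as hidden-node allocation and parameter updates, is not specified here), and the thread pool merely shows that the P classifiers can be trained independently at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for training one RAN classifier; the real RAN
# training procedure (hidden-node allocation, parameter fitting) is not
# shown, this function only records that the classifier was trained.
def train_one_ran(task):
    classifier_id, weighted_samples = task
    # ... allocate hidden nodes and fit parameters from weighted_samples ...
    return (classifier_id, "trained", len(weighted_samples))

def train_all_parallel(per_classifier_samples):
    """Train the P RAN classifiers simultaneously, each on its own
    set of to-be-input training samples."""
    with ThreadPoolExecutor() as pool:
        # Executor.map preserves input order in its results.
        return list(pool.map(train_one_ran, enumerate(per_classifier_samples)))

# P = 3 classifiers, each with its own to-be-input sample set.
results = train_all_parallel([[0.5, 2.0], [1.0], [0.1, 0.2, 0.3]])
print(results)  # -> [(0, 'trained', 2), (1, 'trained', 1), (2, 'trained', 3)]
```

Because the classifiers share no state during training, no coordination between the workers is needed, which is what distinguishes this scheme from the sequential AdaBoost.M1-based approach described above.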
In a possible implementation manner, classifying the data to be classified based on the trained P RAN classifiers and outputting the classification result may be performed by, but not limited to, the following manners:
because each of the trained P RAN classifiers can independently classify the data to be classified, all of the trained P RAN classifiers can be applied simultaneously to the data to be classified, yielding P output results; wherein each of the P output results is used to indicate a probability for the sample characteristic that is emphasized when the data to be classified is classified by the corresponding RAN classifier.
For example, suppose the data to be classified is a picture containing the letter A, and the trained P RAN classifiers can identify 26 letters. The accuracy of RAN1 in identifying the letter A is 80%, while its accuracy on the remaining 25 letters is below 80%; the accuracy of RAN2 in identifying the letter B is 80%, while its accuracy on the remaining 25 letters is below 80%; the accuracy of RAN3 in identifying the letter A is 60%, while its accuracy on the remaining 25 letters is below 60%; and so on.
Then, the trained P RAN classifiers are used to classify the picture containing the letter a, so as to obtain P output results, where each output result of the P output results is the final output result of the corresponding RAN classifier.
As can be seen from fig. 2, each RAN classifier can be regarded as a RAN neural network with a three-layer structure, wherein the three-layer structure is an input layer, a hidden layer and an output layer, and respectively comprises a plurality of input nodes, a plurality of hidden layer nodes and a plurality of output nodes.
When a to-be-input training sample x' = (x'_1, ..., x'_M) is used to train the RAN classifier, the input value at the kth node of the classifier's input layer may correspond to x'_k = w_k * x_k, where w_k denotes the kth component of the corresponding sample input weight vector and x_k denotes the kth component of the original training sample.
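The per-component weighting described above can be sketched as follows. The function name and the concrete numbers are illustrative assumptions; the sketch only shows the componentwise product x'_k = w_k * x_k that turns one M-dimensional training sample into P distinct to-be-input samples.

```python
def make_weighted_samples(sample, weight_vectors):
    """For one M-dimensional training sample, produce one to-be-input
    sample per weight vector: x'_k = w_k * x_k (componentwise product)."""
    return [[w_k * x_k for w_k, x_k in zip(w, sample)] for w in weight_vectors]

# Example: M = 3 sample features, P = 2 preset sample input weight vectors.
sample = [1.0, 2.0, 3.0]
weights = [[0.5, 1.0, 1.5],
           [1.0, 0.0, 2.0]]
print(make_weighted_samples(sample, weights))
# -> [[0.5, 2.0, 4.5], [1.0, 0.0, 6.0]]
```

Each row of the result is one to-be-input training sample; because the P weight vectors differ, the P samples derived from the same original sample differ, which is what gives the trained classifiers their diversity.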
Thus, the multiple output nodes of each RAN classifier may correspond to multiple output results, and the final output result of each RAN classifier may be obtained by voting over those output results.
For example, the final output result corresponding to RAN1 is a, the final output result corresponding to RAN2 is B, the final output result corresponding to RAN3 is a, and so on.
And the classification result can be output according to the P output results.
In one possible implementation, outputting the classification result based on the P output results may be performed by, but is not limited to, the following: count the probability of each identical output result among the P output results, determine the output result with the highest probability as the classification result, and output that classification result.
For example, the RAN includes 100 trained RAN classifiers, and when the RAN is used to classify the pictures to be classified, each RAN classifier of the 100 trained RAN classifiers is substantially used to classify the pictures to be classified independently, and accordingly, 100 output results can be obtained.
Assuming that 20 of the 100 output results indicate that the picture to be classified contains the letter A, the output result "the picture to be classified contains the letter A" has probability 20%; if 30 of them indicate the letter B, the output result "the picture to be classified contains the letter B" has probability 30%; and if 50 of them indicate the letter C, the output result "the picture to be classified contains the letter C" has probability 50%. The final classification result is therefore: the picture to be classified is a picture containing the letter C.
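The vote-counting step above can be sketched as follows; the function name is an illustrative assumption. It counts how often each output result appears among the P per-classifier outputs and returns the majority result together with its share of the votes.

```python
from collections import Counter

def vote(outputs):
    """Majority vote over the P per-classifier output results: return the
    most frequent result together with its fraction of the votes."""
    counts = Counter(outputs)
    label, n = counts.most_common(1)[0]
    return label, n / len(outputs)

# 100 classifier outputs: 20 vote A, 30 vote B, 50 vote C.
outputs = ["A"] * 20 + ["B"] * 30 + ["C"] * 50
print(vote(outputs))  # -> ('C', 0.5)
```

With 20/30/50 votes for A/B/C, the result is C with probability 50%, matching the example in the text.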
In summary, one or more technical solutions of the embodiments of the present invention have the following technical effects or advantages:
Firstly, each training sample in the training sample set is processed with a preset sample input weight set to obtain a to-be-input training sample set; the P to-be-trained RAN classifiers in the RAN are then trained in parallel on the to-be-input training sample set to obtain P trained RAN classifiers, each of which can independently classify the data to be classified; the trained P RAN classifiers are then used to classify the data to be classified and to output the classification result. In the embodiment of the invention, because any two samples in the processed to-be-input training sample set differ from each other, a small training set suffices to produce diverse to-be-input samples, so the demand for training samples is low.
Secondly, in the embodiment of the invention, each training sample in the training sample set can be processed according to the preset sample input weight set so that any two to-be-input training samples in the to-be-input training sample set differ. After the to-be-input training sample set is used to train the P to-be-trained RAN classifiers, the resulting P RAN classifiers also differ: each trained RAN classifier can independently classify the data to be classified, and each emphasizes different sample characteristics of that data, thereby improving the classification precision of the data processing system.
Thirdly, in the embodiment of the present invention, a small number of training samples may be used to train the RAN classifiers initially, and each RAN classifier starts to work once trained. As training samples accumulate, the scale and parameters of each RAN classifier can be continuously adjusted through online learning, so that every sub-network achieves good computing performance and the whole RAN ensemble system attains higher classification accuracy.
Example two
Referring to fig. 4, based on the same inventive concept, an embodiment of the present invention provides a data classification system, which includes a processing module 41, a training module 42, and a classification module 43.
The processing module 41 is configured to process each training sample in the training sample set based on a preset sample input weight set, and obtain a training sample set to be input; the preset sample input weight set comprises P M-dimensional sample input weight vectors, each training sample in the training sample set comprises an M-dimensional sample feature vector, the M-dimensional sample feature vector correspondingly indicates M sample characteristics of a corresponding training sample, one to-be-input training sample in the to-be-input training sample set is determined by one M-dimensional sample input weight vector and one training sample, each to-be-input training sample comprises M sample components, and P, M is an integer greater than or equal to 1;
a training module 42, configured to train P RAN classifiers to be trained in the RAN based on the training sample set to be input, to obtain P trained RAN classifiers;
a classification module 43, configured to classify the data to be classified based on the trained P RAN classifiers, and output a classification result.
In a possible implementation manner, the processing module 41 is configured to:
processing each training sample in the training sample set based on a preset sample input weight set to obtain a plurality of training samples to be input; when one training sample to be input in the plurality of training samples to be input is obtained, the following operations are executed: determining a kth sample component on the M sample components of the training sample to be input based on a kth sample input weight component on an M-dimensional sample input weight vector in the preset sample input weight set and a kth sample feature component on an M-dimensional sample feature vector of a training sample in the training sample set; wherein k is an integer from 1 to M in sequence; determining a training sample composed of the M sample components as the training sample to be input;
and determining a set formed by the plurality of training samples to be input as the set of training samples to be input.
In one possible implementation, the training module 42 is configured to:
training any RAN classifier to be trained in P RAN classifiers to be trained based on at least one training sample to be input in the training sample set to be input, and obtaining a trained RAN classifier;
determining a plurality of the trained RAN classifiers as P trained RAN classifiers.
In one possible implementation, the classification module 43 is configured to:
classifying the data to be classified by adopting each RAN classifier in the trained P RAN classifiers to obtain P output results; wherein each of the P output results is used to indicate a probability of a sample characteristic that is emphasized when the data to be classified is classified by a corresponding RAN classifier;
outputting the classification result based on the P output results.
In a possible implementation manner, the classification module 43 is specifically configured to:
respectively counting the probability of the same output result in the P output results;
determining the output result with the highest probability of the same output result as the classification result;
and outputting the classification result.
EXAMPLE III
Referring to fig. 5, based on the same inventive concept, an embodiment of the present invention provides a computer apparatus, which includes at least one processor 51, and a memory 52 and a communication interface 53 communicatively connected to the at least one processor 51, where fig. 5 illustrates one processor 51 as an example.
Wherein the memory 52 stores instructions executable by the at least one processor 51, and the at least one processor 51 performs the method according to the first embodiment by executing the instructions stored in the memory 52 through the communication interface 53.
Example four
Based on the same inventive concept, the embodiments of the present invention provide a computer-readable storage medium storing computer instructions, which, when executed on a computer, cause the computer to perform the method according to the first embodiment.
In particular implementations, the computer-readable storage medium includes various storage media capable of storing program code, such as a Universal Serial Bus (USB) flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-described embodiments of the apparatus are merely illustrative, wherein units/modules illustrated as separate components may or may not be physically separate, and components shown as units/modules may or may not be physical units/modules, may be located in one place, or may be distributed over a plurality of network units/modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods of the embodiments or of some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A data classification method based on an integrated resource allocation network (RAN), characterized by comprising the following steps:
processing each training sample in the training sample set based on a preset sample input weight set to obtain a training sample set to be input; the preset sample input weight set comprises P M-dimensional sample input weight vectors, each training sample in the training sample set comprises an M-dimensional sample feature vector, the M-dimensional sample feature vector correspondingly indicates M sample characteristics of a corresponding training sample, one to-be-input training sample in the to-be-input training sample set is determined by one M-dimensional sample input weight vector and one training sample, each to-be-input training sample comprises M sample components, and P, M is an integer greater than or equal to 1;
training P RAN classifiers to be trained in the RAN based on the training sample set to be input to obtain P trained RAN classifiers;
classifying the data to be classified based on the trained P RAN classifiers, and outputting a classification result.
2. The method of claim 1, wherein the processing each training sample in the set of training samples based on the preset sample input weight set to obtain a set of training samples to be input comprises:
processing each training sample in the training sample set based on a preset sample input weight set to obtain a plurality of training samples to be input; when one training sample to be input in the plurality of training samples to be input is obtained, the following operations are executed: determining a kth sample component on the M sample components of the training sample to be input based on a kth sample input weight component on an M-dimensional sample input weight vector in the preset sample input weight set and a kth sample feature component on an M-dimensional sample feature vector of a training sample in the training sample set; wherein k is an integer from 1 to M in sequence; determining a training sample composed of the M sample components as the training sample to be input;
and determining a set formed by the plurality of training samples to be input as the set of training samples to be input.
3. The method of claim 1 or 2, wherein the training P RAN classifiers to be trained in the RAN based on the set of training samples to be input to obtain P trained RAN classifiers, comprises:
training any RAN classifier to be trained in P RAN classifiers to be trained based on at least one training sample to be input in the training sample set to be input, and obtaining a trained RAN classifier;
determining a plurality of the trained RAN classifiers as P trained RAN classifiers.
4. The method of claim 3, wherein the classifying the data to be classified based on the trained P RAN classifiers and outputting the classification result comprises:
classifying the data to be classified by adopting each RAN classifier in the trained P RAN classifiers to obtain P output results; wherein each of the P output results is used to indicate a probability of a sample characteristic that is emphasized when the data to be classified is classified by a corresponding RAN classifier;
outputting the classification result based on the P output results.
5. The method of claim 4, wherein said outputting the classification result based on the P output results comprises:
respectively counting the probability of the same output result in the P output results;
determining the output result with the highest probability of the same output result as the classification result;
and outputting the classification result.
6. A data classification system, characterized in that the system comprises:
the processing module is used for processing each training sample in the training sample set based on a preset sample input weight set to obtain a training sample set to be input; the preset sample input weight set comprises P M-dimensional sample input weight vectors, each training sample in the training sample set comprises an M-dimensional sample feature vector, the M-dimensional sample feature vector correspondingly indicates M sample characteristics of a corresponding training sample, one to-be-input training sample in the to-be-input training sample set is determined by one M-dimensional sample input weight vector and one training sample, each to-be-input training sample comprises M sample components, and P, M is an integer greater than or equal to 1;
the training module is used for training P RAN classifiers to be trained in the RAN based on the training sample set to be input to obtain P trained RAN classifiers;
and the classification module is used for classifying the data to be classified based on the trained P RAN classifiers and outputting a classification result.
7. The system of claim 6, wherein the processing module is to:
processing each training sample in the training sample set based on a preset sample input weight set to obtain a plurality of training samples to be input; when one training sample to be input in the plurality of training samples to be input is obtained, the following operations are executed: determining a kth sample component on the M sample components of the training sample to be input based on a kth sample input weight component on an M-dimensional sample input weight vector in the preset sample input weight set and a kth sample feature component on an M-dimensional sample feature vector of a training sample in the training sample set; wherein k is an integer from 1 to M in sequence; determining a training sample composed of the M sample components as the training sample to be input;
and determining a set formed by the plurality of training samples to be input as the set of training samples to be input.
8. The system of claim 6 or 7, wherein the training module is to:
training any RAN classifier to be trained in P RAN classifiers to be trained based on at least one training sample to be input in the training sample set to be input, and obtaining a trained RAN classifier;
determining a plurality of the trained RAN classifiers as P trained RAN classifiers.
9. The system of claim 8, wherein the classification module is to:
classifying the data to be classified by adopting each RAN classifier in the trained P RAN classifiers to obtain P output results; wherein each of the P output results is used to indicate a probability of a sample characteristic that is emphasized when the data to be classified is classified by a corresponding RAN classifier;
outputting the classification result based on the P output results.
10. The system of claim 9, wherein the classification module is specifically configured to:
respectively counting the probability of the same output result in the P output results;
determining the output result with the highest probability of the same output result as the classification result;
and outputting the classification result.
11. A computer device, the computer device comprising:
at least one processor, and
a memory communicatively coupled to the at least one processor, a communication interface;
wherein the memory stores instructions executable by the at least one processor, the at least one processor performing the method of any one of claims 1-5 with the communications interface by executing the instructions stored by the memory.
12. A computer-readable storage medium characterized by:
the computer readable storage medium stores computer instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1-5.
CN201711022308.6A 2017-10-27 2017-10-27 Data classification method and system based on integrated resource allocation network RAN Pending CN107679584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711022308.6A CN107679584A (en) 2017-10-27 2017-10-27 Data classification method and system based on integrated resource allocation network RAN


Publications (1)

Publication Number Publication Date
CN107679584A true CN107679584A (en) 2018-02-09

Family

ID=61142688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711022308.6A Pending CN107679584A (en) 2017-10-27 2017-10-27 Data classification method and system based on integrated resource allocation network RAN

Country Status (1)

Country Link
CN (1) CN107679584A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633360A (en) * 2020-12-18 2021-04-09 中国地质大学(武汉) Classification method based on cerebral cortex learning mode
CN112633360B (en) * 2020-12-18 2024-04-05 中国地质大学(武汉) Classification method based on cerebral cortex learning mode


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180209