WO2020150955A1 - Data classification method and apparatus, and device and storage medium - Google Patents
Data classification method and apparatus, and device and storage medium
- Publication number
- WO2020150955A1 (PCT/CN2019/072932)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- value attribute
- continuous value
- data
- continuous
- attribute
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
Definitions
- the present invention relates to the field of data processing technology, in particular to a data classification method, device, equipment and storage medium.
- the operating data mostly have mixed-value attributes, which include both continuous-value attributes and discrete-value attributes.
- a common classification approach is to convert the discrete-value attributes into continuous form, and then classify the resulting continuous-value attributes.
- the attribute values produced by the one-hot encoding operation are still discrete in the sense of their numerical distribution, so the operation does not fundamentally make the discrete-value attributes continuous.
- the present invention provides a data classification method, device, equipment and storage medium to solve the problem that existing classification methods, which rely on the one-hot encoding operation, do not truly make discrete-value attributes continuous.
- the present invention provides a data classification method, which includes: performing continuous encoding processing on the discrete value attributes to obtain a second continuous value attribute, where the data includes the discrete value attributes and a first continuous value attribute; training a neural network on the second continuous value attribute and taking the data of the F-th hidden layer as a third continuous value attribute, where the neural network includes F hidden layers; merging the first continuous value attribute and the third continuous value attribute to obtain a fourth continuous value attribute; and classifying the fourth continuous value attribute to obtain classified data.
- the discrete value attributes are first continuously encoded, and the neural network is then trained on the second continuous value attribute, thereby thoroughly transforming the discrete value attributes into continuous value attributes that carry order information and take real-number values.
- before taking the data of the F-th hidden layer as the third continuous value attribute, the method also includes: constructing an objective function, where the objective function is the sum of the error value of the third continuous value attribute and the substitution entropy; and using the second continuous value attribute to train the neural network until the value of the objective function reaches its minimum.
- the sum of the error value of the third continuous value attribute and the substitution entropy is used as the objective function for training the neural network on the second continuous value attribute; in addition to minimizing the error between the actual output and the theoretical output, this also minimizes the uncertainty of the converted data set.
- constructing the objective function specifically includes: subtracting the third continuous value attribute from its theoretical value to obtain an error value; dividing the third continuous value attribute into data sets to obtain first sub-data sets, where the first data set includes a plurality of first sub-data sets; obtaining the substitution entropy of each first sub-data set; and superimposing the substitution entropies of the plurality of first sub-data sets to obtain the substitution entropy of the third continuous value attribute.
- the third continuous value attribute is divided into data sets to obtain the first sub-data sets, and the substitution entropy of each first sub-data set is obtained and combined to obtain the substitution entropy of the first data set, which reduces computational complexity.
- obtaining the substitution entropy of the first sub-data set specifically includes:
- the substitution entropy of the first sub-data set is obtained according to the first formula, where Y_q represents the first sub-data set, En[·] represents the entropy, N indicates the number of samples of the data, b_q represents the window width of the kernel density estimation method, and y_q^n and y_q^m respectively represent the n-th and m-th elements of the first sub-data set.
- performing superposition processing on the substitution entropy of a plurality of first sub-data sets to obtain the substitution entropy of the third continuous value attribute specifically includes:
- the substitution entropy of the third continuous value attribute is obtained according to the second formula, U[Y] = Σ_{q=1}^{S_F} En[Y_q], where S_F is the number of nodes in the F-th hidden layer and Y is the third continuous value attribute.
- the data classification device is introduced below; its implementation principle and technical effect are similar to those of the above method and will not be repeated here.
- the present invention provides a data classification device, including: an obtaining module for performing continuous encoding processing on discrete value attributes to obtain a second continuous value attribute, where the data includes the discrete value attributes and a first continuous value attribute; a training module for training a neural network on the second continuous value attribute and taking the data of the F-th hidden layer as a third continuous value attribute, where the neural network includes F hidden layers; the obtaining module is also used to merge the first continuous value attribute and the third continuous value attribute to obtain a fourth continuous value attribute, and to classify the fourth continuous value attribute to obtain the classified data.
- the device further includes: a construction module for constructing an objective function, where the objective function is the sum of the error value of the third continuous value attribute and the substitution entropy; and a training module for training the neural network until the value of the objective function reaches its minimum.
- the construction module specifically includes: a subtraction module for subtracting the third continuous value attribute from its theoretical value to obtain an error value; a division module for dividing the third continuous value attribute into data sets to obtain first sub-data sets, where the first data set includes a plurality of first sub-data sets; an obtaining module for obtaining the substitution entropy of each first sub-data set; and a superposition module for superimposing the substitution entropies of the plurality of first sub-data sets to obtain the substitution entropy of the third continuous value attribute.
- the construction module specifically includes:
- the substitution entropy of the first sub-data set is obtained according to the first formula, where Y_q represents the first sub-data set, En[·] represents the entropy, N indicates the number of samples of the data, b_q represents the window width of the kernel density estimation method, and y_q^n and y_q^m respectively represent the n-th and m-th elements of the first sub-data set.
- the construction module specifically includes:
- the substitution entropy of the third continuous value attribute is obtained according to the second formula, U[Y] = Σ_{q=1}^{S_F} En[Y_q], where S_F is the number of nodes in the F-th hidden layer and Y is the third continuous value attribute.
- the present invention provides an electronic device comprising: at least one processor and a memory; the memory stores computer-executable instructions; the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the data classification method of the first aspect and its optional implementations.
- the present invention provides a computer-readable storage medium that stores computer-executable instructions; when a processor executes the computer-executable instructions, the data classification method of the first aspect and its optional implementations is implemented.
- the present invention provides a data classification method, device, equipment and storage medium.
- a discrete value attribute is continuously encoded to obtain a second continuous value attribute; a neural network is trained on the second continuous value attribute, and the data of the F-th hidden layer is used as the third continuous value attribute, thereby completely transforming discrete value attributes into continuous value attributes that carry order information and take real-number values.
- classification processing is then performed to obtain the classified data, so that the classification accuracy is higher than that of prior-art methods that use only one-hot encoding to classify mixed-value attribute data.
- Fig. 1 is a flowchart of a data classification method according to an exemplary embodiment of the present invention;
- Fig. 2 is a flowchart of a data classification method according to another exemplary embodiment of the present invention;
- Fig. 3 is a schematic diagram showing the structure of a data classification device according to an exemplary embodiment of the present invention.
- Fig. 4 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
- the present invention provides a data classification method, device, equipment and storage medium to solve the problem that existing classification methods, which rely on the one-hot encoding operation, do not truly make discrete-value attributes continuous.
- Fig. 1 is a flowchart of a data classification method according to an exemplary embodiment of the present invention. As shown in Figure 1, the data classification method provided in this embodiment includes:
- S101 Perform continuous encoding processing on the discrete value attributes to obtain a second continuous value attribute. The data includes the discrete value attributes and the first continuous value attributes; continuously encoding the discrete value attributes realizes a preliminary conversion of the discrete value attributes into continuous value attributes.
- one-hot encoding can be used to convert the discrete value attribute into the second continuous value attribute.
- the data set X is divided into continuous value attributes and discrete value attributes; d_c and d_d respectively represent the numbers of continuous value attributes and discrete value attributes contained in the data set X, and N represents the number of samples contained in X; x_i represents the i-th continuous value attribute, z_j represents the j-th discrete value attribute, and v_j represents the number of values the discrete value attribute z_j can take; c_n represents the category of the n-th sample, assuming the data set X contains C categories.
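As a concrete illustration of the one-hot step described above, the following is a minimal Python sketch. The attribute values and shapes are invented for illustration; the patent itself prescribes only that each value of a discrete attribute becomes one indicator column.

```python
import numpy as np

def one_hot_encode(discrete_col):
    """One-hot encode a single discrete-value attribute.

    discrete_col: sequence of N discrete values (any sortable, hashable type).
    Returns an (N, v) float array, where v is the number of distinct values.
    """
    values = sorted(set(discrete_col))            # the v possible values
    index = {v: i for i, v in enumerate(values)}  # value -> column position
    out = np.zeros((len(discrete_col), len(values)))
    for n, v in enumerate(discrete_col):
        out[n, index[v]] = 1.0
    return out

# Example: a discrete attribute with three possible values.
z = ["red", "green", "red", "blue"]
encoded = one_hot_encode(z)
print(encoded)
# Each row has exactly one 1; columns follow the sorted value order
# ("blue", "green", "red"), so the first sample "red" maps to column 2.
```

Note that each encoded column still takes only the values 0 and 1, which is exactly the point made above: the result is continuous in type but still discrete in its numerical distribution, motivating the subsequent neural-network encoding.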
- S102 Train the second continuous value attribute using a neural network, and use the data of the F-th hidden layer as the third continuous value attribute. The neural network includes F hidden layers.
- the second continuous value attribute is input into the neural network for training, and the data of the F-th hidden layer is output as the third continuous value attribute.
- an Encoding Neural Network (ENN) is constructed, which takes the one-hot encoded data set shown in Table 3 as input.
- the input of ENN is expressed by formula (2).
- the number of input layer nodes of the ENN equals the dimensionality of the one-hot encoded input;
- the number of output layer nodes of the ENN is determined correspondingly.
- the hidden layer nodes use the Sigmoid function to activate their inputs; the f-th hidden layer contains S_f nodes, where 1 ≤ f ≤ F, and the output of the f-th hidden layer is expressed by formula (5).
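The ENN forward pass described above can be sketched as follows. This is a minimal numpy model, not the patent's implementation: the layer sizes, random weights, and the choice of F = 2 hidden layers are illustrative assumptions; only the sigmoid activation and the use of the F-th hidden layer's activations as the third continuous value attribute come from the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def enn_hidden_output(x, weights, biases):
    """Forward pass through F sigmoid hidden layers; returns the activations
    of the last (F-th) hidden layer, i.e. the data that the method takes as
    the third continuous-value attribute."""
    h = x
    for W, b in zip(weights, biases):
        h = sigmoid(h @ W + b)
    return h

rng = np.random.default_rng(0)
N, d_in = 4, 6            # N one-hot-encoded samples of width 6 (illustrative)
layer_sizes = [6, 5, 3]   # input width, then F = 2 hidden layers
weights = [rng.normal(size=(a, b)) for a, b in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(b) for b in layer_sizes[1:]]

x = rng.integers(0, 2, size=(N, d_in)).astype(float)  # stand-in one-hot input
y3 = enn_hidden_output(x, weights, biases)  # third continuous-value attribute
print(y3.shape)  # (4, 3): real-valued, one column per node in the F-th layer
```

Because every hidden unit passes through the sigmoid, each column of `y3` is a real number in (0, 1), which is the sense in which the encoding is "completely continuous", unlike the 0/1 one-hot columns.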
- Table 4 The third continuous value attribute data set
- S103 Perform merging processing on the first continuous value attribute and the third continuous value attribute to obtain a fourth continuous value attribute.
- the first continuous value attribute and the third continuous value attribute are combined to obtain a fourth continuous value attribute, where the fourth continuous value attribute includes the first continuous value attribute and the third continuous value attribute.
- the third continuous value attribute is expressed as:
- S104 Perform classification processing on the fourth continuous value attribute to obtain classified data.
- any classification method for continuous value attribute data can be used, such as support vector machines and neural networks, to process the real-valued attribute data set.
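The steps S103 and S104 above can be sketched together as follows. The patent names support vector machines and neural networks as example classifiers; to keep this sketch self-contained it substitutes a trivial nearest-centroid classifier instead, and the toy data and labels are invented.

```python
import numpy as np

def merge_attributes(x_cont, y3):
    """S103: concatenate the first (original continuous) and third (encoded)
    continuous-value attributes into the fourth continuous-value attribute."""
    return np.hstack([x_cont, y3])

def nearest_centroid_fit(x, labels):
    """Compute one centroid per class (a stand-in for an SVM or NN)."""
    classes = sorted(set(labels))
    centroids = np.array([x[np.array(labels) == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(x, classes, centroids):
    """S104: assign each sample to the class of the nearest centroid."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return [classes[i] for i in d.argmin(axis=1)]

# Toy data: 2 original continuous attributes + 1 encoded attribute per sample.
x_cont = np.array([[0.1, 0.2], [0.0, 0.1], [0.9, 1.0], [1.0, 0.8]])
y3 = np.array([[0.1], [0.2], [0.9], [0.8]])
x4 = merge_attributes(x_cont, y3)        # fourth continuous-value attribute
labels = [0, 0, 1, 1]

classes, centroids = nearest_centroid_fit(x4, labels)
pred = nearest_centroid_predict(x4, classes, centroids)
print(pred)  # prints [0, 0, 1, 1] on the training points themselves
```

The design point is that the classifier sees only one homogeneous real-valued matrix `x4`, so any continuous-attribute classifier can be dropped in at S104 without special handling of the originally discrete columns.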
- the discrete value attribute is continuously encoded to obtain the second continuous value attribute; the neural network is trained on the second continuous value attribute, and the data of the F-th hidden layer is used as the third continuous value attribute, thereby completely transforming discrete value attributes into continuous value attributes that carry order information and take real-number values.
- classification processing is then performed to obtain the classified data, so that the classification accuracy is higher than that of prior-art methods that use only one-hot encoding to classify mixed-value attribute data.
- Fig. 2 is a flowchart of a data classification method according to an exemplary embodiment of the present invention. As shown in Figure 2, the data classification method provided in this embodiment includes:
- S201 Perform continuous encoding processing on the discrete value attribute to obtain a second continuous value attribute.
- S202 Construct an objective function, and use the second continuous value attribute to train the neural network until the value of the objective function is the minimum value.
- the objective function is the sum of the error value of the third continuous value attribute and the substitution entropy.
- E[·] represents the error value of the third continuous value attribute data set output by the ENN.
- U[·] represents the uncertainty of the F-th hidden layer data.
- the error value can be obtained by subtracting the third continuous value attribute from its theoretical value.
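Under the notation used in this extraction (the patent's typeset formulas themselves are not reproduced here), the training objective of S202 can be written as the following reconstruction, where θ denotes the ENN parameters and E[·] and U[·] are the error and uncertainty terms defined above:

```latex
% Objective for training the ENN (a reconstruction; the exact typeset
% formula is not preserved in this extraction):
%   Y    - third continuous-value attribute (F-th hidden layer data)
%   E[Y] - error between Y and its theoretical value
%   U[Y] - substitution entropy (uncertainty) of Y
J(\theta) = E[Y] + U[Y], \qquad
\theta^{*} = \arg\min_{\theta} J(\theta)
```

Minimizing the sum rather than the error alone is what enforces the second property stated above: low reconstruction error and low uncertainty of the converted data set simultaneously.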
- S301 Perform data set division on the third continuous value attribute to obtain a first sub-data set.
- the first data set includes a plurality of first sub-data sets.
- the third continuous value attribute data set is expressed as:
- the first sub-data set is expressed as:
- the substitution entropy of the first sub-data set is calculated as follows:
- En[Y_q] is the entropy corresponding to the sub-data set Y_q, where the probability density function of the sub-data set is obtained by the kernel density estimation method.
- b_q represents the window width parameter of the kernel density estimation method; b_q is a function of the number of samples.
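The first and second formulas are not reproduced in this extraction. The following sketch implements a standard kernel-density ("substitution") entropy estimator that is consistent with the symbols described above (N samples, window width b_q, pairwise differences of elements) and superimposes the per-node entropies as in S302. The Gaussian kernel and all numeric values are assumptions, not taken from the patent.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def substitution_entropy(y_q, b_q):
    """Kernel-density ("substitution") entropy estimate of one sub-data set.

    y_q : 1-D array, the q-th node's outputs over all N samples.
    b_q : window width of the kernel density estimation method.
    Implements En[Y_q] = -(1/N) * sum_n log((1/(N*b_q)) * sum_m K((y_n - y_m)/b_q)),
    i.e. the sample-average log of the kernel density estimate at each point.
    """
    n = len(y_q)
    diffs = (y_q[:, None] - y_q[None, :]) / b_q       # all pairwise differences
    density = gaussian_kernel(diffs).sum(axis=1) / (n * b_q)
    return -np.log(density).mean()

def total_substitution_entropy(y3, b):
    """Second formula (S302): superimpose (sum) the per-node entropies over
    the S_F nodes of the F-th hidden layer, U[Y] = sum_q En[Y_q]."""
    return sum(substitution_entropy(y3[:, q], b) for q in range(y3.shape[1]))

rng = np.random.default_rng(0)
y3 = rng.uniform(size=(200, 3))      # stand-in third attribute, S_F = 3 nodes
u = total_substitution_entropy(y3, b=0.1)
print(round(float(u), 3))            # a finite scalar uncertainty value
```

Splitting the attribute by node and summing per-node entropies, as S301/S302 describe, avoids estimating one joint multivariate density, which is the computational-complexity reduction the text claims.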
- S302 Perform superposition processing on the substitution entropy of the multiple first sub-data sets to obtain the substitution entropy of the third continuous value attribute.
- the substitution entropy U[Y] of the third continuous value attribute is calculated as follows:
- S204 Perform merging processing on the first continuous value attribute and the third continuous value attribute to obtain a fourth continuous value attribute.
- S205 Perform classification processing on the fourth continuous value attribute to obtain classified data.
- Fig. 3 is a schematic diagram showing the structure of a data classification device according to an exemplary embodiment of the present invention.
- this embodiment provides a data classification device, including: an obtaining module 101 configured to perform continuous encoding processing on discrete value attributes to obtain a second continuous value attribute, where the data includes the discrete value attributes and a first continuous value attribute; a training module 102 used to train a neural network on the second continuous value attribute and take the data of the F-th hidden layer as a third continuous value attribute, where the neural network includes F hidden layers; the obtaining module 101 is also used to merge the first continuous value attribute and the third continuous value attribute to obtain a fourth continuous value attribute, and to classify the fourth continuous value attribute to obtain the classified data.
- the device further includes: a construction module 103 for constructing an objective function, where the objective function is the sum of the error value of the third continuous value attribute and the substitution entropy; and a training module 104 for training the neural network using the second continuous value attribute until the value of the objective function reaches its minimum.
- the construction module 103 specifically includes: a subtraction module for subtracting the third continuous value attribute from its theoretical value to obtain an error value; a division module for dividing the third continuous value attribute into data sets to obtain first sub-data sets, where the first data set includes a plurality of first sub-data sets; an obtaining module for obtaining the substitution entropy of each first sub-data set; and a superposition module for superimposing the substitution entropies of the plurality of first sub-data sets to obtain the substitution entropy of the third continuous value attribute.
- the construction module 103 specifically includes:
- the substitution entropy of the first sub-data set is obtained according to the first formula, where Y_q represents the first sub-data set, En[·] represents the entropy, N indicates the number of samples of the data, b_q represents the window width of the kernel density estimation method, and y_q^n and y_q^m respectively represent the n-th and m-th elements of the first sub-data set.
- the construction module 103 specifically includes:
- the substitution entropy of the third continuous value attribute is obtained according to the second formula, U[Y] = Σ_{q=1}^{S_F} En[Y_q], where S_F is the number of nodes in the F-th hidden layer and Y is the third continuous value attribute.
- the data classification device provided by this application can be used to implement the above data classification method; for its content and effects, refer to the method part, which will not be repeated here.
- Fig. 4 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
- the electronic device 200 of this embodiment includes a processor 201 and a memory 202, where:
- the memory 202 is used to store computer-executable instructions.
- the processor 201 is configured to execute the computer-executable instructions stored in the memory to implement each step performed by the data classification device in the foregoing embodiments. For details, refer to the related description in the foregoing method embodiments.
- the memory 202 may be independent or integrated with the processor 201.
- the electronic device 200 further includes a bus 203 for connecting the memory 202 and the processor 201.
- the embodiment of the present invention also provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the processor executes the computer-executable instructions, the data classification method as described above is implemented.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| PCT/CN2019/072932 (WO2020150955A1) | 2019-01-24 | 2019-01-24 | Data classification method and apparatus, and device and storage medium |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| WO2020150955A1 | 2020-07-30 |
Family
ID=71736027
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| PCT/CN2019/072932 (WO2020150955A1) | Data classification method and apparatus, and device and storage medium | 2019-01-24 | 2019-01-24 |
Country Status (1)

| Country | Link |
| --- | --- |
| WO | WO2020150955A1 |
Citations (4)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| JPH0877132A | 1994-08-31 | 1996-03-22 | Victor Co. of Japan, Ltd. | Learning method for a mutually connected neural network |
| CN105786860A | 2014-12-23 | 2016-07-20 | Huawei Technologies Co., Ltd. | Data processing method and device in data modeling |
| CN108362510A | 2017-11-30 | 2018-08-03 | China Aero-Polytechnology Establishment | Mechanical product fault mode recognition method based on an evidential neural network model |
| CN108628868A | 2017-03-16 | 2018-10-09 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Text classification method and device |
2019-01-24: PCT/CN2019/072932 filed as WO2020150955A1 (WO), active, Application Filing.
Non-Patent Citations (1)

| Title |
| --- |
| SUN, JINGUANG ET AL.: "DBN Classification Algorithm for Numerical Attribute", Computer Engineering and Applications, vol. 50, no. 2, 15 January 2014, pages 112-114, ISSN: 1002-8331 |
Legal Events

| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19911376; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 15.09.2021) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 19911376; Country of ref document: EP; Kind code of ref document: A1 |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.04.2022) |