WO2020150955A1 - Data classification method and apparatus, and device and storage medium - Google Patents


Info

Publication number
WO2020150955A1
Authority
WO
WIPO (PCT)
Prior art keywords
value attribute
continuous value
data
continuous
attribute
Application number
PCT/CN2019/072932
Other languages
English (en)
Chinese (zh)
Inventor
何玉林
Original Assignee
深圳大学 (Shenzhen University)
Application filed by 深圳大学 (Shenzhen University)
Priority to PCT/CN2019/072932
Publication of WO2020150955A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • The present invention relates to the field of data processing technology, and in particular to a data classification method, apparatus, device and storage medium.
  • Operating data mostly carry mixed-value attributes, which comprise both continuous-valued attributes and discrete-valued attributes.
  • A common classification approach is to convert the discrete-valued attributes into continuous form and then classify the resulting continuous-valued attributes.
  • However, attribute values produced by a one-hot encoding operation are still discrete in the sense of their numerical distribution; one-hot encoding does not fundamentally make the discrete-valued attributes continuous.
  • The present invention provides a data classification method, apparatus, device and storage medium to solve the problem that existing classification methods based on one-hot encoding do not truly make discrete-valued attributes continuous.
  • The present invention provides a data classification method, which includes: performing continuous encoding on the discrete-valued attributes to obtain a second continuous-valued attribute, where the data comprises the discrete-valued attributes and a first continuous-valued attribute; training a neural network on the second continuous-valued attribute and taking the data of the Ƒ-th hidden layer as a third continuous-valued attribute, where the neural network comprises Ƒ hidden layers; merging the first continuous-valued attribute and the third continuous-valued attribute to obtain a fourth continuous-valued attribute; and classifying the fourth continuous-valued attribute to obtain the classified data.
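As an illustration only, the four claimed steps can be sketched in NumPy. Everything here is hypothetical: the patent does not reproduce its ENN architecture or symbols, so a tiny one-hidden-layer autoencoder stands in for the Ƒ-layer encoding network, and the data is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixed-value data: one continuous attribute and one
# discrete attribute taking the three values 0, 1, 2.
x_cont = rng.normal(size=(8, 1))        # first continuous-valued attribute
x_disc = rng.integers(0, 3, size=8)     # discrete-valued attribute

# Step 1: continuous encoding via one-hot (second continuous-valued attribute).
onehot = np.eye(3)[x_disc]              # shape (8, 3)

# Step 2: train a small autoencoder on the one-hot data and take the
# hidden-layer activations as the third continuous-valued attribute.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(scale=0.1, size=(3, 2)); b1 = np.zeros(2)
W2 = rng.normal(scale=0.1, size=(2, 3)); b2 = np.zeros(3)
lr = 0.5
for _ in range(2000):                   # plain squared-error gradient descent
    h = sigmoid(onehot @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    d_out = (out - onehot) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(onehot); b2 -= lr * d_out.mean(0)
    W1 -= lr * onehot.T @ d_h / len(onehot); b1 -= lr * d_h.mean(0)

third = sigmoid(onehot @ W1 + b1)       # third continuous-valued attribute

# Step 3: merge the first and third continuous-valued attributes.
fourth = np.hstack([x_cont, third])     # fourth continuous-valued attribute
print(fourth.shape)                     # 1 continuous column + 2 hidden units
```

Step 4 (classification of the fourth attribute) can then use any classifier for continuous data, as the description notes later.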
  • The discrete-valued attributes are first continuously encoded, and the neural network is then trained on the second continuous-valued attribute, thereby thoroughly transforming the discrete-valued attributes into continuous-valued attributes that carry order information and take real values.
  • Before taking the Ƒ-th hidden-layer data as the third continuous-valued attribute, the method further includes: constructing an objective function, where the objective function is the sum of the error value of the third continuous-valued attribute and its substitution entropy; and training the neural network with the second continuous-valued attribute until the value of the objective function reaches its minimum.
  • Using the sum of the error value of the third continuous-valued attribute and the substitution entropy as the objective function for training means that, in addition to minimizing the error between the theoretical output and the actual output, the uncertainty of the converted data set is also minimized.
  • Constructing the objective function specifically includes: subtracting the third continuous-valued attribute from its theoretical value to obtain the error value; dividing the third continuous-valued attribute into data sets to obtain first sub-data sets, where the first data set comprises a plurality of first sub-data sets; obtaining the substitution entropy of each first sub-data set; and superimposing the substitution entropies of the plurality of first sub-data sets to obtain the substitution entropy of the third continuous-valued attribute.
  • Dividing the third continuous-valued attribute into data sets, obtaining the substitution entropy of each first sub-data set and combining them to obtain the substitution entropy of the first data set reduces the computational complexity.
  • Obtaining the substitution entropy of the first sub-data set specifically includes:
  • obtaining the substitution entropy of the first sub-data set according to the first formula, in which En[·] denotes the entropy, b_q denotes the window width of the kernel density estimation method, and the remaining symbols denote the first sub-data set, its number of samples, and its n-th and m-th elements, respectively.
  • Superimposing the substitution entropies of the plurality of first sub-data sets to obtain the substitution entropy of the third continuous-valued attribute specifically includes:
  • obtaining the substitution entropy of the third continuous-valued attribute according to the second formula, in which the symbols denote the number of nodes in the Ƒ-th hidden layer and the third continuous-valued attribute, respectively.
  • The data classification device is introduced below; its implementation principle and technical effect are similar to those of the above method and are not repeated here.
  • The present invention provides a data classification device, including: an obtaining module for performing continuous encoding on discrete-valued attributes to obtain a second continuous-valued attribute, where the data comprises the discrete-valued attributes and a first continuous-valued attribute; a module for training a neural network on the second continuous-valued attribute and taking the Ƒ-th hidden-layer data as the third continuous-valued attribute, where the neural network comprises Ƒ hidden layers; the obtaining module is further used to merge the first continuous-valued attribute and the third continuous-valued attribute to obtain a fourth continuous-valued attribute, and to classify the fourth continuous-valued attribute to obtain the classified data.
  • The device further includes: a construction module for constructing an objective function, where the objective function is the sum of the error value of the third continuous-valued attribute and the substitution entropy; and a training module for training the neural network with the second continuous-valued attribute until the value of the objective function reaches its minimum.
  • The construction module specifically includes: a subtraction module for subtracting the third continuous-valued attribute from its theoretical value to obtain the error value; a division module for dividing the third continuous-valued attribute into data sets to obtain first sub-data sets, where the first data set comprises a plurality of first sub-data sets; an obtaining module for obtaining the substitution entropy of each first sub-data set; and a superposition module for superimposing the substitution entropies of the plurality of first sub-data sets to obtain the substitution entropy of the third continuous-valued attribute.
  • The construction module specifically includes:
  • obtaining the substitution entropy of the first sub-data set according to the first formula, in which En[·] denotes the entropy, b_q denotes the window width of the kernel density estimation method, and the remaining symbols denote the first sub-data set, its number of samples, and its n-th and m-th elements, respectively.
  • The construction module further includes:
  • obtaining the substitution entropy of the third continuous-valued attribute according to the second formula, in which the symbols denote the number of nodes in the Ƒ-th hidden layer and the third continuous-valued attribute, respectively.
  • The present invention provides an electronic device comprising at least one processor and a memory, where the memory stores computer-executable instructions; the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the data classification method of the first aspect and its optional implementations.
  • The present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the data classification method of the first aspect and its optional implementations.
  • the present invention provides a data classification method, device, equipment and storage medium.
  • A discrete-valued attribute is continuously encoded to obtain a second continuous-valued attribute; a neural network is trained on the second continuous-valued attribute, and the Ƒ-th hidden-layer data is taken as the third continuous-valued attribute, thereby completely transforming discrete-valued attributes into real-valued continuous-valued attributes that carry order information.
  • Classification is then performed to obtain the classified data, so that the classification accuracy is higher than that of prior-art methods that classify mixed-value attribute data using one-hot encoding alone.
  • Fig. 1 is a flowchart of a data classification method according to an exemplary embodiment of the present invention
  • Fig. 2 is a flowchart of a data classification method according to an exemplary embodiment of the present invention
  • Fig. 3 is a schematic diagram showing the structure of a data classification device according to an exemplary embodiment of the present invention.
  • Fig. 4 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
  • The present invention provides a data classification method, apparatus, device and storage medium to solve the problem that existing classification methods based on one-hot encoding do not truly make discrete-valued attributes continuous.
  • Fig. 1 is a flowchart of a data classification method according to an exemplary embodiment of the present invention. As shown in Figure 1, the data classification method provided in this embodiment includes:
  • The data includes discrete-valued attributes and first continuous-valued attributes. The discrete-valued attributes are continuously encoded to obtain the second continuous-valued attributes, realizing a preliminary conversion of the discrete-valued attributes into continuous-valued attributes.
  • One-hot encoding can be used to convert the discrete-valued attribute into the second continuous-valued attribute.
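A minimal sketch of this one-hot step, with made-up category values, illustrates both the encoding and the limitation noted above (the resulting columns still take only the values 0 and 1, i.e. they remain discrete in distribution):

```python
import numpy as np

# A hypothetical discrete-valued attribute with values 'red', 'green', 'blue'.
values = np.array(['red', 'green', 'blue', 'green', 'red'])

# Map each value to a category index, then index an identity matrix.
categories, codes = np.unique(values, return_inverse=True)
onehot = np.eye(len(categories))[codes]

print(categories)   # categories are sorted: ['blue' 'green' 'red']
print(onehot[0])    # 'red' encodes as [0. 0. 1.]
```

Note that every entry of `onehot` is exactly 0.0 or 1.0, which is why the description argues a further encoding network is needed.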
  • The data set is divided into continuous-valued attributes and discrete-valued attributes; two quantities respectively represent the numbers of continuous-valued and discrete-valued attributes the data set contains, and another represents its number of samples. For each discrete-valued attribute, a further quantity represents the number of values that attribute can take; the category of the n-th sample is likewise defined, assuming the data set contains a given number of categories.
  • The neural network includes Ƒ hidden layers.
  • The second continuous-valued attribute is input to the neural network for training, and the data of the Ƒ-th hidden layer is output as the third continuous-valued attribute.
  • An Encoding Neural Network (ENN) is constructed, taking the one-hot-encoded data set shown in Table 3 as input.
  • The input of the ENN is expressed by formula (2).
  • The number of input-layer nodes of the ENN is:
  • The number of output-layer nodes of the ENN is:
  • Each hidden-layer node applies the Sigmoid function to its input; the f-th hidden layer contains a given number of nodes, and the f-th hidden layer is expressed by formula (5).
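Formula (5) is not reproduced in this text, but the layer-by-layer Sigmoid propagation just described can be sketched generically. All sizes and weights below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def enn_forward(x, weights, biases):
    """Forward pass through an ENN-style network: every hidden layer
    applies the Sigmoid activation to its input, as described above.
    Returns the activations of every hidden layer, so the F-th (last)
    one can be taken as the third continuous-valued attribute."""
    hidden = []
    h = x
    for W, b in zip(weights, biases):
        h = sigmoid(h @ W + b)
        hidden.append(h)
    return hidden

rng = np.random.default_rng(2)
x = np.eye(4)                                   # toy one-hot input, 4 samples
weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 2))]
biases = [np.zeros(3), np.zeros(2)]
layers = enn_forward(x, weights, biases)
third = layers[-1]                              # F-th hidden layer's data
print(third.shape)
```

Because the Sigmoid maps into the open interval (0, 1), the extracted attribute is genuinely real-valued rather than 0/1-valued.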
  • Table 4 The third continuous value attribute data set
  • S103 Perform merging processing on the first continuous value attribute and the third continuous value attribute to obtain a fourth continuous value attribute.
  • the first continuous value attribute and the third continuous value attribute are combined to obtain a fourth continuous value attribute, where the fourth continuous value attribute includes the first continuous value attribute and the third continuous value attribute.
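The merging step is plain column-wise concatenation; a minimal sketch with made-up attribute values:

```python
import numpy as np

first = np.array([[0.2], [1.5], [-0.7]])                 # first continuous-valued attribute
third = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])   # hypothetical hidden-layer output
fourth = np.hstack([first, third])                       # fourth contains both
print(fourth.shape)
```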
  • the third continuous value attribute is expressed as:
  • S104 Perform classification processing on the fourth continuous value attribute to obtain classified data.
  • Any classification method for continuous-valued attribute data, such as support vector machines or neural networks, can be used to process the real-valued attribute data set.
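To keep the sketch dependency-free, a nearest-centroid rule stands in below for the SVMs or neural networks the text mentions; the point is only that the fourth attribute is now ordinary real-valued feature data, so any such classifier applies. Data and labels are made up:

```python
import numpy as np

# Hypothetical fourth continuous-valued attribute (2 features) with labels.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]])
y = np.array([0, 0, 1, 1])

# Nearest-centroid classifier: one mean vector per class.
centroids = np.array([X[y == c].mean(axis=0) for c in np.unique(y)])

def classify(points):
    # Distance of each point to each centroid; pick the nearest class.
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

print(classify(np.array([[0.0, 0.0], [1.1, 1.1]])))
```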
  • The discrete-valued attribute is continuously encoded to obtain the second continuous-valued attribute; the neural network is trained on the second continuous-valued attribute, and the Ƒ-th hidden-layer data is taken as the third continuous-valued attribute, thereby completely transforming discrete-valued attributes into real-valued continuous-valued attributes that carry order information.
  • Classification is then performed to obtain the classified data, so that the classification accuracy is higher than that of prior-art methods that classify mixed-value attribute data using one-hot encoding alone.
  • Fig. 2 is a flowchart of a data classification method according to an exemplary embodiment of the present invention. As shown in Figure 2, the data classification method provided in this embodiment includes:
  • S201 Perform continuous encoding processing on the discrete value attribute to obtain a second continuous value attribute.
  • S202 Construct an objective function, and use the second continuous value attribute to train the neural network until the value of the objective function is the minimum value.
  • The objective function is the sum of the error value of the third continuous-valued attribute and the substitution entropy.
  • E[·] is the error of the third continuous-valued attribute data set corresponding to the ENN.
  • U[·] is the uncertainty of the Ƒ-th hidden-layer data.
  • The error value can be obtained by subtracting the third continuous-valued attribute from its theoretical value.
  • S301 Perform data set division on the third continuous value attribute to obtain a first sub-data set.
  • the first data set includes a plurality of first sub-data sets.
  • the third continuous value attribute data set is expressed as:
  • the first sub-data set is expressed as:
  • substitution entropy calculation method of the first sub-data set is as follows:
  • That is, the entropy corresponding to the data set, computed from the probability density function of the data set obtained by the kernel density estimation method.
  • b_q represents the window width parameter of the kernel density estimation method.
  • b_q is a function of the number of samples.
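The exact dependence of b_q on the sample count is not reproduced in this text. One common choice with that property is Silverman's rule of thumb, b = 1.06·σ·N^(-1/5), sketched here purely as an illustrative assumption:

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule-of-thumb KDE window width, b = 1.06 * sigma * N**(-1/5).
    Shown only as an example of a width that depends on the sample count N;
    the patent's own expression for b_q is not reproduced in the text."""
    x = np.asarray(x, dtype=float)
    return 1.06 * x.std(ddof=1) * len(x) ** (-1 / 5)

rng = np.random.default_rng(3)
small = silverman_bandwidth(rng.normal(size=50))
large = silverman_bandwidth(rng.normal(size=5000))
print(small > large)  # the window width shrinks as the sample count grows
```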
  • S302 Perform superposition processing on the substitution entropy of the multiple first sub-data sets to obtain the substitution entropy of the third continuous value attribute.
  • substitution entropy U[ ⁇ ] of the third continuous value attribute is calculated as follows:
  • S204 Perform merging processing on the first continuous value attribute and the third continuous value attribute to obtain a fourth continuous value attribute.
  • S205 Perform classification processing on the fourth continuous value attribute to obtain classified data.
  • Fig. 3 is a schematic diagram showing the structure of a data classification device according to an exemplary embodiment of the present invention.
  • This embodiment provides a data classification device, including: an obtaining module 101, configured to perform continuous encoding on discrete-valued attributes to obtain a second continuous-valued attribute, where the data comprises the discrete-valued attributes and a first continuous-valued attribute; a module 102, used to train a neural network on the second continuous-valued attribute and take the Ƒ-th hidden-layer data as the third continuous-valued attribute, where the neural network comprises Ƒ hidden layers; the obtaining module 101 is further used to merge the first continuous-valued attribute and the third continuous-valued attribute to obtain a fourth continuous-valued attribute, and to classify the fourth continuous-valued attribute to obtain the classified data.
  • The device further includes: a construction module 103 for constructing an objective function, where the objective function is the sum of the error value of the third continuous-valued attribute and the substitution entropy; and a training module 104 for training the neural network with the second continuous-valued attribute until the value of the objective function reaches its minimum.
  • The construction module 103 specifically includes: a subtraction module for subtracting the third continuous-valued attribute from its theoretical value to obtain the error value; a division module for dividing the third continuous-valued attribute into data sets to obtain first sub-data sets, where the first data set comprises a plurality of first sub-data sets; an obtaining module for obtaining the substitution entropy of each first sub-data set; and a superposition module for superimposing the substitution entropies of the plurality of first sub-data sets to obtain the substitution entropy of the third continuous-valued attribute.
  • The construction module 103 specifically includes:
  • obtaining the substitution entropy of the first sub-data set according to the first formula, in which En[·] denotes the entropy, b_q denotes the window width of the kernel density estimation method, and the remaining symbols denote the first sub-data set, its number of samples, and its n-th and m-th elements, respectively.
  • The construction module 103 further includes:
  • obtaining the substitution entropy of the third continuous-valued attribute according to the second formula, in which the symbols denote the number of nodes in the Ƒ-th hidden layer and the third continuous-valued attribute, respectively.
  • the data classification device provided by this application can be used to implement the above data classification method, and its content and effects can be referred to the method part, which will not be repeated in this application.
  • Fig. 4 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
  • the electronic device 200 of this embodiment includes a processor 201 and a memory 202, where:
  • The memory 202 is used to store computer-executable instructions.
  • The processor 201 is configured to execute the computer-executable instructions stored in the memory to implement the steps performed by the device in the foregoing embodiments. For details, refer to the related description in the foregoing method embodiments.
  • the memory 202 may be independent or integrated with the processor 201.
  • Optionally, the electronic device 200 further includes a bus 203 connecting the memory 202 and the processor 201.
  • the embodiment of the present invention also provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the processor executes the computer-executable instructions, the data classification method as described above is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data classification method and apparatus, and a device and storage medium. The method comprises the steps of: continuously encoding discrete-valued attributes to obtain a second continuous-valued attribute, the data comprising the discrete-valued attributes and a first continuous-valued attribute; training the second continuous-valued attribute using a neural network and using the data of the Ƒ-th hidden layer as a third continuous-valued attribute, the neural network comprising Ƒ hidden layers; combining the first continuous-valued attribute and the third continuous-valued attribute to obtain a fourth continuous-valued attribute; and classifying the fourth continuous-valued attribute to obtain classified data. In the present invention, the discrete-valued attributes are first continuously encoded, and the second continuous-valued attribute is then trained using a neural network, thereby completely transforming the discrete-valued attributes into a real-valued continuous-valued attribute carrying order information.
PCT/CN2019/072932 2019-01-24 2019-01-24 Procédé et appareil de classification de données, ainsi que dispositif et support de stockage WO2020150955A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/072932 WO2020150955A1 (fr) 2019-01-24 2019-01-24 Procédé et appareil de classification de données, ainsi que dispositif et support de stockage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/072932 WO2020150955A1 (fr) 2019-01-24 2019-01-24 Procédé et appareil de classification de données, ainsi que dispositif et support de stockage

Publications (1)

Publication Number Publication Date
WO2020150955A1 true WO2020150955A1 (fr) 2020-07-30

Family

ID=71736027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/072932 WO2020150955A1 (fr) 2019-01-24 2019-01-24 Procédé et appareil de classification de données, ainsi que dispositif et support de stockage

Country Status (1)

Country Link
WO (1) WO2020150955A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0877132A (ja) * 1994-08-31 1996-03-22 Victor Co Of Japan Ltd 相互結合型ニューラルネットワークの学習方法
CN105786860A (zh) * 2014-12-23 2016-07-20 华为技术有限公司 一种数据建模中的数据处理方法及装置
CN108362510A (zh) * 2017-11-30 2018-08-03 中国航空综合技术研究所 一种基于证据神经网络模型的机械产品故障模式识别方法
CN108628868A (zh) * 2017-03-16 2018-10-09 北京京东尚科信息技术有限公司 文本分类方法和装置


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SUN, JINGUANG ET AL.: "DBN Classification Algorithm for Numerical Attribute", COMPUTER ENGINEERING AND APPLICATIONS, vol. 50, no. 2, 15 January 2014 (2014-01-15), pages 112 - 114, ISSN: 1002-8331 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19911376

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 15.09.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19911376

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.04.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19911376

Country of ref document: EP

Kind code of ref document: A1