CN112215249A - Hierarchical classification method and device - Google Patents


Info

Publication number
CN112215249A
Authority
CN
China
Prior art keywords
level
cluster
clusters
clustering
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910625981.1A
Other languages
Chinese (zh)
Inventor
冯明超
俞晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201910625981.1A
Publication of CN112215249A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24317 Piecewise classification, i.e. whereby each classification requires several discriminant rules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroids

Abstract

The invention discloses a hierarchical classification method and device, and relates to the technical field of computers. One embodiment of the method comprises: clustering all classification objects to obtain a first level containing a plurality of class clusters; for each classification object in the class clusters, determining the spatial distance between the classification object and the other class clusters in the first level, and determining the classification object to be a redundant object when that spatial distance is smaller than a preset distance threshold; placing the redundant object in those other class clusters and updating the first level; and taking each class cluster in the updated first level as a new classification object and repeating the above steps until a preset number of levels is reached, thereby achieving hierarchical classification. In this embodiment, each piece of redundant data is assigned to every class cluster to which it belongs, which improves the accuracy and efficiency of hierarchical classification.

Description

Hierarchical classification method and device
Technical Field
The invention relates to the technical field of computers, in particular to a hierarchical classification method and a hierarchical classification device.
Background
There are a large number of classification problems in the natural and social sciences. The hierarchical classification (HC) problem is a branch of the classification problem in which the classes do not intersect but are organized in a hierarchical structure.
A clustering-based hierarchical analysis method performs a hierarchical decomposition of a given data set until a certain condition is satisfied. Such methods follow either a bottom-up or a top-down scheme. In a bottom-up scheme, for example, each data item initially forms its own group; in each subsequent iteration, groups that are adjacent to each other are merged, until all the data are in a single group or some stopping condition is met.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
1. Redundant data blurs the cluster boundaries, which easily causes classification errors;
2. Because the number of classification levels cannot be determined in advance, under-classification or over-classification can occur, which degrades the classification effect and precision.
Disclosure of Invention
In view of this, embodiments of the present invention provide a hierarchical classification method and apparatus, which can respectively attribute redundant data to class clusters to which the redundant data belong, so as to improve accuracy and classification efficiency of hierarchical classification.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a hierarchical classification method including: clustering all classified objects to obtain a first level containing a plurality of class clusters; for each classified object in the class clusters, determining the spatial distance between the classified object and other class clusters in the first level, and determining the classified object as a redundant object when the spatial distance is smaller than a preset distance threshold; placing the redundant object in other class clusters and updating the first level; and taking each class cluster in the updated first level as a new classification object, and circularly executing the steps until a preset level is reached so as to realize level classification.
Further, the step of clustering all the classification objects includes: mapping the classification objects into a vector space; determining whether the distance between classification objects is smaller than a first spatial distance; if so, clustering the two closest classification objects to obtain an intermediate class cluster; and then clustering the remaining classification objects and the intermediate class cluster according to the distances between them, to obtain a first level containing a plurality of class clusters.
Further, the clustering step includes: determining whether the number of the class clusters in the hierarchy is larger than the preset number of the class clusters; and if so, clustering the clusters in the hierarchy again.
Further, clustering the class clusters in the level again includes: determining the distance between the center points of any two class clusters in the current level, and merging the two closest class clusters into one class cluster.
According to another aspect of the embodiments of the present invention, there is provided a hierarchical classification apparatus including: the clustering module is used for clustering all classified objects to obtain a first level containing a plurality of clusters; the redundant data determining module is used for determining the spatial distance between each classified object in the class clusters and other class clusters in the first level, and when the spatial distance is smaller than a preset distance threshold value, determining the classified objects as redundant objects; the updating module is used for placing the redundant objects in other class clusters and updating the first level; and the circulating module is used for circularly executing the operation steps of the modules by taking each class cluster in the updated first level as a new classification object until the preset level number is reached so as to realize the hierarchical classification.
Further, the clustering module is further configured to: map the classification objects into a vector space; determine whether the distance between classification objects is smaller than a first spatial distance; if so, cluster the two closest classification objects to obtain an intermediate class cluster; and then cluster the remaining classification objects and the intermediate class cluster according to the distances between them, to obtain a first level containing a plurality of class clusters.
Further, the hierarchical classification device further comprises a cluster number module, which is used for determining whether the number of the clusters in the hierarchy is greater than the preset cluster number; if so, the cluster number module is used for clustering the clusters in the hierarchy again.
Further, the cluster number module is further configured to: determine the distance between the center points of any two class clusters in the current level; and merge the two closest class clusters into one class cluster.
According to an aspect of an embodiment of the present invention, there is provided a server including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement any of the hierarchical classification methods described above.
According to an aspect of embodiments of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing any of the above hierarchical classification methods.
One embodiment of the above invention has the following advantages or benefits. All classification objects are clustered to obtain a first level containing a plurality of class clusters; for each classification object in the class clusters, the spatial distance between the classification object and the other class clusters in the first level is determined, and the classification object is determined to be a redundant object when that spatial distance is smaller than a preset distance threshold; the redundant object is placed in those other class clusters and the first level is updated; and each class cluster in the updated first level is taken as a new classification object and the above steps are repeated until a preset number of levels is reached, thereby achieving hierarchical classification. These technical means solve the technical problem that redundant data blurs cluster boundaries and easily causes classification errors, and thereby improve the accuracy and efficiency of hierarchical classification. In addition, presetting the number of levels avoids under-classification and over-classification, which improves classification precision.
Further effects of the above non-conventional optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1a is a schematic diagram of a main flow of a hierarchical classification method according to a first embodiment of the present invention;
FIG. 1b is a schematic diagram of a main flow of a hierarchical classification method according to a second embodiment of the present invention;
FIG. 2a is a schematic diagram of a clustering method in the hierarchical classification method according to an embodiment of the present invention;
FIG. 2b is a schematic diagram of the classification of redundant objects in the single-layer classification method according to the embodiment of the present invention;
FIG. 2c is a schematic diagram of the classification of a redundant object in the multi-level classification method according to the embodiment of the present invention;
FIG. 3 is a schematic diagram of the main modules of a hierarchical classification apparatus provided in accordance with an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1a is a schematic diagram of a main flow of a hierarchical classification method according to a first embodiment of the present invention, and as shown in fig. 1a, a hierarchical classification method provided in an embodiment of the present invention mainly includes:
step S101, clustering all classified objects to obtain a first level containing a plurality of clusters; for each classified object in the class clusters, determining the spatial distance between the classified object and other class clusters in the first level, and determining the classified object as a redundant object when the spatial distance is smaller than a preset distance threshold; placing the redundant object in other class clusters and updating the first level;
step S102, taking each class cluster in the updated first level as a new classification object, and repeating the above steps until a preset number of levels is reached, thereby achieving hierarchical classification.
According to the embodiment of the invention, the step of clustering all the classification objects comprises: mapping the classification objects into a vector space; determining whether the distance between classification objects is smaller than a first spatial distance (that is, a preset distance threshold); if so, clustering the two closest classification objects to obtain an intermediate class cluster; and then clustering the remaining classification objects and the intermediate class cluster according to the distances between them, to obtain a first level containing a plurality of class clusters.
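The clustering step described above can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the classification objects are already mapped to coordinate tuples, and it measures the distance between groups as the Euclidean distance between their centroids, since the text does not fix the metric or the linkage rule. The names `centroid`, `threshold_cluster` and `first_distance` are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean position of a list of coordinate tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def threshold_cluster(points, first_distance):
    """Bottom-up clustering: repeatedly merge the two closest groups
    while their centroid distance is below first_distance (assumed rule)."""
    clusters = [[p] for p in points]  # every object starts in its own group
    while len(clusters) > 1:
        # Find the closest pair of groups (ties resolved by index order).
        d, i, j = min(
            (math.dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d >= first_distance:
            break  # nothing left within the first spatial distance
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
first_level = threshold_cluster(points, first_distance=3.0)
```

With these illustrative points, the two nearby pairs each form a class cluster and the isolated point remains a cluster of its own.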
According to the embodiment of the invention, after the first level containing a plurality of class clusters is obtained, the hierarchical classification method further comprises: determining whether the number of class clusters in the level is larger than the preset number of class clusters; and if so, clustering the class clusters in the level again.
Further, according to the embodiment of the present invention, the step of clustering the class clusters in the level again includes: determining the distance between the center points of any two class clusters in the current level, and merging the two closest class clusters into one class cluster.
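As a hedged illustration of the cluster-count check, the sketch below merges the two class clusters with the closest center points until the count no longer exceeds a preset limit. It assumes the center point is the arithmetic mean of the member coordinates; `merge_to_limit` and `max_clusters` are hypothetical names, not terms from the source.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean position of a list of coordinate tuples (assumed center point)."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def merge_to_limit(clusters, max_clusters):
    """Merge the two clusters with the closest centers until at most
    max_clusters remain. The input is copied, not modified."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > max(max_clusters, 1):
        _, i, j = min(
            (math.dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


level = [[(0, 0)], [(1, 0)], [(10, 0)], [(11, 0)], [(30, 0)]]
capped = merge_to_limit(level, max_clusters=3)
```

Here five single-object clusters are reduced to three by two successive closest-pair merges.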
According to the technical solution of this embodiment of the invention, all classification objects are clustered to obtain a first level containing a plurality of class clusters; for each classification object in the class clusters, the spatial distance between the classification object and the other class clusters in the first level is determined, and the classification object is determined to be a redundant object when that spatial distance is smaller than a preset distance threshold; the redundant object is placed in those other class clusters and the first level is updated; and each class cluster in the updated first level is taken as a new classification object and the above steps are repeated until the preset number of levels is reached, thereby achieving hierarchical classification. These technical means solve the technical problem that redundant data blurs cluster boundaries and easily causes classification errors, and thereby improve the accuracy and efficiency of hierarchical classification.
Fig. 1b is a schematic diagram of a main flow of a hierarchical classification method according to a second embodiment of the present invention, and as shown in fig. 1b, a hierarchical classification method provided in an embodiment of the present invention mainly includes:
step S201, clustering all the classified objects to obtain a first hierarchy including a plurality of clusters.
Further, according to the embodiment of the present invention, the step of clustering all the classification objects includes: mapping the classification objects into a vector space; determining whether the distance between classification objects is smaller than a first spatial distance (that is, a preset distance threshold); if so, clustering the two closest classification objects to obtain an intermediate class cluster; and then clustering the remaining classification objects and the intermediate class cluster according to the distances between them, to obtain a first level containing a plurality of class clusters.
According to the embodiment of the present invention, after the first level containing the plurality of class clusters is obtained, the hierarchical classification method further includes: step S202, determining whether the number of class clusters in the level is larger than the preset number of class clusters; if so, going to step S203 to cluster the class clusters in the level again; if not, going to step S204.
Step S203, determining the distance between the center points of any two class clusters in the current level, and merging the two closest class clusters into one class cluster.
Step S204, for each classified object in the class clusters, determining the spatial distance between the classified object and other class clusters in the first level, and determining the classified object as a redundant object when the spatial distance is smaller than a preset distance threshold; the redundant objects are placed in other class clusters and the first level is updated.
Specifically, when the first level is being built, a classification object is an item of data to be classified, and a redundant object is an item of data in the current class cluster whose distance to another class cluster is also within the distance threshold. When the second or a higher level is being built, a classification object is a class cluster to be classified, and a redundant object is a class cluster whose distance to another class cluster in the same level is also within the distance threshold corresponding to that level.
Specifically, updating the first level includes: if the distance between the redundant object and one or more other class clusters is smaller than the distance threshold, assigning the redundant object to those class clusters as well and placing it in each of them. In addition, if the number of class clusters in the clustering result is greater than the preset number of class clusters, that is, greater than the limit on the number of class clusters in the level, the two class clusters closest to each other in space are merged repeatedly until the number of class clusters in the level equals the preset number. In other words, a large amount of data in the multidimensional space is clustered into a small number of class clusters, each of which covers a certain region and is therefore more inclusive, which improves the classification accuracy of the data in the level.
Step S205, determine whether the number of hierarchical levels of the hierarchical classification reaches a preset number of hierarchical levels. If yes, go to step S207; if not, go to step S206.
Step S206, taking each cluster in the current hierarchy as a classification object, returning to step S201 to execute the above steps in a loop until a preset hierarchy level is reached, so as to realize hierarchical classification.
Step S207, the hierarchical classification is completed.
Specifically, when the preset number of levels is greater than one, the hierarchical classification method further includes: taking the first-level class clusters as the classification objects to be grouped and clustering them according to the distances between their center points to form second-level class clusters, thereby obtaining a second level; determining redundant class clusters according to the distance between the center point of each first-level class cluster in a second-level class cluster and the center points of the other second-level class clusters, and placing each redundant class cluster in every second-level class cluster to which it belongs; and then taking the second-level class clusters as the classification objects to be grouped and repeating the above steps until the preset number of levels is reached, thereby achieving hierarchical classification.
Specifically, the data-processing framework may have one level or multiple levels. In a hierarchical classification field that requires multiple levels, the class clusters already obtained must be hierarchically classified again until the preset number of levels is reached.
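The loop of steps S201 to S207 can be sketched end to end as follows. This is a minimal illustration under several assumptions that the text does not confirm: objects are named points in a dictionary, distances are Euclidean distances to cluster centroids, one distance threshold is supplied per level, and a class cluster is represented at the next level by the centroid of its members. All function and parameter names (`build_level`, `build_hierarchy`, `thresholds`, `max_clusters`) are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean position of a list of coordinate tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def build_level(positions, threshold, max_clusters):
    """Build one level from a dict mapping object name to coordinates."""
    clusters = [[name] for name in positions]

    def center(members):
        return centroid([positions[m] for m in members])

    def closest_pair():
        return min(
            (math.dist(center(clusters[i]), center(clusters[j])), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )

    # S201: merge the closest pair while it lies within the threshold.
    while len(clusters) > 1:
        d, i, j = closest_pair()
        if d >= threshold:
            break
        merged = clusters.pop(j)
        clusters[i].extend(merged)

    # S202/S203: keep merging the closest pair while over the preset count.
    while len(clusters) > max(max_clusters, 1):
        _, i, j = closest_pair()
        merged = clusters.pop(j)
        clusters[i].extend(merged)

    # S204: an object within the threshold of another cluster's center is
    # redundant and is also placed in that cluster.
    centers = [center(c) for c in clusters]
    updated = [list(c) for c in clusters]
    for home, members in enumerate(clusters):
        for m in members:
            for k, ctr in enumerate(centers):
                near = math.dist(positions[m], ctr) < threshold
                if k != home and near and m not in updated[k]:
                    updated[k].append(m)
    return updated


def build_hierarchy(positions, thresholds, max_clusters):
    """S205/S206: one pass per preset level; clusters become new objects."""
    levels, current = [], dict(positions)
    for depth, threshold in enumerate(thresholds, start=1):
        groups = build_level(current, threshold, max_clusters)
        level = {f"L{depth}C{n}": g for n, g in enumerate(groups)}
        levels.append(level)
        current = {name: centroid([current[m] for m in g])
                   for name, g in level.items()}
    return levels  # S207: hierarchical classification is complete


positions = {"a": (0, 0), "b": (1, 0), "c": (10, 0),
             "d": (11, 0), "e": (100, 0), "f": (101, 0)}
levels = build_hierarchy(positions, thresholds=[3.0, 30.0], max_clusters=10)
```

The number of entries in `thresholds` plays the role of the preset number of levels; passing a single threshold yields the single-level case.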
Because a redundant object can belong to more than one class cluster, classifying data with this clustering-based hierarchical approach yields a directed acyclic graph rather than the tree-shaped framework of traditional hierarchical classification, which significantly improves classification accuracy. In addition, because each level contains fewer categories, subsequent lookups are faster and overall classification efficiency improves.
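To make the difference between a tree and a directed acyclic graph concrete, the sketch below records the memberships from the FIG. 2c walk-through as a parent-to-members mapping and lists every node that has more than one parent. It assumes that the extra cluster a11 joins is a3, and it leaves c2 without an upper-level parent because the text does not state one. The variable and function names are hypothetical.

```python
from collections import defaultdict

# Parent cluster -> members, as read from the FIG. 2c description
# (assumption: a11 is shared with a3, a12 with a2, and a3 with B).
members = {
    "a1": ["a11", "a12", "a13", "a14"],
    "a2": ["a21", "a22", "a12"],
    "a3": ["a31", "a11"],
    "b1": ["b11"],
    "b2": ["b21"],
    "b3": ["b31"],
    "c2": ["c21", "c22"],
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3", "a3"],
}


def parents_of(members):
    """Invert the mapping: child -> list of parent clusters."""
    parents = defaultdict(list)
    for parent, children in members.items():
        for child in children:
            parents[child].append(parent)
    return dict(parents)


parents = parents_of(members)
# In a tree every node has at most one parent; shared nodes make it a DAG.
shared = {child: ps for child, ps in parents.items() if len(ps) > 1}
```

Every entry in `shared` corresponds to a redundant object or redundant class cluster; an empty `shared` would mean the structure is an ordinary tree.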
According to the technical solution of this embodiment of the invention, all classification objects are clustered to obtain a first level containing a plurality of class clusters; for each classification object in the class clusters, the spatial distance between the classification object and the other class clusters in the first level is determined, and the classification object is determined to be a redundant object when that spatial distance is smaller than a preset distance threshold; the redundant object is placed in those other class clusters and the first level is updated; and each class cluster in the updated first level is taken as a new classification object and the above steps are repeated until the preset number of levels is reached, thereby achieving hierarchical classification. These technical means solve the technical problem that redundant data blurs cluster boundaries and easily causes classification errors, and thereby improve the accuracy and efficiency of hierarchical classification.
FIG. 2a is a schematic diagram of a clustering method in the hierarchical classification method according to an embodiment of the present invention. As shown in FIG. 2a, the classification objects p0 to p6 are clustered in the multidimensional space: according to the spatial distances between the classification objects, the objects closest to each other are first aggregated into one class; similar classes are then searched for repeatedly according to the first spatial distance (that is, the distance threshold preset for the first level); and finally the classification objects within a certain range (that is, closer than the first spatial distance) are aggregated into a large class, forming a class cluster. In one specific implementation of the embodiment of the invention, 300 classification objects are clustered into 50 non-overlapping class clusters, each of which covers a larger region, which improves the classification effect. Once the class-cluster model has been built, classification accuracy for newly arriving data can be significantly improved.
FIG. 2b is a schematic diagram of the classification of redundant objects in the single-layer classification method according to the embodiment of the present invention. As shown in FIG. 2b, the classification objects a1, a2, a3, b1, b2 and b3 are hierarchically classified, with the preset number of levels being one. First, by comparing the spatial distances between the classification objects with the first spatial distance (that is, the distance threshold preset for the first level), a1, a2 and a3 are clustered into class cluster A, and b1, b2 and b3 are clustered into class cluster B. Then, according to the distance threshold, a3 is determined to be redundant data: its part m belongs to class cluster A and its part n belongs to class cluster B, so a3 is assigned to both class cluster A and class cluster B, and the first level is obtained.
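The FIG. 2b case can be reproduced with a small Python sketch. The coordinates and the threshold value are invented for illustration only, and the distance from an object to a class cluster is assumed to be the Euclidean distance to that cluster's centroid; the source does not specify either. `share_redundant` is a hypothetical helper name.

```python
import math


def centroid(points):
    """Mean position of a list of coordinate tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def share_redundant(clusters, positions, threshold):
    """Place each object also in every other cluster whose center lies
    within threshold; return the updated clusters and the redundant objects."""
    centers = {name: centroid([positions[m] for m in members])
               for name, members in clusters.items()}
    updated = {name: list(members) for name, members in clusters.items()}
    redundant = []
    for home, members in clusters.items():
        for obj in members:
            others = [name for name, ctr in centers.items()
                      if name != home
                      and math.dist(positions[obj], ctr) < threshold]
            if others:
                redundant.append(obj)
            for name in others:
                if obj not in updated[name]:
                    updated[name].append(obj)
    return updated, redundant


# Hypothetical coordinates: a3 sits between the two groups.
positions = {"a1": (0.0, 0.0), "a2": (1.0, 0.0), "a3": (4.0, 0.0),
             "b1": (6.0, 0.0), "b2": (7.0, 0.0), "b3": (8.0, 0.0)}
clusters = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}
updated, redundant = share_redundant(clusters, positions, threshold=3.5)
```

With these values only a3 lies within the threshold of the other cluster's center, so it ends up in both A and B while the remaining objects stay where they were.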
FIG. 2c is a schematic diagram of the classification of a redundant object in the multi-level classification method according to the embodiment of the present invention. As shown in FIG. 2c, the classification objects a11, a12, a13, a14, a21, a22, a31, b11, b21, b31, c21 and c22 are hierarchically classified, with the preset number of levels being two. First, by comparing the spatial distances between the classification objects with the first spatial distance (that is, the distance threshold preset for the first level), a11, a12, a13 and a14 are clustered into class cluster a1, a21 and a22 into class cluster a2, a31 into class cluster a3, b11 into class cluster b1, b21 into class cluster b2, b31 into class cluster b3, and c21 and c22 into class cluster c2. Then, according to the distance threshold, a11 and a12 are determined to be redundant data: a11 also belongs to class cluster a3 and a12 also belongs to class cluster a2, so a11 and a12 are each placed in the class clusters to which they belong, and the first level is obtained. Next, clustering is performed again on the first-level class clusters: by comparing the spatial distances between them with the second-level spatial distance (that is, the distance threshold preset for the second level), a1, a2 and a3 are clustered into class cluster A, and b1, b2 and b3 are clustered into class cluster B. Finally, a3 is determined to be a redundant class cluster that also belongs to the upper-level class cluster B, so it is placed in class cluster B as well, and the second level is obtained.
FIG. 3 is a schematic diagram of the main modules of a hierarchical classification apparatus provided in accordance with an embodiment of the present invention; as shown in fig. 3, a hierarchical classification apparatus 300 according to an embodiment of the present invention includes:
a clustering module 301, configured to cluster all classified objects to obtain a first hierarchy including a plurality of clusters;
According to a specific implementation manner of the embodiment of the present invention, the clustering module 301 is further configured to: map the classification objects into a vector space; determine whether the distance between classification objects is smaller than a first spatial distance (that is, a preset distance threshold); if so, cluster the two closest classification objects to obtain an intermediate class cluster; and then cluster the remaining classification objects and the intermediate class cluster according to the distances between them, to obtain a first level containing a plurality of class clusters.
The redundant object determining module 302 is configured to determine, for each classified object in the class clusters, a spatial distance between the classified object and another class cluster in the first hierarchy, and determine that the classified object is a redundant object when the spatial distance is smaller than a preset distance threshold.
An updating module 303, configured to place the redundant object in another class cluster and update the first hierarchy;
and the circulating module 304 is configured to circularly execute the above module operation steps with each class cluster in the updated first hierarchy as a new classification object until a preset hierarchy level is reached, so as to implement hierarchical classification.
According to an embodiment of the present invention, the hierarchical classification apparatus further includes: the cluster number module is used for determining whether the number of the clusters in the hierarchy is greater than the preset cluster number; if so, the cluster number module is used for clustering the clusters in the hierarchy again.
Further, according to a specific implementation manner of the embodiment of the present invention, the cluster number module is further configured to: determining the distance between the center points of any two clusters in the current level; and merging the two closest class clusters into one class cluster.
According to the technical solution of this embodiment of the invention, all classification objects are clustered to obtain a first level containing a plurality of class clusters; for each classification object in the class clusters, the spatial distance between the classification object and the other class clusters in the first level is determined, and the classification object is determined to be a redundant object when that spatial distance is smaller than a preset distance threshold; the redundant object is placed in those other class clusters and the first level is updated; and each class cluster in the updated first level is taken as a new classification object and the above steps are repeated until the preset number of levels is reached, thereby achieving hierarchical classification. These technical means solve the technical problem that redundant data blurs cluster boundaries and easily causes classification errors, and thereby improve the accuracy and efficiency of hierarchical classification.
Fig. 4 shows an exemplary system architecture 400 to which the hierarchical classification method or hierarchical classification apparatus of an embodiment of the present invention may be applied.
As shown in FIG. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405 (this architecture is merely an example, and the components included in a particular architecture may be adapted to the specific application). The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use terminal devices 401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 401, 402, 403. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be noted that the hierarchical classification method provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the hierarchical classification apparatus is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the system 500. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor including a clustering module, a redundant object determining module, and the like. The names of these modules do not, in some cases, limit the modules themselves; for example, the clustering module may also be described as "a module for clustering all classified objects to obtain a first level containing a plurality of class clusters".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: clustering all classified objects to obtain a first level containing a plurality of class clusters; for each classified object in the class clusters, determining the spatial distance between the classified object and the other class clusters in the first level, and determining the classified object to be a redundant object when the spatial distance is smaller than a preset distance threshold; placing the redundant object in the other class cluster and updating the first level; and taking each class cluster in the updated first level as a new classified object, and executing the above steps in a loop until a preset number of levels is reached, so as to achieve hierarchical classification.
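The outer loop of such a program, in which the class clusters of one level become the classified objects of the next, might look like the following sketch. Several points are assumptions made only for illustration: each class cluster is represented at the next level by its center point, the clustering itself is a simple greedy placeholder (`greedy_cluster`) rather than the clustering of the embodiment, and the clustering radius is multiplied by a hypothetical `growth` factor at each level so that higher levels are coarser. The redundant-object step is marked by a comment rather than implemented here.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def greedy_cluster(points, radius):
    """Placeholder clustering: put each point into the first cluster whose
    current center point is within `radius`, else start a new cluster."""
    clusters = []
    for p in points:
        for c in clusters:
            if math.dist(p, centroid(c)) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def build_levels(objects, num_levels, radius, growth=2.0):
    """Return a list of levels; each level is a list of class clusters."""
    levels = []
    current = list(objects)
    for _ in range(num_levels):
        clusters = greedy_cluster(current, radius)
        # The redundant-object detection and reassignment would be applied
        # to `clusters` at this point, before the level is recorded.
        levels.append(clusters)
        # Each class cluster becomes one classified object of the next level.
        current = [centroid(c) for c in clusters]
        radius *= growth
    return levels
```

The loop stops after `num_levels` iterations, which corresponds to reaching the preset number of levels.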
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A hierarchical classification method, comprising:
clustering all classified objects to obtain a first level containing a plurality of class clusters; for each classified object in the class clusters, determining the spatial distance between the classified object and the other class clusters in the first level, and determining the classified object to be a redundant object when the spatial distance is smaller than a preset distance threshold; placing the redundant object in the other class cluster and updating the first level;
and taking each class cluster in the updated first level as a new classified object, and executing the above steps in a loop until a preset number of levels is reached, so as to achieve hierarchical classification.
2. The hierarchical classification method according to claim 1, wherein the step of clustering all classified objects comprises: mapping the classified objects into a vector space; determining whether the distance between classified objects is smaller than a first spatial distance; if so, clustering the two closest classified objects to obtain an intermediate cluster; and then clustering the remaining classified objects and the intermediate cluster according to the distances between the remaining classified objects and the intermediate cluster, to obtain a first level containing a plurality of class clusters.
3. The hierarchical classification method according to claim 1, further comprising:
determining whether the number of class clusters in the level is larger than a preset number of class clusters;
and if so, clustering the class clusters in the level again.
4. The hierarchical classification method according to claim 3, wherein the step of clustering the class clusters in the level again comprises: determining the distance between the center points of any two class clusters in the current level, and merging the two closest class clusters into one class cluster.
5. A hierarchical classification apparatus, comprising:
a clustering module, configured to cluster all classified objects to obtain a first level containing a plurality of class clusters;
a redundant object determining module, configured to determine, for each classified object in the class clusters, the spatial distance between the classified object and the other class clusters in the first level, and to determine the classified object to be a redundant object when the spatial distance is smaller than a preset distance threshold;
an updating module, configured to place the redundant object in the other class cluster and update the first level;
and a loop module, configured to take each class cluster in the updated first level as a new classified object and execute the operations of the above modules in a loop until a preset number of levels is reached, so as to achieve hierarchical classification.
6. The hierarchical classification apparatus according to claim 5, wherein the clustering module is further configured to: map the classified objects into a vector space; determine whether the distance between classified objects is smaller than a first spatial distance; if so, cluster the two closest classified objects to obtain an intermediate cluster; and then cluster the remaining classified objects and the intermediate cluster according to the distances between the remaining classified objects and the intermediate cluster, to obtain a first level containing a plurality of class clusters.
7. The hierarchical classification apparatus according to claim 5, further comprising a cluster number module configured to determine whether the number of class clusters in the level is greater than a preset number of class clusters, and, if so, to cluster the class clusters in the level again.
8. The hierarchical classification apparatus according to claim 7, wherein the cluster number module is further configured to: determine the distance between the center points of any two class clusters in the current level; and merge the two closest class clusters into one class cluster.
9. A server, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-4.
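As an illustration of the clustering step recited in claims 2 and 6, the following Python sketch performs a simple agglomerative clustering with a distance cutoff. It rests on assumptions that the claims do not state: the classified objects are taken to be already mapped to coordinate tuples, the distance between an intermediate cluster and another object or cluster is measured between center points, and merging stops as soon as the closest pair is no longer nearer than the first spatial distance. The names `centroid` and `agglomerate` are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def agglomerate(vectors, first_distance):
    """Repeatedly merge the closest pair of clusters while their center
    points are closer than `first_distance`; return the resulting clusters."""
    clusters = [[v] for v in vectors]  # every object starts as its own cluster
    while len(clusters) > 1:
        centers = [centroid(c) for c in clusters]
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: math.dist(centers[ij[0]], centers[ij[1]]),
        )
        if math.dist(centers[i], centers[j]) >= first_distance:
            break  # no remaining pair is close enough to merge
        merged = clusters[i] + clusters[j]  # the intermediate cluster
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

The list returned by `agglomerate` corresponds to a first level containing a plurality of class clusters; a smaller `first_distance` yields more, tighter clusters.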
CN201910625981.1A 2019-07-11 2019-07-11 Hierarchical classification method and device Pending CN112215249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910625981.1A CN112215249A (en) 2019-07-11 2019-07-11 Hierarchical classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910625981.1A CN112215249A (en) 2019-07-11 2019-07-11 Hierarchical classification method and device

Publications (1)

Publication Number Publication Date
CN112215249A true CN112215249A (en) 2021-01-12

Family

ID=74047647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910625981.1A Pending CN112215249A (en) 2019-07-11 2019-07-11 Hierarchical classification method and device

Country Status (1)

Country Link
CN (1) CN112215249A (en)

Similar Documents

Publication Publication Date Title
CN109189835B (en) Method and device for generating data wide table in real time
CN109614402B (en) Multidimensional data query method and device
US9400700B2 (en) Optimized system for analytics (graphs and sparse matrices) operations
CN107480205B (en) Method and device for partitioning data
US10616338B1 (en) Partitioning data according to relative differences indicated by a cover tree
CN112925859A (en) Data storage method and device
CN112597126A (en) Data migration method and device
CN112783887A (en) Data processing method and device based on data warehouse
CN111753019A (en) Data partitioning method and device applied to data warehouse
CN111435406A (en) Method and device for correcting database statement spelling errors
CN111026629A (en) Method and device for automatically generating test script
CN110321435B (en) Data source dividing method, device, equipment and storage medium
CN112215249A (en) Hierarchical classification method and device
CN113190730A (en) Method and device for classifying block chain addresses
CN113111084A (en) Method and device for processing data
CN112148461A (en) Application scheduling method and device
CN113704242A (en) Data processing method and device
CN112579673A (en) Multi-source data processing method and device
CN113742321A (en) Data updating method and device
CN113448957A (en) Data query method and device
CN112988857A (en) Service data processing method and device
CN113779370B (en) Address retrieval method and device
CN111639099A (en) Full-text indexing method and system
CN113362097B (en) User determination method and device
CN111858917A (en) Text classification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination