CN112948370B - Data classification method and device and computer equipment

Info

Publication number: CN112948370B
Application number: CN201911175983.1A
Authority: CN
Inventors: 唐君行
Original assignee: Shanghai Bilibili Technology Co Ltd
Current assignee: Shanghai Bilibili Technology Co Ltd
Priority date: 2019-11-26
Filing date: 2019-11-26
Publication date: 2023-04-11
Anticipated expiration: 2039-11-26
Also published as: CN112948370A

Abstract

The invention discloses a data classification method, which comprises the following steps: acquiring data to be classified; calculating M characteristic values of the data to be classified according to a characteristic value calculation rule; comparing the M characteristic values with each characteristic value data table in a characteristic value database in sequence; and when the M characteristic values are included in the first characteristic value data table, classifying the data to be classified into first class data corresponding to the first characteristic value data table. The invention also provides a data classification device, computer equipment and a computer readable storage medium. By comparing only the simple characteristic values of the data to be classified with the characteristic value data tables, the invention greatly reduces the data processing amount, shortens the time and improves the efficiency.

Description

Data classification method and device and computer equipment

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data classification method and apparatus, a computer device, and a computer-readable storage medium.

Background

In the prior art, the data classification problem is generally handled by creating data classes from existing data and then comparing the data to be classified one by one with all data in each data class, so as to determine whether the data to be classified belongs to that data class. However, classifying by enumerating every item of existing data in each class requires a huge amount of computation, consumes many computer processing resources, takes a long time, and is inefficient.

Disclosure of Invention

In view of this, the present invention provides a data classification method, an apparatus, a computer device, and a computer-readable storage medium, which can solve the problem that the data classification process consumes a large amount of computer processing resources and time.

First, to achieve the above object, the present invention provides a data classification method, including:

acquiring data to be classified; calculating M characteristic values of the data to be classified according to a characteristic value calculation rule; comparing the M characteristic values with each characteristic value data table in a characteristic value database in sequence, wherein the characteristic value database at least comprises a first characteristic value data table, and the first characteristic value data table is a set of all characteristic values of the same category data calculated by the characteristic value calculation rule; and when the M characteristic values are included in the first characteristic value data table, classifying the data to be classified into first class data corresponding to the first characteristic value data table.

In one example, the feature value calculation rule includes: calculating M hash values of the data to be classified through M different hash functions; or dividing the data to be classified into M parts, and respectively calculating the hash values of the M parts through M hash functions.

In one example, the characteristic value data table stores all characteristic values of the same category data in the form of a bloom filter, and the boolean value at the storage order corresponding to each characteristic value in the bloom filter is 1.

In one example, the characteristic value database further includes at least a second characteristic value data table, wherein the sequentially comparing the M characteristic values with each of the characteristic value data tables in the characteristic value database includes: sequentially inquiring whether the Boolean values of the storage orders corresponding to the M characteristic values are all 1 in a first bloom filter and a second bloom filter corresponding to the first characteristic value data table and the second characteristic value data table; when the boolean values of the storage order corresponding to the M feature values in the first bloom filter or the second bloom filter are all 1, it is determined that the M feature values are all included in the first feature value data table or the second feature value data table.

In one example, the method further comprises: when the M characteristic values are not completely included in the first bloom filter and not completely included in the second bloom filter, judging that the data to be classified does not belong to the existing class data; and returning a warning of classification failure.

In addition, to achieve the above object, the present invention also provides a data classification apparatus, comprising:

the acquisition module is used for acquiring data to be classified; the calculation module is used for calculating M characteristic values of the data to be classified according to a characteristic value calculation rule; a comparison module, configured to compare the M feature values with each feature value data table in a feature value database in sequence, where the feature value database at least includes a first feature value data table, and the first feature value data table is a set of all feature values of the same category data calculated by the feature value calculation rule; and the classification module is used for classifying the data to be classified into first class data corresponding to the first characteristic value data table when the M characteristic values are included in the first characteristic value data table.

In one example, the characteristic value data table stores all characteristic values of the same category data in a bloom filter manner, each characteristic value has a boolean value of 1 in a corresponding storage order in the bloom filter, the characteristic value database further includes at least a second characteristic value data table, and the comparison module is further configured to: sequentially inquiring whether the Boolean values of the storage orders corresponding to the M characteristic values are all 1 in a first bloom filter and a second bloom filter corresponding to the first characteristic value data table and the second characteristic value data table; when the boolean values of the storage order corresponding to the M feature values in the first bloom filter or the second bloom filter are all 1, it is determined that the M feature values are all included in the first feature value data table or the second feature value data table.

Further, the present invention also proposes a computer device, which includes a memory and a processor, wherein the memory stores a computer program that can be run on the processor, and the computer program implements the steps of the data classification method as described above when being executed by the processor.

Further, to achieve the above object, the present invention also provides a computer-readable storage medium storing a computer program, which is executable by at least one processor to cause the at least one processor to perform the steps of the data classification method as described above.

Compared with the prior art, the data classification method, the data classification device, the computer equipment and the computer readable storage medium can, after the data to be classified is obtained, calculate M characteristic values of the data to be classified according to the characteristic value calculation rule; compare the M characteristic values with each characteristic value data table in a characteristic value database in sequence; and, when the M characteristic values are included in the first characteristic value data table, classify the data to be classified into first class data corresponding to the first characteristic value data table. In this way, only the simple characteristic values of the data to be classified need to be compared with the characteristic value data tables, so that the data processing amount is greatly reduced, the time is shortened, and the efficiency is improved.

Drawings

FIG. 1 is a schematic diagram of an application environment of an embodiment of the present invention;

FIG. 2 is a flow chart illustrating a data classification method according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating a specific embodiment of the process of comparing the M eigenvalues with each of the eigenvalue data tables in the eigenvalue database in turn in step S204 of FIG. 2;

FIG. 4 is a schematic illustration of the effect of the step shown in FIG. 3;

FIG. 5 is a diagram of an alternative hardware architecture for the computer device of the present invention;

FIG. 6 is a block diagram of a data classification apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.

Fig. 1 is a schematic diagram of an application environment according to an embodiment of the present invention. Referring to fig. 1, the computer device 1 is connected to a user terminal and a data server, receives data to be classified sent by the user terminal, and classifies the data to be classified according to a characteristic value database stored in the data server. In the present embodiment, the computer device 1 can be used as a terminal device such as a server, a mobile phone, a user portable device, a PC, and the like. In other embodiments, the computer device 1 may also be a stand-alone functional module, and then attached to a data server or a user terminal to implement the function of data classification. Of course, in this embodiment, the characteristic value database is disposed on the data server, and in other embodiments, the characteristic value database may also be disposed on the computer device 1, which is not limited herein.

FIG. 2 is a flowchart illustrating a data classification method according to an embodiment of the present invention. It is to be understood that the flow charts in the embodiments of the present method are not intended to limit the order in which the steps are performed. The following description is made by taking a computer device as an execution subject.

As shown in fig. 2, the data classification method may include steps S200 to S206, in which:

Step S200, acquiring data to be classified.

Specifically, after the computer device 1 is connected to a user terminal, when a user has data to be classified, the data to be classified is sent to the computer device 1 through the user terminal, and then the computer device 1 receives the data to be classified. Of course, in other embodiments, the computer device 1 may also provide an interactive interface, then receive a classification request of a user for the data to be classified stored on the computer device 1 through the interactive interface, and then obtain the data to be classified from the storage unit of the computer device 1 itself.

Step S202, calculating M characteristic values of the data to be classified according to a characteristic value calculation rule.

Specifically, after the computer device 1 acquires the data to be classified, M feature values of the data to be classified are calculated according to a preset feature value calculation rule. In one embodiment, the feature value calculation rule includes: calculating M hash values of the data to be classified through M different hash functions, wherein each hash function calculates a corresponding hash value, namely a characteristic value, from the data to be classified. That is, the computer device 1 calculates M feature values of the data to be classified by M different hash functions set in advance, and associates the M feature values with the data to be classified.
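As an illustration of this first calculation rule, the following minimal Python sketch derives M hash values from the whole data item. The patent does not name specific hash functions; simulating M different functions by prefixing the SHA-256 input with an index, the function name `feature_values`, and the truncation of each digest to 64 bits are all assumptions made for this example.

```python
import hashlib


def feature_values(data: bytes, m: int = 4) -> list[int]:
    """Return m hash values of `data`, one per simulated hash function.

    Assumption: the i-th hash function is SHA-256 applied to the index i
    (4 bytes, big-endian) followed by the data. Any family of m distinct
    hash functions could be substituted.
    """
    values = []
    for i in range(m):
        digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
        # Keep the first 8 bytes as an integer characteristic value.
        values.append(int.from_bytes(digest[:8], "big"))
    return values
```

Calling `feature_values(b"some data", m=4)` returns a list of four integers, and the same input always yields the same list.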

Of course, in another specific embodiment, the feature value calculation rule includes: dividing the data to be classified into M parts, and respectively calculating the hash values of the M parts through M hash functions. For example, when the data to be classified is large-capacity data, the data to be classified may be divided into M parts, and the hash value of each part is then calculated in turn with the corresponding one of M preset hash functions, so as to obtain M corresponding characteristic values. The way the data to be classified is divided can be set differently according to the characteristics of the data to be classified: for example, when classifying video data, the data to be classified can be divided according to video duration; when classifying text data, the data to be classified can be divided according to paragraphs. In summary, for different data classifications, the computer device 1 may calculate M feature values of the data to be classified according to a preset feature value calculation rule.
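The second calculation rule can be sketched in the same hedged way. The example below splits a byte string into M contiguous parts of nearly equal size and hashes each part with its own simulated hash function. The byte-offset split, the helper names `split_into_parts` and `part_feature_values`, and the index-prefixed SHA-256 are assumptions; the patent instead suggests domain-specific splits such as video duration or text paragraphs.

```python
import hashlib


def split_into_parts(data: bytes, m: int) -> list[bytes]:
    """Split `data` into m contiguous parts of near-equal length."""
    size, extra = divmod(len(data), m)
    parts, start = [], 0
    for i in range(m):
        # The first `extra` parts take one additional byte each.
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def part_feature_values(data: bytes, m: int = 4) -> list[int]:
    """Hash the i-th part with the i-th (simulated) hash function."""
    values = []
    for i, part in enumerate(split_into_parts(data, m)):
        digest = hashlib.sha256(i.to_bytes(4, "big") + part).digest()
        values.append(int.from_bytes(digest[:8], "big"))
    return values
```

For a ten-byte input and M = 3, `split_into_parts` yields parts of 4, 3 and 3 bytes, and `part_feature_values` returns one characteristic value per part.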

Step S204, comparing the M characteristic values with each characteristic value data table in a characteristic value database in sequence. Wherein the characteristic value database includes at least a first characteristic value data table which is a set of all characteristic values of the same category data calculated by the characteristic value calculation rule.

Step S206, when all the M feature values are included in the first feature value data table, classifying the data to be classified into first class data corresponding to the first feature value data table.

In this embodiment, after the computer device 1 calculates M feature values of the data to be classified, the M feature values are sent to the data server, and the data server is requested to compare the M feature values with each feature value data table in the feature value database in sequence. Of course, in other embodiments, the computer device 1 may also obtain the characteristic value database from the data server, and then directly compare the M characteristic values with each characteristic value data table in the characteristic value database in sequence. Each characteristic value data table is obtained by calculating the characteristic values of the same type of data according to the characteristic value calculation rule.

When the computer device 1 determines that the M feature values are included in the first feature value data table by comparison, the data to be classified is considered to be included in the existing data corresponding to the first feature value data table, and thus, the data to be classified is classified into the first category data corresponding to the first feature value data table. And finally, returning the classification result to the user terminal.
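Steps S204 and S206 can be illustrated with a small Python sketch in which each characteristic value data table is modeled as a plain set of integers. The dictionary layout, the category names, and the function name `classify` are assumptions for illustration only; the patent does not prescribe a storage format at this point.

```python
from typing import Optional


def classify(values: list[int], database: dict[str, set[int]]) -> Optional[str]:
    """Return the first category whose table contains all given values.

    `database` maps a category name to its characteristic value data table.
    Tables are checked in insertion order, mirroring the comparison
    "in sequence". None is returned when no table contains every value.
    """
    for category, table in database.items():
        if all(v in table for v in values):
            return category
    return None


# Hypothetical tables with made-up characteristic values.
database = {"first": {1, 2, 3, 4}, "second": {5, 6, 7}}
```

With these tables, `classify([1, 2, 3], database)` selects the first category, while a mixed list such as `[1, 5]` matches no table and yields `None`.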

In an exemplary embodiment, the characteristic value data table stores all characteristic values of the same category data in the form of a bloom filter, and the boolean value at the storage order corresponding to each characteristic value in the bloom filter is 1. As shown in fig. 3, when the characteristic value database further includes a second characteristic value data table, the comparing the M characteristic values with each characteristic value data table in the characteristic value database in sequence in step S204 includes steps S300 to S304:

Step S300, sequentially inquiring whether the Boolean values of the storage orders corresponding to the M characteristic values are all 1 in the first bloom filter and the second bloom filter corresponding to the first characteristic value data table and the second characteristic value data table.

Step S302 is to determine that all the M eigenvalues are included in the first eigenvalue data table or the second eigenvalue data table when all the boolean values of the storage order corresponding to each of the M eigenvalues in the first bloom filter or the second bloom filter are 1.

Step S304, when the M characteristic values are completely included in neither the first characteristic value data table nor the second characteristic value data table, judging that the data to be classified does not belong to the existing class data, and returning a warning of classification failure.

In particular, when each feature value data table is set as a bloom filter, the feature value database represents a plurality of bloom filters. Therefore, after the computer device 1 calculates the M feature values of the data to be classified, the M feature values are sequentially compared with each bloom filter, and whether the M feature values are included in any bloom filter is determined. In this embodiment, the bloom filter is a storage unit of a specific size stored in array form; the storage unit has storage orders, namely positions arranged in sequence, and a boolean value of either 1 or 0 at each storage order. Therefore, the computer device 1 sequentially searches whether the boolean values at the storage orders corresponding to the M feature values are all 1 in the first bloom filter and the second bloom filter corresponding to the first feature value data table and the second feature value data table. When the boolean values of the storage orders corresponding to the M feature values in the first bloom filter are all 1, it is determined that all the M feature values are included in the first feature value data table; and when the M characteristic values are not completely included in the first characteristic value data table and not completely included in the second characteristic value data table, it is judged that the data to be classified does not belong to the existing class data, and a warning of classification failure is returned.
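The bloom filter lookup of steps S300 to S304 can be sketched as follows. This is a minimal illustration, not the patented implementation: the array size of 1024, mapping a characteristic value to a storage order with a modulo operation, the index-prefixed SHA-256 hash family, and the names `BloomFilter`, `feature_values` and `classify` are all assumptions.

```python
import hashlib
from typing import Optional


def feature_values(data: bytes, m: int = 3) -> list[int]:
    """M hash values from M simulated hash functions (assumed family)."""
    return [
        int.from_bytes(hashlib.sha256(i.to_bytes(4, "big") + data).digest()[:8], "big")
        for i in range(m)
    ]


class BloomFilter:
    """Fixed-size array of 0/1 values indexed by storage order."""

    def __init__(self, size: int = 1024) -> None:
        self.bits = [0] * size

    def add(self, values: list[int]) -> None:
        # Set the boolean value at each value's storage order to 1.
        for v in values:
            self.bits[v % len(self.bits)] = 1

    def contains(self, values: list[int]) -> bool:
        # True only if every corresponding storage order holds 1.
        return all(self.bits[v % len(self.bits)] == 1 for v in values)


def classify(data: bytes, filters: dict[str, BloomFilter]) -> Optional[str]:
    """Query each filter in sequence; None signals classification failure."""
    values = feature_values(data)
    for category, bloom in filters.items():
        if bloom.contains(values):
            return category
    return None


# Hypothetical categories, each built from one known data item.
first, second = BloomFilter(), BloomFilter()
first.add(feature_values(b"cat video"))
second.add(feature_values(b"dog video"))
filters = {"first": first, "second": second}
```

Here `classify(b"cat video", filters)` returns `"first"` because all of its storage orders were set when the filter was built. A caller would treat a `None` result as the warning of classification failure. Note that a bloom filter can report values as present that were never stored, so a positive result is probabilistic rather than exact.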

Referring to fig. 4, the computer device 1 compares the M feature values of the data to be classified with bloom filter 1 and bloom filter 2 in sequence, and determines whether the M feature values exist in bloom filter 1 or bloom filter 2: in fig. 4 (A), when the M feature values do not exist in bloom filter 1 but exist in bloom filter 2, the data to be classified is classified into second class data; in fig. 4 (B), when the M feature values exist in neither bloom filter 1 nor bloom filter 2, the classification fails, and the data to be classified does not belong to the existing class data.

As can be seen from the above, the data classification method provided in this embodiment can, after acquiring data to be classified, calculate M feature values of the data to be classified according to a feature value calculation rule; compare the M characteristic values with each characteristic value data table in a characteristic value database in sequence; and, when the M characteristic values are included in the first characteristic value data table, classify the data to be classified into first class data corresponding to the first characteristic value data table. In this way, only the simple characteristic values of the data to be classified need to be compared with the characteristic value data tables, so that the data processing amount is greatly reduced, the time is shortened, and the efficiency is improved.

In addition, the present invention also provides a computer device, which is shown in fig. 5 and is a schematic diagram of an optional hardware architecture of the computer device of the present invention.

In this embodiment, the computer device 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which may be communicatively connected to each other through a system bus. The computer device 1 is connected to a network (not shown in fig. 5) through the network interface 13, and is connected to a server (not shown in fig. 5) through the network for data interaction. The network may be a wireless or wired network such as an Intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or a communication network.

It is noted that fig. 5 only shows the computer device 1 with components 11-13, but it is to be understood that not all shown components are required to be implemented, and that more or less components may be implemented instead.

The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 11 may be an internal storage unit of the computer device 1, such as a hard disk or a memory of the computer device 1. In other embodiments, the memory 11 may also be an external storage device of the computer device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the computer device 1. Of course, the memory 11 may also comprise both an internal storage unit of the computer device 1 and an external storage device thereof. In this embodiment, the memory 11 is generally used for storing an operating system installed in the computer device 1 and various types of application software, such as program codes of the barrier application, and program codes of the data classification apparatus 200. Furthermore, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.

The processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 12 is typically used for controlling the overall operation of the computer device 1, such as performing data interaction or communication related control and processing. In this embodiment, the processor 12 is configured to run a program code stored in the memory 11 or process data, for example, an application program of the data classification apparatus 200, which is not limited herein.

The network interface 13 may comprise a wireless network interface or a wired network interface, and the network interface 13 is generally used for establishing a communication connection between the computer device 1 and a user terminal and a data server.

In this embodiment, when the data classification apparatus 200 is installed and run in the computer device 1, the data classification apparatus 200 can, after acquiring data to be classified, calculate M feature values of the data to be classified according to a feature value calculation rule; compare the M characteristic values with each characteristic value data table in a characteristic value database in sequence; and, when the M characteristic values are included in the first characteristic value data table, classify the data to be classified into first class data corresponding to the first characteristic value data table. In this way, only the simple characteristic values of the data to be classified need to be compared with the characteristic value data tables, so that the data processing amount is greatly reduced, the time is shortened, and the efficiency is improved.

The hardware structure and functions of the computer apparatus of the present invention have been described in detail so far. Hereinafter, various embodiments of the present invention will be proposed based on the above-described computer apparatus.

Referring to FIG. 6, a block diagram of a data classification apparatus 200 according to an embodiment of the invention is shown.

In this embodiment, the data classification apparatus 200 includes a series of computer program instructions stored on the memory 11, which when executed by the processor 12, can implement the data classification function of the embodiment of the present invention. In some embodiments, the data classification apparatus 200 may be divided into one or more modules based on the particular operations implemented by the portions of the computer program instructions. For example, in fig. 6, the data classification apparatus 200 may be divided into an acquisition module 201, a calculation module 202, a comparison module 203, and a classification module 204. Wherein:

The obtaining module 201 is configured to obtain data to be classified.

Specifically, after the computer device is connected to the user terminal, when the user has data to be classified, the data to be classified is sent to the computer device through the user terminal, and then the obtaining module 201 receives the data to be classified. Of course, in other embodiments, the computer device may also provide an interactive interface, then receive a classification request of the user for the data to be classified stored on the computer device through the interactive interface, and then the obtaining module 201 obtains the data to be classified from the storage unit of the computer device itself.

The calculating module 202 is configured to calculate M feature values of the data to be classified according to a feature value calculation rule.

Specifically, after the obtaining module 201 obtains the data to be classified, the calculating module 202 calculates M feature values of the data to be classified according to a preset feature value calculating rule. In one embodiment, the feature value calculation rule includes: calculating M hash values of the data to be classified through M different hash functions, wherein each hash function calculates a corresponding hash value, namely a characteristic value, from the data to be classified. That is, the computer device calculates M feature values of the data to be classified by M different hash functions set in advance, and associates the M feature values with the data to be classified.

Of course, in another embodiment, the feature value calculation rule includes: dividing the data to be classified into M parts, and respectively calculating the hash values of the M parts through M hash functions. For example, when the data to be classified is large-capacity data, the data to be classified may be divided into M parts, and the hash value of each part is then calculated in turn with the corresponding one of M preset hash functions, so as to obtain M corresponding characteristic values. The way the data to be classified is divided can be set differently according to the characteristics of the data to be classified: for example, when classifying video data, the data to be classified can be divided according to video duration; when classifying text data, the data to be classified can be divided according to paragraphs. In short, for different data classifications, the calculating module 202 may calculate M feature values of the data to be classified according to a preset feature value calculating rule.

The comparison module 203 is configured to compare the M characteristic values with each characteristic value data table in a characteristic value database in sequence. Wherein the characteristic value database includes at least a first characteristic value data table which is a set of all characteristic values of the same category data calculated by the characteristic value calculation rule.

The classifying module 204 is configured to classify the data to be classified into first class data corresponding to the first characteristic value data table when all the M characteristic values are included in the first characteristic value data table.

In this embodiment, after the calculating module 202 calculates M feature values of the data to be classified, the comparison module 203 sends the M feature values to the data server, and requests the data server to compare the M feature values with each feature value data table in the feature value database in sequence. Of course, in other embodiments, the comparison module 203 may also obtain the characteristic value database from the data server, and then directly compare the M characteristic values with each characteristic value data table in the characteristic value database in sequence. Each characteristic value data table is obtained by calculating the characteristic values of the same type of data according to the characteristic value calculation rule.

When the comparison module 203 determines that the M feature values are included in the first feature value data table through comparison, the data to be classified is considered to be included in the existing data corresponding to the first feature value data table, and therefore the classification module 204 classifies the data to be classified into the first type data corresponding to the first feature value data table. And finally, returning the classification result to the user terminal.

In an exemplary embodiment, the characteristic value data table stores all characteristic values of the same category data in the form of a bloom filter, and the boolean value at the storage order corresponding to each characteristic value in the bloom filter is 1. When the characteristic value database further includes a second characteristic value data table, the comparison module 203 is further configured to sequentially query whether the boolean values of the storage orders corresponding to the M characteristic values are all 1 in a first bloom filter and a second bloom filter corresponding to the first characteristic value data table and the second characteristic value data table; and when the boolean values of the storage order corresponding to the M feature values in the first bloom filter or the second bloom filter are all 1, it is determined that the M feature values are all included in the first feature value data table or the second feature value data table. The classification module 204 is further configured to, when the M characteristic values are completely included in neither the first characteristic value data table nor the second characteristic value data table, determine that the data to be classified does not belong to the existing category data, and return a warning of classification failure.

Specifically, when each characteristic value data table is implemented as a bloom filter, the characteristic value database comprises a plurality of bloom filters. Therefore, after the calculating module 202 calculates the M characteristic values of the data to be classified, the comparing module 203 compares the M characteristic values with each bloom filter in sequence and determines whether the M characteristic values are included in any bloom filter. In this embodiment, the bloom filter is a storage unit of a specific size that is stored in array form; the storage unit includes storage orders and a boolean value at each storage order, where a storage order is a position in the storage unit and the boolean value is either 1 or 0. Therefore, the comparison module 203 sequentially queries, in the first bloom filter and the second bloom filter corresponding to the first characteristic value data table and the second characteristic value data table, whether the boolean values at the storage orders corresponding to the M characteristic values are all 1. When the boolean values at the storage orders corresponding to the M characteristic values in the first bloom filter are all 1, the comparison module 203 determines that the M characteristic values are all included in the first characteristic value data table; likewise, when those boolean values in the second bloom filter are all 1, the M characteristic values are determined to be all included in the second characteristic value data table. When the comparison module 203 determines that the M characteristic values are completely included in neither the first characteristic value data table nor the second characteristic value data table, the classification module 204 determines that the data to be classified does not belong to the existing class data, and returns a warning of classification failure.
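The storage-order query described above can be illustrated with a minimal bloom filter sketch. The class name, the array size, and the mapping from characteristic values to storage orders (index-prefixed SHA-256 reduced modulo the array size) are assumptions chosen for illustration; the embodiment does not prescribe them.

```python
import hashlib


class BloomFilter:
    """Fixed-size array; each index is a storage order holding 0 or 1."""

    def __init__(self, size: int, m: int):
        self.size = size      # number of storage orders in the array
        self.m = m            # number of characteristic values per item (m < 256)
        self.bits = [0] * size

    def positions(self, data: bytes) -> list:
        # Assumed rule: M index-prefixed SHA-256 digests, each reduced
        # modulo the array size to give a storage order.
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + data).digest(), "big")
            % self.size
            for i in range(self.m)
        ]

    def add(self, data: bytes) -> None:
        # Set the boolean value at each corresponding storage order to 1.
        for pos in self.positions(data):
            self.bits[pos] = 1

    def contains(self, data: bytes) -> bool:
        # Included only if the boolean values at all M storage orders are 1.
        return all(self.bits[pos] == 1 for pos in self.positions(data))
```

As with any bloom filter, a query that finds all M boolean values equal to 1 may occasionally be a false positive, whereas a query that finds any 0 is always a true negative.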

Referring to fig. 4, the comparison module 203 compares the M feature values of the data to be classified with the bloom filter 1 and the bloom filter 2 in sequence, and determines whether the M feature values exist in the bloom filter 1 or the bloom filter 2: in fig. 4 (A), when the comparing module 203 determines that the M feature values are not present in the bloom filter 1 but are present in the bloom filter 2, the classifying module 204 classifies the data to be classified into the second category data; in fig. 4 (B), when the comparing module 203 determines that the M feature values exist in neither the bloom filter 1 nor the bloom filter 2, the classifying module 204 prompts that the classification fails, and the data to be classified does not belong to the existing class data.
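The two cases of fig. 4 can be sketched as a sequential query over an ordered list of bloom filters. Here each filter is modeled as a plain list of 0/1 values and the M feature values are given directly as storage orders; the function name, the labels, and the sample arrays are hypothetical and serve only to illustrate the flow.

```python
from typing import Optional, Sequence, Tuple


def classify(
    positions: Sequence[int],
    filters: Sequence[Tuple[str, Sequence[int]]],
) -> Optional[str]:
    # Query each bloom filter in order; the first filter whose boolean
    # values at all M storage orders are 1 determines the category.
    for label, bits in filters:
        if all(bits[p] == 1 for p in positions):
            return label
    # No filter contains all M values: classification fails.
    return None


# Hypothetical 8-slot filters standing in for bloom filter 1 and bloom filter 2.
filters = [
    ("first category", [1, 0, 1, 0, 0, 0, 0, 0]),
    ("second category", [0, 1, 0, 1, 1, 0, 0, 0]),
]

case_a = classify([1, 3, 4], filters)  # absent from filter 1, present in filter 2
case_b = classify([0, 1, 5], filters)  # present in neither filter
```

In this sketch `case_a` corresponds to fig. 4 (A) and `case_b` to fig. 4 (B), where a `None` result would trigger the warning of classification failure.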

As can be seen from the above, after acquiring the data to be classified, the computer device can calculate M feature values of the data to be classified according to the feature value calculation rule; compare the M feature values with each feature value data table in the feature value database in sequence; and, when the M feature values are all included in the first feature value data table, classify the data to be classified into the first class data corresponding to the first feature value data table. In this way, only the simple feature values of the data to be classified need to be compared with the feature value data tables, rather than the data itself with every item of existing data, so that the data processing amount is greatly reduced, the time is shortened, and the efficiency is improved.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A method of data classification, the method comprising:

acquiring data to be classified;

calculating M characteristic values of the data to be classified according to a characteristic value calculation rule;

comparing the M characteristic values with each characteristic value data table in a characteristic value data base in sequence, wherein the characteristic value data base at least comprises a first characteristic value data table which is a set of all characteristic values of the same category of data calculated according to the characteristic value calculation rule;

when the M characteristic values are included in the first characteristic value data table, classifying the data to be classified into first class data corresponding to the first characteristic value data table;

wherein the feature value calculation rule includes:

calculating M hash values of the data to be classified through M different hash functions; or

dividing the data to be classified into M parts, and respectively calculating the hash values of the M parts through M hash functions.

2. The data classification method according to claim 1, wherein the characteristic value data table stores all characteristic values of the same category data in a bloom filter, and each characteristic value corresponds to a boolean value of 1 in the bloom filter.

3. The data classification method according to claim 2, wherein the characteristic value database further includes at least a second characteristic value data table, and wherein the sequentially comparing the M characteristic values with each of the characteristic value data tables in the characteristic value database includes:

sequentially inquiring whether the Boolean values of the storage orders corresponding to the M characteristic values are all 1 in a first bloom filter and a second bloom filter corresponding to the first characteristic value data table and the second characteristic value data table;

when the boolean values of the storage order corresponding to the M feature values in the first bloom filter or the second bloom filter are all 1, it is determined that the M feature values are all included in the first feature value data table or the second feature value data table.

4. The data classification method of claim 3, characterized in that the method further comprises:

when the M characteristic values are not completely included in the first bloom filter and not completely included in the second bloom filter, judging that the data to be classified do not belong to the existing class data;

and returning a warning of classification failure.

5. An apparatus for classifying data, the apparatus comprising:

the acquisition module is used for acquiring data to be classified;

the calculation module is used for calculating M characteristic values of the data to be classified according to a characteristic value calculation rule;

a comparison module, configured to compare the M feature values with each feature value data table in a feature value data base in sequence, where the feature value data base at least includes a first feature value data table, and the first feature value data table is a set of all feature values of the same category data calculated according to the feature value calculation rule;

a classification module, configured to classify the data to be classified into first class data corresponding to the first characteristic value data table when all the M characteristic values are included in the first characteristic value data table;

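wherein the feature value calculation rule includes:

calculating M hash values of the data to be classified through M different hash functions; or

dividing the data to be classified into M parts, and respectively calculating the hash values of the M parts through M hash functions.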

6. The data classification apparatus according to claim 5, wherein the characteristic value data table stores all characteristic values of the same category data in a bloom filter, each characteristic value has a boolean value of 1 in a corresponding storage order in the bloom filter, the characteristic value database further includes at least a second characteristic value data table, and the comparison module is further configured to:

determine, when the boolean values at the storage orders corresponding to the M feature values in the first bloom filter or the second bloom filter are all 1, that all of the M feature values are included in the first feature value data table or the second feature value data table.

7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory having stored thereon a computer program executable on the processor, the computer program, when executed by the processor, implementing the steps of the data classification method according to any one of claims 1-4.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executable by at least one processor to cause the at least one processor to perform the steps of the data classification method according to any one of claims 1-4.