CN114881117A - Data classification method and related device - Google Patents

Publication number
CN114881117A
Authority
CN
China
Prior art keywords
abnormal data
probability
category
data
categories
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210365185.0A
Other languages
Chinese (zh)
Inventor
朱守博
姜波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Application filed by Spreadtrum Communications Shanghai Co Ltd
Priority to CN202210365185.0A
Publication of CN114881117A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data classification method and a related device, wherein the method comprises the following steps: acquiring abnormal data to be classified, wherein the abnormal data comprises m characteristic attributes, the abnormal data is generated in the communication process of the terminal equipment, and m is a positive integer; performing classification operation on the m characteristic attributes, determining the probability of the m characteristic attributes under each category in n categories to obtain n probabilities, wherein the categories are the categories of abnormal data generated in the communication process of the terminal equipment, one category corresponds to one probability, and n is a positive integer; and determining the category to which the abnormal data belongs based on the n probabilities. By adopting the method provided by the application, the type determination efficiency of the abnormal data can be improved, and the data processing efficiency is further improved.

Description

Data classification method and related device
Technical Field
The present invention relates to the field of communications, and in particular, to a data classification method and related apparatus.
Background
In the current age of information explosion, rapid progress in network communication, digital television, and computer technology has driven equally rapid development of data acquisition, processing, analysis, and presentation technologies. Intelligent electronic devices have become an indispensable part of people's daily life, and modem anomalies inevitably occur in such devices during development or use.
At present, when a modem anomaly is collected, it is generally analyzed manually to determine the category to which it belongs, so that the anomaly can be handled in a targeted manner. However, manual analysis makes classification inefficient and reduces overall data processing efficiency.
Disclosure of Invention
The application provides a data classification method and a related device, which can improve the class determination efficiency of abnormal data and further improve the data processing efficiency.
In a first aspect, the present application provides a data classification method, including: acquiring abnormal data to be classified, wherein the abnormal data comprises m characteristic attributes, the abnormal data is generated in the communication process of the terminal equipment, and m is a positive integer; performing classification operation on the m characteristic attributes, determining the probability of the m characteristic attributes under each of n categories to obtain n probabilities, wherein the categories are the categories of abnormal data generated in the communication process of the terminal equipment, one category corresponds to one probability, and n is a positive integer; and determining the category to which the abnormal data belongs based on the n probabilities.
With reference to the first aspect, in a possible implementation manner, the feature attributes include one or more of the following: file name, code line, error information pointer, task name, stack pointer, register in the current run mode, or register in the abort mode; the categories include one or more of the following: communication block, communication data abnormality, communication downtime, kernel abnormality, timer timeout, communication failure, or message queue full.
With reference to the first aspect, in a possible implementation manner, the performing classification operation on the m feature attributes and determining the probability of the m feature attributes in each of n classes includes: calculating the occurrence probability of each of n categories in the abnormal data; calculating the conditional probability of the m characteristic attributes under each category; based on the probability of occurrence and the conditional probability, a probability of the m feature attributes under each of the n classes is determined.
With reference to the first aspect, in a possible implementation manner, the determining, based on the n probabilities, a category to which the abnormal data belongs includes: and acquiring the category corresponding to the maximum probability in the n probabilities, and determining the category corresponding to the maximum probability as the category to which the abnormal data belongs.
With reference to the first aspect, in a possible implementation manner, the determining, based on the n probabilities, a category to which the abnormal data belongs includes: d probabilities with the probabilities larger than a target threshold are obtained from the n probabilities, a category corresponding to each probability in the d probabilities is determined, and d categories are obtained, wherein d is a positive integer smaller than or equal to n; determining the d categories as the categories to which the abnormal data belong; the method further comprises the following steps: and outputting the d categories and the probability corresponding to each category in the d categories.
With reference to the first aspect, in a possible implementation manner, there are multiple pieces of abnormal data to be classified; the method further includes: if, among the multiple pieces of abnormal data, the number of pieces belonging to a target category is greater than a target quantity, generating prompt information, where the prompt information is used to prompt handling of anomalies of the target category; and displaying the prompt information and/or sending the prompt information to a management terminal.
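The counting rule in this implementation manner can be sketched as follows. This is a minimal illustration only: the function name `prompt_for_category`, its parameters, and the wording of the prompt are hypothetical and are not taken from the application.

```python
from typing import Optional, Sequence

def prompt_for_category(categories: Sequence[str],
                        target_category: str,
                        target_quantity: int) -> Optional[str]:
    """Return prompt text if more than target_quantity items fall in target_category.

    categories holds one determined category per piece of abnormal data.
    All names and the message format are assumptions made for illustration.
    """
    count = sum(1 for c in categories if c == target_category)
    if count > target_quantity:
        return (f"{count} anomalies of category '{target_category}' detected; "
                f"please handle this category.")
    return None  # quantity not exceeded: no prompt is generated

# Example: three of four anomalies are "message queue full", target quantity is 2.
results = ["message queue full", "timer timeout",
           "message queue full", "message queue full"]
print(prompt_for_category(results, "message queue full", 2))
```

The returned text could then be displayed locally or sent to a management terminal, as the implementation manner describes.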
With reference to the first aspect, in a possible implementation manner, the method further includes: acquiring sample data to be classified, wherein the sample data comprises f feature attributes, the sample data is abnormal data generated in the communication process of the terminal equipment, and f is a positive integer; performing a classification operation on the f feature attributes, and determining the probability of the f feature attributes under each of the n categories to obtain n sample probabilities, wherein one category corresponds to one sample probability; determining a sample category to which the sample data belongs based on the n sample probabilities; obtaining a labeled category of the sample data, and training a classifier based on the sample category and the labeled category of the sample data; the performing of the classification operation on the m feature attributes includes: performing the classification operation on the m feature attributes based on the classifier.
In a second aspect, the present application provides a communication device comprising means for implementing the method of the first aspect and any possible implementation thereof.
In a third aspect, the present application provides a communications apparatus comprising a processor and a transceiver; the transceiver is used for receiving or transmitting signals; the processor is configured to perform the method according to the first aspect and any possible implementation manner thereof.
With reference to the third aspect, in a possible implementation manner, the communication apparatus further includes a memory: the memory for storing a computer program; the processor is specifically configured to invoke the computer program from the memory, so that the communication device executes the method according to the first aspect and any possible implementation manner thereof.
In a fourth aspect, the present application provides a chip, where the chip is configured to obtain abnormal data to be classified, where the abnormal data includes m feature attributes, the abnormal data is generated in a communication process of a terminal device, and m is a positive integer; performing classification operation on the m characteristic attributes, determining the probability of the m characteristic attributes under each of n categories to obtain n probabilities, wherein the categories are the categories of abnormal data generated in the communication process of the terminal equipment, one category corresponds to one probability, and n is a positive integer; and determining the category to which the abnormal data belongs based on the n probabilities.
In a fifth aspect, the present application provides a module device, which includes a communication module, a power module, a storage module, and a chip module, wherein: the power module is used for providing electric energy for the module equipment; the storage module is used for storing data and instructions; the communication module is used for carrying out internal communication of the module equipment or is used for carrying out communication between the module equipment and external equipment; this chip module is used for: acquiring abnormal data to be classified, wherein the abnormal data comprises m characteristic attributes, the abnormal data is generated in the communication process of the terminal equipment, and m is a positive integer; performing classification operation on the m characteristic attributes, determining the probability of the m characteristic attributes under each of n categories to obtain n probabilities, wherein the categories are the categories of abnormal data generated in the communication process of the terminal equipment, one category corresponds to one probability, and n is a positive integer; and determining the category to which the abnormal data belongs based on the n probabilities.
In a sixth aspect, the present application provides a computer-readable storage medium having stored thereon computer-readable instructions that, when run on a communication device, cause the communication device to perform the method of the first aspect and any of its possible implementations.
In a seventh aspect, the present application provides a computer program or a computer program product comprising code or instructions which, when run on a computer, cause the computer to perform the method according to the first aspect and any possible implementation thereof.
With this method, abnormal data generated by the terminal equipment in the communication process is obtained, and a classification operation is performed on the multiple feature attributes included in the abnormal data, so that the probability that the abnormal data belongs to each anomaly category can be determined and the category to which the abnormal data belongs can be determined accordingly. Because the probability under each category is calculated, the accuracy of determining the category of the abnormal data can be improved. Further, once the category to which the abnormal data belongs is determined, the terminal device can subsequently be handled according to that category, for example by anomaly repair, so that the terminal device is handled in a targeted manner and data processing efficiency is further improved. In addition, the abnormal data does not need to be classified manually, so data classification efficiency can be improved.
Drawings
FIG. 1 is a schematic diagram of a network architecture provided in an embodiment of the present application;
FIG. 2 is a flow chart of a data classification method provided by an embodiment of the present application;
FIG. 3 is a flow chart of a method for training a classifier according to an embodiment of the present application;
FIG. 4 is a flow chart of another data classification method provided by the embodiments of the present application;
FIG. 5 is a schematic structural diagram of a communication device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another communication device provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a chip provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a module apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the following embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in the specification of the present application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the listed items.
It should be noted that the term "comprises/comprising" and any variations thereof in this application is intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In some embodiments, simple statistical methods can be used, such as classifying modem anomalies according to the observed behavior of the terminal device. However, such methods require a large sample size, make statistics difficult, and yield little feature information that helps in analyzing modem anomalies. Conventional classification algorithms such as the artificial neural network (ANN) algorithm also have obvious shortcomings. First, the learning time on the samples is too long, and the learning goal may not even be achieved; for example, when the model overfits or underfits the sample data, the results determined by the classification algorithm deviate considerably. Second, the output may be difficult to interpret; for example, when the sample data contains outliers that cannot be eliminated, the final output of the classification algorithm is affected. Further, an artificial neural network needs a large number of parameters, such as the network topology and the initial values of the weights and thresholds, but the volume of modem abnormal data currently available is insufficient to determine so many parameters, so the classification algorithm performs poorly.
In view of this, the technical solution of the present application provides a data classification method that performs a classification operation on the feature attributes included in the abnormal data to be classified and determines the probability of the feature attributes under each category, so as to determine the category to which the abnormal data belongs. Because probability calculation can be performed on the feature attributes as soon as the abnormal data is obtained, labor cost is reduced and data classification efficiency is improved. In addition, calculating the probability under each category improves the accuracy of data classification, and the simplicity of the probability calculation improves data processing efficiency.
FIG. 1 is a schematic diagram of the architecture of a network system to which the data classification method of the embodiments of the present application can be applied. As shown in FIG. 1, the network system may include a terminal device 10 and a network device 20. The terminal device 10 and the network device 20 are communicatively connected. For example, the terminal device 10 may establish a communication connection with the network device 20 through a Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), fifth generation mobile communication technology (5G), Wireless Local Area Network (WLAN), or Wireless Fidelity (Wi-Fi) network.
The terminal device 10 may be a portable terminal device, such as a smart phone, a tablet computer, a wearable terminal device (e.g., a smart watch) with a wireless communication function, and the like; the terminal device 10 may also be a non-portable terminal device such as a vehicle-mounted computer, a desktop computer, or the like.
Here, the network device 20 may refer to a device that provides a wireless communication function for the terminal device 10. The network device 20 may be a next generation Node B (gNB) in 5G, an evolved Node B (eNB), a Node B (NB), a radio network controller (RNC), a base station controller (BSC), a base transceiver station (BTS), a baseband unit (BBU), a transmission reception point (TRP), a transmission point (TP), a mobile switching center (MSC), or the like, which is not limited herein.
The network architecture and the service scenario described in the embodiments of the present application are intended to illustrate the technical solution of the embodiments more clearly and do not limit the technical solution provided in the embodiments. A person of ordinary skill in the art knows that, as the network architecture evolves and new service scenarios appear, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
Next, a data classification method according to an embodiment of the present application is described in detail. Referring to FIG. 2, FIG. 2 is a flowchart of a data classification method according to an embodiment of the present application, which can be applied to the network architecture shown in FIG. 1. As shown in FIG. 2, the data classification method includes, but is not limited to, the following steps:
S101, the terminal device acquires abnormal data to be classified.
The embodiments of the present application are suitable for scenarios in which modem anomalies of a terminal device are classified to determine the category of the abnormal data, so that targeted anomaly repair can be carried out on the terminal device. In the embodiments of the present application, when a modem anomaly of the terminal device is detected, the abnormal data can be acquired and then classified. The abnormal data is generated in the communication process of the terminal device and includes m feature attributes, where m is a positive integer.
In a possible situation, the terminal device may collect its own modem abnormal data, classify its own abnormal data, and determine the category to which its own abnormal data belongs.
In another possible case, the terminal device may classify the modem abnormal data of other terminal devices and determine the category to which that abnormal data belongs. In this case, the other terminal devices collect their own modem abnormal data and send it to the terminal device, and the terminal device classifies the received abnormal data.
Illustratively, the terminal device that performs classification is terminal device 1, and the other terminal devices may include terminal devices 2 to 6. When terminal devices 2 to 6 acquire their own modem abnormal data, they may send the abnormal data to terminal device 1, and terminal device 1 classifies the abnormal data of terminal devices 2 to 6. Optionally, any one of terminal devices 2 to 6 may also classify the abnormal data of other terminal devices, which is not limited in this embodiment of the present application.
The modem is mainly used for converting digital signals into analog signals transmitted on a telephone line, thereby realizing communication between two computers. Optionally, a caching method for modem-related abnormal data may be established in advance. For example, which abnormal data needs to be cached may be set in advance, and when a modem anomaly is detected, it is determined whether the anomaly belongs to the abnormal data that needs to be cached. If the anomaly of the terminal device belongs to the abnormal data that needs to be cached, the abnormal data can be classified; if it does not, no subsequent processing is needed, which saves data processing resources. For example, the terminal device may establish a cache for abnormal data that includes a target feature. The target feature may be a register, in which case the abnormal data is cached when the modem anomaly generated by the terminal device is detected to be related to a register.
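The caching rule described above can be illustrated with a short sketch. The names `TARGET_FEATURES`, `anomaly_cache`, `maybe_cache`, and the `related_features` key are hypothetical; the application does not specify how an anomaly is matched against the preset target features.

```python
# Hypothetical configuration: anomalies related to these features are cached.
TARGET_FEATURES = {"register"}

# In-memory stand-in for the cache of abnormal data awaiting classification.
anomaly_cache = []

def maybe_cache(anomaly: dict) -> bool:
    """Cache the anomaly only if it relates to a preset target feature.

    The 'related_features' key is an assumed representation of which
    features the detected anomaly involves.
    """
    related = set(anomaly.get("related_features", ()))
    if related & TARGET_FEATURES:
        anomaly_cache.append(anomaly)
        return True
    return False  # not configured for caching: no further processing

# A register-related anomaly is cached; a task-only anomaly is skipped.
maybe_cache({"id": 1, "related_features": ["register", "stack pointer"]})
maybe_cache({"id": 2, "related_features": ["task name"]})
print([a["id"] for a in anomaly_cache])
```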
Optionally, the exception data may include m feature attributes, i.e., the exception data may refer to a set of m feature attributes, which may include, but are not limited to, a file name, a code line, an error information pointer, a task name, a stack pointer, a register in a current run mode, a register in an abort mode, and so on. When abnormal data generated by the terminal device in the communication process is detected, one or more characteristic attributes included in the abnormal data can be determined, so that the abnormal data can be classified based on the characteristic attributes included in the abnormal data.
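One way to picture abnormal data as a set of m feature attributes is a small record type. The sketch below is illustrative only: the class name `AnomalyRecord`, the field names and types, and the sample values are assumptions, since the application lists the attributes but not their representation.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass(frozen=True)
class AnomalyRecord:
    """One piece of modem abnormal data; each field is one feature attribute.

    Field names and types are assumed for illustration.
    """
    file_name: Optional[str] = None
    code_line: Optional[int] = None
    error_info_pointer: Optional[int] = None
    task_name: Optional[str] = None
    stack_pointer: Optional[int] = None
    current_mode_registers: Optional[tuple] = None
    abort_mode_registers: Optional[tuple] = None

    def feature_attributes(self) -> dict:
        """Return only the attributes that were captured, i.e. the m attributes."""
        return {f.name: getattr(self, f.name)
                for f in fields(self)
                if getattr(self, f.name) is not None}

# Hypothetical sample values; here m == 3.
record = AnomalyRecord(file_name="example.c", code_line=412, task_name="EXAMPLE_TASK")
m = len(record.feature_attributes())
print(m, record.feature_attributes())
```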
S102, the terminal device carries out classification operation on the m characteristic attributes, determines the probability of the m characteristic attributes under each of the n categories, and obtains n probabilities.
The category may refer to a category of abnormal data generated during communication of the terminal device, and the category may include, but is not limited to, communication block (modem block), communication data abnormality (data abort), communication downtime (cpcrash or modecrash), kernel abnormality (kernel crash), timer timeout (watchdog timeout), communication failure (modem not alive), message queue full (queue full), and the like.
In the embodiment of the application, since the abnormal data includes m feature attributes, the category of the abnormal data can be determined by combining one or more feature attributes of the m feature attributes, and one category corresponds to one probability. By performing classification operation on the m characteristic attributes, the probability of the m characteristic attributes under each of the n categories can be determined, so as to obtain n probabilities, wherein n is a positive integer.
In one possible implementation, a classifier may be used to perform a classification operation on the m feature attributes, and determine the probability of the m feature attributes under each of the n classes. For example, a classifier may be used to calculate the probability of occurrence for each of n classes in the anomaly data; calculating the conditional probability of m characteristic attributes under each category; based on the probability of occurrence and the conditional probability, the probabilities of the m feature attributes under each of the n classes are determined.
Optionally, the classifier may be a naive Bayes classifier. When abnormal data to be classified is acquired, the abnormal data may be quantized and normalized, and the classifier is used to perform probability calculation on the processed abnormal data. For example, the occurrence probability P(y_k) of each category in the abnormal data and the conditional probability of each feature attribute under each category may be calculated, and the probability of the m feature attributes under each of the n categories is then calculated from the occurrence probability and the conditional probability based on Bayes' theorem. Since P(x) is the same for all categories, only the numerator needs to be maximized in the embodiments of the present application; that is, the maximum probability can be determined from the probabilities of the m feature attributes under each of the n categories, and the category to which the abnormal data to be classified belongs is determined accordingly.
For example, the formula for determining the probability of the m feature attributes under each of the n categories based on the occurrence probability and the conditional probability may be as shown in formula (1-1):
$$P(y_k \mid x) = \frac{P(y_k)\,P(x \mid y_k)}{P(x)} = \frac{P(y_k)\prod_{i=1}^{m} P(a_i \mid y_k)}{P(x)} \tag{1-1}$$
where P(y_k | x) is the probability that the abnormal data belongs to category k, i.e., the probability of the m feature attributes under category k; P(y_k) is the occurrence probability of category k in the abnormal data; P(x | y_k) is the conditional probability of the feature attributes under category k; x is the abnormal data; a_1, …, a_m are the feature attributes included in the abnormal data; and y_k is the k-th category.
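The calculation can be sketched in a few lines of Python. This is a minimal illustration of a count-based naive Bayes classifier, not the implementation of the application: the function names `train` and `class_scores`, the dictionary representation of feature attributes, the sample values, and the use of Laplace (add-one) smoothing are all assumptions. The sketch returns only the numerator P(y_k) multiplied by the product of the per-attribute conditional probabilities, since P(x) is the same for every category.

```python
from collections import Counter, defaultdict

def train(samples):
    """Build count tables from labeled samples: a list of (feature_dict, category)."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (category, attribute) -> value counts
    vocab = defaultdict(set)             # attribute -> values seen in training
    for features, category in samples:
        class_counts[category] += 1
        for attr, value in features.items():
            value_counts[(category, attr)][value] += 1
            vocab[attr].add(value)
    return class_counts, value_counts, vocab

def class_scores(features, model):
    """Return P(y_k) * prod_i P(a_i | y_k) for every category y_k."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for category, count in class_counts.items():
        score = count / total  # occurrence probability P(y_k)
        for attr, value in features.items():
            seen = value_counts[(category, attr)][value]
            # Add-one smoothing (an assumption, not from the source); the extra
            # "+ 1" in the denominator reserves mass for values never seen.
            score *= (seen + 1) / (count + len(vocab[attr]) + 1)
        scores[category] = score
    return scores

# Hypothetical labeled samples and one anomaly to classify.
samples = [
    ({"task_name": "TASK_A", "file_name": "a.c"}, "timer timeout"),
    ({"task_name": "TASK_A", "file_name": "a.c"}, "timer timeout"),
    ({"task_name": "TASK_B", "file_name": "b.c"}, "message queue full"),
]
model = train(samples)
scores = class_scores({"task_name": "TASK_A", "file_name": "a.c"}, model)
print(scores)
```

Dividing each score by the sum of all scores would give normalized posterior probabilities, but this is not needed when only the ranking of categories matters.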
In a specific implementation, the classifier can be trained in advance and the trained classifier stored before it is used to classify abnormal data; when abnormal data is subsequently obtained, the classification operation is performed on it based on the stored classifier. For the specific process of training the classifier, refer to the method in the embodiment corresponding to FIG. 3, which is not described in detail here.
S103, the terminal device determines the category to which the abnormal data belongs based on the n probabilities.
In the embodiment of the application, the terminal device calculates the probability of m feature attributes under each of n categories, that is, the probability of the abnormal data belonging to each of the n categories, so that the category to which the abnormal data belongs can be determined based on the n probabilities.
In one possible implementation manner, the manner of determining the category to which the abnormal data belongs based on the n probabilities may be: and acquiring the maximum probability from the n probabilities, determining the category corresponding to the maximum probability, and determining the category corresponding to the maximum probability as the category to which the abnormal data to be classified belongs.
For example, n is equal to 3 and the three categories are communication block, communication downtime, and timer timeout, with corresponding probabilities of 0.7, 0.6, and 0.65. The maximum probability is 0.7 and the category corresponding to it is communication block, so communication block is determined as the category to which the abnormal data belongs. Optionally, suppose instead that the three categories are communication data abnormality, communication downtime, and timer timeout, with corresponding probabilities of 0.7, 0.7, and 0.65. Since the probabilities of communication data abnormality and communication downtime are equal, either of the two may be used as the category to which the abnormal data belongs; alternatively, both may be determined as categories to which the abnormal data belongs.
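The selection of the maximum probability, including the case of equal probabilities, can be sketched as follows. The function name `most_probable` and the tolerance parameter are hypothetical; returning all tied categories corresponds to the second option described above, and taking the first element of the result corresponds to picking any one of them.

```python
def most_probable(probabilities: dict, tol: float = 1e-12) -> list:
    """Return every category whose probability equals the maximum.

    A single-element list is the usual case; several elements indicate a tie.
    """
    best = max(probabilities.values())
    return [c for c, p in probabilities.items() if abs(p - best) <= tol]

# Values from the two examples in the text.
print(most_probable({"communication block": 0.7,
                     "communication downtime": 0.6,
                     "timer timeout": 0.65}))
print(most_probable({"communication data abnormality": 0.7,
                     "communication downtime": 0.7,
                     "timer timeout": 0.65}))
```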
In another possible implementation manner, the manner of determining the category to which the abnormal data belongs based on the n probabilities may be: obtaining d probabilities with the probabilities larger than a target threshold value from the n probabilities, determining a category corresponding to each probability in the d probabilities to obtain d categories, wherein d is a positive integer smaller than or equal to n; and determining the d categories as categories to which the abnormal data belong. That is, if there are a plurality of probabilities each of which is greater than the target threshold, the categories corresponding to the plurality of probabilities may be determined as the categories to which the abnormal data to be classified belongs.
Illustratively, n is equal to 3 and the three categories are communication blocking, communication downtime and timer timeout. The probability corresponding to communication blocking is 0.7, the probability corresponding to communication downtime is 0.75, the probability corresponding to timer timeout is 0.65, and the target threshold is 0.6. Since all three probabilities are greater than the target threshold, d is equal to 3, and communication blocking, communication downtime and timer timeout are all determined as categories to which the abnormal data belongs.
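The threshold-based selection can be sketched in the same way. The function name and the choice to sort the result by descending probability are assumptions made for illustration only.

```python
def categories_above_threshold(probabilities, target_threshold):
    """Return (category, probability) pairs whose probability exceeds the threshold.

    The comparison is strict ("greater than"), as in the description.
    Pairs are sorted from most to least probable (an illustrative choice).
    """
    selected = [(name, p) for name, p in probabilities.items() if p > target_threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


probs = {
    "communication blocking": 0.7,
    "communication downtime": 0.75,
    "timer timeout": 0.65,
}
selected = categories_above_threshold(probs, 0.6)
```

With the values of the example, all three categories are returned, so d equals 3.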
Further, the d categories and the probability corresponding to each of the d categories can also be output. Exemplary ways in which the terminal device outputs the d categories include, but are not limited to, text output and voice output. The terminal device may output the categories and the corresponding probabilities through its own output device (e.g., a display), or the terminal device may transmit the d categories and the probability corresponding to each category to other terminal devices for output, where the other terminal devices include but are not limited to a management terminal. By outputting the d probabilities that are greater than the target threshold together with the corresponding d categories, a person can subsequently review the d categories, which further improves the accuracy of category determination.
For example, when the probabilities corresponding to the 3 categories of communication blocking, communication data abnormality and communication downtime are all greater than the target threshold, the 3 categories and the probability of each category may be output, and relevant management personnel may subsequently check the terminal device for these 3 categories of abnormality, so as to repair the abnormality of the terminal device in a targeted manner.
In some embodiments, if the number of the abnormal data to be classified is one, the abnormal data may be classified by the above method, and the category to which the abnormal data belongs is determined, so that the terminal device is subjected to targeted abnormal repairing processing, the category determination efficiency of the abnormal data is improved, and the data processing efficiency is further improved.
In some embodiments, if there are multiple pieces of abnormal data to be classified, then once the category to which each piece of abnormal data belongs has been determined, the abnormal data of different categories may be processed in a targeted manner. For example, if the number of pieces of abnormal data, among the multiple pieces of abnormal data, whose category is the target category is greater than the target number, prompt information is generated, the prompt information being used for prompting to process the abnormality of the target category; the prompt information is displayed and/or sent to the management terminal. The target category may refer to any of the above-mentioned categories.
For example, if the number of pieces of abnormal data to be classified is 100 and the target number is 60, and 85 of the 100 pieces of abnormal data belong to the communication blocking category, then since 85 is greater than 60, prompt information is generated, and the prompt information is used for prompting to process the communication blocking abnormality of the terminal device. The prompt information may include, but is not limited to, text information and voice information. By displaying the prompt information, relevant management personnel can process the communication blocking abnormality of the terminal device in a targeted manner, which improves data processing efficiency. Alternatively, the prompt information can be sent to the management terminal, which may be a terminal used by a manager; the manager can process the communication blocking abnormality and strengthen the management and control of communication blocking abnormalities of subsequent terminal devices, thereby reducing the probability that this category of modem abnormality occurs in the terminal device.
Optionally, if the number of pieces of abnormal data, among the multiple pieces of abnormal data, whose category is a first category is less than or equal to the target number, the abnormal data of the first category may be processed according to the actual situation; for example, such an abnormality may be processed for subsequent terminal devices, or the abnormality may be determined to be an accidental event and left unprocessed. The first category may refer to one of the n categories.
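The counting rule for multiple pieces of abnormal data can be sketched as follows. The function name `build_prompts` and the wording of the prompt string are hypothetical; only the comparison against the target number comes from the description.

```python
from collections import Counter


def build_prompts(categories, target_number):
    """Generate one prompt per category that occurs more than `target_number` times.

    `categories` holds the category determined for each piece of abnormal data.
    Categories at or below the target number produce no prompt here; how they are
    handled is left to the caller.
    """
    counts = Counter(categories)
    return [
        f"Handle '{category}' abnormality: {count} occurrences"
        for category, count in counts.items()
        if count > target_number
    ]


# 100 classified pieces of abnormal data, 85 of them communication blocking.
classified = ["communication blocking"] * 85 + ["timer timeout"] * 15
prompts = build_prompts(classified, target_number=60)
```

The resulting list could then be displayed locally and/or sent to a management terminal.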
In some embodiments, the abnormal data and the category to which the abnormal data belongs may be transmitted to the management terminal, and/or the abnormal data and the category to which the abnormal data belongs may be displayed. That is to say, after the abnormal data is acquired and classified, the abnormal data and the category to which the abnormal data belongs can be sent to the management terminal, so that the management terminal processes the abnormal data, the accuracy of data processing is improved, and the abnormal occurrence probability of the terminal device is reduced.
In the embodiment of the application, the probability that the abnormal data belongs to each abnormal category can be determined by acquiring the abnormal data generated by the terminal device in the communication process and performing classification operation on a plurality of characteristic attributes included in the abnormal data, so that the category to which the abnormal data belongs is determined. Because the probability under each category is calculated, the accuracy of determining the abnormal data category can be improved. Further, since the category to which the abnormal data belongs is determined, the terminal device can be subsequently processed for the category to which the abnormal data belongs, for example, the abnormal repair processing, so that the terminal device can be processed in a targeted manner, and the data processing efficiency is further improved. In addition, the abnormal data does not need to be classified manually, so that the data classification efficiency can be improved.
Referring to fig. 3, fig. 3 is a flowchart of a method for training a classifier according to an embodiment of the present application, where the method for training a classifier can be applied to the network architecture shown in fig. 1, and the method for training a classifier in fig. 3 can be executed by a terminal device, as shown in fig. 3, and the method for training a classifier includes, but is not limited to, the following steps:
S201, obtaining sample data to be classified.
The method comprises the steps that sample data are abnormal data generated in the communication process of terminal equipment, the sample data comprise f characteristic attributes, and f is a positive integer.
In the embodiment of the application, the terminal equipment can acquire a large amount of sample data to be classified, the classifier is trained by using the sample data, and the trained classifier can be stored. Because a large amount of sample data is used for training the classifier, parameters in the classifier are continuously adjusted in the training process, and the classification effect of the classifier can be improved. When the abnormal data of the terminal equipment is acquired subsequently, the stored classifier can be directly used for automatically classifying the abnormal data, so that the labor cost can be reduced, and the abnormal data classification efficiency can be improved.
S202, performing classification operation on the f characteristic attributes, determining the probability of the f characteristic attributes under each of n categories, and obtaining n sample probabilities.
In the embodiment of the application, since the sample data includes f characteristic attributes, the class of the sample data can be determined by combining one or more characteristic attributes of the f characteristic attributes to obtain n sample probabilities, and one class corresponds to one sample probability. By performing classification operation on the f characteristic attributes, the probability of the f characteristic attributes under each of the n categories can be determined, and n sample probabilities are obtained.
In a possible implementation manner, a classifier may be used to perform the classification operation on the f feature attributes, determine the probability of the f feature attributes under each of the n classes, and obtain n sample probabilities. For example, the classifier may be used to calculate the sample occurrence probability of each of the n classes in the sample data; calculate the sample conditional probability of the f feature attributes under each class; and, based on the sample occurrence probability and the sample conditional probability, determine the probability of the f feature attributes under each of the n classes.
Alternatively, the classifier may be a naive Bayes classifier, the principle of which is as follows: let $x = \{a_1, a_2, \ldots, a_m\}$ be an item to be classified, where each $a_i$ is a feature attribute of $x$; let the class set be $C = \{y_1, y_2, \ldots, y_n\}$; calculate the conditional probability of each class $y_k$ given the feature attributes $x$, i.e. $P(y_k \mid x)$; if $P(y_k \mid x) = \max\{P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)\}$, then $x \in y_k$.
In a specific implementation, when sample data to be classified is acquired, the sample data may be quantized and normalized, and a classifier is used to perform probability calculation on the processed sample data. For example, the sample occurrence probability $P(y_k)$ of each class in the sample data and the sample conditional probability $P(a_i \mid y_k)$ of each feature attribute under each class may be calculated. According to Bayes' theorem:

$$P(y_k \mid x) = \frac{P(x \mid y_k)\,P(y_k)}{P(x)} \tag{1-1}$$

after conversion, the following results are obtained:

$$P(y_k \mid x) = \frac{P(y_k)\,\prod_{i=1}^{m} P(a_i \mid y_k)}{P(x)} \tag{1-2}$$

In formula (1-2), $P(y_k \mid x)$ is the probability that the sample data belongs to class $y_k$, $P(y_k)$ is the sample occurrence probability of each class in the sample data, $P(x \mid y_k) = \prod_{i=1}^{m} P(a_i \mid y_k)$ is the sample conditional probability of the feature attributes under each class, $x$ is the sample data, $a_1, \ldots, a_m$ are the feature attributes included in the sample data, and $y_k$ is a class.
Through the above formula, n sample probabilities, that is, the probabilities of the f feature attributes under each of the n classes, can be calculated. Since $P(x)$ takes the same value for all classes, in the embodiment of the present application only the numerator needs to be maximized in order to determine the maximum probability. Alternatively, based on the maximum a posteriori probability (MAP) decision criterion, the classifier can be defined as shown in formula (1-3):

$$\hat{y} = \arg\max_{y_k \in C} \; P(y_k)\,\prod_{i=1}^{m} P(a_i \mid y_k) \tag{1-3}$$

Illustratively, the parameters of the classifier may be estimated using Maximum Likelihood Estimation (MLE) to obtain the model parameters of the classifier, which may include, but are not limited to, the prior probability $P(y_k)$ and the conditional probability $P(x \mid y_k)$. Since the model parameters of the classifier are determined, when the classifier is subsequently used for class prediction on abnormal data to be classified, the class to which the abnormal data belongs can be predicted based on the classifier.
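The remark that P(x) is common to all classes, so that maximizing the numerator is sufficient, can be checked numerically. The sketch below uses made-up prior and conditional probabilities (they do not come from the application) and works in log space, which is a common implementation choice rather than something the description requires.

```python
import math


def log_numerators(priors, conditionals, attributes):
    """Return log(P(y_k) * prod_i P(a_i | y_k)) for every class y_k.

    conditionals[category][i][value] is P(a_i = value | category).
    All probabilities used here must be non-zero, otherwise math.log fails.
    """
    scores = {}
    for category, prior in priors.items():
        score = math.log(prior)
        for index, value in enumerate(attributes):
            score += math.log(conditionals[category][index][value])
        scores[category] = score
    return scores


def normalize(log_scores):
    """Divide every numerator by their sum, which plays the role of P(x)."""
    peak = max(log_scores.values())
    unnormalized = {c: math.exp(s - peak) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical model parameters for two classes and two feature attributes.
priors = {"communication blocking": 0.5, "timer timeout": 0.5}
conditionals = {
    "communication blocking": [{"task_a": 0.8, "task_b": 0.2}, {"sp_low": 0.6, "sp_high": 0.4}],
    "timer timeout": [{"task_a": 0.3, "task_b": 0.7}, {"sp_low": 0.5, "sp_high": 0.5}],
}
x = ("task_a", "sp_low")

scores = log_numerators(priors, conditionals, x)
posteriors = normalize(scores)
best_by_numerator = max(scores, key=scores.get)
best_by_posterior = max(posteriors, key=posteriors.get)
```

Both selections name the same class, because dividing every numerator by the same positive constant does not change their order.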
In some embodiments, the terminal device may preset a correspondence between one or more characteristic attributes and the categories, and when m characteristic attributes included in the abnormal data are determined, a weight of each of the m characteristic attributes may be acquired, and a probability of the characteristic attribute under each category is calculated based on the weight. For example, when the exception data is obtained, the exception data includes 3 feature attributes, and the 3 feature attributes are a task name, a stack pointer, and a register in the current operation mode, respectively, weights of the 3 feature attributes may be obtained, for example, the weight of the task name is g1, the weight of the stack pointer is g2, and the weight of the register in the current operation mode is g3, and then the probability under each category may be calculated by combining the weight of each feature attribute. If m is equal to 1, that is, the abnormal data includes a feature attribute, the weight corresponding to the feature attribute is 1. Since in some cases, the influence degree of some feature attributes on the class determination is large, that is, the probability that some feature attributes indicate that the abnormal data belongs to some class is large, the weight of each feature attribute can be determined according to the history, and when the feature attributes included in the abnormal data are determined, the probability that the abnormal data belong to each class can be calculated by combining the weights of the feature attributes, so that the class determination accuracy is improved.
For example, the probability of a feature attribute under each category may be calculated based on equations (1-4) in combination with the weight of the feature attribute:
[Formula (1-4): image BDA0003586817050000094, not reproduced in the text]
where $g_m$ is the weight of the feature attribute.
S203, determining the sample class to which the sample data belongs based on the n sample probabilities.
In a possible implementation manner, the manner of determining the sample class to which the sample data belongs based on the n sample probabilities may be: and acquiring the maximum sample probability from the n sample probabilities, and determining the sample class corresponding to the maximum sample probability, so that the sample class corresponding to the maximum sample probability is determined as the sample class to which the sample data to be classified belongs.
In another possible implementation manner, the manner of determining the sample class to which the sample data belongs based on the n sample probabilities may be: d sample probabilities with sample probabilities larger than a target threshold are obtained from the n sample probabilities, and a sample category corresponding to each sample probability in the d sample probabilities is determined to obtain d sample categories; and determining the d sample classes as the sample classes to which the sample data belong. That is, if there are a plurality of sample probabilities that are all greater than the target threshold, the sample classes corresponding to the plurality of sample probabilities may all be determined as the sample class to which the sample data to be classified belongs. Further, the d sample classes and the sample probability corresponding to each sample class in the d sample classes can also be output. By outputting the d sample probabilities with the sample probabilities larger than the target threshold and the sample categories corresponding to the d sample probabilities, subsequent manual work can judge the d sample categories, and further accuracy of category determination is improved.
S204, acquiring the label class of the sample data, and training to obtain a classifier based on the sample class to which the sample data belongs and the label class.
The label class of the sample data refers to the actual class of the sample data. When the classifier is trained, the label class of the sample data can be determined in advance, which is equivalent to knowing the actual label of the sample data. By processing the sample data with the classifier, the result output by the classifier, that is, the sample class to which the sample data belongs, can be obtained; the sample class to which the sample data belongs is one or more of the n classes. The goal of training the classifier is to make the sample class to which the sample data belongs as consistent as possible with the label class of the sample data. If, among the multiple pieces of sample data, the number of pieces whose sample class is consistent with the label class is greater than or equal to a preset number, the classifier at that moment can be stored for subsequent use. If that number is smaller than the preset number, training of the classifier can continue and the model parameters in the classifier are adjusted, so that after the sample data is processed by the classifier, the sample class output by the classifier is as consistent as possible with the label class.
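The stopping condition described above can be sketched as a simple count. The function name and the list-based inputs are assumptions for illustration; for simplicity each sample is assumed to have a single predicted class.

```python
def classifier_is_acceptable(predicted_classes, label_classes, preset_number):
    """Return True when enough predictions agree with the label classes.

    predicted_classes[i] is the sample class output by the classifier for sample i,
    label_classes[i] is the actual (label) class of the same sample.
    """
    if len(predicted_classes) != len(label_classes):
        raise ValueError("each sample needs exactly one prediction and one label")
    consistent = sum(
        1 for predicted, label in zip(predicted_classes, label_classes) if predicted == label
    )
    return consistent >= preset_number


predicted = ["communication blocking", "timer timeout", "communication downtime", "timer timeout"]
labels = ["communication blocking", "timer timeout", "communication blocking", "timer timeout"]
store_classifier = classifier_is_acceptable(predicted, labels, preset_number=3)
```

When the function returns False, training would continue and the model parameters would be adjusted before checking again.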
In the embodiment of the application, naive Bayes is a method for constructing a classifier. The classifier model represents the class label of a problem instance by means of feature attributes, the class labels being taken from a finite set. The naive Bayes classifier assumes that each feature attribute of a sample is unrelated to the other feature attributes, and treats the feature attributes as independent when determining the probability distribution of the sample class, even if the feature attributes are in fact interdependent or some feature attributes are determined by others. The abnormal data of the terminal device provides a sufficient sample volume, is abundant and normally distributed, and the extractable feature attributes are mutually independent. Therefore, a scheme for classifying modem abnormalities based on a naive Bayes classifier is reasonable.
Further, since a classifier constructed with naive Bayes works by calculating the conditional probability between the feature attributes and the abnormal categories, the calculation is simpler and the sample learning time is shorter than when abnormal data is classified with an artificial neural network classification algorithm, so the training efficiency of the classifier can be improved. Moreover, the classifier has a simple structure, does not require multiple network structures, has simple algorithm logic, is easy to implement, has few parameters to be estimated, is insensitive to missing data, and has a low misclassification rate, stable performance and good robustness.
In the embodiment of the application, the classifier is trained by acquiring a large amount of sample data, so that the classifier can classify the input sample data, the parameters in the classifier can be continuously adjusted, and the classification accuracy of the classifier is improved.
Referring to fig. 4, fig. 4 is a flowchart of another data classification method provided in an embodiment of the present application, where the data classification method may be applied to the network architecture shown in fig. 1, and the data classification method in fig. 4 may be executed by a terminal device, as shown in fig. 4, where the data classification method includes, but is not limited to, the following steps:
S301, determining the characteristic attribute included in the initial sample data.
S302, determining a training sample based on the characteristic attribute.
S303, calculating the occurrence probability of each category in the training sample.
S304, calculating the conditional probability of each characteristic attribute under each category.
S305, constructing a classifier based on the occurrence probability and the conditional probability.
S306, obtaining abnormal data, classifying the abnormal data based on the classifier, and determining the category of the abnormal data.
The initial sample data refers to any acquired modem abnormal data of the terminal device, and the training sample can reflect the corresponding relation among the initial sample data, the characteristic attribute and the category.
In some embodiments, the data classification method may include three phases, a preparation phase, a training phase, and an application phase. The preparation stage is a stage of preparing sample data, and corresponds to the above steps S301 to S302. Optionally, the task of the preparation stage is to make necessary preparation for naive bayes classification, and the main work is to determine feature attributes according to specific situations, and appropriately divide each feature attribute to realize classification of initial sample data and form a training sample. The purpose of this stage is to determine the correspondence between the initial sample data, the feature attributes and the categories, and to determine the training samples. The stage is a preparation stage of training the classifier, and the quality of the subsequent classifier can be determined by the feature attributes, the feature attribute classification and the quality of the training samples in the stage. For example, the embodiment of steps S301 to S302 may also refer to the description in fig. 3 for step S201.
Further, the training stage, i.e., the stage of training the classifier, corresponds to the above-described steps S303 to S305. The task of the training phase is to generate a classifier, and the main work is to calculate the occurrence probability of each class in a training sample and the conditional probability of each feature attribute under each class, and record the calculated result, where the recorded data may include model parameters, such as prior probability, in the classifier. For example, the embodiments of step S303 to step S305 may also refer to the descriptions in step S202 to step S204 in fig. 3.
Further, the application stage is a stage of classifying the abnormal data by using a classifier, and corresponds to the step S306. The task of the application stage is to classify the abnormal data to be classified by using a classifier, the data input into the classifier is the abnormal data to be classified, and the output result is the category to which the abnormal data belongs. For example, the implementation of step S306 may also refer to the descriptions in step S101 to step S103 in fig. 2.
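The three stages can be sketched end to end for categorical feature attributes. This is a minimal sketch under stated assumptions: the sample values (`modem_tx`, `sp_low`, and so on) are invented, probabilities are plain maximum-likelihood frequency estimates, and no smoothing is applied, so an attribute value never seen with a class yields a probability of zero for that class.

```python
from collections import Counter, defaultdict


def train(samples):
    """Training stage: estimate P(y_k) and P(a_i | y_k) by counting.

    `samples` is a list of (attributes, category) pairs from the preparation stage.
    """
    class_counts = Counter(category for _, category in samples)
    value_counts = defaultdict(Counter)  # (category, attribute index) -> value counts
    for attributes, category in samples:
        for index, value in enumerate(attributes):
            value_counts[(category, index)][value] += 1
    priors = {c: n / len(samples) for c, n in class_counts.items()}
    conditionals = {
        key: {value: n / class_counts[key[0]] for value, n in counter.items()}
        for key, counter in value_counts.items()
    }
    return priors, conditionals


def classify(model, attributes):
    """Application stage: return the most probable category and all numerators."""
    priors, conditionals = model
    scores = {}
    for category, prior in priors.items():
        score = prior
        for index, value in enumerate(attributes):
            score *= conditionals.get((category, index), {}).get(value, 0.0)
        scores[category] = score
    return max(scores, key=scores.get), scores


# Preparation stage: hypothetical labelled samples (task name, stack pointer range).
training_samples = [
    (("modem_tx", "sp_high"), "communication blocking"),
    (("modem_tx", "sp_low"), "communication blocking"),
    (("timer_task", "sp_low"), "timer timeout"),
    (("timer_task", "sp_low"), "timer timeout"),
]
model = train(training_samples)
category, scores = classify(model, ("modem_tx", "sp_low"))
```

The recorded `priors` and `conditionals` play the role of the stored model parameters, so the application stage needs no access to the training samples.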
In the embodiment of the application, the probability that the abnormal data belongs to each abnormal category can be determined by acquiring the abnormal data generated by the terminal device in the communication process and performing classification operation on a plurality of characteristic attributes included in the abnormal data, so that the category to which the abnormal data belongs is determined. Due to the fact that the probability under each category is calculated, the accuracy of determining the abnormal data categories can be improved. Further, since the category to which the abnormal data belongs is determined, the terminal device can be subsequently processed for the category to which the abnormal data belongs, for example, the abnormal repair processing, so that the terminal device can be processed in a targeted manner, and the data processing efficiency is further improved. In addition, the abnormal data does not need to be classified manually, so that the data classification efficiency can be improved. Furthermore, a large amount of sample data is obtained to train the classifier, so that the classifier can classify the input sample data, the parameters in the classifier can be continuously adjusted, and the classification accuracy of the classifier is improved.
It is understood that, in order to implement the functions of the above embodiments, the terminal device includes a corresponding hardware structure and/or software module for executing each function. Those of skill in the art will readily appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software driven hardware depends on the particular application scenario and design constraints imposed on the solution.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a communication device according to an embodiment of the present application. The communication apparatus may be a terminal device or an apparatus disposed on a terminal device, and the communication apparatus 50 shown in fig. 5 may include an obtaining unit 501, an operation unit 502, and a determining unit 503. Wherein:
An obtaining unit 501, configured to obtain abnormal data to be classified, where the abnormal data includes m feature attributes, the abnormal data is generated in a communication process of a terminal device, and m is a positive integer.
An operation unit 502, configured to perform a classification operation on the m feature attributes, determine probabilities of the m feature attributes under each of n categories, to obtain n probabilities, where the category is a category of abnormal data generated in a communication process of a terminal device, and one category corresponds to one probability, and n is a positive integer.
A determining unit 503, configured to determine a category to which the abnormal data belongs based on the n probabilities.
In one possible implementation, the feature attributes include one or more of: file name, code line, error information pointer, task name, stack pointer, register in current operation mode, or register in suspension mode; the categories include one or more of: blocked communications, abnormal communication data, down communications, abnormal kernel, timeout timer, failed communications, or full message queue.
In a possible implementation manner, the operation unit 502 is specifically configured to: calculating the occurrence probability of each of n categories in the abnormal data; calculating the conditional probability of the m characteristic attributes under each category; based on the probability of occurrence and the conditional probability, a probability of the m feature attributes under each of the n classes is determined.
In a possible implementation manner, the determining unit 503 is specifically configured to: and acquiring the category corresponding to the maximum probability in the n probabilities, and determining the category corresponding to the maximum probability as the category to which the abnormal data belongs.
In a possible implementation manner, the determining unit 503 is specifically configured to: d probabilities with the probabilities larger than a target threshold value are obtained from the n probabilities, a category corresponding to each probability in the d probabilities is determined, and d categories are obtained, wherein d is a positive integer smaller than or equal to n; determining the d categories as the categories to which the abnormal data belong; and outputting the d categories and the probability corresponding to each category in the d categories.
In a possible implementation manner, the number of the abnormal data to be classified is multiple; the determining unit 503 is further configured to: if the abnormal data comprise abnormal data with the quantity larger than the target quantity and the abnormal data are all in the target category, generating prompt information, wherein the prompt information is used for prompting to process the abnormality of the target category; and displaying the prompt message and/or sending the prompt message to a management terminal.
In one possible implementation, the determining unit 503 is further configured to: acquiring sample data to be classified, wherein the sample data comprises f characteristic attributes, the sample data is abnormal data generated in the communication process of the terminal equipment, and f is a positive integer; carrying out classification operation on the f characteristic attributes, determining the probability of the f characteristic attributes under each category in the n categories to obtain n sample probabilities, wherein one category corresponds to one sample probability; determining a sample class to which the sample data belongs based on the n sample probabilities; obtaining a mark type of the sample data, and training to obtain a classifier based on the sample type of the sample data and the mark type; the operation unit 502 is specifically configured to: and performing classification operation on the m characteristic attributes based on the classifier.
The communication device may be, for example: a chip, or a chip module. Each module or each unit included in each apparatus and product described in the above embodiments may be a software module, or may also be a hardware module, or may also be a part of a software module and a part of a hardware module. For example, for each device or product applied to or integrated in a chip, each module included in the device or product may be implemented by hardware such as a circuit, or at least a part of the modules may be implemented by a software program running on a processor integrated in the chip, and the rest (if any) part of the modules may be implemented by hardware such as a circuit; for each device and product applied to or integrated with the chip module, each module included in the device and product may be implemented in a hardware manner such as a circuit, and different modules may be located in the same component (e.g., a chip, a circuit module, etc.) or different components of the chip module, or at least a part of the modules may be implemented in a software program running on a processor integrated within the chip module, and the rest (if any) part of the modules may be implemented in a hardware manner such as a circuit; for each device and product applied to or integrated in the terminal device, each module included in the device and product may be implemented by hardware such as a circuit, and different modules may be located in the same component (e.g., a chip, a circuit module, etc.) or different components in the terminal device, or at least some of the modules may be implemented by a software program running on a processor integrated in the terminal device, and the rest (if any) of the modules may be implemented by hardware such as a circuit.
Fig. 6 shows another communication apparatus 60 provided in the embodiment of the present application, which is used for implementing the functions of the terminal device. The apparatus 60 may be a terminal device or an apparatus disposed on the terminal device, and the apparatus disposed on the terminal device may be a chip system or a chip in the terminal device. The chip system may be composed of a chip, or may include a chip and other discrete devices.
The communication device 60 includes at least one processor 602, which is configured to implement the data classification function of the terminal device in the method provided by the embodiments of the present application. The communication device 60 may further include a communication interface 601, configured to implement the transceiving operations of the terminal device in the method provided by the embodiments of the present application. In the embodiments of the present application, the communication interface may be a transceiver, a circuit, a bus, a module, or another type of communication interface for communicating with other devices over a transmission medium. For example, the communication interface 601 enables the communication device 60 to communicate with other devices. The processor 602 sends and receives data through the communication interface 601 and implements the methods described in the above method embodiments.
The communication device 60 may also include at least one memory 603 for storing program instructions and/or data. The memory 603 is coupled to the processor 602. The coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units, or modules; it may be electrical, mechanical, or in another form, and is used for information exchange between the devices, units, or modules. The processor 602 may cooperate with the memory 603. The processor 602 may execute program instructions stored in the memory 603. At least one of the at least one memory may be included in the processor.
When the communication device 60 is powered on, the processor 602 can read the software program in the memory 603, interpret and execute the instructions of the software program, and process the data of the software program. When data needs to be sent wirelessly, the processor 602 performs baseband processing on the data to be sent and outputs a baseband signal to a radio frequency circuit (not shown); the radio frequency circuit performs radio frequency processing on the baseband signal and sends the resulting radio frequency signal outward through an antenna in the form of electromagnetic waves. When data is transmitted to the communication device 60, the radio frequency circuit receives a radio frequency signal through the antenna, converts it into a baseband signal, and outputs the baseband signal to the processor 602; the processor 602 converts the baseband signal into data and processes the data.
In another implementation, the radio frequency circuit and the antenna may be provided independently of the processor 602 that performs baseband processing; for example, in a distributed scenario, the radio frequency circuit and the antenna may be arranged remotely from the communication device.
The embodiment of the present application does not limit the specific connection medium among the communication interface 601, the processor 602, and the memory 603. In the embodiment of the present application, the memory 603, the processor 602, and the communication interface 601 are connected by the bus 604 in fig. 6, the bus is represented by a thick line in fig. 6, and the connection manner between other components is merely illustrative and not limited. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
When the communication device 60 is a device disposed on a terminal device, for example, when the communication device 60 is a chip or a chip system, the output or the reception of the communication interface 601 may be a baseband signal. When the communication device 60 is a terminal device, the communication interface 601 may output or receive a radio frequency signal. In the embodiments of the present application, the processor may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, operations, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The operations of the methods disclosed in connection with the embodiments of the present application may be directly performed by a hardware processor, or may be performed by a combination of hardware and software modules in a processor.
It should be noted that the communication apparatus may perform the relevant steps of the terminal device or the access network device in the foregoing method embodiments; for details, refer to the implementations described for those steps, which are not repeated here.
For each device or product applied to or integrated in the communication device, each module included in the device or product may be implemented by hardware such as a circuit, different modules may be located in the same component (e.g., a chip, a circuit module, etc.) or different components in the terminal, or at least a part of the modules may be implemented by a software program running on a processor integrated in the terminal, and the rest (if any) of the modules may be implemented by hardware such as a circuit.
For the case in which the communication device is a chip or a chip system, see the schematic diagram of the chip shown in fig. 7. The chip 70 comprises a processor 701 and a communication interface 702. There may be one or more processors 701, and there may be a plurality of communication interfaces 702.
The processor 701 is configured to perform the following operations:
acquiring abnormal data to be classified, wherein the abnormal data comprises m characteristic attributes, the abnormal data is generated in the communication process of the terminal equipment, and m is a positive integer;
performing classification operation on the m characteristic attributes, determining the probability of the m characteristic attributes under each of n categories to obtain n probabilities, wherein the categories are the categories of abnormal data generated in the communication process of the terminal equipment, one category corresponds to one probability, and n is a positive integer;
and determining the category to which the abnormal data belongs based on the n probabilities.
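The three operations above can be read as a small pipeline. The following is a minimal sketch, not the implementation of the present application: the function names, the toy scoring function, and the feature strings are all hypothetical, and the way each probability is computed and the way the final decision is made are deliberately left as pluggable callables.

```python
from typing import Callable, Dict, Sequence, Tuple


def classify_abnormal_data(
    features: Sequence[str],
    categories: Sequence[str],
    probability_of: Callable[[Sequence[str], str], float],
    decide: Callable[[Dict[str, float]], str],
) -> Tuple[str, Dict[str, float]]:
    """Score m feature attributes under each of n categories, then decide."""
    if not features or not categories:
        raise ValueError("m and n must both be positive")
    # One category corresponds to one probability, giving n probabilities.
    probabilities = {c: probability_of(features, c) for c in categories}
    return decide(probabilities), probabilities


# Hypothetical scoring function, used only to make the sketch runnable.
def toy_probability(features: Sequence[str], category: str) -> float:
    if category == "timeout timers" and "timer_task" in features:
        return 0.9
    return 0.1


category, probs = classify_abnormal_data(
    ["timer.c", "timer_task"],                   # m = 2 feature attributes
    ["timeout timers", "full message queues"],   # n = 2 categories
    toy_probability,
    lambda p: max(p, key=lambda c: p[c]),        # one possible decision rule
)
print(category, probs)
```

Keeping the scoring and the decision rule as parameters mirrors the way the later implementations vary each step independently.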
In one possible implementation, the feature attributes include one or more of: file name, code line, error information pointer, task name, stack pointer, register in current operation mode, or register in suspension mode; the categories include one or more of: blocked communications, abnormal communication data, down communications, abnormal cores, timeout timers, failed communications, or full message queues.
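As an illustration of how the feature attributes and categories listed above might be represented in software, the sketch below uses an enumeration and a record type. All identifiers, field types, and sample values are assumptions made for illustration; the present application does not prescribe a data layout.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Any, Dict, Optional, Tuple


class AnomalyCategory(Enum):
    """The seven categories named in the text (member names are hypothetical)."""
    BLOCKED_COMMUNICATION = "blocked communications"
    ABNORMAL_COMMUNICATION_DATA = "abnormal communication data"
    DOWN_COMMUNICATION = "down communications"
    ABNORMAL_CORE = "abnormal cores"
    TIMEOUT_TIMER = "timeout timers"
    FAILED_COMMUNICATION = "failed communications"
    FULL_MESSAGE_QUEUE = "full message queues"


@dataclass(frozen=True)
class AbnormalRecord:
    """One item of abnormal data; every field is optional ("one or more of")."""
    file_name: Optional[str] = None
    code_line: Optional[int] = None
    error_info_pointer: Optional[int] = None
    task_name: Optional[str] = None
    stack_pointer: Optional[int] = None
    current_mode_registers: Optional[Tuple[int, ...]] = None
    suspended_mode_registers: Optional[Tuple[int, ...]] = None

    def feature_attributes(self) -> Dict[str, Any]:
        """Return only the attributes that are present in this record."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


record = AbnormalRecord(file_name="timer.c", code_line=128, task_name="timer_task")
m = len(record.feature_attributes())  # number of feature attributes present
n = len(AnomalyCategory)              # number of categories
print(m, n)
```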
In one possible implementation, the processor 701 is configured to specifically perform the following operations: calculating the occurrence probability of each of n categories in the abnormal data; calculating the conditional probability of the m characteristic attributes under each category; based on the probability of occurrence and the conditional probability, a probability of the m feature attributes under each of the n classes is determined.
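One common way to combine an occurrence probability with conditional probabilities is a naive-Bayes-style product. The sketch below assumes that reading; the independence assumption between attributes, the Laplace smoothing parameter alpha, and the final normalization are additions made here for illustration and are not stated in the text. The history data and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

History = Sequence[Tuple[Sequence[str], str]]  # (feature attributes, category)


def category_probabilities(
    history: History,
    features: Sequence[str],
    categories: Sequence[str],
    alpha: float = 1.0,
) -> Dict[str, float]:
    if not history:
        raise ValueError("historical abnormal data is required")
    class_counts = Counter(label for _, label in history)
    value_counts = defaultdict(lambda: defaultdict(Counter))  # [category][position][value]
    vocab = defaultdict(set)                                  # values seen per position
    for attrs, label in history:
        for i, value in enumerate(attrs):
            value_counts[label][i][value] += 1
            vocab[i].add(value)

    scores = {}
    for c in categories:
        prior = class_counts[c] / len(history)       # occurrence probability
        likelihood = 1.0
        for i, value in enumerate(features):         # conditional probabilities
            numerator = value_counts[c][i][value] + alpha
            denominator = class_counts[c] + alpha * len(vocab[i] | {value})
            likelihood *= numerator / denominator
        scores[c] = prior * likelihood

    total = sum(scores.values())
    # Normalizing so the n values sum to 1 is an assumption of this sketch.
    return {c: s / total for c, s in scores.items()} if total > 0 else scores


history = [
    (("timer.c", "timer_task"), "timeout timers"),
    (("timer.c", "timer_task"), "timeout timers"),
    (("queue.c", "msg_task"), "full message queues"),
]
probs = category_probabilities(
    history, ("timer.c", "timer_task"), ["timeout timers", "full message queues"]
)
print(probs)
```

With this toy history the prior for "timeout timers" is 2/3 and each smoothed conditional probability is 3/4, so that category dominates after normalization.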
In one possible implementation, the processor 701 is configured to specifically perform the following operations: and acquiring the category corresponding to the maximum probability in the n probabilities, and determining the category corresponding to the maximum probability as the category to which the abnormal data belongs.
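Selecting the category with the maximum probability can be sketched in a few lines. The function name and the example values are hypothetical; the tie-breaking behavior (first category in insertion order wins) is a property of this sketch, not something the text specifies.

```python
from typing import Dict


def most_probable_category(probabilities: Dict[str, float]) -> str:
    """Return the category whose probability is the largest of the n values."""
    if not probabilities:
        raise ValueError("at least one category probability is required")
    # max() keeps the first maximal key, so ties resolve by insertion order.
    return max(probabilities, key=lambda category: probabilities[category])


best = most_probable_category(
    {"blocked communications": 0.2, "timeout timers": 0.7, "abnormal cores": 0.1}
)
print(best)
```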
In one possible implementation, the processor 701 is configured to specifically perform the following operations: obtaining, from the n probabilities, d probabilities that are greater than a target threshold, and determining the category corresponding to each of the d probabilities to obtain d categories, wherein d is a positive integer less than or equal to n; determining the d categories as the categories to which the abnormal data belongs; and outputting the d categories and the probability corresponding to each of the d categories.
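The threshold-based variant can be sketched as a filter over the n probabilities. The function name and example values are hypothetical, and sorting the output by descending probability is a presentation choice made here. Note that the text takes d to be a positive integer; this sketch simply returns an empty list when no probability exceeds the threshold and leaves the handling of that case to the caller.

```python
from typing import Dict, List, Tuple


def categories_above_threshold(
    probabilities: Dict[str, float], target_threshold: float
) -> List[Tuple[str, float]]:
    """Return the d (category, probability) pairs strictly above the threshold."""
    selected = [
        (category, p) for category, p in probabilities.items() if p > target_threshold
    ]
    # Highest probability first, so the most likely category is listed first.
    selected.sort(key=lambda item: item[1], reverse=True)
    return selected


result = categories_above_threshold(
    {"failed communications": 0.3, "timeout timers": 0.6, "abnormal cores": 0.1},
    target_threshold=0.25,
)
print(result)
```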
In a possible implementation, there are a plurality of abnormal data to be classified; the processor 701 is configured to perform the following operations: if, among the plurality of abnormal data, the number of abnormal data belonging to a target category is greater than a target number, generating prompt information, wherein the prompt information is used for prompting handling of the abnormality of the target category; and displaying the prompt information and/or sending the prompt information to a management terminal.
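The prompt-generation condition can be sketched as a count over the categories assigned to a batch of abnormal data. The function name, the message wording, and the example batch are hypothetical; displaying the prompt or sending it to a management terminal is outside the scope of this sketch.

```python
from collections import Counter
from typing import Iterable, Optional


def prompt_for_target_category(
    assigned_categories: Iterable[str], target_category: str, target_number: int
) -> Optional[str]:
    """Return prompt text if more than target_number items fall in target_category."""
    count = Counter(assigned_categories)[target_category]
    if count > target_number:
        return (
            f"{count} abnormal data items were classified as '{target_category}' "
            f"(target number: {target_number}); please handle this abnormality."
        )
    return None  # condition not met, no prompt information is generated


batch = ["timeout timers", "full message queues", "timeout timers", "timeout timers"]
prompt = prompt_for_target_category(batch, "timeout timers", target_number=2)
quiet = prompt_for_target_category(batch, "full message queues", target_number=2)
print(prompt)
```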
In one possible implementation, the processor 701 is configured to perform the following operations: acquiring sample data to be classified, wherein the sample data comprises f characteristic attributes, the sample data is abnormal data generated in the communication process of the terminal equipment, and f is a positive integer; carrying out classification operation on the f characteristic attributes, determining the probability of the f characteristic attributes under each category in the n categories to obtain n sample probabilities, wherein one category corresponds to one sample probability; determining a sample class to which the sample data belongs based on the n sample probabilities; obtaining a mark type of the sample data, and training to obtain a classifier based on the sample type of the sample data and the mark type; and performing classification operation on the m characteristic attributes based on the classifier.
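The text does not spell out how the sample class and the mark class are used during training. One possible reading, sketched below, is a mistake-driven scheme: each sample is classified, and the classifier's counts are updated only when the resulting sample class differs from the mark class. The class name, the update rule, the crude smoothing, and the sample data are all assumptions of this sketch.

```python
from collections import Counter, defaultdict
from typing import Optional, Sequence, Tuple

Sample = Tuple[Sequence[str], str]  # (f feature attributes, mark class)


class CountingClassifier:
    """Hypothetical count-based classifier with simple additive smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts = Counter()
        self.value_counts = defaultdict(Counter)  # category -> (position, value) counts

    def update(self, features: Sequence[str], mark_class: str) -> None:
        self.class_counts[mark_class] += 1
        for item in enumerate(features):
            self.value_counts[mark_class][item] += 1

    def predict(self, features: Sequence[str]) -> Optional[str]:
        total = sum(self.class_counts.values())
        if total == 0:
            return None  # nothing learned yet

        def score(category: str) -> float:
            s = self.class_counts[category] / total
            for item in enumerate(features):
                s *= (self.value_counts[category][item] + self.alpha) / (
                    self.class_counts[category] + 2 * self.alpha
                )
            return s

        return max(self.class_counts, key=score)


def train(samples: Sequence[Sample], epochs: int = 5) -> CountingClassifier:
    classifier = CountingClassifier()
    for _ in range(epochs):
        mistakes = 0
        for features, mark_class in samples:
            sample_class = classifier.predict(features)
            if sample_class != mark_class:  # compare sample class with mark class
                classifier.update(features, mark_class)
                mistakes += 1
        if mistakes == 0:
            break  # every sample class already agrees with its mark class
    return classifier


samples = [
    (("timer.c", "timer_task"), "timeout timers"),
    (("queue.c", "msg_task"), "full message queues"),
    (("timer.c", "timer_task"), "timeout timers"),
]
classifier = train(samples)
print(classifier.predict(("timer.c", "timer_task")))
```

Once trained, the same classifier object would be applied to the m feature attributes of new abnormal data.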
For each device or product applied to or integrated in the chip, each module included in the device or product may be implemented by hardware such as a circuit, or at least a part of the modules may be implemented by a software program running on the processor 701 integrated in the chip, and the rest (if any) part of the modules may be implemented by hardware such as a circuit.
Fig. 8 is a schematic structural diagram of a module device according to an embodiment of the present application. The module device 80 can perform the steps related to the terminal device in the foregoing method embodiments, and the module device 80 includes: a communication module 801, a power module 802, a storage module 803, and a chip module 804.
The power module 802 is used for providing power to the module device; the storage module 803 is used for storing data and instructions; the communication module 801 is used for communication within the module device or for communication between the module device and an external device.
The chip module 804 is used for:
acquiring abnormal data to be classified, wherein the abnormal data comprises m characteristic attributes, the abnormal data is generated in the communication process of the terminal equipment, and m is a positive integer; performing classification operation on the m characteristic attributes, determining the probability of the m characteristic attributes under each of n categories to obtain n probabilities, wherein the categories are the categories of abnormal data generated in the communication process of the terminal equipment, one category corresponds to one probability, and n is a positive integer; and determining the category to which the abnormal data belongs based on the n probabilities.
In one possible implementation, the feature attributes include one or more of: file name, code line, error information pointer, task name, stack pointer, register in current operation mode, or register in suspension mode; the categories include one or more of: blocked communications, abnormal communication data, down communications, abnormal cores, timeout timers, failed communications, or full message queues.
In a possible implementation manner, the chip module 804 is specifically configured to: calculating the occurrence probability of each of n categories in the abnormal data; calculating the conditional probability of the m characteristic attributes under each category; based on the probability of occurrence and the conditional probability, a probability of the m feature attributes under each of the n classes is determined.
In a possible implementation manner, the chip module 804 is specifically configured to: and acquiring the category corresponding to the maximum probability in the n probabilities, and determining the category corresponding to the maximum probability as the category to which the abnormal data belongs.
In a possible implementation, the chip module 804 is specifically configured to: obtain, from the n probabilities, d probabilities that are greater than a target threshold, and determine the category corresponding to each of the d probabilities to obtain d categories, wherein d is a positive integer less than or equal to n; determine the d categories as the categories to which the abnormal data belongs; and output the d categories and the probability corresponding to each of the d categories.
In a possible implementation, there are a plurality of abnormal data to be classified; the chip module 804 is specifically configured to: if, among the plurality of abnormal data, the number of abnormal data belonging to a target category is greater than a target number, generate prompt information, wherein the prompt information is used for prompting handling of the abnormality of the target category; and display the prompt information and/or send the prompt information to a management terminal.
In a possible implementation manner, the chip module 804 is specifically configured to: acquiring sample data to be classified, wherein the sample data comprises f characteristic attributes, the sample data is abnormal data generated in the communication process of the terminal equipment, and f is a positive integer; carrying out classification operation on the f characteristic attributes, determining the probability of the f characteristic attributes under each category in the n categories to obtain n sample probabilities, wherein one category corresponds to one sample probability; determining a sample class to which the sample data belongs based on the n sample probabilities; obtaining a mark type of the sample data, and training to obtain a classifier based on the sample type of the sample data and the mark type; and performing classification operation on the m characteristic attributes based on the classifier.
For each device and product applied to or integrated in the chip module, each module included in the device and product may be implemented by using hardware such as a circuit, and different modules may be located in the same component (e.g., a chip, a circuit module, etc.) or different components of the chip module, or at least some of the modules may be implemented by using a software program running on a processor integrated in the chip module, and the rest (if any) of the modules may be implemented by using hardware such as a circuit.
Embodiments of the present application further provide a computer-readable storage medium in which instructions are stored; when the instructions are executed on a processor, the method flows of the above method embodiments are implemented.
It is noted that, for simplicity of explanation, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present application is not limited by the order of acts, as some acts may, in accordance with the present application, occur in other orders and/or concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
The descriptions of the embodiments provided in the present application may be referred to each other, and each description has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. For convenience and brevity of description, the functions and operations performed by the devices and apparatuses provided in the embodiments of the present application may refer to the related descriptions of the method embodiments of the present application, and the method embodiments and the device embodiments may also be referred to, combined with, or incorporated into one another.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (13)

1. A method of data classification, comprising:
acquiring abnormal data to be classified, wherein the abnormal data comprises m characteristic attributes, the abnormal data is generated in the communication process of the terminal equipment, and m is a positive integer;
performing classification operation on the m characteristic attributes, determining the probability of the m characteristic attributes under each category of n categories to obtain n probabilities, wherein the categories are the categories of abnormal data generated in the communication process of the terminal equipment, one category corresponds to one probability, and n is a positive integer;
determining a category to which the anomalous data belongs based on the n probabilities.
2. The method of claim 1, wherein the feature attributes comprise one or more of: file name, code line, error information pointer, task name, stack pointer, register in current operation mode, or register in suspension mode;
the categories include one or more of: blocked communications, abnormal communication data, down communications, abnormal cores, timeout timers, failed communications, or full message queues.
3. The method of claim 1 or 2, wherein said performing a classification operation on said m feature attributes to determine a probability of said m feature attributes in each of n classes comprises:
calculating the occurrence probability of each of n categories in the abnormal data;
calculating conditional probabilities of the m feature attributes under each of the categories;
determining a probability of the m feature attributes under each of the n classes based on the probability of occurrence and the conditional probability.
4. The method according to any one of claims 1-3, wherein said determining a category to which the anomalous data belongs based on the n probabilities comprises:
and acquiring the category corresponding to the maximum probability in the n probabilities, and determining the category corresponding to the maximum probability as the category to which the abnormal data belongs.
5. The method according to any one of claims 1-3, wherein said determining a category to which the anomalous data belongs based on the n probabilities comprises:
obtaining d probabilities with the probabilities larger than a target threshold value from the n probabilities, determining a category corresponding to each probability in the d probabilities to obtain d categories, wherein d is a positive integer smaller than or equal to n;
determining the d categories as categories to which the abnormal data belong;
the method further comprises the following steps:
and outputting the d categories and the probability corresponding to each category in the d categories.
6. The method according to any one of claims 1 to 5, wherein the number of the abnormal data to be classified is plural;
the method further comprises the following steps:
if, among the plurality of abnormal data, the number of abnormal data belonging to a target category is greater than a target number, generating prompt information, wherein the prompt information is used for prompting handling of the abnormality of the target category;
and displaying the prompt information and/or sending the prompt information to a management terminal.
7. The method of claim 1, further comprising:
acquiring sample data to be classified, wherein the sample data comprises f characteristic attributes, the sample data is abnormal data generated in the communication process of the terminal equipment, and f is a positive integer;
performing classification operation on the f characteristic attributes, determining the probability of the f characteristic attributes under each of the n classes to obtain n sample probabilities, wherein one class corresponds to one sample probability;
determining a sample class to which the sample data belongs based on the n sample probabilities;
obtaining a mark class of the sample data, and training to obtain a classifier based on the sample class to which the sample data belongs and the mark class;
the classifying operation of the m feature attributes includes:
and performing classification operation on the m characteristic attributes based on the classifier.
8. A communication device comprising means for implementing the method of any of claims 1-7.
9. A communication device comprising a processor and a transceiver;
the transceiver is used for receiving or transmitting signals;
the processor is used for executing the method of any one of claims 1 to 7.
10. The communications apparatus of claim 9, the communications apparatus further comprising a memory:
the memory for storing a computer program;
the processor, in particular for invoking the computer program from the memory, to cause the communication device to perform the method of any of claims 1-7.
11. A chip, characterized in that,
the chip is used for acquiring abnormal data to be classified, the abnormal data comprises m characteristic attributes, the abnormal data is generated in the communication process of the terminal equipment, and m is a positive integer;
the chip is further used for carrying out classification operation on the m characteristic attributes, determining the probability of the m characteristic attributes in each of n categories, and obtaining n probabilities; the categories are categories of abnormal data generated in the communication process of the terminal equipment, one category corresponds to one probability, and n is a positive integer;
the chip is further used for determining the category to which the abnormal data belongs based on the n probabilities.
12. A module device, characterized in that the module device comprises a communication module, a power module, a storage module and a chip module, wherein:
the power supply module is used for providing electric energy for the module equipment;
the storage module is used for storing data and instructions;
the communication module is used for carrying out internal communication of module equipment or is used for carrying out communication between the module equipment and external equipment;
the chip module is used for:
acquiring abnormal data to be classified, wherein the abnormal data comprises m characteristic attributes, the abnormal data is generated in the communication process of the terminal equipment, and m is a positive integer;
performing classification operation on the m characteristic attributes, and determining the probability of the m characteristic attributes under each of n categories to obtain n probabilities; the categories are categories of abnormal data generated in the communication process of the terminal equipment, one category corresponds to one probability, and n is a positive integer;
determining a category to which the anomalous data belongs based on the n probabilities.
13. A computer readable storage medium having computer readable instructions stored thereon which, when run on a communication device, cause the communication device to perform the method of any of claims 1-7.
CN202210365185.0A 2022-04-08 2022-04-08 Data classification method and related device Pending CN114881117A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210365185.0A CN114881117A (en) 2022-04-08 2022-04-08 Data classification method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210365185.0A CN114881117A (en) 2022-04-08 2022-04-08 Data classification method and related device

Publications (1)

Publication Number Publication Date
CN114881117A true CN114881117A (en) 2022-08-09

Family

ID=82670234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210365185.0A Pending CN114881117A (en) 2022-04-08 2022-04-08 Data classification method and related device

Country Status (1)

Country Link
CN (1) CN114881117A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination