CN115378880A - Traffic classification method and device, computer equipment and storage medium - Google Patents

Publication number
CN115378880A
Authority
CN
China
Prior art keywords
classification
traffic
flow
classified
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210982546.6A
Other languages
Chinese (zh)
Other versions
CN115378880B (en)
Inventor
谈敏
陈宇麟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210982546.6A priority Critical patent/CN115378880B/en
Publication of CN115378880A publication Critical patent/CN115378880A/en
Application granted granted Critical
Publication of CN115378880B publication Critical patent/CN115378880B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of computers, and in particular to a traffic classification method, a traffic classification device, a computer device and a storage medium. The traffic classification method comprises the following steps: acquiring traffic to be classified and traffic classification scene information; inputting the traffic classification scene information into a classification scene keyword extraction model to obtain classification keywords serving as the classification basis, wherein the classification keywords comprise a classification attribute and a classification attribute value; inputting the classification attribute and the traffic into an attribute matching model to obtain a traffic attribute value matched with the classification attribute, wherein the traffic attribute value is extracted from the traffic; acquiring the similarity between the classification attribute value and the traffic attribute value; and classifying the traffic according to the similarity to obtain classified traffic groups. The embodiments of the application aim to improve the accuracy of traffic classification.

Description

Traffic classification method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a traffic classification method and apparatus, a computer device, and a storage medium.
Background
Information technology and living standards continue to improve, and internet financial services such as online shopping and online wealth management have developed rapidly. As these services grow, network traffic data is increasing explosively.
To manage this explosive growth in traffic data, network resources need to be optimized. As a key technology for managing and optimizing network resources, network traffic classification is widely applied in fields such as network security and quality-of-service management. Conventional traffic classification techniques classify traffic by fixed identifiers, for example UDP or TCP port numbers or MAC addresses.
However, such methods classify only by port number and cannot accommodate different classification requirements, so the accuracy of traffic classification is low.
Disclosure of Invention
The embodiments of the present application provide a traffic classification method, a traffic classification device, a computer device, and a storage medium, with the aim of improving the accuracy of traffic classification.
In one aspect, the present application provides a traffic classification method, including:
acquiring traffic to be classified and traffic classification scene information;
inputting the traffic classification scene information into a classification scene keyword extraction model to obtain classification keywords serving as the classification basis, wherein the classification keywords include a classification attribute and a classification attribute value;
inputting the classification attribute and the traffic into an attribute matching model to obtain a traffic attribute value matched with the classification attribute, wherein the traffic attribute value is extracted from the traffic;
acquiring the similarity between the classification attribute value and the traffic attribute value;
and classifying the traffic according to the similarity to obtain classified traffic groups.
In another aspect, the present application provides a traffic classification device, including:
an acquisition module, configured to acquire traffic to be classified and traffic classification scene information;
an extraction module, configured to input the traffic classification scene information into a classification scene keyword extraction model to obtain classification keywords serving as the classification basis, wherein the classification keywords include a classification attribute and a classification attribute value;
a matching module, configured to input the classification attribute and the traffic into an attribute matching model to obtain a traffic attribute value matched with the classification attribute, wherein the traffic attribute value is extracted from the traffic;
a classification module, configured to acquire the similarity between the classification attribute value and the traffic attribute value, and to classify the traffic according to the similarity to obtain classified traffic groups.
In another aspect, the present application further provides a computer device, including:
one or more processors;
a memory; and
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the processor to implement the traffic classification method of the first aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored, the computer program being loaded by a processor to perform the steps of the traffic classification method of the first aspect.
In the present application, the classification keywords serving as the classification basis are determined from the traffic classification scene information, and classification is performed using those keywords, so that traffic can be classified for different service scenarios and the accuracy of traffic classification in different scenarios is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of a scenario of the traffic classification system provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of an embodiment of the traffic classification method provided in an embodiment of the present application;
Fig. 3 is a schematic flowchart of an embodiment of obtaining a traffic recording file in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an embodiment of the traffic classification device provided in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an embodiment of the computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the present application, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings. They are used merely for convenience and simplicity of description, and do not indicate or imply that the referenced device or element must have a particular orientation or be constructed and operated in a particular orientation, and thus should not be considered as limiting the present application. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In this application, the word "exemplary" is used to mean "serving as an example, instance, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for the purpose of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes are not set forth in detail in order to avoid obscuring the description of the present application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
It should be noted that, since the method in the embodiment of the present application is executed in a computer device, and processing objects of each computer device all exist in the form of data or information, for example, time, which is substantially time information, it can be understood that, in the subsequent embodiments, if size, number, position, and the like are mentioned, corresponding data exist so as to be processed by the electronic device, and details are not described herein.
Embodiments of the present application provide a traffic classification method, apparatus, computer device, and storage medium, which are described in detail below.
Referring to Fig. 1, Fig. 1 is a schematic diagram of a scenario of the traffic classification system according to an embodiment of the present application. The traffic classification system may include a computer device 100, and a traffic classification device is integrated in the computer device 100, as shown for the computer device in Fig. 1.
In the embodiment of the present application, the computer device 100 is mainly used for acquiring traffic to be classified and traffic classification scene information;
inputting the traffic classification scene information into a classification scene keyword extraction model to obtain classification keywords serving as the classification basis, wherein the classification keywords include a classification attribute and a classification attribute value;
inputting the classification attribute and the traffic into an attribute matching model to obtain a traffic attribute value matched with the classification attribute, wherein the traffic attribute value is extracted from the traffic;
acquiring the similarity between the classification attribute value and the traffic attribute value;
and classifying the traffic according to the similarity to obtain classified traffic groups.
In this embodiment, the computer device 100 may be an independent server, or may be a server network or a server cluster composed of servers, for example, the computer device 100 described in this embodiment includes, but is not limited to, a computer, a network host, a single network server, a plurality of network server sets, or a cloud server composed of a plurality of servers. Among them, the Cloud server is constituted by a large number of computers or web servers based on Cloud Computing (Cloud Computing).
Those skilled in the art will appreciate that the application environment shown in Fig. 1 is only one application scenario of the present application and does not limit the application scenarios of the present application. Other application environments may include more or fewer computer devices than shown in Fig. 1; for example, only one computer device is shown in Fig. 1, but the traffic classification system may further include one or more other servers, which is not limited herein.
In addition, as shown in fig. 1, the traffic classification system may further include a memory 200 for storing data, such as traffic related data, traffic classification scenario information, and the like.
It should be noted that the scenario diagram of the traffic classification system shown in fig. 1 is merely an example, and the traffic classification system and the scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not form a limitation to the technical solution provided in the embodiment of the present application, and as a person having ordinary skill in the art knows that along with the evolution of the traffic classification system and the appearance of a new service scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
First, an embodiment of the present application provides a traffic classification method. The execution subject of the traffic classification method is a traffic classification device, which is applied to a computer device. The traffic classification method includes:
acquiring traffic to be classified and traffic classification scene information;
inputting the traffic classification scene information into a classification scene keyword extraction model to obtain classification keywords serving as the classification basis, wherein the classification keywords include a classification attribute and a classification attribute value;
inputting the classification attribute and the traffic into an attribute matching model to obtain a traffic attribute value matched with the classification attribute, wherein the traffic attribute value is extracted from the traffic;
acquiring the similarity between the classification attribute value and the traffic attribute value;
and classifying the traffic according to the similarity to obtain classified traffic groups.
Compared with classification by traffic port alone, the present application determines the classification keywords serving as the classification basis from the traffic classification scene information and classifies the traffic using those keywords, so that traffic can be classified for different service scenarios and the accuracy of traffic classification in different scenarios is improved.
Fig. 2 is a schematic flow chart of an embodiment of a traffic classification method in the embodiment of the present application, and details of implementation of the traffic classification method in the embodiment of the present application are specifically described below with reference to fig. 2.
The traffic classification method of the embodiment of the application comprises the following steps:
Step 201, acquiring the traffic to be classified and the traffic classification scenario information.
The traffic in the embodiment of the present application is specifically network traffic.
The traffic classification scenario information includes at least the traffic features that the traffic in each category is expected to share after classification; these traffic features represent the basis on which the traffic is classified.
The traffic classification scenario information may further include the intended use of the classified traffic.
For example, the classified traffic may be used for software testing, such as performance testing and functional testing, to improve software robustness; as another example, the classified traffic may be used to determine the priority of the traffic so that the traffic is processed according to priority.
Step 202, inputting the traffic classification scene information into a classification scene keyword extraction model to obtain the classification keywords serving as the classification basis.
The classification keywords include a classification attribute and a classification attribute value.
Specifically, the classification keywords on which classification is based may be determined from the traffic features in the traffic classification scenario information.
The classification keywords are a series of key-value pairs used as the classification basis, where the key is a classification attribute and the value is a classification attribute value.
A classification keyword includes at least one classification attribute. The classification attributes may include technical attributes and may also include service attributes, but are not limited thereto. For example, the technical attributes may include the transmission protocol, interface number, latency requirement, concurrency requirement, and the like, and the service attributes may include the service scenario of the traffic. The service scenario may be identified by a scene code and includes, but is not limited to, a payment service, a login service, a registration service, a loan service, and the like.
The classification attribute value may be a single value or a set, which is not limited in the embodiments of the present application.
After the classification keywords are extracted, they may be numbered, so that each classification keyword corresponds to one class with its own class number.
For example, refer to Table 1 below, which illustrates extracted classification keywords. In the classification keyword with class number 001, the classification attribute is the port and the classification attribute value is a set containing port A, port B, and port C. In the classification keyword with class number 002, the classification attributes are the protocol and the scene code; the classification attribute value corresponding to the protocol is protocol A, and the classification attribute value corresponding to the scene code is the set of registration and login; and so on.
Table 1
[Table 1 is an image in the original publication: Figure BDA0003800804940000071]
It should be noted that, for ease of understanding, the classification attributes and classification attribute values in the table are expressed in Chinese characters. In practical applications, the classification attributes and classification attribute values may be encoded for convenience of data transmission; for example, the classification attribute values of the scene code may be expressed by specific numeric codes or by other encodings such as character codes. The embodiments of the present application do not limit how this information is expressed in practice.
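The numbered key-value structure described above can be sketched as follows. This is a minimal illustration, not the patent's data model: the names `ClassificationKeyword`, `KEYWORDS`, and `KEYWORDS_BY_NUMBER` and the attribute key `scene_code` are assumptions, and only the two rows of Table 1 that the text spells out (class numbers 001 and 002) are reproduced.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ClassificationKeyword:
    """One numbered classification basis: each attribute maps to its accepted values."""

    class_number: str
    attributes: Dict[str, FrozenSet[str]]


# Only the two rows spelled out in the text are reproduced here (hypothetical encoding).
KEYWORDS: List[ClassificationKeyword] = [
    ClassificationKeyword(
        class_number="001",
        attributes={"port": frozenset({"port A", "port B", "port C"})},
    ),
    ClassificationKeyword(
        class_number="002",
        attributes={
            "protocol": frozenset({"protocol A"}),
            "scene_code": frozenset({"registration", "login"}),
        },
    ),
]

# Index by class number so that a traffic group can later be looked up by its number.
KEYWORDS_BY_NUMBER = {kw.class_number: kw for kw in KEYWORDS}
```

Storing every attribute value as a set, even when there is only one value, keeps the later comparison step uniform for single values and sets.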
Several classification keywords are listed above, and several ways of extracting classification keywords are listed below.
In some embodiments, the traffic classification scene information is input into a candidate word extraction module in the classification scene keyword extraction model to obtain classification candidate words;
the classification candidate words are then input into a keyword screening module in the classification scene keyword extraction model to obtain the classification keywords.
In this embodiment, the classification candidate words are extracted first and the classification keywords are then selected from them. The candidate words narrow the screening range for the classification keywords, which improves the efficiency of extracting them.
In some embodiments, the keyword screening module includes a trained keyword extraction classifier, and the step of inputting the classification candidate words into the keyword screening module in the classification scene keyword extraction model to obtain the classification keywords includes:
labeling each classification candidate word with a class through the keyword extraction classifier;
and taking the classification candidate words labeled as keywords as the classification keywords.
The step of training the keyword extraction classifier includes:
obtaining candidate word samples extracted from traffic samples, where each candidate word sample has a corresponding classification label indicating either that the candidate word is a classification keyword or that it is not a classification keyword;
and inputting the candidate word samples and their classification labels into the keyword extraction classifier for training until the loss function of the keyword extraction classifier converges.
Because the keyword extraction classifier is implemented through machine learning, its training process can weigh the influence of many factors on whether a candidate word is a classification keyword. The trained keyword extraction classifier therefore screens classification keywords more accurately, which further improves classification accuracy.
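As a rough illustration of training until the loss converges, the sketch below fits a tiny logistic-regression classifier with plain gradient descent. The source does not say which classifier or which features are used, so everything here is an assumption: the two numeric features (a normalized term frequency and an "appears in a heading" flag), the toy samples, and the names `train_keyword_classifier` and `is_keyword` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_keyword_classifier(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 1.0,
    tolerance: float = 1e-6,
    max_epochs: int = 5000,
) -> Tuple[List[float], float]:
    """Fit logistic regression; stop once the loss changes by less than `tolerance`."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    previous_loss = float("inf")
    n = len(samples)
    for _ in range(max_epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            for i, xi in enumerate(x):
                grad_w[i] += (p - y) * xi
            grad_b += p - y
        loss /= n
        if abs(previous_loss - loss) < tolerance:  # loss has converged
            break
        previous_loss = loss
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def is_keyword(features: Sequence[float], weights: Sequence[float], bias: float) -> bool:
    """Label a candidate word as a classification keyword when p >= 0.5."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, features)) + bias) >= 0.5


# Hypothetical candidate-word samples: [normalized term frequency, appears-in-heading flag].
CANDIDATE_FEATURES = [[0.9, 1.0], [0.8, 0.0], [0.7, 1.0], [0.1, 0.0], [0.2, 1.0], [0.3, 0.0]]
CANDIDATE_LABELS = [1, 1, 1, 0, 0, 0]  # 1 = classification keyword, 0 = not a keyword
WEIGHTS, BIAS = train_keyword_classifier(CANDIDATE_FEATURES, CANDIDATE_LABELS)
```

In practice a library classifier and richer features would likely replace this hand-written loop; the point is only the shape of the procedure: labeled candidate words in, a converged model out, and a keyword or non-keyword label per candidate.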
In other embodiments, inputting the classification candidate words into the keyword screening module in the classification scene keyword extraction model to obtain the classification keywords includes:
acquiring, through the keyword screening module, a preset number N of classification keywords, where N is a positive integer;
scoring each classification candidate word through the keyword screening module to obtain a score for each classification candidate word;
sorting the classification candidate words by score;
and selecting, from the sorted classification candidate words in descending order of score, the first N classification candidate words as the classification keywords.
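The four steps above amount to a top-N selection over scored candidates. A minimal sketch follows; the function name `select_top_n` and the example scores are assumptions, and ties are broken alphabetically only to make the result deterministic.

```python
from typing import Dict, List


def select_top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-scoring candidate words, highest score first."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Sort by descending score; break ties alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


# Hypothetical scores for four candidate words.
EXAMPLE_SCORES = {"port": 0.9, "protocol": 0.7, "the": 0.1, "scene code": 0.8}
TOP_TWO = select_top_n(EXAMPLE_SCORES, 2)
```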
In some embodiments, the keyword screening module may score each classification candidate word using keyword extraction based on statistical features, such as TF and TF-IDF; keyword extraction based on word graph models, such as PageRank and TextRank; keyword extraction based on topic models, such as LDA; and the like.
Keyword extraction based on statistical features extracts keywords using statistical information about the words in the traffic classification scene information;
keyword extraction based on a word graph model first constructs a language network graph of the traffic classification scene information and then analyzes that graph to find the words or phrases that play an important role in it; these words or phrases are the keywords of the traffic classification scene information;
keyword extraction based on a topic model extracts keywords mainly by using the topic distribution properties of the topic model.
The embodiments of the present application list several scoring strategies but are not limited to them; the strategy actually used can be set as required. Extracting keywords by scoring does not require training a keyword extraction classifier, so no samples need to be labeled and keyword extraction is faster.
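Of the scoring strategies listed, TF-IDF is the simplest to sketch. The example below scores the words of one piece of scene information against a small background corpus, using one common smoothed variant, idf = ln((1 + N) / (1 + df)) + 1. The tokenized inputs, the corpus, and the name `tf_idf_scores` are assumptions made for illustration; the source does not specify a formula or a corpus.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf_scores(document: Sequence[str], corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Score each word of `document` by term frequency times smoothed inverse document frequency."""
    if not document:
        return {}
    counts = Counter(document)
    total_words = len(document)
    n_docs = len(corpus)
    scores: Dict[str, float] = {}
    for word, count in counts.items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[word] = (count / total_words) * idf
    return scores


# Hypothetical tokenized scene information and background corpus.
SCENE_INFO: List[str] = ["classify", "traffic", "by", "protocol", "and", "scene", "code"]
CORPUS: List[List[str]] = [
    SCENE_INFO,
    ["monitor", "traffic", "by", "volume"],
    ["route", "traffic", "and", "log", "it"],
]
SCENE_SCORES = tf_idf_scores(SCENE_INFO, CORPUS)
```

Words that occur in every document, such as "traffic" here, receive the lowest idf, so words specific to the scene information, such as "protocol", rank higher and are more likely to survive the top-N selection.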
In other embodiments, the user can also select a keyword extraction mode.
For example, a user interface may be provided that offers keyword extraction modes, including unsupervised keyword extraction. If the user selects unsupervised keyword extraction, the keywords can be extracted in the second way described above, namely by scoring.
After the keywords are extracted, step 203 may be performed.
Step 203, inputting the classification attribute and the traffic into an attribute matching model to obtain a traffic attribute value matched with the classification attribute.
The traffic attribute value is extracted from the traffic.
The attribute matching model is used to find the values in the traffic that correspond to the classification attributes.
For example, if the classification attributes include a scene code and the scene code carried by the traffic represents registration, the traffic attribute value matched with that classification attribute is registration.
Traffic has a corresponding packet format, for example a five-tuple format. The formats of the various kinds of traffic are integrated into the attribute matching model, so the attribute matching model can determine the meaning of each part of a traffic packet.
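A very reduced stand-in for the attribute matching step is a lookup table that knows where each classification attribute lives in a parsed traffic record. The source describes a model that integrates the packet formats; the sketch below replaces that with hand-written extractors, and the record layout, the field names (`dst_port`, `payload`, `scene_code`), and the function name `match_attributes` are all assumptions.

```python
from typing import Any, Callable, Dict, Iterable, Optional

TrafficRecord = Dict[str, Any]

# Hypothetical mapping from classification attribute to the place it lives in a record.
FIELD_EXTRACTORS: Dict[str, Callable[[TrafficRecord], Optional[str]]] = {
    "port": lambda record: record.get("dst_port"),
    "protocol": lambda record: record.get("protocol"),
    "scene_code": lambda record: record.get("payload", {}).get("scene_code"),
}


def match_attributes(attributes: Iterable[str], record: TrafficRecord) -> Dict[str, str]:
    """Extract from `record` the traffic attribute value for each classification attribute."""
    values: Dict[str, str] = {}
    for attribute in attributes:
        extractor = FIELD_EXTRACTORS.get(attribute)
        value = extractor(record) if extractor is not None else None
        if value is not None:  # skip attributes the record does not carry
            values[attribute] = value
    return values


# Hypothetical parsed record: five-tuple fields plus a payload carrying the scene code.
EXAMPLE_RECORD: TrafficRecord = {
    "src_ip": "10.0.0.1",
    "dst_ip": "10.0.0.2",
    "src_port": "port X",
    "dst_port": "port A",
    "protocol": "protocol A",
    "payload": {"scene_code": "registration"},
}
MATCHED = match_attributes(["protocol", "scene_code"], EXAMPLE_RECORD)
```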
After obtaining the traffic attribute value matching the classification attribute from the traffic, step 204 may be performed.
Step 204, acquiring the similarity between the classification attribute value and the traffic attribute value.
In some embodiments, the classification attribute values and the traffic attribute values may be vectorized, and the similarity between the vectorized values may be calculated using the Jaccard similarity coefficient, cosine similarity, or the like.
In other embodiments, the traffic attribute value may be compared with the classification attribute value to determine whether the traffic attribute value is the same as an element in the classification attribute value set, and the similarity is obtained from the comparison: the more elements are the same, the higher the similarity. If an element identical to each traffic attribute value can be found in the corresponding classification attribute value set, the similarity is 100%.
For example, suppose the traffic attribute values are protocol A and registration. As can be seen from the table above, protocol A is the same as the classification attribute value corresponding to the protocol of class number 002, and registration is the same as an element of the classification attribute value set corresponding to the scene code of class number 002, so the similarity between the traffic attribute values of this traffic and the classification attribute values of class number 002 is 100%.
Further, these traffic attribute values are also 100% similar to the classification attribute values of class number 004.
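The two ways of computing similarity can be sketched side by side. The first function is the standard Jaccard coefficient on sets; the second is one possible reading of the element-comparison approach, taking the fraction of classification attributes whose accepted-value set contains the traffic's value. That reading, the function names, and the attribute key `scene_code` are assumptions; the class 002 values are the ones spelled out in the text.

```python
from typing import Dict, Iterable, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity coefficient: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def membership_similarity(
    classification_values: Dict[str, Set[str]], traffic_values: Dict[str, str]
) -> float:
    """Fraction of classification attributes whose accepted set contains the traffic's value."""
    if not classification_values:
        return 0.0
    hits = sum(
        1
        for attribute, accepted in classification_values.items()
        if traffic_values.get(attribute) in accepted
    )
    return hits / len(classification_values)


# Class number 002 as described in the text, and the traffic from the worked example.
CLASS_002 = {"protocol": {"protocol A"}, "scene_code": {"registration", "login"}}
TRAFFIC = {"protocol": "protocol A", "scene_code": "registration"}
SIMILARITY_002 = membership_similarity(CLASS_002, TRAFFIC)
```

Under the membership reading the worked example yields 1.0, i.e. 100%, whereas a plain Jaccard coefficient between the scene-code set {registration, login} and {registration} would be 0.5, so the choice of measure affects which threshold is sensible.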
Step 205, classifying the traffic according to the similarity to obtain classified traffic groups.
In some embodiments, a similarity threshold may be set. If the similarity reaches the similarity threshold, the traffic is assigned to the traffic group corresponding to the classification keyword. The similarity threshold may be set according to actual application requirements.
For example, if the similarity threshold is set to 100%, and the traffic attribute values are protocol a and registration, the traffic corresponding to the traffic attribute value is classified into the traffic group corresponding to class number 002, and meanwhile, the traffic is also classified into the traffic group corresponding to class number 004.
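Threshold-based grouping, including the case where one piece of traffic lands in several groups, can be sketched as below. The similarity function is the same assumed membership measure as before, redefined here so the block stands alone; the record identifiers `t1` to `t3` and the function names are hypothetical, and class number 004 is omitted because its contents appear only in the table image.

```python
from collections import defaultdict
from typing import Dict, List, Set


def similarity(accepted_by_attribute: Dict[str, Set[str]], traffic_values: Dict[str, str]) -> float:
    """Fraction of classification attributes whose accepted set contains the traffic's value."""
    if not accepted_by_attribute:
        return 0.0
    hits = sum(
        1
        for attribute, accepted in accepted_by_attribute.items()
        if traffic_values.get(attribute) in accepted
    )
    return hits / len(accepted_by_attribute)


def classify(
    traffic: Dict[str, Dict[str, str]],
    keywords: Dict[str, Dict[str, Set[str]]],
    threshold: float = 1.0,
) -> Dict[str, List[str]]:
    """Assign each traffic record to every class whose similarity reaches the threshold."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record_id, values in traffic.items():
        for class_number, accepted in keywords.items():
            if similarity(accepted, values) >= threshold:
                groups[class_number].append(record_id)
    return dict(groups)


KEYWORDS = {
    "001": {"port": {"port A", "port B", "port C"}},
    "002": {"protocol": {"protocol A"}, "scene_code": {"registration", "login"}},
}
TRAFFIC = {
    "t1": {"protocol": "protocol A", "scene_code": "registration"},
    "t2": {"port": "port B", "protocol": "protocol A", "scene_code": "login"},
    "t3": {"port": "port Z"},
}
GROUPS = classify(TRAFFIC, KEYWORDS, threshold=1.0)
```

The comparison uses `>=` so that a similarity of 100% satisfies a threshold of 100%, matching the worked example in the text.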
After the traffic is classified according to the similarity, in the case where the traffic classification scene information includes a traffic classification scenario of software testing, the traffic is recorded according to the classified traffic groups to obtain a traffic recording file, and the traffic recording file is transmitted to a computer device for software testing.
In some embodiments, as shown in Fig. 3, recording the traffic according to the classified traffic groups to obtain a traffic recording file includes steps 301 to 304:
Step 301, acquiring the software testing requirement.
The software testing requirement information may be carried in the traffic classification scene information or obtained through other messages.
The software testing requirement information may include the function to be tested, the performance to be tested, or the like; for example, testing the registration function.
Step 302, matching a target classification number according to the software testing requirement.
For example, the target classification keywords consistent with the software testing requirement information are searched for among the classification keywords, and the target classification numbers corresponding to the target classification keywords are determined.
For example, if the software testing requirement is to test the registration function, the classification keywords that include registration are taken as the target classification keywords.
Step 303, determining a target flow group according to the target classification number.
That is, the traffic group corresponding to the target classification number is the target traffic group. For example, if the software testing requirement is a test registration function, it may be determined that the traffic group corresponding to the number 004 is the target traffic group.
In some embodiments, after the target traffic group is determined, the traffic in the target traffic group is subjected to deduplication processing, so as to obtain a deduplicated target traffic group.
Step 304, recording the target traffic group to obtain a traffic recording file.
In some embodiments, the deduplicated target traffic group is recorded to obtain the traffic recording file. In this embodiment, deduplicating the traffic group reduces redundant data in software testing.
In some embodiments, the traffic recording configuration information may also be obtained, and the target traffic is recorded according to the traffic recording configuration information. The traffic recording configuration information may include: the traffic is copied N times for recording, and the like, which is not limited in this embodiment.
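Steps 301 to 304, together with the optional deduplication and the "copy N times" recording configuration, can be sketched as below. This is only an illustrative sketch: the keyword table `CLASS_KEYWORDS`, the substring-based matching, the JSON recording format, and the function names are all assumptions that do not come from the embodiment.

```python
import json
from typing import Dict, List

# Hypothetical mapping from class number to its classification keyword.
CLASS_KEYWORDS: Dict[str, str] = {"001": "login", "004": "registration"}


def match_target_numbers(requirement: str,
                         class_keywords: Dict[str, str]) -> List[str]:
    """Step 302 (assumed matching rule): keyword appears in the requirement text."""
    return [num for num, kw in class_keywords.items() if kw in requirement]


def deduplicate(group: List[dict]) -> List[dict]:
    """Drop repeated traffic items, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for item in group:
        key = json.dumps(item, sort_keys=True)  # order-independent fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def record(requirement: str,
           groups: Dict[str, List[dict]],
           class_keywords: Dict[str, str] = CLASS_KEYWORDS,
           copies: int = 1) -> str:
    """Steps 301-304: return the recording as a JSON string (assumed format)."""
    targets = match_target_numbers(requirement, class_keywords)  # step 302
    recorded: List[dict] = []
    for num in targets:                                          # step 303
        target_group = deduplicate(groups.get(num, []))
        recorded.extend(target_group * copies)                   # step 304
    return json.dumps(recorded)
```

For instance, calling `record("test the registration function", groups, copies=2)` on a group 004 that contains one repeated request would keep a single copy of that request and then write each remaining request twice.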
In some embodiments, after classifying the traffic according to the similarity, the method further comprises:
when the traffic classification scenario information indicates a traffic classification scenario of priority processing judgment, acquiring a traffic processing priority matched with the classification keyword;
and determining the traffic processing priority corresponding to the traffic group according to the corresponding relation between the traffic group and the classification keywords.
For example, if an address to which a certain traffic packet is sent corresponds to a VIP user or if a certain traffic packet is an emergency transaction, the traffic group may be set to a high priority, and the traffic group may be processed with priority. According to the embodiment, the traffic can be processed according to the priority, and the traffic processing efficiency is improved.
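The priority lookup can be illustrated with a small sketch. The keyword names, the numeric priority scale (a lower number is processed sooner), the default priority, and the function names are hypothetical; the embodiment does not specify how priorities are encoded.

```python
from typing import Dict, List

# Hypothetical priorities per classification keyword; lower means more urgent.
KEYWORD_PRIORITY: Dict[str, int] = {
    "vip_user": 0,
    "emergency_transaction": 0,
    "registration": 2,
}
DEFAULT_PRIORITY = 5  # assumed fallback for keywords without a configured priority


def group_priorities(group_keywords: Dict[str, List[str]]) -> Dict[str, int]:
    """Give each traffic group the most urgent priority among its keywords."""
    return {
        group: min(
            (KEYWORD_PRIORITY.get(kw, DEFAULT_PRIORITY) for kw in keywords),
            default=DEFAULT_PRIORITY,
        )
        for group, keywords in group_keywords.items()
    }


def processing_order(group_keywords: Dict[str, List[str]]) -> List[str]:
    """Order traffic groups so that higher-priority groups are handled first."""
    priorities = group_priorities(group_keywords)
    return sorted(priorities, key=lambda group: priorities[group])
```

Because Python's `sorted` is stable, groups that share a priority keep their original relative order in this sketch.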
The embodiments of the present application can determine, based on the traffic classification scenario information, the classification keywords that serve as the classification basis, and classify the traffic according to those keywords. Traffic can therefore be classified for different service scenarios, which improves the accuracy of traffic classification in different scenarios. In addition, after classification, the classified traffic groups are processed according to the traffic classification scenario, which improves the flexibility of traffic processing.
An embodiment of the present application further provides a traffic classification device which, as shown in fig. 4, includes:
an obtaining module 401, configured to obtain traffic to be classified and traffic classification scenario information;
an extracting module 402, configured to input the traffic classification scene information into a classification scene keyword extracting model to obtain a classification keyword as a classification basis, where the classification keyword includes: a classification attribute, and a classification attribute value;
a matching module 403, configured to input the classification attribute and the traffic into an attribute matching model to obtain a traffic attribute value matched with the classification attribute, where the traffic attribute value is extracted from the traffic;
a classification module 404, configured to obtain a similarity between the classification attribute value and the traffic attribute value, and to classify the traffic according to the similarity to obtain a classified traffic group.
In some embodiments of the present application, the extracting module 402 is further configured to input the traffic classification scenario information into a candidate word extracting module in the classification scenario keyword extracting model to obtain a classification candidate word; and inputting the classified candidate words into a keyword screening module in the classified scene keyword extraction model to obtain classified keywords.
In some embodiments of the present application, the keyword screening module in the extracting module 402 includes a trained keyword extraction classifier, and the extracting module 402 is further configured to classify and label the classified candidate words through the keyword extraction classifier; and taking the classification candidate words marked as the keywords as the classification keywords.
In some embodiments of the present application, the extracting module 402 is further configured to obtain, by the keyword screening module, a number N of preset classification keywords, where N is a positive integer;
scoring each classification candidate word through the keyword screening module to obtain a score of each classification candidate word;
sorting the classification candidate words according to their scores;
and selecting, from the sorted classification candidate words, the top N classification candidate words in descending order of score as the classification keywords.
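The score-sort-select procedure above can be sketched in a few lines. The scoring function is passed in as a parameter because the embodiment does not fix how scores are computed; the name `top_n_keywords` and the example scores are hypothetical.

```python
from typing import Callable, List


def top_n_keywords(candidates: List[str],
                   score: Callable[[str], float],
                   n: int) -> List[str]:
    """Score every candidate, sort in descending order of score, keep the top N."""
    if n <= 0:
        raise ValueError("N must be a positive integer")
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:n]


# Usage with made-up scores standing in for the keyword screening module's output.
example_scores = {"protocol": 0.9, "the": 0.1, "registration": 0.8}
selected = top_n_keywords(list(example_scores), lambda w: example_scores[w], 2)
```

If fewer than N candidates exist, this sketch simply returns all of them; how the embodiment handles that case is not stated.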
In some embodiments of the present application, the classification module 404 is further configured to, when the traffic classification scenario information indicates a traffic classification scenario of software testing, record the traffic according to the classified traffic group to obtain a traffic recording file; and transmit the traffic recording file to a computer device for software testing.
In some embodiments of the present application, the classification module 404 is further configured to perform deduplication processing on the traffic in the traffic group to obtain a deduplicated traffic group; and record the deduplicated traffic group to obtain a traffic recording file.
In some embodiments of the present application, the classification module 404 is further configured to, when the traffic classification scenario information indicates a traffic classification scenario of priority processing judgment, acquire a traffic processing priority matched with the classification keyword; and determine the traffic processing priority corresponding to the traffic group according to the correspondence between the traffic group and the classification keyword.
An embodiment of the present application further provides a computer device, which integrates any one of the traffic classification apparatuses provided in the embodiment of the present application, where the computer device includes:
one or more processors;
a memory; and
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the processor to perform the steps of the traffic classification method described in any of the above method embodiments.
Fig. 5 shows a schematic structural diagram of the computer device according to an embodiment of the present application. Specifically:
the computer device may include components such as a processor 501 of one or more processing cores, memory 502 of one or more computer-readable storage media, a power supply 503, and an input unit 504. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 5 does not constitute a limitation of the computer device, and may include more or fewer components than illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 501 is a control center of the computer device, connects various parts of the whole computer device by various interfaces and lines, performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 502 and calling data stored in the memory 502, thereby performing overall monitoring of the computer device. Optionally, processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 501.
The memory 502 may be used to store software programs and modules, and the processor 501 executes various functional applications and data processing by running the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to use of the computer device, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
The computer device further comprises a power supply 503 for supplying power to the various components. Preferably, the power supply 503 may be logically connected to the processor 501 through a power management system, so that functions such as managing charging, discharging, and power consumption are realized through the power management system. The power supply 503 may also include one or more DC or AC power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and any other such components.
The computer device may also include an input unit 504, and the input unit 504 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 501 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 502 according to the following instructions, and the processor 501 runs the application programs stored in the memory 502, thereby implementing various functions as follows:
acquiring traffic to be classified and traffic classification scene information;
inputting the flow classification scene information into a classification scene keyword extraction model to obtain classification keywords serving as classification bases, wherein the classification keywords comprise: a classification attribute, and a classification attribute value;
inputting the classification attribute and the flow into an attribute matching model to obtain a flow attribute value matched with the classification attribute, wherein the flow attribute value is extracted from the flow;
obtaining the similarity of the classification attribute value and the flow attribute value;
and classifying the flow according to the similarity to obtain a classified flow group.
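The five functions listed above can be composed as in the following sketch. The keyword extraction model, the attribute matching model, and the similarity measure are represented by plain callables supplied by the caller; these stand-ins, the dictionary-based keyword layout, and the name `classify_traffic` are assumptions for illustration and do not describe the trained models of the embodiment.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional

Keyword = Dict[str, str]  # assumed layout: {"attribute": ..., "value": ...}


def classify_traffic(
    traffic: Iterable[dict],
    scene_info: str,
    extract_keywords: Callable[[str], List[Keyword]],
    match_attribute: Callable[[str, dict], Optional[str]],
    similarity: Callable[[str, str], float],
    threshold: float = 1.0,
) -> Dict[str, List[dict]]:
    """Group traffic by the classification keywords derived from the scene info."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    keywords = extract_keywords(scene_info)          # keyword extraction model
    for item in traffic:
        for kw in keywords:
            # attribute matching model: pull the matching value out of the traffic
            value = match_attribute(kw["attribute"], item)
            if value is None:
                continue
            # similarity check followed by threshold-based grouping
            if similarity(kw["value"], value) >= threshold:
                groups[f'{kw["attribute"]}={kw["value"]}'].append(item)
    return dict(groups)


# Trivial stand-ins for the two models and the similarity measure (hypothetical).
demo_traffic = [
    {"scene_code": "registration", "id": 1},
    {"scene_code": "login", "id": 2},
]
demo_groups = classify_traffic(
    demo_traffic,
    "software testing of the registration function",
    extract_keywords=lambda info: [{"attribute": "scene_code", "value": "registration"}],
    match_attribute=lambda attr, item: item.get(attr),
    similarity=lambda expected, actual: 1.0 if expected == actual else 0.0,
)
```

Passing the models in as callables keeps the control flow visible while leaving the actual extraction and matching logic open.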
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer-readable storage medium, which may include: Read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like. A computer program is stored thereon, and the computer program is loaded by a processor to perform the steps of any of the traffic classification methods provided by the embodiments of the present application. For example, the computer program may be loaded by a processor to perform the steps of:
acquiring traffic to be classified and traffic classification scene information;
inputting the flow classification scene information into a classification scene keyword extraction model to obtain classification keywords serving as classification bases, wherein the classification keywords comprise: a classification attribute, and a classification attribute value;
inputting the classification attribute and the flow into an attribute matching model to obtain a flow attribute value matched with the classification attribute, wherein the flow attribute value is extracted from the flow;
obtaining the similarity of the classification attribute value and the flow attribute value;
and classifying the flow according to the similarity to obtain a classified flow group.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed descriptions of other embodiments, and are not described herein again.
In a specific implementation, each unit or structure may be implemented as an independent entity, or may be combined arbitrarily to be implemented as one or several entities, and the specific implementation of each unit or structure may refer to the foregoing method embodiment, which is not described herein again.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
The traffic classification method, the traffic classification device, the computer device, and the storage medium provided in the embodiments of the present application are described in detail above, and specific examples are applied herein to explain the principles and implementations of the present application, and the description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, the specific implementation manner and the application scope may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A traffic classification method, characterized in that the traffic classification method comprises:
acquiring traffic to be classified and traffic classification scene information;
inputting the flow classification scene information into a classification scene keyword extraction model to obtain classification keywords serving as classification bases, wherein the classification keywords comprise: a classification attribute, and a classification attribute value;
inputting the classification attribute and the flow into an attribute matching model to obtain a flow attribute value matched with the classification attribute, wherein the flow attribute value is extracted from the flow;
acquiring the similarity of the classification attribute value and the flow attribute value;
and classifying the flow according to the similarity to obtain a classified flow group.
2. The traffic classification method according to claim 1, wherein the inputting the traffic classification scene information into a classification scene keyword extraction model to obtain a classification keyword as a classification basis comprises:
inputting the traffic classification scene information into a candidate word extraction module in the classification scene keyword extraction model to obtain classification candidate words;
and inputting the classified candidate words into a keyword screening module in the classified scene keyword extraction model to obtain classified keywords.
3. The traffic classification method according to claim 2, characterized in that the keyword filtering module comprises a trained keyword extraction classifier;
the step of inputting the classified candidate words into a keyword screening module in the classified scene keyword extraction model to obtain classified keywords comprises the following steps:
classifying and labeling the classified candidate words through the keyword extraction classifier;
and taking the classification candidate words marked as the keywords as the classification keywords.
4. The traffic classification method according to claim 2, wherein the step of inputting the classification candidate words into a keyword screening module in the classification scene keyword extraction model to obtain classification keywords comprises:
acquiring the number N of preset classified keywords through the keyword screening module, wherein N is a positive integer;
grading each classified candidate word through the keyword screening module to obtain a score of each classified candidate word;
sorting the candidate words according to the scores of the classified candidate words;
and screening the top N classified candidate words as the classified keywords from large scores to small scores in the sorted classified candidate words.
5. The traffic classification method according to any one of claims 1 to 4, characterized in that, after said classifying the traffic according to the similarity, the method further comprises:
when the traffic classification scene information indicates a traffic classification scene of software testing, recording the traffic according to the classified traffic group to obtain a traffic recording file;
and transmitting the flow recording file to computer equipment for software testing.
6. The traffic classification method according to claim 5, wherein the recording traffic according to the classified traffic group to obtain a traffic recording file comprises:
carrying out duplicate removal processing on the flow in the flow group to obtain a duplicate-removed flow group;
and recording the de-duplicated flow group to obtain a flow recording file.
7. The traffic classification method according to any one of claims 1 to 4, characterized in that, after said classifying the traffic according to the similarity, the method further comprises:
when the traffic classification scene information indicates a traffic classification scene of priority processing judgment, acquiring a traffic processing priority matched with the classification keyword;
and determining the traffic processing priority corresponding to the traffic group according to the corresponding relation between the traffic group and the classification keywords.
8. A flow classification device, comprising:
the acquisition module is used for acquiring the traffic to be classified and traffic classification scene information;
an extraction module, configured to input the traffic classification scenario information into a classification scenario keyword extraction model to obtain a classification keyword as a classification basis, where the classification keyword includes: a classification attribute, and a classification attribute value;
the matching module is used for inputting the classification attribute and the traffic into an attribute matching model to obtain a traffic attribute value matched with the classification attribute, wherein the traffic attribute value is extracted from the traffic;
the classification module is used for acquiring the similarity between the classification attribute value and the flow attribute value; and classifying the flow according to the similarity to obtain a classified flow group.
9. A computer device, characterized in that the computer device comprises:
one or more processors;
a memory; and
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the processor to implement the traffic classification method of any of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which is loaded by a processor to perform the steps of the traffic classification method according to any of the claims 1 to 7.
CN202210982546.6A 2022-08-16 2022-08-16 Traffic classification method, device, computer equipment and storage medium Active CN115378880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210982546.6A CN115378880B (en) 2022-08-16 2022-08-16 Traffic classification method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115378880A true CN115378880A (en) 2022-11-22
CN115378880B CN115378880B (en) 2023-08-22

Family

ID=84065736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210982546.6A Active CN115378880B (en) 2022-08-16 2022-08-16 Traffic classification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115378880B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080279456A1 (en) * 2007-05-08 2008-11-13 Seiko Epson Corporation Scene Classification Apparatus and Scene Classification Method
US20190130216A1 (en) * 2017-11-02 2019-05-02 Canon Kabushiki Kaisha Information processing apparatus, method for controlling information processing apparatus, and storage medium
CN111797288A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Data screening method and device, storage medium and electronic equipment
CN112200272A (en) * 2020-12-07 2021-01-08 上海冰鉴信息科技有限公司 Service classification method and device

Also Published As

Publication number Publication date
CN115378880B (en) 2023-08-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant