WO2023098222A1

WO2023098222A1 - Multi-service scenario identification method and decision forest model training method

Info

Publication number: WO2023098222A1
Application number: PCT/CN2022/118249
Authority: WO
Inventors: 王子晟; 张耀东; 刘昕颖
Original assignee: 中兴通讯股份有限公司
Priority date: 2021-12-03
Filing date: 2022-09-09
Publication date: 2023-06-08
Also published as: CN116304650A

Abstract

Embodiments of the present invention relate to the technical field of wireless local area networks. Disclosed are a multi-service scenario identification method and a decision forest model training method. The multi-service scenario identification method comprises: obtaining a data stream feature of packet data of a service scenario to be identified; inputting the data stream feature into a pre-trained decision forest model, the decision forest model comprising N decision trees, the decision trees being used for identifying service scenarios corresponding to data stream features, N being a natural number greater than 0, and a training sample of the decision forest model comprising data stream features of a plurality of service scenarios; and obtaining, according to identification results of the N decision trees, a service scenario corresponding to the packet data. According to the multi-service scenario identification method provided in the embodiments of the present invention, multi-service scenario identification can be achieved, and the use experience of a user is improved.

Description

Recognition method of multiple business scenarios and training method of decision forest model

Technical field

Embodiments of the present invention relate to the technical field of wireless local area networks, and in particular to a multi-service scene identification method and a decision forest model training method.

Background art

A business scenario recognition algorithm allows wireless local area network (WLAN) access points and devices to perceive the applications, services, and scenarios that the current user is accessing, such as games, voice, video, and live streaming. Based on the characteristics of those applications, services, and scenarios, the access point then provides users with different WLAN parameter configurations and services, which can maximize the user's network access experience. For example, game packets are sent with priority to give users a low-latency, low-stutter gaming experience. The business scenario recognition algorithm is the cornerstone of this entire process, and its performance is crucial.

The current business scene recognition algorithms are as follows:

(1) Packet-based business scenario identification: In an ad hoc network, traffic packets containing business scenario type information are sent between devices. By interpreting the contents of these traffic packets, business scenario identification can be realized.

(2) Business scenario identification based on access characteristics: specific scenarios, such as web browsing or games, will access specific ports and IPs. By establishing a one-to-one mapping library of access features and business scenarios, business scenario recognition is realized.

(3) Business scenario identification based on deep packet inspection: this method extracts the specific feature fields carried in the traffic of each scenario, and then compares the traffic content with a pre-established feature library to determine the business scenario of the current traffic.

However, the above-mentioned packet-based business scene recognition algorithm, access feature-based business scene recognition algorithm and deep packet inspection technology-based business scene recognition algorithm can only identify specific business scenarios, and cannot realize the identification of multiple business scenarios.

Summary of the invention

The main purpose of the embodiments of the present invention is to propose a multi-service scene recognition method and a decision forest model training method, which can realize multi-service scene recognition and improve user experience.

In order to at least achieve the above purpose, an embodiment of the present invention provides a method for identifying multiple business scenarios, including: acquiring the data flow characteristics of message data of the business scenario to be identified; inputting the data flow characteristics into a pre-trained decision forest model, where the decision forest model includes N decision trees, the decision trees are used to identify the business scenarios of data flow characteristics, N is a natural number greater than 0, and the training samples of the decision forest model include data flow characteristics of multiple business scenarios; and obtaining the business scenario of the message data according to the identification results of the N decision trees.

In order to at least achieve the above purpose, an embodiment of the present invention also provides a training method for a decision forest model, including: obtaining training samples, where the training samples include data flow characteristics of multiple business scenarios; and training an initial decision forest model according to the training samples to obtain a trained decision forest model, where the decision forest model includes N decision trees, the decision trees are used to identify the business scenarios of data flow characteristics, and N is a natural number greater than 0.

In order to at least achieve the above purpose, an embodiment of the present invention also provides an identification device for multiple business scenarios, including: a feature acquisition module, configured to acquire the data flow characteristics of message data of the business scenario to be identified; an input module, configured to input the data flow characteristics into a pre-trained decision forest model, where the decision forest model includes N decision trees, the decision trees are used to identify the business scenarios of data flow characteristics, N is a natural number greater than 0, and the training samples of the decision forest model include data flow characteristics of multiple business scenarios; and a scenario acquisition module, configured to acquire the business scenario of the message data according to the identification results of the N decision trees.

In order to at least achieve the above purpose, an embodiment of the present invention also provides a training device for a decision forest model, including: a sample acquisition module, configured to acquire training samples, where the training samples include data flow characteristics of multiple business scenarios; and a training module, configured to train an initial decision forest model according to the training samples to obtain a trained decision forest model, where the decision forest model includes N decision trees, the decision trees are used to identify the business scenarios of data flow characteristics, and N is a natural number greater than 0.

In order to at least achieve the above purpose, an embodiment of the present invention further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above-mentioned method for identifying multiple business scenarios or the above-mentioned training method for a decision forest model.

In order to at least achieve the above purpose, an embodiment of the present invention also provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a processor, the above-mentioned method for identifying multiple business scenarios or the above-mentioned training method for a decision forest model is implemented.

The method for identifying multiple business scenarios proposed by the present invention first obtains the data flow characteristics of the message data of the business scenario to be identified and inputs them into a pre-trained decision forest model. The decision forest model includes N decision trees, the decision trees are used to identify the business scenarios of data flow characteristics, N is a natural number greater than 0, and the training samples of the decision forest model include data flow characteristics of multiple business scenarios. The business scenario of the message data is then obtained according to the identification results of the N decision trees. Because the training samples include the data flow characteristics of multiple business scenarios and therefore cover a wide range of scenarios, inputting the obtained data flow characteristics into the trained decision forest model and combining the identification results of its decision trees makes it possible to identify a variety of business scenarios. The corresponding WLAN parameter configuration and services can then be provided according to the business scenario of the message data, which effectively improves the user experience.

Description of the drawings

One or more embodiments are illustrated by the corresponding figures in the accompanying drawings. These illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings represent similar elements, and unless otherwise stated, the figures are not drawn to scale.

FIG. 1 is a system architecture diagram provided according to an embodiment of the present invention;

FIG. 2 is a structural diagram of an access point device provided according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for identifying multiple business scenarios according to an embodiment of the present invention;

FIG. 4 is a structural diagram of a decision forest model provided according to an embodiment of the present invention;

FIG. 5 is a flow chart of a method for identifying multiple business scenarios according to another embodiment of the present invention;

FIG. 6 is a flowchart of a training method for a decision forest model according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of an identification device for a multi-service scenario provided according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a training device for a decision forest model provided according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed description

In order to make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art can understand that in each embodiment of the present invention, many technical details are provided for readers to better understand the present invention. However, even without these technical details and various changes and modifications based on the following embodiments, the technical solution claimed in the present invention can also be realized. The division of the following embodiments is for the convenience of description, and should not constitute any limitation to the specific implementation of the present invention, and the various embodiments can be combined and referred to each other on the premise of no contradiction.

An embodiment of the present invention relates to a multi-service scenario identification method, which is applied to an access point device. The application scenario of the embodiment of the present invention may include but not limited to the system architecture shown in FIG. 1 , including: an access point device and multiple user equipments.

Specifically, a WLAN has at least one access point device. The link between the user equipment and the access point device is not limited to wireless, and the access point device provides wireless data access services for the user equipment.

An access point device can provide wireless data access services for one or more user equipments at the same time. The user equipment may be, but is not limited to, a mobile phone, an Internet of Things terminal, or another access point device. Communication between the access point device and the user equipment may be based on, but is not limited to, the 802.11 family of protocols.

In a specific implementation, the structure of the access point device may be as shown in FIG. 2 , specifically including: a radio frequency module, a physical link layer, a media access control layer, and an identification module for multiple service scenarios.

Wherein, the radio frequency module and the physical link layer are configured to demodulate the wireless signal of the user equipment, and send the demodulated signal to the media access control layer.

The media access control layer is configured to acquire the message data of the business scenario to be identified that the user equipment sends to the Internet.

The identification module for multiple business scenarios is configured to: obtain the data flow characteristics of the message data of the business scenario to be identified; input the data flow characteristics into the pre-trained decision forest model, where the decision forest model includes N decision trees, the decision trees are used to identify the business scenarios of data flow characteristics, N is a natural number greater than 0, and the training samples of the decision forest model include data flow characteristics of multiple business scenarios; and obtain the business scenario of the message data according to the identification results of the N decision trees.

The implementation flowchart of the identification method for multiple business scenarios in this embodiment is shown in Figure 3, specifically including:

Step 301, acquire the data flow characteristics of the message data of the business scene to be identified.

Step 302, input the data stream features into the pre-trained decision forest model.

Step 303, according to the identification result of the decision tree, obtain the business scenario of the message data.

In this embodiment, the data flow characteristics of the message data of the business scenario to be identified are first obtained and input into the pre-trained decision forest model. The decision forest model includes N decision trees, the decision trees are used to identify the business scenarios of data flow characteristics, N is a natural number greater than 0, and the training samples of the decision forest model include data flow characteristics of multiple business scenarios. The business scenario of the message data is then obtained according to the identification results of the N decision trees. Because the training samples include the data flow characteristics of multiple business scenarios and therefore cover a wide range of scenarios, inputting the obtained data flow characteristics into the trained decision forest model and combining the identification results of its decision trees makes it possible to identify a variety of business scenarios. The corresponding WLAN parameter configuration and services can then be provided according to the business scenario of the message data, which effectively improves the user experience.

The implementation details of the method for identifying multiple service scenarios in this embodiment are described in detail below. The following content is only implementation details provided for easy understanding, and is not necessary for implementing this solution.

In step 301, the access point device identifies and counts the packet data sent by the user equipment, so as to obtain the data flow characteristics of the packet data.

The data flow characteristics include one or any combination of the following: source IP, source port, destination IP, destination port, protocol type, maximum packet size, minimum packet size, average packet size, variance of packet size, maximum message exchange time, minimum message exchange time, and average message exchange time.

Specifically, the source IP is the Internet Protocol address of the data flow sender; the source port is the port address of the data flow sender; the destination IP is the Internet Protocol address of the data flow receiver; the destination port is the port address of the data flow receiver; the protocol type is the Internet protocol type, for example, Transmission Control Protocol/Internet Protocol (TCP/IP), User Datagram Protocol (UDP), or Internet Control Message Protocol (ICMP); the maximum packet size and the minimum packet size are obtained directly from the media access control layer in the access point device; the average packet size is calculated according to the maximum packet size and the minimum packet size; the variance of the packet size is calculated according to the maximum packet size, the minimum packet size, and the average packet size; the maximum message exchange time and the minimum message exchange time are obtained by recording the access point device timer; and the average message exchange time is calculated according to the maximum message exchange time.
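The statistical features listed above can be illustrated with a minimal sketch. It rests on several assumptions that are not in the original text: the `Packet` record and the `flow_statistics` function are hypothetical names, "message exchange time" is interpreted as the gap between consecutive packet timestamps, and the averages and variance are computed over all observed packets rather than derived from the maximum and minimum values as the description states.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class Packet:
    """Hypothetical per-packet record kept by the access point."""
    timestamp: float  # seconds, read from the access point timer (assumed)
    size: int         # packet size in bytes


def flow_statistics(packets):
    """Return size and exchange-time statistics for the packets of one flow."""
    if len(packets) < 2:
        raise ValueError("at least two packets are needed to measure exchange time")
    sizes = [p.size for p in packets]
    times = sorted(p.timestamp for p in packets)
    # Exchange time is taken here as the gap between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "max_size": max(sizes),
        "min_size": min(sizes),
        "avg_size": mean(sizes),
        "var_size": pvariance(sizes),
        "max_exchange_time": max(gaps),
        "min_exchange_time": min(gaps),
        "avg_exchange_time": mean(gaps),
    }
```

For example, `flow_statistics([Packet(0.0, 100), Packet(0.5, 300), Packet(2.0, 200)])` yields a maximum size of 300, a minimum size of 100, and exchange times of 0.5 and 1.5 seconds.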

In step 302, the access point device inputs the acquired data flow features into a pre-trained decision forest model.

Among them, the decision forest model is a data structure, which is obtained by training with the gradient boosting decision tree method. The specific structure of the decision forest model is shown in Figure 4, including N decision trees, where N is a natural number greater than 0.

Specifically, each decision tree is composed of internal nodes and leaf nodes. Each internal node contains a logical test on one data flow feature. When the test is true, evaluation enters the corresponding subtree and evaluates the node logic of that subtree, and so on until a leaf node is reached. The number of leaf nodes of each decision tree is the same as the number of business scenario categories.

By inputting the characteristics of the data stream into the aforementioned decision forest model including N decision trees, the recognition results of the N decision trees can be obtained. Wherein, the recognition result of the decision tree includes the recognized business scenario and the weight of the recognized business scenario.
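The tree structure and its evaluation can be sketched as follows. This is only an illustration under stated assumptions: the class names `Node` and `Leaf`, the function `evaluate`, and the use of a "feature value less than threshold" comparison as the logical test are hypothetical, since the description does not specify the form of the test.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf node: a business scenario category and its weight."""
    scenario: str
    weight: float


@dataclass
class Node:
    """Internal node: a logical test on one data flow feature (assumed form)."""
    feature: str
    threshold: float
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def evaluate(tree, features):
    """Walk from the root to a leaf and return (scenario, weight)."""
    node = tree
    while isinstance(node, Node):
        test_is_true = features[node.feature] < node.threshold
        node = node.if_true if test_is_true else node.if_false
    return node.scenario, node.weight
```

A tree built as `Node("avg_size", 500, Leaf("voice", 0.8), Leaf("video", 0.9))` returns `("voice", 0.8)` for a flow whose `avg_size` is 200.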

In an example, when the data flow characteristics include the source IP and/or destination IP, after the data flow characteristics of the message data of the business scenario to be identified are obtained and before they are input into the pre-trained decision forest model, the acquired source IP and/or destination IP is preprocessed to improve the performance of business scenario recognition.

Specifically, the source IP and/or destination IP address is converted into binary data, the binary data is converted into an unsigned integer, and decimal precision normalization is then performed on the unsigned integer to obtain the normalized source IP and/or destination IP that is input into the decision forest model.

The normalized source IP and/or destination IP is calculated from IP_int, the integerized source IP and/or destination IP, and K_IP, the normalization coefficient of the source IP and/or destination IP. In one example, K_IP is 9.
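The formula itself is not reproduced in this text. A minimal sketch follows, assuming that "decimal precision normalization" means decimal scaling, that is, dividing the unsigned integer by 10 raised to the normalization coefficient. This interpretation, the function name `normalize_ip`, and the restriction to IPv4 are assumptions, not statements from the original.

```python
import ipaddress


def normalize_ip(address: str, k_ip: int = 9) -> float:
    """Convert an IPv4 address to an unsigned integer and scale it.

    Assumption: the normalized value is IP_int / 10**K_IP (decimal scaling).
    """
    ip_int = int(ipaddress.IPv4Address(address))  # 32-bit unsigned integer
    return ip_int / 10 ** k_ip
```

With K_IP = 9, the address 192.168.1.1 becomes the integer 3232235777 and is scaled to roughly 3.23, so every IPv4 address maps to a value below 4.3.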

In an example, when the data flow characteristics include the source port and/or destination port, after the data flow characteristics of the message data of the business scenario to be identified are obtained and before they are input into the pre-trained decision forest model, decimal precision normalization is performed on the source port and/or destination port to obtain the normalized source port and/or destination port that is input into the decision forest model, so as to improve the performance of business scenario recognition.

The normalized source port and/or destination port is calculated from port, the source port and/or destination port, and K_port, the normalization coefficient of the source port and/or destination port. In one example, K_port is 5.

In one example, when the data flow characteristics include the protocol type, after the data flow characteristics of the message data of the business scenario to be identified are obtained and before they are input into the pre-trained decision forest model, the currently obtained protocol type is mapped to the corresponding integer according to a preset mapping relationship between protocol types and integers, and that integer is used as the protocol type input into the decision forest model, so as to improve the performance of business scenario recognition.

The mapping function takes Protocol, the protocol type name in the data flow characteristics, and returns Pro_int, the protocol type mapped to an integer.
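The port normalization and the protocol mapping can be sketched in the same way. Both formulas are missing from this text, so the sketch makes two loudly stated assumptions: the port is scaled by 10 raised to K_port (decimal scaling), and the integer codes in `PROTOCOL_TO_INT` are purely hypothetical placeholders, since the actual mapping table is not given.

```python
# Hypothetical integer codes; the mapping used in the original is not reproduced here.
PROTOCOL_TO_INT = {"TCP": 1, "UDP": 2, "ICMP": 3}
UNKNOWN_PROTOCOL = 0  # assumed fallback for unlisted protocol names


def normalize_port(port: int, k_port: int = 5) -> float:
    """Scale a port number, assuming the normalized value is port / 10**K_port."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be in the range 0..65535")
    return port / 10 ** k_port


def protocol_to_int(protocol: str) -> int:
    """Map a protocol type name to an integer code."""
    return PROTOCOL_TO_INT.get(protocol.upper(), UNKNOWN_PROTOCOL)
```

Under this assumption, K_port = 5 maps every valid port below 1, because the largest port number, 65535, is smaller than 10^5.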

It should be noted that the normalized source IP and/or destination IP, source port and/or destination port obtained by the above method are only examples, and the specific normalization method may not be unique.

In step 303, the access point device obtains the service scenario of the packet data according to the obtained identification results of the N decision trees.

Specifically, each leaf node of each decision tree in the decision forest model contains a value, and this value represents the weight of the business scenario category corresponding to that leaf node for the current input message data. Therefore, based on the identification results of the N decision trees, the weights of identified business scenarios of the same category are accumulated to obtain an accumulated weight for each identified business scenario, and the business scenario with the highest accumulated weight is taken as the business scenario of the message data.
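The accumulation described in step 303 can be sketched as follows. The function name `classify` and the representation of each tree's result as a `(scenario, weight)` pair are illustrative assumptions.

```python
from collections import defaultdict


def classify(tree_results):
    """Sum the weights per scenario and return the scenario with the largest total.

    tree_results: iterable of (scenario, weight) pairs, one per decision tree.
    """
    totals = defaultdict(float)
    for scenario, weight in tree_results:
        totals[scenario] += weight
    if not totals:
        raise ValueError("no identification results to combine")
    return max(totals, key=totals.get)
```

For instance, three trees returning `("video", 0.4)`, `("voice", 0.7)`, and `("video", 0.5)` give accumulated weights of 0.9 for video and 0.7 for voice, so the message data is assigned to the video scenario even though the single largest weight belongs to voice.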

It is worth mentioning that the method for identifying multiple business scenarios in the embodiment of the present invention realizes the identification of multiple business scenarios based on the decision forest model, and the recognition accuracy is relatively high.

In one example, in order to reflect the higher recognition accuracy of the multi-service scene recognition method of the embodiment of the present invention, the following tests were carried out:

Network traffic was captured with the network packet analysis software Wireshark, and the traffic of the message data was analyzed and labeled. The business scenarios of the message data were divided into the following 15 categories: network access, system, download, web browsing, voice, mail, streaming media, social media, chat, remote writing, music, cloud storage, software upgrade, video, and other. Each category has a different number of data flow samples, ranging from 2000 to 6000, and a detection probability is defined for each scenario category.

Table 1 embodies the detection probability of the multi-service scene identification method of the embodiment of the present invention for each service scene category:

Table 1

Class name	Detection probability (%)	Number of training samples	Number of test samples
All applications	87.14	280555	70052
Network access	99.18	23852	6148
System	99.11	24129	5871
Download	97.64	23940	6060
Web page	96.94	23975	6025
Other	95.64	24057	5943
Voice	87.02	24011	5989
Mail	86.80	18168	4540
Streaming media	84.42	24042	5958
Social media	83.63	23951	6059
Chat	81.11	23947	6053
Music	74.96	2850	695
Cloud storage	71.83	24167	5833
Software upgrade	67.28	13211	3289
Video	61.17	6255	1589

According to Table 1, the overall identification accuracy of the method for identifying multiple business scenarios in the embodiment of the present invention reaches 87.14%. The detection probability of 6 categories exceeds 95%, and the detection probability of 11 categories exceeds 80%, so a high overall detection probability is achieved.

It should be noted that, the above-mentioned examples in this embodiment are illustrations for easy understanding, and do not limit the technical solutions of the embodiments of the present invention.

Another embodiment of the present invention relates to a method for identifying multiple business scenarios. The implementation details of the method in this embodiment are described below. The following content is only implementation details provided for ease of understanding and is not necessary for implementing this solution. FIG. 5 is a flowchart of the method for identifying multiple business scenarios described in this embodiment, which specifically includes:

Step 501, update the data flow features within the latest preset time period to the training samples of the decision forest model.

Specifically, after a certain period of time, the access point device will request the user equipment to feed back the services within the latest preset time period, and use the data flow characteristics obtained within the latest preset time period as the corresponding data flow characteristics of the service, and The data flow characteristics within the latest preset time period are updated to the training samples of the decision forest model.

The data flow characteristics within the latest preset time period are labeled with business scenarios, and their data volume is the same as that of the training samples before the update, so as to avoid excessive reliance on the data flow characteristics of the latest preset time period.

In an example, the data flow characteristics can be labeled with business scenarios as follows: with the permission of the user equipment, the access point device uploads the original message data fed back by the user equipment to a server, and the server labels the business scenario by means of deep packet inspection, or the business scenario is labeled manually.

Step 502, update the decision forest model according to the updated training samples.

According to the updated training samples, the access point device uses the data flow characteristics within the latest preset time period to add leaf nodes under the original decision tree nodes of the decision forest model, which updates the decision trees and thereby updates the decision forest model.

Step 503, acquire the data flow characteristics of the message data of the service scenario to be identified.

Wherein, step 503 is substantially the same as step 301 and will not be repeated here.

Step 504, input data flow features into the updated decision forest model.

Step 505, according to the identification result of the decision tree, obtain the business scenario of the packet data.

In this embodiment, the decision forest model is optimized by adding the data flow characteristics within the latest preset time period to the training samples of the decision forest model and updating the decision forest model according to the updated training samples, which further improves the recognition performance for multiple business scenarios.

Another embodiment of the present invention relates to a method for training a decision forest model, which is applied to an access point device. The implementation details of the training method in this embodiment are described below. The following content is only implementation details provided for ease of understanding and is not necessary for implementing this solution. FIG. 6 is a flowchart of the training method of the decision forest model described in this embodiment, which specifically includes:

Step 601, acquire training samples.

Specifically, the access point device obtains the data flow characteristics from the user equipment as the initial training samples of the decision forest model, where there may be one or more user equipment, and the training samples include the data flow characteristics of multiple business scenarios .

In a specific implementation, the source IP and/or destination IP in the training samples has been converted to binary data and then to an unsigned integer and normalized by decimal precision; the source port and/or destination port has been normalized by decimal precision; and the protocol type is the integer obtained from the preset mapping relationship between protocol types and integers. This improves the recognition performance of the decision forest model.

In one example, in order to show that the recognition performance of the decision forest model can be improved by using the data stream features after data processing in the embodiment of the present invention for model training, the following tests were carried out:

The test process is the same as that in the first embodiment and is not repeated here. The difference is that the above-mentioned data processing was not performed on the acquired data flow characteristics before the initial decision forest model was trained. The detection probability of each business scenario category obtained from the subsequent identification of multiple business scenarios is shown in Table 2:

Table 2

Category	Detection probability (%)
Network access	99.21
Other	95.57
Download	95.56
System	95.11
Voice	85.14
Web page	84.21
All business	83.83
Mail	80.73
Chat	79.88
Streaming media	79.80
Music	79.09
Social media	76.39
Cloud storage	71.77
Software upgrade	69.23
Video	58.42

According to Table 2, the overall detection probability drops from 87.14% in Table 1 to 83.83%. The number of business categories with a detection probability above 95% drops from 6 to 4, and the number with a detection probability above 80% drops from 11 to 8. For individual businesses, such as social media, the detection probability drops by 7.24 percentage points (from 83.63% to 76.39%).
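The comparison between the two tables can be checked with a short calculation. The sketch below copies the detection probabilities of the categories that appear in both Table 1 and Table 2; the dictionary and function names are illustrative only.

```python
# Detection probability (%) with data processing (Table 1).
TABLE_1 = {
    "all": 87.14, "network access": 99.18, "system": 99.11, "download": 97.64,
    "web page": 96.94, "other": 95.64, "voice": 87.02, "mail": 86.80,
    "streaming media": 84.42, "social media": 83.63, "chat": 81.11,
    "music": 74.96, "cloud storage": 71.83, "software upgrade": 67.28,
    "video": 61.17,
}
# Detection probability (%) without data processing (Table 2).
TABLE_2 = {
    "all": 83.83, "network access": 99.21, "system": 95.11, "download": 95.56,
    "web page": 84.21, "other": 95.57, "voice": 85.14, "mail": 80.73,
    "streaming media": 79.80, "social media": 76.39, "chat": 79.88,
    "music": 79.09, "cloud storage": 71.77, "software upgrade": 69.23,
    "video": 58.42,
}


def detection_drops(with_processing, without_processing):
    """Percentage-point drop per category when data processing is skipped."""
    return {
        name: round(with_processing[name] - without_processing[name], 2)
        for name in with_processing
    }


drops = detection_drops(TABLE_1, TABLE_2)
```

Here `drops["all"]` is 3.31 and `drops["social media"]` is 7.24. A negative value means the category scored higher without the data processing.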

Therefore, when training the decision forest model, the training samples use the processed data flow features, which can effectively improve the business recognition performance.

Step 602: Train the initial decision forest model according to the training samples to obtain a trained decision forest model.

Specifically, the access point device uses the gradient boosting decision tree method to train the initial decision forest model according to the obtained training samples, so as to obtain a trained decision forest model. Among them, the decision forest model includes N decision trees, which are used to identify business scenarios of data flow characteristics; N is a natural number greater than 0.
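The idea of gradient boosting, in which each new tree is fitted to what the current model still gets wrong, can be shown with a deliberately simplified sketch. It is not the training procedure of this embodiment: it assumes a single numeric feature, one-split trees (stumps), a squared-error objective, and a fixed learning rate, whereas the embodiment classifies multiple business scenarios and does not state its loss function here. All names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split tree to the residuals by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:  # all feature values identical: predict the mean everywhere
        mean_residual = sum(residuals) / len(residuals)
        return float("inf"), mean_residual, mean_residual
    return best[1], best[2], best[3]


def predict(stumps, x, learning_rate=0.5):
    """Model output: the scaled sum of the outputs of all trees."""
    return sum(learning_rate * (left if x < threshold else right)
               for threshold, left, right in stumps)


def train(xs, ys, n_trees, learning_rate=0.5):
    """Add one tree per iteration, each fitted to the current residuals."""
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - predict(stumps, x, learning_rate) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return stumps
```

On the toy data `xs = [1.0, 2.0, 3.0, 4.0]`, `ys = [0.0, 0.0, 1.0, 1.0]`, each added tree halves the remaining error, so the prediction for `x = 4.0` approaches 1 as the number of trees grows.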

The specific training process is as follows. First, define x_i as the i-th training sample and y_i as the business scenario category corresponding to the i-th training sample, and define the prediction result of the model for the i-th training sample in terms of the decision trees, where T is the number of decision trees and f_t is the function of the t-th decision tree, t = 1, ..., T. The goal is to determine the training parameters of the decision trees at which the recognition accuracy of the decision forest model is highest.

Define θ_t as the parameter of f_t. The training parameters of the decision trees are those obtained when the recognition accuracy of the decision forest model is highest.

In one example, the training samples are iterated over T times. In each iteration, starting from the training parameters used in the previous iteration, a new tree that makes the objective function decrease is added, and the prediction result of the model updated at the t-th iteration is defined for the i-th training sample. The new tree is calculated at the t-th iteration.

In this embodiment, obtaining the optimal training parameters of the decision trees (for example, the number of decision trees) during the training of the decision forest model improves the recognition accuracy of the decision forest model.
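
The iterative scheme described above, in which each iteration adds one tree that lowers the objective, can be illustrated with a small sketch. It is not the patent's implementation: the squared-error loss, the one-dimensional feature, the depth-1 trees (stumps), the learning rate and all function names are assumptions made for illustration. The update used is the standard additive one, $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta f_t(x_i)$.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is identical: fall back to a constant tree
        constant = sum(residuals) / len(residuals)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def train_boosted_trees(xs, ys, num_trees, learning_rate=0.5):
    """Add one tree per iteration; each tree fits the current residuals."""
    trees, losses = [], []
    predictions = [0.0] * len(xs)
    for _ in range(num_trees):
        # Negative gradient of the squared-error loss (assumed loss).
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Additive update of the model's predictions.
        predictions = [p + learning_rate * tree(x)
                       for p, x in zip(predictions, xs)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return trees, losses


def predict(trees, x, learning_rate=0.5):
    """Sum the scaled outputs of all trees, as in the additive model."""
    return sum(learning_rate * tree(x) for tree in trees)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
trees, losses = train_boosted_trees(xs, ys, num_trees=5)
```

On this toy data the mean squared error falls at every iteration, which is the "descending objective" behaviour the text describes. A multi-class scenario classifier would need a classification loss rather than the squared error assumed here.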

The step division of the above methods is only for clarity of description. During implementation, steps can be combined into one step, or a step can be split into multiple steps; as long as they include the same logical relationship, they are all within the scope of protection of this patent. Adding insignificant modifications to the algorithm or process, or introducing insignificant designs, without changing the core design of the algorithm and process, is also within the scope of protection of this patent.

Another embodiment of the present invention relates to an identification device for multi-service scenarios. The details of the identification device in this embodiment are described below. The following content consists only of implementation details provided for ease of understanding and is not necessary for implementing this embodiment. FIG. 7 is a schematic diagram of the identification device for multiple business scenarios described in this embodiment, which includes a feature acquisition module 701, an input module 702 and a scenario acquisition module 703.

Specifically, the feature acquisition module 701 is configured to acquire the data flow features of the message data of the service scenario to be identified.
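
The data flow features listed in the claims include statistics of packet size and packet exchange time. A minimal sketch of how such statistics might be computed for one flow follows. It assumes that "packet exchange time" means the interval between consecutive packets and that the variance is the population variance; the function name and dictionary keys are illustrative, not taken from the patent.

```python
from statistics import mean, pvariance


def flow_statistics(packet_sizes, timestamps):
    """Compute size and timing statistics for the packets of one flow."""
    if len(packet_sizes) != len(timestamps) or len(packet_sizes) < 2:
        raise ValueError("need at least two packets with matching timestamps")
    # Interval between each pair of consecutive packets (assumed meaning
    # of "packet exchange time").
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "max_packet_size": max(packet_sizes),
        "min_packet_size": min(packet_sizes),
        "avg_packet_size": mean(packet_sizes),
        "var_packet_size": pvariance(packet_sizes),
        "max_exchange_time": max(gaps),
        "min_exchange_time": min(gaps),
        "avg_exchange_time": mean(gaps),
    }


stats = flow_statistics([100, 300, 200], [0.0, 0.5, 2.0])
```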

The input module 702 is configured to input the data flow characteristics into a pre-trained decision forest model, wherein the decision forest model includes N decision trees, the decision trees are used to identify business scenarios of data flow characteristics, N is a natural number greater than 0, and the training samples of the decision forest model include data flow characteristics of multiple business scenarios.

In one example, the input module 702 is further configured to, when the data flow characteristics include the source IP and/or the destination IP: convert the address of the source IP and/or the destination IP into binary data; convert the binary data into an unsigned integer; perform decimal precision normalization on the unsigned integer to obtain a normalized source IP and/or destination IP for input into the decision forest model; and input the normalized source IP and/or destination IP into the pre-trained decision forest model.
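
The IP conversion chain (address, binary data, unsigned integer, normalized value) can be sketched as follows. The sketch assumes IPv4 addresses and interprets "decimal precision normalization" as decimal scaling, that is, division by a power of ten large enough to bring every value below 1; the patent text in this section does not spell out the formula, and the names are illustrative.

```python
import ipaddress

# The largest IPv4 value, 4294967295, has ten decimal digits.
IPV4_DIGITS = 10


def normalize_ipv4(address: str) -> float:
    """Map a dotted-quad IPv4 address to a value in [0, 1)."""
    as_bytes = ipaddress.IPv4Address(address).packed      # 4 bytes of binary data
    as_uint = int.from_bytes(as_bytes, byteorder="big")   # unsigned 32-bit integer
    return as_uint / 10 ** IPV4_DIGITS                    # decimal scaling (assumed)


normalized = normalize_ipv4("192.168.1.1")
```

Here 192.168.1.1 corresponds to the unsigned integer 3232235777, so the scaled value is about 0.3232.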

In one example, the input module 702 is further configured to, when the data flow characteristics include the source port and/or the destination port, perform decimal precision normalization on the source port and/or the destination port to obtain a normalized source port and/or destination port for input into the decision forest model, and then input the normalized source port and/or destination port into the pre-trained decision forest model.
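
Port numbers can be scaled in the same way. The sketch below again assumes that decimal precision normalization means division by a power of ten; because the largest port, 65535, has five decimal digits, the divisor is 10^5. The function name is illustrative.

```python
# The largest port number, 65535, has five decimal digits.
PORT_DIGITS = 5


def normalize_port(port: int) -> float:
    """Scale a TCP/UDP port number into [0, 1) by decimal scaling (assumed)."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be in the range 0..65535")
    return port / 10 ** PORT_DIGITS


https_port = normalize_port(443)
```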

In one example, the input module 702 is further configured to, when the data flow characteristics include the protocol type, map the currently obtained protocol type to a corresponding integer according to a preset mapping relationship between protocol types and integers, and then input the corresponding integer into the pre-trained decision forest model.
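
A preset mapping between protocol types and integers can be as simple as a lookup table. The patent does not state which integers are used; the sketch below assumes the IANA-assigned IP protocol numbers as one possible preset mapping and a hypothetical sentinel value for unlisted protocols.

```python
# One possible preset mapping (assumption): IANA-assigned IP protocol numbers.
PROTOCOL_TO_INT = {"ICMP": 1, "TCP": 6, "UDP": 17}
UNKNOWN_PROTOCOL = -1  # hypothetical sentinel for protocols not in the mapping


def encode_protocol(name: str) -> int:
    """Map a protocol name to its preset integer code."""
    return PROTOCOL_TO_INT.get(name.upper(), UNKNOWN_PROTOCOL)


tcp_code = encode_protocol("tcp")
```

Because decision trees split on thresholds rather than on distances, any fixed one-to-one mapping would work equally well here.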

The scenario acquisition module 703 is configured to acquire the business scenario of the message data according to the identification results of the N decision trees.

In one example, the scenario acquisition module 703 is further configured to accumulate, according to the identification results of the N decision trees, the weights of identified business scenarios of the same category to obtain a weight accumulation value for each identified business scenario, and to take the business scenario with the highest weight accumulation value as the business scenario of the message data.
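
The weight accumulation described above amounts to weighted voting across the trees. A minimal sketch follows, assuming each tree reports a pair of identified scenario and weight; the function name, the scenario labels and the weights are made up for illustration.

```python
from collections import defaultdict


def vote(tree_results):
    """Sum the weights per scenario and return the winner with all totals."""
    totals = defaultdict(float)
    for scenario, weight in tree_results:
        totals[scenario] += weight
    if not totals:
        raise ValueError("no identification results to accumulate")
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


# Hypothetical results of N = 5 decision trees: (identified scenario, weight).
results = [("video", 0.6), ("game", 0.5), ("video", 0.3), ("game", 0.7), ("voice", 0.9)]
winner, totals = vote(results)
```

In this example "voice" has the single largest weight, but "game" wins because its accumulated weight across two trees is higher.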

Another embodiment of the present invention relates to a training device for a decision forest model. The details of the training device in this embodiment are described below. The following content consists only of implementation details provided for ease of understanding and is not necessary for implementing this embodiment. FIG. 8 is a schematic diagram of the training device for the decision forest model described in this embodiment, which includes a sample acquisition module 801 and a training module 802.

Specifically, the sample acquisition module 801 is configured to acquire training samples, wherein the training samples include data flow characteristics of multiple business scenarios.

The training module 802 is configured to train the initial decision forest model according to the training samples to obtain the trained decision forest model, wherein the decision forest model includes N decision trees, the decision trees are used to identify business scenarios of data flow characteristics, and N is a natural number greater than 0.

It is not difficult to find that this embodiment is an apparatus embodiment corresponding to the above embodiment of the method for identifying multiple service scenarios, and this embodiment can be implemented in cooperation with the above method embodiment. The relevant technical details and technical effects mentioned in the above embodiments are still valid in this embodiment, and will not be repeated here to reduce repetition. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied in the above embodiments.

It is worth mentioning that all the modules involved in the above two embodiments are logical modules. In practical applications, a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the embodiments of the present invention, this embodiment does not introduce units that are not closely related to solving the technical problems raised by the embodiments of the present invention, but this does not mean that there are no other units in this embodiment.

Another embodiment of the present invention relates to an electronic device, as shown in FIG. 9, including: at least one processor 901; and a memory 902 communicatively connected to the at least one processor 901; wherein the memory 902 stores instructions executable by the at least one processor 901, and the instructions are executed by the at least one processor 901 so that the at least one processor 901 can execute the above method for identifying multiple business scenarios and the above method for training a decision forest model.

The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and it connects one or more processors and various circuits of the memory together. The bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore are not further described herein. The bus interface provides an interface between the bus and the transceiver. A transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium. The data processed by the processor is transmitted on the wireless medium through the antenna. The antenna also receives data and transmits the data to the processor.

The processor manages the bus and general processing, and can also provide various functions including timing, peripheral interfacing, voltage regulation, power management, and other control functions. The memory can be used to store data that the processor uses when performing operations.

Another embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The above method embodiments are implemented when the computer program is executed by the processor.

That is, those skilled in the art can understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing related hardware. The program is stored in a storage medium and includes several instructions to cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Those of ordinary skill in the art can understand that the above-mentioned implementation modes are specific examples for implementing the embodiments of the present invention, and that in practical applications various changes can be made to them in form and detail without departing from the spirit and scope of the embodiments of the present invention.

Claims

1. A method for identifying multiple business scenarios, comprising:

obtaining data flow characteristics of message data of a business scenario to be identified;

inputting the data flow characteristics into a pre-trained decision forest model, wherein the decision forest model includes N decision trees, the decision trees are used to identify business scenarios of data flow characteristics, N is a natural number greater than 0, and the training samples of the decision forest model include data flow characteristics of multiple business scenarios; and

acquiring the business scenario of the message data according to the identification results of the N decision trees.
2. The method for identifying multiple business scenarios according to claim 1, wherein the identification result of each decision tree includes the identified business scenario and the weight of the identified business scenario;

the acquiring the business scenario of the message data according to the identification results of the N decision trees comprises:

accumulating, according to the identification results of the N decision trees, the weights of identified business scenarios of the same category to obtain a weight accumulation value for each identified business scenario; and

taking the business scenario with the highest weight accumulation value as the business scenario of the message data.
3. The method for identifying multiple business scenarios according to claim 1, wherein the data flow characteristics include one of the following or any combination thereof: source IP, source port, destination IP, destination port, protocol type, maximum packet size, minimum packet size, average packet size, variance of packet size, maximum packet exchange time, minimum packet exchange time, average packet exchange time.
4. The method for identifying multiple business scenarios according to claim 3, wherein, when the data flow characteristics include the source IP and/or the destination IP, after the obtaining the data flow characteristics of the message data of the business scenario to be identified and before the inputting the data flow characteristics into the pre-trained decision forest model, the method further comprises:

converting the address of the source IP and/or the destination IP into binary data;

converting the binary data into an unsigned integer; and

performing decimal precision normalization on the unsigned integer to obtain a normalized source IP and/or destination IP for input into the decision forest model;

and, when the data flow characteristics include the source port and/or the destination port, after the obtaining the data flow characteristics of the message data of the business scenario to be identified and before the inputting the data flow characteristics into the pre-trained decision forest model, the method further comprises:

performing decimal precision normalization on the source port and/or the destination port to obtain a normalized source port and/or destination port for input into the decision forest model.
5. The method for identifying multiple business scenarios according to claim 3, wherein, when the data flow characteristics include the protocol type, after the obtaining the data flow characteristics of the message data of the business scenario to be identified and before the inputting the data flow characteristics into the pre-trained decision forest model, the method further comprises:

mapping the currently acquired protocol type to a corresponding integer according to a preset mapping relationship between protocol types and integers, and using the corresponding integer as the protocol type input into the decision forest model.
6. The method for identifying multiple business scenarios according to any one of claims 1 to 5, wherein, after the acquiring the business scenario of the message data according to the identification results of the N decision trees, the method further comprises:

updating the training samples with the data flow characteristics within the latest preset time length, wherein the data flow characteristics within the latest preset time length are marked with business scenarios, and the data volume of the data flow characteristics within the latest preset time length is the same as the data volume of the training samples before updating; and

updating the decision forest model according to the updated training samples.
7. The method for identifying multiple business scenarios according to any one of claims 1 to 5, wherein the decision forest model is trained by using a gradient boosting decision tree method.
8. A training method for a decision forest model, comprising:

Acquiring training samples, the training samples include data flow characteristics of multiple business scenarios;

Training the initial decision forest model according to the training samples to obtain the trained decision forest model;

Wherein, the decision forest model includes N decision trees, and the decision trees are used to identify business scenarios of data flow characteristics; the N is a natural number greater than 0.
9. The training method of the decision forest model according to claim 8, wherein the data flow characteristics include one of the following or any combination thereof: source IP, source port, destination IP, destination port, protocol type, maximum packet size, minimum packet size, average packet size, variance of packet size, maximum packet exchange time, minimum packet exchange time, average packet exchange time.
10. The training method of the decision forest model according to claim 9, wherein the source IP and/or the destination IP are a source IP and/or a destination IP that have undergone conversion into binary data, conversion into an unsigned integer, and decimal precision normalization;

the source port and/or the destination port are a source port and/or a destination port that have undergone decimal precision normalization; and

the protocol type is an integer obtained by mapping according to a preset mapping relationship between protocol types and integers.
11. An identification device for multiple business scenarios, comprising:

a feature acquisition module, configured to acquire data flow characteristics of message data of a business scenario to be identified;

an input module, configured to input the data flow characteristics into a pre-trained decision forest model, wherein the decision forest model includes N decision trees, the decision trees are used to identify business scenarios of data flow characteristics, N is a natural number greater than 0, and the training samples of the decision forest model include data flow characteristics of multiple business scenarios; and

a scenario acquisition module, configured to acquire the business scenario of the message data according to the identification results of the N decision trees.
12. A training device for a decision forest model, comprising:

The sample acquisition module is configured to acquire training samples, the training samples include data flow characteristics of multiple business scenarios;

The training module is configured to train the initial decision forest model according to the training samples to obtain the trained decision forest model;

Wherein, the decision forest model includes N decision trees, and the decision trees are used to identify business scenarios of data flow characteristics; the N is a natural number greater than 0.
13. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method for identifying multiple business scenarios according to any one of claims 1 to 7, or execute the training method of the decision forest model according to any one of claims 8 to 10.
14. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the method for identifying multiple business scenarios according to any one of claims 1 to 7 is implemented, or the training method of the decision forest model according to any one of claims 8 to 10 is implemented.