CN116566735B - Method for identifying malicious traffic through machine learning

Info

Publication number: CN116566735B
Application number: CN202310763667.6A
Authority: CN
Inventors: 金飞; 黄泽源
Original assignee: Beijing Yunke Anxin Technology Co ltd
Current assignee: Beijing Yunke Anxin Technology Co ltd
Priority date: 2023-06-27
Filing date: 2023-06-27
Publication date: 2023-09-12
Anticipated expiration: 2043-06-27
Also published as: CN116566735A

Abstract

The invention relates to the technical field of data processing, and in particular to a method for identifying malicious traffic through machine learning, comprising: setting up a test scene; determining the operation tendency of a single operation from the reaction duration of the mouse; for an operation whose tendency is the machine simulation tendency, determining the operation type from the mouse movement speed; comparing the determined operation type of the single operation with the actual operation type, and adjusting the machine reaction duration interval and the machine movement speed interval according to the comparison result; and evaluating the number of times of reproduction of the machine traffic within a preset duration to determine whether the machine traffic is malicious traffic. By identifying mouse movement to distinguish machine operation from manual operation, and by judging malicious traffic from the number of times of reproduction of machine operation, the invention effectively improves the accuracy of identifying machine operation and the stability of identifying malicious traffic.

Description

Method for identifying malicious traffic through machine learning

Technical Field

The invention relates to the technical field of data processing, in particular to a method for identifying malicious traffic by machine learning.

Background

Malicious traffic is an attack means that endangers information security. With modern technical support it has become harder to identify effectively, and malicious traffic that includes mouse operations is particularly damaging to information security. Conventional identification approaches usually detect malicious traffic with machine learning (ML) or deep learning (DL) methods, but these methods have low identification efficiency and a poor success rate in scenarios that include mouse operations.

Chinese granted patent publication No. CN112989339B discloses a machine-learning-based method for detecting malicious code intrusion in the GCC compiler, which comprises the following steps: step 1, downloading a C language source code data set; step 2, converting the source code data set obtained in step 1 into binary files; step 3, preprocessing the sample set obtained in step 2; step 4, building a BP neural network model and training it, inputting the characteristic values obtained in step 4 into the BP neural network model for training, and obtaining and outputting an optimal neural network model; step 5, performing prediction classification with the neural network model output in step 4, and performing parameter adjustment training on the model of step 4 according to the test results. By automatically extracting the software fingerprint features of the GCC compiler, that invention detects the fingerprint features of malicious code in the compiler and thereby determines whether a GCC compiler has been invaded by malicious code.

It can be seen that the above technical solution has the following problems: malicious traffic generated by the mouse track simulated by the machine cannot be effectively identified.

Disclosure of Invention

Therefore, the invention provides a method for identifying malicious traffic by machine learning, which is used for solving the problem that the malicious traffic identification is unstable because the malicious traffic generated by a mouse track simulated by a machine cannot be effectively identified in the prior art.

To achieve the above object, the present invention provides a method for identifying malicious traffic by machine learning, including:

step S1, setting up a test scene, and recording parameters of a manual mouse and a machine mouse in the test scene;

step S2, evaluating the reaction duration of the mouse in a single operation, and determining the operation tendency of the single operation according to the machine reaction duration interval;

step S3, evaluating the mouse movement speed in each single operation whose operation tendency is the machine simulation tendency, and determining the operation type of the single operation according to the machine movement speed interval;

step S4, comparing the determined operation type of the single operation with the actual operation type, adjusting the machine reaction duration interval and the machine movement speed interval according to the comparison result, and repeating steps S2 to S3 until the practical condition is reached;

step S5, identifying the part of the traffic that contains mouse operations, recording traffic that contains machine simulation as machine traffic, and evaluating the number of times of reproduction of the machine traffic within a preset duration to determine whether the machine traffic is malicious traffic;

wherein the parameters comprise the reaction duration and the mouse movement speed; the machine reaction duration is the time taken by a machine to recognize the scene picture of the single operation and simulate the mouse operation, and it is longer than the manual reaction duration; the operation tendency comprises the machine simulation tendency and the manual operation tendency; the machine movement speed interval is the interval of movement on the plane corresponding to the test scene that does not exceed a preset speed error; the operation type comprises machine operation and manual operation; the practical condition is that the determined operation type of every operation is the same as the actual operation type of that operation; and the machine simulation means that a machine performs image recognition and simulates manual operation of the mouse;

steps S1 to S5 are performed by software hosted on a server, the preset duration is related to the maximum load of the server, and the number of times of reproduction is the number of times that machine-simulated traffic occurs within the preset duration;

the preset speed error is a standard error value for uniform motion and is related to the pixels of the scene picture.

Further, in the step S2, for the single operation, the corresponding mouse reaction time length is the measured time length used by the mouse pointer to reach the target point in a continuous and smooth path from the starting point of the preset distance from the target point, and the server compares the measured time length with the machine reaction time length interval to determine the operation tendency of the single operation;

if the measured time length is within the machine reaction time length interval, the server judges that the single operation has the machine simulation tendency, and then evaluates the mouse movement speed;

and if the measured time length is not in the machine reaction time length interval, the server judges that the single operation is the manual operation tendency, and judges that the operation type of the operation is manual operation.
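The comparison in step S2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `operation_tendency`, the string labels, and the default interval are hypothetical, with the default bounds borrowed from the 240 ms to 460 ms figures given in the embodiment.

```python
from typing import Tuple

# Hypothetical labels for the two operation tendencies named in the text.
MACHINE_SIMULATION = "machine_simulation_tendency"
MANUAL_OPERATION = "manual_operation_tendency"


def operation_tendency(
    measured_ms: float,
    machine_interval_ms: Tuple[float, float] = (240.0, 460.0),
) -> str:
    """Classify one operation by its measured reaction duration (step S2).

    The default interval is an assumption taken from the embodiment's
    example values; in the method it is tuned during step S4.
    """
    lower, upper = machine_interval_ms
    if lower <= measured_ms <= upper:
        # Inside the machine interval: continue to the speed check of step S3.
        return MACHINE_SIMULATION
    # Outside the interval: the operation is treated as manual operation.
    return MANUAL_OPERATION
```

Only operations labelled with the machine simulation tendency would go on to the movement speed check of step S3.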

Further, in the step S2, the machine reaction duration is a duration interval not less than a maximum manual operation reaction duration and not greater than a maximum machine operation reaction duration;

the maximum response time of the manual operation is the maximum time used for moving the mouse pointer from the departure point to the target point in the manual operation, and the maximum response time of the machine operation is the maximum time used for carrying out image recognition on the test scene by the machine and moving the mouse pointer from the departure point to the target point.

Further, in the step S3, when judging the movement speed of the mouse, the server decomposes the movement of the mouse into a coordinate system formed by mutually perpendicular coordinate axes, and judges the component speeds of the coordinate system in the direction of a single coordinate axis respectively;

for a single component speed, the server compares the corresponding first derivative with the preset speed error to determine the operation type of the single operation;

if the first derivative of a single component speed falls outside the interval corresponding to the preset speed error at any point, the server judges that the operation type of the single operation corresponding to that component speed is manual operation;

and if the first derivative of a single component speed lies within the interval corresponding to the preset speed error at all points, the server judges that the operation type of the single operation corresponding to that component speed is machine operation.
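A minimal Python sketch of the component speed check in step S3 is given below. Several details are assumptions that the text does not specify: the pointer is sampled as `(time_s, x_px, y_px)` tuples with strictly increasing times, derivatives are approximated by finite differences with respect to time, and the operation is treated as machine operation only when both components pass. The names `Sample`, `_finite_difference`, and `is_machine_motion` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical sample layout: (time in seconds, x in pixels, y in pixels).
Sample = Tuple[float, float, float]


def _finite_difference(values: Sequence[float], times: Sequence[float]) -> List[float]:
    """Approximate d(values)/d(times) between consecutive samples."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]


def is_machine_motion(samples: Sequence[Sample], speed_error: float) -> bool:
    """Return True if the derivative of every component speed stays within
    [-speed_error, +speed_error] at all sampled points (step S3)."""
    if len(samples) < 3:
        raise ValueError("at least three samples are needed for a speed derivative")
    times = [s[0] for s in samples]
    # Each speed estimate is attributed to the midpoint of its time step.
    mid_times = [(times[i] + times[i + 1]) / 2 for i in range(len(times) - 1)]
    for axis in (1, 2):  # x component, then y component
        positions = [s[axis] for s in samples]
        speeds = _finite_difference(positions, times)
        speed_derivative = _finite_difference(speeds, mid_times)
        if any(abs(d) > speed_error for d in speed_derivative):
            return False  # any point outside the interval means manual operation
    return True
```

For example, samples that advance by a fixed step along both axes at equal time steps pass the check, while samples whose x coordinate grows quadratically fail it.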

Further, in the step S3, a minimum manual speed change threshold is set in the server, and the preset speed error is a section formed from a negative value corresponding to the minimum manual speed change threshold to a positive value corresponding to the minimum manual speed change threshold;

wherein the minimum manual speed change threshold is inversely proportional to the sensitivity of the mouse.
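The relationship between mouse sensitivity and the preset speed error can be illustrated as follows. The text states only that the threshold is inversely proportional to the sensitivity; the proportionality constant `scale`, the use of DPI as the sensitivity measure, and the function name `preset_speed_error` are assumptions made for this sketch.

```python
from typing import Tuple


def preset_speed_error(sensitivity_dpi: float, scale: float) -> Tuple[float, float]:
    """Return the symmetric interval (-threshold, +threshold).

    threshold = scale / sensitivity_dpi, so a more sensitive mouse gives a
    narrower interval. `scale` is a hypothetical tuning constant.
    """
    if sensitivity_dpi <= 0:
        raise ValueError("sensitivity must be positive")
    threshold = scale / sensitivity_dpi
    return (-threshold, threshold)
```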

Further, in the step S4, a preset adjustment ratio is set in the server, and if the server judges a single operation that is actually a manual operation to be a machine operation, the server increases the maximum manual operation reaction duration by the preset ratio;

if the server judges a single operation that is actually a machine operation to be a manual operation, the server reduces the minimum manual speed change threshold by the preset ratio;

wherein the preset ratio is proportional to the sensitivity of the mouse.
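The adjustment rule of step S4 can be sketched as below. The sketch assumes that "increase by the preset ratio" and "reduce by the preset ratio" mean multiplying by `(1 + ratio)` and `(1 - ratio)` respectively, which the text does not state explicitly; the class `DetectionParams`, the function `adjust_params`, and the labels `"manual"` and `"machine"` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DetectionParams:
    """Hypothetical container for the two tunable parameters of step S4."""
    manual_max_reaction_ms: float   # lower bound of the machine reaction interval
    min_manual_speed_change: float  # half-width of the preset speed error interval


def adjust_params(
    params: DetectionParams, predicted: str, actual: str, ratio: float
) -> DetectionParams:
    """Apply one adjustment after comparing predicted and actual operation types."""
    if actual == "manual" and predicted == "machine":
        # A manual operation was judged as machine operation:
        # raise the maximum manual reaction duration.
        return replace(
            params,
            manual_max_reaction_ms=params.manual_max_reaction_ms * (1 + ratio),
        )
    if actual == "machine" and predicted == "manual":
        # A machine operation was judged as manual operation:
        # lower the minimum manual speed change threshold.
        return replace(
            params,
            min_manual_speed_change=params.min_manual_speed_change * (1 - ratio),
        )
    return params  # prediction matched, nothing to adjust
```

Steps S2 and S3 would then be repeated with the returned parameters until every test operation is classified correctly.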

Further, in the step S5, a threshold for the number of times of reproduction is set in the server, and the server compares the number of times of reproduction with the threshold to determine the category of the machine traffic;

if the number of times of reproduction is not greater than the threshold, the server judges that the machine traffic is normal traffic;

and if the number of times of reproduction is greater than the threshold, the server judges that the machine traffic is malicious traffic.
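The counting and comparison in step S5 might look like the following sketch. It assumes that each occurrence of machine traffic is recorded as a timestamp in seconds and that the preset duration is a half-open window starting at `window_start_s`; the function names and the labels `"normal"` and `"malicious"` are hypothetical.

```python
from typing import Iterable


def count_reproductions(
    event_times_s: Iterable[float], window_start_s: float, window_s: float
) -> int:
    """Count machine-traffic events inside [window_start_s, window_start_s + window_s)."""
    window_end_s = window_start_s + window_s
    return sum(1 for t in event_times_s if window_start_s <= t < window_end_s)


def classify_machine_traffic(reproduction_count: int, threshold: int) -> str:
    """Malicious only when the count is strictly greater than the threshold."""
    return "malicious" if reproduction_count > threshold else "normal"
```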

Further, in the step S5, the preset duration is proportional to the maximum load of the server and is less than or equal to 24 hours.

Further, in the step S1, the test scene is a rectangular interface including at least one button, and the size of the button can be recognized by a machine.

Further, in the step S2, if the mouse pointer jumps from the departure point to the target point in the single operation without passing through any path, the server determines that the single operation is the machine operation.

Compared with the prior art, the invention has the beneficial effects that machine operation and manual operation are distinguished by identifying mouse movement, and malicious traffic is judged from the number of times of reproduction of machine operation, which effectively improves the accuracy of identifying machine operation and the stability of identifying malicious traffic.

Further, judging the operation tendency of a mouse operation from its reaction duration effectively improves the accuracy of determining the mouse operation type and further improves the stability of malicious traffic identification.

Further, confirming the characteristics of a machine-operated mouse from the mouse movement speed effectively improves the ability to identify a machine-operated mouse and further improves the stability of malicious traffic identification.

Further, adjusting the judgment parameters for mouse movement speed and reaction duration through testing effectively improves identification accuracy, allows more application scenarios to be covered after training, and further improves the stability of malicious traffic identification.

Further, judging the number of times of reproduction of the traffic effectively improves the accuracy of identifying malicious machine traffic and reduces the concealment of malicious traffic, which further improves the stability of malicious traffic identification.

Further, simulating the usage scenario effectively improves the reliability of identifying machine operation and further improves the stability of malicious traffic identification.

Further, by detecting operations peculiar to a machine-operated mouse, mouse paths that cannot be produced by manual operation are identified, which effectively improves identification efficiency and further improves the stability of malicious traffic identification.

Drawings

FIG. 1 is a flow chart of a method for machine learning to identify malicious traffic in accordance with an embodiment of the present invention;

FIG. 2 is a schematic plan view of a mouse path according to an embodiment of the present invention;

FIG. 3 is a line graph of mouse speed according to an embodiment of the present invention;

wherein: 1, mouse cursor; 2, button; 3, machine path; 4, manual path; 5, starting point; 6, test scene.

Detailed Description

In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention and are not limiting the scope of the present invention.

It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," "outer," and the like indicate directions or positional relationships based on the directions or positional relationships shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.

Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may be, for example, fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be communication between two elements. The specific meaning of the above terms in the present invention can be understood by a person skilled in the art according to the specific circumstances.

To aid understanding, the following terms used in the invention are explained:

Malicious traffic: a form of attack in which advanced attack means are used to mount a long-term, persistent network attack on a specific target; in the current environment, a common attack means is to use a device to simulate human operation in order to deceive security software.

Machine learning: the field that studies how computers simulate or implement human learning behavior in order to acquire new knowledge or skills, reorganize existing knowledge structures, and continually improve their own performance.

Mouse pointer: the image on the graphical interface that marks the position of the mouse.

Resolution: the ability of a measurement or display system to resolve detail; in the present invention, the resolution is related to the pixels of the screen.

Referring to fig. 1, a flowchart of a method for identifying malicious traffic by machine learning according to an embodiment of the invention is shown, and the method for identifying malicious traffic based on machine learning includes:

step S3, judging the movement speed of the mouse in a single operation with the operation tendency being the machine simulation tendency, and judging the operation type of the single operation according to the movement speed interval of the machine;

step S4, comparing the operation type of the single operation with the actual operation type, adjusting the reaction time interval and the movement speed interval of the machine according to the comparison result, and repeating the steps S2 to S3 until the practical condition is reached;

wherein the parameters comprise the reaction duration and the mouse movement speed; the machine reaction duration is the time taken by a machine to recognize the scene picture of a single operation and simulate the mouse operation, and it is longer than the manual reaction duration; the operation tendency comprises the machine simulation tendency and the manual operation tendency; the machine movement speed interval is the interval of movement on the plane corresponding to the test scene that does not exceed a preset speed error; the operation type comprises machine operation and manual operation; the practical condition is that the determined operation type of every operation is the same as the actual operation type of that operation; and the machine simulation means that a machine performs image recognition and simulates manual operation of the mouse;

the steps are performed by software hosted on a server, the preset duration is related to the maximum load of the server, and the number of times of reproduction is the number of times that machine-simulated traffic occurs within the preset duration;

the preset speed error is a standard error value for uniform motion and is related to the pixels of the scene picture.

In practice, the manual reaction duration is generally 200 ms to 240 ms and the machine reaction duration is generally 440 ms to 460 ms; the machine moves the pointer at a uniform speed in each component direction; and the error produced by the machine is related to the resolution of the image, the error being larger when the resolution is lower.

Specifically, in step S2, for a single operation, the corresponding mouse reaction time length is the measured time length used by the mouse pointer to reach the target point in a continuous and smooth path from the start point of the preset distance from the target point, the server compares the measured time length with the machine reaction time length interval to determine the operation tendency of the single operation,

if the measured time length is within the machine reaction time length interval, the server judges that the single operation has the machine simulation tendency, and then evaluates the mouse movement speed;

if the measured time length is not within the machine reaction time length interval, the server judges that the single operation has the manual operation tendency, and judges that the operation type of the operation is manual operation;

specifically, in step S2, the machine reaction duration is a duration interval that is not less than the maximum manual operation reaction duration and not greater than the maximum machine operation reaction duration;

Judging the operation tendency of a mouse operation from its reaction duration effectively improves the accuracy of determining the mouse operation type and further improves the stability of malicious traffic identification.

It will be appreciated that, for a single operation, when the machine reaction duration interval is set to 240 ms to 460 ms and the time from the start of the operation until the pointer has moved from the starting point to the button is between 240 ms and 460 ms, the operation is judged to have the machine simulation tendency;

when a machine simulates the mouse, it must first plan the shortest path from the image, so its reaction duration is longer than that of manual operation.

Specifically, in step S3, when judging the movement speed of the mouse, the server decomposes the movement of the mouse into a coordinate system composed of mutually perpendicular coordinate axes, and judges the component speeds of the mouse in the directions of the single coordinate axes respectively;

for a single component speed, the server compares its corresponding first derivative with a preset speed error to determine the operational category of the single operation,

if the first derivative of a single component speed lies within the interval corresponding to the preset speed error at all points, the server judges that the operation type of the single operation corresponding to that component speed is machine operation.

Please refer to FIG. 2, which is a schematic plan view of a mouse path according to an embodiment of the present invention. The mouse cursor 1 moves within the test scene 6 from the starting point 5 to the button 2. If the movement is simulated by a machine, the machine first plans a shortest path, namely the machine path 3, and then moves along it at a uniform speed; if the movement is manual, it is constrained by the physical mouse and mouse pad, and the manual path 4 is an irregular curve whose start and end points are nonetheless fixed.

Specifically, in step S3, a minimum manual speed change threshold is set in the server, and the preset speed error is a section formed from a negative value corresponding to the minimum manual speed change threshold to a positive value corresponding to the minimum manual speed change threshold;

Confirming the characteristics of a machine-operated mouse from the mouse movement speed effectively improves the ability to identify a machine-operated mouse and further improves the stability of malicious traffic identification.

In practice, the i-th operation corresponds to a path curve Ki and a speed function Vi(Ki) defined over the coordinates, where Ki consists of the coordinates (x, y); it will be appreciated that Vi(Ki) has a derivative at every point;

for machine operation, the pointer moves along the path at a uniform speed, so the derivative of Vi(Ki) at any point is 0; allowing for the error introduced by the picture pixels, motion whose derivative stays within that error can be treated as uniform motion;

for manual operation, the speed along the path varies randomly; it will be appreciated that manual movement depends on the sensitivity of the mouse and the smoothness of the mouse pad, so the speed varies irregularly within a single operation.
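The criterion described in the three paragraphs above can be written compactly. The notation is introduced here for illustration only and does not appear in the original text: $s$ is the position along the path $K_i$, $v_{i,x}$ and $v_{i,y}$ are the component speeds of $V_i$ along the two coordinate axes, and $\varepsilon$ is the minimum manual speed change threshold, so that $[-\varepsilon, +\varepsilon]$ is the preset speed error interval.

```latex
\text{operation } i \text{ is judged a machine operation}
\iff
\left| \frac{\mathrm{d} v_{i,x}}{\mathrm{d} s} \right| \le \varepsilon
\ \text{ and } \
\left| \frac{\mathrm{d} v_{i,y}}{\mathrm{d} s} \right| \le \varepsilon
\quad \text{for every point } s \text{ on } K_i
```

Ideal uniform motion corresponds to both derivatives being exactly 0; the tolerance $\varepsilon$ absorbs the error introduced by the picture pixels.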

Referring to FIG. 3, which is a line graph of mouse speed according to an embodiment of the present invention, the ordinate Vi is the speed of the mouse movement and the abscissa is the modulus of (x, y), which can be understood as the distance along the path; a total path length of 60 mm is taken as an example:

please refer to FIG. 3 (a), which is a line graph of mouse speed for a machine operation according to an embodiment of the present invention;

since the machine plans its movement along the shortest path, the mouse moves at a constant speed after an initial acceleration and stops when it reaches the target position.

Please refer to FIG. 3 (b), which is a line graph of mouse speed for a manual operation according to an embodiment of the present invention;

manual movement is affected by the conditions under which the physical mouse is used, so its speed is irregular.

Specifically, in step S4, a preset adjustment ratio is set in the server, and if the server judges a single operation that is actually a manual operation to be a machine operation, the server increases the maximum manual operation reaction duration by the preset ratio;

if the server judges a single operation that is actually a machine operation to be a manual operation, the server reduces the minimum manual speed change threshold by the preset ratio;

wherein the preset ratio is proportional to the sensitivity of the mouse.

Adjusting the judgment parameters for mouse movement speed and reaction duration through testing effectively improves identification accuracy, allows more application scenarios to be covered after training, and further improves the stability of malicious traffic identification.

In practice, for a mouse with a DPI of 2200, the preset ratio may be set to 2%;

for a mouse with DPI of 3000, the preset ratio can be set to 3%;

for a mouse with a DPI of 4500, the preset ratio can be set to 5%;

it will be appreciated that the preset ratio may also be adjusted according to the screen pixels and may be set to any value, but if the chosen value is unsuitable, step S4 has to be repeated additional times.
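The three example settings above can be kept in a small lookup table. In the sketch below, only the three DPI and ratio pairs come from the text; choosing the entry with the nearest DPI for other mice is an assumption, as are the names `PRESET_RATIO_BY_DPI` and `preset_ratio_for`.

```python
# Example values from the embodiment: mouse DPI -> preset adjustment ratio.
PRESET_RATIO_BY_DPI = {2200: 0.02, 3000: 0.03, 4500: 0.05}


def preset_ratio_for(dpi: int) -> float:
    """Pick the ratio of the listed DPI closest to `dpi`.

    Nearest-entry selection is an assumption; the text only gives the three
    example points and notes that other values may be chosen.
    """
    nearest = min(PRESET_RATIO_BY_DPI, key=lambda known: abs(known - dpi))
    return PRESET_RATIO_BY_DPI[nearest]
```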

Specifically, in step S5, a threshold for the number of times of reproduction is set in the server, and the server compares the number of times of reproduction with the threshold to determine the category of the machine traffic;

if the number of times of reproduction is not greater than the threshold, the server judges that the machine traffic is normal traffic;

if the number of times of reproduction is greater than the threshold, the server judges that the machine traffic is malicious traffic.

Judging the number of times of reproduction of the traffic effectively improves the accuracy of identifying malicious machine traffic and reduces the concealment of malicious traffic, which further improves the stability of malicious traffic identification.

In implementation, the threshold for the number of times of reproduction may be set according to the traffic that the server can carry. When calculating this, the server's carrying capacity may be converted using the data volume of a single operation and expressed as a number of operations. For example: if the server can carry 20 operations/second, the threshold may be set to 800 operations/minute; if the server can carry 2000 operations/minute, the threshold may be set to 100000 operations/hour. The threshold is set to no more than 90% of the traffic that the server can carry;

it will be appreciated that if the data is important, the threshold may be set at 20% to 30% of the traffic that the server can carry.
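The arithmetic behind these example thresholds can be sketched as follows. The 90% upper bound comes from the text; rounding down to a whole number of operations and the names `window_capacity`, `reproduction_threshold`, and `MAX_FRACTION` are assumptions.

```python
import math

MAX_FRACTION = 0.9  # upper bound on the threshold, as stated in the text


def window_capacity(ops_per_second: float, window_s: float) -> float:
    """Operations the server can carry during a window of `window_s` seconds."""
    return ops_per_second * window_s


def reproduction_threshold(capacity_ops: float, fraction: float) -> int:
    """Threshold as a fraction of carrying capacity, rounded down (an assumption)."""
    if not 0.0 < fraction <= MAX_FRACTION:
        raise ValueError("fraction must be greater than 0 and at most 0.9")
    return math.floor(capacity_ops * fraction)
```

With 20 operations/second the capacity over one minute is 1200 operations, so the 800 operations/minute in the example is about 67% of capacity; likewise, 100000 operations/hour is about 83% of the 120000 operations/hour that a server carrying 2000 operations/minute can handle. Both stay within the 90% bound.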

Specifically, in step S5, the preset duration is proportional to the maximum load of the server and is not greater than 24 hours.

Specifically, in step S1, the test scene is a rectangular interface including at least one button, and the size of the button can be recognized by the machine.

Simulating the usage scenario effectively improves the reliability of identifying machine operation and further improves the stability of malicious traffic identification.

Specifically, in step S2, if the mouse pointer jumps from the departure point to the target point in a single operation without passing through any path, the server determines that the single operation is a machine operation.

By detecting operations peculiar to a machine-operated mouse, mouse paths that cannot be produced by manual operation are identified, which effectively improves identification efficiency and further improves the stability of malicious traffic identification.
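One way to detect such a jump is sketched below. The text does not say how a jump is recognized; this sketch assumes the pointer is sampled as a sequence of `(x, y)` positions and treats any gap between consecutive samples larger than a hypothetical `max_step_px` as a jump. The name `is_pointer_jump` is likewise an assumption.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x_px, y_px) pointer sample


def is_pointer_jump(points: Sequence[Point], max_step_px: float) -> bool:
    """Return True if any two consecutive samples are more than
    `max_step_px` apart, i.e. the pointer skipped part of the path."""
    if len(points) < 2:
        raise ValueError("at least two samples are needed")
    return any(
        math.dist(points[i], points[i + 1]) > max_step_px
        for i in range(len(points) - 1)
    )
```

A trace that contains only the departure point and the target point, far apart, is flagged, whereas a densely sampled trace along the same line is not. A suitable `max_step_px` would depend on the sampling rate, which is not specified in the text.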

Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is obviously not limited to these specific embodiments. Those skilled in the art may make equivalent modifications or substitutions to the related technical features without departing from the principle of the present invention, and the technical solutions after such modifications or substitutions will fall within the scope of protection of the present invention.

The foregoing description is only of the preferred embodiments of the invention and is not intended to limit the invention; various modifications and variations of the present invention may be made by a person skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of machine learning to identify malicious traffic, comprising:

2. The method according to claim 1, wherein in the step S2, for the single operation, the corresponding mouse reaction time length is a measured time length for the mouse pointer to reach the target point in a continuous and smooth path from the start point of the preset distance from the target point, and the server compares the measured time length with the machine reaction time length interval to determine the operation tendency of the single operation;

3. The method for identifying malicious traffic by machine learning according to claim 2, wherein in the step S2, the machine reaction duration is a duration interval not less than a manual operation maximum reaction duration and not greater than a machine operation maximum reaction duration;

4. The method for recognizing malicious traffic by machine learning according to claim 3, wherein in the step S3, when judging the movement speed of the mouse, the server decomposes the movement of the mouse into a coordinate system composed of mutually perpendicular coordinate axes, and judges the component speeds thereof in the directions of the single coordinate axes, respectively;

5. The method for recognizing malicious traffic by machine learning according to claim 4, wherein in the step S3, a minimum manual speed change threshold is set in the server, and the preset speed error is a section formed from a negative value of the minimum manual speed change threshold to a positive value of the minimum manual speed change threshold;

6. The method for identifying malicious traffic by machine learning according to claim 5, wherein in the step S4, a preset adjustment ratio is set in the server, and if the server determines that a single operation corresponding to a manual operation is a machine operation, the server increases the maximum reaction duration of the manual operation by the preset ratio;

wherein the preset ratio is proportional to the sensitivity of the mouse.

7. The method for recognizing malicious traffic by machine learning according to claim 6, wherein in said step S5, a threshold number of times of reproduction is set in said server, and the server compares said number of times of reproduction with said threshold number of times of reproduction to determine a category of said machine traffic,

8. The method of machine learning to identify malicious traffic of claim 7, wherein in the step S5, the preset duration is proportional to a maximum load of the server and is equal to or less than 24 hours.

9. The method of machine learning to identify malicious traffic of claim 8, wherein in said step S1, said test scenario is a rectangular interface comprising at least one button, and the size of the button is machine identifiable.

10. The method of machine learning to identify malicious traffic as claimed in claim 9, wherein in said step S2, if said mouse pointer jumps from said departure point to said destination point without going through any path in said single operation, said server determines that the single operation is said machine operation.