CN117201078A - Malicious traffic detection method, malicious traffic detection device, computer equipment and storage medium - Google Patents

Info

Publication number
CN117201078A
Authority
CN
China
Prior art keywords
string
binary
substring
sub
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310990231.0A
Other languages
Chinese (zh)
Inventor
张�浩
卜绪萌
侯鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN202310990231.0A priority Critical patent/CN117201078A/en
Publication of CN117201078A publication Critical patent/CN117201078A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a malicious traffic detection method, a malicious traffic detection device, computer equipment and a storage medium. The method comprises the following steps: acquiring a plurality of binary character strings corresponding to malicious traffic respectively, and determining a first sub-character string with the occurrence frequency larger than a frequency threshold value from the binary character strings; for each binary string, searching similar sub-strings of the first sub-string from the binary string; determining a segmentation position from similar substrings of the first substring, and segmenting a binary character string to which the similar substring belongs at the segmentation position to obtain a segmented substring; selecting a second sub-string from the split sub-strings according to the occurrence frequency of the split sub-strings in the binary strings; a set of malicious strings is obtained based on the first substring and the second substring. By traversing each binary character string, the character string carrying the malicious traffic characteristics can be screened out more accurately, and the accuracy of malicious traffic identification is improved.

Description

Malicious traffic detection method, malicious traffic detection device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for detecting malicious traffic, a computer device, and a storage medium.
Background
With the development of computer technology, malicious traffic may endanger the security of a financial network during online financial transactions. To discover malicious traffic in the financial network in time, methods have been proposed that identify and judge data traffic according to its features.
In the prior art, feature extraction of data traffic is usually performed by a neural network. For data traffic whose features are not sufficiently distinct, however, the neural network cannot extract the features accurately, so the identification accuracy for malicious traffic is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a malicious traffic detection method, apparatus, computer device, and storage medium that can improve the accuracy of malicious traffic identification.
In a first aspect, the present application provides a malicious traffic detection method. The method comprises the following steps:
acquiring a plurality of binary character strings corresponding to malicious traffic respectively, and determining a first sub-character string with the occurrence frequency larger than a frequency threshold value from the binary character strings;
for each binary string, searching similar sub-strings of the first sub-string from the binary string;
determining a segmentation position from similar substrings of the first substring, and segmenting a binary character string to which the similar substring belongs at the segmentation position to obtain a segmented substring;
selecting a second sub-string from the split sub-strings according to the occurrence frequency of the split sub-strings in the binary strings;
obtaining a malicious character string set based on the first sub-character string and the second sub-character string; and the malicious character string set is used for detecting malicious traffic.
In one embodiment, determining, from the binary character strings, the first sub-character string whose occurrence frequency is greater than the frequency threshold comprises:
splitting each binary character string to obtain a plurality of first candidate sub character strings;
counting the occurrence frequency of each first candidate substring in the binary character strings;
and selecting, from the first candidate substrings, the first candidate substring whose occurrence frequency is greater than the frequency threshold as the first substring.
In one embodiment, the splitting each binary string to obtain a plurality of first candidate substrings includes:
counting, for each position of each binary character string, the number of binary character strings whose character at that position is a preset character, to obtain the number of character strings corresponding to the position;
selecting, from the positions, the positions whose number of character strings is smaller than the preset number, to obtain target positions;
for each binary character string, segmenting the binary character string at the target position to obtain a plurality of first candidate sub-character strings; the character at the target position is not included in any first candidate substring.
In one embodiment, selecting, from the first candidate substrings, a first candidate substring whose occurrence frequency is greater than the frequency threshold as the first substring includes:
determining the character length of each first candidate substring;
selecting candidate substrings with character length larger than or equal to a length threshold value from the first candidate substrings as second candidate substrings;
and selecting, from the second candidate substrings, the second candidate substring whose occurrence frequency is greater than the frequency threshold as the first substring.
In one embodiment, searching for similar substrings of the first substring from the binary string comprises:
determining the position and character length of the first substring;
determining similar substrings of the first substring from the binary string based on the position and the character length of the first substring; wherein the similar substrings of the first substring are identical to the first substring in position and length.
In one embodiment, determining the segmentation position from the similar substrings of the first substring comprises:
comparing the first substring with characters at the same position in the similar substring;
and determining the position of inconsistent character comparison results as a segmentation position.
In a second aspect, the application further provides a malicious traffic detection device. The device comprises:
the first string acquisition module is used for acquiring binary character strings corresponding to a plurality of malicious traffic respectively, and determining a first sub-character string with the occurrence frequency larger than a frequency threshold value from the binary character strings;
a similar string obtaining module, configured to find, for each binary string, a similar sub-string of the first sub-string from the binary strings;
the segmentation string acquisition module is used for determining segmentation positions from similar substrings of the first substring, and segmenting binary character strings to which the similar substrings belong at the segmentation positions to obtain segmentation substrings;
the second string acquisition module is used for selecting a second sub-string from the segmentation sub-strings according to the occurrence frequency of the segmentation sub-strings in the binary strings;
the malicious string acquisition module is used for acquiring a malicious string set based on the first sub-string and the second sub-string; and the malicious character string set is used for detecting malicious traffic.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a plurality of binary character strings corresponding to malicious traffic respectively, and determining a first sub-character string with the occurrence frequency larger than a frequency threshold value from the binary character strings;
for each binary string, searching similar sub-strings of the first sub-string from the binary string;
determining a segmentation position from similar substrings of the first substring, and segmenting a binary character string to which the similar substring belongs at the segmentation position to obtain a segmented substring;
selecting a second sub-string from the split sub-strings according to the occurrence frequency of the split sub-strings in the binary strings;
obtaining a malicious character string set based on the first sub-character string and the second sub-character string; and the malicious character string set is used for detecting malicious traffic.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a plurality of binary character strings corresponding to malicious traffic respectively, and determining a first sub-character string with the occurrence frequency larger than a frequency threshold value from the binary character strings;
for each binary string, searching similar sub-strings of the first sub-string from the binary string;
determining a segmentation position from similar substrings of the first substring, and segmenting a binary character string to which the similar substring belongs at the segmentation position to obtain a segmented substring;
selecting a second sub-string from the split sub-strings according to the occurrence frequency of the split sub-strings in the binary strings;
obtaining a malicious character string set based on the first sub-character string and the second sub-character string; and the malicious character string set is used for detecting malicious traffic.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring a plurality of binary character strings corresponding to malicious traffic respectively, and determining a first sub-character string with the occurrence frequency larger than a frequency threshold value from the binary character strings;
for each binary string, searching similar sub-strings of the first sub-string from the binary string;
determining a segmentation position from similar substrings of the first substring, and segmenting a binary character string to which the similar substring belongs at the segmentation position to obtain a segmented substring;
selecting a second sub-string from the split sub-strings according to the occurrence frequency of the split sub-strings in the binary strings;
obtaining a malicious character string set based on the first sub-character string and the second sub-character string; and the malicious character string set is used for detecting malicious traffic.
According to the malicious traffic detection method, the malicious traffic detection device, the computer equipment and the storage medium, the first sub-character string is determined from the binary character strings according to the binary character strings corresponding to the acquired malicious traffic, the second sub-character string is further cut from the binary character strings based on the first sub-character string, and finally a malicious character string set is obtained according to the first sub-character string and the second sub-character string, so that the malicious character string set comprises more frequently-occurring character strings in the malicious traffic. Because the malicious character string set is used for detecting malicious traffic, the malicious traffic can be accurately identified based on the malicious character string set, and the accuracy of identifying the malicious traffic is improved.
Drawings
Fig. 1 is an application environment diagram of a malicious traffic detection method provided in this embodiment;
Fig. 2 is a flow chart of a first malicious traffic detection method according to the present embodiment;
Fig. 3 is a flowchart illustrating a step of determining a first sub-character string according to the present embodiment;
Fig. 4 is a flow chart of a second malicious traffic detection method according to the present embodiment;
Fig. 5 is a block diagram of a first malicious traffic detection device according to the present embodiment;
Fig. 6 is a block diagram of a second malicious traffic detection device according to the present embodiment;
Fig. 7 is a block diagram of a third malicious traffic detection device according to the present embodiment;
Fig. 8 is an internal structure diagram of a computer device according to the present embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The malicious traffic detection method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in FIG. 1. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing malicious traffic and malicious character string set data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a malicious traffic detection method.
In one embodiment, as shown in Fig. 2, a malicious traffic detection method is provided. The method is described by taking its application to the server in Fig. 1 as an example, and includes the following steps:
S201, acquiring binary character strings respectively corresponding to a plurality of malicious traffic, and determining, from the binary character strings, a first sub-character string whose occurrence frequency is greater than a frequency threshold.
The malicious traffic may be data traffic with malicious behavior in the network traffic, and optionally, the malicious behavior may include a network attack, a service attack, a malicious crawler, and the like.
Wherein the binary string may be malicious traffic in binary form.
Wherein the frequency of occurrence is used to characterize the number of times the first substring occurs in each binary string.
Optionally, the server acquires a plurality of historically intercepted malicious traffic items and prunes each of them, that is, removes invalid data (such as symbols) from the malicious traffic, and then converts the pruned malicious traffic to binary to obtain the binary character string corresponding to each malicious traffic item. The server randomly segments the binary character strings to obtain a plurality of first candidate sub-character strings, determines the number of occurrences of each first candidate sub-character string in the binary character strings, calculates the difference between the number of occurrences and the number of binary character strings, and takes the difference as the occurrence frequency of the first candidate sub-character string. From all the first candidate sub-character strings, the server selects those whose occurrence frequency is greater than the frequency threshold as the first sub-character strings.
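The pruning and binary conversion in S201 could be sketched as follows. This is only an illustration: the function names `prune` and `to_binary_string`, the choice of 8 bits per byte, and the reading of "invalid data" as non-alphanumeric ASCII bytes are assumptions, since the application does not specify them.

```python
def prune(payload: bytes) -> bytes:
    """Drop bytes treated as invalid data.

    Assumption: "invalid data (such as symbols)" is read here as any byte
    that is not an ASCII letter or digit.
    """
    return bytes(b for b in payload if bytes([b]).isalnum())


def to_binary_string(payload: bytes) -> str:
    """Render each byte as 8 bits, most significant bit first (assumed layout)."""
    return "".join(format(byte, "08b") for byte in payload)


if __name__ == "__main__":
    raw = b"A!"  # hypothetical intercepted payload
    print(to_binary_string(prune(raw)))
```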
S202, for each binary string, searching for similar substrings of the first substring in the binary string.
Wherein the similar substring may be a string, in each binary string, that has characters in common with the first substring. Preferably, the similar substring of the first substring may have the same position and the same length as the first substring.
Optionally, for each binary string, there are various ways to search for the similar substring of the first substring from the binary strings, which the present application is not limited to.
One alternative is, for each binary string, to find in the binary string a string that has characters in common with a first sub-string and, when the number of common characters is greater than the character number threshold, to take that string as a similar sub-string.
Alternatively, the position and character length of the first substring may be determined; a similar substring of the first substring is determined from the binary strings based on the position and character length of the first substring.
Specifically, for each first sub-string, the server uses the character length of the first sub-string and the position information of its first character to locate the same position in each binary string and, starting from that position, selects a string with the same character length as the first sub-string as a same-length string. The server then determines the number of positions at which the first sub-string and the same-length string have different characters, and takes each same-length string for which this number is smaller than the character number threshold as a similar sub-string.
Preferably, in order to better detect malicious traffic in data traffic, the number of strings containing different malicious traffic features in the malicious string set should be increased as much as possible. Therefore, provided that the strings containing the malicious traffic features remain long enough for accurate malicious traffic detection, preferably only a same-length string that differs from the first sub-string at a single character position is selected as a similar sub-string.
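The position-and-length search described above can be sketched as a windowed comparison. The names `count_differences` and `find_similar`, the 0-based `start` index, and the decision to skip windows identical to the first sub-string (they offer no segmentation position) are assumptions made for this illustration.

```python
def count_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_similar(first: str, start: int, binary_strings: list[str],
                 diff_threshold: int = 2) -> list[str]:
    """Return same-position, same-length windows that differ from `first`
    in fewer than `diff_threshold` characters.

    `start` is 0-based here (an assumption of this sketch).
    """
    similar = []
    for s in binary_strings:
        window = s[start:start + len(first)]
        if len(window) != len(first):
            continue  # binary string too short to hold a same-length window
        diffs = count_differences(first, window)
        # Assumption: identical windows are skipped; only near matches count.
        if 0 < diffs < diff_threshold:
            similar.append(window)
    return similar


if __name__ == "__main__":
    strings = ["110101", "110111", "111010", "11"]  # hypothetical data
    print(find_similar("0101", 2, strings))
```

With the default `diff_threshold` of 2, only windows that differ at exactly one position are kept, which corresponds to the preferred case in the text.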
S203, determining a segmentation position from the similar substrings of the first substring, and segmenting the binary character string of the similar substring at the segmentation position to obtain a segmented substring.
Optionally, the manner of determining the segmentation position from the similar substring of the first substring may be that the server uses any position in the similar substring of the first substring as the segmentation position, or may also compare the first substring with a character in the same position in the similar substring; and determining the position of inconsistent character comparison results as a segmentation position. Specifically, the server compares the first sub-string with the characters at the same position in the similar sub-string, selects different positions of the characters from the similar sub-string, and determines the different positions of the characters as segmentation positions.
Optionally, after determining the segmentation position in the similar substring, the similar substring is segmented at the segmentation position to obtain at least two segmented substrings.
Illustratively, "#X" indicates the position, in the binary string to which a string belongs, of the first character of that string. Assume that a first substring is "010000100010010101#20" and a similar substring is "010000100110010101#20". Only the character at the 29th bit differs between the first substring and the similar substring, so the 29th bit is used as the segmentation position, and the similar substring is segmented to obtain two segmented substrings, "010000100#20" and "10010101#30".
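The example above can be reproduced with a short sketch. The function name `split_at_mismatches` and the representation of "#X" as a separate integer are assumptions; as in the example, the character at the segmentation position is dropped from both pieces.

```python
def split_at_mismatches(first: str, similar: str,
                        start: int) -> list[tuple[str, int]]:
    """Split `similar` at every position where it differs from `first`.

    Returns (piece, position) pairs, where position plays the role of "#X".
    Assumes both strings have the same length and the same start position.
    """
    pieces = []
    piece_start = 0
    for i, (a, b) in enumerate(zip(first, similar)):
        if a != b:
            if i > piece_start:
                pieces.append((similar[piece_start:i], start + piece_start))
            piece_start = i + 1  # skip the mismatching character itself
    if piece_start < len(similar):
        pieces.append((similar[piece_start:], start + piece_start))
    return pieces


if __name__ == "__main__":
    first = "010000100010010101"
    similar = "010000100110010101"
    for piece, position in split_at_mismatches(first, similar, 20):
        print(f"{piece}#{position}")
    # prints 010000100#20 and 10010101#30
```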
S204, selecting a second sub-string from the segmentation sub-strings according to the occurrence frequency of the segmentation sub-strings in the binary strings.
Optionally, for each segmentation sub-string, the server determines the number of occurrences of the segmentation sub-string at the same position in the binary strings, calculates the difference between the number of occurrences and the number of binary strings, and takes the difference as the occurrence frequency of the segmentation sub-string. From all the segmentation sub-strings, the server selects those whose occurrence frequency is greater than the frequency threshold as second sub-strings.
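A possible sketch of this selection step follows. It deliberately simplifies the frequency measure: the raw number of same-position occurrences is used as the frequency, rather than the difference described in the text. The name `select_second_substrings` and the 0-based positions are likewise assumptions.

```python
def select_second_substrings(pieces: list[tuple[str, int]],
                             binary_strings: list[str],
                             frequency_threshold: int) -> list[tuple[str, int]]:
    """Keep the (piece, position) pairs that occur at the same position in
    more than `frequency_threshold` binary strings.

    Assumption: the occurrence count itself is treated as the frequency.
    """
    selected = []
    for text, start in pieces:
        end = start + len(text)
        count = sum(s[start:end] == text for s in binary_strings)
        if count > frequency_threshold:
            selected.append((text, start))
    return selected


if __name__ == "__main__":
    strings = ["0111", "0100", "0110"]   # hypothetical binary strings
    pieces = [("01", 0), ("11", 2)]      # hypothetical segmentation sub-strings
    print(select_second_substrings(pieces, strings, frequency_threshold=1))
```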
S205, obtaining a malicious character string set based on the first sub-character string and the second sub-character string.
Wherein, the malicious character string set can be a set of character strings containing malicious traffic characteristics. Alternatively, a set of malicious strings may be used for malicious traffic detection.
Optionally, the server constructs a character string set, adds the first sub-character string and the second sub-character string into the character string set, updates the character string set, and takes the updated character string set as a malicious character string set.
When the server detects a malicious traffic detection request, it converts the acquired data traffic to be detected to binary to obtain a data traffic binary string, compares the data traffic binary string with each malicious string in the malicious string set, selects each data traffic binary string that contains a malicious string, and determines the data traffic corresponding to that binary string to be malicious traffic.
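The detection step can be sketched as a containment check against the malicious string set. The names `to_binary_string` and `is_malicious`, the 8-bits-per-byte conversion, and the use of plain substring containment (ignoring any "#X" position information) are assumptions for illustration.

```python
def to_binary_string(payload: bytes) -> str:
    """Render each byte as 8 bits, most significant bit first (assumed layout)."""
    return "".join(format(byte, "08b") for byte in payload)


def is_malicious(payload: bytes, malicious_set: set[str]) -> bool:
    """Flag traffic whose binary string contains any malicious string."""
    bits = to_binary_string(payload)
    return any(pattern in bits for pattern in malicious_set)


if __name__ == "__main__":
    malicious_set = {"01000001"}  # hypothetical malicious string set
    print(is_malicious(b"xAy", malicious_set))
    print(is_malicious(b"\xff\xff", malicious_set))
```

Because the check is a simple scan over a fixed set of strings, it needs no model inference at detection time, which is in line with the efficiency argument made for the method.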
According to the malicious traffic detection method, the first sub-strings are determined from the binary strings according to the binary strings corresponding to the acquired malicious traffic, the second sub-strings are further separated from the binary strings based on the first sub-strings, and finally a malicious string set is obtained according to the first sub-strings and the second sub-strings, so that the malicious string set comprises more frequently-occurring strings in the malicious traffic. Because the malicious character string set is used for detecting malicious traffic, the malicious traffic can be accurately identified based on the malicious character string set, and the accuracy of identifying the malicious traffic is improved. In addition, as the malicious character string set is used for detecting malicious traffic, the malicious traffic can be rapidly identified through the malicious character string set, the time consumed for identifying the malicious traffic is reduced, and the efficiency of identifying the malicious traffic is improved.
Fig. 3 is a flow chart illustrating a first sub-string determination process according to an embodiment. In order to ensure the accuracy of the obtained first sub-string, on the basis of the above embodiment, the present embodiment provides an optional manner for determining the first sub-string in detail, which includes the following steps:
S301, segmenting each binary character string to obtain a plurality of first candidate substrings.
Optionally, each binary string may be split into a plurality of first candidate sub-strings in various ways. One way is to split each binary string directly at random positions to obtain at least two first candidate sub-strings. Another way is to count, for each position of each binary string, the number of binary strings whose character at that position is a preset character, to obtain the number of strings corresponding to the position; to select, from the positions, those whose number of strings is smaller than the preset number, to obtain target positions; and, for each binary string, to segment the binary string at the target positions to obtain a plurality of first candidate sub-strings, wherein no first candidate sub-string includes the character at a target position. Specifically, for each position in the binary strings, the server traverses the characters at that position, counts the number of binary strings whose character at that position is the preset character, takes that count as the number of strings corresponding to the position, and takes each position whose number of strings is smaller than the preset number as a target position. For each target position, the server then segments each binary string using the target position as a segmentation mark, so that each binary string yields at least two first candidate sub-strings.
For example, assume that the preset character is 0 and the preset number is 2, and that binary character string 1 is "010101", binary character string 2 is "000000", binary character string 3 is "111111", binary character string 4 is "101010", and binary character string 5 is "011011". For the first character in each binary character string, the first character of binary character string 1, binary character string 2, and binary character string 5 is 0, so the number of character strings of the first character is 3; this is not less than the preset number, so the first character is not the target position. Similarly, for the second character in each binary character string, the number of character strings of the second character is 2, which is not less than the preset number, so the second character is not the target position. For the third character in each binary character string, the number of character strings of the third character is 2, which is not less than the preset number, so the third character is not the target position. For the fourth character in each binary character string, the number of character strings of the fourth character is 3, which is not less than the preset number, so the fourth character is not the target position. For the fifth character in each binary character string, the number of character strings of the fifth character is 1, which is smaller than the preset number, so the fifth character is a target position. For the sixth character in each binary character string, the number of character strings of the sixth character is 2, which is not less than the preset number, so the sixth character is not the target position. In summary, only the fifth character is the target position. Each binary character string is therefore segmented at the fifth character: the first to fourth characters are retained as one first candidate substring, and the sixth character is retained as another first candidate substring.
It should be noted that, since the target positions in each binary string are the same, the same first candidate sub-strings may exist, and therefore, in this embodiment, after determining the first candidate sub-strings, the deduplication process may be performed on all the first candidate sub-strings, that is, only one candidate is reserved for a plurality of identical first candidate sub-strings.
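The position counting, segmentation and deduplication steps could be sketched as below. The function names, the 0-based positions, and the input strings are hypothetical and chosen only for this illustration; they are not the strings of the example above. Each candidate is kept together with its start position so that later steps can compare it at the same position.

```python
def target_positions(binary_strings: list[str], preset_char: str,
                     preset_count: int) -> list[int]:
    """Positions at which fewer than `preset_count` strings hold `preset_char`."""
    length = min(len(s) for s in binary_strings)
    return [
        i for i in range(length)
        if sum(s[i] == preset_char for s in binary_strings) < preset_count
    ]


def split_at_targets(s: str, targets: list[int]) -> list[tuple[str, int]]:
    """Cut `s` at each target position, dropping the character found there."""
    pieces, start = [], 0
    for t in sorted(targets):
        if t > start:
            pieces.append((s[start:t], start))
        start = t + 1
    if start < len(s):
        pieces.append((s[start:], start))
    return pieces


def first_candidates(binary_strings: list[str], preset_char: str = "0",
                     preset_count: int = 2) -> list[tuple[str, int]]:
    """Split every string at the shared target positions and deduplicate."""
    targets = target_positions(binary_strings, preset_char, preset_count)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique = dict.fromkeys(
        piece for s in binary_strings for piece in split_at_targets(s, targets)
    )
    return list(unique)


if __name__ == "__main__":
    strings = ["00100", "01100", "00110", "10101"]  # hypothetical data
    print(target_positions(strings, "0", 2))
    print(first_candidates(strings))
```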
S302, counting occurrence frequencies of the first candidate substrings in the binary character strings respectively.
Alternatively, for each first candidate sub-string, the server may determine a partial string having the same position as each character in the first candidate sub-string from among the binary strings, count the number of occurrences of each first candidate sub-string in each binary string, calculate a difference between the number of occurrences and the number of binary strings, and use the difference as the frequency of occurrence of the first candidate sub-string.
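A minimal sketch of the same-position counting is given below. It stops at the raw number of occurrences and leaves the subsequent difference calculation aside; the name `occurrence_counts`, the `(text, start)` candidate representation and the 0-based positions are assumptions of this illustration.

```python
def occurrence_counts(candidates: list[tuple[str, int]],
                      binary_strings: list[str]) -> dict[tuple[str, int], int]:
    """For each (text, start) candidate, count the binary strings that
    contain `text` at exactly position `start`."""
    counts = {}
    for text, start in candidates:
        end = start + len(text)
        counts[(text, start)] = sum(s[start:end] == text for s in binary_strings)
    return counts


if __name__ == "__main__":
    strings = ["00100", "01100", "00110", "10101"]  # hypothetical data
    candidates = [("00", 0), ("10", 3)]             # hypothetical candidates
    print(occurrence_counts(candidates, strings))
```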
S303, selecting, from the first candidate substrings, a first candidate substring whose occurrence frequency is greater than the frequency threshold as the first substring.
Optionally, the server may directly select, from all the first candidate substrings, those whose occurrence frequency is greater than the frequency threshold as first substrings. Alternatively, the server may determine the character length of each first candidate substring; select, from the first candidate substrings, the candidate substrings whose character length is greater than or equal to a length threshold as second candidate substrings; and select, from the second candidate substrings, those whose occurrence frequency is greater than the frequency threshold as first substrings. Specifically, the server determines the character length of each first candidate sub-string, selects the candidate sub-strings whose character length is greater than or equal to the length threshold as second candidate sub-strings, determines the occurrence frequency of each second candidate sub-string, and selects, from all the second candidate sub-strings, those whose occurrence frequency is greater than the frequency threshold as first sub-strings.
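The two-stage filter (length first, then frequency) can be sketched as follows. The name `select_first_substrings`, the mapping from `(text, start)` candidates to precomputed frequencies, and the sample values are assumptions made for illustration.

```python
def select_first_substrings(frequencies: dict[tuple[str, int], int],
                            length_threshold: int,
                            frequency_threshold: int) -> list[tuple[str, int]]:
    """Keep candidates that are long enough, then keep the frequent ones.

    `frequencies` maps (text, start) candidates to their occurrence frequency,
    however that frequency was computed.
    """
    # Stage 1: second candidates are those meeting the length threshold.
    second_candidates = {
        key: freq for key, freq in frequencies.items()
        if len(key[0]) >= length_threshold
    }
    # Stage 2: first substrings are second candidates above the frequency threshold.
    return [
        key for key, freq in second_candidates.items()
        if freq > frequency_threshold
    ]


if __name__ == "__main__":
    freqs = {("0101", 0): 3, ("01", 4): 5, ("1111", 6): 1}  # hypothetical values
    print(select_first_substrings(freqs, length_threshold=4, frequency_threshold=2))
```

Filtering by length before frequency discards short candidates, which may be frequent in any traffic and therefore carry little distinguishing information.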
In the above method for determining the first substring, the first candidate substrings are obtained by segmenting the binary strings, and the first substring is determined according to the occurrence frequency of the first candidate substrings in the binary strings.
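The selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the names `Candidate` and `select_first_substrings` are hypothetical, a candidate is modelled as a start offset plus its characters, and the raw count of binary strings containing the candidate at the same position is compared against the threshold directly, rather than a frequency derived from that count and the number of binary strings.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    start: int  # offset of the candidate inside its binary string
    text: str   # the characters of the candidate


def select_first_substrings(
    candidates: List[Candidate],
    binary_strings: List[str],
    count_threshold: int,
    length_threshold: int = 1,
) -> List[Candidate]:
    """Keep candidates that are long enough and occur often enough."""
    # Length filter: the survivors play the role of second candidate substrings.
    long_enough = {c for c in candidates if len(c.text) >= length_threshold}
    selected = []
    for cand in long_enough:
        end = cand.start + len(cand.text)
        # Position-aligned count: compare the slice at the same offsets
        # in every binary string (assumed reading of the text).
        count = sum(1 for s in binary_strings if s[cand.start:end] == cand.text)
        if count > count_threshold:
            selected.append(cand)
    return sorted(selected)
```

For example, with the binary strings `"110100"`, `"110111"` and `"110001"`, the candidate `Candidate(0, "110")` is found at offset 0 in all three strings, so it is kept for any count threshold below 3.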
This embodiment provides an optional implementation of malicious traffic detection, described by taking the method as applied to a server as an example. As shown in fig. 4, the method comprises the steps of:
S401, acquiring a plurality of binary character strings respectively corresponding to malicious traffic and, for each position of the binary character strings, counting the number of binary character strings whose character at that position is a preset character, to obtain the number of character strings corresponding to that position.
S402, selecting positions with the number of character strings smaller than the preset number from the positions to obtain target positions.
S403, aiming at each binary character string, segmenting the binary character string at a target position to obtain a plurality of first candidate sub-character strings; the character string at the target position is not included in each first candidate substring.
S404, counting occurrence frequencies of the first candidate substrings in the binary character strings respectively.
S405, determining the respective character lengths of the first candidate substrings.
S406, selecting the candidate substring with the character length being greater than or equal to the length threshold value from the first candidate substring as each second candidate substring.
S407, selecting, from the second candidate substrings, a second candidate substring whose occurrence frequency is greater than the frequency threshold as the first substring.
S408, determining the position and character length of the first substring.
S409, determining similar substrings of the first substring from the binary character strings based on the position and the character length of the first substring; wherein the similar substrings of the first substring are identical to the first substring in position and length.
S410, comparing the first substring with the characters at the same positions in the similar substrings.
S411, determining the position of inconsistent character comparison results as a segmentation position, and segmenting the binary character string to which the similar sub-character string belongs at the segmentation position to obtain a segmented sub-character string.
S412, selecting a second sub-string from the segmentation sub-strings according to the occurrence frequency of the segmentation sub-strings in the binary strings.
S413, obtaining a malicious character string set based on the first sub-character string and the second sub-character string; and the malicious character string set is used for detecting malicious traffic.
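Steps S401 to S403 can be illustrated with the following Python sketch. It is a hedged example only: the function names `find_target_positions` and `split_at_positions` are hypothetical, the binary strings are modelled as Python `str` values of `'0'` and `'1'`, and positions beyond the end of a shorter string are simply treated as not holding the preset character.

```python
from typing import List, Tuple


def find_target_positions(
    binary_strings: List[str], preset_char: str, preset_count: int
) -> List[int]:
    """S401-S402: positions where fewer than preset_count strings hold preset_char."""
    max_len = max((len(s) for s in binary_strings), default=0)
    targets = []
    for pos in range(max_len):
        n = sum(1 for s in binary_strings if pos < len(s) and s[pos] == preset_char)
        if n < preset_count:
            targets.append(pos)
    return targets


def split_at_positions(binary_string: str, targets: List[int]) -> List[Tuple[int, str]]:
    """S403: cut at the target positions and drop the characters located there.

    Returns (start offset, text) pairs so that each piece keeps its position.
    """
    pieces = []
    start = 0
    for pos in sorted(targets):
        if pos >= len(binary_string):
            break
        if pos > start:
            pieces.append((start, binary_string[start:pos]))
        start = pos + 1  # skip the character at the target position
    if start < len(binary_string):
        pieces.append((start, binary_string[start:]))
    return pieces
```

With the strings `"11011"`, `"11111"` and `"11011"`, the preset character `'1'` and a preset number of 3, only position 2 is a target position, and `"11011"` is cut into the pieces `"11"` at offset 0 and `"11"` at offset 3. Keeping the offset with each piece is a design choice of this sketch; it makes the later position-based steps (S408, S409) straightforward.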
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of the steps is not strictly limited to that order, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a malicious traffic detection device for implementing the malicious traffic detection method described above. The device solves the problem in a manner similar to that described for the method, so for the specific limitations in the one or more embodiments of the malicious traffic detection device provided below, reference may be made to the limitations of the malicious traffic detection method above, which are not repeated here.
In one embodiment, as shown in fig. 5, there is provided a malicious traffic detection apparatus 1, including: a first string acquisition module 10, a similar string acquisition module 11, a split string acquisition module 12, a second string acquisition module 13, and a malicious string acquisition module 14, wherein:
a first string obtaining module 10, configured to obtain binary strings corresponding to a plurality of malicious traffic, and determine a first sub-string whose occurrence frequency is greater than a frequency threshold from the binary strings;
a similar string obtaining module 11, configured to find, for each binary string, a similar sub-string of the first sub-string from the binary strings;
the segmentation string acquisition module 12 is configured to determine a segmentation position from similar substrings of the first substring, segment a binary string to which the similar substring belongs at the segmentation position, and obtain a segmentation substring;
a second string obtaining module 13, configured to select a second sub-string from the split sub-strings according to occurrence frequencies of the split sub-strings in the binary strings;
a malicious string obtaining module 14, configured to obtain a malicious string set based on the first sub-string and the second sub-string; and the malicious character string set is used for detecting malicious traffic.
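The work of the second string obtaining module and the malicious string obtaining module can be sketched in Python as below. Several points are assumptions of this sketch rather than statements of the application: the function names are hypothetical, a segmentation substring is counted once for every binary string that contains it anywhere, the count is compared with the threshold directly, and traffic is flagged when its binary string contains any member of the malicious string set.

```python
from typing import Iterable, List, Set


def select_second_substrings(
    split_substrings: Iterable[str], binary_strings: List[str], count_threshold: int
) -> Set[str]:
    """Keep segmentation substrings found in more than count_threshold binary strings."""
    selected = set()
    for piece in set(split_substrings):
        if not piece:
            continue  # an empty piece would match every string
        count = sum(1 for s in binary_strings if piece in s)
        if count > count_threshold:
            selected.add(piece)
    return selected


def build_malicious_set(first: Iterable[str], second: Iterable[str]) -> Set[str]:
    """Combine the first and second substrings into one malicious string set."""
    return set(first) | set(second)


def looks_malicious(traffic_bits: str, malicious_set: Set[str]) -> bool:
    """Assumed detection rule: flag traffic that contains any string in the set."""
    return any(sig in traffic_bits for sig in malicious_set)
```

A containment test is the simplest way to use such a set; a deployment handling many signatures would more likely rely on a multi-pattern matching structure, which is outside the scope of this sketch.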
In one embodiment, as shown in fig. 6, the first string acquisition module 10 in fig. 5 includes:
a candidate string obtaining unit 100, configured to segment each binary string to obtain a plurality of first candidate sub-strings;
a frequency statistics unit 101, configured to count occurrence frequencies of the first candidate substrings in the binary strings, respectively;
the first substring determining unit 102 is configured to select, from the first candidate substrings, a first candidate substring whose occurrence frequency is greater than a frequency threshold as the first substring.
In one embodiment, the candidate string acquisition unit 100 in fig. 6 includes:
the number acquisition subunit is used for counting the number of binary character strings with characters at the positions being preset characters according to each position of each binary character string to obtain the number of character strings corresponding to the positions;
a target position determining subunit, configured to select positions with the number of character strings smaller than a preset number from the positions, so as to obtain target positions;
a candidate substring obtaining subunit, configured to segment, for each binary string, the binary string at a target position, to obtain a plurality of first candidate substrings; the character string at the target position is not included in each first candidate substring.
In one embodiment, the first substring determining unit 102 in fig. 6 includes:
a length determining subunit, configured to determine a respective character length of each first candidate substring;
a second candidate determining subunit, configured to select, from the first candidate substrings, a candidate substring with a character length greater than or equal to a length threshold, as each second candidate substring;
and the first substring determination subunit is used for selecting, from the second candidate substrings, a second candidate substring whose occurrence frequency is greater than the frequency threshold as the first substring.
In one embodiment, as shown in fig. 7, the segmentation string acquisition module 12 in fig. 5 includes:
a position length determining unit 120 for determining a position and a character length of the first substring;
a similar string determining unit 121 for determining similar substrings of the first substring from the binary character strings based on the position and character length of the first substring; wherein the similar substrings of the first substring are identical to the first substring in position and length.
In one embodiment, the segmentation string obtaining module 12 in fig. 5 is specifically configured to compare the first substring with the characters at the same position in the similar substring; and determining the position of inconsistent character comparison results as a segmentation position.
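The comparison performed by the segmentation string acquisition module can be sketched in Python as follows. The function names are hypothetical, and one detail is an assumption: the text does not say in this passage whether the mismatching character itself is kept, so this sketch drops it, mirroring how the character at a target position is excluded from the first candidate substrings.

```python
from typing import List


def similar_substring(binary_string: str, start: int, length: int) -> str:
    """Slice with the same position and length as the first substring."""
    return binary_string[start:start + length]


def segmentation_positions(first_text: str, start: int, binary_string: str) -> List[int]:
    """Absolute positions where the similar substring differs from the first substring."""
    similar = similar_substring(binary_string, start, len(first_text))
    return [start + i for i, (a, b) in enumerate(zip(first_text, similar)) if a != b]


def split_binary_string(binary_string: str, positions: List[int]) -> List[str]:
    """Cut the binary string at the given positions, dropping those characters (assumed)."""
    cuts = [-1] + sorted(p for p in positions if p < len(binary_string)) + [len(binary_string)]
    pieces = [binary_string[a + 1:b] for a, b in zip(cuts, cuts[1:])]
    return [p for p in pieces if p]  # discard empty pieces between adjacent cuts
```

For a first substring `"1101"` located at offset 2, the binary string `"0011110"` yields the similar substring `"1111"`, which differs only at absolute position 4, so the string is cut into `"0011"` and `"10"`.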
The various modules in the malicious traffic detection device described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by a processor, implements a malicious traffic detection method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in FIG. 8 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring a plurality of binary character strings corresponding to malicious traffic respectively, and determining a first sub-character string with the occurrence frequency larger than a frequency threshold value from the binary character strings;
for each binary string, searching similar sub-strings of the first sub-string from the binary string;
determining a segmentation position from similar substrings of the first substring, and segmenting a binary character string to which the similar substring belongs at the segmentation position to obtain a segmented substring;
selecting a second sub-string from the split sub-strings according to the occurrence frequency of the split sub-strings in the binary strings;
obtaining a malicious character string set based on the first sub-character string and the second sub-character string; and the malicious character string set is used for detecting malicious traffic.
In one embodiment, the processor when executing the computer program further performs the steps of:
splitting each binary character string to obtain a plurality of first candidate sub character strings;
counting the occurrence frequency of each first candidate substring in each binary character string;
and selecting the first candidate substring with the occurrence frequency larger than the frequency threshold value from the first candidate substrings as the first substring.
In one embodiment, the processor when executing the computer program further performs the steps of:
counting the number of binary character strings with characters at the positions being preset characters according to each position of each binary character string to obtain the number of character strings corresponding to the positions;
selecting positions with the number of character strings smaller than the preset number from the positions to obtain target positions;
for each binary character string, segmenting the binary character string at a target position to obtain a plurality of first candidate sub-character strings; the character string at the target position is not included in each first candidate substring.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining the character length of each first candidate substring;
selecting candidate substrings with character length larger than or equal to a length threshold value from the first candidate substrings as second candidate substrings;
and selecting the second candidate substring with the occurrence frequency larger than the frequency threshold value from the second candidate substrings as the first substring.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining the position and character length of the first substring;
determining similar substrings of the first substring from the binary string based on the position and the character length of the first substring; wherein the similar substrings of the first substring are identical to the first substring in position and length.
In one embodiment, the processor when executing the computer program further performs the steps of:
comparing the first substring with characters at the same position in the similar substring;
and determining the position of inconsistent character comparison results as a segmentation position.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a plurality of binary character strings corresponding to malicious traffic respectively, and determining a first sub-character string with the occurrence frequency larger than a frequency threshold value from the binary character strings;
for each binary string, searching similar sub-strings of the first sub-string from the binary string;
determining a segmentation position from similar substrings of the first substring, and segmenting a binary character string to which the similar substring belongs at the segmentation position to obtain a segmented substring;
selecting a second sub-string from the split sub-strings according to the occurrence frequency of the split sub-strings in the binary strings;
obtaining a malicious character string set based on the first sub-character string and the second sub-character string; and the malicious character string set is used for detecting malicious traffic.
In one embodiment, the processor when executing the computer program further performs the steps of:
splitting each binary character string to obtain a plurality of first candidate sub character strings;
counting the occurrence frequency of each first candidate substring in each binary character string;
and selecting the first candidate substring with the occurrence frequency larger than the frequency threshold value from the first candidate substrings as the first substring.
In one embodiment, the processor when executing the computer program further performs the steps of:
counting the number of binary character strings with characters at the positions being preset characters according to each position of each binary character string to obtain the number of character strings corresponding to the positions;
selecting positions with the number of character strings smaller than the preset number from the positions to obtain target positions;
for each binary character string, segmenting the binary character string at a target position to obtain a plurality of first candidate sub-character strings; the character string at the target position is not included in each first candidate substring.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining the character length of each first candidate substring;
selecting candidate substrings with character length larger than or equal to a length threshold value from the first candidate substrings as second candidate substrings;
and selecting the second candidate substring with the occurrence frequency larger than the frequency threshold value from the second candidate substrings as the first substring.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining the position and character length of the first substring;
determining similar substrings of the first substring from the binary string based on the position and the character length of the first substring; wherein the similar substrings of the first substring are identical to the first substring in position and length.
In one embodiment, the processor when executing the computer program further performs the steps of:
comparing the first substring with characters at the same position in the similar substring;
and determining the position of inconsistent character comparison results as a segmentation position.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
acquiring a plurality of binary character strings corresponding to malicious traffic respectively, and determining a first sub-character string with the occurrence frequency larger than a frequency threshold value from the binary character strings;
for each binary string, searching similar sub-strings of the first sub-string from the binary string;
determining a segmentation position from similar substrings of the first substring, and segmenting a binary character string to which the similar substring belongs at the segmentation position to obtain a segmented substring;
selecting a second sub-string from the split sub-strings according to the occurrence frequency of the split sub-strings in the binary strings;
obtaining a malicious character string set based on the first sub-character string and the second sub-character string; and the malicious character string set is used for detecting malicious traffic.
In one embodiment, the processor when executing the computer program further performs the steps of:
splitting each binary character string to obtain a plurality of first candidate sub character strings;
counting the occurrence frequency of each first candidate substring in each binary character string;
and selecting the first candidate substring with the occurrence frequency larger than the frequency threshold value from the first candidate substrings as the first substring.
In one embodiment, the processor when executing the computer program further performs the steps of:
counting the number of binary character strings with characters at the positions being preset characters according to each position of each binary character string to obtain the number of character strings corresponding to the positions;
selecting positions with the number of character strings smaller than the preset number from the positions to obtain target positions;
for each binary character string, segmenting the binary character string at a target position to obtain a plurality of first candidate sub-character strings; the character string at the target position is not included in each first candidate substring.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining the character length of each first candidate substring;
selecting candidate substrings with character length larger than or equal to a length threshold value from the first candidate substrings as second candidate substrings;
and selecting the second candidate substring with the occurrence frequency larger than the frequency threshold value from the second candidate substrings as the first substring.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining the position and character length of the first substring;
determining similar substrings of the first substring from the binary string based on the position and the character length of the first substring; wherein the similar substrings of the first substring are identical to the first substring in position and length.
In one embodiment, the processor when executing the computer program further performs the steps of:
comparing the first substring with characters at the same position in the similar substring;
and determining the position of inconsistent character comparison results as a segmentation position.
The data (including, but not limited to, data for analysis, data stored, data displayed, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that all or part of the processes of the above methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the computer program, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, high density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. The volatile memory may include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processor referred to in the embodiments provided in the present application may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (10)

1. A method for detecting malicious traffic, the method comprising:
acquiring a plurality of binary character strings corresponding to malicious traffic respectively, and determining a first sub-character string with occurrence frequency larger than a frequency threshold value from the binary character strings;
for each binary string, searching similar sub-strings of the first sub-string from the binary string;
determining a segmentation position from similar substrings of the first substring, and segmenting a binary character string to which the similar substring belongs at the segmentation position to obtain a segmentation substring;
selecting a second sub-string from the segmentation sub-strings according to the occurrence frequency of the segmentation sub-strings in the binary strings;
obtaining a malicious character string set based on the first sub-character string and the second sub-character string; the malicious character string set is used for detecting malicious traffic.
2. The method of claim 1, wherein said determining a first substring from each of said binary strings having a frequency of occurrence greater than a frequency threshold comprises:
dividing each binary character string to obtain a plurality of first candidate substrings;
counting the occurrence frequency of each first candidate substring in each binary character string;
and selecting the first candidate substring with the occurrence frequency larger than a frequency threshold value from the first candidate substrings as the first substring.
3. The method of claim 2, wherein the splitting each of the binary strings to obtain a plurality of first candidate substrings comprises:
counting the number of the binary character strings with the characters at the positions being preset characters according to each position of each binary character string to obtain the number of the character strings corresponding to the positions;
selecting positions with the number of the character strings smaller than the preset number from the positions to obtain target positions;
for each binary character string, segmenting the binary character string at the target position to obtain a plurality of first candidate substrings; the character string at the target position is not included in each of the first candidate substrings.
4. The method of claim 2, wherein selecting a first candidate substring from the first candidate substrings having an occurrence frequency greater than a frequency threshold value as the first substring comprises:
determining the respective character length of each first candidate substring;
selecting the candidate substrings with the character length larger than or equal to a length threshold value from the first candidate substrings as second candidate substrings;
and selecting a second candidate substring with the occurrence frequency being greater than a frequency threshold value from the second candidate substrings as a first substring.
5. The method of claim 1, wherein the searching for similar substrings of the first substring from the binary string comprises:
determining the position and character length of the first sub-character string;
determining similar substrings of the first substring from the binary string based on the position and the character length of the first substring; wherein the similar substrings of the first substring are consistent with the first substring in position and length.
6. The method of claim 5, wherein determining a cut location from a similar substring of the first substring comprises:
comparing the first sub-character string with characters at the same position in the similar sub-character string;
and determining the position of inconsistent character comparison results as a segmentation position.
7. A malicious traffic detection apparatus, the apparatus comprising:
the first string acquisition module is used for acquiring a plurality of binary character strings corresponding to malicious traffic respectively, and determining a first sub-character string with the occurrence frequency larger than a frequency threshold value from the binary character strings;
a similar string obtaining module, configured to find, for each binary string, a similar sub-string of the first sub-string from the binary string;
the segmentation string acquisition module is used for determining segmentation positions from similar substrings of the first substring, and segmenting binary character strings of the similar substrings at the segmentation positions to obtain segmentation substrings;
a second string obtaining module, configured to select a second sub-string from the split sub-strings according to occurrence frequencies of the split sub-strings in the binary strings;
the malicious string acquisition module is used for acquiring a malicious string set based on the first sub-string and the second sub-string; the malicious character string set is used for detecting malicious traffic.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202310990231.0A 2023-08-08 2023-08-08 Malicious traffic detection method, malicious traffic detection device, computer equipment and storage medium Pending CN117201078A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310990231.0A CN117201078A (en) 2023-08-08 2023-08-08 Malicious traffic detection method, malicious traffic detection device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310990231.0A CN117201078A (en) 2023-08-08 2023-08-08 Malicious traffic detection method, malicious traffic detection device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117201078A (en) 2023-12-08

Family

ID=88987758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310990231.0A Pending CN117201078A (en) 2023-08-08 2023-08-08 Malicious traffic detection method, malicious traffic detection device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117201078A (en)

Similar Documents

Publication Publication Date Title
WO2019128529A1 (en) Url attack detection method and apparatus, and electronic device
CN110738577B (en) Community discovery method, device, computer equipment and storage medium
CN111339293B (en) Data processing method and device for alarm event and classifying method for alarm event
US11368901B2 (en) Method for identifying a type of a wireless hotspot and a network device thereof
TW201730757A (en) Character string distance calculation method and device
CN112148305A (en) Application detection method and device, computer equipment and readable storage medium
CN110888880A (en) Proximity analysis method, device, equipment and medium based on spatial index
CN113297269A (en) Data query method and device
CN113282799B (en) Node operation method, node operation device, computer equipment and storage medium
CN112347100B (en) Database index optimization method, device, computer equipment and storage medium
CN116226681B (en) Text similarity judging method and device, computer equipment and storage medium
CN116911867A (en) Problem processing method, device, computer equipment and storage medium
CN117201078A (en) Malicious traffic detection method, malicious traffic detection device, computer equipment and storage medium
CN112347477A (en) Family variant malicious file mining method and device
CN112579839B (en) Multi-mode matching method and device for large-scale features and storage medium
CN117851959B (en) FHGS-based dynamic network subgraph anomaly detection method, device and equipment
CN116578583B (en) Abnormal statement identification method, device, equipment and storage medium
CN116257837B (en) Application system login method and device, computer equipment and storage medium
CN115794807A (en) Data updating method, device, equipment, storage medium and computer program product
CN116881116A (en) Interface test method, apparatus, computer device, storage medium, and program product
CN116932677A (en) Address information matching method, device, computer equipment and storage medium
CN115576965A (en) POI database updating method and device, electronic equipment and storage medium
CN116186583A (en) Data processing method, device, computer equipment and storage medium
CN117113342A (en) Application identification method, device, computer equipment and storage medium
CN116187975A (en) Method and device for detecting running state of equipment, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination