CN114338601A - Unknown domain name identification method, computer equipment and storage medium - Google Patents


Info

Publication number
CN114338601A
Authority
CN
China
Prior art keywords
domain name
network access
unknown
determining
terminal
Legal status
Pending
Application number
CN202011062802.7A
Other languages
Chinese (zh)
Inventor
宋科
李华光
刘西亮
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Application filed by ZTE Corp
Priority to CN202011062802.7A
Publication of CN114338601A
Legal status: Pending (current)

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of the invention provide an unknown domain name identification method, computer equipment and a storage medium, in the technical field of the Internet. The method comprises: obtaining network access records of a plurality of user terminals, wherein the network access records comprise domain names and the network access record of at least one user terminal comprises an unknown domain name; determining a domain name associated with the unknown domain name according to the network access records of the plurality of user terminals; and storing and/or outputting the domain name associated with the unknown domain name so that a DPI (deep packet inspection) feature library can be updated according to the associated domain name, the DPI feature library being used to identify the application category of the unknown domain name. The technical solution solves the problem that an unknown domain name cannot be identified because too little information about it is available, and improves the rate at which the DPI feature library identifies unknown domain names.

Description

Unknown domain name identification method, computer equipment and storage medium
Technical Field
The invention relates to the technical field of internet information, in particular to an unknown domain name identification method, computer equipment and a storage medium.
Background
In the mobile Internet, an operator uses DPI (deep packet inspection) devices to identify the protocol or application carried in users' Internet traffic, and builds protocol- or application-based functions such as statistics, QoS, rate limiting, blocking, charging and analysis on top of that identification. In recent years, with the large-scale rollout of 4G, new websites and applications have emerged in great numbers. DPI devices identify website and application traffic based on a protocol/application feature library; if the DPI feature library is not updated in time, the traffic identification rate drops, and telecom operators can no longer process and analyze the new websites and applications in their networks accurately and promptly.
Disclosure of Invention
The embodiments of the invention provide an unknown domain name identification method, computer equipment and a storage medium, with the aim of improving the rate at which a DPI feature library identifies unknown domain names.
In a first aspect, an embodiment of the present invention provides an unknown domain name identification method, including:
obtaining network access records of a plurality of user terminals, wherein the network access records comprise domain names, and the network access record of at least one user terminal comprises an unknown domain name;
determining a domain name associated with the unknown domain name according to the network access records of the plurality of user terminals;
and storing and/or outputting the domain name associated with the unknown domain name so as to update a DPI feature library according to the associated domain name, wherein the DPI feature library is used for identifying the application category of the unknown domain name.
In a second aspect, the embodiment of the present invention further provides a computer device, which includes a processor, a memory, a computer program stored on the memory and executable by the processor, and a data bus for implementing connection communication between the processor and the memory, wherein when the computer program is executed by the processor, the steps of any one of the unknown domain name recognition methods provided in the present specification are implemented.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the steps of any unknown domain name identification method provided in the present specification.
The embodiments of the invention provide an unknown domain name identification method, computer equipment and a storage medium. Network access records of a plurality of user terminals are obtained, wherein the network access records comprise known domain names and the network access record of at least one user terminal comprises the unknown domain name; a known domain name associated with the unknown domain name is determined according to the network access records of the plurality of user terminals; and the known domain name associated with the unknown domain name is stored and/or output so as to update a DPI feature library according to the known domain name, wherein the DPI feature library is used for identifying the application category of the unknown domain name. This solves the problem that an unknown domain name cannot be identified because too little information about it is available, and improves the rate at which the DPI feature library identifies unknown domain names.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an unknown domain name identification method according to an embodiment of the present invention;
Fig. 2 is a schematic block diagram of the structure of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The embodiment of the invention provides an unknown domain name identification method, computer equipment and a storage medium. The unknown domain name identification method can be applied to a mobile terminal, and the mobile terminal can be an electronic device such as a tablet computer, a notebook computer and a desktop computer.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to Fig. 1, Fig. 1 is a flowchart of an unknown domain name identification method according to an embodiment of the present invention.
As shown in Fig. 1, the unknown domain name identification method includes steps S101 to S103.
Step S101, network access records of a plurality of user terminals are obtained, wherein the network access records comprise domain names, and the network access records of at least one user terminal comprise unknown domain names.
Illustratively, network access records are obtained from as many user terminals as possible, for example from hundreds, thousands or even tens of thousands of user terminals.
For example, the network access records of multiple user terminals in one network segment may be obtained, or the network access records of multiple user terminals in multiple network segments may be randomly obtained.
For example, the network access records of multiple user terminals may be obtained according to the load condition, or only the network access records of the user terminals managed by one or more processes may be obtained.
Illustratively, network access records of a plurality of user terminals are obtained over a period of time to increase the number of domain names contained in the network access records.
For example, once the obtained network access record of a user terminal includes the unknown domain name, acquisition of that terminal's network access record is suspended. If the unknown domain name has not been acquired, network access records continue to be acquired for a period of time, until the unknown domain name is acquired or the acquisition time ends.
Illustratively, both the domain names and the unknown domain name are non-primary domain names, i.e. second-level or lower-level domain names and/or sub-domain names.
For example, in the domain name "www.xliemgne.com", "com" is the primary (top-level) domain name, "xliemgne.com" is the second-level domain name, and "www.xliemgne.com" is the third-level domain name, which is also called a sub-domain name of the second-level domain name.
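The domain levels in this example can be derived by splitting a name on its dots and keeping the rightmost labels. The sketch below is illustrative only: the function names are hypothetical, and it assumes naive label splitting, so multi-label public suffixes such as "com.cn" are not recognized as a single primary domain.

```python
def domain_levels(fqdn: str) -> dict[int, str]:
    """Map each domain level (1 = primary/top level) to the name at that level.

    Naive label splitting (an assumption of this sketch): public suffixes
    made of several labels, e.g. "com.cn", are not treated specially.
    """
    labels = fqdn.strip(".").lower().split(".")
    levels = {}
    for level in range(1, len(labels) + 1):
        # Keep the rightmost `level` labels, e.g. level 2 -> "xliemgne.com".
        levels[level] = ".".join(labels[-level:])
    return levels


def is_non_primary(fqdn: str) -> bool:
    """True for second-level or lower names, which the method works with."""
    return len(fqdn.strip(".").split(".")) >= 2
```

For "www.xliemgne.com" this gives level 1 "com", level 2 "xliemgne.com" and level 3 "www.xliemgne.com".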
In some embodiments, obtaining network access records of a plurality of user terminals, the network access records including domain names and the network access record of at least one user terminal including an unknown domain name, includes: obtaining network access records of a plurality of user terminals, wherein the network access records comprise known domain names and the network access record of at least one user terminal comprises an unknown domain name. Correspondingly, determining the domain name associated with the unknown domain name according to the network access records of the plurality of user terminals comprises determining a known domain name associated with the unknown domain name according to those records, and storing and/or outputting the domain name associated with the unknown domain name comprises storing and/or outputting the known domain name associated with the unknown domain name.
Illustratively, a user terminal accesses network addresses when it accesses the network, and the related domain name information can be obtained from those network addresses.
Illustratively, the network access record of the user terminal includes one or more domain names, wherein the domain name may include one or more known domain names and may also include one or more unknown domain names.
In some embodiments, obtaining the network access records of the plurality of user terminals includes: acquiring the domain name information accessed by a user terminal; if the domain name information accessed by the user terminal includes the unknown domain name, generating a corresponding network access record from that domain name information, the network access record of the user terminal then comprising both known domain names and the unknown domain name; and if the domain name information accessed by the user terminal does not include the unknown domain name, generating a corresponding network access record from that domain name information, the network access record then comprising only known domain names.
Illustratively, if a user terminal has accessed only one or more known domain names, the network access record generated for it includes only known domain names.
Illustratively, the network access record of at least one user terminal in the network access records of the plurality of user terminals includes at least one unknown domain name.
Illustratively, the network access record may further include HTTP and HTTPS traffic, from which the HOST domain name and the SNI domain name are extracted, respectively.
For example, the HOST domain name extracted from HTTP is the domain name part of the Host field value in an HTTP request such as GET or POST. For instance, in "GET / HTTP/1.1\r\nHost: www.xliemgne.com:8080\r\n\r\n", the Host field value is "www.xliemgne.com:8080", and its domain name part is "www.xliemgne.com".
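Extracting the HOST domain name amounts to finding the Host header in the request head and dropping any port suffix. This is a minimal sketch assuming a complete, unencrypted HTTP/1.x request is available as bytes; the function name is hypothetical, and IPv6 literal hosts and malformed requests are not handled.

```python
from typing import Optional


def extract_host_domain(request: bytes) -> Optional[str]:
    """Return the domain part of the Host header of a raw HTTP/1.x request.

    Sketch only: handles "name" and "name:port" Host values; IPv6 literals
    and malformed requests are not handled.
    """
    head, _, _ = request.partition(b"\r\n\r\n")   # headers end at the blank line
    lines = head.decode("latin-1").split("\r\n")
    for line in lines[1:]:                        # lines[0] is e.g. "GET / HTTP/1.1"
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":  # header names are case-insensitive
            host = value.strip()
            return host.split(":", 1)[0].lower()  # drop an optional ":port"
    return None
```

Applied to the request in the example, it returns "www.xliemgne.com".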
For example, the SNI domain name extracted by HTTPS means that the domain name in the SNI field value is extracted from the ClientHello message of HTTPS.
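The SNI value sits in the server_name extension of the ClientHello, so extraction means walking the fixed ClientHello layout (version, random, session ID, cipher suites, compression methods) to reach the extension list. The following is a sketch under stated assumptions: the whole ClientHello fits in one TLS record, the function name is hypothetical, and truncated or malformed input simply returns None.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name carried in a TLS ClientHello record, or None.

    Sketch only: assumes the whole ClientHello sits in a single TLS record;
    truncated or malformed input simply yields None.
    """
    try:
        # Record type 0x16 (handshake); handshake type 0x01 (ClientHello).
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4                    # skip record header and handshake header
        pos += 2 + 32                  # client_version + random
        pos += 1 + record[pos]         # session_id (1-byte length prefix)
        (n,) = struct.unpack_from("!H", record, pos)
        pos += 2 + n                   # cipher_suites (2-byte length prefix)
        pos += 1 + record[pos]         # compression_methods (1-byte length prefix)
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:          # server_name extension (RFC 6066)
                p, stop = pos + 2, pos + ext_len   # skip the 2-byte list length
                while p + 3 <= stop:
                    name_type = record[p]
                    (name_len,) = struct.unpack_from("!H", record, p + 1)
                    if name_type == 0:             # 0 = host_name
                        return record[p + 3:p + 3 + name_len].decode("ascii")
                    p += 3 + name_len
                return None
            pos += ext_len             # skip any other extension
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
```

Non-handshake records (for example application data, type 0x17) are rejected on the first byte, so the function can be run over every first packet of a TLS flow.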
Illustratively, the HOST domain name and SNI domain name may be extracted based on access time.
Illustratively, the network access records may be stored and processed in different forms in order to save storage space and memory resources.
In some embodiments, if the obtained network access record of the user terminal exceeds the maximum storage threshold, determining the network access record to be retained according to the obtaining sequence.
For example, the acquisition order may be determined by the access time of the network access records: if network access records have been obtained for 5 consecutive days and the records exceed the maximum storage threshold, the records obtained 5 days ago are replaced with the records obtained on the current day.
For example, the maximum storage threshold may be preset so that the network access record does not occupy too much memory resources.
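Retaining records by acquisition order, with the oldest evicted first once the maximum storage threshold is reached, maps naturally onto a bounded double-ended queue. In this sketch the class name, the record tuple layout and counting the threshold in records are all hypothetical choices.

```python
from collections import deque


class AccessRecordStore:
    """Keep at most `max_records` access records, evicting the oldest first.

    `max_records` stands in for the maximum storage threshold; the class
    name and the (day, terminal, domains) layout are hypothetical.
    """

    def __init__(self, max_records: int) -> None:
        # A deque with maxlen drops items from the left (the oldest)
        # when a new item is appended to a full deque.
        self._records = deque(maxlen=max_records)

    def add(self, day: int, terminal: str, domains: list[str]) -> None:
        self._records.append((day, terminal, tuple(domains)))

    def records(self) -> list:
        return list(self._records)
```

With a threshold of five daily records, adding the record for day 6 evicts the record for day 1.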
In other embodiments, longer domain names in the network access records are converted into hash values with a hash algorithm, and the hash values are stored instead.
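A fixed-size digest bounds the storage cost of long domain names. The source does not name a hash algorithm or a length cut-off; the BLAKE2b digest, the 8-byte digest size and the 32-character threshold below are illustrative assumptions, as is the function name.

```python
import hashlib


def compact_domain(domain: str, max_len: int = 32) -> str:
    """Store short names as-is and replace longer ones by a fixed-size hash.

    The 32-character cut-off and the BLAKE2b digest (8 bytes -> 16 hex
    characters) are illustrative choices, not values from the source.
    """
    if len(domain) <= max_len:
        return domain
    return hashlib.blake2b(domain.encode("utf-8"), digest_size=8).hexdigest()
```

Because the digest is deterministic, the same long domain name always maps to the same stored value, so set comparisons between records still work on the compacted form.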
In other embodiments, different domain names are numbered, and only the numbers need to be saved when network access records are obtained and stored. For example, suppose "www.aabb.com" is numbered 1, "www.bbdd.com" 2, "www.aacc.com" 3, "www.bbcc.com" 4 and "www.aadd.com" 5. If the network access record of user A's terminal is "www.aabb.com, www.aacc.com, www.bbcc.com" and that of user B's terminal is "www.aacc.com, www.bbdd.com, www.aadd.com", the records can be saved as "1, 3, 4" for user A's terminal and "3, 2, 5" for user B's terminal.
Storing the data in such compact forms keeps the system running fast and avoids excessive use of memory resources.
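The numbering scheme is a dictionary encoding: each distinct domain name receives an integer the first time it is seen, and records store only the integers. This sketch assumes numbers are assigned in first-seen order starting at 1; the class and method names are hypothetical.

```python
class DomainNumbering:
    """Assign each distinct domain name an integer the first time it is seen."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._names: list[str] = []

    def number(self, domain: str) -> int:
        if domain not in self._ids:
            self._names.append(domain)
            self._ids[domain] = len(self._names)  # numbering starts at 1
        return self._ids[domain]

    def encode(self, record: list[str]) -> list[int]:
        return [self.number(d) for d in record]

    def decode(self, numbers: list[int]) -> list[str]:
        return [self._names[n - 1] for n in numbers]
```

Registering the five domain names in the order of the example and then encoding user A's and user B's records yields [1, 3, 4] and [3, 2, 5].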
And step S102, determining the domain name associated with the unknown domain name according to the network access records of the plurality of user terminals.
For example, the domain names within the network access record of a single user terminal may all be different, such as the obtained network access record of user A's terminal: "www.aabb.com, www.aacc.com, www.bbcc.com".
For example, the network access records of different user terminals may contain the same domain name, such as the obtained network access record of user B's terminal, "www.aacc.com, www.bbdd.com, www.aadd.com", which shares "www.aacc.com" with user A's record.
For example, if the same unknown domain name appears in the network access records of multiple user terminals and a same domain name also appears in those records, that domain name and the unknown domain name may be considered associated. The association may be that they have the same or a similar source and application type.
In some embodiments, the determining the domain name associated with the unknown domain name from the network access records of the plurality of user terminals comprises: determining a first terminal according to the terminal including the unknown domain name in the network access record, and determining a second terminal according to the terminal not including the unknown domain name in the network access record; generating a first domain name set according to the network access record of the first terminal; generating a second domain name set according to the network access record of the second terminal; and determining the domain name associated with the unknown domain name according to the first domain name set and the second domain name set.
Illustratively, the first terminal is determined according to the terminal including the unknown domain name in the network access record, and the second terminal is determined according to the terminal not including the unknown domain name in the network access record. It will be appreciated that the first terminal and the second terminal each comprise a number of user terminals.
For example, if the network access record of the user terminal includes an unknown domain name, the corresponding user terminal is determined as the first terminal, and it can be understood that a terminal that does not include the unknown domain name in the network access record is determined as the second terminal.
Illustratively, the identifier of the user terminal is determined according to the network access record of the user terminal, and the user terminal is determined to be the first terminal or the second terminal according to the identifier.
For example, according to the fact that the network access record of the user terminal includes an unknown domain name, the terminal identifier is determined to be 1, and according to the fact that the network access record of the user terminal does not include an unknown domain name, the terminal identifier is determined to be 0.
Exemplarily, a terminal whose identifier is FoundFlag = 1 is determined as a first terminal, and a terminal whose identifier is FoundFlag = 0 is determined as a second terminal.
Illustratively, after the plurality of user terminals are divided into a first terminal and a second terminal, a first domain name set is generated according to the network access records of the plurality of user terminals belonging to the first terminal, and it can be understood that a second domain name set is generated according to the network access records of the plurality of user terminals belonging to the second terminal.
Illustratively, the first domain name set may be the union of the network access records of the user terminals belonging to the first terminal.
In some embodiments, generating a first domain name set according to the network access record of the first terminal and generating a second domain name set according to the network access record of the second terminal includes: extracting non-repeated domain names from the network access records of the plurality of first terminals to generate the first domain name set, and extracting non-repeated domain names from the network access records of the plurality of second terminals to generate the second domain name set.
For example, when the first/second domain name set is generated, if a domain name in the currently acquired network access record is already included in the first/second domain name set, it is not stored in that set again.
For example, the terminals of the user a and the user B are both the first terminal, the "www.aabb.com" in the network access record of the user a terminal has been stored into the first domain name set in the previous period, and the "www.aabb.com" in the network access record of the user B terminal is acquired at the current time, so the "www.aabb.com" of the user B terminal is not stored into the first domain name set.
For example, the first domain name set and the second domain name set may both consist of known domain names, which are used to analyze the association information of the unknown domain name.
Illustratively, the known domain name may be present only in the first set of domain names, only in the second set of domain names, or in both the first and second sets of domain names.
Exemplarily, assuming that the duration of the continuous acquisition is 10 minutes, continuously acquiring the network access record of the user terminal within 10 minutes, if the network access record of the user terminal includes an unknown domain name, marking the terminal as a first terminal and suspending the acquisition of the network access record of the terminal, and storing the corresponding network access record into a first domain name set; if the network access record of the user terminal acquired within the acquisition duration does not include the unknown domain name, the terminal is marked as a second terminal, and the acquired corresponding network access record is stored in a second domain name set.
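The acquisition window described above can be simulated over a time-ordered stream of access events. This is a sketch under assumptions: events are hypothetical (timestamp in seconds, terminal ID, domain) tuples already sorted by time, the window starts at the first event, and the 600-second default mirrors the 10-minute example.

```python
def collect_window(events, unknown, window_s=600.0):
    """Simulate one acquisition window over time-ordered (ts, terminal, domain) events.

    Once a terminal's record contains `unknown`, acquisition for that terminal
    is suspended (FoundFlag = 1); terminals that never show it within the
    window are kept as second terminals (FoundFlag = 0).
    """
    records, found = {}, set()
    start = None
    for ts, terminal, domain in events:
        if start is None:
            start = ts
        if ts - start > window_s:
            break                      # the acquisition duration is over
        if terminal in found:
            continue                   # acquisition suspended for this terminal
        records.setdefault(terminal, []).append(domain)
        if domain == unknown:
            found.add(terminal)
    first = {t: r for t, r in records.items() if t in found}
    second = {t: r for t, r in records.items() if t not in found}
    return first, second
```

A terminal that accesses the unknown domain name only after the window has closed stays a second terminal, which matches the rule that marking depends on what is acquired within the acquisition duration.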
In some embodiments, determining a domain name associated with the unknown domain name from the first domain name set and the second domain name set includes: determining the domain name associated with the unknown domain name according to the domain names that are in the first domain name set but not in the second domain name set.
Illustratively, the first set of domain names may be { a, b, d }, and the second set of domain names may be { a, c, d, e, h }, with b being determined as the domain name associated with the unknown domain name based on domain names that exist in the first set of domain names and not in the second set of domain names.
Determining the domain name associated with the unknown domain name from the first and second domain name sets reduces the amount of computation, and the resulting domain names are more closely associated with the unknown domain name.
And S103, storing and/or outputting the domain name associated with the unknown domain name so as to update a DPI feature library according to the associated domain name, wherein the DPI feature library is used for identifying the application category of the unknown domain name.
Illustratively, according to the associated domain name, the domain name may be stored in the memory to be used when needed, or output to the DPI manufacturer, so that the DPI manufacturer updates the DPI feature library according to the domain name associated with the unknown domain name, thereby improving the recognition rate of the DPI feature library of the operator to the unknown domain name.
In some embodiments, the determining a domain name associated with the unknown domain name from the network access record comprises: determining a frequency value of each domain name in a network access record comprising the unknown domain name; and determining the domain name associated with the unknown domain name according to the frequency value of each domain name.
Illustratively, the frequency value of each domain name may be determined with the term frequency-inverse document frequency (TF-IDF) algorithm.
Illustratively, frequency values of domain names in the first domain name set are determined, and domain names associated with the unknown domain names are determined according to the frequency values of the domain names in the first domain name set.
In some embodiments, the determining the domain name associated with the unknown domain name according to the frequency value of each domain name includes: sorting the domain names according to the frequency values of the domain names; determining a cut-off frequency threshold value according to the frequency value of the sorted domain name; and determining the domain name corresponding to the frequency value which is greater than or equal to the cut-off frequency threshold value as the domain name associated with the unknown domain name.
Illustratively, a plurality of unknown domain names are denoted U_i (i = 1, 2, 3, …), and a set S1 of corresponding domain names may be determined for an unknown domain name. For example, if the unknown domain name appears on user A's terminal and also on user B's terminal, the domain names in the network access records of user A's and user B's terminals together form the set S1.
A user terminal on which the unknown domain name appears is determined as a first terminal. The TF-IDF algorithm first determines the term frequency (tf value) of a domain name and then weights it by the inverse document frequency (idf value) to obtain the domain name's frequency value.
The term frequency (tf value) is the ratio of the first terminals on which the corresponding unknown domain name appears to all the first terminals. For example, with several unknown domain names a and b, if among the first terminals the unknown domain name a appears on 10 terminals and the unknown domain name b appears on 40 terminals, the tf value of a is 0.25.
The inverse document frequency (idf value) is obtained from the ratio of the total number of unknown domain names, count(U_i), to the number of times n that the corresponding domain name appears in the set S1, by taking the common (base-10) logarithm of that ratio: idf = log10(count(U_i) / n).
The term frequency-inverse document frequency is the product of the two: tf-idf = tf × idf.
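The TF-IDF description leaves some room for interpretation, so the sketch below encodes one plausible reading and labels it as such. Assumed: each unknown domain u has its own set S_u (the union of the records of its first terminals); tf(d, u) is the share of u's first terminals whose record contains d; idf(d) = log10(count(U) / number of sets S_u containing d); unknown domain names are excluded as candidates. The input layout and function name are hypothetical.

```python
import math


def tfidf_scores(first_records: dict[str, list[set[str]]]) -> dict[str, dict[str, float]]:
    """Score candidate domains for each unknown domain (one reading of the text).

    first_records[u] lists the domain sets of the first terminals whose
    records contain unknown domain u.  Assumptions of this sketch:
      tf(d, u) = share of u's first terminals whose record contains d
      idf(d)   = log10(count(U) / number of sets S_u that contain d)
      score    = tf * idf
    Unknown domain names themselves are excluded from the candidates.
    """
    unknowns = set(first_records)
    # S_u: distinct known domains seen alongside each unknown domain u.
    s = {u: set().union(*recs) - unknowns for u, recs in first_records.items()}
    n_sets = len(first_records)                       # count(U)
    scores = {}
    for u, recs in first_records.items():
        scores[u] = {}
        for d in s[u]:
            tf = sum(d in r for r in recs) / len(recs)
            df = sum(d in s_v for s_v in s.values())  # >= 1, since d is in s[u]
            scores[u][d] = tf * math.log10(n_sets / df)
    return scores
```

Under this reading, a domain seen alongside every unknown domain gets idf = log10(1) = 0, so widely shared domain names (common CDNs, for instance) are pushed to the bottom of the ranking, which is the usual motivation for the idf factor.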
Illustratively, sorting is performed according to the frequency value of each domain name, and the cutoff frequency is determined according to the sorted frequency value.
For example, if the frequency value of domain name a is 0.5, b is 0.3, c is 0.7, d is 0.65 and e is 0.2, the sorted domain names are {c, d, a, b, e}, and the frequency value of the third-ranked domain name is set as the cut-off frequency threshold.
For example, the domain names whose frequency values are greater than or equal to the cut-off frequency threshold are determined as the domain names associated with the unknown domain name.
For example, with the frequency value of domain name a (0.5) set as the cut-off frequency threshold, domain names c, d and a are the associated domain names.
Sorting the domain names by frequency value, determining a cut-off frequency threshold and selecting the associated domain names by that threshold reduces the computational load and makes the degree of association between each domain name and the unknown domain name easier to see.
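The cut-off selection sorts by frequency value, takes the value at a chosen rank as the threshold, and keeps every domain at or above it. The sketch assumes a fixed rank of 3 as in the example; the source does not say how the cut-off position is chosen in general, and the function name is hypothetical.

```python
def select_by_cutoff(freqs: dict[str, float], rank: int = 3) -> list[str]:
    """Sort by frequency value, use the `rank`-th value as the cut-off
    threshold, and keep every domain whose value is >= the threshold.

    A fixed rank (3, as in the example) is an assumption of this sketch.
    """
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    if not ranked:
        return []
    threshold = freqs[ranked[min(rank, len(ranked)) - 1]]
    return [d for d in ranked if freqs[d] >= threshold]
```

For the example values (a 0.5, b 0.3, c 0.7, d 0.65, e 0.2) the threshold is 0.5 and the result is ["c", "d", "a"]. Because the comparison is "greater than or equal to", domains tied with the threshold value are also kept, so the result can be longer than the rank.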
In summary, network access records of a plurality of user terminals are obtained, wherein the network access records comprise domain names and the network access record of at least one user terminal comprises an unknown domain name; a domain name associated with the unknown domain name is determined according to the network access records of the plurality of user terminals; and the domain name associated with the unknown domain name is stored and/or output so as to update a DPI feature library according to that domain name, wherein the DPI feature library is used for identifying the application category of the unknown domain name. The domain names associated with an unknown domain name are used to infer its application type and/or source, which effectively solves the problem that the source or application type is unclear because too little information about the unknown domain name is available, increases the rate and frequency at which the DPI feature library is updated, and maintains the recognition rate of the DPI feature library for unknown domain names.
Referring to Fig. 2, Fig. 2 is a schematic block diagram of a computer device according to an embodiment of the present invention.
As shown in Fig. 2, the computer device 300 includes a processor 301 and a memory 302, which are connected by a bus 303, such as an I2C (Inter-Integrated Circuit) bus.
Specifically, the processor 301 provides computing and control capabilities and supports the operation of the entire computer device. The processor 301 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor.
Specifically, the memory 302 may be a Flash chip, a read-only memory (ROM), a magnetic disk, an optical disk, a USB flash drive or a removable hard disk.
Those skilled in the art will appreciate that the structure shown in Fig. 2 is merely a block diagram of the part of the structure related to the embodiments of the present invention and does not limit the computer devices to which the embodiments may be applied; a particular server may include more or fewer components than shown, combine some components, or have a different arrangement of components.
The processor is configured to run a computer program stored in the memory, and when executing the computer program, implement any one of the unknown domain name recognition methods provided by the embodiments of the present invention.
In an embodiment, the processor is configured to run a computer program stored in the memory and to implement the following steps when executing the computer program:
the method comprises the steps of obtaining network access records of a plurality of user terminals, wherein the network access records comprise domain names, and the network access record of at least one user terminal comprises an unknown domain name;
determining a domain name associated with the unknown domain name according to the network access records of the plurality of user terminals;
and storing and/or outputting the domain name associated with the unknown domain name, so that a deep packet inspection (DPI) feature library can be updated according to the associated domain name, where the DPI feature library is used to identify the application category of the unknown domain name.
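The three steps can be illustrated with a minimal Python sketch. It assumes that the network access records are held in a dict mapping a terminal ID to the list of domain names that terminal accessed, that the DPI feature library is modeled as a plain dict mapping domain names to application categories, and that step 2 simply collects every domain co-accessed with the unknown domain (a naive stand-in for the refined strategies in the embodiments that follow). The name `identify_unknown` and the majority-vote update rule are hypothetical, not taken from the patent.

```python
from collections import Counter


def identify_unknown(records: dict[str, list[str]], unknown: str,
                     library: dict[str, str]) -> list[str]:
    """Hypothetical end-to-end sketch of the three steps."""
    # Step 1: `records` already holds the access records, keyed by terminal ID.
    # Step 2 (naive stand-in): collect every domain co-accessed with `unknown`.
    associated: set[str] = set()
    for domains in records.values():
        if unknown in domains:
            associated.update(d for d in domains if d != unknown)

    # Step 3: use the associated domains to update the feature library.
    # Assumed rule: adopt the most common category among classified neighbors.
    votes = Counter(library[d] for d in associated if d in library)
    if votes:
        library[unknown] = votes.most_common(1)[0][0]
    return sorted(associated)  # the "output" of step 3


records = {
    "t1": ["play.video.example", "u.example", "cdn.video.example"],
    "t2": ["news.example"],
    "t3": ["u.example", "play.video.example"],
}
library = {"play.video.example": "video", "cdn.video.example": "video",
           "news.example": "news"}
print(identify_unknown(records, "u.example", library))
# ['cdn.video.example', 'play.video.example']
print(library["u.example"])  # video
```

A naive step 2 also picks up domains that almost every terminal visits (such as a popular search engine); the set-difference and frequency-based variants described below are plausible ways to filter those out.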
In an embodiment, when determining the domain name associated with the unknown domain name according to the network access records of the plurality of user terminals, the processor is configured to implement:
determining, as first terminals, the terminals whose network access records include the unknown domain name, and determining, as second terminals, the terminals whose network access records do not include the unknown domain name;
generating a first domain name set according to the network access record of the first terminal;
generating a second domain name set according to the network access record of the second terminal;
and determining the domain name associated with the unknown domain name according to the first domain name set and the second domain name set.
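The partition into first and second terminals and the construction of the two domain name sets can be sketched in a few lines of Python. The sketch assumes the records are held in a dict mapping a terminal ID to the list of domain names that terminal accessed; the function name `build_domain_sets` is hypothetical.

```python
def build_domain_sets(records: dict[str, list[str]], unknown: str):
    """Split terminals by whether their record contains `unknown` and
    collect each group's domain names into a set."""
    first_terminals = [t for t, ds in records.items() if unknown in ds]
    second_terminals = [t for t, ds in records.items() if unknown not in ds]
    first_set = {d for t in first_terminals for d in records[t]}
    second_set = {d for t in second_terminals for d in records[t]}
    return first_terminals, second_terminals, first_set, second_set


records = {
    "t1": ["u.example", "a.example"],
    "t2": ["b.example"],
    "t3": ["u.example", "b.example"],
}
first_t, second_t, first_set, second_set = build_domain_sets(records, "u.example")
print(first_t, second_t)  # ['t1', 't3'] ['t2']
print(sorted(first_set), sorted(second_set))
# ['a.example', 'b.example', 'u.example'] ['b.example']
```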
In an embodiment, when determining the domain name associated with the unknown domain name according to the first domain name set and the second domain name set, the processor is configured to implement:
and determining the domain name associated with the unknown domain name according to the domain names which are in the first domain name set and not in the second domain name set.
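With the two domain name sets held as Python sets, the rule above reduces to a set difference. Dropping the unknown domain itself from the result is an assumption of this sketch: by construction it is in the first set and absent from the second, so it would otherwise always be returned. The function name is hypothetical.

```python
def associated_by_difference(first_set: set[str], second_set: set[str],
                             unknown: str) -> set[str]:
    """Domains seen only by terminals that also accessed `unknown`."""
    return (first_set - second_set) - {unknown}


first_set = {"u.example", "a.example", "search.example"}
second_set = {"search.example", "b.example"}
print(associated_by_difference(first_set, second_set, "u.example"))  # {'a.example'}
```

In this example `search.example` is visited by both groups, so the difference treats it as unrelated background traffic.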
In an embodiment, when generating the first domain name set according to the network access records of the first terminals and generating the second domain name set according to the network access records of the second terminals, the processor is configured to implement:
extracting the distinct (deduplicated) domain names from the network access records of the plurality of first terminals to generate the first domain name set, and extracting the distinct domain names from the network access records of the plurality of second terminals to generate the second domain name set.
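Extracting the non-repeated domain names from several terminals' records is a deduplication step. The Python sketch below, with the hypothetical name `unique_domains`, keeps each domain once in first-seen order (Python dicts preserve insertion order from 3.7 onward); passing the result to `set()` yields the domain name set itself. Preserving order is a choice made here for reproducible output, not a requirement stated in the text.

```python
from collections.abc import Iterable


def unique_domains(records: Iterable[list[str]]) -> list[str]:
    """Merge several terminals' records, keeping each domain name once."""
    return list(dict.fromkeys(d for record in records for d in record))


first_terminal_records = [
    ["u.example", "a.example", "u.example"],
    ["a.example", "c.example"],
]
print(unique_domains(first_terminal_records))
# ['u.example', 'a.example', 'c.example']
```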
In an embodiment, when obtaining the network access records of the plurality of user terminals (the network access records including domain names, and the network access record of at least one user terminal including an unknown domain name), the processor is configured to implement:
obtaining network access records of a plurality of user terminals, where the network access records include known domain names and the network access record of at least one user terminal includes an unknown domain name;
when determining the domain name associated with the unknown domain name according to the network access records of the plurality of user terminals, the processor is configured to implement: determining a known domain name associated with the unknown domain name according to the network access records of the plurality of user terminals;
when storing and/or outputting the domain name associated with the unknown domain name, the processor is configured to implement: storing and/or outputting the known domain name associated with the unknown domain name.
In an embodiment, when obtaining the network access records of the plurality of user terminals, the processor is configured to implement:
acquiring domain name information accessed by the user terminal;
if the domain name information accessed by the user terminal comprises the unknown domain name, generating a corresponding network access record according to the domain name information accessed by the user terminal, wherein the network access record of the user terminal comprises the known domain name and the unknown domain name;
and if the domain name information accessed by the user terminal does not comprise the unknown domain name, generating a corresponding network access record according to the domain name information accessed by the user terminal, wherein the network access record of the user terminal comprises the known domain name.
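The two branches above can be folded into a single pass over the accessed domain names. This Python sketch assumes that a domain name counts as known when the DPI feature library, modeled here as a dict, already has an entry for it; the `AccessRecord` type and the `make_record` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRecord:
    terminal_id: str
    known: list[str] = field(default_factory=list)
    unknown: list[str] = field(default_factory=list)


def make_record(terminal_id: str, accessed: list[str],
                feature_library: dict[str, str]) -> AccessRecord:
    """Build one terminal's access record, splitting known/unknown domains."""
    record = AccessRecord(terminal_id)
    for domain in accessed:
        target = record.known if domain in feature_library else record.unknown
        target.append(domain)
    return record  # `unknown` stays empty when no unknown domain was accessed


library = {"a.example": "video", "b.example": "news"}
print(make_record("t1", ["a.example", "u.example"], library))
# AccessRecord(terminal_id='t1', known=['a.example'], unknown=['u.example'])
print(make_record("t2", ["b.example"], library))
# AccessRecord(terminal_id='t2', known=['b.example'], unknown=[])
```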
In an embodiment, when determining the known domain name associated with the unknown domain name according to the network access records, the processor is configured to implement:
determining a frequency value of each known domain name in a network access record including the unknown domain name;
and determining the known domain name associated with the unknown domain name according to the frequency value of each known domain name.
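The text does not define "frequency value" precisely. One reading is the number of records containing the unknown domain name that also contain a given known domain name. That interpretation, the list-of-lists record layout and the name `known_domain_frequencies` are assumptions of this Python sketch; dividing each count by the number of matching records would give a relative frequency instead.

```python
from collections import Counter


def known_domain_frequencies(records: list[list[str]], unknown: str,
                             known_domains: set[str]) -> Counter:
    """Count, per known domain, the records with `unknown` that contain it."""
    freq: Counter = Counter()
    for domains in records:
        if unknown in domains:
            # A set, so repeated visits within one record are counted once.
            freq.update({d for d in domains if d in known_domains})
    return freq


records = [
    ["u.example", "a.example", "b.example", "a.example"],
    ["u.example", "a.example"],
    ["b.example", "c.example"],
]
freq = known_domain_frequencies(records, "u.example",
                                {"a.example", "b.example", "c.example"})
print(freq["a.example"], freq["b.example"], freq["c.example"])  # 2 1 0
```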
In an embodiment, when determining the known domain name associated with the unknown domain name according to the frequency value of each known domain name, the processor is configured to implement:
sorting the known domain names according to their frequency values;
determining a cut-off frequency threshold value according to the frequency values of the sorted known domain names;
and determining the known domain name corresponding to the frequency value which is greater than or equal to the cut-off frequency threshold value as the known domain name associated with the unknown domain name.
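The text does not say how the cut-off frequency threshold is derived from the sorted values. The Python sketch below assumes a simple largest-gap rule: after sorting in descending order, the threshold is the frequency just above the biggest drop between neighboring values. The rule and the name `select_by_cutoff` are hypothetical; a fixed quantile or a minimum count could be substituted in the same place.

```python
from typing import Optional


def select_by_cutoff(freqs: dict[str, int]) -> tuple[list[str], Optional[int]]:
    """Sort domains by frequency and keep those at or above a cut-off."""
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    if not ranked:
        return [], None
    values = [v for _, v in ranked]
    if len(values) == 1:
        threshold = values[0]
    else:
        # Assumed rule: cut at the largest drop between consecutive values.
        gaps = [values[i] - values[i + 1] for i in range(len(values) - 1)]
        cut = max(range(len(gaps)), key=gaps.__getitem__)  # first largest gap
        threshold = values[cut]
    return [d for d, v in ranked if v >= threshold], threshold


freqs = {"a.example": 10, "b.example": 9, "c.example": 3, "d.example": 2}
print(select_by_cutoff(freqs))  # (['a.example', 'b.example'], 9)
```

All domains tied at the threshold are kept, matching the "greater than or equal to" wording.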
It should be noted that, as those skilled in the art will clearly understand, for convenience and brevity of description, the specific working process of the computer device described above may refer to the corresponding process in the foregoing embodiments of the unknown domain name identification method, and details are not repeated here.
Embodiments of the present invention further provide a storage medium for computer-readable storage, where the storage medium stores one or more programs executable by one or more processors to implement the steps of any unknown domain name identification method provided in the description of the embodiments of the present invention.
The storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods disclosed above, and the functional modules/units in the systems and devices disclosed above, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division into functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
In addition, as is well known to those skilled in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media.
It should be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or system that comprises the element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. While the invention has been described with reference to specific embodiments, it is not limited thereto, and various equivalent modifications and substitutions can readily be made by those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the present invention shall be defined by the claims.

Claims (10)

1. An unknown domain name identification method, characterized in that the method comprises:
obtaining network access records of a plurality of user terminals, where the network access records include domain names and the network access record of at least one user terminal includes an unknown domain name;
determining a domain name associated with the unknown domain name according to the network access records of the plurality of user terminals;
and storing and/or outputting the domain name associated with the unknown domain name, so that a DPI feature library can be updated according to the associated domain name, where the DPI feature library is used to identify the application category of the unknown domain name.
2. The unknown domain name identification method according to claim 1, wherein determining the domain name associated with the unknown domain name according to the network access records of the plurality of user terminals comprises:
determining, as first terminals, the terminals whose network access records include the unknown domain name, and determining, as second terminals, the terminals whose network access records do not include the unknown domain name;
generating a first domain name set according to the network access record of the first terminal;
generating a second domain name set according to the network access record of the second terminal;
and determining the domain name associated with the unknown domain name according to the first domain name set and the second domain name set.
3. The unknown domain name identification method according to claim 2, wherein determining the domain name associated with the unknown domain name according to the first domain name set and the second domain name set comprises:
and determining the domain name associated with the unknown domain name according to the domain names which are in the first domain name set and not in the second domain name set.
4. The unknown domain name identification method according to claim 2, wherein generating the first domain name set according to the network access records of the first terminals and generating the second domain name set according to the network access records of the second terminals comprises:
extracting the distinct (deduplicated) domain names from the network access records of the plurality of first terminals to generate the first domain name set, and extracting the distinct domain names from the network access records of the plurality of second terminals to generate the second domain name set.
5. The unknown domain name identification method according to any one of claims 1-4, wherein obtaining network access records of a plurality of user terminals, the network access records including domain names and the network access record of at least one user terminal including an unknown domain name, comprises: obtaining network access records of a plurality of user terminals, where the network access records include known domain names and the network access record of at least one user terminal includes an unknown domain name;
determining the domain name associated with the unknown domain name according to the network access records of the plurality of user terminals comprises:
determining a known domain name associated with the unknown domain name according to the network access records of the plurality of user terminals;
and storing and/or outputting the domain name associated with the unknown domain name comprises:
storing and/or outputting the known domain name associated with the unknown domain name.
6. The unknown domain name identification method according to claim 5, wherein obtaining network access records of a plurality of user terminals comprises:
acquiring domain name information accessed by the user terminal;
if the domain name information accessed by the user terminal comprises the unknown domain name, generating a corresponding network access record according to the domain name information accessed by the user terminal, wherein the network access record of the user terminal comprises the known domain name and the unknown domain name;
and if the domain name information accessed by the user terminal does not comprise the unknown domain name, generating a corresponding network access record according to the domain name information accessed by the user terminal, wherein the network access record of the user terminal comprises the known domain name.
7. The unknown domain name identification method according to any one of claims 1-4, wherein determining the domain name associated with the unknown domain name according to the network access records comprises:
determining a frequency value of each domain name in the network access records that include the unknown domain name;
and determining the domain name associated with the unknown domain name according to the frequency value of each domain name.
8. The unknown domain name identification method according to claim 7, wherein determining the domain name associated with the unknown domain name according to the frequency value of each domain name comprises:
sorting the domain names according to the frequency values of the domain names;
determining a cut-off frequency threshold according to the frequency values of the sorted domain names;
and determining the domain name corresponding to the frequency value which is greater than or equal to the cut-off frequency threshold value as the domain name associated with the unknown domain name.
9. A computer device, comprising a processor, a memory, a computer program stored in the memory and executable by the processor, and a data bus for connection and communication between the processor and the memory, wherein the computer program, when executed by the processor, implements the steps of the unknown domain name identification method according to any one of claims 1 to 8.
10. A storage medium for computer-readable storage, wherein the storage medium stores one or more programs executable by one or more processors to implement the steps of the unknown domain name identification method according to any one of claims 1 to 8.
CN202011062802.7A 2020-09-30 2020-09-30 Unknown domain name identification method, computer equipment and storage medium Pending CN114338601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011062802.7A CN114338601A (en) 2020-09-30 2020-09-30 Unknown domain name identification method, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011062802.7A CN114338601A (en) 2020-09-30 2020-09-30 Unknown domain name identification method, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114338601A true CN114338601A (en) 2022-04-12

Family

ID=81032408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011062802.7A Pending CN114338601A (en) 2020-09-30 2020-09-30 Unknown domain name identification method, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114338601A (en)

Similar Documents

Publication Publication Date Title
CN107844634B (en) Modeling method of multivariate general model platform, electronic equipment and computer readable storage medium
CN104601736A (en) Method and device for realizing short uniform resource locator (URL) service
CN107480205B (en) Method and device for partitioning data
CN111858055B (en) Task processing method, server and storage medium
CN110009347B (en) Block chain transaction information auditing method and device
CN110233741B (en) Service charging method, device, equipment and storage medium
CN111343267B (en) Configuration management method and system
CN108446110B (en) Lua script generation method, Lua script generation device, Lua script generation terminal and computer readable medium
CN111490890A (en) Hierarchical registration method, device, storage medium and equipment based on micro-service architecture
CN106855862B (en) Rapid comparison method and device
CN113051229A (en) User data acquisition method, device, terminal and readable storage medium
CN111694505A (en) Data storage management method, device and computer readable storage medium
CN108595685B (en) Data processing method and device
CN110990350A (en) Log analysis method and device
CN108520401B (en) User list management method, device, platform and storage medium
CN110019400B (en) Data storage method, electronic device and storage medium
CN111782728A (en) Data synchronization method, device, electronic equipment and medium
CN114338601A (en) Unknown domain name identification method, computer equipment and storage medium
CN116485019A (en) Data processing method and device
CN110717826A (en) Asset filtering method and device
CN111131393B (en) User activity data statistical method, electronic device and storage medium
CN113655942A (en) Chart data display method and device
CN112817689A (en) Method and device for sorting virtual machines and electronic equipment
CN108629610B (en) Method and device for determining popularization information exposure
CN109086279B (en) Report caching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination