CN111629081A - Internet protocol IP address data processing method and device and electronic equipment - Google Patents

Internet protocol IP address data processing method and device and electronic equipment

Publication number
CN111629081A
Authority
CN
China
Prior art keywords
address
data
physical address
address information
physical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010469330.0A
Other languages
Chinese (zh)
Other versions
CN111629081B (en)
Inventor
滕达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010469330.0A
Publication of CN111629081A
Application granted
Publication of CN111629081B
Active (current legal status)
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00Indexing scheme associated with group H04L61/00
    • H04L2101/60Types of network addresses
    • H04L2101/69Types of network addresses using geographic information, e.g. room number
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00Network arrangements, protocols or services for addressing or naming
    • H04L61/09Mapping addresses
    • H04L61/25Mapping addresses of the same type
    • H04L61/2503Translation of Internet protocol [IP] addresses
    • H04L61/255Maintenance or indexing of mapping tables

Abstract

The embodiments of the application relate to the technical field of computer networks and disclose a method and a device for processing Internet Protocol (IP) address data and an electronic device, wherein the IP address data comprises an IP address field and physical address information corresponding to the IP address field, and the method comprises the following steps: obtaining at least one second IP address data according to the IP address field of the obtained at least one first IP address data and the physical address information corresponding to the IP address field; then separately compressing the IP address field of the at least one second IP address data and the physical address information corresponding to the IP address field; and then encoding the result of the compression into a corresponding encoded file according to a predetermined encoding mode, so that the physical address information corresponding to a target IP address can be queried from the encoded file. The storage space required for storing the IP address data is greatly reduced, and parsing the IP address data becomes simpler and more efficient.

Description

Internet protocol IP address data processing method and device and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of computer networks, in particular to a method and a device for processing Internet Protocol (IP) address data and electronic equipment.
Background
With the rapid development of the internet, it becomes more and more important to analyze network data according to an IP address, and since the IP address can indicate a geographical location, the geographical location of a source of network information can be analyzed according to the IP address, for example, which country, province, city it comes from is determined according to an IP address of a visitor who visits a web page.
The geographic location corresponding to an IP address can be usually determined by querying a corresponding IP library or IP file, and therefore, a large amount of acquired IP address data needs to be stored in the corresponding IP library or IP file.
Disclosure of Invention
The purpose of the embodiments of the present application is to solve at least one of the above technical drawbacks, and to provide the following technical solutions:
in one aspect, a method for processing internet protocol IP address data is provided, where the IP address data includes an IP address field and physical address information corresponding to the IP address field, and the method includes:
obtaining at least one second IP address data according to the IP address field of the obtained at least one first IP address data and the physical address information corresponding to the IP address field;
respectively compressing an IP address field of at least one second IP address data and physical address information corresponding to the IP address field;
and coding the result obtained by the compression processing into a corresponding coding file according to a preset coding mode so as to inquire the physical address information corresponding to the target IP address according to the coding file.
In one aspect, an IP address data processing apparatus is provided, where IP address data includes an IP address field and physical address information corresponding to the IP address field, and includes:
the first processing module is used for obtaining at least one second IP address data according to the IP address field of the obtained at least one first IP address data and the physical address information corresponding to the IP address field;
the second processing module is used for respectively compressing the IP address field of at least one second IP address data and the physical address information corresponding to the IP address field;
and the third processing module is used for coding the result obtained by the compression processing into a corresponding coding file according to a preset coding mode so as to inquire the physical address information corresponding to the target IP address according to the coding file.
In a possible implementation manner, the first processing module is configured to perform data processing on an IP address field of the obtained at least one first IP address data and physical address information corresponding to the IP address field, respectively, to obtain at least one second IP address data; when the first processing module respectively performs data processing on the IP address field of the obtained at least one first IP address data and the physical address information corresponding to the IP address field, the first processing module is configured to:
converting the dot decimal character string type IP address in the IP address field of each first IP address data into a decimal numeric type IP address;
simplifying and describing physical address information corresponding to the IP address field of each first IP address data to obtain simplified and described physical address information;
the second IP address data includes a decimal value type IP address and simplified description physical address information.
In one possible implementation, the physical address information includes a geographic location and a name of an internet service provider; when the first processing module performs simplified description on the physical address information corresponding to the IP address field of each piece of first IP address data, the first processing module is configured to:
performing at least one of the following on the geographic location corresponding to the IP address field of each piece of first IP address data: filtering out non-key words in the geographic location, representing unknown or missing information in the geographic location by a first predetermined identifier, and mapping the geographic location to a corresponding country or region code according to predetermined country and region codes;
and performing at least one of the following on the name of the internet service provider corresponding to the IP address field of each piece of first IP address data: filtering out non-key words in the name of the internet service provider, and representing the internet service providers other than the first N internet service providers in an internet service provider queue by a second predetermined identifier, wherein the internet service provider queue ranks the internet service providers in descending order of number of users.
In a possible implementation manner, when performing compression processing on an IP address segment of at least one piece of second IP address data, the second processing module is configured to:
when at least two continuous IP address segments exist in the plurality of IP address segments and the physical address information corresponding to the at least two continuous IP address segments is the same, merging the at least two continuous IP address segments to obtain a merged IP address segment, wherein the initial IP address of the merged IP address segment is the initial IP address of the first IP address segment of the at least two continuous IP address segments, and the terminal IP address of the merged IP address segment is the terminal IP address of the last IP address segment of the at least two continuous IP address segments;
when compressing the physical address information corresponding to the IP address field of at least one second IP address data, the second processing module is configured to:
and storing the physical address information corresponding to each IP address field into a preset aggregation list, and filtering the repeated physical address information when the repeated physical address information exists in the aggregation list.
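The two compression steps above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the tuple layout `(start_ip, end_ip, info)`, the function names, and the reading of "continuous" as "the next segment starts exactly one address after the previous one ends" are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical record layout: (start_ip, end_ip, physical_address_info),
# with both IP addresses already converted to decimal numeric values.
Segment = Tuple[int, int, str]


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Merge adjacent segments that carry identical physical address information."""
    merged: List[Segment] = []
    for start, end, info in sorted(segments):
        if merged:
            prev_start, prev_end, prev_info = merged[-1]
            # Assumption: "continuous" means the next segment starts right
            # after the previous segment ends.
            if prev_info == info and start == prev_end + 1:
                # Keep the first start address, take the last end address.
                merged[-1] = (prev_start, end, info)
                continue
        merged.append((start, end, info))
    return merged


def deduplicate_info(segments: List[Segment]) -> List[str]:
    """Collect each distinct physical address string once, in first-seen order."""
    return list(dict.fromkeys(info for _, _, info in segments))


sample = [
    (17694720, 17719295, "X"),  # 1.14.0.0   - 1.14.95.255
    (17719296, 17727487, "X"),  # 1.14.96.0  - 1.14.127.255
    (17727488, 17760255, "Y"),  # 1.14.128.0 - 1.14.255.255
]
print(merge_segments(sample))
print(deduplicate_info(sample))
```

In this sample the first two segments are adjacent and share the same information, so they collapse into one segment, and the aggregated list keeps only the two distinct strings.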
In a possible implementation manner, the predetermined encoding manner includes a header area, a prefix area, an index area, and a content area, and the third processing module is configured to:
determining the total number of the combined IP address fields, the number of the IP address fields corresponding to each IP prefix and the total number of bytes of the residual physical address information after the repeated physical address information is filtered out from the aggregation list, wherein the IP prefix is a first decimal number of the IP address of a dotted decimal character string type;
writing the total number of merged IP address fields into the header area; writing, as prefix data information, the index area starting position and the index area ending position determined according to the number of IP address fields corresponding to each IP prefix into the prefix area; writing, as index data information, the ending IP address, the physical address information starting position and the physical address information length of each IP address field corresponding to each predetermined IP prefix into the index area; and writing the physical address information remaining after the repeated physical address information is filtered out of the aggregation list into the content area;
the space size of the content area is the total number of bytes of the physical address information left after the repeated physical address information is filtered out from the aggregation list.
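One possible byte layout for the four areas is sketched below in Python. The source does not specify field widths or byte order, so everything concrete here is an assumption: a 4-byte header, 256 prefix entries holding a half-open range of index entry numbers, 10-byte index entries, UTF-8 content, and the restriction that no segment crosses a first-octet boundary.

```python
import struct
from typing import Dict, List, Tuple

Segment = Tuple[int, int, str]  # hypothetical: (start_ip, end_ip, info)

# Assumed field widths, all big-endian without padding.
HEADER = struct.Struct(">I")         # total number of merged segments
PREFIX_ENTRY = struct.Struct(">II")  # index entries [first, last) of one prefix
INDEX_ENTRY = struct.Struct(">IIH")  # end IP, content offset, content length


def encode(segments: List[Segment]) -> bytes:
    segments = sorted(segments)

    # Content area: each distinct physical address string is stored once.
    content = bytearray()
    location: Dict[str, Tuple[int, int]] = {}
    for _, _, info in segments:
        if info not in location:
            raw = info.encode("utf-8")
            location[info] = (len(content), len(raw))
            content += raw

    # Index area, plus the number of segments under each first octet.
    index = bytearray()
    counts = [0] * 256
    for start, end, info in segments:
        prefix = start >> 24
        if prefix != end >> 24:
            # Assumption of this sketch, not a statement of the source.
            raise ValueError("segment crosses a first-octet boundary")
        counts[prefix] += 1
        offset, length = location[info]
        index += INDEX_ENTRY.pack(end, offset, length)

    # Prefix area: running totals give each prefix its index range.
    prefix_area = bytearray()
    first = 0
    for count in counts:
        prefix_area += PREFIX_ENTRY.pack(first, first + count)
        first += count

    return HEADER.pack(len(segments)) + bytes(prefix_area) + bytes(index) + bytes(content)


encoded = encode([
    (17694720, 17727487, "CN|A1"),  # 1.14.0.0   - 1.14.127.255
    (17727488, 17760255, "CN|B1"),  # 1.14.128.0 - 1.14.255.255
    (33554432, 33554687, "CN|A1"),  # 2.0.0.0    - 2.0.0.255
])
print(len(encoded))
```

Because the segments are sorted and each stays within one first octet, the index entries of a prefix are contiguous, which is what lets the prefix area describe them with just two numbers.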
In a possible implementation manner, the apparatus further includes a fourth processing module, where the fourth processing module is configured to:
when receiving a query request for querying physical address information corresponding to a target IP address, determining a target IP prefix of the target IP address, and converting the target IP address into a decimal numerical value type IP address by a preset conversion method, wherein the target IP address is an IP address of a dotted decimal character string type, and the target IP prefix is a first decimal number of the target IP address;
inquiring a prefix area of the coded file according to the target IP prefix to obtain a corresponding target index area initial position and a target index area end position;
and searching physical address information corresponding to the decimal value type IP address in the interval between the starting position of the target index area and the ending position of the target index area.
In a possible implementation manner, when the fourth processing module searches for physical address information corresponding to the decimal value type IP address in the interval between the starting position of the target index area and the ending position of the target index area, the fourth processing module is configured to:
and searching index data information of which the current termination IP address is smaller than the decimal numerical value type IP address and the next termination IP address of the current termination IP address is larger than the decimal numerical value type IP address in an interval of the starting position of the target index area and the ending position of the target index area, and obtaining physical address information corresponding to the decimal numerical value type IP address according to the starting position of the physical address information and the length of the physical address information in the index data information.
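The query procedure can be sketched in Python as follows. The sketch works on small in-memory stand-ins for the prefix, index and content areas (all sample values are hypothetical), and it interprets the search as a binary search for the first index entry whose ending IP address is not smaller than the target. Because only ending IP addresses are indexed, it also assumes the segments under each prefix leave no gaps.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

# Hypothetical decoded areas of an encoded file.
CONTENT = b"CN|A1CN|B1"
# Index area: (end_ip, content_offset, content_length), sorted by end_ip.
INDEX: List[Tuple[int, int, int]] = [
    (17727487, 0, 5),  # 1.14.0.0   - 1.14.127.255
    (17760255, 5, 5),  # 1.14.128.0 - 1.14.255.255
    (33554687, 0, 5),  # 2.0.0.0    - 2.0.0.255
]
# Prefix area: first octet -> half-open range [first, last) of index entries.
PREFIX: Dict[int, Tuple[int, int]] = {1: (0, 2), 2: (2, 3)}


def ip_to_int(dotted: str) -> int:
    """Convert a dotted decimal IPv4 string to a decimal numeric value."""
    value = 0
    for part in dotted.split("."):
        value = (value << 8) | int(part)
    return value


def lookup(dotted: str) -> Optional[str]:
    target = ip_to_int(dotted)
    # The target IP prefix (first octet) selects a slice of the index area.
    first, last = PREFIX.get(target >> 24, (0, 0))
    end_ips = [entry[0] for entry in INDEX[first:last]]
    # First entry whose ending IP address is >= the target.
    pos = bisect_left(end_ips, target)
    if pos == len(end_ips):
        return None  # the target lies beyond every segment of this prefix
    _, offset, length = INDEX[first + pos]
    return CONTENT[offset:offset + length].decode("utf-8")


print(lookup("1.14.95.255"))
print(lookup("1.14.200.1"))
print(lookup("3.0.0.1"))
```

Restricting the binary search to the slice selected by the prefix keeps each lookup to a handful of comparisons even when the whole index holds many segments.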
In one possible implementation, the predetermined conversion method comprises a socket.
In one aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the above-mentioned internet protocol IP address data processing method.
In one aspect, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the above-mentioned internet protocol IP address data processing method.
According to the Internet Protocol (IP) address data processing method provided by the embodiments of the present application, at least one piece of standardized second IP address data is obtained from the acquired at least one piece of first IP address data, which provides the precondition for the subsequent compression and encoding. The IP address field of the at least one second IP address data and the physical address information corresponding to the IP address field are compressed separately, which greatly reduces the storage space required for the IP address data; at the same time, because the IP address data includes both the IP address field and the corresponding physical address information, the IP address data remains complete, which improves the accuracy of subsequent queries. The result of the compression is then encoded into a corresponding encoded file according to a predetermined encoding mode, which further saves storage space, facilitates subsequent queries of the IP address data, improves query efficiency, and makes parsing the IP address data simpler and more efficient.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of embodiments of the present application will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flowchart of an IP address data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of data preprocessing of IP address data according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating an IP address segment compression process according to an embodiment of the present application;
FIG. 4 is a schematic process diagram of IP address data processing according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a basic structure of an IP address data processing apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software and networks within a wide area network or a local area network to realize the computation, storage, processing and sharing of data.
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like that are applied based on the cloud computing business model; these can form a resource pool that is used on demand and is flexible and convenient. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites and other web portals, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each item may have its own identification mark that needs to be transmitted to a background system for logical processing; data of different levels are processed separately, and all kinds of industry data need strong back-end system support, which can only be realized through cloud computing.
A distributed cloud storage system (hereinafter, referred to as a storage system) refers to a storage system that integrates a large number of storage devices (storage devices are also referred to as storage nodes) of different types in a network through application software or application interfaces to cooperatively work by using functions such as cluster application, grid technology, and a distributed storage file system, and provides a data storage function and a service access function to the outside.
At present, a storage method of a storage system is as follows: logical volumes are created, and each logical volume is allocated physical storage space when it is created; the physical storage space may be composed of the disks of one storage device or of several storage devices. A client stores data on a logical volume, that is, on a file system. The file system divides the data into a plurality of parts, each of which is an object; an object contains not only data but also additional information such as a data identifier (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests access to the data, the file system can let the client access the data according to the storage location information of each object.
The process by which the storage system allocates physical storage space to a logical volume is specifically as follows: the physical storage space is divided in advance into stripes according to an estimate of the capacity of the objects to be stored in the logical volume (the estimate often leaves a large margin over the capacity of the objects actually stored) and the Redundant Array of Independent Disks (RAID) group; one logical volume can be understood as one stripe, and physical storage space is thereby allocated to the logical volume.
The following describes in detail the technical solutions of the embodiments of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
One embodiment of the present application provides an internet protocol IP address data processing method, which is executed by a computer device, where the computer device may be a terminal or a server. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, and a big data and artificial intelligence platform. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited herein.
As shown in fig. 1, the method includes: step S110, obtaining at least one second IP address data according to the IP address field of the obtained at least one first IP address data and the physical address information corresponding to the IP address field; step S120, respectively compressing the IP address field of at least one second IP address data and the physical address information corresponding to the IP address field; step S130, according to a predetermined encoding mode, encoding the result obtained by the compression processing into a corresponding encoded file, so as to query the physical address information corresponding to the target IP address according to the encoded file.
IP stands for Internet Protocol. An IP address is written as a dotted decimal character string such as "1.14.95.255", and an IP address field (also referred to herein as an IP address segment) is a range of addresses written as a pair of dotted decimal character strings such as "1.14.0.0-1.14.95.255". The physical address information may be the home location of an IP address or IP address field.
In the process of processing the IP address data, the following processing procedures may be performed:
first, one or more pieces of to-be-processed IP address data (i.e., first IP address data) are obtained, where the one or more pieces of to-be-processed IP address data each include an IP address field and physical address information corresponding to the IP address field, that is, the first IP address data includes the IP address field and the physical address information corresponding to the IP address field. The one or more pieces of IP address data to be processed or stored may be read from an existing IP library, or may be obtained by other means, which is not limited in the embodiment of the present application.
Then, at least one piece of target IP address data (i.e., second IP address data) is obtained from the one or more pieces of to-be-processed IP address data. Specifically, the IP address field of the second IP address data (denoted as the second IP address field) is obtained from the IP address field of the first IP address data (denoted as the first IP address field), and the physical address information corresponding to the second IP address field (denoted as the second physical address information) is obtained from the physical address information corresponding to the first IP address field (denoted as the first physical address information).
And then, respectively compressing the IP address field of each second IP address data and the physical address information corresponding to the IP address field, namely compressing the IP address field of the second IP address data, and simultaneously compressing the physical address information corresponding to the IP address field of the second IP address data, thereby saving the storage space required for storing the IP address data.
And then, coding the result obtained by the compression processing into a corresponding coding file according to a preset coding mode. Because the storage structure of the IP address data generally determines the size of the occupied storage space and affects the query efficiency when the IP address data is subsequently queried, the corresponding storage structure can be determined according to the preset coding mode, and the result obtained by the compression processing is coded into the corresponding coding file according to the storage structure, so that the storage space occupied when the IP address data is stored is further reduced, the IP address data can be conveniently queried according to the coding file based on the preset coding mode subsequently, the query efficiency is improved, and the IP address data can be analyzed more simply and efficiently.
According to the Internet Protocol (IP) address data processing method provided by the embodiments of the present application, at least one piece of standardized second IP address data is obtained from the acquired at least one piece of first IP address data, which provides the precondition for the subsequent compression and encoding. The IP address field of the at least one second IP address data and the physical address information corresponding to the IP address field are compressed separately, which greatly reduces the storage space required for the IP address data; at the same time, because the IP address data includes both the IP address field and the corresponding physical address information, the IP address data remains complete, which improves the accuracy of subsequent queries. The result of the compression is then encoded into a corresponding encoded file according to a predetermined encoding mode, which further saves storage space, facilitates subsequent queries of the IP address data, improves query efficiency, and makes parsing the IP address data simpler and more efficient.
For convenience of description, at least one piece of acquired first IP address data may be regarded as original IP address data, and the following takes the first IP address data as the original IP address data as an example to specifically describe the internet protocol IP address data processing method in the embodiment of the present application:
in a possible implementation manner, obtaining at least one second IP address data according to the obtained at least one first IP address data includes: and respectively carrying out data processing on the IP address field of the acquired at least one first IP address data and the physical address information corresponding to the IP address field to obtain at least one second IP address data.
That is, data processing is performed on the IP address field of each acquired first IP address data (denoted as the first IP address field) to obtain the IP address field of the second IP address data (denoted as the second IP address field), and on the corresponding physical address information (denoted as the first physical address information) to obtain the physical address information corresponding to the second IP address field (denoted as the second physical address information). The data processing includes, but is not limited to, data cleaning, data arrangement, and the like. Processing the acquired first IP address data in this way converts it into standardized IP address data, which facilitates subsequent processing.
In the process of performing data processing on the acquired at least one piece of first IP address data, the following processing may be performed: converting the dot decimal character string type IP address in the IP address field of each first IP address data into a decimal numeric type IP address; simplifying and describing physical address information corresponding to the IP address field of each first IP address data to obtain simplified and described physical address information; wherein the second IP address data includes a decimal value type IP address and simplified description physical address information.
Because the IP address data comprises the IP address field and the physical address information corresponding to the IP address field, the data preprocessing of the original IP address data also comprises two parts: one part preprocesses the IP address field of the original IP address data, and the other part preprocesses the physical address information corresponding to the IP address field of the original IP address data.
Generally, an IP address is a globally unique 32-bit binary number, for example "01001001001010001110000100100000". To improve readability, each group of 8 bits is usually written as its decimal equivalent and the four numbers are separated by dots, giving a dotted-decimal string such as "1.14.95.255"; that is, the IP address is converted from a 32-bit binary representation into a dotted-decimal string representation. Based on this, when processing the IP address segment of the original IP address data, an IP address segment of the dotted-decimal string type, such as "1.14.0.0-1.14.95.255", may first be converted into 32-bit binary numbers, which are then converted into decimal numeric values. In other words, the dotted-decimal string type IP addresses in the IP address segment of each piece of first IP address data are converted into decimal numeric type IP addresses.
Taking the dotted-decimal string type IP address "1.14.95.255" as an example, "1.14.95.255" is first converted into the 32-bit binary number "00000001000011100101111111111111", and this binary number is then converted into the more readable decimal number "17719295"; that is, the dotted-decimal string "1.14.95.255" is converted into the decimal numeric value "17719295". A dotted-decimal string type IP address occupies at least 7 bytes of storage (the string "1.14.95.255" itself has 11 characters), whereas the converted decimal numeric value "17719295" occupies only 4 bytes. Converting a dotted-decimal string type IP address into a decimal numeric type IP address therefore saves at least 3 bytes per IP address, that is, about 42% of the storage space.
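The conversion described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `ip_to_int` and `segment_to_ints` are hypothetical and do not come from the application.

```python
from typing import Tuple


def ip_to_int(ip: str) -> int:
    """Convert a dotted-decimal string into its 32-bit decimal value."""
    parts = [int(p) for p in ip.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("not a dotted-decimal IPv4 address: %r" % ip)
    value = 0
    for p in parts:
        # Shift the accumulated bits left by one octet and append the next one.
        value = (value << 8) | p
    return value


def segment_to_ints(segment: str) -> Tuple[int, int]:
    """Convert a segment such as '1.14.0.0-1.14.95.255' into (start, end)."""
    start, end = segment.split("-")
    return ip_to_int(start), ip_to_int(end)


print(format(ip_to_int("1.14.95.255"), "032b"))  # the 32-bit binary form
print(ip_to_int("1.14.95.255"))                  # the decimal numeric form
```

Stored as an unsigned 32-bit integer, each converted address takes 4 bytes regardless of how long its dotted-decimal string was.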
When processing the physical address information corresponding to the IP address field of the first IP address data, the physical address information may be described in simplified form. For example, the original physical address information "Country A, Province A1, City B1, District C1, Town D1" may be simplified to "Country A A1B1C1D1", "Country A A1B1", or the like.
In one possible implementation, the physical address information may include a specific geographic location (e.g., "Country A, Province A1, City B1, District C1, Town D1") and the name of an internet service provider (e.g., Country A Telecom). The physical address information corresponding to the IP address field of each piece of first IP address data may be simplified as follows. At least one of the following is performed on the geographic location corresponding to the IP address field of each piece of first IP address data: filtering out non-keywords in the geographic location; representing unknown or missing information in the geographic location with a first predetermined identifier; and mapping the geographic location to the corresponding country/region code according to predetermined country/region codes. At least one of the following is performed on the name of the internet service provider corresponding to the IP address field of each piece of first IP address data: filtering out non-keywords in the name of the internet service provider; and representing, with a second predetermined identifier, all internet service providers other than the first N in an internet service provider queue, where the queue orders the internet service providers by number of users from highest to lowest.
Describing the physical address information corresponding to the IP address field of each piece of first IP address data in simplified form essentially normalizes it: meaningless information is removed and only the key information is retained. In one example, if the geographic location is "Country A, Province A1, City B1, District C1, Town D1", then words such as "Province", "City", "District" and "Town" are non-key information (i.e., non-keywords) relative to A1, B1, C1 and D1. These words may be filtered out to save storage space; that is, "Country A, Province A1, City B1, District C1, Town D1" is simplified to "Country A A1B1C1D1".
In another example, if the original geographic location should be "Country A, Province A1, City B1, District C1, Town D1" but Town D1 could not be accurately acquired for some reason, then "Town D1" is unknown or missing information and may be represented by the first predetermined identifier. The first predetermined identifier may be "@", "#" or the like, which is not limited by the embodiments of the present application. In this case "Town D1" is replaced by the first predetermined identifier, and the location is simplified to "Country A, Province A1, City B1, District C1" or "Country A A1B1C1" followed by that identifier. Similarly, if District C1 could not be accurately acquired, then "District C1" is unknown or missing, so "Town D1" below it is also treated as unknown or missing; the location may then be simplified to "Country A, Province A1, City B1" or "Country A A1B1", with the identifier standing in for the district and the town.
In yet another example, the geographic location may be mapped to a corresponding country/region code according to predetermined country/region codes. For example, country information may be mapped using the country codes of the ISO 3166-1 standard, and regions of mainland China may be mapped using the published administrative division codes at and above the county level; for instance, Beijing is mapped to 110000 and Dongcheng District of Beijing is mapped to 110101.
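The three kinds of geographic simplification described above (filtering non-keywords, marking unknown levels, and mapping to region codes) might be sketched as follows. The keyword list, the choice of "#" as the first predetermined identifier, the lookup keys of `REGION_CODES` and both function names are assumptions made for illustration; only the two Beijing codes are taken from the text.

```python
from typing import List, Optional

# Hypothetical non-keyword list and unknown marker, chosen for illustration.
NON_KEYWORDS = ("Province", "City", "District", "Town")
UNKNOWN = "#"

# Tiny excerpt of a region-code table; the keys are illustrative names.
REGION_CODES = {"Beijing": "110000", "Beijing Dongcheng": "110101"}


def simplify_location(levels: List[Optional[str]]) -> List[str]:
    """Simplify a location given from the country level downwards.

    A level that is None or empty is unknown; every level below an
    unknown level is treated as unknown as well.
    """
    simplified = []
    missing = False
    for level in levels:
        if not level:
            missing = True
        if missing:
            simplified.append(UNKNOWN)
            continue
        # Drop non-keywords such as "Province" and keep the key part.
        words = [w for w in level.split() if w not in NON_KEYWORDS]
        simplified.append(" ".join(words))
    return simplified


def to_region_code(name: str) -> str:
    """Map a region name to its code, or to the unknown marker."""
    return REGION_CODES.get(name, UNKNOWN)


full = ["Country A", "Province A1", "City B1", "District C1", "Town D1"]
print(simplify_location(full))
print(simplify_location(full[:3] + [None, "Town D1"]))
```

In the second call the district is missing, so the town below it is also replaced by the marker, matching the behavior described for an unknown "District C1".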
When simplifying the name of the internet service provider corresponding to the IP address field of each piece of first IP address data, suppose the geographic location in the physical address information is "Country A, Province A1, City B1, District C1, Town D1" and the name of the internet service provider is "Country A Operator M Company", so that the physical address information is "Country A, Province A1, City B1, District C1, Town D1, Country A Operator M Company". Words such as "Company" in the name are non-key information (non-keywords) and can be filtered out; that is, the non-keywords in the name of the internet service provider are filtered out.
Generally, many internet service providers can provide internet services, for example 50 or 80 of them, but in practice most users use one of a few influential operators, such as China Telecom, China Unicom and China Mobile. Therefore, to reduce the amount of data in the operator-name dimension, only the names of the top N operators by number of users may be retained, and all other operators are represented by a second predetermined identifier, which may be "+", "@", "#" or the like; the embodiments of the present application do not limit it. This is equivalent to sorting the internet operators by number of users from high to low, retaining in the physical address information the names of the N top-ranked operators, and representing the operators ranked from N+1 to last by the second predetermined identifier.
In one example, suppose the physical address information is "Country A, Province A1, City B1, District C1, Town D1, Country A Operator M", that is, the name of the internet operator is "Country A Operator M". If "Country A Operator M" is among the N top-ranked internet operators, its name is retained in the physical address information. If it is not among the N top-ranked operators, it is represented in the physical address information by the second predetermined identifier, so that the physical address information becomes "Country A, Province A1, City B1, District C1, Town D1" or "Country A A1B1C1D1" followed by that identifier.
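A minimal sketch of the operator-name simplification, assuming the user counts per operator are available as a dictionary. The function name `make_isp_simplifier`, the sample counts, the non-keyword "Company" and the use of "#" as the second predetermined identifier are all illustrative assumptions.

```python
from typing import Callable, Dict


def make_isp_simplifier(user_counts: Dict[str, int], n: int,
                        other: str = "#") -> Callable[[str], str]:
    """Build a function that keeps the top-n operators and masks the rest."""
    # Sort operator names by user count, highest first, and keep the first n.
    ranked = sorted(user_counts, key=lambda name: user_counts[name], reverse=True)
    top_n = set(ranked[:n])

    def simplify(name: str) -> str:
        # Filter a non-keyword first, then apply the top-n rule.
        name = name.replace("Company", "").strip()
        return name if name in top_n else other

    return simplify


# Made-up user counts, for illustration only.
counts = {"Telecom": 300, "Unicom": 200, "Mobile": 500, "SmallNet": 3}
simplify = make_isp_simplifier(counts, n=3)
print(simplify("Mobile Company"), simplify("SmallNet"))
```

Because every operator outside the top N collapses to one identifier, the number of distinct values in the operator dimension is bounded by N + 1.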
Fig. 2 is a schematic diagram showing the result of data processing performed on the acquired original IP address data. The left side of Fig. 2 shows the original IP address data, which includes dotted-decimal string type IP addresses and the original physical address information; the right side shows the processed IP address data, which includes decimal numeric type IP addresses and simplified physical address information.
In a possible implementation, when compressing the IP address field of the at least one second IP address data, if at least two consecutive IP address fields exist among the plurality of IP address fields and the physical address information corresponding to those consecutive IP address fields is the same, the at least two consecutive IP address fields are merged to obtain a merged IP address field. The start IP address of the merged IP address field is the start IP address of the first of the at least two consecutive IP address fields, and the end IP address of the merged IP address field is the end IP address of the last of them.
Specifically, after the original IP address data has been processed into second IP address data, the physical address information of two, three or more consecutive second IP address segments may in some cases be identical. Such consecutive segments may then be merged and reduced into a single second IP address segment. As shown in Fig. 3, the 3 consecutive second IP address segments in the left-hand frame, the first of which starts at 2005997312 and the last of which ends at 2006008063, have the same physical address information and are merged into one second IP address segment. Here 2005997312, the start of the merged segment, is the start IP address of the first of the 3 consecutive second IP address segments, and 2006008063, its end, is the end IP address of the last of them.
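The merging step can be illustrated with a short sketch. It assumes that "consecutive" means the next segment starts exactly one address after the previous one ends; the function name `merge_segments`, the intermediate segment boundaries and the placeholder strings "X" and "Y" are invented for the example (only the overall start 2005997312 and end 2006008063 come from the text).

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # (start IP, end IP, physical address info)


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Merge adjacent segments that carry the same physical address info."""
    merged: List[Segment] = []
    for start, end, info in sorted(segments):
        if merged and merged[-1][2] == info and merged[-1][1] + 1 == start:
            # Same info and directly adjacent: extend the previous segment.
            merged[-1] = (merged[-1][0], end, info)
        else:
            merged.append((start, end, info))
    return merged


segments = [
    (2005997312, 2006000000, "X"),  # intermediate boundaries are made up
    (2006000001, 2006004000, "X"),
    (2006004001, 2006008063, "X"),
    (2006008064, 2006009000, "Y"),
]
print(merge_segments(segments))
```

The merged segment keeps the start of the first input segment and the end of the last one, as the paragraph above describes.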
Specifically, compressing the physical address information corresponding to the IP address field of the at least one second IP address data may be a process of normalizing (deduplicating) the physical address information. After the original IP address data has been processed into second IP address data, the processed data share the same format, but the original IP address data may contain a great deal of repeated physical address information. For example, the physical address information of one piece of original IP address data is "Country A, Province A1, City B1, District C1, Town D1", that of another piece is also "Country A, Province A1, City B1, District C1, Town D1", and that of yet another piece is again "Country A, Province A1, City B1, District C1, Town D1", so the physical address information corresponding to the IP address fields of the resulting second IP address data is also heavily repeated. The repetition becomes even greater once the dimension of the physical address information is reduced according to service requirements. For example, if the original physical address information is "Country A, Province A1, City B1, District C1, Town D1" and the dimension is reduced to the province level, only the information down to the province is retained and the physical address information becomes "Country A, Province A1". Taking China as an example, after the dimension is reduced to the province level, for instance when a service performs a gray release (staged rollout) by province, there are only 34 distinct pieces of physical address information. However, the IP address data contains thousands of IP address segments, so the mapping from IP address segments to physical address information becomes a many-to-few scenario.
In this case, each piece of physical address information corresponding to the IP address fields of the second IP address data may be stored as a single character string such as "Country A|Province A1|City B1|District C1|Town D1|Telecom", in which the dimensions are separated by "|" and unknown or missing information is represented by "-", so that all physical address information is represented uniformly. The uniformly represented physical address information is then stored in a preset set collection, such as a set, and normalized so that only distinct physical address information is retained; that is, when repeated physical address information exists in the set collection, the repetitions are filtered out. After normalization, the set collection may be converted into an array list to ensure that the normalized physical address information has a fixed order.
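The uniform representation and deduplication described above can be sketched as follows. The helper names are hypothetical, and sorting the set to obtain a fixed order is an assumption; the text only requires that the resulting list have a stable order.

```python
from typing import List, Optional, Sequence


def join_dimensions(dims: Sequence[Optional[str]]) -> str:
    """Join the dimensions with '|', writing '-' for unknown ones."""
    return "|".join(d if d else "-" for d in dims)


def unique_infos(infos: List[str]) -> List[str]:
    """Deduplicate via a set, then fix the order by sorting."""
    return sorted(set(infos))


rows = [
    ["Country A", "Province A1", "City B1", None, None, "Telecom"],
    ["Country A", "Province A1", "City B1", None, None, "Telecom"],
    [None, None, None, None, None, None],
]
infos = [join_dimensions(r) for r in rows]
print(unique_infos(infos))
```

A plain Python `set` has no guaranteed iteration order for strings across runs, which is why the sketch converts it to a sorted list before any offsets are derived from it.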
In a possible implementation, the predetermined encoding scheme includes a file header area, a prefix area, an index area and a content area. Encoding the result of the compression processing into a corresponding encoded file according to the predetermined encoding scheme may proceed as follows. First, determine the total number of merged IP address fields, the number of IP address fields corresponding to each IP prefix, and the total number of bytes of the physical address information remaining after the repeated physical address information is filtered out of the set collection, where the IP prefix is the first decimal number of a dotted-decimal string type IP address. Then, write the total number of merged IP address fields into the file header area; write the index area start position and end position, determined from the number of IP address fields corresponding to each IP prefix, into the prefix area as one piece of prefix data information; write the predetermined end IP address, physical address information start position and physical address information length of each IP address field corresponding to each IP prefix into the index area as one piece of index data information; and write the physical address information remaining after the repeated physical address information is filtered out of the set collection into the content area. The size of the content area is the total number of bytes of the physical address information remaining after the repeated physical address information is filtered out of the set collection.
Specifically, when the result of the compression processing is encoded into the corresponding encoded file according to the predetermined encoding scheme, pre-encoding processing may be performed on the data to be encoded (i.e., the compressed second IP address data). The pre-encoding processing includes the following two aspects. In the first aspect, the following quantities are calculated: a) the number of all IP address segments (i.e., the total number of merged IP address segments); b) the total number of bytes of all non-repeated physical address information (i.e., the total number of bytes of the physical address information remaining after the repeated physical address information is filtered out of the set collection); and c) the number of IP address segments corresponding to each IP prefix, where the IP prefix is the first decimal number of a dotted-decimal string type IP address. For example, if the dotted-decimal string type IP address is "1.14.95.255", the IP prefix is 1, and if it is "14.17.22.34", the IP prefix is 14. The following description takes a total of 30000 IP address segments as an example.
Specifically, the specification data before encoding can be obtained from the three parts a), b) and c) above, as shown in Table 1 below. Table 1 gives part of the physical address information of the content area; in practical applications, the entry in which all physical address information is unknown is usually placed first and may represent reserved addresses.
Table 1. Specification data before encoding (table image BDA0002513765850000161)
The ending IP value in table 1 is the ending IP address of the IP address segment corresponding to the IP prefix, and the ending IP address is the decimal value type IP address after data processing. If the IP address segment is "1.14.0.0-1.14.95.255", i.e., the IP prefix is 1, the end IP address of the IP address segment "1.14.95.255" corresponds to the decimal value type IP address "17719295".
Next, the results obtained by the compression process (i.e., the second IP address data after the compression process) may be sequentially encoded into the encoded file according to a predetermined encoding method shown in table 2 below, so as to generate an IP offline library file with an extremely high compression rate and an extremely high parsing efficiency.
Table 2. Predetermined encoding scheme (table image BDA0002513765850000171)
The sum of the number of bytes of all the physical address information in table 2 refers to the total number of bytes of the physical address information remaining after the repeated physical address information is filtered out from the aggregate list, that is, the spatial size of the content area is the total number of bytes of the physical address information remaining after the repeated physical address information is filtered out from the aggregate list. In addition, the space size of the file header area is 4 bytes, the space size of the prefix area is 256 × 8 bytes, and the space size of the index area is 8 bytes multiplied by the total number of the IP network segments after the merging processing.
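The area sizes stated above determine where each area begins in the file. The following sketch computes those offsets; the function name `file_layout` and the content size of 1000 bytes used in the example are assumptions, while the 4-byte header, the 256 × 8-byte prefix area and the 8 bytes per index entry come from the text.

```python
from typing import Dict

HEADER_SIZE = 4            # total number of merged IP address segments
PREFIX_AREA_SIZE = 256 * 8  # one 8-byte entry per possible IP prefix
INDEX_ENTRY_SIZE = 8        # one 8-byte entry per merged IP address segment


def file_layout(num_segments: int, content_bytes: int) -> Dict[str, int]:
    """Return the start offset of each area and the total file size."""
    prefix_start = HEADER_SIZE
    index_start = prefix_start + PREFIX_AREA_SIZE
    content_start = index_start + num_segments * INDEX_ENTRY_SIZE
    return {
        "prefix_start": prefix_start,
        "index_start": index_start,
        "content_start": content_start,
        "total_size": content_start + content_bytes,
    }


# 30000 segments as in the running example; 1000 content bytes is made up.
print(file_layout(30000, 1000))
```

With 30000 segments the fixed-size part of the file (header, prefix area and index area) occupies 242052 bytes, and the content area follows immediately after it.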
Next, the result of the compression processing (i.e., the compressed second IP address data) is encoded into a corresponding encoded file according to the predetermined encoding scheme shown in Table 2. If the file name of the encoded file is "ip_db_v1.dat", then:
First, the file header area is written; that is, the total number of IP address segments is written into the first 4 bytes of the file. In the example of this embodiment the number of IP address segments is 30000, so 30000 is converted into a byte stream with a standard size of 4 bytes and written into the file.
Second, the prefix area is written. The number of IP address segments corresponding to each IP prefix is known from the specification data shown in Table 1, so the index area start position and end position corresponding to each IP prefix can be obtained from a predetermined formula and written into the prefix area as one piece of prefix data information. The predetermined formula may be: the index area start position of the current IP prefix equals the index area end position of the previous IP prefix, and the index area end position of the current IP prefix equals the index area start position of the current IP prefix plus the number of IP address segments of the current IP prefix multiplied by 8. In one example, prefix 0 holds the first index data, so its index area start position is 2052; because prefix 0 has only one IP address segment, its index area end position is 2060. By the formula above, the index area start position of prefix 1 is 2060, and assuming prefix 1 has 200 IP address segments, its index area end position is 3660. The index area start position and end position corresponding to every IP prefix can be obtained in the same way.
After the index area start position and end position corresponding to the current IP prefix have been obtained, each of the two positions is converted into a 4-byte binary number and written into the prefix area. Each piece of prefix data information in the prefix area comprises an index area start position field and an index area end position field. When the index area start position and end position determined from the number of IP address segments corresponding to each IP prefix are written into the prefix area as one piece of prefix data information, the index area start position is written into the index area start position field of that piece of prefix data information, and the index area end position is written into its index area end position field.
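The prefix-area formula can be sketched as below. The function name `build_prefix_area` is hypothetical, and little-endian byte order (`"<"` in `struct`) is an assumption, since the text only specifies 4-byte fields.

```python
import struct
from typing import List

HEADER_SIZE = 4
PREFIX_AREA_SIZE = 256 * 8
INDEX_ENTRY_SIZE = 8


def build_prefix_area(counts: List[int]) -> bytes:
    """Build the 256-entry prefix area from per-prefix segment counts."""
    if len(counts) != 256:
        raise ValueError("expected one count per IP prefix (0..255)")
    area = bytearray()
    position = HEADER_SIZE + PREFIX_AREA_SIZE  # index area begins at 2052
    for count in counts:
        start = position                        # end of the previous prefix
        end = start + count * INDEX_ENTRY_SIZE  # start + segments * 8
        area += struct.pack("<II", start, end)  # two 4-byte fields
        position = end
    return bytes(area)


counts = [0] * 256
counts[0], counts[1] = 1, 200  # the worked example from the text
area = build_prefix_area(counts)
print(struct.unpack_from("<IIII", area, 0))
```

A prefix with no segments gets an entry whose start and end positions are equal, so every prefix still occupies exactly 8 bytes and can be found by simple arithmetic.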
Third, the index area is written. According to the predetermined encoding scheme shown in Table 2, the file size is 2052 bytes once the entire prefix area has been written. The index area then records all IP address segments and their corresponding physical address information: as shown by the index area contents in Table 2, all the specification data shown in Table 1 is converted, record by record, into binary in the format [end IP value][physical address information position][physical address information length] and written into the index area. That is, the predetermined end IP address, physical address information start position and physical address information length of each IP address segment corresponding to each IP prefix are written into the index area as one piece of index data information.
Specifically, each piece of index data information in the index area includes an end IP address field, a physical address information start position field and a physical address information length field. When the predetermined end IP address, physical address information start position and physical address information length of each IP address segment corresponding to each IP prefix are written into the index area as one piece of index data information, the end IP address is written into the end IP address field, the physical address information start position is written into the physical address information start position field, and the physical address information length is written into the physical address information length field of that piece of index data information.
In practical applications, the [end IP value] field occupies 4 bytes, the [physical address information start position] field occupies 3 bytes, and the [physical address information length] field occupies 1 byte. For example, when the data "16777215 | 24020529" in Table 1 is written into the index area of the encoded file, 16777215, 2401028 and 9 are converted in turn into byte streams of 4 bytes, 3 bytes and 1 byte respectively and written into the index area of the encoded file. All IP address segments are written into the encoded file in sequence following this procedure.
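An index record of 4 + 3 + 1 bytes can be packed as in the sketch below. Python's `struct` module has no 3-byte format code, so the sketch uses `int.to_bytes` for the position field; the function names and the little-endian byte order are assumptions.

```python
import struct
from typing import Tuple


def pack_index_record(end_ip: int, info_pos: int, info_len: int) -> bytes:
    """Pack [end IP: 4 bytes][info position: 3 bytes][info length: 1 byte]."""
    if not 0 <= info_pos < 1 << 24:
        raise ValueError("position does not fit in 3 bytes")
    if not 0 <= info_len < 1 << 8:
        raise ValueError("length does not fit in 1 byte")
    return (struct.pack("<I", end_ip)
            + info_pos.to_bytes(3, "little")
            + bytes([info_len]))


def unpack_index_record(record: bytes) -> Tuple[int, int, int]:
    """Inverse of pack_index_record for an 8-byte record."""
    end_ip = struct.unpack("<I", record[:4])[0]
    info_pos = int.from_bytes(record[4:7], "little")
    return end_ip, info_pos, record[7]


record = pack_index_record(16777215, 2401028, 9)
print(len(record), unpack_index_record(record))
```

The 3-byte position field limits the encoded file to offsets below 16 MiB, and the 1-byte length field limits each physical address string to 255 bytes.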
Fourth, the content area is written. The data written in the first three steps is all of fixed size and is therefore placed at the front of the file. The individual entries in the content area vary in size, but the overall size of the content area is known: it is the total number of bytes of all non-repeated physical address information. In this step the physical address information only needs to be written into the file sequentially as byte streams, which finally produces the binary encoded file "ip_db_v1.dat". The data composition of the encoded file is shown in Table 3 below:
Table 3. Data composition of the encoded file (table image BDA0002513765850000191)
In Table 3, the ellipsis in the third column stands for the index area start and end positions of the prefix area that are not listed; the ellipsis in the fifth column stands for the end IP values, physical address information start positions and physical address information lengths of the index area that are not listed; and the ellipsis in the seventh column stands for the physical address information of the content area that is not listed.
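The content area from the fourth writing step can be built by concatenating the distinct physical address strings and recording where each one lands, so that the index records can point at them. This is a sketch under assumptions: the function name `build_content_area` is hypothetical, and UTF-8 is assumed as the text encoding, which the application does not specify.

```python
from typing import Dict, List, Tuple


def build_content_area(infos: List[str],
                       content_start: int) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate distinct info strings; return bytes and (position, length)."""
    content = bytearray()
    location: Dict[str, Tuple[int, int]] = {}
    for info in infos:
        raw = info.encode("utf-8")
        # Absolute file offset of this entry, plus its length in bytes.
        location[info] = (content_start + len(content), len(raw))
        content += raw
    return bytes(content), location


# Two distinct entries; 2068 would be the content start for 2 index records.
content, location = build_content_area(["-", "Country A|A1"], 2068)
print(content, location)
```

Because the entries have no separators, the (position, length) pair stored in each index record is the only way to recover an entry, which is why both values must be computed before the index area is written.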
It should be noted that, when the result obtained by the compression processing is encoded into the corresponding encoded file according to the predetermined encoding method, the written content may also be adjusted as needed, for example, a version field may be added to the file header area, and the embodiment of the present application does not limit this.
In a possible implementation manner, after the result obtained by the compression processing is encoded into the corresponding encoded file according to the predetermined encoding manner, the physical address information corresponding to the target IP address may also be queried according to the encoded file, where the process of querying the physical address information corresponding to the target IP address according to the encoded file may be: when receiving a query request for querying physical address information corresponding to a target IP address, determining a target IP prefix of the target IP address, and converting the target IP address into a decimal numerical value type IP address by a preset conversion method, wherein the target IP address is an IP address of a dotted decimal character string type, and the target IP prefix is a first decimal number of the target IP address; then, inquiring a prefix area of the coded file according to the target IP prefix to obtain a corresponding target index area initial position and a target index area end position; and then, searching physical address information corresponding to the decimal value type IP address in the interval between the starting position of the target index area and the ending position of the target index area.
When searching, in the interval between the target index area start position and the target index area end position, for the physical address information corresponding to the decimal numeric type IP address, index data information may be located whose current end IP address is smaller than the decimal numeric type IP address and whose next end IP address is larger than the decimal numeric type IP address. The physical address information corresponding to the decimal numeric type IP address is then obtained according to the physical address information start position and the physical address information length in that index data information.
Specifically, the process of querying the physical address information corresponding to the target IP address according to the encoded file is substantially a process of analyzing the target IP address, that is, querying the physical address information of a certain IP address by using the generated encoded file. The following takes the target IP address "14.17.22.34" as an example, and specifically introduces the resolution process of the target IP address:
First, the target IP address is converted. The target IP address is of the dotted-decimal string type. Its target IP prefix is determined first; for example, the target IP prefix of the target IP address "14.17.22.34" is 14. The target IP address is then converted into a decimal numeric type IP address by a predetermined conversion method; the decimal numeric type IP address corresponding to "14.17.22.34" is 236000802. It should be noted that the conversion of a dotted-decimal string type target IP address into a decimal numeric type IP address should use the most efficient method available. Because the facilities offered by each programming language differ, the embodiment of the present application takes the Python programming language as an example, in which an efficient conversion based on the socket module is used.
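One way to perform this conversion with Python's standard library is to combine `socket.inet_aton` with `struct.unpack`. Whether this is the exact call the application has in mind is an assumption; the function names below are hypothetical.

```python
import socket
import struct


def ip_to_int(ip: str) -> int:
    """Convert a dotted-decimal string to a 32-bit integer."""
    # inet_aton yields 4 bytes in network (big-endian) order.
    return struct.unpack("!I", socket.inet_aton(ip))[0]


def ip_prefix(ip: str) -> int:
    """Return the first decimal number of a dotted-decimal address."""
    return int(ip.split(".", 1)[0])


print(ip_prefix("14.17.22.34"), ip_to_int("14.17.22.34"))
```

Note that `socket.inet_aton` also accepts shorthand forms with fewer than three dots, so callers that need strict validation should check the input format separately.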
Second, the physical address information corresponding to the decimal numeric type IP address is looked up in the binary encoded file. The prefix area of the encoded file is first queried with the target IP prefix to obtain the corresponding target index area start position and end position: the entry for a prefix is located at offset (target IP prefix × 8 + 4) and is 8 bytes long, so the prefix area data for prefix 14 is the byte interval [116, 124] of the encoded file. The two 4-byte values in the interval [116, 124] are unpacked to obtain the target index area start position a and the target index area end position b. Then, within the byte interval [a, b] of the encoded file, a binary search finds the index data information whose current end IP address is less than 236000802 and whose next end IP address is greater than 236000802. From that index data information, the start position x and the length y of the corresponding physical address information are obtained, so the data in the byte interval [x, x + y] of the encoded file is the physical address information corresponding to the target IP address "14.17.22.34".
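The lookup procedure can be sketched as a single function over the bytes of the encoded file. Several points are assumptions made for this sketch: little-endian file fields, UTF-8 content, the function name `lookup`, and the search rule itself, which is implemented here as "the first index record whose end IP value is greater than or equal to the target", one reading of the comparison described above.

```python
import socket
import struct


def lookup(blob: bytes, ip: str) -> str:
    """Return the physical address info for ip from an encoded file image."""
    target = struct.unpack("!I", socket.inet_aton(ip))[0]
    prefix = target >> 24
    # Prefix entry: 8 bytes at offset prefix * 8 + 4 (after the 4-byte header).
    a, b = struct.unpack_from("<II", blob, prefix * 8 + 4)
    count = (b - a) // 8
    lo, hi = 0, count
    while lo < hi:  # binary search over the 8-byte index records in [a, b)
        mid = (lo + hi) // 2
        end_ip = struct.unpack_from("<I", blob, a + mid * 8)[0]
        if end_ip < target:
            lo = mid + 1
        else:
            hi = mid
    if lo == count:
        raise LookupError("no segment covers %s" % ip)
    record = a + lo * 8
    x = int.from_bytes(blob[record + 4:record + 7], "little")  # info position
    y = blob[record + 7]                                       # info length
    return blob[x:x + y].decode("utf-8")
```

In use, the file would be read once into memory (for example with `open("ip_db_v1.dat", "rb").read()`) and passed to `lookup` for each query. Each query then costs one prefix-entry read plus a binary search over only the records of that prefix.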
Fig. 4 shows a schematic diagram of an application of the embodiment of the present application. In Fig. 4, the IP raw data of step 401 is the acquired at least one first IP address data. Reducing, merging and converting the at least one first IP address data, which corresponds to the data processing and compression processing of the embodiment of the present application, yields the IP specification data of step 402, i.e., the compressed second IP address data of the embodiment. Corresponding calculation then yields the IP encoded data of step 403, which is the specification data shown in Table 1 of the embodiment. The compressed second IP address data is then written into the encoded file in the order of the format (that is, the result of the compression processing is encoded into the corresponding encoded file according to the predetermined encoding scheme), giving the result of step 404, in which the IP encoded data has been written into the file in sequence. After that, the IP address resolution of step 405 is performed; that is, the target IP address is resolved against the file to obtain the physical address information corresponding to the target IP address.
The method of the embodiment of the present application not only greatly reduces the space required to store IP address information but also ensures very high query efficiency, making it suitable for IP address resolution in latency-sensitive services. In addition, the method can be applied entirely on the client side, that is, IP address resolution is performed on the client. All IP addresses can be resolved accurately while occupying only a small amount of space, which balances space and time well and makes IP address resolution simpler and more efficient. This makes it convenient to collect real-time statistics on the regional network quality of all clients and provides detailed, professional data for analyzing client problems.
Fig. 5 is a schematic structural diagram of an internet protocol IP address data processing apparatus according to yet another embodiment of the present application, as shown in fig. 5, IP address data in the apparatus 500 includes an IP address field and physical address information corresponding to the IP address field, and the apparatus 500 may include a first processing module 501, a second processing module 502, and a third processing module 503, where:
the first processing module 501 is configured to obtain at least one second IP address data according to an IP address field of the obtained at least one first IP address data and physical address information corresponding to the IP address field;
a second processing module 502, configured to respectively perform compression processing on an IP address field of at least one second IP address data and physical address information corresponding to the IP address field;
the third processing module 503 is configured to encode a result obtained by the compression processing into a corresponding encoded file according to a predetermined encoding manner, so as to query the physical address information corresponding to the target IP address according to the encoded file.
In a possible implementation manner, the first processing module is configured to respectively perform data processing on the IP address field of the obtained at least one first IP address data and the physical address information corresponding to the IP address field, to obtain at least one second IP address data. When performing this data processing, the first processing module is configured to:
convert the dotted decimal string type IP address in the IP address field of each first IP address data into a decimal numeric type IP address;
perform simplified description on the physical address information corresponding to the IP address field of each first IP address data, to obtain simplified-description physical address information;
wherein the second IP address data includes the decimal numeric type IP address and the simplified-description physical address information.
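The conversion from a dotted decimal string to a decimal value can be sketched as follows. This is a minimal illustration that assumes IPv4 addresses; the function name `dotted_to_int` is hypothetical and does not appear in the application.

```python
def dotted_to_int(ip: str) -> int:
    """Convert a dotted decimal IPv4 string into its 32-bit decimal value.

    Hypothetical helper for illustration only; assumes an IPv4 address.
    """
    parts = ip.split(".")
    if len(parts) != 4:
        raise ValueError(f"not a dotted decimal IPv4 address: {ip!r}")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"octet out of range in {ip!r}")
        # Shift the accumulated value left by one byte and append the octet.
        value = (value << 8) | octet
    return value


print(dotted_to_int("1.0.0.0"))
```

Storing the address as a single integer lets segment boundaries be compared numerically instead of as strings.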
In one possible implementation, the physical address information includes a geographic location and a name of an Internet service provider. When performing simplified description on the physical address information corresponding to the IP address field of each first IP address data, the first processing module is configured to:
perform at least one of the following on the geographic location corresponding to the IP address field of each first IP address data: filtering out non-key words in the geographic location, representing unknown or missing information in the geographic location with a first predetermined identifier, and mapping the geographic location to the corresponding country/region code according to predetermined country/region codes;
perform at least one of the following on the name of the Internet service provider corresponding to the IP address field of each first IP address data: filtering out non-key words in the name of the Internet service provider, and representing the Internet service providers other than the first N Internet service providers in an Internet service provider queue with a second predetermined identifier, wherein the Internet service provider queue orders the Internet service providers by number of users in descending order.
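The simplification of the physical address information might look like the sketch below. Every concrete value here is an assumption made for illustration: the identifiers `*` and `other`, the non-key word list, the country/region code table, and the provider queue are hypothetical and are not taken from the application.

```python
UNKNOWN_MARK = "*"        # hypothetical first predetermined identifier
OTHER_ISP_MARK = "other"  # hypothetical second predetermined identifier
NON_KEY_WORDS = ("Province", "City", "Co., Ltd.")  # hypothetical word list
COUNTRY_CODES = {"China": "CN", "United States": "US"}  # hypothetical excerpt
# Hypothetical provider queue, assumed already sorted by user count, descending.
ISP_QUEUE = ["China Telecom", "China Unicom", "China Mobile", "Small ISP"]


def strip_non_key_words(text: str) -> str:
    """Remove non-key words and collapse the remaining whitespace."""
    for word in NON_KEY_WORDS:
        text = text.replace(word, "")
    return " ".join(text.split())


def simplify_location(country, province, city) -> str:
    """Map the country to a code and mark unknown or missing fields."""
    fields = [COUNTRY_CODES.get(country, country), province, city]
    out = []
    for field in fields:
        field = strip_non_key_words(field or "")
        known = field and field.lower() != "unknown"
        out.append(field if known else UNKNOWN_MARK)
    return "|".join(out)


def simplify_isp(name: str, top_n: int = 3) -> str:
    """Keep the first N providers of the queue; collapse the rest."""
    name = strip_non_key_words(name)
    return name if name in ISP_QUEUE[:top_n] else OTHER_ISP_MARK


print(simplify_location("China", "Guangdong Province", "Shenzhen City"))
print(simplify_isp("China Telecom Co., Ltd."))
```

Shorter, normalized strings make identical physical address information more likely to compare equal, which helps the later merging and deduplication.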
In a possible implementation manner, when compressing the IP address segments (that is, the IP address fields) of the at least one second IP address data, the second processing module is configured to:
when at least two consecutive IP address segments exist among the plurality of IP address segments and the physical address information corresponding to the at least two consecutive IP address segments is the same, merge the at least two consecutive IP address segments to obtain a merged IP address segment, wherein the start IP address of the merged IP address segment is the start IP address of the first of the at least two consecutive IP address segments, and the end IP address of the merged IP address segment is the end IP address of the last of the at least two consecutive IP address segments;
when compressing the physical address information corresponding to the IP address field of the at least one second IP address data, the second processing module is configured to:
store the physical address information corresponding to each IP address field in a predetermined aggregation list, and, when repeated physical address information exists in the aggregation list, filter out the repeated physical address information.
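The two compression steps can be sketched as below. The sketch assumes that segments are `(start, end, info)` tuples with integer addresses sorted by start address, and that "consecutive" means the next segment starts exactly one address after the previous one ends; the function names are hypothetical.

```python
def merge_segments(segments):
    """Merge adjacent segments that carry identical physical address info.

    segments: list of (start_ip, end_ip, info) tuples sorted by start_ip.
    """
    merged = []
    for start, end, info in segments:
        adjacent = merged and merged[-1][1] + 1 == start
        if adjacent and merged[-1][2] == info:
            # Keep the start of the first segment, take the end of the last.
            merged[-1] = (merged[-1][0], end, info)
        else:
            merged.append((start, end, info))
    return merged


def dedupe_infos(segments):
    """Return each distinct info string once, in first-seen order."""
    return list(dict.fromkeys(info for _, _, info in segments))


segments = [
    (0, 255, "CN|GD"),
    (256, 511, "CN|GD"),     # adjacent and identical: merged with the first
    (512, 767, "US|CA"),
    (1024, 1279, "US|CA"),   # identical but not adjacent: kept separate
]
print(merge_segments(segments))
print(dedupe_infos(segments))
```

Merging shrinks the number of index entries, while deduplication shrinks the stored text, so the two steps reduce different parts of the final file.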
In a possible implementation manner, the predetermined encoding manner includes a header area, a prefix area, an index area, and a content area, and the third processing module is configured to:
determine the total number of merged IP address fields, the number of IP address fields corresponding to each IP prefix, and the total number of bytes of the physical address information remaining in the aggregation list after the repeated physical address information is filtered out, wherein the IP prefix is the first decimal number of an IP address of the dotted decimal string type;
write the total number of merged IP address fields into the header area; write the index area start position and index area end position, determined according to the number of IP address fields corresponding to each IP prefix, into the prefix area as prefix data information; write the end IP address, the physical address information start position, and the physical address information length of each IP address field corresponding to each predetermined IP prefix into the index area as index data information; and write the physical address information remaining in the aggregation list after the repeated physical address information is filtered out into the content area;
wherein the size of the content area is the total number of bytes of the physical address information remaining in the aggregation list after the repeated physical address information is filtered out.
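A possible byte layout for the four areas is sketched below. The field widths are assumptions, since the exact sizes are not restated in this passage: a 4-byte header, 256 prefix slots of two 4-byte positions each (counted in index entries), 10-byte index entries, big-endian byte order, and UTF-8 content. The sketch also assumes each merged segment lies within a single first-octet prefix; the function name `encode` is hypothetical.

```python
import struct


def encode(segments):
    """Encode merged (start_ip, end_ip, info) tuples, sorted by start_ip."""
    # Content area: every distinct info string is stored exactly once.
    offsets = {}
    content = bytearray()
    for _, _, info in segments:
        if info not in offsets:
            raw = info.encode("utf-8")
            offsets[info] = (len(content), len(raw))
            content += raw

    # Index area: end IP, info start position, info length per segment.
    index = bytearray()
    counts = [0] * 256
    for start, end, info in segments:
        counts[start >> 24] += 1  # first octet is the IP prefix
        offset, length = offsets[info]
        index += struct.pack(">IIH", end, offset, length)

    # Prefix area: start and end entry positions derived from the counts.
    prefix_area = bytearray()
    pos = 0
    for n in counts:
        prefix_area += struct.pack(">II", pos, pos + n)
        pos += n

    header = struct.pack(">I", len(segments))
    return bytes(header + prefix_area + index + content)


blob = encode([
    (16777216, 16777471, "CN|GD"),   # 1.0.0.0 - 1.0.0.255
    (16777472, 16777727, "US|CA"),   # 1.0.1.0 - 1.0.1.255
    (33554432, 33554687, "CN|GD"),   # 2.0.0.0 - 2.0.0.255
])
print(len(blob))
```

Because the prefix area has a fixed size, a reader can jump straight to the slot for a given first octet without scanning the index.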
In a possible implementation manner, the apparatus further includes a fourth processing module, where the fourth processing module is configured to:
when receiving a query request for the physical address information corresponding to a target IP address, determine a target IP prefix of the target IP address and convert the target IP address into a decimal numeric type IP address by a predetermined conversion method, wherein the target IP address is an IP address of the dotted decimal string type and the target IP prefix is the first decimal number of the target IP address;
query the prefix area of the encoded file according to the target IP prefix to obtain the corresponding target index area start position and target index area end position;
search for the physical address information corresponding to the decimal numeric type IP address in the interval between the target index area start position and the target index area end position.
In a possible implementation manner, when searching for the physical address information corresponding to the decimal numeric type IP address in the interval between the target index area start position and the target index area end position, the fourth processing module is configured to:
search, in the interval between the target index area start position and the target index area end position, for the index data information for which the current end IP address is smaller than the decimal numeric type IP address and the end IP address following the current end IP address is larger than the decimal numeric type IP address, and obtain the physical address information corresponding to the decimal numeric type IP address according to the physical address information start position and the physical address information length in the index data information.
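The query can be sketched on in-memory structures as follows. The sketch assumes the prefix area has been read into a dictionary of `(start, end)` entry positions, the index area into a sorted list of `(end_ip, offset, length)` tuples, and the content area into a bytes object. It uses binary search, which is an assumed search strategy, and takes the first entry whose end IP is not smaller than the target as the match, which is one reading of the condition above. All names are hypothetical.

```python
import bisect


def lookup(ip, prefix_ranges, index, content):
    """Return the physical address info for a dotted decimal IPv4 string."""
    octets = [int(part) for part in ip.split(".")]
    prefix = octets[0]  # target IP prefix: the first decimal number
    value = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]

    lo, hi = prefix_ranges.get(prefix, (0, 0))
    # (value,) sorts before any (value, offset, length), so bisect_left
    # lands on the first entry whose end IP is >= value.
    i = bisect.bisect_left(index, (value,), lo, hi)
    if i == hi:
        return None  # no segment under this prefix reaches the target
    _, offset, length = index[i]
    return content[offset:offset + length].decode("utf-8")


index = [(16777471, 0, 5), (16777727, 5, 5), (33554687, 0, 5)]
prefix_ranges = {1: (0, 2), 2: (2, 3)}
content = b"CN|GDUS|CA"
print(lookup("1.0.1.9", prefix_ranges, index, content))
```

Since only end addresses are indexed in this sketch, an address that falls in a gap before a segment's start would still match that segment; storing start addresses as well would be needed to detect such gaps.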
In one possible implementation, the predetermined conversion method comprises a socket.
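One plausible reading of a socket-based conversion is the address conversion routines of a socket library. The sketch below uses Python's standard `socket.inet_aton` and `socket.inet_ntoa` together with `struct`; the application does not name these functions, so their use here is an assumption, and the wrapper names are hypothetical.

```python
import socket
import struct


def ip_to_int(ip: str) -> int:
    """Convert a dotted decimal IPv4 string to a decimal value."""
    packed = socket.inet_aton(ip)           # 4 bytes, network byte order
    return struct.unpack("!I", packed)[0]   # unsigned 32-bit integer


def int_to_ip(value: int) -> str:
    """Convert a decimal value back to a dotted decimal IPv4 string."""
    return socket.inet_ntoa(struct.pack("!I", value))


print(ip_to_int("192.168.1.1"))
print(int_to_ip(16777216))
```

Relying on the library routine avoids hand-written parsing, at the cost of accepting whatever input forms the routine itself accepts.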
With the apparatus provided by the embodiment of the present application, at least one second IP address data can be obtained from the at least one first IP address data, which provides the prerequisite for the subsequent compression processing and encoding processing. The IP address field of the at least one second IP address data and the physical address information corresponding to the IP address field are compressed separately, which greatly reduces the storage space required to store the IP address data. Because the IP address data includes both the IP address field and the corresponding physical address information, the IP address data remains complete, which improves the accuracy of subsequent queries. Encoding the result of the compression processing into a corresponding encoded file according to the predetermined encoding manner further saves storage space and facilitates subsequent queries of the IP address data, improving query efficiency and making the resolution of IP address data simpler and more efficient.
It should be noted that this embodiment is an apparatus embodiment corresponding to the method embodiments described above and can be implemented in cooperation with them. The related technical details mentioned in the method embodiments remain valid in this embodiment and are not repeated here. Correspondingly, the related technical details mentioned in this embodiment can also be applied to the method embodiments described above.
Another embodiment of the present application provides an electronic device. As shown in Fig. 6, the electronic device 600 includes a processor 601 and a memory 603. The processor 601 is coupled to the memory 603, for example via a bus 602. Further, the electronic device 600 may also include a transceiver 604. It should be noted that, in practical applications, the transceiver 604 is not limited to one, and the structure of the electronic device 600 does not constitute a limitation on the embodiment of the present application.
The processor 601 is applied to the embodiment of the present application, and is used to implement the functions of the first processing module, the second processing module, and the third processing module shown in fig. 5. The transceiver 604 includes a receiver and a transmitter.
The processor 601 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure. The processor 601 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 602 may include a path that transfers information between the above components. The bus 602 may be a PCI bus or an EISA bus, etc. The bus 602 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
Memory 603 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 603 is used for storing application program codes for executing the scheme of the application, and the processor 601 controls the execution. The processor 601 is configured to execute the application program code stored in the memory 603 to implement the actions of the internet protocol IP address data processing apparatus provided by the embodiment shown in fig. 5.
The electronic device provided by the embodiment of the present application comprises a memory, a processor, and a computer program that is stored in the memory and can run on the processor. When the processor executes the program, the following is implemented: obtaining at least one second IP address data according to the IP address field of the obtained at least one first IP address data and the physical address information corresponding to the IP address field; then respectively compressing the IP address field of the at least one second IP address data and the physical address information corresponding to the IP address field; and then encoding the result of the compression processing into a corresponding encoded file according to a predetermined encoding manner, so that the physical address information corresponding to a target IP address can be queried according to the encoded file.
The embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the method shown in the above embodiments. At least one second IP address data can be obtained from the acquired at least one first IP address data, which provides the prerequisite for the subsequent compression processing and encoding processing. The IP address field of the at least one second IP address data and the physical address information corresponding to the IP address field are compressed separately, which greatly reduces the storage space required to store the IP address data. Because the IP address data includes both the IP address field and the corresponding physical address information, the IP address data remains complete, which improves the accuracy of subsequent queries. Encoding the result of the compression processing into a corresponding encoded file according to the predetermined encoding manner further saves storage space and facilitates subsequent queries of the IP address data, improving query efficiency and making the resolution of IP address data simpler and more efficient.
The computer-readable storage medium provided by the embodiment of the application is suitable for any embodiment of the method.
It should be understood that, although the steps in the flowcharts of the figures are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated herein, the execution of the steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also be regarded as falling within the protection scope of the present application.

Claims (10)

1. An Internet Protocol (IP) address data processing method, wherein the IP address data comprises an IP address field and physical address information corresponding to the IP address field, the method is characterized by comprising the following steps:
obtaining at least one second IP address data according to the IP address field of the obtained at least one first IP address data and the physical address information corresponding to the IP address field;
respectively compressing the IP address field of the at least one second IP address data and the physical address information corresponding to the IP address field;
and coding the result obtained by the compression processing into a corresponding coding file according to a preset coding mode so as to inquire the physical address information corresponding to the target IP address according to the coding file.
2. The method according to claim 1, wherein obtaining at least one second IP address data according to the obtained at least one first IP address data comprises:
respectively carrying out data processing on the IP address field of the acquired at least one first IP address data and the physical address information corresponding to the IP address field to obtain at least one second IP address data;
wherein respectively performing data processing on the IP address field of the acquired at least one first IP address data and the physical address information corresponding to the IP address field comprises:
converting the dotted decimal string type IP address in the IP address field of each first IP address data into a decimal numeric type IP address;
simplifying and describing physical address information corresponding to the IP address field of each first IP address data to obtain simplified and described physical address information;
the second IP address data includes a decimal value type IP address and simplified description physical address information.
3. The method of claim 2, wherein the physical address information includes a geographic location and a name of an internet service provider; the simplified description of the physical address information corresponding to the IP address field of each piece of first IP address data includes:
performing at least one of the following on the geographic location corresponding to the IP address field of each first IP address data: filtering out non-key words in the geographic location, representing unknown or missing information in the geographic location with a first predetermined identifier, and mapping the geographic location to the corresponding country/region code according to predetermined country/region codes;
and performing at least one of the following on the name of the Internet service provider corresponding to the IP address field of each first IP address data: filtering out non-key words in the name of the Internet service provider, and representing the Internet service providers other than the first N Internet service providers in an Internet service provider queue with a second predetermined identifier, wherein the Internet service provider queue orders the Internet service providers by number of users in descending order.
4. The method of claim 1, wherein compressing the IP address segment of the at least one second IP address datum comprises:
when at least two continuous IP address segments exist in the plurality of IP address segments and the corresponding physical address information of the at least two continuous IP address segments is the same, merging the at least two continuous IP address segments to obtain a merged IP address segment, wherein the initial IP address of the merged IP address segment is the initial IP address of the first IP address segment of the at least two continuous IP address segments, and the terminal IP address of the merged IP address segment is the terminal IP address of the last IP address segment of the at least two continuous IP address segments;
compressing the physical address information corresponding to the IP address field of the at least one second IP address data, including:
storing the physical address information corresponding to each IP address field into a preset aggregation list, and filtering the repeated physical address information when the repeated physical address information exists in the aggregation list.
5. The method of claim 4, wherein the predetermined encoding scheme includes a header region, a prefix region, an index region, and a content region, and encoding the result of the compression process into a corresponding encoded file according to the predetermined encoding scheme includes:
determining the total number of merged IP address fields, the number of IP address fields corresponding to each IP prefix, and the total number of bytes of the physical address information remaining in the aggregation list after the repeated physical address information is filtered out, wherein the IP prefix is the first decimal number of an IP address of the dotted decimal string type;
writing the total number of merged IP address fields into the header area, writing an index area start position and an index area end position determined according to the number of IP address fields corresponding to each IP prefix into the prefix area as prefix data information, writing a predetermined end IP address, a predetermined physical address information start position, and a predetermined physical address information length of each IP address field corresponding to each IP prefix into the index area as index data information, and writing the physical address information remaining in the aggregation list after the repeated physical address information is filtered out into the content area;
wherein the size of the content area is the total number of bytes of the physical address information remaining in the aggregation list after the repeated physical address information is filtered out.
6. The method according to claim 1, wherein after encoding the result of the compression process into the corresponding encoded file according to a predetermined encoding scheme, further comprising:
when receiving a query request for querying physical address information corresponding to a target IP address, determining a target IP prefix of the target IP address, and converting the target IP address into a decimal numeric type IP address by a predetermined conversion method, wherein the target IP address is an IP address of a dotted decimal character string type, and the target IP prefix is a first decimal number of the target IP address;
inquiring in a prefix area of the coding file according to the target IP prefix to obtain a corresponding target index area initial position and a target index area end position;
and searching physical address information corresponding to the decimal value type IP address in the interval between the starting position of the target index area and the ending position of the target index area.
7. The method of claim 6, wherein the searching for the physical address information corresponding to the decimal value type IP address in the interval between the starting position of the target index area and the ending position of the target index area comprises:
searching index data information of which the current termination IP address is smaller than the decimal numerical value type IP address and the next termination IP address of the current termination IP address is larger than the decimal numerical value type IP address in an interval of the starting position of the target index area and the ending position of the target index area, and obtaining physical address information corresponding to the decimal numerical value type IP address according to the starting position of the physical address information in the index data information and the length of the physical address information.
8. The method of claim 6, wherein the predetermined conversion method comprises a socket.
9. An Internet Protocol (IP) address data processing device, wherein IP address data comprises an IP address field and physical address information corresponding to the IP address field, comprising:
the first processing module is used for obtaining at least one second IP address data according to the IP address field of the obtained at least one first IP address data and the physical address information corresponding to the IP address field;
the second processing module is used for respectively compressing the IP address field of the at least one piece of second IP address data and the physical address information corresponding to the IP address field;
and the third processing module is used for coding the result obtained by the compression processing into a corresponding coding file according to a preset coding mode so as to inquire the physical address information corresponding to the target IP address according to the coding file.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1-8 when executing the program.
CN202010469330.0A 2020-05-28 2020-05-28 Internet Protocol (IP) address data processing method and device and electronic equipment Active CN111629081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010469330.0A CN111629081B (en) 2020-05-28 2020-05-28 Internet Protocol (IP) address data processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111629081A (en) 2020-09-04
CN111629081B (en) 2023-07-28

Family

ID=72260149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010469330.0A Active CN111629081B (en) 2020-05-28 2020-05-28 Internet Protocol (IP) address data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111629081B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102707A (en) * 2014-07-10 2014-10-15 西安交通大学 Geographical attribution information inquiry method oriented to MapReduce frame
CN107682466A (en) * 2017-09-26 2018-02-09 小草数语(北京)科技有限公司 The regional information searching method and its device of IP address
CN109684303A (en) * 2018-12-10 2019-04-26 世纪龙信息网络有限责任公司 Communications codes attribution inquiry method, device, computer equipment and storage medium
JP2019095833A (en) * 2017-11-17 2019-06-20 株式会社ショーケース・ティービー Address management system
CN110769079A (en) * 2019-10-30 2020-02-07 杭州迪普科技股份有限公司 Method and device for retrieving geographic position corresponding to IP

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112398867A (en) * 2020-11-23 2021-02-23 欧冶云商股份有限公司 Black and white list limitation implementation method, platform, computer equipment and storage medium
CN112948376A (en) * 2021-02-02 2021-06-11 厦门服云信息科技有限公司 IP geographical position information query method, terminal equipment and storage medium
CN113779165A (en) * 2021-08-03 2021-12-10 北京邮电大学 Method for judging geographic position ambiguity of IP address and related equipment
CN113779165B (en) * 2021-08-03 2023-07-28 北京邮电大学 IP address geographic position ambiguity judging method and related equipment
CN114172731A (en) * 2021-12-09 2022-03-11 赛尔网络有限公司 Method, device, equipment and medium for quickly verifying and tracing IPv6 address
CN114492312A (en) * 2021-12-22 2022-05-13 深圳市小溪流科技有限公司 Coding and decoding method and system for IP country mapping information
CN114492312B (en) * 2021-12-22 2022-09-20 深圳市小溪流科技有限公司 Coding and decoding method and system for IP country mapping information
CN114726592A (en) * 2022-03-21 2022-07-08 中国电信股份有限公司广州分公司 Method, device and equipment for detecting broadband attribute and storage medium
CN114726592B (en) * 2022-03-21 2024-04-05 中国电信股份有限公司广州分公司 Broadband attribute detection method, device, equipment and storage medium
CN115052010A (en) * 2022-07-19 2022-09-13 北京微芯感知科技有限公司 Method and system for managing electronic certificate based on distributed storage

Also Published As

Publication number Publication date
CN111629081B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
CN111629081B (en) Internet Protocol (IP) address data processing method and device and electronic equipment
US10122788B2 (en) Managed function execution for processing data streams in real time
CN106407201B (en) Data processing method and device and computer readable storage medium
CN104657362A (en) Method and device for storing and querying data
CN108090064A (en) A kind of data query method, apparatus, data storage server and system
US20050027731A1 (en) Compression dictionaries
CN110020086B (en) User portrait query method and device
CN108733317B (en) Data storage method and device
CN111061678B (en) Service data processing method, device, computer equipment and storage medium
CN111274454B (en) Spatio-temporal data processing method and device, electronic equipment and storage medium
CN111680489B (en) Target text matching method and device, storage medium and electronic equipment
CN115023697A (en) Data query method and device and server
CN110334103B (en) Recommendation service updating method, providing device, access device and recommendation system
CN111447292A (en) IPv6 geographical position positioning method, device, equipment and storage medium
CN114398520A (en) Data retrieval method, system, device, electronic equipment and storage medium
CN110266834B (en) Area searching method and device based on internet protocol address
CN103051480B (en) The storage means of a kind of DN and DN storage device
CN104102707A (en) Geographical attribution information inquiry method oriented to MapReduce frame
CN111385379A (en) Internet of things identification method and device for eSIM terminal
CN109684450B (en) Industrial network data distribution service system and method based on semantic identification
CN111814020A (en) Data acquisition method and device
CN111538730B (en) Data statistics method and system based on Hash bucket algorithm
CN112329393A (en) Method, equipment and storage medium for generating short code ID
CN112650777A (en) Data warehouse manufacturing method and device, terminal equipment and computer storage medium
CN112162951A (en) Information retrieval method, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant