WO2024031972A1 - Method, system and apparatus for identifying repeated data, and storage medium and product - Google Patents

Method, system and apparatus for identifying repeated data, and storage medium and product

Info

Publication number
WO2024031972A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
repeated
access
data packet
time
Prior art date
Application number
PCT/CN2023/079344
Other languages
French (fr)
Chinese (zh)
Inventor
罗来胜
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2024031972A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/50 Address allocation
    • H04L61/5007 Internet protocol [IP] addresses
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Definitions

  • This application relates to but is not limited to the field of communications, and in particular, to a method, system, device, storage medium and product for identifying duplicate data.
  • In modern communication networks, data collection equipment is used to collect and analyze packets transmitted between network element devices in the IP communication network to learn the user's activity status and the transmission quality of the network. For example, when network communication is abnormal or of poor quality, application data packets time out and are retransmitted. Data collection devices can analyze network quality by collecting these application retransmission data packets.
  • Embodiments of the present application provide a method, system, device, storage medium and product for identifying duplicate data.
  • In a first aspect, embodiments of the present application provide a method for identifying duplicate data, which includes: obtaining source-destination IP address pair information, where the source-destination IP address pair information includes sending-end IP address information and receiving-end IP address information; obtaining repeated access type information, where the repeated access type information characterizes the IP address access type that causes repeated data packets; obtaining repeated access time information, where the repeated access time information represents the access time of the repeated data packets; and, based on at least one of the source-destination IP address pair information, the repeated access type information, and the repeated access time information, identifying the duplicate data produced when duplicate data packets are generated.
  • Embodiments of the present application provide a system for identifying duplicate data.
  • The system includes: a configuration module configured to configure a comparison table between IP addresses and maximum survival times; a data packet receiving module configured to receive data packets and send them to a data packet analysis module; the data packet analysis module, configured to parse each data packet to obtain IP identification information and to identify repeated data packets according to the IP identification information to obtain repeated access type information, wherein the IP identification information includes the sending end IP address information and the receiving end IP address information; and a statistics module configured to store the sending end IP address information, the receiving end IP address information, the repeated access type information, and the access time information of the repeated data packets.
  • Embodiments of the present application provide a device for identifying duplicate data, including: at least one processor; and at least one memory for storing at least one program; when at least one of the programs is executed by at least one of the processors, the method for identifying duplicate data described in the first aspect is implemented.
  • Embodiments of the present application provide a computer-readable storage medium in which a processor-executable program is stored. When the processor-executable program is executed by a processor, it implements the method for identifying duplicate data described in the first aspect.
  • Embodiments of the present application provide a computer program product, including a computer program or computer instructions. The computer program or computer instructions are stored in a computer-readable storage medium. The processor of a computer device reads the computer program or the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method for identifying duplicate data described in the first aspect.
  • Figure 1 is a network architecture diagram of a method for identifying duplicate data provided by an embodiment of the present application.
  • Figure 2 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application.
  • Figure 3 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application.
  • Figure 4 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application.
  • Figure 5 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application.
  • Figure 6 is a module block diagram of a method for identifying duplicate data provided by an embodiment of the present application.
  • Figure 7 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application.
  • Figure 8 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application.
  • Figure 9 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application.
  • Figure 10 is a working flow chart of the aging timer in the method for identifying duplicate data provided by an embodiment of the present application.
  • Words such as setting, installation, and connection should be understood in a broad sense. Those skilled in the art can reasonably determine the meaning of these words in the embodiments of this application based on the content of the technical solution.
  • Words such as “further”, “exemplarily” or “optionally” are used to indicate examples or illustrations, and should not be interpreted as being more preferable or more advantageous than other embodiments or designs. The use of the words “further,” “exemplarily,” or “optionally” is intended to present relevant concepts.
  • The embodiments of this application can be applied to any communication network based on the IP protocol.
  • Data packets in IP communication systems are divided into two types, namely ordinary data packets and application retransmission data packets. Ordinary data packets are data packets transmitted in the normal state. Application retransmission data packets are data packets that time out and are retransmitted due to abnormal network communication or poor quality among network element devices.
  • Collecting and analyzing the data packets transmitted between network element devices in the IP communication network, in order to identify users' activity status and the network transmission quality, has been widely used around the world.
  • Upgrades or changes to the data aggregation equipment or to the data collection system can lead to abnormal situations such as repeated optical-splitting access on one side of the same network element and repeated collection across multiple network elements.
  • For the data collection equipment, there are then four types of collected data packets: ordinary data packets, application retransmission data packets, single-sided repeated optical-splitting data packets, and double-sided repeated collection data packets. The latter two types are generated by misoperation. If they are not handled, they not only increase the load on the data collection equipment but also interfere with the analysis of the actual network quality.
  • One solution is as follows: after the data collection device discovers data anomalies, it traces them backwards and checks the front-end devices one by one for abnormalities. If an anomaly is found, the access policy is readjusted to ensure the uniqueness of data access. This solution is too passive, inefficient, and not very intelligent, and it has a negative impact on product quality and user experience.
  • Another solution is to perform a duplicate check on the received data packets at the front end of the data collection device and, if the current data packet is a duplicate, to filter it out directly. Although this solution discards duplicates so that they are not sent to the data collection device, the repeated access of data packets still exists, and the user cannot locate the problem device.
  • Embodiments of the present application provide a duplicate data identification method, system, device, storage medium and product.
  • The duplicate data identification method in this application applies to all systems based on IP networks, including the IPv4 and IPv6 versions, and is not limited to any form of network structure.
  • The embodiments of this application identify the source and destination IP address pair information, the repeated access type, and the repeated access time, thereby helping to troubleshoot faults and ensure the accuracy of network quality analysis, while also helping to improve the performance of data collection systems and analysis systems.
  • Embodiments of the present application can be applied to the network architecture shown in Figure 1.
  • The network structure includes network element equipment, a data packet aggregator, a data collection system, a data analysis system and a duplicate data identification system.
  • The network element equipment refers to each communication client and routing device. The data packet aggregator aggregates the data of each network element device for collection by subsequent equipment. The data collection system collects data packets and transmits them to the data analysis system. The data analysis system analyzes and processes the received data packets to evaluate the network transmission quality between network element devices.
  • A duplicate data identification system is added between the data packet aggregator and the data collection system to identify repeatedly accessed data packets, so that fault diagnosis can be made based on the identified duplicate data packets.
  • The duplicate data identification method in the embodiment of the present application is implemented based on the duplicate data identification system.
  • Figure 2 shows a flow chart of a method for identifying duplicate data provided by an embodiment of the present application.
  • The method for identifying duplicate data in an embodiment of the present application includes at least, but is not limited to, the following steps:
  • Step S1100: Obtain the source and destination IP address pair information, which includes the sending end IP address information and the receiving end IP address information;
  • Step S1200: Obtain repeated access type information, which is used to characterize the IP address access type that causes duplicate data packets to be generated;
  • Step S1300: Obtain repeated access time information, which is used to represent the time information of repeated data packet access;
  • In step S1100, step S1200, and step S1300, the source and destination IP address pair information, repeated access type information, and repeated access time information are obtained through the repeated access statistics table.
  • The fields in the repeated access statistics table are configured in advance, and the repeated access statistics table is used to keep statistics on the repeated access information of repeated data packets. When duplicate data packets are accessed, the information in the repeated access statistics table is updated.
  • The steps of obtaining the repeated access statistics table include but are not limited to step S2110, step S2120 and step S2130.
  • Step S2110: Obtain duplicate data packets;
  • A preset data packet information table is obtained. The data packet information table is used to cache the IP layer information of data packets and to determine whether the current data packet is a repeatedly accessed data packet.
  • The currently received data packet is parsed at the IP layer to obtain the IP identification information of the current data packet.
  • The IP identification information is the key of the data packet information table. The data packet information table is queried according to the IP identification information of the current data packet. If the table contains an entry matching the IP identification information of the current data packet, the currently accessed data packet is a duplicate data packet, that is, a repeatedly collected data packet.
  • If there is no matching entry, the currently accessed data packet is the initial data packet. The IP identification information of the initial data packet is stored in the data packet information table, and the corresponding other fields in the data packet information table are updated.
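The lookup in step S2110 can be sketched as follows. This is a minimal illustration only: the in-memory dictionary, the function name `is_duplicate`, and the `first_seen` field are assumptions made for the example and are not taken from the application.

```python
from typing import Dict, Hashable

# Hypothetical in-memory data packet information table.
# Key: the IP identification information of a packet (any hashable tuple).
packet_info_table: Dict[Hashable, dict] = {}


def is_duplicate(ip_ident: Hashable, access_time: float) -> bool:
    """Return True if a packet with the same IP identification information
    has already been cached; otherwise cache it as the initial packet."""
    if ip_ident in packet_info_table:
        return True
    # First sighting: store the key and the other fields of the entry.
    packet_info_table[ip_ident] = {"first_seen": access_time}
    return False


# Example key built from made-up IPv4 values.
key = ("10.0.0.1", "10.0.0.2", 0x1C46, 2, 0, 6, 60)
first = is_duplicate(key, 100.0)   # initial data packet
second = is_duplicate(key, 100.2)  # same key again: duplicate data packet
```

A real deployment would also need to age entries out of the table, which this sketch leaves out.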
  • Step S2120: Obtain the source and destination IP address pair information, repeated access type information and repeated access time information based on the repeated data packet;
  • Obtaining the source and destination IP address pair information based on the repeated data packet includes: parsing the repeated data packet, obtaining the sending end IP address information and the receiving end IP address information of the repeated data packet, and determining the source and destination IP address pair information from the sending end IP address information and the receiving end IP address information;
  • The sending end IP address information and the receiving end IP address information are obtained by parsing the IP layer protocol of the repeated data packet.
  • Obtaining the repeated access type information based on the repeated data packet includes: identifying the repeated access type of the repeated data packet and obtaining the repeated access type information of the repeated data packet;
  • Identifying the repeated access type of the repeated data packet to obtain its repeated access type information may include but is not limited to step S2121, step S2122, step S2123 and step S2124:
  • Step S2121: Parse the received current data packet to obtain the IP identification information used to identify the current data packet;
  • In the IPv4 version, the IP identification information used to identify a data packet includes the sending end IP address information, the receiving end IP address information, the IP identification information (the value of the Identification field), the IP fragmentation mark information, the IP fragmentation offset information, the protocol type, and the IP packet total length information.
  • The sending end IP address information corresponds to the Source Address field of the IP protocol layer.
  • The receiving end IP address information corresponds to the Destination Address field of the IP protocol layer.
  • The IP identification information corresponds to the Identification field of the IP protocol layer.
  • The IP fragmentation mark information corresponds to the Flags field of the IP protocol layer, the IP fragmentation offset information corresponds to the Fragment Offset field, and the protocol type corresponds to the Protocol field.
  • The IP packet total length information corresponds to the Total Length field of the IP protocol layer.
  • The survival time corresponds to the Time To Live field of the IP protocol layer, referred to as TTL.
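As an illustration of how these IPv4 fields might be extracted, the sketch below unpacks the fixed 20-byte IPv4 header with Python's `struct` module. The names `IPv4Ident` and `parse_ipv4` and the sample values are assumptions for the example. The survival time is returned separately, since it is not among the fields listed as IP identification information.

```python
import struct
from ipaddress import IPv4Address
from typing import NamedTuple, Tuple


class IPv4Ident(NamedTuple):
    """Hypothetical container for the IPv4 IP identification information."""
    src: str
    dst: str
    identification: int
    flags: int
    fragment_offset: int
    protocol: int
    total_length: int


def parse_ipv4(header: bytes) -> Tuple[IPv4Ident, int]:
    """Parse the fixed 20-byte IPv4 header; return (identification key, TTL)."""
    (_ver_ihl, _tos, total_length, identification, flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", header[:20])
    ident = IPv4Ident(
        src=str(IPv4Address(src)),
        dst=str(IPv4Address(dst)),
        identification=identification,
        flags=flags_frag >> 13,              # top 3 bits: Flags
        fragment_offset=flags_frag & 0x1FFF,  # low 13 bits: Fragment Offset
        protocol=protocol,
        total_length=total_length,
    )
    return ident, ttl


# Made-up header: DF flag set, TTL 64, protocol 6 (TCP), total length 60.
sample = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 60, 0x1C46, 0x4000, 64, 6, 0,
    IPv4Address("10.0.0.1").packed, IPv4Address("10.0.0.2").packed)
ident, ttl = parse_ipv4(sample)
```

Because `IPv4Ident` is a tuple, it can be used directly as a dictionary key for a packet information table.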
  • In the IPv6 version, the IP identification information includes the sending end IP address information, the receiving end IP address information, flow label information, payload length information, and next header information.
  • The sending end IP address information corresponds to the Source IP Address field of the IP protocol layer.
  • The receiving end IP address information corresponds to the Destination IP Address field of the IP protocol layer.
  • The flow label information corresponds to the Flow Label field of the IP protocol layer.
  • The payload length information corresponds to the Payload Length field of the IP protocol layer, and the next header information corresponds to the Next Header field.
  • The flow label information in the IPv6 version is similar to the IP identification information in the IPv4 version.
  • The payload length information in the IPv6 version is similar to the payload part of the IP packet total length information in the IPv4 version.
  • The Hop Limit field in the IPv6 version is equivalent to the Time To Live field in the IPv4 version. Time To Live and Hop Limit are both the survival time in the embodiment of this application.
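The IPv6 case can be sketched the same way against the fixed 40-byte IPv6 header. The function name `parse_ipv6` and the sample addresses are assumptions for illustration. The Hop Limit is returned separately as the survival time.

```python
import struct
from ipaddress import IPv6Address
from typing import Tuple


def parse_ipv6(header: bytes) -> Tuple[tuple, int]:
    """Parse the fixed 40-byte IPv6 header.

    Returns (identification key, hop limit), where the key is
    (source, destination, flow label, payload length, next header).
    """
    first_word, payload_length, next_header, hop_limit, src, dst = (
        struct.unpack("!IHBB16s16s", header[:40]))
    flow_label = first_word & 0xFFFFF  # low 20 bits of the first 32-bit word
    ident = (str(IPv6Address(src)), str(IPv6Address(dst)),
             flow_label, payload_length, next_header)
    return ident, hop_limit


# Made-up header: version 6, flow label 0xABCDE, payload 20 bytes,
# next header 6 (TCP), hop limit 64.
sample6 = struct.pack(
    "!IHBB16s16s", (6 << 28) | 0xABCDE, 20, 6, 64,
    IPv6Address("2001:db8::1").packed, IPv6Address("2001:db8::2").packed)
ident6, hop_limit = parse_ipv6(sample6)
```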
  • Each message carries a survival time value, that is, the parameter value of the survival time.
  • The maximum survival time is set by the device corresponding to the sending end IP address when it constructs the message.
  • The maximum survival time is represented by N, and the value of N is generally 64.
  • Step S2122: Query and obtain historical access source tag information based on the IP identification information;
  • A query is performed on the preset data packet information table to obtain the historical access source tag information.
  • In the IPv4 version, the fields in the data packet information table include at least the sending end IP address information field, the receiving end IP address information field, the IP identification information field, the IP fragmentation mark information field, the IP fragmentation offset information field, and the IP packet total length information field.
  • In the IPv6 version, the fields in the data packet information table include at least the sending end IP address information field, the receiving end IP address information field, the flow label information field, the payload length information field, and the next header information field.
  • Step S2123: Determine the access source of the duplicate data packet and obtain the access source judgment result;
  • Step S2123 includes: obtaining the maximum survival time and the current survival time of the duplicate data packet; if the current survival time matches the maximum survival time, the access source judgment result is sending end access; if the current survival time does not match the maximum survival time, the access source judgment result is receiving end access.
  • If the current survival time matches the maximum survival time, the duplicate data packet has not passed through a routing device, which indicates that the duplicate data packet was accessed on the sending end IP side. If the current survival time does not match the maximum survival time, the duplicate data packet has passed through a routing device and the value of its survival time has been reduced, which indicates that the duplicate data packet was accessed on the receiving end IP side.
  • Here, a match means that the current survival time and the maximum survival time are equal, and a mismatch means that they are not equal. In the embodiment of the present application, not equal means that the current survival time is less than the maximum survival time.
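The survival time comparison in step S2123 reduces to a single equality test, sketched below. The function name and the string labels `"sender"` and `"receiver"` are assumptions for the example. Rejecting a current value above the maximum is also an assumption, motivated by the statement that a mismatch means the current survival time is smaller.

```python
def judge_access_source(current_ttl: int, max_ttl: int) -> str:
    """Return "sender" if the packet has not crossed a routing device
    (current survival time equals the maximum), otherwise "receiver"."""
    if current_ttl > max_ttl:
        # Assumption: a value above the configured maximum points to a
        # misconfigured comparison table rather than a valid packet.
        raise ValueError("current survival time exceeds configured maximum")
    return "sender" if current_ttl == max_ttl else "receiver"


at_sender = judge_access_source(64, 64)    # not yet routed
at_receiver = judge_access_source(61, 64)  # decremented by three hops
```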
  • Step S2124: Determine the repeated access type information of the repeated data packet based on the access source judgment result and the historical access source tag information.
  • The historical access source tag information includes historical sending end tag information and historical receiving end tag information.
  • The historical sending end tag information and the historical receiving end tag information are stored in the data packet information table.
  • The historical access source tag information is obtained through the following steps: obtain the maximum survival time and the initial data packet, where the initial data packet is the data packet obtained for the first time; parse the initial data packet to obtain the initial survival time; if the initial survival time matches the maximum survival time, set the historical sending end tag information to 1 and the historical receiving end tag information to 0; if the initial survival time does not match the maximum survival time, set the historical sending end tag information to 0 and the historical receiving end tag information to 1.
  • If the initial survival time matches the maximum survival time, the initial data packet was collected from the sending end IP address. If the initial survival time does not match the maximum survival time, the initial data packet was collected from the receiving end IP address.
  • Here, a match means that the initial survival time and the maximum survival time are equal, and a mismatch means that they are not equal. Not equal means that the initial survival time is less than the maximum survival time.
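Setting the historical tags from the initial data packet might look like the sketch below. The dictionary keys `sender_tag` and `receiver_tag` are hypothetical names for the historical sending end and receiving end tag information.

```python
from typing import Dict


def init_history_tags(initial_ttl: int, max_ttl: int) -> Dict[str, int]:
    """Build the historical access source tags for a first-seen packet."""
    if initial_ttl == max_ttl:
        # Collected on the sending end side: no router has touched it yet.
        return {"sender_tag": 1, "receiver_tag": 0}
    # Survival time already reduced: collected on the receiving end side.
    return {"sender_tag": 0, "receiver_tag": 1}


tags_from_sender = init_history_tags(64, 64)
tags_from_receiver = init_history_tags(60, 64)
```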
  • Obtaining the maximum survival time includes: obtaining a preconfigured IP and maximum survival time comparison table; and querying the IP and maximum survival time comparison table according to the sending end IP address information to obtain the maximum survival time.
  • The IP and maximum survival time comparison table includes at least the sending end IP address information field and the maximum survival time field.
  • The sending end IP address information field is the key of this table, so the corresponding maximum survival time can be queried from the table through the sending end IP address information.
  • The mapping relationship between the sending end IP address information and the maximum survival time is obtained from the network element equipment vendor and usually exists in the form of a configuration interface or configuration file.
  • The sending end IP address information of each network element device and its corresponding maximum survival time are stored in the IP and maximum survival time comparison table in advance.
  • The survival time in the IP and maximum survival time comparison table does not change as the data packet is transmitted between network element devices.
  • A query is performed in the IP and maximum survival time comparison table to obtain the corresponding maximum survival time.
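A minimal sketch of the comparison table is a dictionary keyed by sending end IP address. The addresses and the value 128 are made up for the example. Returning `None` for an unconfigured address is an assumption, since the text does not say how a missing entry is handled.

```python
from typing import Dict, Optional

# Hypothetical IP and maximum survival time comparison table, as it
# might be loaded from a configuration file supplied for the devices.
MAX_TTL_TABLE: Dict[str, int] = {
    "10.0.0.1": 64,   # 64 is the value the text calls typical
    "10.0.0.9": 128,  # made-up second device
}


def lookup_max_ttl(sender_ip: str) -> Optional[int]:
    """Return the configured maximum survival time, or None if the
    sending end IP address has no entry (assumed behaviour)."""
    return MAX_TTL_TABLE.get(sender_ip)


known = lookup_max_ttl("10.0.0.1")
unknown = lookup_max_ttl("192.0.2.7")
```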
  • In step S2124, determining the repeated access type information of the repeated data packet according to the access source judgment result and the historical access source tag information includes:
  • When the access source judgment result is sending end access, the historical sending end tag information is judged. If the historical sending end tag information is 1, the repeated access type of the repeated data packet is determined to be the sending end IP repeated access type;
  • When the access source judgment result is receiving end access, the historical receiving end tag information is judged. If the historical receiving end tag information is 1, the repeated access type of the repeated data packet is determined to be the receiving end IP repeated access type;
  • When the access source judgment result is sending end access, the historical sending end tag information is judged. If the historical sending end tag information is 0, the repeated access type of the repeated data packet is determined to be the sender-receiver IP repeated access type;
  • When the access source judgment result is receiving end access, the historical receiving end tag information is judged. If the historical receiving end tag information is 0, the repeated access type of the repeated data packet is determined to be the sender-receiver IP repeated access type.
  • The sending end IP repeated access type is referred to as the SS type, that is, the Source IP repeated access type.
  • The receiving end IP repeated access type is referred to as the DD type, that is, the Destination IP repeated access type.
  • The sender-receiver IP repeated access type is referred to as the SD type, that is, Source IP and Destination IP are both repeated.
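The four branches of step S2124 collapse into a small decision function, sketched below. The labels `"sender"` and `"receiver"` and the tag keys `sender_tag` and `receiver_tag` are hypothetical names chosen for the example.

```python
from typing import Dict


def classify_repeat(access_source: str, tags: Dict[str, int]) -> str:
    """Map the access source judgment result and the historical tags
    to one of the repeated access types SS, DD or SD."""
    if access_source == "sender":
        # An earlier copy also came from the sending end side: SS.
        return "SS" if tags["sender_tag"] == 1 else "SD"
    if access_source == "receiver":
        # An earlier copy also came from the receiving end side: DD.
        return "DD" if tags["receiver_tag"] == 1 else "SD"
    raise ValueError(f"unknown access source: {access_source!r}")


seen_at_sender = {"sender_tag": 1, "receiver_tag": 0}
ss = classify_repeat("sender", seen_at_sender)    # both copies at sender
sd = classify_repeat("receiver", seen_at_sender)  # one copy at each side
```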
  • The method in the embodiment of the present application also includes: updating the historical access source tag information according to the current access source judgment result to obtain the current access source tag information; and using the current access source tag information as the updated historical access source tag information.
  • When the access source judgment result is sending end access, the historical sending end tag information is judged; if it is 0, it is updated and its value is set to 1. When the access source judgment result is receiving end access, the historical receiving end tag information is judged; if it is 0, it is updated and its value is set to 1.
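The tag update can be sketched as a function that returns a new tag record instead of mutating the old one. Returning a copy is a design choice of this example, not something the text prescribes, and the key names are hypothetical.

```python
from typing import Dict


def update_history_tags(access_source: str,
                        tags: Dict[str, int]) -> Dict[str, int]:
    """Return the updated historical access source tags after a packet
    from `access_source` ("sender" or "receiver") has been seen."""
    updated = dict(tags)  # leave the caller's record untouched
    if access_source == "sender" and updated["sender_tag"] == 0:
        updated["sender_tag"] = 1
    elif access_source == "receiver" and updated["receiver_tag"] == 0:
        updated["receiver_tag"] = 1
    return updated


before = {"sender_tag": 1, "receiver_tag": 0}
after = update_history_tags("receiver", before)
```

Once both tags are 1, every later duplicate is classified as SS or DD, depending on where it was collected.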
  • The access time of repeated data packets is recorded to obtain the repeated access time information.
  • Recording the access time of repeated data packets and obtaining the repeated access time information may include but is not limited to step S2125, step S2126, and step S2127:
  • Step S2125: Obtain the repeated access start time, which is the time when repeated access occurs for the first time;
  • Step S2126: Obtain the latest time of repeated access, which is the most recent time when repeated access occurs;
  • The time information in the repeated access statistics table is updated according to the access time of the current data packet. The update covers either both the repeated access start time field and the repeated access latest time field, or only the repeated access latest time field.
  • If repeated access occurs for the first time, the repeated access start time and the repeated access latest time are both set to the access time of the current repeated data packet.
  • Otherwise, the repeated access start time remains unchanged, and the repeated access latest time is set to the access time of the current repeated data packet.
  • The latest time of repeated access is the time at which a repeated data packet was most recently accessed.
  • Step S2127: Use the difference between the repeated access latest time and the repeated access start time as the repeated access time information.
  • From the repeated access time information, it can be known in which time period the repeated data packets were generated.
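Steps S2125 to S2127 can be sketched as a small record with two timestamps. The class name `RepeatTimes` and the use of plain floating-point seconds are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepeatTimes:
    """Hypothetical time fields of one repeated access statistics row."""
    start: Optional[float] = None   # repeated access start time
    latest: Optional[float] = None  # repeated access latest time

    def record(self, access_time: float) -> None:
        """Update the fields for one repeated data packet."""
        if self.start is None:
            # First repeated access: both fields take the same value.
            self.start = access_time
        self.latest = access_time

    def duration(self) -> float:
        """Difference between latest time and start time (0.0 if empty)."""
        if self.start is None or self.latest is None:
            return 0.0
        return self.latest - self.start


times = RepeatTimes()
times.record(100.0)
times.record(130.5)
span = times.duration()
```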
  • Step S2130: Update the corresponding information in the repeated access statistics table according to the source and destination IP address pair information, repeated access type information, and repeated access time information to obtain an updated repeated access statistics table.
  • The repeated access time information can be understood as the access time of repeated data packets.
  • Step S1400: Identify the duplicate data generated in duplicate data packets based on at least one of the source and destination IP address pair information, repeated access type information, and repeated access time information.
  • The user can learn, for example, the source and destination IP address pair information, the number of accesses of each repeated access type, and the repeated access time information, then use this information to locate the device corresponding to the duplicate data packets and to monitor the subsequent fault elimination.
  • The fault elimination scenarios are described in detail below.
  • The method for identifying duplicate data in the embodiment of the present application also includes but is not limited to the following steps:
  • Step S1500: Construct a counter corresponding to each repeated access type based on the repeated access type information;
  • The SS type counter is constructed for the SS repeated access type, the DD type counter is constructed for the DD repeated access type, and the SD type counter is constructed for the SD repeated access type.
  • Step S1600: Display the source and destination IP address pair information, the numerical information corresponding to the counters, and the repeated access time information.
  • the repeated access statistics table is updated according to the source and destination IP address pair information and the repeated access type of the current data packet.
  • the update includes querying and obtaining the counter corresponding to the repeated access type based on the current source and destination IP address pair information, and adding 1 to the value of the counter obtained by querying. If the repeated access type is SS type, update the value of SS type counter and add 1 to the value of SS type counter; if the repeated access type is DD type, update the value of DD type counter and add 1 to the value of DD type counter; If the repeated access type is SD type, update the value of the SD type counter and add 1 to the value of the SD type counter.
  • the source and destination IP address pair information, numerical information corresponding to the counter, and repeated access time are displayed.
  • The display allows users to see directly the source and destination IP address pairs of repeated data packets, the number of SS-type accesses, the number of DD-type accesses, the number of SD-type accesses, the latest time of repeated access, and the start time of repeated access, so that users can accurately identify the network element equipment corresponding to the repeated data packets and quickly locate repeated access faults.
  • fault diagnosis is described.
  • fault elimination is reflected in three scenarios: not eliminated, partially eliminated, and completely eliminated.
  • Fault elimination is further distinguished by scope: the fault is rectified for some source and destination IP address pairs, or the fault is rectified for all source and destination IP address pairs.
  • A time threshold needs to be set. Elimination of the fault for some source and destination IP address pairs refers to the following: if the SS type counter and the DD type counter of a certain source and destination IP address pair are both increasing, and after the time threshold the SS type counter no longer increments while the DD type counter is still increasing, then for that source and destination IP address pair the SS type repeated access fault has been eliminated but the DD type repeated access fault has not.
  • Elimination of the fault for all source and destination IP address pairs means that after the time threshold the SS type counter, the DD type counter, and the SD type counter no longer increase.
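The threshold rule above can be sketched in code. This is a minimal illustration, not the patent's implementation: the names `CounterSnapshot` and `elimination_status` are hypothetical, and the sketch assumes the counters are sampled once at the start and once at the end of the time threshold window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    """Values of the three repeated access counters at one point in time."""
    ss: int
    dd: int
    sd: int


def elimination_status(before: CounterSnapshot, after: CounterSnapshot) -> str:
    """Classify fault elimination for one source/destination IP address pair.

    `before` is sampled at the start of the time threshold window and
    `after` at its end (an assumption of this sketch).
    """
    types = ("ss", "dd", "sd")
    # A type is considered faulty if it has recorded at least one duplicate.
    faulty = [t for t in types if getattr(after, t) > 0]
    # A type is still failing if its counter grew during the window.
    growing = [t for t in faulty if getattr(after, t) > getattr(before, t)]
    if not faulty:
        return "no repeated access recorded"
    if not growing:
        return "completely eliminated"
    if len(growing) < len(faulty):
        return "partially eliminated"
    return "not eliminated"


# SS stopped growing while DD kept growing: only the SS fault is gone.
status = elimination_status(CounterSnapshot(5, 3, 0), CounterSnapshot(5, 7, 0))
```

Comparing two snapshots keeps the check independent of how often the counters are polled; only the length of the window has to match the configured time threshold.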
  • step S1600 the data in the updated repeated access statistics table is displayed on the terminal.
  • An embodiment of the present application also provides a duplicate data identification system, including a configuration module, a data packet receiving module and a data packet analysis module.
  • the configuration module is set to configure a comparison table between IP and maximum survival time.
  • The data packet receiving module is set to receive data packets and send the data packets to the data packet analysis module.
  • The data packet analysis module is set to parse the data packets to obtain IP identification information, and to identify repeated data packets based on the IP identification information to obtain repeated access type information, where the IP identification information includes the sender IP address information and the receiver IP address information.
  • The statistics module is configured to store the sender IP address information, receiver IP address information, and repeated access type information of the duplicate data packets, together with the access time information of the duplicate data packets.
  • the configuration module is further configured to configure the turning on or off of the duplicate data identification function.
  • the duplicate data identification system in the embodiment of the present application is applicable to any communication network based on the IP protocol, including IPV4 and IPV6 versions, but is not limited to any form of network structure.
  • the configuration module can be a configurator
  • the data packet receiving module can be a data packet receiver
  • the data packet analysis module can be a data packet analyzer
  • the statistics module can be a repeated access statistician.
  • the duplicate data identification system includes a configurator, a data packet receiver, a data packet analyzer, a data packet sender, a duplicate access statistician and a duplicate access presenter.
  • the configurator is responsible for maintaining configuration information.
  • the current configuration information includes, but is not limited to, turning on and off the duplicate data identification function, configuring the maximum cache duration of data packets, maintaining the comparison table between IP and maximum survival time, and the maximum duration for fault resolution.
  • the default state of the duplicate data identification function is on, and the maximum cache duration of data packets is configured as 10 seconds.
  • the identified duplicate data packets are directly discarded to avoid sending duplicate data packets to the data collection system, thus increasing the workload of the data collection system.
  • If the switch of the duplicate data identification function is turned off, received data packets are not identified and are forwarded directly to the data collection system, which avoids adding data transmission delay when there are no duplicate data packets.
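The configuration items and the effect of the switch can be sketched as follows. This is an illustrative sketch under stated assumptions: `IdentificationConfig`, its field names, and `route_packet` are hypothetical names; only the defaults (function on, 10 second cache duration) come from the text, and the fault resolution duration is left unset because the text gives no default for it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IdentificationConfig:
    """Configuration maintained by the configurator (field names are hypothetical)."""
    enabled: bool = True                     # identification function defaults to on
    max_cache_seconds: float = 10.0          # maximum cache duration of data packets
    max_ttl_by_sender: Dict[str, int] = field(default_factory=dict)  # IP -> maximum survival time
    fault_resolution_seconds: Optional[float] = None  # no default stated in the text


def route_packet(config: IdentificationConfig, is_duplicate: bool) -> str:
    """Decide what happens to a packet: 'forward' or 'discard'."""
    if not config.enabled:
        # Function switched off: nothing is identified, everything is forwarded.
        return "forward"
    return "discard" if is_duplicate else "forward"


default_config = IdentificationConfig()
```

With the switch off, `route_packet` never inspects `is_duplicate`, which mirrors the stated goal of not adding delay when identification is unnecessary.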
  • the comparison relationship between IP and maximum survival time can be obtained through parameter information provided by the collected device.
  • The comparison table between IP and maximum survival time includes the sending end IP address information field and the maximum survival time field.
  • The sending end IP address information is the key that uniquely determines a record in the comparison table between IP and maximum survival time. The maximum survival time value of the corresponding device can be queried by the sending end IP address information.
  • To prevent messages from looping between routing devices and never reaching the destination, each message carries a survival time value.
  • The maximum survival time value is set by the device at the sending IP address when it constructs the message. For example, let N represent the initial value of the survival time (N is generally 64). Each time the message passes through a routing device, the survival time value is reduced by 1, and the message is discarded when the value is reduced to 0, so the survival time value of a data packet satisfies: survival time is less than or equal to N.
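The survival time arithmetic described above is simple enough to sketch. The function names, the table name `MAX_TTL_BY_SENDER`, and the example address are assumptions made for illustration only.

```python
from typing import Dict, Optional

# Hypothetical comparison table: sending end IP address -> maximum survival time N.
MAX_TTL_BY_SENDER: Dict[str, int] = {"192.0.2.10": 64}


def remaining_ttl(initial_ttl: int, hops: int) -> Optional[int]:
    """Survival time after `hops` routing devices, or None if the packet was discarded."""
    remaining = initial_ttl - hops
    # The packet is discarded once the value is reduced to 0.
    return remaining if remaining > 0 else None


def hops_travelled(sender_ip: str, observed_ttl: int,
                   table: Dict[str, int] = MAX_TTL_BY_SENDER) -> int:
    """Number of routing devices passed, inferred from the configured maximum."""
    return table[sender_ip] - observed_ttl


ttl_seen = remaining_ttl(64, 3)
```

A packet observed with its survival time still equal to N has passed no routing device, which is why the comparison against the configured maximum can indicate where the packet was collected.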
  • The data packet receiver is mainly responsible for establishing and maintaining the communication link between the duplicate data identification system and the data packet aggregator. It is also responsible for receiving all data packets from the data packet aggregator and transferring the received data packets to the data packet analyzer for analysis and processing.
  • the data packet sender is mainly responsible for establishing and maintaining the communication link between the duplicate data identification system and the data collection system. It is also responsible for sending the data packets that need to be forwarded after processing by the data packet analyzer to the data collection system.
  • the data packet analyzer is mainly responsible for parsing the information of each protocol layer of the data packet, and identifying the repeated access type of the data packet based on the IP information of the protocol layer and the field information in the data packet information table.
  • IP layer protocol information that data packets need to parse is as shown in Table 2, including sender IP address information, receiver IP address information, IP identification information, IP fragmentation mark information, IP fragmentation offset information, protocol Type, IP packet total length information and TTL.
  • IP protocol layer fields corresponding to the information in Table 2 have been listed above.
  • The IP layer protocol information that data packets need to parse is as shown in Table 3, including the sending IP address information, the receiving IP address information, flow label information, payload length information, next header information, and hop limit information (Hop Limit).
  • the IP protocol layer fields corresponding to the information in Table 3 have been listed above.
  • the packet analyzer is also responsible for maintaining the packet information table in order to cache the IP layer protocol information of the packets received within the maximum cache time.
  • The packet analyzer also regularly checks the packet information table. If the packets are not received for a specified period of time, the packet analyzer clears the records of duplicate data packets in a timely manner; the specified time can be configured by the user.
  • An aging time is set, and an aging timer periodically checks the records in the data packet information table: for each record in turn, it checks whether the record's existence time is equal to or greater than the aging time and, if so, deletes the record.
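One pass of the aging timer can be sketched as a single function. This is a minimal sketch, assuming each record stores a creation timestamp under the hypothetical key `created_at`; the function name and record layout are not from the source.

```python
from typing import Any, Dict, Hashable


def purge_expired(table: Dict[Hashable, Dict[str, Any]],
                  now: float, aging_seconds: float) -> int:
    """Delete every record whose existence time has reached the aging time.

    Returns the number of deleted records.
    """
    # Collect keys first so the dict is not modified while iterating over it.
    expired = [key for key, record in table.items()
               if now - record["created_at"] >= aging_seconds]
    for key in expired:
        del table[key]
    return len(expired)


packet_table = {
    "old-packet": {"created_at": 100.0},
    "new-packet": {"created_at": 108.0},
}
removed = purge_expired(packet_table, now=110.0, aging_seconds=10.0)
```

In a running system this function would be called from a periodic timer; the comparison uses "equal to or greater than", matching the rule in the text.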
  • In the packet information table of the IPV4 version, the sending end IP address information, the receiving end IP address information, IP fragmentation mark information, IP fragmentation offset information, protocol type, and IP packet total length information are used as the keywords of each record. A unique record in the data packet information table can be determined through these keywords.
  • the fields in the packet information table of the IPV6 version are shown.
  • the sender IP address information, receiver IP address information, flow label information, payload length information, and next header information are used as keywords for each record. This keyword can determine the only record in the packet information table.
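The two keyword sets above map naturally onto hashable tuples used as dictionary keys. The sketch below is illustrative only: the class and field names are hypothetical, the field lists follow the keyword lists exactly as given in the text, and the example addresses are documentation addresses.

```python
from typing import NamedTuple


class IPv4Key(NamedTuple):
    """Keywords of an IPV4 record, as listed in the text."""
    src_ip: str
    dst_ip: str
    frag_flags: int
    frag_offset: int
    protocol: int
    total_length: int


class IPv6Key(NamedTuple):
    """Keywords of an IPV6 record, as listed in the text."""
    src_ip: str
    dst_ip: str
    flow_label: int
    payload_length: int
    next_header: int


# The packet information table maps a key to the cached record for that packet.
packet_table = {}
first = IPv4Key("192.0.2.10", "198.51.100.7", 0, 0, 6, 1500)
packet_table[first] = {"seen_at_sender": 1, "seen_at_receiver": 0}

# A later packet with identical key fields finds the cached record.
again = IPv4Key("192.0.2.10", "198.51.100.7", 0, 0, 6, 1500)
is_duplicate = again in packet_table
```

Because named tuples compare and hash by value, two packets with the same keyword fields resolve to the same record without any custom lookup code.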
  • the algorithm for accessing source tag information is: if the survival time carried by the current data packet is equal to the maximum survival time, then the value of the historical sender tag information is set to 1; if the survival time carried by the current data packet is not equal to the maximum survival time, then Set the value of the historical receiver tag information to 1.
  • For the initial data packet, the access source tag information is determined by comparing the initial survival time of the initial data packet with the maximum survival time. If the initial survival time is equal to the maximum survival time, the initial data packet was collected from the sending end IP address, so the historical sending end mark information is set to 1 and the historical receiving end mark information is set to 0. If the initial survival time is not equal to the maximum survival time, the initial data packet was collected from the receiving end IP address, so the historical sending end mark information is set to 0 and the historical receiving end mark information is set to 1. After completing the above operations, the initial data packet is sent to the data collection system for network quality analysis.
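The access source tagging rule can be sketched as follows. This is a minimal sketch of the rule as stated; `SourceTags` and `mark_access_source` are hypothetical names, and how the tags are later combined into a repeated access type is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class SourceTags:
    """Historical mark information cached per packet record (names are hypothetical)."""
    seen_at_sender: int = 0
    seen_at_receiver: int = 0


def mark_access_source(tags: SourceTags, packet_ttl: int, max_ttl: int) -> SourceTags:
    """Set the historical mark that matches where the packet was collected."""
    if packet_ttl == max_ttl:
        # Survival time untouched: collected at the sending end IP address.
        tags.seen_at_sender = 1
    else:
        # Survival time already decremented: collected at the receiving end IP address.
        tags.seen_at_receiver = 1
    return tags


# Initial packet whose survival time still equals the configured maximum.
initial = mark_access_source(SourceTags(), packet_ttl=64, max_ttl=64)
```

Starting from a fresh `SourceTags()` reproduces the initial packet case, where exactly one mark is 1 and the other is 0.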
  • The repeated access statistician is mainly responsible for maintaining the repeated access statistics table, which stores the sending IP address and receiving IP address of repeated data packets and counts the number of repeated accesses and other information, so that the information in the repeated access statistics table can be sent to the repeated access presenter for display.
  • The repeated access statistics table includes the sending end IP address information field, the receiving end IP address information field, the SS type counter field, the DD type counter field, the SD type counter field, the repeated access start time field, and the latest repeated access time field.
  • the three counter fields are used to record the number of accesses of the three repeated access types.
  • the sending end IP address information and the receiving end IP address information are used as keywords for each record. This keyword can uniquely determine the only record in the repeated access statistics table.
  • From the repeated access statistics table, the number of SS-type repeated accesses, the number of DD-type repeated accesses, and the number of SD-type repeated accesses corresponding to a certain source-destination IP address pair can be known.
  • From the repeated access statistics table, it can also be known whether a certain repeated access type fault in a source-destination IP address pair has been resolved. For example, set a time threshold: if the value of the SS type counter is greater than 0 and no longer increases within the time threshold, the SS type repeated access fault has been resolved.
  • The repeated access time information of a certain source and destination IP address pair, that is, the time period during which repeated access exists, can also be known. The repeated access time period is calculated as the difference between the latest repeated access time and the repeated access start time.
  • the SS type counter, the DD type counter and the SD type counter can each be set to a repeated access latest time to respectively monitor the elimination time of the three types of repeated access.
  • The repeated access presenter is mainly responsible for displaying the information in the repeated access statistics table in a visual interface, allowing users to see directly the source and destination IP address pairs of repeated data packets, the number of SS type repeated accesses, the number of DD type repeated accesses, the number of SD type repeated accesses, the repeated access start time, the latest repeated access time, and other information, so that users can accurately identify the network element equipment corresponding to the repeated data packets and quickly locate the fault.
  • The data packet analyzer passes the sending end IP address information, the receiving end IP address information, and the repeated access type information of the repeated data packets to the repeated access statistician, which queries the repeated access statistics table by the sending end IP address information and the receiving end IP address information.
  • If no matching record exists in the repeated access statistics table, a new record is added, the sender IP address information and the receiver IP address information are written to the record, the value of the counter corresponding to the repeated access type information is set to 1, the values of the other counters are set to 0, and the repeated access start time and the latest repeated access time are set to the current time.
  • If a matching record exists, the value of the counter corresponding to the repeated access type information is increased by 1 to count the accesses of that repeated access type, and the latest repeated access time is updated to the current time, indicating the access time of the latest repeated data packet.
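The query-then-update behaviour of the repeated access statistician can be sketched as an upsert on a dictionary keyed by the IP address pair. This is an illustrative sketch: `PairStats`, `record_duplicate`, and the lowercase type labels `"ss"`, `"dd"`, `"sd"` are assumptions, and timestamps are plain floats for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class PairStats:
    """One record of the repeated access statistics table (field names are hypothetical)."""
    ss: int = 0
    dd: int = 0
    sd: int = 0
    start_time: float = 0.0
    latest_time: float = 0.0


def record_duplicate(table: Dict[Tuple[str, str], PairStats],
                     src_ip: str, dst_ip: str,
                     access_type: str, now: float) -> PairStats:
    """Insert or update the record for (src_ip, dst_ip) for one duplicate packet."""
    if access_type not in ("ss", "dd", "sd"):
        raise ValueError(f"unknown repeated access type: {access_type}")
    key = (src_ip, dst_ip)
    stats = table.get(key)
    if stats is None:
        # No matching record: create one with all counters at 0 and both times set to now.
        stats = PairStats(start_time=now, latest_time=now)
        table[key] = stats
    # Count this access and remember when the latest duplicate arrived.
    setattr(stats, access_type, getattr(stats, access_type) + 1)
    stats.latest_time = now
    return stats


stats_table: Dict[Tuple[str, str], PairStats] = {}
record_duplicate(stats_table, "192.0.2.10", "198.51.100.7", "ss", now=100.0)
latest = record_duplicate(stats_table, "192.0.2.10", "198.51.100.7", "dd", now=130.0)
duration = latest.latest_time - latest.start_time
```

Creating the record with zeroed counters and then incrementing yields the same result as setting the matching counter to 1 directly, and `duration` corresponds to the repeated access time period.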
  • the data analyzer queries and updates the data packet information table.
  • the query conditions are such as the keywords in Table 4 and Table 5.
  • The query fields are the historical sender mark information and the historical receiver mark information; the update fields in the data packet information table are likewise the historical sender mark information and the historical receiver mark information.
  • the data packet analyzer sends a statistical message to the repeated access statistician.
  • the content of the message is the sending end IP address information, the receiving end IP address information and the repeated access type information.
  • The repeated access statistician queries and updates the repeated access statistics table.
  • The query conditions are the sending IP address information and the receiving IP address information.
  • The query fields are the SS type counter, the DD type counter, and the SD type counter.
  • the field that needs to be updated is SS.
  • the packet receiver receives all packets from the packet aggregator and forwards the received packets to the packet analyzer for processing.
  • The packet analyzer performs protocol analysis on the received data packets to obtain their relevant information: in the IPV4 version, the sending IP address information, receiving IP address information, IP fragmentation mark information, IP fragmentation offset information, IP packet total length information, and TTL; in the IPV6 version, the sender IP address information, receiver IP address information, flow label information, payload length information, next header information, and hop limit information.
  • After protocol analysis of the data packet, the packet analyzer identifies whether the current data packet is a duplicate data packet based on the information parsed from the current data packet and the information stored in the data packet information table. If it is a duplicate data packet, its repeated access type is further identified.
  • The sender IP address information, receiving end IP address information, and repeated access type information of the repeated data packet are sent to the repeated access statistician to count the number of repeated accesses.
  • If the current data packet is not a duplicate data packet, it is transferred to the data packet sender; if it is a duplicate data packet, it is discarded.
  • the data packet sender When the data packet sender receives the data packet, it forwards the data packet to the data acquisition system.
  • When the repeated access statistician receives the statistics request from the packet analyzer, it updates the information in the repeated access statistics table and sends the information in the repeated access statistics table to the repeated access presenter.
  • The embodiment of the present application can accurately determine which IP addresses have repeated access of data packets and which repeated access type the repeated data packets belong to, and can further determine at which IP ends the duplicate packets exist.
  • the embodiment of this application uses source and destination IP address pairs as units to count duplicate data packets, and can accurately know the number of duplicate data packet accesses for source and destination IP address pairs, and the three types of repeated accesses will be counted separately. And it can know the start time and the latest access time of duplicate data packets in the source and destination IP addresses.
  • The embodiment of the present application displays the information of repeated data packets to the user through the repeated access presenter, so that the user can quickly locate the faulty device and troubleshoot the fault accurately. Through the changes in the three types of counters in the repeated access presenter, it can be known whether the repeated access fault is partially or completely eliminated; when the values of the three counters are no longer incrementing, the access time of the latest repeated data packet indicates the point in time at which the repeated access fault was completely eliminated.
  • the embodiments of the present application avoid sending these abnormal data packets to the data collection system and data analysis system, thereby improving system performance and the accuracy of data analysis.
  • The embodiment of the present application provides a data source identification system that, based on the identified source and destination IP address pair information, repeated access type information, and repeated access time information, helps to troubleshoot faults and ensure the accuracy of network quality analysis, while also helping to improve the performance of data acquisition systems and analysis systems.
  • One embodiment of the present application also provides a device for identifying duplicate data.
  • the device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • The processor can execute the computer program to implement, without limitation, the above-mentioned method for identifying duplicate data.
  • memory can be used to store non-transitory software programs and non-transitory computer executable programs.
  • the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
  • the memory may include memory located remotely from the processor, and the remote memory may be connected to the processor through a network. Examples of the above-mentioned networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
  • The non-transitory software programs and instructions required to implement the duplicate data identification method of the above embodiment are stored in the memory.
  • When they are executed by the processor, the duplicate data identification method in the above embodiment is executed.
  • the network element embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • One embodiment of the present application also provides a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are used to execute the duplicate data identification method of the above embodiment.
  • embodiments of the present application also provide a computer program product, which includes a computer program or computer instructions.
  • the computer program or computer instructions are stored in a computer-readable storage medium.
  • The processor of the computer device reads the computer program or computer instructions from the computer-readable storage medium, and the processor executes the computer program or computer instructions, so that the computer device performs the above duplicate data identification method.
  • the mobile communication device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separate, that is, they may be located in one place, or may be distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, tapes, disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Provided in the embodiments of the present application are a method for identifying repeated data, a system for identifying repeated data, an apparatus for identifying repeated data, and a computer-readable storage medium and a computer program product. The method for identifying repeated data comprises: acquiring source-destination IP address pair information, wherein the source-destination IP address pair information comprises sending-end IP address information and receiving-end IP address information (S1100); acquiring repeated-access type information, wherein the repeated-access type information is used for representing an IP address access type, which causes the generation of a repeated data packet (S1200); acquiring repeated-access time information, wherein the repeated-access time information is used for representing time information of repeated data packet access (S1300); and according to at least one of the source-destination IP address pair information, the repeated-access type information and the repeated-access time information, performing repeated-data identification on the generated repeated data packet (S1400).

Description

Duplicate data identification method, system, device, storage medium and product
Cross-references to related applications
This application is filed based on a Chinese patent application with application number 202210968336.1 and a filing date of August 12, 2022, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is hereby incorporated into this application by reference.
Technical field
This application relates to but is not limited to the field of communications, and in particular, to a method, system, device, storage medium and product for identifying duplicate data.
Background
In modern communication networks, data collection equipment is used to collect and analyze the packets transmitted between network element devices in the IP communication network in order to learn the user's activity status and the transmission quality of the network. For example, when network communication is abnormal or of poor quality, application data packets time out and are retransmitted, and data collection devices can analyze network quality by collecting the retransmitted application data packets.
In actual application, due to reasons such as incorrect network layout or configuration of the data collection equipment, the data collection equipment will repeatedly collect data packets, and the repeatedly collected data packets interfere with the judgment of the real network quality. How to identify and diagnose the sources of the above two kinds of data packets is an urgent problem that needs to be solved.
Summary
The following is an overview of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
Embodiments of the present application provide a method, system, device, storage medium and product for identifying duplicate data.
In a first aspect, embodiments of the present application provide a method for identifying duplicate data, which includes: obtaining source-destination IP address pair information, where the source-destination IP address pair information includes sending-end IP address information and receiving-end IP address information; obtaining repeated access type information, where the repeated access type information is used to characterize the IP address access type that causes repeated data packets to be generated; obtaining repeated access time information, where the repeated access time information is used to characterize the time information of repeated data packet access; and, based on at least one of the source-destination IP address pair information, the repeated access type information, and the repeated access time information, identifying duplicate data in the generated duplicate data packets.
In a second aspect, embodiments of the present application provide a system for identifying duplicate data. The system includes: a configuration module configured to configure a comparison table between IP and maximum survival time; a data packet receiving module configured to receive data packets and send the data packets to a data packet analysis module; the data packet analysis module, configured to parse the data packets to obtain IP identification information and to identify repeated data packets according to the IP identification information to obtain repeated access type information, wherein the IP identification information includes the sending end IP address information and the receiving end IP address information; and a statistics module configured to store the sending end IP address information, the receiving end IP address information, and the repeated access type information of the repeated data packets, and the access time information of the repeated data packets.
In a third aspect, embodiments of the present application provide a device for identifying duplicate data, including: at least one processor; and at least one memory for storing at least one program; when at least one of the programs is executed by at least one of the processors, the method for identifying duplicate data as described in the first aspect is implemented.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium in which a processor-executable program is stored. When executed by a processor, the processor-executable program is used to implement the method for identifying duplicate data as described in the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, including a computer program or computer instructions. The computer program or the computer instructions are stored in a computer-readable storage medium. The processor of a computer device reads the computer program or the computer instructions from the computer-readable storage medium and executes the computer program or the computer instructions, so that the computer device performs the method for identifying duplicate data as described in the first aspect.
Description of drawings
Figure 1 is a network architecture diagram of a method for identifying duplicate data provided by an embodiment of the present application;
Figure 2 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application;
Figure 3 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application;
Figure 4 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application;
Figure 5 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application;
Figure 6 is a module block diagram of a method for identifying duplicate data provided by an embodiment of the present application;
Figure 7 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application;
Figure 8 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application;
Figure 9 is a flow chart of a method for identifying duplicate data provided by an embodiment of the present application;
Figure 10 is a working flow chart of the aging timer in the method for identifying duplicate data provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described here are intended only to explain the present application, not to limit it.
It should be noted that although functional modules are divided in the apparatus schematic diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that of the apparatus, or in an order different from that of the flowcharts. The terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
In the description of the embodiments of the present application, unless otherwise expressly limited, terms such as "set", "install", and "connect" should be understood broadly, and a person skilled in the art can reasonably determine their meaning in the embodiments in light of the technical solution. In the embodiments of the present application, words such as "further", "exemplarily", or "optionally" indicate an example, illustration, or explanation, and should not be construed as more preferred or more advantageous than other embodiments or designs. These words are intended to present the relevant concepts.
The embodiments of the present application can be applied to any communication network based on the IP protocol.
In general, data packets in an IP communication system fall into two types: ordinary data packets and application retransmission data packets. An ordinary data packet is one transmitted under normal conditions. An application retransmission data packet is one retransmitted after a timeout caused by, for example, abnormal or poor-quality network communication between network element devices.
At present, collecting and analyzing the packets transmitted between network element devices in an IP communication network, in order to identify user activity status and network transmission quality, is widely applied worldwide. In practice, an incorrect network layout or configuration of the data collection equipment (for example, after a change to the networking structure of the IP communication system, an upgrade or change of the data aggregation device, or an upgrade or change of the data collection system) can lead to abnormal situations such as repeated optical-splitting access on one side of the same network element or repeated collection across multiple network elements. The data collection equipment then collects four kinds of data packets: ordinary data packets, application retransmission data packets, single-side repeated optical-splitting data packets, and double-side repeated collection data packets. The latter two result from misoperation. If they are not handled, they not only increase the load on the data collection equipment but also interfere with the analysis of actual network quality.
Among existing solutions, one is as follows: after the data collection equipment detects a data anomaly, the anomaly is traced back and verified, and the front-end devices are checked one by one; if an abnormality is found, the access policy is readjusted to ensure that data access is unique. This solution is too passive, inefficient, and not intelligent, and it adversely affects product quality and user experience.
Another solution is to check received data packets for duplication at the front end of the data collection equipment and to filter out the current data packet directly if it is a duplicate. Although this discards duplicate data packets and avoids sending them to the data collection equipment, the repeated access of data packets persists, and the user cannot locate the faulty device.
Based on this, the embodiments of the present application provide a method, system, and apparatus for identifying duplicate data, as well as a storage medium and a product. The method applies to all systems based on IP networks, including both IPv4 and IPv6, and is not limited to any particular network structure. The embodiments identify duplicate data based on the identified source-destination IP address pair information, repeated access type, and repeated access time information. This helps with troubleshooting, ensures the accuracy of network quality analysis, and also helps improve the performance of the data collection system and the analysis system.
The embodiments of the present application can be applied to the network architecture shown in Figure 1, which includes network element devices, a data packet aggregator, a data collection system, a data analysis system, and a duplicate data identification system. The network element devices are the individual communication clients and routing devices. The data packet aggregator aggregates the data of the network element devices for collection by downstream equipment. The data collection system collects data packets and transmits them to the data analysis system. The data analysis system analyzes and processes the received data packets to evaluate the network transmission quality between network element devices. The embodiments of the present application add a duplicate data identification system between the data packet aggregator and the data collection system to identify repeatedly accessed data packets and to diagnose faults based on the identified duplicate data packets. The method for identifying duplicate data in the embodiments of the present application is implemented on this duplicate data identification system.
The related technology is further described below.
The method for identifying duplicate data is described below with reference to Figure 2.
Figure 2 is a flowchart of the method for identifying duplicate data provided by an embodiment of the present application. The method includes, but is not limited to, the following steps:
Step S1100: obtain source-destination IP address pair information, which includes sender IP address information and receiver IP address information;
Step S1200: obtain repeated access type information, which characterizes the IP address access type that causes duplicate data packets to be generated;
Step S1300: obtain repeated access time information, which characterizes the time at which duplicate data packets are accessed;
In one embodiment, in steps S1100, S1200, and S1300, the source-destination IP address pair information, the repeated access type information, and the repeated access time information are obtained from a repeated access statistics table.
In one embodiment, the fields of the repeated access statistics table are configured in advance, and the table is used to collect statistics on the repeated access information of duplicate data packets. Whenever a duplicate data packet is accessed, the information in the repeated access statistics table is updated.
In one embodiment, as shown in Figure 3, obtaining the repeated access statistics table includes, but is not limited to, steps S2110, S2120, and S2130.
Step S2110: obtain a duplicate data packet;
In one embodiment, a preset data packet information table is obtained. The data packet information table caches the IP layer information of data packets and is used to determine whether the current data packet is a repeatedly accessed data packet.
In one embodiment, when a new data packet is obtained from a communication client or a routing device, the IP layer information of the currently received data packet is parsed to obtain the IP identification information of the current data packet. The IP identification information is the key of the data packet information table, and the table is queried with the IP identification information of the current data packet. If the table contains data matching the IP identification information of the current data packet, the currently accessed data packet is a duplicate data packet, that is, a repeatedly collected data packet.
In one embodiment, if the data packet information table contains no data matching the IP identification information of the current data packet, the currently accessed data packet is an initial data packet. The IP identification information of the initial data packet is stored in the data packet information table, and the other corresponding fields in the table are updated.
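The lookup described for step S2110 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the table is modeled as an in-memory dictionary, and the names `PacketKey`, `is_duplicate`, and `packet_info_table` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical key type: the tuple of IP-layer fields that identifies a packet.
PacketKey = Tuple[str, str, int, int, int, int]


def is_duplicate(table: Dict[PacketKey, dict], key: PacketKey) -> bool:
    """Return True if the key is already cached; otherwise cache it as an initial packet."""
    if key in table:
        return True
    # Initial packet: store its key. The other fields of the entry
    # (for example the historical access source flags) are filled in by later steps.
    table[key] = {}
    return False


packet_info_table: Dict[PacketKey, dict] = {}
# Illustrative values only: sender IP, receiver IP, identification,
# fragmentation flag, fragment offset, total length.
key = ("10.0.0.1", "10.0.0.2", 0x1C46, 0, 0, 60)
first_seen = is_duplicate(packet_info_table, key)
second_seen = is_duplicate(packet_info_table, key)
```

The first call treats the packet as an initial data packet and caches it; the second call, with the same key, reports a duplicate.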
Step S2120: obtain the source-destination IP address pair information, the repeated access type information, and the repeated access time information from the duplicate data packet;
In one embodiment, obtaining the source-destination IP address pair information from the duplicate data packet includes: parsing the duplicate data packet to obtain its sender IP address information and receiver IP address information, and determining the sender IP address information and the receiver IP address information as the source-destination IP address pair information;
In one embodiment, the sender IP address information and the receiver IP address information are parsed from the IP layer protocol of the duplicate data packet.
In one embodiment, obtaining the repeated access type information from the duplicate data packet includes: performing repeated access type identification on the duplicate data packet to obtain its repeated access type information;
In one embodiment, as shown in Figure 4, performing repeated access type identification on the duplicate data packet to obtain its repeated access type information may include, but is not limited to, steps S2121, S2122, S2123, and S2124:
Step S2121: parse the received current data packet to obtain the IP identification information that identifies the current data packet;
In one embodiment, taking IPv4 as an example, the IP identification information includes the sender IP address information, receiver IP address information, IP identification information, IP fragmentation flag information, IP fragment offset information, protocol type, and IP packet total length information. In IPv4, the sender IP address information corresponds to the Source Address field of the IP protocol layer, the receiver IP address information corresponds to the Destination Address field, the IP identification information corresponds to the Identification field, the IP fragmentation flag information corresponds to the Flags field, the IP fragment offset information corresponds to the Fragment Offset field, and the IP packet total length information corresponds to the Total Length field. The time to live corresponds to the Time To Live field of the IP protocol layer, abbreviated TTL.
In one embodiment, taking IPv6 as an example, the IP identification information includes the sender IP address information, receiver IP address information, flow label information, payload length information, and next header information. In IPv6, the sender IP address information corresponds to the Source IP Address field of the IP protocol layer, the receiver IP address information corresponds to the Destination IP Address field, the flow label information corresponds to the Flow Label field, and the payload length information corresponds to the Payload Length field.
The flow label information in IPv6 is similar to the IP identification information in IPv4, and the payload length information in IPv6 is similar to the length of the payload portion of the IP packet total length in IPv4. The Hop Limit field in IPv6 is effectively equivalent to the Time To Live field in IPv4; both Time To Live and Hop Limit are the time to live referred to in the embodiments of the present application.
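For the IPv4 case, extracting the identification fields and the time to live from a raw header might look like the sketch below. It assumes a standard 20-byte IPv4 header without options; the function name `parse_ipv4_identity` and the sample header values are illustrative assumptions, not part of the application.

```python
import socket
import struct


def parse_ipv4_identity(header: bytes):
    """Return (identification key, TTL) parsed from the fixed 20-byte IPv4 header."""
    (ver_ihl, _tos, total_length, identification, flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 header")
    flags = flags_frag >> 13            # upper 3 bits: fragmentation flags
    frag_offset = flags_frag & 0x1FFF   # lower 13 bits: fragment offset
    key = (socket.inet_ntoa(src), socket.inet_ntoa(dst),
           identification, flags, frag_offset, protocol, total_length)
    # The TTL changes at each hop, so it is returned separately and kept out of the key.
    return key, ttl


# Hand-built sample header (hypothetical values): DF flag set, TTL 64, protocol 6 (TCP).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 60, 0x1C46, 0x4000, 64, 6, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
key, ttl = parse_ipv4_identity(sample)
```

An IPv6 variant would read the Flow Label, Payload Length, Next Header, and Hop Limit fields instead.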
It should be noted that, under the IP standard protocol, every message carries a time-to-live value to prevent it from circulating between routing devices without ever reaching its destination. The maximum time to live is set by the device that owns the sender IP address when it constructs the message. The maximum time to live is denoted N, and N is typically 64. Each time the message passes through a routing device, the time-to-live value is decremented by 1; when the value reaches 0, the message is discarded. The time-to-live value of every data packet should therefore be less than or equal to N.
Step S2122: query the historical access source flag information according to the IP identification information;
In one embodiment, the historical access source flag information is obtained by querying the preset data packet information table.
In one embodiment, for IPv4, the fields of the data packet information table include at least the sender IP address information field, receiver IP address information field, IP identification information field, IP fragmentation flag information field, IP fragment offset information field, and IP packet total length information field. For IPv6, the fields of the data packet information table include at least the sender IP address information field, receiver IP address information field, flow label information field, payload length information field, and next header information field.
Step S2123: determine the access source of the duplicate data packet to obtain an access source determination result;
In one embodiment, step S2123 includes: obtaining the maximum time to live and the current time to live of the duplicate data packet; if the current time to live matches the maximum time to live, the access source determination result is sender-side access; if the current time to live does not match the maximum time to live, the access source determination result is receiver-side access. In other words, if the current time to live matches the maximum time to live, the duplicate data packet has not passed through a routing device, which indicates repeated access at the sender IP address of the duplicate data packet. If the current time to live does not match the maximum time to live, the duplicate data packet has passed through a routing device and its time-to-live value has been reduced, which indicates repeated access at the receiver IP address of the duplicate data packet.
In one embodiment, the current time to live matching the maximum time to live means that the two are equal, and the current time to live not matching the maximum time to live means that the two are not equal. In the embodiments of the present application, the current time to live being unequal to the maximum time to live means that the current time to live is less than the maximum time to live.
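The comparison in step S2123 reduces to a single equality test. The sketch below assumes the result is represented by the hypothetical strings "sender" and "receiver"; rejecting a current value above the maximum is an added sanity check based on the rule that the time to live never exceeds N.

```python
def judge_access_source(current_ttl: int, max_ttl: int) -> str:
    """Classify where a duplicate packet was captured from its time to live."""
    if current_ttl > max_ttl:
        # Assumption: a TTL above the configured maximum means the table entry is wrong.
        raise ValueError("current TTL exceeds the configured maximum TTL")
    # Equal: no routing device has decremented the TTL, so capture was at the sender side.
    # Smaller: at least one hop was traversed, so capture was at the receiver side.
    return "sender" if current_ttl == max_ttl else "receiver"


at_sender = judge_access_source(64, 64)
at_receiver = judge_access_source(61, 64)
```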
Step S2124: determine the repeated access type information of the duplicate data packet according to the access source determination result and the historical access source flag information.
In one embodiment, the historical access source flag information includes historical sender flag information and historical receiver flag information;
In one embodiment, the historical sender flag information and the historical receiver flag information are stored in the data packet information table.
In one embodiment, the historical access source flag information is obtained as follows: obtain the maximum time to live and the initial data packet, the initial data packet being the data packet obtained for the first time; parse the initial data packet to obtain the initial time to live; if the initial time to live matches the maximum time to live, set the historical sender flag information to 1 and the historical receiver flag information to 0; if the initial time to live does not match the maximum time to live, set the historical sender flag information to 0 and the historical receiver flag information to 1.
In one embodiment, if the initial time to live matches the maximum time to live, the initial data packet was collected at the sender IP address; if the initial time to live does not match the maximum time to live, the initial data packet was collected at the receiver IP address.
In one embodiment, the initial time to live matching the maximum time to live means that the two are equal, and the initial time to live not matching the maximum time to live means that the two are not equal. In the embodiments of the present application, the initial time to live being unequal to the maximum time to live means that the initial time to live is less than the maximum time to live.
In one embodiment, obtaining the maximum time to live includes: obtaining a preconfigured IP-to-maximum-time-to-live mapping table, and querying the mapping table with the sender IP address information to obtain the maximum time to live.
In one embodiment, the IP-to-maximum-time-to-live mapping table contains at least a sender IP address information field and a maximum time-to-live field. The sender IP address information field is the key of the table, so the corresponding maximum time to live can be looked up with the sender IP address information.
In one embodiment, the mapping between sender IP address information and maximum time to live is obtained from the network element equipment vendor, usually in the form of a configuration interface or a configuration file.
It should be noted that when a network element device is connected to the network, its sender IP address information and the corresponding maximum time to live are already stored in the IP-to-maximum-time-to-live mapping table. The time to live in this table does not change as data packets are transmitted between network element devices.
In one embodiment, the mapping table is queried with the sender IP address information contained in the IP identification information to obtain the corresponding maximum time to live.
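The mapping table lookup can be illustrated with a plain dictionary keyed by sender IP address. The addresses and values below are invented for the example; in practice they would come from the vendor-supplied configuration. Returning `None` for an unknown sender is an assumption, since the text does not say how a missing entry is handled.

```python
from typing import Dict, Optional

# Hypothetical preconfigured mapping: sender IP address -> maximum time to live.
ip_max_ttl: Dict[str, int] = {
    "10.0.0.1": 64,
    "10.0.0.9": 128,
}


def lookup_max_ttl(table: Dict[str, int], sender_ip: str) -> Optional[int]:
    """Return the configured maximum TTL for the sender, or None if it is not configured."""
    return table.get(sender_ip)


known = lookup_max_ttl(ip_max_ttl, "10.0.0.1")
unknown = lookup_max_ttl(ip_max_ttl, "192.0.2.7")
```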
In one embodiment, in step S2124, determining the repeated access type information of the duplicate data packet according to the access source determination result and the historical access source flag information includes:
when the access source determination result is sender-side access, checking the historical sender flag information; if the historical sender flag information is 1, the repeated access type of the duplicate data packet is the sender IP repeated access type;
when the access source determination result is receiver-side access, checking the historical receiver flag information; if the historical receiver flag information is 1, the repeated access type of the duplicate data packet is the receiver IP repeated access type;
when the access source determination result is sender-side access, checking the historical sender flag information; if the historical sender flag information is 0, the repeated access type of the duplicate data packet is the sender-receiver IP repeated access type;
when the access source determination result is receiver-side access, checking the historical receiver flag information; if the historical receiver flag information is 0, the repeated access type of the duplicate data packet is the sender-receiver IP repeated access type.
In one embodiment, the sender IP repeated access type is abbreviated as the SS type (Source IP repeated), the receiver IP repeated access type is abbreviated as the DD type (Destination IP repeated), and the sender-receiver IP repeated access type is abbreviated as the SD type (Source IP and Destination IP repeated).
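The four cases above form a small decision table, sketched here under the assumption that the access source is passed as the hypothetical strings "sender" or "receiver" and the flags as the integers 0 and 1.

```python
def classify_repeat_type(source: str, hist_sender: int, hist_receiver: int) -> str:
    """Map the access source result and the historical flags to SS, DD, or SD."""
    if source == "sender":
        # Sender side seen before: repeated at the sender; otherwise both sides are involved.
        return "SS" if hist_sender == 1 else "SD"
    if source == "receiver":
        # Receiver side seen before: repeated at the receiver; otherwise both sides.
        return "DD" if hist_receiver == 1 else "SD"
    raise ValueError(f"unknown access source: {source!r}")


results = [
    classify_repeat_type("sender", 1, 0),
    classify_repeat_type("receiver", 0, 1),
    classify_repeat_type("sender", 0, 1),
    classify_repeat_type("receiver", 1, 0),
]
```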
The method in the embodiments of the present application further includes: updating the historical access source flag information according to the current access source determination result to obtain current access source flag information, and using the current access source flag information as the updated historical access source flag information.
In one embodiment, when the access source determination result is sender-side access, the historical sender flag information is checked; if it is 0, it is updated and set to 1. When the access source determination result is receiver-side access, the historical receiver flag information is checked; if it is 0, it is updated and set to 1.
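A sketch of this flag update, assuming the two flags are held in a dictionary with the hypothetical keys "sender" and "receiver". Setting a flag that is already 1 is a no-op, so no explicit check for 0 is needed.

```python
from typing import Dict


def update_source_flags(source: str, flags: Dict[str, int]) -> Dict[str, int]:
    """Return a copy of the historical flags with the current access source marked."""
    if source not in ("sender", "receiver"):
        raise ValueError(f"unknown access source: {source!r}")
    updated = dict(flags)
    updated[source] = 1
    return updated


# The initial packet was captured at the sender side; a duplicate now arrives
# from the receiver side, so both flags end up set.
after = update_source_flags("receiver", {"sender": 1, "receiver": 0})
```

In this sketch the classification is assumed to read the historical flags before this update is applied, so that the first packet from a new side is still reported as the SD type.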
In one embodiment, the access time of the duplicate data packet is recorded to obtain the repeated access time information.
In one embodiment, as shown in Figure 5, the step of recording the access time of the duplicate data packet to obtain the repeated access time information may include, but is not limited to, steps S2125, S2126, and S2127:
Step S2125: Obtain the repeated access start time, which is the time at which repeated access first occurred;
Step S2126: Obtain the latest repeated access time, which is the most recent time at which repeated access occurred;
In one embodiment, when a duplicate data packet is received, the time information in the repeated access statistics table is updated according to the access time of the current data packet. Either both the repeated access start time field and the latest repeated access time field are updated, or only the latest repeated access time field is updated.
In one embodiment, if the current data packet is the first duplicate, both the repeated access start time and the latest repeated access time are set to the access time of the current duplicate data packet.
In one embodiment, if the current data packet is not the first duplicate, the repeated access start time remains unchanged and the latest repeated access time is set to the access time of the current duplicate data packet.
In one embodiment, the latest repeated access time is the time at which a duplicate data packet was most recently received.
Step S2127: Use the difference between the latest repeated access time and the repeated access start time as the repeated access time information.
In one embodiment, the repeated access time information indicates the time period during which the duplicate data packets were generated.
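Steps S2125 to S2127 can be illustrated with a minimal Python sketch. The class name `RepeatWindow`, its field names, and the use of plain floating-point timestamps are assumptions made for illustration only; the application does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepeatWindow:
    """Time fields of one record in the repeated access statistics table (hypothetical layout)."""
    start_time: Optional[float] = None   # time of the first duplicate (step S2125)
    latest_time: Optional[float] = None  # time of the most recent duplicate (step S2126)

    def record(self, access_time: float) -> None:
        # First duplicate: the start time takes the current access time.
        if self.start_time is None:
            self.start_time = access_time
        # Every duplicate, first or not, refreshes the latest time.
        self.latest_time = access_time

    def duration(self) -> float:
        # Step S2127: latest time minus start time.
        if self.start_time is None or self.latest_time is None:
            return 0.0
        return self.latest_time - self.start_time


window = RepeatWindow()
for t in (100.0, 104.5, 112.0):
    window.record(t)
print(window.start_time, window.latest_time, window.duration())  # 100.0 112.0 12.0
```

In this sketch the start time is written once and never changed afterwards, which mirrors the rule that only the latest repeated access time is updated for every duplicate after the first.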
Step S2130: Update the corresponding entries in the repeated access statistics table according to the source/destination IP address pair information, the repeated access type information, and the repeated access time information, to obtain the updated repeated access statistics table.
In one embodiment, the repeated access time information can be understood as the access time of the duplicate data packets.
Step S1400: Perform duplicate data identification on the duplicate data packets according to at least one of the source/destination IP address pair information, the repeated access type information, and the repeated access time information.
In one embodiment, by combining the source/destination IP address pair information, the repeated access type information, and the repeated access time information, the user can learn, for example, the source/destination IP address pair, the number of accesses of each repeated access type, and the time of the repeated accesses. The user can then use this information to locate the device responsible for the duplicate data packets and to monitor subsequent fault elimination. The fault elimination scenarios are described in detail below.
As shown in Figure 6, the duplicate data identification method of the embodiment of the present application further includes, but is not limited to, the following steps:
Step S1500: Construct a counter for each repeated access type according to the repeated access type information;
In one embodiment, an SS-type counter is constructed for the SS repeated access type, a DD-type counter for the DD repeated access type, and an SD-type counter for the SD repeated access type.
Step S1600: Display the source/destination IP address pair information, the counter values, and the repeated access time information.
In one embodiment, the repeated access statistics table is updated according to the source/destination IP address pair information and the repeated access type of the current data packet.
The update consists of looking up, by the current source/destination IP address pair information, the counter corresponding to the repeated access type and incrementing that counter by 1. If the repeated access type is SS, the SS-type counter is incremented by 1; if it is DD, the DD-type counter is incremented by 1; if it is SD, the SD-type counter is incremented by 1.
In one embodiment, displaying the source/destination IP address pair information, the counter values, and the repeated access time lets the user see directly the source/destination IP address pair of the duplicate data packets, the number of SS-type accesses, the number of DD-type accesses, the number of SD-type accesses, the latest repeated access time, and the repeated access start time. The user can thus identify precisely the network element devices associated with the duplicate data packets and quickly locate the repeated access fault.
In one embodiment, fault diagnosis is described as follows. For a given source/destination IP address pair, fault elimination falls into three scenarios: not eliminated, partially eliminated, and completely eliminated. In one embodiment, fault elimination is further divided into partial fault elimination for a source/destination IP address pair and full fault elimination for a source/destination IP address pair. To distinguish the two, a time threshold needs to be set. Partial fault elimination means, for example, that the SS-type counter and the DD-type counter of a source/destination IP address pair have both been incrementing continuously and, after the time threshold, the SS-type counter no longer increments while the DD-type counter still does; this indicates that, for this pair, the SS-type repeated access fault has been eliminated but the DD-type repeated access fault has not. Full fault elimination means that, after the time threshold, none of the SS-type, DD-type, and SD-type counters increments any further.
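The three fault elimination scenarios can be sketched as follows. This is only an illustration under stated assumptions: the function name `elimination_status`, the idea of tracking the time at which each counter last increased, and the extra "no repeated access recorded" result are hypothetical and are not taken from the application.

```python
from typing import Dict, Optional

TYPES = ("SS", "DD", "SD")


def elimination_status(last_increment: Dict[str, Optional[float]],
                       now: float, threshold: float) -> str:
    """Classify one source/destination IP address pair.

    last_increment maps each repeated access type to the time its counter
    last increased, or None if that counter never increased (assumed input).
    """
    # Types whose counter has increased at least once.
    seen = [t for t in TYPES if last_increment.get(t) is not None]
    if not seen:
        return "no repeated access recorded"
    # A type is treated as eliminated once its counter has been quiet
    # for at least the time threshold.
    quiet = [t for t in seen if now - last_increment[t] >= threshold]
    if len(quiet) == len(seen):
        return "completely eliminated"
    if quiet:
        return "partially eliminated"
    return "not eliminated"


# SS stopped long ago, DD is still incrementing, SD never occurred.
print(elimination_status({"SS": 100.0, "DD": 195.0, "SD": None}, 200.0, 60.0))
# partially eliminated
```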
In one embodiment, in step S1600, the data in the updated repeated access statistics table is displayed on a terminal.
An embodiment of the present application further provides a duplicate data identification system, including a configuration module, a data packet receiving module, and a data packet analysis module. The configuration module is configured to set up the IP-to-maximum-TTL mapping table. The data packet receiving module is configured to receive data packets and send them to the data packet analysis module. The data packet analysis module is configured to parse each data packet to obtain IP identification information and, based on the IP identification information, to identify duplicate data packets and obtain the repeated access type information, where the IP identification information includes the sender IP address information and the receiver IP address information. A statistics module is configured to store the sender IP address information, the receiver IP address information, the repeated access type, and the access time information of the duplicate data packets.
In one embodiment, the configuration module is further configured to enable or disable the duplicate data identification function.
The duplicate data identification system of the embodiment of the present application is applicable to any communication network based on the IP protocol, including IPv4 and IPv6, and is not limited to any particular network structure.
In one embodiment, the configuration module may be a configurator, the data packet receiving module may be a data packet receiver, the data packet analysis module may be a data packet analyzer, and the statistics module may be a repeated access statistics collector.
In one embodiment, as shown in Figure 7, the duplicate data identification system includes a configurator, a data packet receiver, a data packet analyzer, a data packet sender, a repeated access statistics collector, and a repeated access presenter, each of which is described in detail below.
The configurator maintains the configuration information, which currently includes, but is not limited to: the on/off state of the duplicate data identification function, the maximum cache duration for data packets, the IP-to-maximum-TTL mapping table, and the maximum duration for fault resolution.
In one embodiment, the duplicate data identification function is enabled by default, and the maximum cache duration for data packets is configured as 10 seconds.
In one embodiment, if the duplicate data identification switch is on, identified duplicate data packets are discarded directly, so that they are not sent to the data collection system, where they would increase its workload.
In one embodiment, if the duplicate data identification switch is off, no identification is performed on any received data packet and each packet is forwarded directly to the data collection system, which avoids adding transmission delay when no duplicate data packets are present.
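The effect of the switch can be sketched in a few lines of Python. The function name `route_packet` and the idea of passing the duplicate check as a callable are assumptions for illustration; the callable is only invoked when the function is enabled, which reflects the point that no identification work is done while the switch is off.

```python
from typing import Callable


def route_packet(enabled: bool, is_duplicate: Callable[[], bool]) -> str:
    """Return "forward" or "discard" for one received data packet."""
    if not enabled:
        # Switch off: skip identification entirely and forward the packet.
        return "forward"
    # Switch on: duplicates are dropped, everything else is forwarded.
    return "discard" if is_duplicate() else "forward"


print(route_packet(False, lambda: True))  # forward
print(route_packet(True, lambda: True))   # discard
print(route_packet(True, lambda: False))  # forward
```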
In one embodiment, the mapping between IP address and maximum TTL can be obtained from parameter information provided by the device from which data is collected.
Table 1 shows the fields of the IP-to-maximum-TTL mapping table, namely the sender IP address information field and the maximum TTL field. The sender IP address information maps to the maximum TTL. The sender IP address information is the key that uniquely identifies a record in the table, so the maximum TTL of the corresponding device can be looked up by sender IP address information.
According to the IP protocol, to prevent a message from looping between routing devices and never reaching its destination, every message carries a time-to-live (TTL) value. The maximum TTL is set by the device at the sender IP address when it constructs the message. For example, let N denote the initial TTL value, where N is typically 64. Each time the message passes through a routing device, the TTL is decremented by 1, and the message is discarded when the value reaches 0. The TTL of a data packet therefore satisfies: TTL is less than or equal to N.
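The TTL behaviour described above can be made concrete with a small sketch. The function name `ttl_after_hops` is hypothetical, and the model is deliberately simplified: each routing device only decrements the TTL by 1 and drops the packet when the value reaches 0.

```python
from typing import Optional


def ttl_after_hops(initial_ttl: int, hops: int) -> Optional[int]:
    """TTL observed after `hops` routing devices, or None if the packet was dropped."""
    ttl = initial_ttl
    for _ in range(hops):
        ttl -= 1
        if ttl == 0:
            return None  # discarded by the device that reduced the TTL to 0
    return ttl


print(ttl_after_hops(64, 0))  # 64
print(ttl_after_hops(64, 3))  # 61
print(ttl_after_hops(2, 2))   # None
```

A packet observed with a TTL equal to N has not yet crossed any routing device, which is the property the access source judgment relies on.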
Table 1
The data packet receiver is mainly responsible for establishing and maintaining the communication link between the duplicate data identification system and the data packet aggregator. It also receives all data packets from the data packet aggregator and passes them to the data packet analyzer for analysis.
The data packet sender is mainly responsible for establishing and maintaining the communication link between the duplicate data identification system and the data collection system. It also sends to the data collection system the data packets that have been processed by the data packet analyzer and need to be forwarded.
The data packet analyzer is mainly responsible for parsing the information of each protocol layer of a data packet and, based on the IP information of the protocol layer combined with the field information in the data packet information table, identifying the repeated access type of the data packet.
For IPv4, the IP-layer protocol information to be parsed from a data packet is shown in Table 2. It includes the sender IP address information, receiver IP address information, IP identification information, IP fragmentation flag information, IP fragment offset information, protocol type, IP packet total length information, and TTL. The IP protocol layer fields corresponding to the information in Table 2 are listed above.
Table 2
For IPv6, the IP-layer protocol information to be parsed from a data packet is shown in Table 3. It includes the sender IP address information, receiver IP address information, flow label information, payload length information, next header information, and hop limit information (Hop Limit). The IP protocol layer fields corresponding to the information in Table 3 are listed above.
Table 3
The data packet analyzer also maintains the data packet information table, which caches the IP-layer protocol information of the data packets received within the maximum cache duration. The data packet analyzer checks the data packet information table periodically and promptly clears records for which no duplicate data packet has been received for longer than a specified duration. The specified duration can be configured by the user.
In one embodiment, as shown in Figure 10, an aging time is set and an aging timer is used to check the records in the data packet information table periodically. Each record in the table is checked in turn to see whether the time it has existed is equal to or greater than the aging time; if so, the record is deleted.
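The aging check can be sketched as a single pass over the table. In this illustration the data packet information table is reduced to a dictionary that maps a record key to the time the record was created; that layout and the function name `purge_aged` are assumptions, not details given in the application.

```python
from typing import Dict, Hashable


def purge_aged(table: Dict[Hashable, float], now: float, aging_time: float) -> int:
    """Delete records whose age is >= aging_time and return how many were removed.

    `table` maps a record key to the record's creation time (assumed layout).
    """
    # Collect the keys first so the dictionary is not modified while iterating.
    expired = [key for key, created in table.items() if now - created >= aging_time]
    for key in expired:
        del table[key]
    return len(expired)


table = {"a": 0.0, "b": 5.0, "c": 9.0}
print(purge_aged(table, now=10.0, aging_time=5.0))  # 2
print(table)                                        # {'c': 9.0}
```

A periodic timer would call such a function at a fixed interval; the scheduling itself is left out of the sketch.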
For IPv4, Table 4 shows the fields of the data packet information table. The sender IP address information, receiver IP address information, IP fragmentation flag information, IP fragment offset information, protocol type, and IP packet total length information together form the key of each record, and this key uniquely identifies a record in the data packet information table.
Table 4
For IPv6, Table 5 shows the fields of the data packet information table. The sender IP address information, receiver IP address information, flow label information, payload length information, and next header information together form the key of each record, and this key uniquely identifies a record in the data packet information table.
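The composite keys described above map naturally onto immutable tuples used as dictionary keys. The sketch below is an assumption about one possible representation: the class names `IPv4Key` and `IPv6Key`, the field names, and the dictionary-based table are illustrative only, and the addresses come from the documentation range 192.0.2.0/24.

```python
from typing import NamedTuple


class IPv4Key(NamedTuple):
    src_ip: str        # sender IP address information
    dst_ip: str        # receiver IP address information
    frag_flags: int    # IP fragmentation flag information
    frag_offset: int   # IP fragment offset information
    protocol: int      # protocol type
    total_length: int  # IP packet total length information


class IPv6Key(NamedTuple):
    src_ip: str          # sender IP address information
    dst_ip: str          # receiver IP address information
    flow_label: int      # flow label information
    payload_length: int  # payload length information
    next_header: int     # next header information


# Data packet information table: key -> historical mark information.
packet_info = {}
key = IPv4Key("192.0.2.1", "192.0.2.2", 0, 0, 17, 128)
packet_info[key] = {"hist_sender_mark": 0, "hist_receiver_mark": 0}

# A second packet with identical key fields resolves to the same record.
same = IPv4Key("192.0.2.1", "192.0.2.2", 0, 0, 17, 128)
print(same in packet_info)  # True
```

Named tuples compare and hash by value, so two packets with the same key fields hit the same record without any extra lookup logic.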
The historical sender mark information and the historical receiver mark information are explained as follows:
The maximum TTL corresponding to the current sender IP address information is looked up in the IP-to-maximum-TTL mapping table. The initial values of the historical sender mark information and the historical receiver mark information are 0. The access source marking algorithm is as follows: if the TTL carried by the current data packet equals the maximum TTL, the historical sender mark information is set to 1; if the TTL carried by the current data packet does not equal the maximum TTL, the historical receiver mark information is set to 1.
Referring to Figure 8, the access source marking of an initial data packet is described as an example. The TTL of the initial data packet is compared with the maximum TTL. If the two are equal, the initial data packet was collected at the sender IP address side; the historical sender mark information is set to 1 and the historical receiver mark information is set to 0. If the two are not equal, the initial data packet was collected at the receiver IP address side; the historical sender mark information is set to 0 and the historical receiver mark information is set to 1. After these operations, the initial data packet is sent to the data collection system for network quality analysis.
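The access source marking rule can be sketched as follows. The names `max_ttl_table`, `mark_access_source`, `hist_sender_mark`, and `hist_receiver_mark` are hypothetical, as is the use of plain dictionaries for the mapping table and the record; only the comparison of the packet TTL with the maximum TTL comes from the description above.

```python
from typing import Dict

# IP-to-maximum-TTL mapping table, keyed by sender IP address (assumed contents).
max_ttl_table: Dict[str, int] = {"192.0.2.1": 64}


def mark_access_source(record: Dict[str, int], src_ip: str, packet_ttl: int) -> str:
    """Set the matching historical mark to 1 and return the judged access source."""
    max_ttl = max_ttl_table[src_ip]
    if packet_ttl == max_ttl:
        # TTL untouched: the packet was collected at the sender side.
        record["hist_sender_mark"] = 1
        return "sender"
    # TTL already decremented: the packet was collected at the receiver side.
    record["hist_receiver_mark"] = 1
    return "receiver"


record = {"hist_sender_mark": 0, "hist_receiver_mark": 0}  # initial values are 0
print(mark_access_source(record, "192.0.2.1", 64))  # sender
print(mark_access_source(record, "192.0.2.1", 61))  # receiver
print(record)  # {'hist_sender_mark': 1, 'hist_receiver_mark': 1}
```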
Table 5
The repeated access statistics collector is mainly responsible for maintaining the repeated access statistics table, which stores the sender IP address and receiver IP address of duplicate data packets and counts information such as the number of repeated accesses, so that the information in the table can be sent to the repeated access presenter for display.
Table 6 shows the fields of the repeated access statistics table: the sender IP address information field, the receiver IP address information field, the SS-type counter field, the DD-type counter field, the SD-type counter field, the repeated access start time field, and the latest repeated access time field. The three counter fields record the number of accesses of the three repeated access types respectively.
In the repeated access statistics table, the sender IP address information and the receiver IP address information together form the key of each record, and this key uniquely identifies a record in the table.
Table 6
The following information can be obtained from the repeated access statistics table:
The source/destination IP address pairs for which repeated access exists.
For a given source/destination IP address pair, the number of SS-type repeated accesses, the number of DD-type repeated accesses, and the number of SD-type repeated accesses.
For a given source/destination IP address pair, the time at which repeated access started.
For a given source/destination IP address pair, the time at which the repeated access fault was completely resolved.
Whether the fault of a particular repeated access type for a given source/destination IP address pair has been resolved. For example, with a time threshold set, if the value of the SS-type counter is greater than 0 and has not incremented within the time threshold, the SS-type repeated access fault has been resolved.
For a given source/destination IP address pair, the repeated access time information, that is, the time period during which repeated access existed. This period is calculated as the difference between the latest repeated access time and the repeated access start time.
In one embodiment, a separate latest repeated access time may be kept for each of the SS-type, DD-type, and SD-type counters, so that the elimination time of each of the three repeated access types can be monitored separately.
The repeated access statistics table shows directly which source/destination IP address pairs have repeated access and whether the repeated access problem of each such pair has been resolved.
The repeated access presenter is mainly responsible for displaying the information in the repeated access statistics table in a visual interface, so that the user can see directly the source/destination IP address pair of the duplicate data packets, the number of SS-type repeated accesses, the number of DD-type repeated accesses, the number of SD-type repeated accesses, the repeated access start time, the latest repeated access time, and other information. The user can thus identify precisely the network element devices associated with the duplicate data packets and quickly locate the fault.
Referring to Figure 9, the updating of the repeated access statistics table is described. The data packet analyzer passes the sender IP address information, receiver IP address information, and repeated access type information of a duplicate data packet to the repeated access statistics collector, which queries the repeated access statistics table by sender IP address information and receiver IP address information.
If no record matching the sender IP address information and receiver IP address information is found in the repeated access statistics table, a new record is added to the table. The sender IP address information and receiver IP address information are written into the record, the counter corresponding to the repeated access type information is set to 1, the other counters are set to 0, and the repeated access start time and the latest repeated access time are both set to the current time.
If a record matching the sender IP address information and receiver IP address information is found in the repeated access statistics table, the counter corresponding to the repeated access type information is incremented by 1 to count the accesses of that repeated access type, and the latest repeated access time is updated to the current time, which is the access time of the most recent duplicate data packet.
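The two branches above (record not found, record found) amount to an insert-or-update on a table keyed by the source/destination IP address pair. The following is a minimal sketch under stated assumptions: the function name `update_stats`, the dictionary-based record, and its field names are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (sender IP address, receiver IP address)


def update_stats(table: Dict[Pair, dict], src: str, dst: str,
                 access_type: str, now: float) -> dict:
    """Insert or update the record for one duplicate data packet."""
    if access_type not in ("SS", "DD", "SD"):
        raise ValueError(f"unknown repeated access type: {access_type}")
    record = table.get((src, dst))
    if record is None:
        # No matching record: create one with all counters at 0 and the
        # start time set to the current time.
        record = {"SS": 0, "DD": 0, "SD": 0, "start_time": now, "latest_time": now}
        table[(src, dst)] = record
    # Count this access and refresh the latest repeated access time.
    record[access_type] += 1
    record["latest_time"] = now
    return record


stats: Dict[Pair, dict] = {}
update_stats(stats, "192.0.2.1", "192.0.2.2", "SS", now=100.0)
update_stats(stats, "192.0.2.1", "192.0.2.2", "SS", now=103.0)
update_stats(stats, "192.0.2.1", "192.0.2.2", "DD", now=107.0)
print(stats[("192.0.2.1", "192.0.2.2")])
# {'SS': 2, 'DD': 1, 'SD': 0, 'start_time': 100.0, 'latest_time': 107.0}
```

For a new record the increment takes the matching counter from 0 to 1, which gives the same result as setting it to 1 directly.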
The query conditions, query fields, and update fields of the data packet information table and the repeated access statistics table are as follows. The data packet analyzer queries and updates the data packet information table. The query conditions are the keys given for Table 4 and Table 5, the query fields are the historical sender mark information and the historical receiver mark information, and the update fields are likewise the historical sender mark information and the historical receiver mark information.
The data packet analyzer sends a statistics message to the repeated access statistics collector. The message contains the sender IP address information, the receiver IP address information, and the repeated access type information.
The repeated access statistics collector queries and updates the repeated access statistics table. The query conditions are the sender IP address information and the receiver IP address information, the query fields are the SS-type counter, the DD-type counter, and the SD-type counter, and the fields to be updated are the SS-type counter, the DD-type counter, the SD-type counter, the latest repeated access time, and the repeated access start time.
The processing flow for duplicate data packets is briefly described below:
The data packet receiver receives all data packets from the data packet aggregator and passes them to the data packet analyzer for processing. The data packet analyzer parses the protocol of each received data packet to obtain the relevant information. For IPv4 this is the sender IP address information, receiver IP address information, IP fragmentation flag information, IP fragment offset information, IP packet total length information, and TTL; for IPv6 it is the sender IP address information, receiver IP address information, flow label information, payload length information, next header information, and hop limit information.
After the protocol of a data packet has been parsed, the information parsed from the current data packet is combined with the information stored in the data packet information table to determine whether the current data packet is a duplicate. If it is, the repeated access type of the duplicate data packet is further identified.
After the repeated access type of a duplicate data packet has been identified, the sender IP address information, receiver IP address information, and repeated access type information of the duplicate data packet are sent to the repeated access statistics collector, which counts the number of repeated accesses.
In one embodiment, if the current data packet is not a duplicate, it is passed to the data packet sender; if it is a duplicate, it is discarded.
When the data packet sender receives a data packet, it forwards the data packet to the data collection system.
When the repeated access statistics collector receives a statistics request from the data packet analyzer, it updates the information in the repeated access statistics table and sends the information in the repeated access statistics table to the repeated access presenter.
By analyzing the received data packets, the embodiments of the present application can determine precisely between which IP addresses repeated access of data packets exists, which repeated access type the duplicate data packets belong to, and at which IP ends the duplication occurs.
The embodiments of the present application count duplicate data packets per source/destination IP address pair, so the number of duplicate data packet accesses for each pair is known exactly, with the three repeated access types counted separately. The start time and the most recent time of duplicate data packet access for each source/destination IP address pair are also known.
Through the repeated access presenter, the embodiments of the present application show the information about duplicate data packets directly to the user, so that the user can quickly locate the faulty device and troubleshoot accurately and quickly. From the changes in the three types of counters shown by the repeated access presenter, the user can tell whether the repeated access fault has been partially or completely eliminated. When the values of the three counters no longer increment, the access time of the most recent duplicate data packet indicates the point in time at which the repeated access fault was completely cleared.
By filtering out duplicate data packets, the embodiments of the present application avoid sending these abnormal data packets to the data collection system and the data analysis system, which improves system performance and the accuracy of data analysis.
The embodiments of the present application provide a duplicate data identification system that identifies the source/destination IP address pair information, the repeated access type, and the repeated access time information. This helps with troubleshooting, ensures the accuracy of network quality analysis, and also helps improve the performance of the data collection system and the analysis system.
An embodiment of the present application further provides a duplicate data identification apparatus. The apparatus includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the duplicate data identification method described above is implemented.
As a non-transitory computer-readable storage medium, the memory can store non-transitory software programs and non-transitory computer-executable programs. The memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory may include memory located remotely from the processor, and such remote memory may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
实现上述实施例的图像处理方法所需的非暂态软件程序以及指令存储在存储器中,当被处理器执行时,执行上述实施例中的重复数据的识别方法。The non-transitory software programs and instructions required to implement the image processing method of the above embodiment are stored in the memory. When executed by the processor, the repetitive data identification method in the above embodiment is executed.
The network element embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the method for identifying duplicate data of the above embodiments.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory may include memory located remotely from the processor, and such remote memory may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In addition, an embodiment of the present application further provides a computer program product including a computer program or computer instructions. The computer program or computer instructions are stored in a computer-readable storage medium; a processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method for identifying duplicate data described above.
The mobile communication device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Several implementations of the present application have been described above, but the present application is not limited to these implementations. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the present application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.

Claims (19)

  1. A method for identifying duplicate data, comprising:
    obtaining source-destination IP address pair information, wherein the source-destination IP address pair information comprises sender IP address information and receiver IP address information;
    obtaining repeated access type information, wherein the repeated access type information characterizes the IP address access type that causes duplicate data packets to be generated;
    obtaining repeated access time information, wherein the repeated access time information characterizes the time at which duplicate data packets are accessed;
    identifying duplicate data for the generated duplicate data packets according to at least one of the source-destination IP address pair information, the repeated access type information and the repeated access time information.
  2. The method for identifying duplicate data according to claim 1, wherein the method further comprises:
    constructing, according to the repeated access type information, a counter corresponding to each repeated access type;
    displaying the source-destination IP address pair information, the numerical information corresponding to the counters, and the repeated access time information.
  3. The method for identifying duplicate data according to claim 1, wherein the source-destination IP address pair information, the repeated access type information and the repeated access time information are obtained from a repeated access statistics table.
  4. The method for identifying duplicate data according to claim 3, wherein the repeated access statistics table is obtained by:
    obtaining a duplicate data packet;
    obtaining source-destination IP address pair information, repeated access type information and repeated access time information according to the duplicate data packet;
    updating the corresponding information in the repeated access statistics table according to the source-destination IP address pair information, the repeated access type information and the repeated access time information, to obtain the updated repeated access statistics table.
  5. The method for identifying duplicate data according to claim 4, wherein obtaining the source-destination IP address pair information according to the duplicate data packet comprises:
    parsing the duplicate data packet to obtain sender IP address information and receiver IP address information of the duplicate data packet, and determining the sender IP address information and the receiver IP address information as the source-destination IP address pair information.
  6. The method for identifying duplicate data according to claim 4, wherein obtaining the repeated access type according to the duplicate data packet comprises:
    performing repeated access type identification on the duplicate data packet to obtain the repeated access type information of the duplicate data packet.
  7. The method for identifying duplicate data according to claim 4, wherein obtaining the repeated access time according to the duplicate data packet comprises:
    recording the access time of the duplicate data packet to obtain the repeated access time information.
  8. The method for identifying duplicate data according to claim 6, wherein performing repeated access type identification on the duplicate data packet to obtain the repeated access type information of the duplicate data packet comprises:
    parsing a received current data packet to obtain IP identification information used to identify the current data packet;
    querying, according to the IP identification information, historical access source mark information;
    determining the access source of the duplicate data packet to obtain an access source determination result;
    determining the repeated access type information of the duplicate data packet according to the access source determination result and the historical access source mark information.
  9. The method for identifying duplicate data according to claim 8, wherein determining the access source of the duplicate data packet to obtain the access source determination result comprises:
    obtaining a maximum time to live (TTL) and a current TTL of the duplicate data packet;
    if the current TTL matches the maximum TTL, the access source determination result is sender-side access;
    if the current TTL does not match the maximum TTL, the access source determination result is receiver-side access.
  10. The method for identifying duplicate data according to claim 9, wherein the historical access source mark information comprises historical sender mark information and historical receiver mark information;
    determining the repeated access type information of the duplicate data packet according to the access source determination result and the historical access source mark information comprises:
    when the access source determination result is sender-side access, checking the historical sender mark information, and if the historical sender mark information is 1, determining that the repeated access type of the duplicate data packet is the sender IP repeated access type;
    when the access source determination result is receiver-side access, checking the historical receiver mark information, and if the historical receiver mark information is 1, determining that the repeated access type of the duplicate data packet is the receiver IP repeated access type;
    when the access source determination result is sender-side access, checking the historical sender mark information, and if the historical sender mark information is 0, determining that the repeated access type of the duplicate data packet is the sender-receiver IP repeated access type;
    when the access source determination result is receiver-side access, checking the historical receiver mark information, and if the historical receiver mark information is 0, determining that the repeated access type of the duplicate data packet is the sender-receiver IP repeated access type.
  11. The method for identifying duplicate data according to claim 10, wherein the method further comprises:
    updating the historical access source mark information according to the current access source determination result to obtain current access source mark information;
    using the current access source mark information as the updated historical access source mark information.
  12. The method for identifying duplicate data according to claim 8, wherein the historical access source mark information comprises historical sender mark information and historical receiver mark information, and the historical access source mark information is obtained by:
    obtaining a maximum time to live (TTL) and an initial data packet, the initial data packet being the data packet obtained for the first time;
    parsing the initial data packet to obtain an initial TTL;
    if the initial TTL matches the maximum TTL, setting the historical sender mark information to 1 and the historical receiver mark information to 0;
    if the initial TTL does not match the maximum TTL, setting the historical sender mark information to 0 and the historical receiver mark information to 1.
  13. The method for identifying duplicate data according to claim 9 or 12, wherein obtaining the maximum TTL comprises:
    obtaining a pre-configured table mapping IP addresses to maximum TTL values;
    querying the table mapping IP addresses to maximum TTL values according to the sender IP address information to obtain the maximum TTL.
  14. The method for identifying duplicate data according to claim 7, wherein recording the access time of the duplicate data packet to obtain the repeated access time information comprises:
    obtaining a repeated access start time, the repeated access start time being the time at which repeated access first occurred;
    obtaining a most recent repeated access time, the most recent repeated access time being the most recent time at which repeated access occurred;
    using the difference between the most recent repeated access time and the repeated access start time as the repeated access time information.
  15. A system for identifying duplicate data, comprising:
    a configuration module, configured to configure a table mapping IP addresses to maximum time to live (TTL) values;
    a data packet receiving module, configured to receive data packets and send the data packets to a data packet analysis module;
    the data packet analysis module, configured to parse the data packets to obtain IP identification information, and to identify duplicate data packets according to the IP identification information to obtain repeated access type information, wherein the IP identification information comprises sender IP address information and receiver IP address information;
    a statistics module, configured to store the sender IP address information, the receiver IP address information and the repeated access type information of the duplicate data packets, and the access time information of the duplicate data packets.
  16. The system for identifying duplicate data according to claim 15, wherein:
    the configuration module is further configured to enable or disable the duplicate data identification function.
  17. An apparatus for identifying duplicate data, comprising:
    at least one processor;
    at least one memory, configured to store at least one program;
    wherein the method for identifying duplicate data according to any one of claims 1 to 14 is implemented when the at least one program is executed by the at least one processor.
  18. A computer-readable storage medium storing a processor-executable program, wherein the processor-executable program, when executed by a processor, implements the method for identifying duplicate data according to any one of claims 1 to 14.
  19. A computer program product, comprising a computer program or computer instructions, wherein the computer program or the computer instructions are stored in a computer-readable storage medium, a processor of a computer device reads the computer program or the computer instructions from the computer-readable storage medium, and the processor executes the computer program or the computer instructions, so that the computer device performs the method for identifying duplicate data according to any one of claims 1 to 14.
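By way of non-limiting illustration only, and without limiting the claims, the determination recited in claims 8 to 13 may be sketched as follows. Several points are assumptions made for this sketch: "matches" is read as equality of TTL values, the update of claim 11 is read as setting the mark for the currently determined access source to 1, and the table contents, key layout, class name and type labels are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical pre-configured table: sender IP -> maximum (initial) TTL.
MAX_TTL_BY_IP: Dict[str, int] = {"10.0.0.1": 64}

SENDER, RECEIVER, SENDER_RECEIVER = "sender", "receiver", "sender_receiver"
Key = Tuple[str, str, int]  # assumed key: (sender IP, receiver IP, IP identification)


def access_source(src_ip: str, ttl: int) -> str:
    """TTL equal to the configured maximum -> sender side, otherwise receiver side."""
    return SENDER if ttl == MAX_TTL_BY_IP[src_ip] else RECEIVER


class SourceMarks:
    """Keeps historical sender/receiver marks per packet key."""

    def __init__(self) -> None:
        self.marks: Dict[Key, Dict[str, int]] = {}

    def observe(self, key: Key, src_ip: str, ttl: int) -> Optional[str]:
        """Return the repeated access type, or None for a first-seen packet."""
        source = access_source(src_ip, ttl)
        flags = self.marks.get(key)
        if flags is None:
            # Initial packet: mark the side it was seen on as 1, the other as 0.
            self.marks[key] = {SENDER: int(source == SENDER),
                               RECEIVER: int(source == RECEIVER)}
            return None
        # Same side already marked -> that side's type; otherwise both sides.
        dup_type = source if flags[source] == 1 else SENDER_RECEIVER
        flags[source] = 1  # assumed form of the mark update
        return dup_type
```

For example, a packet first seen with the maximum TTL and then seen again with the maximum TTL would be reported under the sender type in this sketch, whereas a later copy with a smaller TTL would be reported under the sender-receiver type.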
PCT/CN2023/079344 2022-08-12 2023-03-02 Method, system and apparatus for identifying repeated data, and storage medium and product WO2024031972A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210968336.1 2022-08-12
CN202210968336.1A CN117640586A (en) 2022-08-12 2022-08-12 Method, system, device, storage medium and product for identifying repeated data

Publications (1)

Publication Number Publication Date
WO2024031972A1 true WO2024031972A1 (en) 2024-02-15

Family

ID=89850538

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/079344 WO2024031972A1 (en) 2022-08-12 2023-03-02 Method, system and apparatus for identifying repeated data, and storage medium and product

Country Status (2)

Country Link
CN (1) CN117640586A (en)
WO (1) WO2024031972A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130287024A1 (en) * 2012-04-30 2013-10-31 Fujitsu Limited Duplicate packet suppression
CN110912655A (en) * 2019-12-24 2020-03-24 瑞斯康达科技发展股份有限公司 Data redundancy backup method, device, equipment and medium
CN111770023A (en) * 2020-06-28 2020-10-13 湖南有马信息技术有限公司 Message duplicate removal method and device based on FPGA and FPGA chip
CN113055127A (en) * 2021-03-17 2021-06-29 网宿科技股份有限公司 Data message duplicate removal and transmission method, electronic equipment and storage medium
CN114157730A (en) * 2021-10-26 2022-03-08 武汉光迅信息技术有限公司 Message duplicate removal method and device

Also Published As

Publication number Publication date
CN117640586A (en) 2024-03-01

Similar Documents

Publication Publication Date Title
US11693688B2 (en) Recommendation generation based on selection of selectable elements of visual representation
US11743135B2 (en) Presenting data regarding grouped flows
US11349876B2 (en) Security policy recommendation generation
US11288256B2 (en) Dynamically providing keys to host for flow aggregation
US11188570B2 (en) Using keys to aggregate flow attributes at host
US20210029002A1 (en) Anomaly detection on groups of flows
US9003065B2 (en) De-duplicating of packets in flows at layer 3
US8626903B2 (en) Method and device for identifying an SCTP packet
US20210029050A1 (en) Host-based flow aggregation
US20210026863A1 (en) Using keys to aggregate flows at appliance
US10305928B2 (en) Detection of malware and malicious applications
US20210029051A1 (en) Analyzing flow group attributes using configuration tags
US20220321449A1 (en) Centralized error telemetry using segment routing header tunneling
US8626912B1 (en) Automated passive discovery of applications
US20190166008A1 (en) Methods, systems, and computer readable media for network traffic statistics collection
US8868998B2 (en) Packet communication apparatus and packet communication method
WO2016045098A1 (en) Switch, controller, system and link quality detection method
US10355961B2 (en) Network traffic capture analysis
CN109245955B (en) Data processing method and device and server
US10742672B2 (en) Comparing metrics from different data flows to detect flaws in network data collection for anomaly detection
EP4089958A1 (en) Detecting sources of computer network failures
WO2024031972A1 (en) Method, system and apparatus for identifying repeated data, and storage medium and product
US11770360B1 (en) Correlating protocol data units transiting networks with differing addressing schemes
CN117544708A (en) Method for identifying UDP fragment packet to establish connection tracking
CN116032853A (en) Flow control method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23851208

Country of ref document: EP

Kind code of ref document: A1