Disclosure of Invention
Embodiments of this specification provide a new technical solution for detecting network addresses in an application platform.
According to a first aspect of the present specification, there is provided a method for detecting a network address, comprising:
acquiring behavior data of a user performing webpage operations;
extracting a network address contained in the behavior data; and
carrying out availability detection on the network address.
Optionally, the method further includes:
classifying the network addresses according to the association degree among the network addresses;
determining the business object to which each classification belongs based on the network addresses contained in that classification; and
obtaining, according to the business object to which each classification belongs, the business object to which an abnormal network address detected by the availability detection belongs.
Optionally, the determining the business object to which the corresponding classification belongs includes:
looking up a preset initial address contained in each classification; and
taking the known business object to which the initial address belongs as the business object to which the corresponding classification belongs.
Optionally, the method further comprises extracting a jump relationship between network addresses existing in the behavior data, wherein the jump relationship represents a jump from the network address of one webpage to the network address of another webpage,
wherein the classifying the network addresses comprises:
classifying the network addresses according to the association degree between network addresses having the jump relationship.
Optionally, the classifying the network address includes:
setting each target network address as a seed network address, wherein the target network addresses include at least the initial address;
acquiring all lower-level network addresses of the seed network address according to the jump relationship;
obtaining, according to the association degree between the seed network address and each lower-level network address, the lower-level network addresses belonging to the same service object as the seed network address, to form service associated data;
taking the lower-level network addresses that belong to the same service object as the seed network address as target network addresses; and
classifying the network addresses according to the service associated data corresponding to each of the network addresses.
Optionally, the classifying the network address further includes:
determining a word frequency-inverse text frequency index value for each lower-level network address; and
determining the association degree between the corresponding lower-level network address and the seed network address according to the word frequency-inverse text frequency index value.
Optionally, the determining the word frequency-inverse text frequency index value for each lower level network address includes:
taking each lower-level network address as a current network address in turn;
determining the number of times the current network address appears in the seed network address as a first count;
determining the number of times all the lower-level network addresses appear in the seed network address as a second count;
determining the word frequency of the current network address according to the first count and the second count;
determining the number of times the current network address appears in all seed network addresses as a third count;
determining the total number of all seed network addresses;
determining the inverse text frequency index of the current network address according to the total number of all seed network addresses and the third count; and
obtaining the word frequency-inverse text frequency index value of the current network address according to the word frequency and the inverse text frequency index.
Optionally, the method further includes:
when an abnormal network address is detected, determining the abnormal business object to which the abnormal network address belongs; and
issuing an alarm to notify the abnormal business object.
Optionally, the method further includes:
extracting a jump relationship between network addresses present in the behavior data, wherein the jump relationship represents a jump from the network address of one webpage to the network address of another webpage; and
obtaining associated data among the network addresses according to the jump relationship;
wherein the carrying out availability detection on the network address comprises:
carrying out availability detection on the network address according to the associated data.
Optionally, the performing, according to the association data, availability detection on the network address includes:
determining the detection frequency of each network address according to the associated data; and
carrying out availability detection on the corresponding network address according to the detection frequency.
Optionally, the method further includes:
determining a jump probability of each network address based on the behavior data;
determining the detection frequency of the corresponding network address according to the jump probability; and
carrying out availability detection on the corresponding network address according to the detection frequency.
Optionally, the method further includes:
displaying the abnormal network address when an abnormal network address is detected.
According to a second aspect of the present specification, there is provided a network address detection apparatus, including:
a data acquisition module for acquiring behavior data of a user performing webpage operations;
an address extraction module for extracting the network address contained in the behavior data; and
an availability detection module for carrying out availability detection on the network address.
Optionally, the apparatus further comprises:
a classification module for classifying the network addresses according to the association degree among the network addresses;
an object determination module for determining the business object to which each classification belongs according to the network addresses contained in that classification; and
an abnormality determination module for obtaining, according to the business object to which each classification belongs, the business object to which an abnormal network address detected by the availability detection belongs.
Optionally, the object determination module is further configured to:
look up a preset initial address contained in each classification; and
take the known business object to which the initial address belongs as the business object to which the corresponding classification belongs.
Optionally, the apparatus further comprises:
a jump relationship extraction module for extracting a jump relationship between network addresses existing in the behavior data, wherein the jump relationship represents a jump from the network address of one webpage to the network address of another webpage,
wherein the classification module is further configured to:
classify the network addresses according to the association degree between network addresses having the jump relationship.
Optionally, the classification module further includes:
a seed address setting unit for setting each target network address as a seed network address, wherein the target network addresses include at least the initial address;
a lower address obtaining unit for obtaining all lower-level network addresses of the seed network address according to the jump relationship;
an associated data obtaining unit for obtaining, according to the association degree between the seed network address and each lower-level network address, the lower-level network addresses belonging to the same service object as the seed network address, to form service associated data;
a target address setting unit for taking the lower-level network addresses that belong to the same service object as the seed network address as target network addresses; and
a classifying unit for classifying the network addresses according to the service associated data corresponding to each network address.
Optionally, the associated data obtaining unit is further configured to:
determine a word frequency-inverse text frequency index value for each lower-level network address; and
determine the association degree between the corresponding lower-level network address and the seed network address according to the word frequency-inverse text frequency index value.
Optionally, the determining the word frequency-inverse text frequency index value for each lower level network address includes:
taking each lower-level network address as a current network address in turn;
determining the number of times the current network address appears in the seed network address as a first count;
determining the number of times all the lower-level network addresses appear in the seed network address as a second count;
determining the word frequency of the current network address according to the first count and the second count;
determining the number of times the current network address appears in all seed network addresses as a third count;
determining the total number of all seed network addresses;
determining the inverse text frequency index of the current network address according to the total number of all seed network addresses and the third count; and
obtaining the word frequency-inverse text frequency index value of the current network address according to the word frequency and the inverse text frequency index.
Optionally, the apparatus further comprises:
a module for determining, when an abnormal network address is detected, the abnormal business object to which the abnormal network address belongs; and
a module for issuing an alarm to notify the abnormal business object.
Optionally, the apparatus further comprises:
a module for extracting a jump relationship between network addresses present in the behavior data, wherein the jump relationship represents a jump from the network address of one webpage to the network address of another webpage; and
a module for obtaining associated data among the network addresses according to the jump relationship;
wherein the availability detection module is further configured to:
carry out availability detection on the network address according to the associated data.
Optionally, the performing, according to the association data, availability detection on the network address includes:
determining the detection frequency of each network address according to the associated data; and
carrying out availability detection on the corresponding network address according to the detection frequency.
Optionally, the apparatus further comprises:
a module for determining a jump probability of each network address based on the behavior data;
a module for determining the detection frequency of the corresponding network address according to the jump probability; and
a module for carrying out availability detection on the corresponding network address according to the detection frequency.
Optionally, the apparatus further comprises:
a module for displaying the abnormal network address when an abnormal network address is detected.
According to a third aspect of the present specification, there is provided an electronic device comprising the apparatus according to the second aspect of the present specification, or comprising a processor and a memory, the memory storing executable instructions for controlling the processor to perform the method according to the first aspect of the present specification.
Other features of the present description and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Detailed Description
Various exemplary embodiments of the present specification will now be described in detail with reference to the accompanying drawings.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
In all of the examples shown and discussed herein, any particular value is exemplary only and not limiting. Thus, the specific values of the exemplary embodiments may have different values in other examples.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be discussed further in subsequent figures.
< hardware configuration >
Figs. 1 and 2 are block diagrams of hardware configurations of an electronic device 1000 that can be used to implement the network address detection method of any embodiment of the present specification.
In one embodiment, as shown in FIG. 1, the electronic device 1000 may be a server 1100.
The server 1100 can be of various types, such as, but not limited to, a web server, news server, mail server, messaging server, advertising server, file server, application server, interaction server, database server, or proxy server. In one embodiment, each server can include hardware, software, or embedded logic components, or a combination of two or more such components, for performing the appropriate functions supported or implemented by the server.
In this embodiment, the server 1100 may include a processor 1110, a memory 1120, an interface device 1130, a communication device 1140, a display device 1150, and an input device 1160, as shown in fig. 1.
In this embodiment, the server 1100 may also include a speaker, a microphone, and the like, which are not limited herein.
The processor 1110 may be a dedicated server processor, or may be a desktop processor, a mobile processor, or the like that meets performance requirements, and is not limited herein. The memory 1120 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1130 includes various bus interfaces such as a serial bus interface (including a USB interface), a parallel bus interface, and the like. The communication device 1140 is capable of wired or wireless communication, for example. The display device 1150 is, for example, a liquid crystal display panel, an LED display panel, a touch display panel, or the like. The input device 1160 may include, for example, a touch screen, a keyboard, and the like.
In this embodiment, the memory 1120 of the server 1100 is configured to store instructions for controlling the processor 1110 to operate at least to perform the method for detecting a network address according to any embodiment of the present description. The skilled person can design the instructions according to the solution disclosed in the present specification. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
Although a number of devices are shown in fig. 1 for server 1100, this description may refer to only some of the devices, for example, server 1100 may refer to only memory 1120 and processor 1110.
In another embodiment, the electronic device 1000 may be a terminal device 1200 such as a PC or a notebook computer used by an operator, which is not limited herein.
In this embodiment, referring to fig. 2, the terminal apparatus 1200 may include a processor 1210, a memory 1220, an interface device 1230, a communication device 1240, a display device 1250, an input device 1260, a speaker 1270, a microphone 1280, and the like.
The processor 1210 may be a mobile version processor. The memory 1220 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1230 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1240 may be capable of wired or wireless communication, for example, the communication device 1240 may include a short-range communication device, such as any device that performs short-range wireless communication based on short-range wireless communication protocols, such as the Hilink protocol, WiFi (IEEE 802.11 protocol), Mesh, bluetooth, ZigBee, Thread, Z-Wave, NFC, UWB, LiFi, and the like, and the communication device 1240 may also include a long-range communication device, such as any device that performs WLAN, GPRS, 2G/3G/4G/5G long-range communication. The display device 1250 is, for example, a liquid crystal display, a touch display, or the like. The input device 1260 may include, for example, a touch screen, a keyboard, and the like. A user can input/output voice information through the speaker 1270 and the microphone 1280.
In this embodiment, the memory 1220 of the terminal device 1200 is configured to store instructions for controlling the processor 1210 to operate at least to perform a method of detecting a network address according to any embodiment of the present description. The skilled person can design the instructions according to the solution disclosed in the present specification. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
Although a plurality of devices of the terminal apparatus 1200 are shown in fig. 2, the present specification may refer to only some of the devices, for example, the terminal apparatus 1200 refers to only the memory 1220, the processor 1210 and the display device 1250.
< method examples >
Fig. 3 is a schematic flow chart of a method for detecting a network address according to an embodiment of the present disclosure.
In one example, the method shown in FIG. 3 can be implemented by a server or a terminal device alone, or by a server and a terminal device together. In one embodiment, the terminal device can be the terminal device 1200 shown in FIG. 2, and the server can be the server 1100 shown in FIG. 1.
As shown in fig. 3, the method of the present embodiment includes the following steps S302 to S306:
Step S302, acquiring behavior data of a user performing webpage operations.
The behavior data is data characterizing the behavior of a user performing a web page operation, from which at least the network address of the web page visited by the user, and the network address of the web page at , may be determined.
In one or more embodiments of the present specification, buried points (tracking points) may be set in advance in an application platform that hosts the H5 pages or applets of external merchants, so as to acquire behavior data of a plurality of users performing webpage operations in the application platform.
In one or more embodiments of the present specification, behavior data may be acquired at a set sampling frequency within a set period. Both the set period and the set sampling frequency may be chosen in advance according to the application scenario or specific requirements. For example, with a set period of 1 day and a set sampling frequency of once per minute, behavior data of users performing webpage operations is acquired once every minute within 1 day.
Step S304, extracting the network addresses contained in the behavior data.
The network address may be a Uniform Resource Locator (URL), which is a compact representation of the location and access method of a resource available from the internet, and is the address of a standard resource on the internet.
The network addresses contained in the behavior data are the network addresses visited by the user, and the number of network addresses extracted in this step may be one or more.
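As an illustration of step S304, the following is a minimal sketch that pulls URLs out of behavior records. The record format (free-text log lines), the regular expression, and the function name `extract_network_addresses` are assumptions made for illustration; the specification does not define how behavior data is stored.

```python
import re

# Assumption: each behavior record is a text line that may embed http(s) URLs.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_network_addresses(behavior_records):
    """Return the distinct URLs found in the records, in first-seen order."""
    seen = {}
    for record in behavior_records:
        for url in URL_PATTERN.findall(record):
            # Drop trailing punctuation that commonly follows a URL in text.
            seen.setdefault(url.rstrip(".,;"), None)
    return list(seen)


records = [
    "user=1 open https://shop.example.com/home",
    "user=1 jump https://shop.example.com/home -> https://shop.example.com/item?id=3",
    "user=2 idle",
]
addresses = extract_network_addresses(records)
```

Depending on the records, the result may hold a single address or several, which matches the "one or more" case described above.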
Step S306, performing availability detection on the network address.
Specifically, the availability of the network address may be detected by simulating access to the network address and checking whether the access succeeds.
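A simulated access of this kind can be sketched as follows. This is only one possible reading: the specification does not say which responses count as "available", so treating any HTTP status below 400 as available, the 5-second timeout, and the injectable `opener` parameter are all assumptions.

```python
import urllib.request


def check_availability(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a simulated visit to `url` succeeds.

    Assumption: "available" means the request completes with a status
    below 400. `opener` can be replaced to probe without real network I/O.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError and timeouts;
        # ValueError covers malformed URLs.
        return False
```

Making the opener injectable keeps the probing policy separate from the transport, so the same function can be exercised against a stub in tests.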
In this embodiment, the availability of network addresses of web pages that are not pre-configured in the application platform but are reachable from it can also be detected, which improves the reliability of network address detection.
In addition, the network addresses to be detected can be obtained without requiring merchants hosted on the application platform to set buried points for those network addresses in the internal code of their H5 pages or applets. This further simplifies the availability detection process, improves the discovery rate of unavailable network addresses accessed through the application platform, and improves the user experience.
In one or more embodiments of the present specification, if multiple network addresses are extracted in step S304, the method may further include steps S402 to S406 shown in fig. 4:
Step S402, classifying the network addresses according to the association degree among the network addresses.
In one or more embodiments of this specification, the association degree may be a parameter that characterizes how closely two network addresses are related, that is, how closely the URLs of the two network addresses are related.
In a first embodiment of the present specification, the association degree between every two network addresses may be determined, and two network addresses whose association degree is greater than or equal to a preset association degree threshold are classified into the same classification.
For example, the threshold of the degree of association may be, but is not limited to, 0.8, and then two network addresses with the degree of association greater than or equal to 0.8 may be classified into the same category.
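The pairwise thresholding of the first embodiment can be sketched as below. The specification does not say how classifications are merged when A is associated with B and B with C; this sketch assumes such chains fall into one classification (a union-find merge). The function name and the dictionary of precomputed association degrees are hypothetical.

```python
def classify_by_threshold(addresses, association, threshold=0.8):
    """Group addresses whose pairwise association degree reaches `threshold`.

    `association` maps (address_a, address_b) pairs to a degree in [0, 1].
    Assumption: groups are merged transitively.
    """
    parent = {a: a for a in addresses}

    def find(a):
        # Follow parent links to the group representative, compressing the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (a, b), degree in association.items():
        if degree >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for a in addresses:
        groups.setdefault(find(a), []).append(a)
    return sorted(groups.values())


urls = ["URL1", "URL2", "URL3", "URL4"]
degrees = {("URL1", "URL2"): 0.9, ("URL2", "URL3"): 0.85, ("URL3", "URL4"): 0.2}
classes = classify_by_threshold(urls, degrees)
```

With the 0.8 threshold from the example, URL1, URL2, and URL3 end up in one classification and URL4 in its own.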
In one or more embodiments of the present specification, the method may further include extracting a jump relationship between network addresses present in the behavior data, wherein the jump relationship represents a jump from the network address of one web page to the network address of another web page.
Then, classifying the network addresses may further include:
classifying the network addresses according to the association degree between network addresses having the jump relationship.
In the second embodiment of the present specification, classifying the network addresses according to the association degree between the network addresses having the jump relationship may include steps S502 to S510 shown in fig. 5:
Step S502, setting each target network address as a seed network address, wherein the target network addresses include at least an initial address.
Step S504, according to the jump relation, all the lower network addresses of the seed network address are obtained.
In one embodiment, a lower-level network address may be any network address that can be opened by jumping, directly or through intermediate pages, from the seed network address.
For example, the network address of the lower webpage corresponding to the seed network address URL1 in the jump relationship is URL1.1, and the network address of the lower webpage corresponding to the network address URL1.1 in the jump relationship is URL1.1.1, so that the network address URL1.1 and the network address URL1.1.1 are both lower-level network addresses of the seed network address URL1.
In another embodiment, a lower-level network address may be only the network address of a lower web page that directly corresponds to the seed network address in the jump relationship.
For example, as shown in fig. 6, the network addresses of the lower web pages corresponding to the seed network address URL1 in the jump relationship include URL1.1, URL1.2, and URL1.3, the network addresses of the lower web pages corresponding to the network address URL1.1 include URL1.1.1, URL1.1.2, and URL1.1.3, and the network addresses of the lower web pages corresponding to the network address URL1.2 include URL1.2.1 and URL1.2.2. In this case, the network addresses URL1.1, URL1.2, and URL1.3 are lower-level network addresses of the seed network address URL1, whereas the network addresses URL1.1.1, URL1.1.2, URL1.1.3, URL1.2.1, and URL1.2.2 are not lower-level network addresses of the seed network address URL1.
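The two readings of "lower-level network address" can be contrasted with a short sketch. The jump relationship is modeled here as an adjacency mapping from a network address to the addresses reachable from it in one jump; this representation and the function names are assumptions, and the data mirrors the fig. 6 example.

```python
from collections import deque

# Jump relationship from the fig. 6 example, as a hypothetical adjacency map.
JUMPS = {
    "URL1": ["URL1.1", "URL1.2", "URL1.3"],
    "URL1.1": ["URL1.1.1", "URL1.1.2", "URL1.1.3"],
    "URL1.2": ["URL1.2.1", "URL1.2.2"],
}


def direct_lower_addresses(jumps, seed):
    """Lower-level addresses as pages reached in exactly one jump."""
    return sorted(jumps.get(seed, ()))


def all_lower_addresses(jumps, seed):
    """Lower-level addresses as every page reachable through any number of jumps."""
    found, queue = set(), deque([seed])
    while queue:
        current = queue.popleft()
        for nxt in jumps.get(current, ()):
            if nxt != seed and nxt not in found:
                found.add(nxt)
                queue.append(nxt)
    return sorted(found)


direct = direct_lower_addresses(JUMPS, "URL1")
reachable = all_lower_addresses(JUMPS, "URL1")
```

Under the direct reading, URL1 has three lower-level addresses; under the reachable reading it has eight.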
Step S506, according to the association degree between the seed network address and each lower level network address, obtaining the lower level network address belonging to the same service object with the seed network address, and forming service associated data.
Specifically, the lower-level network address whose association with the seed network address is greater than or equal to a preset association threshold may be used as the lower-level network address belonging to the same service object as the seed network address.
On this basis, the method may further include a step of determining the association degree between the seed network address and each lower-level network address, specifically including the following steps S602 to S604:
Step S602, determining a word frequency-inverse text frequency index value for each lower-level network address.
In this embodiment, the term frequency-inverse text frequency index is the TF-IDF value. TF-IDF (term frequency-inverse text frequency index) is a commonly used weighting technique for information retrieval and data mining.
The importance of a subordinate network address increases in direct proportion to the number of times it appears in the corresponding seed network address, but at the same time decreases in inverse proportion to the frequency of its appearance in all seed network addresses.
If a lower-level network address appears with a high frequency (TF) in the corresponding seed network address and rarely appears in other seed network addresses, the lower-level network address is considered to have a high association degree with the corresponding seed network address and to belong to the same service object.
In one embodiment, the word frequency and the inverse text frequency index of each lower-level network address are calculated separately, and their product is then taken as the word frequency-inverse text frequency index value of the corresponding lower-level network address.
The manner of determining the word frequency-inverse text frequency index value of each lower-level network address may include:
taking each lower-level network address as the current network address in turn, determining the word frequency and the inverse text frequency index of the current network address, and determining the word frequency-inverse text frequency index value of the current network address from the two.
The word frequency and the inverse text frequency index of the current network address may be calculated as follows: determine the number of times the current network address appears in the seed network address as a first count; determine the number of times all lower-level network addresses appear in the seed network address as a second count; determine a first ratio of the first count to the second count as the word frequency of the current network address; determine the total number of all seed network addresses; determine the number of times the current network address appears in all seed network addresses as a third count; determine a second ratio of the total number of all seed network addresses to the third count; and calculate the base-10 logarithm of the second ratio as the inverse text frequency index of the current network address.
For example, if the total number of all the subordinate network addresses of the seed network address URL1 is N1 and the number of occurrences of the current network address URL1.1 is N2, the word frequency of the current network address URL1.1 in the seed network address URL1 is N2/N1.
If the number of occurrences of the current network address URL1.1 in all seed network addresses is M1 and the total number of seed network addresses is M2, then the reverse text frequency index for the current network address URL1.1 may be lg (M2/M1).
Then, the TF-IDF value of the current network address URL1.1 may be the product of the corresponding word frequency and the inverse text frequency index, i.e., N2/N1 × lg (M2/M1).
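The calculation N2/N1 × lg(M2/M1) can be sketched in a few lines. The data layout is an assumption: `lower_by_seed` maps each seed network address to the list of lower-level addresses observed under it, with repeats. The third count M1 is interpreted here as the number of seed network addresses under which the current address appears, which is the usual document-frequency reading; the source wording allows other readings.

```python
import math
from collections import Counter


def tf_idf(current, seed, lower_by_seed):
    """Return TF-IDF of `current` with respect to `seed`.

    TF  = N2 / N1, occurrences of `current` under `seed` over all
          lower-level occurrences under `seed`.
    IDF = lg(M2 / M1), base-10 log of total seeds over the seeds
          under which `current` appears (assumed reading of M1).
    """
    counts = Counter(lower_by_seed[seed])
    first_count = counts[current]          # N2
    second_count = sum(counts.values())    # N1
    if first_count == 0:
        return 0.0
    third_count = sum(1 for lowers in lower_by_seed.values() if current in lowers)  # M1
    total_seeds = len(lower_by_seed)       # M2
    return (first_count / second_count) * math.log10(total_seeds / third_count)


# Hypothetical observations: URL1.1 is specific to URL1, URL1.2 is shared with URL2.
observed = {
    "URL1": ["URL1.1", "URL1.1", "URL1.1", "URL1.2"],
    "URL2": ["URL2.1", "URL1.2"],
    "URL3": ["URL3.1"],
}
specific = tf_idf("URL1.1", "URL1", observed)
shared = tf_idf("URL1.2", "URL1", observed)
```

In this sample, the address that is frequent under URL1 and absent elsewhere scores higher than the one shared with another seed, which is the behavior the description relies on.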
In another embodiment, the TF-IDF value of each lower-level network address can be determined by a TF-IDF model trained in advance.
Step S604, according to the word frequency-inverse text frequency index value of each lower-level network address, determining the association degree between the corresponding lower-level network address and the seed network address.
In one embodiment, the TF-IDF value of each lower-level network address can be used as the association degree between the corresponding lower-level network address and the seed network address.
For example, with respect to the lower network address URL1.1 of the seed network address URL1, the TF-IDF value of the lower network address URL1.1 is calculated in the above step S602, and the TF-IDF value can be used as the degree of association between the lower network address URL1.1 and the seed network address URL1.
Step S508, the lower level network address belonging to the same service object as the seed network address is used as the target network address.
In this embodiment, the lower-level network addresses belonging to the same service object as the seed network address are obtained iteratively: in each iteration, the lower-level network addresses found to belong to the same service object as the seed network address are used as the target network addresses, that is, as the seed network addresses of the next iteration.
Further, the iteration ends when the number of iterations reaches a preset number, or when an iteration finds no lower-level network address belonging to the same service object as the seed network address.
Step S510, classifying the network addresses according to the service association data corresponding to each network address.
Specifically, the network addresses belonging to the same business object may be classified into the same classification.
In the example shown in fig. 6, in the first iteration, the initial address URL1 may be used as the seed network address, and the lower-level network addresses obtained as belonging to the same service object as the seed network address URL1 include URL1.1 and URL1.2. In the second iteration, the lower-level network addresses URL1.1 and URL1.2 are each used as a seed network address; the lower-level network addresses obtained as belonging to the same service object as the seed network address URL1.1 include URL1.1.1 and URL1.1.3, and the lower-level network address obtained as belonging to the same service object as the seed network address URL1.2 is URL1.2.1. In the third iteration, the lower-level network addresses URL1.1.1, URL1.1.3, and URL1.2.1 are each used as a seed network address; no lower-level network address belonging to the same service object as the seed network address URL1.1.1 is obtained, the lower-level network address obtained as belonging to the same service object as the seed network address URL1.1.3 is URL1.1.3.1, and no lower-level network address belonging to the same service object as the seed network address URL1.2.1 is obtained. In the fourth iteration, the lower-level network address URL1.1.3.1 is used as the seed network address, and no lower-level network address belonging to the same service object as the seed network address URL1.1.3.1 is obtained.
The lower network addresses belonging to the same service object as the initial address URL1 include network addresses URL1.1 and URL1.2, the lower network addresses belonging to the same service object as the network address URL1.1 include network addresses URL1.1.1 and URL1.1.3, the lower network addresses belonging to the same service object as the network address URL1.2 include a network address URL1.2.1, and the lower network addresses belonging to the same service object as the network address URL1.1.3 are URL1.1.3.1, so that the initial address URL1, the network address URL1.1, the network address URL1.2, the network address URL1.1.1, the network address URL1.1.3, the network address URL1.2.1, and the network address URL1.1.3.1 belong to the same classification.
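The iteration described above can be sketched as a level-by-level expansion from the initial address. This is only an illustrative sketch: the function name `classify_from_initial`, the `same_object` predicate (standing in for the association-degree test), and the extra address URL1.1.2 are assumptions made for the example and do not appear in the specification.

```python
from typing import Callable, Dict, List, Set


def classify_from_initial(
    initial: str,
    jumps: Dict[str, List[str]],
    same_object: Callable[[str, str], bool],
    max_iterations: int = 10,
) -> Set[str]:
    """Collect the addresses that fall into the same classification as `initial`."""
    classification = {initial}
    seeds = [initial]
    for _ in range(max_iterations):  # stop at the preset number of iterations
        next_seeds = []
        for seed in seeds:
            for lower in jumps.get(seed, []):
                # keep only lower-level addresses judged to share the seed's service object
                if lower not in classification and same_object(seed, lower):
                    classification.add(lower)
                    next_seeds.append(lower)
        if not next_seeds:  # no lower-level address of the same service object was found
            break
        seeds = next_seeds  # these become the target (seed) addresses of the next iteration
    return classification


# Jump relation loosely modelled on fig. 6; URL1.1.2 is a hypothetical address
# assumed to belong to a different service object.
jumps = {
    "URL1": ["URL1.1", "URL1.2"],
    "URL1.1": ["URL1.1.1", "URL1.1.2", "URL1.1.3"],
    "URL1.2": ["URL1.2.1"],
    "URL1.1.3": ["URL1.1.3.1"],
}
other_object = {"URL1.1.2"}
result = classify_from_initial("URL1", jumps, lambda seed, lower: lower not in other_object)
print(sorted(result))
```

Both termination conditions mentioned above are represented: the loop bound plays the role of the preset number of iterations, and the early `break` covers the case where an iteration yields no new address.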
In the third embodiment of the present specification, an initial address corresponding to each business object may be configured in advance, so that, after the network addresses are classified, each classification contains an initial address.
Specifically, each target network address is set as a seed network address, wherein the target network addresses include at least the initial address. Other network addresses belonging to the same business object as the seed network address are obtained according to the association degree between the seed network address and the other network addresses, so as to form business association data, wherein the other network addresses are network addresses other than the seed network address. The other network addresses belonging to the same business object as the seed network address are then set as target network addresses, and the network addresses are classified according to the business association data corresponding to each network address.
The implementation manner of each step in this embodiment may specifically refer to the second embodiment described above, and is not described herein again.
Step S404, according to the network address contained in each classification, determining the business object to which the corresponding classification belongs.
In one or more embodiments of this specification, an initial address corresponding to each business object may be preconfigured. In that case, determining the business object to which the corresponding classification belongs according to the network address contained in each classification may include:
looking up the preset initial address contained in each classification, and taking the known business object to which the initial address belongs as the business object to which the corresponding classification belongs.
For example, if the initial address of service object A configured on the application platform is URL1, and the classification containing URL1 also contains network addresses URL1.1 to URL1.6, then the service object to which network addresses URL1.1 to URL1.6 belong is service object A.
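As a rough illustration of this lookup, the sketch below labels every address in a classification with the business object of the initial address it contains. The names `object_of_classification`, `label_addresses` and `initial_to_object` are hypothetical and chosen only for this example.

```python
from typing import Dict, Iterable, Optional, Set


def object_of_classification(
    classification: Set[str], initial_to_object: Dict[str, str]
) -> Optional[str]:
    """Return the business object of the preset initial address found in the classification."""
    for address in sorted(classification):  # sorted only to make the lookup deterministic
        if address in initial_to_object:
            return initial_to_object[address]
    return None  # the classification contains no preset initial address


def label_addresses(
    classifications: Iterable[Set[str]], initial_to_object: Dict[str, str]
) -> Dict[str, str]:
    """Map every classified address to the business object of its classification."""
    owner: Dict[str, str] = {}
    for classification in classifications:
        business_object = object_of_classification(classification, initial_to_object)
        if business_object is not None:
            for address in classification:
                owner[address] = business_object
    return owner


# Preset configuration assumed for the example: URL1 is the initial address of object A.
initial_to_object = {"URL1": "A"}
classification = {"URL1"} | {f"URL1.{i}" for i in range(1, 7)}
owner = label_addresses([classification], initial_to_object)
print(owner["URL1.4"])
```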
Step S406, obtaining, according to the business object to which each classification belongs, the business object to which the abnormal network address detected by the availability detection belongs.
Specifically, in the process of detecting the availability of the network address, if an abnormal network address is detected, the abnormal service object to which the abnormal network address belongs may be obtained according to the service object to which the classification including the abnormal network address belongs, and the abnormal service object may be prompted to maintain the abnormal network address.
In such embodiments, the detection method can prompt the abnormal service object to which the abnormal network address belongs to maintain the abnormal network address, so that the abnormal network address can be repaired in time.
In one or more embodiments of this specification, the method may further comprise:
in the case that an abnormal network address is detected, displaying the abnormal network address and the abnormal business object to which the abnormal network address belongs.
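A minimal sketch of this reporting step follows. The availability probe is passed in as a function so that the example does not depend on any real network call; `find_abnormal`, `is_available` and the sample data are assumptions for illustration only.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def find_abnormal(
    addresses: Iterable[str],
    is_available: Callable[[str], bool],
    owner: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return (abnormal address, business object) pairs for addresses that fail the probe."""
    report = []
    for address in addresses:
        if not is_available(address):
            # an address without a known classification is reported as "unknown"
            report.append((address, owner.get(address, "unknown")))
    return report


# Hypothetical data: URL1.1 is assumed to be unreachable.
owner = {"URL1": "A", "URL1.1": "A", "URL2": "B"}
down = {"URL1.1"}
report = find_abnormal(["URL1", "URL1.1", "URL2"], lambda a: a not in down, owner)
for address, business_object in report:
    print(f"abnormal address {address}: please notify business object {business_object}")
```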
In one or more embodiments of the present specification, the method may further include extracting a jump relationship between network addresses existing in the behavior data, and obtaining association data between the network addresses according to the jump relationship.
On this basis, the availability detection of the network address may further include:
performing availability detection on the network address according to the association data.
In this embodiment, the association data may be the service association data corresponding to each network address in the previous embodiment. The association data may be data in a tree structure as shown in fig. 6.
In one or more embodiments of the present description, performing availability detection on the network address according to the association data may include:
determining the detection frequency of each network address according to the association data; and performing availability detection on the corresponding network address according to the detection frequency.
In the tree structure of the association data, a network address at an upper layer may have a higher level than a network address at a lower layer, and, correspondingly, the detection frequency of the upper-layer network address may be higher than that of the lower-layer network address. In this way, the availability detection frequency of upper-level network addresses is increased, which helps ensure that the jump to the corresponding lower-level network addresses remains possible.
Specifically, the detection frequency corresponding to each level may be preset, and the detection frequency corresponding to each network address may be obtained according to the level of that network address in the association data.
For example, in the tree structure shown in fig. 6, the network address URL1 has the highest level, followed by the network addresses URL1.1 and URL1.2, then by the network addresses URL1.1.1, URL1.1.3 and URL1.2.1, and finally by the network address URL1.1.3.1. It can then be determined that the detection frequency of the network address URL1 is a first frequency, the detection frequencies of the network addresses URL1.1 and URL1.2 are a second frequency, the detection frequencies of the network addresses URL1.1.1, URL1.1.3 and URL1.2.1 are a third frequency, and the detection frequency of the network address URL1.1.3.1 is a fourth frequency, wherein the first frequency ≧ the second frequency ≧ the third frequency ≧ the fourth frequency.
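The level-based assignment can be sketched as follows, assuming the association data is available as a parent-to-children mapping. The preset values in `CHECKS_PER_HOUR_BY_LEVEL` are invented for illustration; the specification only requires that higher levels receive a frequency at least as high as lower ones.

```python
from collections import deque
from typing import Dict, List


def levels_from_tree(root: str, children: Dict[str, List[str]]) -> Dict[str, int]:
    """Breadth-first walk that assigns level 1 to the root, 2 to its children, and so on."""
    level = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in level:
                level[child] = level[node] + 1
                queue.append(child)
    return level


# Hypothetical preset: number of availability checks per hour for each level.
CHECKS_PER_HOUR_BY_LEVEL = {1: 60, 2: 30, 3: 12, 4: 6}


def detection_frequency(
    level_of: Dict[str, int], preset: Dict[int, int], default: int = 1
) -> Dict[str, int]:
    """Look up the preset frequency for each address; deeper, unlisted levels get `default`."""
    return {address: preset.get(lvl, default) for address, lvl in level_of.items()}


# Tree structure modelled on the fig. 6 example.
tree = {
    "URL1": ["URL1.1", "URL1.2"],
    "URL1.1": ["URL1.1.1", "URL1.1.3"],
    "URL1.2": ["URL1.2.1"],
    "URL1.1.3": ["URL1.1.3.1"],
}
frequency = detection_frequency(levels_from_tree("URL1", tree), CHECKS_PER_HOUR_BY_LEVEL)
print(frequency)
```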
In one or more embodiments of the present description, the method may further comprise:
determining the jump probability of each network address according to the behavior data, determining the detection frequency of the corresponding network address according to the jump probability, and performing availability detection on the corresponding network address according to the detection frequency.
Specifically, the number of accesses of each network address may be obtained from the behavior data, and the jump probability of each network address may be obtained from its number of accesses. The jump probability of a network address may be the ratio between the number of accesses of that network address and the total number of accesses of all network addresses. The jump probability reflects the probability that a user accesses the corresponding network address; therefore, the detection frequency of the corresponding network address can be set according to the jump probability.
For example, if the numbers of accesses to the m network addresses URL1 to URLm are N1 for the network address URL1, N2 for the network address URL2, ..., and Nm for the network address URLm, the jump probability $P_i$ of the i-th network address URLi may be:

$$P_i = \frac{N_i}{\sum_{j=1}^{m} N_j}$$
In this embodiment, a lookup table reflecting the correspondence between jump probability ranges and detection frequencies may be preset, such that a higher jump probability corresponds to a higher detection frequency. By consulting the lookup table, the detection frequency of each network address can be determined, and availability detection is performed on each network address according to its detection frequency.
In such embodiments, the detection method can increase the frequency of availability detection for frequently used network addresses, so as to ensure that web pages with a large number of user visits are available.
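The ratio and the table lookup described above can be sketched as below. The thresholds and frequencies in `FREQUENCY_TABLE` are hypothetical values chosen for the example, as is the representation of the behavior data as a flat list of visited addresses.

```python
from collections import Counter
from typing import Dict, List, Tuple


def jump_probabilities(visited: List[str]) -> Dict[str, float]:
    """Jump probability of each address: its access count divided by all access counts."""
    counts = Counter(visited)
    total = sum(counts.values())
    return {address: n / total for address, n in counts.items()}


# Hypothetical lookup table: (lower bound of the probability range, checks per hour),
# ordered from the highest range to the lowest.
FREQUENCY_TABLE: List[Tuple[float, int]] = [(0.5, 60), (0.2, 30), (0.0, 6)]


def frequency_for(probability: float, table: List[Tuple[float, int]] = FREQUENCY_TABLE) -> int:
    """Return the detection frequency of the first range that contains the probability."""
    for lower_bound, checks_per_hour in table:
        if probability >= lower_bound:
            return checks_per_hour
    return table[-1][1]


# Assumed access log: URL1 visited 6 times, URL2 3 times, URL3 once.
visited = ["URL1"] * 6 + ["URL2"] * 3 + ["URL3"]
probabilities = jump_probabilities(visited)
frequencies = {address: frequency_for(p) for address, p in probabilities.items()}
print(frequencies)
```

Because the table is ordered from the highest range downward, a higher jump probability can never map to a lower detection frequency than a smaller one.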
< example 1>
The following describes, by using specific examples, a procedure implemented by the network address detection method in the embodiment of the present specification.
Step S702, behavior data of the user executing the web page operation is acquired.
Step S704, extracting the network addresses existing in the behavior data and the jump relationship between the network addresses.
Step S706, setting each target network address as a seed network address, wherein the target network addresses include at least a preset initial address.
Step S708, according to the jump relation, all the lower level network addresses of the seed network address are obtained.
Step S710, determining the word frequency-inverse text frequency index (TF-IDF) value of each lower-level network address.
Step S712, determining the association degree between the corresponding lower-level network address and the seed network address according to the word frequency-inverse text frequency index value of each lower-level network address.
Step S714, according to the association degree between the seed network address and each lower level network address, obtaining the lower level network address belonging to the same service object as the seed network address, and forming service associated data.
Step S716, the lower level network address belonging to the same service object as the seed network address is used as the target network address.
Step S718, classifying the network addresses according to the service association data corresponding to each network address.
Step S720, using the service object to which the known initial address belongs as the service object to which the corresponding classification belongs.
Step S722, in the case that an abnormal network address is detected, determining the abnormal service object to which the abnormal network address belongs.
Step S724, alarming and prompting the abnormal business object, and displaying the abnormal network address and the abnormal business object.
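Steps S702 and S704 of the example above could be sketched as follows. The specification does not state how the behavior data is recorded, so the record layout used here (dictionaries with `referrer` and `url` keys) and the function name `extract_jumps` are purely assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def extract_jumps(
    behavior_records: Iterable[Dict[str, Optional[str]]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Extract the network addresses and the jump relation (source -> targets) from records."""
    addresses: Set[str] = set()
    jumps: Dict[str, Set[str]] = defaultdict(set)
    for record in behavior_records:
        target = record.get("url")
        source = record.get("referrer")
        if not target:
            continue  # a record without a visited address carries no information here
        addresses.add(target)
        if source:
            addresses.add(source)
            jumps[source].add(target)  # the user jumped from `source` to `target`
    return addresses, dict(jumps)


# Hypothetical behavior data of a user browsing from URL1 downward.
records = [
    {"referrer": None, "url": "URL1"},
    {"referrer": "URL1", "url": "URL1.1"},
    {"referrer": "URL1", "url": "URL1.2"},
    {"referrer": "URL1.1", "url": "URL1.1.1"},
    {"referrer": "URL1", "url": "URL1.1"},
]
addresses, jumps = extract_jumps(records)
print(sorted(addresses))
print(jumps)
```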
< apparatus >
In this embodiment, a network address detection apparatus 8000 is provided. As shown in fig. 8, the network address detection apparatus 8000 includes a data obtaining module 8100, an address extracting module 8200 and an availability detecting module 8300. The data obtaining module 8100 is configured to obtain behavior data of a user performing a web page operation; the address extracting module 8200 is configured to extract a network address included in the behavior data; and the availability detecting module 8300 is configured to perform availability detection on the network address.
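One possible way to picture the three modules in software is a small container that chains them together. This is only a sketch under the assumption that each module can be modelled as a plain function; the class name `DetectionApparatus` and its field names are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DetectionApparatus:
    obtain_data: Callable[[], List[dict]]                 # data obtaining module 8100
    extract_addresses: Callable[[List[dict]], List[str]]  # address extracting module 8200
    is_available: Callable[[str], bool]                   # availability detecting module 8300

    def run(self) -> Dict[str, bool]:
        """Obtain behavior data, extract the addresses, and probe each one."""
        records = self.obtain_data()
        addresses = self.extract_addresses(records)
        return {address: self.is_available(address) for address in addresses}


# Stub modules for illustration; URL2 is assumed to be unavailable.
apparatus = DetectionApparatus(
    obtain_data=lambda: [{"url": "URL1"}, {"url": "URL2"}],
    extract_addresses=lambda records: [record["url"] for record in records],
    is_available=lambda address: address != "URL2",
)
status = apparatus.run()
print(status)
```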
In one or more embodiments of the present disclosure, the apparatus 8000 may further include a classification module 8400, an object determination module 8500, and an anomaly determination module 8600, as shown in fig. 9. The classification module 8400 is configured to classify the network addresses according to the association degree between the network addresses; the object determination module 8500 is configured to determine the service object to which the corresponding classification belongs according to the network addresses included in each classification; and the anomaly determination module 8600 is configured to obtain, according to the service object to which each classification belongs, the service object to which the abnormal network address detected by the availability detection belongs.
In one or more embodiments of the present description, the object determination module 8500 may be further configured to:
look up a preset initial address contained in each classification, and
take the known service object to which the initial address belongs as the service object to which the corresponding classification belongs.
In one or more embodiments of the present description, the apparatus 8000 may further include a jump relation extraction module 8700, as shown in fig. 10, configured to extract the jump relation between network addresses existing in the behavior data, wherein the jump relation indicates a jump from the network address of one web page to the network address of another web page. The classification module 8400 may be further configured to:
classify the network addresses according to the association degree between the network addresses having the jump relation.
In one or more embodiments of the present specification, the classification module 8400 may further include a seed address setting unit 8410, a lower address obtaining unit 8420, an association data obtaining unit 8430, a target address setting unit 8440, and a classification unit 8450, as shown in fig. 10. The seed address setting unit 8410 is configured to set each target network address as a seed network address, wherein the target network addresses include at least an initial address. The lower address obtaining unit 8420 is configured to obtain all lower-level network addresses of the seed network address according to the jump relation. The association data obtaining unit 8430 is configured to obtain, according to the association degree between the seed network address and each lower-level network address, the lower-level network addresses belonging to the same service object as the seed network address, so as to form service association data. The target address setting unit 8440 is configured to set the lower-level network addresses belonging to the same service object as the seed network address as target network addresses. The classification unit 8450 is configured to classify the network addresses according to the service association data corresponding to each network address.
In one or more embodiments of the present description, the association data obtaining unit 8430 may be further configured to:
determine the word frequency-inverse text frequency index value of each lower-level network address; and
determine the association degree between the corresponding lower-level network address and the seed network address according to the word frequency-inverse text frequency index value.
In one or more embodiments of the present specification, determining the word frequency-inverse text frequency index value of each lower-level network address comprises:
taking each lower-level network address in turn as the current network address;
determining the number of times that the current network address appears in the seed network address as a first number of times;
determining the number of times that all the lower-level network addresses appear in the seed network address as a second number of times;
determining the word frequency of the current network address according to the first number of times and the second number of times;
determining the number of times that the current network address appears in all the seed network addresses as a third number of times;
determining the total number of all seed network addresses;
determining the inverse text frequency index of the current network address according to the total number of all seed network addresses and the third number of times; and
obtaining the word frequency-inverse text frequency index value of the current network address according to the word frequency and the inverse text frequency index.
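The steps listed above can be illustrated with the conventional TF-IDF form. The specification does not spell out the exact formulas, so this sketch assumes that the word frequency is the first number of times divided by the second, that the third number of times counts the seed network addresses under which the current address appears, and that the inverse text frequency index is the natural logarithm of the total number of seeds divided by the third number of times. The function name `tf_idf` and the sample counts are hypothetical.

```python
import math
from typing import Dict


def tf_idf(seed: str, jump_counts: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """TF-IDF value of each lower-level address of `seed`.

    `jump_counts[s][a]` is the number of observed jumps from seed `s` to address `a`.
    """
    lower_counts = jump_counts[seed]
    second = sum(lower_counts.values())  # occurrences of all lower-level addresses
    total_seeds = len(jump_counts)       # total number of seed network addresses
    scores: Dict[str, float] = {}
    for current, first in lower_counts.items():
        tf = first / second              # word frequency
        # assumed meaning of the third number: seeds under which `current` appears
        third = sum(1 for lowers in jump_counts.values() if current in lowers)
        idf = math.log(total_seeds / third)  # assumed (conventional) inverse frequency
        scores[current] = tf * idf
    return scores


# Hypothetical counts: HOME is reachable from every seed, URL1.1 only from URL1.
jump_counts = {
    "URL1": {"URL1.1": 8, "HOME": 2},
    "URL2": {"URL2.1": 5, "HOME": 5},
}
scores = tf_idf("URL1", jump_counts)
print(scores)
```

Under these assumptions, an address that is linked from every seed (such as a shared home page) scores zero, while an address reached almost exclusively from one seed scores high, which matches the intent of using the value as an association degree.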
In one or more embodiments of the present description, the apparatus 8000 may further include:
a module for determining the abnormal service object to which the abnormal network address belongs in the case that an abnormal network address is detected; and
a module for alarming and prompting the abnormal service object.
In one or more embodiments of the present description, the apparatus 8000 further includes:
a module for extracting the jump relation between network addresses existing in the behavior data, wherein the jump relation indicates a jump from the network address of one web page to the network address of another web page; and
a module for obtaining the association data between the network addresses according to the jump relation;
wherein the availability detection module 8300 is further configured to:
perform availability detection on the network address according to the association data.
In one or more embodiments of the present description, performing availability detection on the network address according to the association data includes:
determining the detection frequency of each network address according to the association data; and
performing availability detection on the corresponding network address according to the detection frequency.
In one or more embodiments of the present description, the apparatus 8000 may further include:
a module for determining the jump probability of each network address according to the behavior data;
a module for determining the detection frequency of the corresponding network address according to the jump probability; and
a module for performing availability detection on the corresponding network address according to the detection frequency.
In one or more embodiments of the present description, the apparatus 8000 may further include:
a module for displaying the abnormal network address in the case that an abnormal network address is detected.
It should be understood by those skilled in the art that the network address detection apparatus 8000 can be implemented in various ways. For example, the apparatus 8000 can be implemented by configuring a processor with instructions; for example, the instructions can be stored in a ROM and read from the ROM into a programmable device when the device is started, so as to implement the apparatus 8000. For example, the apparatus 8000 can be built into a dedicated device (e.g., a processor). The apparatus 8000 can be divided into mutually independent units, or the units can be combined together. The apparatus 8000 can be implemented by one of the various implementations described above, or by a combination of two or more of them.
In this embodiment, the network address detection device 8000 may have various implementations, for example, the network address detection device 8000 may be any functional module running in a software product or application providing a network address detection function, or a peripheral insert, a plug-in, a patch, etc. of the software product or application, or the software product or application itself.
< electronic apparatus >
In this embodiment, an electronic device 9000 is also provided. The electronic device 9000 may be the server 1100 shown in fig. 1, or the terminal device 1200 shown in fig. 2.
In one aspect, the electronic device 9000 may comprise the aforementioned network address detection apparatus 8000 for carrying out the method of any of the embodiments of the present description.
In another aspect, as shown in fig. 11, the electronic device 9000 may further comprise a processor 9100 and a memory 9200, wherein the memory 9200 is configured to store executable instructions, and the processor 9100 is configured to operate the electronic device 9000, under the control of the instructions, to perform the network address detection method according to any embodiment of the present specification.
The embodiments in the present specification are described in a progressive manner; the same or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiment and the electronic device embodiment are substantially similar to the method embodiment, their description is relatively brief, and the relevant points can be found in the corresponding parts of the description of the method embodiment.
The present description may be a method and/or a computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the specification.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
Computer program instructions for carrying out operations of the present description may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ or the like, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
Aspects of the present description are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the description. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of steps to be performed on the computer, other programmable apparatus or other devices to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present specification. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions.
The foregoing description of the embodiments of the present specification has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present description is defined by the appended claims.
In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results.