CN108197142B - Method, device, storage medium and equipment for determining relevance of network transaction - Google Patents


Info

Publication number
CN108197142B
Authority
CN
China
Prior art keywords
transaction
name
hash value
list
word vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711195221.9A
Other languages
Chinese (zh)
Other versions
CN108197142A (en
Inventor
武超
石子凡
许力
纪勇
黄治纲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201711195221.9A
Publication of CN108197142A
Application granted
Publication of CN108197142B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1865 Transactional file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to a method, an apparatus, a storage medium and a device for determining network transaction relevance. The method comprises: obtaining a first transaction list of a target log; determining, through a preset document word vector model, a word vector corresponding to the name of each transaction in the first transaction list; when a first transaction to be viewed is determined, obtaining a first relation tree according to the word vectors and a preset relation tree creation rule; and determining the association degree between the first transaction and other transactions according to the first relation tree. Operation and maintenance personnel can thus conveniently analyze the relevance between any transaction and other transactions, so that system settings can be better optimized according to that relevance when the system is configured, improving system performance.

Description

Method, device, storage medium and equipment for determining relevance of network transaction
Technical Field
The present disclosure relates to the field of network technologies, and in particular, to a method, an apparatus, a storage medium, and a device for determining a network transaction relevance.
Background
With the development of network technology, enterprise network systems are growing ever larger and process more and more transactions, so the relationships among this large number of transactions can no longer be recorded and analyzed by relying only on developers' documentation. In the prior art, for business transactions in a complex enterprise network system, system developers cannot clearly and accurately determine the association degree between transactions, and the associations between business transactions envisaged during early development may differ from those observed after the system goes online. If the association between transactions in the actually running system were known, the cache optimization strategy could be adjusted and system performance improved.
Therefore, how to obtain the relationship between network transactions in the log by analyzing a large number of logs is an urgent problem to be solved.
Disclosure of Invention
The invention aims to provide a method, a device, a storage medium and equipment for determining the relevance of network transactions, which are used for overcoming the problem that the relationship among the network transactions cannot be obtained by analyzing a large number of logs in a manual mode.
In order to achieve the above object, the present disclosure provides a method for determining network transaction relevance, the method including:
acquiring a first transaction list of a target log, wherein the first transaction list comprises a name hash value of each transaction in the target log and a timestamp corresponding to each transaction;
determining a word vector corresponding to the name of each transaction in the first transaction list through a preset document word vector model;
when a first transaction needing to be viewed is determined, acquiring a first relation tree according to the word vector and a preset relation tree creating rule, wherein the first relation tree comprises the first transaction and other transactions related to the first transaction;
and determining the association degree of the first transaction and the other transactions according to the first relation tree.
Optionally, the obtaining the first transaction list of the target log includes:
extracting the name of each transaction and the timestamp corresponding to each transaction from the target log;
sorting all the transactions in the first transaction list according to the timestamp corresponding to each transaction to obtain a first ordering;
replacing the name of each transaction with the name hash value of each transaction by using a preset hash algorithm;
and generating the first transaction list comprising the name hash value of each transaction and the timestamp corresponding to each transaction according to the first ordering.
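The four sub-steps above (extract, sort, hash, generate) can be sketched roughly as follows. This is only an illustration under stated assumptions: the log line format (`<ISO timestamp> <transaction name>`), the function name `build_transaction_list`, and the use of CRC32 as a stand-in for the preset hash algorithm are all hypothetical and are not taken from the disclosure.

```python
import re
import zlib
from datetime import datetime

# Hypothetical log format: "<ISO timestamp> <transaction name>" on each line.
LINE_RE = re.compile(r"^(\S+)\s+(.+)$")


def stand_in_hash(name: str) -> str:
    """Stand-in for the preset hash algorithm (CRC32 is an assumption)."""
    return format(zlib.crc32(name.encode("utf-8")), "08x")


def build_transaction_list(log_lines, hash_fn=stand_in_hash):
    """Return [(name_hash, timestamp), ...] sorted by timestamp."""
    records = []
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or malformed lines
        # Extract the timestamp and the name of the transaction.
        records.append((datetime.fromisoformat(match.group(1)), match.group(2)))
    # Sort by timestamp to obtain the "first ordering".
    records.sort(key=lambda record: record[0])
    # Replace each name with its name hash value and emit the list.
    return [(hash_fn(name), ts) for ts, name in records]
```

Passing the hash function as a parameter keeps the sketch independent of whichever hash algorithm is actually preset in the system.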
Optionally, replacing the name of each transaction with the name hash value of each transaction by using a preset hash algorithm includes:
acquiring an integer hash value of the name of each transaction by using a preset hash value calculation formula according to the name of each transaction;
converting the integer hash value of the name of each transaction into a hexadecimal hash value;
performing an add-A hexadecimal calculation on each digit of the hexadecimal hash value of each transaction to obtain the name hash value of each transaction;
replacing the name of each transaction with the name hash value of each transaction;
wherein the hash value calculation formula includes:
Figure BDA0001481826560000031
wherein HV(i) represents the integer hash value of the name of the i-th transaction in the first ordering; n represents the total number of characters in the name of the i-th transaction; t denotes the t-th character in the name of the i-th transaction; and s[t] represents the Unicode code point of the t-th character.
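Because the formula itself is only available as an image, the following sketch has to guess at its form. It assumes a polynomial hash over the Unicode code points s[t] with a hypothetical multiplier of 31, and it interprets the "add-A" step as shifting each hexadecimal digit d (0 to 15) to the letter `chr(ord('A') + d)`. Both choices are assumptions made for illustration, not the formula of the disclosure.

```python
def integer_hash(name: str) -> int:
    """Assumed polynomial hash over Unicode code points (multiplier 31 is hypothetical)."""
    n = len(name)
    return sum(ord(ch) * 31 ** (n - 1 - t) for t, ch in enumerate(name))


def name_hash(name: str) -> str:
    """Integer hash -> hexadecimal string -> letters via the assumed add-A step."""
    hex_digits = format(integer_hash(name), "x")
    # Assumed reading of "add A": digit 0 becomes 'A', 1 becomes 'B', ..., 15 becomes 'P'.
    return "".join(chr(ord("A") + int(digit, 16)) for digit in hex_digits)


print(name_hash("ab"))  # 97 * 31 + 98 = 3105 = 0xc21 -> "MCB"
```

Under this reading the resulting name hash contains only letters, so each transaction becomes a word-like token that can be fed to a word vector model.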
Optionally, the preset document word vector model is determined by training using a continuous bag-of-words model and a history log.
Optionally, the relationship tree creation rule includes a number of child nodes of the relationship tree, a depth and a transaction non-repetition principle; when the first transaction needing to be viewed is determined, the obtaining of the first relation tree according to the word vector and a preset relation tree creating rule comprises the following steps:
determining the first transaction as a root node of the first relationship tree;
determining the number and depth of child nodes of the first relation tree according to the relation tree creating rule;
and establishing the first relation tree by utilizing depth-first search according to the root node, the number of the child nodes, the depth and the word vector under the transaction non-repetition principle.
In a second aspect of the embodiments of the present disclosure, an apparatus for determining network transaction relevance is provided, where the apparatus includes:
a list acquisition module, configured to acquire a first transaction list of a target log, wherein the first transaction list comprises a name hash value of each transaction in the target log and a timestamp corresponding to each transaction;
the word vector acquisition module is used for determining a word vector corresponding to the name of each transaction in the first transaction list through a preset document word vector model;
the relation tree determining module is used for acquiring a first relation tree according to the word vector and a preset relation tree creating rule when a first transaction needing to be checked is determined, wherein the first relation tree comprises the first transaction and other transactions related to the first transaction;
and the association degree determining module is used for determining the association degree of the first transaction and the other transactions according to the first relation tree.
Optionally, the list obtaining module includes:
the extraction submodule is used for extracting the name of each transaction and the timestamp corresponding to each transaction from the target log;
the ordering submodule is used for ordering all the transactions in the first transaction list according to the timestamp corresponding to each transaction to obtain a first ordering;
the calculation submodule is used for replacing the name of each transaction with the name hash value of each transaction by using a preset hash algorithm;
and the list generation submodule is used for generating the first transaction list comprising the name hash value of each transaction and the timestamp corresponding to each transaction according to the first ordering.
Optionally, the calculation sub-module includes:
the first obtaining submodule is used for obtaining an integer hash value of the name of each transaction by utilizing a preset hash value calculation formula according to the name of each transaction;
a conversion submodule, configured to convert the integer hash value of the name of each transaction into a hexadecimal hash value;
the second obtaining submodule is used for performing an add-A hexadecimal calculation on each digit of the hexadecimal hash value of each transaction to obtain the name hash value of each transaction;
a replacing submodule, configured to replace the name of each transaction with the name hash value of each transaction;
wherein the hash value calculation formula includes:
Figure BDA0001481826560000041
wherein HV(i) represents the integer hash value of the name of the i-th transaction in the first ordering; n represents the total number of characters in the name of the i-th transaction; t denotes the t-th character in the name of the i-th transaction; and s[t] represents the Unicode code point of the t-th character.
Optionally, the preset document word vector model is determined by training using a continuous bag-of-words model and a history log.
Optionally, the relationship tree creation rule includes a number of child nodes of the relationship tree, a depth and a transaction non-repetition principle; the relationship tree determination module includes:
a first determining submodule, configured to determine the first transaction as a root node of the first relationship tree;
the second determining submodule is used for determining the number and the depth of the child nodes of the first relation tree according to the relation tree creating rule;
and the relation tree establishing submodule is used for establishing the first relation tree by utilizing depth-first search according to the root node, the number of the child nodes, the depth and the word vector under the transaction non-repetition principle.
In a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method of any one of the first aspect.
In a fourth aspect of the embodiments of the present disclosure, an electronic device is provided, including:
the computer-readable storage medium of the third aspect; and
one or more processors to execute the computer program in the computer-readable storage medium.
According to the method, the device, the storage medium and the equipment for determining the relevance of the network transaction, a first transaction list of a target log is obtained, wherein the first transaction list comprises a name hash value of each transaction in the target log and a timestamp corresponding to each transaction; determining a word vector corresponding to the name of each transaction in the first transaction list through a preset document word vector model; when a first transaction needing to be viewed is determined, acquiring a first relation tree according to the word vector and a preset relation tree creating rule, wherein the first relation tree comprises the first transaction and other transactions related to the first transaction; and determining the association degree of the first transaction and the other transactions according to the first relation tree. Therefore, the relevance among the network transactions can be analyzed in the system log of the network system with a large amount of data, and the relationship tree is established, so that the transaction related to the abnormal transaction can be determined through the relationship tree after the abnormal transaction occurs in a certain network transaction, thereby quickly and conveniently solving the abnormal transaction and optimizing the system setting.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow diagram illustrating a method for network transaction affinity determination in accordance with an exemplary embodiment;
FIG. 2 is a schematic structural diagram of a CBOW model;
FIG. 3 is a flow diagram illustrating another method of network transaction affinity determination in accordance with an exemplary embodiment;
FIG. 4 is a flow diagram illustrating yet another method of network transaction affinity determination in accordance with an illustrative embodiment;
FIG. 5 is a flow diagram illustrating yet another method of network transaction affinity determination in accordance with an illustrative embodiment;
FIG. 6 is a diagram illustrating a first relationship tree structure, according to an example embodiment;
FIG. 7 is a block diagram illustrating a network transaction association determination apparatus in accordance with an exemplary embodiment;
FIG. 8 is a block diagram illustrating a list acquisition module in accordance with an exemplary embodiment;
FIG. 9 is a block diagram illustrating a computation submodule in accordance with an exemplary embodiment;
FIG. 10 is a block diagram illustrating a relationship tree determination module in accordance with an exemplary embodiment;
FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
Before the embodiments of the present disclosure are described, the terms and the application scenario involved in the present disclosure are explained. The terms are described first:
A hash value is the fixed-length output obtained by transforming an input of arbitrary length through a hash algorithm. This transformation is a compression mapping: the space occupied by the hash value is usually much smaller than that of the input, and different inputs may hash to the same output, so the input value cannot be uniquely determined from the hash value.
The document word vector model (Word to Vector model) constructs a multilayer neural network, obtains the corresponding inputs and outputs from a given text, continuously corrects the parameters of the neural network during training, and finally obtains word vectors. The most common of these models is the continuous bag-of-words model employed in this disclosure.
The Continuous Bag-of-Words model (CBOW model for short) predicts the current word from its context words. When the model is applied to a network system, for the network transaction records in a log file, it predicts which transaction the target transaction is most likely to be according to the network transactions recorded in the log before and after the target transaction.
Huffman coding is a variable-length coding method proposed by Huffman in 1952. It assigns each character a unique code according to the character's probability of appearing in the document to be encoded, and ensures that the average code length is the shortest. A Huffman tree, also known as an optimal binary tree, is constructed by taking n weights as n leaf nodes; the binary tree whose weighted path length is minimal is called the optimal binary tree, i.e. the Huffman tree. In a Huffman tree, nodes with larger weights are closer to the root. Here, a numerical value with a certain meaning that is assigned to a node of the tree is called the weight of that node.
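As a small illustration of the construction just described, the sketch below builds Huffman codes from a weight table by repeatedly merging the two lightest subtrees. The function name `huffman_codes` and the example weights are hypothetical; in the setting of this disclosure the symbols would be transaction name hashes and the weights their frequencies in the log.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Return {symbol: bit string} for a {symbol: weight} mapping."""
    if not weights:
        return {}
    tie = count()  # tie-breaker so the heap never has to compare the dict payloads
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two subtrees with the smallest total weight.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


codes = huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1})
# The heaviest symbol "a" receives the shortest code, i.e. it sits closest to the root.
```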
Depth-First Search (DFS) is a search algorithm that traverses the nodes of a tree along its depth, searching each branch as deeply as possible. For example, when hyperlinks between HTML (Hypertext Markup Language) files are followed depth-first, a single chain of links must be searched in its entirety before the remaining hyperlinks are searched: the search follows a hyperlink on an HTML file downward until it can go no deeper, then returns to the previous HTML file and continues with the other hyperlinks in that file. When no more hyperlinks are selectable, the search has ended. DFS is a graph algorithm; in brief, each possible branch path is followed until it can go no deeper, and each node is visited only once.
The application scenario of the present disclosure is described next. After a network system goes online, the relevance between transactions may differ from what was assumed during development. For example, for a group-buying application, the developers may assume that a user who opens a store page will want to click through to the store details, so the store details are set as the default cache. In practice, a user on a store page may be more concerned with other users' reviews of the store. Because of this difference, the store page request transaction is assumed at development time to be closely related to the store detail request transaction, whereas once the system is online the store page request transaction actually has a higher relevance to the store review request transaction. The real transaction relevance of the system therefore needs to be analyzed in a timely manner, so that the system cache can be adjusted accordingly and the system optimized; that is, for the information of a store, the store reviews rather than the store details should be cached preferentially.
Therefore, according to the technical scheme provided by the embodiment of the disclosure, the relevance between each network transaction in the log file can be determined, so that when a certain transaction is abnormal or the relevance of a certain transaction is specially checked, the relevance between each transaction can be checked through the method, and then the transaction related to the transaction is determined.
Fig. 1 is a flowchart illustrating a method for determining network transaction association according to an exemplary embodiment, as shown in fig. 1, including the following steps:
step 110, a first transaction list of the target log is obtained, where the first transaction list includes a name hash value of each transaction in the target log and a timestamp corresponding to each transaction.
While a network service is running, transactions with a high association degree are likely to occur shortly before or after a given network transaction. For example, highly associated network transactions generally occur within one user access: the transaction in which a user requests a news list is highly related to the transaction in which the user requests the specific content of a news item, because after requesting the news list the user will, with high probability, request the specific content of some news item. Associated network transactions therefore occur close together in time and with similar frequency, which appears in the log file as records that are close in position and similar in frequency. These characteristics resemble the way words of a natural language are distributed in a document: words whose senses are highly associated appear in documents with similar frequencies and are more likely to appear in the context of the target word. Accordingly, one network transaction may be regarded as one word, the content of the log file within a certain time window may be taken as one document whose words are the network transactions, and the log segments taken in different time windows may be regarded as multiple documents.
In this step, all transactions in the target log are sorted by timestamp, and the name hash value of each transaction is determined using a hash algorithm so that it can be converted into a word vector in the next step. The closer together transactions occur in the first transaction list, the higher their association degree; the timestamps are also used for merging and distinguishing the large number of transactions in the target log.
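The idea of treating each time window of the log as one "document" of transaction "words" can be sketched as below. The function name `split_into_documents`, the representation of timestamps as seconds, and the fixed window length are assumptions chosen for illustration; the disclosure does not specify how windows are delimited.

```python
def split_into_documents(transactions, window_seconds):
    """Group (name_hash, timestamp_seconds) pairs, already sorted by time, into windows.

    Each returned document is a list of name hashes that fall within one window.
    """
    documents = []
    current, window_start = [], None
    for name_hash, ts in transactions:
        # Start a new document when the current window is exhausted.
        if window_start is None or ts - window_start >= window_seconds:
            if current:
                documents.append(current)
            current, window_start = [], ts
        current.append(name_hash)
    if current:
        documents.append(current)
    return documents


docs = split_into_documents(
    [("A", 0), ("B", 5), ("A", 12), ("C", 14), ("B", 31)], window_seconds=10
)
# docs == [["A", "B"], ["A", "C"], ["B"]]
```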
Step 120, determining a word vector corresponding to the name of each transaction in the first transaction list through a preset document word vector model.
The document word vector model is used for vector conversion of the transaction, the conversion efficiency is high, the method is suitable for processing of large batches of texts, meanwhile, distributed computing can be adopted during actual operation, and computing operation is simplified.
Illustratively, the preset document word vector model adopted by the embodiment is determined by training using a continuous bag-of-words model and a history log.
The network structure of the CBOW model comprises three layers: an input layer, a projection layer and an output layer, as shown in fig. 2. The determination of the document word vector model is described below, taking a sample (Context(w), w) as an example:
Input layer: contains the word vectors $v(Context(w)_1), v(Context(w)_2), \ldots, v(Context(w)_{2C}) \in \mathbb{R}^m$ of the 2C words in Context(w), where m denotes the length of a word vector.
That is, at the start of model building, the word vectors of the C words before and the C words after a sample w are taken as input. Here w represents a certain transaction in the history log; a word vector is assumed for this transaction, and the word vectors of the 2C other transactions are determined according to the assumed word vector and the correlation between w and those 2C transactions.
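To make the sample structure (Context(w), w) concrete, the following sketch enumerates the training pairs for one document, taking up to C tokens before and after each target. The function name `cbow_samples` is hypothetical, and the handling of document boundaries (simply using fewer than 2C context tokens) is an assumption.

```python
def cbow_samples(document, c):
    """Return [(context, target), ...] with up to c tokens on each side of the target."""
    samples = []
    for i, target in enumerate(document):
        before = document[max(0, i - c):i]   # up to c transactions before w
        after = document[i + 1:i + 1 + c]    # up to c transactions after w
        samples.append((before + after, target))
    return samples


pairs = cbow_samples(["A", "B", "C", "D"], c=1)
# pairs == [(["B"], "A"), (["A", "C"], "B"), (["B", "D"], "C"), (["C"], "D")]
```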
The projection layer accumulates the input 2C word vectors, that is:
Figure BDA0001481826560000101
the output layer corresponds to a Huffman tree structure, each leaf node in the binary tree corresponds to a word, and the tree structure shows the correlation among the leaf nodes.
Based on the above concept, first, the conditional probability of predicting a word w using a context is expressed as:
Figure BDA0001481826560000102
then, the objective function may use a log-likelihood function as shown in equation 2.2:
$f = \sum_{w \in c} \log P(w \mid Context(w))$ (2.2)
then, substituting equation 2.1 into log-likelihood function 2.2 can result in equation 2.3, where equation 2.3 is as follows:
Figure BDA0001481826560000103
where $p^w$ is the path from the root node to the leaf node corresponding to w; $l^w$ is the number of nodes contained in path $p^w$; the symbols in Figure BDA0001481826560000104 denote the $l^w$ nodes in path $p^w$, of which the symbol in Figure BDA0001481826560000105 denotes the root node and the symbol in Figure BDA0001481826560000106 denotes the node corresponding to w; the symbols in Figure BDA0001481826560000107 denote the Huffman code of the word w, which consists of $l^w - 1$ bits; the symbols in Figure BDA0001481826560000108 denote the vectors corresponding to the non-leaf nodes in path $p^w$; and the symbol in Figure BDA0001481826560000109 denotes the vector corresponding to the j-th non-leaf node in path $p^w$.
In this process, stochastic gradient descent can be used to calculate the minimum value of f(w, j), so that all parameters of the document word vector model can be determined.
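Since the formulas in this passage are available only as images, the block below writes out the standard hierarchical-softmax form of CBOW, on the assumption that the figures follow it. The symbols $x_w$ (projection-layer output), $d_j^w$ (the j-th Huffman code bit of w), $\theta_j^w$ (vector of the j-th non-leaf node on the path) and $\sigma$ (the sigmoid function) are the conventional names and are assumptions here, not symbols confirmed by the text.

```latex
\begin{aligned}
x_w &= \sum_{i=1}^{2C} v\bigl(Context(w)_i\bigr) \in \mathbb{R}^m \\
P\bigl(w \mid Context(w)\bigr) &= \prod_{j=2}^{l^w} P\bigl(d_j^w \mid x_w, \theta_{j-1}^w\bigr) \qquad (2.1) \\
P\bigl(d_j^w \mid x_w, \theta_{j-1}^w\bigr) &=
  \bigl[\sigma(x_w^\top \theta_{j-1}^w)\bigr]^{1-d_j^w}
  \bigl[1-\sigma(x_w^\top \theta_{j-1}^w)\bigr]^{d_j^w} \\
f &= \sum_{w \in c} \sum_{j=2}^{l^w}
  \Bigl\{ (1-d_j^w)\log\sigma(x_w^\top \theta_{j-1}^w)
  + d_j^w \log\bigl[1-\sigma(x_w^\top \theta_{j-1}^w)\bigr] \Bigr\} \qquad (2.3)
\end{aligned}
```

Under this assumed form, f(w, j) would be the term inside the braces of (2.3). Each such term is a log-probability, so in this formulation a stochastic gradient method would increase it, which is equivalent to minimizing its negative.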
With the above model determination method, after the history log is used as the training sample to determine the document word vector model, the hash value of the name of each transaction in the first transaction list of step 110 is used as the input, the output of the model is the word vector of the name corresponding to each transaction, and then the first relationship tree indicating the association between the transactions is determined according to the word vector.
Step 130, when the first transaction needing to be viewed is determined, obtaining a first relation tree according to the word vector and a preset relation tree creating rule, wherein the first relation tree comprises the first transaction and other transactions related to the first transaction.
Illustratively, the relation tree creation rule includes the number of child nodes of the relation tree, the depth, and the transaction non-repetition principle. In addition, the first transaction to be viewed can be a transaction in which an exception has occurred, or a transaction that specifically needs to be checked.
The threshold for the number of child nodes of the first relation tree can be determined when the document word vector model is trained on the history log, and the number of child nodes of the relation tree is preset to this threshold in the network system. The depth of the relation tree indicates its number of layers; for example, the number of child nodes may be set to 2 and the depth to 3. Under the transaction non-repetition principle, nodes newly added while the relation tree is being built must not duplicate nodes already in the tree, so any node appears in the relation tree only once.
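A relation tree built under these three rules can be sketched as follows. The sketch assumes that relatedness between two transactions is measured by the cosine similarity of their word vectors, which this passage does not state; the function names and the toy two-dimensional vectors are likewise hypothetical. Children are chosen and expanded one at a time, depth first, and a shared `used` set enforces the non-repetition principle.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (assumed relatedness measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_relation_tree(root, vectors, num_children=2, depth=3):
    """Return a nested dict {"name", "score", "children"} rooted at `root`."""
    used = {root}  # transaction non-repetition: every name appears at most once

    def expand(name, level):
        children = []
        if level < depth:  # the root is layer 1, so `depth` counts layers
            for _ in range(num_children):
                candidates = [t for t in vectors if t not in used]
                if not candidates:
                    break
                best = max(candidates, key=lambda t: cosine(vectors[name], vectors[t]))
                used.add(best)
                child = expand(best, level + 1)  # go deeper before the next sibling
                child["score"] = cosine(vectors[name], vectors[best])
                children.append(child)
        return {"name": name, "score": None, "children": children}

    return expand(root, 1)


vectors = {"A": (1.0, 0.0), "B": (0.9, 0.1), "C": (0.8, 0.3), "D": (0.0, 1.0), "E": (0.1, 0.9)}
tree = build_relation_tree("A", vectors, num_children=2, depth=3)
```

In this toy example the root A first takes B as a child, B's subtree then claims C and E, and only D remains for A's second child, which shows how depth-first construction interacts with the non-repetition rule.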
Step 140, determining the association degree of the first transaction and other transactions according to the first relation tree.
The first relation tree clearly indicates which other transactions the first transaction is related to, and to what extent. When the transaction relevance in the network system is analyzed, only the log file is relied on; the code of the network system does not need to be modified, no new file needs to be configured, and the format of the log file is not limited, which simplifies the work of operation and maintenance personnel in finding and solving problems in the network system.
In addition, the system can be optimally configured according to the first relation tree. For example, the first relationship tree indicates that the user is more interested in the notification announcements and the news of innovation in the network system, so that the content of news lists of the notification announcements and the news of innovation can be cached in the default setting of the network system, thereby improving the response time of the network system and improving the performance of the network system.
In summary, according to the method for determining relevance of network transactions provided by the present disclosure, a first transaction list of a target log is obtained, and a word vector corresponding to the name of each transaction in the first transaction list is determined through a preset document word vector model. When a first transaction to be viewed is determined, a first relation tree is obtained according to the word vectors and a preset relation tree creation rule, and the relevance of the first transaction to other transactions is then determined according to the first relation tree. Operation and maintenance personnel can therefore conveniently analyze the relevance between any transaction and other transactions, so that system settings can be better optimized according to that relevance when the system is configured, improving system performance.
Fig. 3 is a flowchart illustrating another method for determining network transaction relevance according to an exemplary embodiment. As shown in Fig. 3, obtaining the first transaction list of the target log in step 110 includes the following steps:
Step 111, extracting the name of each transaction and the timestamp corresponding to each transaction from the target log.
Step 112, sorting all the transactions in the first transaction list according to the timestamp corresponding to each transaction to obtain a first ordering.
Step 113, replacing the name of each transaction with the name hash value of each transaction by using a preset hash algorithm.
Step 113 may further comprise the following sub-steps, as shown in Fig. 4:
Step 1131, obtaining an integer hash value of the name of each transaction from the name of each transaction by using a preset hash value calculation formula.
For example, the name of each transaction is mapped to a hash value of fixed length. Similar transactions generally have similar names (for example, the names of database transactions may include the same database name and indication), so the text distance between transaction names is reflected in their hash values.
Wherein, the hash value calculation formula includes:
Figure BDA0001481826560000131
wherein HV(i) represents the integer hash value of the name of the i-th transaction in the first ordering; n represents the total number of characters in the name of the i-th transaction; t denotes the t-th character in the name of the i-th transaction; and s[t] represents the Unicode code value of the t-th character.
The hash value calculation formula above shows that the first character in the name of the ith transaction has a greater effect on the hash value.
Step 1132, converting the integer hash value of the name of each transaction into a hexadecimal hash value.
Step 1133, adding the character 'A' to each digit of the hexadecimal hash value of each transaction to obtain the name hash value of each transaction.
Representing the hash value in hexadecimal makes word segmentation convenient during the subsequent word vector conversion. Adding the character 'A' to each digit of the hexadecimal hash value converts the hash value entirely into letters, which makes it convenient to determine the word vector.
Step 1134, replacing the name of each transaction with the name hash value of each transaction.
That is, the name of the transaction in the first ordering in step 112 is replaced with the name hash value.
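Sub-steps 1131 to 1133 can be sketched in Python as follows. This is a minimal illustration rather than the disclosed implementation: the hash value calculation formula appears only as a figure in this text, so `integer_hash` substitutes an assumed base-31 polynomial over the characters' Unicode values, chosen only because it gives the first character the largest weight. The function names and the base 31 are hypothetical.

```python
def integer_hash(name: str) -> int:
    """Assumed stand-in for the hash formula (not taken from the figure).

    Computes s[1]*31**(n-1) + ... + s[n], where s[t] is the Unicode
    value of the t-th character, so earlier characters weigh more.
    """
    value = 0
    for ch in name:
        value = value * 31 + ord(ch)
    return value


def name_hash(name: str) -> str:
    """Steps 1131-1133: integer hash -> hexadecimal -> letters only."""
    hex_digits = format(integer_hash(name), "x")  # step 1132
    # Step 1133: shift each hex digit (0..15) by 'A', giving 'A'..'P'.
    return "".join(chr(ord("A") + int(d, 16)) for d in hex_digits)


print(name_hash("ab"))  # integer hash 3105 -> hex "c21" -> "MCB"
```

Because Python integers are unbounded, this sketch does not produce the fixed-length value mentioned above; an actual implementation would presumably truncate or reduce the integer before conversion.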
Step 114, generating, in the first ordering, a first transaction list that includes the name hash value of each transaction and the timestamp corresponding to each transaction.
Through steps 111 to 114, the name and timestamp of each transaction are extracted from the target log, the first ordering is determined, and the transaction names in the first ordering are replaced with their hash values, generating the first transaction list with name hash values.
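Steps 111 to 114 can be sketched as follows. The log format (an ISO 8601 timestamp, a space, then the transaction name) and the function name `build_transaction_list` are assumptions made for illustration only, since the disclosure does not restrict the log format. The name hash is passed in as a callable so that the sketch stays independent of any particular hash formula.

```python
from datetime import datetime
from typing import Callable, Iterable, List, Tuple


def build_transaction_list(
    log_lines: Iterable[str],
    name_hash: Callable[[str], str],
) -> List[Tuple[str, datetime]]:
    """Return (name hash value, timestamp) pairs sorted by timestamp."""
    entries = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        # Step 111: extract the timestamp and the transaction name
        # (assumed layout: "<ISO timestamp> <transaction name>").
        stamp, _, name = line.partition(" ")
        entries.append((datetime.fromisoformat(stamp), name))
    # Step 112: sort by timestamp to obtain the first ordering.
    entries.sort(key=lambda entry: entry[0])
    # Steps 113-114: replace each name with its hash, keep the ordering.
    return [(name_hash(name), ts) for ts, name in entries]


sample_log = [
    "2017-11-24T10:00:05 news_list",
    "2017-11-24T10:00:01 home_page",
]
# str.upper is only a placeholder for the real name hash function.
for hashed_name, ts in build_transaction_list(sample_log, str.upper):
    print(ts.isoformat(), hashed_name)
```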
Fig. 5 is a flowchart illustrating a further method for determining relevance of a network transaction according to an exemplary embodiment. As shown in Fig. 5, when the first transaction to be viewed is determined in step 130, obtaining the first relation tree according to the word vectors and the preset relation tree creation rule includes the following steps:
Step 131, determining the first transaction as the root node of the first relation tree.
Step 132, determining the number of child nodes and the depth of the first relation tree according to the relation tree creation rule.
Step 133, establishing the first relation tree by depth-first search according to the root node, the number of child nodes, the depth, and the word vectors, under the transaction non-duplication principle.
For example, after the word vector corresponding to the name of each transaction is obtained, the association between transactions can be determined by comparing the distance between the word vector of the first transaction and the word vectors of the other transactions in the first transaction list. The relationships among the transactions associated with the first transaction are then determined by comparing the pairwise distances between them, which yields the first relation tree including the first transaction and the other transactions associated with it.
For example, as shown in Fig. 6, steps 131 to 133 establish a first relation tree without repeated transactions, in which the first transaction (the website home page) is the root node, the number of child nodes is 2, and the depth is 3. By default, a left-side node is more strongly associated with its parent node than the right-side node is. Fig. 6 shows that, after requesting the website home page, a user is most likely to access the two secondary pages, notification announcements and innovation news, and that from those two secondary pages the user is more likely to click daily recommendations than picture news or news hotspots. The daily recommendation news list is therefore cached preferentially in the website cache, which improves the cache hit rate and speeds up the overall response of the network system.
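The tree construction in steps 131 to 133 can be sketched as follows. This is an illustrative reading, not the disclosed implementation: cosine similarity is assumed as the measure of closeness between word vectors, siblings are reserved before the search descends so that no transaction appears twice, and the names `build_relation_tree`, `vectors`, and the sample vectors are hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; larger means the two transactions are closer."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_relation_tree(root: str, vectors: Dict[str, List[float]],
                        num_children: int = 2, depth: int = 3) -> dict:
    """Depth-first construction in which each transaction appears once."""
    used = {root}  # enforces the transaction non-duplication principle

    def expand(node: str, level: int) -> dict:
        tree = {"name": node, "children": []}
        if level >= depth:  # the root counts as layer 1
            return tree
        candidates = sorted(
            (n for n in vectors if n not in used),
            key=lambda n: cosine(vectors[node], vectors[n]),
            reverse=True,  # most closely associated first (left side)
        )
        chosen = candidates[:num_children]
        used.update(chosen)  # reserve siblings before descending
        for child in chosen:
            tree["children"].append(expand(child, level + 1))
        return tree

    return expand(root, 1)


# Made-up two-dimensional vectors, purely for demonstration.
vectors = {
    "home": [1.0, 0.0],
    "notices": [0.9, 0.1],
    "innovation": [0.8, 0.3],
    "daily": [0.1, 0.9],
    "pictures": [0.0, 1.0],
}
tree = build_relation_tree("home", vectors, num_children=2, depth=3)
print([child["name"] for child in tree["children"]])
```

Because the search is depth-first, the leftmost subtree claims the closest unused transactions first, so later subtrees may end up with fewer children than the configured number.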
It should also be noted that the root node of the first relation tree may be set according to user requirements: a transaction in which an exception occurred may be used as the root node, or a default root node may be set. For example, the home page of the network system may be used as the root node to establish the first relation tree and thereby determine the association degrees of all transactions in the network system, so that operation and maintenance personnel obtain a more comprehensive view of transaction relevance when viewing the first relation tree.
In summary, according to the method for determining relevance of network transactions provided by the present disclosure, a first transaction list of a target log is obtained, and a word vector corresponding to the name of each transaction in the first transaction list is determined through a preset document word vector model. When a first transaction to be viewed is determined, a first relation tree is obtained according to the word vectors and a preset relation tree creation rule, and the relevance of the first transaction to other transactions is then determined according to the first relation tree. Operation and maintenance personnel can therefore conveniently analyze the relevance between any transaction and other transactions, so that system settings can be better optimized according to that relevance when the system is configured, improving system performance.
Fig. 7 is a block diagram illustrating a network transaction association determination apparatus 700 according to an exemplary embodiment. The apparatus 700 may be used to execute the method described in any of Figs. 1 to 6. Referring to Fig. 7, the apparatus 700 includes:
A list obtaining module 710, configured to obtain a first transaction list of the target log, where the first transaction list includes a name hash value of each transaction in the target log and a timestamp corresponding to each transaction.
A word vector obtaining module 720, configured to determine, through a preset document word vector model, a word vector corresponding to the name of each transaction in the first transaction list.
A relationship tree determining module 730, configured to, when the first transaction to be viewed is determined, obtain a first relationship tree according to the word vectors and a preset relationship tree creation rule, where the first relationship tree includes the first transaction and other transactions associated with the first transaction.
An association degree determining module 740, configured to determine, according to the first relationship tree, the association degree between the first transaction and the other transactions.
Fig. 8 is a block diagram illustrating a list obtaining module according to an exemplary embodiment. As shown in Fig. 8, the list obtaining module 710 includes:
An extracting submodule 711, configured to extract the name of each transaction and the timestamp corresponding to each transaction from the target log.
A sorting submodule 712, configured to sort all the transactions in the first transaction list according to the timestamp corresponding to each transaction to obtain a first ordering.
A calculation submodule 713, configured to replace the name of each transaction with the name hash value of each transaction by using a preset hash algorithm.
A list generation submodule 714, configured to generate, in the first ordering, a first transaction list including the name hash value of each transaction and the timestamp corresponding to each transaction.
Fig. 9 is a block diagram illustrating a calculation submodule according to an exemplary embodiment. As shown in Fig. 9, the calculation submodule 713 includes:
A first obtaining submodule 7131, configured to obtain an integer hash value of the name of each transaction from the name of each transaction by using a preset hash value calculation formula.
A conversion submodule 7132, configured to convert the integer hash value of the name of each transaction into a hexadecimal hash value.
A second obtaining submodule 7133, configured to add the character 'A' to each digit of the hexadecimal hash value of each transaction to obtain the name hash value of each transaction.
A replacing submodule 7134, configured to replace the name of each transaction with the name hash value of each transaction.
Wherein, the hash value calculation formula includes:
Figure BDA0001481826560000161
wherein HV(i) represents the integer hash value of the name of the i-th transaction in the first ordering; n represents the total number of characters in the name of the i-th transaction; t denotes the t-th character in the name of the i-th transaction; and s[t] represents the Unicode code value of the t-th character.
Optionally, the preset document word vector model is determined by training using a continuous bag-of-words model and a history log.
Fig. 10 is a block diagram illustrating a relationship tree determining module according to an exemplary embodiment. As shown in Fig. 10, the relationship tree creation rule includes the number of child nodes of the relationship tree, the depth, and the transaction non-duplication principle, and the relationship tree determining module 730 includes:
A first determining submodule 731, configured to determine the first transaction as the root node of the first relationship tree.
A second determining submodule 732, configured to determine the number of child nodes and the depth of the first relationship tree according to the relationship tree creation rule.
A relationship tree building submodule 733, configured to build the first relationship tree by depth-first search according to the root node, the number of child nodes, the depth, and the word vectors, under the transaction non-duplication principle.
In summary, according to the device for determining relevance of network transactions provided by the present disclosure, a first transaction list of a target log is obtained, and a word vector corresponding to the name of each transaction in the first transaction list is determined through a preset document word vector model. When a first transaction to be viewed is determined, a first relation tree is obtained according to the word vectors and a preset relation tree creation rule, and the relevance of the first transaction to other transactions is then determined according to the first relation tree. Operation and maintenance personnel can therefore conveniently analyze the relevance between any transaction and other transactions, so that system settings can be better optimized according to that relevance when the system is configured, improving system performance.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 11 is a block diagram illustrating an electronic device 1100 in accordance with an example embodiment. As shown in fig. 11, the electronic device 1100 may include: a processor 1101, a memory 1102, multimedia components 1103, input/output (I/O) interfaces 1104, and communication components 1105.
The processor 1101 is configured to control the overall operation of the electronic device 1100 so as to complete all or part of the steps in the network transaction relevance determination method. The memory 1102 is used to store various types of data to support operation of the electronic device 1100, such as instructions for any application or method operating on the electronic device 1100, as well as application-related data, such as contact data, sent and received messages, pictures, audio, video, and so forth. The memory 1102 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia components 1103 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 1102 or transmitted through the communication component 1105. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 1104 provides an interface between the processor 1101 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 1105 provides wired or wireless communication between the electronic device 1100 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 1105 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 1100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the network transaction relevance determination method described above.
In another exemplary embodiment, a computer-readable storage medium comprising program instructions is also provided, such as the memory 1102 comprising program instructions, the program instructions being executable by the processor 1101 of the electronic device 1100 to perform the network transaction relevance determination method described above.
The preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical concept, and these simple modifications all fall within the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (10)

1. A method for determining network transaction relevance, the method comprising:
acquiring a first transaction list of a target log, wherein the first transaction list comprises a name hash value of each transaction in the target log and a timestamp corresponding to each transaction;
inputting the name hash value of each transaction in the first transaction list into a preset document word vector model to obtain a word vector corresponding to the name of each transaction in the first transaction list output by the document word vector model;
when a first transaction needing to be viewed is determined, acquiring a first relation tree according to the word vector and a preset relation tree creating rule, wherein the first relation tree comprises the first transaction and other transactions related to the first transaction;
determining the association degree of the first transaction and the other transactions according to the first relation tree;
the name hash value is obtained by the following steps:
acquiring an integer hash value of the name of each transaction by using a preset hash value calculation formula according to the name of each transaction;
converting the integer hash value of the name of each transaction into a hexadecimal hash value;
and adding the character A to each digit of the hexadecimal hash value of each transaction to obtain the name hash value of each transaction.
2. The method of claim 1, wherein obtaining the first transaction list of the target log comprises:
extracting the name of each transaction and the timestamp corresponding to each transaction from the target log;
sequencing all the transactions in the first transaction list according to the timestamp corresponding to each transaction to obtain a first sequence;
replacing the name of each transaction with the name hash value of each transaction by using a preset hash algorithm;
and generating the first transaction list comprising the name hash value of each transaction and the timestamp corresponding to each transaction according to the first sequence.
3. The method of claim 2, wherein the hash value calculation formula comprises:
Figure FDA0002663660600000021
wherein HV(i) represents the integer hash value of the name of the i-th transaction in the first ordering; n represents the total number of characters in the name of the i-th transaction; t denotes the t-th character in the name of the i-th transaction; and s[t] represents the Unicode code value of the t-th character.
4. The method of claim 1, wherein the preset document word vector model is determined by training using a continuous bag of words model and a history log.
5. The method of claim 1, wherein the relationship tree creation rules include a relationship tree child node number, a relationship tree depth, and a transaction non-duplication rule; when the first transaction needing to be viewed is determined, the obtaining of the first relation tree according to the word vector and a preset relation tree creating rule comprises the following steps:
determining the first transaction as a root node of the first relationship tree;
determining the number and depth of child nodes of the first relation tree according to the relation tree creating rule;
and establishing the first relation tree by utilizing depth-first search according to the root node, the number of the child nodes, the depth and the word vector under the transaction non-repetition principle.
6. An apparatus for determining network transaction relevance, the apparatus comprising:
a list obtaining module, configured to obtain a first transaction list of a target log, wherein the first transaction list comprises a name hash value of each transaction in the target log and a timestamp corresponding to each transaction;
a word vector obtaining module, configured to input the name hash value of each transaction in the first transaction list into a preset document word vector model, so as to obtain a word vector corresponding to the name of each transaction in the first transaction list output by the document word vector model;
a relation tree determining module, configured to obtain a first relation tree according to the word vector and a preset relation tree creating rule when a first transaction needing to be viewed is determined, wherein the first relation tree comprises the first transaction and other transactions related to the first transaction;
an association degree determining module, configured to determine the association degree of the first transaction and the other transactions according to the first relation tree;
the name hash value is obtained by the following steps:
acquiring an integer hash value of the name of each transaction by using a preset hash value calculation formula according to the name of each transaction;
converting the integer hash value of the name of each transaction into a hexadecimal hash value;
and adding the character A to each digit of the hexadecimal hash value of each transaction to obtain the name hash value of each transaction.
7. The apparatus of claim 6, wherein the list obtaining module comprises:
the extraction submodule is used for extracting the name of each transaction and the timestamp corresponding to each transaction from the target log;
the ordering submodule is used for ordering all the transactions in the first transaction list according to the timestamp corresponding to each transaction to obtain a first ordering;
the calculation submodule is used for replacing the name of each transaction with the name hash value of each transaction by using a preset hash algorithm;
and the list generation submodule is used for generating the first transaction list comprising the name hash value of each transaction and the timestamp corresponding to each transaction according to the first sequence.
8. The apparatus of claim 7, wherein the hash value calculation formula comprises:
Figure FDA0002663660600000041
wherein HV(i) represents the integer hash value of the name of the i-th transaction in the first ordering; n represents the total number of characters in the name of the i-th transaction; t denotes the t-th character in the name of the i-th transaction; and s[t] represents the Unicode code value of the t-th character.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
10. An electronic device, comprising:
the computer-readable storage medium of claim 9; and
one or more processors to execute the program in the computer-readable storage medium.
CN201711195221.9A 2017-11-24 2017-11-24 Method, device, storage medium and equipment for determining relevance of network transaction Active CN108197142B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711195221.9A CN108197142B (en) 2017-11-24 2017-11-24 Method, device, storage medium and equipment for determining relevance of network transaction

Publications (2)

Publication Number Publication Date
CN108197142A CN108197142A (en) 2018-06-22
CN108197142B true CN108197142B (en) 2020-10-30

Family

ID=62573086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711195221.9A Active CN108197142B (en) 2017-11-24 2017-11-24 Method, device, storage medium and equipment for determining relevance of network transaction

Country Status (1)

Country Link
CN (1) CN108197142B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101888309A (en) * 2010-06-30 2010-11-17 中国科学院计算技术研究所 Online log analysis method
CN102054004A (en) * 2009-11-04 2011-05-11 清华大学 Webpage recommendation method and device adopting same
CN102158355A (en) * 2011-03-11 2011-08-17 广州蓝科科技股份有限公司 Log event correlation analysis method and device capable of concurrent and interrupted analysis
CN102855309A (en) * 2012-08-21 2013-01-02 亿赞普(北京)科技有限公司 Information recommendation method and device based on user behavior associated analysis
CN104917627A (en) * 2015-01-20 2015-09-16 杭州安恒信息技术有限公司 Log cluster scanning and analysis method used for large-scale server cluster
CN106452808A (en) * 2015-08-04 2017-02-22 北京奇虎科技有限公司 Data processing method and data processing device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8060540B2 (en) * 2007-06-18 2011-11-15 Microsoft Corporation Data relationship visualizer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于Lucene的中文自然语言搜索引擎;胡长春;《中国优秀硕士学位论文全文数据库信息科技辑》;20091215;正文第22页 *

Also Published As

Publication number Publication date
CN108197142A (en) 2018-06-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant