Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for updating nodes in a service guide graph, which can improve the efficiency of updating nodes in the service guide graph.
In a first aspect, a method for updating nodes in a service guide graph is provided, where the service guide graph includes a plurality of nodes organized into a tree-like hierarchical structure according to service dimensions, each node corresponds to a keyword and an associated expression of the keyword, and a leaf node of the service guide graph is mounted with a standard question associated with the keyword of the leaf node, and the method includes:
acquiring a first question set formed by original questions;
receiving a first instruction, and determining one node in the plurality of nodes as a node to be updated according to the first instruction;
receiving a second instruction, and determining a screening node set according to the second instruction;
screening out, from the first question set, questions that contain the keyword corresponding to each node in the screening node set or an associated expression of the keyword, to obtain a second question set;
performing word segmentation on each question in the second question set, and removing the keyword corresponding to each node in the screening node set or the associated expression of the keyword, to obtain an alternative word set;
clustering, according to the segmented words included in the alternative word set, the questions in the second question set that contain those segmented words, to obtain questions of a plurality of categories and a candidate word set formed by the central word corresponding to each category;
receiving a third instruction, adding a new child node to the node to be updated according to the third instruction, and determining at least one candidate word in the candidate word set indicated by the third instruction as a keyword corresponding to the new child node or an associated expression of the keyword; or receiving a fourth instruction, determining an existing child node of the node to be updated according to the fourth instruction, and determining at least one candidate word in the candidate word set indicated by the fourth instruction as an associated expression of the keyword of the existing child node.
In a possible embodiment, the determining a set of screening nodes according to the second instruction includes:
determining at least one node in the path of the node to be updated according to the second instruction;
and taking a set formed by the node to be updated and the at least one node as the screening node set.
In a possible implementation, the clustering, according to the segmented words included in the alternative word set, the questions in the second question set that contain those segmented words, to obtain questions of a plurality of categories and a candidate word set formed by the central word corresponding to each category, includes:
performing word frequency statistics on the segmented words included in the alternative word set to obtain high-frequency words whose word frequency is greater than a preset threshold;
classifying the questions that include the same high-frequency word into one category, and adding the high-frequency word to the candidate word set as the candidate word corresponding to that category.
Further, the method further comprises:
and displaying the candidate words in the candidate word set according to the order of word frequency from high to low.
In a possible implementation, the clustering, according to the segmented words included in the alternative word set, the questions in the second question set that contain those segmented words, to obtain questions of a plurality of categories and a candidate word set formed by the central word corresponding to each category, includes:
performing density-based clustering on the segmented words included in the alternative word set to obtain a plurality of clusters;
classifying the questions that include any segmented word of the same cluster into one category, and adding the central word of the cluster to the candidate word set as the candidate word corresponding to that category.
Further, the method further comprises:
and displaying all candidate words in the candidate word set from high to low according to the density of the corresponding category.
In a possible implementation, after the adding a new child node to the node to be updated according to the third instruction and determining at least one candidate word in the candidate word set indicated by the third instruction as a keyword corresponding to the new child node or an associated expression of the keyword, the method further includes:
receiving a fifth instruction, determining the new child node as a leaf node according to the fifth instruction, and mounting, for the leaf node, a standard question associated with the keyword of the leaf node.
In a possible implementation, after the adding a new child node to the node to be updated according to the third instruction and determining at least one candidate word in the candidate word set indicated by the third instruction as a keyword corresponding to the new child node or an associated expression of the keyword, the method further includes:
receiving a sixth instruction, and determining that the new child node is not a leaf node according to the sixth instruction;
returning to the step of receiving a first instruction and determining one node in the plurality of nodes as a node to be updated according to the first instruction, wherein the node to be updated is the new child node.
In a second aspect, an apparatus for updating nodes in a service guide graph is provided, where the service guide graph includes a plurality of nodes organized into a tree-like hierarchical structure according to service dimensions, each node corresponds to a keyword and an associated expression of the keyword, and a leaf node of the service guide graph is mounted with a standard question associated with the keyword of the leaf node, and the apparatus includes:
an obtaining unit, configured to obtain a first question set formed by original questions;
a determining unit, configured to receive a first instruction and determine one node in the plurality of nodes as a node to be updated according to the first instruction; and to receive a second instruction and determine a screening node set according to the second instruction;
a screening unit, configured to screen out, from the first question set obtained by the obtaining unit, questions that contain the keyword, or an associated expression of the keyword, corresponding to each node in the screening node set determined by the determining unit, to obtain a second question set;
a word segmentation unit, configured to perform word segmentation on each question in the second question set obtained by the screening unit, and remove the keyword, or the associated expression of the keyword, corresponding to each node in the screening node set determined by the determining unit, to obtain an alternative word set;
a clustering unit, configured to cluster, according to the segmented words included in the alternative word set obtained by the word segmentation unit, the questions in the second question set obtained by the screening unit that contain those segmented words, to obtain questions of a plurality of categories and a candidate word set formed by the central word corresponding to each category;
a node updating unit, configured to receive a third instruction, add a new child node to the node to be updated determined by the determining unit according to the third instruction, and determine at least one candidate word, indicated by the third instruction, in the candidate word set obtained by the clustering unit as a keyword corresponding to the new child node or an associated expression of the keyword; or to receive a fourth instruction, determine an existing child node of the node to be updated according to the fourth instruction, and determine at least one candidate word in the candidate word set indicated by the fourth instruction as an associated expression of the keyword of the existing child node.
In a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
In a fourth aspect, a computing device is provided, comprising a memory having stored therein executable code, and a processor that when executing the executable code, implements the method of the first aspect.
According to the method and apparatus provided by the embodiments of this specification, a first question set formed by original questions is first obtained. A first instruction is then received, and one of the plurality of nodes is determined as the node to be updated according to the first instruction; a second instruction is received, and a screening node set is determined according to the second instruction. Next, questions that contain the keyword corresponding to each node in the screening node set, or an associated expression of the keyword, are screened out from the first question set to obtain a second question set. Word segmentation is then performed on each question in the second question set, and the keyword corresponding to each node in the screening node set, or the associated expression of the keyword, is removed to obtain an alternative word set. According to the segmented words included in the alternative word set, the questions in the second question set that contain those segmented words are clustered to obtain questions of a plurality of categories and a candidate word set formed by the central word corresponding to each category. Finally, a third instruction is received, a new child node is added to the node to be updated according to the third instruction, and at least one candidate word in the candidate word set indicated by the third instruction is determined as a keyword corresponding to the new child node or an associated expression of the keyword; or a fourth instruction is received, an existing child node of the node to be updated is determined according to the fourth instruction, and at least one candidate word in the candidate word set indicated by the fourth instruction is determined as an associated expression of the keyword of the existing child node.
The method thus uses the existing structure of the service guide graph to filter and screen the initially obtained first question set, so that the differences between user questions become visible and the corresponding clustering result is more accurate. This reduces the workload of operators and improves efficiency when nodes are updated in the service guide graph.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. The implementation scenario involves updating nodes in a service guide graph. Referring to Fig. 1, a service guide graph 100 includes a plurality of nodes organized into a tree-like hierarchical structure according to service dimensions, each node corresponds to a keyword and an associated expression of the keyword, and a leaf node of the service guide graph is mounted with a standard question associated with the keyword of the leaf node. The root node of the service guide graph 100 represents a service type; for example, the root node 101 in Fig. 1 is "Balance Treasure". The third-level nodes 102 and the nodes below the third level in the service guide graph are typically semantic nodes; for example, the third-level nodes 102 in Fig. 1 include "refund", "roll out", "switch" and the like. Each leaf node 103 (also referred to as an end node) in the service guide graph may be mounted with a standard question 104; for example, the leaf node 103 in Fig. 1 is "how", and the standard question 104 mounted on that leaf node is "how to query Balance Treasure". Two terms are used throughout. Standard question: a standardized question summarized from the high-frequency questions of users, referred to hereinafter simply as the standard question. Associated expression: a synonymous expression, an implied expression, or a hypernym or hyponym of a keyword; each semantic node may be configured with associated expressions.
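As a non-limiting illustration, the tree structure described above could be represented as follows. This is a minimal sketch only: the class name `GuideNode`, its field names, and the intermediate node "query" are hypothetical and are not taken from Fig. 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GuideNode:
    """One node of a service guide graph (all names here are illustrative)."""
    keyword: str
    # Synonymous expressions, implied expressions, hypernyms and hyponyms.
    associated: List[str] = field(default_factory=list)
    children: List["GuideNode"] = field(default_factory=list)
    # Only leaf nodes are expected to carry a mounted standard question.
    standard_question: Optional[str] = None

    def expressions(self) -> List[str]:
        """The keyword followed by its associated expressions."""
        return [self.keyword] + self.associated

    def is_leaf(self) -> bool:
        return not self.children


# A toy fragment: root (service type) -> semantic node -> leaf node.
# The "query" node and its associated expressions are assumptions.
how = GuideNode("how", standard_question="how to query Balance Treasure")
query = GuideNode("query", associated=["check", "look up"], children=[how])
root = GuideNode("Balance Treasure", children=[query])

print(query.expressions())           # ['query', 'check', 'look up']
print(how.is_leaf(), root.is_leaf())  # True False
```

Keeping the keyword and its associated expressions in one list makes the later screening step a simple membership test over `expressions()`.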
In the embodiments of this specification, updating a node may include, but is not limited to, either of the following situations: adding a new child node to the node to be updated in the service guide graph, and determining a corresponding keyword and associated expressions for the new child node; or determining an existing child node of the node to be updated in the service guide graph, and determining associated expressions of the corresponding keyword for the existing child node.
It can be understood that, after a new child node is added, a standard question may be mounted on the new child node according to an instruction of the operator, or the new child node may itself be used as the node to be updated so that nodes are updated further.
Fig. 2 is a schematic diagram of another implementation scenario of an embodiment disclosed in this specification. The implementation scenario involves node mining in a service guide graph. Referring to Fig. 2, an original question set 21 is first obtained, the user questions in the original question set 21 are then clustered 22 to obtain a plurality of clusters 23, and an operator manually reviews the clusters 23 to carry out standard-question production 24. The process of standard-question production 24 includes the process of updating nodes. It can be understood that the more repetitive clusters there are, that is, the more clusters correspond to the same standard question, the higher the cost of manual review. The embodiments of this specification therefore focus on reducing repetitive clusters, so as to reduce the manual review cost and improve efficiency.
Fig. 3 shows a flowchart of a method for updating nodes in a service guide graph according to an embodiment, where the service guide graph includes a plurality of nodes organized into a tree-like hierarchical structure according to service dimensions, each node corresponds to a keyword and an associated expression of the keyword, and a leaf node of the service guide graph is mounted with a standard question associated with the keyword of the leaf node. As shown in Fig. 3, the method for updating nodes in the service guide graph in this embodiment includes the following steps:
step 31, obtaining a first question set formed by original questions;
step 32, receiving a first instruction, and determining one node in the plurality of nodes as a node to be updated according to the first instruction;
step 33, receiving a second instruction, and determining a screening node set according to the second instruction;
step 34, screening out, from the first question set, questions that contain the keyword corresponding to each node in the screening node set or an associated expression of the keyword, to obtain a second question set;
step 35, performing word segmentation on each question in the second question set, and removing the keyword corresponding to each node in the screening node set or the associated expression of the keyword, to obtain an alternative word set;
step 36, clustering, according to the segmented words included in the alternative word set, the questions in the second question set that contain those segmented words, to obtain questions of a plurality of categories and a candidate word set formed by the central word corresponding to each category;
step 37, receiving a third instruction, adding a new child node to the node to be updated according to the third instruction, and determining at least one candidate word in the candidate word set indicated by the third instruction as a keyword corresponding to the new child node or an associated expression of the keyword; or receiving a fourth instruction, determining an existing child node of the node to be updated according to the fourth instruction, and determining at least one candidate word in the candidate word set indicated by the fourth instruction as an associated expression of the keyword of the existing child node.
Specific implementations of the above steps are described below.
First, in step 31, a first set of question sentences, made up of original question sentences, is obtained. It is understood that a plurality of question sentences, for example, 100 question sentences or 1000 question sentences, are included in the first set of question sentences.
In the embodiment of the present specification, a source of the first question set is not limited, and for example, the first question set may be obtained by collecting, on line, original questions input by a user within a preset time period.
Next, in step 32, a first instruction is received, and one of the plurality of nodes is determined as a node to be updated according to the first instruction.
It is understood that, on the premise that the service guide map has been constructed with a plurality of nodes, one node may be selected from the plurality of nodes by an operator as a node to be updated. That is to say, it needs to be determined subsequently whether a new child node can be mined out under the node to be updated, and a keyword and an associated expression of the keyword are added to the new child node, or whether an associated expression of the keyword can be added to an existing child node of the node to be updated.
Then, in step 33, a second instruction is received, and a set of screening nodes is determined according to the second instruction.
The screening node set may include only the node to be updated; optionally, it may further include one or more nodes in the path where the node to be updated is located. These nodes are subsequently used to filter and screen the questions.
In one example, according to a second instruction, at least one node in the path of the node to be updated is determined; and taking a set formed by the node to be updated and the at least one node as the screening node set.
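By way of illustration only, the screening node set could be assembled from the path of the node to be updated as sketched below. The names `Node`, `path_to` and `screening_set` are hypothetical, and identifying the chosen path nodes by their keywords is an assumption about how the second instruction is encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keyword: str
    associated: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, target: Node) -> Optional[List[Node]]:
    """Return the nodes from root down to target, or None if target is absent."""
    if root is target:
        return [root]
    for child in root.children:
        sub = path_to(child, target)
        if sub is not None:
            return [root] + sub
    return None


def screening_set(root: Node, to_update: Node, chosen: List[str]) -> List[Node]:
    """The node to be updated plus the path nodes named by the second instruction."""
    path = path_to(root, to_update)
    if path is None:
        raise ValueError("node to be updated is not in the graph")
    # Keep only the ancestors the operator selected (assumed: by keyword).
    ancestors = [n for n in path[:-1] if n.keyword in chosen]
    return ancestors + [to_update]


b = Node("B", ["β"])
c = Node("C")
a = Node("A", ["α"], children=[b, c])
root = Node("root", children=[a])

selected = screening_set(root, b, chosen=["A"])
print([n.keyword for n in selected])  # ['A', 'B']
```

The result is ordered from the shallowest node to the node to be updated, which is convenient if the questions are later filtered layer by layer.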
Then, in step 34, a question including a keyword corresponding to each node in the screening node set or an associated expression of the keyword is screened from the first question set, so as to obtain a second question set.
It will be appreciated that the second question set is a subset of the first question set. Step 34 is in effect a process of narrowing the range of questions, and may be referred to as hierarchical filtering when the screening node set includes a plurality of nodes.
Fig. 4 is a schematic diagram of hierarchical filtering provided in an embodiment of this specification. Hierarchical filtering refers to filtering and screening the questions through the hierarchical structure of the guide graph. Referring to Fig. 4, in the service guide graph, node A has a child node B and a child node C, and the screening node set determined in step 33 includes node A and node B. The keyword and associated expression corresponding to node A are "A, α", and the keyword and associated expression corresponding to node B are "B, β". The leftmost set is the original question set S0. The first layer screens S0 for all questions that contain an associated expression of node A, to obtain S1. The second layer then screens S1; in the illustrated example node B is used as the screening node, to obtain SB2. This process of screening questions layer by layer according to the associated expressions of the nodes is called hierarchical filtering. S0 may be understood as the first question set, and SB2 as the second question set.
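The layer-by-layer screening from S0 to S1 to SB2 can be sketched as follows. This is a hedged illustration: the function name `hierarchical_filter`, the English sample questions, and the use of plain substring matching to decide whether a question "contains" an expression are all assumptions.

```python
from typing import List


def hierarchical_filter(questions: List[str], layers: List[List[str]]) -> List[List[str]]:
    """Filter layer by layer and return the surviving set after each layer.

    `layers` holds, for each screening node, its keyword and associated
    expressions. A question survives a layer if it contains any of them.
    """
    results = [list(questions)]  # results[0] plays the role of S0
    for expressions in layers:
        kept = [q for q in results[-1] if any(e in q for e in expressions)]
        results.append(kept)
    return results


s0 = [
    "how do I repay early",
    "can I pay back ahead of schedule",
    "how do I repay by card",
    "where is my refund",
]
node_a = ["repay", "pay back"]          # keyword and associated expression of node A
node_b = ["early", "ahead of schedule"]  # keyword and associated expression of node B

_, s1, sb2 = hierarchical_filter(s0, [node_a, node_b])
print(len(s1))  # 3
print(sb2)      # ['how do I repay early', 'can I pay back ahead of schedule']
```

Because each layer only ever sees the output of the previous one, the final set contains exactly the questions that match every node in the screening node set.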
Then, in step 35, performing word segmentation processing on each question in the second question set, and removing a keyword or an associated expression of the keyword corresponding to each node in the screening node set to obtain an alternative word set.
For example, if a question includes the segmented words A, B, C and D, and the keywords or associated expressions corresponding to the nodes in the screening node set include B and C, then B and C are removed from the segmented words A, B, C and D, and the segmented words A and D are added to the alternative word set.
The above word segmentation and removal are performed on each question in the second question set to obtain the final alternative word set.
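The A, B, C, D example above can be reproduced with a short sketch. The whitespace-based `segment` function is a stand-in for a real word segmenter, and keeping duplicates in the returned list (so that word frequencies can be counted afterwards) is an assumption; both function names are hypothetical.

```python
from typing import Iterable, List, Set


def segment(question: str) -> List[str]:
    """Stand-in segmenter: split on whitespace (an assumption for this sketch)."""
    return question.split()


def alternative_words(questions: Iterable[str], screening_expressions: Set[str]) -> List[str]:
    """Segment every question and drop the screening nodes' expressions.

    Duplicates are kept on purpose so that frequencies can be counted later.
    """
    words: List[str] = []
    for question in questions:
        words.extend(w for w in segment(question) if w not in screening_expressions)
    return words


print(alternative_words(["A B C D"], {"B", "C"}))             # ['A', 'D']
print(alternative_words(["A B C D", "A C E"], {"B", "C"}))    # ['A', 'D', 'A', 'E']
```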
Then, in step 36, according to the segmented words included in the alternative word set, the questions in the second question set that contain those segmented words are clustered to obtain questions of a plurality of categories and a candidate word set formed by the central word corresponding to each category.
In one example, word frequency statistics are performed on the segmented words included in the alternative word set to obtain high-frequency words whose word frequency is greater than a preset threshold; the questions that include the same high-frequency word are classified into one category, and the high-frequency word is added to the candidate word set as the candidate word corresponding to that category.
The preset threshold may be set according to a policy; for example, the preset threshold may be set to 0, 1, 2, 3 and the like.
Furthermore, the candidate words in the candidate word set may be displayed in descending order of word frequency for the operator to review, and the operator selects one or more of the candidate words as a keyword and its associated expressions.
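A minimal sketch of the word-frequency variant follows. It assumes the questions have already been segmented and stripped of the screening expressions; the function name `cluster_by_frequency` and the sample words are hypothetical, and counting raw occurrences (rather than the number of questions containing a word) is one possible reading of "word frequency".

```python
from collections import Counter
from typing import Dict, List, Tuple


def cluster_by_frequency(
    segmented: List[List[str]], threshold: int
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return candidate words (highest frequency first) and, for each one,
    the indices of the questions that fall into its category."""
    freq = Counter(word for words in segmented for word in words)
    # Keep only words whose frequency is strictly greater than the threshold.
    high = [(w, n) for w, n in freq.most_common() if n > threshold]
    candidates = [w for w, _ in high]
    categories = {
        w: [i for i, words in enumerate(segmented) if w in words]
        for w in candidates
    }
    return candidates, categories


# Segmented questions after the screening expressions were removed.
segmented = [
    ["limit", "increase"],
    ["limit", "check"],
    ["limit", "increase", "fail"],
    ["fee"],
]
candidates, categories = cluster_by_frequency(segmented, threshold=1)
print(candidates)  # ['limit', 'increase']
print(categories)  # {'limit': [0, 1, 2], 'increase': [0, 2]}
```

Note that one question may fall into several categories when it contains more than one high-frequency word, as question 0 does here.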
In another example, density-based clustering is performed on the segmented words included in the alternative word set to obtain a plurality of clusters; the questions that include any segmented word of the same cluster are classified into one category, and the central word of the cluster is added to the candidate word set as the candidate word corresponding to that category.
Furthermore, the candidate words in the candidate word set may be displayed in descending order of the density of the corresponding category for the operator to review, and the operator selects one or more of the candidate words as a keyword and its associated expressions.
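The specification does not name a particular density-based algorithm or word representation. Purely as an assumed instance, the sketch below runs a small DBSCAN over hand-made two-dimensional word vectors, takes the word nearest each cluster centroid as the central word, and uses cluster size as a stand-in for "density" when ordering the candidates. All names, vectors and parameter values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, float]


def dbscan(points: Dict[str, Vector], eps: float, min_pts: int) -> List[List[str]]:
    """Minimal DBSCAN over named points; noise points are left out of the result."""
    names = list(points)

    def neighbours(name: str) -> List[str]:
        return [o for o in names if math.dist(points[name], points[o]) <= eps]

    labels: Dict[str, int] = {}
    clusters: List[List[str]] = []
    for name in names:
        if name in labels:
            continue
        seeds = neighbours(name)
        if len(seeds) < min_pts:
            labels[name] = -1  # noise for now; may still become a border point
            continue
        cid = len(clusters)
        clusters.append([name])
        labels[name] = cid
        queue = [s for s in seeds if s != name]
        while queue:
            other = queue.pop()
            if labels.get(other, -1) != -1:
                continue  # already assigned to some cluster
            labels[other] = cid
            clusters[cid].append(other)
            more = neighbours(other)
            if len(more) >= min_pts:  # core point: keep expanding
                queue.extend(more)
    return clusters


def centre_word(cluster: List[str], points: Dict[str, Vector]) -> str:
    """Assumed definition: the member closest to the cluster centroid."""
    cx = sum(points[w][0] for w in cluster) / len(cluster)
    cy = sum(points[w][1] for w in cluster) / len(cluster)
    return min(cluster, key=lambda w: math.dist(points[w], (cx, cy)))


# Toy vectors standing in for word embeddings.
points = {
    "limit": (0.05, 0.02), "quota": (0.0, 0.0), "cap": (0.1, 0.0),
    "ceiling": (0.05, -0.05),
    "fee": (5.04, 5.0), "charge": (5.1, 5.0), "cost": (5.0, 5.0),
    "weather": (10.0, 0.0),
}
clusters = sorted(dbscan(points, eps=0.3, min_pts=2), key=len, reverse=True)
candidates = [centre_word(c, points) for c in clusters]

# Questions (already segmented) containing any word of a cluster share a category.
questions = [["limit"], ["quota", "charge"], ["cost"], ["weather"]]
categories = {
    centre: [i for i, words in enumerate(questions) if set(words) & set(cluster)]
    for centre, cluster in zip(candidates, clusters)
}
print(candidates)  # ['limit', 'fee']
print(categories)  # {'limit': [0, 1], 'fee': [1, 2]}
```

In practice the vectors would come from whatever word representation the deployment uses, and `eps` and `min_pts` would need tuning for that representation.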
Finally, in step 37, a third instruction is received, a new child node is added to the node to be updated according to the third instruction, and at least one candidate word in the candidate word set indicated by the third instruction is determined as a keyword corresponding to the new child node or an associated expression of the keyword; or a fourth instruction is received, an existing child node of the node to be updated is determined according to the fourth instruction, and at least one candidate word in the candidate word set indicated by the fourth instruction is determined as an associated expression of the keyword of the existing child node.
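The two alternatives of step 37 might be sketched as below. The function names `add_child` and `extend_child` are hypothetical, and treating the first selected candidate word as the keyword of the new child node (with the rest as associated expressions) is an assumption; the specification leaves that choice to the instruction.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keyword: str
    associated: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def add_child(parent: Node, chosen: List[str]) -> Node:
    """Third instruction: create a new child node from the chosen candidate words."""
    if not chosen:
        raise ValueError("at least one candidate word is required")
    child = Node(keyword=chosen[0], associated=list(chosen[1:]))
    parent.children.append(child)
    return child


def extend_child(parent: Node, child_keyword: str, chosen: List[str]) -> Node:
    """Fourth instruction: add the chosen candidate words to an existing child."""
    for child in parent.children:
        if child.keyword == child_keyword:
            for word in chosen:
                if word != child.keyword and word not in child.associated:
                    child.associated.append(word)
            return child
    raise KeyError(f"no existing child with keyword {child_keyword!r}")


repay = Node("repayment", children=[Node("early", ["in advance"])])
add_child(repay, ["limit", "quota"])
extend_child(repay, "early", ["ahead of schedule", "in advance"])
print([(c.keyword, c.associated) for c in repay.children])
# [('early', ['in advance', 'ahead of schedule']), ('limit', ['quota'])]
```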
In one example, after step 37, a fifth instruction is received, the new child node is determined to be a leaf node according to the fifth instruction, and a standard question associated with the keyword of the leaf node is mounted on the leaf node.
In another example, after step 37, a sixth instruction is received, and it is determined according to the sixth instruction that the new child node is not a leaf node; the method then returns to the step of receiving a first instruction and determining one node in the plurality of nodes as a node to be updated according to the first instruction, wherein the node to be updated is the new child node. That is, mining continues on the newly added child node.
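The branch between the fifth and sixth instructions can be illustrated with a small helper. The name `handle_new_child` and the boolean encoding of the two instructions are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keyword: str
    children: List["Node"] = field(default_factory=list)
    standard_question: Optional[str] = None


def handle_new_child(
    child: Node, is_leaf: bool, standard_question: Optional[str] = None
) -> Optional[Node]:
    """Return the next node to be updated, or None when mining stops at a leaf."""
    if is_leaf:  # fifth instruction: mount a standard question on the leaf
        if standard_question is None:
            raise ValueError("a leaf node needs a standard question to mount")
        child.standard_question = standard_question
        return None
    return child  # sixth instruction: this node becomes the node to be updated


limit = Node("limit")
next_node = handle_new_child(limit, is_leaf=False)

increase = Node("increase")
limit.children.append(increase)
done = handle_new_child(
    increase, is_leaf=True, standard_question="how to increase the repayment limit"
)
print(next_node is limit, done)  # True None
```

Returning the child in the non-leaf case lets the caller feed it straight back into the step that selects the node to be updated.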
By the method provided by the embodiment of the specification, the initially acquired first question set is filtered and screened by using the existing structure of the service guide graph, so that the difference between the questions of the users is displayed, the clustering result is relatively accurate, the workload of operators is reduced, and the efficiency can be improved when node mining is performed in the service guide graph.
Fig. 5 is a schematic diagram of interactive mining provided in an embodiment of this specification. Referring to Fig. 5, in interactive mining, while constructing the guide graph an operator selects the node whose child nodes are currently to be mined, and selects the necessary elements in the path where that node is located for filtering (shown in the figure as "flower" and "repayment"). An algorithm then filters out, from the original questions, all questions that contain the associated expressions of the selected nodes; the filtering result is S'. Finally, word frequencies are counted over S' after the necessary elements used for filtering have been removed. The right-hand column displays the keywords under the node in descending order of word frequency. The operator can then select several keywords to create a new child node, and the selected keywords automatically become the associated expressions of the new node. This whole process is called interactive mining.
Fig. 6 is a flowchart of discovering new standard questions for the guide graph provided in an embodiment of this specification. Referring to Fig. 6, the original questions are provided first, then the node to be updated (that is, the node to be mined) is selected and the necessary elements are selected; the purpose of mining is to find new nodes and new standard questions. In the embodiments of this specification manual judgment is required, and the whole flow has two judgment conditions: whether a high-frequency word already exists as a child node, and whether a node needs to be subdivided. For the first condition, if the high-frequency word already exists, it is merged into the associated expressions of the existing node; if it does not exist, a corresponding child node is created. For the second condition, the operator observes whether the questions remaining at the end node after hierarchical filtering in the guide graph mostly describe the same problem; if they describe the same problem, no further subdivision is needed, and if they describe a plurality of problems, the node can be subdivided further.
According to another embodiment, an apparatus for updating nodes in a service guide graph is further provided, where the service guide graph includes a plurality of nodes organized into a tree-like hierarchical structure according to service dimensions, each node corresponds to a keyword and an associated expression of the keyword, and a leaf node of the service guide graph is mounted with a standard question associated with the keyword of the leaf node. Fig. 7 shows a schematic block diagram of an apparatus for updating nodes in a service guide graph according to an embodiment. As shown in Fig. 7, the apparatus 700 includes:
an obtaining unit 71, configured to obtain a first question set formed by original questions;
a determining unit 72, configured to receive a first instruction, and determine one node of the plurality of nodes as a node to be updated according to the first instruction; receiving a second instruction, and determining a screening node set according to the second instruction;
a screening unit 73, configured to screen out, from the first question set acquired by the acquiring unit 71, a question that includes a keyword or an associated expression of the keyword, where the keyword corresponds to each node in the screening node set determined by the determining unit 72, to obtain a second question set;
a word segmentation unit 74, configured to perform word segmentation processing on each question in the second question set obtained by the screening unit 73, and eliminate a keyword or an associated expression of the keyword corresponding to each node in the screening node set determined by the determination unit 72, so as to obtain an alternative word set;
a clustering unit 75, configured to cluster, according to the segmented words included in the alternative word set obtained by the word segmentation unit 74, the questions in the second question set obtained by the screening unit 73 that contain those segmented words, so as to obtain questions of a plurality of categories and a candidate word set formed by the central word corresponding to each category;
a node updating unit 76, configured to receive a third instruction, add a new-added child node to the node to be updated determined by the determining unit 72 according to the third instruction, and determine at least one candidate word in the candidate word set obtained by the clustering unit 75 indicated by the third instruction as a keyword corresponding to the new-added child node or an associated expression of the keyword; or receiving a fourth instruction, determining an existing child node of the node to be updated according to the fourth instruction, and determining at least one candidate word in the candidate word set indicated by the fourth instruction as an associated expression of a keyword of the existing child node.
Optionally, as an embodiment, the determining unit 72 is specifically configured to:
determining at least one node in the path of the node to be updated according to the second instruction;
and taking a set formed by the node to be updated and the at least one node as the screening node set.
Optionally, as an embodiment, the clustering unit 75 is specifically configured to:
performing word frequency statistics on the segmented words included in the alternative word set to obtain high-frequency words whose word frequency is greater than a preset threshold;
classifying the questions that include the same high-frequency word into one category, and adding the high-frequency word to the candidate word set as the candidate word corresponding to that category.
Further, the apparatus further comprises:
and the display unit is used for displaying all candidate words in the candidate word set from high to low according to the word frequency.
Optionally, as an embodiment, the clustering unit 75 is specifically configured to:
performing density-based clustering on the segmented words included in the alternative word set to obtain a plurality of clusters;
classifying the questions that include any segmented word of the same cluster into one category, and adding the central word of the cluster to the candidate word set as the candidate word corresponding to that category.
Further, the apparatus further comprises:
and the display unit is used for sequentially displaying all candidate words in the candidate word set from high to low according to the density of the corresponding category.
Optionally, as an embodiment, the node updating unit 76 is further configured to receive a fifth instruction, determine that the new child node is a leaf node according to the fifth instruction, and mount, for the leaf node, a standard question associated with the keyword of the leaf node.
Optionally, as an embodiment, the node updating unit 76 is further configured to receive a sixth instruction, and determine that the new child node is not a leaf node according to the sixth instruction;
the determining unit 72 is further configured to receive a first instruction, and determine one node of the plurality of nodes as a node to be updated according to the first instruction, where the node to be updated is the new child node.
By the device provided by the embodiment of the specification, the initially acquired first question set is filtered and screened by utilizing the existing structure of the service guide diagram, so that the difference between the questions of the users can be displayed, the clustering result is relatively accurate, the workload of operators is reduced, and the efficiency can be improved when node mining is performed in the service guide diagram.
In the embodiment of the specification, not only can the efficiency be improved, but also the following effects can be brought correspondingly.
On one hand, to address the problem that text similarity is strongly influenced by text content, the difference between two questions can be brought out gradually through hierarchical filtering. For example, question 1 is "eating one apple every day has great benefit to maintain human health", and question 2 is "eating one banana every day has great benefit to maintain human health". Hierarchical filtering may proceed along a path such as human -> health -> benefit -> …. The closer to the end node, the more of the question content has been filtered out by the preceding nodes, the shorter the remaining text, and the larger the relative difference, so that the two questions are gradually distinguished.
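The effect described above can be checked numerically on the two example questions. The sketch below uses Jaccard similarity over whitespace tokens, which is an assumed similarity measure chosen only for illustration, and removes the three path words named in the text one layer at a time.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of distinct words that the two word sets have in common."""
    return len(a & b) / len(a | b)


q1 = "eating one apple every day has great benefit to maintain human health".split()
q2 = "eating one banana every day has great benefit to maintain human health".split()

path = ["human", "health", "benefit"]  # the path nodes named in the text

scores: List[float] = [jaccard(set(q1), set(q2))]  # before any filtering
removed: Set[str] = set()
for node_word in path:
    removed.add(node_word)
    # Words already consumed by the preceding nodes no longer count.
    scores.append(jaccard(set(q1) - removed, set(q2) - removed))

print([round(s, 3) for s in scores])  # [0.846, 0.833, 0.818, 0.8]
```

The similarity falls at every layer because the shared words are stripped away while "apple" and "banana" remain, so the one differing word carries more weight in the shorter remaining text.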
On another hand, to address the problem that methods based on clustering text content cannot control granularity, the method of hierarchical text filtering combined with word frequency statistics does not require the granularity to be set explicitly; the differences between texts emerge naturally as the hierarchy deepens.
On yet another hand, to address the problem that the matching model and the clustering model are unrelated to each other, in the embodiments of this specification the structure of the guide graph directly participates in matching; as the business optimizes the guide graph to improve the matching rate, hierarchical filtering based on the guide graph also becomes more accurate.
On still another hand, different expressions of the same standard question may be clustered into different clusters; operators can continuously expand the associated expressions of the nodes of the guide graph, so that identical questions that were divided into different clusters can be filtered to the same node through the continuously expanded associated expressions.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with any of fig. 3 to 6.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in conjunction with any of fig. 3 to 6.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer-readable medium.
The above embodiments further describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that the above embodiments are only exemplary embodiments of the present invention and are not intended to limit the scope of the present invention; any modification, equivalent substitution, improvement and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.