CN111737445B

CN111737445B - Knowledge base searching method and device

Info

Publication number: CN111737445B
Application number: CN202010572936.7A
Authority: CN
Inventors: 申亚坤
Original assignee: Bank of China Ltd
Current assignee: Bank of China Ltd
Priority date: 2020-06-22
Filing date: 2020-06-22
Publication date: 2023-09-01
Anticipated expiration: 2040-06-22
Also published as: CN111737445A

Abstract

The application provides a knowledge base searching method and device. The method comprises: receiving a current character string sent by an agent terminal; performing a word segmentation operation on the current character string to obtain a current word segmentation set; calculating a first similarity between the current word segmentation set and each knowledge point document; extracting, from the marking field of each knowledge point document, the personalized marking word segmentation set corresponding to the agent terminal; calculating a second similarity between the current word segmentation set and the personalized marking word segmentation set of each knowledge point document; calculating a comprehensive similarity for each knowledge point document based on its first similarity and second similarity; and pushing a plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity. Because knowledge point documents are recommended based on the comprehensive similarity, the documents finally recommended can be more accurate.

Description

Knowledge base searching method and device

Technical Field

The present application relates to the field of communications technologies, and in particular, to a method and an apparatus for searching a knowledge base.

Background

When an agent provides service, the agent retrieves the required knowledge from the knowledge base, and the quality of that retrieval determines the quality of service. If the search is efficient and precise, customer satisfaction is high; if the search is inefficient and imprecise, customer satisfaction is low.

In the current process of searching a knowledge base by an agent, keywords of knowledge points are generally input for searching, and the knowledge base can find one or more knowledge points related to the keywords for the agent to check.

However, different agents describe the same knowledge point differently when searching, and different descriptions can return very different knowledge points. As a result, the accuracy of the current keyword-based knowledge base search scheme is low.

Disclosure of Invention

In view of the above, the application provides a knowledge base searching method and device, which can improve the searching accuracy.

In order to achieve the above object, the present application provides the following technical features:

a knowledge base searching method, comprising:

receiving a current character string sent by an agent terminal;

performing a word segmentation operation on the current character string to obtain a current word segmentation set;

calculating a first similarity between the current word segmentation set and each knowledge point document;

extracting, from the marking field of each knowledge point document, the personalized marking word segmentation set corresponding to the agent terminal;

calculating a second similarity between the current word segmentation set and the personalized marking word segmentation set of each knowledge point document;

calculating a comprehensive similarity for each knowledge point document based on the first similarity and the second similarity of that knowledge point document;

and pushing a plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity.

Optionally, the calculating the first similarity between the current word segmentation set and each knowledge point document includes:

and calculating the first similarity between the current word segmentation set and each knowledge point document by using a TF-IDF algorithm.

Optionally, after the pushing of the plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity, the method further includes:

receiving an adding instruction which is sent by the agent terminal and contains a knowledge point document identifier;

and adding the current word segmentation set into the personalized marking word segmentation set corresponding to the agent terminal in the marking field of the corresponding knowledge point document.

Optionally, before the receiving of the current character string sent by the agent terminal, the method further includes:

receiving a history character string sent by the agent terminal;

performing a word segmentation operation on the history character string to obtain a history word segmentation set;

calculating the similarity between the history word segmentation set and each knowledge point document;

pushing a plurality of knowledge point documents to the agent terminal in descending order of similarity;

and adding the history word segmentation set into the personalized marking word segmentation set corresponding to the agent terminal in the marking field of the corresponding knowledge point document.

Optionally, after receiving the current character string sent by the seat terminal, the method further includes:

and executing preprocessing operation on the current character string.

A knowledge base searching apparatus comprising:

a receiving unit, used for receiving the current character string sent by the agent terminal;

a word segmentation unit, used for performing a word segmentation operation on the current character string to obtain a current word segmentation set;

a first computing unit, used for computing the first similarity between the current word segmentation set and each knowledge point document;

an extraction unit, used for extracting, from the marking field of each knowledge point document, the personalized marking word segmentation set corresponding to the agent terminal;

a second computing unit, used for computing the second similarity between the current word segmentation set and the personalized marking word segmentation set of each knowledge point document;

a third computing unit, used for computing the comprehensive similarity of each knowledge point document based on the first similarity and the second similarity of that knowledge point document;

and a pushing unit, used for pushing a plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity.

Optionally, the first computing unit is specifically used for calculating the first similarity between the current word segmentation set and each knowledge point document by using a TF-IDF algorithm.

Optionally, after the pushing unit, the apparatus further includes:

an adding unit, used for receiving an adding instruction which is sent by the agent terminal and contains a knowledge point document identifier, and for adding the current word segmentation set into the personalized marking word segmentation set corresponding to the agent terminal in the marking field of the corresponding knowledge point document.

Optionally, before the receiving unit, the apparatus further includes:

a construction unit, used for receiving the history character string sent by the agent terminal; performing a word segmentation operation on the history character string to obtain a history word segmentation set; calculating the similarity between the history word segmentation set and each knowledge point document; pushing a plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity; receiving an adding instruction which is sent by the agent terminal and contains a knowledge point document identifier; and adding the history word segmentation set into the personalized marking word segmentation set corresponding to the agent terminal in the marking field of the corresponding knowledge point document.

Optionally, after the receiving unit, the apparatus further includes:

a preprocessing unit, used for executing a preprocessing operation on the current character string.

Through the technical means, the following beneficial effects can be realized:

The application provides a knowledge base searching method which calculates the first similarity between a current character string and each knowledge point document, calculates the second similarity between the current character string and the personalized marking word segmentation set in each knowledge point document, calculates the comprehensive similarity of each knowledge point document based on its first similarity and second similarity, and pushes a plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity.

According to the method, not only is the first similarity of the current character string and the knowledge point document calculated, but also the second similarity of the current character string and the personalized marking word segmentation set in the knowledge point document is calculated, and the first similarity and the second similarity are combined with each other, so that the comprehensive similarity is obtained.

Recommending the knowledge point document based on the comprehensive similarity can enable the knowledge point document obtained by final recommendation to be more accurate.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a knowledge base searching system according to an embodiment of the present application;

FIG. 2 is a flow chart of a method for adding a personalized marking word segmentation set according to an embodiment of the present application;

FIG. 3 is a flow chart of a knowledge base searching method according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a knowledge base searching device according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of another knowledge base searching device according to an embodiment of the present application.

Detailed Description

Term interpretation:

The main idea of TF-IDF is that if a word or phrase appears with a high term frequency (TF) in one article and rarely appears in other articles, the word or phrase is considered to have good class-distinguishing ability and to be suitable for classification.

TF-IDF is actually TF × IDF, where TF is the term frequency and IDF is the inverse document frequency. TF represents the frequency with which a term occurs in document d.

The main idea of IDF is: the fewer the documents that contain the term t, i.e., the smaller n is, the larger the IDF, and the better the class-distinguishing ability of the term t. If the number of documents containing the term t in a certain class of documents C is m, and the total number of documents containing t in other classes is k, then the number of all documents containing t is n = m + k. When m is large, n is also large, and the IDF value obtained from the IDF formula is small, which indicates that the term t has weak class-distinguishing ability.

In a given document, the term frequency (TF) is the frequency with which a given word appears in the document. This number is the term count normalized by the number of words in the document, to prevent a bias towards long documents.

The inverse document frequency (IDF) is a measure of the general importance of a word. The IDF of a particular word may be obtained by dividing the total number of documents by the number of documents containing the word and taking the logarithm of the quotient.

A high term frequency within a particular document, together with a low document frequency of that term across the whole document collection, yields a high TF-IDF weight. Thus, TF-IDF tends to filter out common words and preserve important words.
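The TF and IDF definitions above can be illustrated with a minimal Python sketch. The function names, the toy corpus, and the choice to return 0 for a term that appears in no document are assumptions made for illustration only; the application does not prescribe a particular implementation.

```python
import math
from collections import Counter


def term_frequency(term, doc_tokens):
    """TF: occurrences of `term` divided by the number of words in the document."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def inverse_document_frequency(term, corpus):
    """IDF: log(total documents / documents containing `term`).

    Returning 0.0 when no document contains the term is an assumption of
    this sketch; it simply avoids a division by zero.
    """
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0
    return math.log(len(corpus) / containing)


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF = TF x IDF."""
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, corpus)


# Hypothetical toy corpus: each document is already split into words.
corpus = [
    ["card", "loss", "report", "card"],
    ["loan", "rate", "report"],
    ["transfer", "limit", "rate"],
]

rare = tf_idf("card", corpus[0], corpus)      # "card" occurs in one document only
common = tf_idf("report", corpus[0], corpus)  # "report" occurs in two documents
```

In this toy corpus, "card" is both frequent in the first document and rare elsewhere, so it receives a higher weight than "report".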

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Referring to fig. 1, the present application provides a knowledge base searching system including: a plurality of agent terminals 100 and a server 200.

Referring to fig. 2, the application provides a method for adding a personalized marking word segmentation set, which comprises the following steps:

Step S201: receiving the history character string sent by the agent terminal.

The server receives the character string sent by the agent terminal; it is called the history character string to distinguish it from the current character string described later.

The history character string is preprocessed; for example, pinyin is converted into Chinese characters, formatting carried over from pasted web content is removed, punctuation marks are removed, and so on.

Step S202: executing a word segmentation operation on the history character string to obtain a history word segmentation set.

A dedicated word segmenter performs the word segmentation operation on the history character string to obtain the history word segmentation set, which includes a plurality of segmented words.

Step S203: calculating the similarity between the history word segmentation set and each knowledge point document.

To calculate the similarity between the history word segmentation set and each knowledge point document, a TF-IDF algorithm may be used.

Step S204: pushing a plurality of knowledge point documents to the agent terminal in descending order of similarity.

The knowledge point documents are ordered from high similarity to low; that is, the higher the similarity of a knowledge point document, the higher it ranks, and the lower its similarity, the lower it ranks.

To make viewing easier for the agent, a plurality of knowledge point documents can be selected in descending order of similarity and pushed to the agent terminal, so that the agent can view them and select the knowledge point document that is actually needed for the history character string.

Step S205: receiving an adding instruction which is sent by the agent terminal and contains a knowledge point document identifier.

After viewing the knowledge point documents, the agent can select the knowledge point document that is needed and that corresponds to the history character string. If the agent wants that knowledge point document to appear again the next time the same history character string is searched, the agent can perform a marking operation on the knowledge point document.

The agent terminal can send an adding instruction containing a knowledge point document identifier so as to add the history word segmentation set into the marking field of the knowledge point document.

Step S206: adding the history word segmentation set into the personalized marking word segmentation set corresponding to the agent terminal in the marking field of the corresponding knowledge point document.

Different agents have different habits of textual description, so different agents use different history character strings to reach the same knowledge point document. To capture these personal differences and accommodate the habits of different agents, a separate marking word segmentation set is built for each agent in the marking field. The history word segmentation set is therefore added to the personalized marking word segmentation set that corresponds to the agent terminal, within the marking field of the knowledge point document.
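The marking flow of steps S201 to S206 can be sketched as follows. This is only an illustration under stated assumptions: the `KnowledgePointDocument` class, the layout of the marking field as a mapping from agent terminal identifier to a set of words, the whitespace-based `segment` placeholder (standing in for a real word segmenter), and all identifiers are hypothetical. Pinyin conversion is omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnowledgePointDocument:
    doc_id: str
    tokens: list
    # Marking field (assumed layout): agent terminal id -> personalized
    # marking word segmentation set for that agent terminal.
    marking_field: dict = field(default_factory=dict)


def preprocess(text):
    """Remove pasted markup tags and punctuation (pinyin conversion omitted)."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop tags left over from pasted web content
    text = re.sub(r"[^\w\s]", " ", text)  # drop punctuation marks
    return text.strip()


def segment(text):
    """Placeholder segmenter: lowercases and splits on whitespace."""
    return text.lower().split()


def add_marking(doc, agent_id, word_set):
    """Step S206: add the words to this agent's personalized marking set."""
    doc.marking_field.setdefault(agent_id, set()).update(word_set)


# Hypothetical usage: agent terminal "agent-7" marks document "kp-001".
doc = KnowledgePointDocument("kp-001", ["card", "loss", "report"])
history_words = segment(preprocess("<b>Lost</b> my card!"))
add_marking(doc, "agent-7", history_words)
```

Keeping one set per agent terminal means that words added by one agent never influence the results shown to another agent.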

Referring to fig. 3, the present application provides a knowledge base searching method, which is applied to the server shown in fig. 1, and the method includes:

Step S301: receiving the current character string sent by the agent terminal.

The agent can input the current character string in the search box of the agent terminal, and the server receives the current character string sent by the agent terminal.

The current character string is preprocessed; for example, pinyin is converted into Chinese characters, formatting carried over from pasted web content is removed, punctuation marks are removed, and so on.

Step S302: executing a word segmentation operation on the current character string to obtain a current word segmentation set.

Step S303: calculating the first similarity between the current word segmentation set and each knowledge point document.

Taking one knowledge point document as an example, a TF-IDF algorithm is used to calculate, for each segmented word in the current word segmentation set, its TF-IDF value in the knowledge point document, and the sum of the TF-IDF values of all the segmented words is taken as the first similarity between the current word segmentation set and the knowledge point document.

Other knowledge point documents are processed in the same way.

Step S304: extracting, from the marking field of each knowledge point document, the personalized marking word segmentation set corresponding to the agent terminal.

The marking field of each knowledge point document holds a personalized marking word segmentation set for each agent terminal. Setting a personalized marking word segmentation set per agent terminal makes knowledge point documents easier to find, because it stores the segmented words of the character strings that each agent terminal has entered according to its own usage habits.

Step S305: calculating the second similarity between the current word segmentation set and the personalized marking word segmentation set of each knowledge point document.

The second similarity between the current word segmentation set and the personalized marking word segmentation set of each knowledge point document is calculated using a similarity measure between segmented words.

Step S306: calculating the comprehensive similarity of each knowledge point document based on the first similarity and the second similarity of that knowledge point document.

The first similarity is calculated based on the knowledge point document, the second similarity is calculated based on the personalized marking word segmentation set, and the comprehensive similarity of each knowledge point document can be obtained by superposing the first similarity and the second similarity.

Step S307: pushing a plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity.

Optionally, the server may further receive an adding instruction which is sent by the agent terminal and contains a knowledge point document identifier, and add the current word segmentation set into the personalized marking word segmentation set corresponding to the agent terminal in the marking field of the corresponding knowledge point document, so as to enrich the personalized marking word segmentation set.
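Steps S303 to S307 can be sketched in Python as follows. Several points are assumptions of this sketch rather than details given in the application: the second similarity is computed as a Jaccard overlap between word sets (the text only says a similarity measure between segmented words is used), the superposition is taken to be a plain sum, documents are represented as dictionaries, and all identifiers and the `top_n` parameter are hypothetical.

```python
import math
from collections import Counter


def first_similarity(query_tokens, doc_tokens, corpus):
    """Sum of the TF-IDF values, in this document, of each query word."""
    counts = Counter(doc_tokens)
    score = 0.0
    for term in set(query_tokens):
        containing = sum(1 for d in corpus if term in d)
        if containing == 0 or not doc_tokens:
            continue  # a word absent from every document contributes nothing
        tf = counts[term] / len(doc_tokens)
        idf = math.log(len(corpus) / containing)
        score += tf * idf
    return score


def second_similarity(query_tokens, marking_set):
    """Jaccard overlap (an assumed stand-in for the unspecified measure)."""
    query_set = set(query_tokens)
    if not query_set or not marking_set:
        return 0.0
    return len(query_set & marking_set) / len(query_set | marking_set)


def search(query_tokens, agent_id, documents, top_n=3):
    """Rank documents by comprehensive similarity (assumed: first + second)."""
    corpus = [d["tokens"] for d in documents]
    scored = []
    for d in documents:
        s1 = first_similarity(query_tokens, d["tokens"], corpus)
        s2 = second_similarity(query_tokens, d["marking_field"].get(agent_id, set()))
        scored.append((s1 + s2, d["doc_id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_n]]


# Hypothetical knowledge base; "agent-7" previously marked kp-002.
documents = [
    {"doc_id": "kp-001", "tokens": ["card", "loss", "report"], "marking_field": {}},
    {"doc_id": "kp-002", "tokens": ["card", "freeze", "unfreeze"],
     "marking_field": {"agent-7": {"lost", "card"}}},
    {"doc_id": "kp-003", "tokens": ["loan", "rate"], "marking_field": {}},
]

ranked = search(["lost", "card"], "agent-7", documents)
```

For "agent-7", the earlier marking lifts kp-002 above kp-001 even though both documents have the same TF-IDF score for this query; an agent terminal without markings would see the two documents tied on the first similarity alone.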

The present application provides a first embodiment of a knowledge base searching device which, referring to fig. 4, includes:

a receiving unit 41, configured to receive a current character string sent by the agent terminal;

a word segmentation unit 42, configured to perform a word segmentation operation on the current character string to obtain a current word segmentation set;

a first calculating unit 43, configured to calculate a first similarity between the current word segmentation set and each knowledge point document;

an extracting unit 44, configured to extract, from the marking field of each knowledge point document, the personalized marking word segmentation set corresponding to the agent terminal;

a second calculating unit 45, configured to calculate a second similarity between the current word segmentation set and the personalized marking word segmentation set of each knowledge point document;

a third calculating unit 46, configured to calculate the comprehensive similarity of each knowledge point document based on the first similarity and the second similarity of that knowledge point document;

and a pushing unit 47, configured to push the plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity.

The first calculating unit 43 is specifically configured to calculate the first similarity between the current word segmentation set and each knowledge point document by using a TF-IDF algorithm.

The present application provides a second embodiment of a knowledge base searching device. Referring to fig. 5, the second embodiment further includes, after the pushing unit 47:

an adding unit 48, configured to receive an adding instruction which is sent by the agent terminal and contains a knowledge point document identifier, and to add the current word segmentation set into the personalized marking word segmentation set corresponding to the agent terminal in the marking field of the corresponding knowledge point document.

Before the receiving unit 41, the second embodiment further includes:

a construction unit 40, configured to receive a history character string sent by the agent terminal; perform a word segmentation operation on the history character string to obtain a history word segmentation set; calculate the similarity between the history word segmentation set and each knowledge point document; push a plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity; receive an adding instruction which is sent by the agent terminal and contains a knowledge point document identifier; and add the history word segmentation set into the personalized marking word segmentation set corresponding to the agent terminal in the marking field of the corresponding knowledge point document.

Optionally, after the receiving unit 41, the device further includes a preprocessing unit, configured to execute a preprocessing operation on the current character string.

The functions described in the method of this embodiment, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and comprises several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, etc.) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by cross-reference.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A knowledge base searching method, comprising:

receiving a current character string sent by an agent terminal;

pushing a plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity;

before the receiving of the current character string sent by the agent terminal, the method further comprises the following steps:

receiving a history character string sent by the agent terminal;

2. The method of claim 1, wherein said calculating a first similarity of the current segmentation set to each knowledge point document comprises:

3. The method of claim 2, further comprising, after pushing the plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity:

4. The method of claim 1, further comprising, after receiving the current string transmitted by the agent terminal:

and executing preprocessing operation on the current character string.

5. A knowledge base searching apparatus, comprising:

a pushing unit, used for pushing a plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity;

further comprising:

a construction unit, used for receiving the history character string sent by the agent terminal before the current character string sent by the agent terminal is received; performing a word segmentation operation on the history character string to obtain a history word segmentation set; calculating the similarity between the history word segmentation set and each knowledge point document; pushing a plurality of knowledge point documents to the agent terminal in descending order of comprehensive similarity; receiving an adding instruction which is sent by the agent terminal and contains a knowledge point document identifier; and adding the history word segmentation set into the personalized marking word segmentation set corresponding to the agent terminal in the marking field of the corresponding knowledge point document.

6. The apparatus of claim 5, wherein the first computing unit is specifically used for calculating the first similarity between the current word segmentation set and each knowledge point document by using a TF-IDF algorithm.

7. The apparatus as recited in claim 6, further comprising:

an adding unit, used for receiving an adding instruction which is sent by the agent terminal and contains a knowledge point document identifier after the plurality of knowledge point documents are pushed to the agent terminal in descending order of comprehensive similarity; and adding the current word segmentation set into the personalized marking word segmentation set corresponding to the agent terminal in the marking field of the corresponding knowledge point document.

8. The apparatus as recited in claim 5, further comprising:

a preprocessing unit, used for executing a preprocessing operation on the current character string after the current character string sent by the agent terminal is received.