CN116108141A

CN116108141A - Similar question searching mechanism under interview scene

Info

Publication number: CN116108141A
Application number: CN202310159297.5A
Authority: CN
Inventors: 厉程
Original assignee: Hangzhou Shuangshi Technology Co ltd
Current assignee: Hangzhou Shuangshi Technology Co ltd
Priority date: 2023-02-24
Filing date: 2023-02-24
Publication date: 2023-05-12

Abstract

The invention provides a similar question retrieval mechanism in an interview scenario and relates to the technical field of interview scenarios. The mechanism comprises construction of a prior knowledge base and construction of an offline knowledge base, and comprises the following steps: S1, standard question construction; S2, similar question expansion; S3, offline question mapping; S4, online question retrieval. The similar question expansion is built on the standard questions and mainly consists of different ways of asking the same standard question. Knowledge base construction takes a question (query) and a knowledge base name (source) as input through a service interface call. The knowledge base contains not only the standard sentences and similar sentences commonly recognized in the industry, but also high-quality questions matched from historical interview records. Through this association, matching accuracy and retrieval time can be optimized at the same time.

Description

Similar question searching mechanism under interview scene

Technical Field

The invention relates to the technical field of interview scenarios, and in particular to a similar question retrieval mechanism in an interview scenario.

Background

Existing approaches to similar question retrieval in an interview scenario fall into two categories. The first is text clustering, which groups questions by their mutual similarity and is an unsupervised, exploratory method. Such methods first vectorize the sentences, then compute similarity from the distance between vectors, and then cluster them, for example with K-means. Questions that fall into the same cluster are regarded as similar questions. The second is text similarity retrieval, in which a target standard sentence whose matching degree exceeds a certain threshold is taken as the retrieval result and treated as a similar question. This idea is widely applied in fields such as intelligent outbound calling and intelligent customer service.

The main problem with text clustering algorithms, however, is low accuracy. Because clustering is a form of unsupervised learning, it lacks human prior knowledge and is less accurate than supervised learning. In addition, many sentences are similar in form while differing greatly in meaning. For example, question 1: "May I ask, what is the difference between HTTP and HTTPS?" Question 2: "May I ask, what is the difference between KPI and OKR?" These two questions are identical except for the terms HTTP, HTTPS, KPI, and OKR, so clustering easily places them in the same category. Moreover, because an interview is a highly colloquial context, a large number of meaningless filler words appear. Interviews are conducted by video, so words may be dropped during network transmission, and speech recognition errors compound the problem. If clustering is applied directly, a large number of meaningless categories and meaningless questions appear, and accuracy is extremely low.
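The HTTP/KPI example can be made concrete with a small sketch. The word-overlap (Jaccard) measure below is only an illustrative stand-in for a surface-level similarity; the source does not say which similarity the clustering methods use, and the function name `token_jaccard` is hypothetical.

```python
import re


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two sentences."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


# Two questions that share their template but ask about unrelated topics.
q1 = "What is the difference between HTTP and HTTPS?"
q2 = "What is the difference between KPI and OKR?"
overlap = token_jaccard(q1, q2)
print(f"surface overlap: {overlap:.2f}")
```

Six of the ten distinct words are shared, so a purely surface-level measure rates the two questions as close even though one concerns network protocols and the other performance management.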

Text similarity retrieval suffers from both low accuracy and long running time. First, the number of historical questions is huge, and they are affected by colloquial speech, network lag, speech recognition accuracy, and other factors, so many of them are meaningless. Retrieving directly from the historical questions inevitably matches a large number of meaningless results. Second, suppose Q questions are to be retrieved and there are H historical questions. Because each query must be matched against the historical questions one by one and the results sorted by score, the time complexity is O(Q×H).

Disclosure of Invention

(I) Technical problems to be solved

In view of the shortcomings of the prior art, the invention provides a similar question retrieval mechanism in an interview scenario, which solves the problem of retrieving and matching similar questions in an interview scenario.

(II) Technical solution

To achieve the above purpose, the invention is realized by the following technical solution: a similar question retrieval mechanism in an interview scenario comprises construction of a prior knowledge base and construction of an offline knowledge base, and comprises the following steps:

S1, standard question construction;

S2, similar question expansion;

S3, offline question mapping;

S4, online question retrieval.

Preferably, the similar question expansion is an expansion constructed on the basis of the standard questions and mainly comprises different ways of asking the same standard question.

Further, knowledge base construction takes a question (query) and a knowledge base name (source) as input by calling a service interface, and determines from source whether a knowledge base index file exists.

Furthermore, the offline question mapping consists of four steps: data acquisition/knowledge base construction, standard sentence matching, post-processing, and manual auditing.

Further, regarding time complexity, consider Q questions matched one by one against H historical questions; the time complexity is O(Q×H).

(III) Beneficial effects

The invention provides a similar question retrieval mechanism in an interview scenario, with the following beneficial effects: through a series of cleaning and matching operations, the questions in past interview records are mapped to the standard questions, thereby building a knowledge base of interview questions. The knowledge base contains not only the standard sentences and similar sentences commonly recognized in the industry, but also high-quality questions matched from historical interview records. Through this association, matching accuracy and retrieval time can be optimized at the same time.

Drawings

FIG. 1 is a flow chart of the overall steps of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Embodiment one:

As shown in FIG. 1, this embodiment of the invention provides a similar question retrieval mechanism in an interview scenario, comprising construction of a prior knowledge base and construction of an offline knowledge base; the method focuses on the construction process of the prior knowledge base. The offline knowledge base is constructed so that real-time online retrieval can return high-quality similar questions and the corresponding candidate answers.

The knowledge base contains not only the standard questions of the interview process but also similar questions for each standard question. Most importantly, past questions are matched and mapped so that they are associated with these standard questions. This addresses both the high time cost of online retrieval and the accuracy of the retrieval results. In other words, the standard questions serve as a bridge between real-time search questions and historical interview questions, which greatly improves the accuracy of similar question retrieval.

The method is divided into four parts: standard question construction, similar question expansion, offline question mapping, and online question retrieval.

1. Standard question construction

Module (1) marked in the figure contains the standard questions of the interview process, for example, "What interests do you usually have?" These questions are selected mainly on the basis of human prior knowledge, supplemented by review and refinement of historical data, to arrive at a set of questions common in interviews.

2. Similar question expansion

Module (2) marked in the figure is an expansion of module (1) and mainly consists of different ways of asking the same standard question. It is constructed in two ways: first, a synonym replacement algorithm expands the standard sentences; second, the expansion is written manually according to real historical data and prior knowledge.
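The source names a synonym replacement algorithm without describing it. The following is a minimal sketch of one possible form, assuming whitespace tokenization and a hand-written synonym table; the function `expand_with_synonyms` and the table contents are hypothetical and are not taken from the source.

```python
from itertools import product


def expand_with_synonyms(sentence: str, synonyms: dict[str, list[str]]) -> list[str]:
    """Generate rewrites of a standard sentence by swapping words for synonyms.

    Each word may stay as it is or be replaced by any listed synonym;
    the unchanged sentence itself is excluded from the result.
    """
    options = [[word] + synonyms.get(word, []) for word in sentence.split()]
    variants = [" ".join(choice) for choice in product(*options)]
    return [variant for variant in variants if variant != sentence]


# Hypothetical synonym table for illustration only.
synonym_table = {"interests": ["hobbies"], "usually": ["normally"]}
expanded = expand_with_synonyms("what interests do you usually have", synonym_table)
for variant in expanded:
    print(variant)
```

The number of variants grows multiplicatively with the number of replaceable words, which is one reason such output would normally be reviewed alongside the manually written expansions.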

3. Offline question mapping

(1) Acquiring data/building a knowledge base

A question (query) and a knowledge base name (source) are input by calling a service interface. Whether a knowledge base index file exists is determined from source: if it exists, the process moves to the next step; if not, an index is built first. To build the index, the knowledge base file is used with each standard sentence as the label and its corresponding similar sentences as the input. The input is put into the format required by a BERT model and fed into a fine-tuned RoBERTa model for sentence vector encoding, and an index over the encoded sentence vectors is built with the Annoy module. The index consists of two files: the ann file is the vector index tree, and the pk file is the table mapping vectors to standard sentences.
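The shape of the index can be sketched as follows. To keep the example runnable with the standard library alone, the fine-tuned RoBERTa encoder is replaced by a toy hashed-bigram encoder and the Annoy tree by a plain list of vectors; both substitutions, and the names `toy_encode` and `build_index`, are assumptions made for illustration. The two returned objects play the roles the text assigns to the ann file (the vectors) and the pk file (the id-to-sentence table).

```python
import math
import zlib


def toy_encode(sentence: str, dim: int = 64) -> list[float]:
    """Stand-in encoder (NOT RoBERTa): hashed character bigrams, L2-normalized."""
    vec = [0.0] * dim
    text = sentence.lower()
    for i in range(len(text) - 1):
        bucket = zlib.crc32(text[i:i + 2].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(knowledge_base: dict[str, list[str]]):
    """Encode every similar sentence and record which standard sentence labels it."""
    vectors = []    # stand-in for the vector index (the "ann" file)
    id_table = {}   # stand-in for the id -> sentence table (the "pk" file)
    for standard, similar_list in knowledge_base.items():
        for similar in similar_list:
            item_id = len(vectors)
            vectors.append(toy_encode(similar))
            id_table[item_id] = (standard, similar)
    return vectors, id_table


# Hypothetical knowledge base: standard sentence -> similar sentences.
kb = {
    "What interests do you usually have?": [
        "What are your hobbies?",
        "What do you like to do in your free time?",
    ],
    "Why do you want this job?": ["What attracts you to this position?"],
}
vectors, id_table = build_index(kb)
```

Keeping the id table separate from the vectors mirrors the two-file layout: a nearest-neighbour search only needs to return an integer id, and the label lookup is a constant-time dictionary access.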

(2) Standard sentence matching

First, the data are cleaned and filtered against an existing stop-word list. The filtered question is put into the input format required by a BERT model and fed into the fine-tuned RoBERTa model for sentence vector encoding. The angular distance between the encoded result and the knowledge base index is computed to obtain the top-1 match, and the similar sentence id and angular distance of that match are returned. The corresponding standard sentence and similar sentence are looked up by id, and the distance is converted to a cosine value that serves as the base score for threshold comparison.
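The conversion from angular distance to cosine can be written out explicitly. This sketch assumes the index uses Annoy's angular metric, which Annoy defines as the Euclidean distance between normalized vectors, that is, sqrt(2(1 - cos)); under that assumption the cosine is recovered as 1 - d²/2. The function names are hypothetical.

```python
import math


def angular_distance(u: list[float], v: list[float]) -> float:
    """Angular distance as Annoy defines it: sqrt(2 * (1 - cos(u, v)))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = dot / (norm_u * norm_v)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cosine)))


def angular_to_cosine(distance: float) -> float:
    """Invert the formula above: cos = 1 - d^2 / 2."""
    return 1.0 - distance * distance / 2.0


# Identical directions give distance 0 and cosine 1; orthogonal vectors
# give distance sqrt(2) and cosine 0; opposite vectors give 2 and -1.
same = angular_to_cosine(angular_distance([1.0, 0.0], [2.0, 0.0]))
orthogonal = angular_to_cosine(angular_distance([1.0, 0.0], [0.0, 1.0]))
opposite = angular_to_cosine(angular_distance([1.0, 0.0], [-1.0, 0.0]))
```

Because the distance ranges over [0, 2], the resulting base score ranges over [-1, 1], with larger values meaning a closer match.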

(3) Post-processing

The question and the matched similar sentence are fed as input into a negative example judgment model. If the pair is judged a positive example, no adjustment is made; if it is judged a negative example, the score is reduced by 0.06. If the matched standard sentence has keywords, a keyword comparison is performed: if the question contains the keywords, no adjustment is made; if it does not, the score is reduced by X (X to be determined). Finally, if the score is greater than 0.75, the matched standard sentence and its score are returned; if the score is less than 0.75, the short sentence is returned.
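The scoring rules above can be sketched as a single function. Several points are assumptions because the source leaves them open: the negative example judgment model is represented by a boolean argument, the keyword penalty X is undetermined in the source and therefore must be supplied by the caller, "contains keywords" is read as containing at least one keyword, and the below-threshold case returns `None` as a placeholder for whatever the source means by returning "the short sentence". The function name `post_process` is hypothetical.

```python
from typing import Optional

NEGATIVE_PENALTY = 0.06   # stated in the source
ACCEPT_THRESHOLD = 0.75   # stated in the source


def post_process(
    question: str,
    standard_sentence: str,
    base_score: float,
    judged_negative: bool,
    keywords: list[str],
    keyword_penalty: float,
) -> Optional[tuple[str, float]]:
    """Adjust the base cosine score and apply the acceptance threshold."""
    score = base_score
    # Penalize pairs that the (external) negative example model rejects.
    if judged_negative:
        score -= NEGATIVE_PENALTY
    # Keyword check applies only when the standard sentence defines keywords.
    if keywords and not any(keyword in question for keyword in keywords):
        score -= keyword_penalty  # "X" in the source; value is undetermined there
    if score > ACCEPT_THRESHOLD:
        return standard_sentence, score
    return None  # assumption: placeholder for the below-threshold result


accepted = post_process(
    "what hobbies do you have", "What interests do you usually have?",
    base_score=0.80, judged_negative=False,
    keywords=["hobbies", "interests"], keyword_penalty=0.10,
)
rejected = post_process(
    "what hobbies do you have", "What interests do you usually have?",
    base_score=0.80, judged_negative=True,
    keywords=["hobbies", "interests"], keyword_penalty=0.10,
)
```

With a base score of 0.80, the 0.06 negative-example penalty alone is enough to push a match below the 0.75 threshold, which shows how narrow the acceptance margin is.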

(4) Manual auditing

Because the construction quality of the knowledge base has a decisive influence on the retrieval results, a round of manual auditing is needed after the above process; an entry is added to the knowledge base only after it passes the audit. For example, in the figure, the 1st question, "#q1 What do you usually do in your spare time?", is successfully matched with the 2nd standard question, "What interests do you usually have?", and is added to the association list of that standard question.

4. On-line question retrieval

Take, for example, the online search question "What interests do you usually have?". It follows a procedure close to that of step 3: standard sentence matching is performed first, followed by post-processing. Because real-time retrieval is time-sensitive, there is no manual auditing step.

As for time complexity, consider Q questions matched one by one against H historical questions: the time complexity is O(Q×H).

The online retrieval time complexity of this method is only O(Q×N), where N is the number of standard sentences. Because the number of standard questions involved in interviews is far smaller than the total number of historical questions, N ≪ H. The method therefore greatly reduces the time spent on online retrieval.
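A short calculation illustrates the difference between the two complexities. The sizes below are invented purely for illustration and do not come from the source; the sketch also counts one comparison per query-candidate pair, which is the simple one-by-one matching model the text uses.

```python
def pairwise_comparisons(num_queries: int, num_candidates: int) -> int:
    """Comparisons needed when every query is matched against every candidate."""
    return num_queries * num_candidates


# Illustrative sizes only (assumptions, not figures from the source).
Q = 100          # online queries
H = 1_000_000    # historical questions
N = 500          # standard sentences

brute_force = pairwise_comparisons(Q, H)    # matching against all history: Q x H
via_standard = pairwise_comparisons(Q, N)   # matching against standard sentences: Q x N
reduction_factor = brute_force // via_standard

print(brute_force, via_standard, reduction_factor)
```

The reduction factor equals H / N regardless of Q, so the benefit depends entirely on how much smaller the standard sentence set is than the history.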

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A similar question retrieval mechanism in an interview scenario, characterized in that it comprises construction of a prior knowledge base and construction of an offline knowledge base, and comprises the following steps:

S1, standard question construction;

S2, similar question expansion;

S3, offline question mapping;

S4, online question retrieval.

2. The similar question retrieval mechanism in an interview scenario according to claim 1, wherein: the similar question expansion is an expansion constructed on the basis of the standard questions and mainly comprises different ways of asking the same standard question.

3. The similar question retrieval mechanism in an interview scenario according to claim 1, wherein: knowledge base construction takes a question (query) and a knowledge base name (source) as input by calling a service interface, and determines from source whether a knowledge base index file exists.

4. The similar question retrieval mechanism in an interview scenario according to claim 1, wherein: the offline question mapping consists of four steps: data acquisition/knowledge base construction, standard sentence matching, post-processing, and manual auditing.

5. The similar question retrieval mechanism in an interview scenario according to claim 1, wherein: regarding time complexity, Q questions are considered and matched one by one against H historical questions, and the time complexity is O(Q×H).