CN116108141A - Similar question searching mechanism under interview scene - Google Patents
- Publication number
- CN116108141A (application CN202310159297.5A)
- Authority
- CN
- China
- Prior art keywords
- question
- questions
- knowledge base
- standard
- similar
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/322—Trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3338—Query expansion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a similar question searching mechanism for interview scenes and relates to the technical field of interview scenes. The mechanism comprises construction of an a priori knowledge base and construction of an offline knowledge base, and includes the following steps: S1, standard question construction; S2, similar question expansion; S3, offline question mapping; S4, online question retrieval. Similar question expansion builds on the standard questions and mainly consists of different ways of asking the same standard question. Building the knowledge base takes a question (query) and a knowledge base name (source) as input through a service interface call. The knowledge base contains not only the standard sentences and similar sentences commonly known in the industry, but also high-quality questions matched from historical interview records. Through this association, matching accuracy and time consumption can both be improved at once.
Description
Technical Field
The invention relates to the technical field of interview scenes, in particular to a similar question searching mechanism under an interview scene.
Background
Existing similar question searching in interview scenes relies on two techniques. The first is a text clustering algorithm, which groups questions according to the similarity among them in an unsupervised, exploratory manner. Such methods first vectorize the sentences, then compute similarity from the distance between vectors, and then cluster them, for example with K-means. Questions that fall into the same category of the clustering result are regarded as similar questions. The second is text similarity retrieval, which takes the target standard sentences whose matching degree exceeds a certain threshold as the retrieval result and treats them as similar questions. This idea is widely applied in fields such as intelligent outbound calling and intelligent customer service.
The main problem of text clustering algorithms is low accuracy: because clustering is a form of unsupervised learning, it lacks human prior knowledge and is less accurate than supervised learning. In addition, many sentences differ only slightly in form yet differ greatly in meaning. Consider Question 1: "What is the difference between HTTP and HTTPS?" and Question 2: "What is the difference between KPI and OKR?" The two questions are identical except for the terms HTTP, HTTPS, KPI, and OKR, so clustering easily places them in the same category. Moreover, the interview scene is a highly colloquial context, so a large number of meaningless filler words appear. Interviews are conducted by video, so words can be lost during network transmission. Combined with the limited accuracy of speech recognition, direct clustering produces a large number of meaningless categories and meaningless questions, and the accuracy is extremely low.
Text similarity retrieval suffers from low accuracy and long running time. First, the number of historical questions is huge, and because of colloquial speech, network lag, speech recognition errors, and other factors, a large share of them are meaningless. Retrieving directly over the historical questions inevitably matches a large number of meaningless results. Second, suppose Q questions are searched and there are H historical questions. Because every question must be matched against the historical questions one by one and the results sorted by score, the time complexity is O(Q×H).
Disclosure of Invention
(I) Technical problem to be solved
In view of the shortcomings of the prior art, the invention provides a similar question searching mechanism for interview scenes, which addresses the searching and matching of similar questions in interview scenes.
(II) Technical solution
To achieve the above purpose, the invention is realized by the following technical solution: a similar question searching mechanism in an interview scene, comprising construction of an a priori knowledge base and construction of an offline knowledge base, the method comprising the following steps:
S1, standard question construction;
S2, similar question expansion;
S3, offline question mapping;
S4, online question retrieval.
Preferably, the similar question expansion builds on the standard questions and mainly consists of different ways of asking the same standard question.
Further, building the knowledge base takes a question (query) and a knowledge base name (source) as input through a service interface call, and determines from source whether a knowledge base index file exists.
Furthermore, the offline question mapping consists of four steps: data acquisition/knowledge base construction, standard sentence matching, post-processing, and manual review.
Further, regarding time complexity, matching Q questions one by one against H historical questions has a time complexity of O(Q×H).
(III) Beneficial effects
The invention provides a similar question searching mechanism for interview scenes. The beneficial effects are as follows: through a series of cleaning and matching operations, the questions from past interview records are mapped to the individual standard questions, thereby building a knowledge base of interview questions. The knowledge base contains not only the standard sentences and similar sentences commonly known in the industry, but also high-quality questions matched from historical interview records. Through this association, matching accuracy and time consumption can both be improved at once.
Drawings
FIG. 1 is a flow chart of the overall steps of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiment one:
As shown in FIG. 1, the embodiment of the invention provides a similar question searching mechanism in an interview scene, comprising construction of an a priori knowledge base and construction of an offline knowledge base. The method focuses on the construction of the a priori knowledge base. The offline knowledge base is built so that real-time online search can return high-quality similar questions and their corresponding candidate answers.
The knowledge base contains not only the standard questions of the interview process but also similar questions for each standard question. Most importantly, past questions are matched and mapped so that they are associated with these standard questions. This addresses both the high time cost of online retrieval and the accuracy of the retrieval results. In other words, the standard questions serve as a bridge between real-time search questions and historical interview questions, which greatly improves the accuracy of similar question search.
The method is divided into four parts: standard question construction, similar question expansion, offline question mapping, and online question retrieval.
1. Standard question construction
Module (1) in the figure contains the standard questions of the interview process, for example "What interests do you have at ordinary times?". These questions are selected mainly on the basis of human prior knowledge, assisted by review and refinement of historical data, and together they form the set of common questions in the interview process.
2. Similar question expansion
Module (2) in the figure is an expansion of module (1) and mainly consists of different ways of asking the same standard question. It is built in two ways: first, a synonym replacement algorithm (this part is to be expanded by @Xu Mengdi) expands the standard sentences; second, the sentences are expanded manually according to real historical data and prior knowledge.
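The text does not specify the synonym replacement algorithm, so the following is only a minimal sketch of one plausible form of it. The synonym table `SYNONYMS`, the function name `expand_by_synonyms`, and the whitespace tokenization are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical synonym table; a real system would use a curated lexicon.
SYNONYMS = {
    "interests": ["hobbies", "pastimes"],
    "usually": ["normally"],
}

def expand_by_synonyms(standard_question: str, synonyms: dict) -> list:
    """Return variants of standard_question with listed words swapped for synonyms."""
    tokens = standard_question.split()
    # Each token may stay as it is or be replaced by any of its synonyms.
    choices = [[tok] + synonyms.get(tok, []) for tok in tokens]
    variants = [" ".join(combo) for combo in product(*choices)]
    # Drop the unchanged original so that only new phrasings remain.
    return [v for v in variants if v != standard_question]

similar = expand_by_synonyms("what interests do you usually have", SYNONYMS)
```

Each generated variant would then be stored as a similar sentence under its standard question, alongside any manually written expansions.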
3. Offline question mapping
(1) Acquiring data/building a knowledge base
A question (query) and a knowledge base name (source) are passed in through a service interface call. Whether a knowledge base index file exists is determined from source: if it exists, processing moves to the next step; if not, the index is built first. To build the index, the knowledge base file is read with each standard sentence as the label and its corresponding similar sentences as the input. The input is converted into the format required by the BERT model and fed into a fine-tuned RoBERTa model for sentence-vector encoding, and the encoded sentence vectors are indexed with the Annoy module. The index consists of two files: the .ann file is the vector index tree, and the .pk file is the table mapping vectors to standard sentences.
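The build-or-reuse logic can be sketched as follows. This is a simplified stand-in using only the standard library, not the pipeline itself: `toy_encode` replaces the fine-tuned RoBERTa encoder with hashed character bigrams, a plain list replaces the Annoy index tree, and both parts are stored in a single pickle file rather than separate .ann and .pk files. All function names and `DIM` are hypothetical.

```python
import math
import pickle
from pathlib import Path

DIM = 64  # illustrative vector size, not taken from the source

def toy_encode(sentence: str) -> list:
    """Stand-in encoder: hashed character bigrams, L2-normalized."""
    vec = [0.0] * DIM
    text = sentence.lower()
    for i in range(len(text) - 1):
        vec[sum(map(ord, text[i:i + 2])) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def build_index(knowledge_base: dict):
    """knowledge_base maps each standard sentence (label) to its similar sentences."""
    vectors = []  # stand-in for the vector index tree
    table = {}    # vector id -> (standard sentence, similar sentence)
    for standard, similars in knowledge_base.items():
        for similar in similars:
            table[len(vectors)] = (standard, similar)
            vectors.append(toy_encode(similar))
    return vectors, table

def load_or_build(source: str, knowledge_base: dict, index_dir: Path):
    """Reuse the index for `source` if its file exists, otherwise build and save it."""
    path = index_dir / f"{source}.pk"
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    index = build_index(knowledge_base)
    with path.open("wb") as fh:
        pickle.dump(index, fh)
    return index
```

The point of the existence check is that encoding every similar sentence happens once per knowledge base, so later calls with the same source only pay for loading the index.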
(2) Standard sentence matching
First, the data is cleaned and filtered against the existing stop-word list. The filtered question is converted into the input format required by the BERT model and fed into the fine-tuned RoBERTa model for sentence-vector encoding. The angular distance between the encoded vector and the knowledge base index is computed to obtain the top-1 result, and the similar-sentence id and angular distance of the top-1 result are returned. The corresponding standard sentence and similar sentence are looked up by id, and the distance is converted into a cosine value that serves as the base score for threshold comparison.
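The conversion from angular distance to cosine value can be sketched as below, assuming the distance follows Annoy's documented "angular" metric, that is, the Euclidean distance between the normalized vectors, d = sqrt(2 × (1 − cos)). The brute-force `top1` search is a stand-in for the index query, and the function names are hypothetical.

```python
import math

def angular_distance(u, v) -> float:
    """Euclidean distance between the L2-normalized forms of u and v."""
    nu = math.sqrt(sum(x * x for x in u)) or 1.0
    nv = math.sqrt(sum(x * x for x in v)) or 1.0
    return math.sqrt(sum((a / nu - b / nv) ** 2 for a, b in zip(u, v)))

def angular_to_cosine(distance: float) -> float:
    """Invert d = sqrt(2 * (1 - cos)) to recover the cosine similarity."""
    return 1.0 - distance * distance / 2.0

def top1(query_vec, index_vectors):
    """Return (id, angular distance) of the nearest indexed vector by brute force."""
    best_id = min(
        range(len(index_vectors)),
        key=lambda i: angular_distance(query_vec, index_vectors[i]),
    )
    return best_id, angular_distance(query_vec, index_vectors[best_id])

match_id, dist = top1([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0]])
base_score = angular_to_cosine(dist)
```

Under this definition, identical directions give a base score of 1 and orthogonal vectors give a base score of 0, which is what makes the value usable against a fixed threshold.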
(3) Post-processing
The question and the matched similar sentence are fed as input into a negative-example judgment model. If the pair is judged a positive example, no adjustment is made; if it is judged a negative example, the score is reduced by 0.06. If the matched standard sentence has keywords, a keyword comparison is performed: if the question contains the keywords, no adjustment is made; if it does not, the score is reduced by X (X to be determined). Finally, if the score is greater than 0.75, the matched standard sentence and its score are returned; if the score is less than 0.75, the short sentence is returned.
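The scoring rules above can be sketched as a single function. The 0.06 penalty and the 0.75 threshold come from the text; everything else is an assumption for illustration: the function name, the `is_negative` callable standing in for the negative-example judgment model, the reading of "contains the keywords" as "contains at least one keyword", and returning `None` when the score does not exceed the threshold. Because X is undetermined in the text, `keyword_penalty` is a required argument with no default.

```python
NEGATIVE_PENALTY = 0.06  # from the text
ACCEPT_THRESHOLD = 0.75  # from the text

def post_process(question, similar, standard, base_score,
                 is_negative, keywords, keyword_penalty):
    """Adjust the base score and return (standard, score), or None if rejected.

    is_negative: hypothetical callable (question, similar) -> bool.
    keyword_penalty: the undetermined X from the text, supplied by the caller.
    """
    score = base_score
    # A negative-example judgment lowers the score by a fixed amount.
    if is_negative(question, similar):
        score -= NEGATIVE_PENALTY
    # Keyword check applies only when the standard sentence defines keywords.
    if keywords and not any(k in question for k in keywords):
        score -= keyword_penalty
    if score > ACCEPT_THRESHOLD:
        return standard, score
    return None  # stand-in for the below-threshold outcome

accepted = post_process("What is HTTP?", "Explain HTTP", "HTTP basics", 0.90,
                        lambda q, s: False, ["HTTP"], 0.2)
rejected = post_process("What is a KPI?", "Explain HTTP", "HTTP basics", 0.80,
                        lambda q, s: True, [], 0.2)
```

The second call shows why the fixed penalty matters: a base score of 0.80 would pass on its own, but a negative-example judgment pulls it to 0.74 and the match is rejected.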
(4) Manual auditing
The construction quality of the knowledge base has a decisive influence on the retrieval result. Therefore, after the above process is finished, a round of manual review is required, and only entries that pass review are added to the knowledge base. For example, in the figure, once the 1st question, "#q1 What do you usually do in your spare time?", is successfully matched with the 2nd standard question, "What interests do you have at ordinary times?", it is added to that standard question's association list.
4. On-line question retrieval
Take the online search question "What interests do you have at ordinary times?" as an example. It goes through a process similar to step 3: standard sentence matching is performed first, followed by post-processing. Because real-time retrieval is time-sensitive, there is no manual review step.
As for time complexity, matching Q questions one by one against H historical questions costs O(Q×H).
The online retrieval time complexity of this method is only O(Q×N), where N is the number of standard sentences. Because the number of standard questions involved in the interview process is far smaller than the total number of historical questions, N ≪ H. The method therefore greatly reduces the time consumed by online retrieval.
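A small worked example makes the saving concrete. The values of Q, H, and N below are illustrative assumptions, not figures from the source, and the helper name is hypothetical.

```python
def pairwise_comparisons(num_queries: int, pool_size: int) -> int:
    """Number of one-by-one matches when every query is compared with every candidate."""
    return num_queries * pool_size

# Illustrative sizes only: 100 queries, one million historical questions,
# 500 standard sentences.
Q, H, N = 100, 1_000_000, 500

against_history = pairwise_comparisons(Q, H)    # O(Q x H)
against_standards = pairwise_comparisons(Q, N)  # O(Q x N)
speedup = against_history / against_standards   # equals H / N
```

The ratio depends only on H / N, so the benefit grows as the interview history accumulates while the set of standard sentences stays comparatively small.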
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (5)
1. A similar question searching mechanism in an interview scene, characterized in that: the mechanism comprises construction of an a priori knowledge base and construction of an offline knowledge base, and the method comprises the following steps:
S1, standard question construction;
S2, similar question expansion;
S3, offline question mapping;
S4, online question retrieval.
2. The similar question searching mechanism in an interview scene according to claim 1, wherein: the similar question expansion builds on the standard questions and mainly consists of different ways of asking the same standard question.
3. The similar question searching mechanism in an interview scene according to claim 1, wherein: building the knowledge base takes a question (query) and a knowledge base name (source) as input through a service interface call, and determines from source whether a knowledge base index file exists.
4. The similar question searching mechanism in an interview scene according to claim 1, wherein: the offline question mapping consists of four steps: data acquisition/knowledge base construction, standard sentence matching, post-processing, and manual review.
5. The similar question searching mechanism in an interview scene according to claim 1, wherein: regarding time complexity, matching Q questions one by one against H historical questions has a time complexity of O(Q×H).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310159297.5A CN116108141A (en) | 2023-02-24 | 2023-02-24 | Similar question searching mechanism under interview scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310159297.5A CN116108141A (en) | 2023-02-24 | 2023-02-24 | Similar question searching mechanism under interview scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116108141A (en) | 2023-05-12 |
Family
ID=86265461
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310159297.5A Pending CN116108141A (en) | 2023-02-24 | 2023-02-24 | Similar question searching mechanism under interview scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116108141A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117312500A (en) * | 2023-11-30 | 2023-12-29 | 山东齐鲁壹点传媒有限公司 | Semantic retrieval model building method based on ANN and BERT |
CN117312500B (en) * | 2023-11-30 | 2024-02-27 | 山东齐鲁壹点传媒有限公司 | Semantic retrieval model building method based on ANN and BERT |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||