CN116108141A - Similar question searching mechanism under interview scene - Google Patents

Similar question searching mechanism under interview scene

Publication number
CN116108141A
Authority
CN
China
Prior art keywords
question
questions
knowledge base
standard
similar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310159297.5A
Other languages
Chinese (zh)
Inventor
厉程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Shuangshi Technology Co ltd
Original Assignee
Hangzhou Shuangshi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Shuangshi Technology Co ltd filed Critical Hangzhou Shuangshi Technology Co ltd
Priority to CN202310159297.5A priority Critical patent/CN116108141A/en
Publication of CN116108141A publication Critical patent/CN116108141A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a similar-question retrieval mechanism for interview scenarios and relates to the technical field of interview scenarios. The mechanism comprises construction of a prior knowledge base and construction of an offline knowledge base, and includes the following steps: S1, standard question construction; S2, similar question expansion; S3, offline question mapping; S4, online question retrieval. Similar question expansion builds on the standard questions and mainly consists of different ways of asking the same standard question. Knowledge base construction takes a question (query) and a knowledge base name (source) as input through a service interface call. The knowledge base contains not only the standard sentences and similar sentences commonly recognized in the industry, but also high-quality questions matched from historical interview records. Through this association, both matching accuracy and retrieval time can be improved at the same time.

Description

Similar question searching mechanism under interview scene
Technical Field
The invention relates to the technical field of interview scenarios, and in particular to a similar-question retrieval mechanism for interview scenarios.
Background
Existing similar-question retrieval in interview scenarios relies on two techniques. The first is text clustering, which groups questions according to their mutual similarity and is an unsupervised, exploratory approach. Such methods first vectorize the sentences, then compute similarity from the distance between vectors, and then cluster them, for example with K-means. Questions that fall into the same cluster are regarded as similar questions. The second is text similarity retrieval: a target standard sentence whose matching degree exceeds a certain threshold is taken as the retrieval result and treated as a similar question. This idea is widely applied in fields such as intelligent outbound calling and intelligent customer service.
The main problem of text clustering algorithms is low accuracy. Because clustering is a form of unsupervised learning, it lacks human prior knowledge and is less accurate than supervised learning. In addition, many sentences are similar only in form while their semantics differ greatly. For example, question 1: "What is the difference between HTTP and HTTPS?" and question 2: "What is the difference between KPI and OKR?" These two questions are identical except for the terms HTTP, HTTPS, KPI and OKR, so clustering easily puts them into the same category. Moreover, an interview is a highly colloquial context in which a large number of meaningless filler words appear. Interviews are conducted by video, so words can be dropped because of network transmission, and speech recognition errors compound the problem. If clustering is applied directly, a large number of meaningless categories and meaningless questions appear, and the accuracy is extremely low.
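The surface-similarity problem can be illustrated with a minimal sketch. It assumes a plain bag-of-words representation and cosine similarity, which are illustrative choices and not the vectorization used by any particular prior-art system; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


q1 = "What is the difference between HTTP and HTTPS?"
q2 = "What is the difference between KPI and OKR?"

# Six of the eight tokens in each question are shared, so the score is high
# even though the two questions are about unrelated topics.
surface_score = cosine_similarity(bag_of_words(q1), bag_of_words(q2))
```

Here the two questions score 0.75 on word overlap alone, so a distance-based clusterer working on such features would tend to group them together.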
Text similarity retrieval suffers from low accuracy and long running time. First, the number of historical questions is huge, and because they are affected by colloquial speech, network lag, speech recognition accuracy and other factors, many of them are meaningless. Retrieving directly against the historical questions inevitably matches a large number of meaningless results. Second, suppose Q questions are searched and there are H historical questions. Because each query must be matched against every historical question and the results sorted by score, the time complexity is O(Q×H).
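The one-by-one matching described above can be sketched as follows. The scorer (Jaccard word overlap), the threshold value and the sample sentences are assumptions made for illustration only; the point is the nested loop, which calls the scorer Q×H times.

```python
from typing import Callable, List, Tuple


def brute_force_search(
    queries: List[str],
    history: List[str],
    score: Callable[[str, str], float],
    threshold: float,
) -> Tuple[List[List[Tuple[float, str]]], int]:
    """Match every query against every historical question.

    Returns the ranked matches per query and the number of score() calls.
    """
    comparisons = 0
    results = []
    for query in queries:              # Q iterations
        scored = []
        for past in history:           # H iterations per query
            comparisons += 1
            s = score(query, past)
            if s >= threshold:
                scored.append((s, past))
        scored.sort(reverse=True)      # rank the surviving matches by score
        results.append(scored)
    return results, comparisons


def word_overlap(a: str, b: str) -> float:
    """Toy scorer (assumption): Jaccard overlap of lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


history = ["um so what are your hobbies", "uh yeah okay", "what are your hobbies"]
queries = ["what are your hobbies", "why did you leave your last job"]
matches, n_calls = brute_force_search(queries, history, word_overlap, threshold=0.5)
```

Note that the noisy historical entry ("um so what are your hobbies") is returned alongside the clean one, which is the accuracy problem the paragraph describes.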
Disclosure of Invention
(I) Technical problems solved
In view of the shortcomings of the prior art, the invention provides a similar-question retrieval mechanism for interview scenarios, which solves the problem of retrieving and matching similar questions in interview scenarios.
(II) Technical solution
To achieve the above purpose, the invention is realized by the following technical solution: a similar-question retrieval mechanism for interview scenarios, comprising construction of a prior knowledge base and construction of an offline knowledge base, the method comprising the following steps:
S1, standard question construction;
S2, similar question expansion;
S3, offline question mapping;
S4, online question retrieval.
Preferably, similar question expansion builds on the standard questions and mainly consists of different ways of asking the same standard question.
Further, knowledge base construction takes a question (query) and a knowledge base name (source) as input through a service interface call, and determines from the source whether a knowledge base index file exists.
Furthermore, offline question mapping consists of four steps: data acquisition / knowledge base construction, standard sentence matching, post-processing and manual review.
Further, regarding time complexity, matching Q questions one by one against the H historical questions has a time complexity of O(Q×H).
(III) Beneficial effects
The invention provides a similar-question retrieval mechanism for interview scenarios, with the following beneficial effects: questions from past interview records are mapped to the standard questions through a series of cleaning and matching operations, thereby constructing a knowledge base of interview questions. The knowledge base contains not only the standard sentences and similar sentences commonly recognized in the industry, but also high-quality questions matched from historical interview records. Through this association, both matching accuracy and retrieval time can be improved at the same time.
Drawings
FIG. 1 is a flow chart of the overall steps of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiment one:
As shown in FIG. 1, the embodiment of the invention provides a similar-question retrieval mechanism for interview scenarios, comprising construction of a prior knowledge base and construction of an offline knowledge base. The method focuses on the construction of the prior knowledge base. An offline knowledge base is constructed so that real-time online retrieval can return high-quality similar questions and the corresponding candidate answers.
The knowledge base contains not only the standard questions asked during interviews, but also similar questions for each standard question. Most importantly, past questions are matched and mapped so that they are associated with these standard questions. This addresses both the high time cost of online retrieval and the accuracy of the retrieval results. In other words, the standard questions serve as a bridge between real-time search questions and past interview questions, which greatly improves the accuracy of similar-question retrieval.
The method is divided into four parts: standard question construction, similar question expansion, offline question mapping and online question retrieval.
1. Standard question construction
Module (1) in the figure contains the standard questions asked during interviews, for example, "What are your usual hobbies?" These questions are selected mainly on the basis of human prior knowledge, assisted by review and refinement of historical data, and are finally summarized into a set of common interview questions.
2. Similar question expansion
Module (2) in the figure is an expansion of module (1) and mainly consists of different ways of asking the same standard question. It is constructed in two ways: first, a synonym replacement algorithm expands the standard sentences (this part is to be expanded by @ Xu Mengdi); second, the sentences are expanded manually according to real historical data and prior knowledge.
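The source does not specify the synonym replacement algorithm, so the following is only a minimal sketch of one possible approach: every word that appears in a synonym table is swapped for each of its alternatives, and all combinations are enumerated. The function name and the synonym table are hypothetical.

```python
from itertools import product
from typing import Dict, List


def expand_with_synonyms(standard: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Generate variants of a standard question by swapping in synonyms."""
    options = []
    for word in standard.split():
        # Each position offers the original word plus any listed alternatives.
        options.append([word] + synonyms.get(word.lower(), []))
    variants = [" ".join(choice) for choice in product(*options)]
    # The first combination keeps every original word, i.e. the standard
    # question itself, so it is dropped from the list of similar questions.
    return variants[1:]


# Hypothetical synonym table; a real one would be curated for the interview domain.
SYNONYMS = {"hobbies": ["interests", "pastimes"], "usual": ["regular"]}

similar = expand_with_synonyms("What are your usual hobbies", SYNONYMS)
```

The number of variants grows multiplicatively with the number of replaceable words, which is one reason the description pairs automatic expansion with manual expansion.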
3. Offline question mapping
(1) Data acquisition / knowledge base construction
A question (query) and a knowledge base name (source) are passed in through a service interface call. The source determines whether a knowledge base index file already exists: if it does, processing moves on to the next step; if not, the index is built first. To build the index, the knowledge base file is read with each standard sentence as the label and its corresponding similar sentences as the input. These are converted into the input format required by a BERT model and fed into a fine-tuned RoBERTa model for sentence-vector encoding, and the encoded sentence vectors are indexed with the Annoy module. The index consists of two files: the ann file is the vector index tree, and the pk file is the table mapping vectors to standard sentences.
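The check-then-build logic above can be sketched with the standard library alone. This is not the described implementation: a toy hashing encoder stands in for the fine-tuned RoBERTa model, a pickled list of vectors stands in for the Annoy index tree, and the file names and function names are hypothetical.

```python
import math
import os
import pickle
import tempfile
import zlib
from typing import Dict, List, Tuple

DIM = 16  # toy embedding size (assumption)


def toy_encode(sentence: str) -> List[float]:
    """Stand-in encoder: hash each word into a bucket, then L2-normalise."""
    vec = [0.0] * DIM
    for word in sentence.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def load_or_build_index(
    source: str, kb: Dict[str, List[str]], index_dir: str
) -> Tuple[List[List[float]], List[Tuple[str, str]]]:
    """Return (vectors, table) for knowledge base `source`, building it if absent.

    `kb` maps each standard sentence (the label) to its similar sentences.
    table[i] is the (standard, similar) pair that vectors[i] encodes.
    """
    vec_path = os.path.join(index_dir, source + ".vec")  # stands in for the ann file
    tab_path = os.path.join(index_dir, source + ".pk")   # vector-to-standard table
    if os.path.exists(vec_path) and os.path.exists(tab_path):
        with open(vec_path, "rb") as f_vec, open(tab_path, "rb") as f_tab:
            return pickle.load(f_vec), pickle.load(f_tab)
    vectors, table = [], []
    for standard, similars in kb.items():
        for similar in similars:
            vectors.append(toy_encode(similar))
            table.append((standard, similar))
    with open(vec_path, "wb") as f_vec, open(tab_path, "wb") as f_tab:
        pickle.dump(vectors, f_vec)
        pickle.dump(table, f_tab)
    return vectors, table


kb = {"What are your usual hobbies": ["What are your usual hobbies",
                                      "What do you do in your spare time"]}
with tempfile.TemporaryDirectory() as tmp:
    built = load_or_build_index("interview", kb, tmp)     # first call builds
    index_files = sorted(os.listdir(tmp))
    reloaded = load_or_build_index("interview", kb, tmp)  # second call loads
```

Keeping the vectors and the mapping table in separate files mirrors the two-file layout in the description: the vector file can be swapped for an approximate-nearest-neighbour index without touching the id-to-sentence table.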
(2) Standard sentence matching
First, the data are cleaned by filtering against an existing stop-word list. The filtered question is converted into the input format required by the BERT model and fed into the fine-tuned RoBERTa model for sentence-vector encoding. The angular distance between the encoded vector and the knowledge base index is computed to obtain the top-1 match, and the similar-sentence id and angular distance of that match are returned. The corresponding standard sentence and similar sentence are looked up by id, and the distance is converted into a cosine value that serves as the base score for threshold comparison.
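The distance-to-score conversion can be sketched as follows. It assumes the angular distance follows Annoy's definition, the Euclidean distance between normalised vectors, d = sqrt(2(1 - cos)), so that cos = 1 - d²/2. The exhaustive top-1 search and the two-dimensional toy vectors are stand-ins for the index lookup and the RoBERTa sentence vectors.

```python
import math
from typing import List, Sequence, Tuple


def angular_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Annoy-style angular distance: Euclidean distance between unit vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return math.sqrt(sum((a / nu - b / nv) ** 2 for a, b in zip(u, v)))


def angular_to_cosine(distance: float) -> float:
    """Invert d = sqrt(2 * (1 - cos)) to recover the cosine similarity."""
    return 1.0 - distance * distance / 2.0


def top1(query_vec: Sequence[float], index: List[Sequence[float]]) -> Tuple[int, float]:
    """Return (id, angular distance) of the nearest indexed vector."""
    distances = [angular_distance(query_vec, vec) for vec in index]
    best_id = min(range(len(index)), key=distances.__getitem__)
    return best_id, distances[best_id]


index_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy sentence vectors
best_id, dist = top1([2.0, 1.9], index_vectors)
base_score = angular_to_cosine(dist)                  # base score for thresholding
```

The returned id would then be looked up in the mapping table to recover the similar sentence and its standard sentence.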
(3) Post-processing
The question and the matched similar sentence are fed into a negative-example discrimination model. If the pair is judged a positive example, the score is left unchanged; if it is judged a negative example, the score is reduced by 0.06. If the matched standard sentence has keywords, a keyword comparison is performed: if the question contains the keywords, the score is left unchanged; if it does not, the score is reduced by X (X to be determined). Finally, if the score is greater than 0.75, the matched standard sentence and its score are returned; if the score is less than 0.75, the short sentence is returned.
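The scoring rules above can be sketched as a single function. The 0.06 penalty and the 0.75 threshold come from the description; everything else is an assumption: the negative-example model is passed in as a hypothetical callable, the keyword penalty X is a parameter because the source leaves it undetermined, "contains keywords" is read as "contains at least one keyword", and a rejected match is represented by None.

```python
from typing import Callable, Optional, Sequence, Tuple

NEGATIVE_PENALTY = 0.06  # from the description
THRESHOLD = 0.75         # from the description


def post_process(
    question: str,
    similar: str,
    standard: str,
    base_score: float,
    is_negative: Callable[[str, str], bool],
    keywords: Sequence[str] = (),
    keyword_penalty: float = 0.0,  # "X" in the source, value undetermined
) -> Optional[Tuple[str, float]]:
    """Apply the negative-example and keyword penalties, then threshold."""
    score = base_score
    if is_negative(question, similar):
        score -= NEGATIVE_PENALTY
    # Only standard sentences that define keywords trigger the keyword check.
    if keywords and not any(k.lower() in question.lower() for k in keywords):
        score -= keyword_penalty
    if score > THRESHOLD:
        return standard, score
    return None  # assumption: stands in for the rejected-match result


def never_negative(question: str, similar: str) -> bool:
    return False


def always_negative(question: str, similar: str) -> bool:
    return True


accepted = post_process("what hobbies do you have", "what are your hobbies",
                        "What are your usual hobbies", 0.80, never_negative)
rejected = post_process("what hobbies do you have", "what are your hobbies",
                        "What are your usual hobbies", 0.80, always_negative)
# Illustrative X = 0.3: the KPI/OKR question lacks the HTTP keywords.
off_topic = post_process("what is the difference between kpi and okr",
                         "what is the difference between http and https",
                         "What is the difference between HTTP and HTTPS",
                         0.95, never_negative,
                         keywords=("HTTP", "HTTPS"), keyword_penalty=0.3)
```

The keyword check is what separates the HTTP/HTTPS and KPI/OKR questions from the background section: their sentence vectors may be close, but the question lacks the standard sentence's keywords and is penalised.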
(4) Manual review
The construction quality of the knowledge base has a decisive influence on the retrieval results. Therefore, after the above steps are finished, a round of manual review is required, and an entry is added to the knowledge base only after it passes review. For example, in the figure, the 1st question, "#Q1 What do you usually get up to in your spare time?", is successfully matched with the 2nd standard question, "What are your usual hobbies?", and is therefore added to that standard question's association list.
4. Online question retrieval
For example, take the online search question "What are your usual hobbies?". It follows a procedure close to step 3: standard sentence matching is performed first, followed by post-processing. Because real-time retrieval is time-critical, there is no manual review step.
As for time complexity, matching Q questions one by one against the H historical questions costs O(Q×H).
The online retrieval time complexity of this method is only O(Q×N), where N is the number of standard sentences. Because the number of standard questions involved in interviews is far smaller than the total number of historical questions, N ≪ H. The method therefore greatly reduces the time spent on online retrieval.
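The bridge role of the standard questions can be sketched as follows: the online query is compared only with the N standard questions, and the historical questions are then reached through the association list built offline. The Jaccard scorer, the sample data and the counts Q = 100, H = 1,000,000 and N = 500 are made-up values for illustration, not figures from the source.

```python
from typing import Callable, Dict, List, Optional


def word_overlap(a: str, b: str) -> float:
    """Toy scorer (assumption): Jaccard overlap of lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def online_retrieve(
    query: str,
    standards: List[str],
    associations: Dict[str, List[str]],
    score: Callable[[str, str], float],
    threshold: float = 0.75,
) -> Optional[List[str]]:
    """Match the query against the standard questions only, then follow the
    offline association list to the historical questions mapped to the winner."""
    best = max(standards, key=lambda s: score(query, s))  # N comparisons
    if score(query, best) <= threshold:
        return None
    return associations.get(best, [])


def comparisons(num_queries: int, num_candidates: int) -> int:
    """Number of pairwise matches for one-by-one retrieval."""
    return num_queries * num_candidates


standards = ["what are your usual hobbies", "why did you leave your last job"]
associations = {
    "what are your usual hobbies": ["what do you usually do in your spare time"],
}

found = online_retrieve("what are your hobbies", standards, associations, word_overlap)
missed = online_retrieve("tell me about yourself", standards, associations, word_overlap)

against_history = comparisons(100, 1_000_000)  # Q x H
against_standards = comparisons(100, 500)      # Q x N
```

With these illustrative counts the number of comparisons drops from 100,000,000 to 50,000; the association lookup itself is a dictionary access and does not depend on H.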
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. A similar-question retrieval mechanism for interview scenarios, characterized in that it comprises construction of a prior knowledge base and construction of an offline knowledge base, and comprises the following steps:
S1, standard question construction;
S2, similar question expansion;
S3, offline question mapping;
S4, online question retrieval.
2. The similar-question retrieval mechanism for interview scenarios according to claim 1, wherein: the similar question expansion builds on the standard questions and mainly consists of different ways of asking the same standard question.
3. The similar-question retrieval mechanism for interview scenarios according to claim 1, wherein: knowledge base construction takes a question (query) and a knowledge base name (source) as input through a service interface call, and determines from the source whether a knowledge base index file exists.
4. The similar-question retrieval mechanism for interview scenarios according to claim 1, wherein: the offline question mapping consists of four steps: data acquisition / knowledge base construction, standard sentence matching, post-processing and manual review.
5. The similar-question retrieval mechanism for interview scenarios according to claim 1, wherein: regarding time complexity, Q questions are matched one by one against the H historical questions, and the time complexity is O(Q×H).
CN202310159297.5A 2023-02-24 2023-02-24 Similar question searching mechanism under interview scene Pending CN116108141A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310159297.5A CN116108141A (en) 2023-02-24 2023-02-24 Similar question searching mechanism under interview scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310159297.5A CN116108141A (en) 2023-02-24 2023-02-24 Similar question searching mechanism under interview scene

Publications (1)

Publication Number Publication Date
CN116108141A true CN116108141A (en) 2023-05-12

Family

ID=86265461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310159297.5A Pending CN116108141A (en) 2023-02-24 2023-02-24 Similar question searching mechanism under interview scene

Country Status (1)

Country Link
CN (1) CN116108141A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312500A (en) * 2023-11-30 2023-12-29 山东齐鲁壹点传媒有限公司 Semantic retrieval model building method based on ANN and BERT
CN117312500B (en) * 2023-11-30 2024-02-27 山东齐鲁壹点传媒有限公司 Semantic retrieval model building method based on ANN and BERT

Similar Documents

Publication Publication Date Title
CN115238101B (en) Multi-engine intelligent question-answering system oriented to multi-type knowledge base
CN110675288B (en) Intelligent auxiliary judgment method, device, computer equipment and storage medium
WO2022022746A1 (en) Intent recognition method and intent recognition system having self learning capability
CN111739516A (en) Speech recognition system for intelligent customer service call
CN110704571B (en) Court trial auxiliary processing method, trial auxiliary processing device, equipment and medium
CN110321564B (en) Multi-round dialogue intention recognition method
CN111666425B (en) Automobile accessory searching method based on semantic knowledge
WO2021036439A1 (en) Method for responding to complaint, and device
CN112487810B (en) Intelligent customer service method, device, equipment and storage medium
CN111191051B (en) Method and system for constructing emergency knowledge map based on Chinese word segmentation technology
CN111159375A (en) Text processing method and device
CN110825865A (en) Multi-round conversation intelligent customer service system based on special word correction and cold start
CN116108141A (en) Similar question searching mechanism under interview scene
CN114818649A (en) Service consultation processing method and device based on intelligent voice interaction technology
CN112183051A (en) Intelligent voice follow-up method, system, computer equipment, storage medium and program product
CN115269836A (en) Intention identification method and device
CN113297365B (en) User intention judging method, device, equipment and storage medium
CN114330318A (en) Method and device for recognizing Chinese fine-grained entities in financial field
CN111460114A (en) Retrieval method, device, equipment and computer readable storage medium
CN114708047B (en) Outbound strategy operation method and system based on knowledge graph
CN113569022B (en) Method for realizing dialogue robot response engine based on cascade search
CN116775848B (en) Control method, device, computing equipment and storage medium for generating dialogue information
CN117453895B (en) Intelligent customer service response method, device, equipment and readable storage medium
CN113297361B (en) Intelligent question-answer interaction system and method based on visual flow chart
CN117909754A (en) Auxiliary power plant equipment defect elimination method and system based on twin neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination