CN110399472B - Interview question prompting method and device, computer equipment and storage medium - Google Patents

Interview question prompting method and device, computer equipment and storage medium

Info

Publication number
CN110399472B
CN110399472B CN201910523564.6A
Authority
CN
China
Prior art keywords
interview
content
vector
question
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910523564.6A
Other languages
Chinese (zh)
Other versions
CN110399472A (en
Inventor
金戈
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910523564.6A priority Critical patent/CN110399472B/en
Publication of CN110399472A publication Critical patent/CN110399472A/en
Application granted granted Critical
Publication of CN110399472B publication Critical patent/CN110399472B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Abstract

The application relates to the field of speech semantics. By fusing the interviewer's attention information, it predicts the content the interviewer is interested in from the conversation content of the interview process and provides prompts for the interviewer's questions. Specifically, disclosed are an interview question prompting method and device, computer equipment and a storage medium, the method comprising: acquiring interview session data and interview corpus data to obtain interview content data; extracting the interview question text asked by the interviewer to obtain interview question data; based on a BERT model, calculating a content feature vector corresponding to the interview content data, and calculating a corresponding question feature vector from the interview question data; calculating an attention feature vector from the content feature vector and the question feature vector; splicing the attention feature vector and the content feature vector to obtain a content vector with attention information; and determining prompt information in the interview content data according to the content vector and the question feature vector, so as to prompt the interviewer to ask questions.

Description

Interview question prompting method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a method and an apparatus for interview question prompt, a computer device, and a storage medium.
Background
In an interview, the interviewee is usually allowed to ask the interviewer questions before the interview ends, and through these questions the interviewee can learn details about the company, the position and so on. It is common today for interviewees to ask pre-prepared questions, or for the interviewer to actively introduce content that may be of interest to the interviewee.
Although existing artificial-intelligence interview systems can conduct question answering with an interviewer, they cannot accurately recommend the content the interviewer is interested in from the conversation content of the interview process, and therefore cannot accurately provide prompts for the interviewer's questions, which reduces the credibility of such systems.
Disclosure of Invention
The embodiments of the present application provide an interview question prompting method and device, computer equipment and a storage medium, which can better predict the content an interviewer is interested in from the conversation content of the interview process and give the interviewer a prompt for asking questions.
In a first aspect, the present application provides an interview question prompting method, including:
acquiring interview session data and preset interview corpus data, and processing the interview session data and the interview corpus data into interview content data;
extracting interview question texts asked by interviewers from the interview session data, and processing the interview question texts into interview question data;
based on a BERT model, calculating a content feature vector for describing the interview content data according to the interview content data, and calculating a question feature vector for describing the interview question text according to the interview question data;
calculating an attention feature vector from the content feature vector and the question feature vector, the attention feature vector representing the degree of attention of the question feature vector to the content feature vector;
splicing the attention feature vector and the content feature vector to obtain a content vector with attention information;
and determining a prompt starting point and a prompt ending point in the interview content data according to the content vector and the question feature vector, and outputting prompt information according to the text between the prompt starting point and the prompt ending point to prompt an interviewer to ask a question.
In a second aspect, the present application provides an interview question prompting device, comprising:
the content acquisition module is used for acquiring interview session data and preset interview corpus data and processing the interview session data and the interview corpus data into interview content data;
the question acquisition module is used for extracting interview question texts asked by interviewers from the interview session data and processing the interview question texts into interview question data;
the feature vector calculation module is used for calculating, based on a BERT model, a content feature vector for describing the interview content data according to the interview content data, and a question feature vector for describing the interview question text according to the interview question data;
an attention calculation module for calculating an attention feature vector from the content feature vector and the question feature vector, the attention feature vector representing the degree of attention of the question feature vector to the content feature vector;
the attention splicing module is used for splicing the attention feature vector and the content feature vector to obtain a content vector with attention information;
and the prompt interval determining module is used for determining a prompt starting point and a prompt ending point in the interview content data according to the content vector and the question feature vector, and outputting prompt information according to a text between the prompt starting point and the prompt ending point so as to prompt an interviewee to ask a question.
In a third aspect, the present application provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is used for executing the computer program and realizing the interview question prompting method when the computer program is executed.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the interview question prompting method.
The application discloses an interview question prompting method, device, computer equipment and storage medium. Content feature vectors are extracted by a BERT model from interview session data and preset interview corpus data, and question feature vectors are extracted from the interviewer's interview question data; an attention feature vector representing the degree of attention from the question feature vector to the content feature vector is then calculated, and this attention information is spliced onto the content feature vector to obtain a content vector carrying the interviewer's attention information; finally, the text used for prompting is determined in the interview content data according to the content vector and the question feature vector. In this way, the content the interviewer is interested in is predicted from the conversation content of the interview process, and a prompt for asking questions is given to the interviewer; moreover, because the interviewer's attention information is fused into the prediction process, the prediction is more accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart illustrating an interview question prompting method according to an embodiment of the present application;
FIG. 2 is a schematic view of a sub-process for obtaining interview content data shown in FIG. 1;
FIG. 3 is a schematic view of a sub-process for obtaining interview question data of FIG. 1;
FIG. 4 is a sub-flow diagram of the calculation of the attention feature vector of FIG. 1;
FIG. 5 is a sub-flow diagram illustrating the process of FIG. 1 to obtain a content vector with attention information;
FIG. 6 is a sub-flowchart illustrating the determination of the prompt start point and the prompt end point in FIG. 1;
fig. 7 is a schematic structural diagram of an interview question prompting device according to an embodiment of the present application;
fig. 8 is a schematic structural view of an interview question prompting device according to another embodiment of the present application;
fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative: they do not necessarily include all of the elements and operations/steps, nor must these be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so the actual execution order may change according to the actual situation. In addition, although functional modules are divided in the device diagrams, in some cases they may be divided differently from the diagrams.
The embodiments of the present application provide an interview question prompting method, device, computer equipment and storage medium. The interview question prompting method can be applied to a terminal or a server to predict the content an interviewer is interested in from the conversation content of the interview process and give the interviewer a prompt for asking questions.
For example, the interview question prompting method may be used on a server, and can certainly also be used on terminals such as mobile phones, notebooks and desktops. For ease of understanding, however, the following embodiments are described in detail with the interview question prompting method applied to a server.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating an interview question prompting method according to an embodiment of the present disclosure.
As shown in fig. 1, the interview question prompting method includes the following steps S110 to S160.
Step S110, interview session data and preset interview corpus data are obtained, and the interview session data and the interview corpus data are processed into interview content data.
In some embodiments, interview session data is acquired during the interview. Illustratively, the interview session data includes information about the conversation between the interviewer and the interviewee. The interviewer may be a human or an interview robot. The conversation may be in speech form and/or text form, wherein a speech-form conversation may be converted into text form through speech recognition.
Illustratively, the interview session data includes the interviewer's question information and the interviewee's answer information, and may also include questions asked by the interviewee and answers given by the interviewer.
Illustratively, the interview session data includes the question data and answer data that have occurred between the interviewer and the interviewee. For example, the interview session data includes:
interviewer question 1: what do you like o? What is it liked to do when empty?
Interviewer answers 1: i prefer to play basketball in particular because playing basketball can foster a team spirit.
Interviewer question 2: when do you feel the pressure is the greatest?
Interviewer answers 2: when I feel most stressful, I want to do several things at the same time, for example, when you want to take an end exam, you need to organize a large-scale activity, and this time my stress is especially great.
Interviewer questions 1: how long you stay at the company and where you prefer the company.
Interviewer answers 1: … … is added.
……。
Illustratively, interview corpus data is preset and stored; the interview corpus data includes information that the interviewer can query, and the like.
Illustratively, the interview corpus data that the interviewer can query includes company profile information, position description information, compensation information, and the like.
For example, the interview corpus data includes: company is a financial company that is listed at the hong kong exchange and the shanghai securities exchange. The main business is to provide diversified financial services and products, and take insurance business as the core; the Shenzhen snakes were established in 1988 by the company, and the company is the first insurance enterprise made by shares in China. On 3/1 of 2007, company, stock a, was marketed at shanghai stock exchange. On day 11, 5/2007, the service of the permanent finger announced that company added the ingredient stock … … for the permanent index from day 4, 6/2007.
In some embodiments, the interview corpus data and interview session data are arranged in a preset format and stored to obtain interview content data. For example, interview corpus data is stored after interview session data to obtain interview content data.
In other embodiments, the interview corpus data and the interview session data may be preprocessed and then stored to obtain interview content data.
Illustratively, as shown in fig. 2, the processing of the interview session data and the interview corpus data into interview content data in step S110 includes steps S111 to S113.
Step S111, performing word segmentation processing on the interview session data and the interview corpus data.
Illustratively, the interview session data and the interview corpus data are encoded using one-hot encoding, and in the encoded interview session data and interview corpus data a start token [CLS] is added at the beginning of each sentence, with a separator [SEP] between sentences and a separator [SEP] at the end of the text.
For example, the word segmentation of a question text in the interview session data is as follows: [CLS] How long have you been at the company [SEP] What do you like best about the company [SEP].
Step S112, performing embedding processing on the word segmentation information, paragraph information and position information of the word-segmented interview session data and interview corpus data.
The word segmentation information is embedded; the resulting embedding, Token Embeddings, is the word vectors, and the first token is the [CLS] mark, which can be used for subsequent prediction tasks.
In this embodiment, each word (token) obtained by segmenting the interview session data and the interview corpus data is fed into the token embedding layer, so that each word is converted into vector form.
Illustratively, the token embedding layer converts each word into a fixed-dimension vector, e.g., a 768-dimensional vector representation.
The paragraph information is embedded to obtain the embedding result Segment Embeddings, which is used to distinguish different sentences in the interview session data and the interview corpus data.
The embedding result of the position information, Position Embeddings, is learned. For example, the BERT model can handle input sequences of up to 512 words (tokens), and sequence-order information is encoded by having the BERT model learn a vector representation for each position. This means that the Position Embeddings layer is actually a lookup table of size (512, 768), whose first row represents the first position of a sequence, whose second row represents the second position, and so on.
Specifically, the first word (token) of each word-segmented text is always the special classification embedding, i.e., the start token [CLS]. The final hidden state corresponding to this start token, i.e., the Transformer's output, is used as the aggregate sequence representation for classification tasks.
Step S113, adding the embedding results of the word segmentation information, the paragraph information and the position information to obtain the interview content data.
Illustratively, the Token Embeddings of the word segmentation information are vector representations of the words, the Segment Embeddings of the paragraph information help the BERT model distinguish the vector representations of different sentences, and the Position Embeddings of the position information enable the BERT model to learn the sequential order of the input.
Illustratively, the embedding results of the word segmentation information, the paragraph information and the position information are each (1, n, 768) vectors; adding them element-wise gives a composite representation of shape (1, n, 768), which can be used as the interview content data and as the input representation of the BERT model's encoding layer.
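The three-way embedding sum of steps S111 to S113 can be sketched as follows; the lookup tables here are random stand-ins for the learned Token, Segment and Position embedding layers, so this illustrates the shapes only, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len, n_segments, d = 100, 512, 2, 768

# Toy lookup tables standing in for the learned BERT embedding layers.
token_table = rng.normal(size=(vocab_size, d))    # Token Embeddings
segment_table = rng.normal(size=(n_segments, d))  # Segment Embeddings
position_table = rng.normal(size=(max_len, d))    # Position Embeddings: a (512, 768) lookup table

def embed(token_ids, segment_ids):
    """Element-wise sum of token, segment and position embeddings -> (1, n, d)."""
    n = len(token_ids)
    x = (token_table[token_ids]
         + segment_table[segment_ids]
         + position_table[np.arange(n)])
    return x[None, :, :]  # composite representation of shape (1, n, 768)

# e.g. "[CLS] w1 w2 [SEP] w3 [SEP]" spanning two segments
x = embed([1, 5, 7, 2, 9, 2], [0, 0, 0, 0, 1, 1])
print(x.shape)  # prints (1, 6, 768)
```

The element-wise add works because all three tables share the same embedding dimension d, matching the (1, n, 768) composite representation described above.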
Step S120, extracting the interview question texts asked by interviewers from the interview session data, and processing the interview question texts into interview question data.
In this embodiment, the interview session data acquired in step S110 includes interviewer question information, so interview question text asked by the interviewer can be extracted from the interview session data.
In other embodiments, interview question text that the interviewer asks is obtained during the interview process, for example, the interviewer's question may be converted to text format by voice recognition.
Exemplary interview question texts of the interviewer include:
Interviewer question 1: How long have you been at the company, and what do you like best about it?
……
In some embodiments, the interview question texts are arranged and stored in a preset format, for example in the chronological order in which the questions occurred, to obtain the interview question data.
In other embodiments, the interview question text can be preprocessed and then stored to obtain interview question data.
Illustratively, as shown in fig. 3, the processing of the interview question text into interview question data in step S120 includes steps S121 to S123.
Step S121, performing word segmentation on the interview question text.
Illustratively, the interview question text is encoded using one-hot encoding, and in the encoded interview question text a start token [CLS] is added at the beginning of each sentence, with a separator [SEP] between sentences and a separator [SEP] at the end of the text.
For example, after a question text in the interview question text is word-segmented, it becomes: [CLS] How long have you been at the company [SEP] What do you like best about the company [SEP].
Step S122, performing embedding processing on the word segmentation information, paragraph information and position information of the word-segmented interview question text.
Illustratively, the Token Embeddings of the word segmentation information are vector representations of the words, the Segment Embeddings of the paragraph information help the BERT model distinguish the vector representations of different sentences, and the Position Embeddings of the position information enable the BERT model to learn the sequential order of the input.
Step S123, adding the embedding results of the word segmentation information, the paragraph information and the position information to obtain the interview question data.
Illustratively, the embedding results of the word segmentation information, the paragraph information and the position information are each (1, n, 768) vectors; adding them element-wise gives a composite representation of shape (1, n, 768), which can be used as the interview question data and as the input representation of the BERT model's encoding layer.
Step S130, based on a BERT model, calculating a content feature vector for describing the interview content data according to the interview content data, and calculating a question feature vector for describing the interview question text according to the interview question data.
The BERT (Bidirectional Encoder Representations from Transformers) model, i.e., a bidirectional Transformer encoder, aims to pre-train deep bidirectional representations by jointly conditioning on context in all layers; the Transformer is a method that relies entirely on self-attention to compute representations of its input and output.
The main innovation of the BERT model lies in its pre-training method: a Masked Language Model (MLM) and Next Sentence Prediction are used to capture word-level and sentence-level representations, respectively.
The masked language model randomly masks some of the words in the model input, with the goal of predicting the original vocabulary IDs of the masked words based only on their context. Unlike left-to-right language-model pre-training, the masked language model's training objective allows the representation to fuse context from both the left and the right, thus pre-training a deep bidirectional Transformer.
15% of the words in the corpus are randomly selected and masked out, e.g., the original word is replaced with a [MASK] token, and the model's objective is to correctly predict the replaced words.
Specifically, of the 15% of words selected for the [MASK] replacement task, only 80% are actually replaced with the [MASK] token, 10% are randomly replaced with another word, and in the remaining 10% of cases the word is left unchanged. This is the specific procedure of the masked language model.
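The 80%/10%/10% replacement rule can be sketched as follows; `mask_tokens` and its toy vocabulary are hypothetical illustrations of the procedure, not code from any BERT implementation:

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, seed=0):
    """BERT-style masking: each non-special token is selected with probability
    mask_rate; of the selected tokens, 80% become [MASK], 10% become a random
    vocabulary word, and 10% stay unchanged. Returns the masked sequence and
    the indices where the model must predict the original token."""
    rng = random.Random(seed)
    out, targets = list(tokens), []
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]") or rng.random() >= mask_rate:
            continue
        targets.append(i)  # prediction target: the original token at position i
        r = rng.random()
        if r < 0.8:
            out[i] = "[MASK]"          # 80%: replace with the mask token
        elif r < 0.9:
            out[i] = rng.choice(vocab)  # 10%: replace with a random word
        # else: 10%: leave the original token in place

    return out, targets

masked, targets = mask_tokens(
    ["[CLS]"] + [f"w{i}" for i in range(20)] + ["[SEP]"],
    vocab=["apple", "river", "blue"],
)
```

Note that the prediction targets include the 10% of positions left unchanged, which is what forces the model to produce useful representations for every selected position rather than only for [MASK] tokens.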
Next Sentence Prediction means that, during language-model pre-training, sentence pairs are selected in one of two ways: either two genuinely consecutive sentences are taken from the corpus, or the second sentence is drawn at random from the corpus and appended after the first. In addition to the masked-language-model task, the model additionally performs this sentence-relation prediction, judging whether the second sentence is the successor of the first. Adding this task benefits downstream sentence-relation tasks.
Pre-training the BERT model is a multi-task process: essentially, a network structure is designed for the language-model task, and then a large, practically inexhaustible amount of unlabeled natural-language text is used, with the pre-training task extracting a large amount of linguistic knowledge and encoding it into the network structure.
Google has open-sourced the pre-trained BERT-Base and BERT-Large models; by calling a pre-trained BERT model, the corresponding feature vectors, i.e., vectors capable of representing semantic features, can be extracted from text.
In the present embodiment, based on the BERT model, content feature vectors for describing the interview content data are extracted from the interview content data, and question feature vectors for describing the interview question text are extracted from the interview question data.
Illustratively, inputting the interview content data into a pre-trained BERT model yields a self-attention feature representation of the interview content data, i.e., the content feature vector. The content feature vector comprises a plurality of content feature sub-vectors and may be represented as h^C = {h^C_1, h^C_2, …, h^C_m}, where h^C_i denotes the content feature sub-vector corresponding to the i-th word in the interview content data, i.e., the i-th content feature sub-vector, and m denotes the number of words in the interview content data.
Illustratively, inputting the interview question data into a pre-trained BERT model yields a self-attention feature representation of the interview question text, i.e., the question feature vector. The question feature vector comprises a plurality of question feature sub-vectors and may be represented as h^Q = {h^Q_1, h^Q_2, …, h^Q_n}, where h^Q_j denotes the question feature sub-vector corresponding to the j-th word in the interview question data, i.e., the j-th question feature sub-vector, and n denotes the number of words in the interview question data.
In some embodiments, the length (dimension) of the sub-vector corresponding to each word output by the pre-trained BERT model, i.e., of each semantic feature vector, is constant; for example, h^C_i ∈ R^d and h^Q_j ∈ R^d, where d denotes the length of the semantic feature vector output by the pre-trained BERT model and R^d denotes the d-dimensional vector space. For example, each content feature sub-vector h^C_i in the content feature vector obtained in step S130 and each question feature sub-vector h^Q_j in the question feature vector are vectors of dimension 1×d. Illustratively, d is equal to 768.
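The shapes involved in step S130 can be sketched with random stand-in vectors; the `encode` function below is a hypothetical placeholder for a real pre-trained BERT encoder (which is not loaded here), so only the dimensions are meaningful:

```python
import numpy as np

d = 768  # length d of each semantic feature vector output by the BERT model

def encode(num_words, seed=0):
    """Stand-in for the pre-trained BERT encoder: returns one d-dimensional
    semantic feature sub-vector per word, i.e. an (m, d) matrix."""
    return np.random.default_rng(seed).normal(size=(num_words, d))

h_C = encode(20, seed=1)  # content feature vector: h^C_1 ... h^C_m with m = 20
h_Q = encode(6, seed=2)   # question feature vector: h^Q_1 ... h^Q_n with n = 6

# Each sub-vector h^C_i and h^Q_j lives in R^d (a 1 x d row vector).
print(h_C.shape, h_Q.shape)  # prints (20, 768) (6, 768)
```

In practice the per-token hidden states of a pre-trained BERT model would replace `encode`, but the downstream attention computation only depends on these (m, d) and (n, d) shapes.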
Step S140, calculating an attention feature vector from the content feature vector and the question feature vector.
Specifically, the attention feature vector represents the degree of attention of the question feature vector to the content feature vector.
Illustratively, the attention feature vector may be represented as h^A = {h^A_1, h^A_2, …, h^A_m}, which expresses the attention of the interview question data to the interview content data, where h^A_i is the attention feature sub-vector of the question feature vector h^Q to the i-th content feature sub-vector h^C_i of the content feature vector h^C.
In some embodiments, as shown in fig. 4, step S140 calculates the attention feature vector from the content feature vector and the question feature vector through steps S141 to S143.
Step S141, calculating the attention values between each content feature sub-vector and each question feature sub-vector.
Illustratively, the attention values between the 1st content feature sub-vector h^C_1 and each question feature sub-vector are calculated first, then the attention values between the 2nd content feature sub-vector h^C_2 and each question feature sub-vector, …, then those between the i-th content feature sub-vector h^C_i and each question feature sub-vector, until the attention values between the m-th content feature sub-vector h^C_m and each question feature sub-vector have been calculated.
In some embodiments, step S141 calculates the attention value between each content feature sub-vector and each question feature sub-vector according to the following formulas:

S_ij = SeLU(h^C_i U) · D · (SeLU(h^Q_j U))^T

α_ij ∝ exp(S_ij)

where h^C_i denotes the i-th content feature sub-vector in the content feature vector, h^Q_j denotes the j-th question feature sub-vector in the question feature vector, S_ij denotes the attention index between h^C_i and h^Q_j, U denotes a weight matrix with U ∈ R^(d×k), D denotes a diagonal matrix with D ∈ R^(k×k), SeLU() denotes an activation function, α_ij denotes the attention value between h^C_i and h^Q_j, and ∝ means "proportional to".
Illustratively, k represents the dimension of the attention hidden layer and is a preset hyper-parameter. SeLU() represents the Scaled Exponential Linear Unit activation function, which performs nonlinear activation and introduces self-normalizing properties.
For example, the weight values of the corresponding parameters in the weight matrix U and the diagonal matrix D may be empirical values, or may be obtained through training; specifically, a loss function is calculated and its error is back-propagated to update the weight values.
To calculate the attention index and attention value, the question feature sub-vector h_j^Q and the content feature sub-vector h_i^C are first mapped by the same weight matrix U and the activation function SeLU(), and their similarity is then calculated by a vector inner product, yielding the attention index (which may also be referred to as the attention weight) between each content feature sub-vector and each question feature sub-vector.
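The mapping-then-inner-product computation just described can be sketched in NumPy as follows; normalizing α_ij with a softmax over j is an assumption (the formula only states proportionality), and all function and variable names are illustrative, not from the patent:

```python
import numpy as np

def selu(x):
    # Scaled Exponential Linear Unit with the standard self-normalizing constants.
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1))

def attention_values(h_c, h_q, U, D):
    """Attention indices S_ij and attention values alpha_ij.

    h_c: (m, d) content feature sub-vectors stacked as rows,
    h_q: (n, d) question feature sub-vectors stacked as rows,
    U:   (d, k) shared weight matrix, D: (k, k) diagonal matrix.
    """
    pc = selu(h_c @ U)                          # (m, k) projected content features
    pq = selu(h_q @ U)                          # (n, k) projected question features
    S = pc @ D @ pq.T                           # (m, n) attention indices S_ij
    # alpha_ij ∝ exp(S_ij): softmax over the question axis (an assumption)
    e = np.exp(S - S.max(axis=1, keepdims=True))
    alpha = e / e.sum(axis=1, keepdims=True)
    return S, alpha
```

Subtracting the row maximum before exponentiating keeps the softmax numerically stable without changing the result.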
Illustratively, h_i^C is a 1×d vector, and (h_j^Q)^T, obtained by transposing h_j^Q, is a d×1 vector; the attention value α_ij between h_i^C and h_j^Q and the attention index S_ij between them are both 1×1 values.
Step S142, calculating an attention feature sub-vector from the question feature vector to the content feature sub-vector according to the attention values between each content feature sub-vector and all question feature sub-vectors in the question feature vector.
In some embodiments, step S142 calculates the attention feature sub-vector from the question feature vector to the content feature sub-vector according to the attention value between each content feature sub-vector and all question feature sub-vectors in the question feature vector, specifically according to the following formula:
a_i = Σ_{j=1}^{n} α_ij · h_j^Q

wherein a_i represents the attention feature sub-vector from the question feature vector to the i-th content feature sub-vector h_i^C in the content feature vector.
By linearly combining the question feature sub-vectors in the question feature vector with the attention values of the i-th content feature sub-vector h_i^C, the attention feature sub-vector a_i from the question feature vector h^Q to the i-th content feature sub-vector h_i^C of the content feature vector h^C can be calculated. Illustratively, the attention feature sub-vector a_i is a 1×d vector.
And step S143, combining the attention feature sub-vectors corresponding to all the content feature sub-vectors in the content feature vector to obtain an attention feature vector.
By respectively calculating the attention feature sub-vector from the question feature vector h^Q to each content feature sub-vector of the content feature vector h^C, the attention feature vector a = [a_1, a_2, …, a_m] can be obtained, i.e. an expression of the attention of the interview question data to the interview content data.
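As a minimal illustration of how the attention values combine the question feature sub-vectors into attention feature sub-vectors (assuming the sub-vectors are stacked as matrix rows; names are illustrative):

```python
import numpy as np

def attention_feature_vector(alpha, h_q):
    """Combine question feature sub-vectors using attention values.

    alpha: (m, n) attention values alpha_ij,
    h_q:   (n, d) question feature sub-vectors stacked as rows.
    Row i of the result is a_i = sum_j alpha_ij * h_q[j], the attention
    feature sub-vector for the i-th content feature sub-vector.
    """
    return alpha @ h_q          # (m, d)
```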
And S150, splicing the attention feature vector and the content feature vector to obtain a content vector with attention information.
Specifically, the content vector includes content feature information in the content feature vector and attention information from interview question data to interview content data in the attention feature vector.
In some embodiments, as shown in fig. 5, step S150 splices the attention feature vector and the content feature vector to obtain a content vector with attention information, including step S151 and step S152.
And step S151, correspondingly splicing the attention feature sub-vectors corresponding to the content feature sub-vectors in the content feature vectors with the content feature sub-vectors to obtain the content sub-vectors with attention information corresponding to the content feature sub-vectors.
Illustratively, the attention feature sub-vector a_i in the attention feature vector corresponding to each content feature sub-vector is spliced to the head or the tail of the corresponding content feature sub-vector h_i^C in the content feature vector h^C, obtaining the content sub-vector with attention information corresponding to each content feature sub-vector.

Because the i-th sub-vector in the attention feature vector corresponds to the i-th participle in the interview content data, the 1×d attention feature sub-vector a_i corresponding to each participle in the interview content data is spliced to the head or the tail of the 1×d content feature sub-vector h_i^C corresponding to that participle, obtaining the content sub-vector c_i with attention information corresponding to each participle in the interview content data. Illustratively, each content sub-vector c_i with attention information is a 1×2d vector.
And S152, combining the content sub-vectors corresponding to all the content feature sub-vectors in the content feature vector to obtain the content vector with attention information.
Splicing the attention feature vector with the content feature vector yields the content vector c = [c_1, c_2, …, c_m] with attention information. The content vector c includes the content sub-vector c_i corresponding to the i-th participle in the interview content data, so that the prompt starting point and the prompt ending point can be determined more accurately in the interview content data according to the content vector.
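The splicing step can be sketched as a single concatenation, assuming the sub-vectors are stacked as matrix rows (names illustrative):

```python
import numpy as np

def content_vector_with_attention(h_c, a):
    """Splice each attention feature sub-vector a[i] onto the tail of the
    corresponding content feature sub-vector h_c[i], giving 1 x 2d rows.

    h_c: (m, d) content feature sub-vectors, a: (m, d) attention feature
    sub-vectors; returns the (m, 2d) content vector with attention information.
    """
    return np.concatenate([h_c, a], axis=1)
```

Splicing to the head instead of the tail would simply reverse the order of the two arguments.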
Step S160, determining a prompt starting point and a prompt ending point in the interview content data according to the content vector and the question feature vector, and outputting prompt information according to the text between the prompt starting point and the prompt ending point to prompt an interviewer to ask a question.
And predicting a part of text which is interested by the interviewer in the interview content data, namely the text between the determined prompt starting point and the determined prompt ending point according to the content vector with the attentiveness information of the interviewer and the question feature vector containing the question information of the interviewer.
In some embodiments, as shown in fig. 6, the step S160 of determining a prompt starting point and a prompt ending point in the interview content data according to the content vector and the question feature vector includes steps S161-S164.
Step S161, weighting and summing the problem feature sub-vectors in the problem feature vector to obtain a problem vector.
Illustratively, the question vector u^Q = Σ_{j=1}^{n} β_j · h_j^Q; the question vector includes information describing the interview question data as a whole.

Illustratively, the weights β_j may be obtained by learning, or may be empirical values; for example, all of them may be set to 1, or, according to the order in which the questions were asked, the weight of the question feature sub-vector corresponding to each asked question may decrease in turn. Illustratively, the question vector u^Q is a 1×d vector.
Because the i-th sub-vector in the attention feature vector corresponds to the i-th participle in the interview content data, the content sub-vector c_i with attention information corresponding to each participle in the interview content data and the question vector u^Q can subsequently be used to calculate the probability of each participle being the start word of the question section, i.e. the prompt starting point, and the probability of each participle being the end word of the question section, i.e. the prompt ending point, so as to determine from the interview content data a prompt section for prompting the interviewer to ask a question.
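A minimal sketch of the weighted sum that forms the question vector, assuming equal weights by default (names illustrative):

```python
import numpy as np

def question_vector(h_q, beta=None):
    """Weighted sum of question feature sub-vectors.

    h_q:  (n, d) question feature sub-vectors stacked as rows,
    beta: (n,) weights; defaults to all ones (equal weighting).
    Returns the (d,) question vector u^Q.
    """
    if beta is None:
        beta = np.ones(h_q.shape[0])
    return beta @ h_q
```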
And step S162, calculating the probability that the participle corresponding to the content characteristic sub-vector in the interview content data is a prompt starting point according to the content sub-vector corresponding to the content characteristic sub-vector and the question vector.
Illustratively, calculating the probability that the participle in the interview content data corresponding to the content feature sub-vector is the prompt starting point is implemented by the following formula:

P_i^S ∝ exp(u^Q · W^S · c_i^T)

wherein u^Q represents the question vector, c_i represents the content sub-vector corresponding to the i-th content feature sub-vector h_i^C, P_i^S represents the probability that the participle corresponding to h_i^C is the prompt starting point, and W^S represents a weight matrix.
Illustratively, W^S is a d×2d weight matrix (parameter matrix).
The probability that each participle is the prompt starting point is evaluated by calculating the similarity between the content sub-vector c_i corresponding to that participle in the interview content data and the question vector u^Q. For example, the participle with the highest similarity also has the highest probability of being the prompt starting point, and may be taken as the prompt starting point, i.e. the start word of the question interval.

Because the content sub-vector c_i corresponding to the i-th participle in the interview content data carries the interviewer's attention information, a more accurate similarity can be obtained when taking the inner product with the question vector u^Q, and the prompt starting point can thus be determined more accurately.
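A hedged sketch of the start-point computation, assuming P_i^S is obtained by softmax-normalizing the exponentiated similarities over all participles (the text only states proportionality; names illustrative):

```python
import numpy as np

def start_point_probabilities(u_q, c, W_s):
    """Start-point probabilities P_i^S ∝ exp(u_q · W_s · c_i^T).

    u_q: (d,) question vector, c: (m, 2d) content sub-vectors with
    attention information, W_s: (d, 2d) weight matrix.
    Returns the (m,) probability vector and the index of the most
    probable start participle.
    """
    scores = (u_q @ W_s) @ c.T          # (m,) bilinear similarity per participle
    e = np.exp(scores - scores.max())   # softmax normalization (an assumption)
    p = e / e.sum()
    return p, int(np.argmax(p))
```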
Step S163, based on a gated recurrent unit (GRU) neural network, calculating, according to the probability that the participle corresponding to the content feature sub-vector is the prompt starting point, the content sub-vector corresponding to the content feature sub-vector, and the question vector, the probability that the participle corresponding to the content feature sub-vector in the interview content data is the prompt ending point.
Illustratively, calculating the probability that the participle corresponding to the content feature sub-vector in the interview content data is the prompt ending point is implemented according to the following formulas:

t^Q = GRU(u^Q, {P_i^S · c_i})

P_i^E ∝ exp(t^Q · W^E · c_i^T)

wherein GRU() represents the processing of the gated recurrent unit neural network, u^Q represents the question vector, t^Q represents the fusion vector output by the gated recurrent unit neural network, P_i^E represents the probability that the participle corresponding to the content sub-vector c_i is the prompt ending point, and W^E represents a weight matrix.
Illustratively, W^E is a d×d weight matrix (parameter matrix).
A Gated Recurrent Unit (GRU) neural network is a type of Recurrent Neural Network (RNN); it was proposed to address problems such as long-term memory and gradients during back propagation; GRU neural networks are typically used to handle tasks with sequential forgetting requirements.
In this embodiment, the fusion vector t^Q is first calculated based on the GRU neural network, so that the probability of each participle in the interview content data being the prompt starting point is fused into the calculation of the probability of each participle being the prompt ending point. For one piece of interview content data and one piece of interview question data, only one fusion vector t^Q needs to be calculated.
In this embodiment, the question vector u^Q is input as the hidden state h into the first GRU unit of the GRU neural network, and the content sub-vector c_i corresponding to each participle in the interview content data is multiplied by the probability P_i^S of that participle being the prompt starting point to serve as the input x of each GRU unit in the GRU neural network; through the plurality of GRU units in the GRU neural network, the content sub-vectors c_i corresponding to the participles, the probabilities P_i^S of the participles being the prompt starting point, and the question vector u^Q are fused together; the output of the last GRU unit in the GRU neural network is taken as the fusion vector t^Q, which is a 1×d vector.
Illustratively, P_1^S · c_1 is the input to the first GRU unit, P_2^S · c_2 is the input to the second GRU unit, and so on.
The parameters to be learned in the gated recurrent unit neural network, i.e. the GRU network, can be obtained by back-propagating the error of a loss function. Illustratively, the loss function is a cross-entropy loss function, used to calculate the difference between the probability values output by the GRU network and the start and end labels.
The probability that each participle is the prompt ending point is evaluated by calculating the similarity between the content sub-vector c_i corresponding to that participle in the interview content data and the fusion vector t^Q. For example, the participle with the highest similarity also has the highest probability of being the prompt ending point, and may be taken as the prompt ending point, i.e. the end word of the question interval.

Because the content sub-vector c_i corresponding to the i-th participle in the interview content data carries attention information, a more accurate similarity can be obtained when taking the inner product with the fusion vector t^Q, and the position of the end word can thus be determined more accurately.
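The fusion and end-point computation might be sketched as below. This is an illustrative approximation only: the GRU weights are random stand-ins for learned parameters, and W^E is given a d×2d shape here so that the product with the 2d-dimensional content sub-vectors is defined (the text states d×d, which does not match those dimensions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MinimalGRU:
    """Single GRU cell; weights are random stand-ins for learned parameters."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hidden_dim, input_dim + hidden_dim)
        self.Wz = rng.normal(0, 0.1, shape)   # update gate weights
        self.Wr = rng.normal(0, 0.1, shape)   # reset gate weights
        self.Wh = rng.normal(0, 0.1, shape)   # candidate state weights

    def step(self, h, x):
        hx = np.concatenate([h, x])
        z = sigmoid(self.Wz @ hx)                              # update gate
        r = sigmoid(self.Wr @ hx)                              # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([r * h, x]))
        return (1 - z) * h + z * h_tilde

def fusion_vector(gru, u_q, c, p_start):
    """t^Q: run the GRU over inputs P_i^S * c_i with initial hidden state u^Q."""
    h = u_q
    for i in range(c.shape[0]):
        h = gru.step(h, p_start[i] * c[i])
    return h                                  # (d,) fusion vector

def end_point_probabilities(t_q, c, W_e):
    """P_i^E ∝ exp(t^Q · W_e · c_i^T), softmax-normalized over participles."""
    scores = (t_q @ W_e) @ c.T
    e = np.exp(scores - scores.max())
    return e / e.sum()
```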
Step S164, determining the word with the maximum probability of the prompt starting point as the prompt starting point, and determining the word with the maximum probability of the prompt ending point as the prompt ending point.
In this embodiment, after the prompt starting point and the prompt ending point are determined in step S160, prompt information is output according to the text between the prompt starting point and the prompt ending point to prompt the interviewer to ask a question.
For example, the text between the participles serving as the prompt starting point and the prompt ending point in the interview content data may be output directly as the prompt information to prompt the interviewer to ask a question; alternatively, the prompt information may be generated according to that text and then output to prompt the interviewer to ask a question.
For example, the text at the positions, in the interview session data and interview corpus data acquired in step S110, corresponding to the participles of the interview content data serving as the prompt starting point and the prompt ending point may be output as the prompt information to prompt the interviewer to ask a question.
Illustratively, the text between the prompt starting point and the prompt ending point includes "the main business is to provide diversified financial services and products, with the insurance business as the core"; this sentence may be output to prompt the interviewer to ask a question, so that the interviewer can develop a question according to the prompt content and learn about the interviewing company; for example, the interviewer may ask "what kinds of insurance does the company's insurance business specifically include", etc.
The interview question prompting method provided by this embodiment extracts a content feature vector from interview session data and preset interview corpus data through a BERT model, and extracts a question feature vector from the interview question data of the interviewer; then calculates an attention feature vector representing the degree of attention from the question feature vector to the content feature vector, and splices the attention information onto the content feature vector to obtain a content vector carrying the interviewer's attention information; and determines the text used for prompting in the interview content data according to the content vector and the question feature vector. In this way, the content the interviewer is interested in is predicted from the conversation content during the interview, and a prompt for asking questions is given to the interviewer; moreover, because the interviewer's attention information is fused into the prediction process, the prediction is more accurate.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an interview question prompting device according to an embodiment of the present application, where the interview question prompting device can be configured in a server or a terminal for executing the interview question prompting method.
As shown in fig. 7, the interview question prompting device includes: the system comprises a content acquisition module 110, a question acquisition module 120, a feature vector calculation module 130, an attention calculation module 140, an attention splicing module 150 and a prompt interval determination module 160.
The content obtaining module 110 is configured to obtain interview session data and preset interview corpus data, and process the interview session data and the interview corpus data into interview content data.
Illustratively, as shown in fig. 8, the content obtaining module 110 includes:
a content word segmentation sub-module 111, configured to perform word segmentation on the interview session data and the interview corpus data;
a content embedding sub-module 112, configured to embed the participle information, paragraph information, and position information of the interview session data and interview corpus data after the participle processing;
and the content data calculation sub-module 113 is configured to add the embedding results of the word segmentation information, the paragraph information, and the position information to obtain interview content data.
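The three embedding steps performed by sub-modules 111-113 can be sketched as a sum of lookup tables; in the actual scheme these embeddings belong to the BERT model's input layer, so the tables and shapes below are illustrative assumptions:

```python
import numpy as np

def embed_content(token_ids, segment_ids, vocab_emb, seg_emb, pos_emb):
    """Sum word-segmentation, paragraph (segment) and position embeddings.

    token_ids/segment_ids: length-L integer sequences for the segmented text,
    vocab_emb: (V, d) token embedding table,
    seg_emb:   (S, d) paragraph/segment embedding table,
    pos_emb:   (P, d) position embedding table (P >= L).
    Returns the (L, d) interview content data representation.
    """
    L = len(token_ids)
    return vocab_emb[token_ids] + seg_emb[segment_ids] + pos_emb[:L]
```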
The question acquisition module 120 is configured to extract an interview question text for an interviewer to ask a question from the interview session data, and process the interview question text into interview question data.
Illustratively, as shown in fig. 8, the question acquisition module 120 includes:
the question word segmentation submodule 121 is used for performing word segmentation processing on the interview question text;
the question embedding submodule 122 is used for embedding the participle information, paragraph information and position information of the interview question text after the participle processing;
and the question data calculation submodule 123 is configured to add the embedding results of the word segmentation information, the paragraph information, and the position information to obtain interview question data.
And the feature vector calculation module 130 is configured to calculate, based on the BERT model, a content feature vector for describing the interview content data according to the interview content data, and calculate a problem feature vector for describing the interview problem text according to the interview problem data.
An attention calculation module 140 for calculating an attention feature vector according to the content feature vector and the question feature vector.
The attention feature vector represents a degree of attention of the question feature vector to the content feature vector.
Illustratively, the content feature vector comprises a plurality of content feature sub-vectors, and the question feature vector comprises a number of question feature sub-vectors.
Illustratively, as shown in fig. 8, the attention calculation module 140 includes: an attention numerical value calculation submodule 141, an attention feature calculation submodule 142 and an attention vector combination submodule 143.
An attention value calculation submodule 141, configured to calculate an attention value between each content feature sub-vector and each problem feature sub-vector.
Illustratively, the attention value calculating submodule 141 is configured to calculate the attention value between each of the content feature sub-vectors and each of the question feature sub-vectors according to the following formulas:

S_ij = SeLU(h_i^C · U) · D · SeLU(h_j^Q · U)^T

α_ij ∝ exp(S_ij)

wherein h_i^C represents the i-th content feature sub-vector in the content feature vector, h_j^Q represents the j-th question feature sub-vector in the question feature vector, S_ij represents the attention index between h_i^C and h_j^Q, U represents a weight matrix with U ∈ R^(d×k), D represents a diagonal matrix with D ∈ R^(k×k), and SeLU() represents an activation function; α_ij represents the attention value between h_i^C and h_j^Q, and ∝ means "proportional to".
An attention feature calculation sub-module 142, configured to calculate an attention feature sub-vector of the question feature vector to the content feature sub-vector according to the attention value between each content feature sub-vector and all question feature sub-vectors in the question feature vector.
Illustratively, the attention feature calculation sub-module 142 is configured to calculate the attention feature sub-vector from the question feature vector to the content feature sub-vector according to the following formula:

a_i = Σ_{j=1}^{n} α_ij · h_j^Q

wherein a_i represents the attention feature sub-vector from the question feature vector to the i-th content feature sub-vector h_i^C in the content feature vector.
The attention vector combination sub-module 143 is configured to combine the attention feature sub-vectors corresponding to all the content feature sub-vectors in the content feature vector to obtain an attention feature vector.
An attention stitching module 150, configured to stitch the attention feature vector and the content feature vector to obtain a content vector with attention information.
Illustratively, as shown in fig. 8, the attention-stitching module 150 includes:
the vector splicing sub-module 151 is configured to splice an attention feature sub-vector corresponding to each content feature sub-vector in the content feature vectors and the content feature sub-vector correspondingly to obtain a content sub-vector with attention information corresponding to each content feature sub-vector;
and a content vector combination sub-module 152, configured to combine the content sub-vectors corresponding to all content feature sub-vectors in the content feature vector to obtain the content vector with attention information.
And the prompt interval determining module 160 is configured to determine a prompt starting point and a prompt ending point in the interview content data according to the content vector and the question feature vector, and output prompt information according to a text between the prompt starting point and the prompt ending point to prompt an interviewer to ask a question.
Illustratively, as shown in fig. 8, the cue interval determination module 160 includes:
a weighted sum sub-module 161, configured to sum the problem feature sub-vectors in the problem feature vector in a weighted manner to obtain a problem vector;
the starting point judgment sub-module 162 is configured to calculate, according to the content sub-vector corresponding to the content feature sub-vector and the problem vector, a probability that a word segmentation corresponding to the content feature sub-vector in the interview content data is a prompt starting point:
P_i^S ∝ exp(u^Q · W^S · c_i^T)

wherein u^Q represents the question vector, c_i represents the content sub-vector corresponding to the i-th content feature sub-vector h_i^C, P_i^S represents the probability that the participle corresponding to h_i^C is the prompt starting point, and W^S represents a weight matrix;
the termination point judging sub-module 163 is configured to calculate, based on the gate cycle calculating unit neural network, a probability that a participle corresponding to the content feature sub-vector is a prompt start point, a content sub-vector corresponding to the content feature sub-vector, and the problem vector, where the participle corresponding to the content feature sub-vector in the interview content data is a prompt termination point:
Figure BDA0002097463370000176
Figure BDA0002097463370000177
wherein GRU () represents the process of the neural network of the gate loop computation unit, tQFusion vector, P, representing the output of the neural network of the gate-cycle computational uniti ETo represent
Figure BDA0002097463370000178
Probability of corresponding word segmentation as a prompt termination point, WERepresenting a weight matrix;
the word segmentation determining sub-module 164 is configured to determine a word segment with the maximum probability of the prompt starting point as a prompt starting point, and determine a word segment with the maximum probability of the prompt ending point as a prompt ending point.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the apparatus, the modules and the units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The methods, apparatus, and devices of the present application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
For example, the method and apparatus described above may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 9.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal.
Referring to fig. 9, the computer device includes a processor, a memory and a network interface connected by a system bus, wherein the memory may include a nonvolatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause a processor to perform any one of the interview question prompting methods.
The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides an environment for the execution of a computer program on a non-volatile storage medium, which when executed by the processor causes the processor to perform any one of the interview question prompting methods.
The network interface is used for network communication, such as sending assigned tasks and the like. It will be appreciated by those skilled in the art that the configuration of the computer apparatus is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the computer apparatus to which the present application may be applied, and that a particular computer apparatus may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
It should be understood that the Processor may be a Central Processing Unit (CPU), and the Processor may be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Wherein, in one embodiment, the processor is configured to execute the computer program stored in the memory to perform the steps of: acquiring interview session data and preset interview corpus data, and processing the interview session data and the interview corpus data into interview content data; extracting interview question texts asked by interviewers from the interview session data, and processing the interview question texts into interview question data; based on a BERT model, calculating a content feature vector for describing the interview content data according to the interview content data, and calculating a problem feature vector for describing the interview problem text according to the interview problem data; calculating an attention feature vector from the content feature vector and a question feature vector, the attention feature vector representing a degree of attention of the question feature vector to the content feature vector; splicing the attention feature vector and the content feature vector to obtain a content vector with attention information; and determining a prompt starting point and a prompt ending point in the interview content data according to the content vector and the question feature vector, and outputting prompt information according to the text between the prompt starting point and the prompt ending point to prompt an interviewer to ask a question.
Illustratively, the processor is configured to, when processing the interview session data and the interview corpus data into interview content data, implement: performing word segmentation processing on the interview session data and the interview corpus data; embedding the participle information, paragraph information and position information of the interview session data and interview corpus data after the participle processing; and adding the embedding results of the word segmentation information, the paragraph information and the position information to obtain interview content data.
Illustratively, the processor, when being configured to implement processing the interview question text into interview question data, is configured to implement: performing word segmentation processing on the interview question text; embedding the word segmentation information, paragraph information and position information of the interview question text after word segmentation processing; and adding the embedding results of the word segmentation information, the paragraph information and the position information to obtain interview question data.
Illustratively, the content feature vector comprises a plurality of content feature sub-vectors, and the question feature vector comprises a plurality of question feature sub-vectors.
Illustratively, the processor, when being configured to compute the attention feature vector from the content feature vector and the question feature vector, is configured to implement: calculating an attention value between each content feature sub-vector and each question feature sub-vector; calculating the attention feature sub-vector from the question feature vector to each content feature sub-vector according to the attention values between that content feature sub-vector and all question feature sub-vectors in the question feature vector; and combining the attention feature sub-vectors corresponding to all the content feature sub-vectors in the content feature vector to obtain the attention feature vector.
Illustratively, the processor is configured to, when calculating the attention value between each content feature sub-vector and each question feature sub-vector, specifically calculate according to the following formulas:

S_ij = SeLU(U^T c_i)^T D SeLU(U^T q_j)

α_ij ∝ exp(S_ij)

wherein c_i represents the ith content feature sub-vector in the content feature vector, q_j represents the jth question feature sub-vector in the question feature vector, S_ij represents the attention index between c_i and q_j, U represents a weight matrix with U ∈ R^(d×k), D represents a diagonal matrix with D ∈ R^(k×k), SeLU() represents the activation function, α_ij represents the attention value between c_i and q_j, and ∝ means "proportional to".
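A minimal numpy sketch of these two formulas; the dimensions, the random weights, and the choice to normalise the attention values over the question sub-vectors are assumptions for illustration, not the patented implementation:

```python
import numpy as np

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    """SeLU activation, applied element-wise."""
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

rng = np.random.default_rng(1)
d, k = 8, 4                       # feature dimension d, projection dimension k
n_c, n_q = 5, 3                   # number of content / question feature sub-vectors
C = rng.normal(size=(n_c, d))     # rows are content feature sub-vectors c_i
Q = rng.normal(size=(n_q, d))     # rows are question feature sub-vectors q_j
U = rng.normal(size=(d, k))       # weight matrix U ∈ R^(d×k)
D = np.diag(rng.uniform(size=k))  # diagonal matrix D ∈ R^(k×k)

# S_ij = SeLU(U^T c_i)^T D SeLU(U^T q_j), computed for all (i, j) pairs at once
S = selu(C @ U) @ D @ selu(Q @ U).T

# α_ij ∝ exp(S_ij), here normalised over j so each content row sums to 1
A = np.exp(S - S.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)
print(A.shape)  # (5, 3): one attention value per (content, question) pair
```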
For example, when the processor is configured to calculate the attention feature sub-vector from the question feature vector to each content feature sub-vector according to the attention values between that content feature sub-vector and all question feature sub-vectors in the question feature vector, the calculation is specifically performed according to the following formula:

q̂_i = Σ_j α_ij q_j

wherein q̂_i represents the attention feature sub-vector from the question feature vector to the ith content feature sub-vector c_i in the content feature vector.
Illustratively, the processor is configured to, when the attention feature vector and the content feature vector are spliced to obtain a content vector with attention information, implement: correspondingly splicing the attention feature sub-vector corresponding to each content feature sub-vector in the content feature vectors with the content feature sub-vectors to obtain content sub-vectors with attention information corresponding to each content feature sub-vector; and combining to obtain the content vector with attention information according to the content sub-vectors corresponding to all the content feature sub-vectors in the content feature vector.
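The weighted sum and the splicing step can be sketched as follows, with stand-in random attention values in place of the α_ij computed earlier (all sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n_c, n_q, d = 5, 3, 8
C = rng.normal(size=(n_c, d))      # content feature sub-vectors c_i
Q = rng.normal(size=(n_q, d))      # question feature sub-vectors q_j
A = rng.uniform(size=(n_c, n_q))
A /= A.sum(axis=1, keepdims=True)  # stand-in attention values α_ij

# Attention feature sub-vector for each c_i: the weighted sum q̂_i = Σ_j α_ij q_j
Q_hat = A @ Q

# Splice (concatenate) each c_i with its attention feature sub-vector q̂_i
M = np.concatenate([C, Q_hat], axis=1)
print(M.shape)  # (5, 16): content sub-vectors carrying attention information
```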
Illustratively, the processor, when determining a prompt starting point and a prompt ending point in the interview content data according to the content vector and the question feature vector, is configured to implement:

weighting and summing all question feature sub-vectors in the question feature vector to obtain a question vector;

calculating the probability that the word segment corresponding to each content feature sub-vector in the interview content data is the prompt starting point according to the content sub-vector corresponding to that content feature sub-vector and the question vector:

P_i^S ∝ exp(q^T W_S m_i)

wherein q represents the question vector, m_i represents the content sub-vector corresponding to the ith content feature sub-vector c_i, P_i^S represents the probability that the word segment corresponding to c_i is the prompt starting point, and W_S represents a weight matrix;

based on a gated recurrent unit (GRU) neural network, calculating the probability that the word segment corresponding to each content feature sub-vector in the interview content data is the prompt ending point according to the probability that the word segment corresponding to that content feature sub-vector is the prompt starting point, the content sub-vector corresponding to that content feature sub-vector, and the question vector:

t^Q = GRU(q, Σ_i P_i^S m_i)

P_i^E ∝ exp((t^Q)^T W_E m_i)

wherein GRU() represents the processing of the gated recurrent unit neural network, t^Q represents the fusion vector output by the gated recurrent unit neural network, P_i^E represents the probability that the word segment corresponding to c_i is the prompt ending point, and W_E represents a weight matrix.
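A minimal sketch of the start-point/end-point computation, with a hand-rolled single GRU step standing in for the gated recurrent unit network; the dimensions, random weights, and exact gate wiring are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n_c, dm = 5, 6                   # number of content sub-vectors and their dimension
M = rng.normal(size=(n_c, dm))   # content sub-vectors m_i (with attention information)
q = rng.normal(size=dm)          # question vector (weighted sum of question sub-vectors)
W_S = rng.normal(size=(dm, dm))  # start-point weight matrix
W_E = rng.normal(size=(dm, dm))  # end-point weight matrix

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# P_i^S ∝ exp(q^T W_S m_i): start-point probability for each word segment
P_S = softmax(M @ W_S.T @ q)

# One GRU step fuses q with the P^S-weighted sum of content sub-vectors.
Wz, Wr, Wh = (rng.normal(size=(dm, 2 * dm)) for _ in range(3))
x = P_S @ M                      # Σ_i P_i^S m_i
xh = np.concatenate([x, q])
z = 1.0 / (1.0 + np.exp(-(Wz @ xh)))             # update gate
r = 1.0 / (1.0 + np.exp(-(Wr @ xh)))             # reset gate
h_tilde = np.tanh(Wh @ np.concatenate([x, r * q]))
t_Q = (1.0 - z) * q + z * h_tilde                # fusion vector t^Q

# P_i^E ∝ exp((t^Q)^T W_E m_i): end-point probability for each word segment
P_E = softmax(M @ W_E.T @ t_Q)

# The word segments with the largest probabilities bound the prompt text
start, end = int(P_S.argmax()), int(P_E.argmax())
```

In practice these weights would be learned jointly with the rest of the model; with random weights the selected interval is only meaningful for shape-checking the computation.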
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solution of the present application, or the part that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application, such as:
a computer-readable storage medium, where a computer program is stored, where the computer program includes program instructions, and a processor executes the program instructions to implement any one of the interview question prompting methods provided in the embodiments of the present application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An interview question prompting method is characterized by comprising the following steps:
acquiring interview session data and preset interview corpus data, and processing the interview session data and the interview corpus data into interview content data;
extracting interview question texts asked by interviewers from the interview session data, and processing the interview question texts into interview question data;
based on a BERT model, calculating a content feature vector for describing the interview content data according to the interview content data, and calculating a question feature vector for describing the interview question text according to the interview question data;
computing an attention feature vector from the content feature vector and the question feature vector, the attention feature vector representing a degree of attention of the question feature vector to the content feature vector;
splicing the attention feature vector and the content feature vector to obtain a content vector with attention information;
and determining a prompt starting point and a prompt ending point in the interview content data according to the content vector and the question feature vector, and outputting prompt information according to the text between the prompt starting point and the prompt ending point to prompt an interviewer to ask a question.
2. The interview question prompting method of claim 1, wherein: the processing the interview session data and the interview corpus data into interview content data comprises:
performing word segmentation processing on the interview session data and the interview corpus data;
embedding the word-segment information, paragraph information and position information of the interview session data and interview corpus data after the word segmentation processing;
and adding the embedding results of the word-segment information, the paragraph information and the position information to obtain the interview content data;
and the processing the interview question text into interview question data comprises:
performing word segmentation processing on the interview question text;
embedding the word-segment information, paragraph information and position information of the interview question text after the word segmentation processing;
and adding the embedding results of the word-segment information, the paragraph information and the position information to obtain the interview question data.
3. The interview question prompting method of claim 2, wherein: the content feature vector comprises a plurality of content feature sub-vectors, and the question feature vector comprises a plurality of question feature sub-vectors;
said computing an attention feature vector from said content feature vector and the question feature vector comprises:
calculating an attention value between each of the content feature sub-vectors and each of the question feature sub-vectors;
calculating the attention feature sub-vector from the question feature vector to each content feature sub-vector according to the attention values between that content feature sub-vector and all question feature sub-vectors in the question feature vector;
and combining the attention feature sub-vectors corresponding to all the content feature sub-vectors in the content feature vector to obtain the attention feature vector.
4. The interview question prompting method of claim 3, wherein: the attention value between each content feature sub-vector and each question feature sub-vector is specifically calculated according to the following formulas:

S_ij = SeLU(U^T c_i)^T D SeLU(U^T q_j)

α_ij ∝ exp(S_ij)

wherein c_i represents the ith content feature sub-vector in the content feature vector, q_j represents the jth question feature sub-vector in the question feature vector, S_ij represents the attention index between c_i and q_j, U represents a weight matrix with U ∈ R^(d×k), D represents a diagonal matrix with D ∈ R^(k×k), SeLU() represents the activation function, α_ij represents the attention value between c_i and q_j, and ∝ means "proportional to".
5. The interview question prompting method of claim 4, wherein: the attention feature sub-vector from the question feature vector to each content feature sub-vector is calculated, according to the attention values between that content feature sub-vector and all question feature sub-vectors in the question feature vector, specifically according to the following formula:

q̂_i = Σ_j α_ij q_j

wherein q̂_i represents the attention feature sub-vector from the question feature vector to the ith content feature sub-vector c_i in the content feature vector.
6. The interview question prompting method of claim 5, wherein: the splicing the attention feature vector and the content feature vector to obtain a content vector with attention information includes:
correspondingly splicing the attention feature sub-vector corresponding to each content feature sub-vector in the content feature vectors with the content feature sub-vectors to obtain the content sub-vectors with the attention information corresponding to each content feature sub-vector;
and combining to obtain the content vector with the attention information according to the content sub-vectors corresponding to all the content feature sub-vectors in the content feature vector.
7. The interview question prompting method of claim 6, wherein: determining a prompt starting point and a prompt ending point in the interview content data according to the content vector and the question feature vector, including:
weighting and summing all question feature sub-vectors in the question feature vector to obtain a question vector;

calculating the probability that the word segment corresponding to each content feature sub-vector in the interview content data is the prompt starting point according to the content sub-vector corresponding to that content feature sub-vector and the question vector:

P_i^S ∝ exp(q^T W_S m_i)

wherein q represents the question vector, m_i represents the content sub-vector corresponding to the ith content feature sub-vector c_i, P_i^S represents the probability that the word segment corresponding to c_i is the prompt starting point, and W_S represents a weight matrix;

based on a gated recurrent unit (GRU) neural network, calculating the probability that the word segment corresponding to each content feature sub-vector in the interview content data is the prompt ending point according to the probability that the word segment corresponding to that content feature sub-vector is the prompt starting point, the content sub-vector corresponding to that content feature sub-vector, and the question vector:

t^Q = GRU(q, Σ_i P_i^S m_i)

P_i^E ∝ exp((t^Q)^T W_E m_i)

wherein GRU() represents the processing of the gated recurrent unit neural network, t^Q represents the fusion vector output by the gated recurrent unit neural network, P_i^E represents the probability that the word segment corresponding to c_i is the prompt ending point, and W_E represents a weight matrix;

and determining the word segment with the largest prompt-starting-point probability as the prompt starting point, and determining the word segment with the largest prompt-ending-point probability as the prompt ending point.
8. An interview question prompting device, comprising:
the content acquisition module is used for acquiring interview session data and preset interview corpus data and processing the interview session data and the interview corpus data into interview content data;
the question acquisition module is used for extracting an interview question text asked by an interviewer from the interview session data and processing the interview question text into interview question data;
the feature vector calculation module is used for calculating a content feature vector for describing the interview content data according to the interview content data based on a BERT model, and calculating a question feature vector for describing the interview question text according to the interview question data;
an attention calculation module for calculating an attention feature vector from the content feature vector and the question feature vector, the attention feature vector representing a degree of attention of the question feature vector to the content feature vector;
the attention splicing module is used for splicing the attention feature vector and the content feature vector to obtain a content vector with attention information;
and the prompt interval determining module is used for determining a prompt starting point and a prompt ending point in the interview content data according to the content vector and the question feature vector, and outputting prompt information according to the text between the prompt starting point and the prompt ending point so as to prompt an interviewer to ask a question.
9. A computer device, wherein the computer device comprises a memory and a processor;
the memory is used for storing a computer program;
the processor, configured to execute the computer program and to implement the interview question prompting method according to any one of claims 1-7 when the computer program is executed.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the interview question prompting method according to any one of claims 1-7.
CN201910523564.6A 2019-06-17 2019-06-17 Interview question prompting method and device, computer equipment and storage medium Active CN110399472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910523564.6A CN110399472B (en) 2019-06-17 2019-06-17 Interview question prompting method and device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110399472A CN110399472A (en) 2019-11-01
CN110399472B true CN110399472B (en) 2022-07-15

Family

ID=68323210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910523564.6A Active CN110399472B (en) 2019-06-17 2019-06-17 Interview question prompting method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110399472B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274362B (en) * 2020-02-01 2021-09-03 武汉大学 Dialogue generation method based on transformer architecture
CN113496122A (en) * 2020-04-08 2021-10-12 中移(上海)信息通信科技有限公司 Named entity identification method, device, equipment and medium
CN111538809B (en) * 2020-04-20 2021-03-16 马上消费金融股份有限公司 Voice service quality detection method, model training method and device
CN111694936B (en) * 2020-04-26 2023-06-06 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for identification of AI intelligent interview

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6684188B1 (en) * 1996-02-02 2004-01-27 Geoffrey C Mitchell Method for production of medical records and other technical documents
CN105224278A (en) * 2015-08-21 2016-01-06 百度在线网络技术(北京)有限公司 Interactive voice service processing method and device
CN106777013A (en) * 2016-12-07 2017-05-31 科大讯飞股份有限公司 Dialogue management method and apparatus
CN108306814A (en) * 2017-08-11 2018-07-20 腾讯科技(深圳)有限公司 Information-pushing method, device, terminal based on instant messaging and storage medium
CN109086303A (en) * 2018-06-21 2018-12-25 深圳壹账通智能科技有限公司 The Intelligent dialogue method, apparatus understood, terminal are read based on machine
CN109241251A (en) * 2018-07-27 2019-01-18 众安信息技术服务有限公司 A kind of session interaction method
CN109815318A (en) * 2018-12-24 2019-05-28 平安科技(深圳)有限公司 The problems in question answering system answer querying method, system and computer equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9378486B2 (en) * 2014-03-17 2016-06-28 Hirevue, Inc. Automatic interview question recommendation and analysis
US11087199B2 (en) * 2016-11-03 2021-08-10 Nec Corporation Context-aware attention-based neural network for interactive question answering


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Learning Resource Recommendation System Based on E-learning User Behaviors; Fu Fen; China Master's Theses Full-text Database, Information Science and Technology Series; 20180415; pp. I138-3633 *

Also Published As

Publication number Publication date
CN110399472A (en) 2019-11-01

Similar Documents

Publication Publication Date Title
CN110399472B (en) Interview question prompting method and device, computer equipment and storage medium
CN110413746B (en) Method and device for identifying intention of user problem
CN111783474B (en) Comment text viewpoint information processing method and device and storage medium
CN107844481B (en) Text recognition error detection method and device
CN108595436B (en) Method and system for generating emotional dialogue content and storage medium
CN109344242B (en) Dialogue question-answering method, device, equipment and storage medium
KR102315830B1 (en) Emotional Classification Method in Dialogue using Word-level Emotion Embedding based on Semi-Supervised Learning and LSTM model
CN111625634A (en) Word slot recognition method and device, computer-readable storage medium and electronic device
CN112699686B (en) Semantic understanding method, device, equipment and medium based on task type dialogue system
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN112417855A (en) Text intention recognition method and device and related equipment
CN112395887A (en) Dialogue response method, dialogue response device, computer equipment and storage medium
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN113918813A (en) Method and device for recommending posts based on external knowledge in chat record form
CN116341651A (en) Entity recognition model training method and device, electronic equipment and storage medium
CN114239607A (en) Conversation reply method and device
CN113886548A (en) Intention recognition model training method, recognition method, device, equipment and medium
CN110795531B (en) Intention identification method, device and storage medium
CN111241843B (en) Semantic relation inference system and method based on composite neural network
CN116702765A (en) Event extraction method and device and electronic equipment
CN114792097B (en) Method and device for determining prompt vector of pre-training model and electronic equipment
CN112818688B (en) Text processing method, device, equipment and storage medium
CN113779244B (en) Document emotion classification method and device, storage medium and electronic equipment
CN113033213A (en) Method and device for analyzing text information by using attention model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant