CN110399472A - Interview question prompting method, device, computer equipment and storage medium - Google Patents

Interview question prompting method, device, computer equipment and storage medium

Info

Publication number
CN110399472A
CN110399472A (application CN201910523564.6A)
Authority
CN
China
Prior art keywords
content
interview
subvector
data
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910523564.6A
Other languages
Chinese (zh)
Other versions
CN110399472B (en)
Inventor
金戈
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910523564.6A priority Critical patent/CN110399472B/en
Publication of CN110399472A publication Critical patent/CN110399472A/en
Application granted granted Critical
Publication of CN110399472B publication Critical patent/CN110399472B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Abstract

This application relates to the field of speech semantics. By fusing the interviewee's attention information, it predicts content of interest to the interviewee from the conversation during the interview and provides prompts for the interviewee's questions. Specifically disclosed are an interview question prompting method, device, computer equipment and storage medium. The method comprises: obtaining interview session data and interview corpus data to obtain interview content data; extracting the interview question text asked by the interviewee to obtain interview question data; based on a BERT model, computing a content feature vector corresponding to the interview content data from the interview content data, and computing a corresponding question feature vector from the interview question data; computing an attention feature vector from the content feature vector and the question feature vector; concatenating the attention feature vector and the content feature vector to obtain a content vector carrying attention information; and determining prompt information in the interview content data according to the content vector and the question feature vector, so as to prompt the interviewee's questions.

Description

Interview question prompting method, device, computer equipment and storage medium
Technical field
This application relates to the field of natural language processing, and in particular to an interview question prompting method, device, computer equipment and storage medium.
Background art
In an interview, before ending the interview the interviewer usually allows the interviewee to ask some questions of their own, through which the interviewee can learn details about the company, the position, and so on. At present, the interviewee either asks questions prepared in advance, or the interviewer actively introduces content that the interviewee may be interested in.
Although existing artificial-intelligence interview examination systems can implement question-and-answer exchanges with the interviewee, they cannot accurately recommend content of interest to the interviewee based on the conversation during the interview, and therefore cannot accurately prompt the interviewee's questions, which reduces the credibility of such systems.
Summary of the invention
The embodiments of the present application provide an interview question prompting method, device, computer equipment and storage medium, which can effectively predict content of interest to the interviewee from the conversation during the interview and provide prompts for the interviewee's questions.
In a first aspect, this application provides an interview question prompting method, the method comprising:
obtaining interview session data and preset interview corpus data, and processing the interview session data and the interview corpus data into interview content data;
extracting, from the interview session data, the interview question text asked by the interviewee, and processing the interview question text into interview question data;
based on a BERT model, computing from the interview content data a content feature vector for describing the interview content data, and computing from the interview question data a question feature vector for describing the interview question text;
computing an attention feature vector from the content feature vector and the question feature vector, the attention feature vector representing the degree of attention of the question feature vector to the content feature vector;
concatenating the attention feature vector and the content feature vector to obtain a content vector carrying attention information;
determining a prompt start point and a prompt end point in the interview content data according to the content vector and the question feature vector, and outputting prompt information according to the text between the prompt start point and the prompt end point, so as to prompt the interviewee to ask questions.
In a second aspect, this application provides an interview question prompting device, the device comprising:
a content acquisition module, configured to obtain interview session data and preset interview corpus data, and to process the interview session data and the interview corpus data into interview content data;
a question acquisition module, configured to extract from the interview session data the interview question text asked by the interviewee, and to process the interview question text into interview question data;
a feature vector computation module, configured to compute, based on a BERT model, a content feature vector for describing the interview content data from the interview content data, and a question feature vector for describing the interview question text from the interview question data;
an attention computation module, configured to compute an attention feature vector from the content feature vector and the question feature vector, the attention feature vector representing the degree of attention of the question feature vector to the content feature vector;
an attention concatenation module, configured to concatenate the attention feature vector and the content feature vector to obtain a content vector carrying attention information;
a prompt interval determination module, configured to determine a prompt start point and a prompt end point in the interview content data according to the content vector and the question feature vector, and to output prompt information according to the text between the prompt start point and the prompt end point, so as to prompt the interviewee to ask questions.
In a third aspect, this application provides a computer device comprising a memory and a processor; the memory is configured to store a computer program; the processor is configured to execute the computer program and, when executing the computer program, to implement the above interview question prompting method.
In a fourth aspect, this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above interview question prompting method.
This application discloses an interview question prompting method, device, equipment and storage medium. A BERT model extracts a content feature vector from the interview session data and the preset interview corpus data, and extracts a question feature vector from the interviewee's interview question data; an attention feature vector representing the degree of attention of the question feature vector to the content feature vector is then computed, and the attention information is concatenated to the content feature vector to obtain a content vector carrying the interviewee's attention information; afterwards, the text to be used as a prompt is determined in the interview content data according to the content vector and the question feature vector. Content of interest to the interviewee is thereby predicted from the conversation during the interview, and prompts are provided for the interviewee's questions; moreover, since the prediction process fuses attention information, the prediction is more accurate.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of an interview question prompting method according to an embodiment of the present application;
Fig. 2 is a schematic sub-flowchart of obtaining the interview content data in Fig. 1;
Fig. 3 is a schematic sub-flowchart of obtaining the interview question data in Fig. 1;
Fig. 4 is a schematic sub-flowchart of computing the attention feature vector in Fig. 1;
Fig. 5 is a schematic sub-flowchart of the concatenation that obtains the content vector carrying attention information in Fig. 1;
Fig. 6 is a schematic sub-flowchart of determining the prompt start point and prompt end point in Fig. 1;
Fig. 7 is a schematic structural diagram of an interview question prompting device provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an interview question prompting device provided by another embodiment of the present application;
Fig. 9 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
The flowcharts shown in the drawings are only illustrations; they need not include all contents and operations/steps, nor must the steps be executed in the order described. For example, some operations/steps may be decomposed, combined, or partially merged, so the actual order of execution may change according to the actual situation. In addition, although the device schematics divide the functions into modules, in some cases the division may differ from that shown in the schematics.
Embodiments of the present application provide an interview question prompting method, device, equipment and storage medium. The interview question prompting method can be applied in a terminal or a server, so as to predict content of interest to the interviewee from the conversation during the interview and provide prompts for the interviewee's questions.
For example, the interview question prompting method may be used in a server, and may of course also be used in a terminal, such as a mobile phone, a notebook computer or a desktop computer. For ease of understanding, the following embodiments describe in detail the interview question prompting method as applied to a server.
Some embodiments of the present application are elaborated below with reference to the accompanying drawings. In the absence of conflict, the following embodiments and the features in the embodiments can be combined with each other.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an interview question prompting method provided by an embodiment of the present application.
As shown in Fig. 1, the interview question prompting method includes the following steps S110 to S160.
Step S110: obtain interview session data and preset interview corpus data, and process the interview session data and the interview corpus data into interview content data.
In some embodiments, the interview session data is obtained during the interviewee's interview. Illustratively, the interview session data includes information about the conversation between the interviewee and the interviewer. The interviewer may be a person or an interview robot. The conversation may include conversation in speech form and/or in text form, where conversation in speech form can be converted into text form by speech recognition.
Illustratively, the interview session data includes information about the questions asked by the interviewer and the interviewee's answers; it may also include information about the questions asked by the interviewee and the interviewer's answers.
Illustratively, the interview session data includes the question data and answer data that have already occurred between the interviewee and the interviewer. For example, the interview session data includes:
Interviewer question 1: Do you have any hobbies? What do you like to do in your spare time?
Interviewee answer 1: I like playing basketball, because playing basketball cultivates a kind of team spirit.
Interviewer question 2: When do you feel under the most pressure?
Interviewee answer 2: I feel under the most pressure when I have to do several things at the same time, for example organizing a large-scale activity shortly before final exams; at such times my pressure is especially great.
Interviewee question 1: How long have you been at the company? What do you like most about the company?
Interviewer answer 1: ….
……
Illustratively, the interview corpus data is preset and stored, and includes information for the interviewee to query, etc.
Illustratively, the interview corpus data available for the interviewee to query includes information about the company, the job description, compensation information, etc.
For example, the interview corpus data includes: ** company is a financial company listed on the Hong Kong Stock Exchange and the Shanghai Stock Exchange. Its main business is to provide diversified financial services and products, with insurance as the core. ** company was founded in Shekou, Shenzhen in 1988 and was China's first joint-stock insurance enterprise. On March 1, 2007, ** company listed A-shares on the Shanghai Stock Exchange. On May 11, 2007, the Hang Seng Indexes company announced that ** company would be added as a Hang Seng Index constituent stock from June 4, 2007….
In some embodiments, the interview corpus data and the interview session data are arranged in a preset format and stored to obtain the interview content data. For example, the interview corpus data is stored after the interview session data to obtain the interview content data.
In other embodiments, the interview corpus data and the interview session data may also be preprocessed before being stored to obtain the interview content data.
Illustratively, as shown in Fig. 2, processing the interview session data and the interview corpus data into interview content data in step S110 includes steps S111 to S113.
Step S111: perform word segmentation on the interview session data and the interview corpus data.
Illustratively, the interview session data and the interview corpus data are encoded using one-hot encoding, and in the encoded interview session data and interview corpus data a start marker [CLS] is added at the beginning of each sentence text, a separator [SEP] is added between sentences, and a separator [SEP] is added at the end of a sentence.
For example, a question text in the interview session data after segmentation is: [CLS] How long have you been at the company [SEP] What do you like most about the company [SEP].
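The marker-insertion step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper name `add_special_tokens` is invented, the sentences are shown pre-segmented into word lists, and a real BERT tokenizer does considerably more (subword splitting, vocabulary lookup).

```python
def add_special_tokens(sentences):
    """Join already-segmented sentences with BERT-style markers:
    a [CLS] start marker, with [SEP] between sentences and at the end."""
    tokens = ["[CLS]"]
    for sent in sentences:
        tokens.extend(sent)
        tokens.append("[SEP]")
    return tokens

# Two segmented sentences of the example interviewee question
question = add_special_tokens([
    ["How", "long", "have", "you", "been", "at", "the", "company"],
    ["What", "do", "you", "like", "most", "about", "the", "company"],
])
print(question[0], question[9], question[-1])  # [CLS] [SEP] [SEP]
```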
Step S112: perform embedding on the segmentation information, paragraph information and position information of the segmented interview session data and interview corpus data.
Here, the segmentation information is embedded; the resulting embedding, Token Embeddings, consists of word vectors, and the first token is the [CLS] mark, which can be used for subsequent prediction tasks.
In this embodiment, each token of the segmented interview session data and interview corpus data is fed into the Token Embedding layer so that each token is converted into vector form.
Illustratively, the Token Embedding layer converts each token into a vector of fixed dimension, for example a 768-dimensional vector representation.
The paragraph information is embedded; the resulting embedding, Segment Embeddings, is used to distinguish different sentences in the interview session data and the interview corpus data.
The embedding of the position information, Position Embeddings, is learned. For example, the BERT model can handle input sequences of at most 512 tokens. By letting the BERT model learn a vector representation at each position, the order information of the sequence is encoded. This means that the Position Embeddings layer is actually a lookup table of size (512, 768), whose first row represents the first position of a sequence, whose second row represents the second position, and so on.
Specifically, the first token of each segmented answer text is always the special classification embedding, i.e. the start marker [CLS]. The final hidden state corresponding to this start marker, i.e. the output of the Transformer, is used as the aggregate sequence representation for classification tasks.
Step S113: add the embedding results of the segmentation information, paragraph information and position information to obtain the interview content data.
Illustratively, the embedding result of the segmentation information, Token Embeddings, is the vector representation of the words; the embedding result of the paragraph information, Segment Embeddings, helps the BERT model distinguish the vector representations of different sentences; and the embedding result of the position information, Position Embeddings, enables the BERT model to learn the sequential attributes of the input.
Illustratively, the embedding results of the segmentation information, paragraph information and position information are each vectors of shape (1, n, 768); these embedding results are added element-wise to obtain a composite representation of size (1, n, 768), which can serve as the interview content data and as the input representation of the BERT model's encoder layer.
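The element-wise addition in step S113 can be sketched with NumPy. The three random tables below merely stand in for the trained Token, Segment and Position embedding layers; only the shapes and the summation mirror the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 768  # n tokens in the sequence, hidden size d = 768 as in the text

# Toy random tables standing in for the three trained embedding layers
token_emb = rng.normal(size=(1, n, d))     # Token Embeddings (word vectors)
segment_emb = rng.normal(size=(1, n, d))   # Segment Embeddings (sentence ids)
position_emb = rng.normal(size=(1, n, d))  # Position Embeddings (order info)

# Step S113: element-wise addition yields the (1, n, 768) input representation
input_repr = token_emb + segment_emb + position_emb
print(input_repr.shape)  # (1, 6, 768)
```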
Step S120: extract from the interview session data the interview question text asked by the interviewee, and process the interview question text into interview question data.
In this embodiment, the interview session data obtained in step S110 includes information about the interviewee's questions, so the interview question text asked by the interviewee can be extracted from the interview session data.
In other embodiments, the interview question text asked by the interviewee is obtained during the interview; for example, the interviewee's questions can be converted into text format by speech recognition.
Illustratively, the interviewee's interview question text includes:
Interviewee question 1: How long have you been at the company? What do you like most about the company?
……
In some examples, the interview question text is arranged in a preset format and stored, for example arranged and stored in the chronological order in which the questions occurred, to obtain the interview question data.
In other embodiments, the interview question text may also be preprocessed before being stored to obtain the interview question data.
Illustratively, as shown in Fig. 3, processing the interview question text into interview question data in step S120 includes steps S121 to S123.
Step S121: perform word segmentation on the interview question text.
Illustratively, the interview question text is encoded using one-hot encoding, and in the encoded interview question text a start marker [CLS] is added at the beginning of each sentence text, a separator [SEP] is added between sentences, and a separator [SEP] is added at the end of a sentence.
For example, a question text in the interview question text after segmentation is: [CLS] How long have you been at the company [SEP] What do you like most about the company [SEP].
Step S122: perform embedding on the segmentation information, paragraph information and position information of the segmented interview question text.
Illustratively, the embedding result of the segmentation information, Token Embeddings, is the vector representation of the words; the embedding result of the paragraph information, Segment Embeddings, helps the BERT model distinguish the vector representations of different sentences; and the embedding result of the position information, Position Embeddings, enables the BERT model to learn the sequential attributes of the input.
Step S123: add the embedding results of the segmentation information, paragraph information and position information to obtain the interview question data.
Illustratively, the embedding results of the segmentation information, paragraph information and position information are each vectors of shape (1, n, 768); these embedding results are added element-wise to obtain a composite representation of size (1, n, 768), which can serve as the interview question data and as the input representation of the BERT model's encoder layer.
Step S130: based on a BERT model, compute from the interview content data a content feature vector for describing the interview content data, and compute from the interview question data a question feature vector for describing the interview question text.
The BERT (Bidirectional Encoder Representations from Transformers) model, i.e. the encoder of the bidirectional Transformer, pre-trains deep bidirectional representations by jointly conditioning on context in all layers; the Transformer is a method that relies entirely on self-attention to compute representations of its input and output.
The main innovation of the BERT model is that its pre-training uses two methods, the masked language model (MLM) and Next Sentence Prediction, to capture word-level and sentence-level representations respectively.
The masked language model randomly masks some of the tokens in the model input, with the goal of predicting the original vocabulary id of a masked token based only on its context. Unlike left-to-right language model pre-training, the training objective of the masked language model allows the representation to fuse the context on both the left and the right, thereby pre-training a deep bidirectional Transformer.
15% of the tokens in the corpus are randomly selected and removed, for example replaced with the [MASK] token, and the model is then trained to correctly predict the replaced tokens.
Specifically, among the 15% of tokens selected for this [MASK] substitution task, only 80% are actually replaced with the [MASK] token, 10% are randomly replaced with another token, and in the remaining 10% of cases the token is left unchanged. This is the specific practice of the masked bidirectional language model.
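The 15% / 80% / 10% / 10% corruption scheme described above can be sketched as follows. The function name `mask_tokens` and the sampling details (one Bernoulli draw per token rather than selecting an exact 15%) are illustrative simplifications, not the patent's or BERT's exact implementation.

```python
import random

def mask_tokens(tokens, vocab, p=0.15, seed=0):
    """Masked-LM corruption: select roughly p of the tokens; of those,
    80% become [MASK], 10% become a random vocabulary word, and 10%
    stay unchanged. Returns the corrupted sequence plus the positions
    the model must predict, mapped to their original tokens."""
    rng = random.Random(seed)
    out, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < p:
            targets[i] = tok  # the model must recover this original token
            r = rng.random()
            if r < 0.8:
                out[i] = "[MASK]"
            elif r < 0.9:
                out[i] = rng.choice(vocab)
            # else: keep the original token unchanged
    return out, targets

tokens = list("abcdefghij") * 10
corrupted, targets = mask_tokens(tokens, vocab=list("xyz"))
print(len(corrupted) == len(tokens))  # True
```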
Next Sentence Prediction means that during language model pre-training, two sentences are selected in one of two ways: either two genuinely consecutive sentences are taken from the corpus, or the second sentence is drawn at random from the corpus, as if by a roll of the dice, and spliced after the first sentence. In addition to the masked language model task described above, the model also performs a sentence relationship prediction, judging whether the second sentence really is the sentence that follows the first. Adding this task helps downstream sentence-relationship judgment tasks.
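Pair construction for this task can be sketched as below. The helper name `make_nsp_pair` and the 50/50 split are illustrative; the sentences here are placeholder strings rather than real corpus text.

```python
import random

def make_nsp_pair(corpus, rng):
    """Build one Next Sentence Prediction example: with probability 0.5
    take two genuinely consecutive sentences (label 1), otherwise pair
    the first sentence with a randomly drawn one (label 0)."""
    i = rng.randrange(len(corpus) - 1)
    if rng.random() < 0.5:
        return corpus[i], corpus[i + 1], 1   # truly consecutive pair
    return corpus[i], rng.choice(corpus), 0  # random second sentence

corpus = ["s0", "s1", "s2", "s3", "s4"]
rng = random.Random(42)
pairs = [make_nsp_pair(corpus, rng) for _ in range(6)]
print(all(label in (0, 1) for _, _, label in pairs))  # True
```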
The pre-training of the BERT model is a multi-task process. Pre-training essentially takes a designed network structure model, applies it to language model tasks, and then uses a large amount, even an unlimited amount, of unlabeled natural language text, so that the pre-training tasks extract a great deal of linguistic knowledge and encode it into the network structure.
Google has open-sourced the pre-trained BERT-Base and BERT-Large models; by calling a pre-trained BERT model, corresponding feature vectors can be extracted from text, i.e. vectors that represent semantic features.
In this embodiment, based on the BERT model, a content feature vector for describing the interview content data is extracted from the interview content data, and a question feature vector for describing the interview question text is extracted from the interview question data.
Illustratively, inputting the interview content data into the pre-trained BERT model yields the self-attention feature representation of the interview content data, i.e. the content feature vector. The content feature vector includes multiple content feature subvectors and can be expressed as h^C = [h_1^C, h_2^C, …, h_m^C], where h_i^C denotes the content feature subvector corresponding to the i-th token in the interview content data, i.e. the i-th content feature subvector, and m denotes the number of tokens in the interview content data.
Illustratively, inputting the interview question data into the pre-trained BERT model yields the self-attention feature representation of the interview question text, i.e. the question feature vector. The question feature vector includes a number of question feature subvectors and can be expressed as h^Q = [h_1^Q, h_2^Q, …, h_n^Q], where h_j^Q denotes the question feature subvector corresponding to the j-th token in the interview question data, i.e. the j-th question feature subvector, and n denotes the number of tokens in the interview question data.
In some embodiments, the subvector that the pre-trained BERT model outputs for each token, i.e. the semantic feature vector, has a fixed length (dimension), so that h_i^C ∈ R^d and h_j^Q ∈ R^d, where d denotes the length of the semantic feature vectors output by the pre-trained BERT model and R^d denotes the d-dimensional vector space. For example, each content feature subvector in the content feature vector obtained in step S130, and each question feature subvector in the question feature vector, is a 1 × d vector. Illustratively, d equals 768.
Step S140: compute an attention feature vector from the content feature vector and the question feature vector.
Specifically, the attention feature vector represents the degree of attention of the question feature vector to the content feature vector.
Illustratively, the attention feature vector can be expressed as ĥ^C = [ĥ_1^C, ĥ_2^C, …, ĥ_m^C] and can represent the attention expression of the interview question data toward the interview content data, where ĥ_i^C is the attention feature subvector of the question feature vector h^Q to the i-th content feature subvector h_i^C in the content feature vector h^C.
In some embodiments, as shown in Fig. 4, computing the attention feature vector from the content feature vector and the question feature vector in step S140 includes steps S141 to S143.
Step S141: compute the attention value between each content feature subvector and each question feature subvector.
Illustratively, the attention values between the 1st content feature subvector h_1^C and each question feature subvector are computed first, then the attention values between the 2nd content feature subvector h_2^C and each question feature subvector, and so on for the i-th content feature subvector h_i^C, until the attention values between the m-th content feature subvector h_m^C and each question feature subvector have been computed.
In some embodiments, step S141 computes the attention value between each content feature subvector and each question feature subvector specifically according to the following formulas:
S_ij = SeLU(h_i^C U) D SeLU(h_j^Q U)^T
α_ij ∝ exp(S_ij)
where h_i^C denotes the i-th content feature subvector in the content feature vector, h_j^Q denotes the j-th question feature subvector in the question feature vector, S_ij denotes the attention index between h_i^C and h_j^Q, U denotes a weight matrix with U ∈ R^(d×k), D denotes a diagonal matrix with D ∈ R^(k×k), SeLU(·) denotes the activation function, α_ij denotes the attention value between h_i^C and h_j^Q, and ∝ means "is proportional to".
Illustratively, k denotes the dimension of the attention hidden layer; this dimension is a preset hyperparameter. SeLU(·) denotes the Scaled Exponential Linear Units activation function, which performs nonlinear activation and thereby introduces a self-normalizing property.

Illustratively, the weight values of the relevant parameters in the weight matrix U and the diagonal matrix D can be empirical values or can be obtained by training; specifically, a loss function is calculated and error back-propagation is then performed to update the weight values.

In the calculation of the attention index and attention value, the question feature subvector h_Q^j and the content feature subvector h_C^i are first mapped by the same weight matrix U and activation function SeLU(·); the similarity is then calculated by way of vector inner product, thereby obtaining the attention index, also referred to as the attention weight, between each content feature subvector and each question feature subvector.

Illustratively, h_C^i is a 1×d vector and the transpose of h_Q^j is a d×1 vector; the attention value α_ij and the attention index S_ij between h_C^i and h_Q^j are 1×1 values.
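The attention-value computation of step S141 can be sketched in Python with NumPy; the function names, the softmax normalization implied by α_ij ∝ exp(S_ij), and the SELU constants are illustrative assumptions rather than part of the patent.

```python
import numpy as np

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    # Scaled Exponential Linear Units activation (self-normalizing)
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def attention_values(h_C, h_Q, U, D):
    """h_C: (m, d) content feature subvectors; h_Q: (n, d) question feature subvectors.
    U: (d, k) shared weight matrix; D: (k, k) diagonal matrix.
    Returns S (m, n) attention indices and alpha (m, n) normalized attention values."""
    S = selu(h_C @ U) @ D @ selu(h_Q @ U).T       # inner product after the shared mapping
    e = np.exp(S - S.max(axis=1, keepdims=True))  # alpha_ij ∝ exp(S_ij), computed stably
    return S, e / e.sum(axis=1, keepdims=True)
```

Each row of the returned alpha sums to 1, so row i holds the attention distribution of content subvector i over the question subvectors.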
Step S142: calculate the attention feature subvector of the question feature vector over each content feature subvector according to the attention values between that content feature subvector and all question feature subvectors in the question feature vector.

In some embodiments, step S142 calculates this attention feature subvector according to the following formula:

ĥ_C^i = Σ_j α_ij h_Q^j

where ĥ_C^i denotes the attention feature subvector of the question feature vector over the i-th content feature subvector in the content feature vector.
That is, the linear combination of the question feature subvectors in the question feature vector, weighted by the attention values between the i-th content feature subvector h_C^i and each question feature subvector, yields the attention feature subvector ĥ_C^i of the question feature vector h_Q over the i-th content feature subvector h_C^i in the content feature vector h_C.

Illustratively, the attention feature subvector ĥ_C^i is a 1×d vector.

Step S143: combine the attention feature subvectors corresponding to all content feature subvectors in the content feature vector to obtain the attention feature vector.

Calculating the attention feature subvector of the question feature vector h_Q over each content feature subvector in the content feature vector h_C yields the attention feature vector ĥ_C = [ĥ_C^1, ĥ_C^2, …, ĥ_C^m], i.e. the attention expression of the interview question data over the interview content data.
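As a minimal numeric illustration of the linear combination above, with made-up values for alpha and h_Q (m = 3 content word segments, n = 2 question word segments, d = 2):

```python
import numpy as np

# alpha: (m, n) attention values; h_Q: (n, d) question feature subvectors.
alpha = np.array([[0.2, 0.8],
                  [0.5, 0.5],
                  [0.9, 0.1]])
h_Q = np.array([[1.0, 0.0],
                [0.0, 1.0]])
# Each attention feature subvector is the alpha-weighted combination of the rows of h_Q.
h_hat_C = alpha @ h_Q   # (m, d) attention feature vector
```

With h_Q chosen as the identity, each row of h_hat_C simply reproduces the corresponding row of attention values.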
Step S150: splice the attention feature vector and the content feature vector to obtain a content vector with attention information.

Specifically, the content vector includes the content feature information in the content feature vector as well as the attention information, in the attention feature vector, of the interview question data over the interview content data.
In some embodiments, as shown in Fig. 5, step S150 of splicing the attention feature vector and the content feature vector to obtain the content vector with attention information includes steps S151 and S152.

Step S151: splice the attention feature subvector corresponding to each content feature subvector in the content feature vector onto that content feature subvector, to obtain, for each content feature subvector, a corresponding content subvector with attention information.

Illustratively, the attention feature subvector ĥ_C^i corresponding to each content feature subvector in the attention feature vector ĥ_C is spliced onto the head or the tail of the corresponding content feature subvector h_C^i in the content feature vector h_C, giving each content feature subvector a corresponding content subvector with attention information.

Since the i-th subvector in the attention feature vector ĥ_C corresponds to the i-th word segment in the interview content data, the 1×d attention feature subvector ĥ_C^i corresponding to each word segment in the interview content data is spliced onto the head or the tail of the 1×d content feature subvector h_C^i corresponding to the same word segment in the content feature vector h_C; each word segment in the interview content data thus obtains a corresponding content subvector c_i with attention information.

Illustratively, each content subvector c_i with attention information is a 1×2d vector.
Step S152: combine the content subvectors corresponding to all content feature subvectors in the content feature vector to obtain the content vector with attention information.

Splicing the attention feature vector and the content feature vector yields the content vector with attention information; the content vector includes the content subvector c_i corresponding to the i-th word segment in the interview content data, so that determining the prompt starting point and the prompt terminating point in the interview content data according to the content vector later on can be more accurate.
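The splicing of steps S151 and S152 amounts to a row-wise concatenation; a minimal sketch with assumed sizes and tail splicing is:

```python
import numpy as np

m, d = 3, 4                                  # illustrative sizes
h_C = np.zeros((m, d))                       # content feature subvectors, (m, d)
h_hat_C = np.ones((m, d))                    # attention feature subvectors, (m, d)
# Splice each attention feature subvector onto the tail of its content feature subvector:
c = np.concatenate([h_C, h_hat_C], axis=1)   # content vector with attention info, (m, 2d)
```

Row i of c is the 1×2d content subvector for word segment i: its first d entries come from h_C and its last d entries from the attention expression.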
Step S160: determine a prompt starting point and a prompt terminating point in the interview content data according to the content vector and the question feature vector, and output prompt information according to the text between the prompt starting point and the prompt terminating point, so as to prompt the interviewee to ask a question.

According to the content vector, which carries the interviewee's attention information, and the question feature vector, which contains the interviewee's question information, the part of the text in the interview content data that the interviewee is interested in is predicted, i.e. the text between the determined prompt starting point and prompt terminating point.

In some embodiments, as shown in Fig. 6, determining the prompt starting point and the prompt terminating point in the interview content data according to the content vector and the question feature vector in step S160 includes steps S161 to S163.
Step S161: weight and sum the question feature subvectors in the question feature vector to obtain a question vector.

Illustratively, the question vector is q = Σ_j β_j h_Q^j; the question vector describes the overall information of the interview question data.

Illustratively, the weights β_j can be obtained by learning, or can be set to empirical values, for example all set to 1; alternatively, according to the order in which the interviewee asked questions, the weights of the question feature subvectors corresponding to successive questions decrease in turn. Illustratively, the question vector q is a 1×d vector.

Since the i-th subvector in the attention feature vector ĥ_C corresponds to the i-th word segment in the interview content data, the probability of each word segment being the starting word of the question-prompt section, i.e. the prompt starting point, and the probability of each word segment being the terminating word of that section, i.e. the prompt terminating point, can subsequently be calculated from each word segment's content subvector c_i with attention information and the question vector q, thereby determining from the interview content data the prompt section used to prompt the interviewee.
Step S162: calculate, according to the content subvector corresponding to each content feature subvector and the question vector, the probability that the word segment corresponding to that content feature subvector in the interview content data is the prompt starting point.

Illustratively, calculating the probability that a word segment in the interview content data is the prompt starting point is realized by the following formula:

P_i^S ∝ exp(q W_S c_i^T)

where q denotes the question vector, c_i denotes the content subvector corresponding to the i-th content feature subvector h_C^i, P_i^S denotes the probability that the word segment corresponding to c_i is the prompt starting point, and W_S denotes a weight matrix.

Illustratively, W_S is a d×2d weight matrix (parameter matrix).
The probability that each word segment in the interview content data is the prompt starting point is evaluated by calculating the similarity between that word segment's content subvector c_i and the question vector q. Illustratively, the word segment with the highest similarity has the largest probability of being the prompt starting point, and can serve as the prompt starting point, i.e. the starting word of the question-prompt section.

Since the content subvector c_i corresponding to the i-th word segment in the interview content data carries the interviewee's attention information, a more accurate similarity is obtained when the inner product with the question vector q is taken, and the prompt starting point is thus determined more accurately.
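Steps S161 and S162 can be sketched as follows; the equal weights β_j, the random test dimensions, and the softmax normalization of P_i^S ∝ exp(q W_S c_i^T) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d = 2, 5, 4                       # illustrative sizes
h_Q = rng.normal(size=(n, d))           # question feature subvectors
beta = np.full(n, 1.0 / n)              # equal weights, an empirical choice
q = beta @ h_Q                          # question vector, (d,)
c = rng.normal(size=(m, 2 * d))         # content subvectors with attention info, (m, 2d)
W_S = rng.normal(size=(d, 2 * d))       # weight matrix, (d, 2d)
logits = c @ (q @ W_S)                  # q W_S c_i^T for every word segment i
P_S = np.exp(logits - logits.max())
P_S /= P_S.sum()                        # P_i^S ∝ exp(q W_S c_i^T)
start = int(P_S.argmax())               # word segment with the largest start probability
```

The argmax over P_S is the starting word selection later formalized in step S164.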
Step S163: based on a gate recurrent unit neural network, calculate the probability that the word segment corresponding to each content feature subvector in the interview content data is the prompt terminating point, according to the probability that the word segment corresponding to the content feature subvector is the prompt starting point, the content subvector corresponding to the content feature subvector, and the question vector.

Illustratively, calculating the probability that a word segment in the interview content data is the prompt terminating point is realized according to the following formulas:

t_Q = GRU(q, {P_i^S c_i})

P_i^E ∝ exp(t_Q W_E c_i^T)

where GRU(·) denotes the processing of the gate recurrent unit neural network, t_Q denotes the fusion vector output by the gate recurrent unit neural network, P_i^E denotes the probability that the word segment corresponding to c_i is the prompt terminating point, and W_E denotes a weight matrix.

Illustratively, W_E is a d×d weight matrix (parameter matrix).
A gate recurrent unit (Gate Recurrent Unit, GRU) neural network is a kind of recurrent neural network (Recurrent Neural Network, RNN); it was proposed to address problems such as long-term memory and gradients in back-propagation, and is generally used for tasks with sequential memory requirements.

In the present embodiment, the fusion vector t_Q is first calculated based on the GRU neural network, so that the probability of each word segment in the interview content data being the prompt starting point can subsequently be fused into the calculation of the probability of each word segment being the prompt terminating point. For one piece of interview content data and its interview question data, only a single fusion vector t_Q needs to be calculated.
In the present embodiment, the question vector q is input into the first GRU unit of the GRU neural network as the hidden state h; the content subvector c_i corresponding to each word segment in the interview content data, multiplied by the probability P_i^S that the word segment is the prompt starting point, serves as the input x of each GRU unit in the GRU neural network. Through the successive GRU units of the GRU neural network, the content subvector c_i corresponding to each word segment, the probability P_i^S of each word segment being the prompt starting point, and the question vector q are fused together; the output of the last GRU unit in the GRU neural network is the fusion vector t_Q, a 1×d vector.

Illustratively, P_1^S c_1 serves as the input of the first GRU unit, P_2^S c_2 as the input of the second GRU unit, and so on.

The parameters to be learned in the gate recurrent unit neural network, i.e. the GRU network, can be obtained by error back-propagation of a loss function. Illustratively, the loss function is the cross-entropy loss function, which calculates the difference between the probability values of the start and end labels output by the GRU network and the annotated probability values.
The probability that each word segment in the interview content data is the prompt terminating point is evaluated by calculating the similarity between that word segment's content subvector c_i and the fusion vector t_Q. Illustratively, the word segment with the highest similarity has the largest probability of being the prompt terminating point, and can serve as the prompt terminating point, i.e. the terminating word of the question-prompt section.

Since the content subvector c_i corresponding to the i-th word segment in the interview content data carries attention information, a more accurate similarity is obtained when the inner product with the fusion vector t_Q is taken, and the position of the terminating word is thus determined more accurately.
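Step S163 can be sketched with a hand-rolled GRU cell; note that the gate parameterization, the random dimensions, and the d×2d shape chosen here for W_E (so the inner product with the 2d-dimensional c_i closes; the text above states d×d) are all illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x, W, U, b):
    """One GRU step. h: (d,) hidden state; x: (2d,) input.
    W: input weights (2d, d) per gate; U: hidden weights (d, d); b: biases (d,)."""
    z = sigmoid(x @ W['z'] + h @ U['z'] + b['z'])            # update gate
    r = sigmoid(x @ W['r'] + h @ U['r'] + b['r'])            # reset gate
    h_tilde = np.tanh(x @ W['h'] + (r * h) @ U['h'] + b['h'])
    return (1.0 - z) * h + z * h_tilde

rng = np.random.default_rng(2)
m, d = 5, 4
q = rng.normal(size=d)                    # question vector as the initial hidden state
c = rng.normal(size=(m, 2 * d))           # content subvectors with attention info
P_S = rng.random(m); P_S /= P_S.sum()     # start-point probabilities
W = {g: rng.normal(size=(2 * d, d)) for g in 'zrh'}
U = {g: rng.normal(size=(d, d)) for g in 'zrh'}
b = {g: np.zeros(d) for g in 'zrh'}
h = q
for i in range(m):
    h = gru_step(h, P_S[i] * c[i], W, U, b)  # input of unit i is P_i^S * c_i
t_Q = h                                      # fusion vector: output of the last unit
W_E = rng.normal(size=(d, 2 * d))            # end-point weight matrix (assumed d×2d)
logits = c @ (t_Q @ W_E)
P_E = np.exp(logits - logits.max())
P_E /= P_E.sum()                             # P_i^E ∝ exp(t_Q W_E c_i^T)
```

The loop makes the fusion explicit: every start-point probability scales its content subvector before it enters the recurrence, so t_Q mixes all three ingredients named in the text.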
Step S164: determine the word segment with the largest prompt-starting-point probability as the prompt starting point, and determine the word segment with the largest prompt-terminating-point probability as the prompt terminating point.
In the present embodiment, after the prompt starting point and the prompt terminating point are determined in step S160, prompt information is output according to the text between the prompt starting point and the prompt terminating point, so as to prompt the interviewee to ask a question.

Illustratively, the text between the word segments serving as the prompt starting point and the prompt terminating point in the interview content data can be output directly as the prompt information; alternatively, prompt information can be generated from that text and then output, so as to prompt the interviewee to ask a question.

Illustratively, the text between the positions, in the interview session data and interview corpus data acquired in step S110, that correspond to the prompt starting point and prompt terminating point word segments in the interview content data can also be output as the prompt information, so as to prompt the interviewee to ask a question.

Illustratively, if the text between the prompt starting point and the prompt terminating point reads "the main business is to provide diversified financial services and products, with the insurance business as the core", this sentence can be output to prompt the interviewee to ask a question; the interviewee can then expand on this prompt content to learn about the interviewing company, for example by asking the interviewer "which insurance products does the company's insurance business specifically include", and so on.
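Step S164 and the subsequent output reduce to two argmax operations and a slice; the token list and probabilities below are made-up illustrative values:

```python
# Given per-word-segment start/end probabilities, output the prompt span.
tokens = ["the", "main", "business", "is", "diversified", "financial", "services"]
P_S = [0.05, 0.60, 0.10, 0.05, 0.05, 0.05, 0.10]   # start-point probabilities
P_E = [0.02, 0.03, 0.05, 0.10, 0.10, 0.10, 0.60]   # end-point probabilities
start = max(range(len(P_S)), key=P_S.__getitem__)  # step S164: argmax start
end = max(range(len(P_E)), key=P_E.__getitem__)    # step S164: argmax end
prompt = " ".join(tokens[start:end + 1])           # text between the two points
```

The joined slice is what would be shown to the interviewee as prompt information.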
In the interview question prompting method provided by the above embodiment, a content feature vector is extracted by the BERT model from the interview session data and the preset interview corpus data, and a question feature vector is extracted from the interviewee's interview question data; an attention feature vector indicating the degree of attention of the question feature vector to the content feature vector is then calculated, and the attention information is spliced onto the content feature vector to obtain a content vector carrying the interviewee's attention information; the text used for prompting is then determined in the interview content data according to the content vector and the question feature vector. The content that interests the interviewee is thus predicted from the session content of the interview process, and a prompt is given for the interviewee's questions; since the prediction process fuses the interviewee's attention information, the prediction is more accurate.
Referring to Fig. 7, Fig. 7 is a structural schematic diagram of an interview question prompting apparatus provided by an embodiment of the present application; the interview question prompting apparatus can be configured in a server or a terminal, and is used to execute the aforementioned interview question prompting method.

As shown in Fig. 7, the interview question prompting apparatus includes: a content acquisition module 110, a question acquisition module 120, a feature vector computation module 130, an attention computation module 140, an attention splicing module 150, and a prompt section determination module 160.

The content acquisition module 110 is used to acquire interview session data and preset interview corpus data, and to process the interview session data and the interview corpus data into interview content data.
Illustratively, as shown in Fig. 8, the content acquisition module 110 includes:

a content word-segmentation submodule 111, for performing word-segmentation processing on the interview session data and the interview corpus data;

a content embedding submodule 112, for performing embedding processing on the word-segment information, paragraph information, and position information of the word-segmented interview session data and interview corpus data;

a content data computation submodule 113, for adding the embedding results of the word-segment information, paragraph information, and position information to obtain the interview content data.
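The embedding addition performed by submodules 112 and 113 follows the BERT-style input construction (token, segment, and position embeddings added elementwise); the table sizes, random initialization, and variable names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
vocab_size, max_len, d = 100, 16, 8         # illustrative sizes
tok_emb = rng.normal(size=(vocab_size, d))  # word-segment (token) embedding table
seg_emb = rng.normal(size=(2, d))           # paragraph/segment embedding table
pos_emb = rng.normal(size=(max_len, d))     # position embedding table
token_ids = np.array([5, 17, 42])           # word-segmented input
seg_ids = np.array([0, 0, 1])               # paragraph information per token
# The three embedding results are added elementwise to form the model input:
x = tok_emb[token_ids] + seg_emb[seg_ids] + pos_emb[np.arange(len(token_ids))]
```

The same construction applies to the question embedding submodules 122 and 123 below.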
The question acquisition module 120 is used to extract from the interview session data the interview question text asked by the interviewee, and to process the interview question text into interview question data.

Illustratively, as shown in Fig. 8, the question acquisition module 120 includes:

a question word-segmentation submodule 121, for performing word-segmentation processing on the interview question text;

a question embedding submodule 122, for performing embedding processing on the word-segment information, paragraph information, and position information of the word-segmented interview question text;

a question data computation submodule 123, for adding the embedding results of the word-segment information, paragraph information, and position information to obtain the interview question data.
The feature vector computation module 130 is used, based on the BERT model, to calculate from the interview content data a content feature vector describing the interview content data, and to calculate from the interview question data a question feature vector describing the interview question text.

The attention computation module 140 is used to calculate an attention feature vector according to the content feature vector and the question feature vector.

The attention feature vector indicates the degree of attention of the question feature vector to the content feature vector.

Illustratively, the content feature vector includes a number of content feature subvectors, and the question feature vector includes a number of question feature subvectors.
Illustratively, as shown in Fig. 8, the attention computation module 140 includes: an attention value computation submodule 141, an attention feature computation submodule 142, and an attention vector combination submodule 143.

The attention value computation submodule 141 is used to calculate the attention value between each content feature subvector and each question feature subvector.

Illustratively, the attention value computation submodule 141 calculates the attention value between each content feature subvector and each question feature subvector according to the following formulas:

S_ij = SeLU(h_C^i U) · D · SeLU(h_Q^j U)^T

α_ij ∝ exp(S_ij)

where h_C^i denotes the i-th content feature subvector in the content feature vector, h_Q^j denotes the j-th question feature subvector in the question feature vector, S_ij denotes the attention index between h_C^i and h_Q^j, U denotes a weight matrix with U ∈ R^{d×k}, D denotes a diagonal matrix with D ∈ R^{k×k}, and SeLU(·) denotes the activation function; α_ij denotes the attention value between h_C^i and h_Q^j, and ∝ means "is proportional to".
The attention feature computation submodule 142 is used to calculate the attention feature subvector of the question feature vector over each content feature subvector according to the attention values between that content feature subvector and all question feature subvectors in the question feature vector.

Illustratively, the attention feature computation submodule 142 calculates the attention feature subvector of the question feature vector over each content feature subvector according to the following formula:

ĥ_C^i = Σ_j α_ij h_Q^j

where ĥ_C^i denotes the attention feature subvector of the question feature vector over the i-th content feature subvector in the content feature vector.

The attention vector combination submodule 143 is used to combine the attention feature subvectors corresponding to all content feature subvectors in the content feature vector to obtain the attention feature vector.
The attention splicing module 150 is used to splice the attention feature vector and the content feature vector to obtain the content vector with attention information.

Illustratively, as shown in Fig. 8, the attention splicing module 150 includes:

a vector splicing submodule 151, for splicing the attention feature subvector corresponding to each content feature subvector in the content feature vector onto that content feature subvector, to obtain, for each content feature subvector, a corresponding content subvector with attention information;

a content vector combination submodule 152, for combining the content subvectors corresponding to all content feature subvectors in the content feature vector to obtain the content vector with attention information.
The prompt section determination module 160 is used to determine the prompt starting point and the prompt terminating point in the interview content data according to the content vector and the question feature vector, and to output prompt information according to the text between the prompt starting point and the prompt terminating point, so as to prompt the interviewee to ask a question.

Illustratively, as shown in Fig. 8, the prompt section determination module 160 includes:

a weighted summation submodule 161, for weighting and summing the question feature subvectors in the question feature vector to obtain the question vector;

a starting point judging submodule 162, for calculating, according to the content subvector corresponding to each content feature subvector and the question vector, the probability that the word segment corresponding to that content feature subvector in the interview content data is the prompt starting point:

P_i^S ∝ exp(q W_S c_i^T)

where q denotes the question vector, c_i denotes the content subvector corresponding to the i-th content feature subvector h_C^i, P_i^S denotes the probability that the word segment corresponding to c_i is the prompt starting point, and W_S denotes a weight matrix;

a terminating point judging submodule 163, for calculating, based on the gate recurrent unit neural network, the probability that the word segment corresponding to each content feature subvector in the interview content data is the prompt terminating point, according to the probability that the word segment corresponding to the content feature subvector is the prompt starting point, the content subvector corresponding to the content feature subvector, and the question vector:

t_Q = GRU(q, {P_i^S c_i})

P_i^E ∝ exp(t_Q W_E c_i^T)

where GRU(·) denotes the processing of the gate recurrent unit neural network, t_Q denotes the fusion vector output by the gate recurrent unit neural network, P_i^E denotes the probability that the word segment corresponding to c_i is the prompt terminating point, and W_E denotes a weight matrix;

a word segment determination submodule 164, for determining the word segment with the largest prompt-starting-point probability as the prompt starting point, and determining the word segment with the largest prompt-terminating-point probability as the prompt terminating point.
It should be noted that, as is clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the apparatus, modules, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The methods and apparatus of the present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.

Illustratively, the above method and apparatus can be implemented in the form of a computer program that can be run on a computer device as shown in Fig. 9.
Referring to Fig. 9, Fig. 9 is a structural schematic diagram of a computer device provided by an embodiment of the present application. The computer device can be a server or a terminal.

Referring to Fig. 9, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory can include a non-volatile storage medium and an internal memory.

The non-volatile storage medium can store an operating system and a computer program. The computer program includes program instructions which, when executed, can cause the processor to execute any one of the interview question prompting methods.

The processor is used to provide computing and control capability, and supports the operation of the entire computer device.

The internal memory provides an environment for the running of the computer program in the non-volatile storage medium; when the computer program is executed by the processor, the processor can be caused to execute any one of the interview question prompting methods.

The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure of the computer device is merely a block diagram of the part of the structure relevant to the solution of the present application, and does not limit the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.

It should be understood that the processor can be a central processing unit (Central Processing Unit, CPU), and can also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor can be a microprocessor, or the processor can be any conventional processor, etc.
In one embodiment, the processor is used to run a computer program stored in the memory to implement the following steps: acquiring interview session data and preset interview corpus data, and processing the interview session data and the interview corpus data into interview content data; extracting from the interview session data the interview question text asked by the interviewee, and processing the interview question text into interview question data; based on the BERT model, calculating from the interview content data a content feature vector describing the interview content data, and calculating from the interview question data a question feature vector describing the interview question text; calculating an attention feature vector according to the content feature vector and the question feature vector, the attention feature vector indicating the degree of attention of the question feature vector to the content feature vector; splicing the attention feature vector and the content feature vector to obtain a content vector with attention information; determining a prompt starting point and a prompt terminating point in the interview content data according to the content vector and the question feature vector, and outputting prompt information according to the text between the prompt starting point and the prompt terminating point, so as to prompt the interviewee to ask a question.

Illustratively, when implementing the processing of the interview session data and the interview corpus data into interview content data, the processor is used to implement: performing word-segmentation processing on the interview session data and the interview corpus data; performing embedding processing on the word-segment information, paragraph information, and position information of the word-segmented interview session data and interview corpus data; and adding the embedding results of the word-segment information, paragraph information, and position information to obtain the interview content data.

Illustratively, when implementing the processing of the interview question text into interview question data, the processor is used to implement: performing word-segmentation processing on the interview question text; performing embedding processing on the word-segment information, paragraph information, and position information of the word-segmented interview question text; and adding the embedding results of the word-segment information, paragraph information, and position information to obtain the interview question data.

Illustratively, the content feature vector includes a number of content feature subvectors, and the question feature vector includes a number of question feature subvectors.

Illustratively, when implementing the calculation of the attention feature vector according to the content feature vector and the question feature vector, the processor is used to implement: calculating the attention value between each content feature subvector and each question feature subvector; calculating the attention feature subvector of the question feature vector over each content feature subvector according to the attention values between that content feature subvector and all question feature subvectors in the question feature vector; and combining the attention feature subvectors corresponding to all content feature subvectors in the content feature vector to obtain the attention feature vector.
Illustratively, when implementing the calculation of the attention value between each content feature subvector and each question feature subvector, the processor performs the calculation according to the following formulas:

S_ij = SeLU(h_C^i U) · D · SeLU(h_Q^j U)^T

α_ij ∝ exp(S_ij)

where h_C^i denotes the i-th content feature subvector in the content feature vector, h_Q^j denotes the j-th question feature subvector in the question feature vector, S_ij denotes the attention index between h_C^i and h_Q^j, U denotes a weight matrix with U ∈ R^{d×k}, D denotes a diagonal matrix with D ∈ R^{k×k}, and SeLU(·) denotes the activation function; α_ij denotes the attention value between h_C^i and h_Q^j, and ∝ means "is proportional to".

Illustratively, when implementing the calculation of the attention feature subvector of the question feature vector over each content feature subvector according to the attention values between that content feature subvector and all question feature subvectors in the question feature vector, the processor performs the calculation according to the following formula:

ĥ_C^i = Σ_j α_ij h_Q^j

where ĥ_C^i denotes the attention feature subvector of the question feature vector over the i-th content feature subvector in the content feature vector.
Illustratively, the processor is spelled for realizing by the attention feature vector and the content feature vector Connect, obtain with pay attention to force information content vector when, for realizing: by content characteristic each in the content feature vector to The splicing corresponding with the content characteristic subvector of corresponding attention feature subvector is measured, each content characteristic subvector is obtained The corresponding content subvector with attention force information;It is corresponding according to all the elements feature subvector in the content feature vector Content subvector, combination obtain with pay attention to force information content vector.
Illustratively, when determining the prompt start point and the prompt end point in the interview content data according to the content vector and the question feature vector, the processor is configured to:

compute a weighted sum of the question feature subvectors in the question feature vector to obtain a question vector;

compute, from the content subvector corresponding to each content feature subvector and the question vector, the probability that the word in the interview content data corresponding to that content feature subvector is the prompt start point:

P_i^S ∝ exp(q^T·W^S·m_i)

wherein q denotes the question vector, m_i denotes the content subvector corresponding to the i-th content feature subvector c_i, P_i^S denotes the probability that the word corresponding to m_i is the prompt start point, and W^S denotes a weight matrix;

based on a gated recurrent unit (GRU) neural network, compute, from the probability that the word corresponding to each content feature subvector is the prompt start point, the content subvector corresponding to the content feature subvector, and the question vector, the probability that the word in the interview content data corresponding to that content feature subvector is the prompt end point:

t^Q = GRU(q, Σ_i P_i^S·m_i)

P_i^E ∝ exp((t^Q)^T·W^E·m_i)

wherein GRU(·) denotes the processing of the gated recurrent unit neural network, t^Q denotes the fusion vector output by the GRU network, P_i^E denotes the probability that the word corresponding to m_i is the prompt end point, and W^E denotes a weight matrix.
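A minimal sketch of the start/end-point step, under the assumption that the probabilities take the bilinear softmax form described in the text and that GRU(·) is a standard gated recurrent unit cell; the weight matrices, their orientation, and all dimensions are hypothetical:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """Standard GRU cell: input x fused with previous hidden state h."""
    z = 1.0 / (1.0 + np.exp(-(Wz @ x + Uz @ h)))  # update gate
    r = 1.0 / (1.0 + np.exp(-(Wr @ x + Ur @ h)))  # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))      # candidate state
    return (1.0 - z) * h + z * h_tilde

def pointer_probs(M, q, Ws, We, gru_weights):
    """M: (n, d2) content subvectors with attention information; q: (d2,) question
    vector. Returns start and end probability distributions over the n words."""
    p_start = softmax(M @ Ws @ q)      # bilinear score between q and each m_i
    fused = M.T @ p_start              # sum_i P_i^S * m_i, start-weighted summary
    tQ = gru_cell(fused, q, *gru_weights)  # fusion vector t^Q
    p_end = softmax(M @ We @ tQ)       # bilinear score between t^Q and each m_i
    return p_start, p_end
```

The word with the highest start probability and the word with the highest end probability would then delimit the text span output as the prompt.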
As can be seen from the description of the embodiments above, those skilled in the art will clearly understand that the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence, or the part thereof contributing over the prior art, can be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, magnetic disk, or optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present application, for example:

A computer-readable storage medium storing a computer program, the computer program including program instructions; a processor executes the program instructions to implement any one of the interview question prompting methods provided by the embodiments of the present application.

The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as a hard disk or memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.

The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person familiar with the technical field can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and such modifications or substitutions shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An interview question prompting method, characterized by comprising:
acquiring interview session data and preset interview corpus data, and processing the interview session data and the interview corpus data into interview content data;
extracting, from the interview session data, the interview question text asked by the interviewer, and processing the interview question text into interview question data;
based on a BERT model, computing, from the interview content data, a content feature vector for describing the interview content data, and computing, from the interview question data, a question feature vector for describing the interview question text;
computing an attention feature vector from the content feature vector and the question feature vector, the attention feature vector representing the degree of attention of the question feature vector to the content feature vector;
splicing the attention feature vector and the content feature vector to obtain a content vector carrying attention information;
determining a prompt start point and a prompt end point in the interview content data according to the content vector and the question feature vector, and outputting prompt information according to the text between the prompt start point and the prompt end point, so as to prompt the interviewer to ask questions.
2. The interview question prompting method according to claim 1, characterized in that processing the interview session data and the interview corpus data into interview content data comprises:
performing word segmentation on the interview session data and the interview corpus data;
performing embedding processing on the word segmentation information, paragraph information, and position information of the word-segmented interview session data and interview corpus data;
adding the embedding results of the word segmentation information, paragraph information, and position information to obtain the interview content data;
and processing the interview question text into interview question data comprises:
performing word segmentation on the interview question text;
performing embedding processing on the word segmentation information, paragraph information, and position information of the word-segmented interview question text;
adding the embedding results of the word segmentation information, paragraph information, and position information to obtain the interview question data.
3. The interview question prompting method according to claim 2, characterized in that the content feature vector comprises a plurality of content feature subvectors, and the question feature vector comprises a plurality of question feature subvectors;
computing the attention feature vector from the content feature vector and the question feature vector comprises:
computing an attention value between each content feature subvector and each question feature subvector;
computing, from the attention values between each content feature subvector and all of the question feature subvectors in the question feature vector, the attention feature subvector of the question feature vector with respect to that content feature subvector;
combining the attention feature subvectors corresponding to all of the content feature subvectors in the content feature vector to obtain the attention feature vector.
4. The interview question prompting method according to claim 3, characterized in that the attention value between each content feature subvector and each question feature subvector is calculated according to the following formulas:

S_ij = SeLU(c_i·U)·D·SeLU(q_j·U)^T

α_ij ∝ exp(S_ij)

wherein c_i denotes the i-th content feature subvector in the content feature vector, q_j denotes the j-th question feature subvector in the question feature vector, S_ij denotes the attention index between c_i and q_j, U denotes a weight matrix with U ∈ R^(d×k), D denotes a diagonal matrix with D ∈ R^(k×k), SeLU(·) denotes the activation function, α_ij denotes the attention value between c_i and q_j, and ∝ denotes "is proportional to".
5. The interview question prompting method according to claim 4, characterized in that the attention feature subvector of the question feature vector with respect to the content feature subvector is computed, from the attention values between each content feature subvector and all of the question feature subvectors in the question feature vector, according to the following formula:

a_i = Σ_j α_ij·q_j

wherein a_i denotes the attention feature subvector of the question feature vector with respect to the i-th content feature subvector c_i in the content feature vector.
6. The interview question prompting method according to claim 5, characterized in that splicing the attention feature vector and the content feature vector to obtain the content vector carrying attention information comprises:
concatenating each content feature subvector in the content feature vector with its corresponding attention feature subvector to obtain, for each content feature subvector, a corresponding content subvector carrying attention information;
combining the content subvectors corresponding to all of the content feature subvectors in the content feature vector to obtain the content vector carrying attention information.
7. The interview question prompting method according to claim 6, characterized in that determining the prompt start point and the prompt end point in the interview content data according to the content vector and the question feature vector comprises:
computing a weighted sum of the question feature subvectors in the question feature vector to obtain a question vector;
computing, from the content subvector corresponding to each content feature subvector and the question vector, the probability that the word in the interview content data corresponding to that content feature subvector is the prompt start point:

P_i^S ∝ exp(q^T·W^S·m_i)

wherein q denotes the question vector, m_i denotes the content subvector corresponding to the i-th content feature subvector c_i, P_i^S denotes the probability that the word corresponding to m_i is the prompt start point, and W^S denotes a weight matrix;
based on a gated recurrent unit (GRU) neural network, computing, from the probability that the word corresponding to each content feature subvector is the prompt start point, the content subvector corresponding to the content feature subvector, and the question vector, the probability that the word in the interview content data corresponding to that content feature subvector is the prompt end point:

t^Q = GRU(q, Σ_i P_i^S·m_i)

P_i^E ∝ exp((t^Q)^T·W^E·m_i)

wherein GRU(·) denotes the processing of the gated recurrent unit neural network, t^Q denotes the fusion vector output by the GRU network, P_i^E denotes the probability that the word corresponding to m_i is the prompt end point, and W^E denotes a weight matrix;
determining the word with the highest prompt-start-point probability as the prompt start point, and the word with the highest prompt-end-point probability as the prompt end point.
8. An interview question prompting device, characterized by comprising:
a content acquisition module, configured to acquire interview session data and preset interview corpus data, and process the interview session data and the interview corpus data into interview content data;
a question acquisition module, configured to extract, from the interview session data, the interview question text asked by the interviewer, and process the interview question text into interview question data;
a feature vector computation module, configured to, based on a BERT model, compute from the interview content data a content feature vector for describing the interview content data, and compute from the interview question data a question feature vector for describing the interview question text;
an attention computation module, configured to compute an attention feature vector from the content feature vector and the question feature vector, the attention feature vector representing the degree of attention of the question feature vector to the content feature vector;
an attention splicing module, configured to splice the attention feature vector and the content feature vector to obtain a content vector carrying attention information;
a prompt section determination module, configured to determine a prompt start point and a prompt end point in the interview content data according to the content vector and the question feature vector, and output prompt information according to the text between the prompt start point and the prompt end point, so as to prompt the interviewer to ask questions.
9. A computer device, characterized in that the computer device comprises a memory and a processor;
the memory is configured to store a computer program;
the processor is configured to execute the computer program and, when executing the computer program, implement the interview question prompting method according to any one of claims 1-7.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the interview question prompting method according to any one of claims 1-7 is implemented.
CN201910523564.6A 2019-06-17 2019-06-17 Interview question prompting method and device, computer equipment and storage medium Active CN110399472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910523564.6A CN110399472B (en) 2019-06-17 2019-06-17 Interview question prompting method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110399472A true CN110399472A (en) 2019-11-01
CN110399472B CN110399472B (en) 2022-07-15

Family

ID=68323210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910523564.6A Active CN110399472B (en) 2019-06-17 2019-06-17 Interview question prompting method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110399472B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6684188B1 (en) * 1996-02-02 2004-01-27 Geoffrey C Mitchell Method for production of medical records and other technical documents
US20150262130A1 (en) * 2014-03-17 2015-09-17 Hirevue, Inc. Automatic interview question recommendation and analysis
CN105224278A (en) * 2015-08-21 2016-01-06 百度在线网络技术(北京)有限公司 Interactive voice service processing method and device
CN106777013A (en) * 2016-12-07 2017-05-31 科大讯飞股份有限公司 Dialogue management method and apparatus
US20180121785A1 (en) * 2016-11-03 2018-05-03 Nec Laboratories America, Inc. Context-aware attention-based neural network for interactive question answering
CN108306814A (en) * 2017-08-11 2018-07-20 腾讯科技(深圳)有限公司 Information-pushing method, device, terminal based on instant messaging and storage medium
CN109086303A (en) * 2018-06-21 2018-12-25 深圳壹账通智能科技有限公司 The Intelligent dialogue method, apparatus understood, terminal are read based on machine
CN109241251A (en) * 2018-07-27 2019-01-18 众安信息技术服务有限公司 A kind of session interaction method
CN109815318A (en) * 2018-12-24 2019-05-28 平安科技(深圳)有限公司 The problems in question answering system answer querying method, system and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
付芬: "基于E-learning用户行为的学习资源推荐系统", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274362A (en) * 2020-02-01 2020-06-12 武汉大学 Dialogue generation method based on transformer architecture
CN111274362B (en) * 2020-02-01 2021-09-03 武汉大学 Dialogue generation method based on transformer architecture
CN113496122A (en) * 2020-04-08 2021-10-12 中移(上海)信息通信科技有限公司 Named entity identification method, device, equipment and medium
CN111538809A (en) * 2020-04-20 2020-08-14 马上消费金融股份有限公司 Voice service quality detection method, model training method and device
CN111538809B (en) * 2020-04-20 2021-03-16 马上消费金融股份有限公司 Voice service quality detection method, model training method and device
CN111694936A (en) * 2020-04-26 2020-09-22 平安科技(深圳)有限公司 Method and device for identifying AI intelligent interview, computer equipment and storage medium
CN111694936B (en) * 2020-04-26 2023-06-06 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for identification of AI intelligent interview

Also Published As

Publication number Publication date
CN110399472B (en) 2022-07-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant