WO2021051586A1 - Method and apparatus for classifying interview answer text, electronic device, and storage medium (面试回答文本的分类方法及装置、电子设备、存储介质) - Google Patents

Method and apparatus for classifying interview answer text, electronic device, and storage medium (面试回答文本的分类方法及装置、电子设备、存储介质)

Info

Publication number
WO2021051586A1
WO2021051586A1 · PCT/CN2019/118036 · CN2019118036W
Authority
WO
WIPO (PCT)
Prior art keywords
text
interview
length
answer text
answer
Prior art date
Application number
PCT/CN2019/118036
Other languages
English (en)
French (fr)
Inventor
郑立颖
徐亮
金戈
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021051586A1 publication Critical patent/WO2021051586A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/10 Office automation; Time management
    • G06Q 10/105 Human resources
    • G06Q 10/1053 Employment or hiring

Definitions

  • This application relates to the field of artificial intelligence technology, and specifically to a method and apparatus for classifying interview answer text, an electronic device, and a computer-readable storage medium.
  • Typically, the interviewer conducts an interview with the interviewee and then evaluates the interviewee's abilities in various aspects according to the interviewee's answer corpus during the interview.
  • The inventor realized that having the interviewer determine the interviewee's scoring grade on each set ability item from the interviewee's answer corpus is inefficient.
  • The embodiments of this application therefore provide a method and apparatus for classifying interview answer text, an electronic device, and a computer-readable storage medium, to achieve automated interview evaluation.
  • In a first aspect, a method for classifying interview answer text includes:
  • obtaining the interview answer text of the interviewee, the interview answer text being obtained based on the interviewee's replies to interview questions during the interview; and constructing the semantic vector of the interview answer text through the feature extraction layer of a constructed classification model, where the classification model is obtained by training on several sample answer texts and the label data annotated for each sample answer text.
  • The label data indicates the scoring grade marked for the interviewee on a set ability item according to the sample answer text.
  • A full connection is performed on the semantic vector in each fully connected layer of the classification model, correspondingly obtaining a feature vector.
  • The feature vector obtained on a fully connected layer is used to characterize the features of the sample answer text on the set ability item corresponding to that fully connected layer; the classification model includes at least two fully connected layers, each corresponding to one set ability item. Classification prediction is performed on the feature vector obtained in each fully connected layer, respectively obtaining the interviewee's scoring grade on each set ability item.
  • In a second aspect, an apparatus for classifying interview answer text includes: an acquisition module configured to acquire the interview answer text of the interviewee, the interview answer text being obtained based on the interviewee's replies to interview questions during the interview;
  • a semantic vector construction module configured to construct the semantic vector of the interview answer text through the feature extraction layer of the constructed classification model, the classification model being obtained by training on several sample answer texts and the label data annotated for each sample answer text,
  • where the label data indicates the scoring grade marked for the interviewee on a set ability item according to the sample answer text; a fully connected module configured to perform a full connection on the semantic vector in each fully connected layer of the classification model, correspondingly obtaining feature vectors,
  • where the feature vector obtained on a fully connected layer characterizes the features of the sample answer text on the set ability item corresponding to that fully connected layer,
  • and the classification model includes at least two fully connected layers, each corresponding to one set ability item; and a classification prediction module configured to perform classification prediction on the feature vector obtained in each fully connected layer, respectively obtaining the interviewee's scoring grade on each set ability item.
  • In a third aspect, an electronic device includes: a processor; and a memory on which computer-readable instructions are stored.
  • When the computer-readable instructions are executed by the processor, the above method for classifying interview answer text is implemented.
  • In a fourth aspect, a non-volatile computer-readable storage medium has computer-readable instructions stored thereon; when the computer-readable instructions are executed by a processor of a computer, the method for classifying interview answer text described above is implemented.
  • With the technical solution of this application, the interviewee's scoring grade on each set ability item is determined automatically from the interviewee's interview answer text, so that the interviewee's ability on each set ability item is evaluated from the interview answer text; in other words, automated interview evaluation is realized.
  • Since the interviewer is not required to take part in the interview evaluation, the inaccuracy and lack of objectivity that the interviewer's subjective will and personal preferences would otherwise introduce into the grading of the interviewee's ability items can be avoided.
  • Fig. 1 is a block diagram of an exemplary apparatus;
  • Fig. 2 is a flowchart of a method for classifying interview answer text according to an exemplary embodiment;
  • Fig. 3 is a flowchart of step 310 in Fig. 2 in an embodiment;
  • Fig. 4 is a flowchart of step 330 in Fig. 2 in an embodiment;
  • Fig. 5 is a flowchart of the steps before step 351 in Fig. 4 in an embodiment;
  • Fig. 6 is a flowchart, in an embodiment, of the step of determining the text truncation length according to the text length of each sample answer text;
  • Fig. 7 is a flowchart of the steps before step 330 in Fig. 2 in an embodiment;
  • Fig. 8 is a block diagram of an apparatus for classifying interview answer text according to an exemplary embodiment;
  • Fig. 9 is a block diagram of an electronic device according to an exemplary embodiment.
  • Fig. 1 shows a block diagram of an apparatus 200 according to an exemplary embodiment.
  • The apparatus 200 can serve as the execution subject of this application and is used to implement the method for classifying interview answer text of this application.
  • Of course, the method of this application is not limited to being implemented with the apparatus 200 as the execution subject; other electronic devices with processing capabilities can also serve as the execution subject to implement the method for classifying interview answer text of this application.
  • The apparatus 200 is only an example adapted to the present application and cannot be considered to impose any limitation on the scope of use of the present application.
  • Nor can the apparatus be interpreted as needing to rely on, or necessarily include, one or more of the components of the exemplary apparatus 200 shown in Fig. 1.
  • the hardware structure of the device 200 may vary greatly due to differences in configuration or performance.
  • the device 200 includes: a power supply 210, an interface 230, at least one memory 250, and at least one processor 270.
  • the power supply 210 is used to provide working voltage for each hardware device on the apparatus 200.
  • the interface 230 includes at least one wired or wireless network interface 231, at least one serial-to-parallel conversion interface 233, at least one input/output interface 235, at least one USB interface 237, etc., for communicating with external devices.
  • the memory 250 can be a read-only memory, a random access memory, a magnetic disk or an optical disk, etc.
  • the resources stored on it include the operating system 251, application programs 253, and data 255, etc.
  • the storage method can be short-term storage or permanent storage.
  • The operating system 251 is used to manage and control the hardware devices and application programs 253 on the apparatus 200, so that the processor 270 can compute and process the massive data 255. It can be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
  • the application program 253 is a computer program that completes at least one specific task based on the operating system 251.
  • the processor 270 may include one or more processors, and is configured to communicate with the memory 250 through a bus, and is used for computing and processing the massive data 255 in the memory 250.
  • the device 200 applicable to the present application will use the processor 270 to read a series of computer-readable instructions stored in the memory 250 to complete the method of classifying interview answer texts.
  • the application can also be implemented by hardware circuits or hardware circuits in combination with software. Therefore, implementation of the application is not limited to any specific hardware circuits, software, and combinations of both.
  • Fig. 2 is a flowchart showing a method for categorizing interview answer text according to an exemplary embodiment.
  • the method may be executed by the apparatus 200 shown in Fig. 1, or may be executed by other electronic devices with processing capabilities. There is no specific limitation. As shown in Figure 2, the method at least includes the following steps:
  • Step 310: Obtain the interview answer text of the interviewee.
  • The interview answer text is obtained based on the interviewee's replies to the interview questions during the interview. In an interview, the interviewee answers the interview questions, and the content of each answer is the reply to the corresponding interview question.
  • The interview answer text is the textual expression of the reply to an interview question. For example, if the interviewee answers the interview question in text, the reply itself is the interview answer text; if the interviewee answers by voice, the text obtained by performing speech recognition on the reply is the interview answer text.
  • In a specific embodiment, the interviewee is interviewed through an intelligent interview system.
  • In the intelligent interview system, a number of questions are set in advance for the interviewee to be interviewed, for example based on the interviewee's resume and other materials. When the interviewee is interviewed, questions are asked according to the set questions, the interviewee's replies to the questions are collected, and the interview answer text is thereby obtained.
  • In this embodiment, the intelligent interview system uses the method of this application to classify the interviewee's interview answer text.
  • Step 330 Construct the semantic vector of the interview answer text through the feature extraction layer of the constructed classification model.
  • the classification model is obtained by training several sample answer texts and label data labeled for each sample answer text.
  • The label data indicates the scoring grade marked for the interviewee on a set ability item according to the sample answer text.
  • the semantic vector of the interview answer text is the vector representation of the semantics of the interview answer text.
  • the classification model is constructed through a neural network, and the constructed classification model is used to classify the interview answer text.
  • Such neural networks include, for example, deep feedforward networks, convolutional neural networks (CNN), and recurrent neural networks (RNN); various neural networks can be combined to obtain the classification model used to classify interview answer text.
  • The purpose of classifying the interview answer text is to obtain, from the interview answer text, the interviewee's scoring grade on a set ability item. The classification therefore assigns the interview answer text to one scoring grade on the set ability item, realizing an ability assessment of the interviewee based on the interview answer text.
  • It is understandable that the interviewee's ability is evaluated on a number of set ability items.
  • the classification model of this application is constructed to classify interview answer texts based on multiple set ability items.
  • Set ability items such as learning ability, planning ability, stability, teamwork ability, leadership ability, etc.
  • Of course, in different application scenarios the set ability items to be evaluated for the interviewee may differ; the set ability items to be evaluated can therefore be selected according to actual needs.
  • Optionally, the classification model includes a feature extraction layer, fully connected layers constructed respectively for the set ability items (one set ability item corresponds to one fully connected layer), and output layers (each fully connected layer corresponds to one output layer).
  • The feature extraction layer is used to construct the semantic vector of the interview answer text.
  • Each fully connected layer is used to perform a full connection on the semantic vector for the set ability item corresponding to that fully connected layer, obtaining a feature vector that characterizes the features of the interview answer text on that set ability item.
  • The output layer produces an output according to the feature vector, thereby obtaining the scoring grade on the set ability item.
  • It is worth mentioning that one set ability item corresponds to one output layer; that is,
  • the scoring grade produced by an output layer is the scoring grade on the set ability item corresponding to that output layer.
  • To ensure that the classification model classifies interview answer text accurately, before the interview answer text is classified, model training is performed on a number of sample answer texts and the label data annotated for each sample answer text, obtaining the classification model.
  • As described above, the classification model is used to output the interviewee's scoring grade on each set ability item according to the interview answer text; therefore, the label data used for model training represents the scoring grade of the corresponding sample answer text on each set ability item.
  • Step 350: Perform a full connection on the semantic vector in each fully connected layer of the classification model, correspondingly obtaining a feature vector.
  • The feature vector obtained on a fully connected layer is used to characterize the features of the sample answer text on the set ability item corresponding to that fully connected layer.
  • The classification model includes at least two fully connected layers, and each fully connected layer corresponds to a set ability item.
  • In the classification model, one fully connected layer is constructed for each set ability item.
  • Although the semantic vector of the interview answer text is obtained through the feature extraction layer, the interview answer text needs to be classified on at least two set ability items. The semantic vector represents all the features of the interview answer text, but the degree to which the features of each set ability item are expressed in it differs: the features of some set ability items are obvious, while those of others are not. Therefore, classifying on at least two set ability items using only the semantic vector suffers from low classification accuracy.
  • Thus, to ensure classification accuracy on each set ability item, the features used for classification on a given set ability item need to be further extracted from the semantic vector, activating the features that the interview answer text exhibits on each set ability item. This is achieved by having the fully connected layer corresponding to each set ability item perform a full connection on the semantic vector, correspondingly obtaining a feature vector that characterizes the features of the interview answer text on the set ability item corresponding to that fully connected layer. Since each fully connected layer in the classification model corresponds to one set ability item, classifying the interview answer text on a set ability item uses the feature vector that the corresponding fully connected layer derives from the semantic vector. A minimal model sketch along these lines is given below.
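  • The following is a minimal, hypothetical sketch of such a model in PyTorch: a shared text-CNN-style feature extraction layer followed by one fully connected head per set ability item. The ability item names, vocabulary size, embedding dimension, convolution filter sizes, and number of scoring grades are illustrative assumptions, not values fixed by this application.

```python
import torch
import torch.nn as nn

class MultiHeadClassifier(nn.Module):
    """Shared feature extraction layer + one fully connected head per set ability item."""
    def __init__(self, vocab_size=5000, embed_dim=128, num_filters=64,
                 ability_items=("learning", "planning", "teamwork"), num_grades=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # text-CNN style feature extraction: 1-D convolutions over the word sequence
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in (2, 3, 4)])
        feat_dim = num_filters * 3
        # one fully connected layer (head) per set ability item
        self.heads = nn.ModuleDict(
            {item: nn.Linear(feat_dim, num_grades) for item in ability_items})

    def forward(self, token_ids):                      # token_ids: (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)      # (batch, embed_dim, seq_len)
        feats = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        semantic_vector = torch.cat(feats, dim=1)      # shared semantic representation
        # each head produces per-grade scores for its own set ability item
        return {item: head(semantic_vector) for item, head in self.heads.items()}
```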
  • Step 370: Perform classification prediction on the feature vector obtained in each fully connected layer, respectively obtaining the interviewee's scoring grade on each set ability item.
  • The classification prediction predicts, for the scoring grades set on each set ability item, the probability that the feature vector corresponds to each scoring grade, so that the scoring grade of the interview answer text on that set ability item is determined from the predicted probabilities.
  • For example, on the set ability item of learning ability, four scoring grades are preset: grade A, grade B, grade C, and grade D.
  • Correspondingly, based on the feature vector obtained from the fully connected layer corresponding to learning ability, the probabilities of the interview answer text being classified to grades A, B, C, and D are predicted respectively.
  • For instance, the probability of the interview answer text being classified to grade A is P1,
  • the probability of it being classified to grade B is P2,
  • the probability of it being classified to grade C is P3,
  • and the probability of it being classified to grade D is P4.
  • The predicted probabilities for each scoring grade are then compared; if P1 is the largest, the interview answer text is classified to grade A on the learning ability item, that is, the interviewee's scoring grade on learning ability is A, as in the sketch below.
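  • A minimal sketch of this grade selection for one set ability item; the logit values are invented for illustration.

```python
import torch

grades = ["A", "B", "C", "D"]
# hypothetical per-grade scores from the fully connected layer for "learning ability"
learning_logits = torch.tensor([[2.1, 0.3, -0.5, -1.0]])
probs = torch.softmax(learning_logits, dim=1)        # probabilities P1, P2, P3, P4
predicted_grade = grades[int(probs.argmax(dim=1))]   # grade with the largest probability
print(predicted_grade)                               # -> "A"
```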
  • Thus, through the above steps, the interviewee's scoring grade on each set ability item can be determined from the interviewee's interview answer text, and the interviewee's ability on each set ability item can be evaluated accordingly.
  • This realizes automated interview evaluation and improves the efficiency of interview evaluation.
  • Moreover, since the interviewer does not need to take part in the interview evaluation, inaccurate and non-objective evaluation results caused by the interviewer's subjective will and personal preferences can be avoided.
  • In an embodiment, as shown in Fig. 3, step 310 includes: step 311, collecting the interviewee's reply voice to the interview questions during the interview.
  • In this embodiment, the interviewee is interviewed by voice, and voice is collected during the interview to obtain the interviewee's reply voice to the interview questions.
  • Step 313: Perform speech recognition on the reply voice, obtaining the interview answer text corresponding to the reply voice.
  • The speech recognition recognizes the reply voice as text, thereby obtaining the interview answer text corresponding to the reply voice.
  • In a specific embodiment, an existing speech recognition tool can be invoked directly to perform the speech recognition, for example as sketched below.
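  • As one possible illustration, the open-source SpeechRecognition package can stand in for such a prior-art tool; the file name and the Chinese language code below are assumptions.

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("reply_voice.wav") as source:   # the collected reply voice (file name assumed)
    audio = recognizer.record(source)
# Google Web Speech API used here only as an example recognizer
interview_answer_text = recognizer.recognize_google(audio, language="zh-CN")
print(interview_answer_text)
```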
  • In an embodiment, as shown in Fig. 4, step 330 includes: step 331, segmenting the interview answer text into words through the feature extraction layer of the classification model, obtaining a word sequence composed of several words.
  • Word segmentation is the process of dividing the continuous interview answer text into a word sequence according to certain specifications, so as to obtain a word sequence composed of several individual words.
  • The word segmentation may be based on string matching, on understanding, or on statistics, which is not specifically limited herein.
  • In a specific embodiment, a word segmentation tool such as jieba, SnowNLP, THULAC, or NLPIR can also be called directly.
  • It is worth mentioning that the word segmentation method may differ for different languages. For example, English text can be segmented directly by spaces and punctuation, whereas Chinese text has no spaces between characters, so segmenting by spaces does not work and a word segmentation method suited to Chinese must be used, as in the sketch below.
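  • A minimal sketch of the word segmentation step using the jieba tool named above; the example answer text is invented, and the segmentation shown in the comment is approximate.

```python
import jieba

interview_answer_text = "我在项目中负责团队协作和进度规划"   # invented example reply
word_sequence = jieba.lcut(interview_answer_text)            # word sequence of several words
print(word_sequence)  # e.g. ['我', '在', '项目', '中', '负责', '团队', '协作', '和', '进度', '规划']
```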
  • Step 333: Through the feature extraction layer, construct the semantic vector of the interview answer text according to the code corresponding to each word in the word sequence and the semantic weight corresponding to each word. It is understandable that, in a text, different types of words contribute differently to the semantics of the text.
  • A word's semantic weight is a quantitative expression of the degree to which the word contributes to the semantics of the text in which it appears.
  • In the interview answer text, words of different parts of speech have different semantic weights; for example, among nouns, verbs, and auxiliary words, the semantic weights of nouns and verbs are greater than those of auxiliary words.
  • To classify interview answer text, a semantic dictionary is constructed, which stores the codes of several words and their semantic weights. The feature extraction layer then generates the semantic vector of the interview answer text according to the code and semantic weight, in the semantic dictionary, of each word in the word sequence of the interview answer text.
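  • One way to read this construction is sketched below; the dictionary contents, the 8-dimensional embedding, and the multiply-by-weight scheme are illustrative assumptions rather than the construction specified by this application.

```python
import numpy as np

semantic_dict = {                         # hypothetical semantic dictionary: code + semantic weight
    "团队": {"code": 17, "weight": 1.0},  # noun: high weight
    "协作": {"code": 42, "weight": 1.0},  # verb: high weight
    "的":   {"code": 3,  "weight": 0.2},  # auxiliary word: low weight
}
embedding = np.random.rand(100, 8)        # code -> 8-dim vector (stand-in for learned weights)

def semantic_vector(words, max_len=6):
    """Build a toy semantic vector from per-word codes and semantic weights."""
    vecs = []
    for w in words[:max_len]:
        entry = semantic_dict.get(w)
        if entry is None:
            vecs.append(np.zeros(8))                              # out-of-vocabulary word
        else:
            vecs.append(embedding[entry["code"]] * entry["weight"])
    while len(vecs) < max_len:
        vecs.append(np.zeros(8))                                  # pad up to max_len
    return np.stack(vecs)

print(semantic_vector(["团队", "协作", "的"]).shape)              # (6, 8)
```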
  • In an embodiment, the classification model is constructed with a text-CNN neural network. As shown in Fig. 5, before step 331 the method further includes: step 410, obtaining the text truncation length determined for word segmentation.
  • Step 430: Truncate the interview answer text according to the obtained text truncation length, and use the text retained after truncation as the object of word segmentation.
  • text-CNN is an algorithm that classifies text with a convolutional neural network. Before the text-CNN neural network classifies the interview answer text, the interview answer text needs to be truncated according to the text truncation length set for the text-CNN neural network.
  • The text truncation length limits the length of the text input to the classification model for classification: if the text length exceeds the text truncation length, the text is truncated to that length and the part beyond it is removed, so that the truncated text has exactly the text truncation length. If the text length does not exceed the text truncation length, padding is needed when constructing the semantic vector for the text, i.e., padding characters such as 0 are appended, so that the semantic vector constructed for the text still matches the text truncation length. A sketch of this truncation-and-padding rule follows.
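  • A minimal sketch of this truncation-and-padding rule over a sequence of word codes; the padding value 0 follows the example above.

```python
def truncate_or_pad(token_ids, truncation_length, pad_id=0):
    """Cut a word-code sequence down to, or pad it up to, the text truncation length."""
    if len(token_ids) > truncation_length:
        return token_ids[:truncation_length]                  # remove the part beyond the limit
    return token_ids + [pad_id] * (truncation_length - len(token_ids))  # pad with 0

print(truncate_or_pad([5, 9, 2, 7], truncation_length=6))     # [5, 9, 2, 7, 0, 0]
print(truncate_or_pad([5, 9, 2, 7, 1, 8, 4], truncation_length=6))  # [5, 9, 2, 7, 1, 8]
```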
  • The text truncation length is determined in order to fix the training parameter values of the classification model.
  • A reasonable text truncation length can improve the training efficiency of the classification model while ensuring that the semantic features of the text are fully captured.
  • After the training parameters of the classification model are set according to the text truncation length, the text (that is, the sample answer text or the interview answer text) is truncated according to this text truncation length, both while training the classification model and while using it to classify interview answer text.
  • the length of the text refers to the number of words obtained after the text is segmented.
  • In an embodiment, before step 410 the method further includes: determining the text truncation length according to the text length of each sample answer text.
  • For a classification model constructed with a text-CNN neural network, if the text truncation length is too short, on the one hand not enough information is captured from the interview answer text, which reduces the accuracy of the interview answer text classification.
  • On the other hand, the number of batches becomes too small, making the training path to convergence more random, so the classification accuracy of the model suffers; conversely, if the text truncation length is too long, training of the classification model takes too long, each batch takes a long time to train, and training easily falls into a local optimum.
  • Therefore, to ensure both the training efficiency and the classification accuracy of the classification model, the text truncation length is determined according to the model's actual application scenario, that is, according to the text length of each sample answer text.
  • The text lengths of the sample answer texts characterize, to a certain extent, the range of text lengths of the interview answer texts, so determining the text truncation length from the text length of each sample answer text makes the determined text truncation length
  • fit the actual situation encountered when classifying interview answer text.
  • In an embodiment, as shown in Fig. 6, determining the text truncation length according to the text length of each sample answer text includes: step 510, obtaining the text length of each sample answer text by segmenting each sample answer text into words, the number of words obtained by segmenting a sample answer text being taken as the text length of that sample answer text.
  • Step 530: Calculate the mean text length and the standard deviation of the text length from the text length of each sample answer text.
  • Step 550: Determine the text truncation length according to the mean text length and the standard deviation of the text length.
  • In a specific embodiment, a weighted sum of the mean text length and the standard deviation of the text length, for example simply their sum, is taken as the text truncation length, as in the sketch below.
  • The text truncation length determined from the mean text length and the standard deviation of the text length strikes a balance between fully retaining the information of the sample answer text or interview answer text and improving the training efficiency of the classification model.
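  • A minimal sketch of this computation; equal weights of 1 for the mean and the standard deviation (i.e. their plain sum) are an assumption consistent with the example above.

```python
import statistics

def text_truncation_length(sample_word_counts, mean_weight=1.0, std_weight=1.0):
    """Weighted sum of the mean and standard deviation of the sample text lengths."""
    mean_len = statistics.mean(sample_word_counts)
    std_len = statistics.stdev(sample_word_counts)
    return round(mean_weight * mean_len + std_weight * std_len)

# e.g. sample answer texts of 80, 120, 95, 200 and 60 words after segmentation
print(text_truncation_length([80, 120, 95, 200, 60]))
```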
  • the method further includes:
  • Step 610: Pre-build a neural network model according to the set ability items, the neural network model including one fully connected layer correspondingly constructed for each set ability item.
  • Step 630: Train the neural network model with the several sample answer texts and the label data corresponding to each sample answer text until the loss function of the neural network model converges; this loss (convergence) function is the weighted sum of the cross entropies on the set ability items.
  • Step 650 Use the neural network model when the loss function converges as a classification model.
  • For a set ability item, the scoring grade of a sample answer text or interview answer text on that item is a discrete random variable X with value set C and probability distribution p(x) = P(X = x), x ∈ C, so the information content of the event X = x_0 is I(x_0) = -log(p(x_0)). Since X takes several values, each with probability p(x_i), the cross entropy on a set ability item is the expectation of all the information content on that item, namely $H(p_1) = -\sum_{i=1}^{n} p_1(x_i)\,\log\bigl(p_1(x_i)\bigr)$, where H(p_1) denotes the cross entropy on set ability item p_1, p_1(x_i) denotes the probability that the variable X takes the value x_i, and n denotes the number of values X can take on set ability item p_1.
  • The convergence (loss) function of the neural network model is then the weighted sum of these cross entropies, $L = \sum_{j=1}^{m} w_j\,H(p_j)$, where w_j is the weight of set ability item p_j and m denotes the number of set ability items.
  • The training process of the pre-built neural network model is as follows: the neural network model predicts the scoring grade of each sample answer text on each set ability item; if the predicted scoring grade on a set ability item is inconsistent with the scoring grade on that item in the label data corresponding to the sample answer text, the model parameters of the neural network model are adjusted; if they are consistent, training continues with the next sample answer text. During training, once the loss function converges, training stops, and the neural network model at the time the loss function converges is used as the classification model. A minimal sketch of such a training step follows.
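  • A minimal, hypothetical sketch of one training step, reusing the MultiHeadClassifier from the earlier sketch; the ability item names, the per-item loss weights, and the optimizer settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = MultiHeadClassifier()                              # from the earlier sketch
ability_items = ("learning", "planning", "teamwork")
item_weights = {item: 1.0 for item in ability_items}       # assumed equal weights
loss_fns = {item: nn.CrossEntropyLoss() for item in ability_items}
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(token_ids, labels):
    """token_ids: (batch, seq_len); labels: dict of ability item -> grade indices (batch,)."""
    optimizer.zero_grad()
    logits = model(token_ids)                              # one logit tensor per set ability item
    # loss = weighted sum of the cross entropies on the set ability items
    loss = sum(item_weights[item] * loss_fns[item](logits[item], labels[item])
               for item in ability_items)
    loss.backward()
    optimizer.step()
    return loss.item()
```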
  • The following are apparatus embodiments of this application, which can be used to execute the embodiments of the method for classifying interview answer text executed by the apparatus 200 of this application.
  • For details not disclosed in the apparatus embodiments, please refer to the embodiments of the method for classifying interview answer text of this application.
  • Fig. 8 is a block diagram of a device for classifying interview answer texts according to an exemplary embodiment.
  • The apparatus for classifying interview answer text can be configured in the apparatus 200 of Fig. 1 to execute all or part of the steps of the method for classifying interview answer text shown in any of the above method embodiments.
  • As shown in Fig. 8, the apparatus for classifying interview answer text includes, but is not limited to: an acquisition module 710 configured to acquire the interview answer text of the interviewee, the interview answer text being obtained based on the interviewee's replies to interview questions during the interview.
  • the semantic vector construction module 730 is configured to construct the semantic vector of the interview answer text through the feature extraction layer of the constructed classification model.
  • the classification model is obtained by training a number of sample answer texts and label data labeled for each sample answer text.
  • The label data indicates the scoring grade marked for the interviewee on a set ability item according to the sample answer text.
  • The fully connected module 750 is configured to perform a full connection on the semantic vector in each fully connected layer of the classification model, correspondingly obtaining the feature vectors.
  • The feature vector obtained on a fully connected layer characterizes the features of the sample answer text on the set ability item corresponding to that fully connected layer.
  • The classification model includes at least two fully connected layers, and each fully connected layer corresponds to a set ability item.
  • The classification prediction module 770 is configured to perform classification prediction on the feature vector obtained in each fully connected layer, respectively obtaining the interviewee's scoring grade on each set ability item.
  • modules can be implemented by hardware, software, or a combination of both.
  • these modules may be implemented as one or more hardware modules, such as one or more application specific integrated circuits.
  • these modules may be implemented as one or more computer programs executed on one or more processors, for example, a program stored in the memory 250 executed by the processor 270 in FIG. 1.
  • In an embodiment, the acquisition module 710 includes: a collection unit configured to collect the interviewee's reply voice to the interview questions during the interview.
  • The speech recognition unit is configured to perform speech recognition on the reply voice to obtain the interview answer text corresponding to the reply voice.
  • the semantic vector construction module 730 includes a word segmentation unit configured to segment the interview answer text through the feature extraction layer of the classification model to obtain a word sequence composed of several words.
  • the semantic vector construction unit is configured to construct the semantic vector of the interview answer text according to the code corresponding to each word in the word sequence and the semantic weight corresponding to each word through the feature extraction layer.
  • the classification model is constructed by a text-CNN neural network
  • the classification device further includes: a text truncation length acquisition module configured to acquire a text truncation length determined for word segmentation.
  • the truncation module is configured to truncate the interview answer text according to the acquired text truncation length, and use the text retained by the truncation as the object for word segmentation.
  • the classification device further includes: a text truncation length determining module configured to determine the text truncation length according to the text length of each sample response text.
  • The text truncation length determining module includes: a text length obtaining unit configured to obtain the text length of each sample answer text by segmenting each sample answer text into words, the number of words obtained by segmenting a sample answer text being taken as the text length of that sample answer text.
  • the calculation unit is configured to calculate the average text length and the standard deviation of the text length according to the text length of each sample answer text.
  • the determining unit is configured to determine the text truncation length according to the average text length and the standard deviation of the text length.
  • the classification device further includes: a pre-construction module configured to pre-construct a neural network model according to a number of set capability items, the neural network model including a fully connected layer corresponding to each set capability item .
  • The training module is configured to train the neural network model with several sample answer texts and the label data corresponding to each sample answer text until the loss function of the neural network model converges, the convergence function being the weighted sum of the cross entropies on the set ability items.
  • the classification model obtaining module is configured to use the neural network model when the loss function converges as the classification model.
  • the present application further provides an electronic device, which can execute all or part of the steps of the interview answer text classification method shown in any of the above method embodiments.
  • the electronic device includes: a processor 1001; and a memory 1002.
  • the memory 1002 stores computer readable instructions, and the computer readable instructions are executed by the processor 1001 to implement any one of the above methods.
  • the executable instruction is executed by the processor 1001 to implement the method in any of the above embodiments.
  • the executable instructions are, for example, computer-readable instructions.
  • the processor reads the computer-readable instructions stored in the memory through the communication line/bus 1003 connected to the memory.
  • A non-volatile computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed by a processor, the method in any of the above method embodiments is implemented.
  • The non-volatile computer-readable storage medium includes, for example, the memory 250 storing the computer program; the above instructions can be executed by the processor 270 of the apparatus 200 to implement the method for classifying interview answer text in any of the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and apparatus for classifying interview answer text, including: obtaining the interview answer text of the interviewee, the interview answer text being obtained according to the interviewee's replies to interview questions during the interview (310); constructing the semantic vector of the interview answer text through the feature extraction layer of a constructed classification model, the classification model being obtained by training on several sample answer texts and the label data annotated for each sample answer text, the label data indicating the scoring grade marked for the interviewee on a set ability item according to the sample answer text (330); performing a full connection on the semantic vector in each fully connected layer of the classification model, correspondingly obtaining feature vectors, the feature vector obtained on a fully connected layer being used to characterize the features of the sample answer text on the set ability item corresponding to that fully connected layer, the classification model including at least two fully connected layers, each corresponding to one set ability item (350); and performing classification prediction on the feature vector obtained in each fully connected layer, respectively obtaining the interviewee's scoring grade on each set ability item (370). The method realizes automatic expansion of the dictionary, increases the speed of classifying interview answer text, and realizes automatic interview evaluation of the interviewee.

Description

面试回答文本的分类方法及装置、电子设备、存储介质
本申请要求2019年9月18日递交、发明名称为“面试回答文本的分类方法及装置、电子设备、存储介质”的中国专利申请CN 201910882034.0的优先权,在此通过引用将其全部内容合并于此。
技术领域
本申请涉及人工智能技术领域,具体涉及一种面试回答文本的分类方法及装置、电子设备、计算机可读存储介质。
背景技术
对于面试而言,需要根据面试者对提问所作的回复来评价面试者在多个设定能力项上的能力,即分别确定面试者在每一设定能力项上的评分等级。
一般是由面试官对面试者进行面试,然后由面试官根据面试者在面试过程中的回答语料对面试者的各方面能力进行评估。发明人意识到:由于通过面试官根据面试者回答语料确定面试者在各设定能力项上的评分等级,存在效率低的问题。
由上可知,需要一种方法来自动对面试者进行评估,而不依赖于面试官对面试者进行评估,从而提高面试评估的效率。
发明内容
为了解决现有技术中因面试官进行面试评估所造成面试评估效率低的问题,本申请的实施例提供了一种面试回答文本的分类方法及装置、电子设备、计算机可读存储介质,以实现自动进行面试评估。
第一方面,一种面试回答文本的分类方法,所述方法包括:
获取面试者的面试回答文本,所述面试回答文本是根据所述面试者在面试中对面试提问的回复获得的;通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量,所述分类模型是通过若干样本回答文本和为每一样本回答文本所标注的标签数据进行训练获得的,所述标签数据指示了根据所述样本回答文本为所述面试者所标注在设定能力项上的评分等级;通过所述分类模型的每一全连接层分别根据所述语义向量进行全连接,对应获得特征向量,在所述全连接层上所获得的所述特征向量用于表征所述样本回答文本在所述全连接层所对应设定能力项上的特征,所述分类模型包括至少两个全连接层,每一全连接层对应一设定能力项;对在每一全连接层所获得的特征向量进行分类 预测,分别获得所述面试者在各设定能力项上的评分等级。
第二方面,一种面试回答文本的分类装置,所述分类装置包括:获取模块,配置为获取为面试者的面试回答文本,所述面试回答文本是根据所述面试者在面试中对面试提问的回复获得的;语义向量构建模块,配置为通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量,所述分类模型是通过若干样本回答文本和为每一样本回答文本所标注的标签数据进行训练获得的,所述标签数据指示了根据所述样本回答文本为所述面试者所标注在设定能力项上的评分等级;全连接模块,配置为通过所述分类模型的每一全连接层分别根据所述语义向量进行全连接,对应获得特征向量,在所述全连接层上所获得的所述特征向量配置为表征所述样本回答文本在所述全连接层所对应设定能力项上的特征,所述分类模型包括至少两个全连接层,每一全连接层对应一设定能力项;分类预测模块,配置为对在每一全连接层所获得的特征向量进行分类预测,分别获得所述面试者在各设定能力项上的评分等级。
第三方面,一种电子设备,包括:处理器;及存储器,所述存储器上存储有计算机可读指令,所述计算机可读指令被所述处理器执行时实现如上所述的面试回答文本的分类方法。
第四方面,一种计算机非易性可读存储介质,其上存储有计算机可读指令,当所述计算机可读指令被计算机的处理器执行时,实现如上所述的面试回答文本的分类方法。
通过本申请的技术方案,根据面试者的面试回答文本自动确定面试者在各个设定能力项的评分等级,实现根据面试者的面试回答文本评估面试者在各设定能力项上的能力,换言之,实现了自动进行面试评价。而不需要依赖面试官根据对面试者的面试情况对面试者在各个能力项上进行评估,大幅提高了面试评估的效率。而且,由于不需要面试官参与到面试评价中,从而可以避免因面试官的主观意志和个人喜好所导致面试官对面试者在各个能力项上所作出的评分等级不准确客观的问题。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本申请。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本申请的实施例,并与说明书一起用于解释本申请的原理。
图1是示例性示出的一种装置的框图;
图2是根据一示例性实施例示出的一种面试回答文本的分类方法的流程图;
图3是图2中步骤310在一实施例中的流程图;
图4是图2中步骤330在一实施例中的流程图;
图5是图4中步骤351之前步骤在一实施例中的流程图;
图6是根据各所述样本回答文本的文本长度确定所述文本截断长度的步骤在一实施例中的流程图;
图7是图2中步骤330之前步骤在一实施例中的流程图;
图8是根据一示例性实施例示出的一种面试回答文本的分类装置的框图;
图9是根据一示例性实施例示出的一种电子设备的框图。
通过上述附图,已示出本申请明确的实施例,后文中将有更详细的描述,这些附图和文字描述并不是为了通过任何方式限制本申请构思的范围,而是通过参考特定实施例为本领域技术人员说明本申请的概念。
具体实施方式
这里将详细地对示例性实施例执行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的装置和方法的例子。
图1根据一示例性实施例示出的一种装置200的框图。装置200可以作为本申请的执行主体,用于实现本申请的面试回答文本的分类方法。当然,本申请的方法并不限于以装置200作为执行主体实现,其他具备处理能力的电子设备也可以作为本申请的执行主体,用于实现本申请的面试回答文本的分类方法。
需要说明的是,该装置200只是一个适配于本申请的示例,不能认为是提供了对本申请的使用范围的任何限制。该装置也不能解释为需要依赖于或者必须具有图1中示出的示例性的装置200中的一个或者多个组件。
该装置200的硬件结构可因配置或者性能的不同而产生较大的差异,如图3所示,装置200包括:电源210、接口230、至少一存储器250、以及至少一处理器270。其中,电源210用于为装置200上的各硬件设备提供工作电压。
接口230包括至少一有线或无线网络接口231、至少一串并转换接口233、至少一输入输出接口235以及至少一USB接口237等,用于与外部设备通信。
存储器250作为资源存储的载体,可以是只读存储器、随机存储器、磁盘或者光盘等,其上所存储的资源包括操作系统251、应用程序253及数据255等,存储方式可以是短暂存储或者永久存储。其中,操作系统251用于管理与控制装置200上的各硬件设备以及应用程序253,以实现处理器270对海量数据255的计算与处理,其可以是Windows Server TM、Mac OS X TM、Unix TM、Linux TM、FreeBSD TM等。应用程序253是基于操作系统251之上完成至少一项特定工作的计算机程序,其可以包括至少一模块(图2中未示出),每个模块都可以分别包含有对装置200的一系列计算机可读指令。数据255可以是存储于磁盘中的样本文本、标签数据等。处理器270可以包括一个或多个以上的处理器,并设置为通过总线与存储器250通信,用于运算与处理存储器250中的海量数据255。
如上面所详细描述的,适用本申请的装置200将通过处理器270读取存储器250中存储的一系列计算机可读指令的形式来完成面试回答文本的分类的方法。此外,通过硬件电路或者硬件电路结合软件也能同样实现本申请,因此,实现本申请并不限于任何特定硬件电路、软件以及两者的组合。
图2是根据一示例性实施例示出的一种面试回答文本的分类方法的流程图,该方法可以由图1所示的装置200执行,也可以由其他具有处理能力的电子设备执行,在此不进行具体限定。如图2所示,该方法至少包括以下步骤:
步骤310,获取面试者的面试回答文本,面试回答文本是根据面试者在面试中对面试提问的回复获得的。对于面试而言,面试过程中,面试者针对面试提问进行回答,而所回答的内容即为对面试提问的回复。面试回答文本即为针对面试提问所作回复的文本表达,例如,如果面试者以文本的方式回答面试提问,则所作的回复即为面试回答文本;若果面试者以语音的方式回答面试提问,那么将对所作的回复进行语音识别所获得的文本即为面试回答文本。
在一具体实施例中,通过智能面试系统对面试者进行面试。在智能面试系统中,预先为待进行面试的面试者设定若干问题,例如针对面试者的简历等资料进行问题的设定。从而,在对该面试者进行面试时,根据所设定的问题对面试者进行提问,并采集面试者对提问所作的回复,进而获得面试回答文本。在该实施例中,智能面试系统即通过本申请的方法,根据面试者的面试回答文本进行分类。
步骤330,通过所构建分类模型的特征提取层构建面试回答文本的语义向量,分类模型是通过若干样本回答文本和为每一样本回答文本所标注的标签数据进行训练获得的,标签数据指示了根据样本回答文本为面试者所标注在设定能力项上的评分等级。
面试回答文本的语义向量即是面试回答文本所对应语义的向量表示。其中,分类模型是通过神经网络构建的,所构建的分类模型用于对面试回答文本进行分类,神经网络例如深度前馈网络、卷积神经网络(Convolution Neural Networks,CNN)、递归神经网络(Recurrent Neural Networks)等,通过各种神经网络进行组合,进而获得用于进行面试回答文本分类的分类模型。
对面试回答文本进行分类的目的是通过面试回答文本获得面试者设定能力项上的评分等级,因而,所进行的分类,即是将面试回答文本分类至在设定能力项上的一评分等级,从而实现了根据面试回答文本对面试者进行能力评估。
可以理解的是,为对面试者进行能力评估,是在多个设定能力项上对面试者进行能力评估。而本申请的分类模型,即是针对在多个设定能力项上对面试回答文本进行分类而是构建的。设定能力项例如学习能力、规划能力、稳定性、团队协作能力、领导能力等。当然,不同的应用场景下,对于面试者需要评估的设定能力项可能不同。因而对面试者所要评估的若干设定能力项可以根据实际需要进行选取。
可选的,分类模型包括一特征提取层、针对设定能力项所分别构建的全连接层(其中一设定能力项对应一全连接层)和输出层(每一全连接层对应有一输出层)。其中,特征提取层用于构建面试回答文本的语义向量;全连接层用于在全连接层所对应设定能力项上根据语义向量进行全连接,获得用于表征面试回答文本在该设定能力项上特征的特征向量;输出层用于根据特征向量进行输出,从而获得在设定能力项上的评分等级,值得一提的是,一设定能力项对应一输出层,即输出层所输出的评分等级即为在该输出层所对应设定能力项的评分等级。
而为了保证分类模型对于面试回答文本进行分类的准确性,在对面试回答文本进行分类之前,根据若干样本回答文本以及为面试回答文本所标注的标签数据进行模型训练,获得分类模型。如上所描述,分类模型用于根据面试回答文本输出面试者在设定能力项上的评分等级,从而,用于进行模型训练的标签数据表征了所对应词样本回答文本在每一设定能力项上的评分等级。
步骤350,通过分类模型的每一全连接层分别根据语义向量进行全连接,对应获得特征向量,在全连接层上所获得的特征向量用于表征样本回答文本在全连接层所对应设定能力项上的特征,分类模型包括至少两个全连接层,每一全连接层对应一设定能力项。
在分类模型中,为每一设定能力项对应构建有一全连接层。虽然在通过特征提取层获得了面试回答文本的语义向量,但是由于需要在至少两个设定能力项上对面试回答文 本进行分类,而面试回答文本的语义向量虽然表征了面试回答文本的全部特征,但是,在语义向量中,在各设定能力项上的特征的表现程度不同,在某些设定能力项上的特征明显,而在一些设定能力项上的特征不明显。因此,如果仅通过语义向量在至少两个设定能力项上进行分类,存在分类准确性低的问题。
因而,为了保证在每一设定能力项上进行分类的准确性,需要进一步从语义向量中将用于在一设定能力项上进行分类的特征提取出来,实现激活面试回答文本在每一设定能力项上所表现的特征。该过程即是通过设定能力项所对应全连接层根据语义向量进行全连接来实现的,对应获得用于表征面试回答文本在全连接层所对应设定能力项上的特征的特征向量。由于在分类模型中,每一全连接层对应一设定能力项,因而,为了在每一设定能力项上对面试回答文本进行分类,则通过该设定能力项所对应全连接层根据语义向量获得对应于设定能力项的特征向量。
步骤370,对在每一全连接层所获得的特征向量进行分类预测,分别获得面试者在各设定能力项上的评分等级。
所进行的分类预测,是针对在每一设定能力项上所设定的评分等级,预测该特征向量对应为每一评分等级的概率,从而,根据所预测得到的概率对应确定该面试回答文本在该设定能力项上的评分等级。
举例来说,在学习能力这一设定能力项上,预设了4个评分等级,分别为:评分等级A、评分等级B、评分等级C和评分等级D。那么,对应的,根据从对应于学习能力的全连接层所获得的特征向量,分别预测得到该面试回答文本被分类至评分等级A、B、C和D的概率。比如预测得到该面试回答文本被分类至评分等级A的概率为P1,该面试回答文本被分类至评分等级B的概率为P2,该面试回答文本被分类至评分等级C的概率为P3和该面试回答文本被分类至评分等级D的概率为P4。然后针对所预测得到的概率,遍历每一评分等级的概率,比较概率P1、P2、P3和P4的大小,若概率P1最大,在学习能力这一设定能力项上,面试回答文本被分类至评分等级A,即面试者在学习能力上的评分等级为A。
从而,通过以上步骤即可根据面试者的面试回答文本确定面试者在各个设定能力项的评分等级,实现根据面试者的面试回答文本评估面试者在各设定能力项上的能力,换言之,实现了自动进行面试评价,提高了面试评估的效率。而不需要依赖面试官根据对面试者的面试情况对面试者在各个能力项上进行评估,大幅降低了对面试者进行面试评价的工作量。而且,由于不需要面试官参与到面试评价中,从而可以避免因面试官的主 观意志和个人喜好所造成的评估结果不准确不客观。
在一实施例中,如图3所示,步骤310包括:步骤311,采集面试者在面试过程中针对面试提问的回复语音。在本实施例中,采用语音的方式对面试者进行面试,并在面试过程中,进行语音采集,从而获得面试者在该过程中针对面试提问的回复语音。步骤313,对回复语音进行语音识别,获得回复语音所对应的面试回答文本。所进行的语音识别,即将回复语音识别为文本,从而获得回复语音所对应的面试回答文本。在具体实施例中,为进行语音识别,可以直接调用现有技术中的语音识别工具进行。
在一实施例中,如图4所示,步骤330包括:步骤331,通过分类模型的特征提取层对面试回答文本进行分词,获得由若干词所构成的词序列。分词是指将连续的面试回答文本按照一定的规范划分成词序列的过程,从而获得由若干单独的词构成的词序列。其中,所进行的分词,可以是基于字符串匹配的分词方法、基于理解的分词方法以及基于统计的分词方法,在此不进行具体限定。在一具体实施例中,还可以直接调用分词工具进行分词,例如jieba、SnowNLP、THULAC、NLPIR等。
值得一提的是,针对不同的语言,所用于进行分词的方法可能不同,例如,对于英文文本可以直接通过空格和标点进行分词,而对于中文文本,由于字与字之间并没有空格,通过空格进行分词是不行的,那么需要采用适应于中文的分词方法进行分词。
步骤333,通过特征提取层根据词词序列中各词所对应的编码以及各词所对应的语义权重构建得到面试回答文本的语义向量。可以理解的是,在文本中,不同类型的词对于文本的语义的贡献程度是不同的。而此所对应的语义权重即是对词对所在文本的语义的贡献程度的量化表示。在面试回答文本中,不同词性的词的语义权重是不同的,例如对于名词、动词、助词而言,名词和动词的语义权重大于助词的语义权重。
为进行面试回答文本的分类,对应构建有一语义词典,在该语义词典中,存储有若干词的编码,以及词的语义权重。从而特征提取层根据面试回答文本所对应词序列中各词在语义词典中的编码以及语义权重,对应生成该面试回答文本的语义向量。
在一实施例中,分类模型是通过text-CNN神经网络所构建的,如图5所示,在步骤331之前,该方法还包括:步骤410,获取为进行分词而确定的文本截断长度。步骤430,根据所获取的文本截断长度对面试回答文本进行截断,将通过截断所保留的文本作为进行分词的对象。
text-CNN是利用卷积神经网络对文本进行分类的算法。而在text-CNN神经网络对面试回答文本进行分类之前,需要按照为该text-CNN神经网络所设定的文本截断长 度来对面试回答文本进行截断。
该文本截断长度限定了输入至分类模型进行分类的文本的长度,即如果文本的文本长度超过该文本截断长度,则按照文本截断长度进行截断,将文本中超出该文本截断长度的部分去除,使得截断后文本的文本长度为该文本截断长度。而若文本的文本长度未超过文本截断长度,在在为该文本构建语义向量时,需要进行补位,即补充补位字符,例如补充0;从而使得为文本所构建的语义向量保持与文本截断长度一致。
该文本截断长度是为了确定分类模型的训练参数值而确定的。合理的文本截断长度可以在保证充分捕捉到文本的语义特征的基础上,提高分类模型的训练效率。
从而,在根据文本截断长度设定好分类模型的训练参数之后,不管是在对分类模型进行训练还是用于对面试回答文本进行分类的过程中,均按照此文本截断长度对文本(即样本回答文本或面试回答文本)进行截断。其中,文本的长度,即将文本进行分词之后所获得词的数量。
在一实施例中,步骤410之前,该方法还包括:根据各样本回答文本的文本长度确定文本截断长度。对于通过text-CNN神经网络所构建的分类模型而言,如果文本截断长度过短,则一方面会导致从面试回答文本所捕捉的信息不够,从而降低面试回答文本的分类的准确性,另一方面会导致批处理数量过少,则训练到收敛的路径比较随机,从而分类模型的分类精度不高;反之,如果文本截断长度过程,则一方面会导致分类模型的训练时间过长,另一方面会导致一次批训练时间久,容易陷入局部最优。从而,为了保证分类模型的训练效率和分类模型的分类精度,根据分类模型的实际应用场景来为该分类模型确定文本截断长度,即根据各样本回答文本的文本长度来确定文本截断长度。
可以理解的是,各样本回答文本的文本长度在一定程度上表征了面试回答文本的文本长度的范围,从而通过各各样本回答文本的文本长度来确定文本截断长度,可以使所确定的文本截断长度适应于在对面试回答文本进行分类中的实际情况。
在一实施例中,如图6所示,根据各样本回答文本的文本长度确定文本截断长度,包括:步骤510,获取对各样本回答文本进行分词而获得各样本回答文本的文本长度,对样本回答文本进行分词所获得词的数量作为样本回答文本的文本长度。步骤530,根据每一样本回答文本的文本长度,计算得到文本长度均值和文本长度标准差。步骤550,根据文本长度均值和文本长度标准差确定文本截断长度。
在一具体实施例中,将文本长度均值和文本长度标准差的加权和,例如文本长度均值与文本长度标准差的和,作为文本截断长度。通过文本长度均值和文本长度标准差 所确定的文本截断长度在充分保留样本回答文本或面试回答文本的信息,和提高分类模型的训练效率之间取得了平衡。
在一实施例中,如图7所示,步骤330之前,该方法还包括:
步骤610,按照所设定的若干能力项预构建神经网络模型,神经网络模型包括为每一设定能力项对应构建的一全连接层。步骤630,通过若干样本回答文本和每一样本回答文本所对应的标签数据对神经网络模型进行训练,直至神经网络模型的损失函数收敛,收敛函数为各设定能力项上交叉熵的加权和。步骤650,将损失函数收敛时的神经网络模型作为分类模型。
对于一设定能力项，样本回答文本或者面试回答文本在该设定能力项上的评分等级为离散型随机变量X，其取值集合为C，概率分布函数p(x)=P(X=x)，x∈C，那么事件X=x_0的信息量为：I(x_0)=-log(p(x_0))。
由于变量X有多种取值，每一种取值有对应的概率p(x_i)，则该设定能力项上的交叉熵即为该设定能力项上所有信息量的期望，即
$$H(p_1) = -\sum_{i=1}^{n} p_1(x_i)\,\log\bigl(p_1(x_i)\bigr)$$
其中，H(p_1)表示在设定能力项p_1上的交叉熵，p_1(x_i)表示变量X的取值为x_i的概率，n表示在设定能力项p_1上变量X可取值的数量。从而，神经网路模型的收敛函数为：
$$L = \sum_{j=1}^{m} w_j\,H(p_j)$$
其中，w_j为设定能力项p_j对应的权重，m表示所设定能力项的数量。
对预构建的神经网络模型的训练过程即:通过神经网络模型预测每一样本回答文本在每一设定能力项上的评分等级,若所预测得到在该设定能力项上的评分等级与该样本问答文本所对应标签数据中在该设定能力项上的评分等级不一致,则调整神经网络模型的模型参数;反之,如果一致,则继续用下一样本回答文本进行训练。并在训练过程中,若损失函数收敛,则停止进行训练。并将损失函数收敛时的神经网络模型作为分类模型。
下述为本申请装置实施例,可以用于执行本申请上述装置200执行的面试回答文本的分类方法实施例。对于本申请装置实施例中未披露的细节,请参照本申请面试回答文本的分类方法实施例。
图8是根据一示例性实施例示出的一种面试回答文本的分类装置的框图,该面试回答文本的分类装置可以配置于图1的装置200中,执行以上方法实施例中任一所示的面试回答文本的分类方法的全部或者部分步骤。如图8所示,该面试回答文本的分类装置包括但不限于:获取模块710,配置为获取面试者的面试回答文本,面试回答文本是根 据面试者在面试中对面试提问的回复获得的。语义向量构建模块730,配置为通过所构建分类模型的特征提取层构建面试回答文本的语义向量,分类模型是通过若干样本回答文本和为每一样本回答文本所标注的标签数据进行训练获得的,标签数据指示了根据样本回答文本为面试者所标注在设定能力项上的评分等级。全连接模块750,配置为通过分类模型的每一全连接层分别根据语义向量进行全连接,对应获得特征向量,在全连接层上所获得的特征向量配置为表征样本回答文本在全连接层所对应设定能力项上的特征,分类模型包括至少两个全连接层,每一全连接层对应一设定能力项。分类预测模块770,配置为对在每一全连接层所获得的特征向量进行分类预测,分别获得面试者在各设定能力项上的评分等级。
上述装置中各个模块的功能和作用的实现过程具体详见上述面试回答文本的分类方法中对应步骤的实现过程,在此不再赘述。
可以理解,这些模块可以通过硬件、软件、或二者结合来实现。当以硬件方式实现时,这些模块可以实施为一个或多个硬件模块,例如一个或多个专用集成电路。当以软件方式实现时,这些模块可以实施为在一个或多个处理器上执行的一个或多个计算机程序,例如图1的处理器270所执行的存储在存储器250中的程序。
在一实施例中,获取模块710包括:采集单元,配置为采集面试者在面试过程中针对面试提问的回复语音。语音识别单元,配置为对回复语音进行语音识别,获得回复语音所对应的面试回答文本。
在一实施例中,语义向量构建模块730包括:分词单元,配置为通过分类模型的特征提取层对面试回答文本进行分词,获得由若干词所构成的词序列。语义向量构建单元,配置为通过特征提取层根据词词序列中各词所对应的编码以及各词所对应的语义权重构建得到面试回答文本的语义向量。
在一实施例中,分类模型是通过text-CNN神经网络所构建的,该分类装置还包括:文本截断长度获取模块,配置为获取为进行分词而确定的文本截断长度。截断模块,配置为根据所获取的文本截断长度对面试回答文本进行截断,将通过截断所保留的文本作为进行分词的对象。
在一实施例中,该分类装置还包括:文本截断长度确定模块,配置为根据各样本回答文本的文本长度确定文本截断长度。
在一实施例中,文本截断长度确定模块包括:文本长度获取单元,配置为获取对各样本回答文本进行分词而获得各样本回答文本的文本长度,对样本回答文本进行分词 所获得词的数量作为样本回答文本的文本长度。计算单元,配置为根据每一样本回答文本的文本长度,计算得到文本长度均值和文本长度标准差。确定单元,配置为根据文本长度均值和文本长度标准差确定文本截断长度。
在一实施例中,该分类装置还包括:预构建模块,配置为按照所设定的若干能力项预构建神经网络模型,神经网络模型包括为每一设定能力项对应构建的一全连接层。训练模块,配置为通过若干样本回答文本和每一样本回答文本所对应的标签数据对神经网络模型进行训练,直至神经网络模型的损失函数收敛,收敛函数为各设定能力项上交叉熵的加权和。分类模型获得模块,配置为将损失函数收敛时的神经网络模型作为分类模型。
上述装置中各个模块/单元的功能和作用的实现过程具体详见上述面试回答文本的分类方法中对应步骤的实现过程,在此不再赘述。
可选的,本申请还提供一种电子设备,该电子设备可以执行以上方法实施例中任一所示的面试回答文本的分类方法的全部或者部分步骤。如图9所示,电子设备包括:处理器1001;及存储器1002,存储器1002上存储有计算机可读指令,计算机可读指令被处理器1001执行时实现以上方法实施中任一项的方法。其中,可执行指令被处理器1001执行时实现以上任一实施例中的方法。其中可执行指令比如是计算机可读指令,在处理器1001执行时,处理器通过与存储器之间所连接的通信线/总线1003读取存储于存储器中的计算机可读指令。
该实施例中的装置的处理器执行操作的具体方式已经在有关该面试回答文本的分类方法的实施例中进行了详细描述,此处将不做详细阐述说明。
在示例性实施例中,还提供了一种计算机非易失性可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现如上任一方法实施例中的方法。其中计算机非易失性可读存储介质例如包括计算机程序的存储器250,上述指令可由装置200的处理器270执行以实现上述任一实施例中的面试回答文本的分类方法。
该实施例中的处理器执行操作的具体方式已经在有关该面试回答文本的分类方法的实施例中执行了详细描述,此处将不做详细阐述说明。
上述内容,仅为本申请的较佳示例性实施例,并非用于限制本申请的实施方案,本领域普通技术人员根据本申请的主要构思和精神,可以十分方便地进行相应的变通或修改,故本申请的保护范围应以权利要求书所要求的保护范围为准。

Claims (28)

  1. 一种面试回答文本的分类方法,所述方法包括:获取面试者的面试回答文本,所述面试回答文本是根据所述面试者在面试中对面试提问的回复获得的;通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量,所述分类模型是通过若干样本回答文本和为每一样本回答文本所标注的标签数据进行训练获得的,所述标签数据指示了根据所述样本回答文本为所述面试者所标注在设定能力项上的评分等级;通过所述分类模型的每一全连接层分别根据所述语义向量进行全连接,对应获得特征向量,在所述全连接层上所获得的所述特征向量用于表征所述样本回答文本在所述全连接层所对应设定能力项上的特征,所述分类模型包括至少两个全连接层,每一全连接层对应一设定能力项;对在每一全连接层所获得的特征向量进行分类预测,分别获得所述面试者在各设定能力项上的评分等级。
  2. 根据权利要求1所述的方法,其中,所述获取为面试者所采集的面试数据,包括:采集面试者在面试过程中针对所述面试提问的回复语音;对所述回复语音进行语音识别,获得所述回复语音所对应的面试回答文本。
  3. 根据权利要求1所述的方法,其中,所述通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量,包括:通过所述分类模型的特征提取层对所述面试回答文本进行分词,获得由若干词所构成的词序列;通过所述特征提取层根据所述词词序列中各词所对应的编码以及各词所对应的语义权重构建得到所述面试回答文本的语义向量。
  4. 根据权利要求3所述的方法,其中,所述分类模型是通过text-CNN神经网络所构建的,所述通过所述分类模型的特征提取层对所述面试回答文本进行分词,获得由若干词所构成的词序列之前,所述方法还包括:获取为进行分词而确定的文本截断长度;根据所获取的所述文本截断长度对所述面试回答文本进行截断,将通过截断所保留的文本作为进行分词的对象。
  5. 根据权利要求4所述的方法,其中,所述获取为进行分词而确定的文本截断长度之前,所述方法还包括:根据各所述样本回答文本的文本长度确定所述文本截断长度。
  6. 根据权利要求5所述的方法,其中,所述根据各所述样本回答文本的文本长度确定所述文本截断长度,包括:获取对各所述样本回答文本进行分词而获得各样本回答文本的文本长度,对样本回答文本进行分词所获得词的数量作为所述样本回答文本的文本长度;根据每一样本回答文本的文本长度,计算得到文本长度均值和文本长度标准差; 根据所述文本长度均值和所述文本长度标准差确定所述文本截断长度。
  7. 根据权利要求1-6中任一项所述的方法,其中,所述通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量之前,所述方法还包括:按照所设定的若干能力项预构建神经网络模型,所述神经网络模型包括为每一设定能力项对应构建的一全连接层;通过所述若干样本回答文本和每一样本回答文本所对应的所述标签数据对所述神经网络模型进行训练,直至所述神经网络模型的损失函数收敛,所述收敛函数为各设定能力项上交叉熵的加权和;将所述损失函数收敛时的所述神经网络模型作为所述分类模型。
  8. 一种面试回答文本的分类装置,所述装置包括:获取模块,被配置为:获取为面试者的面试回答文本,所述面试回答文本是根据所述面试者在面试中对面试提问的回复获得的;语义向量构建模块,配置为通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量,所述分类模型是通过若干样本回答文本和为每一样本回答文本所标注的标签数据进行训练获得的,所述标签数据指示了根据所述样本回答文本为所述面试者所标注在设定能力项上的评分等级;全连接模块,被配置为:通过所述分类模型的每一全连接层分别根据所述语义向量进行全连接,对应获得特征向量,在所述全连接层上所获得的所述特征向量配置为表征所述样本回答文本在所述全连接层所对应设定能力项上的特征,所述分类模型包括至少两个全连接层,每一全连接层对应一设定能力项;分类预测模块,被配置为:对在每一全连接层所获得的特征向量进行分类预测,分别获得所述面试者在各设定能力项上的评分等级。
  9. 根据权利要求8所述的分类装置,其中,所述获取模块,包括:采集单元,被配置为:采集面试者在面试过程中针对所述面试提问的回复语音;语音识别单元,被配置为:对所述回复语音进行语音识别,获得所述回复语音所对应的面试回答文本。
  10. 根据权利要求8所述的分类装置,其中,所述语义向量构建模块,包括:分词单元,被配置为:通过所述分类模型的特征提取层对所述面试回答文本进行分词,获得由若干词所构成的词序列;语义向量构建单元,被配置为:通过所述特征提取层根据所述词词序列中各词所对应的编码以及各词所对应的语义权重构建得到所述面试回答文本的语义向量。
  11. 根据权利要求10所述的分类装置,其中,所述分类模型是通过text-CNN神经网络所构建的,所述分类装置还包括:文本截断长度获取模块,被配置为:获取为进行分词而确定的文本截断长度;截断模块,被配置为:根据所获取的所述文本截断长度 对所述面试回答文本进行截断,将通过截断所保留的文本作为进行分词的对象。
  12. 根据权利要求11所述的分类装置,其中,所述分类装置还包括:文本截断长度确定模块,被配置为:根据各所述样本回答文本的文本长度确定所述文本截断长度。
  13. 根据权利要求12所述的分类装置,其中,所述文本截断长度确定模块,包括:文本长度获取单元,被配置为:获取对各所述样本回答文本进行分词而获得各样本回答文本的文本长度,对样本回答文本进行分词所获得词的数量作为所述样本回答文本的文本长度;计算单元,被配置为:根据每一样本回答文本的文本长度,计算得到文本长度均值和文本长度标准差;确定单元,被配置为:根据所述文本长度均值和所述文本长度标准差确定所述文本截断长度。
  14. 根据权利要求8-13中任一项所述的分类装置,其中,所述分类装置还包括:预构建模块,被配置为:按照所设定的若干能力项预构建神经网络模型,所述神经网络模型包括为每一设定能力项对应构建的一全连接层;训练模块,被配置为:通过所述若干样本回答文本和每一样本回答文本所对应的所述标签数据对所述神经网络模型进行训练,直至所述神经网络模型的损失函数收敛,所述收敛函数为各设定能力项上交叉熵的加权和;分类模型获得模块,被配置为:将所述损失函数收敛时的所述神经网络模型作为所述分类模型。
  15. 一种电子设备,包括:处理器;及存储器,所述存储器上存储有计算机可读指令,所述计算机可读指令被所述处理器执行时实现如下的步骤:
    获取面试者的面试回答文本,所述面试回答文本是根据所述面试者在面试中对面试提问的回复获得的;通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量,所述分类模型是通过若干样本回答文本和为每一样本回答文本所标注的标签数据进行训练获得的,所述标签数据指示了根据所述样本回答文本为所述面试者所标注在设定能力项上的评分等级;通过所述分类模型的每一全连接层分别根据所述语义向量进行全连接,对应获得特征向量,在所述全连接层上所获得的所述特征向量用于表征所述样本回答文本在所述全连接层所对应设定能力项上的特征,所述分类模型包括至少两个全连接层,每一全连接层对应一设定能力项;对在每一全连接层所获得的特征向量进行分类预测,分别获得所述面试者在各设定能力项上的评分等级。
  16. 根据权利要求15所述的电子设备,其中,在所述获取为面试者所采集的面试数据的步骤中,所述处理器被配置为:
    采集面试者在面试过程中针对所述面试提问的回复语音;对所述回复语音进行语音 识别,获得所述回复语音所对应的面试回答文本。
  17. 根据权利要求15所述的电子设备,其中,在所述通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量的步骤中,所述处理器被配置为:
    通过所述分类模型的特征提取层对所述面试回答文本进行分词,获得由若干词所构成的词序列;通过所述特征提取层根据所述词词序列中各词所对应的编码以及各词所对应的语义权重构建得到所述面试回答文本的语义向量。
  18. 根据权利要求17所述的电子设备,其中,所述分类模型是通过text-CNN神经网络所构建的,在所述通过所述分类模型的特征提取层对所述面试回答文本进行分词,获得由若干词所构成的词序列的步骤之前,所述处理器还被配置为:
    获取为进行分词而确定的文本截断长度;根据所获取的所述文本截断长度对所述面试回答文本进行截断,将通过截断所保留的文本作为进行分词的对象。
  19. 根据权利要求18所述的电子设备,其中,在所述获取为进行分词而确定的文本截断长度的步骤之前,所述处理器被配置为:根据各所述样本回答文本的文本长度确定所述文本截断长度。
  20. 根据权利要求19所述的电子设备,其中,在所述根据各所述样本回答文本的文本长度确定所述文本截断长度的步骤中,所述处理器被配置为:
    获取对各所述样本回答文本进行分词而获得各样本回答文本的文本长度,对样本回答文本进行分词所获得词的数量作为所述样本回答文本的文本长度;根据每一样本回答文本的文本长度,计算得到文本长度均值和文本长度标准差;根据所述文本长度均值和所述文本长度标准差确定所述文本截断长度。
  21. 根据权利要求15-20中任一项所述的电子设备,其中,在所述通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量的步骤之前,所述处理器还被配置为:
    按照所设定的若干能力项预构建神经网络模型,所述神经网络模型包括为每一设定能力项对应构建的一全连接层;通过所述若干样本回答文本和每一样本回答文本所对应的所述标签数据对所述神经网络模型进行训练,直至所述神经网络模型的损失函数收敛,所述收敛函数为各设定能力项上交叉熵的加权和;将所述损失函数收敛时的所述神经网络模型作为所述分类模型。
  22. 一种计算机非易失性可读存储介质,其上存储有计算机可读指令,当所述计算机可读指令被计算机的处理器执行时实现如下的步骤:
    获取面试者的面试回答文本,所述面试回答文本是根据所述面试者在面试中对面试提问的回复获得的;通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量,所述分类模型是通过若干样本回答文本和为每一样本回答文本所标注的标签数据进行训练获得的,所述标签数据指示了根据所述样本回答文本为所述面试者所标注在设定能力项上的评分等级;通过所述分类模型的每一全连接层分别根据所述语义向量进行全连接,对应获得特征向量,在所述全连接层上所获得的所述特征向量用于表征所述样本回答文本在所述全连接层所对应设定能力项上的特征,所述分类模型包括至少两个全连接层,每一全连接层对应一设定能力项;对在每一全连接层所获得的特征向量进行分类预测,分别获得所述面试者在各设定能力项上的评分等级。
  23. 根据权利要求22所述的计算机非易失性可读存储介质,其中,在所述获取为面试者所采集的面试回答文本的步骤中,所述处理器被配置为:
    采集面试者在面试过程中针对所述面试提问的回复语音;对所述回复语音进行语音识别,获得所述回复语音所对应的面试回答文本。
  24. 根据权利要求22所述的计算机非易失性可读存储介质,其中,在所述通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量的步骤中,所述处理器被配置为:
    通过所述分类模型的特征提取层对所述面试回答文本进行分词,获得由若干词所构成的词序列;通过所述特征提取层根据所述词词序列中各词所对应的编码以及各词所对应的语义权重构建得到所述面试回答文本的语义向量。
  25. 根据权利要求24所述的计算机非易失性可读存储介质,其中,所述分类模型是通过text-CNN神经网络所构建的,在所述通过所述分类模型的特征提取层对所述面试回答文本进行分词,获得由若干词所构成的词序列的步骤之前,所述处理器还被配置为:
    获取为进行分词而确定的文本截断长度;根据所获取的所述文本截断长度对所述面试回答文本进行截断,将通过截断所保留的文本作为进行分词的对象。
  26. 根据权利要求25所述的计算机非易失性可读存储介质,其中,在所述获取为进行分词而确定的文本截断长度的步骤之前,所述处理器还被配置为:根据各所述样本回答文本的文本长度确定所述文本截断长度。
  27. 根据权利要求26所述的计算机非易失性可读存储介质,其中,在所述根据各所述样本回答文本的文本长度确定所述文本截断长度的步骤中,所述处理器被配置为:
    获取对各所述样本回答文本进行分词而获得各样本回答文本的文本长度,对样本回答文本进行分词所获得词的数量作为所述样本回答文本的文本长度;根据每一样本回答文本的文本长度,计算得到文本长度均值和文本长度标准差;根据所述文本长度均值和所述文本长度标准差确定所述文本截断长度。
  28. 根据权利要求22-27中任一项所述的计算机非易失性可读存储介质,其中,在所述通过所构建分类模型的特征提取层构建所述面试回答文本的语义向量的步骤之前,所述处理器被配置为:
    按照所设定的若干能力项预构建神经网络模型,所述神经网络模型包括为每一设定能力项对应构建的一全连接层;通过所述若干样本回答文本和每一样本回答文本所对应的所述标签数据对所述神经网络模型进行训练,直至所述神经网络模型的损失函数收敛,所述收敛函数为各设定能力项上交叉熵的加权和;将所述损失函数收敛时的所述神经网络模型作为所述分类模型。
PCT/CN2019/118036 2019-09-18 2019-11-13 面试回答文本的分类方法及装置、电子设备、存储介质 WO2021051586A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910882034.0 2019-09-18
CN201910882034.0A CN110717023B (zh) 2019-09-18 2019-09-18 面试回答文本的分类方法及装置、电子设备、存储介质

Publications (1)

Publication Number Publication Date
WO2021051586A1 true WO2021051586A1 (zh) 2021-03-25

Family

ID=69210550

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118036 WO2021051586A1 (zh) 2019-09-18 2019-11-13 面试回答文本的分类方法及装置、电子设备、存储介质

Country Status (2)

Country Link
CN (1) CN110717023B (zh)
WO (1) WO2021051586A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449095A (zh) * 2021-07-02 2021-09-28 中国工商银行股份有限公司 一种面试数据分析方法和装置

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111234A (zh) * 2020-02-13 2021-07-13 北京明亿科技有限公司 基于正则表达式的处警警情类别确定方法和装置
CN111522916B (zh) * 2020-04-20 2021-03-09 马上消费金融股份有限公司 一种语音服务质量检测方法、模型训练方法及装置
CN111695591B (zh) * 2020-04-26 2024-05-10 平安科技(深圳)有限公司 基于ai的面试语料分类方法、装置、计算机设备和介质
CN111695352A (zh) * 2020-05-28 2020-09-22 平安科技(深圳)有限公司 基于语义分析的评分方法、装置、终端设备及存储介质
CN111709630A (zh) * 2020-06-08 2020-09-25 深圳乐信软件技术有限公司 语音质检方法、装置、设备及存储介质
CN116452047A (zh) * 2023-04-12 2023-07-18 上海才历网络有限公司 一种候选人胜任能力测评方法及装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170084269A1 (en) * 2015-09-17 2017-03-23 Panasonic Intellectual Property Management Co., Ltd. Subject estimation system for estimating subject of dialog
CN108519975A (zh) * 2018-04-03 2018-09-11 北京先声教育科技有限公司 作文评分方法、装置及存储介质
CN109299246A (zh) * 2018-12-04 2019-02-01 北京容联易通信息技术有限公司 一种文本分类方法及装置
CN109670168A (zh) * 2018-11-14 2019-04-23 华南师范大学 基于特征学习的短答案自动评分方法、系统及存储介质
CN109918506A (zh) * 2019-03-07 2019-06-21 安徽省泰岳祥升软件有限公司 一种文本分类方法及装置
CN109918497A (zh) * 2018-12-21 2019-06-21 厦门市美亚柏科信息股份有限公司 一种基于改进textCNN模型的文本分类方法、装置及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102416048B1 (ko) * 2017-10-16 2022-07-04 일루미나, 인코포레이티드 변이체 분류를 위한 심층 컨볼루션 신경망
CN109241288A (zh) * 2018-10-12 2019-01-18 平安科技(深圳)有限公司 文本分类模型的更新训练方法、装置及设备
CN109522395A (zh) * 2018-10-12 2019-03-26 平安科技(深圳)有限公司 自动问答方法及装置
CN109978339A (zh) * 2019-02-27 2019-07-05 平安科技(深圳)有限公司 Ai面试模型训练方法、装置、计算机设备及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170084269A1 (en) * 2015-09-17 2017-03-23 Panasonic Intellectual Property Management Co., Ltd. Subject estimation system for estimating subject of dialog
CN108519975A (zh) * 2018-04-03 2018-09-11 北京先声教育科技有限公司 作文评分方法、装置及存储介质
CN109670168A (zh) * 2018-11-14 2019-04-23 华南师范大学 基于特征学习的短答案自动评分方法、系统及存储介质
CN109299246A (zh) * 2018-12-04 2019-02-01 北京容联易通信息技术有限公司 一种文本分类方法及装置
CN109918497A (zh) * 2018-12-21 2019-06-21 厦门市美亚柏科信息股份有限公司 一种基于改进textCNN模型的文本分类方法、装置及存储介质
CN109918506A (zh) * 2019-03-07 2019-06-21 安徽省泰岳祥升软件有限公司 一种文本分类方法及装置

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449095A (zh) * 2021-07-02 2021-09-28 中国工商银行股份有限公司 一种面试数据分析方法和装置

Also Published As

Publication number Publication date
CN110717023B (zh) 2023-11-07
CN110717023A (zh) 2020-01-21

Similar Documents

Publication Publication Date Title
WO2021051586A1 (zh) 面试回答文本的分类方法及装置、电子设备、存储介质
CN110717039B (zh) 文本分类方法和装置、电子设备、计算机可读存储介质
US11017220B2 (en) Classification model training method, server, and storage medium
US10942962B2 (en) Systems and methods for categorizing and moderating user-generated content in an online environment
CN107436875B (zh) 文本分类方法及装置
US11544459B2 (en) Method and apparatus for determining feature words and server
CN110909165B (zh) 数据处理方法、装置、介质及电子设备
WO2021051598A1 (zh) 文本情感分析模型训练方法、装置、设备及可读存储介质
CN111078887B (zh) 文本分类方法和装置
KR20200127020A (ko) 의미 텍스트 데이터를 태그와 매칭시키는 방법, 장치 및 명령을 저장하는 컴퓨터 판독 가능한 기억 매체
EP3567865A1 (en) Method and system for processing on-screen comment information
WO2020087774A1 (zh) 基于概念树的意图识别方法、装置及计算机设备
WO2020238353A1 (zh) 数据处理方法和装置、存储介质及电子装置
CN110705255B (zh) 检测语句之间的关联关系的方法和装置
CN112732871B (zh) 一种机器人催收获取客户意向标签的多标签分类方法
US10417578B2 (en) Method and system for predicting requirements of a user for resources over a computer network
WO2021218027A1 (zh) 智能面试中专业术语的提取方法、装置、设备及介质
US11875114B2 (en) Method and system for extracting information from a document
CN112528022A (zh) 主题类别对应的特征词提取和文本主题类别识别方法
WO2021174814A1 (zh) 众包任务的答案验证方法、装置、计算机设备及存储介质
US20100296728A1 (en) Discrimination Apparatus, Method of Discrimination, and Computer Program
CN115048523B (zh) 文本分类方法、装置、设备以及存储介质
CN113095073B (zh) 语料标签生成方法、装置、计算机设备和存储介质
CN114357152A (zh) 信息处理方法、装置、计算机可读存储介质和计算机设备
CN111708884A (zh) 文本分类方法、装置及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19945893

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19945893

Country of ref document: EP

Kind code of ref document: A1