CN111695338A - Interview content refining method, device, equipment and medium based on artificial intelligence - Google Patents

Interview content refining method, device, equipment and medium based on artificial intelligence Download PDF

Info

Publication number
CN111695338A
CN111695338A (application CN202010356767.3A)
Authority
CN
China
Prior art keywords
interview
text
basic
refining
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010356767.3A
Other languages
Chinese (zh)
Inventor
邓悦
金戈
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010356767.3A priority Critical patent/CN111695338A/en
Publication of CN111695338A publication Critical patent/CN111695338A/en
Priority to PCT/CN2020/118928 priority patent/WO2021218028A1/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/105Human resources
    • G06Q10/1053Employment or hiring

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an interview content refining method, device, equipment and medium based on artificial intelligence. The method comprises the following steps: acquiring an interview recording and converting it into a self-introduction text and an interview response text; performing text analysis on the self-introduction text to obtain basic information of the interviewer; performing sentence classification on the interview response text to obtain classified texts; extracting sentences from each classified text through a language extraction model to obtain extracted sentences; and refining the extracted sentences with a Transformer model to obtain an interview refined corpus. In this way, the core content is accurately refined from interview record content with a large data volume, the accuracy of content refining is improved, and the accuracy of intelligent interview evaluation is improved accordingly. The interviewer basic information and the interview refined corpus are stored in a blockchain and sent to the management end for evaluation, which avoids evaluation results that fail to meet requirements due to direct semantic recognition and improves the accuracy and efficiency of intelligent interview result evaluation.

Description

Interview content refining method, device, equipment and medium based on artificial intelligence
Technical Field
The invention relates to the field of artificial intelligence, in particular to an interview content refining method, device, equipment and medium based on artificial intelligence.
Background
In the peak recruiting season of large enterprises, many interviewers participate in interviews, and at present most interviews are conducted on site or by video conference. The recruiting unit usually evaluates the interviewer after the interview in combination with the interviewer's responses. Conventional manual interviews have at least the following problems: (1) different interviewers prefer different angles for asking questions, and the same interviewer may also judge differently due to differences in workplace experience, interview skills and emotional states; (2) in view of the high labor cost and the interview time cost, some enterprises adopt an interview robot based on artificial intelligence to conduct interviews and provide the obtained interview content to decision makers for result evaluation, which improves interview fairness but raises a new problem: when there are more interviewers, more interview content is obtained, the time cost of decision evaluation increases, and the efficiency of the intelligent interview is low.
Existing solutions mainly obtain key sentences by performing keyword matching on the interview content, or perform semantic recognition with a Natural Language Processing (NLP) model. When keyword matching is adopted, different interviewers answer questions in different ways during the response process, so the responses may fail to match the preset keywords, leading to low accuracy in the final interview evaluation; when a general natural language processing model is adopted for semantic recognition, the semantic recognition accuracy often fails to meet the requirements.
Disclosure of Invention
The embodiment of the invention provides an interview content refining method, device, equipment and medium based on artificial intelligence, so as to improve the accuracy of interview content evaluation in an intelligent interview.
In order to solve the above technical problem, an embodiment of the present application provides an interview content refining method based on artificial intelligence, including:
acquiring interview recording, and converting the interview recording into an interview text, wherein the interview text comprises a self-introduction text and an interview response text;
performing text analysis on the self-introduction text to obtain basic information of the interviewer;
according to the related interview angle, carrying out sentence classification on the interview response text to obtain a classified text;
extracting sentences from each type of classified text through a language extraction model to obtain extracted sentences, and refining the extracted sentences by adopting a Transformer model to obtain interview refined linguistic data;
and sending the interviewer basic information and the interview refined corpus to a management end so that the management end determines an interview result according to the interviewer basic information and the interview refined corpus.
Optionally, the converting the interview recording into an interview text comprises:
identifying question and answer starting marks contained in the interview recording;
and performing text conversion on the interview recording by adopting a speech-to-text conversion mode, converting the recording content before the question-answer starting identifier into a text serving as a self-introduction text, and converting the recording content after the question-answer starting identifier into a text serving as an interview response text.
Optionally, the performing sentence classification on the interview response text according to the related interview angle to obtain a classified text includes:
taking each sentence in the interview response text as a basic sentence, and performing word segmentation processing on the basic sentence in a preset word segmentation mode to obtain basic words;
converting the basic participles into word vectors, and clustering the word vectors through a clustering algorithm to obtain a clustering center corresponding to the basic sentences;
and calculating the Euclidean distance between the clustering center corresponding to the basic sentence and the word vector corresponding to each preset interview angle for each basic sentence, taking the preset interview angle with the minimum distance as the target classification of the basic sentence, and taking the basic sentence as the classified text corresponding to the target classification.
Optionally, the word segmentation processing on the basic sentence through a preset word segmentation mode to obtain a basic word segmentation includes:
performing word segmentation on the basic sentence by adopting a conditional random field model to obtain an initial word segmentation;
acquiring the word frequency of each initial participle from a historical interview response text;
and generating the weight of the initial participle based on the word frequency of the initial participle, and taking the initial participle marked with the weight as the basic participle.
Optionally, the language extraction model is a bidirectional long and short term memory network model, the bidirectional long and short term memory network model includes a sentence encoder and a document encoder, and extracting sentences from each type of classified text through the language extraction model to obtain extracted sentences includes:
splitting the texts in the classified texts according to characters through the sentence encoder to obtain basic characters;
encoding a basic character to obtain encoding content corresponding to the basic character;
inputting the coded content into a character coding layer with an initialized weight, mapping each code into a character vector through the character coding layer, and taking each character vector as a sentence coding result;
splicing the sentence coding result into a hidden layer vector at the forward hidden layer output and the reverse hidden layer output, and inputting the hidden layer vector into the document coder;
and weighting the hidden layer vector by the document encoder to obtain a document feature vector, decoding the document feature vector, and taking an output result obtained by decoding as the extraction statement.
Optionally, the weighting the hidden layer vector by the document encoder to obtain a document feature vector includes:
determining the document feature vector by adopting the following formula:
$$C_i = \sum_{j=1}^{n} b_{ij} h_j$$

where C_i is the ith document feature vector, j is the index of the embedded codes, n is the number of embedded codes, b_ij is the weight of the ith document feature vector with respect to the jth hidden layer vector, and h_j is the jth hidden layer vector; the embedded codes are generated based on the hidden states of the bidirectional long and short term memory network model.
Optionally, after the extracting statement is refined by using a Transformer model to obtain an interview refined corpus, the method for refining interview content based on artificial intelligence further includes: and storing the interviewer basic information and the interview refining corpus into a block chain network.
In order to solve the above technical problem, an embodiment of the present application further provides an interview content refining device based on artificial intelligence, including:
the system comprises a text acquisition module, a data processing module and a data processing module, wherein the text acquisition module is used for acquiring interview recording and converting the interview recording into an interview text, and the interview text comprises a self-introduction text and an interview response text;
the text analysis module is used for carrying out text analysis on the self-introduction text to obtain the basic information of the interviewer;
the text classification module is used for carrying out sentence classification on the interview response text according to the related interview angles to obtain a classified text;
the corpus extraction module is used for extracting sentences from each type of classified text through a language extraction model to obtain extracted sentences, and refining the extracted sentences by adopting a Transformer model to obtain interview refined corpuses;
and the information sending module is used for sending the interviewer basic information and the interview refining corpus to a management end so that the management end determines an interview result according to the interviewer basic information and the interview refining corpus.
Optionally, the text obtaining module includes:
the identification unit is used for identifying the question and answer starting identification contained in the interview recording;
and the text determining unit is used for performing text conversion on the interview recording by adopting a speech-to-text conversion mode, converting the recording content before the question and answer starting identifier into a text which is used as a self-introduction text, and converting the recording content after the question and answer starting identifier into a text which is used as an interview response text.
Optionally, the text classification module includes:
the word segmentation unit is used for taking each sentence in the interview response text as a basic sentence and carrying out word segmentation processing on the basic sentence in a preset word segmentation mode to obtain basic words;
the clustering unit is used for converting the basic participles into word vectors and clustering the word vectors through a clustering algorithm to obtain a clustering center corresponding to the basic sentences;
and the classification unit is used for calculating the Euclidean distance between the clustering center corresponding to the basic sentence and the word vector corresponding to each preset interview angle for each basic sentence, taking the preset interview angle with the minimum distance as the target classification of the basic sentence, and taking the basic sentence as the classified text corresponding to the target classification.
Optionally, the word segmentation unit includes:
the initial word segmentation unit is used for segmenting the basic sentence by adopting a conditional random field model to obtain initial segmented words;
the word frequency obtaining subunit is used for obtaining the word frequency of each initial participle from a historical interview response text;
and the participle weighting unit is used for generating the weight of the initial participle based on the word frequency of the initial participle, and taking the initial participle marked with the weight as the basic participle.
Optionally, the corpus extraction module includes:
the splitting unit is used for splitting the texts in the classified texts according to characters through the sentence encoder to obtain basic characters;
the encoding unit is used for encoding the basic character to obtain the encoding content corresponding to the basic character;
the mapping unit is used for inputting the coded contents into a character coding layer with initialized weight, mapping each code into a character vector through the character coding layer, and taking each character vector as a sentence coding result;
the splicing unit is used for splicing the sentence coding results into hidden layer vectors in the forward and reverse hidden layer outputs and inputting the hidden layer vectors into the document coder;
and the weighting unit is used for weighting the hidden layer vector through the document encoder to obtain a document feature vector, decoding the document feature vector and taking an output result obtained by decoding as the extraction statement.
Optionally, the weighted decoding unit includes:
a calculating subunit, configured to determine the document feature vector by using the following formula:
$$C_i = \sum_{j=1}^{n} b_{ij} h_j$$

where C_i is the ith document feature vector, j is the index of the embedded codes, n is the number of embedded codes, b_ij is the weight of the ith document feature vector with respect to the jth hidden layer vector, and h_j is the jth hidden layer vector; the embedded codes are generated based on the hidden states of the bidirectional long and short term memory network model.
Optionally, the apparatus for refining interview content based on artificial intelligence further comprises:
and the storage module is used for storing the interviewer basic information and the interview refining corpus into a block chain network.
In order to solve the technical problem, an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the above artificial intelligence-based interview content refining method when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the above artificial intelligence-based interview content refining method.
The invention provides an interview content refining method, device, equipment and medium based on artificial intelligence. An interview recording is obtained and converted into an interview text comprising a self-introduction text and an interview response text; text analysis is performed on the self-introduction text to obtain basic information of the interviewer; sentence classification is performed on the interview response text according to the related interview angles to obtain classified texts; sentences are extracted from each type of classified text through a language extraction model to obtain extracted sentences; and the extracted sentences are refined with a Transformer model to obtain an interview refined corpus. The core content is thus accurately extracted from interview record content with a large data volume, improving the accuracy of content refining and, in turn, the accuracy of intelligent interview evaluation. Finally, the interviewer basic information and the interview refined corpus are sent to a management end, and the management end determines the interview result according to them, which avoids inaccurate evaluation results caused by direct semantic recognition and improves the accuracy and efficiency of intelligent interview result evaluation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of an artificial intelligence based interview content refining method of the present application;
FIG. 3 is a schematic block diagram of one embodiment of an artificial intelligence based interview content refining apparatus according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, as shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like.
The terminal devices 101, 102, 103 may be various electronic devices having display screens and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
The method for refining interview content based on artificial intelligence provided by the embodiment of the application is executed by the server, and accordingly, the device for refining interview content based on artificial intelligence is arranged in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation needs, and the terminal devices 101, 102 and 103 in this embodiment may specifically correspond to an application system in actual production.
Referring to fig. 2, fig. 2 shows an interview content refining method based on artificial intelligence according to an embodiment of the present invention, which is described by taking the application of the method to the server in fig. 1 as an example, and is detailed as follows:
s201: and acquiring an interview record, and converting the interview record into an interview text, wherein the interview text comprises a self-introduction text and an interview response text.
Specifically, in the interviewing and recruiting process of an enterprise, many interviewers participate in interviews. Because the positions are limited, several interviewers may interview for the same position. In order to avoid confusing or forgetting interviewer information, this embodiment records the interview process of each interviewer, converts the recorded content into an interview text afterwards, and performs subsequent processing, wherein the interview text comprises a self-introduction text and an interview response text.
The self-introduction text refers to the text obtained by converting the interviewer's self-introduction speech, and the interview response text refers to the text converted from the questions asked and the answers given after the self-introduction is finished.
It should be noted that the interviewer mentioned in this embodiment may be a human, or may be a question and answer robot participating in an intelligent interview, which is not specifically limited herein.
It should be understood that a typical interview lasts 30-40 minutes or even longer, so the volume of interviewer responses is relatively large. For this reason, the present embodiment is based on the self-introduction, because the information in the self-introduction part can already summarize a large part of the interviewer's abilities, while other parts of the interview, such as skill investigation and business acuity investigation, can be used as reference training data for supplementary verification of the interviewer's interview, so as to obtain a more comprehensive result.
In this embodiment, the interview recording is converted into the interview text; specifically, a tool supporting speech-to-text conversion or a speech-to-text algorithm may be used, which is not specifically limited herein. The specific implementation of dividing the interview text into the self-introduction text and the interview response text can refer to the description of the subsequent embodiments and is not repeated here to avoid repetition.
S202: and carrying out text analysis on the self-introduction text to obtain the basic information of the interviewer.
Specifically, since the self-introduction text generally includes categories such as personal basic data, experience information, areas of expertise and skills, past honors, and self-evaluation, the related content modules are relatively similar. In order to improve efficiency, this embodiment parses the self-introduction text in a text parsing manner based on regular expressions and quickly extracts the content in the self-introduction text to obtain the basic information of the interviewer.
Wherein, the basic information of the interviewer includes but is not limited to: personal fixed information such as name, household register, graduate colleges, professions, working years and the like, and personal professional information such as acquired honor, served enterprises, professional experience and mastered skills and the like.
It should be noted that, because the dimensions of the contents included in the self-introduction text are substantially similar, the basic information of the interviewer to be acquired is divided into a plurality of dimensions, and each dimension is provided with at least one regular expression to perform matching analysis with the self-introduction text, so as to obtain the content corresponding to the dimension, which is used as the analysis content of the dimension.
The regular expression (regular expression) describes a pattern (pattern) for matching a character string, and may be used to check whether a string contains a certain substring, replace the matched substring, or extract a substring that meets a certain condition from a certain string, and the like.
For example, in one embodiment, text parsing is performed over seven dimensions, namely name, native place, graduation school, specialty, working years, work experience, and mastered skills. For the native-place dimension, keywords containing certain characters may be set for matching, for example matching sentences containing keywords such as "I am XXX", "I come from XXX" or "I grew up in XXX".
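As a rough illustration of this regular-expression parsing, the sketch below matches a few of the dimensions mentioned above against a self-introduction text. The dimension names and patterns are assumptions made for demonstration only, not the expressions actually used by the invention.

```python
import re

# Illustrative patterns per dimension; assumptions for demonstration,
# not the regular expressions actually used by the invention.
DIMENSION_PATTERNS = {
    "name": [r"my name is\s+([^.,]+)", r"I am\s+([^.,]+)"],
    "native_place": [r"I (?:am|come) from\s+([^.,]+)", r"I grew up in\s+([^.,]+)"],
    "graduate_school": [r"graduated from\s+([^.,]+)"],
    "working_years": [r"(\d+)\s*years of (?:work(?:ing)? )?experience"],
}

def parse_self_introduction(text: str) -> dict:
    """Match each dimension's regular expressions against the self-introduction
    text and keep the first hit as that dimension's parsed content."""
    info = {}
    for dimension, patterns in DIMENSION_PATTERNS.items():
        for pattern in patterns:
            match = re.search(pattern, text, flags=re.IGNORECASE)
            if match:
                info[dimension] = match.group(1).strip()
                break  # first matching expression wins for this dimension
    return info

print(parse_self_introduction(
    "My name is Li Hua. I come from Shenzhen, graduated from X University, "
    "and have 5 years of working experience."))
# -> {'name': 'Li Hua', 'native_place': 'Shenzhen',
#     'graduate_school': 'X University', 'working_years': '5'}
```

In practice each dimension would carry several alternative expressions so that different phrasings of the same fact still produce a match.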
S203: and (4) carrying out sentence classification on the interview response text according to the related interview angle to obtain a classified text.
Specifically, interview questions are generally asked around aspects such as work experience, areas of expertise and skills. In this embodiment, interview angles are preset according to actual needs; after the interview response text is obtained, sentence classification is performed on the interview response text according to the related interview angles to obtain classified texts, so that the important sentences can later be extracted and refined in a targeted manner according to the classification of each classified text, improving the accuracy of content refining.
The interview angle refers to the focus of question and response, such as compensation requirements, awards, working years, professional skills, and the like.
Further, in this embodiment, according to the interview angle involved, semantic recognition is performed on the interview response text, and the sentences are classified according to the semantic recognition result to obtain a specific implementation process of the classified text, which may refer to the description of the subsequent embodiment and is not repeated here to avoid repetition.
The sentences are classified according to the semantic recognition result, specifically, the recognition result is clustered to obtain a clustering result, the clustering result and the word vector corresponding to each interview angle are subjected to Euclidean distance calculation, and the interview angle closest to the clustering result is used as the interview angle corresponding to the clustering result.
S204: and extracting sentences from each type of classified text through a language extraction model to obtain extracted sentences, and refining the extracted sentences by adopting a Transformer model to obtain interview refined linguistic data.
Specifically, sentence extraction is performed from each type of classified text through a language extraction model to obtain an extracted sentence, and then a Transformer model is adopted to refine the extracted sentence to obtain an interview refined corpus.
The language extraction model includes but is not limited to: the Embeddings from Language Models (ELMo) algorithm, OpenAI GPT, and the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model.
Preferably, in this embodiment, an improved OpenAI GPT model is used as the semantic extraction model; the specific implementation of sentence extraction may refer to the description of the subsequent embodiments and is not repeated here to avoid repetition.
It should be noted that the concrete expression form of the extracted sentences obtained in the present embodiment may also be a vector form, so that the extracted sentences can subsequently be input quickly into the Transformer model for refining and extraction.
The Transformer model can quickly extract the sentences with higher importance according to the weight through an attention mechanism.
In the present embodiment, in the decoding stage, the Transformer model inputs the sum of the generated document feature vectors into the decoder, an autoregressive long short-term network that predicts the next sentence to be extracted; the output result is connected to the input when the next sentence is decoded. The biggest difference between the decoder used by the Transformer model in this embodiment and other common decoders is that, in the process of obtaining attention by dot product, if the same index appears twice consecutively, the whole extraction process ends, which avoids the information redundancy caused by extracting similar information multiple times.
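A minimal sketch of this repeat-index stopping rule is given below. It assumes the per-step attention scores over candidate sentences are already available (here they are simply passed in as an array); it is not the patent's actual decoder, only an illustration of ending extraction when the same sentence index is selected twice in a row.

```python
import numpy as np

def extract_until_repeat(score_steps: np.ndarray) -> list:
    """score_steps: (num_decode_steps, num_sentences) attention scores produced
    by a decoder at each step (assumed given). At every step the highest-scoring
    sentence index is selected; if the same index is chosen twice in a row,
    extraction stops to avoid pulling out redundant, near-duplicate information."""
    selected = []
    previous = None
    for step_scores in score_steps:
        idx = int(np.argmax(step_scores))
        if idx == previous:          # same index appears twice consecutively
            break                    # -> end the whole extraction process
        selected.append(idx)
        previous = idx
    return selected

# Toy example: the fourth step re-selects sentence 2, so decoding stops after 3 picks.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.6, 0.1, 0.3],
                   [0.2, 0.1, 0.7],
                   [0.1, 0.2, 0.7]])
print(extract_until_repeat(scores))  # [1, 0, 2]
```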
It should be understood that, in this embodiment, there is no necessary logical sequence between step S203 to step S204 and step S202, and they may also be executed in parallel, which is not limited herein.
S205: and sending the basic information of the interviewer and the interview refining corpus to the management end, so that the management end determines an interview result according to the basic information of the interviewer and the interview refining corpus.
Specifically, the extracted interviewer basic information and the interview refined corpus are sent to the management end. Because the extracted content is accurate and concise, the user at the management end can subsequently determine the evaluation result accurately and quickly, which improves the accuracy and efficiency of the intelligent interview.
In this embodiment, the interview recording is obtained and converted into the interview text, which comprises a self-introduction text and an interview response text. The self-introduction text is parsed to obtain the basic information of the interviewer; the interview response text is classified by sentence according to the related interview angles to obtain classified texts; sentences are extracted from each classified text through a language extraction model to obtain extracted sentences; and a Transformer model is adopted to refine the extracted sentences to obtain the interview refined corpus. In this way, the core content is accurately extracted from interview record content with a large data volume, the accuracy of content refining is improved, and the accuracy of intelligent interview evaluation is improved. Finally, the basic information of the interviewer and the interview refined corpus are sent to the management end so that the management end determines the interview result according to them, which avoids inaccurate evaluation results caused by directly performing semantic recognition and is beneficial to improving the accuracy and efficiency of intelligent interview result evaluation.
In an embodiment, the obtained interviewer basic information and interview refined corpus can be stored on a block chain network, and data information can be shared among different platforms through block chain storage, and data can also be prevented from being tampered.
The blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (tamper resistance) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
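The following is only a schematic sketch of storing the interviewer basic information and the interview refined corpus on a blockchain network: the record is serialized and hashed, and handed to a client whose interface (`BlockchainClient.submit`) is a hypothetical placeholder rather than any real SDK.

```python
import hashlib
import json

class BlockchainClient:
    """Hypothetical stand-in for a real blockchain SDK; submit() is assumed."""
    def submit(self, payload: dict) -> str:
        # A real deployment would broadcast a transaction to the chain;
        # here the payload hash is returned as a mock transaction id.
        return payload["digest"]

def store_on_chain(client: BlockchainClient, basic_info: dict, refined_corpus: list) -> str:
    record = {"interviewer_basic_info": basic_info,
              "interview_refined_corpus": refined_corpus}
    serialized = json.dumps(record, ensure_ascii=False, sort_keys=True)
    digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    # The digest lets any platform reading the chain check that the shared data
    # has not been tampered with.
    return client.submit({"data": serialized, "digest": digest})

tx_id = store_on_chain(BlockchainClient(),
                       {"name": "Li Hua", "working_years": "5"},
                       ["Led a team of 3 engineers on a payment gateway project."])
print(tx_id)
```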
In some optional implementation manners of this embodiment, in step S201, converting the interview recording into an interview text includes:
identifying question and answer starting marks contained in the interview recording;
and performing text conversion on the interview recording by adopting a speech-to-text conversion mode, converting the recording content before the question and answer starting identifier into a text which is used as the self-introduction text, and converting the recording content after the question and answer starting identifier into a text which is used as the interview response text.
Specifically, before speech-to-text conversion, the interview recording file is traversed to find the speech segment whose speech information is the same as the preset question-answer start identifier, and this segment is used as a demarcation point: the speech before the segment is converted into text as the self-introduction text, and the speech after the segment is converted into text as the interview response text.
Specifically, amplitude normalization, pre-emphasis and framed windowing are performed on the speech signal to obtain a set of speech frames; then, by traversal and comparison, the speech frame segments identical to the frames of the preset question-answer starting identifier are found in the frame set and determined to be the speech segment with the same speech information as the preset question-answer starting identifier.
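A hedged numpy sketch of this pre-processing (amplitude normalization, pre-emphasis and framed windowing) follows; the frame length, hop size and the 0.97 pre-emphasis coefficient are common defaults assumed here, not values specified by the patent.

```python
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int = 400, hop: int = 160,
                 pre_emphasis: float = 0.97) -> np.ndarray:
    """Return a (num_frames, frame_len) set of windowed speech frames."""
    # Amplitude normalization
    signal = signal / (np.max(np.abs(signal)) + 1e-8)
    # Pre-emphasis: y[n] = x[n] - a * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    # Framing with a Hamming window
    num_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    window = np.hamming(frame_len)
    frames = np.stack([emphasized[i * hop:i * hop + frame_len] * window
                       for i in range(num_frames)])
    return frames

# The resulting frame set can then be compared frame by frame against the frames
# of the preset question-answer start identifier to locate the demarcation point.
frames = frame_signal(np.random.randn(16000))  # one second of 16 kHz audio
print(frames.shape)  # (num_frames, 400)
```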
The preset question and answer starting identifier is a voice identifier used to indicate that the self-introduction stage has ended and the question-and-answer stage is starting, for example a voice prompt such as "Thank you for your introduction; I would now like to ask you a few questions". It can be preset according to the actual situation and is not limited herein.
For the speech-to-text conversion, a speech recognition algorithm may be adopted, or a third-party tool with a speech-to-text function may be used, which is not particularly limited. Speech-to-text algorithms include, but are not limited to: speech recognition algorithms based on a vocal tract model, speech template matching recognition algorithms, and/or artificial neural network speech recognition algorithms.
In the embodiment, the interview recording text is converted into the self-introduction text and the interview response text, so that the two types of texts are separately processed in the following process, the pertinence is better, and the obtained processing result is more accurate.
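Putting these steps together, a minimal sketch of the splitting-and-conversion flow is shown below. Here `identifier_locator` and `speech_to_text` are hypothetical placeholders for whatever identifier-locating method and speech recognition tool are actually used.

```python
def split_interview_recording(recording_path: str,
                              identifier_locator,
                              speech_to_text) -> dict:
    """Sketch of step S201 under stated assumptions: `identifier_locator` returns
    the time (in seconds) of the preset question-answer start identifier, and
    `speech_to_text` converts an audio span to text. Both are hypothetical
    placeholders, not real APIs."""
    boundary = identifier_locator(recording_path)
    return {
        "self_introduction_text": speech_to_text(recording_path, start=0.0, end=boundary),
        "interview_response_text": speech_to_text(recording_path, start=boundary, end=None),
    }

# Dummy callables stand in for the real locator and recognizer.
interview_text = split_interview_recording(
    "interview_001.wav",
    identifier_locator=lambda path: 95.0,                        # assumed boundary
    speech_to_text=lambda path, start, end: f"<text {start}-{end}>")
print(interview_text)
```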
In some optional implementation manners of this embodiment, in step S203, performing sentence classification on the interview response text according to the referred interview angle, and obtaining a classified text includes:
taking each sentence in the interview response text as a basic sentence, and performing word segmentation processing on the basic sentence in a preset word segmentation mode to obtain basic words;
converting the basic participles into word vectors, and clustering the word vectors through a clustering algorithm to obtain a clustering center corresponding to the basic sentences;
and calculating the Euclidean distance between the clustering center corresponding to the basic sentence and the word vector corresponding to each preset interview angle for each basic sentence, taking the preset interview angle with the minimum distance as the target classification of the basic sentence, and taking the basic sentence as the classified text corresponding to the target classification.
Specifically, word segmentation and clustering are carried out on each sentence in the interview response text to obtain a clustering center corresponding to each sentence, word vectors corresponding to the clustering centers and preset interview angles are calculated, and the classification of each sentence is determined.
The preset word segmentation mode includes but is not limited to: through a third-party word segmentation tool or a word segmentation algorithm, and the like.
Common third-party word segmentation tools include, but are not limited to: the system comprises a Stanford NLP word segmentation device, an ICTCLAS word segmentation system, an ansj word segmentation tool, a HanLP Chinese word segmentation tool and the like.
The word segmentation algorithm includes, but is not limited to: a rule-based word segmentation method, a statistic-based word segmentation method, an understanding-based word segmentation method, and a neural network word segmentation method.
The rule-based word segmentation methods mainly include: the minimum matching method, the forward maximum matching method (MM), the reverse maximum matching method (RMM), the bidirectional maximum matching method (BMM), the mark segmentation method, the full-segmentation path selection method, the association-backtracking method (AB method for short), and the like.
The statistics-based word segmentation methods mainly include: the N-gram model, Hidden Markov Model (HMM) sequence labeling, Maximum Entropy Model (ME) sequence labeling, Maximum Entropy Markov Model (MEMM) sequence labeling, Conditional Random Field (CRF) sequence labeling, and the like.
Preferably, the present embodiment uses an improved CRF model for word segmentation, and the specific implementation process may refer to the description of the subsequent embodiments, and is not described herein again in order to avoid repetition.
It is easy to understand that extracting basic participles through word segmentation can, on one hand, effectively filter out meaningless words in the text and, on the other hand, facilitates the subsequent use of the text to generate word vectors.
The clustering algorithm, also called cluster analysis, is a statistical analysis method for sample or index classification problems and an important algorithm in data mining. Clustering algorithms include but are not limited to: the K-Means clustering algorithm, the mean-shift clustering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), expectation-maximization clustering based on Gaussian mixture models, agglomerative hierarchical clustering, graph community detection algorithms, and the like.
Preferably, in the present embodiment, a K-Means (K-Means) clustering algorithm is employed.
In the embodiment, the classification of each sentence in the interview response text is determined by clustering and calculating the semantic similarity, which is beneficial to refining the sentences of different classifications in a targeted manner in the follow-up process.
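A sketch of this classification step is given below. It assumes that word vectors for the basic participles are already available (for example from a word2vec-style model), uses scikit-learn's KMeans for clustering, and takes the centre of the most populated cluster as the sentence's cluster centre; that choice and the example angle vectors are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def classify_sentence(word_vectors: np.ndarray,
                      angle_vectors: dict,
                      n_clusters: int = 2) -> str:
    """word_vectors: (num_words, dim) vectors of one basic sentence's participles.
    angle_vectors: preset interview angle name -> representative word vector.
    The word vectors are clustered with K-means, the dominant cluster centre is
    taken as the sentence's centre, and the preset angle with the smallest
    Euclidean distance to that centre is returned as the target classification."""
    kmeans = KMeans(n_clusters=min(n_clusters, len(word_vectors)), n_init=10).fit(word_vectors)
    dominant = np.argmax(np.bincount(kmeans.labels_))
    centre = kmeans.cluster_centers_[dominant]
    distances = {angle: float(np.linalg.norm(centre - vec))
                 for angle, vec in angle_vectors.items()}
    return min(distances, key=distances.get)

rng = np.random.default_rng(0)
angles = {"professional_skills": rng.normal(size=8),
          "working_years": rng.normal(size=8),
          "salary_requirements": rng.normal(size=8)}
# Six word vectors that lie near the "professional_skills" angle vector.
sentence_vectors = angles["professional_skills"] + 0.05 * rng.normal(size=(6, 8))
print(classify_sentence(sentence_vectors, angles))  # likely "professional_skills"
```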
In some optional implementation manners of this embodiment, performing word segmentation processing on the basic sentence in a preset word segmentation manner, and obtaining the basic word segmentation includes:
performing word segmentation on the basic sentence by adopting a conditional random field model to obtain an initial word segmentation;
acquiring the word frequency of each initial word segmentation from the historical interview response text;
and generating the weight of the initial participle based on the word frequency of the initial participle, and taking the initial participle marked with the weight as a basic participle.
Specifically, a conditional random field model is adopted to perform word segmentation on a basic sentence to obtain initial word segmentation, then a historical interview response text is used to obtain the word frequency of each initial word segmentation, and a weight corresponding to the initial word segmentation is generated according to the word frequency to obtain basic word segmentation with weight information, so that the proportion of each basic word segmentation is more in line with the requirements of an interview scene when the basic word segmentation is labeled subsequently.
The Conditional Random Field (CRF) model is a discriminative probabilistic model and a type of random field, usually used for labeling or analyzing sequence data. It represents a Markov random field of a set of output random variables Y conditioned on a set of input random variables X, and performs well in sequence labeling tasks such as word segmentation, part-of-speech tagging and named entity recognition.
The historical interview response text refers to interview response text generated by interviews, and the proportion of participles in the interview process can be embodied through the word frequency of the historical interview response text.
In the embodiment, the initial participle obtained by participling the conditional random field model is weighted, so that the basic participle more conforming to the intelligent interview scene is obtained, and the classification accuracy is improved.
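The patent uses a conditional random field model for segmentation; the sketch below substitutes the off-the-shelf jieba segmenter as a stand-in (jieba is not a CRF) and defines the weight as the participle's relative frequency in historical interview response texts, which is an assumption made purely for illustration.

```python
from collections import Counter
import jieba  # stand-in segmenter; the patent uses a conditional random field model

def weighted_segment(basic_sentence: str, historical_texts: list) -> list:
    """Segment the basic sentence and attach to each initial participle a weight
    derived from its word frequency in historical interview response texts."""
    history_counts = Counter(word for text in historical_texts
                             for word in jieba.lcut(text))
    total = sum(history_counts.values()) or 1
    return [(word, history_counts[word] / total)
            for word in jieba.lcut(basic_sentence)]

history = ["我有五年的后端开发经验", "我熟悉分布式系统和数据库"]
print(weighted_segment("我负责过分布式数据库的开发", history))
# -> list of (participle, weight) pairs; participles frequent in past interviews
#    (e.g. "数据库") receive larger weights than rare ones.
```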
In some optional implementation manners of this embodiment, in step S204, the language extraction model is a bidirectional long and short term memory network model, the bidirectional long and short term memory network model includes a sentence encoder and a document encoder, and performing sentence extraction from each type of classified text through the language extraction model to obtain an extracted sentence includes:
splitting the texts in the classified texts according to characters through a sentence encoder to obtain basic characters;
encoding the basic character to obtain the encoding content corresponding to the basic character;
inputting the coded content into a character coding layer with an initialized weight, mapping each code into a character vector through the character coding layer, and taking each character vector as a sentence coding result;
splicing the sentence coding result into a hidden layer vector at the forward hidden layer output and the reverse hidden layer output, and inputting the hidden layer vector into a document coder;
and weighting the hidden layer vector by a document encoder to obtain a document feature vector, decoding the document feature vector, and taking an output result obtained by decoding as an extraction statement.
Specifically, the text in the classified text is split and coded according to characters through a sentence coder to obtain coded content, the coded content is input into a character coding layer to obtain a character vector corresponding to each code, each character vector is used as a coding result of the sentence and is transmitted to a document coder through a hidden layer, and the extracted sentence is obtained through weighting through the document coder.
It is worth mentioning that based on the sentence encoding result, the forward and reverse hidden layer outputs corresponding to each character in the model are spliced into a hidden layer vector:
$$h_i = \left[\, h_i^{+};\ h_i^{-} \,\right]$$

where the forward direction is denoted by the superscript +, the reverse direction by the superscript −, and the ith character by the subscript i.
Among them, the Long Short-Term Memory network (LSTM) is a time-recursive neural network suitable for processing and predicting important events with relatively long intervals and delays in a time series.
It should be noted that a unidirectional LSTM can read a sentence from the first word to the last word in human reading order, so its structure can only capture the preceding context and not the following context. A bidirectional LSTM is composed of two LSTMs running in opposite directions: one reads the data from front to back in the word order of the sentence, and the other reads the data from back to front in the reverse order, so that the first LSTM obtains the preceding context and the second obtains the following context. Taken together, they provide the context of the entire sentence, which naturally contains relatively abstract semantic information (the meaning of the sentence). This makes full use of the advantage of the LSTM in processing sequence data with temporal characteristics, and because position features are part of the input, the entity direction information contained in the position features can be extracted after bidirectional LSTM encoding.
In the embodiment, the classified sentences are analyzed and extracted from two bidirectional long and short memory networks of different levels through the sentence encoder and the document encoder, so that the accuracy rate of extracting the key sentences is improved.
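The following PyTorch sketch illustrates the sentence-encoder idea described above: a character coding (embedding) layer with initialized weights followed by a bidirectional LSTM whose forward and reverse hidden outputs are concatenated. The vocabulary size, embedding size and hidden size are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """Character-level BiLSTM sentence encoder sketch (dimensions are illustrative).
    Character codes are mapped to vectors by an embedding layer; the forward and
    reverse hidden outputs of the BiLSTM are concatenated into h_i = [h_i^+; h_i^-]."""
    def __init__(self, vocab_size: int = 5000, char_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.char_embedding = nn.Embedding(vocab_size, char_dim)
        self.bilstm = nn.LSTM(char_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        vectors = self.char_embedding(char_ids)     # (batch, seq_len, char_dim)
        hidden_vectors, _ = self.bilstm(vectors)    # (batch, seq_len, 2 * hidden)
        return hidden_vectors                       # concatenated [forward; reverse]

encoder = SentenceEncoder()
char_ids = torch.randint(0, 5000, (1, 20))          # one sentence of 20 character codes
print(encoder(char_ids).shape)                      # torch.Size([1, 20, 256])
```

The resulting hidden layer vectors are what the document encoder subsequently weights into document feature vectors.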
In some optional implementations of this embodiment, weighting the hidden vector by the document encoder to obtain the document feature vector includes:
determining a document feature vector by adopting the following formula:
$$C_i = \sum_{j=1}^{n} b_{ij} h_j$$

where C_i is the ith document feature vector, j is the index of the embedded codes, n is the number of embedded codes, b_ij is the weight of the ith document feature vector with respect to the jth hidden layer vector, and h_j is the jth hidden layer vector; the embedded codes are generated based on the hidden states of the bidirectional long and short term memory network model.
In this embodiment, a generation manner of the document feature vector is obtained through weighting calculation, which is beneficial to accurately extracting the key sentences.
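Since the formula is a weighted sum, all document feature vectors can be computed with a single matrix product. The minimal numpy illustration below assumes the weights b_ij have already been produced by the attention mechanism; the softmax over random scores is used here only to build a valid weight matrix for the example.

```python
import numpy as np

def document_feature_vectors(b: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Compute C_i = sum_j b_ij * h_j for all i as one matrix product.
    b: (num_outputs, n) weights over the n hidden layer vectors.
    h: (n, dim) hidden layer vectors from the sentence encoder."""
    return b @ h

rng = np.random.default_rng(1)
h = rng.normal(size=(6, 256))                                    # 6 hidden layer vectors
scores = rng.normal(size=(3, 6))
b = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # softmax weights (assumption)
C = document_feature_vectors(b, h)
print(C.shape)  # (3, 256)
```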
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 3 is a schematic block diagram of an artificial intelligence-based interview content refining apparatus in one-to-one correspondence with the above-described embodiment of the artificial intelligence-based interview content refining method. As shown in fig. 3, the apparatus for refining interview content based on artificial intelligence includes a text acquisition module 31, a text parsing module 32, a text classification module 33, a corpus extraction module 34 and an information transmission module 35. The functional modules are explained in detail as follows:
the text acquisition module 31 is configured to acquire an interview recording and convert the interview recording into an interview text, where the interview text includes a self-introduction text and an interview response text;
the text analysis module 32 is used for performing text analysis on the self-introduction text to obtain the basic information of the interviewer;
the text classification module 33 is used for performing statement classification on the interview response text according to the related interview angles to obtain a classified text;
the corpus extraction module 34 is configured to perform sentence extraction from each type of classified text through a language extraction model to obtain extracted sentences, and refine the extracted sentences by using a transform model to obtain interview refined corpuses;
the information sending module 35 is configured to send the basic information of the interviewer and the interview refining corpus to the management end, so that the management end determines an interview result according to the basic information of the interviewer and the interview refining corpus.
Optionally, the text acquiring module 31 includes:
the identification unit is used for identifying the question and answer start identifier contained in the interview recording;
and the text determining unit is used for performing text conversion on the interview recording in a speech-to-text manner, converting the recording content before the question and answer start identifier into text as the self-introduction text, and converting the recording content after the question and answer start identifier into text as the interview response text.
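A minimal sketch of this splitting step is given below; the marker phrase is an assumed example of a question and answer start identifier, and the speech-to-text conversion is taken as already done.

QA_START_IDENTIFIER = "下面开始提问"    # assumed example of a question and answer start identifier

def split_interview_text(transcript: str):
    idx = transcript.find(QA_START_IDENTIFIER)
    if idx == -1:
        # identifier not recognised: treat the whole transcript as self-introduction
        return transcript, ""
    self_intro = transcript[:idx]                            # content before the identifier
    responses = transcript[idx + len(QA_START_IDENTIFIER):]  # content after the identifier
    return self_intro, responses

intro_text, response_text = split_interview_text(
    "大家好，我叫张三，有五年后端开发经验。下面开始提问。我在上一家公司负责支付系统。")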
Optionally, the text classification module 33 includes:
the word segmentation unit is used for taking each sentence in the interview response text as a basic sentence and performing word segmentation on the basic sentence in a preset word segmentation manner to obtain basic participles;
the clustering unit is used for converting the basic participles into word vectors and clustering the word vectors through a clustering algorithm to obtain a clustering center corresponding to the basic sentences;
and the classification unit is used for calculating the Euclidean distance between the clustering center corresponding to the basic sentence and the word vector corresponding to each preset interview angle for each basic sentence, taking the preset interview angle with the minimum distance as the target classification of the basic sentence, and taking the basic sentence as the classified text corresponding to the target classification.
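A hedged sketch of the clustering and classification units follows; k-means is assumed as the clustering algorithm, and the interview angles, their word vectors and the dimensions are toy values for illustration only.

import numpy as np
from sklearn.cluster import KMeans

angle_vectors = {                            # preset interview angles -> word vectors (toy values)
    "project experience": np.array([1.0, 0.1, 0.0, 0.2]),
    "teamwork":           np.array([0.1, 1.0, 0.3, 0.0]),
    "salary expectation": np.array([0.0, 0.2, 1.0, 0.9]),
}

def classify_sentence(word_vectors: np.ndarray) -> str:
    # word_vectors: (num_participles, dim) word vectors of one basic sentence
    centre = KMeans(n_clusters=1, n_init=10).fit(word_vectors).cluster_centers_[0]
    distances = {angle: np.linalg.norm(centre - vec) for angle, vec in angle_vectors.items()}
    return min(distances, key=distances.get)  # preset angle with the minimum Euclidean distance

sentence_vectors = np.random.rand(8, 4)       # 8 basic participles, 4-dimensional word vectors
target_classification = classify_sentence(sentence_vectors)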
Optionally, the word segmentation unit comprises:
the initial word segmentation unit is used for segmenting the basic sentences by adopting a conditional random field model to obtain initial participles;
the word frequency obtaining subunit is used for obtaining the word frequency of each initial participle from the historical interview response text;
and the participle weighting unit is used for generating the weight of the initial participle based on the word frequency of the initial participle, and taking the initial participle marked with the weight as a basic participle.
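The frequency-based weighting can be sketched as below. A trained conditional random field segmenter is assumed to exist upstream, so its output is stubbed in as an already-split list, and the weighting rule (relative frequency in historical responses) is an assumption made only to keep the example concrete.

from collections import Counter

historical_tokens = ["项目", "负责", "团队", "项目", "开发", "团队", "项目"]  # segmented historical responses
frequency = Counter(historical_tokens)
total = sum(frequency.values())

def weight_participles(initial_participles):
    # attach to each initial participle a weight derived from its historical word frequency
    return [(word, frequency.get(word, 0) / total) for word in initial_participles]

basic_participles = weight_participles(["项目", "开发", "薪资"])
# e.g. [('项目', 0.4285...), ('开发', 0.1428...), ('薪资', 0.0)]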
Optionally, the corpus extraction module 34 includes:
the splitting unit is used for splitting the texts in the classified texts according to characters through a sentence encoder to obtain basic characters;
the encoding unit is used for encoding the basic character to obtain the encoding content corresponding to the basic character;
the mapping unit is used for inputting the coded content into the character coding layer of the initialized weight, mapping each code into a character vector through the character coding layer, and taking each character vector as a sentence coding result;
the splicing unit is used for splicing the sentence coding results into hidden layer vectors in the forward and reverse hidden layer outputs and inputting the hidden layer vectors into a document coder;
and the weighting unit is used for weighting the hidden layer vector through a document encoder to obtain a document feature vector, decoding the document feature vector and taking an output result obtained by decoding as an extraction statement.
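The units above cover the extraction half of the corpus extraction module 34. For the refinement half, a hedged sketch of rewriting the extracted sentences with an encoder-decoder Transformer is shown below; the Hugging Face pipeline call and the t5-small checkpoint are illustrative assumptions, since no specific pre-trained model is named in this application.

from transformers import pipeline

refiner = pipeline("summarization", model="t5-small")   # assumed checkpoint, stands in for the Transformer model

extracted_sentences = [
    "I led a three-person team that rebuilt the payment service and cut its latency by 40 percent.",
    "I coordinated weekly reviews with product and QA to keep every release on schedule.",
]
refined = refiner(" ".join(extracted_sentences), max_length=40, min_length=10)
interview_refined_corpus = refined[0]["summary_text"]    # refined corpus for this interview angle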
Optionally, the weighting unit includes:
the calculating subunit is configured to determine the document feature vector by using the following formula:
C_i = Σ_{j=1}^{n} b_{ij} · h_j
where C_i is the ith document feature vector, j is the embedded code index, n is the number of embedded codes, b_ij is the weight of the ith document feature vector for the jth hidden layer vector, and h_j is the jth hidden layer vector, the embedded codes being generated based on the hidden states of the bidirectional long short-term memory network model.
Optionally, the apparatus for refining interview content based on artificial intelligence further comprises:
and the storage module is used for storing the interviewer basic information and the interview refining corpus into the block chain network.
For the specific limitations of the apparatus for refining interview content based on artificial intelligence, reference may be made to the above limitations of the method for refining interview content based on artificial intelligence, which are not repeated here. Each module of the artificial intelligence based interview content refining device can be implemented in whole or in part by software, by hardware, or by a combination thereof. The modules can be embedded in hardware form in, or be independent of, a processor in the computer device, or can be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other via a system bus. It is noted that only a computer device 4 having the memory 41, the processor 42, and the network interface 43 is shown, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 4. Of course, the memory 41 may also include both an internal storage unit and an external storage device of the computer device 4. In this embodiment, the memory 41 is generally used for storing the operating system installed on the computer device 4 and various types of application software, such as program code for controlling electronic files. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the program code stored in the memory 41 or process data, such as program code for executing control of an electronic file.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The present application further provides another embodiment, namely a computer-readable storage medium storing a computer program that is executable by at least one processor, so as to cause the at least one processor to perform the steps of the artificial intelligence based interview content refining method described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It should be understood that the above-described embodiments are merely illustrative of some embodiments of the invention and are not restrictive, and that the appended drawings illustrate preferred embodiments without limiting the scope of the invention. This application can be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions described in the foregoing embodiments may be modified, or some of their features may be replaced by equivalents. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. An artificial intelligence based interview content refining method is characterized by comprising the following steps:
acquiring interview recording, and converting the interview recording into an interview text, wherein the interview text comprises a self-introduction text and an interview response text;
performing text analysis on the self-introduction text to obtain basic information of the interviewer;
according to the related interview angle, carrying out sentence classification on the interview response text to obtain a classified text;
extracting sentences from each type of classified text through a language extraction model to obtain extracted sentences, and refining the extracted sentences by adopting a Transformer model to obtain interview refined linguistic data;
and sending the interviewer basic information and the interview refined corpus to a management end so that the management end determines an interview result according to the interviewer basic information and the interview refined corpus.
2. The artificial intelligence based interview content refining method of claim 1, wherein said converting the interview recording into interview text comprises:
identifying the question and answer start identifier contained in the interview recording;
and performing text conversion on the interview recording in a speech-to-text manner, converting the recording content before the question and answer start identifier into text as the self-introduction text, and converting the recording content after the question and answer start identifier into text as the interview response text.
3. The method for refining interview content based on artificial intelligence as claimed in claim 1, wherein said sentence classification of said interview response text according to the interview angle involved to obtain a classification text comprises:
taking each sentence in the interview response text as a basic sentence, and performing word segmentation on the basic sentence in a preset word segmentation manner to obtain basic participles;
converting the basic participles into word vectors, and clustering the word vectors through a clustering algorithm to obtain a clustering center corresponding to the basic sentences;
and calculating the Euclidean distance between the clustering center corresponding to the basic sentence and the word vector corresponding to each preset interview angle for each basic sentence, taking the preset interview angle with the minimum distance as the target classification of the basic sentence, and taking the basic sentence as the classified text corresponding to the target classification.
4. The method for refining interview content based on artificial intelligence as claimed in claim 3, wherein the obtaining of the basic word segmentation by performing word segmentation on the basic sentence in a preset word segmentation manner comprises:
performing word segmentation on the basic sentence by adopting a conditional random field model to obtain an initial word segmentation;
acquiring the word frequency of each initial participle from a historical interview response text;
and generating the weight of the initial participle based on the word frequency of the initial participle, and taking the initial participle marked with the weight as the basic participle.
5. The method for refining interview content based on artificial intelligence according to any one of claims 1 to 4, wherein the language extraction model is a two-way long-short term memory network model, the two-way long-short term memory network model comprises a sentence coder and a document coder, and the extracting sentences from each type of classified text through the language extraction model comprises:
splitting the texts in the classified texts according to characters through the sentence encoder to obtain basic characters;
encoding the basic characters to obtain the encoded content corresponding to the basic characters;
inputting the coded content into a character coding layer with an initialized weight, mapping each code into a character vector through the character coding layer, and taking each character vector as a sentence coding result;
splicing the sentence coding result into a hidden layer vector at the forward hidden layer output and the reverse hidden layer output, and inputting the hidden layer vector into the document coder;
and weighting the hidden layer vector by the document encoder to obtain a document feature vector, decoding the document feature vector, and taking an output result obtained by decoding as the extraction statement.
6. The artificial intelligence based interview content refining method of claim 5, wherein the weighting the hidden vector by the document encoder to obtain a document feature vector comprises:
determining the document feature vector by adopting the following formula:
C_i = Σ_{j=1}^{n} b_{ij} · h_j
where C_i is the ith document feature vector, j is the embedded code index, n is the number of embedded codes, b_ij is the weight of the ith document feature vector for the jth hidden layer vector, and h_j is the jth hidden layer vector, the embedded codes being generated based on the hidden states of the bidirectional long short-term memory network model.
7. The method for refining interview content based on artificial intelligence as claimed in claim 1, wherein after refining the extracted sentences using a Transformer model to obtain the interview refining corpus, the method further comprises: storing the interviewer basic information and the interview refining corpus in a blockchain network.
8. An interview content refining device based on artificial intelligence, characterized in that the interview content refining device based on artificial intelligence comprises:
the system comprises a text acquisition module, a data processing module and a data processing module, wherein the text acquisition module is used for acquiring interview recording and converting the interview recording into an interview text, and the interview text comprises a self-introduction text and an interview response text;
the text analysis module is used for carrying out text analysis on the self-introduction text to obtain the basic information of the interviewer;
the text classification module is used for carrying out sentence classification on the interview response text according to the related interview angles to obtain a classified text;
the corpus extraction module is used for extracting sentences from each type of classified text through a language extraction model to obtain extracted sentences, and refining the extracted sentences by adopting a Transformer model to obtain interview refined corpuses;
and the information sending module is used for sending the interviewer basic information and the interview refining corpus to a management end so that the management end determines an interview result according to the interviewer basic information and the interview refining corpus.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements the artificial intelligence based interview content refining method of any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the artificial intelligence based interview content refining method according to any one of claims 1 to 7.
CN202010356767.3A 2020-04-29 2020-04-29 Interview content refining method, device, equipment and medium based on artificial intelligence Pending CN111695338A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010356767.3A CN111695338A (en) 2020-04-29 2020-04-29 Interview content refining method, device, equipment and medium based on artificial intelligence
PCT/CN2020/118928 WO2021218028A1 (en) 2020-04-29 2020-09-29 Artificial intelligence-based interview content refining method, apparatus and device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010356767.3A CN111695338A (en) 2020-04-29 2020-04-29 Interview content refining method, device, equipment and medium based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN111695338A true CN111695338A (en) 2020-09-22

Family

ID=72476872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010356767.3A Pending CN111695338A (en) 2020-04-29 2020-04-29 Interview content refining method, device, equipment and medium based on artificial intelligence

Country Status (2)

Country Link
CN (1) CN111695338A (en)
WO (1) WO2021218028A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112466308A (en) * 2020-11-25 2021-03-09 北京明略软件系统有限公司 Auxiliary interviewing method and system based on voice recognition
CN113449095A (en) * 2021-07-02 2021-09-28 中国工商银行股份有限公司 Interview data analysis method and device
WO2021218028A1 (en) * 2020-04-29 2021-11-04 平安科技(深圳)有限公司 Artificial intelligence-based interview content refining method, apparatus and device, and medium
CN113709028A (en) * 2021-09-29 2021-11-26 五八同城信息技术有限公司 Interview video processing method and device, electronic equipment and readable medium
CN113761143A (en) * 2020-11-13 2021-12-07 北京京东尚科信息技术有限公司 Method, apparatus, device and medium for determining answers to user questions

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062134B (en) * 2022-08-17 2022-11-08 腾讯科技(深圳)有限公司 Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN116304111B (en) * 2023-04-10 2024-02-20 深圳市兴海物联科技有限公司 AI call optimization processing method and server based on visual service data
CN116229955B (en) * 2023-05-09 2023-08-18 海尔优家智能科技(北京)有限公司 Interactive intention information determining method based on generated pre-training GPT model

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170357945A1 (en) * 2016-06-14 2017-12-14 Recruiter.AI, Inc. Automated matching of job candidates and job listings for recruitment
CN110472647B (en) * 2018-05-10 2022-06-24 百度在线网络技术(北京)有限公司 Auxiliary interviewing method and device based on artificial intelligence and storage medium
CN109919564A (en) * 2018-12-20 2019-06-21 平安科技(深圳)有限公司 Interview optimization method and device, storage medium, computer equipment
CN110223742A (en) * 2019-06-14 2019-09-10 中南大学 The clinical manifestation information extraction method and equipment of Chinese electronic health record data
CN110543639B (en) * 2019-09-12 2023-06-02 扬州大学 English sentence simplification algorithm based on pre-training transducer language model
CN111695338A (en) * 2020-04-29 2020-09-22 平安科技(深圳)有限公司 Interview content refining method, device, equipment and medium based on artificial intelligence

Also Published As

Publication number Publication date
WO2021218028A1 (en) 2021-11-04

Similar Documents

Publication Publication Date Title
WO2021218028A1 (en) Artificial intelligence-based interview content refining method, apparatus and device, and medium
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
CN112633003B (en) Address recognition method and device, computer equipment and storage medium
CN112328761B (en) Method and device for setting intention label, computer equipment and storage medium
CN111930792B (en) Labeling method and device for data resources, storage medium and electronic equipment
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
CN112084334B (en) Label classification method and device for corpus, computer equipment and storage medium
CN111783471B (en) Semantic recognition method, device, equipment and storage medium for natural language
CN111694937A (en) Interviewing method and device based on artificial intelligence, computer equipment and storage medium
CN112395391B (en) Concept graph construction method, device, computer equipment and storage medium
CN112349294B (en) Voice processing method and device, computer readable medium and electronic equipment
CN113707299A (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
CN111767714B (en) Text smoothness determination method, device, equipment and medium
CN112671985A (en) Agent quality inspection method, device, equipment and storage medium based on deep learning
CN112836521A (en) Question-answer matching method and device, computer equipment and storage medium
CN111159405B (en) Irony detection method based on background knowledge
CN111126084B (en) Data processing method, device, electronic equipment and storage medium
CN115438149A (en) End-to-end model training method and device, computer equipment and storage medium
CN110852071A (en) Knowledge point detection method, device, equipment and readable storage medium
WO2021217866A1 (en) Method and apparatus for ai interview recognition, computer device and storage medium
CN116402166B (en) Training method and device of prediction model, electronic equipment and storage medium
CN112199954A (en) Disease entity matching method and device based on voice semantics and computer equipment
CN116881446A (en) Semantic classification method, device, equipment and storage medium thereof
CN113609833B (en) Dynamic file generation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination