WO2021218028A1 - Artificial intelligence-based interview content refining method, apparatus and device, and medium


Publication number
WO2021218028A1
Authority
WO
WIPO (PCT)
Prior art keywords
interview
text
sentence
basic
vector
Prior art date
Application number
PCT/CN2020/118928
Other languages
French (fr)
Chinese (zh)
Inventor
邓悦
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021218028A1

Classifications

    • G06F 40/216: Parsing using statistical methods
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 18/23: Clustering techniques
    • G06F 18/23213: Non-hierarchical clustering techniques with a fixed number of clusters, e.g. K-means clustering
    • G06F 18/24: Classification techniques
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295: Named entity recognition
    • G06F 40/30: Semantic analysis
    • G06N 3/045: Combinations of networks
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Learning methods
    • G06Q 10/0639: Performance analysis of employees; performance analysis of enterprise or organisation operations
    • G06Q 10/1053: Employment or hiring

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a method, device, equipment and medium for refining interview content based on artificial intelligence.
  • during peak recruitment seasons at large companies, many interviewees take part in interviews. At present, most employers and interviewers conduct interviews on site or through video conferences, and employers typically evaluate an interviewee after the interview based on the interviewee's responses.
  • conventional manual interviews have at least the following problems: (1) different interviewers have their own questioning preferences, and even the same interviewer may reach different judgments depending on workplace experience, interviewing skill and emotional state; (2) the cost is high. In view of this, some companies use artificial intelligence-based interview robots to conduct interviews and provide the interview content to decision makers for evaluation. This helps improve the fairness of interviews, but it also introduces a new problem: when there are many interviewees, more interview content is produced, which increases the time cost of decision-making evaluation and makes intelligent interviews inefficient.
  • existing solutions mainly apply keyword matching to the interview content to obtain key sentences, or use natural language processing (NLP) models for semantic recognition.
  • with keyword matching, different interviewees may phrase their answers differently, so the preset keywords may fail to match, resulting in low accuracy of the final interview assessment.
  • when a general-purpose natural language processing model is used for semantic recognition, the recognition accuracy often fails to meet requirements.
  • the embodiments of this application provide an artificial intelligence-based interview content refining method, apparatus, device and medium, so as to improve the accuracy of interview content evaluation in intelligent interviews.
  • an embodiment of the present application provides an artificial intelligence-based method for refining interview content, including:
  • obtaining interview recordings and converting the interview recordings into interview text, where the interview text includes self-introduction text and interview response text;
  • the basic information of the interviewer and the refined corpus of the interview are sent to the management terminal, so that the management terminal determines the interview result according to the basic information of the interviewer and the refined corpus of the interview.
  • an embodiment of the present application also provides an artificial intelligence-based interview content refining device, including:
  • a text acquisition module for acquiring interview recordings and converting the interview recordings into interview texts, where the interview texts include self-introduction texts and interview response texts;
  • the text analysis module is used for text analysis of the self-introduction text to obtain basic information of the interviewer
  • the text classification module is used to classify the interview response text according to the interview angle involved to obtain the classified text
  • the corpus extraction module is used to extract sentences from each type of the classified text through the language extraction model to obtain the extracted sentences, and use the Transformer model to refine the extracted sentences to obtain the interview refined corpus;
  • the information sending module is configured to send the basic information of the interviewer and the refined corpus of the interview to the management terminal, so that the management terminal determines the interview result according to the basic information of the interviewer and the refined corpus of the interview.
  • an embodiment of the present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor executes the computer-readable instructions to implement the following steps of the artificial intelligence-based interview content refining method:
  • obtaining interview recordings and converting the interview recordings into interview text, where the interview text includes self-introduction text and interview response text;
  • the basic information of the interviewer and the refined corpus of the interview are sent to the management terminal, so that the management terminal determines the interview result according to the basic information of the interviewer and the refined corpus of the interview.
  • embodiments of the present application also provide a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps of the artificial intelligence-based interview content refining method:
  • obtaining interview recordings and converting the interview recordings into interview text, where the interview text includes self-introduction text and interview response text;
  • the basic information of the interviewer and the refined corpus of the interview are sent to the management terminal, so that the management terminal determines the interview result according to the basic information of the interviewer and the refined corpus of the interview.
  • the artificial intelligence-based interview content refining method, apparatus, device and medium above obtain an interview recording and convert the interview recording into interview text.
  • the interview texts include self-introduction text and interview response text.
  • the self-introduction text is analyzed to obtain the basic information of the interviewer, and the interview response text is classified according to the interview angle involved to obtain the classified text.
  • the Transformer model is used to refine the extracted sentences to obtain the refined interview corpus, which can accurately extract the core content from large volumes of interview-record content, improving the accuracy of content extraction and helping improve the accuracy of intelligent interview evaluation.
  • the interviewer's basic information and the refined interview corpus are sent to the management side, so that the management side can determine the interview result based on them, avoiding the inaccurate evaluation results caused by direct semantic recognition and improving the accuracy and efficiency of intelligent interview evaluation.
  • Fig. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • Fig. 2 is a flowchart of an embodiment of the artificial intelligence-based interview content refining method of the present application;
  • Fig. 3 is a schematic structural diagram of an embodiment of an artificial intelligence-based interview content refining device according to the present application;
  • Fig. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104 and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • the terminal devices 101, 102, 103 may be various electronic devices with a display screen that support web browsing, including but not limited to smart phones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and so on.
  • the server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.
  • the artificial intelligence-based interview content refining method provided by the embodiment of the present application is executed by the server, and accordingly, the artificial intelligence-based interview content refining device is set in the server.
  • terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there may be any number of terminal devices, networks, and servers.
  • the terminal devices 101, 102, and 103 in the embodiments of the present application may specifically correspond to application systems in actual production.
  • FIG. 2 shows an artificial intelligence-based interview content refining method provided by an embodiment of the present application.
  • the application of the method to the server in FIG. 1 is taken as an example for description. The details are as follows:
  • S201 Obtain interview recordings and convert the interview recordings into interview texts, where the interview text includes self-introduction text and interview response text.
  • in an enterprise's recruitment process, many interviewees take part, and because the number of positions is limited, multiple interviewees may interview for the same position. To avoid confusing or forgetting interviewee information, this embodiment records the interview process of each interviewee, converts the recorded content into interview text afterwards, and carries out subsequent processing.
  • the interview text includes self-introduction text and interview response text.
  • the self-introduction text refers to the text obtained by voice-to-text conversion of the interviewee's self-introduction, and the interview response text refers to the text of the questions the interviewer asks after the self-introduction and the interviewee's responses to them.
  • interviewer mentioned in this embodiment may be a person or a question-and-answer robot participating in a smart interview, which is not specifically limited here.
  • this embodiment takes the self-introduction as the starting point because the information in the self-introduction can already summarize a large part of the interviewee's abilities; other aspects of the interview, such as skill probes and business-acumen probes, can serve as references and be used as training data to supplement and verify the self-introduction, yielding a more comprehensive result.
  • for voice-to-text conversion, a tool that supports voice-to-text conversion can be used, or a voice-to-text algorithm can be used; there is no specific limitation here.
  • S202 Perform text analysis on the self-introduction text to obtain basic information of the interviewer.
  • the self-introduction text generally includes basic personal information, experience information, areas of expertise and skills, past honors and awards, and self-evaluation, the content modules involved are relatively similar.
  • therefore, this embodiment uses regular expressions to analyze the self-introduction text, quickly extracting its content to obtain the basic information of the interviewer.
  • the basic information of the interviewer includes, but is not limited to: fixed personal information such as name, household registration, school attended, major and years of work experience, and professional information such as honors received, companies worked for, and experience and skills mastered.
  • specifically, the basic information to be obtained is divided into multiple dimensions, and at least one regular expression is set for each dimension; the self-introduction text is matched and analyzed against these expressions, and the matched content is taken as the analysis content of that dimension.
  • a regular expression describes a string-matching pattern, which can be used to check whether a string contains a certain substring, to replace a matched substring, or to extract substrings that meet a given condition.
  • text analysis is carried out from the seven dimensions of name, household registration, graduate school, major, working years, employment experience, and skills mastered.
  • for example, the household registration dimension can be set with matching keywords containing specific characters, e.g. matching sentence patterns that contain phrases such as "I am from XXX", "I come from XXX" or "I grew up in XXX".
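As an illustration of the dimension-matching idea above, the following is a minimal Python sketch for the household-registration dimension; the pattern list and function name are hypothetical stand-ins, and a real deployment would use the sentence templates of the original-language text.

```python
import re

# Hypothetical patterns mirroring templates like "I come from XXX" / "I grew up in XXX".
HOMETOWN_PATTERNS = [
    re.compile(r"I come from (\w+)"),
    re.compile(r"I am from (\w+)"),
    re.compile(r"I grew up in (\w+)"),
]

def extract_hometown(self_intro: str):
    """Return the first hometown match found in the self-introduction text."""
    for pattern in HOMETOWN_PATTERNS:
        m = pattern.search(self_intro)
        if m:
            return m.group(1)
    return None  # this dimension stays empty when nothing matches

print(extract_hometown("Hello, my name is Li Lei and I grew up in Shenzhen."))  # -> Shenzhen
```

In practice each dimension would carry its own pattern list, and unmatched dimensions are simply left blank rather than guessed.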
  • S203 According to the interview angle involved, classify the interview response text to obtain the classified text.
  • interviewer questions usually revolve around work experience, fields of expertise and skills mastered; these interview angles are preset according to actual needs, and the interview response text is classified accordingly to obtain the classified text, so that key sentences can subsequently be extracted and refined per category, which helps improve the accuracy of content refining.
  • interview angle involved refers to the focus of questions and responses, such as salary requirements, awards, working years, professional skills, etc.
  • specifically, semantic recognition is performed on the interview response text, and sentences are classified according to the semantic recognition results to obtain the classified text.
  • classifying sentences by semantic recognition result may specifically involve clustering the recognition results, computing the Euclidean distance between each clustering result and the word vector corresponding to each interview angle, and taking the nearest interview angle as the one corresponding to that clustering result.
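The nearest-angle assignment can be sketched as follows; the angle names and the toy 4-dimensional vectors are illustrative assumptions, not values from the application.

```python
import numpy as np

# Hypothetical word vectors for preset interview angles (dimension 4 for brevity).
angle_vectors = {
    "work_experience": np.array([1.0, 0.0, 0.0, 0.0]),
    "professional_skills": np.array([0.0, 1.0, 0.0, 0.0]),
    "salary_requirements": np.array([0.0, 0.0, 1.0, 0.0]),
}

def nearest_angle(cluster_center: np.ndarray) -> str:
    """Assign a cluster to the interview angle whose vector is closest in Euclidean distance."""
    return min(angle_vectors,
               key=lambda a: np.linalg.norm(cluster_center - angle_vectors[a]))

print(nearest_angle(np.array([0.1, 0.9, 0.0, 0.1])))  # -> professional_skills
```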
  • S204 Use the language extraction model to extract sentences from each type of classified text to obtain the extracted sentences, and use the Transformer model to refine the extracted sentences to obtain interview refined corpus.
  • the language extraction model is used to extract sentences from each type of classified text to obtain the extracted sentences, and then use the Transformer model to refine the extracted sentences to obtain the interview refined corpus.
  • language extraction models include but are not limited to: the Embeddings from Language Models (ELMo) algorithm, OpenAI GPT, and the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model.
  • the improved OpenAI GPT model is used as the semantic extraction model.
  • sentence extraction please refer to the description of the subsequent embodiments. In order to avoid repetition, it will not be repeated here.
  • the specific expression form of the extracted sentence obtained in this embodiment may also be in the form of a vector, so that it can be quickly input to the Transformer model for refinement and extraction.
  • the Transformer model can quickly extract sentences of higher importance based on the weight through the attention mechanism.
  • in the decoding stage of the Transformer model of this embodiment, the sum of the generated document feature vectors is input to the decoder.
  • this autoregressive long short-term memory network predicts the next sentence to be extracted, and its output is concatenated to the input when decoding the next sentence.
  • the biggest difference between the decoder used in the Transformer model of this embodiment and other commonly used decoders is that, in the process of obtaining attention through dot products, if the same index appears twice in succession the entire extraction process ends, so that similar information is not extracted repeatedly and information redundancy is avoided.
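The stop-on-repeated-index rule can be sketched as follows; the score lists are toy stand-ins for the attention scores the decoder would produce via dot products.

```python
import numpy as np

def extract_until_repeat(scores_per_step):
    """Pick the argmax sentence index at each decode step; end the whole
    extraction as soon as the same index is chosen twice in succession,
    avoiding redundant extraction of similar information."""
    chosen = []
    for scores in scores_per_step:
        idx = int(np.argmax(scores))
        if chosen and chosen[-1] == idx:  # same index twice in a row -> stop everything
            break
        chosen.append(idx)
    return chosen

# The second step repeats index 2, so extraction stops after one sentence.
print(extract_until_repeat([[0.1, 0.2, 0.7], [0.05, 0.15, 0.8], [0.9, 0.05, 0.05]]))  # -> [2]
```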
  • it should be noted that there is no necessary logical order between steps S203-S204 and step S202; they can also be executed in parallel, which is not limited here.
  • S205 Send the interviewer's basic information and the interview refined corpus to the management terminal, so that the management terminal determines the interview result based on the interviewer's basic information and the interview refined corpus.
  • the extracted basic information of the interviewer and the refined interview corpus are sent to the management end; because the extracted content is accurate and concise, the management-end user can subsequently determine the evaluation result accurately and quickly, which helps improve the accuracy and efficiency of intelligent interviews.
  • the interview recording is obtained and converted into interview text.
  • the interview text includes self-introduction text and interview response text.
  • the self-introduction text is analyzed to obtain the basic information of the interviewer, and the interview response text is classified according to the interview angle involved to obtain the classified text.
  • the sentence is extracted from each type of classified text to obtain the extracted sentence, and the Transformer model is used to refine the extracted sentence to obtain the interview refined corpus.
  • the core content is thus accurately extracted from a large volume of interview-record content, improving the accuracy of content extraction and, in turn, the accuracy of intelligent interview evaluation.
  • the basic information of the interviewee and the refined interview corpus are sent to the management end so that the management terminal can determine the interview results based on them, avoiding the inaccurate evaluation results caused by direct semantic recognition, which helps improve the accuracy and efficiency of intelligent interview evaluation.
  • it should be emphasized that the obtained basic interviewee information and refined interview corpus can also be stored on a blockchain network; blockchain storage allows the data to be shared between different platforms and prevents the data from being tampered with.
  • a blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • further, in step S201, converting the interview recording into interview text includes:
  • traversing the interview recording file to find a speech segment whose voice information matches a preset question-and-answer start identifier, and using that segment as a demarcation point: the speech before the segment is converted into text as the self-introduction text, and the speech after it is converted into text as the interview response text.
  • specifically, the speech signal can undergo amplitude normalization, pre-emphasis, and framing with windowing to obtain a set of speech frames; the frame segment identical to the frames of the preset question-and-answer start identifier is then found in this set by traversal and comparison, and that segment is determined to be the speech segment matching the identifier.
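A minimal numpy sketch of that front end; the frame length, hop size and pre-emphasis coefficient are assumed values typical for 16 kHz audio and are not specified by the application.

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Sketch of the front end described above: amplitude normalization,
    pre-emphasis, then framing with a Hamming window (assumed 25 ms frames,
    10 ms hop at 16 kHz, pre-emphasis coefficient 0.97)."""
    x = signal / (np.max(np.abs(signal)) + 1e-12)      # amplitude normalization
    x = np.append(x[0], x[1:] - alpha * x[:-1])        # pre-emphasis filter
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([x[i * hop : i * hop + frame_len] * window  # framing + windowing
                       for i in range(n_frames)])
    return frames

frames = preprocess(np.random.randn(16000))  # one second of fake 16 kHz audio
print(frames.shape)  # -> (98, 400)
```

The resulting frame set is what would then be scanned for the segment matching the question-and-answer start identifier.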
  • the preset question-and-answer start identifier signals that the self-introduction stage is over and the question-and-answer session begins, for example a voice prompt such as "Thank you for your introduction, now I would like to ask you a few questions", which can be set according to the actual situation.
  • voice-to-text conversion can use a speech recognition algorithm, or a third-party tool with a voice conversion function; the specifics are not limited here.
  • speech-to-text algorithms include, but are not limited to: speech recognition algorithms based on vocal tract models, speech template matching algorithms, and artificial neural network speech recognition algorithms, etc.
  • the interview recording text is converted into the self-introduction text and the interview response text, so that the two types of text are processed separately in the subsequent process, which is more targeted and the processing result obtained is more accurate.
  • further, in step S203, classifying the sentences of the interview response text according to the interview angle involved to obtain the classified text includes:
  • obtaining the cluster center corresponding to each sentence, and then calculating the distance between the cluster center and the word vector of each preset interview angle to determine the classification to which each sentence belongs.
  • the preset word segmentation methods include, but are not limited to: third-party word segmentation tools or word segmentation algorithms, etc.
  • common third-party word segmentation tools include but are not limited to: Stanford NLP word segmentation, ICTCLAS word segmentation system, ansj word segmentation tool and HanLP Chinese word segmentation tool, etc.
  • word segmentation algorithms include, but are not limited to: rule-based word segmentation methods, statistics-based word segmentation methods, understanding-based word segmentation methods, and neural network word segmentation methods.
  • rule-based word segmentation methods mainly include: minimum matching, maximum matching, reverse maximum matching, bidirectional maximum matching (BMM), sign segmentation, full-segmentation path selection, and the association-backtracking method (AB method), etc.
  • statistics-based word segmentation methods mainly include: the N-gram model, Hidden Markov Model (HMM) sequence labeling, Maximum Entropy Model (MEM) sequence labeling, Maximum Entropy Markov Model (MEMM) sequence labeling, and Conditional Random Field (CRF) sequence labeling, etc.
  • this embodiment adopts an improved CRF model for word segmentation, and the specific implementation process can refer to the description of the subsequent embodiments. In order to avoid repetition, it will not be repeated here.
  • a clustering algorithm, also called cluster analysis, is a statistical analysis method for classifying samples or indicators, and an important algorithm in data mining.
  • clustering algorithms include but are not limited to: the K-Means algorithm, mean-shift clustering, density-based clustering (Density-Based Spatial Clustering of Applications with Noise, DBSCAN), expectation-maximization clustering based on Gaussian mixture models, agglomerative hierarchical clustering, and graph community detection algorithms, etc.
  • a K-Means clustering algorithm is adopted.
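A minimal K-Means sketch on toy 2-D points standing in for sentence vectors; the deterministic initialization is chosen for reproducibility, and a library such as scikit-learn would normally be used instead.

```python
import numpy as np

def kmeans(points, k=2, iters=10):
    """Minimal K-Means: assign each point to its nearest centroid,
    then recompute centroids, for a fixed number of iterations."""
    # deterministic init: spread the initial centroids across the data
    centroids = points[np.linspace(0, len(points) - 1, k).astype(int)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = np.argmin(dists, axis=1)          # nearest-centroid assignment
        centroids = np.stack([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Two well-separated blobs standing in for sentence vectors from two interview angles.
pts = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]])
labels, centers = kmeans(pts)
print(labels)  # -> [0 0 1 1]
```

Each resulting cluster center would then be compared against the interview-angle word vectors as described above.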
  • the classification of each sentence in the interview response text is determined, which is beneficial to the subsequent refinement of sentences of different classifications in a targeted manner.
  • performing word segmentation processing on the basic sentence through a preset word segmentation method to obtain the basic word segmentation includes:
  • the weight of the initial word segmentation is generated, and the weighted initial word segmentation is marked as the basic word segmentation.
  • specifically, a conditional random field model is used to segment the basic sentence into initial word segments; the word frequency of each initial segment is then obtained from historical interview response texts, and a weight is generated for each initial segment according to its word frequency. The resulting weighted basic word segments make the proportion of each segment better match the needs of the interview scenario when the segments are subsequently marked.
  • a conditional random field (CRF) model is a discriminative probabilistic model and a type of random field, often used to label or analyze sequence data; it represents the Markov random field of an output random variable sequence Y conditioned on a given set of input random variables X, and performs well in sequence tagging tasks such as word segmentation, part-of-speech tagging and named entity recognition.
  • the historical interview response text refers to the interview response text generated by the interview that has occurred.
  • the word frequency of the historical interview response text can reflect the proportion of some word segmentation in the interview process.
  • a weight is assigned to each initial segment produced by the conditional random field model, yielding basic word segments that better suit the intelligent interview scenario, which is beneficial for improving classification accuracy.
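The frequency-based weighting can be sketched as follows. The CRF segmentation itself is assumed to have already produced the token lists, and the relative-frequency weight with a count floor for unseen words is an illustrative choice, not a detail taken from the patent.

```python
from collections import Counter

def weight_segments(initial_segments, historical_texts):
    """Weight each initial segment by its relative frequency in historical
    interview response texts (each historical text assumed pre-segmented)."""
    freq = Counter(tok for text in historical_texts for tok in text)
    total = sum(freq.values()) or 1
    # unseen segments get a count floor of 1 so their weight is never zero
    return [(tok, max(freq[tok], 1) / total) for tok in initial_segments]

history = [["project", "experience", "team"], ["team", "communication"]]
weighted = weight_segments(["team", "hobby"], history)  # "team" seen twice, "hobby" unseen
```

Frequent interview vocabulary thus receives a larger share of the total weight, which is what lets later steps emphasize segments typical of the interview scene.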
  • the language extraction model is a bidirectional long short-term memory (BiLSTM) network model.
  • the bidirectional LSTM model includes a sentence encoder and a document encoder; performing sentence extraction from each class of classified text to obtain the extracted sentences includes:
  • the forward and reverse hidden-layer outputs of the sentence encoding results are concatenated into a hidden-layer vector, and the hidden-layer vector is input to the document encoder;
  • the hidden layer vector is weighted by the document encoder to obtain the document feature vector, and the document feature vector is decoded, and the output result obtained by the decoding is used as the extraction sentence.
  • the text in the classified text is split and encoded by the sentence encoder to obtain the encoded content; the encoded content is then input into the character encoding layer to obtain the character vector corresponding to each code; each character vector is used as the sentence encoding result, passed to the document encoder through the hidden layer, and weighted by the document encoder to obtain the extracted sentences.
  • the forward and reverse hidden-layer outputs corresponding to each character in the model are concatenated into a hidden-layer vector h_i = [h_i^+; h_i^-], where the forward direction is denoted by superscript +, the reverse direction by superscript -, and the i-th character by subscript i.
  • Long Short-Term Memory (LSTM) is a recurrent neural network suited to processing and predicting important events with relatively long intervals and delays in a time series.
  • a one-way LSTM reads a sentence from the first word to the last, following human reading order.
  • such a structure captures only the preceding context and not the following context, whereas a bidirectional LSTM is composed of two LSTMs running in opposite directions.
  • one LSTM reads the data front to back in sentence word order, and the other reads it back to front, so that the first LSTM obtains the preceding context.
  • the other LSTM obtains the following context.
  • the joint states of the two LSTMs represent the context of the entire sentence; because this context is provided by the whole sentence, it naturally contains more abstract semantic information (the meaning of the sentence).
  • the advantage of this method is that it makes full use of the strength of LSTMs in processing sequential data with temporal characteristics, and because position features are included in the input, the entity-direction information they contain can be extracted after bidirectional LSTM encoding.
  • the sentence encoder and the document encoder analyze and extract the classified sentences with two bidirectional LSTM networks at different levels, improving the accuracy of key-sentence extraction.
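The concatenation h_i = [h_i^+; h_i^-] that both encoders rely on can be illustrated as below. The toy state values and the per-direction hidden size of 2 are assumptions for the example; in a real BiLSTM these states come from learned weights.

```python
import numpy as np

def bidirectional_states(forward_states, backward_states):
    """Concatenate the forward (superscript +) and reverse (superscript -)
    hidden states of each character into one hidden-layer vector."""
    return np.concatenate([forward_states, backward_states], axis=-1)

# toy states for a 3-character sentence, hidden size 2 per direction;
# backward_states[i] is the reverse LSTM's state at character i
fwd = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
bwd = np.array([[0.9, 0.8], [0.7, 0.6], [0.5, 0.4]])
h = bidirectional_states(fwd, bwd)  # one 4-dimensional vector per character
```

Each concatenated vector carries both the preceding and the following context of its character, which is exactly why the document encoder can weight them to find key sentences.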
  • weighting the hidden-layer vectors by the document encoder to obtain the document feature vector includes computing C_i = Σ_{j=1}^{n} b_ij · h_j, where:
  • C_i is the i-th document feature vector
  • j is the index of the embedding code
  • n is the number of embedding codes
  • b_ij is the weight of the i-th document feature vector on the j-th hidden-layer vector
  • h_j is the j-th hidden-layer vector.
  • generating document feature vectors through this weighted calculation is beneficial for accurately extracting key sentences.
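A minimal sketch of the weighted calculation C_i = Σ_j b_ij · h_j: the patent does not specify how the weights b_ij are produced, so softmax-normalizing alignment scores (a common attention-style choice) is assumed here for illustration.

```python
import numpy as np

def document_feature_vector(hidden_vectors, scores):
    """Compute C_i = sum_j b_ij * h_j; the weights b_ij are produced here by
    softmax-normalizing alignment scores, an assumed (common) choice."""
    e = np.exp(scores - scores.max())
    b = e / e.sum()                     # weights b_ij sum to 1
    return (b[:, None] * hidden_vectors).sum(axis=0)

h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # three hidden-layer vectors h_j
c = document_feature_vector(h, np.array([0.0, 0.0, 0.0]))  # equal scores -> mean of h_j
```

With equal scores the result reduces to the mean of the hidden-layer vectors; unequal scores let the document encoder emphasize the vectors most relevant to the i-th output.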
  • Fig. 3 shows a schematic block diagram of an artificial intelligence-based interview content refining device, corresponding one-to-one to the above-mentioned embodiment of the artificial intelligence-based interview content refining method.
  • the artificial intelligence-based interview content refining device includes a text acquisition module 31, a text analysis module 32, a text classification module 33, a corpus extraction module 34, and an information sending module 35.
  • the detailed description of each functional module is as follows:
  • the text acquisition module 31 is used to acquire interview recordings and convert the interview recordings into interview texts, where the interview text includes self-introduction text and interview response text;
  • the text analysis module 32 is used for text analysis of the self-introduction text to obtain basic information of the interviewer;
  • the text classification module 33 is used to classify the interview response text according to the interview angle involved to obtain the classified text
  • the corpus extraction module 34 is used to extract sentences from each type of classified text through the language extraction model to obtain the extracted sentences, and use the Transformer model to refine the extracted sentences to obtain the interview refined corpus;
  • the information sending module 35 is used to send the interviewer's basic information and the interview refined corpus to the management terminal, so that the management terminal determines the interview result based on the interviewer's basic information and the interview refined corpus.
  • the text acquisition module 31 includes:
  • the text determination unit is used to convert the interview recording to text using a speech-to-text method, take the text converted from the recording content before the question-and-answer start marker as the self-introduction text, and take the text converted from the recording content after the question-and-answer start marker as the interview response text.
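The split performed by the text determination unit can be sketched as follows; the marker string `[Q&A START]` and the sample transcript are assumptions made for the example, not conventions fixed by the patent.

```python
def split_interview_text(full_text, qa_marker):
    """Split the speech-to-text transcript at the question-and-answer start
    marker: text before it is the self-introduction, text after it the
    interview response."""
    before, _, after = full_text.partition(qa_marker)
    return before.strip(), after.strip()

intro, response = split_interview_text(
    "My name is Li Lei. [Q&A START] I led a payments project last year.",
    "[Q&A START]",
)
```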
  • the text classification module 33 includes:
  • the word segmentation unit is used to treat each sentence in the interview response text as a basic sentence, and use the preset word segmentation method to segment the basic sentence to obtain the basic word segmentation;
  • the clustering unit is used to convert the basic word segmentation into a word vector, and use a clustering algorithm to cluster the word vector to obtain the cluster center corresponding to the basic sentence;
  • the classification unit is used to calculate, for each basic sentence, the Euclidean distance between the cluster center corresponding to the basic sentence and the word vector corresponding to each preset interview angle, use the preset interview angle with the smallest distance as the target classification of the basic sentence, and take the basic sentence as classified text of the target classification.
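The nearest-angle assignment performed by the classification unit can be sketched as below; the two angle vectors and their meanings are illustrative assumptions.

```python
import numpy as np

def classify_sentence(cluster_center, angle_vectors):
    """Return the index of the preset interview-angle vector closest to the
    sentence's cluster center under Euclidean distance."""
    dists = np.linalg.norm(angle_vectors - cluster_center, axis=1)
    return int(dists.argmin())

angles = np.array([[0.0, 0.0], [5.0, 5.0]])  # e.g. two preset interview-angle vectors
idx = classify_sentence(np.array([4.5, 4.8]), angles)  # nearest is angles[1]
```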
  • the word segmentation unit includes:
  • the initial word segmentation subunit is used to segment the basic sentence with the conditional random field model to obtain the initial word segments
  • the word frequency acquisition subunit is used to obtain the word frequency of each initial segment from the historical interview response texts
  • the word segmentation weighting subunit is used to generate the weight of each initial segment based on its word frequency, and to use the weighted initial segments as the basic word segmentation.
  • the corpus extraction module 34 includes:
  • the splitting unit is used to split the text in the classified text by the sentence encoder according to the characters to obtain the basic characters
  • the coding unit is used to encode the basic character to obtain the coding content corresponding to the basic character
  • the mapping unit is used to input the encoded content to the character encoding layer with initialized weights, map each code into a character vector through the character encoding layer, and use each character vector as a sentence encoding result;
  • the splicing unit is used to splice the sentence encoding result into a hidden layer vector in the forward and reverse hidden layer output, and input the hidden layer vector to the document encoder;
  • the weighting unit is used to weight the hidden layer vector by the document encoder to obtain the document feature vector, decode the document feature vector, and use the decoded output result as the extraction sentence.
  • the weighted decoding unit includes:
  • the calculation subunit is used to determine the document feature vector using the formula C_i = Σ_{j=1}^{n} b_ij · h_j, where:
  • C_i is the i-th document feature vector
  • j is the index of the embedding code
  • n is the number of embedding codes
  • b_ij is the weight of the i-th document feature vector on the j-th hidden-layer vector
  • h_j is the j-th hidden-layer vector.
  • the device for refining interview content based on artificial intelligence further includes:
  • the storage module is used to store the basic information of the interviewer and the refined corpus of the interview in the blockchain network.
  • each module in the above artificial intelligence-based interview content refining device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • FIG. 4 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other via a system bus. It should be pointed out that the figure only shows a computer device 4 with the memory 41, the processor 42, and the network interface 43; it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions.
  • its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, etc.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • the memory 41 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, etc.
  • the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4.
  • the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk equipped on the computer device 4, a smart memory card (Smart Media Card, SMC), and a secure digital (Secure Digital, SD) card, flash card (Flash Card), etc.
  • the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
  • the memory 41 is generally used to store an operating system and various application software installed in the computer device 4, such as program codes for controlling electronic files.
  • the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 42 is generally used to control the overall operation of the computer device 4.
  • the processor 42 is configured to run program codes or process data stored in the memory 41, for example, run program codes for controlling electronic files.
  • the network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.
  • the computer-readable storage medium may be non-volatile or volatile; it stores an interface display program that can be executed by at least one processor, so that the at least one processor executes the steps of the above-mentioned artificial intelligence-based interview content refining method.
  • the technical solution of this application, in essence or in the part contributing to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An artificial intelligence-based interview content refining method, apparatus and device, and a medium. The method comprises: obtaining an interview recording and converting the interview recording into self-introduction text and interview response text (S201); performing text parsing on the self-introduction text, and obtaining basic information of an interviewee (S202); classifying sentences of the interview response text to obtain classified text (S203); using a language extraction model to extract sentences from each type of classified text to obtain extracted sentences, and using a Transformer model to refine the extracted sentences to obtain a refined interview corpus (S204). Thus, the accurate extraction of core content from content of an interview recording having a large amount of data is achieved, and the accuracy of content extraction is improved, which helps to improve the accuracy of intelligent interview evaluation. The basic information of the interviewee and the refined interview corpus are stored in a blockchain and are sent at the same time to a management end for evaluation, which avoids direct semantic recognition that causes evaluation results to fail to meet requirements, which helps to improve the accuracy and efficiency of the intelligent evaluation of interview results.

Description

Artificial intelligence-based interview content refining method, apparatus, device, and medium

This application claims priority to Chinese patent application No. 202010356767.3, filed with the Chinese Patent Office on April 29, 2020 and titled "Artificial Intelligence-based Interview Content Refining Method, Apparatus, Equipment, and Media", the entire contents of which are incorporated herein by reference.

Technical Field

This application relates to the field of artificial intelligence, and in particular to a method, apparatus, device, and medium for refining interview content based on artificial intelligence.

Background
During the peak recruitment season at large enterprises, many candidates take part in interviews; at present, most employers interview candidates on site or by video conference, and usually evaluate a candidate after the interview based on his or her responses. Conventional manual interviews have at least the following problems: (1) different interviewers have different preferences in how they ask questions, and even the same interviewer may judge differently depending on workplace experience, interviewing skill, and emotional state; (2) high labor costs and interview time costs. In view of this, some companies use artificial intelligence-based interview robots to conduct interviews and provide the resulting interview content to decision makers for evaluation. This improves the fairness of interviews, but also introduces a new problem: when there are many candidates, a large amount of interview content is produced, which increases the time cost of decision-making evaluation and makes intelligent interviews inefficient.

Existing solutions mainly obtain key sentences by keyword matching on the interview content, or perform semantic recognition with a Natural Language Processing (NLP) model. With keyword matching, candidates may answer questions in different ways, so preset keywords may fail to match, resulting in low accuracy of the final interview evaluation; with a general-purpose NLP model, the accuracy of semantic recognition also often fails to meet requirements.
Summary of the Invention

The embodiments of this application provide an artificial intelligence-based interview content refining method, apparatus, and medium, so as to improve the accuracy of interview content evaluation in intelligent interviews.

To solve the above technical problem, an embodiment of this application provides an artificial intelligence-based interview content refining method, including:

obtaining an interview recording, and converting the interview recording into interview text, where the interview text includes self-introduction text and interview response text;

performing text parsing on the self-introduction text to obtain basic information of the interviewee;

classifying the sentences of the interview response text according to the interview angles involved, to obtain classified text;

extracting sentences from each class of the classified text through a language extraction model to obtain extracted sentences, and refining the extracted sentences with a Transformer model to obtain a refined interview corpus;

sending the basic information of the interviewee and the refined interview corpus to a management terminal, so that the management terminal determines the interview result according to the basic information of the interviewee and the refined interview corpus.
To solve the above technical problem, an embodiment of this application further provides an artificial intelligence-based interview content refining apparatus, including:

a text acquisition module, configured to obtain an interview recording and convert it into interview text, where the interview text includes self-introduction text and interview response text;

a text parsing module, configured to perform text parsing on the self-introduction text to obtain basic information of the interviewee;

a text classification module, configured to classify the sentences of the interview response text according to the interview angles involved, to obtain classified text;

a corpus extraction module, configured to extract sentences from each class of the classified text through a language extraction model to obtain extracted sentences, and to refine the extracted sentences with a Transformer model to obtain a refined interview corpus;

an information sending module, configured to send the basic information of the interviewee and the refined interview corpus to a management terminal, so that the management terminal determines the interview result according to the basic information of the interviewee and the refined interview corpus.
To solve the above technical problem, an embodiment of this application further provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and runnable on the processor. When executing the computer-readable instructions, the processor implements the following steps of the artificial intelligence-based interview content refining method:

obtaining an interview recording, and converting the interview recording into interview text, where the interview text includes self-introduction text and interview response text;

performing text parsing on the self-introduction text to obtain basic information of the interviewee;

classifying the sentences of the interview response text according to the interview angles involved, to obtain classified text;

extracting sentences from each class of the classified text through a language extraction model to obtain extracted sentences, and refining the extracted sentences with a Transformer model to obtain a refined interview corpus;

sending the basic information of the interviewee and the refined interview corpus to a management terminal, so that the management terminal determines the interview result according to the basic information of the interviewee and the refined interview corpus.
To solve the above technical problem, an embodiment of this application further provides a computer-readable storage medium storing computer-readable instructions. When executed by a processor, the computer-readable instructions implement the following steps of the artificial intelligence-based interview content refining method:

obtaining an interview recording, and converting the interview recording into interview text, where the interview text includes self-introduction text and interview response text;

performing text parsing on the self-introduction text to obtain basic information of the interviewee;

classifying the sentences of the interview response text according to the interview angles involved, to obtain classified text;

extracting sentences from each class of the classified text through a language extraction model to obtain extracted sentences, and refining the extracted sentences with a Transformer model to obtain a refined interview corpus;

sending the basic information of the interviewee and the refined interview corpus to a management terminal, so that the management terminal determines the interview result according to the basic information of the interviewee and the refined interview corpus.
The artificial intelligence-based interview content refining method, apparatus, device, and medium provided by the embodiments of this application obtain an interview recording and convert it into interview text comprising self-introduction text and interview response text; parse the self-introduction text to obtain basic information of the interviewee; classify the sentences of the interview response text according to the interview angles involved, to obtain classified text; extract sentences from each class of classified text through a language extraction model; and refine the extracted sentences with a Transformer model to obtain a refined interview corpus. In this way, the core content is accurately distilled from a large volume of interview-recording content, improving the accuracy of content extraction and helping to improve the accuracy of intelligent interview evaluation. Finally, the basic information of the interviewee and the refined interview corpus are sent to a management terminal, so that the management terminal determines the interview result from them; this avoids the inaccurate evaluation results caused by direct semantic recognition, and helps to improve the accuracy and efficiency of intelligent interview result evaluation.
Description of the Drawings

To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of this application; those of ordinary skill in the art may derive other drawings from them without creative effort.

Figure 1 is a diagram of an exemplary system architecture to which this application can be applied;

Figure 2 is a flowchart of an embodiment of the artificial intelligence-based interview content refining method of this application;

Figure 3 is a schematic structural diagram of an embodiment of an artificial intelligence-based interview content refining apparatus according to this application;

Figure 4 is a schematic structural diagram of an embodiment of a computer device according to this application.
Detailed Description

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used in the specification are only for describing specific embodiments and are not intended to limit the application. The terms "including" and "having" in the specification, claims, and drawing descriptions, and any variations thereof, are intended to cover non-exclusive inclusion. The terms "first", "second", etc. in the specification, claims, and drawings are used to distinguish different objects, not to describe a specific order.

Reference to an "embodiment" herein means that a specific feature, structure, or characteristic described in conjunction with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
Referring to FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104, for example to receive or send messages.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server that provides various services, for example, a background server that supports the pages displayed on the terminal devices 101, 102, and 103.
It should be noted that the artificial intelligence-based interview content refining method provided by the embodiments of this application is executed by the server; accordingly, the artificial intelligence-based interview content refining apparatus is arranged in the server.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs; the terminal devices 101, 102, and 103 in the embodiments of this application may specifically correspond to application systems in actual production.
Referring to FIG. 2, FIG. 2 shows an artificial intelligence-based interview content refining method provided by an embodiment of this application. The method is described by taking its application to the server in FIG. 1 as an example, with details as follows:
S201: Obtain an interview recording and convert the interview recording into interview text, where the interview text includes self-introduction text and interview response text.
Specifically, during an enterprise's interview-based recruitment, many candidates participate in interviews. Because the number of positions is limited, multiple candidates may interview for the same position. To avoid confusing or forgetting candidate information, in this embodiment the interview process of each candidate is recorded, and the recording is afterwards converted into interview text for subsequent processing. The interview text includes self-introduction text and interview response text.
The self-introduction text refers to the text obtained by converting the speech of the candidate's self-introduction, and the response text refers to the text of the candidate's answers to the interviewer's questions after the self-introduction.
It should be explained that the interviewer mentioned in this embodiment may be a person or a question-answering robot participating in an intelligent interview, which is not specifically limited here.
It should be understood that a typical interview lasts 30 to 40 minutes or even longer, so the total volume of the candidate's answers is relatively large. In view of this, this embodiment takes the self-introduction as the starting point, because the information in the self-introduction can already summarize a large part of the candidate's abilities, while other parts of the interview, such as skill assessment and business acumen assessment, can serve as reference and training data to supplement and verify the self-introduction, yielding a more comprehensive result.
In this embodiment, the interview recording may be converted into interview text using a tool that supports speech-to-text conversion or using a speech-to-text algorithm; no specific limitation is imposed here. For the specific implementation of dividing the interview text into self-introduction text and interview response text, refer to the description of subsequent embodiments; to avoid repetition, it is not repeated here.
S202: Perform text parsing on the self-introduction text to obtain the candidate's basic information.
Specifically, since self-introduction text generally covers categories such as basic personal information, experience, areas of expertise and skills, past honors and awards, and self-evaluation, the content modules involved are fairly similar. To improve efficiency, this embodiment parses the self-introduction text using regular-expression-based text parsing, quickly extracting its content to obtain the candidate's basic information.
The candidate's basic information includes but is not limited to: fixed personal information such as name, household registration, graduate school, major, and years of work experience, as well as personal professional information such as honors received, companies served, work history, and skills mastered.
It should be noted that since the content dimensions contained in self-introduction texts are roughly similar, the candidate's basic information to be obtained is divided into multiple dimensions, and at least one regular expression is set for each dimension to match and parse the self-introduction text; the content matched for a dimension serves as the parsed content of that dimension.
A regular expression describes a string-matching pattern, which can be used to check whether a string contains a certain substring, to replace matched substrings, or to extract substrings that meet a certain condition from a string.
For example, in a specific implementation, text parsing is performed along seven dimensions: name, household registration, graduate school, major, years of work experience, work history, and skills mastered. For the household-registration dimension, keywords containing specific characters can be set for matching, for example matching sentence patterns composed of specific keywords such as "I come from XXX", "I am a XXX native", or "I grew up in XXX".
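As a sketch of the regular-expression matching described above: the patterns, dimension name, and character ranges below are illustrative assumptions, not the embodiment's actual expressions.

```python
import re

# Hypothetical patterns for the "household registration" dimension; real
# deployments would tune these and add patterns for the other six dimensions.
HOMETOWN_PATTERNS = [
    re.compile(r"我来自(?P<value>[\u4e00-\u9fa5]{2,10})"),    # "I come from XXX"
    re.compile(r"我是(?P<value>[\u4e00-\u9fa5]{2,10})人"),    # "I am a XXX native"
    re.compile(r"我在(?P<value>[\u4e00-\u9fa5]{2,10})长大"),  # "I grew up in XXX"
]

def parse_dimension(text, patterns):
    """Return the first captured value for a dimension, or None if no pattern fires."""
    for pattern in patterns:
        match = pattern.search(text)
        if match:
            return match.group("value")
    return None

intro = "大家好，我来自深圳，毕业于某大学。"
print(parse_dimension(intro, HOMETOWN_PATTERNS))  # 深圳
```

Each dimension keeps its own pattern list, so adding a dimension only means adding patterns, not changing the parser.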
S203: Classify the sentences of the interview response text according to the interview angles involved, to obtain classified text.
Specifically, during questioning the interviewer usually asks about work experience, areas of expertise, skills, and so on. In this embodiment these interview angles are preset according to actual needs. After the interview response text is obtained, its sentences are classified according to the interview angles involved, yielding classified text, so that key sentences can subsequently be extracted and refined in a targeted manner according to the category of the classified text, which helps improve the accuracy of content refining.
The interview angles involved refer to the focus of the questions and responses, such as salary requirements, awards received, years of work experience, and professional skills.
Further, in this embodiment, semantic recognition is performed on the interview response text according to the interview angles involved, and sentences are classified according to the semantic recognition result to obtain the classified text. For the specific implementation, refer to the description of subsequent embodiments; to avoid repetition, it is not repeated here.
Classifying sentences according to the semantic recognition result may specifically be: clustering the recognition results to obtain clustering results, computing the Euclidean distance between each clustering result and the word vector corresponding to each interview angle, and then taking the interview angle with the smallest distance as the interview angle corresponding to that clustering result.
S204: Extract sentences from each category of classified text through a language extraction model to obtain extracted sentences, and refine the extracted sentences using a Transformer model to obtain a refined interview corpus.
Specifically, sentences are extracted from each category of classified text through the language extraction model to obtain extracted sentences, and the extracted sentences are then refined using a Transformer model to obtain the refined interview corpus.
Language extraction models include but are not limited to: the ELMo (Embeddings from Language Models) algorithm, OpenAI GPT, and the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model.
Preferably, this embodiment uses an improved OpenAI GPT model as the semantic extraction model. For the specific implementation of sentence extraction, refer to the description of subsequent embodiments; to avoid repetition, it is not repeated here.
It should be noted that the extracted sentences obtained in this embodiment may also be expressed in vector form, so that they can subsequently be quickly input to the Transformer model for refined extraction.
Through its attention mechanism, the Transformer model can quickly extract sentences of higher importance according to their weights.
It should be noted that in the decoding stage, the Transformer model of this embodiment inputs the sum of the generated document feature vectors to the decoder. This autoregressive long short-term memory network predicts the next sentence to be extracted, and the output is connected to the input when decoding the next sentence. The biggest difference between the decoder used in this embodiment and other commonly used decoders is that, in the process of obtaining attention via dot products, if the same index appears twice in succession, the entire extraction process is ended, thereby avoiding the information redundancy caused by extracting similar information multiple times.
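The stopping rule described above can be sketched as follows; a toy predictor stands in for the dot-product attention step, whose details are not specified here.

```python
def extract_indices(predict_next):
    """Autoregressively collect sentence indices until the same index
    is produced twice in succession (the redundancy stopping rule)."""
    selected = []
    previous = None
    while True:
        index = predict_next(selected)
        if index == previous:   # same index twice in a row:
            break               # end the whole extraction process
        selected.append(index)
        previous = index
    return selected

# Toy predictor replaying a fixed script of "attended" sentence indices (assumption).
script = iter([2, 0, 3, 3, 1])
print(extract_indices(lambda _selected: next(script)))  # [2, 0, 3]
```

The second 3 in the script triggers the stop, so sentence 1 is never examined; duplicate-heavy documents terminate early instead of re-extracting similar content.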
It should be understood that in this embodiment there is no necessary logical order between steps S203 to S204 and step S202; they may also be executed in parallel, which is not limited here.
S205: Send the candidate's basic information and the refined interview corpus to the management terminal, so that the management terminal determines the interview result based on the candidate's basic information and the refined interview corpus.
Specifically, the extracted candidate basic information and refined interview corpus are sent to the management terminal, ensuring the accuracy and conciseness of the extracted content, so that users of the management terminal can subsequently determine the evaluation result accurately and quickly based on that content, which helps improve the accuracy and efficiency of intelligent interviews.
In this embodiment, an interview recording is obtained and converted into interview text including self-introduction text and interview response text; the self-introduction text is parsed to obtain the candidate's basic information; the sentences of the interview response text are classified according to the interview angles involved to obtain classified text; sentences are extracted from each category of classified text through a language extraction model, and the extracted sentences are refined using a Transformer model to obtain the refined interview corpus. The core content is thus accurately distilled from voluminous interview records, improving the accuracy of content extraction and, in turn, of intelligent interview evaluation. Finally, the candidate's basic information and the refined interview corpus are sent to the management terminal, so that the management terminal determines the interview result based on them, avoiding the inaccurate evaluation results caused by direct semantic recognition and helping improve the accuracy and efficiency of intelligent interview result evaluation.
In an embodiment, the obtained candidate basic information and refined interview corpus may be stored on a blockchain network. Blockchain storage enables the data to be shared between different platforms and also prevents the data from being tampered with.
Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
In some optional implementations of this embodiment, converting the interview recording into interview text in step S201 includes:
identifying a question-and-answer start marker contained in the interview recording;
performing text conversion on the interview recording by means of speech-to-text, taking the text converted from the recording content before the question-and-answer start marker as the self-introduction text, and taking the text converted from the recording content after the question-and-answer start marker as the interview response text.
Specifically, before speech-to-text conversion, the interview recording file is traversed to find a speech segment having the same speech information as the preset question-and-answer start marker, which serves as a demarcation point: the text converted from the speech before this segment is taken as the self-introduction text, and the text converted from the speech after it is taken as the interview response text.
Finding a speech segment with the same speech information as the preset question-and-answer start marker may specifically involve performing amplitude normalization, pre-emphasis, and framing-and-windowing on the speech signal to obtain a set of speech frames, and then finding, by traversal and comparison, the speech-frame segment identical to the frames of the preset marker, which is determined to be the speech segment having the same speech information as the preset marker.
The preset question-and-answer start marker is a speech marker signaling that the self-introduction stage has ended and the question-and-answer session has begun, for example a voice prompt such as "Thank you for your introduction; I would now like to ask you a few questions". It may be preset according to the actual situation and is not limited here.
Speech-to-text conversion may use a speech recognition algorithm or a third-party tool with a speech conversion function, without specific limitation. Speech-to-text algorithms include but are not limited to: speech recognition algorithms based on vocal-tract models, speech-template matching recognition algorithms, and artificial-neural-network speech recognition algorithms.
In this embodiment, the interview recording is converted into self-introduction text and interview response text, so that the two types of text are processed separately in subsequent steps, which is more targeted and yields more accurate processing results.
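The demarcation logic above reduces to finding a known sub-sequence in the frame stream and splitting around it; the sketch below uses small integers as stand-ins for normalized, windowed speech frames, since the actual frame representation is not specified.

```python
def split_at_marker(frames, marker_frames):
    """Traverse the frame sequence, find the first sub-sequence equal to the
    preset Q&A start marker, and split the stream around it."""
    n, m = len(frames), len(marker_frames)
    for start in range(n - m + 1):
        if frames[start:start + m] == marker_frames:
            # before the marker -> self-introduction; after -> responses
            return frames[:start], frames[start + m:]
    return frames, []  # no marker found: treat everything as self-introduction

intro, answers = split_at_marker([5, 6, 7, 9, 9, 1, 2], marker_frames=[9, 9])
print(intro, answers)  # [5, 6, 7] [1, 2]
```

Real frames would be compared with a tolerance rather than exact equality; the fallback when no marker is found is an assumed policy.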
In some optional implementations of this embodiment, classifying the sentences of the interview response text according to the interview angles involved in step S203 to obtain the classified text includes:
taking each sentence in the interview response text as a basic sentence, and performing word segmentation on the basic sentence through a preset word segmentation method to obtain basic segmented words;
converting the basic segmented words into word vectors, and clustering the word vectors through a clustering algorithm to obtain the cluster center corresponding to the basic sentence;
for each basic sentence, calculating the Euclidean distance between the cluster center corresponding to the basic sentence and the word vector corresponding to each preset interview angle, taking the preset interview angle with the smallest distance as the target category of the basic sentence, and taking the basic sentence as classified text corresponding to the target category.
Specifically, by performing word segmentation and clustering on each sentence in the interview response text, the cluster center corresponding to each sentence is obtained, and the distance between that cluster center and the word vector corresponding to each preset interview angle is then calculated to determine the category to which each sentence belongs.
The preset word segmentation method includes but is not limited to: a third-party word segmentation tool or a word segmentation algorithm.
Common third-party word segmentation tools include but are not limited to: the Stanford NLP segmenter, the ICTCLAS segmentation system, the ansj segmentation tool, and the HanLP Chinese segmentation tool.
Word segmentation algorithms include but are not limited to: rule-based methods, statistics-based methods, understanding-based methods, and neural-network methods.
Rule-based word segmentation methods mainly include: minimum matching, forward maximum matching, reverse maximum matching, bi-directional maximum matching (BMM), marker segmentation, full-segmentation path selection, and the association-backtracking (AB) method.
Statistics-based word segmentation methods mainly include: the N-gram model, Hidden Markov Model (HMM) sequence labeling, Maximum Entropy Model (MEM) sequence labeling, Maximum Entropy Markov Model (MEMM) sequence labeling, and Conditional Random Field (CRF) sequence labeling.
Preferably, this embodiment uses an improved CRF model for word segmentation. For the specific implementation, refer to the description of subsequent embodiments; to avoid repetition, it is not repeated here.
Understandably, extracting basic segmented words by word segmentation can, on the one hand, effectively filter out meaningless words in the text and, on the other hand, facilitate subsequently using these texts to generate word vectors.
A clustering algorithm, also called cluster analysis, is a statistical analysis method for classifying samples or indicators and an important data mining algorithm. Clustering algorithms include but are not limited to: K-means clustering, mean-shift clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), expectation-maximization clustering based on Gaussian mixture models, agglomerative hierarchical clustering, and graph community detection algorithms.
Preferably, in this embodiment, the K-means clustering algorithm is adopted.
In this embodiment, the category of each sentence in the interview response text is determined through clustering and semantic-similarity calculation, which facilitates subsequent targeted refinement of sentences of different categories.
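A minimal sketch of the nearest-angle classification, under two stated simplifications: the mean of a sentence's word vectors stands in for the K-means cluster center, and the two-dimensional angle vectors are made-up values rather than trained embeddings.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def classify_sentence(word_vectors, angle_vectors):
    """Take the mean of the sentence's word vectors as its cluster center,
    then pick the preset interview angle whose vector is nearest in
    Euclidean distance."""
    dims = len(next(iter(angle_vectors.values())))
    center = [sum(v[d] for v in word_vectors) / len(word_vectors)
              for d in range(dims)]
    return min(angle_vectors, key=lambda angle: dist(center, angle_vectors[angle]))

angles = {"salary": [1.0, 0.0], "skills": [0.0, 1.0]}  # hypothetical angle vectors
print(classify_sentence([[0.1, 0.9], [0.3, 0.7]], angles))  # skills
```

With real K-means, each sentence's tokens would first be clustered and the dominant cluster center used in place of the plain mean.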
In some optional implementations of this embodiment, performing word segmentation on the basic sentence through the preset word segmentation method to obtain the basic segmented words includes:
segmenting the basic sentence using a conditional random field model to obtain initial segmented words;
obtaining the word frequency of each initial segmented word from historical interview response texts;
generating a weight for each initial segmented word based on its word frequency, and taking the initial segmented words labeled with weights as the basic segmented words.
Specifically, the conditional random field model is used to segment the basic sentence to obtain initial segmented words; the word frequency of each initial segmented word is then obtained from historical interview response texts, and a corresponding weight is generated from that frequency, yielding basic segmented words carrying weight information, so that when the basic segmented words are subsequently labeled, the proportion of each better matches the needs of the interview scenario.
A conditional random field (CRF) model is a discriminative probabilistic model and a type of random field, often used to label or analyze sequence data. It represents a Markov random field over a set of output random variables Y conditioned on a set of input random variables X, and performs well in sequence labeling tasks such as word segmentation, part-of-speech tagging, and named entity recognition.
Historical interview response text refers to interview response text generated by interviews that have already taken place; the word frequencies in historical interview response texts can reflect the proportion of certain segmented words in the interview process.
In this embodiment, weights are assigned to the initial segmented words obtained by CRF segmentation, yielding basic segmented words that better fit the intelligent interview scenario and helping improve classification accuracy.
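The frequency-to-weight step can be sketched as below. The CRF segmentation itself is assumed to have already run (the historical texts arrive pre-segmented), and the simple count-over-total normalization is an illustrative choice, not the embodiment's stated formula.

```python
from collections import Counter

def weight_tokens(tokens, historical_texts):
    """Attach a frequency-based weight to each CRF-segmented token,
    using relative frequency in the historical corpus as the weight."""
    counts = Counter(word for text in historical_texts for word in text)
    total = sum(counts.values()) or 1
    return [(token, counts[token] / total) for token in tokens]

# Pre-segmented historical interview answers (toy data).
history = [["项目", "经验", "项目"], ["技能", "项目"]]
print(weight_tokens(["项目", "爱好"], history))  # [('项目', 0.6), ('爱好', 0.0)]
```

Words common in past interviews thus carry more weight than rare ones when the basic segmented words are labeled downstream.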
In some optional implementations of this embodiment, in step S204 the language extraction model is a bidirectional long short-term memory network model that includes a sentence encoder and a document encoder, and extracting sentences from each category of classified text through the language extraction model to obtain the extracted sentences includes:
splitting the text in the classified text by character through the sentence encoder to obtain basic characters;
encoding the basic characters to obtain the encoded content corresponding to the basic characters;
inputting the encoded content into a character encoding layer with initialized weights, mapping each encoding into a character vector through the character encoding layer, and taking each character vector as the sentence encoding result;
concatenating the forward and backward hidden-layer outputs of the sentence encoding result into a hidden-layer vector, and inputting the hidden-layer vector into the document encoder;
weighting the hidden-layer vectors through the document encoder to obtain document feature vectors, decoding the document feature vectors, and taking the decoded output as the extracted sentences.
Specifically, the text in the classified text is split by character and encoded by the sentence encoder to obtain the encoded content; the encoded content is then input into the character encoding layer to obtain the character vector corresponding to each encoding; each character vector serves as the sentence's encoding result and is passed to the document encoder through the hidden layer, where weighting is applied to obtain the extracted sentences.
It is worth noting that, based on the sentence encoding result, the forward and backward hidden-layer outputs corresponding to each character in the model are concatenated into one hidden-layer vector:
h_i = [h_i^+ ; h_i^-]
where the forward direction is denoted by the superscript +, the backward direction by the superscript -, and the i-th character by the subscript i.
其中,长短期记忆网络(Long Short-Term Memory,LSTM),是一种时间递归神经网络,适合于处理和预测时间序列中间隔和延迟相对较长的重要事件。Among them, the Long Short-Term Memory (LSTM) is a time recurrent neural network, which is suitable for processing and predicting important events with relatively long intervals and delays in a time series.
需要说明的是,单向LSTM可以按照人类的阅读顺序从一句话的第一个字记忆到最后一个字,这种LSTM结构只能捕捉到上文信息,无法捕捉到下文信息,而双向LSTM由两个方向不同的LSTM组成,一个LSTM按照句子中词的顺序从前往后读取数据,另一个LSTM从后往前按照句子词序的反方向读取数据,这样第一个LSTM获得上文信息,另一个LSTM获得下文信息,两个LSTM的联合说出就是整个句子的上下文信息,而上下文信息是由整个句子提供的,自然包含比较抽象的语义信息(句子的意思),这种方法的优点是 充分利用了LSTM对具有时序特点的序列数据的处理优势,而且由于我们输入了位置特征,其经过双向LSTM编码后可以抽取出位置特征中包含的实体方向信息。It should be noted that the one-way LSTM can memorize the first word to the last word of a sentence according to the human reading order. This LSTM structure can only capture the above information but not the following information, while the two-way LSTM is composed of Two LSTMs with different directions are composed. One LSTM reads data from front to back according to the order of words in the sentence, and the other LSTM reads data from back to front in the opposite direction of the sentence word order, so that the first LSTM obtains the above information. Another LSTM obtains the following information. The joint statement of the two LSTMs is the context information of the entire sentence, and the context information is provided by the entire sentence, which naturally contains more abstract semantic information (meaning of the sentence). The advantage of this method is It makes full use of the advantages of LSTM in processing sequence data with time series characteristics, and because we input location features, the entity direction information contained in the location features can be extracted after bidirectional LSTM encoding.
In this embodiment, the sentence encoder and the document encoder parse and extract the classified sentences at two different levels of the bidirectional long short-term memory network, improving the accuracy of key sentence extraction.
In some optional implementations of this embodiment, weighting the hidden-layer vectors by the document encoder to obtain the document feature vector includes:
determining the document feature vector with the following formula:
C_i = Σ_{j=1}^{n} b_ij · h_j
where C_i is the i-th document feature vector, j is the index of an embedding code, n is the number of embedding codes, b_ij is the weight of the i-th document feature vector with respect to the j-th hidden-layer vector, and h_j is the j-th hidden-layer vector; the embedding codes are generated from the hidden states of the bidirectional long short-term memory network model.
In this embodiment, the document feature vector is obtained through this weighted calculation, which is beneficial for accurately extracting key sentences.
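The weighted sum above can be sketched directly in NumPy. This is an illustrative toy example: the weights b_ij and hidden-layer vectors h_j below are made-up values, not quantities produced by the document encoder:

```python
import numpy as np

def document_feature(weights, hidden_vectors):
    """Compute C_i = sum_j b_ij * h_j, the weighted sum of
    hidden-layer vectors that forms one document feature vector."""
    return sum(b * h for b, h in zip(weights, hidden_vectors))

# Hypothetical values: n = 3 hidden-layer vectors of dimension 2.
h = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
b_i = np.array([0.5, 0.3, 0.2])   # weights b_ij for one feature vector C_i
C_i = document_feature(b_i, h)
print(C_i)  # [0.7 0.5]
```

In a full attention-style implementation the weights b_ij would themselves be computed from the hidden states and normalized (e.g. via softmax) so that they sum to 1.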
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
Fig. 3 shows a schematic block diagram of an artificial intelligence-based interview content refining apparatus corresponding one-to-one to the artificial intelligence-based interview content refining method of the above embodiments. As shown in Fig. 3, the apparatus includes a text acquisition module 31, a text parsing module 32, a text classification module 33, a corpus extraction module 34, and an information sending module 35. The functional modules are described in detail as follows:
The text acquisition module 31 is configured to acquire an interview recording and convert it into interview text, where the interview text includes a self-introduction text and an interview response text.
The text parsing module 32 is configured to parse the self-introduction text to obtain the interviewee's basic information.
The text classification module 33 is configured to classify the sentences of the interview response text according to the interview angles involved, to obtain classified texts.
The corpus extraction module 34 is configured to extract sentences from each class of classified text through a language extraction model to obtain extracted sentences, and to refine the extracted sentences with a Transformer model to obtain a refined interview corpus.
The information sending module 35 is configured to send the interviewee's basic information and the refined interview corpus to a management terminal, so that the management terminal determines the interview result based on the interviewee's basic information and the refined interview corpus.
Optionally, the text acquisition module 31 includes:
a marker recognition unit, configured to recognize the question-and-answer start marker contained in the interview recording; and
a text determination unit, configured to convert the interview recording into text by speech-to-text conversion, take the text converted from the recording content before the question-and-answer start marker as the self-introduction text, and take the text converted from the recording content after the question-and-answer start marker as the interview response text.
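The marker-based split performed by the text determination unit can be sketched as follows. This is a hedged illustration: the marker string `[QA_START]` and the example transcript are invented for demonstration, and a real system would detect the marker in the audio or transcript rather than receive it as a literal token:

```python
def split_transcript(transcript, qa_start_marker):
    """Split a speech-to-text transcript at the question-and-answer start
    marker: text before it is the self-introduction, text after it is
    the interview response."""
    idx = transcript.find(qa_start_marker)
    if idx == -1:
        return transcript.strip(), ""  # no marker: treat all as introduction
    intro = transcript[:idx]
    response = transcript[idx + len(qa_start_marker):]
    return intro.strip(), response.strip()

intro, response = split_transcript(
    "I am a software engineer with five years of experience. "
    "[QA_START] My biggest strength is debugging under pressure.",
    "[QA_START]")
print(intro)
print(response)
```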
Optionally, the text classification module 33 includes:
a word segmentation unit, configured to take each sentence in the interview response text as a basic sentence and perform word segmentation on the basic sentence by a preset segmentation method to obtain basic word segments;
a clustering unit, configured to convert the basic word segments into word vectors and cluster the word vectors with a clustering algorithm to obtain the cluster center corresponding to each basic sentence; and
a classification unit, configured to, for each basic sentence, calculate the Euclidean distance between the cluster center corresponding to the basic sentence and the word vector corresponding to each preset interview angle, take the preset interview angle with the smallest distance as the target class of the basic sentence, and take the basic sentence as the classified text corresponding to the target class.
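The nearest-angle assignment performed by the classification unit amounts to nearest-centroid classification under Euclidean distance. The sketch below uses hypothetical 2-dimensional angle vectors and angle names; real word vectors would be high-dimensional embeddings:

```python
import numpy as np

def classify_sentence(cluster_center, angle_vectors):
    """Assign a sentence to the preset interview angle whose word vector
    has the smallest Euclidean distance to the sentence's cluster center."""
    distances = {name: np.linalg.norm(cluster_center - vec)
                 for name, vec in angle_vectors.items()}
    return min(distances, key=distances.get)

angles = {  # hypothetical preset interview-angle vectors
    "teamwork":  np.array([1.0, 0.0]),
    "technical": np.array([0.0, 1.0]),
}
center = np.array([0.2, 0.9])  # cluster center of one basic sentence
print(classify_sentence(center, angles))  # technical
```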
Optionally, the word segmentation unit includes:
an initial segmentation subunit, configured to segment the basic sentence with a conditional random field model to obtain initial word segments;
a word frequency acquisition subunit, configured to obtain the frequency of each initial word segment from historical interview response texts; and
a segment weighting subunit, configured to generate a weight for each initial word segment based on its frequency, and take the initial word segments annotated with their weights as the basic word segments.
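The frequency-based weighting can be sketched with the standard library. The normalization scheme below (raw counts normalized over the query segments) is an assumption for illustration; the patent does not specify how frequencies are turned into weights:

```python
from collections import Counter

def weight_segments(segments, historical_texts):
    """Weight each initial word segment by its frequency in historical
    interview response texts (here: raw counts normalized to sum to 1
    over the given segments -- an illustrative choice)."""
    counts = Counter(w for text in historical_texts for w in text.split())
    total = max(sum(counts[w] for w in segments), 1)
    return {w: counts[w] / total for w in segments}

history = ["python testing python", "testing deployment"]
weights = weight_segments(["python", "testing", "deployment"], history)
print(weights)
```

A Chinese-text pipeline would use the CRF segmenter's output tokens instead of `str.split`, but the weighting step is the same.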
Optionally, the corpus extraction module 34 includes:
a splitting unit, configured to split the text in the classified text into characters through the sentence encoder to obtain basic characters;
an encoding unit, configured to encode the basic characters to obtain the encoded content corresponding to each basic character;
a mapping unit, configured to input the encoded content into a character encoding layer with initialized weights, map each code into a character vector through the character encoding layer, and take each character vector as the sentence encoding result;
a splicing unit, configured to splice the forward and backward hidden-layer outputs of the sentence encoding result into hidden-layer vectors and input the hidden-layer vectors into the document encoder; and
a weighting unit, configured to weight the hidden-layer vectors through the document encoder to obtain a document feature vector, decode the document feature vector, and take the decoded output as the extracted sentences.
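The front end of this pipeline (splitting into characters, encoding them as ids, and mapping the ids through a character-embedding layer with initialized weights) can be sketched as below. The vocabulary, embedding size, and random initialization are all assumptions for illustration; a trained system would learn the embedding weights:

```python
import numpy as np

def encode_sentence(sentence, dim=4, seed=0):
    """Sketch of the sentence-encoder front end: split text into basic
    characters, encode each character as an integer id, then map each id
    to a character vector via a randomly initialized embedding layer."""
    chars = list(sentence)                            # basic characters
    vocab = {c: i for i, c in enumerate(dict.fromkeys(chars))}
    rng = np.random.default_rng(seed)
    embedding = rng.normal(size=(len(vocab), dim))    # initialized weights
    ids = [vocab[c] for c in chars]                   # encoded content
    return [embedding[i] for i in ids]                # character vectors

vectors = encode_sentence("abca")
print(len(vectors), vectors[0].shape)  # 4 characters, 4-dim vectors
```

Note that repeated characters map to the same vector, which is the defining property of an embedding lookup.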
Optionally, the weighting unit includes:
a calculation subunit, configured to determine the document feature vector with the following formula:
C_i = Σ_{j=1}^{n} b_ij · h_j
where C_i is the i-th document feature vector, j is the index of an embedding code, n is the number of embedding codes, b_ij is the weight of the i-th document feature vector with respect to the j-th hidden-layer vector, and h_j is the j-th hidden-layer vector; the embedding codes are generated from the hidden states of the bidirectional long short-term memory network model.
Optionally, the artificial intelligence-based interview content refining apparatus further includes:
a storage module, configured to store the interviewee's basic information and the refined interview corpus in a blockchain network.
For the specific limitations of the artificial intelligence-based interview content refining apparatus, refer to the limitations of the artificial intelligence-based interview content refining method above; details are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
To solve the above technical problem, an embodiment of this application further provides a computer device. Refer to Fig. 4 for details; Fig. 4 is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to one another via a system bus. It should be noted that the figure shows only the computer device 4 with the memory 41, the processor 42, and the network interface 43, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), and embedded devices.
The computer device may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The computer device may interact with a user through a keyboard, mouse, remote control, touch pad, or voice-control device.
The memory 41 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD card memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and so on. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as its hard disk or internal memory. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is generally used to store the operating system and various application software installed on the computer device 4, such as program code for controlling electronic files. In addition, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may in some embodiments be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run program code or process data stored in the memory 41, for example, to run the program code for controlling electronic files.
The network interface 43 may include a wireless or wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.
This application also provides another implementation, namely a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores an interface display program that can be executed by at least one processor, so that the at least one processor performs the steps of the artificial intelligence-based interview content refining method described above.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc), including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
Obviously, the embodiments described above are only some of the embodiments of this application, not all of them. The drawings show preferred embodiments of this application but do not limit its patent scope. This application may be implemented in many different forms; on the contrary, these embodiments are provided so that the disclosure of this application will be understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements of some of their technical features. Any equivalent structure made using the contents of the specification and drawings of this application, whether used directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims (20)

1. An artificial intelligence-based interview content refining method, wherein the method comprises:
acquiring an interview recording and converting the interview recording into interview text, wherein the interview text comprises a self-introduction text and an interview response text;
parsing the self-introduction text to obtain the interviewee's basic information;
classifying the sentences of the interview response text according to the interview angles involved, to obtain classified texts;
extracting sentences from each class of the classified texts through a language extraction model to obtain extracted sentences, and refining the extracted sentences with a Transformer model to obtain a refined interview corpus; and
sending the interviewee's basic information and the refined interview corpus to a management terminal, so that the management terminal determines the interview result based on the interviewee's basic information and the refined interview corpus.
2. The artificial intelligence-based interview content refining method of claim 1, wherein converting the interview recording into interview text comprises:
recognizing the question-and-answer start marker contained in the interview recording; and
converting the interview recording into text by speech-to-text conversion, taking the text converted from the recording content before the question-and-answer start marker as the self-introduction text, and taking the text converted from the recording content after the question-and-answer start marker as the interview response text.
3. The artificial intelligence-based interview content refining method of claim 1, wherein classifying the sentences of the interview response text according to the interview angles involved to obtain classified texts comprises:
taking each sentence in the interview response text as a basic sentence, and performing word segmentation on the basic sentence by a preset segmentation method to obtain basic word segments;
converting the basic word segments into word vectors, and clustering the word vectors with a clustering algorithm to obtain the cluster center corresponding to each basic sentence; and
for each basic sentence, calculating the Euclidean distance between the cluster center corresponding to the basic sentence and the word vector corresponding to each preset interview angle, taking the preset interview angle with the smallest distance as the target class of the basic sentence, and taking the basic sentence as the classified text corresponding to the target class.
4. The artificial intelligence-based interview content refining method of claim 3, wherein performing word segmentation on the basic sentence by a preset segmentation method to obtain basic word segments comprises:
segmenting the basic sentence with a conditional random field model to obtain initial word segments;
obtaining the frequency of each initial word segment from historical interview response texts; and
generating a weight for each initial word segment based on its frequency, and taking the initial word segments annotated with their weights as the basic word segments.
5. The artificial intelligence-based interview content refining method of claim 1, wherein the language extraction model is a bidirectional long short-term memory network model comprising a sentence encoder and a document encoder, and extracting sentences from each class of the classified texts through the language extraction model to obtain extracted sentences comprises:
splitting the text in the classified texts into characters through the sentence encoder to obtain basic characters;
encoding the basic characters to obtain the encoded content corresponding to each basic character;
inputting the encoded content into a character encoding layer with initialized weights, mapping each code into a character vector through the character encoding layer, and taking each character vector as the sentence encoding result;
splicing the forward and backward hidden-layer outputs of the sentence encoding result into hidden-layer vectors, and inputting the hidden-layer vectors into the document encoder; and
weighting the hidden-layer vectors through the document encoder to obtain a document feature vector, decoding the document feature vector, and taking the decoded output as the extracted sentences.
6. The artificial intelligence-based interview content refining method of claim 5, wherein weighting the hidden-layer vectors through the document encoder to obtain a document feature vector comprises:
determining the document feature vector with the following formula:
C_i = Σ_{j=1}^{n} b_ij · h_j
wherein C_i is the i-th document feature vector, j is the index of an embedding code, n is the number of embedding codes, b_ij is the weight of the i-th document feature vector with respect to the j-th hidden-layer vector, and h_j is the j-th hidden-layer vector, wherein the embedding codes are generated from the hidden states of the bidirectional long short-term memory network model.
7. The artificial intelligence-based interview content refining method of claim 1, after refining the extracted sentences with the Transformer model to obtain the refined interview corpus, further comprising: storing the interviewee's basic information and the refined interview corpus in a blockchain network.
8. An artificial intelligence-based interview content refining apparatus, wherein the apparatus comprises:
a text acquisition module, configured to acquire an interview recording and convert the interview recording into interview text, wherein the interview text comprises a self-introduction text and an interview response text;
a text parsing module, configured to parse the self-introduction text to obtain the interviewee's basic information;
a text classification module, configured to classify the sentences of the interview response text according to the interview angles involved, to obtain classified texts;
a corpus extraction module, configured to extract sentences from each class of the classified texts through a language extraction model to obtain extracted sentences, and to refine the extracted sentences with a Transformer model to obtain a refined interview corpus; and
an information sending module, configured to send the interviewee's basic information and the refined interview corpus to a management terminal, so that the management terminal determines the interview result based on the interviewee's basic information and the refined interview corpus.
9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
acquiring an interview recording and converting the interview recording into interview text, wherein the interview text comprises a self-introduction text and an interview response text;
parsing the self-introduction text to obtain the interviewee's basic information;
classifying the sentences of the interview response text according to the interview angles involved, to obtain classified texts;
extracting sentences from each class of the classified texts through a language extraction model to obtain extracted sentences, and refining the extracted sentences with a Transformer model to obtain a refined interview corpus; and
sending the interviewee's basic information and the refined interview corpus to a management terminal, so that the management terminal determines the interview result based on the interviewee's basic information and the refined interview corpus.
10. The computer device of claim 9, wherein converting the interview recording into interview text comprises:
recognizing the question-and-answer start marker contained in the interview recording; and
converting the interview recording into text by speech-to-text conversion, taking the text converted from the recording content before the question-and-answer start marker as the self-introduction text, and taking the text converted from the recording content after the question-and-answer start marker as the interview response text.
11. The computer device of claim 9, wherein classifying the sentences of the interview response text according to the interview angles involved to obtain classified texts comprises:
taking each sentence in the interview response text as a basic sentence, and performing word segmentation on the basic sentence by a preset segmentation method to obtain basic word segments;
converting the basic word segments into word vectors, and clustering the word vectors with a clustering algorithm to obtain the cluster center corresponding to each basic sentence; and
for each basic sentence, calculating the Euclidean distance between the cluster center corresponding to the basic sentence and the word vector corresponding to each preset interview angle, taking the preset interview angle with the smallest distance as the target class of the basic sentence, and taking the basic sentence as the classified text corresponding to the target class.
12. The computer device of claim 11, wherein performing word segmentation on the basic sentence by a preset segmentation method to obtain basic word segments comprises:
segmenting the basic sentence with a conditional random field model to obtain initial word segments;
obtaining the frequency of each initial word segment from historical interview response texts; and
generating a weight for each initial word segment based on its frequency, and taking the initial word segments annotated with their weights as the basic word segments.
13. The computer device of claim 9, wherein the language extraction model is a bidirectional long short-term memory network model comprising a sentence encoder and a document encoder, and extracting sentences from each class of the classified texts through the language extraction model to obtain extracted sentences comprises:
splitting the text in the classified texts into characters through the sentence encoder to obtain basic characters;
encoding the basic characters to obtain the encoded content corresponding to each basic character;
inputting the encoded content into a character encoding layer with initialized weights, mapping each code into a character vector through the character encoding layer, and taking each character vector as the sentence encoding result;
splicing the forward and backward hidden-layer outputs of the sentence encoding result into hidden-layer vectors, and inputting the hidden-layer vectors into the document encoder; and
weighting the hidden-layer vectors through the document encoder to obtain a document feature vector, decoding the document feature vector, and taking the decoded output as the extracted sentences.
  14. The computer device according to claim 13, wherein said weighting the hidden-layer vector through the document encoder to obtain the document feature vector comprises:
    determining the document feature vector by the following formula:
    C_i = Σ_{j=1}^{n} b_ij · h_j
    where C_i is the i-th document feature vector, j is the index of the embedding codes, n is the number of embedding codes, b_ij is the weight of the i-th document feature vector for the j-th hidden-layer vector, and h_j is the j-th hidden-layer vector; the embedding codes are generated based on the hidden states of the bidirectional long short-term memory network model.
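In matrix form, the weighted sum of claim 14 is a single matrix product: stacking the hidden-layer vectors h_j as rows of H and the weights b_ij as rows of B gives C = B @ H. The numpy sketch below assumes the weights b_ij are already available (for example from an attention softmax, which the claim leaves unspecified); the concrete numbers are illustrative only.

```python
import numpy as np

# h_j: n hidden-layer vectors of dimension d, stacked as rows of H.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # n = 3, d = 2

# b_ij: weight of the i-th document feature vector for the j-th
# hidden-layer vector (one row per C_i; a single C_i here).
B = np.array([[0.5, 0.3, 0.2]])

# C_i = sum over j of b_ij * h_j, i.e. C = B @ H.
C = B @ H                           # shape (1, 2)
```

Each row of C is one document feature vector, term-by-term identical to the summation formula above.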
  15. A computer-readable storage medium storing computer-readable instructions, wherein, when the computer-readable instructions are executed by a processor, the following steps are implemented:
    obtaining an interview recording and converting the interview recording into interview text, wherein the interview text includes a self-introduction text and an interview response text;
    performing text parsing on the self-introduction text to obtain basic information of the interviewee;
    performing sentence classification on the interview response text according to the interview angles involved, to obtain classified text;
    performing sentence extraction from each class of the classified text through a language extraction model to obtain extracted sentences, and refining the extracted sentences by using a Transformer model to obtain a refined interview corpus;
    sending the basic information of the interviewee and the refined interview corpus to a management terminal, so that the management terminal determines an interview result according to the basic information of the interviewee and the refined interview corpus.
  16. The computer-readable storage medium according to claim 15, wherein said converting the interview recording into interview text comprises:
    identifying a question-and-answer start mark contained in the interview recording;
    performing text conversion on the interview recording by means of speech-to-text conversion, using the text converted from the recorded content before the question-and-answer start mark as the self-introduction text, and using the text converted from the recorded content after the question-and-answer start mark as the interview response text.
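The before/after split of claim 16 reduces to partitioning the transcript at the first occurrence of the start mark. A minimal sketch follows; the `[Q&A]` mark string, the function name, and the fallback behavior when the mark is absent are all illustrative assumptions, not part of the claim.

```python
def split_interview_text(transcript, start_mark):
    """Split a speech-to-text transcript into a self-introduction part
    (before the question-and-answer start mark) and a response part
    (after the mark)."""
    before, sep, after = transcript.partition(start_mark)
    if not sep:
        # Mark absent: treat the whole transcript as self-introduction.
        return transcript.strip(), ""
    return before.strip(), after.strip()

intro, responses = split_interview_text(
    "My name is Li Lei. [Q&A] Q1: Tell me about a project...",
    "[Q&A]")
```

`str.partition` splits only at the first occurrence, so later repetitions of the mark inside the answers are preserved in the response text.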
  17. The computer-readable storage medium according to claim 15, wherein said performing sentence classification on the interview response text according to the interview angles involved to obtain the classified text comprises:
    using each sentence in the interview response text as a basic sentence, and performing word segmentation processing on the basic sentence through a preset word segmentation method to obtain basic segmented words;
    converting the basic segmented words into word vectors, and clustering the word vectors through a clustering algorithm to obtain a cluster center corresponding to the basic sentence;
    for each basic sentence, calculating the Euclidean distance between the cluster center corresponding to the basic sentence and the word vector corresponding to each preset interview angle, using the preset interview angle with the smallest distance as the target class of the basic sentence, and using the basic sentence as the classified text corresponding to the target class.
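The nearest-angle assignment of claim 17 can be sketched as follows. For simplicity the cluster center is taken as the mean of the sentence's word vectors, standing in for the full clustering step the claim leaves open; the angle names, the toy 2-D vectors, and the `classify_sentence` helper are assumptions of this sketch.

```python
import numpy as np

def classify_sentence(word_vectors, angle_vectors):
    """Assign a sentence to the preset interview angle whose vector is
    nearest (Euclidean distance) to the sentence's cluster center."""
    # Cluster center: here simply the mean of the sentence's word vectors.
    center = np.mean(word_vectors, axis=0)
    names = list(angle_vectors)
    # Euclidean distance from the center to each preset angle vector.
    dists = [np.linalg.norm(center - angle_vectors[name]) for name in names]
    return names[int(np.argmin(dists))]

angles = {"teamwork": np.array([1.0, 0.0]),
          "skills":   np.array([0.0, 1.0])}
label = classify_sentence(np.array([[0.9, 0.1],
                                    [0.8, 0.0]]), angles)
```

The sentence is then filed under `label` as the classified text for that target class.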
  18. The computer-readable storage medium according to claim 17, wherein said performing word segmentation processing on the basic sentence through a preset word segmentation method to obtain basic segmented words comprises:
    using a conditional random field model to segment the basic sentence to obtain initial segmented words;
    obtaining the word frequency of each initial segmented word from historical interview response texts;
    generating a weight for each initial segmented word based on its word frequency, and using the initial segmented words labeled with weights as the basic segmented words.
  19. The computer-readable storage medium according to claim 15, wherein the language extraction model is a bidirectional long short-term memory (BiLSTM) network model comprising a sentence encoder and a document encoder, and said performing sentence extraction from each class of the classified text through the language extraction model to obtain extracted sentences comprises:
    splitting the text in the classified text into characters through the sentence encoder to obtain basic characters;
    encoding the basic characters to obtain encoded content corresponding to the basic characters;
    inputting the encoded content into a character encoding layer with initialized weights, mapping each encoding into a character vector through the character encoding layer, and using each character vector as a sentence encoding result;
    concatenating the forward and backward hidden-layer outputs of the sentence encoding results into a hidden-layer vector, and inputting the hidden-layer vector into the document encoder;
    weighting the hidden-layer vector through the document encoder to obtain a document feature vector, decoding the document feature vector, and using the decoded output result as the extracted sentences.
  20. The computer-readable storage medium according to claim 19, wherein said weighting the hidden-layer vector through the document encoder to obtain the document feature vector comprises:
    determining the document feature vector by the following formula:
    C_i = Σ_{j=1}^{n} b_ij · h_j
    where C_i is the i-th document feature vector, j is the index of the embedding codes, n is the number of embedding codes, b_ij is the weight of the i-th document feature vector for the j-th hidden-layer vector, and h_j is the j-th hidden-layer vector; the embedding codes are generated based on the hidden states of the bidirectional long short-term memory network model.
PCT/CN2020/118928 2020-04-29 2020-09-29 Artificial intelligence-based interview content refining method, apparatus and device, and medium WO2021218028A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010356767.3 2020-04-29
CN202010356767.3A CN111695338A (en) 2020-04-29 2020-04-29 Interview content refining method, device, equipment and medium based on artificial intelligence

Publications (1)

Publication Number Publication Date
WO2021218028A1 true WO2021218028A1 (en) 2021-11-04

Family

ID=72476872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118928 WO2021218028A1 (en) 2020-04-29 2020-09-29 Artificial intelligence-based interview content refining method, apparatus and device, and medium

Country Status (2)

Country Link
CN (1) CN111695338A (en)
WO (1) WO2021218028A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695338A (en) * 2020-04-29 2020-09-22 平安科技(深圳)有限公司 Interview content refining method, device, equipment and medium based on artificial intelligence
CN113761143A (en) * 2020-11-13 2021-12-07 北京京东尚科信息技术有限公司 Method, apparatus, device and medium for determining answers to user questions
CN112466308B (en) * 2020-11-25 2024-09-06 北京明略软件系统有限公司 Auxiliary interview method and system based on voice recognition
CN113449095A (en) * 2021-07-02 2021-09-28 中国工商银行股份有限公司 Interview data analysis method and device
CN113709028A (en) * 2021-09-29 2021-11-26 五八同城信息技术有限公司 Interview video processing method and device, electronic equipment and readable medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170357945A1 (en) * 2016-06-14 2017-12-14 Recruiter.AI, Inc. Automated matching of job candidates and job listings for recruitment
CN109919564A (en) * 2018-12-20 2019-06-21 平安科技(深圳)有限公司 Interview optimization method and device, storage medium, computer equipment
CN110223742A (en) * 2019-06-14 2019-09-10 中南大学 The clinical manifestation information extraction method and equipment of Chinese electronic health record data
CN110472647A (en) * 2018-05-10 2019-11-19 百度在线网络技术(北京)有限公司 Artificial intelligence-based second-round interview method, apparatus, and storage medium
CN110543639A (en) * 2019-09-12 2019-12-06 扬州大学 English sentence simplification algorithm based on pre-trained Transformer language model
CN111695338A (en) * 2020-04-29 2020-09-22 平安科技(深圳)有限公司 Interview content refining method, device, equipment and medium based on artificial intelligence


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062134A (en) * 2022-08-17 2022-09-16 腾讯科技(深圳)有限公司 Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN115062134B (en) * 2022-08-17 2022-11-08 腾讯科技(深圳)有限公司 Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN116304111A (en) * 2023-04-10 2023-06-23 大连数通云网络科技有限公司 AI call optimization processing method and server based on visual service data
CN116304111B (en) * 2023-04-10 2024-02-20 深圳市兴海物联科技有限公司 AI call optimization processing method and server based on visual service data
CN116229955A (en) * 2023-05-09 2023-06-06 海尔优家智能科技(北京)有限公司 Interactive intention information determining method based on generated pre-training GPT model
CN116229955B (en) * 2023-05-09 2023-08-18 海尔优家智能科技(北京)有限公司 Interactive intention information determining method based on generated pre-training GPT model

Also Published As

Publication number Publication date
CN111695338A (en) 2020-09-22

Similar Documents

Publication Publication Date Title
WO2021218028A1 (en) Artificial intelligence-based interview content refining method, apparatus and device, and medium
CN110795543B (en) Unstructured data extraction method, device and storage medium based on deep learning
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
WO2021121198A1 (en) Semantic similarity-based entity relation extraction method and apparatus, device and medium
WO2022095380A1 (en) Ai-based virtual interaction model generation method and apparatus, computer device and storage medium
CN112131350B (en) Text label determining method, device, terminal and readable storage medium
WO2021218029A1 (en) Artificial intelligence-based interview method and apparatus, computer device, and storage medium
CN112633003B (en) Address recognition method and device, computer equipment and storage medium
CN112328761B (en) Method and device for setting intention label, computer equipment and storage medium
CN111930792B (en) Labeling method and device for data resources, storage medium and electronic equipment
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
WO2021139316A1 (en) Method and apparatus for establishing expression recognition model, and computer device and storage medium
US20140255886A1 (en) Systems and Methods for Content Scoring of Spoken Responses
CN113627797B (en) Method, device, computer equipment and storage medium for generating staff member portrait
CN116402166B (en) Training method and device of prediction model, electronic equipment and storage medium
CN113392265A (en) Multimedia processing method, device and equipment
CN113421551B (en) Speech recognition method, speech recognition device, computer readable medium and electronic equipment
CN107844531B (en) Answer output method and device and computer equipment
CN111126084B (en) Data processing method, device, electronic equipment and storage medium
CN113505786A (en) Test question photographing and judging method and device and electronic equipment
CN113705191A (en) Method, device and equipment for generating sample statement and storage medium
CN110647613A (en) Courseware construction method, courseware construction device, courseware construction server and storage medium
CN115273856A (en) Voice recognition method and device, electronic equipment and storage medium
CN113822040B (en) Subjective question scoring method, subjective question scoring device, computer equipment and storage medium
CN116701593A (en) Chinese question-answering model training method based on GraphQL and related equipment thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932958

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20932958

Country of ref document: EP

Kind code of ref document: A1