CN110619035A - Method, device and equipment for identifying keywords in interview video and storage medium

Info

Publication number
CN110619035A
CN110619035A
Authority
CN
China
Prior art keywords
probability
recognized
feature
keyword
word vector
Prior art date
Legal status
Granted
Application number
CN201910706481.0A
Other languages
Chinese (zh)
Other versions
CN110619035B (en)
Inventor
金戈
徐亮
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910706481.0A
Priority to PCT/CN2019/117928 (WO2021017296A1)
Publication of CN110619035A
Application granted
Publication of CN110619035B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/10 Office automation; Time management
    • G06Q 10/105 Human resources
    • G06Q 10/1053 Employment or hiring
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems


Abstract

The application relates to the field of neural networks, and provides a method, an apparatus, a device and a storage medium for identifying keywords in an interview video. The method comprises the following steps: training a multi-view self-training neural network model with a plurality of training texts; converting a collected voice signal into a text to be recognized; extracting a plurality of words to be recognized from the text to be recognized, the words to be recognized being words for which the interviewer feeds back whether or not they are labeled; generating prompt information and displaying it together with the words to be recognized, the prompt information being used to prompt the interviewer to label the words to be recognized; calculating, with the multi-view self-training neural network model, the keyword probability that each word to be recognized is a keyword; when a keyword probability falls within the probability threshold range, marking all words to be recognized within that range as keywords; and sending the keywords and a notification message to at least one interview server. With this scheme, the accuracy of identifying keywords in text can be improved.

Description

Method, device and equipment for identifying keywords in interview video and storage medium
Technical Field
The present application relates to the field of neural networks, and in particular, to a method, an apparatus, a device, and a storage medium for identifying keywords in an interview video.
Background
With the rapid development of information technology, AI technology has been applied in many industries, and human resources is a typical field in which it is widely used. Both image processing and speech processing have advanced rapidly with AI. In speech in particular, new systems emerge one after another, and speech recognition, speech conversion, speech interaction, speech synthesis and the like have gradually matured, bringing unprecedented opportunities for the development of speech technology.
However, existing speech technology remains at the stage of roughly estimating speech semantics. Although some keywords in a speech signal can be recognized by comparing the similarity between the speech signal and the visible lip shape of the speaker, such methods still cannot accurately locate the details within the keywords, so accurate information often cannot be obtained. As a result, existing speech technology cannot be popularized in more fields, especially in the field of video interviews.
Disclosure of Invention
The application provides a method, an apparatus, a device and a storage medium for identifying keywords in an interview video, which can solve the prior-art problem of low accuracy in acquiring a speaker's keywords from a voice signal.
In a first aspect, the present application provides a method for identifying keywords in an interview video, the method comprising:
inputting a plurality of collected training texts into a multi-view self-training neural network model to train the multi-view self-training neural network model, wherein the training texts are used for training the multi-view self-training neural network model;
collecting a voice signal, calling a voice recognition system, and converting the voice signal into a text to be recognized;
extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are words for which the interviewer feeds back whether or not they are labeled;
generating prompt information, and displaying the prompt information and the words to be recognized, wherein the prompt information is used for prompting an interviewer to label the words to be recognized;
inputting the multiple words to be recognized into the multi-view self-training neural network model, and calculating the keyword probability of each word to be recognized as a keyword;
comparing the keyword probability with a probability threshold, and when the keyword probability is in the range of the probability threshold, marking all the words to be recognized in the range of the probability threshold as keywords;
and sending the keywords and a notification message to at least one interview server in a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
In some possible designs, the inputting the collected plurality of training texts into the multi-view self-training neural network model includes:
dividing the training text into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting a first feature of the first word vector and extracting a second feature of the second word vector;
inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first keyword probability or second keyword probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
and when any first keyword probability or second keyword probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
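As a concrete illustration of this threshold-adjustment rule, here is a minimal sketch; the probabilities are plain floats and all names are illustrative, not taken from the patent.

```python
def adjust_threshold(probabilities, lower, upper):
    """Widen the [lower, upper] probability-threshold range so it covers
    every first/second keyword probability observed during training."""
    for p in probabilities:
        if p < lower:
            lower = p   # below the range: becomes the new lower limit
        elif p > upper:
            upper = p   # above the range: becomes the new upper limit
    return lower, upper

# Usage: keyword probabilities from one training pass
lower, upper = adjust_threshold([0.42, 0.91, 0.38], lower=0.40, upper=0.85)
print(lower, upper)  # 0.38 0.91
```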
In some possible designs, the extracting first features of the first word vector and extracting second features of the second word vector includes:
inputting the first word vector and the second word vector into a GRU encoder respectively;
and respectively carrying out conversion and feature extraction operations on the first word vector and the second word vector in a GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
In some possible designs, the performing, in the GRU encoder, conversion and feature extraction operations on the first word vector and the second word vector, respectively, to obtain the first feature in the first word vector and the second feature in the second word vector includes:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the time t and the second word vector at the time t;
hidden layer information in the first word vector and the second word vector is respectively calculated according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is:

$h_t' = \sigma(W^O h_t)$

where $h_t'$ is the extracted feature (the first feature or the second feature), $h_t$ is the hidden layer information, $W^O$ is a preset feature weight matrix, and $\sigma$ is a computation function. The first feature and the second feature obtained by this calculation are the keyword candidate features.
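For reference, the gate computations described above follow the standard GRU update, with the output projection $h_t' = \sigma(W^O h_t)$ appended for feature extraction. The NumPy sketch below is a minimal illustration under that assumption, taking $\sigma$ to be the logistic sigmoid; the gate weight names (`W_r`, `U_r`, ...) are not specified by the patent and are chosen here for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_feature_step(x_t, h_prev, p):
    """One GRU step plus the output projection h_t' = sigma(W_o h_t).
    x_t: word vector at time t; h_prev: hidden layer at time t-1."""
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)              # reset gate
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)              # update gate
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev))   # candidate hidden layer
    h_t = (1 - z) * h_prev + z * h_cand                          # hidden layer information
    h_feat = sigmoid(p["W_o"] @ h_t)                             # feature extraction h_t'
    return h_t, h_feat

# Usage with random weights (input dim 3, hidden dim 4)
rng = np.random.default_rng(0)
shapes = {"W_r": (4, 3), "U_r": (4, 4), "W_z": (4, 3), "U_z": (4, 4),
          "W_h": (4, 3), "U_h": (4, 4), "W_o": (4, 4)}
params = {name: rng.normal(size=s) for name, s in shapes.items()}
h_t, feature = gru_feature_step(rng.normal(size=3), np.zeros(4), params)
```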
In some possible designs, the inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature includes:
calculating a first probability of the first feature from the main view by softmax, using the first probability formula $p_1 = NN(h_t') = \mathrm{softmax}(U \cdot \mathrm{ReLU}(W(h_t') + b))$, where $p_1$ is the first probability, $h_t'$ is the first feature at time $t$, $\mathrm{ReLU}$ is an activation function, $U$ and $W$ are preset probability matrices, $b$ is a keyword probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and softmax is a computation function;

adjusting the first probability with a loss function from the auxiliary view to obtain the first keyword probability $p_{\mathrm{labeled}}$, where $h_t'$ is the first feature, $p_1$ is the first probability, $N$ is the number of first features, and $\mathrm{CE}$ is the loss function.
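A minimal sketch of the main-view computation $p_1 = \mathrm{softmax}(U \cdot \mathrm{ReLU}(W h_t' + b))$ and its loss-based adjustment. Since the source text does not reproduce the adjustment formula, the batch-averaged cross-entropy below is only one plausible reading, and all names are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def first_probability(h_feat, U, W, b):
    """Main view: p1 = NN(h_t') = softmax(U . ReLU(W h_t' + b))."""
    return softmax(U @ np.maximum(0.0, W @ h_feat + b))

def first_keyword_probability(features, labels, U, W, b):
    """Adjust p1 with a CE loss over the N labeled first features
    (assumed form: average cross-entropy against the interviewer labels)."""
    probs = [first_probability(h, U, W, b) for h in features]
    ce = float(np.mean([-np.log(p[y] + 1e-12) for p, y in zip(probs, labels)]))
    return probs, ce

# Usage: one 4-dim labeled feature with a binary keyword label
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(8, 4)), rng.normal(size=(2, 8)), 0.1
probs, loss = first_keyword_probability([rng.normal(size=4)], [1], U, W, b)
```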
In some possible designs, the inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature includes:
and comprehensively calculating a second probability from the main view and the auxiliary views by using the second probability formulas:

$p_2^{fwd} = NN_{fwd}(h_t'(x_t))$

$p_2^{bwd} = NN_{bwd}(h_t'(x_t))$

$p_2^{future} = NN_{future}(h_t'(x_t))$

$p_2^{past} = NN_{past}(h_t'(x_t))$

where $p_2^{fwd}$ is the second probability of the previous moment, $p_2^{bwd}$ is the second probability of the later moment, $p_2^{future}$ is the second probability of the future moment, $p_2^{past}$ is the second probability of the past moment, $h_t'$ is the second feature at time $t$, $NN_{fwd}$, $NN_{bwd}$, $NN_{future}$ and $NN_{past}$ are the computation functions of the corresponding second probabilities, and $x_t$ is the second feature input at time $t$; $p_2^{past}$, $p_2^{fwd}$, $p_2^{bwd}$ and $p_2^{future}$ are arranged in time from left to right;

and adjusting the second probability with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability $p_{\mathrm{unlabeled}}$, where $\theta$ ranges over the four auxiliary views fwd, bwd, future and past, $p_2^{\theta}$ is the second probability corresponding to auxiliary view $\theta$, $N$ is the number of second features, and $D_{\theta}$ is the loss function of auxiliary view $\theta$.
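The four view functions $NN_{fwd}$, $NN_{bwd}$, $NN_{future}$ and $NN_{past}$ can be pictured as independent classifier heads over the same feature $h_t'(x_t)$. The sketch below assumes a single linear layer plus softmax per view; the patent names only the functions, not their internal structure.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

VIEWS = ["fwd", "bwd", "future", "past"]  # the four auxiliary views (theta)

def second_probabilities(h_feat, heads):
    """p2^theta = NN_theta(h_t'(x_t)) for each auxiliary view theta;
    heads maps each view to its (assumed) linear-layer weight matrix."""
    return {v: softmax(heads[v] @ h_feat) for v in VIEWS}

# Usage: a 4-dim second feature and random per-view heads.
# Aggregation into the second keyword probability uses the per-view
# loss functions D_theta, whose formula the source text does not reproduce.
rng = np.random.default_rng(1)
heads = {v: rng.normal(size=(2, 4)) for v in VIEWS}
p2 = second_probabilities(rng.normal(size=4), heads)
```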
In some possible designs, the extracting a plurality of words to be recognized from the text to be recognized includes:
segmenting the text to be recognized into words, tagging each segmented word with its part of speech, retaining only the words whose part of speech is noun, verb, adjective or adverb, and taking each retained word as a node;

calculating the weight value of each node in the text to be recognized, wherein the weight calculation formula is:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $In(V_i)$ is the set of nodes pointing to node $V_i$, $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $Out(V_j)$ is the set of nodes pointed to by node $V_j$, node $V_k$ is a node pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;

dividing the weight value of each node by the maximum weight value in the weight value set to obtain the normalized weight value of each node in the text to be recognized, and taking the words corresponding to the nodes whose normalized weight values are greater than a preset weight threshold as the words to be recognized.
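This extraction procedure is essentially TextRank over a word co-occurrence graph. The sketch below is a compact illustration under that assumption, with uniform co-occurrence edge weights, a fixed iteration count, and illustrative names; word segmentation and part-of-speech filtering are assumed to have been done upstream.

```python
from collections import defaultdict

def textrank_words(words, window=2, d=0.85, iters=30, threshold=0.5):
    """Score candidate words: build a co-occurrence graph over a sliding
    window, iterate WS(Vi) = (1-d) + d * sum_j [w_ji / sum_k w_jk] * WS(Vj),
    normalize by the maximum score, and keep words above the threshold."""
    w = defaultdict(float)          # symmetric edge weights w_ji
    nbrs = defaultdict(set)
    for i, wi in enumerate(words):
        for wj in words[i + 1 : i + 1 + window]:
            if wi != wj:
                w[(wi, wj)] += 1.0
                w[(wj, wi)] += 1.0
                nbrs[wi].add(wj)
                nbrs[wj].add(wi)
    ws = {v: 1.0 for v in nbrs}     # initial weight value per node
    for _ in range(iters):
        ws = {
            vi: (1 - d) + d * sum(
                w[(vj, vi)] / sum(w[(vj, vk)] for vk in nbrs[vj]) * ws[vj]
                for vj in nbrs[vi]
            )
            for vi in nbrs
        }
    top = max(ws.values())          # normalize by the maximum weight value
    return [v for v, s in ws.items() if s / top > threshold]

# Usage on a toy token list
print(textrank_words(["neural", "network", "model", "keyword", "network", "model"]))
```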
In a second aspect, the present application provides an apparatus for identifying keywords in an interview video, which has a function of implementing the method for identifying keywords in an interview video corresponding to the first aspect. The functions can be realized by hardware, and can also be realized by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, which may be software and/or hardware.
In one possible design, the means for identifying keywords in the interview video comprises:
the input and output module is used for inputting a plurality of collected training texts into a multi-view self-training neural network model so as to train the multi-view self-training neural network model, and the training texts are used for training the multi-view self-training neural network model;
the processing module is used for acquiring a voice signal, calling a voice recognition system and converting the voice signal into a text to be recognized; extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are words for which the interviewer feeds back whether or not they are labeled; and generating prompt information;
the display module is used for displaying the prompt information and the words to be recognized, and the prompt information is used for prompting the interviewer to label the words to be recognized;
the processing module is further used for inputting the plurality of words to be recognized into the multi-view self-training neural network model through the input and output module, and calculating the keyword probability that each word to be recognized is a keyword; comparing the keyword probability with a probability threshold, and when the keyword probability is within the probability threshold range, marking all the words to be recognized within that range as keywords; and sending the keywords and a notification message to at least one interview server in a preset interview server list through the input/output module, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
In some possible designs, the processing module is specifically configured to:
dividing the training text into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting a first feature of the first word vector and extracting a second feature of the second word vector;
inputting the first characteristic and the second characteristic into a multi-view self-training neural network model through the input and output module, and respectively calculating to obtain a first keyword probability corresponding to the first characteristic and a second keyword probability corresponding to the second characteristic;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first keyword probability or second keyword probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
and when any first keyword probability or second keyword probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
In some possible designs, the processing module is specifically configured to:
inputting the first word vector and the second word vector into a GRU encoder respectively;
and respectively carrying out conversion and feature extraction operations on the first word vector and the second word vector in a GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
In some possible designs, the processing module is specifically configured to:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the time t and the second word vector at the time t;
hidden layer information in the first word vector and the second word vector is respectively calculated according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is:

$h_t' = \sigma(W^O h_t)$

where $h_t'$ is the extracted feature (the first feature or the second feature), $h_t$ is the hidden layer information, $W^O$ is a preset feature weight matrix, and $\sigma$ is a computation function. The first feature and the second feature obtained by this calculation are the keyword candidate features.
In some possible designs, the processing module is specifically configured to:
calculating a first probability of the first feature from the main view by softmax, using the first probability formula $p_1 = NN(h_t') = \mathrm{softmax}(U \cdot \mathrm{ReLU}(W(h_t') + b))$, where $p_1$ is the first probability, $h_t'$ is the first feature at time $t$, $\mathrm{ReLU}$ is an activation function, $U$ and $W$ are preset probability matrices, $b$ is a keyword probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and softmax is a computation function;

adjusting the first probability with a loss function from the auxiliary view to obtain the first keyword probability $p_{\mathrm{labeled}}$, where $h_t'$ is the first feature, $p_1$ is the first probability, $N$ is the number of first features, and $\mathrm{CE}$ is the loss function.
In some possible designs, the processing module is specifically configured to:
and comprehensively calculating a second probability from the main view and the auxiliary views by using the second probability formulas:

$p_2^{fwd} = NN_{fwd}(h_t'(x_t))$

$p_2^{bwd} = NN_{bwd}(h_t'(x_t))$

$p_2^{future} = NN_{future}(h_t'(x_t))$

$p_2^{past} = NN_{past}(h_t'(x_t))$

where $p_2^{fwd}$ is the second probability of the previous moment, $p_2^{bwd}$ is the second probability of the later moment, $p_2^{future}$ is the second probability of the future moment, $p_2^{past}$ is the second probability of the past moment, $h_t'$ is the second feature at time $t$, $NN_{fwd}$, $NN_{bwd}$, $NN_{future}$ and $NN_{past}$ are the computation functions of the corresponding second probabilities, and $x_t$ is the second feature input at time $t$; $p_2^{past}$, $p_2^{fwd}$, $p_2^{bwd}$ and $p_2^{future}$ are arranged in time from left to right;

and adjusting the second probability with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability $p_{\mathrm{unlabeled}}$, where $\theta$ ranges over the four auxiliary views fwd, bwd, future and past, $p_2^{\theta}$ is the second probability corresponding to auxiliary view $\theta$, $N$ is the number of second features, and $D_{\theta}$ is the loss function of auxiliary view $\theta$.
In some possible designs, the processing module is specifically configured to:
segmenting the text to be recognized into words, tagging each segmented word with its part of speech, retaining only the words whose part of speech is noun, verb, adjective or adverb, and taking each retained word as a node;

calculating the weight value of each node in the text to be recognized, wherein the weight calculation formula is:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $In(V_i)$ is the set of nodes pointing to node $V_i$, $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $Out(V_j)$ is the set of nodes pointed to by node $V_j$, node $V_k$ is a node pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;

dividing the weight value of each node by the maximum weight value in the weight value set to obtain the normalized weight value of each node in the text to be recognized, and taking the words corresponding to the nodes whose normalized weight values are greater than a preset weight threshold as the words to be recognized.
A further aspect of the application provides a computer device comprising at least one connected processor, memory and transceiver, wherein the memory is configured to store program code and the processor is configured to invoke the program code in the memory to perform the method of the first aspect.
A further aspect of the present application provides a computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
Compared with the prior art, in the scheme provided by the application, the multi-view self-training model is trained, the voice signal is converted into a text to be recognized, and the keywords in the text to be recognized are identified based on the multi-view self-training model; that is, the neural network model is trained from the main view and from multiple auxiliary views respectively, which improves the accuracy and hit rate of keyword identification, i.e., the recognition precision of the neural network model. In addition, the method by which the GRU encoder extracts text features is improved, so keywords can be accurately located in the text, further improving the accuracy of keyword extraction.
Drawings
FIG. 1 is a flowchart illustrating a method for identifying keywords in an interview video according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a structure of an apparatus for identifying keywords in an interview video according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a computer device in an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not explicitly listed or inherent to such process, method, article, or apparatus, and such that a division of modules presented in this application is merely a logical division that may be implemented in an actual implementation in another manner, such that multiple modules may be combined or integrated in another system, or some features may be omitted, or not implemented.
The application provides a method, a device, equipment and a storage medium for identifying keywords in an interview video, which can be used for video interview or voice interview and also can be used for emotion analysis of a speaker, and the application scene of the scheme is not limited.
Referring to fig. 1, a method for identifying keywords in an interview video in an embodiment of the application is described as follows, where the method includes:
101. and inputting a plurality of collected training texts into a multi-view self-training neural network model so as to train the multi-view self-training neural network model.
Wherein the training text is used for training the multi-view self-training neural network model.
The training texts are obtained from a text database, which is a preset text repository provided by the business demander and stores a plurality of training texts. The training texts contain keywords describing the working ability and qualities that meet the interview requirements.
In some embodiments, the inputting the collected plurality of training texts into the multi-view self-training neural network model comprises:
dividing the training text into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting a first feature of the first word vector and extracting a second feature of the second word vector;
inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first keyword probability or second keyword probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
and when any first keyword probability or second keyword probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
Therefore, all the first keyword probabilities and all the second keyword probabilities calculated through the multi-view self-training neural network model are respectively compared with preset probability threshold values, the range of the probability threshold values is adjusted, and a more accurate keyword identification range can be obtained.
In some embodiments, the extracting first features of the first word vector and extracting second features of the second word vector comprises:
inputting the first word vector and the second word vector into a Gated Recurrent Unit (GRU) encoder;
and respectively carrying out conversion and feature extraction operations on the first word vector and the second word vector in a GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
102. Collecting voice signals, calling a voice recognition system, and converting the voice signals into texts to be recognized.
Specifically, a voice signal can be detected in real time through a voice receiving device. The voice signal is the interview speech uttered by the applicant in the interview environment, and it triggers the voice recognition system to convert it into the text to be recognized, which serves as the basis for identifying keywords. The main purpose of step 102 is to convert the voice signal into a text to be recognized, which reduces the difficulty of speech recognition and makes keywords easier to identify.
The voice recognition system is a preset system; specifically, Baidu speech recognition, iFLYTEK speech recognition, Alibaba Cloud speech recognition or the like can be selected, and the specific choice is not limited in this application.
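As a sketch of how this step might be wired up, the client below is hypothetical: each vendor exposes its own SDK, and the patent does not fix an interface, so `AsrClient.transcribe` here only stands in for whichever service is configured.

```python
class AsrClient:
    """Stand-in for a configured speech-recognition service
    (Baidu, iFLYTEK, Alibaba Cloud, ...); this class is hypothetical."""

    def transcribe(self, audio_bytes: bytes) -> str:
        raise NotImplementedError  # delegated to the chosen vendor SDK

def speech_to_text(audio_bytes: bytes, asr: AsrClient) -> str:
    """Convert the collected interview speech into the text to be recognized."""
    return asr.transcribe(audio_bytes).strip()
```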
103. And extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are words for which the interviewer feeds back whether or not they are labeled.
In some embodiments, the extracting a plurality of words to be recognized from the text to be recognized includes:
segmenting the text to be recognized into words, tagging each segmented word with its part of speech, retaining only the words whose part of speech is noun, verb, adjective or adverb, and taking each retained word as a node;

calculating the weight value of each node in the text to be recognized, wherein the weight calculation formula is:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $In(V_i)$ is the set of nodes pointing to node $V_i$, $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $Out(V_j)$ is the set of nodes pointed to by node $V_j$, node $V_k$ is a node pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;

dividing the weight value of each node by the maximum weight value in the weight value set to obtain the normalized weight value of each node in the text to be recognized, and taking the words corresponding to the nodes whose normalized weight values are greater than a preset weight threshold as the words to be recognized.
104. And generating prompt information, displaying the prompt information and the words to be recognized, inputting the words to be recognized into the multi-view self-training neural network model, and calculating the keyword probability of each word to be recognized as a keyword.
The prompt information is used for prompting the interviewer to label the words to be recognized.
The keywords refer to words in the text to be recognized whose weight values are higher than a preset weight threshold; the weight threshold for keywords may be set according to dimensions such as word frequency, part of speech and text topic, which is not limited in this application.
The keyword probability refers to the probability that a word in the text to be recognized is a keyword.
In the embodiment of the present application, whether to label or not can be freely selected by the interviewer, and the interviewer can make text labels or not in the words to be recognized, which is not limited in the present application.
The words to be recognized are classified according to whether they are labeled, and the probability that a word to be recognized is a keyword is calculated accordingly. Specifically, in this application the keyword probability is divided into a first keyword probability and a second keyword probability.
Optionally, in some embodiments of the present application, the multi-view self-trained neural network model calculates the first keyword probability and the second keyword probability from the main view and the auxiliary view. The main view refers to the current time, the auxiliary view includes a future time, a past time, a previous time and a next time, the future time does not include the next time, and the past time does not include the previous time.
The following describes the procedures for calculating the first keyword probability and the second keyword probability, according to whether the feature is a first feature or a second feature.
(1) If the first feature is obtained, the process of calculating the probability of the first keyword is as follows:
calculating a first probability of the first feature from the main view by softmax, using the first probability formula $p_1 = NN(h_t') = \mathrm{softmax}(U \cdot \mathrm{ReLU}(W(h_t') + b))$, where $p_1$ is the first probability, $h_t'$ is the first feature at time $t$, $\mathrm{ReLU}$ is an activation function, $U$ and $W$ are preset probability matrices, $b$ is a keyword probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and softmax is a computation function;

adjusting the first probability with a loss function from the auxiliary view to obtain the first keyword probability $p_{\mathrm{labeled}}$, where $h_t'$ is the first feature, $p_1$ is the first probability, $N$ is the number of first features, and $\mathrm{CE}$ is the loss function.
(2) If the feature is the second feature, the process of calculating the second keyword probability is as follows:
and comprehensively calculating a second probability from the main view and the auxiliary views by using the second probability formulas:

$p_2^{fwd} = NN_{fwd}(h_t'(x_t))$

$p_2^{bwd} = NN_{bwd}(h_t'(x_t))$

$p_2^{future} = NN_{future}(h_t'(x_t))$

$p_2^{past} = NN_{past}(h_t'(x_t))$

where $p_2^{fwd}$ is the second probability of the previous moment, $p_2^{bwd}$ is the second probability of the later moment, $p_2^{future}$ is the second probability of the future moment, $p_2^{past}$ is the second probability of the past moment, $h_t'$ is the second feature at time $t$, $NN_{fwd}$, $NN_{bwd}$, $NN_{future}$ and $NN_{past}$ are the computation functions of the corresponding second probabilities, and $x_t$ is the second feature input at time $t$; $p_2^{past}$, $p_2^{fwd}$, $p_2^{bwd}$ and $p_2^{future}$ are arranged in time from left to right;

and adjusting the second probability with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability $p_{\mathrm{unlabeled}}$, where $\theta$ ranges over the four auxiliary views fwd, bwd, future and past, $p_2^{\theta}$ is the second probability corresponding to auxiliary view $\theta$, $N$ is the number of second features, and $D_{\theta}$ is the loss function of auxiliary view $\theta$.
In this embodiment of the application, the probability threshold range used in steps 102 to 105 to judge whether a word to be recognized is a keyword can be continuously adjusted as keyword probabilities are calculated.
105. And comparing the keyword probability with a probability threshold, and when the keyword probability is in the range of the probability threshold, marking the words to be recognized in the range of the probability threshold as key words.
In some embodiments, when the annotation instruction of the interviewer for the word to be recognized is detected, the keyword probability that the word to be recognized is the keyword is calculated by the calculation method for calculating the first keyword probability in step 104.
In other embodiments, when it is detected that the interviewer has not labeled the word to be recognized, the keyword probability that the word to be recognized is a keyword is calculated by the method for calculating the second keyword probability in step 104.
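Step 105 itself reduces to a range check over the computed keyword probabilities. A minimal sketch, assuming the probabilities are held in a dict keyed by word (names illustrative):

```python
def mark_keywords(word_probs, lower, upper):
    """Mark as keywords all words to be recognized whose keyword
    probability falls within the [lower, upper] threshold range."""
    return [word for word, p in word_probs.items() if lower <= p <= upper]

# Usage
print(mark_keywords({"python": 0.82, "hello": 0.12}, lower=0.40, upper=0.95))
# -> ['python']
```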
106. And sending the keywords and a notification message to at least one interview server in a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
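A minimal sketch of this dispatch step, assuming each interview server in the preset list exposes an HTTP endpoint; the URL path and payload fields are illustrative, not part of the patent.

```python
import json
from urllib import request

def notify_interview_servers(server_urls, keywords):
    """Send the identified keywords and an upload reminder to every
    interview server in the preset list (endpoint is an assumption)."""
    payload = json.dumps({
        "keywords": keywords,
        "notification": "please upload the final interview result in time",
    }).encode("utf-8")
    for base_url in server_urls:
        req = request.Request(
            base_url + "/interview/result",  # illustrative endpoint
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        request.urlopen(req)  # fire-and-forget for the sketch
```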
Compared with the existing mechanism, in the embodiment of the application, the multi-view self-training model is trained, the voice signal is converted into the text to be recognized, and the keywords in the text to be recognized are recognized based on the multi-view self-training model, namely, the neural network model is trained from the main view and the multiple auxiliary views respectively, so that the accuracy and hit rate of recognizing the keywords can be improved, and the recognition precision of the neural network model is improved.
Optionally, in some embodiments of the present application, the performing, in a GRU encoder, conversion and feature extraction operations on a first word vector and a second word vector respectively to obtain the first feature in the first word vector and the second feature in the second word vector includes:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the time t and the second word vector at the time t;
hidden layer information in the first word vector and the second word vector is respectively calculated according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is:

$h_t' = \sigma(W^O h_t)$

where $h_t'$ is the extracted feature (the first feature or the second feature), $h_t$ is the hidden layer information, $W^O$ is a preset feature weight matrix, and $\sigma$ is a computation function. The first feature and the second feature obtained by this calculation are the keyword candidate features.
Therefore, the method for extracting the text features by the GRU encoder is improved, the purpose of accurately positioning the keywords in the text can be achieved, and the accuracy rate of extracting the keywords in the text is further improved.
The technical features mentioned in the embodiment or implementation manner corresponding to fig. 1 are also applicable to the embodiments corresponding to fig. 2 and fig. 3 in the present application, and the details of the following similarities are not repeated.
A method for identifying keywords in an interview video in the present application is described above, and an apparatus for performing the method for identifying keywords in an interview video is described below.
Fig. 2 is a schematic structural diagram of an apparatus 20 for identifying keywords in an interview video, which can be applied to video interviews. The apparatus 20 in the embodiment of the present application is capable of implementing steps corresponding to the method for identifying keywords in an interview video performed in the embodiment corresponding to fig. 1. The functions implemented by the apparatus 20 may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, which may be software and/or hardware. The apparatus 20 may include an input/output module 201, a processing module 202, and a display module 203, and the processing module 202, the input/output module 201, and the display module 203 may refer to operations performed in the embodiment corresponding to fig. 1, which are not described herein again. The processing module 202 may be used to control the input and output operations of the input and output module 201 and control the display operation of the display module 203.
In some embodiments, the input/output module 201 is configured to input a plurality of collected training texts into a multi-view self-training neural network model to train the multi-view self-training neural network model, where the training texts are used for training the multi-view self-training neural network model;
the processing module 202 may be configured to collect a voice signal, call a voice recognition system, and convert the voice signal into a text to be recognized; extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are the words to be recognized which are fed back by the interviewee and are marked or not; generating prompt information;
the display module 202 may be configured to display the prompt information and the to-be-recognized word, where the prompt information is used to prompt the interviewee to label the to-be-recognized word;
the processing module 202 is further configured to input the multiple to-be-recognized words into the multi-view self-training neural network model through the input/output module 201, and calculate a keyword probability that each to-be-recognized word is a keyword; comparing the keyword probability with a probability threshold, and marking the words to be recognized in the range of the probability threshold as keywords when the keyword probability is in the range of the probability threshold; and sending the keyword and the notification message to at least one interview server in the interview server list through the input/output module 201 according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
Compared with the existing mechanism, in the embodiment of the application, the processing module 202 trains the multi-view self-training model, converts the voice signal into the text to be recognized, and recognizes the keywords in the text to be recognized based on the multi-view self-training model, that is, trains the neural network model from the main view and the multiple auxiliary views respectively, so that the accuracy and hit rate of recognizing the keywords can be improved, that is, the recognition accuracy of the neural network model is improved.
In some embodiments, the processing module 202 is specifically configured to:
dividing the training text into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting a first feature of the first word vector and extracting a second feature of the second word vector;
inputting the first feature and the second feature into a multi-view self-training neural network model through the input/output module 201, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first keyword probability or second keyword probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
and when any first keyword probability or second keyword probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
In some embodiments, the processing module 202 is specifically configured to:
inputting the first word vector and the second word vector into a GRU encoder through the input-output module 201, respectively;
and respectively carrying out conversion and feature extraction operations on the first word vector and the second word vector in a GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
In some embodiments, the processing module 202 is specifically configured to:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the time t and the second word vector at the time t;
hidden layer information in the first word vector and the second word vector is respectively calculated according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is:

$h_t' = \sigma(W^O h_t)$

where $h_t'$ is the extracted feature (the first feature or the second feature), $h_t$ is the hidden layer information, $W^O$ is a preset feature weight matrix, and $\sigma$ is a computation function. The first feature and the second feature obtained by this calculation are the keyword candidate features.
In some embodiments, the processing module 202 is specifically configured to:
calculating a first probability of the first feature from the main view by softmax, using the first probability formula $p_1 = NN(h_t') = \mathrm{softmax}(U \cdot \mathrm{ReLU}(W(h_t') + b))$, where $p_1$ is the first probability, $h_t'$ is the first feature at time $t$, $\mathrm{ReLU}$ is an activation function, $U$ and $W$ are preset probability matrices, $b$ is a keyword probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and softmax is a computation function;

adjusting the first probability with a loss function from the auxiliary view to obtain the first keyword probability $p_{\mathrm{labeled}}$, where $h_t'$ is the first feature, $p_1$ is the first probability, $N$ is the number of first features, and $\mathrm{CE}$ is the loss function.
In some embodiments, the processing module 202 is specifically configured to:
and comprehensively calculating a second probability from the main view and the auxiliary views by using the second probability formulas:

$p_2^{fwd} = NN_{fwd}(h_t'(x_t))$

$p_2^{bwd} = NN_{bwd}(h_t'(x_t))$

$p_2^{future} = NN_{future}(h_t'(x_t))$

$p_2^{past} = NN_{past}(h_t'(x_t))$

where $p_2^{fwd}$ is the second probability of the previous moment, $p_2^{bwd}$ is the second probability of the later moment, $p_2^{future}$ is the second probability of the future moment, $p_2^{past}$ is the second probability of the past moment, $h_t'$ is the second feature at time $t$, $NN_{fwd}$, $NN_{bwd}$, $NN_{future}$ and $NN_{past}$ are the computation functions of the corresponding second probabilities, and $x_t$ is the second feature input at time $t$; $p_2^{past}$, $p_2^{fwd}$, $p_2^{bwd}$ and $p_2^{future}$ are arranged in time from left to right;

and adjusting the second probability with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability $p_{\mathrm{unlabeled}}$, where $\theta$ ranges over the four auxiliary views fwd, bwd, future and past, $p_2^{\theta}$ is the second probability corresponding to auxiliary view $\theta$, $N$ is the number of second features, and $D_{\theta}$ is the loss function of auxiliary view $\theta$.
In some embodiments, the processing module 202 is specifically configured to:
segmenting the text to be recognized into words, tagging each segmented word with its part of speech, retaining only the words whose part of speech is noun, verb, adjective or adverb, and taking each retained word as a node;

calculating the weight value of each node in the text to be recognized, wherein the weight calculation formula is:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $In(V_i)$ is the set of nodes pointing to node $V_i$, $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $Out(V_j)$ is the set of nodes pointed to by node $V_j$, node $V_k$ is a node pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;

dividing the weight value of each node by the maximum weight value in the weight value set to obtain the normalized weight value of each node in the text to be recognized, and taking the words corresponding to the nodes whose normalized weight values are greater than a preset weight threshold as the words to be recognized.
The physical device corresponding to the input/output module 201 shown in fig. 2 is the input/output unit shown in fig. 3, and the input/output unit can implement part or all of the functions of the input/output module 201, or implement the same or similar functions as the input/output module 201.
The physical device corresponding to the processing module 202 shown in fig. 2 is a processor shown in fig. 3, and the processor can implement part or all of the functions of the processing module 202, or implement the same or similar functions as the processing module 202.
The physical device corresponding to the display module 203 shown in fig. 2 is a processor shown in fig. 3, and the processor can implement part or all of the functions of the display module 203, or implement the same or similar functions as the display module 203.
The apparatus 20 of the embodiments of the present application is described above from the perspective of modular functional entities; the following describes a computer device from a hardware perspective. As shown in fig. 3, the computer device includes: a processor, a memory, a transceiver (which may also be an input-output unit, not labeled in fig. 3), and a computer program stored in the memory and executable on the processor. For example, the computer program may be a program corresponding to the method for identifying keywords in an interview video in the embodiment corresponding to fig. 1. When the computer device implements the functions of the apparatus 20 shown in fig. 2, the processor executes the computer program to implement the steps of the method for identifying keywords in an interview video performed by the apparatus 20 in the embodiment corresponding to fig. 2; alternatively, the processor implements the functions of the modules of the apparatus 20 in the embodiment corresponding to fig. 2 when executing the computer program.
The processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or any conventional processor. The processor is the control center of the computer device and connects the various parts of the overall computer device using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor implements various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to use of the device (such as audio data and video data). In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid state storage device.
The transceiver may also be replaced by a receiver and a transmitter, which may be the same physical entity or different physical entities; when they are the same physical entity, they may be collectively referred to as a transceiver. The transceiver may be an input/output unit.
The memory may be integrated in the processor or may be provided separately from the processor.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly can also be implemented by hardware; in many cases, however, the former is the preferred implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM) and includes several instructions for enabling a terminal (e.g., a mobile phone, a computer, a server, or a network device) to execute the methods according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the drawings, but the present application is not limited to the above-mentioned embodiments, which are illustrative rather than restrictive. Those skilled in the art may derive many further forms without departing from the spirit and scope of the present application and its protection, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.

Claims (10)

1. A method for identifying keywords in an interview video, the method comprising:
inputting a plurality of collected training texts into a multi-view self-training neural network model so as to train the multi-view self-training neural network model;
collecting a voice signal, calling a voice recognition system, and converting the voice signal into a text to be recognized;
extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are words fed back by the interviewee and may be either labeled or unlabeled;
generating prompt information, and displaying the prompt information and the words to be recognized, wherein the prompt information is used for prompting an interviewer to label the words to be recognized;
inputting the multiple words to be recognized into the multi-view self-training neural network model, and calculating the keyword probability of each word to be recognized as a keyword;
comparing the keyword probability with a probability threshold, and when the keyword probability is in the range of the probability threshold, marking all the words to be recognized in the range of the probability threshold as keywords;
and sending the keyword and a notification message to at least one interview server in a preset interview server list, wherein the notification message is used for prompting the interview server to upload the final interview result in time.
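For illustration only, the following is a minimal Python sketch of the thresholding step at the end of claim 1. The function and variable names are hypothetical, and the probability-threshold range is assumed to be a closed interval [lower, upper]; this is a sketch of the step, not the patented implementation.

```python
# Hypothetical sketch of the final labeling step of claim 1.
from typing import Dict, List

def label_keywords(word_probs: Dict[str, float],
                   lower: float, upper: float) -> List[str]:
    """Return every word whose keyword probability falls inside the
    probability-threshold range [lower, upper]."""
    return [word for word, p in word_probs.items() if lower <= p <= upper]

# Toy example: two of the three candidate words fall inside the range.
probs = {"python": 0.91, "teamwork": 0.78, "lunch": 0.12}
print(label_keywords(probs, lower=0.5, upper=1.0))  # ['python', 'teamwork']
```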
2. The method of claim 1, wherein inputting the collected plurality of training texts into a multi-view self-training neural network model comprises:
dividing the training text into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector and the second training text into a second word vector according to a coding rule;
extracting a first feature of the first word vector and extracting a second feature of the second word vector;
inputting the first feature and the second feature into the multi-view self-training neural network model, and respectively calculating a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when either the first keyword probability or the second keyword probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
and when either the first keyword probability or the second keyword probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
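A short sketch of the dynamic threshold update in claim 2, under the reading that the compared quantities are the first and second keyword probabilities; names and initial bounds are hypothetical.

```python
from typing import Tuple

def update_threshold(prob: float, lower: float, upper: float) -> Tuple[float, float]:
    """Widen the probability-threshold range: a keyword probability below
    the current lower limit becomes the new lower limit; one above the
    current upper limit becomes the new upper limit."""
    if prob < lower:
        lower = prob
    elif prob > upper:
        upper = prob
    return lower, upper

lower, upper = 0.4, 0.8
for p in (0.35, 0.9, 0.6):
    lower, upper = update_threshold(p, lower, upper)
print(lower, upper)  # 0.35 0.9
```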
3. The method of claim 2, wherein extracting the first feature of the first word vector and extracting the second feature of the second word vector comprises:
inputting the first word vector and the second word vector into a GRU encoder respectively;
and respectively carrying out conversion and feature extraction operations on the first word vector and the second word vector in a GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
4. The method of claim 3, wherein performing the transforming and feature extracting operations on the first word vector and the second word vector in the GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector respectively comprises:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the time t and the second word vector at the time t;
hidden layer information in the first word vector and the second word vector is respectively calculated according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is:
h_t' = σ(W_O · h_t)
where h_t' is the first feature or the second feature, h_t is the hidden layer information, W_O is the feature weight matrix, which is a preset matrix, and σ is the calculation function. The first feature and the second feature obtained by this calculation are the features of the keywords.
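The GRU computation in claims 3 and 4 can be sketched as follows. This is a toy NumPy rendering under the standard GRU equations, not the patented implementation: the weight matrices are random stand-ins for the preset matrices, and bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_feature_step(x_t, h_prev, W_r, W_z, W_h, W_o):
    """One GRU step followed by the feature projection h_t' = sigmoid(W_o h_t)."""
    xh = np.concatenate([x_t, h_prev])
    r = sigmoid(W_r @ xh)                                       # reset gate
    z = sigmoid(W_z @ xh)                                       # update gate
    h_cand = np.tanh(W_h @ np.concatenate([x_t, r * h_prev]))   # candidate hidden layer
    h_t = (1 - z) * h_prev + z * h_cand                         # hidden layer information
    return sigmoid(W_o @ h_t), h_t                              # (feature h_t', new hidden state)

# Toy dimensions: 4-dim word vector, 3-dim hidden layer.
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
W_r, W_z, W_h = (rng.normal(size=(d_h, d_in + d_h)) for _ in range(3))
W_o = rng.normal(size=(d_h, d_h))
feat, h = gru_feature_step(rng.normal(size=d_in), np.zeros(d_h), W_r, W_z, W_h, W_o)
print(feat.shape)  # (3,)
```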
5. The method according to any one of claims 2-4, wherein the inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature comprises:
calculating a first probability of the first feature from the main perspective by softmax, using a first probability formula:
p_1 = NN(h_t') = softmax(U · ReLU(W(h_t') + b))
where p_1 is the first probability, h_t' is the first feature at time t, ReLU is the activation function, U and W are probability matrices, which are preset matrices, b is the keyword probability parameter, a preset constant used to compensate for the error of the first keyword probability calculation, and softmax is a calculation function;
and adjusting the first probability with a loss function from an auxiliary perspective to obtain the first keyword probability p_label, where p_label is the first keyword probability, h_t' is the first feature, p_1 is the first probability, N is the number of first features, and CE is the loss function used in the adjustment.
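A small sketch of the first-probability formula in claim 5; the preset matrices U and W, the constant b, and the dimensions are hypothetical stand-ins.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def first_probability(h_t, U, W, b):
    """p1 = softmax(U . ReLU(W(h_t') + b)) from claim 5."""
    relu = np.maximum(0.0, W @ h_t + b)
    return softmax(U @ relu)

rng = np.random.default_rng(1)
h_t = rng.normal(size=3)          # first feature h_t'
W = rng.normal(size=(5, 3))
U = rng.normal(size=(2, 5))       # two classes: keyword / non-keyword
p1 = first_probability(h_t, U, W, b=0.1)
print(p1, p1.sum())               # probabilities summing to 1
```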
6. The method according to any one of claims 2-4, wherein the inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature comprises:
and comprehensively calculating a second probability from the main view and the auxiliary views by adopting a second probability formula, the second probability formula being:
p_2^fwd = NN_fwd(h_t'(x_t))
p_2^bwd = NN_bwd(h_t'(x_t))
p_2^future = NN_future(h_t'(x_t))
p_2^past = NN_past(h_t'(x_t))
where p_2^fwd is the second probability of the previous moment, p_2^bwd is the second probability of the later moment, p_2^future is the second probability of the future moment, p_2^past is the second probability of the past moment, h_t' is the second feature at time t, NN_fwd, NN_bwd, NN_future and NN_past are the calculation functions of the corresponding second probabilities, and x_t is the second feature input at time t. p_2^past, p_2^fwd, p_2^bwd and p_2^future are arranged in time order from left to right;
and adjusting the second probability by using the loss functions corresponding to the four auxiliary views to obtain the second keyword probability p_unlabeled, where p_unlabeled is the second keyword probability, θ ranges over the four auxiliary views fwd, bwd, future and past, p_2^θ is the second probability of the corresponding auxiliary view, N is the number of second features, and D_θ is the loss function of the corresponding auxiliary view.
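Because the combination formula for the second keyword probability appears only as an image in the source, the sketch below shows just the structure of claim 6: one second probability per auxiliary view, each adjusted by a per-view loss. The per-view scorers, loss penalties, and the final averaging are assumptions for illustration, not the patent's formula.

```python
import numpy as np

def second_keyword_probability(h_t, view_nets, view_losses):
    """Structural sketch of claim 6: compute one second probability per
    auxiliary view (fwd, bwd, future, past), adjust each with the
    corresponding loss function, and combine (here: a simple average)."""
    p2 = {name: net(h_t) for name, net in view_nets.items()}
    adjusted = [p - view_losses[name](p) for name, p in p2.items()]
    return float(np.mean(adjusted))

# Hypothetical per-view scorers (sigmoid of a dot product) and loss penalties.
views = ("fwd", "bwd", "future", "past")
rng = np.random.default_rng(2)
weights = {v: rng.normal(size=3) for v in views}
nets = {v: (lambda h, w=weights[v]: 1.0 / (1.0 + np.exp(-w @ h))) for v in views}
losses = {v: (lambda p: 0.05 * (1.0 - p)) for v in views}
print(second_keyword_probability(rng.normal(size=3), nets, losses))
```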
7. The method of claim 1, wherein the extracting a plurality of words to be recognized from the text to be recognized comprises:
dividing the text to be recognized into words, making part-of-speech identifications on the divided words, keeping only the words whose part of speech is a noun, verb, adjective or adverb, and taking each retained word as a node;
calculating the weight value of each node in the text to be recognized, the weight value calculation formula being:
WS(V_i) = (1 - d) + d · Σ_{V_j ∈ In(V_i)} [ w_ji / Σ_{V_k ∈ Out(V_j)} w_jk ] · WS(V_j)
where WS(V_i) is the weight value of node V_i in the text to be recognized, d is a damping coefficient, which is a preset constant, In(V_i) is the set of nodes pointing to node V_i, w_ji is the weight between node V_i and node V_j, Out(V_j) is the set of nodes pointed to by node V_j, node V_k is a node pointed to by node V_j, w_jk is the weight between node V_k and node V_j, and WS(V_j) is the weight value of node V_j in the text to be recognized;
dividing the weighted value of each node by the maximum weighted value in the weighted value set to obtain the normalized weighted value of each node in the text to be recognized, and taking the word corresponding to the node of which the normalized weighted value is greater than the preset weighted value threshold value in the text to be recognized as the word to be recognized.
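Claim 7's weight calculation is the weighted TextRank iteration over a word co-occurrence graph; the following is a compact sketch under that reading. The graph, the damping value d = 0.85, and the iteration count are illustrative.

```python
def textrank_weights(edges, d=0.85, iters=30):
    """Weighted TextRank as in claim 7:
    WS(Vi) = (1 - d) + d * sum_j [w_ji / sum_k w_jk] * WS(Vj),
    where `edges` maps each node to {neighbour: edge weight} over an
    undirected co-occurrence graph (each edge listed in both directions)."""
    ws = {v: 1.0 for v in edges}
    out_sum = {v: sum(nbrs.values()) for v, nbrs in edges.items()}
    for _ in range(iters):
        ws = {v: (1 - d) + d * sum(w / out_sum[j] * ws[j]
                                   for j, w in nbrs.items())
              for v, nbrs in edges.items()}
    return ws

# Tiny co-occurrence graph over three candidate words.
g = {"python": {"code": 2.0, "interview": 1.0},
     "code": {"python": 2.0, "interview": 1.0},
     "interview": {"python": 1.0, "code": 1.0}}
ws = textrank_weights(g)
top = max(ws.values())
print({w: round(s / top, 3) for w, s in ws.items()})  # normalized weight values
```

Dividing each weight value by the maximum, as in the final step of claim 7, yields the normalized weight values printed above.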
8. An apparatus for identifying keywords in an interview video, the apparatus comprising:
the input and output module is used for inputting a plurality of collected training texts into a multi-view self-training neural network model so as to train the multi-view self-training neural network model;
the processing module is used for acquiring a voice signal, calling a voice recognition system and converting the voice signal into a text to be recognized; extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are words fed back by the interviewee and may be either labeled or unlabeled; and generating prompt information;
the display module is used for displaying the prompt information and the words to be recognized, and the prompt information is used for prompting the interviewer to label the words to be recognized;
the processing module is further used for inputting the multiple words to be recognized into the multi-view self-training neural network model through the input and output module, and calculating the keyword probability of each word to be recognized as a keyword; comparing the keyword probability with a probability threshold, and when the keyword probability is in the range of the probability threshold, marking all the words to be recognized in the range of the probability threshold as keywords; and sending the keyword and the notification message to at least one interview server in the interview server list through the input/output module according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
9. A computer device, characterized in that the computer device comprises:
at least one processor, memory, and transceiver;
wherein the memory is configured to store program code and the processor is configured to invoke the program code stored in the memory to perform the method of any of claims 1-7.
10. A computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1-7.
CN201910706481.0A 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video Active CN110619035B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910706481.0A CN110619035B (en) 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video
PCT/CN2019/117928 WO2021017296A1 (en) 2019-08-01 2019-11-13 Information recognition method, device, apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910706481.0A CN110619035B (en) 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video

Publications (2)

Publication Number Publication Date
CN110619035A true CN110619035A (en) 2019-12-27
CN110619035B CN110619035B (en) 2023-07-25

Family

ID=68921514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910706481.0A Active CN110619035B (en) 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video

Country Status (2)

Country Link
CN (1) CN110619035B (en)
WO (1) WO2021017296A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377965B (en) * 2021-06-30 2024-02-23 中国农业银行股份有限公司 Method and related device for sensing text keywords
CN116882416B (en) * 2023-09-08 2023-11-21 江西省精彩纵横采购咨询有限公司 Information identification method and system for bidding documents

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962247B (en) * 2018-08-13 2023-01-31 南京邮电大学 Multi-dimensional voice information recognition system and method based on progressive neural network
CN109871446B (en) * 2019-01-31 2023-06-06 平安科技(深圳)有限公司 Refusing method in intention recognition, electronic device and storage medium
CN109979439B (en) * 2019-03-22 2021-01-29 泰康保险集团股份有限公司 Voice recognition method, device, medium and electronic equipment based on block chain

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180204111A1 (en) * 2013-02-28 2018-07-19 Z Advanced Computing, Inc. System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform
CN103943107A (en) * 2014-04-03 2014-07-23 北京大学深圳研究生院 Audio/video keyword identification method based on decision-making level fusion
CN105740900A (en) * 2016-01-29 2016-07-06 百度在线网络技术(北京)有限公司 Information identification method and apparatus
CN108549626A (en) * 2018-03-02 2018-09-18 广东技术师范学院 A kind of keyword extracting method for admiring class
CN108419123A (en) * 2018-03-28 2018-08-17 广州市创新互联网教育研究院 A kind of virtual sliced sheet method of instructional video
CN108806668A (en) * 2018-06-08 2018-11-13 国家计算机网络与信息安全管理中心 A kind of audio and video various dimensions mark and model optimization method
CN109147763A (en) * 2018-07-10 2019-01-04 深圳市感动智能科技有限公司 A kind of audio-video keyword recognition method and device based on neural network and inverse entropy weighting
CN109697973A (en) * 2019-01-22 2019-04-30 清华大学深圳研究生院 A kind of method, the method and device of model training of prosody hierarchy mark

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ming Sun et al.: "Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting", arXiv, pages 1-7 *
Themos Stafylakis et al.: "Zero-shot keyword spotting for visual speech recognition in-the-wild", Proceedings of the European Conference on Computer Vision (ECCV), pages 513-529 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049372A (en) * 2022-08-15 2022-09-13 山东心法科技有限公司 Method, apparatus and medium for constructing digital infrastructure for human resource information
CN115049372B (en) * 2022-08-15 2022-12-02 山东心法科技有限公司 Method, apparatus and medium for constructing digital infrastructure for human resource information
CN116366801A (en) * 2023-06-03 2023-06-30 深圳市小麦飞扬科技有限公司 Multi-terminal interaction system for recruitment information
CN116366801B (en) * 2023-06-03 2023-10-13 深圳市小麦飞扬科技有限公司 Multi-terminal interaction system for recruitment information
CN116862318A (en) * 2023-09-04 2023-10-10 国电投华泽(天津)资产管理有限公司 New energy project evaluation method and device based on text semantic feature extraction
CN116862318B (en) * 2023-09-04 2023-11-17 国电投华泽(天津)资产管理有限公司 New energy project evaluation method and device based on text semantic feature extraction

Also Published As

Publication number Publication date
CN110619035B (en) 2023-07-25
WO2021017296A1 (en) 2021-02-04

Similar Documents

Publication Publication Date Title
CN110619035A (en) Method, device and equipment for identifying keywords in interview video and storage medium
CN109117777B (en) Method and device for generating information
WO2020244073A1 (en) Speech-based user classification method and device, computer apparatus, and storage medium
CN109872162B (en) Wind control classification and identification method and system for processing user complaint information
US20180157743A1 (en) Method and System for Multi-Label Classification
CN111695352A (en) Grading method and device based on semantic analysis, terminal equipment and storage medium
CN111046133A (en) Question-answering method, question-answering equipment, storage medium and device based on atlas knowledge base
CN107491435B (en) Method and device for automatically identifying user emotion based on computer
CN111261162B (en) Speech recognition method, speech recognition apparatus, and storage medium
CN111159409B (en) Text classification method, device, equipment and medium based on artificial intelligence
CN108038208B (en) Training method and device of context information recognition model and storage medium
Van Durme Streaming analysis of discourse participants
CN111930792A (en) Data resource labeling method and device, storage medium and electronic equipment
CN113949582B (en) Network asset identification method and device, electronic equipment and storage medium
CN112235470B (en) Incoming call client follow-up method, device and equipment based on voice recognition
JP2017146720A (en) Patent requirement adequacy prediction device and patent requirement adequacy prediction program
CN114416979A (en) Text query method, text query equipment and storage medium
CN117251551A (en) Natural language processing system and method based on large language model
CN110377708B (en) Multi-scene conversation switching method and device
CN112948550B (en) Schedule creation method and device and electronic equipment
CN113158667B (en) Event detection method based on entity relationship level attention mechanism
CN110265024A (en) Requirement documents generation method and relevant device
CN113934848A (en) Data classification method and device and electronic equipment
CN113569021A (en) Method for user classification, computer device and readable storage medium
CN112784011A (en) Emotional problem processing method, device and medium based on CNN and LSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant