CN110619035A - Method, device and equipment for identifying keywords in interview video and storage medium

Info

Publication number
CN110619035A
CN110619035A
Authority
CN
China
Prior art keywords
probability
recognized
feature
keyword
word vector
Prior art date
Legal status
Granted
Application number
CN201910706481.0A
Other languages
Chinese (zh)
Other versions
CN110619035B (en)
Inventor
金戈
徐亮
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910706481.0A
Priority to PCT/CN2019/117928 (WO2021017296A1)
Publication of CN110619035A
Application granted
Publication of CN110619035B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/10 Office automation; Time management
    • G06Q 10/105 Human resources
    • G06Q 10/1053 Employment or hiring
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems


Abstract

The application relates to the field of neural networks, and provides a method, an apparatus, a device and a storage medium for identifying keywords in an interview video. The method comprises the following steps: training a multi-view self-training neural network model with a plurality of training texts; converting a collected voice signal into a text to be recognized; extracting a plurality of words to be recognized from the text to be recognized, the words to be recognized being words for which the interviewer feeds back whether or not they are labeled; generating prompt information and displaying it together with the words to be recognized, the prompt information being used to prompt the interviewer to label the words to be recognized; calculating, with the multi-view self-training neural network model, the keyword probability that each word to be recognized is a keyword; when a keyword probability falls within the probability threshold range, marking all words to be recognized within that range as keywords; and sending the keywords and a notification message to at least one interview server. With this scheme, the accuracy of identifying keywords in text can be improved.

Description

Method, device and equipment for identifying keywords in interview video and storage medium
Technical Field
The present application relates to the field of neural networks, and in particular, to a method, an apparatus, a device, and a storage medium for identifying keywords in an interview video.
Background
With the rapid development of information technology, AI technology has been applied in many industries, and human resources is a typical field in which it is widely used. Both image processing and speech processing have advanced rapidly with AI. In speech in particular, new systems emerge one after another, and speech recognition, speech conversion, speech interaction, speech synthesis and the like have gradually matured, bringing unprecedented opportunities for the development of speech technology.
However, existing speech technology remains at the stage of roughly estimating speech semantics. Although some keywords in a speech signal can be recognized by comparing the similarity between the speech signal and the visible lip shape of the speaker, such methods still cannot accurately locate the details within the keywords, so accurate information often cannot be obtained. As a result, existing speech technology cannot be popularized in more fields, especially in the field of video interviews.
Disclosure of Invention
The application provides a method, an apparatus, a device and a storage medium for identifying keywords in an interview video, which can solve the prior-art problem of low accuracy in acquiring a speaker's keywords from a voice signal.
In a first aspect, the present application provides a method for identifying keywords in an interview video, the method comprising:
inputting a plurality of collected training texts into a multi-view self-training neural network model to train the multi-view self-training neural network model, wherein the training texts are used for training the multi-view self-training neural network model;
collecting a voice signal, calling a voice recognition system, and converting the voice signal into a text to be recognized;
extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are words for which the interviewer feeds back whether or not they are labeled;
generating prompt information, and displaying the prompt information and the words to be recognized, wherein the prompt information is used for prompting an interviewer to label the words to be recognized;
inputting the multiple words to be recognized into the multi-view self-training neural network model, and calculating the keyword probability of each word to be recognized as a keyword;
comparing the keyword probability with a probability threshold, and when the keyword probability is in the range of the probability threshold, marking all the words to be recognized in the range of the probability threshold as keywords;
and sending the keywords and a notification message to at least one interview server in a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
In some possible designs, the inputting the collected plurality of training texts into the multi-view self-training neural network model includes:
dividing the training text into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting a first feature of the first word vector and extracting a second feature of the second word vector;
inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first keyword probability or second keyword probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
and when any first keyword probability or second keyword probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
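As a concrete illustration of this threshold-adjustment rule, here is a minimal sketch; the probabilities are plain floats and all names are illustrative, not taken from the patent.

```python
def adjust_threshold(probabilities, lower, upper):
    """Widen the [lower, upper] probability-threshold range so it covers
    every first/second keyword probability observed during training."""
    for p in probabilities:
        if p < lower:
            lower = p   # below the range: becomes the new lower limit
        elif p > upper:
            upper = p   # above the range: becomes the new upper limit
    return lower, upper

# Usage: keyword probabilities from one training pass
lower, upper = adjust_threshold([0.42, 0.91, 0.38], lower=0.40, upper=0.85)
print(lower, upper)  # 0.38 0.91
```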
In some possible designs, the extracting first features of the first word vector and extracting second features of the second word vector includes:
inputting the first word vector and the second word vector into a GRU encoder respectively;
and respectively carrying out conversion and feature extraction operations on the first word vector and the second word vector in a GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
In some possible designs, the performing, in the GRU encoder, conversion and feature extraction operations on the first word vector and the second word vector, respectively, to obtain the first feature in the first word vector and the second feature in the second word vector includes:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the time t and the second word vector at the time t;
hidden layer information in the first word vector and the second word vector is respectively calculated according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is:

$h_t' = \sigma(W^O h_t)$

where $h_t'$ is the extracted feature (the first feature or the second feature), $h_t$ is the hidden layer information, $W^O$ is a preset feature weight matrix, and $\sigma$ is a computation function. The first feature and the second feature obtained by this calculation are the keyword candidate features.
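For reference, the gate computations described above follow the standard GRU update, with the output projection $h_t' = \sigma(W^O h_t)$ appended for feature extraction. The NumPy sketch below is a minimal illustration under that assumption, taking $\sigma$ to be the logistic sigmoid; the gate weight names (`W_r`, `U_r`, ...) are not specified by the patent and are chosen here for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_feature_step(x_t, h_prev, p):
    """One GRU step plus the output projection h_t' = sigma(W_o h_t).
    x_t: word vector at time t; h_prev: hidden layer at time t-1."""
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)              # reset gate
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)              # update gate
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev))   # candidate hidden layer
    h_t = (1 - z) * h_prev + z * h_cand                          # hidden layer information
    h_feat = sigmoid(p["W_o"] @ h_t)                             # feature extraction h_t'
    return h_t, h_feat

# Usage with random weights (input dim 3, hidden dim 4)
rng = np.random.default_rng(0)
shapes = {"W_r": (4, 3), "U_r": (4, 4), "W_z": (4, 3), "U_z": (4, 4),
          "W_h": (4, 3), "U_h": (4, 4), "W_o": (4, 4)}
params = {name: rng.normal(size=s) for name, s in shapes.items()}
h_t, feature = gru_feature_step(rng.normal(size=3), np.zeros(4), params)
```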
In some possible designs, the inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature includes:
calculating a first probability of the first feature from the main view by softmax, using the first probability formula $p_1 = NN(h_t') = \mathrm{softmax}(U \cdot \mathrm{ReLU}(W(h_t') + b))$, where $p_1$ is the first probability, $h_t'$ is the first feature at time $t$, $\mathrm{ReLU}$ is an activation function, $U$ and $W$ are preset probability matrices, $b$ is a keyword probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and softmax is a computation function;

adjusting the first probability with a loss function from the auxiliary view to obtain the first keyword probability $p_{\mathrm{labeled}}$, where $h_t'$ is the first feature, $p_1$ is the first probability, $N$ is the number of first features, and $\mathrm{CE}$ is the loss function.
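A minimal sketch of the main-view computation $p_1 = \mathrm{softmax}(U \cdot \mathrm{ReLU}(W h_t' + b))$ and its loss-based adjustment. Since the source text does not reproduce the adjustment formula, the batch-averaged cross-entropy below is only one plausible reading, and all names are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def first_probability(h_feat, U, W, b):
    """Main view: p1 = NN(h_t') = softmax(U . ReLU(W h_t' + b))."""
    return softmax(U @ np.maximum(0.0, W @ h_feat + b))

def first_keyword_probability(features, labels, U, W, b):
    """Adjust p1 with a CE loss over the N labeled first features
    (assumed form: average cross-entropy against the interviewer labels)."""
    probs = [first_probability(h, U, W, b) for h in features]
    ce = float(np.mean([-np.log(p[y] + 1e-12) for p, y in zip(probs, labels)]))
    return probs, ce

# Usage: one 4-dim labeled feature with a binary keyword label
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(8, 4)), rng.normal(size=(2, 8)), 0.1
probs, loss = first_keyword_probability([rng.normal(size=4)], [1], U, W, b)
```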
In some possible designs, the inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature includes:
and comprehensively calculating a second probability from the main view and the auxiliary views by using the second probability formulas:

$p_2^{fwd} = NN_{fwd}(h_t'(x_t))$

$p_2^{bwd} = NN_{bwd}(h_t'(x_t))$

$p_2^{future} = NN_{future}(h_t'(x_t))$

$p_2^{past} = NN_{past}(h_t'(x_t))$

where $p_2^{fwd}$ is the second probability of the previous moment, $p_2^{bwd}$ is the second probability of the later moment, $p_2^{future}$ is the second probability of the future moment, $p_2^{past}$ is the second probability of the past moment, $h_t'$ is the second feature at time $t$, $NN_{fwd}$, $NN_{bwd}$, $NN_{future}$ and $NN_{past}$ are the computation functions of the corresponding second probabilities, and $x_t$ is the second feature input at time $t$; $p_2^{past}$, $p_2^{fwd}$, $p_2^{bwd}$ and $p_2^{future}$ are arranged in time from left to right;

and adjusting the second probability with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability $p_{\mathrm{unlabeled}}$, where $\theta$ ranges over the four auxiliary views fwd, bwd, future and past, $p_2^{\theta}$ is the second probability corresponding to auxiliary view $\theta$, $N$ is the number of second features, and $D_{\theta}$ is the loss function of auxiliary view $\theta$.
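The four view functions $NN_{fwd}$, $NN_{bwd}$, $NN_{future}$ and $NN_{past}$ can be pictured as independent classifier heads over the same feature $h_t'(x_t)$. The sketch below assumes a single linear layer plus softmax per view; the patent names only the functions, not their internal structure.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

VIEWS = ["fwd", "bwd", "future", "past"]  # the four auxiliary views (theta)

def second_probabilities(h_feat, heads):
    """p2^theta = NN_theta(h_t'(x_t)) for each auxiliary view theta;
    heads maps each view to its (assumed) linear-layer weight matrix."""
    return {v: softmax(heads[v] @ h_feat) for v in VIEWS}

# Usage: a 4-dim second feature and random per-view heads.
# Aggregation into the second keyword probability uses the per-view
# loss functions D_theta, whose formula the source text does not reproduce.
rng = np.random.default_rng(1)
heads = {v: rng.normal(size=(2, 4)) for v in VIEWS}
p2 = second_probabilities(rng.normal(size=4), heads)
```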
In some possible designs, the extracting a plurality of words to be recognized from the text to be recognized includes:
segmenting the text to be recognized into words, tagging each segmented word with its part of speech, retaining only the words whose part of speech is noun, verb, adjective or adverb, and taking each retained word as a node;

calculating the weight value of each node in the text to be recognized, wherein the weight calculation formula is:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $In(V_i)$ is the set of nodes pointing to node $V_i$, $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $Out(V_j)$ is the set of nodes pointed to by node $V_j$, node $V_k$ is a node pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;

dividing the weight value of each node by the maximum weight value in the weight value set to obtain the normalized weight value of each node in the text to be recognized, and taking the words corresponding to the nodes whose normalized weight values are greater than a preset weight threshold as the words to be recognized.
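This extraction procedure is essentially TextRank over a word co-occurrence graph. The sketch below is a compact illustration under that assumption, with uniform co-occurrence edge weights, a fixed iteration count, and illustrative names; word segmentation and part-of-speech filtering are assumed to have been done upstream.

```python
from collections import defaultdict

def textrank_words(words, window=2, d=0.85, iters=30, threshold=0.5):
    """Score candidate words: build a co-occurrence graph over a sliding
    window, iterate WS(Vi) = (1-d) + d * sum_j [w_ji / sum_k w_jk] * WS(Vj),
    normalize by the maximum score, and keep words above the threshold."""
    w = defaultdict(float)          # symmetric edge weights w_ji
    nbrs = defaultdict(set)
    for i, wi in enumerate(words):
        for wj in words[i + 1 : i + 1 + window]:
            if wi != wj:
                w[(wi, wj)] += 1.0
                w[(wj, wi)] += 1.0
                nbrs[wi].add(wj)
                nbrs[wj].add(wi)
    ws = {v: 1.0 for v in nbrs}     # initial weight value per node
    for _ in range(iters):
        ws = {
            vi: (1 - d) + d * sum(
                w[(vj, vi)] / sum(w[(vj, vk)] for vk in nbrs[vj]) * ws[vj]
                for vj in nbrs[vi]
            )
            for vi in nbrs
        }
    top = max(ws.values())          # normalize by the maximum weight value
    return [v for v, s in ws.items() if s / top > threshold]

# Usage on a toy token list
print(textrank_words(["neural", "network", "model", "keyword", "network", "model"]))
```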
In a second aspect, the present application provides an apparatus for identifying keywords in an interview video, which has a function of implementing the method for identifying keywords in an interview video corresponding to the first aspect. The functions can be realized by hardware, and can also be realized by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, which may be software and/or hardware.
In one possible design, the means for identifying keywords in the interview video comprises:
the input and output module is used for inputting a plurality of collected training texts into a multi-view self-training neural network model so as to train the multi-view self-training neural network model, and the training texts are used for training the multi-view self-training neural network model;
the processing module is used for acquiring a voice signal, calling a voice recognition system and converting the voice signal into a text to be recognized; extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are words for which the interviewer feeds back whether or not they are labeled; and generating prompt information;
the display module is used for displaying the prompt information and the words to be recognized, and the prompt information is used for prompting the interviewer to label the words to be recognized;
the processing module is further used for inputting the plurality of words to be recognized into the multi-view self-training neural network model through the input and output module, and calculating the keyword probability that each word to be recognized is a keyword; comparing the keyword probability with a probability threshold, and when the keyword probability is within the probability threshold range, marking all the words to be recognized within that range as keywords; and sending the keywords and a notification message to at least one interview server in a preset interview server list through the input/output module, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
In some possible designs, the processing module is specifically configured to:
dividing the training text into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting a first feature of the first word vector and extracting a second feature of the second word vector;
inputting the first characteristic and the second characteristic into a multi-view self-training neural network model through the input and output module, and respectively calculating to obtain a first keyword probability corresponding to the first characteristic and a second keyword probability corresponding to the second characteristic;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first keyword probability or second keyword probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
and when any first keyword probability or second keyword probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
In some possible designs, the processing module is specifically configured to:
inputting the first word vector and the second word vector into a GRU encoder respectively;
and respectively carrying out conversion and feature extraction operations on the first word vector and the second word vector in a GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
In some possible designs, the processing module is specifically configured to:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the time t and the second word vector at the time t;
hidden layer information in the first word vector and the second word vector is respectively calculated according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is:

$h_t' = \sigma(W^O h_t)$

where $h_t'$ is the extracted feature (the first feature or the second feature), $h_t$ is the hidden layer information, $W^O$ is a preset feature weight matrix, and $\sigma$ is a computation function. The first feature and the second feature obtained by this calculation are the keyword candidate features.
In some possible designs, the processing module is specifically configured to:
calculating a first probability of the first feature from the main view by softmax, using the first probability formula $p_1 = NN(h_t') = \mathrm{softmax}(U \cdot \mathrm{ReLU}(W(h_t') + b))$, where $p_1$ is the first probability, $h_t'$ is the first feature at time $t$, $\mathrm{ReLU}$ is an activation function, $U$ and $W$ are preset probability matrices, $b$ is a keyword probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and softmax is a computation function;

adjusting the first probability with a loss function from the auxiliary view to obtain the first keyword probability $p_{\mathrm{labeled}}$, where $h_t'$ is the first feature, $p_1$ is the first probability, $N$ is the number of first features, and $\mathrm{CE}$ is the loss function.
In some possible designs, the processing module is specifically configured to:
and comprehensively calculating a second probability from the main view and the auxiliary views by using the second probability formulas:

$p_2^{fwd} = NN_{fwd}(h_t'(x_t))$

$p_2^{bwd} = NN_{bwd}(h_t'(x_t))$

$p_2^{future} = NN_{future}(h_t'(x_t))$

$p_2^{past} = NN_{past}(h_t'(x_t))$

where $p_2^{fwd}$ is the second probability of the previous moment, $p_2^{bwd}$ is the second probability of the later moment, $p_2^{future}$ is the second probability of the future moment, $p_2^{past}$ is the second probability of the past moment, $h_t'$ is the second feature at time $t$, $NN_{fwd}$, $NN_{bwd}$, $NN_{future}$ and $NN_{past}$ are the computation functions of the corresponding second probabilities, and $x_t$ is the second feature input at time $t$; $p_2^{past}$, $p_2^{fwd}$, $p_2^{bwd}$ and $p_2^{future}$ are arranged in time from left to right;

and adjusting the second probability with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability $p_{\mathrm{unlabeled}}$, where $\theta$ ranges over the four auxiliary views fwd, bwd, future and past, $p_2^{\theta}$ is the second probability corresponding to auxiliary view $\theta$, $N$ is the number of second features, and $D_{\theta}$ is the loss function of auxiliary view $\theta$.
In some possible designs, the processing module is specifically configured to:
segmenting the text to be recognized into words, tagging each segmented word with its part of speech, retaining only the words whose part of speech is noun, verb, adjective or adverb, and taking each retained word as a node;

calculating the weight value of each node in the text to be recognized, wherein the weight calculation formula is:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $In(V_i)$ is the set of nodes pointing to node $V_i$, $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $Out(V_j)$ is the set of nodes pointed to by node $V_j$, node $V_k$ is a node pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;

dividing the weight value of each node by the maximum weight value in the weight value set to obtain the normalized weight value of each node in the text to be recognized, and taking the words corresponding to the nodes whose normalized weight values are greater than a preset weight threshold as the words to be recognized.
A further aspect of the application provides a computer device comprising at least one connected processor, memory and transceiver, wherein the memory is configured to store program code and the processor is configured to invoke the program code in the memory to perform the method of the first aspect.
A further aspect of the present application provides a computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
Compared with the prior art, in the scheme provided by the application, the multi-view self-training model is trained, the voice signal is converted into a text to be recognized, and the keywords in the text to be recognized are identified based on the multi-view self-training model; that is, the neural network model is trained from the main view and from multiple auxiliary views respectively, which improves the accuracy and hit rate of keyword identification, i.e., the recognition precision of the neural network model. In addition, the method by which the GRU encoder extracts text features is improved, so keywords can be accurately located in the text, further improving the accuracy of keyword extraction.
Drawings
FIG. 1 is a flowchart illustrating a method for identifying keywords in an interview video according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a structure of an apparatus for identifying keywords in an interview video according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a computer device in an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not explicitly listed or inherent to such process, method, article, or apparatus, and such that a division of modules presented in this application is merely a logical division that may be implemented in an actual implementation in another manner, such that multiple modules may be combined or integrated in another system, or some features may be omitted, or not implemented.
The application provides a method, a device, equipment and a storage medium for identifying keywords in an interview video, which can be used for video interview or voice interview and also can be used for emotion analysis of a speaker, and the application scene of the scheme is not limited.
Referring to fig. 1, a method for identifying keywords in an interview video in an embodiment of the application is described as follows, where the method includes:
101. and inputting a plurality of collected training texts into a multi-view self-training neural network model so as to train the multi-view self-training neural network model.
Wherein the training text is used for training the multi-view self-training neural network model.
The training texts are obtained from a text database, which is a preset text repository provided by the business demander and stores a plurality of training texts. The training texts contain keywords describing the working ability and qualities that meet the interview requirements.
In some embodiments, the inputting the collected plurality of training texts into the multi-view self-training neural network model comprises:
dividing the training text into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting a first feature of the first word vector and extracting a second feature of the second word vector;
inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first keyword probability or second keyword probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
and when any first keyword probability or second keyword probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
Therefore, all the first keyword probabilities and all the second keyword probabilities calculated through the multi-view self-training neural network model are respectively compared with preset probability threshold values, the range of the probability threshold values is adjusted, and a more accurate keyword identification range can be obtained.
In some embodiments, the extracting first features of the first word vector and extracting second features of the second word vector comprises:
inputting the first word vector and the second word vector into a Gated Recurrent Unit (GRU) encoder;
and respectively carrying out conversion and feature extraction operations on the first word vector and the second word vector in a GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
102. Collecting voice signals, calling a voice recognition system, and converting the voice signals into texts to be recognized.
Specifically, a voice signal can be detected in real time through a voice receiving device. The voice signal is the interview speech uttered by the applicant in the interview environment, and it triggers the voice recognition system to convert it into the text to be recognized, which serves as the basis for identifying keywords. The main purpose of step 102 is to convert the voice signal into a text to be recognized, which reduces the difficulty of speech recognition and makes keywords easier to identify.
The voice recognition system is a preset system; specifically, Baidu speech recognition, iFLYTEK speech recognition, Alibaba Cloud speech recognition or the like can be selected, and the specific choice is not limited in this application.
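As a sketch of how this step might be wired up, the client below is hypothetical: each vendor exposes its own SDK, and the patent does not fix an interface, so `AsrClient.transcribe` here only stands in for whichever service is configured.

```python
class AsrClient:
    """Stand-in for a configured speech-recognition service
    (Baidu, iFLYTEK, Alibaba Cloud, ...); this class is hypothetical."""

    def transcribe(self, audio_bytes: bytes) -> str:
        raise NotImplementedError  # delegated to the chosen vendor SDK

def speech_to_text(audio_bytes: bytes, asr: AsrClient) -> str:
    """Convert the collected interview speech into the text to be recognized."""
    return asr.transcribe(audio_bytes).strip()
```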
103. And extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are words for which the interviewer feeds back whether or not they are labeled.
In some embodiments, the extracting a plurality of words to be recognized from the text to be recognized includes:
segmenting the text to be recognized into words, tagging each segmented word with its part of speech, retaining only the words whose part of speech is noun, verb, adjective or adverb, and taking each retained word as a node;

calculating the weight value of each node in the text to be recognized, wherein the weight calculation formula is:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $In(V_i)$ is the set of nodes pointing to node $V_i$, $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $Out(V_j)$ is the set of nodes pointed to by node $V_j$, node $V_k$ is a node pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;

dividing the weight value of each node by the maximum weight value in the weight value set to obtain the normalized weight value of each node in the text to be recognized, and taking the words corresponding to the nodes whose normalized weight values are greater than a preset weight threshold as the words to be recognized.
104. And generating prompt information, displaying the prompt information and the words to be recognized, inputting the words to be recognized into the multi-view self-training neural network model, and calculating the keyword probability of each word to be recognized as a keyword.
The prompt information is used for prompting the interviewer to label the words to be recognized.
The keywords refer to words in the text to be recognized whose weight values are higher than a preset weight threshold; the weight threshold for keywords may be set according to dimensions such as word frequency, part of speech and text topic, which is not limited in this application.
The keyword probability refers to the probability that a word in the text to be recognized is a keyword.
In the embodiment of the present application, whether to label or not can be freely selected by the interviewer, and the interviewer can make text labels or not in the words to be recognized, which is not limited in the present application.
The words to be recognized are classified according to whether they are labeled, and the probability that a word to be recognized is a keyword is calculated accordingly. Specifically, in this application the keyword probability is divided into a first keyword probability and a second keyword probability.
Optionally, in some embodiments of the present application, the multi-view self-trained neural network model calculates the first keyword probability and the second keyword probability from the main view and the auxiliary view. The main view refers to the current time, the auxiliary view includes a future time, a past time, a previous time and a next time, the future time does not include the next time, and the past time does not include the previous time.
The following describes the procedures for calculating the first keyword probability and the second keyword probability, according to whether the feature is a first feature or a second feature.
(1) If the first feature is obtained, the process of calculating the probability of the first keyword is as follows:
calculating a first probability of the first feature from the main view by softmax, using the first probability formula $p_1 = NN(h_t') = \mathrm{softmax}(U \cdot \mathrm{ReLU}(W(h_t') + b))$, where $p_1$ is the first probability, $h_t'$ is the first feature at time $t$, $\mathrm{ReLU}$ is an activation function, $U$ and $W$ are preset probability matrices, $b$ is a keyword probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and softmax is a computation function;

adjusting the first probability with a loss function from the auxiliary view to obtain the first keyword probability $p_{\mathrm{labeled}}$, where $h_t'$ is the first feature, $p_1$ is the first probability, $N$ is the number of first features, and $\mathrm{CE}$ is the loss function.
(2) If the feature is the second feature, the process of calculating the second keyword probability is as follows:
and comprehensively calculating a second probability from the main view and the auxiliary views by using the second probability formulas:

$p_2^{fwd} = NN_{fwd}(h_t'(x_t))$

$p_2^{bwd} = NN_{bwd}(h_t'(x_t))$

$p_2^{future} = NN_{future}(h_t'(x_t))$

$p_2^{past} = NN_{past}(h_t'(x_t))$

where $p_2^{fwd}$ is the second probability of the previous moment, $p_2^{bwd}$ is the second probability of the later moment, $p_2^{future}$ is the second probability of the future moment, $p_2^{past}$ is the second probability of the past moment, $h_t'$ is the second feature at time $t$, $NN_{fwd}$, $NN_{bwd}$, $NN_{future}$ and $NN_{past}$ are the computation functions of the corresponding second probabilities, and $x_t$ is the second feature input at time $t$; $p_2^{past}$, $p_2^{fwd}$, $p_2^{bwd}$ and $p_2^{future}$ are arranged in time from left to right;

and adjusting the second probability with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability $p_{\mathrm{unlabeled}}$, where $\theta$ ranges over the four auxiliary views fwd, bwd, future and past, $p_2^{\theta}$ is the second probability corresponding to auxiliary view $\theta$, $N$ is the number of second features, and $D_{\theta}$ is the loss function of auxiliary view $\theta$.
In this embodiment of the application, the probability threshold range used in steps 102 to 105 to judge whether a word to be recognized is a keyword can be continuously adjusted as keyword probabilities are calculated.
105. And comparing the keyword probability with a probability threshold, and when the keyword probability is in the range of the probability threshold, marking the words to be recognized in the range of the probability threshold as key words.
In some embodiments, when the annotation instruction of the interviewer for the word to be recognized is detected, the keyword probability that the word to be recognized is the keyword is calculated by the calculation method for calculating the first keyword probability in step 104.
In other embodiments, when it is detected that the interviewer has not labeled the word to be recognized, the keyword probability that the word to be recognized is a keyword is calculated by the method for calculating the second keyword probability in step 104.
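Step 105 itself reduces to a range check over the computed keyword probabilities. A minimal sketch, assuming the probabilities are held in a dict keyed by word (names illustrative):

```python
def mark_keywords(word_probs, lower, upper):
    """Mark as keywords all words to be recognized whose keyword
    probability falls within the [lower, upper] threshold range."""
    return [word for word, p in word_probs.items() if lower <= p <= upper]

# Usage
print(mark_keywords({"python": 0.82, "hello": 0.12}, lower=0.40, upper=0.95))
# -> ['python']
```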
106. And sending the keywords and a notification message to at least one interview server in a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
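A minimal sketch of this dispatch step, assuming each interview server in the preset list exposes an HTTP endpoint; the URL path and payload fields are illustrative, not part of the patent.

```python
import json
from urllib import request

def notify_interview_servers(server_urls, keywords):
    """Send the identified keywords and an upload reminder to every
    interview server in the preset list (endpoint is an assumption)."""
    payload = json.dumps({
        "keywords": keywords,
        "notification": "please upload the final interview result in time",
    }).encode("utf-8")
    for base_url in server_urls:
        req = request.Request(
            base_url + "/interview/result",  # illustrative endpoint
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        request.urlopen(req)  # fire-and-forget for the sketch
```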
Compared with the existing mechanism, in the embodiment of the application, the multi-view self-training model is trained, the voice signal is converted into the text to be recognized, and the keywords in the text to be recognized are recognized based on the multi-view self-training model, namely, the neural network model is trained from the main view and the multiple auxiliary views respectively, so that the accuracy and hit rate of recognizing the keywords can be improved, and the recognition precision of the neural network model is improved.
Optionally, in some embodiments of the present application, the performing, in a GRU encoder, conversion and feature extraction operations on a first word vector and a second word vector respectively to obtain the first feature in the first word vector and the second feature in the second word vector includes:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the time t and the second word vector at the time t;
hidden layer information in the first word vector and the second word vector is respectively calculated according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is:

$h_t' = \sigma(W^O h_t)$

where $h_t'$ is the extracted feature (the first feature or the second feature), $h_t$ is the hidden layer information, $W^O$ is a preset feature weight matrix, and $\sigma$ is a computation function. The first feature and the second feature obtained by this calculation are the keyword candidate features.
Therefore, the method for extracting the text features by the GRU encoder is improved, the purpose of accurately positioning the keywords in the text can be achieved, and the accuracy rate of extracting the keywords in the text is further improved.
The technical features mentioned in the embodiment or implementation manner corresponding to fig. 1 are also applicable to the embodiments corresponding to fig. 2 and fig. 3 in the present application, and the details of the following similarities are not repeated.
A method for identifying keywords in an interview video in the present application is described above, and an apparatus for performing the method for identifying keywords in an interview video is described below.
Fig. 2 is a schematic structural diagram of an apparatus 20 for identifying keywords in an interview video, which can be applied to video interviews. The apparatus 20 in the embodiment of the present application is capable of implementing steps corresponding to the method for identifying keywords in an interview video performed in the embodiment corresponding to fig. 1. The functions implemented by the apparatus 20 may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, which may be software and/or hardware. The apparatus 20 may include an input/output module 201, a processing module 202, and a display module 203, and the processing module 202, the input/output module 201, and the display module 203 may refer to operations performed in the embodiment corresponding to fig. 1, which are not described herein again. The processing module 202 may be used to control the input and output operations of the input and output module 201 and control the display operation of the display module 203.
In some embodiments, the input/output module 201 is configured to input a plurality of collected training texts into a multi-view self-training neural network model to train the multi-view self-training neural network model, where the training texts are used for training the multi-view self-training neural network model;
the processing module 202 may be configured to collect a voice signal, call a voice recognition system, and convert the voice signal into a text to be recognized; extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are the words to be recognized which are fed back by the interviewee and are marked or not; generating prompt information;
the display module 202 may be configured to display the prompt information and the to-be-recognized word, where the prompt information is used to prompt the interviewee to label the to-be-recognized word;
the processing module 202 is further configured to input the multiple to-be-recognized words into the multi-view self-training neural network model through the input/output module 201, and calculate a keyword probability that each to-be-recognized word is a keyword; comparing the keyword probability with a probability threshold, and marking the words to be recognized in the range of the probability threshold as keywords when the keyword probability is in the range of the probability threshold; and sending the keyword and the notification message to at least one interview server in the interview server list through the input/output module 201 according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
Compared with the existing mechanism, in the embodiment of the application, the processing module 202 trains the multi-view self-training model, converts the voice signal into the text to be recognized, and recognizes the keywords in the text to be recognized based on the multi-view self-training model, that is, trains the neural network model from the main view and the multiple auxiliary views respectively, so that the accuracy and hit rate of recognizing the keywords can be improved, that is, the recognition accuracy of the neural network model is improved.
In some embodiments, the processing module 202 is specifically configured to:
dividing the training text into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting a first feature of the first word vector and extracting a second feature of the second word vector;
inputting the first feature and the second feature into a multi-view self-training neural network model through the input/output module 201, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first keyword probability or second keyword probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
and when any first keyword probability or second keyword probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
In some embodiments, the processing module 202 is specifically configured to:
inputting the first word vector and the second word vector into a GRU encoder through the input-output module 201, respectively;
and respectively carrying out conversion and feature extraction operations on the first word vector and the second word vector in a GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
In some embodiments, the processing module 202 is specifically configured to:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the time t and the second word vector at the time t;
hidden layer information in the first word vector and the second word vector is respectively calculated according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is:

$h_t' = \sigma(W^O h_t)$

where $h_t'$ is the extracted feature (the first feature or the second feature), $h_t$ is the hidden layer information, $W^O$ is a preset feature weight matrix, and $\sigma$ is a computation function. The first feature and the second feature obtained by this calculation are the keyword candidate features.
In some embodiments, the processing module 202 is specifically configured to:
calculating a first probability of the first feature from the main view by softmax, using the first probability formula $p_1 = NN(h_t') = \mathrm{softmax}(U \cdot \mathrm{ReLU}(W(h_t') + b))$, where $p_1$ is the first probability, $h_t'$ is the first feature at time $t$, $\mathrm{ReLU}$ is an activation function, $U$ and $W$ are preset probability matrices, $b$ is a keyword probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and softmax is a computation function;

adjusting the first probability with a loss function from the auxiliary view to obtain the first keyword probability $p_{\mathrm{labeled}}$, where $h_t'$ is the first feature, $p_1$ is the first probability, $N$ is the number of first features, and $\mathrm{CE}$ is the loss function.
In some embodiments, the processing module 202 is specifically configured to:
and comprehensively calculating a second probability from the main view and the auxiliary views by using the second probability formulas:

$p_2^{fwd} = NN_{fwd}(h_t'(x_t))$

$p_2^{bwd} = NN_{bwd}(h_t'(x_t))$

$p_2^{future} = NN_{future}(h_t'(x_t))$

$p_2^{past} = NN_{past}(h_t'(x_t))$

where $p_2^{fwd}$ is the second probability of the previous moment, $p_2^{bwd}$ is the second probability of the later moment, $p_2^{future}$ is the second probability of the future moment, $p_2^{past}$ is the second probability of the past moment, $h_t'$ is the second feature at time $t$, $NN_{fwd}$, $NN_{bwd}$, $NN_{future}$ and $NN_{past}$ are the computation functions of the corresponding second probabilities, and $x_t$ is the second feature input at time $t$; $p_2^{past}$, $p_2^{fwd}$, $p_2^{bwd}$ and $p_2^{future}$ are arranged in time from left to right;

and adjusting the second probability with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability $p_{\mathrm{unlabeled}}$, where $\theta$ ranges over the four auxiliary views fwd, bwd, future and past, $p_2^{\theta}$ is the second probability corresponding to auxiliary view $\theta$, $N$ is the number of second features, and $D_{\theta}$ is the loss function of auxiliary view $\theta$.
In some embodiments, the processing module 202 is specifically configured to:
segmenting the text to be recognized into words, tagging each segmented word with its part of speech, retaining only the words whose part of speech is noun, verb, adjective or adverb, and taking each retained word as a node;

calculating the weight value of each node in the text to be recognized, wherein the weight calculation formula is:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $In(V_i)$ is the set of nodes pointing to node $V_i$, $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $Out(V_j)$ is the set of nodes pointed to by node $V_j$, node $V_k$ is a node pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;

dividing the weight value of each node by the maximum weight value in the weight value set to obtain the normalized weight value of each node in the text to be recognized, and taking the words corresponding to the nodes whose normalized weight values are greater than a preset weight threshold as the words to be recognized.
The physical device corresponding to the input/output module 201 shown in fig. 2 is the input/output unit shown in fig. 3, and the input/output unit can implement part or all of the functions of the input/output module 201, or implement the same or similar functions as the input/output module 201.
The physical device corresponding to the processing module 202 shown in fig. 2 is a processor shown in fig. 3, and the processor can implement part or all of the functions of the processing module 202, or implement the same or similar functions as the processing module 202.
The physical device corresponding to the display module 203 shown in fig. 2 is a processor shown in fig. 3, and the processor can implement part or all of the functions of the display module 203, or implement the same or similar functions as the display module 203.
The apparatus 20 of the embodiments of the present application is described above from the perspective of modular functional entities; the following describes a computer device from a hardware perspective. As shown in fig. 3, the computer device includes: a processor, a memory, a transceiver (which may also be an input-output unit, not labeled in fig. 3), and a computer program stored in the memory and executable on the processor. For example, the computer program may be a program corresponding to the method for identifying keywords in an interview video in the embodiment corresponding to fig. 1. When the computer device implements the functions of the apparatus 20 shown in fig. 2, the processor executes the computer program to implement the steps of the method for identifying keywords in an interview video performed by the apparatus 20 in the embodiment corresponding to fig. 2; alternatively, the processor implements the functions of the modules of the apparatus 20 in the embodiment corresponding to fig. 2 when executing the computer program.
The processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or any conventional processor. The processor is the control center of the computer device and connects the various parts of the overall computer device using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor implements various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to use of the device (such as audio data and video data). In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid state storage device.
The transceiver may also be replaced by a receiver and a transmitter, which may be the same physical entity or different physical entities; when they are the same physical entity, they may be collectively referred to as a transceiver. The transceiver may be an input/output unit.
The memory may be integrated in the processor or may be provided separately from the processor.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly can also be implemented by hardware; in many cases, however, the former is the preferred implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM) and includes several instructions for enabling a terminal (e.g., a mobile phone, a computer, a server, or a network device) to execute the methods according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the drawings, but the present application is not limited to the above-mentioned embodiments, which are illustrative rather than restrictive. Those skilled in the art may derive many further forms without departing from the spirit and scope of the present application and its protection, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.

Claims (10)

1. A method for identifying keywords in an interview video, the method comprising:
inputting a plurality of collected training texts into a multi-view self-training neural network model so as to train the multi-view self-training neural network model;
collecting a voice signal, calling a voice recognition system, and converting the voice signal into a text to be recognized;
extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are words fed back by the interviewee and may be either labeled or unlabeled;
generating prompt information, and displaying the prompt information and the words to be recognized, wherein the prompt information is used for prompting an interviewer to label the words to be recognized;
inputting the multiple words to be recognized into the multi-view self-training neural network model, and calculating the keyword probability of each word to be recognized as a keyword;
comparing the keyword probability with a probability threshold, and when the keyword probability is in the range of the probability threshold, marking all the words to be recognized in the range of the probability threshold as keywords;
and sending the keyword and a notification message to at least one interview server in a preset interview server list, wherein the notification message is used for prompting the interview server to upload the final interview result in time.
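For illustration only, the following is a minimal Python sketch of the thresholding step at the end of claim 1. The function and variable names are hypothetical, and the probability-threshold range is assumed to be a closed interval [lower, upper]; this is a sketch of the step, not the patented implementation.

```python
# Hypothetical sketch of the final labeling step of claim 1.
from typing import Dict, List

def label_keywords(word_probs: Dict[str, float],
                   lower: float, upper: float) -> List[str]:
    """Return every word whose keyword probability falls inside the
    probability-threshold range [lower, upper]."""
    return [word for word, p in word_probs.items() if lower <= p <= upper]

# Toy example: two of the three candidate words fall inside the range.
probs = {"python": 0.91, "teamwork": 0.78, "lunch": 0.12}
print(label_keywords(probs, lower=0.5, upper=1.0))  # ['python', 'teamwork']
```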
2. The method of claim 1, wherein inputting the collected plurality of training texts into a multi-view self-training neural network model comprises:
dividing the training text into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector and the second training text into a second word vector according to a coding rule;
extracting a first feature of the first word vector and extracting a second feature of the second word vector;
inputting the first feature and the second feature into the multi-view self-training neural network model, and respectively calculating a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when either the first keyword probability or the second keyword probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
and when either the first keyword probability or the second keyword probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
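A short sketch of the dynamic threshold update in claim 2, under the reading that the compared quantities are the first and second keyword probabilities; names and initial bounds are hypothetical.

```python
from typing import Tuple

def update_threshold(prob: float, lower: float, upper: float) -> Tuple[float, float]:
    """Widen the probability-threshold range: a keyword probability below
    the current lower limit becomes the new lower limit; one above the
    current upper limit becomes the new upper limit."""
    if prob < lower:
        lower = prob
    elif prob > upper:
        upper = prob
    return lower, upper

lower, upper = 0.4, 0.8
for p in (0.35, 0.9, 0.6):
    lower, upper = update_threshold(p, lower, upper)
print(lower, upper)  # 0.35 0.9
```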
3. The method of claim 2, wherein extracting the first feature of the first word vector and extracting the second feature of the second word vector comprises:
inputting the first word vector and the second word vector into a GRU encoder respectively;
and respectively carrying out conversion and feature extraction operations on the first word vector and the second word vector in a GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
4. The method of claim 3, wherein performing the transforming and feature extracting operations on the first word vector and the second word vector in the GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector respectively comprises:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the time t and the second word vector at the time t;
hidden layer information in the first word vector and the second word vector is respectively calculated according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is:
h_t' = σ(W_O · h_t)
where h_t' is the first feature or the second feature, h_t is the hidden layer information, W_O is the feature weight matrix, which is a preset matrix, and σ is the calculation function. The first feature and the second feature obtained by this calculation are the features of the keywords.
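The GRU computation in claims 3 and 4 can be sketched as follows. This is a toy NumPy rendering under the standard GRU equations, not the patented implementation: the weight matrices are random stand-ins for the preset matrices, and bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_feature_step(x_t, h_prev, W_r, W_z, W_h, W_o):
    """One GRU step followed by the feature projection h_t' = sigmoid(W_o h_t)."""
    xh = np.concatenate([x_t, h_prev])
    r = sigmoid(W_r @ xh)                                       # reset gate
    z = sigmoid(W_z @ xh)                                       # update gate
    h_cand = np.tanh(W_h @ np.concatenate([x_t, r * h_prev]))   # candidate hidden layer
    h_t = (1 - z) * h_prev + z * h_cand                         # hidden layer information
    return sigmoid(W_o @ h_t), h_t                              # (feature h_t', new hidden state)

# Toy dimensions: 4-dim word vector, 3-dim hidden layer.
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
W_r, W_z, W_h = (rng.normal(size=(d_h, d_in + d_h)) for _ in range(3))
W_o = rng.normal(size=(d_h, d_h))
feat, h = gru_feature_step(rng.normal(size=d_in), np.zeros(d_h), W_r, W_z, W_h, W_o)
print(feat.shape)  # (3,)
```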
5. The method according to any one of claims 2-4, wherein the inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature comprises:
calculating a first probability of the first feature from the main perspective by softmax, using a first probability formula:
p_1 = NN(h_t') = softmax(U · ReLU(W(h_t') + b))
where p_1 is the first probability, h_t' is the first feature at time t, ReLU is the activation function, U and W are probability matrices, which are preset matrices, b is the keyword probability parameter, a preset constant used to compensate for the error of the first keyword probability calculation, and softmax is a calculation function;
and adjusting the first probability with a loss function from an auxiliary perspective to obtain the first keyword probability p_label, where p_label is the first keyword probability, h_t' is the first feature, p_1 is the first probability, N is the number of first features, and CE is the loss function used in the adjustment.
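A small sketch of the first-probability formula in claim 5; the preset matrices U and W, the constant b, and the dimensions are hypothetical stand-ins.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def first_probability(h_t, U, W, b):
    """p1 = softmax(U . ReLU(W(h_t') + b)) from claim 5."""
    relu = np.maximum(0.0, W @ h_t + b)
    return softmax(U @ relu)

rng = np.random.default_rng(1)
h_t = rng.normal(size=3)          # first feature h_t'
W = rng.normal(size=(5, 3))
U = rng.normal(size=(2, 5))       # two classes: keyword / non-keyword
p1 = first_probability(h_t, U, W, b=0.1)
print(p1, p1.sum())               # probabilities summing to 1
```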
6. The method according to any one of claims 2-4, wherein the inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature comprises:
and comprehensively calculating a second probability from the main view and the auxiliary views by adopting a second probability formula, the second probability formula being:
p_2^fwd = NN_fwd(h_t'(x_t))
p_2^bwd = NN_bwd(h_t'(x_t))
p_2^future = NN_future(h_t'(x_t))
p_2^past = NN_past(h_t'(x_t))
where p_2^fwd is the second probability of the previous moment, p_2^bwd is the second probability of the later moment, p_2^future is the second probability of the future moment, p_2^past is the second probability of the past moment, h_t' is the second feature at time t, NN_fwd, NN_bwd, NN_future and NN_past are the calculation functions of the corresponding second probabilities, and x_t is the second feature input at time t. p_2^past, p_2^fwd, p_2^bwd and p_2^future are arranged in time order from left to right;
and adjusting the second probability by using the loss functions corresponding to the four auxiliary views to obtain the second keyword probability p_unlabeled, where p_unlabeled is the second keyword probability, θ ranges over the four auxiliary views fwd, bwd, future and past, p_2^θ is the second probability of the corresponding auxiliary view, N is the number of second features, and D_θ is the loss function of the corresponding auxiliary view.
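Because the combination formula for the second keyword probability appears only as an image in the source, the sketch below shows just the structure of claim 6: one second probability per auxiliary view, each adjusted by a per-view loss. The per-view scorers, loss penalties, and the final averaging are assumptions for illustration, not the patent's formula.

```python
import numpy as np

def second_keyword_probability(h_t, view_nets, view_losses):
    """Structural sketch of claim 6: compute one second probability per
    auxiliary view (fwd, bwd, future, past), adjust each with the
    corresponding loss function, and combine (here: a simple average)."""
    p2 = {name: net(h_t) for name, net in view_nets.items()}
    adjusted = [p - view_losses[name](p) for name, p in p2.items()]
    return float(np.mean(adjusted))

# Hypothetical per-view scorers (sigmoid of a dot product) and loss penalties.
views = ("fwd", "bwd", "future", "past")
rng = np.random.default_rng(2)
weights = {v: rng.normal(size=3) for v in views}
nets = {v: (lambda h, w=weights[v]: 1.0 / (1.0 + np.exp(-w @ h))) for v in views}
losses = {v: (lambda p: 0.05 * (1.0 - p)) for v in views}
print(second_keyword_probability(rng.normal(size=3), nets, losses))
```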
7. The method of claim 1, wherein the extracting a plurality of words to be recognized from the text to be recognized comprises:
dividing the text to be recognized into words, making part-of-speech identifications on the divided words, keeping only the words whose part of speech is a noun, verb, adjective or adverb, and taking each retained word as a node;
calculating the weight value of each node in the text to be recognized, the weight value calculation formula being:
WS(V_i) = (1 - d) + d · Σ_{V_j ∈ In(V_i)} [ w_ji / Σ_{V_k ∈ Out(V_j)} w_jk ] · WS(V_j)
where WS(V_i) is the weight value of node V_i in the text to be recognized, d is a damping coefficient, which is a preset constant, In(V_i) is the set of nodes pointing to node V_i, w_ji is the weight between node V_i and node V_j, Out(V_j) is the set of nodes pointed to by node V_j, node V_k is a node pointed to by node V_j, w_jk is the weight between node V_k and node V_j, and WS(V_j) is the weight value of node V_j in the text to be recognized;
dividing the weighted value of each node by the maximum weighted value in the weighted value set to obtain the normalized weighted value of each node in the text to be recognized, and taking the word corresponding to the node of which the normalized weighted value is greater than the preset weighted value threshold value in the text to be recognized as the word to be recognized.
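Claim 7's weight calculation is the weighted TextRank iteration over a word co-occurrence graph; the following is a compact sketch under that reading. The graph, the damping value d = 0.85, and the iteration count are illustrative.

```python
def textrank_weights(edges, d=0.85, iters=30):
    """Weighted TextRank as in claim 7:
    WS(Vi) = (1 - d) + d * sum_j [w_ji / sum_k w_jk] * WS(Vj),
    where `edges` maps each node to {neighbour: edge weight} over an
    undirected co-occurrence graph (each edge listed in both directions)."""
    ws = {v: 1.0 for v in edges}
    out_sum = {v: sum(nbrs.values()) for v, nbrs in edges.items()}
    for _ in range(iters):
        ws = {v: (1 - d) + d * sum(w / out_sum[j] * ws[j]
                                   for j, w in nbrs.items())
              for v, nbrs in edges.items()}
    return ws

# Tiny co-occurrence graph over three candidate words.
g = {"python": {"code": 2.0, "interview": 1.0},
     "code": {"python": 2.0, "interview": 1.0},
     "interview": {"python": 1.0, "code": 1.0}}
ws = textrank_weights(g)
top = max(ws.values())
print({w: round(s / top, 3) for w, s in ws.items()})  # normalized weight values
```

Dividing each weight value by the maximum, as in the final step of claim 7, yields the normalized weight values printed above.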
8. An apparatus for identifying keywords in an interview video, the apparatus comprising:
the input and output module is used for inputting a plurality of collected training texts into a multi-view self-training neural network model so as to train the multi-view self-training neural network model;
the processing module is used for acquiring a voice signal, calling a voice recognition system and converting the voice signal into a text to be recognized; extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are words fed back by the interviewee and may be either labeled or unlabeled; and generating prompt information;
the display module is used for displaying the prompt information and the words to be recognized, and the prompt information is used for prompting the interviewer to label the words to be recognized;
the processing module is further used for inputting the multiple words to be recognized into the multi-view self-training neural network model through the input and output module, and calculating the keyword probability of each word to be recognized as a keyword; comparing the keyword probability with a probability threshold, and when the keyword probability is in the range of the probability threshold, marking all the words to be recognized in the range of the probability threshold as keywords; and sending the keyword and the notification message to at least one interview server in the interview server list through the input/output module according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
9. A computer device, characterized in that the computer device comprises:
at least one processor, memory, and transceiver;
wherein the memory is configured to store program code and the processor is configured to invoke the program code stored in the memory to perform the method of any of claims 1-7.
10. A computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1-7.
CN201910706481.0A 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video Active CN110619035B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910706481.0A CN110619035B (en) 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video
PCT/CN2019/117928 WO2021017296A1 (en) 2019-08-01 2019-11-13 Information recognition method, device, apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910706481.0A CN110619035B (en) 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video

Publications (2)

Publication Number Publication Date
CN110619035A true CN110619035A (en) 2019-12-27
CN110619035B CN110619035B (en) 2023-07-25

Family

ID=68921514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910706481.0A Active CN110619035B (en) 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video

Country Status (2)

Country Link
CN (1) CN110619035B (en)
WO (1) WO2021017296A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377965B (en) * 2021-06-30 2024-02-23 中国农业银行股份有限公司 Method and related device for sensing text keywords
CN116882416B (en) * 2023-09-08 2023-11-21 江西省精彩纵横采购咨询有限公司 Information identification method and system for bidding documents

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962247B (en) * 2018-08-13 2023-01-31 南京邮电大学 Multi-dimensional voice information recognition system and method based on progressive neural network
CN109871446B (en) * 2019-01-31 2023-06-06 平安科技(深圳)有限公司 Refusing method in intention recognition, electronic device and storage medium
CN109979439B (en) * 2019-03-22 2021-01-29 泰康保险集团股份有限公司 Voice recognition method, device, medium and electronic equipment based on block chain

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180204111A1 (en) * 2013-02-28 2018-07-19 Z Advanced Computing, Inc. System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform
CN103943107A (en) * 2014-04-03 2014-07-23 北京大学深圳研究生院 Audio/video keyword identification method based on decision-making level fusion
CN105740900A (en) * 2016-01-29 2016-07-06 百度在线网络技术(北京)有限公司 Information identification method and apparatus
CN108549626A (en) * 2018-03-02 2018-09-18 广东技术师范学院 A kind of keyword extracting method for admiring class
CN108419123A (en) * 2018-03-28 2018-08-17 广州市创新互联网教育研究院 A kind of virtual sliced sheet method of instructional video
CN108806668A (en) * 2018-06-08 2018-11-13 国家计算机网络与信息安全管理中心 A kind of audio and video various dimensions mark and model optimization method
CN109147763A (en) * 2018-07-10 2019-01-04 深圳市感动智能科技有限公司 A kind of audio-video keyword recognition method and device based on neural network and inverse entropy weighting
CN109697973A (en) * 2019-01-22 2019-04-30 清华大学深圳研究生院 A kind of method, the method and device of model training of prosody hierarchy mark

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ming Sun et al.: "Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting", arXiv, pages 1-7 *
Themos Stafylakis et al.: "Zero-shot keyword spotting for visual speech recognition in-the-wild", Proceedings of the European Conference on Computer Vision (ECCV), pages 513-529 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049372A (en) * 2022-08-15 2022-09-13 山东心法科技有限公司 Method, apparatus and medium for constructing digital infrastructure for human resource information
CN115049372B (en) * 2022-08-15 2022-12-02 山东心法科技有限公司 Method, apparatus and medium for constructing digital infrastructure for human resource information
CN116366801A (en) * 2023-06-03 2023-06-30 深圳市小麦飞扬科技有限公司 Multi-terminal interaction system for recruitment information
CN116366801B (en) * 2023-06-03 2023-10-13 深圳市小麦飞扬科技有限公司 Multi-terminal interaction system for recruitment information
CN116862318A (en) * 2023-09-04 2023-10-10 国电投华泽(天津)资产管理有限公司 New energy project evaluation method and device based on text semantic feature extraction
CN116862318B (en) * 2023-09-04 2023-11-17 国电投华泽(天津)资产管理有限公司 New energy project evaluation method and device based on text semantic feature extraction

Also Published As

Publication number Publication date
CN110619035B (en) 2023-07-25
WO2021017296A1 (en) 2021-02-04

Similar Documents

Publication Publication Date Title
CN110619035A (en) Method, device and equipment for identifying keywords in interview video and storage medium
CN109117777B (en) Method and device for generating information
WO2020244073A1 (en) Speech-based user classification method and device, computer apparatus, and storage medium
CN109872162B (en) Wind control classification and identification method and system for processing user complaint information
US20180157743A1 (en) Method and System for Multi-Label Classification
CN111695352A (en) Grading method and device based on semantic analysis, terminal equipment and storage medium
CN111046133A (en) Question-answering method, question-answering equipment, storage medium and device based on atlas knowledge base
CN107491435B (en) Method and device for automatically identifying user emotion based on computer
CN111261162B (en) Speech recognition method, speech recognition apparatus, and storage medium
CN111159409B (en) Text classification method, device, equipment and medium based on artificial intelligence
CN108038208B (en) Training method and device of context information recognition model and storage medium
Van Durme Streaming analysis of discourse participants
CN111930792A (en) Data resource labeling method and device, storage medium and electronic equipment
CN113949582B (en) Network asset identification method and device, electronic equipment and storage medium
CN112235470B (en) Incoming call client follow-up method, device and equipment based on voice recognition
JP2017146720A (en) Patent requirement adequacy prediction device and patent requirement adequacy prediction program
CN114416979A (en) Text query method, text query equipment and storage medium
CN117251551A (en) Natural language processing system and method based on large language model
CN110377708B (en) Multi-scene conversation switching method and device
CN112948550B (en) Schedule creation method and device and electronic equipment
CN113158667B (en) Event detection method based on entity relationship level attention mechanism
CN110265024A (en) Requirement documents generation method and relevant device
CN113934848A (en) Data classification method and device and electronic equipment
CN113569021A (en) Method for user classification, computer device and readable storage medium
CN112784011A (en) Emotional problem processing method, device and medium based on CNN and LSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant