CN112199953A - Method and device for extracting information in telephone conversation and computer equipment

Method and device for extracting information in telephone conversation and computer equipment

Info

Publication number
CN112199953A
CN112199953A
Authority
CN
China
Prior art keywords
word embedding
inputting
embedding vector
probability matrix
named entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010854608.6A
Other languages
Chinese (zh)
Inventor
刘嗣平
柯登峰
汤丁青
林旻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jiusi Intelligent Technology Co ltd
Original Assignee
Guangzhou Jiusi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Jiusi Intelligent Technology Co ltd filed Critical Guangzhou Jiusi Intelligent Technology Co ltd
Priority to CN202010854608.6A priority Critical patent/CN112199953A/en
Publication of CN112199953A publication Critical patent/CN112199953A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/083Recognition networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L2015/088Word spotting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiments of the application belong to the technical field of semantic recognition and relate to a method for extracting information in a telephone call, comprising: performing speech recognition on the telephone call and extracting word embedding vectors from the speech recognition result; inputting the word embedding vectors into a pre-trained recognition network to output a probability matrix corresponding to the word embedding vectors; inputting the probability matrix into a CRF layer to identify the named entities in the word embedding vectors; and disambiguating the named entities to extract the information they contain. The application also provides an apparatus, a computer device and a storage medium for extracting information in a telephone call. The method first extracts the probability matrix, constrains it through the CRF layer, then outputs the named entities, and finally extracts the information contained in the named entities by disambiguation. The scheme simplifies the named entity extraction process while guaranteeing the accuracy of named entity extraction, and can quickly extract the information generated in a call.

Description

Method and device for extracting information in telephone conversation and computer equipment
Technical Field
The present application relates to the field of semantic recognition technology, and in particular, to a method and an apparatus for extracting information in a telephone call, and a computer device.
Background
An important task in natural language processing is extracting specific information from natural language; one kind of such extraction is realized by named entity recognition (NER). A named entity is an identifying information element in a piece of information, such as a company name, a person name, a time or a place.
At present, with the development of artificial intelligence, the accuracy of named entity recognition keeps improving. Unlike language systems with natural delimiters, however, recognizing named entities in Chinese is difficult: Chinese Word Segmentation generally has to be performed first, and during segmentation a word-based named entity recognition model easily mis-segments entity boundaries and produces out-of-vocabulary words. To improve the performance of named entity recognition, a global neural network capable of capturing characters from different subspaces and arbitrary adjacent characters is therefore needed, but the structure and algorithm of such a network are complex and its operation efficiency is low, so this recognition method cannot be applied to projects with real-time requirements. The technical problem to be solved by this application is the low efficiency of information extraction in existing telephone calls.
Disclosure of Invention
The embodiment of the application aims to provide a method and a device for extracting information in a telephone call and computer equipment, which can quickly extract information generated in the call.
In order to solve the above technical problem, an embodiment of the present application provides a method for extracting information in a telephone call, which adopts the following technical scheme:
a method for extracting information in a telephone call comprises the following steps:
performing voice recognition on the telephone call, and extracting a word embedding vector from a voice recognition result;
inputting the word embedding vector into a pre-trained recognition network to output a probability matrix corresponding to the word embedding vector;
inputting the probability matrix into a CRF layer to identify the named entities in the word embedding vector;
disambiguating the named entity to extract information contained by the named entity.
Further, the inputting the word embedding vector into a pre-trained recognition network to output a probability matrix corresponding to the word embedding vector specifically includes:
inputting the word embedding vector into a Hybrid Gated Convolutions layer to obtain a set of first feature vectors;
inputting the set of first feature vectors into a Highway BiLSTM layer to obtain a set of second feature vectors;
inputting the set of second feature vectors into a Gated Self-Attention layer to obtain a probability matrix.
Further, the inputting the word embedding vector into a Hybrid Gated Convolutions layer to obtain a set of first feature vectors specifically includes:
performing dilated convolution on the word embedding vector under the activation function;
performing dilated gate convolution on the word embedding vector according to the output of the activation function;
performing gate convolution on the word embedding vector, and concatenating its output with the output of the dilated gate convolution into a matrix to obtain a set of first feature vectors.
Further, the inputting the set of first feature vectors into the Highway BiLSTM layer to obtain the set of second feature vectors specifically includes:
inputting the set of first feature vectors into the BiLSTM to obtain a set of intermediate vectors;
and performing gating processing on the set of intermediate vectors to obtain a set of second feature vectors.
Further, the inputting the set of second feature vectors into the Gated Self-Attention layer to obtain the probability matrix specifically includes:
initializing an attention parameter by the second feature vector;
processing the corresponding second eigenvectors according to the attention parameters, and splicing the processed second eigenvectors into an intermediate matrix;
and performing gating processing on the intermediate matrix to obtain a probability matrix.
Further, the word embedding vector is extracted from the speech recognition result through a preset extraction network, which is trained as follows:
training the extraction network through the result of speech recognition and the word embedding vector;
and fixing the parameters of the extraction network, and training the recognition network through the output result of the extraction network.
Further, the probability matrix is input into a CRF layer to identify the named entities in the word embedding vector, wherein the training mode of the CRF layer specifically includes:
training the CRF layer by back-propagation to minimize L(θ), thereby optimizing the network parameters:

L(θ) = log Σ_ỹ e^{s(X, ỹ)} − s(X, y),  with  s(X, y) = Σ_i O_{i, y_i} + Σ_i T_{y_{i−1}, y_i}

wherein O_{i, y_i} is the probability that the named entity of the i-th word embedding vector belongs to tag y_i, T_{y_{i−1}, y_i} is the sequence-transition probability, and the label sequence with the highest probability, ŷ = argmax_ỹ s(X, ỹ), is selected; O is the probability matrix, T is the transition matrix, and y_i is the i-th label in the label sequence y.
In order to solve the above technical problem, an embodiment of the present application further provides an apparatus for extracting information in a telephone call, which adopts the following technical solution:
an in-phone-call information extraction apparatus, comprising:
the word embedding vector extraction module is used for performing speech recognition on the telephone call and extracting the word embedding vector from the speech recognition result;
a probability matrix output module for inputting the word embedding vector to a pre-trained recognition network to output a probability matrix corresponding to the word embedding vector;
the named entity extraction module is used for inputting the probability matrix into a CRF layer to identify the named entities in the word embedding vector;
and the disambiguation module is used for disambiguating the named entity so as to extract the information contained in the named entity.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device comprising a memory having stored therein a computer program and a processor implementing the steps of the method for efficient invocation of a distributed lock as described above when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a method for efficient invocation of a distributed lock as described above.
Compared with the prior art, the embodiments of the application mainly have the following beneficial effects: word embedding vectors are extracted first; the recognition network then extracts, from the word embedding vectors, the probability vectors of the named entities to be extracted and concatenates them into a probability matrix; the probability matrix is constrained through a CRF layer to prevent extraction of wrong named entities; the named entities are then output; and finally the information contained in the named entities is extracted by disambiguating them. The scheme simplifies the named entity extraction process while guaranteeing the accuracy of named entity extraction, and can quickly extract the information generated in a call.
Drawings
In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for use in the description of the embodiments of the present application, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flow chart of an embodiment of a method for extracting information in a telephone call according to the present application;
fig. 2 is a schematic structural diagram of an embodiment of an in-phone call information extraction apparatus according to the present application;
FIG. 3 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
Referring to fig. 1, a flow chart of one embodiment of a method for extracting information in a telephone call according to the present application is shown. The method comprises the following steps:
step S100: the telephone call is speech-recognized and a word-embedding vector is extracted from the result of the speech recognition.
Specifically, the word embedding vector is extracted through a BERT network: the input is the words of the speech recognition result, and the output is the word embedding vector as a numerical representation; the extracted information represents the effect of a specific word at a specific position.
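As an illustration, this extraction step can be sketched in Python with the Hugging Face transformers library and the public bert-base-chinese checkpoint (both choices are assumptions for the sketch; the patent does not name a concrete BERT implementation):

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    bert = BertModel.from_pretrained("bert-base-chinese")

    def extract_word_embeddings(asr_text: str) -> torch.Tensor:
        """Map an ASR transcript to one embedding vector per token."""
        inputs = tokenizer(asr_text, return_tensors="pt")
        with torch.no_grad():
            outputs = bert(**inputs)
        # last_hidden_state: (1, seq_len, 768); each row represents the
        # effect of a specific character at a specific position.
        return outputs.last_hidden_state.squeeze(0)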
Step S200: and inputting the word embedding vector into a pre-trained recognition network to output a probability matrix corresponding to the word embedding vector.
In this embodiment, the recognition network derives probability vectors from the word embedding vectors via a recurrent neural network, specifically a long short-term memory network, and concatenates them into a probability matrix, which reflects the probability of occurrence of each named entity.
Step S300: the probability matrix is input to the CRF layer to identify the named entities whose words are embedded in the vector.
The probability matrix is input into the CRF layer, which can add constraints to the finally predicted tags to ensure that they are legal. These constraints are learned automatically by the CRF layer during training on the training data. In this way, named entities that obviously do not conform to the rules are screened out during named entity extraction.
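For example, under a BIO tag scheme one constraint the CRF layer effectively learns is that an I- tag may not follow O or a tag of a different entity type. A minimal sketch (the tag names are hypothetical; the patent does not list its tag set):

    # Hypothetical illegal transitions under a BIO scheme.
    ILLEGAL = {("O", "I-TIME"), ("O", "I-PER"), ("B-PER", "I-TIME")}

    def is_legal(prev_tag: str, tag: str) -> bool:
        return (prev_tag, tag) not in ILLEGAL

    assert is_legal("B-TIME", "I-TIME")   # continuing the same entity: legal
    assert not is_legal("O", "I-TIME")    # an entity cannot start with I-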
Step S400: disambiguating the named entity to extract information contained by the named entity.
The embodiments of the application mainly have the following beneficial effects: word embedding vectors are extracted first; the recognition network then extracts, from the word embedding vectors, the probability vectors of the named entities to be extracted and concatenates them into a probability matrix; the probability matrix is constrained through a CRF layer to prevent extraction of wrong named entities; the named entities are then output; and finally the information contained in the named entities is extracted by disambiguating them. In the present scheme, the disambiguation process specifically judges, according to logic, the specific time referred to by a time word. The scheme simplifies the named entity extraction process while guaranteeing the accuracy of named entity extraction, and can quickly extract the information generated in a call.
The technical solution described in this application is particularly suitable for tasks of confirming and disambiguating time information in telephone calls.
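As a sketch of what such logic-based time disambiguation can look like (the rules and vocabulary below are illustrative assumptions, not the patent's actual logic), a relative time expression recognized as a named entity is resolved against the time of the call:

    from datetime import datetime, timedelta

    # Illustrative mapping of relative day words
    # (today / tomorrow / the day after tomorrow).
    RELATIVE_DAYS = {"今天": 0, "明天": 1, "后天": 2}

    def resolve_time(entity: str, call_time: datetime) -> datetime:
        for word, offset in RELATIVE_DAYS.items():
            if entity.startswith(word):
                day = call_time + timedelta(days=offset)
                # "下午三点" -> 15:00; a real system needs fuller parsing.
                hour = 15 if "下午三点" in entity else call_time.hour
                return day.replace(hour=hour, minute=0,
                                   second=0, microsecond=0)
        return call_time

    # "明天下午三点" (tomorrow at 3 p.m.) during a call on 2020-08-24:
    print(resolve_time("明天下午三点", datetime(2020, 8, 24, 10, 0)))
    # -> 2020-08-25 15:00:00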
Further, step S200: inputting the word embedding vector into a pre-trained recognition network to output a probability matrix corresponding to the word embedding vector, specifically comprises:
step S201: the word embedding vector is input to a Hybrid Gated considerations layer to obtain a set of first feature vectors.
The Hybrid Gated contributions conform to the Gated convolution network, the feature set is collected through the convolution network, and the accuracy of sign extraction is improved through gating.
Step S202: inputting the set of first feature vectors into a high way BilsTM layer to obtain a set of second feature vectors.
The probability distribution is extracted for the first feature vector by the Highway BilSTM layer.
Step S203: inputting the set of second feature vectors into a Gated Self-orientation layer to obtain a probability matrix.
And extracting the probability of belonging to the named entity for the second feature vector through a self attention mechanism, and splicing a plurality of results to form a probability matrix.
The scheme extracts the probability that the word embedding vector belongs to the named entity through a convolution application network and a bidirectional long-short term memory network in cooperation with a gating and attention mechanism, and the extraction rate of the probability matrix is high and accurate.
Further, the step S201: inputting the word embedding vector into a Hybrid Gated Convolutions layer to obtain a set of first feature vectors specifically includes:
step S2011: and performing hole convolution on the word embedding vector under the activation function.
Step S2012: and performing cavity gate convolution on the word embedding vector according to the output of the activation function.
Step S2013: the word embedding vectors are gate convolved and the output of the gate convolution with the holes is concatenated in a matrix to obtain a set of first feature vectors.
Specifically,

Y(X) = Y_g(X) ⊕ Y_dg(X)

wherein Y_g(X) represents the result of the gate convolution calculation, Y_dg(X) represents the result of the dilated gate convolution calculation, and ⊕ represents the concatenation operation, where

Y_g(X) = (X∗W + b) ⊙ (X∗V + c)
Y_dg(X) = X ⊙ (1 − θ) + C_2(X) ⊙ θ
θ = σ(C_1(X))

where X represents the word embedding vector, W and V represent different convolution kernel parameters obtained by training, b and c represent different biases, C_1 and C_2 denote different dilated convolution calculations, ⊙ denotes element-wise multiplication of matrices, and σ denotes the sigmoid activation function. This scheme improves the precision and speed of obtaining the first feature vectors through dilated convolution and by applying gating to the convolution.
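A minimal PyTorch sketch of these formulas (the kernel size and dilation rate are assumptions; the patent does not specify them):

    import torch
    import torch.nn as nn

    class HybridGatedConv(nn.Module):
        def __init__(self, dim: int, kernel: int = 3, dilation: int = 2):
            super().__init__()
            pad = (kernel - 1) // 2
            dpad = dilation * (kernel - 1) // 2
            self.conv_w = nn.Conv1d(dim, dim, kernel, padding=pad)  # X*W + b
            self.conv_v = nn.Conv1d(dim, dim, kernel, padding=pad)  # X*V + c
            self.c1 = nn.Conv1d(dim, dim, kernel, padding=dpad,
                                dilation=dilation)                  # C_1
            self.c2 = nn.Conv1d(dim, dim, kernel, padding=dpad,
                                dilation=dilation)                  # C_2

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, dim); Conv1d expects (batch, dim, seq_len)
            x = x.transpose(1, 2)
            y_g = self.conv_w(x) * self.conv_v(x)         # gate convolution
            theta = torch.sigmoid(self.c1(x))             # θ = σ(C_1(X))
            y_dg = x * (1 - theta) + self.c2(x) * theta   # dilated gate conv
            y = torch.cat([y_g, y_dg], dim=1)             # ⊕ concatenation
            return y.transpose(1, 2)                      # (batch, seq, 2*dim)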
Further, the step S202: inputting the set of first feature vectors into the Highway BiLSTM layer to obtain a set of second feature vectors specifically includes:
Step S2021: the set of first feature vectors is input into the BiLSTM to obtain a set of intermediate vectors.
Step S2022: performing gating processing on the set of intermediate vectors to obtain a set of second feature vectors.
Specifically,

t_g = σ(W_g h + b_g)
z = t_g ⊙ f(W_h h + b_h) + (1 − t_g) ⊙ h

wherein h is the set of first feature vectors output in step S2013, z is the set of second feature vectors output in step S2022, and t_g represents the transform gate, which controls the selection of the information computed from the BiLSTM output and determines what is transmitted into the next layer. W_g and W_h are parameters of fully connected layers preset by training. (1 − t_g) represents the carry gate, which controls the selection of the results output by the BiLSTM and determines what is carried into the next layer. Thus h and z have the same shape. This scheme uses gating over the BiLSTM's context of the word embedding vectors to extract the probability distribution of the information in the first feature vectors, accurately and with little computation.
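A PyTorch sketch of this layer, taking f to be ReLU (an assumption; the patent does not name f):

    import torch
    import torch.nn as nn

    class HighwayBiLSTM(nn.Module):
        def __init__(self, in_dim: int, hidden: int):
            super().__init__()
            self.bilstm = nn.LSTM(in_dim, hidden, batch_first=True,
                                  bidirectional=True)
            dim = 2 * hidden                      # BiLSTM output width
            self.gate = nn.Linear(dim, dim)       # W_g, b_g
            self.transform = nn.Linear(dim, dim)  # W_h, b_h

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h, _ = self.bilstm(x)                 # intermediate vectors
            t_g = torch.sigmoid(self.gate(h))     # transform gate
            # carry gate (1 - t_g) passes h through unchanged
            z = t_g * torch.relu(self.transform(h)) + (1 - t_g) * h
            return z                              # same shape as h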
Further, the step S203: inputting the set of second feature vectors into a Gated Self-Attention layer to obtain a probability matrix specifically includes:
step S2031: an attention parameter is initialized by the second feature vector.
Step S2032: and processing the corresponding second eigenvector according to the attention parameter, and splicing the processed second eigenvector into an intermediate matrix.
Step S2033: and performing gating processing on the intermediate matrix to obtain a probability matrix.
Specifically, the output Z of the Highway BiLSTM layer is used to initialize Q, K and V in the Self-Attention layer, and Attention(Q, K, V) is computed:

Attention(Q, K, V) = softmax(QKᵀ/√d_k)V

The intermediate result S is then calculated as follows:

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
S = Concat(head_1, …, head_h)W^S

The output of this layer is calculated with a gating analogous to the Highway layer above:

g = σ(W_g S + b_g)
P = g ⊙ S + (1 − g) ⊙ Z
O = W_P P + b_P

The output O is the probability matrix. The specific process weights and sums the second feature vectors according to their similarity and concatenates the results. This scheme improves the extraction precision of the probability matrix.
Further, the word embedding vector is extracted from the speech recognition result through a preset extraction network, which is trained as follows:
Step S501: training the extraction network through the speech recognition result and the word embedding vector.
Step S502: fixing the parameters of the extraction network, and training the recognition network on the output of the extraction network. This scheme determines the parameters of the extraction network and the recognition network efficiently and accurately, and improves the precision of the method for extracting information in a telephone call.
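A sketch of this two-stage schedule; extraction_net, recognition_net, crf_loss and train_loader are hypothetical names carried over from the sketches above:

    import torch

    # Stage 2: fix the extraction (BERT) network and train only the
    # recognition network on its frozen outputs.
    for p in extraction_net.parameters():
        p.requires_grad = False

    optimizer = torch.optim.Adam(recognition_net.parameters(), lr=1e-3)

    for texts, tags in train_loader:
        with torch.no_grad():
            emb = extraction_net(texts)        # frozen word embedding vectors
        loss = recognition_net.crf_loss(emb, tags)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()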
Further, the probability matrix is input into a CRF layer to identify the named entities in the word embedding vector, wherein the training mode of the CRF layer specifically includes:
training the CRF layer by back-propagation to minimize L(θ), thereby optimizing the network parameters:

L(θ) = log Σ_ỹ e^{s(X, ỹ)} − s(X, y),  with  s(X, y) = Σ_i O_{i, y_i} + Σ_i T_{y_{i−1}, y_i}

wherein O_{i, y_i} is the probability that the named entity of the i-th word embedding vector belongs to tag y_i, T_{y_{i−1}, y_i} is the sequence-transition probability, and the label sequence with the highest probability, ŷ = argmax_ỹ s(X, ỹ), is selected; O is the probability matrix, T is the transition matrix, and y_i is the i-th label in the label sequence y.
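A minimal sketch of this loss computed with the forward algorithm; the tensor shapes are assumptions consistent with the formulas above (O: (seq_len, n_tags) score matrix, T: (n_tags, n_tags) transition matrix, y: gold tag indices):

    import torch

    def crf_loss(O: torch.Tensor, T: torch.Tensor,
                 y: torch.Tensor) -> torch.Tensor:
        # Gold score: s(X, y) = Σ_i O[i, y_i] + Σ_i T[y_{i-1}, y_i]
        gold = O[torch.arange(len(y)), y].sum() + T[y[:-1], y[1:]].sum()
        # log Σ_ỹ e^{s(X, ỹ)} via the forward recursion
        alpha = O[0]                            # (n_tags,)
        for i in range(1, O.size(0)):
            alpha = torch.logsumexp(alpha.unsqueeze(1) + T, dim=0) + O[i]
        log_z = torch.logsumexp(alpha, dim=0)
        return log_z - gold                     # L(θ), minimized by backprop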
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not performed in the exact order shown and may be performed in other orders unless explicitly stated herein. Moreover, at least a portion of the steps in the flow chart of the figure may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, which are not necessarily performed in sequence, but may be performed alternately or in turns with other steps or at least a portion of the sub-steps or stages of other steps.
In order to solve the above technical problem, an embodiment of the present application further provides an apparatus for extracting information during a telephone call, which adopts the following technical solutions:
an in-phone-call information extraction apparatus, comprising:
the word embedding vector extraction module 100 is configured to perform voice recognition on a telephone call, and extract a word embedding vector from a result of the voice recognition.
A probability matrix output module 200, configured to input the word embedding vector to a pre-trained recognition network, so as to output a probability matrix corresponding to the word embedding vector.
A named entity extraction module 300, for inputting the probability matrix into the CRF layer to identify the named entities in the word embedding vectors.
A disambiguation module 400 for disambiguating the named entity to extract information contained by the named entity.
The embodiments of the application mainly have the following beneficial effects: word embedding vectors are extracted first; the recognition network then extracts, from the word embedding vectors, the probability vectors of the named entities to be extracted and concatenates them into a probability matrix; the probability matrix is constrained through a CRF layer to prevent extraction of wrong named entities; the named entities are then output; and finally the information contained in the named entities is extracted by disambiguating them. The scheme simplifies the named entity extraction process while guaranteeing the accuracy of named entity extraction, and can quickly extract the information generated in a call.
Further, the probability matrix output module specifically includes: a first feature vector output submodule, a second feature vector output submodule and a probability matrix output submodule.
The first feature vector output submodule is used for inputting the word embedding vector into a Hybrid Gated Convolutions layer to obtain a set of first feature vectors.
The second feature vector output submodule is used for inputting the set of first feature vectors into a Highway BiLSTM layer to obtain a set of second feature vectors.
The probability matrix output submodule is used for inputting the set of second feature vectors into a Gated Self-Attention layer to obtain a probability matrix.
This scheme extracts the probability that the word embedding vector belongs to a named entity through a convolutional network and a bidirectional long short-term memory network in cooperation with gating and attention mechanisms, so the probability matrix is extracted quickly and accurately.
Further, the first feature vector output submodule is further configured to:
and performing hole convolution on the word embedding vector under the activation function.
And performing cavity gate convolution on the word embedding vector according to the output of the activation function.
The word embedding vectors are gate convolved and the output of the gate convolution with the holes is concatenated in a matrix to obtain a set of first eigenvectors.
According to the scheme, the acquisition precision and speed of the first feature vector are improved by the hollow convolution and gating is applied to the convolution.
Further, the second feature vector output submodule is further configured to:
the set of first feature vectors is input to BilSTM to obtain a set of intermediate vectors.
And performing gating processing on the group of intermediate vectors to obtain a group of second feature vectors.
This scheme uses gating over the BiLSTM's context of the word embedding vectors to extract the probability distribution of the information in the first feature vectors, accurately and with little computation.
Further, the probability matrix output submodule is further configured to:
an attention parameter is initialized by the second feature vector.
And processing the corresponding second eigenvector according to the attention parameter, and splicing the processed second eigenvector into an intermediate matrix.
And performing gating processing on the intermediate matrix to obtain a probability matrix.
The scheme can improve the extraction precision of the probability matrix.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 3, fig. 3 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 6 comprises a memory 61, a processor 62 and a network interface 63, which are communicatively connected to each other via a system bus. It is noted that only a computer device 6 having components 61-63 is shown, but it should be understood that not all of the shown components need be implemented; more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device thereof. In this embodiment, the memory 61 is generally used for storing an operating system installed in the computer device 6 and various application software, such as a program code of an information extraction method in a telephone call. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute the program code stored in the memory 61 or to process data, for example to execute the program code of the method for extracting information in a telephone call.
The network interface 63 may comprise a wireless network interface or a wired network interface, and the network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.
The present application further provides another embodiment, which is to provide a computer-readable storage medium, where a program of a method for extracting information during a telephone call is stored, and the program of the method for extracting information during a telephone call is executable by at least one processor, so that the at least one processor executes the steps of the method for extracting information during a telephone call.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely illustrative of some, but not restrictive, of the broad invention, and that the appended drawings illustrate preferred embodiments of the invention and do not limit the scope of the invention. This application is capable of embodiments in many different forms and is provided for the purpose of enabling a thorough understanding of the disclosure of the application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the present application may be practiced without these specific details or with equivalents of some of the features described in the foregoing embodiments. All equivalent structures made by using the contents of the specification and the drawings of the present application are directly or indirectly applied to other related technical fields and are within the protection scope of the present application.

Claims (10)

1. A method for extracting information in a telephone call is characterized by comprising the following steps:
performing voice recognition on the telephone call, and extracting a word embedding vector from a voice recognition result;
inputting the word embedding vector into a pre-trained recognition network to output a probability matrix corresponding to the word embedding vector;
inputting the probability matrix into a CRF layer to identify the named entities in the word embedding vector;
disambiguating the named entity to extract information contained by the named entity.
2. The method of claim 1, wherein the inputting the word embedding vector into a pre-trained recognition network to output a probability matrix corresponding to the word embedding vector specifically includes:
inputting the word embedding vector into a Hybrid Gated Convolutions layer to obtain a set of first feature vectors;
inputting the set of first feature vectors into a Highway BiLSTM layer to obtain a set of second feature vectors;
inputting the set of second feature vectors into a Gated Self-Attention layer to obtain a probability matrix.
3. The method of claim 2, wherein the inputting the word embedding vector into a Hybrid Gated Convolutions layer to obtain a set of first feature vectors specifically includes:
performing dilated convolution on the word embedding vector under the activation function;
performing dilated gate convolution on the word embedding vector according to the output of the activation function;
performing gate convolution on the word embedding vector, and concatenating its output with the output of the dilated gate convolution into a matrix to obtain a set of first feature vectors.
4. The method of claim 2, wherein the inputting the set of first feature vectors into the Highway BiLSTM layer to obtain the set of second feature vectors specifically includes:
inputting the set of first feature vectors into the BiLSTM to obtain a set of intermediate vectors;
and performing gating processing on the set of intermediate vectors to obtain a set of second feature vectors.
5. The method of claim 2, wherein the inputting the set of second feature vectors into the Gated Self-Attention layer to obtain the probability matrix specifically includes:
initializing an attention parameter by the second feature vector;
processing the corresponding second eigenvectors according to the attention parameters, and splicing the processed second eigenvectors into an intermediate matrix;
and performing gating processing on the intermediate matrix to obtain a probability matrix.
6. The method of claim 1, wherein the word embedding vector is extracted from the speech recognition result through a preset extraction network, which is trained as follows:
training the extraction network through the speech recognition result and the word embedding vector;
and fixing the parameters of the extraction network, and training the recognition network on the output of the extraction network.
7. The method of claim 1, wherein the probability matrix is input into a CRF layer to identify the named entities in the word embedding vector, and the training mode of the CRF layer specifically includes:
training the CRF layer by back-propagation to minimize L(θ), thereby optimizing the network parameters:

L(θ) = log Σ_ỹ e^{s(X, ỹ)} − s(X, y),  with  s(X, y) = Σ_i O_{i, y_i} + Σ_i T_{y_{i−1}, y_i}

wherein O_{i, y_i} is the probability that the named entity of the i-th word embedding vector belongs to tag y_i, T_{y_{i−1}, y_i} is the sequence-transition probability, and the label sequence with the highest probability is selected; O is the probability matrix, T is the transition matrix, and y_i is the i-th label in the label sequence y.
8. An information extraction device in a telephone call is characterized in that: the method comprises the following steps:
the word embedding vector extraction module is used for performing speech recognition on the telephone call and extracting the word embedding vector from the speech recognition result;
a probability matrix output module for inputting the word embedding vector to a pre-trained recognition network to output a probability matrix corresponding to the word embedding vector;
the named entity extraction module is used for inputting the probability matrix into a CRF layer to identify the named entities in the word embedding vector;
and the disambiguation module is used for disambiguating the named entity so as to extract the information contained in the named entity.
9. A computer device comprising a memory having stored therein a computer program and a processor which, when executing the computer program, implements the steps of the method for extracting information in a telephone call according to any of claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of the method for extracting information in a telephone call according to any of claims 1-7.
CN202010854608.6A 2020-08-24 2020-08-24 Method and device for extracting information in telephone conversation and computer equipment Pending CN112199953A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010854608.6A CN112199953A (en) 2020-08-24 2020-08-24 Method and device for extracting information in telephone conversation and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010854608.6A CN112199953A (en) 2020-08-24 2020-08-24 Method and device for extracting information in telephone conversation and computer equipment

Publications (1)

Publication Number Publication Date
CN112199953A true CN112199953A (en) 2021-01-08

Family

ID=74006181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010854608.6A Pending CN112199953A (en) 2020-08-24 2020-08-24 Method and device for extracting information in telephone conversation and computer equipment

Country Status (1)

Country Link
CN (1) CN112199953A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203511A (en) * 2017-05-27 2017-09-26 中国矿业大学 A kind of network text name entity recognition method based on neutral net probability disambiguation
CN109741732A (en) * 2018-08-30 2019-05-10 京东方科技集团股份有限公司 Name entity recognition method, name entity recognition device, equipment and medium
WO2019117466A1 (en) * 2017-12-14 2019-06-20 삼성전자 주식회사 Electronic device for analyzing meaning of speech, and operation method therefor
CN110321566A (en) * 2019-07-10 2019-10-11 北京邮电大学 Chinese name entity recognition method, device, computer equipment and storage medium
JP2019191900A (en) * 2018-04-24 2019-10-31 日本電信電話株式会社 Extraction device for language characteristics, extraction device for unique expressions, extraction method, and program
CN110442840A (en) * 2019-07-11 2019-11-12 新华三大数据技术有限公司 Sequence labelling network update method, electronic health record processing method and relevant apparatus
CN110633470A (en) * 2019-09-17 2019-12-31 北京小米智能科技有限公司 Named entity recognition method, device and storage medium
CN110782881A (en) * 2019-10-25 2020-02-11 四川长虹电器股份有限公司 Video entity error correction method after speech recognition and entity recognition
CN110969020A (en) * 2019-11-21 2020-04-07 中国人民解放军国防科技大学 CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN111368541A (en) * 2018-12-06 2020-07-03 北京搜狗科技发展有限公司 Named entity identification method and device

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203511A (en) * 2017-05-27 2017-09-26 中国矿业大学 A kind of network text name entity recognition method based on neutral net probability disambiguation
WO2019117466A1 (en) * 2017-12-14 2019-06-20 삼성전자 주식회사 Electronic device for analyzing meaning of speech, and operation method therefor
JP2019191900A (en) * 2018-04-24 2019-10-31 日本電信電話株式会社 Extraction device for language characteristics, extraction device for unique expressions, extraction method, and program
CN109741732A (en) * 2018-08-30 2019-05-10 京东方科技集团股份有限公司 Name entity recognition method, name entity recognition device, equipment and medium
WO2020043123A1 (en) * 2018-08-30 2020-03-05 京东方科技集团股份有限公司 Named-entity recognition method, named-entity recognition apparatus and device, and medium
CN111368541A (en) * 2018-12-06 2020-07-03 北京搜狗科技发展有限公司 Named entity identification method and device
CN110321566A (en) * 2019-07-10 2019-10-11 北京邮电大学 Chinese name entity recognition method, device, computer equipment and storage medium
CN110442840A (en) * 2019-07-11 2019-11-12 新华三大数据技术有限公司 Sequence labelling network update method, electronic health record processing method and relevant apparatus
CN110633470A (en) * 2019-09-17 2019-12-31 北京小米智能科技有限公司 Named entity recognition method, device and storage medium
CN110782881A (en) * 2019-10-25 2020-02-11 四川长虹电器股份有限公司 Video entity error correction method after speech recognition and entity recognition
CN110969020A (en) * 2019-11-21 2020-04-07 中国人民解放军国防科技大学 CNN and attention mechanism-based Chinese named entity identification method, system and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Ru et al.: "融合空洞卷积神经网络与层次注意力机制的中文命名实体识别" [Chinese named entity recognition fusing a dilated convolutional neural network and a hierarchical attention mechanism], 《中文信息学报》 [Journal of Chinese Information Processing], vol. 34, no. 8, pages 70-77 *

Similar Documents

Publication Publication Date Title
CN111460807B (en) Sequence labeling method, device, computer equipment and storage medium
CN110765785B (en) Chinese-English translation method based on neural network and related equipment thereof
CN110825857B (en) Multi-round question and answer identification method and device, computer equipment and storage medium
CN111062217A (en) Language information processing method and device, storage medium and electronic equipment
CN113298152B (en) Model training method, device, terminal equipment and computer readable storage medium
CN112328761A (en) Intention label setting method and device, computer equipment and storage medium
CN112966517B (en) Training method, device, equipment and medium for named entity recognition model
CN112686049A (en) Text auditing method, device, equipment and storage medium
CN113505601A (en) Positive and negative sample pair construction method and device, computer equipment and storage medium
CN111339308B (en) Training method and device of basic classification model and electronic equipment
CN112836521A (en) Question-answer matching method and device, computer equipment and storage medium
CN113947095A (en) Multilingual text translation method and device, computer equipment and storage medium
JP2022145623A (en) Method and device for presenting hint information and computer program
CN114445832A (en) Character image recognition method and device based on global semantics and computer equipment
CN116245097A (en) Method for training entity recognition model, entity recognition method and corresponding device
CN115859302A (en) Source code vulnerability detection method, device, equipment and storage medium
CN111966811A (en) Intention recognition and slot filling method and device, readable storage medium and terminal equipment
CN113609819B (en) Punctuation mark determination model and determination method
CN111178082A (en) Sentence vector generation method and device and electronic equipment
CN112765330A (en) Text data processing method and device, electronic equipment and storage medium
CN111639164A (en) Question-answer matching method and device of question-answer system, computer equipment and storage medium
CN112199953A (en) Method and device for extracting information in telephone conversation and computer equipment
CN112364136B (en) Keyword generation method, device, equipment and storage medium
CN115273110A (en) Text recognition model deployment method, device, equipment and storage medium based on TensorRT
CN112132269B (en) Model processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination