CN114372140A - Layered conference abstract generation model training method, generation method and device - Google Patents

Layered conference abstract generation model training method, generation method and device

Info

Publication number
CN114372140A
CN114372140A
Authority
CN
China
Prior art keywords
conference, word, BERT, vector, Chinese
Prior art date
2021-12-31
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111679303.7A
Other languages
Chinese (zh)
Inventor
陈春丽 (Chen Chunli)
黄震 (Huang Zhen)
孙岩 (Sun Yan)
罗红 (Luo Hong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING MT-HIRISUN INFORMATION TECHNOLOGY DEVELOPMENT CO LTD
Beijing University of Posts and Telecommunications
Original Assignee
BEIJING MT-HIRISUN INFORMATION TECHNOLOGY DEVELOPMENT CO LTD
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-12-31
Filing date
2021-12-31
Publication date
2022-04-19
Application filed by BEIJING MT-HIRISUN INFORMATION TECHNOLOGY DEVELOPMENT CO LTD and Beijing University of Posts and Telecommunications
Priority to CN202111679303.7A
Publication of CN114372140A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention provides a training method, a generation method, and a device for a hierarchical conference summary generation model. The training method comprises the following steps: acquiring a Chinese conference data set and preprocessing it, where the preprocessing includes word segmentation and dictionary construction; building a Chinese conference vocabulary, inputting the vocabulary into a BERT model, and outputting BERT word vectors; generating a dialogue-act label for each sentence of the original Chinese conference data using a bidirectional long short-term memory network and an attention mechanism; and training a pre-built hierarchical conference summary model on a training sample set composed of the BERT word vectors and the dialogue-act labels to obtain the target hierarchical conference summary generation model. The invention can generate hierarchical conference summaries with high fluency, accuracy, readability, and heterogeneity.

Description

Layered conference abstract generation model training method, generation method and device
Technical Field
The invention relates to the technical field of automatic conference summarization, and in particular to a dialogue-act-optimized hierarchical conference summary generation model training method, generation method, and device.
Background
Since the global outbreak of the novel coronavirus epidemic, more and more organizations such as government departments, companies, and schools have turned to teleconferencing applications to handle daily business and deliver online teaching, and society's demand for teleconferencing has reached unprecedented heights; platforms such as DingTalk and Tencent Meeting have accordingly seen exponential user growth. With the spread of online meeting applications, however, large volumes of multimedia data such as audio, video, and text are generated, and meetings and non-face-to-face discussion among participants in different regions pose new challenges for keeping meeting records and distilling the main content after a meeting. Extracting the important content from large amounts of conference dialogue, that is, producing a conference summary by means of information technology, has become an urgent need, so automatic conference summarization is gradually attracting attention.
According to the method used, automatic conference summarization divides into extractive and abstractive approaches. Extractive methods select key words and key sentences from the original text to compose the summary; however, they suffer from incorrect content selection, poor coherence, and poor flexibility, so extracted summaries often fail to meet users' needs. Abstractive methods generate the summary after understanding the whole content of the meeting and can produce new words or phrases, giving high flexibility, but they suffer from repeated generation, poor readability, and out-of-vocabulary (OOV) words. To address these problems, Abigail See et al. proposed the pointer-generator network, which fuses a copy mechanism and a coverage mechanism: copying words from the source text into the summary effectively alleviates the OOV problem, and introducing attention weights and a coverage loss discourages re-attending to parts that have already received high weight, mitigating repeated generation. However, that network considers only global text information; it models neither the semantics and speaking intent of each participant nor the heterogeneity among participants, so it cannot be applied directly to conference summary generation.
Compared with ordinary documents, conference content is longer and more rambling and involves multiple participants, so the utterances of each participant, and the relationships among the utterances of different participants before and after one another, must be modeled and understood. In addition, each participant's distinct speaking style, stance, and role contributes to the heterogeneity of the generated summary, which makes end-to-end training of a conference summarizer harder. Considering the influence of participants' speaking intent on summary generation, Chih-Wen Goo et al. designed a multi-task learning framework and proposed a sentence-gating mechanism to model the relationship between dialogue acts and the dialogue summary. However, that model does not model conference data well: it simply takes topic information as the summary, and the generated summaries cannot cover all the content of a meeting. Moreover, in the field of Chinese conference summarization, the openly available conference summary data sets are all in English; the lack of a Chinese conference summary data set greatly hampers the Chinese conference summarization task.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide a dialogue-act-optimized hierarchical conference summary generation model training method, generation method, and device, so as to eliminate or mitigate one or more defects of the prior art, address the quality problems of existing conference summary generation, and achieve heterogeneous, hierarchical conference summary generation.
One aspect of the invention provides a dialogue-act-optimized hierarchical conference summary generation model training method, comprising the following steps:
acquiring a Chinese conference data set and preprocessing it, wherein the preprocessing includes word segmentation and dictionary construction; building a Chinese conference vocabulary, inputting the vocabulary into a BERT (Bidirectional Encoder Representations from Transformers) model, and outputting BERT word vectors;
based on the acquired Chinese conference data set, generating a dialogue-act label for each sentence of the original Chinese conference data using a bidirectional long short-term memory network and an attention mechanism;
and training a pre-built hierarchical conference summary model on a training sample set composed of the BERT word vectors and the dialogue-act labels to obtain the target hierarchical conference summary generation model.
In some embodiments of the invention, the Chinese conference data set is obtained in one or more of the following ways: translating an English conference data set, and collecting and continuously supplementing Chinese conference data.
In some embodiments of the invention, acquiring and preprocessing the Chinese conference data set includes: removing sentences shorter than a preset length; removing punctuation marks within a preset range; performing word segmentation; filtering stop words within a preset stop-word list; counting word frequencies and removing words whose frequency falls below a preset threshold; and building a dictionary that maps each remaining word to a unique index, so that words and indexes correspond one to one.
In some embodiments of the invention, inputting the Chinese conference vocabulary into the BERT model and outputting BERT word vectors includes: in the BERT input layer, a token embedding layer converts each word of the Chinese conference vocabulary into a vector of a preset dimension; a segment embedding layer uses two vector values to distinguish the two sentences of a sentence pair, assigning 0 to each token of the first sentence and 1 to each token of the second; a position embedding layer encodes the position information of the words into a feature vector, introducing the words' positional relations; the vectors of the token, segment, and position embedding layers are then summed to give the output vector of the BERT input layer. In the BERT hidden layers, the output vector of the input layer is fed through the hidden layers of the BERT model, each comprising a preset number of Transformer layers, and the BERT word vectors are output.
In some embodiments of the invention, inputting the Chinese conference vocabulary into the BERT pre-training model and outputting BERT word vectors further includes: feeding the BERT word vectors into a fully connected layer and outputting dimension-reduced BERT word vectors. Each node of the fully connected layer is connected to all nodes of the previous layer, integrating the features extracted earlier while retaining the useful information, so the BERT word vectors pass from high to low dimension, which speeds up model training.
In some embodiments of the invention, generating a dialogue-act label for each sentence of the original Chinese conference data using the bidirectional long short-term memory network and the attention mechanism includes: encoding with the bidirectional long short-term memory network, feeding the Chinese conference data in sentence-sequence format to obtain a forward hidden state and a backward hidden state, concatenating the two into the final hidden state, and outputting the final hidden state as the encoding of the original conference data; computing dialogue-act weights with a Sigmoid activation function and the weight matrix of a feedforward neural network; computing a dialogue-act context vector from the final hidden state and the dialogue-act weights; and computing each sentence's dialogue-act label from the dialogue-act context vector and the final hidden state.
In some embodiments of the invention, training the preset hierarchical conference summary model on the training sample set composed of the BERT word vectors and the dialogue-act labels includes: feeding the BERT word vectors into a word-level Transformer to obtain the encoding of each character; concatenating the per-character encoding output by the word-level Transformer with the dialogue-act label of each sentence of the original Chinese conference data, and feeding the concatenation into a turn-level Transformer to obtain the turn-level encoding; feeding the encoding output by the turn-level Transformer into a decoder, which emits one element of the output sequence at each decoding step, repeating until a special termination symbol is reached and finally outputting a real-valued vector; and applying a linear transformation and Softmax processing to the real-valued vector output by the decoder to generate the final conference summary.
In another aspect, the invention provides a dialogue-act-optimized hierarchical conference summary generation method, comprising: acquiring Chinese conference data and generating a dialogue-act label for each sentence of the original Chinese conference data based on a bidirectional long short-term memory network and an attention mechanism; and feeding the dialogue-act labels and the trained BERT word vectors into the summary generation model obtained by the above dialogue-act-optimized hierarchical conference summary generation model training method, so as to output the hierarchical conference summary.
In another aspect, the invention provides a dialogue-act-optimized hierarchical conference summary generation apparatus comprising a processor and a memory, the memory storing computer instructions and the processor being configured to execute the computer instructions stored in the memory; when the computer instructions are executed by the processor, the apparatus carries out the steps of the above method.
The disclosed training method, generation method, and device for a dialogue-act-optimized hierarchical conference summary generation model realize automatic summarization of Chinese conference data: word vectors are trained for the Chinese conference data with a BERT model, a model is built to generate dialogue-act labels, and a hierarchical conference summary model is constructed and trained, so that the generated conference summaries address the heterogeneity among conference participants.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
Fig. 1 is a schematic diagram of the dialogue-act-optimized hierarchical conference summary generation model training method according to an embodiment of the invention.
Fig. 2 is a schematic diagram of generating BERT word vectors with the BERT model according to an embodiment of the invention.
Fig. 3 is a schematic diagram of the fully connected layer structure according to an embodiment of the invention.
Fig. 4 is a schematic diagram of dialogue-act label generation according to an embodiment of the invention.
Fig. 5 is a schematic diagram of the hierarchical conference summary network model according to an embodiment of the invention.
Fig. 6 is a diagram of the dialogue-act-optimized hierarchical conference summary model according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be emphasized that the term "comprises/comprising", when used herein, specifies the presence of stated features, elements, steps, or components, but does not preclude the presence or addition of one or more other features, elements, steps, or components.
In order to solve the lack of heterogeneity in existing Chinese conference summary generation, the lack of Chinese conference summary data sets, and the quality problems of Chinese summary generation, the invention provides a dialogue-act-optimized hierarchical conference summary generation model training method, generation method, and device.
Fig. 1 is a schematic diagram of the dialogue-act-optimized hierarchical conference summary generation model training method according to an embodiment of the invention. The method may be executed by a computer device and, as shown in Fig. 1, comprises the following steps S110 to S130:
Step S110: acquire a Chinese conference data set and preprocess it, the preprocessing including word segmentation and dictionary construction; build a Chinese conference vocabulary, input the vocabulary into a BERT model, and output BERT word vectors.
In an embodiment of the invention, the Chinese conference data set is obtained by translating English conference data sets, and a native Chinese conference data set can then be built up through subsequent expansion with Chinese conference data. The conference data sets currently available online are all in English, including the AMI and ICSI data sets: AMI is an English multimodal conference data set usable for both extractive and abstractive summarization, and ICSI is an English conference data set usable for abstractive summarization. The Chinese conference data set is constructed by translation: the Google Translate API is called to translate the English AMI and ICSI conference summary data sets into Chinese, a data-processing script is written to normalize the data format, and the translations are refined by manually correcting translation errors, completing the construction of the Chinese AMI and ICSI data sets. This yields 184 Chinese conference records, of which 101 serve as the training set, 26 as the test set, and 57 as the prediction set. Each record divides into a summary part, the summary of the whole meeting, and a dialogue part, the meeting content attributed to each speaker. Once the target conference summary generation model has been built, the Chinese conference summary data set can be expanded continuously: audio from daily meetings recorded on an online conference platform is converted into text, the text is fed into the conference summary generation model to produce a summary, the summary is fine-tuned manually, and the newly obtained conference data are added to the Chinese data set.
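As an illustration of the translation step described above, the following minimal Python sketch calls the Google Cloud Translation client to turn English transcript sentences into Chinese; the client call is real, but the transcript loader is a hypothetical placeholder, and the manual correction of mistranslations described above still follows.

```python
# Hedged sketch of the dataset-construction step: translate English meeting
# sentences into Chinese. Requires GOOGLE_APPLICATION_CREDENTIALS to be set.
from google.cloud import translate_v2 as translate

def translate_transcript(sentences, target="zh-CN"):
    client = translate.Client()
    results = client.translate(sentences, target_language=target)
    return [r["translatedText"] for r in results]

# english = load_ami_transcript("ES2002a")   # hypothetical loader, not a real API
# chinese = translate_transcript(english)    # manual post-editing follows
```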
In an embodiment of the invention, preprocessing the Chinese conference data set includes removing sentences shorter than a preset length; removing punctuation marks within a preset range; performing word segmentation; filtering stop words within a preset stop-word list; counting word frequencies and removing words whose frequency falls below a preset threshold; and building a dictionary that maps each remaining word to a unique index, so that words and indexes correspond one to one.
In an embodiment of the invention, the original Chinese conference data set is preprocessed with the third-party word segmentation tool jieba, specifically as follows. Sentences shorter than 3 characters are removed, since such short sentences usually carry no real meaning and would disturb subsequent word segmentation. Punctuation within a preset, manually chosen range is removed with the help of jieba, to reduce its interference with segmentation. Word segmentation itself is also performed with jieba, and the segmentation lexicon can later be corrected manually to further improve accuracy. Stop words are filtered; stop words are characters or words that are automatically filtered out before or after natural-language processing to save storage space and improve search efficiency, such as function words, pronouns, or verbs and nouns without specific meaning, whose removal does not affect the understanding of the sentence. Word frequencies are counted, in the invention with the Counter library, which tallies the occurrences of each word across all sentences; words whose frequency falls below the set threshold are then removed according to these statistics, and in some embodiments low-frequency words ranked below a preset rank may also be removed. Finally, a dictionary is built by mapping each remaining word to a unique index, so that words and indexes correspond one to one.
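The pipeline just described can be sketched as follows; the stop-word and punctuation sets, the length and frequency thresholds, and the reserved special indexes are illustrative assumptions rather than the embodiment's exact settings.

```python
# Minimal sketch of the preprocessing pipeline: drop short sentences, strip
# punctuation, segment with jieba, filter stop words, prune rare words, and
# build a one-to-one word/index dictionary.
from collections import Counter
import jieba

STOPWORDS = {"的", "了", "是", "在"}            # placeholder stop-word list
PUNCT = set("，。！？、；：「」『』（）")          # placeholder punctuation set
MIN_SENT_LEN, MIN_FREQ = 3, 2                   # assumed thresholds

def preprocess(sentences):
    tokenized = []
    for sent in sentences:
        if len(sent) < MIN_SENT_LEN:            # short sentences carry no meaning
            continue
        words = [w for w in jieba.lcut(sent)
                 if w.strip() and w not in PUNCT and w not in STOPWORDS]
        tokenized.append(words)
    freq = Counter(w for words in tokenized for w in words)
    kept = [w for w, c in freq.items() if c >= MIN_FREQ]   # drop low-frequency words
    vocab = {w: i for i, w in enumerate(kept, start=2)}    # unique index per word
    vocab["<pad>"], vocab["<unk>"] = 0, 1                  # reserved indexes (assumed)
    return tokenized, vocab
```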
In deep-neural-network natural language processing (NLP), the characters and words of a text are usually represented as one-dimensional vectors, commonly called word vectors; in the embodiments of the invention they are called BERT word vectors to indicate that they are obtained by BERT model training. On this basis, the neural network takes the one-dimensional word vectors of the characters or words of a text as input and, after a series of complex transformations, outputs a one-dimensional text vector as the semantic representation of the text. Characters and words with similar semantics lie close together in the feature-vector space, and the text vectors derived from such character/word vectors also carry more accurate semantic information. The input of the BERT model is the original word vector of each character/word of the text, and the output is the vector representation of each character/word after full-text semantic information has been fused in.
In the embodiment of the invention, training the word vectors, i.e., the BERT word vectors, imitates the process of language learning. Specifically, 15% of the tokens in a sentence are randomly selected for prediction. Each selected token is replaced by the special symbol [MASK] with probability 80%, replaced by an arbitrary token with probability 10%, and kept unchanged with the remaining 10% probability. The main reason is that the [MASK] tag never appears in the sentences of subsequent fine-tuning tasks; a further benefit is that when predicting a token the model cannot know whether the token at the corresponding input position is the correct one (it is the original token with only 10% probability), which gives the model some error-correction ability.
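The masking scheme just described can be sketched as follows; the [MASK] token id and vocabulary size are assumptions supplied by the caller, and -100 is the conventional ignore-index for the prediction loss.

```python
# Sketch of BERT-style masking: pick 15% of tokens; of those, 80% -> [MASK],
# 10% -> random token, 10% -> kept unchanged. Only picked positions are predicted.
import random

def mask_tokens(token_ids, mask_id, vocab_size, pick_prob=0.15):
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if random.random() >= pick_prob:
            continue                              # not selected for prediction
        labels[i] = tok                           # model must recover this token
        roll = random.random()
        if roll < 0.8:
            inputs[i] = mask_id                   # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = random.randrange(vocab_size)  # 10%: arbitrary token
        # remaining 10%: leave the original token in place
    return inputs, labels
```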
Fig. 2 is a schematic diagram of generating BERT word vectors with the BERT model according to an embodiment of the invention; the process comprises the following steps:
In the BERT input layer, the input text is X_1, X_2, …, X_n. The token embedding layer (Token Embeddings) converts each word of the Chinese conference vocabulary into a vector of a preset dimension, 768 dimensions in the embodiment of the invention. The segment embedding layer (Segment Embeddings) uses two vector values to distinguish the two sentences of a sentence pair, assigning 0 to each token of the first sentence and 1 to each token of the second. The position embedding layer (Position Embeddings) encodes the position information of the words into a feature vector, introducing the words' positional relations. The vectors of the token, segment, and position embedding layers are then summed to give the output vector of the BERT input layer. In the BERT hidden layers, this vector is fed through the hidden layers of the BERT model, each comprising a preset number of Transformer layers, and the BERT word vectors are output.
The BERT word vectors are fed into a fully connected layer, whose structure is shown in Fig. 3, and dimension-reduced BERT word vectors are output. Each node of the fully connected layer is connected to all nodes of the previous layer, integrating the features extracted earlier while retaining the useful information; the BERT word vectors pass from high to low dimension, which effectively reduces the computational load and speeds up model training. The dimension reduction performed by the fully connected layer is derived as follows:
h_1 = W_11·X_1 + W_12·X_2 + W_13·X_3 + … + W_1n·X_n
h_2 = W_21·X_1 + W_22·X_2 + W_23·X_3 + … + W_2n·X_n
⋮
h_m = W_m1·X_1 + W_m2·X_2 + W_m3·X_3 + … + W_mn·X_n

where h_1, h_2, …, h_m denote the dimension-reduced BERT word vector, X_1, X_2, …, X_n denote the original high-dimensional BERT word vector, and W is the weight matrix; the fully connected layer realizes the dimension reduction through this derivation.
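The derivation above amounts to a single linear layer; a minimal PyTorch sketch, with an assumed target dimension of 256, is:

```python
# The weight matrix W of the equations above as one fully connected layer that
# maps 768-dimensional BERT word vectors down to 256 dimensions (m = 256 assumed).
import torch
import torch.nn as nn

reduce = nn.Linear(768, 256, bias=False)   # h = W·X, row i of W gives h_i
x = torch.randn(32, 768)                   # a batch of 32 BERT word vectors
h = reduce(x)                              # dimension-reduced vectors, shape (32, 256)
```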
In an embodiment of the invention, the preprocessed text is fed into the BERT model; token embedding, segment embedding, and position embedding are performed, and the three resulting feature vectors are summed to give the output vector of the BERT input layer. This vector is fed into the hidden layers of the BERT model. Each hidden layer consists of a Transformer; 12 hidden layers are used, implemented as 12 iterations of the hidden-layer computation, yielding the word vectors produced by the BERT pre-training model.
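A minimal sketch of obtaining such vectors with the Hugging Face transformers library follows; the bert-base-chinese checkpoint (12 hidden layers, 768 dimensions) is an assumption standing in for whatever pre-trained Chinese BERT the embodiment uses.

```python
# Sketch: feed preprocessed Chinese text through a 12-layer BERT and take the
# final hidden states as the BERT word vectors.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

inputs = tokenizer("今天的会议讨论项目进度", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
word_vectors = outputs.last_hidden_state   # shape (1, seq_len, 768)
```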
Step S120: based on the acquired Chinese conference data set, generate a dialogue-act label for each sentence of the original Chinese conference data using a bidirectional long short-term memory network and an attention mechanism.
Dialogue acts (DA) are semantic labels of utterances, denoting the intent of the dialogue participants when speaking them, and are essential for understanding a dialogue. Most speaker intentions are expressed explicitly or implicitly through social acts tied to the utterance (e.g., question, request, agreement, or rejection). To make full use of dialogue-act information, the invention constructs dialogue-act labels to predict the dialogue act of each utterance and to help model the relationship between dialogue acts and the conference summary, thereby exploring the heterogeneity among different participants' viewpoints to improve the summary generation model. This component predicts the dialogue act of every utterance, generates a dialogue-act label for each sentence, and then feeds those labels into the hierarchical conference summary network to assist summary generation.
Fig. 4 is a schematic diagram of dialogue-act label generation according to an embodiment of the invention. Encoding is performed with a bidirectional long short-term memory network (Bi-LSTM): the Chinese conference data are fed in as a sentence sequence, yielding a forward hidden state h_i^fwd and a backward hidden state h_i^bwd, which are concatenated into the final hidden state h_i, output as the encoding of the original conference data:

h_i = [h_i^fwd ; h_i^bwd]

The dialogue-act weights α_i are computed with a Sigmoid activation function σ and the weight matrix W_a of a feedforward neural network:

u_i = σ(W_a·h_i)
α_i = exp(u_i) / Σ_j exp(u_j)

The dialogue-act context vector c^DA is computed from the final hidden states and the dialogue-act weights:

c^DA = Σ_i α_i·h_i

The dialogue-act label of each sentence is then computed from the dialogue-act context vector and the final hidden state:

d_i = softmax(W_d·[c^DA ; h_i])

where d_i is the dialogue-act label of the i-th sentence and W_d is a weight matrix.
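A hedged PyTorch sketch of this tagger follows; the hidden sizes and the number of dialogue-act classes are assumptions, and the context vector is shared across positions here for brevity, whereas the embodiment may compute one per utterance.

```python
# Bi-LSTM encoder + sigmoid-scored attention + softmax classifier over
# [context ; hidden state], mirroring the equations above.
import torch
import torch.nn as nn

class DialogueActTagger(nn.Module):
    def __init__(self, embed_dim=256, hidden=128, num_acts=10):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.score = nn.Linear(2 * hidden, 1)       # W_a of the weight equations
        self.out = nn.Linear(4 * hidden, num_acts)  # W_d over [c^DA ; h_i]

    def forward(self, sent_embeds):                 # (batch, sentences, embed_dim)
        h, _ = self.lstm(sent_embeds)               # h_i = [h_i^fwd ; h_i^bwd]
        u = torch.sigmoid(self.score(h))            # u_i = sigmoid(W_a·h_i)
        alpha = torch.softmax(u, dim=1)             # normalized dialogue-act weights
        c = (alpha * h).sum(dim=1, keepdim=True)    # context vector c^DA
        c = c.expand_as(h)                          # pair the context with every h_i
        return torch.log_softmax(self.out(torch.cat([c, h], dim=-1)), dim=-1)
```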
Step S130: train the pre-built hierarchical conference summary model on a training sample set composed of the BERT word vectors and the dialogue-act labels to obtain the target hierarchical conference summary generation model.
The dialogue-act tagger and the hierarchical conference summary network are built with the PyTorch framework. Conference transcripts are usually long, so directly applying a conventional Transformer is infeasible and can overflow memory. Since conference content spans many turns, the invention adopts a hierarchical structure for summary generation: the hierarchical conference summary network model consists of a word-level Transformer, a turn-level Transformer, and a decoder. The word-level Transformer performs character-level understanding within each turn, and the turn-level Transformer performs turn-level understanding across the whole conference. Focusing attention on these two levels of understanding effectively reduces the computational load and speeds up training.
Fig. 5 is a schematic diagram of the hierarchical conference summary network model according to an embodiment of the invention, built on the Transformer architecture with two encoder levels, the word-level Transformer and the turn-level Transformer. A Transformer consists of two parts, an encoder and a decoder, each containing 6 blocks. The encoder comprises positional encoding, a multi-head attention mechanism, residual-and-normalization layers, and a feedforward neural network; the decoder's structure is similar to the encoder's.
The first level is the word-level Transformer: the BERT word vectors are fed into it to obtain the encoding of each character. The word vectors produced by the BERT model serve as the input of the word-level Transformer; positional encoding supplies the position information of each word; a multi-head attention mechanism then yields the multi-head attention values, which pass through a residual-and-normalization layer (Add & Norm) that adds the input and output of the multi-head attention module position-wise; finally the result passes through a feedforward neural network and another residual-and-normalization layer, applying a nonlinear transformation to obtain the encoding of each character. One turn of conference data, i.e., all the character data of one speaker, is processed at a time, and each character within the turn is encoded with a trainable embedding matrix:
Word-Transformer({x_{i,0}, …, x_{i,n}}) = {y_{i,0}, …, y_{i,n}}

where x_{i,n} is the n-th character of the i-th sentence.
Positional encoding adds position information to the input vectors so that the model knows the position of each word. The Transformer computes the position information with sine and cosine waves, as follows:
PE(pos, 2i) = sin(pos / 10000^(2i/d))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d))

where pos is the position of the word in the sentence, i is the dimension index, and d is the embedding dimension.
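A short sketch of this encoding follows (d is assumed even):

```python
# Sinusoidal position encoding: even dimensions get sin, odd dimensions get cos,
# matching PE(pos, 2i) and PE(pos, 2i+1) above.
import math
import torch

def positional_encoding(max_len, d):
    pe = torch.zeros(max_len, d)
    pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, d, 2, dtype=torch.float) * (-math.log(10000.0) / d))
    pe[:, 0::2] = torch.sin(pos * div)   # pos / 10000^(2i/d)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe                            # added to the input word vectors
```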
The multi-head attention mechanism computes attention over all the words of a sentence in several subspaces, letting the model attend to different kinds of information. Q, K, and V are projected through n different linear transformations, the individual heads Attention(Q, K, V) are concatenated, and the result is multiplied by a matrix to give the final multi-head attention value:

Attention(Q, K, V) = softmax(Q·K^T / √d_k)·V
MultiHead(Q, K, V) = Concat(head_1, …, head_n)·W^O
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)
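The per-head computation matches the Attention formula above; a minimal sketch follows (the full multi-head projection and concatenation are available ready-made as torch.nn.MultiheadAttention):

```python
# Scaled dot-product attention: softmax(Q·K^T / sqrt(d_k))·V.
import math
import torch

def attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    return torch.softmax(scores, dim=-1) @ V
```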
the residual error is a means for solving the problem of gradient explosion/gradient dissipation, and the corresponding positions of the input and the output of the multi-head attention mechanism module are added. The normalization layer limits the data within a certain range, eliminates adverse effects caused by singular sample data and ensures the stability of training.
The feedforward neural network supplies the nonlinear transformation of the final output as the input of the next layer. A Transformer module produces an embedded output of the same dimension as its input, so multiple Transformer modules can be stacked in sequence to form a Transformer network:

Transformer({x_1, …, x_n}) = {y_1, …, y_n}
and the second layer is a wheel level converter, and the output of the word level converter and the dialogue behavior label generated by the dialogue behavior label device are spliced to be used as the input of the wheel level converter to obtain the coding result of the wheel level converter. The output of the top position Encoder (Encoder) is then transformed into a set of attention vectors comprising vectors K (key vectors) and V (value vectors). These vectors will be used by each decoder for its own "Self-Attention" layer, which may help the decoder to focus on important positions of the input sequence. Processing all m-wheel conversation data in a one-time conference, splicing the output of a word level converter and a conversation behavior label generated by a conversation behavior labeler by combining conversation behavior information to be used as the input of the wheel level converter, and coding the m-wheel conversation data, wherein in the embodiment of the invention, the value of m is 4. The m-wheel speech data are subjected to coding processing, and the expression is as follows:
Turn-Transformer({[y_{1,0} ; d_1], …, [y_{m,0} ; d_m]}) = {z_1, …, z_m}

where y_{m,0} is the encoding of the m-th sentence output by the word-level Transformer, d_m is the dialogue-act label generated for the m-th sentence by the dialogue-act tagger, and {z_1, …, z_m} denotes the resulting turn-level encodings.
The encoding output by the turn-level Transformer is fed into the decoder, which emits one element of the output sequence at each decoding step; the process repeats until a special termination symbol is reached, and the decoder finally outputs a real-valued vector. In an embodiment of the invention, the output of each time step is fed to the bottom decoder at the next time step. First, the word vector is embedded and a positional encoding is added as the decoder input, representing the position of each word. Second, the self-attention layers of the decoder behave differently from those of the encoder: in the decoder, the self-attention layer is only allowed to attend to earlier positions of the output sequence, the later positions being masked before the Softmax step. Finally, the decoder stack outputs a real-valued vector.
The real-valued vector output by the decoder then undergoes a linear transformation and Softmax processing to generate the final conference summary. The vector is first converted by a linear transformation layer (Linear), a simple fully connected neural network, into a much larger logits vector with one cell per vocabulary word, each cell holding the score of that word. The Softmax function then turns those scores into positive probabilities with an upper bound of 1.0; the cell with the highest probability is selected, and its corresponding word is emitted as the output of this time step, producing the final conference summary.
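This generation step can be sketched as follows; the decoder dimension and vocabulary size are assumptions.

```python
# Project the decoder's real-valued vector to vocabulary logits, apply Softmax,
# and emit the highest-probability word for this time step.
import torch
import torch.nn as nn

vocab_size, d_model = 30000, 512            # assumed sizes
project = nn.Linear(d_model, vocab_size)    # the Linear layer producing logits

decoder_vec = torch.randn(1, d_model)       # real-valued vector from the decoder
probs = torch.softmax(project(decoder_vec), dim=-1)
next_word_id = probs.argmax(dim=-1)         # index of the word emitted this step
```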
Fig. 6 shows the dialogue-act-optimized hierarchical conference summary model according to an embodiment of the invention: the dialogue-act tagger generates the dialogue-act labels, the trained BERT word vectors are fed into the word-level Transformer, the output of the word-level Transformer and the dialogue-act labels are fed into the turn-level Transformer, the output of the turn-level Transformer is fed into the decoder, and the conference summary is obtained after the final transformation.
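Pulling the pieces together, a condensed, hedged sketch of this pipeline follows; all sizes, the use of the first token of each turn as its representation y_{m,0}, and the treatment of dialogue-act labels as dense vectors are illustrative assumptions.

```python
# Word-level Transformer within each turn, concatenation with dialogue-act
# labels, turn-level Transformer across turns, Transformer decoder on top.
import torch
import torch.nn as nn

def encoder(d, nhead=4, layers=6):
    return nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d, nhead, batch_first=True), layers)

class HierarchicalSummarizer(nn.Module):
    def __init__(self, d_model=256, da_dim=32, nhead=4, layers=6):
        super().__init__()
        self.word_enc = encoder(d_model, nhead, layers)           # word level
        self.turn_enc = encoder(d_model + da_dim, nhead, layers)  # turn level
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model + da_dim, nhead, batch_first=True),
            layers)

    def forward(self, turns, da_labels, summary_embeds):
        # turns: (num_turns, words, d_model) BERT vectors per turn
        # da_labels: (num_turns, da_dim); summary_embeds: (1, tgt_len, d_model + da_dim)
        word_out = self.word_enc(turns)                # per-character encodings
        turn_repr = word_out[:, 0, :]                  # y_{m,0} per turn (assumed)
        turn_in = torch.cat([turn_repr, da_labels], dim=-1).unsqueeze(0)
        memory = self.turn_enc(turn_in)                # supplies K and V to the decoder
        return self.decoder(summary_embeds, memory)    # real-valued output vectors
```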
In an embodiment of the invention, a dialogue-act-optimized hierarchical conference summary generation method is provided, built on the trained hierarchical conference summary generation model, and specifically comprising: acquiring Chinese conference data and generating a dialogue-act label for each sentence of the original Chinese conference data based on a bidirectional long short-term memory network and an attention mechanism; and feeding the dialogue-act labels and the trained BERT word vectors into the summary generation model obtained by the training method of the above embodiments, so as to output the hierarchical conference summary.
Corresponding to the method, the invention further provides a dialogue-act-optimized hierarchical conference summary generation apparatus comprising a computer device with a processor and a memory; the memory stores computer instructions, the processor is configured to execute the computer instructions stored in the memory, and when the instructions are executed by the processor the apparatus carries out the steps of the above method.
Embodiments of the invention further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the foregoing method. The computer-readable storage medium may be a tangible storage medium such as a random-access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, floppy disk, hard disk, removable storage disk, CD-ROM, or any other form of storage medium known in the art.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A dialogue-act-optimized hierarchical conference summary generation model training method, characterized by comprising the following steps:
acquiring a Chinese conference data set and preprocessing it, wherein the preprocessing includes word segmentation and dictionary construction; building a Chinese conference vocabulary, inputting the vocabulary into a BERT model, and outputting BERT word vectors;
based on the acquired Chinese conference data set, generating a dialogue-act label for each sentence of the original Chinese conference data using a bidirectional long short-term memory network and an attention mechanism;
and training a pre-built hierarchical conference summary model on a training sample set composed of the BERT word vectors and the dialogue-act labels to obtain a target hierarchical conference summary generation model.
2. The method of claim 1, wherein the Chinese conference data set is obtained in one or more of the following ways: translating an English conference data set, and collecting and continuously supplementing Chinese conference data.
3. The method of claim 1, wherein acquiring and preprocessing the Chinese conference data set comprises:
removing sentences shorter than a preset length;
removing punctuation marks within a preset range;
performing word segmentation;
filtering stop words within a preset stop-word list;
counting word frequencies and removing words whose frequency falls below a preset threshold;
and building a dictionary that maps each word to a unique index, so that words and indexes correspond one to one.
4. The method of claim 1, wherein the step of inputting the Chinese conference vocabulary into the BERT model and outputting BERT word vectors comprises:
in the BERT input layer, a token embedding layer converts each word of the Chinese conference vocabulary into a vector of a preset dimension; a segment embedding layer uses two vector values to distinguish the two sentences of a sentence pair, assigning 0 to each token of the first sentence and 1 to each token of the second; a position embedding layer encodes the position information of the words into a feature vector, introducing the words' positional relations; the vectors of the token, segment, and position embedding layers are then summed to give the output vector of the BERT input layer;
and in the BERT hidden layers, the output vector of the BERT input layer is fed through the hidden layers of the BERT model, each comprising a preset number of Transformer layers, and the BERT word vectors are output.
5. The method of claim 4, wherein the step of inputting the Chinese conference vocabulary into the BERT pre-training model and outputting BERT word vectors further comprises:
feeding the BERT word vectors into a fully connected layer and outputting dimension-reduced BERT word vectors, each node of the fully connected layer being connected to all nodes of the previous layer so as to integrate the features extracted earlier and retain the useful information, whereby the BERT word vectors pass from high to low dimension and model training is accelerated.
6. The method of claim 1, wherein the step of generating a dialogue-act label for each sentence of the original Chinese conference data using the bidirectional long short-term memory network and the attention mechanism comprises:
encoding with the bidirectional long short-term memory network, feeding the Chinese conference data in sentence-sequence format to obtain a forward hidden state and a backward hidden state, concatenating the two into the final hidden state, and outputting the final hidden state as the encoding of the original conference data;
computing dialogue-act weights with a Sigmoid activation function and the weight matrix of a feedforward neural network;
computing a dialogue-act context vector from the final hidden state and the dialogue-act weights;
and computing each sentence's dialogue-act label from the dialogue-act context vector and the final hidden state.
7. The method of claim 1, wherein the step of training the preset hierarchical conference summary model on the training sample set composed of the BERT word vectors and the dialogue-act labels comprises:
feeding the BERT word vectors into a word-level Transformer to obtain the encoding of each character;
concatenating the per-character encoding output by the word-level Transformer with the dialogue-act label of each sentence of the original Chinese conference data, and feeding the concatenation into a turn-level Transformer to obtain the turn-level encoding;
feeding the encoding output by the turn-level Transformer into a decoder, which emits one element of the output sequence at each decoding step, repeating until a special termination symbol is reached and finally outputting a real-valued vector;
and applying a linear transformation and Softmax processing to the real-valued vector output by the decoder to generate the final conference summary.
8. A dialogue-act-optimized hierarchical conference summary generation method, characterized by comprising:
acquiring Chinese conference data, and generating a dialogue-act label for each sentence of the original Chinese conference data based on a bidirectional long short-term memory network and an attention mechanism;
and feeding the dialogue-act labels and the trained BERT word vectors into the summary generation model obtained by the dialogue-act-optimized hierarchical conference summary generation model training method of any one of claims 1 to 7, so as to output the hierarchical conference summary.
9. A dialogue-act-optimized hierarchical conference summary generation apparatus, comprising a processor and a memory, wherein the memory stores computer instructions and the processor is configured to execute the computer instructions stored in the memory, the apparatus implementing the steps of the method of claim 8 when the computer instructions are executed by the processor.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202111679303.7A 2021-12-31 2021-12-31 Layered conference abstract generation model training method, generation method and device Pending CN114372140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111679303.7A CN114372140A (en) 2021-12-31 2021-12-31 Layered conference abstract generation model training method, generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111679303.7A CN114372140A (en) 2021-12-31 2021-12-31 Layered conference abstract generation model training method, generation method and device

Publications (1)

Publication Number Publication Date
CN114372140A true CN114372140A (en) 2022-04-19

Family

ID=81141809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111679303.7A Pending CN114372140A (en) 2021-12-31 2021-12-31 Layered conference abstract generation model training method, generation method and device

Country Status (1)

Country Link
CN (1) CN114372140A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115589446A (en) * 2022-09-26 2023-01-10 黑盒科技(广州)有限公司 Meeting abstract generation method and system based on pre-training and prompting
WO2024139294A1 (en) * 2022-12-30 2024-07-04 深圳云天励飞技术股份有限公司 Method and apparatus for training dialogue summary generation network, computer device, and medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination