CN116957056A - Feedback-based model training method, keyword extraction method and related equipment - Google Patents

Feedback-based model training method, keyword extraction method and related equipment

Info

Publication number
CN116957056A
CN116957056A (Application CN202311199088.XA)
Authority
CN
China
Prior art keywords
language model
probability distribution
loss function
model
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311199088.XA
Other languages
Chinese (zh)
Other versions
CN116957056B (en)
Inventor
余梓飞
朵思惟
刘双勇
张程华
薛晨云
张艳丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Huizhi Xingyuan Information Technology Co ltd
Original Assignee
Tianjin Huizhi Xingyuan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Huizhi Xingyuan Information Technology Co ltd filed Critical Tianjin Huizhi Xingyuan Information Technology Co ltd
Priority to CN202311199088.XA priority Critical patent/CN116957056B/en
Publication of CN116957056A publication Critical patent/CN116957056A/en
Application granted granted Critical
Publication of CN116957056B publication Critical patent/CN116957056B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a feedback-based model training method, a keyword extraction method and related equipment. The model training method comprises the following steps: acquiring an event description text; inputting the event description text into a pre-constructed first language model and a pre-constructed second language model respectively, outputting a first probability distribution and recommended keywords through the first language model, and outputting a second probability distribution through the second language model, wherein the first language model and the second language model are used for extracting keywords from the event description text; constructing a total loss function by adopting a near-end strategy optimization algorithm based on the first probability distribution, the second probability distribution and the recommended keywords; and minimizing the total loss function to update the model parameters of the first language model. The model training method enables the recommended keywords to better conform to human preference.

Description

Feedback-based model training method, keyword extraction method and related equipment
Technical Field
The application relates to the technical field of deep learning, in particular to a feedback-based model training method, a keyword extraction method and related equipment.
Background
Keyword extraction is one of the key research topics in the field of natural language processing; although short in form, the extracted keywords can clearly express the meaning of a text. Common keyword extraction models aim at extracting meaningful keywords, but they ignore the incorrectly extracted keywords, which could serve as feedback signals to guide the model toward self-improvement. In addition, how to make the output of the model more human-like, so that the extracted keywords better conform to human preferences, is also a problem to be solved.
Disclosure of Invention
Therefore, the present application provides a feedback-based model training method, a keyword extraction method and related devices, so as to solve the problem that extracted keywords do not conform to human preference.
Based on the above object, a first aspect of the present application provides a feedback-based model training method, including:
acquiring an event description text;
respectively inputting the event description text into a first language model and a second language model which are constructed in advance, outputting a first probability distribution and recommended keywords through the first language model, and outputting a second probability distribution through the second language model, wherein the first language model and the second language model are used for extracting the keywords in the event description text;
based on the first probability distribution, the second probability distribution and the recommended keywords, constructing a total loss function by adopting a near-end strategy optimization algorithm;
the total loss function is minimized to update model parameters of the first language model.
Optionally, the constructing a total loss function by using a near-end policy optimization algorithm based on the first probability distribution, the second probability distribution and the recommended keyword includes:
scoring the recommended keywords according to the event description text to obtain keyword scores;
calculating relative entropy based on the first probability distribution and the second probability distribution;
and constructing the total loss function by adopting a near-end strategy optimization algorithm based on the keyword scores and the relative entropy.
Optionally, the constructing the total loss function by adopting a near-end policy optimization algorithm based on the keyword score and the relative entropy includes:
determining the total loss function by
$\mathrm{Loss} = L^{\mathrm{policy}} + \lambda_v\,L^{\mathrm{cost}}$
wherein $L^{\mathrm{policy}}$ represents the policy loss function, $\lambda_v$ represents an adjustable parameter, and $L^{\mathrm{cost}}$ represents the cost loss function, wherein the policy loss function and the cost loss function are determined based on the keyword score, the relative entropy and a first score vector corresponding to the first probability distribution.
Optionally, the policy loss function is determined by:
$L^{\mathrm{policy}} = -\frac{1}{N}\sum_{i=1}^{N}\min\!\left(\rho_i A_i,\ \mathrm{clip}(\rho_i,\,1-\varepsilon,\,1+\varepsilon)\,A_i\right)$
wherein $A_i$ represents the advantage function, said advantage function being determined based on the keyword score and the relative entropy, $\rho_i$ represents the update amplitude of the first probability distribution, $\mathrm{clip}$ represents the clipping function, $\varepsilon$ represents an adjustable parameter, and $N$ represents the number of characters in the sentence.
Optionally, the cost loss function is determined by:
$L^{\mathrm{cost}} = \frac{1}{N}\sum_{i=1}^{N}\max\!\left(\left(V^{\mathrm{new}}_i-(A_i+V_i)\right)^{2},\ \left(\mathrm{clip}\!\left(V^{\mathrm{new}}_i,\,V_i-\varepsilon_v,\,V_i+\varepsilon_v\right)-(A_i+V_i)\right)^{2}\right)$
wherein $V^{\mathrm{new}}$ represents the updated first score vector, $V$ represents the first score vector, $A_i$ represents the advantage function, said advantage function being determined based on the keyword score and the relative entropy, $\mathrm{clip}$ represents the clipping function, and $\varepsilon_v$ represents an adjustable parameter.
Optionally, the first language model and the second language model are the same, and each of the first language model and the second language model includes a multi-layer Transformer encoder; the first probability distribution and the second probability distribution are calculated based on the attention weights of a penultimate layer of the multi-layer Transformer encoder; the model parameters of the second language model are in a frozen state.
A second aspect of the present application provides a keyword extraction method applied to the first language model of the first aspect, where the method includes:
acquiring a target text;
and inputting the target text into the first language model, and outputting a target keyword corresponding to the target text through the first language model.
A third aspect of the present application provides a feedback-based model training apparatus comprising:
the acquisition module is configured to acquire event description text;
the output module is configured to input the event description text into a first language model and a second language model which are built in advance respectively, output first probability distribution and recommended keywords through the first language model, and output second probability distribution through the second language model, wherein the first language model and the second language model are used for extracting the keywords in the event description text;
the construction module is configured to construct a total loss function by adopting a near-end strategy optimization algorithm based on the first probability distribution, the second probability distribution and the recommended keywords;
an updating module configured to minimize the total loss function to update model parameters of the first language model.
A fourth aspect of the application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first or second aspect when executing the program.
A fifth aspect of the application provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the first or second aspect.
From the above, it can be seen that in the feedback-based model training method, the keyword extraction method and the related devices provided by the application, the model training method comprises: acquiring an event description text; inputting the event description text into a pre-constructed first language model and a pre-constructed second language model respectively, outputting a first probability distribution and recommended keywords through the first language model, and outputting a second probability distribution through the second language model, wherein the first language model and the second language model are used for extracting keywords from the event description text; constructing a total loss function by adopting a near-end strategy optimization algorithm based on the first probability distribution, the second probability distribution and the recommended keywords; and minimizing the total loss function to update the model parameters of the first language model. Through the near-end strategy optimization algorithm, keywords output by the first language model that do not accord with the event description text receive a low-score penalty, so that the model parameters of the first language model are continuously optimized by the reinforcement learning algorithm, the quality of the recommended keywords extracted by the model is improved, and the recommended keywords better conform to human preference.
Drawings
In order to more clearly illustrate the technical solutions of the present application or of the related art, the drawings required in the description of the embodiments or of the related art are briefly introduced below. It is apparent that the drawings in the following description are merely embodiments of the present application, and that other drawings can be obtained from these drawings by those of ordinary skill in the art without inventive effort.
FIG. 1 is a flow chart of a feedback-based model training method according to an embodiment of the present application;
FIG. 2 is a flow chart of a feedback-based model training method according to another embodiment of the present application;
FIG. 3 is a schematic diagram of a feedback-based model training apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a keyword extraction device according to an embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be further described in detail below with reference to specific embodiments and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present application more apparent.
It should be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present application shall have the ordinary meaning understood by those of ordinary skill in the art to which the present application belongs. The terms "first," "second," and the like used in the embodiments of the present application do not denote any order, quantity, or importance, but are merely used to distinguish one element from another. Words such as "comprising" or "comprises" mean that the element or item preceding the word encompasses the elements or items listed after the word and their equivalents, without excluding other elements or items. Terms such as "connected" or "coupled" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right" and the like are used merely to indicate relative positional relationships, which may change accordingly when the absolute position of the described object changes.
Social governance involves many aspects such as community management, environmental protection and urban planning, and is mainly aimed at solving social problems and resolving social contradictions, thereby creating a safer and more harmonious social environment. In recent years, advanced technologies such as big data and artificial intelligence have been actively applied in the field of social governance to enrich intelligent governance means in different application scenarios, effectively promoting the transformation and upgrading of governance modes and the improvement of governance efficiency.
With the continuous expansion of online governance platforms, the volume of text data produced in the field of social governance every day has increased greatly. However, the valuable information contained in this mass of data needs to be mined through intelligent text processing technology. Keyword extraction is one of the key research topics in the field of natural language processing; although short in form, the extracted keywords can clearly express the meaning of a text.
Common keyword extraction methods include TF-IDF (Term Frequency-Inverse Document Frequency) based on word frequency statistics, LSA (Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation) based on topic modeling, the BERTopic topic modeling technique based on deep language models, the KeyBERT keyword extraction technique, and the like. These models are directed at extracting meaningful keywords, but they ignore the incorrectly extracted keywords, which could also serve as feedback signals to guide the model toward self-improvement. In addition, how to make the output of the model more human-like, so that the extracted keywords better conform to human preferences, is also a problem to be solved.
In view of this, the present application proposes a feedback-based model training method, which introduces the Reinforcement Learning from Human Feedback (RLHF) technique into a Transformer-based encoder model and fine-tunes the model parameters of the encoder model through human evaluation of the output results combined with a reinforcement learning algorithm, so that the model can obtain satisfactory results in keyword extraction.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
The application provides a feedback-based model training method, which comprises the following steps with reference to fig. 1:
step 102, acquiring an event description text.
Specifically, the event description text in this embodiment may be an unstructured social governance event; exemplary social governance events relate to common social problems such as disputes, reimbursements, environmental issues, and marital or emotional matters. The event description text may be obtained through a relevant platform or by web crawling, and the way of obtaining the event description text is not particularly limited herein.
Step 104, the event description text is respectively input into a first language model and a second language model which are constructed in advance, a first probability distribution and recommended keywords are output through the first language model, and a second probability distribution is output through the second language model, wherein the first language model and the second language model are used for extracting the keywords in the event description text.
Specifically, before the event description text is input into the language models, cleaning operations need to be performed on it, for example deleting duplicate events, invalid formats and the like, so as to improve the quality of the event description text. The first language model and the second language model are models for extracting keywords from the event description text. In this embodiment, both the first language model and the second language model are the BERT-base-Chinese model, which adopts the basic BERT architecture and comprises a 12-layer Transformer encoder with 12 self-attention heads per layer and 110M parameters in total. The BERT-base-Chinese model contains a multi-head attention mechanism, so the semantics of a sentence can be understood at a deeper level; by exploiting this property, keywords that reflect the topic of a sentence can be extracted. After the event description text is input into the first language model, the Transformer encoder of the first language model outputs a first score vector $V=(V_1, V_2, \ldots, V_N)$, where $N$ represents the total number of characters in the sentence. Normalizing the first score vector by softmax yields a first probability distribution $P_1=(p_1, p_2, \ldots, p_N)$, where the probability $p_i$ represents the probability corresponding to the $i$-th character, and the character corresponding to the maximum probability value is taken as the candidate character. After the candidate character in a sentence is determined, the sentence of the event description text may be segmented using a Chinese word segmentation tool to determine the recommended keyword. For example, if a sentence of the event description text is "a resident reports that there is garbage at the community entrance", it may be segmented as "resident / reports / community / entrance / there is / garbage"; if the candidate character determined by the first language model belongs to the word "garbage", the word "garbage" is determined as the recommended keyword. Similarly, the Transformer encoder of the second language model outputs a second score vector, and normalizing the second score vector by softmax yields a second probability distribution $P_2$.
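As an illustration of this scoring step, the following Python sketch (an assumption for illustration rather than code from the patent; the Hugging Face bert-base-chinese checkpoint, the jieba segmenter and all function names are assumed) derives a per-character score vector from the attention weights of the penultimate encoder layer, normalizes it with softmax, and maps the highest-probability character to a recommended keyword through word segmentation:

```python
import torch
import jieba  # Chinese word segmentation tool (assumed choice)
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese", output_attentions=True)

def score_and_probs(sentence: str):
    """Per-character score vector V and first probability distribution P1."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    attn = outputs.attentions[-2][0]      # penultimate layer: (heads, seq, seq)
    attn = attn.sum(dim=0)                # sum over the attention heads -> (seq, seq)
    scores = attn.sum(dim=0)              # attention each token receives from the others
    scores = scores[1:-1]                 # drop the [CLS] and [SEP] positions
    probs = torch.softmax(scores, dim=0)  # first probability distribution
    return scores, probs

def recommended_keyword(sentence: str) -> str:
    """Return the segmented word containing the highest-probability character."""
    _, probs = score_and_probs(sentence)
    idx = int(torch.argmax(probs))        # candidate character position
    offset = 0
    for word in jieba.cut(sentence):
        if offset <= idx < offset + len(word):
            return word
        offset += len(word)
    return sentence[idx]

print(recommended_keyword("居民反映小区门口有垃圾"))  # expected to print a word such as "垃圾"
```

Applying the same routine to the frozen second language model would yield the second score vector and the second probability distribution.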
And 106, constructing a total loss function by adopting a near-end strategy optimization algorithm based on the first probability distribution, the second probability distribution and the recommended keywords.
Specifically, the near-end policy optimization algorithm, i.e., Proximal Policy Optimization (PPO), is a variant of the policy gradient algorithm and belongs to the family of reinforcement learning algorithms. The near-end strategy optimization algorithm helps the first language model better guide its own decisions, adjust its model parameters and optimize its output. In this step, a near-end strategy optimization algorithm is used to construct the total loss function based on the first probability distribution, the second probability distribution and the recommended keywords.
Step 108, minimizing the total loss function to update model parameters of the first language model.
Specifically, the total loss function is minimized through back propagation, which completes the enhancement of the first language model and improves the quality of the recommended keywords.
Based on the above steps 102 to 108, the present embodiment provides a feedback-based model training method comprising: acquiring an event description text; inputting the event description text into a pre-constructed first language model and a pre-constructed second language model respectively, outputting a first probability distribution and recommended keywords through the first language model, and outputting a second probability distribution through the second language model, wherein the first language model and the second language model are used for extracting keywords from the event description text; constructing a total loss function by adopting a near-end strategy optimization algorithm based on the first probability distribution, the second probability distribution and the recommended keywords; and minimizing the total loss function to update the model parameters of the first language model. Through the near-end strategy optimization algorithm, keywords output by the first language model that do not accord with the event description text receive a low-score penalty, so that the model parameters of the first language model are continuously optimized by the reinforcement learning algorithm, the quality of the recommended keywords extracted by the model is improved, and the recommended keywords better conform to human preference.
In some embodiments, the constructing the total loss function using a near-end policy optimization algorithm based on the first probability distribution, the second probability distribution, and the recommended keyword includes:
scoring the recommended keywords according to the event description text to obtain keyword scores;
calculating relative entropy based on the first probability distribution and the second probability distribution;
and constructing the total loss function by adopting a near-end strategy optimization algorithm based on the keyword scores and the relative entropy.
Specifically, when the recommended keyword is scored, the semantics of the event description text need to be considered: if the recommended keyword output by the first language model is consistent with the semantics of the event description text, the keyword score is 1; if it is inconsistent with the semantics of the event description text, the keyword score is 0, and the low score serves as a penalty for the first language model. In this way, the first language model can continuously optimize its model parameters through reinforcement learning according to the scoring feedback mechanism. The relative entropy, also known as the KL (Kullback-Leibler) divergence or information divergence, is an asymmetric measure of the difference between two probability distributions. The difference between the first probability distribution and the second probability distribution is taken as the KL divergence. After the KL divergence and the keyword score are obtained, the total loss function is constructed based on the KL divergence and the keyword score.
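A minimal sketch of these two quantities (illustrative only; the variable names and the set-membership stand-in for the manual judgement are assumptions):

```python
import torch

def relative_entropy(p_first: torch.Tensor, p_second: torch.Tensor) -> torch.Tensor:
    """KL divergence between the first and second probability distributions."""
    return torch.sum(p_first * (torch.log(p_first + 1e-12) - torch.log(p_second + 1e-12)))

def keyword_score(recommended: str, approved_keywords: set) -> float:
    """Binary human-feedback score: 1 if the keyword fits the event semantics, else 0."""
    return 1.0 if recommended in approved_keywords else 0.0
```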
In some embodiments, the constructing the total loss function using a near-end policy optimization algorithm based on the keyword score and the relative entropy includes:
determining the total loss function by
$\mathrm{Loss} = L^{\mathrm{policy}} + \lambda_v\,L^{\mathrm{cost}}$
wherein $L^{\mathrm{policy}}$ represents the policy loss function, $\lambda_v$ represents an adjustable parameter, and $L^{\mathrm{cost}}$ represents the cost loss function, wherein the policy loss function and the cost loss function are determined based on the keyword score, the relative entropy and a first score vector corresponding to the first probability distribution.
Specifically, the total loss function Loss is constructed through the following steps:
1) Calculating the value vector
In this embodiment, the first language model is used to approximate the value function. The first language model serves as an encoder to obtain the semantic embedding corresponding to each character in the sentence. The encoder is followed by a linear layer whose input is the embedding vector generated by the encoder and whose output is a one-dimensional scalar value. Thus, when a sentence of the event description text consisting of $N$ characters is input into the first language model, a first score vector, i.e., the value vector $V=(V_1, V_2, \ldots, V_N)$, is obtained. In the keyword extraction task scenario, the assignment of an attention weight to each character by the first language model is regarded as one action; if a sentence contains $N$ characters, the model performs $N$ actions during the attention-assignment process over the whole sentence. The value function evaluates how well the model performs, i.e., whether the weight given to the current character is appropriate, thereby helping the model better guide its decisions.
2) Calculating a reward function
Each action of the model has a corresponding instant reward. Furthermore, in this embodiment, the keyword score $s$ reflects whether the first language model ultimately focuses its attention on the characters contained in the keyword, and is regarded as the final reward of the attention-assignment process. The reward function $r$ is defined by formula (2):
$r_i = s\cdot\mathbb{1}[i=N] - \beta\,\mathrm{KL}_i \qquad (2)$
wherein $\mathrm{KL}_i$ denotes the $i$-th element of the relative entropy (KL divergence) term between the first probability distribution and the second probability distribution, $\beta$ is an adjustable parameter value, and $s$ is the keyword score; the indicator $\mathbb{1}[i=N]$ equals 1 only for the last character of the sentence, so the keyword score is added as the final reward.
3) Calculating the advantage function
When updating the policy, the advantage function is needed to calculate the total loss function. The advantage function measures how good the current state and action are relative to the average level: the larger the value of the advantage function, the better the current state and action, and the larger the reward should be. The advantage function improves the stability of policy updates and prevents the optimization process from becoming unstable due to overly aggressive updates. When computing the policy loss in the near-end strategy optimization algorithm, the advantage function helps control the clipping amplitude, thereby limiting the magnitude of policy updates. The advantage function is determined by formula (3):
$A_i = \sum_{l=0}^{N-i}(\gamma\lambda)^{l}\,\delta_{i+l}, \qquad \delta_i = r_i + \gamma V_{i+1} - V_i \qquad (3)$
wherein $\gamma$ and $\lambda$ are adjustable parameter values, $N$ represents the number of characters contained in the sentence, $i$ indicates the $i$-th character, and $r_i$ and $V_i$ respectively represent the $i$-th elements of the reward function $r$ and the value vector $V$ (with $V_{N+1}$ taken as 0).
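Formula (3) can be evaluated with a backward recursion over the characters; the following sketch assumes $V_{N+1}=0$ and uses illustrative values gamma = 0.99 and lam = 0.95:

```python
import torch

def advantage_function(rewards, values, gamma=0.99, lam=0.95):
    """Advantage estimates per formula (3), accumulated backward over the characters."""
    n = rewards.shape[0]
    values_ext = torch.cat([values, torch.zeros(1)])   # append V_{N+1} = 0
    deltas = rewards + gamma * values_ext[1:] - values_ext[:-1]
    advantages = torch.zeros(n)
    gae = 0.0
    for i in reversed(range(n)):        # accumulates sum of (gamma * lam)^l * delta_{i+l}
        gae = deltas[i] + gamma * lam * gae
        advantages[i] = gae
    return advantages
```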
4) Calculating a policy loss function
The application updates the parameters of the first language model and expects the update amplitude of the policy to remain within a certain range during the update. This amplitude is denoted by $\rho$, i.e. $\rho_i = P^{\mathrm{new}}_{1,i} / P^{\mathrm{old}}_{1,i}$, where $P^{\mathrm{new}}_{1}$ is the first probability distribution generated for the current sentence by the first language model after it has been updated by back-propagating the minimized policy loss function, and $P^{\mathrm{old}}_{1}$ is the first probability distribution generated by the first language model implementing the old policy. The policy loss function is determined by formula (4):
$L^{\mathrm{policy}} = -\frac{1}{N}\sum_{i=1}^{N}\min\!\left(\rho_i A_i,\ \mathrm{clip}(\rho_i,\,1-\varepsilon,\,1+\varepsilon)\,A_i\right) \qquad (4)$
wherein the clip function limits the value of $\rho_i$ to between $1-\varepsilon$ and $1+\varepsilon$ so as to avoid an excessively large policy update amplitude, and $\varepsilon$ is an adjustable parameter value.
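Formula (4) corresponds to the standard clipped surrogate objective; a minimal sketch (eps = 0.2 is an illustrative value):

```python
import torch

def policy_loss(p_new, p_old, advantages, eps=0.2):
    """Clipped policy loss per formula (4)."""
    ratio = p_new / (p_old + 1e-12)                        # update amplitude rho
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.mean(torch.min(unclipped, clipped))
```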
5) Calculating a cost loss function
The cost loss function $L^{\mathrm{cost}}$ is determined by formula (5):
$L^{\mathrm{cost}} = \frac{1}{N}\sum_{i=1}^{N}\max\!\left(\left(V^{\mathrm{new}}_i-(A_i+V_i)\right)^{2},\ \left(\mathrm{clip}\!\left(V^{\mathrm{new}}_i,\,V_i-\varepsilon_v,\,V_i+\varepsilon_v\right)-(A_i+V_i)\right)^{2}\right) \qquad (5)$
wherein $V^{\mathrm{new}}$ is the value vector generated by executing the latest cost function, i.e., the updated first score vector, and $\varepsilon_v$ is an adjustable scalar parameter value.
6) Calculating the total Loss function Loss
The total loss function is determined by equation (6):
$\mathrm{Loss} = L^{\mathrm{policy}} + \lambda_v\,L^{\mathrm{cost}} \qquad (6)$
wherein $\lambda_v$ represents an adjustable parameter used to balance the magnitudes of the policy loss function and the cost loss function. The enhancement of the first language model and the fitting of the cost function are accomplished by back-propagation so as to minimize the total loss function.
In summary, the construction of the total loss function is completed through the steps 1) to 6).
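Reusing the helper functions sketched above, a single update of the first language model could look as follows (a simplified illustration: the hyperparameter values and the single-sentence batch are assumptions, and gradients are assumed to flow through p_new and v_new back into the first language model):

```python
import torch

def ppo_update_step(p_new, p_old, v_new, v_old, p_second, score, optimizer,
                    beta=0.1, lam_v=0.5):
    """Build the total loss of formula (6) from the pieces above and back-propagate it."""
    rewards = reward_vector(p_new.detach(), p_second, score, beta=beta)
    advantages = advantage_function(rewards, v_old.detach())
    loss = policy_loss(p_new, p_old.detach(), advantages, eps=0.2) \
         + lam_v * cost_loss(v_new, v_old.detach(), advantages, eps_v=0.2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```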
In some embodiments, the first language model and the second language model are the same, each comprising a multi-layer Transformer encoder; the first probability distribution and the second probability distribution are calculated based on the attention weights of a penultimate layer of the multi-layer Transformer encoder; the model parameters of the second language model are in a frozen state.
Specifically, in this embodiment, the first language model and the second language model are the same language model; the model parameters of the first language model are continuously updated, while the model parameters of the second language model are in a frozen state, i.e., not updated. The first language model serves as the model being improved, and the second language model serves as the initial model. The second probability distribution output by the second language model is used as a reference, so that the first probability distribution output after the first language model is updated does not deviate from the initial model's understanding of the event sentence. In this embodiment, the first language model and the second language model each include a multi-layer Transformer encoder. In practice, the attention of the penultimate Transformer layer in the language model is found to be more concentrated, while the attention of the other layers is more dispersed, so the attention weights of the penultimate Transformer layer are selected to determine the keywords of the sentence. The attention weights of the attention heads in that layer are added together, and the attention that each character receives from the other characters is summed to obtain the score vector.
FIG. 2 shows a flow chart of another feedback-based model training method according to an embodiment of the present application. As shown in FIG. 2, the embodiment of the present application can be further described as follows:
First, data acquisition is carried out to obtain event description texts in the field of social governance, and the event description texts are input into the first language model and the second language model respectively. A first score vector is output through the first language model, and a second score vector is output through the second language model. The first score vector and the second score vector are normalized by softmax to obtain the recommended keywords, the first probability distribution and the second probability distribution. The recommended keywords are scored manually to obtain keyword scores, and the KL divergence is determined based on the first probability distribution and the second probability distribution. Based on the keyword scores and the KL divergence, the model parameters of the first language model are optimized by adopting the near-end strategy optimization algorithm.
On the basis of the feedback-based model training method, the application also provides a keyword extraction method which is applied to the first language model in any embodiment, and comprises the following steps:
acquiring a target text;
and inputting the target text into the first language model, and outputting a target keyword corresponding to the target text through the first language model.
Specifically, the target text is the text from which keywords are to be extracted, and may specifically be a text describing a social governance event. Before the target text is input into the first language model, the target text is cleaned to remove duplicate data, invalid formats, and the like. In this embodiment, after the model parameters of the first language model have been updated, the target text is input into the first language model, and the corresponding target keyword is output through the first language model. The target keyword correctly reflects the semantics of the target text, providing a more human-oriented keyword extraction service for the user.
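For example (an illustrative usage reusing the `recommended_keyword` helper sketched earlier; the input sentence is made up):

```python
def extract_target_keyword(target_text: str) -> str:
    """Clean the target text, then extract its keyword with the trained first language model."""
    cleaned = target_text.strip()          # placeholder for the cleaning step
    return recommended_keyword(cleaned)

print(extract_target_keyword("居民反映小区门口有垃圾"))
```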
It should be noted that, the method of the embodiment of the present application may be performed by a single device, for example, a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the method of an embodiment of the present application, the devices interacting with each other to accomplish the method.
It should be noted that the foregoing describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Based on the same inventive concept, the application also provides a feedback-based model training device corresponding to the method of any embodiment.
Referring to fig. 3, the feedback-based model training apparatus includes:
a first obtaining module 302 configured to obtain an event description text;
a first output module 304 configured to input the event description text to a first language model and a second language model which are built in advance, output a first probability distribution and recommended keywords through the first language model, and output a second probability distribution through the second language model, wherein the first language model and the second language model are used for extracting keywords in the event description text;
a construction module 306 configured to construct a total loss function using a near-end policy optimization algorithm based on the first probability distribution, the second probability distribution, and the recommended keywords;
an updating module 308 configured to minimize the total loss function to update model parameters of the first language model.
In some embodiments, the construction module 306 is further configured to score the recommended keywords according to the event description text, resulting in keyword scores;
calculating relative entropy based on the first probability distribution and the second probability distribution;
and constructing the total loss function by adopting a near-end strategy optimization algorithm based on the keyword scores and the relative entropy.
In some embodiments, the construction module 306 is further configured to determine the total loss function by
$\mathrm{Loss} = L^{\mathrm{policy}} + \lambda_v\,L^{\mathrm{cost}}$
wherein $L^{\mathrm{policy}}$ represents the policy loss function, $\lambda_v$ represents an adjustable parameter, and $L^{\mathrm{cost}}$ represents the cost loss function, wherein the policy loss function and the cost loss function are determined based on the keyword score, the relative entropy and a first score vector corresponding to the first probability distribution.
In some embodiments, the construction module 306 is further configured to determine the policy loss function by:
$L^{\mathrm{policy}} = -\frac{1}{N}\sum_{i=1}^{N}\min\!\left(\rho_i A_i,\ \mathrm{clip}(\rho_i,\,1-\varepsilon,\,1+\varepsilon)\,A_i\right)$
wherein $A_i$ represents the advantage function, said advantage function being determined based on the keyword score and the relative entropy, $\rho_i$ represents the update amplitude of the first probability distribution, $\mathrm{clip}$ represents the clipping function, $\varepsilon$ represents an adjustable parameter, and $N$ represents the number of characters in the sentence.
In some embodiments, the construction module 306 is further configured to determine the cost loss function by:
$L^{\mathrm{cost}} = \frac{1}{N}\sum_{i=1}^{N}\max\!\left(\left(V^{\mathrm{new}}_i-(A_i+V_i)\right)^{2},\ \left(\mathrm{clip}\!\left(V^{\mathrm{new}}_i,\,V_i-\varepsilon_v,\,V_i+\varepsilon_v\right)-(A_i+V_i)\right)^{2}\right)$
wherein $V^{\mathrm{new}}$ represents the updated first score vector, $V$ represents the first score vector, $A_i$ represents the advantage function, said advantage function being determined based on the keyword score and the relative entropy, $\mathrm{clip}$ represents the clipping function, and $\varepsilon_v$ represents an adjustable parameter.
In some embodiments, the first language model and the second language model are the same, each comprising a multi-layer Transformer encoder; the first probability distribution and the second probability distribution are calculated based on the attention weights of a penultimate layer of the multi-layer Transformer encoder; the model parameters of the second language model are in a frozen state.
Based on the same inventive concept, the application also provides a keyword extraction device corresponding to the method of any embodiment.
Referring to fig. 4, the keyword extraction apparatus includes:
a second acquisition module 402 configured to acquire a target text;
and a second output module 404 configured to input the target text into the first language model, and output a target keyword corresponding to the target text via the first language model.
For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of each module may be implemented in the same piece or pieces of software and/or hardware when implementing the present application.
The device of the foregoing embodiment is configured to implement the corresponding feedback-based model training method or keyword extraction method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Based on the same inventive concept, the application also provides an electronic device corresponding to the method of any embodiment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the feedback-based model training method or the keyword extraction method of any embodiment when executing the program.
Fig. 5 shows a more specific hardware architecture of an electronic device according to this embodiment, where the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 implement communication connections therebetween within the device via a bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit ), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), static storage device, dynamic storage device, or the like. Memory 1020 may store an operating system and other application programs, and when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in memory 1020 and executed by processor 1010.
The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
Communication interface 1040 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The electronic device of the foregoing embodiment is configured to implement the corresponding feedback-based model training method or keyword extraction method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein.
Based on the same inventive concept, the present application also provides a non-transitory computer readable storage medium corresponding to the method of any embodiment, wherein the non-transitory computer readable storage medium stores computer instructions for causing the computer to execute the feedback-based model training method or the keyword extraction method according to any embodiment.
The computer readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
The storage medium of the foregoing embodiments stores computer instructions for causing the computer to perform the feedback-based model training method or the keyword extraction method described in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the application (including the claims) is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the application, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the application as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the embodiments of the present application. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present application, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the present application are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The present embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, and the like, which are within the spirit and principles of the embodiments of the application, are intended to be included within the scope of the application.

Claims (10)

1. A feedback-based model training method, comprising:
acquiring an event description text;
respectively inputting the event description text into a first language model and a second language model which are constructed in advance, outputting a first probability distribution and recommended keywords through the first language model, and outputting a second probability distribution through the second language model, wherein the first language model and the second language model are used for extracting the keywords in the event description text;
based on the first probability distribution, the second probability distribution and the recommended keywords, constructing a total loss function by adopting a near-end strategy optimization algorithm;
the total loss function is minimized to update model parameters of the first language model.
2. The method of claim 1, wherein the constructing a total loss function using a near-end policy optimization algorithm based on the first probability distribution, the second probability distribution, and the recommended keywords comprises:
scoring the recommended keywords according to the event description text to obtain keyword scores;
calculating relative entropy based on the first probability distribution and the second probability distribution;
and constructing the total loss function by adopting a near-end strategy optimization algorithm based on the keyword scores and the relative entropy.
3. The method of claim 2, wherein the constructing the total loss function using a near-end policy optimization algorithm based on the keyword scores and the relative entropy comprises:
determining the total loss function by
$\mathrm{Loss} = L^{\mathrm{policy}} + \lambda_v\,L^{\mathrm{cost}}$
wherein $L^{\mathrm{policy}}$ represents the policy loss function, $\lambda_v$ represents an adjustable parameter, and $L^{\mathrm{cost}}$ represents the cost loss function, wherein the policy loss function and the cost loss function are determined based on the keyword score, the relative entropy and a first score vector corresponding to the first probability distribution.
4. A method according to claim 3, wherein the policy loss function is determined by:
$L^{\mathrm{policy}} = -\frac{1}{N}\sum_{i=1}^{N}\min\!\left(\rho_i A_i,\ \mathrm{clip}(\rho_i,\,1-\varepsilon,\,1+\varepsilon)\,A_i\right)$
wherein $A_i$ represents the advantage function, the advantage function being determined based on the keyword score and the relative entropy, $\rho_i$ represents the update amplitude of the first probability distribution, $\mathrm{clip}$ represents the clipping function, $\varepsilon$ represents an adjustable parameter, and $N$ represents the number of characters in the sentence.
5. A method according to claim 3, wherein the cost loss function is determined by:
$L^{\mathrm{cost}} = \frac{1}{N}\sum_{i=1}^{N}\max\!\left(\left(V^{\mathrm{new}}_i-(A_i+V_i)\right)^{2},\ \left(\mathrm{clip}\!\left(V^{\mathrm{new}}_i,\,V_i-\varepsilon_v,\,V_i+\varepsilon_v\right)-(A_i+V_i)\right)^{2}\right)$
wherein $V^{\mathrm{new}}$ represents the updated first score vector, $V$ represents the first score vector, $A_i$ represents the advantage function, the advantage function being determined based on the keyword score and the relative entropy, $\mathrm{clip}$ represents the clipping function, and $\varepsilon_v$ represents an adjustable parameter.
6. The method of claim 1, wherein the first language model and the second language model are the same, the first language model and the second language model each comprising a multi-layer Transformer encoder; the first probability distribution and the second probability distribution are calculated based on the attention weights of a penultimate layer of the multi-layer Transformer encoder; the model parameters of the second language model are in a frozen state.
7. A keyword extraction method, applied to the first language model of any one of claims 1-6, the method comprising:
acquiring a target text;
and inputting the target text into the first language model, and outputting a target keyword corresponding to the target text through the first language model.
8. A feedback-based model training apparatus, comprising:
the acquisition module is configured to acquire event description text;
the output module is configured to input the event description text into a first language model and a second language model which are built in advance respectively, output first probability distribution and recommended keywords through the first language model, and output second probability distribution through the second language model, wherein the first language model and the second language model are used for extracting the keywords in the event description text;
the construction module is configured to construct a total loss function by adopting a near-end strategy optimization algorithm based on the first probability distribution, the second probability distribution and the recommended keywords;
an updating module configured to minimize the total loss function to update model parameters of the first language model.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when the program is executed by the processor.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202311199088.XA 2023-09-18 2023-09-18 Feedback-based model training method, keyword extraction method and related equipment Active CN116957056B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311199088.XA CN116957056B (en) 2023-09-18 2023-09-18 Feedback-based model training method, keyword extraction method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311199088.XA CN116957056B (en) 2023-09-18 2023-09-18 Feedback-based model training method, keyword extraction method and related equipment

Publications (2)

Publication Number Publication Date
CN116957056A true CN116957056A (en) 2023-10-27
CN116957056B CN116957056B (en) 2023-12-08

Family

ID=88456811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311199088.XA Active CN116957056B (en) 2023-09-18 2023-09-18 Feedback-based model training method, keyword extraction method and related equipment

Country Status (1)

Country Link
CN (1) CN116957056B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858931A (en) * 2020-07-08 2020-10-30 华中师范大学 Text generation method based on deep learning
CN113807098A (en) * 2021-08-26 2021-12-17 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium
CN115578726A (en) * 2022-09-09 2023-01-06 中国科学院上海高等研究院 Image description model training method and image description method based on context reasoning
CN116167386A (en) * 2023-03-31 2023-05-26 Oppo广东移动通信有限公司 Training method for speech translation model, speech translation method and storage medium
CN116681083A (en) * 2023-06-07 2023-09-01 上海哔哩哔哩科技有限公司 Text data sensitive detection method, device, equipment and medium
CN116663578A (en) * 2023-06-16 2023-08-29 东北大学 Neural machine translation method based on strategy gradient method improvement
CN116522912A (en) * 2023-07-05 2023-08-01 大家智合(北京)网络科技股份有限公司 Training method, device, medium and equipment for package design language model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG Zhigang et al.: "Research and Development of Deep Hierarchical Reinforcement Learning", Journal of Software, pages 733-760 *

Also Published As

Publication number Publication date
CN116957056B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
US10885277B2 (en) On-device neural networks for natural language understanding
CN108846077B (en) Semantic matching method, device, medium and electronic equipment for question and answer text
CN110663049B (en) Neural Network Optimizer Search
CN107220352A (en) The method and apparatus that comment collection of illustrative plates is built based on artificial intelligence
CN110795913B (en) Text encoding method, device, storage medium and terminal
US11704506B2 (en) Learned evaluation model for grading quality of natural language generation outputs
CN110427629A (en) Semi-supervised text simplified model training method and system
JP7417679B2 (en) Information extraction methods, devices, electronic devices and storage media
CN109002519A (en) Answer selection method, device and electronic equipment based on convolution loop neural network
CN110209782B (en) Question-answering model and answer sentence generation method and device, medium and electronic equipment
US20230385317A1 (en) Information Retrieval Method, Related System, and Storage Medium
CN114462425B (en) Social media text processing method, device and equipment and storage medium
CN112883182A (en) Question-answer matching method and device based on machine reading
JP2023002690A (en) Semantics recognition method, apparatus, electronic device, and storage medium
CN111382563A (en) Text relevance determining method and device
CN109902286A (en) A kind of method, apparatus and electronic equipment of Entity recognition
CN116957056B (en) Feedback-based model training method, keyword extraction method and related equipment
CN108984475A (en) Answer selection method, device and electronic equipment based on holographic neural network
CN117540703A (en) Text generation method, model training method, device and electronic equipment
CN115878094B (en) Code searching method, device, equipment and storage medium
CN117112754A (en) Information processing method, information processing device, electronic equipment and storage medium
CN117077653A (en) Controllable generation method and device thereof
JP7452623B2 (en) Learning device, information processing device, learning method, information processing method and program
CN112307198B (en) Method and related device for determining abstract of single text
CN113761152A (en) Question-answer model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant