WO2022257468A1 - Dialogue management system updating method and apparatus, and computer device and storage medium - Google Patents

Dialogue management system updating method and apparatus, and computer device and storage medium

Info

Publication number
WO2022257468A1
Authority
WO
WIPO (PCT)
Prior art keywords
dialogue
dialog
content
state
value
Prior art date
Application number
PCT/CN2022/072286
Other languages
English (en)
Chinese (zh)
Inventor
侯翠琴
文彬
李剑锋
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022257468A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the technical field of speech processing, and in particular to a dialog management system updating method, device, computer equipment and storage medium.
  • the multi-round dialogue management system mainly generates dialogue responses based on technologies such as retrieval database, intent understanding model, and dialogue generation model.
  • if the database content, intent understanding model, and dialogue generation model are not updated in time while the multi-round dialogue management system is in use, the dialogue effect may suffer.
  • the inventor realized that the database content, the intent understanding model, and the dialogue generation model are generally updated or optimized manually, but manual updating is inefficient and error-prone, which may lower the accuracy of the dialogue management system.
  • Embodiments of the present application provide a dialogue management system update method, device, computer equipment, and storage medium to solve the problems of low update efficiency of the dialogue system and low accuracy of the updated dialogue system.
  • a method for updating a dialogue management system comprising:
  • a dialogue activation mapping function is determined through a preset discriminant generation model
  • the state value function refers to the corresponding incentive value of the dialogue reply content generated by the initial dialogue policy function in a dialogue state
  • the state value function and the dialog incentive mapping function determine the behavior value function of each dialog state in the preset system state sequence
  • a dialogue management system updating device comprising:
  • the dialog content acquisition module is used to obtain the latest round of first dialog content in the dialog management system, and all second dialog content before the first dialog content; wherein the first dialog content is associated with a first dialog state, and each second dialog content is associated with a second dialog state;
  • a dialog incentive mapping function determination module configured to determine the dialog incentive mapping function through a preset discriminant generation model according to the first dialog content, the first dialog state, the second dialog state, and the second dialog content;
  • a state value function determination module configured to obtain the initial dialog strategy function and the preset system state sequence of the dialog management system, and to determine the state value function of each dialog state in the preset system state sequence according to the initial dialog strategy function, the preset system state sequence, and the dialog incentive mapping function;
  • a behavior value function determination module configured to determine the behavior value function of each dialog state in the preset system state sequence according to the preset system state sequence, the state value function, and the dialog incentive mapping function;
  • a system update management module configured to determine whether there is an update to the initial dialogue policy function according to the state value function and the behavior value function corresponding to the same dialogue state, and, when there is no update to the initial dialogue policy function, to determine that the dialog management system has been updated.
  • a computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and operable on the processor, and the processor implements the following steps when executing the computer-readable instructions:
  • a dialogue activation mapping function is determined through a preset discriminant generation model
  • the state value function refers to an incentive value corresponding to a dialog reply content generated by the initial dialog policy function in a dialog state
  • the state value function and the dialog incentive mapping function determine the behavior value function of each dialog state in the preset system state sequence
  • One or more readable storage media storing computer-readable instructions, wherein, when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the following steps:
  • a dialogue activation mapping function is determined through a preset discriminant generation model
  • the state value function refers to an incentive value corresponding to a dialog reply content generated by the initial dialog policy function in a dialog state
  • the state value function and the dialog incentive mapping function determine the behavior value function of each dialog state in the preset system state sequence
  • in the above dialogue management system update method, device, computer equipment and storage medium, the dialogue incentive mapping function is determined according to the first dialogue content, the first dialogue state, the second dialogue state and the second dialogue content, so the historical dialogue content is considered more comprehensively and the generated dialogue incentive mapping function is more accurate when it is used to determine the state value function and the behavior value function. Further, by introducing the state value function and the behavior value function to determine whether there is an update to the initial dialogue policy function, different dialogue generation actions are taken into account. This improves the update efficiency of the dialogue management system and lets it be updated according to real-time dialogue content, so that its dialogue policy function and the reply content it outputs are more accurate.
  • Fig. 1 is a schematic diagram of an application environment of a dialogue management system update method in an embodiment of the present application
  • Fig. 2 is a flowchart of a dialog management system update method in an embodiment of the present application
  • Fig. 3 is a functional block diagram of a dialogue management system updating device in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a computer device in an embodiment of the present application.
  • the dialog management system update method provided in the embodiment of the present application can be applied in the application environment shown in FIG. 1 .
  • the dialog management system update method is applied in a dialog management system update system, which includes a client and a server as shown in FIG. 1. The client and the server communicate through a network to solve the problems of low update efficiency of the dialogue system and low accuracy of the updated dialogue system.
  • the client, also referred to as the user end, is a program that corresponds to the server and provides local services for the user.
  • Clients can be installed on, but not limited to, various personal computers, laptops, smartphones, tablets and portable wearable devices.
  • the server can be implemented by an independent server or a server cluster composed of multiple servers.
  • a method for updating a dialog management system is provided.
  • the method is described using its application to the server in FIG. 1 as an example, and includes the following steps:
  • S10 Obtain the latest round of first dialog content in the dialog management system, and all second dialog content before the first dialog content; wherein the first dialog content is associated with a first dialog state, and each second dialog content is associated with a second dialog state;
  • the first dialogue content is the most recent round of dialogue content in the dialogue system as of the current time; the second dialogue content is all other dialogue content in the dialogue system, each of which occurred before the first dialogue content.
  • the first dialog state refers to the dialog management state of the dialog management system when the first dialog content occurs; the second dialog state refers to the dialog management state of the dialog management system when the second dialog content occurs. Both the first dialog state and the second dialog state can be stored in a preset storage database.
  • the first dialogue state and the second dialogue state can include user intentions, user questions, system response actions, dialogue content generated by the system in response to user questions, and so on.
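  • as an illustration of step S10, the sketch below models a dialogue history in Python and splits it into the first dialogue content and the second dialogue contents. The class and field names (DialogState, DialogTurn, round_no, split_history) are hypothetical and chosen only for this example; the source does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DialogState:
    """Dialogue management state recorded when a piece of dialogue content occurs."""
    user_intent: str
    user_question: str
    system_action: str
    system_reply: str


@dataclass
class DialogTurn:
    """One round of dialogue content together with its associated state."""
    round_no: int
    content: str
    state: DialogState


def split_history(turns: List[DialogTurn]) -> Tuple[DialogTurn, List[DialogTurn]]:
    """Return (first dialogue content, second dialogue contents).

    The latest round is treated as the first dialogue content and every
    earlier round as second dialogue content.
    """
    if not turns:
        raise ValueError("dialogue history is empty")
    ordered = sorted(turns, key=lambda t: t.round_no)
    return ordered[-1], ordered[:-1]
```

  • keeping the state next to the content in one record mirrors the association between dialogue content and dialogue state described above.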
  • the preset discriminant generation model in this embodiment is a deep learning network model that combines a discriminative deep learning network structure with a generative deep learning network structure. From the first dialog content, the second dialog content, the first dialog state, and the second dialog state, it automatically learns to generate the dialogue incentive mapping function.
  • step S20 includes:
  • the preset discriminant generation model includes a vector encoding module, which in turn includes an encoding unit and a decoding unit. The vector encoding module performs vector encoding on the first dialogue content and the first dialogue state to obtain a first dialogue vector; at the same time, it performs vector encoding on all second dialogue contents and the second dialogue state corresponding to each second dialogue content to obtain a second dialogue vector.
  • reply content prediction means predicting the reply to the question in the current round of dialogue content. Specifically, after the preset discriminant generation model performs vector encoding on the first dialogue content and the first dialogue state to obtain the first dialogue vector, and on all second dialogue content and each second dialogue state to obtain the second dialogue vector, the reply content of the dialogue system is predicted from the first dialogue vector to obtain a first prediction vector, and from the second dialogue vector to obtain a second prediction vector.
  • introducing the first dialogue state associated with the first dialogue content, the second dialogue content, and the second dialogue state associated with the second dialogue content improves the accuracy of dialogue content prediction, and thereby the accuracy of the dialogue incentive mapping function.
  • the dialogue activation mapping function is obtained.
  • the linear regression classification in this embodiment includes linear regression processing and classification processing. Specifically, after the reply content of the dialogue system is predicted from the first dialogue vector to obtain the first prediction vector and from the second dialogue vector to obtain the second prediction vector, both prediction vectors are decoded by the vector encoding module of the preset discriminant generation model. The decoded prediction vectors are input to the linear regression module, which extracts a first specific feature from the decoded first prediction vector and a second specific feature from the decoded second prediction vector, and the dialogue incentive mapping function is then determined by the classification module of the preset discriminant generation model. Further, the dialogue incentive mapping function can be determined by predicting the fluency between the reply content and all dialogue content, the number of dialogue rounds, and the contextual relevance; that is, the dialogue incentive mapping function is used to determine the incentive value corresponding to a dialogue sample.
  • S30 Obtain an initial dialogue strategy function and a preset system state sequence of the dialogue management system, and determine the state value function of each dialog state in the preset system state sequence according to the initial dialogue strategy function, the preset system state sequence, and the dialogue incentive mapping function; the state value function refers to the incentive value corresponding to the dialog reply content generated by the initial dialog policy function in a dialog state;
  • the preset system state sequence refers to a preset sequence including multiple dialog management system states.
  • the initial dialog strategy function is the strategy function for dialog generation for the dialog management system, and the initial dialog strategy function represents the mapping relationship between the dialog state and the dialog generation action.
  • the dialogue generation action is used to generate the first reply content.
  • the state value function characterizes the predicted future incentive of each dialogue state under the initial dialogue policy function; that is, it refers to the incentive value corresponding to the dialogue reply content generated in a dialogue state through the initial dialogue policy function (the dialogue reply content includes the first reply content described below).
  • step S30 includes:
  • according to the initial dialog policy function, determine the first dialog generating action corresponding to each dialog state in the preset system state sequence, and execute the first dialog generating action through the dialog management system to obtain the first reply content and the first state conversion rate corresponding to each dialog state;
  • the first dialogue generation action corresponding to each dialogue state is determined by the initial dialogue policy function; that is, the initial dialogue policy function maps each dialog state to its first dialog generation action, so the first dialog generation action in this embodiment is neither randomly selected nor preset.
  • the first state conversion rate refers to the probability of transitioning from the current dialog state to the next dialog state after the first dialog generating action is executed by the dialog management system.
  • the first dialogue generation action corresponding to each dialogue state is mapped out according to the initial dialogue policy function, and the first dialogue generation action is executed through the dialogue management system.
  • after the first reply content is obtained, the first dialogue content and the second dialogue content are checked for dialogue content similar to the first reply content. For example, a similarity algorithm (such as a cosine similarity algorithm or a Euclidean distance algorithm) can be used to determine a first similarity between the first dialogue content and the first reply content and a second similarity between each second dialogue content and the first reply content, and the first similarity and each second similarity are compared with a preset similarity threshold (which can be set to 95%, 98%, etc.). When the first similarity or any second similarity is greater than or equal to the preset similarity threshold, it is determined that the first dialogue content and the second dialogue content contain dialogue content similar to the first reply content. When the first similarity and all second similarities are less than the preset similarity threshold, it is determined that they contain no dialogue content similar to the first reply content.
  • in the latter case, the first state incentive value obtained after the first dialog generation action is executed in each dialog state can be determined directly through the dialogue incentive mapping function determined in step S20.
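  • the similarity check can be sketched as follows. This is a minimal illustration that assumes the dialogue contents and the reply content have already been turned into numeric vectors (the embedding step is not shown), uses cosine similarity, and keeps the best match when several items reach the threshold. The function names and the 0.95 default are illustrative choices, not the patent's implementation.

```python
import math
from typing import Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_content(
    reply_vec: Sequence[float],
    history_vecs: Sequence[Sequence[float]],
    threshold: float = 0.95,
) -> Optional[Tuple[int, float]]:
    """Return (index, similarity) of the most similar history item whose
    similarity reaches the threshold, or None when no item qualifies."""
    best: Optional[Tuple[int, float]] = None
    for index, vec in enumerate(history_vecs):
        sim = cosine_similarity(reply_vec, vec)
        if sim >= threshold and (best is None or sim > best[1]):
            best = (index, sim)
    return best
```

  • a return value of None corresponds to the case in which the incentive value is taken directly from the dialogue incentive mapping function; otherwise the returned index identifies the similar content.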
  • after the first state incentive value corresponding to each dialog state is determined according to the dialogue incentive mapping function, the state value function corresponding to each dialog state is determined according to the first state conversion rate and the first state incentive value corresponding to that dialog state.
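  • the state value function can be determined according to the following expression:
  • $V(s) = \sum_{s' \in S} p(s', r \mid s, a)\,[r + \gamma V(s')]$
  • V(s) refers to the state value function when the dialogue state is s; p(s', r | s, a) refers to the first state conversion rate from the dialogue state s to the dialogue state s' after the first dialogue generation action a is executed in s; r refers to the first state incentive value; γ is the attenuation factor of the dialogue management system, which can be set arbitrarily, for example to 0.9;
  • V(s') refers to the state value function of the dialogue state s'; S is the preset system state sequence.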
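  • a minimal sketch of evaluating the state value function under a fixed policy, assuming it takes the usual Bellman expectation form V(s) = Σ p(s', r | s, a) [r + γ V(s')] suggested by the symbols above. The transition table layout, the function name evaluate_policy, and the convergence tolerance are assumptions made for this example.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
# For each state: (conversion rate, incentive value, next state) for every
# outcome of the action that the current policy selects in that state.
Transitions = Dict[State, List[Tuple[float, float, State]]]


def evaluate_policy(
    transitions: Transitions,
    gamma: float = 0.9,
    tol: float = 1e-8,
    max_iters: int = 10_000,
) -> Dict[State, float]:
    """Iterate the Bellman expectation update until the values stop changing."""
    values: Dict[State, float] = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        delta = 0.0
        for s, outcomes in transitions.items():
            # Next states missing from the table are treated as having value 0.
            new_v = sum(
                p * (r + gamma * values.get(s_next, 0.0))
                for p, r, s_next in outcomes
            )
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < tol:
            break
    return values
```

  • with an attenuation factor below 1 the iteration converges; a state with no listed outcomes keeps the value 0 and acts as a terminal state.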
  • when the first similarity or any second similarity is greater than or equal to the preset similarity threshold, the first dialogue content and the second dialogue content contain dialogue content similar to the first reply content. If only the first similarity, or only one second similarity, is greater than or equal to the preset similarity threshold, the corresponding first dialogue content or second dialogue content is recorded as the first similar content. If multiple second similarities, or the first similarity and at least one second similarity, are greater than or equal to the preset similarity threshold, the first dialogue content or second dialogue content with the highest similarity is recorded as the first similar content.
  • the first historical incentive value is the incentive value determined for the first similar content according to the dialogue incentive mapping function; it is associated with the first similar content and stored in a preset database, so once the first similar content is identified, its first historical incentive value can be obtained directly from the preset database.
  • the first dialogue round difference is the difference between the number of dialogue rounds of the first similar content and that of the first dialogue content. For example, if the first similar content is the second dialogue content of the round immediately before the first dialogue content, the first dialogue round difference is 1.
  • after the first historical incentive value corresponding to the first similar content and the first dialogue round difference between the first similar content and the first dialogue content are acquired, a second state incentive value is determined according to the first historical incentive value and the first dialogue round difference, and the state value function corresponding to each dialogue state is determined according to the first state conversion rate and the second state incentive value.
  • the second state excitation value can be determined by the following expression:
  • R is the incentive value of the second state
  • r is the first historical incentive value
  • u is a dialogue system parameter, and the dialogue system parameter is any real number greater than 0
  • n is the difference value of the first dialogue round number.
  • S40 Determine the behavior value function of each dialog state in the preset system state sequence according to the preset system state sequence, the state value function, and the dialog incentive mapping function;
  • the behavior value function represents the incentive value of each dialog state under conditions different from the first dialog generating action.
  • determining the behavior value function of each dialog state in the preset system state sequence according to the preset system state sequence, the state value function, and the dialog incentive mapping function includes:
  • the second dialogue generation action is a randomly selected dialogue generation action, and the second dialogue generation action is different from the first dialogue generation action.
  • the dialog management system executes the second dialog generation action in each dialog state, and the second reply content generated and the second state conversion rate are obtained for each dialog state.
  • the first dialog content and the second dialog content are then checked for dialogue content similar to the second reply content. For example, a similarity algorithm (such as a cosine similarity algorithm or a Euclidean distance algorithm) can be used to determine a first similarity between the first dialogue content and the second reply content and a second similarity between each second dialogue content and the second reply content, and the first similarity and each second similarity are compared with the preset similarity threshold (which can be set to 95%, 98%, etc.). When the first similarity or any second similarity is greater than or equal to the preset similarity threshold, it is determined that dialogue content similar to the second reply content exists; when the first similarity and all second similarities are less than the preset similarity threshold, it is determined that the first dialogue content and the second dialogue content contain no dialogue content similar to the second reply content.
  • in the latter case, the third state incentive value obtained after the second dialog generating action is executed in each dialog state can be determined directly through the dialogue incentive mapping function determined in step S20.
  • a behavior value function corresponding to each dialogue state is determined according to the second state transition rate and the third state excitation value corresponding to each dialogue state.
  • the behavior value function can be determined according to the following expression:
  • $Q(s, a') = \sum_{s' \in S} p(s', r' \mid s, a')\,[r' + \gamma V(s')]$
  • Q(s, a') refers to the behavior value function of executing the second dialog generation action a' in the dialog state s
  • p(s', r' | s, a') refers to the second state conversion rate from the dialog state s to the dialog state s' after the second dialog generation action a' is executed in s
  • r' refers to the third state incentive value of executing the second dialogue generation action a' in the dialogue state s
  • γ is the attenuation factor of the dialogue management system; it can be set arbitrarily, for example to 0.9
  • V(s') refers to the state value function of the dialogue state s'
  • S is the preset system state sequence.
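  • the behavior value of one alternative action can be computed in a single pass once the state values are known. The sketch below assumes the form Q(s, a') = Σ p(s', r' | s, a') [r' + γ V(s')] implied by the symbols above; the function name action_value and the outcome tuple layout are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

State = Hashable


def action_value(
    outcomes: Iterable[Tuple[float, float, State]],
    values: Dict[State, float],
    gamma: float = 0.9,
) -> float:
    """Behavior value of one action in one state.

    outcomes: (conversion rate, incentive value, next state) for every
    possible result of executing the alternative action a' in state s.
    values: state value function V, keyed by state.
    """
    return sum(
        p * (r + gamma * values.get(s_next, 0.0))
        for p, r, s_next in outcomes
    )
```

  • unlike the state value function, this needs no iteration, because V on the right-hand side is already fixed.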
  • when the first similarity or any second similarity is greater than or equal to the preset similarity threshold, the first dialogue content and the second dialogue content contain dialogue content similar to the second reply content. If only the first similarity, or only one second similarity, is greater than or equal to the preset similarity threshold, the corresponding first dialogue content or second dialogue content is recorded as the second similar content. If multiple second similarities, or the first similarity and at least one second similarity, are greater than or equal to the preset similarity threshold, the first dialogue content or second dialogue content with the highest similarity is recorded as the second similar content.
  • the second historical incentive value is the incentive value determined for the second similar content according to the dialogue incentive mapping function.
  • the second historical incentive value associated with the second similar content can be obtained directly from the preset database.
  • the second dialogue round difference is the difference between the number of dialogue rounds of the second similar content and that of the first dialogue content. For example, if the second similar content is the second dialogue content of the round immediately before the first dialogue content, the second dialogue round difference is 1.
  • after the second historical incentive value and the second dialogue round difference are acquired, a fourth state incentive value is determined from them using the expression for the second state incentive value in the above steps, and the behavior value function corresponding to each dialogue state is determined according to the second state conversion rate and the fourth state incentive value.
  • S50 Determine whether there is an update to the initial dialogue policy function according to the state value function and the behavior value function corresponding to the same dialogue state, and determine that the dialogue management system has been updated when there is no update to the initial dialogue policy function.
  • whether the initial dialogue policy function is to be updated is determined from the state value function and the behavior value function corresponding to the same dialogue state, that is, by checking whether the value of the behavior value function for that dialogue state is greater than or equal to the value of the state value function. If it is, the second dialogue generation action used to determine the behavior value function is preferable to the first dialogue generation action used to determine the state value function; in other words, a dialogue policy function that uses the second dialogue generation action in that dialogue state is better than one that uses the first, so the initial dialogue policy function is updated. If the value of the behavior value function for the same dialogue state is smaller than the value of the state value function, the first dialogue generation action used to determine the state value function is better than the second dialogue generation action used to determine the behavior value function.
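  • the comparison in step S50 can be sketched as a single policy update pass. The function below is illustrative only: improve_policy, the dictionary layout, and the use of strings for actions are assumptions, and the surrounding loop that re-evaluates the value functions after each update is not shown.

```python
from typing import Dict, Hashable, Tuple

State = Hashable
Action = str


def improve_policy(
    policy: Dict[State, Action],
    values: Dict[State, float],
    candidates: Dict[State, Tuple[Action, float]],
) -> Tuple[Dict[State, Action], bool]:
    """Compare the behavior value with the state value for each state.

    candidates maps a state to (alternative action, behavior value of that
    action). When the behavior value is greater than or equal to the state
    value, the alternative action replaces the policy's action for the state.
    Returns the new policy and whether any state was updated.
    """
    new_policy = dict(policy)
    changed = False
    for state, (alt_action, q_value) in candidates.items():
        if q_value >= values[state]:
            new_policy[state] = alt_action
            changed = True
    return new_policy, changed
```

  • when the returned flag is False, no state was updated, which corresponds to the condition under which the dialogue management system is considered updated. A practical loop would likely also cap the number of passes, since the greater-than-or-equal rule can keep swapping actions of equal value.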
  • the historical dialogue content is considered more comprehensively, so that the generated dialogue incentive mapping function is more accurate when it is used to determine the state value function and the behavior value function. Further, by introducing the state value function and the behavior value function to determine whether there is an update to the initial dialogue policy function, this embodiment takes different dialogue generation actions into account. This improves the update efficiency of the dialogue management system and lets it be updated according to real-time dialogue content, so that its dialogue strategy function and the reply content it outputs are more accurate.
  • a dialog management system updating device is provided, and the dialog management system updating device is in one-to-one correspondence with the dialog management system updating method in the foregoing embodiments.
  • the dialogue management system update device includes a dialogue content acquisition module 10 , a dialogue incentive mapping function determination module 20 , a state value function determination module 30 , a behavior value function determination module 40 and a system update management module 50 .
  • the detailed description of each functional module is as follows:
  • Dialogue content acquisition module 10 configured to acquire the latest round of first dialogue content in the dialogue management system, and all second dialogue contents before the first dialogue content; wherein the first dialogue content is associated with a first dialogue state, and each second dialogue content is associated with a second dialogue state;
  • Dialogue activation mapping function determination module 20 for determining the dialogue activation mapping function through a preset discriminant generation model according to the first dialogue content, the first dialogue state, the second dialogue state and the second dialogue content;
  • a state value function determination module 30 configured to obtain an initial dialog strategy function and a preset system state sequence of the dialog management system, and determine according to the initial dialog strategy function, the preset system state sequence, and the dialog incentive mapping function The state value function of each dialog state in the preset system state sequence; the state value function refers to the incentive value corresponding to the dialog reply content generated by the initial dialog policy function in a dialog state;
  • a behavior value function determination module 40 configured to determine the behavior value function of each dialog state in the preset system state sequence according to the preset system state sequence, the state value function, and the dialog incentive mapping function;
  • a system update management module 50 configured to determine whether there is an update to the initial dialogue policy function according to the state value function and the behavior value function corresponding to the same dialogue state, and if there is no update to the initial dialogue policy function , it is determined that the dialogue management system is updated.
  • the dialogue incentive mapping function determination module 20 includes:
  • a vector encoding unit configured to perform vector encoding processing on the first dialogue content and the first dialogue state through the preset discriminant generation model to obtain a first dialogue vector, and to perform vector encoding processing on each second dialogue content and each second dialogue state through the preset discriminant generation model to obtain a second dialogue vector;
  • a content prediction unit configured to predict the reply content of the dialogue management system according to the first dialogue vector to obtain a first prediction vector, and to predict the reply content of the dialogue management system according to the second dialogue vector to obtain a second prediction vector;
  • a linear regression classification unit configured to obtain the dialogue incentive mapping function by performing linear regression classification on the first prediction vector and the second prediction vector.
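  • As an illustration only, the data flow through the three units of module 20 (vector encoding, content prediction, linear scoring) can be sketched in Python. This is a toy stand-in under stated assumptions and not the preset discriminant generation model itself: the bag-of-words `encode`, the normalising `predict_reply`, the logistic scoring in `incentive`, and every name and parameter below are hypothetical.

```python
import math
from typing import List, Sequence

DIM = 8  # hypothetical vector size


def encode(content: str, state: str, dim: int = DIM) -> List[float]:
    """Toy encoding of (dialogue content, dialogue state) into a fixed-size vector."""
    vec = [0.0] * dim
    for token in (content + " " + state).split():
        # Deterministic bucket per token; a real system would use a learned encoder.
        bucket = sum(ord(ch) for ch in token) % dim
        vec[bucket] += 1.0
    return vec


def predict_reply(vec: Sequence[float]) -> List[float]:
    """Placeholder for reply prediction: L2-normalise the dialogue vector."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def incentive(first_pred: Sequence[float], second_pred: Sequence[float],
              weights: Sequence[float], bias: float = 0.0) -> float:
    """Linear layer over both prediction vectors, squashed to (0, 1) by a sigmoid."""
    features = list(first_pred) + list(second_pred)
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


first = predict_reply(encode("book a flight", "awaiting_destination"))
second = predict_reply(encode("hello", "greeting"))
score = incentive(first, second, weights=[0.0] * (2 * DIM))
```

  • In this sketch the weights would have to be fitted on labelled dialogue data; with all-zero weights the score is the neutral value of the sigmoid.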
  • the state value function determination module 30 includes:
  • a first dialogue action execution unit configured to determine, according to the initial dialogue policy function, a first dialogue generation action corresponding to each dialogue state in the preset system state sequence, and to execute the first dialogue generation action through the dialogue management system to obtain first reply content and a first state transition rate corresponding to each dialogue state;
  • a first dialogue content comparison unit configured to determine whether the first dialogue content and the second dialogue content contain dialogue content identical to the first reply content;
  • a first state incentive value determination unit configured to determine, according to the dialogue incentive mapping function, a first state incentive value corresponding to the first reply content when neither the first dialogue content nor the second dialogue content contains dialogue content identical to the first reply content;
  • a first state value function determination unit configured to determine the state value function corresponding to each dialogue state according to the first state transition rate and the first state incentive value corresponding to each dialogue state.
  • the state value function determination module 30 also includes:
  • a first similar content recording unit configured to, when the first dialogue content and the second dialogue content contain dialogue content identical to the first reply content, record the first dialogue content or second dialogue content that is identical to the first reply content as first similar content;
  • a first parameter acquisition unit configured to acquire a first historical incentive value corresponding to the first similar content, and a first dialogue round difference between the first similar content and the first dialogue content;
  • a second state value function determination unit configured to determine a second state incentive value according to the first historical incentive value and the first dialogue round difference, and to determine the state value function corresponding to each dialogue state according to the first state transition rate and the second state incentive value.
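  • The two branches handled by module 30 (reply content not seen before versus reply content identical to earlier dialogue content) can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not say how the historical incentive value and the dialogue round difference are combined, so the exponential discount `decay ** round_diff` is hypothetical, as is the product of transition rate and incentive value used for the state value. All names and parameters are illustrative.

```python
from typing import Callable, List, Optional, Tuple

# One history entry: (dialogue content, round index, historical incentive value).
HistoryEntry = Tuple[str, int, float]


def find_similar(reply: str, history: List[HistoryEntry]) -> Optional[HistoryEntry]:
    """Return the most recent history entry identical to the reply, or None."""
    matches = [entry for entry in history if entry[0] == reply]
    return max(matches, key=lambda entry: entry[1]) if matches else None


def state_incentive(reply: str, history: List[HistoryEntry], latest_round: int,
                    incentive_fn: Callable[[str], float], decay: float = 0.9) -> float:
    """Incentive value for a reply, reusing history when identical content exists."""
    similar = find_similar(reply, history)
    if similar is None:
        # No identical content: score the reply with the incentive mapping function.
        return incentive_fn(reply)
    _, round_index, historical_incentive = similar
    round_diff = latest_round - round_index
    # Assumed rule: discount the historical incentive by the dialogue round difference.
    return historical_incentive * decay ** round_diff


def state_value(transition_rate: float, incentive: float) -> float:
    """Assumed combination of state transition rate and state incentive value."""
    return transition_rate * incentive


history = [("hello", 1, 1.0), ("bye", 3, 0.5)]
reused = state_incentive("hello", history, 3, lambda r: 2.0, decay=0.5)
fresh = state_incentive("thanks", history, 3, lambda r: 2.0)
```

  • The same pattern would apply to the behavior value function of module 40, with the second reply content, the second state transition rate, and the third or fourth state incentive value in place of their first and second counterparts.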
  • the behavior value function determination module 40 includes:
  • a second dialogue action execution unit configured to obtain a second dialogue generation action corresponding to each dialogue state, and to execute the second dialogue generation action through the dialogue management system to obtain second reply content and a second state transition rate corresponding to each dialogue state;
  • a second dialogue content comparison unit configured to determine whether the first dialogue content and the second dialogue content contain dialogue content identical to the second reply content;
  • a third state incentive value determination unit configured to determine, according to the dialogue incentive mapping function, a third state incentive value corresponding to the second reply content when neither the first dialogue content nor the second dialogue content contains dialogue content identical to the second reply content;
  • a first behavior value function determination unit configured to determine the behavior value function corresponding to each dialogue state according to the second state transition rate and the third state incentive value corresponding to each dialogue state.
  • the behavior value function determination module 40 also includes:
  • a second similar content recording unit configured to, when the first dialogue content and the second dialogue content contain dialogue content identical to the second reply content, record the first dialogue content or second dialogue content that is identical to the second reply content as second similar content;
  • a second parameter acquisition unit configured to acquire a second historical incentive value corresponding to the second similar content, and a second dialogue round difference between the second similar content and the first dialogue content;
  • a second behavior value function determination unit configured to determine a fourth state incentive value according to the second historical incentive value and the second dialogue round difference, and to determine the behavior value function corresponding to each dialogue state according to the second state transition rate and the fourth state incentive value.
  • the system update management module 50 includes:
  • an update condition detection unit configured to determine, according to the state value function and the behavior value function corresponding to the same dialogue state, whether an update condition is met; the update condition is that the value of the behavior value function corresponding to a dialogue state is greater than or equal to the value of the state value function corresponding to the same dialogue state;
  • a first dialogue policy function update unit configured to determine that the initial dialogue policy function has an update when the update condition is met;
  • a second dialogue policy function update unit configured to determine that the initial dialogue policy function has no update when the update condition is not met.
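  • The check performed by module 50 can be sketched in a few lines of Python. The sketch follows the greater-than-or-equal comparison stated above; the dictionary representation of the value functions and the function names are assumptions made for illustration.

```python
from typing import Dict, List


def states_meeting_update_condition(state_values: Dict[str, float],
                                    behavior_values: Dict[str, float]) -> List[str]:
    """Dialogue states whose behavior value is >= the state value for that state."""
    return [s for s, v in state_values.items() if behavior_values[s] >= v]


def update_complete(state_values: Dict[str, float],
                    behavior_values: Dict[str, float]) -> bool:
    """True when no state meets the update condition, i.e. the policy has no update."""
    return not states_meeting_update_condition(state_values, behavior_values)


state_values = {"greet": 0.8, "ask_destination": 0.5}
behavior_values = {"greet": 0.6, "ask_destination": 0.7}
to_update = states_meeting_update_condition(state_values, behavior_values)
```

  • In this example only `ask_destination` meets the condition, so the policy function would be treated as having an update and the system update would not yet be considered complete.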
  • Each module in the above-mentioned dialogue management system updating device can be fully or partially realized by software, hardware, or a combination thereof.
  • the above-mentioned modules can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the corresponding operations of the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 4 .
  • the computer device includes a processor, memory, network interface and database connected by a system bus. Wherein, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a readable storage medium and an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions and a database.
  • the internal memory provides an environment for the execution of the operating system and computer readable instructions in the readable storage medium.
  • the database of the computer device is used to store the data used in the method for updating the dialog management system in the above embodiments.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection. When the computer-readable instructions are executed by the processor, a method for updating the dialog management system is realized.
  • the readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
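  • a computer device is provided, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
  • acquiring the latest round of first dialogue content in the dialogue management system and all second dialogue content preceding the first dialogue content, wherein the first dialogue content is associated with a first dialogue state and each second dialogue content is associated with a second dialogue state;
  • determining a dialogue incentive mapping function through a preset discriminant generation model according to the first dialogue content, the first dialogue state, the second dialogue state, and the second dialogue content;
  • obtaining an initial dialogue policy function and a preset system state sequence of the dialogue management system, and determining, according to the initial dialogue policy function, the preset system state sequence, and the dialogue incentive mapping function, the state value function of each dialogue state in the preset system state sequence; the state value function refers to an incentive value corresponding to the dialogue reply content generated by the initial dialogue policy function in a dialogue state;
  • determining, according to the preset system state sequence, the state value function, and the dialogue incentive mapping function, the behavior value function of each dialogue state in the preset system state sequence;
  • determining, according to the state value function and the behavior value function corresponding to the same dialogue state, whether the initial dialogue policy function has an update, and, if the initial dialogue policy function has no update, determining that the update of the dialogue management system is complete.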
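  • one or more readable storage media storing computer-readable instructions are provided, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • acquiring the latest round of first dialogue content in the dialogue management system and all second dialogue content preceding the first dialogue content, wherein the first dialogue content is associated with a first dialogue state and each second dialogue content is associated with a second dialogue state;
  • determining a dialogue incentive mapping function through a preset discriminant generation model according to the first dialogue content, the first dialogue state, the second dialogue state, and the second dialogue content;
  • obtaining an initial dialogue policy function and a preset system state sequence of the dialogue management system, and determining, according to the initial dialogue policy function, the preset system state sequence, and the dialogue incentive mapping function, the state value function of each dialogue state in the preset system state sequence; the state value function refers to an incentive value corresponding to the dialogue reply content generated by the initial dialogue policy function in a dialogue state;
  • determining, according to the preset system state sequence, the state value function, and the dialogue incentive mapping function, the behavior value function of each dialogue state in the preset system state sequence;
  • determining, according to the state value function and the behavior value function corresponding to the same dialogue state, whether the initial dialogue policy function has an update, and, if the initial dialogue policy function has no update, determining that the update of the dialogue management system is complete.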
  • Nonvolatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a method and apparatus for updating a dialogue management system, a computer device, and a storage medium. The method comprises the following steps: determining a dialogue incentive mapping function by means of a preset discriminant generation model according to first dialogue content, a first dialogue state, a second dialogue state, and second dialogue content; determining a state value function of each dialogue state in a preset system state sequence according to an initial dialogue policy function and the preset system state sequence of a dialogue management system, and the dialogue incentive mapping function; determining a behavior value function of each dialogue state in the preset system state sequence according to the preset system state sequence, the state value function, and the dialogue incentive mapping function; and determining, according to the state value function and the behavior value function corresponding to the same dialogue state, whether there is an update to the initial dialogue policy function, and, when there is no update to the initial dialogue policy function, determining that the update of the dialogue management system is complete. The present invention improves the efficiency and accuracy of updating a dialogue management system.
PCT/CN2022/072286 2021-06-07 2022-01-17 Method and apparatus for updating a dialogue management system, computer device, and storage medium WO2022257468A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110630480.XA CN113239171B (zh) 2021-06-07 2021-06-07 Dialogue management system updating method and apparatus, computer device, and storage medium
CN202110630480.X 2021-06-07

Publications (1)

Publication Number Publication Date
WO2022257468A1 (fr) 2022-12-15

Family

ID=77137003

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072286 WO2022257468A1 (fr) 2021-06-07 2022-01-17 Method and apparatus for updating a dialogue management system, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN113239171B (fr)
WO (1) WO2022257468A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239171B (zh) * 2021-06-07 2023-08-01 平安科技(深圳)有限公司 Dialogue management system updating method and apparatus, computer device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
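CN107357838A (zh) * 2017-06-23 2017-11-17 上海交通大学 Online implementation method for a dialogue strategy based on multi-task learning
US20180233143A1 (en) * 2017-02-13 2018-08-16 Kabushiki Kaisha Toshiba Dialogue system, a dialogue method and a method of adapting a dialogue system
CN109582767A (zh) * 2018-11-21 2019-04-05 北京京东尚科信息技术有限公司 Dialogue system processing method, apparatus, device, and readable storage medium
CN111159371A (zh) * 2019-12-21 2020-05-15 华南理工大学 Dialogue strategy method for task-oriented dialogue systems
CN113239171A (zh) * 2021-06-07 2021-08-10 平安科技(深圳)有限公司 Dialogue management system updating method and apparatus, computer device, and storage medium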

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2556600A (en) * 1999-03-12 2000-10-04 Christopher Nice Man-machine dialogue system and method
CN105788593B (zh) * 2016-02-29 2019-12-10 中国科学院声学研究所 Method and system for generating a dialogue strategy
US11574636B2 (en) * 2019-08-29 2023-02-07 Oracle International Corporation Task-oriented dialog suitable for a standalone device
CN110706785B (zh) * 2019-08-29 2022-03-15 合肥工业大学 Dialogue-based emotion regulation method and system
CN110942774A (zh) * 2019-12-12 2020-03-31 北京声智科技有限公司 Human-computer interaction system, and dialogue method, medium, and device thereof
CN112884130A (zh) * 2021-03-16 2021-06-01 浙江工业大学 SeqGAN-based deep reinforcement learning data augmentation defense method and apparatus

Also Published As

Publication number Publication date
CN113239171B (zh) 2023-08-01
CN113239171A (zh) 2021-08-10

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22819076; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 22819076; Country of ref document: EP; Kind code of ref document: A1