CN114519094A

CN114519094A - Method and device for conversational recommendation based on random state and electronic equipment

Info

Publication number: CN114519094A
Application number: CN202210143900.6A
Authority: CN
Inventors: 沈越
Original assignee: Ping An Puhui Enterprise Management Co Ltd
Current assignee: Ping An Puhui Enterprise Management Co Ltd
Priority date: 2022-02-16
Filing date: 2022-02-16
Publication date: 2022-05-20

Abstract

The application discloses a method and a device for conversational script recommendation based on a random state, and electronic equipment. The method comprises the following steps: performing text conversion processing on the current voice information of a user to obtain a first sentence corresponding to the voice information; querying historical dialogue data according to the first sentence to obtain a second sentence, wherein the occurrence time of the second sentence is earlier than that of the first sentence, and the absolute value of the difference between the two occurrence times is minimum; extracting the intention of the second sentence to obtain a first intention characteristic; generating a state tracking label of the first sentence according to the first intention characteristic; inputting the first sentence, the state tracking label and at least one script into a scoring model to obtain at least one first score; multiplying each of the at least one first score by a random function to obtain at least one second score; and recommending the script corresponding to the largest of the at least one second score to the answering device.

Description

Method and device for conversational recommendation based on random state and electronic equipment

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a conversational recommendation method and device based on a random state and electronic equipment.

Background

At present, traditional conversational recommendation models basically recommend from a TOP list produced by collaborative filtering or a ranking model, and in most scenarios this is sufficient to meet the requirements. However, in a scenario involving conversation with a person, the collaborative filtering or ranking model used by the conventional method extracts the high-frequency scripts corresponding to high-frequency intentions for the reply. As a result, each time the client asks a question, even within the same conversation, the reply is always the same high-frequency script as long as the intentions are similar. This creates a vicious circle: high-frequency scripts are always recommended, so their frequency rises further, their weight in subsequent retrieval increases, and they become still more likely to be selected. The whole conversation is rigid and lacks interest and novelty, and the user experience is poor.

Disclosure of Invention

In order to solve the problems in the prior art, embodiments of the present application provide a method, an apparatus, and an electronic device for conversational recommendation based on a random state, which can improve interest and creativity in a conversation scene with a person, and thus improve user experience.

In a first aspect, an embodiment of the present application provides a method for conversational recommendation based on a random state, including:

performing text conversion processing on the current voice information of the user to obtain a first sentence corresponding to the voice information;

querying historical dialogue data according to the first statement to obtain a second statement, wherein the historical dialogue data is used for recording dialogue data generated before the current moment of a dialogue event to which the first statement belongs, the occurrence time of the second statement is earlier than that of the first statement, and the absolute value of the difference between the occurrence time of the second statement and the occurrence time of the first statement is minimum;

extracting the intention of the second sentence to obtain a first intention characteristic;

generating a state tracking label of the first statement according to the first intention characteristic, wherein the state tracking label is used for identifying the intention direction and the demand strength of the user when the user speaks the voice information;

inputting the first sentence, the state tracking label and at least one script into a scoring model to obtain at least one first score, wherein the scoring model is used for scoring each of the at least one script, and the at least one first score is in one-to-one correspondence with the at least one script;

multiplying each first score in the at least one first score by a random function to obtain at least one second score, wherein the at least one second score is in one-to-one correspondence with the at least one first score;

recommending the script corresponding to the maximum second score in the at least one second score to the answering device, so that the answering device generates an answering sentence according to the recommended script to answer the current voice information of the user.

In a second aspect, an embodiment of the present application provides a random state-based conversational recommendation device, including:

the analysis module is used for performing text conversion processing on the current voice information of the user to obtain a first sentence corresponding to the voice information;

the query module is used for querying historical dialogue data according to the first statement to obtain a second statement, wherein the historical dialogue data is used for recording dialogue data generated before the current moment of a dialogue event to which the first statement belongs, the occurrence time of the second statement is earlier than that of the first statement, and the absolute value of the difference between the occurrence time of the second statement and the occurrence time of the first statement is minimum;

the processing module is used for carrying out intention extraction on the second sentence to obtain a first intention characteristic, and generating a state tracking label of the first sentence according to the first intention characteristic, wherein the state tracking label is used for identifying the own intention direction and the demand intensity of the user when the user speaks the voice information;

the scoring module is used for inputting the first sentence, the state tracking label and at least one script into a scoring model to obtain at least one first score, wherein the scoring model is used for scoring each of the at least one script and the at least one first score is in one-to-one correspondence with the at least one script, and for multiplying each of the at least one first score by a random function to obtain at least one second score, wherein the at least one second score is in one-to-one correspondence with the at least one first score;

and the recommending module is used for recommending the script corresponding to the maximum second score in the at least one second score to the answering device, so that the answering device generates an answering sentence according to the recommended script to answer the current voice information of the user.

In a third aspect, an embodiment of the present application provides an electronic device, including: a processor coupled to the memory, the memory for storing a computer program, the processor for executing the computer program stored in the memory to cause the electronic device to perform the method of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon, the computer program causing a computer to perform the method according to the first aspect.

In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first aspect.

The implementation of the embodiment of the application has the following beneficial effects:

In the embodiments of the present application, the intention of the sentence (the second sentence) that precedes the sentence currently spoken by the user (the first sentence) in the current dialogue is analyzed and extracted, and the state tracking label of the first sentence is generated according to the intention of the second sentence. The at least one candidate script is then scored using the first sentence and the state tracking label, the scoring result is perturbed by a random function, and the script with the highest perturbed score is selected for recommendation. Because of the state tracking label, the inputs to each script recommendation carry the intention of the previous turn in addition to the current sentence, which makes the reply process more proactive. Marking different state tracking labels can also steer the direction of the recommendation, which introduces logical judgment into the model and makes its output interpretable. Meanwhile, the random function applies a random probabilistic perturbation to the model output, which can make the replies more interesting and creative and improve the customer experience.
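The perturb-then-select step can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function name `recommend_script`, the `random_fn` parameter, and the multiplier range of roughly 0.5 to 1.5 are all assumptions made for illustration, since the source does not specify the distribution of the random function.

```python
import random
from typing import Callable, Optional, Sequence


def recommend_script(
    scripts: Sequence[str],
    first_scores: Sequence[float],
    random_fn: Optional[Callable[[], float]] = None,
) -> str:
    """Perturb each first score with a random factor and return the top script."""
    if not scripts or len(scripts) != len(first_scores):
        raise ValueError("scripts and first_scores must be non-empty and aligned")
    # Assumption: the random function yields a multiplier between 0.5 and 1.5.
    # The source does not state its distribution or range.
    if random_fn is None:
        random_fn = lambda: random.uniform(0.5, 1.5)
    # One second score per first score, kept in the same order.
    second_scores = [score * random_fn() for score in first_scores]
    best_index = max(range(len(scripts)), key=lambda i: second_scores[i])
    return scripts[best_index]
```

With a constant multiplier the function reduces to a plain argmax over the first scores; with a varying multiplier, a script that is close behind the leader can occasionally be selected, which is the variety the perturbation is meant to introduce.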

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic hardware structure diagram of a conversational recommendation device based on a random state according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of a method for conversational recommendation based on a random state according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of a method for performing text conversion processing on current voice information of a user to obtain a first sentence corresponding to the voice information according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of a method for determining the dialect category of voice information according to acoustic features according to an embodiment of the present application;

Fig. 5 is a block diagram of the functional modules of a conversational recommendation device based on a random state according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application are within the scope of protection of the present application.

The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

First, referring to Fig. 1, Fig. 1 is a schematic hardware structure diagram of a random state-based conversational recommendation device according to an embodiment of the present application. The random state-based conversational recommendation device 100 includes at least one processor 101, a communication line 102, a memory 103, and at least one communication interface 104.

In this embodiment, the processor 101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the present application.

The communication line 102 may include a pathway that conveys information between the aforementioned components.

The communication interface 104 may be any transceiver or other device (e.g., an antenna, etc.) for communicating with other devices or communication networks, such as an ethernet, RAN, Wireless Local Area Network (WLAN), etc.

The memory 103 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

In this embodiment, the memory 103 may be independent and connected to the processor 101 through the communication line 102. The memory 103 may also be integrated with the processor 101. The memory 103 provided in the embodiments of the present application may generally have a nonvolatile property. The memory 103 is used for storing computer-executable instructions for executing the scheme of the application, and is controlled by the processor 101 to execute. The processor 101 is configured to execute computer-executable instructions stored in the memory 103, thereby implementing the methods provided in the embodiments of the present application described below.

In alternative embodiments, computer-executable instructions may also be referred to as application code, which is not specifically limited in this application.

In alternative embodiments, processor 101 may include one or more CPUs, such as CPU0 and CPU1 of FIG. 1.

In an alternative embodiment, the random state based dialog recommendation device 100 may include a plurality of processors, such as processor 101 and processor 107 in FIG. 1. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).

In an alternative embodiment, the random state-based conversational recommendation device 100 may be, for example, an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. The random state-based conversational recommendation device 100 may further include an output device 105 and an input device 106. The output device 105 is in communication with the processor 101 and may display information in a variety of ways. For example, the output device 105 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like. The input device 106 is in communication with the processor 101 and may receive user input in a variety of ways. For example, the input device 106 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.

The random state-based conversational recommendation device 100 may be a general-purpose device or a special-purpose device. The embodiments of the present application do not limit the type of the random state-based conversational recommendation device 100.

Next, it should be noted that the embodiments disclosed in the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain the best result.

The artificial intelligence base technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

Finally, the random state-based conversational recommendation method can be applied to scenarios such as e-commerce sales, offline physical-store sales, service promotion, AI outbound calls and social platform promotion. The present application describes the method mainly by taking an AI outbound call scenario as an example; its implementation in other scenarios is similar and is not repeated here.

Hereinafter, the random state-based conversational recommendation method disclosed in the present application will be described:

Referring to Fig. 2, Fig. 2 is a schematic flowchart of a random state-based conversational recommendation method according to an embodiment of the present application. The method comprises the following steps:

201: Perform text conversion processing on the current voice information of the user to obtain a first sentence corresponding to the voice information.

China's territory is vast and its terrain varied, including many complex landforms that are difficult to cross. In periods when transportation was underdeveloped, such terrain made communication between regions difficult, so a variety of social, historical and cultural environments developed, and with them a variety of dialects.

In this embodiment, the voice information may be voice input by the user through an audio capture device. In an AI outbound call scenario, for example, it may be a sentence that the user speaks into the audio capture device of a communication device to reply to, or ask about, the speech produced by the AI. Given the diversity and widespread use of dialects in China, practical applications must take into account that users may communicate in a dialect.

Based on this, the present embodiment provides a method for performing text conversion processing on current voice information of a user to obtain a first sentence corresponding to the voice information, as shown in fig. 3, the method includes:

301: Obtain acoustic features of the voice information.

In this embodiment, an acoustic model may be trained in advance, for example a multilayer long short-term memory (LSTM) network or a multilayer convolutional neural network. The speech to be recognized is then input into the acoustic model to extract its acoustic features. For example, the acoustic features may include a feature sequence of the speech to be recognized, a posterior probability distribution of the phonemes in the speech to be recognized, and an acoustic vector of the speech to be recognized.

Specifically, the output of the lower network in the acoustic model may be used as the feature sequence of the speech to be recognized, and the output of the higher network may be used as the acoustic vector of the speech to be recognized. The posterior probability distribution of the phonemes in the speech to be recognized refers to the probability that each phoneme in the speech to be recognized is recognized as a different phoneme.

302: Determine the dialect category of the voice information according to the acoustic features.

In this embodiment, a method for determining dialect categories of voice information according to acoustic features is provided, and as shown in fig. 4, the method includes:

401: Determine the energy distribution, prosody distribution, fundamental frequency and average speech power of the voice information according to the acoustic features.

Specifically, the posterior probability distribution and the acoustic vectors of phonemes in the acoustic features can be analyzed to obtain the energy distribution and the prosody distribution of the voice information; analyzing the characteristic sequence in the acoustic characteristics to obtain the average speech power of the speech information; and the fundamental frequency can be obtained by analyzing the pronunciation characteristics of the voice information.

402: Determine the dialect region corresponding to the voice information according to the fundamental frequency and the average speech power.

Specifically, although a dialect is used only within a certain region, it is itself a complete system. Every dialect has its own phonetic, lexical and grammatical structure and can meet the social communication needs of its region. Dialects within the same dialect region tend to show "differences within similarity, and similarity within difference". The "similarity" is usually reflected in low-level features of the sound, such as pronunciation frequency and speech rate. In other words, the fundamental frequency and the average speech power of pronunciation show a certain regional commonality among dialects.

Therefore, in this embodiment, the fundamental frequency feature and the average speech power feature of the dialects of each dialect region may be extracted and stored in advance. The dialect region corresponding to the voice information is then determined by matching the fundamental frequency and average speech power of the voice information against the pre-collected features of each dialect region, for example by calculating a similarity or a Euclidean distance.
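The Euclidean-distance variant of this matching can be sketched in Python as follows. The table `REGION_FEATURES`, its region names and its numeric values are invented for illustration; the source does not give any reference values or units.

```python
import math
from typing import Dict, Tuple

# Hypothetical pre-collected reference features per dialect region:
# (fundamental frequency, average speech power). All values are made up.
REGION_FEATURES: Dict[str, Tuple[float, float]] = {
    "region_a": (180.0, 0.60),
    "region_b": (220.0, 0.45),
    "region_c": (140.0, 0.70),
}


def match_dialect_region(
    f0: float,
    power: float,
    regions: Dict[str, Tuple[float, float]] = REGION_FEATURES,
) -> str:
    """Return the region whose stored features are nearest in Euclidean distance."""
    return min(regions, key=lambda name: math.dist((f0, power), regions[name]))
```

In this toy table the fundamental frequency dominates the distance because its numeric range is far larger than that of the power value; a practical system would presumably normalize both features first, which the source does not discuss.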

403: Encode the energy distribution and the prosody distribution separately to obtain an energy distribution vector and a prosody distribution vector.

In this embodiment, the energy distribution may indicate changes in loudness in the speech information, and the prosody distribution may indicate changes in pitch in the speech information. Specifically, an energy distribution vector may be obtained by acquiring an energy spectrum of the speech information and performing a map embedding process on the energy spectrum. Similarly, the prosody distribution vector can be obtained by obtaining the frequency distribution map of the voice information and then performing map embedding processing on the frequency distribution map.

404: Vertically concatenate the energy distribution vector and the prosody distribution vector to obtain a timbre distribution vector.

In an alternative embodiment, the energy distribution vector and the prosody distribution vector may be summed and then averaged, and the obtained average vector may be used as the timbre distribution vector.
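The two ways of forming the timbre distribution vector can be compared in a short Python sketch. It assumes that both inputs are one-dimensional vectors and that the splicing in step 404 means joining them end to end; the function names are hypothetical.

```python
from typing import List, Sequence


def concat_timbre(energy: Sequence[float], prosody: Sequence[float]) -> List[float]:
    """Step 404 (as assumed here): join the two vectors end to end."""
    return list(energy) + list(prosody)


def mean_timbre(energy: Sequence[float], prosody: Sequence[float]) -> List[float]:
    """Alternative embodiment: element-wise average of the two vectors."""
    if len(energy) != len(prosody):
        raise ValueError("vectors must have the same length to be averaged")
    return [(e + p) / 2 for e, p in zip(energy, prosody)]
```

Concatenation keeps both vectors intact at the cost of doubling the dimension, while averaging keeps the dimension fixed but requires the two vectors to have equal length.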

405: Match the timbre distribution vector against the timbre library corresponding to the dialect region, and determine the dialect category of the voice information according to the matching result.

Specifically, as noted in step 402, dialects in the same dialect region show "differences within similarity, and similarity within difference". The "difference" is usually reflected in mid- and high-level features of the sound, such as tonal transitions and trailing tones.

Based on this, in the present embodiment, the timbre distribution features of each dialect in each dialect region may be extracted and stored in advance. The dialect category corresponding to the voice information is then determined by matching the timbre distribution vector of the voice information against the pre-collected timbre distribution features of each dialect in the corresponding dialect region, for example by calculating a similarity or a Euclidean distance.
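A similarity-based version of this second matching stage might look like the following Python sketch. Cosine similarity is chosen here only as one common similarity measure; the source does not say which measure is used, and the function names and the shape of the library (a mapping from dialect name to a stored vector) are assumptions.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_dialect(timbre: Sequence[float], library: Dict[str, Sequence[float]]) -> str:
    """Return the dialect in the region's library most similar to the timbre vector."""
    return max(library, key=lambda name: cosine_similarity(timbre, library[name]))
```

Because the library passed in is the one belonging to the region found in step 402, the search only compares against dialects of that region rather than against every known dialect.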

303: Acquire the audio transposition formula corresponding to the dialect category, and convert the voice information into standard speech through the audio transposition formula.

In this embodiment, the audio transposition formula identifies the conversion features between the pronunciation of the corresponding dialect and Mandarin pronunciation. Specifically, with the audio transposition formula, dialect speech can be converted into the corresponding Mandarin speech, i.e., the standard speech referred to in the present application.

In this embodiment, the differences and regularities between each dialect and Mandarin can be learned through training on a large number of texts that have the same content but are in different dialects, for example differences and regularities in pronunciation and in tone, and the correspondences between dialect-specific vocabulary. These then form the audio transposition formulas that convert each dialect into Mandarin.

304: Acquire the pinyin text of the voice information according to the standard speech.

In this embodiment, feature extraction may be performed on the standard speech, for example by spectrum conversion, nonlinear spectrum conversion and feature coefficient conversion, to obtain the corresponding audio features. The audio feature may be a scale map of the standard speech on an auditory critical-band scale, such as its mapping in the Bark domain or in the Equivalent Rectangular Bandwidth (ERB) domain. These audio features give a quantitative representation of the standard speech.

Then, the audio features can be matched in a preset neural network to obtain a pinyin text matching the audio features. Specifically, the pinyin text may be composed of at least one first pinyin element text, where a first pinyin element text is either an initial or a final.

305: Match the pinyin text in a preset vocabulary library to obtain the first sentence.

Specifically, after the pinyin text is obtained, matching can be performed in a preset vocabulary library according to each first pinyin element text in at least one first pinyin element text in the pinyin text to obtain at least one first character in one-to-one correspondence with the at least one first pinyin element text. Then, the at least one first character is arranged according to the arrangement sequence of the at least one first pinyin element text in the pinyin text according to the corresponding relation between the at least one first character and the at least one first pinyin element text, and a first sentence can be obtained.

202: Query historical dialogue data according to the first sentence to obtain a second sentence.

In the present embodiment, the historical dialogue data is used to record dialogue data generated before the current time of a dialogue event to which the first sentence belongs, the occurrence time of the second sentence is earlier than the occurrence time of the first sentence, and the absolute value of the difference between the occurrence time of the second sentence and the occurrence time of the first sentence is minimum. In brief, the second sentence is the last sentence the user said before the current sentence.

Specifically, taking an AI outbound call scenario as an example, two interrelated sentence queues may be stored in the historical dialogue data: one queue stores the user sentences spoken by the user, and the other stores the AI sentences issued by the AI. Each user sentence in the user sentence queue and each AI sentence in the AI sentence queue carries a dialogue identifier and a dialogue occurrence time. A user sentence and an AI sentence with the same dialogue identifier form a question-and-answer pair; that is, either the user sentence is a reply to the AI sentence, or the AI sentence is a reply to the user sentence. This preserves the question-and-answer logic of the historical dialogue data, while storing user and AI sentences separately makes searching easier.

Therefore, in the present embodiment, by consulting the user sentence queue, the sentence whose dialogue occurrence time is earlier than that of the first sentence, and for which the absolute value of the difference from the occurrence time of the first sentence is smallest, can be determined as the second sentence.

In an alternative embodiment, the historical dialog of the user may be stored according to the sequence of the dialog occurrence time, and at this time, only the sentence preceding the first sentence needs to be determined in the user sentence queue, so that the second sentence may be obtained.
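The lookup of the second sentence can be sketched in Python as below. The `Utterance` record and its field names are hypothetical; the source only states that each stored sentence carries a dialogue identifier and an occurrence time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    dialog_id: int      # shared by a user sentence and its paired AI sentence
    occurred_at: float  # dialogue occurrence time, e.g. seconds since call start
    text: str


def find_second_sentence(
    user_queue: List[Utterance], first_time: float
) -> Optional[Utterance]:
    """Return the user sentence closest before first_time, or None if none exists."""
    earlier = [u for u in user_queue if u.occurred_at < first_time]
    # Among earlier sentences, the latest one minimizes |t_second - t_first|.
    return max(earlier, key=lambda u: u.occurred_at, default=None)
```

When the queue is already stored in order of occurrence time, as in the alternative embodiment, the same result is simply the element immediately before the first sentence, and the scan is unnecessary. The `None` return corresponds to the case where the first sentence opens the dialogue.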

203: and extracting the will of the second sentence to obtain a first will characteristic.

In this embodiment, semantic extraction may be performed on the second sentence, and then matching may be performed in a preset will library according to the obtained semantic vector, so as to determine the first will feature of the second sentence. Specifically, the willingness characteristics pre-stored in the willingness library are set according to the applicable field, for example, for an AI call-out scene, the willingness characteristics can be pre-set to include: the interest, the amount, the repayment time, the dividend proportion and other strongly related willingness characteristics. Therefore, when matching is carried out, the intention characteristic most similar to the semantic vector of the second sentence can be matched as the first intention characteristic in a mode of calculating the similarity.

204: and generating a state tracking label of the first sentence according to the first intention characteristic.

In this embodiment, the state tracking tag is used to identify the willingness direction and demand intensity of the user when speaking the current voice message, and may be used to determine the interest degree and direction of the user in the service visited or recommended by the user during the current outbound call, so as to guide the subsequent reply selection, and improve the call completion rate of the current outbound call.

Specifically, the demand strength of the user in the previous turn can be determined by acquiring a first demand score corresponding to the second sentence. The intention of the first sentence is then extracted, and the first demand score is updated according to the resulting second intention characteristic to obtain a second demand score corresponding to the first sentence, which reflects the current intention strength of the user. Finally, the second demand score and the first intention characteristic are combined to obtain a state tracking label that reflects both the direction and the strength of the user's intention, for example: "antecedent_first intention characteristic_second demand score".

Illustratively, when the first sentence is the opening sentence of the dialog, no previous sentence (second sentence) exists. The second sentence may therefore be regarded as null, and its intention is null as well. In this case, the first demand score corresponding to the second sentence may be determined by analyzing the user's historical business data, according to the user's deal-closing rate in that data; assume it is 6. The state tracking label of the first sentence is then "antecedent_0_6". When the first sentence is not the opening sentence, a previous sentence necessarily exists, and the first demand score corresponding to the previous sentence (the second sentence) can be determined by analyzing its semantics and emotion. If the intention characteristic of the previous sentence is "how much amount" and the corresponding second demand score is 8, the state tracking label of the first sentence is "antecedent_how much amount_8".
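
The label format in the two examples above can be sketched as follows. The function name `build_state_tracking_label` is hypothetical, and writing a null intention as "0" follows the opening-sentence example.

```python
from typing import Optional


def build_state_tracking_label(first_intention: Optional[str], demand_score: int) -> str:
    """Combine the previous sentence's intention and the demand score into one label."""
    # A missing previous sentence (null intention) is written as "0".
    intention_part = first_intention if first_intention else "0"
    return f"antecedent_{intention_part}_{demand_score}"
```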

Meanwhile, this embodiment provides a method for updating the first demand score according to the second intention characteristic to obtain the second demand score corresponding to the first sentence. Specifically, the second intention characteristic may be encoded to obtain a first character string. Meanwhile, the call parameters of the call to which the first sentence belongs are acquired, and the missing values in the call parameters are completed to obtain target parameters. The target parameters, the first demand score and the first character string are then spliced to obtain a second character string. Finally, the second character string is input into a logistic regression model to obtain the second demand score.
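
The update of the demand score can be sketched as follows, under several loudly stated assumptions: the spliced "character string" is represented as a numeric feature vector, the intention is one-hot encoded over the example intention library, the logistic regression weights and bias are supplied by the caller (no trained values are given in the text), and the output probability is scaled to a 0 to 10 score to match the example scores 6 and 8. All names are hypothetical.

```python
import math
from typing import List, Sequence

# Example intention library taken from the outbound-call scenario in the text.
INTENTIONS = ("interest", "amount", "repayment time", "dividend proportion")


def encode_intention(intention: str) -> List[float]:
    """One-hot encode the second intention characteristic (assumed encoding)."""
    return [1.0 if intention == name else 0.0 for name in INTENTIONS]


def second_demand_score(
    target_params: Sequence[float],
    first_score: float,
    second_intention: str,
    weights: Sequence[float],
    bias: float,
) -> int:
    """Splice the inputs into one feature vector and apply a logistic model."""
    # Splice: target parameters + first demand score + encoded intention.
    features = list(target_params) + [first_score] + encode_intention(second_intention)
    if len(features) != len(weights):
        raise ValueError("weights must match the spliced feature length")
    z = bias + sum(w * f for w, f in zip(weights, features))
    probability = 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function
    return round(10 * probability)  # assumed 0-10 scale
```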

In this embodiment, a missing value may be completed in any of the following ways: replacing the missing value with a preset replacement value, for example 999, to obtain the target parameter; or acquiring, according to the data type of the missing value, a historical data set corresponding to that data type, and replacing the missing value with the median of the at least one data value included in the historical data set to obtain the target parameter; or replacing the missing value with the mean of the at least one data value included in the historical data set to obtain the target parameter.
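
The three completion strategies can be sketched as follows. The function name, the dictionary layout of the call parameters and the parameter names used in the usage note are hypothetical; falling back to the preset value when no historical data exists is also an assumption.

```python
import statistics
from typing import Dict, Optional, Sequence

PRESET_REPLACEMENT = 999  # preset replacement value from the example in the text


def complete_missing(
    call_params: Dict[str, Optional[float]],
    history: Dict[str, Sequence[float]],
    strategy: str = "preset",
) -> Dict[str, float]:
    """Fill None entries using the preset value, the historical median, or the mean."""
    target: Dict[str, float] = {}
    for name, value in call_params.items():
        if value is not None:
            target[name] = value
        elif strategy == "median" and history.get(name):
            target[name] = statistics.median(history[name])
        elif strategy == "mean" and history.get(name):
            target[name] = statistics.fmean(history[name])
        else:
            # Preset strategy, or no historical data for this parameter (assumed fallback).
            target[name] = PRESET_REPLACEMENT
    return target
```

For instance, with historical values 1, 2 and 9 for a missing parameter, the median strategy fills in 2 and the mean strategy fills in 4.0.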

In an alternative embodiment, the first intention characteristic may also be used directly as the state tracking label of the first sentence; that is, the intention characteristic of the previous sentence serves as the state tracking label of the current sentence. Illustratively, when the first sentence is the opening sentence of the dialog, no previous sentence exists, so the state tracking label of the first sentence is "antecedent_0", indicating that the previous sentence is empty. When the first sentence is not the opening sentence, a previous sentence necessarily exists, and its intention characteristic can be used as the state tracking label of the first sentence. For example, if the intention characteristic of the previous sentence is "how much amount", the state tracking label of the first sentence is "antecedent_how much amount".

205: and inputting the first sentence, the state tracking label and the at least one dialect into a scoring model to obtain at least one first score in one-to-one correspondence with the at least one dialect.

In this embodiment, the scoring model is used to score each of the at least one dialect, and the rankingbert model may be selected as the scoring model. Specifically, the first sentence is analyzed, and the obtained analysis result and the state tracking tag are arranged in the following manner to obtain input data:

[cls, customer intention, sep, dialect, sep] + [whether state tracking is present] + [state tracking label]

Here, "customer intention" is the intention characteristic of the first sentence; it is extracted in the same way as the intention characteristic of the second sentence in step 203, which is not repeated here. "Dialect" is the dialect type to which the first sentence belongs.

Therefore, compared with the conventional input sequence "cls, customer intention, sep, dialect, sep" of the rankingbert model, this embodiment appends the state tracking label to the input sequence. This introduces logical judgment into the model, so that the results output by the model are logically interpretable.
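
The arrangement of the input data can be sketched as follows. This only assembles the sequence as a list of string fields; how the scoring model tokenizes it is not described in the text. The function name, the "[CLS]"/"[SEP]" spellings and the "1"/"0" encoding of the presence flag are assumptions.

```python
from typing import List, Optional


def build_model_input(
    customer_intention: str, dialect: str, state_label: Optional[str]
) -> List[str]:
    """Arrange the fields as [cls, intention, sep, dialect, sep] + [flag] + [label]."""
    has_state = "1" if state_label else "0"  # assumed encoding of the presence flag
    return [
        "[CLS]", customer_intention, "[SEP]", dialect, "[SEP]",
        has_state,
        state_label or "",
    ]
```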

206: and multiplying each first score in the at least one first score by a random function to obtain at least one second score in one-to-one correspondence with the at least one first score.

In this embodiment, a random function p(x) is provided, as shown in formula ①:

wherein x represents the dialect type to which the first sentence belongs, C_x represents the number of successful deals closed with that dialect type in the historical dialog data, S_x represents the number of dialogs in the historical dialog data that used that dialect type, a represents a weight coefficient, random([1, n_x]) is a random number associated with n_x, and n_x is the number of available dialects matched by the intention characteristic of the first sentence.

Therefore, the first score, the second score and the random function satisfy formula ②:

y_i = z_i × p(x)    ②

wherein y_i represents the i-th second score of the at least one second score, z_i represents the i-th first score of the at least one first score, and i is an integer greater than or equal to 1.

207: and recommending the dialect corresponding to the maximum second score in the at least one second score to the answering device as a recommended dialect so that the answering device generates an answering sentence according to the recommended dialect to answer the current voice information of the user.

In this embodiment, the five largest first scores may be selected from the at least one first score and each multiplied by the random function, and the dialect corresponding to the largest resulting score is then selected for recommendation. This reduces the amount of calculation and improves recommendation efficiency.
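
The selection of the five largest first scores followed by random perturbation can be sketched as follows. The random function itself is not reproduced in this text, so it is passed in as a caller-supplied callable `p` that returns one multiplier per call; that interface and the function name `recommend` are assumptions.

```python
from typing import Callable, Dict, Tuple


def recommend(
    first_scores: Dict[str, float],
    p: Callable[[], float],
    top_k: int = 5,
) -> Tuple[str, float]:
    """Keep the top_k first scores, perturb each as y_i = z_i * p, return the best."""
    # Sort candidates by first score, largest first, and keep only top_k of them.
    top = sorted(first_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Each second score is the first score multiplied by a fresh value of p.
    second_scores = {dialect: z * p() for dialect, z in top}
    best = max(second_scores, key=second_scores.get)
    return best, second_scores[best]
```

With a constant multiplier of 1.0 the result is simply the dialect with the highest first score; with a genuinely random multiplier, a lower-ranked candidate among the top five can occasionally win, which is the perturbation effect the embodiment aims for.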

In summary, in the method for dialect recommendation based on a random state provided by the present invention, the intention of the sentence (second sentence) preceding the sentence currently spoken by the user (first sentence) in the current dialog is analyzed and extracted, and a state tracking label of the first sentence is generated according to the intention of the second sentence. The at least one adapted dialect is then scored using the first sentence and the state tracking label, the scoring result is perturbed by a random function, and the dialect corresponding to the highest score in the perturbed result is selected for recommendation. Thanks to the state tracking label, each dialect recommendation takes into account not only the current dialog but also the intention of the dialog at the previous moment, which makes the reply process more proactive. Marking different state tracking labels can also steer the direction of the dialect recommendation, which introduces logical judgment into the model and makes its output logically interpretable. Meanwhile, the random function applies a random probability perturbation to the output of the model, which can make the question answering more interesting and creative and improve the customer experience.

Referring to fig. 5, fig. 5 is a block diagram illustrating the functional modules of a dialect recommendation device based on a random state according to an embodiment of the present disclosure. As shown in fig. 5, the random state based dialect recommendation device 500 includes:

the analysis module 501 is configured to perform text conversion processing on the current voice information of the user to obtain a first sentence corresponding to the voice information;

the query module 502 is configured to query historical dialogue data according to a first statement to obtain a second statement, where the historical dialogue data is used to record dialogue data generated before a current time by a dialogue event to which the first statement belongs, an occurrence time of the second statement is earlier than an occurrence time of the first statement, and an absolute value of a difference between the occurrence time of the second statement and the occurrence time of the first statement is minimum;

the processing module 503 is configured to perform intent extraction on the second sentence to obtain a first intent feature, and generate a state tracking label of the first sentence according to the first intent feature, where the state tracking label is used to identify an intent direction and a demand intensity of the user when the user speaks the voice information;

a scoring module 504, configured to input the first sentence, the state tracking tag, and the at least one utterance into a scoring model to obtain at least one first score, where the scoring model is configured to score each of the at least one utterance, and multiply each of the at least one first score by a random function to obtain at least one second score, where the at least one second score is in one-to-one correspondence with the at least one first score;

a recommending module 505, configured to recommend the dialect corresponding to the largest second score in the at least one second score to the replying device as a recommended dialect, so that the replying device generates a reply sentence according to the recommended dialect to reply to the current voice information of the user.

In an embodiment of the present invention, in terms of generating a state tracking label of a first statement according to a first will characteristic, the processing module 503 is specifically configured to:

acquiring a first demand score corresponding to the second statement;

extracting the intention of the first sentence to obtain a second intention characteristic;

updating the first demand score according to the second will characteristics to obtain a second demand score corresponding to the first statement;

and combining the second demand score and the first will characteristic to obtain a state tracking label.

In an embodiment of the present invention, in terms of updating the first requirement score according to the second intention characteristic to obtain a second requirement score corresponding to the first statement, the processing module 503 is specifically configured to:

coding the second will characteristics to obtain a first character string;

acquiring call parameters of a call to which the first statement belongs, and completing missing values in the call parameters to obtain target parameters;

Splicing the target parameter, the first demand score and the first character string to obtain a second character string;

and inputting the second character string into the logistic regression model to obtain a second demand score.

In an embodiment of the present invention, in completing missing values in call parameters to obtain target parameters, the processing module 503 is specifically configured to:

replacing the missing value with a preset replacing value to obtain a target parameter;

or acquiring a historical data set corresponding to the data type according to the data type of the missing value, and replacing the missing value with a median of at least one data value included in the historical data set to obtain a target parameter;

or replacing the missing value with the mean value of at least one data value included in the historical data set to obtain the target parameter.

In the embodiment of the invention, the random function is shown as formula ③:

wherein x represents the dialect type to which the first sentence belongs, C_x represents the number of successful deals closed with that dialect type in the historical dialog data, S_x represents the number of dialogs in the historical dialog data that used that dialect type, a represents a weight coefficient, random([1, n_x]) is a random number associated with n_x, and n_x is the number of available dialects matched by the intention characteristic of the first sentence.

In the embodiment of the present invention, in terms of performing text conversion processing on the current voice information of the user to obtain the first sentence corresponding to the voice information, the analysis module 501 is specifically configured to:

acquiring acoustic features of voice information;

determining dialect types of the voice information according to the acoustic characteristics;

acquiring an audio transposition formula corresponding to the dialect type, and converting the voice information into standard voice through the audio transposition formula, wherein the audio transposition formula is used for identifying the conversion characteristics between corresponding dialect pronunciation and mandarin pronunciation;

acquiring a pinyin text of voice information according to standard voice;

and matching in a preset vocabulary library according to the Pinyin text to obtain a first sentence.

In an embodiment of the present invention, in terms of determining a dialect category of the voice information according to the acoustic features, the analysis module 501 is specifically configured to:

determining the energy distribution, rhythm distribution, fundamental frequency and average voice power of the voice information according to the acoustic characteristics;

determining a dialect area corresponding to the voice information according to the fundamental frequency and the average voice power;

respectively coding the energy distribution and the rhythm distribution to obtain an energy distribution vector and a rhythm distribution vector;

Longitudinally splicing the energy distribution vector and the rhythm distribution vector to obtain a tone distribution vector;

and matching in a tone library corresponding to the dialect area according to the tone distribution vector, and determining the dialect category of the voice information according to a matching result.
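
The last three steps of this list (encoding, splicing and matching) can be sketched as follows. The text does not specify how the vectors are spliced or matched, so end-to-end concatenation and nearest-neighbor matching by Euclidean distance are assumptions; the function names and the library entries "dialect_a" and "dialect_b" in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def tone_vector(energy_vec: Sequence[float], rhythm_vec: Sequence[float]) -> List[float]:
    """Splice the energy and rhythm distribution vectors into one tone vector."""
    return list(energy_vec) + list(rhythm_vec)


def match_dialect_category(
    tone_vec: Sequence[float], tone_library: Dict[str, Sequence[float]]
) -> str:
    """Return the library entry whose reference vector is closest to tone_vec."""
    # Every reference vector must have the same length as tone_vec.
    return min(tone_library, key=lambda name: math.dist(tone_vec, tone_library[name]))
```

For example, given a tone library for one dialect area with entries "dialect_a" = [1, 0, 1, 0] and "dialect_b" = [0, 1, 0, 1], the spliced vector [0.9, 0.1, 0.8, 0.2] is matched to "dialect_a".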

Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 6, the electronic device 600 includes a transceiver 601, a processor 602 and a memory 603, which are connected to each other by a bus 604. The memory 603 is used to store computer programs and data, and can transfer the data it stores to the processor 602.

The processor 602 is configured to read the computer program in the memory 603 to perform the following operations:

performing text conversion processing on the current voice information of the user to obtain a first sentence corresponding to the voice information;

querying historical dialogue data according to the first statement to obtain a second statement, wherein the historical dialogue data is used for recording dialogue data generated before the current moment by a dialogue event to which the first statement belongs, the occurrence time of the second statement is earlier than that of the first statement, and the absolute value of the difference between the occurrence time of the second statement and that of the first statement is minimum;

Extracting the intention of the second sentence to obtain a first intention characteristic;

generating a state tracking label of the first statement according to the first intention characteristic, wherein the state tracking label is used for identifying the intention direction and demand strength of the user when the user speaks the voice information;

inputting the first sentence, the state tracking tag and at least one dialect into a scoring model to obtain at least one first score, wherein the scoring model is used for scoring each of the at least one dialect, and the at least one first score is in one-to-one correspondence with the at least one dialect;

multiplying each first score in the at least one first score by a random function to obtain at least one second score, wherein the at least one second score is in one-to-one correspondence with the at least one first score;

recommending the dialect corresponding to the maximum second score in the at least one second score as the recommended dialect to the answering device, so that the answering device generates an answering sentence according to the recommended dialect to answer the current voice information of the user.

In an embodiment of the present invention, in generating a state tracking label of a first statement according to a first will characteristic, the processor 602 is specifically configured to perform the following operations:

Acquiring a first demand score corresponding to the second statement;

In an embodiment of the present invention, in terms of updating the first requirement score according to the second will characteristic to obtain a second requirement score corresponding to the first sentence, the processor 602 is specifically configured to perform the following operations:

coding the second will characteristics to obtain a first character string;

In an embodiment of the present invention, in completing missing values in call parameters to obtain target parameters, the processor 602 is specifically configured to perform the following operations:

or replacing the missing value with a mean value of at least one data value included in the historical data set to obtain the target parameter.

In the embodiment of the present invention, the random function is represented by formula ④:

In the embodiment of the present invention, in terms of performing text conversion processing on current voice information of a user to obtain a first sentence corresponding to the voice information, the processor 602 is specifically configured to perform the following operations:

acquiring acoustic characteristics of voice information;

acquiring an audio transposition formula corresponding to the dialect type, and converting the voice information into standard voice through the audio transposition formula, wherein the audio transposition formula is used for identifying the conversion characteristics between the corresponding dialect pronunciation and the mandarin pronunciation;

Acquiring a pinyin text of voice information according to standard voice;

In an embodiment of the present invention, in determining the dialect class of the speech information according to the acoustic features, the processor 602 is specifically configured to:

determining the energy distribution, rhythm distribution, fundamental frequency and average speech power of the speech information according to the acoustic characteristics;

It should be understood that the random state based dialect recommendation device in the present application may include a smart phone (e.g., an Android phone, an iOS phone, a Windows Phone, etc.), a tablet computer, a palmtop computer, a notebook computer, a Mobile Internet Device (MID), a robot, a wearable device, or the like. The devices listed above are merely examples and are not exhaustive; the random state based dialect recommendation device includes but is not limited to them. In practical applications, the random state based dialect recommendation device may further include an intelligent vehicle-mounted terminal, computer equipment and the like.

Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by combining software with a hardware platform. Based on this understanding, all or part of the technical solutions of the present invention that contribute over the background art can be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, and which includes instructions for causing a computer device (which can be a personal computer, a server, a network device, etc.) to execute the methods according to the embodiments or some parts of the embodiments.

Accordingly, the present application also provides a computer readable storage medium, which stores a computer program, wherein the computer program is executed by a processor to implement part or all of the steps of any one of the random state based dialog recommendation methods as described in the above method embodiments. For example, the storage medium may include a hard disk, a floppy disk, an optical disk, a magnetic tape, a magnetic disk, a flash memory, and the like.

Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the stochastic state based dialog recommendation methods as described in the above method embodiments.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all alternative embodiments and that the acts and modules referred to are not necessarily required by the application.

In the above embodiments, the description of each embodiment has its own emphasis; for parts that are not described in detail in a certain embodiment, reference may be made to the description of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is merely a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.

The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

Those skilled in the art will appreciate that all or part of the steps of the various methods of the above embodiments may be implemented by program instructions associated with hardware, the program instructions may be stored in a computer readable memory, and the memory may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.

The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the methods and their core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. A method for random state based conversational recommendation, the method comprising:

performing text conversion processing on current voice information of a user to obtain a first sentence corresponding to the voice information;

Querying historical dialogue data according to the first statement to obtain a second statement, wherein the historical dialogue data is used for recording dialogue data generated before the current moment by a dialogue event to which the first statement belongs, the occurrence time of the second statement is earlier than that of the first statement, and the absolute value of the difference between the occurrence time of the second statement and the occurrence time of the first statement is minimum;

extracting the intention of the second sentence to obtain a first intention characteristic;

generating a state tracking label of the first statement according to the first intention characteristic, wherein the state tracking label is used for identifying the intention direction and demand strength of the user when the user speaks the voice information;

inputting the first sentence, the state tracking label and at least one dialect into a scoring model to obtain at least one first score, wherein the scoring model is used for scoring each of the at least one dialect, and the at least one first score is in one-to-one correspondence with the at least one dialect;

multiplying each first score in the at least one first score by a random function to obtain at least one second score, wherein the at least one second score is in one-to-one correspondence with the at least one first score;

Recommending the dialect corresponding to the maximum second score in the at least one second score to a reply device, so that the reply device generates a reply sentence according to the recommended dialect to reply to the current voice information of the user.

2. The method of claim 1, wherein generating the state tracking label for the first sentence according to the first will characteristics comprises:

acquiring a first demand score corresponding to the second statement;

extracting the will of the first sentence to obtain a second will characteristic;

updating the first demand score according to the second will characteristic to obtain a second demand score corresponding to the first statement;

and combining the second demand score and the first will characteristic to obtain the state tracking label.

3. The method of claim 2, wherein the updating the first demand score according to the second will characteristics to obtain a second demand score corresponding to the first sentence, comprises:

coding the second will characteristics to obtain a first character string;

and inputting the second character string into a logistic regression model to obtain the second demand score.

4. The method of claim 3, wherein completing missing values in the call parameters to obtain target parameters comprises:

replacing the missing value with a preset replacing value to obtain the target parameter;

or acquiring a historical data set corresponding to the data type according to the data type of the missing value, and replacing the missing value with a median of at least one data value included in the historical data set to obtain the target parameter;

5. The method of claim 1, wherein the random function is as follows:

wherein x represents the dialect type to which the first sentence belongs, C_x represents the number of successful deals closed with said dialect type in the historical dialog data, S_x represents the number of dialogs in the historical dialog data that used said dialect type, a represents a weight coefficient, random([1, n_x]) is a random number associated with n_x, and n_x is the number of available dialects matched by the intention characteristic of the first sentence.

6. The method of claim 1, wherein performing text conversion processing on the current speech information of the user to obtain a first sentence corresponding to the speech information comprises:

acquiring acoustic features of the voice information;

determining dialect types of the voice information according to the acoustic features;

acquiring an audio transposition formula corresponding to the dialect category, and converting the voice information into standard voice through the audio transposition formula, wherein the audio transposition formula is used for identifying conversion characteristics between corresponding dialect pronunciation and mandarin pronunciation;

acquiring a pinyin text of the voice information according to the standard voice;

and matching the pinyin text in a preset vocabulary library to obtain the first sentence.

7. The method of claim 6, wherein determining the dialect class of the speech information based on the acoustic features comprises:

determining the energy distribution, rhythm distribution, fundamental frequency and average voice power of the voice information according to the acoustic features;

8. A stochastic state based dialog recommendation device, the device comprising:

the processing module is used for extracting the intention of the second sentence to obtain a first intention characteristic, and generating a state tracking label of the first sentence according to the first intention characteristic, wherein the state tracking label is used for identifying the intention direction and demand strength of the user when the user speaks the voice information;

a scoring module, configured to input the first sentence, the state tracking tag, and at least one utterance into a scoring model to obtain at least one first score, where the scoring model is configured to score each of the at least one utterance, the at least one first score is in one-to-one correspondence with the at least one utterance, and multiply each of the at least one first score by a random function to obtain at least one second score, where the at least one second score is in one-to-one correspondence with the at least one first score;

and the recommending module is used for recommending the dialect corresponding to the maximum second score in the at least one second score to the answering equipment so as to enable the answering equipment to generate an answering sentence according to the recommended dialect and answer the current voice information of the user.

9. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the one or more programs including instructions for performing the steps in the method of any of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method according to any one of claims 1-7.