WO2021082070A1 - Intelligent dialogue method and related equipment - Google Patents

Intelligent dialogue method and related equipment

Info

Publication number
WO2021082070A1
WO2021082070A1 (PCT/CN2019/117542; CN2019117542W)
Authority
WO
WIPO (PCT)
Prior art keywords: sentence, question, target, sentences, similarities
Prior art date
Application number
PCT/CN2019/117542
Other languages
English (en)
French (fr)
Inventor
刘涛
许开河
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021082070A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Definitions

  • This application relates to the field of electronic technology, in particular to an intelligent dialogue method and related equipment.
  • Intelligent dialogue is an important application in the field of artificial intelligence. Humans are born with the ability to analyze the state, theme, and tone of dialogue. It is of great significance to realize intelligent dialogue on machines.
  • Currently, intelligent dialogue is mainly implemented based on two models: a generative model and a rule model.
  • The generative model can answer questions that do not appear in the corpus, but its answer sentences are uncontrollable; the rule model produces controllable answer sentences but cannot answer questions that do not appear in the corpus. Therefore, how to give controllable answers to questions that do not appear in the corpus is a technical problem that needs to be solved.
  • the embodiments of the present application provide an intelligent dialogue method and related equipment, which are used to implement controllable answers to questions that do not appear in a corpus.
  • an embodiment of the present application provides an intelligent dialogue method, which is applied to an electronic device, and the method includes:
  • N first question sentences are determined based on the target question sentence input by the user, where the similarity between each first question sentence and the target question sentence is greater than or equal to a first threshold, N is an integer greater than 1, and each first question sentence is associated with a first answer sentence;
  • N first parameters are determined based on a preset neural network model, where the N first parameters are in one-to-one correspondence with the N first question sentences, and the N first parameters are used to evaluate the similarity between each corresponding first question sentence and the target question sentence;
  • the target answer sentence is taken as the answer sentence of the target question sentence, where the target answer sentence is the first answer sentence associated with the first question sentence corresponding to a target parameter, the value of the target parameter is greater than or equal to a second threshold, and the N first parameters include the target parameter.
  • An embodiment of the present application further provides an intelligent dialogue device, which is applied to an electronic device, and the device includes:
  • the determining unit is configured to determine N first question sentences based on the target question sentence input by the user, where the similarity between each first question sentence and the target question sentence is greater than or equal to a first threshold, N is an integer greater than 1, and each first question sentence is associated with a first answer sentence;
  • N first parameters are determined based on a preset neural network model, where the N first parameters are in one-to-one correspondence with the N first question sentences, and the N first parameters are used to evaluate the similarity between each corresponding first question sentence and the target question sentence;
  • the target answer sentence is taken as the answer sentence of the target question sentence, where the target answer sentence is the first answer sentence associated with the first question sentence corresponding to the target parameter, the value of the target parameter is greater than or equal to a second threshold, and the N first parameters include the target parameter;
  • the output unit is used to output the target answer sentence.
  • Embodiments of the present application provide an electronic device that includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by the processor, and the programs include instructions for executing some or all of the steps described in the method of the first aspect of the embodiments of the present application.
  • An embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program is executed by a processor to implement some or all of the steps described in the method of the first aspect.
  • In the embodiments of the present application, N first question sentences are first determined based on the target question sentence input by the user; N first parameters are then determined based on the preset neural network model, where the N first parameters are used to evaluate the similarity between each corresponding first question sentence and the target question sentence; next, the first answer sentence associated with the first question sentence whose first parameter is greater than or equal to the second threshold is taken as the answer sentence of the target question sentence; finally, this answer sentence is output.
  • Determining N first question sentences based on the target question sentence input by the user performs a rough screening, which ensures the controllability of the answer sentence; determining N first parameters based on the preset neural network model makes it possible to flexibly answer questions that did not appear in the corpus.
  • FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 2A is a schematic flowchart of an intelligent dialogue method provided by an embodiment of the present application.
  • 2B is a schematic diagram of a calculation process of sentence similarity provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an intelligent dialogue method provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of an intelligent dialogue device provided by an embodiment of the present application.
  • Electronic devices can include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices (such as smart watches, smart bracelets, and pedometers), computing devices or other processing devices communicatively connected to wireless modems, and various forms of user equipment (User Equipment, UE), mobile stations (Mobile Station, MS), terminal devices, and so on.
  • FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic equipment includes a processor, a memory, a signal processor, a transceiver, a display screen, a speaker, a microphone, a random access memory (RAM), a camera, a sensor, and so on.
  • the memory, signal processor, display screen, speaker, microphone, RAM, camera, sensor are connected with the processor, and the transceiver is connected with the signal processor.
  • The display screen can be a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED) display, an active-matrix organic light-emitting diode (Active Matrix/Organic Light Emitting Diode, AMOLED) display, etc.
  • the camera may be a normal camera or an infrared camera, which is not limited here.
  • the camera may be a front camera or a rear camera, which is not limited here.
  • the sensor includes at least one of the following: a light sensor, a gyroscope, an infrared proximity sensor, a fingerprint sensor, a pressure sensor, and so on.
  • the light sensor also called the ambient light sensor, is used to detect the brightness of the ambient light.
  • the light sensor may include a photosensitive element and an analog-to-digital converter.
  • the photosensitive element is used to convert the collected light signal into an electric signal
  • the analog-to-digital converter is used to convert the above electric signal into a digital signal.
  • the light sensor may further include a signal amplifier, and the signal amplifier may amplify the electrical signal converted by the photosensitive element and output it to the analog-to-digital converter.
  • the aforementioned photosensitive element may include at least one of a photodiode, a phototransistor, a photoresistor, and a silicon photocell.
  • The processor is the control center of the electronic device. It uses various interfaces and lines to connect the various parts of the entire electronic device, and, by running or executing the software programs and/or modules stored in the memory and calling the data stored in the memory, performs the various functions of the electronic device and processes data, thereby monitoring the electronic device as a whole.
  • the processor can integrate an application processor and a modem processor.
  • the application processor mainly processes an operating system, a user interface, and an application program
  • the modem processor mainly processes wireless communication. It can be understood that the above modem processor may not be integrated into the processor.
  • the memory is used to store software programs and/or modules, and the processor executes various functional applications and data processing of the electronic device by running the software programs and/or modules stored in the memory.
  • the memory may mainly include a storage program area and a storage data area, where the storage program area can store an operating system, at least one software program required by a function, etc.; the storage data area can store data created according to the use of electronic equipment, etc.
  • The memory may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • FIG. 2A is a schematic flowchart of an intelligent dialogue method provided by an embodiment of the present application, which is applied to an electronic device, and the method includes:
  • Step 201 Determine N first question sentences based on the target question sentence input by the user, and the similarity between each first question sentence and the target question sentence is greater than or equal to a first threshold, where N is an integer greater than 1, Each first question sentence is associated with a first answer sentence.
  • the information input by the user can be voice, text or pictures, and then the information input by the user is analyzed to obtain the target question sentence.
  • N can be, for example, 5, 10, 15, 20, or other values, which are not limited here.
  • the first threshold can be, for example, 80%, 85%, 90%, 95%, or other values, which are not limited here.
  • Table 1 is a relationship mapping table of a one-to-one correspondence between the first question sentence and the first answer sentence, and the relationship mapping table may be stored in a database associated with the electronic device.
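  • Although Table 1 itself is rendered as an image in the source, the one-to-one mapping it describes can be sketched as a simple lookup structure. The question and answer texts below are illustrative placeholders, not taken from the patent:

```python
# Hypothetical sketch of the Table 1 relationship mapping: each stored
# first question sentence is associated with exactly one first answer sentence.
qa_mapping = {
    "How do I reset my password?": "Open Settings > Account > Reset Password.",
    "What are your business hours?": "We are open 9:00-18:00 on weekdays.",
}

def lookup_answer(first_question: str) -> str:
    """Return the first answer sentence associated with a first question sentence."""
    return qa_mapping[first_question]
```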
  • Step 202 Determine N first parameters based on a preset neural network model.
  • The N first parameters are in one-to-one correspondence with the N first question sentences, and the N first parameters are used to evaluate the similarity between each corresponding first question sentence and the target question sentence.
  • Step 203 Take a target answer sentence as the answer sentence of the target question sentence, where the target answer sentence is the first answer sentence associated with the first question sentence corresponding to a target parameter, the value of the target parameter is greater than or equal to a second threshold, and the N first parameters include the target parameter.
  • the first threshold and the second threshold are both preset values.
  • For example, if three first question sentences are determined and their first-parameter values are 80%, 85%, and 90% respectively, the target parameter can be 90%, and the first answer sentence associated with the first question sentence corresponding to 90% serves as the answer sentence of the target question sentence.
  • Step 204 Output the target answer sentence.
  • the target answer sentence can be output by voice, or the target answer sentence can be output by text, which is not limited here.
  • It can be seen that in this embodiment, N first question sentences are first determined based on the target question sentence input by the user; N first parameters are then determined based on the preset neural network model, where the N first parameters are used to evaluate the similarity between each corresponding first question sentence and the target question sentence; next, the first answer sentence associated with the first question sentence whose first parameter is greater than or equal to the second threshold is taken as the answer sentence of the target question sentence; finally, this answer sentence is output.
  • Determining N first question sentences based on the target question sentence input by the user performs a rough screening, which ensures the controllability of the answer sentence; determining N first parameters based on the preset neural network model makes it possible to flexibly answer questions that did not appear in the corpus.
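  • A minimal end-to-end sketch of steps 201 to 204, assuming a cheap character-overlap similarity for the rough screening and an externally supplied scoring function standing in for the neural network model; all names and thresholds here are illustrative:

```python
def rough_similarity(a: str, b: str) -> float:
    """Cheap character-overlap similarity used only for the rough screening."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def answer(target, corpus, score, first_threshold=0.3, second_threshold=0.8):
    """corpus is a list of (first question sentence, first answer sentence) pairs."""
    # Step 201: rough screening - keep candidates passing the first threshold.
    candidates = [(q, a) for q, a in corpus
                  if rough_similarity(target, q) >= first_threshold]
    if not candidates:
        return None
    # Step 202: one first parameter per candidate, via the scoring model.
    scored = [(score(target, q), a) for q, a in candidates]
    # Steps 203-204: output the answer whose parameter reaches the second threshold.
    best_param, best_answer = max(scored)
    return best_answer if best_param >= second_threshold else None
```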
  • Optionally, the determination of N first question sentences based on the target question sentence input by the user includes:
  • M second question sentences are determined from a preset corpus based on a literal search
  • W third question sentences are determined from the preset corpus based on a semantic search
  • the keywords of the literal search are determined based on the target question sentence;
  • the literal similarity between each second question sentence and the target question sentence is greater than or equal to the third threshold
  • the semantic similarity between each third question sentence and the target question sentence is greater than or equal to the fourth threshold
  • the first threshold is greater than or equal to the third threshold
  • the first threshold is greater than or equal to the fourth threshold
  • the M and W are both integers greater than 0;
  • N first question sentences are determined based on the M second question sentences and the W third question sentences, and the N first question sentences include at least one second question sentence and at least one third question sentence.
  • The target question sentence is composed of a first character set, the first character set includes P first characters, and P is an integer greater than 0. A specific implementation of determining M second question sentences from a preset corpus based on the literal search is: searching the preset corpus using at least one of the P first characters as a keyword to obtain Q fifth question sentences; selecting M fifth question sentences from the Q fifth question sentences; and determining the M fifth question sentences as the M second question sentences.
  • The M fifth question sentences can be any M fifth question sentences selected manually, the M fifth question sentences ranked first in the search results, or the M fifth question sentences that contain the most keywords; this is not limited here.
  • The number of first characters included in the M second question sentences is greater than or equal to the number of first characters included in the Q-M sixth question sentences, where the Q-M sixth question sentences are the question sentences among the Q fifth question sentences other than the M fifth question sentences.
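  • One of the selection options described above, keeping the M fifth question sentences that contain the most keywords, can be sketched as follows; the character-level keyword matching is an illustrative assumption:

```python
def literal_search(first_characters, corpus_questions, m):
    """Search with the target question's characters as keywords, obtain the
    Q fifth question sentences, then keep the M containing the most keywords."""
    keywords = set(first_characters)
    # Q fifth question sentences: any corpus question hit by at least one keyword.
    fifth = [q for q in corpus_questions if keywords & set(q)]
    # Keep the M sentences containing the most keywords (sort is stable on ties).
    fifth.sort(key=lambda q: len(keywords & set(q)), reverse=True)
    return fifth[:m]  # the M second question sentences
```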
  • the third threshold may be 60%, 70%, 80%, 90%, or other values, which are not limited here; the fourth threshold may be 60%, 70%, 80%, 90%, or Other values are not limited here.
  • A specific method for determining the N first question sentences based on the M second question sentences and the W third question sentences is: determining n*N second question sentences from the M second question sentences, and determining (1-n)*N third question sentences from the W third question sentences; and taking the n*N second question sentences and the (1-n)*N third question sentences together as the N first question sentences.
  • n is a number greater than 0 and less than 1, for example, it can be 0.1, 0.2, 0.3, 0.4, or other values, which are not limited here.
  • The literal similarity between each of the n*N second question sentences and the target question sentence is greater than or equal to a fifth threshold, and the semantic similarity between each of the (1-n)*N third question sentences and the target question sentence is greater than or equal to a sixth threshold.
  • the fifth threshold may be equal to the sixth threshold
  • the fifth threshold may not be equal to the sixth threshold, which is not limited here.
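  • The split between literal and semantic candidates described above (n*N from the literal search, (1-n)*N from the semantic search) can be sketched as follows; the list contents and the truncation-based rounding are illustrative assumptions:

```python
def mix_candidates(second_questions, third_questions, n_ratio, n_total):
    """Take n*N second (literal) question sentences and (1-n)*N third (semantic)
    question sentences as the N first question sentences."""
    k_literal = int(n_ratio * n_total)   # n*N from the literal search
    k_semantic = n_total - k_literal     # (1-n)*N from the semantic search
    return second_questions[:k_literal] + third_questions[:k_semantic]
```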
  • the determination of W third question sentences from the preset corpus based on semantic search includes:
  • W third question sentences are determined from the preset corpus, and the semantic similarity between each third question sentence and the fourth question sentence is greater than or equal to the fourth threshold.
  • the sentence constituents include at least one of the following: subject, predicate, object, attributive, adverbial, complement, head, and verb.
  • the subject of the sentence can be words such as "he”, “she”, “it”, “them”, “me”, “you” and so on.
  • the target question sentence is "Recommend a suitable schoolbag for me”
  • the sentence after removing the stop words is "Recommend a suitable schoolbag for me”.
  • the determination of W third question sentences from the preset corpus based on semantic search includes:
  • W third question sentences are determined from the preset corpus, and the semantic similarity between each third question sentence and the seventh question sentence is greater than or equal to the fourth threshold.
  • Stop words are words that carry no meaning in the sentence, such as "ah" (啊), "oh" (哦), "um" (嗯), "le" (了), "ma" (吗), and other such particles.
  • the target question sentence is "How is the weather tomorrow.”
  • the sentence after removing the stop words is "How is the weather tomorrow”.
  • Optionally, the determining of the N first parameters based on a preset neural network model includes:
  • determining the N sentence-sentence similarities, N edit distances, and N Jaccard similarities between the target question sentence and the N first question sentences based on the preset neural network model, where the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities are all in one-to-one correspondence with the N first question sentences;
  • determining N first parameters based on the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities, where the N first parameters are in one-to-one correspondence with the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities.
  • sentence similarity refers to the similarity between the target question sentence and the first question sentence.
  • The edit distance refers to the minimum number of editing operations required to convert the first question sentence into the target question sentence.
  • Optionally, the determining of the N first parameters based on the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities includes:
  • determining a first weight, a second weight, and a third weight, where the first weight represents the proportion of the sentence-sentence similarity when evaluating the first parameter, the second weight represents the proportion of the first similarity (the similarity mapped from the edit distance, see Table 2) when evaluating the first parameter, the third weight represents the proportion of the Jaccard similarity when evaluating the first parameter, and the sum of the first weight, the second weight, and the third weight is 1;
  • determining the N first parameters by the weighted-sum formula: first parameter = first weight x sentence-sentence similarity + second weight x first similarity + third weight x Jaccard similarity.
  • Table 2 is a one-to-one correspondence table between the edit distance and the first similarity provided in the embodiment of the present application.
  • Edit distance                                   First similarity
    greater than or equal to 0 and less than 3      90%
    greater than or equal to 3 and less than 6      80%
    greater than or equal to 6 and less than 9      70%
    greater than or equal to 9 and less than 12     60%
    ...                                             ...
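  • Putting Table 2 and the weighted combination together, a sketch: the weight values here are illustrative (only their sum being 1 is stated in the text), and the elided tail of the table is continued by assumption:

```python
def first_similarity_from_edit_distance(d: int) -> float:
    """Map an edit distance to the 'first similarity' per the Table 2 buckets."""
    if d < 3:
        return 0.90
    if d < 6:
        return 0.80
    if d < 9:
        return 0.70
    if d < 12:
        return 0.60
    return 0.50  # assumed continuation; the table's tail is elided in the source

def first_parameter(sentence_sim, edit_distance, jaccard_sim,
                    w1=0.4, w2=0.3, w3=0.3):
    """first parameter = w1*sentence similarity + w2*first similarity + w3*Jaccard,
    with w1 + w2 + w3 = 1 (the weight values are illustrative)."""
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9
    return (w1 * sentence_sim
            + w2 * first_similarity_from_edit_distance(edit_distance)
            + w3 * jaccard_sim)
```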
  • Optionally, the determining of the N sentence-sentence similarities between the target question sentence and the N first question sentences based on a preset neural network model includes:
  • transforming the target question sentence into a first sentence vector, and transforming the N first question sentences into N second sentence vectors, where the N second sentence vectors are in one-to-one correspondence with the N first question sentences;
  • extracting the feature information of the first sentence vector to obtain a first target vector, and extracting the feature information of the N second sentence vectors to obtain N second target vectors, where the N second target vectors are in one-to-one correspondence with the N second sentence vectors;
  • determining the sentence-sentence similarity between the first target vector and each second target vector based on the sentence-sentence similarity calculation formula, obtaining N sentence-sentence similarities.
  • The target question sentence is composed of a first character set, and the first character set includes P first characters. A specific implementation of converting the target question sentence into a first sentence vector includes: converting the P first characters into P word vectors; and combining the P word vectors to obtain the first sentence vector.
  • The method of converting the P first characters into P word vectors can be at least one of the following: the BERT (Bidirectional Encoder Representations from Transformers) model, the ELMo (Embeddings from Language Models) model, and the word2vec model.
  • The sentence-sentence similarity is calculated by a formula f(h_a, h_b), where h_a and h_b are the first target vector and the second target vector, respectively.
  • FIG. 2B is a schematic diagram of a calculation process of sentence similarity provided by an embodiment of the present application.
  • For example, the target question sentence is "He is smart": the word vector of "He" is x_1^a, the word vector of "is" is x_2^a, and the word vector of "smart" is x_3^a; the feature information of x_1^a, x_2^a, and x_3^a is extracted through the LSTM_a network to obtain the first target vector h^a. The first question sentence is "A truly wise man": the word vector of "A" is x_1^b, the word vector of "truly" is x_2^b, the word vector of "wise" is x_3^b, and the word vector of "man" is x_4^b; the feature information of x_1^b, x_2^b, x_3^b, and x_4^b is extracted through the LSTM_b network to obtain h_1^b, h_2^b, h_3^b, and h_4^b, from which the second target vector h^b is obtained. The sentence-sentence similarity A is then calculated as f(h^a, h^b) and output.
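  • A toy sketch of the Figure 2B flow: embed each word, summarize each sentence into a target vector, and score the pair. A mean of toy word vectors stands in for the LSTM feature extraction, and f(h_a, h_b) is assumed here to take the Manhattan-distance form exp(-||h_a - h_b||_1) commonly used with Siamese LSTM similarity models; the patent's exact formula is rendered as an image in the source:

```python
import math

def feature_vector(word_vectors):
    """Stand-in for the LSTM feature extraction: average the word vectors."""
    dim = len(word_vectors[0])
    return [sum(v[i] for v in word_vectors) / len(word_vectors) for i in range(dim)]

def sentence_similarity(h_a, h_b):
    """Assumed form of f(h_a, h_b): exp of the negative L1 distance, in (0, 1]."""
    l1 = sum(abs(x - y) for x, y in zip(h_a, h_b))
    return math.exp(-l1)
```

Identical target vectors give similarity 1.0, and a growing L1 distance drives the score toward 0.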
  • The target question sentence is composed of a first character set, the N first question sentences are composed of N second character sets, and the N second character sets are in one-to-one correspondence with the N first question sentences.
  • Optionally, the determining of the N edit distances between the target question sentence and the N first question sentences based on a preset neural network model includes: determining the minimum number of editing operations required to convert the first character set into each second character set; and determining the obtained N minimum numbers of editing operations as the N edit distances, where the N edit distances are in one-to-one correspondence with the N minimum numbers of editing operations.
  • the editing operation includes at least one of the following: insert, delete, and replace.
  • For example, to convert the word "kitten" into "sitting", the minimum single-character editing operations required are: first, kitten → sitten (replace "k" with "s"); second, sitten → sittin (replace "e" with "i"); third, sittin → sitting (insert "g" at the end of the word). Therefore, the edit distance between the words "kitten" and "sitting" is 3.
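  • The kitten → sitting example can be checked with the standard dynamic-programming edit distance, where insert, delete, and replace each cost 1:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and replacements
    needed to turn a into b (Levenshtein distance), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # replace, or free match
            ))
        prev = cur
    return prev[-1]
```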
  • Optionally, the determining of the N Jaccard similarities between the target question sentence and the N first question sentences based on a preset neural network model includes: determining N intersections and N unions of the first character set with the N second character sets; and determining the N Jaccard similarities based on the N intersections and the N unions, where the N Jaccard similarities are in one-to-one correspondence with the N intersections and the N unions.
  • For example, the first character set includes P first characters and the second character set includes Q second characters, of which R characters appear in both sets. Then the intersection of the first character set and the second character set has size R, their union has size P+Q-R, and the Jaccard similarity is R/(P+Q-R), where R and Q are both integers greater than 0.
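  • The character-set computation above maps directly to code; with P, Q, and R as defined, the result equals R/(P+Q-R):

```python
def jaccard_similarity(sentence_a: str, sentence_b: str) -> float:
    """Jaccard similarity of the two sentences' character sets: |A∩B| / |A∪B|."""
    a, b = set(sentence_a), set(sentence_b)
    intersection = a & b  # size R
    union = a | b         # size P + Q - R
    return len(intersection) / len(union)
```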
  • FIG. 3 is a schematic flowchart of an intelligent dialogue method provided by an embodiment of the present application, which is applied to an electronic device, and the method includes:
  • Step 301 Obtain a target question sentence input by a user, where the target question sentence is composed of a first character set.
  • Step 302 Determine M second question sentences from a preset corpus based on a literal search, the keywords of the literal search are determined based on the target question sentence, and each second question sentence and the literal face of the target question sentence The similarities are all greater than or equal to the third threshold, and the M is an integer greater than zero.
  • Step 303 Determine the sentence constituents of the target question sentence.
  • Step 304 Filter the target question sentence based on the sentence constituents to obtain a fourth question sentence, where the sentence constituents of the fourth question sentence are fewer than or equal to those of the target question sentence.
  • Step 305 Determine W third question sentences from the preset corpus, where the semantic similarity between each third question sentence and the fourth question sentence is greater than or equal to the fourth threshold, and W is an integer greater than 0.
  • Step 306 Determine N first question sentences based on the M second question sentences and the W third question sentences, the N first question sentences including at least one second question sentence and at least one third question sentence, where the similarity between each first question sentence and the target question sentence is greater than or equal to a first threshold, the first threshold is greater than or equal to the third threshold, the first threshold is greater than or equal to the fourth threshold, the N first question sentences are composed of N second character sets, and the N second character sets are in one-to-one correspondence with the N first question sentences.
  • Step 307 Transform the target question sentence into a first sentence vector, and transform the N first question sentences into N second sentence vectors, where the N second sentence vectors are in one-to-one correspondence with the N first question sentences.
  • Step 308 Extract the feature information of the first sentence vector to obtain a first target vector, and extract the feature information of the N second sentence vectors to obtain N second target vectors, where the N second target vectors are in one-to-one correspondence with the N second sentence vectors.
  • Step 309 Determine the sentence similarity between the first target vector and each second target vector based on the sentence-sentence similarity calculation formula, and obtain N sentence-sentence similarities.
  • Step 310 Determine the minimum number of editing operations required to convert the first character set into each second character set.
  • Step 311 Determine the obtained N minimum number of editing operations as N editing distances, and the N editing distances correspond to the N minimum number of editing operations one-to-one.
  • Step 312 Determine N intersections and N unions of the first character set with the N second character sets, where the N intersections and the N unions are all in one-to-one correspondence with the N second character sets.
  • Step 313 Determine N Jaccard similarities based on the N intersections and the N unions, where the N Jaccard similarities are in one-to-one correspondence with the N intersections and the N unions.
  • Step 314 Determine N first parameters based on the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities, where the N first parameters are in one-to-one correspondence with the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities.
  • Step 315 Take the target answer sentence as the answer sentence of the target question sentence, where the target answer sentence is the first answer sentence associated with the first question sentence corresponding to the target parameter, the value of the target parameter is greater than or equal to the second threshold, and the N first parameters include the target parameter.
  • Step 316 Output the target answer sentence.
  • It should be noted that step 302 and steps 303-305 can be performed at the same time, or step 302 can be performed first and then steps 303-305, or steps 303-305 can be performed first and then step 302. Similarly, steps 307-309, steps 310-311, and steps 312-313 can be performed at the same time or in any order: steps 307-309 first, then steps 310-311, then steps 312-313; or steps 310-311 first, then steps 307-309, then steps 312-313; or steps 312-313 first, then steps 307-309, then steps 310-311. None of this is limited here.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The electronic device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by the processor, and the programs include instructions for executing the following steps:
  • N first question sentences are determined based on the target question sentence input by the user, where the similarity between each first question sentence and the target question sentence is greater than or equal to the first threshold, N is an integer greater than 1, and each first question sentence is associated with a first answer sentence;
  • N first parameters are determined based on a preset neural network model, the N first parameters are in one-to-one correspondence with the N first question sentences, and the N first parameters are used to evaluate their corresponding first question sentences Similarity with the target question sentence;
  • the target answer sentence is the first answer sentence associated with the first question sentence corresponding to the target parameter, the value of the target parameter is greater than or equal to the second threshold, the
  • the N first parameters include the target parameter
  • the above-mentioned program includes instructions specifically for executing the following steps:
  • the target question sentence input by the user is acquired;
  • M second question sentences are determined from a preset corpus based on a literal search, and W third question sentences are determined from the preset corpus based on a semantic search, where the keywords of the literal search are determined based on the target question sentence, the literal similarity between each second question sentence and the target question sentence is greater than or equal to a third threshold, the semantic similarity between each third question sentence and the target question sentence is greater than or equal to a fourth threshold, the first threshold is greater than or equal to the third threshold, the first threshold is greater than or equal to the fourth threshold, and M and W are both integers greater than 0;
  • N first question sentences are determined based on the M second question sentences and the W third question sentences, where the N first question sentences include at least one second question sentence and at least one third question sentence.
  • the target question sentence is composed of a first character set, the first character set includes P first characters, and the P is an integer greater than 0;
  • the above program includes instructions specifically for performing the following steps:
  • a search is performed in the preset corpus using at least one of the P first characters as a keyword to obtain Q fifth question sentences; M fifth question sentences are selected from the Q fifth question sentences; and the M fifth question sentences are determined as M second question sentences.
  • the foregoing program includes instructions specifically for executing the following steps:
  • the sentence components of the target question sentence are determined; the target question sentence is filtered based on the sentence components to obtain a fourth question sentence, the sentence components of the fourth question sentence being fewer than or equal to those of the target question sentence; and W third question sentences are determined from the preset corpus, where the semantic similarity between each third question sentence and the fourth question sentence is greater than or equal to the fourth threshold.
  • the above-mentioned program includes instructions specifically for executing the following steps:
  • N sentence-sentence similarities, N edit distances, and N Jaccard similarities between the target question sentence and the N first question sentences are determined based on the preset neural network model, where the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities all correspond one-to-one to the N first question sentences;
  • N first parameters are determined based on the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities, where the N first parameters correspond one-to-one to the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities.
  • the above program includes instructions specifically configured to perform the following steps:
  • the target question sentence is transformed into a first sentence vector, and the N first question sentences are transformed into N second sentence vectors, where the N second sentence vectors correspond one-to-one to the N first question sentences;
  • the feature information of the first sentence vector is extracted to obtain a first target vector, and the feature information of the N second sentence vectors is extracted to obtain N second target vectors, where the N second target vectors correspond one-to-one to the N second sentence vectors;
  • the sentence-sentence similarity between the first target vector and each second target vector is determined based on a sentence-sentence similarity calculation formula to obtain N sentence-sentence similarities.
  • the target question sentence is composed of a first character set, the N first question sentences are composed of N second character sets, and the N second character sets correspond one-to-one to the N first question sentences; the above program includes instructions specifically for executing the following steps:
  • the minimum number of editing operations required to transform the first character set into each second character set is determined; and the obtained N minimum numbers of editing operations are determined as N edit distances, where the N edit distances correspond one-to-one to the N minimum numbers of editing operations.
  • the above program includes instructions specifically configured to execute the following steps:
  • N intersections and N unions of the first character set and the N second character sets are determined, where the N intersections and the N unions all correspond one-to-one to the N second character sets; and the N Jaccard similarities are determined based on the N intersections and the N unions, where the N Jaccard similarities correspond one-to-one to the N intersections and the N unions.
  • the above-mentioned program includes instructions specifically for executing the following steps:
  • the N edit distances are transformed into N first similarities;
  • a first weight, a second weight, and a third weight are determined, where the first weight represents the proportion of the sentence-sentence similarity when evaluating the first parameter, the second weight represents the proportion of the first similarity when evaluating the first parameter, the third weight represents the proportion of the Jaccard similarity when evaluating the first parameter, and the sum of the first weight, the second weight, and the third weight is 1;
  • N first parameters are determined based on the first weight, the second weight, the third weight, the N sentence-sentence similarities, the N first similarities, the N Jaccard similarities, and a first parameter formula.
  • an electronic device includes hardware structures and/or software modules corresponding to each function.
  • the present application can be implemented in the form of hardware, or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer-software-driven hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
  • the embodiment of the present application may divide the electronic device into functional units according to the method example.
  • each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
  • the integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 5 is a schematic structural diagram of an intelligent dialogue device provided by an embodiment of the present application, which is applied to electronic equipment, and the device includes:
  • the determining unit 501 is configured to: determine N first question sentences based on the target question sentence input by the user, where the similarity between each first question sentence and the target question sentence is greater than or equal to a first threshold, N is an integer greater than 1, and each first question sentence is associated with a first answer sentence; determine N first parameters based on the preset neural network model, where the N first parameters correspond one-to-one to the N first question sentences and are used to evaluate the similarity between their corresponding first question sentences and the target question sentence; and use the target answer sentence as the answer sentence of the target question sentence, where the target answer sentence is the first answer sentence associated with the first question sentence corresponding to the target parameter, the value of the target parameter is greater than or equal to a second threshold, and the N first parameters include the target parameter;
  • the output unit 502 is configured to output the target answer sentence.
  • the determining unit 501, in terms of determining the N first question sentences based on the target question sentence input by the user, includes an acquiring subunit 5011, a first sub-determining unit 5012, a second sub-determining unit 5013, and a third sub-determining unit 5014, where:
  • the obtaining subunit 5011 is used to obtain the target question sentence input by the user;
  • the first sub-determination unit 5012 is configured to determine M second question sentences from a preset corpus based on a literal search, and keywords of the literal search are determined based on the target question sentence;
  • the second sub-determining unit 5013 is configured to determine W third question sentences from the preset corpus based on semantic search, where the literal similarity between each second question sentence and the target question sentence is greater than or equal to the third threshold, the semantic similarity between each third question sentence and the target question sentence is greater than or equal to the fourth threshold, the first threshold is greater than or equal to the third threshold, the first threshold is greater than or equal to the fourth threshold, and M and W are both integers greater than 0;
  • the third sub-determining unit 5014 is configured to determine N first question sentences based on the M second question sentences and the W third question sentences, where the N first question sentences include at least one second question sentence and at least one third question sentence.
  • the target question sentence is composed of a first character set, the first character set includes P first characters, and the P is an integer greater than 0;
  • the first sub-determining unit 5012 is specifically configured to: perform a search in the preset corpus using at least one of the P first characters as a keyword to obtain Q fifth question sentences; select M fifth question sentences from the Q fifth question sentences; and determine the M fifth question sentences as M second question sentences.
  • the second sub-determining unit 5013 is specifically configured to: determine the sentence components of the target question sentence; filter the target question sentence based on the sentence components to obtain a fourth question sentence, the sentence components of the fourth question sentence being fewer than or equal to those of the target question sentence; and determine W third question sentences from the preset corpus, where the semantic similarity between each third question sentence and the fourth question sentence is greater than or equal to the fourth threshold.
  • the determining unit 501 further includes a fourth sub-determining unit 5015 and a fifth sub-determining unit 5016, wherein:
  • the fourth sub-determining unit 5015 is configured to determine, based on a preset neural network model, N sentence-sentence similarities, N edit distances, and N Jaccard similarities between the target question sentence and the N first question sentences, where the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities all correspond one-to-one to the N first question sentences;
  • the fifth sub-determining unit 5016 is configured to determine N first parameters based on the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities, where the N first parameters correspond one-to-one to the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities.
  • the fourth sub-determining unit 5015 is specifically configured to:
  • transform the target question sentence into a first sentence vector, and transform the N first question sentences into N second sentence vectors, where the N second sentence vectors correspond one-to-one to the N first question sentences;
  • extract the feature information of the first sentence vector to obtain a first target vector, and extract the feature information of the N second sentence vectors to obtain N second target vectors, where the N second target vectors correspond one-to-one to the N second sentence vectors;
  • determine the sentence-sentence similarity between the first target vector and each second target vector based on the sentence-sentence similarity calculation formula to obtain N sentence-sentence similarities.
  • the target question sentence is composed of a first character set, the N first question sentences are composed of N second character sets, and the N second character sets correspond one-to-one to the N first question sentences; in determining, based on the preset neural network model, the N edit distances between the target question sentence and the N first question sentences, the fourth sub-determining unit 5015 is specifically configured to: determine the minimum number of editing operations required to transform the first character set into each second character set; and determine the obtained N minimum numbers of editing operations as N edit distances, where the N edit distances correspond one-to-one to the N minimum numbers of editing operations.
  • in terms of determining the N Jaccard similarities between the target question sentence and the N first question sentences based on a preset neural network model, the fourth sub-determining unit 5015 is specifically configured to: determine N intersections and N unions of the first character set and the N second character sets, where the N intersections and the N unions all correspond one-to-one to the N second character sets; and determine the N Jaccard similarities based on the N intersections and the N unions, where the N Jaccard similarities correspond one-to-one to the N intersections and the N unions.
  • in determining N first parameters based on the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities, the fifth sub-determining unit 5016 is specifically configured to: transform the N edit distances into N first similarities; determine a first weight, a second weight, and a third weight, where the first weight represents the proportion of the sentence-sentence similarity when evaluating the first parameter, the second weight represents the proportion of the first similarity when evaluating the first parameter, the third weight represents the proportion of the Jaccard similarity when evaluating the first parameter, and the sum of the first weight, the second weight, and the third weight is 1; and determine N first parameters based on the first weight, the second weight, the third weight, the N sentence-sentence similarities, the N first similarities, the N Jaccard similarities, and a first parameter formula.
  • the acquiring subunit 5011, the first sub-determining unit 5012, the second sub-determining unit 5013, the third sub-determining unit 5014, the fourth sub-determining unit 5015, the fifth sub-determining unit 5016, and the output unit 502 may be implemented by a processor.
  • the embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium may be a computer non-volatile readable storage medium or a computer volatile readable storage medium, which is not limited here.
  • the computer-readable storage medium stores a computer program for electronic data exchange.
  • the computer program enables a computer to execute part or all of the steps of any method as described in the above method embodiments, and the above computer includes an electronic device.
  • the embodiments of the present application also provide a computer program product.
  • the above-mentioned computer program product includes a non-transitory computer-readable storage medium storing the computer program.
  • the above-mentioned computer program is operable to cause a computer to execute some or all of the steps of any method described in the above-mentioned method embodiments.
  • the computer program product may be a software installation package, and the above-mentioned computer includes electronic equipment.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are only illustrative; for example, the division of the above-mentioned units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the above integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable memory.
  • the technical solution of the present application essentially or the part that contributes to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a memory.
  • a number of instructions are included to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the foregoing methods of the various embodiments of the present application.
  • the aforementioned memory includes media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
  • the program can be stored in a computer-readable memory, and the memory can include: a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.


Abstract

An intelligent dialogue method and related device, relating to the field of speech and semantics and applied to an electronic device. The method includes: determining N first question sentences based on a target question sentence input by a user (201), each first question sentence being associated with a first answer sentence; determining N first parameters based on a preset neural network model, the N first parameters corresponding one-to-one to the N first question sentences (202); using a target answer sentence as the answer sentence of the target question sentence, the target answer sentence being the first answer sentence associated with the first question sentence corresponding to a target parameter (203), the N first parameters including the target parameter; and outputting the target answer sentence (204). The method enables controllable answering of questions that do not appear in the corpus.

Description

Intelligent dialogue method and related device
This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on October 29, 2019, with application number 2019110344253 and entitled "Intelligent dialogue method and related device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of electronic technology, and in particular to an intelligent dialogue method and related device.
Background
Intelligent dialogue is an important application in the field of artificial intelligence. Humans are naturally able to analyze the state, topic, and tone of a dialogue, and realizing intelligent dialogue on machines is of great significance. At present, intelligent dialogue is mainly implemented with two kinds of models: generative models and rule-based models. A generative model can answer questions that do not appear in the corpus, but its answer sentences are uncontrollable; a rule-based model produces controllable answer sentences but cannot answer questions that do not appear in the corpus. How to answer, in a controllable manner, questions that do not appear in the corpus is therefore a technical problem to be solved.
Summary
The embodiments of this application provide an intelligent dialogue method and related device for answering, in a controllable manner, questions that do not appear in the corpus.
In a first aspect, an embodiment of this application provides an intelligent dialogue method applied to an electronic device, the method including:
determining N first question sentences based on a target question sentence input by a user, where the similarity between each first question sentence and the target question sentence is greater than or equal to a first threshold, N is an integer greater than 1, and each first question sentence is associated with a first answer sentence;
determining N first parameters based on a preset neural network model, where the N first parameters correspond one-to-one to the N first question sentences and are used to evaluate the similarity between their corresponding first question sentences and the target question sentence;
using a target answer sentence as the answer sentence of the target question sentence, where the target answer sentence is the first answer sentence associated with the first question sentence corresponding to a target parameter, the value of the target parameter is greater than or equal to a second threshold, and the N first parameters include the target parameter; and
outputting the target answer sentence.
In a second aspect, an embodiment of this application provides an intelligent dialogue apparatus applied to an electronic device, the apparatus including:
a determining unit configured to: determine N first question sentences based on a target question sentence input by a user, where the similarity between each first question sentence and the target question sentence is greater than or equal to a first threshold, N is an integer greater than 1, and each first question sentence is associated with a first answer sentence; determine N first parameters based on a preset neural network model, where the N first parameters correspond one-to-one to the N first question sentences and are used to evaluate the similarity between their corresponding first question sentences and the target question sentence; and use a target answer sentence as the answer sentence of the target question sentence, where the target answer sentence is the first answer sentence associated with the first question sentence corresponding to a target parameter, the value of the target parameter is greater than or equal to a second threshold, and the N first parameters include the target parameter; and
an output unit configured to output the target answer sentence.
In a third aspect, an embodiment of this application provides an electronic device including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing some or all of the steps described in the method of the first aspect of the embodiments of this application.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium for storing a computer program, where the computer program is executed by a processor to implement some or all of the steps described in the method of the first aspect of the embodiments of this application.
It can be seen that, in the embodiments of this application, N first question sentences are first determined based on the target question sentence input by the user; N first parameters used to evaluate the similarity between their corresponding first question sentences and the target question sentence are then determined based on the preset neural network model; the first answer sentence associated with the first question sentence whose first parameter is greater than or equal to the second threshold is then used as the answer sentence of the target question sentence; and finally this answer sentence is output. Determining the N first question sentences based on the target question sentence input by the user performs a rough screening that guarantees the controllability of the answer sentence; determining the N first parameters based on the preset neural network model enables flexible answering of questions that do not appear in the corpus.
These and other aspects of this application will be more clearly understood from the description of the following embodiments.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are obviously some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of this application;
FIG. 2A is a schematic flowchart of an intelligent dialogue method provided by an embodiment of this application;
FIG. 2B is a schematic diagram of a calculation process of sentence-sentence similarity provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of an intelligent dialogue method provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of an intelligent dialogue apparatus provided by an embodiment of this application.
Detailed Description
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described below clearly and completely with reference to the drawings in the embodiments of this application. The described embodiments are obviously only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.
Detailed descriptions are given below.
The terms "first", "second", "third", and "fourth" in the specification, claims, and drawings of this application are used to distinguish different objects rather than to describe a specific order. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The embodiments of this application are described below with reference to the drawings.
The electronic device may include various handheld devices with wireless communication capability, vehicle-mounted devices, wearable devices (such as smart watches, smart bracelets, and pedometers), computing devices, or other processing devices communicatively connected to a wireless modem, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and the like. For convenience of description, the devices mentioned above are collectively referred to as electronic devices.
As shown in FIG. 1, FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of this application. The electronic device includes a processor, a memory, a signal processor, a transceiver, a display screen, a speaker, a microphone, a random access memory (RAM), a camera, sensors, and the like. The memory, the signal processor, the display screen, the speaker, the microphone, the RAM, the camera, and the sensors are connected to the processor, and the transceiver is connected to the signal processor.
The display screen may be a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, an active-matrix organic light-emitting diode (AMOLED) panel, or the like.
The camera may be an ordinary camera or an infrared camera, which is not limited here. The camera may be a front camera or a rear camera, which is not limited here.
The sensors include at least one of the following: a light sensor, a gyroscope, an infrared proximity sensor, a fingerprint sensor, a pressure sensor, and the like. The light sensor, also called an ambient light sensor, is used to detect ambient light brightness. The light sensor may include a photosensitive element and an analog-to-digital converter, where the photosensitive element is used to convert a collected light signal into an electrical signal, and the analog-to-digital converter is used to convert the electrical signal into a digital signal. Optionally, the light sensor may further include a signal amplifier, which amplifies the electrical signal converted by the photosensitive element and outputs it to the analog-to-digital converter. The photosensitive element may include at least one of a photodiode, a phototransistor, a photoresistor, and a silicon photocell.
The processor is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory and calling data stored in the memory, thereby monitoring the electronic device as a whole.
The processor may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor.
The memory is used to store software programs and/or modules, and the processor executes the various functional applications and data processing of the electronic device by running the software programs and/or modules stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the software programs required by at least one function, and the data storage area may store data created according to the use of the electronic device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The embodiments of this application are introduced in detail below.
Referring to FIG. 2A, FIG. 2A is a schematic flowchart of an intelligent dialogue method provided by an embodiment of this application, applied to an electronic device; the method includes:
Step 201: Determine N first question sentences based on a target question sentence input by a user, where the similarity between each first question sentence and the target question sentence is greater than or equal to a first threshold, N is an integer greater than 1, and each first question sentence is associated with a first answer sentence.
The information input by the user may be speech, text, or a picture, and the target question sentence is obtained by parsing the information input by the user.
N may be, for example, 5, 10, 15, 20, or another value, which is not limited here.
The first threshold may be, for example, 80%, 85%, 90%, 95%, or another value, which is not limited here.
As shown in Table 1, Table 1 is a mapping table of the one-to-one correspondence between first question sentences and first answer sentences; the mapping table may be stored in a database associated with the electronic device.
Table 1
Figure PCTCN2019117542-appb-000001
Figure PCTCN2019117542-appb-000002
Step 202: Determine N first parameters based on a preset neural network model, where the N first parameters correspond one-to-one to the N first question sentences and are used to evaluate the similarity between their corresponding first question sentences and the target question sentence.
Step 203: Use a target answer sentence as the answer sentence of the target question sentence, where the target answer sentence is the first answer sentence associated with the first question sentence corresponding to a target parameter, the value of the target parameter is greater than or equal to a second threshold, and the N first parameters include the target parameter.
Both the first threshold and the second threshold are preset values.
For example, suppose three first question sentences are determined and their first parameter values are 80%, 85%, and 90%, respectively; then the target parameter may be 90%, and the first answer sentence associated with the first question sentence corresponding to 90% is used as the answer sentence of the target question sentence.
Step 204: Output the target answer sentence.
The target answer sentence may be output as speech or as text, which is not limited here.
It can be seen that, in this embodiment of this application, N first question sentences are first determined based on the target question sentence input by the user; N first parameters used to evaluate the similarity between their corresponding first question sentences and the target question sentence are then determined based on the preset neural network model; the first answer sentence associated with the first question sentence whose first parameter is greater than or equal to the second threshold is then used as the answer sentence of the target question sentence; and finally this answer sentence is output. Determining the N first question sentences based on the target question sentence input by the user performs a rough screening that guarantees the controllability of the answer sentence; determining the N first parameters based on the preset neural network model enables flexible answering of questions that do not appear in the corpus.
In an implementation of this application, determining the N first question sentences based on the target question sentence input by the user includes:
acquiring the target question sentence input by the user;
determining M second question sentences from a preset corpus based on a literal search, and determining W third question sentences from the preset corpus based on a semantic search, where the keywords of the literal search are determined based on the target question sentence, the literal similarity between each second question sentence and the target question sentence is greater than or equal to a third threshold, the semantic similarity between each third question sentence and the target question sentence is greater than or equal to a fourth threshold, the first threshold is greater than or equal to the third threshold, the first threshold is greater than or equal to the fourth threshold, and M and W are both integers greater than 0; and
determining N first question sentences based on the M second question sentences and the W third question sentences, where the N first question sentences include at least one second question sentence and at least one third question sentence.
Specifically, the target question sentence is composed of a first character set, the first character set includes P first characters, and P is an integer greater than 0. A specific implementation of determining the M second question sentences from the preset corpus based on the literal search is: performing a search in the preset corpus using at least one of the P first characters as a keyword to obtain Q fifth question sentences; selecting M fifth question sentences from the Q fifth question sentences; and determining the M fifth question sentences as M second question sentences.
The M fifth question sentences may be any M fifth question sentences selected manually, may be the top-ranked M fifth question sentences after the search, or may be the M fifth question sentences containing the most keywords, which is not limited here.
Further, the number of first characters included in the M second question sentences is greater than or equal to the number of first characters included in the Q-M sixth question sentences, where the Q-M sixth question sentences are the question sentences among the Q fifth question sentences other than the M fifth question sentences.
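The literal search described above can be sketched as follows. This is a minimal illustration over a toy in-memory corpus, using "contains the most keywords" as the selection rule for the M fifth question sentences; the function and variable names are hypothetical and not from the patent:

```python
def literal_search(target_chars, corpus, m):
    """Return the M corpus sentences sharing the most characters with the target.

    target_chars: the P first characters of the target question sentence.
    corpus: list of candidate question sentences (the preset corpus).
    m: number of second question sentences to keep.
    """
    keywords = set(target_chars)
    # Keep only sentences containing at least one keyword (the Q fifth sentences).
    hits = [s for s in corpus if keywords & set(s)]
    # Rank by the number of distinct keywords each sentence contains.
    hits.sort(key=lambda s: len(keywords & set(s)), reverse=True)
    return hits[:m]

corpus = ["what is the weather tomorrow",
          "recommend a schoolbag",
          "how do I reset my password"]
print(literal_search("weather tomorrow", corpus, 1))
```

Other selection rules mentioned in the text (manual selection, or keeping the top-ranked results of the search engine) would only change the ranking key.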
The third threshold may be, for example, 60%, 70%, 80%, 90%, or another value, which is not limited here; the fourth threshold may be, for example, 60%, 70%, 80%, 90%, or another value, which is not limited here.
Specifically, a specific way of determining the N first question sentences based on the M second question sentences and the W third question sentences is: determining n*N second question sentences from the M second question sentences and (1-n)*N third question sentences from the W third question sentences, and using the n*N second question sentences and the (1-n)*N third question sentences as the N first question sentences.
Here n is a number greater than 0 and less than 1; it may be, for example, 0.1, 0.2, 0.3, 0.4, or another value, which is not limited here.
The literal similarity between the n*N second question sentences and the target question sentence is greater than or equal to a fifth threshold, and the semantic similarity between the (1-n)*N third question sentences and the target question sentence is greater than or equal to a sixth threshold; the fifth threshold may or may not be equal to the sixth threshold, which is not limited here.
In an implementation of this application, determining the W third question sentences from the preset corpus based on the semantic search includes:
determining the sentence components of the target question sentence;
filtering the target question sentence based on the sentence components to obtain a fourth question sentence, where the sentence components of the fourth question sentence are fewer than or equal to those of the target question sentence; and
determining W third question sentences from the preset corpus, where the semantic similarity between each third question sentence and the fourth question sentence is greater than or equal to the fourth threshold.
The sentence components include at least one of the following: subject, predicate, object, attributive, adverbial, complement, head word, and verb.
For example, the subject in the target question sentence is removed to obtain a sentence with the subject removed. The subject in a sentence may be, for example, a word such as "he", "she", "it", "they", "I", or "you". Illustratively, if the target question sentence is "给我推荐一个合适的书包" ("Recommend a suitable schoolbag for me"), the filtered sentence is "给推荐一个合适的书包".
In an implementation of this application, determining the W third question sentences from the preset corpus based on the semantic search includes:
performing word segmentation on the target question sentence to obtain multiple target words;
deleting the stop words among the multiple target words based on a preset stop-word list to obtain a seventh question sentence; and
determining W third question sentences from the preset corpus, where the semantic similarity between each third question sentence and the seventh question sentence is greater than or equal to the fourth threshold.
Stop words are words that carry no meaning for the sentence, such as "啊", "哦", "嗯", "了", "么", and "的". Illustratively, if the target question sentence is "明天的天气怎么样啊。" ("What will the weather be like tomorrow, eh?"), the sentence after removing stop words is "明天天气怎么样".
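The segmentation-plus-stop-word step can be sketched as below. This is a minimal illustration only: it assumes whitespace tokenization (the original targets Chinese text, where a real word segmenter would be needed) and a toy stop-word list; the names are hypothetical:

```python
STOP_WORDS = {"ah", "oh", "um", "eh", "the", "a"}  # toy preset stop-word list

def remove_stop_words(sentence):
    """Segment the sentence and drop stop words to obtain the seventh question sentence."""
    words = sentence.lower().split()  # word segmentation (whitespace, for illustration)
    kept = [w for w in words if w not in STOP_WORDS]
    return " ".join(kept)

print(remove_stop_words("What will the weather be like tomorrow eh"))
```

The seventh question sentence returned here would then be matched against the corpus by semantic similarity.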
In an implementation of this application, determining the N first parameters based on the preset neural network model includes:
determining N sentence-sentence similarities, N edit distances, and N Jaccard similarities between the target question sentence and the N first question sentences based on the preset neural network model, where the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities all correspond one-to-one to the N first question sentences; and
determining N first parameters based on the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities, where the N first parameters correspond one-to-one to the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities.
The sentence-sentence similarity refers to the similarity between the target question sentence and a first question sentence.
The edit distance refers to the minimum number of editing operations required to transform a first question sentence into the target question sentence.
Specifically, determining the N first parameters based on the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities includes:
transforming the N edit distances into N first similarities;
determining a first weight, a second weight, and a third weight, where the first weight represents the proportion of the sentence-sentence similarity when evaluating the first parameter, the second weight represents the proportion of the first similarity when evaluating the first parameter, the third weight represents the proportion of the Jaccard similarity when evaluating the first parameter, and the sum of the first weight, the second weight, and the third weight is 1; and
determining N first parameters based on the first weight, the second weight, the third weight, the N sentence-sentence similarities, the N first similarities, the N Jaccard similarities, and a first parameter formula.
For example, Table 2 is a table of the one-to-one correspondence between edit distance and first similarity provided by an embodiment of this application.
Table 2
Edit distance                                    First similarity
Greater than or equal to 0 and less than 3       90%
Greater than or equal to 3 and less than 6       80%
Greater than or equal to 6 and less than 9       70%
Greater than or equal to 9 and less than 12      60%
···                                              ···
Further, the first parameter formula is S = a*A + b*B + c*C, where S is the first parameter, a is the first weight, b is the second weight, c is the third weight, A is the sentence-sentence similarity, B is the first similarity, and C is the Jaccard similarity.
For example, if a is 0.3, b is 0.5, c is 0.2, A is 80%, B is 90%, and C is 80%, then S = 85%.
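The Table 2 mapping and the weighted formula S = a*A + b*B + c*C can be sketched together as follows. The mapping for distances of 12 and above is not given in Table 2, so the fallback value below is an assumption; the function names are illustrative:

```python
def edit_distance_to_similarity(d):
    """Map an edit distance to a first similarity, following Table 2."""
    if d < 3:
        return 0.90
    if d < 6:
        return 0.80
    if d < 9:
        return 0.70
    if d < 12:
        return 0.60
    return 0.50  # distances beyond Table 2: assumed continuation of the pattern

def first_parameter(a, b, c, sent_sim, edit_dist, jaccard_sim):
    """S = a*A + b*B + c*C, where B is the first similarity derived from the edit distance."""
    assert abs(a + b + c - 1.0) < 1e-9  # the three weights must sum to 1
    return a * sent_sim + b * edit_distance_to_similarity(edit_dist) + c * jaccard_sim

# Example from the text: a=0.3, b=0.5, c=0.2, A=80%, B=90%, C=80%  ->  S=85%
print(first_parameter(0.3, 0.5, 0.2, 0.80, 2, 0.80))
```

An edit distance of 2 maps to a first similarity of 90% via Table 2, reproducing the worked example.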
In an implementation of this application, determining the N sentence-sentence similarities between the target question sentence and the N first question sentences based on the preset neural network model includes:
transforming the target question sentence into a first sentence vector, and transforming the N first question sentences into N second sentence vectors, where the N second sentence vectors correspond one-to-one to the N first question sentences;
extracting the feature information of the first sentence vector to obtain a first target vector, and extracting the feature information of the N second sentence vectors to obtain N second target vectors, where the N second target vectors correspond one-to-one to the N second sentence vectors; and
determining the sentence-sentence similarity between the first target vector and each second target vector based on a sentence-sentence similarity calculation formula to obtain N sentence-sentence similarities.
Further, the target question sentence is composed of a first character set, and the first character set includes P first characters. A specific implementation of transforming the target question sentence into the first sentence vector includes: transforming the P first characters into P word vectors, and combining the P word vectors to obtain the first sentence vector.
It should be noted that the P first characters may be transformed into P word vectors in at least one of the following ways: a Bidirectional Encoder Representations from Transformers (BERT) model, an Embeddings from Language Models (ELMo) model, or a word2vec model.
The sentence-sentence similarity calculation formula is
Figure PCTCN2019117542-appb-000003
where h_a and h_b are the first target vector and the second target vector, respectively.
As shown in FIG. 2B, FIG. 2B is a schematic diagram of a calculation process of sentence-sentence similarity provided by an embodiment of this application. The target question sentence is "He is smart"; the word vector of "He" is x_1^a, the word vector of "is" is x_2^a, and the word vector of "smart" is x_3^a. The feature information of x_1^a, x_2^a, and x_3^a is then extracted by the LSTMa algorithm to obtain h_1^a, h_2^a, and h_3^a. Similarly, the first question sentence is "A truly wise man"; the word vector of "A" is x_1^b, the word vector of "truly" is x_2^b, the word vector of "wise" is x_3^b, and the word vector of "man" is x_4^b. The feature information of x_1^b, x_2^b, x_3^b, and x_4^b is then extracted by the LSTMb algorithm to obtain h_1^b, h_2^b, h_3^b, and h_4^b. Finally, the sentence-sentence similarity A is obtained through the sentence-sentence similarity calculation formula f(h_a, h_b), and the sentence-sentence similarity A is output.
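The similarity formula itself appears only as an image in the source, so its exact form is not reproducible here. A common choice for this kind of twin-LSTM setup in the literature is f(h_a, h_b) = exp(-||h_a - h_b||_1), which maps any pair of vectors into (0, 1]; the sketch below assumes that form (an assumption, not the patent's confirmed formula) and treats the LSTM encoders as given, representing them by precomputed final hidden vectors:

```python
import math

def sentence_similarity(h_a, h_b):
    """f(h_a, h_b) = exp(-||h_a - h_b||_1): 1.0 for identical vectors, toward 0 as they diverge.

    The exact formula in the patent is shown only as an image; the exponential
    negative-Manhattan-distance form used here is an assumption.
    """
    l1 = sum(abs(x - y) for x, y in zip(h_a, h_b))
    return math.exp(-l1)

h_a = [0.2, -0.1, 0.7]  # final LSTMa state for "He is smart" (illustrative values)
h_b = [0.2, -0.1, 0.7]  # final LSTMb state for the candidate sentence
print(sentence_similarity(h_a, h_b))
```

Any formula that grows with vector closeness and is bounded in [0, 1] would slot into the same position in the pipeline.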
In an implementation of this application, the target question sentence is composed of a first character set, the N first question sentences are composed of N second character sets, and the N second character sets correspond one-to-one to the N first question sentences; determining the N edit distances between the target question sentence and the N first question sentences based on the preset neural network model includes:
determining the minimum number of editing operations required to transform the first character set into each second character set; and
determining the obtained N minimum numbers of editing operations as N edit distances, where the N edit distances correspond one-to-one to the N minimum numbers of editing operations.
The editing operations include at least one of the following: insertion, deletion, and substitution.
For example, for the two words "kitten" and "sitting", the minimum single-character editing operations required to transform "kitten" into "sitting" are: first, kitten→sitten (substitute "s" for "k"); second, sitten→sittin (substitute "i" for "e"); third, sittin→sitting (insert "g" at the end of the word). Therefore, the edit distance between the two words "kitten" and "sitting" is 3.
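The edit-distance computation can be sketched with the standard dynamic-programming algorithm, with insertion, deletion, and substitution each costing 1 (a textbook implementation, not code from the patent):

```python
def edit_distance(s, t):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    m, n = len(s), len(t)
    # prev[j] holds the distance between s[:i-1] and t[:j]; one row at a time.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution (or match)
        prev = cur
    return prev[n]

print(edit_distance("kitten", "sitting"))  # the example above: 3
```

The result for each first question sentence is then converted into a first similarity via Table 2.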
In an implementation of this application, determining the N Jaccard similarities between the target question sentence and the N first question sentences based on the preset neural network model includes:
determining N intersections and N unions of the first character set and the N second character sets, where the N intersections and the N unions all correspond one-to-one to the N second character sets; and
determining the N Jaccard similarities based on the N intersections and the N unions, where the N Jaccard similarities correspond one-to-one to the N intersections and the N unions.
Further, suppose the first character set includes P first characters and a second character set includes Q second characters, of which R characters are common to both; then the intersection of the first character set and the second character set is R, their union is P+Q-R, and the Jaccard similarity is R/(P+Q-R), where R and Q are both integers greater than 0.
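The Jaccard similarity over character sets, R/(P+Q-R), can be sketched directly with set operations (an illustrative helper, not code from the patent):

```python
def jaccard_similarity(sentence_a, sentence_b):
    """|A ∩ B| / |A ∪ B| over the character sets of two sentences."""
    a, b = set(sentence_a), set(sentence_b)
    r = len(a & b)       # R: characters common to both sets
    union = len(a | b)   # P + Q - R
    return r / union

print(jaccard_similarity("abcd", "bcde"))  # intersection {b,c,d} = 3, union {a,b,c,d,e} = 5 -> 0.6
```

This character-level measure is cheap to compute and complements the neural sentence-sentence similarity in the weighted first parameter.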
Consistent with the embodiment shown in FIG. 2A, referring to FIG. 3, FIG. 3 is a schematic flowchart of an intelligent dialogue method provided by an embodiment of this application, applied to an electronic device; the method includes:
Step 301: Acquire a target question sentence input by a user, the target question sentence being composed of a first character set.
Step 302: Determine M second question sentences from a preset corpus based on a literal search, where the keywords of the literal search are determined based on the target question sentence, the literal similarity between each second question sentence and the target question sentence is greater than or equal to a third threshold, and M is an integer greater than 0.
Step 303: Determine the sentence components of the target question sentence.
Step 304: Filter the target question sentence based on the sentence components to obtain a fourth question sentence, where the sentence components of the fourth question sentence are fewer than or equal to those of the target question sentence.
Step 305: Determine W third question sentences from the preset corpus, where the semantic similarity between each third question sentence and the fourth question sentence is greater than or equal to a fourth threshold, and W is an integer greater than 0.
Step 306: Determine N first question sentences based on the M second question sentences and the W third question sentences, where the N first question sentences include at least one second question sentence and at least one third question sentence, the similarity between each first question sentence and the target question sentence is greater than or equal to a first threshold, the first threshold is greater than or equal to the third threshold, the first threshold is greater than or equal to the fourth threshold, the N first question sentences are composed of N second character sets, and the N second character sets correspond one-to-one to the N first question sentences.
Step 307: Transform the target question sentence into a first sentence vector, and transform the N first question sentences into N second sentence vectors, where the N second sentence vectors correspond one-to-one to the N first question sentences.
Step 308: Extract the feature information of the first sentence vector to obtain a first target vector, and extract the feature information of the N second sentence vectors to obtain N second target vectors, where the N second target vectors correspond one-to-one to the N second sentence vectors.
Step 309: Determine the sentence-sentence similarity between the first target vector and each second target vector based on a sentence-sentence similarity calculation formula to obtain N sentence-sentence similarities.
Step 310: Determine the minimum number of editing operations required to transform the first character set into each second character set.
Step 311: Determine the obtained N minimum numbers of editing operations as N edit distances, where the N edit distances correspond one-to-one to the N minimum numbers of editing operations.
Step 312: Determine N intersections and N unions of the first character set and the N second character sets, where the N intersections and the N unions all correspond one-to-one to the N second character sets.
Step 313: Determine N Jaccard similarities based on the N intersections and the N unions, where the N Jaccard similarities correspond one-to-one to the N intersections and the N unions.
Step 314: Determine N first parameters based on the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities, where the N first parameters correspond one-to-one to the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities.
Step 315: Use a target answer sentence as the answer sentence of the target question sentence, where the target answer sentence is the first answer sentence associated with the first question sentence corresponding to a target parameter, the value of the target parameter is greater than or equal to a second threshold, and the N first parameters include the target parameter.
Step 316: Output the target answer sentence.
It should be noted that step 302 and steps 303-305 may be performed simultaneously, or step 302 may be performed first and then steps 303-305, or steps 303-305 may be performed first and then step 302. Steps 307-309, steps 310-311, and steps 312-314 may be performed simultaneously or in any order, which is not limited here. For the specific implementation process of this embodiment, refer to the specific implementation process described in the above method embodiment, which is not repeated here.
Consistent with the embodiments shown in FIG. 2A and FIG. 3 above, referring to FIG. 4, FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of this application. As shown in the figure, the electronic device includes a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the following steps:
determining N first question sentences based on a target question sentence input by a user, where the similarity between each first question sentence and the target question sentence is greater than or equal to a first threshold, N is an integer greater than 1, and each first question sentence is associated with a first answer sentence;
determining N first parameters based on a preset neural network model, where the N first parameters correspond one-to-one to the N first question sentences and are used to evaluate the similarity between their corresponding first question sentences and the target question sentence;
using a target answer sentence as the answer sentence of the target question sentence, where the target answer sentence is the first answer sentence associated with the first question sentence corresponding to a target parameter, the value of the target parameter is greater than or equal to a second threshold, and the N first parameters include the target parameter; and
outputting the target answer sentence.
In an implementation of this application, in terms of determining the N first question sentences based on the target question sentence input by the user, the above program includes instructions specifically for executing the following steps:
acquiring the target question sentence input by the user;
determining M second question sentences from a preset corpus based on a literal search, and determining W third question sentences from the preset corpus based on a semantic search, where the keywords of the literal search are determined based on the target question sentence, the literal similarity between each second question sentence and the target question sentence is greater than or equal to a third threshold, the semantic similarity between each third question sentence and the target question sentence is greater than or equal to a fourth threshold, the first threshold is greater than or equal to the third threshold, the first threshold is greater than or equal to the fourth threshold, and M and W are both integers greater than 0; and
determining N first question sentences based on the M second question sentences and the W third question sentences, where the N first question sentences include at least one second question sentence and at least one third question sentence.
In an implementation of this application, the target question sentence is composed of a first character set, the first character set includes P first characters, and P is an integer greater than 0; in terms of determining the M second question sentences from the preset corpus based on the literal search, the above program includes instructions specifically for executing the following steps:
performing a search in the preset corpus using at least one of the P first characters as a keyword to obtain Q fifth question sentences;
selecting M fifth question sentences from the Q fifth question sentences; and
determining the M fifth question sentences as M second question sentences.
In an implementation of this application, in terms of determining the W third question sentences from the preset corpus based on the semantic search, the above program includes instructions specifically for executing the following steps:
determining the sentence components of the target question sentence;
filtering the target question sentence based on the sentence components to obtain a fourth question sentence, where the sentence components of the fourth question sentence are fewer than or equal to those of the target question sentence; and
determining W third question sentences from the preset corpus, where the semantic similarity between each third question sentence and the fourth question sentence is greater than or equal to the fourth threshold.
In an implementation of this application, in terms of determining the N first parameters based on the preset neural network model, the above program includes instructions specifically for executing the following steps:
determining N sentence-sentence similarities, N edit distances, and N Jaccard similarities between the target question sentence and the N first question sentences based on the preset neural network model, where the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities all correspond one-to-one to the N first question sentences; and
determining N first parameters based on the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities, where the N first parameters correspond one-to-one to the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities.
In an implementation of this application, in terms of determining the N sentence-sentence similarities between the target question sentence and the N first question sentences based on the preset neural network model, the above program includes instructions specifically for executing the following steps:
transforming the target question sentence into a first sentence vector, and transforming the N first question sentences into N second sentence vectors, where the N second sentence vectors correspond one-to-one to the N first question sentences;
extracting the feature information of the first sentence vector to obtain a first target vector, and extracting the feature information of the N second sentence vectors to obtain N second target vectors, where the N second target vectors correspond one-to-one to the N second sentence vectors; and
determining the sentence-sentence similarity between the first target vector and each second target vector based on the sentence-sentence similarity calculation formula to obtain N sentence-sentence similarities.
In an implementation of this application, the target question sentence is composed of a first character set, the N first question sentences are composed of N second character sets, and the N second character sets correspond one-to-one to the N first question sentences; in terms of determining the N edit distances between the target question sentence and the N first question sentences based on the preset neural network model, the above program includes instructions specifically for executing the following steps:
determining the minimum number of editing operations required to transform the first character set into each second character set; and
determining the obtained N minimum numbers of editing operations as N edit distances, where the N edit distances correspond one-to-one to the N minimum numbers of editing operations.
In an implementation of this application, in terms of determining the N Jaccard similarities between the target question sentence and the N first question sentences based on the preset neural network model, the above program includes instructions specifically for executing the following steps:
determining N intersections and N unions of the first character set and the N second character sets, where the N intersections and the N unions all correspond one-to-one to the N second character sets; and
determining the N Jaccard similarities based on the N intersections and the N unions, where the N Jaccard similarities correspond one-to-one to the N intersections and the N unions.
In an implementation of this application, in terms of determining the N first parameters based on the N sentence-sentence similarities, the N edit distances, and the N Jaccard similarities, the above program includes instructions specifically for executing the following steps:
transforming the N edit distances into N first similarities;
determining a first weight, a second weight, and a third weight, where the first weight represents the proportion of the sentence-sentence similarity when evaluating the first parameter, the second weight represents the proportion of the first similarity when evaluating the first parameter, the third weight represents the proportion of the Jaccard similarity when evaluating the first parameter, and the sum of the first weight, the second weight, and the third weight is 1; and
determining N first parameters based on the first weight, the second weight, the third weight, the N sentence-sentence similarities, the N first similarities, the N Jaccard similarities, and the first parameter formula.
It should be noted that, for the specific implementation process of this embodiment, refer to the specific implementation process described in the above method embodiment, which is not repeated here.
The above embodiments introduce the solutions of the embodiments of this application mainly from the perspective of the method-side execution process. It can be understood that, to realize the above functions, the electronic device includes corresponding hardware structures and/or software modules for executing each function. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer-software-driven hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
The embodiments of this application may divide the electronic device into functional units according to the above method examples; for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of this application is illustrative and is only a logical function division; there may be other division methods in actual implementation.
下面为本申请装置实施例,本申请装置实施例用于执行本申请方法实施例所实现的方法。请参阅图5,图5是本申请实施例提供的一种智能对话装置的结构示意图,应用于电子设备,所述装置包括:
确定单元501,用于基于用户输入的目标问题语句确定N个第一问题语句,每个第一问题语句与所述目标问题语句的相似度均大于或等于第一阈值,所述N为大于1的整数,每个第一问题语句关联一个第一答案语句;基于预设神经网络模型确定N个第一参数,所述N个第一参数与所述N个第一问题语句一一对应,所述N个第一参数用于评价其对应的第一问题语句与所述目标问题语句的相似度;将目标答案语句作为所述目标问题语句的答案语句,所述目标答案语句为目标参数对应的第一问题语句关联的第一答案语句,所述目标参数的值大于或等于第二阈值,所述N个第一参数包括所述目标参数;
输出单元502,用于输出所述目标答案语句。
In an implementation of the present application, in determining the N first question sentences based on the target question sentence input by the user, the determining unit 501 includes an acquiring subunit 5011, a first determining subunit 5012, a second determining subunit 5013 and a third determining subunit 5014, wherein:
the acquiring subunit 5011 is configured to acquire the target question sentence input by the user;
the first determining subunit 5012 is configured to determine M second question sentences from a preset corpus based on a literal search, the keywords of the literal search being determined based on the target question sentence;
the second determining subunit 5013 is configured to determine W third question sentences from the preset corpus based on a semantic search, the literal similarity between each second question sentence and the target question sentence being greater than or equal to a third threshold, the semantic similarity between each third question sentence and the target question sentence being greater than or equal to a fourth threshold, the first threshold being greater than or equal to the third threshold, the first threshold being greater than or equal to the fourth threshold, and M and W each being an integer greater than 0;
the third determining subunit 5014 is configured to determine the N first question sentences based on the M second question sentences and the W third question sentences, the N first question sentences including at least one second question sentence and at least one third question sentence.
In an implementation of the present application, the target question sentence consists of a first character set, the first character set includes P first characters, and P is an integer greater than 0; in determining the M second question sentences from the preset corpus based on the literal search, the first determining subunit 5012 is specifically configured to:
search the preset corpus using at least one of the P first characters as a keyword, to obtain Q fifth question sentences;
select M fifth question sentences from the Q fifth question sentences;
determine the M fifth question sentences as the M second question sentences.
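A toy illustration of this literal search: every character of the target question sentence serves as a keyword, the Q matching candidates are those sharing at least one keyword, and the top M are kept. The ranking by keyword overlap is an assumption for illustration; the application does not specify how the M sentences are selected from the Q:

```python
def literal_search(target: str, corpus: list, m: int) -> list:
    """Return up to m corpus questions sharing at least one character
    (keyword) with the target question, ranked by keyword overlap."""
    keywords = set(target)                           # the P first characters
    hits = [q for q in corpus if keywords & set(q)]  # the Q fifth question sentences
    hits.sort(key=lambda q: len(keywords & set(q)), reverse=True)
    return hits[:m]                                  # the M selected sentences
```

In practice the corpus search would be backed by an inverted index rather than a linear scan, but the selection logic is the same.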
In an implementation of the present application, in determining the W third question sentences from the preset corpus based on the semantic search, the second determining subunit 5013 is specifically configured to: determine the sentence components of the target question sentence; filter the target question sentence based on the sentence components to obtain a fourth question sentence, the sentence components of the fourth question sentence being fewer than or equal to those of the target question sentence; and determine W third question sentences from the preset corpus, the semantic similarity between each third question sentence and the fourth question sentence being greater than or equal to the fourth threshold.
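As an illustration of this filter-then-match step, the sketch below drops hypothetical filler components from the target question sentence to form the fourth question sentence, then keeps corpus questions whose similarity to it meets the fourth threshold. The stop-word list and the bag-of-words cosine are stand-ins for the unspecified component analysis and semantic model:

```python
import math
from collections import Counter

STOP = {"please", "the", "a", "an"}  # hypothetical filler components to drop

def filter_components(sentence: str) -> list:
    """Keep only the informative components of a sentence (the 'fourth
    question sentence' has no more components than the target)."""
    return [w for w in sentence.lower().split() if w not in STOP]

def semantic_similarity(a: str, b: str) -> float:
    """Stand-in semantic similarity: cosine over bag-of-words counts."""
    ca, cb = Counter(filter_components(a)), Counter(filter_components(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(target: str, corpus: list, threshold: float) -> list:
    """The W third question sentences: those meeting the fourth threshold."""
    return [q for q in corpus if semantic_similarity(target, q) >= threshold]
```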
在本申请的一实现方式中,在基于预设神经网络模型确定N个第一参数方面,所述确定单元501还包括第四子确定单元5015和第五子确定单元5016,其中:
所述第四子确定单元5015,用于基于预设神经网络模型确定所述目标问题语句与所述N个第一问题语句的N个句句相似度、N个编辑距离和N个杰卡德相似度,所述N个句句相似度、所述N个编辑距离和所述N个杰卡德相似度均与所述N个第一问题语句一一对应;
所述第五子确定单元5016,用于基于所述N个句句相似度、所述N个编辑距离和所述N个杰卡德相似度确定N个第一参数,所述N个第一参数与所述N个句句相似度、所述N个编辑距离和所述N个杰卡德相似度均一一对应。
In an implementation of the present application, in determining, based on the preset neural network model, the N sentence-to-sentence similarities between the target question sentence and the N first question sentences, the fourth determining subunit 5015 is specifically configured to:
convert the target question sentence into a first sentence vector, and convert the N first question sentences into N second sentence vectors, the N second sentence vectors being in one-to-one correspondence with the N first question sentences;
extract feature information of the first sentence vector to obtain a first target vector, and extract feature information of the N second sentence vectors to obtain N second target vectors, the N second target vectors being in one-to-one correspondence with the N second sentence vectors;
determine the sentence-to-sentence similarity between the first target vector and each second target vector based on a sentence-to-sentence similarity calculation formula, to obtain N sentence-to-sentence similarities.
In an implementation of the present application, the target question sentence consists of a first character set, the N first question sentences consist of N second character sets, and the N second character sets are in one-to-one correspondence with the N first question sentences; in determining, based on the preset neural network model, the N edit distances between the target question sentence and the N first question sentences, the fourth determining subunit 5015 is specifically configured to:
determine the minimum number of edit operations required to convert the first character set into each second character set;
determine the N minimum numbers of edit operations thus obtained as N edit distances, the N edit distances being in one-to-one correspondence with the N minimum numbers of edit operations.
In an implementation of the present application, in determining, based on the preset neural network model, the N Jaccard similarities between the target question sentence and the N first question sentences, the fourth determining subunit 5015 is specifically configured to:
determine N intersections and N unions of the first character set with the N second character sets, the N intersections and the N unions each being in one-to-one correspondence with the N second character sets;
determine N Jaccard similarities based on the N intersections and the N unions, the N Jaccard similarities being in one-to-one correspondence with both the N intersections and the N unions.
In an implementation of the present application, in determining the N first parameters based on the N sentence-to-sentence similarities, the N edit distances and the N Jaccard similarities, the fifth determining subunit 5016 is specifically configured to:
convert the N edit distances into N first similarities;
determine a first weight, a second weight and a third weight, the first weight representing the proportion of the sentence-to-sentence similarity in evaluating a first parameter, the second weight representing the proportion of the first similarity in evaluating a first parameter, and the third weight representing the proportion of the Jaccard similarity in evaluating a first parameter, the sum of the first weight, the second weight and the third weight being 1;
determine the N first parameters based on the first weight, the second weight, the third weight, the N sentence-to-sentence similarities, the N first similarities, the N Jaccard similarities and a first parameter formula.
It should be noted that the acquiring subunit 5011, the first determining subunit 5012, the second determining subunit 5013, the third determining subunit 5014, the fourth determining subunit 5015, the fifth determining subunit 5016 and the output unit 502 may be implemented by a processor.
An embodiment of the present application further provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium, without limitation herein. The computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps of any method described in the above method embodiments, the computer including an electronic device.
An embodiment of the present application further provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform some or all of the steps of any method described in the above method embodiments. The computer program product may be a software installation package, and the computer includes an electronic device.
It should be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of action combinations; however, those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed device may be implemented in other manners. For example, the device embodiments described above are merely illustrative; the division of the above units is merely a logical functional division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place, or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the above integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The foregoing memory includes any medium capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments may be completed by a program instructing relevant hardware; the program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The embodiments of the present application have been described in detail above; specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are merely intended to help understand the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, changes may be made in the specific implementation and the scope of application according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (20)

  1. An intelligent dialogue method, characterized in that it is applied to an electronic device, the method comprising:
    determining N first question sentences based on a target question sentence input by a user, the similarity between each first question sentence and the target question sentence being greater than or equal to a first threshold, N being an integer greater than 1, and each first question sentence being associated with a first answer sentence;
    determining N first parameters based on a preset neural network model, the N first parameters being in one-to-one correspondence with the N first question sentences, and the N first parameters being used to evaluate the similarity between the corresponding first question sentence and the target question sentence;
    taking a target answer sentence as the answer sentence of the target question sentence, the target answer sentence being the first answer sentence associated with the first question sentence corresponding to a target parameter, the value of the target parameter being greater than or equal to a second threshold, and the N first parameters including the target parameter;
    outputting the target answer sentence.
  2. The method according to claim 1, characterized in that the determining N first question sentences based on the target question sentence input by the user comprises:
    acquiring the target question sentence input by the user;
    determining M second question sentences from a preset corpus based on a literal search, and determining W third question sentences from the preset corpus based on a semantic search, the keywords of the literal search being determined based on the target question sentence, the literal similarity between each second question sentence and the target question sentence being greater than or equal to a third threshold, the semantic similarity between each third question sentence and the target question sentence being greater than or equal to a fourth threshold, the first threshold being greater than or equal to the third threshold, the first threshold being greater than or equal to the fourth threshold, and M and W each being an integer greater than 0;
    determining the N first question sentences based on the M second question sentences and the W third question sentences, the N first question sentences including at least one second question sentence and at least one third question sentence.
  3. The method according to claim 2, characterized in that the target question sentence consists of a first character set, the first character set includes P first characters, and P is an integer greater than 0; the determining M second question sentences from the preset corpus based on the literal search comprises:
    searching the preset corpus using at least one of the P first characters as a keyword, to obtain Q fifth question sentences;
    selecting M fifth question sentences from the Q fifth question sentences;
    determining the M fifth question sentences as the M second question sentences.
  4. The method according to claim 2 or 3, characterized in that the determining W third question sentences from the preset corpus based on the semantic search comprises:
    determining the sentence components of the target question sentence;
    filtering the target question sentence based on the sentence components to obtain a fourth question sentence, the sentence components of the fourth question sentence being fewer than or equal to those of the target question sentence;
    determining W third question sentences from the preset corpus, the semantic similarity between each third question sentence and the fourth question sentence being greater than or equal to the fourth threshold.
  5. The method according to any one of claims 1 to 4, characterized in that the determining N first parameters based on the preset neural network model comprises:
    determining, based on the preset neural network model, N sentence-to-sentence similarities, N edit distances and N Jaccard similarities between the target question sentence and the N first question sentences, the N sentence-to-sentence similarities, the N edit distances and the N Jaccard similarities each being in one-to-one correspondence with the N first question sentences;
    determining N first parameters based on the N sentence-to-sentence similarities, the N edit distances and the N Jaccard similarities, the N first parameters being in one-to-one correspondence with each of the N sentence-to-sentence similarities, the N edit distances and the N Jaccard similarities.
  6. The method according to claim 5, characterized in that the determining, based on the preset neural network model, N sentence-to-sentence similarities between the target question sentence and the N first question sentences comprises:
    converting the target question sentence into a first sentence vector, and converting the N first question sentences into N second sentence vectors, the N second sentence vectors being in one-to-one correspondence with the N first question sentences;
    extracting feature information of the first sentence vector to obtain a first target vector, and extracting feature information of the N second sentence vectors to obtain N second target vectors, the N second target vectors being in one-to-one correspondence with the N second sentence vectors;
    determining the sentence-to-sentence similarity between the first target vector and each second target vector based on a sentence-to-sentence similarity calculation formula, to obtain N sentence-to-sentence similarities.
  7. The method according to claim 5 or 6, characterized in that the target question sentence consists of a first character set, the N first question sentences consist of N second character sets, and the N second character sets are in one-to-one correspondence with the N first question sentences; the determining, based on the preset neural network model, N edit distances between the target question sentence and the N first question sentences comprises:
    determining the minimum number of edit operations required to convert the first character set into each second character set;
    determining the N minimum numbers of edit operations thus obtained as N edit distances, the N edit distances being in one-to-one correspondence with the N minimum numbers of edit operations.
  8. The method according to any one of claims 5 to 7, characterized in that the determining, based on the preset neural network model, N Jaccard similarities between the target question sentence and the N first question sentences comprises:
    determining N intersections and N unions of the first character set with the N second character sets, the N intersections and the N unions each being in one-to-one correspondence with the N second character sets;
    determining N Jaccard similarities based on the N intersections and the N unions, the N Jaccard similarities being in one-to-one correspondence with both the N intersections and the N unions.
  9. The method according to any one of claims 5 to 8, characterized in that the determining N first parameters based on the N sentence-to-sentence similarities, the N edit distances and the N Jaccard similarities comprises:
    converting the N edit distances into N first similarities;
    determining a first weight, a second weight and a third weight, the first weight representing the proportion of the sentence-to-sentence similarity in evaluating a first parameter, the second weight representing the proportion of the first similarity in evaluating a first parameter, and the third weight representing the proportion of the Jaccard similarity in evaluating a first parameter, the sum of the first weight, the second weight and the third weight being 1;
    determining the N first parameters based on the first weight, the second weight, the third weight, the N sentence-to-sentence similarities, the N first similarities, the N Jaccard similarities and a first parameter formula.
  10. An intelligent dialogue device, characterized in that it is applied to an electronic device, the device comprising:
    a determining unit, configured to: determine N first question sentences based on a target question sentence input by a user, the similarity between each first question sentence and the target question sentence being greater than or equal to a first threshold, N being an integer greater than 1, and each first question sentence being associated with a first answer sentence; determine N first parameters based on a preset neural network model, the N first parameters being in one-to-one correspondence with the N first question sentences, and the N first parameters being used to evaluate the similarity between the corresponding first question sentence and the target question sentence; and take a target answer sentence as the answer sentence of the target question sentence, the target answer sentence being the first answer sentence associated with the first question sentence corresponding to a target parameter, the value of the target parameter being greater than or equal to a second threshold, and the N first parameters including the target parameter;
    an output unit, configured to output the target answer sentence.
  11. The device according to claim 10, characterized in that, in determining the N first question sentences based on the target question sentence input by the user, the determining unit includes an acquiring subunit, a first determining subunit, a second determining subunit and a third determining subunit, wherein:
    the acquiring subunit is configured to acquire the target question sentence input by the user;
    the first determining subunit is configured to determine M second question sentences from a preset corpus based on a literal search, the keywords of the literal search being determined based on the target question sentence;
    the second determining subunit is configured to determine W third question sentences from the preset corpus based on a semantic search, the literal similarity between each second question sentence and the target question sentence being greater than or equal to a third threshold, the semantic similarity between each third question sentence and the target question sentence being greater than or equal to a fourth threshold, the first threshold being greater than or equal to the third threshold, the first threshold being greater than or equal to the fourth threshold, and M and W each being an integer greater than 0;
    the third determining subunit is configured to determine the N first question sentences based on the M second question sentences and the W third question sentences, the N first question sentences including at least one second question sentence and at least one third question sentence.
  12. The device according to claim 11, characterized in that the target question sentence consists of a first character set, the first character set includes P first characters, and P is an integer greater than 0; in determining the M second question sentences from the preset corpus based on the literal search, the first determining subunit is specifically configured to:
    search the preset corpus using at least one of the P first characters as a keyword, to obtain Q fifth question sentences;
    select M fifth question sentences from the Q fifth question sentences;
    determine the M fifth question sentences as the M second question sentences.
  13. The device according to claim 10 or 11, characterized in that, in determining the W third question sentences from the preset corpus based on the semantic search, the second determining subunit is specifically configured to:
    determine the sentence components of the target question sentence;
    filter the target question sentence based on the sentence components to obtain a fourth question sentence, the sentence components of the fourth question sentence being fewer than or equal to those of the target question sentence;
    determine W third question sentences from the preset corpus, the semantic similarity between each third question sentence and the fourth question sentence being greater than or equal to the fourth threshold.
  14. The device according to any one of claims 10 to 13, characterized in that, in determining the N first parameters based on the preset neural network model, the determining unit further includes:
    a fourth determining subunit, configured to determine, based on the preset neural network model, N sentence-to-sentence similarities, N edit distances and N Jaccard similarities between the target question sentence and the N first question sentences, the N sentence-to-sentence similarities, the N edit distances and the N Jaccard similarities each being in one-to-one correspondence with the N first question sentences;
    a fifth determining subunit, configured to determine N first parameters based on the N sentence-to-sentence similarities, the N edit distances and the N Jaccard similarities, the N first parameters being in one-to-one correspondence with each of the N sentence-to-sentence similarities, the N edit distances and the N Jaccard similarities.
  15. The device according to claim 14, characterized in that, in determining, based on the preset neural network model, the N sentence-to-sentence similarities between the target question sentence and the N first question sentences, the fourth determining subunit is specifically configured to:
    convert the target question sentence into a first sentence vector, and convert the N first question sentences into N second sentence vectors, the N second sentence vectors being in one-to-one correspondence with the N first question sentences;
    extract feature information of the first sentence vector to obtain a first target vector, and extract feature information of the N second sentence vectors to obtain N second target vectors, the N second target vectors being in one-to-one correspondence with the N second sentence vectors;
    determine the sentence-to-sentence similarity between the first target vector and each second target vector based on a sentence-to-sentence similarity calculation formula, to obtain N sentence-to-sentence similarities.
  16. The device according to claim 14 or 15, characterized in that the target question sentence consists of a first character set, the N first question sentences consist of N second character sets, and the N second character sets are in one-to-one correspondence with the N first question sentences; in determining, based on the preset neural network model, the N edit distances between the target question sentence and the N first question sentences, the fourth determining subunit is specifically configured to:
    determine the minimum number of edit operations required to convert the first character set into each second character set;
    determine the N minimum numbers of edit operations thus obtained as N edit distances, the N edit distances being in one-to-one correspondence with the N minimum numbers of edit operations.
  17. The device according to any one of claims 14 to 16, characterized in that, in determining, based on the preset neural network model, the N Jaccard similarities between the target question sentence and the N first question sentences, the fourth determining subunit is specifically configured to:
    determine N intersections and N unions of the first character set with the N second character sets, the N intersections and the N unions each being in one-to-one correspondence with the N second character sets;
    determine N Jaccard similarities based on the N intersections and the N unions, the N Jaccard similarities being in one-to-one correspondence with both the N intersections and the N unions.
  18. The device according to any one of claims 14 to 17, characterized in that, in determining the N first parameters based on the N sentence-to-sentence similarities, the N edit distances and the N Jaccard similarities, the fifth determining subunit is specifically configured to:
    convert the N edit distances into N first similarities;
    determine a first weight, a second weight and a third weight, the first weight representing the proportion of the sentence-to-sentence similarity in evaluating a first parameter, the second weight representing the proportion of the first similarity in evaluating a first parameter, and the third weight representing the proportion of the Jaccard similarity in evaluating a first parameter, the sum of the first weight, the second weight and the third weight being 1;
    determine the N first parameters based on the first weight, the second weight, the third weight, the N sentence-to-sentence similarities, the N first similarities, the N Jaccard similarities and a first parameter formula.
  19. An electronic device, characterized by comprising a processor, a memory, a communication interface, and one or more programs, the one or more programs being stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps in the method according to any one of claims 1 to 9.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method according to any one of claims 1 to 9.
PCT/CN2019/117542 2019-10-29 2019-11-12 Intelligent dialogue method and related device WO2021082070A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911034425.3A CN111008267B (zh) 2019-10-29 2019-10-29 Intelligent dialogue method and related device
CN201911034425.3 2019-10-29

Publications (1)

Publication Number Publication Date
WO2021082070A1 (zh) 2021-05-06



Also Published As

Publication number Publication date
CN111008267B (zh) 2024-07-12
CN111008267A (zh) 2020-04-14


Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19950559; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19950559; Country of ref document: EP; Kind code of ref document: A1)