CN113946658A - AI-based man-machine conversation method, device and storage medium - Google Patents

AI-based man-machine conversation method, device and storage medium Download PDF

Info

Publication number
CN113946658A
CN113946658A (application number CN202111240855.8A)
Authority
CN
China
Prior art keywords
corpus
dialog text
semantic
reply
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111240855.8A
Other languages
Chinese (zh)
Inventor
蒋似尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd filed Critical Hisense Visual Technology Co Ltd
Priority to CN202111240855.8A priority Critical patent/CN113946658A/en
Publication of CN113946658A publication Critical patent/CN113946658A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application provides an AI-based man-machine conversation method, device, and storage medium. The method includes: obtaining a dialog text of a user; determining, in a sub-corpus corresponding to a first corpus category, at least one first semantic vector unit from at least one semantic vector unit according to a semantic feature vector of the dialog text; determining at least one candidate reply corpus from the corpora corresponding to the at least one first semantic vector unit according to a plurality of keywords of the dialog text; determining the reply corpus of the dialog text among the at least one candidate reply corpus; and outputting the reply corpus. The matching speed is improved, and the system overhead is reduced.

Description

AI-based man-machine conversation method, device and storage medium
Technical Field
The embodiments of the present application relate to the technical field of artificial intelligence, and more particularly, to an AI-based human-machine conversation method, device, and storage medium.
Background
With the continuous development of Artificial Intelligence (AI), AI-based human-computer dialog systems improve the convenience of human-computer interaction and are therefore increasingly widely applied.
At present, the following two technical schemes are commonly adopted to reply to the sentences input by a user during a man-machine conversation: the first adopts a text matching technique and selects the corpus with the highest matching score from the corpus as the reply; the second reads the chat history with the encoder of a sequence-to-sequence model and generates a reply with the decoder of the model. Because the second scheme generates the reply with a sequence-to-sequence model, the quality of the generated content is difficult to control; compared with the second scheme, the first scheme produces replies with higher relevance and fluency.
However, in the first technical scheme, because the corpus contains a huge number of corpora, matching the sentence input by the user against all the corpora one by one inevitably leads to slow matching and high system overhead.
Disclosure of Invention
The embodiment of the application provides an AI-based man-machine conversation method, device, and storage medium, which improve the matching speed of reply corpora and reduce system overhead.
In a first aspect, a man-machine conversation method based on artificial intelligence AI is provided, which includes: acquiring a dialog text of a user; determining at least one first semantic vector unit from at least one semantic vector unit in a sub-corpus corresponding to a first corpus class according to a semantic feature vector of the dialog text, wherein the distance between the first semantic vector unit and the semantic feature vector of the dialog text in a vector space is smaller than or equal to a first preset value, the sub-corpus comprises the at least one semantic vector unit, the semantic vector unit comprises corpora similar to the semantic feature vector in the sub-corpus, and the first corpus class is the corpus class corresponding to the dialog text in the corpus; determining at least one candidate reply corpus from the corpus corresponding to the at least one first semantic vector unit according to a plurality of keywords of the dialog text, wherein the plurality of keywords comprise at least one entity of the dialog text and at least one expansion word of the entity; and determining the reply linguistic data of the dialog text in the at least one candidate reply linguistic data, and outputting the reply linguistic data.
In a second aspect, an electronic device is provided, comprising: the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a dialog text of a user; the processing unit is used for determining at least one first semantic vector unit from at least one semantic vector unit according to a semantic feature vector of the dialog text in a sub-corpus corresponding to a first corpus class, wherein the distance between the first semantic vector unit and the semantic feature vector of the dialog text in a vector space is smaller than or equal to a first preset value, the sub-corpus comprises the at least one semantic vector unit, the semantic vector unit comprises corpora similar to the semantic feature vector in the sub-corpus, and the first corpus class is the corpus class corresponding to the dialog text in the corpus; the processing unit is further configured to determine at least one candidate reply corpus from the corpus corresponding to the at least one first semantic vector unit according to a plurality of keywords of the dialog text, where the plurality of keywords include at least one entity of the dialog text and at least one expanded word of the entity; the processing unit is further configured to determine a reply corpus of the dialog text in the at least one candidate reply corpus; and the output unit is used for outputting the reply corpus.
In a third aspect, an electronic device is provided that includes a memory and a processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored by the memory, causing the processor to perform a method as in the first aspect or its implementations.
In a fourth aspect, there is provided a computer readable storage medium for storing a computer program for causing a computer to perform the method as in the first aspect or its implementations.
In a fifth aspect, there is provided a computer program product comprising computer program instructions to cause a computer to perform the method as in the first aspect or its implementations.
A sixth aspect provides a computer program for causing a computer to perform a method as in the first aspect or implementations thereof.
According to the embodiment of the application, the electronic device determines at least one first semantic vector unit corresponding to the dialog text in the sub-corpus of the first corpus category, and then determines, from the corpora corresponding to the first semantic vector unit, at least one candidate reply corpus corresponding to a plurality of keywords of the dialog text, so that the corpora in the corpus are screened; the reply corpus is then determined from the at least one candidate reply corpus, which improves the matching speed and reduces the system overhead.
Drawings
Fig. 1 is a schematic structural diagram of a human-machine interaction system according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of an AI-based man-machine conversation method according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a clustering result of a semantic vector unit according to an embodiment of the present application;
FIG. 4 is a block diagram of a matching model provided by an embodiment of the present application;
fig. 5 is a schematic flowchart of a man-machine interaction method according to an embodiment of the present application;
fig. 6 is a schematic block diagram of an electronic device provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art without making any creative effort with respect to the embodiments in the present application belong to the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision making.
In a man-machine conversation scenario, when replying to a sentence input by a user, a text matching technique is often adopted so that the reply is more relevant and more fluent: the sentence input by the user is matched against the corpora in the corpus, and the corpus with the highest matching score is selected as the reply. However, matching the input sentence against the corpora in the corpus one by one reduces the matching speed and increases the system overhead.
In view of the above technical problems, the AI-based man-machine conversation method provided in the embodiments of the present application divides the corpora in the corpus into sub-corpora of different corpus categories (primary classification), and further divides the corpora corresponding to each corpus category into at least one semantic vector unit (secondary classification). According to the semantic feature vector of the dialog text (query), at least one corresponding first semantic vector unit is determined within the corpus category corresponding to the dialog text, and at least one candidate reply corpus is determined from the corpora corresponding to the at least one first semantic vector unit, thereby screening the corpora in the corpus. The reply corpus is then determined from the at least one candidate reply corpus, which improves the matching speed and reduces the system overhead.
It should be noted that the man-machine conversation method provided by the embodiments of the present application can be applied to any man-machine conversation scenario, but it is particularly suitable for open-domain chat, and even multi-round open-domain chat. Those skilled in the art will appreciate that open-domain chat is chat without a fixed topic, i.e., the response made by the system when the sentence input by the user carries no explicit information or service acquisition requirement. This contrasts with task-oriented dialogue, i.e., interaction with the user in order to complete a specific task in a specific domain (for example, booking airline tickets, querying maps, etc.). Multi-round open-domain chat is open-domain chat with contextual semantics. In existing man-machine conversation systems, open-domain chat mainly serves to close the distance with the user, establish a trust relationship, provide emotional companionship, smooth the conversation process (for example, when a task-oriented dialogue cannot meet the user's requirements), and improve user stickiness.
Fig. 1 is a schematic structural diagram of a man-machine interaction system according to an embodiment of the present disclosure. As shown in fig. 1, the man-machine interaction system 100 at least includes a terminal device 110 and a server (or server cluster) 120, which are connected in a wired or wireless manner.
The terminal device 110 is configured to receive a dialog text input by a user and send the dialog text to the server 120. The server 120 determines a reply corpus for the dialog text and transmits the reply corpus to the terminal device 110. The terminal device 110 may present the reply corpus.
Illustratively, the server 120 may perform at least one of the following processes:
identifying an entity in the dialog text;
expanding the entities in the dialogue text to obtain expansion words of any one or more entities;
screening the linguistic data in the linguistic data base to obtain candidate reply linguistic data corresponding to the dialogue text;
and determining, among the candidate reply corpora, the reply corpus corresponding to the dialog text (hereinafter referred to as the reply corpus).
It should be understood that the processes described above as being performed by server 120 may all be performed by terminal device 110; it should be further understood that the terminal device 110 and the server 120 may cooperate to complete part or all of the process executed by the server 120, for example, the terminal device 110 may identify entities in the dialog text, and expand the entities in the dialog text to obtain expanded words of any one or more entities, the server 120 may filter corpora in the corpus to obtain candidate reply corpora corresponding to the dialog text, and determine the reply corpora corresponding to the dialog text in the candidate reply corpora according to the entities provided by the terminal device 110 and the expanded words associated with the entities.
Alternatively, the terminal device 110 may be, for example, a mobile phone, a tablet computer (Pad), a computer, a television (TV), a Virtual Reality (VR) terminal device, an Augmented Reality (AR) terminal device, a terminal device in industrial control, a terminal device in unmanned driving (self-driving), a terminal device in remote medical treatment, a terminal device in a smart city, a terminal device in a smart home, or the like. The terminal device in the embodiments of the present application may also be a wearable device. Wearable devices, also called wearable smart devices, are a general term for everyday wearable items designed and developed with wearable technology, such as glasses, gloves, watches, clothing, and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the clothing or accessories of the user.
The method provided by the embodiment of the present application will be described below with reference to the accompanying drawings.
It should be understood that the following description is only for convenience of understanding and description, and the method provided by the embodiment of the present application is mainly described by taking an electronic device as an execution subject. For example, the electronic device may be the terminal device 110 or the server 120 in fig. 1, and it should be understood that the electronic device may be the terminal device 110 in some steps and may be the server in another step in the following embodiments.
Fig. 2 is a schematic flowchart of an AI-based man-machine conversation method according to an embodiment of the present disclosure. As shown in fig. 2, the method specifically includes the following steps:
s210, acquiring a dialog text of a user;
s220, in a sub corpus corresponding to a first corpus category, determining at least one first semantic vector unit from at least one semantic vector unit according to a semantic feature vector of the dialog text, wherein the distance between the first semantic vector unit and the semantic feature vector of the dialog text in a vector space is smaller than or equal to a first preset value, the self corpus comprises the at least one first semantic vector unit, the semantic vector unit comprises corpora similar to the semantic feature vector in the sub corpus, and the first corpus category is the corpus category corresponding to the dialog text in the corpus;
s230, determining at least one candidate reply corpus from the corpus corresponding to the at least one first semantic vector unit according to a plurality of keywords of the dialog text, wherein the plurality of keywords comprise at least one entity of the dialog text and at least one expansion word of the entity;
s240, determining the reply linguistic data of the dialog text in the at least one candidate reply linguistic data, and outputting the reply linguistic data.
It should be noted that in S210, the electronic device may receive a dialog text input by a user on the human-computer interaction interface, or the electronic device may receive a voice input by the user through the audio receiver and convert the voice into the dialog text, or the electronic device may obtain the dialog text sent by another device.
For S220, it is noted that:
the corpus category may be preset, or obtained by classifying the corpus in advance by the electronic device, where the corpus category may be understood as a first-level classification of the corpus. For example, in an open-domain chat process, the topic of the conversation is not limited, and the sentences input by the user may be of any category, for which the corpus in the corpus may be divided into a plurality of categories, such as exaggeration, smiling, delicacy, sports, and the like. The first corpus category is the corpus category corresponding to all corpus categories of the dialog text, so that the reply corpus corresponding to the dialog text is determined in the first corpus category corresponding to the dialog text, and the preliminary screening of the reply corpus is realized.
Furthermore, the sub-corpus corresponding to each corpus category may be divided into corpora by at least one semantic vector unit, the semantic vector unit may be understood as a secondary classification of the corpus corpora, and the at least one semantic vector unit may be obtained by clustering semantic feature vectors corresponding to each corpus in the sub-corpus corresponding to the corpus category of the corpus, that is, a plurality of corpora corresponding to each semantic vector unit have similar semantics. As shown in fig. 3, each gray level represents a clustering result of a semantic vector unit.
The above corpus classification and semantic vector unit division process will be explained below.
In the above S220, the electronic device determines, in the first corpus category corresponding to the dialog text, at least one first semantic vector unit whose distance to the semantic feature vector corresponding to the dialog text is smaller than the first preset value, so as to implement re-screening of the corpus in the corpus.
For example, the electronic device may determine the distance between the semantic feature vector of the dialog text and each semantic vector unit according to their coordinates in the vector space. For example, the electronic device may take the distance between the coordinates of the semantic feature vector of the dialog text and the coordinates of the center of each semantic vector unit as the distance between the two; or it may take the distance between the coordinates of the semantic feature vector of the dialog text and the coordinate point of each semantic vector unit closest to it as the distance between the two.
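As an illustrative sketch only (the function and variable names are hypothetical, and Euclidean distance over pre-computed unit centers is an assumption), the selection of the first semantic vector units by distance could look as follows:

    import numpy as np

    def select_first_units(query_vec, unit_centers, first_preset_value):
        """Return indices of semantic vector units whose center lies within the
        first preset value of the dialog text's semantic feature vector.

        query_vec:          (d,) semantic feature vector of the dialog text
        unit_centers:       (k, d) center coordinates of the k semantic vector units
        first_preset_value: maximum allowed distance in the vector space
        """
        # Euclidean distance between the query vector and every unit center
        distances = np.linalg.norm(unit_centers - query_vec, axis=1)
        return np.where(distances <= first_preset_value)[0]

The same sketch applies when the closest coordinate point of each unit, rather than its center, is used: the per-unit distance is then the minimum over that unit's corpus vectors.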
It should be noted that in the above S230, after the electronic device determines at least one first semantic vector unit in the above S220, the corpus corresponding to the at least one first semantic vector unit in the corpus is obtained. Furthermore, the electronic device matches the corpus corresponding to the at least one first semantic vector unit to obtain at least one candidate reply corpus according to the plurality of keywords of the dialog text.
It should be noted that the plurality of keywords include at least one entity of the dialog text, where an entity refers to a word with a specific meaning, such as a person name, a place name, an organization name, a proper noun, and the like.
The electronic device may perform entity recognition on the dialog text to obtain the at least one entity. Entity recognition may be based on rule matching, statistical machine learning, deep learning, and the like. For example, in the rule-matching method, pre-built rule templates and dictionaries are preset in the electronic device, and the electronic device performs entity recognition on the dialog text by template and string matching; as another example, the electronic device may perform entity recognition by sequence labeling of the words in the dialog text with a deep learning model, such as a Long Short-Term Memory-Conditional Random Field (LSTM-CRF) model.
Optionally, the electronic device may rank the identified entities with a TextRank algorithm and take at least one top-ranked entity as the entity used in the final keywords.
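A minimal sketch of this ranking step is given below. It assumes the entities have already been produced by an upstream recognizer (for example an LSTM-CRF model), and it borrows jieba's TextRank implementation purely as an illustration; the patent does not name a specific library.

    import jieba.analyse

    def top_entities(dialog_text, candidate_entities, top_k=3):
        # Rank the words of the dialog text by their TextRank weight
        ranked = jieba.analyse.textrank(dialog_text, topK=20, withWeight=True)
        weights = dict(ranked)
        # Keep only the recognized entities, ordered by TextRank weight
        ordered = sorted(candidate_entities,
                         key=lambda e: weights.get(e, 0.0),
                         reverse=True)
        return ordered[:top_k]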
The plurality of keywords further include at least one expansion word, and each of some or all of the at least one entity in the keywords may be associated with at least one expansion word. It should be noted that the number of expansion words associated with each entity may be the same, for example 10 expansion words per entity, or may differ between entities. For example, if the dialog text is "I like watching movies", then for the entity "movie" the expansion words may include "actor", "movie show", "theme song", and so on.
For example, the electronic device may download words and their corresponding vector representations from open-source data, where the vector representation of each word has been sufficiently pre-trained to represent the contextual semantics of the word. The electronic device searches the vector index of all words for the vector corresponding to the entity (i.e., the word to be expanded), finds the m vectors with the highest similarity, and finally obtains the words corresponding to those vectors, i.e., expands the entity into m similar words, where m is an integer greater than 0. The similarity (e.g., cosine similarity cos(θ)) can be calculated by the following formula:
cos(θ) = (A · B) / (||A|| × ||B||) = Σ(Ai × Bi) / ( √(Σ Ai²) × √(Σ Bi²) ), where the sums run over i = 1, …, n.
the vector corresponding to the entity can be represented by An n-dimensional vector a, namely [ a1, a2, … An ], and the vector of any word can be represented by An n-dimensional vector B, namely [ B1, B2, … Bn ], wherein n is An integer greater than 0.
In S240, the electronic device may determine one or more reply corpora from at least one candidate reply corpus, generally, the number of the reply corpora is one.
Further, the electronic device may output the reply corpus, for example, display the reply corpus on a display; or convert the reply corpus into audio and play the audio corresponding to the reply corpus; or the electronic device may send the reply corpus to another device, for example a testing device, which tests the relevance and/or fluency of the reply corpus with respect to the dialog text.
In some embodiments, the electronic device may identify a corpus class corresponding to the dialog text before determining the at least one first semantic vector unit from the at least one semantic vector unit according to the semantic feature vector of the dialog text.
Illustratively, the electronic device inputs a dialog text into a corpus classification model trained in advance, and obtains a corpus class corresponding to the dialog text in a corpus.
Optionally, the corpus classification model further performs category classification on a plurality of corpuses in the corpus in advance. For example, the electronic device inputs a plurality of corpuses in the corpus into a corpus classification model, identifies a category of each corpus, and tags each corpus, where the tag is used to indicate the corpus category of the corpus.
The corpus classification model may be trained based on a deep learning model, for example.
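As a simplified stand-in (the patent suggests a deep-learning classifier; the TF-IDF plus logistic-regression pipeline and the example categories below are assumptions used only to illustrate the first-level classification):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Placeholder training corpora and their corpus categories
    corpora = ["the team won the game last night",
               "this restaurant serves great noodles"]
    labels = ["sports", "food"]

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(corpora, labels)

    # First corpus category corresponding to the dialog text
    first_corpus_category = clf.predict(["I like watching basketball games"])[0]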
In some embodiments, before determining the at least one first semantic vector unit from the at least one semantic vector unit according to the semantic feature vector of the dialog text, the electronic device may perform feature extraction and/or vector conversion on the dialog text to obtain the semantic feature vector of the dialog text. For example, the electronic device may input the dialog text into a pre-trained semantic representation model to obtain a semantic feature vector of the dialog text.
Further, the electronic device may determine the at least one first semantic vector unit among the at least one semantic vector unit via a vector retrieval tool (e.g., Faiss or Annoy, where Faiss is a framework that provides efficient similarity search and clustering for dense vectors, and Annoy is an open-source approximate nearest neighbor library for high-dimensional spaces).
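A minimal Faiss-based sketch of this retrieval step follows; the vector dimension, the number of units, and the preset distance value are placeholders, and the unit centers are assumed to have been computed in advance.

    import faiss
    import numpy as np

    d = 768                                                   # dimension of the semantic feature vector
    unit_centers = np.random.rand(100, d).astype("float32")   # placeholder centers of the semantic vector units

    index = faiss.IndexFlatL2(d)                              # exact L2 index over the unit centers
    index.add(unit_centers)

    query = np.random.rand(1, d).astype("float32")            # semantic feature vector of the dialog text
    distances, unit_ids = index.search(query, 5)              # 5 nearest semantic vector units

    # Keep only the units within the first preset value
    # (note: IndexFlatL2 returns squared L2 distances)
    first_units = unit_ids[0][distances[0] <= 1.0]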
The semantic representation model is further used for obtaining semantic feature vectors corresponding to the plurality of corpora in the corpus respectively. For example, the electronic device inputs a plurality of corpora in the corpus into the semantic representation model to obtain a semantic feature vector corresponding to each corpus.
Further, the electronic device clusters the semantic feature vectors corresponding to the plurality of corpora, that is, corpora whose vector representations are similar are aggregated into one class. As shown in fig. 3, each class cluster obtained after clustering (i.e., each semantic vector unit) is a secondary class, and the secondary classes do not carry explicit labels, thereby obtaining the at least one semantic vector unit.
Optionally, the electronic device clusters a plurality of corpuses in the same corpus category through a vector retrieval tool (e.g., Faiss, Annoy, etc.).
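A sketch of this second-level division with Faiss k-means is shown below; the number of semantic vector units per category and the vector dimension are assumed parameters.

    import faiss
    import numpy as np

    # Placeholder semantic feature vectors of all corpora under one corpus category
    corpus_vectors = np.random.rand(10000, 768).astype("float32")
    n_units = 64                                              # assumed number of semantic vector units

    kmeans = faiss.Kmeans(d=768, k=n_units, niter=20, seed=42)
    kmeans.train(corpus_vectors)

    # Assign every corpus to its nearest cluster center, i.e. its semantic vector unit
    _, unit_assignments = kmeans.index.search(corpus_vectors, 1)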
In a possible implementation manner of S230, the determining, by the electronic device, of at least one candidate reply corpus according to a plurality of keywords of the dialog text includes: the electronic device determines a word feature vector according to the plurality of keywords of the dialog text, and inputs the word feature vector and the corpora corresponding to the at least one first semantic vector unit into a pre-trained vector space model to obtain at least one candidate reply corpus with the highest matching degree with the dialog text.
Illustratively, the electronic device obtains, through the vector space model, a similarity score between each of the corpora corresponding to the first semantic vector unit and the plurality of keywords. The electronic device may take the p corpora whose similarity scores are greater than a threshold as the candidate reply corpora, or take the top-q corpora ranked by similarity score as the candidate reply corpora, where p and q are positive integers.
For example, the determining of the at least one candidate reply corpus by the electronic device according to the plurality of keywords of the dialog text may include: the electronic device determines a word weight corresponding to each keyword in the plurality of keywords, and generates a word feature vector according to the plurality of keywords and the word weight corresponding to each keyword, wherein each keyword in the word feature vector is one feature dimension.
Optionally, the electronic device may determine a word weight corresponding to each keyword in the plurality of keywords by using a Term Frequency-Inverse Document Frequency (TF-IDF) algorithm.
Alternatively, the electronic device may implement the TF-IDF algorithm to compute the word weights by a full-text search tool (e.g., ElasticSearch).
For each keyword, the word weight can be calculated with the TF-IDF algorithm using the following formulas:
TF = (number of occurrences of the keyword in a corpus) / (total number of words in that corpus)
IDF = log( (total number of corpora in the corpus) / (number of corpora containing the keyword + 1) )
word weight = TF × IDF
Optionally, to highlight the entity in the plurality of keywords, the electronic device may increase the word weight of the entity in the keyword, for example, multiply the word weight of the entity by a preset multiple, such as multiply the word weight of the entity by 2.
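The following sketch ties the keyword weighting and candidate selection together. It computes TF-IDF-style weights directly (the text mentions ElasticSearch as one possible implementation), doubles the entity weights as in the example above, and scores each corpus by the summed weights of the keywords it contains; the scoring is a simplified stand-in for the pre-trained vector space model, and the substring check stands in for term matching.

    import math
    from collections import Counter

    def keyword_weights(keywords, entities, corpora, boost=2.0):
        """TF-IDF-style weight for every keyword, with entity weights boosted."""
        counts = Counter(keywords)
        n_docs = len(corpora)
        weights = {}
        for kw in set(keywords):
            tf = counts[kw] / len(keywords)                       # term frequency within the keyword set
            df = sum(1 for doc in corpora if kw in doc)           # corpora containing the keyword
            idf = math.log((n_docs + 1) / (df + 1)) + 1           # smoothed inverse document frequency
            w = tf * idf
            weights[kw] = w * boost if kw in entities else w      # highlight entities
        return weights

    def rank_candidates(weights, corpora, top_q=10):
        """Score corpora by summed keyword weights and return the top-q candidates."""
        scored = [(sum(w for kw, w in weights.items() if kw in doc), doc) for doc in corpora]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [doc for _, doc in scored[:top_q]]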
In a possible implementation manner of the foregoing S240, the electronic device may determine the reply corpus by using a matching model. For example, the electronic device may input the at least one candidate reply corpus and the dialog text into the matching model, resulting in a reply corpus output by the matching model.
Illustratively, the electronic device performs information interaction between the dialog text and each candidate reply corpus through the attention mechanism of the matching model to obtain an attention vector between each candidate reply corpus and the dialog text; through the matching model, it obtains the matching degree between each candidate reply corpus and the dialog text according to the at least one attention vector corresponding to the at least one candidate reply corpus, and takes the candidate reply corpus with the highest matching degree as the reply corpus.
The matching model can be obtained by training a deep learning model, and it can fully utilize the interaction information between the dialog text and the candidate reply corpus to match them. FIG. 4 shows a frame diagram of the matching model, which mainly includes five parts, namely a word representation layer, a sentence encoding layer, a matching layer, an aggregation layer, and a prediction layer.
1) word representation layer: each word of the dialog text is represented as a pre-trained word vector.
2) sentence encoding layer: The word vectors of the dialog text are input into a multi-layer Recurrent Neural Network (RNN) encoder, and the outputs of the layers are weighted and summed to serve as the encoding features of the dialog text. The encoder weights are shared between the dialog text and the candidate reply corpus.
3) matching layer: This layer performs information interaction between the dialog text and the candidate reply corpus through an attention mechanism. The attention vectors from the dialog text to the candidate reply corpus and from the candidate reply corpus to the dialog text are calculated and concatenated to obtain the attention vector, which fully contains the matching information between the dialog text and the candidate reply corpus.
4) aggregation layer: RNN and pooling operations are used to fuse the encoding vector and the attention vector of the dialog text and of the candidate reply corpus, respectively. The two results are then concatenated to obtain the final matching semantic feature vector.
5) prediction layer: The obtained matching semantic feature vector is input into a fully-connected neural network classifier to predict whether the dialog text and the candidate reply corpus match; the output is a matching score representing the matching degree. A supervised learning approach is adopted to evaluate whether the output is consistent with the label.
The trained loss function L (D, Θ) can be expressed as:
L(D, Θ) = − Σ_{(x, y, l) ∈ D} [ l × log g(x, y; Θ) + (1 − l) × log(1 − g(x, y; Θ)) ]

where D is the training data set, (x, y) is a pair consisting of a dialog text and a candidate reply corpus, l ∈ {0, 1} is its label, and g(x, y; Θ) is the matching score predicted by the model.
Θ represents the parameters of the matching model, which are trained on the training data set by minimizing the cross-entropy loss function. After the model is trained, the matching degrees between all candidate reply corpora and the dialog text input by the user can be calculated, and the candidate reply corpus with the highest matching score is selected as the reply corpus.
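A compact PyTorch sketch of the five-part matching model is given below. The hyper-parameters, the bidirectional GRU encoder, the dot-product attention, and the max-pooling are assumptions; only the overall structure described above (shared encoder weights, attention in both directions, RNN-plus-pooling aggregation, a fully connected prediction layer, and cross-entropy training) is taken from the text.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MatchingModel(nn.Module):
        def __init__(self, vocab_size, emb_dim=128, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)             # word representation layer
            self.encoder = nn.GRU(emb_dim, hidden, num_layers=2,       # sentence encoding layer,
                                  batch_first=True, bidirectional=True)  # shared by both texts
            self.agg_rnn = nn.GRU(8 * hidden, hidden, batch_first=True)  # aggregation layer RNN
            self.classifier = nn.Sequential(                           # prediction layer
                nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def encode(self, ids):
            out, _ = self.encoder(self.embed(ids))                     # (B, L, 2*hidden)
            return out

        def forward(self, query_ids, reply_ids):
            q, r = self.encode(query_ids), self.encode(reply_ids)
            # matching layer: attention in both directions
            scores = torch.bmm(q, r.transpose(1, 2))                   # (B, Lq, Lr)
            q2r = torch.bmm(F.softmax(scores, dim=-1), r)              # dialog text attends to reply
            r2q = torch.bmm(F.softmax(scores.transpose(1, 2), dim=-1), q)  # reply attends to dialog text
            # aggregation layer: fuse encodings with attention vectors, then pool
            q_fused, _ = self.agg_rnn(torch.cat([q, q2r, q - q2r, q * q2r], dim=-1))
            r_fused, _ = self.agg_rnn(torch.cat([r, r2q, r - r2q, r * r2q], dim=-1))
            feature = torch.cat([q_fused.max(dim=1).values, r_fused.max(dim=1).values], dim=-1)
            return self.classifier(feature).squeeze(-1)                # matching score (logit)

    # Training on (dialog text, candidate reply corpus, label) pairs with cross-entropy loss
    model = MatchingModel(vocab_size=30000)
    logits = model(torch.randint(0, 30000, (4, 12)), torch.randint(0, 30000, (4, 20)))
    loss = F.binary_cross_entropy_with_logits(logits, torch.tensor([1.0, 0.0, 1.0, 0.0]))
    loss.backward()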
Fig. 5 shows a flow diagram of a man-machine interaction method and, on the basis of any of the above embodiments, illustrates one possible implementation. On the one hand, the electronic device classifies the corpus at two levels. In the first-level classification, the electronic device divides the corpora in the corpus into a plurality of corpus categories, such as praise, jokes, food, sports, and the like; in the second-level classification, for each corpus category, the electronic device clusters the semantic feature vectors corresponding to the corpora in that category to obtain an implicit classification result. On the other hand, the electronic device performs keyword extraction on the dialog text input by the user, including entity recognition and similar-word expansion. The electronic device then determines, level by level, the corpora corresponding to the dialog text, and determines at least one candidate reply corpus from the second-level classification, thereby screening the corpora in the corpus. Finally, the electronic device determines a reply corpus from the at least one candidate reply corpus according to the keywords of the dialog text, and outputs the reply corpus to complete one man-machine conversation.
Therefore, in the embodiment of the application, the electronic device determines at least one first semantic vector unit corresponding to the dialog text within the corpus category corresponding to the dialog text, and further determines, from the corpora corresponding to the first semantic vector unit, at least one candidate reply corpus corresponding to the plurality of keywords of the dialog text, so that the corpora in the corpus are screened; the reply corpus is then determined from the at least one candidate reply corpus, which improves the matching speed and reduces the system overhead.
Fig. 6 is a schematic block diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device 300 includes:
an acquisition unit 310, configured to acquire a dialog text of a user;
a processing unit 320, configured to determine, in a sub-corpus corresponding to a first corpus class, at least one first semantic vector unit from at least one semantic vector unit according to a semantic feature vector of the dialog text, where a distance between the first semantic vector unit and the semantic feature vector of the dialog text in a vector space is less than or equal to a first preset value, the sub-corpus includes the at least one semantic vector unit, the semantic vector unit includes corpora similar to the semantic feature vector in the sub-corpus, and the first corpus class is a corpus class corresponding to the dialog text in the corpus;
the processing unit 320 is further configured to determine at least one candidate reply corpus from the corpus corresponding to the at least one first semantic vector unit according to a plurality of keywords of the dialog text, where the plurality of keywords include at least one entity of the dialog text and at least one expanded word of the entity;
the processing unit 320 is further configured to determine a reply corpus of the dialog text in the at least one candidate reply corpus;
the output unit 330 is configured to output the reply corpus.
In some embodiments, the processing unit 320 is further configured to input the dialog text into a pre-trained corpus classification model, so as to obtain a corpus class corresponding to the dialog text in the corpus.
In some embodiments, the processing unit 320 is further configured to divide the corpora into corpus categories by the corpus classification model.
In some embodiments, the processing unit 320 is further configured to input the dialog text into a pre-trained semantic representation model, and obtain a semantic feature vector of the dialog text.
In some embodiments, the processing unit 320 is further configured to input a plurality of corpora in the corpus into the semantic representation model, so as to obtain semantic feature vectors corresponding to the corpora respectively; and clustering the semantic feature vectors respectively corresponding to the plurality of corpora to obtain the at least one semantic vector unit.
In some embodiments, the processing unit 320 is specifically configured to:
determining a word weight corresponding to each keyword in the plurality of keywords;
generating a word feature vector according to the plurality of keywords and the word weight corresponding to each keyword in the plurality of keywords;
and inputting the word feature vector and the corpus corresponding to the at least one first semantic vector unit into a pre-trained vector space model to obtain at least one candidate reply corpus of which the semantic matching degree with the dialog text is higher than a second preset value.
In some embodiments, processing unit 320 is specifically configured to determine a word weight corresponding to each keyword of the plurality of keywords using a word frequency-inverse document frequency TF-IDF algorithm.
In some embodiments, the processing unit 320 is specifically configured to determine, by using the TF-IDF algorithm, an initial word weight corresponding to at least one entity in the plurality of keywords and a word weight corresponding to the at least one expanded word; and multiplying the initial word weight of the at least one entity by a preset multiple to obtain the word weight corresponding to the at least one entity.
In some embodiments, the processing unit 320 is specifically configured to:
performing information interaction between the dialog text and each candidate reply corpus through an attention mechanism of a matching model to obtain an attention vector of each candidate reply corpus and the dialog text;
determining the reply corpus with the highest matching degree according to at least one attention vector corresponding to the at least one candidate reply corpus respectively through the matching model, and taking the reply corpus with the highest matching degree as the reply corpus.
The electronic device provided by the above embodiment may execute the technical solution of the above method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device shown in fig. 7 includes a processor 410, and the processor 410 can call and run a computer program from a memory to implement the method in the embodiment of the present application.
Optionally, as shown in fig. 7, the electronic device 400 may further include a memory 420. From the memory 420, the processor 410 can call and run a computer program to implement the method in the embodiment of the present application.
The memory 420 may be a separate device from the processor 410, or may be integrated into the processor 410.
Optionally, as shown in fig. 7, the electronic device 400 may further include a transceiver 430, and the processor 410 may control the transceiver 430 to communicate with other devices, and specifically, may transmit information or data to the other devices or receive information or data transmitted by the other devices.
The transceiver 430 may include a transmitter and a receiver, among others. The transceiver 430 may further include antennas, and the number of antennas may be one or more.
Optionally, the electronic device 400 may implement corresponding processes in the methods of the embodiments of the present application, and for brevity, details are not described here again.
It should be understood that the processor of the embodiments of the present application may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in a memory, and a processor reads the information in the memory and completes the steps of the method in combination with its hardware.
It will be appreciated that the memory in the embodiments of the subject application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The non-volatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of example, but not limitation, many forms of RAM are available, such as Static random access memory (Static RAM, SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic random access memory (Synchronous DRAM, SDRAM), Double Data Rate Synchronous Dynamic random access memory (DDR SDRAM), Enhanced Synchronous SDRAM (ESDRAM), Synchronous link SDRAM (SLDRAM), and Direct Rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
It should be understood that the above memories are exemplary but not limiting illustrations, for example, the memories in the embodiments of the present application may also be Static Random Access Memory (SRAM), dynamic random access memory (dynamic RAM, DRAM), Synchronous Dynamic Random Access Memory (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (enhanced SDRAM, ESDRAM), Synchronous Link DRAM (SLDRAM), Direct Rambus RAM (DR RAM), and the like. That is, the memory in the embodiments of the present application is intended to comprise, without being limited to, these and any other suitable types of memory.
The embodiment of the application also provides a computer readable storage medium for storing the computer program.
Optionally, the computer-readable storage medium may be applied to the electronic device in the embodiment of the present application, and the computer program enables a computer to execute corresponding processes in each method in the embodiment of the present application, which is not described herein again for brevity.
Embodiments of the present application also provide a computer program product comprising computer program instructions.
Optionally, the computer program product may be applied to the electronic device in the embodiment of the present application, and the computer program instructions enable the computer to execute corresponding processes in each method in the embodiment of the present application, which is not described herein again for brevity.
The embodiment of the application also provides a computer program.
Optionally, the computer program may be applied to the electronic device in the embodiment of the present application, and when the computer program runs on a computer, the computer is enabled to execute corresponding processes in each method in the embodiment of the present application, and for brevity, details are not described here again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. With regard to such understanding, the technical solutions of the present application may be essentially implemented or contributed to by the prior art, or may be implemented in a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A man-machine conversation method based on artificial intelligence AI is characterized by comprising the following steps:
acquiring a dialog text of a user;
determining at least one first semantic vector unit from at least one semantic vector unit in a sub-corpus corresponding to a first corpus class according to a semantic feature vector of a dialog text, wherein the distance between the first semantic vector unit and the semantic feature vector of the dialog text in a vector space is smaller than or equal to a first preset value, the sub-corpus comprises the at least one semantic vector unit, the semantic vector unit comprises corpora similar to the semantic feature vector in the sub-corpus, and the first corpus class is the corpus class corresponding to the dialog text in the corpus;
determining at least one candidate reply corpus from the corpus corresponding to the at least one first semantic vector unit according to a plurality of keywords of the dialog text, wherein the plurality of keywords comprise at least one entity of the dialog text and at least one expansion word of the entity;
and determining the reply linguistic data of the dialog text in the at least one candidate reply linguistic data, and outputting the reply linguistic data.
2. The method of claim 1, further comprising:
and inputting the dialog text into a pre-trained corpus classification model to obtain the corpus category corresponding to the dialog text in the corpus.
3. The method of claim 2, further comprising:
and dividing the plurality of corpora in the corpus into a plurality of corpus categories through the corpus classification model.
4. The method according to any one of claims 1 to 3, further comprising:
and inputting the dialog text into a pre-trained semantic representation model to obtain a semantic feature vector of the dialog text.
5. The method of claim 4, further comprising:
inputting a plurality of corpora in a corpus into the semantic representation model to obtain semantic feature vectors corresponding to the corpora respectively;
and clustering the semantic feature vectors respectively corresponding to the plurality of corpora to obtain the at least one semantic vector unit.
6. The method according to any one of claims 1 to 3, wherein the determining at least one candidate reply corpus from the at least one first semantic vector unit according to the plurality of keywords of the dialog text comprises:
determining a word weight corresponding to each keyword in the plurality of keywords;
generating a word feature vector according to the plurality of keywords and the word weight corresponding to each keyword in the plurality of keywords;
and inputting the word feature vector and the corpus corresponding to the at least one first semantic vector unit into a pre-trained vector space model to obtain at least one candidate reply corpus of which the semantic matching degree with the dialog text is higher than a second preset value.
7. The method of claim 6, wherein determining a word weight for each of the plurality of keywords comprises:
and determining a word weight corresponding to each keyword in the plurality of keywords by using a word frequency-inverse document frequency TF-IDF algorithm.
8. The method of claim 7, wherein determining a word weight for each of the plurality of keywords using a TF-IDF algorithm comprises:
determining an initial word weight corresponding to at least one entity in the plurality of keywords and a word weight corresponding to the at least one expansion word by using the TF-IDF algorithm;
and multiplying the initial word weight of the at least one entity by a preset multiple to obtain a word weight corresponding to the at least one entity.
9. The method according to any one of claims 1 to 3, wherein said determining a reply corpus matching said dialog text in said at least one candidate reply corpus comprises:
performing information interaction between the dialog text and each candidate reply corpus through an attention mechanism of a matching model to obtain an attention vector of each candidate reply corpus and the dialog text;
and determining the reply corpus with the highest matching degree according to at least one attention vector corresponding to the at least one candidate reply corpus respectively through the matching model, and taking the reply corpus with the highest matching degree as the reply corpus.
10. An electronic device, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a dialog text of a user;
the processing unit is used for determining at least one first semantic vector unit from at least one semantic vector unit according to a semantic feature vector of the dialog text in a sub-corpus corresponding to a first corpus class, wherein the distance between the first semantic vector unit and the semantic feature vector of the dialog text in a vector space is smaller than or equal to a first preset value, the sub-corpus comprises the at least one semantic vector unit, the semantic vector unit comprises corpora similar to the semantic feature vector in the sub-corpus, and the first corpus class is the corpus class corresponding to the dialog text in the corpus;
the processing unit is further configured to determine at least one candidate reply corpus from the corpus corresponding to the at least one first semantic vector unit according to a plurality of keywords of the dialog text, where the plurality of keywords include at least one entity of the dialog text and at least one expanded word of the entity;
the processing unit is further configured to determine a reply corpus of the dialog text in the at least one candidate reply corpus;
and the output unit is used for outputting the reply corpus.
11. An electronic device comprising a memory and a processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory, causing the processor to perform the method of any of claims 1 to 9.
12. A storage medium, comprising: a readable storage medium and a computer program for implementing the method of any one of claims 1 to 9.
CN202111240855.8A 2021-10-25 2021-10-25 AI-based man-machine conversation method, device and storage medium Pending CN113946658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111240855.8A CN113946658A (en) 2021-10-25 2021-10-25 AI-based man-machine conversation method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111240855.8A CN113946658A (en) 2021-10-25 2021-10-25 AI-based man-machine conversation method, device and storage medium

Publications (1)

Publication Number Publication Date
CN113946658A true CN113946658A (en) 2022-01-18

Family

ID=79332297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111240855.8A Pending CN113946658A (en) 2021-10-25 2021-10-25 AI-based man-machine conversation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113946658A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115129836A (en) * 2022-06-08 2022-09-30 阿里巴巴(中国)有限公司 Dialogue data processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111897964B (en) Text classification model training method, device, equipment and storage medium
CN110245221B (en) Method and computer device for training dialogue state tracking classifier
CN111708873A (en) Intelligent question answering method and device, computer equipment and storage medium
CN113627447B (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN110704601A (en) Method for solving video question-answering task requiring common knowledge by using problem-knowledge guided progressive space-time attention network
CN113723166A (en) Content identification method and device, computer equipment and storage medium
CN113505198B (en) Keyword-driven generation type dialogue reply method and device and electronic equipment
CN111753091B (en) Classification method, training device, training equipment and training storage medium for classification model
KR20200087977A (en) Multimodal ducument summary system and method
CN112052333A (en) Text classification method and device, storage medium and electronic equipment
CN112101042A (en) Text emotion recognition method and device, terminal device and storage medium
Zhang et al. Integrating an attention mechanism and convolution collaborative filtering for document context-aware rating prediction
CN112270184A (en) Natural language processing method, device and storage medium
CN114722832A (en) Abstract extraction method, device, equipment and storage medium
CN117056494B (en) Open domain question and answer method, device, electronic equipment and computer storage medium
CN113946658A (en) AI-based man-machine conversation method, device and storage medium
KR102301467B1 (en) Electronic device using machine learning to analyze semantic similarity of data and control method thereof
CN114547308B (en) Text processing method, device, electronic equipment and storage medium
CN113157892B (en) User intention processing method, device, computer equipment and storage medium
Bhowmik et al. Sentiment analysis with hotel customer reviews using FNet
CN114417891A (en) Reply sentence determination method and device based on rough semantics and electronic equipment
CN113673237A (en) Model training method, intent recognition method, device, electronic equipment and storage medium
CN113657092A (en) Method, apparatus, device and medium for identifying label
CN111506812A (en) Recommendation word generation method and device, storage medium and computer equipment
CN114579740B (en) Text classification method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination