CN111522909A - Voice interaction method and server - Google Patents

Voice interaction method and server

Info

Publication number
CN111522909A
CN111522909A (application CN202010279089.5A); granted publication CN111522909B
Authority
CN
China
Prior art keywords
query text
matching
query
words
candidate
Prior art date
Legal status (assumed; not a legal conclusion)
Granted
Application number
CN202010279089.5A
Other languages
Chinese (zh)
Other versions
CN111522909B (en)
Inventor
贾淇超
Current Assignee (the listed assignee may be inaccurate)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd filed Critical Hisense Visual Technology Co Ltd
Priority: CN202010279089.5A
Publication of CN111522909A
Application granted
Publication of CN111522909B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 — Information retrieval of unstructured textual data
    • G06F16/33 — Querying
    • G06F16/332 — Query formulation
    • G06F16/3329 — Natural language query formulation or dialogue systems
    • G06F16/3331 — Query processing
    • G06F16/334 — Query execution
    • G06F16/3343 — Query execution using phonetics
    • G06F16/3344 — Query execution using natural language analysis
    • G06F16/36 — Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 — Ontology
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/28 — Constructional details of speech recognition systems
    • G10L15/30 — Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a voice interaction method and a server. The method comprises the following steps: receiving a voice query request from a communication terminal, wherein the voice query request carries a first query text, and the first query text contains a reference word but lacks the referred object; combining the antecedent words in a second query text that has a contextual association with the first query text with the reference word in the first query text, to obtain candidate matching pairs; selecting the candidate matching pairs that conform to the matching rules, according to the attributes of the antecedent and reference words and the matching rules among those attributes; selecting one candidate matching pair from those conforming to the matching rules as the target matching pair; replacing the reference word in the first query text with the target matching pair to obtain a target query text; and responding to the voice query request according to the target query text. This improves the accuracy of semantic analysis and thereby the user experience.

Description

Voice interaction method and server
Technical Field
The present application relates to artificial intelligence technologies, and in particular, to a voice interaction method and a server.
Background
With the development of voice technology, the former single question-and-answer voice experience is evolving toward continuous multi-round conversations. Users increasingly expect Artificial Intelligence (AI) capabilities from voice interaction devices. In the past, voice interaction targeted only specific intents (e.g., movie search) in a specific domain (e.g., the movie domain), and matching against the preceding dialogue relied on fixed templates. Today, users want to simplify the interaction by using pronouns, i.e., the interaction device should carry out the current round of conversation based on only a few short referring phrases (e.g., "songs he wrote") building on the previous round (e.g., a request for a particular performer's talk show).
Therefore, it is desirable to provide a voice interaction method that can correctly analyze the semantics in such scenarios and respond correctly, thereby improving the user experience.
Disclosure of Invention
The application provides a voice interaction method and a server, which are used for improving the accuracy of semantic analysis in the human-computer interaction process and further improving the user experience.
In a first aspect, an embodiment of the present application provides a voice interaction method, including:
receiving a voice query request from a communication terminal, wherein the voice query request carries a first query text, and the first query text contains a reference word and lacks the referred object;
combining the antecedent words in a second query text that has a contextual association with the first query text with the reference word in the first query text, to obtain candidate matching pairs;
selecting the candidate matching pairs that conform to the matching rules, according to the attributes of the antecedent and reference words and the matching rules among those attributes;
selecting one candidate matching pair from those conforming to the matching rules as the target matching pair;
replacing the reference word in the first query text with the target matching pair to obtain a target query text;
and responding to the voice query request according to the target query text.
Optionally, selecting one candidate matching pair from the candidate matching pairs conforming to the matching rules as the target matching pair includes:
sorting the candidate matching pairs that conform to the matching rules in order of their matching degree;
and selecting, in order, one candidate matching pair from the sorted candidate matching pairs as the target matching pair, wherein the association relation between the words in the target matching pair satisfies the association relation between those words in the knowledge graph.
Optionally, sorting the candidate matching pairs in order of their matching degree includes:
extracting a word feature vector from the first query text and a word feature vector from the second query text, respectively;
and determining the matching degree of each candidate matching pair according to the positional, spatial and semantic relations of the corresponding words in the first and second query texts, as represented by those word feature vectors, and sorting the candidate matching pairs conforming to the matching rules in descending order of matching degree.
Optionally, replacing the reference word in the first query text with the target matching pair to obtain the target query text includes:
replacing the reference word in the first query text with the target matching pair;
and determining a service scenario according to the first and second query texts, and completing the sentence components still missing from the first query text after the replacement, according to that scenario, to obtain the target query text.
Optionally, the method further includes:
after receiving the voice query request from the communication terminal, acquiring a second query text that has a contextual association with the first query text;
extracting a word feature vector from the first query text and a word feature vector from the second query text, respectively;
and judging, according to the two extracted word feature vectors, whether the first query text contains a reference word and lacks the referred object.
In a second aspect, an embodiment of the present application provides a server, where the server is configured to:
receiving a voice query request from a communication terminal, wherein the voice query request carries a first query text, and the first query text contains a reference word and lacks the referred object;
combining the antecedent words in a second query text that has a contextual association with the first query text with the reference word in the first query text, to obtain candidate matching pairs;
selecting the candidate matching pairs that conform to the matching rules, according to the attributes of the antecedent and reference words and the matching rules among those attributes;
selecting one candidate matching pair from those conforming to the matching rules as the target matching pair;
replacing the reference word in the first query text with the target matching pair to obtain a target query text;
and responding to the voice query request according to the target query text.
Optionally, the server is configured to:
extract a word feature vector from the first query text and a word feature vector from the second query text, respectively;
determine the matching degree of each candidate matching pair according to the positional, spatial and semantic relations of the corresponding words in the first and second query texts, as represented by those word feature vectors;
sort the candidate matching pairs that conform to the matching rules in order of their matching degree;
and select, in order, one candidate matching pair from the sorted candidate matching pairs as the target matching pair, wherein the association relation between the words in the target matching pair satisfies the association relation between those words in the knowledge graph.
Optionally, the server is configured to:
replace the reference word in the first query text with the target matching pair;
and determine a service scenario according to the first and second query texts, and complete the sentence components still missing from the first query text after the replacement, according to that scenario, to obtain the target query text.
Optionally, the server is configured to:
after receiving a voice query request from the communication terminal, acquire a second query text that has a contextual association with the first query text;
extract a word feature vector from the first query text and a word feature vector from the second query text, respectively;
and judge, according to the two extracted word feature vectors, whether the first query text contains a reference word and lacks the referred object.
In a third aspect, an embodiment of the present application provides a server, including:
a receiving module, configured to receive a voice query request from the communication terminal, wherein the voice query request carries a first query text, and the first query text contains a reference word and lacks the referred object;
a processing module, configured to combine the antecedent words in a second query text that has a contextual association with the first query text with the reference word in the first query text, to obtain candidate matching pairs;
a selecting module, configured to select the candidate matching pairs that conform to the matching rules according to the attributes of the antecedent and reference words and the matching rules among those attributes, and to select one candidate matching pair from those conforming to the matching rules as the target matching pair;
a replacing module, configured to replace the reference word in the first query text with the target matching pair to obtain a target query text;
and a response module, configured to respond to the voice query request according to the target query text.
In the embodiments of the application, after receiving a voice query request carrying a first query text from a communication terminal, if the first query text contains a reference word and lacks the referred object, the server combines the antecedent words in a second query text that has a contextual association with the first query text with the reference word in the first query text to obtain candidate matching pairs, selects a candidate matching pair conforming to the matching rules as the target matching pair according to the attributes of the antecedent and reference words and the matching rules among those attributes, replaces the reference word in the first query text with the target matching pair, and responds to the voice query request according to the replaced first query text. This improves the accuracy of semantic analysis and thereby the user experience.
Drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating a voice interaction scenario in an embodiment of the present application;
FIG. 2 is a flow chart illustrating a method for voice interaction according to an embodiment of the present application;
fig. 3a and fig. 3b are flowcharts illustrating a method for obtaining a target query text according to an embodiment of the present application;
fig. 4 illustrates a schematic diagram of an SVM provided by an embodiment of the present application;
fig. 5 is an architecture diagram illustrating a server provided in an embodiment of the present application.
Detailed Description
The embodiments of the application provide a voice interaction method and a server for correctly analyzing semantics in multi-round conversation scenarios during voice interaction with a user, so as to respond correctly and thereby improve the user experience.
To make the objects, technical solutions and advantages of the exemplary embodiments of the present application clearer, the technical solutions in the exemplary embodiments are described below clearly and completely with reference to the drawings. Obviously, the described exemplary embodiments are only a part of the embodiments of the present application, not all of them. Moreover, while the disclosure is presented in terms of one or more exemplary examples, each aspect of the disclosure may be used independently of the other aspects.
Furthermore, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The term "module" as used herein refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
Illustratively, the communication terminal in the embodiment of the present application may be a display device with a voice interaction function, such as a smart television, and the display device is configured to display a response result of a voice query request of a user. The communication terminal in the embodiment of the present application may also be a playing device with a voice interaction function, such as a smart speaker, and the playing device may perform voice broadcast on a response result of the user voice query request.
Taking a communication terminal as a display device with a voice interaction function as an example, fig. 1 exemplarily shows a voice interaction scene schematic diagram in the embodiment of the present application. As shown in fig. 1, a user may operate the display apparatus 200 through the control device 100. The user may also send a voice query request through an application program of the display apparatus 200, and the display apparatus 200 processes the received voice query request and responds to the query request, and displays a response result to the user.
The control device 100 may be a remote controller, which includes infrared protocol communication or bluetooth protocol communication, and other short-distance communication methods, and controls the display apparatus 200 in a wireless or other wired manner. The user may input a user command through a key on a remote controller, voice input, control panel input, etc. to control the display apparatus 200. Such as: the user can input a corresponding control command through a volume up/down key, a channel control key, up/down/left/right moving keys, a voice input key, a menu key, a power on/off key, etc. on the remote controller, to implement the function of controlling the display device 200.
In some embodiments, mobile terminals, tablets, computers, laptops, and other smart devices may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device. The application, through configuration, may provide the user with various controls in an intuitive User Interface (UI) on a screen associated with the smart device.
As also shown in fig. 1, the display apparatus 200 also performs data communication with the server 300 through various communication means. The display device 200 may be communicatively connected through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), or other networks. The server 300 may provide various contents and interactions to the display apparatus 200. Illustratively, the display device 200 receives software program updates or accesses a remotely stored digital media library by sending and receiving information, as well as Electronic Program Guide (EPG) interactions. The server 300 may be one group or multiple groups of servers, and may be of one or more types. The server 300 also provides other web services such as video on demand (VOD) and query services.
The display device 200 may be a liquid crystal display, an OLED display, a projection display device. The particular display device type, size, resolution, etc. are not limiting, and those skilled in the art will appreciate that the display device 200 may be modified in performance and configuration as desired.
The display apparatus 200 may additionally provide an intelligent network tv function that provides a computer support function in addition to the broadcast receiving tv function. Examples include a web tv, a smart tv, an Internet Protocol Tv (IPTV), and the like.
Fig. 2 is a flowchart illustrating a voice interaction method provided by an embodiment of the present application.
The flow shown in FIG. 2 may be applied to the following scenario: when the user's query text contains a reference word and lacks the referred object, the current query text can be completed according to a preceding query text that is associated with it, and the user's voice query request can be answered according to the completed text.
As shown, the process includes the following steps:
s201: the server receives a voice query request from the communication terminal, wherein the voice query request carries a first query text, and the first query text contains a pronouncing word and lacks a designated object.
In some application scenarios, the user opens a voice assistant in the communication terminal. After receiving the user's voice data, the voice assistant denoises it (removing echo and environmental noise) to obtain clean voice data, recognizes the clean voice data to obtain a query text (also called the first query text), carries the first query text in a query request, and sends it to the server; the server then processes the user's voice query request according to the first query text. The voice assistant may be a voice input application in the communication terminal.
In other embodiments, the communication terminal has limited voice processing capability, and the clean voice data obtained after denoising can be sent directly to the server, which recognizes it to obtain the first query text.
The first query text carried by the voice query request contains a reference word and lacks the referred object. Reference words include personal pronouns (e.g., "he" in "Li Ming, afraid his mom would be lonely alone at home, took her the television"), demonstrative pronouns (e.g., "this" in "many people want to create a beautiful world for children; this can be understood but is not exactly correct"), definite descriptions of certain types (e.g., a phrase such as "that world" standing in for a world described in the previous turn), and omitted words (e.g., "pause playing the movie", where "movie" refers back to a specific film named in the previous turn). The referred objects that are missing include persons, things, phrases, sentences and the like. Personal pronouns and demonstrative pronouns can be regarded as explicit reference words, while definite descriptions, omitted words and the like can be regarded as non-explicit reference words.
In the embodiments of the application, because the first query text contains a reference word and lacks the referred object, the reference word must be resolved against its referred object through the subsequent flow so that the semantics of the first query text can be analyzed correctly. For example, in a shopping scenario based on voice interaction, the user speaks the following sentences in sequence within one round of conversation: "Do you have red cups?" and "What about white?", where "white" is a reference word whose referred object ("cups") is missing.
In some embodiments, whether the first query text contains a reference word and lacks the referred object may be determined from the word feature vector of the first query text alone, or from the word feature vectors of the first query text and of a second query text that has a contextual association with it. Specifically, the word feature vector of the first query text and the word feature vector of the second query text may be extracted and input into a trained Support Vector Machine (SVM) model, which determines whether the first query text contains a reference word and lacks the referred object.
The contextual association between the first query text and the second query text means that they come from the same round of conversation during the user's voice interaction and belong to the same service scenario. For example, both may come from a voice-interaction shopping session, or from a voice-interaction weather query. Typically, the two query texts are adjacent turns in the same round of dialogue. Whether different query texts belong to the same round of conversation can be identified by a pre-trained model, which is not limited in the embodiments of the application.
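The detection step described above can be sketched as follows. The patent feeds word feature vectors from both query texts into a trained SVM; as a runnable stand-in, this hypothetical sketch uses a hand-written rule over crude surface features. The word lists and function name are illustrative only, not from the patent.

```python
# Hypothetical stand-in for the SVM-based detection: decide whether the
# first query text contains a reference word but no explicit referred object.
PRONOUN_LIKE = {"he", "she", "it", "they", "that", "one", "tomorrow", "white"}
OBJECT_NOUNS = {"weather", "cup", "cups", "movie", "song"}

def needs_anaphora_resolution(first_query: str) -> bool:
    toks = first_query.lower().split()
    has_reference = any(t in PRONOUN_LIKE for t in toks)
    has_object = any(t in OBJECT_NOUNS for t in toks)
    # A reference word with no explicit object means the text must be
    # completed from the previous turn before semantic parsing.
    return has_reference and not has_object

needs_anaphora_resolution("what about tomorrow")                  # True
needs_anaphora_resolution("how is the weather in Qingdao today")  # False
```

In a real deployment the rule would be replaced by the trained SVM over the extracted word feature vectors, as the patent describes.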
S202: the server combines and matches the antecedent words in the second query text which has context association relation with the first query text with the reference words in the first query text to obtain candidate matching pairs.
In this step, the antecedent words in the second query text and the reference word in the first query text are first extracted; the number of extracted antecedent words is usually more than one. The antecedent words and the reference word are then combined to obtain candidate matching pairs. A candidate matching pair includes one reference word and one or more antecedent words. When multiple antecedent words are extracted, different antecedents can be combined and their order can differ, producing many combinations and thus many candidate matching pairs. The generated pairs may therefore be filtered according to a predetermined rule, for example preferring pairs that contain more antecedent words, or preferring pairs whose antecedent words appear earlier.
An antecedent word is a noun or pronoun modified by an attributive clause, and may include, for example, a person's name, a place name, a time, an institution name, a pronoun, a common noun, and the like.
For example, if the second query text is "How is the weather in Qingdao today", the following antecedent words can be extracted from it: "today", "Qingdao", "weather". If the first query text is "What about tomorrow", the reference word "tomorrow" can be extracted from it, and combining the antecedent words with the reference word yields the following candidate matching pairs: (1) tomorrow-today; (2) tomorrow-Qingdao; (3) tomorrow-weather; (4) tomorrow-Qingdao-weather; (5) Qingdao-tomorrow; (6) weather-tomorrow. "Qingdao-tomorrow" and "tomorrow-Qingdao" count as different candidate matching pairs, because the order of the words carries a different relation.
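The combination step in S202 can be sketched as below: a hypothetical enumeration of ordered combinations of antecedent words with the reference word, reproducing the "tomorrow"/"Qingdao" example above (the function name and parameter are illustrative).

```python
from itertools import permutations

def candidate_pairs(antecedents, reference, max_antecedents=2):
    """Combine antecedent words from the previous query with the reference
    word from the current query (step S202). Order matters, so both
    'tomorrow-Qingdao' and 'Qingdao-tomorrow' are generated."""
    pairs = []
    for r in range(1, max_antecedents + 1):
        for combo in permutations(antecedents, r):
            pairs.append((reference,) + combo)  # reference word first
            pairs.append(combo + (reference,))  # reference word last
    return pairs

pairs = candidate_pairs(["today", "Qingdao", "weather"], "tomorrow")
# Contains e.g. ('tomorrow', 'Qingdao'), ('Qingdao', 'tomorrow'),
# and ('tomorrow', 'Qingdao', 'weather').
```

The filtering rules mentioned above (prefer more antecedents, or earlier ones) would then prune this list before the attribute check in S203.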
S203: the server selects candidate matching pairs according with matching rules according to the attributes of the antecedent words and the referent words and the matching rules among the attributes.
In some embodiments of the application, a tag database may be built in advance to store the tags of words, where a tag is an attribute. The tag database can be generated in advance from user corpora in a specific domain and stores the tags of that domain and the nouns corresponding to each tag. Each tag represents one attribute; nouns under the same tag share that attribute, and one noun may carry several attributes.
For example, the tag database includes a "time" tag whose nouns include "today", "tomorrow", "the day after tomorrow", etc., indicating that these words have the time attribute and represent time; a "city" tag whose nouns include "Qingdao", "Beijing", "Guangzhou", etc., indicating that these words have the city attribute and represent city names; a "weather" tag whose nouns include "weather", "sunny", "rainy", etc., indicating that these words have the weather attribute; and a "singer" tag whose nouns include singers' names such as "Liu Dehua", indicating that these words have the singer attribute and represent persons' names.
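The attribute check in S203 can be sketched with a toy tag database and a toy set of allowed tag collocations. Both dictionaries below are illustrative stand-ins for the databases described in this section; the allowed set is an assumption chosen to fit the weather example.

```python
# Hypothetical tag database: word -> set of attribute tags.
TAG_DB = {
    "today": {"time"}, "tomorrow": {"time"},
    "Qingdao": {"city"}, "Beijing": {"city"},
    "weather": {"weather"},
}

# Hypothetical tag collocation database: allowed attribute combinations.
ALLOWED = {frozenset({"time", "city"}), frozenset({"time", "weather"}),
           frozenset({"time", "city", "weather"})}

def conforms(pair):
    """Keep a candidate matching pair only if the combined attributes of
    its words form an allowed collocation (step S203)."""
    tags = set()
    for word in pair:
        tags |= TAG_DB.get(word, set())
    return frozenset(tags) in ALLOWED

candidates = [("tomorrow", "today"), ("tomorrow", "Qingdao"),
              ("tomorrow", "Qingdao", "weather")]
kept = [p for p in candidates if conforms(p)]
# "tomorrow-today" is dropped: two time words do not form a valid collocation.
```

In the patented method the allowed set would come from the tag collocation database learned from corpora and validated against the knowledge graph, as described below.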
In the embodiments of the application, a tag collocation database may also be built in advance to store the matching pairs that conform to the matching rules, i.e., the tag collocation pairs that satisfy the knowledge requirements of the knowledge graph. The tag collocation database can be learned from user corpora. For example, its generation may include: obtaining user corpora and segmenting them into words; determining the tag of each word; combining the tags of the words into tag collocation pairs and storing them as a map (i.e., key-value pairs); checking the reasonableness of each tag collocation pair against the knowledge graph; and storing the remaining tag collocation pairs in the tag collocation database after removing the unreasonable ones.
Specifically, the process of generating the tag collocation database through the tag collocation learning process is as follows:
(1) generating tag collocation pairs
A certain amount of user corpus is captured, and starting from the first sentence, the words in the sentence that can be tagged are determined using the knowledge graph. For example, the user corpus is 'play a movie of Liu Dehua for me', and the knowledge graph determines that the two taggable words in the sentence, 'Liu Dehua' and 'movie', correspond to the following tags: Liu Dehua [ actor, singer, director ], movie [ movieKey, musicType ]. The generated tag collocation pairs are "actor: movieKey, musicType", "singer: movieKey, musicType", "director: movieKey, musicType".
(2) Updating the tag collocation pair
The second sentence of the user corpus is captured as 'check the weather for tomorrow'; the knowledge graph determines that the taggable words in the sentence are 'tomorrow' and 'weather', with the corresponding tags: tomorrow [ time, song ], weather [ weatherKey ]. The tag collocation pairs are updated to "actor: movieKey, musicType", "singer: movieKey, musicType", "director: movieKey, musicType", "time: weatherKey", "song: weatherKey".
(3) Generating a tag collocation database
Processes (1) to (2) are repeated up to the last sentence of the corpus to obtain the complete set of tag collocation pairs, which are then manually checked to remove unreasonable ones, such as "song: weatherKey"; the tag collocation database is then generated from the remaining tag collocation pairs.
For example, the corpus 'play a song of Liu Dehua for me' is obtained, where 'Liu Dehua' corresponds to the "singer" tag and the "actor" tag, and 'song' corresponds to the "music" tag. The obtained tag collocations are singer-musicKey (singer-music) and actor-musicKey (actor-music), stored as the map entries "singer: musicKey" and "actor: musicKey". Based on verification against the knowledge graph, the unreasonable collocation pair "actor: musicKey" is removed, and the collocation pair "singer: musicKey" is stored in the tag collocation database.
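The pair-generation steps (1) and (2) above can be sketched as a cross product of the two words' tag lists accumulated in map form. This is an illustrative sketch under the assumption that each taggable word already comes with its tag list from the knowledge graph; the function and variable names are assumptions.

```python
from itertools import product
from collections import defaultdict

def add_collocations(pairs, tags_a, tags_b):
    """Collocate every tag of one word with every tag of the other,
    accumulating the result in map (key/value) form."""
    for tag_a, tag_b in product(tags_a, tags_b):
        pairs[tag_a].add(tag_b)
    return pairs

pairs = defaultdict(set)
# 'Liu Dehua' [actor, singer, director] collocated with 'movie' [movieKey, musicType]
add_collocations(pairs, ["actor", "singer", "director"], ["movieKey", "musicType"])
# 'tomorrow' [time, song] collocated with 'weather' [weatherKey]
add_collocations(pairs, ["time", "song"], ["weatherKey"])
# pairs now holds e.g. "singer": {"movieKey", "musicType"} and "time": {"weatherKey"};
# an unreasonable entry such as "song": {"weatherKey"} would be removed by
# the manual / knowledge-graph check before the database is generated.
```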
Based on the tag database and the tag collocation database, in S203 the attributes of the antecedent and the reference word in each candidate collocation pair may be obtained by querying the tag database, yielding the attribute pair (also called the tag pair) corresponding to each candidate collocation pair; the tag collocation database is then queried. If the tag pair corresponding to a candidate collocation pair exists in the tag collocation database, the candidate collocation pair conforms to the collocation rules; otherwise it does not. The candidate collocation pairs conforming to the collocation rules are thus selected.
Taking the candidate collocation pairs obtained in the example of S202 as an example, the tag pair corresponding to "tomorrow-today" is "time: time", which is not in the tag collocation database, so that candidate collocation pair does not conform to the collocation rules; the tag pair corresponding to "tomorrow-weather" is "time: weatherKey", which exists in the tag collocation database, so that candidate collocation pair conforms to the collocation rules.
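The S203 check can be sketched as a simple membership test: a candidate collocation pair conforms to the collocation rules if and only if its tag pair exists in the tag collocation database. The database contents below are assumptions taken from the running example.

```python
# Illustrative tag collocation database (tag pairs assumed from the example).
TAG_COLLOCATION_DB = {("time", "weatherKey"), ("city", "weatherKey"),
                      ("singer", "musicKey")}

def conforms(reference_tag, antecedent_tag):
    """A candidate pair conforms iff its tag pair is in the database."""
    return (reference_tag, antecedent_tag) in TAG_COLLOCATION_DB
```

For the example above, `conforms("time", "time")` is False ("tomorrow-today" is rejected) while `conforms("time", "weatherKey")` is True ("tomorrow-weather" is kept).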
S204: and the server selects one candidate matching pair from the candidate matching pairs according with the matching rule as a target matching pair.
In this step, a candidate matching pair may be randomly selected from the candidate matching pairs that meet the matching rule, or a candidate matching pair may be selected as a target matching pair according to a preset policy.
In some embodiments, the target collocation pair may be selected according to the matching degree of the candidate collocation pairs and whether they are consistent with the knowledge graph. Specifically, the candidate collocation pairs conforming to the collocation rules are sorted by matching degree; the candidate pair with the highest matching degree is selected from the sorted pairs, and its reasonableness is checked against the association relations among words in the knowledge graph. If the check passes, that candidate pair is used as the target collocation pair; otherwise the candidate pair with the next-highest matching degree is selected and checked, and so on, until a candidate pair consistent with the knowledge graph is selected. In this way, the selected target collocation pair has a high matching degree, and the association relation between its words matches the association relations among words in the knowledge graph, ensuring the accuracy of the semantic relation of the target collocation and, in turn, the accuracy of the semantics of the query text when it is completed with the target collocation pair.
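The selection loop just described can be sketched as follows; the candidates are assumed to arrive pre-sorted by matching degree, and `kg_check` is a stand-in for the reasonableness check against word associations in the knowledge graph (the toy predicate below is an assumption for illustration).

```python
def select_target_pair(sorted_candidates, kg_check):
    """Walk the candidates in descending matching degree and return the
    first one that passes the knowledge-graph reasonableness check."""
    for candidate in sorted_candidates:
        if kg_check(candidate):
            return candidate
    return None  # no candidate is consistent with the knowledge graph

# Toy check: accept only pairs whose words all relate to the weather domain.
weather_words = {"tomorrow", "Qingdao", "weather"}
chosen = select_target_pair(
    [("tomorrow", "today"), ("tomorrow", "weather")],
    lambda pair: set(pair) <= weather_words,
)
# ("tomorrow", "today") fails the check, so ("tomorrow", "weather") is chosen.
```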
In some embodiments, the candidate collocation pairs may be sorted according to the word feature vectors of the first query text and the second query text. Specifically, the word feature vectors of the first query text and of the second query text may be extracted respectively; the matching degree of each candidate collocation pair is determined from the positional, spatial, and semantic relations of the corresponding words in the two texts, as represented by their word feature vectors; and the candidate collocation pairs conforming to the collocation rules are sorted in descending order of matching degree.
The candidate collocation pairs may be sorted by a LambdaMART ranking model, which calculates the matching degree of each candidate pair from the weights of the feature vectors and thereby realizes the ranking. For example, the initial weight of each feature vector is set to 1; under supervised learning, the LambdaMART model learns the actual weight of each feature vector, and the ordering of matching degrees is determined by N-gram scoring.
Taking the first query text "What about tomorrow" as an example, after the candidate collocation pairs conforming to the collocation rules are sorted by the LambdaMART ranking model, in descending order of matching degree the sorted candidate pairs are: (1) tomorrow-Qingdao-weather; (2) tomorrow-weather; (3) tomorrow-Qingdao.
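As a highly simplified stand-in for the LambdaMART ranking step, the ordering can be sketched as a weighted sum over each candidate's feature vector; in practice the weights are learned under supervision by the gradient-boosted model, so every number below is an illustrative assumption, not learned output.

```python
def rank_candidates(candidates, feature_vectors, weights):
    """Score each candidate pair by a weighted sum of its features and
    sort by descending matching degree (stand-in for LambdaMART)."""
    def matching_degree(c):
        return sum(w * f for w, f in zip(weights, feature_vectors[c]))
    return sorted(candidates, key=matching_degree, reverse=True)

# Assumed toy features, e.g. (number of antecedents, tag-pair support).
features = {
    "tomorrow-Qingdao-weather": [3, 2],
    "tomorrow-weather":         [2, 2],
    "tomorrow-Qingdao":         [2, 1],
}
ranked = rank_candidates(list(features), features, weights=[1, 1])
# ranked == ["tomorrow-Qingdao-weather", "tomorrow-weather", "tomorrow-Qingdao"]
```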
S205: and the server uses the target matching pair to replace the reference word in the first query text to obtain the target query text.
In this step, the target collocation pair is used to replace the reference word in the first query text to obtain a completed first query text, which is used as the target query text.
For example, if the first query text of the user is "What about tomorrow" and the target collocation pair is tomorrow-Qingdao-weather, the completed target query text is "What is the weather like in Qingdao tomorrow".
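A minimal sketch of the replacement in S205, assuming the reference word has already been located by the earlier dependency analysis; a real system would splice at token indices from the parse rather than doing a raw string replacement, so this is only illustrative.

```python
def complete_query(first_query, reference_word, target_pair):
    """Splice the target collocation pair into the first query text in
    place of the reference word (first occurrence only)."""
    return first_query.replace(reference_word, " ".join(target_pair), 1)

completed = complete_query("his songs", "his", ("Liu Dehua",))
# completed == "Liu Dehua songs"
```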
S206: the server responds to the voice query request according to the target query text.
In this step, after the server obtains the response to the voice query request according to the target query text, it sends the response result to the communication terminal; the display screen of the communication terminal displays the text of the response result, and the audio player plays the response result as speech.
In some embodiments of the present application, the word feature vectors of the first query text and the second query text may be extracted. Since word feature vectors can represent the positional, spatial, and semantic associations of specific words in the two texts, the reference words and the missing objects in the first query text may be determined based on the extracted word feature vectors, and the candidate collocation pairs may be ranked based on them. The features to be extracted can be set as required; table 1 exemplarily shows the features extracted in some embodiments of the present application. The antecedent features are extracted from the second query text, the reference-word features are extracted from the first query text, and the other features are extracted from the first and second query texts jointly.
TABLE 1
(Table 1 appears as an image in the original publication; it lists the features to be extracted.)
Table 2 shows the word feature vector extraction results, taking as an example a set of dialogs including statement 1: "How is the weather in Qingdao today", and statement 2: "What about tomorrow".
TABLE 2
(Table 2 appears as images in the original publication; it lists the extracted word feature vectors.)
The word feature vectors and context word feature vectors in table 2 can be extracted by a word vector model. The word vector model is pre-trained: through data training, a single word is represented as a vector of fixed length.
In other embodiments, after the reference word in the first query text is replaced with the target collocation pair, if sentence components are still missing in the first query text, the service scenario needs to be determined according to the first query text and the second query text, and the missing sentence components of the first query text after replacement are completed according to the service scenario, so as to obtain the target query text.
For example, if the second query text is "I want to listen to Gao Xiaosong's talk show" and the first query text is "his songs", the text after replacing the reference word in the first query text with the target collocation pair is "Gao Xiaosong's songs". This text lacks a predicate and does not satisfy the verb-object structure, so the service scenario is determined to be music search according to the first query text and the second query text; the missing sentence component of "Gao Xiaosong's songs" is completed according to the music-search service scenario, and the completed target query text is "play Gao Xiaosong's songs".
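The scenario-based completion above can be sketched as a template lookup: a predicate template is chosen for the detected service scenario and the remaining noun phrase is filled in. The scenario names and templates below are illustrative assumptions, not the embodiment's actual rule set.

```python
# Hypothetical predicate templates per service scenario.
SCENARIO_TEMPLATES = {
    "music_search":  "play {np}",
    "weather_query": "what is {np} like",
}

def complete_by_scenario(scenario, noun_phrase):
    """Fill the missing predicate for the detected service scenario."""
    return SCENARIO_TEMPLATES[scenario].format(np=noun_phrase)

target = complete_by_scenario("music_search", "Gao Xiaosong's songs")
# target == "play Gao Xiaosong's songs"
```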
In the above embodiments of the present application, during human-computer interaction the server responds to the user's voice query request by combining the antecedents in the second query text, which has a context association with the first query text, with the reference words in the first query text to obtain candidate collocation pairs; selecting and ranking the candidate collocation pairs according to the attributes of the antecedents and reference words and the collocation rules among those attributes; selecting candidate pairs in order and checking their reasonableness against knowledge-graph-related knowledge; and, when a candidate pair passes the check, using it as the target collocation pair to complete the first query text into the target query text and responding to the voice query request according to the target query text. This improves the accuracy of semantic parsing and thus the user experience.
Fig. 3a and fig. 3b illustrate a flowchart of a method for obtaining a target query text, taking a specific scenario as an example, according to one or a combination of the foregoing embodiments.
Referring to fig. 3a and 3b, taking as an example that the first query text of the user is "What about tomorrow" and the second query text is "How is the weather in Qingdao today", the first query text contains a reference word and lacks the referenced object. After receiving the voice query request containing the first query text, the server performs the following processes:
S301 to S302: judging whether the reference word in the first query text is an explicit reference word; if so, determining the missing referenced object through semantic analysis; otherwise, proceeding to S303.
In this step, the first query text "What about tomorrow" contains no explicit reference word but contains an implicit reference word modified by "tomorrow", so the process proceeds to S303; the judgment of implicit reference words may use an SVM model.
S303: obtaining a second query text having a context association relationship with the first query text, performing dependency syntax analysis on the first query text and the second query text, determining missing sentence components (in this example, missing predicates and objects) of the first query text, and extracting word feature vectors in the first query text and word feature vectors in the second query text.
S304 to S306: judging, using an SVM (support vector machine) model, whether the first query text lacks the referenced object according to the extracted word feature vectors of the first query text and of the second query text; if the referenced object is not lacking, responding to the voice query request according to the first query text; otherwise, extracting the antecedents in the second query text.
In this step, the first query text contains an implicit reference word modified by "tomorrow" and lacks the referenced object, so the antecedents in the second query text need to be extracted; the extracted antecedents include "today", "Qingdao", and "weather".
S307: and combining and matching the extracted antecedent words and the referent words to generate candidate matched pairs.
In this step, the obtained candidate collocation pairs include: (1) tomorrow: today; (2) tomorrow: Qingdao; (3) tomorrow: weather; (4) tomorrow: Qingdao: weather; (5) Qingdao: tomorrow; and so on.
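The combination step in S307 can be sketched by pairing the reference word with every non-empty combination of the extracted antecedents; the function name and exact combination scheme are illustrative assumptions (the embodiment's combinations above also include reversed orderings, which are omitted here for brevity).

```python
from itertools import combinations

def candidate_pairs(reference, antecedents):
    """Pair the reference word with every non-empty combination of the
    extracted antecedents to form candidate collocation pairs."""
    result = []
    for size in range(1, len(antecedents) + 1):
        for combo in combinations(antecedents, size):
            result.append((reference,) + combo)
    return result

pairs = candidate_pairs("tomorrow", ["today", "Qingdao", "weather"])
# 7 candidates, from ("tomorrow", "today") up to
# ("tomorrow", "today", "Qingdao", "weather")
```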
S308: and extracting the candidate matching pairs.
In this step, the extraction rule may be preset; for example, candidate collocation pairs containing more antecedents may be preferentially extracted.
The candidate pairings to be extracted include: (2) tomorrow: a Qingdao; (3) tomorrow: weather; (4) tomorrow: a Qingdao: weather.
S309: and querying a tag database to obtain the attributes of the antecedent and the referent in the extracted candidate collocation.
In this step, the tag database is queried to obtain the attribute of the antecedent 'today' as the "time" tag, the attribute of 'Qingdao' as the "city" tag, and the attribute of 'weather' as the "weather" tag.
S310: and querying the tag matching database to obtain candidate matching pairs according with matching rules.
In this step, all the obtained candidate collocation pairs that conform to the collocation rules include tomorrow: Qingdao; tomorrow: weather; tomorrow: Qingdao: weather.
S311: and sorting the matching degrees of the candidate matching pairs by using a lamdamard sorting model according to the word feature vector of the first query text and the word feature vector of the second query text.
In this step, the result of the LambdaMART ranking model sorting the candidate collocation pairs is: (1) tomorrow: Qingdao: weather; (2) tomorrow: weather; (3) tomorrow: Qingdao.
S312 to S315: selecting a candidate matching pair according to the sequence of matching degrees from high to low by using an N-gram model, judging whether the selected candidate matching is reasonable according to the incidence relation among the words in the knowledge graph, if so, using the candidate matching pair as a target word matching pair to replace a reference word in a first query text to obtain a target query text, responding to a query request according to the target text, and otherwise, responding to the query request according to the first query text.
In this step, the candidate collocation pair with the highest matching degree selected by the N-gram model is tomorrow: Qingdao: weather. It is determined to be reasonable according to knowledge-graph-related knowledge and is used as the target collocation pair to replace the reference word in the first query text; the target query text after replacement is "What is the weather like in Qingdao tomorrow".
S301 to S315 are not strictly executed in a predetermined order, and for example, S305 may be executed before S304.
In the above embodiments of the present application, according to the extracted word feature vectors of the first query text and the second query text (as shown in table 1), an SVM model is used to judge whether the first query text contains an implicit reference word and lacks the referenced object, where the SVM model is generated by pre-training on a user corpus. Specifically, the feature vector of the word distance between the reference word in the current corpus text and the antecedent in the previous corpus text, together with the feature vector of the attribute of the reference word in the current corpus text, is input into the SVM model for training, where the word distance between the reference word and the antecedent is specifically embodied as a character distance.
The SVM model is shown in fig. 4. As shown in the figure, the goal of the SVM is to find a hyperplane that solves the classification problem well: for each class, the sample point closest to the hyperplane is found, and the distance from that point to the hyperplane is maximized; the closest points are those drawn on the dashed lines. The hyperplane is given by the formula

w·x + b = 0

Points whose computed distance is greater than 1 belong to the negative samples, indicated by crosses, and points whose distance is less than 0 belong to the positive samples, indicated by circles. The positive and negative samples thus define the hyperplane well; in the geometric space they are also represented as vectors, and the vectors that define the hyperplane are called support vectors.
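The linear decision rule just described can be sketched as follows: a sample x is classified by the sign of w·x + b. In the embodiments the weights would be learned from the user corpus; the weight and sample values below are assumptions for illustration only.

```python
def svm_predict(w, b, x):
    """Classify a sample by the sign of the margin w.x + b
    (1: positive sample, -1: negative sample)."""
    margin = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if margin >= 0 else -1

label = svm_predict(w=[0.5, -0.25], b=0.1, x=[2.0, 1.0])
# margin = 1.0 - 0.25 + 0.1 = 0.85 >= 0, so label == 1
```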
Based on the same technical concept, embodiments of the present application provide a server, where the server may implement the method in the foregoing embodiments, and the server is configured to:
receiving a voice query request from a communication terminal, wherein the voice query request carries a first query text, and the first query text contains a reference word and lacks a designated object;
combining and matching the antecedent words in the second query text which has context association relation with the first query text with the reference words in the first query text to obtain candidate matched pairs;
selecting candidate matching pairs according with matching rules according to the attributes of the antecedent words and the referent words and the matching rules among the attributes;
selecting a candidate matching pair from the candidate matching pairs according with the matching rule as a target matching pair;
replacing the reference words in the first query text by using the target matching pairs to obtain a target query text;
and responding to the voice query request according to the target query text.
Optionally, selecting one candidate matching pair from the candidate matching pairs according with the matching rule as a target matching pair, including:
respectively extracting a word feature vector in the first query text and a word feature vector in the second query text;
determining the matching degree of the candidate matching pairs according to the position, space and semantic relation of corresponding words in the first query text and the second query text represented by the word feature vector;
sorting the candidate matching pairs which accord with the matching rule according to the sequence of the matching degree of the candidate matching pairs;
and selecting one candidate matching pair with the incidence relation meeting the requirement of the incidence relation of the knowledge graph from the sorted candidate matching pairs in sequence as a target vocabulary matching pair.
Optionally, replacing the reference word in the first query text with the target matching pair to obtain a target query text, including:
replacing the pronouns in the first query text by the target matching pairs;
and determining a service scene according to the first query text and the second query text, and completing the missing sentence components in the first query text after the substitute word is replaced according to the service scene to obtain the target query text.
Optionally, the method further includes:
after receiving a voice query request from a communication terminal, acquiring a second query text which has a context association relation with the first query text;
respectively extracting a word feature vector in the first query text and a word feature vector in the second query text;
and judging whether the first query text contains the representative words and lacks the specified objects or not according to the word feature vector extracted from the first query text and the word feature vector extracted from the second query text.
The functions of the server can be referred to the description of the functions implemented by the flow diagram of the voice interaction method in the foregoing embodiments, and are not repeated here.
Based on the same technical concept, the embodiment of the application provides a server, and the server can implement the method on the server side in the embodiment.
Referring to fig. 5, the server includes: the system comprises a receiving module 501, a processing module 502, a selecting module 503, a replacing module 504 and a responding module 505.
A receiving module 501, configured to receive a voice query request from a communication terminal, where the voice query request carries a first query text, and the first query text contains a reference word and lacks a reference object;
the processing module 502 is configured to combine and match the antecedent in the second query text having a context association relationship with the first query text with the reference word in the first query text to obtain a candidate matching pair;
a selecting module 503, configured to select a candidate matching pair meeting the matching rule according to the attributes of the precedent and the referent and the matching rule among the attributes; selecting a candidate matching pair from the candidate matching pairs according with the matching rule as a target matching pair;
a replacing module 504, configured to replace a reference word in the first query text with the target matching pair to obtain a target query text;
and a response module 505, configured to respond to the voice query request according to the target query text.
The functions of the modules in the server can be referred to the description of the functions implemented by the flow diagram of the voice interaction method in the foregoing embodiments, and are not repeated here.
The embodiment of the present application further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and the computer-executable instructions are used to enable a computer to execute the method performed by the server in the foregoing embodiment.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of voice interaction, comprising:
receiving a voice query request from a communication terminal, wherein the voice query request carries a first query text, and the first query text contains a reference word and lacks a designated object;
combining and matching the antecedent words in the second query text which has the context association relation with the first query text with the reference words in the first query text to obtain candidate matched pairs;
selecting candidate matching pairs according with the matching rules according to the attributes of the antecedent words and the referent words and the matching rules among the attributes;
selecting a candidate matching pair from the candidate matching pairs according with the matching rule as a target matching pair;
replacing the reference words in the first query text by using the target matching pairs to obtain a target query text;
and responding to the voice query request according to the target query text.
2. The method of claim 1, wherein selecting one candidate matching pair from the candidate matching pairs that meet the matching rule as a target matching pair comprises:
sorting the candidate matching pairs according with the matching rule according to the sequence of the matching degree of the candidate matching pairs;
and selecting one candidate matching pair from the sorted candidate matching pairs in sequence as a target vocabulary matching pair, wherein the association relation between the words in the target matching pair meets the association relation between the words in the knowledge graph.
3. The method of claim 2, wherein ranking the candidate pairings that meet the collocation rule in order of how well the candidate pairings match comprises:
respectively extracting a word feature vector in the first query text and a word feature vector in the second query text;
and determining the matching degree of the candidate matching pairs according to the position, space and semantic relation of corresponding words in the first query text and the second query text represented by the word feature vector in the first query text and the word feature vector in the second query text, and sequencing the candidate matching pairs which accord with the matching rule according to the sequence of the matching degree.
4. The method of claim 1, wherein replacing a reference word in the first query text with the target-matching pair to obtain a target query text comprises:
replacing the pronouns in the first query text with the target matching pairs;
and determining a service scene according to the first query text and the second query text, and completing the missing sentence components in the first query text after the substitute words are replaced according to the service scene to obtain a target query text.
5. The method of any one of claims 1-4, further comprising:
after receiving a voice query request from a communication terminal, acquiring a second query text which has a context association relation with the first query text;
respectively extracting a word feature vector in the first query text and a word feature vector in the second query text;
and judging whether the first query text contains the pronouns and lacks the designated objects according to the word feature vector extracted from the first query text and the word feature vector extracted from the second query text.
6. A server, wherein the server is configured to:
receiving a voice query request from a communication terminal, wherein the voice query request carries a first query text, and the first query text contains a reference word and lacks a designated object;
combining and matching the antecedent words in the second query text which has the context association relation with the first query text with the reference words in the first query text to obtain candidate matched pairs;
selecting candidate matching pairs according with the matching rules according to the attributes of the antecedent words and the referent words and the matching rules among the attributes;
selecting a candidate matching pair from the candidate matching pairs according with the matching rule as a target matching pair;
replacing the reference words in the first query text by using the target matching pairs to obtain a target query text;
and responding to the voice query request according to the target query text.
7. The server of claim 6, wherein the server is configured to:
respectively extracting a word feature vector in the first query text and a word feature vector in the second query text;
determining the matching degree of the candidate matching pairs according to the position, space and semantic relation of corresponding words in the first query text and the second query text represented by the word feature vector in the first query text and the word feature vector in the second query text;
sorting the candidate matching pairs according with the matching rule according to the sequence of the matching degree of the candidate matching pairs;
and selecting one candidate matching pair from the sorted candidate matching pairs in sequence as a target vocabulary matching pair, wherein the association relation between the words in the target matching pair meets the association relation between the words in the knowledge graph.
8. The server according to claim 6, wherein the server is configured to:
replacing the pronouns in the first query text with the target matching pairs;
and determining a service scene according to the first query text and the second query text, and completing the missing sentence components in the first query text after the substitute words are replaced according to the service scene to obtain a target query text.
9. The server according to any one of claims 6-8, wherein the server is configured to:
after receiving the voice query request from the communication terminal, acquiring the second query text that has a context association relation with the first query text;
respectively extracting a word feature vector from the first query text and a word feature vector from the second query text;
and judging, according to the word feature vectors extracted from the first query text and the second query text, whether the first query text contains a reference word and lacks a reference object.
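Claim 9's detection step might look like the following sketch. Note the claim bases the judgment on word feature vectors; the fixed pronoun list and the capitalization heuristic below are simpler stand-ins, not the patent's method:

```python
PRONOUNS = {"he", "she", "it", "him", "his", "her", "this", "that"}

def needs_resolution(first_query, second_query):
    """Flag a query that contains a reference word but names no entity of
    its own, provided that preceding context exists to resolve against."""
    tokens = first_query.lower().split()
    has_reference_word = any(t in PRONOUNS for t in tokens)
    # Crude stand-in for "lacks a reference object": no capitalized
    # entity mention inside the current query itself.
    names_entity = any(t[0].isupper() for t in first_query.split())
    return has_reference_word and not names_entity and bool(second_query)

print(needs_resolution("what else has he acted in",
                       "who stars in this movie"))  # True
```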
10. A server, comprising:
a receiving module, configured to receive a voice query request from a communication terminal, wherein the voice query request carries a first query text, and the first query text contains a reference word and lacks a reference object;
a processing module, configured to combine and match antecedent words in a second query text, which has a context association relation with the first query text, with the reference word in the first query text, to obtain candidate matching pairs;
a selecting module, configured to select the candidate matching pairs that conform to a matching rule according to the attributes of the antecedent words and the reference words and the matching rule between the attributes, and to select one candidate matching pair from the conforming candidate matching pairs as a target matching pair;
a replacing module, configured to replace the reference word in the first query text with the target matching pair to obtain a target query text;
and a response module, configured to respond to the voice query request according to the target query text.
CN202010279089.5A 2020-04-10 2020-04-10 Voice interaction method and server Active CN111522909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010279089.5A CN111522909B (en) 2020-04-10 2020-04-10 Voice interaction method and server

Publications (2)

Publication Number Publication Date
CN111522909A true CN111522909A (en) 2020-08-11
CN111522909B CN111522909B (en) 2024-04-02

Family

ID=71902754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010279089.5A Active CN111522909B (en) 2020-04-10 2020-04-10 Voice interaction method and server

Country Status (1)

Country Link
CN (1) CN111522909B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100357A (en) * 2020-09-24 2020-12-18 腾讯科技(深圳)有限公司 Method and device for generating guide language, electronic equipment and computer storage medium
CN112148847A (en) * 2020-08-27 2020-12-29 出门问问(苏州)信息科技有限公司 Voice information processing method and device
CN113099267A (en) * 2021-06-04 2021-07-09 武汉卓尔数字传媒科技有限公司 Video generation method and device, electronic equipment and storage medium
CN115328321A (en) * 2022-10-14 2022-11-11 深圳市人马互动科技有限公司 Man-machine interaction method based on identity conversion and related product
WO2022237376A1 (en) * 2021-05-10 2022-11-17 International Business Machines Corporation Contextualized speech to text conversion
CN116996337A (en) * 2023-08-03 2023-11-03 恩平市新盈科电声科技有限公司 Conference data management system and method based on Internet of things and microphone switching technology

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080133444A1 (en) * 2006-12-05 2008-06-05 Microsoft Corporation Web-based collocation error proofing
KR20110062261A (en) * 2009-12-03 2011-06-10 한국과학기술정보연구원 Anaphoric pronoun reference resolution system and method for production thereof
US20180307671A1 (en) * 2016-05-20 2018-10-25 Tencent Technology (Shenzhen) Company Limited Antecedent determining method and apparatus
CN109446517A (en) * 2018-10-08 2019-03-08 平安科技(深圳)有限公司 Reference resolution method, electronic device and computer readable storage medium
CN110134944A (en) * 2019-04-08 2019-08-16 国家计算机网络与信息安全管理中心 A kind of reference resolution method based on intensified learning


Similar Documents

Publication Publication Date Title
CN111522909B (en) Voice interaction method and server
CN110309283B (en) Answer determination method and device for intelligent question answering
KR102315732B1 (en) Speech recognition method, device, apparatus, and storage medium
US20200285353A1 (en) Apparatus for vision and language-assisted smartphone task automation and method thereof
WO2018045646A1 (en) Artificial intelligence-based method and device for human-machine interaction
CN103309846B (en) A kind of processing method of natural language information and device
CN107785018A (en) More wheel interaction semantics understanding methods and device
CN108345612B (en) Problem processing method and device for problem processing
CN109716714A (en) Use the control system of the search and dialog interface that have scope
WO2021073179A1 (en) Named entity identification method and device, and computer-readable storage medium
CN107145509B (en) Information searching method and equipment thereof
CN111079418A (en) Named body recognition method and device, electronic equipment and storage medium
WO2013189156A1 (en) Video search system, method and video search server based on natural interaction input
CN110634477B (en) Context judgment method, device and system based on scene perception
CN101763211A (en) System for analyzing semanteme in real time and controlling related operation
US11657807B2 (en) Multi-tier speech processing and content operations
US11705113B2 (en) Priority and context-based routing of speech processing
CN116975016A (en) Data processing method, device, equipment and readable storage medium
CN116340488A (en) Skill recommendation system for open domain man-machine conversation
CN117725922A (en) Image generation method, device, computer equipment and storage medium
CN113792133B (en) Question judging method and device, electronic equipment and medium
US11657805B2 (en) Dynamic context-based routing of speech processing
CN114444609A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN110853647A (en) Video searching method, video playing terminal and storage medium
CN110010131A (en) A kind of method and apparatus of speech signal analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant