CN112148848A

CN112148848A - Question and answer processing method and device

Info

Publication number: CN112148848A
Application number: CN202010885459.XA
Authority: CN
Inventors: 李喜莲; 牛嘉斌; 雷欣; 李志飞
Original assignee: Go Out And Ask Suzhou Information Technology Co ltd
Current assignee: Volkswagen China Investment Co Ltd; Mobvoi Innovation Technology Co Ltd
Priority date: 2020-08-28
Filing date: 2020-08-28
Publication date: 2020-12-29

Abstract

The invention discloses a question and answer processing method and device, and relates to the technical field of artificial intelligence. One embodiment of the method comprises: acquiring a voice request; processing the voice request to obtain an intention classification corresponding to the voice request; and performing an answer search operation corresponding to the intention classification on the voice request, and feeding back the search answer. Voice requests with multiple intentions or incomplete intentions can thereby be recognized, an active inquiry function is provided, and the intelligence of the dialogue system is improved.

Description

Question and answer processing method and device

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a question and answer processing method and device.

Background

Existing dialog systems include a speech recognition module, a natural language understanding module, a dialog management module, a language generation module, and a speech generation module. The existing language recognition module can only recognize voice requests with a single intention and cannot recognize voice requests with multiple intentions or incomplete intentions. For example, when the user's voice request is the incomplete-intention request "I want to listen", an existing dialog system answers "ready to play friend", "playing music for you", or "this song was not found". For a multi-intent voice request such as "play Harry Potter", an existing dialog system answers "preparing to play the song Harry Potter", "playing music for you", or "this song was not found".

For voice requests with multiple intentions or incomplete intentions, existing dialogue systems take a crude approach: the request is assigned directly to the single intention with the highest probability, and an answer is returned according to that single intention. Although this completes one round of dialog, the dialog system shows no initiative, and the accuracy of its responses is reduced.

Disclosure of Invention

In view of this, embodiments of the present invention provide a question and answer processing method and device, which can recognize voice requests with multiple intentions and incomplete intentions, provide an active query function, and improve the intelligence of a dialog system.

In order to achieve the above object, according to a first aspect of the embodiments of the present invention, there is provided a question-answering processing method, including: acquiring a voice request; processing the voice request to obtain an intention classification corresponding to the voice request; and executing answer searching operation corresponding to the intention classification on the voice request, and feeding back a searching answer.

Optionally, processing the voice request to obtain the intention classification corresponding to the voice request includes: performing semantic analysis on the voice request; performing intention recognition on the voice request by using a model; determining a voice request whose semantic parsing result is scene-free and whose intention recognition result is an ambiguous intention to be an incomplete intention; and determining a voice request whose semantic parsing result has a scene and whose intention recognition result is a clear intention to be a complete intention.

Optionally, the performing answer search operation corresponding to the intention classification on the voice request includes: if the intention classification is a complete intention, searching answers of the voice request under the scene indicated by the semantic parsing result; if the intention classification is an incomplete intention, predicting a plurality of preselected scenes corresponding to the voice request, and selecting a preselected scene meeting a first preset condition from the preselected scenes as a candidate scene; and sending a query request corresponding to the candidate scene, and searching for an answer of the voice request under the candidate scene indicated by the query request result.

Optionally, the searching for the answer to the voice request in the scene indicated by the semantic parsing result includes: if the semantic analysis result has an entity value, searching answers corresponding to the entity value in a scene indicated by the semantic analysis result, and selecting answers of the voice request from the searched answers; and if the semantic analysis result does not have an entity value, searching answers corresponding to the scene indicated by the semantic analysis result, and selecting answers of the voice request from the searched answers.

Optionally, if an entity value exists in the semantic analysis result, searching for an answer corresponding to the entity value in the scene indicated by the semantic analysis result includes: if one entity value exists, searching for an answer corresponding to that entity value in the scene indicated by the semantic analysis result; if a plurality of entity values exist, determining whether the entity values fill the same semantic slot; if so, searching for answers for each entity value separately in the scene indicated by the semantic analysis result, selecting any one answer from the answers corresponding to each entity value, and splicing the selected answers to obtain the search answer corresponding to the plurality of entity values; if not, searching for answers that satisfy all of the entity values simultaneously in the scene indicated by the semantic analysis result.

Optionally, the first preset condition is that the preselected scene ranks in the top two by confidence and has a search answer.

To achieve the above object, according to a second aspect of the embodiments of the present invention, there is also provided a question-answering processing apparatus including: the acquisition module is used for acquiring a voice request; the processing module is used for processing the voice request to obtain an intention classification corresponding to the voice request; and the searching module is used for executing answer searching operation corresponding to the intention classification on the voice request and feeding back searching answers.

Optionally, the processing module includes: the semantic analysis unit is used for carrying out semantic analysis on the voice request; an intention recognition unit for performing intention recognition on the voice request by using a model; a determining unit configured to determine a voice request, the semantic parsing result of which is scene-free and the intention recognition result of which is an ambiguous intention, as an incomplete intention; and determining the voice request with the semantic parsing result of having scenes and the intention recognition result of being clear intention as a complete intention.

Optionally, the search module includes: the first searching unit is used for searching answers of the voice requests under the scene indicated by the semantic parsing result if the intention classification is a complete intention; and the second searching unit is used for predicting a plurality of preselected scenes corresponding to the voice request if the intention classification is an incomplete intention, selecting the preselected scenes meeting a first preset condition from the preselected scenes as candidate scenes, sending an inquiry request corresponding to the candidate scenes, and searching answers of the voice request under the candidate scenes indicated by the inquiry request result.

Optionally, the first searching unit includes: an entity value unit, configured to search, if an entity value exists in the semantic analysis result, an answer corresponding to the entity value in a scene indicated by the semantic analysis result, and select an answer to the voice request from the search answers; and the scene unit is used for searching answers corresponding to the scenes indicated by the semantic analysis result if the semantic analysis result does not have an entity value, and selecting the answer of the voice request from the searched answers.

To achieve the above object, according to a third aspect of the embodiments of the present invention, there is also provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the question-answer processing method according to the first aspect.

According to the embodiment of the invention, the acquired voice request is processed to obtain the intention classification corresponding to the voice request, then answer searching operation corresponding to the intention classification is carried out on the voice request, and searching answers are fed back. Therefore, the voice requests with multiple intentions and incomplete intentions can be recognized, the active inquiry function is provided, and the intellectualization of the dialogue system is improved.

Further effects of the above-described non-conventional alternatives will be described below in connection with specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein like or corresponding reference numerals designate like or corresponding parts throughout the several views.

FIG. 1 is a flow chart of a question-answer processing method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a question-answer processing method for an incomplete-intention voice request according to an embodiment of the present invention;

FIG. 3 is a flow chart of a question-answer processing method for a full intent voice request according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a question answering device according to an embodiment of the present invention;

FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

As shown in fig. 1, a flow chart of a question-answering processing method according to an embodiment of the present invention is shown, and the method at least includes the following operation flows:

S101, acquiring a voice request.

Illustratively, the voice request is obtained by text input or voice input.

S102, the voice request is processed, and intention classification corresponding to the voice request is obtained.

Illustratively, semantic parsing is performed on the voice request, and intention recognition is performed on the voice request by using a model. A voice request whose semantic parsing result is scene-free and whose intention recognition result is an ambiguous intention is determined to be an incomplete intention; a voice request whose semantic parsing result has a scene and whose intention recognition result is a clear intention is determined to be a complete intention. Two other combinations may occur in the processing result, and both are abnormal: the semantic parsing result is scene-free while the intention recognition result is a clear intention, or the semantic parsing result has a scene while the intention recognition result is an ambiguous intention. When either abnormal situation occurs, the semantic parsing module or the intention recognition module generally needs to be optimized.
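The classification rule of S102 can be sketched as a small decision function. This is only an illustrative sketch: the names `IntentClass` and `classify_intent`, and the representation of "no scene" as `None`, are assumptions made for the example and do not appear in the original disclosure.

```python
from enum import Enum
from typing import Optional


class IntentClass(Enum):
    COMPLETE = "complete"
    INCOMPLETE = "incomplete"
    ABNORMAL = "abnormal"  # the two results disagree; a module needs optimizing


def classify_intent(scene: Optional[str], intent_is_clear: bool) -> IntentClass:
    """Combine the semantic parsing result with the intention recognition result.

    scene: scene name from semantic parsing, or None when no scene was found
           (hypothetical representation).
    intent_is_clear: True when the recognition model reports a clear intention.
    """
    has_scene = scene is not None
    if has_scene and intent_is_clear:
        return IntentClass.COMPLETE
    if not has_scene and not intent_is_clear:
        return IntentClass.INCOMPLETE
    # Remaining cases: no scene with a clear intention,
    # or a scene with an ambiguous intention.
    return IntentClass.ABNORMAL
```

Returning a separate `ABNORMAL` value, rather than forcing such requests into one of the two classes, keeps the abnormal combinations visible so that they can be logged and used to optimize the parsing or recognition module.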

S103, answer searching operation corresponding to the intention classification is carried out on the voice request, and searching answers are fed back.

Illustratively, intention classifications include complete intention and incomplete intention. If the intention classification of the voice request is an incomplete intention, a plurality of candidate answers corresponding to the voice request are searched for, and an inquiry request is sent for the candidate answers to determine the answer to the voice request.

If the intention classification of the voice request is a complete intention, searching answers of the voice request under the scene indicated by the semantic analysis result; further, if the entity value exists in the semantic analysis result, searching answers corresponding to the entity value in the scene indicated by the semantic analysis result, and selecting answers of the voice request from the searched answers; and if the semantic analysis result does not have the entity value, searching answers corresponding to the scene indicated by the semantic analysis result, and selecting answers of the voice request from the searched answers.

Therefore, the intention classification of the voice request is determined through the semantic parsing result and the intention recognition result, so that answer search operation corresponding to the intention classification is executed on the voice request, and the intelligence of a conversation system is improved.

FIG. 2 is a flow chart of a question-answering processing method for an incomplete-intention voice request according to an embodiment of the present invention; the method at least comprises the following operation flows:

S201, acquiring a voice request.

S202, semantic analysis and intention recognition are respectively performed on the voice request, and the intention classification corresponding to the voice request is obtained as an incomplete intention.

Illustratively, the semantic parsing result of the voice request is scene-free, and the intention recognition result produced by the intention recognition model for the voice request is an ambiguous intention; the intention classification corresponding to the voice request is therefore determined to be an incomplete intention.

S203, a plurality of preselected scenes corresponding to the voice request are predicted.

S204, selecting a preselected scene meeting a first preset condition from the plurality of preselected scenes as a candidate scene.

Illustratively, the first preset condition is that the preselected scene ranks in the top two by confidence and has a search answer. When the semantic analysis module predicts the plurality of preselected scenes corresponding to the voice request, it also outputs the confidence of each preselected scene.

Here, the first preset condition is set manually according to the actual application scenario.
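Steps S203 and S204 can be sketched as follows, under the reading that the two most confident preselected scenes are taken first and only those with at least one search answer are kept. The function name `select_candidate_scenes`, the `search` callback, and the `top_k` parameter are hypothetical and introduced only for illustration.

```python
from typing import Callable, Dict, List


def select_candidate_scenes(
    confidences: Dict[str, float],
    search: Callable[[str], List[str]],
    top_k: int = 2,
) -> List[str]:
    """Keep the top_k most confident preselected scenes that have search answers.

    confidences: preselected scene -> confidence output by semantic analysis.
    search: hypothetical callback returning the search answers for a scene.
    """
    ranked = sorted(confidences, key=lambda scene: confidences[scene], reverse=True)
    return [scene for scene in ranked[:top_k] if search(scene)]
```

Because the first preset condition is set manually, `top_k` is left as a parameter in this sketch rather than being fixed at two.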

S205, an inquiry request corresponding to the candidate scene is transmitted.

S206, searching for the answer to the voice request in the candidate scene indicated by the inquiry request result.

Specifically, suppose the voice request is "I want to listen". Semantic analysis is performed on the voice request, and the semantic analysis result indicates that the voice request has no scene; intention recognition is performed on the voice request by using an intention recognition model, and the intention recognition result indicates an ambiguous intention, so the voice request is determined to be an incomplete intention. A plurality of preselected scenes corresponding to the voice request are predicted by the semantic analysis module, which outputs the confidence of each preselected scene, and the preselected scenes that rank in the top two by confidence and have search answers are selected. For example, the preselected scenes, in descending order of confidence, are a music scene, a poetry scene, an audiobook scene, a crosstalk scene, and a drama scene, so the first two preselected scenes are the music scene and the poetry scene. It is then determined whether the voice request has an answer in the music scene or the poetry scene. If only the music scene has search answers, such as "friend", "same song", and "lover", the music scene is used as the candidate scene for the voice request. If both the music scene and the poetry scene have search answers (for example, the search answer in the poetry scene is "Jueju"), an inquiry request is sent, such as "Do you want to listen to music or poetry?". If the user replies "I want to listen to music", one of the search answers corresponding to the music scene is selected as the answer to the voice request; for example, "friend" is selected from the three search answers "friend", "same song", and "lover" as the answer to "I want to listen".

Here, the search answer ranked first by popularity may be selected as the answer to the voice request, or one search answer may be selected at random.
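The inquiry and selection steps S205 and S206 can be sketched as below. This is a minimal sketch under stated assumptions: the helper names, the wording of the inquiry, the `ask_user` callback, the popularity scores, and the simple substring matching of the user's reply are all hypothetical and are not prescribed by the original text.

```python
import random
from typing import Callable, Dict, List, Optional


def build_inquiry(scenes: List[str]) -> str:
    """Form an inquiry request listing the candidate scenes (wording assumed)."""
    return "Do you want to listen to " + " or ".join(scenes) + "?"


def pick_answer(
    answers: List[str],
    popularity: Optional[Dict[str, float]] = None,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick the most popular answer, or a random one when no scores are given."""
    if not answers:
        return None
    if popularity:
        return max(answers, key=lambda a: popularity.get(a, 0.0))
    return (rng or random).choice(answers)


def answer_incomplete_request(
    candidate_answers: Dict[str, List[str]],
    ask_user: Callable[[str], str],
    popularity: Optional[Dict[str, float]] = None,
) -> Optional[str]:
    """candidate_answers maps each candidate scene to its search answers."""
    scenes = list(candidate_answers)
    if not scenes:
        return None
    if len(scenes) == 1:
        chosen = scenes[0]  # a single candidate scene needs no inquiry
    else:
        reply = ask_user(build_inquiry(scenes))
        # Naive matching: first candidate scene mentioned in the reply.
        chosen = next((s for s in scenes if s in reply), None)
        if chosen is None:
            return None
    return pick_answer(candidate_answers[chosen], popularity)
```

In a deployed system the user's reply would itself pass through semantic analysis; the substring match is used here only to keep the sketch self-contained.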

The embodiment of the invention can identify an incomplete intention and provide an active inquiry function for it, thereby making the dialog system more proactive and intelligent.

FIG. 3 is a flow chart of a question-answer processing method for a full intention voice request according to an embodiment of the present invention; the method at least comprises the following operation flows:

S301, acquiring a voice request.

S302, semantic analysis and intention recognition are respectively performed on the voice request, and the intention classification corresponding to the voice request is obtained as a complete intention; if the semantic analysis result has one entity value, operation S303 is executed; if the semantic analysis result has a plurality of entity values, operation S304 is executed; if the semantic analysis result has no entity value, operation S307 is executed.

S303, searching answers corresponding to the entity values in the scene indicated by the semantic analysis result; operation S308 is then performed.

For example, there may be one or more search answers.

S304, determining whether a plurality of entity values simultaneously satisfy the same semantic slot; if yes, the operation S305 is performed, and if no, the operation S306 is performed.

S305, respectively searching answers aiming at each entity value under the scene indicated by the semantic analysis result, and selecting any one answer from the answers corresponding to each entity value; aiming at a plurality of entity values, splicing a plurality of selected answers to obtain search answers corresponding to the entity values; operation S308 is then performed.

S306, searching answers meeting multiple entity values simultaneously in the scene indicated by the semantic parsing result; operation S308 is then performed.

S307, searching answers corresponding to scenes indicated by the semantic parsing result; operation S308 is then performed.

S308, the answer of the voice request is selected from the search answers.

For example, if there is one search answer, that search answer is used as the answer to the voice request. If there are a plurality of search answers, one search answer meeting a preset condition is selected from them as the answer to the voice request.
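The branching among operations S303 to S307 can be sketched as a routing function. The representation of the semantic analysis result as a mapping from semantic slot to entity values, and the name `route_complete_intent`, are assumptions made for this sketch. The case in which some values share a slot while others do not is not addressed in the text; the sketch routes it to S306.

```python
from typing import Dict, List


def route_complete_intent(entities: Dict[str, List[str]]) -> str:
    """Return the label of the search operation to execute after S302.

    entities: semantic slot -> entity values filling that slot
              (hypothetical representation of the semantic analysis result).
    """
    filled_slots = [slot for slot, values in entities.items() if values]
    value_count = sum(len(entities[slot]) for slot in filled_slots)
    if value_count == 0:
        return "S307"  # no entity value: search by scene only
    if value_count == 1:
        return "S303"  # one entity value
    # S304: several entity values; do they all fill the same semantic slot?
    if len(filled_slots) == 1:
        return "S305"  # same slot: search per value, then splice
    return "S306"      # different slots: search for answers satisfying all values
```

Each returned label corresponds to one search operation, after which operation S308 selects the answer to the voice request from the search answers.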

Specifically, suppose the voice request is "I want to listen to the song friend". The semantic parsing result indicates that the scene of the voice request is "song" and the entity value is "friend", and the intention recognition result is a clear intention, so the intention classification of the voice request is determined to be a complete intention. Since the voice request has one entity value, the answer corresponding to "friend" is searched for in the "song" scene, yielding three search answers, for example three versions of "friend" sung by different singers. The first-ranked version is selected as the answer to the voice request according to the popularity of the versions sung by the different singers, or one search answer is selected at random.

Suppose the voice request is "I want to listen to music". The semantic parsing result indicates that the scene of the voice request is "music" and that there is no entity value, and the intention recognition result is a clear intention, so the intention classification of the voice request is determined to be a complete intention. Since the voice request has no entity value, answers are searched for in the "music" scene, yielding three search answers: "friend", "same song", and "lover". The search answer with the highest play popularity is selected from the three as the answer to the voice request; for example, "friend" is selected as the answer to "I want to listen to music".

Suppose the voice request is "help me check the weather in Beijing and Shanghai". The semantic analysis result indicates that the scene of the voice request is "weather" and the entity values are "Beijing" and "Shanghai", and the intention recognition result is a clear intention, so the intention classification of the voice request is determined to be a complete intention. The voice request has two entity values, and the semantic slot corresponding to each is "weather location", so the two entity values fill the same semantic slot. In the weather scene, answers are searched for the entity value "Beijing" and the entity value "Shanghai" separately: the answers corresponding to "Beijing" are "today sunny, 16-30 ℃" and "tomorrow light rain, 25 ℃", and the answer corresponding to "Shanghai" is "today thunderstorm, 21 ℃". One answer is selected from the answers corresponding to each entity value, and the selected answers are spliced, which yields two groups of answers for the voice request: "Beijing today sunny, 16-30 ℃; Shanghai today thunderstorm, 21 ℃" and "Beijing tomorrow light rain, 25 ℃; Shanghai today thunderstorm, 21 ℃". An inquiry request may then be sent for the two groups of answers, for example "Which day's weather do you want to know?". If the user replies "I want to know today's weather", then "Beijing today sunny, 16-30 ℃; Shanghai today thunderstorm, 21 ℃" is taken as the answer to the voice request and fed back to the user.
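The splicing of operation S305 can be sketched as a Cartesian product over the per-entity answers, which reproduces the two groups of answers in the weather example. The function name `splice_answers` and the "entity: answer" joining format are assumptions made for illustration only.

```python
from itertools import product
from typing import Dict, List


def splice_answers(per_entity_answers: Dict[str, List[str]]) -> List[str]:
    """Pick one answer per entity value and join the picks, for every combination.

    per_entity_answers: entity value -> search answers found for that value.
    Returns an empty list if there are no entity values or if any entity value
    has no answer, since no complete combination exists in that case.
    """
    if not per_entity_answers:
        return []
    labelled = [
        [f"{entity}: {answer}" for answer in answers]
        for entity, answers in per_entity_answers.items()
    ]
    return ["; ".join(combo) for combo in product(*labelled)]


# Data taken from the weather example in the text.
groups = splice_answers({
    "Beijing": ["today sunny, 16-30 ℃", "tomorrow light rain, 25 ℃"],
    "Shanghai": ["today thunderstorm, 21 ℃"],
})
for group in groups:
    print(group)
```

When more than one group results, an inquiry request can be sent to let the user choose among them, as in the example.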

Suppose the voice request is "help me check the weather in Beijing on February 1". The semantic analysis result indicates that the scene of the voice request is "weather" and the entity values are "Beijing" and "February 1", and the intention recognition result is a clear intention, so the intention classification of the voice request is determined to be a complete intention. The voice request has two entity values: the semantic slot corresponding to the entity value "Beijing" is "weather location", and the semantic slot corresponding to the entity value "February 1" is "weather time", so the two entity values do not fill the same semantic slot. Answers that satisfy both the weather location and the weather time are therefore searched for in the weather scene, yielding two search answers: "Beijing, February 1, sunny, 16-30 ℃" and "Beijing, February 1, thunderstorm, 16-30 ℃". The answer with the highest confidence is selected from the two search answers as the answer to the voice request, for example "Beijing, February 1, sunny, 16-30 ℃".
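Operation S306 followed by the confidence-based selection can be sketched as a filter over candidate records. The record layout, the slot keys `weather_location` and `weather_time`, and the `confidence` field are hypothetical; the original text does not specify how answers or their confidences are stored.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def search_all_slots(records: List[Record], constraints: Dict[str, str]) -> List[Record]:
    """Return the records whose slot values match every constraint simultaneously."""
    return [
        record
        for record in records
        if all(record.get(slot) == value for slot, value in constraints.items())
    ]


def most_confident(matches: List[Record]) -> Optional[Record]:
    """Select the matching record with the highest (assumed) confidence score."""
    if not matches:
        return None
    return max(matches, key=lambda record: record["confidence"])
```

A call such as `search_all_slots(records, {"weather_location": "Beijing", "weather_time": "February 1"})` keeps only the records that satisfy both slots, and `most_confident` then plays the role of the selection in S308.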

Therefore, the embodiment of the invention can identify the voice request with multiple intentions, improve the function of the dialog system and improve the intelligence of the dialog system.

It should be understood that, in various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and the inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

Fig. 4 is a schematic diagram of a question answering processing device according to an embodiment of the invention; the apparatus 400 comprises: an obtaining module 401, configured to obtain a voice request; a processing module 402, configured to process the voice request to obtain an intention classification corresponding to the voice request; and a searching module 403, configured to perform answer searching operation corresponding to the intention classification on the voice request, and feed back a search answer.

In an alternative embodiment, the processing module 402 includes: the semantic analysis unit is used for carrying out semantic analysis on the voice request; an intention identification unit for performing intention identification on the voice request by using a model; a determining unit configured to determine a voice request, the semantic parsing result of which is scene-free and the intention recognition result of which is an ambiguous intention, as an incomplete intention; and determining the voice request with the semantic parsing result of having scenes and the intention recognition result of being clear intention as a complete intention.

In an alternative embodiment, the search module 403 includes: the first search unit is used for searching answers of the voice requests under the scene indicated by the semantic analysis result if the intention classification is a complete intention; and the second searching unit is used for predicting a plurality of preselected scenes corresponding to the voice request if the intention classification is an incomplete intention, selecting the preselected scenes meeting the first preset condition from the preselected scenes as candidate scenes, sending an inquiry request corresponding to the candidate scenes, and searching answers of the voice request under the candidate scenes indicated by the inquiry request result.

In an alternative embodiment, the first search unit includes: the entity value unit is used for searching answers corresponding to the entity values under the scene indicated by the semantic analysis result if the entity values exist in the semantic analysis result, and selecting answers of the voice requests from the searched answers; and the scene unit is used for searching answers corresponding to the scenes indicated by the semantic analysis result if the entity value does not exist in the semantic analysis result, and selecting answers of the voice request from the searched answers.

In an alternative embodiment, the entity value unit includes: the first entity value unit is used for searching answers corresponding to the entity values under the scene indicated by the semantic analysis result if one entity value exists; a second entity value unit, configured to determine whether multiple entity values simultaneously satisfy the same semantic slot if multiple entity values exist; if yes, respectively searching answers aiming at each entity value under the scene indicated by the semantic analysis result, and selecting any one answer from the answers corresponding to each entity value; aiming at a plurality of entity values, splicing a plurality of selected answers to obtain search answers corresponding to the entity values; if not, searching answers meeting the multiple entity values simultaneously under the scene indicated by the semantic analysis result.

In an alternative embodiment, the first preset condition is that the preselected scene ranks in the top two by confidence and has a search answer.

The device can execute the question answering processing method provided by the embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the question answering processing method. For technical details that are not described in detail in this embodiment, reference may be made to the question answering processing method provided in the embodiment of the present invention.
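The composition of the obtaining module 401, the processing module 402, and the searching module 403 can be sketched as a small class that chains three callables. The class name `QuestionAnswerDevice`, the attribute names, and the use of plain strings for requests and classifications are assumptions for illustration, not the structure defined by the apparatus 400.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QuestionAnswerDevice:
    # Hypothetical stand-ins for modules 401, 402, and 403.
    acquire: Callable[[], str]         # 401: obtains the voice request
    process: Callable[[str], str]      # 402: request -> intention classification
    search: Callable[[str, str], str]  # 403: (request, classification) -> answer

    def run(self) -> str:
        """Execute the three modules in the order of S101, S102, and S103."""
        request = self.acquire()
        classification = self.process(request)
        return self.search(request, classification)
```

Passing the modules in as callables keeps the sketch independent of any particular speech recognition or retrieval backend.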

FIG. 5 shows an exemplary system architecture 500 to which embodiments of the present invention may be applied. The system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. Network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the terminal devices 501, 502, 503 to interact with the server 505 over the network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).

The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 505 may be a server providing various services, such as a background management server (for example only) providing support for click events generated by users using the terminal devices 501, 502, 503. The background management server may analyze and otherwise process the received click data, text content, and other data, and feed back a processing result (for example, target push information or product information, by way of example only) to the terminal device.

It should be noted that the question answering processing method provided in the embodiment of the present application is generally executed by the server 505, and accordingly, the question answering processing device is generally disposed in the server 505.

It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 6, shown is a block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in FIG. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604. The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 601.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes an acquisition module, a processing module, and a searching module. The names of these modules do not in some cases constitute a limitation on the modules themselves; for example, the acquisition module may also be described as a "module that acquires a voice request".

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the device described in the above embodiments, or may exist separately without being incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following steps: S101, acquiring a voice request; S102, processing the voice request to obtain an intention classification corresponding to the voice request; S103, executing an answer search operation corresponding to the intention classification on the voice request, and feeding back the search answer.
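Steps S101 to S103 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `ParseResult`, `classify_intent`, and `handle_request`, and the callables passed in for parsing, searching, and querying the user, are hypothetical. The classification rule follows the method as claimed (a scene-free request with an ambiguous intention is incomplete; a request with a scene and a clear intention is complete). The text does not cover mixed cases, so the sketch assumes they are also treated as incomplete.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Intent(Enum):
    COMPLETE = "complete"
    INCOMPLETE = "incomplete"


@dataclass
class ParseResult:
    """Hypothetical container for the parsing and recognition outputs."""
    scene: Optional[str]    # None means the semantic parse found no scene
    intent_is_clear: bool   # output of the (unspecified) recognition model


def classify_intent(parsed: ParseResult) -> Intent:
    """S102: map the two results to an intention classification."""
    if parsed.scene is not None and parsed.intent_is_clear:
        return Intent.COMPLETE
    # Scene-free and ambiguous requests are incomplete per the method.
    # Assumption: mixed cases, which the text does not cover, are also
    # treated as incomplete so that the system asks rather than guesses.
    return Intent.INCOMPLETE


def handle_request(
    text: str,
    parse: Callable[[str], ParseResult],
    search_in_scene: Callable[[str, str], str],
    ask_user: Callable[[str], str],
) -> str:
    """S101 to S103 for a request already transcribed to text."""
    parsed = parse(text)                 # S102: parsing and recognition
    if classify_intent(parsed) is Intent.COMPLETE:
        # S103: search within the scene named by the semantic parse.
        return search_in_scene(text, parsed.scene)
    # S103: incomplete intention, so send a query back to the user.
    return ask_user(text)
```

Passing the parser, searcher, and query function in as arguments keeps the sketch independent of any particular speech recognition or retrieval backend, none of which is specified here.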

The system provided by the embodiment of the invention can process not only single-intention conversations but also incomplete-intention and multi-intention conversations, thereby improving the intelligence of the dialog system.
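For an incomplete intention, the method predicts several preselected scenes, keeps those that meet a first preset condition as candidate scenes, and sends a query about them. The Python sketch below illustrates one reading of that condition: take the two preselected scenes with the highest confidence and keep those for which a search answer exists. The function names, the confidence dictionary, the `has_answer` callable, and the wording of the query are all hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, List


def select_candidate_scenes(
    confidences: Dict[str, float],
    has_answer: Callable[[str], bool],
    top_k: int = 2,
) -> List[str]:
    """Keep the top_k scenes by confidence, then drop those with no answer."""
    ranked = sorted(confidences, key=lambda s: confidences[s], reverse=True)
    return [scene for scene in ranked[:top_k] if has_answer(scene)]


def build_inquiry(candidates: List[str]) -> str:
    """Compose the query sent to the user; the wording is illustrative."""
    if not candidates:
        return "Sorry, what would you like to do?"
    return "Did you mean " + " or ".join(candidates) + "?"
```

With hypothetical scores such as `{"music": 0.5, "audiobook": 0.3, "movie": 0.15}` for a request like "play Harry Potter", the sketch would ask the user to choose between the music and audiobook scenes, and the answer would then be searched in the scene the user picks.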

In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, one skilled in the art can combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A question-answer processing method, characterized by comprising:

acquiring a voice request;

processing the voice request to obtain an intention classification corresponding to the voice request;

and executing an answer search operation corresponding to the intention classification on the voice request, and feeding back the search answer.

2. The method of claim 1, wherein the processing the voice request to obtain an intent classification corresponding to the voice request comprises:

performing semantic analysis on the voice request;

performing intention recognition on the voice request by utilizing a model;

determining a voice request whose semantic parsing result is scene-free and whose intention recognition result is an ambiguous intention as an incomplete intention; and determining a voice request whose semantic parsing result has a scene and whose intention recognition result is a clear intention as a complete intention.

3. The method of claim 2, wherein performing an answer search operation on the voice request corresponding to the intent classification comprises:

if the intention classification is a complete intention, searching answers of the voice request under the scene indicated by the semantic parsing result;

if the intention classification is an incomplete intention, predicting a plurality of preselected scenes corresponding to the voice request, and selecting a preselected scene meeting a first preset condition from the preselected scenes as a candidate scene; and sending a query request corresponding to the candidate scene, and searching for an answer of the voice request under the candidate scene indicated by the query request result.

4. The method according to claim 3, wherein the searching for the answer of the voice request in the scene indicated by the semantic parsing result comprises:

if the semantic analysis result has an entity value, searching answers corresponding to the entity value in a scene indicated by the semantic analysis result, and selecting answers of the voice request from the searched answers;

and if the semantic analysis result does not have an entity value, searching answers corresponding to the scene indicated by the semantic analysis result, and selecting answers of the voice request from the searched answers.

5. The method according to claim 4, wherein if an entity value exists in the semantic analysis result, searching for an answer corresponding to the entity value in a scene indicated by the semantic analysis result comprises:

if one entity value exists, searching for an answer corresponding to the entity value under the scene indicated by the semantic analysis result;

if a plurality of entity values exist, determining whether the entity values satisfy the same semantic slot; if yes, searching for answers for each entity value separately under the scene indicated by the semantic analysis result, selecting any one answer from the answers corresponding to each entity value, and splicing the selected answers to obtain the search answer corresponding to the plurality of entity values; if not, searching for answers that satisfy the plurality of entity values simultaneously under the scene indicated by the semantic analysis result.

6. The method according to claim 3, wherein the first preset condition is that the confidence of the preselected scene ranks in the top two and a search answer exists for the preselected scene.

7. A question-answering processing apparatus characterized by comprising:

the acquisition module is used for acquiring a voice request;

the processing module is used for processing the voice request to obtain an intention classification corresponding to the voice request;

and the searching module is used for executing answer searching operation corresponding to the intention classification on the voice request and feeding back searching answers.

8. The apparatus of claim 7, wherein the processing module comprises:

the semantic analysis unit is used for carrying out semantic analysis on the voice request;

an intention recognition unit for performing intention recognition on the voice request by using a model;

a determining unit configured to determine a voice request whose semantic parsing result is scene-free and whose intention recognition result is an ambiguous intention as an incomplete intention, and to determine a voice request whose semantic parsing result has a scene and whose intention recognition result is a clear intention as a complete intention.

9. The apparatus of claim 8, wherein the search module comprises:

the first searching unit is used for searching answers of the voice requests under the scene indicated by the semantic parsing result if the intention classification is a complete intention;

and the second searching unit is used for predicting a plurality of preselected scenes corresponding to the voice request if the intention classification is an incomplete intention, selecting the preselected scenes meeting a first preset condition from the preselected scenes as candidate scenes, sending an inquiry request corresponding to the candidate scenes, and searching answers of the voice request under the candidate scenes indicated by the inquiry request result.

10. The apparatus of claim 9, wherein the first search unit comprises:

an entity value unit, configured to search, if an entity value exists in the semantic analysis result, an answer corresponding to the entity value in a scene indicated by the semantic analysis result, and select an answer to the voice request from the search answers;

and the scene unit is used for searching answers corresponding to the scenes indicated by the semantic analysis result if the semantic analysis result does not have an entity value, and selecting the answer of the voice request from the searched answers.