CN110765312A - Man-machine interaction and content search method, device, equipment and storage medium - Google Patents



Publication number
CN110765312A
CN110765312A
Authority
CN
China
Prior art keywords: intention, category, current, user, combination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810753299.6A
Other languages
Chinese (zh)
Inventor
许侃
姚维
马骥
陈康增
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Banma Zhixing Network Hongkong Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority application: CN201810753299.6A
Publication: CN110765312A
Legal status: Pending


Abstract

The disclosure provides a human-computer interaction and content search method, apparatus, device, and storage medium. An intention category to which the intention information in the user's current-turn instruction belongs is identified; the category combination of the previous turn is updated based on the intention category of the current turn to obtain the category combination of the current turn; and the content to be fed back to the user in the current turn is determined based on the category combination of the current turn. Because the interaction flow is driven by the user's actual expression of intent, the user's intention is not constrained during the interaction and can be switched at any time.

Description

Man-machine interaction and content search method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of human-computer interaction, and in particular, to a method, an apparatus, a device, and a storage medium for human-computer interaction and content search.
Background
Human-computer interaction (HCI) technology refers to technology that realizes interaction between humans and computers through computer input and output devices. When HCI technology is applied to a specific business scenario, multiple rounds of dialog with the user are generally required to determine the user's specific intention so as to provide the corresponding service.
Take an interactive scenario in which a user searches for video resources by voice as an example. Because videos are so varied, several rounds of dialog with the user are usually needed to determine the category of video the user wants to watch. In the traditional voice interaction scheme, a multi-round interaction flow is preset, the user's intention input in each round is restricted, and the user's viewing intention is finally determined through top-down, turn-by-turn intention logic.
The drawback of this scheme is that each turn has a predetermined grammar limit and the user can only express intentions within a predetermined range; if the user's intention exceeds that limit, the system cannot give a correct judgment and feedback and must fall back to error handling, so the interaction is neither natural nor smooth. Moreover, the whole flow is irreversible top-down linear logic and cannot backtrack.
Therefore, there is a need for a more user-friendly interaction scheme.
Disclosure of Invention
It is an object of the present disclosure to provide a human-computer interaction scheme that is more user-friendly.
According to a first aspect of the present disclosure, there is provided a human-computer interaction method, including: identifying an intention category to which intention information in an instruction input by a user in the current turn belongs; updating the category combination of the previous round based on the intention category of the current round to obtain the category combination of the current round; and determining, based on the category combination of the current round, the content to be fed back to the user in the current round.
Optionally, the human-computer interaction method further includes: the content is sent to the user.
Optionally, the step of updating the category combination of the previous round includes: superimposing the intention category of the current round onto the intention categories in the category combination of the previous round.
Optionally, the step of superimposing the intention category of the current round onto the category combination of the previous round comprises: judging whether the intention category of the current round is mutually exclusive with the intention categories of previous rounds; and deleting, from the category combination of the previous round, the intention categories that are mutually exclusive with the intention category of the current round.
Optionally, the human-computer interaction method further includes: when the content is empty, deleting a preset number of the earliest-added intention categories in the category combination of the current round, in chronological order.
Optionally, the content comprises: recommendation information matched with the category combination of the current round; and/or a category label that is different from the intent category in the category combination of the current turn.
Optionally, the human-computer interaction method further includes: extracting a keyword of intention information in an instruction when the intention category to which the intention information belongs in the instruction input by the user in the current turn is empty; and determining the content fed back to the user by the current round based on the keyword and the category combination of the previous round.
Optionally, the human-computer interaction method further includes: pre-dividing a plurality of intent levels having predetermined priorities, each intent level comprising one or more intent categories. The step of determining the content to be fed back to the user in the current turn then comprises: combining, in priority order, the category combination of the current turn with the intention categories under the intent levels it does not yet cover, and taking the result as the content fed back to the user in the current turn.
Optionally, the instruction is a voice instruction and the method is for determining a voice search intention of the user.
According to a second aspect of the present disclosure, there is also provided a content search method implemented based on a dialog, including: in response to a content search instruction of a current conversation turn of a user, identifying an intention category to which content search intention information in the content search instruction belongs; updating the category combination of the previous conversation turn based on the intention category of the current conversation turn to obtain the category combination of the current conversation turn; and based on the category combination of the current conversation turn, performing a search to determine content that the current conversation turn feeds back to the user.
According to a third aspect of the present disclosure, there is also provided a human-computer interaction device, including: the category identification module is used for identifying the intention category to which the intention information in the instruction input by the user in the current turn belongs; the combination updating module is used for updating the category combination of the previous round based on the intention category of the current round so as to obtain the category combination of the current round; and the content determining module is used for determining the content fed back to the user by the current turn based on the category combination of the current turn.
Optionally, the human-computer interaction device further includes: and the sending module is used for sending the content to the user.
Optionally, the combination update module superimposes the intention category of the current round with the intention category of the previous round in the category combination of the previous round to obtain the category combination of the current round.
Optionally, the combination update module includes: the judging module is used for judging whether the intention type of the current round is mutually exclusive with the intention type of the previous round; and the first deleting module is used for deleting the intention category which is mutually exclusive with the intention category of the current round in the category combination of the previous round.
Optionally, the human-computer interaction device further includes: and the second deleting module is used for deleting the preset number of intention categories which are ranked at the top in the category combination of the current round according to the time sequence under the condition that the content is empty.
Optionally, the content comprises: recommendation information matched with the category combination of the current round; and/or a category label that is different from the intent category in the category combination of the current turn.
Optionally, the human-computer interaction device further includes: the keyword extraction module is used for extracting the keywords of the intention information in the instruction under the condition that the intention category to which the intention information belongs in the instruction input by the user in the current round is empty, and the content determination module is used for determining the content fed back to the user in the current round based on the keywords and the category combination of the previous round.
Optionally, the human-computer interaction device further includes: a dividing module for pre-dividing a plurality of intent levels having predetermined priorities, each intent level comprising one or more intent categories; the content determination module combines, in priority order, the category combination of the current turn with the intent categories under the intent levels not yet covered, to determine the content fed back to the user in the current turn.
Optionally, the instruction is a voice instruction, the apparatus being for determining a voice search intention of the user.
According to a fourth aspect of the present disclosure, there is also provided a content search apparatus based on dialog implementation, including: the category identification module is used for responding to a content search instruction of the current conversation turn of the user and identifying an intention category to which content search intention information in the content search instruction belongs; the combination updating module is used for updating the category combination of the previous conversation turn based on the intention category of the current conversation turn so as to obtain the category combination of the current conversation turn; and a content determination module for performing a search based on the category combination of the current conversation turn to determine content fed back to the user by the current conversation turn.
According to a fifth aspect of the present disclosure, there is also provided a computing device comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform a human-machine interaction method as set forth in the first aspect of the disclosure.
According to a sixth aspect of the present disclosure, there is also provided a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform a human-machine interaction method as set forth in the first aspect of the present disclosure.
By shifting the emphasis to analyzing and classifying the user's real intention, the method achieves repeated screening over multiple rounds of interaction without limiting the number of rounds, with the same screening logic applied in every round. With the human-computer interaction and content search scheme of the present disclosure, the user's intention is limited neither by a preset grammar nor by the number of dialog rounds.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
FIG. 1 is a schematic flow chart diagram illustrating a human-computer interaction method according to an embodiment of the present disclosure.
Fig. 2 is a schematic block diagram illustrating the structure of a human-computer interaction device according to an embodiment of the present disclosure.
FIG. 3 shows a schematic diagram of a human-machine interaction scheme of the present disclosure performed by a client and a server in cooperation.
FIG. 4 shows a schematic diagram of an interface for feedback to a user.
FIG. 5 shows a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The human-computer interaction scheme of the present disclosure is particularly suitable for interaction scenarios requiring multiple rounds of dialog with the user, for example resource search scenarios such as video, music, and news, or business scenarios such as ticket booking and online ordering. A session (i.e., a dialog) refers to an interactive process between the computer and the user; a complete dialog runs from the user's input to the machine's feedback on that input. Multi-turn dialog means that, over the whole interaction, user input must be acquired over more than one round of dialog before a feedback result meeting the user's needs can finally be given.
As described in the background section, for such an interaction scenario requiring multiple rounds of dialog implementation, in the existing scheme, a system presets an interaction flow of multiple rounds of conversations, each round has a given grammar limit, and a user can only express an intention within a given range.
In order to prevent the user intention from being limited by the established grammar in the interaction process, the disclosure proposes that the interaction flow can be set by taking the real intention expression of the user as a clue. Specifically, the intention of the user in each dialog may be identified and categorized, and the resulting intention category may be used as a screening condition in one dimension. In addition, each turn of dialog can retain the intention category of the previous turn, so that a filtering condition combination (namely, an intention category combination) can be obtained. In the multi-turn conversation process, the screening condition combination can be continuously updated so as to gradually narrow the screening range and finally obtain the screening result meeting the user requirements.
Further, since the present disclosure drives the interaction flow by the user's real expression of intent, the content fed back in each dialog turn serves only as a prompt, suggesting intentions related to that content, and does not limit the user's intention. That is, when expressing an intention the user is not bound by the content fed back and may express an intention outside the range it characterizes. For example, when the user's intention is recognized as "I want to watch a TV series", the user may be given the prompt "Which actor's TV series would you like to watch?". Following the prompt, the user may express an intention within its scope, such as "I want to watch a TV series starring a particular actor"; or the user may instead express a real intention according to personal preference, outside the scope of the prompt, such as "I want to watch a comedy TV series".
Thus, by shifting the emphasis to analyzing and classifying the user's real intention, the present disclosure achieves repeated screening over multiple rounds of interaction without limiting the number of rounds, with the same screening logic in every round. With the human-computer interaction scheme of the present disclosure, the user's intention is limited neither by a preset grammar nor by the number of dialog rounds.
The man-machine interaction scheme of the present disclosure is further explained below.
FIG. 1 is a schematic flow chart diagram illustrating a human-computer interaction method according to an embodiment of the present disclosure.
Referring to fig. 1, in step S110, an intention category to which intention information in an instruction input by the user in the current turn belongs is identified.
The instruction input by the user may be voice data sent by the user, or text information input by the user through an input device such as a keyboard and a touch screen.
Intent refers to the intended goal that the user desires to achieve during the interaction. Intent information as referred to herein refers to the desired goal to be achieved in the user-entered command for the current turn. The intention information may be a positive intention or a negative intention. For example, when the voice instruction uttered by the user is "i want to watch a movie", the intention information is "watch a movie", which is an affirmative intention. When the voice command sent by the user is 'i do not want to watch comedy movies', the intention information is 'do not watch comedy movies', and the intention belongs to negative intention.
The intention category is category information obtained by classifying intention information, namely a category label. For example, when the intent information is "watch movie", the corresponding intent category may be a category label of "movie". Wherein, the intention category can be regarded as a filtering condition characterized by the instruction input by the user in the current turn.
As described above, the intention information may be a positive intention or a negative intention. Therefore, the intent categories mentioned herein may be affirmative intent categories or negative intent categories. A positive intent category may be considered a positive screening condition and a negative intent category may be considered a negative screening condition. The positive screening condition is a condition to be satisfied, and the negative screening condition is a condition to be excluded.
The intention information in the user's instruction may first be identified and then classified to derive the intention category. For example, when the instruction is text, the intention information may be determined with a semantic recognition technique such as NLU (natural language understanding). When the instruction is speech, the speech may first be converted to text, for example with ASR (automatic speech recognition), and the intention information in the text then determined by semantic recognition. Classifying the intention information can likewise be done with semantic recognition techniques, and the details are not repeated here.
As an example of the present disclosure, a plurality of different intention categories may be divided in advance according to a specific application scenario of human-computer interaction, so as to identify an intention category to which intention information in an instruction belongs. For example, multiple levels of intent may be divided, each level of intent may include one or more categories of intent. Wherein, a plurality of intention levels can be regarded as multi-level filtering conditions, and the intention levels can have a predetermined priority therebetween. Each intent category may be considered a screening condition for one dimension.
Taking an interactive scenario of video resource search as an example, the following intention levels with predetermined priorities may be divided. The first intention level may include intention categories representing the video category, such as movie, TV series, variety show, and animation. The second intention level may include intention categories representing the video type, such as costume drama, idol drama, comedy, and science fiction. The third intention level may include intention categories indicating country/region (USA, Korea, Japan, etc.), production unit (TVB, CCTV, BBC, Disney, etc.), release time (new release, a specific period, older works), and person names (actor, director, sound quality, character name, sports star, etc.).
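To make the hierarchy concrete, the three levels above can be sketched as a simple lookup structure. This is an illustrative Python sketch; the level names, priority numbers, and category identifiers are assumptions for illustration, not terms fixed by the disclosure.

```python
# Illustrative three-level intent hierarchy for a video-search scenario.
# A lower priority number means a higher-priority level.
INTENT_LEVELS = [
    {"priority": 1, "name": "video_category",
     "categories": ["movie", "tv_series", "variety_show", "animation"]},
    {"priority": 2, "name": "video_type",
     "categories": ["costume", "idol", "comedy", "science_fiction"]},
    {"priority": 3, "name": "attributes",
     "categories": ["country_region", "producer", "release_time", "person_name"]},
]

def level_of(intent_category):
    """Return the name of the intent level a category belongs to, or None."""
    for level in INTENT_LEVELS:
        if intent_category in level["categories"]:
            return level["name"]
    return None
```

Each recognized intention category can then be treated as a one-dimensional screening condition attached to its level.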
In step S120, based on the intention category of the current round, the category combination of the previous round is updated to obtain the category combination of the current round.
The category combination under each turn is built on the intention categories of previous turns, e.g. as the result of superimposing each turn's intention category onto those before it. Since the intention category of each round can be regarded as a screening condition in one dimension, the category combination under each round can be regarded as the combination of screening conditions for that round.
The intention category of the current round may be superimposed with the intention category of the previous round in the category combination of the previous round to obtain the category combination of the current round. In the overlapping process, whether the intention category of the current round is mutually exclusive with the intention category of the previous round can be judged, and the intention category which is mutually exclusive with the intention category of the current round in the category combination of the previous round can be deleted. Thus, the user can naturally and smoothly transform the intention during the multi-turn interaction.
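The update in step S120 can be sketched as follows. The concrete exclusion rule used here (two different categories drawn from the same exclusive group, e.g. two different video categories, are mutually exclusive) is an assumption for illustration; the disclosure does not prescribe one.

```python
# Sketch of the category-combination update: superimpose the current turn's
# intent category onto the previous combination, dropping any category it is
# mutually exclusive with. EXCLUSIVE_GROUPS is an illustrative assumption.
EXCLUSIVE_GROUPS = [
    {"movie", "tv_series", "variety_show", "animation"},  # one video category at a time
]

def mutually_exclusive(a, b):
    return a != b and any(a in g and b in g for g in EXCLUSIVE_GROUPS)

def update_combination(previous, current_category):
    """Return the current turn's category combination."""
    kept = [c for c in previous if not mutually_exclusive(c, current_category)]
    if current_category not in kept:
        kept.append(current_category)
    return kept
```

For example, if the previous combination is `["movie", "comedy"]` and the user now asks for a TV series, "movie" is dropped while "comedy" survives, so the user switches intention naturally without restarting the dialog.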
In step S130, based on the category combination of the current turn, the content fed back to the user by the current turn is determined.
The category combination of the current round can be regarded as the user's screening-condition combination in the current state, so the content to be fed back in the current turn can be determined from it. The feedback content may be a category label different from the intention categories in the category combination of the current round, recommendation information matching that combination, or both.
For example, as described above, a plurality of intention levels having predetermined priorities may be divided in advance, each including one or more intention categories. Preferably, the category combination of the current turn may then be combined, in priority order, with the intention categories under the levels it does not yet cover, and the result used as the content fed back to the user in the current turn, prompting the user to express a related intention and thereby narrowing the screening range.
For another example, recommendation information (such as hit information) meeting the category combination of the current turn may also be fed back to the user as a feedback result, so as to quickly feed back result information to the user, thereby improving the interaction experience of the user.
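The priority-order prompt selection described above can be sketched as follows; the level ordering and category labels are illustrative placeholders, not part of the disclosure.

```python
# Pick, in priority order, the first intent level none of whose categories
# appears in the current combination, and offer its labels as the prompt.
PROMPT_LEVELS = [  # highest priority first; labels are illustrative
    ("video_category", ["movie", "tv_series", "variety_show"]),
    ("video_type", ["costume", "comedy", "science_fiction"]),
    ("attributes", ["country_region", "producer", "person_name"]),
]

def prompt_labels(combination):
    """Return category labels from the highest-priority uncovered level."""
    for _, categories in PROMPT_LEVELS:
        if not any(c in combination for c in categories):
            return categories
    return []  # every level is covered: nothing left to prompt
```

With `["movie"]` in the combination, the first level is already covered, so the prompt comes from the video-type level.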
When the content determined from the current round's category combination is empty, intention categories in the combination may be deleted. The deletion rule may be to remove, in chronological order, a predetermined number of the earliest-added intention categories in the category combination of the current round. The content fed back in the current turn is then determined from the pruned combination; if it is still empty, an error can be returned, or the earliest-added intention categories can continue to be deleted in chronological order.
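The empty-result pruning can be sketched as follows, assuming the combination is kept in chronological order with the earliest-added category first. `search` is a caller-supplied lookup, and `drop_count` and the retry limit are illustrative choices.

```python
# Fallback for empty results: repeatedly drop the oldest intent categories
# (the combination is ordered oldest-first) and retry the search.
def search_with_backoff(combination, search, drop_count=1, max_retries=3):
    """search(combination) -> list of results. Returns (results, combination
    actually used); results is [] if every retry came up empty."""
    combo = list(combination)
    for _ in range(max_retries + 1):
        results = search(combo)
        if results:
            return results, combo
        if len(combo) <= drop_count:
            break  # nothing sensible left to prune
        combo = combo[drop_count:]  # delete the earliest-added categories
    return [], combo  # still empty: report an error upstream
```

Dropping the oldest conditions first matches the intuition that the user's most recent utterances best reflect the current intention.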
As an example of the present disclosure, in a case where the intention information in the instruction input by the user in the current round does not have a corresponding intention category, that is, the intention category to which the intention information belongs is empty, a keyword of the intention information in the instruction may be extracted, and then based on the keyword and a category combination of the previous round, the content fed back to the user in the current round may be determined.
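The keyword fallback can be sketched as follows. `classify` and `extract_keyword` are trivial stand-ins for the semantic recognition components, which the disclosure leaves unspecified.

```python
# When the intention maps to no known category, fall back to a keyword
# combined with the previous turn's category combination.
KNOWN_CATEGORIES = {"movie", "tv_series", "comedy"}  # illustrative

def classify(intent_text):
    # Stand-in classifier: exact membership test only.
    return intent_text if intent_text in KNOWN_CATEGORIES else None

def extract_keyword(intent_text):
    # Stand-in extractor: treat the whole intent text as the keyword.
    return intent_text

def filters_for_turn(intent_text, previous_combination):
    """Return (category combination, keyword) for the current turn."""
    category = classify(intent_text)
    if category is not None:
        return previous_combination + [category], None
    return previous_combination, extract_keyword(intent_text)
```

The keyword then acts as an extra free-text filter alongside the inherited category combination.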
Finally, after the content to be fed back to the user in the current turn is determined, it may be sent to the user. The sent content only serves as a prompt and does not limit the user's expression of intent; the user may express an intention outside the range characterized by the fed-back content.
So far, the implementation flow of the man-machine interaction method of the present disclosure is described in detail with reference to fig. 1.
The man-machine interaction method can be applied to searching scenes, such as video, music, information and other content searching scenes. When the man-machine interaction method of the present disclosure is applied to a search scene, the present disclosure may be implemented as a content search method implemented based on a dialog.
Specifically, in response to a content search instruction (which may be a voice instruction or an input text instruction) of the current dialog turn of the user, an intention category to which content search intention information in the content search instruction belongs may be identified. For the meaning of the intention category and the identification process thereof, reference may be made to the description above in connection with fig. 1, which is not repeated here.
Based on the intent categories identified for the current conversation turn, the category combination for the previous conversation turn may be updated to obtain the category combination for the current conversation turn. The meaning of the category combination and the update mechanism can be referred to the description above in conjunction with fig. 1, and are not described here again.
Based on the category combination for the current conversation turn, a search may be performed to determine the content the current turn feeds back to the user: content matching the category combination is found and fed back. The fed-back content may also include category labels different from the intention categories in the category combination of the current turn, to prompt the user to express search intentions for category labels not yet involved.
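A minimal sketch of the search itself, assuming each catalog item carries a set of category tags and an item matches when its tags cover every category in the combination; the catalog contents are made up for illustration.

```python
# Search over a toy catalog: a result must satisfy every screening condition
# (intent category) in the current turn's combination.
CATALOG = [
    {"title": "A", "tags": {"movie", "comedy"}},
    {"title": "B", "tags": {"tv_series", "comedy"}},
    {"title": "C", "tags": {"movie", "science_fiction"}},
]

def search(combination):
    required = set(combination)
    return [item["title"] for item in CATALOG if required <= item["tags"]]
```

Each added intention category shrinks the result set, which is exactly the round-by-round narrowing the multi-turn scheme relies on.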
In addition, the man-machine interaction method can also be realized as a man-machine interaction device.
Fig. 2 is a schematic block diagram illustrating the structure of a human-computer interaction device according to an embodiment of the present disclosure. The functional modules of the human-computer interaction device can be realized by hardware, software or a combination of hardware and software for realizing the principle of the invention. It will be appreciated by those skilled in the art that the functional blocks described in fig. 2 may be combined or divided into sub-blocks to implement the principles of the invention described above. Thus, the description herein may support any possible combination, or division, or further definition of the functional modules described herein.
In the following, functional modules that the human-computer interaction device can have and operations that each functional module can perform are briefly described, and details related thereto may be referred to the description above in conjunction with fig. 1, and are not repeated here.
Referring to fig. 2, the human-computer interaction device 200 may include a category identification module 210, a combination update module 220, and a content determination module 230.
The category identification module 210 is configured to identify an intention category to which intention information in an instruction input by a user in a current turn belongs. Where the instructions referred to herein may be voice instructions, the apparatus may be configured to determine a voice search intent of the user.
The combination updating module 220 is configured to update the category combination of the previous round based on the intention category of the current round to obtain the category combination of the current round. The combination update module 220 may superimpose the intention category of the current round with the intention category of the previous round in the category combination of the previous round to obtain the category combination of the current round.
As an example of the present disclosure, as shown in fig. 2, the combination update module 220 may optionally include a judgment module 221 and a first deletion module 223, shown by dashed boxes in the figure. The judging module 221 is configured to judge whether the intention category of the current round is mutually exclusive with the intention categories of previous rounds. The first deleting module 223 is configured to delete, from the category combination of the previous round, the intention categories that are mutually exclusive with the intention category of the current round.
The content determining module 230 is configured to determine content to be fed back to the user in the current turn based on the category combination of the current turn. The determined content may include: recommendation information matched with the category combination of the current round; and/or a category label that is different from the intent category in the category combination of the current turn.
As shown in fig. 2, the human-computer interaction device 200 may further include a transmission module 240. After the content determination module 230 determines the content that is fed back to the user in the current turn, the content may be sent to the user by the sending module 240.
As shown in fig. 2, the human-computer interaction device 200 may further include a second deleting module 250, configured to delete, in chronological order, a predetermined number of the earliest-added intention categories in the category combination of the current turn when the content is empty.
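A hedged sketch of this fallback: when the combined filters return no results, the oldest categories (those added earliest) are dropped so that the most recently expressed intents survive. The function name and the list-based representation of the chronologically ordered combination are illustrative assumptions.

```python
def relax_combination(combination, drop_count=1):
    """Delete the first `drop_count` intention categories in chronological
    order (oldest first), keeping the most recently expressed intents."""
    return combination[drop_count:]
```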
As an example of the present disclosure, as shown in FIG. 2, the human-computer interaction device 200 may optionally include a keyword extraction module 260, which is shown in the figure by a dashed box. The keyword extraction module 260 is configured to extract a keyword of the intention information in an instruction when the intention category to which the intention information in the instruction input by the user in the current turn belongs is empty. The content determination module 230 then determines the content fed back to the user in the current round based on the keyword and the category combination of the previous round.
As an example of the present disclosure, as shown in fig. 2, the human-computer interaction device 200 may optionally include a dividing module 270, shown by a dashed box in the figure, for pre-dividing a plurality of intention levels having predetermined priorities, each intention level including one or more intention categories. The content determining module 230 may determine, according to the priority order, intention categories at the intention levels not yet covered by the category combination of the current turn as the content fed back to the user in the current turn.
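The priority-level lookup can be sketched as below. The level contents loosely mirror the application example later in this document; the names, the set representation, and the "first uncovered level" policy are assumptions made for illustration.

```python
# Intention levels ordered by priority, highest first.
LEVELS = [
    {"movie", "tv_show", "variety", "animation"},         # level 1: category
    {"costume", "idol", "comedy", "sci-fi", "suspense"},  # level 2: type
    {"usa", "korea", "actor_name", "release_year"},       # level 3: misc
]

def next_level_labels(combination):
    """Return candidate labels from the highest-priority intention level
    that the current category combination does not yet cover."""
    covered = set(combination)
    for level in LEVELS:
        if not level & covered:  # no overlap: this level is not covered yet
            return sorted(level)
    return []  # every level is already covered
```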
The man-machine interaction device 200 of the present disclosure may also be configured as a content search device based on dialog implementation. For example, the category identification module 210 may identify an intent category to which content search intent information in the content search instruction belongs in response to the content search instruction of the current conversation turn of the user. The combination update module 220 may update the combination of categories for the previous conversation turn based on the intent category for the current conversation turn to obtain the combination of categories for the current conversation turn. The content determination module 230 may perform a search to determine content that the current conversation turn feeds back to the user based on the category combination of the current conversation turn.
Application example
FIG. 3 shows a schematic diagram of a human-machine interaction scheme of the present disclosure performed by a client and a server in cooperation. Taking an interaction scenario of video resource search as an example, the client may be a smart Television (TV), and the server may be a server connected to the smart TV over a network. The user interacts with the smart TV through voice, and through this interaction the smart TV feeds back video resources that meet the user's search requirements.
The first round of interaction is as follows.
In response to a voice instruction issued by the user, the client may recognize the voice instruction based on Automatic Speech Recognition (ASR) technology and convert it into text information (which may be referred to as "ASR data"). The client may then send the resulting data to the server.
The server may analyze the received data, for example, may perform semantic analysis on the data uploaded by the client by using a Natural Language Understanding (NLU) technology, so as to identify intention information therein, and may classify the identified intention information.
For example, the server may pre-divide a plurality of intention categories and store the classification information in a data file (which may be a file in XML/JS/JSGF format, etc.). After identifying the intention information, the server can compare it with the intention categories divided in the data file to determine the intention category to which the intention information belongs.
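A minimal sketch of this classification step, assuming a simple in-memory keyword table in place of the XML/JS/JSGF data file; a real system would use NLU models rather than substring matching, so everything here is illustrative.

```python
# Stand-in for the pre-divided categories stored in the data file.
CATEGORY_KEYWORDS = {
    "movie": ["movie", "film"],
    "comedy": ["comedy", "funny"],
    "suspense": ["suspense", "thriller"],
}

def classify_intent(intent_text):
    """Return the intention category the recognized text matches, or None
    if it falls outside every pre-divided category (the 'empty' case that
    triggers keyword extraction instead)."""
    lowered = intent_text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return category
    return None
```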
After the server identifies the intention category, the server may determine an interface to be fed back to the user; the fed-back interface may include tags guessed based on the intention as well as movie and TV search results. For example, fig. 4 shows an interface that includes a plurality of tags together with movie search results.
Specifically, the server may select one or more intention categories from among the intention categories not yet involved in the interaction so far and send them to the client as labels to be fed back to the user. At the same time, the server may search the video database for video data matching the identified intention category and send the video data to the client. The client can then generate a corresponding interface from the labels and/or video data sent by the server and present it to the user.
This concludes the first round of interaction. The second round of interaction proceeds as follows.
Taking the case where the content fed back by the client in the first round is tags plus recommended videos, the user may click a recommended video to watch it, at which point the interaction is paused or ended. The user may also express an intention with respect to a fed-back category label; for example, where the label is "actor", the user may speak a desired actor's name. Of course, the user is not required to express an intention within the range of the prompted category labels; in other words, the user can freely express any intention.
In response to the voice instruction issued by the user in the second round of interaction, the voice data can be converted into text information by the client and then uploaded to the server. And performing semantic analysis by the server to identify the intention type of the current turn. This process is similar to the first round of interaction process and will not be described here.
The difference is that, after the server identifies the intention category, it may combine the intention category identified in the previous round with the intention category identified in the current round to obtain an intention category combination, that is, a combination of screening conditions. The server can then determine the content to be fed back to the user in this turn according to the combination of screening conditions, send that content to the client, and have the client push it to the user.
Thus, during multi-round interaction, the intention category combination of the previous round is updated based on the intention category identified in the current round, so that the screening range is gradually narrowed until a screening result meeting the user's requirements is obtained. Moreover, the user's intention is not limited during the interaction.
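The round-by-round narrowing can be illustrated end to end with a toy in-memory "video database"; all titles and tags here are made up for the example, and the subset test stands in for the server's real search.

```python
# Each video carries a set of tags; the intersected category combination
# narrows the result set round by round.
VIDEOS = [
    {"title": "A", "tags": {"movie", "comedy"}},
    {"title": "B", "tags": {"movie", "suspense"}},
    {"title": "C", "tags": {"movie", "suspense", "liu_de_hua"}},
]

def search(combination):
    """Return titles of videos whose tags contain every category
    in the current combination (intersection screening)."""
    wanted = set(combination)
    return [v["title"] for v in VIDEOS if wanted <= v["tags"]]
```

Each additional round's category shrinks the candidate list, matching the five-round example below.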
An example of a multi-pass interaction process implemented with the present disclosure is as follows.
The multi-level screening conditions may be divided in advance. As an example, the following multi-stage screening conditions can be classified.
Primary screening conditions: category (movie, TV series, variety show, animation, etc.)
Secondary screening conditions: type (costume drama, idol drama, comedy, science fiction, etc.)
Tertiary screening conditions: country/region (USA, Korea, Japan, etc.), production company (TVB, CCTV, BBC, Disney, etc.), quality (award-winning, highly rated, classic), release time (recent, a specific period, older), person name (actor, director, voice actor, character name, sports star, etc.), and so on.
First round
User: I want to watch a movie
TV (client): Please choose (recommends second-level dimension labels for movies + popular movies)
Second round
User: A comedy
TV (client): Please choose (recommends third-level dimension labels for comedy movies + popular comedy movies)
Third round
User: A suspense one
TV: Please choose (recommends third-level dimension labels for suspense movies + popular suspense movies)
Fourth round
User: One with Liu De Hua
TV: Please choose (recommends third-level dimension labels for Liu De Hua suspense movies + popular Liu De Hua suspense movies)
Fifth round
User: One with Pengyan
TV: Please choose (recommends third-level dimension labels for Pengyan suspense movies + popular Pengyan suspense movies)
It can be seen that, in the present disclosure, the intention categories identified from the user's expressions serve as the clue for the interaction: no flow logic is preset for the user, the user's intention is not limited during the interaction, and the user can switch intentions at any time.
The present disclosure removes the emphasis from preset restrictions on multi-round flows and shifts the design emphasis to profiling the user's true intent. By classifying user intentions, repeated intersection-based screening is achieved without limiting the number of rounds, and the screening logic is the same for every round. With this method, the user's intention is constrained neither by preset grammar nor by the number of conversation turns, intentions can be switched more naturally and smoothly, and logical backtracking is possible.
When the disclosed scheme is applied to voice-interactive film search, fuzzy matching can be performed on the labels in the user's utterances to infer the user's intention and recommend related labels. By adopting the principle of label superposition with mutual-exclusion deletion, repeated screening without a limit on the number of rounds can be realized, and the user can switch labels at any level arbitrarily, without preset grammar restrictions and without the intention being limited.
Fig. 5 is a schematic structural diagram of a data processing computing device that can be used to implement the above-described human-computer interaction or content search method according to an embodiment of the present invention.
Referring to fig. 5, computing device 500 includes memory 510 and processor 520.
The processor 520 may be a multi-core processor or may include a plurality of processors. In some embodiments, processor 520 may include a general-purpose host processor and one or more special coprocessors such as a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), or the like. In some embodiments, processor 520 may be implemented using custom circuits, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
The memory 510 may include various types of storage units, such as system memory, Read-Only Memory (ROM), and permanent storage. The ROM may store static data or instructions needed by the processor 520 or other modules of the computer. The permanent storage device may be a readable and writable storage device, and may be a non-volatile device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is employed as the permanent storage; in other embodiments, the permanent storage may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a readable and writable memory device or a volatile readable and writable memory device, such as dynamic random access memory, and may store instructions and data that some or all of the processors require at runtime. Further, the memory 510 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory) as well as magnetic and/or optical disks. In some embodiments, memory 510 may include a readable and/or writable removable storage device, such as a Compact Disc (CD), a read-only Digital Versatile Disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., an SD card, a mini SD card, a Micro-SD card, etc.), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted wirelessly or by wire.
The memory 510 has stored thereon executable code, which, when executed by the processor 520, may cause the processor 520 to perform the above-mentioned human-machine interaction method or content search method.
The human-computer interaction and content search method, apparatus, and computing device according to the present disclosure have been described in detail above with reference to the accompanying drawings.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the steps defined in the above-described method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (22)

1. A human-computer interaction method, comprising:
identifying an intention category to which intention information in an instruction input by a user in the current turn belongs;
updating the category combination of the previous round based on the intention category of the current round to obtain the category combination of the current round; and
determining the content fed back to the user in the current turn based on the category combination of the current turn.
2. The human-computer interaction method of claim 1, further comprising:
sending the content to the user.
3. The human-computer interaction method according to claim 1, wherein the step of updating the category combination of the previous round comprises:
overlapping the intention category of the current round with the intention category of the previous round in the category combination of the previous round.
4. The human-computer interaction method according to claim 3, wherein the step of overlaying the intention category of the current round with the intention category of the previous round in the category combination of the previous round comprises:
judging whether the intention category of the current round is mutually exclusive with an intention category of the previous round; and
deleting, from the category combination of the previous round, the intention category that is mutually exclusive with the intention category of the current round.
5. The human-computer interaction method according to claim 1, further comprising:
deleting, in a case where the content is empty, a predetermined number of the earliest intention categories in chronological order from the category combination of the current round.
6. The human-computer interaction method according to claim 1, wherein the content comprises:
recommendation information matched with the category combination of the current round; and/or
A category label that is different from the intent category in the category combination of the current turn.
7. The human-computer interaction method according to claim 1, further comprising:
extracting a keyword of intention information in an instruction input by the user in the current turn, in a case where the intention category to which the intention information belongs is empty; and
determining the content fed back to the user in the current round based on the keyword and the category combination of the previous round.
8. The human-computer interaction method according to claim 1, further comprising: pre-dividing a plurality of intention levels having a predetermined priority, each intention level including one or more intention categories,
the step of determining the content fed back to the user in the current turn comprises: determining, according to the priority order, intention categories at intention levels not yet covered by the category combination of the current turn as the content fed back to the user in the current turn.
9. A human-computer interaction method according to any one of claims 1 to 8, wherein the instruction is a voice instruction and the method is used to determine the voice search intention of the user.
10. A method for searching contents based on dialog implementation, comprising:
in response to a content search instruction of a current conversation turn of a user, identifying an intention category to which content search intention information in the content search instruction belongs;
updating the category combination of the previous conversation turn based on the intention category of the current conversation turn to obtain the category combination of the current conversation turn; and
performing a search based on the category combination of the current conversation turn to determine the content fed back to the user in the current conversation turn.
11. A human-computer interaction device, comprising:
the category identification module is used for identifying the intention category to which the intention information in the instruction input by the user in the current turn belongs;
the combination updating module is used for updating the category combination of the previous round based on the intention category of the current round so as to obtain the category combination of the current round; and
the content determining module is used for determining the content fed back to the user in the current round based on the category combination of the current round.
12. A human-computer interaction device as claimed in claim 11, further comprising:
a sending module, configured to send the content to the user.
13. The human-computer interaction device of claim 11,
wherein the combination updating module superimposes the intention category of the current round and the intention category of the previous round in the category combination of the previous round to obtain the category combination of the current round.
14. A human-computer interaction device according to claim 13, wherein the combination update module comprises:
the judging module is used for judging whether the intention category of the current round is mutually exclusive with the intention category of the previous round; and
the first deleting module is used for deleting, from the category combination of the previous round, the intention category that is mutually exclusive with the intention category of the current round.
15. The human-computer interaction device of claim 11, further comprising:
the second deleting module is used for deleting, in chronological order, a preset number of the earliest intention categories in the category combination of the current round in a case where the content is empty.
16. A human-computer interaction device according to claim 11, wherein the content comprises:
recommendation information matched with the category combination of the current round; and/or
A category label that is different from the intent category in the category combination of the current turn.
17. The human-computer interaction device of claim 11, further comprising:
a keyword extraction module for extracting the keyword of the intention information in the instruction when the intention category to which the intention information belongs in the instruction input by the user in the current turn is empty,
wherein the content determining module determines the content fed back to the user in the current round based on the keyword and the category combination of the previous round.
18. The human-computer interaction device of claim 11, further comprising:
a division module for pre-dividing a plurality of intention levels having a predetermined priority, each intention level including one or more intention categories,
wherein the content determining module determines, according to the priority order, intention categories at intention levels not covered by the category combination of the current turn as the content fed back to the user in the current turn.
19. A human-computer interaction device as claimed in any one of claims 11 to 18, wherein the instructions are voice instructions, the device being configured to determine the user's voice search intent.
20. A content search apparatus implemented based on a dialog, comprising:
the category identification module is used for responding to a content search instruction of the current conversation turn of the user and identifying an intention category to which content search intention information in the content search instruction belongs;
the combination updating module is used for updating the category combination of the previous conversation turn based on the intention category of the current conversation turn so as to obtain the category combination of the current conversation turn; and
the content determining module is used for performing a search based on the category combination of the current conversation turn to determine the content fed back to the user in the current conversation turn.
21. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any of claims 1-10.
22. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-10.
CN201810753299.6A 2018-07-10 2018-07-10 Man-machine interaction and content search method, device, equipment and storage medium Pending CN110765312A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810753299.6A CN110765312A (en) 2018-07-10 2018-07-10 Man-machine interaction and content search method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810753299.6A CN110765312A (en) 2018-07-10 2018-07-10 Man-machine interaction and content search method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110765312A true CN110765312A (en) 2020-02-07

Family

ID=69326838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810753299.6A Pending CN110765312A (en) 2018-07-10 2018-07-10 Man-machine interaction and content search method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110765312A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597318A (en) * 2020-05-21 2020-08-28 普信恒业科技发展(北京)有限公司 Method, device and system for executing business task
CN111680121A (en) * 2020-05-07 2020-09-18 车智互联(北京)科技有限公司 Content evaluation method, computing device and storage medium
CN111753074A (en) * 2020-06-30 2020-10-09 贝壳技术有限公司 Method, device, medium and electronic equipment for realizing session
CN113190665A (en) * 2021-05-08 2021-07-30 京东数字科技控股股份有限公司 Intention recognition method and device, storage medium and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699784A (en) * 2015-03-13 2015-06-10 苏州思必驰信息科技有限公司 Data searching method and device based on interactive input
CN106484760A (en) * 2016-08-10 2017-03-08 网易传媒科技(北京)有限公司 A kind of news recommends method and apparatus
CN107168991A (en) * 2017-03-28 2017-09-15 北京三快在线科技有限公司 A kind of search result methods of exhibiting and device
CN107193978A (en) * 2017-05-26 2017-09-22 武汉泰迪智慧科技有限公司 A kind of many wheel automatic chatting dialogue methods and system based on deep learning
CN107222383A (en) * 2016-03-21 2017-09-29 科大讯飞股份有限公司 A kind of dialogue management method and system
CN107480162A (en) * 2017-06-15 2017-12-15 北京百度网讯科技有限公司 Searching method, device, equipment and computer-readable recording medium based on artificial intelligence
CN107704612A (en) * 2017-10-23 2018-02-16 北京光年无限科技有限公司 Dialogue exchange method and system for intelligent robot
CN107832439A (en) * 2017-11-16 2018-03-23 百度在线网络技术(北京)有限公司 Method, system and the terminal device of more wheel state trackings
CN107844587A (en) * 2017-11-16 2018-03-27 百度在线网络技术(北京)有限公司 Method and apparatus for updating multimedia play list
CN108108340A (en) * 2017-11-28 2018-06-01 北京光年无限科技有限公司 For the dialogue exchange method and system of intelligent robot


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680121A (en) * 2020-05-07 2020-09-18 车智互联(北京)科技有限公司 Content evaluation method, computing device and storage medium
CN111680121B (en) * 2020-05-07 2024-04-12 车智互联(北京)科技有限公司 Content evaluation method, computing device and storage medium
CN111597318A (en) * 2020-05-21 2020-08-28 普信恒业科技发展(北京)有限公司 Method, device and system for executing business task
CN111753074A (en) * 2020-06-30 2020-10-09 贝壳技术有限公司 Method, device, medium and electronic equipment for realizing session
CN113190665A (en) * 2021-05-08 2021-07-30 京东数字科技控股股份有限公司 Intention recognition method and device, storage medium and electronic equipment
CN113190665B (en) * 2021-05-08 2024-02-06 京东科技控股股份有限公司 Intention recognition method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN110765312A (en) Man-machine interaction and content search method, device, equipment and storage medium
CN108370447B (en) Content processing device, content processing method thereof and server
US9888279B2 (en) Content based video content segmentation
US10290320B2 (en) Automatic media summary creation systems and methods
KR102542788B1 (en) Electronic apparatus, method for controlling thereof, and computer program product thereof
US20220321965A1 (en) Voice recognition system, voice recognition server and control method of display apparatus for providing voice recognition function based on usage status
CN109688475B (en) Video playing skipping method and system and computer readable storage medium
WO2014130899A1 (en) Method for combining voice signals to form a continuous conversation in performing a voice search
US11816111B2 (en) Methods, systems, and media for presenting related media content items
CN103916704A (en) Dialog-type interface apparatus and method for controlling the same
US9426411B2 (en) Method and apparatus for generating summarized information, and server for the same
EP3369252B1 (en) Video content summarization and class selection
CN112511854A (en) Live video highlight generation method, device, medium and equipment
CN103974109A (en) Voice recognition apparatus and method for providing response information
CN110287375B (en) Method and device for determining video tag and server
CN109600646B (en) Voice positioning method and device, smart television and storage medium
CN109889921B (en) Audio and video creating and playing method and device with interaction function
US20140223466A1 (en) Method and Apparatus for Recommending Video from Video Library
US11178464B2 (en) Audio search results in a multi-content source environment
CN109960489B (en) Method, device, equipment, medium and question-answering system for generating intelligent question-answering system
US11868399B2 (en) System and methods for resolving query related to content
CN112397060B (en) Voice instruction processing method, system, equipment and medium
KR102160095B1 (en) Method for analysis interval of media contents and service device supporting the same
CN106991152B (en) Artificial intelligence-based on-demand resource retrieval method and device and terminal equipment
JP2016224969A (en) Ontology construction method and device for dialogue system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40022588

Country of ref document: HK

TA01 Transfer of patent application right

Effective date of registration: 20201119

Address after: Room 603, 6 / F, Roche Plaza, 788 Cheung Sha Wan Road, Kowloon, China

Applicant after: Zebra smart travel network (Hong Kong) Limited

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.