CN116737883A - Man-machine interaction method, device, equipment and storage medium - Google Patents

Man-machine interaction method, device, equipment and storage medium Download PDF

Info

Publication number
CN116737883A
CN116737883A (application CN202311026638.8A)
Authority
CN
China
Prior art keywords
information
interaction
source
target
natural language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311026638.8A
Other languages
Chinese (zh)
Inventor
梅林海
吴凡
李锐
陈洋
刘权
王士进
魏思
胡国平
刘聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202311026638.8A priority Critical patent/CN116737883A/en
Publication of CN116737883A publication Critical patent/CN116737883A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the present application discloses a man-machine interaction method, apparatus, device, and storage medium. After input interaction information is obtained, a target retrieval source is determined based on the interaction information; associated information of the interaction information is retrieved through the target retrieval source; the associated information is subjected to color rendering (i.e., polishing) by a natural language processing model to generate response information for the interaction information; and the response information is output. In this application, the interaction information is not fed directly into the natural language processing model; instead, information related to it is retrieved from a target retrieval source determined from the interaction information. Because a retrieval source can update its information, the retrieved associated information can be up to date. Polishing this retrieved information with the natural language processing model then yields more natural response information, realizing accurate and natural man-machine interaction and improving man-machine interaction performance.

Description

Man-machine interaction method, device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a human-computer interaction method, apparatus, device, and storage medium.
Background
With the continuous development of artificial intelligence technology, dialog systems are receiving more and more attention. At present, many industries begin to apply dialogue systems to customer service, intelligent home control, education and other fields so as to improve working efficiency and user experience.
At present, in order to provide a smoother interaction experience, introducing natural language processing models with memory and powerful language capabilities into dialogue systems has become a research hotspot. Current dialogue systems feed the user's input directly into the natural language processing model and return the model's output to the user. However, what such a model knows is limited to what it learned before its training cutoff: it cannot update its knowledge in real time, learn continuously, or anticipate the future. When the user's input concerns information that postdates training, or future information, the dialogue system cannot give correct feedback, so the man-machine interaction performance of dialogue systems still needs to be improved.
Disclosure of Invention
In view of the above, the present application provides a human-computer interaction method, apparatus, device and storage medium to improve human-computer interaction performance.
In order to achieve the above object, the following solutions have been proposed:
a human-machine interaction method, comprising:
obtaining input interaction information;
determining a target retrieval source based on the interaction information;
retrieving associated information of the interactive information through the target retrieval source;
performing color rendering processing on the associated information through a natural language processing model to generate response information of the interaction information;
and outputting the response information.
The method, optionally, the determining a target retrieval source based on the interaction information includes:
performing intention understanding on the interaction information to obtain interaction intention;
determining a retrieval source corresponding to the interaction intent as the target retrieval source; different interaction intents correspond to different retrieval sources.
The method, optionally, of performing intent understanding on the interaction information includes:
and carrying out intention understanding on the interaction information through the natural language processing model.
The method, optionally, the retrieving, by the target retrieval source, the associated information of the interaction information includes:
if the interaction intention is a first interaction intention, retrieving associated information of the interaction information in a knowledge base;
and if the interaction intention is not the first interaction intention, invoking a third party information source corresponding to the non-first interaction intention to retrieve the associated information of the interaction information.
The method, optionally, the first interaction intention includes: intent related to product sales; the retrieving the associated information of the interaction information in the knowledge base comprises the following steps:
and retrieving the product information associated with the interaction information from a product knowledge base.
The method, optionally, wherein the non-first interaction intent includes: a source identifier of a third-party source, and the target information that the third-party source requires for information retrieval; the invoking of the third-party source corresponding to the non-first interaction intent includes the following steps:
and inputting, based on the source identifier, the target information into the third-party source corresponding to the source identifier, to obtain the associated information of the interaction information fed back by the third-party source.
The method, before obtaining the input interaction information, optionally further includes:
when a target face is detected for the first time, outputting target voice information; the target voice information is used for guiding the user represented by the target face to perform man-machine interaction;
the obtaining the input interaction information comprises the following steps:
collecting voice information of the user;
and carrying out voice recognition on the voice information to obtain the interaction information.
The method, optionally, further comprises:
obtaining a facial image of the user; processing the face image to obtain state information of the user;
the color rendering processing is carried out on the associated information through a natural language processing model, and response information of the interaction information is generated, and the method comprises the following steps:
and sequentially splicing the color rendering instruction, the state information of the user and the associated information, and then inputting the spliced information into the natural language processing model to obtain the color rendering associated information generated by the natural language processing model as the response information.
A human-machine interaction device, comprising:
the acquisition module is used for acquiring the input interaction information;
the determining module is used for determining a target retrieval source based on the interaction information;
the retrieval module is used for retrieving the associated information of the interaction information through the target retrieval source;
the color rendering module is used for rendering the associated information through a natural language processing model to generate response information of the interaction information;
and the output module is used for outputting the response information.
A man-machine interaction device comprises a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the human-computer interaction method according to any one of the above.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the human-machine interaction method of any of the preceding claims.
From the above technical solution, it can be seen that with the man-machine interaction method, apparatus, device, and storage medium provided by the embodiments of the present application, after input interaction information is obtained, a target retrieval source is determined based on the interaction information; associated information of the interaction information is retrieved through the target retrieval source; the associated information is subjected to color rendering (polishing) by a natural language processing model to generate response information for the interaction information; and the response information is output. The interaction information is not fed directly into the natural language processing model; instead, related information is retrieved from a target retrieval source determined from the interaction information. Because the retrieval source can update its information, the retrieved associated information can be up to date; polishing it with the natural language processing model yields more natural response information, realizing accurate and natural man-machine interaction and improving man-machine interaction performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below are only some embodiments of the present application; other drawings can be obtained from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a flowchart of an implementation of a human-computer interaction method disclosed in an embodiment of the present application;
FIG. 2 is a flow chart of one implementation of determining a target search source based on interaction information according to an embodiment of the present application;
FIG. 3 is an example of a system architecture diagram of a human-machine interaction scheme disclosed in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a man-machine interaction device according to an embodiment of the present application;
fig. 5 is a block diagram of a hardware structure of a man-machine interaction device according to an embodiment of the present application.
Detailed Description
Before describing the scheme of the application, related concepts are explained.
Instruction (Prompt): the content sent to an AI (such as a large language model) when a user converses with it. A prompt may be a textual description, for example entering "please recommend a popular song for me" during a dialogue with the AI, or a parameter description in a prescribed format, for example supplying the relevant drawing parameters in a fixed format when asking the AI to draw.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The scheme of the application is provided for improving the man-machine interaction performance.
As shown in fig. 1, a flowchart for implementing a man-machine interaction method according to an embodiment of the present application may include:
step S101: and obtaining the input interaction information.
The interaction information may also be referred to as a user instruction, a user question, and the like.
Optionally, the interaction information may be entered by the user into a text editing box via a text input device (e.g., a keyboard or touch screen), or may be entered by voice. Accordingly, the input interaction information may be obtained through the text editing box, or the collected voice data may be passed through a speech recognition device to obtain the input interaction information.
Step S102: a target retrieval source is determined based on the interaction information.
The target retrieval source is an information retrieval source with information real-time updating capability, for example, the target retrieval source can be a knowledge base, or can be a third party information source and the like.
One information retrieval source corresponding to the interaction information may be determined, as the target retrieval source, from among a plurality of information retrieval sources having real-time information updating capability.
The target retrieval sources determined from different pieces of interaction information may be the same or different.
Step S103: and retrieving the associated information of the interactive information through the target retrieval source.
In the application, the searched associated information is the initial response information of the interaction information.
Step S104: and performing color rendering processing on the associated information through the natural language processing model to generate response information of the interaction information.
Because the retrieved associated information (i.e., the initial response information) tends to read stiffly, outputting it directly would give the user a poor dialogue experience. The associated information is therefore subjected to color rendering (polishing) by a natural language processing model to generate smoother, more natural response information.
The natural language processing model may be a generative model, such as a large language model. As an example, generative models may include, but are not limited to, models based on the Transformer architecture, such as GPT (Generative Pre-trained Transformer)-3 or GPT-4. Other generative models may also be used, such as PaLM (Pathways Language Model) or T5 (Text-to-Text Transfer Transformer).
As an example, the color rendering instruction and the associated information may be input in sequence into the natural language processing model to obtain the polished associated information, i.e., the response information, output by the model. The color rendering instruction instructs the natural language processing model to polish the associated information.
Alternatively,
the color rendering instruction, the interaction information, and the associated information may be input in sequence into the natural language processing model to obtain the polished associated information, i.e., the response information, output by the model. Here the color rendering instruction instructs the model to polish the associated information according to the interaction information, so the color rendering result obtained is more contextually relevant.
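Purely as an illustrative sketch of how such a prompt might be assembled — the instruction wording, the function name, and the optional user-state field (used in a later embodiment) are assumptions of this sketch, not part of the disclosure:

```python
def build_color_rendering_prompt(interaction_info: str, associated_info: str,
                                 user_state: str | None = None) -> str:
    """Splice the color rendering instruction, optional user state information,
    the interaction information, and the retrieved associated information in
    sequence. All wording here is illustrative only."""
    parts = ["Polish the retrieved content below into a fluent, natural reply "
             "that answers the user's input."]
    if user_state:  # state information (gender, age, mood, ...), if available
        parts.append(f"User state: {user_state}")
    parts.append(f"User input: {interaction_info}")
    parts.append(f"Retrieved content: {associated_info}")
    return "\n".join(parts)

# Hypothetical usage, assuming some generate() wrapper around the model:
# response = generate(build_color_rendering_prompt(
#     "When does the store close?", "Business hours: 9:00-21:00 daily"))
```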
Step S105: and outputting response information.
Alternatively, the response information may be displayed directly.
Alternatively,
at least part of the response information may be synthesized into voice and the voice output; alternatively, the response information may be output in synchronization with the voice. That is, the voice alone may be played, or the response information may be displayed while the voice is played.
Alternatively,
at least part of the response information may be synthesized into voice, and an avatar video corresponding to the voice generated; the voice is then output in synchronization with the avatar video, or the response information, the voice, and the avatar video are output synchronously. That is, the voice alone may be played while the avatar video is played, or the response information may be displayed while the avatar video and the voice are played.
According to the man-machine interaction method provided by the embodiment of the present application, after input interaction information is obtained, a target retrieval source is determined based on the interaction information; associated information of the interaction information is retrieved through the target retrieval source; the associated information is subjected to color rendering (polishing) by a natural language processing model to generate response information for the interaction information; and the response information is output. The interaction information is not fed directly into the natural language processing model; instead, related information is retrieved from a target retrieval source determined from the interaction information. Because the retrieval source can update its information in real time, the retrieved associated information can be up to date; polishing it with the natural language processing model yields more natural response information, realizing accurate and natural man-machine interaction and improving man-machine interaction performance.
The retrieval source guarantees the accuracy of the reply content. Combining a retrieval source with the natural language processing model overcomes the model's inability to update its information in real time and reduces the randomness of the response information the model generates, thereby improving the fluency and accuracy of man-machine interaction.
In an alternative embodiment, a flowchart of an implementation of determining the target retrieval source based on the interaction information is shown in fig. 2, and may include:
step S201: and carrying out intention understanding on the interaction information to obtain the interaction intention.
Optionally, a pre-trained intent recognition model (i.e., a model dedicated to intent understanding) may be used to perform intent understanding on the interaction information to obtain its interaction intent.
Alternatively,
the intent understanding of the interactive information can be performed by using the natural language processing model to obtain the interactive intent. As an example, the intent understanding instruction and the interaction information may be sequentially input into the above-mentioned natural language processing model, so as to obtain the interaction intent output by the natural language processing model. The intent understanding instruction instructs the natural language processing model to understand intent of the interactive information.
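As a sketch only — the instruction text, the JSON convention, and the `generate` wrapper are assumptions, since the disclosure does not fix an output format — intent understanding through the model could look like:

```python
import json

def understand_intent(interaction_info: str) -> dict:
    """Input the intent understanding instruction and the interaction
    information in sequence, and parse the model's structured reply.
    generate() is an assumed wrapper around the natural language model."""
    instruction = ("Identify the intent of the user input below. Answer as JSON "
                   "with a 'skill' field ('sales', 'weather', ...) plus any slots.")
    return json.loads(generate(instruction + "\n" + interaction_info))

# e.g. "What will the weather be in Hefei tomorrow?" might yield
# {"skill": "weather", "time": "tomorrow", "location": "Hefei"}
```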
Step S202: determining the retrieval source corresponding to the interaction intent as the target retrieval source; different interaction intents correspond to different retrieval sources.
The present application presets a correspondence between different interaction intents and retrieval sources; after the interaction intent of the interaction information is obtained, the retrieval source corresponding to that intent is determined as the target retrieval source, as the sketch below illustrates.
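A minimal sketch of such a preset correspondence follows; the intent names and the two stub retrieval functions are hypothetical placeholders:

```python
from typing import Callable

def search_product_knowledge_base(query: str) -> str:
    ...  # embedding-based retrieval over the product knowledge base (sketched below)

def call_weather_source(time: str, location: str) -> str:
    ...  # invocation of a third-party weather source (sketched further below)

# Preset correspondence: different interaction intents -> different retrieval sources.
RETRIEVAL_SOURCES: dict[str, Callable[..., str]] = {
    "sales": search_product_knowledge_base,   # first intent -> knowledge base
    "weather": call_weather_source,           # non-first intent -> third-party source
}

def determine_target_source(interaction_intent: dict) -> Callable[..., str]:
    """Return the retrieval source corresponding to the interaction intent."""
    return RETRIEVAL_SOURCES[interaction_intent["skill"]]
```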
Optionally, if the interaction intention of the interaction information is the first interaction intention, the associated information of the interaction information is retrieved in the knowledge base.
As an example, the first interaction intent may be: the intention related to the product sales, wherein the knowledge base corresponding to the first interaction intention is a product knowledge base; based on this, the above-mentioned retrieving the association information of the interaction information in the knowledge base may include: product information associated with the interaction information is retrieved from a product knowledge base.
Specifically, the interaction information may be encoded to obtain its encoding features, and the encoding features of each knowledge segment in the product knowledge base obtained (the segments may be encoded in advance and their features stored in the product knowledge base). The degree of association between each knowledge segment and the interaction information is then computed from the encoding features of the interaction information and of that segment, and the knowledge segment with the greatest degree of association is determined to be the product information associated with the interaction information. The degree of association between a knowledge segment and the interaction information may be characterized by the distance between their encoding features.
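As an illustrative sketch of this step, assuming an `encode` function mapping text to a fixed-length vector (the disclosure does not specify the encoder) and cosine similarity as the association measure:

```python
import numpy as np

def retrieve_product_info(interaction_info: str, segments: list[str],
                          segment_features: np.ndarray, encode) -> str:
    """Return the knowledge segment with the greatest degree of association
    with the interaction information. segment_features holds one pre-computed
    encoding-feature row per knowledge segment."""
    q = encode(interaction_info)
    q = q / np.linalg.norm(q)                              # normalize the query feature
    feats = segment_features / np.linalg.norm(segment_features, axis=1, keepdims=True)
    association = feats @ q                                # cosine similarity per segment
    return segments[int(np.argmax(association))]           # greatest association wins
```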
And if the interaction intention of the interaction information is not the first interaction intention, invoking a third party information source corresponding to the non-first interaction intention to retrieve the associated information of the interaction information.
The different non-first interaction intents correspond to different third party sources.
As an example, the non-first interaction intent includes a source identifier, and the target information that the corresponding third-party source requires for information retrieval. A source identifier characterizes a third-party source, and different source identifiers characterize different third-party sources. For example, if the interaction information is "What will the weather be in Hefei tomorrow?", the non-first interaction intent may be { skill: weather, time: tomorrow, location: Hefei }. The content "weather" of the "skill" field is the source identifier, while the contents of the other fields ("tomorrow" in the "time" field and "Hefei" in the "location" field) are the information the weather source requires for weather retrieval.
Based on this, the above-mentioned process of calling the third party source corresponding to the non-first interactive intention to retrieve the associated information of the interactive information may include:
and inputting the target information into a third party information source corresponding to the information source identifier based on the information source identifier, and obtaining the associated information of the interaction information fed back by the third party information source according to the target information.
In an alternative embodiment, the color rendering capability of the natural language processing model may be trained, on the basis of the product knowledge base, using a first training data set related to product sales. Each training sample in the first training data set contains multi-turn dialogue data or single-turn dialogue data, together with the knowledge segment corresponding to each turn of dialogue. Each turn of dialogue includes user input content and machine reply content, and the knowledge segment corresponding to each turn is extracted from the product knowledge base. The product knowledge in the product knowledge base may include, but is not limited to: official product documentation, user forums, customer feedback, and the like.
Both the multi-turn and the single-turn dialogue data are constructed to simulate real sales scenarios. Multi-turn dialogue data preserves the naturalness and fluency of conversation, which helps the natural language processing model perform well in practical applications, fit into a variety of real dialogue scenarios, and provide more effective service to users. Single-turn dialogue data gives the model strong knowledge retrieval and extraction capability in practice, so that it can answer a wide range of common and complex questions and meet user needs.
As shown in table 1, an example of a training sample is provided in an embodiment of the present application:
TABLE 1
The training sample shown in Table 1 contains multi-turn dialogue data, together with the knowledge segment associated with each turn of dialogue.
As shown in table 2, an example of two training samples provided in an embodiment of the present application:
TABLE 2
Both training samples in Table 2 contain single-turn dialogue data and the corresponding knowledge segments.
The machine reply content in a training sample may be manually annotated; specifically, it may be obtained by manually polishing the knowledge segments.
Based on the first training data set, the color rendering capability of the natural language processing model may be trained as follows:
when the natural language processing model is trained with a training sample from the first training data set, then for each turn of dialogue data in that sample, the color rendering instruction, the user content of that turn, and the knowledge segment corresponding to that turn are input in sequence into the natural language processing model to obtain the color rendering result it generates; the model parameters are then updated with the objective of bringing the color rendering result close to the machine reply content of that turn. The color rendering instruction instructs the natural language processing model to polish the knowledge segment according to the user content.
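For illustration only, one update of such supervised training could be sketched as below, in the style of the Hugging Face transformers library; the checkpoint name and the instruction wording are placeholders, and the disclosure does not tie the training to any particular framework:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-causal-lm")        # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("some-causal-lm")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def color_rendering_train_step(instruction: str, user_content: str,
                               knowledge_segment: str, machine_reply: str) -> float:
    """Input the instruction, user content, and knowledge segment in sequence;
    update the model so its output approaches the annotated machine reply."""
    prompt = f"{instruction}\n{user_content}\n{knowledge_segment}\n"
    ids = tok(prompt + machine_reply, return_tensors="pt").input_ids
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    labels = ids.clone()
    labels[:, :prompt_len] = -100            # compute loss on the reply tokens only
    loss = model(input_ids=ids, labels=labels).loss
    loss.backward(); opt.step(); opt.zero_grad()
    return loss.item()
```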
By training the natural language processing model on both multi-turn and single-turn dialogue data, the model can better adapt to user needs in practical applications, answer questions intelligently, and resolve user questions efficiently.
In an alternative embodiment, the color rendering capability of the natural language processing model may be trained, on the basis of the third-party sources, using a second training data set concerning third-party source information. Each training sample in the second training data set includes user input content, machine reply content, and the feedback content of a third-party source for that user input. The machine reply content may be manually annotated; specifically, it may be obtained by manually polishing the third-party source's feedback content for the user input.
As shown in table 3, an example of a training sample is provided in an embodiment of the present application:
TABLE 3
Based on the second training data set, the color rendering capability of the natural language processing model can be obtained through training in the following manner:
when the natural language processing model is trained with a training sample from the second training data set, the color rendering instruction, the user content of that sample, and the third-party source feedback content of that sample are input in sequence into the natural language processing model to obtain the color rendering result it generates; the model parameters are then updated with the objective of bringing the color rendering result close to the machine reply of that sample. The color rendering instruction instructs the natural language processing model to polish the third-party source's feedback content according to the user content.
The intent understanding capability of the natural language processing model may be trained with a third training data set. Each sample in the third training data set includes user input content and its corresponding intent, where each item of user input corresponds either to the first intent (e.g., sales-related) or to a non-first intent (e.g., weather query, flight query, hotel query, stock query, etc.).
The intent understanding capabilities of the natural language processing model may be trained as follows:
when the natural language processing model is trained with a training sample from the third training data set, the intent understanding instruction and the user content of that sample are input in sequence into the natural language processing model to obtain the intent understanding result it generates; the model parameters are then updated with the objective of bringing the intent understanding result close to the intent corresponding to the user input content of that sample. The intent understanding instruction instructs the natural language processing model to perform intent understanding on the user input content.
In an optional embodiment, before obtaining the input interaction information, the man-machine interaction method provided by the embodiment of the application may further include:
when a target face is detected for the first time, outputting target voice information; the target voice information is used for guiding the user of the target face representation to perform man-machine interaction.
Here, detecting the target face for the first time means detecting it for the first time within a preset period. When the target face is first detected, the target voice information is actively output to draw the user's attention and invite participation in man-machine interaction. For example, the target voice information may be: "Hello, this is the xxx camping store. Is there anything you would like to know? I can give you a detailed introduction."
Accordingly, the obtaining the input interaction information may include:
collecting voice information of a user represented by a target face; and carrying out voice recognition on the collected voice information to obtain the interaction information.
That is, after the target face is first detected, it can be locked onto (i.e., face tracking is performed), and the interaction information entered by voice by the user represented by that face can be obtained. Specifically, images and voice may be collected simultaneously, and face detection performed on the collected image sequence to obtain a sequence of face-region images. Each frame of face-region image is matched against the image of the target face detected the first time, so as to determine a target image sequence, namely the sequence of facial images of the user represented by the target face. The voice data corresponding to the target image sequence is obtained and denoised with the help of the target image sequence, yielding the denoised voice information of the user represented by the target face. (For example, fused information combining the voice data and the target image sequence may be obtained and driven toward the denoised voice data; that fused information then serves as the denoised voice information of the user represented by the target face.) Speech recognition is performed on the denoised voice information, and the speech recognition result is the interaction information input by the user represented by the target face.
By automatically filtering out background noise (non-speech sounds, and the speech of users other than the one represented by the target face) and concentrating processing on a single user's voice signal, the fluency and accuracy of man-machine interaction can be further improved.
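The audio-visual fusion itself is model-dependent; purely as a sketch of the face-tracking half, assuming a hypothetical `embed_face` feature extractor and a fixed similarity threshold:

```python
import numpy as np

def match_target_face(face_regions, target_embedding, embed_face, thresh=0.8):
    """Mark, frame by frame, whether a detected face region matches the
    first-detected target face; matching frames form the target image
    sequence whose timestamps guide denoising of the voice data."""
    t = target_embedding / np.linalg.norm(target_embedding)
    flags = []
    for region in face_regions:                # one detected face region per frame
        e = embed_face(region)
        e = e / np.linalg.norm(e)
        flags.append(float(e @ t) >= thresh)   # cosine-similarity test
    return flags
```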
In an optional embodiment, the human-computer interaction method provided by the application may further include:
obtaining a facial image of the user represented by the target face, and processing the facial image to obtain the user's state information. The state information of the user may include, but is not limited to: gender, age, mood, and the like.
The facial image may be processed with a pre-trained state recognition model to obtain the user's state information output by that model.
Correspondingly, one implementation of performing color rendering on the associated information through the natural language processing model to generate the response information of the interaction information may be as follows:
the related information is subjected to color rendering processing based on the state information of the user through the natural voice model, specifically, the color rendering instruction, the state information of the user, the interaction information and the related information are sequentially spliced and then input into the natural language processing model, and the related information after color rendering generated by the natural language processing model is obtained to serve as response information.
The color rendering instruction instructs the natural language processing model to polish the associated information based on the user's state information and the interaction information, so that the polished associated information matches the user's state, further improving the fluency and accuracy of man-machine interaction.
As shown in FIG. 3, an example system architecture of the man-machine interaction scheme provided by an embodiment of the present application is illustrated, taking a robot as the machine that performs the interaction.
The robot collects images and voice in real time.
Active greeting: when a user enters the robot's image-acquisition range, the robot can capture the user's facial image and voice. When the robot determines from the captured facial images that the user has approached it for the first time within a preset period, it actively outputs voice information, for example: "Hello, this is the xxx direct-sale store. Is there anything you would like to know? I can give you a detailed introduction."
Judging the user's intent: if the user expresses a requirement to the robot by voice, the robot performs speech recognition on the collected voice to obtain an interaction text, and then performs intent understanding on that text to obtain the interaction intent. The robot may invoke a dedicated intent understanding model for this, or invoke the natural language processing model described above. If the interaction intent is a product-sales-related intent (abbreviated as "sales" in FIG. 3), the robot queries the product knowledge base for product information associated with the interaction text. If the interaction intent is not product-sales-related (abbreviated as "chitchat" in FIG. 3), the robot invokes the third-party source corresponding to the intent to obtain that source's feedback information.
Polishing the reply with the large model: the natural language processing model described above is invoked to perform color rendering on the product information or the feedback information, yielding the color rendering result.
Multimodal reply by the robot: the robot synthesizes the color rendering result into voice, generates an avatar video corresponding to the voice, and synchronously outputs the color rendering result, the voice, and the avatar video.
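Tying the sketches above together, the overall flow of FIG. 3 might read as follows; `synthesize_speech`, `render_avatar`, `play_synchronized`, and the global retrieval data are further assumptions of this sketch:

```python
def respond(user_text: str) -> None:
    intent = understand_intent(user_text)                 # judge the user's intent
    if intent["skill"] == "sales":                        # sales -> product knowledge base
        associated = retrieve_product_info(user_text, SEGMENTS,
                                           SEGMENT_FEATURES, encode)
    else:                                                 # chitchat -> third-party source
        associated = call_third_party_source(intent, SOURCES)
    reply = generate(build_color_rendering_prompt(user_text, associated))
    speech = synthesize_speech(reply)                     # multimodal reply
    video = render_avatar(speech)
    play_synchronized(reply, speech, video)
```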
Corresponding to the method embodiment, the embodiment of the present application further provides a man-machine interaction device, as shown in fig. 4, which is a schematic structural diagram of the man-machine interaction device provided by the embodiment of the present application, and may include:
an obtaining module 401, a determining module 402, a retrieval module 403, a color rendering module 404, and an output module 405; wherein:
the obtaining module 401 is configured to obtain input interaction information;
the determining module 402 is configured to determine a target retrieval source based on the interaction information;
the retrieval module 403 is configured to retrieve association information of the interaction information through the target retrieval source;
the color rendering module 404 is configured to perform color rendering processing on the associated information through a natural language processing model, and generate response information of the interaction information;
the output module 405 is configured to output the response information.
According to the man-machine interaction device provided by the embodiment of the present application, after input interaction information is obtained, a target retrieval source is determined based on the interaction information; associated information of the interaction information is retrieved through the target retrieval source; the associated information is subjected to color rendering (polishing) by a natural language processing model to generate response information for the interaction information; and the response information is output. The interaction information is not fed directly into the natural language processing model; instead, related information is retrieved from a target retrieval source determined from the interaction information. Because the retrieval source can update its information, the retrieved associated information can be up to date; polishing it with the natural language processing model yields more natural response information, realizing accurate and natural man-machine interaction and improving man-machine interaction performance.
In an alternative embodiment, the determining module 402 is configured to:
performing intention understanding on the interaction information to obtain interaction intention;
determining a retrieval source corresponding to the interaction intent as the target retrieval source; different interaction intents correspond to different retrieval sources.
In an alternative embodiment, when the determining module 402 performs intention understanding on the interaction information, the determining module is configured to:
and carrying out intention understanding on the interaction information through the natural language processing model.
In an alternative embodiment, the retrieving module 403 is configured to:
if the interaction intention is a first interaction intention, retrieving associated information of the interaction information in a knowledge base;
and if the interaction intention is not the first interaction intention, invoking a third party information source corresponding to the non-first interaction intention to retrieve the associated information of the interaction information.
In an alternative embodiment, the first interaction intention includes: intent related to product sales; the retrieving module 403 is configured to, when retrieving the association information of the interaction information in the knowledge base:
and retrieving the product information associated with the interaction information from a product knowledge base.
In an alternative embodiment, the non-first interaction intent includes: a source identifier of a third-party source, and the target information that the third-party source requires for information retrieval; when invoking the third-party source corresponding to the non-first interaction intent, the retrieval module 403 is configured to:
input, based on the source identifier, the target information into the third-party source corresponding to the source identifier, to obtain the associated information of the interaction information fed back by the third-party source.
In an alternative embodiment, the output module 405 is configured to:
synthesizing the response information into voice;
generating an avatar video corresponding to the voice;
and outputting at least the voice in synchronization with the avatar video.
In an alternative embodiment, the output module 405 is further configured to, before obtaining the input interaction information:
when a target face is detected for the first time, outputting target voice information; the target voice information is used for guiding the user represented by the target face to perform man-machine interaction;
the obtaining module 401 is configured to:
collecting voice information of the user;
and carrying out voice recognition on the voice information to obtain the interaction information.
In an alternative embodiment, the man-machine interaction device further comprises:
a state acquisition module for acquiring a face image of the user; processing the face image to obtain state information of the user;
the color rendering module 404 is configured to:
and carrying out color rendering processing on the associated information based on the state information of the user through a natural voice model, wherein specifically, a color rendering instruction, the state information of the user, the interaction information and the associated information are sequentially input into the natural language processing model to obtain the associated information after color rendering generated by the natural language processing model as the response information.
The man-machine interaction device provided by the embodiment of the application can be applied to man-machine interaction equipment such as PC terminals, mobile terminals, all-in-one machines, cloud platforms, servers, server clusters and the like. Optionally, fig. 5 shows a block diagram of a hardware structure of the man-machine interaction device, and referring to fig. 5, the hardware structure of the man-machine interaction device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete the communication with each other through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement embodiments of the present application, or the like;
the memory 3 may comprise high-speed RAM memory and may further comprise non-volatile memory, for example at least one magnetic disk memory;
wherein the memory stores a program, and the processor may invoke the program stored in the memory, the program being configured to:
obtaining input interaction information;
determining a target retrieval source based on the interaction information;
retrieving associated information of the interactive information through the target retrieval source;
performing color rendering processing on the associated information through a natural language processing model to generate response information of the interaction information;
and outputting the response information.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
The embodiment of the present application also provides a storage medium storing a program adapted to be executed by a processor, the program being configured to:
obtaining input interaction information;
determining a target retrieval source based on the interaction information;
retrieving associated information of the interactive information through the target retrieval source;
performing color rendering processing on the associated information through a natural language processing model to generate response information of the interaction information;
and outputting the response information.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. In addition, the coupling or direct coupling or communication connection shown or discussed between components may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should further be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for identical or similar parts among the embodiments, reference may be made to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A human-computer interaction method, comprising:
obtaining input interaction information;
determining a target retrieval source based on the interaction information;
retrieving associated information of the interactive information through the target retrieval source;
performing color rendering processing on the associated information through a natural language processing model to generate response information of the interaction information;
and outputting the response information.
2. The method of claim 1, wherein the determining a target retrieval source based on the interaction information comprises:
performing intention understanding on the interaction information to obtain interaction intention;
determining a retrieval source corresponding to the interaction intent as the target retrieval source; different interaction intents correspond to different retrieval sources.
3. The method of claim 2, wherein the intent understanding of the interaction information comprises:
and carrying out intention understanding on the interaction information through the natural language processing model.
4. The method of claim 2, wherein said retrieving associated information of said interaction information by said target retrieval source comprises:
if the interaction intention is a first interaction intention, retrieving associated information of the interaction information in a knowledge base;
and if the interaction intention is not the first interaction intention, invoking a third party information source corresponding to the non-first interaction intention to retrieve the associated information of the interaction information.
5. The method of claim 4, wherein the first interaction intent comprises: intent related to product sales; the retrieving the associated information of the interaction information in the knowledge base comprises the following steps:
and retrieving the product information associated with the interaction information from a product knowledge base.
6. The method of claim 4, wherein the non-first interaction intent comprises: a source identifier of a third-party source, and the target information that the third-party source requires for information retrieval; the invoking of the third-party source corresponding to the non-first interaction intent comprises the following steps:
inputting, based on the source identifier, the target information into the third-party source corresponding to the source identifier, to obtain the associated information of the interaction information fed back by the third-party source.
7. The method of claim 1, further comprising, prior to obtaining the entered interaction information:
when a target face is detected for the first time, outputting target voice information; the target voice information is used for guiding the user represented by the target face to perform man-machine interaction;
the obtaining the input interaction information comprises the following steps:
collecting voice information of the user;
and carrying out voice recognition on the voice information to obtain the interaction information.
8. The method as recited in claim 7, further comprising:
obtaining a facial image of the user; processing the face image to obtain state information of the user;
the color rendering processing is carried out on the associated information through a natural language processing model, and response information of the interaction information is generated, and the method comprises the following steps:
and sequentially inputting the color rendering instruction, the state information of the user, the interaction information and the association information into the natural language processing model to obtain color rendering associated information generated by the natural language processing model as the response information.
9. A man-machine interaction device, comprising:
an acquisition module configured to obtain input interaction information;
a determining module configured to determine a target retrieval source based on the interaction information;
a retrieval module configured to retrieve associated information of the interaction information through the target retrieval source;
a color rendering module configured to perform color rendering processing on the associated information through a natural language processing model to generate response information of the interaction information;
and an output module configured to output the response information.
10. A man-machine interaction device, comprising a memory and a processor;
wherein the memory is configured to store a program;
and the processor is configured to execute the program to implement the steps of the man-machine interaction method according to any one of claims 1-8.
11. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the man-machine interaction method according to any one of claims 1-8.
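
The routing described in claims 2 through 6 amounts to a small dispatch layer: intention understanding selects a retrieval source, a product-sales intention is answered from the product knowledge base, and any other intention carries an information source identifier plus the target information that source needs. Below is a minimal Python sketch under those assumptions; every function and field name (understand_intent, retrieve_from_product_kb, and so on) is illustrative, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Intent:
    name: str                                        # e.g. "product_sales"
    source_id: str = ""                              # information source identifier (claim 6)
    target_info: dict = field(default_factory=dict)  # slots that source needs (claim 6)

def understand_intent(interaction_info: str) -> Intent:
    """Intention understanding through the natural language processing model
    (claim 3). Stubbed with keyword rules; a real system would prompt the
    model and parse a structured reply."""
    if "price" in interaction_info or "buy" in interaction_info:
        return Intent(name="product_sales")
    return Intent(name="weather_query", source_id="weather_api",
                  target_info={"city": "Hefei"})

def retrieve_from_product_kb(interaction_info: str) -> str:
    """Knowledge-base lookup for the first interaction intention (claim 5)."""
    return f"[product info matching: {interaction_info}]"

def call_third_party_source(source_id: str, target_info: dict) -> str:
    """Input the target information into the source named by source_id and
    return the associated information it feeds back (claim 6)."""
    return f"[{source_id} result for {target_info}]"

def retrieve_associated_info(interaction_info: str) -> str:
    intent = understand_intent(interaction_info)
    if intent.name == "product_sales":               # first interaction intention
        return retrieve_from_product_kb(interaction_info)
    return call_third_party_source(intent.source_id, intent.target_info)
```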
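
Claim 7 describes a wake-up path: the first time a target face is detected, the device speaks a guiding prompt, then collects the user's speech and runs voice recognition to obtain the interaction information. A sketch of that loop follows, with the face detector, speech output, and recognizer all stubbed out; none of these interfaces are specified by the patent.

```python
from typing import Optional

greeted_faces: set[str] = set()          # faces that have already been greeted

def detect_face() -> Optional[str]:
    """Return a stable ID for a face currently in view, or None. Stub."""
    return "face-001"

def speak(text: str) -> None:
    """Play target voice information through text-to-speech. Stub."""
    print(f"[TTS] {text}")

def record_audio() -> bytes:
    """Collect the user's voice information from the microphone. Stub."""
    return b"\x00\x01"

def recognize_speech(audio: bytes) -> str:
    """Voice recognition turning audio into interaction information. Stub."""
    return "how much does the new model cost"

def obtain_interaction_info() -> str:
    face_id = detect_face()
    if face_id is not None and face_id not in greeted_faces:
        greeted_faces.add(face_id)       # first detection of this target face
        speak("Hello! Feel free to ask me about our products.")
    return recognize_speech(record_audio())
```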
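
Claim 8 fixes the input order for the color rendering step (the document's term for having the model polish the retrieved text): the rendering instruction, the user's state information, the interaction information, and the associated information go to the natural language processing model in sequence. One plausible reading is a single concatenated prompt, which is what this sketch assumes; the prompt wording and the stand-in model are invented for illustration.

```python
def color_render(nlp_model, state_info: str, interaction_info: str,
                 associated_info: str) -> str:
    """Feed the model its inputs in the order fixed by claim 8 and return
    the color-rendered associated information as the response."""
    instruction = ("Rewrite the retrieved information below as a natural, "
                   "friendly reply to the user, adapted to the user's state.")
    prompt = "\n".join([
        instruction,                       # 1. color rendering instruction
        f"User state: {state_info}",       # 2. state info from the face image
        f"User said: {interaction_info}",  # 3. the interaction information
        f"Retrieved: {associated_info}",   # 4. the associated information
    ])
    return nlp_model(prompt)

# Usage with a trivial stand-in for the natural language processing model:
reply = color_render(lambda p: "(polished) " + p.splitlines()[-1],
                     "smiling, adult",
                     "how much does the new model cost",
                     "[product info: model X, 4999 yuan]")
print(reply)
```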
CN202311026638.8A 2023-08-15 2023-08-15 Man-machine interaction method, device, equipment and storage medium Pending CN116737883A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311026638.8A CN116737883A (en) 2023-08-15 2023-08-15 Man-machine interaction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116737883A true CN116737883A (en) 2023-09-12

Family

ID=87918995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311026638.8A Pending CN116737883A (en) 2023-08-15 2023-08-15 Man-machine interaction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116737883A (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070136222A1 (en) * 2005-12-09 2007-06-14 Microsoft Corporation Question and answer architecture for reasoning and clarifying intentions, goals, and needs from contextual clues and content
CN107203265A (en) * 2017-05-17 2017-09-26 广东美的制冷设备有限公司 Information interacting method and device
CN109101545A (en) * 2018-06-29 2018-12-28 北京百度网讯科技有限公司 Natural language processing method, apparatus, equipment and medium based on human-computer interaction
CN109065035A (en) * 2018-09-06 2018-12-21 珠海格力电器股份有限公司 Information interacting method and device
CN109857848A (en) * 2019-01-18 2019-06-07 深圳壹账通智能科技有限公司 Interaction content generation method, device, computer equipment and storage medium
WO2022165109A1 (en) * 2021-01-28 2022-08-04 Embodied, Inc. Methods and systems enabling natural language processing, understanding and generation
CN113282725A (en) * 2021-05-21 2021-08-20 北京市商汤科技开发有限公司 Dialogue interaction method and device, electronic equipment and storage medium
CN113434647A (en) * 2021-06-18 2021-09-24 竹间智能科技(上海)有限公司 Man-machine interaction method, system and storage medium
CN113836172A (en) * 2021-09-30 2021-12-24 深圳追一科技有限公司 Interaction method, interaction device, electronic equipment, storage medium and computer program product
CN115525753A (en) * 2022-10-26 2022-12-27 上海交通大学 Task-oriented multi-turn dialogue method and system based on 1+ N
CN116010569A (en) * 2022-12-28 2023-04-25 科大讯飞股份有限公司 Online answering method, system, electronic equipment and storage medium
CN116541493A (en) * 2023-04-07 2023-08-04 平安科技(深圳)有限公司 Interactive response method, device, equipment and storage medium based on intention recognition
CN116561277A (en) * 2023-05-05 2023-08-08 科大讯飞股份有限公司 Knowledge question-answering method, device, equipment and storage medium
CN116303980A (en) * 2023-05-19 2023-06-23 无码科技(杭州)有限公司 Large language model knowledge enhancement method, system, electronic equipment and medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117370493A (en) * 2023-09-22 2024-01-09 中国司法大数据研究院有限公司 Intelligent interaction method and device for system based on large language model

Similar Documents

Publication Publication Date Title
CN110647636B (en) Interaction method, interaction device, terminal equipment and storage medium
KR20180025121A (en) Method and apparatus for inputting information
JP2021168139A (en) Method, device, apparatus and medium for man-machine interactions
US20210160581A1 (en) Method and data processing apparatus
CN112309365B (en) Training method and device of speech synthesis model, storage medium and electronic equipment
US11749255B2 (en) Voice question and answer method and device, computer readable storage medium and electronic device
CN113067953A (en) Customer service method, system, device, server and storage medium
CN113378583A (en) Dialogue reply method and device, dialogue model training method and device, and storage medium
CN116737883A (en) Man-machine interaction method, device, equipment and storage medium
CN113703585A (en) Interaction method, interaction device, electronic equipment and storage medium
CN115840841A (en) Multi-modal dialog method, device, equipment and storage medium
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment
CN110517672B (en) User intention recognition method, user instruction execution method, system and equipment
CN113542797A (en) Interaction method and device in video playing and computer readable storage medium
CN113205569A (en) Image drawing method and device, computer readable medium and electronic device
CN117352132A (en) Psychological coaching method, device, equipment and storage medium
CN111427444B (en) Control method and device of intelligent device
CN111415662A (en) Method, apparatus, device and medium for generating video
CN113763925B (en) Speech recognition method, device, computer equipment and storage medium
CN115167733A (en) Method and device for displaying live broadcast resources, electronic equipment and storage medium
CN113066473A (en) Voice synthesis method and device, storage medium and electronic equipment
CN112820265A (en) Speech synthesis model training method and related device
CN117373455B (en) Audio and video generation method, device, equipment and storage medium
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN114781401A (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination