CN116737883A - Man-machine interaction method, device, equipment and storage medium - Google Patents
- Publication number
- CN116737883A (application CN202311026638.8A)
- Authority
- CN
- China
- Prior art keywords
- information
- interaction
- source
- target
- natural language
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present application disclose a man-machine interaction method, apparatus, device, and storage medium. After input interaction information is obtained, a target retrieval source is determined based on the interaction information; associated information of the interaction information is retrieved through the target retrieval source; color rendering (polishing) processing is performed on the associated information through a natural language processing model to generate response information for the interaction information; and the response information is output. In this application, the interaction information is not fed directly into the natural language processing model; instead, information related to the interaction information is retrieved from a target retrieval source determined from the interaction information. Because the retrieval source can update its information, the retrieved associated information can be up to date. The associated information obtained from the retrieval source is then color-rendered by the natural language processing model to produce more natural response information, achieving accurate and natural man-machine interaction and improving man-machine interaction performance.
Description
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a human-computer interaction method, apparatus, device, and storage medium.
Background
With the continuous development of artificial intelligence technology, dialog systems are receiving more and more attention. At present, many industries begin to apply dialogue systems to customer service, intelligent home control, education and other fields so as to improve working efficiency and user experience.
At present, to provide a smoother interaction experience, introducing natural language processing models with memory and powerful language processing capabilities into dialogue systems has become a research hotspot. Current dialogue systems feed the information input by the user directly into the natural language processing model and return the model's processing result to the user. However, the knowledge learned by a natural language processing model is limited to what existed before its training date; the model cannot update its knowledge in real time, learn continuously, or predict future information. Therefore, when the information input by the user concerns information after the training date or future information, the dialogue system cannot give correct feedback, and the man-machine interaction performance of dialogue systems still needs to be improved.
Disclosure of Invention
In view of the above, the present application provides a human-computer interaction method, apparatus, device and storage medium to improve human-computer interaction performance.
In order to achieve the above object, the following solutions have been proposed:
a human-machine interaction method, comprising:
obtaining input interaction information;
determining a target retrieval source based on the interaction information;
retrieving associated information of the interactive information through the target retrieval source;
performing color rendering processing on the associated information through a natural language processing model to generate response information of the interaction information;
and outputting the response information.
Optionally, in the above method, determining the target retrieval source based on the interaction information includes:
performing intent understanding on the interaction information to obtain an interaction intention; and
determining the retrieval source corresponding to the interaction intention as the target retrieval source, wherein different interaction intentions correspond to different retrieval sources.
Optionally, in the above method, performing intent understanding on the interaction information includes:
performing intent understanding on the interaction information through the natural language processing model.
Optionally, in the above method, retrieving the associated information of the interaction information through the target retrieval source includes:
if the interaction intention is a first interaction intention, retrieving associated information of the interaction information in a knowledge base;
and if the interaction intention is not the first interaction intention, invoking a third party information source corresponding to the non-first interaction intention to retrieve the associated information of the interaction information.
Optionally, in the above method, the first interaction intention includes an intent related to product sales, and retrieving the associated information of the interaction information in the knowledge base includes:
retrieving product information associated with the interaction information from a product knowledge base.
Optionally, in the above method, the non-first interaction intention includes an information source identifier and target information required by the third-party information source for information retrieval; invoking the third-party information source corresponding to the non-first interaction intention includes:
inputting, based on the information source identifier, the target information into the third-party information source corresponding to the identifier, to obtain the associated information of the interaction information fed back by the third-party information source.
Optionally, before obtaining the input interaction information, the method further includes:
when a target face is detected for the first time, outputting target voice information; the target voice information is used for guiding the user represented by the target face to perform man-machine interaction;
Obtaining the input interaction information includes:
collecting voice information of the user;
and carrying out voice recognition on the voice information to obtain the interaction information.
Optionally, the method further includes:
obtaining a facial image of the user, and processing the facial image to obtain state information of the user;
wherein performing color rendering processing on the associated information through the natural language processing model to generate the response information of the interaction information includes:
sequentially splicing the color rendering instruction, the state information of the user, and the associated information, inputting the spliced result into the natural language processing model, and taking the color-rendered associated information generated by the model as the response information.
A human-machine interaction device, comprising:
the acquisition module is used for acquiring the input interaction information;
the determining module is used for determining a target retrieval source based on the interaction information;
the retrieval module is used for retrieving the associated information of the interaction information through the target retrieval source;
the color rendering module is used for rendering the associated information through a natural language processing model to generate response information of the interaction information;
and the output module is used for outputting the response information.
A man-machine interaction device comprises a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the human-computer interaction method according to any one of the above.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the human-machine interaction method of any of the preceding claims.
From the above technical solution, it can be seen that in the man-machine interaction method, apparatus, device, and storage medium provided by the embodiments of the present application, after input interaction information is obtained, a target retrieval source is determined based on the interaction information; associated information of the interaction information is retrieved through the target retrieval source; color rendering (polishing) processing is performed on the associated information through a natural language processing model to generate response information for the interaction information; and the response information is output. In this application, the interaction information is not fed directly into the natural language processing model; instead, information related to the interaction information is retrieved from a target retrieval source determined from the interaction information. Because the retrieval source can update its information, the retrieved associated information can be up to date. The associated information obtained from the retrieval source is then color-rendered by the natural language processing model to produce more natural response information, achieving accurate and natural man-machine interaction and improving man-machine interaction performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an implementation of a human-computer interaction method disclosed in an embodiment of the present application;
FIG. 2 is a flow chart of one implementation of determining a target search source based on interaction information according to an embodiment of the present application;
FIG. 3 is an example of a system architecture diagram of a human-machine interaction scheme disclosed in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a man-machine interaction device according to an embodiment of the present application;
fig. 5 is a block diagram of a hardware structure of a man-machine interaction device according to an embodiment of the present application.
Detailed Description
Before describing the scheme of the application, related concepts are explained.
Instruction (Prompt): the instruction sent to an AI (such as a large language model) when the user converses with it. It can be a textual description, such as entering "please recommend a piece of popular music for me" during a dialogue with the AI, or a parameter description in a prescribed format, such as having the AI draw according to a given format by describing the relevant drawing parameters.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The scheme of the application is provided for improving the man-machine interaction performance.
As shown in fig. 1, a flowchart for implementing a man-machine interaction method according to an embodiment of the present application may include:
step S101: and obtaining the input interaction information.
The interaction information may also be referred to as user instructions, user questions, and the like.
Alternatively, the interaction information may be entered into a text editing box by the user via a text input device (e.g., a keyboard or touch screen), or entered by voice. Accordingly, the input interaction information may be obtained from the text editing box, or obtained by performing speech recognition on collected voice data with a speech recognition device.
Step S102: a target retrieval source is determined based on the interaction information.
The target retrieval source is an information retrieval source with real-time information updating capability; for example, it can be a knowledge base, a third-party information source, or the like.
One information retrieval source corresponding to the interaction information may be determined as the target retrieval source from among multiple information retrieval sources with real-time information updating capability.
The target retrieval sources determined based on different interaction information may be the same or different.
Step S103: and retrieving the associated information of the interactive information through the target retrieval source.
In the application, the searched associated information is the initial response information of the interaction information.
Step S104: and performing color rendering processing on the associated information through the natural language processing model to generate response information of the interaction information.
Because the retrieved associated information (i.e., the initial response information) tends to be stiff, outputting it directly would give the user a poor dialogue experience; therefore, the associated information is color-rendered (polished) by the natural language processing model to generate smoother, more natural response information.
The natural language processing model may be a generative model, such as a large language model. As an example, generative models may include, but are not limited to, models based on the Transformer architecture, such as GPT (Generative Pre-trained Transformer) 3 and GPT-4, as well as other generative models such as PaLM (Pathways Language Model) and T5 (Text-to-Text Transfer Transformer).
As an example, the color rendering instruction and the associated information may be sequentially input into the natural language processing model to obtain the color-rendered associated information, i.e., the response information, output by the model. The color rendering instruction instructs the natural language processing model to render the associated information.
Alternatively,
the color rendering instruction, the interaction information, and the associated information can be sequentially input into the natural language processing model to obtain the color-rendered associated information, i.e., the response information, output by the model. In this case the color rendering instruction instructs the natural language processing model to render the associated information according to the interaction information, so the color rendering result is more contextually relevant.
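The prompt assembly described above can be sketched as follows (a minimal illustration; the function name, instruction wording, and example values are assumptions, not taken from the application):

```python
def build_rendering_prompt(instruction, interaction_info, associated_info):
    # Splice the pieces in the order described above: the color rendering
    # instruction first, then the interaction information, then the
    # retrieved associated information.
    return "\n".join([instruction, interaction_info, associated_info])

# Hypothetical example values.
prompt = build_rendering_prompt(
    "Rewrite the retrieved facts below as a natural reply to the question.",
    "What is the weather in Hefei tomorrow?",
    "Hefei, tomorrow: sunny, 22 to 30 degrees",
)
```

The resulting string would then be fed to the natural language processing model as a single input.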
Step S105: and outputting response information.
Alternatively, the response information may be displayed directly.
Alternatively,
at least part of the response information can be synthesized into voice, and the voice is output; alternatively, the response information is outputted in synchronization with the voice. That is, only the voice may be played, or the response information may be displayed while the voice is played.
Alternatively,
at least part of the response information can be synthesized into voice, and an avatar video corresponding to the voice can be generated; the voice is then output in synchronization with the avatar video, or the response information, the voice, and the avatar video are output synchronously. That is, only the voice and the avatar video may be played, or the response information may be displayed while the avatar video and the voice are played.
According to the man-machine interaction method provided by the embodiments of the present application, after input interaction information is obtained, a target retrieval source is determined based on the interaction information; associated information of the interaction information is retrieved through the target retrieval source; color rendering processing is performed on the associated information through a natural language processing model to generate response information for the interaction information; and the response information is output. In this application, the interaction information is not fed directly into the natural language processing model; instead, information related to the interaction information is retrieved from a target retrieval source determined from the interaction information. Because the retrieval source can update its information in real time, the retrieved associated information can be up to date. The associated information obtained from the retrieval source is then color-rendered by the natural language processing model to produce more natural response information, achieving accurate and natural man-machine interaction and improving man-machine interaction performance.
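The overall flow of steps S101 to S105 can be sketched as a toy illustration with stand-in components (the class and function names, and the lambda stand-ins for the model-backed parts, are assumptions):

```python
class KnowledgeBase:
    """Toy retrieval source standing in for a real knowledge base."""

    def __init__(self, entries):
        self.entries = entries  # keyword -> knowledge segment

    def retrieve(self, query):
        # Return the first segment whose keyword appears in the query.
        for keyword, segment in self.entries.items():
            if keyword in query:
                return segment
        return "no associated information found"

def interact(interaction_info, determine_source, polish):
    source = determine_source(interaction_info)     # step S102
    associated = source.retrieve(interaction_info)  # step S103
    return polish(associated, interaction_info)     # steps S104-S105

kb = KnowledgeBase({"battery": "Battery life: 10 hours."})
response = interact(
    "How long does the battery last?",
    determine_source=lambda info: kb,
    # Stand-in for the natural language processing model's color rendering.
    polish=lambda assoc, info: f"Good question! {assoc}",
)
# response == "Good question! Battery life: 10 hours."
```

In a real system, `determine_source` would be backed by intent understanding and `polish` by the natural language processing model, as described above.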
The retrieval source ensures the accuracy of the reply content. Combining the retrieval source with the natural language processing model overcomes the model's inability to update its information in real time and reduces the randomness of the response information the model generates, thereby improving the fluency and accuracy of man-machine interaction.
In an alternative embodiment, a flowchart of an implementation of determining the target retrieval source based on the interaction information is shown in fig. 2, and may include:
step S201: and carrying out intention understanding on the interaction information to obtain the interaction intention.
Alternatively, a pre-trained intent recognition model (i.e., a model dedicated to intent understanding) may be used to perform intent understanding on the interaction information to obtain its interaction intention.
Alternatively,
the intent understanding of the interactive information can be performed by using the natural language processing model to obtain the interactive intent. As an example, the intent understanding instruction and the interaction information may be sequentially input into the above-mentioned natural language processing model, so as to obtain the interaction intent output by the natural language processing model. The intent understanding instruction instructs the natural language processing model to understand intent of the interactive information.
Step S202: determining a retrieval source corresponding to the interaction intention as a target retrieval source; different interaction intents correspond to different search sources.
The present application presets correspondences between different interaction intentions and retrieval sources; after the interaction intention of the interaction information is obtained, the retrieval source corresponding to that intention is determined as the target retrieval source.
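The preset correspondence can be as simple as a lookup table (a sketch; the intent labels and source names are illustrative assumptions):

```python
# Hypothetical intent-to-source correspondence preset by the system.
INTENT_TO_SOURCE = {
    "product_sales": "product_knowledge_base",  # first interaction intent
    "weather": "weather_information_source",    # non-first intents below
    "flight": "flight_information_source",
}

def target_retrieval_source(interaction_intent):
    # Different interaction intents map to different retrieval sources.
    return INTENT_TO_SOURCE[interaction_intent]
```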
Optionally, if the interaction intention of the interaction information is the first interaction intention, the associated information of the interaction information is retrieved in the knowledge base.
As an example, the first interaction intent may be an intent related to product sales, and the knowledge base corresponding to the first interaction intent is a product knowledge base; based on this, retrieving the associated information of the interaction information in the knowledge base may include: retrieving product information associated with the interaction information from the product knowledge base.
Specifically, the interaction information can be encoded to obtain its encoding features, and the encoding features of each knowledge segment in the product knowledge base can be obtained (the knowledge segments may be pre-encoded and their encoding features stored in the product knowledge base). The degree of association between each knowledge segment and the interaction information is computed from the two sets of encoding features, and the knowledge segment with the highest degree of association is determined as the product information associated with the interaction information. The degree of association between a knowledge segment and the interaction information can be represented by the distance between their encoding features.
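A minimal sketch of this nearest-segment retrieval, using a toy bag-of-words encoder and Euclidean distance in place of the real encoding features (all names and the vocabulary are illustrative assumptions):

```python
import math

def encode(text, vocab):
    # Toy bag-of-words encoder standing in for the real feature encoder.
    words = text.lower().split()
    return [words.count(term) for term in vocab]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve_product_info(query, segments, vocab):
    # The smaller the distance between encodings, the higher the degree
    # of association; return the most associated knowledge segment.
    query_features = encode(query, vocab)
    return min(segments, key=lambda s: distance(encode(s, vocab), query_features))

vocab = ["battery", "hours", "screen", "inches"]
segments = ["battery life is 10 hours", "screen size is 6 inches"]
best = retrieve_product_info("how many hours of battery", segments, vocab)
# best == "battery life is 10 hours"
```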
And if the interaction intention of the interaction information is not the first interaction intention, invoking a third party information source corresponding to the non-first interaction intention to retrieve the associated information of the interaction information.
The different non-first interaction intents correspond to different third party sources.
As an example, the non-first interaction intent includes the information source identifier and the target information required by the corresponding third-party information source for its retrieval. The source identifier characterizes a third-party information source, and different identifiers characterize different sources. For example, if the interaction information is "What is the weather in Hefei tomorrow?", the non-first interaction intent may be {skill: weather, time: tomorrow, location: Hefei}. The content of the "skill" field, "weather", is the information source identifier; the contents of the other fields (the "time" field "tomorrow" and the "location" field "Hefei") are the information the weather information source needs for weather retrieval.
Based on this, the above-mentioned process of calling the third party source corresponding to the non-first interactive intention to retrieve the associated information of the interactive information may include:
and inputting the target information into a third party information source corresponding to the information source identifier based on the information source identifier, and obtaining the associated information of the interaction information fed back by the third party information source according to the target information.
In an alternative embodiment, the color rendering capability of the natural language processing model may be trained using a first training data set related to product sales, based on the product knowledge base. Each training sample in the first training data set contains multi-round or single-round dialogue data, together with the knowledge segment corresponding to each round of dialogue. Each round of dialogue includes user input content and machine reply content, and the knowledge segment corresponding to each round is extracted from the product knowledge base. The product knowledge in the product knowledge base may include, but is not limited to, official product documents, user forums, customer feedback, and the like.
Both the multi-round and the single-round dialogue data are constructed by simulating real sales scenarios. The multi-round dialogue data preserves the naturalness and fluency of conversation, benefits the natural language processing model's performance in practical applications, and allows the model to fit into a variety of real dialogue scenarios and provide more effective service to users. The single-round dialogue data gives the model strong knowledge retrieval and extraction capabilities in practice, so that it can answer a variety of common and complex questions and meet user needs.
As shown in table 1, an example of a training sample is provided in an embodiment of the present application:
TABLE 1
The training sample shown in Table 1 contains multi-round dialogue data and the knowledge segment associated with each round of dialogue.
As shown in table 2, an example of two training samples provided in an embodiment of the present application:
TABLE 2
Both training samples in Table 2 contain single-round dialogue data and the corresponding knowledge segments.
The machine reply content in a training sample can be manually annotated; specifically, it can be obtained by manually color-rendering (polishing) the knowledge segments.
Based on the first training data set, the color rendering capability of the natural language processing model can be obtained through training in the following manner:
When training the natural language processing model with any training sample in the first training data set, for each round of dialogue data in that sample, the color rendering instruction, the user content of that round, and the knowledge segment corresponding to that round are sequentially input into the natural language processing model to obtain the color rendering result it generates, and the parameters of the model are updated with the goal of making the color rendering result approach the machine reply content of that round. The color rendering instruction instructs the natural language processing model to render the knowledge segment according to the user content.
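The construction of (input, target) pairs for this fine-tuning step can be sketched as follows (field names and the instruction text are illustrative assumptions; the model-specific parameter-update step is omitted):

```python
def build_rendering_examples(samples, instruction):
    # For each dialogue round, splice instruction + user content +
    # knowledge segment as the model input; the annotated machine reply
    # is the training target the color rendering result should approach.
    pairs = []
    for sample in samples:
        for turn in sample["dialogue"]:
            model_input = "\n".join(
                [instruction, turn["user"], turn["knowledge_segment"]]
            )
            pairs.append((model_input, turn["machine_reply"]))
    return pairs

pairs = build_rendering_examples(
    [{"dialogue": [{
        "user": "How long is the warranty?",
        "knowledge_segment": "Warranty: 24 months.",
        "machine_reply": "It comes with a full 24-month warranty.",
    }]}],
    instruction="Rewrite the knowledge segment as a reply to the user.",
)
```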
By training the natural language processing model on both multi-round and single-round dialogue data, the model can better adapt to user needs in practical applications, intelligently answer questions, and efficiently resolve user queries.
In an alternative embodiment, the color rendering capability of the natural language processing model may be trained using a second training data set related to third-party information sources, based on those sources. Each training sample in the second training data set includes user input content, machine reply content, and the feedback content returned by the third-party information source for the user input content. The machine reply content can be manually annotated; specifically, it can be obtained by manually color-rendering (polishing) the feedback content of the third-party information source for the user input content.
As shown in table 3, an example of a training sample is provided in an embodiment of the present application:
Table 3
Based on the second training data set, the color rendering capability of the natural language processing model can be trained as follows:

when training the natural language processing model with any training sample in the second training data set, a color rendering instruction, the user content in that sample, and the feedback content of the third-party information source in that sample are sequentially input into the natural language processing model to obtain the color rendering result generated by the model; the parameters of the natural language processing model are then updated with the goal of making the color rendering result approach the machine reply content of that sample. The color rendering instruction instructs the natural language processing model to perform color rendering processing on the feedback content of the third-party information source according to the user content.
The intent understanding capability of the natural language processing model may be trained with a third training data set. Each sample in the third training data set includes user input content and its corresponding intent; each piece of user input content corresponds either to a first intent (e.g., product-sales related) or to a non-first intent (e.g., weather query, flight query, hotel query, stock query, etc.).
The intent understanding capabilities of the natural language processing model may be trained as follows:
when training the natural language processing model with any training sample in the third training data set, an intent understanding instruction and the user content in that sample are sequentially input into the natural language processing model to obtain the intent understanding result generated by the model; the parameters of the natural language processing model are then updated with the goal of making the intent understanding result approach the intent corresponding to the user input content in that sample. The intent understanding instruction instructs the natural language processing model to understand the intent of the user input content.
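A sketch of the third data set's sample format follows; the label strings and the example query are illustrative assumptions, since the patent only fixes the distinction between first and non-first intents.

```python
FIRST_INTENT = "sales"                                    # e.g., product-sales related
NON_FIRST_INTENTS = {"weather", "flight", "hotel", "stock"}

def build_intent_sample(intent_instruction, user_content, intent_label):
    # Model input: intent understanding instruction followed by the user
    # content; training target: the annotated intent label.
    assert intent_label == FIRST_INTENT or intent_label in NON_FIRST_INTENTS
    return intent_instruction + "\n" + user_content, intent_label

sample = build_intent_sample(
    "Classify the intent of the user input.",
    "What is the weather like in Hefei tomorrow?",
    "weather",
)
```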
In an optional embodiment, before obtaining the input interaction information, the man-machine interaction method provided by the embodiment of the application may further include:
when a target face is detected for the first time, outputting target voice information; the target voice information is used for guiding the user represented by the target face to perform man-machine interaction.
Detecting the target face for the first time refers to detecting it for the first time within a preset time period. When the target face is first detected, the target voice information is actively output to draw the user's attention and invite them into the human-machine interaction. For example, the target voice information may be "Hello, this is the xxx camping store. If there is anything you would like to know, I can explain it to you in detail."
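The "first detection within a preset time period" rule can be sketched as a small gate that allows at most one active greeting per face per period. The class name, face identifiers and period length are assumptions for illustration.

```python
class GreetingGate:
    """Track, per face, when the last active greeting was output, and allow a
    new greeting only if the preset period has elapsed since then."""

    def __init__(self, period_s=300.0):
        self.period_s = period_s
        self.last_greeted = {}  # face_id -> time of last greeting (seconds)

    def should_greet(self, face_id, now):
        last = self.last_greeted.get(face_id)
        if last is None or now - last >= self.period_s:
            self.last_greeted[face_id] = now
            return True
        return False
```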
Accordingly, the obtaining the input interaction information may include:
collecting voice information of a user represented by a target face; and carrying out voice recognition on the collected voice information to obtain the interaction information.
That is, after the target face is detected for the first time, the target face can be locked (i.e., face tracking is performed) and the interaction information that the user represented by the target face inputs by voice can be obtained. Specifically, images and voice can be collected simultaneously, and face detection is performed on the collected image sequence to obtain a face-region image sequence. Each frame of the face-region image sequence is matched against the image of the target face detected for the first time to determine a target image sequence, i.e., the face image sequence of the user represented by the target face. The voice data corresponding to the target image sequence is then obtained, and that voice data is denoised with the help of the target image sequence to obtain the denoised voice information of the user represented by the target face. For example, the voice data and the target image sequence can be fused into fusion information, with the objective that the fusion information approach the denoised voice data; the fusion information then serves as the denoised voice information of the user represented by the target face. Finally, voice recognition is performed on the denoised voice information to obtain the voice recognition result, which is the interaction information input by the user represented by the target face.

By automatically filtering out background noise (non-speech sounds and the speech of users other than the user represented by the target face) and concentrating on the voice signal of a single user, the smoothness and accuracy of human-machine interaction can be further improved.
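A heavily simplified sketch of the audio-selection step: keep only the audio chunks whose synchronized frame matched the target face, and drop everything else as background. A real system would use a face-recognition model and signal-level denoising; the data layout here is an assumption.

```python
def select_target_speech(synced_stream, target_face_id):
    # synced_stream: list of (face_id_or_None, audio_chunk) pairs, one per
    # frame; None means no face (or no matching face) was found in that frame.
    return [audio for face_id, audio in synced_stream if face_id == target_face_id]

stream = [("face-1", "a0"), (None, "a1"), ("face-2", "a2"), ("face-1", "a3")]
target_audio = select_target_speech(stream, "face-1")
```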
In an optional embodiment, the human-computer interaction method provided by the application may further include:
obtaining a face image of the user represented by the target face; processing the face image to obtain the state information of the user. The state information of the user may include, but is not limited to: gender, age, emotion, etc.
The face image can be processed by using a pre-trained state recognition model, so that the state information of the user output by the state recognition model can be obtained.
Correspondingly, the color rendering processing of the associated information through the natural language processing model to generate the response information of the interaction information may be implemented as follows:

the associated information is subjected to color rendering processing based on the state information of the user through the natural language processing model. Specifically, the color rendering instruction, the state information of the user, the interaction information and the associated information are sequentially spliced and input into the natural language processing model, and the color-rendered associated information generated by the natural language processing model is obtained as the response information.

The color rendering instruction instructs the natural language processing model to render the associated information based on the state information of the user and the interaction information, so that the rendered associated information matches the state information of the user, further improving the smoothness and accuracy of human-machine interaction.
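The splicing order described above (instruction, then user state, then interaction information, then associated information) can be sketched as follows; the field labels and separator are illustrative assumptions, since the patent fixes only the order.

```python
def build_polish_prompt(polish_instruction, user_state, interaction_info, associated_info):
    # Splice the four parts in the order specified above.
    return "\n".join([
        polish_instruction,
        "User state: " + user_state,
        "User input: " + interaction_info,
        "Associated information: " + associated_info,
    ])

prompt = build_polish_prompt(
    "Polish the associated information to match the user's state.",
    "female, about 30, cheerful",
    "Which tent would you recommend for a family trip?",
    "Tent model A: sleeps four, waterproof, 2.1 kg.",
)
```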
As shown in fig. 3, which illustrates an example system architecture of the man-machine interaction scheme provided by an embodiment of the present application, the machine performing the man-machine interaction is taken to be a robot.
The robot collects images and voices in real time;
Active greeting: when a user enters the image acquisition range of the robot, the robot can collect the user's face image and voice. When the robot determines from the collected face images that the user has approached it for the first time within a preset time period, it actively outputs voice information, for example, "Hello, this is the xxx direct-sale store. If there is anything you would like to know, I can explain it to you in detail."
Judging the user's intent: if the user expresses a requirement to the robot by voice, the robot performs voice recognition on the collected voice to obtain the interaction text, and then performs intent understanding on the interaction text to obtain the interaction intent. The robot may call a dedicated intent understanding model, or call the natural language processing model, to understand the intent of the interaction text. If the interaction intent is a product-sales related intent (abbreviated as "sales" in fig. 3), the robot invokes the product knowledge base and searches it for product information associated with the interaction text. If the interaction intent is a non-product-sales related intent (abbreviated as "chit-chat" in fig. 3), the robot invokes the third-party information source corresponding to that intent to obtain its feedback information.
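The routing step above can be sketched as follows. The keyword search and the callable sources are toy stand-ins (assumptions) for the real product knowledge base and third-party services.

```python
def route_query(intent, interaction_text, product_kb, third_party_sources):
    if intent == "sales":
        # Sales-related intent: search the product knowledge base.
        return [info for name, info in product_kb.items()
                if name.lower() in interaction_text.lower()]
    # Non-sales intent: invoke the corresponding third-party information source.
    return third_party_sources[intent](interaction_text)

product_kb = {"Tent A": "Tent A: sleeps four, waterproof, 2.1 kg."}
sources = {"weather": lambda q: ["Sunny, 25 °C"]}
```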
Large-model color rendering of the reply: the natural language processing model is called to color render the product information or the feedback information, yielding the color rendering result.
Multimodal reply of the robot: the robot synthesizes the color rendering result into speech, generates the avatar video corresponding to the speech, and outputs the color rendering result, the speech and the avatar video synchronously.
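The three output steps can be sketched as a small pipeline; `tts`, `make_avatar_video` and `present` are hypothetical callables standing in for real text-to-speech, avatar-generation and synchronized-playback components.

```python
def reply_multimodal(polished_text, tts, make_avatar_video, present):
    speech = tts(polished_text)                  # synthesize the reply into speech
    video = make_avatar_video(speech)            # avatar video matching the speech
    present(text=polished_text, speech=speech, video=video)  # output all three together
    return speech, video

shown = {}
speech, video = reply_multimodal(
    "Welcome!",
    tts=lambda t: "audio<" + t + ">",
    make_avatar_video=lambda a: "video<" + a + ">",
    present=lambda **parts: shown.update(parts),
)
```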
Corresponding to the method embodiment, the embodiment of the present application further provides a man-machine interaction device, as shown in fig. 4, which is a schematic structural diagram of the man-machine interaction device provided by the embodiment of the present application, and may include:
an obtaining module 401, a determining module 402, a retrieval module 403, a color rendering module 404 and an output module 405; wherein:
the obtaining module 401 is configured to obtain input interaction information;
the determining module 402 is configured to determine a target retrieval source based on the interaction information;
the retrieval module 403 is configured to retrieve association information of the interaction information through the target retrieval source;
the color rendering module 404 is configured to perform color rendering processing on the associated information through a natural language processing model, and generate response information of the interaction information;
the output module 405 is configured to output the response information.
According to the man-machine interaction device provided by the embodiment of the application, after the input interaction information is obtained, a target retrieval source is determined based on the interaction information; the associated information of the interaction information is retrieved through the target retrieval source; the associated information is color rendered through a natural language processing model to generate the response information of the interaction information; and the response information is output. With this scheme, the interaction information is not input directly into the natural language processing model; instead, information associated with the interaction information is retrieved from a target retrieval source determined based on the interaction information. Because a retrieval source has information-updating capability, the retrieved associated information can be up to date; polishing this associated information with the natural language processing model then yields a more natural response, realizing accurate and natural man-machine interaction and improving man-machine interaction performance.
In an alternative embodiment, the determining module 402 is configured to:
performing intention understanding on the interaction information to obtain interaction intention;
determining a retrieval source corresponding to the interaction intention as the target retrieval source; different interaction intents correspond to different search sources.
In an alternative embodiment, when the determining module 402 performs intention understanding on the interaction information, the determining module is configured to:
and carrying out intention understanding on the interaction information through the natural language processing model.
In an alternative embodiment, the retrieving module 403 is configured to:
if the interaction intention is a first interaction intention, retrieving associated information of the interaction information in a knowledge base;
and if the interaction intention is not the first interaction intention, invoking a third party information source corresponding to the non-first interaction intention to retrieve the associated information of the interaction information.
In an alternative embodiment, the first interaction intention includes: intent related to product sales; the retrieving module 403 is configured to, when retrieving the association information of the interaction information in the knowledge base:
and retrieving the product information associated with the interaction information from a product knowledge base.
In an alternative embodiment, the non-first interaction intention includes: an information source identifier and target information required by the third-party information source for information retrieval; the retrieving module 403 is configured to, when invoking a third party source corresponding to the non-first interaction intention:
and inputting the target information into a third-party information source corresponding to the information source identifier based on the information source identifier to obtain the associated information of the interaction information fed back by the third-party information source.
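The identifier-based invocation can be sketched as a dispatch table; the identifier name and the shape of the target information are illustrative assumptions.

```python
def call_third_party_source(source_id, target_info, sources):
    # Look up the third-party information source by its identifier and feed it
    # the target information extracted from the non-first interaction intent.
    if source_id not in sources:
        raise ValueError("unknown information source: " + source_id)
    return sources[source_id](target_info)

sources = {"weather_api": lambda info: "Forecast for " + info["city"]}
feedback = call_third_party_source("weather_api", {"city": "Hefei"}, sources)
```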
In an alternative embodiment, the output module 405 is configured to:
synthesizing the response information into voice;
generating an avatar video corresponding to the voice;
at least the speech is output in synchronization with the avatar video.
In an alternative embodiment, the output module 405 is further configured to, before obtaining the input interaction information:
when a target face is detected for the first time, outputting target voice information; the target voice information is used for guiding the user represented by the target face to perform man-machine interaction;
the obtaining module 401 is configured to:
collecting voice information of the user;
and carrying out voice recognition on the voice information to obtain the interaction information.
In an alternative embodiment, the man-machine interaction device further comprises:
a state acquisition module for acquiring a face image of the user; processing the face image to obtain state information of the user;
the color rendering module 404 is configured to:
and carrying out color rendering processing on the associated information based on the state information of the user through the natural language processing model; specifically, a color rendering instruction, the state information of the user, the interaction information and the associated information are sequentially input into the natural language processing model to obtain the color-rendered associated information generated by the natural language processing model as the response information.
The man-machine interaction device provided by the embodiment of the application can be applied to man-machine interaction equipment such as PC terminals, mobile terminals, all-in-one machines, cloud platforms, servers and server clusters. Optionally, fig. 5 shows a block diagram of the hardware structure of the man-machine interaction equipment; referring to fig. 5, the hardware structure may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;

in the embodiment of the application, there is at least one each of the processor 1, the communication interface 2, the memory 3 and the communication bus 4, and the processor 1, the communication interface 2 and the memory 3 communicate with each other through the communication bus 4;

the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application, etc.;

the memory 3 may comprise a high-speed RAM memory, and may further comprise a non-volatile memory, such as at least one magnetic disk memory;
wherein the memory stores a program, the processor is operable to invoke the program stored in the memory, the program operable to:
obtaining input interaction information;
determining a target retrieval source based on the interaction information;
retrieving associated information of the interactive information through the target retrieval source;
performing color rendering processing on the associated information through a natural language processing model to generate response information of the interaction information;
and outputting the response information.
Alternatively, the refinement function and the extension function of the program may be described with reference to the above.
The embodiment of the present application also provides a storage medium storing a program adapted to be executed by a processor, the program being configured to:
obtaining input interaction information;
determining a target retrieval source based on the interaction information;
retrieving associated information of the interactive information through the target retrieval source;
performing color rendering processing on the associated information through a natural language processing model to generate response information of the interaction information;
and outputting the response information.
Alternatively, the refinement function and the extension function of the program may be described with reference to the above.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the present specification, each embodiment is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (11)
1. A human-computer interaction method, comprising:
obtaining input interaction information;
determining a target retrieval source based on the interaction information;
retrieving associated information of the interactive information through the target retrieval source;
performing color rendering processing on the associated information through a natural language processing model to generate response information of the interaction information;
and outputting the response information.
2. The method of claim 1, wherein the determining a target retrieval source based on the interaction information comprises:
performing intention understanding on the interaction information to obtain interaction intention;
determining a retrieval source corresponding to the interaction intention as the target retrieval source; different interaction intents correspond to different search sources.
3. The method of claim 2, wherein the intent understanding of the interaction information comprises:
and carrying out intention understanding on the interaction information through the natural language processing model.
4. The method of claim 2, wherein said retrieving associated information of said interaction information by said target retrieval source comprises:
if the interaction intention is a first interaction intention, retrieving associated information of the interaction information in a knowledge base;
and if the interaction intention is not the first interaction intention, invoking a third party information source corresponding to the non-first interaction intention to retrieve the associated information of the interaction information.
5. The method of claim 4, wherein the first interaction intent comprises: intent related to product sales; the retrieving the associated information of the interaction information in the knowledge base comprises the following steps:
and retrieving the product information associated with the interaction information from a product knowledge base.
6. The method of claim 4, wherein the non-first interaction intention comprises: an information source identifier and target information required by the third-party information source for information retrieval; the invoking the third party information source corresponding to the non-first interaction intention comprises the following steps:
and inputting the target information into a third-party information source corresponding to the information source identifier based on the information source identifier to obtain the associated information of the interaction information fed back by the third-party information source.
7. The method of claim 1, further comprising, prior to obtaining the entered interaction information:
when a target face is detected for the first time, outputting target voice information; the target voice information is used for guiding the user represented by the target face to perform man-machine interaction;
the obtaining the input interaction information comprises the following steps:
collecting voice information of the user;
and carrying out voice recognition on the voice information to obtain the interaction information.
8. The method as recited in claim 7, further comprising:
obtaining a facial image of the user; processing the face image to obtain state information of the user;
the color rendering processing is carried out on the associated information through a natural language processing model, and response information of the interaction information is generated, and the method comprises the following steps:
and sequentially inputting the color rendering instruction, the state information of the user, the interaction information and the association information into the natural language processing model to obtain the color-rendered associated information generated by the natural language processing model as the response information.
9. A human-machine interaction device, comprising:
the acquisition module is used for acquiring the input interaction information;
the determining module is used for determining a target retrieval source based on the interaction information;
the retrieval module is used for retrieving the associated information of the interaction information through the target retrieval source;
the color rendering module is used for rendering the associated information through a natural language processing model to generate response information of the interaction information;
and the output module is used for outputting the response information.
10. A man-machine interaction device, comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the human-computer interaction method according to any one of claims 1-8.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the human-computer interaction method according to any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311026638.8A CN116737883A (en) | 2023-08-15 | 2023-08-15 | Man-machine interaction method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116737883A true CN116737883A (en) | 2023-09-12 |
Family
ID=87918995
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311026638.8A Pending CN116737883A (en) | 2023-08-15 | 2023-08-15 | Man-machine interaction method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116737883A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117370493A (en) * | 2023-09-22 | 2024-01-09 | 中国司法大数据研究院有限公司 | Intelligent interaction method and device for system based on large language model |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070136222A1 (en) * | 2005-12-09 | 2007-06-14 | Microsoft Corporation | Question and answer architecture for reasoning and clarifying intentions, goals, and needs from contextual clues and content |
CN107203265A (en) * | 2017-05-17 | 2017-09-26 | 广东美的制冷设备有限公司 | Information interacting method and device |
CN109065035A (en) * | 2018-09-06 | 2018-12-21 | 珠海格力电器股份有限公司 | Information interacting method and device |
CN109101545A (en) * | 2018-06-29 | 2018-12-28 | 北京百度网讯科技有限公司 | Natural language processing method, apparatus, equipment and medium based on human-computer interaction |
CN109857848A (en) * | 2019-01-18 | 2019-06-07 | 深圳壹账通智能科技有限公司 | Interaction content generation method, device, computer equipment and storage medium |
CN113282725A (en) * | 2021-05-21 | 2021-08-20 | 北京市商汤科技开发有限公司 | Dialogue interaction method and device, electronic equipment and storage medium |
CN113434647A (en) * | 2021-06-18 | 2021-09-24 | 竹间智能科技(上海)有限公司 | Man-machine interaction method, system and storage medium |
CN113836172A (en) * | 2021-09-30 | 2021-12-24 | 深圳追一科技有限公司 | Interaction method, interaction device, electronic equipment, storage medium and computer program product |
WO2022165109A1 (en) * | 2021-01-28 | 2022-08-04 | Embodied, Inc. | Methods and systems enabling natural language processing, understanding and generation |
CN115525753A (en) * | 2022-10-26 | 2022-12-27 | 上海交通大学 | Task-oriented multi-turn dialogue method and system based on 1+ N |
CN116010569A (en) * | 2022-12-28 | 2023-04-25 | 科大讯飞股份有限公司 | Online answering method, system, electronic equipment and storage medium |
CN116303980A (en) * | 2023-05-19 | 2023-06-23 | 无码科技(杭州)有限公司 | Large language model knowledge enhancement method, system, electronic equipment and medium |
CN116541493A (en) * | 2023-04-07 | 2023-08-04 | 平安科技(深圳)有限公司 | Interactive response method, device, equipment and storage medium based on intention recognition |
CN116561277A (en) * | 2023-05-05 | 2023-08-08 | 科大讯飞股份有限公司 | Knowledge question-answering method, device, equipment and storage medium |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117370493A (en) * | 2023-09-22 | 2024-01-09 | China Justice Big Data Institute Co., Ltd. | Intelligent interaction method and device for systems based on a large language model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110647636B (en) | | Interaction method, interaction device, terminal equipment and storage medium |
KR20180025121A (en) | | Method and apparatus for inputting information |
JP2021168139A (en) | | Method, device, apparatus and medium for man-machine interactions |
US20210160581A1 (en) | | Method and data processing apparatus |
CN112309365B (en) | | Training method and device of speech synthesis model, storage medium and electronic equipment |
US11749255B2 (en) | | Voice question and answer method and device, computer readable storage medium and electronic device |
CN113067953A (en) | | Customer service method, system, device, server and storage medium |
CN113378583A (en) | | Dialogue reply method and device, dialogue model training method and device, and storage medium |
CN116737883A (en) | | Man-machine interaction method, device, equipment and storage medium |
CN113703585A (en) | | Interaction method, interaction device, electronic equipment and storage medium |
CN115840841A (en) | | Multi-modal dialog method, device, equipment and storage medium |
CN114064943A (en) | | Conference management method, conference management device, storage medium and electronic equipment |
CN110517672B (en) | | User intention recognition method, user instruction execution method, system and equipment |
CN113542797A (en) | | Interaction method and device in video playing and computer readable storage medium |
CN113205569A (en) | | Image drawing method and device, computer readable medium and electronic device |
CN117352132A (en) | | Psychological coaching method, device, equipment and storage medium |
CN111427444B (en) | | Control method and device of intelligent device |
CN111415662A (en) | | Method, apparatus, device and medium for generating video |
CN113763925B (en) | | Speech recognition method, device, computer equipment and storage medium |
CN115167733A (en) | | Method and device for displaying live broadcast resources, electronic equipment and storage medium |
CN113066473A (en) | | Voice synthesis method and device, storage medium and electronic equipment |
CN112820265A (en) | | Speech synthesis model training method and related device |
CN117373455B (en) | | Audio and video generation method, device, equipment and storage medium |
CN116913278B (en) | | Voice processing method, device, equipment and storage medium |
CN114781401A (en) | | Data processing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||