CN116187346A

CN116187346A - Man-machine interaction method, device, system and medium

Info

Publication number: CN116187346A
Application number: CN202310497513.7A
Authority: CN
Inventors: 王英; 李伟
Original assignee: 4u Beijing Technology Co ltd
Current assignee: 4u Beijing Technology Co ltd
Priority date: 2023-05-05
Filing date: 2023-05-05
Publication date: 2023-05-30

Abstract

The application provides a man-machine interaction method, a device, a system and a medium, wherein the method comprises the following steps: in response to a query request input by a user, acquiring history data corresponding to the user; judging whether a context association exists between the query request and the history data; and generating a query response based on the query request and the history data if a context association exists between them, and otherwise generating the query response based only on the query request. The method and the device solve the technical problem that, in existing man-machine conversation, answers are inaccurate or incomplete because only the current user input is considered and the prior conversation history is ignored.

Description

Man-machine interaction method, device, system and medium

Technical Field

The application relates to the technical field of data processing, in particular to a man-machine interaction method, a device, a system and a medium.

Background

Virtual digital persons are virtual agents built with artificial intelligence and natural language processing techniques. They simulate the language, behavior, and thinking of humans to provide a range of services such as customer support, sales concierge, calendar management, financial advice, brand ambassadorship, healthcare advice, digital influencing, and data entry and processing.

Virtual digital humans are typically driven by artificial intelligence and machine learning techniques, and can understand the meaning and intent of human language through natural language processing techniques. These virtual digital persons may communicate using voice or text, and may perform tasks according to the user's requirements and inputs. The virtual digital person may be programmed to recognize and respond to specific instructions, may quickly process large amounts of data, and provide information and advice when needed.

The virtual digital person typically works via the internet, although it may also work locally at the terminal. The user may interact with the virtual digital person through a virtual digital person device, such as a preset fixed terminal, cell phone, tablet computer, or computer, without having to conduct face-to-face communication. The virtual digital person can also work in different time zones and places, and provide services at any time when the user needs.

The interaction between the virtual digital person and the user is realized based on natural language processing and machine learning technologies, and the technologies can enable the virtual digital person to understand the intention of the user and respond correspondingly. However, many existing virtual digital person systems only consider current user inputs, ignoring previous dialog history. This may result in inaccurate or incomplete answers. For example, a user may ask a virtual digital person a complex question, but since the virtual digital person only considers current inputs, it may give an incomplete or erroneous answer because it does not take into account previous dialog history and context.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

The embodiments of the application provide a man-machine interaction method, a device, a system and a medium, which at least solve the technical problem that, in existing man-machine dialogue, answers are inaccurate or incomplete because only the current user input is considered and the prior dialogue history is ignored.

According to an aspect of the embodiments of the present application, there is provided a human-computer interaction method, including: in response to a query request input by a user, acquiring history data corresponding to the user; judging whether a context association exists between the query request and the history data; and generating a query response based on the query request and the history data if a context association exists between them, and otherwise generating the query response based only on the query request.

According to another aspect of the embodiments of the present application, there is also provided a human-computer interaction device, including: an acquisition module configured to acquire, in response to a query request input by a user, history data corresponding to the user; a determination module configured to determine whether a contextual association exists between the query request and the history data; and a response module configured to generate a query response based on the query request and the history data if a contextual association exists between them, and to generate the query response based only on the query request otherwise.

According to still another aspect of the embodiments of the present application, there is further provided a human-computer interaction system, including: a human-machine interaction device as described above; and a rendering device configured to render a virtual digital person, wherein the virtual digital person is used for outputting the query response of the man-machine interaction device.

In the embodiments of the application, history data corresponding to a user is acquired in response to a query request input by the user; whether a context association exists between the query request and the history data is judged; and a query response for responding to the query request is generated based on the query request and the history data if a context association exists between them, and otherwise based only on the query request. This solves the technical problem that, in existing man-machine dialogue, answers are inaccurate or incomplete because only the current user input is considered and the prior dialogue history is ignored.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 is a flow chart of a human-machine interaction method according to an embodiment of the present application;

FIG. 2 is a flow chart of a method of human-machine interaction of a virtual digital person according to an embodiment of the present application;

FIG. 3 is a flow chart of a method of determining whether a context association exists according to an embodiment of the present application;

FIG. 4 is a flow chart of a method for text analysis and semantic analysis to obtain corresponding text contexts according to embodiments of the present application;

FIG. 5 is a flow chart of a method of feature extraction according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a human-machine interaction device according to an embodiment of the present application;

FIG. 7 is a schematic architecture diagram of a human-machine interaction system according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure.

Wherein the above figures include the following reference numerals:

1001. CPU; 1002. ROM; 1003. RAM; 1004. bus; 1005. I/O interface; 1006. input section; 1007. output section; 1008. storage section; 1009. communication section; 1010. drive; 1011. removable medium; 100. interactive system; 101. first terminal device; 102. second terminal device; 103. third terminal device; 104. network; 105. server; 106. motion capture device; 1062. motion capture helmet; 1064. motion capture suit; 1066. motion capture glove; 62. acquisition module; 64. determination module; 66. response module.

Detailed Description

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

The relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present application unless it is specifically stated otherwise. Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description. Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate. In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.

Example 1

The embodiment of the application provides a man-machine interaction method, as shown in fig. 1, comprising the following steps:

Step S102, in response to a query request input by a user, acquiring history data corresponding to the user.

After a query request input by a user is received, history data corresponding to the user is acquired. For example, an identity feature capable of identifying the user is acquired, and the history data corresponding to that identity feature is acquired, wherein the identity feature comprises a biometric, social, or behavioral feature of the user; alternatively, the history records within a preset time period are acquired and used as the history data.
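By way of illustration only, the two acquisition options described above (lookup by identity feature, or selection by preset time period) might be sketched as follows. The `HistoryRecord` type, the `get_history` function, and the use of a plain string `user_id` as the identity feature are assumptions made for this sketch and are not part of the application.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class HistoryRecord:
    user_id: str         # hypothetical identity feature used as the lookup key
    text: str            # content of the earlier utterance
    timestamp: datetime  # when the utterance was recorded


def get_history(records: List[HistoryRecord],
                user_id: Optional[str] = None,
                now: Optional[datetime] = None,
                window: Optional[timedelta] = None) -> List[HistoryRecord]:
    """Keep records matching the identity feature and/or the preset time window."""
    selected = records
    if user_id is not None:
        # Option 1: history corresponding to the identity feature.
        selected = [r for r in selected if r.user_id == user_id]
    if window is not None:
        # Option 2: history within a preset time period ending at `now`.
        now = now or datetime.now()
        selected = [r for r in selected if now - window <= r.timestamp <= now]
    return selected
```

Both filters may be combined, in which case only records of the identified user that fall inside the time window are returned.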

In this embodiment, by acquiring the history data corresponding to the user, the user's interests, preferences, behavior habits, and other information can be better known, so as to provide more personalized services for the user. For example, in a search engine, search results that better match the interests of a user may be recommended to the user based on the user's historical search records; in the e-commerce platform, based on the historical purchase record of the user, commodities which better accord with the preference of the user can be recommended to the user. In addition, the efficiency of the system can be improved, and the historical record data of all users are prevented from being scanned and analyzed, so that the system resources are saved.

Step S104, judging whether a context association exists between the query request and the history data.

Firstly, respectively carrying out text analysis and semantic analysis on the historical record data and the query request by using a natural language processing technology to obtain a corresponding text context. For example, decomposing the history data and sentences in the query request into words or phrases, respectively; part-of-speech tagging is performed on each word or phrase to determine grammatical roles of each word or phrase in context; based on the natural language processing technology, analyzing the semantic of the historical record data and the query request to obtain the corresponding text context, wherein the semantic comprises the subject, emotion and semantic relation of the sentence.

The embodiment can better understand the meaning and the relation in the text by using the natural language processing technology and carrying out semantic analysis on the historical record data and the query request, thereby more accurately answering the questions of the user. Moreover, through analyzing the semantics, the intention and the demand of the user can be better understood, so that the questions of the user can be better answered, and the satisfaction degree of the user is improved; in addition, through carrying out emotion analysis on the historical record data and the query request, the emotion state of the user can be better known, so that the feedback mode of the system is adjusted according to the emotion state, and the user experience is improved.

Then, word vector encoding is applied to the history data and the query request, and the similarity between them is calculated based on the encoding. For example, the sentences in the history data and the query request are decomposed into words or phrases, a word vector representation of each word or phrase is obtained from a word vector model, and the word vector representations of all the words or phrases are combined by weighted averaging to obtain vector representations of the history data and of the query request. The weights may be obtained, for example, by calculating the frequency with which each word or phrase occurs in the query request and the history data, calculating the reciprocal of the frequency with which that word or phrase occurs in a preset corpus, and performing the weighted averaging based on the frequency of occurrence and the reciprocal. Then, based on the vector representations, cosine similarity is used to calculate a similarity score between the history data vector and the query request vector.
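The weighted-average encoding described above might be sketched as follows, under stated assumptions: `word_vectors` stands in for a word vector model, `corpus_freq` holds each word's (positive) frequency in the preset corpus, and the weight of a word is taken as its in-text frequency multiplied by the reciprocal of its corpus frequency. The function name and the default used for words missing from `corpus_freq` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def encode(tokens: List[str],
           word_vectors: Dict[str, List[float]],
           corpus_freq: Dict[str, float]) -> List[float]:
    """Weighted average of word vectors; weight = in-text frequency * 1 / corpus frequency."""
    dim = len(next(iter(word_vectors.values()))) if word_vectors else 0
    # Only words known to the word vector model contribute.
    counts = Counter(t for t in tokens if t in word_vectors)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * dim
    acc = [0.0] * dim
    weight_sum = 0.0
    for word, count in counts.items():
        tf = count / total                      # frequency of occurrence in the text
        inv = 1.0 / corpus_freq.get(word, 1.0)  # reciprocal corpus frequency (default is an assumption)
        weight = tf * inv
        weight_sum += weight
        for k, x in enumerate(word_vectors[word]):
            acc[k] += weight * x
    return [x / weight_sum for x in acc]
```

Words that are rare in the preset corpus receive larger weights, so they dominate the resulting sentence vector.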

In this embodiment, word vector encoding is performed on the history data and the query request, and the similarity between them is calculated, so that the recommendation system can better understand the query of the user and make personalized recommendation. Word vector coding may help capture semantic relationships between words and phrases to better understand a user's query. The similarity calculation may help the system better match histories and queries, thereby providing more accurate suggestions and predictions. In addition, this approach may also increase the efficiency of the system because it may reduce the number of redundant queries and recommendations while improving accuracy and precision.

Finally, it is determined whether a contextual association exists between the query request and the history data based on the corresponding text context and the similarity. For example, in the event that the similarity score is greater than a preset threshold, determining that a contextual association exists between the query request and the historic data; and under the condition that the similarity score is smaller than or equal to the preset threshold value, determining that no context correlation exists between the query request and the historical record data.

The embodiment judges whether the context correlation exists between the historical record data and the query request through semantic analysis and vector coding, which is helpful for improving the understanding and response capability of the intelligent system to the user and enabling the system to be more intelligent and personalized. Specifically, through analysis of the text context, the topic, emotion and semantic relationship in the history data and the query request can be determined, so that the requirements and intentions of the user can be better understood; by vector encoding and similarity calculation, the degree of similarity between the history data and the query request can be quantified, and further, whether a context association exists or not can be determined. This may improve the accuracy and efficiency of the system, better providing personalized services and advice to the user.

Step S106, in the case where there is a contextual association between the query request and the history data, generating a query response for responding to the query request based on the query request and the history data, otherwise, generating the query response based on the query request.

By selecting the history data most relevant to the query request, this embodiment can provide more personalized and accurate service for the user. For example, in a search engine, search results that better match the interests of the user are recommended; on an e-commerce platform, goods that better match the preferences of the user are recommended. This improves user satisfaction and user stickiness. In addition, sorting and filtering the history data by similarity score avoids scanning and analyzing all of the history data, which saves system resources and improves system efficiency.

Example 2

The embodiment of the application provides a man-machine interaction method of virtual digital people, as shown in fig. 2, comprising the following steps:

Step S202, receiving a query request input by a user.

The virtual digital person may receive the user's query request through a variety of input means, including text input and voice input. For text entry, the user may enter text via a keyboard or by touching the screen. For voice input, a user may input voice through a microphone.

If the user inputs text by keyboard or touch screen, the virtual digital person uses natural language processing techniques to convert the input text into a format the computer can process, which typically involves word segmentation, grammar analysis, and intent recognition on the text input by the user.

If the user inputs voice by using a microphone, the virtual digital person converts the voice of the user into text by using a voice recognition technology, and then processes the converted text by using a natural language processing technology. Speech recognition techniques typically require consideration of factors such as the user's speech quality and accent to ensure that the converted text is accurate.

Step S204, history data is acquired.

The identity features of the user are acquired, and history data is acquired based on the identity features. An identity feature capable of identifying the user is obtained, which may include biometric, social or behavioral features such as name, age, gender, address, cell phone number, email address, IP address, etc. The corresponding history data, such as network activity, purchase records, social media behavior, mobile device usage, etc., may then be obtained from different sources based on these identity features. Such data may be obtained from personal devices, applications, websites, third party data providers, social media platforms, and the like.

Or, acquiring a history record in a preset time period. User activity occurring within a preset period of time may be obtained through data sources such as access records, communication records, search records, purchase records, etc. of the user's device, application program, website, etc. Such data may be obtained from personal devices, applications, websites, third party data providers, and the like.

The virtual digital person can be helped to better know personal conditions of the user, such as interests, purchasing preferences and the like of the user by acquiring identity characteristics of the user, and the virtual digital person can be helped to know past behaviors and preferences of the user by acquiring historical data so as to respond to demands of the user better. Additionally, the acquisition of the history data may also assist the virtual digital person in determining contextual information related to the user's current query request, such as the user's previous preferences, historical activity, and the like. Such contextual information may help the virtual digital person better understand the needs of the user and thus better generate the response. Finally, the virtual digital person may better understand the user's needs and context information to more accurately generate the response. Meanwhile, the historical record data can also help the virtual digital person to predict the demands of the user, and response efficiency is improved.

Step S206, judging whether a context association exists between the query request and the history data.

The virtual digital person needs to compare the history data to the words or phrases in the query request to determine if a contextual relationship exists between them. If a contextual association exists, the virtual digital person will generate a query response for responding to the query request based on the query request and the historic data. If no contextual association exists, the virtual digital person generates the query response based only on the query request.

In some embodiments, the method for determining whether a contextual relationship exists between the query request and the history data, as shown in fig. 3, includes the steps of:

Step S2062, performing text analysis and semantic analysis on the history data and the query request respectively, using natural language processing technology, to obtain a corresponding text context.

First, the sentences in the history data and in the query request are decomposed into words or phrases. A tokenizer or a sentence splitter is used to divide the text into words or sentences: the tokenizer recognizes punctuation marks and spaces and segments the text into individual words or phrases, while the sentence splitter segments the text into sentences. For example, off-the-shelf tokenizers and sentence splitters, such as those in NLTK, spaCy, and Stanford CoreNLP, may be used, or the text may be segmented with custom rules. After segmentation, the words or phrases may be stored as a list of strings for subsequent processing.
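As a minimal sketch of the custom-rule option, the following uses only regular expressions from the Python standard library. The function names and the specific splitting rules are assumptions for illustration; an off-the-shelf tool would handle abbreviations, non-English text, and other edge cases that these rules do not.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split after sentence-ending punctuation that is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> List[str]:
    """Lower-case the sentence and keep runs of word characters as tokens."""
    return re.findall(r"\w+", sentence.lower())
```

Each sentence can then be passed to `tokenize` to obtain the list of strings used in subsequent processing.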

According to the embodiment, the text in the history data and the query request is decomposed into words or phrases, and part-of-speech tagging and semantic analysis are performed, so that the meaning and the context of the text can be better understood, and the accuracy and the efficiency of a natural language processing system are improved.

Each word or phrase is then part-of-speech tagged to determine its grammatical role in context. Part-of-speech tagging is a natural language processing task that labels each word or phrase in a text with its grammatical role in context. Common grammatical roles include nouns, verbs, adjectives, adverbs, and the like. Machine-learning part-of-speech taggers typically use annotated text as training data to learn the relationship between words or phrases and their parts of speech. In use, the input text is broken down into words or phrases, and the tagger then assigns a part-of-speech tag to each of them. Tools that provide machine-learning part-of-speech tagging include Stanford CoreNLP, NLTK and spaCy. Part-of-speech tagging may also be implemented using a rules engine or manual tagging.
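The rules-engine option might be sketched, in a deliberately simplified form, as a lexicon lookup followed by suffix rules. The lexicon, the suffix rules, the tag names, and the fallback to `NOUN` are all hypothetical choices for this sketch; a trained tagger would be far more accurate.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon of closed-class words.
LEXICON: Dict[str, str] = {
    "the": "DET", "a": "DET", "is": "VERB", "my": "PRON", "what": "PRON",
}
# Hypothetical suffix rules, checked in order.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("ly", "ADV"), ("ing", "VERB"), ("ed", "VERB"), ("ous", "ADJ"),
]


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Assign a part-of-speech tag to each token using lexicon and suffix rules."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            label = LEXICON[word]
        else:
            # First matching suffix rule wins; unknown words default to NOUN.
            label = next((t for suffix, t in SUFFIX_RULES if word.endswith(suffix)), "NOUN")
        tagged.append((token, label))
    return tagged
```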

By using part-of-speech tagging, the present embodiment can identify the grammatical role that each word or phrase plays in context, such as noun, verb, adjective, or adverb. This is important for understanding the meaning and context of text, because different parts of speech play different roles in grammar.

Finally, based on the natural language processing technology, analyzing the semantic of the historical record data and the query request to obtain the corresponding text context, wherein the semantic comprises the subject, emotion and semantic relation of the sentence. Semantic analysis based on natural language processing techniques is a complex task whose purpose is to extract semantically related information from text, including the subject matter, emotion, semantic relationships, etc. of the text. The specific method for analyzing the semantics of the history data and the query request to obtain the corresponding text context will be described in detail below, and therefore will not be described in detail here.

Step S2064, encoding the history data and the query request with a word vector, and calculating the similarity between the history data and the query request based on the encoding. In some examples, calculating the similarity between the history data and the query request is accomplished by the following equation:

$$\text{similarity score} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

where A and B are the vector representations of the history data and the query request, respectively, $\cdot$ denotes the dot product, $\|A\|$ and $\|B\|$ denote the norms (lengths) of the vectors A and B, respectively, and $\cos(\theta)$ is the cosine of the angle between the vectors A and B.

In this formula, the vector representation of the history data and the query request needs to be calculated first by converting each word or phrase in the sentence into a preset word vector representation and then weighted averaging the word vector representations of all the words or phrases. The weight of this weighted average may be calculated based on TF-IDF techniques, etc.

Then, by calculating the cosine similarity score, the similarity between the history data and the query request can be obtained. Cosine similarity is a method for measuring text similarity, and can measure the similarity between two vectors in a high-dimensional space. In natural language processing, cosine similarity is often used to calculate similarity between texts, such as between questions posed by a user in a conversation and questions in historical data.
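A minimal sketch of the cosine similarity score follows, assuming the two vector representations are plain lists of floats of equal length. Returning 0.0 when either vector has zero length is an assumption of this sketch, since the formula is undefined in that case.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (A . B) / (||A|| ||B||) for two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0.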

Step S2066, determining whether a contextual relationship exists between the query request and the history data based on the corresponding text context and the similarity.

Determining that a context association exists between the query request and the history data if the similarity score is greater than a preset threshold, and executing step S208; and if the similarity score is less than or equal to the preset threshold, determining that no context association exists between the query request and the history data, and executing step S209.

Step S208, in the case where there is a contextual association between the query request and the history data, generating a query response for responding to the query request based on the query request and the history data.

Step S209, in the case where there is no contextual association between the query request and the history data, generating the query response based on only the query request.
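The branch between steps S208 and S209 might be sketched as follows. The `generate` callable stands in for whatever response generator the system uses and is hypothetical, as is the choice to pass the history to it by prepending the history text to the query.

```python
from typing import Callable, List


def respond(query: str,
            history: List[str],
            similarity_score: float,
            threshold: float,
            generate: Callable[[str], str]) -> str:
    """Use history only when the similarity score exceeds the preset threshold."""
    if similarity_score > threshold:
        # Step S208: contextual association exists, so include the history.
        context = "\n".join(history)
        return generate(context + "\n" + query)
    # Step S209: no contextual association, so use the query alone.
    return generate(query)
```

A score exactly equal to the threshold is treated as no association, matching the condition stated for step S2066.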

Example 3

The method for respectively carrying out text analysis and semantic analysis on the historical record data and the query request to obtain corresponding text context is shown in fig. 4, and comprises the following steps:

Step S400, entity recognition.

Entities in the text, such as person names, place names, organization names, etc., are analyzed to identify important information and participants in the text. Entity recognition may use pre-trained models such as those provided by spaCy and Stanford CoreNLP.

Step S402, dependency analysis.

The dependency relationships between the words in a sentence are analyzed to determine the grammatical structure of the sentence. The dependency analysis may use a pre-trained model; for example, tools such as Stanford CoreNLP and SyntaxNet can analyze the dependency relationships between the words in a sentence and generate a dependency tree. This helps to further understand the grammatical structure of the text and the relationships between the words in the sentence.

Step S404, word sense disambiguation.

1) Ambiguities in the text are analyzed. First, the text is segmented into words. Then, word sense disambiguation is performed for the ambiguous words that appear in the text.

2) A suitable word sense is selected. For each ambiguous word, the most appropriate one needs to be selected from its possible word senses. In order to select the correct word sense, context information, including preceding and following words, grammar structures, etc., typically needs to be considered. For example, where the words "deposit," "interest rate," and the like are relevant to finance, the term "bank" is likely to refer to a financial institution.

3) An existing resource or pre-trained model is used. Word sense disambiguation may use a variety of methods, including manual rules, knowledge-base methods, and machine-learning methods. Machine-learning methods typically use labeled text as training data to learn the relationships between words and their word senses. Knowledge-base methods may rely on resources such as the WordNet lexical database and the Lesk algorithm.
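The three steps above can be illustrated with a simplified, Lesk-style overlap count: the sense whose gloss shares the most words with the context is selected. The sense inventory and glosses below are hypothetical stand-ins for a resource such as WordNet, and the sketch omits the refinements of the full Lesk algorithm.

```python
from typing import Dict, List, Optional

# Hypothetical sense inventory: word -> {sense name: gloss}.
SENSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "financial_institution":
            "an institution that accepts deposit and pays interest rate on money",
        "river_side":
            "sloping land beside a river or lake water",
    },
}


def disambiguate(word: str,
                 context_tokens: List[str],
                 senses: Dict[str, Dict[str, str]] = SENSES) -> Optional[str]:
    """Pick the sense whose gloss overlaps most with the context words."""
    context = {t.lower() for t in context_tokens}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.get(word, {}).items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense
```

With context words such as "deposit" and "interest rate", the financial sense of "bank" is selected, consistent with the example given in step 2).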

Step S406, emotion analysis.

Firstly, dialogue data of a virtual digital person and a user is required to be converted into text data, and then emotion analysis is carried out. Emotion analysis returns the emotion polarity of the text, e.g., positive, negative, or neutral, and an emotion score that reflects the emotion level of the text. Based on these results, the virtual digital person can automatically judge the emotion of the user, thereby better performing interaction and response.
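One simple way to obtain an emotion polarity and an emotion score is a lexicon count, sketched below. The word lists, the score definition (positive hits minus negative hits, divided by total hits), and the function name are assumptions of this sketch; the application does not specify the emotion analysis model.

```python
from typing import List, Tuple

# Hypothetical emotion lexicons.
POSITIVE = frozenset({"good", "great", "thanks", "helpful", "love"})
NEGATIVE = frozenset({"bad", "wrong", "slow", "useless", "hate"})


def analyze_emotion(tokens: List[str]) -> Tuple[str, float]:
    """Return (polarity, score) with score in [-1, 1]."""
    pos = sum(1 for t in tokens if t.lower() in POSITIVE)
    neg = sum(1 for t in tokens if t.lower() in NEGATIVE)
    hits = pos + neg
    score = 0.0 if hits == 0 else (pos - neg) / hits
    if score > 0:
        polarity = "positive"
    elif score < 0:
        polarity = "negative"
    else:
        polarity = "neutral"
    return polarity, score
```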

Step S408, topic analysis.

The main content and intent of the text may be determined by analyzing and categorizing the text of the history data and the text of the query request. The implementation of the topic analysis comprises two main steps as shown in fig. 5:

Step S4082, feature extraction.

In the feature extraction stage, the text data typically requires some pre-processing, such as stop-word removal, tokenization, and stemming, to better support topic analysis. The text data may then be converted to feature vectors by means of word frequency, TF-IDF, etc., for subsequent topic model construction.
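The pre-processing mentioned above might look like the following sketch, which takes already tokenized text, removes stop words, and applies a naive suffix-stripping stemmer. The stop-word list and the suffix rules are hypothetical and far cruder than an established stemmer such as the Porter stemmer.

```python
from typing import List

# Hypothetical stop-word list and suffix rules.
STOP_WORDS = frozenset({"the", "a", "an", "is", "are", "of", "to", "and", "in", "on"})
SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def preprocess(tokens: List[str]) -> List[str]:
    """Lower-case tokens, drop stop words, and stem what is left."""
    return [stem(t.lower()) for t in tokens if t.lower() not in STOP_WORDS]
```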

This embodiment provides an improved TF-IDF feature vector calculation method. Traditional TF-IDF computes the weight of each word in each text from the frequency of the word in that text (TF) and its inverse document frequency (IDF). However, this can give the same word different importance in different texts, because its frequency in the text and its document frequency may differ.

The present embodiment uses symmetric TF-IDF weights to consider the importance of each word in different texts more equally. In particular, this embodiment combines TF and IDF symmetrically to balance the number of occurrences of a word in a text against its number of occurrences in the text collection. For example, the symmetric TF-IDF formula for computing the feature vectors is as follows:

$$w_{ij} = 0.5 + 0.5 \cdot \frac{\mathrm{tf}(i,j)}{\max_k \mathrm{tf}(k,j)} \cdot \log\frac{N}{\mathrm{df}(i)}$$

wherein $w_{ij}$ is the feature vector, $\mathrm{tf}(i,j)$ is the frequency of occurrence of word $i$ in text $j$, $\max_k \mathrm{tf}(k,j)$ is the maximum frequency of occurrence of all words in text $j$, $N$ is the total number of texts in the text set, $\mathrm{df}(i)$ is the number of texts containing word $i$, and text $j$ refers to the history data or the query request.

Compared with the traditional TF-IDF method, the symmetric TF-IDF method provided by this embodiment weighs the importance of words in different texts more evenly and can better capture text semantics.
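
The symmetric TF-IDF weight can be computed directly from the formula given above. The sketch below assumes each text has already been tokenized into a list of words; the function name `symmetric_tfidf` is hypothetical.

```python
import math
from collections import Counter


def symmetric_tfidf(texts: list[list[str]]) -> list[dict[str, float]]:
    """Compute 0.5 + 0.5 * (tf(i,j) / max_k tf(k,j)) * log(N / df(i)) per word."""
    n_texts = len(texts)
    # df(i): number of texts that contain word i.
    df = Counter(word for tokens in texts for word in set(tokens))
    weights = []
    for tokens in texts:
        tf = Counter(tokens)                  # tf(i, j) for this text j
        max_tf = max(tf.values(), default=1)  # max_k tf(k, j)
        weights.append({
            word: 0.5 + 0.5 * (count / max_tf) * math.log(n_texts / df[word])
            for word, count in tf.items()
        })
    return weights
```

With this formula, a word that occurs in every text has log(N / df(i)) = 0 and therefore receives the baseline weight 0.5, whereas words concentrated in fewer texts receive larger weights.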

Step S4084, topic model construction.

In the topic model construction stage, topic model algorithms can be used for text classification and topic extraction.

For example, using a probability-based generative model, text data is decomposed into several topics. It is assumed that each word in the text is generated by a certain topic and that each topic consists of several words. By learning and inferring the probability distribution of each word in the text, the topic distribution and topic-term distribution of the text can be determined.

In the topic model construction stage, let D represent a set of texts, where d represents a text in D, V represents the set of all distinct terms, and k represents the number of topics. It is assumed that each word in the text is generated by a certain topic, each topic consisting of several words. Probability distributions may be used to represent the probability that each word in the text belongs to a certain topic, as well as the probability that each word appears in each topic. The probability that each word $w_i$ in the text belongs to each topic $z_j$ and the probability that each word $w_i$ appears in each topic can be calculated using the following formula:

wherein $P(z_j \mid w_i)$ is the probability that each word $w_i$ belongs to each topic $z_j$, $P(w_i)$ is the total probability, $P(z_j)$ represents the probability that topic $z_j$ occurs in all texts, $P(w_i \mid z_j)$ is the probability that each word $w_i$ appears in each topic, $w_{i,j}$ represents the feature vector, and $h_{ij}$ represents the word vector representation.
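
The quantities P(z_j | w_i), P(w_i | z_j), P(z_j), and the total probability P(w_i) defined above are consistent with Bayes' rule, P(z_j | w_i) = P(w_i | z_j) P(z_j) / P(w_i), with P(w_i) obtained by summing P(w_i | z_j) P(z_j) over all topics. The sketch below assumes that reading; it does not model how the feature vector and the word vector representation enter the computation, and the function name `topic_posterior` is hypothetical.

```python
def topic_posterior(p_word_given_topic: list[float],
                    p_topic: list[float]) -> list[float]:
    """Return P(z_j | w_i) for one word w_i over all topics z_j (Bayes' rule).

    p_word_given_topic[j] is P(w_i | z_j); p_topic[j] is P(z_j).
    """
    joint = [pw * pz for pw, pz in zip(p_word_given_topic, p_topic)]
    p_word = sum(joint)  # total probability P(w_i)
    if p_word == 0:
        raise ValueError("word has zero probability under every topic")
    return [j / p_word for j in joint]
```

For example, a word with P(w | z_1) = 0.2 and P(w | z_2) = 0.05 under equally likely topics is attributed to the first topic with probability 0.8.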

The text classification and topic extraction performed by the topic model algorithm in this embodiment may have the following beneficial effects:

1) Text classification. The topic model may decompose a text into several topics and determine, for each text, the probability distribution over the topics. In this way, a text may be assigned to its most likely topic, thereby achieving the objective of text classification.

2) Topic extraction. The topic model may identify topics in the text and determine a probability distribution for each term in each topic. In this way, key topics in the text can be extracted, so that the subject matter of the text can be better understood. This is of great importance for information extraction, text summarization, text mining, and other applications.

3) Probability modeling. The topic model uses probability distributions to represent the probability that each word in the text belongs to a topic, and the probability that each word in each topic appears. In this way, a probabilistic model can be built to better understand the nature of text data and to infer and predict it.

4) Dimensionality reduction. The topic model can project high-dimensional text data into the topic space, thereby reducing the dimension and complexity of the data. This helps to reduce the computational cost and improve the efficiency and scalability of the algorithm.
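
The classification and dimensionality reduction effects listed above can be illustrated with a short sketch. It assumes the topic-word distributions and topic probabilities have already been learned, and it summarizes a text by averaging the per-word topic probabilities; this averaging, and the names `text_topic_distribution` and `most_likely_topic`, are illustrative choices rather than the procedure of the embodiment.

```python
def text_topic_distribution(tokens: list[str],
                            p_word_given_topic: list[dict[str, float]],
                            p_topic: list[float]) -> list[float]:
    """Map a text of any vocabulary size to a k-dimensional topic vector."""
    k = len(p_topic)
    totals = [0.0] * k
    counted = 0
    for word in tokens:
        joint = [p_word_given_topic[j].get(word, 0.0) * p_topic[j]
                 for j in range(k)]
        p_word = sum(joint)
        if p_word == 0:
            continue  # word is unknown to every topic; skip it
        for j in range(k):
            totals[j] += joint[j] / p_word
        counted += 1
    if counted == 0:
        return list(p_topic)  # no usable words: fall back to topic probabilities
    return [t / counted for t in totals]


def most_likely_topic(topic_dist: list[float]) -> int:
    """Classify a text by the index of its most probable topic."""
    return max(range(len(topic_dist)), key=topic_dist.__getitem__)
```

A text over an arbitrarily large vocabulary is thus reduced to a vector of length k, and taking the largest entry of that vector assigns the text to a topic.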

Step S410, semantic relationship analysis.

Semantic relationships, such as actions, causal relationships, etc., between various entities in the sentence are analyzed to further understand meaning in the text. The semantic relationship analysis may use natural language processing techniques and artificial intelligence techniques, such as knowledge graph and logic reasoning.
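
As a rough illustration of how a knowledge graph and simple logical reasoning might support semantic relationship analysis, the sketch below stores (subject, relation, object) triples and follows a relation transitively, for example along a chain of causal links. The class `RelationGraph` and its methods are hypothetical and are not part of the embodiment; extracting the triples from sentences is assumed to happen elsewhere.

```python
from collections import defaultdict


class RelationGraph:
    """Tiny in-memory store of (subject, relation, object) triples."""

    def __init__(self) -> None:
        self._edges: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self._edges[(subject, relation)].add(obj)

    def objects(self, subject: str, relation: str) -> set[str]:
        """Entities directly related to the subject by the relation."""
        return set(self._edges.get((subject, relation), ()))

    def reachable(self, subject: str, relation: str) -> set[str]:
        """Entities reachable by following the relation one or more times."""
        seen: set[str] = set()
        stack = [subject]
        while stack:
            node = stack.pop()
            for nxt in self._edges.get((node, relation), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
```

If the graph records that a storm causes a delay and that a delay causes a missed connection, `reachable("storm", "causes")` returns both consequences.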

Example 4

The embodiment of the application provides a man-machine interaction device for virtual digital people, as shown in fig. 6, including: an acquisition module 62, a determination module 64, and a response module 66.

The acquisition module 62 is configured to acquire history data corresponding to a user in response to an inquiry request input by the user; the determination module 64 is configured to determine whether a contextual association exists between the query request and the history data; the response module 66 is configured to generate a query response based on the query request and the history data if a contextual association exists between the query request and the history data, and to generate a query response based on the query request otherwise.
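
The division into an acquisition module, a determination module, and a response module can be sketched as follows. The sketch assumes the determination is made by comparing a cosine similarity score against a preset threshold, in line with the similarity test recited in the claims. The `embed` and `generate` callables, the in-memory `store`, the default threshold of 0.5, and the helper `interact` are all hypothetical placeholders.

```python
import math
from dataclasses import dataclass, field
from typing import Callable


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class AcquisitionModule:
    # Hypothetical in-memory history store: user id -> past utterances.
    store: dict[str, list[str]] = field(default_factory=dict)

    def acquire(self, user_id: str) -> list[str]:
        return list(self.store.get(user_id, []))


@dataclass
class DeterminationModule:
    embed: Callable[[str], list[float]]  # hypothetical text encoder
    threshold: float = 0.5               # hypothetical preset threshold

    def is_contextual(self, query: str, history: list[str]) -> bool:
        if not history:
            return False
        score = cosine_similarity(self.embed(query),
                                  self.embed(" ".join(history)))
        return score > self.threshold


@dataclass
class ResponseModule:
    generate: Callable[[str], str]  # hypothetical response generator

    def respond(self, query: str, history: list[str], contextual: bool) -> str:
        # Use the history only when a contextual association exists.
        prompt = "\n".join(history + [query]) if contextual else query
        return self.generate(prompt)


def interact(user_id: str, query: str, acq: AcquisitionModule,
             det: DeterminationModule, resp: ResponseModule) -> str:
    history = acq.acquire(user_id)
    return resp.respond(query, history, det.is_contextual(query, history))
```

Keeping the three responsibilities in separate objects mirrors the module division of the device, so that, for example, the encoder or the threshold can be changed without touching response generation.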

It should be noted that the division into the functional modules described above is used only as an example of the man-machine interaction device provided in the above embodiment. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the man-machine interaction device provided in the above embodiment and the man-machine interaction method embodiment belong to the same concept; for the specific implementation process, see the method embodiment, which is not repeated here.

Example 5

The embodiment of the application provides a man-machine interaction system for a virtual digital person. As shown in fig. 7, the interaction system 100 may include one or more terminal devices (such as a first terminal device 101, a second terminal device 102, and a third terminal device 103), a network 104, a server 105, and a dynamic capture device 106.

The network 104 is a medium for providing communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103 and the server 105, and between the dynamic capture device 106 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices with a display screen including, but not limited to, desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, dynamic capture devices, and servers in fig. 7 are merely illustrative. There may be any number of terminal devices, networks, dynamic capture devices, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.

The dynamic capture device 106 is used for collecting dynamic capture data in real time, and sending the dynamic capture data to the server 105 via the network 104. The dynamic capture device 106 may include one or more of a dynamic capture helmet 1062, a dynamic capture suit 1064, and a dynamic capture glove 1066, among others.

The dynamic capture helmet 1062 is provided with a camera that captures up to 60 frames per second and can record rapid lip movements, blinks, and facial twitches and tremors. Furthermore, the dynamic capture helmet 1062 in the present embodiment has an open structure so that air can circulate through it, allowing the wearer to operate more comfortably. The dynamic capture helmet 1062 may be connected through a dedicated data cable; if the data cable is not long enough, it may be extended with an enhanced USB extension cable.

The dynamic capture suit 1064 is composed of inertial sensors, control boxes, Lycra fabric, etc. The dynamic capture suit 1064 in this embodiment is provided with 17 sensors, which can simultaneously track the movements of 23 different body links; the tracked locations include the feet, lower legs, knees, abdomen, hands, elbows, shoulders, etc. With this structure, the dynamic capture suit 1064 in this embodiment can meet the strict requirements of motion capture and animation design, and is easy to use, comfortable to wear, and provides high data quality. In other embodiments, trackable markers may also be placed on the dynamic capture suit 1064 to capture the motion trajectory of the person or other object wearing the dynamic capture suit 1064. For example, retroreflective markers may be placed and tracked by a tracking device such as an infrared camera.

The dynamic capture glove 1066 is composed of inertial sensors, elastic fabric, a hand motion capture system, etc. In this embodiment, 12 high-performance nine-axis inertial sensors are disposed on the dynamic capture glove 1066; the gesture update frequency is 120 Hz, there are 12 collection nodes, the static precision is 0.02 degrees, the dynamic precision is 0.2 degrees, the resolving frequency is about 1000 Hz, and the data delay is 30 ms.

After receiving the dynamic capture data, the server 105 executes the man-machine interaction method of the virtual digital person provided by the embodiments of the present disclosure, generates a query response, and pushes the query response to the first terminal device 101, the second terminal device 102, and the third terminal device 103.

The man-machine interaction method of the virtual digital person provided in the embodiments of the present disclosure is generally executed by the server 105, and accordingly, the man-machine interaction device of the virtual digital person is generally disposed in the server 105. However, those skilled in the art will readily understand that the man-machine interaction method of the virtual digital person provided in the embodiments of the present disclosure may also be performed by the first terminal device 101, the second terminal device 102, and the third terminal device 103, and accordingly, the man-machine interaction device of the virtual digital person may also be provided in the first terminal device 101, the second terminal device 102, and the third terminal device 103, which is not particularly limited in the present exemplary embodiment.

In some exemplary embodiments, the user may interact with the rendered virtual digital person through the application programs on the first terminal device 101, the second terminal device 102, and the third terminal device 103, and the server 105 generates a query response through the human-computer interaction system of the virtual digital person provided by the embodiments of the present disclosure, and sends the query response to the first terminal device 101, the second terminal device 102, the third terminal device 103, and so on. The first terminal device 101, the second terminal device 102, and the third terminal device 103 may also perform rendering operations locally and generate query responses based on the type of interaction.

Example 6

Fig. 8 shows a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. It should be noted that the electronic device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.

As shown in fig. 8, the electronic device includes a Central Processing Unit (CPU) 1001 that can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for system operation are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.

The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.

In particular, according to embodiments of the present disclosure, the processes described below with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1009, and/or installed from the removable medium 1011. When the computer program is executed by the Central Processing Unit (CPU) 1001, it performs the various functions defined in the methods and apparatus of the present application. In some embodiments, the electronic device may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.

As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device.

The computer-readable medium carries one or more programs which, when executed by one of the electronic devices, cause the electronic device to implement the methods described in the foregoing embodiments. For example, the electronic device may implement the steps of the method embodiments described above, and so on.

The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions to cause one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.

In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for a portion that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed terminal device may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary; for example, the division of the units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, units, or modules, and may be in electrical or other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are intended to fall within the scope of the present application.

Claims

1. A human-computer interaction method, comprising:

responding to an inquiry request input by a user, and acquiring history record data corresponding to the user;

judging whether a context association exists between the query request and the history data;

generating a query response based on the query request and the history data in the presence of a contextual association between the query request and the history data, and otherwise generating the query response based only on the query request.

2. The method of claim 1, wherein obtaining history data corresponding to the user comprises at least one of:

acquiring an identity feature capable of identifying the user, and acquiring the history data corresponding to the identity feature based on the identity feature, wherein the identity feature comprises a biological feature, a social feature or a behavioral feature of the user;

and acquiring a history record in a preset time period, and taking the history record in the preset time period as the history record data.

3. The method of claim 1, wherein determining whether a contextual association exists between the query request and the history data comprises:

respectively carrying out text analysis and semantic analysis on the historical record data and the query request by using a natural language processing technology to obtain a corresponding text context;

encoding the history data and the query request by using word vectors, and calculating the similarity between the history data and the query request based on the encoding;

based on the respective text contexts and the similarity, a determination is made as to whether a contextual association exists between the query request and the history data.

4. A method according to claim 3, wherein performing text analysis and semantic analysis on the history data and the query request, respectively, to obtain a corresponding text context, comprises:

decomposing the historical record data and sentences in the inquiry request into words or phrases respectively;

part-of-speech tagging is performed on each word or phrase to determine grammatical roles of each word or phrase in context;

and analyzing the semantic meaning of the historical record data and the query request based on each word or phrase after the part-of-speech tagging to obtain the corresponding text context, wherein the semantic meaning comprises the subject, emotion and semantic relation of the sentence.

5. The method of claim 3, wherein encoding the history data and the query request with a word vector and calculating a similarity between the history data and the query request based on the encoding comprises:

decomposing sentences in the historical record data and the query request into words or phrases, acquiring word vector representations of each word or phrase based on a preset word vector model, and carrying out weighted average coding on the word vector representations of all the words or phrases to obtain vector representations of the historical record data and the query request;

calculating a similarity score between the historical data and the query request using a cosine similarity measure based on the vector representations of the historical data and the query request.

6. The method of claim 5, wherein determining whether a contextual association exists between the query request and the history data based on the respective text context and the similarity comprises:

determining that a contextual association exists between the query request and the historical record data if the similarity score is greater than a preset threshold;

and under the condition that the similarity score is smaller than or equal to the preset threshold value, determining that no context correlation exists between the query request and the historical record data.

7. The method of claim 5, wherein obtaining a word vector representation for each word or phrase based on a pre-set word vector model, and wherein weighted average encoding the word vector representations for all words or phrases comprises:

calculating the frequency of occurrence of the word vector representation of each word or phrase in the query request and the history data;

calculating the reciprocal of the frequency of the word vector representation of each word or phrase in a preset corpus;

and performing the weighted average coding based on the frequency of occurrence and the reciprocal of the frequency.

8. A human-machine interaction device, comprising:

the acquisition module is configured to respond to an inquiry request input by a user and acquire history record data corresponding to the user;

a determination module configured to determine whether a contextual association exists between the query request and the history data;

and a response module configured to generate a query response based on the query request and the history data if there is a contextual association between the query request and the history data, and to generate a query response based on the query request only otherwise.

9. A human-machine interaction system, comprising:

the human-machine interaction device of claim 8;

and a rendering device configured to render a virtual digital person, wherein the virtual digital person is used for outputting the query response of the man-machine interaction device.

10. A computer-readable storage medium, on which a program is stored, characterized in that the program, when run, causes a computer to perform the method of any one of claims 1 to 7.