CN117519488B - Dialogue method and dialogue system of dialogue robot - Google Patents


Info

Publication number
CN117519488B
Authority
CN
China
Prior art keywords
conversation
dialogue
robot
data
area
Prior art date
Legal status
Active
Application number
CN202410016504.6A
Other languages
Chinese (zh)
Other versions
CN117519488A (en)
Inventor
李强 (Li Qiang)
赵峰 (Zhao Feng)
叶林峰 (Ye Linfeng)
高攀 (Gao Pan)
李欢欢 (Li Huanhuan)
Current Assignee
Sichuan Zhongdian Aostar Information Technologies Co ltd
State Grid Information and Telecommunication Co Ltd
Original Assignee
Sichuan Zhongdian Aostar Information Technologies Co ltd
State Grid Information and Telecommunication Co Ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Zhongdian Aostar Information Technologies Co ltd and State Grid Information and Telecommunication Co Ltd
Priority to CN202410016504.6A
Publication of CN117519488A
Application granted
Publication of CN117519488B
Legal status: Active


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a conversation method and a conversation system of a conversation robot, which relate to the technical field of conversation robots.

Description

Dialogue method and dialogue system of dialogue robot
Technical Field
The invention relates to the technical field of conversation robots, in particular to a conversation method and a conversation system of a conversation robot.
Background
A conversation robot, also called a chat robot or question-answering system, is a software system that enables people and machines to communicate with each other using natural language. With the progress of artificial intelligence technology represented by deep learning, dialogue robot systems are entering a new stage of development. A conversation robot can already understand, to some extent, the true semantics expressed by a user's natural-language question, and may even incorporate contextual information during the conversation to give the most appropriate answer. Because of this, conversation robots have begun to play an increasingly important role in people's work and life;
the existing conversation method of the conversation robot and the conversation system thereof are applied to a service hall of an enterprise, such as a bank, intelligent conversation is carried out by arranging the conversation robot in the service hall of the enterprise to supply conversation users, but the conversation robot only can serve one user at a time, and the personnel in the service hall usually have a lot, and the existing conversation robot can lock service objects by face acquisition and wake-up passwords, so that if a plurality of objects want to be in conversation service at the same time, the conversation users of the face acquisition and wake-up passwords acquired by the conversation robot can have a plurality of conversation users, which can lead to inaccurate service object locking situations of the conversation robot, noise environments of all areas in the service hall are different, and the existing conversation service is provided without selecting proper conversation areas based on the noise environments of all areas, which can lead to low difficulty in distinguishing and identifying the current locked conversation users from environment noise, and reduce the efficiency and quality of conversation service;
in order to solve the above problems, the present invention proposes a solution.
Disclosure of Invention
The invention aims to provide a conversation method and a conversation system of a conversation robot, to solve the problems in the prior art that, when a plurality of users seek dialogue service at the same time, the conversation robot cannot accurately lock a service object, and that no suitable dialogue area is selected for the service object based on the noise environment of each area, which makes it difficult to distinguish and identify the currently locked dialogue user against environmental noise and reduces the efficiency and quality of the dialogue service.
The aim of the invention can be achieved by the following technical scheme:
a conversation method of a conversation robot, comprising:
step one: collecting face image data, clothing feature data and wake-up audio data of a plurality of dialogue users who speak wake-up passwords to the dialogue robot in a current service hall to generate wake-up information data at the current moment;
step two: comparing, among the dialogue users contained in the wake-up information data at the current moment, the area occupied by each dialogue user's face in that user's face image data and the interval distance between the face and the high-definition camera and voice receiving equipment carried by the dialogue robot; determining the locking object of the current dialogue robot and generating locking object data of the current dialogue robot;
step three: equally dividing the waiting area set aside for dialogue users in the service hall into a plurality of areas; for each area, analyzing the noise level in that area and in its several adjacent areas to obtain the noise screening quantity of each area; screening the areas in combination with the number of people present in each area at the current moment, and determining the optimal dialogue area at the current moment;
step four: the locking object of the current conversation robot is guided to the optimal conversation area at the current moment to conduct conversation.
Further, the wake-up password is entered into the conversation robot in advance by a maintainer of the conversation robot, text reply content data with which the service hall provides dialogue service to dialogue users are prestored in the conversation robot, one piece of text reply content data corresponds to one dialogue category, and the dialogue categories are divided into a technical support category, an emotion category and an account problem category based on the service content provided by the service hall.
A conversation system of a conversation robot, comprising:
the tracking locking module is used for carrying out tracking locking on a dialogue user seeking dialogue service for the dialogue robot in the service hall, and comprises an information acquisition unit and a tracking locking unit;
the information acquisition unit comprises the conversation robot, and it collects the face image data, clothing feature data and wake-up audio data of the several dialogue users who speak the wake-up password to the conversation robot in the current service hall and generates the wake-up information data at the current moment from them;
the tracking and locking unit obtains all dialogue users contained in the wake-up information data at the current moment, determines the locking object of the current dialogue robot based on the area occupied by the face in each dialogue user's face image data and the interval distance between the face and the high-definition camera and voice receiving equipment carried by the dialogue robot, obtains the locking object data of the current dialogue robot, and generates a service guide instruction;
the analysis guiding module is used for guiding the current dialogue user to the optimal position for dialogue service and comprises an environment analysis unit and a service guiding unit, wherein after receiving a service guide instruction the environment analysis unit equally divides the waiting area set aside for dialogue users in the service hall into a plurality of areas, analyzes, for each area, the noise level in that area and in its adjacent areas to obtain the noise screening characteristic quantity of each area, screens the areas in combination with the number of people present in each area at the current moment, and obtains the optimal dialogue area at the current moment;
the service guiding unit guides the dialogue user to the optimal dialogue area at the current moment.
The invention has the beneficial effects that:
(1) According to the invention, the information acquisition unit is arranged to collect and generate the wake-up information data at the current moment, and the tracking and locking unit locks the dialogue user based on the collected face image data and wake-up audio data, which avoids the situation where capturing several dialogue users at the same moment makes the conversation robot's locking of the service object inaccurate or impossible and degrades the user's dialogue-service experience;
(2) According to the invention, the environment analysis unit periodically analyzes all areas in the service hall and, in combination with the number of people currently in each area, selects the optimal dialogue area at the current moment, and the service guiding unit guides the currently locked dialogue user to that area for dialogue; the auxiliary processing unit identifies and separates a piece of dialogue audio data of the current dialogue user from the ambient sound data collected around the conversation robot according to the voiceprint feature data of the currently locked dialogue user and converts it into a piece of dialogue text data; the dialogue service unit classifies the dialogue text data using a text classification technique, matches the corresponding text reply content data, and presents it to the current dialogue user simultaneously by voice broadcast and text display, which reduces the difficulty of recognizing the service object's voice against the environment while speeding up the dialogue service and ensuring its quality.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a system block diagram of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1 and 2, a dialogue method and a dialogue system of a dialogue robot include a tracking and locking module, an analysis and guiding module and a dialogue service module;
the tracking and locking module is used for tracking and locking dialogue users seeking dialogue services from the dialogue robot in the service hall, and comprises an information acquisition unit and a tracking and locking unit, wherein a waiting area is arranged in the service hall in the embodiment;
the information acquisition unit comprises the conversation robot, and it collects the face image data, clothing feature data and wake-up audio data of the several dialogue users who speak the wake-up password to the conversation robot in the current service hall and generates the wake-up information data at the current moment from them; a dialogue user is a user seeking dialogue service; a dialogue user's clothing feature data comprises the collar shape, color and texture of that user's clothing; a dialogue user's wake-up audio data is the audio data of that user speaking the wake-up password; the wake-up password is entered into the conversation robot in advance by a maintainer of the conversation robot, and in this embodiment the wake-up password of the conversation robot is its name spoken twice in succession; the conversation robot is equipped with a high-definition camera, voice receiving equipment and a noise sensor;
here, the several dialogue users who speak the wake-up password to the robot are all the dialogue users within the field of view of the conversation robot's camera;
the information acquisition unit transmits the wake-up information data at the current moment to the tracking and locking unit, and after receiving it the tracking and locking unit determines the locking object of the current dialogue robot according to a preset tracking and locking rule and generates the locking object data of the current dialogue robot, specifically as follows:
step S11: acquiring all dialogue users contained in the wake-up information data at the current moment, and marking them in sequence as A1, A2, …, Aa, where a ≥ 1;
step S12: acquiring the area B1 of conversation user A1's face from A1's face image data, and calculating the interval distance C1 between A1's face and the high-definition camera carried by the conversation robot at the moment the face image data was acquired;
step S13: based on conversation user A1's wake-up audio data, calculating the interval distance D1 between A1 and the voice receiving equipment carried by the conversation robot by analyzing the energy distribution of the audio signal across frequencies;
step S14: calculating the locking evaluation index E1 of conversation user A1 from B1, C1 and D1 using a preset formula, wherein the locking evaluation index is a manually defined index measuring whether conversation user A1 satisfies the conditions for tracking and locking, and α1 and α2 are preset weighting coefficients in the formula;
step S15: according to steps S11 to S14, calculating the locking evaluation indexes E1, E2, …, Ea of the dialogue users A1, A2, …, Aa, taking the maximum value Emax among them, and determining the dialogue user corresponding to Emax as the locking object of the current dialogue robot;
the tracking and locking unit acquires the face image data, clothing feature data and wake-up audio data of the dialogue user corresponding to Emax, generates the locking object data of the current dialogue robot from them and transmits it to the dialogue service module, and it also generates a service guide instruction and transmits it to the analysis guiding module;
it should be noted that, if the wake-up information data at the current moment only includes one dialogue user, the locking object data of the current dialogue robot is directly generated according to the face image data, the clothing feature data and the wake-up audio data of the dialogue user and is transmitted to the dialogue service module;
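For illustration, the sketch below walks through steps S11 to S15, including the single-user shortcut just noted. Because the formula for the locking evaluation index E appears only as an image in the published text, the scoring function used here is merely an assumed form that rewards a larger face area B and smaller distances C and D under weighting coefficients α1 and α2; the names, data layout and numbers are likewise illustrative.

```python
from dataclasses import dataclass

@dataclass
class CandidateUser:
    user_id: str
    face_area: float      # B: face area in the captured image (pixels^2)
    camera_dist: float    # C: estimated face-to-camera distance (m)
    mic_dist: float       # D: estimated user-to-microphone distance (m)

def locking_index(u: CandidateUser, alpha1: float = 0.6, alpha2: float = 0.4) -> float:
    """Assumed form of the locking evaluation index E: a larger face area
    and shorter distances to camera and microphone yield a higher score."""
    return alpha1 * u.face_area / max(u.camera_dist, 1e-6) \
         + alpha2 / max(u.mic_dist, 1e-6)

def pick_locking_object(users: list[CandidateUser]) -> CandidateUser:
    """Steps S11-S15: if only one user spoke the wake-up password, lock that
    user directly; otherwise lock the user with the maximum index Emax."""
    if len(users) == 1:
        return users[0]
    return max(users, key=locking_index)

if __name__ == "__main__":
    users = [
        CandidateUser("A1", face_area=9200.0, camera_dist=1.2, mic_dist=1.3),
        CandidateUser("A2", face_area=4100.0, camera_dist=2.5, mic_dist=2.8),
    ]
    print(pick_locking_object(users).user_id)  # A1: closest, largest face
```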
the dialogue service module is used for providing dialogue service for dialogue users and comprises an auxiliary processing unit and a dialogue service unit;
the dialogue service module receives the locking object data of the current dialogue robot transmitted by the tracking and locking unit and transmits it to the auxiliary processing unit, and the auxiliary processing unit extracts the voiceprint feature data of the current dialogue user from the wake-up audio data carried in the locking object data of the current dialogue robot and stores it;
the voiceprint feature data of a dialogue user comprises spectral features, tone features, formant features and duration features, wherein the spectral features comprise the frequency distribution and energy distribution data of the voice; the formant features arise because sound resonates as it propagates through the vocal tract, and the frequencies and amplitudes of the formants can be used to identify a voiceprint; the tone features are the pitch of the voice and its rise-and-fall variation; and the duration features are the pronunciation duration and pause duration data of the dialogue user in the wake-up audio data;
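A minimal sketch of this kind of feature extraction, using only numpy, is given below: band energy ratios stand in for the spectral features, an autocorrelation peak for the pitch, and an energy-threshold split for the pronunciation and pause durations. It illustrates the feature families named above rather than the patent's actual extractor; formant tracking is omitted for brevity, and all thresholds are invented.

```python
import numpy as np

def voiceprint_features(signal: np.ndarray, sr: int = 16000) -> dict:
    """Toy voiceprint vector: spectral energy distribution, pitch estimate,
    and speech/pause durations from a mono waveform."""
    # Spectral features: share of energy in low/mid/high bands.
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    total = spectrum.sum() + 1e-12
    bands = {f"energy_{lo}_{hi}Hz": spectrum[(freqs >= lo) & (freqs < hi)].sum() / total
             for lo, hi in [(0, 500), (500, 2000), (2000, 8000)]}

    # Tone feature: crude pitch via the autocorrelation peak in 50-400 Hz.
    seg = signal[:2048]
    ac = np.correlate(seg, seg, mode="full")[len(seg) - 1:]
    lo_lag, hi_lag = sr // 400, sr // 50
    pitch_hz = sr / (lo_lag + np.argmax(ac[lo_lag:hi_lag]))

    # Duration features: 10 ms frames above an energy threshold count as speech.
    frame = sr // 100
    n = len(signal) // frame
    energies = (signal[: n * frame].reshape(n, frame) ** 2).mean(axis=1)
    speech = energies > 0.1 * energies.max()
    return {**bands, "pitch_hz": float(pitch_hz),
            "speech_s": speech.sum() * frame / sr,
            "pause_s": (~speech).sum() * frame / sr}

if __name__ == "__main__":
    t = np.linspace(0, 1.0, 16000, endpoint=False)
    fake_voice = np.sin(2 * np.pi * 150 * t) * (t < 0.6)  # 150 Hz tone, then silence
    print(voiceprint_features(fake_voice))
```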
the analysis guiding module is used for guiding the current dialogue user to the optimal position for dialogue service, and comprises an environment analysis unit and a service guiding unit;
the analysis guiding module receives the service guide instruction transmitted by the tracking and locking unit and transmits it to the environment analysis unit, and after receiving it the environment analysis unit generates the optimal dialogue area at the current moment according to a preset analysis and judgment rule, with the following specific steps:
step S21: equally dividing the waiting area set aside for dialogue users in the service hall into a plurality of square areas of side length F1, calibrating them as analysis areas and marking them in sequence as G1, G2, …, Gg, where g ≥ 1 and F1 is a preset value;
step S22: first selecting analysis area G1 as the main analysis area, and recalibrating the analysis areas positionally adjacent to the main analysis area in sequence as associated analysis areas, marked E1, E2, …, Ee, where 1 ≤ e ≤ 4;
step S23: calculating the noise evaluation characteristic quantity N1 of the main analysis area according to a preset calculation rule, which is as follows:
step S231: obtaining the noise levels in the main analysis area over h monitoring periods, marked in sequence as H1, H2, …, Hh, where h ≥ 1; the h monitoring periods run backwards from the current monitoring period, where the current monitoring period is the interval of preset monitoring duration P1 ending at the current moment;
step S232: using the formula Y1 = sqrt((1/h)·Σ(Hi - H̄)²), with the sum over i = 1 to h, calculating the dispersion Y1 of the noise levels in the main analysis area over the h monitoring periods, and comparing Y1 with Y, where H̄ denotes the mean of the Hi and Y is a preset value;
if Y1 ≥ Y, the Hi are deleted one at a time in descending order of |Hi - H̄|; after each deletion, the dispersion Y1 of the remaining Hi is recalculated and compared with Y again until Y1 < Y, and the mean of the remaining Hi at that point is recalibrated as the noise mean of the main analysis area and marked J1;
step S233: among the deleted Hi, obtaining all noise levels greater than the noise mean J1 and marking them in sequence as K1, K2, …, Kk, where h > k ≥ 1; calculating their mean with a summation-and-averaging formula and recalibrating it as the high-noise mean of the main analysis area, marked L1; and calculating the high-noise difference M1 of the main analysis area using the formula M1 = Kmax - L1, where Kmax is the maximum among K1, K2, …, Kk;
step S234: calculating the noise evaluation characteristic quantity N1 of the main analysis area from J1, L1 and M1 using a preset formula;
step S24: according to step S23, calculating the noise evaluation characteristic quantities O1, O2, …, Oe of the associated analysis areas E1, E2, …, Ee respectively;
step S25: calculating the association-area-based screening evaluation value R1 of the main analysis area from N1, O1, O2, …, Oe and Q1 using a preset formula, wherein Q1 is the number of people in the main analysis area at the current moment and β1 and β2 are preset adjustment values in the formula;
step S26: according to step S22, selecting the analysis areas G1, G2, …, Gg in turn as the main analysis area, and calculating in turn the association-area-based screening evaluation values R1, R2, …, Rg of the analysis areas G1, G2, …, Gg;
the position, in the waiting area of the service hall, of the analysis area corresponding to the minimum association-area-based screening evaluation value Rmin is recalibrated as the optimal dialogue area at the current moment;
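The sketch below runs steps S21 to S26 above for a small set of areas. The outlier-trimming loop of step S232 follows the text directly; the formulas for N1 (step S234) and R1 (step S25) appear only as images in the published text, so the combinations used here, N as a weighted blend of J, L and M, and R as β1 times the average of N over the area and its associated areas plus β2 times the head count, are assumed forms, as are the neighbor layout and all numbers.

```python
import numpy as np

def trimmed_noise_stats(h_values: list[float], y_threshold: float) -> tuple[float, float, float]:
    """Steps S232/S233: trim outliers until the dispersion falls below Y,
    then return (noise mean J, high-noise mean L, high-noise difference M)."""
    kept = list(h_values)
    removed: list[float] = []
    while len(kept) > 1 and np.std(kept) >= y_threshold:
        worst = max(kept, key=lambda v: abs(v - np.mean(kept)))
        kept.remove(worst)
        removed.append(worst)
    j = float(np.mean(kept))
    highs = [v for v in removed if v > j]  # deleted samples above the mean
    l = float(np.mean(highs)) if highs else j
    m = (max(highs) - l) if highs else 0.0
    return j, l, m

def noise_quantity(j: float, l: float, m: float) -> float:
    # Assumed form of N: steady noise dominates, spikes add a penalty.
    return 0.6 * j + 0.3 * l + 0.1 * m

def best_area(histories: dict[str, list[float]], neighbors: dict[str, list[str]],
              head_counts: dict[str, int], y: float, b1: float = 1.0, b2: float = 2.0) -> str:
    """Steps S21-S26: score every area and pick the minimum R."""
    n = {a: noise_quantity(*trimmed_noise_stats(h, y)) for a, h in histories.items()}
    r = {a: b1 * np.mean([n[a]] + [n[e] for e in neighbors[a]]) + b2 * head_counts[a]
         for a in histories}
    return min(r, key=r.get)

if __name__ == "__main__":
    histories = {"G1": [42, 44, 41, 80, 43], "G2": [55, 56, 54, 57, 55]}
    neighbors = {"G1": ["G2"], "G2": ["G1"]}
    print(best_area(histories, neighbors, {"G1": 1, "G2": 6}, y=3.0))  # G1
```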
the environment analysis unit transmits the optimal dialogue area at the current moment to the service guiding unit, and after receiving it the service guiding unit guides the dialogue user to that area;
when the dialogue user reaches the optimal dialogue area at the current moment, the service guiding unit generates a start-dialogue instruction and transmits it to the dialogue service module, which forwards it to the auxiliary processing unit;
after receiving the start-dialogue instruction transmitted by the dialogue service module, the auxiliary processing unit collects the environmental sound data around the conversation robot in real time and, according to the voiceprint feature data of the current dialogue user stored in it, identifies and separates a piece of dialogue audio data of the current dialogue user from the collected environmental sound data and converts it into a piece of dialogue text data, which it transmits to the dialogue service unit; in this embodiment, a piece of dialogue audio data runs from the moment the current dialogue user begins speaking after the start-dialogue instruction until the user has made no sound for t consecutive seconds, where t is a preset duration;
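A minimal sketch of the endpointing rule just described: the recording is cut once the frame energy has stayed below a threshold for t consecutive seconds. The threshold, frame size and array-based interface are assumptions, and speaker verification against the stored voiceprint is left out.

```python
import numpy as np

def cut_utterance(stream: np.ndarray, sr: int = 16000, t: float = 1.0,
                  energy_thresh: float = 1e-3) -> np.ndarray:
    """Return the audio from the start of the stream up to the point where
    t consecutive seconds contain no sound above the energy threshold."""
    frame = sr // 100                    # 10 ms analysis frames
    silent_needed = int(t * 100)         # frames of silence that end the turn
    silent_run = 0
    n_frames = len(stream) // frame
    for i in range(n_frames):
        chunk = stream[i * frame:(i + 1) * frame]
        silent_run = silent_run + 1 if (chunk ** 2).mean() < energy_thresh else 0
        if silent_run >= silent_needed:
            return stream[: (i + 1 - silent_run) * frame]  # drop trailing silence
    return stream  # stream ended before t seconds of silence

if __name__ == "__main__":
    sr = 16000
    tt = np.linspace(0, 3.0, 3 * sr, endpoint=False)
    speech = np.sin(2 * np.pi * 200 * tt) * (tt < 1.2)  # speak 1.2 s, then silence
    print(len(cut_utterance(speech, sr, t=1.0)) / sr)   # ~1.2
```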
the dialogue service unit comprises a dialogue text database, which stores all the text reply content data with which the service hall provides dialogue service to dialogue users; in this embodiment, one piece of text reply content data corresponds to one dialogue category, and the dialogue categories are divided into a technical support category, an emotion category and an account problem category based on the service content provided by the service hall;
on receiving a piece of dialogue text data transmitted by the auxiliary processing unit, the dialogue service unit extracts the keywords and parts of speech in the dialogue text data, determines the dialogue category of the piece of dialogue text data therefrom using a text classification technique, matches text reply content data consistent with the keywords of the piece of dialogue text data in the dialogue text database under that dialogue category, and presents it to the current dialogue user simultaneously by voice broadcast and text display;
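The sketch below illustrates this classify-then-match flow with a bag-of-keywords classifier over the three categories named in this embodiment. The keyword lists, the scoring rule and the tiny reply database are invented for illustration; the patent does not name a specific text classification technique.

```python
CATEGORY_KEYWORDS = {
    "technical_support": {"error", "install", "network", "password", "reset"},
    "emotion": {"angry", "unhappy", "complain", "thank", "happy"},
    "account_problem": {"account", "balance", "bill", "payment", "refund"},
}

REPLY_DB = {  # one reply per (category, keyword), as the embodiment describes
    ("account_problem", "balance"): "Your balance can be checked at counter 3.",
    ("account_problem", "bill"): "Bills are issued on the 1st of each month.",
    ("technical_support", "reset"): "Please use the self-service kiosk to reset.",
}

def classify(text: str) -> str:
    """Pick the category whose keyword set overlaps the utterance most."""
    words = set(text.lower().split())
    return max(CATEGORY_KEYWORDS, key=lambda c: len(CATEGORY_KEYWORDS[c] & words))

def match_reply(text: str) -> str:
    """Classify first, then match a reply under that category by keyword."""
    category = classify(text)
    for word in text.lower().split():
        reply = REPLY_DB.get((category, word))
        if reply:
            return reply
    return "Sorry, please rephrase your question."

if __name__ == "__main__":
    print(match_reply("I want to check my account balance"))
```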
in the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing is merely illustrative and explanatory of the invention; those skilled in the art may make various modifications or additions to the described embodiments, or substitute them in a similar manner, without departing from the scope of the invention as defined in the claims.
The foregoing describes one embodiment of the present invention in detail, but the description is only a preferred embodiment of the present invention and should not be construed as limiting the scope of the invention. All equivalent changes and modifications within the scope of the present invention are intended to be covered by the present invention.

Claims (8)

1. A conversation method of a conversation robot, the conversation robot being equipped with a high-definition camera, voice receiving equipment and a noise sensor, the dialogue method being characterized by comprising the following steps:
step one: the dialogue robot continuously monitors the environmental information in the service hall through the voice receiving equipment and listens for the wake-up password; when the wake-up password is captured, collecting the face image data of users acquired by the high-definition camera at the current moment and the wake-up audio data acquired by the voice receiving equipment, and generating the wake-up information data at the current moment;
step two: comparing, among the dialogue users contained in the wake-up information data at the current moment, the area occupied by each dialogue user's face in that user's face image data and the interval distance between the face and the high-definition camera and voice receiving equipment carried by the dialogue robot; determining the locking object of the current dialogue robot and generating the locking object data of the current dialogue robot, specifically as follows:
step S11: acquiring all dialogue users contained in the wake-up information data at the current moment, and marking them in sequence as A1, A2, …, Aa, where a ≥ 1;
step S12: acquiring the area B1 of conversation user A1's face from A1's face image data, and calculating the interval distance C1 between A1's face and the high-definition camera carried by the conversation robot at the moment the face image data was acquired;
step S13: based on conversation user A1's wake-up audio data, calculating the interval distance D1 between A1 and the voice receiving equipment carried by the conversation robot by analyzing the energy distribution of the audio signal across frequencies;
step S14: calculating the locking evaluation index E1 of conversation user A1 from B1, C1 and D1 using a preset formula, wherein the locking evaluation index E1 measures whether conversation user A1 satisfies the conditions for tracking and locking, and α1 and α2 are preset weighting coefficients in the formula;
step S15: according to steps S11 to S14, calculating the locking evaluation indexes E1, E2, …, Ea of the dialogue users A1, A2, …, Aa, taking the maximum value Emax among them, and determining the dialogue user corresponding to Emax as the locking object of the current dialogue robot;
acquiring face image data, clothing feature data and wake-up audio data of a dialogue user corresponding to Emax, and generating locking object data of the current dialogue robot according to the face image data, the clothing feature data and the wake-up audio data;
step three: equally dividing the waiting area set aside for dialogue users in the service hall into a plurality of areas; for each area, periodically analyzing the noise level in that area and in its several adjacent areas to obtain the noise screening quantity of each area; screening the areas in combination with the number of people present in each area at the current moment, and determining the optimal dialogue area at the current moment;
step four: the locking object of the current conversation robot is guided to the optimal conversation area at the current moment to conduct conversation.
2. The conversation method of a conversation robot according to claim 1, wherein the wake-up password is entered into the conversation robot in advance by a maintainer of the conversation robot, text reply content data with which the service hall provides dialogue service to dialogue users are prestored in the conversation robot, one piece of text reply content data corresponds to one dialogue category, and the dialogue categories are divided into a technical support category, an emotion category and an account problem category based on the service content provided by the service hall.
3. The conversation method of a conversation robot of claim 1 wherein the wake-up information data at the present time further includes clothing feature data of a conversation user.
4. The conversation method of a conversation robot according to claim 1, wherein step three determines the optimal dialogue area at the current moment as follows:
step S21: equally dividing the waiting area set aside for dialogue users in the service hall into a plurality of square areas of side length F1, calibrating them as analysis areas and marking them in sequence as G1, G2, …, Gg, where g ≥ 1 and F1 is a preset value;
step S22: first selecting analysis area G1 as the main analysis area, and recalibrating the analysis areas positionally adjacent to the main analysis area in sequence as associated analysis areas, marked E1, E2, …, Ee, where 1 ≤ e ≤ 4;
step S23: calculating the noise evaluation characteristic quantity N1 of the main analysis area according to a preset calculation rule, which is as follows:
step S231: obtaining the noise levels in the main analysis area over h monitoring periods, marked in sequence as H1, H2, …, Hh, where h ≥ 1; the h monitoring periods run backwards from the current monitoring period, where the current monitoring period is the interval of preset monitoring duration P1 ending at the current moment;
step S232: using the formula Y1 = sqrt((1/h)·Σ(Hi - H̄)²), with the sum over i = 1 to h, calculating the dispersion Y1 of the noise levels in the main analysis area over the h monitoring periods, and comparing Y1 with Y, where H̄ denotes the mean of the Hi and Y is a preset value;
if Y1 ≥ Y, the Hi are deleted one at a time in descending order of |Hi - H̄|; after each deletion, the dispersion Y1 of the remaining Hi is recalculated and compared with Y again until Y1 < Y, and the mean of the remaining Hi at that point is recalibrated as the noise mean of the main analysis area and marked J1;
step S233: among the deleted Hi, obtaining all noise levels greater than the noise mean J1 and marking them in sequence as K1, K2, …, Kk, where h > k ≥ 1; calculating their mean with a summation-and-averaging formula and recalibrating it as the high-noise mean of the main analysis area, marked L1; and calculating the high-noise difference M1 of the main analysis area using the formula M1 = Kmax - L1, where Kmax is the maximum among K1, K2, …, Kk;
step S234: calculating the noise evaluation characteristic quantity N1 of the main analysis area from J1, L1 and M1 using a preset formula;
step S24: according to step S23, calculating the noise evaluation characteristic quantities O1, O2, …, Oe of the associated analysis areas E1, E2, …, Ee respectively;
step S25: calculating the association-area-based screening evaluation value R1 of the main analysis area from N1, O1, O2, …, Oe and Q1 using a preset formula, wherein Q1 is the number of people in the main analysis area at the current moment and β1 and β2 are preset adjustment values in the formula;
step S26: according to step S22, selecting the analysis areas G1, G2, …, Gg in turn as the main analysis area, and calculating in turn the association-area-based screening evaluation values R1, R2, …, Rg of the analysis areas G1, G2, …, Gg;
and recalibrating the position, in the waiting area of the service hall, of the analysis area corresponding to the minimum association-area-based screening evaluation value Rmin as the optimal dialogue area at the current moment.
5. The conversation method of a conversation robot according to claim 1, wherein, after the locking object of the current conversation robot has been guided to the optimal dialogue area at the current moment, the conversation robot collects the environmental sound data around it in real time and, according to the locking object data of the current conversation robot, identifies and separates a piece of dialogue audio data of the current dialogue user from the collected environmental sound data and converts it into a piece of dialogue text data;
extracting the keywords and parts of speech in the piece of dialogue text data, determining the dialogue category of the piece of dialogue text data therefrom using a text classification technique, matching consistent text reply content data in a dialogue text database according to the dialogue category and keywords of the piece of dialogue text data, and presenting the matched text reply content data to the locking object of the current conversation robot simultaneously by voice broadcast and text display.
6. A conversation system of a conversation robot for implementing the conversation method of claim 1, wherein the conversation system comprises:
the tracking locking module is used for carrying out tracking locking on a dialogue user seeking dialogue service for the dialogue robot in the service hall, and comprises an information acquisition unit and a tracking locking unit;
the information acquisition unit comprises the conversation robot, wherein the conversation robot continuously monitors the environmental information in the service hall through the voice receiving equipment and listens for the wake-up password; when the wake-up password is captured, it collects the face image data of users acquired by the high-definition camera at the current moment and the wake-up audio data acquired by the voice receiving equipment, and generates the wake-up information data at the current moment;
the tracking and locking unit compares, among the dialogue users contained in the wake-up information data at the current moment, the area occupied by each dialogue user's face in that user's face image data and the interval distance between the face and the high-definition camera and voice receiving equipment carried by the dialogue robot, determines the locking object of the current dialogue robot, generates the locking object data of the current dialogue robot, and generates a service guide instruction;
the analysis guiding module is used for guiding the current dialogue user to the optimal position for dialogue service and comprises an environment analysis unit and a service guiding unit, wherein after receiving the service guide instruction the environment analysis unit equally divides the waiting area set aside for dialogue users in the service hall into a plurality of areas, periodically analyzes, for each area, the noise level in that area and in its several adjacent areas to obtain the noise screening quantity of each area, screens the areas in combination with the number of people present in each area at the current moment, and determines the optimal dialogue area at the current moment;
the service guiding unit guides the locking object of the current dialogue robot to go to the optimal dialogue area at the current moment to conduct dialogue;
the steps of determining the locking object of the current dialogue robot and generating the locking object data of the current dialogue robot are specifically as follows:
step S11: acquiring all dialogue users contained in the wake-up information data at the current moment, and marking them in sequence as A1, A2, …, Aa, where a ≥ 1;
step S12: acquiring the area B1 of conversation user A1's face from A1's face image data, and calculating the interval distance C1 between A1's face and the high-definition camera carried by the conversation robot at the moment the face image data was acquired;
step S13: based on conversation user A1's wake-up audio data, calculating the interval distance D1 between A1 and the voice receiving equipment carried by the conversation robot by analyzing the energy distribution of the audio signal across frequencies;
step S14: calculating the locking evaluation index E1 of conversation user A1 from B1, C1 and D1 using a preset formula, wherein the locking evaluation index E1 measures whether conversation user A1 satisfies the conditions for tracking and locking, and α1 and α2 are preset weighting coefficients in the formula;
step S15: according to steps S11 to S14, calculating the locking evaluation indexes E1, E2, …, Ea of the dialogue users A1, A2, …, Aa, taking the maximum value Emax among them, and determining the dialogue user corresponding to Emax as the locking object of the current dialogue robot;
and acquiring face image data, clothing feature data and wake-up audio data of the dialogue user corresponding to Emax, and generating locking object data of the current dialogue robot according to the face image data, the clothing feature data and the wake-up audio data.
7. The conversation system according to claim 6, wherein the service guiding unit generates a start-dialogue instruction when the dialogue user arrives at the optimal dialogue area at the current moment; after receiving the start-dialogue instruction, the auxiliary processing unit collects the environmental sound data around the conversation robot in real time and, according to the stored voiceprint feature data of the current dialogue user, identifies and separates a piece of dialogue audio data of the current dialogue user from the collected environmental sound data and converts it into a piece of dialogue text data, which it transmits to the dialogue service unit.
8. The conversation system according to claim 7, wherein, on receiving a piece of dialogue text data, the dialogue service unit extracts its keywords and parts of speech, determines the dialogue category of the piece of dialogue text data therefrom using a text classification technique, matches consistent text reply content data in the dialogue text database according to the dialogue category and keywords, and presents it to the current dialogue user simultaneously by voice broadcast and text display.
CN202410016504.6A 2024-01-05 2024-01-05 Dialogue method and dialogue system of dialogue robot Active CN117519488B (en)

Priority Applications (1)

Application Number: CN202410016504.6A
Priority Date / Filing Date: 2024-01-05
Title: Dialogue method and dialogue system of dialogue robot

Publications (2)

Publication Number Publication Date
CN117519488A (en) 2024-02-06
CN117519488B (en) 2024-03-29

Family

ID=89755322

Family Applications (1)

Application Number: CN202410016504.6A (Active; granted as CN117519488B)
Priority Date / Filing Date: 2024-01-05
Title: Dialogue method and dialogue system of dialogue robot

Country Status (1)

Country Link
CN (1) CN117519488B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030073879A (en) * 2002-03-13 2003-09-19 주식회사 엘지이아이 Realtime face detection and moving tracing method
JP2013143578A (en) * 2012-01-06 2013-07-22 Canon Inc Image processing device and control method therefor
CN104077568A (en) * 2014-06-23 2014-10-01 北京理工大学珠海学院 High-accuracy driver behavior recognition and monitoring method and system
CN106886216A (en) * 2017-01-16 2017-06-23 深圳前海勇艺达机器人有限公司 Robot automatic tracking method and system based on RGBD Face datections
CN111386531A (en) * 2017-11-24 2020-07-07 株式会社捷尼赛思莱博 Multi-mode emotion recognition apparatus and method using artificial intelligence, and storage medium
KR20210039583A (en) * 2019-10-02 2021-04-12 에스케이텔레콤 주식회사 Method and Apparatus for Distinguishing User based on Multimodal
CN113626568A (en) * 2021-07-30 2021-11-09 平安普惠企业管理有限公司 Man-machine conversation control method and device for robot, computer equipment and medium
CN115167656A (en) * 2021-03-17 2022-10-11 迪姆实验室有限责任公司 Interactive service method and device based on artificial intelligence virtual image
WO2022252866A1 (en) * 2021-05-31 2022-12-08 腾讯科技(深圳)有限公司 Interaction processing method and apparatus, terminal and medium
CN117149944A (en) * 2023-08-07 2023-12-01 北京理工大学珠海学院 Multi-mode situation emotion recognition method and system based on wide time range


Also Published As

Publication number Publication date
CN117519488A (en) 2024-02-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant