CN111177329A - User interaction method of intelligent terminal, intelligent terminal and storage medium - Google Patents

User interaction method of intelligent terminal, intelligent terminal and storage medium

Info

Publication number
CN111177329A
Authority
CN
China
Prior art keywords
user
sound
interaction
information
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811348973.9A
Other languages
Chinese (zh)
Inventor
吴炽强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qiku Internet Technology Shenzhen Co Ltd
Original Assignee
Qiku Internet Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qiku Internet Technology Shenzhen Co Ltd filed Critical Qiku Internet Technology Shenzhen Co Ltd
Priority to CN201811348973.9A priority Critical patent/CN111177329A/en
Publication of CN111177329A publication Critical patent/CN111177329A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S11/00 - Systems for determining distance or velocity not using reflection or reradiation
    • G01S11/14 - Systems for determining distance or velocity not using reflection or reradiation using ultrasonic, sonic, or infrasonic waves
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/22 - Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a user interaction method of an intelligent terminal, the intelligent terminal and a storage medium. The user interaction method analyzes a received sound and determines the sound source position corresponding to the sound; performs identity recognition on the user at the sound source position to obtain a recognition result; determines the identity information of the user according to the recognition result and acquires first historical interaction data corresponding to the identity information; and intelligently interacts with the user using the first historical interaction data. In this way, human-computer interaction is carried out on the basis of the identified user's historical interaction data and can better fit the user's hobbies, travel plans, living habits, physical condition, mood and the like, so that the interaction between the intelligent terminal and the user is more personalized, better matches the user's personal characteristics, is more intelligent, and gives a better user experience.

Description

User interaction method of intelligent terminal, intelligent terminal and storage medium
Technical Field
The invention relates to the technical field of man-machine interaction of intelligent terminals, in particular to a user interaction method of an intelligent terminal, the intelligent terminal and a storage medium.
Background
With the development of intelligent terminals, intelligent devices capable of human-computer interaction with users are becoming increasingly common. In the prior art, intelligent devices such as smart speakers, mobile phones and intelligent robots can obtain a user instruction and interact with the user accordingly, for example playing music or video in response to a control instruction, or answering a user's question in a chat-style interaction.
However, in the prior art the intelligent terminal cannot distinguish between different users: it responds to interactive instructions according to a fixed program, cannot tailor its operations to a particular user, and when several users are present it cannot intelligently perform the personalized operation appropriate to each of them, so the personalized needs of multiple different users are not met.
Disclosure of Invention
The invention aims to provide a user interaction method of an intelligent terminal, the intelligent terminal and a storage medium.
In order to achieve the above object, the present invention provides a user interaction method for an intelligent terminal, where the user interaction method includes:
analyzing the received sound, and determining a sound source position corresponding to the sound;
carrying out identity recognition on the user at the sound source position to obtain a recognition result;
determining the identity information of the user according to the identification result, and acquiring first historical interaction data corresponding to the identity information;
and intelligently interacting with the user by utilizing the first historical interaction data.
On the other hand, the invention provides an intelligent terminal, which comprises a sound acquisition device, a man-machine interaction circuit, a memory and a processor, wherein the sound acquisition device, the man-machine interaction circuit, the memory and the processor are connected with each other;
the sound acquisition device is used for acquiring the sound of a user;
the memory is used for storing computer instructions executed by the processor;
the human-computer interaction circuit is used for performing human-computer interaction with a user according to the instruction of the processor;
the processor is used for executing the computer instruction to generate a corresponding human-computer interaction control instruction and sending the control instruction to the human-computer interaction circuit, so that the human-computer interaction circuit realizes the user interaction method according to the control instruction.
In another aspect, the present invention further provides a storage medium storing computer program data, which can be executed to implement the above-mentioned user interaction method.
Beneficial effects: different from the prior art, the intelligent terminal establishes a historical interaction database for each user according to the identity information obtained at registration, and stores the information exchanged with that user in the historical interaction database corresponding to the user's identity information. When the intelligent terminal receives a user's voice, it identifies the user at the sound source position, determines the user's identity information, acquires the historical interaction data corresponding to that user, and then interacts intelligently on the basis of those data. Because the interaction is based on the user's historical interaction data, which carry personalized information such as the user's hobbies, travel plans, living habits, physical condition and mood, the intelligent interaction between the terminal and the user fits these characteristics more closely: the interaction is more personalized, better matches the user's personal traits, is more intelligent, and gives a better user experience.
Drawings
FIG. 1 is a flowchart illustrating a first embodiment of a user interaction method of an intelligent terminal according to the present invention;
FIG. 2 is a schematic flow chart diagram illustrating one embodiment of step S11 in FIG. 1;
FIG. 3 is a schematic illustration of the sound source location calculation of FIG. 2;
FIG. 4 is a schematic flow chart diagram illustrating one embodiment of step S12 in FIG. 1;
FIG. 5 is a schematic flow chart diagram illustrating another embodiment of step S12 in FIG. 1;
FIG. 6 is a schematic flow chart diagram illustrating one embodiment of step S14 in FIG. 1;
FIG. 7 is a schematic flow chart diagram illustrating another embodiment of step S14 in FIG. 1;
FIG. 8 is a flowchart illustrating a second embodiment of a user interaction method of the smart terminal according to the present invention;
FIG. 9 is a flowchart illustrating a third exemplary embodiment of a user interaction method of the smart terminal according to the present invention;
FIG. 10 is a schematic structural diagram of an embodiment of an intelligent terminal according to the present invention;
FIG. 11 is a schematic structural diagram of an embodiment of a storage medium according to the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a user interaction method of an intelligent terminal according to a first embodiment of the present invention. As shown in fig. 1, the user interaction method of the present embodiment may at least include the following steps:
in step S11, the received sound is analyzed to determine the sound source position corresponding to the sound.
Taking a smart speaker as an example, a microphone is arranged on the smart speaker. When there is sound around the smart speaker, it receives the sound through the microphone, analyzes the received sound, and can then determine the sound source position of the sound. In this embodiment, the sound received by the smart speaker is not limited to a voice command from the user; it may also be the voice of a user talking near the smart speaker, or a sound produced by some action, for example the sound of the user opening a door or putting down keys.
In one embodiment, the received sound is analyzed to obtain corresponding intensity information, and then the corresponding sound source position can be calculated according to the intensity information.
In addition, it should be understood that the present invention does not limit the type of intelligent terminal: the intelligent terminal may be any intelligent device with a human-computer interaction function, such as a mobile phone, a computer terminal or a robot capable of interacting with a user; the smart speaker in this embodiment is only an example.
In step S12, the user at the sound source position is identified, and a recognition result is obtained.
Further, the smart speaker may identify the user at the sound source position determined above. In this embodiment, identifying the user at the sound source position includes determining whether that user is a registered user or an unregistered user. If the user is a registered user, the user's identity information is further determined, and the subsequent steps S13 and S14 are performed according to that identity information. If the user is an unregistered user, the user is prompted to register, and the interaction data are recorded once the user has registered; if the unregistered user refuses to register, the flow ends without responding to the user, or only a simple reply is given to the unregistered user's question.
The above determination of whether the user is a registered user may be made by extracting sound features from the received sound, or by photographing the user at the sound source position and recognizing the captured image.
In another embodiment, identifying the user at the sound source position further includes recognizing the sound source that emitted the sound at that position and determining whether it is a user. If the sound source is a user, it is then determined whether the user at the sound source position is a registered user or an unregistered user. If the sound source is not a user (for example a television, a pet, another electronic device or some other object capable of emitting sound), the flow stops and the subsequent steps are not performed. In this way, objects other than users that can emit or generate sound can be excluded. Here too, the sound source at the sound source position may be recognized from the sound features of the received sound, or by capturing an image of the sound source and recognizing the captured image.
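The branching just described can be summarized in a few lines. The following is a minimal sketch of the decision flow only, not anything specified by the patent; the state names and the helper's behaviour are assumptions made for illustration.

```python
def handle_recognition(result, identity=None, wants_to_register=False):
    """Decide the next step after recognizing the source at the sound position.

    `result` is one of "registered", "unregistered" or "not_user", mirroring
    the branches described above; the return value names the next action.
    All names here are illustrative assumptions.
    """
    if result == "not_user":              # e.g. a television, pet or appliance
        return "stop"                     # do not continue with S13/S14
    if result == "registered":
        return f"continue_S13_for_{identity}"
    # Unregistered user: prompt for registration first.
    if wants_to_register:
        return "register_then_interact"
    return "basic_reply_or_end"           # answer simply, or end without responding

print(handle_recognition("registered", identity="user_001"))
print(handle_recognition("not_user"))
```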
In step S13, identity information of the user is determined according to the recognition result, and first historical interaction data corresponding to the identity information is acquired.
In this embodiment, after a user registers through the smart speaker and becomes a registered user, the smart speaker establishes a corresponding historical usage database according to the identity information given at registration. Each time the user interacts with the smart speaker, the smart speaker records the content of the interaction and stores it in the historical usage database corresponding to the user's identity information. If the recognition result of step S12 is that the user is a registered user, the user's identity information can be further determined, and the first historical interaction data corresponding to that user can be acquired from the corresponding historical usage database according to the identity information. It can be appreciated that, because the historical usage database records the user's interactions, the first historical interaction data can characterize the user's preferences, travel plans, living habits, physical condition, mood and the like.
In this embodiment, the identity information of the user may include fixed characteristic information such as the user's name, birthday, gender and family role, and the first historical interaction data may include any type of interaction data between the user and the smart speaker in which the user's preferences, travel plans, living habits, physical condition, mood and the like are represented.
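To make the notion of a per-user historical usage database concrete, the following sketch shows one possible way to store registered identities and their interaction records. It is an illustrative assumption: the field names, the JSON file format and the `InteractionHistory` class are not part of the patent.

```python
import json
import time
from collections import defaultdict
from pathlib import Path

class InteractionHistory:
    """Per-user history store keyed by the identity obtained at registration.

    Each interaction is appended to the registered user's own record, so the
    accumulated entries can later be mined for preferences, travel plans,
    habits, health hints and mood.
    """

    def __init__(self, path="history.json"):
        self.path = Path(path)
        self.users = {}                    # identity_id -> profile fields
        self.records = defaultdict(list)   # identity_id -> interaction list
        if self.path.exists():
            data = json.loads(self.path.read_text(encoding="utf-8"))
            self.users = data.get("users", {})
            self.records.update(data.get("records", {}))

    def register(self, identity_id, name, birthday=None, gender=None, role=None):
        self.users[identity_id] = {"name": name, "birthday": birthday,
                                   "gender": gender, "role": role}
        self._save()

    def record(self, identity_id, utterance, response):
        if identity_id not in self.users:
            raise KeyError("unregistered user")
        self.records[identity_id].append({"time": time.time(),
                                          "utterance": utterance,
                                          "response": response})
        self._save()

    def first_historical_data(self, identity_id, limit=50):
        """Return the most recent interactions for the identified user."""
        return self.records.get(identity_id, [])[-limit:]

    def _save(self):
        payload = {"users": self.users, "records": dict(self.records)}
        self.path.write_text(json.dumps(payload, ensure_ascii=False),
                             encoding="utf-8")
```

A registered user would then be enrolled with `register(...)`, every exchange written back with `record(...)`, and `first_historical_data(...)` would correspond to the first historical interaction data fetched in step S13.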
In step S14, intelligent interaction is performed with the user using the first historical interaction data.
Since the first historical interaction data can characterize the user's hobbies, travel plans, living habits, physical condition, mood and the like, when the smart speaker interacts with the user according to the acquired first historical interaction data, it can conduct an intelligent interaction that better suits the user's individuality, based on the characteristics represented in those data.
For example, if the first historical interaction data record that the user often plays songs by singer A, this characterizes the user as liking singer A's songs; when the user sends a voice instruction to play a song (without naming a specific song), the smart speaker may automatically play a song by singer A according to those records. Likewise, if the first historical interaction data record that the user often plays comedy films starring a certain actor, then when the user sends a voice instruction to play a film, the smart speaker can automatically turn on the television or projection device and play a comedy film starring that actor, again according to the records in the first historical interaction data.
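A rough illustration of how such a preference could be derived from the first historical interaction data is given below. The record fields, the `pick_default_song` helper and the catalogue structure are assumptions made only for this example.

```python
from collections import Counter

def pick_default_song(history, catalogue):
    """Choose a track for a bare "play a song" command from past plays.

    `history` is a list of interaction records, some of which carry an
    "artist" field; `catalogue` maps artist name -> available tracks.
    """
    plays = Counter(r["artist"] for r in history if r.get("artist"))
    if not plays:
        return None                        # no preference learnt yet
    favourite, _ = plays.most_common(1)[0]
    tracks = catalogue.get(favourite)
    return (favourite, tracks[0]) if tracks else None

history = [{"artist": "Singer A"}, {"artist": "Singer A"}, {"artist": "Singer B"}]
catalogue = {"Singer A": ["Song 1", "Song 2"], "Singer B": ["Song 3"]}
print(pick_default_song(history, catalogue))   # ('Singer A', 'Song 1')
```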
In addition, the smart speaker can also perform emotion analysis on the received sound to obtain the user's current emotion, and then take that emotion into account when interacting intelligently with the user.
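How the emotion analysis is performed is not specified here; purely as an illustration, the toy heuristic below maps loud, fast speech to an "agitated" label and quiet speech to "subdued" using two simple acoustic measures. The thresholds and labels are invented for this sketch and are not taken from the patent.

```python
import numpy as np

def rough_emotion(samples, sample_rate=16000):
    """Very rough mood guess from a mono audio frame (toy heuristic only)."""
    samples = np.asarray(samples, dtype=float)
    rms = np.sqrt(np.mean(samples ** 2))                    # loudness proxy
    crossing_fraction = np.mean(np.abs(np.diff(np.sign(samples)))) / 2
    rate = crossing_fraction * sample_rate                  # zero crossings per second
    if rms > 0.3 and rate > 200:
        return "agitated"
    if rms < 0.05:
        return "subdued"
    return "neutral"

# Synthetic one-second frames just to exercise the function.
t = np.linspace(0, 1, 16000, endpoint=False)
print(rough_emotion(0.5 * np.sin(2 * np.pi * 300 * t)))    # loud, fast -> agitated
print(rough_emotion(0.02 * np.sin(2 * np.pi * 120 * t)))   # quiet -> subdued
```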
In this embodiment, the user needs to register an identity before interacting with the smart speaker. The smart speaker establishes a historical interaction database for the user according to the identity information obtained at registration, and stores the information exchanged with the user in the database corresponding to the user's identity information. When the smart speaker receives the user's sound, it identifies the user at the sound source position, determines the user's identity information, acquires the corresponding historical interaction data, and then interacts intelligently on that basis. Because the interaction is based on the user's historical interaction data, which carry personalized information such as hobbies, travel plans, living habits, physical condition and mood, the intelligent interaction between the smart speaker and the user fits these characteristics more closely: the interaction is more personalized, better matches the user's personal traits, is more intelligent, and gives a better user experience.
Further, referring to fig. 2, as shown in fig. 2, in one embodiment, the step S11 may include the following steps:
in step S111, sounds are received through a plurality of microphones provided at different locations of the smart terminal.
The intelligent terminal of this embodiment has a plurality of microphones arranged at different positions. When there is sound around the smart speaker, all of the microphones at these different positions can receive the sound.
In step S112, the intensity of the sound received by the plurality of microphones is analyzed, and the intensity information of the sound received by each of the microphones is obtained.
For sound emitted by the same source, the microphones sit at different positions and therefore at different distances and angles from the source, so each microphone receives the sound at a different intensity. The sound received by each microphone can thus be analyzed to obtain the intensity information of the sound at that microphone.
In step S113, a sound source position corresponding to the sound is calculated from the intensity information of the sound received by each microphone.
Since the intensity information of the sound received by each microphone is different, a relatively accurate sound source position can be calculated by combining the intensity information of the sound received by each microphone.
To illustrate the above embodiment, take a smart speaker with 3 microphones at different positions as an example. As shown in fig. 3, three microphones (microphone A, microphone B and microphone C) are mounted on the housing of the smart speaker and the sound source is located at S. All three microphones receive the sound from source S. Intensity analysis of the sound received by the three microphones yields the intensities Sa, Sb and Sc at microphones A, B and C respectively, and the distances from microphones A, B and C to source S are La, Lb and Lc. These satisfy the following equations:
Sa=k×S/(La×La);
Sb=k×S/(Lb×Lb);
Sc=k×S/(Lc×Lc);
where k is a constant and S denotes the sound source. From the above equations the ratios between La, Lb and Lc can be calculated, and from these ratios the position of the sound source S can be obtained.
It can be understood that the above set of equations may be adjusted according to the number of microphones. For example, if only microphones A and B are provided, the set contains only Sa=k×S/(La×La) and Sb=k×S/(Lb×Lb); if a microphone D is also present, the set is extended with the equation corresponding to microphone D.
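Purely as an illustration of the inverse-square relation above, the sketch below estimates a 2-D source position from the per-microphone intensities by searching for the point whose distances best satisfy Si×Li² = constant. The microphone coordinates, grid resolution and synthetic readings are assumptions, not values from the patent.

```python
import numpy as np

# Hypothetical microphone coordinates on the speaker housing (metres); these
# are assumptions for illustration, not values given in the patent.
MICS = np.array([[0.00, 0.00],
                 [0.10, 0.00],
                 [0.05, 0.08]])

def locate_source(intensities, search_radius=3.0, step=0.05):
    """Estimate a 2-D source position from per-microphone intensities.

    The model above is Si = k * S / Li**2, so the product Si * Li**2 equals
    the same constant k * S for every microphone at the true source position.
    Here we grid-search for the point at which those products agree best.
    """
    intensities = np.asarray(intensities, dtype=float)
    grid = np.arange(-search_radius, search_radius, step)
    best, best_err = None, np.inf
    for x in grid:
        for y in grid:
            d = np.linalg.norm(MICS - [x, y], axis=1)
            if np.any(d < 1e-6):
                continue
            ks = intensities * d ** 2          # implied k*S per microphone
            err = np.std(ks) / np.mean(ks)     # zero when they all agree
            if err < best_err:
                best, best_err = (float(x), float(y)), err
    return best

if __name__ == "__main__":
    true_pos = np.array([1.2, 0.7])
    d = np.linalg.norm(MICS - true_pos, axis=1)
    readings = 1.0 / d ** 2                    # k*S taken as 1 for the synthetic test
    print(locate_source(readings))             # approximately (1.2, 0.7)
```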
Further, as is clear from the above description of step S12, the user at the sound source position may be identified either from images or from sound features. Referring to fig. 4, in one embodiment step S12 may include the following steps:
in step S12a1, a user image at the sound source position is acquired.
After the smart speaker has determined the sound source position, the orientation of the shooting device arranged on the smart speaker is adjusted so that it faces the sound source position, the user image is captured by the shooting device, and the intelligent terminal acquires the captured user image.
In step S12a2, corresponding two-dimensional image information or two-dimensional feature information is extracted from the user image, and it is determined whether matching two-dimensional image information or two-dimensional feature information can be found among the pre-stored user two-dimensional image information or user two-dimensional feature information.
In this embodiment, the two-dimensional image information or two-dimensional feature information contained in the captured user image is first acquired and then matched against the pre-stored two-dimensional image information or two-dimensional feature information of registered users. It can be understood that at this point the smart speaker does not yet know whether the user is registered, that is, whether the information extracted from the captured image corresponds to any registered user; the operation of the smart speaker is therefore to determine whether matching two-dimensional image information or two-dimensional feature information can be found among the stored information of registered users. If such a match is found, the information obtained from the user image is considered to correspond to a registered user's stored two-dimensional information, and step S12a3 is executed to further verify whether the user is a registered user; otherwise, step S12a5 is executed.
The two-dimensional image information may be the two-dimensional image information of the whole image, or the two-dimensional image information of a face region extracted after performing face recognition on the user image. The two-dimensional feature information may be feature information extracted from a recognized feature region of the user image; for example, face recognition is performed on the user image, the facial features are recognized, and the two-dimensional feature information of those recognized facial features is obtained.
In step S12a3, corresponding image depth information or depth feature information is extracted from the user image, the user depth information or user depth feature information corresponding to the matched two-dimensional image information or two-dimensional feature information is acquired, and it is determined whether the image depth information or depth feature information matches that user depth information or user depth feature information.
When the determination result of step S12a2 is that matching two-dimensional image information or two-dimensional feature information can be found among the pre-stored user information, this embodiment further extracts the image depth information or depth feature information of the region corresponding to that two-dimensional information. Then, according to the user two-dimensional image information or user two-dimensional feature information that matched the information in the user image, the corresponding user depth information or user depth feature information is acquired. Finally, it is determined whether the image depth information or depth feature information matches the user depth information or user depth feature information: if so, step S12a4 is executed and the user is determined to be a registered user; otherwise, step S12a5 is executed and the user is determined to be an unregistered user.
In this embodiment, after the user is matched using two-dimensional information, the match is further verified using depth (three-dimensional) information, which eliminates the influence on the recognition result of objects such as a photograph of the user appearing in the captured image.
The image depth information may be depth information corresponding to a region of the two-dimensional image information, and the depth feature information may be depth information corresponding to a region of the two-dimensional feature information.
In this embodiment, the shooting device may be a depth camera disposed on the smart speaker, and thus the image shot by the shooting device may include depth information to perform the matching operation.
In step S12a4, the user is determined to be a registered user.
When the determination results of both step S12a2 and step S12a3 are positive, the user at the sound source position can be determined to be a registered user. Step S13 can then be executed to continue the human-computer interaction with the user.
In step S12a5, the user is determined to be an unregistered user.
When the determination result of either step S12a2 or step S12a3 is negative, the user at the sound source position is determined to be an unregistered user. A registration prompt may then be issued to the user, and after registration is completed the smart speaker can interact with the user according to the user's instructions. If the user refuses to register, the user's interaction request can be declined, or only basic interaction is provided, for example answering the user's questions.
Further, steps S12a2 and S12a3 can also be used to determine whether the sound source at the sound source position is a user at all. For example, if the sound source is an object such as a television, a pet or a mobile phone, steps S12a2 and S12a3 will determine that it is not a user, and the subsequent steps are not performed.
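The two-stage check of Fig. 4 (match 2-D features first, then verify depth so that a flat photograph is rejected) could be compressed into something like the following sketch. The feature vectors, the cosine-similarity measure and the thresholds are illustrative assumptions, not the patent's own method.

```python
import numpy as np

# Enrolled templates: identity -> (2-D feature vector, depth feature vector).
# The vectors stand in for whatever face descriptors the device extracts.
ENROLLED = {
    "user_001": (np.array([0.9, 0.1, 0.4]), np.array([0.32, 0.55])),
}

def _cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def identify(image_2d_feat, image_depth_feat, feat_thresh=0.95, depth_thresh=0.9):
    """Return (identity, "registered") or (None, "unregistered").

    The first test mirrors S12a2 (find matching 2-D information among the
    stored registered users); the second mirrors S12a3 (compare depth
    information, so that an essentially flat printed photo fails).
    """
    for identity, (feat2d, featd) in ENROLLED.items():
        if _cos(image_2d_feat, feat2d) >= feat_thresh:          # S12a2
            if _cos(image_depth_feat, featd) >= depth_thresh:   # S12a3
                return identity, "registered"                   # S12a4
            return None, "unregistered"                         # likely a photo
    return None, "unregistered"                                 # S12a5

probe_2d = np.array([0.88, 0.12, 0.41])
print(identify(probe_2d, np.array([0.33, 0.54])))   # ('user_001', 'registered')
print(identify(probe_2d, np.array([0.0, 0.0])))     # flat "photo": unregistered
```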
Further, referring to fig. 5, in another embodiment step S12 may include the following steps:
in step S12b1, the sound characteristics of the sound are extracted, and it is determined whether or not the sound characteristics matching the extracted sound characteristics can be found from the stored user sound characteristics.
In this embodiment, the smart speaker analyzes the acquired sound, extracts its sound features, and then determines whether the stored user sound features contain features matching them. The sound features may include timbre, pitch, frequency and the like. If a match is found, the process continues to step S12b2 and the user is determined to be a registered user; otherwise the process continues to step S12b3 and the user is determined to be an unregistered user.
In step S12b2, the user is determined to be a registered user.
When the determination result of step S12b1 is positive, the user at the sound source position is determined to be a registered user. Step S13 can then be executed to continue the human-computer interaction with the user.
In step S12b3, the user is determined to be an unregistered user.
When the determination result of step S12b1 is a negative result, it is determined that the user at the sound source position is an unregistered user.
Further, step S12b1 can also be used to determine whether the sound source at the sound source position is a user at all. For example, if the sound source is an object such as a television, a pet or a mobile phone, step S12b1 will determine that it is not a user, and the subsequent steps are not performed.
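For the sound-feature branch of Fig. 5, a minimal sketch might compare the extracted features against stored voiceprints and treat anything below a threshold as unregistered. Representing the timbre/pitch/frequency features as a fixed-length vector, and the threshold value, are assumptions made for this illustration.

```python
import numpy as np

# Stored voiceprints of registered users, represented here as fixed-length
# embedding vectors; this representation and the threshold are assumptions.
VOICEPRINTS = {
    "user_001": np.array([0.2, 0.7, 0.1, 0.5]),
    "user_002": np.array([0.8, 0.1, 0.6, 0.2]),
}

def match_voice(embedding, threshold=0.92):
    """Return the best-matching registered identity, or None (unregistered)."""
    best_id, best_score = None, -1.0
    for identity, ref in VOICEPRINTS.items():
        score = float(np.dot(embedding, ref) /
                      (np.linalg.norm(embedding) * np.linalg.norm(ref) + 1e-9))
        if score > best_score:
            best_id, best_score = identity, score
    return best_id if best_score >= threshold else None

print(match_voice(np.array([0.21, 0.69, 0.12, 0.49])))   # 'user_001'
print(match_voice(np.array([0.0, 0.0, 1.0, 0.0])))       # None: treat as unregistered
```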
In other embodiments, the implementation of step S12 shown in fig. 4 and the implementation shown in fig. 5 may be combined: the user is identified from the user image and also from the sound features extracted from the sound. Specifically, after the user has been identified through the steps of fig. 4, the sound features corresponding to the determined identity information are obtained, and the sound features extracted from the received sound are matched against them, so that the recognition result of the fig. 4 implementation is further confirmed by the fig. 5 implementation and the accuracy of user identification is improved.
Further, referring to fig. 6, as shown in fig. 6, in one embodiment, the step S14 may include the following steps:
in step S14a1, the first historical interaction data is analyzed to obtain user usage information of the user.
Since the first historical interaction data can characterize the user's hobbies, travel plans, living habits, physical condition, mood and the like, when the smart speaker interacts with the user according to the acquired first historical interaction data, it first analyzes those data to obtain the user's usage information, including hobbies, travel plans, living habits, physical condition and mood, and can then conduct an intelligent interaction with the user that better suits the user's individuality on the basis of that usage information.
In step S14a2, intelligent interaction is actively initiated to the user with the user usage information.
According to the user usage information obtained in step S14a1, such as hobbies, travel plans, living habits, physical condition and mood, intelligent interaction is actively initiated toward the user, improving the user experience.
For example, if the recently recorded first historical interaction data contain the information that the user asked about cold medicine in the morning, analyzing those data yields the user state "possibly has a cold". When the user is identified again and this state is confirmed from the first historical interaction data, the smart speaker can actively remind the user to take the medicine, or actively ask the user how they are feeling. Similarly, if the recently recorded first historical interaction data show that the user asked in the morning about ticket information for a flight on the user's itinerary, then when the user is identified again the smart speaker can actively present changes in the fare for that flight, ticket information for other nearby flights, and so on.
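One simple way such proactive prompts could be generated from recent history is sketched below: scan the newest records for trigger phrases and return the matching follow-up. The trigger table and time window are invented for this example and are not part of the patent.

```python
import time

# Trigger phrases and the proactive follow-up they suggest; illustrative only.
TRIGGERS = {
    "cold medicine": "You asked about cold medicine earlier. Remember to take it; how are you feeling?",
    "flight": "About the flight you checked this morning: would you like an update on fares?",
}

def proactive_prompt(recent_history, max_age_hours=12):
    """Scan the user's recent interactions (sorted oldest to newest) for a follow-up."""
    cutoff = time.time() - max_age_hours * 3600
    for record in reversed(recent_history):        # newest first
        if record["time"] < cutoff:
            break
        text = record["utterance"].lower()
        for phrase, prompt in TRIGGERS.items():
            if phrase in text:
                return prompt
    return None

history = [{"time": time.time() - 3 * 3600,
            "utterance": "Where can I buy cold medicine?"}]
print(proactive_prompt(history))
```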
Further, referring to fig. 7, as shown in fig. 7, in another embodiment, the step S14 may further include the following steps:
in step S14b1, it is determined that the voice contains a voice command carrying an associated identity keyword.
In practice, one smart speaker generally corresponds to multiple registered users. The smart speaker of this embodiment can, according to the settings or instructions of a registered user, link the registered users to one another through associated identity keywords, thereby establishing association relationships among the multiple registered users. When the sound received by the smart speaker contains an associated identity keyword together with a corresponding voice instruction, both the keyword and the instruction can be identified.
In step S14b2, the corresponding other users are searched according to the associated identity keywords, and the second historical interaction data corresponding to the other users are extracted.
Further, the smart speaker can find the corresponding registered other user according to the determined associated identity keyword, and obtain that user's historical interaction data according to the identity information of the other user found. As explained for historical interaction data in the above embodiment, the second historical interaction data corresponding to the other user can represent that user's usage habits, preferences and current state.
In step S14b3, intelligent interaction is performed with the user using the first historical interaction data and the second historical interaction data.
Intelligent interaction with the user can then be carried out according to both the first historical interaction data of the identified user and the retrieved second historical interaction data of the other registered user. In this way, the other user's interaction information is used in the interaction, so that the user can learn about the other user's state, hobbies, habits and so on, the user experience is improved, and the interaction content is richer.
The above embodiment is explained with a concrete application example. Suppose a family consists of family member A, family member B and family member C, where A is the mother, B is the father and C is the son, and all three are registered users of the smart speaker. According to A's settings or instructions, an association between A and B can be established with the associated identity keyword "wife"/"husband", and associations between A, B and C can be established with the keyword "son". From these associations the smart speaker can determine that A and B are wife and husband and that C is their son. When the smart speaker receives a sound, determines that the user at the sound source position is family member A, and on analyzing the sound finds that it contains a voice instruction along the lines of "Has my son finished his homework today?", the associated identity keyword is determined to be "son". The smart speaker can then determine from the established family associations that "son" corresponds to family member C, extract family member C's interaction data for the day, answer family member A's question according to those data, and so interact intelligently with family member A.
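The family example might be realized along the lines of the following sketch, which resolves an associated identity keyword in the command to another registered user and fetches that user's (second) historical interaction data. The association table, identifiers and record format are assumptions for illustration only.

```python
# Association table built from each registered user's settings: family member
# A (the mother) links to B with "husband" and to C with "son", and so on.
# All identifiers and keywords here are illustrative assumptions.
ASSOCIATIONS = {
    "member_A": {"husband": "member_B", "son": "member_C"},
    "member_B": {"wife": "member_A", "son": "member_C"},
}

def resolve_associated_history(speaker_id, utterance, histories):
    """Return (first_data, second_data, associated_id) for a voice command.

    `histories` maps identity -> that user's recorded interactions.  If the
    command contains an associated identity keyword configured for the
    speaker, the associated user's (second) historical data are returned
    alongside the speaker's own (first) historical data.
    """
    first = histories.get(speaker_id, [])
    for keyword, other_id in ASSOCIATIONS.get(speaker_id, {}).items():
        if keyword in utterance.lower():
            return first, histories.get(other_id, []), other_id
    return first, None, None

histories = {
    "member_A": [{"utterance": "play some music"}],
    "member_C": [{"utterance": "I finished my homework at 5 pm"}],
}
print(resolve_associated_history(
    "member_A", "Has my son finished his homework today?", histories))
```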
It can be understood that the implementation of step S14 shown in fig. 6 and the implementation shown in fig. 7 correspond to the intelligent terminal's behaviour in different scenarios; in other application scenarios the two implementations may also be performed together, according to the needs of the scenario.
Further, referring to fig. 8, fig. 8 is a flowchart illustrating a user interaction method of an intelligent terminal according to a second embodiment of the present invention. As shown in fig. 8, the user interaction method of the present embodiment may at least include the following steps:
in step S21, a plurality of sounds are received, the plurality of received sounds are analyzed, and a plurality of sound source positions corresponding to the plurality of sounds are determined.
In step S22, the users at the sound source positions are identified to obtain a plurality of corresponding identification results.
In step S23, the identity information of each of the plurality of users corresponding to the plurality of sound source positions is determined based on the plurality of recognition results, and the first history interactive data of each of the plurality of users is acquired.
In this embodiment, the content of steps S21 to S23 is similar to steps S11 to S13 of the first embodiment shown in fig. 1, except that the smart speaker here receives multiple sounds corresponding to multiple sound sources, so the processing of steps S21 to S23 is performed on each of the sounds. For details, refer to steps S11 to S13 described with reference to fig. 1 to fig. 4, which are not repeated here.
In step S24, intelligent interaction is sequentially performed with the multiple users according to the sequence of obtaining the multiple sounds by using the respective first historical interaction data of the multiple users.
In this embodiment, the order in which the sounds were received can be determined from their reception times, and the smart speaker then interacts intelligently with the users at the corresponding sound source positions in that order.
It can be understood that the users at the sound source positions of the received sounds may not all be registered users; if some are unregistered, the smart speaker interacts intelligently only with the identified registered users, in the order in which their sounds were received.
Further, referring to fig. 9, fig. 9 is a flowchart illustrating a user interaction method of an intelligent terminal according to a third embodiment of the present invention. As shown in fig. 9, the user interaction method of the present embodiment may at least include the following steps:
in step S31, a plurality of sounds are received, the plurality of received sounds are analyzed, and a plurality of sound source positions corresponding to the plurality of sounds are determined.
In step S32, the users at the sound source positions are identified to obtain a plurality of corresponding identification results.
In step S33, the identity information of each of the plurality of users corresponding to the plurality of sound source positions is determined based on the plurality of recognition results, and the first history interactive data of each of the plurality of users is acquired.
In this embodiment, the content of steps S31 to S33 is similar to steps S11 to S13 of the first embodiment shown in fig. 1 to fig. 4, except that the smart speaker here receives multiple sounds corresponding to multiple sound sources, so the processing of steps S31 to S33 is performed on each of the sounds. For details, refer to steps S11 to S13 shown in fig. 1, which are not repeated here.
In step S34, the priority relationships between the users are obtained according to the identity information of each of the users.
In this embodiment, the users at the sound source positions are identified from the received sounds so as to obtain their identity information. Priority relationships can be set according to user identity information, so the priority relationship among the multiple users can be determined from the identity information of the identified users.
In step S35, intelligent interaction is sequentially performed with the plurality of users in accordance with the priority relationship, using the first historical interaction data of each of the plurality of users.
In this embodiment, intelligent interaction is performed with the multiple users in turn, in the order given by their identified priority relationships.
It can be understood that the users at the sound source positions of the received sounds may not all be registered users; if some are unregistered, this embodiment interacts intelligently only with the identified registered users, in turn according to the priority relationship. In addition, some of the users may have the same priority; in that case, combining the second embodiment of the user interaction method shown in fig. 8, users with the same priority are interacted with in turn according to the order in which their sounds were received.
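Combining the two multi-user embodiments, the ordering could be expressed as a simple sort key of (priority, arrival time), as in the sketch below. The role-based priority table is an assumption; the patent only states that priorities are configured from the users' identity information.

```python
ROLE_PRIORITY = {"mum": 0, "dad": 0, "child": 1}    # illustrative assumption

def interaction_order(pending, role_priority):
    """Order pending interactions by configured priority, then by the order
    in which the sounds were received (the tie-break of the second embodiment)."""
    return sorted(pending, key=lambda p: (role_priority.get(p["role"], 99),
                                          p["arrival_time"]))

pending = [
    {"identity": "member_C", "role": "child", "arrival_time": 2.0, "utterance": "play cartoons"},
    {"identity": "member_A", "role": "mum", "arrival_time": 3.0, "utterance": "what's the weather"},
    {"identity": "member_B", "role": "dad", "arrival_time": 1.0, "utterance": "play the news"},
]
for p in interaction_order(pending, ROLE_PRIORITY):
    print(p["identity"], p["utterance"])
# member_B and member_A share the top priority, so the earlier sound (B) goes
# first; member_C is served last.
```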
Further, referring to fig. 10, fig. 10 is a schematic structural diagram of an embodiment of the intelligent terminal of the present invention. As shown in fig. 10, the intelligent terminal 100 of this embodiment includes a sound acquisition device 104, a human-computer interaction circuit 103, a memory 102 and a processor 101, which are connected to one another. Taking a smart speaker as an example, the smart speaker 100 may be shaped as a cylinder, cuboid, cube or other form according to the actual situation; this embodiment is not limited in this respect and uses a cuboid as an example. The sound acquisition device 104 may be one or more microphones arranged inside or outside the smart speaker 100 and is used to acquire the user's sound. The human-computer interaction circuit 103 may include a human-computer interaction chip and related circuitry inside the smart speaker, together with a human-computer interaction interface on the housing; it performs human-computer interaction with the user according to the instructions of the processor 101 and displays the related interaction content or anthropomorphic expressions on the interface during the interaction. The memory 102 is arranged inside the smart speaker 100 and stores the computer instructions executed by the processor 101. The processor 101 executes the computer instructions stored in the memory 102 to generate corresponding human-computer interaction control instructions and sends them to the human-computer interaction circuit 103, so that the human-computer interaction circuit 103 implements, according to the control instructions, any of the first to third embodiments of the user interaction method shown in figs. 1 to 9; for details refer to those embodiments, which are not repeated here.
Further, the intelligent terminal of this embodiment also includes a camera 105 connected to the processor 101 and the memory 102. The camera 105 may be a depth camera, so that depth information of the captured user image can be obtained. The camera 105 is used to photograph the user at the sound source position to obtain the corresponding user image, with which any of the first to third embodiments of the user interaction method shown in figs. 1 to 9 can be implemented; this is not repeated here.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a storage medium according to an embodiment of the present application. As shown in fig. 11, the storage medium 200 in this embodiment stores executable computer program data 201, and the computer program data 201 is executed to implement any of the first to third embodiments of the user interaction methods of the intelligent terminal shown in fig. 1 to 9.
In this embodiment, the storage medium 200 may be a storage medium with a storage function, such as a storage module of an intelligent terminal, a mobile storage device (e.g., a mobile hard disk, a usb disk, etc.), a network cloud disk, an application storage platform, or a server. Further, the storage medium may also be the memory 102 shown in fig. 10 described above.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A user interaction method of an intelligent terminal is characterized by comprising the following steps:
analyzing the received sound, and determining a sound source position corresponding to the sound;
carrying out identity recognition on the user at the sound source position to obtain a recognition result;
determining the identity information of the user according to the identification result, and acquiring first historical interaction data corresponding to the identity information;
and intelligently interacting with the user by utilizing the first historical interaction data.
2. The method of claim 1, wherein analyzing the received sound to determine a sound source location corresponding to the sound comprises:
receiving sound through a plurality of microphones arranged at different positions of the intelligent terminal;
analyzing the intensity of the sound received by the microphones to obtain the intensity information of the sound received by each microphone;
and calculating the sound source position corresponding to the sound according to the intensity information of the sound received by each microphone.
3. The method according to claim 1, wherein the identifying the user at the sound source position to obtain an identification result comprises:
acquiring a user image at the sound source position;
extracting corresponding two-dimensional image information or two-dimensional characteristic information from the user image, and judging whether matching two-dimensional image information or two-dimensional characteristic information can be found among the pre-stored user two-dimensional image information or pre-stored user two-dimensional characteristic information;
if yes, extracting corresponding image depth information or depth characteristic information from the user image, acquiring the user depth information or user depth characteristic information corresponding to the matched two-dimensional image information or two-dimensional characteristic information, and judging whether the image depth information or depth characteristic information matches the user depth information or user depth characteristic information; if so, determining the user as a registered user;
otherwise, determining the user as an unregistered user.
4. The method according to claim 1, wherein the identifying the user at the sound source position to obtain an identification result comprises:
extracting the sound features of the sound, and judging whether sound features matching the extracted sound features can be found among the stored user sound features;
if so, determining the user as a registered user;
otherwise, determining the user as an unregistered user.
5. The user interaction method of claim 1, wherein the intelligently interacting with the user using the first historical interaction data comprises:
analyzing the first historical interaction data to obtain user use information of the user;
and actively initiating intelligent interaction to the user by utilizing the user use information.
6. The user interaction method of claim 1, wherein the intelligently interacting with the user using the first historical interaction data comprises:
determining that the sound contains a voice instruction carrying an associated identity keyword;
searching other corresponding users according to the associated identity keywords, and extracting second historical interaction data corresponding to the other users;
and intelligently interacting with the user by utilizing the first historical interaction data and the second historical interaction data.
7. The user interaction method of claim 1, further comprising:
the intelligent terminal receives a plurality of sounds, analyzes the received sounds and determines a plurality of sound source positions corresponding to the sounds respectively;
respectively carrying out identity recognition on the users at the positions of the sound sources to obtain a plurality of corresponding recognition results;
determining respective identity information of a plurality of users corresponding to the plurality of sound source positions respectively according to the plurality of identification results, and acquiring respective first historical interaction data of the plurality of users;
and carrying out intelligent interaction with the plurality of users in sequence according to the sequence of the acquired plurality of sounds by utilizing the respective first historical interaction data of the plurality of users.
8. The user interaction method of claim 7,
after the determining, according to the multiple recognition results, the respective identity information of the multiple users corresponding to the multiple sound source positions, respectively, and acquiring the respective first historical interaction data of the multiple users, the method further includes:
acquiring the priority relation among the users according to the respective identity information of the users;
and carrying out intelligent interaction with the plurality of users in sequence according to the priority relation by utilizing the respective first historical interaction data of the plurality of users.
9. The intelligent terminal is characterized by comprising a sound acquisition device, a man-machine interaction circuit, a memory and a processor, wherein the sound acquisition device, the man-machine interaction circuit, the memory and the processor are connected with each other;
the sound acquisition device is used for acquiring the sound of a user;
the memory is used for storing computer instructions executed by the processor;
the human-computer interaction circuit is used for performing human-computer interaction with a user according to the instruction of the processor;
the processor is configured to execute the computer instruction to generate a corresponding human-computer interaction control instruction, and send the control instruction to the human-computer interaction circuit, so that the human-computer interaction circuit implements the user interaction method according to any one of claims 1 to 8 according to the control instruction.
10. A storage medium, characterized in that computer program data are stored, which computer program data can be executed to implement the user interaction method according to any of claims 1-8.
CN201811348973.9A 2018-11-13 2018-11-13 User interaction method of intelligent terminal, intelligent terminal and storage medium Pending CN111177329A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811348973.9A CN111177329A (en) 2018-11-13 2018-11-13 User interaction method of intelligent terminal, intelligent terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811348973.9A CN111177329A (en) 2018-11-13 2018-11-13 User interaction method of intelligent terminal, intelligent terminal and storage medium

Publications (1)

Publication Number Publication Date
CN111177329A true CN111177329A (en) 2020-05-19

Family

ID=70650016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811348973.9A Pending CN111177329A (en) 2018-11-13 2018-11-13 User interaction method of intelligent terminal, intelligent terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111177329A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105575391A (en) * 2014-10-10 2016-05-11 阿里巴巴集团控股有限公司 Voiceprint information management method, voiceprint information management device, identity authentication method, and identity authentication system
CN106570443A (en) * 2015-10-09 2017-04-19 芋头科技(杭州)有限公司 Rapid identification method and household intelligent robot
CN106682090A (en) * 2016-11-29 2017-05-17 上海智臻智能网络科技股份有限公司 Active interaction implementing device, active interaction implementing method and intelligent voice interaction equipment
CN107220532A (en) * 2017-04-08 2017-09-29 网易(杭州)网络有限公司 For the method and apparatus by voice recognition user identity
CN108052079A (en) * 2017-12-12 2018-05-18 北京小米移动软件有限公司 Apparatus control method, device, plant control unit and storage medium
CN108351707A (en) * 2017-12-22 2018-07-31 深圳前海达闼云端智能科技有限公司 Man-machine interaction method and device, terminal equipment and computer readable storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111429882A (en) * 2019-01-09 2020-07-17 北京地平线机器人技术研发有限公司 Method and device for playing voice and electronic equipment
CN111429882B (en) * 2019-01-09 2023-08-08 北京地平线机器人技术研发有限公司 Voice playing method and device and electronic equipment
CN111817929A (en) * 2020-06-01 2020-10-23 青岛海尔智能技术研发有限公司 Equipment interaction method and device, household equipment and storage medium
CN111817929B (en) * 2020-06-01 2024-05-14 青岛海尔智能技术研发有限公司 Equipment interaction method and device, household equipment and storage medium
CN111898537A (en) * 2020-07-30 2020-11-06 湖南视觉伟业智能科技有限公司 Microphone parameter setting system
CN112596405A (en) * 2020-12-17 2021-04-02 深圳市创维软件有限公司 Control method, device and equipment of household appliance and computer readable storage medium
CN112596405B (en) * 2020-12-17 2024-06-04 深圳市创维软件有限公司 Control method, device, equipment and computer readable storage medium for household appliances

Similar Documents

Publication Publication Date Title
CN112075075B (en) Method and computerized intelligent assistant for facilitating teleconferencing
US11762494B2 (en) Systems and methods for identifying users of devices and customizing devices to users
CN110288985B (en) Voice data processing method and device, electronic equipment and storage medium
KR20210144625A (en) Video data processing method, device and readable storage medium
WO2019233219A1 (en) Dialogue state determining method and device, dialogue system, computer device, and storage medium
CN109145204B (en) Portrait label generation and use method and system
CN112199623A (en) Script execution method and device, electronic equipment and storage medium
CN111177329A (en) User interaction method of intelligent terminal, intelligent terminal and storage medium
CN109241301A (en) Resource recommendation method and device
US20190341053A1 (en) Multi-modal speech attribution among n speakers
CN107945806B (en) User identification method and device based on sound characteristics
CN104850238A (en) Method and device for sorting candidate items generated by input method
CN105976821A (en) Animal language identification method and animal language identification device
CN106128440A (en) A kind of lyrics display processing method, device, terminal unit and system
JP2018171683A (en) Robot control program, robot device, and robot control method
KR20210036527A (en) Electronic device for processing user utterance and method for operating thereof
CN111506183A (en) Intelligent terminal and user interaction method
CN109660678A (en) Electric core network system realization, system and readable storage medium storing program for executing
CN113539261A (en) Man-machine voice interaction method and device, computer equipment and storage medium
CN116580707A (en) Method and device for generating action video based on voice
US20210166685A1 (en) Speech processing apparatus and speech processing method
CN113923517B (en) Background music generation method and device and electronic equipment
JP4649944B2 (en) Moving image processing apparatus, moving image processing method, and program
CN111176430B (en) Interaction method of intelligent terminal, intelligent terminal and storage medium
CN106534965A (en) Method and device for obtaining video information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination