CN110096251B - Interaction method and device


Info

Publication number
CN110096251B
CN110096251B
Authority
CN
China
Prior art keywords
user
sensing area
users
event
interactive
Prior art date
Legal status
Active
Application number
CN201810806493.6A
Other languages
Chinese (zh)
Other versions
CN110096251A (en)
Inventor
朱碧军
陈志远
俞静飞
Current Assignee
Nail Holding Cayman Co ltd
Original Assignee
Nail Holding Cayman Co ltd
Priority date
Filing date
Publication date
Application filed by Nail Holding Cayman Co ltd
Priority to JP2021525345A (published as JP2021533510A)
Priority to PCT/CN2019/090504 (published as WO2020015473A1)
Priority to SG11202100352YA
Priority to TW108120453A (published as TW202008115A)
Publication of CN110096251A
Application granted
Publication of CN110096251B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Abstract

One or more embodiments of the present disclosure provide an interaction method and apparatus. The method may include: detecting users in a sensing area; providing interactive content to the users in the sensing area; and, when the target interaction object of the interactive content is only some of the users in the sensing area, displaying information of the target interaction object to the users in the sensing area.

Description

Interaction method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of electronic technology, and in particular, to an interaction method and apparatus.
Background
With the continuous development of intelligent technology, electronic devices have become increasingly intelligent: they can interact with users to a certain extent and assist users in completing related events. For example, an electronic device may carry out such interaction by displaying related content on a screen, playing related content by voice, and so on.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide an interaction method and apparatus.
In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
According to a first aspect of one or more embodiments of the present specification, there is provided an interaction method, comprising:
detecting users in a sensing area;
providing interactive content to the users in the sensing area;
and when the target interaction object of the interactive content is only some of the users in the sensing area, displaying information of the target interaction object to the users in the sensing area.
According to a second aspect of one or more embodiments of the present specification, there is provided an interaction device comprising:
a detection unit configured to detect users in a sensing area;
a providing unit configured to provide interactive content to the users in the sensing area;
and a display unit configured to display information of the target interaction object to the users in the sensing area when the target interaction object of the interactive content is only some of the users in the sensing area.
Drawings
FIG. 1 is a schematic architecture diagram of an interactive system provided by an exemplary embodiment.
Fig. 2 is a flow chart of an interaction method provided by an exemplary embodiment.
Fig. 3 is a schematic diagram of an interaction scenario provided by an exemplary embodiment.
FIG. 4 is a schematic diagram of interaction with an internal employee provided by an exemplary embodiment.
FIG. 5 is a schematic diagram of guiding a user's position through interactive content provided by an exemplary embodiment.
FIG. 6 is a schematic diagram of an interactive device actively initiating interaction with a user provided by an exemplary embodiment.
FIG. 7 is a schematic diagram of another example of guiding a user's position through interactive content provided by an exemplary embodiment.
Fig. 8 is a schematic diagram of a normal interaction scenario provided by an exemplary embodiment.
FIG. 9 is a schematic diagram of adjusting interactive content according to an associated event provided by an exemplary embodiment.
Fig. 10 is a schematic diagram of an interactive device designating a speaker provided by an exemplary embodiment.
Fig. 11 is a schematic diagram of another example of an interactive device designating a speaker provided by an exemplary embodiment.
Fig. 12 is a schematic diagram of yet another example of an interactive device designating a speaker provided by an exemplary embodiment.
Fig. 13 is a schematic diagram of designating a speaking order for external persons provided by an exemplary embodiment.
FIG. 14 is a schematic diagram of annotating an interaction object provided by an exemplary embodiment.
FIG. 15 is a schematic diagram of annotating a target interaction object provided by an exemplary embodiment.
FIG. 16 is a schematic diagram of determining the source user of user speech provided by an exemplary embodiment.
Fig. 17 is a schematic diagram of determining the source direction of an audio message provided by an exemplary embodiment.
FIG. 18 is a schematic diagram of annotating the source user of user speech provided by an exemplary embodiment.
Fig. 19 is a schematic diagram of an apparatus provided by an exemplary embodiment.
Fig. 20 is a block diagram of an interaction device provided by an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
In an embodiment, the interaction scheme of the present specification may be applied to an interactive device. The interactive device may be an electronic device dedicated to implementing interaction functions, or a multifunctional electronic device having interaction functions, for example a PC, a tablet device, a notebook computer, or a wearable device (such as smart glasses), which is not limited by one or more embodiments of the present disclosure. During operation, the interactive device may run an interactive system to implement the interaction scheme. An application program of the interactive system may be pre-installed on the interactive device so that it can be started and run on the device; of course, when technologies such as HTML5 are used, the interactive system can be obtained and run without installing the application on the interactive device.
In one embodiment, FIG. 1 is a schematic architecture diagram of an interactive system according to an exemplary embodiment. As shown in fig. 1, the interactive system may comprise a server 11, a network 12, and an interactive device 13. In the running process, the server 11 may run a program on the server side of the interactive system to implement related functions such as processing; in the running process of the interaction device 13, a program on the client side of the interaction system can be run to realize functions of relevant information presentation, man-machine interaction and the like, so that the interaction system is realized by cooperation between the server 11 and the interaction device 13.
The server 11 may be a physical server including a separate host, or the server 11 may be a virtual server carried by a host cluster. The interaction device 13 may be an electronic device dedicated to implementing interaction functions; alternatively, the interaction device 13 may be a multifunctional electronic device having interaction functions, for example, the interaction device may include a PC, a tablet device, a notebook computer, a wearable device (such as smart glasses, etc.), which is not limited by one or more embodiments of the present disclosure. And the network 12 for interaction between the interaction device 13 and the server 11 may comprise various types of wired or wireless networks. In one embodiment, the network 12 may include a public switched telephone network (Public Switched Telephone Network, PSTN) and the internet. It should be noted that: an application of a client of the interactive system may be pre-installed on the interactive device such that the client may be started and run on the interactive device; of course, when an online "client" such as HTML5 technology is employed, the client can be obtained and run without installing a corresponding application on the interactive device.
In one embodiment, the interactive system described above may be implemented based on a mobile group office platform. Besides communication functions, the mobile group office platform may serve as an integrated functional platform with many other functions, such as the processing of group-internal events like approval events (e.g., leave requests, office supply claims, finance), attendance events, task events and log events, as well as group-external events such as meal ordering and purchasing, which is not limited by one or more embodiments of the present specification; the mobile group office platform may likewise implement the interactive system described above.
More specifically, the mobile group office platform may be carried in an instant messaging application of the related art, such as an enterprise instant messaging (Enterprise Instant Messaging, EIM) application, for example Skype For Business®, Microsoft Teams®, Yammer®, Workplace®, Slack®, Enterprise WeChat®, Fxiaoke®, Enterprise Fetion®, Enterprise Yixin®, etc. Of course, instant messaging is only one of the communication functions supported by the mobile group office platform, which may also implement other functions such as those mentioned above; these are not described in detail here. The "group" in this specification may refer to various organizations such as enterprises, schools, military units, hospitals and public institutions, to which the present application is not limited.
In an embodiment, the interactive system may be implemented based on any other type of application rather than a mobile group office platform or the like, for example a common instant messaging application, which is not limited in this specification.
Fig. 2 is a flow chart of an interaction method provided by an exemplary embodiment. As shown in fig. 2, the method may be applied to an interactive device, and may include the steps of:
step 202, a user in a sensing area is detected.
In an embodiment, the interaction device has a sensing distance, and a coverage area of the sensing distance forms a sensing area, such as a sector (or any other shape) area with a radius of 3 m; by detecting the sensing area, it can be determined whether there is a user within the sensing area.
In an embodiment, the interaction device may detect the user within the sensing area in any manner. For example, the interactive device may determine whether a user is present in the sensing area by implementing face detection.
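The embodiments above do not prescribe any particular detection technique. Purely as an illustration, a minimal sketch of face-detection-based presence checking (assuming OpenCV's stock Haar cascade and a camera at index 0, neither of which is mandated by the disclosure) might look as follows:

```python
import cv2

# Assumption: OpenCV's bundled Haar cascade; any other detection method
# (sound detection, infrared detection, etc.) would serve equally well.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def count_users(frame) -> int:
    """Count faces in one camera frame covering the sensing area."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces)

cap = cv2.VideoCapture(0)              # camera aimed at the sensing area
ok, frame = cap.read()
if ok and count_users(frame) > 0:
    print("user detected in the sensing area")
cap.release()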
Step 204: provide interactive content to the users in the sensing area.
In an embodiment, the interactive device may provide the interactive contents in any one manner or a plurality of combinations, which is not limited in this specification. For example, the interactive apparatus may comprise a display screen and display the interactive content on the display screen, thereby providing the interactive content to the user in the sensing area; for another example, the interactive apparatus may include a speaker, and voice broadcast the interactive contents through the speaker, thereby providing the interactive contents to the user in the sensing area; for another example, the interactive device may include a plurality of indicator lights, and provide the interactive content to the user in the sensing area by controlling the on/off states, colors, blinking patterns, etc. of the indicator lights.
In an embodiment, the sensing area may include a near field sensing area and a far field sensing area, where the near field sensing area is closer to the interactive device than the far field sensing area, i.e. where the "near field" and the "far field" are in a relative relationship, for example, a range of 0 to 1.5m may be defined as the near field sensing area, and a range of 1.5 to 3m may be defined as the far field sensing area.
In an embodiment, the interactive device may provide interactive content to a user within the near-field sensing area, and may send guidance information to a user within the far-field sensing area to guide that user from the far-field sensing area into the near-field sensing area, so that the user becomes a user within the near-field sensing area and is provided with interactive content. When a user is in the far-field sensing area, there is only a certain probability that the user wishes to interact with the interactive device, and the relatively long distance may prevent a good interaction effect; by sending guidance information, users who genuinely wish to interact can be screened out and brought into the near-field sensing area, where a better interaction effect can be obtained. The interactive device may send the guidance information in any one manner or a combination of several manners, which is not limited in this specification; for example, the interactive device may display the guidance information on the display screen, broadcast it by voice through a speaker, or light or flash an indicator lamp, thereby guiding the user into the near-field sensing area.
In an embodiment, when the interactive device can perform attendance operations, it may identify the users in the sensing area and, during the attendance time period, automatically perform the attendance operation for any identified user who has not yet checked in, regardless of whether that user is in the near-field or the far-field sensing area. When a user's dwell time in the far-field sensing area reaches a first preset duration, or the user's dwell time in the near-field sensing area reaches a second preset duration, the interactive device may determine that the user probably wishes to interact and therefore provide interactive content to that user. Since a user typically only approaches the interactive device when wishing to interact with it, the second preset duration may be suitably smaller than the first preset duration to reduce the user's waiting time.
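As a purely illustrative sketch of the dwell-time gating described above (the zone radii and the two preset durations below are assumed values, not values fixed by the embodiments):

```python
import time
from typing import Optional

NEAR_FIELD_RADIUS_M = 1.5    # assumed near-field boundary
FAR_FIELD_RADIUS_M = 3.0     # assumed sensing-area boundary
FIRST_PRESET_S = 3.0         # far-field dwell threshold
SECOND_PRESET_S = 1.0        # near-field dwell threshold (smaller, per the text)

def should_provide_content(distance_m: float, entered_zone_at: float,
                           now: Optional[float] = None) -> bool:
    """Decide whether to proactively provide interactive content to a user."""
    now = time.time() if now is None else now
    dwell = now - entered_zone_at
    if distance_m <= NEAR_FIELD_RADIUS_M:
        return dwell >= SECOND_PRESET_S
    if distance_m <= FAR_FIELD_RADIUS_M:
        return dwell >= FIRST_PRESET_S
    return False                         # outside the sensing area
```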
In one embodiment, the interactive device may actively provide interactive content to users within the induction area, similar to "call-in" behavior between users, such as the interactive content may include "what may help you" and so on, so that users within the induction area are guided through the interactive content to assist them in completing related events.
Further, the interaction device may determine whether the user in the sensing area meets a preset condition, so as to provide the interaction content only to the user meeting the preset condition, for example, the preset condition may include at least one of the following: the residence time in the far-field sensing area reaches a first preset time, the residence time in the near-field sensing area reaches a second preset time, the user looks towards the interactive device, the face of the user is opposite to the interactive device or the included angle between the two is smaller than a preset angle, and the like, which is not limited in the specification.
In an embodiment, the interactive device may obtain an associated event of a user in the sensing area, so that when the interactive content is related to the associated event, the interactive content provided to the user in the sensing area is adjusted according to the state information of the associated event; when there is no associated event related to the interactive content, default interactive content may be provided. For example, when the interactive content is related to attendance, the default interactive content may be "Are you sure you want to clock out early?"; whereas if the associated events of the user in the sensing area include an approved sick-leave event and the leave period involved in that event has been reached, the interactive content may be adjusted to "Are you sure you want to clock out?". For another example, when the interactive content is related to a visit by an external person, if the user in the sensing area is an external person and a corresponding visit reservation event is obtained as the associated event, the interactive content may be "Do you need help contacting the person you are visiting?"; if there is no corresponding visit reservation event, the interactive content may be "Please tell me whom you are visiting."
In an embodiment, the interaction device may determine an identity type of the user in the sensing area, and then adjust the interaction content provided to the user in the sensing area according to the identity type. For example, the identity type may include: the users in the sensing area belong to members in the group or personnel outside the group, departments to which the users in the sensing area belong, and the like, so that interactive contents which are consistent with the identity type are provided for the users in the sensing area.
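A minimal rule-based sketch of how the interactive content might be adjusted according to associated events and identity type; all intent names, event names and prompt strings below are illustrative assumptions rather than wording fixed by the embodiments:

```python
def select_prompt(intent: str, identity_type: str, events: set) -> str:
    """Pick interactive content based on intent, identity type and associated events."""
    if intent == "clock_out":
        if "approved_sick_leave_started" in events:
            return "Are you sure you want to clock out?"
        return "Are you sure you want to clock out early?"
    if intent == "visit" and identity_type == "external":
        if "visit_reservation" in events:
            return "Do you need help contacting the person you are visiting?"
        return "Please tell me whom you are visiting."
    return "What can I help you with?"   # default interactive content

# e.g. select_prompt("clock_out", "internal", {"approved_sick_leave_started"})
```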
Step 206: when the target interaction object of the interactive content is only some of the users in the sensing area, display information of the target interaction object to the users in the sensing area.
In an embodiment, when there are multiple users in the sensing area, the target interaction object may be only a part of the users (the number of the part of users may be one or multiple), and by displaying the information of the target interaction object to the users in the sensing area, it is ensured that each user in the sensing area can clearly know whether the user is the target interaction object of the interaction content.
In an embodiment, image information of each user in the sensing area may be collected separately, and a corresponding avatar picture may be generated for each user. When no interaction is in progress, the avatar pictures of all users may be shown simultaneously. When interacting with the target interaction object, only the avatar picture corresponding to the target interaction object may be shown while the avatar pictures of the other users are hidden, or the avatar picture of the target interaction object may be displayed in a manner distinguished from those of the other users: for example, the avatar picture of the target interaction object may be displayed in the central area while the others are displayed in the edge area; or the avatar picture of the target interaction object may be enlarged while the others are displayed at normal or reduced size; or the avatar picture of the target interaction object may be displayed normally (in color) while the others are displayed after grayscale processing. The present disclosure is not limited in this respect.
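As one hedged sketch of such avatar differentiation (the concrete scale factors, grayscale flag and regions are assumptions; enlarging, keeping color and centering is only one of the alternatives listed above):

```python
from dataclasses import dataclass

@dataclass
class AvatarStyle:
    scale: float       # 1.0 = normal size
    grayscale: bool    # True = de-emphasized
    region: str        # "center" or "edge"

def avatar_styles(user_ids, target_ids):
    """Return one display style per user; the target interaction object is emphasized."""
    styles = {}
    for uid in user_ids:
        if not target_ids:                    # no interaction in progress
            styles[uid] = AvatarStyle(1.0, False, "edge")
        elif uid in target_ids:               # target interaction object
            styles[uid] = AvatarStyle(1.4, False, "center")
        else:                                 # other users, de-emphasized
            styles[uid] = AvatarStyle(0.8, True, "edge")
    return styles
```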
In an embodiment, when the target interactive object of the interactive content is all users (the number of all users can be one or a plurality of users) in the sensing area, the information of the target interactive object is not required to be displayed to the users in the sensing area, which is helpful for the users to pay more attention to the interactive content; of course, even if the information of the target interaction object is displayed in the scene, implementation of the technical scheme of the specification is not affected.
In one embodiment, the user in the sensing area may be constantly changing; when the target interactive object of the interactive content is changed from a part of users in the sensing area to all users (for example, the users of the non-target interactive object leave the sensing area), the information of the target interactive object can be paused to be displayed to the users in the sensing area, so that smooth scene transition is realized by using the target interactive object of the interactive content as a part of users in the sensing area and the target interactive object of the interactive content as all users in the sensing area.
In an embodiment, the interaction device may determine identity information of the user as the target interaction object as information of the target interaction object; and then, displaying the identity information to the user in the sensing area. For example, the interaction device may present the identity information to the user in the sensing area while providing the interaction content; alternatively, the operations of providing the interactive content and displaying the identity information by the interactive device may be performed at different times, which is not limited in this specification.
In an embodiment, the interactive device may identify the users in the sensing area, for example by physiological feature recognition such as face recognition, fingerprint recognition, iris recognition, gait recognition or voiceprint recognition, or in any other manner, which is not limited in this specification. When the identity of a first user serving as the target interaction object is successfully recognized, the identity information may include an appellation of the first user (such as a name, a title, or another type). For example, when the first user's name is "Xiaobai", the presented content may be "Xiaobai, what do you need?", where "Xiaobai" is the information of the target interaction object and "what do you need?" is the interactive content. When the identity of a second user serving as the target interaction object is not successfully recognized, the identity information may include visual feature description information for the second user; for example, the visual feature description information may include at least one of estimated gender, estimated age, clothing, and the like. For instance, the interactive device may display "The man in the black coat, what do you need?" to the users in the sensing area, where "the man in the black coat" is the information of the target interaction object (derived from the estimated gender and the clothing) and "what do you need?" is the interactive content.
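A minimal sketch of building such an identity label; the `user` dict and its keys ("name", "estimated_gender", "clothing") are assumptions introduced only for illustration:

```python
def identity_label(user: dict) -> str:
    """Build the target-object information to pair with the interactive content."""
    if user.get("name"):                                  # identity recognized
        return user["name"]                               # e.g. "Xiaobai"
    gender = user.get("estimated_gender", "person")       # identity unknown:
    clothing = user.get("clothing")                       # fall back to visual features
    if clothing:
        return f"the {gender} in the {clothing}"          # e.g. "the man in the black coat"
    return f"the {gender} in front of the device"

# identity_label({"estimated_gender": "man", "clothing": "black coat"})
# -> "the man in the black coat", to be combined with "what do you need?"
```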
In an embodiment, the interaction device may display user indication information corresponding to a user in the sensing area; the interactive device can determine user indication information corresponding to the user serving as the target interactive object to serve as information of the target interactive object; the interactive device may then highlight the determined user-pointing information to the user within the sensing area. For example, the interaction device may perform image acquisition on the user in the sensing area, so as to display the acquired user image as the user indication information; accordingly, the interaction device may perform visual effect enhancement on the user image corresponding to the target interaction object (such as circling the corresponding user image, displaying an arrow icon near the corresponding user image, etc.), or perform visual effect degradation on the user image corresponding to the non-target interaction object (such as performing content occlusion on the corresponding user image, etc.), so that the user in the sensing area can know the target interaction object.
In an embodiment, the interactive device may obtain an event assistance request sent by a user in the sensing area and respond to the event assistance request to assist in completing the corresponding event. For example, a user in the sensing area may say to the interactive device "call employee Xiaohei"; the interactive device may determine that the event assistance request is a call request for the employee "Xiaohei" and initiate a call to that employee. Of course, besides voice, the user in the sensing area may send the event assistance request in other manners, such as making a preset body gesture in the air, which is not limited in this specification.
In an embodiment, the interactive device may receive response information returned by a user in the sensing area for the interactive content, the response information containing the event assistance request. For example, when the interactive content is "Xiaobai, what can I help you with?", the user in the sensing area may reply "call employee Xiaohei", and the interactive device may determine that the event assistance request is a call request for the employee "Xiaohei" and initiate a call to that employee. The manner in which the interactive device provides the interactive content is not necessarily related to the manner in which the user returns the response information; the two may be the same or different, which is not limited in this specification.
In an embodiment, when multiple users exist in the sensing area, the interactive device may select users as assistance objects in a preset order and then prompt the selected users in turn to send their event assistance requests. In this way, the users in the sensing area send event assistance requests one after another, which avoids the confusion caused when several users send event assistance requests simultaneously and the interactive device cannot accurately determine which request corresponds to which user, thereby improving the efficiency and success rate of the assistance provided by the interactive device.
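A minimal sketch of such turn-taking; `speak` is any text-to-speech callable, and ordering by distance (nearest first) is merely one assumed example of a preset order:

```python
def prompt_in_turn(users, speak, order_key=lambda u: u["distance_m"]):
    """Prompt assistance objects one at a time, in a preset order."""
    for user in sorted(users, key=order_key):
        speak(f"{user['label']}, please tell me what you need.")
        # ...collect and handle this user's event assistance request before
        # prompting the next user...
```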
In an embodiment, the interactive device may perform semantic recognition on collected user speech to obtain the event assistance request, and may perform voice feature recognition on the user speech to determine the source user of that speech. In this way, even if several users in the sensing area speak at the same time, the interactive device can distinguish the speech content corresponding to each user and respond to the corresponding event assistance requests, improving assistance efficiency. The interactive device may identify the users in the sensing area in advance to obtain each user's identity information; when multiple users exist in the sensing area, the interactive device may perform voice feature recognition on the collected speech against the voice features of only the already-identified users in order to determine the source user. Compared with matching the speech against the full set of voice features, this greatly shortens the time taken by voice feature recognition.
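As a hedged sketch of that restricted matching step: both the utterance vector and the per-user vectors are assumed to be voiceprint embeddings produced by some external model (the disclosure does not specify one), and cosine similarity is an assumed matching criterion.

```python
from typing import Dict
import numpy as np

def attribute_speech(utterance_vec: np.ndarray,
                     present_users: Dict[str, np.ndarray]) -> str:
    """Attribute an utterance to one of the users already identified in the
    sensing area; comparing only against present users keeps the search small."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(present_users,
               key=lambda uid: cosine(utterance_vec, present_users[uid]))
```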
In one embodiment, when the user in the sensing area is an internal member of the group, the interactive device may respond to the event assistance request by assisting in completing the corresponding group management event. When the user in the sensing area is a person external to the group, the interactive device may respond to the event assistance request by sending a reminder message to the associated internal member, assisting the external person in establishing communication with the associated internal member, or guiding the external person to the processing location of the visit event. When the user in the sensing area is an administrator, the interactive device may respond to the event assistance request by assisting in completing the corresponding device management event.
In one embodiment, the interactive device may receive user speech uttered by a user within the sensing region and respond to the user speech. For example, a user in the sensing area may actively send user speech to the interactive device, such as for example, for sending an event assistance request to the interactive device, for daily greetings to the interactive device, for sending control instructions to the interactive device, etc., which is not limiting in this description. For another example, the user in the sensing area may respond to the interactive operation performed by the interactive device by sending a corresponding user voice to the interactive device, for example, when the interactive operation performed by the interactive device is to ask whether the user in the sensing area needs help, the user voice sent by the user may inform the interactive device what kind of help is needed, etc., which is not limited in this specification.
In an embodiment, the interactive device may perform semantic recognition on the user speech. Because the same pronunciation may correspond to several different words, and because a certain amount of distortion or noise interference may occur while the interactive device picks up the speech, the recognition may yield multiple semantic recognition results. The interactive device may score each semantic recognition result according to a predefined semantic recognition algorithm to obtain a corresponding confidence; when the confidence reaches a preset value, the corresponding result is considered sufficiently reliable. Further, if several semantic recognition results whose confidence reaches the preset value exist at the same time, the interactive device may display the corresponding semantic recognition result options to the user in the sensing area, so that the user can select an option to express the actual intention accurately, and the device then responds to the user speech according to the result corresponding to the selected option. During selection, the user may read out the semantic recognition result corresponding to the desired option, or read out the position of the desired option (such as "the first one" or "the leftmost one"), which is not limited in this specification.
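A minimal sketch of that disambiguation flow; the threshold value and the three callbacks are placeholders, not parameters defined by the embodiments:

```python
CONFIDENCE_THRESHOLD = 0.8        # assumed "preset value"

def handle_recognition(candidates, execute, show_options, ask_to_repeat):
    """candidates: list of (text, confidence) pairs from the recognizer."""
    confident = [text for text, conf in candidates if conf >= CONFIDENCE_THRESHOLD]
    if len(confident) == 1:
        execute(confident[0])      # single reliable result: act on it
    elif len(confident) > 1:
        show_options(confident)    # several reliable results: let the user pick
    else:
        ask_to_repeat()            # nothing reliable enough
```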
In one embodiment, the interactive device may determine the direction of origin of the user's voice and respond to the user being in the direction of origin of the user's voice. In one case, after determining the source direction of the user voice, the interactive device directly defaults to only the user who uttered the user voice at the source direction, and thus may respond directly toward the source direction of the user voice, such as playing the interactive voice, etc. In another case, the interactive device may determine a user whose source direction of user speech is present, and if multiple users are present at the same time, the interactive device may further determine the source user of the user speech, thereby responding to the source user.
In an embodiment, the interaction device is built with a microphone array, and the user voice can be received through the microphone array, wherein the microphone array comprises a first microphone arranged relatively to the left and a second microphone arranged relatively to the right; and determining the source direction of the user voice according to the receiving time difference of the first microphone and the second microphone to the user voice. For example, the first microphone can receive user speech earlier than the second microphone when the user in the sensing area is located on the left side, and the second microphone can receive user speech earlier than the first microphone when the user in the sensing area is located on the right side. For a specific scheme how to determine the source direction of the user voice based on the receiving time difference, reference may be made to related technical schemes in the prior art, which will not be described herein.
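The disclosure defers the exact direction-finding scheme to the prior art; as one hedged illustration, a far-field time-difference-of-arrival sketch (the microphone spacing and cross-correlation approach are assumptions) could look like this:

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
MIC_SPACING_M = 0.10     # assumed distance between the first and second microphones

def arrival_lag_s(left: np.ndarray, right: np.ndarray, rate: int) -> float:
    """Lag of the left channel relative to the right, via cross-correlation.
    Positive means the sound reached the left microphone later, i.e. the
    source is on the right."""
    corr = np.correlate(left, right, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(right) - 1)
    return lag_samples / rate

def azimuth_deg(lag_s: float) -> float:
    """Far-field angle from the array's broadside; the sign gives left vs right."""
    x = np.clip(lag_s * SPEED_OF_SOUND / MIC_SPACING_M, -1.0, 1.0)
    return float(np.degrees(np.arcsin(x)))
```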
In an embodiment, when there are multiple users located in the source direction of the user voice, the interactive device may determine the source user of the user voice according to the facial action of each user in the multiple users (for example, the facial action is obtained by capturing an image through a camera built in the interactive device), and respond to the source user. The facial movements of the user may include movements of one or more parts such as cheek, mouth, chin, etc., which are not limited in this specification. Taking mouth movements as an example, when there are a plurality of users in the source direction of the user voice, but only one user's mouth has an opening and closing action, the user can be determined as the source user of the user voice; for example, although the opening and closing operations are performed in the mouths of a plurality of users, only the number, the magnitude, and the like of opening and closing times of one user are matched with the user voice, and the user can be determined as the source user of the user voice.
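One way to realise that matching, sketched under stated assumptions: per-user mouth-openness time series are assumed to come from a face-landmark model, and correlating them with a voice-activity series is an illustrative matching criterion, not the only one contemplated above.

```python
import numpy as np

def pick_speaker_by_mouth_motion(mouth_openness: dict, voice_activity) -> str:
    """Pick, among users in the voice's source direction, the one whose mouth
    motion best matches the speech.

    mouth_openness: user id -> time series of mouth-opening degree;
    voice_activity: 0/1 series aligned to the same frames."""
    va = np.asarray(voice_activity, dtype=float)

    def score(series) -> float:
        s = np.asarray(series, dtype=float)
        if s.std() == 0 or va.std() == 0:
            return -1.0
        return float(np.corrcoef(s, va)[0, 1])

    return max(mouth_openness, key=lambda uid: score(mouth_openness[uid]))
```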
In one embodiment, when the interactive device is mounted on a wall surface, the user is typically only able to pass in front of the interactive device and make user speech; when the interactive device is assembled in other ways, the user may appear in front of or behind the interactive device, so that the audio message collected by the interactive device may come from the user located in front of or behind, and if there is a user in the sensing area of the interactive device, and the other user speaks when just passing behind the interactive device, the interactive device may be mistaken for the user voice uttered by the user in the sensing area. Thus, when the interactive device receives the audio message, it may determine whether the audio message is a user voice uttered by a user in the sensing area based on the source direction of the audio message and whether the user is present in the sensing area.
For example, the interactive device may have a built-in microphone array comprising a third microphone relatively close to the sensing area and a fourth microphone relatively far from it. When an audio message is received through the microphone array, the source direction of the audio message can be determined from how the third and fourth microphones receive the high-frequency part of the message: when the source is on the side relatively close to the sensing area, the high-frequency part is attenuated by absorption by the housing of the interactive device before reaching the fourth microphone, so the high-frequency part received by the fourth microphone is weaker than that received by the third microphone; conversely, when the source is on the side relatively far from the sensing area, the high-frequency part received by the third microphone is weaker than that received by the fourth microphone. The source direction of the audio message can therefore be determined accurately by comparing how the two microphones receive its high-frequency part.
When the source direction is one side relatively close to the sensing area and a user exists in the sensing area, the interactive equipment can judge that the audio message is user voice sent by the user in the sensing area; otherwise, the interactive device may determine that the audio message is not user speech uttered by a user in the sensing area, such as when the source direction is a side relatively far from the sensing area, or when the source direction is a side relatively close to the sensing area but no user is present in the sensing area.
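A minimal sketch of this front/back check; the cutoff frequency for the "high-frequency part" and the FFT-based energy comparison are assumptions used only for illustration:

```python
import numpy as np

HIGH_FREQ_CUTOFF_HZ = 2000.0   # assumed boundary of the "high-frequency part"

def high_freq_energy(signal: np.ndarray, rate: int) -> float:
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    return float(np.sum(np.abs(spectrum[freqs >= HIGH_FREQ_CUTOFF_HZ]) ** 2))

def is_user_speech(third_mic: np.ndarray, fourth_mic: np.ndarray,
                   rate: int, user_present: bool) -> bool:
    """Treat the audio as user speech only if its high-frequency part is
    stronger at the third microphone (the side facing the sensing area) and a
    user is actually present in the sensing area."""
    from_front = high_freq_energy(third_mic, rate) > high_freq_energy(fourth_mic, rate)
    return from_front and user_present
```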
Regarding the first, second, third and fourth microphones, it should be noted that these labels only serve to distinguish microphones in the left-right direction (first and second) and in the front-rear direction (third and fourth); this specification does not limit the number of microphones actually contained in the microphone array. For example, the array may contain one or more first microphones and one or more second microphones; likewise, it may contain one or more third microphones and one or more fourth microphones. Moreover, when the array needs to resolve the source direction in both the left-right and the front-rear directions, it does not necessarily have to contain four microphones: the first/second and third/fourth microphones are only roles played when the corresponding functions are implemented, and the array may in fact contain fewer microphones. For example, the array may contain three microphones, where microphone 1 and microphone 2 are at the same position in the front-rear direction but separated in the left-right direction, and microphone 3 is located in front of or behind microphones 1 and 2, so that microphones 1-3 form a roughly triangular arrangement. In that case, microphone 1 and microphone 2 can serve as the first and second microphones to distinguish the source direction of user speech in the left-right direction; and microphones 1 and 2 can serve as third microphones with microphone 3 as the fourth microphone (when microphone 3 is behind them), or microphone 3 can serve as the third microphone with microphones 1 and 2 as fourth microphones (when microphone 3 is in front of them), to distinguish the source direction of an audio message in the front-rear direction.
In an embodiment, the interactive device may detect the number of users in the sensing area, for example by performing face detection and counting on images captured by the camera, which is not limited in this specification. When multiple users exist in the sensing area, the interactive device may display an avatar picture for each user to represent that user; when users in the sensing area are added, removed or changed, the displayed avatar pictures may change accordingly. When the interactive device receives an audio message and confirms that it originates from a user in the sensing area, the avatar picture of the source user may be displayed in a manner distinguished from the avatar pictures of the other users, so that by watching the change of the avatar pictures the users can confirm that the interactive device has successfully received the speech and has identified the source user, without worrying that the device missed the speech or attributed it incorrectly. The manner of distinguishing the source user's avatar picture from the others is not limited in this specification; for example, the source user's avatar picture may be displayed in the central area while the others are displayed in the edge area; or the source user's avatar picture may be enlarged while the others are displayed at normal or reduced size; or the source user's avatar picture may be displayed normally (in color) while the others are displayed after grayscale processing.
For ease of understanding, the technical solutions of one or more embodiments of the present disclosure will be described by taking an enterprise instant messaging application "enterprise WeChat" as an example. FIG. 3 is a schematic diagram of an interaction scenario provided by an exemplary embodiment; as shown in fig. 3, it is assumed that an interactive device 3 is provided at an office of an enterprise AA, and an enterprise WeChat client is operated on the interactive device 3, so that the interactive device 3 can implement an interactive scheme of the present specification based on the enterprise WeChat client.
In an embodiment, the interaction device 3 is equipped with a camera 31, and the camera 31 can form a corresponding shooting area 32 to serve as a corresponding sensing area of the interaction device 3; accordingly, the interaction device 3 may determine, from the image obtained by photographing the photographing region 32 by the camera 31, a user entering the photographing region 32, such as the user 4 entering the photographing region 32 in fig. 3. Of course, in addition to the camera 31, the interaction device 3 may determine the user entering the sensing area by sound detection, infrared detection or other means, which is not limited in this specification.
FIG. 4 is a schematic diagram of interaction with an internal employee provided by an exemplary embodiment. As shown in fig. 4, the interactive device 3 may be equipped with a screen 33, and the screen 33 may be used to display the user image 41 of the user 4 captured by the camera 31. The interactive device 3 may identify the user 4, for example by face recognition based on the face image captured by the camera 31, which is not limited in this specification. Assuming the interactive device 3 recognizes that the user 4 is the internal employee "Xiaobai", the corresponding identity information 42 may be shown on the screen 33; for example, the identity information 42 may be the appellation "Xiaobai" of the user 4.
Having identified the user 4 as the internal employee "Xiaobai", the interactive device 3 can query the attendance data of "Xiaobai" and, if that user has not yet checked in, automatically perform the clock-in operation for "Xiaobai". So that the user 4 knows that the attendance operation has been completed, the interactive device 3 may provide corresponding interactive content to the user 4; for example, the interactive content may include a tag 43 shown on the screen 33, the tag 43 containing the information "clocked in", indicating that the type of the attendance operation is "clocking in for work". The interactive content may also be provided to the user 4 in other forms; for example, when the interactive device 3 includes a speaker 34, voice information such as "Xiaobai, clock-in successful" may be played through the speaker 34. Similarly, the interactive device 3 may perform automated clock-in operations for other internal employees of the enterprise AA, and may also perform automated clock-out operations for internal employees of the enterprise AA, which will not be described in detail here.
In an embodiment, the sensing area of the interaction device 3 may be divided into a plurality of sub-areas according to the distance from the interaction device 3, such as the shooting area 32 is divided into a far-field shooting area 321 (the distance from the interaction device 3 is 1.5-3.0 m) and a near-field shooting area 322 (the distance from the interaction device 3 is 0-1.5 m) in fig. 3. Wherein, if the user is currently in the attendance time period, the interaction device 3 can implement the automatic attendance operation for the user 4 regardless of whether the user is in the far-field photographing region 321 or the near-field photographing region 322. If the user 4 is in the attendance time period but the user 4 has completed attendance, or in other time periods, the interaction device 3 may default that the user 4 only passes temporarily and does not have an interaction intention when the user 4 is in the far-field shooting area 321, so that the interaction with the user 4 may not be actively initiated (i.e. interaction content is not provided to the user 4); however, if the continuous stay time of the user 4 in the far-field photographing region 321 reaches the first preset time period (e.g., 3 s), the interactive apparatus 3 may determine that the user 4 has an interactive intention, and thus may provide the interactive contents to the user 4. Similarly, if in the attendance period but user 4 has completed attendance, or in other periods, the interaction device 3 may default to user 4 only passing temporarily and no willingness to interact when user 4 is in the near field capture area 322, and thus may not actively initiate interaction with user 4 (i.e., not provide interaction content to user 4); however, if the continuous stay time of the user 4 in the near field photographing region 322 reaches the second preset time period, the interactive apparatus 3 may determine that the user 4 has an interactive intention, and thus may provide the interactive contents to the user 4. Because the near-field capturing area 322 is relatively closer to the interaction device 3, the action of the user 4 actively entering the near-field capturing area 322 may include a certain interaction wish, so that the second preset duration may be appropriately smaller than the first preset duration, for example, the first preset duration is 3s, and the second preset duration is 1s; in a more specific case, the second preset duration may be 0, which corresponds to that the interaction device 3 defaults to the user 4 entering the near field capturing area 322 having an interaction wish, so that the interaction content may be provided to the user 4 without delay.
When the user 4 is in the far-field shooting area 321, in order to ensure effective communication between the interactive device 3 and the user 4 and to improve interaction efficiency, the interactive device 3 may guide the user 4 to move from the far-field shooting area 321 to the near-field shooting area 322 through interactive content. For example, FIG. 5 is a schematic diagram of guiding a user's position through interactive content provided by an exemplary embodiment. As shown in fig. 5, the interactive device 3 may show interactive content 511 in text form in the interactive presentation area 51 on the screen 33, such as "please come closer, to within 1.5 meters", guiding the user 4 to move from the far-field shooting area 321 to the near-field shooting area 322. Besides the text-form interactive content 511, the interactive device 3 may play interactive content in voice form through the speaker 34, such as "Xiaobai, please come a little closer to me", where "Xiaobai" is the identity information and "please come a little closer to me" is the interactive content, likewise guiding the user 4 from the far-field shooting area 321 to the near-field shooting area 322. Meanwhile, the interactive device 3 may control the indicator lamp 35 to blink in a breathing pattern, which can attract the attention of the user 4 and is equivalent to conveying interactive content, again guiding the user 4 from the far-field shooting area 321 to the near-field shooting area 322. Of course, the interactive device 3 may convey the interactive content in any one or a combination of the above text, voice and light forms, which is not limited in this specification.
When the user 4 enters the near-field shooting area 322 (either actively or under the guidance described above), the interactive device 3 may guide the user 4 to state the purpose of the interaction. For example, FIG. 6 is a schematic diagram of an interactive device actively initiating interaction with a user provided by an exemplary embodiment. As shown in fig. 6, the interactive device 3 may play interactive content in voice form through the speaker 34, such as "Xiaobai, what can I help you with?" (where "Xiaobai" is the identity information and "what can I help you with?" is the interactive content), while showing interactive content 512 in text form in the interactive presentation area 51, such as "Try saying: call Zhang San", to guide the user 4 to express the interaction purpose to the interactive device 3 in voice form.
It should be noted that: the interaction device 3 does not have to direct the user 4 from the far field shot region 321 to the near field shot region 322, e.g. the interaction device 3 may also direct the user 4 of the far field shot region 321 to speak his own interaction purpose. For example, the interaction device 3 may also detect ambient noise, first directing the user 4 from the far field capture area 321 to the near field capture area 322 when the noise level is greater than a preset value, then directing the user 4 to speak the self-interaction objective, and directly directing the user 4 of the far field capture area 321 to speak the self-interaction objective when the noise level is less than the preset value.
In an embodiment, when the user 4 is an internal member of the enterprise AA, the interactive device 3 can obtain the appellation "Xiaobai" of the user 4, so that, as shown in fig. 5, the interactive device 3 can guide the user 4 from the far-field shooting area 321 to the near-field shooting area 322 using "Xiaobai" as the identity information and "please come a little closer to me" as the interactive content. For persons external to the enterprise AA, the interactive device 3 may be unable to obtain a corresponding appellation, so the identity information used during the interaction differs from the embodiment shown in fig. 5. For example, FIG. 7 is a schematic diagram of another example of guiding a user's position through interactive content provided by an exemplary embodiment. As shown in fig. 7, assuming the interactive device 3 captures a user image 71 of a user through the camera 31 but the user is external to the enterprise AA, the interactive device 3 cannot obtain that user's appellation. Therefore, when guiding the user to move from the far-field shooting area 321 to the near-field shooting area 322, the interactive device 3 may show interactive content 513 in text form in the interactive presentation area 51, such as "please come closer, to within 1.5 meters"; it may also play interactive content in voice form through the speaker 34, such as "Hello, please come a little closer to me" (with the user's identity information omitted); and it may also control the indicator lamp 35 to blink in a breathing pattern, so as to guide the user from the far-field shooting area 321 to the near-field shooting area 322.
In an embodiment, the interactive device 3 may learn of the associated events of a user in the sensing area by accessing the Enterprise WeChat server, and may change the provided interactive content based on those associated events. For example, FIG. 8 is a schematic diagram of a normal interaction scenario provided by an exemplary embodiment. As shown in fig. 8, assume that the interactive device 3 detects the user 4 in the shooting area 32 during working hours and recognizes that the user 4 is the internal employee "Xiaobai" of the enterprise AA. If the interactive device 3 determines that the interaction purpose of the user 4 is to clock out, but it is still within working hours and the interactive device 3 finds no associated event related to clocking out, the interactive content 514 shown in the interactive presentation area 51 may be "Are you sure you want to leave early?". FIG. 9 is a schematic diagram of adjusting interactive content according to an associated event provided by an exemplary embodiment. As shown in fig. 9, assume again that the interactive device 3 detects the user 4 in the shooting area 32 during working hours and recognizes that the user 4 is the internal employee "Xiaobai" of the enterprise AA; if the interactive device 3 finds that the user 4 has a submitted and approved sick-leave event and the leave time indicated by that event has been reached, the interactive content 515 shown in the interactive presentation area 51 may be "Are you sure you want to clock out?".
In an embodiment, several users may be present in the photographing region 32 of the interaction device 3 at the same time, and the interaction device 3 may communicate with them through appropriate interactive content. FIG. 10 is a schematic diagram of a speaker designated by an interactive device, provided by an exemplary embodiment. As shown in fig. 10, assume that several users are present in the photographing region 32, corresponding respectively to the user images 81-82 shown on the screen 33. The interaction device 3 may recognize, for example, that the user corresponding to the user image 81 is "Xiaobai" and the user corresponding to the user image 82 is "Xiaohei", and display each user's name as identity information near the corresponding user image: the identity information 91 "Xiaobai" is shown above the user image 81, and the identity information 92 "Xiaohei" is shown above the user image 82. Because the interaction capability of the interaction device 3 is limited, and also so that the interaction device 3 can clearly learn each user's interaction purpose, the interaction device 3 may interact with only some of the users at a time. The interaction device 3 may select the target interaction object (i.e., the above-mentioned subset of users) in a certain manner, for example sorted by each user's distance from the interaction device 3, by the angle between each user's face and the shooting direction of the camera 31, by each user's height, and the like, which is not limited in this specification. Assume that the interaction device 3 intends to interact with the user "Xiaobai" corresponding to the user image 81. To avoid misunderstanding by the other users in the sensing area, when the interaction device 3 provides the interactive content it also indicates to the users in the sensing area that the target interaction object of that content is the user "Xiaobai": for example, when the interaction device 3 plays the interactive content "What can I help you with?" through the speaker 34, it may also play the identity information "Xiaobai", so that the actually played content is "Xiaobai, what can I help you with?". The other users can thereby confirm that the target interaction object of the interactive content "What can I help you with?" is the user "Xiaobai".
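The selection of a target interaction object from the detected users could, for instance, be sketched as below. The sort keys mirror the example criteria mentioned above (distance, face angle, height); the data-class fields are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DetectedUser:
    name: str              # e.g. "Xiaobai"; outsiders may carry a placeholder instead
    distance_m: float      # distance from the interaction device
    face_angle_deg: float  # angle between the face and the camera's shooting direction
    height_cm: float


def pick_target(users, key="distance"):
    """Pick the target interaction object according to one of the example criteria."""
    sort_keys = {
        "distance": lambda u: u.distance_m,
        "face_angle": lambda u: u.face_angle_deg,
        "height": lambda u: u.height_cm,
    }
    return min(users, key=sort_keys[key])


if __name__ == "__main__":
    users = [DetectedUser("Xiaobai", 1.2, 5.0, 172.0),
             DetectedUser("Xiaohei", 2.4, 20.0, 180.0)]
    print(pick_target(users).name)                    # Xiaobai (closest)
    print(pick_target(users, key="face_angle").name)  # Xiaobai (most frontal)
```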
FIG. 11 is a schematic diagram of another speaker designation by an interactive device, provided by an exemplary embodiment. As shown in fig. 11, if several users, such as the user "Xiaobai" and the user "Xiaohei", state their interaction purposes at the same time, the interaction device 3 may be unable to accurately learn each user's interaction purpose because the voices overlap, or may be unable to respond to several interaction purposes at once, or for other reasons; the interaction device 3 may therefore provide interactive content that guides the users to state their interaction purposes in turn. For example, the interaction device 3 may show interactive content 516 in the interactive presentation area 51, and the interactive content 516 may include "please do not speak at the same time". Further, when the interaction device 3 determines, according to the above embodiment, that the speaking order is the user "Xiaobai" first and the user "Xiaohei" afterwards, the interaction device 3 may play the interactive content "I couldn't hear you clearly, could xx please speak first" through the speaker 34 and insert the identity information of the user "Xiaobai", so that the actually played content is "I couldn't hear you clearly, could Xiaobai please speak first", allowing the other users to confirm that the target interaction object of this interactive content is the user "Xiaobai".
In addition to the embodiments shown in fig. 10-11, in which the speaking order among multiple users is indicated in voice form, a variety of other manners may be used. For example, FIG. 12 is a schematic diagram of yet another speaker designated by an interactive device, provided by an exemplary embodiment. As shown in fig. 12, when the interaction device 3 determines that the speaking order is the user "Xiaobai" first and the user "Xiaohei" afterwards, the interaction device 3 may mark the user image 81 corresponding to the user "Xiaobai", for example by adding a mark box 810 around the face area, so that even if the interactive content is simply "What can I help you with?", "Please speak", or the like, every user can determine that the target interaction object of that content is the user "Xiaobai". Of course, in the embodiment shown in fig. 12, when the interaction device 3 shows the interactive text 517 in the interactive presentation area 51, the interactive text 517 may include the identity information "Xiaobai" in addition to the interactive content "please xx speak", so that the entire interactive text 517 reads "please Xiaobai speak", which likewise indicates to every user that the current target interaction object is the user "Xiaobai".
FIG. 13 is a schematic diagram of specifying the speaking order for outside personnel, provided by an exemplary embodiment. As shown in fig. 13, assuming that the users corresponding to the user images 81-82 are all personnel outside the enterprise AA, the interaction device 3 cannot obtain their names, but it may express each user's identity information in other ways so as to indicate the target interaction object of the interactive information. For example, when the interaction device 3 determines that the target interaction object is the user corresponding to the user image 81, and the user image 81 corresponds to a female user while the user image 82 corresponds to a male user, each user's identity information may be expressed by gender, such as "this lady" and "this gentleman". Thus, when the voice content played by the interaction device 3 through the speaker 34 is "I couldn't hear you clearly, could this lady please speak first", all users in the photographing region 32 can recognize that the interactive content is "I couldn't hear you clearly, could xx please speak first" and, based on the identity information "this lady", determine that the target interaction object is the user corresponding to the user image 81.
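One way to form the identity information used to address a target user, falling back to a visible attribute such as gender when no name is available, might look like the following sketch; the attribute names and fallback phrases are assumptions for illustration only.

```python
def identity_phrase(name=None, gender=None):
    """Return the identity information used to address the target interaction object."""
    if name:                    # internal member whose name was recognized
        return name
    if gender == "female":      # outsider: fall back to a visible attribute
        return "this lady"
    if gender == "male":
        return "this gentleman"
    return "the user in front"  # generic fallback when nothing is known


if __name__ == "__main__":
    print(f"I couldn't hear you clearly, could {identity_phrase('Xiaobai')} please speak first")
    print(f"I couldn't hear you clearly, could {identity_phrase(gender='female')} please speak first")
```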
After providing interactive content to users within the photographing region 32, in some scenarios no response from the user is required, such as the interactive content "clock-in successful" in the embodiment shown in fig. 4; in other scenarios a response may be returned by the user, and that response may include an event assistance request initiated by the user, so that the interaction device 3 assists the user in completing the corresponding event. For example, for the interactive content "Are you sure you want to clock out?", when the response returned by the user "Xiaobai" is "yes", the interaction device 3 may determine, based on semantic analysis, that the user "Xiaobai" has initiated an event assistance request for the "clock-out event", which can then be completed with the assistance of the interaction device 3. Similarly, in a multi-person scenario such as that of fig. 13, the interaction device 3 plays the voice "I couldn't hear you clearly, could this lady please speak first"; if the response returned by the female user is "call Xiaobai", the interaction device 3 may determine, based on semantic analysis, that this user has initiated an event assistance request for a "call event" whose call target is the user "Xiaobai", so that the interaction device 3 may initiate a call to the user "Xiaobai" to assist in completing the "call event".
Of course, besides responding to the interactive content, a user in the photographing region 32 may also directly initiate an event assistance request to the interaction device 3, and the interaction device 3 may assist in completing the corresponding event; this case is similar to responding to interactive content and is not described again here.
In the embodiments shown in fig. 10-13, the interaction device 3 may ensure that the multiple users in the photographing region 32 speak in sequence, so that the interaction device 3 can determine the event assistance request initiated by each user and assist in completing the corresponding events respectively. In an embodiment, the interaction device 3 may receive user voices uttered by multiple users at the same time, separate each user's voice based on sound features, and determine the mapping between each user voice and the users in the photographing region 32 through sound feature recognition (such as voiceprint recognition), so that the interaction device 3 can obtain the event assistance requests of multiple users simultaneously and assist in completing the corresponding events simultaneously, thereby significantly improving the assistance efficiency for multiple users.
In one case, the interaction device 3 may directly compare the collected user voice with a sound feature library; for example, the sound feature library may contain the voiceprint features of all internal employees of the enterprise AA, so that the internal employee corresponding to the collected user voice can be determined from the comparison result. Meanwhile, the interaction device 3 may identify the users in the photographing region 32 in other ways, such as face recognition, and compare that identification result with the result obtained from the sound feature library, so as to prevent impersonation of internal employees of the enterprise AA. For example, if the sound features indicate that the user voices come from user A and user B of the enterprise AA, while face recognition indicates that the users in the photographing region are user A and an unrecognizable outside person, the outside person may be impersonating user B with a recording; the interaction device 3 may then refuse to complete the corresponding assistance event and issue an alert to user B.
In another case, the interaction device 3 may first identify the users in the photographing region 32 by face recognition, for example identifying them as user A and user B of the enterprise AA. When the interaction device 3 collects two user voices, it only needs to compare them with the voiceprint features of user A and user B to determine which voice comes from user A and which from user B, without comparing against the other voiceprint features in the library, which can greatly improve comparison efficiency.
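A minimal sketch of this narrowed comparison is shown below: face recognition first restricts the candidate set, and each collected voice is matched only against those candidates' voiceprints. The cosine-similarity matching and the toy feature vectors are assumptions; a real system would use a dedicated speaker-verification model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def match_voices(voice_embeddings, candidate_voiceprints):
    """Map each separated voice to the best-matching candidate seen by the camera.

    voice_embeddings: {voice_id: feature vector} for the separated user voices.
    candidate_voiceprints: {user_name: enrolled voiceprint}, restricted by face recognition.
    """
    mapping = {}
    for voice_id, emb in voice_embeddings.items():
        best = max(candidate_voiceprints,
                   key=lambda name: cosine(emb, candidate_voiceprints[name]))
        mapping[voice_id] = best
    return mapping


if __name__ == "__main__":
    prints = {"userA": [0.9, 0.1, 0.0], "userB": [0.1, 0.9, 0.1]}
    voices = {"voice1": [0.85, 0.15, 0.05], "voice2": [0.05, 0.95, 0.1]}
    print(match_voices(voices, prints))  # {'voice1': 'userA', 'voice2': 'userB'}
```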
In addition, the users in the photographing region 32 may include an administrator, and the interaction device 3 may, in response to an event assistance request from the administrator, assist in completing a corresponding device management event, such as adjusting the welcome content on the screen 33, adjusting the volume of the speaker 34, or adjusting the ranges of the far-field photographing region 321 and the near-field photographing region 322.
FIG. 14 is a schematic diagram of marking an interaction object, provided by an exemplary embodiment. After the interaction device 3 photographs the photographing region 32 through the camera 31, it may mark the detected users located within the photographing region 32, so that each user can clearly determine whether he or she has been detected by the interaction device 3 and can interact with it. As shown in fig. 14, when the interaction device 3 detects that a user is present in the photographing region 32, it may generate a corresponding avatar picture 1401 for that user from the captured image and display the avatar picture 1401 on the screen 33; when the interaction device 3 detects that another user is also located in the photographing region 32, an avatar picture 1402 corresponding to that user may likewise be shown on the screen 33; similarly, when further users enter the photographing region 32, the interaction device 3 may show corresponding avatar pictures on the screen 33, which is not described again here.
When the user corresponding to, for example, the avatar picture 1402 leaves the photographing region 32, the interaction device 3 may delete the avatar picture 1402 from the screen; the same applies to the other users and is not described again here.
Thus, when a user sees that the screen 33 contains the avatar pictures 1401-1402, the corresponding user can determine that he or she has been detected by the interaction device 3, is treated as an interaction object by the interaction device 3, and can interact with it. Conversely, when another user who wishes to interact with the interaction device 3 does not see his or her own avatar picture on the screen 33, this indicates that the user may not have entered the photographing region 32, or entered it but was not successfully detected; the user may then enter or re-enter the photographing region 32, and so on, until his or her avatar picture is shown on the screen 33.
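Keeping the on-screen avatar pictures in step with the users currently detected in the photographing region amounts to a simple set difference, as in the sketch below; the user identifiers and the surrounding detection/rendering steps are assumptions, not a specific API of the device.

```python
def sync_avatars(displayed, detected):
    """Compute which avatar pictures to add to and remove from the screen.

    displayed: set of user ids whose avatars are currently on screen.
    detected:  set of user ids currently detected in the photographing region.
    """
    to_add = detected - displayed
    to_remove = displayed - detected
    return to_add, to_remove


if __name__ == "__main__":
    on_screen = {"Xiaobai"}
    now_detected = {"Xiaobai", "Xiaohei"}
    add, remove = sync_avatars(on_screen, now_detected)
    print(add, remove)   # {'Xiaohei'} set()

    now_detected = {"Xiaohei"}  # Xiaobai has left the photographing region
    add, remove = sync_avatars(on_screen | add, now_detected)
    print(add, remove)   # set() {'Xiaobai'}
```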
FIG. 15 is a schematic diagram of marking a target interaction object, provided by an exemplary embodiment. Assume that the interaction device 3 recognizes the user "Xiaobai" and the user "Xiaohei" within the photographing region 32, and determines the user "Xiaobai" as the target interaction object, as shown in fig. 15: the interaction device 3 may display the avatar picture 1401 corresponding to the user "Xiaobai" at normal scale in the central region of the screen 33 (relatively far from the screen edge), and display the avatar picture 1402 corresponding to the user "Xiaohei" at a smaller scale in the edge region of the screen 33. Then, when the interaction device 3 plays the interactive voice "What can I help you with?" through the speaker 34, the display scale and display position of the avatar pictures 1401 and 1402 make it clear that the target interaction object of that interactive voice is the user "Xiaobai" corresponding to the avatar picture 1401, rather than the user "Xiaohei" corresponding to the avatar picture 1402.
Of course, besides configuring the display scale and display position of the avatar pictures, other display attributes may also be adjusted so that the avatar picture corresponding to the target interaction object is distinguished from the avatar pictures corresponding to the other interaction objects, making it easy to determine the current target interaction object from the avatar pictures; this is not limited in this specification.
In addition to the interaction device 3 interacting with users in the photographing region 32, the users in the photographing region 32 may also actively interact with the interaction device 3, for example by uttering user voice toward the interaction device 3, so that the interaction device 3 responds to the user voice and meets the needs of the user from whom it originated. The user voice may be a response to interactive voice played by the interaction device 3, or may be actively uttered toward the interaction device 3 by a user in the photographing region 32, which is not limited in this specification.
In an embodiment, several users may be present in the photographing region 32 at the same time, so when the interaction device 3 receives user voice uttered within the photographing region 32, it needs to determine the source user of that voice, i.e., to distinguish which user in the photographing region 32 uttered it.
For example, FIG. 16 is a schematic diagram of determining the source user of user voice, provided by an exemplary embodiment. As shown in fig. 16, the interaction device 3 may have a built-in microphone array that includes a microphone 36 and a microphone 37, where the microphone 36 is placed toward the left and the microphone 37 toward the right. Thus, when a user in the photographing region 32 utters a user voice such as "I need to reserve a meeting room for 15 people", if the microphone 36 receives the voice earlier than the microphone 37, the source user of the voice is relatively closer to the microphone 36 and relatively farther from the microphone 37, so the source user is located relatively toward the left of the photographing region 32; combined with the image captured in fig. 10, for example, it can be determined that the source user is the user "Xiaobai".
Similarly, if the microphone 37 receives the user voice earlier than the microphone 36, the source user of the voice is relatively closer to the microphone 37 and relatively farther from the microphone 36, so the source user is located relatively toward the right of the photographing region 32; combined with the image captured in fig. 10, it can be determined that the source user is the user "Xiaohei". Alternatively, if the microphone 36 and the microphone 37 receive the user voice at the same or almost the same time, the source user is located between the microphone 36 and the microphone 37, i.e., directly in front of the interaction device 3, and it can be determined that the source user is located in the middle of the photographing region 32.
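A hedged sketch of this left/right decision from the arrival-time difference at the two front microphones follows; the 0.5 ms tolerance used to treat the arrivals as "almost simultaneous" is an assumption.

```python
def source_side(t_left_ms: float, t_right_ms: float, tol_ms: float = 0.5) -> str:
    """Classify the voice source as left, right, or middle of the photographing region.

    t_left_ms / t_right_ms: arrival times of the same voice at the left and right microphones.
    """
    delta = t_left_ms - t_right_ms
    if abs(delta) <= tol_ms:
        return "middle"  # both microphones heard it at (almost) the same time
    # The earlier microphone is the nearer one.
    return "left" if delta < 0 else "right"


if __name__ == "__main__":
    print(source_side(10.0, 11.2))  # left   -> e.g. the user "Xiaobai" in fig. 10
    print(source_side(11.4, 10.1))  # right  -> e.g. the user "Xiaohei"
    print(source_side(10.3, 10.4))  # middle
```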
In an embodiment, depending on the installation position or installation manner of the interaction device 3, a user may be located behind the interaction device 3 rather than in front of the screen 33 and the camera 31. Such a user is clearly not in the photographing region 32, yet may be right next to the interaction device 3, so when the interaction device 3 receives an audio message such as "I need to reserve a meeting room for 15 people", that audio message is not necessarily user voice uttered by a user in the photographing region 32; it may be interfering voice uttered by a user behind the device. Therefore, to avoid misjudging interfering voice as user voice, the source direction of the audio message needs to be determined: an audio message originating from the front may be user voice uttered by a user within the photographing region 32, while an audio message originating from the rear is interfering voice.
For example, FIG. 17 is a schematic diagram of determining the source direction of an audio message, provided by an exemplary embodiment. As shown in fig. 17, the interaction device 3 may have a built-in microphone array that includes a microphone 36, a microphone 37, and a microphone 38, all located in the interaction device 3. In the left-right direction (the horizontal direction in fig. 17), the microphone 36 is placed toward the left and the microphone 37 toward the right; in the front-rear direction (the vertical direction in fig. 17), the microphones 36 and 37 are at the front of the interaction device 3, relatively close to the photographing region 32, while the microphone 38 is at the rear, relatively far from the photographing region 32. Thus, for an audio message uttered by a user near the interaction device 3: if the user is located in front of the interaction device 3, the audio message is transmitted from the front and passes through the interaction device 3, its high-frequency component is partly absorbed by the housing, and as a result the high-frequency signal strength received by the microphone 38 at the rear is smaller than that received by the microphones 36-37. Conversely, if the user is located behind the interaction device 3, the audio message is transmitted from the rear and passes through the interaction device 3, its high-frequency component is likewise partly absorbed by the housing, and the high-frequency signal strength received by the microphones 36-37 at the front is smaller than that received by the microphone 38.
Therefore, according to how the microphones 36-38 receive the high-frequency portion of the audio message, it can be determined whether the source direction of the audio message is in front of or behind the interaction device 3. When the source direction is determined to be behind the interaction device 3, the source of the audio message is necessarily not a user within the photographing region 32, i.e., the audio message is interfering voice. When the source direction is determined to be in front of the interaction device 3, the source of the audio message may be a user within the photographing region 32; of course, to improve accuracy and reduce the probability of misjudgment, other conditions may be combined for further judgment:
In an embodiment, image acquisition may be performed by the camera 31 on the interaction device 3; if a user is present in the photographing region 32, it may be determined that the above audio message originates from that user.
In an embodiment, image acquisition may be performed by the camera 31 on the interaction device 3; if several users are present in the photographing region 32, the facial actions of each user may be taken into account, such as whether a user opened and closed the mouth while the audio message was being received and whether the timing of that action matches the signal changes of the audio message, so that the user whose facial actions match the audio message is determined as its source user.
In an embodiment, image acquisition may be performed by the camera 31 on the interaction device 3; if several users are present in the photographing region 32, the source direction of the audio message identified by the microphones 36-37 (toward the left, toward the right, or in the middle) may be used to determine the user in the corresponding direction as the source user of the audio message. If several users are still located in the same direction, their facial actions may further be combined as above, and the user whose facial actions match the audio message may be selected as the source user.
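Combining the front/rear judgment based on high-frequency attenuation with the left/right judgment and the facial-action check could be sketched as follows; the energy-ratio margin and the mouth-movement scores are illustrative assumptions.

```python
def is_from_front(hf_front: float, hf_rear: float, margin: float = 1.2) -> bool:
    """True if the high-frequency energy at the front microphones clearly exceeds the rear one."""
    return hf_front > margin * hf_rear


def pick_source_user(users, side: str, mouth_scores):
    """Pick the source user among detected users on the estimated side of the region.

    users: list of (name, side) tuples derived from the camera image.
    mouth_scores: {name: how well the mouth movement matches the audio signal}.
    """
    candidates = [name for name, s in users if s == side]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    # Several users on the same side: fall back to the facial-action match.
    return max(candidates, key=lambda name: mouth_scores.get(name, 0.0))


if __name__ == "__main__":
    if is_from_front(hf_front=0.8, hf_rear=0.3):
        users = [("Xiaobai", "left"), ("Xiaohei", "left")]
        print(pick_source_user(users, "left", {"Xiaobai": 0.9, "Xiaohei": 0.2}))  # Xiaobai
    else:
        print("interfering voice from behind the device")
```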
FIG. 18 is a schematic diagram of marking the source user of user voice, provided by an exemplary embodiment. As shown in fig. 18, assuming that the user "Xiaobai" is determined to be the source user of the user voice, the interaction device 3 may keep displaying the avatar picture 1401 in its original colors while displaying the avatar pictures 1402 corresponding to the other users in a uniformly grayed-out form, so that the users in the photographing region 32 can quickly confirm whether the interaction device 3 has correctly recognized the source user of the voice, ensuring that no deviation occurs in the subsequent interaction.
In an embodiment, when the interaction device 3 recognizes the user voice, adverse factors such as the source user's accent, an overly noisy environment, or distortion in the pickup process may affect the semantic recognition accuracy of the interaction device 3. Therefore, during recognition, the interaction device 3 may score each candidate semantic recognition result separately. Candidate results with low confidence (e.g., below a preset score) may be discarded directly; if exactly one candidate has high confidence (e.g., above the preset score), it may be taken directly as the semantic recognition result; and if more than one candidate has high confidence, the interaction device 3 may display the corresponding candidate options to the source user, for example option 1801 in fig. 18 is "1. I need to reserve a meeting room for 15 people" and option 1802 is "2. I need to reserve a meeting room for 45 people", for the source user "Xiaobai" to confirm by selection.
For example, the user "Xiaobai" may inform the interaction device 3 of selecting option 1801 by uttering a confirmation voice containing "the first one", "the former", "15 people", or the like; the interaction device 3 may then determine that the semantic recognition result of the above user voice is "I need to reserve a meeting room for 15 people" and respond further, such as assisting the user "Xiaobai" in completing the reservation of a suitable meeting room.
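The confidence-gated handling of candidate semantic recognition results might be organized as in the following sketch; the 0.6 threshold and the candidate structure are assumptions rather than values from the disclosure.

```python
def handle_candidates(candidates, threshold: float = 0.6):
    """Decide whether to respond directly or ask the source user to choose.

    candidates: list of (text, confidence) pairs from the semantic recognizer.
    Returns ("respond", text) or ("ask_user", [texts to display as options]).
    """
    confident = [(t, c) for t, c in candidates if c >= threshold]
    if len(confident) == 1:
        return "respond", confident[0][0]
    if len(confident) > 1:
        return "ask_user", [t for t, _ in confident]
    return "ask_user", []  # nothing confident enough: ask the user to repeat


if __name__ == "__main__":
    cands = [("I need to reserve a meeting room for 15 people", 0.72),
             ("I need to reserve a meeting room for 45 people", 0.68),
             ("I need to reserve a meeting room for 50 people", 0.31)]
    print(handle_candidates(cands))
    # ('ask_user', ['I need to reserve a meeting room for 15 people',
    #               'I need to reserve a meeting room for 45 people'])
```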
FIG. 19 is a schematic block diagram of a device provided by an exemplary embodiment. Referring to fig. 19, at the hardware level the device includes a processor 1902, an internal bus 1904, a network interface 1906, a memory 1908, and a non-volatile memory 1910, and may of course also include hardware required for other services. The processor 1902 reads the corresponding computer program from the non-volatile memory 1910 into the memory 1908 and runs it, forming an interaction apparatus at the logical level. Of course, apart from software implementations, one or more embodiments of this specification do not exclude other implementations, such as logic devices or combinations of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units, but may also be hardware or logic devices.
Referring to fig. 20, in a software implementation, the interaction device may include:
a detection unit 2001 which detects a user in a sensing area;
a providing unit 2002 for providing interactive contents to users in the sensing area;
and a first display unit 2003, configured to display information of the target interactive object to the users in the sensing area when the target interactive object of the interactive content is a part of the users in the sensing area.
Optionally, the first display unit 2003 is specifically configured to:
determining identity information of a user serving as the target interaction object to serve as information of the target interaction object;
and displaying the identity information to the user in the sensing area.
Optionally,
further comprises: an identification unit 2004 for identifying the user in the sensing area;
wherein, when the identity of the first user as the target interaction object is successfully identified, the identity information of the first user comprises the name of the first user; when the identity of the second user as the target interaction object is not successfully identified, the identity information of the second user comprises visual characteristic description information for the second user.
Optionally,
further comprises: a second display unit 2005 for displaying user indication information corresponding to the user in the sensing area;
the first display unit 2003 is specifically configured to: determining user indication information corresponding to a user serving as the target interaction object to serve as information of the target interaction object; and highlighting the determined user indication information to the user in the sensing area.
Optionally, the second display unit 2005 is specifically configured to:
And acquiring images of users in the sensing area, and displaying the acquired user images as the user indication information.
Optionally, the method further comprises:
and a management unit 2006, which suspends displaying the information of the target interaction object to the users in the sensing area when the target interaction object of the interactive content changes from a part of the users in the sensing area to all of the users.
Optionally, the method further comprises:
a request acquisition unit 2007 for acquiring an event assistance request sent by a user in the sensing area;
and an assisting unit 2008, responding to the event assisting request, so as to assist in completing the corresponding event.
Optionally, the request acquiring unit 2007 is specifically configured to:
and receiving response information returned by the user in the sensing area for the interactive content, wherein the response information comprises the event assistance request.
Optionally, the request acquiring unit 2007 is specifically configured to:
when a plurality of users exist in the sensing area, selecting the users serving as assistance objects according to a preset sequence;
and sequentially sending prompts to the selected users so that the selected users send corresponding event assistance requests.
Optionally, the request acquiring unit 2007 is specifically configured to:
carrying out semantic recognition on the collected user voice to obtain the event assistance request;
and performing voice feature recognition on the user voice to determine a source user of the user voice.
Optionally, the assisting unit 2008 is specifically configured to:
when the user in the sensing area is a member in the community, responding to the event assistance request to assist in completing the corresponding community management event;
when the user in the sensing area is a group external person, responding to the event assistance request to send a reminding message to the associated group internal member, assisting the group external person to establish communication with the associated group internal member, or guiding the group external person to a processing place for accessing an event;
and when the user in the sensing area is an administrator, responding to the event assistance request to assist in completing the corresponding equipment management event.
Optionally, the sensing region includes a near field sensing region and a far field sensing region; the providing unit 2002 is specifically configured to:
providing interactive content to users in the near field sensing region;
And sending guiding information to a user in the far-field sensing area so as to guide the user to enter the near-field sensing area from the far-field sensing area.
Optionally,
further comprises: an event obtaining unit 2009, configured to obtain an associated event of a user in the sensing area;
the providing unit 2002 is specifically configured to: and when the interactive content is related to the related event, adjusting the interactive content provided for the user in the sensing area according to the state information of the related event.
Optionally,
further comprises: a determining unit 2010 for determining an identity type of the user in the sensing area;
the providing unit 2002 is specifically configured to: and adjusting the interaction content provided for the user in the sensing area according to the identity type.
Optionally, the method further comprises:
a voice receiving unit 2011, configured to receive a user voice sent by a user in the sensing area;
and a response unit 2012 that responds to the user voice.
Optionally, the response unit 2012 is specifically configured to:
carrying out semantic recognition on the user voice;
when a plurality of semantic recognition results with confidence coefficient reaching a preset value exist, displaying a plurality of corresponding semantic recognition result options for users in the sensing area;
Responding to the user voice according to the semantic recognition result corresponding to the selected semantic recognition result option.
Optionally, the response unit 2012 is specifically configured to:
determining a source direction of the user voice;
responding to a user located in the direction of the source of the user's voice.
Optionally, the response unit 2012 determines the source direction of the user voice by:
receiving the user voice through a microphone array, wherein the microphone array comprises a first microphone arranged relatively far to the left and a second microphone arranged relatively far to the right;
and determining the source direction of the user voice according to the receiving time difference of the first microphone and the second microphone to the user voice.
Optionally, the response unit 2012 responds to the user located in the source direction of the user's voice by:
when a plurality of users positioned in the source direction of the user voice exist, determining the source user of the user voice according to the facial action of each user in the plurality of users;
responding to the source user.
Optionally, the method further comprises:
an audio receiving unit 2013 that receives an audio message through a microphone array including a third microphone relatively close to the sensing area, a fourth microphone relatively far from the sensing area;
A direction determining unit 2014 configured to determine a source direction of the audio message according to the reception of the high frequency part in the audio message by the third microphone and the fourth microphone;
and a source determining unit 2015, configured to determine that the audio message is a user voice uttered by the user in the sensing area when the source direction is a side relatively close to the sensing area and the user is present in the sensing area.
Optionally, the method further comprises:
a head portrait display unit 2016 for displaying head portrait pictures corresponding to each user when a plurality of users exist in the sensing area;
and the distinguishing and displaying unit 2017 is used for distinguishing and displaying the head portrait pictures of the user from which the user voice is generated and the head portrait pictures of other users.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include a volatile memory in a computer-readable medium, such as a random-access memory (RAM), and/or a non-volatile memory, such as a read-only memory (ROM) or a flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit one or more embodiments of this specification. As used in one or more embodiments of this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of this specification to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of this specification. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."
The foregoing description of the preferred embodiments is merely intended to illustrate the embodiments of the present invention and is not intended to limit the embodiments of the present invention to the particular embodiments described.

Claims (40)

1. An interaction method, comprising:
detecting a user in a sensing area, and acquiring an associated event of the user in the sensing area;
providing interactive content to users in the sensing area, comprising: when the interactive content is related to the associated event, adjusting, according to state information of the associated event, the interactive content provided to the users in the sensing area, and when no associated event related to the interactive content exists, providing default interactive content; wherein, when the associated event comprises a leave approval event, adjusting the interactive content provided to the users in the sensing area according to the state information of the associated event comprises: if the current time period has reached the leave time period corresponding to the leave approval event and the current time period is a working time period, adjusting the interactive content provided to the users in the sensing area to be attendance content matching the associated event;
And when the target interactive object of the interactive content is a part of users in the sensing area, displaying the information of the target interactive object to the users in the sensing area.
2. The method of claim 1, wherein the presenting the information of the target interactive object to the user in the sensing area comprises:
determining identity information of a user serving as the target interaction object to serve as information of the target interaction object;
and displaying the identity information to the user in the sensing area.
3. The method of claim 2, wherein the step of determining the position of the substrate comprises,
further comprises: identifying the identity of the user in the sensing area;
wherein, when the identity of the first user as the target interaction object is successfully identified, the identity information of the first user comprises the name of the first user; when the identity of the second user as the target interaction object is not successfully identified, the identity information of the second user comprises visual characteristic description information for the second user.
4. The method of claim 1, wherein the step of determining the position of the substrate comprises,
further comprises: displaying user indication information corresponding to the user in the sensing area;
The displaying the information of the target interactive object to the user in the sensing area comprises the following steps: determining user indication information corresponding to a user serving as the target interaction object to serve as information of the target interaction object; and highlighting the determined user indication information to the user in the sensing area.
5. The method of claim 4, wherein the presenting the user-specific information corresponding to the user in the sensing area comprises:
and acquiring images of users in the sensing area, and displaying the acquired user images as the user indication information.
6. The method as recited in claim 1, further comprising:
and when the target interactive object of the interactive content changes from a part of the users in the sensing area to all of the users, suspending displaying the information of the target interactive object to the users in the sensing area.
7. The method as recited in claim 1, further comprising:
acquiring an event assistance request sent by a user in the sensing area;
and responding to the event assistance request to assist in completing the corresponding event.
8. The method of claim 7, wherein the obtaining the event assistance request from the user within the sensing area comprises:
And receiving response information returned by the user in the sensing area for the interactive content, wherein the response information comprises the event assistance request.
9. The method of claim 7, wherein the obtaining the event assistance request from the user within the sensing area comprises:
when a plurality of users exist in the sensing area, selecting the users serving as assistance objects according to a preset sequence;
and sequentially sending prompts to the selected users so that the selected users send corresponding event assistance requests.
10. The method of claim 7, wherein the obtaining the event assistance request from the user within the sensing area comprises:
carrying out semantic recognition on the collected user voice to obtain the event assistance request;
and performing voice feature recognition on the user voice to determine a source user of the user voice.
11. The method of claim 7, wherein said responding to the event assistance request to assist in completing the respective event comprises:
when the user in the sensing area is a member in the community, responding to the event assistance request to assist in completing the corresponding community management event;
When the user in the sensing area is a group external person, responding to the event assistance request to send a reminding message to the associated group internal member, assisting the group external person to establish communication with the associated group internal member, or guiding the group external person to a processing place for accessing an event;
and when the user in the sensing area is an administrator, responding to the event assistance request to assist in completing the corresponding equipment management event.
12. The method of claim 1, wherein the sensing region comprises a near field sensing region and a far field sensing region; the providing interactive content for the user in the sensing area comprises the following steps:
providing interactive content to users in the near field sensing region;
and sending guiding information to a user in the far-field sensing area so as to guide the user to enter the near-field sensing area from the far-field sensing area.
13. The method of claim 1, wherein the step of determining the position of the substrate comprises,
further comprises: determining the identity type of the user in the sensing area;
the providing interactive content for the user in the sensing area comprises the following steps: and adjusting the interaction content provided for the user in the sensing area according to the identity type.
14. The method as recited in claim 1, further comprising:
receiving user voice sent by a user in the sensing area;
responding to the user voice.
15. The method of claim 14, wherein said responding to said user speech comprises:
carrying out semantic recognition on the user voice;
when a plurality of semantic recognition results with confidence coefficient reaching a preset value exist, displaying a plurality of corresponding semantic recognition result options for users in the sensing area;
responding to the user voice according to the semantic recognition result corresponding to the selected semantic recognition result option.
16. The method of claim 14, wherein said responding to said user speech comprises:
determining a source direction of the user voice;
responding to a user located in the direction of the source of the user's voice.
17. The method of claim 16, wherein said determining the direction of origin of the user's speech comprises:
receiving the user voice through a microphone array, wherein the microphone array comprises a first microphone arranged relatively far to the left and a second microphone arranged relatively far to the right;
And determining the source direction of the user voice according to the receiving time difference of the first microphone and the second microphone to the user voice.
18. The method of claim 16, wherein said responding to the user being in the direction of the source of the user's voice comprises:
when a plurality of users positioned in the source direction of the user voice exist, determining the source user of the user voice according to the facial action of each user in the plurality of users;
responding to the source user.
19. The method as recited in claim 14, further comprising:
receiving an audio message through a microphone array, the microphone array including a third microphone relatively close to the sensing region, a fourth microphone relatively far from the sensing region;
determining the source direction of the audio message according to the receiving conditions of the third microphone and the fourth microphone on the high-frequency part in the audio message;
when the source direction is one side relatively close to the sensing area and a user exists in the sensing area, the audio message is judged to be user voice sent by the user in the sensing area.
20. The method as recited in claim 14, further comprising:
when a plurality of users exist in the sensing area, displaying head portrait pictures corresponding to the users respectively;
and distinguishing and displaying the head portrait pictures of the source user of the user voice from the head portrait pictures of other users.
21. An interactive apparatus, comprising:
a detection unit that detects a user in the sensing area;
the event acquisition unit acquires the associated event of the user in the sensing area;
a providing unit, configured to provide interactive content to users in the sensing area, comprising: when the interactive content is related to the associated event, adjusting, according to state information of the associated event, the interactive content provided to the users in the sensing area, and when no associated event related to the interactive content exists, providing default interactive content; wherein, when the associated event comprises a leave approval event, adjusting the interactive content provided to the users in the sensing area according to the state information of the associated event comprises: if the current time period has reached the leave time period corresponding to the leave approval event and the current time period is a working time period, adjusting the interactive content provided to the users in the sensing area to be attendance content matching the associated event;
and a first display unit, configured to display information of the target interaction object to the users in the sensing area when the target interaction object of the interactive content is a part of the users in the sensing area.
22. The apparatus according to claim 21, wherein the first display unit is specifically configured to:
determining identity information of a user serving as the target interaction object to serve as information of the target interaction object;
and displaying the identity information to the user in the sensing area.
23. The apparatus of claim 22, wherein the device comprises a plurality of sensors,
further comprises: the identification unit is used for identifying the identity of the user in the induction area;
wherein, when the identity of the first user as the target interaction object is successfully identified, the identity information of the first user comprises the name of the first user; when the identity of the second user as the target interaction object is not successfully identified, the identity information of the second user comprises visual characteristic description information for the second user.
24. The apparatus of claim 21, wherein the device comprises a plurality of sensors,
further comprises: the second display unit displays user indication information corresponding to the user in the sensing area;
The first display unit is specifically configured to: determining user indication information corresponding to a user serving as the target interaction object to serve as information of the target interaction object; and highlighting the determined user indication information to the user in the sensing area.
25. The device according to claim 24, wherein the second display unit is specifically configured to:
and acquiring images of users in the sensing area, and displaying the acquired user images as the user indication information.
26. The apparatus as recited in claim 21, further comprising:
and a management unit, configured to suspend displaying the information of the target interaction object to the users in the sensing area when the target interaction object of the interactive content changes from a part of the users in the sensing area to all of the users.
27. The apparatus as recited in claim 21, further comprising:
a request acquisition unit for acquiring an event assistance request sent by a user in the sensing area;
and the assisting unit responds to the event assisting request to assist in completing the corresponding event.
28. The apparatus according to claim 27, wherein the request acquisition unit is specifically configured to:
And receiving response information returned by the user in the sensing area for the interactive content, wherein the response information comprises the event assistance request.
29. The apparatus according to claim 27, wherein the request acquisition unit is specifically configured to:
when a plurality of users exist in the sensing area, selecting the users serving as assistance objects according to a preset sequence;
and sequentially sending prompts to the selected users so that the selected users send corresponding event assistance requests.
30. The apparatus according to claim 27, wherein the request acquisition unit is specifically configured to:
carrying out semantic recognition on the collected user voice to obtain the event assistance request;
and performing voice feature recognition on the user voice to determine a source user of the user voice.
31. The apparatus according to claim 27, wherein the assisting unit is specifically configured to:
when the user in the sensing area is a member in the community, responding to the event assistance request to assist in completing the corresponding community management event;
when the user in the sensing area is a group external person, responding to the event assistance request to send a reminding message to the associated group internal member, assisting the group external person to establish communication with the associated group internal member, or guiding the group external person to a processing place for accessing an event;
And when the user in the sensing area is an administrator, responding to the event assistance request to assist in completing the corresponding equipment management event.
32. The apparatus of claim 21, wherein the sensing region comprises a near field sensing region and a far field sensing region; the providing unit is specifically configured to:
providing interactive content to users in the near field sensing region;
and sending guiding information to a user in the far-field sensing area so as to guide the user to enter the near-field sensing area from the far-field sensing area.
33. The apparatus of claim 21, wherein the device comprises a plurality of sensors,
further comprises: a determining unit for determining the identity type of the user in the sensing area;
the providing unit is specifically configured to: and adjusting the interaction content provided for the user in the sensing area according to the identity type.
34. The apparatus as recited in claim 21, further comprising:
a voice receiving unit for receiving user voice sent by a user in the sensing area;
and the response unit is used for responding to the user voice.
35. The apparatus according to claim 34, wherein the response unit is specifically configured to:
Carrying out semantic recognition on the user voice;
when a plurality of semantic recognition results with confidence coefficient reaching a preset value exist, displaying a plurality of corresponding semantic recognition result options for users in the sensing area;
responding to the user voice according to the semantic recognition result corresponding to the selected semantic recognition result option.
36. The apparatus according to claim 34, wherein the response unit is specifically configured to:
determining a source direction of the user voice;
responding to a user located in the direction of the source of the user's voice.
37. The apparatus of claim 36, wherein the response unit determines the direction of origin of the user's voice by:
receiving the user voice through a microphone array, wherein the microphone array comprises a first microphone arranged relatively far to the left and a second microphone arranged relatively far to the right;
and determining the source direction of the user voice according to the receiving time difference of the first microphone and the second microphone to the user voice.
38. The apparatus according to claim 36, wherein the response unit responds to a user located in the source direction of the user voice by:
when there are a plurality of users located in the source direction of the user voice, determining the source user of the user voice according to the facial action of each of the plurality of users;
and responding to the source user.
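For illustration only, one plausible reading of the "facial action" cue in claim 38 is lip movement; the sketch below, which assumes per-user mouth-opening measurements from a separate face tracker, simply picks the user whose mouth moved most while the voice was captured.

```python
import numpy as np

def pick_source_user(mouth_openness_by_user):
    """Pick the likely speaker among several users in the same direction.

    `mouth_openness_by_user` maps a user id to a sequence of mouth-opening
    measurements (e.g. lip-landmark distances) sampled while the user voice
    was being captured; the measurements come from a face-tracking component
    that is not sketched here. The user whose mouth opening varied the most
    is taken as the source user.
    """
    return max(mouth_openness_by_user,
               key=lambda uid: float(np.std(mouth_openness_by_user[uid])))
```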
39. The apparatus as recited in claim 34, further comprising:
an audio receiving unit for receiving an audio message through a microphone array, wherein the microphone array comprises a third microphone relatively close to the sensing area and a fourth microphone relatively far from the sensing area;
a direction determining unit for determining the source direction of the audio message according to how the high-frequency part of the audio message is received by the third microphone and the fourth microphone;
and a source judging unit for judging that the audio message is user voice uttered by a user in the sensing area when the source direction is the side relatively close to the sensing area and a user is present in the sensing area.
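For illustration only, a minimal sketch of the high-frequency comparison described in claim 39, assuming raw samples from the two microphones and a 4 kHz boundary for the "high-frequency part"; the FFT-based energy measure and the threshold ratio are assumptions, not details from the patent.

```python
import numpy as np

HIGH_FREQ_CUTOFF_HZ = 4000.0  # assumed boundary of the "high-frequency part"

def high_band_energy(signal, sample_rate_hz, cutoff_hz=HIGH_FREQ_CUTOFF_HZ):
    """Energy of the signal above `cutoff_hz`, from an FFT magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    return float(np.sum(spectrum[freqs >= cutoff_hz] ** 2))

def is_voice_from_sensing_area(near_mic, far_mic, sample_rate_hz,
                               user_in_sensing_area, ratio_threshold=1.5):
    """Judge whether an audio message is user voice from the sensing-area side.

    High frequencies attenuate faster with distance and are shadowed more by
    the device body, so a clearly stronger high band at the microphone close
    to the sensing area suggests the source is on that side. The message is
    treated as user voice only if a user is also detected in the sensing area.
    """
    near_hf = high_band_energy(near_mic, sample_rate_hz)
    far_hf = high_band_energy(far_mic, sample_rate_hz)
    return user_in_sensing_area and near_hf > ratio_threshold * far_hf
```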
40. The apparatus as recited in claim 34, further comprising:
an avatar display unit for respectively displaying the avatar image corresponding to each user when a plurality of users are present in the sensing area;
and a distinguishing display unit for displaying the avatar image of the source user of the user voice in a manner distinguished from the avatar images of the other users.
CN201810806493.6A 2018-01-30 2018-07-20 Interaction method and device Active CN110096251B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2021525345A JP2021533510A (en) 2018-01-30 2019-06-10 Interaction method and equipment
PCT/CN2019/090504 WO2020015473A1 (en) 2018-01-30 2019-06-10 Interaction method and device
SG11202100352YA SG11202100352YA (en) 2018-01-30 2019-06-10 Interaction method and device
TW108120453A TW202008115A (en) 2018-01-30 2019-06-13 Interaction method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810089149 2018-01-30
CN201810089149X 2018-01-30

Publications (2)

Publication Number Publication Date
CN110096251A CN110096251A (en) 2019-08-06
CN110096251B true CN110096251B (en) 2024-02-27

Family

ID=67443561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810806493.6A Active CN110096251B (en) 2018-01-30 2018-07-20 Interaction method and device

Country Status (5)

Country Link
JP (1) JP2021533510A (en)
CN (1) CN110096251B (en)
SG (1) SG11202100352YA (en)
TW (1) TW202008115A (en)
WO (1) WO2020015473A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078010B (en) * 2019-12-06 2023-03-14 智语科技(江门)有限公司 Man-machine interaction method and device, terminal equipment and readable storage medium
CN111416871A (en) * 2020-03-27 2020-07-14 乌鲁木齐明华智能电子科技有限公司 Multi-party intelligent remote response mechanism method
CN111986678B (en) * 2020-09-03 2023-12-29 杭州蓦然认知科技有限公司 Voice acquisition method and device for multipath voice recognition
CN112767931A (en) * 2020-12-10 2021-05-07 广东美的白色家电技术创新中心有限公司 Voice interaction method and device
CN115101048B (en) * 2022-08-24 2022-11-11 深圳市人马互动科技有限公司 Science popularization information interaction method, device, system, interaction equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013178104A1 (en) * 2012-09-14 2013-12-05 中兴通讯股份有限公司 Method and device for changing contact person state through short message in mobile terminal
CN103500473A (en) * 2013-09-04 2014-01-08 苏州荣越网络技术有限公司 Mobile phone punch card system
CN105590128A (en) * 2016-03-01 2016-05-18 成都怡康科技有限公司 Intelligent card / intelligent hand ring for campus intelligent management evaluation
CN105706109A (en) * 2013-11-08 2016-06-22 微软技术许可有限责任公司 Correlated display of biometric identity, feedback and user interaction state
CN105872685A (en) * 2016-03-24 2016-08-17 深圳市国华识别科技开发有限公司 Intelligent terminal control method and system, and intelligent terminal
CN106357871A (en) * 2016-09-29 2017-01-25 维沃移动通信有限公司 Voice amplifying method and mobile terminal
CN106910259A (en) * 2017-03-03 2017-06-30 泸州市众信信息技术有限公司 It is a kind of can the Time Attendance Device checked card of multipath
CN107483493A (en) * 2017-09-18 2017-12-15 广东美的制冷设备有限公司 Interactive calendar prompting method, device, storage medium and intelligent domestic system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004312513A (en) * 2003-04-09 2004-11-04 Casio Comput Co Ltd Entrance management system and program
US8390680B2 (en) * 2009-07-09 2013-03-05 Microsoft Corporation Visual representation expression based on player expression
JP5857674B2 (en) * 2010-12-22 2016-02-10 株式会社リコー Image processing apparatus and image processing system
JP2013080015A (en) * 2011-09-30 2013-05-02 Toshiba Corp Speech recognition device and speech recognition method
WO2016157662A1 (en) * 2015-03-31 2016-10-06 ソニー株式会社 Information processing device, control method, and program
CN105856257B (en) * 2016-06-08 2018-09-11 以恒激光科技(北京)有限公司 Intelligent robot suitable for receptionist
CN106161155A (en) * 2016-06-30 2016-11-23 联想(北京)有限公司 A kind of information processing method and master terminal
CN107451544B (en) * 2017-07-14 2018-12-11 深圳云天励飞技术有限公司 Information display method, device, equipment and monitoring system
CN108037699B (en) * 2017-12-12 2020-04-07 深圳市天颐健康科技有限公司 Robot, control method of robot, and computer-readable storage medium

Also Published As

Publication number Publication date
CN110096251A (en) 2019-08-06
JP2021533510A (en) 2021-12-02
TW202008115A (en) 2020-02-16
WO2020015473A1 (en) 2020-01-23
SG11202100352YA (en) 2021-02-25

Similar Documents

Publication Publication Date Title
CN110096251B (en) Interaction method and device
CN110291489B (en) Computationally efficient human identification intelligent assistant computer
CN109446876B (en) Sign language information processing method and device, electronic equipment and readable storage medium
US11010601B2 (en) Intelligent assistant device communicating non-verbal cues
CN112075075A (en) Computerized intelligent assistant for meetings
US20120163677A1 (en) Automatic identifying
CN108762494B (en) Method, device and storage medium for displaying information
CN111163906B (en) Mobile electronic device and method of operating the same
KR20210088435A (en) Image processing method and apparatus, electronic device and storage medium
WO2021008538A1 (en) Voice interaction method and related device
CN107945806B (en) User identification method and device based on sound characteristics
US20160277707A1 (en) Message transmission system, message transmission method, and program for wearable terminal
US20120242860A1 (en) Arrangement and method relating to audio recognition
WO2017134300A1 (en) Method for assisting a hearing-impaired person in following a conversation
JP2001067098A (en) Person detecting method and device equipped with person detecting function
US20210264915A1 (en) Information processing apparatus, information processing system, information processing method, and information processing program
CN112446243A (en) Electronic device and emotion-based content pushing method
US11315544B2 (en) Cognitive modification of verbal communications from an interactive computing device
US11430429B2 (en) Information processing apparatus and information processing method
CN112673423A (en) In-vehicle voice interaction method and equipment
CN111919250A (en) Intelligent assistant device for conveying non-language prompt
EP3288035A2 (en) Personal audio lifestyle analytics and behavior modification feedback
JP7450748B2 (en) Information display device and information display method
CN116246635A (en) Voiceprint recognition method, voiceprint recognition device, voiceprint recognition equipment and storage medium
JP2023180936A (en) Authentication guide device and authentication guide method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40012129
Country of ref document: HK
GR01 Patent grant