CN112925235A - Sound source localization method, apparatus and computer-readable storage medium at the time of interaction - Google Patents

Sound source localization method, apparatus and computer-readable storage medium during interaction

Info

Publication number
CN112925235A
CN112925235A (application CN202110084524.3A)
Authority
CN
China
Prior art keywords
information
sound
sound source
robot
target sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110084524.3A
Other languages
Chinese (zh)
Inventor
李泽华
张涛
禤小兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Pudu Technology Co Ltd
Original Assignee
Shenzhen Pudu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Pudu Technology Co Ltd filed Critical Shenzhen Pudu Technology Co Ltd
Priority to CN202110084524.3A priority Critical patent/CN112925235A/en
Publication of CN112925235A publication Critical patent/CN112925235A/en
Priority to PCT/CN2022/072203 priority patent/WO2022156611A1/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00: Programme-control systems
    • G05B19/02: Programme-control systems electric
    • G05B19/04: Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G05B19/042: Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
    • G05B19/0423: Input/output
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00: Program-control systems
    • G05B2219/20: Pc systems
    • G05B2219/25: Pc structure of the system
    • G05B2219/25257: Microcontroller

Abstract

The application relates to the field of robots and provides a sound source localization method, a device, and a computer-readable storage medium for use during interaction, so that the direction of a sound source can be located accurately and the efficiency and experience of interaction are improved. The method comprises the following steps: a pickup array picks up target sound information around a robot; azimuth information of the sound source of the target sound information is determined from the target sound information picked up by the pickup array; and the azimuth information of the sound source of the target sound information is sent to a servo mechanism of the robot, so that the servo mechanism rotates the anthropomorphic head of the robot to face the sound source emitting the target sound information. With this technical scheme, the servo mechanism rotates the anthropomorphic head of the robot accurately toward the sound source emitting the target sound information, which improves the efficiency of interaction between the robot and the user and can improve the user's experience.

Description

Sound source localization method, apparatus and computer-readable storage medium during interaction
Technical Field
The present invention relates to the field of robots, and in particular, to a method, an apparatus, and a computer-readable storage medium for positioning a sound source during interaction.
Background
The rapid development of robotics has enabled robots to be deployed widely in a variety of scenarios. Robots in these application scenarios are expected to interact with users; in particular, for robots in certain scenarios, such as those accompanying the elderly or people living alone, whether the robot can interact well with the user is an important criterion for providing a good experience.
One way for a user to interact with a robot is voice interaction, for example, waking the robot and then conversing with it. As in human-to-human interaction, the robot facing the user who makes a sound (whether an instruction or an emotional conversation) is a premise of a good interaction experience; since the robot does not face the user at all times, localizing the sound source is particularly important in this interaction mode.
However, with existing sound source localization methods during interaction, the robot cannot accurately locate the direction of the sound source, so it cannot directly face the user when the user speaks, which degrades the user's experience.
Disclosure of Invention
The application provides a sound source localization method, a device, and a computer-readable storage medium for use during interaction, so that the direction of a sound source can be located accurately and the efficiency and experience of interaction are improved.
In one aspect, the present application provides a sound source localization method during interaction, including:
the sound pick-up array picks up target sound information around the robot;
determining azimuth information of a sound source of the target sound information according to the target sound information picked up by the sound pick-up array;
and sending the azimuth information of the sound source of the target sound information to a servo mechanism of the robot, so that the servo mechanism rotates the anthropomorphic head of the robot to face the sound source emitting the target sound information.
In another aspect, the present application provides an interactive sound source localization apparatus, including:
the sound pick-up array module is used for picking up target sound information around the robot through the sound pick-up array;
the azimuth information determining module is used for determining azimuth information of a sound source of the target sound information according to the target sound information picked up by the sound pick-up array;
and the driving module is used for sending the azimuth information of the sound source of the target sound information to the servo mechanism of the robot, so that the servo mechanism rotates the anthropomorphic head of the robot to face the sound source emitting the target sound information.
In a third aspect, the present application provides a device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the sound source localization method during interaction described above.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the sound source localization method during interaction described above.
According to the technical scheme above, because the target sound information around the robot is picked up by a pickup array, sound can be picked up from multiple directions, so the azimuth information of the sound source determined from the picked-up target sound information is relatively accurate. The servo mechanism can therefore rotate the anthropomorphic head of the robot to face the sound source emitting the target sound information accurately, which improves the efficiency of interaction between the robot and the user and can improve the user's experience.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a sound source localization method during interaction according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an interactive sound source localization apparatus according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an interactive sound source localization apparatus according to another embodiment of the present application;
Fig. 4 is a schematic structural diagram of a device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present application.
In this specification, adjectives such as "first" and "second" are used only to distinguish one element or action from another and do not necessarily require or imply any actual relationship or order between them. Where the context permits, a reference to an element, component, or step should be construed as covering one or more of that element, component, or step.
In this specification, for convenience of description, the sizes of the parts shown in the drawings are not drawn to scale.
The application provides a sound source localization method during interaction that can be applied to a robot, for example a robot working in a restaurant such as a dish-delivery robot, a medicine-delivery robot working in a medical facility such as a hospital, or a household robot such as an emotional-companionship robot for the elderly or people living alone. As shown in Fig. 1, the method mainly comprises steps S101 to S103, detailed as follows:
step S101: the microphone array picks up target sound information around the robot.
The pickup array is a combination of several pickups arranged according to a certain rule, each pickup in the array being able to acquire a sound signal or sound information independently. In the embodiment of the present application, the pickup array may be an array of four to six pickups, which may be arranged at positions such as the front, rear, left, right, top, and bottom of the robot's head; a common sound-signal acquisition device such as a microphone (including its internal audio amplification circuit) may serve as each pickup.
In the embodiment of the present application, the target sound information may be voice content with a specific meaning, including a wake-up word for waking the robot, the robot's nickname, and common phrases used when interacting with the robot, such as "Xiao Ai, please wake up", "Xiao Ai, please turn your head", or "Xiao Ai, please bring me the water cup". Since the user may be in any direction relative to the robot, the pickup array needs to pick up the sound around the robot in order to capture the target sound information. Specific voice information necessarily contains specific acoustic features, such as loudness, pitch, frequency, timbre, and even the voiceprint of a specific person. Therefore, as an embodiment of the present application, picking up the target sound information around the robot with the pickup array may be implemented as follows: extract acoustic features from the surrounding sound picked up by the pickup array to obtain sound source information containing those features, compare the acoustic features of the sound source information with pre-stored acoustic features, and determine the sound source information to be target sound information if they match. In this embodiment, when acoustic feature extraction is performed on the surrounding sound, interference-removal processing, including eliminating or reducing noise interference, may also be applied to the surrounding sound.
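The feature-matching step above can be sketched minimally in Python. The cosine-similarity measure and the 0.9 threshold are illustrative assumptions, not details taken from the patent:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length acoustic feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def is_target_sound(features, stored_features, threshold=0.9):
    """Treat picked-up sound as target sound information when its acoustic
    features match a pre-stored template closely enough (hypothetical rule)."""
    return cosine_similarity(features, stored_features) >= threshold
```

In practice the feature vector would come from an acoustic front end (loudness, pitch, timbre, voiceprint embeddings); here it is simply a list of numbers.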
Specifically, one method of eliminating or reducing noise interference may be: determine the volume of the ambient sound collected by each pickup in the array; find the sound signals whose volume difference is smaller than a predetermined difference threshold; and among those, determine as noise the signals whose frequency is higher than a first frequency threshold or lower than a second frequency threshold, and/or whose duration is longer than a first duration or shorter than a second duration. Finding the sound signals whose volume difference is smaller than the predetermined threshold may be done as follows: for a group of pickups, take any one pickup as the primary pickup and the others as secondary pickups; for the sound signals collected by the group in the same frequency band, compute the average volume of the signals collected by the secondary pickups and the difference between that average and the volume of the signal collected by the primary pickup; when this difference is smaller than the predetermined difference threshold, the corresponding signal is determined to be a sound signal whose volume difference is smaller than the threshold.
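The noise-screening procedure above can be sketched as follows. All parameter values are hypothetical placeholders; the patent does not specify the thresholds:

```python
def volume_difference_small(primary_vol, secondary_vols, diff_threshold):
    """True when the primary pickup's volume is within diff_threshold of the
    average volume measured by the secondary pickups (same frequency band)."""
    avg = sum(secondary_vols) / len(secondary_vols)
    return abs(avg - primary_vol) < diff_threshold

def is_noise(freq_hz, duration_s, f_hi, f_lo, t_long, t_short):
    """Flag a candidate signal as noise when its frequency lies outside
    [f_lo, f_hi] or its duration lies outside [t_short, t_long]."""
    return (freq_hz > f_hi or freq_hz < f_lo
            or duration_s > t_long or duration_s < t_short)
```

For example, with speech-like bounds (85 to 8000 Hz, 0.2 to 5 s), a 50 Hz hum would be flagged while a 440 Hz one-second tone would pass.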
Step S102: and determining the azimuth information of the sound source of the target sound information according to the target sound information picked up by the sound pick-up array.
The acoustic feature extraction described above includes extracting the sound intensity of the surrounding sound picked up by the pickup array, so the target sound information picked up by the array also carries its sound intensity. As an embodiment of the present application, determining the azimuth information of the sound source from the target sound information picked up by the pickup array may be: calculate the time at which the target sound information reaches each pickup in the array, determine the delay times with which the array collects the target sound information, and determine the azimuth information of the sound source from these delay times and the sound intensity of the target sound information. This localization principle imitates the way the human auditory system tells direction by listening, combined with geometric knowledge, and is not described again here.
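The delay-based principle can be illustrated with the classic two-microphone time-difference-of-arrival relation, sin(theta) = c * dt / d. This is general acoustics used as a sketch; the patent does not state this exact formula:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def tdoa_angle(dt, mic_spacing):
    """Angle of arrival (radians, measured from broadside) for a two-pickup
    pair, given the time-difference-of-arrival dt (s) and spacing (m)."""
    s = SPEED_OF_SOUND * dt / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp against measurement noise
    return math.asin(s)
```

A delay of zero means the source lies broadside to the pair; a delay of mic_spacing / c means it lies along the pair's axis.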
In another embodiment of the present application, determining the azimuth information of the sound source of the target sound information from the target sound information picked up by the pickup array may be: divide the space where the robot is located into several spatial domains; identify the first pickup and the second pickup to receive the target sound information, in order to determine the spatial domain in which the sound source lies; and calculate the azimuth information of the sound source from the first and second times at which the target sound information was received and from the spatial domain in which the sound source lies. To identify the first and second pickups to receive the target sound information, a judgment threshold may be set, namely the average sound intensity of a segment of speech extracted from the target sound information at an earlier stage; if the sound intensities of the target sound information received successively by two pickups in the array are both higher than this average, those two pickups are determined to be the first and second pickups to receive the target sound information.
The spatial domain in which the sound source lies is in fact determined by the angle between a pair of pickups. Therefore, calculating the azimuth information of the sound source from the first and second reception times and the spatial domain may be done as follows: take any two pickups as a group; for each group, calculate the azimuth angle beta of the sound source relative to the two pickups in the group; estimate the distance D between the sound source and the group from the azimuth angle beta; determine from the distance D the position of a hypothetical sound source in the space where the real sound source lies; orthogonally decompose, at that position, the hypothetical sound sources determined by the groups; and calculate the horizontal angle and elevation angle of the hypothetical sound source, thereby locating the azimuth information of the sound source of the target sound information.
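The orthogonal decomposition into a horizontal angle and an elevation angle can be sketched for a hypothetical source position (x, y, z) relative to the array origin. The coordinate convention (x forward, y left, z up) is an assumption for illustration:

```python
import math

def source_angles(x, y, z):
    """Horizontal (azimuth) angle and elevation angle, in radians, of a
    hypothetical source position relative to the array origin."""
    horizontal = math.atan2(y, x)                 # angle in the ground plane
    elevation = math.atan2(z, math.hypot(x, y))   # angle above the plane
    return horizontal, elevation
```

A source one meter forward and one meter to the left sits at a 45 degree horizontal angle with zero elevation; a source directly overhead sits at 90 degrees elevation.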
In another embodiment of the present application, the method may further comprise: capture images in at least one sound source direction of the target sound information picked up by the pickup array, and recognize morphological features of the sound-emitting part from the captured images. Since every pickup in the array can pick up the target sound information, the picked-up target sound information includes several candidate sound source directions; images are captured in at least one of these directions, and morphological features of the sound-emitting part in the captured images, such as the mouth shape while speaking, are recognized.
Combining the above technique of capturing images in at least one sound source direction and recognizing morphological features of the sound-emitting part, in an embodiment of the present application, determining the azimuth information of the sound source from the target sound information picked up by the pickup array may be: select the final azimuth information of the sound source from the at least one candidate sound source direction according to the degree of matching between the morphological features of the sound-emitting part in the captured images and the target sound information. Specifically, this may be done as follows: obtain a predicted azimuth probability value for each of the candidate sound source directions; determine a sound source azimuth score for each direction from its predicted azimuth probability value and the degree of matching between the morphological features of the sound-emitting part and the target sound information, the score representing the probability that the direction is the final azimuth of the sound source; and select the direction with the largest score as the final azimuth information of the sound source of the target sound information. In this embodiment, the sound source direction is located by combining auditory and visual methods, that is, the morphological features of the sound-emitting part in the image assist the localization, so the accuracy of sound source localization can be improved compared with determining the direction from sound alone.
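The probability-weighted selection described above can be sketched as follows. Combining the audio probability and the visual match degree by multiplication is one plausible reading of the scheme, not the patent's stated formula:

```python
def select_final_bearing(candidates):
    """candidates: list of (bearing_deg, predicted_prob, visual_match) tuples.
    Score each candidate direction as predicted azimuth probability times
    the mouth-shape match degree, then return the best-scoring bearing."""
    return max(candidates, key=lambda c: c[1] * c[2])[0]
```

For instance, a direction with a weaker audio probability but a strong lip-shape match can win over a direction the audio alone preferred.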
Step S103: sending the azimuth information of the sound source of the target sound information to a servo mechanism of the robot, so that the servo mechanism rotates the anthropomorphic head of the robot to face the sound source emitting the target sound information.
In the embodiment of the present application, the servo mechanism is a system that achieves position, velocity, or acceleration control of a mechanical system in a closed-loop manner, and generally comprises a controlled body, an actuator, a sensor, and a controller, where the controller is connected to a central processing unit in the robot. Once the azimuth information of the sound source of the target sound information has been determined in step S102, the central processing unit sends it to the controller of the servo mechanism, and the controller drives an actuator (usually a motor) to rotate the anthropomorphic head of the robot to face the sound source emitting the target sound information.
Specifically, a positioning sensor (e.g., a gyroscope) may be arranged on the robot to sense the direction the robot currently faces. The servo mechanism calculates the direction and angle through which the robot needs to rotate from the azimuth information of the sound source and the direction currently faced, then rotates the anthropomorphic head by the corresponding angle in that direction, so that the robot finally faces the sound source emitting the target sound information. It can be understood that, since the azimuth information determined above is three-dimensional, the rotation of the anthropomorphic head falls into two cases: the sound source is at the same height as the anthropomorphic head, or it is not. Correspondingly, sending the azimuth information of the sound source to the servo mechanism so that it rotates the anthropomorphic head to face the sound source means the following. When the sound source is at the same height as the anthropomorphic head, the first plane angle of the sound source relative to the anthropomorphic head is sent to the servo mechanism, so that the servo mechanism rotates the head left or right by the first plane angle to face the sound source. When the sound source is not at the same height as the anthropomorphic head, the pitch angle of the head relative to the sound source, or both that pitch angle and the second plane angle of the sound source relative to the head, is sent to the servo mechanism, so that the servo mechanism rotates the head up or down by the pitch angle and then, if necessary, left or right by the second plane angle, until it faces the sound source emitting the target sound information. In the above embodiment, the first or second plane angle is the angle between the sound source and the anthropomorphic head when they lie in the same plane, and the pitch angle includes the downward and upward viewing angles of the head relative to the sound source.
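A sketch of the rotation computation the servo controller might perform. The degree-based convention and the horizontal-distance parameter are assumptions for illustration, not values the patent specifies:

```python
import math

def rotation_command(source_azimuth_deg, current_heading_deg,
                     source_height_m, head_height_m, horizontal_dist_m):
    """Return (yaw, pitch) in degrees. Yaw is the plane angle between the
    source azimuth and the heading the gyroscope currently reports,
    normalized to [-180, 180); pitch is added only when the source is not
    level with the anthropomorphic head."""
    yaw = (source_azimuth_deg - current_heading_deg + 180.0) % 360.0 - 180.0
    dz = source_height_m - head_height_m
    if abs(dz) > 1e-6:
        pitch = math.degrees(math.atan2(dz, horizontal_dist_m))
    else:
        pitch = 0.0  # source level with the head: plane rotation only
    return yaw, pitch
```

Normalizing the yaw keeps the head turning through the shorter arc, e.g. a source at 270 degrees with the robot facing 0 degrees yields a 90 degree turn to the left rather than 270 degrees to the right.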
It should be further noted that, whether the azimuth information of the sound source is determined from the target sound information picked up by the pickup array alone or by combining captured images with that information, the rotation of the anthropomorphic head toward the sound source is not strictly real-time, because there is a time difference (albeit a short one) between determining the azimuth information of the sound source and rotating the head to face it. To enhance real-time performance, in an embodiment of the present application, after the azimuth information of the sound source has been determined, the sound source emitting the target sound information may be tracked continuously with an image recognition algorithm: use human-figure recognition to detect whether a human face exists within the sound-emitting range of the source, use face recognition to detect whether a user whose facial features match a pre-stored face template exists within that range, or use lip-motion detection to detect whether a user with moving lips exists within that range; if so, lock onto the sound source emitting the target sound information and track that user continuously.
As can be seen from the sound source localization method illustrated in Fig. 1, because the target sound information around the robot is picked up by a pickup array, sound can be picked up from multiple directions, and the azimuth information of the sound source determined from the picked-up target sound information is relatively accurate. The servo mechanism can therefore rotate the anthropomorphic head of the robot to face the sound source emitting the target sound information accurately, which improves the efficiency of interaction between the robot and the user and also improves the user's experience.
Referring to Fig. 2, an interactive sound source localization apparatus provided by an embodiment of the present application may include a pickup array module 201, an azimuth information determining module 202, and a driving module 203, detailed as follows:
a pickup array module 201, used for picking up target sound information around the robot through the pickup array;
an azimuth information determining module 202, used for determining the azimuth information of the sound source of the target sound information from the target sound information picked up by the pickup array;
and a driving module 203, used for sending the azimuth information of the sound source of the target sound information to the servo mechanism of the robot, so that the servo mechanism rotates the anthropomorphic head of the robot to face the sound source emitting the target sound information.
Optionally, the pickup array module 201 illustrated in Fig. 2 may include a feature extraction unit and a target information determination unit, wherein:
the feature extraction unit is used for extracting acoustic features from the surrounding sound picked up by the pickup array to obtain sound source information containing the acoustic features;
and the target information determination unit is used for comparing the acoustic features of the sound source information with pre-stored acoustic features and, if they match, determining the sound source information to be the target sound information.
Optionally, in the apparatus illustrated in Fig. 2, the acoustic features include the sound intensity of the sound source information, and the azimuth information determining module 202 may include a calculating unit and a first azimuth information determining unit, wherein:
the calculating unit is used for calculating the time at which the target sound information reaches each pickup in the pickup array and determining the delay times with which the array collects the target sound information;
and the first azimuth information determining unit is used for determining the azimuth information of the sound source of the target sound information from the delay times and the sound intensity of the target sound information.
Optionally, the azimuth information determining module 202 illustrated in fig. 2 may include a spatial domain dividing unit, a judging unit, and a second azimuth information determining unit, wherein:
the spatial domain dividing unit is configured to divide the space in which the robot is located into a plurality of spatial domains;
the judging unit is configured to determine the first sound pickup and the second sound pickup to receive the target sound information, so as to determine the spatial domain in which the sound source of the target sound information is located;
and the second azimuth information determining unit is configured to calculate the azimuth information of the sound source of the target sound information according to the first and second times at which the target sound information is received and the spatial domain in which the sound source is located.
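The interpolation rule below is only one hypothetical way to realize this unit: assuming each sound pickup has a known bearing and the source lies in the spatial domain bounded by the first and second pickups to receive the sound, the two arrival times can be interpolated within that sector:

```python
def azimuth_in_sector(bearing_first_deg, bearing_second_deg,
                      t_first, t_second, max_delay_s):
    # bearing_first_deg / bearing_second_deg: bearings of the first and
    # second pickups to receive the sound, ordered along the shorter arc.
    # Equal arrival times put the source midway between the two pickups;
    # the maximum possible delay puts it on the first pickup's bearing.
    dt = min(max(t_second - t_first, 0.0), max_delay_s)
    w = dt / max_delay_s  # 0.0 -> midpoint, 1.0 -> first pickup's bearing
    mid = 0.5 * (bearing_first_deg + bearing_second_deg)
    return mid + w * (bearing_first_deg - mid)
```

max_delay_s would be the pickup spacing divided by the speed of sound; the names and the linear interpolation are assumptions for illustration.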
Optionally, the apparatus illustrated in fig. 2 may further include an image acquisition module 301 and a recognition module 302, as in the interactive sound source localization apparatus illustrated in fig. 3, wherein:
the image acquisition module 301 is configured to acquire images in at least one sound source azimuth of the target sound information picked up by the sound pickup array;
and the recognition module 302 is configured to recognize morphological features of the sound-emitting part from the images acquired by the image acquisition module 301.
Optionally, the azimuth information determining module 202 illustrated in fig. 3 may include a third azimuth information determining unit, configured to determine the final azimuth information of the sound source of the target sound information from the at least one sound source azimuth of the target sound information picked up by the sound pickup array, according to the degree of matching between the morphological features of the sound-emitting part and the target sound information.
Optionally, the third azimuth information determining unit may include a predicted azimuth probability value acquisition unit, a sound source azimuth value determining unit, and a selecting unit, wherein:
the predicted azimuth probability value acquisition unit is configured to acquire a predicted azimuth probability value for each of the at least one sound source azimuth;
the sound source azimuth value determining unit is configured to determine a sound source azimuth value corresponding to each sound source azimuth according to the predicted azimuth probability value and the degree of matching, where the sound source azimuth value represents the probability that the corresponding sound source azimuth is the final azimuth information of the target sound information;
and the selecting unit is configured to select the sound source azimuth corresponding to the maximum sound source azimuth value as the final azimuth information of the sound source of the target sound information.
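A minimal sketch of this selection logic, assuming the sound source azimuth value is simply the product of the predicted azimuth probability value and the matching degree (the embodiment does not fix the combination rule):

```python
def select_final_bearing(candidates):
    # candidates: list of (azimuth_deg, predicted_probability, match_degree).
    # The sound source azimuth value fuses the array's predicted azimuth
    # probability with how well the imaged sound-emitting part matches the
    # target sound; the azimuth with the maximum value is selected.
    def azimuth_value(c):
        _, prob, match = c
        return prob * match
    return max(candidates, key=azimuth_value)[0]
```

For example, a direction with a lower acoustic probability can still win if the camera sees a strongly matching sound-emitting part there.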
Optionally, the driving module 203 illustrated in fig. 2 may include a first rotating unit and a second rotating unit, wherein:
the first rotating unit is configured to, when the sound source of the target sound information is at the same height as the anthropomorphic head of the robot, send a first plane included angle of the sound source relative to the anthropomorphic head to the servo mechanism of the robot, so that the servo mechanism rotates the anthropomorphic head leftward or rightward by the first plane included angle until it faces the sound source that emitted the target sound information;
and the second rotating unit is configured to, when the sound source of the target sound information and the anthropomorphic head of the robot are not at the same height, send to the servo mechanism either the pitch angle of the anthropomorphic head relative to the sound source, or both that pitch angle and a second plane included angle of the sound source relative to the anthropomorphic head, so that the servo mechanism rotates the anthropomorphic head up or down by the pitch angle and, where the second plane included angle is also sent, then rotates it leftward or rightward by the second plane included angle, until it faces the sound source that emitted the target sound information.
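Assuming the sound source and the anthropomorphic head are given as 3-D coordinates (an assumption for illustration; the embodiment speaks only of angles), the plane included angle and the pitch angle could be computed as:

```python
import math

def head_rotation(source_xyz, head_xyz):
    # Hypothetical helper: returns (plane_angle_deg, pitch_deg).
    # The servo first rotates the head left/right by the plane included
    # angle, then up/down by the pitch angle, to face the sound source.
    dx = source_xyz[0] - head_xyz[0]
    dy = source_xyz[1] - head_xyz[1]
    dz = source_xyz[2] - head_xyz[2]
    plane_deg = math.degrees(math.atan2(dy, dx))                  # left/right
    pitch_deg = math.degrees(math.atan2(dz, math.hypot(dx, dy)))  # up/down
    return plane_deg, pitch_deg
```

When the source is at head height (dz = 0) the pitch angle is zero and only the plane included angle is needed, matching the first rotating unit above.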
It can be seen from the above technical solution that, because the sound pickup array can pick up the target sound information around the robot from multiple directions, the azimuth information of the sound source determined from the target sound information picked up by the sound pickup array is relatively accurate. The servo mechanism can therefore rotate the anthropomorphic head of the robot to face precisely the sound source that emitted the target sound information, which improves the efficiency of interaction between the robot and the user and improves the user experience.
Fig. 4 is a schematic structural diagram of an apparatus provided in an embodiment of the present application. As shown in fig. 4, the apparatus 4 of this embodiment mainly includes: a processor 40, a memory 41 and a computer program 42 stored in the memory 41 and executable on the processor 40, for example a program of a sound source localization method when interacting. The steps in the above-described interactive sound source localization method embodiment, such as steps S101 to S103 shown in fig. 1, are implemented when the processor 40 executes the computer program 42. Alternatively, the processor 40, when executing the computer program 42, implements the functions of the modules/units in the above-described apparatus embodiments, such as the functions of the sound pickup array module 201, the orientation information determination module 202, and the drive module 203 shown in fig. 2.
Illustratively, the computer program 42 of the sound source localization method at the time of interaction mainly comprises: the sound pick-up array picks up target sound information around the robot; determining azimuth information of a sound source of the target sound information according to the target sound information picked up by the sound pick-up array; and transmitting the azimuth information of the sound source of the target sound information to a servo mechanism of the robot so that the servo mechanism rotates the anthropomorphic head of the robot to be opposite to the sound source emitting the target sound information. The computer program 42 may be partitioned into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to accomplish the present application. One or more of the modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 42 in the device 4. For example, the computer program 42 may be divided into functions of the microphone array module 201, the orientation information determination module 202, and the drive module 203 (modules in a virtual device), and the specific functions of each module are as follows: a sound pickup array module 201 for picking up target sound information around the robot; an azimuth information determining module 202, configured to determine azimuth information of a sound source of target sound information according to the target sound information picked up by the sound pickup array 201; and the driving module 203 is used for sending the azimuth information of the sound source of the target sound information to the servo mechanism of the robot so that the servo mechanism rotates the anthropomorphic head of the robot to be opposite to the sound source emitting the target sound information.
The device 4 may include, but is not limited to, a processor 40 and a memory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of the device 4 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine certain components, or use different components; for example, a computing device may also include input/output devices, network access devices, buses, etc.
The Processor 40 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may be an internal storage unit of the device 4, such as a hard disk or a memory of the device 4. The memory 41 may also be an external storage device of the device 4, such as a plug-in hard disk provided on the device 4, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 41 may also include both an internal storage unit of the device 4 and an external storage device. The memory 41 is used for storing computer programs and other programs and data required by the device. The memory 41 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned functions may be distributed as required to different functional units and modules, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the above-mentioned apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/device and method may be implemented in other ways. For example, the apparatus/device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and in actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated modules/units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-transitory computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes in the method of the above embodiments by instructing the relevant hardware through a computer program. The computer program of the sound source localization method during interaction may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments, namely: picking up, by the sound pickup array, target sound information around the robot; determining azimuth information of the sound source of the target sound information according to the target sound information picked up by the sound pickup array; and sending the azimuth information of the sound source of the target sound information to the servo mechanism of the robot, so that the servo mechanism rotates the anthropomorphic head of the robot to face the sound source that emitted the target sound information. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The non-transitory computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
It should be noted that the content contained in the non-transitory computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, non-transitory computer-readable media do not include electrical carrier signals and telecommunications signals. The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (11)

1. A sound source localization method during interaction, the method comprising:
the sound pick-up array picks up target sound information around the robot;
determining azimuth information of a sound source of the target sound information according to the target sound information picked up by the sound pick-up array;
and sending the azimuth information of the sound source of the target sound information to a servo mechanism of the robot, so that the servo mechanism rotates the anthropomorphic head of the robot to face the sound source emitting the target sound information.
2. The interactive sound source localization method according to claim 1, wherein the picking up, by the sound pickup array, of target sound information around the robot comprises:
extracting acoustic features from the surrounding sound picked up by the sound pickup array to obtain sound source information containing the acoustic features;
and comparing the acoustic features of the sound source information with pre-stored acoustic features, and if they match, determining that the sound source information is the target sound information.
3. The interactive sound source localization method according to claim 2, wherein the acoustic features include a sound intensity of the sound source information, and the determining of the azimuth information of the sound source of the target sound information based on the target sound information picked up by the sound pickup array comprises:
calculating the time of the target sound information reaching each sound pick-up in the sound pick-up array, and determining the delay time of the sound pick-up array for collecting the target sound information;
and determining the azimuth information of the sound source of the target sound information according to the delay time and the sound intensity.
4. The interactive sound source localization method according to claim 2, wherein the determining of the azimuth information of the sound source of the target sound information based on the target sound information picked up by the sound pickup array comprises:
dividing the space in which the robot is located into a plurality of spatial domains;
determining the first sound pickup and the second sound pickup to receive the target sound information, so as to determine the spatial domain in which the sound source of the target sound information is located;
and calculating the azimuth information of the sound source of the target sound information according to the first and second times at which the target sound information is received and the spatial domain in which the sound source is located.
5. The sound source localization method during interaction according to claim 1, further comprising:
acquiring images in at least one sound source azimuth of the target sound information picked up by the sound pickup array;
and recognizing, from the acquired images, morphological features of the sound-emitting part.
6. The interactive sound source localization method according to claim 5, wherein the determining of the azimuth information of the sound source of the target sound information based on the target sound information picked up by the sound pickup array comprises: determining final azimuth information of the sound source of the target sound information from the at least one sound source azimuth of the target sound information picked up by the sound pickup array, according to the degree of matching between the morphological features and the target sound information.
7. The interactive sound source localization method according to claim 6, wherein the determining of final azimuth information of the sound source of the target sound information from the at least one sound source azimuth of the target sound information picked up by the sound pickup array, based on the degree of matching between the morphological features and the target sound information, comprises:
obtaining a predicted azimuth probability value for each of the at least one sound source azimuth;
determining a sound source azimuth value corresponding to each sound source azimuth according to the predicted azimuth probability value and the matching degree, wherein the sound source azimuth value represents the probability that the corresponding sound source azimuth is the final azimuth information of the target sound information;
and selecting the sound source azimuth corresponding to the maximum sound source azimuth value as the final azimuth information of the sound source of the target sound information.
8. The sound source localization method during interaction according to claim 1, wherein the sending of the azimuth information of the sound source of the target sound information to the servo mechanism of the robot, so that the servo mechanism rotates the anthropomorphic head of the robot to face the sound source emitting the target sound information, comprises:
when the sound source of the target sound information is at the same height as the anthropomorphic head of the robot, sending a first plane included angle of the sound source relative to the anthropomorphic head to the servo mechanism of the robot, so that the servo mechanism rotates the anthropomorphic head leftward or rightward by the first plane included angle until it faces the sound source emitting the target sound information;
and when the sound source of the target sound information is not at the same height as the anthropomorphic head of the robot, sending to the servo mechanism either the pitch angle of the anthropomorphic head relative to the sound source, or both that pitch angle and a second plane included angle of the sound source relative to the anthropomorphic head, so that the servo mechanism rotates the anthropomorphic head up or down by the pitch angle and, where the second plane included angle is also sent, then rotates it leftward or rightward by the second plane included angle, until it faces the sound source emitting the target sound information.
9. An apparatus for sound source localization when interacting, the apparatus comprising:
the sound pick-up array module is used for picking up target sound information around the robot through the sound pick-up array;
the azimuth information determining module is used for determining azimuth information of a sound source of the target sound information according to the target sound information picked up by the sound pick-up array;
and the driving module is used for sending the azimuth information of the sound source of the target sound information to the servo mechanism of the robot so as to enable the servo mechanism to rotate the anthropomorphic head of the robot to be over against the sound source emitting the target sound information.
10. An apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202110084524.3A 2021-01-21 2021-01-21 Sound source localization method, apparatus and computer-readable storage medium at the time of interaction Pending CN112925235A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110084524.3A CN112925235A (en) 2021-01-21 2021-01-21 Sound source localization method, apparatus and computer-readable storage medium at the time of interaction
PCT/CN2022/072203 WO2022156611A1 (en) 2021-01-21 2022-01-15 Sound source positioning method and device during interaction, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110084524.3A CN112925235A (en) 2021-01-21 2021-01-21 Sound source localization method, apparatus and computer-readable storage medium at the time of interaction

Publications (1)

Publication Number Publication Date
CN112925235A true CN112925235A (en) 2021-06-08

Family

ID=76164312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110084524.3A Pending CN112925235A (en) 2021-01-21 2021-01-21 Sound source localization method, apparatus and computer-readable storage medium at the time of interaction

Country Status (2)

Country Link
CN (1) CN112925235A (en)
WO (1) WO2022156611A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114516061A (en) * 2022-02-25 2022-05-20 杭州萤石软件有限公司 Robot control method, robot system and robot
WO2022156611A1 (en) * 2021-01-21 2022-07-28 深圳市普渡科技有限公司 Sound source positioning method and device during interaction, and computer readable storage medium
CN114994604A (en) * 2022-04-21 2022-09-02 深圳市倍思科技有限公司 Human-computer interaction position determining method and device, robot and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116913277B (en) * 2023-09-06 2023-11-21 北京惠朗时代科技有限公司 Voice interaction service system based on artificial intelligence

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101295016A (en) * 2008-06-13 2008-10-29 河北工业大学 Sound source independent searching and locating method
CN103439688A (en) * 2013-08-27 2013-12-11 大连理工大学 Sound source positioning system and method used for distributed microphone arrays
CN105116994A (en) * 2015-07-07 2015-12-02 百度在线网络技术(北京)有限公司 Intelligent robot tracking method and tracking device based on artificial intelligence
CN105388459A (en) * 2015-11-20 2016-03-09 清华大学 Robustness sound source space positioning method of distributed microphone array network
CN106210219A (en) * 2015-05-06 2016-12-07 小米科技有限责任公司 Noise-reduction method and device
CN107976651A (en) * 2016-10-21 2018-05-01 杭州海康威视数字技术股份有限公司 A kind of sound localization method and device based on microphone array
CN108254721A (en) * 2018-04-13 2018-07-06 歌尔科技有限公司 A kind of positioning sound source by robot and robot
CN108931036A (en) * 2017-05-18 2018-12-04 奥克斯空调股份有限公司 A kind of auditory localization control air-conditioner wind keeps away the method and air conditioner of people
CN109307856A (en) * 2017-07-27 2019-02-05 深圳市冠旭电子股份有限公司 A kind of sterically defined exchange method of robot and device
CN110082726A (en) * 2019-04-10 2019-08-02 北京梧桐车联科技有限责任公司 Sound localization method and device, positioning device and storage medium
CN110673819A (en) * 2019-09-18 2020-01-10 联想(北京)有限公司 Information processing method and electronic equipment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4689107B2 (en) * 2001-08-22 2011-05-25 本田技研工業株式会社 Autonomous robot
JP4204541B2 (en) * 2004-12-24 2009-01-07 株式会社東芝 Interactive robot, interactive robot speech recognition method, and interactive robot speech recognition program
CN104934033A (en) * 2015-04-21 2015-09-23 深圳市锐曼智能装备有限公司 Control method of robot sound source positioning and awakening identification and control system of robot sound source positioning and awakening identification
CN106292732A (en) * 2015-06-10 2017-01-04 上海元趣信息技术有限公司 Intelligent robot rotating method based on sound localization and Face datection
CN105116920B (en) * 2015-07-07 2018-07-10 百度在线网络技术(北京)有限公司 Intelligent robot method for tracing, device and intelligent robot based on artificial intelligence
CN104985599B (en) * 2015-07-20 2018-07-10 百度在线网络技术(北京)有限公司 Study of Intelligent Robot Control method, system and intelligent robot based on artificial intelligence
CN105093986A (en) * 2015-07-23 2015-11-25 百度在线网络技术(北京)有限公司 Humanoid robot control method based on artificial intelligence, system and the humanoid robot
CN105563493A (en) * 2016-02-01 2016-05-11 昆山市工业技术研究院有限责任公司 Height and direction adaptive service robot and adaptive method
CN105959872B (en) * 2016-04-21 2019-07-02 歌尔股份有限公司 Intelligent robot and Sounnd source direction discriminating conduct for intelligent robot
CN110082723B (en) * 2019-05-16 2022-03-15 浙江大华技术股份有限公司 Sound source positioning method, device, equipment and storage medium
CN112925235A (en) * 2021-01-21 2021-06-08 深圳市普渡科技有限公司 Sound source localization method, apparatus and computer-readable storage medium at the time of interaction

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101295016A (en) * 2008-06-13 2008-10-29 河北工业大学 Sound source independent searching and locating method
CN103439688A (en) * 2013-08-27 2013-12-11 大连理工大学 Sound source positioning system and method used for distributed microphone arrays
CN106210219A (en) * 2015-05-06 2016-12-07 小米科技有限责任公司 Noise-reduction method and device
CN105116994A (en) * 2015-07-07 2015-12-02 百度在线网络技术(北京)有限公司 Intelligent robot tracking method and tracking device based on artificial intelligence
CN105388459A (en) * 2015-11-20 2016-03-09 清华大学 Robustness sound source space positioning method of distributed microphone array network
CN107976651A (en) * 2016-10-21 2018-05-01 杭州海康威视数字技术股份有限公司 A kind of sound localization method and device based on microphone array
CN108931036A (en) * 2017-05-18 2018-12-04 奥克斯空调股份有限公司 A kind of auditory localization control air-conditioner wind keeps away the method and air conditioner of people
CN109307856A (en) * 2017-07-27 2019-02-05 深圳市冠旭电子股份有限公司 A kind of sterically defined exchange method of robot and device
CN108254721A (en) * 2018-04-13 2018-07-06 歌尔科技有限公司 A kind of positioning sound source by robot and robot
CN110082726A (en) * 2019-04-10 2019-08-02 北京梧桐车联科技有限责任公司 Sound localization method and device, positioning device and storage medium
CN110673819A (en) * 2019-09-18 2020-01-10 联想(北京)有限公司 Information processing method and electronic equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022156611A1 (en) * 2021-01-21 2022-07-28 深圳市普渡科技有限公司 Sound source positioning method and device during interaction, and computer readable storage medium
CN114516061A (en) * 2022-02-25 2022-05-20 杭州萤石软件有限公司 Robot control method, robot system and robot
CN114516061B (en) * 2022-02-25 2024-03-05 杭州萤石软件有限公司 Robot control method, robot system and robot
CN114994604A (en) * 2022-04-21 2022-09-02 深圳市倍思科技有限公司 Human-computer interaction position determining method and device, robot and storage medium

Also Published As

Publication number Publication date
WO2022156611A1 (en) 2022-07-28

Similar Documents

Publication Publication Date Title
CN107799126B (en) Voice endpoint detection method and device based on supervised machine learning
CN112925235A (en) Sound source localization method, apparatus and computer-readable storage medium at the time of interaction
CN107464564B (en) Voice interaction method, device and equipment
CN109506568B (en) Sound source positioning method and device based on image recognition and voice recognition
JP6673276B2 (en) Voice detection device, voice detection method, and program
JP5456832B2 (en) Apparatus and method for determining relevance of an input utterance
CN110875060A (en) Voice signal processing method, device, system, equipment and storage medium
CN108681440A (en) A kind of smart machine method for controlling volume and system
CN108877787A (en) Audio recognition method, device, server and storage medium
JP2007221300A (en) Robot and control method of robot
CN108769400A (en) A kind of method and device of locating recordings
WO2017000775A1 (en) Robot voice direction-seeking turning system and method
WO2022033556A1 (en) Electronic device and speech recognition method therefor, and medium
US20230386461A1 (en) Voice user interface using non-linguistic input
CN109145853A (en) The method and apparatus of noise for identification
CN110210196A (en) Identity identifying method and device
CN113053368A (en) Speech enhancement method, electronic device, and storage medium
CN110446149A (en) For running the method and hearing aid of hearing aid
CN113014844A (en) Audio processing method and device, storage medium and electronic equipment
CN110188179B (en) Voice directional recognition interaction method, device, equipment and medium
CN108665907A (en) Voice recognition device, sound identification method, recording medium and robot
CN112859000B (en) Sound source positioning method and device
CN110572600A (en) video processing method and electronic equipment
WO2023193803A1 (en) Volume control method and apparatus, storage medium, and electronic device
CN112104964B (en) Control method and control system of following type sound amplification robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination