WO2021108991A1 - Control method and apparatus, and movable platform - Google Patents
Control method and apparatus, and movable platform
- Publication number
- WO2021108991A1 (PCT/CN2019/122726, CN2019122726W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- sound source
- movable platform
- area
- sound
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
Definitions
- This application relates to the field of computer technology, and more specifically, to a control method, device, and movable platform.
- the embodiments of the present application provide a control method and device and a movable platform, which can effectively identify the sound information of a target object, and improve the signal-to-noise ratio of the acquired target object's sound.
- a control method is provided.
- a movable platform is equipped with an audio collection device for acquiring sound emitted by a target object.
- the method includes: acquiring sound source distribution information around the target object; determining, according to the sound source distribution information, a target area around the target object that meets a sound source condition; and controlling the movement of the movable platform so that the direction of the sound pickup beam of the audio collection device, the target object, and the target area meet a preset azimuth relationship.
- a control device is provided, including a memory and a processor; the memory is used to store program code; the processor calls the program code, and when the program code is executed, it is used to perform the following operations: acquire sound source distribution information around the target object; determine, according to the sound source distribution information, a target area around the target object that meets a sound source condition; and control the movement of the movable platform so that the direction of the pickup beam of the audio collection device included in the movable platform, the target object, and the target area satisfy a preset azimuth relationship.
- a movable platform is provided, including: an audio collection device for acquiring the sound emitted by a target object; and at least one processor configured, individually or collectively, to: acquire sound source distribution information around the target object; determine, according to the sound source distribution information, a target area around the target object that meets a sound source condition; and control the movement of the movable platform so that the direction of the sound pickup beam of the audio collection device, the target object, and the target area satisfy a preset azimuth relationship.
- a chip is provided for implementing the method in the first aspect or its implementation manners.
- the chip includes: a processor, configured to call and run a computer program from the memory, so that the device installed with the chip executes the method in the first aspect or its implementation manners.
- a computer-readable storage medium for storing a computer program.
- the computer program includes instructions for executing the first aspect or any possible implementation of the first aspect.
- a computer program product including computer program instructions that cause a computer to execute the above-mentioned first aspect or the method in each implementation manner of the first aspect.
- a computer program which when running on a computer, causes the computer to execute the method in the first aspect or any possible implementation of the first aspect.
- a control method is provided.
- the movable platform is equipped with an audio collection device for acquiring the sound emitted by the target object.
- the method includes:
- the movement of the movable platform is controlled so that the orientation of the audio collection device, the target object and the target area meet a preset azimuth relationship.
- the control method provided by the embodiments of the present application controls the movement of the movable platform according to the target area that meets the sound source condition, so that the direction of the pickup beam of the audio collection device, the target object, and the target area meet the preset azimuth relationship. Because the movement of the movable platform is controlled based on the determined target area, the audio collection device included in the movable platform can acquire as much as possible the sound information emitted by the target object while attenuating or shielding as much as possible the sound information emitted by objects other than the target object; further, the signal-to-noise ratio of the acquired sound of the target object can be improved.
- Fig. 1 is an architecture diagram of a technical solution applying an embodiment of the present application
- FIG. 2 is a schematic flowchart of a control method provided by an embodiment of the present application.
- FIG. 3a is a schematic diagram of the relative positions of the sound pickup beam of the audio collection device and the target object according to an embodiment of the present application.
- FIG. 3b is a schematic diagram of the relative positions of the sound pickup beam of the audio collection device and the target object according to another embodiment of the present application.
- FIG. 3c is a schematic diagram of the relative positions of the sound pickup beam of the audio collection device and the target object according to another embodiment of the present application.
- FIG. 3d is a schematic diagram of the relative positions of the sound pickup beam of the audio collection device and the target object according to still another embodiment of the present application.
- FIG. 4a is a schematic diagram of a candidate area divided around a target object according to an embodiment of the present application.
- FIG. 4b is a schematic diagram of a candidate area divided around a target object according to another embodiment of the present application.
- FIG. 4c is a schematic diagram of a candidate area divided around a target object according to another embodiment of the present application.
- FIG. 4d is a schematic diagram of a candidate area divided around a target object according to another embodiment of the present application.
- FIG. 5 is a schematic flowchart of a control method provided by another embodiment of the present application.
- FIG. 6 is a schematic flowchart of a control method provided by another embodiment of the present application.
- FIG. 7 is a schematic diagram of dividing a user's 360° area according to an embodiment of the present application.
- FIG. 8 is a schematic structural diagram of a control device provided by an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a movable platform provided by an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of a control device provided by another embodiment of the present application.
- FIG. 11 is a schematic structural diagram of a chip provided by an embodiment of the present application.
- General-purpose robotic equipment has entered many aspects of human society, for example industrial robots and service robots.
- a large proportion of these robots are mobile robots, such as in-building delivery robots, drone delivery robots, and the like.
- This type of robot is gradually replacing previous manual operation, and in the foreseeable future such robots will be able to perform more human-like operations, such as richer human-machine dialogue and information exchange, making the entire service process more humane and convenient.
- These robots are mobile and can be regarded as mobile platforms, including, but not limited to, biped, four-wheel, multi-rotor aircraft and other body moving devices capable of carrying a certain weight.
- This application takes a service robot as an example. After a drone delivery robot or an in-building delivery robot delivers goods, the following human-like scenario may arise when it interacts with the user: it is tentatively required to use voice dialogue to confirm the user's identity.
- the internal face database of machine vision cannot meet the recognition requirements, or the user's facial data cannot be used due to privacy.
- the dialogue between the delivery robot and the consignee (user) can be as follows:
- Delivery robot: Hello, here is your package. Let me confirm the information with you; please state your name and mobile phone number.
- Delivery robot: OK, please receive the package and sign for it.
- the distance between the first delivery robot 120 and the user 110 is too far, for example, the distance between the two is more than 3m.
- the current solution is that the first delivery robot 120 and the user 110 face each other during the dialogue, and a better signal-to-noise ratio is obtained through microphone-array speech enhancement; this process may increase the cost of hardware and algorithms, and the original sound may be distorted after processing, which can cause recognition failure and degrade the user experience.
- the distance between the second delivery robot 130 and the user 110 is moderate, but when the microphone array is used to locate the direction of the user's voice, there are other noises in the same direction, such as the noise emitted by the car 140, so that when the microphone array picks up the voice of the user 110 it also picks up a lot of background noise.
- the background noise here includes, but is not limited to, the speech of people other than the user, wind noise, car noise, etc.
- the current solution cannot distinguish between the useful human voice and noise in this scene, so the acquired sound contains a lot of background noise, resulting in a low recognition rate.
- the beamforming algorithm can be used to optimize the problems in the above solutions.
- the specific process is as follows:
- the delivery robot uses the microphone array and a beamforming algorithm to locate the user's direction. Assuming the user's direction is 0°, it picks up only the sound within the angle range of (0° ± θ°) and attenuates the sound within the angle range of (θ°, (360-θ)°), as sketched below.
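- A minimal sketch of the angular gating described above, not the patent's actual implementation: per-direction signal components within (0° ± θ°) of the located user direction are kept and the rest attenuated. The dictionary-of-signals representation and the fixed attenuation factor are illustrative assumptions.

```python
import numpy as np

def angular_gate(signals_by_angle, user_angle_deg=0.0, theta_deg=30.0, attenuation=0.1):
    """Keep sound arriving within (user_angle ± theta) and attenuate the rest.

    signals_by_angle: dict mapping arrival angle (degrees) -> sampled signal (np.ndarray).
    Returns the gated mixture; the fixed attenuation factor is illustrative only.
    """
    mixture = None
    for angle, signal in signals_by_angle.items():
        # Smallest absolute angular distance to the user direction (wrapped to [0°, 180°]).
        diff = abs((angle - user_angle_deg + 180.0) % 360.0 - 180.0)
        gain = 1.0 if diff <= theta_deg else attenuation
        mixture = gain * signal if mixture is None else mixture + gain * signal
    return mixture
```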
- in addition, noise such as wind noise can be compared against pre-recorded or learned models and then eliminated in combination with the beamforming principle.
- AGC (automatic gain control) may also be applied.
- the accuracy with which the microphone array of the pickup module picks up the user's voice is jointly determined by its sensitivity, distortion, noise floor, and other common factors.
- if the pickup direction of the microphone array deviates from the user's direction, the received original user voice is extremely weak, resulting in worse performance and a poorer user experience; if there are other sounds in the same direction as the user, such as other people's speech or noise, then even after optimization by the beamforming algorithm there is still background noise in the voice received by the delivery robot, resulting in an insufficient speech signal-to-noise ratio and problems in human-computer interaction.
- Another possible scenario is recognizing animal sounds. For example, a pet companion robot that interacts with pets needs to recognize sounds made by pets; for another example, in farm monitoring applications, the sounds made by animals need to be recognized.
- Another possible scenario is to identify specific mechanical sounds. For example, when the vehicle is overhauled, the sound of the vehicle's mechanical vibration is obtained for problem diagnosis.
- the embodiment of the present application provides a control method, which can improve the signal-to-noise ratio of the sound emitted by the target object, and further improve the recognition rate of the sound of the target object.
- the control method 200 provided by an embodiment of the present application will be described in detail below with reference to FIG. 2.
- the method can be applied to a mobile platform, and can also be applied to a server communicating with the mobile platform. In some embodiments, part of the steps may be performed by the mobile platform, and part of the steps may be performed by the server.
- a control method 200 provided by an embodiment of this application, the method 200 may include steps 210-230.
- the target object in the embodiment of the present application may be a person, a device, or other objects that can make a sound and need to recognize the sound, which is not specifically limited in this application.
- if the target object in the embodiment of this application is a person, the sound source distribution information around the person can be obtained; further, the best position of the movable platform can be determined according to the sound source distribution information around the person so as to obtain effective sound information of the target person. If the target object in the embodiment of this application is a device, the sound source distribution information around the device can be obtained; further, the best position of the movable platform can be determined according to the sound source distribution information around the device so as to obtain effective sound information of the device, so that the quality of the device can be detected.
- the sound source distribution information in the embodiments of the present application may include noise source distribution information around the target object and environmental information.
- the noise source distribution information in the embodiments of the present application may include objects around the target object that can make sounds, for example, a car honking near the target object, or other people talking to each other near the target object.
- the environmental information in the embodiments of the present application may include the distribution of the environment around the target object; for example, the environment around the target object may be a school, a park, or a highway.
- the target area around the target object that meets the sound source condition can be determined according to the sound source distribution information. It is understandable that the determined target area can maximize the acquisition of the sound emitted by the target object when acquiring the sound of the target object, and can minimize or shield the acquisition of sound information emitted by other objects.
- the movable platform in the embodiments of this application can be a delivery robot, or a smart speaker, or an aircraft with multiple wheels or multiple rotors, etc. This application does not specifically limit this, as long as it can interact with the target Smart devices can all apply the embodiments of this application.
- the sound source distribution information is acquired based on the audio collection device.
- the movable platform may include an audio collection device, and the direction of the pickup beam of the audio collection device may or may not be the forward direction of the movable platform.
- for example, the audio collection device can be extended and retracted by a robotic arm included in the movable platform; therefore, even in this case, the audio collection device can still collect the sound of the target object.
- the preset azimuth relationship in the embodiments of the present application means that the audio collection device can acquire as much as possible the sound information emitted by the target object while attenuating or shielding as much as possible the sound information emitted by objects other than the target object.
- the control method provided by the embodiments of the present application controls the movement of the movable platform according to the target area that meets the sound source condition, so that the direction of the pickup beam of the audio collection device, the target object, and the target area meet the preset azimuth relationship. Because the movement of the movable platform is controlled based on the determined target area, the audio collection device included in the movable platform can acquire as much as possible the sound information emitted by the target object while attenuating or shielding as much as possible the sound information emitted by objects other than the target object; further, the signal-to-noise ratio of the acquired sound of the target object can be improved.
- the following will specifically introduce the audio collection device to obtain the sound source distribution information around the target object.
- the following description mainly uses human-machine voice interaction scenarios as examples. It is worth noting that this does not limit the implementation scenarios of the present invention to these example scenarios.
- the acquiring the sound source distribution information around the target object includes: adjusting the direction of the pickup beam so that the direction of the pickup beam corresponds to different orientations around the target object; and acquiring the sound source distribution information based on the sound information acquired by the audio collection device under the different pickup beam directions.
- the target object is a person as an example for description.
- the direction of the pickup beam of the audio collection device can be adjusted, and by adjusting the direction of the pickup beam, the sound source distribution information in different directions around the person can be obtained.
- if the pickup beam of the audio collection device points directly to the right of the person, the pickup beam at this position can obtain the sound source distribution information of the area to the right of the person and part of the area to the left, as shown in Figure 3a;
- similarly, if the pickup beam of the audio collection device is pointed directly to the left of the person, the pickup beam at this position can obtain the sound source distribution information of the area to the left of the person, that is, area B in Figure 3b; similarly, if the pickup beam of the audio collection device is pointed upward, the pickup beam at this position can obtain the sound source distribution information above the person, that is, area C in Figure 3c; similarly, if the pickup beam of the audio collection device is pointed downward, the pickup beam at this position can obtain the sound source distribution information below the person, that is, area D in Figure 3d.
- each area in FIG. 3a to FIG. 3d in the implementation of this application is only an example image area, and the size of each area can be adjusted through an algorithm.
- the audio collection device in the embodiments of the present application may be a pickup sensor, for example an electret condenser microphone (ECM) or a micro-electro-mechanical-system (MEMS) microphone; this is not specifically limited in this application, and any sensor that can convert sound into an electrical signal can be applied to the embodiments of the present application.
- the audio collection device in the embodiments of the present application may include a single microphone or a microphone array. If the audio collection device includes a microphone array, the direction of the pickup beam can be adjusted based on the sound-receiving units included in the microphone array, which will be described in detail below.
- the audio collection device includes a microphone array, and the direction of the pickup beam is adjusted based on the signal weight of each sound-receiving unit in the microphone array.
- the direction of the pickup beam is adjusted based on the pose of the movable platform.
- the audio collection device may include a microphone array; as shown in FIGS. 3a to 3d, each audio collection device may include multiple microphones forming a microphone array.
- the adjustment can be made based on the signal weight of each sound-receiving unit in the microphone array.
- for example, microphone 1 to microphone 5 in the figure can be the sound-receiving units in the embodiments of the application. If the signal weights of microphone 2 to microphone 4 are greater than the signal weights of microphone 1 and microphone 5, the pickup beam at this position can obtain the sound source distribution information of the area to the right of the person and part of the area to the left.
- alternatively, the direction of the pickup beam in FIG. 3a can be adjusted to the direction of the pickup beam in FIG. 3c, for example mainly based on microphone 1 and microphone 2, so as to obtain the sound source distribution information of the area above the person.
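- A minimal sketch of this kind of weighted beam steering, assuming a uniform linear array of the five sound-receiving units (microphone 1 to microphone 5); the array spacing, sampling rate, and weight values are illustrative assumptions rather than values from the embodiment.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
MIC_SPACING = 0.04       # m, assumed uniform spacing between adjacent sound-receiving units
SAMPLE_RATE = 16000      # Hz

def steer_beam(mic_signals, steer_angle_deg, weights):
    """Weighted delay-and-sum beamforming for a uniform linear microphone array.

    mic_signals: (num_mics, num_samples) array, one row per sound-receiving unit.
    steer_angle_deg: desired pickup-beam direction relative to the array broadside.
    weights: per-unit signal weights; units with larger weights dominate the beam.
    """
    num_mics, num_samples = mic_signals.shape
    angle = np.deg2rad(steer_angle_deg)
    output = np.zeros(num_samples)
    for m in range(num_mics):
        # Delay that aligns unit m with a wavefront arriving from the steering direction.
        delay = m * MIC_SPACING * np.sin(angle) / SPEED_OF_SOUND
        shift = int(round(delay * SAMPLE_RATE))
        output += weights[m] * np.roll(mic_signals[m], -shift)
    return output / np.sum(weights)

# Example weighting from the text: emphasize microphones 2-4 over microphones 1 and 5.
# weights = np.array([0.5, 1.0, 1.0, 1.0, 0.5])
```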
- the sound source distribution information includes one or more of the following information: position information of the sound source, volume information of the sound source, and type information of the sound source.
- the movable platform is equipped with a visual acquisition device, and the sound source distribution information is acquired based on the visual acquisition device.
- the sound source distribution information in the embodiments of the present application may include the position information of the sound source, for example, in which direction relative to the target object a surrounding sound source is located; the position information can be obtained through the audio collection device, through the visual collection device, or through the audio collection device and the visual collection device together, which is not specifically limited in this application.
- the sound source distribution information in the embodiments of the present application may include sound source volume information, for example, the volume in decibels of the sound sources around the target object; the volume information may be obtained through the audio collection device.
- the sound source distribution information in the embodiments of the present application may include the type information of the sound source, for example, what kind of sound source emits sound around the target object, such as a car, people in a park, students in a school, etc.; the type information can be obtained through the visual collection device.
- the vision acquisition device in the embodiments of the present application may be a vision sensor.
- the vision sensor may include one or more of a camera module, an infrared sensor, or a radar sensor; this application does not specifically limit this, and any sensor that can detect and scan the external environment can be applied to the embodiments of this application.
- the target area around the target that meets the sound source conditions can be determined according to the sound source distribution information.
- the following will specifically describe the determination of the target area according to the sound source distribution information.
- the determining, according to the sound source distribution information, a target area around the target object that meets the sound source condition includes: determining multiple primary candidate areas around the target object, the multiple primary candidate areas being respectively located in different orientations of the target object; and determining, based on the sound source distribution information, a target area among the multiple primary candidate areas that meets the sound source condition.
- the surroundings of the target may be divided into multiple primary candidate regions, and then the target region may be determined from the multiple primary candidate regions based on the acquired sound source distribution information.
- for example, the description can be given by taking the target object as a person.
- the surrounding area of the target person is divided into 4 primary candidate areas, namely area A, area B, area C, and area D. If there is strong background noise in area A, and the sound sources in area B, area C, and area D are weak or there is no background noise, then area C can be determined as the target area in the embodiment of the application.
- when the audio collection device of the movable platform collects the person's voice, the direction of the pickup beam of the audio collection device, the person, and area C satisfy the preset azimuth relationship; for example, the movable platform can be controlled to move into area A, and the direction of the pickup beam of the audio collection device can be directed toward the person.
- areas B, C, and D, which have less noise, can serve as the background behind the person; because they are quieter, the pickup beam can acquire the sound information emitted by the person at a higher signal-to-noise ratio while attenuating the louder sound from area A to a very low noise level.
- although the pickup beam of the audio collection device may still acquire some background noise from area A, there is at least no noise, or only low noise, between the movable platform and the person, which to a certain extent also reduces the influence of noise on the movable platform's acquisition of the person's sound information; further, the signal-to-noise ratio of the person's sound can be improved.
- the movable platform can also be moved to the area B or the area D to obtain the sound information of the target person.
- the purpose is to make the movable platform only obtain the sound information emitted by the target person as much as possible. Other sounds should be attenuated or shielded as much as possible.
- the directions of any two adjacent primary candidate regions relative to the target are different by a first preset angle.
- in other words, the orientations of any two adjacent primary candidate areas relative to the target object may differ by the first preset angle.
- for example, the first preset angle in the embodiments of the present application may be 0°, that is, the angle between any two adjacent primary candidate areas relative to the target object (person) is 0°: the angle between area A and area B in FIG. 4a relative to the target object is 0°, the angle between area B and area C relative to the target object is 0°, the angle between area C and area D relative to the target object is 0°, and the angle between area D and area A relative to the target object is 0°.
- the first preset angle may also be another angle greater than 0°. For example, the first preset angle may be 45°, that is, the angle between any two adjacent primary candidate areas relative to the target object may be 45°: the angle between area A and area B relative to the target object may be 45°, the angle between area B and area C relative to the target object may be 45°, the angle between area C and area D relative to the target object may be 45°, and the angle between area D and area A relative to the target object may be 45°.
- multiple primary candidate regions around the target may be determined according to the sound source distribution information, and target regions that meet the sound source conditions may be determined from the multiple primary candidate regions based on the sound source distribution information.
- the angles between any adjacent primary candidate regions relative to the target object may be different.
- for example, multiple primary candidate areas may be determined according to the sound source distribution information. When the audio collection device collects the sound source distribution information around the target object, the volumes of the sound sources in different parts of area A differ; for example, as shown in Figure 4c, there is larger background noise in area A1, there is no background noise in area A2, and there is no background noise in area C.
- in this case, the part of area C close to area D can be used as the target area in the embodiment of the present application.
- the area C1 in FIG. 4c can be used as the target area, and the movable platform can be controlled to move to the area C1.
- the pickup beam of the audio collection device then points to the target object and area A2; since there is no background noise in area A2, only the sound information emitted by the target object can be obtained to the maximum extent, without being affected by noise in other areas; or,
- the movable platform can be controlled to move to area A2, with the pickup beam of the audio collection device pointing to the target object and area C1; since there is no background noise in area C1, it is possible to obtain, to the maximum extent, only the sound information emitted by the target object without being affected by noise in other areas.
- in the above manner, the target area that meets the sound source condition can be determined from the multiple primary candidate areas. In some cases, there may be large background noise in all of the primary candidate areas; therefore, the candidate areas can be divided again so that there is a target area that meets the sound source condition, which will be described in detail below.
- the method further includes: if there is no target area that meets the sound source condition among the multiple primary candidate areas, determining multiple secondary candidate areas around the target object, the multiple secondary candidate areas being respectively located in different orientations of the target object, where the orientations of any two adjacent secondary candidate areas relative to the target object differ by a second preset angle, and the second preset angle is smaller than the first preset angle; and determining, based on the sound source distribution information, a target area among the multiple secondary candidate areas that meets the sound source condition.
- the target area that meets the sound source condition can be determined from the secondary candidate area.
- the angle between any two adjacent secondary candidate areas relative to the target object is smaller than the angle between any two adjacent primary candidate areas relative to the target object; that is, in the embodiments of the present application the first preset angle is greater than the second preset angle.
- the second preset angle is 30°, which is smaller than the aforementioned first preset angle of 45°.
- the secondary candidate areas divide the surroundings of the target object more finely, that is, the scope of each secondary candidate area is smaller, so the target area in the embodiments of the present application can be determined based on these multiple smaller areas.
- for example, area C2 in Figure 4d can be determined as a target area that meets the sound source condition; the movable platform can be controlled to move to area C2, with the pickup beam of the audio collection device pointing to the target object and area A2, because there is no noise in area A2.
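- The selection logic described above can be summarized with a minimal sketch: the surroundings of the target object are divided into candidate areas, each area is checked against the sound source condition, and if none qualifies the division is repeated with a smaller angle. The contiguous-sector model of the candidate areas, the noise-volume-only criterion, and the angle and threshold values below are simplifying assumptions, not the embodiment's exact procedure.

```python
def divide_areas(angle_step_deg):
    """Divide the 360° surroundings of the target object into contiguous angular areas."""
    return [(start, start + angle_step_deg) for start in range(0, 360, angle_step_deg)]

def find_target_area(measure_noise_db, first_angle=90, second_angle=45, volume_threshold_db=10.0):
    """Return a (start, end) angular area that meets the sound source condition, or None.

    measure_noise_db: callable mapping an angular area to its measured noise level (dB),
    e.g. obtained by sweeping the pickup beam; a stand-in for the sound source
    distribution information.
    """
    for angle_step in (first_angle, second_angle):   # second pass uses the smaller angle
        areas = divide_areas(angle_step)
        quiet = [a for a in areas if measure_noise_db(a) < volume_threshold_db]
        if quiet:
            # Take the quietest qualifying area as the target area.
            return min(quiet, key=measure_noise_db)
    return None  # no target area; the target object can instead be prompted to raise its volume
```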
- the target area that meets the sound source conditions can be determined based on the sound source distribution information, and some conditions included in the sound source conditions will be described in detail below.
- the sound source condition includes one or more of the following conditions: the sound source volume in the area is less than a volume threshold; the sound source frequency in the area belongs to a preset frequency range; the sound source type in the area belongs to a preset category; the volume change of the sound source in the area within a preset first duration is less than a preset threshold.
- there may be multiple sound source conditions. For example, one condition is that the sound source volume in a certain area is less than the volume threshold: if the volume of the sound sources around the target object is less than 10 decibels, or less than the volume of the sound emitted by the target object, it can be considered that the area meets the sound source condition, and the area can be regarded as the target area.
- for example, if area C is a relatively quiet park, and the volume of the sound sources in the park is less than 10 decibels or less than the volume of the sound made by the person, then it can be considered that area C meets the sound source condition, and area C can be taken as the target area.
- the movable platform can then be moved to the opposite side of area C, for example into area A, so that as far as possible only the voice of the target person is acquired.
- the sound source condition may also be that the sound source frequency in the area belongs to a preset frequency range. Assuming that the preset frequency range is 300Hz-3000Hz, if the frequency of the sound emitted by the sound source in a certain area is within the range of 300Hz-3000Hz, it can be considered that the area meets the sound source condition, and the area can be regarded as the target area.
- for example, area C is a relatively quiet park; the park may include whispers among other people, and the frequency of people's speech is generally in the range of 300Hz-3000Hz, so area C conforms to the sound source condition.
- the sound source condition can also be that the sound source type in the area belongs to a preset type. Assuming that the preset types are people whispering to each other or a river that makes sounds, if the sound source in an area is people talking to each other or a river, it can be considered that the area meets the sound source condition, and the area can be used as the target area; if area B is a construction site and the sound source in it is an electric drill on the construction site, it is deemed that area B does not meet the sound source condition.
- the sound source condition may also be that the volume change of the sound source in the area within the preset first time period is less than the preset threshold.
- for example, consider a car running on the road behind the target object: when the car passes by the target object it honks for a few seconds at a volume of 100 decibels, and then no car passes on the road for the next 5 minutes. Assuming that the first duration is 1 minute and the preset threshold is 50 decibels, since the honking of the car lasts only a few seconds and its volume is greater than the preset threshold, the area behind the target object does not meet the sound source condition.
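- The example conditions above can be combined in a small check; the thresholds, the allowed types, and the requirement that all four conditions hold (the embodiment allows one or more) are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class SoundSource:
    volume_db: float          # measured volume of the source
    frequency_hz: float       # dominant frequency of the source
    source_type: str          # e.g. "speech", "river", "electric_drill"
    volume_change_db: float   # volume change observed within the first duration

def meets_sound_source_condition(src: SoundSource,
                                 volume_threshold_db: float = 10.0,
                                 freq_range_hz=(300.0, 3000.0),
                                 allowed_types=("speech", "river"),
                                 change_threshold_db: float = 50.0) -> bool:
    """Illustrative combination of the sound source conditions listed in the text."""
    return (src.volume_db < volume_threshold_db
            and freq_range_hz[0] <= src.frequency_hz <= freq_range_hz[1]
            and src.source_type in allowed_types
            and src.volume_change_db < change_threshold_db)
```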
- the controlling the movement of the movable platform so that the direction of the pickup beam of the audio collecting device, the target object and the target area meet a preset azimuth relationship includes: controlling the movement of the movable platform so that the direction of the sound pickup beam points to the target object and the target area.
- the controlling the movement of the movable platform so that the direction of the pickup beam points to the target object and the target area includes: controlling the movable platform to move to a first direction of the target object, where the first direction is the reverse of the direction from the target object to the target area.
- that the direction of the sound pickup beam points to the target object and the target area in the embodiments of the present application may mean that, when the direction of the sound pickup beam points to the target object and the target area, the sound information emitted by the target object can be obtained as much as possible while sound information other than that of the target object is attenuated or shielded; further, the speech signal-to-noise ratio can be improved.
- the movement of the movable platform can be controlled so that the direction of the pickup beam points to the target object and the target area.
- for example, the movable platform can be controlled to move into area A, so that the audio collection device of the movable platform can acquire the sound information emitted by the target object to the maximum extent, improving the signal-to-noise ratio of the acquired sound emitted by the target object.
- since area C is the target area in the embodiment of the application, that is, there is no noise in area C or the noise is negligible relative to the sound emitted by the target object, when the movable platform is moved into area A and acquires the sound information emitted by the target object there is no other noise influence, and the acquired sound information is effective; that is, the movable platform and the target object can interact normally.
- alternatively, the movable platform may move to the boundary between area A and area B, and the audio collection device included in the movable platform can extend outward, for example into area A, so that the audio collection device is located in area A; in this way the audio collection device can also acquire, to the maximum extent, the sound information emitted by the target object and improve the signal-to-noise ratio of the acquired sound emitted by the target object.
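- A minimal geometric sketch of the preset azimuth relationship described above, assuming 2-D coordinates for the target object and the center of the target area: the platform is placed along the first direction (the reverse of the direction from the target object to the target area) so that its pickup beam, the target object, and the target area lie roughly on one line. The coordinates and the standoff distance are illustrative.

```python
import math

def plan_platform_pose(target_xy, target_area_center_xy, standoff_m=1.5):
    """Place the movable platform so its pickup beam points at the target object and the target area.

    The platform is positioned in the first direction from the target object, i.e. the
    reverse of the direction from the target object to the target area, at standoff_m.
    Returns (platform_x, platform_y, beam_heading_rad).
    """
    dx = target_area_center_xy[0] - target_xy[0]
    dy = target_area_center_xy[1] - target_xy[1]
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm                  # unit vector: target object -> target area
    px = target_xy[0] - ux * standoff_m            # move opposite to that direction
    py = target_xy[1] - uy * standoff_m
    beam_heading = math.atan2(dy, dx)              # beam points back through the target object
    return px, py, beam_heading
```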
- the method further includes: sending out a prompt message for prompting that the sounding side of the target object faces the movable platform.
- for example, a prompt message can be sent to the target object for prompting that the sounding side of the target object should correspond to the audio collection device of the movable platform, that is, the sounding side of the target object can face the audio collection device of the movable platform, so that the audio collection device can effectively collect the sound information emitted by the target object.
- the prompt information may be voice information, or text information or color information displayed on the movable platform, which is not specifically limited in this application.
- if the prompt information is voice information, a voice prompt similar to "please face me" can be issued; after the target object receives the prompt, it can turn so that its face is toward the audio collection device included in the movable platform.
- if the prompt information is text information, the movable platform can display a text message similar to "Please face me" on its display screen; after the target object receives the prompt information, it can rotate its position so that its face is toward the audio collection device included in the movable platform.
- if the prompt information is color information, the movable platform can display green, for example; when the target object sees the green information displayed on the movable platform, it can rotate its position so that its face is toward the audio collection device included in the movable platform.
- the movable platform can select the prompt information according to the type of the target object.
- for example, if the target object is a person, the prompt information can be any one or more of voice information, text information, and color information; if the target object is a robot, the prompt information can be voice information.
- the 0° alignment algorithm may also be used to align the movable platform with the vocal side of the target, that is, the vocal side of the target faces the movable platform.
- the foregoing describes the determination of the target area based on the sound source distribution information and the control of the movement of the movable platform, so that the audio collection device can obtain only the sound information emitted by the target object to the greatest extent.
- the following describes how prompt information can also be sent to the target object so that the audio collection device can acquire only the sound information emitted by the target object.
- the method 200 may further include step 240.
- a prompt message can be sent to the target object; this information can prompt the target object to increase the volume of its sound.
- the absence of a target area in the implementation of the present application may refer to the absence of a target area caused by the presence of noise in a 360° direction around the target.
- the prompt information may be voice information, text information or color information displayed on the movable platform, or a combination of several of voice information, text information, and color information; this application does not impose specific restrictions on this.
- if the prompt information is voice information, a prompt similar to "please increase the volume" can be issued; after the target object receives the prompt message, it can increase its volume so that the movable platform can obtain the sound information emitted by the target object.
- if the prompt information is text information, a message similar to "please increase the volume" can be displayed on the display screen included in the movable platform; after the target object sees the message, it can increase its volume so that the movable platform can obtain the sound information emitted by the target object.
- if the prompt information is color information, adjustments can be made based on preset rules; for example, green can indicate that the sound volume of the target object is low and that the target object should increase the volume.
- the movable platform can emit a green flashing light, and the target object can increase the volume after receiving the prompt information of this color, so that the movable platform can obtain the sound information emitted by the target object.
- before acquiring the sound source distribution information around the target object, the method further includes: determining whether a voice recognition instruction in the sound emitted by the target object is a preset instruction; and the acquiring the sound source distribution information around the target object includes: if the voice recognition instruction is the preset instruction, acquiring the sound source distribution information around the target object.
- the preset instruction in the embodiments of the present application may be an instruction indicating whether the number of voice interaction recognition errors between the target object and the movable platform is greater than a preset threshold, or whether the movable platform can extract valid information from the target object's sound information; the preset instruction can also be an instruction indicating what percentage of the target object's sound information has been converted.
- for example, suppose the preset instruction is an instruction indicating whether the number of voice interaction recognition errors between the target object and the movable platform is greater than the preset threshold, and the preset threshold is 2. If the first voice recognition between the target object and the movable platform is unsuccessful, the target object can interact with the movable platform again; if the second voice recognition between the target object and the movable platform is still unsuccessful, it means that there may be strong noise around the target object, and the movable platform can then obtain the sound source distribution information around the target object and determine the position of the movable platform based on the sound source distribution information, so that, to the greatest extent, only the sound information emitted by the target object is obtained.
- for example, suppose the preset instruction is an instruction indicating whether the movable platform can extract effective information from the sound information of the target object. If the movable platform can extract effective interaction information from the sound information of the target object, it can interact with the target object based on this effective information; if the movable platform cannot extract effective interaction information from the sound information of the target object, it can obtain the sound source distribution information around the target object, and further, a preferred position of the movable platform can be determined based on the sound source distribution information so that, to the maximum extent, only the sound information emitted by the target object is obtained.
- optionally, in the case that the movable platform cannot extract effective interaction information from the sound information of the target object, it can try to interact with the target object again; if effective information still cannot be extracted from the sound information of the target object, the sound source distribution information around the target object can be obtained, and the preferred position of the movable platform can be determined based on the acquired sound source distribution information, so that the movable platform can, to the maximum extent, obtain only the sound information emitted by the target object.
- for example, suppose the preset instruction is an instruction indicating what percentage of the target object's sound information has been converted. If 50% of the target object's sound information is converted, it can be considered that the target object can interact normally with the movable platform; if only 20% of the target object's sound information is converted, it can be considered that the target object cannot interact with the movable platform normally. In that case, the sound source distribution information around the target object can be obtained, and further, the preferred position of the movable platform can be determined based on the sound source distribution information, so that at the determined preferred position the movable platform can, to the maximum extent, obtain only the sound information emitted by the target object.
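- The decision of when to trigger the sound-source scan can be sketched as below; the error-count threshold, the effective-information flag, and the conversion-ratio threshold are hypothetical stand-ins for the preset instruction described above.

```python
def should_scan_sound_sources(recognition_errors: int,
                              extracted_effective_info: bool,
                              converted_ratio: float,
                              error_threshold: int = 2,
                              ratio_threshold: float = 0.5) -> bool:
    """Decide whether to acquire the sound source distribution around the target object.

    Scanning is triggered when voice interaction has failed at least the preset
    number of times, when no effective interaction information could be extracted,
    or when too small a fraction of the target object's sound information was converted.
    """
    if recognition_errors >= error_threshold:
        return True
    if not extracted_effective_info:
        return True
    if converted_ratio < ratio_threshold:
        return True
    return False
```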
- the method further includes: controlling the movable platform to move to a position where the distance from the target is within a preset distance interval.
- the action of controlling the movable platform to move to a position where its distance from the target object belongs to the preset distance interval may be performed before controlling the movement of the movable platform described above, or during the process of controlling that movement; this application does not specifically limit this.
- for example, the distance between the two can be adjusted first. Assuming that the preset distance interval is [1, 2], the target object can move closer to or farther away from the movable platform so that the distance between the two is adjusted to lie within the interval [1, 2]; or, after scanning and identification, the movable platform may find that it is too far from or too close to the target object and adjust its own position so that the distance between the two lies within the interval [1, 2]; in either case, the distance between the movable platform and the target object is adjusted to fall within the interval [1, 2].
- for example, when the movable platform and the target object arrive at the designated location, the movable platform is initially located in area B. If area C is the target area in the embodiment of this application, that is, there is no noise in area C or the noise there does not affect the sound information emitted by the target object, the movable platform can be controlled to move to area A; in the process, the distance between the movable platform and the target object can be adjusted at the same time so that it lies within the preset distance interval, that is, within the interval [1, 2].
- the preset distance interval in the embodiment of the present application may be interval [1, 2], that is, the distance between the movable platform and the target is controlled to be between 1-2 meters.
- the endpoint values of the preset distance interval in the embodiments of the present application may also be other values, for example, the interval [1, 3], etc., which is not specifically limited in this application.
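- A small sketch of adjusting the platform-to-target distance into the preset distance interval [1, 2] meters mentioned above; the clamping strategy is an assumption.

```python
def distance_correction(current_distance_m, interval=(1.0, 2.0)):
    """How far the platform should move along the line to the target object (meters).

    Positive values move toward the target object, negative values move away, so
    that the resulting distance falls inside the preset distance interval.
    """
    low, high = interval
    if current_distance_m > high:
        return current_distance_m - high   # too far: approach the target object
    if current_distance_m < low:
        return current_distance_m - low    # too close: back away (negative value)
    return 0.0                             # already within the interval
```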
- the method further includes: identifying the sound-producing side of the target object; and controlling the movement of the movable platform so that the sound pickup beam of the audio collection device is directed toward the sound-producing side of the target object.
- the sound-producing side of the target object can be identified based on the audio collection device, or based on the visual collection device, or by the audio collection device and the visual collection device together.
- identifying the sound-producing side of the target object and controlling the movement of the movable platform accordingly may occur before controlling the movement of the movable platform described above, or during the process of controlling that movement, which is not specifically limited in this application.
- for example, the movable platform can first identify the sound-producing side of the target object, and after recognizing it, the movable platform can be controlled to move so that the sound pickup beam of the audio collection device included in the movable platform is directed toward the sound-producing side of the target object, that is, the audio collection device and the sound-producing side of the target object face each other.
- in the process of controlling the movement of the movable platform, the sound pickup beam of the audio collection device can also be adjusted to point toward the sound-producing side of the target object. For example, as shown in Figure 4a, when the movable platform and the target object arrive at the designated location, the movable platform is initially located in area B. After the sound source distribution information around the target object is analyzed, it is determined that area C is the target area in the embodiment of this application, that is, there is no noise in area C or the noise there does not affect the sound information emitted by the target object, so the movable platform can be controlled to move to area A; in the process of controlling the movable platform to move to area A, the sound pickup beam of the audio collection device included in the movable platform can be adjusted at the same time to point toward the sound-producing side of the target object.
- control method 600 provided in this embodiment of the present application may include steps 610-632.
- the robot scans and recognizes the surrounding environment and judges the distance from the user.
- if not, go to step 613, and if yes, go to step 614.
- the distance to the user is too close, and the stop location is re-planned with the user as the center.
- if yes, go to step 615, and if not, go to step 616.
- the microphone array/vision sensor module and the user start 0° alignment.
- if not, go back to step 616, and if yes, go to step 618.
- the robot performs human-computer interaction according to a predetermined service program.
- if not, step 620 is performed, and if yes, step 621 is performed.
- the robot and the user perform normal human-computer interaction.
- if yes, go to step 622, and if not, go to step 623.
- a visual sensor is used to scan the user's environment within a 360° range.
- FIG. 7 is a schematic diagram of dividing the 360° area around a user according to an embodiment of this application.
- the area within a 360° range around the user is divided into 6 areas, namely, area A, area B, area C, area D, area E, and area F.
- the distance between the robot and the user is L2; the distance L2 can be set to 2 meters, and L1 in the figure can be 2 meters.
- D1-D6 in the figure can be calculated according to the preset model, where D1-D6 are the distances of the robot from different positions of the user.
- D1 can be the distance between the robot and point a around the user; D2 can be the distance between the robot and point b around the user; D3 can be the distance between the robot and point c around the user; D4 can be the distance between the robot and point d around the user; D5 can be the distance between the robot and point e around the user; and D6 can be the distance between the robot and point f around the user.
- the robot can scan the sound source distribution information in a 360° direction centered on the user and within 2 meters of the user.
- when the robot scans and detects the sound source distribution information around the user, it is not limited to two-dimensional planar detection; it can also perform stereoscopic scanning detection, that is, 360° stereoscopic scanning detection of the sound source distribution information around the user.
- the division of the area around the user in the embodiments of the present application is not limited to the areas shown in FIG. 7; the surroundings may also be divided into other numbers of areas, for example 8 or 10 areas, etc., which is not specifically limited in this application.
- in addition, when the area around the user is divided, the division may not be uniform; for example, area A can be divided larger than area B, or area A can be smaller than area B, that is, the angle of area A relative to the user may be slightly larger or smaller than the angle of area B relative to the user, which is not specifically limited in this application.
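- The division of Figure 7 can be sketched as computing six candidate stop points a-f on a circle of radius L2 around the user; the distances D1-D6 then follow from the robot's current position. The region count and L2 = 2 meters follow the example above, while the evenly spaced layout of the points is an assumption standing in for the preset model.

```python
import math

def candidate_stop_points(user_xy, l2_m=2.0, num_regions=6):
    """Candidate stop points a..f on a circle of radius L2 around the user (cf. Figure 7)."""
    points = []
    for k in range(num_regions):
        theta = 2.0 * math.pi * k / num_regions    # assumed center direction of each region
        points.append((user_xy[0] + l2_m * math.cos(theta),
                       user_xy[1] + l2_m * math.sin(theta)))
    return points

def distances_from_robot(robot_xy, points):
    """D1..D6: distances from the robot's current position to each candidate stop point."""
    return [math.hypot(p[0] - robot_xy[0], p[1] - robot_xy[1]) for p in points]
```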
- If not, go to step 629; if yes, go to step 630.
- in step 629, the process returns to step 618, giving feedback to the user and prompting the user to increase the volume.
- the robot starts a movement centered on the user and the relative distance remains the same.
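- A hedged sketch of the decision logic implied by steps 610-632 follows. The concrete conditions checked at each branching step are not fully spelled out above, so the thresholds and the returned action strings are illustrative placeholders rather than the actual service program.

```python
def decide_next_action(distance_m, recognition_ok, area_volumes,
                       min_dist=1.0, max_dist=2.5, volume_threshold_db=80.0):
    """Sketch of the decisions implied by steps 610-632.

    area_volumes maps an area label (e.g. "A".."F") to the loudest noise
    measured in that area, in dB; all thresholds here are illustrative.
    Returns a short action string instead of driving real hardware.
    """
    # Steps ~612-614: check the stopping distance and re-plan if needed.
    if distance_m < min_dist or distance_m > max_dist:
        return "replan stop location centered on the user"

    # Steps ~618-623: if interaction is fine, keep serving the user.
    if recognition_ok:
        return "continue normal human-computer interaction"

    # Steps ~624-630: recognition degraded, look for a quiet area.
    quiet = [a for a, vol in area_volumes.items() if vol < volume_threshold_db]
    if not quiet:
        return "feedback: ask the user to increase volume (step 629)"
    best = min(quiet, key=lambda a: area_volumes[a])
    return f"orbit the user at constant distance toward area {best} (step 630)"

if __name__ == "__main__":
    print(decide_next_action(2.0, False, {"A": 55, "B": 85, "C": 90,
                                          "D": 62, "E": 78, "F": 88}))
```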
- the embodiment of the present application also provides another control method.
- the movable platform is equipped with an audio collection device for acquiring the sound emitted by the target object.
- the method includes:
- the movement of the movable platform is controlled so that the orientation of the audio collection device, the target object, and the target area meet a preset azimuth relationship.
- the orientation of the audio collection device may be the orientation of the microphone of the audio collection device.
- a mesh microphone collection window may be provided on the movable platform, and the orientation of this window can indicate the orientation of the microphone to a certain extent.
- the sound source distribution information is acquired based on the audio collection device.
- the acquiring of the sound source distribution information around the target includes: adjusting the orientation of the audio collection device so that the orientation of the audio collection device successively corresponds to different orientations around the target; and acquiring the sound source distribution information based on the sound information acquired by the audio collection device in the different orientations.
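- As a rough illustration of this orientation sweep, the following Python sketch steps the device (or its beam) through bearings around the target and records a level per bearing; the measurement function is a stand-in for whatever acquisition pipeline the platform actually uses.

```python
import math

def scan_sound_distribution(measure_level_db, step_deg=60):
    """Sweep the audio collection device through successive bearings
    around the target and record the sound level heard at each one.

    measure_level_db(bearing_deg) stands in for however the platform
    estimates sound level with the device facing that bearing.
    """
    distribution = {}
    for bearing in range(0, 360, step_deg):
        distribution[bearing] = measure_level_db(bearing)
    return distribution

if __name__ == "__main__":
    # Illustrative fake environment: one noise source roughly at 120°.
    def fake(bearing):
        diff = min(abs(bearing - 120), 360 - abs(bearing - 120))
        return 45.0 + 40.0 * math.exp(-(diff ** 2) / 2000.0)
    print(scan_sound_distribution(fake))
```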
- the orientation of the audio collection device is adjusted based on the pose of the movable platform.
- the sound source condition includes a first sound source condition
- the first sound source condition includes one or more of the following conditions: the sound source volume in the area is less than a first volume threshold; the frequency of the sound source in the area belongs to a first preset frequency range; the type of the sound source in the area belongs to a first preset type; the volume change of the sound source in the area during a preset first time period is less than a first preset threshold.
- the specific first volume threshold, first preset frequency range, first preset category, first preset threshold and other condition values can be set to values under which the interference with the voice recognition operation performed on the sound acquired from the target object remains below a preset index. For example, when performing a voice recognition operation, if sound sources whose volume is less than 80 decibels keep the recognition error rate below 10%, the first volume threshold can be set to 80 decibels.
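- A minimal predicate for the first sound source condition might look like the sketch below. The field names and threshold values (volume, frequency range, allowed types, volume stability) are illustrative assumptions; the text only requires that they be tuned so interference stays below the preset index, and it allows any one or more of the sub-conditions, whereas this sketch checks all four for simplicity.

```python
def meets_first_sound_source_condition(area,
                                       volume_threshold_db=80.0,
                                       freq_range_hz=(20.0, 200.0),
                                       allowed_types=("ambient",),
                                       max_volume_change_db=5.0):
    """Check whether an area satisfies the first sound source condition.

    `area` is assumed to be a dict like
        {"volume_db": ..., "freq_hz": ..., "type": ..., "volume_change_db": ...}
    with measurements taken from the sound source distribution information.
    """
    return (area["volume_db"] < volume_threshold_db
            and freq_range_hz[0] <= area["freq_hz"] <= freq_range_hz[1]
            and area["type"] in allowed_types
            and area["volume_change_db"] < max_volume_change_db)

if __name__ == "__main__":
    print(meets_first_sound_source_condition(
        {"volume_db": 60, "freq_hz": 30, "type": "ambient",
         "volume_change_db": 2}))
```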
- the first target area is an area that meets the first sound source condition, and controlling the movement of the movable platform so that the orientation of the audio collection device, the target object and the first target area meet the preset azimuth relationship includes: controlling the movement of the movable platform so that the direction of the audio collection device points to the target object and the first target area.
- controlling the movement of the movable platform so that the direction of the audio collection device points to the target object and the first target area includes: controlling the movable platform to move to a first direction of the target object, where the first direction is the reverse of the direction from the target to the first target area.
- the sound source condition includes a second sound source condition
- the second sound source condition includes one or more of the following conditions: the sound source volume in the area is less than a second volume threshold; the frequency of the sound source in the area belongs to a second preset frequency range; the type of the sound source in the area belongs to a second preset category; the volume change of the sound source in the area within a preset second time period is less than a second preset threshold.
- the specific second volume threshold, second preset frequency range, second preset type, second preset threshold and other condition values can be set to values under which the interference with the voice recognition operation performed on the sound acquired from the target object exceeds a preset index; that is, the area may contain a noise source that affects recognition of the sound emitted by the target.
- for example, the second preset frequency range can be set to 20-50 Hz.
- the second target area is an area that meets the conditions of the second sound source.
- controlling the movement of the movable platform so that the orientation of the audio collection device, the target object and the second target area satisfy a preset orientation relationship includes: controlling the movable platform to move so that the direction of the audio collection device points to the target while the second target area lies away from the direction of the audio collection device.
- controlling the movement of the movable platform so that the direction of the audio collection device points to the target object and away from the second target area includes: controlling the movable platform to move to a first direction of the target object, where the first direction is the direction from the target to the second target area.
- the direction of the audio collecting device points towards the target and away from the second target area, so as to reduce the influence of noise sources in the second target area.
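- The placement rules for the two cases can be summarized in a small geometric sketch: for a quiet first target area the platform moves to the bearing opposite that area, and for a noisy second target area it moves to the same bearing as that area, so the pickup direction points at the target and away from the noise. The coordinates and stop distance below are illustrative.

```python
import math

def plan_platform_bearing(area_center_deg, area_is_quiet):
    """Pick the bearing (relative to the target) the platform should move to.

    Quiet first target area: go to the opposite bearing, so the pickup
    direction passes through the target toward the quiet area.
    Noisy second target area: go to the same bearing, so the pickup
    direction points at the target and away from the noise.
    """
    if area_is_quiet:
        return (area_center_deg + 180.0) % 360.0   # reverse direction
    return area_center_deg % 360.0                 # same direction as noise

def bearing_to_goal_xy(target_xy, bearing_deg, stop_distance_m=2.0):
    """Convert a bearing around the target into a goal position."""
    tx, ty = target_xy
    rad = math.radians(bearing_deg)
    return (tx + stop_distance_m * math.cos(rad),
            ty + stop_distance_m * math.sin(rad))

if __name__ == "__main__":
    # Quiet area centered at 30° around a user at the origin.
    b = plan_platform_bearing(30.0, area_is_quiet=True)
    print(b, bearing_to_goal_xy((0.0, 0.0), b))
```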
- the method further includes: identifying the sounding side of the target object; and controlling the movement of the movable platform so that the direction of the audio device points to the sounding side of the target object.
- the orientation of the audio device may be the same as the sound pickup direction.
- the movable platform is a voice interactive robot, and the audio device is mounted on the front of the voice interactive robot.
- the front of the voice interactive robot faces the user
- its audio device faces the user
- the sound pickup direction also points to the user.
- an audio device including a microphone array is mounted on the front of a voice interactive robot.
- the front of the voice interactive robot faces the user
- its audio device faces the user
- the pickup direction may point to the user according to the adjustment of the microphone array parameters, or it may point to another target next to the user.
- FIG. 8 is a control device 800 provided by an embodiment of this application.
- the device 800 may include a memory 810 and a processor 820.
- the memory 810 is used to store program codes.
- the processor 820 calls the program code, and when the program code is executed, is configured to perform the following operations:
- a target area that meets the sound source condition around the target is determined.
- the movement of the movable platform is controlled so that the direction of the pickup beam of the audio collecting device included in the movable platform, the target object and the target area meet a preset azimuth relationship.
- the sound source distribution information is acquired based on the audio collection device.
- the processor 820 is further configured to: adjust the direction of the pickup beam so that the direction of the pickup beam corresponds to different directions around the target; and acquire the sound source distribution information based on the sound information obtained by the audio collection device under the different pickup beam directions.
- the audio collection device includes a microphone array, and the direction of the pickup beam is adjusted based on the signal weight of each sound pickup unit in the microphone array.
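- As one common way to realize such per-unit signal weighting, a textbook narrowband delay-and-sum sketch for a uniform linear array is shown below; it is an assumed illustration, not the weighting scheme actually used by the platform, which this document does not detail.

```python
import numpy as np

def steering_weights(num_mics, mic_spacing_m, steer_deg,
                     freq_hz=1000.0, speed_of_sound=343.0):
    """Complex per-microphone weights that steer a narrowband
    delay-and-sum beam of a uniform linear array toward steer_deg
    (0° = broadside)."""
    k = 2.0 * np.pi * freq_hz / speed_of_sound       # wavenumber
    positions = np.arange(num_mics) * mic_spacing_m  # microphone x-positions
    # Phase each channel so signals arriving from steer_deg add in phase.
    return np.exp(-1j * k * positions * np.sin(np.radians(steer_deg))) / num_mics

def beamform(frame, weights):
    """Apply the weights to one frame shaped (num_mics, num_samples)."""
    return np.real(weights[:, None] * frame).sum(axis=0)

if __name__ == "__main__":
    w = steering_weights(num_mics=4, mic_spacing_m=0.05, steer_deg=30.0)
    frame = np.random.randn(4, 256)                  # fake microphone signals
    print(w.shape, beamform(frame, w).shape)
```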
- the direction of the pickup beam is adjusted based on the pose of the movable platform.
- the sound source distribution information includes one or more of the following information: position information of the sound source, volume information of the sound source, and type information of the sound source .
- the movable platform is equipped with a visual acquisition device, and the sound source distribution information is acquired based on the visual acquisition device.
- the processor 820 is further configured to: determine a plurality of primary candidate regions around the target, where the plurality of primary candidate regions are respectively located in different directions of the target; and determine, based on the sound source distribution information, the target area that meets the sound source condition among the plurality of primary candidate areas.
- the directions of any two adjacent primary candidate regions relative to the target object differ by a first predetermined angle.
- the processor 820 is further configured to: if there is no target area that meets the sound source condition among the plurality of primary candidate areas, determine a plurality of secondary candidate areas around the target, where the plurality of secondary candidate areas are respectively located in different orientations of the target, the directions of any two adjacent secondary candidate areas with respect to the target differ by a second preset angle, and the second preset angle is smaller than the first preset angle; and determine, based on the sound source distribution information, a target area that meets the sound source condition among the plurality of secondary candidate areas.
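- The coarse-to-fine candidate search can be sketched as follows, assuming a 60° first preset angle, a 30° second preset angle, and a stand-in predicate for the sound source condition.

```python
def find_target_area(sound_ok, first_angle_deg=60, second_angle_deg=30):
    """Coarse-to-fine search for an area meeting the sound source condition.

    sound_ok(bearing_deg, span_deg) is a stand-in predicate that reports
    whether the sector centered at bearing_deg with width span_deg meets
    the condition; the real check would use the measured sound source
    distribution information.
    """
    def candidates(step):
        return [(b, step) for b in range(0, 360, step)]

    # Primary candidate areas, spaced by the first preset angle.
    for bearing, span in candidates(first_angle_deg):
        if sound_ok(bearing, span):
            return bearing, span
    # None qualified: retry with finer, more numerous secondary areas.
    for bearing, span in candidates(second_angle_deg):
        if sound_ok(bearing, span):
            return bearing, span
    return None                                   # no target area exists

if __name__ == "__main__":
    noisy_bearings = {0, 60, 120, 180, 240, 300}  # every coarse sector noisy
    ok = lambda b, s: b not in noisy_bearings     # finer sectors may pass
    print(find_target_area(ok))
```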
- the sound source condition includes one or more of the following conditions: the sound source volume in the area is less than a volume threshold; the sound source frequency in the area belongs to a preset frequency range; The type of sound source belongs to the preset category; the volume change of the sound source in the area within the preset first duration is less than the preset threshold.
- the processor 820 is further configured to: control the movement of the movable platform so that the direction of the pickup beam points to the target object and the target area.
- the processor 820 is further configured to: control the movable platform to move to a first direction of the target, where the first direction is the reverse of the direction from the target to the target area.
- the processor 820 is further configured to: issue a prompt message for prompting that the sound-producing side of the target object faces the movable platform.
- the processor 820 is further configured to: if the target area does not exist, send a prompt message for prompting the target to increase the volume of sound.
- the processor 820 is further configured to: determine whether the voice recognition instruction in the sound emitted by the target is a preset instruction; and if the voice recognition instruction is the preset instruction, acquire the sound source distribution information around the target object.
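- The gating described here can be sketched as a simple check before triggering the scan; the preset instruction list and the acquisition callable below are purely illustrative assumptions.

```python
def maybe_acquire_distribution(recognized_text, acquire_fn,
                               preset_instructions=("hello robot",)):
    """Only scan the surroundings when the recognized instruction matches a
    preset instruction (a wake phrase), as described above.

    acquire_fn stands in for the platform's sound-source-distribution
    acquisition; the preset instruction list is purely illustrative.
    """
    if recognized_text.strip().lower() in preset_instructions:
        return acquire_fn()
    return None   # ignore non-matching speech

if __name__ == "__main__":
    print(maybe_acquire_distribution("Hello Robot",
                                     acquire_fn=lambda: {"A": 50, "B": 82}))
```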
- the processor 820 is further configured to: control the movable platform to move to a position where the distance from the target object falls within a preset distance interval.
- the processor 820 is further configured to: identify the sound-producing side of the target; control the movement of the movable platform, so that the sound pickup beam of the audio device is directed to the target object. The sounding side of the target object.
- the embodiment of the present invention also provides a control device, which includes a memory and a processor;
- the memory is used to store program code
- the processor calls the program code, and when the program code is executed, it is used to perform the following operations:
- the movement of the movable platform is controlled so that the orientation of the audio collection device, the target object, and the target area meet a preset azimuth relationship.
- FIG. 9 is a movable platform 900 provided by an embodiment of the application.
- the movable platform 900 may include an audio/video collection device 910 and at least one processor 920.
- the audio/video acquisition device 910 is used to acquire the sound emitted by the target object.
- the at least one processor 920, individually or collectively, is used to: obtain sound source distribution information around the target; determine, according to the sound source distribution information, a target area around the target that meets the sound source condition; and control the movement of the movable platform so that the direction of the pickup beam of the audio collection device, the target object and the target area meet a preset azimuth relationship.
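- Tying the three operations together, one pass of the pipeline might be sketched as below, with all components passed in as stand-ins for the platform's real acquisition, area-selection and motion modules.

```python
def control_step(acquire_distribution, pick_target_area, move_platform):
    """One pass of the pipeline run by the processor(s): acquire the sound
    source distribution, pick the target area, then move the platform so
    the pickup beam, target object and target area satisfy the preset
    azimuth relationship. All three callables are stand-ins.
    """
    distribution = acquire_distribution()
    target_area = pick_target_area(distribution)
    if target_area is None:
        return "prompt target to increase volume"
    return move_platform(target_area)

if __name__ == "__main__":
    print(control_step(
        acquire_distribution=lambda: {"A": 50, "B": 82, "C": 91},
        pick_target_area=lambda d: (min(d, key=d.get)
                                    if min(d.values()) < 80 else None),
        move_platform=lambda area: f"moving so the beam covers area {area}"))
```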
- the at least one processor 920 is further configured to: adjust the direction of the pickup beam so that the direction of the pickup beam corresponds to different directions around the target;
- the sound source distribution information is acquired based on the sound information acquired by the audio collection device under the different pickup beam directions.
- the audio collection device includes a microphone array, and the direction of the pickup beam is adjusted based on the signal weight of each sound pickup unit in the microphone array.
- the direction of the pickup beam is adjusted based on the pose of the movable platform.
- the sound source distribution information includes one or more of the following information: position information of the sound source, volume information of the sound source, and type information of the sound source .
- the at least one processor 920 is further configured to: determine a plurality of primary candidate regions around the target, where the plurality of primary candidate regions are respectively located in different orientations of the target; and determine, based on the sound source distribution information, a target area that meets the sound source condition among the plurality of primary candidate areas.
- the directions of any two adjacent primary candidate regions relative to the target are different by a first preset angle.
- the at least one processor 920 is further configured to: if there is no target area that meets the sound source condition among the plurality of primary candidate areas, determine a plurality of secondary candidate areas around the target object, where the plurality of secondary candidate areas are respectively located in different orientations of the target object, the directions of any two adjacent secondary candidate areas with respect to the target object differ by a second preset angle, and the second preset angle is smaller than the first preset angle; and determine, based on the sound source distribution information, a target area that meets the sound source condition among the plurality of secondary candidate areas.
- the sound source condition includes one or more of the following conditions: the sound source volume in the area is less than a volume threshold; the sound source frequency in the area belongs to a preset frequency range; The type of sound source belongs to the preset category; the volume change of the sound source in the area within the preset first duration is less than the preset threshold.
- the at least one processor 920 is further configured to: control the movement of the movable platform so that the direction of the pickup beam points to the target object and the target area.
- the at least one processor 920 is further configured to: control the movable platform to move to a first direction of the target, where the first direction is the reverse of the direction from the target to the target area.
- the at least one processor 920 is further configured to: send out prompt information for prompting that the sounding side of the target object faces the movable platform.
- the at least one processor 920 is further configured to: if the target area does not exist, send out prompt information for prompting the target to increase the volume of sound.
- the at least one processor 920 is further configured to: determine whether the voice recognition instruction in the sound emitted by the target is a preset instruction; and if the voice recognition instruction is the preset instruction, acquire the sound source distribution information around the target object.
- the at least one processor 920 is further configured to: control the movable platform to move to a position where the distance from the target object falls within a preset distance interval.
- the at least one processor 920 is further configured to: identify the sound-producing side of the target; control the movement of the movable platform to make the sound pickup beam of the audio device Point to the utterance side of the target.
- the embodiment of the present application also provides a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are configured to execute any one of the above-mentioned control methods 200 or 600.
- the embodiments of the present application also provide a computer program product.
- the computer program product includes a computer program stored on a computer-readable storage medium.
- the computer program includes program instructions; when the program instructions are executed by a computer, the computer is caused to execute any one of the aforementioned control methods 200 or 600.
- Fig. 10 is a schematic structural diagram of a control device provided by still another embodiment of the present application.
- the control device 1000 shown in FIG. 10 includes a processor 1010, and the processor 1010 can call and run a computer program from a memory to implement the method described in the embodiment of the present application.
- control device 1000 may further include a memory 1020.
- the processor 1010 can call and run a computer program from the memory 1020 to implement the method in the embodiment of the present application.
- the memory 1020 may be a separate device independent of the processor 1010, or may be integrated in the processor 1010.
- control device 1000 may further include a transceiver 1030, and the processor 1010 may control the transceiver 1030 to communicate with other devices; specifically, it may send information or data to other devices, or receive information or data sent by other devices.
- control device may be, for example, a robot, a smart speaker, etc.
- control device 1000 can implement the corresponding processes in the various methods in the embodiments of the present application. For the sake of brevity, details are not described herein again.
- FIG. 11 is a schematic structural diagram of a chip of an embodiment of the present application.
- the chip 1100 shown in FIG. 11 includes a processor 1110, and the processor 1110 can call and run a computer program from the memory to implement the method in the embodiment of the present application.
- the chip 1100 may further include a memory 1120.
- the processor 1110 can call and run a computer program from the memory 1120 to implement the method in the embodiment of the present application.
- the memory 1120 may be a separate device independent of the processor 1110, or may be integrated in the processor 1110.
- the chip 1100 may further include an input interface 1130.
- the processor 1110 can control the input interface 1130 to communicate with other devices or chips, and specifically, can obtain information or data sent by other devices or chips.
- the chip 1100 may further include an output interface 1140.
- the processor 1110 can control the output interface 1140 to communicate with other devices or chips, and specifically, can output information or data to other devices or chips.
- chips mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
- the processor of the embodiment of the present application may be an integrated circuit chip with signal processing capability.
- the steps of the foregoing method embodiments can be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
- the above-mentioned processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
- the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
- the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
- the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), and electrically available Erase programmable read-only memory (Electrically EPROM, EEPROM) or flash memory.
- the volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache.
- the memory in the embodiment of the present application may also be static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), direct Rambus random access memory (Direct Rambus RAM, DR RAM), and so on. That is to say, the memory in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
- the memory in the embodiment of the present application can provide instructions and data to the processor.
- a part of the memory may also include a non-volatile random access memory.
- the memory can also store device type information.
- the processor may be used to execute instructions stored in the memory, and when the processor executes the instructions, the processor may execute each step corresponding to the terminal device in the foregoing method embodiment.
- each step of the above method can be completed by an integrated logic circuit of hardware in the processor or instructions in the form of software.
- the steps of the method disclosed in combination with the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
- the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
- the storage medium is located in the memory, and the processor executes the instructions in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, it will not be described in detail here.
- the pixels in the image can be located in different rows and/or columns, where the length of A can correspond to the number of pixels located in the same row included in A, and the height of A can correspond to the number of pixels located in the same column included in A.
- the length and height of A may also be referred to as the width and depth of A, which are not limited in the embodiment of the present application.
- being distributed at an interval from the boundary of A can mean being spaced by at least one pixel from the boundary of A, and can also be described as "not adjacent to the boundary of A" or "not located at the boundary of A", which is not limited in this embodiment of the application; here, A can be an image, a rectangular area, a sub-image, and so on.
- the term "and/or” is merely an association relationship describing an associated object, indicating that there may be three relationships.
- A and/or B can mean that: A exists alone, A and B exist at the same time, or B exists alone.
- the character "/" in this text generally indicates that the associated objects before and after are in an "or" relationship.
- the disclosed system, device, and method can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the technical solution of this application essentially, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method described in each embodiment of the present application.
- the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, and other media that can store program codes.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention relates to a control method and apparatus, and a movable platform. The method comprises: acquiring distribution information of sound sources around a target object; determining, according to the sound source distribution information, a target area around the target object that meets a sound source condition; and controlling the movement of a movable platform so that the pointing direction of a pickup beam of an audio collection apparatus, the target object and the target area satisfy a preset orientation relationship. According to the control method proposed in the present invention, during the process of controlling the movement of the movable platform, the movable platform can move on the basis of the determined target area, so that the audio collection apparatus included in the movable platform can acquire, as far as possible, the information of the sound produced by the target object, and attenuate or shield, as far as possible, the acquisition of information of sounds produced by objects other than the target object; in addition, the signal-to-noise ratio of the acquired sounds produced by the target object can be improved.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201980048649.1A CN112470215A (zh) | 2019-12-03 | 2019-12-03 | 控制方法、装置和可移动平台 |
PCT/CN2019/122726 WO2021108991A1 (fr) | 2019-12-03 | 2019-12-03 | Procédé et appareil de commande, et plateforme mobile |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/122726 WO2021108991A1 (fr) | 2019-12-03 | 2019-12-03 | Procédé et appareil de commande, et plateforme mobile |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021108991A1 true WO2021108991A1 (fr) | 2021-06-10 |
Family
ID=74807693
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/122726 WO2021108991A1 (fr) | 2019-12-03 | 2019-12-03 | Procédé et appareil de commande, et plateforme mobile |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112470215A (fr) |
WO (1) | WO2021108991A1 (fr) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113747303B (zh) * | 2021-09-06 | 2023-11-10 | 上海科技大学 | 定向声束耳语交互系统、控制方法、控制终端及介质 |
CN114242072A (zh) * | 2021-12-21 | 2022-03-25 | 上海帝图信息科技有限公司 | 一种用于智能机器人的语音识别系统 |
CN114516061B (zh) * | 2022-02-25 | 2024-03-05 | 杭州萤石软件有限公司 | 一种机器人控制方法、机器人系统及一种机器人 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106708047A (zh) * | 2016-12-21 | 2017-05-24 | 精效新软新技术(北京)有限公司 | 智能物品投递机器人装置及其控制方法 |
WO2018023232A1 (fr) * | 2016-07-31 | 2018-02-08 | 杨洁 | Procédé permettant de déplacer un robot en fonction d'un son, et robot |
CN108496128A (zh) * | 2016-01-28 | 2018-09-04 | 高通股份有限公司 | 无人机飞行控制 |
CN108828599A (zh) * | 2018-04-06 | 2018-11-16 | 东莞市华睿电子科技有限公司 | 一种基于救援无人机的受灾人员搜寻方法 |
CN109144092A (zh) * | 2017-06-16 | 2019-01-04 | 昊翔电能运动科技(昆山)有限公司 | 无人机飞行辅助方法、装置及无人机 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE10224816A1 (de) * | 2002-06-05 | 2003-12-24 | Philips Intellectual Property | Eine mobile Einheit und ein Verfahren zur Steuerung einer mobilen Einheit |
CN102137318B (zh) * | 2010-01-22 | 2014-08-20 | 华为终端有限公司 | 拾音控制方法和装置 |
JP6977448B2 (ja) * | 2017-09-27 | 2021-12-08 | 沖電気工業株式会社 | 機器制御装置、機器制御プログラム、機器制御方法、対話装置、及びコミュニケーションシステム |
CN108917113A (zh) * | 2018-08-01 | 2018-11-30 | 珠海格力电器股份有限公司 | 辅助语音控制方法、装置以及空调 |
CN109286875B (zh) * | 2018-09-29 | 2021-01-01 | 百度在线网络技术(北京)有限公司 | 用于定向拾音的方法、装置、电子设备和存储介质 |
CN110085258B (zh) * | 2019-04-02 | 2023-11-14 | 深圳Tcl新技术有限公司 | 一种提高远场语音识别率的方法、系统及可读存储介质 |
-
2019
- 2019-12-03 WO PCT/CN2019/122726 patent/WO2021108991A1/fr active Application Filing
- 2019-12-03 CN CN201980048649.1A patent/CN112470215A/zh active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108496128A (zh) * | 2016-01-28 | 2018-09-04 | 高通股份有限公司 | 无人机飞行控制 |
WO2018023232A1 (fr) * | 2016-07-31 | 2018-02-08 | 杨洁 | Procédé permettant de déplacer un robot en fonction d'un son, et robot |
CN106708047A (zh) * | 2016-12-21 | 2017-05-24 | 精效新软新技术(北京)有限公司 | 智能物品投递机器人装置及其控制方法 |
CN109144092A (zh) * | 2017-06-16 | 2019-01-04 | 昊翔电能运动科技(昆山)有限公司 | 无人机飞行辅助方法、装置及无人机 |
CN108828599A (zh) * | 2018-04-06 | 2018-11-16 | 东莞市华睿电子科技有限公司 | 一种基于救援无人机的受灾人员搜寻方法 |
Also Published As
Publication number | Publication date |
---|---|
CN112470215A (zh) | 2021-03-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021108991A1 (fr) | Procédé et appareil de commande, et plateforme mobile | |
CN108831474B (zh) | 语音识别设备及其语音信号捕获方法、装置和存储介质 | |
CN109571499A (zh) | 一种智能导航引领机器人及其实现方法 | |
TWI711035B (zh) | 方位角估計的方法、設備、語音交互系統及儲存介質 | |
CN106826846B (zh) | 基于异常声音和图像事件驱动的智能服务机器人及方法 | |
KR102476600B1 (ko) | 전자 장치, 그의 음성 인식 방법 및 비일시적 컴퓨터 판독가능 기록매체 | |
JP4460528B2 (ja) | 識別対象識別装置およびそれを備えたロボット | |
CN102023703B (zh) | 组合唇读与语音识别的多模式界面系统 | |
CN108725452B (zh) | 一种基于全声频感知的无人驾驶车辆控制系统及控制方法 | |
CN111833899B (zh) | 一种基于多音区的语音检测方法、相关装置及存储介质 | |
CN107277260A (zh) | 一种情景模式调整方法、装置和移动终端 | |
CN108375986A (zh) | 无人机的控制方法、装置及终端 | |
CN111930336A (zh) | 音频设备的音量调节方法、设备及存储介质 | |
CN111090412B (zh) | 一种音量调节方法、装置及音频设备 | |
CN110188179B (zh) | 语音定向识别交互方法、装置、设备及介质 | |
KR20230027252A (ko) | 차량 캐빈에서의 음성 명령 제어 방법 및 관련 디바이스 | |
US20230367319A1 (en) | Intelligent obstacle avoidance method and apparatus based on binocular vision, and non-transitory computer-readable storage medium | |
CN110784523B (zh) | 一种目标物信息的推送方法及装置 | |
CN109061655B (zh) | 一种智能驾驶车辆全声频感知系统及其智能控制方法 | |
CN115831141B (zh) | 车载语音的降噪方法、装置、车辆及存储介质 | |
CN111103807A (zh) | 一种家用终端设备的控制方法及装置 | |
US20210065732A1 (en) | Noise manageable electronic apparatus and method for controlling the same | |
KR102495019B1 (ko) | 동물소리 인식 사운드장치 | |
KR20220113619A (ko) | 주차 및 주행시 차량용 블랙박스 이벤트 감지시스템 및 이벤트 학습방법 | |
CN113496697B (zh) | 机器人、语音数据处理方法、装置以及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19954977 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19954977 Country of ref document: EP Kind code of ref document: A1 |