CN108628572B - Method and device for adjusting volume by robot, computer equipment and storage medium - Google Patents

Method and device for adjusting volume by robot, computer equipment and storage medium Download PDF

Info

Publication number
CN108628572B
CN108628572B (application CN201810314093.3A)
Authority
CN
China
Prior art keywords
user
robot
distance
image
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810314093.3A
Other languages
Chinese (zh)
Other versions
CN108628572A (en)
Inventor
周宸 (Zhou Chen)
周宝 (Zhou Bao)
王健宗 (Wang Jianzong)
肖京 (Xiao Jing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810314093.3A priority Critical patent/CN108628572B/en
Priority to PCT/CN2018/102853 priority patent/WO2019196312A1/en
Publication of CN108628572A publication Critical patent/CN108628572A/en
Application granted granted Critical
Publication of CN108628572B publication Critical patent/CN108628572B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/165Management of the audio stream, e.g. setting of volume, audio stream path
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris

Abstract

The invention provides a method and a device for automatically adjusting the volume of a robot. The robot is provided with a camera, a loudspeaker, and an environment microphone for collecting environment sound; the loudspeaker volume corresponding to a predefined first user of height H at a distance D from the robot is V. The method comprises the following steps: acquiring an image through the camera and detecting image features of a second user in the image; calculating the height h of the second user and the distance d relative to the robot according to those image features; determining the height gain k_h according to the relation between h and H, and the distance gain k_d according to the relation between d and D; collecting the environment volume through the environment microphone to obtain an environment noise value v_e, and determining the corresponding environment gain k_e from v_e according to a preset correspondence; and determining the loudspeaker volume

V_m = k_h · k_d · k_e · V.

In this way the robot can intelligently adjust the loudspeaker volume according to the actual situation, improving interaction efficiency and user experience. A computer device and a storage medium are also provided.

Description

Method and device for adjusting volume by robot, computer equipment and storage medium
Technical Field
The invention relates to the technical field of robots, in particular to a method and a device for automatically adjusting volume by a robot, a computer device and a storage medium with computer readable instructions stored therein.
Background
Current service robots generally use a fixed volume for functions such as voice conversation and video playback. Various factors, such as crowd noise or other audio equipment, can raise the ambient noise to a high decibel level, making it difficult for the user to hear the robot, which degrades interaction efficiency and worsens the user experience.
Disclosure of Invention
The object of the present invention is to solve at least one of the above-mentioned technical drawbacks, in particular the technical drawback of poor interaction efficiency.
The invention provides a method for a robot to automatically adjust its volume. The robot is provided with a camera, a loudspeaker and an environment microphone for collecting environment sound; the loudspeaker volume corresponding to a predefined first user of height H at a distance D from the robot is V. The method comprises the following steps:

acquiring an image through the camera, detecting image features of a second user in the image, calculating the height h of the second user and the distance d relative to the robot according to those image features, determining the height gain k_h according to the relation between h and H, and determining the distance gain k_d according to the relation between d and D;

collecting the environment volume through the environment microphone to obtain an environment noise value v_e, and determining the corresponding environment gain k_e according to a preset correspondence based on v_e;

determining the loudspeaker volume V_m = k_h · k_d · k_e · V according to k_h, k_d, k_e and V.
In one embodiment, the second-user image feature comprises the portrait interpupillary distance. If it is predefined that the first user's portrait interpupillary distance in the image is A1 at a distance D1 from the robot, and A2 at a distance D2 from the robot, then the distance d of the second user from the robot is calculated by the following formula:

d = k(a − A1) + D1,

where k = (D2 − D1)/(A2 − A1), and a is the portrait interpupillary distance of the second user in the image.
In one embodiment, the second-user image feature includes the portrait interpupillary distance. If the real interpupillary distance of the first user is predefined to be C, the second-user height h is calculated by the following formula:

h = H1 + Δh · C / a,

where H1 is the camera height, Δh is the pixel difference between the center of the detected face rectangle and the image center, and a is the portrait interpupillary distance of the second user in the image.
In one embodiment, the environment microphones comprise at least a first microphone and a second microphone positioned on the two sides of the robot. Collecting the environment volume through the environment microphones to obtain the environment noise value v_e comprises the following steps:

collecting the environment volume through the first microphone to obtain a first environment noise value v_1, collecting the environment volume through the second microphone to obtain a second environment noise value v_2, and determining the larger of v_1 and v_2 as the environment noise value v_e.
In one embodiment, the height gain k_h = (h − Δ)/(H − Δ), where Δ is the speaker height.
In one embodiment, the distance gain k_d = d/D.
The invention also provides a device for a robot to automatically adjust its volume. The robot is provided with a camera, a loudspeaker and an environment microphone for collecting environment sound; the loudspeaker volume corresponding to a predefined first user of height H at a distance D from the robot is V. The device comprises:

a first calculation module, configured to acquire an image through the camera, detect second-user image features in the image, calculate the height h of the second user and the distance d relative to the robot according to those features, determine the height gain k_h according to the relation between h and H, and determine the distance gain k_d according to the relation between d and D;

a second calculation module, configured to collect the environment volume through the environment microphone to obtain the environment noise value v_e, and determine the corresponding environment gain k_e according to a preset correspondence; and

a volume calculation module, configured to determine the loudspeaker volume V_m = k_h · k_d · k_e · V according to k_h, k_d, k_e and V.
In one embodiment, the second-user image features comprise the portrait interpupillary distance. If it is predefined that the first user's portrait interpupillary distance in the image is A1 at a distance D1 from the robot, and A2 at a distance D2 from the robot, then the first calculation module calculates the distance d of the second user relative to the robot by the following formula:

d = k(a − A1) + D1,

where k = (D2 − D1)/(A2 − A1), and a is the portrait interpupillary distance of the second user in the image.
The invention also provides a computer device, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the method for automatically adjusting the sound volume by the robot in any one of the embodiments.
The present invention also provides a storage medium storing computer readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps of the method for automatically adjusting the volume of a robot according to any one of the embodiments.
The method, device, computer equipment and storage medium for automatically adjusting the volume of a robot are provided. The robot is provided with a camera, a loudspeaker and an environment microphone for collecting environment sound; the loudspeaker volume corresponding to a predefined first user of height H at a distance D from the robot is V. The method comprises the following steps: acquiring an image through the camera and detecting image features of a second user in the image; calculating the height h of the second user and the distance d relative to the robot according to those features; determining the height gain k_h according to the relation between h and H, and the distance gain k_d according to the relation between d and D; collecting the environment volume through the environment microphone to obtain the environment noise value v_e, and determining the corresponding environment gain k_e according to a preset correspondence; and determining the loudspeaker volume V_m = k_h · k_d · k_e · V according to k_h, k_d, k_e and V.

By estimating the user's height h and the user's distance d relative to the robot, and combining these with the environment noise value v_e measured by the environment microphone, the loudspeaker volume V_m is determined, so that the robot can intelligently adjust the loudspeaker volume according to the actual situation, provide the most suitable volume for the user in any environment, and improve interaction efficiency and user experience.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram showing an internal configuration of a computer device according to an embodiment;
FIG. 2 is a flow chart illustrating a method for automatically adjusting volume by a robot according to an embodiment;
FIG. 3 is a top plan view of the spatial position between the robot and the user of one embodiment;
fig. 4 is a schematic block diagram of an apparatus for automatically adjusting volume by a robot according to an embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Fig. 1 is a schematic diagram of the internal structure of a computer device according to an embodiment. As shown in fig. 1, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store control information sequences, and the computer readable instructions, when executed by the processor, can cause the processor to implement a method for automatically adjusting the volume of a robot. The processor of the computer device provides the calculation and control capability supporting the operation of the whole computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform a method for robotic automatic volume adjustment. The network interface of the computer device is used for connecting and communicating with a terminal. Those skilled in the art will appreciate that the architecture shown in fig. 1 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than those shown, combine certain components, or arrange components differently.
The method for automatically adjusting the volume by the robot described below can be applied to an intelligent robot, such as a customer service robot, a child education robot, and the like.
Fig. 2 is a flowchart illustrating a method for automatically adjusting volume by a robot according to an embodiment.
The invention provides a method for a robot to automatically adjust its volume. The robot is provided with a camera, a loudspeaker and an environment microphone for collecting environment sound (in addition to the microphone that collects the user's voice); the loudspeaker volume corresponding to a predefined first user of height H at a distance D from the robot is V. The method comprises the following steps:
step S100: acquiring an image through the camera, detecting image characteristics of a second user in the image, calculating the height H of the second user and the distance d relative to the robot according to the image characteristics of the second user, and determining the height gain k according to the relation between H and HhAnd said relation of D to D determines the distance gain kd
A face detection method may be used to detect the second user in the image.
Since the robot's camera may capture multiple faces, some of which are merely background figures that do not interact (e.g. converse) with the robot, only the person facing the camera and conversing with the robot may need to be considered. The camera is typically arranged in the direction of the robot facing the user, for example at the forehead or face if the robot has a head; if the robot has a torso, it may also be positioned at the front of the torso. The installation position of the camera is not limited, as long as the second user can be captured when conversing with the robot.
For a camera, an image (picture or video frame) obtained by shooting is of a fixed size, a preset rectangular position can be defined at the center position of the picture to serve as a face recognition area, and face detection is only performed in the face recognition area. For example, assuming that a picture of 1920 × 1080 size is used, a 1000 × 1000 rectangular position may be defined as a face recognition area at the picture center position.
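As an illustrative sketch of the fixed recognition area described above (the frame and region sizes mirror the 1920 × 1080 / 1000 × 1000 example; the helper name is ours, not from the patent):

```python
def face_recognition_region(frame_w=1920, frame_h=1080,
                            region_w=1000, region_h=1000):
    """Return (left, top, right, bottom) of a rectangle centered in the
    frame; face detection is run only inside this region."""
    left = (frame_w - region_w) // 2
    top = (frame_h - region_h) // 2
    return left, top, left + region_w, top + region_h
```

Detection would then be run only on the sub-image delimited by this rectangle, which skips background faces near the frame edges.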
Face detection, i.e. the technique of detecting faces present in an image by image analysis and accurately framing each face position with a rectangular box, is the basis of facial feature point detection and face recognition. A common face detection benchmark is FDDB (Face Detection Data Set and Benchmark). With the rapid development of deep learning in recent years, many excellent face detection methods have emerged.
For example, many excellent face detection methods have been submitted to the FDDB benchmark, such as the cascade-CNN method "A Convolutional Neural Network Cascade for Face Detection", the improved Faster R-CNN method "Face Detection Using Deep Learning: An Improved Faster RCNN Approach", and "Finding Tiny Faces", which is very successful at detecting small faces. In addition, libraries such as OpenCV, Dlib and libfacedetect also provide face detection interfaces.
Commonly used face detection methods include:
1. the single-CNN face detection method;
2. the cascade-CNN face detection method;
3. the OpenCV face detection method;
4. the Dlib face detection method;
5. the libfacedetect face detection method;
6. the SeetaFace face detection method.
The following briefly introduces a single CNN face detection method.
First, a classifier is trained to distinguish face from non-face. For example, a convolutional neural network is used for binary classification; a model pre-trained on the ImageNet dataset can be fine-tuned with a face dataset. A custom convolutional network can also be trained; to detect smaller face targets, a small convolutional neural network is generally adopted as the binary classification model, which reduces the input image size and speeds up prediction.
Then the fully connected layer of the trained face classification network is converted into a convolutional layer, so that the network becomes fully convolutional and can accept input images of any size. Passing an image through this fully convolutional network yields a feature map; each point on the feature map maps back to the probability that its receptive-field region in the original image belongs to a face, and regions whose face probability exceeds a set threshold are taken as face candidate boxes.
The size of a face in the image varies; to adapt to this variation, the best approach is to use an image pyramid, scaling the image to be detected to different sizes for multi-scale face detection. Non-maximum suppression (NMS) is then applied to all face candidate boxes detected at the multiple scales to obtain the final face detection result.
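The multi-scale detection described above ends with non-maximum suppression; a minimal, generic NMS sketch over (x1, y1, x2, y2) candidate boxes with scores might look like the following (the 0.5 IoU threshold is an assumed default, not taken from the patent):

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    # Keep the highest-scoring box, drop boxes that overlap it above
    # thresh, and repeat on the remainder.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

Heavily overlapping candidates from different pyramid scales thus collapse to a single detection per face.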
If the method for automatically adjusting the volume in this embodiment is applied on the Android system, the FaceDetector class can be used to determine whether a face appears in the image captured by the camera. Android has a built-in face recognition API, FaceDetector, which can perform face recognition with a small amount of code, but this is the most basic form of recognition: it can only locate faces in the image.
Face recognition on Android uses the underlying library android/external/neven/ and the framework layer frameworks/base/media/java/android/media/FaceDetector.java. Restrictions of the Java layer: 1. only data in Bitmap format is accepted; 2. only faces whose inter-eye distance is larger than 20 pixels can be recognized (this can be modified in the framework layer); 3. only the position of a face (the center point and the distance between the eyes) can be detected; faces cannot be matched against a specified face.
The main methods provided by the Neven library:
A. android.media.FaceDetector.FaceDetector(int width, int height, int maxFaces);
B. int android.media.FaceDetector.findFaces(Bitmap bitmap, Face[] faces).
On Android, the positions of the two eyes of a face image can be acquired through the FaceDetector class, and the position of the face on the screen is determined from the eye positions. The specific steps may be: obtain the midpoint between the two eyes of the face image, obtain the interpupillary distance of the face image, draw a rectangular area (rectangular box) from the eye midpoint and the interpupillary distance, and take this rectangular area as the position of the face image on the screen. The midpoint between the two eyes of the face image can be obtained through the corresponding code:
(The corresponding code listing appears only as images in the source and is not reproduced here.)
the human face detection method is not described herein again, and in this embodiment, the human face detection method is not limited. After the face is detected, the height H of the detected second user and the distance d relative to the robot are calculated, and the height gain k is determined according to the relation between H and HhAnd D in relation to D determines the distance gain kd
The distance d of the second user relative to the robot could also be determined from the readings of dedicated ranging sensors, for example infrared ranging with an infrared sensor or laser ranging with a laser sensor. In the present embodiment, d is determined by image analysis.
The calculation of d rests on two assumptions. First, for most people, interpupillary distance varies little (within about ±2 cm). Second, the range of distances at which a user converses with the robot is limited. The principle is to estimate the distance by comparing the interpupillary distance of the face in the captured picture against that in a calibration picture: the closer the user's face is to the camera, the larger the face appears, and this relationship is approximately linear.
In this embodiment, the second-user image features include the portrait interpupillary distance. If it is predefined that the first user's portrait interpupillary distance in the image is A1 at a distance D1 from the robot, and A2 at a distance D2 from the robot, then the distance d of the second user from the robot is calculated by the following formula:

d = k(a − A1) + D1,

where k = (D2 − D1)/(A2 − A1), and a is the portrait interpupillary distance of the second user detected during face detection, i.e. the interpupillary distance in the image.
Of course, other image analysis methods may also be used to calculate and determine d, for example, other calculation formulas may be used, and only the above two assumptions and principles need to be followed, which is not described herein.
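The calibration formula above can be sketched numerically as follows (the calibration pairs (A1, D1), (A2, D2) and the measured a are made-up illustration values; k = (D2 − D1)/(A2 − A1) follows from the two calibration points):

```python
def estimate_distance(a, A1, A2, D1, D2):
    """Estimate user distance d from the in-image interpupillary
    distance a (pixels), given two calibration pairs (A1, D1) and
    (A2, D2): d = k * (a - A1) + D1, k = (D2 - D1) / (A2 - A1)."""
    k = (D2 - D1) / (A2 - A1)
    return k * (a - A1) + D1
```

With A1 = 80 px at D1 = 1.0 m and A2 = 40 px at D2 = 2.0 m, a measured interpupillary distance of 60 px lands halfway, giving d = 1.5 m, consistent with the linear model.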
After d is determined, the distance gain k_d is determined from the relation between d and D. The distance gain k_d has a positive relationship with d, for example a direct proportion. In the present embodiment, k_d = d/D. Of course, other embodiments may use other formulas, e.g. k_d = d/D + m (where m is a preset coefficient), and so on, which are not described further here.
In this embodiment, if the real interpupillary distance of the first user is predefined to be C, and the corresponding portrait interpupillary distance in the image is a, the second-user height h is calculated by the following formula:

h = H1 + Δh · C / a,

where H1 is the camera height, and Δh is the pixel difference between the center of the detected face rectangle and the image center.
Similarly, other image analysis methods may also be used to calculate and determine h, for example, other calculation formulas may be used, and only the above two assumptions and principles need to be followed, which is not described herein.
After h is determined, the height gain k_h is determined from the relation between h and H. Likewise, k_h has a positive relationship with h, for example a direct proportion. In the present embodiment, k_h = (h − Δ)/(H − Δ), where Δ is the speaker height. Of course, in some embodiments the speaker height may be ignored, i.e. Δ = 0.
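A sketch of the height estimate and the two gains, assuming the height formula h = H1 + Δh · C / a (a reconstruction, since the original formula is rendered only as an image in the source) and illustrative calibration values:

```python
def estimate_height(H1, delta_h_px, C, a):
    # Pixel offsets are converted to real length with the scale C / a
    # (real interpupillary distance over in-image interpupillary distance).
    return H1 + delta_h_px * C / a

def height_gain(h, H, speaker_height=0.0):
    # k_h = (h - delta) / (H - delta); delta may be ignored (delta = 0).
    return (h - speaker_height) / (H - speaker_height)

def distance_gain(d, D):
    # k_d = d / D.
    return d / D
```

For a camera at H1 = 1.0 m, a face-center offset of 50 px, C = 0.065 m and a = 65 px, the scale is 0.001 m/px, giving h = 1.05 m.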
Step S200: collecting the environment volume through the environment microphone to obtain the environment noise value v_e, and determining the corresponding environment gain k_e according to a preset correspondence based on v_e.
Specifically, the environment microphone collects the environment volume to obtain the environment noise value v_e, and the environment gain k_e is determined by the interval range in which v_e falls.
Multiple interval ranges may be preset, each with a corresponding preset environment gain: the interval (v_1, v_2) corresponds to gain k_1, the interval (v_2, v_3) corresponds to gain k_2, ..., and the interval (v_{n−1}, v_n) corresponds to gain k_{n−1}.
For example, taking 70 dB as the reference noise level:
for a quiet environment (v_e < 40 dB), k_e = 0.8;
for a normal environment (40 dB < v_e < 70 dB), k_e = 1;
for a noisy environment (70 dB < v_e < 90 dB), k_e = 1 + (v_e − 70)/100;
for an extremely noisy environment (v_e > 90 dB), k_e = ∞.
Of course, in some embodiments, the environment gain k_e may instead be determined from v_e by a preset calculation formula.
In one embodiment, the environment microphones include at least a first microphone and a second microphone located on the two sides of the robot (taking the robot's facing direction as the baseline), such as the two sides of the robot's head, or the two sides of its torso; see fig. 3. Collecting the environment volume through the environment microphones to obtain the environment noise value v_e comprises:

collecting the environment volume through the first microphone to obtain a first environment noise value v_1, collecting the environment volume through the second microphone to obtain a second environment noise value v_2, and taking the larger of v_1 and v_2 as the environment noise value v_e, i.e. v_e = max(v_1, v_2).
After v_e is determined, the interval range containing v_e can be looked up in the data table, and the environment gain k_e corresponding to that interval range is obtained.
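The interval lookup described above can be sketched with the example thresholds from this section (∞ is represented as float('inf'); the function names are ours, not from the patent):

```python
def ambient_noise(v1, v2):
    # v_e = max(v_1, v_2) from the two side microphones.
    return max(v1, v2)

def environment_gain(v_e):
    """Map ambient noise v_e (dB) to the environment gain k_e using
    the example intervals: <40 quiet, 40-70 normal, 70-90 noisy,
    >90 extremely noisy."""
    if v_e < 40:
        return 0.8
    if v_e < 70:
        return 1.0
    if v_e < 90:
        return 1.0 + (v_e - 70) / 100
    return float('inf')
```

The piecewise table makes the gain rise smoothly inside the noisy band (e.g. 80 dB gives k_e = 1.1) instead of jumping between discrete levels.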
Step S300: determining the loudspeaker volume according to k_h, k_d, k_e and V:

V_m = k_h · k_d · k_e · V.

k_h, k_d and k_e each have a positive relationship (for example, a direct proportion) with the loudspeaker volume V_m; V corresponds to the gain of the sound source, and k_h · k_d · k_e corresponds to the total gain. Therefore, based on this positive relationship, any suitable variation of the formula for the loudspeaker volume V_m can be considered reasonable and is not described further here.
Of course, a maximum volume V_max and a minimum volume V_min may also be preset: if V_m < V_min, then V_m = V_min; if V_m > V_max, then V_m = V_max.
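Putting the three gains together with the clamping just described (the reference volume V and the limits are illustrative values):

```python
def speaker_volume(k_h, k_d, k_e, V, v_min=None, v_max=None):
    """V_m = k_h * k_d * k_e * V, optionally clamped to [v_min, v_max]."""
    v_m = k_h * k_d * k_e * V
    if v_min is not None:
        v_m = max(v_m, v_min)
    if v_max is not None:
        v_m = min(v_m, v_max)
    return v_m
```

For example, with k_h = 1.0, k_d = 1.5 (the user is 1.5× the calibration distance away), k_e = 1.1 (80 dB ambient) and V = 50, the unclamped result is 82.5.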
The method for automatically adjusting the volume of the robot estimates the height h of the second user and the distance d between the second user and the robot, combines them with the environment noise value v_e measured by the environment microphone, and determines the loudspeaker volume V_m, so that the robot can intelligently adjust the loudspeaker volume according to the actual situation, provide the most suitable volume for the user in any environment, and improve interaction efficiency and user experience.
Fig. 4 is a schematic block diagram of an apparatus for automatically adjusting volume by a robot according to an embodiment. The invention also provides an apparatus for automatically adjusting the volume of a robot, corresponding to the method described above. The robot has a camera, a speaker, and ambient microphones for collecting ambient sound (as well as a microphone for collecting the user's voice); the speaker volume corresponding to a first user of predefined height H at a distance D from the robot is V. The apparatus comprises a first calculating module 100, a second calculating module 200 and a volume calculating module 300.
The first calculating module 100 is configured to acquire an image through the camera, detect the image features of a second user in the image, calculate the height h of the second user and the distance d from the second user to the robot from those image features, determine the height gain k_h from the relation of h to H, and determine the distance gain k_d from the relation of d to D. The second calculating module 200 is configured to collect the ambient volume through the ambient microphones to obtain the ambient noise value v_e, and to determine the corresponding environment gain k_e from v_e according to a preset correspondence. The volume calculating module 300 is configured to determine the speaker volume V_m = k_h · k_d · k_e · V according to k_h, k_d, k_e and V.
The first calculating module 100 acquires an image through the camera, detects the second user's image features in it, calculates the second user's height h and distance d relative to the robot from those features, determines the height gain k_h from the relation of h to H, and determines the distance gain k_d from the relation of d to D.
The first calculating module 100 may detect the second user in the image using a face-detection method.
Since the robot's camera may capture several faces, some of which belong only to background figures who are not interacting (e.g. conversing) with the robot, it may be necessary to consider only the person facing the camera and talking to the robot. The camera is typically mounted facing the user: at the forehead or face if the robot has a head, or on the front of the torso if it has one. The mounting position of the camera is not limited, as long as the second user can be captured while conversing with the robot.
For a given camera, the captured image (picture or video frame) has a fixed size, so a preset rectangle at the centre of the picture can be defined as the face-recognition area, and face detection is performed only inside it. For example, for a 1920 × 1080 picture, a 1000 × 1000 rectangle at the centre of the picture may be defined as the face-recognition area.
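Restricting detection to a central region can be sketched as follows (a minimal illustration using the 1920 × 1080 / 1000 × 1000 example above; the function name is hypothetical):

```python
def center_roi(frame_w, frame_h, roi_w, roi_h):
    """Return (x, y, w, h) of a roi_w x roi_h rectangle centred in the frame.

    Face detection would then be run only on the sub-image at this rectangle.
    """
    x = (frame_w - roi_w) // 2
    y = (frame_h - roi_h) // 2
    return x, y, roi_w, roi_h

# 1000x1000 face-recognition area inside a 1920x1080 picture
x, y, w, h = center_roi(1920, 1080, 1000, 1000)   # top-left corner at (460, 40)
```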
Face Detection, i.e. the technique of finding the faces present in an image by image analysis and framing their positions accurately with rectangular boxes, is the basis of facial landmark detection and face recognition. A common face-detection benchmark is FDDB (Face Detection Data Set and Benchmark). With the rapid development of deep learning in recent years, many excellent face-detection methods have emerged.
For example, many strong face detectors have been submitted to the FDDB benchmark, such as the cascade-CNN (Convolutional Neural Network) detector of "A Convolutional Neural Network Cascade for Face Detection", the modified Faster R-CNN of "Face Detection using Deep Learning: An Improved Faster RCNN Approach", and "Finding Tiny Faces", which is particularly successful at detecting small faces. In addition, libraries such as OpenCV, Dlib, libfacedetect and others also provide face-detection interfaces.
The following methods are commonly used for face detection:
1. Single-CNN face detection method
2. Cascade-CNN face detection method
3. OpenCV face detection method
4. Dlib face detection method
5. libfacedetect face detection method
6. SeetaFace face detection method
The following briefly introduces a single CNN face detection method.
First, a classifier is trained to distinguish face from non-face. For example, a convolutional neural network can be used for binary classification, fine-tuning a model pre-trained on the ImageNet data set with a face data set. A custom convolutional network can also be trained; to detect smaller face targets, a small convolutional neural network is usually adopted as the binary model, which reduces the input image size and speeds up prediction.
Then the fully connected layers of the trained face/non-face network are converted into convolutional layers, turning it into a fully convolutional network that accepts input images of any size. Passing an image through this network yields a feature map; each point of the feature map is mapped to the probability that its receptive field on the original image contains a face, and regions whose face probability exceeds a set threshold are taken as face candidate boxes.
Face sizes in an image vary; to accommodate this, the best approach is an image pyramid, scaling the image to be detected to several sizes for multi-scale face detection. Non-maximum suppression (NMS) is then applied to all face candidate boxes detected across the scales to obtain the final face-detection result.
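The final step, non-maximum suppression over the candidate boxes gathered from all pyramid scales, can be sketched as follows (a minimal greedy-NMS illustration; the box format (x1, y1, x2, y2) and the 0.5 overlap threshold are assumptions):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    remaining boxes that overlap it by more than `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep

# Two heavily overlapping candidates plus one separate face: NMS keeps
# the better of the overlapping pair and the separate one.
kept = nms([(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)], [0.9, 0.8, 0.7])
```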
If the apparatus for automatically adjusting the robot's volume in this embodiment runs on the Android system, the first calculating module 100 may use the FaceDetector class to determine whether a face is present in the image captured by the camera. Android ships with a built-in face-recognition API, FaceDetector, which allows face detection with very little code, although this is only the most basic form of recognition: it can only locate faces in the image.
The face-recognition facility in Android relies on the underlying library android/external/neven/ and the framework-layer class frameworks/base/media/java/android/media/FaceDetector.java. Restrictions of the Java layer: (1) only Bitmap-format data is accepted; (2) only faces whose eye distance is larger than 20 pixels can be recognized (this can be modified in the framework layer); (3) only face positions (centre point and eye distance) are detected; faces cannot be matched against a specified face.
The main methods provided by the Neven library are:
A. android.media.FaceDetector.FaceDetector(int width, int height, int maxFaces);
B. int android.media.FaceDetector.findFaces(Bitmap bitmap, Face[] faces).
On Android, the first calculating module 100 may obtain the positions of the two eyes of a face through the FaceDetector class and determine the face's position in the image from them. The concrete steps may be: obtain the centre point between the two eyes of the face and the pupil distance of the face, draw a rectangular area (rectangular box) from that centre point and pupil distance, and take this rectangle as the face's position. The centre point between the two eyes can be obtained through the following code:
(Code listing omitted: it obtains the eye-centre point with FaceDetector.Face.getMidPoint() and the pupil distance with Face.eyesDistance().)
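The geometry those steps rely on (the midpoint between the two eye positions, plus the pupil distance between them) can be sketched independently of the Android API (a minimal illustration; coordinates are hypothetical pixel positions):

```python
import math

def eye_midpoint_and_gap(left_eye, right_eye):
    """Return the centre point between the eyes and the pupil distance.

    left_eye and right_eye are (x, y) pixel coordinates.
    """
    mx = (left_eye[0] + right_eye[0]) / 2.0
    my = (left_eye[1] + right_eye[1]) / 2.0
    gap = math.dist(left_eye, right_eye)   # Euclidean pupil distance
    return (mx, my), gap

# Eyes detected at (100, 200) and (140, 200): centre (120, 200), pupil gap 40 px
(cx, cy), ppd = eye_midpoint_and_gap((100, 200), (140, 200))
```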
The face-detection method itself is not described further here; this embodiment does not restrict which face-detection method is used. After the first calculating module 100 detects a face, it calculates the detected second user's height h and distance d relative to the robot, determines the height gain k_h from the relation of h to H, and determines the distance gain k_d from the relation of d to D.
The distance d of the second user relative to the robot could be determined from the readings of a dedicated ranging sensor, e.g. infrared ranging with an infrared sensor or laser ranging with a laser sensor. In this embodiment, however, the first calculating module 100 determines d by image analysis.
The calculation of d rests on two assumptions. First, for most people the interpupillary distance varies little (about ± 2 cm). Second, a user conversing with the robot stays within a limited range of distances from it. The principle is to estimate the distance by comparing the pupil distance of the face in the captured picture with that in a calibration picture: the closer the user's face is to the camera, the larger the face appears, and this relationship is approximately linear.
In this embodiment, the second user's image features include the image pupil distance. With the image pupil distance predefined as A1 when the first user is at distance D1 from the robot, and as A2 when the first user is at distance D2, the first calculating module 100 calculates the distance d of the second user relative to the robot by:
d = k(a - A1) + D1
where
k = (D2 - D1) / (A2 - A1)
and a is the image pupil distance of the second user detected during face detection.
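This two-point linear calibration can be sketched as follows (a minimal illustration; the calibration numbers are hypothetical, with the slope k implied by the two calibration points):

```python
def distance_from_pupil_gap(a, A1, A2, D1, D2):
    """Linear estimate d = k*(a - A1) + D1 with k = (D2 - D1)/(A2 - A1).

    A1 and A2 are the image pupil distances (pixels) measured at calibration
    distances D1 and D2; a is the pupil distance measured at run time.
    """
    k = (D2 - D1) / float(A2 - A1)
    return k * (a - A1) + D1

# Hypothetical calibration: pupil gap 60 px at 1.0 m, 30 px at 2.0 m.
# A measured gap of 40 px then lands between the two calibration distances.
d = distance_from_pupil_gap(a=40, A1=60, A2=30, D1=1.0, D2=2.0)
```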
Of course, the first calculating module 100 may also determine d by other image-analysis methods, for example with other calculation formulas, provided the two assumptions and the principle above are respected; these are not described further here.
After determining d, the first calculating module 100 determines the distance gain k_d from the relation of d to D. The distance gain k_d has a positive relationship with d, e.g. a directly proportional one. In this embodiment, k_d = d/D. Of course, other embodiments may use other calculation modes, such as k_d = d/D + m (with m a preset coefficient), and so on, which are not described further here.
In this embodiment, with the real interpupillary distance of the first user predefined as C, the first calculating module 100 calculates the second user's height h by:
h = H1 + Δh · C / a
where H1 is the camera height, Δh is the pixel difference between the centre of the detected face rectangle and the centre of the image, and a is the image pupil distance of the second user.
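This can be sketched as follows (a minimal illustration, assuming the scaling form h = H1 + Δh · C/a, where C/a gives the real-world metres per pixel under the assumption that the second user's real pupil distance is also approximately C; all numbers are hypothetical):

```python
def user_height(H1, delta_h_px, C, a):
    """h = H1 + delta_h * (C / a): camera height plus the face-centre
    offset converted from pixels to metres with scale C/a."""
    return H1 + delta_h_px * (C / float(a))

# Camera mounted at 1.2 m; face-rectangle centre 100 px above image centre;
# real pupil distance C = 0.063 m observed as a = 40 px in the image.
h = user_height(H1=1.2, delta_h_px=100, C=0.063, a=40)
```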
Similarly, the first calculating module 100 may determine h by other image-analysis methods, for example with other calculation formulas, provided the two assumptions and the principle above are respected; these are not described further here.
After determining h, the first calculating module 100 determines the height gain k_h from the relation of h to H. Likewise, h and H have a positive relationship, e.g. a directly proportional one. In this embodiment, the height gain is k_h = (h - Δ)/(H - Δ), where Δ is the speaker height. Of course, in some embodiments the speaker height may be neglected, i.e. Δ = 0.
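The two geometry gains, k_h = (h - Δ)/(H - Δ) and k_d = d/D, can be sketched together (a minimal illustration; Δ is the speaker height and may be zero, and the reference values are hypothetical):

```python
def height_gain(h, H, delta=0.0):
    """k_h = (h - delta) / (H - delta), relative to the reference height H."""
    return (h - delta) / (H - delta)

def distance_gain(d, D):
    """k_d = d / D, relative to the reference distance D."""
    return d / D

# Reference user: H = 1.7 m at D = 1.0 m. Measured user: h = 1.53 m at d = 2.0 m.
kh = height_gain(1.53, 1.7)    # shorter user -> gain below 1
kd = distance_gain(2.0, 1.0)   # twice as far -> gain of 2
```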
The second calculating module 200 collects the ambient volume through the ambient microphones to obtain the ambient noise value v_e, and determines the corresponding environment gain k_e from v_e according to a preset correspondence. Specifically, the ambient microphones collect the ambient volume to obtain v_e, and the interval range in which v_e lies determines the environment gain k_e corresponding to that range.
Several interval ranges can be preset, each with a corresponding preset environment gain: the range (v1, v2) corresponds to gain k1, (v2, v3) to k2, …, and (v(n-1), vn) to k(n-1).
For example, taking 70 dB as the normal noise level:
for a quiet environment (v_e &lt; 40 dB), k_e = 0.8;
for a normal environment (40 dB ≤ v_e &lt; 70 dB), k_e = 1;
for a noisy environment (70 dB ≤ v_e &lt; 90 dB), k_e = 1 + (v_e - 70)/100;
for an extremely noisy environment (v_e ≥ 90 dB), k_e = ∞.
Of course, in some embodiments the second calculating module 200 may instead determine the environment gain k_e from v_e via a preset calculation formula.
In one embodiment, the ambient microphones include at least a first microphone and a second microphone, located on the two sides of the robot (taking the front direction of the robot as the baseline), for example on both sides of the robot's head or both sides of its torso, see Fig. 3. The process by which the second calculating module 200 collects the ambient volume through the ambient microphones to obtain the ambient noise value v_e comprises:
collecting the ambient volume through the first microphone to obtain a first ambient noise value v1, collecting the ambient volume through the second microphone to obtain a second ambient noise value v2, and taking the larger of v1 and v2 as the ambient noise value v_e, i.e. v_e = max(v1, v2).
After the second calculating module 200 determines v_e, the interval range containing v_e can be looked up in the data table, and the environment gain k_e corresponding to that interval range is obtained.
The volume calculating module 300 determines the speaker volume V_m = k_h · k_d · k_e · V according to k_h, k_d, k_e and V.
k_h, k_d and k_e each have a positive (e.g. directly proportional) relationship with the speaker volume V_m: V corresponds to the volume of the sound source, and the product k_h · k_d · k_e is equivalent to the total gain. Any suitable variation of the V_m formula that preserves this positive relationship can therefore be considered reasonable and is not described further here.
Of course, a maximum volume V_max and a minimum volume V_min may also be preset: if V_m &lt; V_min, then V_m = V_min; if V_m &gt; V_max, then V_m = V_max.
In this apparatus for automatically adjusting the robot's volume, the speaker volume V_m is determined from the second user's height h, the distance d between the second user and the robot, and the ambient noise value v_e measured by the ambient microphones. The robot can thus adjust its speaker volume intelligently according to the actual situation and offer the user the most suitable volume in any environment, improving interaction efficiency and user experience.
The invention also provides a computer device, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the method for automatically adjusting the sound volume by the robot in any one of the embodiments.
The present invention also provides a storage medium storing computer readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps of the method for automatically adjusting the volume of a robot according to any one of the embodiments.
The robot has a camera, a speaker and ambient microphones for capturing ambient sound; the speaker volume corresponding to a first user of predefined height H at a distance D from the robot is V. The method comprises the steps of: acquiring an image through the camera and detecting the second user's image features in it; calculating the second user's height h and distance d relative to the robot from those features; determining the height gain k_h from the relation of h to H and the distance gain k_d from the relation of d to D; collecting the ambient volume through the ambient microphones to obtain the ambient noise value v_e and determining the corresponding environment gain k_e from v_e according to a preset correspondence; and determining the speaker volume V_m = k_h · k_d · k_e · V from k_h, k_d, k_e and V.
By judging the user's height h and distance d relative to the robot, combined with the ambient noise value v_e measured by the ambient microphones, the speaker volume V_m is determined so that the robot can adjust its speaker volume intelligently according to the actual situation and offer the user the most suitable volume in any environment, improving interaction efficiency and user experience.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which need not be completed at the same moment but may be executed at different times, and need not be performed sequentially but may alternate or interleave with other steps or with sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the invention, and such improvements and refinements also fall within the scope of protection of the invention.

Claims (6)

1. A method for automatically adjusting volume by a robot, the robot having a camera, a speaker and ambient microphones for collecting ambient sound, the speaker volume corresponding to a first user of predefined height H at a distance D from the robot being V, the method comprising the steps of:
acquiring an image through the camera and detecting image features of a second user in the image, calculating the height h of the second user and the distance d relative to the robot according to those image features, determining the height gain k_h from the relation of h to H as k_h = (h - Δ)/(H - Δ), where Δ is the speaker height, and determining the distance gain k_d from the relation of d to D; wherein the second user's image features include the image pupil distance and, with the real interpupillary distance of the first user predefined as C, the height h of the second user is calculated by:
h = H1 + Δh · C / a
where H1 is the camera height and Δh is the pixel difference between the centre of the detected face rectangle and the centre of the image;
with the image pupil distance predefined as A1 when the first user is at distance D1 from the robot and as A2 when the first user is at distance D2, the distance d of the second user relative to the robot is calculated by d = k(a - A1) + D1, where
k = (D2 - D1) / (A2 - A1)
and a is the image pupil distance of the second user in the image;
collecting the ambient volume through the ambient microphones to obtain the ambient noise value v_e, and determining the corresponding environment gain k_e from v_e according to a preset correspondence;
determining the speaker volume V_m = k_h · k_d · k_e · V according to k_h, k_d, k_e and V.
2. The method of claim 1, wherein the ambient microphones comprise at least a first microphone and a second microphone located on the two sides of the robot, and collecting the ambient volume through the ambient microphones to obtain the ambient noise value v_e comprises:
collecting the ambient volume through the first microphone to obtain a first ambient noise value v1, collecting the ambient volume through the second microphone to obtain a second ambient noise value v2, and taking the larger of v1 and v2 as the ambient noise value v_e.
3. The method of claim 1, wherein the distance gain k_d = d/D.
4. An apparatus for automatically adjusting volume by a robot, the robot having a camera, a speaker and ambient microphones for collecting ambient sound, the speaker volume corresponding to a first user of predefined height H at a distance D from the robot being V, the apparatus comprising:
a first calculating module configured to acquire an image through the camera and detect image features of a second user in the image, calculate the height h of the second user and the distance d relative to the robot according to those image features, determine the height gain k_h from the relation of h to H as k_h = (h - Δ)/(H - Δ), where Δ is the speaker height, and determine the distance gain k_d from the relation of d to D; wherein the second user's image features include the image pupil distance and, with the real interpupillary distance of the first user predefined as C, the height h of the second user is calculated by:
h = H1 + Δh · C / a
where H1 is the camera height and Δh is the pixel difference between the centre of the detected face rectangle and the centre of the image;
with the image pupil distance predefined as A1 when the first user is at distance D1 from the robot and as A2 when the first user is at distance D2, the first calculating module calculates the distance d of the second user relative to the robot by d = k(a - A1) + D1, where
k = (D2 - D1) / (A2 - A1)
and a is the image pupil distance of the second user in the image;
a second calculating module configured to collect the ambient volume through the ambient microphones to obtain the ambient noise value v_e, and determine the corresponding environment gain k_e from v_e according to a preset correspondence;
a volume calculating module configured to determine the speaker volume V_m = k_h · k_d · k_e · V according to k_h, k_d, k_e and V.
5. A computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method for automatically adjusting volume by a robot of any one of claims 1 to 3.
6. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method for automatically adjusting volume by a robot of any one of claims 1 to 3.
CN201810314093.3A 2018-04-10 2018-04-10 Method and device for adjusting volume by robot, computer equipment and storage medium Active CN108628572B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810314093.3A CN108628572B (en) 2018-04-10 2018-04-10 Method and device for adjusting volume by robot, computer equipment and storage medium
PCT/CN2018/102853 WO2019196312A1 (en) 2018-04-10 2018-08-29 Method and apparatus for adjusting sound volume by robot, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810314093.3A CN108628572B (en) 2018-04-10 2018-04-10 Method and device for adjusting volume by robot, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108628572A CN108628572A (en) 2018-10-09
CN108628572B true CN108628572B (en) 2020-03-31

Family

ID=63704910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810314093.3A Active CN108628572B (en) 2018-04-10 2018-04-10 Method and device for adjusting volume by robot, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN108628572B (en)
WO (1) WO2019196312A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111294706A (en) * 2020-01-16 2020-06-16 珠海格力电器股份有限公司 Voice electrical appliance control method and device, storage medium and voice electrical appliance
CN111930336A (en) * 2020-07-29 2020-11-13 歌尔科技有限公司 Volume adjusting method and device of audio device and storage medium
CN113518180B (en) * 2021-05-25 2022-08-05 宁夏宁电电力设计有限公司 Vehicle-mounted camera mounting method for electric power working vehicle
CN113157246B (en) * 2021-06-25 2021-11-02 深圳小米通讯技术有限公司 Volume adjusting method and device, electronic equipment and storage medium
CN113907652B (en) * 2021-10-21 2023-03-10 珠海一微半导体股份有限公司 Cleaning robot control method, chip and cleaning robot
CN114125138B (en) * 2021-10-29 2022-11-01 歌尔科技有限公司 Volume adjustment optimization method and device, electronic equipment and readable storage medium
CN114070935B (en) * 2022-01-12 2022-04-15 百融至信(北京)征信有限公司 Intelligent outbound interruption method and system
CN114845210A (en) * 2022-04-24 2022-08-02 北京百度网讯科技有限公司 Method and device for adjusting volume of robot and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101867739A (en) * 2010-04-30 2010-10-20 中山大学 Intelligent television and intelligent control method for same
CN102981422A (en) * 2012-11-23 2013-03-20 广州华多网络科技有限公司 Volume adjusting method and system
CN103793719A (en) * 2014-01-26 2014-05-14 深圳大学 Monocular distance-measuring method and system based on human eye positioning
CN104173054A (en) * 2013-05-21 2014-12-03 杭州海康威视数字技术股份有限公司 Measuring method and measuring device for height of human body based on binocular vision technique
CN205754809U (en) * 2016-05-11 2016-11-30 深圳市德宝威科技有限公司 A kind of robot self-adapting volume control system
CN106377264A (en) * 2016-10-20 2017-02-08 广州视源电子科技股份有限公司 Human body height measuring method, human body height measuring device and intelligent mirror
CN106534541A (en) * 2016-11-15 2017-03-22 广东小天才科技有限公司 Volume adjusting method and device and terminal equipment
CN106648527A (en) * 2016-11-08 2017-05-10 乐视控股(北京)有限公司 Volume control method, device and playing equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722967A (en) * 2012-06-15 2012-10-10 深圳Tcl新技术有限公司 Method and device for warning on watching TV in near distance
CN105403146A (en) * 2015-11-05 2016-03-16 上海卓易科技股份有限公司 Object size measurement method and system and intelligent terminal
CN107831891A (en) * 2017-10-31 2018-03-23 维沃移动通信有限公司 A kind of brightness adjusting method and mobile terminal


Also Published As

Publication number Publication date
CN108628572A (en) 2018-10-09
WO2019196312A1 (en) 2019-10-17

Similar Documents

Publication Publication Date Title
CN108628572B (en) Method and device for adjusting volume by robot, computer equipment and storage medium
WO2019105285A1 (en) Facial attribute recognition method, electronic device, and storage medium
Hunke et al. Face locating and tracking for human-computer interaction
US7925093B2 (en) Image recognition apparatus
JP5088507B2 (en) Identity determining apparatus, identity determining method, and identity determining program
US9002707B2 (en) Determining the position of the source of an utterance
CN105608425B (en) The method and device of classification storage is carried out to photo
US20110299774A1 (en) Method and system for detecting and tracking hands in an image
CN111601088B (en) Sitting posture monitoring system based on monocular camera sitting posture identification technology
CN112380972B (en) Volume adjusting method applied to television scene
KR101840594B1 (en) Apparatus and method for evaluating participation of video conference attendee
JP2003030667A (en) Method for automatically locating eyes in image
Choi et al. Acoustic and visual signal based context awareness system for mobile application
Poonsri et al. Fall detection using Gaussian mixture model and principle component analysis
WO2021084972A1 (en) Object tracking device and object tracking method
Huang et al. Audio-visual speech recognition using an infrared headset
CN111551921A (en) Sound source orientation system and method based on sound image linkage
CN109986553B (en) Active interaction robot, system, method and storage device
WO2021166811A1 (en) Information processing device and action mode setting method
US20220327732A1 (en) Information processing apparatus, information processing method, and program
CN112700568A (en) Identity authentication method, equipment and computer readable storage medium
WO2012153868A1 (en) Information processing device, information processing method and information processing program
CN111104921A (en) Multi-mode pedestrian detection model and method based on Faster rcnn
JP2009044526A (en) Photographing device, photographing method, and apparatus and method for recognizing person
US20190272828A1 (en) Speaker estimation method and speaker estimation device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant