WO2023040547A1

WO2023040547A1 - Volume adjustment method and apparatus, terminal, and computer-readable storage medium

Info

Publication number: WO2023040547A1
Application number: PCT/CN2022/112705
Authority: WO
Inventors: 吴文飞
Original assignee: Oppo广东移动通信有限公司
Priority date: 2021-09-16
Filing date: 2022-08-16
Publication date: 2023-03-23
Also published as: CN113965641A; CN113965641B

Abstract

A volume adjustment method, a volume adjustment apparatus (10), a terminal (100), and a non-volatile computer-readable storage medium (200). The volume adjustment method comprises: acquiring a face image, the face image comprising jitter information (101); calculating the distance between a face and an electronic device according to the face image (102); and when the jitter information is in a first preset range, adjusting playback volume according to the distance (103).

Description

Volume adjustment method and device, terminal, and computer-readable storage medium

Priority information

This application claims the priority and benefit of the patent application No. 202111088747.3 filed with the State Intellectual Property Office of China on September 16, 2021, which is hereby incorporated by reference in its entirety.

Technical field

The present application relates to the technical field of volume adjustment, and more specifically, to a volume adjustment method, a volume adjustment device, a terminal, and a non-volatile computer-readable storage medium.

Background

At present, in speakerphone scenarios, the user typically adjusts the volume by pressing the volume adjustment button on the terminal. When the distance between the user and the terminal changes, adjusting the volume by button alone does not give the user the optimal playback volume.

Summary of the invention

Embodiments of the present application provide a volume adjustment method, a volume adjustment device, a terminal, and a non-volatile computer-readable storage medium.

The volume adjustment method in the embodiments of the present application includes: acquiring a face image, the face image including shaking information; calculating the distance between the face and the electronic device according to the face image; and when the shaking information is within a first preset range, adjusting the playback volume according to the distance.

The volume adjustment device in the embodiments of the present application includes an acquisition module, a calculation module, and an adjustment module. The acquisition module is used to acquire a face image, the face image including shaking information. The calculation module is used to calculate the distance between the face and the electronic device according to the face image. The adjustment module is used to adjust the playback volume according to the distance when the shaking information is within a first preset range.

The terminal in the embodiments of the present application includes a processor. The processor is used to acquire a face image, the face image including shaking information; calculate the distance between the face and the electronic device according to the face image; and when the shaking information is within a first preset range, adjust the playback volume according to the distance.

The non-volatile computer-readable storage medium of the embodiments of the present application contains a computer program. When the computer program is executed by one or more processors, the processors are caused to perform the following volume adjustment method: acquiring a face image, the face image including shaking information; calculating the distance between the face and the electronic device according to the face image; and when the shaking information is within a first preset range, adjusting the playback volume according to the distance.

Additional aspects and advantages of the embodiments of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the embodiments of the application.

Description of drawings

The above and/or additional aspects and advantages of the present application will become apparent and understandable from the description of the embodiments in conjunction with the following drawings, wherein:

FIG. 1 is a schematic flowchart of a volume adjustment method in some embodiments of the present application;

FIG. 2 is a schematic diagram of a volume adjustment device in some embodiments of the present application;

FIG. 3 is a schematic plan view of a terminal in some embodiments of the present application;

FIG. 4 is a schematic diagram of a scene of a volume adjustment method in some embodiments of the present application;

FIG. 5 is a schematic flowchart of a volume adjustment method in some embodiments of the present application;

FIG. 6 is a schematic diagram of a scene of a volume adjustment method in some embodiments of the present application;

FIG. 7 and FIG. 8 are schematic flowcharts of volume adjustment methods in some embodiments of the present application;

FIG. 9 is a schematic diagram of a scene of a volume adjustment method in some embodiments of the present application;

FIG. 10 to FIG. 12 are schematic flowcharts of volume adjustment methods in some embodiments of the present application;

FIG. 13 is a schematic diagram of a connection state between a non-volatile computer-readable storage medium and a processor in some embodiments of the present application.

Detailed description

Embodiments of the present application are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary, are only for explaining the embodiments of the present application, and should not be construed as limiting the embodiments of the present application.

In some implementations, the volume adjustment method includes: acquiring multiple consecutive frames of face images within a first predetermined duration; determining whether the difference between the position coordinates of the face in any two frames among the consecutive frames of face images is within the first preset range; and if so, determining that the shaking information is within the first preset range.

In some implementations, the face image further includes angle information, and adjusting the playback volume according to the distance includes: when the shaking information is within the first preset range and the angle information is within a second preset range, adjusting the playback volume according to the distance.

In some implementations, the volume adjustment method further includes: acquiring multiple consecutive frames of face images within a second predetermined duration; determining whether the angle information of the face in the consecutive frames of face images is within the second preset range; and if so, determining that the angle information is within the second preset range.

In some implementations, before calculating the distance between the face and the electronic device according to the face image, the volume adjustment method further includes: receiving an input operation to set the priority of the faces of a plurality of different users; and acquiring first face information of the face with the highest priority in the face image as target face information. Calculating the distance between the face and the electronic device according to the face image then includes calculating the distance between the face and the electronic device according to the target face information.

In some implementations, acquiring the first face information of the face with the highest priority in the face image as the target face information includes: identifying second face information of one or more faces in the face image; comparing the one or more pieces of second face information with pre-stored face information in a preset face database, and taking the second face information that matches the pre-stored face information as the first face information; and acquiring the first face information of the face with the highest priority as the target face information.

In some implementations, the pre-stored face information is generated according to the face images of different users under different lighting conditions.

In some implementations, the pre-stored face information is generated according to the face images of different users at different shooting angles.

In some implementations, the volume adjustment method further includes: setting an initial volume at an initial distance according to an input operation, and associating the initial distance with the initial size of the face in the face image collected at the initial distance. Calculating the distance between the face and the electronic device according to the face image includes calculating the distance according to the initial distance, the initial size, and the average of the face sizes in multiple frames of face images.

In some implementations, the adjusting the playback volume according to the distance includes determining an adjustment volume according to the initial distance, the distance, and the initial volume; and adjusting the playback volume according to the adjustment volume.

In some implementations, the processor is configured to: acquire multiple consecutive frames of face images within a first predetermined duration; determine whether the difference between the position coordinates of the face in any two frames among the consecutive frames of face images is within the first preset range; and if so, determine that the shaking information is within the first preset range.

In some implementations, the face image further includes angle information, and the processor is configured to adjust the playback volume according to the distance when the shaking information is within the first preset range and the angle information is within a second preset range.

In some implementations, the processor is configured to: acquire multiple consecutive frames of face images within a second predetermined duration; determine whether the angle information of the face in the consecutive frames of face images is within the second preset range; and if so, determine that the angle information is within the second preset range.

In some implementations, before calculating the distance between the face and the electronic device according to the face image, the processor is configured to: receive an input operation to set the priority of the faces of a plurality of different users; acquire first face information of the face with the highest priority in the face image as target face information; and calculate the distance between the face and the electronic device according to the target face information.

In some implementations, the processor is configured to: identify second face information of one or more faces in the face image; compare the one or more pieces of second face information with pre-stored face information in a preset face database, and take the second face information that matches the pre-stored face information as the first face information; and acquire the first face information of the face with the highest priority as the target face information.

In some implementations, the processor is configured to generate the prestored face information according to the face images of different users under different lighting conditions.

In some implementations, the processor is configured to generate the prestored face information according to the face images of different users at different shooting angles.

In some implementations, the processor is configured to: set an initial volume at an initial distance according to an input operation, and associate the initial distance with the initial size of the face in the face image collected at the initial distance; and calculate the distance according to the initial distance, the initial size, and the average of the face sizes in multiple frames of face images.

In some implementations, the processor is configured to determine an adjusted volume according to the initial distance, the distance, and the initial volume; and adjust the playback volume according to the adjusted volume.

Referring to FIG. 1 , an embodiment of the present application provides a volume adjustment method. The volume adjustment method includes steps:

101: Acquire a face image, where the face image includes shaking information;

102: Calculate the distance between the face and the electronic device according to the face image; and

103: When the shaking information is within the first preset range, adjust the playback volume according to the distance.
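Steps 101 to 103 can be sketched as a single gated update. This is only an illustrative sketch: the function name `adjust_volume_once`, its parameters, and the `volume_for_distance` callback are hypothetical and do not appear in the application.

```python
from typing import Callable


def adjust_volume_once(
    shake_in_range: bool,
    distance_m: float,
    volume_for_distance: Callable[[float], float],
    current_volume: float,
) -> float:
    """Return the playback volume after one pass of steps 101-103.

    shake_in_range      -- result of checking the shaking information (step 101)
    distance_m          -- face-to-device distance from step 102, in meters
    volume_for_distance -- hypothetical mapping from distance to volume
    current_volume      -- volume in effect before this pass
    """
    if not shake_in_range:
        # The face is shaking: leave the playback volume untouched.
        return current_volume
    # Step 103: the face is steady, so follow the distance.
    return volume_for_distance(distance_m)
```

The distance-to-volume mapping is passed in as a callback because the application describes more than one way of deriving the volume from the distance.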

Referring to FIG. 2, an embodiment of the present application provides a volume adjustment device 10. The volume adjustment device 10 includes an acquisition module 11, a calculation module 12, and an adjustment module 13. The volume adjustment method in the embodiments of the present application can be applied to the volume adjustment device 10. The acquisition module 11 is used to execute step 101, the calculation module 12 is used to execute step 102, and the adjustment module 13 is used to execute step 103. That is, the acquisition module 11 is used to acquire a face image, the face image including shaking information. The calculation module 12 is used to calculate the distance between the face and the electronic device according to the face image. The adjustment module 13 is used to adjust the playback volume according to the distance when the shaking information is within the first preset range.

Referring to FIG. 3, the embodiments of the present application further provide a terminal 100. The terminal 100 includes a processor 30. The volume adjustment method in the embodiments of the present application may be applied to the terminal 100. The processor 30 is configured to execute step 101, step 102, and step 103. That is, the processor 30 is used to acquire a face image, the face image including shaking information; calculate the distance between the face and the electronic device according to the face image; and adjust the playback volume according to the distance when the shaking information is within a first preset range.

The terminal 100 further includes a housing 40. The terminal 100 may be a mobile phone, a tablet computer, a display device, a notebook computer, a teller machine, a gate, a smart watch, a head-mounted display device, a game console, and the like. As shown in FIG. 3, the embodiments of the present application are described by taking a mobile phone as an example of the terminal 100. It can be understood that the specific form of the terminal 100 is not limited to a mobile phone. The housing 40 can also be used to install functional modules of the terminal 100 such as a display device, an imaging device, a power supply device, and a communication device, so that the housing 40 protects the functional modules against dust, drops, and water.

Specifically, before adjusting the playback volume of the terminal 100, the processor 30 first needs to determine whether the face (that is, the user) in the face image is within the first preset range according to the shaking information in the face image. Wherein, the first preset range may be a position where the face is not shaken. The first preset range may also be the maximum range that the human face can allow shaking, that is, when the range is exceeded, the processor determines that the human face shakes.

In one embodiment, the shaking information may include the position of the face and a preset position at which the face is not shaking (i.e., the first preset range), and the processor may determine whether the face shakes by checking whether the position of the face in the face image is at the preset position. For example, when the processor determines that the face is at the preset position, it determines that the face is within the first preset range, that is, the face does not shake; when the processor determines that the face is not at the preset position, it determines that the face is not within the first preset range, that is, the face shakes.

In another embodiment, before adjusting the playback volume of the terminal 100, the processor 30 acquires multiple frames of face images, detects only the face in each face image, and compares the positions of the face across the frames to see whether the position changes significantly. From this it can determine whether the shaking information is within the first preset range, and thus whether the face shakes. That is, when the processor 30 finds that the position of the face changes significantly across the frames (the position difference of the face in the multiple frames is outside the first preset range), the processor 30 determines that the shaking information is not within the first preset range and that the face shakes. When the processor 30 finds that the position of the face does not change across the frames, or changes only slightly (the position difference of the face in the multiple frames is within the first preset range), the processor 30 determines that the shaking information is within the first preset range and that the face does not shake.

Next, the processor 30 can calculate the current distance between the face and the electronic device (i.e., the terminal 100) according to the face image. Specifically, a mapping between face size and distance can be preset in the terminal 100. Because the size of the face reflects the distance between the face and the terminal 100, the processor 30 can obtain the distance between the face and the electronic device from the size of the face in the face image.

Taking FIG. 4 as an example, the terminal 100 can preset distances of 0.5 meters and 1 meter between the face and the electronic device, with face image P1 and face image P2 as the corresponding face images. The face sizes in face image P1 and face image P2 differ: the face in face image P2 is smaller than the face in face image P1. Thus, when the processor 30 acquires a face image, it can compare the face size in that image with the face sizes in face image P1 and face image P2. When the face in the acquired face image is the same size as the face in face image P1, the processor 30 can conclude that the distance between the face and the electronic device is 0.5 meters. When the face in the acquired face image is the same size as the face in face image P2, the processor 30 can conclude that the distance between the face and the electronic device is 1 meter.
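The size-to-distance lookup in the FIG. 4 example might be sketched as follows. The pixel widths are invented for illustration, and choosing the nearest reference size (rather than requiring an exact match) is an assumption that the application does not specify.

```python
# Hypothetical calibration table: (face width in pixels, distance in meters).
# The 0.5 m and 1 m distances come from the FIG. 4 example; the pixel widths
# are made-up values standing in for face images P1 and P2.
REFERENCE_FACES = [
    (400.0, 0.5),  # P1: larger face, closer to the device
    (200.0, 1.0),  # P2: smaller face, farther from the device
]


def estimate_distance(face_width_px: float) -> float:
    """Return the distance of the reference face closest in size.

    Assumption: an exact size match is rare in practice, so the nearest
    reference entry is used instead.
    """
    _, distance_m = min(
        REFERENCE_FACES, key=lambda ref: abs(ref[0] - face_width_px)
    )
    return distance_m
```

A denser calibration table, or interpolation between entries, would give finer distance estimates under the same idea.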

Finally, after the processor 30 determines that the face is not shaking (that is, the shaking information is within the first preset range) and calculates the distance between the face and the electronic device, the processor 30 obtains the volume corresponding to that distance and adjusts the playback volume of the terminal 100 accordingly.

For example, a mapping between a predetermined distance and a predetermined volume set by the user can be stored in the terminal 100 in advance. After the processor 30 calculates the distance between the face and the electronic device, the processor 30 can compare that distance with the predetermined distance to obtain the change ratio of the distance relative to the predetermined distance, and then calculate the product of the change ratio and the predetermined volume to obtain the volume corresponding to the current distance. The processor 30 can then adjust the playback volume of the terminal 100 to this volume.
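The ratio-based rule above can be sketched as below, assuming the change ratio is the current distance divided by the predetermined distance. The function name, the clamping, and the 0 to 100 volume scale are illustrative assumptions, not part of the application.

```python
def volume_from_ratio(
    distance_m: float,
    predetermined_distance_m: float,
    predetermined_volume: float,
    max_volume: float = 100.0,  # assumed volume scale, not from the source
) -> float:
    """Scale the predetermined volume by the distance change ratio."""
    if predetermined_distance_m <= 0:
        raise ValueError("predetermined distance must be positive")
    ratio = distance_m / predetermined_distance_m
    volume = ratio * predetermined_volume
    # Clamp to the device's valid range (an added safeguard).
    return max(0.0, min(max_volume, volume))
```

For instance, with a predetermined volume of 40 at 0.5 meters, a face at 1 meter gives a ratio of 2 and a volume of 80 under these assumptions.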

For another example, when a mapping between a predetermined distance and a predetermined volume set by the user is stored in the terminal 100, after the processor 30 calculates the distance between the face and the electronic device, the processor 30 can compare the current distance with the predetermined distance to obtain the change ratio of the distance. Using the relationship between sound pressure and distance together with the change ratio, the processor 30 can then obtain the amount by which the playback volume of the terminal 100 theoretically needs to be adjusted at the current distance relative to the predetermined volume, and adjust the playback volume of the terminal 100 accordingly.
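The application does not state which sound pressure relationship it uses. One common model, assumed here purely for illustration, is the free-field inverse-distance law for a point source, under which sound pressure is inversely proportional to distance and the level changes by 20·log10 of the distance ratio in decibels. The function name is hypothetical.

```python
import math


def compensation_gain_db(
    current_distance_m: float, predetermined_distance_m: float
) -> float:
    """Gain in dB needed to keep the level at the listener unchanged.

    Assumes the free-field inverse-distance law (pressure ~ 1 / distance).
    A positive result means the output should be raised, a negative one
    means it should be lowered.
    """
    if current_distance_m <= 0 or predetermined_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(current_distance_m / predetermined_distance_m)
```

Under this model, doubling the distance calls for roughly 6 dB more output. Real rooms with reflections deviate from the free-field assumption, so the value is a theoretical starting point only.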

At present, adjusting the playback volume of the terminal 100 solely according to changes in the distance between the user and the terminal 100 leads to inaccurate decisions about whether to adjust the playback volume, so the user cannot obtain the best listening experience.

The volume adjustment method, the volume adjustment device 10, and the terminal 100 of the embodiments of the present application adjust the playback volume according to the distance between the face and the electronic device when the shaking information of the face image is within the first preset range, that is, when the face is not shaking. This ensures that the playback volume is not adjusted when the user shakes unconsciously while using the terminal 100, which improves the accuracy of deciding whether to adjust the playback volume and gives the user the best volume experience.

Please refer to Fig. 2, Fig. 3 and Fig. 5, the volume adjustment method of the embodiment of the present application also includes steps:

501: Obtain a face image, where the face image includes shaking information;

502: Calculate the distance between the face and the electronic device according to the face image;

503: Obtain multiple consecutive frames of face images within the first predetermined duration;

504: Judging whether the difference between the position coordinates of the faces in any two frames of face images in the continuous multiple frames of face images is within the first preset range; and

505: If yes, determine that the shaking information is within a first preset range.

506: When the shake information is within the first preset range, adjust the playback volume according to the distance.

In some embodiments, the acquisition module 11 is used to execute step 501, step 503, step 504, and step 505, the calculation module 12 is used to execute step 502, and the adjustment module 13 is used to execute step 506. That is, the acquisition module 11 is used to acquire a face image, the face image including shaking information; acquire multiple consecutive frames of face images within the first predetermined duration; determine whether the difference between the position coordinates of the face in any two frames among the consecutive frames of face images is within a first preset range; and if so, determine that the shaking information is within the first preset range. The calculation module 12 is used to calculate the distance between the face and the electronic device according to the face image. The adjustment module 13 is used to adjust the playback volume according to the distance when the shaking information is within the first preset range.

In some implementations, the processor 30 is configured to execute step 501, step 502, step 503, step 504, step 505, and step 506. That is, the processor 30 acquires a face image, the face image including shaking information; calculates the distance between the face and the electronic device according to the face image; acquires multiple consecutive frames of face images within the first predetermined duration; determines whether the difference between the position coordinates of the face in any two frames among the consecutive frames of face images is within a first preset range; if so, determines that the shaking information is within the first preset range; and when the shaking information is within the first preset range, adjusts the playback volume according to the distance.

Wherein, step 501 is executed in the same manner as above-mentioned step 101, step 502 is executed in the same manner as above-mentioned step 102, and step 506 is executed in the same manner as above-mentioned step 103, which will not be repeated here.

Specifically, the processor 30 acquires multiple consecutive frames of face images within the first predetermined duration in order to determine whether the shaking information is within the first preset range. Whether the shaking information is within the first preset range also reflects whether the face shakes within the first predetermined duration. The shaking of the face may be caused by the terminal 100 shaking while the user operates it, or by the user shaking unconsciously. That is, the shaking of the face is relative shaking between the terminal 100 and the user and is not limited to shaking produced by the user.

For example, if the first predetermined duration is 1 second, the processor 30 acquires 5 consecutive frames of face images within 1 second and compares the position coordinates of the face in any two of those frames to obtain the coordinate difference. Examples are the difference between the position coordinates of the face in the first and second frames, in the first and fifth frames, or in the second and fourth frames. The difference between the position coordinates of the face in any two frames can be the difference between the position coordinates of the center point of the face in the two frames, or the difference between the position coordinates of facial feature points (such as eye feature points, mouth feature points, and nose feature points) in the two frames.

Next, the processor 30 compares whether the difference of the position coordinates is within the first preset range to determine whether the human face shakes. At this time, the first preset range represents the maximum value that allows the positions of the faces in any two frames of face images to change.

As shown in FIG. 6, FIG. 6(a) and FIG. 6(b) are any two frames of face images. The processor 30 can calculate the difference between the position coordinates of the mouth corner feature point Q1 in FIG. 6(a) and those of the mouth corner feature point Q2 in FIG. 6(b) to obtain the difference between the position coordinates in the two frames. For example, if the coordinates of Q1 are (1, 1.5) and the coordinates of Q2 are (1, 2), the difference between the position coordinates of Q1 and Q2 is (0, 0.5). If the first preset range is (0.5, 0.5), that is, the position of the face may change by at most 0.5 units on each of the X-axis and the Y-axis between any two frames, then the difference between the position coordinates of Q1 and Q2 is within the first preset range, which means the face does not shake in the multiple frames of face images within the first predetermined duration. If the first preset range is (0, 0.25), the difference between the position coordinates of Q1 and Q2 is not within the first preset range, which means the face shakes in the multiple frames of face images within the first predetermined duration.

It should be noted that when the difference between the position coordinates of the face in any two frames of face images is negative, the processor 30 compares the absolute value of the difference with the first preset range. For example, if the first preset range is (1, 1) and the difference between the position coordinates of the face in two frames is (-2, -2), the processor 30 determines that the absolute value of the difference, (2, 2), is not within the first preset range (1, 1). The difference between the position coordinates of the face in the two frames is therefore not within the first preset range, that is, the face shakes in the multiple frames of face images within the first predetermined duration.
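The pairwise comparison described above, including the absolute-value handling, might look like the following sketch. The function name and the inclusive comparison at the boundary are assumptions chosen to agree with the worked examples, where a difference of (0, 0.5) counts as inside a range of (0.5, 0.5).

```python
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def shake_within_range(points: Sequence[Point], max_diff: Point) -> bool:
    """Check every pair of frames against the first preset range.

    points   -- one tracked coordinate per frame (face center or a
                feature point such as a mouth corner)
    max_diff -- largest allowed change on the X-axis and Y-axis
    """
    max_dx, max_dy = max_diff
    for (x1, y1), (x2, y2) in combinations(points, 2):
        # Absolute values make the sign of the difference irrelevant.
        if abs(x2 - x1) > max_dx or abs(y2 - y1) > max_dy:
            return False  # the face shakes
    return True  # the face does not shake
```

With Q1 = (1, 1.5) and Q2 = (1, 2), a range of (0.5, 0.5) yields `True` and a range of (0, 0.25) yields `False`, in line with the FIG. 6 example.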

To sum up, when the processor 30 determines that the difference between the position coordinates of the face in any two frames among the consecutive frames of face images is within the first preset range, the processor 30 determines that the face does not shake. When the processor 30 determines that the difference between the position coordinates of the face in any two frames among the consecutive frames of face images is not within the first preset range, the processor 30 determines that the face shakes. In that case, the user does not wish to adjust the playback volume of the terminal 100, and the processor 30 does not adjust it.

Please refer to Fig. 2, Fig. 3 and Fig. 7, in some embodiments, the face image also includes angle information, the volume adjustment method of the embodiment of the present application, also includes the steps:

701: Obtain a face image, where the face image includes shaking information;

702: Calculate the distance between the face and the electronic device according to the face image; and

703: When the shake information is within the first preset range and the angle information is within the second preset range, adjust the playback volume according to the distance.

In some embodiments, the acquisition module 11 is used to execute step 701 , the calculation module 12 is used to execute step 702 , and the adjustment module 13 is used to execute step 703 . That is, the acquisition module 11 is used to acquire a face image, and the face image includes shaking information. The calculation module 12 is used for calculating the distance between the human face and the electronic device according to the human face image. The adjustment module 13 is configured to adjust the playback volume according to the distance when the shaking information is within the first preset range and the angle information is within the second preset range.

In some implementations, the processor 30 is further configured to execute step 701, step 702, and step 703. That is, the processor 30 is used to acquire a face image, the face image including shaking information; calculate the distance between the face and the electronic device according to the face image; and when the shaking information is within the first preset range and the angle information is within the second preset range, adjust the playback volume according to the distance.

Wherein, step 701 and step 702 are performed in the same way as the above-mentioned step 101 and step 102 respectively, and will not be repeated here.

In some cases, when the user turns, raises or lowers the head, the distance between the user's face and the terminal 100 (electronic device) will also change, and at this time, the user does not need to adjust the playback volume of the terminal 100 .

Therefore, to ensure that the processor 30 accurately judges whether to adjust the playback volume, before adjusting the playback volume according to the distance between the face and the electronic device, the processor 30 also needs to determine whether the angle information in the face image is within the second preset range. The processor 30 adjusts the playback volume of the terminal 100 only when the angle information is within the second preset range and the shaking information is within the first preset range.

Wherein, the second preset range may include a preset angle between the face and the terminal and a corresponding preset orientation. For example, the preset angle may be 70 degrees, which means that, relative to the terminal 100, the angle threshold for turning the head left, turning the head right, raising the head and lowering the head is 70 degrees in each case. The angle information includes the angle between the face in the face image and the terminal, and the orientation of the face.

Specifically, the processor 30 may determine whether to adjust the playback volume according to the distance by judging whether the angle between the target face image among the multiple frames of face images and the terminal 100 is within the second preset range. For example, when the angle between the target face image and the terminal 100 is smaller than the preset angle, the processor 30 determines that the angle is within the second preset range; when the angle between the target face image and the terminal 100 is greater than the preset angle, the processor 30 determines that the angle is not within the second preset range.
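The single-frame angle check described above can be sketched as follows. This is a minimal illustration only, assuming the angle information is reduced to one angle in degrees plus an orientation label. The names `AngleInfo`, `PRESET_ANGLE_DEG`, `PRESET_ORIENTATIONS` and `angle_in_second_preset_range` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

# Assumed threshold: the text gives 70 degrees only as an example value.
PRESET_ANGLE_DEG = 70.0

# Orientations named in the text (left, right, up, down); the labels are illustrative.
PRESET_ORIENTATIONS = {"left", "right", "up", "down"}


@dataclass
class AngleInfo:
    angle_deg: float   # angle between the face and the terminal
    orientation: str   # direction in which the head is turned


def angle_in_second_preset_range(info: AngleInfo,
                                 preset_angle_deg: float = PRESET_ANGLE_DEG) -> bool:
    """Return True when the face angle is smaller than the preset angle.

    The text does not say how an angle exactly equal to the threshold is
    handled; treating it as outside the range is an assumption of this sketch.
    """
    if info.orientation not in PRESET_ORIENTATIONS:
        return False
    return info.angle_deg < preset_angle_deg
```

With the example threshold of 70 degrees, a head turned 30 degrees to the left would count as within the second preset range, while a head turned 80 degrees to the right would not.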

Please refer to Fig. 2, Fig. 3 and Fig. 8, the volume adjustment method of the embodiment of the present application also includes steps:

801: Obtain a face image, where the face image includes shaking information;

802: Calculate the distance between the face and the electronic device according to the face image;

803: Obtain multiple consecutive frames of face images within a second predetermined duration;

804: Judging whether the angle information of the face in the continuous multiple frames of face images is within the second preset range;

805: If yes, determine that the angle information is in the second preset range; and

806: When the shaking information is in the first preset range and the angle information is in the second preset range, adjust the playback volume according to the distance.

In some embodiments, the acquisition module 11 is further configured to execute step 801, step 803, step 804 and step 805, the calculation module 12 is configured to execute step 802, and the adjustment module 13 is configured to execute step 806. That is, the acquisition module 11 is used to obtain a face image, the face image including shaking information; obtain multiple consecutive frames of face images within the second predetermined duration; judge whether the angle information of the face in the multiple consecutive frames of face images is within the second preset range; and, if yes, determine that the angle information is within the second preset range. The calculation module 12 is used for calculating the distance between the face and the electronic device according to the face image. The adjustment module 13 is configured to adjust the playback volume according to the distance when the shaking information is within the first preset range and the angle information is within the second preset range.

In some implementations, the processor 30 is further configured to execute step 801, step 802, step 803, step 804, step 805 and step 806. That is, the processor 30 obtains a face image, the face image including shaking information; calculates the distance between the face and the electronic device according to the face image; obtains multiple consecutive frames of face images within the second predetermined duration; judges whether the angle information of the face in the multiple consecutive frames of face images is within the second preset range; if yes, determines that the angle information is within the second preset range; and, when the shaking information is within the first preset range and the angle information is within the second preset range, adjusts the playback volume according to the distance.

Wherein, step 801 is executed in the same manner as above-mentioned step 701, step 802 is executed in the same manner as above-mentioned step 702, and step 806 is executed in the same manner as above-mentioned step 703, which will not be repeated here.

Specifically, the processor 30 also acquires multiple consecutive frames of human face images within a second predetermined time period, and determines whether the angle information of the human face in the continuous multiple frames of human face images is within a second preset range. Whether the angle information is within the second preset range can also reflect whether the angle of the face within the second predetermined time period is valid. Wherein, the second predetermined duration may be greater than the first predetermined duration, may also be shorter than the first predetermined duration, and may also be equal to the first predetermined duration.

The second preset range is a specific angle for each orientation. For example, the second preset range may be 70 degrees, which means that, relative to the terminal 100, the angle threshold for turning the head left, turning the head right, raising the head and lowering the head is 70 degrees in each case. If the processor 30 acquires 5 frames of face images, the processor 30 judges whether the angle of the face in the 5 frames of face images is less than 70 degrees. When it is less than 70 degrees, the processor 30 determines that the angle information of the face is within the second preset range, that is, the angle of the face is valid, which means that the user wishes to adjust the playback volume of the terminal 100.
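The multi-frame judgment in steps 803 to 805 can be sketched as follows. This is a minimal illustration, assuming each frame has already been reduced to a single face angle in degrees and that every frame must satisfy the threshold. The function name `angle_valid_over_frames` is hypothetical, and the 70-degree default is only the example value from the text.

```python
from typing import Iterable


def angle_valid_over_frames(angles_deg: Iterable[float],
                            preset_angle_deg: float = 70.0) -> bool:
    """Return True when the face angle in every frame is below the threshold.

    An empty sequence is treated as invalid, which is an assumption of this
    sketch: with no frames there is nothing to confirm the face angle.
    """
    angles = list(angles_deg)
    if not angles:
        return False
    return all(angle < preset_angle_deg for angle in angles)
```

For five frames with angles of 10, 20, 35, 50 and 69.9 degrees the check passes; a single frame at 80 degrees makes the whole sequence invalid.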

FIG. 9 shows a face image P in which the user's head is turned to the right. The processor 30 can determine, according to the degree to which the user's head is turned to the right in the face image P, whether the angle between the face and the terminal is within the second preset range, and thereby determine whether the face angle is valid. If the second preset range is 60 degrees and the processor 30 judges that the angle of the user's right head turn in FIG. 9 is 80 degrees, then the angle between the face and the terminal is not within the second preset range, and the face angle is determined to be invalid. If the second preset range is 60 degrees and the processor 30 judges that the angle of the user's right head turn in FIG. 9 is smaller than 60 degrees, then the angle between the face and the terminal is within the second preset range, and the face angle is determined to be valid.

It should be noted that the processor 30 can simultaneously determine whether the shaking information is within the first preset range and whether the angle information is within the second preset range; the processor 30 can also first determine whether the shaking information is within the first preset range and then determine whether the angle information is within the second preset range; or the processor 30 can first determine whether the angle information is within the second preset range and then determine whether the shaking information is within the first preset range.

When the processor 30 simultaneously determines whether the shaking information is within the first preset range and whether the angle information is within the second preset range, if either the shaking information is not within the first preset range or the angle information is not within the second preset range, the processor 30 will not adjust the playback volume of the terminal 100. When the processor 30 determines the two conditions one after the other, if the condition determined first is not met, the processor 30 will not perform the subsequent determination. For example, after the processor 30 first determines that the shaking information is not within the first preset range, the processor 30 will not determine whether the angle information is within the second preset range. Thus, the workload of the processor 30 can be reduced.
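The sequential variant, in which a failed first check skips the second, maps naturally onto short-circuit evaluation. The sketch below assumes each check is wrapped in a zero-argument callable so that it only runs when needed. The names `should_adjust_volume`, `shaking_ok` and `angle_ok` are hypothetical.

```python
from typing import Callable


def should_adjust_volume(shaking_ok: Callable[[], bool],
                         angle_ok: Callable[[], bool]) -> bool:
    """Evaluate the shaking check first, and the angle check only if it passes.

    Python's `and` stops at the first false operand, so `angle_ok` is never
    called when the shaking information is outside the first preset range.
    """
    return shaking_ok() and angle_ok()
```

Swapping the two arguments gives the other order mentioned in the text, in which the angle information is checked first.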

Please refer to Fig. 2, Fig. 3 and Fig. 10, the volume adjustment method of the embodiment of the present application also includes steps:

1001: receiving an input operation to set the priority of faces of multiple different users;

1002: Obtain the first face information of the face with the highest priority in the face image as the target face information;

1003: Obtain a face image, where the face image includes face shaking information;

1004: Calculate the distance between the face and the electronic device according to the target face information; and

1005: When the shaking information is within the first preset range, adjust the playback volume according to the distance.

In some embodiments, the volume adjustment device 10 further includes a setting module 14. The setting module 14 is used to execute step 1001 and step 1002, the acquisition module 11 is used to execute step 1003, the calculation module 12 is used to execute step 1004, and the adjustment module 13 is used to execute step 1005. That is, the setting module 14 is used to receive an input operation to set the priorities of the faces of multiple different users, and to obtain the first face information of the face with the highest priority in the face image as the target face information. The acquisition module 11 is used for obtaining a face image, the face image including face shaking information. The calculation module 12 is used for calculating the distance between the face and the electronic device according to the target face information. The adjustment module 13 is configured to adjust the playback volume according to the distance when the shaking information is within the first preset range.

In some implementations, the processor 30 is configured to execute step 1001, step 1002, step 1003, step 1004 and step 1005. That is, the processor 30 receives an input operation to set the priorities of the faces of multiple different users; obtains the first face information of the face with the highest priority in the face image as the target face information; obtains a face image, the face image including face shaking information; calculates the distance between the face and the electronic device according to the target face information; and adjusts the playback volume according to the distance when the shaking information is within the first preset range.

Wherein, step 1003 and step 1005 are performed in the same way as the above-mentioned step 101 and step 103 respectively, and will not be repeated here.

Specifically, before the processor 30 acquires the face image, multiple users may input their own faces in the terminal 100, and the processor 30 may receive an input operation, that is, receive the faces of multiple users.

Next, the owner of the terminal 100 can set the priorities of the faces of the multiple different users through the terminal 100. For example, if the faces of three users, including the owner's own face, have been entered in the terminal 100, the owner can set his or her own face as the first priority and the faces of the remaining two users as the second priority and the third priority.

After setting the priorities of the faces of multiple users, the processor 30 may use the first face information of the face with the highest priority among the acquired face images as the target face information.

For example, the terminal 100 is provided with faces of three priorities in total, namely a face with the first priority, a face with the second priority and a face with the third priority. When the processor 30 obtains multiple consecutive frames of face images, the processor 30 first looks for the face with the first priority; if there is no face with the first priority, it looks for the face with the second priority; and if there is no face with the second priority, it looks for the face with the third priority. It should be noted that, if the face image contains the faces with the first, second and third priorities, the processor 30 uses the first face information of the face with the first priority (that is, the highest priority) as the target face information. If the face image contains none of the faces with the first, second or third priority, the multiple consecutive frames of face images are invalid, and the processor 30 will not execute the volume adjustment method of the embodiment of the present application.

Finally, after obtaining the target face information in the face image, the processor 30 can use the target face information to calculate the distance between the face and the electronic device (that is, the terminal 100). It can be understood that, when the face image contains multiple faces, the processor 30 first determines the face with the highest priority among the multiple faces and uses the face information of that face as the target face information. When the processor 30 then calculates the distance between the face and the electronic device according to the face image, it calculates the distance between the face with the highest priority among the multiple faces and the electronic device.

Therefore, the processor 30 adjusts the playback volume only for the user whose face has the highest priority (for example, the owner of the terminal 100). This avoids the situation in which other faces contained in the acquired multiple frames of face images affect the accuracy of the adjustment, thereby ensuring the accuracy of the processor 30 in adjusting the playback volume.
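The priority lookup described above can be sketched as follows. This is a minimal illustration, assuming each recognized face has already been resolved to a user identifier and that priorities are stored as integers with 1 as the highest. The function name `select_target_face` and the identifiers used in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Optional


def select_target_face(detected_ids: Iterable[str],
                       priorities: Dict[str, int]) -> Optional[str]:
    """Return the detected user with the highest priority (smallest rank).

    Faces that have no configured priority are ignored. None is returned
    when no prioritized face is present, in which case the frames are
    treated as invalid and no volume adjustment takes place.
    """
    known = [face_id for face_id in detected_ids if face_id in priorities]
    if not known:
        return None
    return min(known, key=lambda face_id: priorities[face_id])
```

For example, with priorities `{"owner": 1, "user_b": 2, "user_c": 3}`, a frame containing `user_c` and `user_b` selects `user_b`, and a frame containing only an unregistered face selects nothing.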

Please refer to Fig. 2, Fig. 3 and Fig. 11. In some embodiments, step 1002 (acquiring the first face information of the face with the highest priority in the face image as the target face information) further includes the steps:

1101: Identify the second face information of one or more faces in the face image;

1102: Compare one or more pieces of second face information with the pre-stored face information in the preset face database to obtain the second face information matching the pre-stored face information as the first face information; and

1103: Obtain the first face information with the highest priority of the face as the target face information.

In some embodiments, the setting module 14 is used to execute step 1101, step 1102 and step 1103. That is, the setting module 14 is used to identify the second face information of one or more faces in the face image; compare the one or more pieces of second face information with the pre-stored face information in the preset face database to obtain the second face information matching the pre-stored face information as the first face information; and obtain the first face information of the face with the highest priority as the target face information.

In some implementations, the processor 30 is configured to execute step 1101 , step 1102 and step 1103 . That is, the processor 30 is used to identify the second face information of one or more faces in the face image; compare the one or more second face information with the pre-stored face information in the preset face database , to obtain the second face information matching the pre-stored face information as the first face information; and obtain the first face information with the highest priority of the face as the target face information.

Specifically, before the processor 30 acquires the first face information of the face with the highest priority in the face image, a preset face database may be set in the terminal 100, and the preset face database includes pre-stored face information. After the processor 30 acquires multiple frames of face images, the processor 30 can recognize the face information of all the faces in the face images and use this face information as the second face information. It should be noted that, when the face image contains multiple faces, the processor 30 may acquire the face information of the multiple faces to obtain multiple pieces of second face information.

Wherein, the pre-stored face information in the preset face database can be generated according to the face images of different users under different lighting conditions, or can be generated according to the face images of different users at different shooting angles.

Thus, when the user needs to adjust the playback volume of the terminal 100, the processor 30 can remind the user to operate under the same lighting conditions as those of the pre-stored face information, or at the same shooting angle as that of the pre-stored face information, so as to ensure the accuracy of adjusting the playback volume.

Next, the processor 30 can compare the second face information with the pre-stored face information, find the second face information that matches (that is, is consistent with) the pre-stored face information, and use that second face information as the first face information. When the comparison yields second face information matching multiple pieces of pre-stored face information, multiple pieces of first face information are obtained.

Finally, the processor 30 can find, according to the priorities of the different faces, the first face information with the highest priority as the target face information. That is to say, the processor 30 determines whether the face shakes and whether the angle of the face is valid only for the first face information with the highest priority, and calculates the distance between the face and the electronic device (that is, the terminal 100) according to the first face information with the highest priority, so as to perform the corresponding work of adjusting the playback volume.
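Steps 1101 to 1103 can be sketched end to end as follows. The text does not specify how face information is represented or how a match is decided, so this sketch assumes that face information is a numeric feature vector and that a match means a cosine similarity at or above a threshold. Both are assumptions, as are the names `cosine_similarity`, `pick_target_user` and the 0.9 default.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Feature = Sequence[float]


def cosine_similarity(a: Feature, b: Feature) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_target_user(second_infos: List[Feature],
                     database: Dict[str, Tuple[int, Feature]],
                     threshold: float = 0.9) -> Optional[str]:
    """Match recognized faces against the database and pick the top priority.

    `database` maps a user identifier to (priority, pre-stored feature),
    where priority 1 is the highest. Recognized faces that match an entry
    play the role of the first face information.
    """
    matched: List[Tuple[int, str]] = []
    for feature in second_infos:
        for user, (priority, stored) in database.items():
            if cosine_similarity(feature, stored) >= threshold:
                matched.append((priority, user))
    if not matched:
        return None
    # The smallest priority number is the highest priority.
    return min(matched)[1]
```

A production system would use a dedicated face recognition model for the comparison; the vector comparison here only stands in for that step.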

Please refer to Fig. 2, Fig. 3 and Fig. 12, the volume adjustment method of the embodiment of the present application also includes steps:

1201: Obtain a face image, where the face image includes shaking information;

1202: According to the input operation, set the initial volume at the initial distance, and associate the initial distance with the initial size of the face in the face image collected at the initial distance;

1203: Calculate the distance according to the initial distance, the initial size, and the average size of the face sizes in multiple frames of face images;

1204: Determine the adjusted volume according to the initial distance, the current distance and the initial volume; and

1205: Adjust the playback volume according to the adjusted volume.

In some embodiments, the volume adjustment device 10 further includes an association module 15, the association module 15 is used to execute step 1202, the acquisition module 11 is used to execute step 1201, the calculation module 12 is used to execute step 1203, and the adjustment module 13 is used to execute Step 1204 and Step 1205. That is, the acquiring module 11 acquires a face image, and the face image includes shaking information. The association module 15 is used to set the initial volume at the initial distance according to the input operation, and associate the initial distance with the initial size of the face in the face image collected at the initial distance. The calculating module 12 calculates the current distance according to the initial distance, the initial size and the average size of the faces in multiple frames of face images. The adjustment module 13 is used for determining the adjusted volume according to the initial distance, the current distance and the initial volume; and adjusting the playback volume according to the adjusted volume.

In some embodiments, the processor 30 is used to execute step 1201, step 1202, step 1203, step 1204 and step 1205. That is, the processor 30 is used to acquire a face image, the face image including shaking information; set, according to the input operation, the initial volume at the initial distance, and associate the initial distance with the initial size of the face in the face image collected at the initial distance; calculate the current distance according to the initial distance, the initial size and the average size of the faces in multiple frames of face images; determine the adjusted volume according to the initial distance, the current distance and the initial volume; and adjust the playback volume according to the adjusted volume.

Wherein, step 1201 is executed in the same way as the above step 101, and will not be repeated here.

Specifically, before the processor 30 calculates the distance between the face and the electronic device according to the target face information, the user can operate according to the instructions of the terminal 100 to set an appropriate distance from the terminal 100 and an optimal playback volume of the terminal 100. For example, the distance between the user and the terminal 100 is 0.5 meters, and the optimal playback volume of the terminal 100 is 50 decibels. At this time, the processor 30 takes the distance and the playback volume as the initial distance and the initial volume, respectively. At this time, the processor 30 may also acquire the current face image of the user at the initial distance. Thus, the processor 30 can associate the initial distance with the initial size of the face in the face image, for example, the initial distance corresponds to the initial size.

Next, when the processor 30 calculates the distance between the face and the electronic device according to the target face information, it can first calculate the average size of the face in the multiple frames of face images, and then calculate the current distance according to the following formula (1).

L1 = L0 * (S1/S0) (1)

Among them, L1 is the current distance, S1 is the average size of the face in the multi-frame face image, S0 is the initial size of the face in the face image, and L0 is the initial distance. It can be understood that the face images corresponding to S1 and S0 are the face images at the L1 distance and the face images at the L0 distance respectively, and they are not the same face image. When S1 is equal to S0, L1 is equal to L0.
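Formula (1) can be sketched in code as follows. The sketch reproduces the formula exactly as stated, including the direction of the ratio S1/S0. How the face "size" is measured is not specified in this passage, so the units of the size values are an assumption, and the function name `current_distance` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def current_distance(initial_distance: float,
                     initial_size: float,
                     frame_sizes: Sequence[float]) -> float:
    """Compute L1 = L0 * (S1 / S0), with S1 the mean face size over the frames."""
    if initial_size <= 0:
        raise ValueError("initial_size (S0) must be positive")
    if not frame_sizes:
        raise ValueError("at least one frame size is required to compute S1")
    average_size = mean(frame_sizes)  # S1
    return initial_distance * (average_size / initial_size)
```

As the text notes, when the average size S1 equals the initial size S0, the function returns the initial distance L0 unchanged.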

After the processor 30 calculates the distance between the face and the electronic device, the following formula (2) can be obtained from the relationship between sound pressure and distance. Formula (2) gives the change in volume required, at the current distance, to adjust the playback volume of the terminal 100 from the initial volume to an appropriate playback volume.

ΔV = 20 * log10(L1/L0) (2)

Thus, when the changing volume ΔV is known, the adjusted volume corresponding to the current distance can be obtained according to the following formula (3).

V1 = V0 + ΔV (3)

Wherein, ΔV is the change in volume required to adjust the playback volume of the terminal 100 from the initial volume to an appropriate playback volume at the current distance, V1 is the adjusted volume corresponding to the current distance, and V0 is the initial volume.

Finally, the processor 30 can adjust the playback volume according to the adjusted volume V1, that is, adjust the playback volume of the terminal 100 to V1. Since V0 is the optimal playback volume of the terminal 100 at the preset initial distance, the adjusted volume V1 that the processor 30 calculates according to formulas (1), (2) and (3) is also the optimal playback volume of the terminal 100 at the current distance, which ensures a better user experience.
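Formulas (2) and (3) can be combined into a short sketch. It assumes the logarithm in formula (2) is base 10, as is usual for decibel calculations, and that volumes are expressed in decibels as in the 50 decibel example. The function name `adjusted_volume` is hypothetical.

```python
import math


def adjusted_volume(initial_volume_db: float,
                    initial_distance: float,
                    current_distance: float) -> float:
    """Compute V1 = V0 + 20 * log10(L1 / L0)."""
    if initial_distance <= 0 or current_distance <= 0:
        raise ValueError("distances must be positive")
    # Formula (2): change in volume needed to compensate for the new distance.
    delta_v = 20.0 * math.log10(current_distance / initial_distance)
    # Formula (3): adjusted volume at the current distance.
    return initial_volume_db + delta_v
```

Using the initial values from the example above (0.5 meters and 50 decibels), a current distance of 1.0 meter doubles the distance and gives an adjusted volume of roughly 56.02 decibels, while an unchanged distance leaves the volume at 50 decibels.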

Referring to FIG. 13 , the embodiment of the present application also provides a non-volatile computer-readable storage medium 200 containing a computer program 201 . When the computer program 201 is executed by one or more processors 30, the one or more processors 30 are made to execute the volume adjustment method in any one of the above-mentioned embodiments.

For example, when the computer program 201 is executed by one or more processors 30, the processors 30 are made to perform the following volume adjustment method:

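101: Acquire a face image, where the face image includes shaking information;

102: Calculate the distance between the face and the electronic device according to the face image; and

103: When the shaking information is within the first preset range, adjust the playback volume according to the distance.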

For another example, when the computer program 201 is executed by one or more processors 30, the processors 30 are made to perform the following volume adjustment method:

501: Obtain a face image, where the face image includes shaking information;

701: Obtain a face image, where the face image includes shaking information;

801: Obtain a face image, where the face image includes shaking information;

Also for example, when the computer program 201 is executed by one or more processors 30, the processors 30 are made to perform the following volume adjustment method:

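1201: Obtain a face image, where the face image includes shaking information;

1202: According to the input operation, set the initial volume at the initial distance, and associate the initial distance with the initial size of the face in the face image collected at the initial distance;

1203: Calculate the distance according to the initial distance, the initial size, and the average size of the face sizes in multiple frames of face images;

1204: Determine the adjusted volume according to the initial distance, the current distance and the initial volume; and

1205: Adjust the playback volume according to the adjusted volume.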

In the description of this specification, descriptions with reference to the terms "certain embodiments", "in one example", "exemplarily" and the like mean that specific features, structures, materials or characteristics described in connection with the embodiments or examples are included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the described specific features, structures, materials or characteristics may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine different embodiments or examples, and features of different embodiments or examples, described in this specification, provided they do not conflict with each other.

Any process or method descriptions in flowcharts or otherwise described herein may be understood to represent modules, segments or portions of code comprising one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.

Although the implementations of the present application have been shown and described above, it can be understood that the above implementations are exemplary and should not be construed as limiting the present application, and those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments.

Claims

A volume adjustment method, characterized in that, comprising:

Obtaining a face image, the face image includes shaking information;

calculating the distance between the human face and the electronic device according to the human face image; and

When the shaking information is within a first preset range, the playback volume is adjusted according to the distance.
The volume adjustment method according to claim 1, further comprising:

Acquiring the face images of multiple consecutive frames within the first predetermined duration;

Judging whether the difference between the position coordinates of the faces in any two frames of the face images in the continuous multiple frames of the face images is within the first preset range; and

If yes, determine that the shaking information is within a first preset range.
The volume adjustment method according to claim 1, wherein the face image further includes angle information, and adjusting the playback volume according to the distance includes:

When the shaking information is within the first preset range and the angle information is within a second preset range, the playback volume is adjusted according to the distance.
The volume adjustment method according to claim 3, further comprising:

Acquiring the face images of multiple consecutive frames within a second predetermined duration;

judging whether the angle information of the face in the multiple consecutive frames of the face image is within the second preset range; and

If yes, determine that the angle information is within the second preset range.
The volume adjustment method according to claim 1, further comprising, before calculating the distance between the human face and the electronic device according to the human face image:

receiving input to prioritize said faces of a plurality of different users; and

Obtaining the first face information of the face with the highest priority in the face image as the target face information;

The calculating the distance between the human face and the electronic device according to the human face image includes:

Calculate the distance between the human face and the electronic device according to the target human face information.
The volume adjustment method according to claim 5, wherein said obtaining the first face information of the face with the highest priority in the face image as the target face information includes:

identifying the second face information of one or more of the faces in the face image;

Comparing one or more pieces of the second face information with the pre-stored face information in the preset face database, to obtain the second face information matching the pre-stored face information as the first face information; and

Acquiring the first human face information with the highest priority of the human face as the target human face information.
The volume adjustment method according to claim 6, wherein the pre-stored face information is generated according to the face images of different users under different lighting conditions.
The volume adjustment method according to claim 6, wherein the pre-stored face information is generated according to the face images of different users at different shooting angles.
The volume adjustment method according to claim 1, wherein the volume adjustment method further comprises:

According to the input operation, an initial volume at an initial distance is set, and the initial size of the face in the face image collected at the initial distance is associated with the initial distance;

The calculating the distance between the human face and the electronic device according to the human face image includes:

The distance is calculated according to the initial distance, the initial size, and an average size of the face sizes in multiple frames of the face images.
The volume adjustment method according to claim 9, wherein the adjusting the playback volume according to the distance comprises:

determining an adjusted volume according to the initial distance, the distance and the initial volume; and

adjusting the playback volume according to the adjusted volume.
A volume adjustment device, characterized in that it comprises:

An acquisition module, the acquisition module is used to acquire a face image, and the face image includes shaking information;

A calculation module, the calculation module is used to calculate the distance between the human face and the electronic device according to the human face image; and

An adjustment module, configured to adjust the playback volume according to the distance when the shaking information is within a first preset range.
A terminal, characterized in that it includes a processor, and the processor is used for:

Obtaining a face image, the face image includes shaking information;

calculating the distance between the human face and the electronic device according to the human face image; and

When the shaking information is within a first preset range, the playback volume is adjusted according to the distance.
The terminal according to claim 12, wherein the processor is configured to:

Acquiring the face images of multiple consecutive frames within the first predetermined duration;

judging whether the difference between the position coordinates of the face in any two frames among the multiple consecutive frames of the face images is within the first preset range; and

if yes, determining that the shaking information is within the first preset range.
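The pairwise check described in the preceding claim can be sketched as below. This assumes the face position in each frame is an (x, y) pixel coordinate, that the "difference" is the Euclidean distance between two positions, and that the first preset range is the interval from 0 to a hypothetical `max_offset`. None of these details are specified by the claims.

```python
from itertools import combinations
from math import hypot


def shaking_within_range(positions, max_offset):
    """Return True if every pair of face positions differs by at most max_offset.

    positions:  list of (x, y) face coordinates, one per consecutive frame
    max_offset: assumed upper bound of the first preset range, in pixels

    With fewer than two positions there are no pairs to compare, so the
    function returns True.
    """
    return all(
        hypot(x1 - x2, y1 - y2) <= max_offset
        for (x1, y1), (x2, y2) in combinations(positions, 2)
    )
```

Comparing every pair, rather than only adjacent frames, also catches slow drift that accumulates over the first predetermined duration.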
The terminal according to claim 12, wherein the face image further includes angle information, and the processor is configured to, when the shaking information is within the first preset range and the angle information is within a second preset range, adjust the playback volume according to the distance.
The terminal according to claim 14, wherein the processor is configured to:

Acquiring the face images of multiple consecutive frames within a second predetermined duration;

judging whether the angle information of the face in the multiple consecutive frames of the face image is within the second preset range; and

If yes, determine that the angle information is within the second preset range.
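A comparable sketch for the angle check follows. It assumes the angle information is a single face rotation angle in degrees per frame and that the second preset range is a closed interval; the claims leave both open. The name `angle_within_range` and its parameters are hypothetical.

```python
def angle_within_range(angles, min_angle, max_angle):
    """Return True if the face angle in every frame lies in [min_angle, max_angle].

    angles: list of face angles in degrees, one per consecutive frame
            (assumed representation of the angle information)

    An empty list returns False, since no frame confirms the condition.
    """
    return bool(angles) and all(min_angle <= a <= max_angle for a in angles)
```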
The terminal according to claim 12, wherein before the processor calculates the distance between the human face and the electronic device according to the human face image, the processor is configured to:

receive input to prioritize the faces of a plurality of different users;

obtain first face information of the face with the highest priority in the face image as target face information; and

calculate the distance between the human face and the electronic device according to the target face information.
The terminal according to claim 16, wherein the processor is configured to:

identifying the second face information of one or more of the faces in the face image;

compare one or more pieces of the second face information with pre-stored face information in a preset face database, to obtain the second face information that matches the pre-stored face information as the first face information; and

acquire, as the target face information, the first face information of the face with the highest priority.
The terminal according to claim 17, wherein the processor is configured to generate the pre-stored face information according to the face images of different users under different lighting conditions.
The terminal according to claim 17, wherein the processor is configured to generate the pre-stored face information according to the face images of different users at different shooting angles.
The terminal according to claim 12, wherein the processor is configured to:

set, according to an input operation, an initial volume at an initial distance, and associate an initial size of the face in the face image collected at the initial distance with the initial distance; and

calculate the distance according to the initial distance, the initial size, and an average of the face sizes in multiple frames of the face images.
The terminal according to claim 20, wherein the processor is configured to:

determine an adjustment volume according to the initial distance, the distance, and the initial volume; and

adjust the playback volume according to the adjustment volume.
A non-volatile computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the processor is caused to execute the volume adjustment method according to any one of claims 1-10.