CN111813367A - Method, device and equipment for adjusting volume and storage medium

Info

Publication number: CN111813367A
Application number: CN202010711046.XA
Authority: CN
Inventors: 郭军
Original assignee: Guangzhou Fanxing Huyu IT Co Ltd
Current assignee: Guangzhou Fanxing Huyu IT Co Ltd
Priority date: 2020-07-22
Filing date: 2020-07-22
Publication date: 2020-10-23

Abstract

The application discloses a method, an apparatus, a device and a storage medium for adjusting volume, and belongs to the technical field of computers. The method comprises: acquiring the sampling value of each sampling point in each audio segment of a preset duration into which a target audio in Pulse Code Modulation (PCM) format is divided; determining a volume level corresponding to each audio segment based on the sampling values of the sampling points in that audio segment; determining a volume adjustment coefficient corresponding to the target audio based on the volume level corresponding to each audio segment; and adjusting the volume of the target audio based on the volume adjustment coefficient corresponding to the target audio. In this way, the volume of the target audio can be adjusted automatically according to the volume adjustment coefficient, which avoids the anchor having to adjust the playing volume of the device manually and frequently.

Description

Method, device and equipment for adjusting volume and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for adjusting a volume.

Background

With the development of networks, using an electronic device to watch an anchor sing in a live broadcast room has become a current trend.

In the related art, during a live broadcast, the anchor selects a plurality of audios to be played in sequence.

In the course of implementing the present application, the inventors found that the related art has at least the following problems:

in the above process, when the electronic device plays a plurality of different audios in sequence, the original volumes of the audios differ because the song audios come from different sources. With the playing volume of the device fixed, the actual playing volumes of the different audios therefore differ, so the anchor has to adjust the playing volume of the device manually and frequently.

Disclosure of Invention

In order to solve technical problems in the related art, embodiments of the present application provide a method, an apparatus, a device, and a storage medium for adjusting a volume. The technical scheme is as follows:

in a first aspect, an embodiment of the present application provides a method for adjusting volume, where the method includes:

acquiring sampling values of each sampling point in each audio segment with preset time duration divided in a target audio in a Pulse Code Modulation (PCM) format;

determining a volume level corresponding to each audio segment based on the sampling value of each sampling point in each audio segment;

determining a volume adjustment coefficient corresponding to the target audio based on the volume level corresponding to each audio segment;

and adjusting the volume of the target audio based on the volume adjustment coefficient corresponding to the target audio.

Optionally, the determining, based on the sample value of each sample point in each audio segment, a volume level corresponding to each audio segment includes:

determining sampling points of which the sampling values are in the same numerical range based on the sampling values of the sampling points in each audio segment and a plurality of preset numerical ranges;

for each audio segment, determining the number of sampling points corresponding to each numerical range;

and determining the volume level corresponding to each audio segment based on the number of sampling points corresponding to each numerical range in each audio segment.

Optionally, the determining, based on the number of sampling points corresponding to each value range in each audio segment, a volume level corresponding to each audio segment includes:

for each audio segment, screening out a target numerical range with the maximum number of corresponding sampling points from a plurality of numerical ranges to obtain a target numerical range corresponding to each audio segment;

and determining the volume level corresponding to each audio segment based on the target numerical range corresponding to each audio segment and the corresponding relationship between the pre-stored numerical range and the volume level.

Optionally, the determining, based on the volume level corresponding to each audio segment, a volume adjustment coefficient corresponding to the target audio includes:

determining the number of audio segments corresponding to the same volume level based on the volume level corresponding to each audio segment;

and determining a volume adjustment coefficient corresponding to the target audio based on the number of audio segments corresponding to each volume level.

Optionally, the determining, based on the number of audio segments corresponding to each volume level, a volume adjustment coefficient corresponding to the target audio includes:

determining a ratio corresponding to each volume level according to the ratio of the number of the audio segments corresponding to each volume level to the total number of all audio segments in the target audio;

and determining a volume adjustment coefficient corresponding to the target audio based on the magnitude relation between the ratio corresponding to each volume level and a preset ratio.

Optionally, the determining, based on a magnitude relationship between a ratio corresponding to each volume level and a preset ratio, a volume adjustment coefficient corresponding to the target audio includes:

screening out a target volume level with the maximum number of corresponding audio segments from a plurality of volume levels to obtain a target volume level corresponding to the target audio;

and determining the volume adjusting coefficient corresponding to the target audio based on the target volume level corresponding to the target audio and the corresponding relation between the pre-stored volume level and the volume adjusting coefficient.

In a second aspect, an embodiment of the present application provides an apparatus for adjusting volume, where the apparatus includes:

the acquisition module is configured to acquire sampling values of sampling points in each audio segment with preset time duration divided in a target audio in a Pulse Code Modulation (PCM) format;

the first determining module is configured to determine a volume level corresponding to each audio segment based on the sampling value of each sampling point in each audio segment;

a second determining module configured to determine a volume adjustment coefficient corresponding to the target audio based on the volume level corresponding to each audio segment;

and the volume adjusting module is configured to adjust the volume of the target audio based on the volume adjusting coefficient corresponding to the target audio.

Optionally, the first determining module is configured to:

Optionally, the second determining module is configured to:

and determining a volume adjustment coefficient corresponding to the target audio based on a target volume level corresponding to the target audio and a pre-stored corresponding relationship between the volume level and the volume adjustment coefficient, wherein the volume adjustment coefficient corresponding to the volume level larger than the reference volume level in the corresponding relationship is used for reducing the volume, and the volume adjustment coefficient corresponding to the volume level smaller than the reference volume level is used for increasing the volume.

In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method for adjusting volume according to the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, where at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the method for adjusting volume according to the first aspect.

The technical scheme provided by the embodiment of the application has the following beneficial effects:

the volume adjustment coefficient in the embodiment of the present application is determined according to the volume level corresponding to each audio segment, and the volume level indicates how loud the audio segment is. Based on this scheme, a technician can set, as required, a volume adjustment coefficient for each volume level, so that loud audio is turned down and quiet audio is turned up. The volume of different audios can therefore be turned up or down automatically according to the volume adjustment coefficient, so that the actual playing volumes of different audios are similar, which avoids the anchor having to adjust the playing volume of the device manually and frequently.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a schematic diagram of an implementation environment for adjusting volume according to an embodiment of the present disclosure;

Fig. 2 is a flowchart of a method for adjusting volume according to an embodiment of the present disclosure;

Fig. 3 is a schematic diagram of adjusting volume according to an embodiment of the present disclosure;

Fig. 4 is a schematic structural diagram of an apparatus for adjusting volume according to an embodiment of the present disclosure;

Fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of an implementation environment for adjusting volume according to an embodiment of the present disclosure, and as shown in fig. 1, the implementation environment may include: a server 101 and an electronic device 102.

The server 101 may be one server or a server cluster including a plurality of servers. The server 101 may be at least one of a cloud computing platform and a virtualization center, which is not limited in this embodiment of the present application. The server 101 may be configured to receive a play request of a certain audio sent by the terminal, calculate a volume adjustment coefficient of each audio according to a sampling value of each sampling point in each audio segment in each audio, and send the audio requested by the terminal and the volume adjustment coefficient corresponding to the audio to the terminal. Of course, the server 101 may also include other functional servers to provide more comprehensive and diversified services.

The electronic device 102 may be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an MP3(Moving Picture Experts Group Audio Layer III) player, an MP4(Moving Picture Experts Group Audio Layer IV) player, and a laptop computer. The electronic device 102 is connected to the server 101 through a wired network or a wireless network, and an application program supporting audio playing is installed and operated in the electronic device 102. The electronic device 102 may be configured to send a play request corresponding to a certain audio to the server after receiving a play instruction of the audio triggered by the user, may also be configured to receive the audio sent by the server and a volume adjustment coefficient corresponding to the audio, adjust the volume of the audio according to the volume adjustment coefficient, and may also be configured to play the adjusted audio.

When the anchor plays music in the live broadcast room, the anchor can set and fix the playing volume of the device according to the loudness of the anchor's own singing and the actual playing volume of the currently played music, so that the live singing effect is at its best. However, the audios played by the anchor come from different sources (for example, the hardware parameters of the recording devices differ), so the original volumes of the stored audios differ. With the playing volume of the device fixed, the actual playing volumes of different audios therefore differ, and the anchor has to adjust the playing volume of the device manually and frequently. The volume adjustment coefficient in the embodiment of the present application is determined according to the volume level corresponding to each audio segment, and the volume level indicates how loud the audio segment is. Based on this scheme, a technician can set, as required, a volume adjustment coefficient for each volume level, so that loud audio is turned down and quiet audio is turned up. For example, the original volume of each audio is adjusted according to the volume adjustment coefficient corresponding to that audio, or the actual playing volume of each audio is adjusted according to both the volume adjustment coefficient corresponding to that audio and the playing volume of the device. In this way, with the playing volume of the device fixed, the actual playing volumes of the multiple audios are similar, which avoids the anchor having to adjust the playing volume of the device manually and frequently because the actual playing volume changes abruptly between audios.

An embodiment of the present application provides a method for adjusting volume, which is described below with reference to the flowchart shown in Fig. 2. As shown in Fig. 2, the method comprises the following steps:

Step 201, acquiring the sampling value of each sampling point in each audio segment of a preset duration into which the target audio in Pulse Code Modulation (PCM) format is divided.

The target audio is divided into a plurality of audio segments of a preset duration, and the preset duration can be set in advance by a technician, for example to 1 second, 2 seconds or 3 seconds. When the sampling frequency is 44100 Hz and the preset duration is 1 second, each audio segment contains 44100 sampling points; when the sampling frequency is 20000 Hz and the preset duration is 2 seconds, each audio segment contains 40000 sampling points. The sampling value of each sampling point in each audio segment of the target audio represents the volume of the target audio at that sampling point. The target audio may be any audio in the audio library.
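The segment arithmetic above can be illustrated with a minimal Python sketch. The function names `samples_per_segment` and `split_into_segments` are hypothetical, the audio is assumed to be a single-channel list of sampling values, and keeping a shorter final segment is an assumption, since the description does not say how a trailing partial segment is handled.

```python
def samples_per_segment(sample_rate_hz, segment_seconds):
    """Number of sampling points in one audio segment of the preset duration."""
    return round(sample_rate_hz * segment_seconds)


def split_into_segments(samples, sample_rate_hz, segment_seconds):
    """Divide a list of sampling values into consecutive audio segments.

    Assumption: the final segment is kept even if it is shorter than
    the preset duration.
    """
    size = samples_per_segment(sample_rate_hz, segment_seconds)
    if size <= 0:
        raise ValueError("segment must contain at least one sampling point")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


# The two cases from the description:
print(samples_per_segment(44100, 1))  # 44100 sampling points
print(samples_per_segment(20000, 2))  # 40000 sampling points
```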

In implementation, any audio whose original volume has not yet been adjusted is selected from the audio library and decoded to obtain the audio in PCM (Pulse Code Modulation) format.

PCM is a sampling technique for digitizing an analog signal, and is used to convert an analog voice signal into a digital signal, i.e., a pulse signal. Pulse code modulation goes through mainly 3 processes: sampling, quantizing, and encoding. The sampling process changes continuous time analog signals into sampling signals with discrete time and continuous amplitude, the quantization process changes the sampling signals into digital signals with discrete time and discrete amplitude, and the coding process codes the quantized signals into a binary code group for output.
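The three PCM processes can be sketched in Python as follows. This is only an illustration under stated assumptions: the signal is a hypothetical 440 Hz tone with amplitudes in [-1.0, 1.0], quantization uses 16-bit signed levels, and the encoding is little-endian. The helper names are hypothetical, and `decode_pcm16le` shows how the sampling values used in the following steps could be read back from such a byte stream.

```python
import math
import struct


def sample_signal(signal, sample_rate_hz, duration_s):
    """Sampling: evaluate a continuous-time signal at discrete instants."""
    count = round(sample_rate_hz * duration_s)
    return [signal(i / sample_rate_hz) for i in range(count)]


def quantize_16bit(values):
    """Quantizing: map amplitudes in [-1.0, 1.0] to discrete integer levels."""
    levels = []
    for v in values:
        v = max(-1.0, min(1.0, v))  # clip out-of-range amplitudes
        levels.append(round(v * 32767))
    return levels


def encode_pcm16le(levels):
    """Encoding: pack each level as a 16-bit little-endian binary code."""
    return struct.pack(f"<{len(levels)}h", *levels)


def decode_pcm16le(raw):
    """Read the 16-bit sampling values back from a PCM byte stream."""
    count = len(raw) // 2
    return list(struct.unpack(f"<{count}h", raw[:count * 2]))


def tone(t):
    """Hypothetical analog input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)


pcm_bytes = encode_pcm16le(quantize_16bit(sample_signal(tone, 8000, 0.01)))
```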

Step 202, based on the sampling values of the sampling points in each audio segment, determining the volume level corresponding to each audio segment.

The volume level of an audio segment indicates the overall loudness of that audio segment. For example, the volume level may indicate that the overall volume of the audio segment is high, low, or appropriate.

In implementation, the volume level corresponding to each audio segment is determined from the sampling values of the sampling points in that audio segment; that is, the overall volume of each audio segment is determined according to the sampling values of its sampling points.

Optionally, determining sampling points of which the sampling values are in the same numerical range based on the sampling values of the sampling points in each audio segment and a plurality of preset numerical ranges; for each audio segment, determining the number of sampling points corresponding to each numerical range; and determining the volume level corresponding to each audio segment based on the number of sampling points corresponding to each numerical range in each audio segment.

The plurality of preset numerical ranges may be divided by a technician according to experience, or according to the sensitivity of the human ear. For example, a sampling point whose sampling value is in the range of 18000 to 26000 corresponds to a volume that is appropriate for the human ear, and a sampling point whose sampling value is above 26000 corresponds to a volume that is loud for the human ear. If sampling points with sampling values above 26000 occur continuously, a harsh sound is easily produced, which degrades the playing effect of the audio. A technician may also divide the numerical ranges according to the hearing sensitivity of different groups of people, for example by dividing a separate set of numerical ranges for each age group. For the elderly, a sampling value in the range of 0 to 20000 corresponds to a volume that is low, a sampling value in the range of 20000 to 30000 corresponds to a volume that is appropriate, and a sampling value above 30000 corresponds to a volume that is high. The embodiments of the present application do not limit how the numerical ranges are divided.

It should be noted that, in the embodiment of the present application, the statement that sampling values in the range of 18000 to 26000 are appropriate for the human ear can be understood as follows: when the playing volume of the device is adjusted within a wide range, the actual playing volume of the device corresponding to this numerical range is appropriate for the human ear. Likewise, the statement that sampling values above 26000 are loud for the human ear can be understood as follows: when the playing volume of the device is adjusted within a wide range, the actual playing volume of the device corresponding to this numerical range is loud for the human ear.

In implementation, the sampling value of each sampling point in each audio segment is determined, and a plurality of preset value ranges pre-divided by technicians are obtained. And determining the sampling points of which the sampling values are in the same numerical range according to the sampling values of the sampling points in each audio segment, and further determining the number of the sampling points corresponding to each numerical range in each audio segment. And determining the volume level corresponding to each audio segment according to the number of the sampling points corresponding to each numerical range in each audio segment.
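The counting step can be sketched in Python as below. The three ranges reuse the example boundaries mentioned in the description (18000 and 26000, with a 16-bit maximum of 32767). Several details are assumptions not stated in the source: sampling values are compared by magnitude, magnitudes above 32767 are recorded as 32767, and a value lying exactly on a boundary is assigned to the higher range. The names `RANGES` and `count_samples_per_range` are hypothetical.

```python
# Example ranges; boundaries are illustrative values set by a technician.
RANGES = [(0, 18000), (18000, 26000), (26000, 32767)]


def count_samples_per_range(segment, ranges=RANGES):
    """Count how many sampling points of one audio segment fall in each range."""
    counts = [0] * len(ranges)
    last = len(ranges) - 1
    for sample in segment:
        # Assumption: compare by magnitude and cap at the 16-bit maximum.
        magnitude = min(abs(sample), 32767)
        for i, (low, high) in enumerate(ranges):
            in_range = low <= magnitude < high
            at_top = i == last and magnitude == high
            if in_range or at_top:
                counts[i] += 1
                break
    return counts


print(count_samples_per_range([100, 17999, 18000, 30000]))  # [2, 1, 1]
```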

Optionally, for each audio segment, in the multiple numerical value ranges, a target numerical value range with the largest number of corresponding sampling points is screened out, and a target numerical value range corresponding to each audio segment is obtained. And determining the volume level corresponding to each audio segment based on the target value range corresponding to each audio segment and the corresponding relationship between the pre-stored value range and the volume level.

The target numerical range is the numerical range containing the largest number of sampling points in an audio segment, and each audio segment corresponds to one target numerical range. In the embodiment of the present application, three preset numerical ranges are taken as an example, where the first numerical range corresponds to a first volume level, the second numerical range corresponds to a second volume level, and the third numerical range corresponds to a third volume level.

In implementation, based on the sample value of each sample point in each audio segment, the sample point of the sample value in the first range of values, the sample point of the sample value in the second range of values, and the sample point of the sample value in the third range of values in each audio segment are determined. The number of sample points in the first range of values, the number of sample points in the second range of values and the number of sample points in the third range of values are determined. If the number of the sampling points in the first numerical value range is the maximum, the volume level corresponding to the audio segment is the first volume level; if the number of the sampling points in the second numerical value range is the maximum, the volume level corresponding to the audio segment is the second volume level; if the number of sample points within the third range of values is the greatest, the volume level of the audio segment is the third volume level.

It should be noted that, when the file corresponding to the target audio is stored as 16-bit binary data, the maximum sampling value that can be recorded is 32767, that is, the sampling value lies in the range of 0 to 32767. A technician can divide 0 to 32767 into at least one preset numerical range. In the embodiment of the present application, three preset numerical ranges are taken as an example: 0 to 18000, 18000 to 26000, and 26000 to 32767. The range 0 to 18000 corresponds to a low volume level, the range 18000 to 26000 corresponds to a medium (that is, appropriate) volume level, and the range 26000 to 32767 corresponds to a high volume level. The overall volume of an audio segment is appropriate when the sampling values of most of its sampling points fall within 18000 to 26000, low when they fall within 0 to 18000, and high when they fall within 26000 to 32767.

Further, when the sample value of a certain sampling point is greater than 32767, the sample value of the sampling point is recorded as 32767.

For example, a certain audio segment contains 44100 sampling points, of which 24100 have sampling values from 0 to 18000, 10000 have sampling values from 18000 to 26000, and 10000 have sampling values from 26000 to 32767. The range 0 to 18000 contains the largest number of sampling points, so the volume level of the audio segment is low.
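The selection of the target numerical range in this example can be reproduced with a short Python sketch. The level names and the function `volume_level_from_counts` are hypothetical, and resolving a tie in favor of the lower range is an assumption, since the description does not cover ties.

```python
# Volume levels in the same order as the numerical ranges
# (0-18000, 18000-26000, 26000-32767).
LEVELS = ["low", "medium", "high"]


def volume_level_from_counts(counts, levels=LEVELS):
    """Return the level whose numerical range holds the most sampling points.

    Assumption: on a tie, the first (lowest) range wins.
    """
    target_index = max(range(len(counts)), key=lambda i: counts[i])
    return levels[target_index]


# Counts from the example: 24100, 10000 and 10000 sampling points.
print(volume_level_from_counts([24100, 10000, 10000]))  # low
```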

Other screening criteria may also be used in this embodiment. The ratio between the number of sampling points in each numerical range and the number of all sampling points in the audio segment may be determined, and the numerical range with the largest ratio selected as the target numerical range corresponding to the audio segment. Alternatively, for each audio segment, the numerical range whose number of sampling points is greater than a threshold may be selected as the target numerical range. Alternatively, the numerical range whose ratio is greater than a threshold may be selected as the target numerical range.
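The ratio-with-threshold variant might look like the following Python sketch. The function name `level_by_ratio_threshold` and the threshold values are hypothetical, and returning `None` when no range exceeds the threshold is an assumption, because the description does not say what happens in that case.

```python
def level_by_ratio_threshold(counts, levels, threshold):
    """Pick the level whose share of sampling points exceeds the threshold.

    Assumption: returns None when no numerical range exceeds the threshold.
    """
    total = sum(counts)
    if total == 0:
        return None
    for count, level in zip(counts, levels):
        if count / total > threshold:
            return level
    return None


levels = ["low", "medium", "high"]
# 24100 / 44100 is about 0.546, so "low" exceeds a threshold of 0.5.
print(level_by_ratio_threshold([24100, 10000, 10000], levels, 0.5))
```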

Step 203, determining a volume adjustment coefficient corresponding to the target audio based on the volume level corresponding to each audio segment.

In implementation, whether the volume of the target audio needs to be adjusted or not is determined according to the volume level corresponding to each audio segment, and a volume adjustment coefficient corresponding to the target audio is determined.

Optionally, based on the volume level corresponding to each audio segment, determining the number of audio segments corresponding to the same volume level; and determining a volume adjustment coefficient corresponding to the target audio based on the number of the audio segments corresponding to each volume level.

Because the duration of each audio is different, or the preset duration used to divide each audio into audio segments is different, the number of audio segments also differs from audio to audio. The number of audio segments corresponding to each volume level can therefore be counted to determine the volume adjustment coefficient of the target audio.

In implementation, the number of audio segments corresponding to the same volume level is determined based on the volume level corresponding to each audio segment. The volume adjustment coefficient of the target audio can be determined according to the volume level with the largest number of corresponding audio segments, or according to the ratio of the number of audio segments corresponding to each volume level to the total number of audio segments in the target audio.

Optionally, the ratio of the number of audio segments corresponding to each volume level to the total number of all audio segments in the target audio is determined as the ratio corresponding to each volume level. And determining a volume adjustment coefficient corresponding to the target audio based on the magnitude relation between the ratio corresponding to each volume level and a preset ratio.

Optionally, in a plurality of volume levels, screening out a target volume level with the largest number of corresponding audio segments to obtain a target volume level corresponding to the target audio; and determining the volume adjusting coefficient corresponding to the target audio based on the target volume level corresponding to the target audio and the corresponding relation between the pre-stored volume level and the volume adjusting coefficient.

In the pre-stored correspondence relationship between the volume levels and the volume adjustment coefficients, the volume adjustment coefficient corresponding to a volume level greater than the reference volume level is a volume adjustment coefficient for lowering the volume, and the volume adjustment coefficient corresponding to a volume level less than the reference volume level is a volume adjustment coefficient for raising the volume.

In implementation, the target volume level with the largest number of corresponding audio segments is determined, and the volume adjustment coefficient of the target audio is determined from it. For example, if the target volume level with the largest number of audio segments is the low volume level, the overall volume of the target audio is low, and the volume adjustment coefficient corresponding to the low volume level is determined as the volume adjustment coefficient of the target audio.
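Selecting the target volume level by segment count and looking up its coefficient can be sketched in Python as follows. The coefficient values are the illustrative ones from Table 1; the names `COEFFICIENTS` and `coefficient_by_most_common_level` are hypothetical, and returning 1.0 for an audio with no segments is an assumption.

```python
from collections import Counter

# Pre-stored correspondence between volume level and adjustment coefficient
# (illustrative values from Table 1).
COEFFICIENTS = {"low": 1.2, "medium": 1.0, "high": 0.8}


def coefficient_by_most_common_level(segment_levels, coefficients=COEFFICIENTS):
    """Use the level shared by the most audio segments as the target level."""
    if not segment_levels:
        return 1.0  # assumption: leave the volume unchanged
    target_level, _ = Counter(segment_levels).most_common(1)[0]
    return coefficients[target_level]


levels_per_segment = ["low", "low", "high", "medium", "low"]
print(coefficient_by_most_common_level(levels_per_segment))  # 1.2
```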

Further, for the target audio, the number of audio segments corresponding to the first volume level, the number of audio segments corresponding to the second volume level, the number of audio segments corresponding to the third volume level, and the total number of all audio segments in the target audio are determined, and ratios respectively corresponding to the first volume level, the second volume level, and the third volume level are calculated. If the ratio corresponding to the first volume level is larger than the preset ratio, determining the volume adjustment coefficient corresponding to the target audio as a first coefficient; if the ratio corresponding to the second volume level is larger than the preset ratio, determining the volume adjustment coefficient corresponding to the target audio as a second coefficient; and if the ratio corresponding to the third volume level is larger than the preset ratio, determining the volume adjustment coefficient corresponding to the target audio as a third coefficient.

Furthermore, the volume level corresponding to each audio segment is one of low volume, medium volume and high volume, where low volume corresponds to the first coefficient, medium volume corresponds to the second coefficient, and high volume corresponds to the third coefficient. According to the volume level of each audio segment, the number of audio segments with low volume, the number with medium volume and the number with high volume are determined, and the ratio of each of these numbers to the total number of audio segments in the target audio is calculated. If the ratio corresponding to low volume is greater than the preset ratio, the volume adjustment coefficient corresponding to the target audio is determined to be the first coefficient. If the ratio corresponding to medium volume is greater than the preset ratio, the volume adjustment coefficient corresponding to the target audio is determined to be the second coefficient. If the ratio corresponding to high volume is greater than the preset ratio, the volume adjustment coefficient corresponding to the target audio is determined to be the third coefficient.

It should be noted that the first coefficient is greater than the second coefficient, and the second coefficient is greater than the third coefficient. Since the second coefficient corresponds to the medium (appropriate) volume level, the second coefficient is the reference coefficient among the volume adjustment coefficients. In the embodiment of the present application, the volume of low-volume audio is turned up by the first coefficient and the volume of high-volume audio is turned down by the third coefficient, so that the volumes of the adjusted audios are similar.

For example, the volume levels are low volume, medium volume and high volume, and the preset ratio is set to 60%. The ratio of the number of low-volume audio segments to the total number of audio segments in the target audio is 80%, and the ratio of the number of high-volume audio segments to the total number of audio segments in the target audio is 10%. Among these ratios, the ratio corresponding to low volume is greater than the preset ratio. The prestored correspondence between volume levels and volume adjustment coefficients is then queried, and the volume adjustment coefficient corresponding to low volume, 1.2, is determined as the volume adjustment coefficient of the target audio.

The correspondence between the volume levels and the volume adjustment coefficients is shown in Table 1.

TABLE 1

Volume level	Volume adjustment coefficient
Low volume	1.2
Medium volume	1
High volume	0.8
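
The selection rule above can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the function name, the level labels and the fallback to the reference coefficient when no level exceeds the preset ratio are assumptions, while the coefficients are those of Table 1.

```python
from collections import Counter

# Coefficients from Table 1; the string labels are illustrative names.
COEFFICIENTS = {"low": 1.2, "medium": 1.0, "high": 0.8}


def coefficient_from_levels(levels, preset_ratio=0.6):
    """Return the coefficient of the level whose share of segments exceeds preset_ratio."""
    if not levels:
        raise ValueError("the target audio has no audio segments")
    counts = Counter(levels)
    total = len(levels)
    for level in ("low", "medium", "high"):
        if counts[level] / total > preset_ratio:
            return COEFFICIENTS[level]
    # Assumption: the text does not cover the case where no level exceeds
    # the preset ratio, so this sketch falls back to the reference coefficient.
    return COEFFICIENTS["medium"]
```

With a preset ratio above 50%, at most one level can exceed it, so the order in which the levels are checked does not affect the result.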

It should be noted that the above method takes the volume level corresponding to each audio segment as one of low volume, medium volume and high volume. The volume level corresponding to each audio segment may instead be one of five volume levels, and the volume adjustment coefficient is determined in the same way. The specific correspondence is shown in Table 2. The volume levels and the divided numerical ranges are in one-to-one correspondence, so 5 preset numerical ranges can be divided, one for each volume level.

TABLE 2

Step 204: adjust the volume of the target audio based on the volume adjustment coefficient corresponding to the target audio.

The volume of the target audio may be the actual playing volume of the target audio, which depends both on the playing volume of the device and on the volume of the target audio itself. The volume of the target audio may also be the original volume of the target audio, that is, the volume inherent in the target audio, which is determined by the sampling values within the target audio. When the volume of the target audio is the actual playing volume, the actual playing volume of the target audio is adjusted according to the volume adjustment coefficient and the playing volume of the device. When the volume of the target audio is the original volume, the original volume of the target audio is adjusted according to the volume adjustment coefficient.

It should be noted that, to adjust the original volume of the target audio according to the volume adjustment coefficient, the sampling values of the sampling points in the target audio in the PCM format may be adjusted directly according to the volume adjustment coefficient, and the adjusted target audio in the PCM format is then encoded. Alternatively, the volume of the encoded target audio may be adjusted according to the volume adjustment coefficient.
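
As an illustration of adjusting PCM sampling values directly, the sketch below multiplies each sample by the coefficient. It assumes signed 16-bit samples, and it clips results to the 16-bit range; the clipping is an assumption of this sketch, since the text does not say how overflow is handled. The function name `scale_pcm16` is hypothetical.

```python
import array


def scale_pcm16(samples, coefficient):
    """Scale signed 16-bit PCM samples by coefficient and return a new array."""
    out = array.array("h")
    for sample in samples:
        scaled = int(round(sample * coefficient))
        # Assumption: values outside the 16-bit range are clipped.
        out.append(max(-32768, min(32767, scaled)))
    return out
```

For instance, a coefficient of 1.2 turns a sample of 1000 into 1200, while a sample of 30000 is clipped at 32767.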

Optionally, the method in this embodiment of the present application may be applied to a terminal. The terminal detects a target audio to be played, calculates the volume adjustment coefficient corresponding to that target audio according to the methods in steps 201 to 203, and plays the target audio at a volume determined by the volume adjustment coefficient and the playing volume of the device. While the target audio is played, or after it has been played, the identifier of the target audio and the volume adjustment coefficient corresponding to the target audio are stored in association in the terminal, so that when the target audio is played again, the volume adjustment coefficient can be looked up directly by the identifier of the target audio.
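
The storage of coefficients by audio identifier can be sketched as a small in-memory cache. The class name `CoefficientCache` and the `compute` callback, which stands in for steps 201 to 203, are hypothetical; a real terminal would presumably persist the mapping rather than keep it in memory.

```python
class CoefficientCache:
    """Remember the volume adjustment coefficient of each audio by identifier."""

    def __init__(self, compute):
        # compute: callable that derives a coefficient from PCM samples
        # (a stand-in for steps 201 to 203).
        self._compute = compute
        self._store = {}

    def get(self, audio_id, pcm_samples):
        # Compute on first playback, then reuse the stored coefficient.
        if audio_id not in self._store:
            self._store[audio_id] = self._compute(pcm_samples)
        return self._store[audio_id]
```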

Alternatively, after the volume adjustment coefficient of the target audio is determined, the original volume of the target audio is adjusted according to the volume adjustment coefficient to obtain the volume-adjusted target audio. The volume-adjusted target audio is then played at the playing volume of the device, and the target audio before volume adjustment is replaced with the target audio after volume adjustment.

For example, as shown in fig. 3, the playing volume of the device is 50% and the volume adjustment coefficient corresponding to the song audio "element color" is 1.2. The song audio "element color" can be played according to the device playing volume of 50% and the volume adjustment coefficient of 1.2. Alternatively, the original volume of the song audio "element color" can first be adjusted according to the volume adjustment coefficient 1.2, and the song audio "element color" can then be played at the device playing volume of 50%.

Alternatively, the methods in step 201 to step 203 may be applied to a server, and the method in step 204 may be applied to a terminal. Specifically, the server calculates the volume adjustment coefficient corresponding to each audio in the audio library according to the methods in steps 201 to 203, and stores the identifier of each audio in association with the volume adjustment coefficient corresponding to that audio. When the server receives a playing request that is sent by the terminal and carries an audio identifier, the server looks up the volume adjustment coefficient corresponding to the audio according to the audio identifier, and sends the audio and its volume adjustment coefficient to the terminal. After receiving the audio and the corresponding volume adjustment coefficient from the server, the terminal adjusts the volume of the audio according to the volume adjustment coefficient.
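
The division of work between server and terminal described above might look like the following sketch. All names (`AudioServer`, `PlayResponse`, `handle_play_request`) are hypothetical, the transport between server and terminal is omitted, and `compute_coefficient` stands in for steps 201 to 203.

```python
from dataclasses import dataclass


@dataclass
class PlayResponse:
    audio: bytes        # the requested audio data
    coefficient: float  # its volume adjustment coefficient


class AudioServer:
    def __init__(self, library, compute_coefficient):
        # library: mapping from audio identifier to audio data.
        self._library = dict(library)
        # Precompute one coefficient per audio in the library.
        self._coefficients = {
            audio_id: compute_coefficient(data)
            for audio_id, data in self._library.items()
        }

    def handle_play_request(self, audio_id):
        # Look up the audio and its coefficient by identifier.
        return PlayResponse(self._library[audio_id], self._coefficients[audio_id])
```

The terminal would then apply the received coefficient as in step 204.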

Alternatively, after the server determines the volume adjustment coefficient corresponding to each audio in the audio library, the original volume of each audio may be adjusted according to its volume adjustment coefficient, and the audio before volume adjustment in the audio library is replaced with the audio after volume adjustment. When the server receives a playing request that is sent by the terminal and carries an audio identifier, the server looks up the volume-adjusted audio according to the audio identifier and sends it to the terminal. On receiving the volume-adjusted audio from the server, the terminal plays it at the playing volume of the device.

Optionally, the volume adjustment coefficient may be determined according to sample values of all sampling points in the target audio, and the specific steps are as follows: determining sampling points of which the sampling values are in the same numerical range according to the sampling values of all the sampling points in the target audio and a plurality of preset numerical ranges; determining the number of sampling points corresponding to each numerical range; screening out a target numerical range with the maximum number of corresponding sampling points from the plurality of numerical ranges to obtain a target numerical range corresponding to the target audio; and determining the volume adjustment coefficient corresponding to the target audio based on the target numerical range corresponding to the target audio and the corresponding relationship between the prestored numerical range and the volume adjustment coefficient.

For example, the duration of the target audio is 8 seconds and the target audio is sampled at a sampling frequency of 44100 Hz, so there are 8 × 44100 sampling points in the target audio. The preset numerical ranges may include [0, 18000), [18000, 26000] and (26000, 32767], and the volume adjustment coefficients may include 0.8, 1 and 1.2, where the numerical range [0, 18000) corresponds to the volume adjustment coefficient 1.2, the numerical range [18000, 26000] corresponds to the volume adjustment coefficient 1, and the numerical range (26000, 32767] corresponds to the volume adjustment coefficient 0.8. The number of sampling points in the numerical range [0, 18000) is 44100 × 5, the number of sampling points in the numerical range [18000, 26000] is 44100 × 2, and the number of sampling points in the numerical range (26000, 32767] is 44100 × 1. The numerical range [0, 18000) therefore contains the most sampling points, and since it corresponds to the volume adjustment coefficient 1.2, the volume adjustment coefficient corresponding to the target audio is determined to be 1.2.

In the above process, sampling points with small sampling values are the most numerous in the target audio. Because the magnitude of a sampling value reflects the original volume, the original volume of the target audio is low and needs to be increased.
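
The range-counting procedure can be sketched as below, using the three numerical ranges and coefficients of the example. Classifying negative samples by their magnitude is an assumption of this sketch, as the text only lists non-negative ranges; the function names are hypothetical.

```python
# Coefficients for [0, 18000), [18000, 26000] and (26000, 32767], in that order.
RANGE_COEFFICIENTS = (1.2, 1.0, 0.8)


def range_index(sample):
    """Return the index of the numerical range containing the sample."""
    magnitude = abs(sample)  # assumption: classify by absolute value
    if magnitude < 18000:
        return 0
    if magnitude <= 26000:
        return 1
    return 2


def coefficient_from_samples(samples):
    """Pick the coefficient of the range that holds the most sampling points."""
    if not samples:
        raise ValueError("the target audio has no sampling points")
    counts = [0, 0, 0]
    for sample in samples:
        counts[range_index(sample)] += 1
    target = counts.index(max(counts))
    return RANGE_COEFFICIENTS[target], counts
```

Samples distributed 5 : 2 : 1 across the three ranges, as in the example, yield the coefficient 1.2.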

It should be noted that the target numerical range need not be the one with the largest number of sampling points. Among the plurality of numerical ranges, a numerical range whose number of sampling points is greater than a preset number may instead be selected as the target numerical range corresponding to the target audio. Alternatively, the ratio of the number of sampling points in each numerical range to the number of all sampling points in the target audio may be calculated, and the numerical range with the largest ratio, or a numerical range whose ratio is greater than a preset ratio, may be selected as the target numerical range.

According to the embodiment of the application, the original volume of the target audio can be determined from the numerical range in which the sampling values of most sampling points fall, the volume adjustment coefficient can then be determined from that original volume, and the volume of the target audio can be adjusted according to the volume adjustment coefficient. With this method, the original volume of each audio is adjusted according to its own volume adjustment coefficient, so that after adjustment the sampling values of most sampling points of each audio fall within the same numerical range of suitable volume, and the adjusted original volumes of the multiple audios are similar. Because the original volumes of the audios are similar and the playing volume of the device is unchanged, the actual playing volume of each audio is similar, which solves the problem that the main broadcaster has to adjust the playing volume of the device frequently and manually.

Fig. 4 is a schematic structural diagram of a volume adjustment device according to an embodiment of the present application, and as shown in fig. 4, the volume adjustment device may be a terminal or a server according to the foregoing embodiment, and includes:

the acquisition module 401 is configured to acquire a sampling value of each sampling point in each audio segment with preset duration divided in a target audio in a Pulse Code Modulation (PCM) format;

a first determining module 402, configured to determine a volume level corresponding to each audio segment based on the sampling value of each sampling point in each audio segment;

a second determining module 403, configured to determine a volume adjustment coefficient corresponding to the target audio based on the volume level corresponding to each audio segment;

a volume adjustment module 404 configured to adjust the volume of the target audio based on a volume adjustment coefficient corresponding to the target audio.

Optionally, the first determining module 402 is configured to:

Optionally, the second determining module 403 is configured to:

determining the ratio of the number of the audio segments corresponding to each volume level to the total number of all the audio segments in the target audio as the ratio corresponding to each volume level;

Optionally, the second determining module 403 is configured to:

It should be noted that when the volume adjustment device provided in the above embodiment adjusts the volume, the division into the above functional modules is only an example. In practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the volume adjustment device provided by the above embodiment and the embodiments of the method for adjusting the volume belong to the same concept; the specific implementation process of the device is described in the method embodiments and is not repeated here.

Fig. 5 shows a block diagram of a terminal 500 according to an exemplary embodiment of the present application. The terminal 500 may be a smartphone, a tablet, an MP3 player, an MP4 player, a laptop, or a desktop computer. The terminal 500 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and the like.

In general, the terminal 500 includes: a processor 501 and a memory 502.

The processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 501 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, processor 501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.

Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 502 is used to store at least one instruction for execution by processor 501 to implement the method of volume adjustment provided by method embodiments herein.

In some embodiments, the terminal 500 may further optionally include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502 and peripheral interface 503 may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface 503 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 504, touch screen display 505, camera 506, audio circuitry 507, positioning components 508, and power supply 509.

The peripheral interface 503 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 501 and the memory 502. In some embodiments, the processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 504 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 504 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 504 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 504 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to, those of metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 504 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.

The display screen 505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, the display screen 505 also has the ability to capture touch signals on or over the surface of the display screen 505. The touch signal may be input to the processor 501 as a control signal for processing. In this case, the display screen 505 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 505, disposed on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, respectively disposed on different surfaces of the terminal 500 or in a folded design; in still other embodiments, the display screen 505 may be a flexible display screen disposed on a curved surface or on a folded surface of the terminal 500. The display screen 505 may even be arranged in a non-rectangular irregular shape, that is, as an irregularly shaped screen. The display screen 505 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 506 is used to capture images or video. Optionally, the camera assembly 506 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fused shooting functions. In some embodiments, the camera assembly 506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.

The audio circuit 507 may include a microphone and a speaker. The microphone is used for collecting sound waves of the user and the environment, converting the sound waves into electrical signals, and inputting the electrical signals into the processor 501 for processing, or inputting the electrical signals into the radio frequency circuit 504 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 500. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker may be a traditional diaphragm speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal into a sound wave audible to humans, or convert an electrical signal into a sound wave inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 507 may also include a headphone jack.

The positioning component 508 is used to locate the current geographic position of the terminal 500 for navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

The power supply 509 is used to power the various components in the terminal 500. The power supply 509 may be alternating current, direct current, a disposable battery or a rechargeable battery. When the power supply 509 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also support fast charging technology.

In some embodiments, the terminal 500 also includes one or more sensors 510. The one or more sensors 510 include, but are not limited to: acceleration sensor 511, gyro sensor 512, pressure sensor 513, fingerprint sensor 514, optical sensor 515, and proximity sensor 516.

The acceleration sensor 511 may detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 500. For example, the acceleration sensor 511 may be used to detect the components of the gravitational acceleration on the three coordinate axes. The processor 501 may control the touch display screen 505 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 511. The acceleration sensor 511 may also be used to collect motion data of a game or of the user.

The gyro sensor 512 may detect the body direction and rotation angle of the terminal 500, and may cooperate with the acceleration sensor 511 to capture the user's 3D motion with respect to the terminal 500. Based on the data collected by the gyro sensor 512, the processor 501 may implement the following functions: motion sensing (such as changing the UI according to a tilt operation by the user), image stabilization while shooting, game control, and inertial navigation.

The pressure sensor 513 may be disposed on a side bezel of the terminal 500 and/or on a lower layer of the touch display screen 505. When the pressure sensor 513 is disposed on the side bezel of the terminal 500, it can detect the user's grip signal on the terminal 500, and the processor 501 performs left/right hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 513. When the pressure sensor 513 is disposed on the lower layer of the touch display screen 505, the processor 501 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 505. The operable controls include at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 514 is used for collecting the user's fingerprint, and the processor 501 identifies the user according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 itself identifies the user according to the collected fingerprint. Upon recognizing the user's identity as trusted, the processor 501 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 514 may be provided on the front, back, or side of the terminal 500. When a physical button or a vendor logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with the physical button or the vendor logo.

The optical sensor 515 is used to collect the ambient light intensity. In one embodiment, the processor 501 may control the display brightness of the touch display screen 505 based on the ambient light intensity collected by the optical sensor 515. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 505 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 505 is turned down. In another embodiment, the processor 501 may also dynamically adjust the shooting parameters of the camera assembly 506 based on the ambient light intensity collected by the optical sensor 515.

The proximity sensor 516, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 500. The proximity sensor 516 is used to measure the distance between the user and the front face of the terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front face of the terminal 500 gradually decreases, the processor 501 controls the touch display screen 505 to switch from the screen-on state to the screen-off state; when the proximity sensor 516 detects that the distance between the user and the front face of the terminal 500 gradually increases, the processor 501 controls the touch display screen 505 to switch from the screen-off state to the screen-on state.

Those skilled in the art will appreciate that the configuration shown in fig. 5 is not intended to be limiting of terminal 500 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.

Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application, where the server 600 may include one or more processors (CPUs) 601 and one or more memories 602, where at least one instruction is stored in the memory 602, and is loaded and executed by the processor 601 to implement the methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input/output, and the server may also include other components for implementing the functions of the device, which are not described herein again.

In an exemplary embodiment, a computer-readable storage medium, such as a memory, including instructions executable by a processor in a terminal to perform the method of volume adjustment in the above embodiments is also provided. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method of adjusting volume, the method comprising:

2. The method of claim 1, wherein determining the volume level corresponding to each audio segment based on the sample values of the samples in each audio segment comprises:

3. The method of claim 2, wherein determining the volume level for each audio segment based on the number of samples corresponding to each range of values in the audio segment comprises:

4. The method of claim 1, wherein determining the volume adjustment factor corresponding to the target audio based on the volume level corresponding to each audio segment comprises:

5. The method of claim 4, wherein determining the volume adjustment factor corresponding to the target audio based on the number of audio segments corresponding to each volume level comprises:

6. The method according to claim 5, wherein the determining the volume adjustment coefficient corresponding to the target audio based on a magnitude relation between the ratio corresponding to each volume level and a preset ratio comprises:

7. An apparatus for adjusting volume, the apparatus comprising:

8. The apparatus of claim 7, wherein the first determining module is configured to:

9. A computer device comprising a processor and a memory, wherein at least one instruction is stored in the memory, and wherein the instruction is loaded and executed by the processor to perform the operations performed by the method of adjusting volume according to any one of claims 1 to 6.

10. A computer-readable storage medium having stored therein at least one instruction, which is loaded and executed by a processor to perform operations performed by the method of adjusting volume according to any one of claims 1 to 6.