WO2021137536A1

WO2021137536A1 - Method for standardizing volume of sound source, device, and method for display and operation

Info

Publication number: WO2021137536A1
Application number: PCT/KR2020/019120
Authority: WO
Inventors: Jong Hwa Park
Original assignee: Jong Hwa Park
Priority date: 2019-12-30
Filing date: 2020-12-24
Publication date: 2021-07-08

Abstract

Disclosed is a method for standardizing a volume of a sound source. The method for standardizing a volume of a sound source includes: receiving a user command to standardize the volume of the sound source through a UI displayed on a display; estimating a tension of a person who listens to the sound source corresponding to the user command and generating a tension function using the tension; generating a standardization function having variance reduced or increased than the tension function while maintaining the same average as the tension function; and generating an output sound source signal having an envelope in a form of the standardization function.

Description

METHOD FOR STANDARDIZING VOLUME OF SOUND SOURCE, DEVICE, AND METHOD FOR DISPLAY AND OPERATION

The present disclosure relates to a method for standardizing a volume of a sound source, a device, and a method for display.

Worldwide, record profits are gradually decreasing, but the proportion and profits of digital sound sources continue to increase, and are expected to gradually increase in the future. In recent years, as the use of mobile devices such as smartphones increase and networks such as LTE and 5G are expanded, the digital sound source market is being reorganized into streaming services.

Furthermore, the needs of listeners who not only consume music for listening, but also consume music to generate an interior effect that forms the atmosphere of the space or to maintain concentration are increasing. In this case, there is a need for the music to be output so as not to reduce the listener's concentration.

However, the volume of the sound source may vary greatly depending on the recording environment and the editing environment. When the average volume is different for each sound source, it is inconvenient for the user who listens to several sound sources to adjust the volume for each sound source. For example, if the average volume of the sound source released by company A is always higher than the average volume of the sound source released by company B, when a person who has listened to the sound source of the company A at an appropriate volume listens to the sound source of the company B, the volume of the sound source of the company B needs to be adjusted to be increased.

To solve this problem, in the field of the existing audio signal processing, standardization algorithms (Z-scoring, RMS normalization, EBU R128, and the like) that uniformly adjust volumes of multiple sound sources have been used. When such standardization algorithms are used, the amount of change in a volume within one sound source is maintained as it is, and an average volume between multiple sound sources is adjusted to be the same, so the inconvenience to adjust the volume for each sound source may be eliminated.

However, when a sound source is turned on without active manipulation for atmosphere of a space, such as in a cafe or restaurant, the rich volume change within a song may often disturb tasks or conversations occurring in the space. For example, there may be a case in which the volume is turned up because there is little sound in a very quiet section within a sound source, and the volume is turned down because the sound is too loud in a climax section of a song.

In order to solve this problem, research on a method for uniformly adjusting the volumes of each section within a single sound source is necessary, but research for this purpose has not been conducted so far.

Therefore, to match the purpose of spatial music that allows workers or people taking a break to consistently maintain the atmosphere in the space without disturbing their attention or conversation, a new method for uniformly adjusting volumes of several sections within one sound source has been invented.

Furthermore, the user who listens to the sound source needs to make a volume uniform in different ways depending on the time and place. In the past, it is rare to optimize a listening environment according to individual auditory perception levels and situations, or external noise environment, but as this demand increases, there is also a need to provide a simpler and more intuitive UX/UI environment for users to change sound quality appropriate for their situations.

Therefore, in the present disclosure, an intuitive UX/UI that enables operation/control by optimizing the technology to the user's purpose and perception level along with the method for uniformly adjusting a volume has been newly invented. This operation method may be effectively applied not only to music streaming apps, but also to streaming of videos such as movies and dramas, or to a car's audio system.

The present disclosure is to provide a method for standardizing a volume of a sound source, a device, and a method for operation and display thereof.

The problems to be solved by the present disclosure are not limited to the above-mentioned problems, and other problems that are not mentioned may be obviously understood by those skilled in the art from the following description.

A method for standardizing a volume of a sound source performed by an electronic device includes: measuring a range of a volume that a user perceives as comfortable; receiving a user command to standardize the volume of the sound source through a UI displayed on a display and various sensors built into a device (e.g. smartphone, laptop, etc.); estimating a tension of a person who listens to the sound source corresponding to the user command and generating a tension function using the tension; generating a standardization function having variance lower or higher than the tension function while maintaining the same average as the tension function; generating an output sound source signal having an envelope in a form of the standardization function; and automatically adjusting only a volume control device so that the user can perceive a sound source with a uniform volume without applying any correction or modification measures to the received sound source signal.

Other specific details of the present disclosure are contained in the detailed description and drawings.

In accordance with the embodiment of the present disclosure, the user who listens to the sound source may listen to the sound source while being provided with the uniform sound quality.

Furthermore, according to various embodiments of the present disclosure, the user who listens to the sound source may correct the sound quality of the sound source in the desired direction through the intuitive method.

The effects of the present disclosure are not limited to the above-mentioned effects, and other effects that are not mentioned may be obviously understood by those skilled in the art from the following description.

The accompany drawings, which are included to provide a further understanding of the present disclosure and are incorporated in and constitute a part of this specification illustrate embodiments of the present disclosure, together with the description, serve to explain the principles of the present disclosure.

FIG. 1 is a block diagram for describing a configuration of an electronic device according to an embodiment of the present disclosure.

FIG. 2 is a flowchart for explaining a method for standardizing and displaying a volume of a sound source according to an embodiment of the present disclosure.

FIGS. 3A to 3E are exemplary views for explaining a method for displaying an electronic device according to an embodiment of the present disclosure.

FIG. 4 is an exemplary diagram illustrating a graph of functions derived during a process of generating a standardization function over time, according to an embodiment of the present disclosure.

FIG. 5 is an exemplary diagram illustrating a graph according to the passage of time of an output sound source signal whose volume is adjusted according to an embodiment of the present disclosure.

FIG. 6 is a flowchart for explaining a method for standardizing a volume of a sound source according to another embodiment of the present disclosure.

Various advantages and features of the present disclosure and methods for accomplishing them will become apparent from the following description of embodiments with reference to the accompanying drawings. However, the present disclosure is not limited to embodiments to be described below, but may be implemented in various different forms, these embodiments will be provided only in order to make the present disclosure complete and allow those skilled in the art to completely recognize the scope of the present disclosure, and the present disclosure will be defined by the scope of the claims.

Terms used in the present specification are for explaining embodiments rather than limiting the present disclosure. Unless explicitly described to the contrary, a singular form includes a plural form in the present specification. Throughout this specification, the term "comprise" and/or "comprising" will be understood to imply the inclusion of stated constituents but not the exclusion of any other constituents. Like reference numerals refer to like components throughout the specification and "and/or" includes each of the components mentioned and includes all combinations thereof. Although "first", "second", and the like are used to describe various components, it goes without saying that these components are not limited by these terms. These terms are used only to distinguish one component from other components. Therefore, it goes without saying that the first component mentioned below may be the second component within the technical scope of the present disclosure.

Unless defined otherwise, all terms (including technical and scientific terms) used in the present specification have the same meaning as meanings commonly understood by those skilled in the art to which the present disclosure pertains. In addition, terms defined in commonly used dictionaries are not ideally or excessively interpreted unless explicitly defined otherwise.

The term "unit" or "module" used in the specification refers to a software component or a hardware component such as FPGA or ASIC, and the "unit" or "module" performs certain roles. However, the term "unit" or "module" is not intended to be limited to software or hardware. The "unit" or "module" may be configured to be stored in a storage medium that can be addressed or may be configured to regenerate one or more processors. Accordingly, as an example, the "unit" or "module" refers to components such as software components, object-oriented software components, class components, and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. Components and functions provided within "unit" or "module" may be combined into a smaller number of components and "unit" or "module" or may be further separated into additional components and "unit" or "module".

The spatially relative terms "below", "beneath", "lower", "above", "upper", and the like may be used to easily describe the correlation between one component and other components as illustrated in drawings. The spatially relative terms should be understood as terms including different directions of components during use or operation in addition to the directions illustrated in the drawings. For example, if components illustrated in drawings are turned over, components described as "below" or "beneath" of another component may be placed "above" other components. Therefore, the illustrative term "below" can include both downward and upward directions. The components can also be aligned in different directions, and therefore the spatially relative terms can be interpreted according to the alignment.

In this specification, the computer means all kinds of hardware devices including at least one processor, and can be understood as including a software configuration which is operated in the corresponding hardware device according to the embodiment. For example, the computer may be understood as a meaning including all of smart phones, tablet PCs, desk tops, notebooks, and user clients and applications running on each device, but is not limited thereto.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.

Each step described in the present disclosure is described as being performed by a computer, but subjects of each step are not limited thereto, and according to embodiments, at least some of each step can also be performed on different devices.

An electronic device 100 according to the present disclosure may include a camera 110, a processor 120, a display 130, an input unit 140, a microphone 150, a speaker 160, a memory 170, and a communication unit 180.

The camera 110 may photograph a background image. Specifically, the camera 110 may photograph a QR code included in the background image and display the background image and an object by overlapping the background image and the object.

The camera 110 is a component capable of photographing an image as described above. The camera 110 may include a lens, a shutter, a diaphragm, an image sensor, an analog front end (AFE), and a timing generator (TG).

Specifically, a lens (not illustrated) is a component in which light reflected on a subject is incident, and may include at least one of a zoom lens and a focus lens. The shutter (not illustrated) controls the time when light enters an image capturing device 100. The amount of light accumulated in an exposed pixel of an image sensor is determined depending on a shutter speed. The diaphragm (not illustrated) is a component that adjusts the amount of light incident into the image capturing device 100 through the lens. The image sensor (not illustrated) is a component in which an image of a subject passing through a lens is formed.

The image processing unit 170 may process raw image data photographed by the camera 110 to generate YCbCr data. In addition, it is possible to determine an image black level and adjust a sensitivity ratio for each color. In addition, the image processing unit 170 may adjust white balance, and perform gamma correction, color interpolation, color correction, and resolution conversion.

The processor 120 controls a general operation of the electronic device 100.

The processor 120 may include a RAM, a ROM, a main CPU, first to n interfaces, and a bus. At this time, the RAM, the ROM, the main CPU, the first to n interfaces, and the like may be connected to each other via the bus.

The ROM stores a set of instructions and the like for booting a system. When a turn-on command is input and power is supplied, the main CPU copies the O/S stored in the memory 170 to the RAM according to the instruction stored in the ROM, and executes the O/S to boot the system. When the booting is completed, the main CPU copies various application programs stored in the memory 170 to the RAM, and executes the application programs copied to the RAM to perform various operations.

The main CPU accesses the memory 170 to perform the booting using the O/S stored in the memory 170. The main CPU performs various operations using various programs, contents, data, and the like, stored in the memory 170.

The first to nth interfaces are connected to various components described above. One of the interfaces may be a network interface connected to an external server through a network.

The display 130 may be implemented as various types of display panels. For example, the display panel may be implemented by various display technologies such as a liquid crystal display (LCD), an organic light emitting diode (OLED), an active-matrix organic light-emitting diode (AM-OLED), a liquid crystal on silicon (LcoS), digital light processing (DLP), and the like. In addition, the display 130 may also be coupled to at least one of a front area, a side area, and a back area of the electronic device 100 in the form of a flexible display.

The input unit 140 may receive user commands. The input unit 140 may be implemented in various forms. For example, the input unit 140 may be implemented as a touch screen in which the display 130 and a touch sensing unit (not illustrated) are combined. However, the input unit 140 is not limited thereto, but the input unit 140 may include a button or may be constituted by an external remote control, may be constituted by the microphone 150 for voice input, and may perform motion input by combining with the camera 110.

In particular, the input unit 140 may receive a user command for editing a product image. In this case, the input user command may be a long press command, a double tap command, a swipe command, or the like.

The memory 170 may store an operating system (O/S) for driving the electronic device 100. In addition, the memory 170 may store various software programs or applications for operating the electronic device 100 according to various embodiments of the present disclosure. The memory 170 may store various types of information such as various types of data input, set, or generated during execution of the programs or the applications therein.

In addition, the memory 170 may include various software modules for operating the electronic device 100 according to various embodiments of the present disclosure, and the processor 120 may execute various software modules stored in the memory 170 to perform the operation of the electronic device 100 according to various embodiments of the present disclosure.

In addition, the memory 170 may store a space image photographed by the camera 110 and various images received from the outside. To this end, the memory 170 may include a semiconductor memory such as a flash memory, a magnetic storing medium such as a hard disk, or the like.

The communication unit 180 may perform communication with the external device. The communication unit 180 may include various communication chips such as a WiFi chip, a Bluetooth chip, an NFC chip, and a wireless communication chip. Here, the WiFi chip, the Bluetooth chip, and the NFC chip each perform communication in a LAN scheme, a WiFi scheme, a Bluetooth scheme, and an NFC scheme. In the case of using the Wi-Fi chip or the Bluetooth chip, various connection information such as SSID and a session key is first transmitted and received, the communication is connected using the connection information, and then various information may be transmitted and received. The wireless communication chip means a chip performing communication according to various communication protocols such as IEEE, Zigbee, 3rd generation (3G), 3rd generation partnership project (3GPP), and long term evolution (LTE). In particular, the communication unit 180 may receive various information from an external device (for example, a content server that provides a product image, and the like). For example, the communication unit 180 may receive various indoor images, product information, and product images from an external device, and store the received information in the memory 170.

FIG. 2 is a flowchart for explaining a method for standardizing a volume of an electronic device according to an embodiment of the present disclosure.

In step S110, the processor 120 may receive the user command to standardize the volume of the sound source through the UI displayed on the display 130.

In step S120, the processor 120 may estimate a tension of a person listening to the sound source corresponding to the input user command, and generate a tension function by using the estimated tension.

At this time, the tension indicates a degree of tension of a person (listener) who listens to the sound source, and specifically, means the degree of tension perceived by the listener. This tension is estimated based on the volume of the sound source, the degree of sweat secretion of a listener, and/or a degree of change in a size of a pupil. In addition, the electronic device 100 generates the tension function by expressing the estimated tension according to the passage of time. At this time, it may be seen that the tension function represents the average volume of each time section for the corresponding sound source.

In an embodiment, the measurement of sweat secretion may be obtained from a wearable device in which a biosensor is implemented. In addition, the change in the size of the pupil may be acquired through the front camera included in the electronic device 100. Specifically, when the electronic device 100 is unlocked through iris recognition, the change in the size of the pupil may be measured based on iris data used for the unlock.

When the iris recognition is required to unlock the electronic device 100, the change in the size of the pupil may be measured based on the release. More specifically, the electronic device 100 may generate a machine learning model that estimates the tension function by measuring how the tension of the listener changes when a certain sound (sound source) is heard, and then learning data on the analysis result and operate the generated model. According to this embodiment, when the sound source is input, the tension function is generated and output by a pre-built machine learning model. In other words, when the sound source is input, the tension function may be calculated through a cognitive model (machine learning model) that derives the tension to be perceived by the person listening to the sound source as an output. At this time, the cognitive model is being discussed in the field of the artificial intelligence or may be created by any modeling methodology in use.

However, the estimation of the tension according to the present disclosure may be measured through various methods in addition to the degree of sweat secretion and the change in the size of the pupil described above. For example, the electronic device 100 may include a biosensor, and may estimate the tension by acquiring at least one of heart rate information, oxygen saturation information, and stress information through the biosensor.

According to various embodiments, various information for estimating the tension may be collected by a wearable device that communicates with the electronic device 100 and transmitted to the electronic device 100. According to another embodiment of the present disclosure, the tension function may represent an envelope function of the sound source signal for the sound source. In other words, the envelope of the sound source signal over time reflects the tension of the listener for the sound source. Therefore, the envelope function of the sound source signal for the sound source may be used as the tension function of the present disclosure. In this case, the sound source signal refers to a signal representing a change in voltage over time, and the level of voltage represents the level of volume. Therefore, the envelope of the sound source signal means the envelope of the volume.

When using the envelope function of the sound source signal as the tension function, the electronic device 100 extracts the envelope from the initial signal (initial signal, input sound source signal, or original sound source signal) for the sound source instead of step S120 to generate an envelope function for the corresponding sound source. The electronic device 100 extracts an analytic signal using Hilbert transform or fast Fourier transform (FFT) in order to extract an envelope from the initial signal, and then use this signal or may use the Gamma-Filter Bank. Since such an envelope extraction method is a known technique, a detailed description thereof will be omitted.

In step S130, the processor 120 may generate a standardization function having variance reduced or increased than the tension function while maintaining the same average as the tension function.

In step S140, the processor 120 may generate the output sound source signal having the envelope of the standardization function.

In step S140, the electronic device 100 generates the output sound source signal having the shape of the standardization function calculated in step S130 as an envelope. That is, the envelope of the output sound source signal (final sound source signal or standardized sound source signal) generated in this step has the same form as the standardization function. At this time, the output sound source signal is generated by multiplying the standardization function by the sound source signal (initial signal) for the sound source. The envelope of the output sound source signal thus generated has the same average as the envelope of the initial signal, but has an adjusted variance, thereby generating the output sound source signal in which the volume of each section of the initial signal is uniformly adjusted. Meanwhile, the output sound source signal is calculated through the following equation

[Equation 1]

X_final(t) = X_initial(t)*NF(t)

X_final(t): Output sound source signal

X_initial(t): Initial signal

NF(t): Standardization function

FIG. 5 is a diagram illustrating a graph according to the passage of time of an output sound source signal whose volume is adjusted according to an embodiment of the present disclosure.

Referring to FIG. 5, a first graph from the top of FIG. 5 is a graph illustrating an output sound signal (Xfinal(t)) and an envelope signal (NF3a) of the output sound source signal (Xfinal(t)) over the passage of time when 0 < k < 1 (k = k1), and a second graph is a graph illustrating an output sound signal (Xfinal(t)) and an envelope signal (NF3b) of the output sound source signal (Xfinal(t)) over the passage of time when k > 1 (k = k2). In the graph of this figure, an X axis is time (minutes), and a Y axis is amplitude (angstrom) representing the volume of the sound source.

Meanwhile, as illustrated in FIG. 3A, in step S110, the processor 120 may receive a user command through the AVC UI 210 for standardizing the volume of the sound source. The AVC UI 210 may mean an auto volume control UI, but the function is not limited to the name of the UI.

The AVC UI 210 is a UI for activating volume standardization. When a touch is input to the AVC UI 210 illustrated in FIG. 3A, the processor 120 may operate the volume standardization mode.

Meanwhile, there may be various methods for operating the volume standardization mode. In an embodiment, the processor 120 may detect the motion of the electronic device 100 to activate the volume normalization mode. For example, when a motion of shaking the electronic device 100 a preset number of times (for example, three times) is detected, the processor 120 may activate the volume standardization mode. In this case, the processor 120 may additionally determine the degree of shaking of the electronic device 100 before/after the detected motion of shaking the electronic device 100 three times. If the degree of shaking before/after the motion of shaking the electronic device 100 three times is greater than or equal to a preset value, the processor 120 may determine that the motion is not related to the user's intention, and may not activate the volume standardization mode. That is, in order to prevent the volume normalization mode from being automatically activated when the electronic device 100 continuously shakes in an active environment, the processor 120 may determine the motion (shaking) before and after the detected motion to determine whether to activate the volume standardization mode.

In another embodiment, the processor 120 may activate the volume standardization mode based on a user command input to an audio output device communicatively connected to the electronic device 100. For example, the processor 120 may activate the volume standardization mode based on a motion signal touching the audio output device three times. As another example, the processor 120 may activate the volume standardization mode based on a motion signal dragging the audio output device. In this case, the processor 120 may apply a different volume standardization mode according to the drag direction. That is, when the dragging direction is from top to bottom, the first volume standardization mode may be activated, and when the drag direction is from left to right, the second volume standardization mode may be activated. Furthermore, the processor 120 may deactivate the volume standardization mode when receiving a drag input in a direction opposite to the volume standardization mode activation direction. That is, if the drag direction is from the bottom to the top, the first volume standardization mode may be deactivated, and if the drag direction is from the right to the left side, the second volume standardization mode may be deactivated. The range is not limited to the pattern of the drag mentioned in the present embodiment, and this function may be controlled not only with the mentioned straight pattern but also with a circle, a zigzag, a V-letter, and the like. In another embodiment, it is also possible to adjust the volume normalization mode using two or three or more fingers.

In another embodiment, the processor 120 may change the volume standardization mode according to a user command input to a sound adjustment unit provided in the electronic device 100. For example, the processor 120 may activate the volume standardization mode based on a user input of continuously clicking a volume control button a preset number of times (for example, three times) and a motion signal inputting a volume control button in a preset pattern. As another example, the processor 120 may activate the volume standardization mode by continuously pressing the volume decrease button (or increase button) a preset number of times (for example, three times), or by pressing the volume reduction button by applying a preset rhythm such as a Morse code (long-short-short) to a preset number of times.

As another embodiment, the processor 120 may determine a user motion based on the image acquired through the camera 110 to activate the volume standardization mode. For example, the processor 120 may activate the volume standardization mode if it is determined that the user's finger is on his or her lips based on the step of determining a user's face image, the step of determining the lips among the determined face image, and the step of determining whether the finger is positioned in a preset direction on the determined lips. As another example, the processor 120 may activate the volume standardization mode by analyzing that the user's eyes blink. At this time, in order to distinguish between physiological blinking and blinking for volume standardization, the processor 120 may activate the volume standardization mode only when the user's eyes blink a preset number of times (for example, 4 times) and a preset period (for example, eyes are closed for 0.5 seconds or more).

As another example, the processor 120 may control the volume standardization mode based on a voice command. For example, when receiving an utterance such as "turn on/off a BGM mode (background music mode)", the processor 120 may activate/deactivate the volume standardization mode.

The above-described embodiments provide the methods for activating or deactivating a volume standardization mode (automatic volume control mode) using various sensors built into a device that streams sound signals from the electronic device 100 or adjusting detailed parameters of the corresponding mode, and describe the methods for operating the present function without confusion by setting patterns that are uniquely distinguishable from the operation pattern that users already use in general and can be easily remembered, but the operation pattern or the driving method is not limited to these embodiments. Therefore, by applying new sensors and systems of devices such as smart watches, smart glasses, automobiles, and artificial intelligence speakers as well as smartphones, an operation macro may be variously defined, and it is possible to define an appropriate macro pattern in consideration of systems, UIs, and UXs of specific software (e.g. music streaming apps), and furthermore, a method for setting a user's own macro pattern is also possible.

Meanwhile, when the volume standardization mode is activated, as illustrated in FIG. 3B, an indicator 220 may be displayed on the volume control UI. The indicator 220 serves to adjust the volume of the sound source to correspond to the activated volume standardization mode. Specifically, as illustrated in FIG. 3B, the indicator 220 may increase (221) or decrease (222) the volume of the sound source according to the volume standardization mode. FIG. 3B illustrates that the indicator 200 moves and displays the volume control of the sound source, but it goes without saying that the volume control button moves instead of the indicator 220 to adjust the volume of the sound source.

According to the above-described embodiment of the present disclosure, the present disclosure has a technical feature of not only performing the volume standardization process of the sound source by a method for processing an original sound source to generate a new standardized sound source, but also outputting a sound source by controlling the volume of the original sound source in a music application. In other words, since the present disclosure does not process the original sound source and changes only the volume of the sound source, no modification measures for the original sound source occur, which means that the user act of randomly turning up and down the volume for a certain volume is automated. As a result, the service may be provided without any modification or change of works protected by the copyright law. In other words, there is an effect that the service provider can also avoid legal liability such as violation of copyright law.

Meanwhile, according to various embodiments of the present disclosure, the user may change the target volume by adjusting the indicator 220 displayed on the display 130. That is, when the user command for changing the position of the indicator 220 is input, the processor 120 may standardize the volume based on the volume value corresponding to the changed position of the indicator 220.

In another embodiment, as illustrated in FIG. 3C, in step S110, the processor 120 receives a user command through the first UI 310 for standardizing the volume of the sound source, and when the user command is input through the first UI 310, as illustrated in FIG. 3D, a plurality of mode UIs 311 to 313 supporting a plurality of volume standardization modes for a sound source may be displayed on the display 120.

At this time, the plurality of mode UIs is one of a second UI 311 supporting volume standardization corresponding to a BGM mode, a third UI 312 supporting volume standardization corresponding to a study mode, and a fourth UI 313 supporting volume standardization corresponding to a concentration mode. Each mode may have characteristics depending on the detailed parameter value of the present disclosure, and may be named as a new mode without any limitations according to the user's demand.

When a user command for any one of the second UI to fourth UI 311 to 313 is input, the electronic device 100 may estimate the tension according to the input mode and generate a tension function for the estimated tension. That is, even if the tension for the user who listens to the sound source is estimated to be the same, the electronic device 100 may generate different tension functions according to the volume standardization mode.

In another embodiment, when a user command for any one of the second UI to fourth UI 311 to 313 is input, the electronic device 100 may estimate the tension of the user and generate the same tension function based on the estimated tension. Thereafter, the electronic device 100 may generate different standardization functions according to the input mode. That is, the electronic device 100 may generate the same tension according to the tension of the user who listens to the sound source, but may generate different standardization functions according to the mode. Specifically, the electronic device 100 may generate different standardization functions from the same tension function through the process of adjusting the variance value.

Meanwhile, according to another embodiment of the present disclosure, the processor 120 may determine priorities of each of the second UI to the fourth UI, and may arrange and display a plurality of UIs according to the priorities.

Specifically, when the priority is determined in the order of the third UI 312, the second UI 311, and the fourth UI 313, the processor 120 may display the UI in the order of the third UI 312, the second UI 311, and the fourth UI 313 as illustrated in FIG. 3E.

In this case, the processor 120 may determine the priority based on the volume standardization history of the user who listens to the sound source. For example, the processor 120 may set the priority in the order of the volume standardization mode most used by the user who listens to the sound source.

However, the present disclosure is not limited thereto, and it goes without saying that the processor 120 may set the priority by analyzing the surrounding environment information. For example, the processor 120 may set priorities among a plurality of UIs based on ambient noise information input through the microphone 150. That is, the processor 120 acquires the ambient noise information as a decibel value, and when the decibel value is less than or equal to a preset first value, the priority may be set in the order of the third UI 312, the second UI 311, and the fourth UI 313 and when the decibel value is greater than or equal to the preset second value, the priority may be set in the order of the fourth UI 313, the second UI 311, and the third UI 312.

Alternatively, when the decibel value exceeds a preset first value or less than a preset second value, the priority of the second UI 311 may be set to be the highest. In this case, the priority between the third UI 312 and the fourth UI 313 may be determined based on the volume standardization history of the user who listens to the above-described sound source.

The priority determined according to the various embodiments described above may be used to standardize a volume corresponding to the highest priority UI among the plurality of UIs when the user command is not input through the plurality of UIs.

That is, even when the user listening to the sound source does not input the user command for the volume standardization mode, the processor 120 may automatically perform appropriate volume standardization based on the priority.

Meanwhile, according to another embodiment of the present disclosure, the sound source standardization mode may be set for a plurality of sound sources to be played, but may be set for a specific sound source.

Specifically, the processor 120 may set the same first volume standardization mode for the plurality of sound sources to be played. That is, when the user command for selecting the sound source standardization mode is input, the processor 120 may set the same first volume standardization mode for the entire list including the sound sources.

However, for a specific sound source or a sound source of a specific genre, there may be cases where the user wants to set the volume standardization mode differently. Accordingly, the processor 120 may set a second volume standardization mode corresponding to a single sound source for at least some of the plurality of sound sources to be played. That is, when the user selects a specific song and sets the second volume standardization mode, the processor 120 may apply the second volume standardization mode to the specific song.

In conclusion, while the electronic device 100 is playing a sound source corresponding to the first volume standardization mode, when the playback of a single sound source to which the second volume standardization mode is applied or a sound source of a specific genre starts, the processor 120 may apply the second volume standardization mode to the single sound source.

Meanwhile, as illustrated in FIGS. 3F to 3I, when the user command is input based on the UI for changing the uniformity of the volume of the sound source, the processor 120 may uniformize the volume of the sound source corresponding to the user command.

Specifically, as illustrated in FIG. 3F, the display 130 may display a fifth UI 314 together with the second UI 311 to the fourth UI 313. When a user command is input through the fifth UI 314, the processor 120 may display a setting window for making the volume uniform as illustrated in FIG. 3G or 3H.

Specifically, when a user command for moving a sixth UI 410 illustrated in FIG. 3G or a seventh UI 420 illustrated in FIG. 3H is input, the processor 120 may perform the uniformity of the volume to correspond to the moved position of the sixth UI 410 or the seventh UI 420. The embodiment of the present disclosure describes that the volume is uniformly corrected as the sixth UI 410 or the seventh UI 420 moves to the right, but the moving direction of the UI for uniformity may be applied in all of up, down, left, and right directions. Even if the sixth UI 410 and the seventh UI 420 are not used, the uniformity of the volume may be controlled by a method for dragging the sound signal and the envelope directly up and down or a method for pinching in/out a gap between high and low points of the sound signal and the envelope using two fingers.

On the other hand, when the degree of uniformity of the volume is set according to the movement of the sixth UI 410 illustrated in FIG. 3G, the processor 120 may standardize the volume within a range 411 corresponding to the degree of uniformity of the volume, as illustrated on the left side of FIG. 3I, and when the degree of uniformity of the volume is set according to the movement of the seventh UI 410 illustrated in FIG. 3G, the processor may standardize the volume within a range 412 corresponding to the degree of uniformity of the volume as illustrated on the right side of FIG. 3I. As illustrated in FIG. 3I, the greater the degree of uniformity of the volume, the greater the degree of change.

Meanwhile, according to the movement of the sixth UI 410 or the seventh UI 420, the processor 120 may perform the process of making the volume uniform by changing a variance value (Var[NF2(t)]) of a second-order standardization function to be described later. In this case, as the variance value approaches 0, the degree of uniformity of the volume increases, and when the variance value is greater than 1, the degree of change of the volume becomes larger than that of the original sound source signal. Therefore, preferably, the variance value is preferably between 0 and 1, but it is possible to reinterpret and output the original sound source signal by applying a variance value greater than 1 according to the type of the volume standardization mode.

Meanwhile, the above-described step S130 includes generating a first-order standardization function by adjusting the average of the tension function to 0 (S131), generating a second-order standardization function by multiplying a constant greater than 0 and less than 1 or a constant greater than 1 by the first-order standardization function (S132), generating a third-order standardization function by adding the average of the tension function to the second-order standardization function, and/or generating the standardization function by dividing the third-order standardization function by the tension function (S134). That is, the electronic device 100 generates a standardization function having variance reduced or increased than the tension function while maintaining the same average as the tension function estimated in step S120 through steps S131 to S134 described above.

In step (S131), the electronic device 100 generates a first-order standardization function as a new function by adjusting the average of the tension function (which may be an envelope function according to the embodiment) to 0. In this case, the first-order standardization function may be calculated through the following equation.

[Equation 2]

NF1(t) = Env(t) - E[Env(t)]

NF1(t): First-order standardization function

Env(t): Tension function or envelope function

E[Env(t)]: Average of tension function

In step S132, the electronic device 100 generates a second-order standardization function by multiplying a constant greater than 0 and less than 1 or a constant greater than 1 by the first-order standardization function calculated in step S131. This is to make the variance of the first-order standardization function small or large. In other words, the electronic device 100 generates a second-order standardization function by multiplying any constant between 0 and 1 when trying to narrow the variance of the first-order standardization function, or multiplying any constant between 1 and infinity when trying to widen the variance of the first-order standardization function. In this case, the second-order standardization function may be calculated through the following equation.

[Equation 3]

NF2(t) = k*NF1(t)

NF2(t): Second-order standardization function

k: Any constant (0 < k < 1 or k > 1)

NF1(t): First-order standardization function

In step S133, the electronic device 100 generates the third-order standardization function by adding the average of the tension function to the second-order standardization function calculated in step S132. In this case, the third-order standardization function may be calculated through the following equation.

[Equation 4]

NF3(t) = NF2(t) + E[Env(t)]

NF3(t): Standardization function

NF2(t): Second-order standardization function

E[Env(t)]: Average of tension function

In step S134, the electronic device 100 finally generates the standardization function by dividing the third-order standardization function calculated in step S133 by the tension function. The standardization function generated in this step is the same as the volume envelope (or tension function) of the final sound source signal output after the standardization process suggested in the present disclosure. In this case, the standardization function may be calculated through the following equation.

[Equation 5]

NF(t) = NF3(t)/Env(t)

NF(t): Standardization function

NF3(t): Third-order standardization function

Env(t): Tension function

Referring to FIG. 4, a first graph from the top of FIG. 4 is a graph illustrating an initial signal and an envelope signal (envelope function signal, envelope) of the initial signal according to the passage of time, a second graph is a graph illustrating an envelope signal (env(t)) of the initial signal and a signal (NF1) of the first-order standardization function according to the passage of time, a third graph is a graph illustrating a second-order standardization function (NF2a) when 0 < k < 1 (k = k1) and a second-order standardization function (NF2b) when k > 1 (k = k2) according to the passage of time, a fourth graph is a graph illustrating a third-order standardization function (NF3a) when 0 < k < 1 (k = k1) and a third-order standardization function (NF3b) when k > 1 (k = k2) according to the passage of time, and a fifth graph is a graph illustrating the standardization function (NFa) when 0 < k < 1 (k = k1) according to the passage of time. In the graph of this figure, an X axis is time (minutes), and a Y axis is amplitude (angstrom) representing the volume of the sound source.

According to another embodiment of the present disclosure, the standardization function generated in step S130 may be generated by dividing the average of the tension function (which may be an envelope function according to an embodiment) by the tension function. According to the method of the present embodiment, since the variance of the standardization function becomes 0 with the advantage of being very simple in calculation, there is a limitation that the variance of the standardization function may not be adjusted to a value other than 0 by the method. According to the present embodiment, the standardization function may be calculated through the following equation.

[Equation 6]

NF(t) = E[Env(t)]/Env(t)

NF(t): Standardization function

E[Env(t)]: Average of tension function

Env(t): Tension function or envelope function

On the other hand, according to another embodiment of the present disclosure, when the sound source signal for the sound source is received by an online streaming method, the processor 120 may cause the average and variance of the tension function to be generated in the step of generating the tension function, prior to the above-described step S120, and generate the tension function and the standardization function whenever a sound source signal of a certain length is received to generate the output sound source signal.

Specifically, according to another embodiment of the present disclosure, the sound source signal for the sound source may be received by the online streaming method. In this case, the volume standardization method further includes a step of pre-setting the an average of the tension function to be generated in step S120 before step S120, and afterwards, the electronic device 100 may perform steps S120 to S140 whenever a sound source signal of a certain length is received to correct the volume of the sound source in real time. That is, the electronic device 100 may preset the average of the targeted tension function to fix the volume to be finally adjusted, and then analyze the initial signal received in real time to correct the volume. As a result, the electronic device 100 may correct the volume in real time even in a situation in which all data of the sound source signal is not known until the end of the sound source.

The method for standardizing a volume may include (A) estimating a tension of a person listening to a sound source and generating a tension function using the tension (S100), (B) generating a standardization function having variance reduced or increased than the tension function while maintaining the same average as the tension function (S200) and/or (C) generating an output sound source signal having an envelope in the form of the standardization function (S300).

Although exemplary embodiments of the present disclosure have been described with reference to the accompanying drawings, those skilled in the art to which the present disclosure belongs will appreciate that various modifications and alterations may be made without departing from the spirit or essential feature of the present disclosure. Therefore, it is to be understood that exemplary embodiments described hereinabove are illustrative rather than being restrictive in all aspects.

Claims

A method for standardizing a volume of a sound source performed by an electronic device, comprising:

receiving a user command to standardize the volume of the sound source;

estimating a tension of a person who listens to the sound source corresponding to the user command and generating a tension function using the tension;

generating a standardization function having variance reduced or increased than the tension function while maintaining the same average as the tension function; and

generating an output sound source signal having an envelope in a form of the standardization function.
The method of claim 1, wherein the user command is a user command for standardizing the volume of the sound source through a UI displayed on a display,

the receiving of the user command includes:

receiving a user command through a first UI 310 to standardize the volume of the sound source; and

displaying, on the display, a plurality of mode UIs supporting a plurality of volume standardization modes for the sound source when the user command is input through the first UI 310, and

the plurality of mode UIs is one of a second UI supporting volume standardization corresponding to a BGM mode, a third UI supporting volume standardization corresponding to a study mode, and a fourth UI supporting volume standardization corresponding to a concentration mode.
The method of claim 2, wherein the generating of the tension function includes estimating a tension corresponding to the mode of the UI, and generating the tension function using the tension, when the user command is input to any one of the plurality of mode UIs.
The method of claim 2, wherein the displaying of the plurality of mode UIs includes:

determining priorities of each of the second UI to the fourth UI; and

arranging and displaying the plurality of UIs according to the priority, and

the determining of the priorities is determined based on a volume standardization history of the user who listens to the sound source, and

the receiving of the user command includes:

performing a volume standardization corresponding to the highest priority UI among the plurality of UIs, when the user command is not input through the plurality of UIs.
The method of claim 2, further comprising:

setting the same first volume standardization mode as the plurality of sound sources to be played;

setting a second volume standardization mode corresponding to a single sound source for at least some of the plurality of sound sources to be played;

setting a second volume standardization mode corresponding to at least some of the plurality of sound sources to be played or a sound source of a specific genre; and

applying the second volume standardization mode to a single sound source while the electronic device is playing the sound source corresponding to the first volume standardization mode or applying the single source to a second volume standardization mode when the sound source of the specific genre starts playing.
The method of claim 1, wherein the generating the standardization function includes:

generating a first-order standardization function by adjusting the average of the tension function to 0;

generating a second-order standardization function by multiplying the first-order standardization function by a constant greater than 0 and less than 1 or a constant greater than 1;

generating a third-order standardization function by adding the average of the tension function to the second-order standardization function; and

dividing the third-order standardization function by the tension function and generating the standardization function, and

in the generating the second-order standardization function,

the first-order standardization function is multiplied by a constant greater than 0 and less than 1 in order to reduce the variance value of the first-order standardization function, and the first-order standardization function is multiplied by a constant greater than 1 in order to increase the variance value of the first-order standardization function.
The method of claim 1, wherein the tension function is an envelope function of the sound source signal for the sound source,

the standardization function is generated by dividing the average of the tension function by the tension function,

the output sound source signal is generated by multiplying the standardization function by the sound source signal for the sound source, and

the tension is estimated based on a degree of sweat secretion or a degree of change in a size of a pupil of a person listening to the sound source, and a correlation between the estimated tension and the degree of sweat secretion or the degree of change in the size of the pupil of the person listening to the sound source is learned through artificial intelligence and is used for the tension estimation.
The method of claim 1, further comprising:

setting the average and variance of the tension function to be generated in the generating of the tension function prior to the generating of the tension function, when the sound source signal for the sound source is received through an online streaming method,

wherein whenever a sound source signal of a certain length is received, the generating of the tension function, the generating of the standardization function, and the generating of the output sound source signal are sequentially performed to correct the volume of the sound source in real time.
The method of claim 1, wherein the user command is at least one of a motion command to standardize the volume of the sound source, a voice command to standardize the volume of the sound source, a physical button input command to standardize the volume of the sound source, and a macro command of a volume control button to standardize the volume of the sound source.
A method for standardizing a volume of a sound source performed by an electronic device, comprising:

receiving a user command through a first UI 310 to standardize the volume of the sound source; and

displaying, on the display, a plurality of mode UIs supporting a plurality of volume standardization modes for the sound source when the user command is input through the first UI 310, and

performing volume standardization corresponding to the input user command when the user command is input through the plurality of UIs,

wherein the plurality of mode UIs is one of a second UI supporting volume standardization corresponding to a BGM mode, a third UI supporting volume standardization corresponding to a study mode, and a fourth UI supporting volume standardization corresponding to a concentration mode.
A device comprising:

a memory that stores one or more instructions; and

a processor that executes the one or more instructions stored in the memory,

wherein the processor executes the one or more instructions to perform the method of claim 1.