KR101748039B1 - Sampling rate conversion method and system for efficient voice call - Google Patents

Info

Publication number
KR101748039B1
Authority
KR
South Korea
Prior art keywords
electronic device
aliasing
voice
filter
selecting
Prior art date
Application number
KR1020150154083A
Other languages
Korean (ko)
Other versions
KR20170052090A (en)
Inventor
이동원
Original Assignee
라인 가부시키가이샤
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 라인 가부시키가이샤
Priority to KR1020150154083A
Publication of KR20170052090A
Application granted
Publication of KR101748039B1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

A sampling rate conversion method and system for efficient voice calls are disclosed. A computer program stored on a medium may be provided to perform the sampling rate conversion method in combination with a computer implementing an electronic device. Here, the sampling rate conversion method may include: transmitting and receiving data packets through a network in the electronic device to conduct a voice call; analyzing, in the electronic device, the frequency characteristics of a voice signal input during the voice call for each of a plurality of bands and calculating the energy of each band; selecting, in the electronic device, one of a plurality of anti-aliasing filters based on the energy of each of the plurality of bands; and processing down-sampling of the input voice signal using the selected anti-aliasing filter.

Description

The following description relates to a sampling rate conversion method and system for efficient voice communication.

A sampling rate converter (SRC) is used to change the sampling frequency of a digital signal. For example, Korean Patent Laid-Open No. 10-2008-0098530 discloses a digital domain sampling rate converter. In a conventional SRC, the anti-aliasing filter and the anti-imaging filter have a narrowband (NB) passband of 300 to 3400 Hz, so sound quality does not deteriorate even with a small amount of computation. Because the SRC always runs as a module at the end of a communication terminal's voice path, it is preferable that it keep its computation low so as not to burden the system. However, recent voice call services are provided in the wideband (WB) range of 70 to 7000 Hz. Because wideband uses a higher frequency band and a higher sampling rate, a filter with a large amount of computation is required to avoid degrading sound quality.

Conventionally, reducing the amount of computation has been prioritized, so sound quality degradation occurs due to an unsuitable choice of SRC filters (anti-aliasing filters and anti-imaging filters) in a narrowband voice call service. In addition, among the voice processing stages used in Voice over Internet Protocol (VoIP), the SRC filters are the dominant source of sound quality damage.

For example, the anti-aliasing filter of the transmission side (Tx) SRC changes the frequency content of an input voice signal, and is therefore the point at which sound quality deterioration first occurs during a call. Generally, N kinds of anti-aliasing filters are available, differing in amount of computation and performance. For example, the N anti-aliasing filters may range from a filter covering a band of 200 to 5000 Hz (hereinafter, the first filter) to a filter covering a band of 70 to 7000 Hz (hereinafter, the second filter), and a plurality of them may be implemented in a communication terminal. More specifically, the first filter requires only about 10% of the computation of the second filter, but its sound quality is distinguishably lower (for example, by a Mean Opinion Score (MOS) of 0.3 or more). The second filter, in turn, increases the system load and the battery consumption of the communication terminal because of its large amount of computation. In particular, in the prior art, the same SRC filters (anti-aliasing filter and anti-imaging filter) remain in use until the end of the call, so the sound quality degradation, the increased system load, or the increased battery consumption persists throughout the call.

References: PCT/KR/2014/010167, US20140019540A1, US20130332543A1, US20130260893

Provided are a sampling rate conversion method and system that control the anti-aliasing filter of the transmission side (Tx) sampling rate converter (SRC) and the anti-imaging filter of the reception side (Rx) SRC so that each operates efficiently.

Provided is a computer program stored in a medium for executing a sampling rate conversion method in combination with a computer embodying an electronic device, the sampling rate conversion method comprising: transmitting and receiving data packets through a network in the electronic device to conduct a voice call; analyzing, in the electronic device, a frequency characteristic of a voice signal input during the voice call for each of a plurality of bands and calculating energy for each of the plurality of bands; selecting, in the electronic device, one of a plurality of anti-aliasing filters based on the energies of the plurality of bands; and processing down-sampling of the input voice signal using the selected anti-aliasing filter.

Provided is a sampling rate conversion method of an electronic device, the method comprising: transmitting and receiving data packets through a network in the electronic device to conduct a voice call; analyzing, in the electronic device, a frequency characteristic of a voice signal input during the voice call for each of a plurality of bands and calculating energy for each of the plurality of bands; selecting, in the electronic device, one of a plurality of anti-aliasing filters based on the energies of the plurality of bands; and processing down-sampling of the input voice signal using the selected anti-aliasing filter.

Provided is a sampling rate conversion method of an electronic device, the method comprising: transmitting and receiving data packets through a network in the electronic device to conduct a voice call; and selecting one of a plurality of anti-aliasing filters at predetermined time intervals during the voice call using the frequency characteristics of the voice signal input during the voice call.

The anti-aliasing filter of the transmission side (Tx) sampling rate converter (SRC) and the anti-imaging filter of the reception side (Rx) SRC can each be controlled to operate efficiently.

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an internal configuration of an electronic device and a server according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of components that a processor of an electronic device according to an embodiment of the present invention may include.
FIG. 4 is a flowchart illustrating an example of a sampling rate conversion method that can be performed by an electronic device according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an example of components that a processor of an electronic device according to an embodiment of the present invention may further include.
FIG. 6 is a flowchart illustrating an example of steps that the sampling rate conversion method performed by an electronic device according to an embodiment of the present invention may further include.
FIG. 7 is a block diagram of logical components that an electronic device according to an embodiment of the present invention may include.
FIG. 8 is a graph showing an example of accumulated energy values for each band in an embodiment of the present invention.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention. The network environment of FIG. 1 includes a plurality of electronic devices 110, 120, 130, 140, a plurality of servers 150, 160, and a network 170. FIG. 1 is only an example for explaining the invention, and the number of electronic devices and the number of servers are not limited to those shown in FIG. 1.

Each of the plurality of electronic devices 110, 120, 130, 140 may be a fixed terminal or a mobile terminal implemented as a computer device. Examples of the plurality of electronic devices 110, 120, 130, 140 include a smartphone, a mobile phone, a navigation device, a computer, a notebook, a digital broadcast terminal, a personal digital assistant (PDA), and a tablet PC. For example, the electronic device 1 (110) may communicate with the other electronic devices 120, 130, 140 and/or the servers 150, 160 via the network 170 using a wireless or wired communication scheme.

The communication method is not limited, and may include short-range wireless communication between devices as well as communication over a communication network (for example, a mobile communication network, the wired Internet, the wireless Internet, or a broadcasting network) that the network 170 may include. For example, the network 170 may include any one or more of a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), and the Internet. The network 170 may also include any one or more network topologies, including a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network, but is not limited thereto.

Each of the servers 150 and 160 may be implemented as a computer device or a plurality of computer devices that communicate with the plurality of electronic devices 110, 120, 130, 140 through the network 170 to provide commands, codes, files, and the like.

In one example, the server 160 may provide a file for installing an application to the electronic device 1 (110) connected via the network 170. In this case, the electronic device 1 (110) can install the application using the file provided from the server 160. The electronic device 1 (110) may also connect to the server 150 under the control of its operating system (OS) and at least one program (for example, a browser or the installed application) and receive a service or content provided by the server 150. For example, when the electronic device 1 (110) transmits a service request message to the server 150 via the network 170 under the control of the application, the server 150 may transmit a code corresponding to the service request message to the electronic device 1 (110), and the electronic device 1 (110) may provide content to the user by composing and displaying a screen according to the code under the control of the application. As another example, the server 150 may establish a communication session for a messaging service and route message transmission and reception between the plurality of electronic devices 110, 120, 130, 140 through the established communication session.

FIG. 2 is a block diagram illustrating an internal configuration of an electronic device and a server according to an embodiment of the present invention. FIG. 2 describes the internal configuration of the electronic device 1 (110) as an example of one electronic device and of the server 150 as an example of one server. The other electronic devices 120, 130, 140 and the server 160 may have the same or similar internal configurations.

The electronic device 1 (110) and the server 150 may include memories 211 and 221, processors 212 and 222, communication modules 213 and 223, and input/output interfaces 214 and 224. The memories 211 and 221 are computer-readable recording media and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive. The memories 211 and 221 may store an operating system and at least one program code (for example, code for a browser installed in the electronic device 1 (110) or for the above-described application). These software components may be loaded from a computer-readable recording medium separate from the memories 211 and 221 using a drive mechanism. Such a separate computer-readable recording medium may include a floppy drive, a disk, a tape, a DVD/CD-ROM drive, or a memory card.

Processors 212 and 222 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input / output operations. The instructions may be provided by the memory 211, 221 to the processors 212, 222. For example, the processor 212, 222 may be configured to execute a command received in accordance with a program code stored in a recording device, such as the memory 211, 221.

The communication modules 213 and 223 may provide functions for the electronic device 1 (110) and the server 150 to communicate with each other through the network 170, and may provide functions for communicating with another electronic device (for example, the electronic device 2 (120)) or another server (for example, the server 160). For example, a request (for example, a streaming service request for content) generated by the processor 212 of the electronic device 1 (110) in accordance with program code stored in a recording device such as the memory 211 may be transmitted to the server 150 via the network 170 under the control of the communication module 213. Conversely, control signals, commands, contents, files, and the like provided under the control of the processor 222 of the server 150 may pass through the communication module 223 and the network 170 and be received by the electronic device 1 (110) through its communication module 213. For example, control signals and commands of the server 150 received through the communication module 213 may be transmitted to the processor 212 or the memory 211, and contents or files may be stored in a storage medium that the electronic device 1 (110) may further include.

The input/output interfaces 214 and 224 may be means for interfacing with the input/output device 215. For example, the input device may include a device such as a keyboard or a mouse, and the output device may include a device such as a display for displaying a communication session of the application. As another example, the input/output interface 214 may be a means for interfacing with a device that integrates input and output functions, such as a touch screen. More specifically, when processing commands of the computer program loaded in the memory 211, the processor 212 of the electronic device 1 (110) may display a service screen or content, composed using data provided by the server 150 or the electronic device 2 (120), on the display through the input/output interface 214.

Also, in other embodiments, the electronic device 1 (110) and the server 150 may include more components than those shown in FIG. 2. However, most prior art components need not be illustrated explicitly. For example, the electronic device 1 (110) may be implemented to include at least some of the input/output devices 215 described above, or may further include other components such as a transceiver, a Global Positioning System (GPS) module, and a camera.

In this embodiment, the server 150 may be a system for providing a VoIP service to communication terminals connected through a network. In this case, the electronic devices 110 and 120 may be communication terminals that receive the VoIP service from the server 150 via the network. For example, the server 150 may establish a call session for the electronic device 1 (110) and the electronic device 2 (120). The electronic device 1 (110) and the electronic device 2 (120) can then transmit and receive data packets through the established call session so that the voice call can proceed. In VoIP communication, each data packet has a transmitting side and a receiving side, and a single terminal system may act as both the transmitting side (Tx) and the receiving side (Rx). For example, the electronic device 1 (110) may include both components for the transmitting side and components for the receiving side.

FIG. 3 is a diagram illustrating an example of components that a processor of an electronic device according to an embodiment of the present invention can include, and FIG. 4 is a flowchart illustrating an example of a sampling rate conversion method that can be performed by the electronic device.

As shown in FIG. 3, the processor 212 of the electronic device 1 (110) may include, as components, a voice call progress control unit 310, an energy calculation unit 320, an anti-aliasing filter selection unit 330, and a down-sampling processing unit 340. The processor 212 and its components may control the electronic device 1 (110) to perform steps 410 through 450 of the sampling rate conversion method of FIG. 4. The processor 212 and its components may be implemented to execute instructions according to the code of the operating system and the code of at least one program contained in the memory 211. Here, the components of the processor 212 may be representations of different functions performed by the processor 212. For example, the voice call progress control unit 310 may be used as a functional representation of the processor 212 operating to control the progress of a voice call in accordance with such instructions.

In step 410, the processor 212 may load the program code stored in the file of the application for the sampling rate conversion method into the memory 211. For example, the application may be a program installed in the electronic device 1 (110) to provide VoIP communication. The file of this application may be downloaded to the electronic device 1 (110) from a separate file distribution server (for example, the server 160) connected to the electronic device 1 (110) via the network, and used to install the application on the electronic device 1 (110). When the application installed in the electronic device 1 (110) is executed, the processor 212 can load the program code from the file of the application into the memory 211.

At this time, the processor 212 and the voice call progress control unit 310, the energy calculation unit 320, the anti-aliasing filter selection unit 330, and the down-sampling processing unit 340 included in the processor 212 may each execute the corresponding part of the loaded program code to perform the subsequent steps 420 through 450. To execute steps 420 through 450, the processor 212 and its components may control the electronic device 1 (110). For example, the processor 212 may control the communication module 213 included in the electronic device 1 (110) so that the electronic device 1 (110) can communicate with at least one of the server 150 and other electronic devices (for example, the electronic device 2 (120)). As another example, the processor 212 may control the electronic device 1 (110) to store the program code in the memory 211 when loading the program code into the memory 211.

In step 420, the voice call progress control unit 310 may control the electronic device 1 (110) to transmit and receive data packets through the network to conduct a voice call. For example, under the control of the voice call progress control unit 310, the electronic device 1 (110) may communicate with the server 150 and conduct a voice call with the electronic device 2 (120) through a call session set up by the server 150. Voice call technologies using VoIP communication are already well known, so a detailed description is omitted.

In step 430, the energy calculation unit 320 may analyze the frequency characteristics of the voice signal input during the voice call for each of a plurality of bands and calculate the energy of each band. The frequency distribution of the human voice varies with characteristics such as sex, vocal cord length, and the anatomical structure of the vocal tract. For example, spectral analysis of voice shows that male voices are distributed mainly in the low band (4 kHz or less), whereas female voices are distributed in the high band as well as the low band. The spectrum is also distributed differently depending on the phonetic characteristics of the language. The energy calculation unit 320 may divide the frequency characteristics of the input voice signal into a plurality of bands (for example, M bands, where M is a natural number of 2 or more) and calculate the energy of each band.
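
The per-band energy calculation of step 430 can be illustrated with a minimal sketch. The patent text does not specify the transform, the frame length, or the band edges, so the naive DFT, the 10 ms frame, the function names `band_energies` and `accumulate`, and the four 2 kHz-wide bands below are all assumptions made for illustration only.

```python
import cmath
import math

def band_energies(frame, sample_rate, band_edges_hz):
    """Return the spectral energy of `frame` in each band [edge[b], edge[b+1])."""
    n = len(frame)
    energies = [0.0] * (len(band_edges_hz) - 1)
    # Naive DFT over the non-negative frequency bins (O(n^2), fine for one short frame).
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        power = abs(coeff) ** 2
        for b in range(len(energies)):
            if band_edges_hz[b] <= freq < band_edges_hz[b + 1]:
                energies[b] += power
                break
    return energies

def accumulate(totals, energies):
    """Add one frame's band energies to the running per-band totals."""
    return [t + e for t, e in zip(totals, energies)]

# Hypothetical setup: 16 kHz signal, 10 ms frames, M = 4 bands.
SAMPLE_RATE = 16000
EDGES = [0, 2000, 4000, 6000, 8000]
frame = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(160)]

energies = band_energies(frame, SAMPLE_RATE, EDGES)
totals = accumulate([0.0] * 4, energies)
dominant_band = totals.index(max(totals))  # index of the band with the most energy
```

A production implementation would normally use an FFT or a filter bank instead of the quadratic-time DFT shown here; the sketch only shows how a frame's spectrum is split into M bands and accumulated.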

In step 440, the anti-aliasing filter selection unit 330 may select one of a plurality of anti-aliasing filters based on the energies of the plurality of bands. The plurality of anti-aliasing filters may be software functions of the above-described application. For example, the program code of the application may include code for providing the function of each of the plurality of anti-aliasing filters to the electronic device 1 (110). Each of the plurality of anti-aliasing filters may take the form of a module.

More specifically, the anti-aliasing filter selection unit 330 can select the anti-aliasing filter based on the band whose accumulated energy value is the largest. For example, in step 430 the energy calculation unit 320 may divide the spectrum of the voice signal into a plurality of bands and extract and accumulate the energy of each band in every frame. A large accumulated energy value means that the input voice signal has relatively more energy in the corresponding band. Therefore, the anti-aliasing filter selection unit 330 can select, from the plurality of anti-aliasing filters, one that can process the band with the largest accumulated energy value without degrading sound quality. More specifically, when a voice signal with a large energy distribution in a relatively high frequency band is input, the anti-aliasing filter selection unit 330 can select a high-performance anti-aliasing filter covering a band of 70 to 7000 Hz; when a voice signal with a large energy distribution in a relatively low frequency band is input, it can select a low-performance anti-aliasing filter covering a band of 200 to 5000 Hz.

At this time, there may be more than one anti-aliasing filter capable of processing the band with the largest accumulated energy value without degrading sound quality. Here, being able to process that band without degrading sound quality can mean that the filters guarantee similar performance (prevention of sound quality degradation). Therefore, the anti-aliasing filter selection unit 330 can minimize the amount of computation while ensuring performance by selecting, among the anti-aliasing filters of similar performance, the one with the smallest amount of computation.
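
The selection rule of step 440 (find the band with the largest accumulated energy, keep the filters that cover it, then take the cheapest) can be sketched as follows. The passbands of the two filters and the roughly 10% relative cost come from the description above; the class name `AntiAliasingFilter`, the band edges, and the fallback to the widest filter are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AntiAliasingFilter:
    name: str
    low_hz: float   # lower edge of the band the filter covers
    high_hz: float  # upper edge of the band the filter covers
    cost: float     # relative amount of computation (assumed scale)

FILTERS = [
    AntiAliasingFilter("first_filter", 200.0, 5000.0, 0.1),   # about 10% of the second filter
    AntiAliasingFilter("second_filter", 70.0, 7000.0, 1.0),
]

def select_filter(accumulated, band_edges_hz, filters=FILTERS):
    """Pick the cheapest filter whose passband covers the most energetic band."""
    dominant = max(range(len(accumulated)), key=accumulated.__getitem__)
    lo, hi = band_edges_hz[dominant], band_edges_hz[dominant + 1]
    candidates = [f for f in filters if f.low_hz <= lo and hi <= f.high_hz]
    if not candidates:
        # Assumed fallback: no filter covers the band, so use the widest passband.
        return max(filters, key=lambda f: f.high_hz - f.low_hz)
    return min(candidates, key=lambda f: f.cost)

# Hypothetical band layout: three bands between 200 Hz and 7 kHz.
BAND_EDGES = [200.0, 2000.0, 5000.0, 7000.0]

low_voice = select_filter([5.0, 1.0, 0.5], BAND_EDGES)
high_voice = select_filter([1.0, 1.0, 9.0], BAND_EDGES)
```

With these assumed numbers, a signal dominated by the lowest band is covered by both filters, so the cheaper first filter is chosen, while a signal dominated by the 5 to 7 kHz band is covered only by the second filter.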

In step 450, the down-sampling processing unit 340 may process down-sampling of the input voice signal using the selected anti-aliasing filter. For example, the down-sampling processing unit 340 may down-sample a standard sampling rate of 48 kHz to a sampling rate of 16 kHz. Down-sampling can cause aliasing, and to prevent it the signal is filtered with an anti-aliasing filter, such as a low-pass filter, before decimation. In this case, the down-sampling processing unit 340 can use the anti-aliasing filter selected in real time by the anti-aliasing filter selection unit 330. Here, selecting the anti-aliasing filter in real time may mean that the filter is selected based on the frequency characteristics of the voice signal input during the voice call. For example, by combining steps 430 and 440, the anti-aliasing filter selection unit 330 may select one of the plurality of anti-aliasing filters at predetermined time intervals during the voice call using the frequency characteristics of the voice signal input during the voice call. In this case, the down-sampling processing unit 340 may switch the anti-aliasing filter used for down-sampling the input voice signal to the filter selected for each time interval.
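
As an illustration of step 450, the sketch below low-pass filters a 48 kHz signal and keeps every third sample to reach 16 kHz. The filter design is not given in the text, so the Hamming-windowed sinc FIR, the 63-tap length, the 7 kHz cutoff, and the names `lowpass_taps` and `downsample` are assumptions; a real implementation would typically use a polyphase structure.

```python
import math

def lowpass_taps(num_taps, cutoff_hz, sample_rate):
    """Hamming-windowed sinc low-pass FIR taps, normalized to unity DC gain."""
    fc = cutoff_hz / sample_rate          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]

def downsample(signal, taps, factor):
    """Apply the anti-aliasing FIR and keep only every `factor`-th output sample."""
    out = []
    for start in range(0, len(signal), factor):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = start - j
            if idx < 0:
                break  # samples before the start of the signal are treated as zero
            acc += h * signal[idx]
        out.append(acc)
    return out

IN_RATE, FACTOR = 48000, 3               # 48 kHz -> 16 kHz
TAPS = lowpass_taps(63, 7000.0, IN_RATE)

# A 1 kHz tone should survive; a 20 kHz tone would alias to 4 kHz if not removed.
low_tone = [math.sin(2 * math.pi * 1000 * i / IN_RATE) for i in range(480)]
high_tone = [math.sin(2 * math.pi * 20000 * i / IN_RATE) for i in range(480)]

kept = downsample(low_tone, TAPS, FACTOR)
rejected = downsample(high_tone, TAPS, FACTOR)
```

In this sketch the number of taps stands in for the "amount of computation": a shorter filter is cheaper but has a wider transition band, which is the trade-off between the low-performance and high-performance filters described above.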

As described above, according to this embodiment, sound quality degradation can be prevented by down-sampling a voice signal with a large energy distribution in the high band using a high-performance anti-aliasing filter, and the amount of computation for sampling rate conversion can be reduced by down-sampling a voice signal with a large energy distribution in the low band using a low-performance anti-aliasing filter.

At this time, the anti-aliasing filter selected for processing the down-sampling of the input voice signal may be reselected by the anti-aliasing filter selector 330 at predetermined time intervals during the voice call. In this case, since an anti-aliasing filter suitable for a voice signal can be variably selected in real time, it becomes possible to select an optimum anti-aliasing filter for both performance (prevention of sound quality deterioration) and calculation amount. As a result, it is possible to obtain an effect of reducing the amount of calculation while ensuring sound quality of a voice call.
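
The periodic reselection described above can be sketched as a loop over per-frame band energies. The interval length, the choice to reset the accumulated energies at each interval boundary, and the choice to start with the higher-quality filter until the first interval has been measured are assumptions; the text only states that the filter is reselected at predetermined time intervals.

```python
def filter_schedule(frame_energies, frames_per_interval, high_band_index,
                    low_filter="first_filter", high_filter="second_filter"):
    """Return the filter name used for each frame, reselecting once per interval."""
    schedule = []
    totals = None
    current = high_filter  # assumed conservative default before any measurement
    for i, energies in enumerate(frame_energies):
        schedule.append(current)
        if totals is None:
            totals = list(energies)
        else:
            totals = [t + e for t, e in zip(totals, energies)]
        if (i + 1) % frames_per_interval == 0:
            dominant = max(range(len(totals)), key=totals.__getitem__)
            current = high_filter if dominant >= high_band_index else low_filter
            totals = None  # assumed: start a fresh accumulation for the next interval
    return schedule

# Hypothetical input: two bands, three low-dominated frames then three high-dominated frames.
frames = [[5.0, 1.0]] * 3 + [[1.0, 5.0]] * 3
schedule = filter_schedule(frames, frames_per_interval=3, high_band_index=1)
```

Here the low-cost filter takes over for the second interval because the first interval was dominated by the low band; the high-band frames of the second interval would switch the choice back for a third interval.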

Hereinabove, the process of down-sampling the voice signal input during voice communication for converting the sampling rate of the transmitting side has been described. On the other hand, as described above, since the electronic device 1 (110) can be both the transmitting side (Tx) and the receiving side (Rx), the receiving side must also be able to process the sampling rate conversion.

FIG. 5 is a diagram illustrating an example of components that a processor of an electronic device according to an embodiment of the present invention may further include, and FIG. 6 is a flowchart illustrating an example of steps that the sampling rate conversion method may further include.

As shown in FIG. 5, the processor 212 of the electronic device 1 (110) may further include a sound output apparatus characteristic verifying unit 510, an anti-imaging filter selection unit 520, a voice signal generation unit 530, and an up-sampling processing unit 540. The processor 212 and these additional components may control the electronic device 1 (110) to perform steps 610 through 640 that the sampling rate conversion method may further include, as shown in FIG. 6. As described above, the components of the processor 212 may be representations of different functions performed by the processor 212.

Steps 610 through 640 of FIG. 6 may be performed after step 420 described with reference to FIG. 4, and since transmission and reception of voice in a voice call occur simultaneously, steps 630 through 640 may be processed in parallel with steps 430 through 450 .

In step 610, the sound output apparatus characteristic verifying unit 510 may identify the type of the sound output apparatus included in or connected to the electronic device 1 (110), the frequency reproduction capability of the sound output apparatus, or the call mode. The type of the sound output apparatus may be a speakerphone, an earphone, or a handset, or more specifically the model name of the sound output apparatus. The frequency reproduction capability varies with the type of the sound output apparatus and may be a quantified value of how well that type reproduces frequencies. For example, because sound output apparatuses may differ in frequency characteristics or performance, the sound output apparatus characteristic verifying unit 510 can identify the frequency reproduction capability according to the kind and model of the sound output apparatus.

In addition, the call mode may be a hands-free mode, a handset call mode, or the like. For example, in a hands-free call, the frequency characteristics of the voice reproduced through a loudspeaker may be very poor due to mechanical effects. In a handset call, by contrast, the frequency characteristics of the voice reproduced through the receiver of the communication terminal are managed within a strict range, so a call of good quality is possible. Therefore, the sound output apparatus characteristic verifying unit 510 can identify the call mode used in the current voice call.

In step 620, the anti-imaging filter selection unit 520 may select one of a plurality of anti-imaging filters based on the identified type, the identified frequency reproduction capability, or the identified call mode. As described above, the type of the sound output apparatus, its frequency reproduction capability, and the call mode may all affect the quality of the voice call. Accordingly, the anti-imaging filter selection unit 520 can select the anti-imaging filter according to the identified type, the identified frequency reproduction capability, or the identified call mode.

For example, a step (not shown) of managing a matching table, in which one of the plurality of anti-imaging filters is matched to each type of sound output apparatus, each range of frequency reproduction capability, or each call mode, may be performed by the processor 212. The matching table may be a table in which a suitable anti-imaging filter is mapped in advance to the type of sound output apparatus, the frequency reproduction capability, or the call mode. In this case, the anti-imaging filter selection unit 520 may look up in the matching table the anti-imaging filter matched to the identified type, the anti-imaging filter matched to the range containing the identified frequency reproduction capability, or the anti-imaging filter matched to the identified call mode. The anti-imaging filter selection unit 520 may then select that anti-imaging filter from among the plurality of anti-imaging filters.
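
A matching table of this kind can be sketched as plain lookup structures. The description does not give the table contents, so every entry below (the filter names, the device types, the capability score scale, and the idea that a hands-free loudspeaker maps to a low-cost filter) is a hypothetical example, as is the lookup order of type, then capability range, then call mode.

```python
# Hypothetical matching tables: key -> name of an anti-imaging filter.
BY_DEVICE_TYPE = {
    "speakerphone": "low_cost_filter",
    "earphone": "high_quality_filter",
    "handset": "high_quality_filter",
}
# (inclusive lower bound, exclusive upper bound, filter) on an assumed 0..1 capability score.
BY_REPRODUCTION_RANGE = [
    (0.0, 0.5, "low_cost_filter"),
    (0.5, float("inf"), "high_quality_filter"),
]
BY_CALL_MODE = {
    "hands_free": "low_cost_filter",
    "handset": "high_quality_filter",
}
DEFAULT_FILTER = "high_quality_filter"

def select_anti_imaging_filter(device_type=None, reproduction_score=None, call_mode=None):
    """Look up the anti-imaging filter matched to whichever characteristic is known."""
    if device_type in BY_DEVICE_TYPE:
        return BY_DEVICE_TYPE[device_type]
    if reproduction_score is not None:
        for low, high, name in BY_REPRODUCTION_RANGE:
            if low <= reproduction_score < high:
                return name
    if call_mode in BY_CALL_MODE:
        return BY_CALL_MODE[call_mode]
    return DEFAULT_FILTER  # assumed fallback when nothing matches

by_type = select_anti_imaging_filter(device_type="speakerphone")
by_score = select_anti_imaging_filter(reproduction_score=0.8)
by_mode = select_anti_imaging_filter(call_mode="hands_free")
```

Keeping the mapping in data rather than in code means the table can be updated, for example for a new device model, without changing the selection logic.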

In step 630, the voice signal generation unit 530 may decode a data packet received through the network to generate a voice signal. For example, the data packet may be a data packet received from the second electronic device 120, which is the other party to the voice call with the first electronic device 110. The second electronic device 120 may likewise apply the down-sampling process described with reference to FIGS. 3 and 4 to the voice signal input to the second electronic device 120, encode the down-sampled voice signal, and transmit it to the first electronic device 110. The first electronic device 110 may decode the received data packet to generate a voice signal.

In step 640, the up-sampling processing unit 540 may up-sample the generated voice signal using the selected anti-imaging filter. Since the voice signal generated as described above has undergone the down-sampling process in the second electronic device 120, its sampling rate can be converted by up-sampling. For example, when a signal with a sampling rate of 16 kHz is up-sampled to a sampling rate of 48 kHz, imaging occurs, in which the signal components below 8 kHz reappear as images at higher frequencies. An anti-imaging filter is therefore required to remove these images. For example, the up-sampled signal can be input to the selected anti-imaging filter to remove the images generated by up-sampling. By selecting the appropriate anti-imaging filter according to the type of the sound output device, the frequency reproduction capability, or the call mode as described above, it becomes possible to use an anti-imaging filter that balances performance against calculation amount.
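
As a rough illustration of the 16 kHz to 48 kHz case, the sketch below inserts zeros between samples (the expander) and then applies a low-pass FIR filter as the anti-imaging filter. The Hamming-windowed sinc design, the tap count of 31, and the function names are assumptions chosen for the example; the patent does not specify how the filters are designed.

```python
import math


def design_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc low-pass FIR (assumed design, not from the patent).

    `cutoff` is normalized to the sampling rate (0 < cutoff < 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def upsample(signal, factor, taps):
    """Zero-insertion expander followed by an anti-imaging FIR filter."""
    expanded = []
    for sample in signal:
        expanded.append(sample)
        expanded.extend([0.0] * (factor - 1))
    out = []
    for n in range(len(expanded)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * expanded[n - k]
        # Multiplying by `factor` restores the amplitude lost to zero insertion.
        out.append(factor * acc)
    return out


# 16 kHz -> 48 kHz: factor 3, cutoff at 8 kHz / 48 kHz = 1/6 of the new rate.
anti_imaging_taps = design_lowpass(31, 1 / 6)
```

A longer filter suppresses the images more strongly at a higher calculation cost, which is the trade-off the filter selection is meant to manage.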

FIG. 7 is a block diagram of logical components that an electronic device according to an embodiment of the present invention may include. The first electronic device 110 may include a transmission-side sampling rate converter (Tx-SRC) 710 and a reception-side sampling rate converter (Rx-SRC) 720. For example, components of the processor 212 (the voice call progress control unit 310, the energy calculation unit 320, the anti-aliasing filter selection unit 330, and the down-sampling processing unit 340) may be implemented to handle the functions of the transmission-side sampling rate converter 710, and the components of the processor 212 described with reference to FIGS. 5 and 6 (including the voice signal generation unit 530 and the up-sampling processing unit 540) may be implemented to handle the functions of the reception-side sampling rate converter 720.

In the transmission-side sampling rate converter 710, the voice activity detection (VAD) 711 is a function for distinguishing between voice and silence, and can be used to pass only the voice signal to the voice spectrum analyzer 712. For example, the signal input to the first electronic device 110 may include not only a voice signal but also a silent signal (which may include ambient sound) for periods in which the user is not speaking. The voice spectrum analyzer 712 is a function for analyzing the frequency characteristics of the user's voice signal, and the first electronic device 110 passes the voice signal extracted using the VAD 711 to the voice spectrum analyzer 712. The voice spectrum analyzer 712 can select the anti-aliasing filter 713, which is one of the plurality of anti-aliasing filters, using the method described with reference to FIGS. 3 and 4. The input signal may be input to the selected anti-aliasing filter 713, and the output of the anti-aliasing filter 713 may be down-sampled through a decimator 714. The down-sampled signal may be encoded and packetized by the speech encoder 715 and transmitted to the second electronic device 120 via the network.
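
The transmission-side flow can be sketched in a few lines. The energy-threshold VAD, the threshold value, and the three-tap moving-average filter used in the usage note are stand-ins chosen for brevity; the patent does not state which VAD algorithm or filter coefficients are used.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def speech_frames(frames, threshold=1e-3):
    """Energy-threshold VAD (assumed): keep only frames treated as voice.

    Only these frames would be passed on to the spectrum analysis.
    """
    return [f for f in frames if frame_energy(f) > threshold]


def fir_filter(signal, taps):
    """Causal FIR filtering by direct convolution."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def downsample(signal, factor, taps):
    """Anti-aliasing filter followed by a decimator keeping every `factor`-th sample."""
    return fir_filter(signal, taps)[::factor]
```

For example, `downsample(signal, 3, [1 / 3] * 3)` converts 48 kHz input to 16 kHz using a crude moving-average filter; a real system would substitute the taps of whichever anti-aliasing filter was selected.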

In the reception-side sampling rate converter 720, the voice decoder may decode the received data (the data of the data packet received from the second electronic device 120) to generate a voice signal. The generated voice signal may be input to an expander 722 and up-sampled. An anti-imaging filter 723 may be used to remove the images generated during up-sampling, as described above. The anti-imaging filter 723 may be selected using the matching table 724 as described with reference to FIGS. The signal from which the images have been removed by the anti-imaging filter 723 may be output through a sound output device such as a speaker.

FIG. 8 is a graph showing an example of the accumulated energy value for each band in an embodiment of the present invention. The graph 800 is a histogram showing the accumulated value of energy for each band. The x-axis represents the accumulated value of energy in units of level, and the y-axis represents frequency. The graph 800 divides the frequency band of the voice into three (M = 3) bands and displays the accumulated energy value of each band as a bar. In the example of FIG. 8, the accumulated energy value for the second band (band M-1) is the largest. In this case, the anti-aliasing filter selection unit 330 described with reference to FIGS. 3 and 4 can identify an anti-aliasing filter that can process the second band without deteriorating sound quality. This histogram may also be expressed in the form of a cumulative distribution function.
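
The per-band accumulation and the resulting filter choice might look like the sketch below. The naive DFT, the even split of the spectrum into bands, the filter bank entries, and the rule of taking the cheapest filter that covers the dominant band are assumptions made for illustration; the patent describes the idea but not these details.

```python
import cmath
import math


def band_energies(frame, num_bands):
    """Energy per band for one frame, using a naive DFT over bins 0..N/2-1."""
    n = len(frame)
    half = n // 2
    energies = [0.0] * num_bands
    for k in range(half):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        band = min(k * num_bands // half, num_bands - 1)
        energies[band] += abs(coeff) ** 2
    return energies


def accumulate_band_energies(frames, num_bands):
    """Accumulate the per-band energy over all frames (the histogram of FIG. 8)."""
    total = [0.0] * num_bands
    for frame in frames:
        for band, energy in enumerate(band_energies(frame, num_bands)):
            total[band] += energy
    return total


# Hypothetical filter bank: (name, highest band index handled without
# audible degradation, calculation cost expressed as a tap count).
FILTERS = [("aaf_short", 0, 16), ("aaf_medium", 1, 32), ("aaf_long", 2, 64), ("aaf_long_b", 2, 96)]


def select_anti_aliasing_filter(accumulated, filters=FILTERS):
    """Pick the cheapest filter able to handle the band with the largest energy."""
    dominant = max(range(len(accumulated)), key=accumulated.__getitem__)
    candidates = [f for f in filters if f[1] >= dominant]
    return min(candidates, key=lambda f: f[2])[0]
```

With M = 3 and the second band dominant, as in the example of FIG. 8, this sketch would return the medium filter, since it is the least costly candidate that still covers that band.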

As described above, according to the embodiments of the present invention, the anti-aliasing filter of the transmission-side (Tx) sampling rate converter (SRC) and the anti-imaging filter of the reception-side (Rx) SRC can operate efficiently (that is, at the optimum point between performance and calculation amount).

The system or apparatus described above may be implemented as a hardware component, a software component, or a combination of hardware components and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may execute an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device may be described as being used singly, but those skilled in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may comprise a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium, or device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in the art of computer software. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine language code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. For example, it is to be understood that the described techniques may be performed in a different order than the described method, and/or that components of the described systems, structures, devices, circuits, and the like may be combined in a different form, or replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims are also within the scope of the following claims.

Claims (14)

A computer program stored on a computer readable recording medium for executing a sampling rate conversion method in combination with a computer embodying an electronic device,
The sampling rate conversion method includes:
Transmitting and receiving a data packet through the network in the electronic device and proceeding with a voice call;
Analyzing a frequency characteristic of a voice signal input during the voice communication in the electronic device for each of a plurality of bands and calculating energy for each of the plurality of bands;
Selecting one of a plurality of anti-aliasing filters during the voice call based on the plurality of band-specific energies in the electronic device; And
Processing down-sampling of the input speech signal using the selected anti-aliasing filter while the voice call is in progress,
Wherein the anti-aliasing filter selected for processing the down-sampling of the input speech signal is reselected at predetermined time intervals during the voice call.
The method according to claim 1,
Wherein the calculating the energy for each of the plurality of bands comprises:
The spectrum of the speech signal is divided into a plurality of bands, energy is extracted and accumulated for each band for each frame,
Wherein the step of selecting one of the plurality of anti-aliasing filters comprises:
Identifying, among the plurality of anti-aliasing filters, an anti-aliasing filter capable of processing the band having the largest accumulated energy value without deterioration of sound quality.
3. The method of claim 2,
Wherein the step of selecting one of the plurality of anti-aliasing filters comprises:
When a plurality of anti-aliasing filters are identified as capable of processing the band having the largest accumulated energy value without deterioration of sound quality, selecting, from among the identified anti-aliasing filters, the anti-aliasing filter requiring the smallest calculation amount for converting the frequency of the input voice signal.
delete
The method according to claim 1,
The sampling rate conversion method includes:
Identifying a type of a sound output apparatus included in or connected to the electronic device, or a frequency reproduction capability or a call mode of the sound output apparatus;
Selecting one of a plurality of anti-imaging filters based on the identified type, the identified frequency reproduction capability, or the identified call mode;
Decoding a data packet received through the network to generate a voice signal; and
Processing up-sampling of the generated voice signal.
6. The method of claim 5,
The sampling rate conversion method includes:
Managing a matching table in which one of the plurality of anti-imaging filters is matched to each type of the sound output apparatus, to each frequency reproduction capability range of the sound output apparatus, or to each call mode
Further comprising:
Wherein the step of selecting one of the plurality of anti-imaging filters comprises:
Identifying, in the matching table, the anti-imaging filter matched to the identified type, the anti-imaging filter matched to the range containing the identified frequency reproduction capability, or the anti-imaging filter matched to the identified call mode, and selecting the identified anti-imaging filter from among the plurality of anti-imaging filters.
A method for converting a sampling rate of an electronic device,
Transmitting and receiving a data packet through the network in the electronic device and proceeding with a voice call;
Analyzing a frequency characteristic of a voice signal input during the voice communication in the electronic device for each of a plurality of bands and calculating energy for each of the plurality of bands;
Selecting one of a plurality of anti-aliasing filters during the voice call based on the plurality of band-specific energies in the electronic device; And
Processing down-sampling of the input speech signal using the selected anti-aliasing filter while the voice call is in progress,
Wherein the anti-aliasing filter selected for processing the down-sampling of the input speech signal is reselected at predetermined time intervals in the voice call.
8. The method of claim 7,
Wherein the calculating the energy for each of the plurality of bands comprises:
The spectrum of the speech signal is divided into a plurality of bands, energy is extracted and accumulated for each band for each frame,
Wherein the step of selecting one of the plurality of anti-aliasing filters comprises:
Identifying, among the plurality of anti-aliasing filters, an anti-aliasing filter capable of processing the band having the largest accumulated energy value without deterioration of sound quality.
9. The method of claim 8,
Wherein the step of selecting one of the plurality of anti-aliasing filters comprises:
When a plurality of anti-aliasing filters are identified as capable of processing the band having the largest accumulated energy value without deterioration of sound quality, selecting, from among the identified anti-aliasing filters, the anti-aliasing filter requiring the smallest calculation amount for converting the frequency of the input voice signal.
delete
The method of claim 7,
Identifying a type of a sound output apparatus included in or connected to the electronic device, or a frequency reproduction capability or a call mode of the sound output apparatus;
Selecting one of a plurality of anti-imaging filters based on the identified type, the identified frequency reproduction capability, or the identified call mode;
Decoding a data packet received through the network to generate a voice signal; and
Processing up-sampling of the generated voice signal.
12. The method of claim 11,
Managing a matching table in which one of the plurality of anti-imaging filters is matched to each type of the sound output apparatus, to each frequency reproduction capability range of the sound output apparatus, or to each call mode
Further comprising:
Wherein the step of selecting one of the plurality of anti-imaging filters comprises:
Identifying, in the matching table, the anti-imaging filter matched to the identified type, the anti-imaging filter matched to the range containing the identified frequency reproduction capability, or the anti-imaging filter matched to the identified call mode, and selecting the identified anti-imaging filter from among the plurality of anti-imaging filters.
A method for converting a sampling rate of an electronic device,
Transmitting and receiving a data packet through the network in the electronic device and proceeding with a voice call;
Selecting one of a plurality of anti-aliasing filters at predetermined time intervals in the voice call using a frequency characteristic of the voice signal input during the voice call; And
Changing the anti-aliasing filter used for processing the down-sampling of the input voice signal during the voice call to the anti-aliasing filter selected at each time interval.
delete
KR1020150154083A 2015-11-03 2015-11-03 Sampling rate conversion method and system for efficient voice call KR101748039B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150154083A KR101748039B1 (en) 2015-11-03 2015-11-03 Sampling rate conversion method and system for efficient voice call

Publications (2)

Publication Number Publication Date
KR20170052090A KR20170052090A (en) 2017-05-12
KR101748039B1 true KR101748039B1 (en) 2017-06-15

Family

ID=58740427

Country Status (1)

Country Link
KR (1) KR101748039B1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102423977B1 (en) * 2019-12-27 2022-07-22 삼성전자 주식회사 Method and apparatus for transceiving voice signal based on neural network
KR20210111603A (en) * 2020-03-03 2021-09-13 삼성전자주식회사 Apparatus and method for improving sound quality

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130211827A1 (en) 2012-02-15 2013-08-15 Microsoft Corporation Sample rate converter with automatic anti-aliasing filter

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Bruno Bessette, et al. The adaptive multirate wideband speech codec (AMR-WB). IEEE transactions on speech and audio processing. 2002.11. Vol.10, No.8, pp.620-636.*
Ronald E. Crochiere, et al. Optimum FIR digital filter implementations for decimation, interpolation, and narrow-band filtering. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1975.10.*

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant