WO2012124268A1 - Audio content processing device and audio content processing method - Google Patents

Audio content processing device and audio content processing method Download PDF

Info

Publication number
WO2012124268A1
WO2012124268A1 PCT/JP2012/001384 JP2012001384W
Authority
WO
WIPO (PCT)
Prior art keywords
audio content
audio
pointer
sound
weight
Prior art date
Application number
PCT/JP2012/001384
Other languages
French (fr)
Japanese (ja)
Inventor
友朗 丸山
Original Assignee
Panasonic Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Corporation
Publication of WO2012124268A1 publication Critical patent/WO2012124268A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/04Circuits for transducers, loudspeakers or microphones for correcting frequency response

Definitions

  • the present invention relates to an audio content processing apparatus and an audio content processing method for performing processing such as reproduction on a plurality of audio contents each weighted.
  • Patent Document 1 describes a technique for displaying the weight of each audio content.
  • the technique described in Patent Document 1 displays a list of combinations of channel numbers (weights) and broadcast programs in Internet radio on a display. Thereby, the user can perform a broadcast program switching operation while confirming the channel number of each broadcast program. That is, the technique described in Patent Document 1 can display the weight of each audio content to the user.
  • However, Patent Document 1 has a problem in that the weight of each audio content cannot be presented in a way the user can grasp intuitively. This is because, in order to confirm the weight of an audio content, the user must check the displayed content by looking at the display each time. Considering an operation in which a plurality of audio contents are checked while being switched at high speed, an audio content processing apparatus that can present the weight of each audio content so that it can be understood intuitively is desired.
  • An object of the present invention is to present the weight of each audio content so that it can be intuitively grasped, without using a display.
  • FIG. 2 is a block diagram showing an example of the configuration of an audio content processing apparatus according to the first embodiment.
  • A diagram showing an example of the information list to which the content arrangement information according to Embodiment 1 has been added
  • A diagram showing a first example of the position sound quality conversion rule according to Embodiment 1, and a sequence diagram showing an example of the operation of the audio content reproduction system according to Embodiment 1
  • A block diagram showing an example of the configuration of an audio content processing apparatus according to Embodiment 2, and a diagram showing an example of the definition of head orientation and position according to Embodiment 2
  • FIG. 1 is a diagram showing an example of the appearance of an audio content reproduction system in which the audio content processing apparatus according to Embodiment 1 of the present invention is used.
  • the audio content reproduction system 100 will be described with an audio output device 200 and a portable player 300 provided with the audio content processing device according to Embodiment 1 of the present invention as an example.
  • the audio output device 200 is a device that converts audio data into audio and outputs the audio.
  • the audio output device 200 includes a monaural audio data transmission cable 210 and a monaural earphone 220 worn on one ear of a person.
  • the portable player 300 is a monaural audio player.
  • the portable player 300 places a plurality of weighted audio contents on a weight axis that is a virtual coordinate axis. Then, the portable player 300 switches and reproduces the audio content by moving the pointer on the weight axis.
  • Intuitively, the weight corresponds to a channel number on a radio, the weight axis corresponds to the rotation angle of the knob that switches the channel, and the movement of the pointer corresponds to turning that knob.
  • the portable player 300 has a small casing 301 as its outer shape, and has a user operation input device 310 on the surface of the casing 301.
  • the user operation input device 310 receives various operations such as a pointer position moving operation and a determination operation for audio content from the user.
  • The user operation input device 310 has a “return” button 311, a “decision” button 312, and a “forward” button 313, and detects presses and releases of these buttons as operation events.
  • other forms of the user operation input device 310 include, for example, a joystick, a touch panel, or a remote control device separate from the portable player 300.
  • the portable player 300 may further include a display on the surface of the housing 301.
  • the audio content processing apparatus (not shown) according to the present embodiment provided in the portable player 300 moves the pointer on the above-described weight axis in response to the pointer movement operation by the user operation input apparatus 310. That is, the audio content processing apparatus moves the pointer between a plurality of audio contents. Also, every time the pointer position overlaps the position of the audio content, the audio content processing apparatus temporarily reproduces the audio of the audio content. Then, the audio content processing apparatus receives the audio content determination operation from the user operation input device 310, and reproduces the audio of the audio content where the pointer is located at that time. Note that audio reproduction is performed by transmitting audio data to the monaural earphone 220 via the cable 210. In this example, the audio data is transmitted through the cable 210, but it can also be transmitted wirelessly.
  • the audio content where the pointer is located is referred to as “selected audio content”.
  • the audio content that is the target of the determination operation is referred to as “determined audio content”.
  • the audio content processing apparatus arranges a plurality of audio contents on the above-described weight axis. Then, the audio content processing device presents the weight of the selected audio content with the sound quality of the marker sound associated with the weight until the determination operation is performed with the pointer. That is, the audio content processing apparatus generates a marker sound that indicates the weight corresponding to the pointer position by sound quality, and transmits the audio data to the monaural earphone 220 via the cable 210. At this time, the arrangement of the plurality of audio contents on the weight axis matches, for example, the order of the weight of the audio contents.
  • The marker sound is an intermittent simple tone (“pong, pong, ...”), and serves as an audio pointer indicating the position of the pointer.
  • The marker sound presents the weight of the audio content where the pointer is located by a change in pitch.
  • FIG. 2 is a block diagram showing an example of the configuration of the audio content processing apparatus according to the present embodiment. Here, for convenience of explanation, other peripheral devices are also illustrated.
  • The audio content processing device 400 includes an information storage unit 410, a position arrangement unit 420, a pointer position acquisition unit 430, a presentation position calculation unit 440, a marker sound generation unit 450, an audio reproduction control unit 460, and an audio stream generation unit 470.
  • the information storage unit 410 has general database functions such as information recording, correction, deletion, search, and reading, and stores audio data of audio contents and attributes (ID, storage position, weight, etc.) assigned thereto.
  • the information storage unit 410 acquires an information list from the external file system 510 and the audio content supply server 520 via a communication network such as the Internet, and stores the information list.
  • the information list is a list describing identification information of weighted audio contents.
  • the file system 510 is a storage device that stores audio data that is the main body of audio content.
  • The audio content supply server 520 is assumed to be a search server having a function of searching audio content. Specifically, the audio content supply server 520 searches the audio content of the file system 510 in response to a search query from the audio content processing device 400. Then, the audio content supply server 520 returns, to the audio content processing apparatus 400, information on 999 audio contents, each assigned its search match score as a weight.
  • The information storage unit 410 generates the above information list based on the returned information. Illustration and description of the functional unit in the audio content processing apparatus 400 that issues the search query and receives the returned information are omitted.
  • the position arrangement unit 420 associates the identification information of the plurality of audio contents in the information list with the coordinates of the position corresponding to each weight among the above-described weight axes. Hereinafter, this association is referred to as “arrangement of audio content on the weight axis”. Then, the position arrangement unit 420 adds content arrangement information indicating the arrangement of each audio content (correspondence between the audio content identification information and the weight axis coordinates) to the information list.
  • the pointer position acquisition unit 430 acquires operations such as a pointer position moving operation and a determination operation on the current position of the pointer (corresponding audio content) performed by the user operation input device 310.
  • the pointer position acquisition unit 430 acquires a pointer position movement operation
  • the pointer position acquisition unit 430 outputs pointer operation information indicating the direction and degree of the movement to the presentation position calculation unit 440.
  • the pointer position acquisition unit 430 acquires a determination operation for the current position of the pointer
  • the pointer position acquisition unit 430 outputs determination operation information indicating that to the audio reproduction control unit 460.
  • the presentation position calculation unit 440 calculates the current position of the pointer when the pointer is moved on the weight axis according to the pointer operation information. In other words, the presentation position calculation unit 440 moves the pointer to the audio content that the user pays attention to among the plurality of audio contents each weighted.
  • The presentation position calculation unit 440 may move the pointer only to positions where audio content is located, according to the degree indicated by the pointer operation information, or may move the pointer regardless of the presence or absence of audio content.
  • the presentation position calculation unit 440 in the present embodiment moves the pointer regardless of the presence or absence of audio content. Then, the presentation position calculation unit 440 outputs pointer position information indicating the current position of the pointer to the marker sound generation unit 450 and the audio reproduction control unit 460.
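  • As a rough illustration of the pointer handling described above, the following Python sketch keeps a scalar pointer coordinate on the weight axis and shifts it by the direction and degree reported in the pointer operation information; the class and parameter names, and the axis bounds, are assumptions made for illustration, not taken from the patent.

```python
class PresentationPositionCalculator:
    """Minimal sketch: track the pointer's coordinate on the weight axis."""

    def __init__(self, axis_min=0.0, axis_max=1400.0):  # bounds are illustrative only
        self.position = 0.0  # the pointer starts at the origin of the weight axis
        self.axis_min = axis_min
        self.axis_max = axis_max

    def move(self, direction, degree):
        """direction: +1 ('forward') or -1 ('return'); degree: amount of movement."""
        self.position += direction * degree
        # keep the pointer inside the populated part of the weight axis
        self.position = min(max(self.position, self.axis_min), self.axis_max)
        return self.position
```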
  • the marker sound generation unit 450 determines the sound quality of the marker sound from the current position of the pointer indicated by the pointer position information. Then, the marker sound generation unit 450 generates audio data of the marker sound having the determined sound quality and outputs it to the audio stream generation unit 470. That is, the marker sound generation unit 450 uses the audio stream generation unit 470 to present the weight of the audio content at which the pointer is located with the sound quality of the marker sound associated with the weight. In addition, the marker sound generation unit 450 uses the audio stream generation unit 470 to present the change in weight between a certain audio content and the next audio content with the sound quality of the marker sound.
  • the audio reproduction control unit 460 calculates the sound field of the virtual sound field space based on the latest pointer position information and content arrangement information, and constructs the virtual sound field space.
  • This virtual sound field space is a virtual sound field space that includes at least the current position of the pointer on the weight axis and outputs the marker sound and the sound of each sound content from each position.
  • the virtual sound field space is a one-dimensional space extending in front of the user.
  • the audio reproduction control unit 460 may handle the weight axis and the virtual sound field space in the same coordinate system. In this case, the construction process of the virtual sound field space is not necessarily required. Then, the audio reproduction control unit 460 outputs sound field information indicating the constructed virtual sound field space to the audio stream generation unit 470.
  • the audio reproduction control unit 460 specifies the selected audio content as the audio content determined by the user, and performs a predetermined process.
  • the predetermined process is a process of causing the audio stream generation unit 470 to stop outputting the marker sound and reproduce the determined audio content.
  • the audio playback control unit 460 also acquires pointer operation information from the pointer position acquisition unit 430 while the determined audio content is being played back.
  • For example, when the “return” button 311 is pressed during playback of the determined audio content, the audio playback control unit 460 moves the playback position backward; when the “forward” button 313 is pressed, it advances the playback position; and when the “decision” button 312 is pressed, it stops playback.
  • For example, the audio content processing apparatus 400 has a CPU (central processing unit), a storage medium such as a ROM (read only memory) storing a control program, a working memory such as a RAM (random access memory), and a communication circuit. In this case, the function of each unit described above is realized by the CPU executing the control program.
  • Such an audio content processing apparatus 400 can present the weight of the audio content where the pointer is located (the selected audio content) with the sound quality of the marker sound associated with the weight.
  • Sound quality can be grasped in a very short time compared with grasping information displayed on a display, without work such as moving the line of sight or reading characters. Further, one-dimensional information such as a weight can be presented through sound quality in a way the user can sufficiently grasp. Therefore, the audio content processing apparatus 400 can present the weight of the selected audio content so that the user can understand it intuitively, without using a display.
  • the audio content processing apparatus 400 can arrange the audio content in the virtual sound field space and render the audio content desired by the user at a timing desired by the user. Therefore, the audio content processing apparatus 400 can provide a comfortable audio content reproduction environment to the user.
  • sound field is a commonly used term and refers to a “real” space where “real” sounds (sound waves) exist.
  • the “sound field” typically refers to the space of a concert venue, where there are sound sources and walls that reflect or absorb sound.
  • the “virtual sound field” is a sound field generated “virtually” around the user by adjusting the sound that enters the user's ear. Although the user feels that there is a sound field (that is, a sound source or a wall) in the surroundings, there is actually no sound field that the user feels. Technologies for creating a virtual sound field include so-called surround technology and three-dimensional sound technology.
  • the “virtual sound field space” in the present embodiment is a space in which a sound source and a wall position are arranged, which a user who hears the virtual sound field will feel.
  • the audio content processing apparatus 400 constructs an arrangement such as a sound source or a wall as a virtual sound field space, and applies a surround technology or a stereophonic technology to this.
  • the sound generated as a result is a sound that makes the user feel the constructed virtual sound field space as an actual sound field space when listening with a speaker or headphones.
  • the audio reproduction control unit 460 constructs a virtual sound field space with the position of the pointer as the position of the listener. Therefore, in this embodiment, when the pointer position changes, the weight axis on which the audio content is arranged does not change, but the virtual sound field space changes.
  • the audio reproduction control unit 460 constructs a virtual sound field space in which the weight axis is a straight line and the position of the listener of the audio data is arranged at the pointer position on the weight axis.
  • It is desirable that the audio reproduction control unit 460 construct the virtual sound field space according to the positional relationship between the audio output device 200, such as a speaker or headphones, and the user's ears. In the present embodiment, the audio reproduction control unit 460 constructs the virtual sound field space on the assumption that the predetermined monaural earphone 220 worn on the user's ear is used as the audio output device 200.
  • the file system 510 which is an external device, stores a large number of audio contents such as speeches, readings, and monologues. It is assumed that the user uses an information processing apparatus such as a personal computer or a cellular phone (not shown), designates the audio content processing apparatus 400 as a reply destination, and transmits an arbitrary keyword as a search query to the audio content supply server 520.
  • the audio content supply server 520 searches the attributes of the audio content in the file system 510 (such as the title of the audio content, the name of the speaker, the place name of the recording location, and the recording date / time) using keywords. Then, the audio content supply server 520 returns a search result including the search match score to the audio content processing apparatus 400.
  • FIG. 3 is a diagram illustrating an example of the operation of the audio content processing apparatus 400.
  • In step S1100, the information storage unit 410 acquires information from the audio content supply server 520 and generates and stores an information list from it.
  • FIG. 4 is a diagram showing an example of the information list.
  • the information list 610 describes, for each audio content, an ID 611 that is identification information of the audio content, a storage location 612, a weight 613, and a weight order 614 in association with each other.
  • the storage location 612 is a storage location of audio data of audio content.
  • The weight 613 is the search match score assigned by the audio content supply server 520.
  • The weight order 614 is the rank of each content according to its weight 613.
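  • For illustration, a possible in-memory form of such an information list is sketched below in Python; the field names mirror FIG. 4 (ID, storage location, weight, weight order), while the concrete values and file locations are invented placeholders.

```python
# Hypothetical search results returned by the audio content supply server:
# each entry carries an ID, a storage location, and a search match score (weight).
search_results = [
    {"id": 1, "location": "file:///audio/0001.wav", "weight": 90},
    {"id": 2, "location": "file:///audio/0002.wav", "weight": 35},
    {"id": 3, "location": "file:///audio/0003.wav", "weight": 12},
]

# Weight order 614: rank of each content when sorted by descending weight.
ranked = sorted(search_results, key=lambda r: r["weight"], reverse=True)
information_list = [dict(r, weight_order=rank + 1) for rank, r in enumerate(ranked)]
```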
  • In step S1200 of FIG. 3, the position arrangement unit 420 arranges the audio contents listed in the information list on the weight axis in the order of their respective weights. Then, the position arrangement unit 420 adds content arrangement information, in which the ID of each audio content is associated with a coordinate value on the weight axis, to the information list.
  • The position arrangement unit 420 has, in advance, a weight position conversion rule for converting the weight described in the information list into a weight axis coordinate, and places each audio content on the weight axis according to that rule.
  • The weight position conversion rule arranges the audio contents so that, when the coordinate value on the weight axis changes in one direction, the weight of the arranged contents also changes in one direction. That is, the rule is such that exactly one of the following formulas (1) and (2) always holds.
  • Here, w_n is the weight of the audio content with ID “n”, x_n is the coordinate value of the audio content with ID “n”, w_m is the weight of the audio content with ID “m”, and x_m is the coordinate value of the audio content with ID “m”.
  • If w_n < w_m, then x_n < x_m  (1)
  • If w_n < w_m, then x_n > x_m  (2)
  • In the present embodiment, the weight position conversion rule arranges the audio contents along the coordinate axis in order of weight, starting from the side close to 0, such that the ratio of differences between coordinate values is approximately equal to the ratio of differences between weights, and places the audio content with the highest weight at the origin.
  • the weight position conversion rule is defined by a function or a correspondence table, for example.
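  • A minimal Python sketch of one such rule is shown below. It is an assumption for illustration, not the patent's exact rule: contents are placed in descending order of weight, the heaviest content at the origin, with coordinate differences proportional to weight differences, which satisfies formula (2) above.

```python
def place_on_weight_axis(information_list, scale=10.0):
    """Map each content ID to a weight-axis coordinate (content arrangement information).

    Heavier content -> smaller coordinate; the heaviest content sits at the origin,
    and coordinate differences are proportional to weight differences (scale is arbitrary).
    """
    w_max = max(c["weight"] for c in information_list)
    return {c["id"]: (w_max - c["weight"]) * scale for c in information_list}
```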
  • FIG. 5 is a diagram illustrating an example of an information list to which content arrangement information is added.
  • the information list 620 to which the content arrangement information is added describes the presentation position 621 indicating the coordinate value of the weight axis in association with each audio content ID 611.
  • Note that the position arrangement unit 420 may generate the content arrangement information separately from the information list and store it in the information storage unit 410 or output it to the audio reproduction control unit 460.
  • Strictly speaking, the audio reproduction control unit 460 should construct the virtual sound field space (that is, calculate the audio stream) in consideration of wall reflection and sound propagation speed.
  • In the present embodiment, however, the audio stream is monaural audio and the listener position is arranged on a linear weight axis. Therefore, the audio reproduction control unit 460 in the present embodiment may construct a virtual sound field space that ignores wall reflection and the head-related transfer function (left/right sound quality, phase, timing shift, and the like) and takes into account only the attenuation of sound with distance.
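  • A hedged sketch of this simplified sound field calculation follows: wall reflection and head-related transfer functions are ignored and only attenuation with distance is applied. The inverse-distance model with a lower bound is an assumption chosen for illustration; the patent does not specify the attenuation curve.

```python
def content_gains(pointer_position, placement, min_distance=1.0):
    """Return a per-content gain based only on distance from the pointer on the weight axis."""
    gains = {}
    for content_id, coordinate in placement.items():
        distance = abs(coordinate - pointer_position)
        gains[content_id] = 1.0 / max(distance, min_distance)  # closer content is louder
    return gains
```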
  • In step S1600, the marker sound generation unit 450 generates audio data of the marker sound with the determined sound quality and outputs the audio data to the audio stream generation unit 470. Also, the audio reproduction control unit 460 outputs the audio data of each audio content to the audio stream generation unit 470. Then, the audio stream generation unit 470 generates and outputs an audio stream (audio data) that realizes the sound field of the constructed virtual sound field space.
  • As a result, the user hears the sound of the first audio content 631 and the marker sound at a close distance, and hears the sounds of the second and third audio contents 632 and 633 from farther away.
  • In step S1700, the audio reproduction control unit 460 determines whether or not there has been a determination operation on the audio content. Specifically, the audio reproduction control unit 460 determines whether determination operation information has been input. If there is no determination operation (S1700: NO), the audio reproduction control unit 460 proceeds to step S1800.
  • In step S1800, the audio reproduction control unit 460 determines whether an instruction to end the process has been given by a user operation or the like. If no such instruction has been given (S1800: NO), the process returns to step S1500.
  • FIG. 7 is a diagram illustrating an example of how the pointer moves.
  • The marker sound generation unit 450 changes the marker sound to the sound quality corresponding to the current pointer position after the movement. Specifically, the marker sound generation unit 450 converts the current pointer position into a sound quality parameter value using the position sound quality conversion rule described below, and uses the obtained value to generate subsequent marker sounds.
  • The position sound quality conversion rule determines the value of the sound quality parameter so that, when the coordinate value of the pointer position changes in one direction, the value of the sound quality parameter also changes in one direction. That is, the rule is such that exactly one of the following formulas (4) and (5) always holds.
  • Here, B_n is the sound quality parameter when the pointer is located at the coordinate value x_n, and B_m is the sound quality parameter when the pointer is located at the coordinate value x_m. The position sound quality conversion rule is defined by a function or a correspondence table, for example.
  • If x_n < x_m, then B_n < B_m  (4)
  • If x_n < x_m, then B_n > B_m  (5)
  • FIG. 8 is a diagram showing a first example of the position sound quality conversion rule.
  • the position sound quality conversion rule is a function 651 that converts a coordinate value to a marker sound frequency so that the coordinate value and the marker sound frequency have a negative proportional relationship, for example.
  • When this function 651 is applied, the marker sound has the highest pitch when the pointer is located at the origin (that is, in the initial state), and the pitch of the marker sound becomes lower as the pointer moves away from the origin.
  • the presentation position “0” is associated with the frequency “4040 Hz”.
  • the presentation position “900” is associated with the frequency “3500 Hz”, and the presentation position “1400” is associated with the frequency “3200 Hz”. Therefore, in the example of FIG. 7, the frequency of the marker sound is 4040 Hz at the position 631 of the first audio content, and 3500 Hz (approximately the pitch of A7) at the position 632 of the second audio content.
  • the frequency of the marker sound is 3200 Hz at the position 633 of the third audio content.
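  • The three sample points quoted above (position 0 → 4040 Hz, 900 → 3500 Hz, 1400 → 3200 Hz) lie on a straight line with a slope of −0.6 Hz per coordinate unit, so a function-651-style rule could be sketched as follows; the exact coefficients are an inference from those points, not stated in the patent.

```python
def position_to_frequency(x):
    """Negative proportional relation: the farther the pointer is from the origin, the lower the pitch."""
    return 4040.0 - 0.6 * x

for x in (0, 900, 1400):
    print(x, round(position_to_frequency(x)))  # prints 4040, 3500, 3200
```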
  • As described above, the weight position conversion rule arranges the weights so that, when the coordinate value on the weight axis changes in one direction, the arranged weight also changes in one direction. Therefore, the audio content processing apparatus 400 determines the sound quality parameter so that exactly one of the following equations (6) and (7) always holds, and changes the marker sound accordingly. In other words, in the audio content processing apparatus 400, the sound quality of the marker sound changes in one direction when the position of the pointer on the weight axis changes in one direction.
  • If w_n < w_m, then B_n < B_m  (6)
  • If w_n < w_m, then B_n > B_m  (7)
  • the audio reproduction control unit 460 ends the series of processes.
  • FIG. 9 is a sequence diagram showing an example of the operation of the audio content reproduction system 100.
  • The corresponding step numbers are assigned to the portions that correspond to the processing of the audio content processing apparatus 400 shown in FIG. 3.
  • the audio reproduction control unit 460 constructs a virtual sound field space (B01). Then, the audio reproduction control unit 460 causes the audio stream generation unit 470 to start generating an audio stream (B02), and causes the marker sound generation unit 450 to start generating a marker sound (B03) (S1400). Then, the audio stream generation unit 470 generates an audio stream including the marker sound and the audio content audio, and outputs the audio stream from the audio output device 200 (B04) (S1600).
  • the pointer position acquisition unit 430 outputs pointer operation information to the presentation position calculation unit 440 (C01).
  • the presentation position calculation unit 440 calculates the pointer position after movement, and outputs new pointer position information to the marker sound generation unit 450 and the audio reproduction control unit 460 (C02).
  • the audio reproduction control unit 460 reconstructs the virtual sound field space (D01).
  • the audio reproduction control unit 460 causes the audio stream generation unit 470 to start generating the audio stream (D02), and causes the marker sound generation unit 450 to start generating the marker sound whose sound quality has been changed (D03) ( S1900, S2000).
  • As described above, when the audio content processing apparatus 400 moves the pointer between a plurality of audio contents, it presents the weight of the audio content where the pointer is located with the sound quality of the marker sound associated with that weight. As a result, the audio content processing apparatus 400 can present the weight of each audio content to the user in an easily understandable manner.
  • the audio content processing apparatus 400 can present the user with an intuitive understanding of which direction the pointer is moving. That is, the audio content processing apparatus 400 can present a change in the weight corresponding to the pointer as a change in the sound quality of the marker sound.
  • The audio content processing apparatus 400 also allows the user to move the pointer easily over the entire range. Therefore, the audio content processing apparatus 400 can present to the user, in an intuitive manner, where the weight of the audio content at which the pointer is currently located falls within the whole set of audio contents.
  • The range over which the sound quality of the marker sound can change, such as the audible range or the frequency band that the speaker can output, is fixed to some extent. Therefore, the audio content processing apparatus 400 can present not only the relative position of the weight, indicated by the degree and direction of weight change, but also the approximate absolute position of the weight within the whole.
  • In addition, the audio content processing apparatus 400 can present the degree of change in the weight to the user not only through the degree of change in the sound quality of the marker sound but also through the length of time required for the movement.
  • User interfaces on mobile terminals are diversifying.
  • With operation interfaces such as touch UIs (user interfaces), the visibility of the GUI (graphical user interface) being operated has improved.
  • However, for information that can be expressed by audio, it is often more convenient to present it as audio rather than visually, and weight information is exactly such information. Therefore, the audio content processing apparatus 400 according to the present embodiment can provide the mobile terminal with a user interface having improved operability.
  • Since the audio content processing apparatus 400 presents the weights intuitively without using a display, it enables the user to quickly confirm the respective weights of a large number of audio contents. Therefore, the user can, with little burden, pick out audio contents from an enormous collection in order of weight.
  • Such work includes, for example, picking up a large number of voice mails or bulletin board entries in descending order of priority.
  • Another example is the work of picking up the highest number of matches from a radio program with a large number of channels.
  • weight position conversion rule and the position sound quality conversion rule used by the audio content processing apparatus 400 are not limited to the above example.
  • FIG. 10 is a diagram showing a second example of the position sound quality conversion rule.
  • The position sound quality conversion rule may also be, for example, a function 652 that converts a coordinate value into a marker sound frequency so that the frequency of the marker sound increases in positive proportion to the coordinate value.
  • In this case, the pitch at the origin is 0 Hz, that is, no sound is produced; as the pointer position moves away from the origin, the marker sound becomes audible and its pitch becomes higher.
  • FIG. 11 is a diagram showing a third example of the position sound quality conversion rule.
  • In FIG. 11, AS_min to AS_max indicate a predetermined range of marker sound pitch, such as the audible range.
  • The position sound quality conversion rule may be, for example, a function 653 that converts a coordinate value into a marker sound frequency so that the marker sound frequency increases exponentially as the coordinate value decreases.
  • This function 653 is substantially equivalent to the function 651 shown in FIG. 8 with the vertical axis expressed as a scale instead of a frequency. Note that the output of the function 653 may be quantized to a limited set of pitches, such as the 12-tone scale.
  • FIG. 12 is a diagram showing a fourth example of the position sound quality conversion rule.
  • The position sound quality conversion rule may also be, for example, a function 654 that converts the coordinate value into the pitch of the marker sound so that the scale of the marker sound increases exponentially as the coordinate value decreases. This is almost equivalent to the function 653 shown in FIG. 11 with the values on its vertical axis rescaled. In this case, the change in scale relative to a change in coordinate value is large near the origin. Therefore, the audio content processing apparatus 400 can change the sound quality of the marker sound so that it is sensitive to weight differences between audio contents having large weights and insensitive to weight differences between audio contents having small weights.
  • Here, the scale is a semitone number in which A0 (27.5 Hz) is defined as 0, and sounds higher than that by one semitone at a time are sequentially numbered 1, 2, 3, and so on.
  • The relationship between the scale n and the frequency f(n) [Hz] is expressed by the following equation (8):
  • f(n) = 27.5 × 2^(n/12), where 0 ≤ n ≤ 96  (8)
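  • Equation (8) and its inverse can be written directly in Python as below (for example, A4 = 440 Hz is 48 semitones above A0):

```python
import math

def scale_to_frequency(n):
    """Semitone number n (A0 = 27.5 Hz is 0) to frequency in Hz, per equation (8)."""
    return 27.5 * 2 ** (n / 12)

def frequency_to_scale(f):
    """Inverse mapping: frequency in Hz to semitone number relative to A0."""
    return 12 * math.log2(f / 27.5)

print(scale_to_frequency(0))           # 27.5  (A0)
print(round(frequency_to_scale(440)))  # 48    (A4 is four octaves above A0)
```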
  • The sound quality conversion rules exemplified in FIGS. 11 and 12 correspond to the fact that the human ear's sensitivity to pitch differences is exponential, which makes it easier for the user to grasp the weight; they are therefore practically preferable.
  • Alternatively, the audio content processing apparatus 400 may use a weight sound quality conversion rule that converts the weight described in the information list directly into the sound quality parameter value of the marker sound.
  • FIG. 13 is a diagram showing an example of a weighted sound quality conversion rule.
  • The weight sound quality conversion rule can be, for example, a function 655 that converts the weight into the pitch of the marker sound so that the scale of the marker sound increases exponentially as the weight increases.
  • the monaural earphone is provided as the audio output device.
  • a stereo earphone, a binaural headphone, a monaural speaker, or a stereo speaker may be used.
  • the audio stream generation unit may acquire the type, arrangement, and performance (monaural, stereo, multi-channel, etc.) of the audio output device, and change the audio stream generation method for each type and performance.
  • The audio reproduction control unit may also construct a virtual sound field space in which each content is assigned a wide section (for example, the sections 636 to 638 shown in FIG. 7) and only the audio of the audio content of the section where the pointer is located is output.
  • Although the audio content processing apparatus always outputs intermittent marker sounds in the present embodiment, the marker sounds may instead be output only at specific timings.
  • For example, the audio content processing apparatus may vary the output timing of the marker sound depending on the type of the audio content.
  • the first is a pattern in which a marker sound is sounded for a certain period prior to the playback of each audio content.
  • For example, when the marker sound is a single piano tone, a piano tone (“pawn”) having a pitch corresponding to the position of the audio content is heard just before the audio content is output.
  • Another pattern always sounds a marker sound regardless of whether or not audio content is being reproduced.
  • a marker sound having a sound quality corresponding to the pointer position flows as a background sound. Therefore, when the function 653 shown in FIG. 11 is used as the position sound quality conversion rule, the background sound that gradually changes to a lower pitch as the pointer position moves away from the initial position continues continuously or intermittently.
  • A pattern in which the marker sound is not generated while audio content is being reproduced can also be considered.
  • In this case as well, the marker sound indicates in which range the weight of the audio content where the pointer is located falls.
  • a plurality of audio contents can be arranged in different directions. Further, when a person listens to sound, the head (face) is usually directed in the direction in which the sound comes. Therefore, in the second embodiment of the present invention, a plurality of audio contents are arranged so as to surround the front of the user in the virtual sound field space, and the pointer can be operated in the direction of the user's head.
  • FIG. 14 is a diagram showing an example of the appearance of an audio content reproduction system in which the audio content processing apparatus according to the present embodiment is used, and corresponds to FIG. 1 of the first embodiment.
  • the same parts as those in FIG. 1 are denoted by the same reference numerals, and description thereof will be omitted.
  • the audio content reproduction system 100a in the present embodiment includes an audio output device 200a instead of the audio output device 200 of FIG.
  • the audio output device 200a includes a stereo audio data transmission cable 211a and stereo headphones 221a attached to a human head.
  • a motion sensor 320a that detects the movement of the stereo headphones 221a (that is, the movement of the user's head) is attached to the stereo headphones 221a.
  • the audio content reproduction system 100a includes a portable player 300a that incorporates an audio content processing device different from the audio content processing device of the first embodiment, instead of the portable player 300 of FIG.
  • the motion sensor 320a detects acceleration and transmits acceleration information as a detection result to the audio content processing device of the portable player 300a by wireless communication or wired communication (for example, communication using the cable 211a).
  • FIG. 15 is a block diagram showing an example of the configuration of the audio content processing apparatus according to the present embodiment, and corresponds to FIG. 2 of Embodiment 1. The same parts as those in FIG. 2 are denoted by the same reference numerals, and description thereof is omitted.
  • the audio content processing apparatus 400a includes a head orientation acquisition unit 480a in addition to the configuration of FIG.
  • the audio content processing apparatus 400a includes a presentation position calculation unit 440a and an audio reproduction control unit 460a instead of the presentation position calculation unit 440 and the audio reproduction control unit 460 of FIG.
  • The head orientation acquisition unit 480a receives acceleration information from the motion sensor 320a, and acquires the orientation of the head of the user wearing the stereo headphones 221a based on the received acceleration information. More specifically, the head orientation acquisition unit 480a calculates the orientation of the head relative to the user's front (hereinafter referred to as “head orientation”) by integrating the acceleration from a state in which the user is stationary and facing the front. Then, the head orientation acquisition unit 480a outputs head orientation information indicating the head orientation to the audio reproduction control unit 460a.
  • The audio reproduction control unit 460a constructs a virtual sound field space in which the position of the listener is arranged at a position, away from the weight axis described above, corresponding to the pointer position. Specifically, the audio reproduction control unit 460a slides the weight axis on which the audio content is arranged along an arc extending laterally in front of the listener, according to the pointer position. More specifically, the audio reproduction control unit 460a slides the weight axis so that the pointer position is included in the arc.
  • this arc is referred to as a “presentation window”, and a range included in the arc on the weight axis is referred to as a “pointer range”.
  • the sound reproduction control unit 460a constructs a virtual sound field space in a state where the listener's head direction is directed to the pointer position based on the head direction information. Therefore, in this embodiment, when the user's head orientation changes, the weight axis on which the audio content is arranged, the pointer position, and the presentation range do not change, but the virtual sound field space changes.
  • the presentation position calculation unit 440a calculates the current position of the pointer when the pointer is moved on the weight axis according to the pointer operation information and the head orientation information. Specifically, the presentation position calculation unit 440a moves the pointer range in accordance with pointer operation information (that is, an operation using the “return” button 311 and the “forward” button 313). Then, the presentation position calculation unit 440a moves the pointer within the pointer range according to the head direction information (that is, the operation based on the head direction).
  • Such an audio content processing apparatus 400a can arrange the weight axis in an arc shape with the weight axis extending laterally in front of the user. Thereby, the audio content processing apparatus 400a can arrange a plurality of audio contents in different directions. Therefore, the audio content processing apparatus 400a can make it easy for a user to distinguish between a plurality of audio contents, and even when audio is output at the same time, each audio can be easily distinguished.
  • The audio content processing apparatus 400a can move the pointer range according to the button operation, and can move the pointer within the pointer range (within the portion of the weight axis located in the presentation window) according to the user's head orientation.
  • the audio content processing apparatus 400a allows the user to use both small pointer movements one by one and large pointer movements for each presentation window, thereby improving the operability of pointer movement.
  • FIG. 16 is a diagram showing an example of head orientation and position definitions.
  • As shown in FIG. 16, the audio content processing apparatus 400a defines, for example, the front direction 662 of the body 661 of the user (listener) by a declination angle α measured from the negative X-axis direction of a predetermined XY coordinate system.
  • This XY coordinate system is a coordinate system arranged horizontally with the head 663 of the user (listener) as the origin.
  • The audio content processing apparatus 400a defines the head orientation of the head 663 of the user (listener) by the angle β of the front direction 664 of the head 663 with respect to the front direction 662 of the body 661.
  • The audio content processing apparatus 400a defines a position 665 on the pointer range by a declination angle γ with respect to the negative X-axis direction of the same XY coordinate system and a distance r from the origin of the XY coordinate system (that is, in polar coordinates).
  • The audio content processing apparatus 400a maps the coordinate value x of the weight axis to an angle corresponding to the declination angle γ of the pointer range.
  • The audio content processing apparatus 400a defines each angle with the clockwise direction, as viewed from above the user, as positive. Also, the audio content processing apparatus 400a sets the XY coordinate system and sets the angle β to 0 degrees in the state where the front direction 664 of the head 663 coincides with the front direction 662 of the body 661. Also, the audio content processing apparatus 400a arranges the presentation window of the weight axis on a circle 666 whose distance from the origin of the XY coordinate system is r.
  • the position arrangement unit 420 uses a weight position conversion rule in which each audio content is arranged at intervals of 30 degrees.
  • This weight position conversion rule is expressed by, for example, the following equation (9), where x_i is the coordinate value of the content with ID “i” and Ord(w_i) is its weight order (see FIG. 4).
  • x_i = Ord(w_i) × 30  (9)
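  • Equation (9) can be sketched as below; the conversion to a Cartesian position on the circle 666 depends on the angle conventions defined above (negative X-axis reference, clockwise positive), so only the polar pair (declination angle, radius) is returned.

```python
def content_polar_position(weight_order, r=1.0, step_deg=30):
    """Place a content at 30-degree intervals by weight order, per equation (9)."""
    gamma = weight_order * step_deg  # declination angle of the content on the pointer range
    return gamma, r                  # polar coordinates relative to the listener's head
```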
  • FIG. 17 is a diagram illustrating an example of the operation of the audio content processing apparatus, and corresponds to FIG. 3 of the first embodiment.
  • the same parts as those in FIG. 3 are denoted by the same step numbers, and description thereof will be omitted.
  • In step S1400a, the audio reproduction control unit 460a constructs a virtual sound field space in which the arc-shaped presentation window is arranged in front of the user.
  • FIG. 18 is a diagram illustrating an example of a presentation window.
  • As shown in FIG. 18, the audio reproduction control unit 460a uses as the presentation window an arc 668 at a distance r from the origin of the XY coordinate system, within a range 667 of ±90 degrees around the front direction 662 of the body 661, independent of the front direction 664 of the head 663.
  • the range including the initial position of the pointer is the pointer range.
  • The declination angle γ of the pointer range is expressed by, for example, the following formula (10), using the declination angle α of the front direction of the body:
  • α − 90 ≤ γ ≤ α + 90  (10)
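  • Formula (10) amounts to a simple membership test; the sketch below works in degrees and, as a simplifying assumption, ignores angle wrap-around.

```python
def in_pointer_range(gamma, alpha):
    """True if declination angle gamma lies within +/-90 degrees of the body's front direction alpha."""
    return alpha - 90 <= gamma <= alpha + 90
```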
  • In step S1410a of FIG. 17, the head orientation is acquired based on the acceleration information, and head orientation information is generated.
  • In step S1500a, the presentation position calculation unit 440a and the marker sound generation unit 450 determine whether the pointer has moved, based on whether there has been at least one of a change in the pointer range and a change in the head orientation.
  • Here, a change in the pointer range corresponds to an input of pointer operation information, and a change in the head orientation corresponds to a change in the head orientation information.
  • In step S2000a, the audio reproduction control unit 460a reconstructs the virtual sound field space according to the current head orientation and pointer range.
  • By the above operation, the audio content processing apparatus 400a can arrange the weight axis so that it spreads laterally in front of the user, and can move the pointer range and the pointer position according to the user's button operations and head orientation.
  • FIG. 19 is a diagram showing an example of the movement of the pointer position and the change of the marker sound accompanying the change in the head direction.
  • In FIG. 19, assume that the front direction 664 of the head 663 is set to the position of the third audio content 633 among the first to fifth audio contents 631 to 635 arranged in weight order. In this case, the pointer is positioned in the front direction 664 of the head 663, that is, at the third audio content 633. Therefore, the sound quality of the marker sound 636 becomes the sound quality corresponding to the weight of the third audio content 633.
  • the audio reproduction control unit 460a sets the pointer range in a range including the first to fifth audio contents 631 to 635 centering on the third audio content 633 on the weight axis.
  • In this state, the audio content processing apparatus 400a lets the user hear the audio of the first to fifth audio contents 631 to 635 simultaneously from different directions, making them easy to distinguish. That is, even if the sounds of the first to fifth audio contents 631 to 635 are reproduced at the same time, the user hears them three-dimensionally and can easily recognize each of them.
  • the audio content processing apparatus 400a may output the marker sound 636 from all the audio contents 631 to 635 in the presentation window, as shown in FIG. In this case, five marker sounds can be heard, but it is difficult to distinguish the individual marker sounds 636. Therefore, as described with reference to FIG. 19, the marker sound 636 is preferably output only from the position of the audio content in the direction in which the head 663 is facing.
  • FIG. 21 is a diagram showing an example of the movement of the pointer range accompanying the button operation.
  • Suppose that, at a certain time (for example, in the initial state), the audio reproduction control unit 460a sets the pointer range so that the first to fifth audio contents 631 to 635 are arranged in this order clockwise in the presentation window.
  • the audio reproduction control unit 460a moves the pointer range in a direction in which the pointer position moves away from the origin. That is, as shown in FIG. 21B, the audio reproduction control unit 460a slides the weight axis 630 counterclockwise in the presentation range.
  • the audio reproduction control unit 460a moves the pointer range in a direction in which the pointer position approaches the origin. That is, the audio reproduction control unit 460a slides the weight axis 630 clockwise in the presentation range.
  • By sliding the weight axis in this way, the audio content processing apparatus 400a can give the user the sensation that the plurality of audio contents are seated on rotating chairs arranged around the user and are turning together.
  • the user can perform an operation corresponding to scrolling in the GUI by a button operation.
  • the audio content processing apparatus 400a arranges the weight axis on which a plurality of audio contents are arranged in a state of spreading laterally in front of the user. Thereby, the audio content processing apparatus 400a can present a plurality of audio contents to the user in an easily distinguishable manner.
  • In addition, the audio content processing apparatus 400a makes it easy for the user to grasp which audio content the pointer is located on, and to grasp intuitively which audio content the sound quality of the marker sound corresponds to. Therefore, the audio content processing apparatus 400a can present the weight of each audio content to the user in an easy-to-understand manner even when audio from a plurality of audio contents is played back simultaneously.
  • the audio reproduction control unit arranges the weight axis on an arc extending in the horizontal direction forward with the user as the center, but is not limited thereto.
  • the audio reproduction control unit may arrange the weight axis on a straight line extending horizontally in the front direction of the user or on a straight line or a curve extending in the vertical direction or the three-dimensional direction.
  • the audio content processing apparatus does not necessarily include the head direction acquisition unit.
  • The head orientation acquisition unit may acquire, as head orientation information, not the head orientation relative to the torso but the head orientation relative to another reference direction (for example, an azimuth), or the orientation of the body relative to such a reference direction.
  • the position arrangement unit uses the number of matching points as a weight that defines the arrangement order on the weight axis, but is not limited thereto.
  • The position arrangement unit can use, as a weight, any attribute of the audio content that has an ordering. For example, the weight can be the lexicographic order of the title of the audio content, the number of times the audio content has been played, the creation date and time of the audio content, or the playback length of the audio content, or their respective orders.
  • the operation methods of pointer movement, pointer range movement, and audio content determination are not limited to the above examples. That is, the pointer position acquisition unit may acquire operation information from various other input devices such as a cross key, a keyboard, and a mouse.
  • the marker sound is not limited to the above example.
  • the sound quality of the marker sound that changes according to the weight is not limited to the pitch.
  • It is desirable that the marker sound allow the change and the direction of the change to be grasped intuitively. Therefore, it is desirable that the sound quality of the marker sound changed in accordance with the weight include at least one of the pitch, the sound production interval, the sound production length, and the vibration (swell) cycle of the marker sound. It should be noted that even a person with a good ear is said to be able to distinguish only about 100 pitch steps within the audible range.
  • The marker sound generation unit may generate a marker sound that expresses the weight using, for example, a chord combining a plurality of pitches, a sound combining a pitch with another type of sound quality, or a sound combining a plurality of sound qualities such as different instruments. That is, the marker sound generation unit may increase the number of expressible values by combining sound qualities in the manner of a positional number system.
  • For example, the marker sound may represent the tens digit of the weight by the pitch of a violin tone and the ones digit by the pitch of a piano tone.
  • Such a violin-and-piano marker sound may play the violin tone and then the piano tone, or may play the piano tone and the violin tone simultaneously.
  • the pitches of different types of instruments are relatively easy to distinguish even if they are played simultaneously. Therefore, such a marker sound can express a difference in weight.
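  • A hedged sketch of this “two instruments as two digits” idea is shown below: the tens digit of a weight in the range 0 to 99 selects a violin pitch and the ones digit selects a piano pitch. The pitch tables are invented for illustration, and actual timbre synthesis is outside the scope of the sketch.

```python
# Ten distinguishable pitches per instrument (illustrative values, one semitone apart).
VIOLIN_PITCHES = [220.0 * 2 ** (k / 12) for k in range(10)]
PIANO_PITCHES = [440.0 * 2 ** (k / 12) for k in range(10)]

def marker_pitches(weight):
    """Encode a 0-99 weight as a (violin pitch, piano pitch) pair, one pitch per digit."""
    tens, ones = divmod(int(weight) % 100, 10)
    return ("violin", VIOLIN_PITCHES[tens]), ("piano", PIANO_PITCHES[ones])
```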
  • The marker sound can also represent the weight by the interval between short beep sounds (about 0.2 to 0.5 seconds). For example, the beep interval can be shortened when the weight is large and lengthened when the weight is small.
  • the marker sound can represent the weight by the length of the beep sound.
  • the marker sound can represent a weight by the level of the “swell” frequency.
  • The audio content processing apparatus may generate a marker sound for each of a plurality of types of weights, or may generate a marker sound that expresses a plurality of types of weights with different types of sound quality (for example, pitch and vibration cycle). Further, the audio content processing apparatus may determine the sound quality based on an order obtained by combining a plurality of types of weights (a combined weight).
  • Each time the pointer moves, the marker sound generation unit may sound, immediately after the marker sound indicating the position after the movement, the marker sound that would result if the pointer kept moving in the same direction (or an intermediate marker sound).
  • the audio content processing apparatus can present not only the weight itself but also the direction of change in the weight each time the pointer moves.
  • the position of the marker sound is not necessarily the same as the pointer position.
  • the audio reproduction control unit may arrange the marker sound slightly above or behind the pointer position.
  • the audio stream generation unit then generates audio data in which the marker sound is placed slightly above or behind the pointer position. This makes it easier for the user to hear the audio of the audio content itself.
  • when the marker sound is arranged beyond the pointer position as seen from the user, it is heard as if it were a background sound behind the audio content.
  • as described above, the audio content processing apparatus according to the embodiments of the present invention includes a presentation position calculation unit that moves a pointer between a plurality of audio contents, each of which is weighted.
  • the audio content processing apparatus according to the embodiments of the present invention includes a marker sound generation unit that presents the weight of the audio content where the pointer is located, using the sound quality of a marker sound associated with that weight.
  • the audio content processing apparatus includes an audio reproduction control unit that performs predetermined processing on the audio content where the pointer is located.
  • the present invention is useful as an audio content processing apparatus and an audio content processing method capable of presenting the weight of each audio content, without using a display, in a way that can be intuitively grasped. In particular, the present invention is well suited to mobile terminals that rely mainly on an audio modality and to usage environments in which reliance on vision should be kept to a minimum.
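The notes above describe several ways a weight can be encoded in the sound quality of the marker sound. The following is a minimal, hypothetical sketch of two of them, the two-instrument digit encoding and the beep-interval encoding; the function names, base pitch, pitch step, and interval bounds are illustrative assumptions and are not taken from this publication.

```python
# Hypothetical sketch: encoding a weight in marker-sound parameters.

def two_instrument_pitches(rank):
    """Encode a two-digit rank (0-99) as two pitches: the tens digit carried by
    a 'violin' tone and the ones digit by a 'piano' tone (assumed mapping)."""
    base_hz = 440.0          # assumed reference pitch
    step = 2 ** (2 / 12.0)   # two semitones per digit step (assumption)
    tens, ones = divmod(max(0, min(rank, 99)), 10)
    return {
        "violin_hz": base_hz * step ** tens,  # tens digit
        "piano_hz": base_hz * step ** ones,   # ones digit
    }

def beep_interval(weight, w_min, w_max, short=0.2, long=0.5):
    """Map a weight to a beep interval in seconds: the larger the weight, the
    shorter the interval within the 0.2-0.5 s range mentioned above."""
    if w_max == w_min:
        return short
    t = (weight - w_min) / (w_max - w_min)   # 0..1, larger weight -> larger t
    return long - t * (long - short)

if __name__ == "__main__":
    print(two_instrument_pitches(57))            # two pitches for rank 57
    print(round(beep_interval(750, 0, 999), 3))  # interval for a large weight
```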
  • Reference signs: 100, 100a audio content playback system; 200, 200a audio output device; 210, 211a cable; 220 monaural earphone; 221a stereo headphone; 300, 300a portable player; 301 housing; 310 user operation input device; 311 "return" button; 312 "decision" button; 313 "forward" button; 320a motion sensor; 400, 400a audio content processing device; 410 information storage unit; 420 position arrangement unit; 430 pointer position acquisition unit; 440, 440a presentation position calculation unit; 450 marker sound generation unit; 460, 460a audio reproduction control unit; 470 audio stream generation unit; 480a head orientation acquisition unit; 510 file system; 520 audio content supply server

Abstract

An audio content processing device capable of presenting the weight of each item of audio content such that the weight can be intuitively grasped, without using a display. The audio content processing device (400) has: a presentation position calculation unit (440) that moves a pointer between a plurality of audio content items, each of which is weighted; a marker sound generation unit (450) that presents the weight of the audio content item where the pointer is positioned, using a sound quality of a marker sound associated with that weight; and an audio replay control unit (460) that performs prescribed processing on the audio content item where the pointer is positioned.

Description

音声コンテンツ処理装置および音声コンテンツ処理方法 Audio content processing apparatus and audio content processing method
 本発明は、それぞれ重みが付けられた複数の音声コンテンツに対して、再生などの処理を行う音声コンテンツ処理装置および音声コンテンツ処理方法に関する。 The present invention relates to an audio content processing apparatus and an audio content processing method for performing processing such as reproduction on a plurality of audio contents each weighted.
 音楽プレイヤ、ラジオ受信機、携帯電話機および携帯端末など、音声コンテンツに対して再生などの各種処理を行う機器(以下「音楽コンテンツ処理装置」という)は、広く普及している。近年では、無線通信機能を有し、インターネット等で音楽コンテンツを検索してダウンロードする機能を有する音楽コンテンツ処理装置も登場している。なお、以下の説明において、「音声」とは、音声コンテンツの例からも分かるように、人の声に限定されない、音一般をいう。すなわち、「音声」は、音楽や虫や動物の鳴き声、機械からの騒音などの人工の音、あるいは滝や雷などの自然の音など、広く音を指す概念とする。 Devices that perform various processing such as playback on audio content, such as music players, radio receivers, mobile phones, and mobile terminals (hereinafter referred to as “music content processing devices”) are widely used. In recent years, music content processing apparatuses having a wireless communication function and a function of searching for and downloading music content on the Internet or the like have also appeared. In the following description, “sound” refers to general sound that is not limited to a human voice, as can be seen from an example of audio content. That is, “speech” is a concept that widely refers to sounds such as music, insects and animals, artificial sounds such as noise from machines, and natural sounds such as waterfalls and lightning.
 音楽コンテンツ処理装置で処理の対象となる複数の音声コンテンツには、何かしらの重みが付けられている。この重みは、音声コンテンツの属性のうちの一つとして、その順序を示す値(リニアな値)であり、絶対的な値の場合もあれば、相対的な値の場合もある。例えば、検索マッチ点数は、重みの一種といえる。ユーザは、各音声コンテンツの重みを確認することができる。したがって、ユーザは、例えば、検索マッチ点数が高い順にチェックしていくというように、重みを考慮した音声コンテンツに対する操作を行うことができる。 Some weight is given to a plurality of audio contents to be processed by the music content processing apparatus. This weight is a value (linear value) indicating the order as one of the attributes of the audio content, and may be an absolute value or a relative value. For example, the search match score is a kind of weight. The user can check the weight of each audio content. Therefore, the user can perform an operation on the audio content in consideration of the weight, for example, checking in order from the highest search match score.
 そこで、各音声コンテンツの重みを表示する技術が、例えば、特許文献1に記載されている。特許文献1記載の技術は、インターネットラジオにおけるチャンネル番号(重み)と放送プログラムとの組のリストを、ディスプレイに表示する。これにより、ユーザは、各放送プログラムのチャンネル番号を確認しながら、放送プログラムの切り替え操作を行うことができる。すなわち、特許文献1記載の技術は、各音声コンテンツの重みを、ユーザに対して表示することができる。 Therefore, for example, Patent Document 1 describes a technique for displaying the weight of each audio content. The technique described in Patent Document 1 displays a list of combinations of channel numbers (weights) and broadcast programs in Internet radio on a display. Thereby, the user can perform a broadcast program switching operation while confirming the channel number of each broadcast program. That is, the technique described in Patent Document 1 can display the weight of each audio content to the user.
特開2008-72672号公報JP 2008-72672 A
 しかしながら、特許文献1記載の技術には、各音声コンテンツの重みを、ユーザに対して直感的に把握できるように提示することができないという課題がある。なぜなら、ユーザは、音声コンテンツの重みを確認するために、逐次、ディスプレイに目を向けて表示内容を確認しなければならないからである。複数の音声コンテンツを高速で切り替えながらチェックしていく操作を考えると、各音声コンテンツの重みを直感的に把握できるように提示可能な音声コンテンツ処理装置が望まれる。 However, the technique described in Patent Document 1 has a problem that the weight of each audio content cannot be presented so as to be intuitively understood by the user. This is because, in order to confirm the weight of the audio content, the user must check the display content by sequentially looking at the display. Considering an operation of checking a plurality of audio contents while switching at high speed, an audio content processing apparatus that can be presented so as to intuitively understand the weight of each audio content is desired.
 本発明の目的は、ディスプレイを使わずに各音声コンテンツの重みを直感的に把握できるように提示することである。 An object of the present invention is to present so that the weight of each audio content can be intuitively grasped without using a display.
 本発明の音声コンテンツ処理装置は、それぞれ重みが付けられた複数の音声コンテンツの間でポインタを移動させる提示位置計算部と、前記ポインタが位置する前記音声コンテンツの前記重みを、前記重みに対応付けられたマーカ音の音質で提示するマーカ音生成部と、前記ポインタが位置する前記音声コンテンツに対して所定の処理を行う音声再生制御部と、を有する。 The audio content processing apparatus of the present invention associates a presentation position calculation unit that moves a pointer between a plurality of audio contents each weighted, and the weight of the audio content on which the pointer is located, with the weight A marker sound generation unit that presents the sound quality of the marker sound that is displayed, and an audio reproduction control unit that performs predetermined processing on the audio content on which the pointer is located.
 本発明の音声コンテンツ処理方法は、それぞれ重みが付けられた複数の音声コンテンツの間でポインタを移動させるステップと、前記ポインタが位置する前記音声コンテンツの前記重みを、前記重みに対応付けられたマーカ音の音質で提示するステップと、前記ポインタが位置する前記音声コンテンツに対して所定の処理を行うステップとを有する。 The audio content processing method of the present invention includes a step of moving a pointer between a plurality of audio contents each having a weight, and a marker associated with the weight of the audio content on which the pointer is positioned. A step of presenting with sound quality and a step of performing predetermined processing on the audio content on which the pointer is located.
 本発明によれば、各音声コンテンツの重みを分かり易く提示することができる。 According to the present invention, the weight of each audio content can be presented in an easily understandable manner.
本発明の実施の形態1に係る音声コンテンツ処理装置が用いられる音声コンテンツ再生システムの外観の一例を示す図The figure which shows an example of the external appearance of the audio | voice content reproduction | regeneration system in which the audio | voice content processing apparatus which concerns on Embodiment 1 of this invention is used. 本実施の形態1に係る音声コンテンツ処理装置の構成の一例を示すブロック図FIG. 2 is a block diagram showing an example of the configuration of an audio content processing apparatus according to the first embodiment. 本実施の形態1に係る音声コンテンツ処理装置の動作の一例を示す図The figure which shows an example of operation | movement of the audio | voice content processing apparatus which concerns on this Embodiment 1. 本実施の形態1における情報リストの一例を示す図The figure which shows an example of the information list in this Embodiment 1. 本実施の形態1におけるコンテンツ配置情報が追加された情報リストの一例を示す図The figure which shows an example of the information list to which the content arrangement | positioning information in this Embodiment 1 was added. 本実施の形態1における重み軸の音声コンテンツ配置の一例を示す図The figure which shows an example of the audio | voice content arrangement | positioning of the weight axis in this Embodiment 1. 本実施の形態1におけるポインタの移動の様子の一例を示す図The figure which shows an example of the mode of the movement of the pointer in this Embodiment 1. 本実施の形態1における位置音質変換ルールの第1の例を示す図The figure which shows the 1st example of the position sound quality conversion rule in this Embodiment 1. 本実施の形態1に係る音声コンテンツ再生システムの動作の一例を示すシーケンス図Sequence diagram showing an example of the operation of the audio content reproduction system according to the first embodiment 本実施の形態1における位置音質変換ルールの第2の例を示す図The figure which shows the 2nd example of the position sound quality conversion rule in this Embodiment 1. 本実施の形態1における位置音質変換ルールの第3の例を示す図The figure which shows the 3rd example of the position sound quality conversion rule in this Embodiment 1. 本実施の形態1における位置音質変換ルールの第4の例を示す図The figure which shows the 4th example of the position sound quality conversion rule in this Embodiment 1. 本実施の形態1における重み音質変換ルールの一例を示す図The figure which shows an example of the weight sound quality conversion rule in this Embodiment 1. 本発明の実施の形態2に係る音声コンテンツ処理装置が用いられる音声コンテンツ再生システムの外観の一例を示す図The figure which shows an example of the external appearance of the audio | voice content reproduction | regeneration system in which the audio | voice content processing apparatus which concerns on Embodiment 2 of this invention is used. 本実施の形態2に係る音声コンテンツ処理装置の構成の一例を示すブロック図Block diagram showing an example of the configuration of an audio content processing apparatus according to the second embodiment 本実施の形態2における頭部向きおよび位置の定義の一例を示す図The figure which shows an example of the definition of head direction and position in this Embodiment 2. 本実施の形態2に係る音声コンテンツ処理装置の動作の一例を示す図The figure which shows an example of operation | movement of the audio | voice content processing apparatus which concerns on this Embodiment 2. 本実施の形態2における表示窓の一例を示す図The figure which shows an example of the display window in this Embodiment 2. 本実施の形態2におけるポインタ位置の移動およびマーカ音の変化の様子の一例を示す図The figure which shows an example of the mode of the movement of the pointer position in this Embodiment 2, and the change state of a marker sound. 本実施の形態2におけるポインタ位置の一例を示す図The figure which shows an example of the pointer position in this Embodiment 2. 本実施の形態2におけるポインタ範囲の移動の様子の一例を示す図The figure which shows an example of the mode of the movement of the pointer range in this Embodiment 2.
 以下、本発明の各実施の形態について、図面を参照して詳細に説明する。 Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
 (実施の形態1)
 図1は、本発明の実施の形態1に係る音声コンテンツ処理装置が用いられる音声コンテンツ再生システムの外観の一例を示す図である。
(Embodiment 1)
FIG. 1 is a diagram showing an example of the appearance of an audio content reproduction system in which the audio content processing apparatus according to Embodiment 1 of the present invention is used.
 図1において、音声コンテンツ再生システム100は、音声出力装置200と、本発明の実施の形態1に係る音声コンテンツ処理装置を内部に備えたポータブルプレイヤ300を一例として説明する。 Referring to FIG. 1, the audio content reproduction system 100 will be described with an audio output device 200 and a portable player 300 provided with the audio content processing device according to Embodiment 1 of the present invention as an example.
 音声出力装置200は、音声データを音声に変換して出力する装置であり、本実施の形態では、モノラル音声データ伝送用のケーブル210と、人の片耳に装着されるモノラルイヤホン220とから成るものとする。 The audio output device 200 is a device that converts audio data into audio and outputs the audio. In the present embodiment, the audio output device 200 includes a monaural audio data transmission cable 210 and a monaural earphone 220 worn on one ear of a person. And
 ポータブルプレイヤ300は、モノラル音声プレイヤである。ポータブルプレイヤ300は、それぞれ重みが付けられた複数の音声コンテンツを仮想的な座標軸である重み軸上に配置する。そして、ポータブルプレイヤ300は、この重み軸上でのポインタの移動により、音声コンテンツを切り替えて再生する。例えば、重みは、ラジオにおけるチャンネル番号のイメージであり、重み軸は、チャンネル番号を切り替えるためのつまみの回転角のイメージであり、ポインタの移動は、つまみの回転のイメージである。 The portable player 300 is a monaural audio player. The portable player 300 places a plurality of weighted audio contents on a weight axis that is a virtual coordinate axis. Then, the portable player 300 switches and reproduces the audio content by moving the pointer on the weight axis. For example, the weight is an image of a channel number in the radio, the weight axis is an image of a rotation angle of a knob for switching the channel number, and the movement of the pointer is an image of rotation of the knob.
 ポータブルプレイヤ300は、その外形を小型の筐体301とし、筐体301の表面に、ユーザ操作入力装置310を有する。ユーザ操作入力装置310は、ポインタ位置の移動操作および音声コンテンツに対する決定操作などの各種操作をユーザから受け付ける。本実施の形態では、ユーザ操作入力装置310は、「戻る」ボタン311、「決定」ボタン312、および「進む」ボタン313を有し、これらの押下およびリリースを、操作のイベントとして検出するものとする。なお、ユーザ操作入力装置310の他の形態として、例えば、ジョイスティックやタッチパネル、あるいは、ポータブルプレイヤ300とは別個のリモコン装置などが挙げられる。ポータブルプレイヤ300は、筐体301の表面に、更にディスプレイを有していてもよい。 The portable player 300 has a small casing 301 as its outer shape, and has a user operation input device 310 on the surface of the casing 301. The user operation input device 310 receives various operations such as a pointer position moving operation and a determination operation for audio content from the user. In the present embodiment, the user operation input device 310 has a “return” button 311, a “decision” button 312, and a “forward” button 313, and detects these presses and releases as operation events. To do. Note that other forms of the user operation input device 310 include, for example, a joystick, a touch panel, or a remote control device separate from the portable player 300. The portable player 300 may further include a display on the surface of the housing 301.
 ポータブルプレイヤ300に備えられた本実施の形態に係る音声コンテンツ処理装置(図示せず)は、ユーザ操作入力装置310でのポインタ移動操作を受けて、上述の重み軸上でポインタを移動させる。すなわち、音声コンテンツ処理装置は、複数の音声コンテンツの間で、ポインタを移動させる。また、音声コンテンツ処理装置は、ポインタ位置が音声コンテンツの位置に重なるごとに、その音声コンテンツの音声を仮再生する。そして、音声コンテンツ処理装置は、ユーザ操作入力装置310での音声コンテンツ決定操作を受けて、ポインタがその時に位置する音声コンテンツの音声を本再生する。なお、音声の再生は、音声データをケーブル210によりモノラルイヤホン220に送信することにより行われる。なお、ここでは、ケーブル210により音声データを送信したが、無線で送信することも可能である。 The audio content processing apparatus (not shown) according to the present embodiment provided in the portable player 300 moves the pointer on the above-described weight axis in response to the pointer movement operation by the user operation input apparatus 310. That is, the audio content processing apparatus moves the pointer between a plurality of audio contents. Also, every time the pointer position overlaps the position of the audio content, the audio content processing apparatus temporarily reproduces the audio of the audio content. Then, the audio content processing apparatus receives the audio content determination operation from the user operation input device 310, and reproduces the audio of the audio content where the pointer is located at that time. Note that audio reproduction is performed by transmitting audio data to the monaural earphone 220 via the cable 210. In this example, the audio data is transmitted through the cable 210, but it can also be transmitted wirelessly.
 以下の説明において、ポインタが位置している音声コンテンツは、「選択されている音声コンテンツ」という。また、決定操作の対象となった音声コンテンツは、「決定された音声コンテンツ」という。 In the following description, the audio content where the pointer is located is referred to as “selected audio content”. The audio content that is the target of the determination operation is referred to as “determined audio content”.
 また、本実施の形態に係る音声コンテンツ処理装置は、複数の音声コンテンツを上述の重み軸上に配置する。そして、音声コンテンツ処理装置は、ポインタにより決定操作が行われるまでの間、選択されている音声コンテンツの重みを、その重みに対応付けられたマーカ音の音質で提示する。すなわち、音声コンテンツ処理装置は、ポインタ位置に対応する重みを音質で示すマーカ音を生成し、その音声データを、ケーブル210を介してモノラルイヤホン220に送信する。なお、この際、複数の音声コンテンツの重み軸上の配置は、例えば、音声コンテンツの重み順に一致する。 Also, the audio content processing apparatus according to the present embodiment arranges a plurality of audio contents on the above-described weight axis. Then, the audio content processing device presents the weight of the selected audio content with the sound quality of the marker sound associated with the weight until the determination operation is performed with the pointer. That is, the audio content processing apparatus generates a marker sound that indicates the weight corresponding to the pointer position by sound quality, and transmits the audio data to the monaural earphone 220 via the cable 210. At this time, the arrangement of the plurality of audio contents on the weight axis matches, for example, the order of the weight of the audio contents.
 本実施の形態では、マーカ音は、「ポン、ポン、・・・」という間欠的な単純音であり、ポインタの位置を示す音声ポインタであるものとする。また、マーカ音は、音程の変化により、ポインタが位置する音声コンテンツの重みを提示するものとする。 In the present embodiment, the marker sound is an intermittent simple sound “Pong, Pong,...”, And is a voice pointer indicating the position of the pointer. In addition, the marker sound presents the weight of the audio content where the pointer is located due to a change in pitch.
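As a rough illustration of such an intermittent marker sound, the sketch below builds a short sine burst that repeats at a fixed interval, with the pitch of the burst carrying the weight. The sample rate, burst length, and repetition interval are assumptions chosen for illustration and are not values taken from this description.

```python
# Sketch (standard library only): an intermittent "pon, pon, ..." marker tone.
import math

def marker_burst(freq_hz, burst_s=0.1, rate=16000):
    """One burst of the marker tone as a list of float samples in [-1, 1]."""
    n = int(burst_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

def intermittent_marker(freq_hz, interval_s=0.3, total_s=1.0, rate=16000):
    """Bursts separated by silence; the pitch (freq_hz) carries the weight."""
    out = []
    burst = marker_burst(freq_hz, rate=rate)
    silence = [0.0] * int(interval_s * rate)
    while len(out) < int(total_s * rate):
        out.extend(burst)
        out.extend(silence)
    return out[: int(total_s * rate)]

if __name__ == "__main__":
    samples = intermittent_marker(3500.0)  # a pitch used as an example in FIG. 8
    print(len(samples))                    # 16000 samples for one second of audio
```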
 図2は、本実施の形態に係る音声コンテンツ処理装置の構成の一例を示すブロック図である。ここでは、説明の便宜のため、周辺の他の装置についても併せて図示する。 FIG. 2 is a block diagram showing an example of the configuration of the audio content processing apparatus according to the present embodiment. Here, for convenience of explanation, other peripheral devices are also illustrated.
 図2において、音声コンテンツ処理装置400は、情報格納部410、位置配置部420、ポインタ位置取得部430、提示位置計算部440、マーカ音生成部450、音声再生制御部460、および音声ストリーム生成部470を有する。 In FIG. 2, the audio content processing device 400 includes an information storage unit 410, a position arrangement unit 420, a pointer position acquisition unit 430, a presentation position calculation unit 440, a marker sound generation unit 450, an audio reproduction control unit 460, and an audio stream generation unit. 470.
 情報格納部410は、情報の記録、修正、削除、検索、および読み出しなど一般的なデータベースの機能を備え、音声コンテンツの音声データおよびこれに付与された属性(ID、格納位置、重みなど)を格納する。情報格納部410は、例えば、インターネットなどの通信ネットワークを介して、外部のファイルシステム510および音声コンテンツ供給サーバ520から、情報リストを取得し、これを格納する。ここで、情報リストとは、重みが付けられた音声コンテンツの識別情報を記述したリストである。 The information storage unit 410 has general database functions such as information recording, correction, deletion, search, and reading, and stores audio data of audio contents and attributes (ID, storage position, weight, etc.) assigned thereto. Store. For example, the information storage unit 410 acquires an information list from the external file system 510 and the audio content supply server 520 via a communication network such as the Internet, and stores the information list. Here, the information list is a list describing identification information of weighted audio contents.
 本実施の形態では、ファイルシステム510は、音声コンテンツの本体である音声データを格納する記憶装置である。また、音声コンテンツ供給サーバ520は、音声コンテンツの検索の機能を有する検索サーバであるものとする。具体的には、音声コンテンツ供給サーバ520は、音声コンテンツ処理装置400からの検索クエリに対して、ファイルシステム510の音声コンテンツを検索する。そして、音声コンテンツ供給サーバ520は、検索のマッチ点数を重みとして付与した999件分の音声コンテンツの情報を、音声コンテンツ処理装置400へ返信する。情報格納部410は、この返信された情報に基づいて、上述の情報リストを生成するものとする。なお、音声コンテンツ処理装置400のうち、このような検索クエリの発行・返送された情報の受信を行う機能部の図示および説明については省略する。 In the present embodiment, the file system 510 is a storage device that stores audio data that is the main body of audio content. The audio content supply server 520 is assumed to be a search server having a function of searching audio content. Specifically, the audio content supply server 520 searches the audio content of the file system 510 in response to the search query from the audio content processing device 400. Then, the audio content supply server 520 replies to the audio content processing apparatus 400 with 999 pieces of audio content information assigned with the search match score as a weight. The information storage unit 410 generates the above information list based on the returned information. It should be noted that illustration and description of the functional unit that issues the search query issuance / returned information in the audio content processing apparatus 400 are omitted.
 位置配置部420は、情報リストの複数の音声コンテンツの識別情報を、上述の重み軸のうち、それぞれの重みに応じた位置の座標に対応付ける。以下、この対応付けを、「音声コンテンツの重み軸への配置」という。そして、位置配置部420は、各音声コンテンツの配置(音声コンテンツの識別情報と重み軸座標との対応付け)を示すコンテンツ配置情報を、情報リストに追加する。 The position arrangement unit 420 associates the identification information of the plurality of audio contents in the information list with the coordinates of the position corresponding to each weight among the above-described weight axes. Hereinafter, this association is referred to as “arrangement of audio content on the weight axis”. Then, the position arrangement unit 420 adds content arrangement information indicating the arrangement of each audio content (correspondence between the audio content identification information and the weight axis coordinates) to the information list.
 ポインタ位置取得部430は、ユーザ操作入力装置310で行われた、ポインタ位置の移動操作およびポインタの現在位置(に対応する音声コンテンツ)に対する決定操作などの操作を取得する。そして、ポインタ位置取得部430は、ポインタ位置の移動操作を取得したとき、その移動の方向と度合いとを示すポインタ操作情報を、提示位置計算部440へ出力する。また、ポインタ位置取得部430は、ポインタの現在位置に対する決定操作を取得したとき、その旨を示す決定操作情報を、音声再生制御部460へ出力する。 The pointer position acquisition unit 430 acquires operations such as a pointer position moving operation and a determination operation on the current position of the pointer (corresponding audio content) performed by the user operation input device 310. When the pointer position acquisition unit 430 acquires a pointer position movement operation, the pointer position acquisition unit 430 outputs pointer operation information indicating the direction and degree of the movement to the presentation position calculation unit 440. When the pointer position acquisition unit 430 acquires a determination operation for the current position of the pointer, the pointer position acquisition unit 430 outputs determination operation information indicating that to the audio reproduction control unit 460.
 提示位置計算部440は、ポインタ操作情報に応じてポインタを重み軸上で移動させたときの、ポインタの現在位置を計算する。すなわち、提示位置計算部440は、それぞれ重みが付けられた複数の音声コンテンツの間で、ユーザが注目する音声コンテンツへとポインタを移動させる。なお、提示位置計算部440は、ポインタ操作情報示す度合いに応じて、音声コンテンツの位置のみに限定してポインタを移動させてもよいし、音声コンテンツの有無によらずにポインタを移動させてもよい。本実施の形態における提示位置計算部440は、音声コンテンツの有無によらずにポインタを移動させるものとする。そして、提示位置計算部440は、ポインタの現在位置を示すポインタ位置情報を、マーカ音生成部450、および音声再生制御部460へ出力する。 The presentation position calculation unit 440 calculates the current position of the pointer when the pointer is moved on the weight axis according to the pointer operation information. In other words, the presentation position calculation unit 440 moves the pointer to the audio content that the user pays attention to among the plurality of audio contents each weighted. The presentation position calculation unit 440 may move the pointer only to the position of the audio content according to the degree of the pointer operation information, or may move the pointer regardless of the presence or absence of the audio content. Good. The presentation position calculation unit 440 in the present embodiment moves the pointer regardless of the presence or absence of audio content. Then, the presentation position calculation unit 440 outputs pointer position information indicating the current position of the pointer to the marker sound generation unit 450 and the audio reproduction control unit 460.
 マーカ音生成部450は、ポインタ位置情報が示すポインタの現在位置から、マーカ音の音質を決定する。そして、マーカ音生成部450は、決定した音質のマーカ音の音声データを生成し、音声ストリーム生成部470へ出力する。すなわち、マーカ音生成部450は、音声ストリーム生成部470を用いて、ポインタが位置する音声コンテンツの重みを、その重みに対応付けられたマーカ音の音質で提示する。併せて、マーカ音生成部450は、音声ストリーム生成部470を用いて、ある音声コンテンツと次の音声コンテンツとの間における重みの変化の様子を、マーカ音の音質で提示する。 The marker sound generation unit 450 determines the sound quality of the marker sound from the current position of the pointer indicated by the pointer position information. Then, the marker sound generation unit 450 generates audio data of the marker sound having the determined sound quality and outputs it to the audio stream generation unit 470. That is, the marker sound generation unit 450 uses the audio stream generation unit 470 to present the weight of the audio content at which the pointer is located with the sound quality of the marker sound associated with the weight. In addition, the marker sound generation unit 450 uses the audio stream generation unit 470 to present the change in weight between a certain audio content and the next audio content with the sound quality of the marker sound.
 また、音声再生制御部460は、最新のポインタ位置情報およびコンテンツ配置情報に基づいて、仮想音場空間の音場を計算して、仮想音場空間を構築する。この仮想音場空間は、重み軸のうち少なくともポインタの現在位置を含み、マーカ音と各音声コンテンツの音声とが、それぞれの位置から出力される仮想音場空間である。本実施の形態において、モノラル音声出力であることから、仮想音場空間は、ユーザの前方に伸びた1次元の空間である。なお、音声再生制御部460は、重み軸と仮想音場空間とを、同一の座標系で扱ってもよい。この場合には、仮想音場空間の構築の処理は必ずしも必要ではない。そして、音声再生制御部460は、構築した仮想音場空間を示す音場情報を、音声ストリーム生成部470へ出力する。 Also, the audio reproduction control unit 460 calculates the sound field of the virtual sound field space based on the latest pointer position information and content arrangement information, and constructs the virtual sound field space. This virtual sound field space is a virtual sound field space that includes at least the current position of the pointer on the weight axis and outputs the marker sound and the sound of each sound content from each position. In the present embodiment, since the sound output is monaural, the virtual sound field space is a one-dimensional space extending in front of the user. Note that the audio reproduction control unit 460 may handle the weight axis and the virtual sound field space in the same coordinate system. In this case, the construction process of the virtual sound field space is not necessarily required. Then, the audio reproduction control unit 460 outputs sound field information indicating the constructed virtual sound field space to the audio stream generation unit 470.
 また、音声再生制御部460は、決定操作情報を入力されると、選択されている音声コンテンツを、ユーザが決定した音声コンテンツとして特定し、所定の処理を行う。所定の処理は、ここでは、音声ストリーム生成部470に対し、マーカ音の出力を停止させ、決定された音声コンテンツを再生させる処理である。音声再生制御部460は、決定された音声コンテンツを再生させている間は、ポインタ位置取得部430からポインタ操作情報についても取得する。そして、音声再生制御部460は、「戻る」ボタン311が押下されたときは再生箇所を戻し、「進む」ボタン313が押下されたときは再生箇所を進め、「決定」ボタン312が押下されたときは再生を停止する。 Further, when the determination operation information is input, the audio reproduction control unit 460 specifies the selected audio content as the audio content determined by the user, and performs a predetermined process. Here, the predetermined process is a process of causing the audio stream generation unit 470 to stop outputting the marker sound and reproduce the determined audio content. The audio playback control unit 460 also acquires pointer operation information from the pointer position acquisition unit 430 while the determined audio content is being played back. When the “return” button 311 is pressed, the audio playback control unit 460 returns the playback location, when the “forward” button 313 is pressed, the playback location is advanced, and the “decision” button 312 is pressed. When playback stops.
 音声ストリーム生成部470は、音場情報に従って、音場情報が示す仮想音場空間を実現する音声ストリーム(音声データ)を生成し、音声出力装置200へ出力する。具体的には、音声再生制御部460は、音場情報に含まれる音声コンテンツの音声データを、例えば情報格納部410または外部のファイルシステム510から取得する。そして、音声ストリーム生成部470は、マーカ音および音声コンテンツの音声が仮想音場空間の通りに聞こえてくる音場が、音声出力装置200を装着したユーザに対して実現されるような音声ストリームを生成する。この際、音声ストリーム生成部470は、適当な音質および音量、指定された特殊効果などを、音声コンテンツの音声データに適用する。 The audio stream generation unit 470 generates an audio stream (audio data) that realizes the virtual sound field space indicated by the sound field information in accordance with the sound field information, and outputs the sound stream (audio data) to the audio output device 200. Specifically, the audio reproduction control unit 460 obtains audio data of audio content included in the sound field information from, for example, the information storage unit 410 or the external file system 510. Then, the audio stream generation unit 470 generates an audio stream in which the sound field in which the marker sound and the sound of the sound content are heard as the virtual sound field space is realized for the user wearing the sound output device 200. Generate. At this time, the audio stream generation unit 470 applies appropriate sound quality and volume, designated special effects, and the like to the audio data of the audio content.
 音声コンテンツ処理装置400は、図示しないが、例えば、CPU(central processing unit)、制御プログラムを格納したROM(read only memory)などの記憶媒体、RAM(random access memory)などの作業用メモリ、および通信回路を有する。この場合、上記した各部の機能は、CPUが制御プログラムを実行することにより実現される。 Although not shown, the audio content processing apparatus 400 includes a CPU (central processing unit), a storage medium such as a ROM (read only memory) storing a control program, a working memory such as a RAM (random access memory), and communication. It has a circuit. In this case, the function of each unit described above is realized by the CPU executing the control program.
 このような音声コンテンツ処理装置400は、ポインタが位置する音声コンテンツ(選択されている音声コンテンツ)の重みを、その重みに対応付けられたマーカ音の音質で提示することができる。音質の把握は、ディスプレイに表示された情報の把握に比べて、視線の移動や文字の読み取りなどの作業を伴わず、非常に短い時間で行うことができる。また、重みのような1次元的な情報の場合、音質でユーザに対して十分に把握できるように提示することが可能である。したがって、音声コンテンツ処理装置400は、ディスプレイを使わずに、選択されている音声コンテンツの重みを、ユーザが直感的に把握できるように提示することができる。 Such an audio content processing apparatus 400 can present the weight of the audio content where the pointer is located (the selected audio content) with the sound quality of the marker sound associated with the weight. The sound quality can be grasped in a very short time as compared with grasping the information displayed on the display, without any work such as movement of the line of sight or reading of characters. Further, in the case of one-dimensional information such as weights, it can be presented so that the user can sufficiently grasp the sound quality. Therefore, the audio content processing apparatus 400 can present the weight of the selected audio content so that the user can intuitively understand without using the display.
 また、音声コンテンツ処理装置400は、音声コンテンツを仮想音場空間内に配置し、ユーザが望む音声コンテンツを、ユーザが望むタイミングでレンダリングすることができ。したがって、音声コンテンツ処理装置400は、ユーザに対して、快適な音声コンテンツ再生環境を提供することができる。 In addition, the audio content processing apparatus 400 can arrange the audio content in the virtual sound field space and render the audio content desired by the user at a timing desired by the user. Therefore, the audio content processing apparatus 400 can provide a comfortable audio content reproduction environment to the user.
 ここでは、本実施の形態における「仮想音場空間」について、簡単に説明する。 Here, the “virtual sound field space” in the present embodiment will be briefly described.
 まず、「音場」とは、一般的に利用される用語であり、「実在の」音(音波)が存在する「実在の」空間のことである。「音場」は、典型的には、コンサート会場の空間等を指し、そこには音源や音を反射あるいは吸収する壁などが存在している。 First, the term “sound field” is a commonly used term and refers to a “real” space where “real” sounds (sound waves) exist. The “sound field” typically refers to the space of a concert venue, where there are sound sources and walls that reflect or absorb sound.
 これに対して、「仮想音場」とは、ユーザの耳に入る音を調整することにより、ユーザの周囲に「仮想的に」発生させた音場である。ユーザには、周囲に音場(つまり音源や壁)があると感じられるが、実際にはそのようなユーザが感じる音場はない。仮想音場を作成する技術には、サラウンド技術や立体音響技術などと呼ばれるものがある。 On the other hand, the “virtual sound field” is a sound field generated “virtually” around the user by adjusting the sound that enters the user's ear. Although the user feels that there is a sound field (that is, a sound source or a wall) in the surroundings, there is actually no sound field that the user feels. Technologies for creating a virtual sound field include so-called surround technology and three-dimensional sound technology.
 本実施の形態における「仮想音場空間」は、仮想音場を聞いたユーザが感じるであろう、音源や壁の位置が配置された空間である。音声コンテンツ処理装置400は、仮想音場空間として音源や壁などの配置を構築し、これに対してサラウンド技術や立体音響技術を適用する。この結果生成される音声は、スピーカやヘッドホンで聞いたときに、構築された仮想音場空間を実際の音場空間としてユーザに感じさせるような音声となる。 The “virtual sound field space” in the present embodiment is a space in which a sound source and a wall position are arranged, which a user who hears the virtual sound field will feel. The audio content processing apparatus 400 constructs an arrangement such as a sound source or a wall as a virtual sound field space, and applies a surround technology or a stereophonic technology to this. The sound generated as a result is a sound that makes the user feel the constructed virtual sound field space as an actual sound field space when listening with a speaker or headphones.
 なお、本実施の形態では、音声再生制御部460は、ポインタの位置を聞き手の位置として、仮想音場空間を構築するものとする。したがって、本実施の形態では、ポインタ位置が変化した場合、音声コンテンツが配置された重み軸自体は変化しないが、仮想音場空間は変化する。また、音声再生制御部460は、重み軸を直線とし、音声データの聞き手の位置を、重み軸上の、ポインタの位置に配置した仮想音場空間を構築するものとする。 In this embodiment, it is assumed that the audio reproduction control unit 460 constructs a virtual sound field space with the position of the pointer as the position of the listener. Therefore, in this embodiment, when the pointer position changes, the weight axis on which the audio content is arranged does not change, but the virtual sound field space changes. The audio reproduction control unit 460 constructs a virtual sound field space in which the weight axis is a straight line and the position of the listener of the audio data is arranged at the pointer position on the weight axis.
 また、音声再生制御部460は、スピーカであるかヘッドホンであるかなど、音声出力装置200とユーザの両耳との間の位置関係に応じて、その位置関係に合わせた仮想音場空間を構築することが望ましい。本実施の形態では、音声再生制御部460は、ユーザの耳に装着される所定のモノラルヘッドホン220が音声出力装置200として使用される前提で、仮想音場空間を構築するものとする。 In addition, the audio reproduction control unit 460 constructs a virtual sound field space according to the positional relationship according to the positional relationship between the audio output device 200 and the user's both ears, such as a speaker or a headphone. It is desirable to do. In the present embodiment, the audio reproduction control unit 460 constructs a virtual sound field space on the assumption that a predetermined monaural headphone 220 worn on the user's ear is used as the audio output device 200.
 次に、音声コンテンツ処理装置400の動作について説明する。 Next, the operation of the audio content processing apparatus 400 will be described.
 ここでは、例として、外部装置であるファイルシステム510には、演説や朗読、独り言などの音声コンテンツが多数格納されているものとする。ユーザは、図示しないパーソナルコンピュータや携帯電話機等の情報処理装置を用い、音声コンテンツ処理装置400を返信先に指定して、任意のキーワードを検索クエリとして音声コンテンツ供給サーバ520に送信したとする。この場合、音声コンテンツ供給サーバ520は、ファイルシステム510の音声コンテンツの属性(音声コンテンツのタイトル、話者の名前、収録場所の地名、および収録日時など)をキーワードで検索する。そして、音声コンテンツ供給サーバ520は、検索のマッチ点数を含む検索結果を、音声コンテンツ処理装置400へ返信する。 Here, as an example, it is assumed that the file system 510, which is an external device, stores a large number of audio contents such as speeches, readings, and monologues. It is assumed that the user uses an information processing apparatus such as a personal computer or a cellular phone (not shown), designates the audio content processing apparatus 400 as a reply destination, and transmits an arbitrary keyword as a search query to the audio content supply server 520. In this case, the audio content supply server 520 searches the attributes of the audio content in the file system 510 (such as the title of the audio content, the name of the speaker, the place name of the recording location, and the recording date / time) using keywords. Then, the audio content supply server 520 returns a search result including the search match score to the audio content processing apparatus 400.
 図3は、音声コンテンツ処理装置400の動作の一例を示す図である。 FIG. 3 is a diagram illustrating an example of the operation of the audio content processing apparatus 400.
 まず、ステップS1100において、情報格納部410は、音声コンテンツ供給サーバ520から取得した情報から、情報リストを取得し、格納する。 First, in step S1100, the information storage unit 410 acquires and stores an information list from the information acquired from the audio content supply server 520.
 図4は、情報リストの一例を示す図である。 FIG. 4 is a diagram showing an example of the information list.
 図4に示すように、情報リスト610は、音声コンテンツごとに、音声コンテンツの識別情報であるID611、格納位置612、重み613、および重み順序614を対応付けて記述する。格納位置612は、音声コンテンツの音声データの格納場所である。重み613は、本実施の形態では音声コンテンツ供給サーバ520による検索のマッチ点数である。重み順序614は、重み613の度合いの順序である。 As shown in FIG. 4, the information list 610 describes, for each audio content, an ID 611 that is identification information of the audio content, a storage location 612, a weight 613, and a weight order 614 in association with each other. The storage location 612 is a storage location of audio data of audio content. In the present embodiment, the weight 613 is the number of match points for the search by the audio content supply server 520. The weight order 614 is an order of the degree of the weight 613.
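A minimal sketch of how such an information list might be held in memory is shown below; the field names, file locations, and weight values are hypothetical (the weights are chosen so that the weight-axis example later in this description comes out to x = 0, 900, 1400) and are not the device's actual identifiers.

```python
# Hypothetical in-memory form of the information list of FIG. 4.
# "weight" plays the role of the search match score; "weight_order" is its rank.
information_list = [
    {"id": 1, "location": "file:///contents/0001.wav", "weight": 2000, "weight_order": 1},
    {"id": 2, "location": "file:///contents/0002.wav", "weight": 1100, "weight_order": 2},
    {"id": 3, "location": "file:///contents/0003.wav", "weight": 600,  "weight_order": 3},
]
```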
 そして、図3のステップS1200において、位置配置部420は、情報リストにリストアップされた音声コンテンツを、それぞれの重みの順序で重み軸に配置する。そして、位置配置部420は、各音声コンテンツのIDと重み軸の座標値とを対応付けたコンテンツ配置情報を、情報リストに追加する。 Then, in step S1200 of FIG. 3, the position arrangement unit 420 arranges the audio contents listed in the information list on the weight axis in the order of the respective weights. Then, the position arrangement unit 420 adds content arrangement information in which the ID of each audio content is associated with the coordinate value of the weight axis to the information list.
 具体的には、位置配置部420は、例えば、情報リストに記述された重みを重み軸座標へ変換するための重み位置変換ルールを予め有しており、この重み位置変換ルールに従って各音声コンテンツを重み軸に配置する。 Specifically, the position arrangement unit 420 has, for example, a weight position conversion rule for converting the weights described in the information list into weight axis coordinates, and arranges each audio content on the weight axis in accordance with this weight position conversion rule.
 Specifically, the weight position conversion rule arranges the audio contents so that, when the coordinate value on the weight axis changes in one direction, the weight of the arranged contents also changes in one direction. That is, the rule is such that exactly one of the following expressions (1) and (2) always holds. Here, w_n is the weight of the audio content with ID "n", x_n is its coordinate value, w_m is the weight of the audio content with ID "m", and x_m is its coordinate value. The weight position conversion rule is defined, for example, by a function or a correspondence table.

 If w_n < w_m, then x_n < x_m      … (1)
 If w_n < w_m, then x_n > x_m      … (2)
 例えば、重み位置変換ルールは、座標値の差の比が重みの差の比とほぼ等しくなるように、座標軸が0に近い側から重みの高さの順序で配置し、かつ、原点に最も近い音声コンテンツを原点に配置する内容である。重み位置変換ルールは、例えば関数や対応表により定義される。 For example, in the weight position conversion rule, the coordinate axes are arranged in the order of the weight height from the side close to 0 so that the ratio of the difference between the coordinate values is almost equal to the ratio of the difference in weight, and is closest to the origin. This is the content where audio content is placed at the origin. The weight position conversion rule is defined by a function or a correspondence table, for example.
 The simplest example of the weight position conversion rule determines the coordinate value x_i of the audio content with ID "i" according to the difference between its weight w_i and the maximum weight w_0 described in the information list. This is expressed, for example, by the following expression (3).

 x_i = f(w_i) ≡ w_0 − w_i      … (3)
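A minimal sketch of expression (3) in code is given below; the weights are hypothetical values chosen so that the resulting coordinates reproduce the FIG. 6 example (x = 0, 900, 1400).

```python
# Sketch of the weight position conversion rule of expression (3): x_i = w0 - w_i,
# where w0 is the largest weight in the information list.

def place_on_weight_axis(information_list):
    """Return {id: coordinate}; the higher the weight, the closer to the origin."""
    w0 = max(item["weight"] for item in information_list)
    return {item["id"]: w0 - item["weight"] for item in information_list}

if __name__ == "__main__":
    contents = [
        {"id": 1, "weight": 2000},
        {"id": 2, "weight": 1100},
        {"id": 3, "weight": 600},
    ]
    print(place_on_weight_axis(contents))  # {1: 0, 2: 900, 3: 1400}
```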
 図5は、コンテンツ配置情報が追加された情報リストの一例を示す図である。 FIG. 5 is a diagram illustrating an example of an information list to which content arrangement information is added.
 図5に示すように、コンテンツ配置情報が追加された情報リスト620は、音声コンテンツのID611ごとに、重み軸の座標値を示す提示位置621を対応付けて記述する。 As shown in FIG. 5, the information list 620 to which the content arrangement information is added describes the presentation position 621 indicating the coordinate value of the weight axis in association with each audio content ID 611.
 なお、位置配置部420は、情報リストとは別にコンテンツ配置情報を生成し、これを情報格納部410に格納してもよいし、音声再生制御部460へ出力してもよい。 Note that the location arrangement unit 420 may generate content arrangement information separately from the information list and store the content arrangement information in the information storage unit 410 or output it to the audio reproduction control unit 460.
 図6は、重み軸における各音声コンテンツの配置の一例を示す図である。 FIG. 6 is a diagram showing an example of the arrangement of each audio content on the weight axis.
 図6に示すように、重み軸630には、例えば、ID「1」~ID「3」の第1~第3の音声コンテンツ631~633が、順に、座標値x=0、900、1400に配置される。 As shown in FIG. 6, on the weight axis 630, for example, first to third audio contents 631 to 633 with ID “1” to ID “3” are sequentially set to coordinate values x = 0, 900, and 1400, respectively. Be placed.
 そして、図3のステップS1300において、提示位置計算部440は、ポインタ位置を初期状態に設定する。また、マーカ音生成部450は、初期状態のポインタ位置に対応する音質をマーカ音に設定することにより、マーカ音を初期状態に設定する。 And in step S1300 of FIG. 3, the presentation position calculation part 440 sets a pointer position to an initial state. In addition, the marker sound generation unit 450 sets the marker sound to the initial state by setting the sound quality corresponding to the pointer position in the initial state to the marker sound.
 本実施の形態において、提示位置計算部440は、重み軸の座標値0を、ポインタの初期位置とする。また、マーカ音生成部450は、ポインタ位置を、マーカ音の音質を規定するパラメータ(以下「音質パラメータ」という)へ変換するための位置音質変換ルールを予め有している。そして、マーカ音生成部450は、この位置音質変換ルールに従って、音質パラメータの値を決定する。位置音質変換ルールの詳細については後述する。 In the present embodiment, the presentation position calculation unit 440 sets the coordinate value 0 of the weight axis as the initial position of the pointer. In addition, the marker sound generation unit 450 has a position sound quality conversion rule for converting the pointer position into a parameter that defines the sound quality of the marker sound (hereinafter referred to as “sound quality parameter”). Then, the marker sound generation unit 450 determines the value of the sound quality parameter according to the position sound quality conversion rule. Details of the position sound quality conversion rule will be described later.
 そして、ステップS1400において、音声再生制御部460は、ポインタの現在位置および各音声コンテンツの配置に基づいて、音声ストリームを生成する際の仮想音場空間を構築する。具体的には、音声再生制御部460は、音源ごと(各音声コンテンツの音およびマーカ音)に、その音源を設定された位置に配置したときの、その音源から聞き手の耳までの音の伝達関数を算出する。そして、音声再生制御部460は、算出した伝達関数を、音場情報として、音声ストリーム生成部470へ出力する。 In step S1400, the audio reproduction control unit 460 constructs a virtual sound field space for generating an audio stream based on the current position of the pointer and the arrangement of each audio content. Specifically, the audio reproduction control unit 460 transmits the sound from the sound source to the listener's ear when the sound source is arranged at a set position for each sound source (the sound of each audio content and the marker sound). Calculate the function. Then, the audio reproduction control unit 460 outputs the calculated transfer function to the audio stream generation unit 470 as sound field information.
 なお、音声再生制御部460は、壁の反射や音の伝搬の速度など考慮して、仮想音場空間を構築(つまり音声ストリームを計算)すべきである。しかし、本実施の形態では、音声ストリームはモノラル音声であり、ユーザ位置は直線状の重み軸上に配置される。したがって、本実施の形態おける音声再生制御部460は、壁の反射、頭部伝達関数(左右での音声の質、位相、およびタイミングのずれなど)については無視し、距離による音の減衰のみを考慮した仮想音場空間を構築すればよい。 Note that the audio reproduction control unit 460 should construct a virtual sound field space (that is, calculate an audio stream) in consideration of wall reflection and sound propagation speed. However, in this embodiment, the audio stream is monaural audio, and the user position is arranged on a linear weight axis. Therefore, the sound reproduction control unit 460 in the present embodiment ignores the reflection of the wall and the head-related transfer function (the sound quality on the left and right, the phase, the timing shift, etc.) and only attenuates the sound due to the distance. A virtual sound field space that takes into account may be constructed.
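The following sketch illustrates this simplified monaural sound field: wall reflections and head-related transfer functions are ignored, and only attenuation with distance from the pointer (listener) position is applied. The inverse-distance law and its constant are assumptions for illustration, not values specified in this description.

```python
# Sketch: per-content gains for a distance-only monaural virtual sound field.

def gain_for_source(source_x, pointer_x, k=0.005):
    """Gain falls off with distance from the listener; 1.0 at the listener."""
    distance = abs(source_x - pointer_x)
    return 1.0 / (1.0 + k * distance)

def mix_gains(content_positions, pointer_x):
    """Per-content gains handed to the audio stream generation step."""
    return {cid: gain_for_source(x, pointer_x) for cid, x in content_positions.items()}

if __name__ == "__main__":
    positions = {1: 0, 2: 900, 3: 1400}       # coordinates from FIG. 6
    print(mix_gains(positions, pointer_x=0))  # content 1 loudest, content 3 quietest
```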
 そして、ステップS1500において、提示位置計算部440およびマーカ音生成部450は、ポインタの移動があったか否かを判断する。具体的には、提示位置計算部440およびマーカ音生成部450は、ポインタ操作情報が入力されたか否かを判断する。なお、この判断は、提示位置計算部440およびマーカ音生成部450のいずれか一方がポインタ操作情報に基づいて判断し、その判断結果を他方に通知することにより行ってもよい。提示位置計算部440およびマーカ音生成部450は、ポインタの移動がない場合(S1500:NO)、ステップS1600へ進む。 In step S1500, the presentation position calculation unit 440 and the marker sound generation unit 450 determine whether the pointer has moved. Specifically, the presentation position calculation unit 440 and the marker sound generation unit 450 determine whether pointer operation information has been input. This determination may be made by either one of the presentation position calculation unit 440 and the marker sound generation unit 450 determining based on the pointer operation information and notifying the determination result to the other. If there is no movement of the pointer (S1500: NO), the presentation position calculation unit 440 and the marker sound generation unit 450 proceed to step S1600.
 ステップS1600において、マーカ音生成部450は、決定した音質のマーカ音の音声データを生成し、音声ストリーム生成部470へ出力する。また、音声再生制御部460は、各音声コンテンツの音声データを、音声ストリーム生成部470へ出力する。そして、音声ストリーム生成部470は、構築された仮想音場空間の音場を実現する音声ストリーム(音声データ)を生成し、出力する。この結果、図7Aの例では、ユーザには、第1の音声コンテンツ631の音声とマーカ音とが至近距離で聞こえ、第2および第3の音声コンテンツ632、633の音声が遠くから聞こえることになる。 In step S1600, the marker sound generation unit 450 generates audio data of the determined marker sound of the sound quality and outputs the audio data to the audio stream generation unit 470. Also, the audio reproduction control unit 460 outputs the audio data of each audio content to the audio stream generation unit 470. Then, the audio stream generation unit 470 generates and outputs an audio stream (audio data) that realizes the sound field of the constructed virtual sound field space. As a result, in the example of FIG. 7A, the user can hear the sound of the first audio content 631 and the marker sound at a close distance, and can hear the sounds of the second and third audio contents 632 and 633 from a distance. Become.
 なお、マーカ音生成部450は、構築した仮想音場空間において、音声出力されない音声コンテンツ(例えば重み軸上で非常に遠くに位置する音声コンテンツ)の音声データを、音声ストリーム生成部470へ出力しなくてもよい。 Note that the marker sound generation unit 450 outputs the audio data of the audio content that is not output as audio in the constructed virtual sound field space (for example, audio content located very far on the weight axis) to the audio stream generation unit 470. It does not have to be.
 そして、ステップS1700において、音声再生制御部460は、音声コンテンツに対する決定操作があったか否かを判断する。具体的には、音声再生制御部460は、決定操作情報が入力されたか否かを判断する。音声再生制御部460は、決定操作がない場合(S1700:NO)ステップS1800へ進む。 In step S1700, the audio reproduction control unit 460 determines whether or not there has been a determination operation on the audio content. Specifically, the audio reproduction control unit 460 determines whether decision operation information has been input. If there is no determination operation (S1700: NO), the audio reproduction control unit 460 proceeds to step S1800.
 ステップS1800において、音声再生制御部460は、ユーザ操作などにより処理の終了を指示されたか否かを判断する。音声再生制御部460は、処理の終了を指示されていない場合(S1800:NO)、ステップS1500へ戻る。 In step S1800, the audio reproduction control unit 460 determines whether an instruction to end the process is given by a user operation or the like. If the audio reproduction control unit 460 is not instructed to end the process (S1800: NO), the process returns to step S1500.
 一方、提示位置計算部440およびマーカ音生成部450は、ポインタの移動があった場合(S1500:YES)、ステップS1900へ進む。 On the other hand, the presentation position calculation unit 440 and the marker sound generation unit 450 proceed to step S1900 when the pointer moves (S1500: YES).
 図7は、ポインタの移動の様子の一例を示す図である。 FIG. 7 is a diagram illustrating an example of how the pointer moves.
 図7Aに示すように、提示位置計算部440は、初期状態において、原点、つまり第1の音声コンテンツ631の位置にポインタ640を置く。そして、提示位置計算部440は、ユーザ操作入力装置310の「進む」ボタン313が押下されたまま状態の継続時間が増大するごとに、ポインタ640を重み軸630のプラス方向641へ移動させる。また、提示位置計算部440は、ユーザ操作入力装置310の「戻る」ボタン311が押下されたままの状態の継続時間が増大するごとに、ポインタ641を重み軸630のマイナス方向642へ移動させる。 7A, the presentation position calculation unit 440 places the pointer 640 at the origin, that is, the position of the first audio content 631 in the initial state. The presentation position calculation unit 440 moves the pointer 640 in the plus direction 641 of the weight axis 630 each time the duration of the state increases while the “forward” button 313 of the user operation input device 310 is pressed. In addition, the presentation position calculation unit 440 moves the pointer 641 in the minus direction 642 of the weight axis 630 each time the duration of the state in which the “return” button 311 of the user operation input device 310 is kept pressed increases.
 例えば、「進む」ボタン313が連続して押下されたとする。この場合、提示位置計算部440は、図7Bに示すように、ポインタ640を重み軸630のプラス方向641側へ移動させる。その結果、ポインタ640は、第1の音声コンテンツ631の位置から離れていき、やがて図7Cに示すように、次の第2の音声コンテンツ632に近付く。 For example, assume that the “forward” button 313 is continuously pressed. In this case, the presentation position calculation unit 440 moves the pointer 640 to the plus direction 641 side of the weight axis 630 as shown in FIG. 7B. As a result, the pointer 640 moves away from the position of the first audio content 631 and eventually approaches the next second audio content 632 as shown in FIG. 7C.
 なお、提示位置計算部440は、図7Bに示すように、各音声コンテンツの位置631~633に替えて、これらに幅を持たせた区間636~638を、音声コンテンツの位置として扱ってもよい。これにより、ユーザは、ポインタ位置を所望の音声コンテンツに合わせ易くなる。 In addition, as shown in FIG. 7B, the presentation position calculation unit 440 may handle sections 636 to 638 having widths instead of the positions 631 to 633 of the respective audio contents as the positions of the audio contents. . Thereby, the user can easily adjust the pointer position to the desired audio content.
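A minimal sketch of this pointer movement and of the widened content sections of FIG. 7B is shown below; the movement speed and the section half-width are assumptions for illustration.

```python
# Sketch: moving the pointer on the weight axis while a button is held, and
# treating a widened section around each content position as "selected".

def move_pointer(pointer_x, direction, held_seconds, speed=200.0):
    """direction is +1 for 'forward', -1 for 'return'; speed in coordinates/s."""
    return max(0.0, pointer_x + direction * speed * held_seconds)

def selected_content(pointer_x, content_positions, half_width=100.0):
    """Return the id of the content whose section contains the pointer, else None."""
    for cid, x in content_positions.items():
        if abs(pointer_x - x) <= half_width:
            return cid
    return None

if __name__ == "__main__":
    positions = {1: 0, 2: 900, 3: 1400}
    p = move_pointer(0.0, +1, held_seconds=4.2)  # pointer moves to 840.0
    print(p, selected_content(p, positions))     # falls inside content 2's section
```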
 図3のステップS1900において、マーカ音生成部450は、移動後の現在のポインタ位置に対応する音質に、マーカ音を変更する。具体的には、マーカ音生成部450は、上述の位置音質変換ルールを用いて、現在のポインタ位置を音質パラメータの値に変換し、得られた値を、以降のマーカ音を生成する際に使用する値とする。 In step S1900 of FIG. 3, the marker sound generation unit 450 changes the marker sound to the sound quality corresponding to the current pointer position after movement. Specifically, the marker sound generation unit 450 converts the current pointer position to the value of the sound quality parameter using the above-described position sound quality conversion rule, and generates the subsequent marker sound using the obtained value. The value to use.
 ここでは、本実施の形態における位置音質変換ルールについて説明する。 Here, the position sound quality conversion rule in this embodiment will be described.
 The position sound quality conversion rule determines the value of the sound quality parameter so that, when the coordinate value of the pointer position changes in one direction, the value of the sound quality parameter also changes in one direction. That is, the rule is such that exactly one of the following expressions (4) and (5) always holds. Here, B_n is the sound quality parameter when the pointer is located at coordinate value x_n, and B_m is the sound quality parameter when the pointer is located at coordinate value x_m. The position sound quality conversion rule is defined, for example, by a function or a correspondence table.

 If x_n < x_m, then B_n < B_m      … (4)
 If x_n > x_m, then B_n < B_m      … (5)
 図8は、位置音質変換ルールの第1の例を示す図である。 FIG. 8 is a diagram showing a first example of the position sound quality conversion rule.
 図8に示すように、位置音質変換ルールは、例えば、座標値とマーカ音の周波数とが負の比例関係となるように、座標値をマーカ音の周波数に変換する関数651である。この関数651を適用した場合、マーカ音は、ポインタが原点に位置するとき(つまり初期状態において)、最も高音となる。そして、ポインタが原点から離れるにしたがって、マーカ音の音は低くなっていく。 As shown in FIG. 8, the position sound quality conversion rule is a function 651 that converts a coordinate value to a marker sound frequency so that the coordinate value and the marker sound frequency have a negative proportional relationship, for example. When this function 651 is applied, the marker sound becomes the highest sound when the pointer is located at the origin (that is, in the initial state). As the pointer moves away from the origin, the sound of the marker sound becomes lower.
 例えば、図8に示すように、提示位置「0」には、周波数「4040Hz」が対応付けられている。そして、提示位置「900」には周波数「3500Hz」が対応付けられ、提示位置「1400」には周波数「3200Hz」が対応付けられている。したがって、図7の例では、マーカ音の周波数は、第1の音声コンテンツの位置631で4040Hzとなり、第2の音声コンテンツの位置632で3500Hz(ほぼA7の音程)となる。また、マーカ音の周波数は、第3の音声コンテンツの位置633で3200Hzとなる。 For example, as shown in FIG. 8, the presentation position “0” is associated with the frequency “4040 Hz”. The presentation position “900” is associated with the frequency “3500 Hz”, and the presentation position “1400” is associated with the frequency “3200 Hz”. Therefore, in the example of FIG. 7, the frequency of the marker sound is 4040 Hz at the position 631 of the first audio content, and 3500 Hz (approximately the pitch of A7) at the position 632 of the second audio content. The frequency of the marker sound is 3200 Hz at the position 633 of the third audio content.
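A negative-slope linear function is one conversion rule consistent with these example values; the sketch below reproduces them (0 → 4040 Hz, 900 → 3500 Hz, 1400 → 3200 Hz). The exact function 651 is defined by the rule itself, so the constants used here are only one possible choice.

```python
# Sketch of the first position-to-sound-quality conversion rule (FIG. 8):
# a negative-slope linear map from pointer coordinate to marker-sound frequency.

def marker_frequency(pointer_x, f0=4040.0, slope=0.6, f_min=200.0):
    """Frequency is highest at the origin and falls as the pointer moves away."""
    return max(f_min, f0 - slope * pointer_x)

if __name__ == "__main__":
    for x in (0, 900, 1400):
        print(x, marker_frequency(x))  # 4040.0, 3500.0, 3200.0
```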
 As described above, the weight position conversion rule arranges the weights so that, when the coordinate value on the weight axis changes in one direction, the arranged weights also change in one direction. Therefore, the audio content processing apparatus 400 determines the sound quality parameter so that exactly one of the following expressions (6) and (7) always holds, and changes the marker sound accordingly. In other words, in the audio content processing apparatus 400, the sound quality of the marker sound changes in one direction when the position of the pointer on the weight axis changes in one direction.

 If w_n < w_m, then B_n < B_m      … (6)
 If w_n < w_m, then B_n > B_m      … (7)
 式(6)のみが常に成り立つ場合、ポインタ位置が初期位置から遠ざかるにつれて、音声パラメータは高くなっていく。また、式(7)のみが常に成り立つ場合、ポインタ位置が初期位置から遠ざかるにつれて、音声パラメータは低くなっていく。 When only Expression (6) always holds, the voice parameter increases as the pointer position moves away from the initial position. In addition, when only Expression (7) always holds, the voice parameter becomes lower as the pointer position moves away from the initial position.
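As a small check, composing the example placement rule of expression (3) with the negative-slope frequency rule of FIG. 8 (both with the illustrative constants used in the earlier sketches) gives a marker frequency that rises strictly with the weight, so expression (6) holds throughout and expression (7) never does.

```python
# Sketch: checking that only one of expressions (6)/(7) holds for the example
# rules. Constants are the illustrative ones used in the earlier sketches.

def position_of(weight, w0=2000.0):
    return w0 - weight                 # expression (3): x = w0 - w

def frequency_of(position, f0=4040.0, slope=0.6):
    return f0 - slope * position       # FIG. 8 style negative-slope rule

if __name__ == "__main__":
    weights = [600, 1100, 2000]        # increasing weight
    freqs = [frequency_of(position_of(w)) for w in weights]
    print(freqs)                       # [3200.0, 3500.0, 4040.0]
    # Strictly increasing with weight, so expression (6) holds and (7) does not.
    assert all(a < b for a, b in zip(freqs, freqs[1:]))
```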
 そして、図3のステップS2000において、音声再生制御部460は、移動後の現在のポインタの位置に基づいて、仮想音場空間を再構築する。具体的には、音声再生制御部460は、聞き手の仮想位置およびマーカ音の位置を現在のポインタ位置に変更して音の伝達関数を算出し、音声ストリーム生成部470が使用する伝達関数を更新する。この結果、図7Cの例の場合、ユーザには、第2の音声コンテンツ632の音声とマーカ音とが至近距離で聞こえ、第1および第3の音声コンテンツ631、633の音声が遠くから聞こえることになる。そして、音声再生制御部460は、ステップS1600へ進む。 Then, in step S2000 of FIG. 3, the audio reproduction control unit 460 reconstructs the virtual sound field space based on the current pointer position after movement. Specifically, the audio reproduction control unit 460 calculates a sound transfer function by changing the virtual position of the listener and the position of the marker sound to the current pointer position, and updates the transfer function used by the audio stream generation unit 470. To do. As a result, in the example of FIG. 7C, the user can hear the sound of the second audio content 632 and the marker sound at a close distance, and can hear the sounds of the first and third audio contents 631 and 633 from a distance. become. Then, the audio reproduction control unit 460 proceeds to step S1600.
 また、音声再生制御部460は、決定操作があった場合(S1700:YES)ステップS2100へ進む。 Further, when there is a determination operation (S1700: YES), the audio reproduction control unit 460 proceeds to step S2100.
 ステップS2100において、音声再生制御部460は、決定された音声コンテンツに対して所定の処理を行い、ステップS1800へ進む。 In step S2100, the audio reproduction control unit 460 performs predetermined processing on the determined audio content, and proceeds to step S1800.
 そして、音声再生制御部460は、処理の終了を指示された場合(S1800:YES)、一連の処理を終了する。 Then, when instructed to end the process (S1800: YES), the audio reproduction control unit 460 ends the series of processes.
 このような動作により、音声コンテンツ処理装置400は、ポインタが位置する音声コンテンツの重みを、ポインタの位置を示すマーカ音の音質で提示することができる。 By such an operation, the audio content processing apparatus 400 can present the weight of the audio content where the pointer is located with the sound quality of the marker sound indicating the position of the pointer.
Next, the operation of the audio content reproduction system 100 will be described.

FIG. 9 is a sequence diagram showing an example of the operation of the audio content reproduction system 100. In FIG. 9, parts corresponding to the processing of the audio content processing apparatus 400 shown in FIG. 3 are given the corresponding step numbers.

The information storage unit 410 downloads the search results for the search query from the audio content supply server 520 and acquires the information list (see FIG. 4) (A01) (S1100). Next, the position arrangement unit 420 calculates the presentation position of each audio content based on its weight (A02) and adds the calculation results to the information list (A03) (S1200). The position arrangement unit 420 then causes the presentation position calculation unit 440 to set the initial pointer position (A04) (S1300).

Next, the audio reproduction control unit 460 constructs the virtual sound field space (B01). The audio reproduction control unit 460 then causes the audio stream generation unit 470 to start generating the audio stream (B02) and causes the marker sound generation unit 450 to start generating the marker sound (B03) (S1400). The audio stream generation unit 470 generates an audio stream containing the marker sound and the audio of the audio contents and has it output from the audio output device 200 (B04) (S1600).

When the pointer moves, the pointer position acquisition unit 430 outputs pointer operation information to the presentation position calculation unit 440 (C01). The presentation position calculation unit 440 calculates the pointer position after the movement and outputs new pointer position information to the marker sound generation unit 450 and the audio reproduction control unit 460 (C02). As a result, the audio reproduction control unit 460 reconstructs the virtual sound field space (D01). The audio reproduction control unit 460 then causes the audio stream generation unit 470 to start generating the audio stream (D02) and causes the marker sound generation unit 450 to start generating the marker sound with its changed sound quality (D03) (S1900, S2000).
As described above, when moving the pointer among a plurality of audio contents, the audio content processing apparatus 400 according to this embodiment presents the weight of the audio content at which the pointer is located through the sound quality of the marker sound associated with that weight. The audio content processing apparatus 400 can thereby present the weight of each audio content to the user in an easily understandable manner.

When the pointer moves continuously over audio contents arranged in order of weight, as in this embodiment, the sound quality of the marker sound also changes continuously while the pointer is moving. The audio content processing apparatus 400 can therefore present in which direction the pointer is moving in a way the user can grasp intuitively. That is, the audio content processing apparatus 400 can present the change in the weight corresponding to the pointer as a change in the sound quality of the marker sound.

When the weight axis is finite and the pointer can be moved at high speed, the audio content processing apparatus 400 allows the user to move the pointer over the entire range with ease. The audio content processing apparatus 400 can therefore present, in an intuitively graspable way, where the weight of the audio content at which the pointer is currently located stands among all the audio contents.

For example, in the case of pitch, the range over which the sound quality of the marker sound can change is fixed to some extent, such as the audible range or the frequency band that the speaker can output. The audio content processing apparatus 400 can therefore present to the user not only the relative position of a weight, indicated by the degree and direction of the weight change, but also the rough absolute position of the weight within the whole.

Also, when the moving speed of the pointer on the weight axis is constant, the audio content processing apparatus 400 can present the degree of weight change to the user not only through the degree of change in the sound quality of the marker sound but also through the length of time the movement takes.

User interfaces on mobile terminals continue to diversify. With operation interfaces such as touch UIs (user interfaces), efforts have focused on improving the visibility of the GUI (graphical user interface) being operated. However, when information can be expressed aurally, using auditory information rather than visual information is often more convenient, and weight information is exactly such information. The audio content processing apparatus 400 according to this embodiment can therefore provide a mobile terminal with a user interface of further improved operability.

In addition, since the audio content processing apparatus 400 according to this embodiment presents the weights so that they can be grasped intuitively without using a display, it allows the user to quickly check the weight of each of a large number of audio contents. The user can therefore pick items out of an enormous amount of audio content in order of weight with little burden. Examples of such work include picking voice mails or bulletin-board postings out of a huge number of them in descending order of priority, or picking radio programs out of an enormous number of channels in descending order of search match count.

Note that the weight position conversion rule and the position sound quality conversion rule used by the audio content processing apparatus 400 are not limited to the examples above.
FIG. 10 is a diagram showing a second example of the position sound quality conversion rule. As shown in FIG. 10, the position sound quality conversion rule may be, for example, a function 652 that converts the coordinate value into the frequency of the marker sound so that the frequency of the marker sound increases in direct proportion to an increase in the coordinate value. In this case, the pitch at the origin is 0 Hz, that is, silence; as the pointer position moves away from the origin, the marker sound becomes audible and its pitch rises.

FIG. 11 is a diagram showing a third example of the position sound quality conversion rule. In FIG. 11, AS_min to AS_max indicates a predetermined range of marker sound pitches, such as the audible range. As shown in FIG. 11, the position sound quality conversion rule may be, for example, a function 653 that converts the coordinate value into the frequency of the marker sound so that the frequency of the marker sound increases exponentially as the coordinate value decreases. This function 653 is substantially equivalent to the function 651 shown in FIG. 8 with its vertical axis replaced by musical scale instead of frequency. The function 653 may also convert into a limited set of pitches, such as the 12-tone scale.

FIG. 12 is a diagram showing a fourth example of the position sound quality conversion rule. As shown in FIG. 12, the position sound quality conversion rule may be, for example, a function 654 that converts the coordinate value into the pitch of the marker sound so that the musical scale of the marker sound increases exponentially as the coordinate value decreases. This is substantially equivalent to expressing the values on the vertical axis of the function 653 shown in FIG. 11 as a musical scale. In this case, near the origin the scale changes greatly relative to a change in the coordinate value. The audio content processing apparatus 400 can therefore change the sound quality of the marker sound so that it is sensitive to weight differences between audio contents with large weights and insensitive to weight differences between audio contents with small weights.
Here, the musical scale is a semitone number in which the note A0 (27.5 Hz) is defined as 0, and notes higher by one semitone each are defined in turn as 1, 2, 3, and so on. The relationship between the scale n and the frequency f(n) [Hz] is expressed by the following expression (8).

f(n) = 27.5 × 2^(n/12), where 0 ≤ n ≤ 96   ... (8)
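As a check of expression (8), the following Python sketch (with hypothetical helper names) converts a semitone number into a frequency and back; for example, n = 48 yields 440 Hz (the note A4, four octaves above A0).

```python
import math

A0_HZ = 27.5  # semitone number 0 in expression (8)

def semitone_to_frequency(n):
    """Expression (8): f(n) = 27.5 * 2^(n/12), valid for 0 <= n <= 96."""
    if not 0 <= n <= 96:
        raise ValueError("semitone number out of range")
    return A0_HZ * 2 ** (n / 12)

def frequency_to_semitone(f_hz):
    """Inverse of expression (8), rounded to the nearest semitone."""
    return round(12 * math.log2(f_hz / A0_HZ))

print(semitone_to_frequency(48))   # 440.0 (A4)
print(frequency_to_semitone(440))  # 48
```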
The sound quality conversion rules exemplified in FIG. 11 and FIG. 12 correspond to the fact that the human ear's sensitivity to pitch differences is exponential. They therefore make the weights even easier for the user to grasp and are preferable in practice.

The audio content processing apparatus 400 may also determine the value of the sound quality parameter directly from the weight, using a weight sound quality conversion rule for converting the weight described in the information list into the value of the sound quality parameter of the marker sound.

FIG. 13 is a diagram showing an example of the weight sound quality conversion rule. As shown in FIG. 13, the weight sound quality conversion rule can be, for example, a function 655 that converts the weight into the pitch of the marker sound so that the musical scale of the marker sound increases exponentially as the weight increases.
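A minimal sketch of such a direct weight-to-pitch rule is shown below; the names, the weight range, and the semitone range are assumptions, and for simplicity the weight is mapped linearly onto the semitone range, whereas function 655 in FIG. 13 uses an exponential relationship.

```python
# Illustrative sketch (hypothetical names): map a weight directly to a marker
# sound pitch, then to a frequency with expression (8).

AS_MIN, AS_MAX = 24, 84        # assumed semitone range of the marker sound
W_MIN, W_MAX = 0.0, 100.0      # assumed range of weights in the information list

def weight_to_semitone(weight):
    """Monotonically map a weight onto the semitone range [AS_MIN, AS_MAX]."""
    ratio = min(max((weight - W_MIN) / (W_MAX - W_MIN), 0.0), 1.0)
    return round(AS_MIN + (AS_MAX - AS_MIN) * ratio)

def weight_to_frequency(weight):
    return 27.5 * 2 ** (weight_to_semitone(weight) / 12)  # expression (8)

print(weight_to_frequency(90.0))  # a high pitch for a large weight
```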
In this embodiment, a monaural earphone is used as the audio output device, but a stereo earphone, binaural headphones, a monaural speaker, or a stereo speaker may be used instead. The audio stream generation unit also desirably realizes the constructed virtual sound field space with good accuracy. To this end, the audio stream generation unit may acquire the type, arrangement, and capability (monaural, stereo, multi-channel, and so on) of the audio output device and change the audio stream generation method for each type and capability.

Also, when the listener's position and the audio contents are arranged on a straight weight axis as in this embodiment, the user hears the audio of all the audio contents from the same direction. When many audio contents are heard superimposed, the individual sounds become hard to distinguish. The audio reproduction control unit may therefore construct the virtual sound field space so that, among sections that give each content's position some width (for example, sections 636 to 638 shown in FIG. 7), only the audio of the audio content in the section where the pointer is located is output.
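As a rough sketch of this section-based selection (the content identifiers and section boundaries below are assumptions for illustration), choosing what to play amounts to finding the section on the weight axis that contains the pointer.

```python
# Illustrative sketch (hypothetical names): play only the content whose section
# on the weight axis contains the current pointer coordinate.

sections = [
    # (content_id, section_start, section_end) on the weight axis
    ("content_631", 0.0, 20.0),
    ("content_632", 20.0, 55.0),
    ("content_633", 55.0, 100.0),
]

def content_at_pointer(pointer_x):
    for content_id, start, end in sections:
        if start <= pointer_x < end:
            return content_id
    return None

print(content_at_pointer(42.0))  # "content_632" is the only content output
```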
Although the audio content processing apparatus always outputs an intermittent marker sound in this embodiment, it may instead output the marker sound at specific timings. The audio content processing apparatus may also use different marker sound output timings depending on the type of audio content.

The patterns of start and end timing for playing the marker sound can be broadly divided into two types.

The first is a pattern in which the marker sound is played for a fixed period prior to the playback of each audio content. In this pattern, if the marker sound is, for example, a single piano note, a piano tone with the pitch corresponding to the position of the audio content is heard before that audio content's audio is output.

The second, as exemplified in this embodiment, is a pattern in which the marker sound is always played regardless of whether audio content is being reproduced. In this pattern, a marker sound with the sound quality corresponding to the pointer position plays as a background sound. Consequently, when the function 653 shown in FIG. 11 is used as the position sound quality conversion rule, a background sound whose pitch gradually lowers as the pointer position moves away from the initial position continues, either continuously or intermittently. As a variation of the second pattern, the marker sound may be silenced while audio content is being reproduced.

Note that when the sections that give each content's position some width, or a stepwise musical scale, are used as described above, the marker sound presents the range in which the weight of the audio content lies.
(Embodiment 2)

In the case of stereo audio, a plurality of audio contents can be arranged in different directions. Moreover, when a person listens to a sound, the person usually turns the head (face) toward the direction from which the sound arrives. In Embodiment 2 of the present invention, therefore, a plurality of audio contents are arranged in the virtual sound field space so as to surround the area in front of the user, and the pointer can be operated by the orientation of the user's head.
FIG. 14 is a diagram showing an example of the appearance of an audio content reproduction system in which the audio content processing apparatus according to this embodiment is used, and corresponds to FIG. 1 of Embodiment 1. Parts identical to those in FIG. 1 are given the same reference numerals, and their description is omitted.

As shown in FIG. 14, the audio content reproduction system 100a in this embodiment has an audio output device 200a in place of the audio output device 200 of FIG. 1. The audio output device 200a consists of a cable 211a for transmitting stereo audio data and stereo headphones 221a worn on a person's head. A motion sensor 320a that detects the movement of the stereo headphones 221a (that is, the movement of the user's head) is attached to the stereo headphones 221a. The audio content reproduction system 100a also has, in place of the portable player 300 of FIG. 1, a portable player 300a incorporating an audio content processing apparatus different from the audio content processing apparatus of Embodiment 1.

The motion sensor 320a detects acceleration and transmits the resulting acceleration information to the audio content processing apparatus of the portable player 300a by wireless communication or wired communication (for example, communication using the cable 211a).

FIG. 15 is a block diagram showing an example of the configuration of the audio content processing apparatus according to this embodiment, and corresponds to FIG. 2 of Embodiment 1. Parts identical to those in FIG. 2 are given the same reference numerals, and their description is omitted.

In FIG. 15, the audio content processing apparatus 400a according to this embodiment has a head orientation acquisition unit 480a in addition to the configuration of FIG. 2. The audio content processing apparatus 400a also has a presentation position calculation unit 440a and an audio reproduction control unit 460a in place of the presentation position calculation unit 440 and the audio reproduction control unit 460 of FIG. 2.

The head orientation acquisition unit 480a receives acceleration information from the motion sensor 320a and, based on the received acceleration information, acquires the orientation of the head of the user wearing the stereo headphones 221a. More specifically, the head orientation acquisition unit 480a calculates the orientation of the head relative to the user's front (hereinafter referred to as the "head orientation") by integrating the acceleration from a state in which the user is stationary and facing forward. The head orientation acquisition unit 480a then outputs head orientation information indicating the head orientation to the audio reproduction control unit 460a.
Based on the latest pointer position information and the content arrangement information, the audio reproduction control unit 460a constructs a virtual sound field space in which the listener's position is placed at a position corresponding to the pointer position, away from the weight axis described above. Specifically, according to the pointer position, the audio reproduction control unit 460a slides the weight axis on which the audio contents are arranged along an arc that spreads horizontally in front of the listener, centered on the listener. More specifically, the audio reproduction control unit 460a slides the weight axis so that the pointer position is included in the arc. Hereinafter, this arc is referred to as the "presentation window," and the range of the weight axis included in the arc is referred to as the "pointer range."

The audio reproduction control unit 460a also constructs, based on the head orientation information, the virtual sound field space in the state in which the listener's head faces the pointer position. In this embodiment, therefore, when the user's head orientation changes, the weight axis on which the audio contents are arranged, the pointer position, and the presentation range do not change, but the virtual sound field space does.

The presentation position calculation unit 440a calculates the current position of the pointer when the pointer is moved on the weight axis according to the pointer operation information and the head orientation information. Specifically, the presentation position calculation unit 440a moves the pointer range according to the pointer operation information (that is, operations with the "back" button 311 and the "forward" button 313). The presentation position calculation unit 440a then moves the pointer within the pointer range according to the head orientation information (that is, operation by head orientation).

Such an audio content processing apparatus 400a can arrange the weight axis in an arc spread out laterally in front of the user. The audio content processing apparatus 400a can thereby arrange a plurality of audio contents in different directions. The audio content processing apparatus 400a can therefore make it easier for the user to distinguish the plurality of audio contents, and even when their audio is output simultaneously, each sound is easier to tell apart.

The audio content processing apparatus 400a can also move the pointer range according to button operations and move the pointer within the pointer range (within the range of the weight axis located in the presentation window) according to the user's head orientation. This lets the user combine small pointer movements one step at a time with large pointer movements one presentation window at a time, improving the operability of pointer movement.
Here, the definitions of head orientation and position in this embodiment are explained.

FIG. 16 is a diagram showing an example of the definitions of head orientation and position.

As shown in FIG. 16, the audio content processing apparatus 400a defines, for example, the front direction 662 of the torso 661 of the user (listener) as a declination ψ referenced to the negative X-axis direction of a predetermined XY coordinate system. This XY coordinate system is placed horizontally with the head 663 of the user (listener) as its origin. The audio content processing apparatus 400a then defines the head orientation of the head 663 of the user (listener) as the angle φ of the front direction 664 of the head 663 relative to the front direction 662 of the torso 661. The audio content processing apparatus 400a also defines a position 665 on the pointer range by the declination θ referenced to the negative X-axis direction of the same XY coordinate system and the distance r from the origin of the XY coordinate system (that is, in polar coordinates). The audio content processing apparatus 400a sets the coordinate value x of the weight axis to the angle corresponding to the declination θ on the pointer range.

Here, the audio content processing apparatus 400a defines each angle with the clockwise direction, as seen from above the user, as positive. The audio content processing apparatus 400a also sets the XY coordinate system, and sets the angle φ to 0 degrees, in the state in which the front direction 664 of the head 663 coincides with the front direction 662 of the torso 661. The audio content processing apparatus 400a also places the presentation window of the weight axis on a circle 666 at distance r from the origin of the XY coordinate system.
In this embodiment, the position arrangement unit 420 uses a weight position conversion rule in which the audio contents are arranged at intervals of 30 degrees. This weight position conversion rule is expressed, for example, by the following expression (9), where x_i is the coordinate value for ID "i" and Ord(w_i) is the weight order (see FIG. 4).

x_i = Ord(w_i) × 30   ... (9)
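For illustration only (the function names and the assumption that rank 0 corresponds to the largest weight are hypothetical; the actual ordering follows the information list of FIG. 4), expression (9) can be read as assigning each content an angle of 30 degrees per rank in the weight order.

```python
# Illustrative sketch (hypothetical names): place contents on the weight axis
# at 30-degree intervals according to their weight order, as in expression (9).

def placement_angles(weights):
    """Return {content_id: angle in degrees}, rank 0 for the largest weight."""
    order = sorted(weights, key=weights.get, reverse=True)
    return {content_id: rank * 30 for rank, content_id in enumerate(order)}

weights = {"A": 0.9, "B": 0.4, "C": 0.7}
print(placement_angles(weights))  # {'A': 0, 'C': 30, 'B': 60}
```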
Next, the operation of the audio content processing apparatus 400a will be described.

FIG. 17 is a diagram showing an example of the operation of the audio content processing apparatus, and corresponds to FIG. 3 of Embodiment 1. Steps identical to those in FIG. 3 are given the same step numbers, and their description is omitted.

In step S1400a, the audio reproduction control unit 460a constructs a virtual sound field space in which the arc-shaped presentation window described above is placed in front of the user.

FIG. 18 is a diagram showing an example of the presentation window.
As shown in FIG. 18, the audio reproduction control unit 460a takes as the presentation window the arc 668 at distance r from the origin of the XY coordinate system, within the range 667 of ±90 degrees centered on the front direction 662 of the torso 661, regardless of the front direction 664 of the head 663. In the initial state, the range including the initial position of the pointer becomes the pointer range. The declination θ of the pointer range is expressed, for example, by the following expression (10), using the declination ψ of the torso's front direction.

ψ − 90 ≤ θ ≤ ψ + 90   ... (10)
With such a presentation window, even a sound source in front of the torso 661 cannot be heard if its angle is not on the same turn around the circle as the torso 661's position. For example, suppose the declination of the torso's front direction is ψ = 80°, the angle of one sound source 671 is θ = 120°, and the angle of another sound source 672 is θ′ = θ + 360° = 480°. This user can hear the sound source 671 but cannot hear the audio of the sound source 672. Expression (10) above can be used as is for this determination.
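A minimal sketch of this audibility check (with hypothetical names) applies expression (10) directly to the source angle without reducing it modulo 360 degrees, so a source one lap away, such as 480°, is rejected.

```python
# Illustrative sketch (hypothetical names): a source is audible only if its angle
# satisfies psi - 90 <= theta <= psi + 90 (expression (10)), with angles taken
# as-is rather than reduced modulo 360 degrees.

def is_audible(source_theta_deg, torso_psi_deg):
    return torso_psi_deg - 90 <= source_theta_deg <= torso_psi_deg + 90

print(is_audible(120, 80))  # True:  sound source 671
print(is_audible(480, 80))  # False: sound source 672, one lap away
```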
Then, in step S1410a of FIG. 17, the head orientation acquisition unit 480a acquires the head orientation based on the acceleration information and generates the head orientation information.

In step S1500a, the presentation position calculation unit 440a and the marker sound generation unit 450 determine whether the pointer has moved, based on whether there has been at least one of a change in the pointer range and a change in the head orientation. A change in the pointer range corresponds to an input of pointer operation information, and a change in the head orientation corresponds to a change in the head orientation information.

In step S2000a, the audio reproduction control unit 460a reconstructs the virtual sound field space according to the current head orientation and pointer range.

Through such processing, the audio content processing apparatus 400a can arrange the weight axis so that it spreads out laterally in front of the user, and can move the pointer range and the pointer position according to the user's button operations and head orientation.
FIG. 19 is a diagram showing an example of how the pointer position moves and the marker sound changes as the head orientation changes.

As shown in FIG. 19A, assume that in the initial state the presentation position calculation unit 440a sets the position of the third audio content 633, among the first to fifth audio contents 631 to 635 arranged in weight order, as the front direction 664 of the head 663. In this case, the pointer is located in the front direction 664 of the head 663, that is, at the third audio content 633. The sound quality of the marker sound 636 is therefore the sound quality corresponding to the weight of the third audio content 633.

Assume also that the audio reproduction control unit 460a sets the pointer range to the range of the weight axis that includes the first to fifth audio contents 631 to 635, centered on the third audio content 633.

With such a virtual sound field space, the audio content processing apparatus 400a can let the user hear the audio of the first to fifth audio contents 631 to 635 simultaneously from different directions, making them easy to distinguish. That is, even if the audio of the first to fifth audio contents 631 to 635 is reproduced simultaneously, the user hears it three-dimensionally and can easily tell the contents apart.

Then, from the state shown in FIG. 19A, suppose the user twists the head to the left so that the front direction 664 of the head 663 faces the first audio content 631, as shown in FIG. 19B. In this case, the pointer moves to the first audio content 631, and the sound quality of the marker sound 636 changes from the sound quality corresponding to the weight of the third audio content 633 to the sound quality corresponding to the weight of the first audio content 631.

Note that, as shown in FIG. 20, the audio content processing apparatus 400a may output the marker sound 636 from all of the audio contents 631 to 635 in the presentation window. In this case, five marker sounds are heard, but the individual marker sounds 636 are difficult to tell apart. It is therefore desirable, as described with FIG. 19, that the marker sound 636 be output only from the position of the audio content in the direction the head 663 is facing.
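As a rough sketch of this behavior (the function names, angle values, and the nearest-angle rule are assumptions for illustration), the content that emits the marker sound can be chosen as the one whose placement angle is closest to the direction the head is facing.

```python
# Illustrative sketch (hypothetical names): emit the marker sound only from the
# content whose angle on the presentation window is closest to the head direction.

def marker_source(content_angles, torso_psi_deg, head_phi_deg):
    """content_angles: {content_id: theta in degrees on the presentation window}."""
    gaze = torso_psi_deg + head_phi_deg  # direction the head is facing
    return min(content_angles, key=lambda cid: abs(content_angles[cid] - gaze))

angles = {"c631": 20, "c632": 50, "c633": 80, "c634": 110, "c635": 140}
print(marker_source(angles, torso_psi_deg=80, head_phi_deg=-60))  # "c631"
```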
FIG. 21 is a diagram showing an example of how the pointer range moves in response to button operations.

As shown in FIG. 21A, assume that at a certain point (for example, in the initial state) the audio reproduction control unit 460a sets the pointer range so that the first to fifth audio contents 631 to 635 are arranged clockwise in this order within the presentation range.

In this state, if the "forward" button 313 is operated, the audio reproduction control unit 460a moves the pointer range in the direction in which the pointer position moves away from the origin. That is, as shown in FIG. 21B, the audio reproduction control unit 460a slides the weight axis 630 counterclockwise within the presentation range.

Conversely, if the "back" button 311 is operated, the audio reproduction control unit 460a moves the pointer range in the direction in which the pointer position approaches the origin. That is, the audio reproduction control unit 460a slides the weight axis 630 clockwise within the presentation range.

By sliding the weight axis in this way, the audio content processing apparatus 400a can give the user the sensation of sitting on a swivel chair surrounded by the audio contents and turning together with the chair. The user can thus perform the operation corresponding to scrolling in a GUI through button operations.
As described above, the audio content processing apparatus 400a according to this embodiment arranges the weight axis, on which the plurality of audio contents is placed, so that it spreads out laterally in front of the user. The audio content processing apparatus 400a can thereby present the plurality of audio contents to the user in an easily distinguishable manner.

This also makes it easier for the user to grasp at which audio content the pointer is located, and to grasp intuitively to which audio content the sound quality of the marker sound corresponds. The audio content processing apparatus 400a can therefore present the weight of each audio content to the user in an easily understandable manner even when reproducing the audio of a plurality of audio contents simultaneously.

In the embodiment described above, the audio reproduction control unit arranges the weight axis on an arc spreading horizontally in front of the user, centered on the user, but this is not limiting. The audio reproduction control unit may, for example, arrange the weight axis on a straight line extending horizontally in front of the user, or on a straight line or curve extending in the vertical direction or in three dimensions.

When information indicating the displacement of the head's orientation relative to the torso (information corresponding to the head orientation information) is input, the audio content processing apparatus does not necessarily have to include the head orientation acquisition unit.

The head orientation acquisition unit may also acquire, as the head orientation information, the orientation of the head with respect to some other direction (for example, a compass bearing) or the orientation of the torso with respect to some other direction, instead of the orientation of the head with respect to the torso.

In Embodiment 1 and Embodiment 2 described above, the position arrangement unit uses the number of search match points as the weight that defines the arrangement order on the weight axis, but this is not limiting. The position arrangement unit can acquire any ordered attribute of the audio contents as the weight. The weight may therefore be, for example, the lexicographic order of the audio contents' titles, the number of times the audio content has been played, the creation date and time of the audio content, the playback time of the audio content, or the order of any of these.

The operation methods for pointer movement, pointer range movement, and audio content determination are not limited to the examples above. That is, the pointer position acquisition unit may acquire operation information from various other input devices such as a directional pad, a keyboard, or a mouse.

The marker sound is also not limited to the examples above. Furthermore, the sound quality of the marker sound that changes according to the weight is not limited to pitch. The marker sound should, however, allow the change and the direction of the change to be grasped intuitively. The sound quality of the marker sound that changes according to the weight therefore desirably includes at least one of the marker sound's timbre, pitch, sounding interval, sounding length, and vibration period. Note that even a person with a good ear is said to be able to distinguish only about 100 pitch levels within the audible range. The marker sound generation unit may therefore generate a marker sound that expresses the weight using, for example, a chord combining a plurality of pitches, a sound combining a pitch with yet another type of sound quality, or a sound combining sounds of a plurality of timbres such as different instruments. That is, the marker sound generation unit may generate the marker sound by increasing the number of distinguishable types in the manner of a positional numeral system.
For example, if the weight can be expressed as a two-digit decimal number, the marker sound can represent the tens digit with ten pitches of a violin timbre and the ones digit with pitches of a piano timbre. This violin-and-piano marker sound may play the violin note first and then the piano note, or it may play the piano note and the violin note simultaneously. Pitches of different types of instruments are relatively easy to tell apart even when played at the same time. Such a marker sound can therefore express differences in weight.
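As a rough sketch of this positional encoding (the names, base semitone numbers, and digit-to-pitch offsets are assumptions for illustration), a two-digit weight could be split into a violin pitch for the tens digit and a piano pitch for the ones digit.

```python
# Illustrative sketch (hypothetical names): encode a two-digit weight as a pair of
# pitches, violin for the tens digit and piano for the ones digit.

VIOLIN_BASE, PIANO_BASE = 60, 48  # assumed base semitone numbers per instrument

def weight_to_marker_notes(weight):
    """weight: integer in 0..99 -> [(instrument, semitone number), ...]."""
    tens, ones = divmod(int(weight), 10)
    return [("violin", VIOLIN_BASE + tens), ("piano", PIANO_BASE + ones)]

print(weight_to_marker_notes(73))  # [('violin', 67), ('piano', 51)]
```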
To give an example using the sounding interval, the marker sound can represent the weight by the interval between short beeps (each about 0.2 to 0.5 seconds long). For example, the marker sound shortens the interval between beeps when the weight is large and lengthens it when the weight is small.

To give an example using the sounding length, the marker sound can represent the weight by the length of the beep.

To give an example using the vibration period, the marker sound can represent the weight by how high or low the frequency of its "swell" (beating) is.
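A minimal sketch of the interval-based variant is shown below; the 0.2 to 0.5 second beep length comes from the text above, while the interval bounds, weight range, and linear mapping are assumptions for illustration.

```python
# Illustrative sketch (hypothetical names): a larger weight gives a shorter
# interval between short beeps, a smaller weight a longer interval.

BEEP_LENGTH_S = 0.3                        # a short beep, within 0.2-0.5 s
MIN_INTERVAL_S, MAX_INTERVAL_S = 0.2, 2.0  # assumed interval bounds

def beep_interval(weight, w_min=0.0, w_max=100.0):
    ratio = min(max((weight - w_min) / (w_max - w_min), 0.0), 1.0)
    return MAX_INTERVAL_S - (MAX_INTERVAL_S - MIN_INTERVAL_S) * ratio

print(beep_interval(90.0))  # about 0.38 s: a short interval for a large weight
```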
The audio content processing apparatus may also generate a marker sound for each of a plurality of types of weight, or generate a marker sound that expresses a plurality of types of weight with different types of sound quality (for example, pitch and vibration period). The audio content processing apparatus may also determine the sound quality based on an order obtained by combining a plurality of types of weight (a combined weight).

Each time the pointer position moves, the marker sound generation unit may also play, immediately after the marker sound presenting the position after the movement, the marker sound for the case where the pointer continues to move in the same direction (or an intermediate marker sound). The audio content processing apparatus can thereby present, every time the pointer moves, not only the weight itself but also the direction of the weight change.

The position of the marker sound does not necessarily have to coincide with the pointer position. For example, the audio reproduction control unit may place the marker sound slightly above or behind the pointer position, or the audio stream generation unit may generate audio data in which the marker sound is placed slightly above or behind the pointer position. The audio content processing apparatus can thereby make it easier for the user to hear the audio of the audio content. In Embodiment 2, when the marker sound is placed on the far side of the pointer position as seen from the user, the marker sound is heard like a background sound of the audio content.
As described above, the audio content processing apparatus according to the embodiments of the present invention has a presentation position calculation unit that moves a pointer among a plurality of audio contents each given a weight, a marker sound generation unit that presents the weight of the audio content at which the pointer is located through the sound quality of a marker sound, and an audio reproduction control unit that performs predetermined processing on the audio content at which the pointer is located. The embodiments thereby make it possible to grasp the weight of each audio content intuitively without using a display.

The disclosure of the specification, drawings, and abstract included in Japanese Patent Application No. 2011-55082, filed on March 14, 2011, is incorporated herein by reference.

The present invention is useful as an audio content processing apparatus and an audio content processing method that can present the weight of each audio content so that it can be grasped intuitively without using a display. The present invention is particularly suited to mobile terminals whose primary modality is audio and to mobile-terminal usage environments in which the use of vision should be minimized.
100, 100a Audio content reproduction system
200, 200a Audio output device
210, 211a Cable
220 Monaural earphone
221a Stereo headphones
300, 300a Portable player
301 Housing
310 User operation input device
311 "Back" button
312 "Enter" button
313 "Forward" button
320a Motion sensor
400, 400a Audio content processing apparatus
410 Information storage unit
420 Position arrangement unit
430 Pointer position acquisition unit
440, 440a Presentation position calculation unit
450 Marker sound generation unit
460, 460a Audio reproduction control unit
470 Audio stream generation unit
480a Head orientation acquisition unit
510 File system
520 Audio content supply server

Claims (11)

1. An audio content processing apparatus comprising:
   a presentation position calculation unit that moves a pointer among a plurality of audio contents each given a weight;
   a marker sound generation unit that presents the weight of the audio content at which the pointer is located through a sound quality of a marker sound associated with the weight; and
   an audio reproduction control unit that performs predetermined processing on the audio content at which the pointer is located.

2. The audio content processing apparatus according to claim 1, wherein the sound quality includes at least one of a pitch, a sounding interval, a sounding length, and a vibration period.

3. The audio content processing apparatus according to claim 2, further comprising a position arrangement unit that arranges the plurality of audio contents on a weight axis at positions corresponding to their respective weights, wherein
   the presentation position calculation unit moves the pointer on the weight axis, and
   the change in the sound quality proceeds in one direction when the change in the position of the pointer on the weight axis proceeds in one direction.

4. The audio content processing apparatus according to claim 3, wherein the predetermined processing includes at least one of audio output of an attribute of the audio content and reproduction of the audio content.

5. The audio content processing apparatus according to claim 4, wherein the presentation position calculation unit accepts an operation for moving the pointer on the weight axis.

6. The audio content processing apparatus according to claim 5, wherein the audio reproduction control unit constructs a virtual sound field space that includes at least the position of the pointer on the weight axis and in which the marker sound and the audio of the audio contents are output from their respective positions, the apparatus further comprising an audio stream generation unit that generates audio data realizing the sound field of the constructed virtual sound field space.

7. The audio content processing apparatus according to claim 6, wherein the marker sound is an audio pointer indicating the position of the pointer.

8. The audio content processing apparatus according to claim 7, wherein the audio reproduction control unit constructs the virtual sound field space in which the weight axis is a straight line and the position of the listener of the audio data is placed on the weight axis at the position of the pointer.

9. The audio content processing apparatus according to claim 7, wherein the audio reproduction control unit constructs the virtual sound field space in which the position of the listener is placed at a position, away from the weight axis, corresponding to the position of the pointer, the apparatus further comprising a head orientation acquisition unit that detects the orientation of the listener's head in real space, and wherein the presentation position calculation unit sets, as the position of the pointer, the position on the weight axis corresponding to the front direction of the head when the orientation of the head in the real space is taken as the orientation of the listener's head in the virtual sound field space.

10. The audio content processing apparatus according to claim 7, wherein the weight is a value indicating the order of an attribute, among the attributes of the audio content, that has an order.

11. An audio content processing method comprising the steps of:
   moving a pointer among a plurality of audio contents each given a weight;
   presenting the weight of the audio content at which the pointer is located through a sound quality of a marker sound associated with the weight; and
   performing predetermined processing on the audio content at which the pointer is located.
PCT/JP2012/001384 2011-03-14 2012-02-29 Audio content processing device and audio content processing method WO2012124268A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011-055082 2011-03-14
JP2011055082 2011-03-14

Publications (1)

Publication Number Publication Date
WO2012124268A1

Family

ID=46830361

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/001384 WO2012124268A1 (en) 2011-03-14 2012-02-29 Audio content processing device and audio content processing method

Country Status (1)

Country Link
WO (1) WO2012124268A1 (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003076719A (en) * 2001-06-22 2003-03-14 Sony Computer Entertainment Inc Information reading program, recording medium recored with information reading program, apparatus and method for reading information, program for generating information, recording medium recorded with program for generating information, apparatus and method for generating information, and information generation reading system
JP2006074589A (en) * 2004-09-03 2006-03-16 Matsushita Electric Ind Co Ltd Acoustic processing device
JP2007087104A (en) * 2005-09-22 2007-04-05 Sony Corp Voice output controller and voice output control method, recording medium and program


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12758153

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12758153

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP