WO2012124268A1 - Audio content processing device and audio content processing method - Google Patents

Audio content processing device and audio content processing method Download PDF

Info

Publication number
WO2012124268A1
WO2012124268A1 PCT/JP2012/001384 JP2012001384W
Authority
WO
WIPO (PCT)
Prior art keywords
audio content
audio
pointer
sound
weight
Prior art date
Application number
PCT/JP2012/001384
Other languages
French (fr)
Japanese (ja)
Inventor
友朗 丸山
Original Assignee
Panasonic Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Corporation
Publication of WO2012124268A1 publication Critical patent/WO2012124268A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/04Circuits for transducers, loudspeakers or microphones for correcting frequency response

Definitions

  • the present invention relates to an audio content processing apparatus and an audio content processing method for performing processing such as reproduction on a plurality of audio contents each weighted.
  • Patent Document 1 describes a technique for displaying the weight of each audio content.
  • the technique described in Patent Document 1 displays a list of combinations of channel numbers (weights) and broadcast programs in Internet radio on a display. Thereby, the user can perform a broadcast program switching operation while confirming the channel number of each broadcast program. That is, the technique described in Patent Document 1 can display the weight of each audio content to the user.
  • However, Patent Document 1 has a problem in that the weight of each audio content cannot be presented in a way the user can grasp intuitively. This is because, in order to confirm the weight of an audio content, the user must check the displayed content by looking at the display each time. Considering an operation in which a plurality of audio contents are checked while being switched at high speed, an audio content processing apparatus that can present the weight of each audio content so that it can be understood intuitively is desired.
  • An object of the present invention is to present the weight of each audio content so that it can be intuitively grasped, without using a display.
  • FIG. 2 is a block diagram showing an example of the configuration of an audio content processing apparatus according to the first embodiment.
  • A diagram showing an example of the information list to which the content arrangement information according to Embodiment 1 has been added
  • A diagram showing a first example of the position sound quality conversion rule according to Embodiment 1, and a sequence diagram showing an example of the operation of the audio content reproduction system according to Embodiment 1
  • A block diagram showing an example of the configuration of an audio content processing apparatus according to Embodiment 2, and a diagram showing an example of the definition of head orientation and position according to Embodiment 2
  • FIG. 1 is a diagram showing an example of the appearance of an audio content reproduction system in which the audio content processing apparatus according to Embodiment 1 of the present invention is used.
  • the audio content reproduction system 100 will be described with an audio output device 200 and a portable player 300 provided with the audio content processing device according to Embodiment 1 of the present invention as an example.
  • the audio output device 200 is a device that converts audio data into audio and outputs the audio.
  • the audio output device 200 includes a monaural audio data transmission cable 210 and a monaural earphone 220 worn on one ear of a person.
  • the portable player 300 is a monaural audio player.
  • the portable player 300 places a plurality of weighted audio contents on a weight axis that is a virtual coordinate axis. Then, the portable player 300 switches and reproduces the audio content by moving the pointer on the weight axis.
  • Intuitively, the weight corresponds to a channel number on a radio, the weight axis corresponds to the rotation angle of the knob that switches the channel, and the movement of the pointer corresponds to turning that knob.
  • the portable player 300 has a small casing 301 as its outer shape, and has a user operation input device 310 on the surface of the casing 301.
  • the user operation input device 310 receives various operations such as a pointer position moving operation and a determination operation for audio content from the user.
  • The user operation input device 310 has a “return” button 311, a “decision” button 312, and a “forward” button 313, and detects presses and releases of these buttons as operation events.
  • other forms of the user operation input device 310 include, for example, a joystick, a touch panel, or a remote control device separate from the portable player 300.
  • the portable player 300 may further include a display on the surface of the housing 301.
  • the audio content processing apparatus (not shown) according to the present embodiment provided in the portable player 300 moves the pointer on the above-described weight axis in response to the pointer movement operation by the user operation input apparatus 310. That is, the audio content processing apparatus moves the pointer between a plurality of audio contents. Also, every time the pointer position overlaps the position of the audio content, the audio content processing apparatus temporarily reproduces the audio of the audio content. Then, the audio content processing apparatus receives the audio content determination operation from the user operation input device 310, and reproduces the audio of the audio content where the pointer is located at that time. Note that audio reproduction is performed by transmitting audio data to the monaural earphone 220 via the cable 210. In this example, the audio data is transmitted through the cable 210, but it can also be transmitted wirelessly.
  • the audio content where the pointer is located is referred to as “selected audio content”.
  • the audio content that is the target of the determination operation is referred to as “determined audio content”.
  • the audio content processing apparatus arranges a plurality of audio contents on the above-described weight axis. Then, the audio content processing device presents the weight of the selected audio content with the sound quality of the marker sound associated with the weight until the determination operation is performed with the pointer. That is, the audio content processing apparatus generates a marker sound that indicates the weight corresponding to the pointer position by sound quality, and transmits the audio data to the monaural earphone 220 via the cable 210. At this time, the arrangement of the plurality of audio contents on the weight axis matches, for example, the order of the weight of the audio contents.
  • The marker sound is an intermittent simple tone (“pong, pong, ...”), and serves as an audio pointer indicating the position of the pointer.
  • The marker sound presents the weight of the audio content where the pointer is located by a change in pitch.
  • FIG. 2 is a block diagram showing an example of the configuration of the audio content processing apparatus according to the present embodiment. Here, for convenience of explanation, other peripheral devices are also illustrated.
  • The audio content processing device 400 includes an information storage unit 410, a position arrangement unit 420, a pointer position acquisition unit 430, a presentation position calculation unit 440, a marker sound generation unit 450, an audio reproduction control unit 460, and an audio stream generation unit 470.
  • the information storage unit 410 has general database functions such as information recording, correction, deletion, search, and reading, and stores audio data of audio contents and attributes (ID, storage position, weight, etc.) assigned thereto.
  • the information storage unit 410 acquires an information list from the external file system 510 and the audio content supply server 520 via a communication network such as the Internet, and stores the information list.
  • the information list is a list describing identification information of weighted audio contents.
  • the file system 510 is a storage device that stores audio data that is the main body of audio content.
  • The audio content supply server 520 is assumed to be a search server having a function of searching audio content. Specifically, the audio content supply server 520 searches the audio content of the file system 510 in response to a search query from the audio content processing device 400. Then, the audio content supply server 520 returns, to the audio content processing apparatus 400, information on 999 audio contents, each assigned its search match score as a weight.
  • The information storage unit 410 generates the above information list based on the returned information. Illustration and description of the functional unit in the audio content processing apparatus 400 that issues the search query and receives the returned information are omitted.
  • the position arrangement unit 420 associates the identification information of the plurality of audio contents in the information list with the coordinates of the position corresponding to each weight among the above-described weight axes. Hereinafter, this association is referred to as “arrangement of audio content on the weight axis”. Then, the position arrangement unit 420 adds content arrangement information indicating the arrangement of each audio content (correspondence between the audio content identification information and the weight axis coordinates) to the information list.
  • the pointer position acquisition unit 430 acquires operations such as a pointer position moving operation and a determination operation on the current position of the pointer (corresponding audio content) performed by the user operation input device 310.
  • the pointer position acquisition unit 430 acquires a pointer position movement operation
  • the pointer position acquisition unit 430 outputs pointer operation information indicating the direction and degree of the movement to the presentation position calculation unit 440.
  • the pointer position acquisition unit 430 acquires a determination operation for the current position of the pointer
  • the pointer position acquisition unit 430 outputs determination operation information indicating that to the audio reproduction control unit 460.
  • the presentation position calculation unit 440 calculates the current position of the pointer when the pointer is moved on the weight axis according to the pointer operation information. In other words, the presentation position calculation unit 440 moves the pointer to the audio content that the user pays attention to among the plurality of audio contents each weighted.
  • The presentation position calculation unit 440 may move the pointer only to positions where audio content is located, according to the degree indicated by the pointer operation information, or may move the pointer regardless of the presence or absence of audio content.
  • the presentation position calculation unit 440 in the present embodiment moves the pointer regardless of the presence or absence of audio content. Then, the presentation position calculation unit 440 outputs pointer position information indicating the current position of the pointer to the marker sound generation unit 450 and the audio reproduction control unit 460.
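  • As a rough illustration of the pointer handling described above, the following Python sketch keeps a scalar pointer coordinate on the weight axis and shifts it by the direction and degree reported in the pointer operation information; the class and parameter names, and the axis bounds, are assumptions made for illustration, not taken from the patent.

```python
class PresentationPositionCalculator:
    """Minimal sketch: track the pointer's coordinate on the weight axis."""

    def __init__(self, axis_min=0.0, axis_max=1400.0):  # bounds are illustrative only
        self.position = 0.0  # the pointer starts at the origin of the weight axis
        self.axis_min = axis_min
        self.axis_max = axis_max

    def move(self, direction, degree):
        """direction: +1 ('forward') or -1 ('return'); degree: amount of movement."""
        self.position += direction * degree
        # keep the pointer inside the populated part of the weight axis
        self.position = min(max(self.position, self.axis_min), self.axis_max)
        return self.position
```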
  • the marker sound generation unit 450 determines the sound quality of the marker sound from the current position of the pointer indicated by the pointer position information. Then, the marker sound generation unit 450 generates audio data of the marker sound having the determined sound quality and outputs it to the audio stream generation unit 470. That is, the marker sound generation unit 450 uses the audio stream generation unit 470 to present the weight of the audio content at which the pointer is located with the sound quality of the marker sound associated with the weight. In addition, the marker sound generation unit 450 uses the audio stream generation unit 470 to present the change in weight between a certain audio content and the next audio content with the sound quality of the marker sound.
  • the audio reproduction control unit 460 calculates the sound field of the virtual sound field space based on the latest pointer position information and content arrangement information, and constructs the virtual sound field space.
  • This virtual sound field space is a virtual sound field space that includes at least the current position of the pointer on the weight axis and outputs the marker sound and the sound of each sound content from each position.
  • the virtual sound field space is a one-dimensional space extending in front of the user.
  • the audio reproduction control unit 460 may handle the weight axis and the virtual sound field space in the same coordinate system. In this case, the construction process of the virtual sound field space is not necessarily required. Then, the audio reproduction control unit 460 outputs sound field information indicating the constructed virtual sound field space to the audio stream generation unit 470.
  • the audio reproduction control unit 460 specifies the selected audio content as the audio content determined by the user, and performs a predetermined process.
  • the predetermined process is a process of causing the audio stream generation unit 470 to stop outputting the marker sound and reproduce the determined audio content.
  • the audio playback control unit 460 also acquires pointer operation information from the pointer position acquisition unit 430 while the determined audio content is being played back.
  • For example, when the “return” button 311 is pressed during playback of the determined audio content, the audio playback control unit 460 moves the playback position backward; when the “forward” button 313 is pressed, it advances the playback position; and when the “decision” button 312 is pressed, it stops playback.
  • For example, the audio content processing apparatus 400 has a CPU (central processing unit), a storage medium such as a ROM (read only memory) storing a control program, a working memory such as a RAM (random access memory), and a communication circuit. In this case, the function of each unit described above is realized by the CPU executing the control program.
  • Such an audio content processing apparatus 400 can present the weight of the audio content where the pointer is located (the selected audio content) with the sound quality of the marker sound associated with the weight.
  • Sound quality can be grasped in a very short time compared with grasping information displayed on a display, without work such as moving the line of sight or reading characters. Further, one-dimensional information such as a weight can be presented through sound quality in a way the user can sufficiently grasp. Therefore, the audio content processing apparatus 400 can present the weight of the selected audio content so that the user can understand it intuitively, without using a display.
  • the audio content processing apparatus 400 can arrange the audio content in the virtual sound field space and render the audio content desired by the user at a timing desired by the user. Therefore, the audio content processing apparatus 400 can provide a comfortable audio content reproduction environment to the user.
  • sound field is a commonly used term and refers to a “real” space where “real” sounds (sound waves) exist.
  • the “sound field” typically refers to the space of a concert venue, where there are sound sources and walls that reflect or absorb sound.
  • the “virtual sound field” is a sound field generated “virtually” around the user by adjusting the sound that enters the user's ear. Although the user feels that there is a sound field (that is, a sound source or a wall) in the surroundings, there is actually no sound field that the user feels. Technologies for creating a virtual sound field include so-called surround technology and three-dimensional sound technology.
  • the “virtual sound field space” in the present embodiment is a space in which a sound source and a wall position are arranged, which a user who hears the virtual sound field will feel.
  • the audio content processing apparatus 400 constructs an arrangement such as a sound source or a wall as a virtual sound field space, and applies a surround technology or a stereophonic technology to this.
  • the sound generated as a result is a sound that makes the user feel the constructed virtual sound field space as an actual sound field space when listening with a speaker or headphones.
  • the audio reproduction control unit 460 constructs a virtual sound field space with the position of the pointer as the position of the listener. Therefore, in this embodiment, when the pointer position changes, the weight axis on which the audio content is arranged does not change, but the virtual sound field space changes.
  • the audio reproduction control unit 460 constructs a virtual sound field space in which the weight axis is a straight line and the position of the listener of the audio data is arranged at the pointer position on the weight axis.
  • It is desirable that the audio reproduction control unit 460 construct the virtual sound field space according to the positional relationship between the audio output device 200, such as a speaker or headphones, and the user's ears. In the present embodiment, the audio reproduction control unit 460 constructs the virtual sound field space on the assumption that the predetermined monaural earphone 220 worn on the user's ear is used as the audio output device 200.
  • the file system 510 which is an external device, stores a large number of audio contents such as speeches, readings, and monologues. It is assumed that the user uses an information processing apparatus such as a personal computer or a cellular phone (not shown), designates the audio content processing apparatus 400 as a reply destination, and transmits an arbitrary keyword as a search query to the audio content supply server 520.
  • the audio content supply server 520 searches the attributes of the audio content in the file system 510 (such as the title of the audio content, the name of the speaker, the place name of the recording location, and the recording date / time) using keywords. Then, the audio content supply server 520 returns a search result including the search match score to the audio content processing apparatus 400.
  • FIG. 3 is a diagram illustrating an example of the operation of the audio content processing apparatus 400.
  • In step S1100, the information storage unit 410 acquires information from the audio content supply server 520 and generates and stores an information list from it.
  • FIG. 4 is a diagram showing an example of the information list.
  • the information list 610 describes, for each audio content, an ID 611 that is identification information of the audio content, a storage location 612, a weight 613, and a weight order 614 in association with each other.
  • the storage location 612 is a storage location of audio data of audio content.
  • The weight 613 is the search match score assigned by the audio content supply server 520.
  • The weight order 614 is the rank of each content according to its weight 613.
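  • For illustration, a possible in-memory form of such an information list is sketched below in Python; the field names mirror FIG. 4 (ID, storage location, weight, weight order), while the concrete values and file locations are invented placeholders.

```python
# Hypothetical search results returned by the audio content supply server:
# each entry carries an ID, a storage location, and a search match score (weight).
search_results = [
    {"id": 1, "location": "file:///audio/0001.wav", "weight": 90},
    {"id": 2, "location": "file:///audio/0002.wav", "weight": 35},
    {"id": 3, "location": "file:///audio/0003.wav", "weight": 12},
]

# Weight order 614: rank of each content when sorted by descending weight.
ranked = sorted(search_results, key=lambda r: r["weight"], reverse=True)
information_list = [dict(r, weight_order=rank + 1) for rank, r in enumerate(ranked)]
```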
  • In step S1200 of FIG. 3, the position arrangement unit 420 arranges the audio contents listed in the information list on the weight axis in the order of their respective weights. Then, the position arrangement unit 420 adds content arrangement information, in which the ID of each audio content is associated with a coordinate value on the weight axis, to the information list.
  • The position arrangement unit 420 has, in advance, a weight position conversion rule for converting the weight described in the information list into a weight axis coordinate, and places each audio content on the weight axis according to that rule.
  • The weight position conversion rule arranges the audio contents so that, when the coordinate value on the weight axis changes in one direction, the weight of the arranged contents also changes in one direction. That is, the rule is such that exactly one of the following formulas (1) and (2) always holds.
  • Here, w_n is the weight of the audio content with ID “n”, x_n is the coordinate value of the audio content with ID “n”, w_m is the weight of the audio content with ID “m”, and x_m is the coordinate value of the audio content with ID “m”.
  • If w_n < w_m, then x_n < x_m  (1)
  • If w_n < w_m, then x_n > x_m  (2)
  • In the present embodiment, the weight position conversion rule arranges the audio contents along the coordinate axis in order of weight, starting from the side close to 0, such that the ratio of differences between coordinate values is approximately equal to the ratio of differences between weights, and places the audio content with the highest weight at the origin.
  • the weight position conversion rule is defined by a function or a correspondence table, for example.
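  • A minimal Python sketch of one such rule is shown below. It is an assumption for illustration, not the patent's exact rule: contents are placed in descending order of weight, the heaviest content at the origin, with coordinate differences proportional to weight differences, which satisfies formula (2) above.

```python
def place_on_weight_axis(information_list, scale=10.0):
    """Map each content ID to a weight-axis coordinate (content arrangement information).

    Heavier content -> smaller coordinate; the heaviest content sits at the origin,
    and coordinate differences are proportional to weight differences (scale is arbitrary).
    """
    w_max = max(c["weight"] for c in information_list)
    return {c["id"]: (w_max - c["weight"]) * scale for c in information_list}
```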
  • FIG. 5 is a diagram illustrating an example of an information list to which content arrangement information is added.
  • the information list 620 to which the content arrangement information is added describes the presentation position 621 indicating the coordinate value of the weight axis in association with each audio content ID 611.
  • Note that the position arrangement unit 420 may generate the content arrangement information separately from the information list and store it in the information storage unit 410 or output it to the audio reproduction control unit 460.
  • Strictly speaking, the audio reproduction control unit 460 should construct the virtual sound field space (that is, calculate the audio stream) in consideration of wall reflection and sound propagation speed.
  • In the present embodiment, however, the audio stream is monaural audio and the listener position is arranged on a linear weight axis. Therefore, the audio reproduction control unit 460 in the present embodiment may construct a virtual sound field space that ignores wall reflection and the head-related transfer function (left/right sound quality, phase, timing shift, and the like) and takes into account only the attenuation of sound with distance.
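  • A hedged sketch of this simplified sound field calculation follows: wall reflection and head-related transfer functions are ignored and only attenuation with distance is applied. The inverse-distance model with a lower bound is an assumption chosen for illustration; the patent does not specify the attenuation curve.

```python
def content_gains(pointer_position, placement, min_distance=1.0):
    """Return a per-content gain based only on distance from the pointer on the weight axis."""
    gains = {}
    for content_id, coordinate in placement.items():
        distance = abs(coordinate - pointer_position)
        gains[content_id] = 1.0 / max(distance, min_distance)  # closer content is louder
    return gains
```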
  • In step S1600, the marker sound generation unit 450 generates audio data of the marker sound with the determined sound quality and outputs the audio data to the audio stream generation unit 470. Also, the audio reproduction control unit 460 outputs the audio data of each audio content to the audio stream generation unit 470. Then, the audio stream generation unit 470 generates and outputs an audio stream (audio data) that realizes the sound field of the constructed virtual sound field space.
  • As a result, the user hears the sound of the first audio content 631 and the marker sound at a close distance, and hears the sounds of the second and third audio contents 632 and 633 from farther away.
  • In step S1700, the audio reproduction control unit 460 determines whether or not there has been a determination operation on the audio content. Specifically, the audio reproduction control unit 460 determines whether determination operation information has been input. If there is no determination operation (S1700: NO), the audio reproduction control unit 460 proceeds to step S1800.
  • In step S1800, the audio reproduction control unit 460 determines whether an instruction to end the process has been given by a user operation or the like. If no such instruction has been given (S1800: NO), the process returns to step S1500.
  • FIG. 7 is a diagram illustrating an example of how the pointer moves.
  • The marker sound generation unit 450 changes the marker sound to the sound quality corresponding to the current pointer position after the movement. Specifically, the marker sound generation unit 450 converts the current pointer position into a sound quality parameter value using the position sound quality conversion rule described below, and uses the obtained value to generate subsequent marker sounds.
  • The position sound quality conversion rule determines the value of the sound quality parameter so that, when the coordinate value of the pointer position changes in one direction, the value of the sound quality parameter also changes in one direction. That is, the rule is such that exactly one of the following formulas (4) and (5) always holds.
  • Here, B_n is the sound quality parameter when the pointer is located at the coordinate value x_n, and B_m is the sound quality parameter when the pointer is located at the coordinate value x_m. The position sound quality conversion rule is defined by a function or a correspondence table, for example.
  • If x_n < x_m, then B_n < B_m  (4)
  • If x_n < x_m, then B_n > B_m  (5)
  • FIG. 8 is a diagram showing a first example of the position sound quality conversion rule.
  • the position sound quality conversion rule is a function 651 that converts a coordinate value to a marker sound frequency so that the coordinate value and the marker sound frequency have a negative proportional relationship, for example.
  • When this function 651 is applied, the marker sound has the highest pitch when the pointer is located at the origin (that is, in the initial state), and the pitch of the marker sound becomes lower as the pointer moves away from the origin.
  • the presentation position “0” is associated with the frequency “4040 Hz”.
  • the presentation position “900” is associated with the frequency “3500 Hz”, and the presentation position “1400” is associated with the frequency “3200 Hz”. Therefore, in the example of FIG. 7, the frequency of the marker sound is 4040 Hz at the position 631 of the first audio content, and 3500 Hz (approximately the pitch of A7) at the position 632 of the second audio content.
  • the frequency of the marker sound is 3200 Hz at the position 633 of the third audio content.
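  • The three sample points quoted above (position 0 → 4040 Hz, 900 → 3500 Hz, 1400 → 3200 Hz) lie on a straight line with a slope of −0.6 Hz per coordinate unit, so a function-651-style rule could be sketched as follows; the exact coefficients are an inference from those points, not stated in the patent.

```python
def position_to_frequency(x):
    """Negative proportional relation: the farther the pointer is from the origin, the lower the pitch."""
    return 4040.0 - 0.6 * x

for x in (0, 900, 1400):
    print(x, round(position_to_frequency(x)))  # prints 4040, 3500, 3200
```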
  • As described above, the weight position conversion rule arranges the weights so that, when the coordinate value on the weight axis changes in one direction, the arranged weight also changes in one direction. Therefore, the audio content processing apparatus 400 determines the sound quality parameter so that exactly one of the following equations (6) and (7) always holds, and changes the marker sound accordingly. In other words, in the audio content processing apparatus 400, the sound quality of the marker sound changes in one direction when the position of the pointer on the weight axis changes in one direction.
  • If w_n < w_m, then B_n < B_m  (6)
  • If w_n < w_m, then B_n > B_m  (7)
  • the audio reproduction control unit 460 ends the series of processes.
  • FIG. 9 is a sequence diagram showing an example of the operation of the audio content reproduction system 100.
  • The corresponding step numbers are assigned to the portions that correspond to the processing of the audio content processing apparatus 400 shown in FIG. 3.
  • the audio reproduction control unit 460 constructs a virtual sound field space (B01). Then, the audio reproduction control unit 460 causes the audio stream generation unit 470 to start generating an audio stream (B02), and causes the marker sound generation unit 450 to start generating a marker sound (B03) (S1400). Then, the audio stream generation unit 470 generates an audio stream including the marker sound and the audio content audio, and outputs the audio stream from the audio output device 200 (B04) (S1600).
  • the pointer position acquisition unit 430 outputs pointer operation information to the presentation position calculation unit 440 (C01).
  • the presentation position calculation unit 440 calculates the pointer position after movement, and outputs new pointer position information to the marker sound generation unit 450 and the audio reproduction control unit 460 (C02).
  • the audio reproduction control unit 460 reconstructs the virtual sound field space (D01).
  • the audio reproduction control unit 460 causes the audio stream generation unit 470 to start generating the audio stream (D02), and causes the marker sound generation unit 450 to start generating the marker sound whose sound quality has been changed (D03) ( S1900, S2000).
  • As described above, when the audio content processing apparatus 400 moves the pointer between a plurality of audio contents, it presents the weight of the audio content where the pointer is located with the sound quality of the marker sound associated with that weight. As a result, the audio content processing apparatus 400 can present the weight of each audio content to the user in an easily understandable manner.
  • the audio content processing apparatus 400 can present the user with an intuitive understanding of which direction the pointer is moving. That is, the audio content processing apparatus 400 can present a change in the weight corresponding to the pointer as a change in the sound quality of the marker sound.
  • The audio content processing apparatus 400 also allows the user to move the pointer easily over the entire range. Therefore, the audio content processing apparatus 400 can present to the user, in an intuitive manner, where the weight of the audio content at which the pointer is currently located falls within the whole set of audio contents.
  • The range over which the sound quality of the marker sound can change, such as the audible range or the frequency band that the speaker can output, is fixed to some extent. Therefore, the audio content processing apparatus 400 can present not only the relative position of the weight, indicated by the degree and direction of weight change, but also the approximate absolute position of the weight within the whole.
  • In addition, the audio content processing apparatus 400 can present the degree of change in the weight to the user not only through the degree of change in the sound quality of the marker sound but also through the length of time required for the movement.
  • User interfaces on mobile terminals are diversifying.
  • With operation interfaces such as touch UIs (user interfaces), the visibility of the GUI (graphical user interface) being operated has improved.
  • However, for information that can be expressed by audio, it is often more convenient to present it as audio rather than visually, and weight information is exactly such information. Therefore, the audio content processing apparatus 400 according to the present embodiment can provide the mobile terminal with a user interface having improved operability.
  • Since the audio content processing apparatus 400 presents the weights intuitively without using a display, it enables the user to quickly confirm the respective weights of a large number of audio contents. Therefore, the user can, with little burden, pick out audio contents from an enormous collection in order of weight.
  • Such work includes, for example, picking up a large number of voice mails or bulletin board entries in descending order of priority.
  • Another example is the work of picking up the highest number of matches from a radio program with a large number of channels.
  • weight position conversion rule and the position sound quality conversion rule used by the audio content processing apparatus 400 are not limited to the above example.
  • FIG. 10 is a diagram showing a second example of the position sound quality conversion rule.
  • The position sound quality conversion rule may also be, for example, a function 652 that converts a coordinate value into a marker sound frequency so that the frequency of the marker sound increases in positive proportion to the coordinate value.
  • In this case, the pitch at the origin is 0 Hz, that is, no sound is produced; as the pointer position moves away from the origin, the marker sound becomes audible and its pitch becomes higher.
  • FIG. 11 is a diagram showing a third example of the position sound quality conversion rule.
  • In FIG. 11, AS_min to AS_max indicate a predetermined range of marker sound pitch, such as the audible range.
  • The position sound quality conversion rule may be, for example, a function 653 that converts a coordinate value into a marker sound frequency so that the marker sound frequency increases exponentially as the coordinate value decreases.
  • This function 653 is substantially equivalent to the function 651 shown in FIG. 8 with the vertical axis expressed as a scale instead of a frequency. Note that the output of the function 653 may be quantized to a limited set of pitches, such as the 12-tone scale.
  • FIG. 12 is a diagram showing a fourth example of the position sound quality conversion rule.
  • The position sound quality conversion rule may also be, for example, a function 654 that converts the coordinate value into the pitch of the marker sound so that the scale of the marker sound increases exponentially as the coordinate value decreases. This is almost equivalent to the function 653 shown in FIG. 11 with the values on its vertical axis rescaled. In this case, the change in scale relative to a change in coordinate value is large near the origin. Therefore, the audio content processing apparatus 400 can change the sound quality of the marker sound so that it is sensitive to weight differences between audio contents having large weights and insensitive to weight differences between audio contents having small weights.
  • Here, the scale is a semitone number in which A0 (27.5 Hz) is defined as 0, and sounds higher than that by one semitone at a time are sequentially numbered 1, 2, 3, and so on.
  • The relationship between the scale n and the frequency f(n) [Hz] is expressed by the following equation (8):
  • f(n) = 27.5 × 2^(n/12), where 0 ≤ n ≤ 96  (8)
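  • Equation (8) and its inverse can be written directly in Python as below (for example, A4 = 440 Hz is 48 semitones above A0):

```python
import math

def scale_to_frequency(n):
    """Semitone number n (A0 = 27.5 Hz is 0) to frequency in Hz, per equation (8)."""
    return 27.5 * 2 ** (n / 12)

def frequency_to_scale(f):
    """Inverse mapping: frequency in Hz to semitone number relative to A0."""
    return 12 * math.log2(f / 27.5)

print(scale_to_frequency(0))           # 27.5  (A0)
print(round(frequency_to_scale(440)))  # 48    (A4 is four octaves above A0)
```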
  • The sound quality conversion rules exemplified in FIGS. 11 and 12 correspond to the fact that the human ear's sensitivity to pitch differences is exponential, which makes it easier for the user to grasp the weight; they are therefore practically preferable.
  • Alternatively, the audio content processing apparatus 400 may use a weight sound quality conversion rule that converts the weight described in the information list directly into the sound quality parameter value of the marker sound.
  • FIG. 13 is a diagram showing an example of a weighted sound quality conversion rule.
  • The weight sound quality conversion rule can be, for example, a function 655 that converts the weight into the pitch of the marker sound so that the scale of the marker sound increases exponentially as the weight increases.
  • the monaural earphone is provided as the audio output device.
  • a stereo earphone, a binaural headphone, a monaural speaker, or a stereo speaker may be used.
  • the audio stream generation unit may acquire the type, arrangement, and performance (monaural, stereo, multi-channel, etc.) of the audio output device, and change the audio stream generation method for each type and performance.
  • The audio reproduction control unit may also construct a virtual sound field space in which each content is assigned a wide section (for example, the sections 636 to 638 shown in FIG. 7) and only the audio of the audio content of the section where the pointer is located is output.
  • Although the audio content processing apparatus always outputs intermittent marker sounds in the present embodiment, the marker sounds may instead be output only at specific timings.
  • For example, the audio content processing apparatus may vary the output timing of the marker sound depending on the type of the audio content.
  • the first is a pattern in which a marker sound is sounded for a certain period prior to the playback of each audio content.
  • For example, when the marker sound is a single piano tone, a piano tone (“pawn”) having a pitch corresponding to the position of the audio content is heard just before the audio content is output.
  • Another pattern always sounds a marker sound regardless of whether or not audio content is being reproduced.
  • a marker sound having a sound quality corresponding to the pointer position flows as a background sound. Therefore, when the function 653 shown in FIG. 11 is used as the position sound quality conversion rule, the background sound that gradually changes to a lower pitch as the pointer position moves away from the initial position continues continuously or intermittently.
  • A pattern in which the marker sound is not generated while audio content is being reproduced can also be considered.
  • In this case as well, the marker sound indicates in which range the weight of the audio content where the pointer is located falls.
  • a plurality of audio contents can be arranged in different directions. Further, when a person listens to sound, the head (face) is usually directed in the direction in which the sound comes. Therefore, in the second embodiment of the present invention, a plurality of audio contents are arranged so as to surround the front of the user in the virtual sound field space, and the pointer can be operated in the direction of the user's head.
  • FIG. 14 is a diagram showing an example of the appearance of an audio content reproduction system in which the audio content processing apparatus according to the present embodiment is used, and corresponds to FIG. 1 of the first embodiment.
  • the same parts as those in FIG. 1 are denoted by the same reference numerals, and description thereof will be omitted.
  • the audio content reproduction system 100a in the present embodiment includes an audio output device 200a instead of the audio output device 200 of FIG.
  • the audio output device 200a includes a stereo audio data transmission cable 211a and stereo headphones 221a attached to a human head.
  • a motion sensor 320a that detects the movement of the stereo headphones 221a (that is, the movement of the user's head) is attached to the stereo headphones 221a.
  • the audio content reproduction system 100a includes a portable player 300a that incorporates an audio content processing device different from the audio content processing device of the first embodiment, instead of the portable player 300 of FIG.
  • the motion sensor 320a detects acceleration and transmits acceleration information as a detection result to the audio content processing device of the portable player 300a by wireless communication or wired communication (for example, communication using the cable 211a).
  • FIG. 15 is a block diagram showing an example of the configuration of the audio content processing apparatus according to the present embodiment, and corresponds to FIG. 2 of Embodiment 1. The same parts as those in FIG. 2 are denoted by the same reference numerals, and description thereof is omitted.
  • the audio content processing apparatus 400a includes a head orientation acquisition unit 480a in addition to the configuration of FIG.
  • the audio content processing apparatus 400a includes a presentation position calculation unit 440a and an audio reproduction control unit 460a instead of the presentation position calculation unit 440 and the audio reproduction control unit 460 of FIG.
  • The head orientation acquisition unit 480a receives acceleration information from the motion sensor 320a, and acquires the orientation of the head of the user wearing the stereo headphones 221a based on the received acceleration information. More specifically, the head orientation acquisition unit 480a calculates the orientation of the head relative to the user's front (hereinafter referred to as “head orientation”) by integrating the acceleration from a state in which the user is stationary and facing the front. Then, the head orientation acquisition unit 480a outputs head orientation information indicating the head orientation to the audio reproduction control unit 460a.
  • The audio reproduction control unit 460a constructs a virtual sound field space in which the position of the listener is arranged at a position, away from the weight axis described above, corresponding to the pointer position. Specifically, the audio reproduction control unit 460a slides the weight axis on which the audio content is arranged along an arc extending laterally in front of the listener, according to the pointer position. More specifically, the audio reproduction control unit 460a slides the weight axis so that the pointer position is included in the arc.
  • this arc is referred to as a “presentation window”, and a range included in the arc on the weight axis is referred to as a “pointer range”.
  • the sound reproduction control unit 460a constructs a virtual sound field space in a state where the listener's head direction is directed to the pointer position based on the head direction information. Therefore, in this embodiment, when the user's head orientation changes, the weight axis on which the audio content is arranged, the pointer position, and the presentation range do not change, but the virtual sound field space changes.
  • the presentation position calculation unit 440a calculates the current position of the pointer when the pointer is moved on the weight axis according to the pointer operation information and the head orientation information. Specifically, the presentation position calculation unit 440a moves the pointer range in accordance with pointer operation information (that is, an operation using the “return” button 311 and the “forward” button 313). Then, the presentation position calculation unit 440a moves the pointer within the pointer range according to the head direction information (that is, the operation based on the head direction).
  • Such an audio content processing apparatus 400a can arrange the weight axis in an arc shape with the weight axis extending laterally in front of the user. Thereby, the audio content processing apparatus 400a can arrange a plurality of audio contents in different directions. Therefore, the audio content processing apparatus 400a can make it easy for a user to distinguish between a plurality of audio contents, and even when audio is output at the same time, each audio can be easily distinguished.
  • The audio content processing apparatus 400a can move the pointer range according to the button operation, and can move the pointer within the pointer range (within the portion of the weight axis located in the presentation window) according to the user's head orientation.
  • the audio content processing apparatus 400a allows the user to use both small pointer movements one by one and large pointer movements for each presentation window, thereby improving the operability of pointer movement.
  • FIG. 16 is a diagram showing an example of head orientation and position definitions.
  • As shown in FIG. 16, the audio content processing apparatus 400a defines, for example, the front direction 662 of the body 661 of the user (listener) by a declination angle α measured from the negative X-axis direction of a predetermined XY coordinate system.
  • This XY coordinate system is a coordinate system arranged horizontally with the head 663 of the user (listener) as the origin.
  • The audio content processing apparatus 400a defines the head orientation of the head 663 of the user (listener) by the angle β of the front direction 664 of the head 663 with respect to the front direction 662 of the body 661.
  • The audio content processing apparatus 400a defines a position 665 on the pointer range by a declination angle γ with respect to the negative X-axis direction of the same XY coordinate system and a distance r from the origin of the XY coordinate system (that is, in polar coordinates).
  • The audio content processing apparatus 400a maps the coordinate value x of the weight axis to an angle corresponding to the declination angle γ of the pointer range.
  • The audio content processing apparatus 400a defines each angle with the clockwise direction, as viewed from above the user, as positive. Also, the audio content processing apparatus 400a sets the XY coordinate system and sets the angle β to 0 degrees in the state where the front direction 664 of the head 663 coincides with the front direction 662 of the body 661. Also, the audio content processing apparatus 400a arranges the presentation window of the weight axis on a circle 666 whose distance from the origin of the XY coordinate system is r.
  • the position arrangement unit 420 uses a weight position conversion rule in which each audio content is arranged at intervals of 30 degrees.
  • This weight position conversion rule is expressed by, for example, the following equation (9), where x_i is the coordinate value of the content with ID “i” and Ord(w_i) is its weight order (see FIG. 4).
  • x_i = Ord(w_i) × 30  (9)
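  • Equation (9) can be sketched as below; the conversion to a Cartesian position on the circle 666 depends on the angle conventions defined above (negative X-axis reference, clockwise positive), so only the polar pair (declination angle, radius) is returned.

```python
def content_polar_position(weight_order, r=1.0, step_deg=30):
    """Place a content at 30-degree intervals by weight order, per equation (9)."""
    gamma = weight_order * step_deg  # declination angle of the content on the pointer range
    return gamma, r                  # polar coordinates relative to the listener's head
```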
  • FIG. 17 is a diagram illustrating an example of the operation of the audio content processing apparatus, and corresponds to FIG. 3 of the first embodiment.
  • the same parts as those in FIG. 3 are denoted by the same step numbers, and description thereof will be omitted.
  • In step S1400a, the audio reproduction control unit 460a constructs a virtual sound field space in which the arc-shaped presentation window is arranged in front of the user.
  • FIG. 18 is a diagram illustrating an example of a presentation window.
  • As shown in FIG. 18, the audio reproduction control unit 460a uses as the presentation window an arc 668 at a distance r from the origin of the XY coordinate system, within a range 667 of ±90 degrees around the front direction 662 of the body 661, independent of the front direction 664 of the head 663.
  • the range including the initial position of the pointer is the pointer range.
  • The declination angle γ of the pointer range is expressed by, for example, the following formula (10), using the declination angle α of the front direction of the body:
  • α − 90 ≤ γ ≤ α + 90  (10)
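  • Formula (10) amounts to a simple membership test; the sketch below works in degrees and, as a simplifying assumption, ignores angle wrap-around.

```python
def in_pointer_range(gamma, alpha):
    """True if declination angle gamma lies within +/-90 degrees of the body's front direction alpha."""
    return alpha - 90 <= gamma <= alpha + 90
```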
  • In step S1410a of FIG. 17, the head orientation is acquired based on the acceleration information, and head orientation information is generated.
  • In step S1500a, the presentation position calculation unit 440a and the marker sound generation unit 450 determine whether the pointer has moved, based on whether there has been at least one of a change in the pointer range and a change in the head orientation.
  • Here, a change in the pointer range corresponds to an input of pointer operation information, and a change in the head orientation corresponds to a change in the head orientation information.
  • In step S2000a, the audio reproduction control unit 460a reconstructs the virtual sound field space according to the current head orientation and pointer range.
  • By the above operation, the audio content processing apparatus 400a can arrange the weight axis so that it spreads laterally in front of the user, and can move the pointer range and the pointer position according to the user's button operations and head orientation.
  • FIG. 19 is a diagram showing an example of the movement of the pointer position and the change of the marker sound accompanying the change in the head direction.
  • In FIG. 19, assume that the front direction 664 of the head 663 is set to the position of the third audio content 633 among the first to fifth audio contents 631 to 635 arranged in weight order. In this case, the pointer is positioned in the front direction 664 of the head 663, that is, at the third audio content 633. Therefore, the sound quality of the marker sound 636 becomes the sound quality corresponding to the weight of the third audio content 633.
  • the audio reproduction control unit 460a sets the pointer range in a range including the first to fifth audio contents 631 to 635 centering on the third audio content 633 on the weight axis.
  • In this state, the audio content processing apparatus 400a lets the user hear the audio of the first to fifth audio contents 631 to 635 simultaneously from different directions, making them easy to distinguish. That is, even if the sounds of the first to fifth audio contents 631 to 635 are reproduced at the same time, the user hears them three-dimensionally and can easily recognize each of them.
  • the audio content processing apparatus 400a may output the marker sound 636 from all the audio contents 631 to 635 in the presentation window, as shown in FIG. In this case, five marker sounds can be heard, but it is difficult to distinguish the individual marker sounds 636. Therefore, as described with reference to FIG. 19, the marker sound 636 is preferably output only from the position of the audio content in the direction in which the head 663 is facing.
  • FIG. 21 is a diagram showing an example of the movement of the pointer range accompanying the button operation.
  • Suppose that, at a certain time (for example, in the initial state), the audio reproduction control unit 460a sets the pointer range so that the first to fifth audio contents 631 to 635 are arranged in this order clockwise in the presentation window.
  • the audio reproduction control unit 460a moves the pointer range in a direction in which the pointer position moves away from the origin. That is, as shown in FIG. 21B, the audio reproduction control unit 460a slides the weight axis 630 counterclockwise in the presentation range.
  • the audio reproduction control unit 460a moves the pointer range in a direction in which the pointer position approaches the origin. That is, the audio reproduction control unit 460a slides the weight axis 630 clockwise in the presentation range.
  • By sliding the weight axis in this way, the audio content processing apparatus 400a can give the user the sensation that the plurality of audio contents are seated on rotating chairs arranged around the user and are turning together.
  • the user can perform an operation corresponding to scrolling in the GUI by a button operation.
  • the audio content processing apparatus 400a arranges the weight axis on which a plurality of audio contents are arranged in a state of spreading laterally in front of the user. Thereby, the audio content processing apparatus 400a can present a plurality of audio contents to the user in an easily distinguishable manner.
  • In addition, the audio content processing apparatus 400a makes it easy for the user to grasp which audio content the pointer is located on, and to grasp intuitively which audio content the sound quality of the marker sound corresponds to. Therefore, the audio content processing apparatus 400a can present the weight of each audio content to the user in an easy-to-understand manner even when audio from a plurality of audio contents is played back simultaneously.
  • the audio reproduction control unit arranges the weight axis on an arc extending in the horizontal direction forward with the user as the center, but is not limited thereto.
  • the audio reproduction control unit may arrange the weight axis on a straight line extending horizontally in the front direction of the user or on a straight line or a curve extending in the vertical direction or the three-dimensional direction.
  • the audio content processing apparatus does not necessarily include the head direction acquisition unit.
  • The head orientation acquisition unit may acquire, as head orientation information, not the head orientation relative to the torso but the head orientation relative to another reference direction (for example, an azimuth), or the orientation of the body relative to such a reference direction.
  • the position arrangement unit uses the number of matching points as a weight that defines the arrangement order on the weight axis, but is not limited thereto.
  • The position arrangement unit can use, as a weight, any attribute of the audio content that has an ordering. For example, the weight can be the lexicographic order of the title of the audio content, the number of times the audio content has been played, the creation date and time of the audio content, or the playback length of the audio content, or their respective orders.
  • the operation methods of pointer movement, pointer range movement, and audio content determination are not limited to the above examples. That is, the pointer position acquisition unit may acquire operation information from various other input devices such as a cross key, a keyboard, and a mouse.
  • the marker sound is not limited to the above example.
  • the sound quality of the marker sound that changes according to the weight is not limited to the pitch.
  • It is desirable that the marker sound allow the change and the direction of the change to be grasped intuitively. Therefore, it is desirable that the sound quality of the marker sound changed in accordance with the weight include at least one of the pitch, the sound production interval, the sound production length, and the vibration (swell) cycle of the marker sound. It should be noted that even a person with a good ear is said to be able to distinguish only about 100 pitch steps within the audible range.
  • The marker sound generation unit may generate a marker sound that expresses the weight using, for example, a chord combining a plurality of pitches, a sound combining a pitch with another type of sound quality, or a sound combining a plurality of sound qualities such as different instruments. That is, the marker sound generation unit may increase the number of expressible values by combining sound qualities in the manner of a positional number system.
  • For example, the marker sound may represent the tens digit of the weight by the pitch of a violin tone and the ones digit by the pitch of a piano tone.
  • Such a violin-and-piano marker sound may play the violin tone and then the piano tone, or may play the piano tone and the violin tone simultaneously.
  • the pitches of different types of instruments are relatively easy to distinguish even if they are played simultaneously. Therefore, such a marker sound can express a difference in weight.
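  • A hedged sketch of this “two instruments as two digits” idea is shown below: the tens digit of a weight in the range 0 to 99 selects a violin pitch and the ones digit selects a piano pitch. The pitch tables are invented for illustration, and actual timbre synthesis is outside the scope of the sketch.

```python
# Ten distinguishable pitches per instrument (illustrative values, one semitone apart).
VIOLIN_PITCHES = [220.0 * 2 ** (k / 12) for k in range(10)]
PIANO_PITCHES = [440.0 * 2 ** (k / 12) for k in range(10)]

def marker_pitches(weight):
    """Encode a 0-99 weight as a (violin pitch, piano pitch) pair, one pitch per digit."""
    tens, ones = divmod(int(weight) % 100, 10)
    return ("violin", VIOLIN_PITCHES[tens]), ("piano", PIANO_PITCHES[ones])
```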
  • The marker sound can also represent the weight by the interval between short beep sounds (about 0.2 to 0.5 seconds). For example, the beep interval can be shortened when the weight is large and lengthened when the weight is small.
  • the marker sound can represent the weight by the length of the beep sound.
  • the marker sound can represent a weight by the level of the “swell” frequency.
  • The audio content processing apparatus may generate a marker sound for each of a plurality of types of weights, or may generate a marker sound that expresses a plurality of types of weights with different types of sound quality (for example, pitch and vibration cycle). Further, the audio content processing apparatus may determine the sound quality based on an order obtained by combining a plurality of types of weights (a combined weight).
  • Each time the pointer moves, the marker sound generation unit may sound, immediately after the marker sound indicating the position after the movement, the marker sound that would result if the pointer kept moving in the same direction (or an intermediate marker sound).
  • the audio content processing apparatus can present not only the weight itself but also the direction of change in the weight each time the pointer moves.
  • the position of the marker sound is not necessarily the same as the pointer position.
  • the audio reproduction control unit may arrange the marker sound slightly above or behind the pointer position.
  • the audio stream generation unit then generates audio data in which the marker sound is placed slightly above or behind the pointer position. This makes it easier for the user to hear the audio of the audio content itself.
  • when the marker sound is arranged beyond the pointer position as seen from the user, it is heard as if it were a background sound behind the audio content.
  • as described above, the audio content processing apparatus according to the embodiments of the present invention includes a presentation position calculation unit that moves a pointer between a plurality of audio contents, each of which is weighted.
  • the audio content processing apparatus according to the embodiments of the present invention includes a marker sound generation unit that presents the weight of the audio content where the pointer is located, using the sound quality of a marker sound associated with that weight.
  • the audio content processing apparatus includes an audio reproduction control unit that performs predetermined processing on the audio content where the pointer is located.
  • the present invention is useful as an audio content processing apparatus and an audio content processing method capable of presenting the weight of each audio content, without using a display, in a way that can be intuitively grasped. In particular, the present invention is well suited to mobile terminals that rely mainly on an audio modality and to usage environments in which reliance on vision should be kept to a minimum.
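The notes above describe several ways a weight can be encoded in the sound quality of the marker sound. The following is a minimal, hypothetical sketch of two of them, the two-instrument digit encoding and the beep-interval encoding; the function names, base pitch, pitch step, and interval bounds are illustrative assumptions and are not taken from this publication.

```python
# Hypothetical sketch: encoding a weight in marker-sound parameters.

def two_instrument_pitches(rank):
    """Encode a two-digit rank (0-99) as two pitches: the tens digit carried by
    a 'violin' tone and the ones digit by a 'piano' tone (assumed mapping)."""
    base_hz = 440.0          # assumed reference pitch
    step = 2 ** (2 / 12.0)   # two semitones per digit step (assumption)
    tens, ones = divmod(max(0, min(rank, 99)), 10)
    return {
        "violin_hz": base_hz * step ** tens,  # tens digit
        "piano_hz": base_hz * step ** ones,   # ones digit
    }

def beep_interval(weight, w_min, w_max, short=0.2, long=0.5):
    """Map a weight to a beep interval in seconds: the larger the weight, the
    shorter the interval within the 0.2-0.5 s range mentioned above."""
    if w_max == w_min:
        return short
    t = (weight - w_min) / (w_max - w_min)   # 0..1, larger weight -> larger t
    return long - t * (long - short)

if __name__ == "__main__":
    print(two_instrument_pitches(57))            # two pitches for rank 57
    print(round(beep_interval(750, 0, 999), 3))  # interval for a large weight
```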
  • Reference signs: 100, 100a audio content playback system; 200, 200a audio output device; 210, 211a cable; 220 monaural earphone; 221a stereo headphone; 300, 300a portable player; 301 housing; 310 user operation input device; 311 "return" button; 312 "decision" button; 313 "forward" button; 320a motion sensor; 400, 400a audio content processing device; 410 information storage unit; 420 position arrangement unit; 430 pointer position acquisition unit; 440, 440a presentation position calculation unit; 450 marker sound generation unit; 460, 460a audio reproduction control unit; 470 audio stream generation unit; 480a head orientation acquisition unit; 510 file system; 520 audio content supply server

Abstract

An audio content processing device capable of presenting the weight of each item of audio content such that the weight can be intuitively grasped, without using a display. The audio content processing device (400) has: a presentation position calculation unit (440) that moves a pointer between a plurality of audio content items, each of which is weighted; a marker sound generation unit (450) that presents the weight of the audio content item where the pointer is positioned, using a sound quality of a marker sound associated with that weight; and an audio replay control unit (460) that performs prescribed processing on the audio content item where the pointer is positioned.

Description

音声コンテンツ処理装置および音声コンテンツ処理方法 Audio content processing apparatus and audio content processing method
 本発明は、それぞれ重みが付けられた複数の音声コンテンツに対して、再生などの処理を行う音声コンテンツ処理装置および音声コンテンツ処理方法に関する。 The present invention relates to an audio content processing apparatus and an audio content processing method for performing processing such as reproduction on a plurality of audio contents each weighted.
 音楽プレイヤ、ラジオ受信機、携帯電話機および携帯端末など、音声コンテンツに対して再生などの各種処理を行う機器(以下「音楽コンテンツ処理装置」という)は、広く普及している。近年では、無線通信機能を有し、インターネット等で音楽コンテンツを検索してダウンロードする機能を有する音楽コンテンツ処理装置も登場している。なお、以下の説明において、「音声」とは、音声コンテンツの例からも分かるように、人の声に限定されない、音一般をいう。すなわち、「音声」は、音楽や虫や動物の鳴き声、機械からの騒音などの人工の音、あるいは滝や雷などの自然の音など、広く音を指す概念とする。 Devices that perform various processing such as playback on audio content, such as music players, radio receivers, mobile phones, and mobile terminals (hereinafter referred to as “music content processing devices”) are widely used. In recent years, music content processing apparatuses having a wireless communication function and a function of searching for and downloading music content on the Internet or the like have also appeared. In the following description, “sound” refers to general sound that is not limited to a human voice, as can be seen from an example of audio content. That is, “speech” is a concept that widely refers to sounds such as music, insects and animals, artificial sounds such as noise from machines, and natural sounds such as waterfalls and lightning.
 音楽コンテンツ処理装置で処理の対象となる複数の音声コンテンツには、何かしらの重みが付けられている。この重みは、音声コンテンツの属性のうちの一つとして、その順序を示す値(リニアな値)であり、絶対的な値の場合もあれば、相対的な値の場合もある。例えば、検索マッチ点数は、重みの一種といえる。ユーザは、各音声コンテンツの重みを確認することができる。したがって、ユーザは、例えば、検索マッチ点数が高い順にチェックしていくというように、重みを考慮した音声コンテンツに対する操作を行うことができる。 Some weight is given to a plurality of audio contents to be processed by the music content processing apparatus. This weight is a value (linear value) indicating the order as one of the attributes of the audio content, and may be an absolute value or a relative value. For example, the search match score is a kind of weight. The user can check the weight of each audio content. Therefore, the user can perform an operation on the audio content in consideration of the weight, for example, checking in order from the highest search match score.
 そこで、各音声コンテンツの重みを表示する技術が、例えば、特許文献1に記載されている。特許文献1記載の技術は、インターネットラジオにおけるチャンネル番号(重み)と放送プログラムとの組のリストを、ディスプレイに表示する。これにより、ユーザは、各放送プログラムのチャンネル番号を確認しながら、放送プログラムの切り替え操作を行うことができる。すなわち、特許文献1記載の技術は、各音声コンテンツの重みを、ユーザに対して表示することができる。 Therefore, for example, Patent Document 1 describes a technique for displaying the weight of each audio content. The technique described in Patent Document 1 displays a list of combinations of channel numbers (weights) and broadcast programs in Internet radio on a display. Thereby, the user can perform a broadcast program switching operation while confirming the channel number of each broadcast program. That is, the technique described in Patent Document 1 can display the weight of each audio content to the user.
特開2008-72672号公報JP 2008-72672 A
 しかしながら、特許文献1記載の技術には、各音声コンテンツの重みを、ユーザに対して直感的に把握できるように提示することができないという課題がある。なぜなら、ユーザは、音声コンテンツの重みを確認するために、逐次、ディスプレイに目を向けて表示内容を確認しなければならないからである。複数の音声コンテンツを高速で切り替えながらチェックしていく操作を考えると、各音声コンテンツの重みを直感的に把握できるように提示可能な音声コンテンツ処理装置が望まれる。 However, the technique described in Patent Document 1 has a problem that the weight of each audio content cannot be presented so as to be intuitively understood by the user. This is because, in order to confirm the weight of the audio content, the user must check the display content by sequentially looking at the display. Considering an operation of checking a plurality of audio contents while switching at high speed, an audio content processing apparatus that can be presented so as to intuitively understand the weight of each audio content is desired.
 本発明の目的は、ディスプレイを使わずに各音声コンテンツの重みを直感的に把握できるように提示することである。 An object of the present invention is to present so that the weight of each audio content can be intuitively grasped without using a display.
 本発明の音声コンテンツ処理装置は、それぞれ重みが付けられた複数の音声コンテンツの間でポインタを移動させる提示位置計算部と、前記ポインタが位置する前記音声コンテンツの前記重みを、前記重みに対応付けられたマーカ音の音質で提示するマーカ音生成部と、前記ポインタが位置する前記音声コンテンツに対して所定の処理を行う音声再生制御部と、を有する。 The audio content processing apparatus of the present invention associates a presentation position calculation unit that moves a pointer between a plurality of audio contents each weighted, and the weight of the audio content on which the pointer is located, with the weight A marker sound generation unit that presents the sound quality of the marker sound that is displayed, and an audio reproduction control unit that performs predetermined processing on the audio content on which the pointer is located.
 本発明の音声コンテンツ処理方法は、それぞれ重みが付けられた複数の音声コンテンツの間でポインタを移動させるステップと、前記ポインタが位置する前記音声コンテンツの前記重みを、前記重みに対応付けられたマーカ音の音質で提示するステップと、前記ポインタが位置する前記音声コンテンツに対して所定の処理を行うステップとを有する。 The audio content processing method of the present invention includes a step of moving a pointer between a plurality of audio contents each having a weight, and a marker associated with the weight of the audio content on which the pointer is positioned. A step of presenting with sound quality and a step of performing predetermined processing on the audio content on which the pointer is located.
 本発明によれば、各音声コンテンツの重みを分かり易く提示することができる。 According to the present invention, the weight of each audio content can be presented in an easily understandable manner.
本発明の実施の形態1に係る音声コンテンツ処理装置が用いられる音声コンテンツ再生システムの外観の一例を示す図The figure which shows an example of the external appearance of the audio | voice content reproduction | regeneration system in which the audio | voice content processing apparatus which concerns on Embodiment 1 of this invention is used. 本実施の形態1に係る音声コンテンツ処理装置の構成の一例を示すブロック図FIG. 2 is a block diagram showing an example of the configuration of an audio content processing apparatus according to the first embodiment. 本実施の形態1に係る音声コンテンツ処理装置の動作の一例を示す図The figure which shows an example of operation | movement of the audio | voice content processing apparatus which concerns on this Embodiment 1. 本実施の形態1における情報リストの一例を示す図The figure which shows an example of the information list in this Embodiment 1. 本実施の形態1におけるコンテンツ配置情報が追加された情報リストの一例を示す図The figure which shows an example of the information list to which the content arrangement | positioning information in this Embodiment 1 was added. 本実施の形態1における重み軸の音声コンテンツ配置の一例を示す図The figure which shows an example of the audio | voice content arrangement | positioning of the weight axis in this Embodiment 1. 本実施の形態1におけるポインタの移動の様子の一例を示す図The figure which shows an example of the mode of the movement of the pointer in this Embodiment 1. 本実施の形態1における位置音質変換ルールの第1の例を示す図The figure which shows the 1st example of the position sound quality conversion rule in this Embodiment 1. 本実施の形態1に係る音声コンテンツ再生システムの動作の一例を示すシーケンス図Sequence diagram showing an example of the operation of the audio content reproduction system according to the first embodiment 本実施の形態1における位置音質変換ルールの第2の例を示す図The figure which shows the 2nd example of the position sound quality conversion rule in this Embodiment 1. 本実施の形態1における位置音質変換ルールの第3の例を示す図The figure which shows the 3rd example of the position sound quality conversion rule in this Embodiment 1. 本実施の形態1における位置音質変換ルールの第4の例を示す図The figure which shows the 4th example of the position sound quality conversion rule in this Embodiment 1. 本実施の形態1における重み音質変換ルールの一例を示す図The figure which shows an example of the weight sound quality conversion rule in this Embodiment 1. 本発明の実施の形態2に係る音声コンテンツ処理装置が用いられる音声コンテンツ再生システムの外観の一例を示す図The figure which shows an example of the external appearance of the audio | voice content reproduction | regeneration system in which the audio | voice content processing apparatus which concerns on Embodiment 2 of this invention is used. 本実施の形態2に係る音声コンテンツ処理装置の構成の一例を示すブロック図Block diagram showing an example of the configuration of an audio content processing apparatus according to the second embodiment 本実施の形態2における頭部向きおよび位置の定義の一例を示す図The figure which shows an example of the definition of head direction and position in this Embodiment 2. 本実施の形態2に係る音声コンテンツ処理装置の動作の一例を示す図The figure which shows an example of operation | movement of the audio | voice content processing apparatus which concerns on this Embodiment 2. 本実施の形態2における表示窓の一例を示す図The figure which shows an example of the display window in this Embodiment 2. 本実施の形態2におけるポインタ位置の移動およびマーカ音の変化の様子の一例を示す図The figure which shows an example of the mode of the movement of the pointer position in this Embodiment 2, and the change state of a marker sound. 本実施の形態2におけるポインタ位置の一例を示す図The figure which shows an example of the pointer position in this Embodiment 2. 本実施の形態2におけるポインタ範囲の移動の様子の一例を示す図The figure which shows an example of the mode of the movement of the pointer range in this Embodiment 2.
 以下、本発明の各実施の形態について、図面を参照して詳細に説明する。 Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
 (実施の形態1)
 図1は、本発明の実施の形態1に係る音声コンテンツ処理装置が用いられる音声コンテンツ再生システムの外観の一例を示す図である。
(Embodiment 1)
FIG. 1 is a diagram showing an example of the appearance of an audio content reproduction system in which the audio content processing apparatus according to Embodiment 1 of the present invention is used.
 図1において、音声コンテンツ再生システム100は、音声出力装置200と、本発明の実施の形態1に係る音声コンテンツ処理装置を内部に備えたポータブルプレイヤ300を一例として説明する。 Referring to FIG. 1, the audio content reproduction system 100 will be described with an audio output device 200 and a portable player 300 provided with the audio content processing device according to Embodiment 1 of the present invention as an example.
 音声出力装置200は、音声データを音声に変換して出力する装置であり、本実施の形態では、モノラル音声データ伝送用のケーブル210と、人の片耳に装着されるモノラルイヤホン220とから成るものとする。 The audio output device 200 is a device that converts audio data into audio and outputs the audio. In the present embodiment, the audio output device 200 includes a monaural audio data transmission cable 210 and a monaural earphone 220 worn on one ear of a person. And
 ポータブルプレイヤ300は、モノラル音声プレイヤである。ポータブルプレイヤ300は、それぞれ重みが付けられた複数の音声コンテンツを仮想的な座標軸である重み軸上に配置する。そして、ポータブルプレイヤ300は、この重み軸上でのポインタの移動により、音声コンテンツを切り替えて再生する。例えば、重みは、ラジオにおけるチャンネル番号のイメージであり、重み軸は、チャンネル番号を切り替えるためのつまみの回転角のイメージであり、ポインタの移動は、つまみの回転のイメージである。 The portable player 300 is a monaural audio player. The portable player 300 places a plurality of weighted audio contents on a weight axis that is a virtual coordinate axis. Then, the portable player 300 switches and reproduces the audio content by moving the pointer on the weight axis. For example, the weight is an image of a channel number in the radio, the weight axis is an image of a rotation angle of a knob for switching the channel number, and the movement of the pointer is an image of rotation of the knob.
 ポータブルプレイヤ300は、その外形を小型の筐体301とし、筐体301の表面に、ユーザ操作入力装置310を有する。ユーザ操作入力装置310は、ポインタ位置の移動操作および音声コンテンツに対する決定操作などの各種操作をユーザから受け付ける。本実施の形態では、ユーザ操作入力装置310は、「戻る」ボタン311、「決定」ボタン312、および「進む」ボタン313を有し、これらの押下およびリリースを、操作のイベントとして検出するものとする。なお、ユーザ操作入力装置310の他の形態として、例えば、ジョイスティックやタッチパネル、あるいは、ポータブルプレイヤ300とは別個のリモコン装置などが挙げられる。ポータブルプレイヤ300は、筐体301の表面に、更にディスプレイを有していてもよい。 The portable player 300 has a small casing 301 as its outer shape, and has a user operation input device 310 on the surface of the casing 301. The user operation input device 310 receives various operations such as a pointer position moving operation and a determination operation for audio content from the user. In the present embodiment, the user operation input device 310 has a “return” button 311, a “decision” button 312, and a “forward” button 313, and detects these presses and releases as operation events. To do. Note that other forms of the user operation input device 310 include, for example, a joystick, a touch panel, or a remote control device separate from the portable player 300. The portable player 300 may further include a display on the surface of the housing 301.
 ポータブルプレイヤ300に備えられた本実施の形態に係る音声コンテンツ処理装置(図示せず)は、ユーザ操作入力装置310でのポインタ移動操作を受けて、上述の重み軸上でポインタを移動させる。すなわち、音声コンテンツ処理装置は、複数の音声コンテンツの間で、ポインタを移動させる。また、音声コンテンツ処理装置は、ポインタ位置が音声コンテンツの位置に重なるごとに、その音声コンテンツの音声を仮再生する。そして、音声コンテンツ処理装置は、ユーザ操作入力装置310での音声コンテンツ決定操作を受けて、ポインタがその時に位置する音声コンテンツの音声を本再生する。なお、音声の再生は、音声データをケーブル210によりモノラルイヤホン220に送信することにより行われる。なお、ここでは、ケーブル210により音声データを送信したが、無線で送信することも可能である。 The audio content processing apparatus (not shown) according to the present embodiment provided in the portable player 300 moves the pointer on the above-described weight axis in response to the pointer movement operation by the user operation input apparatus 310. That is, the audio content processing apparatus moves the pointer between a plurality of audio contents. Also, every time the pointer position overlaps the position of the audio content, the audio content processing apparatus temporarily reproduces the audio of the audio content. Then, the audio content processing apparatus receives the audio content determination operation from the user operation input device 310, and reproduces the audio of the audio content where the pointer is located at that time. Note that audio reproduction is performed by transmitting audio data to the monaural earphone 220 via the cable 210. In this example, the audio data is transmitted through the cable 210, but it can also be transmitted wirelessly.
 以下の説明において、ポインタが位置している音声コンテンツは、「選択されている音声コンテンツ」という。また、決定操作の対象となった音声コンテンツは、「決定された音声コンテンツ」という。 In the following description, the audio content where the pointer is located is referred to as “selected audio content”. The audio content that is the target of the determination operation is referred to as “determined audio content”.
 また、本実施の形態に係る音声コンテンツ処理装置は、複数の音声コンテンツを上述の重み軸上に配置する。そして、音声コンテンツ処理装置は、ポインタにより決定操作が行われるまでの間、選択されている音声コンテンツの重みを、その重みに対応付けられたマーカ音の音質で提示する。すなわち、音声コンテンツ処理装置は、ポインタ位置に対応する重みを音質で示すマーカ音を生成し、その音声データを、ケーブル210を介してモノラルイヤホン220に送信する。なお、この際、複数の音声コンテンツの重み軸上の配置は、例えば、音声コンテンツの重み順に一致する。 Also, the audio content processing apparatus according to the present embodiment arranges a plurality of audio contents on the above-described weight axis. Then, the audio content processing device presents the weight of the selected audio content with the sound quality of the marker sound associated with the weight until the determination operation is performed with the pointer. That is, the audio content processing apparatus generates a marker sound that indicates the weight corresponding to the pointer position by sound quality, and transmits the audio data to the monaural earphone 220 via the cable 210. At this time, the arrangement of the plurality of audio contents on the weight axis matches, for example, the order of the weight of the audio contents.
 本実施の形態では、マーカ音は、「ポン、ポン、・・・」という間欠的な単純音であり、ポインタの位置を示す音声ポインタであるものとする。また、マーカ音は、音程の変化により、ポインタが位置する音声コンテンツの重みを提示するものとする。 In the present embodiment, the marker sound is an intermittent simple sound “Pong, Pong,...”, And is a voice pointer indicating the position of the pointer. In addition, the marker sound presents the weight of the audio content where the pointer is located due to a change in pitch.
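As a rough illustration of such an intermittent marker sound, the sketch below builds a short sine burst that repeats at a fixed interval, with the pitch of the burst carrying the weight. The sample rate, burst length, and repetition interval are assumptions chosen for illustration and are not values taken from this description.

```python
# Sketch (standard library only): an intermittent "pon, pon, ..." marker tone.
import math

def marker_burst(freq_hz, burst_s=0.1, rate=16000):
    """One burst of the marker tone as a list of float samples in [-1, 1]."""
    n = int(burst_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

def intermittent_marker(freq_hz, interval_s=0.3, total_s=1.0, rate=16000):
    """Bursts separated by silence; the pitch (freq_hz) carries the weight."""
    out = []
    burst = marker_burst(freq_hz, rate=rate)
    silence = [0.0] * int(interval_s * rate)
    while len(out) < int(total_s * rate):
        out.extend(burst)
        out.extend(silence)
    return out[: int(total_s * rate)]

if __name__ == "__main__":
    samples = intermittent_marker(3500.0)  # a pitch used as an example in FIG. 8
    print(len(samples))                    # 16000 samples for one second of audio
```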
 図2は、本実施の形態に係る音声コンテンツ処理装置の構成の一例を示すブロック図である。ここでは、説明の便宜のため、周辺の他の装置についても併せて図示する。 FIG. 2 is a block diagram showing an example of the configuration of the audio content processing apparatus according to the present embodiment. Here, for convenience of explanation, other peripheral devices are also illustrated.
 図2において、音声コンテンツ処理装置400は、情報格納部410、位置配置部420、ポインタ位置取得部430、提示位置計算部440、マーカ音生成部450、音声再生制御部460、および音声ストリーム生成部470を有する。 In FIG. 2, the audio content processing device 400 includes an information storage unit 410, a position arrangement unit 420, a pointer position acquisition unit 430, a presentation position calculation unit 440, a marker sound generation unit 450, an audio reproduction control unit 460, and an audio stream generation unit. 470.
 情報格納部410は、情報の記録、修正、削除、検索、および読み出しなど一般的なデータベースの機能を備え、音声コンテンツの音声データおよびこれに付与された属性(ID、格納位置、重みなど)を格納する。情報格納部410は、例えば、インターネットなどの通信ネットワークを介して、外部のファイルシステム510および音声コンテンツ供給サーバ520から、情報リストを取得し、これを格納する。ここで、情報リストとは、重みが付けられた音声コンテンツの識別情報を記述したリストである。 The information storage unit 410 has general database functions such as information recording, correction, deletion, search, and reading, and stores audio data of audio contents and attributes (ID, storage position, weight, etc.) assigned thereto. Store. For example, the information storage unit 410 acquires an information list from the external file system 510 and the audio content supply server 520 via a communication network such as the Internet, and stores the information list. Here, the information list is a list describing identification information of weighted audio contents.
 本実施の形態では、ファイルシステム510は、音声コンテンツの本体である音声データを格納する記憶装置である。また、音声コンテンツ供給サーバ520は、音声コンテンツの検索の機能を有する検索サーバであるものとする。具体的には、音声コンテンツ供給サーバ520は、音声コンテンツ処理装置400からの検索クエリに対して、ファイルシステム510の音声コンテンツを検索する。そして、音声コンテンツ供給サーバ520は、検索のマッチ点数を重みとして付与した999件分の音声コンテンツの情報を、音声コンテンツ処理装置400へ返信する。情報格納部410は、この返信された情報に基づいて、上述の情報リストを生成するものとする。なお、音声コンテンツ処理装置400のうち、このような検索クエリの発行・返送された情報の受信を行う機能部の図示および説明については省略する。 In the present embodiment, the file system 510 is a storage device that stores audio data that is the main body of audio content. The audio content supply server 520 is assumed to be a search server having a function of searching audio content. Specifically, the audio content supply server 520 searches the audio content of the file system 510 in response to the search query from the audio content processing device 400. Then, the audio content supply server 520 replies to the audio content processing apparatus 400 with 999 pieces of audio content information assigned with the search match score as a weight. The information storage unit 410 generates the above information list based on the returned information. It should be noted that illustration and description of the functional unit that issues the search query issuance / returned information in the audio content processing apparatus 400 are omitted.
 位置配置部420は、情報リストの複数の音声コンテンツの識別情報を、上述の重み軸のうち、それぞれの重みに応じた位置の座標に対応付ける。以下、この対応付けを、「音声コンテンツの重み軸への配置」という。そして、位置配置部420は、各音声コンテンツの配置(音声コンテンツの識別情報と重み軸座標との対応付け)を示すコンテンツ配置情報を、情報リストに追加する。 The position arrangement unit 420 associates the identification information of the plurality of audio contents in the information list with the coordinates of the position corresponding to each weight among the above-described weight axes. Hereinafter, this association is referred to as “arrangement of audio content on the weight axis”. Then, the position arrangement unit 420 adds content arrangement information indicating the arrangement of each audio content (correspondence between the audio content identification information and the weight axis coordinates) to the information list.
 ポインタ位置取得部430は、ユーザ操作入力装置310で行われた、ポインタ位置の移動操作およびポインタの現在位置(に対応する音声コンテンツ)に対する決定操作などの操作を取得する。そして、ポインタ位置取得部430は、ポインタ位置の移動操作を取得したとき、その移動の方向と度合いとを示すポインタ操作情報を、提示位置計算部440へ出力する。また、ポインタ位置取得部430は、ポインタの現在位置に対する決定操作を取得したとき、その旨を示す決定操作情報を、音声再生制御部460へ出力する。 The pointer position acquisition unit 430 acquires operations such as a pointer position moving operation and a determination operation on the current position of the pointer (corresponding audio content) performed by the user operation input device 310. When the pointer position acquisition unit 430 acquires a pointer position movement operation, the pointer position acquisition unit 430 outputs pointer operation information indicating the direction and degree of the movement to the presentation position calculation unit 440. When the pointer position acquisition unit 430 acquires a determination operation for the current position of the pointer, the pointer position acquisition unit 430 outputs determination operation information indicating that to the audio reproduction control unit 460.
 提示位置計算部440は、ポインタ操作情報に応じてポインタを重み軸上で移動させたときの、ポインタの現在位置を計算する。すなわち、提示位置計算部440は、それぞれ重みが付けられた複数の音声コンテンツの間で、ユーザが注目する音声コンテンツへとポインタを移動させる。なお、提示位置計算部440は、ポインタ操作情報示す度合いに応じて、音声コンテンツの位置のみに限定してポインタを移動させてもよいし、音声コンテンツの有無によらずにポインタを移動させてもよい。本実施の形態における提示位置計算部440は、音声コンテンツの有無によらずにポインタを移動させるものとする。そして、提示位置計算部440は、ポインタの現在位置を示すポインタ位置情報を、マーカ音生成部450、および音声再生制御部460へ出力する。 The presentation position calculation unit 440 calculates the current position of the pointer when the pointer is moved on the weight axis according to the pointer operation information. In other words, the presentation position calculation unit 440 moves the pointer to the audio content that the user pays attention to among the plurality of audio contents each weighted. The presentation position calculation unit 440 may move the pointer only to the position of the audio content according to the degree of the pointer operation information, or may move the pointer regardless of the presence or absence of the audio content. Good. The presentation position calculation unit 440 in the present embodiment moves the pointer regardless of the presence or absence of audio content. Then, the presentation position calculation unit 440 outputs pointer position information indicating the current position of the pointer to the marker sound generation unit 450 and the audio reproduction control unit 460.
 マーカ音生成部450は、ポインタ位置情報が示すポインタの現在位置から、マーカ音の音質を決定する。そして、マーカ音生成部450は、決定した音質のマーカ音の音声データを生成し、音声ストリーム生成部470へ出力する。すなわち、マーカ音生成部450は、音声ストリーム生成部470を用いて、ポインタが位置する音声コンテンツの重みを、その重みに対応付けられたマーカ音の音質で提示する。併せて、マーカ音生成部450は、音声ストリーム生成部470を用いて、ある音声コンテンツと次の音声コンテンツとの間における重みの変化の様子を、マーカ音の音質で提示する。 The marker sound generation unit 450 determines the sound quality of the marker sound from the current position of the pointer indicated by the pointer position information. Then, the marker sound generation unit 450 generates audio data of the marker sound having the determined sound quality and outputs it to the audio stream generation unit 470. That is, the marker sound generation unit 450 uses the audio stream generation unit 470 to present the weight of the audio content at which the pointer is located with the sound quality of the marker sound associated with the weight. In addition, the marker sound generation unit 450 uses the audio stream generation unit 470 to present the change in weight between a certain audio content and the next audio content with the sound quality of the marker sound.
 また、音声再生制御部460は、最新のポインタ位置情報およびコンテンツ配置情報に基づいて、仮想音場空間の音場を計算して、仮想音場空間を構築する。この仮想音場空間は、重み軸のうち少なくともポインタの現在位置を含み、マーカ音と各音声コンテンツの音声とが、それぞれの位置から出力される仮想音場空間である。本実施の形態において、モノラル音声出力であることから、仮想音場空間は、ユーザの前方に伸びた1次元の空間である。なお、音声再生制御部460は、重み軸と仮想音場空間とを、同一の座標系で扱ってもよい。この場合には、仮想音場空間の構築の処理は必ずしも必要ではない。そして、音声再生制御部460は、構築した仮想音場空間を示す音場情報を、音声ストリーム生成部470へ出力する。 Also, the audio reproduction control unit 460 calculates the sound field of the virtual sound field space based on the latest pointer position information and content arrangement information, and constructs the virtual sound field space. This virtual sound field space is a virtual sound field space that includes at least the current position of the pointer on the weight axis and outputs the marker sound and the sound of each sound content from each position. In the present embodiment, since the sound output is monaural, the virtual sound field space is a one-dimensional space extending in front of the user. Note that the audio reproduction control unit 460 may handle the weight axis and the virtual sound field space in the same coordinate system. In this case, the construction process of the virtual sound field space is not necessarily required. Then, the audio reproduction control unit 460 outputs sound field information indicating the constructed virtual sound field space to the audio stream generation unit 470.
 また、音声再生制御部460は、決定操作情報を入力されると、選択されている音声コンテンツを、ユーザが決定した音声コンテンツとして特定し、所定の処理を行う。所定の処理は、ここでは、音声ストリーム生成部470に対し、マーカ音の出力を停止させ、決定された音声コンテンツを再生させる処理である。音声再生制御部460は、決定された音声コンテンツを再生させている間は、ポインタ位置取得部430からポインタ操作情報についても取得する。そして、音声再生制御部460は、「戻る」ボタン311が押下されたときは再生箇所を戻し、「進む」ボタン313が押下されたときは再生箇所を進め、「決定」ボタン312が押下されたときは再生を停止する。 Further, when the determination operation information is input, the audio reproduction control unit 460 specifies the selected audio content as the audio content determined by the user, and performs a predetermined process. Here, the predetermined process is a process of causing the audio stream generation unit 470 to stop outputting the marker sound and reproduce the determined audio content. The audio playback control unit 460 also acquires pointer operation information from the pointer position acquisition unit 430 while the determined audio content is being played back. When the “return” button 311 is pressed, the audio playback control unit 460 returns the playback location, when the “forward” button 313 is pressed, the playback location is advanced, and the “decision” button 312 is pressed. When playback stops.
 音声ストリーム生成部470は、音場情報に従って、音場情報が示す仮想音場空間を実現する音声ストリーム(音声データ)を生成し、音声出力装置200へ出力する。具体的には、音声再生制御部460は、音場情報に含まれる音声コンテンツの音声データを、例えば情報格納部410または外部のファイルシステム510から取得する。そして、音声ストリーム生成部470は、マーカ音および音声コンテンツの音声が仮想音場空間の通りに聞こえてくる音場が、音声出力装置200を装着したユーザに対して実現されるような音声ストリームを生成する。この際、音声ストリーム生成部470は、適当な音質および音量、指定された特殊効果などを、音声コンテンツの音声データに適用する。 The audio stream generation unit 470 generates an audio stream (audio data) that realizes the virtual sound field space indicated by the sound field information in accordance with the sound field information, and outputs the sound stream (audio data) to the audio output device 200. Specifically, the audio reproduction control unit 460 obtains audio data of audio content included in the sound field information from, for example, the information storage unit 410 or the external file system 510. Then, the audio stream generation unit 470 generates an audio stream in which the sound field in which the marker sound and the sound of the sound content are heard as the virtual sound field space is realized for the user wearing the sound output device 200. Generate. At this time, the audio stream generation unit 470 applies appropriate sound quality and volume, designated special effects, and the like to the audio data of the audio content.
 音声コンテンツ処理装置400は、図示しないが、例えば、CPU(central processing unit)、制御プログラムを格納したROM(read only memory)などの記憶媒体、RAM(random access memory)などの作業用メモリ、および通信回路を有する。この場合、上記した各部の機能は、CPUが制御プログラムを実行することにより実現される。 Although not shown, the audio content processing apparatus 400 includes a CPU (central processing unit), a storage medium such as a ROM (read only memory) storing a control program, a working memory such as a RAM (random access memory), and communication. It has a circuit. In this case, the function of each unit described above is realized by the CPU executing the control program.
 このような音声コンテンツ処理装置400は、ポインタが位置する音声コンテンツ(選択されている音声コンテンツ)の重みを、その重みに対応付けられたマーカ音の音質で提示することができる。音質の把握は、ディスプレイに表示された情報の把握に比べて、視線の移動や文字の読み取りなどの作業を伴わず、非常に短い時間で行うことができる。また、重みのような1次元的な情報の場合、音質でユーザに対して十分に把握できるように提示することが可能である。したがって、音声コンテンツ処理装置400は、ディスプレイを使わずに、選択されている音声コンテンツの重みを、ユーザが直感的に把握できるように提示することができる。 Such an audio content processing apparatus 400 can present the weight of the audio content where the pointer is located (the selected audio content) with the sound quality of the marker sound associated with the weight. The sound quality can be grasped in a very short time as compared with grasping the information displayed on the display, without any work such as movement of the line of sight or reading of characters. Further, in the case of one-dimensional information such as weights, it can be presented so that the user can sufficiently grasp the sound quality. Therefore, the audio content processing apparatus 400 can present the weight of the selected audio content so that the user can intuitively understand without using the display.
 また、音声コンテンツ処理装置400は、音声コンテンツを仮想音場空間内に配置し、ユーザが望む音声コンテンツを、ユーザが望むタイミングでレンダリングすることができ。したがって、音声コンテンツ処理装置400は、ユーザに対して、快適な音声コンテンツ再生環境を提供することができる。 In addition, the audio content processing apparatus 400 can arrange the audio content in the virtual sound field space and render the audio content desired by the user at a timing desired by the user. Therefore, the audio content processing apparatus 400 can provide a comfortable audio content reproduction environment to the user.
 ここでは、本実施の形態における「仮想音場空間」について、簡単に説明する。 Here, the “virtual sound field space” in the present embodiment will be briefly described.
 まず、「音場」とは、一般的に利用される用語であり、「実在の」音(音波)が存在する「実在の」空間のことである。「音場」は、典型的には、コンサート会場の空間等を指し、そこには音源や音を反射あるいは吸収する壁などが存在している。 First, the term “sound field” is a commonly used term and refers to a “real” space where “real” sounds (sound waves) exist. The “sound field” typically refers to the space of a concert venue, where there are sound sources and walls that reflect or absorb sound.
 これに対して、「仮想音場」とは、ユーザの耳に入る音を調整することにより、ユーザの周囲に「仮想的に」発生させた音場である。ユーザには、周囲に音場(つまり音源や壁)があると感じられるが、実際にはそのようなユーザが感じる音場はない。仮想音場を作成する技術には、サラウンド技術や立体音響技術などと呼ばれるものがある。 On the other hand, the “virtual sound field” is a sound field generated “virtually” around the user by adjusting the sound that enters the user's ear. Although the user feels that there is a sound field (that is, a sound source or a wall) in the surroundings, there is actually no sound field that the user feels. Technologies for creating a virtual sound field include so-called surround technology and three-dimensional sound technology.
 本実施の形態における「仮想音場空間」は、仮想音場を聞いたユーザが感じるであろう、音源や壁の位置が配置された空間である。音声コンテンツ処理装置400は、仮想音場空間として音源や壁などの配置を構築し、これに対してサラウンド技術や立体音響技術を適用する。この結果生成される音声は、スピーカやヘッドホンで聞いたときに、構築された仮想音場空間を実際の音場空間としてユーザに感じさせるような音声となる。 The “virtual sound field space” in the present embodiment is a space in which a sound source and a wall position are arranged, which a user who hears the virtual sound field will feel. The audio content processing apparatus 400 constructs an arrangement such as a sound source or a wall as a virtual sound field space, and applies a surround technology or a stereophonic technology to this. The sound generated as a result is a sound that makes the user feel the constructed virtual sound field space as an actual sound field space when listening with a speaker or headphones.
 なお、本実施の形態では、音声再生制御部460は、ポインタの位置を聞き手の位置として、仮想音場空間を構築するものとする。したがって、本実施の形態では、ポインタ位置が変化した場合、音声コンテンツが配置された重み軸自体は変化しないが、仮想音場空間は変化する。また、音声再生制御部460は、重み軸を直線とし、音声データの聞き手の位置を、重み軸上の、ポインタの位置に配置した仮想音場空間を構築するものとする。 In this embodiment, it is assumed that the audio reproduction control unit 460 constructs a virtual sound field space with the position of the pointer as the position of the listener. Therefore, in this embodiment, when the pointer position changes, the weight axis on which the audio content is arranged does not change, but the virtual sound field space changes. The audio reproduction control unit 460 constructs a virtual sound field space in which the weight axis is a straight line and the position of the listener of the audio data is arranged at the pointer position on the weight axis.
 また、音声再生制御部460は、スピーカであるかヘッドホンであるかなど、音声出力装置200とユーザの両耳との間の位置関係に応じて、その位置関係に合わせた仮想音場空間を構築することが望ましい。本実施の形態では、音声再生制御部460は、ユーザの耳に装着される所定のモノラルヘッドホン220が音声出力装置200として使用される前提で、仮想音場空間を構築するものとする。 In addition, the audio reproduction control unit 460 constructs a virtual sound field space according to the positional relationship according to the positional relationship between the audio output device 200 and the user's both ears, such as a speaker or a headphone. It is desirable to do. In the present embodiment, the audio reproduction control unit 460 constructs a virtual sound field space on the assumption that a predetermined monaural headphone 220 worn on the user's ear is used as the audio output device 200.
 次に、音声コンテンツ処理装置400の動作について説明する。 Next, the operation of the audio content processing apparatus 400 will be described.
 ここでは、例として、外部装置であるファイルシステム510には、演説や朗読、独り言などの音声コンテンツが多数格納されているものとする。ユーザは、図示しないパーソナルコンピュータや携帯電話機等の情報処理装置を用い、音声コンテンツ処理装置400を返信先に指定して、任意のキーワードを検索クエリとして音声コンテンツ供給サーバ520に送信したとする。この場合、音声コンテンツ供給サーバ520は、ファイルシステム510の音声コンテンツの属性(音声コンテンツのタイトル、話者の名前、収録場所の地名、および収録日時など)をキーワードで検索する。そして、音声コンテンツ供給サーバ520は、検索のマッチ点数を含む検索結果を、音声コンテンツ処理装置400へ返信する。 Here, as an example, it is assumed that the file system 510, which is an external device, stores a large number of audio contents such as speeches, readings, and monologues. It is assumed that the user uses an information processing apparatus such as a personal computer or a cellular phone (not shown), designates the audio content processing apparatus 400 as a reply destination, and transmits an arbitrary keyword as a search query to the audio content supply server 520. In this case, the audio content supply server 520 searches the attributes of the audio content in the file system 510 (such as the title of the audio content, the name of the speaker, the place name of the recording location, and the recording date / time) using keywords. Then, the audio content supply server 520 returns a search result including the search match score to the audio content processing apparatus 400.
 図3は、音声コンテンツ処理装置400の動作の一例を示す図である。 FIG. 3 is a diagram illustrating an example of the operation of the audio content processing apparatus 400.
 まず、ステップS1100において、情報格納部410は、音声コンテンツ供給サーバ520から取得した情報から、情報リストを取得し、格納する。 First, in step S1100, the information storage unit 410 acquires and stores an information list from the information acquired from the audio content supply server 520.
 図4は、情報リストの一例を示す図である。 FIG. 4 is a diagram showing an example of the information list.
 図4に示すように、情報リスト610は、音声コンテンツごとに、音声コンテンツの識別情報であるID611、格納位置612、重み613、および重み順序614を対応付けて記述する。格納位置612は、音声コンテンツの音声データの格納場所である。重み613は、本実施の形態では音声コンテンツ供給サーバ520による検索のマッチ点数である。重み順序614は、重み613の度合いの順序である。 As shown in FIG. 4, the information list 610 describes, for each audio content, an ID 611 that is identification information of the audio content, a storage location 612, a weight 613, and a weight order 614 in association with each other. The storage location 612 is a storage location of audio data of audio content. In the present embodiment, the weight 613 is the number of match points for the search by the audio content supply server 520. The weight order 614 is an order of the degree of the weight 613.
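A minimal sketch of how such an information list might be held in memory is shown below; the field names, file locations, and weight values are hypothetical (the weights are chosen so that the weight-axis example later in this description comes out to x = 0, 900, 1400) and are not the device's actual identifiers.

```python
# Hypothetical in-memory form of the information list of FIG. 4.
# "weight" plays the role of the search match score; "weight_order" is its rank.
information_list = [
    {"id": 1, "location": "file:///contents/0001.wav", "weight": 2000, "weight_order": 1},
    {"id": 2, "location": "file:///contents/0002.wav", "weight": 1100, "weight_order": 2},
    {"id": 3, "location": "file:///contents/0003.wav", "weight": 600,  "weight_order": 3},
]
```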
 そして、図3のステップS1200において、位置配置部420は、情報リストにリストアップされた音声コンテンツを、それぞれの重みの順序で重み軸に配置する。そして、位置配置部420は、各音声コンテンツのIDと重み軸の座標値とを対応付けたコンテンツ配置情報を、情報リストに追加する。 Then, in step S1200 of FIG. 3, the position arrangement unit 420 arranges the audio contents listed in the information list on the weight axis in the order of the respective weights. Then, the position arrangement unit 420 adds content arrangement information in which the ID of each audio content is associated with the coordinate value of the weight axis to the information list.
 具体的には、位置配置部420は、例えば、情報リストに記述された重みを重み軸座標へ変換するための重み位置変換ルールを予め有しており、この重み位置変換ルールに従って各音声コンテンツを重み軸に配置する。 Specifically, the position arrangement unit 420 has, for example, a weight position conversion rule for converting the weights described in the information list into weight axis coordinates, and arranges each audio content on the weight axis in accordance with this weight position conversion rule.
 Specifically, the weight position conversion rule arranges the audio contents so that, when the coordinate value on the weight axis changes in one direction, the weight of the arranged contents also changes in one direction. That is, the rule is such that exactly one of the following expressions (1) and (2) always holds. Here, w_n is the weight of the audio content with ID "n", x_n is its coordinate value, w_m is the weight of the audio content with ID "m", and x_m is its coordinate value. The weight position conversion rule is defined, for example, by a function or a correspondence table.

 If w_n < w_m, then x_n < x_m      … (1)
 If w_n < w_m, then x_n > x_m      … (2)
 例えば、重み位置変換ルールは、座標値の差の比が重みの差の比とほぼ等しくなるように、座標軸が0に近い側から重みの高さの順序で配置し、かつ、原点に最も近い音声コンテンツを原点に配置する内容である。重み位置変換ルールは、例えば関数や対応表により定義される。 For example, in the weight position conversion rule, the coordinate axes are arranged in the order of the weight height from the side close to 0 so that the ratio of the difference between the coordinate values is almost equal to the ratio of the difference in weight, and is closest to the origin. This is the content where audio content is placed at the origin. The weight position conversion rule is defined by a function or a correspondence table, for example.
 The simplest example of the weight position conversion rule determines the coordinate value x_i of the audio content with ID "i" according to the difference between its weight w_i and the maximum weight w_0 described in the information list. This is expressed, for example, by the following expression (3).

 x_i = f(w_i) ≡ w_0 − w_i      … (3)
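A minimal sketch of expression (3) in code is given below; the weights are hypothetical values chosen so that the resulting coordinates reproduce the FIG. 6 example (x = 0, 900, 1400).

```python
# Sketch of the weight position conversion rule of expression (3): x_i = w0 - w_i,
# where w0 is the largest weight in the information list.

def place_on_weight_axis(information_list):
    """Return {id: coordinate}; the higher the weight, the closer to the origin."""
    w0 = max(item["weight"] for item in information_list)
    return {item["id"]: w0 - item["weight"] for item in information_list}

if __name__ == "__main__":
    contents = [
        {"id": 1, "weight": 2000},
        {"id": 2, "weight": 1100},
        {"id": 3, "weight": 600},
    ]
    print(place_on_weight_axis(contents))  # {1: 0, 2: 900, 3: 1400}
```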
 図5は、コンテンツ配置情報が追加された情報リストの一例を示す図である。 FIG. 5 is a diagram illustrating an example of an information list to which content arrangement information is added.
 図5に示すように、コンテンツ配置情報が追加された情報リスト620は、音声コンテンツのID611ごとに、重み軸の座標値を示す提示位置621を対応付けて記述する。 As shown in FIG. 5, the information list 620 to which the content arrangement information is added describes the presentation position 621 indicating the coordinate value of the weight axis in association with each audio content ID 611.
 なお、位置配置部420は、情報リストとは別にコンテンツ配置情報を生成し、これを情報格納部410に格納してもよいし、音声再生制御部460へ出力してもよい。 Note that the location arrangement unit 420 may generate content arrangement information separately from the information list and store the content arrangement information in the information storage unit 410 or output it to the audio reproduction control unit 460.
 図6は、重み軸における各音声コンテンツの配置の一例を示す図である。 FIG. 6 is a diagram showing an example of the arrangement of each audio content on the weight axis.
 図6に示すように、重み軸630には、例えば、ID「1」~ID「3」の第1~第3の音声コンテンツ631~633が、順に、座標値x=0、900、1400に配置される。 As shown in FIG. 6, on the weight axis 630, for example, first to third audio contents 631 to 633 with ID “1” to ID “3” are sequentially set to coordinate values x = 0, 900, and 1400, respectively. Be placed.
 そして、図3のステップS1300において、提示位置計算部440は、ポインタ位置を初期状態に設定する。また、マーカ音生成部450は、初期状態のポインタ位置に対応する音質をマーカ音に設定することにより、マーカ音を初期状態に設定する。 And in step S1300 of FIG. 3, the presentation position calculation part 440 sets a pointer position to an initial state. In addition, the marker sound generation unit 450 sets the marker sound to the initial state by setting the sound quality corresponding to the pointer position in the initial state to the marker sound.
 本実施の形態において、提示位置計算部440は、重み軸の座標値0を、ポインタの初期位置とする。また、マーカ音生成部450は、ポインタ位置を、マーカ音の音質を規定するパラメータ(以下「音質パラメータ」という)へ変換するための位置音質変換ルールを予め有している。そして、マーカ音生成部450は、この位置音質変換ルールに従って、音質パラメータの値を決定する。位置音質変換ルールの詳細については後述する。 In the present embodiment, the presentation position calculation unit 440 sets the coordinate value 0 of the weight axis as the initial position of the pointer. In addition, the marker sound generation unit 450 has a position sound quality conversion rule for converting the pointer position into a parameter that defines the sound quality of the marker sound (hereinafter referred to as “sound quality parameter”). Then, the marker sound generation unit 450 determines the value of the sound quality parameter according to the position sound quality conversion rule. Details of the position sound quality conversion rule will be described later.
 そして、ステップS1400において、音声再生制御部460は、ポインタの現在位置および各音声コンテンツの配置に基づいて、音声ストリームを生成する際の仮想音場空間を構築する。具体的には、音声再生制御部460は、音源ごと(各音声コンテンツの音およびマーカ音)に、その音源を設定された位置に配置したときの、その音源から聞き手の耳までの音の伝達関数を算出する。そして、音声再生制御部460は、算出した伝達関数を、音場情報として、音声ストリーム生成部470へ出力する。 In step S1400, the audio reproduction control unit 460 constructs a virtual sound field space for generating an audio stream based on the current position of the pointer and the arrangement of each audio content. Specifically, the audio reproduction control unit 460 transmits the sound from the sound source to the listener's ear when the sound source is arranged at a set position for each sound source (the sound of each audio content and the marker sound). Calculate the function. Then, the audio reproduction control unit 460 outputs the calculated transfer function to the audio stream generation unit 470 as sound field information.
 なお、音声再生制御部460は、壁の反射や音の伝搬の速度など考慮して、仮想音場空間を構築(つまり音声ストリームを計算)すべきである。しかし、本実施の形態では、音声ストリームはモノラル音声であり、ユーザ位置は直線状の重み軸上に配置される。したがって、本実施の形態おける音声再生制御部460は、壁の反射、頭部伝達関数(左右での音声の質、位相、およびタイミングのずれなど)については無視し、距離による音の減衰のみを考慮した仮想音場空間を構築すればよい。 Note that the audio reproduction control unit 460 should construct a virtual sound field space (that is, calculate an audio stream) in consideration of wall reflection and sound propagation speed. However, in this embodiment, the audio stream is monaural audio, and the user position is arranged on a linear weight axis. Therefore, the sound reproduction control unit 460 in the present embodiment ignores the reflection of the wall and the head-related transfer function (the sound quality on the left and right, the phase, the timing shift, etc.) and only attenuates the sound due to the distance. A virtual sound field space that takes into account may be constructed.
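The following sketch illustrates this simplified monaural sound field: wall reflections and head-related transfer functions are ignored, and only attenuation with distance from the pointer (listener) position is applied. The inverse-distance law and its constant are assumptions for illustration, not values specified in this description.

```python
# Sketch: per-content gains for a distance-only monaural virtual sound field.

def gain_for_source(source_x, pointer_x, k=0.005):
    """Gain falls off with distance from the listener; 1.0 at the listener."""
    distance = abs(source_x - pointer_x)
    return 1.0 / (1.0 + k * distance)

def mix_gains(content_positions, pointer_x):
    """Per-content gains handed to the audio stream generation step."""
    return {cid: gain_for_source(x, pointer_x) for cid, x in content_positions.items()}

if __name__ == "__main__":
    positions = {1: 0, 2: 900, 3: 1400}       # coordinates from FIG. 6
    print(mix_gains(positions, pointer_x=0))  # content 1 loudest, content 3 quietest
```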
 そして、ステップS1500において、提示位置計算部440およびマーカ音生成部450は、ポインタの移動があったか否かを判断する。具体的には、提示位置計算部440およびマーカ音生成部450は、ポインタ操作情報が入力されたか否かを判断する。なお、この判断は、提示位置計算部440およびマーカ音生成部450のいずれか一方がポインタ操作情報に基づいて判断し、その判断結果を他方に通知することにより行ってもよい。提示位置計算部440およびマーカ音生成部450は、ポインタの移動がない場合(S1500:NO)、ステップS1600へ進む。 In step S1500, the presentation position calculation unit 440 and the marker sound generation unit 450 determine whether the pointer has moved. Specifically, the presentation position calculation unit 440 and the marker sound generation unit 450 determine whether pointer operation information has been input. This determination may be made by either one of the presentation position calculation unit 440 and the marker sound generation unit 450 determining based on the pointer operation information and notifying the determination result to the other. If there is no movement of the pointer (S1500: NO), the presentation position calculation unit 440 and the marker sound generation unit 450 proceed to step S1600.
 ステップS1600において、マーカ音生成部450は、決定した音質のマーカ音の音声データを生成し、音声ストリーム生成部470へ出力する。また、音声再生制御部460は、各音声コンテンツの音声データを、音声ストリーム生成部470へ出力する。そして、音声ストリーム生成部470は、構築された仮想音場空間の音場を実現する音声ストリーム(音声データ)を生成し、出力する。この結果、図7Aの例では、ユーザには、第1の音声コンテンツ631の音声とマーカ音とが至近距離で聞こえ、第2および第3の音声コンテンツ632、633の音声が遠くから聞こえることになる。 In step S1600, the marker sound generation unit 450 generates audio data of the determined marker sound of the sound quality and outputs the audio data to the audio stream generation unit 470. Also, the audio reproduction control unit 460 outputs the audio data of each audio content to the audio stream generation unit 470. Then, the audio stream generation unit 470 generates and outputs an audio stream (audio data) that realizes the sound field of the constructed virtual sound field space. As a result, in the example of FIG. 7A, the user can hear the sound of the first audio content 631 and the marker sound at a close distance, and can hear the sounds of the second and third audio contents 632 and 633 from a distance. Become.
 なお、マーカ音生成部450は、構築した仮想音場空間において、音声出力されない音声コンテンツ(例えば重み軸上で非常に遠くに位置する音声コンテンツ)の音声データを、音声ストリーム生成部470へ出力しなくてもよい。 Note that the marker sound generation unit 450 outputs the audio data of the audio content that is not output as audio in the constructed virtual sound field space (for example, audio content located very far on the weight axis) to the audio stream generation unit 470. It does not have to be.
 そして、ステップS1700において、音声再生制御部460は、音声コンテンツに対する決定操作があったか否かを判断する。具体的には、音声再生制御部460は、決定操作情報が入力されたか否かを判断する。音声再生制御部460は、決定操作がない場合(S1700:NO)ステップS1800へ進む。 In step S1700, the audio reproduction control unit 460 determines whether or not there has been a determination operation on the audio content. Specifically, the audio reproduction control unit 460 determines whether decision operation information has been input. If there is no determination operation (S1700: NO), the audio reproduction control unit 460 proceeds to step S1800.
 ステップS1800において、音声再生制御部460は、ユーザ操作などにより処理の終了を指示されたか否かを判断する。音声再生制御部460は、処理の終了を指示されていない場合(S1800:NO)、ステップS1500へ戻る。 In step S1800, the audio reproduction control unit 460 determines whether an instruction to end the process is given by a user operation or the like. If the audio reproduction control unit 460 is not instructed to end the process (S1800: NO), the process returns to step S1500.
 一方、提示位置計算部440およびマーカ音生成部450は、ポインタの移動があった場合(S1500:YES)、ステップS1900へ進む。 On the other hand, the presentation position calculation unit 440 and the marker sound generation unit 450 proceed to step S1900 when the pointer moves (S1500: YES).
 図7は、ポインタの移動の様子の一例を示す図である。 FIG. 7 is a diagram illustrating an example of how the pointer moves.
 図7Aに示すように、提示位置計算部440は、初期状態において、原点、つまり第1の音声コンテンツ631の位置にポインタ640を置く。そして、提示位置計算部440は、ユーザ操作入力装置310の「進む」ボタン313が押下されたまま状態の継続時間が増大するごとに、ポインタ640を重み軸630のプラス方向641へ移動させる。また、提示位置計算部440は、ユーザ操作入力装置310の「戻る」ボタン311が押下されたままの状態の継続時間が増大するごとに、ポインタ641を重み軸630のマイナス方向642へ移動させる。 7A, the presentation position calculation unit 440 places the pointer 640 at the origin, that is, the position of the first audio content 631 in the initial state. The presentation position calculation unit 440 moves the pointer 640 in the plus direction 641 of the weight axis 630 each time the duration of the state increases while the “forward” button 313 of the user operation input device 310 is pressed. In addition, the presentation position calculation unit 440 moves the pointer 641 in the minus direction 642 of the weight axis 630 each time the duration of the state in which the “return” button 311 of the user operation input device 310 is kept pressed increases.
 例えば、「進む」ボタン313が連続して押下されたとする。この場合、提示位置計算部440は、図7Bに示すように、ポインタ640を重み軸630のプラス方向641側へ移動させる。その結果、ポインタ640は、第1の音声コンテンツ631の位置から離れていき、やがて図7Cに示すように、次の第2の音声コンテンツ632に近付く。 For example, assume that the “forward” button 313 is continuously pressed. In this case, the presentation position calculation unit 440 moves the pointer 640 to the plus direction 641 side of the weight axis 630 as shown in FIG. 7B. As a result, the pointer 640 moves away from the position of the first audio content 631 and eventually approaches the next second audio content 632 as shown in FIG. 7C.
 なお、提示位置計算部440は、図7Bに示すように、各音声コンテンツの位置631~633に替えて、これらに幅を持たせた区間636~638を、音声コンテンツの位置として扱ってもよい。これにより、ユーザは、ポインタ位置を所望の音声コンテンツに合わせ易くなる。 In addition, as shown in FIG. 7B, the presentation position calculation unit 440 may handle sections 636 to 638 having widths instead of the positions 631 to 633 of the respective audio contents as the positions of the audio contents. . Thereby, the user can easily adjust the pointer position to the desired audio content.
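A minimal sketch of this pointer movement and of the widened content sections of FIG. 7B is shown below; the movement speed and the section half-width are assumptions for illustration.

```python
# Sketch: moving the pointer on the weight axis while a button is held, and
# treating a widened section around each content position as "selected".

def move_pointer(pointer_x, direction, held_seconds, speed=200.0):
    """direction is +1 for 'forward', -1 for 'return'; speed in coordinates/s."""
    return max(0.0, pointer_x + direction * speed * held_seconds)

def selected_content(pointer_x, content_positions, half_width=100.0):
    """Return the id of the content whose section contains the pointer, else None."""
    for cid, x in content_positions.items():
        if abs(pointer_x - x) <= half_width:
            return cid
    return None

if __name__ == "__main__":
    positions = {1: 0, 2: 900, 3: 1400}
    p = move_pointer(0.0, +1, held_seconds=4.2)  # pointer moves to 840.0
    print(p, selected_content(p, positions))     # falls inside content 2's section
```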
 図3のステップS1900において、マーカ音生成部450は、移動後の現在のポインタ位置に対応する音質に、マーカ音を変更する。具体的には、マーカ音生成部450は、上述の位置音質変換ルールを用いて、現在のポインタ位置を音質パラメータの値に変換し、得られた値を、以降のマーカ音を生成する際に使用する値とする。 In step S1900 of FIG. 3, the marker sound generation unit 450 changes the marker sound to the sound quality corresponding to the current pointer position after movement. Specifically, the marker sound generation unit 450 converts the current pointer position to the value of the sound quality parameter using the above-described position sound quality conversion rule, and generates the subsequent marker sound using the obtained value. The value to use.
 ここでは、本実施の形態における位置音質変換ルールについて説明する。 Here, the position sound quality conversion rule in this embodiment will be described.
 The position sound quality conversion rule determines the value of the sound quality parameter so that, when the coordinate value of the pointer position changes in one direction, the value of the sound quality parameter also changes in one direction. That is, the rule is such that exactly one of the following expressions (4) and (5) always holds. Here, B_n is the sound quality parameter when the pointer is located at coordinate value x_n, and B_m is the sound quality parameter when the pointer is located at coordinate value x_m. The position sound quality conversion rule is defined, for example, by a function or a correspondence table.

 If x_n < x_m, then B_n < B_m      … (4)
 If x_n > x_m, then B_n < B_m      … (5)
 図8は、位置音質変換ルールの第1の例を示す図である。 FIG. 8 is a diagram showing a first example of the position sound quality conversion rule.
 図8に示すように、位置音質変換ルールは、例えば、座標値とマーカ音の周波数とが負の比例関係となるように、座標値をマーカ音の周波数に変換する関数651である。この関数651を適用した場合、マーカ音は、ポインタが原点に位置するとき(つまり初期状態において)、最も高音となる。そして、ポインタが原点から離れるにしたがって、マーカ音の音は低くなっていく。 As shown in FIG. 8, the position sound quality conversion rule is a function 651 that converts a coordinate value to a marker sound frequency so that the coordinate value and the marker sound frequency have a negative proportional relationship, for example. When this function 651 is applied, the marker sound becomes the highest sound when the pointer is located at the origin (that is, in the initial state). As the pointer moves away from the origin, the sound of the marker sound becomes lower.
 例えば、図8に示すように、提示位置「0」には、周波数「4040Hz」が対応付けられている。そして、提示位置「900」には周波数「3500Hz」が対応付けられ、提示位置「1400」には周波数「3200Hz」が対応付けられている。したがって、図7の例では、マーカ音の周波数は、第1の音声コンテンツの位置631で4040Hzとなり、第2の音声コンテンツの位置632で3500Hz(ほぼA7の音程)となる。また、マーカ音の周波数は、第3の音声コンテンツの位置633で3200Hzとなる。 For example, as shown in FIG. 8, the presentation position “0” is associated with the frequency “4040 Hz”. The presentation position “900” is associated with the frequency “3500 Hz”, and the presentation position “1400” is associated with the frequency “3200 Hz”. Therefore, in the example of FIG. 7, the frequency of the marker sound is 4040 Hz at the position 631 of the first audio content, and 3500 Hz (approximately the pitch of A7) at the position 632 of the second audio content. The frequency of the marker sound is 3200 Hz at the position 633 of the third audio content.
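A negative-slope linear function is one conversion rule consistent with these example values; the sketch below reproduces them (0 → 4040 Hz, 900 → 3500 Hz, 1400 → 3200 Hz). The exact function 651 is defined by the rule itself, so the constants used here are only one possible choice.

```python
# Sketch of the first position-to-sound-quality conversion rule (FIG. 8):
# a negative-slope linear map from pointer coordinate to marker-sound frequency.

def marker_frequency(pointer_x, f0=4040.0, slope=0.6, f_min=200.0):
    """Frequency is highest at the origin and falls as the pointer moves away."""
    return max(f_min, f0 - slope * pointer_x)

if __name__ == "__main__":
    for x in (0, 900, 1400):
        print(x, marker_frequency(x))  # 4040.0, 3500.0, 3200.0
```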
 As described above, the weight position conversion rule arranges the weights so that, when the coordinate value on the weight axis changes in one direction, the arranged weights also change in one direction. Therefore, the audio content processing apparatus 400 determines the sound quality parameter so that exactly one of the following expressions (6) and (7) always holds, and changes the marker sound accordingly. In other words, in the audio content processing apparatus 400, the sound quality of the marker sound changes in one direction when the position of the pointer on the weight axis changes in one direction.

 If w_n < w_m, then B_n < B_m      … (6)
 If w_n < w_m, then B_n > B_m      … (7)
 式(6)のみが常に成り立つ場合、ポインタ位置が初期位置から遠ざかるにつれて、音声パラメータは高くなっていく。また、式(7)のみが常に成り立つ場合、ポインタ位置が初期位置から遠ざかるにつれて、音声パラメータは低くなっていく。 When only Expression (6) always holds, the voice parameter increases as the pointer position moves away from the initial position. In addition, when only Expression (7) always holds, the voice parameter becomes lower as the pointer position moves away from the initial position.
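As a small check, composing the example placement rule of expression (3) with the negative-slope frequency rule of FIG. 8 (both with the illustrative constants used in the earlier sketches) gives a marker frequency that rises strictly with the weight, so expression (6) holds throughout and expression (7) never does.

```python
# Sketch: checking that only one of expressions (6)/(7) holds for the example
# rules. Constants are the illustrative ones used in the earlier sketches.

def position_of(weight, w0=2000.0):
    return w0 - weight                 # expression (3): x = w0 - w

def frequency_of(position, f0=4040.0, slope=0.6):
    return f0 - slope * position       # FIG. 8 style negative-slope rule

if __name__ == "__main__":
    weights = [600, 1100, 2000]        # increasing weight
    freqs = [frequency_of(position_of(w)) for w in weights]
    print(freqs)                       # [3200.0, 3500.0, 4040.0]
    # Strictly increasing with weight, so expression (6) holds and (7) does not.
    assert all(a < b for a, b in zip(freqs, freqs[1:]))
```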
 そして、図3のステップS2000において、音声再生制御部460は、移動後の現在のポインタの位置に基づいて、仮想音場空間を再構築する。具体的には、音声再生制御部460は、聞き手の仮想位置およびマーカ音の位置を現在のポインタ位置に変更して音の伝達関数を算出し、音声ストリーム生成部470が使用する伝達関数を更新する。この結果、図7Cの例の場合、ユーザには、第2の音声コンテンツ632の音声とマーカ音とが至近距離で聞こえ、第1および第3の音声コンテンツ631、633の音声が遠くから聞こえることになる。そして、音声再生制御部460は、ステップS1600へ進む。 Then, in step S2000 of FIG. 3, the audio reproduction control unit 460 reconstructs the virtual sound field space based on the current pointer position after movement. Specifically, the audio reproduction control unit 460 calculates a sound transfer function by changing the virtual position of the listener and the position of the marker sound to the current pointer position, and updates the transfer function used by the audio stream generation unit 470. To do. As a result, in the example of FIG. 7C, the user can hear the sound of the second audio content 632 and the marker sound at a close distance, and can hear the sounds of the first and third audio contents 631 and 633 from a distance. become. Then, the audio reproduction control unit 460 proceeds to step S1600.
 また、音声再生制御部460は、決定操作があった場合(S1700:YES)ステップS2100へ進む。 Further, when there is a determination operation (S1700: YES), the audio reproduction control unit 460 proceeds to step S2100.
 ステップS2100において、音声再生制御部460は、決定された音声コンテンツに対して所定の処理を行い、ステップS1800へ進む。 In step S2100, the audio reproduction control unit 460 performs predetermined processing on the determined audio content, and proceeds to step S1800.
 そして、音声再生制御部460は、処理の終了を指示された場合(S1800:YES)、一連の処理を終了する。 Then, when instructed to end the process (S1800: YES), the audio reproduction control unit 460 ends the series of processes.
 このような動作により、音声コンテンツ処理装置400は、ポインタが位置する音声コンテンツの重みを、ポインタの位置を示すマーカ音の音質で提示することができる。 By such an operation, the audio content processing apparatus 400 can present the weight of the audio content where the pointer is located with the sound quality of the marker sound indicating the position of the pointer.
Next, the operation of the audio content reproduction system 100 will be described.

FIG. 9 is a sequence diagram showing an example of the operation of the audio content reproduction system 100. In FIG. 9, parts corresponding to the processing of the audio content processing apparatus 400 shown in FIG. 3 are given the corresponding step numbers.

The information storage unit 410 downloads the search results for the search query from the audio content supply server 520 and acquires the information list (see FIG. 4) (A01) (S1100). Next, the position arrangement unit 420 calculates the presentation position of each audio content based on its weight (A02) and adds the calculation results to the information list (A03) (S1200). The position arrangement unit 420 then causes the presentation position calculation unit 440 to set the initial pointer position (A04) (S1300).

Next, the audio reproduction control unit 460 constructs the virtual sound field space (B01). The audio reproduction control unit 460 then causes the audio stream generation unit 470 to start generating the audio stream (B02) and causes the marker sound generation unit 450 to start generating the marker sound (B03) (S1400). The audio stream generation unit 470 generates an audio stream containing the marker sound and the audio of the audio contents and has it output from the audio output device 200 (B04) (S1600).

When the pointer moves, the pointer position acquisition unit 430 outputs pointer operation information to the presentation position calculation unit 440 (C01). The presentation position calculation unit 440 calculates the pointer position after the movement and outputs new pointer position information to the marker sound generation unit 450 and the audio reproduction control unit 460 (C02). As a result, the audio reproduction control unit 460 reconstructs the virtual sound field space (D01). The audio reproduction control unit 460 then causes the audio stream generation unit 470 to start generating the audio stream (D02) and causes the marker sound generation unit 450 to start generating the marker sound with its changed sound quality (D03) (S1900, S2000).
As described above, when moving the pointer among a plurality of audio contents, the audio content processing apparatus 400 according to this embodiment presents the weight of the audio content at which the pointer is located through the sound quality of the marker sound associated with that weight. The audio content processing apparatus 400 can thereby present the weight of each audio content to the user in an easily understandable manner.

When the pointer moves continuously over audio contents arranged in order of weight, as in this embodiment, the sound quality of the marker sound also changes continuously while the pointer is moving. The audio content processing apparatus 400 can therefore present in which direction the pointer is moving in a way the user can grasp intuitively. That is, the audio content processing apparatus 400 can present the change in the weight corresponding to the pointer as a change in the sound quality of the marker sound.

When the weight axis is finite and the pointer can be moved at high speed, the audio content processing apparatus 400 allows the user to move the pointer over the entire range with ease. The audio content processing apparatus 400 can therefore present, in an intuitively graspable way, where the weight of the audio content at which the pointer is currently located stands among all the audio contents.

For example, in the case of pitch, the range over which the sound quality of the marker sound can change is fixed to some extent, such as the audible range or the frequency band that the speaker can output. The audio content processing apparatus 400 can therefore present to the user not only the relative position of a weight, indicated by the degree and direction of the weight change, but also the rough absolute position of the weight within the whole.

Also, when the moving speed of the pointer on the weight axis is constant, the audio content processing apparatus 400 can present the degree of weight change to the user not only through the degree of change in the sound quality of the marker sound but also through the length of time the movement takes.

User interfaces on mobile terminals continue to diversify. With operation interfaces such as touch UIs (user interfaces), efforts have focused on improving the visibility of the GUI (graphical user interface) being operated. However, when information can be expressed aurally, using auditory information rather than visual information is often more convenient, and weight information is exactly such information. The audio content processing apparatus 400 according to this embodiment can therefore provide a mobile terminal with a user interface of further improved operability.

In addition, since the audio content processing apparatus 400 according to this embodiment presents the weights so that they can be grasped intuitively without using a display, it allows the user to quickly check the weight of each of a large number of audio contents. The user can therefore pick items out of an enormous amount of audio content in order of weight with little burden. Examples of such work include picking voice mails or bulletin-board postings out of a huge number of them in descending order of priority, or picking radio programs out of an enormous number of channels in descending order of search match count.

Note that the weight position conversion rule and the position sound quality conversion rule used by the audio content processing apparatus 400 are not limited to the examples above.
FIG. 10 is a diagram showing a second example of the position sound quality conversion rule. As shown in FIG. 10, the position sound quality conversion rule may be, for example, a function 652 that converts the coordinate value into the frequency of the marker sound so that the frequency of the marker sound increases in direct proportion to an increase in the coordinate value. In this case, the pitch at the origin is 0 Hz, that is, silence; as the pointer position moves away from the origin, the marker sound becomes audible and its pitch rises.

FIG. 11 is a diagram showing a third example of the position sound quality conversion rule. In FIG. 11, AS_min to AS_max indicates a predetermined range of marker sound pitches, such as the audible range. As shown in FIG. 11, the position sound quality conversion rule may be, for example, a function 653 that converts the coordinate value into the frequency of the marker sound so that the frequency of the marker sound increases exponentially as the coordinate value decreases. This function 653 is substantially equivalent to the function 651 shown in FIG. 8 with its vertical axis replaced by musical scale instead of frequency. The function 653 may also convert into a limited set of pitches, such as the 12-tone scale.

FIG. 12 is a diagram showing a fourth example of the position sound quality conversion rule. As shown in FIG. 12, the position sound quality conversion rule may be, for example, a function 654 that converts the coordinate value into the pitch of the marker sound so that the musical scale of the marker sound increases exponentially as the coordinate value decreases. This is substantially equivalent to expressing the values on the vertical axis of the function 653 shown in FIG. 11 as a musical scale. In this case, near the origin the scale changes greatly relative to a change in the coordinate value. The audio content processing apparatus 400 can therefore change the sound quality of the marker sound so that it is sensitive to weight differences between audio contents with large weights and insensitive to weight differences between audio contents with small weights.
Here, the musical scale is a semitone number in which the note A0 (27.5 Hz) is defined as 0, and notes higher by one semitone each are defined in turn as 1, 2, 3, and so on. The relationship between the scale n and the frequency f(n) [Hz] is expressed by the following expression (8).

f(n) = 27.5 × 2^(n/12), where 0 ≤ n ≤ 96   ... (8)
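As a check of expression (8), the following Python sketch (with hypothetical helper names) converts a semitone number into a frequency and back; for example, n = 48 yields 440 Hz (the note A4, four octaves above A0).

```python
import math

A0_HZ = 27.5  # semitone number 0 in expression (8)

def semitone_to_frequency(n):
    """Expression (8): f(n) = 27.5 * 2^(n/12), valid for 0 <= n <= 96."""
    if not 0 <= n <= 96:
        raise ValueError("semitone number out of range")
    return A0_HZ * 2 ** (n / 12)

def frequency_to_semitone(f_hz):
    """Inverse of expression (8), rounded to the nearest semitone."""
    return round(12 * math.log2(f_hz / A0_HZ))

print(semitone_to_frequency(48))   # 440.0 (A4)
print(frequency_to_semitone(440))  # 48
```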
The sound quality conversion rules exemplified in FIG. 11 and FIG. 12 correspond to the fact that the human ear's sensitivity to pitch differences is exponential. They therefore make the weights even easier for the user to grasp and are preferable in practice.

The audio content processing apparatus 400 may also determine the value of the sound quality parameter directly from the weight, using a weight sound quality conversion rule for converting the weight described in the information list into the value of the sound quality parameter of the marker sound.

FIG. 13 is a diagram showing an example of the weight sound quality conversion rule. As shown in FIG. 13, the weight sound quality conversion rule can be, for example, a function 655 that converts the weight into the pitch of the marker sound so that the musical scale of the marker sound increases exponentially as the weight increases.
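A minimal sketch of such a direct weight-to-pitch rule is shown below; the names, the weight range, and the semitone range are assumptions, and for simplicity the weight is mapped linearly onto the semitone range, whereas function 655 in FIG. 13 uses an exponential relationship.

```python
# Illustrative sketch (hypothetical names): map a weight directly to a marker
# sound pitch, then to a frequency with expression (8).

AS_MIN, AS_MAX = 24, 84        # assumed semitone range of the marker sound
W_MIN, W_MAX = 0.0, 100.0      # assumed range of weights in the information list

def weight_to_semitone(weight):
    """Monotonically map a weight onto the semitone range [AS_MIN, AS_MAX]."""
    ratio = min(max((weight - W_MIN) / (W_MAX - W_MIN), 0.0), 1.0)
    return round(AS_MIN + (AS_MAX - AS_MIN) * ratio)

def weight_to_frequency(weight):
    return 27.5 * 2 ** (weight_to_semitone(weight) / 12)  # expression (8)

print(weight_to_frequency(90.0))  # a high pitch for a large weight
```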
In this embodiment, a monaural earphone is used as the audio output device, but a stereo earphone, binaural headphones, a monaural speaker, or a stereo speaker may be used instead. The audio stream generation unit also desirably realizes the constructed virtual sound field space with good accuracy. To this end, the audio stream generation unit may acquire the type, arrangement, and capability (monaural, stereo, multi-channel, and so on) of the audio output device and change the audio stream generation method for each type and capability.

Also, when the listener's position and the audio contents are arranged on a straight weight axis as in this embodiment, the user hears the audio of all the audio contents from the same direction. When many audio contents are heard superimposed, the individual sounds become hard to distinguish. The audio reproduction control unit may therefore construct the virtual sound field space so that, among sections that give each content's position some width (for example, sections 636 to 638 shown in FIG. 7), only the audio of the audio content in the section where the pointer is located is output.
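As a rough sketch of this section-based selection (the content identifiers and section boundaries below are assumptions for illustration), choosing what to play amounts to finding the section on the weight axis that contains the pointer.

```python
# Illustrative sketch (hypothetical names): play only the content whose section
# on the weight axis contains the current pointer coordinate.

sections = [
    # (content_id, section_start, section_end) on the weight axis
    ("content_631", 0.0, 20.0),
    ("content_632", 20.0, 55.0),
    ("content_633", 55.0, 100.0),
]

def content_at_pointer(pointer_x):
    for content_id, start, end in sections:
        if start <= pointer_x < end:
            return content_id
    return None

print(content_at_pointer(42.0))  # "content_632" is the only content output
```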
Although the audio content processing apparatus always outputs an intermittent marker sound in this embodiment, it may instead output the marker sound at specific timings. The audio content processing apparatus may also use different marker sound output timings depending on the type of audio content.

The patterns of start and end timing for playing the marker sound can be broadly divided into two types.

The first is a pattern in which the marker sound is played for a fixed period prior to the playback of each audio content. In this pattern, if the marker sound is, for example, a single piano note, a piano tone with the pitch corresponding to the position of the audio content is heard before that audio content's audio is output.

The second, as exemplified in this embodiment, is a pattern in which the marker sound is always played regardless of whether audio content is being reproduced. In this pattern, a marker sound with the sound quality corresponding to the pointer position plays as a background sound. Consequently, when the function 653 shown in FIG. 11 is used as the position sound quality conversion rule, a background sound whose pitch gradually lowers as the pointer position moves away from the initial position continues, either continuously or intermittently. As a variation of the second pattern, the marker sound may be silenced while audio content is being reproduced.

Note that when the sections that give each content's position some width, or a stepwise musical scale, are used as described above, the marker sound presents the range in which the weight of the audio content lies.
(Embodiment 2)

In the case of stereo audio, a plurality of audio contents can be arranged in different directions. Moreover, when a person listens to a sound, the person usually turns the head (face) toward the direction from which the sound arrives. In Embodiment 2 of the present invention, therefore, a plurality of audio contents are arranged in the virtual sound field space so as to surround the area in front of the user, and the pointer can be operated by the orientation of the user's head.
FIG. 14 is a diagram showing an example of the appearance of an audio content reproduction system in which the audio content processing apparatus according to this embodiment is used, and corresponds to FIG. 1 of Embodiment 1. Parts identical to those in FIG. 1 are given the same reference numerals, and their description is omitted.

As shown in FIG. 14, the audio content reproduction system 100a in this embodiment has an audio output device 200a in place of the audio output device 200 of FIG. 1. The audio output device 200a consists of a cable 211a for transmitting stereo audio data and stereo headphones 221a worn on a person's head. A motion sensor 320a that detects the movement of the stereo headphones 221a (that is, the movement of the user's head) is attached to the stereo headphones 221a. The audio content reproduction system 100a also has, in place of the portable player 300 of FIG. 1, a portable player 300a incorporating an audio content processing apparatus different from the audio content processing apparatus of Embodiment 1.

The motion sensor 320a detects acceleration and transmits the resulting acceleration information to the audio content processing apparatus of the portable player 300a by wireless communication or wired communication (for example, communication using the cable 211a).

FIG. 15 is a block diagram showing an example of the configuration of the audio content processing apparatus according to this embodiment, and corresponds to FIG. 2 of Embodiment 1. Parts identical to those in FIG. 2 are given the same reference numerals, and their description is omitted.

In FIG. 15, the audio content processing apparatus 400a according to this embodiment has a head orientation acquisition unit 480a in addition to the configuration of FIG. 2. The audio content processing apparatus 400a also has a presentation position calculation unit 440a and an audio reproduction control unit 460a in place of the presentation position calculation unit 440 and the audio reproduction control unit 460 of FIG. 2.

The head orientation acquisition unit 480a receives acceleration information from the motion sensor 320a and, based on the received acceleration information, acquires the orientation of the head of the user wearing the stereo headphones 221a. More specifically, the head orientation acquisition unit 480a calculates the orientation of the head relative to the user's front (hereinafter referred to as the "head orientation") by integrating the acceleration from a state in which the user is stationary and facing forward. The head orientation acquisition unit 480a then outputs head orientation information indicating the head orientation to the audio reproduction control unit 460a.
Based on the latest pointer position information and the content arrangement information, the audio reproduction control unit 460a constructs a virtual sound field space in which the listener's position is placed at a position corresponding to the pointer position, away from the weight axis described above. Specifically, according to the pointer position, the audio reproduction control unit 460a slides the weight axis on which the audio contents are arranged along an arc that spreads horizontally in front of the listener, centered on the listener. More specifically, the audio reproduction control unit 460a slides the weight axis so that the pointer position is included in the arc. Hereinafter, this arc is referred to as the "presentation window," and the range of the weight axis included in the arc is referred to as the "pointer range."

The audio reproduction control unit 460a also constructs, based on the head orientation information, the virtual sound field space in the state in which the listener's head faces the pointer position. In this embodiment, therefore, when the user's head orientation changes, the weight axis on which the audio contents are arranged, the pointer position, and the presentation range do not change, but the virtual sound field space does.

The presentation position calculation unit 440a calculates the current position of the pointer when the pointer is moved on the weight axis according to the pointer operation information and the head orientation information. Specifically, the presentation position calculation unit 440a moves the pointer range according to the pointer operation information (that is, operations with the "back" button 311 and the "forward" button 313). The presentation position calculation unit 440a then moves the pointer within the pointer range according to the head orientation information (that is, operation by head orientation).

Such an audio content processing apparatus 400a can arrange the weight axis in an arc spread out laterally in front of the user. The audio content processing apparatus 400a can thereby arrange a plurality of audio contents in different directions. The audio content processing apparatus 400a can therefore make it easier for the user to distinguish the plurality of audio contents, and even when their audio is output simultaneously, each sound is easier to tell apart.

The audio content processing apparatus 400a can also move the pointer range according to button operations and move the pointer within the pointer range (within the range of the weight axis located in the presentation window) according to the user's head orientation. This lets the user combine small pointer movements one step at a time with large pointer movements one presentation window at a time, improving the operability of pointer movement.
Here, the definitions of head orientation and position in this embodiment are explained.

FIG. 16 is a diagram showing an example of the definitions of head orientation and position.

As shown in FIG. 16, the audio content processing apparatus 400a defines, for example, the front direction 662 of the torso 661 of the user (listener) as a declination ψ referenced to the negative X-axis direction of a predetermined XY coordinate system. This XY coordinate system is placed horizontally with the head 663 of the user (listener) as its origin. The audio content processing apparatus 400a then defines the head orientation of the head 663 of the user (listener) as the angle φ of the front direction 664 of the head 663 relative to the front direction 662 of the torso 661. The audio content processing apparatus 400a also defines a position 665 on the pointer range by the declination θ referenced to the negative X-axis direction of the same XY coordinate system and the distance r from the origin of the XY coordinate system (that is, in polar coordinates). The audio content processing apparatus 400a sets the coordinate value x of the weight axis to the angle corresponding to the declination θ on the pointer range.

Here, the audio content processing apparatus 400a defines each angle with the clockwise direction, as seen from above the user, as positive. The audio content processing apparatus 400a also sets the XY coordinate system, and sets the angle φ to 0 degrees, in the state in which the front direction 664 of the head 663 coincides with the front direction 662 of the torso 661. The audio content processing apparatus 400a also places the presentation window of the weight axis on a circle 666 at distance r from the origin of the XY coordinate system.
In this embodiment, the position arrangement unit 420 uses a weight position conversion rule in which the audio contents are arranged at intervals of 30 degrees. This weight position conversion rule is expressed, for example, by the following expression (9), where x_i is the coordinate value for ID "i" and Ord(w_i) is the weight order (see FIG. 4).

x_i = Ord(w_i) × 30   ... (9)
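For illustration only (the function names and the assumption that rank 0 corresponds to the largest weight are hypothetical; the actual ordering follows the information list of FIG. 4), expression (9) can be read as assigning each content an angle of 30 degrees per rank in the weight order.

```python
# Illustrative sketch (hypothetical names): place contents on the weight axis
# at 30-degree intervals according to their weight order, as in expression (9).

def placement_angles(weights):
    """Return {content_id: angle in degrees}, rank 0 for the largest weight."""
    order = sorted(weights, key=weights.get, reverse=True)
    return {content_id: rank * 30 for rank, content_id in enumerate(order)}

weights = {"A": 0.9, "B": 0.4, "C": 0.7}
print(placement_angles(weights))  # {'A': 0, 'C': 30, 'B': 60}
```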
Next, the operation of the audio content processing apparatus 400a will be described.

FIG. 17 is a diagram showing an example of the operation of the audio content processing apparatus, and corresponds to FIG. 3 of Embodiment 1. Steps identical to those in FIG. 3 are given the same step numbers, and their description is omitted.

In step S1400a, the audio reproduction control unit 460a constructs a virtual sound field space in which the arc-shaped presentation window described above is placed in front of the user.

FIG. 18 is a diagram showing an example of the presentation window.
As shown in FIG. 18, the audio reproduction control unit 460a takes as the presentation window the arc 668 at distance r from the origin of the XY coordinate system, within the range 667 of ±90 degrees centered on the front direction 662 of the torso 661, regardless of the front direction 664 of the head 663. In the initial state, the range including the initial position of the pointer becomes the pointer range. The declination θ of the pointer range is expressed, for example, by the following expression (10), using the declination ψ of the torso's front direction.

ψ − 90 ≤ θ ≤ ψ + 90   ... (10)
With such a presentation window, even a sound source in front of the torso 661 cannot be heard if its angle is not on the same turn around the circle as the torso 661's position. For example, suppose the declination of the torso's front direction is ψ = 80°, the angle of one sound source 671 is θ = 120°, and the angle of another sound source 672 is θ′ = θ + 360° = 480°. This user can hear the sound source 671 but cannot hear the audio of the sound source 672. Expression (10) above can be used as is for this determination.
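A minimal sketch of this audibility check (with hypothetical names) applies expression (10) directly to the source angle without reducing it modulo 360 degrees, so a source one lap away, such as 480°, is rejected.

```python
# Illustrative sketch (hypothetical names): a source is audible only if its angle
# satisfies psi - 90 <= theta <= psi + 90 (expression (10)), with angles taken
# as-is rather than reduced modulo 360 degrees.

def is_audible(source_theta_deg, torso_psi_deg):
    return torso_psi_deg - 90 <= source_theta_deg <= torso_psi_deg + 90

print(is_audible(120, 80))  # True:  sound source 671
print(is_audible(480, 80))  # False: sound source 672, one lap away
```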
Then, in step S1410a of FIG. 17, the head orientation acquisition unit 480a acquires the head orientation based on the acceleration information and generates the head orientation information.

In step S1500a, the presentation position calculation unit 440a and the marker sound generation unit 450 determine whether the pointer has moved, based on whether there has been at least one of a change in the pointer range and a change in the head orientation. A change in the pointer range corresponds to an input of pointer operation information, and a change in the head orientation corresponds to a change in the head orientation information.

In step S2000a, the audio reproduction control unit 460a reconstructs the virtual sound field space according to the current head orientation and pointer range.

Through such processing, the audio content processing apparatus 400a can arrange the weight axis so that it spreads out laterally in front of the user, and can move the pointer range and the pointer position according to the user's button operations and head orientation.
FIG. 19 is a diagram showing an example of how the pointer position moves and the marker sound changes as the head orientation changes.

As shown in FIG. 19A, assume that in the initial state the presentation position calculation unit 440a sets the position of the third audio content 633, among the first to fifth audio contents 631 to 635 arranged in weight order, as the front direction 664 of the head 663. In this case, the pointer is located in the front direction 664 of the head 663, that is, at the third audio content 633. The sound quality of the marker sound 636 is therefore the sound quality corresponding to the weight of the third audio content 633.

Assume also that the audio reproduction control unit 460a sets the pointer range to the range of the weight axis that includes the first to fifth audio contents 631 to 635, centered on the third audio content 633.

With such a virtual sound field space, the audio content processing apparatus 400a can let the user hear the audio of the first to fifth audio contents 631 to 635 simultaneously from different directions, making them easy to distinguish. That is, even if the audio of the first to fifth audio contents 631 to 635 is reproduced simultaneously, the user hears it three-dimensionally and can easily tell the contents apart.

Then, from the state shown in FIG. 19A, suppose the user twists the head to the left so that the front direction 664 of the head 663 faces the first audio content 631, as shown in FIG. 19B. In this case, the pointer moves to the first audio content 631, and the sound quality of the marker sound 636 changes from the sound quality corresponding to the weight of the third audio content 633 to the sound quality corresponding to the weight of the first audio content 631.

Note that, as shown in FIG. 20, the audio content processing apparatus 400a may output the marker sound 636 from all of the audio contents 631 to 635 in the presentation window. In this case, five marker sounds are heard, but the individual marker sounds 636 are difficult to tell apart. It is therefore desirable, as described with FIG. 19, that the marker sound 636 be output only from the position of the audio content in the direction the head 663 is facing.
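As a rough sketch of this behavior (the function names, angle values, and the nearest-angle rule are assumptions for illustration), the content that emits the marker sound can be chosen as the one whose placement angle is closest to the direction the head is facing.

```python
# Illustrative sketch (hypothetical names): emit the marker sound only from the
# content whose angle on the presentation window is closest to the head direction.

def marker_source(content_angles, torso_psi_deg, head_phi_deg):
    """content_angles: {content_id: theta in degrees on the presentation window}."""
    gaze = torso_psi_deg + head_phi_deg  # direction the head is facing
    return min(content_angles, key=lambda cid: abs(content_angles[cid] - gaze))

angles = {"c631": 20, "c632": 50, "c633": 80, "c634": 110, "c635": 140}
print(marker_source(angles, torso_psi_deg=80, head_phi_deg=-60))  # "c631"
```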
FIG. 21 is a diagram showing an example of how the pointer range moves in response to button operations.

As shown in FIG. 21A, assume that at a certain point (for example, in the initial state) the audio reproduction control unit 460a sets the pointer range so that the first to fifth audio contents 631 to 635 are arranged clockwise in this order within the presentation range.

In this state, if the "forward" button 313 is operated, the audio reproduction control unit 460a moves the pointer range in the direction in which the pointer position moves away from the origin. That is, as shown in FIG. 21B, the audio reproduction control unit 460a slides the weight axis 630 counterclockwise within the presentation range.

Conversely, if the "back" button 311 is operated, the audio reproduction control unit 460a moves the pointer range in the direction in which the pointer position approaches the origin. That is, the audio reproduction control unit 460a slides the weight axis 630 clockwise within the presentation range.

By sliding the weight axis in this way, the audio content processing apparatus 400a can give the user the sensation of sitting on a swivel chair surrounded by the audio contents and turning together with the chair. The user can thus perform the operation corresponding to scrolling in a GUI through button operations.
As described above, the audio content processing apparatus 400a according to this embodiment arranges the weight axis, on which the plurality of audio contents is placed, so that it spreads out laterally in front of the user. The audio content processing apparatus 400a can thereby present the plurality of audio contents to the user in an easily distinguishable manner.

This also makes it easier for the user to grasp at which audio content the pointer is located, and to grasp intuitively to which audio content the sound quality of the marker sound corresponds. The audio content processing apparatus 400a can therefore present the weight of each audio content to the user in an easily understandable manner even when reproducing the audio of a plurality of audio contents simultaneously.

In the embodiment described above, the audio reproduction control unit arranges the weight axis on an arc spreading horizontally in front of the user, centered on the user, but this is not limiting. The audio reproduction control unit may, for example, arrange the weight axis on a straight line extending horizontally in front of the user, or on a straight line or curve extending in the vertical direction or in three dimensions.

When information indicating the displacement of the head's orientation relative to the torso (information corresponding to the head orientation information) is input, the audio content processing apparatus does not necessarily have to include the head orientation acquisition unit.

The head orientation acquisition unit may also acquire, as the head orientation information, the orientation of the head with respect to some other direction (for example, a compass bearing) or the orientation of the torso with respect to some other direction, instead of the orientation of the head with respect to the torso.

In Embodiment 1 and Embodiment 2 described above, the position arrangement unit uses the number of search match points as the weight that defines the arrangement order on the weight axis, but this is not limiting. The position arrangement unit can acquire any ordered attribute of the audio contents as the weight. The weight may therefore be, for example, the lexicographic order of the audio contents' titles, the number of times the audio content has been played, the creation date and time of the audio content, the playback time of the audio content, or the order of any of these.

The operation methods for pointer movement, pointer range movement, and audio content determination are not limited to the examples above. That is, the pointer position acquisition unit may acquire operation information from various other input devices such as a directional pad, a keyboard, or a mouse.

The marker sound is also not limited to the examples above. Furthermore, the sound quality of the marker sound that changes according to the weight is not limited to pitch. The marker sound should, however, allow the change and the direction of the change to be grasped intuitively. The sound quality of the marker sound that changes according to the weight therefore desirably includes at least one of the marker sound's timbre, pitch, sounding interval, sounding length, and vibration period. Note that even a person with a good ear is said to be able to distinguish only about 100 pitch levels within the audible range. The marker sound generation unit may therefore generate a marker sound that expresses the weight using, for example, a chord combining a plurality of pitches, a sound combining a pitch with yet another type of sound quality, or a sound combining sounds of a plurality of timbres such as different instruments. That is, the marker sound generation unit may generate the marker sound by increasing the number of distinguishable types in the manner of a positional numeral system.
For example, if the weight can be expressed as a two-digit decimal number, the marker sound can represent the tens digit with ten pitches of a violin timbre and the ones digit with pitches of a piano timbre. This violin-and-piano marker sound may play the violin note first and then the piano note, or it may play the piano note and the violin note simultaneously. Pitches of different types of instruments are relatively easy to tell apart even when played at the same time. Such a marker sound can therefore express differences in weight.
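As a rough sketch of this positional encoding (the names, base semitone numbers, and digit-to-pitch offsets are assumptions for illustration), a two-digit weight could be split into a violin pitch for the tens digit and a piano pitch for the ones digit.

```python
# Illustrative sketch (hypothetical names): encode a two-digit weight as a pair of
# pitches, violin for the tens digit and piano for the ones digit.

VIOLIN_BASE, PIANO_BASE = 60, 48  # assumed base semitone numbers per instrument

def weight_to_marker_notes(weight):
    """weight: integer in 0..99 -> [(instrument, semitone number), ...]."""
    tens, ones = divmod(int(weight), 10)
    return [("violin", VIOLIN_BASE + tens), ("piano", PIANO_BASE + ones)]

print(weight_to_marker_notes(73))  # [('violin', 67), ('piano', 51)]
```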
To give an example using the sounding interval, the marker sound can represent the weight by the interval between short beeps (each about 0.2 to 0.5 seconds long). For example, the marker sound shortens the interval between beeps when the weight is large and lengthens it when the weight is small.

To give an example using the sounding length, the marker sound can represent the weight by the length of the beep.

To give an example using the vibration period, the marker sound can represent the weight by how high or low the frequency of its "swell" (beating) is.
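A minimal sketch of the interval-based variant is shown below; the 0.2 to 0.5 second beep length comes from the text above, while the interval bounds, weight range, and linear mapping are assumptions for illustration.

```python
# Illustrative sketch (hypothetical names): a larger weight gives a shorter
# interval between short beeps, a smaller weight a longer interval.

BEEP_LENGTH_S = 0.3                        # a short beep, within 0.2-0.5 s
MIN_INTERVAL_S, MAX_INTERVAL_S = 0.2, 2.0  # assumed interval bounds

def beep_interval(weight, w_min=0.0, w_max=100.0):
    ratio = min(max((weight - w_min) / (w_max - w_min), 0.0), 1.0)
    return MAX_INTERVAL_S - (MAX_INTERVAL_S - MIN_INTERVAL_S) * ratio

print(beep_interval(90.0))  # about 0.38 s: a short interval for a large weight
```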
The audio content processing apparatus may also generate a marker sound for each of a plurality of types of weight, or generate a marker sound that expresses a plurality of types of weight with different types of sound quality (for example, pitch and vibration period). The audio content processing apparatus may also determine the sound quality based on an order obtained by combining a plurality of types of weight (a combined weight).

Each time the pointer position moves, the marker sound generation unit may also play, immediately after the marker sound presenting the position after the movement, the marker sound for the case where the pointer continues to move in the same direction (or an intermediate marker sound). The audio content processing apparatus can thereby present, every time the pointer moves, not only the weight itself but also the direction of the weight change.

The position of the marker sound does not necessarily have to coincide with the pointer position. For example, the audio reproduction control unit may place the marker sound slightly above or behind the pointer position, or the audio stream generation unit may generate audio data in which the marker sound is placed slightly above or behind the pointer position. The audio content processing apparatus can thereby make it easier for the user to hear the audio of the audio content. In Embodiment 2, when the marker sound is placed on the far side of the pointer position as seen from the user, the marker sound is heard like a background sound of the audio content.
As described above, the audio content processing apparatus according to the embodiments of the present invention has a presentation position calculation unit that moves a pointer among a plurality of audio contents each given a weight, a marker sound generation unit that presents the weight of the audio content at which the pointer is located through the sound quality of a marker sound, and an audio reproduction control unit that performs predetermined processing on the audio content at which the pointer is located. The embodiments thereby make it possible to grasp the weight of each audio content intuitively without using a display.

The disclosure of the specification, drawings, and abstract included in Japanese Patent Application No. 2011-55082, filed on March 14, 2011, is incorporated herein by reference.

The present invention is useful as an audio content processing apparatus and an audio content processing method that can present the weight of each audio content so that it can be grasped intuitively without using a display. The present invention is particularly suited to mobile terminals whose primary modality is audio and to mobile-terminal usage environments in which the use of vision should be minimized.
100, 100a Audio content reproduction system
200, 200a Audio output device
210, 211a Cable
220 Monaural earphone
221a Stereo headphones
300, 300a Portable player
301 Housing
310 User operation input device
311 "Back" button
312 "Enter" button
313 "Forward" button
320a Motion sensor
400, 400a Audio content processing apparatus
410 Information storage unit
420 Position arrangement unit
430 Pointer position acquisition unit
440, 440a Presentation position calculation unit
450 Marker sound generation unit
460, 460a Audio reproduction control unit
470 Audio stream generation unit
480a Head orientation acquisition unit
510 File system
520 Audio content supply server

Claims (11)

1. An audio content processing apparatus comprising:
   a presentation position calculation unit that moves a pointer among a plurality of audio contents each given a weight;
   a marker sound generation unit that presents the weight of the audio content at which the pointer is located through a sound quality of a marker sound associated with the weight; and
   an audio reproduction control unit that performs predetermined processing on the audio content at which the pointer is located.

2. The audio content processing apparatus according to claim 1, wherein the sound quality includes at least one of a pitch, a sounding interval, a sounding length, and a vibration period.

3. The audio content processing apparatus according to claim 2, further comprising a position arrangement unit that arranges the plurality of audio contents on a weight axis at positions corresponding to their respective weights, wherein
   the presentation position calculation unit moves the pointer on the weight axis, and
   the change in the sound quality proceeds in one direction when the change in the position of the pointer on the weight axis proceeds in one direction.

4. The audio content processing apparatus according to claim 3, wherein the predetermined processing includes at least one of audio output of an attribute of the audio content and reproduction of the audio content.

5. The audio content processing apparatus according to claim 4, wherein the presentation position calculation unit accepts an operation for moving the pointer on the weight axis.

6. The audio content processing apparatus according to claim 5, wherein the audio reproduction control unit constructs a virtual sound field space that includes at least the position of the pointer on the weight axis and in which the marker sound and the audio of the audio contents are output from their respective positions, the apparatus further comprising an audio stream generation unit that generates audio data realizing the sound field of the constructed virtual sound field space.

7. The audio content processing apparatus according to claim 6, wherein the marker sound is an audio pointer indicating the position of the pointer.

8. The audio content processing apparatus according to claim 7, wherein the audio reproduction control unit constructs the virtual sound field space in which the weight axis is a straight line and the position of the listener of the audio data is placed on the weight axis at the position of the pointer.

9. The audio content processing apparatus according to claim 7, wherein the audio reproduction control unit constructs the virtual sound field space in which the position of the listener is placed at a position, away from the weight axis, corresponding to the position of the pointer, the apparatus further comprising a head orientation acquisition unit that detects the orientation of the listener's head in real space, and wherein the presentation position calculation unit sets, as the position of the pointer, the position on the weight axis corresponding to the front direction of the head when the orientation of the head in the real space is taken as the orientation of the listener's head in the virtual sound field space.

10. The audio content processing apparatus according to claim 7, wherein the weight is a value indicating the order of an attribute, among the attributes of the audio content, that has an order.

11. An audio content processing method comprising the steps of:
   moving a pointer among a plurality of audio contents each given a weight;
   presenting the weight of the audio content at which the pointer is located through a sound quality of a marker sound associated with the weight; and
   performing predetermined processing on the audio content at which the pointer is located.
PCT/JP2012/001384 2011-03-14 2012-02-29 Audio content processing device and audio content processing method WO2012124268A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011-055082 2011-03-14
JP2011055082 2011-03-14

Publications (1)

Publication Number Publication Date
WO2012124268A1

Family

ID=46830361

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/001384 WO2012124268A1 (en) 2011-03-14 2012-02-29 Audio content processing device and audio content processing method

Country Status (1)

Country Link
WO (1) WO2012124268A1 (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003076719A (en) * 2001-06-22 2003-03-14 Sony Computer Entertainment Inc Information reading program, recording medium recored with information reading program, apparatus and method for reading information, program for generating information, recording medium recorded with program for generating information, apparatus and method for generating information, and information generation reading system
JP2006074589A (en) * 2004-09-03 2006-03-16 Matsushita Electric Ind Co Ltd Acoustic processing device
JP2007087104A (en) * 2005-09-22 2007-04-05 Sony Corp Voice output controller and voice output control method, recording medium and program


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12758153

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12758153

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP