TW201040940A

TW201040940A - Sound processing apparatus, chat system, sound processing method, information storage medium, and program

Info

Publication number: TW201040940A
Application number: TW099101683A
Authority: TW
Inventors: Shoji Mori
Original assignee: Konami Digital Entertainment
Priority date: 2009-01-23
Filing date: 2010-01-21
Publication date: 2010-11-16
Also published as: WO2010084830A1; JP2010169925A

Abstract

A chat system (211) comprises two sound processing apparatuses (201). In each of the sound processing apparatuses (201), an input accepting unit (202) accepts an input of voice uttered by a user; an extracting unit (203) extracts a characteristic parameter of the accepted voice; a generating unit (204) generates synthetic sound from predetermined sound data; and an output unit (205) outputs the generated synthetic sound. Typically, the temporal change in the waveform amplitude, the sound volume, the value of the base frequency component or the predetermined representative frequency component is used as a characteristic parameter, and the characteristic parameter of the predetermined sound data is replaced by the extracted characteristic parameter, in order to generate synthetic voice.

Description

201040940 六、發明說明：【發明所屬之技術領域】本發明係有關於一種聲音處理裝置、聊天系統、聲音處理方法、資訊記錄媒體以及程式，非常適用於藉由使用者們的聲音來做某種程度之持續溝通且不會做不適切之會話。【先前技術】 ❹ 先别’在線上遊戲或SNS (社交網路服務（s〇ciai201040940 VI. Description of the Invention: [Technical Field] The present invention relates to a sound processing device, a chat system, a sound processing method, an information recording medium, and a program, which are very suitable for doing something by the user's voice. Continuous communication of degree and no uncomfortable conversation. [Prior Art] ❹ Don’t play online or SNS (social network service (s〇ciai

Network Service))領域中，係利用一種聲音聊天系統，刚述聲音聊天系統，係以麥克風檢出使用者發出之聲音，使該聲音的聲音數據（音訊資料）往對方使用者之終端裝置傳送，而便該聲音數據以對方終端裝置之揚聲器或頭戴式耳機再生（播放）之處理，雙向實施，藉此來進行聊天。關於這種技術係揭示於下述之專利文獻1中。在此，於專利文獻1中，係提案一種技術，將假想（虛 Q擬）空間内之使用者周圍之環境聲音及該使用者發出之聲音加以合成，而傳遞到其他使用者，來提高聲音聊天系統臨場感。〔先前技術文獻〕 (專利文獻】專利文獻1 :日本特開2006-343447號公報【發明内容】〔發明所欲解決的問題〕 3 201040940 但是，使用者發出的聲音之波形數據，其數據量很大，所以’很容易產生傳送延遲等問題。又，為了讓使用者間能健全地交流，係必須抑制不適切片語之使用且抑制侵害隱私内容之傳遞。 Ο Ο 本發明，係為解決上述問題而研發出來，其目的在於提供一種聲音處理裝置、聊天系統、聲音處理方法、資訊記錄媒體以及程式，非常適用於藉由使用者們的聲音來做某種程度之溝通，且不會做不適切之會話。〔解決問題所用的技術手段〕為了達成上述㈣’依照本發明之原理，開示下述之發明。部、Π!之第1觀點之聲音處理裝置，係具有輸入接收 #、生成部及輸出部，並如下述般地構成。二即：輸入接收部，係接收使用者發出的聲音之輸入。既ii樣：Γ利用麥克風，取得該聲音之波形數據，以 =取樣頻率做A/M類比/數位）轉換，而當作數值列來另外，抽出部，係抽出所接收的聲音的為特徵參數，典型而+ + 车日的特徵參數。作基本頻率成八的大D，有波形的振幅或音量、基本頻率、 +两牛成分的大小、或特徵參數係隨著時「頻率成分的大小，時間羞過而變化。這4b資以藉由離散高讳毺1 —貝讯，典型上係可速傅利葉轉換等技術來抽出。而且，生成部’係根音。在此，生成部，係…兮^數據來生成合成聲係藉由將该既定的聲音數據的特徵參 4 201040940 數，置換成前述被抽出之特徵參數的值，來生成合成聲音。作為既定的聲音數據，係可以利用由正弦波所構成之聲音數據、預先準備之配音員的聲音或樂器的聲音等聲音數據。既定的聲音數據與被生成之合成聲音的差，在於特徵參數之值。合成聲音的特徵參數，係將既定的聲音數據的特徵參數，置換成被抽出的特徵參數的值而成。 Ο 〇如上所述’作為特徵參數，當採用振幅或音量時’係藉由改變既定的聲音數據的振幅或音量，來生成合成聲音。又’作為特徵參數，當採用基本頻率時，係藉由改變既定的聲音數據之音階（key)，來生成合成聲音。而且，作為特徵參數，當採用基本頻率成分的大小、或疋既疋頻率成分的大，丨、β主ϊ ίά / 手係藉由改變既定的聲音數據的該成分的大小，來生成合成聲音。，二實t這種置換，則合成聲音的大小、強弱、高低及抑％等之時間變化，係變成會與使用者發出的聲音一致。因此’合成聲音係被認為可某程度反映使用者的感情。、另外，由於合成聲音變成不是「人類發出之聲音」， =以即便使用者將片語或文章等加以發聲，在合成聲音中’該片語或文章等的内容也會變得無法瞭解。進而，輸出部係輸出所生成的合成聲音。 :此’被輪出之合成聲音，如上所 ::情的變化’但是卻會變成無法以聲音來傳遞片= 早等之狀態。因此，即便使用者講出侵犯隱私之發言或違 201040940 該發言内容也無法傳遞到對方的使用若根據本發明，雖然無法取得使用者的詳細發言内容來作為語言資m，但是可以藉由使用者們的感情來溝通意思。尤其，可以抑制使用者們的發言内容而造成的糾紛。In the field of Network Service), a voice chat system is used. The voice chat system is used to detect the voice of the user by using a microphone, and the voice data (audio data) of the voice is transmitted to the terminal device of the other user. On the other hand, the voice data is processed in a two-way manner by the process of reproducing (playing) the speaker or the headset of the counterpart terminal device, thereby performing chat. This technique is disclosed in Patent Document 1 below. Here, in Patent Document 1, a technique is proposed in which an ambient sound around a user in a virtual (virtual Q) space and a sound emitted by the user are combined and transmitted to other users to improve the sound. The chat system is on the spot. [PRIOR ART DOCUMENT] Patent Document 1: Japanese Laid-Open Patent Publication No. 2006-343447 [Draft of the Invention] [Problems to be Solved by the Invention] 3 201040940 However, the waveform data of the sound emitted by the user has a very large amount of data. It is very large, so it is easy to cause problems such as transmission delay. In order to allow users to communicate properly, it is necessary to suppress the use of unspoken words and suppress the transmission of infringing privacy content. Ο Ο The present invention is to solve the above problems. The problem was developed to provide a sound processing device, a chat system, a sound processing method, an information recording medium, and a program, which are very suitable for a certain degree of communication by the user's voice, and will not do it. [Means for Solving the Problem] In order to achieve the above (4) 'in accordance with the principles of the present invention, the following invention is disclosed. The first aspect of the sound processing apparatus includes an input receiving # and a generating unit. And the output unit is configured as follows. Second, the input receiving unit receives the sound of the user. Ii ii Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Γ Parameters, typical and + + characteristic parameters of the car day. Make a large D with a basic frequency of eight, with the amplitude or volume of the waveform, the fundamental frequency, the size of the two-bovine component, or the characteristic parameters of the time-frequency component, the time is shy and change. It is extracted by a technique such as discrete high-speed 1 - Beixun, which is typically a fast Fourier transform. Moreover, the generating part is the root sound. Here, the generating unit, the data is generated to generate a synthetic sound system. The number of the characteristic data of the predetermined sound data is replaced by the value of the extracted characteristic parameter to generate a synthesized sound. As the predetermined sound data, the sound data composed of the sine wave and the dubbing prepared in advance can be used. Sound data such as the sound of a member or the sound of a musical instrument. The difference between the predetermined sound data and the synthesized sound generated is the value of the characteristic parameter. The characteristic parameter of the synthesized sound is replaced by the characteristic parameter of the predetermined sound data. The value of the characteristic parameter is obtained. Ο 〇 As described above, 'as a characteristic parameter, when amplitude or volume is used' is the vibration by changing the predetermined sound data. Amplitude or volume to generate a synthesized sound. Also as a characteristic parameter, when a fundamental frequency is used, a synthesized sound is generated by changing a key of a predetermined sound data. Moreover, as a characteristic parameter, when a basic frequency is employed The size of the component, or the size of the 疋, β, β main ϊ ά ά 手手手手手手手手手手手手手手手手手手手手手手手手手手手手手手手手手手手手手手手手The time change of the size, intensity, height, and % of the sound is the same as that of the user. Therefore, the 'synthesis sound system is considered to reflect the user's feelings to some extent. In addition, since the synthesized sound becomes not "The voice of human beings", = even if the user utters a phrase or an article, the content of the phrase or article in the synthesized voice becomes unintelligible. Further, the output unit outputs the generated synthesized sound. : This is the synthetic sound that is turned out, as the above changes: but it becomes a state in which the film cannot be transmitted by sound = early. Therefore, even if the user speaks the statement of invasion of privacy or the content of the statement is not transmitted to the other party according to the 201040940, according to the present invention, although the detailed content of the user cannot be obtained as the language m, the user can Their feelings come to communicate meaning. In particular, it is possible to suppress disputes caused by the contents of the users' speeches.

又，在本發明之聲音處理裝置中，特徵參數，係可以用波形的振幅或音量、基本頻率、基本頻率成分的大小、或疋既疋之代表頻率成分的大小之時間變化來構成。如上所述，該構成係有關於本發明的較佳實施形態。並且，作為既定的代表頻率成分’可以取得對於各個預先決定的複數個頻率的成分大小，也可以針對頻率分佈的峰值的上位既定數，取得頻率與成分的大小之組合。、又，在本發明之聲音處理裝置中，抽出部係構成能以i 移不足20次的頻率來抽出特徵參數。Further, in the sound processing device of the present invention, the characteristic parameter can be constituted by the amplitude or volume of the waveform, the fundamental frequency, the magnitude of the fundamental frequency component, or the time variation of the magnitude of the representative frequency component. As described above, this configuration is a preferred embodiment of the present invention. Further, as a predetermined representative frequency component, a component size for each predetermined plurality of frequencies can be obtained, and a combination of a frequency and a component size can be obtained for a predetermined number of peaks of the frequency distribution. Further, in the sound processing device of the present invention, the extracting unit configuration can extract the characteristic parameter at a frequency less than 20 times.

反公序良俗之發言者0 人類之可聽見範圍係20Hz〜20kHz，所以，為了完全復原聲音之波形數據’必須要40 kHz以上之取樣頻率。由於電話的聲音係不含4000Hz以上頻率成分之音質，所以’當要維持電話程度之聲音品質時，必須要咖⑽ 之取樣頻率。反之’若採用不足2〇Hz之取樣頻率時幅減少非處理不可之數據量’同時’也可完全除去由聲立傳遞之片語或文章之語言資訊。 9 右根據本發明，不但可大幅地減少必須處理之數據 s，同時，可使藉由言語來溝通意思在事實而變成能有效率地防止侵犯隱私或違反公“ = 201040940 等。又，在本發明之聲音處理裝置中，抽出部係將所接传到的聲音做離散傅利葉轉換，在所獲得的頻率分佈 :抽:既定的複數個頻率成分的大小，作為特徵參數而加的夂：生成部，係構成’可以將事先分別對應於被抽出 ::頻率成分之波形數據，放大至被抽出後的大小而加以 δ成’來生成合成聲音。 Ο Ο :本發y，在抽出部中，係將預先決定之頻率成分、’、、或是峰值之上位既定數個部分的頻率成分的大 y、’作為特徵參數’此外頻能生成合成聲音。之料成刀’盍上去除遮罩，而當對應各頻率成分之波形數據是採用方形波時，之波形數據。讀換，但疋，也可以採用其他抽出1根據本發明’可以利用離散傅利葉轉換，很容易地抽出特徵參數。於各：率t發明之聲音處理裝置中’事先分別預先對應中心=波形數據的基本頻率’係與該頻率成分的成分。⑧該波形數據係構成包含該基本頻率的倍音頻率的I作為對應於各頻率成分之波形數據，雖然基本领半的尚度一致，但作為波形數掳/ -係採用音色不同之波形數據。關於峰採用樂器所發出之聲音等。例如，項率成分係分配給（預定為）鋼琴聲、峰 7 201040940 值第2位之頻率成分係分配給吉他、成分係分配給貝斯聲等。峰值第3位之頻率之各:本發明，能將反映了使用者發出的聲音的特徵之各種耷音加以輸出。寸伋 Ο Ο 數個二=發明之聲音處理裝置中’生成部係構成自複 4據之候補，將其特徵參數最接近被抽出參數之候補聲音數據加以選擇，並補徵音數據，作為既定的聲音數據。 …候補聲例如，取得4個頻率分佈之峰值，之順序，亦即，"、目私+ -料逐漸升高聲音數據等。貝斯、吉他、鋼琴之順序，來分配条根據本發明，能將反胁之各種聲音加以輸出。將反映了使用者發出的聲音的特徵音處發明之另一觀點之聊天系統’係具有：第1聲 9 ，其接收第1使用者發出的聲音之合成聲音輪出到第2:::輪入’且將接收第2使用者發出的聲〜入聲曰處理裝置，其第^使用者；並! 音輪出到置，皆係上述聲音處理奘署曰處理裝生成部及輸出部。裝置，並具有輸人接收部、抽出部、輸入接收部，係接收使用者發出的聲音之輪入。抽出部’係抽出所接收的聲音的特徵參數。生成部，係根據既定的聲音數據來生成輸出部，係輸出所生成的合成聲音。聲日。 8 201040940 進而，生成部，係藉由將既定的聲音數據的特徵參數置換成被抽出之特徵參數的值，來生成合成聲音。^ 在第1聲音處理裝置及第2聲音處理裝置中曰被的特徵參數，係自抽出部透過電腦通訊網而往生成部本發明’係將上述聲音處理裝置適用於聲音聊天，並以電腦通訊網來連接抽出部與生成部之間。、，Speakers of anti-public order and goodness 0 The human audible range is 20 Hz to 20 kHz, so in order to completely recover the waveform data of the sound, it is necessary to have a sampling frequency of 40 kHz or more. Since the sound of the telephone does not contain the sound quality of frequency components above 4000 Hz, the sampling frequency of the coffee (10) must be used when the sound quality of the telephone is to be maintained. On the other hand, if a sampling frequency of less than 2 Hz is used, the amount of data that is not processed by the time is reduced, and the language information of the phrase or article transmitted by the voice can be completely removed. 9 Right According to the present invention, not only can the data s which must be processed be greatly reduced, but also the meaning can be communicated by words to become effective in preventing invasion of privacy or violation of public "= 201040940, etc. In the sound processing device of the present invention, the extraction unit performs discrete Fourier transform on the received sound, and obtains a frequency distribution: a predetermined number of frequency components, and a characteristic parameter is added: a generation unit In the extraction unit, the waveform data corresponding to the extracted:: frequency component is amplified in advance to the size after extraction and δ is formed to generate a synthesized sound. Ο Ο : This is the y, in the extraction unit, A predetermined frequency component, ', or a large y, 'as a characteristic parameter' of a frequency component of a predetermined number of portions above the peak is generated as a characteristic parameter, and a composite sound is generated. The material is cut into a knife to remove the mask. When the waveform data corresponding to each frequency component is a square wave, the waveform data is read and replaced, but 疋, other extractions 1 may also be used according to the present invention. In the sound processing device of the invention, the "basic frequency of the waveform corresponding to the center = waveform data" and the component of the frequency component are respectively selected in advance. The I of the multiplication frequency of the fundamental frequency is the waveform data corresponding to each frequency component, and although the basic length is the same, the waveform number 掳 / - is used as the waveform data of the different sounds. For example, the item rate component is assigned to (predetermined for) piano sound, peak 7 201040940, the second frequency component is assigned to the guitar, the component system is assigned to the bass sound, etc. The frequency of the third digit of the peak: Ben According to the invention, it is possible to output various kinds of arpeggios that reflect the characteristics of the sound emitted by the user. In the case of the sound processing device of the invention, the generation unit is a candidate for the self-recovery data, and its characteristic parameters are obtained. The candidate sound data closest to the extracted parameter is selected, and the sound data is supplemented as the predetermined sound data. ... For example, four candidate sounds are obtained. The peak of the rate distribution, the order, that is, the ", the private + material gradually increase the sound data, etc.. The order of bass, guitar, piano, to allocate the strip according to the present invention, can output various sounds of the anti-warm The chat system that reflects the characteristic sound of the sound emitted by the user, which is another aspect of the invention, has a first sound 9 that receives the synthesized sound of the sound emitted by the first user and turns to the second::: Wheeling in and receiving the sound from the second user to the sonar processing device, the second user; and the sound wheel is placed, the sound processing unit is the processing unit generating unit and the output unit. The device includes an input receiving unit, an extracting unit, and an input receiving unit to receive a round of sounds emitted by the user. The extracting unit extracts characteristic parameters of the received sound. The generating unit generates an output unit based on the predetermined sound data, and outputs the generated synthesized sound. Sound day. 8 201040940 Further, the generating unit generates a synthesized sound by replacing the characteristic parameter of the predetermined sound data with the value of the extracted characteristic parameter. ^ The characteristic parameters of the first sound processing device and the second sound processing device are transmitted from the extracting unit to the generating unit through the computer communication network. The present invention applies the sound processing device to the voice chat, and uses the computer communication network. Connected between the extraction unit and the generation unit. ,,

右根據本發明，藉由提供類似於聲音聊天之系統然無法取得使用者的詳細發言内容來作為言語資訊 =使用者們的感情來溝通意思。尤其，可以抑制由; 使用者們的發言内容所致之糾紛。、有關本發明之另一觀點之聲音處理接收部、抽出部、生成部及輸出部之聲音處理:置= 之聲音處理方法’其具有輸入接收步驟、抽出步驟步驟及輸出步驟，其構成如下。成二:之::w輸入接收部接收使用者發抽出部抽出所接收的聲音的特另外，在抽出步驟中徵參數。進而，在生成步驟中，哇占立R& _ 生成σ卩根據既又的聲音數攄戎生成合成聲音。课牙在此’在生成步驟中，孫 η备虹係藉由將既疋的聲音數據的相徵參數，置換成被抽出之特将徵參數的值，來生成合成聲音而且，於輸出步驟中，給，翰出。Ρ輸出所生成的合成聲音有關本發明之另—觀點頁訊δ己錄媒體’係記憶有浪 201040940 ^該程式係使電腦作為輸人錢部、抽輪出部而發揮功能。生成#及輸入接收部，係接收使用者發出的聲音之輸入。 #出邛係抽出所接收的聲音的特徵參數。生成部，係根據既定的聲音數據來輸出部，係輸出所生成的合成聲音。成聲曰置換I:抽生：部，係藉由將既定的聲音數據的特徵參數，〇成被抽出之特徵參數的值，來生成合成聲音。右根據本發明，能#雷 .處理裝置之功能。發揮如上述般地動作之聲音有發明之另一觀點之程式，係使電腦作為輸入接部、抽出部、生成部及輸出部而發揮功能。輸入接收部，係接收使用者發出的聲音之輸入。抽出部，係抽出所接收的聲音的特徵參數。生成部’係根據既定的聲音數據來生成合成聲音。〇輸出部’係輸出所生成的合成聲音。而且’生成部’係藉由將既定的聲音數據的特徵參數，置換+成被抽出之特徵參數故值，來生成合成聲音。从根據本發明’能使電腦發揮如上述般地動作之聲音處理裝置之功能。 ^本發明之程式，係可記錄於光碟片、軟碟片、硬碟光磁1·生碟片、數位影像光碟片、磁帶及半導體記憶體等電腦可讀取之資訊記錄媒體上。月述程式，係與執行程式之電腦相互獨立，可透過電 201040940 腦通訊網來散佈及銷售。又電腦相互獨立地散佈及鎖售〔發明之效果〕刖述資訊記錄媒體，係可與若根據本發明，能提供一種聲立簦立声搜古、— 日处理裝置、聊天系統、聲B處理方法、貧訊記錄媒體久柱式’非常適用於藉由使用者們的聲音來做某種程度羞裡枉度之溝通，且不會做不適切之會話。Right according to the present invention, by providing a system similar to voice chat, it is impossible to obtain the detailed speech content of the user as the speech information = the feelings of the users to communicate the meaning. In particular, it is possible to suppress disputes caused by the contents of the speech by the users. According to another aspect of the present invention, the sound processing receiving unit, the extracting unit, the generating unit, and the output unit perform sound processing: a sound processing method for setting = an input receiving step, an extracting step, and an output step, which are configured as follows. In the second:::w input receiving unit receives the user's sending unit to extract the received sound, and collects the parameters in the extracting step. Further, in the generation step, wow occupies R& _ generates σ 摅戎 to generate a synthesized sound based on the number of sounds 既. In the generation step, Sun 备虹虹生成生成孙孙孙孙孙 η η η η η η η η η η η η η η η η η η η η η η η η η η η η η η η η η η η η η η η , give, Han out.合成 Output generated synthetic sound. Another point of view of the present invention. The _ □ recorded media ‘has a memory of the wave 201040940 ^ This program makes the computer function as a loser and a pump. The generation # and input receiving units receive the input of the sound from the user. The #出邛 system extracts the characteristic parameters of the received sound. The generating unit outputs the generated synthesized sound based on the predetermined sound data. The vocal 曰 displacement I: the sputum: is a synthetic sound generated by deriving the characteristic parameters of the predetermined sound data into the values of the extracted characteristic parameters. Right according to the present invention, the function of the processing device can be #雷. A sound that operates as described above. Another program of the invention has a function of causing a computer to function as an input unit, an extracting unit, a generating unit, and an output unit. The input receiving unit receives an input of a sound emitted by the user. The extraction unit extracts the characteristic parameters of the received sound. The generating unit generates a synthesized sound based on the predetermined sound data. The 〇 output unit outputs the generated synthesized sound. Further, the 'generating unit' generates a synthesized sound by replacing + the characteristic parameter of the predetermined sound data into the extracted characteristic parameter value. According to the present invention, the computer can function as a sound processing device that operates as described above. The program of the present invention can be recorded on a computer-readable information recording medium such as a disc, a floppy disk, a hard disk magneto-optical disk, a digital video disk, a magnetic tape, and a semiconductor memory. The monthly program is independent of the computer that executes the program and can be distributed and sold through the 201040940 brain communication network. Moreover, the computers are distributed and locked independently of each other. [Effects of the Invention] The information recording medium can be described as follows. According to the present invention, it is possible to provide a sound, a sound recording, a daily processing device, a chat system, and an acoustic B processing. The method, the poor record recording media, is very suitable for communicating with some degree of shyness by the voice of the users, and does not make uncomfortable conversations.

【實施方式】下’為了容易理解，實現本發明之實施形了說明本發明者，並以下說明本發明之實施形態。以雖然利用遊戲用資訊處理裝置來說明態，但是，以下說明之實施形態係為非用於限制本申請案發明之範圍者。因此’只要是該業者皆可能採用將本發明之各要素或全部要素置換成與本發明均等者，但是，這些實施形態皆包含在本發明之範圍内。 (實施例1) 第1圖係表示藉由執行程式，而能發揮作為本實施形 I、的聲曰處理裝置之功能之典型的資訊處理系統的概要構成之示意圖。腋以下，參照本圖來加以說明。資訊處理裝置1〇〇，係構成具有CPU (中央處理單元 (Central Processing Unit) )1〇1、R〇M1〇2、RAM (隨機存取。己憶體（Rand〇m Access Memory ) )103、介面 1〇4、控制器105、外部記憶體1 〇6、影像處理部107、DVD-ROM (數 201040940 位多功能光碟唯讀記憶體（Digital Versatile Disc R0M)) 驅動器副、NiC (網路介面卡（Network Interface Card) )l〇9、聲音處理部11〇及麥克風u卜各種輸出入裝置，係可適當省略。 =記憶著遊戲用程式及數據之dvd_r〇m裝入dvd r〇m 驅動器108，然後打開資訊處理裝置100之電源，藉此， /程式會被執行，本實施形態之聲音處理裝置會被實現。 0 又’在攜帶型之遊戲裝置中，為了使其可攜帶，也可 ^㈣DVD ROM驅動器108，而是利用快閃記憶體用插槽等。在此情形下’將記錄有程式之快閃記憶體等加以插入，而執行該程式’藉此，本實施形態之聲音處理裝置便會被實現。此外’在藉由自終端裝置連接到飼服裝置，而進行聊天之系統中’終端裝置彼此之間與伺服裝置會協同運轉，而發揮聊天系統之功能，在此情形下，終端裝置與飼服裝〇置:雖然有計算能力的差異或機器構成方面的少許不同，仁疋本質上，典型上是設成與資訊處理裝置1〇〇相同的成又在此清形下，可採用一種形態，該傅服裝置係僅負責終端裝置之介紹，之後’藉由終端裝置彼此之間進行同級間（peer to peer)通訊，來形成聊天系統。 cpui(u，係控制資訊處理裝置1〇〇全體之動作，與各構成要素連接，來交換控制訊號或數據（資料）。又， CP_1，係能對於稱做暫存器（未圖示）之高速存取之記隐Q域，使用未圖示的ALU (算術邏輯單元⑹版^ 12 201040940[Embodiment] The present invention has been described with reference to the embodiments of the present invention, and the embodiments of the present invention will be described below. The present invention has been described with reference to the embodiments of the present invention. However, the embodiments described below are not intended to limit the scope of the invention of the present application. Therefore, it is intended that all of the elements or all of the elements of the invention may be substituted by the present invention, and such embodiments are intended to be included within the scope of the invention. (Embodiment 1) Fig. 1 is a schematic diagram showing a schematic configuration of an information processing system which is typical of the function of the sonar processing apparatus of the present embodiment I by executing a program.腋 Hereinafter, the description will be made with reference to this figure. The information processing device 1 is configured to include a CPU (Central Processing Unit) 1〇1, R〇M1〇2, and RAM (Rand〇m Access Memory) 103. Interface 1〇4, controller 105, external memory 1〇6, image processing unit 107, DVD-ROM (number 201040940 versatile disc-only memory (Digital Versatile Disc R0M)) driver pair, NiC (network interface) The network interface card 〇9, the sound processing unit 11A, and the microphone u can be appropriately omitted. The dvd_r〇m, which memorizes the game program and data, is loaded into the dvd r〇m driver 108, and then the power of the information processing device 100 is turned on, whereby the program is executed, and the sound processing device of the present embodiment is realized. 0 In the portable game device, in order to make it portable, it is also possible to use the (four) DVD ROM drive 108, and to use a flash memory slot or the like. In this case, the program is executed by inserting a flash memory or the like in which a program is recorded, whereby the sound processing apparatus of the present embodiment is realized. In addition, in the system in which chat is performed by connecting the terminal device to the feeding device, the terminal devices cooperate with the servo device to function as a chat system. In this case, the terminal device and the feeding device 〇: Although there is a difference in computing power or a little difference in the composition of the machine, Ren Biao is essentially set to be the same as the information processing device 1 The Fu service device is only responsible for the introduction of the terminal device, and then 'the peer device performs peer-to-peer communication with each other to form a chat system. Cpui (u, which controls the operation of the information processing device 1 and connects to each component to exchange control signals or data (data). Further, CP_1 can be called a scratchpad (not shown). High-speed access to the hidden Q domain, using an ALU not shown (Arithmetic Logic Unit (6) version ^ 12 201040940

LogicUnit))來進行加減乘除等之算術演算或邏輯輯積、邏輯否定等之邏輯演算、或位元 °、邏丨儿凡積、位开反轉、位元移位、位元迴轉等之位元演算等。進而，高速地進行對應於多媒體處理之加減乘除等飽和運算為。了是三角函數等、向量運算等’可以是CPU 101本身即具：此功能’或是以輔助處理器來實現。 /、 Ο ❹ 在ROM102記錄有於打開電源後不久立刻執行之IPL (Initial Pr〇gram Loader)，藉由執行這些，在勵一R〇M中之程式讀出到刪〇3，而開始由cpum所作之執行。又，在咖〇2記錄有在資訊處理裝置⑽ 動作控制時所必要之作業系統程式或各種數據。體 RAM103,係用於暫時記憶數據或程式者，其保持 DVD-麵讀出之程式或數據、及其他遊戲進行或聊天通訊時必要之數據。又，CPU101會進行如下等之處理，亦即例如在則103中設定變數區域，並直接對一些被儲存在該變數區域中的值，以ALU來進行運算，或是將—些被儲存在RAM 1G3中之值儲存在暫存器中，並對暫存器進行運算，再將運算結果存回記憶體。透過介面104而被連接之控制器1〇5，係接受使用者在執行遊戲時所實施之操作輸入。而且’控制器1〇5’未必一定要外接於資訊處理裝置 100，也可以一體形成。可攜帶之終端裝置的控制器1G5，係由各種按鍵或開關所構成，將這些之按壓操作t作是操作輸人來處理。此外， 13 201040940 在利用觸控螢幕之資訊處理裝置1 〇〇中’係將使用者利用筆或手指來描繪在觸控螢幕上的執跡，視為操作輸入。在透過介面1 04而裝卸自如被連接之外部記憶體1 〇6 中，可改寫地記憶有表示遊戲等之遊玩狀況（過去之成績等）之數據、表示遊戲進行狀態之數據、網路對戰時之聊天通訊之日S志（記錄）數據等。使用者，係藉由透過控制器 105來實施指示輸入，能將這些數據記錄到適當的外部記憶體106中。在要被裝入DVD-ROM驅動器108内之DVD-ROM中，記錄有用於實現遊戲的程式及附隨遊戲的影像數據或聲音數據（音訊資料）遊戲。藉由CPU1 01之控制，DVD-ROM驅動器 108 ’係對於被裝入DVD-ROM驅動器1〇8之DVD-ROM進行讀出處現，將必要之程式或數據加以讀出，這些係被暫時記憶在RAM103等。影像處理部107，在以CPU 101或是影像處理部1〇7 所具備之影像運鼻處理器（圖中未示），來對該從Dvd — ROM所讀出之資料，進行加工處理後，將其記錄在影像處理部107所具備的圖框記憶體（圖中未示）中。被記錄在圖框。己憶體十之影像資訊，會以特定同步時序被變換成視訊信號，並被輸出至連接於影像處理部1〇7之監視器（圖中未示）中。藉此’即可進行各種影像顯示。可攜帶之遊戲裝置之顯示器’典型上有利用小型液晶顯示器者，作為控制器105，當利用觸控螢幕時，該觸控螢幕的顯示面板係作為顯示器使用。用於在自家遊玩之遊 201040940 戲裝置或伺服裝置之顯顯不盗，可利用CRT ( Cathode RayLogicUnit)) to perform arithmetic calculations such as addition, subtraction, multiplication, division, etc., logical calculations such as logical accumulation, logical negation, etc., or bit position, logic, product, bit-inversion, bit shift, bit-turn, etc. Metacalculation and so on. Further, saturation operations such as addition, subtraction, multiplication, division, and the like corresponding to the multimedia processing are performed at high speed. It is a trigonometric function or the like, a vector operation, etc. 'may be that the CPU 101 itself has: this function' or is implemented by an auxiliary processor. /, Ο ❹ In the ROM 102, the IPL (Initial Pr〇gram Loader) executed immediately after the power is turned on is recorded. By executing these, the program in the R-M is read to delete 3, and the cpum is started. Execution. Further, in the curry 2, an operating system program or various data necessary for the operation control of the information processing device (10) is recorded. The body RAM 103 is used for temporarily storing data or programs, and maintains programs or data for DVD-side reading, and data necessary for other game play or chat communication. Further, the CPU 101 performs processing such as setting a variable area in the case 103, and directly performing calculation on the ALU by some values stored in the variable area, or storing some of them in the RAM. The value in 1G3 is stored in the scratchpad, and the scratchpad is operated, and the result is stored back in the memory. The controller 1〇5 connected through the interface 104 accepts an operation input performed by the user when executing the game. Further, the 'controller 1〇5' does not necessarily have to be externally connected to the information processing apparatus 100, and may be integrally formed. The controller 1G5 of the portable terminal device is composed of various buttons or switches, and these pressing operations t are handled by the operation input. In addition, 13 201040940 in the information processing device 1 using the touch screen is used to describe the execution of the user on the touch screen with a pen or a finger as an operation input. In the external memory 1 to 6 that is detachably connected through the interface 104, data indicating the play status of the game (such as past achievements, etc.), data indicating the progress of the game, and network battle time are rewritably stored. The day of the chat communication, S (record) data, etc. The user can record the data into the appropriate external memory 106 by implementing the instruction input through the controller 105. In the DVD-ROM to be loaded into the DVD-ROM drive 108, a program for realizing the game and a video data or audio data (audio data) game accompanying the game are recorded. By the control of the CPU 101, the DVD-ROM drive 108' reads out the DVD-ROM loaded into the DVD-ROM drive 1-8, and reads the necessary programs or data, and these are temporarily memorized. RAM103 and so on. The image processing unit 107 processes the data read from the DVD-ROM by the image processing unit (not shown) provided by the CPU 101 or the image processing unit 1〇7. This is recorded in a frame memory (not shown) included in the image processing unit 107. It is recorded in the frame. The image information of the memory is converted into a video signal at a specific synchronization timing, and is output to a monitor (not shown) connected to the image processing unit 1-7. By this, various image displays can be performed. The display of the portable game device 'typically has a small liquid crystal display device as the controller 105. When the touch screen is used, the display panel of the touch screen is used as a display. For travel in your own home 201040940 Play device or servo device is not stolen, you can use CRT (Cathode Ray

Tube)或電漿顯示器等顯示裝置。影像運算處理器，Tube) or display device such as a plasma display. Image computing processor,

J以阿速地執行二次元的影像重疊運算或是α混合（N a blendlng )等透過運算、各種飽和運算。 '、、向逮執行可獲得成像（rendering )影像的運异，該成像影俊孫ν 像係扎以Ζ緩衝器，來對被配置在假想三次元空間中且附加有各種織紋資訊的多邊形資訊進行成像J performs the image superposition operation of the second element or the transmission operation such as α blending (N a blendlng ) and various saturation calculations. ',, to perform the rendering of the image that can be imaged, the image of the image of the Sun ν ν 扎扎 , , , , , , , , , , , , 多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形多边形Information imaging

化（rendernig )’並自特定視點位置向特定視線方向俯瞰該被配置在假想三次元空間中之多邊形而得的影像。而且，藉由CPU101與影像演算處理器之協調動作，將可以根據用以定義文字形狀的字型資tfl，將文字列作為2 維影像而往框架記憶體描繪、或是往各多邊形表面上描繪。 NIC109，係用於使資訊處理裝置1〇〇連接到網際網路等電腦通訊網（未圖示）之物件，其可以是—依據在構成 LAN (區域網路）時所用之1〇BASE一 t/i〇〇base t規格者，或是一介面（未圖示）’將用於利用電話線路而連接至網際網路的類比數據機、ISDN(整合式服務數位網路）數據機、ADSL (非對稱式數位用戶線路）數據機、利用有線電視線路而連接至網際網路的有線電視數據機等，與cpu 101之間進行媒介。聲音處理部11 0，係使自DVD-ROM讀出之聲音數據轉換成類比聲音訊號’自被連接到聲音處理部丨丨〇之揚聲器（未圖示）輸出。又’在CPU101之控制下，將必須在遊戲進行中產生之效果音或樂曲數據加以生成，使與其對應之聲立 15 201040940 自揚聲器、或頭戴式耳機（调5〈禾圖不）、耳機（未圖示）輸出。在聲音處理部110中，备Λ τ 田破圮錄在DVD-ROM之聲音數據係MIDI數據時，參昭装且女々立 ^ ^ “、、其/、有之音源數據，使MIDI數據轉換成PCM數據。又，當兑上狀八係ADPCM形式或〇gg Vorbis形式專之壓縮後的聲音數據時，了將具展開而轉換成PCM數據0 PCM數據’係以對應直And rendering an image of the polygon arranged in the imaginary three-dimensional space from a specific viewpoint position to a specific line of sight direction. Moreover, by the coordinated operation of the CPU 101 and the image calculation processor, it is possible to draw the character string as a two-dimensional image to the frame memory or to the surface of each polygon according to the font type tfl for defining the shape of the character. . The NIC 109 is used to connect the information processing device 1 to an object such as a computer communication network (not shown) such as the Internet, which may be based on 1 BASE-t/ used in constructing a LAN (Regional Network). I〇〇base t specifier, or an interface (not shown) 'will be used to connect to the Internet's analog data machine, ISDN (integrated service digital network) data machine, ADSL (not used) A symmetric digital subscriber line) a data machine, a cable television modem connected to the Internet using a cable television line, and the like, and a medium between the CPU 101. The sound processing unit 110 converts the sound data read from the DVD-ROM into an analog sound signal 'outputted from a speaker (not shown) connected to the sound processing unit. In addition, under the control of the CPU 101, the effect sound or the music data which must be generated during the game is generated, so as to correspond to the sound standing 15 201040940 from the speaker, or the headphone (the 5 is not), the earphone (not shown) output. In the sound processing unit 110, when the sound data of the DVD-ROM is recorded in the MIDI data of the DVD-ROM, the sound data of the sound source data is converted into the sound source data, and the MIDI data is converted into PCM data. In addition, when the compressed audio data is specially formatted in the form of ADPCM or 〇gg Vorbis, it is converted into PCM data 0 PCM data'

.n ”取樣頻率之時序，來實施D/A Ο CDigital/Analog)轉換，缺咬认, 锝換然後輸出到揚聲器，藉此，能輸出聲音。而且，在資訊處理裝置〗衣置i00處，可透過介面104而連接到麥克風111 〇在此盆梧；^ π α 隹匕/、It iL·下’相對於來自麥克風u i 之類比sill號 > 係以適當之取揭相.玄七& 、田I取樣頻率來實施Λ/D轉換，當作 PCM形式之數位訊號，而能叩敗•進仃在聲音處理部11 〇之混合等處理。此外，資訊處理裝置⑽，係也可以使用硬碟等大容量外部記憶裝置，來發揮與_1Q2、_03、外部記憶體 106、及被組裝到DVD-R0M驅動器1〇8之DVD—R〇M等相同之功能。又，可以採用-種連接至供接收來自使用者之文字列之編輯輸人的鍵盤、或是供接收各種位置之指㈣選擇輸入之滑鼠等的型態。又，也可取代本實施形態之資訊處理裝置1 00，而使用通用之個人電腦。以上所說明的資訊處理裝置100，雖係相當於所謂消費型遊戲裝置’但也可以在例如行動電話、攜帶式遊戲機、 16 201040940 卡拉οκ裝置、一般商務用電腦等各種計算機上，來實現本發明。例如，一般的電腦和上述資訊處理裝置1〇〇 —樣，備有 CPU、RAM、ROM、DVD_R〇M 驅動器以及 NIC 等並具有功旎比資訊處理裝置i 00更簡單的影像處理部，且具有作為外部記憶裝置的硬碟，並可以利用軟碟片、光磁碟片、磁帶等。又，也可以不利用控制器，而利用鍵盤或〇是滑鼠等來作為輸入裝置。第2圖係表示本實施形態之聲音處理裝置及利用該聲音處理裝置而成之聊天系統的概要構成之說明圖。以下，參照本圖來說明本聲音處理裝置之各部概要。本實施形態之聊天系統211，係由2個聲音處理裝置 2〇1所構成。聲音處理裝置2〇1，係分別具有輸入接收部 202、抽出部203、生成部204及輸出部205。〇在此，輸入接收部2 0 2，係接收使用者發出之聲音輸入。在本實施形態中，係於CPU1〇1之控制下，麥克風m 能發揮輸入接收部202之功能。另外’抽出部203，係將被接收之聲音特徵參數加以抽出。在本實施形態中’係聲音處理部u〇與cpui〇1，發揮抽出部203之功能。而且’生成部204 ’係根據既定聲音數據來生成合成聲音°在此被生成之合成聲音，係將該既定聲音數據的特徵參數’置換成藉由抽出部2〇3而被抽出的特徵參數後之聲 17 201040940 音。在本實施形態中，係聲生成部2〇4之功能。^處理部⑴與咖101’發揮而且，輸出部205,係輸出所生形態中，於CPUUU之控制下在只施笨日處理部110驅動揚簦或耳機而發揮輸出部205之功能。 ° 而且，如本圖所示，聊夭备 Ο 系統211及2個聲音處理裝置2。卜雖然係藉由使用者：處理裝久使用者3荨2人利用之2 處裝置100而實現，但是，在1個聲音處理裝置 2(Η的抽出部2G3與生成部咖之間，特徵過電腦通訊網t通訊來傳$。你精由透係作為對於部203 、及部2 0 5而發亦即’使用者A利用之資訊處理裝置1 00 使舟者A發出的聲音之輸入接收部2〇2盥抽出對於使用纟B發出的聲音之生成部204與輪出揮功能。 ❹ Μ ’ Μ者㈣^資訊處理裝置⑽，係作為對於使用者Β發出的聲音之輸入接收部2〇2與抽出部2〇3、及對於使用者Α發出的聲音之生成部2〇4與輸出部· 揮功能。第3圖係表示聲音處理裝置2〇1所進行之傳訊處理的控制流程之流程圖。其係相當於輸入接收部2〇2及抽出部 203所貫施之處理。以下，參照本圖來說明之。當本處理被開始時’咖，係將來自麥克風⑴來的聲音波形輸入功能及RAM1〇3加以初始化（步驟s3〇。 201040940 在此，於RAM103處，係準備2個緩衝器，將其内部事先清除乾淨，前述緩衝器，係僅能記錄既定時間長度之自麥克" 風111被輸入之聲音波形數據。來自麥克風111之聲音波形數據之取樣頻率，係可藉由聲音處理部110的能力與設定來加以變更但是也可以設為 44100Hz、22050Hz 或 11〇25ίίζ 中之任一者，A/D 轉換之精度在典型上係设為8位元或16位元之單聲道（單聲）。 *儲存在緩衝器時之既定時間長度，典型上係設為實現聲音處理裝置201之資訊處理裝i 1〇〇的垂直同步插入週期的整數倍。例如當垂直同步插入的週期係i / 6 q秒（相當於60Hz )時，緩衝器之時間長度，典型上係〗/μ秒、I〆〗〇秒或1/20秒α 在此，1/20秒（相當於2〇Ηζ)係相當於人類可聽見頻率領域(範圍)之下限。亦即，當波形數據變化後，係相當於人類會-感覺到「音量（震動大小）之變化」或「音色（波形）之變化」之任一個的邊界，所以，在本發明中，典型上係採用此時間長度。例如，在具有16位元編號之單聲道取樣時，取樣頻率係441 〇〇Ηζ時’緩衝器長度係成為（1 6/8 ) X ( 441 00/20 ) ^ 4 410位元組。而且’使來自麥克風111之波形數據開始往RAM 103内的緩衝器的—邊之緩衝器蓄積（步驟S302 )，與此並行地，針對RAM103内之另—邊的緩衝器也實施以下處理。 19 201040940 亦即，自該緩衝器之波形數據列，抽出特徵參數（步驟S303 )。在此，將被記憶在緩衝器中之波形數據列設為 al5a2，…’aL。在上述實例中，ai，a2，…，以皆係具有位元編號之整數，L=2205。作為最簡單的特徵參數，可採用：.n "Sampling frequency timing, to implement D/A Ο CDigital/Analog) conversion, lack of bite recognition, 锝 change and then output to the speaker, whereby the sound can be output. Moreover, in the information processing device, i00, It can be connected to the microphone 111 through the interface 104. In this basin; ^ π α 隹匕 /, It iL · lower 'relative to the analog sill number from the microphone ui> is appropriate to take the phase. Xuan Qi & The field I sampling frequency is used to perform Λ/D conversion, and is used as a digital signal in the form of PCM, and can be defeated and processed in the mixing of the sound processing unit 11. In addition, the information processing device (10) can also be hard. Large-capacity external memory devices such as discs perform the same functions as _1Q2, _03, external memory 106, and DVD-R〇M assembled to DVD-ROM drive 1〇8. The keyboard for receiving the edit input from the user's character string, or the type for receiving the various positions (4) selecting the input mouse, etc. Alternatively, instead of the information processing device 100 of the present embodiment, And use a versatile personal computer. The information processing device 100 described above corresponds to a so-called consumer game device. However, the present invention can be realized by various computers such as a mobile phone, a portable game machine, a 16 201040940 Karaoke card device, and a general business computer. For example, a general computer and the above-described information processing apparatus are provided with a CPU, a RAM, a ROM, a DVD_R〇M driver, an NIC, and the like, and have a simpler image processing unit than the information processing apparatus i 00, and have As a hard disk of an external memory device, a floppy disk, a magneto-optical disk, a magnetic tape, etc. can be used. Alternatively, a keyboard or a keyboard can be used as an input device without using a controller. A schematic diagram showing a schematic configuration of a voice processing device and a voice system using the voice processing device according to the present embodiment. Hereinafter, an outline of each unit of the voice processing device will be described with reference to the same figure. The chat system 211 of the present embodiment is a system. The sound processing device 2〇1 includes an input receiving unit 202 and an extracting unit 203, respectively. The input unit 204 and the output unit 205. Here, the input receiving unit 2 0 2 receives the voice input from the user. In the present embodiment, under the control of the CPU 1〇1, the microphone m can function as an input receiving unit. In addition, the 'extraction unit 203 extracts the received sound characteristic parameters. In the present embodiment, the 'sound processing unit u〇 and cpui〇1 function as the extraction unit 203. 'The synthesized sound is generated based on the predetermined sound data. The synthesized sound generated here is the sound parameter of the predetermined sound data, which is replaced by the characteristic parameter extracted by the extracting unit 2〇3. . In the present embodiment, the function of the sound generating unit 2〇4 is performed. The processing unit (1) and the coffee maker 101' play the output unit 205. The output unit 205 outputs the output unit 205 under the control of the CPUUU and drives the speaker or the earphones only by the day-to-day processing unit 110. ° Moreover, as shown in the figure, the system 211 and the two sound processing devices 2 are located. Although it is realized by the user that the two devices 100 used by the user for 3 seconds are used for processing, the sound processing device 2 (the extraction unit 2G3 of the cymbal and the generating unit) The computer communication network t communication transmits the $. The fine input system is used as the input unit 2 for the part 203 and the part 205, that is, the information processing device 100 used by the user A is used to input the sound of the player A.生成2盥 extracts the generating unit 204 and the round-out function for the sound generated by the 纟B. ❹ Μ ' Μ (4) The information processing device (10) is used as an input receiving unit 2〇2 for the sound emitted by the user. The extracting unit 2〇3 and the generating unit 2〇4 and the output unit/swing function for the user's voice are emitted. Fig. 3 is a flow chart showing the control flow of the communication processing performed by the sound processing unit 2〇1. This corresponds to the processing performed by the input receiving unit 2〇2 and the extraction unit 203. Hereinafter, the description will be made with reference to the figure. When this process is started, the coffee waveform input function from the microphone (1) is used. RAM1〇3 is initialized (step s3〇. 20 1040940 Here, in the RAM 103, two buffers are prepared, and the inside of the buffer is prepared in advance. The buffer is only capable of recording sound waveform data input from the microphone "wind 111 for a predetermined length of time. From the microphone 111 The sampling frequency of the sound waveform data can be changed by the capability and setting of the sound processing unit 110, but can also be set to any of 44100 Hz, 22050 Hz, or 11 〇 25 ίί , and the accuracy of the A/D conversion is typically It is set to 8-bit or 16-bit mono (mono). * The predetermined length of time stored in the buffer is typically set to achieve the vertical processing of the information processing device 201 of the sound processing device 201. An integer multiple of the sync insertion period. For example, when the period of vertical sync insertion is i / 6 q seconds (equivalent to 60 Hz), the length of the buffer is typically 〗 / μ sec, I 〆〇或 or 1/20 Second α Here, 1/20 second (equivalent to 2〇Ηζ) is equivalent to the lower limit of the human audible frequency domain (range). That is, when the waveform data changes, it is equivalent to human beings - feeling "volume ( Vibration size) Or the boundary of any of the "changes in the timbre (waveform)", so in the present invention, this time length is typically employed. For example, in the case of mono sampling with a 16-bit number, the sampling frequency is 441. When the buffer length is (1 6/8 ) X ( 441 00/20 ) ^ 4 410 bytes, and 'the edge of the buffer from the microphone 111 is started to the buffer in the RAM 103. The buffer is accumulated (step S302), and in parallel with this, the following processing is also performed for the other side buffer in the RAM 103. 19 201040940 That is, the feature parameters are extracted from the waveform data column of the buffer (step S303). Here, the waveform data column to be memorized in the buffer is set to al5a2, ...'aL. In the above example, ai, a2, ..., are all integers having a bit number, L = 2205. As the simplest characteristic parameter, you can use:

(1 )位移絕對值之平均Σ t = iL |以丨/L (2)位移之平方平均Σ tMLat2/L 〇 (3 )位移絕對值之總和ς…L I at丨 (4)位移之平方和Σ t = iLat2等。其係相當於自麥克風1Π被輸入之聲音大小的特徵參數。更複雜的特徵參數，容後詳述。頃便說勹使s亥特徵參數，透過資訊處理裝置1 〇-〇的NIC109而往對方之資-訊處理裝1 100傳訊（步驟S304 )，直到在步驟S3G2中已經開始之往緩衝器之蓄積結束為止’-直待機（步驟S305 )。而且，在該待機中，也可以與其他處理—並行而協同性（.r〇utine)地實施。典型上，下述收訊處理會被並行地實施。緩衝器之任務（步當往緩衝器之蓄積結束時，交換2個驟S306 ) ’回到步驟S3〇2。如上所述’往緩衝器之波形數據單位來實施，所以，特徵參數積㈣1/20秒為因此，與通常之語音聊天相比，傳1係母1/2°秒被實施，減…而日下，必須傳送之數據量會減乂很多。而且，該特徵表 r 的緩衝。㈣/數之傳訊，係也可以實施適當 20 201040940 第4圖係表不聲音處理裝置2〇1所進行的收訊處理的㈣程之流程圖β其係相當於生成部2〇4及輸出部2〇5 所實施之處理。以下，參照本圖來說明。 CPU1 〇 1 ’首先係使既定聲音波形數據之輸出，以音量〇開始（步驟S401)。作為既定之聲音波形數據，可採用正弦波、方形波、以MIDI等所準備之各種樂器之聲音波形、及配音員等之聲音數據等種種聲音。 q 頃便說句’控制NIC109，直到來自對方資訊處理裝置/100的特徵參數到達為止，一直待機（步驟)。在此待機中’也彳以與其他處理並行而協同性地實施。典型上，上述收訊處理會被並行地實施。 s特徵參數到達時，將該特徵參數加以收訊（步驟 S403)。然後，使在步驟S401中已經開始之既定聲音波形數據輪出g里，變更成與收訊得到的特徵參數成比例之音量〇 (步驟S404 )，然後回到步驟S402。藉由这些傳訊處理及收訊處理，使配合傳訊側使用者务出之聲音大小而改變音量之聲音，能讓收訊侧使用者聽見。聲θ大小係反映對方使用者之感情，所以，即便是這種處理，也能做某種程度之意思溝通。九另外收訊側使用者聽到之聲音，係不過是將既定聲皮形數據之音$加以改變後之聲音，所以，無法得知發過何種音韻（聲音）。因此，即便傳訊側使用者發表違反 21 201040940 公序良俗之發言時，收訊側使用者無法得知其内容。因此能防止由於發言内容所致之糾紛。進而 &聲g大小與感情之相關關係，不論是使用何種語言都沒有什麼關係。在本實施形態巾，於收訊側的音^聲音）不清楚，由於原先是以無法理解的言語溝通當作前提，戶斤以，即便傳訊側的使用者與收訊側的使用者不具有能互相理解之言語，由於並不存在因語言所致之障礙’所以’反而能促進意思溝通。在上述實施形態中，雖然將聲音大小作為特徵參數來抽出’將要被輸出之聲音音量加以變化，但是，此態樣可以做種種變化。首先，作為抽出的特徵參數，進而可考慮採用（5)基本頻率之手法。為了取得基本頻率，使被蓄積在緩衝器中之波形數據列⑴，a2’...，aL做離散高速傅利葉轉換，只要取得具有最大 ❹成分之波峰的頻率即可。而且，將基本頻率與上述（n〜（4)之任一者加以組合，作為特徵參數，傳送到對方之資訊處理裝置100。在收訊側，於步驟S4〇4中，除了變化音量之外，也將再生（播放）既定波形數據之音程（頻率或音階），變更成收訊得到的特徵參數的基本頻率。當係正弦波、方形波、以MIDI等所準備之各種樂器之聲音波形時’只要將該聲音波形數據之再生頻率，配合收訊得到的特徵參數來變更即可。其相當於使卡拉〇κ等所實 22 201040940 施之「更換音階」進行更詳細的控制。又，即便係採用配音員等之聲音數據時，只要配人被傳送的特徵參數中所指定之再生頻率的上下變化口波形數據之音階上下變更即可。 <茸曰纟本態樣中，除了聲音大小之外，聲音高低也能傳遞 :對方’藉由抑揚或語調等’能更詳細理解使用者之感情，意思溝通能更好。 A度 Ο 此外，也可㈣㈣散高速傅利葉轉換之、％果，將⑷ 既疋複數個頻率中之頻率成分的大小，作為特徵參數。法π-:在收訊侧’係事先準備分別對應複數個頻率之 =:，使各波形數據之放大率，與對應之頻率成分的 =成比例。典型上，係與上述⑴〜⑷中任-者成盆具考慮ΜΙΜ音源時，鼓、貝斯、吉他及鋼琴，性ΐίΓ 的範圍相異。在此，將這些樂器的代表〇性聲音的頻率，作為上述「既定之複數頻率」二訊側’係使各樂器之音量，配合傅利葉轉二果，針對該樂器的代表頻率而被抽出之成分的大丄：加以變化。藉由這種處理，為如爵士樂隊之演奏般地播放。聲曰係成率帶==用這些時，也有將⑺既定1個或複數個頻羊帶内之波峰的頻率及其大小，當作特徵之手法。帶、吉帶，係事先決定鼓用頻率帶、貝斯用頻率口TO用頻率帶、及錮炅. 7用頻率f，自傅利葉轉換之結 23 201040940 果，選擇各頻率帶中之波峰。並且針對各頻率帶而選擇之波峰複數個。例如，鋼琴與其他樂器相比較下:、"為：個或帶較廣’所以增加配合其所選擇之波峰的個二盖之頻率产，配人it收Λ側，將各樂器的波形數據之輸出聲音高又…皮峰頻率’同時，配合其波峰：(1) The average value of the absolute value of displacement Σ t = iL | 丨 / L (2) The square of the displacement Σ tMLat2 / L 〇 (3) the sum of the absolute values of the displacement ς ... LI at 丨 (4) the square of the displacement Σ t = iLat2 and so on. It is equivalent to the characteristic parameter of the sound size input from the microphone 1Π. More complex feature parameters are detailed later. It is said that the characteristic parameter of the shai is transmitted to the other party's resource processing device 1100 through the NIC 109 of the information processing device 1 (step S304) until the accumulation of the buffer has begun in step S3G2. Until the end of the '- straight standby (step S305). Moreover, in this standby, it is also possible to perform in parallel with other processes. Typically, the following reception processing will be implemented in parallel. The task of the buffer (steps to the end of the accumulation of the buffer, the exchange of two steps S306) 'back to step S3〇2. As described above, it is implemented in the waveform data unit of the buffer. Therefore, the characteristic parameter product (4) is 1/20 second. Therefore, compared with the normal voice chat, the transmission 1 system is implemented in 1/2° second, minus... Underneath, the amount of data that must be transferred is much reduced. Moreover, the buffer of the feature table r. (4) / Number of communications can also be implemented as appropriate 20 201040940 Figure 4 is a flowchart of the (four) process of the receiving processing performed by the non-sound processing device 2〇1, which corresponds to the generating unit 2〇4 and the output unit 2〇5 The treatment implemented. Hereinafter, it will be described with reference to this figure. The CPU 1 〇 1 ' first causes the output of the predetermined sound waveform data to start with the volume ( (step S401). As the predetermined sound waveform data, various sounds such as a sine wave, a square wave, a sound waveform of various instruments prepared by MIDI, and the like, and sound data of a voice actor can be used. q I will say, 'Control NIC 109, until the characteristic parameters from the other party's information processing device /100 arrive, and wait until the step (step). In this standby, it is also implemented in parallel with other processing. Typically, the above receiving processing will be implemented in parallel. When the s characteristic parameter arrives, the feature parameter is received (step S403). Then, the predetermined sound waveform data which has been started in step S401 is rotated, and changed to a volume 成 which is proportional to the received characteristic parameter (step S404), and then returns to step S402. By means of these communication processing and receiving processing, the sound of the volume can be changed in accordance with the size of the voice communicated by the communication side user, so that the receiving side user can hear it. The size of the sound θ reflects the feelings of the other party's users, so even this kind of processing can communicate to some extent. The sound heard by the user on the other receiving side is simply the sound of changing the sound of the predetermined voice data, so it is impossible to know what kind of phonology (sound) has been sent. Therefore, even if the communication side user makes a statement that violates the public order of 21 201040940, the user on the receiving side cannot know the content. Therefore, it is possible to prevent disputes caused by the contents of the speech. Furthermore, the relationship between sound size and emotion is irrelevant regardless of the language used. In the present embodiment, the sound of the sound on the receiving side is unclear, and since the original communication is based on incomprehensible speech, the user does not have the user on the communication side and the user on the receiving side. Words that can understand each other, because there is no language-induced obstacles, 'but' can promote meaning communication. In the above embodiment, although the sound size is extracted as a characteristic parameter to change the volume of the sound to be outputted, this aspect can be variously changed. First, as the extracted characteristic parameters, the method of (5) the basic frequency can be considered. In order to obtain the fundamental frequency, the waveform data sequence (1), a2', and aL accumulated in the buffer are subjected to discrete high-speed Fourier transform, and the frequency having the peak of the largest ❹ component can be obtained. Further, the basic frequency is combined with any of the above (n to (4), and transmitted as a characteristic parameter to the information processing apparatus 100 of the other party. On the reception side, in step S4〇4, in addition to changing the volume It also reproduces (plays) the interval (frequency or scale) of the predetermined waveform data, and changes it to the fundamental frequency of the characteristic parameter obtained by the reception. When the sound waveform of various instruments prepared by sine wave, square wave, MIDI, etc. is used 'As long as the reproduction frequency of the sound waveform data is changed according to the characteristic parameters obtained by the reception, it is equivalent to more detailed control of the "replacement scale" applied by Karaoke κ, etc. 2010 201040. When the voice data of the voice actor or the like is used, it is only necessary to change the scale of the waveform of the up-and-down waveform of the reproduction frequency specified in the characteristic parameter to be transmitted, and the sound level is changed. In addition, the voice can also be transmitted: the other party 'by swaying or tone, etc.' can understand the user's feelings in more detail, meaning communication is better. A degree Ο In addition, (4) (4) The high-speed Fourier transform of the high-speed Fourier transform, (4) The size of the frequency component of the complex frequency is used as the characteristic parameter. The method π-: on the receiving side is prepared in advance to correspond to a plurality of frequencies =:, The magnification of each waveform data is proportional to the ratio of the corresponding frequency component. Typically, the drums, bass, guitar, and piano are used in consideration of the sound source of any of the above (1) to (4). Here, the frequency of the representative voices of these instruments is used as the above-mentioned "established complex frequency". The second-party side is used to make the volume of each instrument, with the Fourier turn, for the representative frequency of the instrument. The big part of the extracted component: change. With this treatment, it is played like a jazz band. The sonar system rate == When using these, there are also (7) one or more frequency sheep. The frequency and the size of the peaks in the band are used as the feature. The band and the giga band are determined in advance by the frequency band for the drum, the frequency band for the TO port for the bass, and the frequency band of the bass. Knot 23 201040940 If you select a peak in each frequency band, and select a number of peaks for each frequency band. For example, the piano is compared with other instruments: " is: or a wider band' Select the frequency of the two peaks of the peak, and match it to the side, and the output of the waveform data of each instrument is high and the peak frequency is 'at the same time, with its peak:

而變化。當針對某樂器而選擇複數個波峰時以該樂器演奏複數聲音即可。叶，、要汉疋成在此手法中，進而可以獲得以爵士人類發出的聲音之播放結果。奏來拉擬並且，上述手法，係可以適當組合或省略局部。如此-來’若根據本實施形態，藉由提供一種類似聲音聊天之系統’雖然無法將使用者的詳細發言内容，作為語:資訊來加以取得’卻可以藉由使用者們的感情來溝通 ”、尤八可以抑制由於使用者們的發言内容所致之糾紛0 、並且’針對本申請案’係以日本專利申請案特願 2009 012753號為基礎來主張優先權，該基礎巾請案之内谷係全部編入本申請案中。〔產業上的可利用性〕如上所述，若根據本發明，能提供一種聲音處理装置、聊天系統、聲音處理方法、資訊記錄媒體以及程式，非常適用於藉由使用者們的聲音來做某種程度之溝通，且不會做不適切之會話。 24 201040940 【圖式簡單說明】第1圖係表示典型的資訊處理裝置的概要構成之示意圖。第2圖係表示本發明的實施形態之聲音處理裝置及利用該聲音處理裝置而成之聊天系統的概要構成之說明圖。第3圖係表示利用本實施形態的聲音處理裝置來實施之傳訊處理的控制流程之流程圖。第4圖係表示利用本實施形態的聲音處理裝置來實施〇之收訊處理的控制流程之流程圖。【主要元件符號說明】 100 資訊處理裝置 109 NIC(網路介面卡） 101 CPU 110 聲音處理部 102 ROM 111 麥克風 103 RAM 201 聲音處理裝置 104 界面 202 輸入接收部 105 控制器 203 抽出部 106 外部記憶體 204 生成部 107 影像處理部 205 輸出部 108 DVD-ROM驅動器 211 聊天系統 25And change. When a plurality of peaks are selected for an instrument, the plural sound can be played with the instrument. Ye, and Han Hancheng In this way, you can get the result of the sound of the jazz human voice. To play the analogy, the above methods can be combined or omitted as appropriate. In this way, according to the present embodiment, by providing a system similar to voice chat, although the user's detailed speech content cannot be obtained as a language: information can be communicated by the feelings of the users. , Yuba can suppress disputes caused by the content of the user's speeches. 0, and 'for this application', based on the Japanese Patent Application No. 2009 012753, the priority is claimed. The entire system is incorporated in the present application. [Industrial Applicability] As described above, according to the present invention, it is possible to provide a sound processing device, a chat system, a sound processing method, an information recording medium, and a program, which are very suitable for borrowing. A certain degree of communication is made by the user's voice, and no uncomfortable conversation is made. 24 201040940 [Simplified Schematic] Fig. 1 is a schematic diagram showing a schematic configuration of a typical information processing device. A schematic diagram showing a schematic configuration of a voice processing device according to an embodiment of the present invention and a chat system using the voice processing device Fig. 3 is a flow chart showing the flow of control of the communication processing performed by the audio processing device of the embodiment. Fig. 4 is a flowchart showing the control flow of the reception processing by the audio processing device of the embodiment. [Main element symbol description] 100 information processing device 109 NIC (network interface card) 101 CPU 110 sound processing unit 102 ROM 111 microphone 103 RAM 201 sound processing device 104 interface 202 input receiving unit 105 controller 203 extraction unit 106 External memory 204 generating unit 107 Image processing unit 205 Output unit 108 DVD-ROM drive 211 Chat system 25

Claims

201040940 VII. Patent application scope: L sound processing device (201), characterized in that: an audio input receiving unit (202) receives the sound of the sound emitted by the user (then, the sound received by the material) The feature number '2' (2(4), which generates a synthesized sound wheel output unit (205) based on the predetermined sound data, and outputs the synthesized sound Ο: generated by the generating unit (204) The aforementioned two sounds are replaced with the values of the extracted characteristic parameters, and the synthesized sound is generated by the clip. Value 2. According to the scope of the patent application, the aforementioned characteristic parameters, the size and time of the basic frequency components change. The sound processing device (201) is a waveform or a volume, a basic frequency, or a predetermined representative frequency component. The sound processing device is as described in claim 2 of the patent scope (2〇1). ), the second part (2〇3), the above characteristic parameters are extracted in 1 second less than the human neck rate. 4. As in the patent application, the aforementioned extraction part discrete Fourier transform, The size of the frequency component is the sound processing device (2〇1), (2 0 3 ) described in the first item, and the received sound is obtained in the obtained frequency distribution t, and the predetermined complex number is used as the aforementioned The feature parameter is extracted, and 26 201040940 The Bida generation unit (204)' combines the waveform data of each of the extracted frequency components in advance, and enlarges the size after the extraction, and combines them to generate a synthesized sound. 5. The sound processing device (2(1) according to the fourth aspect of the invention, wherein the basic Ο: rate corresponding to the waveform data of each of the frequency components previously is consistent with the center frequency of the frequency component, The waveform data packet 3 is a multiplication component of the fundamental frequency. 6. The sound processing device (Xie) according to the application scope η, the generating unit (204) is a candidate for a plurality of sound data, I, characteristics The candidate sound data whose parameter is closest to the extracted characteristic parameter is selected, and the selected sound data is used as the aforementioned data. 7. A chat system (211), comprising: a sounding i-sound processing device (2G1) that receives an input of a sonar issued by a user and outputs the synthesized sound to a second user; The second sound processing device (201) 'receives the input of the first "sounding sound" and outputs the synthesized sound to the first "user"; the processing 2: the first sound processing device (Xie) and the second The sound processing device (201) has an input receiving unit (202) that receives an input of a sound emitted by a user, and an extracting unit (203) that extracts a parameter of the received sound; '27 201040940 generates a unit (204) that generates a synthesized sound based on the predetermined sound data, and an output unit (205) that outputs the synthesized sound generated as described above; wherein the generating unit (2〇4) is determined by the foregoing The characteristic parameter of the sound data is replaced by the value of the extracted characteristic parameter to generate the synthesized sound, and the characteristic parameter extracted by the month is transmitted from the extraction unit (2〇3). Transmitted to the communications network generation unit (2〇4). : "A sound processing method" is a sound processing method implemented by a sound processing device (201) having an input receiving unit (202), an extraction unit (204), a generating unit (204), and an output unit (205), and is characterized by The present invention includes: a receiving and receiving step, the rounding of the generated sound; the wheel receiving receiving unit (202) receiving the user's hair extracting unit (10) extracting the received sound generating step, the generating unit (204) The established sound data is used to generate synthetic sounds as well as the car daylights like wood output steps, before giving the mountain, sound; ^.卩(2〇5) outputs the above-mentioned generated synthesis in which the value of the characteristic parameter of the semi-evented feature is described in the step 摅&It is formed by the feature of the aforementioned predetermined sound data The parameter 'substitution is extracted as described above to generate the aforementioned synthesized sound. 28 201040940 The program system 9 is an information recording medium characterized in that a program is stored to cause a computer to function as a part: a receiving unit (202) 'which receives an input of a sound from a user; a part (2G3)' It extracts the characteristic parameters of the received sound. The part (204) generates a synthesized sound output based on the predetermined sound data. Ρ (205) 'outputs the synthesized sound generated as described above; the number of turns 栌 the generating unit (2° 4)' replaces the characteristic parameter of the predetermined sound data with the value of the extracted characteristic parameter To generate the aforementioned synthesized sound. The program 'is characterized in that the computer functions as a part: an input receiving unit (202) that receives an input of a sound from a user; and extracts. Ρ (203) extracting the characteristic parameter of the received sound; the 〇 generating unit (204) generating a synthesized sound based on the predetermined sound data; and outputting 邛 (205) outputting the generated synthesized sound; The generating unit (204) generates the synthesized sound by replacing the characteristic parameter of the predetermined audio data with the value of the extracted characteristic parameter. 29