WO2021210121A1 - Food texture generation method, food texture generation device, and food texture generation program - Google Patents

Food texture generation method, food texture generation device, and food texture generation program

Info

Publication number
WO2021210121A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
chewing
edible
food
generated
Prior art date
Application number
PCT/JP2020/016683
Other languages
French (fr)
Japanese (ja)
Inventor
十季 武田
有信 新島
隆文 向内
佐藤 隆
Original Assignee
Nippon Telegraph and Telephone Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation
Priority to PCT/JP2020/016683 priority Critical patent/WO2021210121A1/en
Publication of WO2021210121A1 publication Critical patent/WO2021210121A1/en

Links

Images

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility

Definitions

  • The present invention relates to a texture generation method, a texture generation device, and a texture generation program.
  • In the technical field of VR (Virtual Reality), research is being conducted on texture generation and presentation technology that generates texture sounds and presents them to a user (see Non-Patent Document 1).
  • However, because conventional texture generation and presentation technology only presents the texture of foods that can actually be eaten, it offers a simulated VR experience yet can present nothing beyond the familiar textures already available.
  • The present invention has been made in view of the above circumstances, and its object is to provide a technique capable of supporting the exploration of entirely new textures.
  • One aspect of the present invention is a texture generation method performed by a computer, in which the computer synthesizes, onto the sound produced by a non-edible object, the chewing sound of an edible food similar to that sound, and reproduces the result so that the chewing sound of the edible food is reproduced first, ahead by an arbitrary time difference.
  • The texture generation device according to one aspect of the present invention includes: a storage unit that stores the chewing sounds of a plurality of different types of edible foods in association with feature amounts of those chewing sounds; an extraction unit that extracts a feature amount of the sound produced by a non-edible object from that sound; a calculation unit that reads the feature amounts of the stored chewing sounds from the storage unit, calculates the similarity between the feature amount of the non-edible object's sound and each of those feature amounts, and selects the chewing sound with the highest similarity based on the calculated results; a synthesis unit that synthesizes the selected chewing sound onto the non-edible object's sound so that the selected chewing sound is reproduced first with an arbitrary time difference; and a presentation unit that reproduces the synthesized sound source through one or more of a speaker and a vibration motor.
  • The texture generation program of one aspect of the present invention is a program that causes a computer to function as the above texture generation device.
  • FIG. 1 is a configuration diagram showing a functional block of a texture generating device.
  • FIG. 2 is a flow chart showing the operation of the texture generating device.
  • FIG. 3 is a diagram showing a composite image of the generated sound of the non-edible product and the chewing sound of the edible product.
  • FIG. 4 is a configuration diagram showing a hardware configuration of the texture generating device.
  • The present invention proposes making the user perceive that he or she is eating a non-edible object by synthesizing the chewing sound of an edible food onto the sound produced by the inedible object.
  • In doing so, the present invention presents the chewing sound of the edible food first, making the user perceive that even a non-edible object can be eaten.
  • To give a natural connection between the two combined sounds, the chewing sound that is synthesized is that of an edible food whose acoustic characteristics are similar to the sound produced by the non-edible object.
  • In addition, based on human perception of chewing sounds when food is broken and on human temporal resolution, the chewing sound of the edible food and the sound of the non-edible object are combined with a time difference of 10 to 30 ms (milliseconds), which is perceivable yet still felt as a single event.
  • This is because the chewing sound of a hard food usually contains two components, a fracture sound and a grinding sound made by the teeth, and the interval between these two sounds is typically about 10 to 30 ms.
  • Human temporal resolution is about 3 ms, and the resolution needed to recognize the order of two sounds is said to be more than ten-odd ms (Kashino, "From the brain to presence communication: mechanisms of timing perception," Information Science and Technology Forum, 2002). Based on this background, a natural and effective connection can be created by synthesizing the non-edible object's sound at the timing of the grinding sound of the edible food.
  • FIG. 1 is a configuration diagram showing a functional block of the texture generating device 1 according to the present embodiment.
  • the texture generation device 1 includes, for example, an input unit 11, an extraction unit 12, a calculation unit 13, a synthesis unit 14, a presentation unit 15, and a storage unit 16.
  • The texture generating device 1 is a computer such as a server, has a built-in speaker, and is connected to a vibration motor that can be worn on any desired part of the user's body.
  • The storage unit 16 is a chewing sound feature database that stores, in association with each other, the chewing sounds of a plurality of different types of edible foods and the feature amounts Fi (i being a natural number) of those chewing sounds.
  • A chewing sound of an edible food is the sound of chewing something that can be eaten, for example the chewing sound of potato chips or of namul.
  • The input unit 11 has a function of inputting a non-edible sound source (the sound produced by a non-edible object) received by the texture generating device 1.
  • A sound produced by a non-edible object is a sound made by something that cannot be eaten, for example the jingling of several coins rubbing against each other or the sound of stones being crushed.
  • The extraction unit 12 has a function of analyzing the input sound of the non-edible object and, based on the analysis result, extracting a feature amount S of that sound.
  • The feature analysis can be realized using, for example, formulas for the zero-crossing rate, power spectrum analysis, cepstrum analysis, or the like.
  • The feature amounts Fi of the chewing sounds stored in the storage unit 16 are computed using the same kinds of formulas.
  • The calculation unit 13 has a function of reading the feature amounts Fi of the chewing sounds of the plurality of edible foods from the storage unit 16, calculating the similarity between the extracted feature amount S of the non-edible object's sound and each read feature amount Fi (using, for example, cosine similarity), and selecting the chewing sound with the highest similarity based on the calculated results.
  • The synthesis unit 14 has a function of synthesizing the chewing sound of the selected food onto the input sound of the non-edible object so that the chewing sound of the selected food is reproduced first with an arbitrary time difference.
  • The arbitrary time difference is, for example, 10 to 30 ms.
  • For example, the synthesis unit 14 combines the sounds so that the non-edible object's sound is reproduced 20 ms after the start of reproduction of the edible food's chewing sound, or 20 ms after the reproduction time of its fracture sound.
  • The presentation unit 15 has a function of reproducing the synthesized sound source through one or more of the speaker and the vibration motor.
  • The vibration motor is worn, for example, around the user's shoulder or jaw.
  • When the presentation unit 15 reproduces the synthesized sound source through the speaker and/or the vibration motor, the sound corresponding to the synthesized source is delivered to the user from the speaker and the corresponding vibration is transmitted to the user by the vibration motor.
  • FIG. 2 is a flow chart showing the operation of the texture generating device 1.
  • Step S1: First, the input unit 11 loads into the texture generating device 1 the non-edible object's sound that is to be combined with a chewing sound, based either on the user's designation or on acquisition and selection from a database on the Internet by the texture generating device 1. For example, the input unit 11 inputs the jingling sound of coins (hereinafter, the jingling sound).
  • Step S2: Next, the extraction unit 12 applies cepstrum analysis to the waveform data of the input non-edible sound and computes its feature amount S. Cepstrum analysis is a method of analyzing the spectral envelope of a sound and is a known technique used in the field of speech recognition.
  • Besides cepstrum analysis, the extraction unit 12 can use any existing technique capable of computing sound features.
  • Step S3: Next, the calculation unit 13 computes the similarity between S and each stored feature amount Fi, for example using cosine similarity; besides cosine similarity, the calculation unit 13 can use any existing technique capable of computing the similarity of sounds.
  • Step S4: Next, the calculation unit 13 selects the chewing sound of the edible food with the highest similarity to the input non-edible sound based on the calculated similarities. For example, the calculation unit 13 selects the chewing sound of potato chips.
  • FIG. 3 is a diagram showing an image of combining the coin jingling sound and the potato chip chewing sound.
  • SIGa is the waveform data of the coin jingling sound.
  • SIGb is the waveform data of the potato chip chewing sound.
  • The synthesis unit 14 combines them so that the potato chip chewing sound SIGb comes first and the coin jingling sound SIGa comes second. SIGb shows two peaks, at t1 and t2.
  • t1 is the time at which the teeth begin to fracture the potato chip.
  • t2 is the time at which the teeth begin to grind the potato chip.
  • The interval between t1 and t2 is about 10 to 30 ms, as described above.
  • The synthesis unit 14 therefore sets the playback start time ts of the coin jingling sound SIGa to, for example, 20 ms after t1, or to, for example, 20 ms after the playback start time t0 of the potato chip chewing sound (provided ts falls after t1).
  • Step S6: Finally, the presentation unit 15 plays the synthesized sound source through the speaker built into the texture generating device 1 and through the vibration motor that is connected to the device and worn around the user's jaw. The potato chip chewing sound is reproduced first, and 20 ms later the combined sound of the potato chip chewing sound and the coin jingling sound is reproduced.
  • When the texture generating device 1 performs the above operation, the familiar chewing sound is heard first and the non-edible object's sound is then presented with a natural connection.
  • Likewise, the vibration corresponding to the familiar chewing sound is transmitted first, and the vibration corresponding to the non-edible object's sound is presented with a natural connection.
  • As a result, the sense of unity between the edible food and the non-edible object increases, a texture that makes the user feel as if they have eaten the non-edible object can be presented, and the exploration of entirely new textures can be supported.
  • In VR content there are characters that eat things that cannot be eaten, and the technique can also be applied to the effect of letting the user become such a character.
  • According to the present embodiment, the texture generating device 1 synthesizes, onto the sound of a non-edible object, the chewing sound of an edible food similar to that sound, and reproduces the result so that the chewing sound plays first with an arbitrary time difference. The combination of the two sounds therefore has a natural connection (a sense of unity), a texture that makes the user feel as if they have eaten the non-edible object can be presented, and, as a result, the exploration of entirely new textures can be supported.
  • The present invention is not limited to the above embodiment.
  • The present invention can be modified in various ways within the scope of its gist.
  • The texture generating device 1 of the present embodiment can be realized, for example, as shown in FIG. 4, using a general-purpose computer system including a CPU (Central Processing Unit, processor) 901, a memory 902, a storage 903 (HDD: Hard Disk Drive or SSD: Solid State Drive), a communication device 904, an input device 905, and an output device 906.
  • The memory 902 and the storage 903 are storage devices.
  • Each function of the texture generating device 1 is realized by the CPU 901 executing a predetermined program loaded into the memory 902.
  • The texture generating device 1 may be implemented on one computer.
  • The texture generating device 1 may be implemented by a plurality of computers.
  • The texture generating device 1 may be a virtual machine running on a computer.
  • The program for the texture generating device 1 can be stored on a computer-readable recording medium such as an HDD, SSD, USB (Universal Serial Bus) memory, CD (Compact Disc), or DVD (Digital Versatile Disc).
  • The program for the texture generating device 1 can also be distributed via a communication network.
  • 1: Texture generation device, 11: Input unit, 12: Extraction unit, 13: Calculation unit, 14: Synthesis unit, 15: Presentation unit, 16: Storage unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Toys (AREA)

Abstract

This food texture generation device 1 comprises: a storage unit 16 which stores the sounds of chewing a plurality of different types of food products and the characteristic amounts of the sounds of chewing the plurality of food products in association with each other; an extraction unit 12 which extracts the characteristic amount of the generated sound of a non-food product, from the generated sound of the non-food product; a calculation unit 13 which reads out the characteristic amounts of the sounds of chewing the plurality of food products from the storage unit 16, calculates the similarities between the characteristic amount of the generated sound of the non-food product and the characteristic amounts of the sounds of chewing the plurality of food products, respectively, and selects the sound of chewing the food product with the highest similarity, on the basis of the calculation result of the calculated similarities; a synthesis unit 14 which synthesizes the generated sound of the non-food product with the sound of chewing the selected food product such that the sound of chewing the selected food product is reproduced earlier with any time difference; and a presentation unit 15 which reproduces the synthesized sound source through one or more among a speaker and a vibration motor.

Description

Texture generation method, texture generation device, and texture generation program
 The present invention relates to a texture generation method, a texture generation device, and a texture generation program.
 In the technical field of VR (Virtual Reality), research is being conducted on texture generation and presentation technology that generates texture sounds and presents them to a user (see Non-Patent Document 1).
 However, because conventional texture generation and presentation technology only presents the texture of foods that can actually be eaten, it offers a simulated VR experience yet can present nothing beyond the familiar textures already available.
 The present invention has been made in view of the above circumstances, and its object is to provide a technique capable of supporting the exploration of entirely new textures.
 A texture generation method of one aspect of the present invention is a texture generation method performed by a computer, in which the computer performs a step of synthesizing, onto the sound produced by a non-edible object, the chewing sound of an edible food similar to that sound, and reproducing the result so that the chewing sound of the edible food is reproduced first with an arbitrary time difference.
 A texture generation device of one aspect of the present invention includes: a storage unit that stores the chewing sounds of a plurality of different types of edible foods in association with feature amounts of those chewing sounds; an extraction unit that extracts a feature amount of the sound produced by a non-edible object from that sound; a calculation unit that reads the feature amounts of the stored chewing sounds from the storage unit, calculates the similarity between the feature amount of the non-edible object's sound and each of those feature amounts, and selects the chewing sound with the highest similarity based on the calculated results; a synthesis unit that synthesizes the selected chewing sound onto the non-edible object's sound so that the selected chewing sound is reproduced first with an arbitrary time difference; and a presentation unit that reproduces the synthesized sound source through one or more of a speaker and a vibration motor.
 A texture generation program of one aspect of the present invention is a program that causes a computer to function as the above texture generation device.
 According to the present invention, it is possible to provide a technique capable of supporting the exploration of entirely new textures.
FIG. 1 is a configuration diagram showing the functional blocks of the texture generation device. FIG. 2 is a flow chart showing the operation of the texture generation device. FIG. 3 is a diagram showing an image of combining the sound of a non-edible object and the chewing sound of an edible food. FIG. 4 is a configuration diagram showing the hardware configuration of the texture generation device.
 Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings, the same parts are denoted by the same reference numerals and duplicate description is omitted.
 [Outline of the Invention]
 The present invention proposes making the user perceive that he or she is eating a non-edible object by synthesizing the chewing sound of an edible food onto the sound produced by the inedible object.
 In doing so, the present invention presents the chewing sound of the edible food first, making the user perceive that even a non-edible object can be eaten.
 Further, to give a natural connection between the two combined sounds, the present invention synthesizes the chewing sound of an edible food whose acoustic characteristics are similar to the sound produced by the non-edible object. In addition, based on human perception of chewing sounds when food is broken and on human temporal resolution, the chewing sound of the edible food and the sound of the non-edible object are combined with a time difference of 10 to 30 ms (milliseconds), which is perceivable yet still felt as a single event. This is because the chewing sound of a hard food usually contains two components, a fracture sound and a grinding sound made by the teeth, and the interval between these two sounds is typically about 10 to 30 ms. Human temporal resolution is about 3 ms, and the resolution needed to recognize the order of two sounds is said to be more than ten-odd ms (Kashino, "From the brain to presence communication: mechanisms of timing perception," Information Science and Technology Forum, 2002). Based on this background, a natural and effective connection can be created by synthesizing the non-edible object's sound at the timing of the grinding sound of the edible food.
 [Configuration of the Texture Generation Device]
 FIG. 1 is a configuration diagram showing the functional blocks of the texture generation device 1 according to the present embodiment. The texture generation device 1 includes, for example, an input unit 11, an extraction unit 12, a calculation unit 13, a synthesis unit 14, a presentation unit 15, and a storage unit 16. The texture generation device 1 is a computer such as a server, has a built-in speaker, and is connected to a vibration motor that can be worn on any desired part of the user's body.
 The storage unit 16 is a chewing sound feature database that stores, in association with each other, the chewing sounds of a plurality of different types of edible foods and the feature amounts Fi (i being a natural number) of those chewing sounds. A chewing sound of an edible food is the sound of chewing something that can be eaten, for example the chewing sound of potato chips or of namul.
 The input unit 11 has a function of inputting a non-edible sound source (the sound produced by a non-edible object) received by the texture generation device 1. A sound produced by a non-edible object is a sound made by something that cannot be eaten, for example the jingling of several coins rubbing against each other or the sound of stones being crushed.
 The extraction unit 12 has a function of analyzing the input sound of the non-edible object and, based on the analysis result, extracting a feature amount S of that sound. The feature analysis can be realized using, for example, formulas for the zero-crossing rate, power spectrum analysis, cepstrum analysis, or the like. The feature amounts Fi of the chewing sounds stored in the storage unit 16 are computed using the same kinds of formulas.
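 As a rough illustration of this step (not the patent's specified implementation), the sketch below computes a simple feature vector from a mono waveform using the zero-crossing rate and an averaged real cepstrum; the function name, frame length, and feature layout are assumptions made for the example.

```python
import numpy as np

def extract_features(x, frame_len=1024, n_ceps=20):
    """Illustrative feature vector S: zero-crossing rate followed by
    averaged low-order cepstral coefficients of a mono waveform x."""
    x = np.asarray(x, dtype=float)
    # Zero-crossing rate over the whole signal.
    zcr = np.mean(np.abs(np.diff(np.sign(x))) > 0)
    # Frame the signal and average the real cepstrum of each frame.
    n_frames = max(1, len(x) // frame_len)
    ceps_acc = np.zeros(n_ceps)
    for i in range(n_frames):
        frame = x[i * frame_len:(i + 1) * frame_len]
        if len(frame) < frame_len:
            frame = np.pad(frame, (0, frame_len - len(frame)))
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
        cepstrum = np.fft.irfft(np.log(spectrum + 1e-12))
        ceps_acc += cepstrum[:n_ceps]
    return np.concatenate(([zcr], ceps_acc / n_frames))
```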
 The calculation unit 13 has a function of reading the feature amounts Fi of the chewing sounds of the plurality of edible foods from the storage unit 16, calculating the similarity between the extracted feature amount S of the non-edible object's sound and each read feature amount Fi, for example using cosine similarity, and selecting the chewing sound with the highest similarity based on the calculated results.
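 A minimal sketch of this selection logic, assuming the chewing-sound database is simply an in-memory dict mapping a food name to its precomputed feature vector (the names and data structure are illustrative assumptions):

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_most_similar(s, chewing_db):
    """Return the key of the chewing sound whose feature vector Fi is
    most similar to the non-edible sound's feature vector S."""
    return max(chewing_db, key=lambda name: cosine_similarity(s, chewing_db[name]))
```

 For example, with chewing_db = {"potato_chips": f1, "namul": f2} and s extracted from a coin-jingling recording, select_most_similar(s, chewing_db) returns the key with the highest cosine similarity.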
 The synthesis unit 14 has a function of synthesizing the selected chewing sound onto the input non-edible object's sound so that the selected chewing sound is reproduced first with an arbitrary time difference. The arbitrary time difference is, for example, 10 to 30 ms. For example, the synthesis unit 14 combines the sounds so that the non-edible object's sound is reproduced 20 ms after the start of reproduction of the edible food's chewing sound, or 20 ms after the reproduction time of its fracture sound.
 The presentation unit 15 has a function of reproducing the synthesized sound source through one or more of the speaker and the vibration motor. The vibration motor is worn, for example, around the user's shoulder or jaw. When the presentation unit 15 reproduces the synthesized sound source through the speaker and/or the vibration motor, the sound corresponding to the synthesized source is delivered to the user from the speaker and the corresponding vibration is transmitted to the user by the vibration motor.
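 How this presentation step is realized depends entirely on the audio and haptics hardware, which the patent does not specify. As a neutral sketch, the snippet below merely writes a synthesized mono waveform to a 16-bit WAV file using Python's standard wave module; routing the same waveform (or its amplitude envelope) to the speaker or the vibration-motor driver would use whatever interface the device actually exposes, and that routing is an assumption left out of the sketch.

```python
import wave
import numpy as np

def write_wav(path, x, sample_rate):
    """Write a mono float waveform in [-1, 1] as a 16-bit PCM WAV file."""
    pcm = (np.clip(x, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```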
 [Operation of the Texture Generation Device]
 FIG. 2 is a flow chart showing the operation of the texture generation device 1.
 Step S1: First, the input unit 11 loads into the texture generation device 1 the non-edible object's sound that is to be combined with a chewing sound, based either on the user's designation or on acquisition and selection from a database on the Internet by the texture generation device 1. For example, the input unit 11 inputs the jingling sound of coins (hereinafter, the jingling sound).
 Step S2: Next, the extraction unit 12 applies cepstrum analysis to the waveform data (waveform signal) of the input non-edible sound and computes its feature amount S = (s1, s2, ..., sn). Cepstrum analysis is a method of analyzing the spectral envelope of a sound and is a known technique used in the field of speech recognition. Besides cepstrum analysis, the extraction unit 12 can use any existing technique capable of computing sound features.
 Step S3: Next, the calculation unit 13 uses cosine similarity to compute the similarity between the computed feature amount S = (s1, s2, ..., sn) of the non-edible sound and each feature amount Fi = (f1, f2, ..., fn) of the chewing sounds stored in the storage unit 16. Besides cosine similarity, the calculation unit 13 can use any existing technique capable of computing the similarity of sounds.
 Step S4: Next, the calculation unit 13 selects the chewing sound of the edible food with the highest similarity to the input non-edible sound based on the calculated similarities. For example, the calculation unit 13 selects the chewing sound of potato chips.
 Step S5: Next, the synthesis unit 14 synthesizes the selected chewing sound onto the input non-edible sound so that the selected chewing sound is reproduced first with an arbitrary time difference. For example, the synthesis unit 14 combines the sounds so that playback of the coin jingling sound starts 20 ms after the start of playback of the potato chip chewing sound.
 FIG. 3 is a diagram showing an image of combining the coin jingling sound and the potato chip chewing sound. SIGa is the waveform data of the coin jingling sound, and SIGb is the waveform data of the potato chip chewing sound. The synthesis unit 14 combines them so that the potato chip chewing sound SIGb comes first and the coin jingling sound SIGa comes second. SIGb shows two peaks, at t1 and t2: t1 is the time at which the teeth begin to fracture the potato chip, and t2 is the time at which they begin to grind it. As described above, the interval between t1 and t2 is about 10 to 30 ms. The synthesis unit 14 therefore sets the playback start time ts of the coin jingling sound SIGa to, for example, 20 ms after t1, or to, for example, 20 ms after the playback start time t0 of the potato chip chewing sound (provided ts falls after t1).
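 The timing described above can be sketched as a simple offset-and-mix over two mono waveforms sampled at the same rate. Detecting t1 with an amplitude threshold, the parameter names, and the final normalization are illustrative assumptions; the patent only states that SIGa starts about 20 ms after the fracture time t1 (or after t0, provided that point falls after t1).

```python
import numpy as np

def synthesize(chew, coin, sample_rate, offset_ms=20.0, t1_sample=None):
    """Mix so the chewing sound (SIGb) plays first and the coin sound (SIGa)
    starts offset_ms after the fracture time t1."""
    chew = np.asarray(chew, dtype=float)
    coin = np.asarray(coin, dtype=float)
    if t1_sample is None:
        # Crude stand-in for t1: first sample exceeding half the peak amplitude.
        t1_sample = int(np.argmax(np.abs(chew) >= 0.5 * np.max(np.abs(chew))))
    ts = t1_sample + int(sample_rate * offset_ms / 1000.0)  # start of SIGa
    out = np.zeros(max(len(chew), ts + len(coin)))
    out[:len(chew)] += chew
    out[ts:ts + len(coin)] += coin
    return out / max(1.0, np.max(np.abs(out)))  # normalize to avoid clipping
```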
 Step S6: Finally, the presentation unit 15 plays the synthesized sound source through the speaker built into the texture generation device 1 and, at the same time, through the vibration motor that is connected to the device and worn around the user's jaw. The potato chip chewing sound is reproduced first, and 20 ms later the combined sound of the potato chip chewing sound and the coin jingling sound is reproduced.
 When the texture generation device 1 performs the above operation, the familiar chewing sound is heard first and the non-edible object's sound is then presented with a natural connection. Likewise, the vibration corresponding to the familiar chewing sound is transmitted first, and the vibration corresponding to the non-edible object's sound is presented with a natural connection. As a result, the sense of unity between the edible food and the non-edible object increases, a texture that makes the user feel as if they have eaten the non-edible object can be presented, and the exploration of entirely new textures can be supported. In VR content there are also characters that eat things that cannot be eaten, and the technique can be applied to the effect of letting the user become such a character.
 [Effects]
 According to the present embodiment, the texture generation device 1 synthesizes, onto the sound of a non-edible object, the chewing sound of an edible food similar to that sound, and reproduces the result so that the chewing sound plays first with an arbitrary time difference. The combination of the non-edible object's sound and the chewing sound therefore has a natural connection (a sense of unity), and a texture that makes the user feel as if they have eaten the non-edible object can be presented. As a result, the exploration of entirely new textures can be supported.
 [Others]
 The present invention is not limited to the above embodiment. The present invention can be modified in various ways within the scope of its gist.
 The texture generation device 1 of the present embodiment can be realized, for example, as shown in FIG. 4, using a general-purpose computer system including a CPU (Central Processing Unit, processor) 901, a memory 902, a storage 903 (HDD: Hard Disk Drive or SSD: Solid State Drive), a communication device 904, an input device 905, and an output device 906. The memory 902 and the storage 903 are storage devices. In this computer system, each function of the texture generation device 1 is realized by the CPU 901 executing a predetermined program loaded into the memory 902.
 The texture generation device 1 may be implemented on a single computer or on a plurality of computers, and may be a virtual machine running on a computer. The program for the texture generation device 1 can be stored on a computer-readable recording medium such as an HDD, SSD, USB (Universal Serial Bus) memory, CD (Compact Disc), or DVD (Digital Versatile Disc), and can also be distributed via a communication network.
 1: Texture generation device, 11: Input unit, 12: Extraction unit, 13: Calculation unit, 14: Synthesis unit, 15: Presentation unit, 16: Storage unit

Claims (5)

  1.  A texture generation method performed by a computer, wherein
      the computer performs a step of synthesizing, onto a sound produced by a non-edible object, a chewing sound of an edible food similar to that sound, and reproducing the result so that the chewing sound of the edible food is reproduced first with an arbitrary time difference.
  2.  The texture generation method according to claim 1, wherein the step includes:
      a step of extracting a feature amount of the non-edible object's sound from that sound;
      a step of reading, from a storage unit that stores the chewing sounds of a plurality of different types of edible foods in association with feature amounts of those chewing sounds, the feature amounts of the plurality of chewing sounds, calculating the similarity between the feature amount of the non-edible object's sound and each of the feature amounts of the chewing sounds, and selecting the chewing sound with the highest similarity based on the calculated results;
      a step of synthesizing the selected chewing sound onto the non-edible object's sound so that the selected chewing sound is reproduced first with the arbitrary time difference; and
      a step of reproducing the synthesized sound source through one or more of a speaker and a vibration motor.
  3.  The texture generation method according to claim 1 or 2, wherein the arbitrary time difference is 10 to 30 milliseconds.
  4.  A texture generation device comprising:
      a storage unit that stores the chewing sounds of a plurality of different types of edible foods in association with feature amounts of those chewing sounds;
      an extraction unit that extracts a feature amount of a sound produced by a non-edible object from that sound;
      a calculation unit that reads the feature amounts of the plurality of chewing sounds from the storage unit, calculates the similarity between the feature amount of the non-edible object's sound and each of the feature amounts of the chewing sounds, and selects the chewing sound with the highest similarity based on the calculated results;
      a synthesis unit that synthesizes the selected chewing sound onto the non-edible object's sound so that the selected chewing sound is reproduced first with an arbitrary time difference; and
      a presentation unit that reproduces the synthesized sound source through one or more of a speaker and a vibration motor.
  5.  A texture generation program that causes a computer to function as the texture generation device according to claim 4.
PCT/JP2020/016683 2020-04-16 2020-04-16 Food texture generation method, food texture generation device, and food texture generation program WO2021210121A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/016683 WO2021210121A1 (en) 2020-04-16 2020-04-16 Food texture generation method, food texture generation device, and food texture generation program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/016683 WO2021210121A1 (en) 2020-04-16 2020-04-16 Food texture generation method, food texture generation device, and food texture generation program

Publications (1)

Publication Number Publication Date
WO2021210121A1

Family

ID=78084509

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/016683 WO2021210121A1 (en) 2020-04-16 2020-04-16 Food texture generation method, food texture generation device, and food texture generation program

Country Status (1)

Country Link
WO (1) WO2021210121A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015177447A (en) * 2014-03-17 2015-10-05 株式会社Jvcケンウッド noise reduction device, noise reduction method and noise reduction program
JP2016093476A (en) * 2014-11-10 2016-05-26 国立研究開発法人産業技術総合研究所 Manducation feeling feedback device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015177447A (en) * 2014-03-17 2015-10-05 株式会社Jvcケンウッド noise reduction device, noise reduction method and noise reduction program
JP2016093476A (en) * 2014-11-10 2016-05-26 国立研究開発法人産業技術総合研究所 Manducation feeling feedback device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MASUDA, MAMI ET AL.: "Control Method of Food Texture by Adding Mastication Sound", COMMUNICATION OF JIMA., vol. 26, no. 1, 2 May 2016 (2016-05-02), pages 37 - 44 *

Similar Documents

Publication Publication Date Title
US10790919B1 (en) Personalized real-time audio generation based on user physiological response
US9332100B2 (en) Portable communications device
CN104427390B (en) Supplemental information based on multimedia content provides the method and system of haptic effect
JP3521900B2 (en) Virtual speaker amplifier
US20100118033A1 (en) Synchronizing animation to a repetitive beat source
US8363843B2 (en) Methods, modules, and computer-readable recording media for providing a multi-channel convolution reverb
KR20060112601A (en) Key generating method and key generating apparatus
JP2002051399A (en) Method and device for processing sound signal
US7203558B2 (en) Method for computing sense data and device for computing sense data
Fontana et al. Rendering and subjective evaluation of real vs. synthetic vibrotactile cues on a digital piano keyboard
WO2021210121A1 (en) Food texture generation method, food texture generation device, and food texture generation program
CN109410972B (en) Method, device and storage medium for generating sound effect parameters
JP2006217935A (en) Morbid fear treatment apparatus
JP3716725B2 (en) Audio processing apparatus, audio processing method, and information recording medium
WO2019244625A1 (en) Information processing device, information processing method, and program
Carvalho et al. Sound-enhanced gustatory experiences and technology
JP2008242376A (en) Musical piece introduction sentence generating device, narration adding device, and program
WO2021214998A1 (en) Texture presentation device, texture presentation method, and texture presentation program
CN114495871A (en) Method, system and device for generating music melody based on multi-line characteristic information
WO2021214823A1 (en) Food texture change stimulation method, food texture change stimulation device, and food texture change stimulation program
Gibson Using digitized auditory stimuli on the Macintosh computer
WO2022003831A1 (en) Perception conversion method, perception conversion device, and perception conversion program
EP1643448A2 (en) Method for predicting the appearance of at least one portion of the body of an individual
JPWO2020059758A1 (en) Content playback device, tactile vibration generation method, computer program, tactile vibration data distribution system, and tactile vibration providing device
JP5703793B2 (en) Information display device, information display method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20931256

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20931256

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP