WO2021210121A1 - Food texture generation method, food texture generation device, and food texture generation program - Google Patents

Food texture generation method, food texture generation device, and food texture generation program

Info

Publication number
WO2021210121A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
chewing
edible
food
generated
Prior art date
Application number
PCT/JP2020/016683
Other languages
French (fr)
Japanese (ja)
Inventor
十季 武田
有信 新島
隆文 向内
佐藤 隆
Original Assignee
Nippon Telegraph and Telephone Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation
Priority to PCT/JP2020/016683 priority Critical patent/WO2021210121A1/en
Publication of WO2021210121A1 publication Critical patent/WO2021210121A1/en

Links

Images

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility

Definitions

  • The present invention relates to a texture generation method, a texture generation device, and a texture generation program.
  • In the technical field of VR (Virtual Reality), research is being conducted on texture generation and presentation technology that generates texture sounds and presents them to a user (see Non-Patent Document 1).
  • However, because conventional texture generation and presentation technology only presents the texture of foods that can actually be eaten, it offers a simulated VR experience yet can present nothing beyond the familiar textures already available.
  • The present invention has been made in view of the above circumstances, and its object is to provide a technique capable of supporting the exploration of entirely new textures.
  • One aspect of the present invention is a texture generation method performed by a computer, in which the computer synthesizes, onto the sound produced by a non-edible object, the chewing sound of an edible food similar to that sound, and reproduces the result so that the chewing sound of the edible food is reproduced first, ahead by an arbitrary time difference.
  • The texture generation device according to one aspect of the present invention includes: a storage unit that stores the chewing sounds of a plurality of different types of edible foods in association with feature amounts of those chewing sounds; an extraction unit that extracts a feature amount of the sound produced by a non-edible object from that sound; a calculation unit that reads the feature amounts of the stored chewing sounds from the storage unit, calculates the similarity between the feature amount of the non-edible object's sound and each of those feature amounts, and selects the chewing sound with the highest similarity based on the calculated results; a synthesis unit that synthesizes the selected chewing sound onto the non-edible object's sound so that the selected chewing sound is reproduced first with an arbitrary time difference; and a presentation unit that reproduces the synthesized sound source through one or more of a speaker and a vibration motor.
  • The texture generation program of one aspect of the present invention is a program that causes a computer to function as the above texture generation device.
  • FIG. 1 is a configuration diagram showing a functional block of a texture generating device.
  • FIG. 2 is a flow chart showing the operation of the texture generating device.
  • FIG. 3 is a diagram showing a composite image of the generated sound of the non-edible product and the chewing sound of the edible product.
  • FIG. 4 is a configuration diagram showing a hardware configuration of the texture generating device.
  • The present invention proposes making the user perceive that he or she is eating a non-edible object by synthesizing the chewing sound of an edible food onto the sound produced by the inedible object.
  • In doing so, the present invention presents the chewing sound of the edible food first, making the user perceive that even a non-edible object can be eaten.
  • To give a natural connection between the two combined sounds, the chewing sound that is synthesized is that of an edible food whose acoustic characteristics are similar to the sound produced by the non-edible object.
  • In addition, based on human perception of chewing sounds when food is broken and on human temporal resolution, the chewing sound of the edible food and the sound of the non-edible object are combined with a time difference of 10 to 30 ms (milliseconds), which is perceivable yet still felt as a single event.
  • This is because the chewing sound of a hard food usually contains two components, a fracture sound and a grinding sound made by the teeth, and the interval between these two sounds is typically about 10 to 30 ms.
  • Human temporal resolution is about 3 ms, and the resolution needed to recognize the order of two sounds is said to be more than ten-odd ms (Kashino, "From the brain to presence communication: mechanisms of timing perception," Information Science and Technology Forum, 2002). Based on this background, a natural and effective connection can be created by synthesizing the non-edible object's sound at the timing of the grinding sound of the edible food.
  • FIG. 1 is a configuration diagram showing a functional block of the texture generating device 1 according to the present embodiment.
  • the texture generation device 1 includes, for example, an input unit 11, an extraction unit 12, a calculation unit 13, a synthesis unit 14, a presentation unit 15, and a storage unit 16.
  • The texture generating device 1 is a computer such as a server, has a built-in speaker, and is connected to a vibration motor that can be worn on any desired part of the user's body.
  • The storage unit 16 is a chewing sound feature database that stores, in association with each other, the chewing sounds of a plurality of different types of edible foods and the feature amounts Fi (i being a natural number) of those chewing sounds.
  • A chewing sound of an edible food is the sound of chewing something that can be eaten, for example the chewing sound of potato chips or of namul.
  • The input unit 11 has a function of inputting a non-edible sound source (the sound produced by a non-edible object) received by the texture generating device 1.
  • A sound produced by a non-edible object is a sound made by something that cannot be eaten, for example the jingling of several coins rubbing against each other or the sound of stones being crushed.
  • The extraction unit 12 has a function of analyzing the input sound of the non-edible object and, based on the analysis result, extracting a feature amount S of that sound.
  • The feature analysis can be realized using, for example, formulas for the zero-crossing rate, power spectrum analysis, cepstrum analysis, or the like.
  • The feature amounts Fi of the chewing sounds stored in the storage unit 16 are computed using the same kinds of formulas.
  • The calculation unit 13 has a function of reading the feature amounts Fi of the chewing sounds of the plurality of edible foods from the storage unit 16, calculating the similarity between the extracted feature amount S of the non-edible object's sound and each read feature amount Fi (using, for example, cosine similarity), and selecting the chewing sound with the highest similarity based on the calculated results.
  • The synthesis unit 14 has a function of synthesizing the chewing sound of the selected food onto the input sound of the non-edible object so that the chewing sound of the selected food is reproduced first with an arbitrary time difference.
  • The arbitrary time difference is, for example, 10 to 30 ms.
  • For example, the synthesis unit 14 combines the sounds so that the non-edible object's sound is reproduced 20 ms after the start of reproduction of the edible food's chewing sound, or 20 ms after the reproduction time of its fracture sound.
  • The presentation unit 15 has a function of reproducing the synthesized sound source through one or more of the speaker and the vibration motor.
  • The vibration motor is worn, for example, around the user's shoulder or jaw.
  • When the presentation unit 15 reproduces the synthesized sound source through the speaker and/or the vibration motor, the sound corresponding to the synthesized source is delivered to the user from the speaker and the corresponding vibration is transmitted to the user by the vibration motor.
  • FIG. 2 is a flow chart showing the operation of the texture generating device 1.
  • Step S1: First, the input unit 11 loads into the texture generating device 1 the non-edible object's sound that is to be combined with a chewing sound, based either on the user's designation or on acquisition and selection from a database on the Internet by the texture generating device 1. For example, the input unit 11 inputs the jingling sound of coins (hereinafter, the jingling sound).
  • Step S2: Next, the extraction unit 12 applies cepstrum analysis to the waveform data of the input non-edible sound and computes its feature amount S. Cepstrum analysis is a method of analyzing the spectral envelope of a sound and is a known technique used in the field of speech recognition.
  • Besides cepstrum analysis, the extraction unit 12 can use any existing technique capable of computing sound features.
  • Step S3: Next, the calculation unit 13 computes the similarity between S and each stored feature amount Fi, for example using cosine similarity; besides cosine similarity, the calculation unit 13 can use any existing technique capable of computing the similarity of sounds.
  • Step S4: Next, the calculation unit 13 selects the chewing sound of the edible food with the highest similarity to the input non-edible sound based on the calculated similarities. For example, the calculation unit 13 selects the chewing sound of potato chips.
  • FIG. 3 is a diagram showing an image of combining the coin jingling sound and the potato chip chewing sound.
  • SIGa is the waveform data of the coin jingling sound.
  • SIGb is the waveform data of the potato chip chewing sound.
  • The synthesis unit 14 combines them so that the potato chip chewing sound SIGb comes first and the coin jingling sound SIGa comes second. SIGb shows two peaks, at t1 and t2.
  • t1 is the time at which the teeth begin to fracture the potato chip.
  • t2 is the time at which the teeth begin to grind the potato chip.
  • The interval between t1 and t2 is about 10 to 30 ms, as described above.
  • The synthesis unit 14 therefore sets the playback start time ts of the coin jingling sound SIGa to, for example, 20 ms after t1, or to, for example, 20 ms after the playback start time t0 of the potato chip chewing sound (provided ts falls after t1).
  • Step S6: Finally, the presentation unit 15 plays the synthesized sound source through the speaker built into the texture generating device 1 and through the vibration motor that is connected to the device and worn around the user's jaw. The potato chip chewing sound is reproduced first, and 20 ms later the combined sound of the potato chip chewing sound and the coin jingling sound is reproduced.
  • When the texture generating device 1 performs the above operation, the familiar chewing sound is heard first and the non-edible object's sound is then presented with a natural connection.
  • Likewise, the vibration corresponding to the familiar chewing sound is transmitted first, and the vibration corresponding to the non-edible object's sound is presented with a natural connection.
  • As a result, the sense of unity between the edible food and the non-edible object increases, a texture that makes the user feel as if they have eaten the non-edible object can be presented, and the exploration of entirely new textures can be supported.
  • In VR content there are characters that eat things that cannot be eaten, and the technique can also be applied to the effect of letting the user become such a character.
  • According to the present embodiment, the texture generating device 1 synthesizes, onto the sound of a non-edible object, the chewing sound of an edible food similar to that sound, and reproduces the result so that the chewing sound plays first with an arbitrary time difference. The combination of the two sounds therefore has a natural connection (a sense of unity), a texture that makes the user feel as if they have eaten the non-edible object can be presented, and, as a result, the exploration of entirely new textures can be supported.
  • The present invention is not limited to the above embodiment.
  • The present invention can be modified in various ways within the scope of its gist.
  • The texture generating device 1 of the present embodiment can be realized, for example, as shown in FIG. 4, using a general-purpose computer system including a CPU (Central Processing Unit, processor) 901, a memory 902, a storage 903 (HDD: Hard Disk Drive or SSD: Solid State Drive), a communication device 904, an input device 905, and an output device 906.
  • The memory 902 and the storage 903 are storage devices.
  • Each function of the texture generating device 1 is realized by the CPU 901 executing a predetermined program loaded into the memory 902.
  • The texture generating device 1 may be implemented on one computer.
  • The texture generating device 1 may be implemented by a plurality of computers.
  • The texture generating device 1 may be a virtual machine running on a computer.
  • The program for the texture generating device 1 can be stored on a computer-readable recording medium such as an HDD, SSD, USB (Universal Serial Bus) memory, CD (Compact Disc), or DVD (Digital Versatile Disc).
  • The program for the texture generating device 1 can also be distributed via a communication network.
  • 1: Texture generation device, 11: Input unit, 12: Extraction unit, 13: Calculation unit, 14: Synthesis unit, 15: Presentation unit, 16: Storage unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Toys (AREA)

Abstract

This food texture generation device 1 comprises: a storage unit 16 which stores the sounds of chewing a plurality of different types of food products and the characteristic amounts of the sounds of chewing the plurality of food products in association with each other; an extraction unit 12 which extracts the characteristic amount of the generated sound of a non-food product, from the generated sound of the non-food product; a calculation unit 13 which reads out the characteristic amounts of the sounds of chewing the plurality of food products from the storage unit 16, calculates the similarities between the characteristic amount of the generated sound of the non-food product and the characteristic amounts of the sounds of chewing the plurality of food products, respectively, and selects the sound of chewing the food product with the highest similarity, on the basis of the calculation result of the calculated similarities; a synthesis unit 14 which synthesizes the generated sound of the non-food product with the sound of chewing the selected food product such that the sound of chewing the selected food product is reproduced earlier with any time difference; and a presentation unit 15 which reproduces the synthesized sound source through one or more among a speaker and a vibration motor.

Description

Texture generation method, texture generation device, and texture generation program
 The present invention relates to a texture generation method, a texture generation device, and a texture generation program.
 In the technical field of VR (Virtual Reality), research is being conducted on texture generation and presentation technology that generates texture sounds and presents them to a user (see Non-Patent Document 1).
 However, because conventional texture generation and presentation technology only presents the texture of foods that can actually be eaten, it offers a simulated VR experience yet can present nothing beyond the familiar textures already available.
 The present invention has been made in view of the above circumstances, and its object is to provide a technique capable of supporting the exploration of entirely new textures.
 A texture generation method of one aspect of the present invention is a texture generation method performed by a computer, in which the computer performs a step of synthesizing, onto the sound produced by a non-edible object, the chewing sound of an edible food similar to that sound, and reproducing the result so that the chewing sound of the edible food is reproduced first with an arbitrary time difference.
 A texture generation device of one aspect of the present invention includes: a storage unit that stores the chewing sounds of a plurality of different types of edible foods in association with feature amounts of those chewing sounds; an extraction unit that extracts a feature amount of the sound produced by a non-edible object from that sound; a calculation unit that reads the feature amounts of the stored chewing sounds from the storage unit, calculates the similarity between the feature amount of the non-edible object's sound and each of those feature amounts, and selects the chewing sound with the highest similarity based on the calculated results; a synthesis unit that synthesizes the selected chewing sound onto the non-edible object's sound so that the selected chewing sound is reproduced first with an arbitrary time difference; and a presentation unit that reproduces the synthesized sound source through one or more of a speaker and a vibration motor.
 A texture generation program of one aspect of the present invention is a program that causes a computer to function as the above texture generation device.
 According to the present invention, it is possible to provide a technique capable of supporting the exploration of entirely new textures.
FIG. 1 is a configuration diagram showing the functional blocks of the texture generation device. FIG. 2 is a flow chart showing the operation of the texture generation device. FIG. 3 is a diagram showing an image of combining the sound of a non-edible object and the chewing sound of an edible food. FIG. 4 is a configuration diagram showing the hardware configuration of the texture generation device.
 Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings, the same parts are denoted by the same reference numerals and duplicate description is omitted.
 [Outline of the Invention]
 The present invention proposes making the user perceive that he or she is eating a non-edible object by synthesizing the chewing sound of an edible food onto the sound produced by the inedible object.
 In doing so, the present invention presents the chewing sound of the edible food first, making the user perceive that even a non-edible object can be eaten.
 Further, to give a natural connection between the two combined sounds, the present invention synthesizes the chewing sound of an edible food whose acoustic characteristics are similar to the sound produced by the non-edible object. In addition, based on human perception of chewing sounds when food is broken and on human temporal resolution, the chewing sound of the edible food and the sound of the non-edible object are combined with a time difference of 10 to 30 ms (milliseconds), which is perceivable yet still felt as a single event. This is because the chewing sound of a hard food usually contains two components, a fracture sound and a grinding sound made by the teeth, and the interval between these two sounds is typically about 10 to 30 ms. Human temporal resolution is about 3 ms, and the resolution needed to recognize the order of two sounds is said to be more than ten-odd ms (Kashino, "From the brain to presence communication: mechanisms of timing perception," Information Science and Technology Forum, 2002). Based on this background, a natural and effective connection can be created by synthesizing the non-edible object's sound at the timing of the grinding sound of the edible food.
 [Configuration of the Texture Generation Device]
 FIG. 1 is a configuration diagram showing the functional blocks of the texture generation device 1 according to the present embodiment. The texture generation device 1 includes, for example, an input unit 11, an extraction unit 12, a calculation unit 13, a synthesis unit 14, a presentation unit 15, and a storage unit 16. The texture generation device 1 is a computer such as a server, has a built-in speaker, and is connected to a vibration motor that can be worn on any desired part of the user's body.
 The storage unit 16 is a chewing sound feature database that stores, in association with each other, the chewing sounds of a plurality of different types of edible foods and the feature amounts Fi (i being a natural number) of those chewing sounds. A chewing sound of an edible food is the sound of chewing something that can be eaten, for example the chewing sound of potato chips or of namul.
 The input unit 11 has a function of inputting a non-edible sound source (the sound produced by a non-edible object) received by the texture generation device 1. A sound produced by a non-edible object is a sound made by something that cannot be eaten, for example the jingling of several coins rubbing against each other or the sound of stones being crushed.
 The extraction unit 12 has a function of analyzing the input sound of the non-edible object and, based on the analysis result, extracting a feature amount S of that sound. The feature analysis can be realized using, for example, formulas for the zero-crossing rate, power spectrum analysis, cepstrum analysis, or the like. The feature amounts Fi of the chewing sounds stored in the storage unit 16 are computed using the same kinds of formulas.
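 As a rough illustration of this step (not the patent's specified implementation), the sketch below computes a simple feature vector from a mono waveform using the zero-crossing rate and an averaged real cepstrum; the function name, frame length, and feature layout are assumptions made for the example.

```python
import numpy as np

def extract_features(x, frame_len=1024, n_ceps=20):
    """Illustrative feature vector S: zero-crossing rate followed by
    averaged low-order cepstral coefficients of a mono waveform x."""
    x = np.asarray(x, dtype=float)
    # Zero-crossing rate over the whole signal.
    zcr = np.mean(np.abs(np.diff(np.sign(x))) > 0)
    # Frame the signal and average the real cepstrum of each frame.
    n_frames = max(1, len(x) // frame_len)
    ceps_acc = np.zeros(n_ceps)
    for i in range(n_frames):
        frame = x[i * frame_len:(i + 1) * frame_len]
        if len(frame) < frame_len:
            frame = np.pad(frame, (0, frame_len - len(frame)))
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
        cepstrum = np.fft.irfft(np.log(spectrum + 1e-12))
        ceps_acc += cepstrum[:n_ceps]
    return np.concatenate(([zcr], ceps_acc / n_frames))
```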
 The calculation unit 13 has a function of reading the feature amounts Fi of the chewing sounds of the plurality of edible foods from the storage unit 16, calculating the similarity between the extracted feature amount S of the non-edible object's sound and each read feature amount Fi, for example using cosine similarity, and selecting the chewing sound with the highest similarity based on the calculated results.
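 A minimal sketch of this selection logic, assuming the chewing-sound database is simply an in-memory dict mapping a food name to its precomputed feature vector (the names and data structure are illustrative assumptions):

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_most_similar(s, chewing_db):
    """Return the key of the chewing sound whose feature vector Fi is
    most similar to the non-edible sound's feature vector S."""
    return max(chewing_db, key=lambda name: cosine_similarity(s, chewing_db[name]))
```

 For example, with chewing_db = {"potato_chips": f1, "namul": f2} and s extracted from a coin-jingling recording, select_most_similar(s, chewing_db) returns the key with the highest cosine similarity.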
 The synthesis unit 14 has a function of synthesizing the selected chewing sound onto the input non-edible object's sound so that the selected chewing sound is reproduced first with an arbitrary time difference. The arbitrary time difference is, for example, 10 to 30 ms. For example, the synthesis unit 14 combines the sounds so that the non-edible object's sound is reproduced 20 ms after the start of reproduction of the edible food's chewing sound, or 20 ms after the reproduction time of its fracture sound.
 The presentation unit 15 has a function of reproducing the synthesized sound source through one or more of the speaker and the vibration motor. The vibration motor is worn, for example, around the user's shoulder or jaw. When the presentation unit 15 reproduces the synthesized sound source through the speaker and/or the vibration motor, the sound corresponding to the synthesized source is delivered to the user from the speaker and the corresponding vibration is transmitted to the user by the vibration motor.
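 How this presentation step is realized depends entirely on the audio and haptics hardware, which the patent does not specify. As a neutral sketch, the snippet below merely writes a synthesized mono waveform to a 16-bit WAV file using Python's standard wave module; routing the same waveform (or its amplitude envelope) to the speaker or the vibration-motor driver would use whatever interface the device actually exposes, and that routing is an assumption left out of the sketch.

```python
import wave
import numpy as np

def write_wav(path, x, sample_rate):
    """Write a mono float waveform in [-1, 1] as a 16-bit PCM WAV file."""
    pcm = (np.clip(x, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```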
 [Operation of the Texture Generation Device]
 FIG. 2 is a flow chart showing the operation of the texture generation device 1.
 Step S1: First, the input unit 11 loads into the texture generation device 1 the non-edible object's sound that is to be combined with a chewing sound, based either on the user's designation or on acquisition and selection from a database on the Internet by the texture generation device 1. For example, the input unit 11 inputs the jingling sound of coins (hereinafter, the jingling sound).
 Step S2: Next, the extraction unit 12 applies cepstrum analysis to the waveform data (waveform signal) of the input non-edible sound and computes its feature amount S = (s1, s2, ..., sn). Cepstrum analysis is a method of analyzing the spectral envelope of a sound and is a known technique used in the field of speech recognition. Besides cepstrum analysis, the extraction unit 12 can use any existing technique capable of computing sound features.
 Step S3: Next, the calculation unit 13 uses cosine similarity to compute the similarity between the computed feature amount S = (s1, s2, ..., sn) of the non-edible sound and each feature amount Fi = (f1, f2, ..., fn) of the chewing sounds stored in the storage unit 16. Besides cosine similarity, the calculation unit 13 can use any existing technique capable of computing the similarity of sounds.
 Step S4: Next, the calculation unit 13 selects the chewing sound of the edible food with the highest similarity to the input non-edible sound based on the calculated similarities. For example, the calculation unit 13 selects the chewing sound of potato chips.
 Step S5: Next, the synthesis unit 14 synthesizes the selected chewing sound onto the input non-edible sound so that the selected chewing sound is reproduced first with an arbitrary time difference. For example, the synthesis unit 14 combines the sounds so that playback of the coin jingling sound starts 20 ms after the start of playback of the potato chip chewing sound.
 FIG. 3 is a diagram showing an image of combining the coin jingling sound and the potato chip chewing sound. SIGa is the waveform data of the coin jingling sound, and SIGb is the waveform data of the potato chip chewing sound. The synthesis unit 14 combines them so that the potato chip chewing sound SIGb comes first and the coin jingling sound SIGa comes second. SIGb shows two peaks, at t1 and t2: t1 is the time at which the teeth begin to fracture the potato chip, and t2 is the time at which they begin to grind it. As described above, the interval between t1 and t2 is about 10 to 30 ms. The synthesis unit 14 therefore sets the playback start time ts of the coin jingling sound SIGa to, for example, 20 ms after t1, or to, for example, 20 ms after the playback start time t0 of the potato chip chewing sound (provided ts falls after t1).
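 The timing described above can be sketched as a simple offset-and-mix over two mono waveforms sampled at the same rate. Detecting t1 with an amplitude threshold, the parameter names, and the final normalization are illustrative assumptions; the patent only states that SIGa starts about 20 ms after the fracture time t1 (or after t0, provided that point falls after t1).

```python
import numpy as np

def synthesize(chew, coin, sample_rate, offset_ms=20.0, t1_sample=None):
    """Mix so the chewing sound (SIGb) plays first and the coin sound (SIGa)
    starts offset_ms after the fracture time t1."""
    chew = np.asarray(chew, dtype=float)
    coin = np.asarray(coin, dtype=float)
    if t1_sample is None:
        # Crude stand-in for t1: first sample exceeding half the peak amplitude.
        t1_sample = int(np.argmax(np.abs(chew) >= 0.5 * np.max(np.abs(chew))))
    ts = t1_sample + int(sample_rate * offset_ms / 1000.0)  # start of SIGa
    out = np.zeros(max(len(chew), ts + len(coin)))
    out[:len(chew)] += chew
    out[ts:ts + len(coin)] += coin
    return out / max(1.0, np.max(np.abs(out)))  # normalize to avoid clipping
```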
 Step S6: Finally, the presentation unit 15 plays the synthesized sound source through the speaker built into the texture generation device 1 and, at the same time, through the vibration motor that is connected to the device and worn around the user's jaw. The potato chip chewing sound is reproduced first, and 20 ms later the combined sound of the potato chip chewing sound and the coin jingling sound is reproduced.
 When the texture generation device 1 performs the above operation, the familiar chewing sound is heard first and the non-edible object's sound is then presented with a natural connection. Likewise, the vibration corresponding to the familiar chewing sound is transmitted first, and the vibration corresponding to the non-edible object's sound is presented with a natural connection. As a result, the sense of unity between the edible food and the non-edible object increases, a texture that makes the user feel as if they have eaten the non-edible object can be presented, and the exploration of entirely new textures can be supported. In VR content there are also characters that eat things that cannot be eaten, and the technique can be applied to the effect of letting the user become such a character.
 [Effects]
 According to the present embodiment, the texture generation device 1 synthesizes, onto the sound of a non-edible object, the chewing sound of an edible food similar to that sound, and reproduces the result so that the chewing sound plays first with an arbitrary time difference. The combination of the non-edible object's sound and the chewing sound therefore has a natural connection (a sense of unity), and a texture that makes the user feel as if they have eaten the non-edible object can be presented. As a result, the exploration of entirely new textures can be supported.
 [Others]
 The present invention is not limited to the above embodiment. The present invention can be modified in various ways within the scope of its gist.
 The texture generation device 1 of the present embodiment can be realized, for example, as shown in FIG. 4, using a general-purpose computer system including a CPU (Central Processing Unit, processor) 901, a memory 902, a storage 903 (HDD: Hard Disk Drive or SSD: Solid State Drive), a communication device 904, an input device 905, and an output device 906. The memory 902 and the storage 903 are storage devices. In this computer system, each function of the texture generation device 1 is realized by the CPU 901 executing a predetermined program loaded into the memory 902.
 The texture generation device 1 may be implemented on a single computer or on a plurality of computers, and may be a virtual machine running on a computer. The program for the texture generation device 1 can be stored on a computer-readable recording medium such as an HDD, SSD, USB (Universal Serial Bus) memory, CD (Compact Disc), or DVD (Digital Versatile Disc), and can also be distributed via a communication network.
 1: Texture generation device, 11: Input unit, 12: Extraction unit, 13: Calculation unit, 14: Synthesis unit, 15: Presentation unit, 16: Storage unit

Claims (5)

  1.  A texture generation method performed by a computer, wherein
      the computer performs a step of synthesizing, onto a sound produced by a non-edible object, a chewing sound of an edible food similar to that sound, and reproducing the result so that the chewing sound of the edible food is reproduced first with an arbitrary time difference.
  2.  The texture generation method according to claim 1, wherein the step includes:
      a step of extracting a feature amount of the non-edible object's sound from that sound;
      a step of reading, from a storage unit that stores the chewing sounds of a plurality of different types of edible foods in association with feature amounts of those chewing sounds, the feature amounts of the plurality of chewing sounds, calculating the similarity between the feature amount of the non-edible object's sound and each of the feature amounts of the chewing sounds, and selecting the chewing sound with the highest similarity based on the calculated results;
      a step of synthesizing the selected chewing sound onto the non-edible object's sound so that the selected chewing sound is reproduced first with the arbitrary time difference; and
      a step of reproducing the synthesized sound source through one or more of a speaker and a vibration motor.
  3.  The texture generation method according to claim 1 or 2, wherein the arbitrary time difference is 10 to 30 milliseconds.
  4.  A texture generation device comprising:
      a storage unit that stores the chewing sounds of a plurality of different types of edible foods in association with feature amounts of those chewing sounds;
      an extraction unit that extracts a feature amount of a sound produced by a non-edible object from that sound;
      a calculation unit that reads the feature amounts of the plurality of chewing sounds from the storage unit, calculates the similarity between the feature amount of the non-edible object's sound and each of the feature amounts of the chewing sounds, and selects the chewing sound with the highest similarity based on the calculated results;
      a synthesis unit that synthesizes the selected chewing sound onto the non-edible object's sound so that the selected chewing sound is reproduced first with an arbitrary time difference; and
      a presentation unit that reproduces the synthesized sound source through one or more of a speaker and a vibration motor.
  5.  A texture generation program that causes a computer to function as the texture generation device according to claim 4.
PCT/JP2020/016683 2020-04-16 2020-04-16 Food texture generation method, food texture generation device, and food texture generation program WO2021210121A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/016683 WO2021210121A1 (en) 2020-04-16 2020-04-16 Food texture generation method, food texture generation device, and food texture generation program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/016683 WO2021210121A1 (en) 2020-04-16 2020-04-16 Food texture generation method, food texture generation device, and food texture generation program

Publications (1)

Publication Number Publication Date
WO2021210121A1

Family

ID=78084509

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/016683 WO2021210121A1 (en) 2020-04-16 2020-04-16 Food texture generation method, food texture generation device, and food texture generation program

Country Status (1)

Country Link
WO (1) WO2021210121A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015177447A (en) * 2014-03-17 2015-10-05 株式会社Jvcケンウッド noise reduction device, noise reduction method and noise reduction program
JP2016093476A (en) * 2014-11-10 2016-05-26 国立研究開発法人産業技術総合研究所 Manducation feeling feedback device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015177447A (en) * 2014-03-17 2015-10-05 株式会社Jvcケンウッド noise reduction device, noise reduction method and noise reduction program
JP2016093476A (en) * 2014-11-10 2016-05-26 国立研究開発法人産業技術総合研究所 Manducation feeling feedback device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MASUDA, MAMI ET AL.: "Control Method of Food Texture by Adding Mastication Sound", COMMUNICATION OF JIMA., vol. 26, no. 1, 2 May 2016 (2016-05-02), pages 37 - 44 *

Similar Documents

Publication Publication Date Title
US10790919B1 (en) Personalized real-time audio generation based on user physiological response
US9332100B2 (en) Portable communications device
CN104427390B (en) Supplemental information based on multimedia content provides the method and system of haptic effect
JP3521900B2 (en) Virtual speaker amplifier
US20100118033A1 (en) Synchronizing animation to a repetitive beat source
US8363843B2 (en) Methods, modules, and computer-readable recording media for providing a multi-channel convolution reverb
KR20060112601A (en) Key generating method and key generating apparatus
JP2002051399A (en) Method and device for processing sound signal
US7203558B2 (en) Method for computing sense data and device for computing sense data
Fontana et al. Rendering and subjective evaluation of real vs. synthetic vibrotactile cues on a digital piano keyboard
WO2021210121A1 (en) Food texture generation method, food texture generation device, and food texture generation program
CN109410972B (en) Method, device and storage medium for generating sound effect parameters
JP2006217935A (en) Morbid fear treatment apparatus
JP3716725B2 (en) Audio processing apparatus, audio processing method, and information recording medium
WO2019244625A1 (en) Information processing device, information processing method, and program
Carvalho et al. Sound-enhanced gustatory experiences and technology
JP2008242376A (en) Musical piece introduction sentence generating device, narration adding device, and program
WO2021214998A1 (en) Texture presentation device, texture presentation method, and texture presentation program
CN114495871A (en) Method, system and device for generating music melody based on multi-line characteristic information
WO2021214823A1 (en) Food texture change stimulation method, food texture change stimulation device, and food texture change stimulation program
Gibson Using digitized auditory stimuli on the Macintosh computer
WO2022003831A1 (en) Perception conversion method, perception conversion device, and perception conversion program
EP1643448A2 (en) Method for predicting the appearance of at least one portion of the body of an individual
JPWO2020059758A1 (en) Content playback device, tactile vibration generation method, computer program, tactile vibration data distribution system, and tactile vibration providing device
JP5703793B2 (en) Information display device, information display method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20931256

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20931256

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP