WO2016039463A1 - Acoustic analysis device (Dispositif d'analyse acoustique) - Google Patents

Acoustic analysis device (Dispositif d'analyse acoustique)

Info

Publication number
WO2016039463A1
Authority
WO
WIPO (PCT)
Prior art keywords
impression
data
index
acoustic
feature
Prior art date
Application number
PCT/JP2015/075923
Other languages
English (en)
Japanese (ja)
Inventor
英樹 阪梨
隆一 成山
舞 小池
Original Assignee
ヤマハ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ヤマハ株式会社 filed Critical ヤマハ株式会社
Publication of WO2016039463A1

Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 15/00 - Teaching music
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K - SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K 15/00 - Acoustics not otherwise provided for
    • G10K 15/04 - Sound-producing devices
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • The present invention relates to a technique for analyzing sound.
  • Patent Document 1 discloses a technique for evaluating a song in consideration of singing expressions such as vibrato and intonation in addition to the pitch of the singing voice.
  • Patent Document 2 discloses a technique for evaluating a song according to the pitch (fundamental frequency) and volume of the singing voice.
  • However, Patent Document 1 and Patent Document 2 merely evaluate the objective skill of a singing by focusing only on the difference between a reference value indicating an exemplary singing and a feature value of the singing voice to be evaluated.
  • Subjective viewpoints, such as the impression perceived by the listener of the singing voice, are not properly evaluated.
  • An individual or distinctive singing can actually give the listener an impression of skill, but because it deviates from the exemplary singing, there is a high probability that it will not be rated highly by the techniques of Patent Document 1 and Patent Document 2.
  • The foregoing has taken the evaluation of a singing voice as an example.
  • In view of the above circumstances, an object of the present invention is to appropriately evaluate a subjective impression of sound.
  • The acoustic analysis device of the present invention includes a relational expression setting unit that, using a plurality of reference data in which an impression index indicating an auditory impression of a reference sound is associated with feature indices indicating acoustic features of the reference sound, and using relationship description data that defines a correspondence between the auditory impression and multiple types of acoustic features, sets a relational expression expressing the relationship between the impression index of the auditory impression and the feature index of each acoustic feature under the correspondence specified by the relationship description data. In the above configuration, a relational expression expressing the relationship between the impression index of the auditory impression and the feature index of each acoustic feature is set. Therefore, by using the relational expression set by the relational expression setting unit, the subjective impression of a sound can be evaluated appropriately.
  • If the relational expression were set only by statistical analysis of the reference data, a pseudo-correlation could arise in which a specific feature index that does not actually correlate with a specific auditory impression appears to correlate with it because of a latent factor.
  • As a result, a relational expression could be derived in which a feature index that does not actually correlate with the auditory impression influences the auditory impression.
  • In the present invention, by contrast, the relational expression is set using relationship description data that defines the correspondence between the auditory impression and a plurality of types of acoustic features.
  • Therefore, a relational expression that appropriately reflects the actual correlation between the impression index and the plurality of feature indices, that is, a relational expression that can appropriately evaluate the auditory impression, can be set.
  • The relationship description data defines the correspondence between the auditory impression and the plurality of types of acoustic features via a plurality of intermediate elements included in the auditory impression.
  • Since the correspondence between the auditory impression and the plurality of types of acoustic features via the plurality of intermediate elements included in the auditory impression is defined by the relationship description data, the actual relationship between the auditory impression and each acoustic feature can be appropriately reflected in the relational expression.
  • The relational expression setting means sets a relational expression for each of a plurality of types of auditory impressions.
  • Since a relational expression is set for each of the plurality of types of auditory impressions, there is an advantage that the auditory impression can be evaluated appropriately from various viewpoints.
  • For evaluating the auditory impression of singing voices, a configuration in which a relational expression is set for each of a plurality of types of auditory impressions, including maturity (adult-like / child-like), brightness (bright / dark), and clarity (clear / husky), is particularly suitable.
  • The relational expression setting means acquires reference data and updates a previously set relational expression using the reference data.
  • Since the relational expression is updated using reference data acquired after the relational expression has been set, the above-described effect that a relational expression appropriately reflecting the actual correlation between the auditory impression and each acoustic feature can be set is particularly remarkable.
  • An acoustic analysis device according to another aspect of the present invention analyzes the auditory impression of an analysis target sound using the relational expressions generated in each of the above aspects.
  • It includes a feature extraction unit that extracts the feature indices of the analysis target sound, and an impression specifying unit that calculates an impression index of the analysis target sound by applying the feature indices extracted by the feature extraction unit to the relational expression expressing the relationship between the impression index of the auditory impression and the feature indices of the multiple types of acoustic features under the correspondence defined by the relationship description data.
  • In the above configuration, the auditory impression of the analysis target sound can be evaluated appropriately by using a relational expression that, because it is set using the plurality of reference data and the relationship description data, appropriately reflects the actual correlation between the impression index and the plurality of feature indices.
  • FIG. 1 is a configuration diagram of an acoustic analysis device according to a first embodiment of the present invention. FIG. 2 is a schematic diagram of an analysis result image. FIG. 3 is a flowchart of the operation for analyzing an auditory impression.
  • FIG. 1 is a configuration diagram of an acoustic analysis device 100A according to the first embodiment of the present invention.
  • The acoustic analysis device 100A according to the first embodiment is realized by a computer system including an arithmetic processing device 10, a storage device 12, an input device 14, a sound collection device 16, and a display device 18.
  • A portable information processing device such as a mobile phone or a smartphone, or a portable or stationary information processing device such as a personal computer, can be used as the acoustic analysis device 100A.
  • The sound collection device 16 is a device (microphone) that collects ambient sounds.
  • The sound collection device 16 of the first embodiment collects a singing voice V of a user singing a piece of music.
  • The acoustic analysis device 100A can also be used as a karaoke device that mixes and reproduces the accompaniment sound of the music and the singing voice V.
  • Illustration of the A/D converter that converts the signal of the singing voice V collected by the sound collection device 16 from analog to digital is omitted for convenience.
  • The display device 18 (for example, a liquid crystal display panel) displays images as instructed by the arithmetic processing device 10.
  • The input device 14 is an operating device operated by the user to give various instructions to the acoustic analysis device 100A, and includes, for example, a plurality of operators operated by the user.
  • A touch panel configured integrally with the display device 18 can also be used as the input device 14.
  • The storage device 12 stores a program executed by the arithmetic processing device 10 and various data used by the arithmetic processing device 10.
  • A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of a plurality of types of recording media, can be arbitrarily employed as the storage device 12.
  • The acoustic analysis device 100A of the first embodiment is a signal processing device that analyzes the singing voice V collected by the sound collection device 16.
  • The arithmetic processing device 10 executes a program stored in the storage device 12 to realize a plurality of functions for analyzing the singing voice V (a feature extraction unit 22, an impression specifying unit 24, a presentation processing unit 26, and a relational expression setting unit 40).
  • A configuration in which the functions of the arithmetic processing device 10 are distributed over a plurality of devices, or a configuration in which a dedicated electronic circuit realizes part of the functions of the arithmetic processing device 10, may also be employed.
  • The feature extraction unit 22 analyzes the singing voice V collected by the sound collection device 16 to extract a plurality (N) of feature indices X1 to XN indicating different types of acoustic features (N is a natural number).
  • An acoustic feature means a feature of the singing voice V that influences the impression (hereinafter referred to as "auditory impression") sensed by the listener of the singing voice V.
  • A feature index Xn (n = 1 to N) that quantifies each of various acoustic features, such as pitch stability, vibrato depth (pitch amplitude), and frequency characteristics, is extracted from the singing voice V.
  • The numerical ranges of the N feature indices X1 to XN extracted by the feature extraction unit 22 of the first embodiment are common to one another.
  • In other words, the auditory impression means a subjective or sensory feature (impression) perceived by the listener of the singing voice V, whereas an acoustic feature means an objective or physical feature extracted by analysis of the singing voice V.
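  • A minimal Python sketch of how feature indices such as pitch stability and vibrato depth might be quantified from a fundamental-frequency contour is shown below; the concrete formulas, the frame rate, and the tanh normalization to a common range are illustrative assumptions rather than definitions taken from the embodiment.

    import numpy as np

    def example_feature_indices(f0_hz, frame_rate=100.0):
        # f0_hz: 1-D array of fundamental-frequency estimates (Hz) for voiced frames.
        # The formulas and the tanh normalization to a common range are illustrative only.
        cents = 1200.0 * np.log2(np.asarray(f0_hz) / 440.0)      # pitch contour in cents
        # "Pitch stability": inverse of the short-term pitch fluctuation.
        fluctuation = np.std(np.diff(cents))                      # cents per frame
        pitch_stability = np.tanh(1.0 / (fluctuation + 1e-6))
        # "Vibrato depth": energy of the 4-8 Hz band of the pitch contour.
        spectrum = np.abs(np.fft.rfft(cents - cents.mean()))
        freqs = np.fft.rfftfreq(len(cents), d=1.0 / frame_rate)
        band = (freqs >= 4.0) & (freqs <= 8.0)
        vibrato_depth = np.tanh(spectrum[band].mean() / len(cents))
        return np.array([pitch_stability, vibrato_depth])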
  • The impression specifying unit 24 specifies the auditory impression of the singing voice V using the N feature indices X1 to XN extracted by the feature extraction unit 22.
  • The impression specifying unit 24 of the first embodiment calculates a plurality (M) of impression indices Y1 to YM indicating different auditory impressions of the singing voice V (M is a natural number).
  • Each impression index Ym is specified as a signed numerical value. For example, the larger the impression index Ym related to maturity is within the positive range, the more adult-like the voice, and the smaller it is within the negative range, the more child-like the voice.
  • For the calculation of each impression index Ym, an arithmetic expression (hereinafter referred to as a "relational expression") Fm set in advance for each impression index Ym is used.
  • An arbitrary relational expression Fm is an arithmetic expression that expresses the relationship between the impression index Ym and the N feature indices X1 to XN.
  • The relational expression Fm of the first embodiment represents each impression index Ym as a linear expression of the N feature indices X1 to XN, for example Ym = a1m·X1 + a2m·X2 + ... + aNm·XN + bm.
  • The coefficients anm (a11 to aNM) of the relational expression Fm exemplified above are constants (gradients of the impression index Ym with respect to the feature indices Xn) corresponding to the degree of correlation between the feature index Xn and the impression index Ym, and the coefficients bm (b1 to bM) are predetermined constants (intercepts).
  • The coefficient anm can also be restated as the contribution (weight) of the feature index Xn to the impression index Ym.
  • The impression specifying unit 24 calculates the M impression indices Y1 to YM corresponding to different auditory impressions by applying the N feature indices X1 to XN extracted by the feature extraction unit 22 to each of the relational expressions F1 to FM.
  • The impression specifying unit 24 of the first embodiment generates singing style information S corresponding to the M impression indices Y1 to YM calculated from the feature indices Xn. Specifically, an M-dimensional vector having the M impression indices Y1 to YM as elements is generated as the singing style information S.
  • The singing style information S comprehensively represents the M types of auditory impressions of the singing voice V (the subjective singing style perceived by the listener).
  • A nonlinear model such as a hidden Markov model or a neural network (multilayer perceptron) can also be used for calculating the impression indices Ym (Y1 to YM).
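  • As an illustration, the linear relational expressions can be evaluated as a single matrix operation; the coefficient values and sizes below are hypothetical, not values taken from the embodiment.

    import numpy as np

    # Hypothetical coefficients: A[n, m] = a_nm and b[m] = b_m (N = 4 features, M = 3 impressions).
    A = np.array([[ 0.8, 0.0, 0.3],
                  [ 0.0, 0.5, 0.0],
                  [-0.2, 0.4, 0.0],
                  [ 0.0, 0.0, 0.7]])
    b = np.array([0.1, -0.2, 0.0])

    def impression_indices(x):
        # Evaluate Ym = a1m*X1 + ... + aNm*XN + bm for all m at once.
        return A.T @ x + b

    x = np.array([0.6, 0.2, 0.9, 0.4])        # feature indices X1..XN of the singing voice V
    singing_style_S = impression_indices(x)    # M-dimensional vector (Y1..YM)
    print(singing_style_S)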
  • The presentation processing unit 26 in FIG. 1 displays various images on the display device 18. Specifically, the presentation processing unit 26 of the first embodiment causes the display device 18 to display an analysis result image 70 that expresses the M impression indices Y1 to YM (singing style information S) of the singing voice V specified by the impression specifying unit 24.
  • FIG. 2 shows an example using, among the M kinds of impression indices Y1 to YM, one impression index Y1 related to maturity (adult-like / child-like) and one impression index Y2 related to clarity (clear / husky).
  • The analysis result image 70 includes a coordinate plane on which a first axis 71 indicating the numerical value of the impression index Y1 and a second axis 72 indicating the numerical value of the impression index Y2 are set.
  • An image (icon) 74 representing the auditory impression of the singing voice V is arranged at the coordinate position corresponding to the numerical value of the impression index Y1 on the first axis 71 and the numerical value of the impression index Y2 on the second axis 72, both calculated by the impression specifying unit 24.
  • That is, the analysis result image 70 is an image representing the auditory impression of the singing voice V (its singing style in terms of maturity and clarity). By viewing the analysis result image 70 displayed on the display device 18, the user can intuitively grasp the auditory impression of the singing voice V.
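  • A minimal plotting sketch of such a two-axis analysis result image is given below, assuming matplotlib for display; the axis labels and the example coordinates are hypothetical.

    import matplotlib.pyplot as plt

    def show_analysis_result(y1, y2):
        # Place an icon at (Y1, Y2) on a two-axis impression plane (cf. analysis result image 70).
        fig, ax = plt.subplots()
        ax.axhline(0.0, color="gray", linewidth=0.5)
        ax.axvline(0.0, color="gray", linewidth=0.5)
        ax.set_xlim(-1, 1)
        ax.set_ylim(-1, 1)
        ax.set_xlabel("Y1: child-like  <->  adult-like")   # first axis 71 (assumed labels)
        ax.set_ylabel("Y2: husky  <->  clear")             # second axis 72 (assumed labels)
        ax.plot(y1, y2, marker="*", markersize=20)          # icon 74
        plt.show()

    show_analysis_result(0.4, -0.3)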
  • FIG. 3 is a flowchart of an operation for analyzing the auditory impression of the singing voice V.
  • The processing of FIG. 3 is triggered by an operation (an instruction to start analysis) performed by the user on the input device 14.
  • The feature extraction unit 22 acquires the singing voice V collected by the sound collection device 16 (S1) and extracts the N feature indices X1 to XN indicating the acoustic features of the analysis section of the singing voice V (S2).
  • The analysis section is the section of the singing voice V that is the target of the auditory impression analysis, and is, for example, the entire singing voice V or a part of it (for example, a chorus section).
  • The impression specifying unit 24 calculates the M impression indices Y1 to YM by applying the N feature indices X1 to XN extracted by the feature extraction unit 22 to each relational expression Fm (S3).
  • The presentation processing unit 26 causes the display device 18 to display the analysis result image 70 of FIG. 2 representing the analysis result of the impression specifying unit 24 (S4).
  • The relational expression setting unit 40 in FIG. 1 sets the relational expressions Fm (F1 to FM) used for calculating the impression index Ym of each auditory impression.
  • The storage device 12 of the first embodiment stores a reference data group DR and relationship description data DC.
  • The relational expression setting unit 40 sets the M relational expressions F1 to FM using the reference data group DR and the relationship description data DC.
  • The reference data group DR is a set (database) of a plurality of reference data r.
  • The plurality of reference data r included in the reference data group DR are generated in advance using sounds (hereinafter referred to as "reference sounds") produced by an unspecified number of speakers.
  • For example, the sound of an arbitrary singer singing an arbitrary piece of music is recorded as a reference sound and used to generate reference data r.
  • Any one piece of reference data r is data that associates each impression index ym (y1 to yM) of a reference sound with each feature index xn (x1 to xN) of that reference sound.
  • The impression index ym is set to a numerical value corresponding to the auditory impression actually sensed by listeners of the reference sound, and the feature index xn is a numerical value of the acoustic feature extracted from the reference sound by the same process as that of the feature extraction unit 22.
  • The relationship description data DC defines a correspondence (correlation) between the auditory impressions and a plurality of acoustic features.
  • FIG. 4 is an explanatory view illustrating the correspondence defined by the relationship description data DC of the first embodiment.
  • The relationship description data DC of the first embodiment defines, for each of the M types of auditory impressions EYm (EY1 to EYM) corresponding to the different impression indices Ym, a correspondence αm (α1 to αM) with the plurality of types of acoustic features EX that affect that auditory impression EYm.
  • FIG. 4 illustrates the correspondences α1 to α3 with a plurality of types of acoustic features EX for each of three types of auditory impressions EY1 to EY3, namely maturity, clarity, and brightness.
  • Examples of the acoustic features EX correlated with each auditory impression EYm are as follows.
  • The numerical value of each acoustic feature EX exemplified below corresponds to the above-described feature index Xn.
  • Pitch stability: degree of minute temporal change (fluctuation) in pitch
  • Rise speed: degree of increase in volume immediately after the onset of a sound
  • Fall: degree of the singing expression that lowers the pitch from the reference value (the note pitch) (for example, the number of occurrences)
  • Scoop: degree of the singing expression that raises the pitch over time relative to the reference value (for example, the number of occurrences)
  • Vibrato depth: degree of pitch change in vibrato (for example, its amplitude and frequency)
  • Contour: degree of distinctness of the sound. Specifically, the volume ratio of the high-frequency component to the low-frequency component is suitable as this index.
  • Articulation: degree of temporal change in acoustic characteristics. Typically, the rate of temporal change of the frequency characteristics (for example, the formant frequencies or the fundamental frequency) is suitable as this index.
  • Attack: volume immediately after the onset of a sound
  • Crescendo: degree of increase in volume over time
  • Frequency characteristics: shape of the frequency spectrum
  • Higher harmonics: intensity of the harmonic overtone components
  • The correspondence αm that the relationship description data DC of the first embodiment defines for any one type of auditory impression EYm is a hierarchical relationship (hierarchical structure) in which multiple types of intermediate elements EZ related to the auditory impression EYm are interposed between the auditory impression EYm and the acoustic features EX.
  • The plurality of types of intermediate elements EZ related to one type of auditory impression EYm correspond to impressions that cause the listener to perceive the auditory impression EYm, that is, impressions obtained by subdividing the auditory impression EYm.
  • Any one intermediate element EZ is associated with a plurality of types of acoustic features EX that affect the intermediate element EZ.
  • Each correspondence αm defined in the relationship description data DC is constructed, for example, by surveying experts with extensive specialized knowledge of music and voice (singing) through interviews or questionnaires and analyzing the correlation between each auditory impression EYm and each acoustic feature EX (what kind of auditory impression EYm a listener tends to perceive from a sound having the acoustic feature EX).
  • A known investigation technique, typified by the evaluation grid method, can be arbitrarily employed for this survey.
  • Note that the relationship description data DC described above defines only the mutual relationships (connections) between the elements (acoustic features EX, intermediate elements EZ, and auditory impressions EYm) included in each correspondence αm; the degree of correlation between the elements is not specified.
  • Each correspondence αm defined by the relationship description data DC can therefore be said to be a hypothesis about the actual correlation between the acoustic features EX and the auditory impression EYm observed from reference sounds collected from a large number of unspecified speakers (that is, about the actual relationship between each impression index ym and each feature index xn statistically observed from the reference data group DR, which reflects the tendencies of actual reference sounds).
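  • One possible way to encode such connection-only correspondences in software is sketched below; the impression names, intermediate elements, and feature names are invented for illustration and are not taken from FIG. 4.

    # Hypothetical encoding of relationship description data DC: for each auditory
    # impression EYm, intermediate elements EZ map to the acoustic features EX that
    # affect them. Only connections are recorded, never correlation strengths.
    RELATIONSHIP_DC = {
        "maturity": {                       # EY1
            "steadiness of the voice": ["pitch stability", "vibrato depth"],
            "weight of the timbre":    ["frequency characteristics", "higher harmonics"],
        },
        "clarity": {                        # EY2
            "distinctness":            ["contour", "articulation"],
            "cleanness of the onset":  ["attack", "rise speed"],
        },
    }

    def connected_features(impression_name):
        # Return the acoustic features linked to one auditory impression EYm via its EZ elements.
        groups = RELATIONSHIP_DC[impression_name].values()
        return sorted({feature for features in groups for feature in features})

    print(connected_features("maturity"))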
  • The reference data group DR and the relationship description data DC described above are created in advance and stored in the storage device 12.
  • The relational expression setting unit 40 in FIG. 1 sets the M relational expressions F1 to FM using the reference data group DR and the relationship description data DC stored in the storage device 12. That is, the relational expression setting unit 40 sets, for each of the M impression indices Y1 to YM, the relational expression Fm expressing the relationship between the impression index Ym of the auditory impression EYm and the feature indices Xn of the acoustic features EX under the correspondence αm defined by the relationship description data DC.
  • Specifically, the relational expression setting unit 40 sets the N coefficients a1m to aNm and the coefficient bm for each relational expression Fm.
  • For this setting, known statistical processing such as structural equation modeling (SEM) or multivariate analysis (for example, multiple regression analysis) can be arbitrarily employed.
  • As understood from the example of FIG. 4, the type and total number of acoustic features EX correlated with an auditory impression EYm under the correspondence αm expressed by the relationship description data DC actually differ for each auditory impression EYm, but the type and total number of feature indices Xn included in each relational expression Fm described above are common to the M relational expressions F1 to FM.
  • Specifically, the coefficient anm corresponding to the feature index Xn of an acoustic feature EX whose correlation with the auditory impression EYm is not defined under the correspondence αm is set to zero in the relational expression Fm (that is, that feature index Xn does not affect the impression index Ym).
  • The M relational expressions (for example, structural equations or multiple regression equations) F1 to FM set by the relational expression setting unit 40 in the above procedure are stored in the storage device 12. Specifically, the N coefficients a1m to aNm and the coefficient bm are stored in the storage device 12 for each of the M relational expressions F1 to FM.
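  • A minimal sketch of this setting step is given below; it uses ordinary least-squares multiple regression per impression as a simplified stand-in for structural equation modeling, restricts each regression to the features connected under the correspondence αm, and keeps the remaining coefficients at zero. The data sizes and connection lists are hypothetical.

    import numpy as np

    def fit_relational_expressions(x_ref, y_ref, connections):
        # x_ref: (R, N) feature indices of R reference sounds.
        # y_ref: (R, M) impression indices of the same reference sounds.
        # connections[m]: column indices of the features linked to impression m by DC.
        # Ordinary least squares per impression is a simplified stand-in for SEM;
        # coefficients of unconnected features remain zero, as described above.
        R, N = x_ref.shape
        M = y_ref.shape[1]
        A = np.zeros((N, M))
        b = np.zeros(M)
        for m in range(M):
            cols = connections[m]
            design = np.column_stack([x_ref[:, cols], np.ones(R)])
            coef, *_ = np.linalg.lstsq(design, y_ref[:, m], rcond=None)
            A[cols, m] = coef[:-1]
            b[m] = coef[-1]
        return A, b

    # Example with random stand-in reference data (R = 200, N = 4, M = 2).
    rng = np.random.default_rng(0)
    x_ref = rng.uniform(0.0, 1.0, size=(200, 4))
    y_ref = rng.uniform(-1.0, 1.0, size=(200, 2))
    A, b = fit_relational_expressions(x_ref, y_ref, connections=[[0, 1, 3], [1, 2]])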
  • The impression specifying unit 24 calculates the M types of impression indices Y1 to YM by applying the N feature indices X1 to XN to each of the M relational expressions F1 to FM set by the relational expression setting unit 40.
  • As described above, in the first embodiment the auditory impressions (impression indices Y1 to YM) of the singing voice V are specified using the relational expressions Fm that define the relationship between each feature index Xn extracted from the singing voice V and the impression index Ym indicating an auditory impression of the singing voice V. Therefore, compared with the techniques of Patent Document 1 and Patent Document 2, which evaluate the skill of singing by focusing only on the difference between a reference value indicating an exemplary singing and the feature indices Xn of the singing voice V, the subjective impression that the listener of the singing voice V actually perceives can be evaluated appropriately.
  • Consider a configuration (hereinafter referred to as a "comparative example") in which the relational expression Fm is set only by statistically analyzing the tendency of the correlation between the impression indices ym and the feature indices xn.
  • In the comparative example, the relationship description data DC is not used for setting the relational expression Fm.
  • In that case, a specific acoustic feature EX that does not actually correlate with an auditory impression EYm may be recognized as if it correlated with that auditory impression EYm because of a latent factor (pseudo-correlation).
  • As a result, a relational expression Fm may be derived in which a feature index Xn that does not actually correlate with the impression index Ym has a dominant influence on the impression index Ym.
  • In the first embodiment, the relationship description data DC, which defines the hypothetical correspondence αm between each auditory impression EYm and each acoustic feature EX, is used together with the reference data group DR for setting the relational expressions Fm.
  • Consequently, the influence of pseudo-correlation between the auditory impression EYm and the acoustic features EX is reduced (ideally, excluded). There is therefore an advantage that a relational expression Fm appropriately expressing the actual correlation between the auditory impression EYm and each acoustic feature EX can be set.
  • In the first embodiment, moreover, the correspondence between the auditory impression EYm and each acoustic feature EX via the plurality of intermediate elements EZ related to the auditory impression EYm is defined by the relationship description data DC. Compared with a configuration in which the auditory impression EYm and each acoustic feature EX are directly correlated (a configuration in which the correspondence αm includes only the auditory impression EYm and the acoustic features EX), the above-described effect that the actual correlation between the auditory impression EYm and each acoustic feature EX can be appropriately expressed by the relational expression Fm is particularly remarkable.
  • In a modification, the user designates the auditory impression of the singing voice V by operating the input device 14 as appropriate after the music has ended. For example, for each of the M types of auditory impressions, a plurality of options (multiple levels of evaluation) for the impression index Ym are displayed on the display device 18, and the user designates one desired option for each auditory impression.
  • The relational expression setting unit 40 acquires reference data r that includes the impression indices ym (y1 to yM) of the auditory impressions designated by the user and the feature indices xn (x1 to xN) extracted from the singing voice V by the feature extraction unit 22, and stores it in the storage device 12. The relational expression setting unit 40 then sets and stores the relational expressions Fm (F1 to FM) in the same manner as in the first embodiment, using the reference data group DR including the new reference data r corresponding to the singing voice V.
  • That is, the existing relational expressions Fm (F1 to FM) are updated so as to reflect the relationship between the auditory impression (impression indices ym) and the acoustic features (feature indices xn) of the singing voice V collected by the sound collection device 16.
  • The relational expressions F1 to FM can thus be updated to contents reflecting the relationship between the auditory impression of the actual singing voice V and its acoustic features.
  • The timing for setting (updating) the relational expressions Fm using the reference data group DR is arbitrary.
  • For example, a configuration in which the relational expressions Fm are updated each time reference data r corresponding to a singing voice V is acquired, or a configuration in which the relational expressions Fm are updated when a predetermined number of new reference data r have accumulated, can be adopted; a sketch of such an update policy is given below.
  • The modification illustrated above can be similarly applied to each embodiment illustrated below.
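  • The following is a minimal Python sketch of the accumulate-then-refit update policy described above, reusing the hypothetical fit_relational_expressions() from the earlier sketch; the threshold of 50 is an assumed value, since the embodiment leaves the update timing arbitrary.

    import numpy as np

    class RelationalExpressionSetter:
        # Accumulates reference data r and refits F1..FM once enough new data has arrived.
        def __init__(self, x_ref, y_ref, connections, refit_threshold=50):
            self.x_ref = list(x_ref)              # existing reference data group DR
            self.y_ref = list(y_ref)
            self.connections = connections
            self.refit_threshold = refit_threshold
            self.pending = 0
            self.A, self.b = fit_relational_expressions(
                np.asarray(self.x_ref), np.asarray(self.y_ref), connections)

        def add_reference(self, xn, ym):
            # xn: feature indices of the singing voice V; ym: impressions designated by the user.
            self.x_ref.append(xn)
            self.y_ref.append(ym)
            self.pending += 1
            if self.pending >= self.refit_threshold:
                self.A, self.b = fit_relational_expressions(
                    np.asarray(self.x_ref), np.asarray(self.y_ref), self.connections)
                self.pending = 0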
  • Second Embodiment: a second embodiment of the present invention will be described below.
  • For elements whose operations and functions are the same as those of the first embodiment, the reference signs used in the description of the first embodiment are reused, and detailed description of each is omitted as appropriate.
  • FIG. 5 is a configuration diagram of the acoustic analysis device 100B of the second embodiment.
  • The acoustic analysis device 100B according to the second embodiment is a configuration in which an information generation unit 32 is added to the same elements as in the first embodiment (the feature extraction unit 22, the impression specifying unit 24, the presentation processing unit 26, and the relational expression setting unit 40).
  • As in the first embodiment, the impression specifying unit 24 specifies the auditory impressions (M impression indices Y1 to YM), and the information generation unit 32 generates presentation data QA corresponding to those auditory impressions.
  • The information generation unit 32 can be rephrased as an element that converts the M impression indices Y1 to YM into the presentation data QA.
  • The presentation processing unit 26 of the second embodiment presents the presentation data QA generated by the information generation unit 32 to the user. Specifically, the presentation processing unit 26 causes the display device 18 to display the contents of the presentation data QA.
  • The extraction of the N feature indices X1 to XN by the feature extraction unit 22 and the setting of the M relational expressions F1 to FM by the relational expression setting unit 40 are the same as in the first embodiment. Therefore, the same effects as those of the first embodiment are realized in the second embodiment.
  • The information generation unit 32 of the second embodiment generates, as the presentation data QA, related data dA of music corresponding to the M impression indices Y1 to YM (singing style information S) specified by the impression specifying unit 24. Specifically, the information generation unit 32 searches a plurality of candidate pieces of music for one that matches the M impression indices Y1 to YM and acquires the related data dA of that piece.
  • The related data dA is information related to a piece of music. For example, in addition to identification information of the piece (for example, a music number), attribute information such as the title, the singer name, and the genre is included in the related data dA.
  • The search data WA stored in the storage device 12 is used for the music search (generation of the related data dA) by the information generation unit 32.
  • The search data WA defines the relationship between the singing style information S (M impression indices Y1 to YM) and pieces of music.
  • In the search data WA of the second embodiment, related data dA (dA1, dA2, ...) of music is designated for each of a plurality of classes CL (CL1, CL2, ...) corresponding to different singing styles.
  • Specifically, a large number of pieces of singing style information S generated from singing voices V of arbitrary pieces of music are classified into the plurality of classes CL, and, for each class CL, the related data dA of, for example, the piece of music sung most frequently by the singing voices V whose singing style information S was classified into that class CL is designated in the search data WA. That is, for the class CL corresponding to any one kind of singing style, the related data dA of music that many singers tend to sing in that singing style is designated.
  • Known statistical processing (clustering) can be used for classifying the singing style information S into the plurality of classes CL.
  • The plurality of classes CL are expressed, for example, by a Gaussian mixture distribution that approximates the distribution of the singing style information S belonging to each class CL.
  • The information generation unit 32 specifies, among the plurality of classes CL registered in the search data WA, the one class CL to which the singing style information S (M impression indices Y1 to YM) generated by the impression specifying unit 24 belongs, and selects, as the presentation data QA, the related data dA of the music designated for that class CL in the search data WA.
  • The presentation processing unit 26 causes the display device 18 to display the presentation data QA (related data dA) generated by the information generation unit 32. That is, the identification information and attribute information of the music are displayed on the display device 18.
  • The information generation unit 32 can also update the search data WA using the singing voices V collected by the sound collection device 16. Specifically, the information generation unit 32 sequentially accumulates in the storage device 12 the singing style information S generated from the singing voice V of any one piece of music together with the related data dA of that piece.
  • The search data WA is then updated, for example by known machine learning, so that the relationship between the accumulated singing style information S and the related data dA is reflected.
  • In the second embodiment, as described above, music is searched not on the basis of the characteristics of the music itself (its musical tone, etc.) but on the basis of the singing styles with which many singers have sung it in the past. For example, if the auditory impression of the singing voice V is "passionate and bright singing", pieces of music that many singers have sung in the past with a similarly "passionate and bright" singing style are retrieved.
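  • A minimal Python sketch of such a search is shown below, assuming scikit-learn's GaussianMixture for the classes CL and randomly generated stand-in data; the numbers of classes, voices, and songs are all hypothetical.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Hypothetical training set: singing-style vectors S (M = 3 impression indices each)
    # from 500 past singing voices, plus the identifier of the song each voice sang.
    rng = np.random.default_rng(0)
    style_vectors = rng.normal(size=(500, 3))
    song_ids = rng.integers(0, 20, size=500)

    gmm = GaussianMixture(n_components=4, random_state=0).fit(style_vectors)
    classes = gmm.predict(style_vectors)

    # Search data WA: for each class CL, register the song sung most often in that class.
    search_data_WA = {cl: int(np.bincount(song_ids[classes == cl]).argmax())
                      for cl in range(gmm.n_components)}

    def related_data_for(style_S):
        # Return the related data dA (here just a song id) for a new singing-style vector S.
        cl = int(gmm.predict(style_S.reshape(1, -1))[0])
        return search_data_WA[cl]

    print(related_data_for(np.array([0.2, -0.5, 1.0])))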
  • Japanese Unexamined Patent Application Publication No. 2011-197345 discloses a technique for searching for music corresponding to a keyword designated by a user and presenting it to the user.
  • The above technique, however, only searches for music that is formally related to the keyword specified by the user.
  • In the second embodiment, by contrast, music corresponding to the auditory impression (singing style) of the singing voice V specified by the impression specifying unit 24 is presented to the user, so that the user who uttered the singing voice V can find music suited to his or her own singing style.
  • In the above description, the configuration in which the search data WA designates one piece of music for each of the plurality of classes CL of the singing style information S has been exemplified.
  • However, it is also possible to designate, for one class CL, the related data dA of a plurality of pieces of music sung by the singing voices V whose singing style was classified into that class CL.
  • In that case, the information generation unit 32 generates, as the presentation data QA, the related data dA of the plurality of pieces of music designated for the one class CL to which the singing style information S specified by the impression specifying unit 24 belongs. That is, a plurality of pieces of music that tend to be sung in the same singing style as the singing voice V are presented to the user.
  • The condition for selecting one piece of music from the plurality of pieces designated for one class CL is arbitrary. For example, a configuration in which the piece sung most frequently by the singing voices V classified into that class CL is selected, or a configuration in which one piece is selected from the plurality of pieces designated for the class CL according to an instruction from the user (for example, a selection condition such as "90s" specified by the user) or attribute information such as the user's age, is preferable.
  • The search data WA can also be updated at any time so as to reflect the relationship between the singing style information S of a singing voice V and the piece of music of that singing voice V.
  • However, if the singing style information S of a singing voice V whose auditory impression is not appropriate is reflected in the search data WA, appropriate music search may be hindered. It is therefore preferable to select the singing style information S to be reflected in the search data WA.
  • For example, the user (the speaker or a listener) specifies, by operating the input device 14, whether or not the actual auditory impression of the singing voice V is appropriate; the relationship between the singing style information S and the music is reflected in the search data WA for a singing voice V whose auditory impression is determined to be appropriate, whereas the singing style information S of a singing voice V whose auditory impression is determined to be inappropriate is not reflected in the search data WA. According to the above configuration, there is an advantage that search data WA reflecting the singing styles of many singers can be generated.
  • FIG. 6 is a configuration diagram of an acoustic analysis device 100C according to the third embodiment.
  • Like the acoustic analysis device 100B (FIG. 5) of the second embodiment, the acoustic analysis device 100C according to the third embodiment is a configuration in which an information generation unit 32 that generates presentation data QA corresponding to the M impression indices Y1 to YM specified by the impression specifying unit 24 is added to the first embodiment.
  • The extraction of the N feature indices X1 to XN by the feature extraction unit 22 and the setting of the M relational expressions F1 to FM by the relational expression setting unit 40 are the same as in the first embodiment. Therefore, the third embodiment achieves the same effects as the first embodiment.
  • The storage device 12 of the third embodiment stores a plurality of pieces of image data dB representing auditory impressions (M impression indices Y1 to YM).
  • Each piece of image data dB represents an image (including symbols and characters) that figuratively or schematically expresses an auditory impression.
  • For example, an image of a character (an animal or the like) or a celebrity matching the auditory impression specified by the impression specifying unit 24 is suitable as the image data dB.
  • The information generation unit 32 of the third embodiment selects, as the presentation data QA, the image data dB representing the auditory impression (M impression indices Y1 to YM) specified by the impression specifying unit 24 from among the plurality of pieces of image data dB stored in the storage device 12. The conversion data WB stored in the storage device 12 is used for the selection of the image data dB by the information generation unit 32.
  • The conversion data WB defines the relationship between the M impression indices Y1 to YM (singing style information S) and the image data dB.
  • Specifically, the conversion data WB represents a structural equation that defines the correlation between the M impression indices Y1 to YM and the image data dB.
  • For generating the conversion data WB, structural equation modeling (SEM) is preferably used, as in the setting of the M relational expressions F1 to FM exemplified in the first embodiment. That is, using, for example, a plurality of pieces of learning data in which the M impression indices Y1 to YM and image data dB are associated with each other, together with data defining a correspondence between the M types of auditory impressions and the image data dB, a structural equation defining the relationship between the M impression indices Y1 to YM and the image data dB is set in advance and stored in the storage device 12 as the conversion data WB.
  • For example, the conversion data WB is set and stored so that, when the impression index Ym related to maturity indicates a child-like singing, image data dB showing a child-like character is obtained, and when the impression index Ym related to brightness indicates a bright singing, image data dB showing a character with a bright expression is obtained.
  • The information generation unit 32 specifies the image data dB by applying the M impression indices Y1 to YM specified by the impression specifying unit 24 to the structural equation of the conversion data WB, and acquires that image data dB from the storage device 12 as the presentation data QA.
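  • The embodiment derives this selection from a structural equation; the sketch below uses a nearest-prototype rule as a simplified stand-in, with entirely hypothetical image names and prototype vectors, only to illustrate mapping an impression vector to one piece of image data dB.

    import numpy as np

    # Hypothetical conversion data WB: each candidate image is paired with a prototype
    # impression vector (Y1..YM); the nearest prototype stands in for the structural equation.
    conversion_WB = {
        "child_character.png":   np.array([-0.8, 0.2, 0.0]),
        "adult_character.png":   np.array([ 0.8, 0.0, 0.1]),
        "smiling_character.png": np.array([ 0.0, 0.9, 0.2]),
    }

    def select_image(impressions_Y):
        # Select the image data dB whose prototype is closest to the impression vector.
        return min(conversion_WB,
                   key=lambda name: np.linalg.norm(conversion_WB[name] - impressions_Y))

    print(select_image(np.array([-0.6, 0.1, 0.0])))    # -> "child_character.png"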
  • The presentation processing unit 26 causes the display device 18 to display the presentation data QA generated by the information generation unit 32.
  • That is, an image figuratively or schematically representing the auditory impression (singing style) of the singing voice V is displayed on the display device 18. By viewing the image displayed on the display device 18, the user can intuitively grasp the auditory impression of the singing voice V.
  • Japanese Patent Application Laid-Open No. 2002-041063 discloses a technique for displaying a character image in accordance with information such as a song name, the number of singings, and a scoring result.
  • The above technique, however, only displays an image unrelated to the auditory impression of the singing voice.
  • In the third embodiment, by contrast, an image representing the auditory impression (singing style) of the singing voice V specified by the impression specifying unit 24 is presented, so that the user can intuitively grasp the auditory impression of the singing voice V.
  • If image data dB corresponding to the characteristics of the singing voice V were simply to be selected, a configuration in which a direct relationship between each feature index Xn of the singing voice V and the image data dB is determined in advance, and the image data dB is selected from the feature indices extracted by the feature extraction unit 22, could also be assumed.
  • In practice, however, it is difficult to appropriately associate image data dB representing the auditory impression of the singing voice V with the feature indices Xn.
  • In the third embodiment, by using the conversion data WB, image data dB suitable for each auditory impression can be associated with each impression index Ym. There is also an advantage that the relationship between the auditory impressions and the image data dB can be changed independently of the relational expressions F1 to FM.
  • In a modification, the extraction of the N feature indices X1 to XN by the feature extraction unit 22 and the specification of the M impression indices Y1 to YM by the impression specifying unit 24 are executed sequentially for each of K unit sections obtained by dividing the singing voice V on the time axis.
  • The method of dividing the singing voice V into the plurality of unit sections is arbitrary. For example, as illustrated in FIG. 7, the singing voice V can be divided into a plurality of unit sections corresponding to the structure of the music (the A to C melody sections, chorus 1, and chorus 2).
  • Each of the K unit sections is associated with one group of image data dB.
  • For each unit section, the information generation unit 32 selects, from the plurality of pieces of image data dB in the group corresponding to that unit section, the one piece of image data dB corresponding to the M impression indices Y1 to YM of that unit section. That is, one piece of image data dB is selected for each unit section of the singing voice V, and presentation data QA including K pieces of image data dB corresponding to the different unit sections is finally generated.
  • Specifically, in the example of FIG. 7, one piece of image data dB (an image of a "strawberry") is selected from the "topping" group according to the M impression indices Y1 to YM specified for the "A to C melody" unit section, one piece of image data dB (an image of "whipped cream") is selected from the "cream" group according to the M impression indices Y1 to YM of the "chorus 1" unit section, and one piece of image data dB (an image of a disk-shaped "sponge") is selected from the "base" group according to the M impression indices Y1 to YM of the "chorus 2" unit section.
  • After the singing of the music is completed, the presentation processing unit 26 causes the display device 18 to display an image obtained by combining the K pieces of image data dB included in the presentation data QA. Specifically, as illustrated in FIG. 7, an image of a "cake" obtained by combining the "topping" image data dB, the "cream" image data dB, and the "base" image data dB is displayed on the display device 18. Since the image data dB of each unit section is selected according to the auditory impression of that unit section, the content of the displayed image (the appearance of each element making up the object) changes according to the auditory impression of each unit section. It is therefore possible to add interest by diversifying the images presented to the user.
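  • A compact sketch of this per-section selection, reusing the hypothetical nearest-prototype rule from the earlier sketch, is shown below; the section names, candidate images, and prototype vectors are invented for illustration.

    import numpy as np

    # Hypothetical groups of image data dB, one group per unit section of the song,
    # each candidate paired with a prototype impression vector (here M = 2).
    groups = {
        "A to C melody": {"strawberry": np.array([0.5, 0.5]), "cherry": np.array([-0.5, 0.2])},
        "chorus 1":      {"whipped cream": np.array([0.3, -0.4]), "chocolate": np.array([-0.2, 0.6])},
        "chorus 2":      {"sponge": np.array([0.0, 0.0]), "tart base": np.array([0.7, -0.7])},
    }

    def compose_presentation(section_impressions):
        # Pick, for each unit section, the image whose prototype is nearest to that
        # section's impression vector; the K picks are later combined into one image.
        picks = {}
        for section, y in section_impressions.items():
            candidates = groups[section]
            picks[section] = min(candidates,
                                 key=lambda name: np.linalg.norm(candidates[name] - y))
        return picks

    print(compose_presentation({"A to C melody": np.array([0.4, 0.4]),
                                "chorus 1":      np.array([0.2, -0.3]),
                                "chorus 2":      np.array([0.1, 0.1])}))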
  • The content of the image displayed by combining the plurality of pieces of image data dB is not limited to the above example (a cake).
  • For example, it is also possible to display a character such as an avatar representing the user, with the image of each element constituting the character (for example, elements such as clothes and hairstyle, and elements such as the eyes and mouth constituting the face) selected as image data dB for each unit section of the singing voice V.
  • In the above description, the image data dB is selected for each unit section obtained by dividing the singing voice V on the time axis.
  • However, a configuration may also be employed in which the information generation unit 32 selects, for each of the M types of auditory impressions (that is, for each auditory impression), the image data dB corresponding to the impression index Ym of that auditory impression.
  • It is also possible to classify a plurality of pieces of image data dB prepared in advance into a plurality of groups (categories), and to have the information generation unit 32 select, as the presentation data QA, the image data dB corresponding to the M impression indices Y1 to YM from one group selected under a predetermined condition among the plurality of groups.
  • The condition for selecting the one group is arbitrary. For example, a configuration in which the image data dB is selected from a group designated by the user through an operation on the input device 14, or a configuration in which the image data dB is selected from a group chosen according to the user's attribute information (for example, age or sex), is suitable. It is also possible to select the group of image data dB according to the attribute information of a plurality of users.
  • In the above description, conversion data WB expressing a structural equation that defines the correlation between each impression index Ym and the image data dB has been exemplified; however, it is also possible to use, as the conversion data WB, a data table in which the impression indices Ym and the image data dB are associated with each other.
  • FIG. 8 is a configuration diagram of an acoustic analysis device 100D of the fourth embodiment.
  • Like the acoustic analysis device 100B of the second embodiment (FIG. 5), the acoustic analysis device 100D according to the fourth embodiment is a configuration in which an information generation unit 32 that generates presentation data QA corresponding to the M impression indices Y1 to YM is added to the first embodiment.
  • The extraction of the N feature indices X1 to XN by the feature extraction unit 22 and the setting of the M relational expressions F1 to FM by the relational expression setting unit 40 are the same as in the first embodiment. Therefore, the same effects as those of the first embodiment are realized in the fourth embodiment.
  • In the fourth embodiment, history data H indicating the history of the auditory impressions of the singing voice V is stored in the storage device 12 for each user.
  • The history data H includes user information hA and an impression history hB.
  • The user information hA includes identification information and attribute information (for example, age and sex) of the user who uttered the singing voice V.
  • The impression history hB is a time series of the impression indices Ym specified in the past by the impression specifying unit 24 from the user's singing voices V.
  • The impression specifying unit 24 adds the impression indices Y1 to YM to the impression history hB of the history data H of the user who uttered the singing voice V.
  • The history data H can thus be rephrased as time-series data expressing the temporal transition of each user's singing style.
  • The storage device 12 of the fourth embodiment also stores a plurality of pieces of property data dC expressing properties of users.
  • The property data dC represents a character string describing a user's property.
  • A user's property refers to the user's disposition (temperament, character) or state (for example, mental or physical condition).
  • A plurality of properties defined by known personality classifications (for example, Kretschmer's temperament classification, Jung's classification, or the Enneagram classification) can be used as candidates for the property data dC.
  • The information generation unit 32 estimates the user's property according to the auditory impressions that the impression specifying unit 24 has specified in the past for the user's singing voices V. Specifically, the information generation unit 32 selects, as the presentation data QA, the property data dC corresponding to the history of auditory impressions indicated by the user's history data H from among the plurality of pieces of property data dC stored in the storage device 12.
  • The conversion data WC stored in the storage device 12 is used for the selection of the property data dC (estimation of the user's property) by the information generation unit 32.
  • The conversion data WC defines the relationship between the impression history hB (the time series of auditory impressions) and the property data dC.
  • The conversion data WC of the fourth embodiment is a data table in which impression histories hB (hB1, hB2, ...) and property data dC (dC1, dC2, ...) are associated with each other.
  • For example, as illustrated in FIG. 9, the property data dC of the "cyclothymic (cycloid) temperament" in Kretschmer's temperament classification corresponds to an impression history hB in which bright and dark alternate in the time series of the impression index Ym related to brightness.
  • An impression history hB that changes from a strong (intense) voice to a quiet voice in the time series of the impression index Ym related to activity (strong / quiet) corresponds to property data dC indicating a state such as "Are you tired today?".
  • The user sings the music after specifying his or her identification information by operating the input device 14.
  • The information generation unit 32 specifies, as the presentation data QA, the property data dC that the conversion data WC associates with the impression history hB of the history data H of the user specified by that identification information, from among the plurality of pieces of property data dC stored in the storage device 12.
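  • A minimal rule-based sketch of such a lookup is given below; the two rules mirror the examples above, while the thresholds and the fallback string are assumptions.

    import numpy as np

    def estimate_property(brightness_history, activity_history):
        # Map patterns in the impression history hB to property data dC. The two rules
        # mirror the examples above; the thresholds and fallback string are assumptions.
        b = np.asarray(brightness_history, dtype=float)
        a = np.asarray(activity_history, dtype=float)
        # Alternating bright/dark singing: many sign changes in the brightness index.
        sign_changes = int(np.count_nonzero(np.diff(np.sign(b)) != 0))
        if sign_changes >= len(b) // 2:
            return "cyclothymic temperament (Kretschmer's classification)"
        # A strong voice turning quiet: clearly negative trend of the activity index.
        if len(a) >= 2 and (a[-1] - a[0]) < -0.5:
            return "Are you tired today?"
        return "no matching property"

    print(estimate_property([0.7, -0.6, 0.8, -0.5], [0.9, 0.4, 0.1]))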
  • The presentation processing unit 26 causes the display device 18 to display the presentation data QA generated by the information generation unit 32.
  • That is, the result of estimating the user's property with reference to the auditory impressions of the singing voice V (the property data dC) is displayed on the display device 18.
  • The user can confirm the estimation result of his or her property by viewing the image displayed on the display device 18.
  • Since the property of the speaker is estimated using the time series of the impression indices Ym of the singing voices V (the impression history hB), there is an advantage that an appropriate property can be estimated in consideration of the temporal change of the auditory impression of the singing voice V.
  • Patent Document 1 and Patent Document 2 merely evaluate the objective skill of a singing by focusing only on the difference in feature amounts between an exemplary singing voice and the singing voice to be evaluated.
  • In the fourth embodiment, by contrast, the user's property is estimated and presented according to the auditory impressions of the singing voice V, so that it is possible to provide the user with an entertaining presentation effect.
  • The result of estimating the user's property from the singing voice V in the fourth embodiment can also be used for managing the user's mental and physical state (for example, psychological counseling, health management, therapy, or self-development). It is also possible to learn a singing style that gives a desired impression to others by adjusting one's singing so that the property presented on the display device 18 approaches a target.
  • It is also possible to classify a plurality of pieces of property data dC prepared in advance into a plurality of groups (categories), and to have the information generation unit 32 specify, from one group selected under a predetermined condition among the plurality of groups, the property data dC corresponding to the history data H (impression history hB).
  • The condition for selecting the one group is arbitrary. For example, a configuration in which the property data dC is selected from a group designated by the user through an operation on the input device 14, or a configuration in which the property data dC is selected from a group chosen according to the user's attribute information (for example, age or sex), is suitable. It is also possible to select the group of property data dC according to the attribute information of a plurality of users.
  • The content of the impression history hB of the history data H is not limited to the above example (a time series of the impression indices Ym). For example, it is also possible to use, as the impression history hB, the frequency of each numerical value of the impression index Ym or its variation rate (the amount of change per unit time).
  • A configuration may also be employed in which the history data H is generated using, as the impression history hB, the time series of the impression indices Ym of a specific section (for example, the chorus) of the music, or in which the history data H is generated for each specific period (for example, every day, every week, or every month).
  • A configuration in which the history data H is generated for each piece of music (or for each genre of music) is also suitable.
  • In the above description, property data dC representing a character string indicating the user's property has been illustrated, but it is also possible to use, as the property data dC, image data of an image (for example, a portrait or a character) representing the user's property.
  • A configuration in which another user or a celebrity sharing the same property data dC is presented, or a configuration in which a property opposite to the property indicated by the property data dC is proposed to the user, may also be employed.
  • FIG. 11 is a configuration diagram of an acoustic analysis device 100E according to the fifth embodiment.
  • The acoustic analysis device 100E according to the fifth embodiment includes elements similar to those of the first embodiment (the feature extraction unit 22, the impression identification unit 24, the presentation processing unit 26, and the relational expression setting unit 40), to which a target setting unit 42 and an analysis processing unit 44 are added.
  • The manner in which the N feature indices X1 to XN are extracted by the feature extraction unit 22, the M impression indices Y1 to YM are identified by the impression identification unit 24, and the M relational expressions F1 to FM are set by the relational expression setting unit 40 is the same as in the first embodiment. Therefore, the same effects as those of the first embodiment are also realized in the fifth embodiment.
  • the target setting unit 42 variably sets each target value Am in accordance with an instruction from the user to the input device 14.
  • the presentation processing unit 26 of the fifth embodiment causes the display device 18 to display the operation screen 80 of FIG. 12 that accepts an instruction of the target value Am of each impression index Ym.
  • Each operation element image 82 is an image of a slider-type operation element that moves in response to an instruction from the user to the input device 14 and accepts an instruction of a target value Am by the user.
  • The target setting unit 42 sets the target value Am of each impression index Ym according to the position of the corresponding operation element image 82. Note that each of the plurality of operation element images 82 on the operation screen 80 can be moved individually, but the operation element images 82 can also be moved in conjunction with one another.
  • The analysis processing unit 44 in FIG. 11 specifies the acoustic feature (feature index Xn) to be changed in order to bring each impression index Ym specified for the singing voice V by the impression identification unit 24 close to the target value Am.
  • the analysis processing unit 44 according to the fifth embodiment generates analysis data QB that designates acoustic features that should be changed to bring each impression index Ym closer to the target value Am and the direction (increase / decrease) of the change.
  • the presentation processing unit 26 causes the display device 18 to display the contents of the analysis data QB generated by the analysis processing unit 44 (acoustic features to be changed and change directions). Therefore, the user can grasp an improvement point for bringing his / her song close to the target auditory impression.
  • the presentation of the analysis data QB corresponds to singing instruction for realizing a target auditory impression.
  • Specifically, the analysis processing unit 44 of the fifth embodiment identifies, from the N types of acoustic features, the acoustic feature that should be changed so as to minimize a numerical value ε (hereinafter referred to as the "total difference") obtained by summing, over the M impression indices, the absolute value of the difference between each target value Am and the corresponding impression index Ym.
  • Concretely, the analysis processing unit 44 calculates the total difference ε on the assumption that the feature index Xn of any one of the N types of acoustic features is changed by a predetermined change amount p, repeats this calculation for a plurality of cases in which the acoustic feature to be changed differs, compares the results with one another, and generates the analysis data QB specifying the acoustic feature to be changed and the direction of the change (increase/decrease) for which the total difference ε is minimized.
  • The total difference ε obtained when any one feature index Xn is changed by the change amount p is expressed by the following formula (A):

  ε = Σ_{m=1}^{M} | Am - Ym - anm · p |  …… (A)

  The subtraction of the product of the change amount p and the coefficient anm (the coefficient of the feature index Xn in the relational expression Fm of the impression index Ym) in formula (A) corresponds to the process of changing the feature index Xn by the change amount p.
  • That is, the feature index Xn having a large coefficient anm in the relational expression Fm of an impression index Ym that differs from its target value Am is preferentially selected as the feature index Xn to be changed in order to bring that impression index Ym close to the target value Am.
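  • The search described above can be sketched as follows, assuming the linear form Ym = Σn anm·Xn + bm implied by the coefficients anm (the variable names and the toy numbers are illustrative, not taken from the patent): every feature index Xn and both directions of change are tried, the total difference ε of formula (A) is evaluated, and the candidate minimizing ε becomes the analysis data QB.

```python
import numpy as np

def analyze(y: np.ndarray, a: np.ndarray, targets: np.ndarray, p: float = 0.1):
    """Return candidates (epsilon, feature index n, direction) sorted by the
    total difference epsilon = sum_m |Am - Ym - direction * a[n, m] * p|.

    y       : current impression indices Y1..YM                     (shape M)
    a       : coefficients a[n, m] of Xn in relational expression Fm (N x M)
    targets : target values A1..AM                                  (shape M)
    p       : assumed change amount applied to one feature index Xn
    """
    candidates = []
    for n in range(a.shape[0]):
        for direction in (+1.0, -1.0):
            epsilon = np.abs(targets - (y + direction * a[n] * p)).sum()
            candidates.append((epsilon, n, "increase" if direction > 0 else "decrease"))
    candidates.sort()                      # ascending total difference epsilon
    return candidates                      # best candidate first = analysis data QB

y = np.array([0.4, 0.7])                   # current impression indices (M = 2)
a = np.array([[0.9, -0.1],                 # coefficients for X1 in F1, F2
              [0.2,  0.6]])                # coefficients for X2 in F1, F2
targets = np.array([0.8, 0.7])             # target values A1, A2
best_eps, best_n, best_dir = analyze(y, a, targets)[0]
print(f"change feature X{best_n + 1}: {best_dir} (epsilon = {best_eps:.3f})")
```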
  • For example, the user who checks the analysis result (analysis data QB) of the analysis processing unit 44 on the display device 18 can grasp that a policy of "decreasing the vibrato depth" is best for realizing a "child-friendly and clean singing".
  • As described above, the user can grasp the optimum improvement point (acoustic feature) for bringing the singing voice V close to the target auditory impression, and application as a method of self-realization and health maintenance can also be expected.
  • FIG. 13 is a configuration diagram of an acoustic analysis device 100F according to the sixth embodiment.
  • The acoustic analysis device 100F according to the sixth embodiment includes the same elements as in the fifth embodiment (the feature extraction unit 22, the impression identification unit 24, the presentation processing unit 26, the relational expression setting unit 40, the target setting unit 42, and the analysis processing unit 44), to which an acoustic processing unit 46 is added.
  • The manner in which the N feature indices X1 to XN are extracted by the feature extraction unit 22, the M impression indices Y1 to YM are identified by the impression identification unit 24, and the M relational expressions F1 to FM are set by the relational expression setting unit 40 is the same as in the first embodiment. Therefore, the sixth embodiment achieves the same effects as the first embodiment.
  • the target setting unit 42 of the sixth embodiment sets the target value Am of each impression index Ym in accordance with, for example, an instruction from the user, as in the fifth embodiment.
  • The analysis processing unit 44 generates, in the same manner as in the fifth embodiment, the analysis data QB designating the acoustic feature (feature index Xn) to be changed in order to bring each impression index Ym specified for the singing voice V by the impression identification unit 24 close to the target value Am.
  • The acoustic processing unit 46 in FIG. 13 performs acoustic processing on the singing voice V so as to change the acoustic feature specified by the analysis processing unit 44. Specifically, the acoustic processing unit 46 performs acoustic processing on the singing voice V picked up by the sound pickup device 16 so that the acoustic feature designated by the analysis data QB generated by the analysis processing unit 44 changes (increases or decreases) in the direction designated by the analysis data QB. That is, among the N feature indices X1 to XN of the singing voice V, the feature index Xn having a large coefficient anm (contribution to the impression index Ym) in the relational expression Fm of an impression index Ym that differs from its target value Am, in other words the feature index Xn that can effectively bring that impression index Ym close to the target value Am, is preferentially changed by the acoustic processing of the acoustic processing unit 46.
  • For the specific acoustic processing executed on the singing voice V, a known acoustic processing technique corresponding to the type of acoustic feature to be changed is arbitrarily adopted.
  • For example, the acoustic processing unit 46 may perform acoustic processing (noise addition processing) that adds a noise component to the singing voice V.
  • When the analysis data QB designates a "decrease in vibrato depth" as illustrated in the fifth embodiment, the acoustic processing unit 46 performs, on the singing voice V, acoustic processing that suppresses minute fluctuations of the pitch in the singing voice V.
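  • One possible realization of such processing, offered only as an illustration since the patent leaves the concrete technique to known methods, is to smooth the extracted pitch contour with a moving average so that minute periodic fluctuations such as vibrato are attenuated before resynthesis:

```python
import numpy as np

def suppress_vibrato(pitch_hz: np.ndarray, frame_rate: float,
                     window_seconds: float = 0.25) -> np.ndarray:
    """Attenuate minute pitch fluctuations (e.g. vibrato) by moving-average
    smoothing of the pitch contour; the smoothed contour would then drive a
    pitch-correction/resynthesis stage (not shown here)."""
    window = max(1, int(window_seconds * frame_rate))
    kernel = np.ones(window) / window
    return np.convolve(pitch_hz, kernel, mode="same")   # same length as input

# Toy contour: a steady 220 Hz note carrying a 6 Hz vibrato of +/- 5 Hz depth.
t = np.arange(0, 2.0, 1 / 100)                          # 100 frames per second
contour = 220 + 5 * np.sin(2 * np.pi * 6 * t)
smoothed = suppress_vibrato(contour, frame_rate=100)
print(round(float(np.ptp(contour)), 2), round(float(np.ptp(smoothed)), 2))
```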
  • the singing voice V after processing by the acoustic processing unit 46 is reproduced from, for example, the sound emitting device 17 (speaker or headphones). Note that, instead of (or along with) reproduction of the singing voice V, it is also possible to generate a file of the singing voice V after processing by the acoustic processing unit 46.
  • According to the sixth embodiment, the auditory impression of the singing voice V can be adjusted to a desired impression (an auditory impression corresponding to the target value Am).
  • It may not be possible to sufficiently vary, in the singing voice V, the feature index Xn designated by the analysis data QB (hereinafter referred to as the "priority index" for convenience), that is, to bring the impression index Ym sufficiently close to the target value Am. For example, even if the analysis data QB designates an "increase in the depth of vibrato", the impression index Ym cannot be brought sufficiently close to the target value Am by increasing the priority index "depth of vibrato" when the singing voice V does not include a section in which the pitch is maintained for a length of time long enough for vibrato to be added.
  • In such a case, the acoustic processing unit 46 executes the acoustic processing on the singing voice V so as to change the feature index Xn ranked next after the priority index when the N feature indices X1 to XN of the singing voice V are arranged in the order of effectiveness for bringing the impression indices Ym close to the target values Am (ascending order of the total difference ε). According to the above configuration, each impression index Ym can be brought effectively close to the target value Am regardless of the characteristics of the singing voice V.
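  • Continuing the earlier sketch, this fallback can be pictured as walking down the candidates sorted by ascending total difference ε and applying the first one that the singing voice V actually allows to be varied; the feasibility test below is a hypothetical stand-in, since the patent does not specify one.

```python
def choose_feasible(candidates, can_vary):
    """candidates: (total difference epsilon, feature index n, direction)
    tuples in ascending order of epsilon, as produced by the analysis step."""
    for epsilon, n, direction in candidates:
        if can_vary(n, direction):       # priority index first, then the next best
            return n, direction
    return None                          # no feature index can be varied enough

# Toy ranked candidates: X1 is the priority index but cannot be varied in this
# take (e.g. no note held long enough for deeper vibrato), so X2 is used instead.
ranked = [(0.32, 0, "increase"), (0.44, 1, "increase"), (0.48, 1, "decrease")]
print(choose_feasible(ranked, can_vary=lambda n, d: n != 0))   # -> (1, 'increase')
```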
  • Japanese Unexamined Patent Application Publication No. 2011-095397 discloses a configuration in which a plurality of types of control variables applied to speech synthesis are set in accordance with instructions from a user.
  • In contrast, in the configuration of the present embodiment, in which the target value Am of each auditory impression is set in accordance with an instruction from the user, even a user without specialized knowledge of speech synthesis control variables can effectively generate a singing voice V having a desired auditory impression (instruction by the user is facilitated).
  • In each of the above embodiments, the auditory impression is specified for the singing voice V over the entire section of the music, but it is also possible to sequentially specify the auditory impression (M impression indices Y1 to YM) for each of a plurality of sections obtained by dividing the singing voice V on the time axis.
  • In a configuration in which the auditory impression is sequentially specified for each section of the singing voice V, a configuration may also be adopted in which the presentation data QA exemplified in the second to fourth embodiments and the analysis data QB exemplified in the fifth and sixth embodiments are sequentially updated (in real time) for each section according to the auditory impression of that section of the singing voice V.
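  • Such sequential updating can be sketched as a loop over sections, assuming per-section feature indices and the linear relational expressions used in the other sketches (the section length, coefficients, and callback are illustrative):

```python
import numpy as np

def stream_impressions(feature_sections, a: np.ndarray, b: np.ndarray, on_update):
    """For each section's feature indices X1..XN, compute the impression
    indices Y1..YM via the linear relational expressions and push the result
    to a presentation callback (e.g. a display refresh) in real time."""
    for section_index, x in enumerate(feature_sections):
        y = x @ a + b                      # M impression indices for this section
        on_update(section_index, y)        # refresh presentation/analysis data

a = np.array([[0.8, 0.1], [0.1, 0.7]])     # toy coefficients (N = 2, M = 2)
b = np.array([0.05, 0.05])
sections = [np.array([0.2, 0.9]), np.array([0.6, 0.4]), np.array([0.9, 0.1])]
stream_impressions(sections, a, b,
                   on_update=lambda i, y: print(f"section {i}: {np.round(y, 2)}"))
```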
  • In each of the above embodiments, an acoustic analysis device 100 (100A, 100B, 100C, 100D, 100E, 100F, 100G) including both the elements for analyzing the singing voice V picked up by the sound pickup device 16 (the feature extraction unit 22, the impression identification unit 24, the presentation processing unit 26, the information generation unit 32, the target setting unit 42, the analysis processing unit 44, and the acoustic processing unit 46) and the relational expression setting unit 40 for setting each relational expression Fm has been exemplified. However, the acoustic analysis device 110 and the acoustic analysis device 120, which communicate with each other via the communication network 200, can also share the functions exemplified in the above embodiments.
  • The acoustic analysis device (relational expression setting device) 110 includes a relational expression setting unit 40 that sets the M relational expressions F1 to FM using the reference data group DR and the relationship description data DC in the same manner as in the first embodiment.
  • the acoustic analysis device 110 is realized by a server device connected to the communication network 200.
  • The M relational expressions F1 to FM set by the acoustic analysis device 110 (the relational expression setting unit 40) are transferred to the acoustic analysis device 120 via the communication network 200.
  • The acoustic analysis device 120 includes a feature extraction unit 22 and an impression identification unit 24, and specifies the auditory impression (M impression indices Y1 to YM) of the singing voice in the same manner as in the first embodiment, using the M relational expressions F1 to FM transferred from the acoustic analysis device 110. An information generation unit 32 similar to that of the second to fourth embodiments, or a target setting unit 42 and an analysis processing unit 44 similar to those of the fifth and sixth embodiments, may also be installed in the acoustic analysis device 120.
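  • A minimal sketch of this division of roles, assuming linear relational expressions fitted by least squares on the server side and JSON transfer of the coefficients (both are assumptions for illustration; the patent does not fix the fitting method or the transport):

```python
import json
import numpy as np

# --- acoustic analysis device 110 (server side): set relational expressions ---
def fit_relational_expressions(x_ref: np.ndarray, y_ref: np.ndarray) -> str:
    """Fit Ym ~ sum_n a_nm * Xn + bm for each impression index from reference
    data (x_ref: samples x N feature indices, y_ref: samples x M impression
    indices) and serialise the coefficients for transfer."""
    design = np.hstack([x_ref, np.ones((x_ref.shape[0], 1))])   # append bias column
    coef, *_ = np.linalg.lstsq(design, y_ref, rcond=None)       # (N + 1) x M
    return json.dumps({"a": coef[:-1].tolist(), "b": coef[-1].tolist()})

# --- acoustic analysis device 120 (client side): apply them to features -------
def impression_indices(payload: str, x: np.ndarray) -> np.ndarray:
    params = json.loads(payload)
    a = np.array(params["a"])                                   # N x M
    b = np.array(params["b"])                                   # M
    return x @ a + b                                            # Y1..YM

rng = np.random.default_rng(0)
x_ref = rng.random((50, 3))                                     # toy reference features
y_ref = x_ref @ np.array([[0.8, 0.1], [0.0, 0.7], [0.2, 0.2]]) + 0.05
payload = fit_relational_expressions(x_ref, y_ref)              # sent over the network
print(impression_indices(payload, rng.random((1, 3))))          # client-side Y1, Y2
```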
  • Control data for controlling various devices can also be set as the presentation data QA according to the auditory impression of the singing voice V.
  • The control data is used, for example, to control an image (background image) displayed on the display device 18 during singing, to control playback of accompaniment sounds by a playback device (karaoke device), and to control effects such as lighting devices. It is also possible to change the foods and drinks that can be ordered in a store such as a karaoke establishment according to the presentation data QA. It is also possible to apply the auditory impression (M impression indices Y1 to YM) of the singing voice V to singing evaluation (scoring); for example, a configuration in which the singing is scored by comparing the impression indices of the singing voice V with reference impression indices (for example, a configuration in which the points are increased as the two are more similar) is preferably employed.
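  • A small sketch of such a scoring rule (the reference impression indices and the 100-point scale are assumptions for illustration): the score grows as the impression indices of the singing voice V approach a reference set.

```python
import numpy as np

def impression_score(y_sung: np.ndarray, y_reference: np.ndarray) -> float:
    """Score a singing on a 0-100 scale: the closer its impression indices
    Y1..YM are to the reference impression indices, the higher the points."""
    distance = np.abs(y_sung - y_reference).mean()     # mean absolute difference
    return 100.0 * max(0.0, 1.0 - distance)

print(impression_score(np.array([0.7, 0.4, 0.9]),
                       np.array([0.8, 0.5, 0.8])))     # -> 90.0
```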
  • In the fifth and sixth embodiments, each target value Am is set according to an instruction from the user, but the method of setting the target value Am is not limited to the above examples.
  • For example, a configuration may be employed in which the target values Am (A1 to AM) are set in advance for each song and the target setting unit 42 selects the target values Am of the song that the user actually sings.
  • the target setting unit 42 can variably set each target value Am according to the attributes of the music sung by the user (main melody, genre, singer, etc.).
  • the analysis object is not limited to the singing voice V.
  • For sounds other than the singing voice V, the auditory impression (M impression indices Y1 to YM) can likewise be specified using the relational expressions Fm as in the above-described embodiments; for example, an impression index Ym relating to lightness/darkness or turbidity can be specified, and an auditory impression such as "muffled / excluded" or "far / perverse" may also be used.
  • The auditory impression can be specified for the sound of any acoustic system, for example the sound reproduced at each site of a remote conference system that transmits and receives sound (for example, conversational speech in a conference) between remote locations, or the sound reproduced by a sound emitting device such as a speaker.
  • The specific content (type) of the sound to be analyzed (analysis target sound) in the present invention, its sound generation principle, and the like are arbitrary.
  • the acoustic analysis device is realized by a dedicated electronic circuit, or by cooperation of a general-purpose arithmetic processing device such as a CPU (Central Processing Unit) and a program.
  • the program of the present invention can be provided in a form stored in a computer-readable recording medium and installed in the computer.
  • The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but any known type of recording medium, such as a semiconductor recording medium or a magnetic recording medium, can be included.
  • the program of the present invention can be provided in the form of distribution via a communication network and installed in a computer.
  • the present invention is also specified as an operation method (acoustic analysis method) of the acoustic analysis device according to each of the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The invention concerns a relational expression setting unit (40) that sets a relational expression (Fm) expressing the relationship between a feature index (Xn) for each acoustic feature and an impression index (Ym) for the auditory impression, in the correspondence relationship prescribed by relationship description data (DC) that prescribes the correspondence between the auditory impression and the plurality of types of acoustic features, said relational expression (Fm) being set using a plurality of reference data (R), each of which mutually associates an impression index (Ym) indicating the auditory impression of a reference sound and a feature index (Xn) indicating the acoustic features of the reference sound, together with the relationship description data (DC).
PCT/JP2015/075923 2014-09-12 2015-09-11 Dispositif d'analyse acoustique WO2016039463A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014186191A JP2016057570A (ja) 2014-09-12 2014-09-12 音響解析装置
JP2014-186191 2014-09-12

Publications (1)

Publication Number Publication Date
WO2016039463A1 true WO2016039463A1 (fr) 2016-03-17

Family

ID=55459206

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/075923 WO2016039463A1 (fr) 2014-09-12 2015-09-11 Dispositif d'analyse acoustique

Country Status (2)

Country Link
JP (1) JP2016057570A (fr)
WO (1) WO2016039463A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7402396B2 (ja) 2020-01-07 2023-12-21 株式会社鉄人化計画 Emotion analysis device, emotion analysis method, and emotion analysis program

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4622199B2 (ja) * 2001-09-21 2011-02-02 日本ビクター株式会社 Music search device and music search method
JP4695853B2 (ja) * 2003-05-26 2011-06-08 パナソニック株式会社 Music search device
JP2006155157A (ja) * 2004-11-29 2006-06-15 Sanyo Electric Co Ltd Automatic music selection device
JP4622808B2 (ja) * 2005-10-28 2011-02-02 日本ビクター株式会社 Music classification device, music classification method, and music classification program
JP4810245B2 (ja) * 2006-01-30 2011-11-09 株式会社リコー Sound quality evaluation method for image forming apparatus, and image forming apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001083984A (ja) * 1999-09-09 2001-03-30 Alpine Electronics Inc Interface device
JP2007114798A (ja) * 2006-11-14 2007-05-10 Matsushita Electric Ind Co Ltd Music search device, music search method, and program and recording medium therefor
JP2014006692A (ja) * 2012-06-25 2014-01-16 Nippon Hoso Kyokai <Nhk> Auditory impression amount estimation device and program therefor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TAKESHI IKEZOE: "Music Database Retrieval System with Sensitivity Words Using Music Sensitivity Space", TRANSACTIONS OF INFORMATION PROCESSING SOCIETY OF JAPAN, vol. 42, no. 12, December 2001 (2001-12-01) *

Also Published As

Publication number Publication date
JP2016057570A (ja) 2016-04-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15839336

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15839336

Country of ref document: EP

Kind code of ref document: A1