WO2022230425A1 - Feature output model generation system - Google Patents
Feature output model generation system
- Publication number
- WO2022230425A1 (PCT/JP2022/012222)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature
- output model
- data
- song
- model generation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
- G10H1/361—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K15/00—Acoustics not otherwise provided for
- G10K15/04—Sound-producing devices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/081—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for automatic key or tonality recognition, e.g. using musical rules or a knowledge base
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/091—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for performance evaluation, i.e. judging, grading or scoring the musical qualities or faithfulness of a performance, e.g. with respect to pitch, tempo or other timings of a reference performance
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/121—Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
- G10H2240/131—Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
- G10H2240/141—Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/311—Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/315—Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
- G10H2250/455—Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/54—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
Definitions
- the present invention relates to a feature output model generation system that generates a feature output model, which receives information based on singing data (time-series data of the voice produced when a song is sung) and outputs a feature amount of that singing data.
- In Patent Document 1, it has been proposed to recommend songs to a user based on the user's singing history in karaoke.
- the key of the song is important when singing at karaoke. Therefore, it is conceivable to recommend the key in which the user sings, similar to the music recommendation described above.
- for such a key recommendation, it is conceivable to use singing data, which is voice data of songs the user sang in the past. By using singing data, a more appropriate key can be recommended.
- singing data can likewise be used to make song recommendations more appropriate.
- One embodiment of the present invention has been made in view of the above, and an object thereof is to provide a feature output model generation system capable of generating a feature output model that appropriately outputs feature values from song data.
- a feature output model generation system according to one embodiment of the present invention generates a feature output model that receives information based on singing data, which is time-series data of voice related to singing of a song, and outputs a feature amount of the singing data.
- the system includes: a singing data acquisition unit that acquires singing data for each of a plurality of songs for use in generating the feature output model; a division unit that divides each piece of singing data acquired by the singing data acquisition unit into a plurality of temporal sections; and a feature output model generation unit that generates, by machine learning from the singing data divided by the division unit, a feature output model that receives information based on the singing data of a section and outputs the feature amount of the singing data of that section.
- the feature output model generation unit performs machine learning using a criterion based on the distance between the feature amounts of singing data related to the same song and the distance between the feature amounts of singing data related to different songs.
- in the feature output model generation system according to one embodiment of the present invention, machine learning is performed using, as criteria, the distance between the feature amounts of singing data relating to the same song and the distance between the feature amounts of singing data relating to different songs, and a feature output model is thereby generated.
- the feature quantity output model generated in this way can output a feature quantity suitable for use in recommendations that take into consideration the degree of similarity between songs. That is, according to the feature quantity output model generation system according to one embodiment of the present invention, it is possible to generate a feature quantity output model that appropriately outputs the feature quantity from the song data.
- the feature value output model generated by one embodiment of the present invention can output feature values suitable for use in recommendations that take into consideration the degree of similarity between songs. That is, according to one embodiment of the present invention, it is possible to generate a feature quantity output model that appropriately outputs a feature quantity from song data.
- FIG. 3 is a diagram for explaining generation of a feature output model by machine learning. FIG. 4 is a graph showing an example of feature amounts output by a feature output model. The remaining drawings are a flowchart showing processing executed by the feature output model generation system according to the embodiment of the present invention, and a diagram showing the hardware configuration of that system.
- FIG. 1 shows the functional configuration of the feature output model generation system 10 according to this embodiment.
- the feature output model generation system 10 is a system (apparatus) that generates a feature output model.
- the feature value output model (feature value generation model) inputs information based on singing data, which is time-series data of voice related to singing of a song, and generates and outputs the feature value of the singing data.
- the singing data is, for example, data based on the user's voice recorded when the user sings a song in karaoke. Specific data of the song data will be described later.
- the singing data does not necessarily have to be sung by the user as long as it is time-series data of the voice associated with singing of the song.
- the singing data may be data (model data) based on a voice that assumes a singing that can get a perfect score in a karaoke scoring system.
- the feature amount of the singing data output by the feature output model is used, for example, to recommend a key (key change) for the user to sing a song in karaoke, or to recommend a song. How the output feature amount is specifically used will be described later.
- the feature output model generation system 10 generates (trains) a feature output model by machine learning. That is, the feature output model is a trained model (machine learning model) obtained by machine learning.
- the feature output model generation system 10 is implemented by a computer such as a PC (personal computer) or a server device, for example. Also, the feature output model generation system 10 may be realized by a plurality of computers, that is, computer systems.
- as shown in FIG. 1, the feature output model generation system 10 includes a singing data acquisition unit 11, a division unit 12, and a feature output model generation unit 13.
- the singing data acquisition unit 11 is a functional unit that acquires singing data for each of a plurality of songs, which is used to generate the feature value output model.
- the singing data acquisition unit 11 may acquire singing data that includes time-series data indicating how long each pitch lasts.
- the singing data is, for example, information (note information) indicating, for each run of the same pitch, how long the voice held that pitch, as shown in FIG. 2.
- the song data is information for each song, and for example, it is associated with the ID of the song so that it is possible to identify which song the data is related to.
- the duration is indicated, for example, by the elapsed time from the start time when the music is played.
- the values in the "pitch" column shown in FIG. 2 indicate the pitch. Specifically, each value is a note number (MIDI note number). For example, a value of 62 in the "pitch" column corresponds to D4 in international (scientific) pitch notation.
- the "time_from" and "time_to" columns shown in FIG. 2 indicate when singing (vocalization) at the corresponding pitch starts and ends.
- the units of "time_from" and "time_to" are seconds.
- the numbers on the left side of FIG. 2 are the serial numbers of the rows.
- for example, the data in row 0 of FIG. 2 indicates that the pitch of note number 62 continued for 0.332 seconds, from 8.005 seconds to 8.337 seconds measured from the start of the song (0 seconds).
- because of the prelude (intro), the user usually starts singing some time after the start of the song. Therefore, the first "time_from" is not 0 seconds. In the singing data shown in FIG. 2, the first "time_from" is 8.005 seconds.
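As a minimal sketch, one row of the note data described for FIG. 2 could be held in memory as follows. The `Note` class and the `note_name` helper are illustrative assumptions and do not appear in the publication; only the three fields and the row-0 values come from the text. The note-name mapping assumes the common convention in which MIDI note 60 is C4.

```python
from dataclasses import dataclass

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


@dataclass(frozen=True)
class Note:
    """One run of identical pitch (hypothetical container for a FIG. 2 row)."""
    pitch: int        # MIDI note number
    time_from: float  # start of the run, seconds from the start of the song
    time_to: float    # end of the run, seconds from the start of the song

    @property
    def duration(self) -> float:
        # Length of the run in seconds: time_to minus time_from.
        return self.time_to - self.time_from


def note_name(pitch: int) -> str:
    # Assumes MIDI note 60 == C4, so the octave is pitch // 12 - 1.
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"


row0 = Note(pitch=62, time_from=8.005, time_to=8.337)
print(note_name(row0.pitch), round(row0.duration, 3))
```

With the row-0 values quoted above, the note number 62 maps to D4 and the duration is 0.332 seconds, matching the description.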
- the singing data shown in FIG. 2 can be obtained, for example, by analyzing the voice (karaoke music file) sung in a conventional karaoke system or the like.
- the singing data acquisition unit 11 acquires from a karaoke system, for example, singing data derived from the voice recorded when a user sang on that system.
- the song data acquisition unit 11 may acquire raw data of the user's singing voice, and generate and acquire song data by a conventional method.
- the song data acquisition unit 11 may acquire the song data by a method other than the above.
- the song data acquired by the song data acquiring unit 11 does not need to be based on actual singing by the user, and may be data in the format shown in FIG. 2 assuming actual singing.
- the singing data does not necessarily have to be the one shown in FIG. 2, and may be time-series data of the voice related to the singing of the song.
- the song data acquisition unit 11 acquires multiple pieces of song data for different songs.
- the singing data acquired by the singing data acquisition unit 11 may have been sung by any user, or by a plurality of users. The user may be the same as the user whose singing data is later input to the feature output model to output feature amounts (for example, the user who receives recommendations), or a different user.
- the song data acquisition unit 11 acquires a sufficient number of song data to perform machine learning, which will be described later.
- the singing data acquisition unit 11 outputs the acquired singing data to the division unit 12.
- the division unit 12 is a functional unit that divides each song data acquired by the song data acquisition unit 11 into a plurality of temporal intervals. As will be described later, machine learning is performed using singing data divided into temporal intervals.
- the dividing unit 12 performs division as follows.
- the division unit 12 receives the song data from the song data acquisition unit 11.
- the dividing unit 12 divides each input song data into a plurality of sections based on a division rule stored in advance.
- the dividing unit 12 divides each song data into a certain number of sections.
- for example, the dividing unit 12 divides the time period from the first "time_from" to the last "time_to" of the singing data (that is, the time period covered by the singing data) into equal parts so that the above fixed number of sections is obtained, and uses the resulting timings (times) as section delimiters.
- if a delimiter timing falls within a time span of one continuous pitch (that is, the span from "time_from" to "time_to" associated with one "pitch" value), the start or the end of that span, whichever is closer to the timing, is used as the section delimiter instead.
- the same continuous pitch is not divided into a plurality of sections. For example, data Nos. 0 to 4 shown in FIG. 2 are data of one section (first section).
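The division rule described above can be sketched as follows. This is only one possible reading: the function name, the tuple layout `(pitch, time_from, time_to)`, and the tie-breaking choice (a timing exactly midway snaps to the start of the note) are assumptions made for illustration, not details given in the publication.

```python
def split_into_sections(notes, num_sections):
    """Split time-ordered notes into sections without cutting any note.

    notes: list of (pitch, time_from, time_to) tuples sorted by time.
    Returns a list of num_sections lists of notes (some may be empty).
    """
    start, end = notes[0][1], notes[-1][2]
    step = (end - start) / num_sections

    cut_indices = []  # index of the first note of each following section
    for k in range(1, num_sections):
        t = start + k * step       # equal-division timing
        cut = len(notes)
        for i, (_, t_from, t_to) in enumerate(notes):
            if t < t_from:         # timing lies in a gap before note i
                cut = i
                break
            if t <= t_to:          # timing lies inside note i: snap to nearer edge
                cut = i if (t - t_from) <= (t_to - t) else i + 1
                break
        cut_indices.append(cut)

    bounds = [0] + cut_indices + [len(notes)]
    return [notes[a:b] for a, b in zip(bounds, bounds[1:])]


example = [(62, 0.0, 1.0), (64, 1.0, 2.5), (65, 2.5, 4.0)]
sections = split_into_sections(example, 2)
print([len(s) for s in sections])
```

In this toy input the midpoint (2.0 s) falls inside the second note and is closer to its end (2.5 s), so the second note stays whole in the first section.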
- the number of sections to be divided may not be a fixed number, but may be a number preset for each piece of music.
- the timing used for the break may be a timing set in advance for each piece of music, instead of the timing for equal division as described above.
- the dividing unit 12 outputs the singing data and the information indicating the divided sections to the feature output model generation unit 13.
- the feature output model generating unit 13 is a functional unit that generates a feature output model from the song data divided by the dividing unit 12 by machine learning.
- the feature amount output model is a model that inputs information based on the song data of the divided section and outputs the feature amount of the song data of the section.
- the feature output model generation unit 13 performs machine learning based on criteria based on the distance between the feature amounts of the song data of the same song and the distance between the feature amounts of the song data of different songs.
- the feature output model generation unit 13 may perform machine learning so that the distance between the feature amounts of singing data relating to the same song is shorter than the distance between the feature amounts of singing data relating to different songs.
- the feature quantity output model generation unit 13 may determine the section of the song data used for machine learning based on the distance between the feature quantities output by the feature quantity output model in the middle of generation.
- the feature output model generation unit 13 may convert the pitch-duration data included in the singing data divided by the division unit 12 into words, one word per run of identical pitch, each word being a character string that reflects how long the pitch lasts, and may generate a feature output model that receives information based on the converted words.
- the feature quantity output by the feature quantity output model is a vector with a preset number of dimensions (N-dimensional vector). That is, the feature output model is a model for embedding. As will be described later, the song data of the section is converted into words, and feature amounts are generated from the converted words.
- the feature output model includes, for example, a neural network. More specifically, the feature output model is, for example, an LSTM (long short-term memory) network, which is suited to time-series data.
- the feature amount output model may be any model other than the above as long as it is a model that is generated by machine learning, inputs song data of a section, and outputs the feature amount of the song data of the section.
- the degree of similarity between songs can be a very important feature with respect to the user's taste or ease of singing.
- the degree of similarity between pieces of music refers to how much the pieces of music have the same rhythm, melody, and scale patterns, and how close the pitches of the sounds are.
- the feature value output model inputs information based on singing data.
- the feature amount output model is generated in consideration of the distance between the feature amounts of the singing data related to the music.
- the feature quantity output by the feature quantity output model generated in this manner takes into account the degree of similarity between the pieces of music.
- Music is expressed by rhythm, melody, scale patterns, and the like, but conventionally, it has been difficult to calculate the degree of similarity between songs with respect to individual features such as rhythm, melody, and scale patterns.
- the feature quantity output model generation unit 13 generates a feature quantity output model as follows.
- the feature output model generation unit 13 receives the singing data and the information indicating the divided sections from the division unit 12.
- the feature output model generation unit 13 converts each continuous pitch of the singing data (each line in FIG. 2) into a word, which is a character string corresponding to the length of the pitch.
- the feature quantity output model generation unit 13 calculates the temporal length of the same continuous pitch. The length in seconds can be calculated by subtracting the "time_from" value from the "time_to” value.
- the feature quantity output model generation unit 13 divides the calculated length by a preset unit time.
- the preset unit time is, for example, 0.1 seconds.
- the feature output model generation unit 13 converts the calculated value to an integer using a preset method, e.g., rounding to the nearest integer, rounding down, or rounding up.
- the feature output model generation unit 13 forms the word for the run of identical pitch by concatenating the value indicating the pitch as many times as the calculated integer.
- the word thus indicates how long the pitch lasts. In the example above, the word is formed according to how many 0.1-second units the pitch lasts. A word expresses both the pitch and the duration of a sound.
- the feature output model generation unit 13 converts all pitch data into words.
- the feature output model generation unit 13 thus encodes the singing data based on the elapsed time of the same pitch (same sound). As a result, irregularly continuous song data (sound information) can be treated as information used for generating a feature output model. With the above encoding, it is expected that information with the same pitch but slightly different lengths, information with close pitches but the same length, etc. can be treated as similar information.
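The encoding steps above (duration, division by a unit time, conversion to an integer, repetition of the pitch value) can be sketched as follows. The function name and the `rounding` parameter are hypothetical; the 0.1-second unit and the row-0 values of FIG. 2 come from the text. Note that a very short run can round to zero and yield an empty word; how that case is handled is not stated in the source.

```python
import math


def note_to_word(pitch, time_from, time_to, unit=0.1, rounding="nearest"):
    """Encode one run of identical pitch as a word (illustrative sketch)."""
    units = (time_to - time_from) / unit   # duration in unit-time steps
    if rounding == "down":
        count = math.floor(units)
    elif rounding == "up":
        count = math.ceil(units)
    else:                                   # round half up to the nearest integer
        count = math.floor(units + 0.5)
    # Repeat the pitch value `count` times, e.g. 62 three times -> "626262".
    return str(pitch) * count


print(note_to_word(62, 8.005, 8.337))                 # row 0 of FIG. 2
print(note_to_word(62, 8.005, 8.337, rounding="up"))
```

For row 0, the duration of 0.332 seconds is about 3.32 units, so rounding to nearest gives the word "626262" and rounding up gives "62626262".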
- the feature output model generation unit 13 converts each converted word into an input feature to be input to the feature output model as preprocessing for generating the feature output model.
- here, "feature amount" by itself refers to the output of the feature output model, and the feature amount converted from a word is called the input feature amount.
- the input feature amount is a vector with a preset number of dimensions. Conversion from words to input feature amounts is performed, for example, by a conventional natural language processing technique. Specifically, the conversion is performed by a model generated with fastText. The words described above may be used as training data for the fastText model.
- fastText takes into account the groups of characters (subwords) that make up a word, so that words with similar spellings are grouped as close in meaning. Therefore, although "7171" and "717171" are different words, they are treated as close in semantic distance because they share the component "71".
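To illustrate why two such words end up close, the following sketch lists the character n-grams of each word and shows their overlap. It does not call fastText itself; it only mimics the idea of subword units with boundary markers, and the n-gram range of 2 to 4 is an arbitrary choice made here for illustration.

```python
def char_ngrams(word, n_min=2, n_max=4):
    """Return the set of character n-grams of a word padded with < and >."""
    padded = f"<{word}>"
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


a = char_ngrams("7171")
b = char_ngrams("717171")
shared = a & b

# Jaccard overlap of the two subword sets: 1.0 would mean identical sets.
overlap = len(shared) / len(a | b)
print(sorted(shared), round(overlap, 2))
```

Because a subword model builds a word vector from the vectors of its n-grams, words that share many n-grams, such as these two, tend to receive similar vectors.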
- the feature quantity output model generation unit 13 uses the input feature quantity converted from the words included in the section in the order in which the words appear in the singing data as input to the feature quantity output model of the section.
- the input layer is provided with neurons as many as the number of dimensions of the vector that is the feature value for input.
- neurons of the number of dimensions (N described above) of vectors, which are feature amounts, are provided in the output layer.
- the feature output model receives the input feature amounts included in the section one at a time, in order.
- the feature value output model outputs feature values when all input feature values included in the section are input.
- the feature quantity output model generation unit 13 performs machine learning for generating a feature quantity output model with the information of the three sections as one set.
- the three segments are a predetermined segment (anchor), another segment of the same music as the predetermined segment (positive), and another segment of the music different from the predetermined segment (negative).
- the feature output model generation unit 13 selects (extracts) the above three sections.
- the feature quantity output model generation unit 13 performs machine learning using the information (set of input feature quantities) of the three sections selected as described above shown in FIG. 3 as input for the feature quantity output model. From the feature amount output model, feature amounts are obtained as outputs for each of the three sections. Specifically, as shown in FIG. 3, Anchor Embedding, which is the feature quantity of the anchor, is obtained from the Anchor Seq, which is the input anchor information, by Embedding Net, which is the feature quantity output model. Positive Embedding, which is a positive feature quantity, is obtained from Positive Seq, which is positive information. From Negative Seq, which is negative information, Negative Embedding, which is a negative feature quantity, is obtained. Although words are shown to be input to the feature output model in FIG. 3, input feature values are actually input to the feature output model.
- the feature output model generation unit 13 performs machine learning based on the output feature amounts, that is, the Anchor Embedding, Positive Embedding, and Negative Embedding described above. As shown on the right side of FIG. 3, the feature output model generation unit 13 performs machine learning so that the distance D1 between the Anchor Embedding and the Positive Embedding (the anchor-positive distance) becomes shorter than the distance D2 between the Anchor Embedding and the Negative Embedding (the anchor-negative distance).
- the above distance may be the Euclidean distance in the N-dimensional space, which is the vector space (feature space) of the feature quantity, or may be any other distance.
- the feature output model generation unit 13 performs machine learning with the following loss function Loss( ).
- $\mathrm{Loss}(A, P, N) = \mathrm{Max}\left(\lVert f(A) - f(P) \rVert - \lVert f(A) - f(N) \rVert + \alpha,\ 0\right)$
- A is the Anchor Seq.
- P is the Positive Seq.
- N is the Negative Seq.
- f(X) is the Embedding vector obtained as an output when X is input to the Embedding Net.
- $\lVert f(A) - f(P) \rVert$ is the anchor-positive distance.
- $\lVert f(A) - f(N) \rVert$ is the anchor-negative distance.
- $\alpha$ is a preset hyperparameter, and is a value indicating how large the difference between the anchor-positive distance and the anchor-negative distance should be.
- Max(X, Y) is a function whose function value is the larger of X and Y. Machine learning itself based on the loss function may be performed in the same manner as before.
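The loss described here has the shape of a margin-based triplet loss. A minimal sketch in plain Python follows, assuming the Euclidean distance (the text allows other distances) and an arbitrary margin of 0.2; the function names and the toy vectors are hypothetical.

```python
import math


def euclidean(u, v):
    # Euclidean distance between two embedding vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    d_ap = euclidean(anchor, positive)   # anchor-positive distance
    d_an = euclidean(anchor, negative)   # anchor-negative distance
    return max(d_ap - d_an + margin, 0.0)


# Negative already far from the anchor: the loss is zero.
easy = triplet_loss((0.0, 0.0), (0.1, 0.0), (1.0, 0.0))
# Negative almost as close as the positive: the loss is positive.
hard = triplet_loss((0.0, 0.0), (0.1, 0.0), (0.2, 0.0))
print(easy, round(hard, 3))
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so training effort goes to triplets that still violate that ordering.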
- the feature output model generation unit 13 selects three sections (anchor, positive, and negative) from the sections indicated by the information input from the dividing unit 12, and performs machine learning using the information of the selected sections.
- the feature output model generation unit 13 generates the feature output model by repeatedly selecting three sections and performing machine learning. For example, as in conventional practice, the feature output model generation unit 13 repeats these operations until training converges according to a preset condition, or for a predetermined number of times.
- the selection of the three sections, anchor, positive and negative is performed as follows.
- the feature output model generation unit 13 selects three sections by random sampling.
- for each iteration, that is, for each epoch of learning, the feature output model generation unit 13 randomly selects a song, randomly selects two sections included in that song, and uses them as the anchor and the positive.
- the feature output model generation unit 13 randomly selects a song other than the selected song, and randomly selects one section included in the selected song to be negative.
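The random selection of anchor, positive, and negative can be sketched as follows. The dictionary layout (song ID mapped to a list of sections) and the function name are assumptions made for illustration; a section here is any object, such as a list of words.

```python
import random


def sample_triplet(sections_by_song, rng=random):
    """Pick (anchor, positive, negative) by random sampling.

    sections_by_song: dict mapping a song ID to its list of sections.
    """
    # The anchor song needs at least two sections (anchor and positive).
    eligible = [s for s, secs in sections_by_song.items() if len(secs) >= 2]
    anchor_song = rng.choice(eligible)
    anchor, positive = rng.sample(sections_by_song[anchor_song], 2)

    # The negative comes from a different, non-empty song.
    others = [s for s, secs in sections_by_song.items()
              if s != anchor_song and secs]
    negative = rng.choice(sections_by_song[rng.choice(others)])
    return anchor, positive, negative


toy = {"song1": ["a1", "a2", "a3"], "song2": ["b1", "b2"]}
anchor, positive, negative = sample_triplet(toy, random.Random(0))
print(anchor, positive, negative)
```

Passing a seeded `random.Random` instance makes the sampling reproducible, which can help when comparing training runs.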
- the feature quantity output model generation unit 13 may select negative using a feature quantity output model that is in the process of being generated.
- anchors and positives are selected as above.
- the feature output model generation unit 13 randomly selects a song other than the songs containing anchors and positives. At this time, there may be a plurality of different songs.
- the feature output model generation unit 13 randomly selects sections included in the selected music as negative candidates. At this time, there are a plurality of negative candidates, for example, a preset number (N).
- the feature output model generation unit 13 uses the feature output model that is in the process of being generated to calculate the feature amount of the anchor and the feature amount of each negative candidate. Subsequently, the feature output model generation unit 13 calculates, from the calculated feature amounts, the distance between the anchor and each negative candidate. The feature output model generation unit 13 adopts the negative candidates whose calculated distance is within a preset threshold as the negatives to be used for machine learning. Adopting negatives in this way requires a distance calculation for each sampling, but in return learning concentrates on negatives that should be far from the anchor yet currently lie close to it.
- Negative candidates may be all sections of all songs other than the anchor song. [...] can be used to determine the negatives. As a result, the processing speed of machine learning can be increased. Note that the processing of determining the sections used for machine learning may be performed as mini-batch processing.
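The threshold-based selection of negatives described above can be sketched as follows. The function name `select_negatives` and the callable `embed` (standing in for the feature quantity output model in the process of being generated) are hypothetical, and the use of Euclidean distance is an assumption, since the description does not fix a particular distance measure.

```python
import math


def select_negatives(embed, anchor, candidates, threshold):
    """Keep the negative candidates whose distance to the anchor is within threshold.

    embed:      callable mapping a section to its feature vector (hypothetical
                stand-in for the model that is still being trained).
    anchor:     the anchor section.
    candidates: iterable of negative candidate sections.
    threshold:  preset distance threshold.
    """
    anchor_vec = embed(anchor)
    selected = []
    for cand in candidates:
        # Euclidean distance is assumed here; the source does not specify the metric.
        dist = math.dist(anchor_vec, embed(cand))
        if dist <= threshold:
            selected.append(cand)
    return selected
```

Because `embed` is evaluated with the current model, the set of adopted negatives changes as learning progresses.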
- The feature output model generation unit 13 outputs the generated feature output model. For example, it transmits the feature output model to another device or module that uses the feature output model.
- Alternatively, the feature output model generation unit 13 may store the generated feature output model in the feature output model generation system 10 so that it can be used by other devices or modules that use the feature output model.
- FIG. 4 shows an example of the feature quantity output by the feature quantity output model generated by the feature quantity output model generation system 10.
- one point corresponds to the feature amount of one section. Points of the same color (same density) correspond to points of the same song.
- The feature quantity, which is a high-dimensional vector, is converted (dimensionally compressed) into a three-dimensional vector.
- The feature amounts of sections included in the same song are located close to each other.
- The points included in each of the rectangular areas A1 and A2 correspond to the feature amounts of sections included in the same song. Note that even if the songs are different, the feature amounts of sections of songs with similar melody, genre, etc. tend to be located close to each other.
- For example, the distance between the feature amounts of sections of pop songs is shorter than the distance between the feature amounts of sections of a pop song and an enka song.
- a feature output model which is a learned model generated by the feature output model generation system 10, is assumed to be used as a program module that is part of artificial intelligence software.
- the feature output model is used, for example, in a computer having a CPU (Central Processing Unit) and memory, and the CPU of the computer operates according to instructions from the feature output model stored in the memory.
- the CPU of the computer operates to input information to the feature output model, perform calculations according to the feature output model, and output results from the feature output model, according to the command.
- The CPU of the computer operates, according to the command, to input information to the input layer of the neural network, perform calculations based on the weighting coefficients that have been learned in the neural network, and output the results from the output layer of the neural network.
- the feature value output model generated as above is used as follows.
- the feature output model is used for recommending keys or songs when songs are sung in karaoke. Specifically, it is used when making the above-mentioned recommendation based on past singing performance data of the user targeted for the recommendation.
- the song record data includes the same song data as the song data acquired by the song data acquisition unit 11 described above.
- a feature value is generated from the song data using a feature value output model.
- information for example, the above-described input feature value
- Information to be input to the feature value output model is information for each section of the song data. Division into intervals is performed in the same manner as above.
- The feature amounts for the sections may be arranged in section order for each song and concatenated to form the feature amount for that song. That is, the feature amounts may be concatenated.
- In that case, the dimension of the vector that is the feature amount for each song is the number of dimensions of the vector output by the feature amount output model × the number of sections of the song.
- The vectors may also be integrated (averaged or added) when they are combined, to obtain a lower-dimensional feature amount for each song.
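The two ways of forming a per-song feature amount described above can be sketched as follows. The function names `concat_features` and `mean_features` are hypothetical, and feature vectors are represented as plain Python lists purely for illustration.

```python
def concat_features(section_features):
    """Concatenate per-section vectors, in section order, into one per-song vector."""
    return [x for vec in section_features for x in vec]


def mean_features(section_features):
    """Average per-section vectors element-wise into one lower-dimensional vector."""
    n = len(section_features)
    return [sum(col) / n for col in zip(*section_features)]


# Two sections, each with a 2-dimensional feature vector.
sections = [[1.0, 2.0], [3.0, 4.0]]
song_concat = concat_features(sections)  # dimension = 2 dimensions x 2 sections = 4
song_mean = mean_features(sections)      # dimension stays at 2
```

Concatenation keeps the order of the sections but makes the dimension depend on the number of sections, whereas averaging gives every song a vector of the same dimension.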
- singing data which is model data for each key of a piece of music that the user wants to sing
- the pitch value shown in FIG. 2 may be changed. For example, if the pitch value is 62, change it to 63 when moving up one key. As a result, if the word before change corresponding to the pitch is "626262", the word after change is "636363".
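The key change in the example above can be sketched as follows. The representation of a note as a `(pitch, length)` pair and the function names `shift_key` and `note_to_word` are assumptions made for illustration; the word is formed by repeating the pitch value once per unit of length, which matches the "626262" example.

```python
def note_to_word(pitch, length):
    """Build the word for one note by repeating its pitch value `length` times."""
    return str(pitch) * length


def shift_key(notes, steps):
    """Shift every (pitch, length) note by `steps` keys, keeping the lengths."""
    return [(pitch + steps, length) for pitch, length in notes]


original = [(62, 3)]
raised = shift_key(original, 1)
words_before = [note_to_word(p, n) for p, n in original]  # ["626262"]
words_after = [note_to_word(p, n) for p, n in raised]     # ["636363"]
```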
- The recommendation is made, for example, by using a recommendation model (Dense model) that takes as input the feature amount related to the singing record and the feature amount related to the model data, and outputs a value indicating how well the user's singing matches the key of the model data.
- The above value is calculated for each of a plurality of model data with mutually different keys, and the key with the highest value is recommended.
- the recommendation model may be created using a conventional machine learning method or the like.
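The selection of the key with the highest value can be sketched as follows. The function name `recommend_key`, the callable `score_fn` (standing in for the recommendation model), and the dictionary `key_features` are hypothetical; the toy scoring function below is only a placeholder and is not the recommendation model of the embodiment.

```python
def recommend_key(score_fn, user_feature, key_features):
    """Score each key's model data against the user's feature and pick the best key.

    score_fn:     callable (user_feature, model_feature) -> float, a hypothetical
                  stand-in for the recommendation model.
    user_feature: feature vector derived from the user's singing record.
    key_features: dict mapping a key to the feature vector of its model data.
    """
    scores = {key: score_fn(user_feature, feat) for key, feat in key_features.items()}
    best_key = max(scores, key=scores.get)
    return best_key, scores


def toy_score(u, m):
    """Placeholder score: negative squared distance, so closer means higher."""
    return -sum((a - b) ** 2 for a, b in zip(u, m))


best, all_scores = recommend_key(
    toy_score, [0.0, 1.0], {-1: [2.0, 2.0], 0: [0.0, 1.5], 1: [5.0, 5.0]}
)
```

The same pattern applies to song recommendation by replacing the per-key model data with model data for each candidate song.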
- the model data of the song that is a candidate for recommending the user to sing should be used.
- The recommendation is made, for example, by using a recommendation model that takes as input the feature value related to the singing record and the feature value related to the model data, and outputs a value indicating how strongly the song related to the model data should be recommended. The above value may be calculated for model data of a plurality of mutually different songs, and the song with the highest value may be recommended.
- The above is an example of recommendation using the feature values output by the feature value output model, and the feature values may be used for recommendations other than the above.
- the feature output model may be used for purposes other than recommendation.
- the functions of the feature output model generation system 10 according to the present embodiment have been described above.
- the song data acquisition unit 11 acquires song data for each of a plurality of songs (S01). Subsequently, each song data is divided into a plurality of time intervals by the division unit 12 (S02). Subsequently, the singing data of each section is converted into the input format of the feature output model by the feature output model generation unit 13 (S03). Specifically, the singing data is converted into words according to the pitch length. Also, the word is converted into an input feature amount.
- the section used for machine learning is determined by the feature output model generation unit 13 (S04). Specifically, the above-described three sections of anchor, positive, and negative are determined. Subsequently, the feature amount output model generating unit 13 performs machine learning for generating a feature amount output model using the determined section information (S05). Specifically, as described above, machine learning is performed based on a criterion based on the distance between the feature amounts of the song data of the same song and the distance between the feature amounts of the song data of different songs. Subsequently, the feature output model generation unit 13 determines whether or not to end the machine learning (S06).
- If it is determined that the machine learning is not to end, the above processing of S04 to S06 is performed again.
- If it is determined that the machine learning is to end, the generated feature output model is output from the feature output model generation unit 13 (S07). The above is the processing executed by the feature output model generation system 10 according to the present embodiment.
- Machine learning is performed according to a criterion based on the distance between the feature amounts of the song data relating to the same song and the distance between the feature amounts of the song data relating to different songs, and a feature quantity output model is thereby generated.
- the feature quantity output model generated in this manner can output a feature quantity suitable for use in, for example, recommendations that take into consideration the degree of similarity between songs as described with reference to FIG. That is, according to this embodiment, it is possible to generate a feature quantity output model that appropriately outputs a feature quantity from song data. As a result, similar songs can be extracted, and keys or songs can be recommended with higher accuracy.
- Machine learning may be performed so that the distance between the feature amounts of the song data related to the same song is shorter than the distance between the feature amounts of the song data related to different songs. According to this configuration, it is possible to reliably generate a feature quantity output model that appropriately outputs the feature quantity from the song data. However, as long as the machine learning is performed according to a criterion based on the distance between the feature amounts of the song data relating to the same song and the distance between the feature amounts of the song data relating to different songs, it does not necessarily have to be performed as described above.
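One common way to express such a criterion is a triplet margin loss, sketched below. This is an assumption for illustration: the description states only that the same-song distance should become shorter than the different-song distance, and does not name this loss, the Euclidean distance, or the `margin` parameter used here.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: zero once the negative is farther than the positive by margin.

    anchor, positive, negative: feature vectors (sequences of floats).
    margin: hypothetical hyperparameter, not taken from the source.
    """
    d_pos = math.dist(anchor, positive)  # same-song distance
    d_neg = math.dist(anchor, negative)  # different-song distance
    return max(0.0, d_pos - d_neg + margin)


# Same-song section is close, different-song section is far: no penalty.
loss_good = triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0))
# Roles reversed: the loss is positive and learning would push the vectors apart.
loss_bad = triplet_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0))
```

Minimizing this value over many triplets moves sections of the same song closer together and sections of different songs farther apart.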
- The sections of the song data used for machine learning (for example, the three sections of anchor, positive, and negative described above) may be determined. According to this configuration, machine learning can be performed efficiently, and as a result, a feature output model in which learning has advanced further can be generated with the same amount of learning processing. Alternatively, the amount of learning processing needed to generate the feature quantity output model can be reduced. However, the sections used for machine learning need not be determined as described above.
- the song data may include data indicating the length of the time-series pitch. According to this configuration, it is possible to reliably and appropriately generate a feature output model.
- The data indicating the length of the pitch included in the divided singing data may be converted into a word, which is a character string corresponding to the length of the pitch for each run of consecutive identical pitches, and a feature value output model that takes as input information based on the converted word may be generated. According to this configuration, the song data can be handled appropriately and easily by using the above-described conventional natural language processing technique in generating the feature quantity output model. As a result, a feature output model can be generated appropriately and easily.
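The conversion into words can be sketched as follows. The sketch assumes that the singing data is a series of integer pitch values sampled at fixed time steps and that a word is formed by repeating the pitch value once per time step, which is consistent with the "626262" example given earlier; the function name `pitches_to_words` is hypothetical.

```python
from itertools import groupby


def pitches_to_words(pitches):
    """Turn a time series of pitch values into one word per run of identical pitch."""
    words = []
    for pitch, run in groupby(pitches):
        length = sum(1 for _ in run)  # consecutive time steps at this pitch
        words.append(str(pitch) * length)
    return words


section_words = pitches_to_words([62, 62, 62, 64, 64])  # ["626262", "6464"]
```

The resulting list of words can then be treated like a sentence and passed to a natural language processing technique to obtain the input feature amount.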
- the singing data does not need to be the above data, and may be time-series data of the voice related to the singing of the song.
- the singing data need not be treated as words as described above, and may be treated in any format as long as it can be input to the feature value output model.
- Each functional block may be implemented using one device that is physically or logically coupled, or using two or more devices that are physically or logically separated and connected directly or indirectly (e.g., by wire or wirelessly).
- a functional block may be implemented by combining software in the one device or the plurality of devices.
- Functions include judging, deciding, determining, calculating, computing, processing, deriving, investigating, searching, checking, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, considering, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating, mapping, assigning, etc., but are not limited to these.
- a functional block (component) responsible for transmission is called a transmitting unit or transmitter.
- the implementation method is not particularly limited.
- the feature output model generation system 10 in one embodiment of the present disclosure may function as a computer that performs the information processing of the present disclosure.
- FIG. 6 is a diagram showing an example of a hardware configuration of the feature output model generation system 10 according to an embodiment of the present disclosure.
- The feature output model generation system 10 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
- The term “apparatus” can be read as a circuit, device, unit, or the like.
- the hardware configuration of the feature output model generation system 10 may be configured to include one or more of each device shown in the figure, or may be configured without some of the devices.
- Each function in the feature output model generation system 10 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, so that the processor 1001 performs calculations, controls communication by the communication device 1004, and controls at least one of reading and writing of data in the memory 1002 and the storage 1003.
- the processor 1001 for example, operates an operating system and controls the entire computer.
- the processor 1001 may be configured by a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, registers, and the like.
- each function in the feature output model generation system 10 described above may be implemented by the processor 1001 .
- the processor 1001 reads programs (program codes), software modules, data, etc. from at least one of the storage 1003 and the communication device 1004 to the memory 1002, and executes various processes according to them.
- As the program, a program that causes a computer to execute at least part of the operations described in the above embodiments is used.
- each function in the feature output model generation system 10 may be stored in the memory 1002 and implemented by a control program running on the processor 1001 .
- The processor 1001 may be implemented by one or more chips.
- the program may be transmitted from a network via an electric communication line.
- The memory 1002 is a computer-readable recording medium, and may be composed of at least one of, for example, ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), RAM (Random Access Memory), and the like.
- the memory 1002 may also be called a register, cache, main memory (main storage device), or the like.
- the memory 1002 can store executable programs (program code), software modules, etc. for performing information processing according to an embodiment of the present disclosure.
- the storage 1003 is a computer-readable recording medium, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disc, a magneto-optical disc (for example, a compact disc, a digital versatile disc, a Blu-ray disk), smart card, flash memory (eg, card, stick, key drive), floppy disk, magnetic strip, and/or the like.
- Storage 1003 may also be called an auxiliary storage device.
- a storage medium included in the feature output model generation system 10 may be, for example, a database including at least one of the memory 1002 and the storage 1003, a server, or other suitable medium.
- the communication device 1004 is hardware (transmitting/receiving device) for communicating between computers via at least one of a wired network and a wireless network, and is also called a network device, a network controller, a network card, a communication module, or the like.
- the input device 1005 is an input device (for example, keyboard, mouse, microphone, switch, button, sensor, etc.) that receives input from the outside.
- the output device 1006 is an output device (eg, display, speaker, LED lamp, etc.) that outputs to the outside. Note that the input device 1005 and the output device 1006 may be integrated (for example, a touch panel).
- Each device such as the processor 1001 and the memory 1002 is connected by a bus 1007 for communicating information.
- the bus 1007 may be configured using a single bus, or may be configured using different buses between devices.
- The feature output model generation system 10 includes hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and part or all of each functional block may be realized by the hardware.
- processor 1001 may be implemented using at least one of these pieces of hardware.
- Input/output information may be stored in a specific location (for example, memory) or managed using a management table. Input/output information and the like can be overwritten, updated, or appended. The output information and the like may be deleted. The entered information and the like may be transmitted to another device.
- The determination may be made by a value represented by one bit (0 or 1), by a true/false value (Boolean: true or false), or by numerical comparison (for example, comparison with a predetermined value).
- Notification of predetermined information is not limited to being performed explicitly, and may be performed implicitly (for example, by not notifying the predetermined information).
- Software, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, and the like.
- software, instructions, information, etc. may be transmitted and received via a transmission medium.
- For example, when the software is transmitted from a website, a server, or another remote source using at least one of wired technology (coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), etc.) and wireless technology (infrared, microwave, etc.), at least one of these wired and wireless technologies is included within the definition of a transmission medium.
- The terms “system” and “network” used in this disclosure are used interchangeably.
- Information, parameters, etc. described in the present disclosure may be expressed using absolute values, using relative values from a predetermined value, or using other corresponding information.
- The terms “judging” and “determining” used in this disclosure may encompass a wide variety of actions.
- “Judging” and “determining” may include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up, searching, or inquiring (e.g., looking up in a table, a database, or another data structure), or ascertaining, as having “judged” or “determined”.
- “Judging” and “determining” may also include regarding receiving (e.g., receiving information), transmitting (e.g., transmitting information), input, output, or accessing (e.g., accessing data in memory) as having “judged” or “determined”.
- “Judging” and “determining” may also include regarding resolving, selecting, choosing, establishing, comparing, etc. as having “judged” or “determined”.
- That is, “judging” and “determining” may include regarding some action as having “judged” or “determined”.
- “Judging (determining)” may also be read as “assuming”, “expecting”, “considering”, or the like.
- The term “connection” means any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are “connected” or “coupled” to each other. Couplings or connections between elements may be physical, logical, or a combination thereof. For example, “connection” may be read as “access”.
- As used in this disclosure, two elements can be considered to be “connected” or “coupled” to each other using at least one of one or more wires, cables, and printed electrical connections and, as some non-limiting and non-exhaustive examples, using electromagnetic energy having wavelengths in the radio frequency domain, the microwave domain, and the optical (both visible and invisible) domain.
- any reference to elements using the "first,” “second,” etc. designations used in this disclosure does not generally limit the quantity or order of those elements. These designations may be used in this disclosure as a convenient method of distinguishing between two or more elements. Thus, reference to a first and second element does not imply that only two elements can be employed or that the first element must precede the second element in any way.
- The phrase “A and B are different” may mean “A and B are different from each other.”
- The phrase may also mean that “A and B are each different from C”.
- Terms such as “separate,” “coupled,” etc. may also be interpreted in the same manner as “different.”
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Reverberation, Karaoke And Other Acoustics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023517139A JP7784420B2 (ja) | 2021-04-27 | 2022-03-17 | 特徴量出力モデル生成システム |
| US18/554,031 US20240371394A1 (en) | 2021-04-27 | 2022-03-17 | Feature amount output model generation system |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2021074895 | 2021-04-27 | ||
| JP2021-074895 | 2021-04-27 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022230425A1 (ja) | 2022-11-03 |
Family
ID=83848395
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2022/012222 Ceased WO2022230425A1 (ja) | 2021-04-27 | 2022-03-17 | 特徴量出力モデル生成システム |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20240371394A1 |
| JP (1) | JP7784420B2 |
| WO (1) | WO2022230425A1 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2004510176A (ja) * | 2000-06-29 | 2004-04-02 | ミュージックゲノム.コム インコーポレイテッド | セルラー・ネットワークを通して音楽コンテンツを配信するために、音楽に対する好みを予測するためのシステムの使用 |
| US20150220633A1 (en) * | 2013-03-14 | 2015-08-06 | Aperture Investments, Llc | Music selection and organization using rhythm, texture and pitch |
| WO2021005985A1 (ja) * | 2019-07-05 | 2021-01-14 | 株式会社Nttドコモ | 楽曲レコメンド用モデル生成システム及び楽曲レコメンドシステム |
| WO2021005984A1 (ja) * | 2019-07-05 | 2021-01-14 | 株式会社Nttドコモ | 楽曲レコメンド用モデル生成システム及び楽曲レコメンドシステム |
| CN113297412A (zh) * | 2020-02-24 | 2021-08-24 | 北京达佳互联信息技术有限公司 | 音乐推荐方法、装置、电子设备和存储介质 |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7825321B2 (en) * | 2005-01-27 | 2010-11-02 | Synchro Arts Limited | Methods and apparatus for use in sound modification comparing time alignment data from sampled audio signals |
| JP5147389B2 (ja) * | 2007-12-28 | 2013-02-20 | 任天堂株式会社 | 楽曲提示装置、楽曲提示プログラム、楽曲提示システム、楽曲提示方法 |
| US12175957B2 (en) * | 2018-08-06 | 2024-12-24 | Spotify Ab | Automatic isolation of multiple instruments from musical mixtures |
| US12141708B2 (en) * | 2019-07-05 | 2024-11-12 | Ntt Docomo, Inc. | Type estimation model generation system and type estimation system |
| JP7714543B2 (ja) * | 2020-06-09 | 2025-07-29 | 株式会社Nttドコモ | 推奨情報提供装置 |
| JP7682175B2 (ja) * | 2020-06-09 | 2025-05-23 | 株式会社Nttドコモ | 予測装置 |
| US20240370489A1 (en) * | 2021-05-10 | 2024-11-07 | Ntt Docomo, Inc. | Recommendation system |
-
2022
- 2022-03-17 JP JP2023517139A patent/JP7784420B2/ja active Active
- 2022-03-17 US US18/554,031 patent/US20240371394A1/en active Pending
- 2022-03-17 WO PCT/JP2022/012222 patent/WO2022230425A1/ja not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2022230425A1 | 2022-11-03 |
| JP7784420B2 (ja) | 2025-12-11 |
| US20240371394A1 (en) | 2024-11-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Jamdar et al. | Emotion analysis of songs based on lyrical and audio features | |
| Chowdhury et al. | Towards explainable music emotion recognition: The route via mid-level features | |
| CN100454298C (zh) | 旋律数据库搜索 | |
| Pearce et al. | Melodic grouping in music information retrieval: New methods and applications | |
| US20090306797A1 (en) | Music analysis | |
| JP2003519845A (ja) | 音楽検索エンジン | |
| CN111046217B (zh) | 组合歌曲生成方法、装置、设备以及存储介质 | |
| Granroth-Wilding et al. | Harmonic analysis of music using combinatory categorial grammar | |
| Lazzari et al. | Pitchclass2vec: Symbolic music structure segmentation with chord embeddings | |
| JP7724205B2 (ja) | 分割装置 | |
| Park et al. | Music plagiarism detection based on siamese cnn | |
| JP2000347659A (ja) | 音楽検索装置,音楽検索方法および音楽検索プログラムを記録した記録媒体 | |
| CN111863030B (zh) | 音频检测方法及装置 | |
| JP7784420B2 (ja) | 特徴量出力モデル生成システム | |
| CN112989109A (zh) | 一种音乐结构分析方法、电子设备及存储介质 | |
| Duane | Melodic patterns and tonal cadences: Bayesian learning of cadential categories from contrapuntal information | |
| JP2025106476A (ja) | 文章提供方法、プログラムおよび文章提供装置 | |
| CN115631736B (zh) | 歌曲旋律创作方法、介质、装置和计算设备 | |
| Dixon et al. | Probabilistic and logic-based modelling of harmony | |
| Cherla et al. | Automatic phrase continuation from guitar and bass guitar melodies | |
| JP3934556B2 (ja) | 信号識別子の抽出方法及びその装置、信号識別子からデータベースを作成する方法及びその装置、及び、検索時間領域信号を参照する方法及びその装置 | |
| CN117390217A (zh) | 歌曲片段的确定方法、装置、设备和介质 | |
| JP7630356B2 (ja) | 学習済みモデル生成システム | |
| Huang et al. | A repeating pattern based Query-by-Humming fuzzy system for polyphonic melody retrieval | |
| WO2021005984A1 (ja) | 楽曲レコメンド用モデル生成システム及び楽曲レコメンドシステム |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22795347; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023517139; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 22795347; Country of ref document: EP; Kind code of ref document: A1 |