CN117334206A - Music generation method, device, electronic equipment and storage medium


Info

Publication number
CN117334206A
Authority
CN
China
Prior art keywords
sound
music
activity
music generation
participant
Prior art date
Legal status
Pending
Application number
CN202311346438.0A
Other languages
Chinese (zh)
Inventor
姜波
吴卓浩
徐昕
王超
金准
高晨晖
Current Assignee
Beijing Apilang Creative Technology Co., Ltd.
Original Assignee
Beijing Apilang Creative Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Beijing Apilang Creative Technology Co., Ltd.
Priority to CN202311346438.0A
Publication of CN117334206A


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

Embodiments of the present disclosure provide a music generation method, an apparatus, an electronic device and a storage medium. The music generation method includes the following steps: acquiring one or more sounds triggered by one or more participants of an activity, the activity including the one or more participants; determining musical instruments corresponding to the one or more sounds; generating a rhythm music score using a music generation model; and generating music based on the rhythm music score, with the one or more sounds replacing the sounds of the corresponding instruments. Embodiments of the present disclosure can generate personalized music for the participants of an activity.

Description

Music generation method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the technical field of music generation, and in particular to a music generation method, an apparatus, an electronic device and a storage medium.
Background
In computer games and in venues such as amusement parks and museums, audio is an indispensable element. However, in such settings audio is usually used only as background, and the interactions and experiences built around it are lacking and relatively monotonous.
Disclosure of Invention
According to an aspect of the embodiments of the present disclosure, there is provided a music generation method, including: acquiring one or more sounds triggered by one or more participants of an activity, the activity including the one or more participants; determining musical instruments corresponding to the one or more sounds; generating a rhythm music score using a music generation model; and generating music based on the rhythm music score, with the one or more sounds replacing the sounds of the corresponding instruments.
Optionally, the method further includes: determining a music generation condition, where the music generation condition includes a text condition and/or a melody condition; and the generating a rhythm music score using a music generation model includes: generating the rhythm music score based on the music generation condition using the music generation model.
Optionally, the determining a music generation condition includes: determining the music generation condition based on an activity scene of the activity, where the text condition includes a text description associated with the activity scene, and/or the melody condition includes an audio file of a reference melody associated with the activity scene.
Optionally, the determining a music generation condition includes: receiving an input from the one or more participants, where the input includes a text description of the music and/or an audio file of a reference melody; and determining the music generation condition based on the input.
Optionally, the determining the musical instrument corresponding to a sound includes: extracting a sound feature from the sound, the sound feature representing the timbre of the sound; determining timbre similarities between the sound and a plurality of instruments based on the sound feature of the sound and sound features of the plurality of instruments; and determining the instrument corresponding to the sound based on the timbre similarities between the sound and the plurality of instruments.
Optionally, the determining the musical instrument corresponding to a sound includes: identifying the instrument corresponding to the sound using a pre-trained deep learning model.
Optionally, the acquiring one or more sounds triggered by one or more participants of an activity includes: responding to an interaction of the one or more participants with the activity; and, where the interaction triggers sound collection, selecting a sound corresponding to at least one participant from a preset sound library to obtain the sound triggered by the at least one participant, and/or receiving a sound recorded by at least one participant to obtain the sound triggered by the at least one participant.
According to another aspect of the embodiments of the present disclosure, there is provided a music generation apparatus, including: an acquisition module configured to acquire one or more sounds triggered by one or more participants of an activity, the activity including the one or more participants; and a processing module configured to determine the musical instruments corresponding to the one or more sounds, generate a rhythm music score using a music generation model, and generate music based on the rhythm music score, with the one or more sounds replacing the sounds of the corresponding instruments.
According to another aspect of the embodiments of the present disclosure, there is provided an electronic device including: a processor; and a memory storing a program, wherein the program comprises instructions that when executed by the processor cause the processor to perform the method of embodiments of the present disclosure.
According to another aspect of the disclosed embodiments, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the disclosed embodiments.
According to one or more technical solutions provided by the embodiments of the present disclosure, one or more sounds triggered by one or more participants of an activity are acquired, the musical instruments corresponding to the one or more sounds are determined, a rhythm music score is generated using a music generation model, and music is generated based on the rhythm music score with the one or more sounds replacing the sounds of the corresponding instruments, so that personalized music can be generated for the participants of the activity.
Drawings
Further details, features and advantages of the present disclosure are disclosed in the following description of exemplary embodiments, with reference to the following drawings, wherein:
fig. 1 illustrates a flowchart of a music generation method according to an exemplary embodiment of the present disclosure;
fig. 2 illustrates a flowchart of a music generation method according to another exemplary embodiment of the present disclosure;
FIG. 3 is a hardware schematic of one implementation of a system provided by an embodiment of the present disclosure;
FIG. 4 is a hardware schematic diagram of an implementation of an interactive device in a system according to an embodiment of the disclosure;
FIG. 5 is a hardware schematic diagram of another implementation of an interactive device in a system according to an embodiment of the disclosure;
FIG. 6 illustrates a flow chart of a method of implementing music generation in a system of an embodiment of the present disclosure;
fig. 7 shows a schematic block diagram of a music generating apparatus according to an exemplary embodiment of the present disclosure;
fig. 8 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of the functions performed by the devices, modules, or units.
It should be noted that references to "one" or "a plurality" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that "one" should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It should be noted that the execution body of the music generation method provided in the embodiments of the present disclosure may be one or more electronic devices, which is not limited in this disclosure. An electronic device may be a terminal (i.e., a client) or a server; when the execution body includes a plurality of electronic devices, among which there is at least one terminal and at least one server, the music generation method provided by the embodiments of the present disclosure may be executed jointly by the terminal and the server. Accordingly, the terminals referred to herein may include, but are not limited to: smartphones, tablet computers, notebook computers, desktop computers, smart watches, intelligent voice interaction devices, smart home appliances, vehicle terminals, and the like. The server mentioned herein may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network) services, and basic cloud computing services such as big data and artificial intelligence platforms.
Based on the above description, the embodiments of the present disclosure propose a music generation method that can be executed by the above-mentioned electronic device (terminal or server); alternatively, the music generation method may be performed jointly by the terminal and the server.
Fig. 1 shows a flowchart of a music generation method according to an exemplary embodiment of the present disclosure, and as shown in fig. 1, the music generation method includes steps S101 to S104.
Step S101, one or more sounds triggered by one or more participants of an activity are acquired, the activity including the one or more participants.
As one embodiment, acquiring one or more sounds triggered by one or more participants of an activity includes: responding to an interaction of the one or more participants with the activity; and, where the interaction triggers sound collection, selecting a sound corresponding to at least one participant from a preset sound library to obtain the sound triggered by the at least one participant.
As one embodiment, acquiring one or more sounds triggered by one or more participants of an activity includes: responding to an interaction of the one or more participants with the activity; and, where the interaction triggers sound collection, receiving a sound recorded by at least one participant to obtain the sound triggered by the at least one participant.
As one embodiment, acquiring one or more sounds triggered by one or more participants of an activity includes: responding to an interaction of the one or more participants with the activity; and, where the interaction triggers sound collection, selecting a sound corresponding to at least one participant from a preset sound library and receiving a sound recorded by at least one participant, to obtain the one or more sounds triggered by the one or more participants of the activity.
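The two acquisition paths above can be summarized by the following minimal Python sketch; the helper objects (interaction, sound_library, recorder) and their methods are hypothetical stand-ins for the components described in this disclosure and are not part of the original text.

def acquire_participant_sounds(interaction, sound_library, recorder=None):
    # Return {participant_id: sound} when an interaction triggers sound collection.
    sounds = {}
    if not interaction.triggers_sound_collection:
        return sounds
    for participant in interaction.participants:
        if recorder is not None and recorder.has_recording(participant.id):
            # Path 2: use the sound the participant recorded on site.
            sounds[participant.id] = recorder.get_recording(participant.id)
        else:
            # Path 1: fall back to a preset sound associated with this participant
            # (for example, chosen per contact-device position, as in step S603 below).
            sounds[participant.id] = sound_library.lookup(participant.id)
    return sounds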
Step S102, determining the musical instrument corresponding to the one or more sounds.
As one embodiment, determining the musical instrument corresponding to a sound includes: extracting a sound feature from the sound, the sound feature representing the timbre T of the sound; determining timbre similarities between the sound and a plurality of instruments based on the sound feature of the sound and sound features of the plurality of instruments; and determining the instrument corresponding to the sound based on the timbre similarities between the sound and the plurality of instruments.
Illustratively, determining the instrument corresponding to the sound based on the timbre similarities between the sound and the plurality of instruments includes: determining the instrument corresponding to the sound based on the timbre similarities together with at least one of the frequency F of the sound (corresponding to pitch), the sound intensity I, and the duration D of the sound.
Illustratively, the timbre similarity is determined using Mel-Frequency Cepstral Coefficients (MFCCs). MFCCs are a set of features extracted from a sound wave that can represent the timbre of a sound. MFCCs are calculated by dividing the sound wave into short segments, applying a Fourier transform to each segment to extract its frequency components, converting the frequency components onto the mel scale, and performing cepstral analysis on the mel-scale spectrum to obtain the MFCCs. After the MFCCs are extracted, the timbre similarity of two sounds is compared through the distance between their MFCC vectors; a smaller distance indicates that the timbres of the two sounds are more similar.
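As a concrete illustration of this MFCC-based matching, the Python sketch below uses the librosa library to extract MFCCs and picks the instrument whose mean MFCC vector is closest (Euclidean distance) to the participant sound; the file paths, the 13-coefficient setting and the instrument sample library are illustrative assumptions, not part of the original disclosure.

import numpy as np
import librosa

def timbre_vector(path, n_mfcc=13):
    # Load a sound and summarize its timbre as the time-averaged MFCC vector.
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)

def closest_instrument(sound_path, instrument_samples):
    # Return the instrument whose MFCC vector is closest to the sound's MFCC vector.
    target = timbre_vector(sound_path)
    distances = {name: float(np.linalg.norm(target - timbre_vector(sample)))
                 for name, sample in instrument_samples.items()}
    return min(distances, key=distances.get)  # smaller distance = more similar timbre

# Hypothetical usage:
# instrument = closest_instrument("participant_clap.wav",
#                                 {"snare": "snare.wav", "piano": "piano_c4.wav"})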
As another embodiment, the determining the musical instrument corresponding to the sound includes: identifying the instrument corresponding to the sound using a pre-trained deep learning model.
For example, the deep learning model may be a multi-class classification model that takes the waveform of the sound as input and outputs the probability that the sound belongs to each class; the class with the highest probability is taken as the instrument corresponding to the sound.
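A minimal sketch of such a multi-class model is given below, assuming PyTorch; the convolutional architecture, the mel-spectrogram front end and the class list are illustrative assumptions, since the disclosure only requires a model that maps a sound to per-class probabilities.

import torch
import torch.nn as nn

INSTRUMENTS = ["piano", "guitar", "violin", "drum", "flute"]  # hypothetical class list

class InstrumentClassifier(nn.Module):
    def __init__(self, n_classes=len(INSTRUMENTS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # pool over time and frequency
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, mel):
        # mel: (batch, 1, n_mels, frames) mel spectrogram computed from the input sound
        logits = self.head(self.features(mel).flatten(1))
        return logits.softmax(dim=-1)           # probability per instrument class

# Hypothetical inference: the most probable class is taken as the matching instrument.
# probs = InstrumentClassifier()(mel_spectrogram)      # (1, n_classes)
# instrument = INSTRUMENTS[int(probs.argmax(dim=-1))]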
By way of example, the sounds of various instruments and collected sounds from various sound sources are used as the training set. During training, one instrument sound and one source sound are input each time, and their similarity is scored manually according to at least one of the timbre T, the frequency F (corresponding to pitch), the sound intensity I, and the duration D; the higher the overall score, the higher the similarity. After a certain amount of training, a similarity judgment model is obtained. When a source sound is input, the model identifies the closest instrument sound.
Step S103, generating a rhythm music score by using the music generation model.
In this embodiment, the music generation model may include Riffusion, Moûsai, MusicLM, Noise2Music, MusicGen, Magenta, DeepJazz, and the like, which is not limited in this embodiment.
As an embodiment, the rhythm music score generated by the music generation model includes multiple instrument tracks, and the rhythm music score includes at least the one or more instruments corresponding to the one or more sounds.
Step S104, music is generated based on the rhythm music score, with the one or more sounds replacing the sounds of the corresponding instruments.
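To make step S104 concrete, the following Python sketch renders the notes of one instrument track of a MIDI rhythm score using a participant-triggered sample; pretty_midi, librosa and soundfile are assumed to be available, and the file names, the MIDI score format and the sample's nominal pitch (C4) are illustrative assumptions, since the disclosure does not fix a particular score format.

import numpy as np
import librosa
import pretty_midi
import soundfile as sf

SR = 22050

def render_with_participant_sound(midi_path, track_name, sample_path, out_path):
    score = pretty_midi.PrettyMIDI(midi_path)
    sample, _ = librosa.load(sample_path, sr=SR)
    mix = np.zeros(int(score.get_end_time() * SR) + len(sample), dtype=np.float32)
    for inst in score.instruments:
        if pretty_midi.program_to_instrument_name(inst.program) != track_name:
            continue  # other tracks would be synthesized normally (omitted here)
        for note in inst.notes:
            # Shift the participant's sample toward the written pitch (C4 = MIDI 60 is
            # assumed as the sample's nominal pitch) and place it at the note onset.
            shifted = librosa.effects.pitch_shift(sample, sr=SR, n_steps=note.pitch - 60)
            start = int(note.start * SR)
            mix[start:start + len(shifted)] += (note.velocity / 127.0) * shifted
    sf.write(out_path, mix / max(np.abs(mix).max(), 1e-9), SR)

# Hypothetical usage:
# render_with_participant_sound("rhythm_score.mid", "Acoustic Grand Piano",
#                               "participant_clap.wav", "personalized.wav")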
With the above music generation method, personalized music can be generated for the participants of an activity, and multiple participants can cooperate to generate personalized music together.
In some embodiments, as shown in fig. 2, the music generation method may include steps S201 to S205.
Step S201, one or more sounds triggered by one or more participants of an activity are acquired, the activity including the one or more participants.
As one embodiment, acquiring one or more sounds triggered by one or more participants of an activity includes: responding to an interaction of the one or more participants with the activity; and, where the interaction triggers sound collection, selecting a sound corresponding to at least one participant from a preset sound library to obtain the sound triggered by the at least one participant.
As one embodiment, acquiring one or more sounds triggered by one or more participants of an activity includes: responding to an interaction of the one or more participants with the activity; and, where the interaction triggers sound collection, receiving a sound recorded by at least one participant to obtain the sound triggered by the at least one participant.
As one embodiment, acquiring one or more sounds triggered by one or more participants of an activity includes: responding to an interaction of the one or more participants with the activity; and, where the interaction triggers sound collection, selecting a sound corresponding to at least one participant from a preset sound library and receiving a sound recorded by at least one participant, to obtain the one or more sounds triggered by the one or more participants of the activity.
Step S202, determining the musical instrument corresponding to the one or more sounds.
As one embodiment, determining the musical instrument corresponding to a sound includes: extracting a sound feature from the sound, the sound feature representing the timbre T of the sound; determining timbre similarities between the sound and a plurality of instruments based on the sound feature of the sound and sound features of the plurality of instruments; and determining the instrument corresponding to the sound based on the timbre similarities between the sound and the plurality of instruments.
Illustratively, determining the instrument corresponding to the sound based on the timbre similarities between the sound and the plurality of instruments includes: determining the instrument corresponding to the sound based on the timbre similarities together with at least one of the frequency F of the sound (corresponding to pitch), the sound intensity I, and the duration D of the sound.
Illustratively, the timbre similarity is determined using Mel-Frequency Cepstral Coefficients (MFCCs). MFCCs are a set of features extracted from a sound wave that can represent the timbre of a sound. MFCCs are calculated by dividing the sound wave into short segments, applying a Fourier transform to each segment to extract its frequency components, converting the frequency components onto the mel scale, and performing cepstral analysis on the mel-scale spectrum to obtain the MFCCs. After the MFCCs are extracted, the timbre similarity of two sounds is compared through the distance between their MFCC vectors; a smaller distance indicates that the timbres of the two sounds are more similar.
As another embodiment, the determining the musical instrument corresponding to the sound includes: identifying the instrument corresponding to the sound using a pre-trained deep learning model.
For example, the deep learning model may be a multi-class classification model that takes the waveform of the sound as input and outputs the probability that the sound belongs to each class; the class with the highest probability is taken as the instrument corresponding to the sound.
By way of example, the sounds of various instruments and collected sounds from various sound sources are used as the training set. During training, one instrument sound and one source sound are input each time, and their similarity is scored manually according to at least one of the timbre T, the frequency F (corresponding to pitch), the sound intensity I, and the duration D; the higher the overall score, the higher the similarity. After a certain amount of training, a similarity judgment model is obtained. When a source sound is input, the model identifies the closest instrument sound.
Step S203, music generation conditions are determined.
As one embodiment, the music generation condition includes a text condition. The text condition is a text description of the target music.
As one embodiment, the music generation condition includes a melody condition. The melody condition may be an audio file of a reference melody.
As yet another embodiment, the music generation condition includes a text condition and a melody condition.
In this embodiment, the music generation condition may be derived based on the activity scene and/or the participant's input.
As one embodiment, the determining music generation conditions includes: a music generation condition is determined based on the activity scene of the activity, the music generation condition including a text description associated with the activity scene. With this embodiment, personalized music can be generated based on the activity scene and the participant-triggered sound.
As one embodiment, the determining music generation conditions includes: a music generation condition is determined based on the activity scene of the activity, the music generation condition including a melody condition including an audio file of a reference melody associated with the activity scene. With this embodiment, personalized music can be generated based on the activity scene and the participant-triggered sound.
As one embodiment, the determining music generation conditions includes: a music generation condition is determined based on the activity scene of the activity, the music generation condition including a text description associated with the activity scene and a melody condition including an audio file of a reference melody associated with the activity scene. With this embodiment, personalized music can be generated based on the activity scene and the participant-triggered sound.
As one embodiment, the determining music generation conditions includes: receiving input from one or more participants, wherein the input includes a textual description of the music; a music generation condition is determined based on the input. With this embodiment, personalized music can be generated based on the participant's input and the participant-triggered sound.
As one embodiment, the determining music generation conditions includes: receiving input from one or more participants, wherein the input includes an audio file of a reference melody of the music; a music generation condition is determined based on the input. With this embodiment, personalized music can be generated based on the participant's input and the participant-triggered sound.
As one embodiment, the determining music generation conditions includes: receiving input from one or more participants, wherein the input includes a textual description of the music and an audio file referencing the melody; a music generation condition is determined based on the input. With this embodiment, personalized music can be generated based on the participant's input and the participant-triggered sound.
Step S204, the rhythm music score is generated based on the music generation condition using the music generation model.
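As an illustration of conditioned generation, the sketch below uses MusicGen (one of the models listed above) through Meta's audiocraft package with a text condition and an optional reference-melody condition; the checkpoint name, prompt text and melody file are assumptions, and MusicGen returns audio rather than a symbolic score, so an additional transcription step would be needed if an editable rhythm music score is required.

import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-melody")   # assumed checkpoint id
model.set_generation_params(duration=15)                      # seconds of music to generate

text_condition = ["upbeat percussive theme for a museum treasure-hunt activity"]

# Text-only conditioning.
wav = model.generate(text_condition)                          # (batch, channels, samples)

# Text plus reference-melody conditioning.
melody, sr = torchaudio.load("reference_melody.wav")
wav = model.generate_with_chroma(text_condition, melody[None], sr)

torchaudio.save("generated.wav", wav[0].cpu(), model.sample_rate)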
As an embodiment, the rhythm music score generated by the music generation model includes multiple instrument tracks, and the rhythm music score includes at least the one or more instruments corresponding to the one or more sounds.
Step S205, music is generated based on the rhythm music score, with the one or more sounds replacing the sounds of the corresponding instruments.
The music generation method of the embodiment of the disclosure can be applied to computer games and also can be applied to virtual activities realized in real space. The following describes an example of applying the music generation method of the embodiment of the present disclosure in real space.
Herein, the term "real space" may include, but is not limited to, amusement parks, museums, exhibition halls, and the like. The term "activity" may include, but is not limited to, activities simulated by a computer system (e.g., a computer game) and computer-simulated activities combined with objects in real space (e.g., virtual meanings given to behavior in real space). In the embodiments of the present disclosure, venues such as amusement parks, museums and exhibition halls provide activities beyond their fixed facilities, and any such activity can serve as an activity in the sense of the present disclosure.
Fig. 3 is a hardware schematic of one implementation of a system provided by an embodiment of the present disclosure. As shown in fig. 3, the system includes: a control system 100, a contact device 210, and a portable device 220. The contact device 210 is disposed in the real space 200, and the portable device 220 is configured to be carried by a participant 240 in the real space 200. At least one of the contact device 210 and the portable device 220 can be communicatively coupled to the control system 100. The contact device 210 is communicatively connected to the portable device 220.
Fig. 4 is a hardware schematic diagram of an implementation in which an interaction device is provided in the system according to an embodiment of the present disclosure. As shown in fig. 4, an interaction device 230 is provided in the system shown in fig. 3. The interaction device 230 is configured to perform human-machine interaction with the participant 240 so that the participant 240 completes a virtual activity in the real space 200.
Fig. 5 is a hardware schematic diagram of another implementation of providing an interaction device in a system according to an embodiment of the disclosure, as shown in fig. 5, in the system shown in fig. 3, the contact device 210 may include an interaction unit 215, and the portable device 220 may include an interaction unit 221. The interaction unit 215 is configured to perform man-machine interaction with the participant 240 independently to complete a virtual activity in the real space 200; or configured to cooperate with the interaction unit 221 to perform a human-machine interaction with the participant 240 to complete the virtual activity in the real space 200. The interaction unit 221 is configured to perform man-machine interaction with the participant 240 independently to complete a virtual activity in the real space 200; or configured to cooperate with the interaction unit 215 to perform a human-machine interaction with the participant 240 to complete the virtual activity in the real space 200. In some embodiments, the participant 240 may complete a virtual activity in the real space 200 through one of the interaction unit 215 and the interaction unit 221. In other embodiments, the participant 240 may complete a virtual activity in the real space 200 through the interaction unit 215 and the interaction unit 221.
In certain embodiments, the interaction device 230 shown in fig. 4, the interaction units 215, 221 shown in fig. 5 are human-machine interaction devices having user interfaces (e.g., but not limited to, a graphical user interface (Graphic User Interface, abbreviated as GUI), a Multi-channel user interface (Multi-Channel User Interface), etc.), having input devices (e.g., but not limited to, a keyboard, a mouse, a joystick, a touch pad, a touch-sensitive display, an image sensor, a microphone array, etc.), and output devices (e.g., but not limited to, a display, a speaker, a graphical user interface, a vibration motor, etc.).
In some embodiments, the interaction device 230 shown in fig. 4 and the interaction units 215 and 221 shown in fig. 5 are devices with an information collection function, for example, a password device that can receive password information input by a participant and determine whether the password information is correct or transmit it; as another example, a mechanical manipulation device that can collect manipulation results and/or manipulation processes.
As an exemplary illustration, the interaction device 230 shown in fig. 4 and the interaction units 215 and 221 shown in fig. 5 may be devices or systems that use Virtual Reality (VR) technology, such as VR glasses, so that participants can be immersed in a Virtual environment created by a computer system, and can also operate and feedback objects in the Virtual environment.
As another exemplary illustration, the interaction device 230 shown in fig. 4 and the interaction units 215 and 221 shown in fig. 5 may be devices or systems adopting augmented reality (Augmented Reality, abbreviated as AR) technology, which is not described in detail in the embodiments of the present disclosure.
As yet another exemplary illustration, the interaction means 230 shown in fig. 4, the interaction units 215, 221 shown in fig. 5 may be amusement park game devices or systems, such as, but not limited to, racing game devices, etc. These devices or systems are capable of recording interaction processes and/or interaction results (e.g., processes or achievements of a racing game).
It should be appreciated that the interaction means 230 shown in fig. 4, the interaction units 215, 221 shown in fig. 5 are not limited to the above exemplary illustration, and any means capable of interacting with a participant are contemplated herein, as the embodiments of the present disclosure are not limited in this regard.
Referring to fig. 3 to 5, the control system 100 provided in an embodiment of the present disclosure includes, but is not limited to, a distributed computing platform such as a cloud platform. In some embodiments, the control system 100 is a single cloud platform applied to a site in the real space 200; it maintains a real-time network connection with the contact device 210 in the real space 200 through software, so that two-way information exchange is achieved. The control system 100 collects information from the portable device 220 of the participant 240 in real time and, when a trigger condition is satisfied, sends information to the portable device 220 of the participant 240. In some embodiments, the control system 100 is a collective cloud platform, which is a collection of several individual cloud platforms and can centrally control them, so as to coordinate and control the content and experience of sites in multiple real spaces.
Referring to fig. 3 to 5, in some embodiments, the portable device 220 provided in the embodiments of the present disclosure includes a smart band, a smart watch, and a mobile terminal such as a smartphone. The portable device 220 provided in the embodiments of the present disclosure may include: an RF (Radio Frequency) unit, a WiFi module, an audio output unit, an A/V (audio/video) input unit, a sensor, a display unit, a user input unit, an interface unit, a memory, a processor, and a power supply. In some embodiments, the portable device 220 includes, but is not limited to: site-customized mobile phones with sensing devices, wristbands, identity cards, props, and the like.
Referring to fig. 3 to 5, in some embodiments, the contact device 210 in the real space 200 includes contactless devices such as two-dimensional codes, bar codes, NFC, RFID, Bluetooth, infrared, interactive projection, light control devices, or voice control devices. In some embodiments, the contact device 210 in the real space 200 includes contact devices, such as tapping-type interactive devices and mechanical or simulated-operation devices. In the embodiments of the present disclosure, one or more types of contact devices 210 may be disposed in the real space 200, which is not limited in the embodiments of the present disclosure.
The manner of triggering between the participant 240 and the contact device 210 is described below. It should be understood that the following description is merely illustrative for an understanding of the present disclosure, not a specific limitation of it, and embodiments of the present disclosure are not limited to the following examples. In addition, the following triggering methods may be used in combination, which will not be described in detail in the embodiments of the present disclosure.
In some embodiments, the participant 240 uses a field-customized mobile phone with a sensing device, a bracelet, an identification card, a prop, etc. as the portable device 220, enters the identification range of the contact device 210, is identified by the contact device 210 in an NFC/RFID/bluetooth/infrared mode, etc., forms a trigger, activates the contact device 210 and the linked interaction device 230, interaction unit 221 and/or interaction unit 215 to start operation, and sends related information to the portable device 220 of the participant 240 by the control system 100.
In some embodiments, the participant 240 uses a field customized mobile phone with sensing device, a bracelet, an identification card, a prop, etc. as the portable device 220, enters the recognition range of the contact device 210, takes a specific shape, action or combination with the body and the devices, is photographed by a camera, forms a trigger through image recognition, activates the contact device 210 and the linked interaction device 230, interaction unit 221 and/or interaction unit 215 to start operation, and sends related information to the portable device 220 of the user by the control system 100.
In some embodiments, the participant 240 enters the recognition range of the contact device 210, is captured by the camera due to the visual features of the face, body, clothing, prop, etc., forms a trigger through image recognition, activates the contact device 210 and the linked interaction device 230, interaction unit 221 and/or interaction unit 215 to start to operate, and sends related information to the user's portable device 220, interaction device 230 and/or contact device 210 by the control system 100.
In some embodiments, the participant 240 enters the recognition range of the contact device 210, is captured by the infrared camera, acquires the physiological sign data such as the heartbeat and the body temperature thereof through image recognition, forms a trigger, activates the contact device 210 and the linked interaction device 230, interaction unit 221 and/or interaction unit 215 to start to operate, and sends related information to the user's portable device 220, interaction device 230 and/or contact device 210 by the control system 100.
In some embodiments, the participant 240 enters the identification range of the contact device 210, and a trigger is formed when a sensor acquires the participant's speed, acceleration, weight, or the like; the contact device 210 and the linked interaction device 230, interaction unit 221 and/or interaction unit 215 are activated to start operation, and the control system 100 sends related information to the user's portable device 220, interaction device 230 and/or contact device 210.
In some embodiments, the participant 240 uses a field-customized mobile phone with a sensing device, a bracelet, an identification card, a prop, etc. as the portable device 220, enters the recognition range of the contact device 210, emits a specific sound or voice information, is obtained by the sound receiving device, forms a trigger through sound recognition, activates the contact device 210 and the linked interaction device 230, interaction unit 221 and/or interaction unit 215 to start operation, and sends related information to the portable device 220, interaction device 230 and/or contact device 210 of the user by the control system 100.
In some embodiments, the participant 240 uses a site-customized mobile phone with a sensing device, a bracelet, an identification card, a prop, etc. as the portable device 220 and reaches a specific location; the participant is positioned by means of GPS/WiFi/Bluetooth/lidar, etc., forming a trigger that activates the contact device 210 and the linked interaction device 230, interaction unit 221 and/or interaction unit 215 to start operation, and the control system 100 sends related information to the user's portable device 220, interaction device 230 and/or contact device 210. The distance, orientation, speed, acceleration, angular velocity, etc. of the participant 240 relative to the contact device 210 are parameters affecting the content and effect of the trigger.
In some embodiments, the participant 240 uses a site customized mobile phone or a personal mobile phone as the portable device 220, uses related software to scan the two-dimensional code/barcode/solar code or other graphic codes provided by the contact device 210 to form a trigger, activates the contact device 210 and the linked interaction device 230, interaction unit 221 and/or interaction unit 215 to start operation, and sends related information to the user's portable device 220, interaction device 230 and/or contact device 210 by the control system 100.
In some embodiments, the participant 240 uses a site-customized mobile phone or a personal mobile phone as the portable device 220 to capture a sound emitted by the contact device 210, and related software recognizes its sound code, forming a trigger that activates the contact device 210 and the linked interaction device 230, interaction unit 221 and/or interaction unit 215 to start operation, and the control system 100 sends related information to the user's portable device 220, interaction device 230 and/or contact device 210.
In some embodiments, the participant 240 enters the scope of the contact device 210 and forms a trigger by a touch action, such as tapping, pushing, pulling, or manipulating a mechanical device, which activates the contact device 210 and the linked interaction device 230, interaction unit 221 and/or interaction unit 215 to start operation, and the control system 100 sends related information to the user's portable device 220, interaction device 230 and/or contact device 210.
Fig. 6 shows a flowchart of implementing a music generation method in the system of the embodiment of the present disclosure, and as shown in fig. 6, the music generation method includes steps S601 to S607.
In step S601, the control system receives trigger information generated by the portable device and/or the contact device in real space.
In an embodiment, the trigger information may include a participant identification.
In step S602, the control system determines the activity in which the participant participates based on the participant identification corresponding to the trigger information.
In step S603, when it is determined, based on the activity of the participant, that sound collection is triggered, the control system selects a sound corresponding to the participant from a preset sound library to obtain the sound triggered by the participant.
Optionally, the sound corresponding to the participant is selected from the preset sound library based on the contact device corresponding to the trigger information. For example, contact devices at different positions trigger different sounds.
Optionally, in the case that sound collection is determined to be triggered, the participant may be instructed to record a sound through the interaction device 230, the interaction unit 221 and/or the interaction unit 215. The control system receives the sound recorded by the participant and obtains the sound triggered by the participant.
In step S604, the control system determines the instrument corresponding to the sound triggered by the participant.
In step S605, the control system determines a music generation condition based on an activity scene of an activity.
Alternatively, the control system may instruct the participant to input the music generation condition through the interaction means 230, the interaction unit 221, and/or the interaction unit 215.
As one embodiment, the music generation condition includes a text condition. The text condition is a text description of the target music.
As one embodiment, the music generation condition includes a melody condition. The melody condition may be an audio file of a reference melody.
As yet another embodiment, the music generation condition includes a text condition and a melody condition.
In this embodiment, the music generation condition may be derived based on the activity scene and/or the participant's input. The text condition includes a text description associated with the activity scene. The melody condition includes an audio file of a reference melody associated with the active scene.
In step S606, the control system generates a rhythm music score based on the music generation condition using the music generation model.
As an embodiment, the rhythm music score generated by the music generation model includes multiple instrument tracks, and the rhythm music score includes at least the one or more instruments corresponding to the one or more sounds.
In step S607, the control system generates music based on the rhythm music score and replaces the sounds of the corresponding instruments with the triggered sounds described above.
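The flow of steps S601 to S607 can be condensed into the following Python sketch; the helper methods (receive_trigger, lookup_activity, collect_sound, match_instrument, build_conditions, generate_score, render_music) are hypothetical stand-ins for the components described above and do not come from the original disclosure.

def handle_trigger(control_system):
    trigger = control_system.receive_trigger()                          # S601: portable/contact device
    activity = control_system.lookup_activity(trigger.participant_id)   # S602: activity of the participant
    sound = control_system.collect_sound(trigger, activity)             # S603: preset library or recording
    instrument = control_system.match_instrument(sound)                 # S604: MFCC match or classifier
    conditions = control_system.build_conditions(activity)              # S605: text and/or melody condition
    score = control_system.generate_score(conditions)                   # S606: music generation model
    return control_system.render_music(score, {instrument: sound})      # S607: replace instrument track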
According to an embodiment of the present disclosure, there is also provided a music generation apparatus. As shown in fig. 7, the apparatus includes: an acquisition module 710 configured to acquire one or more sounds triggered by one or more participants of an activity, the activity including the one or more participants; and a processing module 720 configured to determine the musical instruments corresponding to the one or more sounds, generate a rhythm music score using a music generation model, and generate music based on the rhythm music score, with the one or more sounds replacing the sounds of the corresponding instruments.
In some embodiments, the processing module 720 is further configured to: determine a music generation condition, where the music generation condition includes a text condition and/or a melody condition; and the generating a rhythm music score using a music generation model includes: generating the rhythm music score based on the music generation condition using the music generation model.
As one embodiment, the determining, by the processing module 720, a music generation condition specifically includes: a music generation condition is determined based on the activity scene of the activity, the text condition including a text description associated with the activity scene, and/or the melody condition including an audio file of a reference melody associated with the activity scene.
As one embodiment, the determining, by the processing module 720, a music generation condition specifically includes: receiving input from one or more participants, wherein the input includes a textual description of the music and/or an audio file referencing the melody; a music generation condition is determined based on the input.
As one embodiment, the processing module 720 determines the musical instrument corresponding to the sound, specifically includes: extracting a sound feature from the sound, the sound feature representing a timbre of the sound; determining a timbre similarity of the sound to the plurality of instruments based on the sound characteristics of the sound and the sound characteristics of the plurality of instruments; the instrument to which the sound corresponds is determined based on the similarity of the sound and the timbres of the plurality of instruments.
As one embodiment, the processing module 720 determines that the musical instrument corresponding to the sound specifically includes: the instrument to which the sound corresponds is identified using a pre-trained deep learning model.
For one embodiment, the acquisition module 710 acquiring one or more sounds triggered by one or more participants of an activity includes: responding to an interaction of the one or more participants with the activity; and, where the interaction triggers sound collection, selecting a sound corresponding to at least one participant from a preset sound library to obtain the sound triggered by the at least one participant, and/or receiving a sound recorded by at least one participant to obtain the sound triggered by the at least one participant.
The exemplary embodiments of the present disclosure also provide an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor for causing the electronic device to perform a method according to embodiments of the present disclosure when executed by the at least one processor.
The present disclosure also provides a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the present disclosure.
The present disclosure also provides a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to embodiments of the disclosure.
Referring to fig. 8, a block diagram of an electronic device 800 that may be a server or a client of the present disclosure will now be described; it is an example of a hardware device that can be applied to aspects of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in electronic device 800 are connected to I/O interface 805, including: an input unit 806, an output unit 807, a storage unit 808, and a communication unit 809. The input unit 806 may be any type of device capable of inputting information to the electronic device 800, and the input unit 806 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 807 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 808 may include, but is not limited to, magnetic disks, optical disks. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices over computer networks, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as bluetooth devices, wiFi devices, wiMax devices, cellular communication devices, and/or the like.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above. For example, in some embodiments, the music generation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. In some embodiments, the computing unit 801 may be configured to perform the music generation method by any other suitable means (e.g., by means of firmware).
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims (10)

1. A music generation method, comprising:
acquiring one or more sounds triggered by one or more participants of an activity, the activity comprising the one or more participants;
determining a musical instrument corresponding to the one or more sounds;
generating a rhythmic music score using a music generation model; and
generating music based on the rhythmic music score, wherein the sound of the corresponding musical instrument is replaced with the one or more sounds.
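For orientation, a minimal Python sketch of the sequence in claim 1 follows; the helper callables (match_instrument, generate_rhythm_score, render_score) are hypothetical placeholders for the instrument-matching, score-generation, and synthesis steps, not APIs disclosed in this publication.

# Illustrative sketch of the claim 1 pipeline; helper names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

@dataclass
class ParticipantSound:
    participant_id: str
    audio: Sequence[float]            # samples of the sound triggered by the participant
    instrument: Optional[str] = None  # filled in by the matching step

def generate_music(
    sounds: List[ParticipantSound],
    match_instrument: Callable[[Sequence[float]], str],
    generate_rhythm_score: Callable[[], list],
    render_score: Callable[[list, Dict[str, Sequence[float]]], bytes],
) -> bytes:
    """Acquire sounds -> match instruments -> generate a rhythmic score -> render music."""
    # Determine the musical instrument each participant sound corresponds to.
    for s in sounds:
        s.instrument = match_instrument(s.audio)
    # Generate a rhythmic music score with a music generation model.
    score = generate_rhythm_score()
    # Render the score, replacing each matched instrument's sound with the participant sound.
    replacements = {s.instrument: s.audio for s in sounds}
    return render_score(score, replacements)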
2. The method as recited in claim 1, further comprising:
determining music generation conditions, wherein the music generation conditions comprise text conditions and/or melody conditions;
wherein the generating of the rhythmic music score using the music generation model comprises: generating the rhythmic music score based on the music generation conditions using the music generation model.
3. The method of claim 2, wherein the determining music generation conditions comprises:
determining the music generation conditions based on an activity scene of the activity, wherein the text condition comprises a text description associated with the activity scene and/or the melody condition comprises an audio file of a reference melody associated with the activity scene.
4. The method of claim 2 or 3, wherein the determining music generation conditions comprises:
receiving input from the one or more participants, wherein the input comprises a text description of the music and/or an audio file of a reference melody; and determining the music generation conditions based on the input.
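As a concrete illustration of claims 2-4, the sketch below assembles the text and/or melody conditions from an activity-scene preset and optional participant input; the scene names, preset contents, and file paths are made up for illustration only.

# Hypothetical sketch of determining music generation conditions (claims 2-4).
from typing import Optional, TypedDict

class MusicConditions(TypedDict, total=False):
    text: str          # text condition: a description tied to the scene or typed by a participant
    melody_path: str   # melody condition: path to an audio file of a reference melody

SCENE_PRESETS = {  # illustrative activity-scene presets (claim 3)
    "birthday_party": MusicConditions(text="upbeat celebratory pop with handclaps",
                                      melody_path="presets/birthday_theme.wav"),
    "campfire": MusicConditions(text="calm acoustic folk, slow tempo"),
}

def determine_conditions(scene: str,
                         user_text: Optional[str] = None,
                         user_melody_path: Optional[str] = None) -> MusicConditions:
    """Start from the scene preset, then let participant input override it (claim 4)."""
    conditions = dict(SCENE_PRESETS.get(scene, {}))
    if user_text:
        conditions["text"] = user_text
    if user_melody_path:
        conditions["melody_path"] = user_melody_path
    return conditions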
5. The method of claim 1, wherein the determining the musical instrument corresponding to the sound comprises:
extracting sound features from the sound, the sound features representing the timbre of the sound;
determining a timbre similarity between the sound and a plurality of musical instruments based on the sound features of the sound and sound features of the plurality of musical instruments; and
determining the musical instrument corresponding to the sound based on the timbre similarity between the sound and the plurality of musical instruments.
6. The method of claim 1 or 5, wherein the determining the musical instrument corresponding to the sound comprises:
identifying the musical instrument corresponding to the sound using a pre-trained deep learning model.
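The similarity-based matching of claim 5 can be sketched as follows, assuming MFCCs as the timbre feature and cosine similarity as the metric; both are common choices rather than anything mandated by the claims, and librosa is used purely for illustration. For the claim 6 variant, the same function could instead delegate to a pre-trained audio-classification model.

# Minimal sketch of timbre-similarity instrument matching (claim 5); the feature and
# metric choices (MFCC + cosine similarity) are assumptions, not taken from the patent.
import numpy as np
import librosa

def timbre_embedding(path: str, n_mfcc: int = 20) -> np.ndarray:
    """Summarize the timbre of an audio file as its mean MFCC vector."""
    y, sr = librosa.load(path, sr=None, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def match_instrument(sound_path: str, instrument_samples: dict) -> str:
    """Return the instrument whose reference sample is closest in timbre to the sound."""
    query = timbre_embedding(sound_path)

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    # instrument_samples maps instrument name -> path to a reference recording.
    similarities = {name: cosine(query, timbre_embedding(ref))
                    for name, ref in instrument_samples.items()}
    return max(similarities, key=similarities.get)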
7. The method of claim 1, wherein the acquiring one or more sounds triggered by one or more participants of an activity comprises:
in response to an interaction of the one or more participants in the activity,
in a case where the interaction triggers sound collection:
selecting a sound corresponding to at least one participant from a preset sound library to obtain the sound triggered by the participant; and/or
receiving a sound recorded by at least one participant to obtain the sound triggered by the participant.
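A small sketch of the two acquisition branches in claim 7 is given below; the preset library contents and file paths are hypothetical, and a real deployment would resolve the participant interaction (e.g., a button press or an upload) before calling this helper.

# Hypothetical sketch of sound acquisition (claim 7): preset-library selection and/or recording.
from pathlib import Path
from typing import Optional

PRESET_LIBRARY = {                      # illustrative preset sound library
    "clap": Path("presets/clap.wav"),
    "whistle": Path("presets/whistle.wav"),
}

def acquire_sound(preset_name: Optional[str] = None,
                  recorded_file: Optional[Path] = None) -> Path:
    """Resolve a participant interaction to the audio file that will drive music generation."""
    if preset_name is not None:
        return PRESET_LIBRARY[preset_name]        # branch 1: sound selected from the preset library
    if recorded_file is not None and recorded_file.exists():
        return recorded_file                      # branch 2: sound recorded by the participant
    raise ValueError("The interaction did not trigger a usable sound collection.")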
8. A music generating apparatus, comprising:
an acquisition module for acquiring one or more sounds triggered by one or more participants of an activity, the activity comprising the one or more participants;
a processing module for determining a musical instrument corresponding to the one or more sounds; generating a rhythmic music score using a music generation model; and generating music based on the rhythmic music score, wherein the sound of the corresponding musical instrument is replaced with the one or more sounds.
9. An electronic device, comprising:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to any of claims 1-7.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202311346438.0A 2023-10-17 2023-10-17 Music generation method, device, electronic equipment and storage medium Pending CN117334206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311346438.0A CN117334206A (en) 2023-10-17 2023-10-17 Music generation method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311346438.0A CN117334206A (en) 2023-10-17 2023-10-17 Music generation method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117334206A (en) 2024-01-02

Family

ID=89295102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311346438.0A Pending CN117334206A (en) 2023-10-17 2023-10-17 Music generation method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117334206A (en)

Similar Documents

Publication Publication Date Title
US11158102B2 (en) Method and apparatus for processing information
CN110288077B (en) Method and related device for synthesizing speaking expression based on artificial intelligence
CN110379430B (en) Animation display method and device based on voice, computer equipment and storage medium
US20210029305A1 (en) Method and apparatus for adding a video special effect, terminal device and storage medium
CN106104677B (en) The movement that the voice identified is initiated visually indicates
US9811313B2 (en) Voice-triggered macros
JP6505117B2 (en) Interaction of digital personal digital assistant by replication and rich multimedia at response
JP2022527155A (en) Animation character driving method and related equipment based on artificial intelligence
CN111524501B (en) Voice playing method, device, computer equipment and computer readable storage medium
CN113946211A (en) Method for interacting multiple objects based on metauniverse and related equipment
US11600266B2 (en) Network-based learning models for natural language processing
CN111261161B (en) Voice recognition method, device and storage medium
CN110263131B (en) Reply information generation method, device and storage medium
CN114787814A (en) Reference resolution
WO2023082703A1 (en) Voice control method and apparatus, electronic device, and readable storage medium
CN112309365A (en) Training method and device of speech synthesis model, storage medium and electronic equipment
JP2023552854A (en) Human-computer interaction methods, devices, systems, electronic devices, computer-readable media and programs
CN111291151A (en) Interaction method and device and computer equipment
CN111835621A (en) Session message processing method and device, computer equipment and readable storage medium
CN111428079A (en) Text content processing method and device, computer equipment and storage medium
CN111831249A (en) Audio playing method and device, storage medium and electronic equipment
CN111816168A (en) Model training method, voice playing method, device and storage medium
CN117334206A (en) Music generation method, device, electronic equipment and storage medium
US20220301250A1 (en) Avatar-based interaction service method and apparatus
CN112820265B (en) Speech synthesis model training method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination