WO2021068812A1 - Music generation method and apparatus, electronic device and computer-readable storage medium - Google Patents

Music generation method and apparatus, electronic device and computer-readable storage medium

Info

Publication number
WO2021068812A1
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
music
target video
type
position coordinate
Application number
PCT/CN2020/119078
Other languages
French (fr)
Chinese (zh)
Inventor
刘奡智
蔡梓丰
王健宗
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021068812A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 7/00 Instruments in which the tones are synthesised from a data store, e.g. computer organs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of data processing technology, and in particular to a music generation method, device, electronic equipment, and computer-readable storage medium.
  • The music generation method provided in this application includes:
  • The first recognition step: use the camera unit to record the user's action video, read the current video frame of the action video as the first target video frame, input the first target video frame into a pre-trained model, and identify the user's key part information in the first target video frame, where the key part information includes the IDs and position coordinate values of the first type of human joint parts and the IDs and position coordinate values of the second type of human joint parts;
  • The generation step: when the position coordinate values of the first type and second type of human joint parts are recognized in the first target video frame, control the playback unit to start and play music generated according to the preset initial values of the music parameters and sound effect parameters;
  • The second recognition step: taking the reading time of the first target video frame as the time starting point, read the current video frame of the action video at each preset time interval as the second target video frame, input the second target video frame into the pre-trained model, and identify the IDs and position coordinate values of the user's first type and second type of human joint parts in the second target video frame;
  • The adjustment step: adjust the music parameters according to the predetermined mapping table between the first type of human joint parts and the music parameters, the changes in the position coordinate values of the first type of human joint parts in the second target video frame, and the first preset adjustment range table; adjust the sound effect parameters according to the predetermined mapping table between the second type of human joint parts and the sound effect parameters, the changes in the position coordinate values of the second type of human joint parts in the second target video frame, and the second preset adjustment range table; and adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
  • This application also provides a music generating apparatus, which includes:
  • a first recognition module, configured to use the camera unit to record the user's action video, read the current video frame of the action video as the first target video frame, input the first target video frame into a pre-trained model, and identify the user's key part information in the first target video frame, where the key part information includes the IDs and position coordinate values of the first type and second type of human joint parts;
  • a generation module, configured to control the playback unit to start and play music generated according to the preset initial values of the music parameters and sound effect parameters when the position coordinate values of the first type and second type of human joint parts are recognized in the first target video frame;
  • a second recognition module, configured to take the reading time of the first target video frame as the time starting point, read the current video frame of the action video at each preset time interval as the second target video frame, input the second target video frame into the pre-trained model, and identify the IDs and position coordinate values of the user's first type and second type of human joint parts in the second target video frame;
  • an adjustment module, configured to adjust the music parameters according to the predetermined mapping table between the first type of human joint parts and the music parameters, the changes in the position coordinate values of the first type of human joint parts in the second target video frame, and the first preset adjustment range table; to adjust the sound effect parameters according to the predetermined mapping table between the second type of human joint parts and the sound effect parameters, the changes in the position coordinate values of the second type of human joint parts in the second target video frame, and the second preset adjustment range table; and to adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
  • This application also provides an electronic device, which includes a memory and a processor. The memory stores a music generation program that can run on the processor, and when the music generation program is executed by the processor, the following steps are implemented:
  • The first recognition step: use the camera unit to record the user's action video, read the current video frame of the action video as the first target video frame, input the first target video frame into a pre-trained model, and identify the user's key part information in the first target video frame, where the key part information includes the IDs and position coordinate values of the first type of human joint parts and the IDs and position coordinate values of the second type of human joint parts;
  • The generation step: when the position coordinate values of the first type and second type of human joint parts are recognized in the first target video frame, control the playback unit to start and play music generated according to the preset initial values of the music parameters and sound effect parameters;
  • The second recognition step: taking the reading time of the first target video frame as the time starting point, read the current video frame of the action video at each preset time interval as the second target video frame, input the second target video frame into the pre-trained model, and identify the IDs and position coordinate values of the user's first type and second type of human joint parts in the second target video frame;
  • The adjustment step: adjust the music parameters according to the predetermined mapping table between the first type of human joint parts and the music parameters, the changes in the position coordinate values of the first type of human joint parts in the second target video frame, and the first preset adjustment range table; adjust the sound effect parameters according to the predetermined mapping table between the second type of human joint parts and the sound effect parameters, the changes in the position coordinate values of the second type of human joint parts in the second target video frame, and the second preset adjustment range table; and adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
  • This application also provides a computer-readable storage medium on which a music generation program is stored, and the music generation program can be executed by one or more processors to implement the following steps:
  • The first recognition step: use the camera unit to record the user's action video, read the current video frame of the action video as the first target video frame, input the first target video frame into a pre-trained model, and identify the user's key part information in the first target video frame, where the key part information includes the IDs and position coordinate values of the first type of human joint parts and the IDs and position coordinate values of the second type of human joint parts;
  • The generation step: when the position coordinate values of the first type and second type of human joint parts are recognized in the first target video frame, control the playback unit to start and play music generated according to the preset initial values of the music parameters and sound effect parameters;
  • The second recognition step: taking the reading time of the first target video frame as the time starting point, read the current video frame of the action video at each preset time interval as the second target video frame, input the second target video frame into the pre-trained model, and identify the IDs and position coordinate values of the user's first type and second type of human joint parts in the second target video frame;
  • The adjustment step: adjust the music parameters according to the predetermined mapping table between the first type of human joint parts and the music parameters, the changes in the position coordinate values of the first type of human joint parts in the second target video frame, and the first preset adjustment range table; adjust the sound effect parameters according to the predetermined mapping table between the second type of human joint parts and the sound effect parameters, the changes in the position coordinate values of the second type of human joint parts in the second target video frame, and the second preset adjustment range table; and adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
  • FIG. 1 is a schematic diagram of an embodiment of an electronic device of this application;
  • FIG. 2 is a schematic diagram of the modules of an embodiment of a music generating apparatus of this application;
  • FIG. 3 is a flowchart of an embodiment of a music generation method of this application.
  • The electronic device 1 is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions.
  • The electronic device 1 may be a computer, a single web server, a server group composed of multiple web servers, or a cloud composed of a large number of hosts or web servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • The electronic device 1 includes, but is not limited to, a memory 11, a processor 12, and a network interface 13, which can be communicatively connected to each other through a system bus.
  • The memory 11 stores a music generation program 10, which can be executed by the processor 12.
  • Figure 1 only shows the electronic device 1 with components 11-13 and the music generation program 10. Those skilled in the art will understand that the structure shown in Figure 1 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • The memory 11 includes an internal memory and at least one type of readable storage medium.
  • The internal memory provides a cache for the operation of the electronic device 1.
  • The readable storage medium may be a non-volatile or volatile storage medium such as a flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, or optical disc.
  • The readable storage medium may be an internal storage unit of the electronic device 1, such as the hard disk of the electronic device 1.
  • The readable storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 1.
  • the readable storage medium of the memory 11 is generally used to store the operating system and various application software installed in the electronic device 1, for example, to store the code of the music generation program 10 in an embodiment of the present application.
  • the memory 11 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 12 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 12 is generally used to control the overall operation of the electronic device 1, such as performing data interaction or communication-related control and processing with other devices.
  • The processor 12 is used to run the program code or process the data stored in the memory 11, for example, to run the music generation program 10.
  • the network interface 13 may include a wireless network interface or a wired network interface, and the network interface 13 is used to establish a communication connection between the electronic device 1 and a client (not shown in the figure).
  • the electronic device 1 may further include a user interface.
  • the user interface may include a display (Display) and an input unit such as a keyboard (Keyboard).
  • Optionally, the user interface may also include a standard wired interface and a wireless interface.
  • The display may be an LED display, a liquid crystal display, a touch liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like.
  • the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the electronic device 1 and to display a visualized user interface.
  • When the music generation program 10 is executed by the processor 12, the following first recognition step, generation step, second recognition step, and adjustment step are implemented.
  • The first recognition step: use the camera unit to record the user's action video, read the current video frame of the action video as the first target video frame, input the first target video frame into a pre-trained model, and identify the user's key part information in the first target video frame, where the key part information includes the IDs and position coordinate values of the first type of human joint parts and the IDs and position coordinate values of the second type of human joint parts.
  • The recorded action video may be a dance video of the user, or a fitness video, a sports training video, or any other action video.
  • The pre-trained model is a PoseNet model: a convolutional neural network that runs on TensorFlow.js (a deep learning framework) and can perform real-time human pose estimation in a browser.
  • The PoseNet model can recognize both single-person and multi-person poses; in this embodiment, a single-person action video of the user is used.
  • The training process of the PoseNet model includes:
  • Step B4: if the accuracy rate is less than the preset accuracy rate, increase the number of character action picture samples by a preset percentage (for example, 15%), and return to step B1.
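  • Only step B4 of the training procedure appears in this extract. As a rough illustration, the following is a minimal sketch of the train-evaluate-augment loop it implies; trainModel, evaluateAccuracy, and augmentSamples are hypothetical helpers, not part of the patent text or of any published PoseNet API.

```ts
// Hypothetical helpers standing in for the elided training steps B1-B3.
declare function trainModel(samples: ImageData[]): Promise<unknown>;
declare function evaluateAccuracy(model: unknown): Promise<number>;
declare function augmentSamples(samples: ImageData[], extra: number): Promise<ImageData[]>;

// Step B4: while validation accuracy stays below the preset accuracy rate,
// grow the sample set by the preset percentage (e.g. 15%) and retrain.
async function trainUntilAccurate(
  samples: ImageData[],
  presetAccuracy = 0.9,   // illustrative threshold
  growthRate = 0.15       // the "preset percentage"
): Promise<unknown> {
  for (;;) {
    const model = await trainModel(samples);        // steps B1-B3 (elided)
    if (await evaluateAccuracy(model) >= presetAccuracy) return model;
    samples = await augmentSamples(samples, Math.ceil(samples.length * growthRate));
  }
}
```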
  • The output of the PoseNet model is the IDs of the user's 17 key joint parts and their position coordinate values.
  • The key joint parts are divided into the first type and second type of human joint parts according to the distribution of the joint parts in the human body.
  • For example, the first type of human joint parts may be the joint parts of the upper half of the body and the second type those of the lower half, or the first type may be the joint parts of the left half of the body and the second type those of the right half.
  • In this embodiment, the first type of human joint parts are the joint parts of the left half of the body, for example, the left wrist, the left knee, the left elbow, and the left hip.
  • The second type of human joint parts are the joint parts of the right half of the body, such as the right wrist, the right knee, the right elbow, and the right hip.
  • In this embodiment, the position of the camera unit is fixed, and the position coordinate value of a human joint part is its two-dimensional coordinate value (X, Y) in each video frame: the X axis of the two-dimensional coordinate system runs along the upper border of each video frame, the Y axis runs along the left border, and the origin is the intersection of the upper border and the left border.
  • The key part information also includes a confidence score for the position accuracy of each human joint part.
  • The confidence score is between 0 and 1.0; the higher the confidence score, the higher the accuracy of the identified joint position.
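  • To make the recognition step concrete, here is a minimal TensorFlow.js sketch using the published @tensorflow-models/posenet package (option names can differ slightly between package versions). The left/right split into the two joint types follows this embodiment; note that the nose keypoint belongs to neither half in this simple split.

```ts
import * as posenet from '@tensorflow-models/posenet';

// Estimate a single-person pose for the current video frame. PoseNet returns
// 17 keypoints (IDs 0-16), each with a part name, an (x, y) position whose
// origin is the top-left corner of the frame, and a confidence score in [0, 1].
async function recognizeKeyParts(video: HTMLVideoElement) {
  const net = await posenet.load();
  const pose = await net.estimateSinglePose(video, { flipHorizontal: false });
  const firstType = pose.keypoints.filter(k => k.part.startsWith('left'));   // left-half joints
  const secondType = pose.keypoints.filter(k => k.part.startsWith('right')); // right-half joints
  return { firstType, secondType };
}
```

  • In a real loop the model would be loaded once and reused; it is reloaded inside the function here only to keep the sketch self-contained.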
  • The generation step: when the position coordinate values of the first type and second type of human joint parts are recognized in the first target video frame, the playback unit is controlled to start and play music generated according to the preset initial values of the music parameters and sound effect parameters.
  • The music parameters include pitch, music speed, note duration, sound zone, and so on.
  • The pitch is the height of the sound, including four types: A, B, C, and D.
  • The music speed is the number of beats per minute: a slow speed is 40 to 69 beats per minute, a medium speed is 72 to 84 beats per minute, and a fast speed is 108 to 128 beats per minute.
  • The note duration indicates the relative duration between notes: a half note lasts 1/2 of a whole note, a quarter note 1/4 of a whole note, and an eighth note 1/8 of a whole note.
  • The sound zone includes a high range, a middle range, and a low range, with a numerical range of 3 to 5.
  • The sound effect parameters include loudness, delay time, left-right phase, reverberation time, and the like.
  • The loudness describes the volume level.
  • The delay time is the period between the emission of a sound and its reception by the human ear.
  • The reverberation time is the period after the sound source stops sounding during which the sound waves continue to be reflected and absorbed before the sound disappears.
  • The left-right phase is the direction of the sound, including three types: left, right, and center.
  • In this embodiment, the preset initial values of the music parameters and sound effect parameters are as follows (see the sketch after this list):
  • the initial value of pitch is C;
  • the initial value of music speed is 90 beats per minute;
  • the initial value of note duration is a quarter note;
  • the initial value of sound zone is 4;
  • the initial value of loudness is 80% of the system volume;
  • the initial value of delay time is 0.6 seconds;
  • the initial value of reverberation time is 1 second;
  • the initial value of left-right phase is centered.
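  • As an illustration of how these presets might be represented and wired into a browser audio path, here is a sketch using the standard Web Audio API; the record layout and node graph are assumptions of this sketch, not the patent's implementation.

```ts
interface MusicParams { pitch: 'A' | 'B' | 'C' | 'D'; bpm: number; noteValue: number; zone: number; }
interface SoundFxParams { loudness: number; delaySec: number; reverbSec: number; pan: number; }

// Initial values listed above: pitch C, 90 BPM, quarter notes, zone 4,
// 80% loudness, 0.6 s delay, 1 s reverberation, centered pan.
const initialMusic: MusicParams = { pitch: 'C', bpm: 90, noteValue: 1 / 4, zone: 4 };
const initialFx: SoundFxParams = { loudness: 0.8, delaySec: 0.6, reverbSec: 1.0, pan: 0 };

// Wire the sound-effect presets into a Web Audio graph.
function buildFxChain(ctx: AudioContext, fx: SoundFxParams): AudioNode {
  const gain = new GainNode(ctx, { gain: fx.loudness });         // loudness
  const delay = new DelayNode(ctx, { delayTime: fx.delaySec });  // delay time
  const panner = new StereoPannerNode(ctx, { pan: fx.pan });     // left-right phase
  // Reverberation would typically be a ConvolverNode fed an impulse response
  // roughly fx.reverbSec long; impulse generation is omitted from this sketch.
  gain.connect(delay).connect(panner).connect(ctx.destination);
  return gain; // feed oscillators or samples into this node
}
```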
  • The second recognition step: taking the reading time of the first target video frame as the time starting point, read the current video frame of the action video at each preset time interval as the second target video frame, input the second target video frame into the pre-trained model, and identify the IDs and position coordinate values of the user's first type and second type of human joint parts in the second target video frame.
  • To reduce computation, this embodiment does not read all the video frames, but reads one frame at each preset time interval, as sketched below.
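  • A short sketch of this frame-sampling strategy, reusing the recognizeKeyParts helper from the earlier sketch; the interval value is whatever "preset time" the embodiment chooses.

```ts
// Process one frame every `intervalMs` milliseconds instead of every frame;
// each sampled frame plays the role of the next second target video frame.
function samplePoses(
  video: HTMLVideoElement,
  intervalMs: number,
  onKeyParts: (parts: { firstType: posenet.Keypoint[]; secondType: posenet.Keypoint[] }) => void
): () => void {
  const timer = setInterval(async () => {
    onKeyParts(await recognizeKeyParts(video));
  }, intervalMs);
  return () => clearInterval(timer); // the returned disposer stops sampling
}
```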
  • The adjustment step: adjust the music parameters according to the predetermined mapping table between the first type of human joint parts and the music parameters, the changes in the position coordinate values of the first type of human joint parts in the second target video frame, and the first preset adjustment range table; adjust the sound effect parameters according to the predetermined mapping table between the second type of human joint parts and the sound effect parameters, the changes in the position coordinate values of the second type of human joint parts in the second target video frame, and the second preset adjustment range table; and adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
  • Specifically, the adjustment step includes:
  • A1: Use the position coordinate value of each human joint part in the first target video frame as the initial position value of that joint part.
  • For example, the initial position of the left wrist (ID 9) in the first target video frame can be expressed as (X9-start, Y9-start), and the initial position of the right wrist (ID 10) as (X10-start, Y10-start).
  • If the position coordinate value of the left wrist (ID 9) in the second target video frame is (X9-2, Y9-2), then the change of its X-axis coordinate is ΔX9 = X9-2 - X9-start.
  • Likewise, if the position coordinate value of the right wrist (ID 10) in the second target video frame is (X10-2, Y10-2), the change of its X-axis coordinate is ΔX10 = X10-2 - X10-start, as sketched below.
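  • A small sketch of this change-amount computation; the joint IDs follow the left wrist = 9, right wrist = 10 numbering above, and the coordinate values are illustrative.

```ts
interface Point { x: number; y: number; }

// Change of a joint's coordinates between the first and second target frames,
// e.g. for the left wrist (ID 9): dx = X9-2 - X9-start.
function coordinateChange(start: Point, current: Point): Point {
  return { x: current.x - start.x, y: current.y - start.y };
}

const leftWristStart = { x: 120, y: 200 }; // (X9-start, Y9-start), illustrative
const leftWristNow = { x: 150, y: 180 };   // (X9-2, Y9-2)
const delta = coordinateChange(leftWristStart, leftWristNow); // { x: 30, y: -20 }
```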
  • A4: Determine the name of the music parameter that needs to be adjusted according to the changes in the position coordinate values of the first type of human joint parts in the second target video frame and the predetermined mapping table between the first type of human joint parts and the music parameters, and determine the name of the sound effect parameter that needs to be adjusted according to the changes in the position coordinate values of the second type of human joint parts in the second target video frame and the predetermined mapping table between the second type of human joint parts and the sound effect parameters.
  • The predetermined mapping table between the IDs of the first type of human joint parts and the music parameters can be represented by Table 2 below.
  • The predetermined mapping table between the IDs of the second type of human joint parts and the sound effect parameters can be represented by Table 3 below.
  • For example, from the amount of change in a position coordinate value it can be determined that the pitch among the music parameters needs to be adjusted.
  • A5: Adjust the music parameters that need to be adjusted according to the changes in the position coordinate values of the first type of human joint parts in the second target video frame and the first preset adjustment range table, and adjust the sound effect parameters that need to be adjusted according to the changes in the position coordinate values of the second type of human joint parts in the second target video frame and the second preset adjustment range table.
  • The first preset adjustment range table can be represented by Table 4.
  • The second preset adjustment range table can be represented by Table 5.
  • For example, the music speed needs to be adjusted to 110 beats per minute, and the loudness needs to be adjusted to 74% of the system volume.
  • A6: Adjust the music according to the adjusted music parameters and sound effect parameters to generate new music. A sketch of steps A4-A6 follows.
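  • Tables 2-5 themselves are not reproduced in this extract, so the mappings and step sizes below are purely illustrative stand-ins; the sketch only shows the A4-A6 control flow, reusing the MusicParams and SoundFxParams records from the earlier sketch.

```ts
// Table 2/3 stand-ins: joint ID -> parameter name (hypothetical choices).
const musicParamMap: Partial<Record<number, keyof MusicParams>> = { 9: 'bpm' };       // left wrist
const fxParamMap: Partial<Record<number, keyof SoundFxParams>> = { 10: 'loudness' };  // right wrist

// Table 4/5 stand-in: every 10 px of X-axis change moves the parameter one step.
function steps(deltaX: number): number {
  return Math.trunc(deltaX / 10);
}

// A4-A6: look up the parameter to adjust, apply the range rule, update the values.
function applyAdjustment(music: MusicParams, fx: SoundFxParams, jointId: number, deltaX: number): void {
  if (musicParamMap[jointId] === 'bpm') {
    music.bpm += steps(deltaX) * 10;  // e.g. deltaX = 20: 90 BPM -> 110 BPM
  }
  if (fxParamMap[jointId] === 'loudness') {
    fx.loudness = Math.min(1, Math.max(0, fx.loudness - steps(deltaX) * 0.03)); // 80% -> 74% for deltaX = 20
  }
  // The remaining parameters (pitch, note value, zone, delay, reverb, pan)
  // would be handled analogously from the full Tables 2-5.
}
```

  • The adjusted values would then drive the synthesis layer, for example the gain node from the buildFxChain sketch and whatever note scheduler renders the music parameters.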
  • The stopping step: when a preset stop signal is received, the playback unit is controlled to stop playing the music.
  • The preset stop signal may be the end of the recording of the user's action video, or the music playing time reaching a preset time threshold (for example, 3 minutes).
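  • Both stop signals can be watched with a few lines of browser code; a sketch under the assumption that stopping playback is exposed as a callback.

```ts
// Stop playback when the recording ends or when the playing time reaches the
// preset threshold (3 minutes in the example above), whichever comes first.
function watchStopConditions(
  video: HTMLVideoElement,
  stopPlayback: () => void,
  maxPlayMs: number = 3 * 60 * 1000
): void {
  const timer = setTimeout(stopPlayback, maxPlayMs);   // time-threshold signal
  video.addEventListener('ended', () => {              // recording-stopped signal
    clearTimeout(timer);
    stopPlayback();
  }, { once: true });
}
```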
  • In summary, the electronic device 1 proposed in this application first reads the current video frame of the action video being recorded as the first target video frame, identifies the IDs and position coordinate values of the human joint parts in the first target video frame, and controls the playback unit to start and play music generated according to the preset initial values of the music parameters and sound effect parameters; then, taking the reading time of the first target video frame as the starting point, it reads the current video frame of the action video at each preset time interval as the second target video frame, identifies the IDs and position coordinate values of the human joint parts in the second target video frame, and adjusts the music parameters and sound effect parameters according to the changes in the position coordinate values of the human joint parts, so that the music is adjusted to generate new music, thereby addressing the problem that music creation is difficult and hard to extend.
  • FIG. 2 is a schematic diagram of the modules of an embodiment of the music generating apparatus 100.
  • The music generating apparatus 100 includes a first recognition module 110, a generation module 120, a second recognition module 130, and an adjustment module 140.
  • The first recognition module 110 is configured to use the camera unit to record the user's action video, read the current video frame of the action video as the first target video frame, input the first target video frame into a pre-trained model, and identify the user's key part information in the first target video frame, where the key part information includes the IDs and position coordinate values of the first type and second type of human joint parts.
  • The generation module 120 is configured to control the playback unit to start and play music generated according to the preset initial values of the music parameters and sound effect parameters when the position coordinate values of the first type and second type of human joint parts are recognized in the first target video frame.
  • The second recognition module 130 is configured to take the reading time of the first target video frame as the time starting point, read the current video frame of the action video at each preset time interval as the second target video frame, input the second target video frame into the pre-trained model, and identify the IDs and position coordinate values of the user's first type and second type of human joint parts in the second target video frame.
  • The adjustment module 140 is configured to adjust the music parameters according to the predetermined mapping table between the first type of human joint parts and the music parameters, the changes in the position coordinate values of the first type of human joint parts in the second target video frame, and the first preset adjustment range table; to adjust the sound effect parameters according to the predetermined mapping table between the second type of human joint parts and the sound effect parameters, the changes in the position coordinate values of the second type of human joint parts in the second target video frame, and the second preset adjustment range table; and to adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
  • The functions or operation steps implemented by the first recognition module 110, the generation module 120, the second recognition module 130, and the adjustment module 140 are substantially the same as those of the foregoing embodiment and will not be repeated here.
  • Referring to FIG. 3, the music generation method includes steps S1-S4.
  • S1: Use the camera unit to record the user's action video, read the current video frame of the action video as the first target video frame, input the first target video frame into a pre-trained model, and identify the user's key part information in the first target video frame; the key part information includes the IDs and position coordinate values of the first type and second type of human joint parts.
  • The recorded action video may be a dance video of the user, or a fitness video, a sports training video, or any other action video.
  • The pre-trained model is a PoseNet model: a convolutional neural network that runs on TensorFlow.js (a deep learning framework) and can perform real-time human pose estimation in a browser.
  • The PoseNet model can recognize both single-person and multi-person poses; in this embodiment, a single-person action video of the user is used.
  • The training process of the PoseNet model includes:
  • Step B4: if the accuracy rate is less than the preset accuracy rate, increase the number of character action picture samples by a preset percentage (for example, 15%), and return to step B1.
  • The output of the PoseNet model is the IDs of the user's 17 key joint parts and their position coordinate values.
  • The relationship between the human joint parts and their IDs can be as shown in Table 1 above.
  • The key joint parts are divided into the first type and second type of human joint parts according to the distribution of the joint parts in the human body.
  • For example, the first type of human joint parts may be the joint parts of the upper half of the body and the second type those of the lower half, or the first type may be the joint parts of the left half of the body and the second type those of the right half.
  • In this embodiment, the first type of human joint parts are the joint parts of the left half of the body, for example, the left wrist, the left knee, the left elbow, and the left hip.
  • The second type of human joint parts are the joint parts of the right half of the body, such as the right wrist, the right knee, the right elbow, and the right hip.
  • In this embodiment, the position of the camera unit is fixed, and the position coordinate value of a human joint part is its two-dimensional coordinate value (X, Y) in each video frame: the X axis of the two-dimensional coordinate system runs along the upper border of each video frame, the Y axis runs along the left border, and the origin is the intersection of the upper border and the left border.
  • The key part information also includes a confidence score for the position accuracy of each human joint part.
  • The confidence score is between 0 and 1.0; the higher the confidence score, the higher the accuracy of the identified joint position.
  • The music parameters include pitch, music speed, note duration, sound zone, and so on.
  • The pitch is the height of the sound, including four types: A, B, C, and D.
  • The music speed is the number of beats per minute: a slow speed is 40 to 69 beats per minute, a medium speed is 72 to 84 beats per minute, and a fast speed is 108 to 128 beats per minute.
  • The note duration indicates the relative duration between notes: a half note lasts 1/2 of a whole note, a quarter note 1/4 of a whole note, and an eighth note 1/8 of a whole note.
  • The sound zone includes a high range, a middle range, and a low range, with a numerical range of 3 to 5.
  • The sound effect parameters include loudness, delay time, left-right phase, reverberation time, and the like.
  • The loudness describes the volume level.
  • The delay time is the period between the emission of a sound and its reception by the human ear.
  • The reverberation time is the period after the sound source stops sounding during which the sound waves continue to be reflected and absorbed before the sound disappears.
  • The left-right phase is the direction of the sound, including three types: left, right, and center.
  • In this embodiment, the preset initial values of the music parameters and sound effect parameters are as follows:
  • the initial value of pitch is C;
  • the initial value of music speed is 90 beats per minute;
  • the initial value of note duration is a quarter note;
  • the initial value of sound zone is 4;
  • the initial value of loudness is 80% of the system volume;
  • the initial value of delay time is 0.6 seconds;
  • the initial value of reverberation time is 1 second;
  • the initial value of left-right phase is centered.
  • The second target video frame is input into the pre-trained model, which identifies the IDs and position coordinate values of the user's first type and second type of human joint parts in the second target video frame.
  • To reduce computation, this embodiment does not read all the video frames, but reads one frame at each preset time interval.
  • The adjustment step includes:
  • A1: Use the position coordinate value of each human joint part in the first target video frame as the initial position value of that joint part.
  • For example, the initial position of the left wrist (ID 9) in the first target video frame can be expressed as (X9-start, Y9-start), and the initial position of the right wrist (ID 10) as (X10-start, Y10-start).
  • If the position coordinate value of the left wrist (ID 9) in the second target video frame is (X9-2, Y9-2), then the change of its X-axis coordinate is ΔX9 = X9-2 - X9-start.
  • Likewise, if the position coordinate value of the right wrist (ID 10) in the second target video frame is (X10-2, Y10-2), the change of its X-axis coordinate is ΔX10 = X10-2 - X10-start.
  • A4: Determine the name of the music parameter that needs to be adjusted according to the changes in the position coordinate values of the first type of human joint parts in the second target video frame and the predetermined mapping table between the first type of human joint parts and the music parameters, and determine the name of the sound effect parameter that needs to be adjusted according to the changes in the position coordinate values of the second type of human joint parts in the second target video frame and the predetermined mapping table between the second type of human joint parts and the sound effect parameters.
  • The predetermined mapping table between the IDs of the first type of human joint parts and the music parameters can be represented by Table 2 above.
  • The predetermined mapping table between the IDs of the second type of human joint parts and the sound effect parameters can be represented by Table 3 above.
  • For example, from the amount of change in a position coordinate value it can be determined that the pitch among the music parameters needs to be adjusted.
  • A5: Adjust the music parameters that need to be adjusted according to the changes in the position coordinate values of the first type of human joint parts in the second target video frame and the first preset adjustment range table, and adjust the sound effect parameters that need to be adjusted according to the changes in the position coordinate values of the second type of human joint parts in the second target video frame and the second preset adjustment range table.
  • The first preset adjustment range table can be represented by Table 4 above.
  • The second preset adjustment range table can be represented by Table 5 above.
  • For example, the music speed needs to be adjusted to 110 beats per minute, and the loudness needs to be adjusted to 74% of the system volume.
  • A6: Adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
  • The stopping step: when a preset stop signal is received, the playback unit is controlled to stop playing the music.
  • The preset stop signal may be the end of the recording of the user's action video, or the music playing time reaching a preset time threshold (for example, 3 minutes).
  • In summary, the music generation method proposed in this application first reads the current video frame of the action video being recorded as the first target video frame, identifies the IDs and position coordinate values of the human joint parts in the first target video frame, and controls the playback unit to start and play music generated according to the preset initial values of the music parameters and sound effect parameters; then, taking the reading time of the first target video frame as the starting point, it reads the current video frame of the action video at each preset time interval as the second target video frame, identifies the IDs and position coordinate values of the human joint parts in the second target video frame, and adjusts the music parameters and sound effect parameters according to the changes in the position coordinate values of the human joint parts, so that the music is adjusted to generate new music, thereby addressing the problem that music creation is difficult and hard to extend.
  • An embodiment of the present application also proposes a computer-readable storage medium.
  • The computer-readable storage medium may be volatile or non-volatile.
  • The computer-readable storage medium may be any one of, or any random combination of, a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like.
  • The computer-readable storage medium includes a music generation program 10, and when the music generation program 10 is executed by a processor, the following operations are implemented:
  • use the camera unit to record the user's action video, read the current video frame of the action video as the first target video frame, input the first target video frame into a pre-trained model, and identify the user's key part information in the first target video frame, where the key part information includes the IDs and position coordinate values of the first type and second type of human joint parts;
  • when the position coordinate values of the first type and second type of human joint parts are recognized in the first target video frame, control the playback unit to start and play music generated according to the preset initial values of the music parameters and sound effect parameters;
  • taking the reading time of the first target video frame as the time starting point, read the current video frame of the action video at each preset time interval as the second target video frame, input the second target video frame into the pre-trained model, and identify the IDs and position coordinate values of the user's first type and second type of human joint parts in the second target video frame;
  • adjust the music parameters according to the predetermined mapping table between the first type of human joint parts and the music parameters, the changes in the position coordinate values of the first type of human joint parts in the second target video frame, and the first preset adjustment range table; adjust the sound effect parameters according to the predetermined mapping table between the second type of human joint parts and the sound effect parameters, the changes in the position coordinate values of the second type of human joint parts in the second target video frame, and the second preset adjustment range table; and adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
  • The technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the method described in each embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Acoustics & Sound (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

A music generation method, comprising: recording an action video of a user; reading the current video frame of the action video as a first target video frame; identifying the IDs and position coordinate values of the human joint parts in the first target video frame, and controlling a playback unit to start and play music generated according to preset initial values of music parameters and sound effect parameters; taking the reading time of the first target video frame as the time starting point, reading the current video frame of the action video at each preset time interval as a second target video frame; identifying the IDs and position coordinate values of the human joint parts in the second target video frame; and adjusting the music parameters and the sound effect parameters according to the changes in the position coordinate values of the human joint parts, thereby adjusting the music to generate new music. Also provided are a music generation apparatus, an electronic device, and a computer-readable storage medium. This addresses the problem that music creation is difficult and hard to extend.

Description

Music generation method and apparatus, electronic device and computer-readable storage medium

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 12, 2019, with application number CN201910969868.5 and entitled "Music generation method, electronic device and computer-readable storage medium", the entire content of which is incorporated herein by reference.

Technical Field

This application relates to the field of data processing technology, and in particular to a music generation method and apparatus, an electronic device, and a computer-readable storage medium.

Background

In today's society, music has penetrated deeply into people's lives: it can adjust one's mood, relieve stress, and reduce anxiety. The inventor realized that the traditional way of generating music requires the creator to have a certain knowledge of music theory, combined with inspiration and creative experience, in order to create a complete piece of music. For people without a musical foundation, these requirements form a high threshold that has kept many music-loving non-professionals from participating in music creation. At present, there is a lack of a music generation method that is simple to create with and easy to extend.

Summary of the Invention
本申请提供的音乐生成方法,包括:The music generation method provided in this application includes:
第一识别步骤:利用摄像单元录制用户的动作视频,读取所述动作视频的当前视频帧作为第一目标视频帧,将所述第一目标视频帧输入预先训练好的模型,识别所述第一目标视频帧中用户的关键部位信息,所述关键部位信息包括第一类人体关节部位的ID及其位置坐标值和第二类人体关节部位的ID及其位置坐标值;The first recognition step: use the camera unit to record the user's action video, read the current video frame of the action video as the first target video frame, input the first target video frame into the pre-trained model, and recognize the first target video frame. A user’s key part information in a target video frame, where the key part information includes the ID of the first type of human body joint part and its position coordinate value and the ID of the second type of human body joint part and its position coordinate value;
生成步骤:当识别到第一目标视频帧中所述第一类人体关节部位的位置坐标值和第二类人体关节部位的位置坐标值时,控制播放单元启动并播放根据预设的音乐参数和音效参数的初始值生成的音乐;Generation step: when the position coordinate values of the first type of human body joint parts and the position coordinate values of the second type of human body joint parts in the first target video frame are recognized, the playback unit is controlled to start and play according to the preset music parameters and The music generated by the initial value of the sound effect parameter;
第二识别步骤:以第一目标视频帧的读取时间为时间起点,每间隔预设时间,读取所述动作视频的当前视频帧作为第二目标视频帧,将所述第二目标视频帧输入所述预先训练好的模型,识别所述第二目标视频中用户的第一类人体关节部位的ID及其位置坐标值和第二类人体关节部位的ID及其位置坐标值;The second identification step: taking the reading time of the first target video frame as the starting point of time, reading the current video frame of the action video as the second target video frame at a preset time interval, and setting the second target video frame Input the pre-trained model to identify the ID of the first type of human body joint part and its position coordinate value and the ID of the second type of human body joint part and the position coordinate value of the user in the second target video;
调整步骤:根据预先确定的第一类人体关节部位与音乐参数的映射关系表、第二目标视频帧中第一类人体关节部位的位置坐标值的变化量及第一预设调整幅度表调整音乐参数,根据预先确定的第二类人体关节部位与音效参数的映射关系表、第二目标视频帧中第二类人体关节部位的位置坐标值的变化量及第二预设调整幅度表调整音效参数,并根据调整后音乐参数及音效参数对所述音乐进行调整生成新的音乐。The adjustment step: adjust the music according to the predetermined mapping relationship table of the first type of human body joint parts and the music parameters, the change of the position coordinate values of the first type of human body joint parts in the second target video frame, and the first preset adjustment range table Parameters: adjust the sound effect parameters according to the predetermined mapping table of the second type of human joint parts and the sound effect parameters, the change in the position coordinate values of the second type of human joint parts in the second target video frame, and the second preset adjustment range table , And adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
本申请还提供一种音乐生成装置,所述装置包括:The present application also provides a music generating device, which includes:
第一识别模块,用于利用摄像单元录制用户的动作视频,读取所述动作视频的当前视频帧作为第一目标视频帧,将所述第一目标视频帧输入预先训练好的模型,识别所述第一目标视频帧中用户的关键部位信息,所述关键部位信息包括第一类人体关节部位的ID及其位置坐标值和第二类人体关节部位的ID及其位置坐标值;The first recognition module is used to record the user's action video by using the camera unit, read the current video frame of the action video as the first target video frame, input the first target video frame into the pre-trained model, and identify all The key part information of the user in the first target video frame, where the key part information includes the ID of the first type of human body joint part and its position coordinate value, and the ID of the second type of human body joint part and its position coordinate value;
生成模块,用于当识别到第一目标视频帧中所述第一类人体关节部位的位置坐标值和第二类人体关节部位的位置坐标值时,控制播放单元启动并播放根据预设的音乐参数和音效参数的初始值生成的音乐;The generating module is used to control the playback unit to start and play music according to the preset when the position coordinate values of the first type of human joint parts and the second type of human joint parts in the first target video frame are recognized The music generated by the initial values of the parameters and sound effect parameters;
第二识别模块,用于以第一目标视频帧的读取时间为时间起点,每间隔预设时间,读取所述动作视频的当前视频帧作为第二目标视频帧,将所述第二目标视频帧输入所述预先训练好的模型,识别所述第二目标视频中用户的第一类人体关节部位的ID及其位置坐标 值和第二类人体关节部位的ID及其位置坐标值;The second recognition module is used to read the current video frame of the action video as the second target video frame by taking the reading time of the first target video frame as the time starting point and every preset time interval, and set the second target video frame as the second target video frame. The video frame is input into the pre-trained model to identify the ID of the first type of human body joint part and its position coordinate value and the ID of the second type of human body joint part and the position coordinate value of the user in the second target video;
调整模块,用于根据预先确定的第一类人体关节部位与音乐参数的映射关系表、第二目标视频帧中第一类人体关节部位的位置坐标值的变化量及第一预设调整幅度表调整音乐参数,根据预先确定的第二类人体关节部位与音效参数的映射关系表、第二目标视频帧中第二类人体关节部位的位置坐标值的变化量及第二预设调整幅度表调整音效参数,并根据调整后的音乐参数及音效参数对所述音乐进行调整生成新的音乐。The adjustment module is used to perform the mapping relationship between the first type of human body joints and the music parameters, the change of the position coordinate values of the first type of human joints in the second target video frame, and the first preset adjustment range table. Adjust the music parameters according to the predetermined mapping relationship table of the second type of human body joints and the sound effect parameters, the change of the position coordinate value of the second type of human joints in the second target video frame, and the second preset adjustment range table. Sound effect parameters, and adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
The present application further provides an electronic device, comprising a memory and a processor, wherein the memory stores a music generation program runnable on the processor, and the music generation program, when executed by the processor, implements the following steps:
First recognition step: recording an action video of a user with a camera unit, reading the current video frame of the action video as a first target video frame, inputting the first target video frame into a pre-trained model, and recognizing key-part information of the user in the first target video frame, the key-part information including the IDs and position coordinate values of first-type human joint parts and the IDs and position coordinate values of second-type human joint parts;
Generation step: when the position coordinate values of the first-type and second-type human joint parts in the first target video frame are recognized, controlling a playback unit to start and play music generated from preset initial values of music parameters and sound effect parameters;
Second recognition step: taking the reading time of the first target video frame as the time origin and, at every preset interval, reading the current video frame of the action video as a second target video frame, inputting the second target video frame into the pre-trained model, and recognizing, in the second target video frame, the IDs and position coordinate values of the user's first-type and second-type human joint parts;
Adjustment step: adjusting the music parameters according to a predetermined mapping table between first-type human joint parts and music parameters, the changes in the position coordinate values of the first-type human joint parts in the second target video frame, and a first preset adjustment amplitude table; adjusting the sound effect parameters according to a predetermined mapping table between second-type human joint parts and sound effect parameters, the changes in the position coordinate values of the second-type human joint parts in the second target video frame, and a second preset adjustment amplitude table; and adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
The present application further provides a computer-readable storage medium storing a music generation program, the music generation program being executable by one or more processors to implement the following steps:
First recognition step: recording an action video of a user with a camera unit, reading the current video frame of the action video as a first target video frame, inputting the first target video frame into a pre-trained model, and recognizing key-part information of the user in the first target video frame, the key-part information including the IDs and position coordinate values of first-type human joint parts and the IDs and position coordinate values of second-type human joint parts;
Generation step: when the position coordinate values of the first-type and second-type human joint parts in the first target video frame are recognized, controlling a playback unit to start and play music generated from preset initial values of music parameters and sound effect parameters;
Second recognition step: taking the reading time of the first target video frame as the time origin and, at every preset interval, reading the current video frame of the action video as a second target video frame, inputting the second target video frame into the pre-trained model, and recognizing, in the second target video frame, the IDs and position coordinate values of the user's first-type and second-type human joint parts;
Adjustment step: adjusting the music parameters according to a predetermined mapping table between first-type human joint parts and music parameters, the changes in the position coordinate values of the first-type human joint parts in the second target video frame, and a first preset adjustment amplitude table; adjusting the sound effect parameters according to a predetermined mapping table between second-type human joint parts and sound effect parameters, the changes in the position coordinate values of the second-type human joint parts in the second target video frame, and a second preset adjustment amplitude table; and adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
Description of the drawings
FIG. 1 is a schematic diagram of an embodiment of an electronic device of the present application;
FIG. 2 is a schematic block diagram of an embodiment of a music generation apparatus;
FIG. 3 is a flowchart of an embodiment of a music generation method of the present application.
Detailed description
To make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be noted that descriptions such as "first" and "second" in the present application are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, provided the combination can be implemented by a person of ordinary skill in the art; where a combination of technical solutions is contradictory or cannot be implemented, the combination shall be deemed not to exist, and it falls outside the protection scope claimed by the present application.
As shown in FIG. 1, which is a schematic diagram of an embodiment of the electronic device 1 of the present application, the electronic device 1 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud composed of a large number of hosts or network servers, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the electronic device 1 includes, but is not limited to, a memory 11, a processor 12 and a network interface 13 that are communicatively connected to one another through a system bus, the memory 11 storing a music generation program 10 executable by the processor 12. FIG. 1 shows only the electronic device 1 with the components 11-13 and the music generation program 10; a person skilled in the art will understand that the structure shown in FIG. 1 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1. The readable storage medium may be a non-volatile or volatile storage medium such as a flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk or optical disk. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as its hard disk; in other embodiments, it may be an external storage device of the electronic device 1, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card or flash card provided on the electronic device 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and the various application software installed on the electronic device 1, for example the code of the music generation program 10 in an embodiment of the present application. The memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 12 may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip. The processor 12 is generally used to control the overall operation of the electronic device 1, for example performing control and processing related to data exchange or communication with other devices. In this embodiment, the processor 12 is used to run the program code or process the data stored in the memory 11, for example to run the music generation program 10.
The network interface 13 may include a wireless network interface or a wired network interface, and is used to establish a communication connection between the electronic device 1 and a client (not shown).
Optionally, the electronic device 1 may further include a user interface, which may include a display and an input unit such as a keyboard, and optionally a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display may also suitably be called a display screen or display unit, and is used to display the information processed in the electronic device 1 and a visualized user interface.
In an embodiment of the present application, the music generation program 10, when executed by the processor 12, implements the following first recognition step, generation step, second recognition step and adjustment step.
First recognition step: recording an action video of a user with the camera unit, reading the current video frame of the action video as a first target video frame, inputting the first target video frame into a pre-trained model, and recognizing key-part information of the user in the first target video frame, the key-part information including the IDs and position coordinate values of first-type human joint parts and the IDs and position coordinate values of second-type human joint parts.
The recorded action video of the user may be a dance video, a fitness video, a sports training video or any other action video of the user.
In an embodiment of the present application, the pre-trained model is a PoseNet model. PoseNet is a convolutional neural network model that runs on TensorFlow.js (a deep learning framework) and can perform real-time human pose estimation in the browser.
The PoseNet model can recognize both single-person and multi-person poses. In this embodiment, a single-person action video of the user is used.
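For illustration, the following is a minimal sketch of browser-side single-pose estimation, assuming the @tensorflow-models/posenet 2.x API in an ES module and a video element with the id "camera" carrying the camera stream; neither detail is specified in the original disclosure.

```ts
import * as posenet from '@tensorflow-models/posenet';

// The <video> element showing the camera stream being recorded.
const video = document.getElementById('camera') as HTMLVideoElement;

// Load the pre-trained PoseNet model once (weights are fetched on first call).
const net = await posenet.load();

// Estimate a single pose from the current frame of the video element.
const pose = await net.estimateSinglePose(video, { flipHorizontal: false });

// pose.keypoints holds 17 entries, one per joint part of Table 1 below,
// each with { part, position: { x, y }, score }.
console.log(pose.keypoints[9]); // e.g. { part: 'leftWrist', position: { x, y }, score }
```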
The training process of the PoseNet model includes:
B1. obtaining a preset number (for example, 10,000) of human-action picture samples and dividing the picture samples into a training set of a first proportion and a validation set of a second proportion;
B2. training the PoseNet model with the training set;
B3. verifying the accuracy of the trained PoseNet model with the validation set; if the accuracy is greater than or equal to a preset accuracy (for example, 95%), the training ends;
B4. if the accuracy is less than the preset accuracy, increasing the preset number of human-action picture samples by a preset percentage (for example, 15%) and returning to step B1.
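As a rough illustration of the B1-B4 loop, the sketch below shows the grow-and-retrain control flow; sampleImages, trainPoseNet and evaluateAccuracy are hypothetical placeholders standing in for the data pipeline and the actual TensorFlow.js training code, and the split ratio is an assumed value since the disclosure does not fix the first and second proportions.

```ts
type Sample = unknown;
type Model = unknown;

// Hypothetical helpers, not real TensorFlow.js APIs.
declare function sampleImages(n: number): Promise<Sample[]>;
declare function trainPoseNet(trainSet: Sample[]): Promise<Model>;
declare function evaluateAccuracy(model: Model, valSet: Sample[]): Promise<number>;

async function trainUntilAccurate(
  initialCount = 10_000,   // preset number of action-picture samples (B1)
  trainRatio = 0.8,        // assumed first proportion (training set)
  targetAccuracy = 0.95,   // preset accuracy (B3)
  growth = 1.15            // preset percentage increase of 15% (B4)
): Promise<Model> {
  let count = initialCount;
  for (;;) {
    const samples = await sampleImages(count);                     // B1
    const cut = Math.floor(samples.length * trainRatio);
    const model = await trainPoseNet(samples.slice(0, cut));       // B2
    const acc = await evaluateAccuracy(model, samples.slice(cut)); // B3
    if (acc >= targetAccuracy) return model;                       // training ends
    count = Math.round(count * growth);                            // B4: enlarge sample set, retry
  }
}
```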
In this embodiment, the PoseNet model outputs the IDs and position coordinate values of 17 key joint parts of the user.
The relationship between the human joint parts and their IDs can be as shown in Table 1 below:

ID | Human joint part
0  | Nose
1  | Left eye
2  | Right eye
3  | Left ear
4  | Right ear
5  | Left shoulder
6  | Right shoulder
7  | Left elbow
8  | Right elbow
9  | Left wrist
10 | Right wrist
11 | Left hip
12 | Right hip
13 | Left knee
14 | Right knee
15 | Left ankle
16 | Right ankle

Table 1
In this embodiment, the key joint parts are divided into first-type human joint parts and second-type human joint parts according to their position distribution on the human body. For example, the first-type joint parts may be the joint parts of the upper half of the body and the second-type joint parts those of the lower half; or the first-type joint parts may be the joint parts of the left half of the body and the second-type joint parts those of the right half.
In this embodiment, the first-type human joint parts are the joint parts of the left half of the body, for example the left wrist, left knee, left elbow and left hip.
The second-type human joint parts are the joint parts of the right half of the body, for example the right wrist, right knee, right elbow and right hip.
In this embodiment, the position of the camera unit is fixed. The position coordinate value of a human joint part is its two-dimensional coordinate value (X, Y) in each video frame, where the X axis of the two-dimensional coordinate system runs along the top border of the frame, the Y axis runs along the left border, and the origin is the intersection of the top and left borders.
The key-part information further includes a confidence score for the positional accuracy of each human joint part. The confidence score lies between 0 and 1.0; the higher the score, the more accurate the recognized position of the joint part.
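To show how this key-part information might be carried in code, the sketch below splits a PoseNet pose into the two classes used in this embodiment (left half versus right half); it assumes the Pose and Keypoint type exports of @tensorflow-models/posenet and relies on the Table 1 numbering, in which left-side parts happen to have odd IDs and right-side parts even IDs.

```ts
import { Pose, Keypoint } from '@tensorflow-models/posenet';

// First type: left-half joint parts; second type: right-half joint parts.
// In the Table 1 ID scheme, left-side parts have odd IDs (1, 3, ..., 15) and
// right-side parts even IDs (2, 4, ..., 16); ID 0 (nose) lies on the mid-line.
function splitKeyParts(pose: Pose): { first: Keypoint[]; second: Keypoint[] } {
  const first: Keypoint[] = [];
  const second: Keypoint[] = [];
  pose.keypoints.forEach((kp, id) => {
    if (id === 0) return; // nose: belongs to neither class
    (id % 2 === 1 ? first : second).push(kp);
  });
  return { first, second };
}

// Each Keypoint carries position { x, y } in frame coordinates (origin at the
// top-left corner of the frame) and a confidence score in [0, 1.0].
```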
Generation step: when the position coordinate values of the first-type and second-type human joint parts in the first target video frame are recognized, controlling the playback unit to start and play music generated from the preset initial values of the music parameters and sound effect parameters.
The music parameters include pitch, tempo, note value, register, and so on.
The pitch is the height of the sound and includes the four types A, B, C and D.
The tempo is the number of beats per minute: slow is 40-69 beats per minute, medium is 72-84 beats per minute, and fast is 108-128 beats per minute.
The note value expresses the relative duration between notes: a half note lasts 1/2 of a whole note, a quarter note 1/4 of a whole note, and an eighth note 1/8 of a whole note.
The register includes a high register, a middle register and a low register, with a value range of 3-5.
The sound effect parameters include loudness, delay time, left-right phase, reverberation time, and so on.
The loudness describes the volume level.
The delay time is the interval between the emission of a sound and its reception by the human ear.
The reverberation time is the interval, after the sound source stops sounding, during which the sound waves are reflected and absorbed before the sound dies away.
The left-right phase is the direction of the sound and includes the three types left, right and center.
For example, the preset initial values of the music parameters and sound effect parameters are as follows:
the initial pitch is C, the initial tempo is 90 beats per minute, the initial note value is a quarter note, the initial register is 4, the initial loudness is 80% of the system volume, the initial delay time is 0.6 s, the initial reverberation time is 1 s, and the initial left-right phase is center.
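Collected into one object, these initial values might look as follows; the field names are illustrative, not part of the original disclosure.

```ts
// Initial music and sound effect parameters from the example above.
const initialSettings = {
  music: {
    pitch: 'C',        // pitch type C
    tempoBpm: 90,      // 90 beats per minute
    noteValue: 1 / 4,  // quarter note
    register: 4,       // register, valid range 3-5
  },
  effects: {
    loudness: 0.8,     // 80% of the system volume
    delaySec: 0.6,     // 0.6 s delay time
    reverbSec: 1.0,    // 1 s reverberation time
    pan: 'center' as 'left' | 'right' | 'center', // left-right phase
  },
};
```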
Second recognition step: taking the reading time of the first target video frame as the time origin and, at every preset interval, reading the current video frame of the action video as a second target video frame, inputting the second target video frame into the pre-trained model, and recognizing, in the second target video frame, the IDs and position coordinate values of the user's first-type and second-type human joint parts.
Because adjacent video frames of an action video change little, and in order to reduce the amount of data to be processed, this embodiment does not read every video frame but reads one frame per preset interval.
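A minimal sketch of this sampling policy follows, reusing the net and video bindings from the earlier PoseNet sketch; SAMPLE_INTERVAL_MS is an assumed value, since the disclosure does not fix the preset interval.

```ts
const SAMPLE_INTERVAL_MS = 500; // assumed preset interval between sampled frames

// Read one frame per interval (rather than every frame) and hand each
// estimated pose off for the adjustment step; returns a stop function.
function startSampling(onPose: (pose: posenet.Pose) => void): () => void {
  const timer = setInterval(async () => {
    const pose = await net.estimateSinglePose(video, { flipHorizontal: false });
    onPose(pose); // pose from the current "second target video frame"
  }, SAMPLE_INTERVAL_MS);
  return () => clearInterval(timer); // stop sampling when recording ends
}
```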
Adjustment step: adjusting the music parameters according to the predetermined mapping table between first-type human joint parts and music parameters, the changes in the position coordinate values of the first-type human joint parts in the second target video frame, and the first preset adjustment amplitude table; adjusting the sound effect parameters according to the predetermined mapping table between second-type human joint parts and sound effect parameters, the changes in the position coordinate values of the second-type human joint parts in the second target video frame, and the second preset adjustment amplitude table; and adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
In an embodiment of the present application, the adjustment step includes the following sub-steps (a code sketch of this computation follows step A6 below):
A1. taking the position coordinate values of the human joint parts in the first target video frame as the initial position values of those joint parts;
For example, the initial position of the left wrist (ID 9) in the first target video frame may be written (X9-start, Y9-start), and the initial position of the right wrist (ID 10) may be written (X10-start, Y10-start).
A2. calculating the changes in the position coordinate values of the first-type human joint parts in the second target video frame from their position coordinate values in that frame and their initial position values;
For example, if the position coordinate value of the left wrist (ID 9) in the second target video frame is (X9-2, Y9-2), the change in its X-axis coordinate is X9-2start = X9-2 - X9-start, and the change in its Y-axis coordinate is Y9-2start = Y9-2 - Y9-start.
A3. calculating the changes in the position coordinate values of the second-type human joint parts in the second target video frame from their position coordinate values in that frame and their initial position values;
For example, if the position coordinate value of the right wrist (ID 10) in the second target video frame is (X10-2, Y10-2), the change in its X-axis coordinate is X10-2start = X10-2 - X10-start, and the change in its Y-axis coordinate is Y10-2start = Y10-2 - Y10-start.
A4. determining the names of the music parameters to be adjusted from the changes in the position coordinate values of the first-type human joint parts in the second target video frame and the predetermined mapping table between first-type human joint parts and music parameters, and determining the names of the sound effect parameters to be adjusted from the changes in the position coordinate values of the second-type human joint parts in the second target video frame and the predetermined mapping table between second-type human joint parts and sound effect parameters;
The predetermined mapping table between the IDs of first-type human joint parts and music parameters can be represented by Table 2 below.

ID | Action gesture                    | Changing coordinate | Music parameter
9  | Left wrist moves up and down      | Y                   | Pitch
9  | Left wrist swings left and right  | X                   | Tempo
13 | Left knee swings left and right   | X                   | Note value
7  | Left elbow moves up and down      | Y                   | Register

Table 2
The predetermined mapping table between the IDs of second-type human joint parts and sound effect parameters can be represented by Table 3 below.

ID | Action gesture                      | Changing coordinate | Sound effect parameter
10 | Right wrist moves up and down       | Y                   | Loudness
8  | Right elbow swings left and right   | X                   | Delay time
6  | Right shoulder swings left and right| X                   | Left-right phase
14 | Right knee swings left and right    | X                   | Reverberation time

Table 3
For example, from the change in the X-axis coordinate of the left wrist (ID 9) in the second target video frame it can be determined that the tempo among the music parameters needs to be adjusted, and from the change in the Y-axis coordinate of the left wrist (ID 9) in the second target video frame it can be determined that the pitch among the music parameters needs to be adjusted.
A5. adjusting the music parameters to be adjusted according to the changes in the position coordinate values of the first-type human joint parts in the second target video frame and the first preset adjustment amplitude table, and adjusting the sound effect parameters to be adjusted according to the changes in the position coordinate values of the second-type human joint parts in the second target video frame and the second preset adjustment amplitude table.
The first preset adjustment amplitude table can be represented by Table 4, and the second preset adjustment amplitude table by Table 5. (Tables 4 and 5 appear only as images in the published document; the worked examples below illustrate their effect.)
For example, if d is 5, then when the change X9-2start in the X-axis coordinate of the left wrist (ID 9) in the second target video frame is 8, the tempo is adjusted to 110 beats per minute.
When the change Y10-2start in the Y-axis coordinate of the right wrist (ID 10) in the second target video frame is -13, the loudness is adjusted to 74% of the system volume.
A6. adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
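The sketch below ties steps A1-A6 to the Table 2/3 mappings for the two worked examples above (left wrist X drives tempo, right wrist Y drives loudness). Since Tables 4 and 5 are images in the source, the per-step increments of 20 bpm and 3% are inferred from the worked examples rather than taken from the published tables, and initialSettings is the object from the earlier sketch.

```ts
interface Point { x: number; y: number; }

// A2/A3: change of a joint's coordinates relative to its initial position (A1).
function delta(current: Point, initial: Point): Point {
  return { x: current.x - initial.x, y: current.y - initial.y };
}

function adjustForFrame(
  initial: Map<number, Point>, // A1: coordinates from the first target frame, keyed by ID
  current: Map<number, Point>, // coordinates from the second target frame, keyed by ID
  settings: typeof initialSettings,
  d = 5                        // assumed step width d from the worked example
): void {
  // Left wrist (ID 9): X swing adjusts tempo (Table 2, A4/A5).
  const leftWrist = delta(current.get(9)!, initial.get(9)!);
  const tempoSteps = Math.trunc(leftWrist.x / d);
  settings.music.tempoBpm += tempoSteps * 20;  // dX = 8, d = 5: 90 + 20 = 110 bpm

  // Right wrist (ID 10): Y movement adjusts loudness (Table 3, A4/A5).
  const rightWrist = delta(current.get(10)!, initial.get(10)!);
  const loudnessSteps = Math.trunc(rightWrist.y / d);
  settings.effects.loudness += loudnessSteps * 0.03; // dY = -13: 0.80 - 0.06 = 0.74
  // A6: the synthesis engine would then regenerate the music from `settings`.
}
```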
In an embodiment of the present application, the music generation program 10, when executed by the processor 12, further implements the following step:
Stop step: when a preset stop signal is received, controlling the playback unit to stop playing the music.
In this embodiment, the preset stop signal may be that recording of the user's action video stops, or that the music playing time reaches a preset time threshold (for example, 3 minutes).
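As a small sketch, the two preset stop signals named above can be folded into a single check; MAX_PLAY_MS encodes the 3-minute example threshold, and the function names are illustrative.

```ts
const MAX_PLAY_MS = 3 * 60 * 1000; // preset time threshold (3 minutes)

// Stop condition: recording has stopped, or playback has run long enough.
function shouldStop(recordingActive: boolean, playStartMs: number): boolean {
  return !recordingActive || Date.now() - playStartMs >= MAX_PLAY_MS;
}
```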
As the above embodiment shows, the electronic device 1 proposed in the present application first reads the current video frame of the action video being recorded as the first target video frame, recognizes the IDs and position coordinate values of the human joint parts in the first target video frame, and controls the playback unit to start and play music generated from the preset initial values of the music parameters and sound effect parameters; then, taking the reading time of the first target video frame as the time origin, it reads the current video frame of the action video as a second target video frame at every preset interval, recognizes the IDs and position coordinate values of the human joint parts in the second target video frame, and adjusts the music parameters and sound effect parameters according to the changes in the position coordinate values of the human joint parts, thereby adjusting the music to generate new music. This addresses the problem that music creation is difficult and hard to scale.
FIG. 2 is a schematic block diagram of an embodiment of the music generation apparatus 100.
In an embodiment of the present application, the music generation apparatus 100 includes a first recognition module 110, a generation module 120, a second recognition module 130 and an adjustment module 140. Illustratively:
The first recognition module 110 is configured to record an action video of a user with a camera unit, read the current video frame of the action video as a first target video frame, input the first target video frame into a pre-trained model, and recognize key-part information of the user in the first target video frame, the key-part information including the IDs and position coordinate values of first-type human joint parts and the IDs and position coordinate values of second-type human joint parts;
The generation module 120 is configured to, when the position coordinate values of the first-type and second-type human joint parts in the first target video frame are recognized, control a playback unit to start and play music generated from the preset initial values of the music parameters and sound effect parameters;
The second recognition module 130 is configured to take the reading time of the first target video frame as the time origin and, at every preset interval, read the current video frame of the action video as a second target video frame, input the second target video frame into the pre-trained model, and recognize, in the second target video frame, the IDs and position coordinate values of the user's first-type and second-type human joint parts;
The adjustment module 140 is configured to adjust the music parameters according to the predetermined mapping table between first-type human joint parts and music parameters, the changes in the position coordinate values of the first-type human joint parts in the second target video frame, and the first preset adjustment amplitude table; adjust the sound effect parameters according to the predetermined mapping table between second-type human joint parts and sound effect parameters, the changes in the position coordinate values of the second-type human joint parts in the second target video frame, and the second preset adjustment amplitude table; and adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
The functions or operation steps implemented when the first recognition module 110, the generation module 120, the second recognition module 130 and the adjustment module 140 are executed are substantially the same as those of the above embodiment and are not repeated here.
FIG. 3 is a flowchart of an embodiment of the music generation method of the present application; the music generation method includes steps S1-S4.
S1. Recording an action video of a user with a camera unit, reading the current video frame of the action video as a first target video frame, inputting the first target video frame into a pre-trained model, and recognizing key-part information of the user in the first target video frame, the key-part information including the IDs and position coordinate values of first-type human joint parts and the IDs and position coordinate values of second-type human joint parts.
The recorded action video of the user may be a dance video, a fitness video, a sports training video or any other action video of the user.
In an embodiment of the present application, the pre-trained model is a PoseNet model. PoseNet is a convolutional neural network model that runs on TensorFlow.js (a deep learning framework) and can perform real-time human pose estimation in the browser.
The PoseNet model can recognize both single-person and multi-person poses. In this embodiment, a single-person action video of the user is used.
The training process of the PoseNet model includes:
B1. obtaining a preset number (for example, 10,000) of human-action picture samples and dividing the picture samples into a training set of a first proportion and a validation set of a second proportion;
B2. training the PoseNet model with the training set;
B3. verifying the accuracy of the trained PoseNet model with the validation set; if the accuracy is greater than or equal to a preset accuracy (for example, 95%), the training ends;
B4. if the accuracy is less than the preset accuracy, increasing the preset number of human-action picture samples by a preset percentage (for example, 15%) and returning to step B1.
In this embodiment, the PoseNet model outputs the IDs and position coordinate values of 17 key joint parts of the user.
The relationship between the human joint parts and their IDs can be as shown in Table 1 above.
In this embodiment, the key joint parts are divided into first-type human joint parts and second-type human joint parts according to their position distribution on the human body. For example, the first-type joint parts may be the joint parts of the upper half of the body and the second-type joint parts those of the lower half; or the first-type joint parts may be the joint parts of the left half of the body and the second-type joint parts those of the right half.
In this embodiment, the first-type human joint parts are the joint parts of the left half of the body, for example the left wrist, left knee, left elbow and left hip.
The second-type human joint parts are the joint parts of the right half of the body, for example the right wrist, right knee, right elbow and right hip.
In this embodiment, the position of the camera unit is fixed. The position coordinate value of a human joint part is its two-dimensional coordinate value (X, Y) in each video frame, where the X axis of the two-dimensional coordinate system runs along the top border of the frame, the Y axis runs along the left border, and the origin is the intersection of the top and left borders.
The key-part information further includes a confidence score for the positional accuracy of each human joint part. The confidence score lies between 0 and 1.0; the higher the score, the more accurate the recognized position of the joint part.
S2. When the position coordinate values of the first-type and second-type human joint parts in the first target video frame are recognized, controlling the playback unit to start and play music generated from the preset initial values of the music parameters and sound effect parameters.
The music parameters include pitch, tempo, note value, register, and so on.
The pitch is the height of the sound and includes the four types A, B, C and D.
The tempo is the number of beats per minute: slow is 40-69 beats per minute, medium is 72-84 beats per minute, and fast is 108-128 beats per minute.
The note value expresses the relative duration between notes: a half note lasts 1/2 of a whole note, a quarter note 1/4 of a whole note, and an eighth note 1/8 of a whole note.
The register includes a high register, a middle register and a low register, with a value range of 3-5.
The sound effect parameters include loudness, delay time, left-right phase, reverberation time, and so on.
The loudness describes the volume level.
The delay time is the interval between the emission of a sound and its reception by the human ear.
The reverberation time is the interval, after the sound source stops sounding, during which the sound waves are reflected and absorbed before the sound dies away.
The left-right phase is the direction of the sound and includes the three types left, right and center.
For example, the preset initial values of the music parameters and sound effect parameters are as follows:
the initial pitch is C, the initial tempo is 90 beats per minute, the initial note value is a quarter note, the initial register is 4, the initial loudness is 80% of the system volume, the initial delay time is 0.6 s, the initial reverberation time is 1 s, and the initial left-right phase is center.
S3. Taking the reading time of the first target video frame as the time origin and, at every preset interval, reading the current video frame of the action video as a second target video frame, inputting the second target video frame into the pre-trained model, and recognizing, in the second target video frame, the IDs and position coordinate values of the user's first-type and second-type human joint parts.
Because adjacent video frames of an action video change little, and in order to reduce the amount of data to be processed, this embodiment does not read every video frame but reads one frame per preset interval.
S4. Adjusting the music parameters according to the predetermined mapping table between first-type human joint parts and music parameters, the changes in the position coordinate values of the first-type human joint parts in the second target video frame, and the first preset adjustment amplitude table; adjusting the sound effect parameters according to the predetermined mapping table between second-type human joint parts and sound effect parameters, the changes in the position coordinate values of the second-type human joint parts in the second target video frame, and the second preset adjustment amplitude table; and adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
In an embodiment of the present application, the adjustment step includes:
A1. taking the position coordinate values of the human joint parts in the first target video frame as the initial position values of those joint parts;
For example, the initial position of the left wrist (ID 9) in the first target video frame may be written (X9-start, Y9-start), and the initial position of the right wrist (ID 10) may be written (X10-start, Y10-start).
A2. calculating the changes in the position coordinate values of the first-type human joint parts in the second target video frame from their position coordinate values in that frame and their initial position values;
For example, if the position coordinate value of the left wrist (ID 9) in the second target video frame is (X9-2, Y9-2), the change in its X-axis coordinate is X9-2start = X9-2 - X9-start, and the change in its Y-axis coordinate is Y9-2start = Y9-2 - Y9-start.
A3. calculating the changes in the position coordinate values of the second-type human joint parts in the second target video frame from their position coordinate values in that frame and their initial position values;
For example, if the position coordinate value of the right wrist (ID 10) in the second target video frame is (X10-2, Y10-2), the change in its X-axis coordinate is X10-2start = X10-2 - X10-start, and the change in its Y-axis coordinate is Y10-2start = Y10-2 - Y10-start.
A4. determining the names of the music parameters to be adjusted from the changes in the position coordinate values of the first-type human joint parts in the second target video frame and the predetermined mapping table between first-type human joint parts and music parameters, and determining the names of the sound effect parameters to be adjusted from the changes in the position coordinate values of the second-type human joint parts in the second target video frame and the predetermined mapping table between second-type human joint parts and sound effect parameters;
The predetermined mapping table between the IDs of first-type human joint parts and music parameters can be represented by Table 2 above.
The predetermined mapping table between the IDs of second-type human joint parts and sound effect parameters can be represented by Table 3 above.
For example, from the change in the X-axis coordinate of the left wrist (ID 9) in the second target video frame it can be determined that the tempo among the music parameters needs to be adjusted, and from the change in the Y-axis coordinate of the left wrist (ID 9) in the second target video frame it can be determined that the pitch among the music parameters needs to be adjusted.
A5、根据第二目标视频帧中第一类人体关节部位的位置坐标值的变化量及第一预设调整幅度表对所述需要调整的音乐参数进行调整,根据第二目标视频帧中第二类人体关节部位的位置坐标值的变化量及第二预设调整幅度表对所述需要调整的音效参数进行调整。A5. Adjust the music parameters that need to be adjusted according to the amount of change in the position coordinate values of the joints of the first type of human body in the second target video frame and the first preset adjustment range table, and according to the second target video frame in the second target video frame. The change amount of the position coordinate values of the human body joint parts and the second preset adjustment range table adjust the sound effect parameters that need to be adjusted.
第一预设调整幅度表可以用上表4表示。The first preset adjustment range table can be represented by Table 4 above.
第二预设调整幅度表可以用上表5表示。The second preset adjustment range table can be represented by Table 5 above.
例如,假如d为5,当第二目标视频帧中ID为9的左腕的X轴的位置坐标值的变化量X 9-2start为8时,则音乐速度需调整为110拍。 For example, if d is 5, when the amount of change X 9-2start of the X axis position coordinate value of the left wrist with ID 9 in the second target video frame is 8, the music speed needs to be adjusted to 110 beats.
当第二目标视频帧中ID为10的右腕的Y轴的位置坐标值的变化量Y 10-2start为-13时,则响度需调整为系统音量的74%。 When the change amount Y 10-2start of the Y-axis position coordinate value of the right wrist with the ID of 10 in the second target video frame is -13, the loudness needs to be adjusted to 74% of the system volume.
A6、根据调整后音乐参数及音效参数对所述音乐进行调整生成新的音乐。A6. Adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
In an embodiment of the present application, the music generation method further includes the following step:
Stop step: when a preset stop signal is received, controlling the playback unit to stop playing the music.
In this embodiment, the preset stop signal may be that recording of the user's action video stops, or that the music playing time reaches a preset time threshold (for example, 3 minutes).
As the above embodiment shows, the music generation method proposed in the present application first reads the current video frame of the action video being recorded as the first target video frame, recognizes the IDs and position coordinate values of the human joint parts in the first target video frame, and controls the playback unit to start and play music generated from the preset initial values of the music parameters and sound effect parameters; then, taking the reading time of the first target video frame as the time origin, it reads the current video frame of the action video as a second target video frame at every preset interval, recognizes the IDs and position coordinate values of the human joint parts in the second target video frame, and adjusts the music parameters and sound effect parameters according to the changes in the position coordinate values of the human joint parts, thereby adjusting the music to generate new music. This addresses the problem that music creation is difficult and hard to scale.
In addition, an embodiment of the present application further proposes a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile, and may be any one of, or any combination of, a hard disk, multimedia card, SD card, flash card, SMC, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, and the like. The computer-readable storage medium includes a music generation program 10, and the music generation program 10, when executed by a processor, implements the following operations:
recording an action video of a user with a camera unit, reading the current video frame of the action video as a first target video frame, inputting the first target video frame into a pre-trained model, and recognizing key-part information of the user in the first target video frame, the key-part information including the IDs and position coordinate values of first-type human joint parts and the IDs and position coordinate values of second-type human joint parts;
when the position coordinate values of the first-type and second-type human joint parts in the first target video frame are recognized, controlling a playback unit to start and play music generated from the preset initial values of the music parameters and sound effect parameters;
taking the reading time of the first target video frame as the time origin and, at every preset interval, reading the current video frame of the action video as a second target video frame, inputting the second target video frame into the pre-trained model, and recognizing, in the second target video frame, the IDs and position coordinate values of the user's first-type and second-type human joint parts;
adjusting the music parameters according to the predetermined mapping table between first-type human joint parts and music parameters, the changes in the position coordinate values of the first-type human joint parts in the second target video frame, and the first preset adjustment amplitude table; adjusting the sound effect parameters according to the predetermined mapping table between second-type human joint parts and sound effect parameters, the changes in the position coordinate values of the second-type human joint parts in the second target video frame, and the second preset adjustment amplitude table; and adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as that of the music generation method and the electronic device described above and is not repeated here.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
It should be noted that, in this document, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article or method. In the absence of further limitation, an element qualified by the phrase "including a ..." does not exclude the existence of other identical elements in the process, apparatus, article or method that includes that element.
From the description of the above embodiments, a person skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored on a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, computer, server, air conditioner, network device or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit the scope of its patent protection. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A music generation method, applied to an electronic device, the electronic device comprising a camera unit and a playback unit, wherein the method comprises:
    a first recognition step: recording an action video of a user with the camera unit, reading a current video frame of the action video as a first target video frame, inputting the first target video frame into a pre-trained model, and recognizing key part information of the user in the first target video frame, the key part information comprising the IDs and position coordinate values of a first type of human body joint parts and the IDs and position coordinate values of a second type of human body joint parts;
    a generation step: when the position coordinate values of the first type of human body joint parts and of the second type of human body joint parts in the first target video frame are recognized, controlling the playback unit to start and play music generated according to preset initial values of music parameters and sound effect parameters;
    a second recognition step: taking the reading time of the first target video frame as the time origin and, at preset time intervals, reading the current video frame of the action video as a second target video frame, inputting the second target video frame into the pre-trained model, and recognizing the IDs and position coordinate values of the first type of human body joint parts and of the second type of human body joint parts of the user in the second target video frame;
    an adjustment step: adjusting the music parameters according to a predetermined mapping table between the first type of human body joint parts and the music parameters, the change in the position coordinate values of the first type of human body joint parts in the second target video frame, and a first preset adjustment range table; adjusting the sound effect parameters according to a predetermined mapping table between the second type of human body joint parts and the sound effect parameters, the change in the position coordinate values of the second type of human body joint parts in the second target video frame, and a second preset adjustment range table; and adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
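Taken together, the four steps of claim 1 form a capture-and-adjust loop. A rough sketch under assumed helpers (`camera_frames`, `run_posenet`, and `render_music` are hypothetical stand-ins for the camera unit, the pre-trained model, and the playback unit), reusing the `adjust` helper and mapping tables sketched earlier:

    import time

    def generate_music(camera_frames, run_posenet, render_music, interval_s=0.5):
        """camera_frames: iterator of video frames; run_posenet: frame -> {joint_id: (x, y)}."""
        params = {"pitch": 60.0, "tempo": 120.0}         # preset initial music parameters (assumed)
        effects = {"loudness": 0.8, "reverb_time": 1.2}  # preset initial sound effect parameters (assumed)
        initial = run_posenet(next(camera_frames))       # first recognition step
        if initial:
            render_music(params, effects)                # generation step: start playback
        for frame in camera_frames:                      # second recognition step, one frame per interval
            time.sleep(interval_s)
            joints = run_posenet(frame)
            adjust(params, joints, initial, MUSIC_PARAM_MAP, MUSIC_RANGE_TABLE)    # adjustment step
            adjust(effects, joints, initial, EFFECT_PARAM_MAP, EFFECT_RANGE_TABLE)
            render_music(params, effects)                # play the newly generated music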
  2. The music generation method according to claim 1, wherein the adjustment step comprises:
    A1. taking the position coordinate values of the joint parts of the human body in the first target video frame as the initial position values of the respective joint parts;
    A2. calculating the change in the position coordinate values of the first type of human body joint parts in the second target video frame from their position coordinate values in the second target video frame and their initial position values;
    A3. calculating the change in the position coordinate values of the second type of human body joint parts in the second target video frame from their position coordinate values in the second target video frame and their initial position values;
    A4. determining the names of the music parameters to be adjusted according to the change in the position coordinate values of the first type of human body joint parts in the second target video frame and the predetermined mapping table between the first type of human body joint parts and the music parameters, and determining the names of the sound effect parameters to be adjusted according to the change in the position coordinate values of the second type of human body joint parts in the second target video frame and the predetermined mapping table between the second type of human body joint parts and the sound effect parameters;
    A5. adjusting the music parameters to be adjusted according to the change in the position coordinate values of the first type of human body joint parts in the second target video frame and the first preset adjustment range table, and adjusting the sound effect parameters to be adjusted according to the change in the position coordinate values of the second type of human body joint parts in the second target video frame and the second preset adjustment range table;
    A6. adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
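A worked usage example of steps A1-A5 under the tables assumed earlier: a left wrist that rises 40 pixels between the first and second target frames would, under the assumed 0.05-per-pixel pitch entry, raise the pitch by 2.0:

    params = {"pitch": 60.0, "tempo": 120.0}
    initial = {"left_wrist": (100.0, 300.0)}  # A1: position in the first target frame
    joints = {"left_wrist": (100.0, 260.0)}   # second target frame: wrist moved 40 px (A2)
    adjust(params, joints, initial, MUSIC_PARAM_MAP, MUSIC_RANGE_TABLE)  # A4-A5
    print(params["pitch"])                    # 62.0 = 60.0 + 0.05 * 40 under the assumed table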
  3. The music generation method according to claim 2, wherein the method further comprises:
    a stopping step: when a preset stop signal is received, controlling the playback unit to stop playing the music.
  4. The music generation method according to claim 1, wherein the first type of human body joint parts are the joint parts of the left half of the human body, and the second type of human body joint parts are the joint parts of the right half of the human body.
  5. The music generation method according to any one of claims 1 to 4, wherein the pre-trained model is a PoseNet model, and the training process of the PoseNet model comprises:
    B1. obtaining a preset number of human action picture samples, and dividing the picture samples into a training set of a first proportion and a validation set of a second proportion;
    B2. training the PoseNet model with the training set;
    B3. verifying the accuracy of the trained PoseNet model with the validation set, and ending the training if the accuracy is greater than or equal to a preset accuracy;
    B4. if the accuracy is less than the preset accuracy, increasing the preset number of human action picture samples by a preset percentage and returning to step B1.
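Steps B1-B4 amount to the following training loop; the sample count, 80/20 split, 95% target accuracy, and 20% growth step below are illustrative assumptions, as is treating `train` and `evaluate` as opaque callables:

    import random

    def train_until_accurate(get_samples, train, evaluate,
                             n=10000, train_ratio=0.8,
                             target_acc=0.95, grow_pct=0.2):
        """B1-B4: enlarge the sample pool until validation accuracy reaches the target."""
        while True:
            samples = get_samples(n)                    # B1: obtain the current number of samples
            random.shuffle(samples)
            split = int(len(samples) * train_ratio)     # first-proportion training set,
            train_set, val_set = samples[:split], samples[split:]  # second-proportion validation set
            model = train(train_set)                    # B2: train the PoseNet model
            if evaluate(model, val_set) >= target_acc:  # B3: training ends once accurate enough
                return model
            n = int(n * (1 + grow_pct))                 # B4: grow by the preset percentage, back to B1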
  6. The music generation method according to claim 1, wherein the key part information further comprises confidence scores for the position accuracy of the first and second types of human body joint parts.
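The confidence scores of claim 6 invite an obvious guard before any adjustment: drop joints whose score falls below a threshold. A sketch, with the 0.5 cutoff and the keypoint layout both assumed:

    def reliable_joints(keypoints, min_score=0.5):
        """keypoints: {joint_id: ((x, y), score)} -> positions of sufficiently confident joints."""
        return {jid: pos for jid, (pos, score) in keypoints.items() if score >= min_score}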
  7. The music generation method according to claim 1, wherein the music parameters comprise pitch, tempo, note duration, and register, and the sound effect parameters comprise loudness, delay time, left-right phase, and reverberation time.
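The parameters listed in claim 7 could be carried as a simple state record; the initial values below are placeholders chosen for illustration, not values specified by this application:

    from dataclasses import dataclass

    @dataclass
    class MusicParams:
        pitch: float = 60.0          # MIDI note number (middle C)
        tempo: float = 120.0         # beats per minute
        note_duration: float = 0.25  # fraction of a whole note
        register: int = 4            # octave index

    @dataclass
    class EffectParams:
        loudness: float = 0.8        # normalized gain, 0..1
        delay_time: float = 0.0      # seconds
        pan: float = 0.0             # left-right phase, -1 (left) to +1 (right)
        reverb_time: float = 1.2     # seconds

With the dictionary-based `adjust` helper sketched earlier, `vars(MusicParams())` yields a mutable parameter dictionary to start from.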
  8. A music generation apparatus, wherein the apparatus comprises:
    a first recognition module, configured to record an action video of a user with a camera unit, read a current video frame of the action video as a first target video frame, input the first target video frame into a pre-trained model, and recognize key part information of the user in the first target video frame, the key part information comprising the IDs and position coordinate values of a first type of human body joint parts and the IDs and position coordinate values of a second type of human body joint parts;
    a generation module, configured to, when the position coordinate values of the first type of human body joint parts and of the second type of human body joint parts in the first target video frame are recognized, control a playback unit to start and play music generated according to preset initial values of music parameters and sound effect parameters;
    a second recognition module, configured to take the reading time of the first target video frame as the time origin and, at preset time intervals, read the current video frame of the action video as a second target video frame, input the second target video frame into the pre-trained model, and recognize the IDs and position coordinate values of the first type of human body joint parts and of the second type of human body joint parts of the user in the second target video frame;
    an adjustment module, configured to adjust the music parameters according to a predetermined mapping table between the first type of human body joint parts and the music parameters, the change in the position coordinate values of the first type of human body joint parts in the second target video frame, and a first preset adjustment range table, adjust the sound effect parameters according to a predetermined mapping table between the second type of human body joint parts and the sound effect parameters, the change in the position coordinate values of the second type of human body joint parts in the second target video frame, and a second preset adjustment range table, and adjust the music according to the adjusted music parameters and sound effect parameters to generate new music.
  9. An electronic device, wherein the electronic device comprises a memory and a processor, the memory storing a music generation program executable on the processor, and the music generation program, when executed by the processor, implements the following steps:
    a first recognition step: recording an action video of a user with a camera unit, reading a current video frame of the action video as a first target video frame, inputting the first target video frame into a pre-trained model, and recognizing key part information of the user in the first target video frame, the key part information comprising the IDs and position coordinate values of a first type of human body joint parts and the IDs and position coordinate values of a second type of human body joint parts;
    a generation step: when the position coordinate values of the first type of human body joint parts and of the second type of human body joint parts in the first target video frame are recognized, controlling a playback unit to start and play music generated according to preset initial values of music parameters and sound effect parameters;
    a second recognition step: taking the reading time of the first target video frame as the time origin and, at preset time intervals, reading the current video frame of the action video as a second target video frame, inputting the second target video frame into the pre-trained model, and recognizing the IDs and position coordinate values of the first type of human body joint parts and of the second type of human body joint parts of the user in the second target video frame;
    an adjustment step: adjusting the music parameters according to a predetermined mapping table between the first type of human body joint parts and the music parameters, the change in the position coordinate values of the first type of human body joint parts in the second target video frame, and a first preset adjustment range table; adjusting the sound effect parameters according to a predetermined mapping table between the second type of human body joint parts and the sound effect parameters, the change in the position coordinate values of the second type of human body joint parts in the second target video frame, and a second preset adjustment range table; and adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
  10. The electronic device according to claim 9, wherein the adjustment step comprises:
    A1. taking the position coordinate values of the joint parts of the human body in the first target video frame as the initial position values of the respective joint parts;
    A2. calculating the change in the position coordinate values of the first type of human body joint parts in the second target video frame from their position coordinate values in the second target video frame and their initial position values;
    A3. calculating the change in the position coordinate values of the second type of human body joint parts in the second target video frame from their position coordinate values in the second target video frame and their initial position values;
    A4. determining the names of the music parameters to be adjusted according to the change in the position coordinate values of the first type of human body joint parts in the second target video frame and the predetermined mapping table between the first type of human body joint parts and the music parameters, and determining the names of the sound effect parameters to be adjusted according to the change in the position coordinate values of the second type of human body joint parts in the second target video frame and the predetermined mapping table between the second type of human body joint parts and the sound effect parameters;
    A5. adjusting the music parameters to be adjusted according to the change in the position coordinate values of the first type of human body joint parts in the second target video frame and the first preset adjustment range table, and adjusting the sound effect parameters to be adjusted according to the change in the position coordinate values of the second type of human body joint parts in the second target video frame and the second preset adjustment range table;
    A6. adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
  11. The electronic device according to claim 10, wherein the music generation program, when executed by the processor, further implements the following step:
    a stopping step: when a preset stop signal is received, controlling the playback unit to stop playing the music.
  12. The electronic device according to claim 9, wherein the first type of human body joint parts are the joint parts of the left half of the human body, and the second type of human body joint parts are the joint parts of the right half of the human body.
  13. The electronic device according to any one of claims 9 to 12, wherein the pre-trained model is a PoseNet model, and the training process of the PoseNet model comprises:
    B1. obtaining a preset number of human action picture samples, and dividing the picture samples into a training set of a first proportion and a validation set of a second proportion;
    B2. training the PoseNet model with the training set;
    B3. verifying the accuracy of the trained PoseNet model with the validation set, and ending the training if the accuracy is greater than or equal to a preset accuracy;
    B4. if the accuracy is less than the preset accuracy, increasing the preset number of human action picture samples by a preset percentage and returning to step B1.
  14. The electronic device according to claim 9, wherein the key part information further comprises confidence scores for the position accuracy of the first and second types of human body joint parts.
  15. The electronic device according to claim 9, wherein the music parameters comprise pitch, tempo, note duration, and register, and the sound effect parameters comprise loudness, delay time, left-right phase, and reverberation time.
  16. A computer-readable storage medium, wherein a music generation program is stored on the computer-readable storage medium, and the music generation program is executable by one or more processors to implement the following steps:
    a first recognition step: recording an action video of a user with a camera unit, reading a current video frame of the action video as a first target video frame, inputting the first target video frame into a pre-trained model, and recognizing key part information of the user in the first target video frame, the key part information comprising the IDs and position coordinate values of a first type of human body joint parts and the IDs and position coordinate values of a second type of human body joint parts;
    a generation step: when the position coordinate values of the first type of human body joint parts and of the second type of human body joint parts in the first target video frame are recognized, controlling a playback unit to start and play music generated according to preset initial values of music parameters and sound effect parameters;
    a second recognition step: taking the reading time of the first target video frame as the time origin and, at preset time intervals, reading the current video frame of the action video as a second target video frame, inputting the second target video frame into the pre-trained model, and recognizing the IDs and position coordinate values of the first type of human body joint parts and of the second type of human body joint parts of the user in the second target video frame;
    an adjustment step: adjusting the music parameters according to a predetermined mapping table between the first type of human body joint parts and the music parameters, the change in the position coordinate values of the first type of human body joint parts in the second target video frame, and a first preset adjustment range table; adjusting the sound effect parameters according to a predetermined mapping table between the second type of human body joint parts and the sound effect parameters, the change in the position coordinate values of the second type of human body joint parts in the second target video frame, and a second preset adjustment range table; and adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
  17. The computer-readable storage medium according to claim 16, wherein the adjustment step comprises:
    A1. taking the position coordinate values of the joint parts of the human body in the first target video frame as the initial position values of the respective joint parts;
    A2. calculating the change in the position coordinate values of the first type of human body joint parts in the second target video frame from their position coordinate values in the second target video frame and their initial position values;
    A3. calculating the change in the position coordinate values of the second type of human body joint parts in the second target video frame from their position coordinate values in the second target video frame and their initial position values;
    A4. determining the names of the music parameters to be adjusted according to the change in the position coordinate values of the first type of human body joint parts in the second target video frame and the predetermined mapping table between the first type of human body joint parts and the music parameters, and determining the names of the sound effect parameters to be adjusted according to the change in the position coordinate values of the second type of human body joint parts in the second target video frame and the predetermined mapping table between the second type of human body joint parts and the sound effect parameters;
    A5. adjusting the music parameters to be adjusted according to the change in the position coordinate values of the first type of human body joint parts in the second target video frame and the first preset adjustment range table, and adjusting the sound effect parameters to be adjusted according to the change in the position coordinate values of the second type of human body joint parts in the second target video frame and the second preset adjustment range table;
    A6. adjusting the music according to the adjusted music parameters and sound effect parameters to generate new music.
  18. The computer-readable storage medium according to claim 17, wherein the music generation program, when executed by one or more processors, further implements the following step:
    a stopping step: when a preset stop signal is received, controlling the playback unit to stop playing the music.
  19. The computer-readable storage medium according to claim 16, wherein the first type of human body joint parts are the joint parts of the left half of the human body, and the second type of human body joint parts are the joint parts of the right half of the human body.
  20. The computer-readable storage medium according to any one of claims 16 to 19, wherein the pre-trained model is a PoseNet model, and the training process of the PoseNet model comprises:
    B1. obtaining a preset number of human action picture samples, and dividing the picture samples into a training set of a first proportion and a validation set of a second proportion;
    B2. training the PoseNet model with the training set;
    B3. verifying the accuracy of the trained PoseNet model with the validation set, and ending the training if the accuracy is greater than or equal to a preset accuracy;
    B4. if the accuracy is less than the preset accuracy, increasing the preset number of human action picture samples by a preset percentage and returning to step B1.
PCT/CN2020/119078 2019-10-12 2020-09-29 Music generation method and apparatus, electronic device and computer-readable storage medium WO2021068812A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910969868.5A CN110827789B (en) 2019-10-12 2019-10-12 Music generation method, electronic device and computer readable storage medium
CN201910969868.5 2019-10-12

Publications (1)

Publication Number Publication Date
WO2021068812A1 (en) 2021-04-15

Family

ID=69549173

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119078 WO2021068812A1 (en) 2019-10-12 2020-09-29 Music generation method and apparatus, electronic device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN110827789B (en)
WO (1) WO2021068812A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827789B (en) * 2019-10-12 2023-05-23 平安科技(深圳)有限公司 Music generation method, electronic device and computer readable storage medium
CN112380362A (en) 2020-10-27 2021-02-19 脸萌有限公司 Music playing method, device and equipment based on user interaction and storage medium
CN115881064A (en) * 2021-09-28 2023-03-31 北京字跳网络技术有限公司 Music generation method, device, equipment, storage medium and program

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050190199A1 (en) * 2001-12-21 2005-09-01 Hartwell Brown Apparatus and method for identifying and simultaneously displaying images of musical notes in music and producing the music
US9972339B1 (en) * 2016-08-04 2018-05-15 Amazon Technologies, Inc. Neural network based beam selection
CN109325933B (en) * 2017-07-28 2022-06-21 阿里巴巴集团控股有限公司 Method and device for recognizing copied image

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005328236A (en) * 2004-05-13 2005-11-24 Nippon Telegr & Teleph Corp <Ntt> Video monitoring method, device, and program
CN108053815A (en) * 2017-12-12 2018-05-18 广州德科投资咨询有限公司 The performance control method and robot of a kind of robot
CN108415764A (en) * 2018-02-13 2018-08-17 广东欧珀移动通信有限公司 Electronic device, game background music matching process and Related product
CN109102787A (en) * 2018-09-07 2018-12-28 温州市动宠商贸有限公司 A kind of simple background music automatically creates system
CN109413351A (en) * 2018-10-26 2019-03-01 平安科技(深圳)有限公司 A kind of music generating method and device
CN109618183A (en) * 2018-11-29 2019-04-12 北京字节跳动网络技术有限公司 A kind of special video effect adding method, device, terminal device and storage medium
CN110827789A (en) * 2019-10-12 2020-02-21 平安科技(深圳)有限公司 Music generation method, electronic device and computer-readable storage medium

Also Published As

Publication number Publication date
CN110827789B (en) 2023-05-23
CN110827789A (en) 2020-02-21

Similar Documents

Publication Publication Date Title
WO2021068812A1 (en) Music generation method and apparatus, electronic device and computer-readable storage medium
CN108615055B (en) Similarity calculation method and device and computer readable storage medium
US9489934B2 (en) Method for selecting music based on face recognition, music selecting system and electronic apparatus
US11205408B2 (en) Method and system for musical communication
US11007445B2 (en) Techniques for curation of video game clips
WO2016169432A1 (en) Identity authentication method and device, and terminal
CN109785820A (en) A kind of processing method, device and equipment
US20220061695A1 (en) Entrainment sonification techniques
CN106951881B (en) Three-dimensional scene presenting method, device and system
US11947789B2 (en) Interactive control method and apparatus, storage medium, and electronic device
US11219815B2 (en) Physiological response management using computer-implemented activities
WO2020244074A1 (en) Expression interaction method and apparatus, computer device, and readable storage medium
CN111836110B (en) Method and device for displaying game video, electronic equipment and storage medium
CN114053688A (en) Online body feeling fighting dance method and device, computer equipment and storage medium
WO2021134078A1 (en) Workout-training method
US20190164444A1 (en) Assessing a level of comprehension of a virtual lecture
JP2024513001A (en) Artificial intelligence to capture facial expressions and generate mesh data
KR102021700B1 (en) System and method for rehabilitate language disorder custermized patient based on internet of things
CN112752149A (en) Live broadcast method, device, terminal and storage medium
WO2020045042A1 (en) Mental control system, mental control method, and program
US10564608B2 (en) Eliciting user interaction with a stimulus through a computing platform
WO2019062600A1 (en) Movement control method, device and system for massager
JP5947438B1 (en) Performance technology drawing evaluation system
US20220321772A1 (en) Camera Control Method and Apparatus, and Terminal Device
JP2022550396A (en) language teaching machine

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20874540

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20874540

Country of ref document: EP

Kind code of ref document: A1