WO2023000917A1 - Method for generating music file, generating device, electronic device and storage medium - Google Patents

Method for generating music file, generating device, electronic device and storage medium

Info

Publication number
WO2023000917A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
salient
music
generating
target
Prior art date
Application number
PCT/CN2022/100969
Other languages
English (en)
French (fr)
Inventor
薛愉凡
郭冠军
袁欣
陈月朝
黄昊
李娜
周栩彬
Original Assignee
北京字跳网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Priority to EP22845082.1A (published as EP4339809A1)
Publication of WO2023000917A1
Priority to US18/545,825 (published as US20240127777A1)

Classifications

    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06F 16/44: Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data; browsing; visualisation therefor
    • G06F 16/48: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/483: Retrieval using metadata automatically derived from the content
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/00: Image analysis
    • G06T 7/13: Segmentation; edge detection
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/764: Image or video recognition or understanding using classification, e.g. of video objects
    • G10G 1/02: Chord or note indicators, fixed or adjustable, for keyboard or fingerboards
    • G10H 1/00: Details of electrophonic musical instruments
    • G10H 1/0008: Associated control or indicating means
    • G10H 1/0025: Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H 1/0066: Transmission between separate instruments or between individual components of a musical system using a MIDI interface
    • G10H 1/361: Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H 1/368: Accompaniment systems displaying animated or moving pictures synchronized with the music or audio part

Definitions

  • The present application belongs to the field of computer technology, and in particular relates to a method for generating a music file, a generating device, an electronic device and a storage medium.
  • Music creation has a relatively high threshold, and it is difficult for ordinary users to participate in it.
  • Moreover, the music created in this way is generally regarded as a purely auditory art: the music establishes a connection with the listener's sense of hearing, but not with the most important human sense, vision, resulting in a single-channel user experience in the process of creating music.
  • The purpose of the embodiments of the present application is to provide a method for generating a music file, a generating device, an electronic device and a storage medium, which can generate music based on visualized images and give users a unique dual experience of hearing and vision.
  • In a first aspect, the embodiment of the present application provides a method for generating a music file, including: acquiring a first image; performing feature extraction on the first image to obtain salient features of the first image;
  • based on the position of the salient features in the first image, the salient features are mapped to the musical instrument digital interface (MIDI) information coordinate system, and the MIDI information corresponding to the salient features is determined, where the MIDI information coordinate system is used to indicate the correspondence between the MIDI information and time;
  • the music file is generated based on the correspondence between the MIDI information and time.
  • The embodiment of the present application further provides a device for generating music files, including:
  • an acquisition module, configured to acquire the first image;
  • an extraction module, configured to perform feature extraction on the first image to obtain salient features of the first image;
  • a processing module, configured to map the salient features to the MIDI information coordinate system based on the position of the salient features in the first image, and to determine the MIDI information corresponding to the salient features, where the MIDI information coordinate system is used to indicate the correspondence between MIDI information and time;
  • a generation module, configured to generate music files based on the correspondence between the MIDI information and time.
  • The embodiment of the present application further provides an electronic device, including a processor, a memory, and a program or instruction stored in the memory and executable on the processor; when the program or instruction is executed by the processor, the steps of the method in the first aspect are implemented.
  • An embodiment of the present application further provides a readable storage medium, on which a program or an instruction is stored; when the program or instruction is executed by a processor, the steps of the method in the first aspect are implemented.
  • The embodiment of the present application further provides a chip, the chip including a processor and a communication interface, the communication interface being coupled to the processor, and the processor being used to run programs or instructions to implement the steps of the method in the first aspect.
  • In the embodiment of the present application, image information such as a photo or video, that is, the above-mentioned first image, is converted into a visualized electronic score file by processing the image; specifically, audio track blocks are displayed in the musical instrument digital interface (Musical Instrument Digital Interface, MIDI) coordinate system, where these audio track blocks constitute the salient features of the first image, that is, the pattern formed by the audio track blocks matches the salient features of the first image.
  • These track blocks all include musical instrument digital interface information, that is, MIDI information. After the MIDI information is recognized by the computer, the track blocks are played in chronological order according to the correspondence between MIDI information and time, thereby forming music.
  • In this way, the embodiment of this application constructs music from images, so that the formed music matches the images containing the user's memories.
  • On the one hand, the threshold for music creation is lowered, so that "novice" users without music theory knowledge can also construct corresponding music from pictures.
  • On the other hand, the music displays audio track blocks through the MIDI information coordinate system, making the final music visible as well as audible and giving users a unique dual experience of hearing and vision.
  • Fig. 1 shows the first flowchart of the method for generating a music file according to an embodiment of the present application;
  • Fig. 2 shows a schematic interface diagram of the MIDI information coordinate system in the method for generating a music file according to an embodiment of the present application;
  • Fig. 3 shows a schematic diagram of a salient target texture map in the method for generating a music file according to an embodiment of the present application;
  • Fig. 4 shows the second flowchart of the method for generating a music file according to an embodiment of the present application;
  • Fig. 5 shows a schematic diagram of the division of the target texture map in the method for generating a music file according to an embodiment of the present application;
  • Fig. 6 shows the third flowchart of the method for generating a music file according to an embodiment of the present application;
  • Fig. 7 shows a schematic diagram of a piano roll graphical interface in the method for generating a music file according to an embodiment of the present application;
  • Fig. 8 shows a structural block diagram of a device for generating music files according to an embodiment of the present application;
  • Fig. 9 shows a structural block diagram of an electronic device according to an embodiment of the present application;
  • Fig. 10 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • FIG. 1 shows the first flowchart of a method for generating a music file according to an embodiment of the present application. As shown in FIG. 1, the method includes:
  • Step 102: acquiring a first image;
  • Step 104: performing feature extraction on the first image to obtain salient features of the first image;
  • Step 106: based on the position of the salient features in the first image, mapping the salient features to the musical instrument digital interface (MIDI) information coordinate system, and determining the MIDI information corresponding to the salient features, where the MIDI information coordinate system is used to indicate the correspondence between the MIDI information and time;
  • Step 108: generating a music file based on the correspondence between the MIDI information and time.
  • the first image is specifically the "memory image" selected by the user.
  • The user can obtain the first image by uploading a photo or video saved locally to the client, or by taking a photo or recording a video with the camera of an electronic device such as a mobile phone.
  • the first image can be obtained by extracting frames from the video.
  • a frame may be randomly extracted from the video, or the content of the video may be identified through a neural network model, so as to determine an image frame that can reflect the theme of the video for extraction.
  • acquiring the first image specifically includes: receiving a third input, where the third input is an input for selecting the first image; and determining the first image in response to the third input.
  • In other embodiments, acquiring the first image specifically includes: receiving a fourth input, where the fourth input is an input for shooting a video; in response to the fourth input, shooting the video to be processed; and performing frame extraction processing on the video to be processed to obtain the first image.
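  • As an illustrative sketch of this frame-extraction variant (assuming Python with OpenCV and the random-frame strategy mentioned above; neural-network-based selection of a theme frame is omitted, and the function name is hypothetical):

```python
# Hypothetical sketch: obtain the first image by extracting a random frame
# from the video to be processed. OpenCV is an assumed library choice.
import random
import cv2

def extract_first_image(video_path: str):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total <= 0:
        cap.release()
        raise ValueError("video contains no readable frames")
    cap.set(cv2.CAP_PROP_POS_FRAMES, random.randrange(total))  # seek to a random frame
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise ValueError("failed to decode the selected frame")
    return frame  # BGR pixel array used as the "first image"
```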
  • After the first image is obtained, feature extraction is further performed on it, so that the salient features of the first image are extracted. For example, if the first image is a "face" picture, the salient features of the first image are the outline of the face, the positions of the facial features, and the like. If the first image is a full-body or half-body "portrait" picture, the salient features of the first image are the silhouette, posture, and the like of the person in it.
  • If the first image is a picture of an animal or a child, the salient features of the first image may be the animal's or child's body shape and facial features. If the first image is a "still" object (stationary object) such as a building, a vehicle or a landscape, the salient features of the first image may be the overall appearance and prominent structures of these still objects.
  • After the salient features are extracted, based on their position in the first image, the salient features are mapped into the MIDI information coordinate system, so that each image unit of the salient features is formed as a track block in the MIDI information coordinate system.
  • the MIDI information coordinate system is used to indicate the corresponding relationship between MIDI information and time, that is, the relationship between MIDI information corresponding to a track block and time.
  • Fig. 2 shows a schematic interface diagram of the MIDI information coordinate system of the method for generating music files according to the embodiment of the present application.
  • In Fig. 2, the first image is specifically a face image, and the salient features of the face image are mapped to a plurality of audio track blocks 202 in the MIDI information coordinate system 200.
  • The plurality of audio track blocks 202 form a shape similar to a human face in the MIDI information coordinate system, and this face shape corresponds to the salient features of the first image.
  • These audio track blocks corresponding to the salient features carry musical instrument digital interface information, that is, MIDI information.
  • MIDI information is information that can be recognized by a computer device and played back as "sound".
  • After the MIDI information is recognized, digital signals corresponding to information such as pitch, timbre and volume are obtained, thereby forming musical motives; according to the correspondence between the salient features and time, that is, the correspondence between these musical motives and time, the "sounds" corresponding to the motives are played in sequence, thereby forming a piece of music, which is unique music generated from the "memory image" selected by the user, that is, the first image.
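  • As a toy illustration of this correspondence between MIDI information and time (the record name and fields below are illustrative assumptions, not the patented format), a track block can be modeled as a record carrying its MIDI attributes plus a time position, and playback simply visits the blocks in chronological order:

```python
# Toy model of a track block: MIDI information (pitch, timbre, volume)
# plus the time position given by the MIDI information coordinate system.
from dataclasses import dataclass

@dataclass
class TrackBlock:
    start: float   # onset time derived from the block's abscissa
    pitch: int     # MIDI pitch derived from the block's ordinate
    velocity: int  # volume
    program: int   # timbre (instrument number)

def play_order(blocks: list[TrackBlock]) -> list[TrackBlock]:
    # The coordinate system fixes each block's time, so forming the piece
    # of music is just sounding the blocks sorted by their start times.
    return sorted(blocks, key=lambda b: b.start)
```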
  • In this way, the embodiment of this application constructs music from images, so that the formed music matches the images containing the user's memories.
  • On the one hand, the threshold for music creation is lowered, so that "novice" users without music theory knowledge can also construct corresponding music from pictures.
  • On the other hand, the music displays audio track blocks through the MIDI information coordinate system, making the final music visible as well as audible and giving users a unique dual experience of hearing and vision.
  • the image content of the first image includes salient objects
  • the salient features include at least one of the following: key points of the salient objects, and edge feature points of the salient objects.
  • the salient object is the main object in the image content of the first image.
  • the salient object is the "human face”.
  • the salient object is the "building".
  • The salient features specifically include the key points of the salient objects: for a human face, the key points are the facial features (the "five sense organs"); for a building, the key points are the characteristic designs of the building, such as "windows" and "doorways".
  • Salient features may also include edge feature points of salient objects, and these edge feature points will form contours of salient objects, such as human face contours or building contours.
  • a "simplified map" of the salient objects can be formed, through which the viewer can be associated with the original image.
  • the subject being photographed such as “someone” or "a certain building,” evokes memories in the viewer.
  • In some embodiments, performing feature extraction on the first image to obtain the salient features of the first image includes: performing object segmentation on the first image through a convolutional neural network to obtain the salient objects in the first image and the edge feature points of the salient objects; and extracting the key points of the salient objects to obtain the key points of the salient objects.
  • When performing feature extraction on the first image, the first image may first be segmented through a pre-trained convolutional neural network.
  • The object of the segmentation is to segment out the salient objects in the first image.
  • Specifically, a preset convolutional neural network can be trained with a large number of pre-labeled training sets, so that the trained convolutional neural network can identify salient objects in pictures. For example, for portrait pictures, a training set can be built from a large number of original face pictures paired with salient target pictures that contain only the cut-out "face" part. The convolutional neural network is trained on this set and iterates continuously; when it can identify the salient target and the edges of the salient target in a picture with sufficient accuracy, it is judged ready to be put into use.
  • The convolutional neural network trained in this way is used to perform artificial-intelligence recognition on the first image, thereby identifying the salient objects and the edges of the salient objects, and obtaining the edge feature points of the salient objects.
  • At the same time, the specific type of the salient object is identified, such as "face", "animal" or "building", so that the corresponding key point extraction granularity is determined according to the type, and the key points of the salient object, such as the facial features of a face, are extracted at that granularity.
  • The application extracts the salient features of the salient objects in the first image, specifically the key points and edge feature points of the salient objects, through the trained convolutional neural network, which obtains the salient features quickly and accurately, thereby improving the processing speed of generating music from images and helping to improve the user experience.
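  • A rough sketch of this extraction step is shown below. The embodiment uses a trained convolutional neural network; here classical OpenCV operators stand in for it (an assumption made purely for illustration), playing the same roles of key-point and edge-feature-point extraction:

```python
# Illustrative stand-in for CNN-based salient feature extraction:
# corner detection supplies "key points" and Canny supplies "edge feature points".
import cv2
import numpy as np

def extract_salient_features(first_image: np.ndarray):
    gray = cv2.cvtColor(first_image, cv2.COLOR_BGR2GRAY)
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=100,
                                      qualityLevel=0.01, minDistance=8)
    key_points = (corners.reshape(-1, 2).astype(int)
                  if corners is not None else np.empty((0, 2), dtype=int))
    edges = cv2.Canny(gray, 100, 200)
    edge_points = np.column_stack(np.nonzero(edges))  # (row, col) of edge pixels
    return key_points, edge_points  # key points are (x, y) pairs
```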
  • In some embodiments, before the salient features are mapped into the MIDI information coordinate system based on the positions of the salient features in the first image, the method for generating a music file further includes: generating a salient target texture map corresponding to the first image according to the salient features; and determining the position of the salient features in the first image according to the salient target texture map.
  • The salient object texture map is an image that shows only the salient features of the salient object in the first image.
  • It includes only two types of pixels: the first type are pixels used to display salient features, and the second type are pixels at non-salient-feature positions.
  • Fig. 3 shows a schematic diagram of a salient object texture map according to a method for generating a music file according to an embodiment of the present application.
  • In Fig. 3, the first image is a human face image, and the salient object therein is a human face.
  • The salient object texture map looks like a sketch of the human face.
  • Since the salient target texture map is an image obtained by processing the first image so that only the salient features are displayed, the salient features can be determined according to the salient target texture map and then mapped into the MIDI information coordinate system, realizing the conversion from image to MIDI electronic score and finally to music, that is, "from image to music", and giving users a unique experience.
  • FIG. 4 shows the second flowchart of the method for generating a music file according to the embodiment of the present application.
  • As shown in Fig. 4, generating the salient target texture map corresponding to the first image includes the following steps:
  • Step 402: performing edge detection on the first image according to the edge feature points and the Canny edge detection algorithm to obtain the edge image of the salient target;
  • Step 404: generating a salient object map corresponding to the salient object according to the key points and edge feature points;
  • Step 406: performing image superposition on the edge image and the salient object map to obtain the salient target texture map corresponding to the first image.
  • Specifically, after the edge feature points of the salient target are obtained, edge detection is performed with the Canny edge detection algorithm according to the edge feature points.
  • The Canny edge detection algorithm is a multi-stage edge detection algorithm developed by John F. Canny in 1986.
  • When performing edge detection, the first image is first subjected to Gaussian filtering, in which a Gaussian kernel is used to take the weighted average of each pixel and its neighborhood as the gray value of that pixel. Then the gradient magnitude and gradient direction are calculated and non-maximum suppression is applied, and finally edge detection is performed with a set threshold range to obtain the edge image of the salient target.
  • At the same time, according to the key points and the edge feature points, a salient object map corresponding to the salient object is generated, that is, a feature map formed by the key points and the edge feature points.
  • The edge image and the salient object map are then superimposed, which is equivalent to drawing the key points and the contour together, finally obtaining a salient target texture map with clear contours.
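  • A minimal sketch of steps 402 to 406, assuming OpenCV (the kernel size and thresholds below are illustrative choices, not values from the patent):

```python
# Gaussian filtering + Canny edge detection (step 402), then superimposing
# the key points onto the edge image (steps 404-406) to obtain the salient
# target texture map, a binary image of bright (1) and dark (0) pixels.
import cv2
import numpy as np

def salient_texture_map(gray: np.ndarray, key_points: np.ndarray) -> np.ndarray:
    blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)  # weighted neighborhood average
    edges = cv2.Canny(blurred, 100, 200)           # edge image of the salient target
    texture = (edges > 0).astype(np.uint8)         # bright pixels mark salient features
    for x, y in key_points:                        # draw each key point into the map
        cv2.circle(texture, (int(x), int(y)), 2, 1, thickness=-1)
    return texture
```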
  • In some embodiments, determining the position of the salient features in the first image includes:
  • dividing the salient target texture map into X multiplied by Y graphic units, where X and Y are both integers greater than 1, each graphic unit includes at least one of bright pixels and dark pixels, a bright pixel is a pixel with a brightness value of 1, and a dark pixel is a pixel with a brightness value of 0;
  • among the X multiplied by Y graphic units, determining the target graphic units in which the quantity ratio of bright pixels is greater than a preset ratio, to obtain N target graphic units, where the number of salient features of the first image is N, the N target graphic units are in one-to-one correspondence with the N salient features, and N is a positive integer;
  • determining the position of the salient features in the first image according to the positions of the target graphic units.
  • Specifically, the target texture map is divided into X rows and Y columns to obtain an X multiplied by Y graphic matrix, which includes X multiplied by Y graphic units.
  • each graphics unit there are multiple pixels, including bright pixels and dark pixels.
  • Bright pixels are pixels used to display salient features, and their brightness value is 1.
  • Dark pixels are pixels outside the salient features; their brightness value is 0, which means "pure black" is displayed.
  • After the division, the proportion of bright pixels in each graphic unit is determined. For example, if a graphic unit contains 10 pixels, of which 6 are bright and 4 are dark, the bright-pixel ratio of that graphic unit is 0.6.
  • It is then determined whether the bright-pixel ratio of each graphic unit is greater than a preset ratio, where the preset ratio is greater than or equal to 0.2, preferably 0.4.
  • Taking the preset ratio of 0.4 as an example, if 4 or more of the 10 pixels in a graphic unit are bright pixels, the graphic unit is marked as a target graphic unit, indicating that it carries a salient feature.
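  • A sketch of this grid test, assuming the binary texture map from the previous step (0.4 is the preferred preset ratio named above; the function name is illustrative):

```python
# Divide the texture map into X rows and Y columns and mark as target
# graphic units those whose bright-pixel ratio exceeds the preset ratio.
import numpy as np

def find_target_units(texture: np.ndarray, X: int, Y: int,
                      preset_ratio: float = 0.4) -> list[tuple[int, int]]:
    h, w = texture.shape
    targets = []  # (row, column) of each target graphic unit / salient feature
    for row in range(X):
        for col in range(Y):
            cell = texture[row * h // X:(row + 1) * h // X,
                           col * w // Y:(col + 1) * w // Y]
            # mean of a 0/1 cell is exactly its bright-pixel ratio
            if cell.size > 0 and cell.mean() > preset_ratio:
                targets.append((row, col))
    return targets
```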
  • FIG. 5 shows a schematic diagram of the division of the target texture map in the method for generating a music file according to the embodiment of the present application. As shown in Fig. 5, among the divided graphic units, the one filled with hatching is a target graphic unit 504, that is, a unit carrying a salient feature.
  • For example, if the target graphic unit 506 is located in the fourth column and the second row, it can be determined that, for the salient feature corresponding to the graphic unit 506, the first abscissa in the first image is 4x and the first ordinate is 2y.
  • In the embodiment of the present application, the target graphic units are determined according to the bright-pixel ratio in the divided X multiplied by Y graphic units, and the target graphic units, taken as salient features, are mapped into the MIDI information coordinate system. This realizes the transformation from image to MIDI electronic score and then from image to music; at the same time the music is visualized, giving users a dual experience of hearing and vision.
  • In some embodiments, mapping the salient features to the MIDI information coordinate system includes:
  • mapping the N salient features to the MIDI information coordinate system to obtain N audio track blocks corresponding one-to-one to the N salient features.
  • Specifically, the first abscissa and the first ordinate of the salient features obtained above can be synchronously converted into the second abscissa and the second ordinate in the MIDI information coordinate system, so as to realize the mapping of the salient features into the MIDI information coordinate system.
  • The N salient features are mapped to the MIDI information coordinate system to obtain N track blocks corresponding to the N salient features, and after the N track blocks are processed through the musical instrument digital interface program, visualized music can be obtained.
  • On the one hand, the image features of the salient objects in the first image are preserved; on the other hand, unique music corresponding to the salient objects in the first image can be generated.
  • The MIDI information coordinate system is used to indicate the correspondence between MIDI information and time. Therefore, from a salient feature, that is, from the coordinates of an audio track block in the MIDI information coordinate system, the MIDI information and time information of that track block can be determined. After a computer program recognizes the MIDI information and time information of the track block, it can convert it into a musical motive.
  • Such a musical motive has sound attributes such as timbre, pitch and volume, and also has the time attribute of beat. The multiple audio track blocks corresponding to the multiple salient features are played according to their MIDI information and time information, finally yielding the music converted from the first image, that is, music that matches the user's "memory image", satisfying the user's demand for unique music creation.
  • In some embodiments, the audio track block includes the musical instrument digital interface information, and the MIDI information is determined according to the second ordinate corresponding to the audio track block, where the MIDI information includes at least one of the following: pitch, timbre, volume.
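  • The sketch below shows one way to realize this mapping with the pretty_midi library (an assumed choice; the patent only requires that each track block carry MIDI information plus a time position). The column index becomes the second abscissa (onset time) and the row index becomes the second ordinate (pitch); the tempo and base pitch are illustrative parameters:

```python
# Convert target graphic units into audio track blocks and build a MIDI file.
import pretty_midi

def units_to_midi(targets, X: int, seconds_per_column: float = 0.25,
                  base_pitch: int = 48) -> pretty_midi.PrettyMIDI:
    pm = pretty_midi.PrettyMIDI()
    inst = pretty_midi.Instrument(program=0)  # program 0 = acoustic grand piano
    for row, col in targets:
        pitch = min(base_pitch + (X - 1 - row), 127)  # higher in the image = higher pitch
        start = col * seconds_per_column              # time from the second abscissa
        inst.notes.append(pretty_midi.Note(velocity=90, pitch=pitch,
                                           start=start,
                                           end=start + seconds_per_column))
    pm.instruments.append(inst)
    return pm  # e.g. pm.write("memory.mid") produces the playable music file
```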
  • FIG. 6 shows the third flowchart of the method for generating music files according to the embodiment of the present application. As shown in FIG. 6, the method also includes:
  • Step 602: receiving a first input, where the first input is an input for selecting a preset music feature.
  • In this step, the first input is a user input received through a human-computer interaction component, and includes one of: touch input, biometric input, click input, somatosensory input, voice input, keyboard input or press input, where touch input includes but is not limited to touches, slides or specific touch gestures; biometric input includes but is not limited to fingerprint, iris, voiceprint or facial-recognition input; click input includes but is not limited to mouse clicks, switch clicks, etc.; somatosensory input includes but is not limited to shaking or flipping the electronic device; press input includes but is not limited to presses on the touch screen, the frame, the back cover or other parts of the electronic device.
  • Step 604: in response to the first input, determining the target music features, where the target music features include at least one of the following: music style, music mood, music genre;
  • Step 606: adjusting the music according to the target music features;
  • Step 608: playing the music file.
  • In the embodiment of the present application, the user can choose a target music feature among a plurality of preset music features, so that the music generated according to the first image can be adjusted accordingly.
  • Specifically, the target music features include music style, such as pop music, classical music or electronic music; music mood, such as passionate, deep or soothing; and music genre, such as rock, jazz or blues.
  • According to the target music feature selected by the user, the music generated from the first image is adjusted so that it better matches the selected feature. For example, if the user selects classical, soothing and blues, the volume of the middle and low frequencies is adjusted, and the time interval of the second abscissa is adjusted so that the rhythm of the music becomes slower and more soothing.
  • In addition, further post-processing can be performed on the second ordinate of the track blocks in the MIDI coordinate system according to preset music theory data and acoustic data.
  • For example, a key can be set in advance and the range of the highest and lowest scales specified. If the pitch of a track block within a certain period exceeds this range, the pitch of the track block is adjusted according to certain adjustment rules, that is, out-of-range tones are adjusted into the range: the pitch of a track block higher than the highest scale threshold is lowered by one octave, and the pitch of a track block lower than the lowest scale threshold is raised by one octave, so that the adjusted music better conforms to music theory.
  • After the music adjustment is completed, the adjusted music can be played automatically, so that the user can immediately enjoy the music generated from the "memory photo" they selected and experience the joy of music creation.
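  • A minimal sketch of this adjustment rule (the range thresholds are illustrative; in MIDI pitch numbers, one octave is 12 semitones):

```python
# Fold out-of-range pitches back into the preset scale range by whole octaves,
# i.e. adjust out-of-range tones into in-range tones as described above.
def fold_into_range(pitch: int, lowest: int = 48, highest: int = 84) -> int:
    while pitch > highest:
        pitch -= 12  # lower an over-range track block by one octave
    while pitch < lowest:
        pitch += 12  # raise an under-range track block by one octave
    return pitch
```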
  • the method for generating a music file further includes: generating a second image corresponding to the music;
  • Playing the music file includes: displaying the second image and playing the music file.
  • a second image corresponding to the music file to be played may also be generated, and the second image is displayed while the music file is played, so that the user can experience visual and auditory enjoyment at the same time.
  • The second image may be a static picture generated according to the first image selected by the user, or the salient target texture map corresponding to the first image, and the static picture and the playing progress of the music are displayed when the music file is played.
  • The second image may also be an animation file generated according to a preset template or according to the playback interface of the MIDI information coordinate system; the animation duration matches the duration of the generated music, and the animation is played together with the music file, further enhancing the user's visual experience.
  • In some embodiments, the second image is generated according to a target video template and the salient target texture map.
  • By receiving the user's second input, the target video template selected by the second input and the salient target texture map corresponding to the first image can be used to generate the second image, which serves as the background image when the music is played.
  • The video template may be a coherent animation template, or a "slideshow" in which multiple static pictures are displayed in turn.
  • The salient target texture map corresponding to the first image is superimposed and displayed on it, so that when the user sees the second image, the memory of the first image is recalled and the user experience is improved.
  • The second input is a user input received through the human-computer interaction component, and includes one of: touch input, biometric input, click input, somatosensory input, voice input, keyboard input or press input, where touch input includes but is not limited to touches, slides or specific touch gestures; biometric input includes but is not limited to fingerprint, iris, voiceprint or facial-recognition input; click input includes but is not limited to mouse clicks, switch clicks, etc.; somatosensory input includes but is not limited to shaking or flipping the electronic device; press input includes but is not limited to presses on the touch screen, the frame, the back cover or other parts of the electronic device.
  • the embodiment of the present application does not limit the specific form of the second input.
  • In some embodiments, generating the second image corresponding to the music file includes:
  • generating the second image based on a target animation and the salient target texture map.
  • Specifically, the target animation is generated through the piano roll graphical interface, where the target animation shows the process of playing the audio track blocks of the MIDI file in the piano roll graphical interface.
  • FIG. 7 shows a schematic diagram of the piano roll graphical interface in the method for generating a music file according to an embodiment of the present application, in which the keys 702 of an animated piano are on the left side, and each track block 704 in the interface gradually moves toward the keys 702 on the left according to its corresponding time information.
  • The background of the interface is set according to the salient target texture map corresponding to the first image and used as the background image of the second image, so that an explicit visual connection is established between the second image and the first image.
  • While listening to the music, the user watches the second image associated with the "memory image", which arouses the user's memories and enriches the user's visual experience.
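  • For illustration, a piano-roll style view of the generated file can be rendered offline with pretty_midi and matplotlib (both assumed library choices; this static plot only approximates the animated interface of Fig. 7):

```python
# Rasterize the generated MIDI file into a (128 x T) piano-roll matrix and
# display it: pitch on the vertical axis, time on the horizontal axis.
import matplotlib.pyplot as plt
import pretty_midi

pm = pretty_midi.PrettyMIDI("memory.mid")  # the file generated earlier
roll = pm.get_piano_roll(fs=100)           # columns are 10 ms time steps
plt.imshow(roll, aspect="auto", origin="lower", cmap="gray_r")
plt.xlabel("time (1/100 s)")
plt.ylabel("MIDI pitch")
plt.title("Piano-roll view of the generated music")
plt.show()
```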
  • The embodiment of the present application further provides a device for generating music files. As shown in Fig. 8, the device includes:
  • an acquisition module 802, configured to acquire a first image;
  • an extraction module 804, configured to perform feature extraction on the first image to obtain salient features of the first image;
  • a processing module 806, configured to map the salient features to the MIDI information coordinate system based on the position of the salient features in the first image, and to determine the MIDI information corresponding to the salient features, where the MIDI information coordinate system is used to indicate the correspondence between MIDI information and time;
  • a generating module 808, configured to generate music files based on the correspondence between the MIDI information and time.
  • the first image is specifically the "memory image" selected by the user.
  • The user can obtain the first image by uploading a photo or video saved locally to the client, or by taking a photo or recording a video with the camera of an electronic device such as a mobile phone.
  • the first image can be obtained by extracting frames from the video.
  • a frame may be randomly extracted from the video, or the content of the video may be identified through a neural network model, so as to determine an image frame that can reflect the theme of the video for extraction.
  • acquiring the first image specifically includes: receiving a third input, wherein the third input is an input for selecting the first image; in response to the third input, determining the first image.
  • In other embodiments, acquiring the first image specifically includes: receiving a fourth input, where the fourth input is an input for shooting a video; in response to the fourth input, shooting the video to be processed; and performing frame extraction processing on the video to be processed to obtain the first image.
  • After the first image is obtained, feature extraction is further performed on it, so that the salient features of the first image are extracted. For example, if the first image is a "face" picture, the salient features of the first image are the outline of the face, the positions of the facial features, and the like. If the first image is a full-body or half-body "portrait" picture, the salient features of the first image are the silhouette, posture, and the like of the person in it.
  • If the first image is a picture of an animal or a child, the salient features of the first image may be the animal's or child's body shape and facial features. If the first image is a "still" object such as a building, a vehicle or a landscape, the salient features of the first image may be the overall appearance and prominent structures of these still objects.
  • After the salient features are extracted, based on their position in the first image, the salient features are mapped into the MIDI information coordinate system, so that each image unit of the salient features is formed as a track block in the MIDI information coordinate system.
  • the MIDI information coordinate system is used to indicate the corresponding relationship between MIDI information and time, that is, the relationship between MIDI information corresponding to a track block and time.
  • These audio track blocks corresponding to the salient features carry musical instrument digital interface information, that is, MIDI information.
  • MIDI information is information that can be recognized by a computer device and played back as "sound".
  • After the MIDI information is recognized, digital signals corresponding to information such as pitch, timbre and volume are obtained, thereby forming musical motives; according to the correspondence between the salient features and time, that is, the correspondence between these musical motives and time, the "sounds" corresponding to the motives are played in sequence, thereby forming a piece of music, which is unique music generated from the "memory image" selected by the user, that is, the first image.
  • In this way, the embodiment of this application constructs music from images, so that the formed music matches the images containing the user's memories.
  • On the one hand, the threshold for music creation is lowered, so that "novice" users without music theory knowledge can also construct corresponding music from pictures.
  • On the other hand, the music displays audio track blocks through the MIDI information coordinate system, making the final music visible as well as audible and giving users a unique dual experience of hearing and vision.
  • the image content of the first image includes salient objects
  • the salient features include at least one of the following: key points of the salient objects, and edge feature points of the salient objects.
  • the salient object is the main object in the image content of the first image.
  • the salient object is the "human face”.
  • the salient object is the "building".
  • The salient features specifically include the key points of the salient objects: for a human face, the key points are the facial features (the "five sense organs"); for a building, the key points are the characteristic designs of the building, such as "windows" and "doorways".
  • Salient features may also include edge feature points of salient objects, and these edge feature points will form contours of salient objects, such as human face contours or building contours.
  • a "simplified map" of the salient objects can be formed, through which the viewer can be associated with the original image.
  • the subject being photographed such as “someone” or "a certain building,” evokes memories in the viewer.
  • In some embodiments, the processing module is further configured to perform object segmentation on the first image through a convolutional neural network to obtain the salient objects in the first image and the edge feature points of the salient objects, and to extract the key points of the salient objects to obtain the key points of the salient objects.
  • When performing feature extraction on the first image, the first image may first be segmented through a pre-trained convolutional neural network.
  • The object of the segmentation is to segment out the salient objects in the first image.
  • Specifically, a preset convolutional neural network can be trained with a large number of pre-labeled training sets, so that the trained convolutional neural network can identify salient objects in pictures. For example, for portrait pictures, a training set can be built from a large number of original face pictures paired with salient target pictures that contain only the cut-out "face" part. The convolutional neural network is trained on this set and iterates continuously; when it can identify the salient target and the edges of the salient target in a picture with sufficient accuracy, it is judged ready to be put into use.
  • The convolutional neural network trained in this way is used to perform artificial-intelligence recognition on the first image, thereby identifying the salient objects and the edges of the salient objects, and obtaining the edge feature points of the salient objects.
  • At the same time, the specific type of the salient object is identified, such as "face", "animal" or "building", so that the corresponding key point extraction granularity is determined according to the type, and the key points of the salient object, such as the facial features of a face, are extracted at that granularity.
  • The application extracts the salient features of the salient objects in the first image, specifically the key points and edge feature points of the salient objects, through the trained convolutional neural network, which obtains the salient features quickly and accurately, thereby improving the processing speed of generating music from images and helping to improve the user experience.
  • the generating module is further configured to generate a salient target texture map corresponding to the first image according to the salient features
  • the processing module is further configured to determine the position of the salient feature in the first image according to the salient object texture map.
  • a salient target texture map corresponding to the first image is generated.
  • The salient object texture map is an image that shows only the salient features of the salient object in the first image.
  • It includes only two types of pixels: the first type are pixels used to display salient features, and the second type are pixels at non-salient-feature positions.
  • Since the salient target texture map is an image obtained by processing the first image so that only the salient features are displayed, the salient features can be determined according to the salient target texture map and then mapped into the MIDI information coordinate system, realizing the conversion from image to MIDI electronic score and finally to music, that is, "from image to music", and giving users a unique experience.
  • the processing module is further configured to perform edge detection on the first image according to the edge feature points and the Canny edge detection algorithm, to obtain the edge image of the salient object;
  • the generation module is also used to generate a salient object map corresponding to the salient object according to the key points and edge feature points; image superposition is performed on the edge image and the salient object map to obtain a salient object texture map corresponding to the first image.
  • edge detection is performed by using the Canny edge detection algorithm according to the edge feature points.
  • When performing edge detection, the first image is first subjected to Gaussian filtering, in which a Gaussian kernel is used to take the weighted average of each pixel and its neighborhood as the gray value of that pixel.
  • Then the gradient magnitude and gradient direction are calculated and non-maximum suppression is applied, and finally edge detection is performed with a set threshold range to obtain the edge image of the salient target.
  • At the same time, according to the key points and the edge feature points, a salient object map corresponding to the salient object is generated, that is, a feature map formed by the key points and the edge feature points.
  • The edge image and the salient object map are then superimposed, which is equivalent to drawing the key points and the contour together, finally obtaining a salient object texture map with clear contours.
  • In some embodiments, the processing module is further configured to:
  • divide the salient target texture map into X multiplied by Y graphic units, where X and Y are both integers greater than 1, each graphic unit includes at least one of bright pixels and dark pixels, a bright pixel is a pixel with a brightness value of 1, and a dark pixel is a pixel with a brightness value of 0; among the X multiplied by Y graphic units, determine the target graphic units whose bright-pixel ratio is greater than the preset ratio, to obtain N target graphic units, where the number of salient features of the first image is N, the N target graphic units are in one-to-one correspondence with the N salient features, and N is a positive integer;
  • according to the row number of each target graphic unit among the X multiplied by Y graphic units, determine the first ordinate of the corresponding salient feature in the first image; according to the column number of each target graphic unit among the X multiplied by Y graphic units, determine the first abscissa of the corresponding salient feature in the first image; and determine the position of the salient feature in the first image according to the first abscissa and the first ordinate of the salient feature.
  • The target texture map is divided into X rows and Y columns to obtain an X multiplied by Y graphic matrix, which includes X multiplied by Y graphic units.
  • each graphics unit there are multiple pixels, including bright pixels and dark pixels.
  • Bright pixels are pixels used to display salient features, and their brightness value is 1.
  • Dark pixels are pixels outside the salient features; their brightness value is 0, which means "pure black" is displayed.
  • After the division, the proportion of bright pixels in each graphic unit is determined. For example, if a graphic unit contains 10 pixels, of which 6 are bright and 4 are dark, the bright-pixel ratio of that graphic unit is 0.6.
  • It is then determined whether the bright-pixel ratio of each graphic unit is greater than a preset ratio, where the preset ratio is greater than or equal to 0.2, preferably 0.4.
  • Taking the preset ratio of 0.4 as an example, if 4 or more of the 10 pixels in a graphic unit are bright pixels, the graphic unit is marked as a target graphic unit, indicating that it carries a salient feature.
  • After the target graphic units among all X multiplied by Y graphic units are determined, these target graphic units are the salient features that are finally mapped into the MIDI information coordinate system.
  • In the embodiment of the present application, the target graphic units are determined according to the bright-pixel ratio in the divided X multiplied by Y graphic units, and the target graphic units, taken as salient features, are mapped into the MIDI information coordinate system. This realizes the transformation from image to MIDI electronic score and then from image to music; at the same time the music is visualized, giving users a dual experience of hearing and vision.
  • In some embodiments, the processing module is further configured to convert the first ordinate into the MIDI information coordinate system to obtain the second ordinate of the salient features in the MIDI information coordinate system; convert the first abscissa into the MIDI information coordinate system to obtain the second abscissa of the salient features in the MIDI information coordinate system; and map the N salient features to the MIDI information coordinate system according to the second ordinate and the second abscissa, obtaining N audio track blocks corresponding one-to-one to the N salient features.
  • Specifically, the first abscissa and the first ordinate of the salient features obtained above can be synchronously converted into the second abscissa and the second ordinate in the MIDI information coordinate system, so as to realize the mapping of the salient features into the MIDI information coordinate system.
  • The N salient features are mapped to the MIDI information coordinate system to obtain N track blocks corresponding to the N salient features, and after the N track blocks are processed through the musical instrument digital interface program, visualized music can be obtained.
  • On the one hand, the features of the salient objects in the first image are preserved; on the other hand, unique music corresponding to the salient objects in the first image can be generated.
  • The MIDI information coordinate system is used to indicate the correspondence between MIDI information and time. Therefore, from a salient feature, that is, from the coordinates of an audio track block in the MIDI information coordinate system, the MIDI information and time information of that track block can be determined. After a computer program recognizes the MIDI information and time information of the track block, it can convert it into a musical motive.
  • Such a musical motive has sound attributes such as timbre, pitch and volume, and also has the time attribute of beat. The multiple audio track blocks corresponding to the multiple salient features are played according to their MIDI information and time information, finally yielding the music converted from the first image, that is, music that matches the user's "memory image", satisfying the user's demand for unique music creation.
  • In some embodiments, the audio track block contains the musical instrument digital interface information, and the MIDI information is determined according to the second ordinate corresponding to the audio track block, where the MIDI information includes at least one of the following: pitch, timbre, volume.
  • the second vertical coordinate of the audio track block in the MIDI information coordinate system is the MIDI information corresponding to the audio track block.
  • the second ordinate represents the MIDI information of the track block, including MIDI pitch, MIDI timbre and MIDI volume. Specifically, every time the ordinate increases by 1, the scale increases by 1, and every time the ordinate increases by 8, the scale increases by one octave.
  • According to the pitch of an audio track block, its timbre and volume can also be determined.
  • For example, when the pitch of a track block is within the treble range, a crisper timbre such as violin or flute can be set for it; when the pitch is within the middle range, the timbre of a main-melody instrument such as piano or guitar can be set; and when the pitch is within the bass range, the timbre of a thick instrument such as organ or bass can be set.
  • In this way, the present application sets the MIDI information based on the second ordinate of the track block, specifically music attributes such as the pitch, timbre and volume of the track block, so that the generated music better conforms to music theory, improving the effect of generating music from images.
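  • A sketch of this register-based timbre rule, using General MIDI program numbers (an assumption; the patent does not prescribe a numbering scheme, and the range boundaries below are illustrative):

```python
# Choose an instrument (timbre) from the pitch register of a track block:
# crisp instruments for treble, melody instruments for the middle range,
# thick instruments for bass.
def program_for_pitch(pitch: int) -> int:
    if pitch >= 72:   # treble range
        return 40     # GM 40 = violin (73 = flute is another crisp choice)
    if pitch >= 55:   # middle range
        return 0      # GM 0 = acoustic grand piano (24 = nylon guitar also fits)
    return 32         # bass range: GM 32 = acoustic bass (19 = church organ also fits)
```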
  • the music file generation device further includes a receiving module, configured to receive a first input, wherein the first input is an input for selecting preset music features;
  • the processing module is also used to determine a target music feature in response to the first input, the target music feature including at least one of the following: music style, music mood, music genre; and to adjust the music according to the music feature;
  • the device for generating music files also includes a playing module for playing music files.
  • the user can pick a target music feature from a plurality of preset music features, so as to adjust the music generated from the first image in terms of music theory.
  • the target music feature includes music style, such as pop, classical, or electronic music; music mood, such as passionate, deep, or soothing; and music genre, such as rock, jazz, or blues.
  • the music generated from the first image is adjusted so that the adjusted music better matches the music feature selected by the user. For example, if the user selects classical, soothing, and blues, the volume of the mid and low frequencies can be increased while the time interval of the second abscissa is lengthened, making the rhythm slower and more soothing.
  • further post-processing can be performed on the second ordinates of the track blocks in the MIDI coordinate system according to preset music theory data and acoustic data.
  • a key can be set in advance, and the range of the highest and lowest scales specified. If the highest or lowest scale of the track blocks within a certain period exceeds this range, the pitches of the out-of-range track blocks are adjusted according to certain adjustment rules, i.e., out-of-key tones are adjusted into the key, such as lowering the pitch of a track block above the highest scale threshold by one octave, or raising the pitch of a track block below the lowest scale threshold by one octave, so that the adjusted music better conforms to music theory.
  • the adjusted music can be played automatically, so that the user can immediately enjoy the music generated from the selected "memory photo" and experience the joy of music creation.
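A minimal Python sketch of this adjustment rule follows. Note two assumptions: it folds pitches by the 12-semitone MIDI octave, whereas the description above counts octaves in scale degrees, and the example range bounds are illustrative (the range must span at least one octave for the loop to terminate).

```python
def fold_into_range(pitch, lowest=48, highest=72):
    """Sketch: fold an out-of-range pitch back into [lowest, highest]
    by whole octaves, per the adjustment rule described above."""
    while pitch > highest:
        pitch -= 12   # lower notes above the highest threshold by an octave
    while pitch < lowest:
        pitch += 12   # raise notes below the lowest threshold by an octave
    return pitch
```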
  • the generating module is also used to generate a second image corresponding to the music file;
  • the playing module is also used for displaying the second image and playing the music file.
  • a second image corresponding to the music file to be played may also be generated, and the second image is displayed while the music file is played, so that the user can experience visual and auditory enjoyment at the same time.
  • the second image may be a static picture generated from the first image selected by the user, or from the salient feature texture map corresponding to the first image; the static picture and the playback progress of the music are displayed while the music file is played.
  • the second image can also be an animation file generated from a preset template, or from the playback interface of the MIDI information coordinate system; the animation duration of the animation file matches the duration of the generated music, and the animation is played while the music file is played, further enhancing the user's visual experience.
  • the receiving module is further configured to receive a second input, wherein the second input is an input for selecting a preset video template;
  • the processing module is also used to determine the target video template in response to the second input;
  • the generation module is also used to generate the second image according to the target video template and the salient target texture map.
  • by receiving the user's second input, the second image used as the background while the music is played can be generated according to the target video template selected by the second input and the salient target texture map corresponding to the first image.
  • the video template may be a coherent animation template, or a "slideshow" in which multiple static pictures are displayed in turn.
  • the salient target texture map corresponding to the first image is superimposed and displayed, so that when the user sees the second image, the memory of the first image can be recalled and the user experience improved.
  • the generating module is also used to generate a target animation through the piano roll graphical interface, wherein the target animation is used to display the progress of music playback, and to generate the second image according to the target animation and the salient target texture map.
  • the target animation is generated through the piano roll graphical interface, where the target animation is the process of playing the audio track blocks of the MIDI file in the piano roll graphical interface.
  • the salient target texture map corresponding to the first image serves as the background image of the second image in the background of the interface, so that an explicit visual connection is established between the second image and the first image; while listening to the music, the user watches the second image associated with the "memory image", thereby evoking the user's memories and enriching the user's visual experience.
  • the device for generating music files in the embodiment of the present application may be a device, or a component, an integrated circuit, or a chip in a terminal.
  • the device may be a mobile electronic device or a non-mobile electronic device.
  • for example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
  • the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, or a self-service machine, which is not specifically limited in the embodiments of the present application.
  • the device for generating music files in the embodiment of the present application may be a device with an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in this embodiment of the present application.
  • the device for generating a music file provided in the embodiment of the present application can implement the various processes implemented in the above-mentioned method embodiments, and details are not repeated here to avoid repetition.
  • FIG. 9 shows a structural block diagram of the electronic device according to the embodiment of the present application. As shown in FIG. 9, it includes a processor 902, a memory 904, and a program or instruction stored in the memory 904 and executable on the processor 902.
  • when the program or instruction is executed by the processor 902, the various processes of the above-mentioned method embodiments are implemented, achieving the same technical effect; to avoid repetition, details are not described here again.
  • the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 10 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • the electronic device 2000 includes, but is not limited to, components such as a radio frequency unit 2001, a network module 2002, an audio output unit 2003, an input unit 2004, a sensor 2005, a display unit 2006, a user input unit 2007, an interface unit 2008, a memory 2009, and a processor 2010.
  • the electronic device 2000 can also include a power supply (such as a battery) for supplying power to the various components; the power supply can be logically connected to the processor 2010 through a power management system, so that functions such as charge, discharge, and power consumption management are realized through the power management system.
  • the structure of the electronic device shown in FIG. 10 does not constitute a limitation on the electronic device.
  • the electronic device may include more or fewer components than shown in the figure, combine certain components, or arrange components differently; details are not repeated here.
  • the processor 2010 is used to obtain the first image; perform feature extraction on the first image to obtain salient features of the first image; and, based on the positions of the salient features in the first image, map the salient features into the musical instrument digital interface information coordinate system;
  • the MIDI information corresponding to the salient features is determined; the MIDI information coordinate system is used to indicate the correspondence between MIDI information and time; based on the correspondence between the MIDI information and time, a music file is generated.
  • the image content of the first image includes salient objects, and the salient features include at least one of the following: key points of the salient objects, and edge feature points of the salient objects.
  • the processor 2010 is further configured to perform object segmentation on the first image through a convolutional neural network to obtain salient objects in the first image and edge feature points of the salient objects, and to perform key point extraction on the salient objects to obtain the key points of the salient objects.
  • the processor 2010 is further configured to generate a salient object texture map corresponding to the first image according to the salient features; and determine a position of the salient feature in the first image according to the salient object texture map.
  • the processor 2010 is further configured to perform edge detection on the first image according to the edge feature points and the Canny edge detection algorithm to obtain an edge image of the salient object; generate a salient target map corresponding to the salient object according to the key points and the edge feature points; and perform image superposition on the edge image and the salient target map to obtain the salient target texture map corresponding to the first image.
  • the processor 2010 is further configured to divide the target texture map into X-by-Y graphic units of X rows and Y columns, where X and Y are both integers greater than 1, each graphic unit includes at least one of bright pixels and dark pixels, a bright pixel is a pixel with a brightness value of 1, and a dark pixel is a pixel with a brightness value of 0; among the X-by-Y graphic units, determine the target graphic units in which the quantity ratio of bright pixels is greater than a preset ratio, to obtain N target graphic units, where the number of salient features of the first image is N, the N target graphic units correspond one-to-one to the N salient features, and N is a positive integer; determine the first ordinate of each salient feature in the first image according to the row number of each target graphic unit among the X-by-Y graphic units; determine the first abscissa of each salient feature in the first image according to the column number of each target graphic unit among the X-by-Y graphic units; and determine the position of each salient feature in the first image according to its first abscissa and first ordinate.
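The grid step just described can be illustrated with a short NumPy sketch. Two assumptions of this sketch: the texture map is a binary array with 1 for bright pixels and 0 for dark pixels, and the default preset ratio is 0.4 (the value the description elsewhere calls preferable).

```python
import numpy as np

def find_target_units(texture, X, Y, preset_ratio=0.4):
    """Sketch: return (row, col) grid coordinates of the N target units,
    i.e. cells whose share of bright pixels exceeds the preset ratio."""
    h, w = texture.shape
    targets = []
    for row in range(X):
        for col in range(Y):
            cell = texture[row * h // X:(row + 1) * h // X,
                           col * w // Y:(col + 1) * w // Y]
            # mean of a 0/1 array is exactly the bright-pixel proportion
            if cell.size and cell.mean() > preset_ratio:
                targets.append((row, col))  # first ordinate, first abscissa
    return targets
```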
  • the processor 2010 is also configured to convert the first ordinate into the musical instrument digital interface information coordinate system to obtain the second ordinate of the salient feature in that coordinate system; convert the first abscissa into the musical instrument digital interface information coordinate system to obtain the second abscissa of the salient feature in that coordinate system; and, according to the second ordinates and second abscissas, map the N salient features into the musical instrument digital interface information coordinate system to obtain N track blocks corresponding one-to-one to the N salient features.
  • the audio track block includes the musical instrument digital interface information;
  • the processor 2010 is further configured to determine the musical instrument digital interface information according to the second ordinate corresponding to the audio track block, where the musical instrument digital interface information includes at least one of the following: pitch, timbre, volume.
  • the user input unit 2007 is configured to receive a first input, wherein the first input is an input for selecting preset music features;
  • the processor 2010 is also used to determine the target music feature in response to the first input, the target music feature including at least one of the following: music style, music mood, music genre; and to adjust the music according to the music feature;
  • the audio output unit 2003 is used to play music files.
  • the processor 2010 is also configured to generate a second image corresponding to the music file
  • the display unit 2006 is also used for displaying the second image, and the audio output unit 2003 is also used for playing music files.
  • the user input unit 2007 is further configured to receive a second input, wherein the second input is an input for selecting a preset video template;
  • the processor 2010 is further configured to determine a target video template in response to the second input; and generate a second image according to the target video template and the salient target texture map.
  • the processor 2010 is further configured to generate a target animation through the Piano Roll GUI, where the target animation is used to show the playing progress of the music; and generate the second image according to the target animation and the salient target texture map.
  • the embodiment of this application constructs music from images, so that the resulting music matches the images containing the user's memories.
  • the threshold for music creation is lowered, so that "novice" users without music theory knowledge can also construct corresponding music from pictures.
  • the audio track blocks are displayed through the MIDI information coordinate system, visualizing the finally constructed music and giving users a unique dual experience of hearing and vision.
  • the input unit 2004 may include a graphics processing unit (GPU) 20041 and a microphone 20042; the graphics processor 20041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode.
  • the display unit 2006 may include a display panel 20061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like.
  • the user input unit 2007 includes a touch panel 20071 and other input devices 20072.
  • the touch panel 20071 is also called a touch screen.
  • the touch panel 20071 may include two parts: a touch detection device and a touch controller.
  • other input devices 20072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not repeated here.
  • the memory 2009 can be used to store software programs as well as various data, including but not limited to application programs and operating systems.
  • Processor 2010 may integrate an application processor and a modem processor, wherein the application processor mainly processes the operating system, user interface, and application programs, and the modem processor mainly processes wireless communications. It can be understood that the foregoing modem processor may also not be integrated into the processor 2010.
  • the embodiment of the present application also provides a readable storage medium, on which a program or instruction is stored, and when the program or instruction is executed by the processor, each process of the above-mentioned method embodiment can be realized, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • a readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • the embodiment of the present application further provides a chip; the chip includes a processor and a communication interface coupled to the processor, and the processor is used to run programs or instructions to implement the various processes of the above-mentioned method embodiments and achieve the same technical effect; to avoid repetition, details are not repeated here.
  • the chips mentioned in the embodiments of the present application may also be called a system-on-chip, a system chip, a chip system, or a system-on-a-chip chip.
  • the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.
  • the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed in a substantially simultaneous manner or in reverse order according to the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

This application discloses a music file generation method and apparatus, an electronic device, and a storage medium, belonging to the field of computer technology. The music file generation method includes: acquiring a first image; performing feature extraction on the first image to obtain salient features of the first image; mapping the salient features into a musical instrument digital interface information coordinate system based on the positions of the salient features in the first image, and determining the musical instrument digital interface information corresponding to the salient features, where the musical instrument digital interface information coordinate system is used to indicate the correspondence between musical instrument digital interface information and time; and generating a music file based on the correspondence between the musical instrument digital interface information and time. On the one hand, this application lowers the threshold for music creation, so that "novice" users without music theory knowledge can also construct corresponding music from pictures; on the other hand, it displays track blocks in the MIDI information coordinate system, visualizing the finally constructed music and giving users a unique dual experience of hearing and vision.

Description

Music file generation method and apparatus, electronic device, and storage medium
Cross-reference to related applications
This application claims priority to the Chinese patent application filed on July 23, 2021 with application number 202110839656.2 and invention title "Music file generation method and apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application belongs to the field of computer technology, and specifically relates to a music file generation method and apparatus, an electronic device, and a storage medium.
Background
In the related art, music creation has a high threshold, and it is difficult for ordinary users to participate in it. Meanwhile, the created "music" is generally regarded as an art of hearing: the music itself connects with the listener's auditory sense but establishes no connection with vision, the most important human sense, so the user experience of the music creation process is one-dimensional.
Summary
The purpose of the embodiments of this application is to provide a music file generation method and apparatus, an electronic device, and a storage medium, which can generate music based on a visualized image and give users a unique dual experience of hearing and vision.
In a first aspect, an embodiment of this application provides a music file generation method, including:
acquiring a first image;
performing feature extraction on the first image to obtain salient features of the first image;
mapping the salient features into a musical instrument digital interface information coordinate system based on the positions of the salient features in the first image, and determining the musical instrument digital interface information corresponding to the salient features, where the musical instrument digital interface information coordinate system is used to indicate the correspondence between musical instrument digital interface information and time;
generating a music file based on the correspondence between the musical instrument digital interface information and time.
In a second aspect, an embodiment of this application provides a music file generation apparatus, including:
an acquisition module, configured to acquire a first image;
an extraction module, configured to perform feature extraction on the first image to obtain salient features of the first image;
a processing module, configured to map the salient features into a musical instrument digital interface information coordinate system based on the positions of the salient features in the first image, and to determine the musical instrument digital interface information corresponding to the salient features, where the musical instrument digital interface information coordinate system is used to indicate the correspondence between musical instrument digital interface information and time;
a generation module, configured to generate a music file based on the correspondence between the musical instrument digital interface information and time.
In a third aspect, an embodiment of this application provides an electronic device, including a processor, a memory, and a program or instruction stored in the memory and executable on the processor; when executed by the processor, the program or instruction implements the steps of the method of the first aspect.
In a fourth aspect, an embodiment of this application provides a readable storage medium on which a program or instruction is stored; when executed by a processor, the program or instruction implements the steps of the method of the first aspect.
In a fifth aspect, an embodiment of this application provides a chip, which includes a processor and a communication interface coupled to the processor; the processor is used to run a program or instruction to implement the steps of the method of the first aspect.
In the embodiments of this application, an image, i.e., the above first image, is processed so that image information such as a photo or a video is converted into a visualized electronic score file, specifically by displaying track blocks in a Musical Instrument Digital Interface (MIDI) coordinate system, where these track blocks constitute the salient features of the first image, i.e., the figure formed by the track blocks matches the image of the salient features of the first image. Meanwhile, these track blocks all contain musical instrument digital interface information, i.e., MIDI information; after the MIDI information is recognized by a computer, the track blocks are played in chronological order according to the correspondence between the MIDI information and time, thereby forming music.
The embodiments of this application construct music from an image so that the resulting music matches the image carrying the user's memories. On the one hand, this lowers the threshold for music creation so that novice users without music theory knowledge can also construct corresponding music from pictures; on the other hand, the track blocks are displayed in the MIDI information coordinate system, visualizing the finally constructed music and giving users a unique dual auditory and visual experience.
Brief description of the drawings
FIG. 1 shows the first flowchart of the music file generation method according to an embodiment of this application;
FIG. 2 shows a schematic interface diagram of the MIDI information coordinate system of the music file generation method according to an embodiment of this application;
FIG. 3 shows a schematic diagram of the salient target texture map of the music file generation method according to an embodiment of this application;
FIG. 4 shows the second flowchart of the music file generation method according to an embodiment of this application;
FIG. 5 shows a schematic diagram of the division of the target texture map of the music file generation method according to an embodiment of this application;
FIG. 6 shows the third flowchart of the music file generation method according to an embodiment of this application;
FIG. 7 shows a schematic diagram of the piano roll graphical interface in the music file generation method according to an embodiment of this application;
FIG. 8 shows a structural block diagram of the music file generation apparatus according to an embodiment of this application;
FIG. 9 shows a structural block diagram of the electronic device according to an embodiment of this application;
FIG. 10 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application will be described clearly below with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application fall within the protection scope of this application.
The terms "first", "second", and the like in the specification and claims of this application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of this application can be implemented in orders other than those illustrated or described here. Objects distinguished by "first", "second", and the like are usually of one class, and the number of objects is not limited; for example, there may be one or more first objects. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The music file generation method and apparatus, electronic device, and storage medium provided by the embodiments of this application are described in detail below through specific embodiments and their application scenarios with reference to the accompanying drawings.
In some embodiments of this application, a music file generation method is provided. FIG. 1 shows the first flowchart of the music file generation method according to an embodiment of this application. As shown in FIG. 1, the method includes:
Step 102: acquire a first image;
Step 104: perform feature extraction on the first image to obtain salient features of the first image;
Step 106: based on the positions of the salient features in the first image, map the salient features into a musical instrument digital interface information coordinate system, and determine the musical instrument digital interface information corresponding to the salient features;
In step 106, the musical instrument digital interface information coordinate system is used to indicate the correspondence between musical instrument digital interface information and time;
Step 108: generate a music file based on the correspondence between the musical instrument digital interface information and time.
In the embodiment of this application, the first image is specifically a "memory image" selected by the user. Specifically, the user can obtain the first image by uploading a locally saved photo or video to the client, or by taking a photo or recording a video with the camera of an electronic device such as a mobile phone.
When the user uploads a video or records one with a mobile phone, the first image can be obtained by extracting frames from the video: a frame may be extracted at random, or the video content may be recognized by a neural network model to determine and extract an image frame that reflects the theme of the video.
Specifically, in some implementations, acquiring the first image includes: receiving a third input, where the third input is an input for selecting the first image; and determining the first image in response to the third input.
In other implementations, acquiring the first image includes: receiving a fourth input, where the fourth input is an input for shooting a video; shooting a video to be processed in response to the fourth input; and performing frame extraction on the video to be processed to obtain the first image.
After the first image is obtained, feature extraction is further performed on it to extract the salient features of the first image. For example, if the first image is a "face" picture, its salient features are the face contour, the positions of the facial features, and so on. If the first image is a full-body or half-body "portrait" picture, its salient features are the body contour, posture, etc. of the person in it.
To continue the example, if the first image shows a "moving" subject (a living creature) such as an animal or a child, the salient features may be the body contour and facial feature positions of the animal or child. If the first image shows a "still" subject (a stationary object) such as a building, a vehicle, or scenery, the salient features may be the overall appearance and distinctive structures of the still object.
It can be understood that different feature extraction granularities can be set according to the specific content of the first image.
Further, after the salient features of the first image are obtained, the salient features are mapped into the musical instrument digital interface information coordinate system, i.e., the MIDI information coordinate system, according to their positions in the first image, so that the image units of the salient features form track blocks in the MIDI information coordinate system. The MIDI information coordinate system indicates the correspondence between MIDI information and time, i.e., the relationship between the MIDI information corresponding to a track block and time.
Specifically, FIG. 2 shows a schematic interface diagram of the MIDI information coordinate system of the music file generation method according to an embodiment of this application. As shown in FIG. 2, the first image is specifically a face image, and the salient features of the face image are mapped into the MIDI information coordinate system 200 as a plurality of track blocks 202, which form an approximate face shape in the MIDI information coordinate system; this face shape corresponds to the salient features of the first image.
Further, these track blocks corresponding to the salient features carry musical instrument digital interface information, i.e., MIDI information, which is information that a computer device can recognize and play as "sound". After recognizing the MIDI information, the computer device obtains the digital signals corresponding to information such as pitch, timbre, and volume, thereby forming a music motive, i.e., an accent. According to the correspondence between these salient features and time, i.e., between these music motives and time, the "sounds" corresponding to the motives are played in sequence, forming a piece of music, which is the unique music generated from the "memory image" selected by the user, i.e., the first image.
The embodiments of this application construct music from an image so that the resulting music matches the image carrying the user's memories. On the one hand, this lowers the threshold for music creation so that novice users without music theory knowledge can also construct corresponding music from pictures; on the other hand, the track blocks are displayed in the MIDI information coordinate system, visualizing the finally constructed music and giving users a unique dual auditory and visual experience.
In some embodiments of this application, the image content of the first image includes a salient target, and the salient features include at least one of: key points of the salient target, edge feature points of the salient target.
In the embodiment of this application, the salient target is the main subject in the image content of the first image. For example, when the image content of the first image is a face against a background of flowers, the salient target is the "face". For another example, when the image content is a building against a blue sky, the salient target is the "building".
On this basis, the salient features specifically include the key points of the salient target, such as the "facial features" of a face, or the characteristic design of a building, such as its "windows" and "entrance". The salient features may also include the edge feature points of the salient target, which form the contour of the salient target, such as a face contour or a building outline.
Therefore, by extracting the key points and edge feature points of the salient target in the image content, a "sketch" of the salient target can be formed; through this sketch, the viewer can be reminded of the photographed subject in the original image, such as "someone" or "a certain building", thereby evoking the viewer's memories.
By detecting key points and edge feature points to constitute the salient features of the salient target, and generating music based on the salient features, the embodiments of this application realize music visualization and give users a dual auditory and visual experience.
In some embodiments of this application, performing feature extraction on the first image to obtain the salient features of the first image includes:
performing target segmentation on the first image through a convolutional neural network to obtain the salient target in the first image and the edge feature points of the salient target;
performing key point extraction on the salient target to obtain the key points of the salient target.
In the embodiment of this application, when performing feature extraction on the first image, first, target segmentation may be performed on the first image through a pre-trained convolutional neural network. The purpose of target segmentation is to segment out the salient target in the first image.
Specifically, a preset convolutional neural network can be trained with a large number of pre-labeled training sets, so that the trained network can recognize salient targets in pictures. For example, for portrait pictures, a training set can be generated from a large number of original face pictures and salient target pictures containing only the "face" obtained by cutting out the "face" part. The convolutional neural network is trained iteratively on this training set; when it can recognize the salient target and its edges in pictures relatively accurately, it is judged ready for use.
The convolutional neural network trained by the above method performs artificial intelligence recognition on the first image, thereby determining the salient target and its edges and obtaining the edge feature points of the salient target.
Further, through image recognition of the salient target, its specific type, such as "face", "animal", or "building", is determined; the corresponding key point extraction granularity is determined according to the type, and key point extraction is performed on the salient target according to that granularity to obtain its key points, such as the facial features of a face.
By extracting the salient features of the salient target in the first image through a trained convolutional neural network, specifically the key points and edge feature points of the salient target, this application can obtain the salient features quickly and accurately, thereby improving the processing speed of generating music from images and helping to improve user experience.
In some embodiments of this application, before mapping the salient features into the musical instrument digital interface information coordinate system based on their positions in the first image, the music file generation method further includes:
generating a salient target texture map corresponding to the first image according to the salient features;
determining the positions of the salient features in the first image according to the salient target texture map.
In the embodiment of this application, the salient target texture map corresponding to the first image is generated according to the salient features of the first image. The salient target texture map is an image that displays only the salient features of the salient target in the first image. In a typical implementation, the salient target texture map contains only two kinds of pixels: the first kind are pixels used to display salient features, and the second kind are pixels at non-salient-feature positions.
FIG. 3 shows a schematic diagram of the salient target texture map of the music file generation method according to an embodiment of this application. As shown in FIG. 3, the first image is a face image, and the salient target is the face; in this case, the salient target texture map looks like a sketch of the face.
Since the salient target texture map is the first image processed to display only the salient features, when determining the positions of the salient features in the first image, the positions can be determined from the salient target texture map, so that the salient features are mapped into the MIDI information coordinate system, realizing the conversion from image to MIDI electronic score and finally to music, achieving "from image to music" and giving users a unique experience.
In some embodiments of this application, FIG. 4 shows the second flowchart of the music file generation method according to an embodiment of this application. As shown in FIG. 4, the step of generating the salient target texture map corresponding to the first image according to the salient features specifically includes the following steps:
Step 402: perform edge detection on the first image according to the edge feature points and the Canny edge detection algorithm to obtain an edge image of the salient target;
Step 404: generate a salient target map corresponding to the salient target according to the key points and the edge feature points;
Step 406: superimpose the edge image and the salient target map to obtain the salient target texture map corresponding to the first image.
In the embodiment of this application, when generating the salient target texture map according to the salient features, first, edge detection is performed according to the edge feature points through the Canny edge detection algorithm, a multi-stage edge detection algorithm developed by John F. Canny in 1986.
Specifically, when performing edge detection on the first image with the Canny algorithm, Gaussian filtering is first applied to the first image, that is, a Gaussian matrix is used to take the weighted average of each pixel and its neighborhood as the gray value of the pixel. Further, the gradient magnitude and direction are computed, non-maximum values are filtered out, and finally a set threshold range is used for edge detection to obtain the edge image of the salient target.
Further, the salient target map corresponding to the salient target, i.e., the feature map formed by the key points and the edge feature points, is generated according to the key points and edge feature points of the salient target.
Furthermore, the edge image and the salient target map are superimposed so that the edge image and the edge feature points are connected, which is equivalent to drawing each key point together with the contour, finally obtaining a salient target texture map with a clear contour.
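As an illustration of this step only, here is a minimal Python sketch using OpenCV; the Gaussian kernel size, the Canny thresholds, and the radius used to draw key points are illustrative assumptions rather than parameters specified by this application.

```python
import cv2
import numpy as np

def salient_target_texture_map(image_bgr, keypoints):
    """Sketch: Canny edge image overlaid with key points.

    image_bgr: the first image as an OpenCV BGR array.
    keypoints: list of (x, y) key points of the salient target
               (assumed to come from an upstream segmentation/keypoint step).
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Gaussian filtering, then Canny edge detection with example thresholds.
    blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)
    edges = cv2.Canny(blurred, 50, 150)  # bright (255) on dark (0)

    # Salient target map: draw each key point as a small bright disc.
    target_map = np.zeros_like(edges)
    for x, y in keypoints:
        cv2.circle(target_map, (int(x), int(y)), 2, 255, -1)

    # Image superposition: union of the edge image and the key-point map.
    texture_map = cv2.bitwise_or(edges, target_map)
    return texture_map  # pixels are 0 (dark) or 255 (bright)
```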
In some embodiments of this application, determining the positions of the salient features in the first image according to the target texture map includes:
dividing the target texture map into X rows and Y columns, i.e., X-by-Y graphic units, where X and Y are both integers greater than 1, each graphic unit contains at least one of bright pixels and dark pixels, a bright pixel is a pixel with a brightness value of 1, and a dark pixel is a pixel with a brightness value of 0;
among the X-by-Y graphic units, determining the target graphic units in which the proportion of bright pixels is greater than a preset ratio, to obtain N target graphic units, where the number of salient features of the first image is N, the N target graphic units correspond one-to-one to the N salient features, and N is a positive integer;
determining the first ordinate of each salient feature in the first image according to the row number of each of the N target graphic units among the X-by-Y graphic units;
determining the first abscissa of each salient feature in the first image according to the column number of each of the N target graphic units among the X-by-Y graphic units;
determining the positions of the salient features in the first image according to the abscissas and the ordinates of the salient features.
In the embodiment of this application, first, the target texture map is divided into X rows and Y columns, giving an X×Y graphic matrix containing X×Y graphic units. Each graphic unit contains multiple pixels, including bright pixels and dark pixels: a bright pixel is a pixel used to display a salient feature, with a brightness value of 1; a dark pixel is a pixel outside the salient features, with a brightness value of 0, i.e., "pure black".
Further, the proportion of bright pixels in each of the X×Y graphic units is determined. For example, if a graphic unit contains 10 pixels, including 6 bright pixels and 4 dark pixels, the proportion of bright pixels in that unit is 0.6.
After determining the proportion of bright pixels in each graphic unit, it is judged whether the proportion in each unit is greater than a preset ratio, where the preset ratio is greater than or equal to 0.2, preferably 0.4. Taking a preset ratio of 0.4 as an example, if 4 or more of the 10 pixels in a graphic unit are bright, the unit is marked as a target graphic unit, indicating that it contains a salient feature.
After all target graphic units among the X×Y graphic units are determined, these target graphic units are the salient features that are finally mapped into the MIDI information coordinate system. FIG. 5 shows a schematic diagram of the division of the target texture map of the music file generation method according to an embodiment of this application. As shown in FIG. 5, the target texture map 500 is divided into 5×5, i.e., 25 graphic units 502, where a hatched unit is a target graphic unit 504, i.e., the unit of one salient feature.
Furthermore, taking the black-filled graphic unit 506 in FIG. 5 as an example, unit 506 is located in the 4th column and the 2nd row, so the salient feature corresponding to unit 506 has a first abscissa of 4x and a first ordinate of 2y in the first image.
Similarly, the first abscissa and first ordinate of every target graphic unit are determined in the same way, and the position of each salient feature in the first image is thus obtained.
By dividing the target texture map corresponding to the first image and determining the target graphic units according to the proportion of bright pixels in the divided X×Y graphic units, the embodiments of this application map each target graphic unit as a salient feature into the MIDI information coordinate system, realizing the conversion from image to MIDI electronic score and then from image to music, while visualizing the music and giving users a dual auditory and visual experience.
In some embodiments of this application, mapping the salient features into the musical instrument digital interface information coordinate system based on their positions in the first image includes:
converting the first ordinate into the musical instrument digital interface information coordinate system to obtain the second ordinate of the salient feature in that coordinate system;
converting the first abscissa into the musical instrument digital interface information coordinate system to obtain the second abscissa of the salient feature in that coordinate system;
mapping the N salient features into the musical instrument digital interface information coordinate system according to the second ordinates and second abscissas, to obtain N track blocks corresponding one-to-one to the N salient features.
In the embodiment of this application, when mapping the salient features into the MIDI information coordinate system, the first abscissas and first ordinates of the salient features obtained above can be synchronously converted into the second abscissas and second ordinates in the MIDI information coordinate system, thereby realizing the mapping of the salient features in the MIDI information coordinate system.
All N salient features are mapped into the MIDI information coordinate system to obtain N track blocks corresponding one-to-one to the N salient features. By displaying and playing these N track blocks through a musical instrument digital interface program, visualized music can be obtained, which on the one hand preserves the image features of the salient target in the first image, and on the other hand generates unique music corresponding to the salient target in the first image.
Specifically, the MIDI information coordinate system indicates the correspondence between MIDI information and time. Therefore, from one salient feature, i.e., the coordinates of one track block in the MIDI information coordinate system, the MIDI information and time information of the track block can be determined. After recognizing the MIDI information and time information of a track block, a computer program can convert it into a music motive, which has sound attributes such as timbre, pitch, and volume, as well as the time attribute of beat. The multiple track blocks corresponding to the multiple salient features are played according to their MIDI information and time information, finally yielding the music converted from the first image, i.e., music matching the user's "memory image", satisfying the user's demand for unique music creation.
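To make the playback step concrete, the following is a minimal sketch that writes track blocks to a standard MIDI file with the mido library; the fixed one-beat note length, the velocity, and the tick resolution are illustrative assumptions, not values from this application.

```python
import mido

def blocks_to_midi(blocks, path="memory.mid", ticks_per_beat=480):
    """Sketch: render (start_beat, pitch) track blocks into a MIDI file."""
    mid = mido.MidiFile(ticks_per_beat=ticks_per_beat)
    track = mido.MidiTrack()
    mid.tracks.append(track)

    # Expand each block into note_on/note_off events at absolute ticks.
    events = []
    for start_beat, pitch in blocks:
        on = int(start_beat * ticks_per_beat)
        events.append((on, 'note_off', pitch))  # placeholder kind fixed below
        events[-1] = (on, 'note_on', pitch)
        events.append((on + ticks_per_beat, 'note_off', pitch))

    # MIDI messages carry delta times, so sort and convert from absolute ticks.
    now = 0
    for tick, kind, pitch in sorted(events):
        track.append(mido.Message(kind, note=pitch, velocity=64,
                                  time=tick - now))
        now = tick
    mid.save(path)
```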
In some embodiments of this application, a track block contains musical instrument digital interface information, and the musical instrument digital interface information is determined according to the second ordinate corresponding to the track block, where the musical instrument digital interface information includes at least one of the following: pitch, timbre, volume.
In the embodiment of this application, the second ordinate of a track block in the MIDI information coordinate system is the MIDI information corresponding to the track block. Specifically, in the MIDI information coordinate system, the second ordinate represents the MIDI information of the track block, including MIDI pitch, MIDI timbre, and MIDI volume. Specifically, each increase of 1 in the ordinate raises the scale by one degree, and each increase of 8 raises the scale by one octave.
Meanwhile, the timbre and volume of a track block can also be obtained from the second coordinate. The higher the pitch of a track block, e.g., within the treble scale range, the crisper the timbre that can be set for it, such as that of instruments like the violin or flute; when the pitch of the track block is within the middle scale range, the timbre of a main-melody instrument such as the piano or guitar can be set; and when the pitch is within the bass scale range, the timbre of a heavy instrument such as the organ or bass can be set.
Similarly, a larger volume can be set for track blocks within the middle scale range to highlight the main melody, while the volume can be appropriately reduced for the treble and bass ranges to avoid pressing on the user's ears.
Based on the second ordinate of a track block, this application sets its MIDI information, specifically music attributes such as the pitch, timbre, and volume of the track block, so that the generated music better conforms to music theory, improving the effect of generating music from pictures.
In some embodiments of this application, FIG. 6 shows the third flowchart of the music file generation method according to an embodiment of this application. As shown in FIG. 6, the method further includes:
Step 602: receive a first input;
In step 602, the first input is an input for selecting preset music features. In this step, the first input is a user input received through a human-computer interaction component, and includes one or a combination of: touch input, biometric input, click input, motion-sensing input, voice input, keyboard input, or press input, where touch input includes but is not limited to taps, swipes, or specific touch gestures; biometric input includes but is not limited to biological information input such as fingerprint, iris, voiceprint, or facial recognition; click input includes but is not limited to mouse clicks and switch clicks; motion-sensing input includes but is not limited to shaking or flipping the electronic device; press input includes but is not limited to pressing the touch screen, the bezel, the back cover, or other parts of the electronic device. The embodiment of this application does not limit the specific form of the first input.
Step 604: determine a target music feature in response to the first input;
In step 604, the target music feature includes at least one of the following: music style, music mood, music genre;
Step 606: adjust the music according to the music feature;
Step 608: play the music file.
In the embodiment of this application, the user can select among multiple preset music features to pick a target music feature, thereby adjusting the music generated from the first image in terms of music theory. The target music feature includes music style, such as pop, classical, or electronic music; music mood, such as passionate, deep, or soothing; and music genre, such as rock, jazz, or blues.
The music generated from the first image is adjusted according to the target music feature selected by the user, so that the adjusted music better matches the selected feature. For example, if the user selects classical, soothing, and blues, the volume of the mid and low frequencies can be appropriately increased, and the time interval of the second abscissa can be adjusted to make the rhythm slower and more soothing.
Meanwhile, the second ordinates of the track blocks in the MIDI coordinate system can be further post-processed according to preset music theory data and acoustic data. For example, a key can be set in advance and the range of the highest and lowest scales specified; if the highest or lowest scale of the track blocks within a period exceeds this range, the pitches of the out-of-range track blocks are adjusted according to certain adjustment rules, i.e., out-of-key tones are adjusted into the key, e.g., lowering the pitch of a track block above the highest scale threshold by one octave, or raising the pitch of one below the lowest scale threshold by one octave, so that the adjusted music better conforms to music theory. After the generated music is adjusted, the adjusted music can be played automatically, so that the user can immediately enjoy the music generated from the selected "memory photo" and experience the joy of music creation.
In some embodiments of this application, the music file generation method further includes: generating a second image corresponding to the music;
playing the music file includes: displaying the second image and playing the music file.
In the embodiment of this application, a second image corresponding to the music file to be played can also be generated and displayed while the music file is played, so that the user enjoys visual and auditory pleasure at the same time. The second image may be a static picture generated from the first image selected by the user or from the salient feature texture map corresponding to the first image; the static picture and the playback progress of the music are displayed while the music file is played.
The second image may also be an animation file generated from a preset template or from the playback interface of the MIDI information coordinate system; the animation duration of the file matches the duration of the generated music, and the animation is played while the music file is played, further improving the user's visual experience.
In some embodiments of this application, generating the second image corresponding to the music includes:
receiving a second input, where the second input is an input for selecting a preset video template;
determining a target video template in response to the second input;
generating the second image according to the target video template and the salient target texture map.
In the embodiment of this application, by receiving the user's second input, the background image used while the music is played can be generated according to the target video template selected by the second input and the salient target texture map corresponding to the first image. The video template may be a continuous animation template or a "slideshow" in which multiple static pictures are shown in turn.
In the animation template, the salient target texture map corresponding to the first image is superimposed and displayed, so that when the user sees the second image, memories of taking the first image can be evoked, improving the user experience.
In this embodiment, the second input is a user input received through a human-computer interaction component, and includes one or a combination of: touch input, biometric input, click input, motion-sensing input, voice input, keyboard input, or press input, where touch input includes but is not limited to taps, swipes, or specific touch gestures; biometric input includes but is not limited to biological information input such as fingerprint, iris, voiceprint, or facial recognition; click input includes but is not limited to mouse clicks and switch clicks; motion-sensing input includes but is not limited to shaking or flipping the electronic device; press input includes but is not limited to pressing the touch screen, the bezel, the back cover, or other parts of the electronic device. The embodiment of this application does not limit the specific form of the second input.
In some embodiments of this application, generating the second image corresponding to the music file includes:
generating a target animation through the piano roll graphical interface, where the target animation is used to show the playback progress of the music;
generating the second image according to the target animation and the salient target texture map.
In the embodiment of this application, the target animation is generated through the piano roll graphical interface; the target animation is the process of playing the track blocks of the MIDI file in the piano roll graphical interface. Specifically, FIG. 7 shows a schematic diagram of the piano roll graphical interface in the music file generation method according to an embodiment of this application, where the keys 702 of an animated piano image are on the left, and the track blocks 704 move gradually toward the keys 702 on the left according to their corresponding time information.
Meanwhile, the salient target texture map corresponding to the first image is used as the background image of the second image in the background of the interface, so that an explicit visual connection is established between the second image and the first image; while listening to the music, the user watches the second image associated with the "memory image", evoking the user's memories and enriching the user's visual experience.
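As a rough illustration of such a piano-roll progress view, the following Python sketch draws one frame with matplotlib; the block format, the colors, and the omission of the background texture map compositing are simplifying assumptions of this sketch.

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def draw_piano_roll_frame(track_blocks, t, total_beats, ax=None):
    """Sketch: one frame of a piano-roll style progress view.

    track_blocks: list of (beat, pitch_row) pairs, i.e. the second
                  abscissa/ordinate of each block (illustrative format).
    t: current playback position in beats; blocks left of t count as played.
    """
    ax = ax or plt.gca()
    for beat, row in track_blocks:
        played = beat < t
        ax.add_patch(patches.Rectangle(
            (beat, row), 0.9, 0.9,
            color="tab:orange" if played else "tab:blue"))
    ax.axvline(t, color="red")  # playhead showing the playback progress
    ax.set_xlim(0, total_beats)
    ax.set_ylim(0, 128)
    ax.set_xlabel("time (beats)")
    ax.set_ylabel("pitch row")
    return ax
```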
In some embodiments of this application, a music file generation apparatus is provided. FIG. 8 shows a structural block diagram of the music file generation apparatus according to an embodiment of this application. As shown in FIG. 8, the music file generation apparatus 800 includes:
an acquisition module 802, configured to acquire a first image;
an extraction module 804, configured to perform feature extraction on the first image to obtain salient features of the first image;
a processing module 806, configured to map the salient features into a musical instrument digital interface information coordinate system based on the positions of the salient features in the first image, and to determine the musical instrument digital interface information corresponding to the salient features, where the musical instrument digital interface information coordinate system is used to indicate the correspondence between musical instrument digital interface information and time;
a generation module 808, configured to generate a music file based on the correspondence between the musical instrument digital interface information and time.
In the embodiment of this application, the first image is specifically a "memory image" selected by the user. Specifically, the user can obtain the first image by uploading a locally saved photo or video to the client, or by taking a photo or recording a video with the camera of an electronic device such as a mobile phone.
When the user uploads a video or records one with a mobile phone, the first image can be obtained by extracting frames from the video: a frame may be extracted at random, or the video content may be recognized by a neural network model to determine and extract an image frame that reflects the theme of the video.
Specifically, in some implementations, acquiring the first image includes: receiving a third input, where the third input is an input for selecting the first image; and determining the first image in response to the third input.
In other implementations, acquiring the first image includes: receiving a fourth input, where the fourth input is an input for shooting a video; shooting a video to be processed in response to the fourth input; and performing frame extraction on the video to be processed to obtain the first image.
After the first image is obtained, feature extraction is further performed on it to extract the salient features of the first image. For example, if the first image is a "face" picture, its salient features are the face contour, the positions of the facial features, and so on. If the first image is a full-body or half-body "portrait" picture, its salient features are the body contour, posture, etc. of the person in it.
To continue the example, if the first image shows a "moving" subject such as an animal or a child, the salient features may be the body contour and facial feature positions of the animal or child. If the first image shows a "still" subject such as a building, a vehicle, or scenery, the salient features may be the overall appearance and distinctive structures of the still object.
It can be understood that different feature extraction granularities can be set according to the specific content of the first image.
Further, after the salient features of the first image are obtained, the salient features are mapped into the musical instrument digital interface information coordinate system, i.e., the MIDI information coordinate system, according to their positions in the first image, so that the image units of the salient features form track blocks in the MIDI information coordinate system. The MIDI information coordinate system indicates the correspondence between MIDI information and time, i.e., the relationship between the MIDI information corresponding to a track block and time.
Further, these track blocks corresponding to the salient features carry musical instrument digital interface information, i.e., MIDI information, which is information that a computer device can recognize and play as "sound". After recognizing the MIDI information, the computer device obtains the digital signals corresponding to information such as pitch, timbre, and volume, thereby forming a music motive, i.e., an accent. According to the correspondence between these salient features and time, i.e., between these music motives and time, the "sounds" corresponding to the motives are played in sequence, forming a piece of music, which is the unique music generated from the "memory image" selected by the user, i.e., the first image.
The embodiments of this application construct music from an image so that the resulting music matches the image carrying the user's memories. On the one hand, this lowers the threshold for music creation so that novice users without music theory knowledge can also construct corresponding music from pictures; on the other hand, the track blocks are displayed in the MIDI information coordinate system, visualizing the finally constructed music and giving users a unique dual auditory and visual experience.
In the music file generation apparatus of some embodiments of this application, the image content of the first image includes a salient target, and the salient features include at least one of: key points of the salient target, edge feature points of the salient target.
In the embodiment of this application, the salient target is the main subject in the image content of the first image. For example, when the image content of the first image is a face against a background of flowers, the salient target is the "face". For another example, when the image content is a building against a blue sky, the salient target is the "building".
On this basis, the salient features specifically include the key points of the salient target, such as the "facial features" of a face, or the characteristic design of a building, such as its "windows" and "entrance". The salient features may also include the edge feature points of the salient target, which form the contour of the salient target, such as a face contour or a building outline.
Therefore, by extracting the key points and edge feature points of the salient target in the image content, a "sketch" of the salient target can be formed; through this sketch, the viewer can be reminded of the photographed subject in the original image, such as "someone" or "a certain building", thereby evoking the viewer's memories.
By detecting key points and edge feature points to constitute the salient features of the salient target, and generating music based on the salient features, the embodiments of this application realize music visualization and give users a dual auditory and visual experience.
In the music file generation apparatus of some embodiments of this application, the processing module is further configured to perform target segmentation on the first image through a convolutional neural network to obtain the salient target in the first image and the edge feature points of the salient target, and to perform key point extraction on the salient target to obtain the key points of the salient target.
In the embodiment of this application, when performing feature extraction on the first image, first, target segmentation may be performed on the first image through a pre-trained convolutional neural network. The purpose of target segmentation is to segment out the salient target in the first image.
Specifically, a preset convolutional neural network can be trained with a large number of pre-labeled training sets so that the trained network can recognize salient targets in pictures. For example, for portrait pictures, a training set can be generated from a large number of original face pictures and salient target pictures containing only the "face" obtained by cutting out the "face" part. The convolutional neural network is trained iteratively on this training set; when it can recognize the salient target and its edges in pictures relatively accurately, it is judged ready for use.
The convolutional neural network trained by the above method performs artificial intelligence recognition on the first image, thereby determining the salient target and its edges and obtaining the edge feature points of the salient target.
Further, through image recognition of the salient target, its specific type, such as "face", "animal", or "building", is determined; the corresponding key point extraction granularity is determined according to the type, and key point extraction is performed on the salient target according to that granularity to obtain its key points, such as the facial features of a face.
By extracting the salient features of the salient target in the first image through a trained convolutional neural network, specifically its key points and edge feature points, this application can obtain the salient features quickly and accurately, thereby improving the processing speed of generating music from images and helping to improve user experience.
In the music file generation apparatus of some embodiments of this application, the generation module is further configured to generate a salient target texture map corresponding to the first image according to the salient features;
the processing module is further configured to determine the positions of the salient features in the first image according to the salient target texture map.
In the embodiment of this application, the salient target texture map corresponding to the first image is generated according to the salient features of the first image. The salient target texture map is an image that displays only the salient features of the salient target in the first image. In a typical implementation, the salient target texture map contains only two kinds of pixels: the first kind are pixels used to display salient features, and the second kind are pixels at non-salient-feature positions.
Since the salient target texture map is the first image processed to display only the salient features, when determining the positions of the salient features in the first image, the positions can be determined from the salient target texture map, so that the salient features are mapped into the MIDI information coordinate system, realizing the conversion from image to MIDI electronic score and finally to music, achieving "from image to music" and giving users a unique experience.
In the music file generation apparatus of some embodiments of this application, the processing module is further configured to perform edge detection on the first image according to the edge feature points and the Canny edge detection algorithm to obtain an edge image of the salient target;
the generation module is further configured to generate a salient target map corresponding to the salient target according to the key points and the edge feature points, and to superimpose the edge image and the salient target map to obtain the salient target texture map corresponding to the first image.
In the embodiment of this application, when generating the salient target texture map according to the salient features, first, edge detection is performed according to the edge feature points through the Canny edge detection algorithm. Specifically, when performing edge detection on the first image with the Canny algorithm, Gaussian filtering is first applied to the first image, that is, a Gaussian matrix is used to take the weighted average of each pixel and its neighborhood as the gray value of the pixel. Further, the gradient magnitude and direction are computed, non-maximum values are filtered out, and finally a set threshold range is used for edge detection to obtain the edge image of the salient target.
Further, the salient target map corresponding to the salient target, i.e., the feature map formed by the key points and the edge feature points, is generated according to the key points and edge feature points of the salient target.
Furthermore, the edge image and the salient target map are superimposed so that the edge image and the edge feature points are connected, which is equivalent to drawing each key point together with the contour, finally obtaining a salient target texture map with a clear contour.
In the music file generation apparatus of some embodiments of this application, the processing module is further configured to:
divide the target texture map into X rows and Y columns, i.e., X-by-Y graphic units, where X and Y are both integers greater than 1, each graphic unit contains at least one of bright pixels and dark pixels, a bright pixel is a pixel with a brightness value of 1, and a dark pixel is a pixel with a brightness value of 0; among the X-by-Y graphic units, determine the target graphic units in which the proportion of bright pixels is greater than a preset ratio, to obtain N target graphic units, where the number of salient features of the first image is N, the N target graphic units correspond one-to-one to the N salient features, and N is a positive integer;
determine the first ordinate of each salient feature in the first image according to the row number of each of the N target graphic units among the X-by-Y graphic units; determine the first abscissa of each salient feature in the first image according to the column number of each of the N target graphic units among the X-by-Y graphic units; and determine the positions of the salient features in the first image according to the abscissas and the ordinates of the salient features.
In the embodiment of this application, first, the target texture map is divided into X rows and Y columns, giving an X×Y graphic matrix containing X×Y graphic units. Each graphic unit contains multiple pixels, including bright pixels and dark pixels: a bright pixel is a pixel used to display a salient feature, with a brightness value of 1; a dark pixel is a pixel outside the salient features, with a brightness value of 0, i.e., "pure black".
Further, the proportion of bright pixels in each of the X×Y graphic units is determined. For example, if a graphic unit contains 10 pixels, including 6 bright pixels and 4 dark pixels, the proportion of bright pixels in that unit is 0.6.
After determining the proportion of bright pixels in each graphic unit, it is judged whether the proportion in each unit is greater than a preset ratio, where the preset ratio is greater than or equal to 0.2, preferably 0.4. Taking a preset ratio of 0.4 as an example, if 4 or more of the 10 pixels in a graphic unit are bright, the unit is marked as a target graphic unit, indicating that it contains a salient feature.
After all target graphic units among the X×Y graphic units are determined, these target graphic units are the salient features that are finally mapped into the MIDI information coordinate system.
By dividing the target texture map corresponding to the first image and determining the target graphic units according to the proportion of bright pixels in the divided X×Y graphic units, the embodiments of this application map each target graphic unit as a salient feature into the MIDI information coordinate system, realizing the conversion from image to MIDI electronic score and then from image to music, while visualizing the music and giving users a dual auditory and visual experience.
In the music file generation apparatus of some embodiments of this application, the processing module is further configured to convert the first ordinate into the musical instrument digital interface information coordinate system to obtain the second ordinate of the salient feature in that coordinate system; convert the first abscissa into the musical instrument digital interface information coordinate system to obtain the second abscissa of the salient feature in that coordinate system; and map the N salient features into the musical instrument digital interface information coordinate system according to the second ordinates and second abscissas to obtain N track blocks corresponding one-to-one to the N salient features.
In the embodiment of this application, when mapping the salient features into the MIDI information coordinate system, the first abscissas and first ordinates of the salient features obtained above can be synchronously converted into the second abscissas and second ordinates in the MIDI information coordinate system, thereby realizing the mapping of the salient features in the MIDI information coordinate system.
All N salient features are mapped into the MIDI information coordinate system to obtain N track blocks corresponding one-to-one to the N salient features. By displaying and playing these N track blocks through a musical instrument digital interface program, visualized music can be obtained, which on the one hand preserves the features of the salient target in the first image and on the other hand generates unique music corresponding to the salient target in the first image.
Specifically, the MIDI information coordinate system indicates the correspondence between MIDI information and time. Therefore, from one salient feature, i.e., the coordinates of one track block in the MIDI information coordinate system, the MIDI information and time information of the track block can be determined. After recognizing the MIDI information and time information of a track block, a computer program can convert it into a music motive, which has sound attributes such as timbre, pitch, and volume, as well as the time attribute of beat. The multiple track blocks corresponding to the multiple salient features are played according to their MIDI information and time information, finally yielding the music converted from the first image, i.e., music matching the user's "memory image", satisfying the user's demand for unique music creation.
In the music file generation apparatus of some embodiments of this application, a track block contains musical instrument digital interface information, and the musical instrument digital interface information is determined according to the second ordinate corresponding to the track block, where the musical instrument digital interface information includes at least one of the following: pitch, timbre, volume.
In the embodiment of this application, the second ordinate of a track block in the MIDI information coordinate system is the MIDI information corresponding to the track block. Specifically, in the MIDI information coordinate system, the second ordinate represents the MIDI information of the track block, including MIDI pitch, MIDI timbre, and MIDI volume. Specifically, each increase of 1 in the ordinate raises the scale by one degree, and each increase of 8 raises the scale by one octave.
Meanwhile, the timbre and volume of a track block can also be obtained from the second coordinate. The higher the pitch of a track block, e.g., within the treble scale range, the crisper the timbre that can be set for it, such as that of instruments like the violin or flute; when the pitch is within the middle scale range, the timbre of a main-melody instrument such as the piano or guitar can be set; and when the pitch is within the bass scale range, the timbre of a heavy instrument such as the organ or bass can be set.
Similarly, a larger volume can be set for track blocks within the middle scale range to highlight the main melody, while the volume can be appropriately reduced for the treble and bass ranges to avoid pressing on the user's ears.
Based on the second ordinate of a track block, this application sets its MIDI information, specifically music attributes such as pitch, timbre, and volume, so that the generated music better conforms to music theory, improving the effect of generating music from pictures.
In the music file generation apparatus of some embodiments of this application, the apparatus further includes a receiving module, configured to receive a first input, where the first input is an input for selecting preset music features;
the processing module is further configured to determine a target music feature in response to the first input, the target music feature including at least one of the following: music style, music mood, music genre, and to adjust the music according to the music feature;
the music file generation apparatus further includes a playing module for playing the music file.
In the embodiment of this application, the user can select among multiple preset music features to pick a target music feature, thereby adjusting the music generated from the first image in terms of music theory. The target music feature includes music style, such as pop, classical, or electronic music; music mood, such as passionate, deep, or soothing; and music genre, such as rock, jazz, or blues.
The music generated from the first image is adjusted according to the target music feature selected by the user so that the adjusted music better matches the selected feature. For example, if the user selects classical, soothing, and blues, the volume of the mid and low frequencies can be appropriately increased, and the time interval of the second abscissa can be adjusted to make the rhythm slower and more soothing.
Meanwhile, the second ordinates of the track blocks in the MIDI coordinate system can be further post-processed according to preset music theory data and acoustic data. For example, a key can be set in advance and the range of the highest and lowest scales specified; if the highest or lowest scale of the track blocks within a period exceeds this range, the pitches of the out-of-range track blocks are adjusted according to certain adjustment rules, i.e., out-of-key tones are adjusted into the key, e.g., lowering the pitch of a track block above the highest scale threshold by one octave, or raising the pitch of one below the lowest scale threshold by one octave, so that the adjusted music better conforms to music theory. After the generated music is adjusted, the adjusted music can be played automatically, so that the user can immediately enjoy the music generated from the selected "memory photo" and experience the joy of music creation.
In the music file generation apparatus of some embodiments of this application, the generation module is further configured to generate a second image corresponding to the music file;
the playing module is further configured to display the second image and play the music file.
In the embodiment of this application, a second image corresponding to the music file to be played can also be generated and displayed while the music file is played, so that the user enjoys visual and auditory pleasure at the same time. The second image may be a static picture generated from the first image selected by the user or from the salient feature texture map corresponding to the first image; the static picture and the playback progress of the music are displayed while the music file is played.
The second image may also be an animation file generated from a preset template or from the playback interface of the MIDI information coordinate system; the animation duration of the file matches the duration of the generated music, and the animation is played while the music file is played, further improving the user's visual experience.
In the music file generation apparatus of some embodiments of this application, the receiving module is further configured to receive a second input, where the second input is an input for selecting a preset video template;
the processing module is further configured to determine a target video template in response to the second input;
the generation module is further configured to generate the second image according to the target video template and the salient target texture map.
In the embodiment of this application, by receiving the user's second input, the background image used while the music is played can be generated according to the target video template selected by the second input and the salient target texture map corresponding to the first image. The video template may be a continuous animation template or a "slideshow" in which multiple static pictures are shown in turn.
In the animation template, the salient target texture map corresponding to the first image is superimposed and displayed, so that when the user sees the second image, memories of taking the first image can be evoked, improving the user experience.
In the music file generation apparatus of some embodiments of this application, the generation module is further configured to generate a target animation through the piano roll graphical interface, where the target animation is used to show the playback progress of the music, and to generate the second image according to the target animation and the salient target texture map.
In the embodiment of this application, the target animation is generated through the piano roll graphical interface; the target animation is the process of playing the track blocks of the MIDI file in the piano roll graphical interface. Meanwhile, the salient target texture map corresponding to the first image is used as the background image of the second image in the background of the interface, so that an explicit visual connection is established between the second image and the first image; while listening to the music, the user watches the second image associated with the "memory image", evoking the user's memories and enriching the user's visual experience.
The music file generation apparatus in the embodiments of this application may be a device, or a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, or a self-service machine, which is not specifically limited in the embodiments of this application.
The music file generation apparatus in the embodiments of this application may be an apparatus with an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of this application.
The music file generation apparatus provided by the embodiments of this application can implement the various processes implemented by the above method embodiments; to avoid repetition, details are not repeated here.
Optionally, an embodiment of this application further provides an electronic device 900. FIG. 9 shows a structural block diagram of the electronic device according to an embodiment of this application. As shown in FIG. 9, it includes a processor 902, a memory 904, and a program or instruction stored in the memory 904 and executable on the processor 902; when executed by the processor 902, the program or instruction implements the various processes of the above method embodiments with the same technical effect, which is not repeated here to avoid repetition.
It should be noted that the electronic devices in the embodiments of this application include the mobile and non-mobile electronic devices described above.
FIG. 10 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of this application.
The electronic device 2000 includes, but is not limited to, components such as a radio frequency unit 2001, a network module 2002, an audio output unit 2003, an input unit 2004, a sensor 2005, a display unit 2006, a user input unit 2007, an interface unit 2008, a memory 2009, and a processor 2010.
Those skilled in the art will understand that the electronic device 2000 may also include a power supply (such as a battery) for powering the components; the power supply may be logically connected to the processor 2010 through a power management system, thereby implementing functions such as charge, discharge, and power consumption management. The electronic device structure shown in FIG. 10 does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine certain components, or arrange components differently; details are not repeated here.
The processor 2010 is configured to acquire a first image; perform feature extraction on the first image to obtain salient features of the first image; map the salient features into the musical instrument digital interface information coordinate system based on their positions in the first image, and determine the musical instrument digital interface information corresponding to the salient features, where the musical instrument digital interface information coordinate system indicates the correspondence between musical instrument digital interface information and time; and generate a music file based on that correspondence.
Optionally, the image content of the first image includes a salient target, and the salient features include at least one of: key points of the salient target, edge feature points of the salient target.
Optionally, the processor 2010 is further configured to perform target segmentation on the first image through a convolutional neural network to obtain the salient target in the first image and the edge feature points of the salient target, and to perform key point extraction on the salient target to obtain its key points.
Optionally, the processor 2010 is further configured to generate a salient target texture map corresponding to the first image according to the salient features, and to determine the positions of the salient features in the first image according to the salient target texture map.
Optionally, the processor 2010 is further configured to perform edge detection on the first image according to the edge feature points and the Canny edge detection algorithm to obtain an edge image of the salient target; generate a salient target map corresponding to the salient target according to the key points and the edge feature points; and superimpose the edge image and the salient target map to obtain the salient target texture map corresponding to the first image.
Optionally, the processor 2010 is further configured to divide the target texture map into X rows and Y columns, i.e., X-by-Y graphic units, where X and Y are both integers greater than 1, each graphic unit contains at least one of bright pixels and dark pixels, a bright pixel is a pixel with a brightness value of 1, and a dark pixel is a pixel with a brightness value of 0; among the X-by-Y graphic units, determine the target graphic units in which the proportion of bright pixels is greater than a preset ratio, to obtain N target graphic units, where the number of salient features of the first image is N, the N target graphic units correspond one-to-one to the N salient features, and N is a positive integer; determine the first ordinate of each salient feature in the first image according to the row number of each target graphic unit among the X-by-Y graphic units; determine the first abscissa of each salient feature according to the column number of each target graphic unit among the X-by-Y graphic units; and determine the positions of the salient features in the first image according to their abscissas and ordinates.
Optionally, the processor 2010 is further configured to convert the first ordinate into the musical instrument digital interface information coordinate system to obtain the second ordinate of the salient feature in that coordinate system; convert the first abscissa into the musical instrument digital interface information coordinate system to obtain the second abscissa of the salient feature in that coordinate system; and map the N salient features into the musical instrument digital interface information coordinate system according to the second ordinates and second abscissas to obtain N track blocks corresponding one-to-one to the N salient features.
Optionally, a track block contains musical instrument digital interface information, and the processor 2010 is further configured to determine the musical instrument digital interface information according to the second ordinate corresponding to the track block, where the musical instrument digital interface information includes at least one of the following: pitch, timbre, volume.
Optionally, the user input unit 2007 is configured to receive a first input, where the first input is an input for selecting preset music features;
the processor 2010 is further configured to determine a target music feature in response to the first input, the target music feature including at least one of the following: music style, music mood, music genre, and to adjust the music according to the music feature;
the audio output unit 2003 is configured to play the music file.
Optionally, the processor 2010 is further configured to generate a second image corresponding to the music file;
the display unit 2006 is further configured to display the second image, and the audio output unit 2003 is further configured to play the music file.
Optionally, the user input unit 2007 is further configured to receive a second input, where the second input is an input for selecting a preset video template;
the processor 2010 is further configured to determine a target video template in response to the second input, and to generate the second image according to the target video template and the salient target texture map.
Optionally, the processor 2010 is further configured to generate a target animation through the piano roll graphical interface, where the target animation is used to show the playback progress of the music, and to generate the second image according to the target animation and the salient target texture map.
The embodiments of this application construct music from an image so that the resulting music matches the image carrying the user's memories. On the one hand, this lowers the threshold for music creation so that novice users without music theory knowledge can also construct corresponding music from pictures; on the other hand, the track blocks are displayed in the MIDI information coordinate system, visualizing the finally constructed music and giving users a unique dual auditory and visual experience.
It should be understood that, in the embodiments of this application, the input unit 2004 may include a graphics processing unit (GPU) 20041 and a microphone 20042; the graphics processor 20041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode.
The display unit 2006 may include a display panel 20061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 2007 includes a touch panel 20071 and other input devices 20072. The touch panel 20071, also called a touch screen, may include two parts: a touch detection device and a touch controller. Other input devices 20072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not repeated here. The memory 2009 can be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 2010 may integrate an application processor, which mainly handles the operating system, user interface, and application programs, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 2010.
An embodiment of this application further provides a readable storage medium on which a program or instruction is stored; when executed by a processor, the program or instruction implements the various processes of the above method embodiments with the same technical effect, which is not repeated here to avoid repetition.
The processor is the processor in the electronic device of the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
An embodiment of this application further provides a chip, which includes a processor and a communication interface coupled to the processor; the processor is used to run a program or instruction to implement the various processes of the above method embodiments with the same technical effect, which is not repeated here to avoid repetition.
It should be understood that the chip mentioned in the embodiments of this application may also be called a system-on-chip, a system chip, a chip system, or a system-on-a-chip chip.
It should be noted that, as used herein, the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element. In addition, it should be pointed out that the scope of the methods and apparatuses in the implementations of this application is not limited to performing functions in the order shown or discussed; functions may also be performed in a substantially simultaneous manner or in reverse order according to the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
Through the above description of the implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a computer software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk), including several instructions to cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the various embodiments of this application.
The embodiments of this application have been described above with reference to the accompanying drawings, but this application is not limited to the specific implementations described above, which are illustrative rather than restrictive. Inspired by this application, those of ordinary skill in the art can devise many other forms without departing from the purpose of this application and the scope protected by the claims, all of which fall within the protection of this application.

Claims (15)

  1. A music file generation method, comprising:
    acquiring a first image;
    performing feature extraction on the first image to obtain salient features of the first image;
    mapping the salient features into a musical instrument digital interface information coordinate system based on positions of the salient features in the first image, and determining musical instrument digital interface information corresponding to the salient features, wherein the musical instrument digital interface information coordinate system is used to indicate a correspondence between the musical instrument digital interface information and time;
    generating a music file based on the correspondence between the musical instrument digital interface information and time.
  2. The music file generation method according to claim 1, wherein image content of the first image comprises a salient target, and the salient features comprise at least one of:
    key points of the salient target, edge feature points of the salient target.
  3. The music file generation method according to claim 2, wherein performing feature extraction on the first image to obtain the salient features of the first image comprises:
    performing target segmentation on the first image through a convolutional neural network to obtain the salient target in the first image and the edge feature points of the salient target;
    performing key point extraction on the salient target to obtain the key points of the salient target.
  4. The music file generation method according to claim 2, wherein before mapping the salient features into the musical instrument digital interface information coordinate system based on the positions of the salient features in the first image, the music file generation method further comprises:
    generating a salient target texture map corresponding to the first image according to the salient features;
    determining the positions of the salient features in the first image according to the salient target texture map.
  5. The music file generation method according to claim 4, wherein generating the salient target texture map corresponding to the first image according to the salient features comprises:
    performing edge detection on the first image according to the edge feature points and the Canny edge detection algorithm to obtain an edge image of the salient target;
    generating a salient target map corresponding to the salient target according to the key points and the edge feature points;
    performing image superposition on the edge image and the salient target map to obtain the salient target texture map corresponding to the first image.
  6. The music file generation method according to claim 4, wherein determining the positions of the salient features in the first image according to the target texture map comprises:
    dividing the target texture map into X rows and Y columns, i.e., X-by-Y graphic units, wherein X and Y are both integers greater than 1, each graphic unit comprises at least one of bright pixels and dark pixels, a bright pixel is a pixel with a brightness value of 1, and a dark pixel is a pixel with a brightness value of 0;
    among the X-by-Y graphic units, determining target graphic units in which the proportion of bright pixels is greater than a preset ratio, to obtain N target graphic units, wherein the number of salient features of the first image is N, the N target graphic units correspond one-to-one to the N salient features, and N is a positive integer;
    determining a first ordinate of each salient feature in the first image according to the row number of each of the N target graphic units among the X-by-Y graphic units;
    determining a first abscissa of each salient feature in the first image according to the column number of each of the N target graphic units among the X-by-Y graphic units;
    determining the positions of the salient features in the first image according to the abscissas and the ordinates of the salient features.
  7. The music file generation method according to claim 6, wherein mapping the salient features into the musical instrument digital interface information coordinate system based on the positions of the salient features in the first image comprises:
    converting the first ordinate into the musical instrument digital interface information coordinate system to obtain a second ordinate of the salient feature in the musical instrument digital interface information coordinate system;
    converting the first abscissa into the musical instrument digital interface information coordinate system to obtain a second abscissa of the salient feature in the musical instrument digital interface information coordinate system;
    mapping the N salient features into the musical instrument digital interface information coordinate system according to the second ordinates and the second abscissas, to obtain N track blocks corresponding one-to-one to the N salient features.
  8. The music file generation method according to claim 7, wherein a track block contains the musical instrument digital interface information, and the musical instrument digital interface information is determined according to the second ordinate corresponding to the track block;
    wherein the musical instrument digital interface information comprises at least one of the following: pitch, timbre, volume.
  9. The music file generation method according to any one of claims 4 to 8, further comprising:
    receiving a first input, wherein the first input is an input for selecting preset music features;
    determining a target music feature in response to the first input, wherein the target music feature comprises at least one of the following: music style, music mood, music genre;
    adjusting the music according to the music feature;
    playing the music file.
  10. The music file generation method according to claim 9, further comprising:
    generating a second image corresponding to the music file;
    wherein playing the music comprises:
    displaying the second image and playing the music.
  11. The music file generation method according to claim 10, wherein generating the second image corresponding to the music comprises:
    receiving a second input, wherein the second input is an input for selecting a preset video template;
    determining a target video template in response to the second input;
    generating the second image according to the target video template and the salient target texture map.
  12. The music file generation method according to claim 10, wherein generating the second image corresponding to the music comprises:
    generating a target animation through a piano roll graphical interface, wherein the target animation is used to show the playback progress of the music;
    generating the second image according to the target animation and the salient target texture map.
  13. A music file generation apparatus, comprising:
    an acquisition module, configured to acquire a first image;
    an extraction module, configured to perform feature extraction on the first image to obtain salient features of the first image;
    a processing module, configured to map the salient features into a musical instrument digital interface information coordinate system based on positions of the salient features in the first image, and to determine musical instrument digital interface information corresponding to the salient features, wherein the musical instrument digital interface information coordinate system is used to indicate a correspondence between the musical instrument digital interface information and time;
    a generation module, configured to generate a music file based on the correspondence between the musical instrument digital interface information and time.
  14. An electronic device, comprising a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the music file generation method according to any one of claims 1 to 12.
  15. A readable storage medium, wherein a program or instruction is stored on the readable storage medium, and the program or instruction, when executed by a processor, implements the steps of the music file generation method according to any one of claims 1 to 12.
PCT/CN2022/100969 2021-07-23 2022-06-24 Music file generation method and apparatus, electronic device, and storage medium WO2023000917A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22845082.1A EP4339809A1 (en) 2021-07-23 2022-06-24 Method and apparatus for generating music file, and electronic device and storage medium
US18/545,825 US20240127777A1 (en) 2021-07-23 2023-12-19 Method and apparatus for generating music file, and electronic device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110839656.2 2021-07-23
CN202110839656.2A CN115687668A (zh) Music file generation method and apparatus, electronic device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/545,825 Continuation US20240127777A1 (en) 2021-07-23 2023-12-19 Method and apparatus for generating music file, and electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2023000917A1 (zh)

Family

ID=84980085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100969 WO2023000917A1 (zh) 2021-07-23 2022-06-24 Music file generation method and apparatus, electronic device, and storage medium

Country Status (4)

Country Link
US (1) US20240127777A1 (zh)
EP (1) EP4339809A1 (zh)
CN (1) CN115687668A (zh)
WO (1) WO2023000917A1 (zh)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1287320A (zh) * 1999-09-03 2001-03-14 北京航空航天大学 一种将图像信息转换成音乐的方法
JP2004286918A (ja) * 2003-03-20 2004-10-14 Yamaha Corp 楽音形成端末装置、サーバ装置及びプログラム
JP2004287144A (ja) * 2003-03-24 2004-10-14 Yamaha Corp 音楽再生と動画表示の制御装置およびそのプログラム
US20060156906A1 (en) * 2005-01-18 2006-07-20 Haeker Eric P Method and apparatus for generating visual images based on musical compositions
CN113035158A (zh) * 2021-01-28 2021-06-25 深圳点猫科技有限公司 一种在线midi音乐编辑方法、系统及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xiaoying Wu; Ze-Nian Li: "A study of image-based music composition", Multimedia and Expo, 2008 IEEE International Conference on, IEEE, Piscataway, NJ, USA, 23 June 2008, pages 1345-1348, XP031312979, ISBN: 978-1-4244-2570-9 *

Also Published As

Publication number Publication date
CN115687668A (zh) 2023-02-03
EP4339809A1 (en) 2024-03-20
US20240127777A1 (en) 2024-04-18

Similar Documents

Publication Publication Date Title
CN110941954B (zh) Text broadcasting method and apparatus, electronic device, and storage medium
JP7408048B2 (ja) Artificial-intelligence-based animated character driving method and related apparatus
TWI486904B Rhythm visualization method, system, and computer-readable recording medium
CN111464834B Video frame processing method and apparatus, computing device, and storage medium
CN109785820A Processing method, apparatus, and device
JP2021192222A (ja) Video interaction method and apparatus, electronic device, computer-readable storage medium, and computer program
CN104574453A Software for expressing music through images
CN112562705A Live-streaming interaction method and apparatus, electronic device, and readable storage medium
CN112235635B Animation display method and apparatus, electronic device, and storage medium
WO2019040524A1 (en) Method and system for musical communication
US11511200B2 (en) Game playing method and system based on a multimedia file
CN112309365A Training method and apparatus for a speech synthesis model, storage medium, and electronic device
CN116484318A Speech training feedback method, apparatus, and storage medium
CN116630495A Virtual digital human model planning system based on an AIGC algorithm
WO2017168260A1 (ja) Information processing device, program, and information processing system
Solah et al. Mood-driven colorization of virtual indoor scenes
CN112435641B Audio processing method and apparatus, computer device, and storage medium
WO2023000917A1 (zh) Music file generation method and apparatus, electronic device, and storage medium
US20220335974A1 (en) Multimedia music creation using visual input
CN114786030B Anchor picture display method and apparatus, electronic device, and storage medium
JP7466087B2 (ja) Estimation device, estimation method, and estimation system
JP2024523396A (ja) Music file generation method and apparatus, electronic device, and storage medium
Chen et al. New Enhancement Techniques for Optimizing Multimedia Visual Representations in Music Pedagogy.
WO2023001115A1 (zh) Video generation method, electronic device, and medium therefor
JP7339420B1 (ja) Program, method, and information processing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22845082

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022845082

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022845082

Country of ref document: EP

Effective date: 20231212

ENP Entry into the national phase

Ref document number: 2023577867

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE