WO2023000917A1 - Music file generation method, generation device, electronic device and storage medium - Google Patents
Music file generation method, generation device, electronic device and storage medium
- Publication number: WO2023000917A1
- Application: PCT/CN2022/100969
- Authority: WIPO (PCT)
- Prior art keywords
- image
- salient
- music
- generating
- target
- Prior art date
Classifications
- G06V10/82—Image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06F16/44—Browsing; Visualisation therefor (retrieval of multimedia data)
- G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/483—Retrieval characterised by using metadata automatically derived from the content
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T7/00—Image analysis
- G06T7/13—Edge detection
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/764—Arrangements for image or video recognition or understanding using classification, e.g. of video objects
- G10G1/02—Chord or note indicators, fixed or adjustable, for keyboards or fingerboards
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
- G10H1/0066—Transmission between separate instruments or between individual components of a musical system using a MIDI interface
- G10H1/368—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems, displaying animated or moving pictures synchronized with the music or audio part
Definitions
- the application belongs to the field of computer technology, and in particular relates to a music file generation method, a generation device, electronic equipment, and a storage medium.
- music creation has a relatively high threshold, and it is difficult for ordinary users to participate in it.
- the music created is generally regarded as an auditory art.
- the music itself establishes a connection with the listener's sense of hearing, but not with the most important human sense, vision, resulting in a one-dimensional user experience in the process of creating music.
- the purpose of the embodiments of the present application is to provide a music file generation method, generation device, electronic device, and storage medium that can generate music based on visualized images and give users a unique dual experience of hearing and vision.
- the embodiment of the present application provides a method for generating a music file, including:
- the salient features are mapped to the musical instrument digital interface (MIDI) information coordinate system, and the MIDI information corresponding to the salient features is determined; the MIDI information coordinate system is used to indicate the correspondence between MIDI information and time;
- the music file is generated based on the correspondence between the MIDI information and time.
- the embodiment of the present application provides a device for generating music files, including:
- An acquisition module configured to acquire the first image
- the extraction module is configured to perform feature extraction on the first image to obtain the salient features of the first image;
- the processing module is configured to map the salient features to the musical instrument digital interface information coordinate system based on the positions of the salient features in the first image, and to determine the MIDI information corresponding to the salient features; the MIDI information coordinate system is used to indicate the correspondence between MIDI information and time;
- the generation module is configured to generate music files based on the correspondence between the MIDI information and time.
- the embodiment of the present application provides an electronic device, including a processor, a memory, and a program or instruction stored in the memory and executable on the processor; when the program or instruction is executed by the processor, the steps of the method in the first aspect are implemented.
- an embodiment of the present application provides a readable storage medium, on which a program or an instruction is stored, and when the program or instruction is executed by a processor, the steps of the method in the first aspect are implemented.
- the embodiment of the present application provides a chip; the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run programs or instructions to implement the steps of the method in the first aspect.
- image information, such as a photo or a video, is converted into a visualized electronic score file by processing the image, that is, the above-mentioned first image. Specifically, audio track blocks are displayed in the musical instrument digital interface (MIDI) coordinate system, and these audio track blocks constitute the salient features of the first image; that is, the graph formed by the audio track blocks matches the salient features of the first image.
- these track blocks all include musical instrument digital interface (MIDI) information. After the MIDI information is recognized by a computer, the track blocks are played in chronological order according to the correspondence between MIDI information and time, thereby forming music.
- the embodiment of this application constructs music from images, so that the resulting music matches the images containing the user's memories.
- the threshold for music creation is lowered, so that "novice" users without music theory knowledge can also construct corresponding music from pictures.
- the music displays audio track blocks through the MIDI information coordinate system, making the final music visible as well as audible, giving users a unique dual experience of hearing and vision.
- Fig. 1 shows the first flowchart of the method for generating a music file according to an embodiment of the present application;
- Fig. 2 shows a schematic interface diagram of the MIDI information coordinate system in the method for generating a music file according to an embodiment of the present application;
- Fig. 3 shows a schematic diagram of a salient target texture map in the method for generating a music file according to an embodiment of the present application;
- Fig. 4 shows the second flowchart of the method for generating a music file according to an embodiment of the present application;
- Fig. 5 shows a division diagram of the target texture map in the method for generating a music file according to an embodiment of the present application;
- Fig. 6 shows the third flowchart of the method for generating a music file according to an embodiment of the present application;
- Fig. 7 shows a schematic diagram of a piano roll graphical interface in the method for generating a music file according to an embodiment of the present application;
- Fig. 8 shows a structural block diagram of the music file generating device according to an embodiment of the present application;
- Fig. 9 shows a structural block diagram of an electronic device according to an embodiment of the present application;
- Fig. 10 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
- FIG. 1 shows one of the flowcharts of a method for generating a music file according to an embodiment of the application. As shown in FIG. 1 , the method includes :
- Step 102 acquiring a first image
- Step 104 performing feature extraction on the first image to obtain salient features of the first image
- Step 106: based on the positions of the salient features in the first image, map the salient features to the musical instrument digital interface information coordinate system, and determine the MIDI information corresponding to the salient features; the MIDI information coordinate system is used to indicate the correspondence between the MIDI information and time;
- Step 108: generate a music file based on the correspondence between the MIDI information and time.
- the first image is specifically the "memory image" selected by the user.
- the user can obtain the first image by uploading a photo or video saved locally to the client, and the user can also obtain the first image by taking a photo or recording a video with a camera of an electronic device such as a mobile phone.
- the first image can be obtained by extracting frames from the video.
- a frame may be randomly extracted from the video, or the content of the video may be identified through a neural network model, so as to determine an image frame that can reflect the theme of the video for extraction.
- acquiring the first image specifically includes: receiving a third input, where the third input is an input for selecting the first image; and determining the first image in response to the third input.
- acquiring the first image specifically includes: receiving a fourth input, where the fourth input is an input for shooting a video; in response to the fourth input, shooting the video to be processed; and performing frame extraction on the video to be processed to obtain the first image.
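The frame-extraction step above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, it only chooses which frame indices to extract (evenly spaced), and decoding the actual video would be done with a media library; as noted above, the embodiment may instead pick a random frame or score frames with a neural network.

```python
def pick_frame_indices(total_frames: int, num_samples: int = 1) -> list[int]:
    """Pick evenly spaced frame indices from a video of `total_frames` frames.

    A stand-in for the frame-extraction step: a real implementation would
    decode the video with a media library, and could instead score frames
    with a neural network to find one that reflects the video's theme.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    # Center each sampled frame in its own equal-length segment of the video.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]
```

For a 100-frame video, `pick_frame_indices(100, 1)` selects the middle frame, index 50.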
- feature extraction is further performed on the first image, so that the salient features of the first image are extracted from it. For example, if the first image is a "face" picture, the salient features of the first image are the outline of the face, the positions of the facial features, and so on. If the first image is a full-body or half-body "portrait" picture, the salient features of the first image are the silhouette, posture, etc. of the person in it.
- if the first image shows an animal or a child, the salient features of the first image may be the animal's or child's body shape and facial features. If the first image is a "still" object (stationary object) such as a building, a vehicle, or a landscape, the salient features of the first image may be the overall appearance and prominent features of these still objects.
- the salient features are mapped into the musical instrument digital interface (MIDI) information coordinate system, so that each image unit of the salient features is formed as a track block in the MIDI information coordinate system.
- the MIDI information coordinate system is used to indicate the correspondence between MIDI information and time, that is, the relationship between the MIDI information corresponding to a track block and time.
- Fig. 2 shows a schematic interface diagram of the MIDI information coordinate system of the method for generating music files according to the embodiment of the present application.
- the first image is specifically a face image;
- the salient features of the face image are mapped to a plurality of audio track blocks 202 in the MIDI information coordinate system 200;
- the plurality of audio track blocks 202 form a shape similar to a human face in the MIDI information coordinate system, and this face shape corresponds to the salient features of the first image.
- these audio track blocks corresponding to the salient features have musical instrument digital interface information, that is, MIDI information.
- MIDI information is specifically information that can be recognized by a computer device and played as "sound".
- digital signals corresponding to information such as pitch, timbre, and volume are obtained, thereby forming a musical motive. According to the correspondence between these salient features and time, that is, the correspondence between these musical motives and time, the "sounds" corresponding to these musical motives are played in sequence, thereby forming a piece of music: unique music generated according to the "memory image" selected by the user, that is, the first image.
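As a rough illustration of how track blocks could be ordered by time and turned into playable note events, consider the sketch below. The `TrackBlock` type, its field names, and the simple `base_note + row` pitch rule are assumptions made for illustration, not the patent's specified mapping; real output would additionally carry timbre and volume and would be written out through a MIDI library.

```python
from dataclasses import dataclass


@dataclass
class TrackBlock:
    """One audio track block in the MIDI-information coordinate system."""
    time_step: int   # horizontal position: when the note sounds
    pitch_row: int   # vertical position: which pitch the note maps to


def blocks_to_note_events(blocks, base_note=48):
    """Order track blocks chronologically and attach a MIDI note number.

    A minimal, hypothetical rendering step: blocks are sorted by their time
    coordinate (then by row, for simultaneous notes) and each becomes a
    (time, note) event, so that playing the events in order reproduces the
    image-derived melody.
    """
    events = []
    for block in sorted(blocks, key=lambda b: (b.time_step, b.pitch_row)):
        events.append((block.time_step, base_note + block.pitch_row))
    return events
```

For example, `blocks_to_note_events([TrackBlock(1, 2), TrackBlock(0, 5)])` returns `[(0, 53), (1, 50)]`: the block further left plays first.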
- the embodiment of this application constructs music from images, so that the resulting music matches the images containing the user's memories.
- the threshold for music creation is lowered, so that "novice" users without music theory knowledge can also construct corresponding music from pictures.
- the music displays audio track blocks through the MIDI information coordinate system, making the final music visible as well as audible, giving users a unique dual experience of hearing and vision.
- the image content of the first image includes salient objects
- the salient features include at least one of the following: key points of the salient objects, and edge feature points of the salient objects.
- the salient object is the main object in the image content of the first image.
- for a face picture, the salient object is the "human face".
- for a building picture, the salient object is the "building".
- salient features specifically include the key points of salient objects: the key points of a human face are the "five sense organs" (facial features), and the key points of a building are its characteristic design elements, such as "windows" and "doorways".
- Salient features may also include edge feature points of salient objects, and these edge feature points will form contours of salient objects, such as human face contours or building contours.
- in this way, a "simplified map" of the salient objects can be formed, through which the viewer can associate the image with the original: the subject being photographed, such as "someone" or "a certain building", evokes memories in the viewer.
- feature extraction is performed on the first image to obtain salient features of the first image, including:
- the key points of the salient objects are extracted, and the key points of the salient objects are obtained.
- when performing feature extraction on the first image, the first image may first be segmented through a pre-trained convolutional neural network.
- the object of object segmentation is to segment out salient objects in the first image.
- a preset convolutional neural network can be trained with a large number of pre-labeled training sets, so that the trained network can identify salient objects in pictures. For example, for portrait pictures, a training set can be generated from a large number of original face pictures paired with salient target pictures containing only the "face" after the face region has been cut out. The convolutional neural network is trained on this set and iterates continuously; when it can identify the salient target and the edges of the salient target in a picture with reasonable accuracy, it is judged ready to be put into use.
- the convolutional neural network trained by the above method is used to perform artificial intelligence recognition on the first image, thereby judging the salient objects and the edges of the salient objects, and obtaining the edge feature points of the salient objects.
- the specific type of the salient object is judged, such as "face", "animal", or "building", so as to determine the corresponding key-point extraction granularity; the key points of the salient object are then extracted at that granularity, for example the facial features of a face.
- the application extracts the salient features of the salient objects in the first image through the trained convolutional neural network, specifically the key points and edge feature points of the salient objects. The salient features can thus be obtained quickly and accurately, which improves the processing speed of generating music from images and helps improve the user experience.
- before mapping the salient features into the MIDI information coordinate system based on the positions of the salient features in the first image, the music file generating method further includes:
- the location of the salient feature in the first image is determined.
- a salient target texture map corresponding to the first image is generated.
- the salient object texture map, that is, an image that, within the first image, displays only the salient features of the salient object.
- the salient object texture map includes only two types of pixels, wherein the first type of pixels are pixels for displaying salient features, and the second type of pixels are pixels at non-salient feature positions.
- Fig. 3 shows a schematic diagram of a salient object texture map according to a method for generating a music file according to an embodiment of the present application.
- the first image is a human face image
- the salient object therein is a human face.
- the salient object texture map looks like a sketch of a human face.
- the salient target texture map is an image obtained by processing the first image so that only the salient features are displayed;
- the salient features can be determined from the salient target texture map and then mapped to the MIDI information coordinate system, realizing the conversion from image to MIDI electronic score, and finally to music, thus achieving "from image to music" and giving users a unique experience.
- FIG. 4 shows the second flow chart of the music file generation method according to the embodiment of the present application.
- the step of generating the salient target texture map corresponding to the first image includes the following steps:
- Step 402 Perform edge detection on the first image according to the edge feature points and the Canny edge detection algorithm to obtain the edge image of the salient target;
- Step 404 generating a salient object map corresponding to the salient object according to the key points and edge feature points;
- Step 406 performing image superposition on the edge image and the salient object map to obtain the salient object texture map corresponding to the first image.
- edge detection is performed by using the Canny edge detection algorithm according to the edge feature points.
- the Canny edge detection algorithm is a multi-stage edge detection algorithm developed by John F. Canny in 1986.
- the first image is first subjected to Gaussian filtering, that is, a Gaussian matrix is used to take the weighted average of each pixel and its neighborhood as the gray value of that pixel. Then the gradient magnitude and gradient direction are calculated and non-maximum values are filtered out; finally, edge detection is performed with a set threshold range to obtain the edge image of the salient target.
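A deliberately simplified sketch of the gradient-and-threshold stages is shown below, using a plain Sobel operator on a grayscale image stored as nested lists. It omits the Gaussian smoothing, non-maximum suppression, and double-threshold hysteresis of the full Canny pipeline; in practice a library implementation (for example OpenCV's Canny) would be used.

```python
def sobel_edges(image, threshold=100):
    """Mark edge pixels in a grayscale image (rows of 0-255 ints).

    Computes the Sobel gradient magnitude at each interior pixel and keeps
    pixels whose magnitude meets `threshold`. This is only the gradient
    stage of an edge detector; full Canny additionally smooths the image,
    thins edges via non-maximum suppression, and links them via hysteresis.
    """
    h, w = len(image), len(image[0])
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-gradient kernel
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-gradient kernel
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gx_k[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(gy_k[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges
```

On a synthetic image with a sharp vertical boundary, the marked pixels cluster along that boundary, which is exactly the "edge image" the step above produces.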
- a salient object map corresponding to the salient object is generated, that is, a feature map formed by the key points and the edge feature points.
- the edge image and the salient object map are superimposed, which is equivalent to drawing each key point and the contour together; finally, a salient object texture map with clear contours is obtained.
- determining the position of the salient feature in the first image includes:
- dividing the salient target texture map into X rows and Y columns of graphics units, where X and Y are both integers greater than 1; each graphics unit includes at least one of bright pixels and dark pixels, a bright pixel being a pixel with a brightness value of 1 and a dark pixel being a pixel with a brightness value of 0;
- determining N target graphics units corresponding one-to-one with the N salient features of the first image, where N is a positive integer;
- determining the position of the salient feature in the first image according to the positions of the target graphics units.
- the target texture map is divided into X rows and Y columns to obtain an X × Y graphics matrix, which includes X × Y graphics units.
- in each graphics unit there are multiple pixels, including bright pixels and dark pixels.
- Bright pixels are pixels used to display salient features, and their brightness value is 1.
- dark pixels are pixels outside the salient features, whose brightness value is 0, meaning "pure black" is displayed.
- the proportion of bright pixels in each graphics unit is judged respectively. For example, if the number of pixels in a graphics unit is 10, including 6 bright pixels and 4 dark pixels, the bright-pixel ratio of that graphics unit is 0.6.
- it is judged whether the ratio of bright pixels in each graphics unit is greater than a preset ratio, where the preset ratio is greater than or equal to 0.2, preferably 0.4.
- taking the preset ratio of 0.4 as an example, if 4 or more of the 10 pixels in a graphics unit are bright pixels, the graphics unit is marked as a target graphics unit, indicating that it contains salient features.
- FIG. 5 shows a schematic diagram of the division of the target texture map according to the music file generation method of the embodiment of the present application. As shown in FIG. 5, the units filled with hatching are target graphics units 504, that is, units containing salient features.
- if graphics unit 506 is located in the fourth column and the second row, the salient feature corresponding to graphics unit 506 can be determined to have a first abscissa of 4x and a first ordinate of 2y in the first image.
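The grid division, bright-pixel ratio test, and position calculation can be sketched as follows. The function names are hypothetical, the image is assumed to divide evenly into the grid, and the `(column + 1) × unit width` position rule is inferred from the "fourth column, second row gives abscissa 4x, ordinate 2y" example above.

```python
def find_target_units(texture, rows, cols, min_ratio=0.4):
    """Divide a binary texture map (rows of 0/1 pixels) into a rows x cols
    grid and return (row, col) indices of units whose bright-pixel ratio
    reaches `min_ratio` (0.4 is the preferred preset ratio in the text).
    Assumes the image dimensions divide evenly by the grid.
    """
    h, w = len(texture), len(texture[0])
    uh, uw = h // rows, w // cols
    targets = []
    for r in range(rows):
        for c in range(cols):
            bright = sum(texture[y][x]
                         for y in range(r * uh, (r + 1) * uh)
                         for x in range(c * uw, (c + 1) * uw))
            if bright / (uh * uw) >= min_ratio:
                targets.append((r, c))
    return targets


def unit_to_position(row, col, unit_h, unit_w):
    """First abscissa/ordinate of a target unit, following the example's
    1-based 'column times unit width, row times unit height' convention."""
    return ((col + 1) * unit_w, (row + 1) * unit_h)
```

With a 4 × 4 texture whose top-left 2 × 2 block is bright, a 2 × 2 grid marks only unit (0, 0) as a target at the default 0.4 ratio.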
- the target graphics units are determined according to the bright-pixel ratio in each of the divided X × Y graphics units, and each target graphics unit is taken as a salient feature and mapped to the MIDI information coordinate system. This realizes the transformation from image to MIDI electronic score, and then from image to music; at the same time, the music is visualized, giving users a dual experience of hearing and vision.
- mapping the salient features to the MIDI information coordinate system includes:
- mapping the N salient features to the MIDI information coordinate system to obtain N audio track blocks corresponding one-to-one with the N salient features.
- the first abscissa and first ordinate of each salient feature obtained above can be converted synchronously into a second abscissa and second ordinate in the MIDI information coordinate system, thereby mapping the salient features into the MIDI information coordinate system.
- the N salient features are mapped to the MIDI information coordinate system to obtain N track blocks corresponding one-to-one with the N salient features, and the N track blocks are processed through the musical instrument digital interface program.
- visualized music can be obtained.
- On the one hand, the image features of the salient objects in the first image are preserved; on the other hand, unique music corresponding to the salient objects in the first image can be generated.
- the MIDI information coordinate system is used to indicate the corresponding relationship between musical instrument digital interface information and time. Therefore, from a salient feature, that is, from the coordinates of an audio track block in the MIDI information coordinate system, the MIDI information and time information of that audio track block can be determined. After the computer program recognizes the MIDI information and time information of the track block, it can convert them into a music motive.
- This music motive has sound attributes such as timbre, pitch and volume, and also has the time attribute of beat. The multiple audio track blocks corresponding to the multiple salient features are played according to their MIDI information and time information, finally yielding the music converted from the first image, that is, music that matches the user's "recall image" and satisfies the user's demand for unique music creation.
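The conversion of salient features into track blocks carrying MIDI information and time information can be sketched as follows. The time step, base pitch, and note duration below are illustrative assumptions, not values disclosed in the application:

```python
BASE_PITCH = 60     # assumed: MIDI middle C as the origin of the pitch axis
TIME_STEP = 0.25    # assumed: seconds per unit of the second abscissa
NOTE_LENGTH = 0.25  # assumed: duration of each track block

def features_to_track_blocks(features):
    """features: list of (first_abscissa, first_ordinate) pairs.
    Returns track blocks as (start_time, pitch, duration) tuples,
    i.e. the correspondence between MIDI information and time."""
    blocks = []
    for abscissa, ordinate in features:
        start = (abscissa - 1) * TIME_STEP    # second abscissa -> time
        pitch = BASE_PITCH + (ordinate - 1)   # second ordinate -> pitch
        blocks.append((start, pitch, NOTE_LENGTH))
    return sorted(blocks)                     # play in time order
```

A MIDI playback program would then emit each block's pitch at its start time, producing the music converted from the first image.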
- the audio track block includes the musical instrument digital interface information, which is determined according to the second ordinate corresponding to the audio track block; the musical instrument digital interface information includes at least one of the following: pitch, timbre, volume.
- FIG. 6 shows the third flowchart of the method for generating music files according to the embodiment of the present application. As shown in FIG. 6, the method also includes:
- Step 602: receive a first input;
- the first input is an input for selecting a preset music feature. In this step, the first input is a user input received through a human-computer interaction component, and the first input includes one of: touch input, biometric input, click input, somatosensory input, voice input, keyboard input, or press input; wherein touch input includes but is not limited to touch, slide, or specific touch gestures; biometric input includes but is not limited to biometric information input such as fingerprints, irises, voiceprints, or facial recognition; click input includes but is not limited to mouse clicks, switch clicks, etc.; somatosensory input includes but is not limited to shaking or flipping the electronic device; press input includes but is not limited to press input on the touch screen, the frame, the back cover, or other parts of the electronic device.
- Step 604: in response to the first input, determine the target music feature;
- the target music features include at least one of the following: music style, music mood, music genre;
- Step 606: adjust the music according to the target music feature;
- Step 608: play the music file.
- The user can select a target music feature from a plurality of preset music features, so as to adjust the music generated from the first image accordingly.
- The target music features include music style, such as pop music, classical music, or electronic music; music mood, such as passionate, deep, or soothing; and music genre, such as rock, jazz, or blues.
- The music generated from the first image is adjusted so that it better matches the music features selected by the user. For example, if the user selects classical, soothing, and blues, the volume of the mid and low frequencies is adjusted, and the time interval of the second abscissa is adjusted as well, making the rhythm of the music slower and more soothing.
- further post-processing can be performed on the second ordinate of the track block in the MIDI coordinate system according to the preset music theory data and acoustic data.
- A key can be set in advance, and a range between the highest and lowest scale degrees can be specified. If the pitch of a track block within a certain period exceeds this range, it is adjusted according to certain adjustment rules.
- Adjusting the pitch of a track block means adjusting out-of-range tones into range, for example lowering the pitch of a track block above the highest scale threshold by one octave, or raising the pitch of a track block below the lowest scale threshold by one octave, so that the adjusted music better conforms to music theory.
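The octave adjustment described above can be sketched as follows, assuming MIDI semitone pitches where one octave equals 12 semitones; the range bounds are example values, not part of the disclosure:

```python
def fold_into_range(pitch, lowest=48, highest=72):
    """Fold an out-of-range pitch back into [lowest, highest] by whole
    octaves: lower a too-high pitch, raise a too-low one."""
    while pitch > highest:
        pitch -= 12   # lower by one octave
    while pitch < lowest:
        pitch += 12   # raise by one octave
    return pitch
```

Applying this rule to every track block keeps the generated melody within the preset key's range while preserving its pitch classes.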
- The adjusted music can be played automatically, so that the user can immediately enjoy the music generated from the selected "memory photo" and experience the joy of music creation.
- the method for generating a music file further includes: generating a second image corresponding to the music;
- Playing the music file includes: displaying the second image and playing the music file.
- a second image corresponding to the music file to be played may also be generated, and the second image is displayed while the music file is played, so that the user can experience visual and auditory enjoyment at the same time.
- the second image may be a static picture generated according to the first image selected by the user, or the salient feature texture map corresponding to the first image, and the static picture and the playing progress of the music are displayed when the music file is played.
- The second image can also be an animation file generated according to a preset template or according to the playback interface of the MIDI information coordinate system; the animation duration matches the duration of the generated music, and the animation is played while the music file is played, further enhancing the user's visual experience.
- a second image is generated according to the target video template and the salient target texture map.
- By receiving the user's second input, the target video template selected by the second input and the salient target texture map corresponding to the first image can be used to generate the second image, which serves as the background when the music is played.
- the video template may be a coherent animation template, or a "slideshow" in which multiple static pictures are displayed in turn.
- the salient target texture map corresponding to the first image is superimposed and displayed, so that when the user sees the second image, the memory of the first image can be recalled and user experience can be improved.
- the second input is a user input received through the human-computer interaction component, and the second input includes one of: touch input, biometric input, click input, somatosensory input, voice input, keyboard input, or press input;
- touch input includes but is not limited to touch, slide, or specific touch gestures;
- biometric input includes but is not limited to biometric information input such as fingerprint, iris, voiceprint, or facial recognition;
- click input includes but is not limited to mouse clicks, switch clicks, etc.;
- somatosensory input includes but is not limited to shaking or flipping the electronic device, etc.;
- press input includes but is not limited to press input on the touch screen, the frame, the back cover, or other parts of the electronic device.
- the embodiment of the present application does not limit the specific form of the second input.
- generating a second image corresponding to the music file includes:
- a second image is generated based on the target animation and the salient target texture map.
- the target animation is generated through the piano roll graphical interface, wherein the target animation is the process of playing the audio track block in the MIDI file in the piano roll graphical interface.
- FIG. 7 shows a schematic diagram of the piano roll graphical interface in the method for generating music files according to an embodiment of the present application, in which the keys 702 of the piano animation image are on the left, and each track block 704 in the interface gradually moves toward the keys 702 on the left according to its corresponding time information.
- The background of the interface is set according to the salient target texture map corresponding to the first image and used as the background image of the second image, so that an explicit visual connection is established between the second image and the first image; while listening to the music, the user watches the second image associated with the "reminiscence image", thereby arousing the user's memory and enriching the user's visual experience.
- An acquisition module 802 configured to acquire a first image
- the processing module 806 is configured to map the salient features in the first image to the musical instrument digital interface information coordinate system and determine the musical instrument digital interface information corresponding to the salient features; the musical instrument digital interface information coordinate system is used to indicate the correspondence between musical instrument digital interface information and time;
- the generating module 808 is configured to generate music files based on the correspondence between the musical instrument digital interface information and time.
- the first image is specifically the "memory image" selected by the user.
- the user can obtain the first image by uploading a photo or video saved locally to the client, and the user can also obtain the first image by taking a photo or recording a video with a camera of an electronic device such as a mobile phone.
- the first image can be obtained by extracting frames from the video.
- a frame may be randomly extracted from the video, or the content of the video may be identified through a neural network model, so as to determine an image frame that can reflect the theme of the video for extraction.
- acquiring the first image specifically includes: receiving a third input, wherein the third input is an input for selecting the first image; in response to the third input, determining the first image.
- acquiring the first image specifically includes: receiving a fourth input, where the fourth input is an input for shooting a video; in response to the fourth input, shooting the video to be processed; and performing frame extraction on the video to be processed to obtain the first image.
- feature extraction is further performed on the first image, so that salient features of the first image are extracted from the first image. For example, if the first image is a "face” picture, the salient features of the first image are the outline of the face, the position of facial features, etc. in the first image. If the first image is a "portrait" picture of a full body or a half body, then the salient features of the first image are the silhouette, posture, etc. of the person in it.
- the salient features of the first image may be the animal's or child's body shape and facial features. If the first image is a "still" object such as a building, a vehicle, or a landscape, then the salient features of the first image may be the overall appearance and prominent devices of these still objects.
- The salient features are mapped in the MIDI information coordinate system, so that the image units of the salient features are formed as track blocks in the MIDI information coordinate system.
- the MIDI information coordinate system is used to indicate the corresponding relationship between MIDI information and time, that is, the relationship between MIDI information corresponding to a track block and time.
- these audio track blocks corresponding to the salient features have musical instrument digital interface information, that is, MIDI information.
- MIDI information is specifically information that can be recognized by a computer device and played as "sound".
- Digital signals corresponding to information such as pitch, timbre, and volume are obtained, thereby forming musical motives, that is, notes; according to the correspondence between these salient features and time, that is, between these musical motives and time, the "sounds" corresponding to the motives are played in sequence, thereby forming a piece of music, namely the unique music generated from the "memory image" selected by the user, that is, the first image.
- the embodiment of this application constructs music through images, so that the formed music matches the images containing the user's memories.
- the threshold for music creation is lowered, so that "novice" users who do not have music theory knowledge can also construct corresponding music based on pictures.
- The music displays its audio track blocks through the MIDI information coordinate system, making the final music visualized and visible, giving users a unique dual experience of hearing and vision.
- the image content of the first image includes salient objects
- the salient features include at least one of the following: key points of the salient objects, and edge feature points of the salient objects.
- the salient object is the main object in the image content of the first image.
- the salient object is the "human face”.
- the salient object is the "building".
- Salient features specifically include the key points of salient objects, such as the key points of a human face, namely the facial features, or the key points of a building, namely its characteristic design elements such as "windows" and "doorways".
- Salient features may also include edge feature points of salient objects, and these edge feature points will form contours of salient objects, such as human face contours or building contours.
- a "simplified map" of the salient objects can be formed, through which the viewer can be associated with the original image.
- the subject being photographed such as “someone” or "a certain building,” evokes memories in the viewer.
- the processing module is also used to perform object segmentation on the first image through a convolutional neural network to obtain the salient objects in the first image and the edge feature points of the salient objects, and to extract the key points of the salient objects to obtain the key points of the salient objects.
- When performing feature extraction on the first image, the first image may first be segmented through a pre-trained convolutional neural network.
- The goal of object segmentation is to segment out the salient objects in the first image.
- A preset convolutional neural network can be trained through a large number of pre-labeled training sets, so that the trained convolutional neural network can identify salient objects in pictures. For example, for portrait pictures, a training set can be generated by pairing a large number of original face pictures with salient target pictures containing only the "face" after the "face" part is cut out. The convolutional neural network is trained and iterates continuously; when it can relatively accurately identify the salient target and its edges in a picture, it is judged ready to be put into use.
- the convolutional neural network trained by the above method is used to perform artificial intelligence recognition on the first image, thereby judging the salient objects and the edges of the salient objects, and obtaining the edge feature points of the salient objects.
- The specific type of salient object is judged, such as "face", "animal", "building", etc., so as to determine the corresponding key point extraction granularity according to that type, and the key points of the salient objects are extracted according to that granularity, thereby obtaining key points such as the facial features of a face.
- The application extracts the salient features of the salient objects in the first image through the trained convolutional neural network, specifically the key points and edge feature points of the salient objects, which obtains the salient features quickly and accurately, thereby improving the processing speed of generating music from images and helping improve user experience.
- the generating module is further configured to generate a salient target texture map corresponding to the first image according to the salient features
- the processing module is further configured to determine the position of the salient feature in the first image according to the salient object texture map.
- a salient target texture map corresponding to the first image is generated.
- The salient object texture map is an image that shows only the salient features of the salient objects in the first image.
- the salient object texture map includes only two types of pixels, wherein the first type of pixels are pixels used to display salient features, and the second type of pixels are pixels at non-salient feature positions.
- The salient target texture map is an image obtained by processing the first image so that only the salient features are displayed.
- The salient features can be determined according to the salient target texture map, so that the salient features are mapped to the MIDI information coordinate system, realizing the conversion from image to MIDI electronic score and finally to music, achieving "from image to music" and giving users a unique experience.
- the processing module is further configured to perform edge detection on the first image according to the edge feature points and the Canny edge detection algorithm, to obtain the edge image of the salient object;
- the generation module is also used to generate a salient object map corresponding to the salient object according to the key points and edge feature points; image superposition is performed on the edge image and the salient object map to obtain a salient object texture map corresponding to the first image.
- edge detection is performed by using the Canny edge detection algorithm according to the edge feature points.
- The first image is first subjected to Gaussian filtering, that is, a Gaussian matrix is used to take the weighted average of each pixel and its neighborhood as the gray value of that pixel.
- Then the gradient value and gradient direction are calculated, non-maximum suppression is applied, and finally edge detection is performed with the set threshold range to obtain the edge image of the salient target.
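The Gaussian filtering step alone can be sketched as below, using the common 3×3 kernel with weights summing to 16 (an assumption; the application does not specify the kernel). A full Canny pipeline would continue with gradient computation, non-maximum suppression, and threshold-based edge detection:

```python
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]   # 3x3 Gaussian approximation; weights sum to 16

def gaussian_filter(gray):
    """gray: 2D list of gray values. Replaces each interior pixel with the
    Gaussian-weighted average of its 3x3 neighborhood; borders unchanged."""
    rows, cols = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += KERNEL[dr + 1][dc + 1] * gray[r + dr][c + dc]
            out[r][c] = acc // 16
    return out
```

A uniform region is unchanged by the filter, while an isolated bright pixel is smoothed toward its neighbors' values.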
- a salient object map corresponding to the salient object is generated, that is, a feature map formed by the key points and the edge feature points.
- The edge image and the salient object map are superimposed, which is equivalent to drawing each key point and the contour together, finally obtaining a salient object texture map with clear contours.
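The superposition of the edge image and the salient object map amounts to a pixel-wise union of the two binary images; a minimal sketch:

```python
def superimpose(edge_image, object_map):
    """Both inputs: 2D lists of 0/1 pixels of equal shape. Returns the
    salient target texture map as their pixel-wise union, i.e. the key
    points and the contour drawn together."""
    return [
        [max(e, o) for e, o in zip(edge_row, obj_row)]
        for edge_row, obj_row in zip(edge_image, object_map)
    ]
```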
- the processing module is also used for:
- X and Y are both integers greater than 1, and each graphic unit includes at least one of bright pixels and dark pixels, where a bright pixel is a pixel with a brightness value of 1 and a dark pixel is a pixel with a brightness value of 0. Among the X multiplied by Y graphic units, the target graphic units whose ratio of bright pixels is greater than the preset ratio are determined, obtaining N target graphic units, where the number of salient features of the first image is N, the N target graphic units correspond one-to-one to the N salient features, and N is a positive integer;
- according to the row number of each of the N target graphic units among the X multiplied by Y graphic units, the first ordinate of the corresponding salient feature in the first image is determined; according to the column number of each target graphic unit among the X multiplied by Y graphic units, the first abscissa of the corresponding salient feature in the first image is determined; and the position of each salient feature in the first image is determined according to its first abscissa and first ordinate.
- an X×Y graphics matrix, which includes X×Y graphic units.
- In each graphic unit there are multiple pixels, including bright pixels and dark pixels.
- Bright pixels are pixels used to display salient features, and their brightness value is 1.
- Dark pixels are pixels outside the salient features, whose brightness value is 0, meaning "pure black" is displayed.
- the proportion of bright pixels in each image unit is judged respectively. For example, assuming that the number of pixels in a graphic unit is 10, including 6 bright pixels and 4 dark pixels, the ratio of the number of bright pixels in the graphic unit is 0.6.
- It is judged whether the ratio of bright pixels in each graphic unit is greater than a preset ratio, where the preset ratio is greater than or equal to 0.2, preferably 0.4.
- Taking the preset ratio of 0.4 as an example, if there are 4 or more bright pixels among the 10 pixels in a graphic unit, the graphic unit is marked as a target graphic unit, indicating that the target graphic unit has salient features.
- Once the target graphic units among all X×Y graphic units are determined, these target graphic units are the salient features that are finally mapped into the MIDI information coordinate system.
- The target graphic units are determined according to the ratio of bright pixels in each of the X×Y divided graphic units, and each target graphic unit is taken as a salient feature and mapped to the MIDI information coordinate system. The transformation from image to MIDI electronic score, and thus from image to music, is realized; at the same time the music is visualized, giving users a dual experience of hearing and vision.
- the processing module is also used to convert the first ordinate into the musical instrument digital interface information coordinate system to obtain the second ordinate of the salient features in that coordinate system; to convert the first abscissa into the musical instrument digital interface information coordinate system to obtain the second abscissa of the salient features in that coordinate system; and, according to the second ordinate and the second abscissa, to map the N salient features into the musical instrument digital interface information coordinate system, obtaining N audio track blocks corresponding one-to-one to the N salient features.
- the first abscissa and the first ordinate of each salient feature obtained above can be synchronously converted into the second abscissa and the second ordinate in the MIDI information coordinate system, so as to realize the mapping of the salient features into the MIDI information coordinate system.
- The N salient features are mapped to the MIDI information coordinate system to obtain N track blocks corresponding to the N salient features, and the N track blocks are processed through the musical instrument digital interface program.
- visualized music can be obtained.
- On the one hand, the features of the salient objects in the first image are preserved; on the other hand, unique music corresponding to the salient objects in the first image can be generated.
- the MIDI information coordinate system is used to indicate the corresponding relationship between musical instrument digital interface information and time. Therefore, from a salient feature, that is, from the coordinates of an audio track block in the MIDI information coordinate system, the MIDI information and time information of that audio track block can be determined. After the computer program recognizes the MIDI information and time information of the track block, it can convert them into a music motive.
- This music motive has sound attributes such as timbre, pitch and volume, and also has the time attribute of beat. The multiple audio track blocks corresponding to the multiple salient features are played according to their MIDI information and time information, finally yielding the music converted from the first image, that is, music that matches the user's "recall image" and satisfies the user's demand for unique music creation.
- the audio track block contains the musical instrument digital interface information, which is determined according to the second ordinate corresponding to the audio track block; the musical instrument digital interface information includes at least one of the following: pitch, timbre, volume.
- the second vertical coordinate of the audio track block in the MIDI information coordinate system is the MIDI information corresponding to the audio track block.
- the second ordinate represents the MIDI information of the track block, including MIDI pitch, MIDI timbre and MIDI volume. Specifically, every time the ordinate increases by 1, the scale increases by 1, and every time the ordinate increases by 8, the scale increases by one octave.
- the timbre and volume of an audio track block can also be obtained.
- When the pitch of a track block is within the treble range, a crisper timbre can be set for it, such as violin or flute; when the pitch of the track block is within the middle range, the timbre of a main melody instrument such as piano or guitar can be set; and when the pitch of the track block is within the bass range, the timbre of a thick instrument such as organ or bass can be set.
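One way to read a track block's second ordinate as MIDI information, following the "one ordinate step per scale degree, eighth degree an octave up" rule above, is sketched here. The C-major scale, the base pitch, and the register-to-instrument assignments are assumptions for illustration, not values from the disclosure:

```python
C_MAJOR = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of the 7 scale degrees
BASE_PITCH = 48                     # assumed MIDI pitch of ordinate 0

def ordinate_to_midi(ordinate):
    """Map a track block's second ordinate to (pitch, timbre): each step
    raises the scale by one degree, and the eighth degree is one octave up;
    the timbre is then chosen by register."""
    octave, degree = divmod(ordinate, 7)
    pitch = BASE_PITCH + 12 * octave + C_MAJOR[degree]
    if pitch >= 72:
        timbre = "violin"   # treble range: crisper timbre
    elif pitch >= 60:
        timbre = "piano"    # middle range: main melody instrument
    else:
        timbre = "organ"    # bass range: thicker timbre
    return pitch, timbre
```

Volume could be assigned analogously, for example from the density of bright pixels in the corresponding target graphic unit.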
- The present application sets the MIDI information based on the second ordinate of the track block, specifically setting music attributes such as the pitch, timbre, and volume of the track block, so that the generated music better conforms to music theory and the effect of generating music from images is improved.
- the music file generation device further includes a receiving module, configured to receive a first input, wherein the first input is an input for selecting preset music features;
- the processing module is also used to determine the target music feature in response to the first input, where the target music feature includes at least one of the following: music style, music mood, music genre; and to adjust the music according to the target music feature;
- the device for generating music files also includes a playing module for playing music files.
- The user can select a target music feature from a plurality of preset music features, so as to adjust the music generated from the first image accordingly.
- The target music features include music style, such as pop music, classical music, or electronic music; music mood, such as passionate, deep, or soothing; and music genre, such as rock, jazz, or blues.
- The music generated from the first image is adjusted so that it better matches the music features selected by the user. For example, if the user selects classical, soothing, and blues, the volume of the mid and low frequencies is adjusted, and the time interval of the second abscissa is adjusted as well, making the rhythm of the music slower and more soothing.
- further post-processing can be performed on the second ordinate of the track block in the MIDI coordinate system according to the preset music theory data and acoustic data.
- A key can be set in advance, and a range between the highest and lowest scale degrees can be specified. If the pitch of a track block within a certain period exceeds this range, it is adjusted according to certain adjustment rules.
- Adjusting the pitch of a track block means adjusting out-of-range tones into range, for example lowering the pitch of a track block above the highest scale threshold by one octave, or raising the pitch of a track block below the lowest scale threshold by one octave, so that the adjusted music better conforms to music theory.
- The adjusted music can be played automatically, so that the user can immediately enjoy the music generated from the selected "memory photo" and experience the joy of music creation.
- the generating module is also used to generate a second image corresponding to the music file
- the playing module is also used for displaying the second image and playing music files.
- a second image corresponding to the music file to be played may also be generated, and the second image is displayed while the music file is played, so that the user can experience visual and auditory enjoyment at the same time.
- the second image may be a static picture generated according to the first image selected by the user, or the salient feature texture map corresponding to the first image, and the static picture and the playing progress of the music are displayed when the music file is played.
- The second image can also be an animation file generated according to a preset template or according to the playback interface of the MIDI information coordinate system; the animation duration matches the duration of the generated music, and the animation is played while the music file is played, further enhancing the user's visual experience.
- the receiving module is further configured to receive a second input, wherein the second input is an input for selecting a preset video template;
- the processing module is also used to determine the target video template in response to the second input
- the generation module is also used to generate the second image according to the target video template and the salient target texture map.
- By receiving the user's second input, the target video template selected by the second input and the salient target texture map corresponding to the first image can be used to generate the second image, which serves as the background when the music is played.
- the video template may be a coherent animation template, or a "slideshow" in which multiple static pictures are displayed in turn.
- the salient target texture map corresponding to the first image is superimposed and displayed, so that when the user sees the second image, the memory of the first image can be recalled and user experience can be improved.
- the generating module is also used to generate a target animation through the piano roll graphical interface, where the target animation is used to display the progress of music playback, and to generate the second image according to the target animation and the salient target texture map.
- the target animation is generated through the Piano Roll GUI, wherein the target animation is the process of playing the audio track block in the MIDI file in the Piano Roll GUI.
- The background of the interface is set according to the salient target texture map corresponding to the first image and used as the background image of the second image, so that an explicit visual connection is established between the second image and the first image; while listening to the music, the user watches the second image associated with the "reminiscence image", thereby arousing the user's memory and enriching the user's visual experience.
- the device for generating music files in the embodiment of the present application may be a device, or a component, an integrated circuit, or a chip in a terminal.
- the device may be a mobile electronic device or a non-mobile electronic device.
- the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and the like.
- the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like, which is not specifically limited in the embodiments of this application.
- the device for generating music files in the embodiment of the present application may be a device with an operating system.
- the operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in this embodiment of the present application.
- the device for generating a music file provided in the embodiment of the present application can implement the various processes implemented in the above-mentioned method embodiments, and details are not repeated here to avoid repetition.
- FIG. 9 shows a structural block diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 9, the electronic device includes a processor 902, a memory 904, and a program or instruction stored in the memory 904 and executable on the processor 902.
- when the program or instruction is executed by the processor 902, the various processes of the above-mentioned method embodiments can be implemented, and the same technical effect can be achieved; to avoid repetition, details are not repeated here.
- the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.
- FIG. 10 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
- the electronic device 2000 includes, but is not limited to, components such as a radio frequency unit 2001, a network module 2002, an audio output unit 2003, an input unit 2004, a sensor 2005, a display unit 2006, a user input unit 2007, an interface unit 2008, a memory 2009, and a processor 2010.
- the electronic device 2000 may also include a power supply (such as a battery) for supplying power to the various components; the power supply may be logically connected to the processor 2010 through a power management system, so that functions such as charging, discharging, and power consumption management are realized through the power management system.
- the structure of the electronic device shown in FIG. 10 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components, and details are not repeated here.
- the processor 2010 is used to obtain a first image; perform feature extraction on the first image to obtain salient features of the first image; based on the positions of the salient features in the first image, map the salient features into the musical instrument digital interface (MIDI) information coordinate system to determine the MIDI information corresponding to the salient features, where the MIDI information coordinate system is used to indicate the correspondence between MIDI information and time; and generate a music file based on the correspondence between the MIDI information and time.
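The position-to-MIDI mapping described above can be sketched in a few lines. This is a minimal illustration under assumed conventions (image y grows downward, a fixed beat grid, a pitch range of 48 to 84); the function and parameter names are illustrative, not from the patent:

```python
# Minimal sketch of mapping salient-feature positions to note events.
# Assumptions (not from the patent): a fixed beat grid, a 48-84 pitch
# range, and image coordinates with y growing downward.

def features_to_notes(features, img_w, img_h, total_beats=16, low=48, high=84):
    """Map (x, y) salient-feature positions to (onset_beat, pitch) pairs.

    x (horizontal) becomes onset time; y (vertical) becomes pitch,
    flipped so that features higher in the image get higher pitches.
    """
    notes = []
    for x, y in features:
        onset = round(x / img_w * (total_beats - 1))         # abscissa -> time
        pitch = low + round((1 - y / img_h) * (high - low))  # ordinate -> pitch
        notes.append((onset, pitch))
    return sorted(notes)
```

Sorting by onset yields the time ordering that a generated music file would follow.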
- the image content of the first image includes salient objects, and the salient features include at least one of the following: key points of the salient objects, and edge feature points of the salient objects.
- the processor 2010 is further configured to perform target segmentation on the first image through a convolutional neural network to obtain the salient target in the first image and the edge feature points of the salient target, and to perform key point extraction on the salient target to obtain the key points of the salient target.
- the processor 2010 is further configured to generate a salient object texture map corresponding to the first image according to the salient features; and determine a position of the salient feature in the first image according to the salient object texture map.
- the processor 2010 is further configured to perform edge detection on the first image according to the edge feature points and the Canny edge detection algorithm to obtain an edge image of the salient target; generate a salient target map corresponding to the salient target according to the key points and the edge feature points; and superimpose the edge image and the salient target map to obtain the salient target texture map corresponding to the first image.
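As a rough illustration of the edge-detection and superposition steps, the sketch below thresholds a simple gradient magnitude instead of running the full Canny pipeline (which adds Gaussian smoothing, non-maximum suppression, and hysteresis), and combines the result with a salient-target map by a pixel-wise OR; all names are illustrative:

```python
# Simplified stand-in for the edge-detection + superposition steps.
# NOT the full Canny algorithm: it only thresholds a central-difference
# gradient of a grayscale image given as nested lists of 0-255 values.

def simple_edges(img, thresh=64):
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = img[r][c + 1] - img[r][c - 1]   # horizontal gradient
            gy = img[r + 1][c] - img[r - 1][c]   # vertical gradient
            if abs(gx) + abs(gy) > thresh:
                edges[r][c] = 1
    return edges

def superimpose(edge_img, target_map):
    """Pixel-wise OR of the binary edge image and the salient-target
    map, yielding the combined salient target texture map."""
    return [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(edge_img, target_map)]
```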
- the processor 2010 is further configured to divide the target texture map into X by Y graphic units of X rows and Y columns, where X and Y are both integers greater than 1, each graphic unit includes at least one of bright pixels and dark pixels, a bright pixel is a pixel with a brightness value of 1, and a dark pixel is a pixel with a brightness value of 0; among the X by Y graphic units, determine the target graphic units in which the proportion of bright pixels is greater than a preset ratio, to obtain N target graphic units, where the number of salient features of the first image is N, the N target graphic units correspond one-to-one to the N salient features, and N is a positive integer; determine the first ordinate of each salient feature in the first image according to the row number of each target graphic unit among the X by Y graphic units; determine the first abscissa of each salient feature in the first image according to the column number of each target graphic unit among the X by Y graphic units; and determine the position of each salient feature in the first image according to its first abscissa and first ordinate.
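The grid scan described above can be sketched as follows, assuming a binary texture map given as nested lists whose dimensions divide evenly by X and Y (a simplification for illustration):

```python
# Sketch of the X-by-Y grid scan: divide a binary texture map into
# X rows and Y columns of cells and keep the cells whose bright-pixel
# proportion exceeds a preset ratio. Even divisibility of the map's
# dimensions by X and Y is an illustrative simplification.

def bright_cells(tex, X, Y, ratio=0.5):
    h, w = len(tex), len(tex[0])
    ch, cw = h // X, w // Y  # cell height and width
    hits = []
    for i in range(X):
        for j in range(Y):
            block = [tex[r][c]
                     for r in range(i * ch, (i + 1) * ch)
                     for c in range(j * cw, (j + 1) * cw)]
            if sum(block) / len(block) > ratio:
                hits.append((i, j))  # (row, column) -> first ordinate/abscissa
    return hits
```

Each returned (row, column) pair plays the role of the first ordinate and first abscissa of one salient feature.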
- the processor 2010 is also configured to transform the first ordinate into the musical instrument digital interface information coordinate system to obtain the second ordinate of the salient feature in that coordinate system; transform the first abscissa into the musical instrument digital interface information coordinate system to obtain the second abscissa of the salient feature in that coordinate system; and map the N salient features into the musical instrument digital interface information coordinate system according to the second ordinates and second abscissas, obtaining N audio track blocks corresponding one-to-one to the N salient features.
- the audio track block contains the musical instrument digital interface information;
- the processor 2010 is further configured to determine the musical instrument digital interface information according to the second ordinate corresponding to the audio track block, where the musical instrument digital interface information includes at least one of the following: pitch, timbre, and volume.
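A hedged sketch of deriving per-block MIDI attributes from the second ordinate: the patent only states that the information is determined from the second ordinate and may include pitch, timbre, and volume, so the concrete formulas below are assumptions for illustration:

```python
# Illustrative derivation of a track block's MIDI attributes from its
# second ordinate. The clamping to 0-127 follows the MIDI value range;
# the loudness rule is an invented example, not the patent's formula.

def midi_info_from_ordinate(y2):
    pitch = max(0, min(127, int(y2)))   # second ordinate -> MIDI pitch
    volume = min(64 + pitch // 4, 127)  # assumed rule: higher notes louder
    return {"pitch": pitch, "volume": volume}
```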
- the user input unit 2007 is configured to receive a first input, wherein the first input is an input for selecting preset music features;
- the processor 2010 is also used to determine target music features in response to the first input, the target music features including at least one of the following: music style, music mood, and music genre; and to adjust the music according to the target music features;
- the audio output unit 2003 is used to play music files.
- the processor 2010 is also configured to generate a second image corresponding to the music file
- the display unit 2006 is also used to display the second image, and the audio output unit 2003 is also used to play the music file.
- the user input unit 2007 is further configured to receive a second input, wherein the second input is an input for selecting a preset video template;
- the processor 2010 is further configured to determine a target video template in response to the second input; and generate a second image according to the target video template and the salient target texture map.
- the processor 2010 is further configured to generate a target animation through the piano roll graphical interface, where the target animation is used to show the playing progress of the music, and to generate the second image according to the target animation and the salient target texture map.
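One way to picture the piano-roll progress animation is to compute, for a given elapsed playback time, the playhead position and the audio track blocks currently sounding. This is an illustrative sketch, not the patent's implementation; the field layout of the blocks is assumed:

```python
# Illustrative playhead calculation for a piano-roll progress animation:
# given the elapsed playback time, find the playhead's x position and
# which audio track blocks are currently sounding.

def playhead_state(elapsed, duration, roll_width, blocks):
    """blocks: list of (onset, length, pitch) tuples in seconds."""
    frac = min(max(elapsed / duration, 0.0), 1.0)  # clamp to [0, 1]
    x = frac * roll_width                          # playhead x in pixels
    t = frac * duration
    sounding = [b for b in blocks if b[0] <= t < b[0] + b[1]]
    return x, sounding
```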
- the embodiment of this application constructs music through images, so that the formed music matches the images containing the user's memories.
- the threshold for music creation is lowered, so that "novice" users who do not have music theory knowledge can also construct corresponding music based on pictures.
- the music presents its audio track blocks through the MIDI information coordinate system, making the final music visible as a visualization and giving users a unique dual experience of hearing and vision.
- the input unit 2004 may include a graphics processing unit (GPU) 20041 and a microphone 20042; the graphics processor 20041 processes image data of still pictures or videos obtained by an image capture device (such as a camera).
- the display unit 2006 may include a display panel 20061, and the display panel 20061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
- the user input unit 2007 includes a touch panel 20071 and other input devices 20072 .
- the touch panel 20071 is also called a touch screen.
- the touch panel 20071 may include two parts, a touch detection device and a touch controller.
- Other input devices 20072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be repeated here.
- the memory 2009 can be used to store software programs as well as various data, including but not limited to application programs and operating systems.
- the processor 2010 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 2010.
- the embodiment of the present application also provides a readable storage medium, on which a program or instruction is stored, and when the program or instruction is executed by the processor, each process of the above-mentioned method embodiment can be realized, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
- the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
- the embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the various processes of the above-mentioned method embodiments with the same technical effect; to avoid repetition, details are not repeated here.
- the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
- the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent in such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.
- the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Library & Information Science (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Processing Or Creating Images (AREA)
Abstract
Description
Claims (15)
- A method for generating a music file, comprising: acquiring a first image; performing feature extraction on the first image to obtain salient features of the first image; mapping, based on positions of the salient features in the first image, the salient features into a musical instrument digital interface information coordinate system, and determining musical instrument digital interface information corresponding to the salient features, wherein the musical instrument digital interface information coordinate system is used to indicate a correspondence between the musical instrument digital interface information and time; and generating a music file based on the correspondence between the musical instrument digital interface information and time.
- The method for generating a music file according to claim 1, wherein the image content of the first image includes a salient target, and the salient features include at least one of the following: key points of the salient target, and edge feature points of the salient target.
- The method for generating a music file according to claim 2, wherein performing feature extraction on the first image to obtain the salient features of the first image comprises: performing target segmentation on the first image through a convolutional neural network to obtain the salient target in the first image and the edge feature points of the salient target; and performing key point extraction on the salient target to obtain the key points of the salient target.
- The method for generating a music file according to claim 2, wherein before mapping the salient features into the musical instrument digital interface information coordinate system based on the positions of the salient features in the first image, the method further comprises: generating a salient target texture map corresponding to the first image according to the salient features; and determining the positions of the salient features in the first image according to the salient target texture map.
- The method for generating a music file according to claim 4, wherein generating the salient target texture map corresponding to the first image according to the salient features comprises: performing edge detection on the first image according to the edge feature points and the Canny edge detection algorithm to obtain an edge image of the salient target; generating a salient target map corresponding to the salient target according to the key points and the edge feature points; and superimposing the edge image and the salient target map to obtain the salient target texture map corresponding to the first image.
- The method for generating a music file according to claim 4, wherein determining the positions of the salient features in the first image according to the target texture map comprises: dividing the target texture map into X by Y graphic units of X rows and Y columns, wherein X and Y are both integers greater than 1, each graphic unit includes at least one of bright pixels and dark pixels, a bright pixel is a pixel with a brightness value of 1, and a dark pixel is a pixel with a brightness value of 0; determining, among the X by Y graphic units, target graphic units in which the proportion of bright pixels is greater than a preset ratio, to obtain N target graphic units, wherein the number of salient features of the first image is N, the N target graphic units correspond one-to-one to the N salient features, and N is a positive integer; determining a first ordinate of each salient feature in the first image according to the row number of each of the N target graphic units among the X by Y graphic units; determining a first abscissa of each salient feature in the first image according to the column number of each of the N target graphic units among the X by Y graphic units; and determining the position of each salient feature in the first image according to the abscissa and ordinate of the salient feature.
- The method for generating a music file according to claim 6, wherein mapping the salient features into the musical instrument digital interface information coordinate system based on the positions of the salient features in the first image comprises: converting the first ordinate into the musical instrument digital interface information coordinate system to obtain a second ordinate of the salient feature in the musical instrument digital interface information coordinate system; converting the first abscissa into the musical instrument digital interface information coordinate system to obtain a second abscissa of the salient feature in the musical instrument digital interface information coordinate system; and mapping the N salient features into the musical instrument digital interface information coordinate system according to the second ordinates and the second abscissas to obtain N audio track blocks corresponding one-to-one to the N salient features.
- The method for generating a music file according to claim 7, wherein the audio track block contains the musical instrument digital interface information, and the musical instrument digital interface information is determined according to the second ordinate corresponding to the audio track block, wherein the musical instrument digital interface information includes at least one of the following: pitch, timbre, and volume.
- The method for generating a music file according to any one of claims 4 to 8, further comprising: receiving a first input, wherein the first input is an input for selecting preset music features; determining target music features in response to the first input, the target music features including at least one of the following: music style, music mood, and music genre; adjusting the music according to the music features; and playing the music file.
- The method for generating a music file according to claim 9, further comprising: generating a second image corresponding to the music file, wherein playing the music comprises: displaying the second image and playing the music.
- The method for generating a music file according to claim 10, wherein generating the second image corresponding to the music comprises: receiving a second input, wherein the second input is an input for selecting a preset video template; determining a target video template in response to the second input; and generating the second image according to the target video template and the salient target texture map.
- The method for generating a music file according to claim 10, wherein generating the second image corresponding to the music comprises: generating a target animation through a piano roll graphical interface, wherein the target animation is used to show the playing progress of the music; and generating the second image according to the target animation and the salient target texture map.
- A device for generating a music file, comprising: an acquisition module configured to acquire a first image; an extraction module configured to perform feature extraction on the first image to obtain salient features of the first image; a processing module configured to map, based on positions of the salient features in the first image, the salient features into a musical instrument digital interface information coordinate system and determine musical instrument digital interface information corresponding to the salient features, wherein the musical instrument digital interface information coordinate system is used to indicate a correspondence between the musical instrument digital interface information and time; and a generation module configured to generate a music file based on the correspondence between the musical instrument digital interface information and time.
- An electronic device, comprising a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the method for generating a music file according to any one of claims 1 to 12.
- A readable storage medium, wherein a program or instruction is stored on the readable storage medium, and the program or instruction, when executed by a processor, implements the steps of the method for generating a music file according to any one of claims 1 to 12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22845082.1A EP4339809A1 (en) | 2021-07-23 | 2022-06-24 | Method and apparatus for generating music file, and electronic device and storage medium |
US18/545,825 US20240127777A1 (en) | 2021-07-23 | 2023-12-19 | Method and apparatus for generating music file, and electronic device and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110839656.2 | 2021-07-23 | ||
- CN202110839656.2A CN115687668A (zh) | 2021-07-23 | Music file generation method, generation device, electronic device and storage medium
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/545,825 Continuation US20240127777A1 (en) | 2021-07-23 | 2023-12-19 | Method and apparatus for generating music file, and electronic device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023000917A1 true WO2023000917A1 (zh) | 2023-01-26 |
Family
ID=84980085
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
- PCT/CN2022/100969 WO2023000917A1 (zh) | 2022-06-24 | Music file generation method, generation device, electronic device and storage medium
Country Status (4)
Country | Link |
---|---|
US (1) | US20240127777A1 (zh) |
EP (1) | EP4339809A1 (zh) |
CN (1) | CN115687668A (zh) |
WO (1) | WO2023000917A1 (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN1287320A (zh) * | 1999-09-03 | 2001-03-14 | 北京航空航天大学 | Method for converting image information into music
- JP2004287144A (ja) * | 2003-03-24 | 2004-10-14 | Yamaha Corp | Control device for music playback and moving-image display, and program therefor
- JP2004286918A (ja) * | 2003-03-20 | 2004-10-14 | Yamaha Corp | Musical tone forming terminal device, server device, and program
- US20060156906A1 (en) * | 2005-01-18 | 2006-07-20 | Haeker Eric P | Method and apparatus for generating visual images based on musical compositions
- CN113035158A (zh) * | 2021-01-28 | 2021-06-25 | 深圳点猫科技有限公司 | Online MIDI music editing method, system and storage medium
-
2021
- 2021-07-23 CN CN202110839656.2A patent/CN115687668A/zh active Pending
-
2022
- 2022-06-24 EP EP22845082.1A patent/EP4339809A1/en active Pending
- 2022-06-24 WO PCT/CN2022/100969 patent/WO2023000917A1/zh active Application Filing
-
2023
- 2023-12-19 US US18/545,825 patent/US20240127777A1/en active Pending
Non-Patent Citations (1)
Title |
---|
XIAOYING WU ; ZE-NIAN LI: "A study of image-based music composition", MULTIMEDIA AND EXPO, 2008 IEEE INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 23 June 2008 (2008-06-23), Piscataway, NJ, USA , pages 1345 - 1348, XP031312979, ISBN: 978-1-4244-2570-9 * |
Also Published As
Publication number | Publication date |
---|---|
CN115687668A (zh) | 2023-02-03 |
EP4339809A1 (en) | 2024-03-20 |
US20240127777A1 (en) | 2024-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
- CN110941954B (zh) | Text broadcasting method and apparatus, electronic device, and storage medium | |
- JP7408048B2 (ja) | Artificial-intelligence-based animated character driving method and related apparatus | |
- TWI486904B (zh) | Rhythm visualization method, system, and computer-readable recording medium | |
- CN111464834B (zh) | Video frame processing method and apparatus, computing device, and storage medium | |
- CN109785820A (zh) | Processing method, apparatus, and device | |
- JP2021192222A (ja) | Video interaction method and apparatus, electronic device, computer-readable storage medium, and computer program | |
- CN104574453A (zh) | Software for expressing music with images | |
- CN112562705A (zh) | Live-streaming interaction method and apparatus, electronic device, and readable storage medium | |
- CN112235635B (zh) | Animation display method and apparatus, electronic device, and storage medium | |
- WO2019040524A1 (en) | METHOD AND SYSTEM FOR MUSIC COMMUNICATION | |
- US11511200B2 (en) | Game playing method and system based on a multimedia file | |
- CN112309365A (zh) | Training method and apparatus for a speech synthesis model, storage medium, and electronic device | |
- CN116484318A (zh) | Speech training feedback method and apparatus, and storage medium | |
- CN116630495A (zh) | Virtual digital human model planning system based on an AIGC algorithm | |
- WO2017168260A1 (ja) | Information processing device, program, and information processing system | |
- Solah et al. | Mood-driven colorization of virtual indoor scenes | |
- CN112435641B (zh) | Audio processing method and apparatus, computer device, and storage medium | |
- WO2023000917A1 (zh) | Music file generation method and apparatus, electronic device, and storage medium | |
- US20220335974A1 (en) | Multimedia music creation using visual input | |
- CN114786030B (zh) | Anchor picture display method and apparatus, electronic device, and storage medium | |
- JP7466087B2 (ja) | Estimation device, estimation method, and estimation system | |
- JP2024523396A (ja) | Music file generation method, generation device, electronic device, and storage medium | |
- Chen et al. | New Enhancement Techniques for Optimizing Multimedia Visual Representations in Music Pedagogy. | |
- WO2023001115A1 (zh) | Video generation method, electronic device, and medium thereof | |
- JP7339420B1 (ja) | Program, method, and information processing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22845082 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022845082 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2022845082 Country of ref document: EP Effective date: 20231212 |
|
ENP | Entry into the national phase |
Ref document number: 2023577867 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |