CN108960250B - Method and device for converting image into melody and computer readable storage medium - Google Patents

Method and device for converting image into melody and computer readable storage medium

Info

Publication number
CN108960250B
CN108960250B (application CN201810427683.7A)
Authority
CN
China
Prior art keywords
image
color
point
target image
melody
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810427683.7A
Other languages
Chinese (zh)
Other versions
CN108960250A (en)
Inventor
邓立邦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Matview Intelligent Science & Technology Co ltd
Original Assignee
Guangdong Matview Intelligent Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Matview Intelligent Science & Technology Co ltd filed Critical Guangdong Matview Intelligent Science & Technology Co ltd
Priority to CN201810427683.7A priority Critical patent/CN108960250B/en
Publication of CN108960250A publication Critical patent/CN108960250A/en
Application granted granted Critical
Publication of CN108960250B publication Critical patent/CN108960250B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/56 - Extraction of image or video features relating to colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 - Details of electrophonic musical instruments
    • G10H1/0033 - Recording/reproducing or transmission of music for electrophonic musical instruments
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 - Music Composition or musical creation; Tools or processes therefor
    • G10H2210/111 - Automatic composing, i.e. using predefined musical rules
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 - Music Composition or musical creation; Tools or processes therefor
    • G10H2210/145 - Composing rules, e.g. harmonic or musical rules, for use in automatic composition; Rule generation algorithms therefor
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 - Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/541 - Details of musical waveform synthesis, i.e. audio waveshape processing from individual wavetable samples, independently of their origin or of the sound they represent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Auxiliary Devices For Music (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention provides a method, an apparatus and a computer-readable storage medium for converting an image into a melody. The method comprises: acquiring the HSB value of every pixel in a target image and performing color clustering to obtain a color-cluster image corresponding to the target image; normalizing each color block in the color-cluster image to obtain a pronunciation-point image corresponding to the target image; mapping the pronunciation-point image onto a pre-established grid and establishing a mapping relation between each pronunciation point in the image and the scales of the grid; and, following the set direction of the grid, extracting the scale corresponding to each pronunciation point according to that mapping relation and converting those scales into audio with the virtual instrument matching the playing instrument determined from the color-cluster image, thereby generating the melody corresponding to the target image. The method converts a target image into a specific melody, greatly reducing the time and cost of producing a musical melody and meeting users' demand for customized melodies.

Description

Method and device for converting image into melody and computer readable storage medium
Technical Field
The invention relates to the technical field of image and music processing, in particular to a method and a device for converting an image into a melody and a computer readable storage medium.
Background
Music is a form of expression of human emotion, and melody is its most basic element; music artists create music by composing melodies. With the continuing development of digital music and computer technology, more and more people want computers to compose music automatically to meet personalized needs, for example matching a unique piece of background music to a recorded video, accompanying a photo album with a melody, or setting a one-of-a-kind personalized ringtone on a mobile phone. However, composing a beautiful melody of one's own is very difficult for ordinary people, and current computer-aided composition requires specialized equipment and systems that are expensive, time-consuming and complex to operate, with a learning cost far too high for ordinary users.
Disclosure of Invention
The invention aims to provide a method, an apparatus and a computer-readable storage medium for converting an image into a melody, capable of converting a target image into a specific musical melody, greatly reducing the time and cost of producing a musical melody, and meeting users' demand for customized melodies.
The embodiment of the invention provides a method for converting an image into a melody, which comprises the following steps:
acquiring the HSB value of each pixel in a target image, and performing color clustering on the pixels of the target image according to those HSB values to obtain a color-cluster image corresponding to the target image;
normalizing the color blocks in the color-cluster image to obtain a pronunciation-point image corresponding to the target image;
mapping the pronunciation-point image onto a pre-established grid, and establishing a mapping relation between each pronunciation point in the pronunciation-point image and the scales of the grid;
extracting the dominant hues of the target image from the color-cluster image;
determining the type of playing instrument from the dominant hues of the target image and a preset hue-instrument comparison table;
extracting, along the set direction of the grid and according to the mapping relation, the scale corresponding to each pronunciation point in the pronunciation-point image, converting those scales into audio with the virtual instrument corresponding to the determined type of playing instrument, and generating the melody corresponding to the target image.
Preferably, the method for converting an image into a melody further comprises:
collecting cover images of musical works performed by a plurality of playing instruments;
extracting the HSB value of each pixel in each cover image and performing color clustering on its pixels according to those HSB values to obtain a template color-cluster image for each cover image, giving N template color-cluster images in total;
calculating the area ratio of each color block in each template color-cluster image to obtain the dominant hues and dominant-hue area ratios of that image as its color distribution;
statistically analysing the color distributions of the N template color-cluster images together with their corresponding playing instruments, establishing a mapping between each color distribution and its playing instrument, and generating the hue-instrument comparison table.
Preferably, determining the type of playing instrument from the dominant hues of the target image and the preset hue-instrument comparison table specifically comprises:
calculating the area ratio of each color block in the color-cluster image of the target image to obtain the area ratio of each dominant hue of the target image;
comparing the dominant hues and dominant-hue area ratios of the target image with the color distributions in the hue-instrument comparison table, and taking the playing instrument of the color distribution with the smallest difference as the type of playing instrument for the corresponding dominant hue in the color-cluster image;
determining the volume ratio of the instruments corresponding to the color blocks in the color-cluster image according to the dominant hues and dominant-hue area ratios of the target image.
Preferably, acquiring the HSB value of each pixel in the target image and performing color clustering on the pixels according to those HSB values to obtain the color-cluster image corresponding to the target image specifically comprises:
acquiring the HSB value of each pixel in the target image;
locating, from those HSB values, the pixels whose hue distance exceeds a first threshold, thereby obtaining a number of color-mutation regions;
within each color-mutation region, averaging the hue values of adjacent pixels whose HSB difference is below a second threshold and aggregating those pixels into a color block with that average hue;
when the hue distance between adjacent pixels within each color-mutation region has fallen to zero, generating the color-cluster image from the aggregated color blocks.
Preferably, normalizing the color blocks in the color-cluster image to obtain the pronunciation-point image corresponding to the target image specifically comprises:
taking the color block with the smallest area in the color-cluster image and setting it as a pronunciation point;
adjusting the other color blocks in the color-cluster image to integer multiples of that pronunciation point;
generating the pronunciation-point image from the pronunciation points corresponding to the color blocks in the color-cluster image.
Preferably, mapping the pronunciation-point image onto the pre-established grid and establishing the mapping relation between each pronunciation point in the pronunciation-point image and the scales of the grid specifically comprises:
setting the area of each square cell from the pronunciation-point area and a preset ratio, and building the grid, where each row of the grid corresponds to a scale and each column to a time point;
mapping each pronunciation point of the pronunciation-point image onto the grid;
when a pronunciation point straddles a grid line, computing its area share in each of the adjacent cells and assigning it to the cell holding the larger share;
establishing the mapping relation between each pronunciation point and the scales of the grid from the position of each point in the grid and the scale corresponding to each row.
Preferably, extracting, according to the mapping relation, the scale corresponding to each pronunciation point in the pronunciation-point image along the set direction of the grid, and converting those scales into audio with the virtual instrument corresponding to the type of playing instrument to generate the melody corresponding to the target image, specifically comprises:
taking as the set direction the time-axis direction formed by the time points corresponding to the columns of the grid;
extracting the scale corresponding to each pronunciation point in the pronunciation-point image according to the mapping relation, following the time-axis direction of the grid;
when several pronunciation points occupy adjacent cells in any one row of the grid, merging them into a long tone of that row's scale;
extracting the time point corresponding to each pronunciation point in the pronunciation-point image along the time-axis direction;
converting, from the scales and time points so obtained, the pronunciation points in the pronunciation-point image into audio with the virtual instrument corresponding to the type of playing instrument, and generating the melody corresponding to the target image.
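The long-tone rule in these steps can be sketched as follows: the grid is modelled as a set of (row, column) cells occupied by pronunciation points, rows standing for scales and columns for time points, and horizontally adjacent points in one row are merged into a single sustained note. The representation and function name are illustrative assumptions, not the patented implementation.

```python
def extract_notes_with_duration(points):
    """points: set of (row, col) grid cells occupied by pronunciation points.

    Merges runs of horizontally adjacent cells in the same row into one
    long tone and returns (row, start_col, length) triples ordered along
    the time axis."""
    notes = []
    for row in sorted({r for r, _ in points}):
        cols = sorted(c for r, c in points if r == row)
        start = prev = None
        for c in cols:
            if start is None:          # open a new run
                start = prev = c
            elif c == prev + 1:        # extend the current run
                prev = c
            else:                      # gap: close the run, start another
                notes.append((row, start, prev - start + 1))
                start = prev = c
        if start is not None:
            notes.append((row, start, prev - start + 1))
    return sorted(notes, key=lambda n: (n[1], n[0]))
```

Three adjacent points in row 2, for instance, become one note of length 3 rather than three separate unit notes.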
Preferably, extracting, according to the mapping relation, the scale corresponding to each pronunciation point along the set direction of the grid, and converting those scales into audio with the virtual instrument corresponding to the type of playing instrument to generate the melody corresponding to the target image, further comprises:
adjusting the scales assigned to the rows of the grid, re-establishing the mapping relation between each pronunciation point in the pronunciation-point image and the scales of the grid, and regenerating the melody with the virtual instrument, until N melodies corresponding to the target image are obtained;
converting the N melodies corresponding to the target image into waveform diagrams, giving N waveform diagrams in total;
calculating, for each waveform diagram, its similarity to each of a number of template waveform diagrams pre-stored in a waveform-template database, and taking the maximum of those similarities as that diagram's reference value;
selecting, from the N waveform diagrams, the diagram with the largest reference value;
taking the melody corresponding to that waveform diagram as the target melody of the target image.
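The selection step can be sketched numerically: each candidate waveform gets, as its reference value, its maximum similarity over the template database, and the candidate with the largest reference value wins. The patent does not name a similarity metric, so a plain cosine similarity is assumed here; all names are illustrative.

```python
import math

def best_melody(waveforms, templates):
    """waveforms, templates: lists of equal-length sample vectors.

    Returns the index of the candidate waveform whose maximum similarity
    over all templates (its reference value) is largest."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    reference = [max(cosine(w, t) for t in templates) for w in waveforms]
    return max(range(len(reference)), key=reference.__getitem__)
```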
The embodiment of the present invention further provides an apparatus for converting an image into a melody, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the method for converting an image into a melody when executing the computer program.
The embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when run, controls the device in which the storage medium resides to perform the above method for converting an image into a melody.
Compared with the prior art, the method for converting an image into a melody provided by the embodiments of the invention has the following beneficial effects. The method acquires the HSB value of each pixel in a target image and performs color clustering to obtain a color-cluster image; normalizes the color blocks to obtain a pronunciation-point image; maps the pronunciation-point image onto a pre-established grid and establishes a mapping relation between pronunciation points and scales; extracts the dominant hues of the target image from the color-cluster image; determines the type of playing instrument from the dominant hues and a preset hue-instrument comparison table; and, along the set direction of the grid, extracts the scales of the pronunciation points according to the mapping relation and converts them into audio with the corresponding virtual instrument to generate the melody of the target image. The target image can thus be converted into a specific musical melody, greatly reducing the time and cost of producing a melody and meeting users' demand for customized melodies. Embodiments of the invention further provide an apparatus for converting an image into a melody and a computer-readable storage medium.
Drawings
FIG. 1 is a flowchart illustrating a method for converting an image into a melody according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an apparatus for converting an image into a melody according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Please refer to fig. 1, which is a flowchart illustrating a method for image-to-melody conversion according to an embodiment of the present invention, the method comprising:
S100: acquiring the HSB value of each pixel in a target image, and performing color clustering on the pixels of the target image according to those HSB values to obtain a color-cluster image corresponding to the target image;
S200: normalizing the color blocks in the color-cluster image to obtain a pronunciation-point image corresponding to the target image;
S300: mapping the pronunciation-point image onto a pre-established grid, and establishing a mapping relation between each pronunciation point in the pronunciation-point image and the scales of the grid;
S400: extracting the dominant hues of the target image from the color-cluster image;
S500: determining the type of playing instrument from the dominant hues of the target image and a preset hue-instrument comparison table;
S600: extracting, along the set direction of the grid and according to the mapping relation, the scale corresponding to each pronunciation point in the pronunciation-point image, converting those scales into audio with the virtual instrument corresponding to the determined type of playing instrument, and generating the melody corresponding to the target image.
In this embodiment, the target image is color-clustered and normalized to obtain a pronunciation-point image, the pronunciation-point image is mapped into a preset grid, a mapping relation between pronunciation points and scales is established, and the type of playing instrument is then determined from the dominant hues of the color-cluster image. Using the virtual instrument corresponding to that type and following the time-axis direction of the grid, the target image can be converted through the mapping relation into a specific musical melody. This greatly reduces the time, cost and difficulty of producing music and meets users' demand for customized melodies, giving the method broad application prospects in personalized ringtones, electronic-album background music, screen-saver background music, film and television scoring, and the like.
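The grid convention underlying the embodiment (rows as scales, columns as time points, read along the time axis) can be illustrated with a minimal sketch; the scale names and the function name are assumptions made only for this example.

```python
SCALE = ["C4", "D4", "E4", "F4", "G4", "A4", "B4"]  # one scale per grid row (assumed)

def pronunciation_points_to_notes(points):
    """points: set of (row, col) cells occupied by pronunciation points.

    Reads the grid column by column (the time-axis direction) and returns
    (time_index, scale_name) pairs for each occupied cell."""
    notes = []
    for col in sorted({c for _, c in points}):
        for row in sorted(r for r, c in points if c == col):
            notes.append((col, SCALE[row]))
    return notes
```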
Step S400, extracting the dominant hues of the target image from the color-cluster image, specifically comprises: clustering the colors of the image with a clustering algorithm; concretely, partitioning the image into blocks, repeatedly averaging adjacent points with similar HSB values and aggregating them into the same color block, so that the target image is processed into a combination of dominant-hue color blocks (combinations of shapes such as triangles, circles and rectangles). The color value of each block, i.e. a dominant hue of the target image, is then extracted, and the area ratio of each block within the target image is calculated. Further, a color block whose area ratio in the color-cluster image exceeds a set threshold is determined to be a dominant hue of the target image.
In an optional embodiment, the method for converting an image into a melody further comprises:
collecting cover images of musical works performed by a plurality of playing instruments;
extracting the HSB value of each pixel in each cover image and performing color clustering on its pixels according to those HSB values to obtain a template color-cluster image for each cover image, giving N template color-cluster images in total;
calculating the area ratio of each color block in each template color-cluster image to obtain the dominant hues and dominant-hue area ratios of that image as its color distribution;
statistically analysing the color distributions of the N template color-cluster images together with their corresponding playing instruments, establishing a mapping between each color distribution and its playing instrument, and generating the hue-instrument comparison table.
In this embodiment, a large number of cover images of musical works (e.g. CDs, DVDs and digital releases) played on various instruments are collected. The HSB values of each cover image are extracted; neighbouring points with similar HSB values are averaged and aggregated, color-clustering each cover image into regions of color blocks. The color value of each block is extracted and its area ratio within the cover image calculated, yielding the dominant hues and area ratios of the cover image. Statistical analysis then reveals how dominant hues and their area ratios vary across the cover images of different playing instruments, producing a large body of statistics relating each playing instrument to the dominant hues and dominant-hue area ratios of its cover images, i.e. the color distribution of the cover image.
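Under the assumption that each clustered cover image is summarised as a mapping from hue (in degrees) to pixel count, the construction of the table can be sketched as follows; the 10% dominance threshold and all names are illustrative, not specified by the patent.

```python
def color_distribution(blocks, threshold=0.1):
    """blocks: {hue_degrees: pixel_count} for one clustered image.

    Keeps the hues whose area ratio reaches the threshold (the dominant
    hues) and returns {hue: area_ratio} -- the image's color distribution."""
    total = sum(blocks.values())
    return {h: n / total for h, n in blocks.items() if n / total >= threshold}

def build_comparison_table(cover_data):
    """cover_data: [(instrument_name, blocks)] pairs gathered from covers.

    Returns the hue-instrument comparison table as a list of
    (color_distribution, instrument_name) entries."""
    return [(color_distribution(blocks), instrument) for instrument, blocks in cover_data]
```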
In an alternative embodiment, S500: determining the type of playing instrument from the dominant hues of the target image and the preset hue-instrument comparison table, specifically comprises:
calculating the area ratio of each color block in the color-cluster image of the target image to obtain the area ratio of each dominant hue of the target image;
comparing the dominant hues and dominant-hue area ratios of the target image with the color distributions in the hue-instrument comparison table, and taking the playing instrument of the color distribution with the smallest difference as the type of playing instrument for the corresponding dominant hue in the color-cluster image;
determining the volume ratio of the instruments corresponding to the color blocks in the color-cluster image according to the dominant hues and dominant-hue area ratios of the target image.
In the present embodiment, specifically, the dominant hues of the target image from which a melody is to be generated are extracted, and the corresponding types of playing instrument and their volumes are determined from those dominant hues and their area ratios within the image. Because the color distribution differs from picture to picture (some images are rich in content and color, others sparse and nearly monochrome), a threshold is set to determine the number of dominant hues, and the area ratios decide whether a single instrument or a combination of instruments is used. The dominant hues and dominant-hue area ratios of the target image are compared against the correspondence between playing instruments and color distributions statistically compiled in advance and stored on the server, i.e. the hue-instrument comparison table; the playing instrument whose color distribution is closest to the dominant-hue combination of the target image is found, giving the instrument combination for the melody to be generated. For example, if the target image contains M dominant hues, M corresponding instruments can be determined from those hues and their area ratios and used together to generate the melody.
For example, when a single color block occupies 80% or more of the target image, a single instrument is used. As another example, when the dominant hues of the target image occupy 40%, 30%, 20% and 10% of its area respectively, the hue-instrument comparison table gives the mapping between playing instruments and color distributions, and hence the type of playing instrument for each dominant hue; the instruments corresponding to the 40%, 30%, 20% and 10% hues then play the scales marked in the grid together in ensemble, with their volumes likewise allotted in proportion to the dominant-hue areas.
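The closest-match lookup and the area-proportional volume rule can be sketched as below, reusing the {hue: area_ratio} representation. The L1 distance between distributions is an assumption, since the patent only asks for the "smallest difference"; names are illustrative.

```python
def match_instrument(distribution, table):
    """distribution: {hue: area_ratio} of the target image.
    table: [(color_distribution, instrument_name)] entries.

    Picks the table entry with the smallest summed absolute difference
    over all hues, and allots volumes in proportion to the hue areas."""
    def difference(a, b):
        hues = set(a) | set(b)
        return sum(abs(a.get(h, 0.0) - b.get(h, 0.0)) for h in hues)

    _, instrument = min(table, key=lambda entry: difference(distribution, entry[0]))
    volumes = dict(distribution)  # volume ratio taken equal to area ratio
    return instrument, volumes
```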
In an alternative embodiment, S100: acquiring the HSB value of each pixel in the target image and performing color clustering on the pixels according to those HSB values to obtain the color-cluster image corresponding to the target image, specifically comprises:
acquiring the HSB value of each pixel in the target image;
locating, from those HSB values, the pixels whose hue distance exceeds a first threshold, thereby obtaining a number of color-mutation regions;
within each color-mutation region, averaging the hue values of adjacent pixels whose HSB difference is below a second threshold and aggregating those pixels into a color block with that average hue;
when the hue distance between adjacent pixels within each color-mutation region has fallen to zero, generating the color-cluster image from the aggregated color blocks.
In this embodiment, the first threshold ranges from 60 to 130 degrees; preferably it is 60 degrees, and the second threshold is 15 degrees. For example, when the hue distance between two pixels of the target image exceeds 60 degrees, a color-mutation region is identified. After the color-mutation regions are found, adjacent pixels of the target image are analysed further, and adjacent pixels with close HSB values are averaged and aggregated into a color block. For instance, for two adjacent pixels A and B with HSB values of H=42°, S=43, B=21 for point A and H=38°, S=42, B=25 for point B, the hue values 42° and 38° are within 15 degrees of each other, so A and B are aggregated into a color block with the average hue H=40°. Different pairs of adjacent points with close HSB values are repeatedly selected and their average hue computed until all adjacent points with similar HSB values have been aggregated; finally the target image is processed into a number of distinct color blocks, generating the color-cluster image.
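The A/B example can be turned into a small sketch of the second-threshold aggregation, shown here in one dimension for a single row of hue values; the loop structure is an illustrative simplification of the claimed region-wise clustering, and the names are assumptions.

```python
def merge_hues(h_a, h_b, second_threshold=15):
    """Average two neighbouring hues if they differ by less than the
    second threshold (15 degrees); return None when no merge happens."""
    if abs(h_a - h_b) < second_threshold:
        return (h_a + h_b) / 2
    return None

def cluster_row(hues, second_threshold=15):
    """Repeatedly aggregate neighbouring hues until no adjacent pair is
    within the threshold, yielding the final color-block hues."""
    hues = list(hues)
    changed = True
    while changed:
        changed = False
        for i in range(len(hues) - 1):
            merged = merge_hues(hues[i], hues[i + 1], second_threshold)
            if merged is not None:
                hues[i:i + 2] = [merged]  # collapse the pair into one block
                changed = True
                break
    return hues
```

Points A (42°) and B (38°) collapse into one 40° block, while a 200° neighbour stays a separate block.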
In an alternative embodiment, S200: normalizing the color patches in the color clustering image to obtain a phonation point image corresponding to the target image, which specifically comprises:
acquiring a color block with the minimum area in the color clustering image, and setting the color block with the minimum area as a sound producing point;
adjusting the other color blocks in the color clustering image to be integral multiples of the phonation point;
and generating the phonation point image according to the phonation points corresponding to the color blocks in the color clustering image.
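A minimal sketch of this normalization, assuming each color block is represented only by its pixel area (the names are illustrative):

```python
def normalize_blocks(block_areas):
    """The smallest color block becomes the unit phonation point; every
    other block's area is rounded to an integral multiple of that unit."""
    unit = min(block_areas)
    multiples = [max(1, round(area / unit)) for area in block_areas]
    return multiples, unit

multiples, unit = normalize_blocks([12, 25, 49])
# unit == 12; multiples == [1, 2, 4]
```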
In an alternative embodiment, S300: mapping the pronunciation point image to a pre-established grid, and establishing a mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid, which specifically comprises the following steps:
setting the area of the square grid and establishing the grid according to the area of the phonation point and a preset proportion; each row of the grid corresponds to a scale, and each column of the grid corresponds to a time point;
mapping each sound point in the sound point image to the grid;
when the phonation points are distributed on the grid lines of the grid, respectively calculating the area occupation ratio of the phonation points in the adjacent grids connected with the grid lines, and distributing the phonation points to one grid with the larger area occupation ratio of the phonation points in the adjacent grids;
and establishing a mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid according to the position of each pronunciation point in the pronunciation point image in the grid and the scale corresponding to each row in the grid.
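The mapping and the grid-line tie-break can be sketched as below; the per-row scale assignment and the cell size are illustrative assumptions:

```python
SCALES = ["C4", "D4", "E4", "G4", "A4"]  # illustrative scale per grid row

def map_point(x, y, cell_size):
    """Map a phonation-point center to (time column, scale of its row)."""
    col, row = int(x // cell_size), int(y // cell_size)
    return col, SCALES[row % len(SCALES)]

def assign_straddling_point(overlap_areas):
    """When a phonation point lies on a grid line, give it to the adjacent
    cell in which it covers the larger area."""
    return max(overlap_areas, key=overlap_areas.get)

# point overlaps cell (row 2, col 5) with area 7 and cell (row 3, col 5) with area 3
cell = assign_straddling_point({(2, 5): 7.0, (3, 5): 3.0})
# cell == (2, 5): the point covers more area in row 2 than in row 3
```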
In an alternative embodiment, S400: extracting the scale corresponding to the pronunciation point in the pronunciation point image along the set direction of the grid according to the mapping relation, converting the scale corresponding to the pronunciation point in the pronunciation point image into audio by adopting the virtual instrument corresponding to the type of the played instrument, and generating the melody corresponding to the target image, wherein the steps specifically comprise:
the set direction is a time axis direction formed by time points corresponding to each row of the grid;
extracting a scale corresponding to the pronunciation point in the pronunciation point image according to the mapping relation and the time axis direction corresponding to the grids;
when a plurality of sound producing points are positioned in adjacent grids of a given row in the grid, the sound producing points are adjusted to a long tone of the scale corresponding to that row;
extracting time points corresponding to the sound points in the sound point images according to the time axis direction;
and converting the scale corresponding to the sound producing point in the sound producing point image into audio by adopting a virtual instrument corresponding to the type of the played instrument according to the scale and the time point corresponding to the sound producing point in the sound producing point image, and generating the melody corresponding to the target image.
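The traversal along the time axis, including the long-tone rule for adjacent cells in one row, might look like this sketch (the note representation is an assumption):

```python
def build_melody(cells):
    """cells: (column, scale) pairs, one per sounded grid cell.
    Runs of adjacent columns on the same scale row merge into one long
    tone; the result is a list of (scale, start_column, duration)."""
    melody = []
    for col, scale in sorted(cells):
        if melody and melody[-1][0] == scale and melody[-1][1] + melody[-1][2] == col:
            prev_scale, start, dur = melody.pop()
            melody.append((prev_scale, start, dur + 1))  # extend the long tone
        else:
            melody.append((scale, col, 1))
    return melody

notes = build_melody([(0, "C4"), (1, "C4"), (2, "E4")])
# notes == [("C4", 0, 2), ("E4", 2, 1)]
```

Rendering each `(scale, start, duration)` triple with the chosen virtual instrument then yields the audio of the melody.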
In an optional embodiment, the extracting, according to the mapping relationship, a scale corresponding to a pronunciation point in the pronunciation point image along a set direction of the mesh, and converting the scale corresponding to the pronunciation point in the pronunciation point image into audio by using a virtual instrument corresponding to the type of the playing instrument to generate a melody corresponding to the target image, and then further includes:
adjusting scales corresponding to each row of the grid, reestablishing the mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid, and regenerating the melody corresponding to the target image by adopting the virtual musical instrument to obtain N melodies corresponding to the target image;
respectively converting the N melodies corresponding to the target image into oscillograms to obtain N oscillograms in total;
respectively calculating the similarity between any one oscillogram and a plurality of template oscillograms pre-stored in an oscillogram template database, and extracting the maximum value of the similarity of the oscillogram relative to the plurality of template oscillograms as the reference value of the oscillogram;
extracting a waveform diagram corresponding to a maximum reference value from the N waveform diagrams;
and extracting the melody corresponding to the waveform diagram corresponding to the maximum reference value as the target melody of the target image.
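The screening step reduces to taking, for each waveform, its maximum similarity over the templates, then keeping the melody whose waveform scores highest. This is sketched here with precomputed similarity scores, since the embodiment does not specify how similarity itself is computed:

```python
def pick_target_melody(melodies, similarities):
    """similarities[i] lists waveform i's similarity against each template
    waveform; the maximum is that waveform's reference value."""
    refs = [max(sims) for sims in similarities]
    best = refs.index(max(refs))
    return melodies[best], refs[best]

melody, ref = pick_target_melody(
    ["melody_a", "melody_b", "melody_c"],
    [[0.4, 0.7], [0.9, 0.2], [0.5, 0.6]],
)
# melody == "melody_b", ref == 0.9
```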
In this embodiment, the scale corresponding to each row of the grid may be adjusted, the mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid reestablished, and the melody corresponding to the target image regenerated, so that a plurality of melodies may be generated from the same grid. Each music style has its own characteristic scale combination, and a melody created from such a scale carries the character of that musical tradition; by setting the scale of each row in the grid according to a creation style, the created melody acquires that specific music style. For example, the Chinese pentatonic scale contains the tones 1 2 3 5 6 1; the Japanese hexatonic scale contains the tones 6 7 1 2 3 4 6; the Romanian minor scale contains the tones 6 7 1 #2 3 4 #5 6. By changing the scale combination of the rows in the grid, music melodies of different styles can be created. In this way, N melodies corresponding to the target image are obtained. The N melodies are then converted into oscillograms and matched against a plurality of template oscillograms; the maximum similarity of each oscillogram relative to the template oscillograms is extracted as that oscillogram's reference value, so that each oscillogram corresponds to one reference value. The reference values are compared to find the oscillogram with the maximum reference value, and the melody corresponding to it is extracted as the target melody of the target image. This method effectively screens the N generated melodies, yielding the melody closest to existing melody-creation styles and improving the quality of the created melody.
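The style-specific scale rows described above can be encoded as data. The numbered (jianpu) notation below is copied from the text; '#' marks a sharp, and a trailing dot marking the upper octave is an assumed convention:

```python
# Row scales per creation style, in numbered (jianpu) notation.
STYLE_SCALES = {
    "chinese_pentatonic": ["1", "2", "3", "5", "6", "1."],
    "japanese_hexatonic": ["6", "7", "1", "2", "3", "4", "6."],
    "romanian_minor": ["6", "7", "1", "#2", "3", "4", "#5", "6."],
}

def rows_for_style(style, n_rows):
    """Assign a scale tone to each grid row, cycling through the chosen
    style's scale so that any grid height is covered."""
    scale = STYLE_SCALES[style]
    return [scale[i % len(scale)] for i in range(n_rows)]

rows = rows_for_style("chinese_pentatonic", 8)
# rows == ["1", "2", "3", "5", "6", "1.", "1", "2"]
```

Swapping the style key regenerates the row-to-scale mapping, which is exactly the adjustment step that produces the N candidate melodies.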
Please refer to fig. 2, which is a diagram illustrating an apparatus for converting an image into a melody according to an embodiment of the present invention, the apparatus comprising:
the color clustering module 1 is used for acquiring an HSB value of each pixel point in a target image, and performing color clustering processing on each pixel point of the target image according to the HSB value to obtain a color clustering image corresponding to the target image;
the normalization processing module 2 is used for performing normalization processing on each color block in the color clustering image to obtain a sound point image corresponding to the target image;
a mapping relation establishing module 3, configured to map the pronunciation point image into a pre-established grid, and establish a mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid;
a dominant hue extraction module 4, configured to extract a dominant hue of the target image according to the color cluster image;
the instrument type determining module 5 is used for determining the type of the playing instrument according to the dominant tone of the target image and a preset tone instrument comparison table;
and the first melody generating module 6 is configured to extract a scale corresponding to the pronunciation point in the pronunciation point image along the set direction of the mesh according to the mapping relationship, convert the scale corresponding to the pronunciation point in the pronunciation point image into audio by using a virtual instrument corresponding to the type of the musical instrument, and generate a melody corresponding to the target image.
In the embodiment, after color clustering and normalization processing are performed on the target image, the pronunciation point image is obtained, the pronunciation point image is mapped into a preset grid, a mapping relation between the pronunciation point and the scale is established, the target image can be converted into a specific music melody according to the time axis direction of the grid through the mapping relation, the duration and the cost of music melody making are greatly reduced, the difficulty of music making is reduced, the customization requirements of people on the music melody are met, and therefore the device has wide application prospects in the aspects of personalized mobile phone ring, electronic album background music, screen protection background music, film and television work music and the like.
The dominant hue extraction module 4 is mainly configured to extract the dominant hue of the target image from the color clustering image, specifically as follows: color clustering is performed on the color clustering image by a clustering algorithm; the dominant hues are partitioned into blocks, adjacent points with close HSB (hue, saturation, brightness) values are repeatedly averaged and aggregated into the same color block, and the target image is processed into combinations of color blocks of various dominant hues, such as combinations of triangles, circles, rectangles and other shapes, so as to obtain the dominant-hue color block combination of the target image. The color value of each color block, i.e. a dominant hue of the target image, is extracted, and the area ratio of each color block in the target image is calculated respectively. Further, a color block whose area ratio in the color clustering image is larger than a set threshold is determined as a dominant hue of the target image.
In an alternative embodiment, the apparatus for converting image into melody further comprises:
the cover image acquisition module is used for acquiring cover images corresponding to musical compositions of a plurality of playing musical instruments;
the template clustering image generation module is used for extracting the HSB value of each pixel point in any cover image, carrying out color clustering processing on each pixel point of any cover image according to the HSB value, obtaining a template color clustering image corresponding to any cover image, and obtaining N template color clustering images in total;
the color distribution calculation module is used for calculating the area ratio of each color patch in the template color clustering image, and obtaining the corresponding dominant hue and dominant hue area ratio of the template color clustering image as the color distribution of the template color clustering image;
and the tonal instrument comparison table generation module is used for performing statistical analysis on the color distribution of the N template color cluster images and the playing instruments corresponding to the template color cluster images, establishing a mapping relation between the color distribution of the template color cluster images and the playing instruments corresponding to the template color cluster images, and generating the tonal instrument comparison table.
In this embodiment, a large number of cover images corresponding to musical compositions of playing instruments (e.g., CDs, DVDs, digital sound sources, etc.) are collected. The HSB values of each cover image are extracted, adjacent points with close HSB values are averaged and aggregated, and color clustering is applied to the collected cover images, processing each cover image into regions of different color block combinations. The color value of each color block is extracted and its area ratio in the cover image is calculated, giving the dominant hues and their area ratios for the cover image. Through statistical analysis of these data, the relation between different playing instruments and the dominant hues and area ratios of their cover images is obtained, i.e., a large body of statistics relating each playing instrument to the dominant hues and per-hue area ratios of its cover images, which constitute the color distribution of the cover image.
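One way to realize this statistical step is sketched below, assuming each sample pairs a playing instrument with its cover image's color distribution (dominant hue to area ratio), and using averaging as the aggregation, which the embodiment leaves open:

```python
from collections import defaultdict

def build_tone_instrument_table(samples):
    """samples: (instrument, color_distribution) pairs harvested from the
    cover images; the table maps each instrument to the mean area ratio
    per dominant hue over all of its covers."""
    buckets = defaultdict(list)
    for instrument, dist in samples:
        buckets[instrument].append(dist)
    table = {}
    for instrument, dists in buckets.items():
        hues = set().union(*dists)  # every hue seen on this instrument's covers
        table[instrument] = {
            hue: sum(d.get(hue, 0.0) for d in dists) / len(dists)
            for hue in hues
        }
    return table

table = build_tone_instrument_table([
    ("piano", {"blue": 0.6, "white": 0.4}),
    ("piano", {"blue": 0.8, "white": 0.2}),
])
# table["piano"]["blue"] averages to 0.7
```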
In an alternative embodiment, the instrument type determination module 5 comprises:
the area ratio calculation unit is used for calculating the area ratio of each color block in the color clustering image corresponding to the target image to obtain the dominant hue area ratio corresponding to the dominant hue of the target image;
an area ratio comparing unit configured to compare the keytone and the keytone area ratio of the target image with a plurality of color distribution ratios in the tone instrument comparison table, and determine, as a type of the musical instrument corresponding to the keytone in the color cluster image, a musical instrument corresponding to a color distribution in the tone instrument comparison table having a smallest difference between the keytone and the keytone area ratio of the target image;
and the volume distribution unit is used for determining the volume ratio of the musical instruments corresponding to the color blocks in the color cluster image according to the dominant hue of the target image and the dominant hue area ratio.
In the present embodiment, specifically, the dominant hues of the target image used to generate the music melody are extracted, and the type and volume of the corresponding playing instruments are determined based on the dominant hues of the target image and their area ratios in the target image. Because the color distribution of each picture differs (some pictures have rich content and a rich color distribution, while others have little content and a more uniform color distribution), a threshold is set to determine the number of dominant hues of the target image, and whether playing instruments are used singly or in combination is decided according to the area ratios of the dominant hues. The obtained dominant hues and dominant hue areas of the target image are compared against the correspondence between playing instruments and color distributions counted in advance and stored in the server, namely the tone instrument comparison table; the playing instrument corresponding to the color distribution closest to the dominant hue combination of the target image is found, giving the playing instrument combination for the melody to be generated. For example, if the target image contains M dominant hues, the corresponding M playing instruments can be determined from the M dominant hues and their area ratios, and used in combination to generate the melody.
For example, when the area ratio of a certain color block in the target image reaches 80% or more, a single instrument is used for playing. For another example, when the dominant hue distribution of the target image is 40%, 30%, 20% and 10%, the mapping relation between playing instruments and color distributions given by the tone instrument comparison table yields the playing instrument type corresponding to each dominant hue in the target image; the playing instruments corresponding to the 40%, 30%, 20% and 10% dominant hues are then used simultaneously to play the scales marked in the grid in ensemble, and the corresponding volumes are also distributed in proportion to the dominant hue areas.
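The single-instrument threshold and the proportional volume split can be sketched as below; the 80% cutoff comes from the example above, while the exact normalization is an assumption:

```python
def allocate_volumes(dominant_ratios, single_threshold=0.8):
    """dominant_ratios: dominant hue -> area ratio in the target image.
    If one hue covers at least the threshold, a single instrument plays
    alone; otherwise each hue's instrument gets a volume proportional to
    its dominant-hue area."""
    top = max(dominant_ratios, key=dominant_ratios.get)
    if dominant_ratios[top] >= single_threshold:
        return {top: 1.0}  # single-instrument playback
    total = sum(dominant_ratios.values())
    return {hue: ratio / total for hue, ratio in dominant_ratios.items()}

vols = allocate_volumes({"red": 0.4, "green": 0.3, "blue": 0.2, "grey": 0.1})
# volumes follow the 40/30/20/10 dominant-hue split
solo = allocate_volumes({"blue": 0.85, "white": 0.15})
# solo == {"blue": 1.0}: one hue dominates, so a single instrument plays
```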
In an alternative embodiment, the color clustering module 1 comprises: the device comprises an HSB value acquisition unit, a color mutation acquisition unit, a color block polymerization unit and a color clustering image generation unit;
the HSB value acquisition unit is used for acquiring the HSB value of each pixel point in the target image;
the color mutation obtaining unit is used for obtaining pixel points with hue distances exceeding a first threshold value in the target image according to the HSB value of each pixel point in the target image and obtaining a plurality of color mutation areas;
the color lump polymerization unit is used for calculating the hue average value of the adjacent pixel points of which the difference value of the HSB values in the color mutation area is smaller than a second threshold value, and polymerizing the adjacent pixel points into a color lump corresponding to the hue average value;
and the color cluster image generation unit is used for generating the color cluster image according to the aggregated color block when the hue distance of the adjacent pixel points in the color mutation area is zero.
In this embodiment, the first threshold ranges from 60 degrees to 130 degrees, and is preferably 60 degrees; the second threshold is 15 degrees. For example, when the hue distance between two pixel points in the target image exceeds 60 degrees, a color mutation area is determined. After the color mutation areas are found, adjacent pixel points in the target image are analyzed further, and adjacent pixel points with close HSB values are averaged and aggregated into one color block. For example, suppose the HSB values of adjacent pixel points A and B are: point A with H 42°, S 43%, B 21%, and point B with H 38°, S 42%, B 25%. The hue values of points A and B are 42° and 38°, a hue distance within 15 degrees, so the hue values of A and B are averaged and the two points are aggregated into a color block with the hue value H 40°. Pairs of adjacent points with close HSB values are repeatedly selected and their hue averages calculated until all adjacent points with close HSB values have been aggregated; finally, the target image is processed into a number of different color blocks, and the color cluster image is generated.
In an alternative embodiment, the normalization processing module 2 comprises: a sound point setting unit, a sound point adjusting unit and a sound point image generating unit;
the phonation point setting unit is used for acquiring a color block with the minimum area in the color clustering image and setting the color block with the minimum area as a phonation point;
the phonation point adjusting unit is used for adjusting the other color blocks in the color clustering image to be integral multiples of the phonation point;
and the sound producing point image generating unit is used for generating the sound producing point image according to the sound producing points corresponding to the color blocks in the color clustering image.
In an alternative embodiment, the mapping relationship establishing module 3 includes: the system comprises a grid establishing unit, a mapping unit, a phonemic point distributing unit and a mapping relation establishing unit;
the grid establishing unit is used for setting the area of the square grid and establishing the grid according to the area of the phonation point and a preset proportion; each row of the grid corresponds to a scale, and each column of the grid corresponds to a time point;
the mapping unit is used for mapping each phonation point in the phonation point image to the grid;
the sound producing point distributing unit is used for respectively calculating the area occupation ratio of the sound producing points in adjacent grids connected with the grid lines when the sound producing points are distributed on the grid lines of the grid, and distributing the sound producing points to one grid with larger area occupation ratio of the sound producing points in the adjacent grids;
the mapping relation establishing unit is used for establishing the mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid according to the position of each pronunciation point in the pronunciation point image in the grid and the scale corresponding to each row in the grid.
In an alternative embodiment, the first melody generating module 6 comprises: a scale extracting unit, a note length setting unit, a time point extracting unit and a melody generating unit;
the set direction is a time axis direction formed by time points corresponding to each row of the grid;
the scale extracting unit is used for extracting scales corresponding to the pronunciation points in the pronunciation point image according to the mapping relation and the time axis direction corresponding to the grids;
the note length setting unit is used for adjusting a plurality of pronunciation points to a long note of the scale corresponding to a given row when the plurality of pronunciation points are positioned in adjacent grids of that row in the grid;
the time point extracting unit is used for extracting the time point corresponding to the sound point in the sound point image according to the time axis direction;
and the melody generating unit is used for converting the scale corresponding to the pronunciation point in the pronunciation point image into audio by adopting a virtual instrument corresponding to the type of the musical instrument to generate the melody corresponding to the target image according to the scale and the time point corresponding to the pronunciation point in the pronunciation point image.
In an alternative embodiment, the apparatus for converting an image into a melody further comprises:
the grid scale adjusting module is used for adjusting scales corresponding to each line of the grid, reestablishing the mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid, and regenerating the melody corresponding to the target image by adopting the virtual musical instrument to obtain N melodies corresponding to the target image;
the oscillogram generation module is used for respectively converting the N melodies corresponding to the target image into oscillograms to obtain N oscillograms in total;
the similarity calculation module is used for calculating the similarity between any one oscillogram and a plurality of template oscillograms pre-stored in an oscillogram template database respectively, and extracting the maximum value of the similarity of the oscillogram relative to the plurality of template oscillograms as the reference value of the oscillogram;
the oscillogram extracting module is used for extracting the oscillogram corresponding to the maximum reference value from the N oscillograms;
and the melody extraction module is used for extracting the melody corresponding to the waveform diagram corresponding to the maximum reference value as the target melody of the target image.
In this embodiment, the scale corresponding to each row of the grid may be adjusted, the mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid reestablished, and the melody corresponding to the target image regenerated, so that a plurality of melodies may be generated from the same grid. Each music style has its own characteristic scale combination, and a melody created from such a scale carries the character of that musical tradition; by setting the scale of each row in the grid according to a creation style, the created melody acquires that specific music style. For example, the Chinese pentatonic scale contains the tones 1 2 3 5 6 1; the Japanese hexatonic scale contains the tones 6 7 1 2 3 4 6; the Romanian minor scale contains the tones 6 7 1 #2 3 4 #5 6. By changing the scale combination of the rows in the grid, music melodies of different styles can be created. In this way, N melodies corresponding to the target image are obtained. The N melodies are then converted into oscillograms and matched against a plurality of template oscillograms; the maximum similarity of each oscillogram relative to the template oscillograms is extracted as that oscillogram's reference value, so that each oscillogram corresponds to one reference value. The reference values are compared to find the oscillogram with the maximum reference value, and the melody corresponding to it is extracted as the target melody of the target image. This method effectively screens the N generated melodies, yielding the melody closest to existing melody-creation styles and improving the quality of the created melody.
The embodiment of the present invention further provides an apparatus for converting an image into a melody, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the method for converting an image into a melody when executing the computer program.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution process of the computer program in the image-to-melody converting apparatus. For example, the computer program may be divided into the color clustering module 1, the normalization processing module 2, the mapping relation establishing module 3, the dominant hue extraction module 4, the instrument type determining module 5 and the first melody generating module 6 shown in fig. 2, with the following specific functions: the color clustering module 1 is used for acquiring an HSB value of each pixel point in a target image, and performing color clustering processing on each pixel point of the target image according to the HSB value to obtain a color clustering image corresponding to the target image; the normalization processing module 2 is used for performing normalization processing on each color block in the color clustering image to obtain a pronunciation point image corresponding to the target image; the mapping relation establishing module 3 is used for mapping the pronunciation point image into a pre-established grid and establishing a mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid; the dominant hue extraction module 4 is used for extracting the dominant hue of the target image according to the color clustering image; the instrument type determining module 5 is used for determining the type of the playing instrument according to the dominant hue of the target image and a preset tone instrument comparison table; and the first melody generating module 6 is used for extracting, according to the mapping relation, the scale corresponding to the pronunciation point in the pronunciation point image along the set direction of the grid, and converting that scale into audio by adopting the virtual instrument corresponding to the type of the playing instrument, so as to generate the melody corresponding to the target image.
The image-to-melody converting device can be a desktop computer, a notebook, a palm computer, a cloud server or other computing equipment. The image-to-melody converting device may include, but is not limited to, a processor and a memory. It will be appreciated by a person skilled in the art that the schematic diagram is merely an example of the image-to-melody converting apparatus and does not constitute a limitation on it; the apparatus may comprise more or fewer components than those shown, combine some components, or use different components; for example, the image-to-melody converting apparatus may further comprise an input-output device, a network access device, a bus, etc.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, said processor being the control center of said image to melody converting device, the various parts of the whole image to melody converting device being connected by means of various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the image-to-melody converting apparatus by running or executing the computer program and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the device (such as audio data, a phonebook, etc.), and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid state storage device.
Wherein, the module/unit integrated with the image-to-melody converting device can be stored in a computer readable storage medium if it is implemented in the form of software functional unit and sold or used as a stand-alone product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, etc. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that the above-described apparatus embodiments are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, a connection between modules indicates a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines. A person of ordinary skill in the art can understand and implement the embodiments without inventive effort.
The embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the above method for converting an image into a melody.
Compared with the prior art, the method for converting an image into a melody provided by the embodiment of the present invention has the following beneficial effects. The method comprises: acquiring an HSB value of each pixel in a target image, and performing color clustering on the pixels of the target image according to the HSB values to obtain a color cluster image corresponding to the target image; normalizing the color patches in the color cluster image to obtain a sound-point image corresponding to the target image; mapping the sound-point image onto a pre-established grid and establishing a mapping relationship between each sound-producing point in the sound-point image and each scale in the grid; extracting the dominant hue of the target image from the color cluster image; determining the type of playing instrument according to the dominant hue of the target image and a preset hue-instrument lookup table; and, following the set direction of the grid and the mapping relationship, extracting the scale corresponding to each sound-producing point in the sound-point image, converting those scales into audio with a virtual instrument of the determined type, and generating the melody corresponding to the target image. By this method, a target image can be converted into a specific musical melody, which greatly reduces the time and cost of producing a melody and meets people's demand for customized melodies. The embodiment of the present invention further provides an apparatus for converting an image into a melody and a computer-readable storage medium.
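The first step of the pipeline — reading an HSB (hue, saturation, brightness) value for every pixel — can be sketched as follows. This is an illustrative sketch only, not the patented implementation; Python's standard `colorsys` module is used as a stand-in, and the tiny sample `image` is invented for demonstration.

```python
import colorsys

def rgb_to_hsb(r, g, b):
    """Convert 0-255 RGB to HSB with H in degrees, S and B in percent."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (h * 360.0, s * 100.0, v * 100.0)

def image_to_hsb(pixels):
    """Map a 2-D grid of RGB triples to a grid of HSB triples."""
    return [[rgb_to_hsb(*px) for px in row] for row in pixels]

# Tiny 1x2 example image: pure red and mid grey.
image = [[(255, 0, 0), (128, 128, 128)]]
hsb = image_to_hsb(image)
# Pure red maps to hue 0, full saturation, full brightness;
# grey maps to zero saturation.
```

The HSB triples produced this way are the input to the color-clustering step described in claim 1.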
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (9)

1. A method for converting an image into a melody, comprising:
acquiring an HSB value of each pixel point in a target image, and performing color clustering processing on each pixel point of the target image according to the HSB value to obtain a color clustering image corresponding to the target image;
normalizing the color patches in the color cluster image to obtain a sound-point image corresponding to the target image, which specifically comprises:
acquiring the color patch with the smallest area in the color cluster image and setting that color patch as a sound-producing point;
adjusting the other color patches in the color cluster image to integer multiples of the sound-producing point;
generating the sound-point image according to the sound-producing points corresponding to the color patches in the color cluster image;
mapping the sound-point image onto a pre-established grid, and establishing a mapping relationship between each sound-producing point in the sound-point image and each scale in the grid;
extracting the dominant hue of the target image according to the color cluster image;
determining the type of playing instrument according to the dominant hue of the target image and a preset hue-instrument lookup table;
and extracting, along the set direction of the grid and according to the mapping relationship, the scale corresponding to each sound-producing point in the sound-point image, converting those scales into audio with the virtual instrument corresponding to the determined playing-instrument type, and generating the melody corresponding to the target image.
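The normalization sub-steps of claim 1 (smallest patch becomes the unit "sound-producing point"; every other patch is adjusted to an integer multiple of it) can be sketched as follows. This is a minimal sketch under the assumption that "adjusting" means snapping a patch's area to the nearest integer multiple of the unit area; the claim does not fix the rounding rule.

```python
def normalize_patches(patch_areas):
    """Claim-1-style normalization sketch: the smallest color patch
    becomes the unit sound-producing point, and every other patch
    area is snapped to the nearest integer multiple of that unit
    (at least 1). Returns (unit_area, multiples)."""
    unit = min(patch_areas)
    multiples = [max(1, round(a / unit)) for a in patch_areas]
    return unit, multiples

# Invented patch areas (in pixels) for illustration.
unit, mult = normalize_patches([40, 90, 210, 130])
```

Note that Python's built-in `round` uses banker's rounding for exact `.5` cases; a production implementation would pick its rounding rule explicitly.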
2. The method for converting an image into a melody as claimed in claim 1, further comprising:
collecting cover images corresponding to musical compositions of a plurality of playing musical instruments;
extracting the HSB value of each pixel in each cover image, and performing color clustering on the pixels of that cover image according to the HSB values to obtain a template color cluster image corresponding to that cover image, yielding N template color cluster images in total;
calculating the area ratio of each color patch in each template color cluster image, and taking the corresponding dominant hue and dominant-hue area ratio as the color distribution of that template color cluster image;
and statistically analysing the color distributions of the N template color cluster images together with the playing instruments corresponding to them, establishing a mapping relationship between each color distribution and its corresponding playing instrument, and generating the hue-instrument lookup table.
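Building the hue-instrument lookup table of claim 2 can be sketched as an aggregation over labelled samples. This is an assumption-laden sketch: the sample tuples and the choice of "average the color distributions per instrument" as the statistical analysis are invented for illustration; the claim only requires some statistical mapping from color distribution to instrument.

```python
from collections import defaultdict

def build_tone_instrument_table(samples):
    """Aggregate (dominant_hue_degrees, dominant_hue_area_ratio,
    instrument) samples drawn from album-cover cluster images into
    one averaged color distribution per instrument."""
    buckets = defaultdict(list)
    for hue, ratio, instrument in samples:
        buckets[instrument].append((hue, ratio))
    table = {}
    for instrument, dists in buckets.items():
        n = len(dists)
        table[instrument] = (sum(h for h, _ in dists) / n,
                             sum(r for _, r in dists) / n)
    return table

# Invented training samples: cool blues labelled piano, warm orange guitar.
samples = [
    (200.0, 0.50, "piano"), (220.0, 0.60, "piano"),
    (30.0, 0.55, "guitar"),
]
table = build_tone_instrument_table(samples)
```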
3. The method for converting an image into a melody according to claim 2, wherein the determining the type of playing instrument according to the dominant hue of the target image and a preset hue-instrument lookup table comprises:
calculating the area ratio of each color patch in the color cluster image corresponding to the target image to obtain the dominant-hue area ratio corresponding to the dominant hue of the target image;
comparing the dominant hue and dominant-hue area ratio of the target image with the color distributions in the hue-instrument lookup table, and determining the playing instrument corresponding to the color distribution with the smallest difference from the target image's dominant hue and dominant-hue area ratio as the playing-instrument type corresponding to the dominant hue in the color cluster image;
and determining the volume ratio of the instruments corresponding to the color patches in the color cluster image according to the dominant hue and dominant-hue area ratio of the target image.
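The "smallest difference" comparison of claim 3 amounts to a nearest-neighbour lookup over the table's color distributions. The following sketch assumes a weighted sum of circular hue distance and area-ratio gap as the difference metric; the claim leaves the metric unspecified, and the table contents are invented.

```python
def pick_instrument(target_hue, target_ratio, table):
    """Choose the instrument whose stored color distribution differs
    least from the target image's dominant hue and dominant-hue
    area ratio."""
    def diff(dist):
        hue, ratio = dist
        # Hue is circular: 350 deg and 10 deg are 20 deg apart.
        hue_gap = min(abs(target_hue - hue), 360.0 - abs(target_hue - hue))
        return hue_gap / 180.0 + abs(target_ratio - ratio)
    return min(table, key=lambda inst: diff(table[inst]))

table = {"piano": (210.0, 0.55), "guitar": (30.0, 0.60), "flute": (120.0, 0.40)}
chosen = pick_instrument(215.0, 0.50, table)  # blue-dominant target
```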
4. The method for converting an image into a melody according to claim 1, wherein the acquiring an HSB value of each pixel in the target image and performing color clustering on the pixels of the target image according to the HSB values to obtain the color cluster image corresponding to the target image specifically comprises:
acquiring the HSB value of each pixel in the target image;
according to the HSB values of the pixels in the target image, locating the pixels whose hue distance exceeds a first threshold to obtain a plurality of abrupt-color regions;
calculating the mean hue value of adjacent pixels within an abrupt-color region whose HSB-value difference is smaller than a second threshold, and aggregating those adjacent pixels into a color patch labelled with the mean hue value;
and, when the hue distance between adjacent pixels in the abrupt-color regions is zero, generating the color cluster image from the aggregated color patches.
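The merge step of claim 4 — fusing adjacent pixels whose hue difference is below a threshold into a patch labelled with their mean hue — can be sketched in one dimension. This is a simplified illustration (a single pixel row, invented hue values and threshold); the full method would merge neighbours in both axes.

```python
def cluster_row(hues, threshold):
    """Walk a row of pixel hue values and merge adjacent pixels whose
    hue difference is below `threshold` into one patch labelled with
    the running mean hue. Returns a list of (mean_hue, pixel_count)."""
    patches = []
    for h in hues:
        if patches and abs(h - patches[-1][0]) < threshold:
            mean, n = patches[-1]
            patches[-1] = ((mean * n + h) / (n + 1), n + 1)  # update running mean
        else:
            patches.append((h, 1))  # hue jump: start a new patch
    return patches

# Three reddish pixels, then two cyan-ish pixels -> two patches.
patches = cluster_row([10.0, 12.0, 11.0, 200.0, 202.0], threshold=15.0)
```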
5. The method for converting an image into a melody according to claim 1, wherein the mapping the sound-point image onto a pre-established grid to establish a mapping relationship between each sound-producing point in the sound-point image and each scale in the grid comprises:
setting the area of each grid cell according to the area of the sound-producing point and a preset ratio, and establishing the grid, wherein each row of the grid corresponds to a scale and each column of the grid corresponds to a time point;
mapping each sound-producing point in the sound-point image onto the grid;
when a sound-producing point straddles a grid line, calculating the area share of the point falling in each of the adjacent cells joined by that grid line, and assigning the point to the cell containing the larger share;
and establishing the mapping relationship between each sound-producing point in the sound-point image and each scale in the grid according to the position of each sound-producing point in the grid and the scale corresponding to each row of the grid.
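The tie-break of claim 5 (a point straddling a grid line goes to the adjacent cell holding the larger share of it) can be shown with a one-dimensional stand-in for the area ratio. The coordinates below are invented for illustration.

```python
def assign_to_cell(point_left, point_right, gridline_x):
    """A sound point spanning the vertical grid line at `gridline_x`
    goes to whichever adjacent cell contains the larger share of its
    width (a 1-D stand-in for the area ratio in the claim)."""
    left_share = gridline_x - point_left
    right_share = point_right - gridline_x
    return "left" if left_share > right_share else "right"

# Point spans x = 3..9; grid line at x = 7: 4 units left, 2 units right.
cell = assign_to_cell(3.0, 9.0, 7.0)
```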
6. The method according to claim 5, wherein the extracting, along the set direction of the grid and according to the mapping relationship, the scale corresponding to each sound-producing point in the sound-point image and converting those scales into audio with the virtual instrument corresponding to the playing-instrument type to generate the melody corresponding to the target image comprises:
taking as the set direction the time-axis direction formed by the time points corresponding to the columns of the grid;
extracting the scale corresponding to each sound-producing point in the sound-point image according to the mapping relationship and the time-axis direction of the grid;
when a plurality of sound-producing points occupy adjacent cells in any row of the grid, merging them into a long tone of the scale corresponding to that row;
extracting the time point corresponding to each sound-producing point in the sound-point image along the time-axis direction;
and converting, according to the scales and time points corresponding to the sound-producing points in the sound-point image, those scales into audio with the virtual instrument corresponding to the playing-instrument type, and generating the melody corresponding to the target image.
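The long-tone rule of claim 6 — adjacent occupied cells in one row become a single sustained note — can be sketched as a run-length scan over one grid row. The scale name and cell pattern below are invented for illustration.

```python
def row_to_notes(cells, scale):
    """One grid row corresponds to one scale degree; each column is a
    time step. Runs of adjacent occupied cells become a single long
    tone whose duration is the run length. Returns a list of
    (scale, start_column, duration) triples."""
    notes, start = [], None
    for col, occupied in enumerate(cells + [False]):  # sentinel flushes the last run
        if occupied and start is None:
            start = col
        elif not occupied and start is not None:
            notes.append((scale, start, col - start))
            start = None
    return notes

# Row for scale "E4": columns 1-3 occupied (a long tone), column 6 a short note.
notes = row_to_notes([False, True, True, True, False, False, True], "E4")
```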
7. The method for converting an image into a melody according to claim 6, wherein the extracting, along the set direction of the grid and according to the mapping relationship, the scale corresponding to each sound-producing point in the sound-point image and converting those scales into audio with the virtual instrument corresponding to the playing-instrument type to generate the melody corresponding to the target image further comprises:
adjusting the scales corresponding to the rows of the grid, re-establishing the mapping relationship between each sound-producing point in the sound-point image and each scale in the grid, and regenerating the melody corresponding to the target image with the virtual instrument, to obtain N melodies corresponding to the target image;
converting the N melodies corresponding to the target image into waveform diagrams, obtaining N waveform diagrams in total;
calculating the similarity between each waveform diagram and a plurality of template waveform diagrams pre-stored in a waveform-template database, and taking the maximum of each waveform diagram's similarities to the template waveform diagrams as the reference value of that waveform diagram;
extracting, from the N waveform diagrams, the waveform diagram with the largest reference value;
and taking the melody corresponding to the waveform diagram with the largest reference value as the target melody of the target image.
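The selection step of claim 7 — score each candidate waveform by its best similarity against a template bank, then keep the winner — can be sketched as follows. Cosine similarity is an assumption standing in for whatever metric an implementation uses, and the template and candidate waveforms are invented equal-length lists.

```python
def best_melody(waveforms, template_bank):
    """Score each candidate waveform by its maximum similarity to any
    template waveform (its 'reference value') and return the index of
    the candidate with the largest reference value."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    scores = [max(cosine(w, t) for t in template_bank) for w in waveforms]
    return scores.index(max(scores))

templates = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
candidates = [[0.2, 1.1, 0.1, -0.9],   # close to templates[1]
              [1.0, 1.0, 1.0, 1.0]]    # orthogonal to both templates
best = best_melody(candidates, templates)
```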
8. An apparatus for image-to-melody conversion, comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the method of image-to-melody conversion according to any of claims 1 to 7 when executing the computer program.
9. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus on which the computer-readable storage medium is located to perform the method for converting an image into a melody according to any one of claims 1 to 7.
CN201810427683.7A 2018-05-07 2018-05-07 Method and device for converting image into melody and computer readable storage medium Active CN108960250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810427683.7A CN108960250B (en) 2018-05-07 2018-05-07 Method and device for converting image into melody and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN108960250A CN108960250A (en) 2018-12-07
CN108960250B true CN108960250B (en) 2020-08-25

Family

ID=64498915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810427683.7A Active CN108960250B (en) 2018-05-07 2018-05-07 Method and device for converting image into melody and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN108960250B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109872340B (en) * 2019-01-03 2023-06-27 广东智媒云图科技股份有限公司 Composition method, electronic device and computer readable storage medium
CN110322520A (en) * 2019-07-04 2019-10-11 厦门美图之家科技有限公司 Image key color extraction method, apparatus, electronic equipment and storage medium
CN112652030B (en) * 2020-12-11 2023-09-19 浙江工商大学 Color space position layout recommendation method based on specific scene
CN113160781B (en) * 2021-04-12 2023-11-17 广州酷狗计算机科技有限公司 Audio generation method, device, computer equipment and storage medium
CN113885829B (en) * 2021-10-25 2023-10-31 北京字跳网络技术有限公司 Sound effect display method and terminal equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005165194A (en) * 2003-12-05 2005-06-23 Nippon Hoso Kyokai <Nhk> Music data converter and music data conversion program
CN1862656A (en) * 2005-05-13 2006-11-15 杭州波导软件有限公司 Method for converting musci score to music output and apparatus thereof
KR20070059253A (en) * 2005-12-06 2007-06-12 최종민 The method for transforming the language into symbolic melody
KR20080083433A (en) * 2007-03-12 2008-09-18 주식회사 하모니칼라시스템 Method and apparatus for converting image to sound
CN102289778A (en) * 2011-05-10 2011-12-21 南京大学 Method for converting image into music
CN104918059A (en) * 2015-05-19 2015-09-16 京东方科技集团股份有限公司 Method and device for image transmission and terminal device
CN105205047A (en) * 2015-09-30 2015-12-30 北京金山安全软件有限公司 Playing method, converting method and device of musical instrument music score file and electronic equipment
CN106203465A (en) * 2016-06-24 2016-12-07 百度在线网络技术(北京)有限公司 A kind of method and device generating the music score of Chinese operas based on image recognition
CN107239482A (en) * 2017-04-12 2017-10-10 中国科学院光电研究院 A kind of processing method and server for converting the image into music
CN107967476A (en) * 2017-12-05 2018-04-27 北京工业大学 A kind of method that image turns sound



Similar Documents

Publication Publication Date Title
CN108960250B (en) Method and device for converting image into melody and computer readable storage medium
CN108805171B (en) Method, device and computer readable storage medium for converting image to music melody
CN108898643B (en) Image generation method, device and computer readable storage medium
CN108615253B (en) Image generation method, device and computer readable storage medium
US20050109194A1 (en) Automatic musical composition classification device and method
CN110444185B (en) Music generation method and device
CN109472832B (en) Color scheme generation method and device and intelligent robot
CN108876871A (en) Image processing method, device and computer readable storage medium based on circle fitting
CN109656366B (en) Emotional state identification method and device, computer equipment and storage medium
KR20150112048A (en) music-generation method based on real-time image
CN114764768A (en) Defect detection and classification method and device, electronic equipment and storage medium
CN106997769B (en) Trill recognition method and device
CN108962231A (en) A kind of method of speech classification, device, server and storage medium
CN109697083B (en) Fixed-point acceleration method and device for data, electronic equipment and storage medium
US20220284720A1 (en) Method for grouping cells according to density and electronic device employing method
CN111444379A (en) Audio feature vector generation method and audio segment representation model training method
CN110969141A (en) Music score generation method and device based on audio file identification and terminal equipment
CN109859284B (en) Dot-based drawing implementation method and system
CN109191539B (en) Oil painting generation method and device based on image and computer readable storage medium
CN108492347A (en) Image generating method, device and computer readable storage medium
CN110874567B (en) Color value judging method and device, electronic equipment and storage medium
CN115914772A (en) Video synthesis method and device, electronic equipment and storage medium
CN111429949B (en) Pitch line generation method, device, equipment and storage medium
CN113093967A (en) Data generation method, data generation device, computer device, and storage medium
Wu et al. A study of image-based music composition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant