CN108960250B - Method and device for converting image into melody and computer readable storage medium - Google Patents

Method and device for converting image into melody and computer readable storage medium

Info

Publication number
CN108960250B
CN108960250B (application CN201810427683.7A)
Authority
CN
China
Prior art keywords
image
color
point
target image
melody
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810427683.7A
Other languages
Chinese (zh)
Other versions
CN108960250A (en)
Inventor
邓立邦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Matview Intelligent Science & Technology Co ltd
Original Assignee
Guangdong Matview Intelligent Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Matview Intelligent Science & Technology Co ltd filed Critical Guangdong Matview Intelligent Science & Technology Co ltd
Priority to CN201810427683.7A priority Critical patent/CN108960250B/en
Publication of CN108960250A publication Critical patent/CN108960250A/en
Application granted granted Critical
Publication of CN108960250B publication Critical patent/CN108960250B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/56 - Extraction of image or video features relating to colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 - Details of electrophonic musical instruments
    • G10H1/0033 - Recording/reproducing or transmission of music for electrophonic musical instruments
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 - Music Composition or musical creation; Tools or processes therefor
    • G10H2210/111 - Automatic composing, i.e. using predefined musical rules
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 - Music Composition or musical creation; Tools or processes therefor
    • G10H2210/145 - Composing rules, e.g. harmonic or musical rules, for use in automatic composition; Rule generation algorithms therefor
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 - Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/541 - Details of musical waveform synthesis, i.e. audio waveshape processing from individual wavetable samples, independently of their origin or of the sound they represent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Auxiliary Devices For Music (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention provides a method, an apparatus and a computer-readable storage medium for converting an image into a melody. The method comprises: acquiring the HSB value of every pixel in a target image and performing color clustering to obtain a color-cluster image corresponding to the target image; normalizing each color block in the color-cluster image to obtain a pronunciation-point image corresponding to the target image; mapping the pronunciation-point image onto a pre-established grid and establishing a mapping relation between each pronunciation point in the image and the scales of the grid; and, following the set direction of the grid, extracting the scale corresponding to each pronunciation point according to that mapping relation and converting those scales into audio with the virtual instrument matching the playing instrument determined from the color-cluster image, thereby generating the melody corresponding to the target image. The method converts a target image into a specific melody, greatly reducing the time and cost of producing a musical melody and meeting users' demand for customized melodies.

Description

Method and device for converting image into melody and computer readable storage medium
Technical Field
The invention relates to the technical field of image and music processing, in particular to a method and a device for converting an image into a melody and a computer readable storage medium.
Background
Music is a form of expression of human emotion, and melody is its most basic element; music artists create music by composing melodies. With the continuing development of digital music and computer technology, more and more people want computers to compose music automatically to meet personalized needs, for example matching a unique piece of background music to a recorded video, accompanying a photo album with a melody, or setting a one-of-a-kind personalized ringtone on a mobile phone. However, composing a beautiful melody of one's own is very difficult for ordinary people, and current computer-aided composition requires specialized equipment and systems that are expensive, time-consuming and complex to operate, with a learning cost far too high for ordinary users.
Disclosure of Invention
The invention aims to provide a method, an apparatus and a computer-readable storage medium for converting an image into a melody, capable of converting a target image into a specific musical melody, greatly reducing the time and cost of producing a musical melody, and meeting users' demand for customized melodies.
The embodiment of the invention provides a method for converting an image into a melody, which comprises the following steps:
acquiring the HSB value of each pixel in a target image, and performing color clustering on the pixels of the target image according to those HSB values to obtain a color-cluster image corresponding to the target image;
normalizing the color blocks in the color-cluster image to obtain a pronunciation-point image corresponding to the target image;
mapping the pronunciation-point image onto a pre-established grid, and establishing a mapping relation between each pronunciation point in the pronunciation-point image and the scales of the grid;
extracting the dominant hues of the target image from the color-cluster image;
determining the type of playing instrument from the dominant hues of the target image and a preset hue-instrument comparison table;
extracting, along the set direction of the grid and according to the mapping relation, the scale corresponding to each pronunciation point in the pronunciation-point image, converting those scales into audio with the virtual instrument corresponding to the determined type of playing instrument, and generating the melody corresponding to the target image.
Preferably, the method for converting an image into a melody further comprises:
collecting cover images of musical works performed by a plurality of playing instruments;
extracting the HSB value of each pixel in each cover image and performing color clustering on its pixels according to those HSB values to obtain a template color-cluster image for each cover image, giving N template color-cluster images in total;
calculating the area ratio of each color block in each template color-cluster image to obtain the dominant hues and dominant-hue area ratios of that image as its color distribution;
statistically analysing the color distributions of the N template color-cluster images together with their corresponding playing instruments, establishing a mapping between each color distribution and its playing instrument, and generating the hue-instrument comparison table.
Preferably, determining the type of playing instrument from the dominant hues of the target image and the preset hue-instrument comparison table specifically comprises:
calculating the area ratio of each color block in the color-cluster image of the target image to obtain the area ratio of each dominant hue of the target image;
comparing the dominant hues and dominant-hue area ratios of the target image with the color distributions in the hue-instrument comparison table, and taking the playing instrument of the color distribution with the smallest difference as the type of playing instrument for the corresponding dominant hue in the color-cluster image;
determining the volume ratio of the instruments corresponding to the color blocks in the color-cluster image according to the dominant hues and dominant-hue area ratios of the target image.
Preferably, acquiring the HSB value of each pixel in the target image and performing color clustering on the pixels according to those HSB values to obtain the color-cluster image corresponding to the target image specifically comprises:
acquiring the HSB value of each pixel in the target image;
locating, from those HSB values, the pixels whose hue distance exceeds a first threshold, thereby obtaining a number of color-mutation regions;
within each color-mutation region, averaging the hue values of adjacent pixels whose HSB difference is below a second threshold and aggregating those pixels into a color block with that average hue;
when the hue distance between adjacent pixels within each color-mutation region has fallen to zero, generating the color-cluster image from the aggregated color blocks.
Preferably, normalizing the color blocks in the color-cluster image to obtain the pronunciation-point image corresponding to the target image specifically comprises:
taking the color block with the smallest area in the color-cluster image and setting it as a pronunciation point;
adjusting the other color blocks in the color-cluster image to integer multiples of that pronunciation point;
generating the pronunciation-point image from the pronunciation points corresponding to the color blocks in the color-cluster image.
Preferably, mapping the pronunciation-point image onto the pre-established grid and establishing the mapping relation between each pronunciation point in the pronunciation-point image and the scales of the grid specifically comprises:
setting the area of each square cell from the pronunciation-point area and a preset ratio, and building the grid, where each row of the grid corresponds to a scale and each column to a time point;
mapping each pronunciation point of the pronunciation-point image onto the grid;
when a pronunciation point straddles a grid line, computing its area share in each of the adjacent cells and assigning it to the cell holding the larger share;
establishing the mapping relation between each pronunciation point and the scales of the grid from the position of each point in the grid and the scale corresponding to each row.
Preferably, extracting, according to the mapping relation, the scale corresponding to each pronunciation point in the pronunciation-point image along the set direction of the grid, and converting those scales into audio with the virtual instrument corresponding to the type of playing instrument to generate the melody corresponding to the target image, specifically comprises:
taking as the set direction the time-axis direction formed by the time points corresponding to the columns of the grid;
extracting the scale corresponding to each pronunciation point in the pronunciation-point image according to the mapping relation, following the time-axis direction of the grid;
when several pronunciation points occupy adjacent cells in any one row of the grid, merging them into a long tone of that row's scale;
extracting the time point corresponding to each pronunciation point in the pronunciation-point image along the time-axis direction;
converting, from the scales and time points so obtained, the pronunciation points in the pronunciation-point image into audio with the virtual instrument corresponding to the type of playing instrument, and generating the melody corresponding to the target image.
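The long-tone rule in these steps can be sketched as follows: the grid is modelled as a set of (row, column) cells occupied by pronunciation points, rows standing for scales and columns for time points, and horizontally adjacent points in one row are merged into a single sustained note. The representation and function name are illustrative assumptions, not the patented implementation.

```python
def extract_notes_with_duration(points):
    """points: set of (row, col) grid cells occupied by pronunciation points.

    Merges runs of horizontally adjacent cells in the same row into one
    long tone and returns (row, start_col, length) triples ordered along
    the time axis."""
    notes = []
    for row in sorted({r for r, _ in points}):
        cols = sorted(c for r, c in points if r == row)
        start = prev = None
        for c in cols:
            if start is None:          # open a new run
                start = prev = c
            elif c == prev + 1:        # extend the current run
                prev = c
            else:                      # gap: close the run, start another
                notes.append((row, start, prev - start + 1))
                start = prev = c
        if start is not None:
            notes.append((row, start, prev - start + 1))
    return sorted(notes, key=lambda n: (n[1], n[0]))
```

Three adjacent points in row 2, for instance, become one note of length 3 rather than three separate unit notes.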
Preferably, extracting, according to the mapping relation, the scale corresponding to each pronunciation point along the set direction of the grid, and converting those scales into audio with the virtual instrument corresponding to the type of playing instrument to generate the melody corresponding to the target image, further comprises:
adjusting the scales assigned to the rows of the grid, re-establishing the mapping relation between each pronunciation point in the pronunciation-point image and the scales of the grid, and regenerating the melody with the virtual instrument, until N melodies corresponding to the target image are obtained;
converting the N melodies corresponding to the target image into waveform diagrams, giving N waveform diagrams in total;
calculating, for each waveform diagram, its similarity to each of a number of template waveform diagrams pre-stored in a waveform-template database, and taking the maximum of those similarities as that diagram's reference value;
selecting, from the N waveform diagrams, the diagram with the largest reference value;
taking the melody corresponding to that waveform diagram as the target melody of the target image.
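The selection step can be sketched numerically: each candidate waveform gets, as its reference value, its maximum similarity over the template database, and the candidate with the largest reference value wins. The patent does not name a similarity metric, so a plain cosine similarity is assumed here; all names are illustrative.

```python
import math

def best_melody(waveforms, templates):
    """waveforms, templates: lists of equal-length sample vectors.

    Returns the index of the candidate waveform whose maximum similarity
    over all templates (its reference value) is largest."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    reference = [max(cosine(w, t) for t in templates) for w in waveforms]
    return max(range(len(reference)), key=reference.__getitem__)
```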
The embodiment of the present invention further provides an apparatus for converting an image into a melody, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the method for converting an image into a melody when executing the computer program.
The embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when run, controls the device in which the storage medium resides to perform the above method for converting an image into a melody.
Compared with the prior art, the method for converting an image into a melody provided by the embodiments of the invention has the following beneficial effects. The method acquires the HSB value of each pixel in a target image and performs color clustering to obtain a color-cluster image; normalizes the color blocks to obtain a pronunciation-point image; maps the pronunciation-point image onto a pre-established grid and establishes a mapping relation between pronunciation points and scales; extracts the dominant hues of the target image from the color-cluster image; determines the type of playing instrument from the dominant hues and a preset hue-instrument comparison table; and, along the set direction of the grid, extracts the scales of the pronunciation points according to the mapping relation and converts them into audio with the corresponding virtual instrument to generate the melody of the target image. The target image can thus be converted into a specific musical melody, greatly reducing the time and cost of producing a melody and meeting users' demand for customized melodies. Embodiments of the invention further provide an apparatus for converting an image into a melody and a computer-readable storage medium.
Drawings
FIG. 1 is a flowchart illustrating a method for converting an image into a melody according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an apparatus for converting an image into a melody according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Please refer to fig. 1, which is a flowchart illustrating a method for image-to-melody conversion according to an embodiment of the present invention, the method comprising:
S100: acquiring the HSB value of each pixel in a target image, and performing color clustering on the pixels of the target image according to those HSB values to obtain a color-cluster image corresponding to the target image;
S200: normalizing the color blocks in the color-cluster image to obtain a pronunciation-point image corresponding to the target image;
S300: mapping the pronunciation-point image onto a pre-established grid, and establishing a mapping relation between each pronunciation point in the pronunciation-point image and the scales of the grid;
S400: extracting the dominant hues of the target image from the color-cluster image;
S500: determining the type of playing instrument from the dominant hues of the target image and a preset hue-instrument comparison table;
S600: extracting, along the set direction of the grid and according to the mapping relation, the scale corresponding to each pronunciation point in the pronunciation-point image, converting those scales into audio with the virtual instrument corresponding to the determined type of playing instrument, and generating the melody corresponding to the target image.
In this embodiment, the target image is color-clustered and normalized to obtain a pronunciation-point image, the pronunciation-point image is mapped into a preset grid, a mapping relation between pronunciation points and scales is established, and the type of playing instrument is then determined from the dominant hues of the color-cluster image. Using the virtual instrument corresponding to that type and following the time-axis direction of the grid, the target image can be converted through the mapping relation into a specific musical melody. This greatly reduces the time, cost and difficulty of producing music and meets users' demand for customized melodies, giving the method broad application prospects in personalized ringtones, electronic-album background music, screen-saver background music, film and television scoring, and the like.
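The grid convention underlying the embodiment (rows as scales, columns as time points, read along the time axis) can be illustrated with a minimal sketch; the scale names and the function name are assumptions made only for this example.

```python
SCALE = ["C4", "D4", "E4", "F4", "G4", "A4", "B4"]  # one scale per grid row (assumed)

def pronunciation_points_to_notes(points):
    """points: set of (row, col) cells occupied by pronunciation points.

    Reads the grid column by column (the time-axis direction) and returns
    (time_index, scale_name) pairs for each occupied cell."""
    notes = []
    for col in sorted({c for _, c in points}):
        for row in sorted(r for r, c in points if c == col):
            notes.append((col, SCALE[row]))
    return notes
```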
Step S400, extracting the dominant hues of the target image from the color-cluster image, specifically comprises: clustering the colors of the image with a clustering algorithm; concretely, partitioning the image into blocks, repeatedly averaging adjacent points with similar HSB values and aggregating them into the same color block, so that the target image is processed into a combination of dominant-hue color blocks (combinations of shapes such as triangles, circles and rectangles). The color value of each block, i.e. a dominant hue of the target image, is then extracted, and the area ratio of each block within the target image is calculated. Further, a color block whose area ratio in the color-cluster image exceeds a set threshold is determined to be a dominant hue of the target image.
In an optional embodiment, the method for converting an image into a melody further comprises:
collecting cover images of musical works performed by a plurality of playing instruments;
extracting the HSB value of each pixel in each cover image and performing color clustering on its pixels according to those HSB values to obtain a template color-cluster image for each cover image, giving N template color-cluster images in total;
calculating the area ratio of each color block in each template color-cluster image to obtain the dominant hues and dominant-hue area ratios of that image as its color distribution;
statistically analysing the color distributions of the N template color-cluster images together with their corresponding playing instruments, establishing a mapping between each color distribution and its playing instrument, and generating the hue-instrument comparison table.
In this embodiment, a large number of cover images of musical works (e.g. CDs, DVDs and digital releases) played on various instruments are collected. The HSB values of each cover image are extracted; neighbouring points with similar HSB values are averaged and aggregated, color-clustering each cover image into regions of color blocks. The color value of each block is extracted and its area ratio within the cover image calculated, yielding the dominant hues and area ratios of the cover image. Statistical analysis then reveals how dominant hues and their area ratios vary across the cover images of different playing instruments, producing a large body of statistics relating each playing instrument to the dominant hues and dominant-hue area ratios of its cover images, i.e. the color distribution of the cover image.
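Under the assumption that each clustered cover image is summarised as a mapping from hue (in degrees) to pixel count, the construction of the table can be sketched as follows; the 10% dominance threshold and all names are illustrative, not specified by the patent.

```python
def color_distribution(blocks, threshold=0.1):
    """blocks: {hue_degrees: pixel_count} for one clustered image.

    Keeps the hues whose area ratio reaches the threshold (the dominant
    hues) and returns {hue: area_ratio} -- the image's color distribution."""
    total = sum(blocks.values())
    return {h: n / total for h, n in blocks.items() if n / total >= threshold}

def build_comparison_table(cover_data):
    """cover_data: [(instrument_name, blocks)] pairs gathered from covers.

    Returns the hue-instrument comparison table as a list of
    (color_distribution, instrument_name) entries."""
    return [(color_distribution(blocks), instrument) for instrument, blocks in cover_data]
```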
In an alternative embodiment, S500: determining the type of playing instrument from the dominant hues of the target image and the preset hue-instrument comparison table, specifically comprises:
calculating the area ratio of each color block in the color-cluster image of the target image to obtain the area ratio of each dominant hue of the target image;
comparing the dominant hues and dominant-hue area ratios of the target image with the color distributions in the hue-instrument comparison table, and taking the playing instrument of the color distribution with the smallest difference as the type of playing instrument for the corresponding dominant hue in the color-cluster image;
determining the volume ratio of the instruments corresponding to the color blocks in the color-cluster image according to the dominant hues and dominant-hue area ratios of the target image.
In the present embodiment, specifically, the dominant hues of the target image from which a melody is to be generated are extracted, and the corresponding types of playing instrument and their volumes are determined from those dominant hues and their area ratios within the image. Because the color distribution differs from picture to picture (some images are rich in content and color, others sparse and nearly monochrome), a threshold is set to determine the number of dominant hues, and the area ratios decide whether a single instrument or a combination of instruments is used. The dominant hues and dominant-hue area ratios of the target image are compared against the correspondence between playing instruments and color distributions statistically compiled in advance and stored on the server, i.e. the hue-instrument comparison table; the playing instrument whose color distribution is closest to the dominant-hue combination of the target image is found, giving the instrument combination for the melody to be generated. For example, if the target image contains M dominant hues, M corresponding instruments can be determined from those hues and their area ratios and used together to generate the melody.
For example, when a single color block occupies 80% or more of the target image, a single instrument is used. As another example, when the dominant hues of the target image occupy 40%, 30%, 20% and 10% of its area respectively, the hue-instrument comparison table gives the mapping between playing instruments and color distributions, and hence the type of playing instrument for each dominant hue; the instruments corresponding to the 40%, 30%, 20% and 10% hues then play the scales marked in the grid together in ensemble, with their volumes likewise allotted in proportion to the dominant-hue areas.
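The closest-match lookup and the area-proportional volume rule can be sketched as below, reusing the {hue: area_ratio} representation. The L1 distance between distributions is an assumption, since the patent only asks for the "smallest difference"; names are illustrative.

```python
def match_instrument(distribution, table):
    """distribution: {hue: area_ratio} of the target image.
    table: [(color_distribution, instrument_name)] entries.

    Picks the table entry with the smallest summed absolute difference
    over all hues, and allots volumes in proportion to the hue areas."""
    def difference(a, b):
        hues = set(a) | set(b)
        return sum(abs(a.get(h, 0.0) - b.get(h, 0.0)) for h in hues)

    _, instrument = min(table, key=lambda entry: difference(distribution, entry[0]))
    volumes = dict(distribution)  # volume ratio taken equal to area ratio
    return instrument, volumes
```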
In an alternative embodiment, S100: acquiring the HSB value of each pixel in the target image and performing color clustering on the pixels according to those HSB values to obtain the color-cluster image corresponding to the target image, specifically comprises:
acquiring the HSB value of each pixel in the target image;
locating, from those HSB values, the pixels whose hue distance exceeds a first threshold, thereby obtaining a number of color-mutation regions;
within each color-mutation region, averaging the hue values of adjacent pixels whose HSB difference is below a second threshold and aggregating those pixels into a color block with that average hue;
when the hue distance between adjacent pixels within each color-mutation region has fallen to zero, generating the color-cluster image from the aggregated color blocks.
In this embodiment, the first threshold ranges from 60 to 130 degrees; preferably it is 60 degrees, and the second threshold is 15 degrees. For example, when the hue distance between two pixels of the target image exceeds 60 degrees, a color-mutation region is identified. After the color-mutation regions are found, adjacent pixels of the target image are analysed further, and adjacent pixels with close HSB values are averaged and aggregated into a color block. For instance, for two adjacent pixels A and B with HSB values of H=42°, S=43, B=21 for point A and H=38°, S=42, B=25 for point B, the hue values 42° and 38° are within 15 degrees of each other, so A and B are aggregated into a color block with the average hue H=40°. Different pairs of adjacent points with close HSB values are repeatedly selected and their average hue computed until all adjacent points with similar HSB values have been aggregated; finally the target image is processed into a number of distinct color blocks, generating the color-cluster image.
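The A/B example can be turned into a small sketch of the second-threshold aggregation, shown here in one dimension for a single row of hue values; the loop structure is an illustrative simplification of the claimed region-wise clustering, and the names are assumptions.

```python
def merge_hues(h_a, h_b, second_threshold=15):
    """Average two neighbouring hues if they differ by less than the
    second threshold (15 degrees); return None when no merge happens."""
    if abs(h_a - h_b) < second_threshold:
        return (h_a + h_b) / 2
    return None

def cluster_row(hues, second_threshold=15):
    """Repeatedly aggregate neighbouring hues until no adjacent pair is
    within the threshold, yielding the final color-block hues."""
    hues = list(hues)
    changed = True
    while changed:
        changed = False
        for i in range(len(hues) - 1):
            merged = merge_hues(hues[i], hues[i + 1], second_threshold)
            if merged is not None:
                hues[i:i + 2] = [merged]  # collapse the pair into one block
                changed = True
                break
    return hues
```

Points A (42°) and B (38°) collapse into one 40° block, while a 200° neighbour stays a separate block.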
In an alternative embodiment, S200: normalizing the color patches in the color clustering image to obtain a phonation point image corresponding to the target image, which specifically comprises:
acquiring a color block with the minimum area in the color clustering image, and setting the color block with the minimum area as a sound producing point;
adjusting the other color blocks in the color clustering image to be integral multiples of the phonation point;
and generating the phonation point image according to the phonation points corresponding to the color blocks in the color clustering image.
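A minimal sketch of this normalization, assuming each color block is represented only by its pixel area (the names are illustrative):

```python
def normalize_blocks(block_areas):
    """The smallest color block becomes the unit phonation point; every
    other block's area is rounded to an integral multiple of that unit."""
    unit = min(block_areas)
    multiples = [max(1, round(area / unit)) for area in block_areas]
    return multiples, unit

multiples, unit = normalize_blocks([12, 25, 49])
# unit == 12; multiples == [1, 2, 4]
```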
In an alternative embodiment, S300: mapping the pronunciation point image to a pre-established grid, and establishing a mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid, which specifically comprises the following steps:
setting the area of the square grid and establishing the grid according to the area of the phonation point and a preset proportion; each row of the grid corresponds to a scale, and each column of the grid corresponds to a time point;
mapping each sound point in the sound point image to the grid;
when the phonation points are distributed on the grid lines of the grid, respectively calculating the area occupation ratio of the phonation points in the adjacent grids connected with the grid lines, and distributing the phonation points to one grid with the larger area occupation ratio of the phonation points in the adjacent grids;
and establishing a mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid according to the position of each pronunciation point in the pronunciation point image in the grid and the scale corresponding to each row in the grid.
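The mapping and the grid-line tie-break can be sketched as below; the per-row scale assignment and the cell size are illustrative assumptions:

```python
SCALES = ["C4", "D4", "E4", "G4", "A4"]  # illustrative scale per grid row

def map_point(x, y, cell_size):
    """Map a phonation-point center to (time column, scale of its row)."""
    col, row = int(x // cell_size), int(y // cell_size)
    return col, SCALES[row % len(SCALES)]

def assign_straddling_point(overlap_areas):
    """When a phonation point lies on a grid line, give it to the adjacent
    cell in which it covers the larger area."""
    return max(overlap_areas, key=overlap_areas.get)

# point overlaps cell (row 2, col 5) with area 7 and cell (row 3, col 5) with area 3
cell = assign_straddling_point({(2, 5): 7.0, (3, 5): 3.0})
# cell == (2, 5): the point covers more area in row 2 than in row 3
```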
In an alternative embodiment, S400: extracting the scale corresponding to the pronunciation point in the pronunciation point image along the set direction of the grid according to the mapping relation, converting the scale corresponding to the pronunciation point in the pronunciation point image into audio by adopting the virtual instrument corresponding to the type of the played instrument, and generating the melody corresponding to the target image, wherein the steps specifically comprise:
the set direction is a time axis direction formed by time points corresponding to each row of the grid;
extracting a scale corresponding to the pronunciation point in the pronunciation point image according to the mapping relation and the time axis direction corresponding to the grids;
when a plurality of sound producing points are positioned in adjacent grids of a given row in the grid, the sound producing points are adjusted to a long tone of the scale corresponding to that row;
extracting time points corresponding to the sound points in the sound point images according to the time axis direction;
and converting the scale corresponding to the sound producing point in the sound producing point image into audio by adopting a virtual instrument corresponding to the type of the played instrument according to the scale and the time point corresponding to the sound producing point in the sound producing point image, and generating the melody corresponding to the target image.
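The traversal along the time axis, including the long-tone rule for adjacent cells in one row, might look like this sketch (the note representation is an assumption):

```python
def build_melody(cells):
    """cells: (column, scale) pairs, one per sounded grid cell.
    Runs of adjacent columns on the same scale row merge into one long
    tone; the result is a list of (scale, start_column, duration)."""
    melody = []
    for col, scale in sorted(cells):
        if melody and melody[-1][0] == scale and melody[-1][1] + melody[-1][2] == col:
            prev_scale, start, dur = melody.pop()
            melody.append((prev_scale, start, dur + 1))  # extend the long tone
        else:
            melody.append((scale, col, 1))
    return melody

notes = build_melody([(0, "C4"), (1, "C4"), (2, "E4")])
# notes == [("C4", 0, 2), ("E4", 2, 1)]
```

Rendering each `(scale, start, duration)` triple with the chosen virtual instrument then yields the audio of the melody.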
In an optional embodiment, the extracting, according to the mapping relationship, a scale corresponding to a pronunciation point in the pronunciation point image along a set direction of the mesh, and converting the scale corresponding to the pronunciation point in the pronunciation point image into audio by using a virtual instrument corresponding to the type of the playing instrument to generate a melody corresponding to the target image, and then further includes:
adjusting scales corresponding to each row of the grid, reestablishing the mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid, and regenerating the melody corresponding to the target image by adopting the virtual musical instrument to obtain N melodies corresponding to the target image;
respectively converting the N melodies corresponding to the target image into oscillograms to obtain N oscillograms in total;
respectively calculating the similarity between any one oscillogram and a plurality of template oscillograms pre-stored in an oscillogram template database, and extracting the maximum value of the similarity of the oscillogram relative to the plurality of template oscillograms as the reference value of the oscillogram;
extracting a waveform diagram corresponding to a maximum reference value from the N waveform diagrams;
and extracting the melody corresponding to the waveform diagram corresponding to the maximum reference value as the target melody of the target image.
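The screening step reduces to taking, for each waveform, its maximum similarity over the templates, then keeping the melody whose waveform scores highest. This is sketched here with precomputed similarity scores, since the embodiment does not specify how similarity itself is computed:

```python
def pick_target_melody(melodies, similarities):
    """similarities[i] lists waveform i's similarity against each template
    waveform; the maximum is that waveform's reference value."""
    refs = [max(sims) for sims in similarities]
    best = refs.index(max(refs))
    return melodies[best], refs[best]

melody, ref = pick_target_melody(
    ["melody_a", "melody_b", "melody_c"],
    [[0.4, 0.7], [0.9, 0.2], [0.5, 0.6]],
)
# melody == "melody_b", ref == 0.9
```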
In this embodiment, the scale corresponding to each row of the grid may be adjusted, the mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid reestablished, and the melody corresponding to the target image regenerated, so that a plurality of melodies may be generated from the same grid. Each music style has its own characteristic scale combination, and a melody created from such a scale carries the character of that musical tradition; by setting the scale of each row in the grid according to a creation style, the created melody acquires that specific music style. For example, the Chinese pentatonic scale contains the tones 1 2 3 5 6 1; the Japanese hexatonic scale contains the tones 6 7 1 2 3 4 6; the Romanian minor scale contains the tones 6 7 1 #2 3 4 #5 6. By changing the scale combination of the rows in the grid, music melodies of different styles can be created. In this way, N melodies corresponding to the target image are obtained. The N melodies are then converted into oscillograms and matched against a plurality of template oscillograms; the maximum similarity of each oscillogram relative to the template oscillograms is extracted as that oscillogram's reference value, so that each oscillogram corresponds to one reference value. The reference values are compared to find the oscillogram with the maximum reference value, and the melody corresponding to it is extracted as the target melody of the target image. This method effectively screens the N generated melodies, yielding the melody closest to existing melody-creation styles and improving the quality of the created melody.
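The style-specific scale rows described above can be encoded as data. The numbered (jianpu) notation below is copied from the text; '#' marks a sharp, and a trailing dot marking the upper octave is an assumed convention:

```python
# Row scales per creation style, in numbered (jianpu) notation.
STYLE_SCALES = {
    "chinese_pentatonic": ["1", "2", "3", "5", "6", "1."],
    "japanese_hexatonic": ["6", "7", "1", "2", "3", "4", "6."],
    "romanian_minor": ["6", "7", "1", "#2", "3", "4", "#5", "6."],
}

def rows_for_style(style, n_rows):
    """Assign a scale tone to each grid row, cycling through the chosen
    style's scale so that any grid height is covered."""
    scale = STYLE_SCALES[style]
    return [scale[i % len(scale)] for i in range(n_rows)]

rows = rows_for_style("chinese_pentatonic", 8)
# rows == ["1", "2", "3", "5", "6", "1.", "1", "2"]
```

Swapping the style key regenerates the row-to-scale mapping, which is exactly the adjustment step that produces the N candidate melodies.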
Please refer to fig. 2, which is a diagram illustrating an apparatus for converting an image into a melody according to an embodiment of the present invention, the apparatus comprising:
the color clustering module 1 is used for acquiring an HSB value of each pixel point in a target image, and performing color clustering processing on each pixel point of the target image according to the HSB value to obtain a color clustering image corresponding to the target image;
the normalization processing module 2 is used for performing normalization processing on each color block in the color clustering image to obtain a sound point image corresponding to the target image;
a mapping relation establishing module 3, configured to map the pronunciation point image into a pre-established grid, and establish a mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid;
a dominant hue extraction module 4, configured to extract a dominant hue of the target image according to the color cluster image;
the instrument type determining module 5 is used for determining the type of the playing instrument according to the dominant tone of the target image and a preset tone instrument comparison table;
and the first melody generating module 6 is configured to extract a scale corresponding to the pronunciation point in the pronunciation point image along the set direction of the mesh according to the mapping relationship, convert the scale corresponding to the pronunciation point in the pronunciation point image into audio by using a virtual instrument corresponding to the type of the musical instrument, and generate a melody corresponding to the target image.
In the embodiment, after color clustering and normalization processing are performed on the target image, the pronunciation point image is obtained, the pronunciation point image is mapped into a preset grid, a mapping relation between the pronunciation point and the scale is established, the target image can be converted into a specific music melody according to the time axis direction of the grid through the mapping relation, the duration and the cost of music melody making are greatly reduced, the difficulty of music making is reduced, the customization requirements of people on the music melody are met, and therefore the device has wide application prospects in the aspects of personalized mobile phone ring, electronic album background music, screen protection background music, film and television work music and the like.
The dominant hue extraction module 4 is mainly configured to extract the dominant hue of the target image from the color clustering image, specifically as follows: color clustering is performed on the color clustering image by a clustering algorithm; the dominant hues are partitioned into blocks, adjacent points with close HSB (hue, saturation, brightness) values are repeatedly averaged and aggregated into the same color block, and the target image is processed into combinations of color blocks of various dominant hues, such as combinations of triangles, circles, rectangles and other shapes, so as to obtain the dominant-hue color block combination of the target image. The color value of each color block, i.e. a dominant hue of the target image, is extracted, and the area ratio of each color block in the target image is calculated respectively. Further, a color block whose area ratio in the color clustering image is larger than a set threshold is determined as a dominant hue of the target image.
In an alternative embodiment, the apparatus for converting image into melody further comprises:
the cover image acquisition module is used for acquiring cover images corresponding to musical compositions of a plurality of playing musical instruments;
the template clustering image generation module is used for extracting the HSB value of each pixel point in any cover image, carrying out color clustering processing on each pixel point of any cover image according to the HSB value, obtaining a template color clustering image corresponding to any cover image, and obtaining N template color clustering images in total;
the color distribution calculation module is used for calculating the area ratio of each color patch in the template color clustering image, and obtaining the corresponding dominant hue and dominant hue area ratio of the template color clustering image as the color distribution of the template color clustering image;
and the tonal instrument comparison table generation module is used for performing statistical analysis on the color distribution of the N template color cluster images and the playing instruments corresponding to the template color cluster images, establishing a mapping relation between the color distribution of the template color cluster images and the playing instruments corresponding to the template color cluster images, and generating the tonal instrument comparison table.
In this embodiment, a large number of cover images corresponding to musical compositions of playing instruments (e.g., CDs, DVDs, digital sound sources, etc.) are collected. The HSB values of each cover image are extracted, adjacent points with close HSB values are averaged and aggregated, and color clustering is applied to the collected cover images, processing each cover image into regions of different color block combinations. The color value of each color block is extracted and its area ratio in the cover image is calculated, giving the dominant hues and their area ratios for the cover image. Through statistical analysis of these data, the relation between different playing instruments and the dominant hues and area ratios of their cover images is obtained, i.e., a large body of statistics relating each playing instrument to the dominant hues and per-hue area ratios of its cover images, which constitute the color distribution of the cover image.
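One way to realize this statistical step is sketched below, assuming each sample pairs a playing instrument with its cover image's color distribution (dominant hue to area ratio), and using averaging as the aggregation, which the embodiment leaves open:

```python
from collections import defaultdict

def build_tone_instrument_table(samples):
    """samples: (instrument, color_distribution) pairs harvested from the
    cover images; the table maps each instrument to the mean area ratio
    per dominant hue over all of its covers."""
    buckets = defaultdict(list)
    for instrument, dist in samples:
        buckets[instrument].append(dist)
    table = {}
    for instrument, dists in buckets.items():
        hues = set().union(*dists)  # every hue seen on this instrument's covers
        table[instrument] = {
            hue: sum(d.get(hue, 0.0) for d in dists) / len(dists)
            for hue in hues
        }
    return table

table = build_tone_instrument_table([
    ("piano", {"blue": 0.6, "white": 0.4}),
    ("piano", {"blue": 0.8, "white": 0.2}),
])
# table["piano"]["blue"] averages to 0.7
```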
In an alternative embodiment, the instrument type determination module 5 comprises:
the area ratio calculation unit is used for calculating the area ratio of each color block in the color clustering image corresponding to the target image to obtain the dominant hue area ratio corresponding to the dominant hue of the target image;
an area ratio comparing unit configured to compare the keytone and the keytone area ratio of the target image with a plurality of color distribution ratios in the tone instrument comparison table, and determine, as a type of the musical instrument corresponding to the keytone in the color cluster image, a musical instrument corresponding to a color distribution in the tone instrument comparison table having a smallest difference between the keytone and the keytone area ratio of the target image;
and the volume distribution unit is used for determining the volume ratio of the musical instruments corresponding to the color blocks in the color cluster image according to the dominant hue of the target image and the dominant hue area ratio.
In the present embodiment, specifically, the dominant hues of the target image used to generate the music melody are extracted, and the type and volume of the corresponding playing instruments are determined based on the dominant hues of the target image and their area ratios in the target image. Because the color distribution of each picture differs (some pictures have rich content and a rich color distribution, while others have little content and a more uniform color distribution), a threshold is set to determine the number of dominant hues of the target image, and whether playing instruments are used singly or in combination is decided according to the area ratios of the dominant hues. The obtained dominant hues and dominant hue areas of the target image are compared against the correspondence between playing instruments and color distributions counted in advance and stored in the server, namely the tone instrument comparison table; the playing instrument corresponding to the color distribution closest to the dominant hue combination of the target image is found, giving the playing instrument combination for the melody to be generated. For example, if the target image contains M dominant hues, the corresponding M playing instruments can be determined from the M dominant hues and their area ratios, and used in combination to generate the melody.
For example, when the area ratio of a certain color block in the target image reaches 80% or more, a single instrument is used for playing. For another example, when the dominant hue distribution of the target image is 40%, 30%, 20% and 10%, the mapping relation between playing instruments and color distributions given by the tone instrument comparison table yields the playing instrument type corresponding to each dominant hue in the target image; the playing instruments corresponding to the 40%, 30%, 20% and 10% dominant hues are then used simultaneously to play the scales marked in the grid in ensemble, and the corresponding volumes are also distributed in proportion to the dominant hue areas.
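The single-instrument threshold and the proportional volume split can be sketched as below; the 80% cutoff comes from the example above, while the exact normalization is an assumption:

```python
def allocate_volumes(dominant_ratios, single_threshold=0.8):
    """dominant_ratios: dominant hue -> area ratio in the target image.
    If one hue covers at least the threshold, a single instrument plays
    alone; otherwise each hue's instrument gets a volume proportional to
    its dominant-hue area."""
    top = max(dominant_ratios, key=dominant_ratios.get)
    if dominant_ratios[top] >= single_threshold:
        return {top: 1.0}  # single-instrument playback
    total = sum(dominant_ratios.values())
    return {hue: ratio / total for hue, ratio in dominant_ratios.items()}

vols = allocate_volumes({"red": 0.4, "green": 0.3, "blue": 0.2, "grey": 0.1})
# volumes follow the 40/30/20/10 dominant-hue split
solo = allocate_volumes({"blue": 0.85, "white": 0.15})
# solo == {"blue": 1.0}: one hue dominates, so a single instrument plays
```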
In an alternative embodiment, the color clustering module 1 comprises: the device comprises an HSB value acquisition unit, a color mutation acquisition unit, a color block polymerization unit and a color clustering image generation unit;
the HSB value acquisition unit is used for acquiring the HSB value of each pixel point in the target image;
the color mutation obtaining unit is used for obtaining pixel points with hue distances exceeding a first threshold value in the target image according to the HSB value of each pixel point in the target image and obtaining a plurality of color mutation areas;
the color lump polymerization unit is used for calculating the hue average value of the adjacent pixel points of which the difference value of the HSB values in the color mutation area is smaller than a second threshold value, and polymerizing the adjacent pixel points into a color lump corresponding to the hue average value;
and the color cluster image generation unit is used for generating the color cluster image according to the aggregated color block when the hue distance of the adjacent pixel points in the color mutation area is zero.
In this embodiment, the first threshold ranges from 60 degrees to 130 degrees, and is preferably 60 degrees; the second threshold is 15 degrees. For example, when the hue distance between two pixel points in the target image exceeds 60 degrees, a color mutation area is determined. After the color mutation areas are found, adjacent pixel points in the target image are analyzed further, and adjacent pixel points with close HSB values are averaged and aggregated into one color block. For example, suppose the HSB values of adjacent pixel points A and B are: point A with H 42°, S 43%, B 21%, and point B with H 38°, S 42%, B 25%. The hue values of points A and B are 42° and 38°, a hue distance within 15 degrees, so the hue values of A and B are averaged and the two points are aggregated into a color block with the hue value H 40°. Pairs of adjacent points with close HSB values are repeatedly selected and their hue averages calculated until all adjacent points with close HSB values have been aggregated; finally, the target image is processed into a number of different color blocks, and the color cluster image is generated.
In an alternative embodiment, the normalization processing module 2 comprises: a sound point setting unit, a sound point adjusting unit and a sound point image generating unit;
the phonation point setting unit is used for acquiring a color block with the minimum area in the color clustering image and setting the color block with the minimum area as a phonation point;
the phonation point adjusting unit is used for adjusting the other color blocks in the color clustering image to be integral multiples of the phonation point;
and the sound producing point image generating unit is used for generating the sound producing point image according to the sound producing points corresponding to the color blocks in the color clustering image.
In an alternative embodiment, the mapping relationship establishing module 3 includes: the system comprises a grid establishing unit, a mapping unit, a phonemic point distributing unit and a mapping relation establishing unit;
the grid establishing unit is used for setting the area of the square grid and establishing the grid according to the area of the phonation point and a preset proportion; each row of the grid corresponds to a scale, and each column of the grid corresponds to a time point;
the mapping unit is used for mapping each phonation point in the phonation point image to the grid;
the sound producing point distributing unit is used for respectively calculating the area occupation ratio of the sound producing points in adjacent grids connected with the grid lines when the sound producing points are distributed on the grid lines of the grid, and distributing the sound producing points to one grid with larger area occupation ratio of the sound producing points in the adjacent grids;
the mapping relation establishing unit is used for establishing the mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid according to the position of each pronunciation point in the pronunciation point image in the grid and the scale corresponding to each row in the grid.
In an alternative embodiment, the first melody generating module 6 comprises: a scale extracting unit, a note length setting unit, a time point extracting unit and a melody generating unit;
the set direction is a time axis direction formed by time points corresponding to each row of the grid;
the scale extracting unit is used for extracting scales corresponding to the pronunciation points in the pronunciation point image according to the mapping relation and the time axis direction corresponding to the grids;
the note length setting unit is used for adjusting a plurality of pronunciation points to a long note of the scale corresponding to a given row when the plurality of pronunciation points are positioned in adjacent grids of that row in the grid;
the time point extracting unit is used for extracting the time point corresponding to the sound point in the sound point image according to the time axis direction;
and the melody generating unit is used for converting the scale corresponding to the pronunciation point in the pronunciation point image into audio by adopting a virtual instrument corresponding to the type of the musical instrument to generate the melody corresponding to the target image according to the scale and the time point corresponding to the pronunciation point in the pronunciation point image.
In an alternative embodiment, the apparatus for converting an image into a melody further comprises:
the grid scale adjusting module is used for adjusting scales corresponding to each line of the grid, reestablishing the mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid, and regenerating the melody corresponding to the target image by adopting the virtual musical instrument to obtain N melodies corresponding to the target image;
the oscillogram generation module is used for respectively converting the N melodies corresponding to the target image into oscillograms to obtain N oscillograms in total;
the similarity calculation module is used for calculating the similarity between any one oscillogram and a plurality of template oscillograms pre-stored in an oscillogram template database respectively, and extracting the maximum value of the similarity of the oscillogram relative to the plurality of template oscillograms as the reference value of the oscillogram;
the oscillogram extracting module is used for extracting the oscillogram corresponding to the maximum reference value from the N oscillograms;
and the melody extraction module is used for extracting the melody corresponding to the waveform diagram corresponding to the maximum reference value as the target melody of the target image.
In this embodiment, the scale corresponding to each row of the grid may be adjusted, the mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid reestablished, and the melody corresponding to the target image regenerated, so that a plurality of melodies may be generated from the same grid. Each music style has its own characteristic scale combination, and a melody created from such a scale carries the character of that musical tradition; by setting the scale of each row in the grid according to a creation style, the created melody acquires that specific music style. For example, the Chinese pentatonic scale contains the tones 1 2 3 5 6 1; the Japanese hexatonic scale contains the tones 6 7 1 2 3 4 6; the Romanian minor scale contains the tones 6 7 1 #2 3 4 #5 6. By changing the scale combination of the rows in the grid, music melodies of different styles can be created. In this way, N melodies corresponding to the target image are obtained. The N melodies are then converted into oscillograms and matched against a plurality of template oscillograms; the maximum similarity of each oscillogram relative to the template oscillograms is extracted as that oscillogram's reference value, so that each oscillogram corresponds to one reference value. The reference values are compared to find the oscillogram with the maximum reference value, and the melody corresponding to it is extracted as the target melody of the target image. This method effectively screens the N generated melodies, yielding the melody closest to existing melody-creation styles and improving the quality of the created melody.
The embodiment of the present invention further provides an apparatus for converting an image into a melody, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the method for converting an image into a melody when executing the computer program.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution process of the computer program in the image-to-melody converting apparatus. For example, the computer program may be divided into the color clustering module 1, the normalization processing module 2, the mapping relation establishing module 3, the dominant hue extraction module 4, the instrument type determining module 5 and the first melody generating module 6 shown in fig. 2, with the following specific functions: the color clustering module 1 is used for acquiring an HSB value of each pixel point in a target image, and performing color clustering processing on each pixel point of the target image according to the HSB value to obtain a color clustering image corresponding to the target image; the normalization processing module 2 is used for performing normalization processing on each color block in the color clustering image to obtain a pronunciation point image corresponding to the target image; the mapping relation establishing module 3 is used for mapping the pronunciation point image into a pre-established grid and establishing a mapping relation between each pronunciation point in the pronunciation point image and each scale in the grid; the dominant hue extraction module 4 is used for extracting the dominant hue of the target image according to the color clustering image; the instrument type determining module 5 is used for determining the type of the playing instrument according to the dominant hue of the target image and a preset tone instrument comparison table; and the first melody generating module 6 is used for extracting, according to the mapping relation, the scale corresponding to the pronunciation point in the pronunciation point image along the set direction of the grid, and converting that scale into audio by adopting the virtual instrument corresponding to the type of the playing instrument, so as to generate the melody corresponding to the target image.
The image-to-melody converting device can be a desktop computer, a notebook, a palm computer, a cloud server or other computing equipment. The image-to-melody converting device may include, but is not limited to, a processor and a memory. It will be appreciated by a person skilled in the art that the schematic diagram is merely an example of the image-to-melody converting apparatus and does not constitute a limitation on it; the apparatus may comprise more or fewer components than those shown, combine some components, or use different components; for example, the image-to-melody converting apparatus may further comprise an input-output device, a network access device, a bus, etc.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, said processor being the control center of said image to melody converting device, the various parts of the whole image to melody converting device being connected by means of various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the image-to-melody converting apparatus by running or executing the computer program and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the device (such as audio data, a phonebook, etc.), and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid state storage device.
Wherein, the module/unit integrated with the image-to-melody converting device can be stored in a computer readable storage medium if it is implemented in the form of software functional unit and sold or used as a stand-alone product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, etc. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that the above-described apparatus embodiments are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, a connection between modules indicates a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines. A person of ordinary skill in the art can understand and implement the embodiments without inventive effort.
The embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the above method for converting an image into a melody.
Compared with the prior art, the method for converting an image into a melody provided by the embodiment of the present invention has the following beneficial effects. The method comprises: acquiring an HSB value of each pixel in a target image, and performing color clustering on the pixels of the target image according to the HSB values to obtain a color cluster image corresponding to the target image; normalizing the color patches in the color cluster image to obtain a sound-point image corresponding to the target image; mapping the sound-point image onto a pre-established grid and establishing a mapping relationship between each sound-producing point in the sound-point image and each scale in the grid; extracting the dominant hue of the target image from the color cluster image; determining the type of playing instrument according to the dominant hue of the target image and a preset hue-instrument lookup table; and, following the set direction of the grid and the mapping relationship, extracting the scale corresponding to each sound-producing point in the sound-point image, converting those scales into audio with a virtual instrument of the determined type, and generating the melody corresponding to the target image. By this method, a target image can be converted into a specific musical melody, which greatly reduces the time and cost of producing a melody and meets people's demand for customized melodies. The embodiment of the present invention further provides an apparatus for converting an image into a melody and a computer-readable storage medium.
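The first step of the pipeline — reading an HSB (hue, saturation, brightness) value for every pixel — can be sketched as follows. This is an illustrative sketch only, not the patented implementation; Python's standard `colorsys` module is used as a stand-in, and the tiny sample `image` is invented for demonstration.

```python
import colorsys

def rgb_to_hsb(r, g, b):
    """Convert 0-255 RGB to HSB with H in degrees, S and B in percent."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (h * 360.0, s * 100.0, v * 100.0)

def image_to_hsb(pixels):
    """Map a 2-D grid of RGB triples to a grid of HSB triples."""
    return [[rgb_to_hsb(*px) for px in row] for row in pixels]

# Tiny 1x2 example image: pure red and mid grey.
image = [[(255, 0, 0), (128, 128, 128)]]
hsb = image_to_hsb(image)
# Pure red maps to hue 0, full saturation, full brightness;
# grey maps to zero saturation.
```

The HSB triples produced this way are the input to the color-clustering step described in claim 1.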
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (9)

1. A method for converting an image into a melody, comprising:
acquiring an HSB value of each pixel point in a target image, and performing color clustering processing on each pixel point of the target image according to the HSB value to obtain a color clustering image corresponding to the target image;
normalizing the color patches in the color cluster image to obtain a sound-point image corresponding to the target image, which specifically comprises:
acquiring the color patch with the smallest area in the color cluster image and setting that color patch as a sound-producing point;
adjusting the other color patches in the color cluster image to integer multiples of the sound-producing point;
generating the sound-point image according to the sound-producing points corresponding to the color patches in the color cluster image;
mapping the sound-point image onto a pre-established grid, and establishing a mapping relationship between each sound-producing point in the sound-point image and each scale in the grid;
extracting the dominant hue of the target image according to the color cluster image;
determining the type of playing instrument according to the dominant hue of the target image and a preset hue-instrument lookup table;
and extracting, along the set direction of the grid and according to the mapping relationship, the scale corresponding to each sound-producing point in the sound-point image, converting those scales into audio with the virtual instrument corresponding to the determined playing-instrument type, and generating the melody corresponding to the target image.
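The normalization sub-steps of claim 1 (smallest patch becomes the unit "sound-producing point"; every other patch is adjusted to an integer multiple of it) can be sketched as follows. This is a minimal sketch under the assumption that "adjusting" means snapping a patch's area to the nearest integer multiple of the unit area; the claim does not fix the rounding rule.

```python
def normalize_patches(patch_areas):
    """Claim-1-style normalization sketch: the smallest color patch
    becomes the unit sound-producing point, and every other patch
    area is snapped to the nearest integer multiple of that unit
    (at least 1). Returns (unit_area, multiples)."""
    unit = min(patch_areas)
    multiples = [max(1, round(a / unit)) for a in patch_areas]
    return unit, multiples

# Invented patch areas (in pixels) for illustration.
unit, mult = normalize_patches([40, 90, 210, 130])
```

Note that Python's built-in `round` uses banker's rounding for exact `.5` cases; a production implementation would pick its rounding rule explicitly.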
2. The method for converting an image into a melody as claimed in claim 1, further comprising:
collecting cover images corresponding to musical compositions of a plurality of playing musical instruments;
extracting the HSB value of each pixel in each cover image, and performing color clustering on the pixels of that cover image according to the HSB values to obtain a template color cluster image corresponding to that cover image, yielding N template color cluster images in total;
calculating the area ratio of each color patch in each template color cluster image, and taking the corresponding dominant hue and dominant-hue area ratio as the color distribution of that template color cluster image;
and statistically analysing the color distributions of the N template color cluster images together with the playing instruments corresponding to them, establishing a mapping relationship between each color distribution and its corresponding playing instrument, and generating the hue-instrument lookup table.
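Building the hue-instrument lookup table of claim 2 can be sketched as an aggregation over labelled samples. This is an assumption-laden sketch: the sample tuples and the choice of "average the color distributions per instrument" as the statistical analysis are invented for illustration; the claim only requires some statistical mapping from color distribution to instrument.

```python
from collections import defaultdict

def build_tone_instrument_table(samples):
    """Aggregate (dominant_hue_degrees, dominant_hue_area_ratio,
    instrument) samples drawn from album-cover cluster images into
    one averaged color distribution per instrument."""
    buckets = defaultdict(list)
    for hue, ratio, instrument in samples:
        buckets[instrument].append((hue, ratio))
    table = {}
    for instrument, dists in buckets.items():
        n = len(dists)
        table[instrument] = (sum(h for h, _ in dists) / n,
                             sum(r for _, r in dists) / n)
    return table

# Invented training samples: cool blues labelled piano, warm orange guitar.
samples = [
    (200.0, 0.50, "piano"), (220.0, 0.60, "piano"),
    (30.0, 0.55, "guitar"),
]
table = build_tone_instrument_table(samples)
```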
3. The method for converting an image into a melody according to claim 2, wherein the determining the type of playing instrument according to the dominant hue of the target image and a preset hue-instrument lookup table comprises:
calculating the area ratio of each color patch in the color cluster image corresponding to the target image to obtain the dominant-hue area ratio corresponding to the dominant hue of the target image;
comparing the dominant hue and dominant-hue area ratio of the target image with the color distributions in the hue-instrument lookup table, and determining the playing instrument corresponding to the color distribution with the smallest difference from the target image's dominant hue and dominant-hue area ratio as the playing-instrument type corresponding to the dominant hue in the color cluster image;
and determining the volume ratio of the instruments corresponding to the color patches in the color cluster image according to the dominant hue and dominant-hue area ratio of the target image.
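The "smallest difference" comparison of claim 3 amounts to a nearest-neighbour lookup over the table's color distributions. The following sketch assumes a weighted sum of circular hue distance and area-ratio gap as the difference metric; the claim leaves the metric unspecified, and the table contents are invented.

```python
def pick_instrument(target_hue, target_ratio, table):
    """Choose the instrument whose stored color distribution differs
    least from the target image's dominant hue and dominant-hue
    area ratio."""
    def diff(dist):
        hue, ratio = dist
        # Hue is circular: 350 deg and 10 deg are 20 deg apart.
        hue_gap = min(abs(target_hue - hue), 360.0 - abs(target_hue - hue))
        return hue_gap / 180.0 + abs(target_ratio - ratio)
    return min(table, key=lambda inst: diff(table[inst]))

table = {"piano": (210.0, 0.55), "guitar": (30.0, 0.60), "flute": (120.0, 0.40)}
chosen = pick_instrument(215.0, 0.50, table)  # blue-dominant target
```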
4. The method for converting an image into a melody according to claim 1, wherein the acquiring an HSB value of each pixel in the target image and performing color clustering on the pixels of the target image according to the HSB values to obtain the color cluster image corresponding to the target image specifically comprises:
acquiring the HSB value of each pixel in the target image;
according to the HSB values of the pixels in the target image, locating the pixels whose hue distance exceeds a first threshold to obtain a plurality of abrupt-color regions;
calculating the mean hue value of adjacent pixels within an abrupt-color region whose HSB-value difference is smaller than a second threshold, and aggregating those adjacent pixels into a color patch labelled with the mean hue value;
and, when the hue distance between adjacent pixels in the abrupt-color regions is zero, generating the color cluster image from the aggregated color patches.
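The merge step of claim 4 — fusing adjacent pixels whose hue difference is below a threshold into a patch labelled with their mean hue — can be sketched in one dimension. This is a simplified illustration (a single pixel row, invented hue values and threshold); the full method would merge neighbours in both axes.

```python
def cluster_row(hues, threshold):
    """Walk a row of pixel hue values and merge adjacent pixels whose
    hue difference is below `threshold` into one patch labelled with
    the running mean hue. Returns a list of (mean_hue, pixel_count)."""
    patches = []
    for h in hues:
        if patches and abs(h - patches[-1][0]) < threshold:
            mean, n = patches[-1]
            patches[-1] = ((mean * n + h) / (n + 1), n + 1)  # update running mean
        else:
            patches.append((h, 1))  # hue jump: start a new patch
    return patches

# Three reddish pixels, then two cyan-ish pixels -> two patches.
patches = cluster_row([10.0, 12.0, 11.0, 200.0, 202.0], threshold=15.0)
```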
5. The method for converting an image into a melody according to claim 1, wherein the mapping the sound-point image onto a pre-established grid to establish a mapping relationship between each sound-producing point in the sound-point image and each scale in the grid comprises:
setting the area of each grid cell according to the area of the sound-producing point and a preset ratio, and establishing the grid, wherein each row of the grid corresponds to a scale and each column of the grid corresponds to a time point;
mapping each sound-producing point in the sound-point image onto the grid;
when a sound-producing point straddles a grid line, calculating the area share of the point falling in each of the adjacent cells joined by that grid line, and assigning the point to the cell containing the larger share;
and establishing the mapping relationship between each sound-producing point in the sound-point image and each scale in the grid according to the position of each sound-producing point in the grid and the scale corresponding to each row of the grid.
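The tie-break of claim 5 (a point straddling a grid line goes to the adjacent cell holding the larger share of it) can be shown with a one-dimensional stand-in for the area ratio. The coordinates below are invented for illustration.

```python
def assign_to_cell(point_left, point_right, gridline_x):
    """A sound point spanning the vertical grid line at `gridline_x`
    goes to whichever adjacent cell contains the larger share of its
    width (a 1-D stand-in for the area ratio in the claim)."""
    left_share = gridline_x - point_left
    right_share = point_right - gridline_x
    return "left" if left_share > right_share else "right"

# Point spans x = 3..9; grid line at x = 7: 4 units left, 2 units right.
cell = assign_to_cell(3.0, 9.0, 7.0)
```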
6. The method according to claim 5, wherein the extracting, along the set direction of the grid and according to the mapping relationship, the scale corresponding to each sound-producing point in the sound-point image and converting those scales into audio with the virtual instrument corresponding to the playing-instrument type to generate the melody corresponding to the target image comprises:
taking as the set direction the time-axis direction formed by the time points corresponding to the columns of the grid;
extracting the scale corresponding to each sound-producing point in the sound-point image according to the mapping relationship and the time-axis direction of the grid;
when a plurality of sound-producing points occupy adjacent cells in any row of the grid, merging them into a long tone of the scale corresponding to that row;
extracting the time point corresponding to each sound-producing point in the sound-point image along the time-axis direction;
and converting, according to the scales and time points corresponding to the sound-producing points in the sound-point image, those scales into audio with the virtual instrument corresponding to the playing-instrument type, and generating the melody corresponding to the target image.
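The long-tone rule of claim 6 — adjacent occupied cells in one row become a single sustained note — can be sketched as a run-length scan over one grid row. The scale name and cell pattern below are invented for illustration.

```python
def row_to_notes(cells, scale):
    """One grid row corresponds to one scale degree; each column is a
    time step. Runs of adjacent occupied cells become a single long
    tone whose duration is the run length. Returns a list of
    (scale, start_column, duration) triples."""
    notes, start = [], None
    for col, occupied in enumerate(cells + [False]):  # sentinel flushes the last run
        if occupied and start is None:
            start = col
        elif not occupied and start is not None:
            notes.append((scale, start, col - start))
            start = None
    return notes

# Row for scale "E4": columns 1-3 occupied (a long tone), column 6 a short note.
notes = row_to_notes([False, True, True, True, False, False, True], "E4")
```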
7. The method for converting an image into a melody according to claim 6, wherein the extracting, along the set direction of the grid and according to the mapping relationship, the scale corresponding to each sound-producing point in the sound-point image and converting those scales into audio with the virtual instrument corresponding to the playing-instrument type to generate the melody corresponding to the target image further comprises:
adjusting the scales corresponding to the rows of the grid, re-establishing the mapping relationship between each sound-producing point in the sound-point image and each scale in the grid, and regenerating the melody corresponding to the target image with the virtual instrument, to obtain N melodies corresponding to the target image;
converting the N melodies corresponding to the target image into waveform diagrams, obtaining N waveform diagrams in total;
calculating the similarity between each waveform diagram and a plurality of template waveform diagrams pre-stored in a waveform-template database, and taking the maximum of each waveform diagram's similarities to the template waveform diagrams as the reference value of that waveform diagram;
extracting, from the N waveform diagrams, the waveform diagram with the largest reference value;
and taking the melody corresponding to the waveform diagram with the largest reference value as the target melody of the target image.
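The selection step of claim 7 — score each candidate waveform by its best similarity against a template bank, then keep the winner — can be sketched as follows. Cosine similarity is an assumption standing in for whatever metric an implementation uses, and the template and candidate waveforms are invented equal-length lists.

```python
def best_melody(waveforms, template_bank):
    """Score each candidate waveform by its maximum similarity to any
    template waveform (its 'reference value') and return the index of
    the candidate with the largest reference value."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    scores = [max(cosine(w, t) for t in template_bank) for w in waveforms]
    return scores.index(max(scores))

templates = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
candidates = [[0.2, 1.1, 0.1, -0.9],   # close to templates[1]
              [1.0, 1.0, 1.0, 1.0]]    # orthogonal to both templates
best = best_melody(candidates, templates)
```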
8. An apparatus for image-to-melody conversion, comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the method of image-to-melody conversion according to any of claims 1 to 7 when executing the computer program.
9. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus on which the computer-readable storage medium is located to perform the method for converting an image into a melody according to any one of claims 1 to 7.
CN201810427683.7A 2018-05-07 2018-05-07 Method and device for converting image into melody and computer readable storage medium Active CN108960250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810427683.7A CN108960250B (en) 2018-05-07 2018-05-07 Method and device for converting image into melody and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN108960250A CN108960250A (en) 2018-12-07
CN108960250B true CN108960250B (en) 2020-08-25

Family

ID=64498915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810427683.7A Active CN108960250B (en) 2018-05-07 2018-05-07 Method and device for converting image into melody and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN108960250B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109872340B (en) * 2019-01-03 2023-06-27 广东智媒云图科技股份有限公司 Composition method, electronic device and computer readable storage medium
CN110322520A (en) * 2019-07-04 2019-10-11 厦门美图之家科技有限公司 Image key color extraction method, apparatus, electronic equipment and storage medium
CN112652030B (en) * 2020-12-11 2023-09-19 浙江工商大学 Color space position layout recommendation method based on specific scene
CN113160781B (en) * 2021-04-12 2023-11-17 广州酷狗计算机科技有限公司 Audio generation method, device, computer equipment and storage medium
CN113885829B (en) * 2021-10-25 2023-10-31 北京字跳网络技术有限公司 Sound effect display method and terminal equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005165194A (en) * 2003-12-05 2005-06-23 Nippon Hoso Kyokai <Nhk> Music data converter and music data conversion program
CN1862656A (en) * 2005-05-13 2006-11-15 杭州波导软件有限公司 Method for converting musci score to music output and apparatus thereof
KR20070059253A (en) * 2005-12-06 2007-06-12 최종민 The method for transforming the language into symbolic melody
KR20080083433A (en) * 2007-03-12 2008-09-18 주식회사 하모니칼라시스템 Method and apparatus for converting image to sound
CN102289778A (en) * 2011-05-10 2011-12-21 南京大学 Method for converting image into music
CN104918059A (en) * 2015-05-19 2015-09-16 京东方科技集团股份有限公司 Method and device for image transmission and terminal device
CN105205047A (en) * 2015-09-30 2015-12-30 北京金山安全软件有限公司 Playing method, converting method and device of musical instrument music score file and electronic equipment
CN106203465A (en) * 2016-06-24 2016-12-07 百度在线网络技术(北京)有限公司 A kind of method and device generating the music score of Chinese operas based on image recognition
CN107239482A (en) * 2017-04-12 2017-10-10 中国科学院光电研究院 A kind of processing method and server for converting the image into music
CN107967476A (en) * 2017-12-05 2018-04-27 北京工业大学 A kind of method that image turns sound



Similar Documents

Publication Publication Date Title
CN108960250B (en) Method and device for converting image into melody and computer readable storage medium
CN108805171B (en) Method, device and computer readable storage medium for converting image to music melody
CN108898643B (en) Image generation method, device and computer readable storage medium
CN108615253B (en) Image generation method, device and computer readable storage medium
US20050109194A1 (en) Automatic musical composition classification device and method
CN110444185B (en) Music generation method and device
CN109472832B (en) Color scheme generation method and device and intelligent robot
CN108876871A (en) Image processing method, device and computer readable storage medium based on circle fitting
CN109656366B (en) Emotional state identification method and device, computer equipment and storage medium
KR20150112048A (en) music-generation method based on real-time image
CN114764768A (en) Defect detection and classification method and device, electronic equipment and storage medium
CN106997769B (en) Trill recognition method and device
CN108962231A (en) A kind of method of speech classification, device, server and storage medium
CN109697083B (en) Fixed-point acceleration method and device for data, electronic equipment and storage medium
US20220284720A1 (en) Method for grouping cells according to density and electronic device employing method
CN111444379A (en) Audio feature vector generation method and audio segment representation model training method
CN110969141A (en) Music score generation method and device based on audio file identification and terminal equipment
CN109859284B (en) Dot-based drawing implementation method and system
CN109191539B (en) Oil painting generation method and device based on image and computer readable storage medium
CN108492347A (en) Image generating method, device and computer readable storage medium
CN110874567B (en) Color value judging method and device, electronic equipment and storage medium
CN115914772A (en) Video synthesis method and device, electronic equipment and storage medium
CN111429949B (en) Pitch line generation method, device, equipment and storage medium
CN113093967A (en) Data generation method, data generation device, computer device, and storage medium
Wu et al. A study of image-based music composition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant