CN113192152A - Audio-based image generation method, electronic device and storage medium

Info

Publication number: CN113192152A
Application number: CN202110566416.XA
Authority: CN (China)
Prior art keywords: target, information, image, audio, rendering
Other languages: Chinese (zh)
Other versions: CN113192152B (en)
Inventors: 张帅, 陈晓炯, 赖师悦, 张超鹏, 吕培兰, 俞骁, 罗志浩, 邱文杰
Current Assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN202110566416.XA
Publication of CN113192152A; application granted and published as CN113192152B
Legal status: Granted, Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 11/001: Texturing; Colouring; Generation of texture or colour
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00: Details of electrophonic musical instruments
    • G10H 1/36: Accompaniment arrangements
    • G10H 1/40: Rhythm


Abstract

The application discloses an audio-based image generation method, an electronic device, and a computer-readable storage medium, wherein the method includes: acquiring a target audio file; performing rhythm detection processing on the target audio file to obtain rhythm information; determining a target rendering program from a plurality of rendering programs by using initialization information or the rhythm information; determining, by using the rhythm information, display parameters corresponding to each image texture in each target rendering program, wherein the display parameters include display area parameters and/or display state parameters; and performing image rendering by using the target rendering program based on the display parameters and the inherent parameters corresponding to the target rendering program to obtain a target image, wherein the inherent parameters include inherent color values and/or inherent positions of image textures. Because the rhythm information can represent the rhythm of the target audio file, the obtained target image records the rhythm information of the audio and corresponds to the target audio file in terms of rhythm.

Description

Audio-based image generation method, electronic device and storage medium
Technical Field
The present disclosure relates to the field of image generation technologies, and in particular, to an image generation method based on audio, an electronic device, and a computer-readable storage medium.
Background
Currently, much software, such as music players, can display a visual special effect corresponding to audio while the audio plays. In the related art, the waveform of a spectrogram is usually used as the visual effect; for example, a Fast Fourier Transform (FFT) is performed on the audio signal to obtain the corresponding spectrogram. The spectrogram waveform represents the energy of the audio at each frequency, so the energy distribution can be seen intuitively, but as a visual effect it cannot express the rhythm of the audio, and therefore the visual effect does not correspond to the audio the user hears.
Disclosure of Invention
In view of the above, an object of the present application is to provide an audio-based image generation method, an electronic device, and a computer-readable storage medium that acquire the rhythm information of audio and use it for image rendering, so that the obtained target image corresponds to the target audio file in terms of rhythm.
In order to solve the above technical problem, in a first aspect, the present application provides an audio-based image generation method, including:
acquiring a target audio file;
carrying out rhythm detection processing on the target audio file to obtain rhythm information;
determining a target rendering program from a plurality of rendering programs by using initialization information or the rhythm information;
determining display parameters corresponding to each image texture in each target rendering program by using the rhythm information, wherein the display parameters comprise display area parameters and/or display state parameters;
and based on the display parameters and the inherent parameters corresponding to the target rendering program, performing image rendering by using the target rendering program to obtain a target image, wherein the inherent parameters comprise inherent color values and/or inherent positions of image textures.
In one embodiment, the determining a target rendering program from a plurality of rendering programs using the initialization information or the tempo information includes:
if the current rendering program does not exist, selecting the target rendering program from the plurality of rendering programs by using the initialization information;
if the current rendering program exists, acquiring and analyzing configuration information to obtain a first base number and/or a second base number;
when the drum point information in the rhythm information is a multiple of the first base number, the target rendering program is obtained again according to the updating sequence;
and/or,
and when the measure information in the rhythm information is a multiple of the second base number, re-acquiring the target rendering program according to the updating sequence.
In one embodiment, the performing image rendering by using the target rendering program based on the display parameters and the inherent parameters corresponding to the target rendering program to obtain a target image includes:
determining a target image texture by using the display state parameters, and determining a display area in the target image texture by using the display area parameters corresponding to the target image texture;
acquiring volume amplitude information, and carrying out normalization processing on the volume amplitude information to obtain normalized amplitude;
obtaining texture color values corresponding to the display areas by using the normalized amplitude and the inherent color values;
and performing image rendering on the display area based on the texture color value, the inherent color value and the inherent position of the image texture to obtain the target image.
In one embodiment, the method further comprises:
acquiring and analyzing configuration information to obtain an amplitude interval;
and when the normalized amplitude is in the amplitude interval, re-acquiring the target rendering program according to the updating sequence.
In one embodiment, the image rendering the display area based on the texture color value, the inherent color value, and the image texture inherent position to obtain the target image includes:
judging whether the inherent color value corresponding to the image texture meets a preset condition, the preset condition being a magnitude relationship between the inherent channel color values corresponding to the respective color value channels;
and performing image rendering on the display area which meets the preset condition based on the texture color value and the inherent position of the image texture, and performing image rendering on the display area which does not meet the preset condition based on the inherent color value and the inherent position of the image texture to obtain the target image.
In one embodiment, the performing a tempo detection process on the target audio file to obtain tempo information includes:
performing down-sampling processing on the target audio file to obtain a preprocessed audio;
performing framing processing on the preprocessed audio to obtain a plurality of frame signals, and performing windowing processing on each frame signal to obtain intermediate data corresponding to each frame signal;
carrying out Fourier transform and power spectrum calculation by using the intermediate data to obtain a plurality of power spectrums;
performing frame power calculation by using the power spectrum to obtain a power value corresponding to each frame signal;
and carrying out average value calculation and normalization processing by using the power value to obtain drum point information, and determining the drum point information as the rhythm information.
In one embodiment, the performing average value calculation and normalization processing by using the power value to obtain drum point information includes:
calculating the average value corresponding to each power value to obtain an average power value;
normalizing the average power value based on a preset interval to obtain initial drum point information;
and correcting the initial drum point information by using a correction function to obtain the drum point information.
In one embodiment, the performing a tempo detection process on the target audio file to obtain tempo information includes:
extracting attribute information from the target audio file;
and generating bar information by using the attribute information, and determining the bar information as the rhythm information.
In one embodiment, the obtaining the target audio file includes:
acquiring initial audio, and calculating a slice length by using a video frame rate and audio parameters corresponding to the target audio file, the audio parameters including the audio sampling rate, the number of channels, and the audio bit depth;
and slicing the initial audio according to the slice length to obtain a plurality of target audio files.
In one embodiment, the method further comprises:
acquiring basic information corresponding to the target audio file; the basic information is volume amplitude information and/or frequency spectrum information;
correspondingly, the performing image rendering by using the target rendering program based on the display parameters and the inherent parameters corresponding to the target rendering program to obtain a target image includes:
performing initial image rendering by respectively using the display parameters and the inherent parameters corresponding to each target rendering program together with the basic information, to obtain a plurality of initial images;
and performing image fusion processing on the initial images to obtain the target image, wherein the image fusion processing may be color value addition fusion processing or alpha fusion processing (a sketch of both fusion modes follows).
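As a hedged illustration of the two fusion modes named above, the following minimal sketch (not the patent's code; the image representation, clipping range, and alpha value are assumptions) fuses several initial images into one target image:

    # Minimal sketch of the two fusion modes; assumes the images are equal-shape
    # numpy arrays with 8-bit color values.
    import numpy as np

    def fuse(initial_images, mode="alpha", alpha=0.5):
        out = initial_images[0].astype(float)
        for img in initial_images[1:]:
            if mode == "add":                        # color value addition fusion
                out = np.clip(out + img, 0, 255)
            else:                                    # alpha fusion
                out = (1 - alpha) * out + alpha * img
        return out.astype(np.uint8)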
In a second aspect, the present application further provides an electronic device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the above-mentioned audio-based image generation method.
In a third aspect, the present application further provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the audio-based image generation method described above.
According to the image generation method based on the audio frequency, a target audio frequency file is obtained; carrying out rhythm detection processing on a target audio file to obtain rhythm information; determining a target rendering program from a plurality of rendering programs by using the initialization information or the rhythm information; determining display parameters corresponding to each image texture in each target rendering program by using the rhythm information, wherein the display parameters comprise display area parameters and/or display state parameters; and based on the display parameters and the inherent parameters corresponding to the target rendering program, performing image rendering by using the target rendering program to obtain a target image, wherein the inherent parameters comprise inherent color values and/or inherent positions of image textures.
Therefore, after the target audio file is obtained, it is processed and the corresponding rhythm information is extracted. The rhythm information can represent the rhythm of the target audio file and may specifically be drum point information, bar information, or the like. After the rhythm information is obtained, the target rendering program required for rendering the target image is determined according to the initialization information or the rhythm information, and how the image textures in the rendering program are displayed is determined according to the rhythm information, so that the rhythm of the target audio file is expressed by changing the display mode of the image textures. Specifically, the rhythm information is used to determine the display parameters corresponding to each image texture; these parameters may include a display area parameter and/or a display state parameter, which indicate which portion of an image texture is displayed and whether the image texture is displayed at all. After the display parameters are obtained, image rendering is performed using both the display parameters and the inherent parameters that the target rendering program necessarily uses when rendering an image, to obtain the corresponding target image. The inherent parameters may include inherent color values and/or inherent positions of image textures. Because the rendering operation uses the rhythm information, the obtained target image records the rhythm information of the audio, represents the rhythm of the target audio file, and corresponds to the target audio file. By acquiring the rhythm information of the audio and rendering images with it, the obtained target image corresponds to the target audio file in terms of rhythm, which solves the problem in the related art that the visual effect does not correspond to the audio the user hears.
In addition, the present application further provides an audio-based image generation apparatus, an electronic device, and a computer-readable storage medium, which achieve the same beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic diagram of a hardware composition framework to which an audio-based image generation method provided in an embodiment of the present application is applied;
fig. 2 is a schematic diagram of a hardware composition framework to which another audio-based image generation method provided in the embodiment of the present application is applied;
fig. 3 is a schematic flowchart of an audio-based image generation method according to an embodiment of the present disclosure;
fig. 4 is a flowchart of a specific target audio file acquisition process provided in an embodiment of the present application;
fig. 5 is a flowchart of another specific target audio file acquisition process provided in the embodiment of the present application;
FIG. 6 is a flowchart illustrating an atmosphere rendering process according to an embodiment of the present disclosure;
FIG. 7 is a flowchart of a specific rhythm layer rendering process provided by an embodiment of the present application;
fig. 8 is a schematic structural diagram of a rendering algorithm module according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of a specific target image rendering process provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a target image according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of another target image provided in the embodiments of the present application;
fig. 12 is a schematic view of another target image according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For convenience of understanding, a hardware composition framework used in a scheme corresponding to the audio-based image generation method provided in the embodiment of the present application is described first. Referring to fig. 1, fig. 1 is a schematic diagram of a hardware composition framework applicable to an audio-based image generation method according to an embodiment of the present disclosure. Wherein the electronic device 100 may include a processor 101 and a memory 102, and may further include one or more of a multimedia component 103, an information input/information output (I/O) interface 104, and a communication component 105.
Wherein, the processor 101 is used for controlling the overall operation of the electronic device 100 to complete all or part of the steps in the audio-based image generation method; the memory 102 is used to store various types of data to support operation at the electronic device 100, such data may include, for example, instructions for any application or method operating on the electronic device 100, as well as application-related data. The Memory 102 may be implemented by any type or combination of volatile and non-volatile Memory devices, such as one or more of Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic or optical disk. In the present embodiment, the memory 102 stores therein at least programs and/or data for realizing the following functions:
acquiring a target audio file;
carrying out rhythm detection processing on a target audio file to obtain rhythm information;
determining a target rendering program from a plurality of rendering programs by using the initialization information or the rhythm information;
determining display parameters corresponding to each image texture in each target rendering program by using the rhythm information, wherein the display parameters comprise display area parameters and/or display state parameters;
and based on the display parameters and the inherent parameters corresponding to the target rendering program, performing image rendering by using the target rendering program to obtain a target image, wherein the inherent parameters comprise inherent color values and/or inherent positions of image textures.
The multimedia component 103 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 102 or transmitted through the communication component 105. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 104 provides an interface between the processor 101 and other interface modules, such as a keyboard, mouse, or buttons; these buttons may be virtual buttons or physical buttons. The communication component 105 is used for wired or wireless communication between the electronic device 100 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them, so the corresponding communication component 105 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
The electronic Device 100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the audio-based image generation method.
Of course, the structure of the electronic device 100 shown in fig. 1 does not constitute a limitation of the electronic device in the embodiment of the present application, and in practical applications, the electronic device 100 may include more or less components than those shown in fig. 1, or some components may be combined.
It is to be understood that, in the embodiment of the present application, the number of the electronic devices is not limited, and it may be that a plurality of electronic devices cooperate together to complete the audio-based image generation method. In a possible implementation manner, please refer to fig. 2, and fig. 2 is a schematic diagram of a hardware composition framework to which another audio-based image generation method provided in the embodiment of the present application is applied. As can be seen from fig. 2, the hardware composition framework may include: the first electronic device 11 and the second electronic device 12 are connected to each other through a network 13.
In the embodiment of the present application, the hardware structures of the first electronic device 11 and the second electronic device 12 may refer to the electronic device 100 in fig. 1. That is, it can be understood that there are two electronic devices 100 in the present embodiment, and the two devices perform data interaction. Further, in this embodiment of the application, the form of the network 13 is not limited, that is, the network 13 may be a wireless network (e.g., WIFI, bluetooth, etc.), or may be a wired network.
The first electronic device 11 and the second electronic device 12 may be the same electronic device, for example, the first electronic device 11 and the second electronic device 12 are both servers; or may be different types of electronic devices, for example, the first electronic device 11 may be a smartphone or other smart terminal, and the second electronic device 12 may be a server. In one possible embodiment, a server with high computing power may be used as the second electronic device 12 to improve the data processing efficiency and reliability, and thus the processing efficiency of the audio-based image generation. Meanwhile, a smartphone with low cost and wide application range is used as the first electronic device 11 to realize interaction between the second electronic device 12 and the user. It is to be understood that the interaction process may be: the smart phone sends the target audio file to the server, and the server performs rhythm detection processing to obtain rhythm information. The server sends the rhythm information to the smart phone, and the smart phone performs image rendering locally by using a rendering program based on the rhythm information to obtain a target image.
Based on the above description, please refer to fig. 3, and fig. 3 is a flowchart illustrating an audio-based image generation method according to an embodiment of the present disclosure. The method in this embodiment comprises:
s101: and acquiring a target audio file.
The target audio file refers to an audio file for which a corresponding video (or image) special effect needs to be generated. In one embodiment, the target audio file comprises audio data; in another embodiment, it may additionally include audio attribute information corresponding to the audio data. In one embodiment, the target audio file may record audio of a shorter length, such as an audio frame; in another embodiment, it may record audio of a longer length, such as a complete audio fragment. To determine whether an audio length counts as long, a threshold may be set as needed and compared against the audio length; the specific size of the threshold is not limited. The audio format of the target audio file may be any optional format, such as an uncompressed or losslessly compressed format based on PCM (Pulse Code Modulation), for example the WAV, APE, or FLAC format, or a lossy audio compression format, for example the MP3 or AAC format.
As to the obtaining manner of the target audio file, in an embodiment, the target audio file may be pre-stored, and is directly obtained from a storage location during obtaining, where the storage location may specifically be an external storage path or a local storage path, where the local storage path refers to a storage path inside the device, and the external storage path refers to a storage path of an external removable storage device, a cloud storage device, and the like. In this case, when the target audio file needs to be acquired, the target audio file is acquired using the pre-stored or acquired storage location information. In a second embodiment, the target audio file is received in real time, for example, audio sent by other electronic devices is received in real time, in this case, data input from outside may be obtained and stored in the memory (i.e., buffer), and the obtaining of the target audio file is completed.
It will be appreciated that the trigger timing for the step of retrieving the target audio file may be set by setting different trigger rules. In one embodiment, a storage path of a target audio file or a transmission port of the target audio file may be detected in real time, and when a file existing in the storage path or a transmission port starts to transmit data is detected, it is determined that the target audio file starts to be acquired. In another embodiment, the target audio file may be acquired after detecting a trigger signal, which is a signal indicating the start of the process. Specifically, the generation manner of the trigger signal is not limited, for example, the trigger signal may be generated according to an instruction input by a user, for example, the trigger signal is generated when a music playing instruction input by the user is detected; or the trigger signal may be generated according to a switch state switching signal of the display device, for example, when an on signal of the display device is received.
S102: and carrying out rhythm detection processing on the target audio file to obtain rhythm information.
Rhythm information refers to information that can characterize the rhythm and beat of audio; in the audio field it is typically drum point information, but it may also be bar information. The drum point information records the times of the audio's drum points, where a drum point is an accented (strong) beat: in duple or quadruple meter (2/4 or 4/4), the first beat of a bar (and, in 4/4, also the third) is a strong beat, and in triple meter (3/4, as in a waltz), the first beat of each bar is the strong beat. A music meter is the recurring pattern of strong and weak beats, so the drum point information can be used to represent the rhythm of the audio. The bar information records the times of bars; since the strong and weak beats in audio always recur in a regular cycle, the span from one strong beat to the next is generally called a bar.
The tempo detection processing is processing for generating tempo information corresponding to a target audio file. It can be understood that, according to the specific type of the rhythm information and the specific content of the target audio file, the specific manner of the rhythm detection processing may also be different, and the specific acquisition manner of the drumbeat information, the bar information, and the like may refer to the related art, which is not described in detail again in this embodiment.
In a specific embodiment, when the target audio file is a long audio, the rhythm detection process may first segment the target audio file, and extract rhythm information in each segment, so as to improve the accuracy of the rhythm information; when the target audio file is a short audio, the tempo detection processing can be directly performed thereon.
Further, this embodiment does not limit how to determine whether the target audio file is long. Since the obtained rhythm information is ultimately used for image rendering to obtain a target image, and the target image needs to correspond to the target audio file, whether the audio is currently at a drum point or at a bar switch changes as the target audio file plays. Therefore, in order to make the target image correspond to the target audio file, the duration of a drum point or the time required for a bar switch may be used as the criterion for judging whether the target audio file is long. In another embodiment, the target images are typically assembled as video frames into a video stream, i.e., a video special effect. The video stream has a certain frame rate, so each video frame corresponds to a period of time; to make the video stream correspond completely to the target audio file, the duration of a video frame can be used as the criterion. For example, when the frame rate of the video stream is 30 frames per second, each video frame corresponds to about 0.033 seconds. In this case, it may be determined whether the target audio file is longer than 0.033 seconds, and if so, it is judged to be long.
It should be noted that the rhythm information may only include one type of information, such as only the drumbeat information, or only the bar information; in another embodiment, the cadence information may include multiple types of information.
S103: and determining a target rendering program from the plurality of rendering programs by using the initialization information or the rhythm information.
The rendering program refers to a program for rendering an image, and according to different specific types of electronic devices for executing the steps in the embodiment, the rendering program may be written in different computer languages, and in one embodiment, may be written in C language; in a second embodiment, the rendering program may be written in the OpenGL ES SL language. And the rendering program takes the rhythm information as a parameter necessary for rendering, and performs image rendering according to the rhythm information to obtain a target image. The rendering program may be constructed based on a fading-in and fading-out image algorithm, a light scanning image algorithm, a random image algorithm, a flash image algorithm, and the like, and a specific construction manner is not limited.
The initialization information is reference information for selecting a target rendering program at the time of first image rendering, with which a rendering program designated manually at the time of first image rendering can be adopted. In another embodiment, the target rendering program may be further selected according to the rhythm information, and the target rendering program is obtained again when the rhythm information satisfies a certain condition.
The number of the target rendering programs may be multiple, that is, multiple rendering programs may be set, and multiple target rendering programs may be obtained by selecting from the multiple rendering programs when performing image rendering. When the number of the target images is multiple, the rendering programs corresponding to different target images may be different, that is, the rendering programs may be switched during the image rendering process. The number of rendering programs corresponding to a single target image may be one or more.
Specifically, in one embodiment, in order to improve the strength of the representation of the rhythm information, the target rendering program may be updated according to the change of the rhythm information, so as to render the target image in different rendering manners between rhythms. S103 may specifically include the following steps:
step 11: and if the current rendering program does not exist, selecting a target rendering program from the plurality of rendering programs by using the initialization information.
Step 12: and if the current rendering program exists, acquiring and analyzing the configuration information to obtain a first base number and/or a second base number.
Step 13: and when the drum point information in the rhythm information is a multiple of the first base number, re-acquiring the target rendering program according to the updating sequence.
and/or,
step 14: and when the measure information in the rhythm information is a multiple of the second base number, re-acquiring the target rendering program according to the updating sequence.
In this embodiment, when determining the target rendering manner, it is first determined whether a selected rendering program exists currently, that is, a current rendering program. If the current rendering program does not exist, the rendering program selected this time is the initial selection, and in this case, the target rendering program may be selected from the plurality of rendering programs by using the initialization information.
If the current rendering program exists, the selection is not the initial selection, and the target rendering program selected this time is used for replacing the current rendering program to complete the updating of the target rendering program. Firstly, a judgment reference for judging whether updating is needed is obtained, namely, the configuration information is obtained and analyzed, and a first base number and/or a second base number which are/is used as the judgment reference are/is obtained. In this embodiment, the first base number is used to process the drum point information, and when the drum point information (for example, a drum point serial number) in the rhythm information is a multiple of the first base number, the target rendering program is updated, specifically, the target rendering program is obtained again according to the update sequence. The update sequence refers to a sequence of each target rendering program in the using process, and the specific sequence is not limited.
Similar to the drum point information, the second base number is used for comparing with the bar information in the rhythm information, and when the bar information (such as the bar number) is a multiple of the second base number, the target rendering program is obtained again according to the updating sequence. It is understood that the first base and the second base may be applied simultaneously, that is, the target rendering program is updated when any one of the above two multiple conditions is satisfied.
Further, in another embodiment, the target rendering program may be updated based on other data, and therefore, the method may further include the following steps:
step 21: obtaining and analyzing the configuration information to obtain an amplitude interval.
Step 22: and when the normalized amplitude is in the amplitude interval, re-acquiring the target rendering program according to the updating sequence.
In this embodiment, the configuration information may further include an amplitude interval, where the amplitude interval is used to limit the normalized amplitude, and the target rendering program is updated when the normalized amplitude satisfies the amplitude interval. The normalized amplitude refers to amplitude information obtained by normalizing the volume amplitude, and the specific processing procedure of the normalization processing is not limited.
It should be noted that the update orders adopted in the three target rendering program update methods may be the same or different, and for example, the update may be performed according to a first update order, a second update order, and a third update order, respectively. Further, the amplitude intervals obtained in step 21 may be plural, and there may be no overlapping portion between the amplitude intervals, and in this case, the update order for each amplitude interval may be different.
In a specific embodiment, the drum point information may be represented by beatIndex, the bar information by chapterIndex, and the normalized amplitude by amplitude; the first base number may be set to 4, the second base number to 2, and the amplitude interval to [0.3, 1]. In this case, the conditions for re-acquiring the target rendering program are as follows:

    beatIndexValid = (beatIndex % 4 == 0)        # drum point index is a multiple of 4
    amplitudeValid = (amplitude >= 0.3)          # normalized amplitude is at least 0.3
    chapterIndexValid = (chapterIndex % 2 == 0)  # bar index is a multiple of 2
The process of retrieving the target rendering program can be expressed as:
[The code listing for this process appears only as an image in the source (BDA0003080898430000121).]
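Since the original listing is not recoverable, the following minimal sketch (an assumption, not the patent's code) shows how the three conditions above might drive re-acquisition of the target rendering program along the update order; the names renderers, current_index, and the modular step are illustrative:

    # Hypothetical sketch: select or refresh the target rendering program.
    # renderers is the update order; current_index is None before the first render.
    def select_renderer(renderers, current_index, beat_index, chapter_index, amplitude):
        if current_index is None:
            return 0                                     # fall back to initialization info
        beat_valid = beat_index % 4 == 0                 # first base number = 4
        chapter_valid = chapter_index % 2 == 0           # second base number = 2
        amplitude_valid = 0.3 <= amplitude <= 1.0        # amplitude interval [0.3, 1]
        if beat_valid or chapter_valid or amplitude_valid:
            return (current_index + 1) % len(renderers)  # re-acquire per update order
        return current_index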
it should be noted that, in another embodiment, only one rendering program may exist in a certain update sequence, and the rendering manner and result of the rendering program have a correlation with the called duration. In this case, the rendering program is repeatedly called and reused for rendering, which has the effect of circularly triggering re-rendering.
S104: and determining display parameters corresponding to each image texture in each target rendering program by using the rhythm information.
The display parameters comprise display area parameters and/or display state parameters: a display area parameter specifies which part of a given image texture currently needs to be displayed, and a display state parameter indicates whether the image texture is displayed at all. An image texture is a basic element used to construct the target image and may also be referred to as an auxiliary texture. Any number of image textures can exist in a single target rendering program; drawing these image textures on a preset background according to the display parameters yields the target image.
As for the manner of determining the display parameter by using the rhythm information, in one embodiment, an update base number may be set, and when the drumbeat information or bar information in the rhythm information is a multiple of the update base number, the display parameter is determined as a first preset parameter, and when the rhythm information is not a multiple of the update base number, the display parameter is determined as a second preset parameter. For example, when the image texture is divided into two parts, namely, the lower left part and the upper right part, and the drum point information is taken as the rhythm information for generating the display parameters, it may be calculated whether the drum point information is a multiple of 2, that is, a 2-based remainder calculation is performed on the drum point information, and the calculation result index is:
index=mod(beatIndex,2)
in this case, the image texture of the upper right part may be rendered when the index is 0, and the image texture of the lower left part may be rendered when the index is 1. Reference may be made specifically to fig. 12, which shows a target image drawn after the display parameters specify the drawing of the image texture of the lower left portion, specifically, the image texture is a hexagonal texture.
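A hedged sketch of this alternation (the region names are hypothetical labels, not the patent's identifiers):

    # Hypothetical sketch: alternate the displayed texture region on successive drum points.
    def pick_display_area(beat_index):
        index = beat_index % 2          # index = mod(beatIndex, 2)
        return "upper_right" if index == 0 else "lower_left"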
S105: and based on the display parameters and the inherent parameters corresponding to the target rendering program, performing image rendering by using the target rendering program to obtain a target image.
The inherent parameters include inherent color values and/or inherent positions of image textures. An inherent color value is the standard color value a rendering program uses during image rendering, and the inherent position of an image texture is its position within the whole target image. After the display parameters are obtained, they are used together with the inherent parameters for image rendering. This embodiment does not limit the specific image rendering manner; in one implementation, the process may include the following steps:
step 31: and determining the texture of the target image by using the display state parameters, and determining a display area in the texture of the target image by using the display area parameters corresponding to the texture of the target image.
Step 32: and acquiring volume amplitude information, and performing normalization processing on the volume amplitude information to obtain normalized amplitude.
Step 33: and obtaining texture color values corresponding to the display areas by using the normalized amplitude and the inherent color values.
Step 34: and performing image rendering on the display area based on the texture color value, the inherent color value and the inherent position of the image texture to obtain a target image.
First, the display parameters in this embodiment include display state parameters and display area parameters: the image texture that needs to be displayed, i.e., the target image texture, is determined from the display state parameters, and the area that needs to be displayed within it is determined from the corresponding display area parameters. In this embodiment, to create a flicker effect, the texture color value corresponding to the display area is determined using the normalized amplitude as an amplitude coefficient.
Specifically, after the volume amplitude information is acquired, it is normalized to obtain the normalized amplitude. Because of the normalization, this value represents where the current volume lies within the normal volume interval, and it can be multiplied, as an amplitude coefficient, with the inherent color value to obtain the texture color value. This yields a linear correspondence in which a larger volume gives a larger color value (or, alternatively, a smaller one). After the texture color value is obtained, rendering is performed based on the texture color value, the inherent color value, and the inherent position of the image texture, so the display area in the resulting target image flickers with changes in the volume.
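A minimal sketch of this mapping, assuming 16-bit volume amplitudes and RGB color values (both assumptions, since the patent does not fix the ranges):

    # Hypothetical sketch: normalize the volume amplitude and scale the inherent
    # color by it, so louder audio yields a brighter texture color (linear mapping).
    def normalize_amplitude(amplitude, lo=0.0, hi=32768.0):   # assumed 16-bit range
        return min(max((amplitude - lo) / (hi - lo), 0.0), 1.0)

    def texture_color(inherent_rgb, norm_amp):
        return tuple(c * norm_amp for c in inherent_rgb)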
Further, in an embodiment, some parts of the target image may need the flicker effect while other parts do not, in which case the following steps may be included:
step 41: and judging whether the inherent color value corresponding to the image texture meets the preset condition.
The preset condition is a magnitude relationship between the inherent channel color values of the respective color value channels. For example, when the target image has the three RGB color channels, it can be regarded as having three color value channels, and the inherent color value of each pixel is formed from three inherent channel color values. During rendering, it is judged whether the magnitude relationship among the three inherent channel color values of an image texture meets the preset condition; for example, the condition may be considered met when the R channel color value is larger than both the B channel color value and the G channel color value. The inherent color values of the image textures, and the corresponding preset conditions, may differ from texture to texture.
Step 42: and based on the texture color value and the inherent position of the image texture, performing image rendering on the display area meeting the preset condition, and based on the inherent color value and the inherent position of the image texture, performing image rendering on the display area not meeting the preset condition to obtain the target image.
If the preset condition is met, the image texture needs to be added with a flicker effect, in this case, the texture color value and the inherent position of the image texture can be used for rendering, otherwise, the inherent color value and the inherent position of the image texture are used for rendering, and the target image is obtained.
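A hedged sketch of this branch (the red-dominance test is only the example condition above; it stands in for whatever magnitude relationship the configuration specifies):

    # Hypothetical sketch: flicker only textures whose inherent color meets the
    # preset condition (here R > G and R > B); render the rest unchanged.
    def area_color(inherent_rgb, norm_amp):
        r, g, b = inherent_rgb
        if r > g and r > b:                                    # preset condition satisfied
            return (r * norm_amp, g * norm_amp, b * norm_amp)  # texture color value
        return inherent_rgb                                    # inherent color value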
By applying the audio-based image generation method provided in the embodiments of the present application, after the target audio file is obtained, it is processed and the corresponding rhythm information is extracted. The rhythm information can represent the rhythm of the target audio file and may specifically be drum point information, bar information, or the like. After the rhythm information is obtained, the target rendering program required for rendering the target image is determined according to the initialization information or the rhythm information, and how the image textures in the rendering program are displayed is determined according to the rhythm information, so that the rhythm of the target audio file is expressed by changing the display mode of the image textures. Specifically, the rhythm information is used to determine the display parameters corresponding to each image texture; these parameters may include a display area parameter and/or a display state parameter, which indicate which portion of an image texture is displayed and whether the image texture is displayed at all. After the display parameters are obtained, image rendering is performed using both the display parameters and the inherent parameters that the target rendering program necessarily uses when rendering an image, to obtain the corresponding target image. The inherent parameters may include inherent color values and/or inherent positions of image textures. Because the rendering operation uses the rhythm information, the obtained target image records the rhythm information of the audio, represents the rhythm of the target audio file, and corresponds to the target audio file. By acquiring the rhythm information of the audio and rendering images with it, the obtained target image corresponds to the target audio file in terms of rhythm, which solves the problem in the related art that the visual effect does not correspond to the audio the user hears.
Based on the above embodiments, the present embodiment specifically describes some steps in the above embodiments. In one embodiment, in order to make the target image correspond to the target audio file as much as possible, the upper limit of the frame rate of the video effect may not be set. Because the acquisition process of the target audio file needs a certain time, subsequent operations can be performed while the target audio file is acquired. For example, when a certain portion of certain audio is acquired, the portion may be determined as a target audio file, its corresponding rhythm information extracted, or a corresponding target image generated. New audio continues to be acquired while the target image is being generated. When the generation of the rhythm information of the previous part is finished or the generation of the target image of the previous part is finished, the audio newly acquired in the period of generating the rhythm information or the target image can be determined as the target audio file so as to continue the subsequent operation. Referring to fig. 4, fig. 4 is a flowchart illustrating a specific target audio file obtaining method according to an embodiment of the present disclosure, where the obtaining method may be referred to as real-time obtaining. The audio data source provides a data source of a target audio file, which may be other electronic devices or an electronic device that performs the steps of this embodiment, and the provided audio is in a PCM format. When the target audio file is obtained, audio data is obtained through the setData instruction and stored in the AudioBuffer, namely the audio memory, and the audio data stored in the audio memory is the target audio file. And reading the target audio file from the audio memory through the getData instruction so as to execute subsequent operations.
In another embodiment, in order to reduce the consumption of computing resources, a fixed frame rate upper limit of the video special effect may be set, in this case, in order to ensure that the video special effect can correspond to the target audio file, reasonable average slicing may be performed on the original audio according to the video frame rate to obtain a plurality of target audio files, so as to generate target images corresponding to the respective target audio files. Specifically, the process of acquiring the target audio file may include the following steps:
step 51: and acquiring initial audio, and calculating the fragment length by using the video frame rate and the audio parameter corresponding to the target audio file.
The initial audio refers to directly acquired audio data without being processed. The audio parameters include audio sampling rate, channel number and audio bit depth, and since the video frame rate represents the number of video frames (i.e. target images) in a unit time length and the data length of the initial audio is related to the audio parameters, the video frame rate and the audio parameters can be used to jointly calculate the length of the audio data in the unit time length, and the initial audio is sliced according to the length of the slice. Specifically, the fragment length may be calculated according to the following formula:
slice length (bytes) = audio sampling rate × number of channels × audio bit depth / 8 / video frame rate
In one embodiment, the data format of the initial audio is a PCM format and corresponds to a sample rate of 44100Hz, two channel, 16 bit depth audio, and a video frame rate of 30 frames per second. In this case, the calculated slice length is:
length of the slice 44100 × 2 × 16/8/30 ═ 5880(byte)
That is, in the above case, the initial audio data of length 5800 bytes can be used for playing for 0.33 seconds.
Step 52: and slicing the initial audio according to the slicing length to obtain a plurality of target audio files.
After the slice length is determined, the initial audio may be sliced according to it to obtain a plurality of target audio files. Referring to fig. 5, fig. 5 is a flowchart of another specific target audio file acquisition provided in an embodiment of the present application. While the initial audio data is being acquired and stored into the audio buffer, or after it has been stored completely, it can be segmented from its start according to the calculated slice length; then each time data is read from the audio buffer with the getData instruction, one segment is read out in order, and that segment of data is the target audio file.
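A minimal sketch of this slicing, assuming raw PCM bytes and the parameters of the example above (the function and variable names are illustrative, not the patent's):

    # Hypothetical sketch: compute the per-video-frame slice length and split the
    # initial PCM audio into target audio files of that length.
    def slice_audio(pcm_bytes, sample_rate=44100, channels=2, bit_depth=16, fps=30):
        slice_len = sample_rate * channels * bit_depth // 8 // fps   # 5880 bytes here
        return [pcm_bytes[i:i + slice_len]
                for i in range(0, len(pcm_bytes), slice_len)]

Each returned segment then covers one video frame (1/30 s at 30 frames per second).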
Based on the above embodiment, in a possible implementation manner, since drum points directly and accurately reflect the audio rhythm, the drum point information of the target audio file can be extracted as the rhythm information. In this case, the process of performing rhythm detection processing on the target audio file to obtain rhythm information may include the following steps:
step 61: and performing down-sampling processing on the target audio file to obtain a preprocessed audio.
The down-sampling process resamples high-sample-rate audio at a lower sampling rate. Information such as the human voice in audio is mainly concentrated in the lower frequency band, e.g., within 8 kHz, and the higher band usually carries little valuable signal. Therefore, in order to reduce the consumption of computing resources and speed up the extraction of the drum point information, the target audio file can be down-sampled to obtain the corresponding preprocessed audio. The frequency of the down-sampling process becomes the frequency of the preprocessed audio; this embodiment does not limit the specific frequency of the preprocessed audio. For example, when the target audio file is at 48 kHz, the preprocessed audio may be at 16 kHz. It should be understood that although the higher band does not carry many valuable signals, it still records some information, so the frequency of the preprocessed audio cannot be too low, to avoid losing so much signal that the drum point information becomes inaccurate.
Step 62: and performing framing processing on the preprocessed audio to obtain a plurality of frame signals, and performing windowing processing on each frame signal to obtain intermediate data corresponding to each frame signal.
In this embodiment, a series of processes including framing, windowing, Fourier transform, power spectrum calculation, frame power calculation, average value calculation, and normalization may be employed to obtain accurate drum point information. The specific process and manner of the framing are not limited, and its result can be expressed by the following formula:

x_n(i) = x(n · M + i)

where n is the frame index and x_n(i) is the nth frame signal; M is the frame shift, i is the sample index within the frame, an integer between 0 and L − 1, and L is the frame length. Preferably, L may be set to 0.05 seconds and M to 0.25 seconds.
After each frame signal is obtained, windowing is performed on it; the specific window function used is not limited. The windowing process can be represented by the following formula:

xw_n(i) = x_n(i) · w(i), i = 0, 1, ..., L − 1

where xw_n(i) is the intermediate data and w(i) is the window function.
Step 63: perform Fourier transform and power spectrum calculation using the intermediate data to obtain a plurality of power spectrums.
After the intermediate data corresponding to the target audio file is obtained, a Fourier transform is performed on it so that the power spectrum can be calculated subsequently. In this embodiment, the Fourier transform is the Short-Time Fourier Transform (STFT), which can be expressed by the following formula:
X(n,k) = Σ_{i=0}^{N−1} x_wn(i) · e^(−j2πki/N)
where k is the frequency bin index within the frame signal and N is the number of Fourier transform points. Because the frame length may not match the number of transform points, when L < N the intermediate data needs to be zero-padded; when L > N the intermediate data may be truncated to N points before the Fourier transform.
After the Fourier transform, the power spectrum may be calculated. The power spectrum calculation in this embodiment takes the squared magnitude, and the specific calculation can be represented by the following formula:
P(n,k) = ||X(n,k)||², n = 0, 1, ..., Q−1
where Q is the total number of frames of the target audio file after the Fourier transform, P(n,k) is the power spectrum value at the kth frequency bin of the nth frame, and ||·||² denotes the squared magnitude (modulo) operation.
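Step 63 can be sketched as follows, with N = 1024 as an assumed transform size; numpy's rfft pads or truncates exactly as described above for L < N and L > N:

```python
import numpy as np

def power_spectrum(windowed_frames: np.ndarray, n_fft: int = 1024) -> np.ndarray:
    # rfft zero-pads frames shorter than n_fft and truncates longer ones,
    # and keeps only the non-negative frequency bins.
    X = np.fft.rfft(windowed_frames, n=n_fft, axis=-1)   # X(n, k)
    return np.abs(X) ** 2                                # P(n, k) = ||X(n, k)||^2
```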
Step 64: perform frame power calculation using the power spectrum to obtain the power value corresponding to each frame signal.
Specifically, the frame power may be calculated according to the following formula to obtain the power value corresponding to each frame signal:
p(n) = (1/N) · Σ_{k=0}^{N−1} P(n,k)
where p(n) is the power value corresponding to the nth frame signal.
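Step 64 reduces each frame's power spectrum to a single power value; since the original equation is an image in the filing, the mean over frequency bins below is an assumed normalization:

```python
import numpy as np

def frame_power(P: np.ndarray) -> np.ndarray:
    # One power value p(n) per frame: average P(n, k) over the bins k.
    return P.mean(axis=-1)
```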
Step 65: perform average value calculation and normalization using the power values to obtain drum point information, and determine the drum point information as the rhythm information.
The average power value corresponding to the target audio file is obtained by averaging the power values, and the drum point information is obtained by normalizing this average power value, i.e. mapping it into a preset interval. In this embodiment, the drum point information is the rhythm information.
In a specific embodiment, in order to give the normalized drum point information higher contrast, it may be processed with a correction function to obtain optimized drum point information. Specifically, the process of obtaining the drum point information through average value calculation and normalization may include the following steps:
Step 651: calculate the average of the power values to obtain the average power value.
Specifically, the average value calculation can be represented by the following formula:
P̄ = (1/B) · Σ_{n=0}^{B−1} p(n)

where B is the number of frames obtained by framing the target audio file and P̄ is the average power value.
Step 652: normalize the average power value to a preset interval to obtain initial drum point information.
Specifically, the normalization process can be represented by the following formula:
P = (P̄ − P_L) / (P_U − P_L)

where P is the initial drum point information, P_U is the upper limit of the preset interval, and P_L is the lower limit of the preset interval.
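Steps 651 and 652 together, as a sketch; the clipping of values that fall outside the preset interval is an added assumption:

```python
import numpy as np

def initial_drum_info(p: np.ndarray, p_low: float, p_high: float) -> float:
    p_bar = float(p.mean())               # average power over the B frames
    # Map into [0, 1] using the preset interval [P_L, P_U].
    return float(np.clip((p_bar - p_low) / (p_high - p_low), 0.0, 1.0))
```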
Step 653: correct the initial drum point information with a correction function to obtain the drum point information.
The correction function is used to increase the contrast of the initial drum point information so that the drum points are more distinct, and its specific content is not limited. For example, in one embodiment, the correction maps the initial drum point information P through a correction function:
P̂ = f(P)

where f is the correction function and P̂ is the drum point information.
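Since the filed correction formula is an equation image and is not reproduced here, the sketch below uses a simple squaring map purely as an illustrative contrast-raising correction, not the patented function:

```python
def correct(p_init: float) -> float:
    # Squaring suppresses low (non-drum) values while preserving peaks,
    # so drum hits stand out more after normalization.
    return p_init ** 2
```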
In another embodiment, bar information may also characterize the rhythm of the audio and may therefore be used as the rhythm information. In this case, the process of performing rhythm detection processing on the target audio file to obtain rhythm information may include the following steps:
Step 71: extract attribute information from the target audio file.
The attribute information reflects the attributes of the target audio file; in this embodiment it may record the music score data corresponding to the target audio file, and its form and content are not limited.
Step 72: generate bar information using the attribute information, and determine the bar information as the rhythm information.
This embodiment does not limit the specific process and manner of generating the bar information from the attribute information; reference may be made to the related art, which is not repeated here. In this embodiment, the rhythm information need not be obtained directly from the target audio file; instead, the target audio file is used to determine the corresponding attribute information, the music score information is obtained from the attribute information, and the score is analyzed to obtain the corresponding rhythm information. This approach does not require detailed analysis of the target audio file itself, so it can speed up the acquisition of rhythm information and, in turn, the generation of the target image.
Based on the foregoing embodiment, in a specific implementation, in order to characterize the target audio file from multiple angles, the rhythm information may be used for image rendering together with other information to obtain the target image. Specifically, the method may further include the following steps:
Step 81: acquire basic information corresponding to the target audio file.
In the present embodiment, the basic information is volume amplitude information and/or spectrum information. The volume amplitude information may be used to represent the volume amplitude of the target audio file, and the spectral information may represent the energy distribution of the target audio file in the frequency domain.
Correspondingly, based on the display parameters and the intrinsic parameters corresponding to the target rendering program, the process of performing image rendering by using the target rendering program to obtain the target image may include:
Step 82: perform initial image rendering using the display parameters and intrinsic parameters corresponding to each target rendering program together with the basic information, to obtain a plurality of initial images.
For the specific process of rendering based on the display parameters, the intrinsic parameters, and the basic information, reference may be made to the description of rendering with the rhythm information in the foregoing embodiment; for example, the normalized amplitude may be regarded as basic information. It is not repeated here. By acquiring the basic information, the target audio file can be characterized from multiple angles, so that the video special effect not only reflects the rhythm of the target audio file but also makes the target image more flexible.
Step 83: perform image fusion processing on the initial images to obtain the target image.
Image fusion processing refers to fusing content such as light and brightness across the images; for the specific fusion process, reference may be made to the related art. In this embodiment, the image fusion processing may be color value addition fusion or alpha fusion. In one embodiment, when the requirement is to fuse the lights in the initial images, an alpha fusion algorithm may be used. Alpha blending combines an image with a background and can produce a partially or fully transparent visual effect after combination; the specific blending process can be expressed by the following formula:
I_result = α × I_light + (1 − α) × I_background
where I_result is the brightness of a pixel in the target image, and I_light and I_background are the brightness of the corresponding pixels in the two initial images. When there are more than two initial images, they can be fused in sequence, i.e. image fusion is performed multiple times to finally obtain the target image. Rendering the target image with multiple rendering programs improves the flexibility of rendering-program selection, and thus the flexibility of target image rendering.
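The per-pixel alpha fusion above is straightforward to sketch; chaining it over several initial images follows the sequential fusion just described (function names are illustrative):

```python
import numpy as np

def alpha_blend(light: np.ndarray, background: np.ndarray, alpha: float) -> np.ndarray:
    # I_result = alpha * I_light + (1 - alpha) * I_background, pixel-wise.
    return alpha * light + (1.0 - alpha) * background

def fuse_layers(layers: list, alpha: float = 0.5) -> np.ndarray:
    # Fuse a list of initial images in sequence into one target image.
    result = layers[0]
    for layer in layers[1:]:
        result = alpha_blend(layer, result, alpha)
    return result
```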
Based on the above embodiments, a specific target image rendering process is described below. Referring to fig. 6, fig. 6 is a flowchart of a specific atmosphere layer rendering process according to an embodiment of the present disclosure. The shader rendering program corresponding to the atmosphere layer adopts a progressive fading image algorithm and a ray scanning image algorithm; the corresponding target rhythm information is drum point information, and amplitude information (i.e. volume amplitude information) may also be used to render the atmosphere layer special effects. Referring to fig. 7, fig. 7 is a specific flowchart of rhythm layer rendering according to an embodiment of the present application. Similar to the atmosphere layer, the shader rendering program corresponding to the rhythm layer adopts a random image algorithm and a flash image algorithm, and the corresponding target rhythm information is drum point information and bar information.
In order to facilitate flexible development, a plurality of OpenGL shaders are constructed as rendering programs in this embodiment. Referring to fig. 8, fig. 8 is a schematic structural diagram of a rendering algorithm module according to an embodiment of the present disclosure. There is one atmosphere layer shader rendering program and multiple rhythm layer shader rendering programs.

Referring to fig. 9, fig. 9 is a specific target image rendering flowchart provided in an embodiment of the present disclosure. When rendering the target image, information such as amplitude, drum points, and events may be input. The rendering program corresponding to the atmosphere layer selects from the target rhythm information and the other, non-rhythm information to generate the atmosphere layer, which mainly includes a scanning top light and bar-shaped atmosphere lights on both sides; their brightness may be set using the target rhythm information and the non-rhythm information to obtain the atmosphere map, i.e. the atmosphere layer image. Information such as drum points and time is input into the shader rendering program corresponding to the rhythm layer to obtain the corresponding rhythm layer, which is then image-fused with the atmosphere map and sent to the client front end for display.

Specifically, in this embodiment the rhythm layer mainly includes rhythm lights. Node information is used to select among the rhythm layer shader rendering programs and thus determine the style of the current rhythm light; the node information may be an update sequence, or a combination of the update sequence and an update instruction. In this embodiment there are three rhythm layer shader rendering programs. The first is a circular lamp, where the input target rhythm information and non-rhythm information control the brightness and sequence of the circular lamps. The second is a rectangular lamp, where they control the brightness and movement direction of the rectangular lamps. The third is a hexagonal lamp, where they control the brightness and the displayed portion of the hexagonal lamps.

Referring to fig. 10, fig. 10 is a schematic diagram of a target image provided by an embodiment of the present application, which adopts the first rhythm layer shader rendering program; the left and right atmosphere lamps and the top scanning dome lamp may both be controlled by volume amplitude information, or the left and right atmosphere lamps may be controlled by volume amplitude information while the top scanning dome lamp is controlled by drum point information. Referring to fig. 11, fig. 11 is a schematic diagram of another target image according to an embodiment of the present application, which adopts the second rhythm layer shader rendering program. Referring to fig. 12, fig. 12 is a schematic diagram of another target image according to an embodiment of the present application, which adopts the third rhythm layer shader rendering program.
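Tying this to claim 2, one illustrative way to cycle rhythm layer shader programs in update order whenever the drum point count hits a multiple of a configured base number (all names and the counting scheme here are assumptions, not the filing's implementation):

```python
class ShaderSelector:
    """Cycle through rhythm-layer shader programs in update order."""

    def __init__(self, programs, first_base: int = 4):
        self.programs = programs        # e.g. [circular, rectangular, hexagonal]
        self.first_base = first_base    # base number parsed from configuration information
        self.index = 0
        self.count = 0

    def on_drum_point(self):
        # Advance to the next program whenever the drum-point count reaches
        # a multiple of the configured base, following the update sequence.
        self.count += 1
        if self.count % self.first_base == 0:
            self.index = (self.index + 1) % len(self.programs)
        return self.programs[self.index]
```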
The following describes a computer-readable storage medium provided in an embodiment of the present application, and the computer-readable storage medium described below and the audio-based image generation method described above may be referred to in correspondence with each other.
The present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described audio-based image generation method.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that includes a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (12)

1. An audio-based image generation method, comprising:
acquiring a target audio file;
carrying out rhythm detection processing on the target audio file to obtain rhythm information;
determining a target rendering program from a plurality of rendering programs by using initialization information or the rhythm information;
determining display parameters corresponding to each image texture in each target rendering program by using the rhythm information, wherein the display parameters comprise display area parameters and/or display state parameters;
and based on the display parameters and the inherent parameters corresponding to the target rendering program, performing image rendering by using the target rendering program to obtain a target image, wherein the inherent parameters comprise inherent color values and/or inherent positions of image textures.
2. The audio-based image generation method according to claim 1, wherein the determining a target rendering program from a plurality of rendering programs using the initialization information or the rhythm information includes:
if the current rendering program does not exist, selecting the target rendering program from the plurality of rendering programs by using the initialization information;
if the current rendering program exists, acquiring and analyzing configuration information to obtain a first base number and/or a second base number;
when the drum point information in the rhythm information is a multiple of the first base number, the target rendering program is obtained again according to the updating sequence;
and/or,
and when the measure information in the rhythm information is a multiple of the second base number, re-acquiring the target rendering program according to the updating sequence.
3. The audio-based image generation method according to claim 1, wherein the rendering an image by using the object rendering program based on the display parameter and the intrinsic parameter corresponding to the object rendering program to obtain an object image includes:
determining a target image texture by using the display state parameters, and determining a display area in the target image texture by using the display area parameters corresponding to the target image texture;
acquiring volume amplitude information, and carrying out normalization processing on the volume amplitude information to obtain normalized amplitude;
obtaining texture color values corresponding to the display areas by using the normalized amplitude and the inherent color values;
and performing image rendering on the display area based on the texture color value, the inherent color value and the inherent position of the image texture to obtain the target image.
4. The audio-based image generation method according to claim 3, further comprising:
acquiring and analyzing configuration information to obtain an amplitude interval;
and when the normalized amplitude is in the amplitude interval, re-acquiring the target rendering program according to the updating sequence.
5. The audio-based image generation method according to claim 3, wherein the image rendering the display area based on the texture color value, the inherent color value, and the image texture inherent position to obtain the target image includes:
judging whether the inherent color value corresponding to the image texture meets a preset condition or not; the preset condition is a size relation condition between inherent channel color values corresponding to all the color value channels respectively;
and performing image rendering on the display area which meets the preset condition based on the texture color value and the inherent position of the image texture, and performing image rendering on the display area which does not meet the preset condition based on the inherent color value and the inherent position of the image texture to obtain the target image.
6. The method of claim 1, wherein the performing a tempo detection process on the target audio file to obtain tempo information comprises:
performing down-sampling processing on the target audio file to obtain a preprocessed audio;
performing framing processing on the preprocessed audio to obtain a plurality of frame signals, and performing windowing processing on each frame signal to obtain intermediate data corresponding to each frame signal;
carrying out Fourier transform and power spectrum calculation by using the intermediate data to obtain a plurality of power spectrums;
performing frame power calculation by using the power spectrum to obtain a power value corresponding to each frame signal;
and carrying out average value calculation and normalization processing by using the power value to obtain drum point information, and determining the drum point information as the rhythm information.
7. The audio-based image generation method according to claim 6, wherein the performing average value calculation and normalization processing using the power value to obtain drum information includes:
calculating the average value corresponding to each power value to obtain an average power value;
normalizing the average power value based on a preset interval to obtain initial drum point information;
and correcting the initial drum point information by using a correction function to obtain the drum point information.
8. The method of claim 1, wherein the performing a tempo detection process on the target audio file to obtain tempo information comprises:
extracting attribute information from the target audio file;
and generating bar information by using the attribute information, and determining the bar information as the rhythm information.
9. The audio-based image generation method according to claim 1, wherein the acquiring a target audio file includes:
acquiring initial audio, and calculating the fragment length by using a video frame rate and an audio parameter corresponding to the target audio file; the audio parameters comprise audio sampling rate, channel number and audio bit depth;
and slicing the initial audio according to the slicing length to obtain a plurality of target audio files.
10. The audio-based image generation method according to any one of claims 1 to 9, further comprising:
acquiring basic information corresponding to the target audio file; the basic information is volume amplitude information and/or frequency spectrum information;
correspondingly, the rendering an image by using the target rendering program based on the display parameters and the intrinsic parameters corresponding to the target rendering program to obtain a target image includes:
respectively utilizing the display parameters and the inherent parameters corresponding to the target rendering programs and the basic information to perform initial image rendering to obtain a plurality of initial images;
and performing image fusion processing on the initial image to obtain the target image, wherein the image fusion processing may be color value addition fusion processing or alpha fusion processing.
11. An electronic device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor for executing the computer program to implement the audio-based image generation method of any of claims 1 to 10.
12. A computer-readable storage medium for storing a computer program, wherein the computer program when executed by a processor implements the audio-based image generation method of any of claims 1 to 10.
CN202110566416.XA 2021-05-24 2021-05-24 Audio-based image generation method, electronic device and storage medium Active CN113192152B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110566416.XA CN113192152B (en) 2021-05-24 2021-05-24 Audio-based image generation method, electronic device and storage medium


Publications (2)

Publication Number Publication Date
CN113192152A true CN113192152A (en) 2021-07-30
CN113192152B CN113192152B (en) 2024-07-19

Family

ID=76984881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110566416.XA Active CN113192152B (en) 2021-05-24 2021-05-24 Audio-based image generation method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113192152B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018024179A1 (en) * 2016-08-04 2018-02-08 腾讯科技(深圳)有限公司 Video processing method, server, terminal, and computer storage medium
US20200134921A1 (en) * 2017-01-27 2020-04-30 Sony Corporation Information processing apparatus, and information processing method and program therefor
CN112150580A (en) * 2019-06-28 2020-12-29 腾讯科技(深圳)有限公司 Image processing method and device, intelligent terminal and storage medium
WO2021073368A1 (en) * 2019-10-14 2021-04-22 北京字节跳动网络技术有限公司 Video file generating method and device, terminal, and storage medium
CN110753238A (en) * 2019-10-29 2020-02-04 北京字节跳动网络技术有限公司 Video processing method, device, terminal and storage medium
CN111249727A (en) * 2020-01-20 2020-06-09 网易(杭州)网络有限公司 Game special effect generation method and device, storage medium and electronic equipment
CN111540032A (en) * 2020-05-27 2020-08-14 网易(杭州)网络有限公司 Audio-based model control method, device, medium and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIN Siyu; QIN Jingyan: "Research on Intelligent Design of Music Visualization Based on Computer Image Style Transfer", Packaging Engineering, no. 16
CHEN Changfeng: "Research on Music Emotion Classification Based on CNN-LSTM Networks", China Master's Theses Full-text Database, Philosophy and Humanities, vol. 2021, no. 4
BAO Jingyi: "Research on a Binary Image Embedding Algorithm Based on Audio Watermarking", Journal of Changzhou Institute of Technology, vol. 23, no. 1

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113709578A (en) * 2021-09-14 2021-11-26 上海幻电信息科技有限公司 Bullet screen display method and device
CN113709578B (en) * 2021-09-14 2023-08-11 上海幻电信息科技有限公司 Bullet screen display method, bullet screen display device, bullet screen display equipment and bullet screen display medium
CN113885829A (en) * 2021-10-25 2022-01-04 北京字跳网络技术有限公司 Sound effect display method and terminal equipment
WO2023071596A1 (en) * 2021-10-25 2023-05-04 北京字跳网络技术有限公司 Sound effect display method and terminal device
CN113885829B (en) * 2021-10-25 2023-10-31 北京字跳网络技术有限公司 Sound effect display method and terminal equipment
CN113784196A (en) * 2021-11-11 2021-12-10 深圳市速点网络科技有限公司 Automatic rhythm display method and system for video effect elements
CN114363698A (en) * 2022-01-14 2022-04-15 北京华亿创新信息技术股份有限公司 Sports event admission ceremony type sound and picture generation method, device, equipment and storage medium
CN114630140A (en) * 2022-03-17 2022-06-14 阿里巴巴(中国)有限公司 Information setting method and device based on audio data
CN114664331A (en) * 2022-03-29 2022-06-24 深圳万兴软件有限公司 Variable-speed special effect rendering method and system with adjustable period and related components thereof
CN114664331B (en) * 2022-03-29 2023-08-11 深圳万兴软件有限公司 Period-adjustable variable speed special effect rendering method, system and related components thereof
CN116824018A (en) * 2023-06-14 2023-09-29 粒界(上海)信息科技有限公司 Rendering abnormality detection method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113192152B (en) 2024-07-19

Similar Documents

Publication Publication Date Title
CN113192152B (en) Audio-based image generation method, electronic device and storage medium
US11392642B2 (en) Image processing method, storage medium, and computer device
US7453035B1 (en) Methods and systems for providing musical interfaces
JP7368589B2 (en) Video processing methods, devices, electronic devices and storage media
US9666208B1 (en) Hybrid audio representations for editing audio content
US7608775B1 (en) Methods and systems for providing musical interfaces
CN104091607B (en) Video editing method and device based on IOS equipment
US11514923B2 (en) Method and device for processing music file, terminal and storage medium
CN111508508A (en) Super-resolution audio generation method and equipment
US10460763B2 (en) Generating audio loops from an audio track
CN110070884B (en) Audio starting point detection method and device
WO2021147157A1 (en) Game special effect generation method and apparatus, and storage medium and electronic device
US10343072B2 (en) Apparatus and method of producing rhythm game, and non-transitory computer readable medium
US20200410967A1 (en) Method for displaying triggered by audio, computer apparatus and storage medium
CN111667803B (en) Audio processing method and related products
US20070256548A1 (en) Music Information Calculation Apparatus and Music Reproduction Apparatus
US8681157B2 (en) Information processing apparatus, program, and information processing method
CN110070885B (en) Audio starting point detection method and device
CN110085214B (en) Audio starting point detection method and device
CN109003633B (en) Audio processing method and device and electronic equipment
CN114979798B (en) Playing speed control method and electronic equipment
CN114546324B (en) Audio processing method and device
CN115083432A (en) Audio visualization method and device, electronic equipment and storage medium
EP4089671A1 (en) Audio information processing method and apparatus, electronic device, and storage medium
CN109495786B (en) Pre-configuration method and device of video processing parameter information and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant