CN112714355A - Audio visualization method and device, projection equipment and storage medium

Audio visualization method and device, projection equipment and storage medium

Info

Publication number
CN112714355A
CN112714355A (application CN202110330592.3A)
Authority
CN
China
Prior art keywords
audio signal
special effect
visualization
generation model
effect generation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110330592.3A
Other languages
Chinese (zh)
Other versions
CN112714355B (en)
Inventor
李禹�
曹琦
何维
张聪
王骁逸
胡震宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huole Science and Technology Development Co Ltd
Original Assignee
Shenzhen Huole Science and Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huole Science and Technology Development Co Ltd filed Critical Shenzhen Huole Science and Technology Development Co Ltd
Priority to CN202110330592.3A priority Critical patent/CN112714355B/en
Publication of CN112714355A publication Critical patent/CN112714355A/en
Application granted granted Critical
Publication of CN112714355B publication Critical patent/CN112714355B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H04N 21/4398 — Processing of audio elementary streams involving reformatting operations of audio signals (under H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]; 21/40 Client devices, e.g. set-top-box [STB]; 21/43 Processing of content or additional data; 21/439 Processing of audio elementary streams)
    • H04N 21/440236 — Processing of video elementary streams involving reformatting by media transcoding, e.g. video transformed into a slideshow of still pictures, audio converted into text (under H04N 21/44 Processing of video elementary streams; 21/4402 Reformatting operations of video signals for household redistribution, storage or real-time display)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure provides an audio visualization method and apparatus, a projection device, and a storage medium. The audio visualization method comprises: when an environment audio signal input by a microphone device is acquired, performing character recognition on the environment audio signal; when character information in the environment audio signal is identified, performing visualization processing on the character information by using a first special effect generation model to obtain a visual video; and displaying the visual video through the projection device. After the environment audio signal transmitted by the microphone device is obtained, character recognition is performed on it to extract character information, and the character information is then visualized and projected using a preset special effect generation model. The visualization special effect can therefore change in real time as the environment of the projection device changes, improving the audio visualization flexibility of the projection device.

Description

Audio visualization method and device, projection equipment and storage medium
Technical Field
The present disclosure relates to the field of audio visualization, and in particular, to an audio visualization method and apparatus, a projection device, and a storage medium.
Background
With the popularization of the internet and the rapid spread of digital audio, users can play various audio data (such as audio novels, songs, etc.) through electronic devices such as mobile phones and tablet computers. To enrich the information displayed while audio data is playing, the related art extracts features from the audio data and represents them visually through image rendering, so that the picture changes as the audio data changes; that is, an image language is used to interpret the music experience. However, existing projection equipment can only perform audio visualization on audio data in formats prepared by electronic devices. When a projection device acquires, in real time from a microphone device, audio data of the environment where it is located, it cannot perform audio visualization on that current environment audio data, so the audio visualization of the projection device is not flexible enough.
That is, the audio visualization of the projection apparatus in the prior art is not flexible enough.
Disclosure of Invention
The present disclosure aims to provide an audio visualization method, an audio visualization apparatus, a projection device, and a storage medium, and aims to solve the problem that the audio visualization of the projection device in the prior art is not flexible enough.
In one aspect, the present disclosure provides an audio visualization method applied to a projection device, where the projection device is provided with a microphone interface, the microphone interface is used to connect to a microphone device, the microphone device is used to collect an environmental audio signal of an environment where the projection device is located, and the audio visualization method includes:
when an environment audio signal input by the microphone device is acquired, performing character recognition on the environment audio signal;
when character information in the environment audio signal is identified, carrying out visualization processing on the character information by using a first special effect generation model to obtain a visual video;
and displaying the visual video through the projection equipment.
Optionally, when the text information in the environmental audio signal is identified, performing visualization processing on the text information by using a first special effect generation model to obtain a visualized video, where the visualization processing includes:
when character information in the environment audio signal is identified, adjusting picture change parameters of the first special effect generation model based on frequency spectrum information of the environment audio signal to obtain a second special effect generation model;
and carrying out visualization processing on the character information by using the second special effect generation model to obtain a visual video.
Optionally, when the text information in the environmental audio signal is identified, adjusting a picture change parameter of the first special effect generation model based on the frequency spectrum information of the environmental audio signal to obtain a second special effect generation model, where the method includes:
when character information in the environment audio signal is identified, carrying out voiceprint identification on the environment audio signal to obtain a speaking user identifier corresponding to the environment audio signal;
determining the first special effect generation model from a plurality of preset special effect generation models based on the speaking user identification;
and adjusting the picture change parameters of the first special effect generation model based on the frequency spectrum information of the environment audio signal to obtain the second special effect generation model.
Optionally, before the performing character recognition on the environment audio signal when the environment audio signal input by the microphone device is acquired, the method includes:
acquiring an input audio signal;
determining whether a source of the input audio signal is the microphone apparatus;
determining the input audio signal as the ambient audio signal if the source of the input audio signal is the microphone apparatus.
Optionally, the determining whether the source of the input audio signal is the microphone apparatus comprises:
detecting whether the microphone interface is triggered;
determining that the source of the input audio signal is the microphone device if the microphone interface is triggered.
Optionally, the method of audio visualization further comprises:
determining the input audio signal as a system audio signal if the source of the input audio signal is not the microphone apparatus;
acquiring lyric information in the system audio signal;
and carrying out visualization processing on the lyric information by using the first special effect generation model to obtain a visual video.
Optionally, when the environmental audio signal input by the microphone device is acquired, performing text recognition on the environmental audio signal includes:
when the environment audio signal input by the microphone device is acquired, acquiring the sound intensity of the environment audio signal;
judging whether the sound intensity of the environment audio signal is greater than a preset decibel or not;
and if the sound intensity of the environment audio signal is greater than a preset decibel, performing character recognition on the environment audio signal.
In one aspect, the present disclosure provides an apparatus for audio visualization, the apparatus for audio visualization comprising:
the character recognition unit is used for performing character recognition on the environment audio signal when the environment audio signal input by the microphone device is acquired;
the visualization processing unit is used for carrying out visualization processing on the character information by utilizing a first special effect generation model when the character information in the environment audio signal is identified to obtain a visualization video;
and the projection unit is used for displaying the visual video through the projection equipment.
Optionally, the visualization processing unit is configured to, when character information in the environmental audio signal is identified, adjust a picture change parameter of the first special effect generation model based on frequency spectrum information of the environmental audio signal to obtain a second special effect generation model;
and carrying out visualization processing on the character information by using the second special effect generation model to obtain a visual video.
Optionally, the visualization processing unit is configured to, when character information in the environment audio signal is identified, perform voiceprint identification on the environment audio signal to obtain a speaking user identifier corresponding to the environment audio signal;
determining the first special effect generation model from a plurality of preset special effect generation models based on the speaking user identification;
and adjusting the picture change parameters of the first special effect generation model based on the frequency spectrum information of the environment audio signal to obtain the second special effect generation model.
Optionally, the audio visualization apparatus includes an obtaining unit, configured to obtain an input audio signal;
determining whether a source of the input audio signal is the microphone apparatus;
determining the input audio signal as the ambient audio signal if the source of the input audio signal is the microphone apparatus.
Optionally, the obtaining unit is configured to detect whether the microphone interface is triggered;
determining that the source of the input audio signal is the microphone device if the microphone interface is triggered.
Optionally, the obtaining unit is configured to determine the input audio signal as a system audio signal if the source of the input audio signal is not the microphone device;
acquiring lyric information in the system audio signal;
and carrying out visualization processing on the lyric information by using the first special effect generation model to obtain a visual video.
Optionally, the text recognition unit is configured to, when an environmental audio signal input by the microphone device is acquired, acquire a sound intensity of the environmental audio signal;
judging whether the sound intensity of the environment audio signal is greater than a preset decibel or not;
and if the sound intensity of the environment audio signal is greater than a preset decibel, performing character recognition on the environment audio signal.
In one aspect, the present disclosure also provides a projection apparatus, including:
one or more processors;
a memory; and
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to implement the method of audio visualization of any one of the first aspect.
In one aspect, the present disclosure also provides a computer-readable storage medium having stored thereon a computer program, which is loaded by a processor to perform the steps in the method for audio visualization according to any one of the first aspect.
The present disclosure provides an audio visualization method that performs character recognition on an environment audio signal input by a microphone device when the environment audio signal is acquired; when character information in the environment audio signal is identified, performs visualization processing on the character information by using a first special effect generation model to obtain a visual video; and displays the visual video through the projection device. After the environment audio signal transmitted by the microphone device is obtained, character recognition is performed on it to extract character information, and the character information is then visualized and projected using the preset special effect generation model, so the visualization special effect can change in real time as the environment of the projection device changes, improving the audio visualization flexibility of the projection device.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a scene schematic diagram of a system for audio visualization provided by an embodiment of the present disclosure;
FIG. 2 is a flow diagram of one embodiment of a method of audio visualization provided by an embodiment of the present disclosure;
FIG. 3 is a schematic interface diagram of a visual video in an embodiment of a method of audio visualization provided by an embodiment of the present disclosure;
fig. 4 is a flow chart of another embodiment of a method of audio visualization provided by an embodiment of the present disclosure;
fig. 5 is a flowchart of yet another embodiment of a method of audio visualization provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an embodiment of an apparatus for audio visualization provided in an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an embodiment of a projection device provided in an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
In the description of the present disclosure, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of describing and simplifying the description; they do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and thus should not be construed as limiting the present disclosure. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present disclosure, "a plurality" means two or more unless specifically limited otherwise.
In the present disclosure, the word "exemplary" is used to mean "serving as an example, instance, or illustration". Any embodiment described in this disclosure as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the disclosure. In the following description, details are set forth for the purpose of explanation. It will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these specific details. In other instances, well-known structures and processes are not set forth in detail in order to avoid obscuring the description of the present disclosure with unnecessary detail. Thus, the present disclosure is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
It should be noted that, since the method in the embodiments of the present disclosure is executed in the projection device, the processing objects of the projection device exist in the form of data or information; for example, time is essentially time information. It can be understood that in the subsequent embodiments, where sizes, numbers, positions, and the like are mentioned, corresponding data exist for the projection device to process; details are not repeated herein.
The embodiments of the present disclosure provide an audio visualization method and apparatus, a projection device, and a storage medium, which are described in detail below.
Referring to fig. 1, fig. 1 is a schematic view of a scene of an audio visualization system provided in an embodiment of the present disclosure, where the audio visualization system may include a projection apparatus 100, and an audio visualization device is integrated in the projection apparatus 100.
In the embodiment of the present disclosure, the projection device 100 may be an independent server, or may be a server network or a server cluster composed of servers, for example, the projection device 100 described in the embodiment of the present disclosure includes, but is not limited to, a computer, a network host, a single network server, multiple network server sets, or a cloud server composed of multiple servers. Among them, the Cloud server is constituted by a large number of computers or web servers based on Cloud Computing (Cloud Computing).
Those skilled in the art will appreciate that the application environment shown in fig. 1 is only one application scenario of the present disclosure and does not limit its application scenarios. Other application environments may include more or fewer projection devices than shown in fig. 1; for example, only one projection device is shown in fig. 1. It is understood that the system for audio visualization may further include one or more other servers, which is not limited herein.
In addition, as shown in fig. 1, the system for audio visualization may further include a memory 200 for storing data.
It should be noted that the scene schematic diagram of the audio visualization system shown in fig. 1 is merely an example, and the audio visualization system and the scene described in the embodiment of the present disclosure are for more clearly illustrating the technical solution of the embodiment of the present disclosure, and do not form a limitation on the technical solution provided in the embodiment of the present disclosure.
First, an embodiment of the present disclosure provides an audio visualization method whose execution subject is an audio visualization apparatus. The audio visualization apparatus is applied to a projection device provided with a microphone interface; the microphone interface is used to connect a microphone device, and the microphone device is used to collect an environment audio signal of the environment where the projection device is located. The audio visualization method includes:
when an environment audio signal input by microphone equipment is acquired, character recognition is carried out on the environment audio signal;
when character information in the environment audio signal is identified, carrying out visualization processing on the character information by using a first special effect generation model to obtain a visual video;
and displaying the visual video through the projection device.
Referring to fig. 2, fig. 2 is a flowchart illustrating an embodiment of a method for audio visualization according to an embodiment of the present disclosure. As shown in fig. 2, the method of audio visualization includes:
201. and when the environment audio signal input by the microphone equipment is acquired, performing character recognition on the environment audio signal.
An audio signal is a carrier of information about the frequency and amplitude variations of regular sound waves such as speech, music, and sound effects. According to the characteristics of sound waves, audio information can be classified into regular audio and irregular sound, and regular audio can further be divided into speech, music, and sound effects. Regular audio is a continuously varying analog signal that can be represented by a continuous curve called a sound wave. The three elements of sound are pitch, intensity, and timbre. A sound wave or sine wave has three important parameters: frequency, amplitude, and phase, which determine the characteristics of the audio signal.
In the embodiment of the present disclosure, the projection device is provided with a microphone interface, the microphone interface is used for connecting the microphone device, and the microphone device is used for acquiring an environment audio signal of the environment where the projection device is located. A microphone (formally, a sound transducer; colloquially, a mic) is an energy conversion device that converts a sound signal into an electrical signal. Types include moving-coil, condenser, electret, and the recently emerging silicon micro-microphones, as well as liquid microphones and laser microphones. Most microphones are electret condenser microphones, which operate on the principle of a diaphragm of polymer material carrying a permanent, isolated charge. The type of microphone device may be chosen according to the particular situation.
Alternatively, the microphone device may be connected to the microphone interface by a wireless connection, a wired connection, or the like, thereby establishing a connection with the projection device. The microphone device may capture an ambient audio signal of the projection device. For example, when a user speaks, plays music, sings a song, talks, recites, etc. in a room, a microphone device in the room collects sounds in the room to obtain an ambient audio signal, and the ambient audio signal is input to a projection device through a microphone interface, and the projection device can obtain the ambient audio signal input by the microphone device.
In the embodiment of the disclosure, when the environmental audio signal input by the microphone device is acquired, the environmental audio signal is subjected to character recognition.
Specifically, the projection device is provided with an automatic speech recognition module, which can be used to perform character recognition on the environment audio signal. Automatic Speech Recognition (ASR) aims to convert the lexical content of human speech into computer-readable input, such as keystrokes, binary codes, or character sequences. Speech recognition techniques fall mainly into three categories: the first is model matching methods, including Vector Quantization (VQ) and Dynamic Time Warping (DTW); the second is probabilistic methods, including Gaussian Mixture Models (GMM) and Hidden Markov Models (HMM); and the third is discriminative classification methods, such as Support Vector Machines (SVM), Artificial Neural Networks (ANN), and Deep Neural Networks (DNN), as well as combinations of these, selected according to the specific situation.
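A minimal sketch of this character-recognition step in Python, using the open-source whisper package as a stand-in ASR engine; the patent does not prescribe a particular recognizer, so the model choice and file path here are illustrative assumptions.

```python
# pip install openai-whisper  (one possible ASR engine, not mandated by the patent)
import whisper

def recognize_text(wav_path: str) -> str:
    """Transcribe an ambient audio clip into character (text) information."""
    model = whisper.load_model("base")      # small general-purpose ASR model
    result = model.transcribe(wav_path)     # returns a dict with a "text" key
    return result["text"].strip()

text_info = recognize_text("ambient_clip.wav")  # illustrative path
if text_info:
    print("recognized character information:", text_info)
```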
202. When character information in the environment audio signal is identified, the character information is visualized by using the first special effect generation model, and a visualized video is obtained.
In the embodiment of the disclosure, when the text information in the environmental audio signal is identified, the text information is visualized by using the first special effect generation model, so as to obtain a visualized video. The text information may be lyrics, speech content, etc.
The pictures produced by different first special effect generation models differ. Special effect generation models that simulate abstract effects such as fountains, flames, and storms are common in audio visualization. The first special effect generation model mainly comprises a composition picture and picture change parameters. The picture elements in the composition picture can be points, lines, graphics, curves, or other entities of set shapes, or a combination of any two or more of fountains, flames, storms, curves, points, and lines, for example triangles, histograms, or waveform diagrams. The picture change parameters are used to control how the composition picture changes, thereby generating dynamic histograms, oscillograms, and the like; they may include the change period, and the change frequency and change amplitude of the picture elements within each change cycle. Of course, an existing picture generation model may be selected, or a picture generation model may be designed from scratch.
The first special effect generation model controls the picture elements in the composition picture according to the picture change parameters to generate a picture. The first special effect generation model may be the commonly used cycle model, in which the motion of each target individual (particle) is continuous and a sudden displacement occurs at each strong beat point, so that the whole picture has a "dance" effect that emphasizes the beat of the audio well. The core of the cycle model is a curve obtained from the coordinates of 6 reference points according to a Bezier function: N picture elements, centered on the fitted curve, spiral upward from the bottom of the curve to its apex, with the rotation radius growing from small (bottom) to large (top) while the linear speed of rotation remains constant. The first special effect generation model includes at least the change amplitude of the curve and the duration of one change cycle, and may also include switching the curve to a heart shape at some time point, abruptly changing the displayed shape at some time point, and the like.
The picture change parameters and the composition picture of the first special effect generation model are set by default. After the character information is input into the first special effect generation model, the model determines each character in the character information as a picture element and controls each character to move according to the picture change parameters, obtaining a special effect for the character information and forming a visual video.
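As an illustration of the cycle model just described, the following sketch fits a degree-5 Bezier curve through 6 reference points and places N particles spiraling around it, with the rotation radius growing from bottom to apex and a constant linear speed. It is a minimal sketch under assumed parameter names (`linear_speed`, the radius range); the patent does not specify an implementation.

```python
import numpy as np
from math import comb

def bezier_curve(ctrl_pts, steps=200):
    """Degree-5 Bezier center line fitted through 6 reference points."""
    ctrl = np.asarray(ctrl_pts, dtype=float)          # shape (6, 3)
    n = len(ctrl) - 1
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return sum(comb(n, k) * (1 - t) ** (n - k) * t ** k * ctrl[k]
               for k in range(n + 1))                 # (steps, 3) curve points

def particle_positions(curve, n_particles=64, time_s=0.0,
                       linear_speed=1.0, r_min=0.05, r_max=0.5):
    """Particles spiral around the curve from bottom to apex."""
    idx = np.linspace(0, len(curve) - 1, n_particles).astype(int)
    frac = idx / (len(curve) - 1)                     # 0 = bottom, 1 = apex
    radius = r_min + (r_max - r_min) * frac           # small -> large radius
    # constant linear speed: angular speed shrinks as the radius grows
    theta = linear_speed * time_s / np.maximum(radius, 1e-6)
    offset = np.stack([radius * np.cos(theta),
                       radius * np.sin(theta),
                       np.zeros_like(radius)], axis=1)
    return curve[idx] + offset                        # one animation frame
```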
Referring to fig. 3, fig. 3 is a schematic interface diagram of a visualized video in an embodiment of the method for audio visualization provided by the embodiment of the present disclosure.
As shown in fig. 3, suppose the character information is the phrase "all the people are happy", which the model splits into its individual characters. The first special effect generation model performs visualization processing on the character information to generate the histogram shown in fig. 3, controlling the change frequency and change amplitude of each character according to the picture change parameters so that each character jumps, forming a dynamic special effect.
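A minimal sketch of this character animation, assuming the picture change parameters reduce to one change frequency and one change amplitude per character; the staggered phases are an illustrative choice, not taken from the patent.

```python
import numpy as np

def character_frame(chars, t, change_freq=2.0, change_amplitude=1.0):
    """Jump height of each character (picture element) at animation time t."""
    phases = np.arange(len(chars)) * 0.5            # stagger the characters
    heights = change_amplitude * np.abs(
        np.sin(2 * np.pi * change_freq * t + phases))
    return list(zip(chars, heights))                # (character, bar height)

# e.g. render one frame of the dynamic histogram of fig. 3
print(character_frame(list("happy"), t=0.1))
```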
203. And displaying the visual video through the projection device.
Referring to fig. 4, fig. 4 is a flowchart illustrating a method for audio visualization according to another embodiment of the present disclosure. As shown in fig. 4, the method of audio visualization includes:
301. an input audio signal is acquired.
In the embodiment of the present disclosure, the input audio signal may be a system audio signal input by a media player, or an environment audio signal input by the microphone device. A media player generally refers to playback software in a computer that integrates decoders to provide multimedia playback, such as Windows Media Player. The system audio signal input by the media player is a processed audio file that carries lyric information. The environment audio signal input directly through the microphone device carries no text information and requires character recognition. Therefore, audio signals from different sources require different processing.
302. It is determined whether the source of the input audio signal is a microphone device.
In a particular embodiment, determining whether the source of the input audio signal is a microphone device comprises: detecting whether a microphone interface is triggered; if the microphone interface is triggered, the source of the input audio signal is determined to be the microphone device. If the microphone interface is not triggered, the source of the input audio signal is determined to be the media player.
Specifically, the state of the microphone interface is detected, and whether the microphone interface is triggered is detected. For example, when a user plugs a plug of a microphone device into a microphone interface on the projection device, the microphone interface is triggered, and the projection device detects that the microphone interface is triggered.
Further, if the source of the input audio signal is the microphone device, indicating that the input audio signal is an environment audio signal that requires the character recognition step, step 303 is executed; if the source of the input audio signal is not the microphone device, indicating that the input audio signal is a system audio signal and character recognition is not needed, step 307 is executed.
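The routing logic of steps 301-302 might look like the following sketch; `microphone_interface_triggered()` is a hypothetical probe of the physical jack state, since the patent does not name a concrete API.

```python
def route_input_audio(projector) -> str:
    """Decide the processing path for an input audio signal (steps 302/303/307)."""
    # hypothetical probe: True when a microphone plug sits in the interface
    if projector.microphone_interface_triggered():
        return "ambient"   # environment audio -> character recognition path
    return "system"        # media-player audio -> lyric path
```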
303. An input audio signal is determined as an ambient audio signal.
And determining the input audio signal as an environment audio signal, and determining to acquire the environment audio signal input by the microphone device.
304. And performing character recognition on the environment audio signal.
In the embodiment of the disclosure, when the environmental audio signal input by the microphone device is acquired, the environmental audio signal is subjected to character recognition. Specifically, the projection device is provided with an automatic voice recognition module, and the automatic voice recognition module can be used for carrying out character recognition on the environment audio signal.
Optionally, when performing character recognition on the environment audio signal, the recognition time is timed, and it is determined whether character information in the environment audio signal is recognized within a preset time. If character information in the environment audio signal is recognized within the preset time, step 305 is executed; if not, a visual video is generated using the first special effect generation model.
Optionally, when the environment audio signal input by the microphone device is acquired, the sound intensity of the environment audio signal is acquired, and it is judged whether the sound intensity is greater than a preset decibel value; if so, character recognition is performed on the environment audio signal. The preset decibel value may be 20 decibels, 30 decibels, etc. When the environment audio signal is too weak, the user is probably not speaking or singing toward the microphone device, so the signal is not treated as user input; audio visualization is skipped in this case, which reduces the power consumption of the projection device.
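A minimal sketch of this decibel gate, assuming samples normalized to [-1, 1] and an uncalibrated offset mapping the digital level to a rough sound-pressure figure; a real device would need per-microphone calibration.

```python
import numpy as np

PRESET_DB = 30.0  # preset decibel threshold (20 dB, 30 dB, etc.)

def loud_enough(samples: np.ndarray, calibration_offset_db: float = 90.0) -> bool:
    """Gate character recognition on the sound intensity of the signal."""
    rms = np.sqrt(np.mean(np.square(samples)))          # samples in [-1, 1]
    # uncalibrated conversion from digital RMS to an SPL-like value (assumption)
    level_db = 20.0 * np.log10(max(float(rms), 1e-12)) + calibration_offset_db
    return level_db > PRESET_DB
```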
305. And adjusting the picture change parameters of the first special effect generation model based on the frequency spectrum information of the environment audio signal to obtain a second special effect generation model.
The frequency spectrum, short for spectral density, is a distribution curve over frequency. A complex oscillation can be decomposed into harmonic oscillations of different amplitudes and frequencies, and the pattern of the amplitudes of these harmonic oscillations arranged by frequency is called the frequency spectrum. The frequency spectrum is widely used in acoustics, optics, and radio technology; it moves the study of a signal from the time domain to the frequency domain, giving a more intuitive understanding. The frequency spectrum, also called the vibration spectrum, reflects the most basic physical quantity of a vibration phenomenon, namely frequency. A simple periodic vibration has only one frequency, while a complex motion cannot be described by a single frequency, nor can its characteristics be quantified from the vibration pattern alone, so a frequency spectrum is used to describe complex vibration. Any complex vibration can be decomposed into the sum of many simple harmonic vibrations of different amplitudes and frequencies; an image in which the amplitudes of these partial vibrations are arranged by frequency is the spectrum of the complex vibration. In the vibration spectrum, the abscissa represents the circular frequency of a partial vibration and the ordinate its amplitude. For a periodic complex vibration of frequency f, according to the Fourier theorem the frequencies of the decomposed simple harmonic vibrations are integer multiples of f (f, 2f, 3f, 4f, ...), so the vibration spectrum is a discrete line spectrum, each line being called a spectral line. A non-periodic vibration (such as a damped vibration or a short impulse) can, according to the Fourier integral, be decomposed into the sum of infinitely many simple harmonic vibrations with continuously distributed frequencies; as the spectral lines become infinitely dense, their tops merge into a continuous curve, the so-called continuous spectrum, which is the envelope of the spectral lines. It is also possible to decompose such a vibration into many simple harmonic vibrations of incommensurable frequencies, forming a discrete spectrum.
Specifically, a Fourier transform is performed on the environment audio signal to obtain the frequency spectrum information of the environment audio signal. The Fourier transform means that a function satisfying certain conditions can be represented as a linear combination of trigonometric functions (sine and/or cosine functions) or of their integrals. In different fields of research, the Fourier transform has many variant forms, such as the continuous Fourier transform and the discrete Fourier transform.
Preferably, a fast Fourier transform is performed on the environment audio signal to obtain its frequency spectrum information. The Fast Fourier Transform (FFT) is an efficient algorithm for the Discrete Fourier Transform (DFT), a general term for efficient, fast methods of computing the discrete Fourier transform on a computer. It was proposed in 1965 by J.W. Cooley and J.W. Tukey. This algorithm greatly reduces the number of multiplications a computer needs to compute the discrete Fourier transform; the more sampling points are transformed, the more significant the savings in computation.
Wherein, the picture change parameters and the composition pictures of the first special effect generation model are set by default. The frequency spectrum information of the environment audio signal is input into the first special effect generation model, and the picture change parameters of the first special effect generation model can be changed, so that the second special effect generation model is obtained. The second special effect generation model is associated with the frequency spectrum information of the environment audio signal, so that the second special effect generation model can generate a special effect matched with the background music after visualizing the character information.
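The adjustment of the picture change parameters from the FFT might be sketched as below; the 250 Hz / 4 kHz band split and the scaling rule are illustrative assumptions, since the patent only states that the spectrum information changes the parameters.

```python
import numpy as np

def spectrum_to_params(samples, sample_rate, base_params):
    """Derive second-model picture change parameters from the FFT spectrum."""
    windowed = samples * np.hanning(len(samples))
    spec = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    low = spec[freqs < 250.0].mean()                    # bass energy
    high = spec[(freqs >= 250.0) & (freqs < 4000.0)].mean()
    total = low + high + 1e-9
    params = dict(base_params)                          # first model's defaults
    params["change_amplitude"] *= 1.0 + low / total     # bass drives amplitude
    params["change_frequency"] *= 1.0 + high / total    # mids drive rate
    return params
```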
Optionally, before adjusting a picture change parameter of the first special effect generation model based on the spectral information of the environmental audio signal to obtain the second special effect generation model, the method may include:
(1) and when character information in the environment audio signal is identified, carrying out voiceprint identification on the environment audio signal to obtain a speaking user identifier corresponding to the environment audio signal.
Voiceprint Recognition (VPR), also known as Speaker Recognition, has two categories: Speaker Identification and Speaker Verification. The former judges which of several people spoke a given segment of speech, a "one-out-of-many" problem; the latter confirms whether a given segment of speech was spoken by a specified person, a "one-to-one" decision problem.
Specifically, a user identifier of the projection device is acquired; the user identifier may be a name, a contact phone number, and the like, and corresponds to the account-registered user who registered an account on the projection device. The audio information of the user identifier of the projection device is matched against the environment audio signal. If they match, it is determined that the speaker corresponding to that audio information is the same as the speaker in the environment audio signal, and the user identifier of the projection device is determined as the speaking user identifier corresponding to the environment audio signal. If they do not match, a pre-stored user identifier set is acquired, and the environment audio signal is matched against the multiple user identifiers in the set to obtain the speaking user identifier corresponding to the environment audio signal. The user identifier set contains the identifiers of several associated users previously entered on the projection device; for example, the account-registered user enters the user identifiers and audio information of friends or family in advance. Voiceprint recognition is performed preferentially on the account-registered user, and only when the environment audio signal is determined not to come from the account-registered user is it checked against the associated users; this avoids the overly long recognition time of unordered voiceprint matching.
(2) A first special effect generation model is determined from a plurality of preset special effect generation models based on the speaking user identification.
Specifically, a plurality of speaking user identifiers and their corresponding special effect generation models are stored in the projection device in advance. For example, the special effect generation model of the account-registered user is model A, which mainly generates a histogram; that of the account-registered user's son is model B, which mainly generates a volcano effect picture; and that of the account-registered user's daughter is model C, which mainly generates a fireworks effect picture. The first special effect generation model is determined from the preset correspondence between user identifiers and special effect generation models and the speaking user identifier corresponding to the environment audio signal. For example, if the speaking user identifier is recognized as that of the account-registered user's son, special effect generation model B is determined as the first special effect generation model (see the illustrative sketch following step (3) below).
(3) And adjusting the picture change parameters of the first special effect generation model based on the frequency spectrum information of the environment audio signal to obtain a second special effect generation model.
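The prioritized voiceprint matching and model lookup of steps (1)-(2) could be sketched as follows. The cosine-similarity comparison of speaker embeddings and the user/model names are illustrative assumptions; extracting the embeddings (e.g. with a speaker-verification network) is outside the sketch.

```python
import numpy as np

def identify_speaker(ambient_emb, registered_user, associated_users,
                     threshold=0.75):
    """Match the account-registered user first, then the associated users."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    if cosine(ambient_emb, registered_user["embedding"]) >= threshold:
        return registered_user["user_id"]
    for user in associated_users:            # pre-entered friends and family
        if cosine(ambient_emb, user["embedding"]) >= threshold:
            return user["user_id"]
    return None                              # unknown speaker

# pre-stored correspondence between speaking users and effect models
MODEL_BY_USER = {"owner": "model_A_histogram",
                 "son": "model_B_volcano",
                 "daughter": "model_C_fireworks"}

def select_first_model(user_id, default="model_A_histogram"):
    return MODEL_BY_USER.get(user_id, default)
```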
Further, when the special effect generation model manually input by the user is obtained, the special effect generation model manually input by the user is determined as the first special effect generation model. That is, the user can adjust the special effect generation model at any time, thereby changing the special effect in real time.
306. And performing visualization processing on the character information by using the second special effect generation model to obtain a visual video.
307. The input audio signal is determined as a system audio signal.
308. And acquiring lyric information in the system audio signal.
Specifically, each character in the lyric information is acquired.
309. And carrying out visualization processing on the lyric information by using the first special effect generation model to obtain a visual video.
Wherein, the picture change parameters and the composition pictures of the first special effect generation model are set by default. After the lyric information is input into the first special effect generation model, each character in the lyric information is determined as a picture element by the first special effect generation model, and each character in the lyric information is controlled to move according to the picture change parameters to obtain the special effect of the lyric information, so that a visual video is formed.
310. And displaying the visual video through the projection device.
Further, referring to fig. 5, fig. 5 is a flowchart illustrating a method for audio visualization according to another embodiment of the present disclosure. As shown in fig. 5, the method of audio visualization includes:
401. an input audio signal is acquired.
In the embodiment of the present disclosure, the input audio signal may be a system audio signal input by a media player, or an environment audio signal input by the microphone device. A media player generally refers to playback software in a computer that integrates decoders to provide multimedia playback, such as Windows Media Player. The system audio signal input by the media player is a processed audio file that carries lyric information. The environment audio signal input directly through the microphone device carries no text information and requires character recognition. Therefore, audio signals from different sources require different processing.
402. It is determined whether the source of the input audio signal is a microphone device.
In a particular embodiment, determining whether the source of the input audio signal is the microphone device comprises: detecting whether the microphone interface is triggered; if the microphone interface is triggered, determining that the source of the input audio signal is the microphone device; if the microphone interface is not triggered, determining that the source of the input audio signal is the media player.
Specifically, the state of the microphone interface is detected, and whether the microphone interface is triggered is detected. For example, when a user plugs a plug of a microphone device into a microphone interface on the projection device, the microphone interface is triggered, and the projection device detects that the microphone interface is triggered.
Further, if the source of the input audio signal is the microphone device, indicating that the input audio signal is an environment audio signal that requires the character recognition step, step 4031 is executed; if the source of the input audio signal is not the microphone device, indicating that the input audio signal is a system audio signal and character recognition is not needed, step 404 is executed.
4031. An input audio signal is determined as an ambient audio signal.
And determining the input audio signal as an environment audio signal, and determining to acquire the environment audio signal input by the microphone device.
4032. And performing character recognition on the environment audio signal.
In the embodiment of the disclosure, when the environmental audio signal input by the microphone device is acquired, the environmental audio signal is subjected to character recognition. Specifically, the projection device is provided with an automatic voice recognition module, and the automatic voice recognition module can be used for carrying out character recognition on the environment audio signal.
Optionally, when the environment audio signal input by the microphone device is acquired, the sound intensity of the environment audio signal is acquired, and it is judged whether the sound intensity is greater than a preset decibel value; if so, character recognition is performed on the environment audio signal. The preset decibel value may be 20 decibels, 30 decibels, etc. When the environment audio signal is too weak, the user is probably not speaking or singing toward the microphone device, so the signal is not treated as user input; audio visualization is skipped in this case, which reduces the power consumption of the projection device.
4033. It is determined whether textual information in the ambient audio signal is identified.
Specifically, when the text information in the environment audio signal is identified, 4034 is executed; when no text information in the environmental audio signal is identified, 4051 is performed.
Optionally, when performing character recognition on the environment audio signal, the recognition time is timed, and it is determined whether character information in the environment audio signal is recognized within a preset time. If character information is recognized within the preset time, step 4034 is executed; if not, step 4051 is executed.
4034. And adjusting the picture change parameters of the first special effect generation model based on the frequency spectrum information of the environment audio signal to obtain a second special effect generation model.
Specifically, a Fourier transform is performed on the environment audio signal to obtain the frequency spectrum information of the environment audio signal. The Fourier transform means that a function satisfying certain conditions can be represented as a linear combination of trigonometric functions (sine and/or cosine functions) or of their integrals. In different fields of research, the Fourier transform has many variant forms, such as the continuous Fourier transform and the discrete Fourier transform.
Preferably, a fast Fourier transform is performed on the environment audio signal to obtain its frequency spectrum information. The Fast Fourier Transform (FFT) is an efficient algorithm for the Discrete Fourier Transform (DFT), a general term for efficient, fast methods of computing the discrete Fourier transform on a computer. It was proposed in 1965 by J.W. Cooley and J.W. Tukey. This algorithm greatly reduces the number of multiplications a computer needs to compute the discrete Fourier transform; the more sampling points are transformed, the more significant the savings in computation.
Wherein, the picture change parameters and the composition pictures of the first special effect generation model are set by default. The frequency spectrum information of the environment audio signal is input into the first special effect generation model, and the picture change parameters of the first special effect generation model can be changed, so that the second special effect generation model is obtained. The second special effect generation model is associated with the frequency spectrum information of the environment audio signal, so that the second special effect generation model can generate a special effect matched with the background music after visualizing the character information.
Optionally, before adjusting a picture change parameter of the first special effect generation model based on the spectral information of the environmental audio signal to obtain the second special effect generation model, the method may include:
(1) and when character information in the environment audio signal is identified, carrying out voiceprint identification on the environment audio signal to obtain a speaking user identifier corresponding to the environment audio signal.
Specifically, a user identifier of the projection device is acquired; the user identifier may be a name, a contact phone number, and the like, and corresponds to the account-registered user who registered an account on the projection device. The audio information of the user identifier of the projection device is matched against the environment audio signal. If they match, it is determined that the speaker corresponding to that audio information is the same as the speaker in the environment audio signal, and the user identifier of the projection device is determined as the speaking user identifier corresponding to the environment audio signal. If they do not match, a pre-stored user identifier set is acquired, and the environment audio signal is matched against the multiple user identifiers in the set to obtain the speaking user identifier corresponding to the environment audio signal. The user identifier set contains the identifiers of several associated users previously entered on the projection device; for example, the account-registered user enters the user identifiers and audio information of friends or family in advance. Voiceprint recognition is performed preferentially on the account-registered user, and only when the environment audio signal is determined not to come from the account-registered user is it checked against the associated users; this avoids the overly long recognition time of unordered voiceprint matching.
(2) A first special effect generation model is determined from a plurality of preset special effect generation models based on the speaking user identification.
Specifically, a plurality of speaking user identifiers and their corresponding special effect generation models are stored in the projection device in advance. For example, the special effect generation model of the account-registered user is model A, which mainly generates a histogram; that of the account-registered user's son is model B, which mainly generates a volcano effect picture; and that of the account-registered user's daughter is model C, which mainly generates a fireworks effect picture. The first special effect generation model is determined from the preset correspondence between user identifiers and special effect generation models and the speaking user identifier corresponding to the environment audio signal. For example, if the speaking user identifier is recognized as that of the account-registered user's son, special effect generation model B is determined as the first special effect generation model.
(3) Adjust the picture change parameters of the first special effect generation model based on the spectrum information of the environmental audio signal to obtain a second special effect generation model.

Further, when a special effect generation model manually input by the user is obtained, that model is determined to be the first special effect generation model. In other words, the user can adjust the special effect generation model at any time, and thereby change the special effect in real time.
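One way the spectrum-driven adjustment of step (3) could look is sketched below. The parameter names (amplitude_scale, motion_rate_hz) are hypothetical; the patent only states that the picture change parameters are adjusted based on the spectrum information.

```python
# Hedged sketch: compute spectrum features of the environmental audio
# signal and expose them as picture change parameters. The returned keys
# are illustrative, not names used by the patent.
import numpy as np

def picture_change_parameters(samples: np.ndarray, sample_rate: int) -> dict:
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return {
        "amplitude_scale": float(np.mean(spectrum)),               # energy proxy
        "motion_rate_hz": float(freqs[int(np.argmax(spectrum))]),  # dominant tone
    }
```

Under this reading, the second special effect generation model can be thought of as the first model with these parameters applied frame by frame.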
4035. Perform visualization processing on the text information by using the second special effect generation model to obtain a visual video.

4041. Determine the input audio signal as a system audio signal.

4042. Judge whether lyric information exists in the system audio signal.

If lyric information exists in the system audio signal, execute 4043; if no lyric information exists in the system audio signal, execute 406.

4043. Acquire the lyric information in the system audio signal.

Specifically, each character in the lyric information is acquired.

4044. Perform visualization processing on the lyric information by using the second special effect generation model to obtain a visual video.

Specifically, the picture change parameters of the first special effect generation model are adjusted based on the spectrum information of the system audio signal to obtain the second special effect generation model, and the second special effect generation model is used to perform visualization processing on the lyric information to obtain a visual video. Of course, in other embodiments, the lyric information may instead be visualized with the first special effect generation model to obtain a visual video.
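An illustrative routing of this system-audio branch (steps 4041 through 4044, falling through to step 406 when no lyric information is found); the lyrics argument stands in for however the device extracts lyric information, which the patent does not specify:

```python
# Sketch of the system-audio branch: spectrum-driven parameters plus an
# optional per-character lyric overlay. Key names are illustrative.
from typing import Optional

import numpy as np

def visualize_system_audio(samples: np.ndarray,
                           lyrics: Optional[str] = None) -> dict:
    # Spectrum-driven picture change parameters (second model), as above.
    spectrum = np.abs(np.fft.rfft(samples))
    params = {"amplitude_scale": float(np.mean(spectrum))}
    if lyrics:  # steps 4043/4044: render each character of the lyrics
        return {"params": params, "characters": list(lyrics)}
    return {"params": params, "characters": None}  # step 406: no lyric overlay
```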
4051. Identify the sound type of the environmental audio signal.

Specifically, the sound type may be wind, rain, thunder, an animal cry, and the like. Further, animal cries cover various sounds such as a cat's meow or a dog's bark.

4052. Determine a special effect generation model matching the sound type.

Specifically, the correspondence between sound types and special effect generation models is prestored in the projection device. For example, when the sound type is a cat's meow, the matching special effect generation model generates a picture of a cat jumping; when the sound type is wind, it generates a windy picture; when the sound type is rain, it generates a rainy picture. The special effect generation model matching the sound type is determined from the sound type identified in the environmental audio signal.

4053. Generate a visual video based on the special effect generation model matching the sound type.

Further, the special effect generation model matching the sound type is determined as a third special effect generation model, which is the default setting. The picture change parameters of the third special effect generation model are adjusted based on the spectrum information of the environmental audio signal to obtain a fourth special effect generation model, and visualization processing is performed according to the picture change parameters of the fourth special effect generation model to obtain a visual video. In this case no text information appears in the visual video. Because the picture change parameters follow the spectrum of the environmental audio signal, the visualized special effect matches the ambient sound; for example, the frequency and amplitude of the cat's jumps stay consistent with the rhythm of the ambient sound.
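A hedged sketch of this ambient-sound branch: look up the preset effect for the detected sound type, then let the spectrum set the animation rhythm. The type labels and scene names are illustrative only.

```python
# Sketch: preset sound-type-to-effect lookup plus spectrum-driven rhythm,
# so that, e.g., the cat jumps in time with the ambient sound.
import numpy as np

SOUND_TYPE_EFFECTS = {
    "cat": "cat_jumping_picture",
    "wind": "windy_picture",
    "rain": "rainy_picture",
}

def ambient_sound_visualization(sound_type: str, samples: np.ndarray,
                                sample_rate: int) -> dict:
    third_model = SOUND_TYPE_EFFECTS.get(sound_type, "default_picture")
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    # The fourth model is the third model with rhythm parameters applied.
    return {
        "model": third_model,
        "beat_hz": float(freqs[int(np.argmax(spectrum))]),
        "amplitude": float(np.mean(spectrum)),
    }
```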
406. Perform visualization processing on the system audio signal by using the second special effect generation model to obtain a visual video.

Specifically, the picture change parameters of the first special effect generation model are adjusted based on the spectrum information of the system audio signal to obtain the second special effect generation model, and visualization processing is performed according to the picture change parameters of the second special effect generation model to obtain a visual video. In this case the visual video carries no lyric information.

407. Display the visual video through the projection device.

In order to better implement the audio visualization method of the embodiments of the present application, an audio visualization apparatus is further provided on the basis of that method. As shown in fig. 6, which is a schematic structural diagram of an embodiment of the audio visualization apparatus provided by the embodiments of the present disclosure, the audio visualization apparatus includes:
a text recognition unit 602, configured to perform text recognition on an environmental audio signal input by a microphone device when the environmental audio signal is acquired;

a visualization processing unit 603, configured to perform visualization processing on text information by using a first special effect generation model when the text information in the environmental audio signal is recognized, to obtain a visual video;

and a projection unit 604, configured to display the visual video through the projection device.

Optionally, the visualization processing unit 603 is configured to: when text information in the environmental audio signal is recognized, adjust the picture change parameters of the first special effect generation model based on the spectrum information of the environmental audio signal to obtain a second special effect generation model;

and perform visualization processing on the text information by using the second special effect generation model to obtain a visual video.

Optionally, the visualization processing unit 603 is configured to: when text information in the environmental audio signal is recognized, perform voiceprint recognition on the environmental audio signal to obtain a speaking user identifier corresponding to the environmental audio signal;

determine a first special effect generation model from a plurality of preset special effect generation models based on the speaking user identifier;

and adjust the picture change parameters of the first special effect generation model based on the spectrum information of the environmental audio signal to obtain a second special effect generation model.
Optionally, the audio visualization apparatus includes an obtaining unit 601, where the obtaining unit 601 is configured to obtain an input audio signal;

determine whether the source of the input audio signal is a microphone device;

and, if the source of the input audio signal is a microphone device, determine the input audio signal as an environmental audio signal.

Optionally, the obtaining unit 601 is configured to detect whether the microphone interface is triggered;

and, if the microphone interface is triggered, determine that the source of the input audio signal is the microphone device.
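A minimal sketch of this source decision, where the boolean flag stands in for the device-driver query that the patent does not name:

```python
# Route the input audio signal by source: a triggered microphone interface
# means ambient (environmental) audio; anything else is system audio.
def classify_input_source(mic_interface_triggered: bool) -> str:
    if mic_interface_triggered:
        return "environmental"  # text recognition / voiceprint branch
    return "system"             # lyric extraction branch
```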
Optionally, the obtaining unit 601 is configured to determine the input audio signal as a system audio signal if the source of the input audio signal is not a microphone device;

acquire lyric information in the system audio signal;

and perform visualization processing on the lyric information by using the first special effect generation model to obtain a visual video.

Optionally, the text recognition unit 602 is configured to acquire the sound intensity of the environmental audio signal when the environmental audio signal input by the microphone device is acquired;

judge whether the sound intensity of the environmental audio signal is greater than a preset decibel value;

and, if the sound intensity of the environmental audio signal is greater than the preset decibel value, perform text recognition on the environmental audio signal.
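A hedged sketch of this sound-intensity gate: estimate the level of the ambient signal and run text recognition only above a preset decibel value. The 40 dB figure and the reference level are assumptions made for the sketch; the patent leaves the preset threshold open.

```python
# Estimate an SPL-style level from the sample RMS and compare it with an
# assumed preset threshold; below the threshold, skip text recognition.
import numpy as np

PRESET_DB = 40.0  # assumed preset decibel value, not specified by the patent

def exceeds_preset_decibel(samples: np.ndarray, ref: float = 1e-5) -> bool:
    rms = float(np.sqrt(np.mean(samples.astype(np.float64) ** 2)))
    level_db = 20.0 * np.log10(max(rms, 1e-12) / ref)  # guard against log(0)
    return level_db > PRESET_DB
```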
The embodiments of the present disclosure also provide a projection device that integrates any audio visualization apparatus provided by the embodiments of the present disclosure. Fig. 7 shows a schematic structural diagram of such a projection device. Specifically:

The projection device may include components such as a processor 701 with one or more processing cores, a memory 702 with one or more computer-readable storage media, a power supply 703, and an input unit 704. It will be appreciated by those skilled in the art that the configuration shown in the figure does not constitute a limitation on the projection device, which may include more or fewer components than shown, combine some components, or arrange the components differently. Wherein:

The processor 701 is the control center of the projection device. It connects the various parts of the entire projection device by using various interfaces and lines, and performs the various functions of the projection device and processes data by running or executing the software programs and/or modules stored in the memory 702 and calling the data stored in the memory 702, thereby monitoring the projection device as a whole. Optionally, the processor 701 may include one or more processing cores. Preferably, the processor 701 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor need not be integrated into the processor 701.

The memory 702 may be used to store software programs and modules, and the processor 701 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 702. The memory 702 may mainly include a program storage area and a data storage area: the program storage area may store the operating system, the application programs required by at least one function (such as a sound playing function or an image playing function), and the like, while the data storage area may store data created according to the use of the projection device, and the like. Further, the memory 702 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 702 may also include a memory controller to provide the processor 701 with access to the memory 702.

The projection device further includes a power supply 703 for supplying power to each component. Preferably, the power supply 703 may be logically connected to the processor 701 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 703 may further include one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The projection device may also include an input unit 704, where the input unit 704 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the projection device may further include a display unit and the like, which will not be described in detail here. Specifically, in this embodiment, the processor 701 in the projection device loads an executable file corresponding to the process of one or more application programs into the memory 702 according to the following instructions, and the processor 701 runs the application programs stored in the memory 702 to implement various functions, as follows:
when an environmental audio signal input by a microphone device is acquired, performing text recognition on the environmental audio signal;

when text information in the environmental audio signal is recognized, performing visualization processing on the text information by using a first special effect generation model to obtain a visual video;
and displaying the visual video through the projection device.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be completed by instructions, or by related hardware controlled by instructions, where the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.

To this end, the embodiments of the present disclosure provide a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like. A computer program is stored thereon and is loaded by a processor to perform the steps of any audio visualization method provided by the embodiments of the present disclosure. For example, the computer program may be loaded by the processor to perform the following steps:
when an environmental audio signal input by a microphone device is acquired, performing text recognition on the environmental audio signal;

when text information in the environmental audio signal is recognized, performing visualization processing on the text information by using a first special effect generation model to obtain a visual video;
and displaying the visual video through the projection device.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, reference may be made to the detailed descriptions of the other embodiments, which are not repeated here.
In a specific implementation, each unit or structure may be implemented as an independent entity, or may be combined arbitrarily to be implemented as one or several entities, and the specific implementation of each unit or structure may refer to the foregoing method embodiment, which is not described herein again.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
The audio visualization method and apparatus, projection device, and storage medium provided by the embodiments of the present disclosure are described in detail above. Specific examples are used herein to explain the principles and implementations of the present disclosure, and the descriptions of the above embodiments are only meant to help understand the method and its core idea. Meanwhile, those skilled in the art may vary the specific implementation and application scope according to the idea of the present disclosure. In summary, the contents of this specification should not be construed as limiting the present disclosure.

Claims (11)

1. An audio visualization method, applied to a projection device, wherein the projection device is provided with a microphone interface, the microphone interface is used for connecting a microphone device, and the microphone device is used for collecting an environmental audio signal of the environment where the projection device is located, the audio visualization method comprising:

when an environmental audio signal input by the microphone device is acquired, performing text recognition on the environmental audio signal;

when text information in the environmental audio signal is recognized, performing visualization processing on the text information by using a first special effect generation model to obtain a visual video;

and displaying the visual video through the projection device.
2. The audio visualization method of claim 1, wherein the performing visualization processing on the text information by using a first special effect generation model to obtain a visual video when the text information in the environmental audio signal is recognized comprises:

when text information in the environmental audio signal is recognized, adjusting the picture change parameters of the first special effect generation model based on the spectrum information of the environmental audio signal to obtain a second special effect generation model;

and performing visualization processing on the text information by using the second special effect generation model to obtain a visual video.
3. The audio visualization method of claim 2, wherein the adjusting the picture change parameters of the first special effect generation model based on the spectrum information of the environmental audio signal to obtain a second special effect generation model when the text information in the environmental audio signal is recognized comprises:

when text information in the environmental audio signal is recognized, performing voiceprint recognition on the environmental audio signal to obtain a speaking user identifier corresponding to the environmental audio signal;

determining the first special effect generation model from a plurality of preset special effect generation models based on the speaking user identifier;

and adjusting the picture change parameters of the first special effect generation model based on the spectrum information of the environmental audio signal to obtain the second special effect generation model.
4. The audio visualization method of claim 1, wherein the performing text recognition on the environmental audio signal input by the microphone device when the environmental audio signal is acquired comprises:

acquiring an input audio signal;

determining whether the source of the input audio signal is the microphone device;

and, if the source of the input audio signal is the microphone device, determining the input audio signal as the environmental audio signal.
5. The audio visualization method of claim 4, wherein the determining whether the source of the input audio signal is the microphone device comprises:

detecting whether the microphone interface is triggered;

and, if the microphone interface is triggered, determining that the source of the input audio signal is the microphone device.
6. The audio visualization method of claim 4, further comprising:

determining the input audio signal as a system audio signal if the source of the input audio signal is not the microphone device;

acquiring lyric information in the system audio signal;

and performing visualization processing on the lyric information by using the first special effect generation model to obtain a visual video.
7. The audio visualization method of claim 1, wherein the performing text recognition on the environmental audio signal input by the microphone device when the environmental audio signal is acquired comprises:

when the environmental audio signal input by the microphone device is acquired, acquiring the sound intensity of the environmental audio signal;

judging whether the sound intensity of the environmental audio signal is greater than a preset decibel value;

and, if the sound intensity of the environmental audio signal is greater than the preset decibel value, performing text recognition on the environmental audio signal.
8. The audio visualization method of claim 1, further comprising:

when no text information is recognized in the environmental audio signal, identifying the sound type of the environmental audio signal;

determining a special effect generation model matching the sound type based on the sound type;

and generating a visual video based on the special effect generation model matching the sound type.
9. An audio visualization apparatus, comprising:

a text recognition unit, configured to perform text recognition on an environmental audio signal input by a microphone device when the environmental audio signal is acquired;

a visualization processing unit, configured to perform visualization processing on text information by using a first special effect generation model when the text information in the environmental audio signal is recognized, to obtain a visual video;

and a projection unit, configured to display the visual video through the projection device.
10. A projection device, characterized in that the projection device comprises:
one or more processors;
a memory; and
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to implement the audio visualization method of any one of claims 1 to 8.

11. A computer-readable storage medium, having stored thereon a computer program which is loaded by a processor to perform the steps of the audio visualization method of any one of claims 1 to 8.
CN202110330592.3A 2021-03-29 2021-03-29 Audio visualization method and device, projection equipment and storage medium Active CN112714355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110330592.3A CN112714355B (en) 2021-03-29 2021-03-29 Audio visualization method and device, projection equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112714355A 2021-04-27
CN112714355B CN112714355B (en) 2021-08-31

Family

ID=75550336

Country Status (1)

Country Link
CN (1) CN112714355B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101577114B (en) * 2009-06-18 2012-01-25 无锡中星微电子有限公司 Method and device for implementing audio visualization
US20120173008A1 (en) * 2009-09-21 2012-07-05 Koninklijke Philips Electronics N.V. Method and device for processing audio data
US20170323655A1 (en) * 2011-06-17 2017-11-09 At&T Intellectual Property I, L.P. Speaker association with a visual representation of spoken content
CN104754261A (en) * 2013-12-26 2015-07-01 深圳市快播科技有限公司 Projection equipment and projection method
CN204759782U (en) * 2015-05-13 2015-11-11 郑州电力高等专科学校 Projection equipment is used in teaching
CN108255876A (en) * 2016-12-29 2018-07-06 中移(苏州)软件技术有限公司 A kind of audio emotion visualization method and device
CN107944397A (en) * 2017-11-27 2018-04-20 腾讯音乐娱乐科技(深圳)有限公司 Video recording method, device and computer-readable recording medium
CN107943964A (en) * 2017-11-27 2018-04-20 腾讯音乐娱乐科技(深圳)有限公司 Lyric display method, device and computer-readable recording medium
CN107995478A (en) * 2017-12-13 2018-05-04 歌尔股份有限公司 Projecting method and projector equipment
CN109257659A (en) * 2018-11-16 2019-01-22 北京微播视界科技有限公司 Subtitle adding method, device, electronic equipment and computer readable storage medium
CN109672832A (en) * 2018-12-20 2019-04-23 四川湖山电器股份有限公司 The processing method of digital movie interlude lyrics subtitle realization dynamic Special display effect
CN110647002A (en) * 2019-09-30 2020-01-03 深圳市火乐科技发展有限公司 Projection device
CN111552836A (en) * 2020-04-29 2020-08-18 咪咕文化科技有限公司 Lyric display method, device and storage medium
CN112185415A (en) * 2020-09-10 2021-01-05 珠海格力电器股份有限公司 Sound visualization method and device, storage medium and MR mixed reality equipment

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450804A (en) * 2021-06-23 2021-09-28 深圳市火乐科技发展有限公司 Voice visualization method and device, projection equipment and computer readable storage medium
CN113746911A (en) * 2021-08-26 2021-12-03 科大讯飞股份有限公司 Audio processing method and related device, electronic equipment and storage medium
CN114090696A (en) * 2021-10-13 2022-02-25 桂林理工大学 Environment sound big data visualization system and method
CN114090696B (en) * 2021-10-13 2024-03-29 桂林理工大学 Environmental sound big data visualization system and method
CN114760493A (en) * 2022-03-25 2022-07-15 腾讯音乐娱乐科技(深圳)有限公司 Method, device and storage medium for adding lyric progress image
CN115129211A (en) * 2022-04-24 2022-09-30 北京达佳互联信息技术有限公司 Method and device for generating multimedia file, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112714355B (en) 2021-08-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant