CN118038888A - Method and device for determining dialogue clarity, electronic equipment and storage medium - Google Patents

Method and device for determining dialogue clarity, electronic equipment and storage medium

Info

Publication number
CN118038888A
Authority
CN
China
Prior art keywords
audio
determining
loudness
sampling point
screened
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410171946.8A
Other languages
Chinese (zh)
Inventor
刘阳
刘长滔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202410171946.8A priority Critical patent/CN118038888A/en
Publication of CN118038888A publication Critical patent/CN118038888A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Stereophonic System (AREA)

Abstract

The application discloses a method and a device for determining dialogue clarity, electronic equipment and a storage medium, and belongs to the technical field of audio processing. The method comprises the following steps: acquiring audio to be screened, wherein the audio to be screened comprises dialogue sound and background sound; separating the dialogue sound and the background sound included in the audio to be screened to obtain a first audio and a second audio; determining a third audio based on the first audio and the second audio, the third audio including a first channel signal and a second channel signal, the first channel signal corresponding to the first audio and the second channel signal corresponding to the second audio; determining a first loudness of the dialogue sound in the audio to be screened based on the first channel signal, and determining a second loudness of the background sound in the audio to be screened based on the second channel signal; and determining the dialogue clarity of the audio to be screened based on the first loudness and the second loudness. The method of the embodiment of the application can improve the accuracy of the dialogue clarity result.

Description

Method and device for determining dialogue clarity, electronic equipment and storage medium
Technical Field
The application belongs to the technical field of audio processing, and particularly relates to a method and a device for determining dialogue clarity, electronic equipment and a storage medium.
Background
Sound in a film or television drama is typically divided into two parts: dialogue sound and background sound. Owing to limitations of mixing techniques and methods, some films and television dramas suffer from dialogue clarity problems of varying degrees, such as dialogue that is too quiet or music and sound effects that are too loud, which seriously affect the audiovisual experience.
To solve the above problems, the films and television dramas with low dialogue clarity must first be identified. In the related art, this is generally done by manual subjective judgment, so the accuracy of the obtained result is low.
Disclosure of Invention
The embodiments of the application aim to provide a method and a device for determining dialogue clarity, electronic equipment and a storage medium, which can solve the problem that existing methods for determining the dialogue clarity of audio yield results of low accuracy.
In a first aspect, an embodiment of the present application provides a method for determining dialogue clarity, where the method includes:
acquiring audio to be screened, wherein the audio to be screened comprises dialogue sound and background sound;
separating the dialogue sound and the background sound included in the audio to be screened to obtain a first audio and a second audio, wherein the first audio is the audio corresponding to the dialogue sound, and the second audio is the audio corresponding to the background sound;
determining a third audio based on the first audio and the second audio, the third audio comprising a first channel signal and a second channel signal, the first channel signal corresponding to the first audio and the second channel signal corresponding to the second audio;
determining a first loudness of the dialogue sound in the audio to be screened based on the first channel signal, and determining a second loudness of the background sound in the audio to be screened based on the second channel signal;
and determining the dialogue clarity of the audio to be screened based on the first loudness and the second loudness.
In a second aspect, an embodiment of the present application provides a device for determining dialogue clarity, where the device includes:
an acquisition module, configured to acquire audio to be screened, wherein the audio to be screened comprises dialogue sound and background sound;
a separation module, configured to separate the dialogue sound and the background sound included in the audio to be screened to obtain a first audio and a second audio, wherein the first audio is the audio corresponding to the dialogue sound, and the second audio is the audio corresponding to the background sound;
a first determining module, configured to determine a third audio based on the first audio and the second audio, the third audio including a first channel signal and a second channel signal, the first channel signal corresponding to the first audio and the second channel signal corresponding to the second audio;
a second determining module, configured to determine a first loudness of the dialogue sound in the audio to be screened based on the first channel signal, and determine a second loudness of the background sound in the audio to be screened based on the second channel signal;
and a third determining module, configured to determine the dialogue clarity of the audio to be screened based on the first loudness and the second loudness.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the method for determining dialogue clarity according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which, when executed by a processor, implement the steps of the method for determining dialogue clarity according to the first aspect.
In the embodiment of the application, the dialogue sound and the background sound mixed in the audio to be screened are separated to obtain a first audio corresponding to the dialogue sound and a second audio corresponding to the background sound, and a third audio is determined based on the first audio and the second audio. In the third audio, the first loudness of the dialogue sound is determined from the first channel signal and the second loudness of the background sound is determined from the second channel signal; the first loudness and the second loudness are then analyzed to determine the dialogue clarity of the audio to be screened. In this process, the dialogue sound and the background sound in the third audio do not interfere with each other, so each can be analyzed accurately, and the dialogue clarity of the audio to be screened is finally analyzed quantitatively through the first loudness and the second loudness.
Drawings
Fig. 1 is a first schematic flow chart of a method for determining dialogue clarity according to an embodiment of the present application;
Fig. 2 is a second schematic flow chart of a method for determining dialogue clarity according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a device for determining dialogue clarity according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms "first", "second" and the like in the description and in the claims are used to distinguish between similar objects and not necessarily to describe a particular order or sequence. It is to be understood that data so used may be interchanged where appropriate, so that embodiments of the present application can be implemented in orders other than those illustrated or described herein. Objects identified by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
The method for determining dialogue clarity provided by the embodiment of the application is described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
Fig. 1 is a flow chart of a method for determining dialogue clarity according to an embodiment of the present application. The method specifically includes the following steps:
Step 101, obtaining audio to be screened, wherein the audio to be screened comprises dialogue sound and background sound.
The audio to be screened may be mono audio or stereo audio.
In the case where the audio to be screened (hereinafter abbreviated as Audio) is mono audio, it may be understood as audio in which dialogue sound and background sound are mixed.
In the case where the audio to be screened is stereo audio, it comprises left channel audio and right channel audio. Both are mixtures of dialogue sound and background sound, and the content of the left channel audio is consistent with that of the right channel audio.
Step 102, separating the dialogue sound and the background sound included in the audio to be screened to obtain a first audio and a second audio, wherein the first audio is the audio corresponding to the dialogue sound, and the second audio is the audio corresponding to the background sound.
In this step, the dialogue sound and the background sound of the Audio may be extracted by a preset, trained sound separation model to obtain a first audio (hereinafter abbreviated as audio_v) and a second audio (hereinafter abbreviated as audio_c), where audio_v and audio_c each include left channel audio and right channel audio.
The content of the left channel audio and the right channel audio of audio_v is substantially identical; both are the dialogue sound extracted from the Audio. Similarly, the content of the left channel audio and the right channel audio of audio_c is substantially identical; both are the background sound extracted from the Audio.
In an optional embodiment, when the dialogue sound and the background sound of the Audio are separated by the sound separation model, the Audio may be cut in time order into a plurality of sub-audio files, each no larger than a preset file size. The sub-audio files are then processed in parallel by the sound separation model to obtain the background sound and the dialogue sound corresponding to each sub-audio file.
The dialogue sounds corresponding to the sub-audio files are spliced in time order to obtain audio_v, and the background sounds corresponding to the sub-audio files are spliced in time order to obtain audio_c.
In the above embodiment, the large file is split into a plurality of small files that are processed in parallel, which increases the data processing speed. It also avoids problems that arise when processing a large file, such as low file transmission efficiency and, in an unstable network environment, data loss, interrupted transmission and the need for retransmission.
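The split, parallel-process and splice flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: audio is a plain list of samples, the chunk limit is expressed in samples rather than file size, and `separate_chunk` is a hypothetical toy stand-in for the trained sound separation model.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(samples, max_chunk_len):
    """Cut the samples, in time order, into chunks of at most max_chunk_len."""
    return [samples[i:i + max_chunk_len]
            for i in range(0, len(samples), max_chunk_len)]


def separate_chunk(chunk):
    """Hypothetical stand-in for the separation model (assumption).

    Toy rule: samples with |x| >= 0.5 count as dialogue, the rest as background.
    """
    dialogue = [x if abs(x) >= 0.5 else 0.0 for x in chunk]
    background = [x - d for x, d in zip(chunk, dialogue)]
    return dialogue, background


def separate_in_parallel(samples, max_chunk_len, workers=4):
    """Separate every chunk in parallel, then splice the results in time order."""
    chunks = split_into_chunks(samples, max_chunk_len)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so splicing stays in time order.
        results = list(pool.map(separate_chunk, chunks))
    audio_v = [x for dialogue, _ in results for x in dialogue]
    audio_c = [x for _, background in results for x in background]
    return audio_v, audio_c


audio_v, audio_c = separate_in_parallel([0.1, 0.9, -0.7, 0.2, 0.6], max_chunk_len=2)
```

Because the results are collected in input order, the spliced audio_v and audio_c have the same length and timing as the original Audio.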
Step 103, determining a third audio based on the first audio and the second audio, wherein the third audio comprises a first channel signal and a second channel signal, the first channel signal corresponds to the first audio, and the second channel signal corresponds to the second audio.
In one embodiment, a first target channel audio is selected from the left channel audio and the right channel audio of audio_v as the first channel signal, and a second target channel audio is selected from audio_c as the second channel signal. When the first target channel is the left channel of audio_v, the second target channel is the left channel of audio_c; when the first target channel is the right channel of audio_v, the second target channel is the right channel of audio_c.
In another embodiment, the left channel audio and the right channel audio of audio_v are combined to obtain the first channel signal, and the left channel audio and the right channel audio of audio_c are combined to obtain the second channel signal.
The left channel audio and the right channel audio of the third audio (hereinafter abbreviated as Audio1) are, respectively, the dialogue sound corresponding to audio_v and the background sound corresponding to audio_c.
In the Audio, the dialogue sound and the background sound are mixed, so features such as frequency and amplitude cannot be attributed to the dialogue sound or the background sound individually when the sampling points of the audio file are analyzed. It is therefore difficult to analyze the background sound and the dialogue clarity quantitatively from the Audio itself.
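The two ways of forming the third audio can be sketched as below. This is an illustrative sketch only. Each stereo audio is assumed to be a `(left, right)` tuple of equal-length sample lists, and "combining" the two channels is assumed to mean averaging them, which the text does not specify. The function names are hypothetical.

```python
def select_channel(audio_v, audio_c, side="left"):
    """First embodiment: take the same-side channel from audio_v and audio_c."""
    idx = 0 if side == "left" else 1
    return audio_v[idx], audio_c[idx]


def downmix(stereo):
    """Second embodiment: merge left and right into one channel (averaging is an assumption)."""
    left, right = stereo
    return [(l + r) / 2 for l, r in zip(left, right)]


def build_audio1(audio_v, audio_c, mode="select", side="left"):
    """Return (first_channel_signal, second_channel_signal) of the third audio."""
    if mode == "select":
        first, second = select_channel(audio_v, audio_c, side)
    else:
        first, second = downmix(audio_v), downmix(audio_c)
    if len(first) != len(second):
        raise ValueError("channel signals must have the same number of sampling points")
    return first, second


audio_v = ([1.0, 2.0], [3.0, 4.0])   # (left, right) of the separated dialogue sound
audio_c = ([0.5, 0.5], [1.5, 2.5])   # (left, right) of the separated background sound
first_signal, second_signal = build_audio1(audio_v, audio_c, mode="select", side="right")
```

Either way, the result is a two-channel signal in which one channel carries only dialogue and the other only background, with sampling points aligned one-to-one.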
Step 104, determining a first loudness of the dialogue sound in the audio to be screened based on the first channel signal, and determining a second loudness of the background sound in the audio to be screened based on the second channel signal.
In Audio1, the two channels carry the dialogue sound and the background sound respectively, so the two do not affect each other and can be analyzed independently. The first channel signal corresponds to the dialogue sound in the Audio and the second channel signal corresponds to the background sound in the Audio. The first loudness of the dialogue sound can be determined by further analyzing the features of the first channel signal, and the second loudness of the background sound can be determined by further analyzing the features of the second channel signal.
Step 105, determining the dialogue clarity of the audio to be screened based on the first loudness and the second loudness.
After the first loudness corresponding to the dialogue sound and the second loudness corresponding to the background sound are obtained, the two are compared, and the dialogue clarity of the audio to be screened is determined from the comparison result.
In this method for determining dialogue clarity, the dialogue sound and the background sound mixed in the audio to be screened are separated to obtain a first audio corresponding to the dialogue sound and a second audio corresponding to the background sound, and a third audio is determined based on the first audio and the second audio. In the third audio, the first loudness of the dialogue sound is determined from the first channel signal and the second loudness of the background sound is determined from the second channel signal; the first loudness and the second loudness are then analyzed to determine the dialogue clarity of the audio to be screened. In this process, the dialogue sound and the background sound in the third audio do not interfere with each other, so each can be analyzed accurately, and the dialogue clarity of the audio to be screened is finally analyzed quantitatively through the first loudness and the second loudness.
Optionally, in step 104, determining a first loudness of the dialogue sound in the audio to be screened based on the first channel signal, and determining a second loudness of the background sound in the audio to be screened based on the second channel signal, includes:
determining at least one first sampling point from the sampling points of the first channel signal, wherein the energy value of each first sampling point is greater than a preset energy value;
determining, from the sampling points of the second channel signal, at least one second sampling point corresponding to the at least one first sampling point, wherein each second sampling point has the same timing information as its corresponding first sampling point;
determining the first loudness based on the at least one first sampling point;
determining the second loudness based on the at least one second sampling point.
In this embodiment, the first channel signal corresponds to the dialogue sound in the Audio. Each sampling point in the first channel signal is analyzed in time order to obtain its energy value, and the sampling points whose energy value is greater than the preset energy value are determined as first sampling points.
After the first sampling points are determined, the second sampling points having the same timing information as the first sampling points are determined from the second channel signal. The procedure is illustrated by the following example:
The first channel signal includes n sampling points arranged in time order. Similarly, the second channel signal includes n sampling points arranged in time order, in one-to-one correspondence with the n sampling points of the first channel signal. The n energy values corresponding to the n sampling points of the first channel signal are acquired. When the energy value of a sampling point is greater than the preset energy value, the audio at that sampling point contains dialogue sound, that is, someone is speaking. Conversely, when the energy value of a sampling point is not greater than the preset energy value, nobody is speaking at that sampling point, which can also be understood as a blank section of the dialogue sound. From the magnitude relation between the n energy values and the preset energy value, m first sampling points can be determined, and from the correspondence between the sampling points of the two channel signals, m second sampling points can be determined. The first loudness of the dialogue sound can then be determined from the features of the m first sampling points, and the second loudness of the background sound from the features of the m second sampling points.
In this embodiment, comparing energy values identifies, among the sampling points of the first channel signal, the first sampling points at which dialogue sound is present. Only the first sampling points and their corresponding second sampling points are then analyzed. This avoids useless analysis of blank sections, which improves efficiency, and removes the interference of blank sections with the loudness determination, which improves the accuracy of the first loudness and the second loudness.
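The selection of the m first sampling points and their time-aligned second sampling points can be sketched as follows. This is a minimal sketch under one assumption that the text does not state: the energy value of a sampling point is taken to be its squared amplitude. The function name and the example threshold are hypothetical.

```python
def select_sampling_points(first_channel, second_channel, preset_energy):
    """Pick sampling points where dialogue is present, plus the aligned background points.

    Energy of a sampling point is assumed to be its squared amplitude.
    """
    if len(first_channel) != len(second_channel):
        raise ValueError("both channel signals must contain n sampling points")
    # Indices of first sampling points: energy strictly greater than the preset value.
    indices = [i for i, x in enumerate(first_channel) if x * x > preset_energy]
    first_points = [first_channel[i] for i in indices]
    # Second sampling points share the same indices, i.e. the same timing information.
    second_points = [second_channel[i] for i in indices]
    return indices, first_points, second_points


first_channel = [0.0, 0.5, 0.05, -0.4]    # dialogue channel of Audio1
second_channel = [0.3, 0.1, 0.2, -0.1]    # background channel of Audio1
indices, first_points, second_points = select_sampling_points(
    first_channel, second_channel, preset_energy=0.01)
```

Here the sampling points at indices 1 and 3 are kept (m = 2), and the sampling points at indices 0 and 2 are treated as a blank section of the dialogue sound and skipped in both channels.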
Optionally, in step 104, determining at least one first sampling point from the sampling points of the first channel signal includes:
acquiring the audio features of a sampling point to be judged, wherein the sampling point to be judged is any one of the sampling points of the first channel signal;
determining the energy value of the sampling point to be judged based on its audio features;
and determining the sampling point to be judged as a first sampling point in the case where its energy value is greater than the preset energy value.
In this embodiment, the audio features include sound frequency, amplitude, wave speed, and the like. From the audio features, the energy value of each sampling point in the first channel signal can be determined, and each energy value is compared with the preset energy value to find the first sampling points whose energy value is greater than the preset energy value. Through these steps, the first sampling points can be accurately screened from the sampling points of the first channel signal.
It should be noted that the total number of sampling points in a piece of audio is determined by the total duration of the audio and the sampling frequency, that is, the number of sampling points per unit time. The maximum frequency audible to the human ear is 20 kHz; a sampling frequency of 8 kHz is sufficient for conversation; and audio with a sampling frequency above 40 kHz can generally be called lossless audio.
Optionally, in step 104, determining the first loudness based on the at least one first sampling point includes:
acquiring the audio features corresponding to each of the at least one first sampling point;
determining the energy value corresponding to each of the at least one first sampling point based on the corresponding audio features;
and determining the first loudness based on the energy value corresponding to each of the at least one first sampling point and the number of sampling points of the first channel signal.
In this embodiment, the first loudness is:
Loud_v=10×log10(sum(Energy_v)/length(Audio1_v))
Here Energy_v represents the energy values corresponding to the at least one first sampling point, sum(Energy_v) represents the sum of those energy values, and length(Audio1_v) represents the total number of sampling points in the first channel signal.
In this way, the energy values of the first sampling points are converted into a more intuitive first loudness, which facilitates the subsequent loudness comparison.
Optionally, in step 104, determining the second loudness based on the at least one second sampling point includes:
acquiring the audio features corresponding to each of the at least one second sampling point;
determining the energy value corresponding to each of the at least one second sampling point based on the corresponding audio features;
and determining the second loudness based on the energy value corresponding to each of the at least one second sampling point and the number of sampling points of the second channel signal.
Similar to the process of determining the first loudness described above, the second loudness is:
Loud_c=10×log10(sum(Energy_c)/length(Audio1_c))
Here Energy_c represents the energy values corresponding to the at least one second sampling point, sum(Energy_c) represents the sum of those energy values, and length(Audio1_c) represents the total number of sampling points in the second channel signal.
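The two loudness formulas have the same form and can be computed with one helper, sketched below. As an assumption not stated in the text, the energy value of a sampling point is taken to be its squared amplitude. The function name `loudness_db` and the sample values are hypothetical.

```python
import math


def loudness_db(selected_points, total_points):
    """Compute 10 * log10(sum(Energy) / length(channel)) as in Loud_v and Loud_c.

    selected_points: amplitudes of the selected (first or second) sampling points.
    total_points: total number of sampling points in the channel signal.
    Energy per sampling point is assumed to be the squared amplitude.
    """
    if total_points <= 0:
        raise ValueError("the channel signal must contain at least one sampling point")
    energy_sum = sum(x * x for x in selected_points)
    if energy_sum == 0:
        return float("-inf")  # no energy at the selected points
    return 10 * math.log10(energy_sum / total_points)


# Loud_v from the first sampling points, Loud_c from the aligned second sampling points.
loud_v = loudness_db([0.5, -0.4], total_points=4)
loud_c = loudness_db([0.1, -0.1], total_points=4)
```

Note that, following the formulas, the sum runs over the selected sampling points only, while the divisor is the total number of sampling points in the channel signal.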
Optionally, in step 105, determining the dialogue clarity of the audio to be screened based on the first loudness and the second loudness includes:
acquiring a first judgment threshold and a second judgment threshold, wherein the second judgment threshold is greater than the first judgment threshold, and both thresholds are associated with the loudness of audio;
and determining the dialogue clarity of the audio to be screened according to the magnitude relation between the difference of the first loudness and the second loudness, on the one hand, and the first judgment threshold and the second judgment threshold, on the other.
For example, the first judgment threshold is threshold1 and the second judgment threshold is threshold2, with threshold1 less than threshold2. Both are empirical values obtained from experiments on the loudness at which the human ear can hear sound clearly, and their specific values can be adjusted manually. Taking whether the dialogue sound can be heard clearly as the criterion, dialogue clarity is divided into three levels: extremely poor clarity, poor clarity and good clarity.
In the case of (Loud_v-Loud_c) < threshold1, the loudness gap between the dialogue sound and the background sound is small and the dialogue sound is difficult to distinguish, so the dialogue clarity of the audio to be screened is determined as extremely poor clarity.
In the case of threshold2 > (Loud_v-Loud_c) > threshold1, the loudness gap between the dialogue sound and the background sound allows the dialogue sound to be distinguished only to some extent, so the dialogue clarity of the audio to be screened is determined as poor clarity.
In the case of (Loud_v-Loud_c) > threshold2, the loudness gap between the dialogue sound and the background sound is large and the dialogue sound can be clearly distinguished, so the dialogue clarity of the audio to be screened is determined as good clarity.
In the method of the embodiment of the application, the difference between the first loudness representing the dialogue sound and the second loudness representing the background sound is compared with two preset loudness judgment thresholds to judge the dialogue clarity of the audio to be screened. Compared with the manual subjective judgment of the prior art, this quantitative analysis greatly improves the accuracy of the dialogue clarity result.
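The three-level judgment can be sketched as a small function. This is an illustrative sketch: the text does not say how a difference exactly equal to a threshold is classified, so the boundary handling below is an assumption, and the threshold values in the usage lines are made-up placeholders rather than the experimental values the text refers to.

```python
def classify_dialogue_clarity(loud_v, loud_c, threshold1, threshold2):
    """Map the loudness difference Loud_v - Loud_c to one of three clarity levels."""
    if threshold1 >= threshold2:
        raise ValueError("threshold1 must be less than threshold2")
    diff = loud_v - loud_c
    if diff < threshold1:
        return "extremely poor"
    # Assumption: a difference equal to threshold1 falls in the middle level,
    # and a difference equal to threshold2 falls in the top level.
    if diff < threshold2:
        return "poor"
    return "good"


# Placeholder thresholds (in dB) for illustration only.
THRESHOLD1, THRESHOLD2 = 3.0, 10.0
level = classify_dialogue_clarity(-20.0, -25.0, THRESHOLD1, THRESHOLD2)
```

With these placeholder thresholds, a 5 dB gap between the dialogue sound and the background sound falls between the two thresholds and is graded as poor clarity.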
Optionally, before the audio to be screened is acquired in step 101, the method further includes:
acquiring video material to be screened;
extracting audio from the video material to be screened to obtain initial audio;
and adjusting the volume of the initial audio to a preset volume to obtain the audio to be screened.
The method of this embodiment can be applied to dialogue clarity analysis of films and television dramas. As shown in fig. 2, after the film or television drama material is obtained (the video material to be screened is generally an mkv or mp4 file), audio is extracted from the video material to be screened to obtain the initial audio. Based on the EBU R128 loudness standard, a loudness control algorithm is used to adjust the volume of the initial audio to a preset volume, thereby obtaining the audio to be screened. The dialogue sound and the background sound are then separated to obtain the first audio and the second audio, a single channel is extracted from each of the first audio and the second audio, and the third audio is synthesized from them. Loudness analysis and threshold judgment are then performed to determine the dialogue clarity of the video material to be screened. In this way the dialogue clarity of films and television dramas can be graded, those with low dialogue clarity can be located, and the parameters of their dialogue sound and background sound can be adjusted in a targeted manner, improving dialogue clarity and thus the quality of the films and television dramas.
In the method of the embodiment of the application, the volume of each initial audio is adjusted first, so that all audio is brought to the same volume standard and can be judged and analyzed against the same loudness standard. This avoids the situation in which an accurate clarity judgment cannot be obtained because the original volume of the audio is too low or too high.
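The idea of bringing every initial audio to the same preset volume can be illustrated with a deliberately simplified sketch. It is not an implementation of the EBU R128 measurement: it uses a plain RMS level with no K-weighting and no gating, and the function names and the target level are hypothetical.

```python
import math


def rms_level_db(samples):
    """Plain RMS level in dB relative to full scale (simplified; not EBU R128)."""
    if not samples:
        return float("-inf")
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def normalize_to_level(samples, target_db):
    """Scale the samples with one constant gain so their RMS level equals target_db."""
    current_db = rms_level_db(samples)
    if current_db == float("-inf"):
        return list(samples)  # silent or empty audio cannot be scaled to a level
    gain = 10 ** ((target_db - current_db) / 20)
    return [x * gain for x in samples]


initial_audio = [0.5, -0.5, 0.5, -0.5]
audio_to_screen = normalize_to_level(initial_audio, target_db=-23.0)
```

Because a single constant gain is applied to the whole mix, the loudness difference between the dialogue sound and the background sound within the audio is unchanged; only the overall level is aligned across different audio files.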
As shown in fig. 3, the embodiment of the present application further provides a device 300 for determining dialogue clarity, where the device 300 includes:
an obtaining module 301, configured to obtain audio to be screened, where the audio to be screened includes dialogue sound and background sound;
a separation module 302, configured to separate the dialogue sound and the background sound included in the audio to be screened to obtain a first audio and a second audio, where the first audio is the audio corresponding to the dialogue sound, and the second audio is the audio corresponding to the background sound;
a first determining module 303, configured to determine a third audio based on the first audio and the second audio, where the third audio includes a first channel signal and a second channel signal, the first channel signal corresponds to the first audio, and the second channel signal corresponds to the second audio;
a second determining module 304, configured to determine a first loudness of the dialogue sound in the audio to be screened based on the first channel signal, and determine a second loudness of the background sound in the audio to be screened based on the second channel signal;
a third determining module 305, configured to determine the dialogue clarity of the audio to be screened based on the first loudness and the second loudness.
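The role of the first determining module can be illustrated with a short sketch that pairs the two separated tracks into a two-channel third audio. The function names `build_third_audio` and `split_channels` and the representation of audio as Python lists of float samples are assumptions for illustration; the source does not specify a data layout.

```python
def build_third_audio(first_audio, second_audio):
    """Pair time-aligned samples: channel 0 is dialogue, channel 1 is background."""
    if len(first_audio) != len(second_audio):
        raise ValueError("separated tracks must have the same number of samples")
    # Each frame holds one dialogue sample and the background sample at the same time.
    return list(zip(first_audio, second_audio))


def split_channels(third_audio):
    """Recover the first and second channel signals from the third audio."""
    first_channel = [frame[0] for frame in third_audio]
    second_channel = [frame[1] for frame in third_audio]
    return first_channel, second_channel
```

Keeping both tracks in one two-channel signal means that a sample index identifies the same moment in the dialogue and in the background, which the later loudness steps rely on.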
Optionally, the second determining module 304 includes:
a first determining sub-module, configured to determine at least one first sampling point from the sampling points of the first channel signal, where the energy value of each first sampling point is greater than a preset energy value;
a second determining sub-module, configured to determine, from the sampling points of the second channel signal, at least one second sampling point corresponding to the at least one first sampling point, where each second sampling point has the same timing information as the corresponding first sampling point;
a third determining sub-module, configured to determine the first loudness based on the at least one first sampling point;
a fourth determining sub-module, configured to determine the second loudness based on the at least one second sampling point.
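The selection performed by the first and second determining sub-modules can be sketched as follows. The source only states that the energy value is derived from the audio features of a sampling point; taking the squared amplitude as the energy, as well as the function name `select_sampling_points`, is an assumption for illustration.

```python
def select_sampling_points(first_channel, second_channel, preset_energy):
    """Select dialogue samples above the energy threshold and the aligned background samples."""
    if len(first_channel) != len(second_channel):
        raise ValueError("channel signals must have the same number of samples")
    # Assumed energy measure: squared amplitude of the sampling point.
    indices = [i for i, s in enumerate(first_channel) if s * s > preset_energy]
    first_points = [first_channel[i] for i in indices]
    # Second sampling points share the timing (index) of the first sampling points.
    second_points = [second_channel[i] for i in indices]
    return indices, first_points, second_points
```

Because only moments where dialogue is actually present are kept, silent gaps between lines do not dilute the comparison between dialogue and background.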
Optionally, the first determining sub-module includes:
a first obtaining unit, configured to obtain an audio feature of a sampling point to be judged, where the sampling point to be judged is any one of the sampling points of the first channel signal;
a first determining unit, configured to determine the energy value of the sampling point to be judged based on the audio feature of the sampling point to be judged;
a second determining unit, configured to determine the sampling point to be judged as a first sampling point when the energy value of the sampling point to be judged is greater than the preset energy value.
Optionally, the third determining sub-module includes:
a second obtaining unit, configured to obtain the audio features respectively corresponding to the at least one first sampling point;
a third determining unit, configured to determine the energy values respectively corresponding to the at least one first sampling point based on the audio features respectively corresponding to the at least one first sampling point;
a fourth determining unit, configured to determine the first loudness based on the energy values respectively corresponding to the at least one first sampling point and the number of sampling points of the first channel signal.
Optionally, the fourth determining sub-module includes:
a third obtaining unit, configured to obtain the audio features respectively corresponding to the at least one second sampling point;
a fifth determining unit, configured to determine the energy values respectively corresponding to the at least one second sampling point based on the audio features respectively corresponding to the at least one second sampling point;
a sixth determining unit, configured to determine the second loudness based on the energy values respectively corresponding to the at least one second sampling point and the number of sampling points of the second channel signal.
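One plausible way to turn energy values and a sample count into a loudness figure is a mean-energy level in decibels, sketched below. The source does not give the formula, so the expression 10·log10(total energy / sample count), the squared-amplitude energy, and the function name `loudness_db` are all assumptions; the count passed in is whichever number of sampling points the implementation uses.

```python
import math


def loudness_db(selected_samples, sample_count):
    """Return a mean-energy level in dB for the selected sampling points."""
    if sample_count <= 0:
        raise ValueError("sample_count must be positive")
    # Assumed energy measure: squared amplitude of each selected sampling point.
    total_energy = sum(s * s for s in selected_samples)
    if total_energy == 0.0:
        return float("-inf")  # no energy at all: treat as silence
    return 10.0 * math.log10(total_energy / sample_count)
```

The same function can be applied to the first sampling points to obtain the first loudness and to the second sampling points to obtain the second loudness.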
Optionally, the third determining module 305 includes:
an acquisition sub-module, configured to acquire a first judgment threshold and a second judgment threshold, where the second judgment threshold is greater than the first judgment threshold, and both thresholds are associated with the loudness of the audio;
a fifth determining sub-module, configured to determine the dialogue clarity of the audio to be screened according to the magnitude relation between the difference between the first loudness and the second loudness and each of the first judgment threshold and the second judgment threshold.
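The threshold judgment can be sketched as a three-way classification. The direction of the difference (dialogue loudness minus background loudness), the three labels, and the function name `classify_dialogue_clarity` are assumptions for illustration; the source only states that two thresholds are compared against a loudness difference.

```python
def classify_dialogue_clarity(first_loudness, second_loudness,
                              first_threshold, second_threshold):
    """Grade dialogue clarity from the loudness difference and two thresholds."""
    if second_threshold <= first_threshold:
        raise ValueError("second_threshold must be greater than first_threshold")
    # Assumed direction: how much louder the dialogue is than the background.
    difference = first_loudness - second_loudness
    if difference < first_threshold:
        return "low"
    if difference < second_threshold:
        return "medium"
    return "high"
```

For example, with hypothetical thresholds of 5 dB and 10 dB, dialogue at -20 dB over background at -27 dB gives a difference of 7 dB and falls in the middle grade.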
Optionally, the device 300 for determining dialogue clarity is further configured to:
acquire video material to be screened;
extract audio from the video material to be screened to obtain initial audio;
adjust the volume of the initial audio to a preset volume to obtain the audio to be screened.
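The extraction and volume adjustment steps could be carried out with the ffmpeg command-line tool, whose `loudnorm` filter implements EBU R 128 loudness normalization. The sketch below only builds the argument list; the function name `build_extract_command`, the file names, and the -23 LUFS target (the EBU R 128 programme loudness target) are assumptions, since the source does not name a tool or a preset volume.

```python
def build_extract_command(video_path, audio_path, target_lufs=-23.0):
    """Build an ffmpeg argument list that extracts and loudness-normalizes audio."""
    return [
        "ffmpeg",
        "-y",                                 # overwrite the output file if it exists
        "-i", video_path,                     # video material to be screened (e.g. mkv, mp4)
        "-vn",                                # drop the video stream, keep audio only
        "-af", f"loudnorm=I={target_lufs}",   # EBU R 128 loudness normalization filter
        audio_path,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.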
It should be noted that the device 300 for determining dialogue clarity provided in the embodiment of the present application can implement all technical processes of the method for determining dialogue clarity shown in the embodiment of fig. 1 and achieve the same technical effects, which are not repeated here.
The device for determining dialogue clarity in the embodiment of the application can be an electronic device or a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. For example, a mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a mobile internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA), and a non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine or a self-service machine, which is not limited in the embodiments of the present application.
Optionally, as shown in fig. 4, the embodiment of the present application further provides an electronic device 400 including a processor 401 and a memory 402. The memory 402 stores a program or instructions executable on the processor 401; when executed by the processor 401, the program or instructions implement each step of the above method embodiment and achieve the same technical effects, which are not repeated here.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
The embodiment of the application also provides a readable storage medium on which a program or instructions are stored. When executed by a processor, the program or instructions implement each process of the above method embodiment and achieve the same technical effects, which are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Furthermore, the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; depending on the functions involved, the functions may also be performed substantially simultaneously or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a computer software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods according to the embodiments of the present application.
The foregoing describes merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present application shall fall within the protection scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (10)

1. A method for determining dialogue clarity, the method comprising:
acquiring audio to be screened, wherein the audio to be screened comprises dialogue sound and background sound;
separating the dialogue sound and the background sound included in the audio to be screened to obtain a first audio and a second audio, wherein the first audio is the audio corresponding to the dialogue sound, and the second audio is the audio corresponding to the background sound;
determining a third audio based on the first audio and the second audio, the third audio comprising a first channel signal and a second channel signal, the first channel signal corresponding to the first audio and the second channel signal corresponding to the second audio;
determining a first loudness of the dialogue sound in the audio to be screened based on the first channel signal, and determining a second loudness of the background sound in the audio to be screened based on the second channel signal;
and determining the dialogue clarity of the audio to be screened based on the first loudness and the second loudness.
2. The method of claim 1, wherein the determining a first loudness of the dialogue sound in the audio to be screened based on the first channel signal and determining a second loudness of the background sound in the audio to be screened based on the second channel signal comprises:
determining at least one first sampling point from sampling points of the first channel signal, wherein the energy value of the first sampling point is greater than a preset energy value;
determining at least one second sampling point corresponding to the at least one first sampling point from sampling points of the second channel signal, wherein the second sampling point is consistent with the timing information of the corresponding first sampling point;
determining the first loudness based on the at least one first sampling point;
determining the second loudness based on the at least one second sampling point.
3. The method of claim 2, wherein the determining at least one first sampling point from sampling points of the first channel signal comprises:
acquiring an audio feature of a sampling point to be judged, wherein the sampling point to be judged is any one of the sampling points of the first channel signal;
determining the energy value of the sampling point to be judged based on the audio feature of the sampling point to be judged;
and when the energy value of the sampling point to be judged is greater than the preset energy value, determining the sampling point to be judged as the first sampling point.
4. The method of claim 2, wherein the determining the first loudness based on the at least one first sampling point comprises:
acquiring audio features respectively corresponding to the at least one first sampling point;
determining energy values respectively corresponding to the at least one first sampling point based on the audio features respectively corresponding to the at least one first sampling point;
and determining the first loudness based on the energy values respectively corresponding to the at least one first sampling point and the number of sampling points of the first channel signal.
5. The method of claim 2, wherein the determining the second loudness based on the at least one second sampling point comprises:
acquiring audio features respectively corresponding to the at least one second sampling point;
determining energy values respectively corresponding to the at least one second sampling point based on the audio features respectively corresponding to the at least one second sampling point;
and determining the second loudness based on the energy values respectively corresponding to the at least one second sampling point and the number of sampling points of the second channel signal.
6. The method of any one of claims 1 to 5, wherein the determining the dialogue clarity of the audio to be screened based on the first loudness and the second loudness comprises:
acquiring a first judgment threshold and a second judgment threshold, wherein the second judgment threshold is greater than the first judgment threshold, and the first judgment threshold and the second judgment threshold are associated with the loudness of the audio;
and determining the dialogue clarity of the audio to be screened according to the magnitude relation between the difference between the first loudness and the second loudness and each of the first judgment threshold and the second judgment threshold.
7. The method of any one of claims 1 to 5, wherein prior to the acquiring audio to be screened, the method further comprises:
acquiring video material to be screened;
extracting audio from the video material to be screened to obtain initial audio;
and adjusting the volume of the initial audio to a preset volume to obtain the audio to be screened.
8. A device for determining dialogue clarity, the device comprising:
an acquisition module, configured to acquire audio to be screened, wherein the audio to be screened comprises dialogue sound and background sound;
a separation module, configured to separate the dialogue sound and the background sound included in the audio to be screened to obtain a first audio and a second audio, wherein the first audio is the audio corresponding to the dialogue sound, and the second audio is the audio corresponding to the background sound;
a first determining module, configured to determine a third audio based on the first audio and the second audio, the third audio including a first channel signal and a second channel signal, the first channel signal corresponding to the first audio, the second channel signal corresponding to the second audio;
a second determining module, configured to determine a first loudness of the dialogue sound in the audio to be screened based on the first channel signal, and determine a second loudness of the background sound in the audio to be screened based on the second channel signal;
and a third determining module, configured to determine the dialogue clarity of the audio to be screened based on the first loudness and the second loudness.
9. An electronic device comprising a processor and a memory storing a program or instructions executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method for determining dialogue clarity according to any one of claims 1 to 7.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the method for determining dialogue clarity according to any one of claims 1 to 7.
CN202410171946.8A 2024-02-07 2024-02-07 Determination method and device for white definition, electronic equipment and storage medium Pending CN118038888A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410171946.8A CN118038888A (en) 2024-02-07 2024-02-07 Determination method and device for white definition, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410171946.8A CN118038888A (en) 2024-02-07 2024-02-07 Determination method and device for white definition, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118038888A true CN118038888A (en) 2024-05-14

Family

ID=91000174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410171946.8A Pending CN118038888A (en) 2024-02-07 2024-02-07 Determination method and device for white definition, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118038888A (en)

Similar Documents

Publication Publication Date Title
US10820131B1 (en) Method and system for creating binaural immersive audio for an audiovisual content
CN109348274B (en) Live broadcast interaction method and device and storage medium
CN108521612B (en) Video abstract generation method, device, server and storage medium
US8868419B2 (en) Generalizing text content summary from speech content
CN110381336B (en) Video segment emotion judgment method and device based on 5.1 sound channel and computer equipment
US20230254655A1 (en) Signal processing apparatus and method, and program
CN114879929A (en) Multimedia file playing method and device
CN113707183A (en) Audio processing method and device in video
CN112416116B (en) Vibration control method and system for computer equipment
CN118038888A (en) Determination method and device for white definition, electronic equipment and storage medium
CN115333879B (en) Remote conference method and system
CN110739006A (en) Audio processing method and device, storage medium and electronic equipment
US12073844B2 (en) Audio-visual hearing aid
US20230360662A1 (en) Method and device for processing a binaural recording
US11361777B2 (en) Sound prioritisation system and method
CN113488068B (en) Audio anomaly detection method, device and computer readable storage medium
CN115119110A (en) Sound effect adjusting method, audio playing device and computer readable storage medium
CN113392234A (en) Multimedia file processing method, device, equipment and medium
CN112333531A (en) Audio data playing method and device and readable storage medium
CN112562737B (en) Method, device, medium and electronic equipment for evaluating audio processing quality
CN112735455A (en) Method and device for processing sound information
CN111145769A (en) Audio processing method and device
CN113808615B (en) Audio category positioning method, device, electronic equipment and storage medium
JP7499459B2 (en) Control device, control method, and program
CN115484503B (en) Bullet screen generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination