CN115119007A - Big data based audio acquisition and processing system and method for online live broadcast recording

Info

Publication number: CN115119007A
Application number: CN202210724426.6A
Authority: CN (China)
Prior art keywords: audio, image, recording, target, data
Legal status: Granted; currently active
Other languages: Chinese (zh)
Other versions: CN115119007B (granted publication)
Inventor: 冼文忠
Current and original assignee: Xinyingke Electroacoustic Technology Co ltd

Classifications

    • H04N21/2187: Live feed (under H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]; H04N21/20 Servers for content distribution; H04N21/21 Server components or server architectures; H04N21/218 Source of audio or video content)
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects (under H04N5/00 Details of television systems; H04N5/222 Studio circuitry, devices and equipment)
    • H04N5/91: Television signal processing for recording (under H04N5/76 Television signal recording)
    • H04N5/9202: Transformation of the television signal for recording involving the multiplexing of an additional sound signal with the video signal (under H04N5/92 Transformation of the television signal for recording, e.g. modulation, frequency changing; H04N5/9201 multiplexing of an additional signal)

Abstract

The invention discloses a big-data-based audio acquisition and processing system and method for online live broadcast recording, comprising a video data acquisition module, a recording-end data analysis module, a to-be-analyzed set judgment module, a first corrected audio acquisition module and a background sound adjustment module. The video data acquisition module acquires video data from the online live broadcast recording end; the recording-end data analysis module analyzes the image data and audio data of the recording end; the to-be-analyzed set judgment module judges whether the set to be analyzed is empty; when the set to be analyzed is not empty, the first corrected audio acquisition module analyzes the image information acquired at the listening end and marks a first corrected audio; the background sound adjustment module adjusts the proportion of the background sound to the subject sound in the first corrected audio. The invention preprocesses the corresponding audio according to image data that carries a serial mark, so as to improve the listening experience at the listening end.

Description

Big data based audio acquisition and processing system and method for online live broadcast recording
Technical Field
The invention relates to the technical field of audio acquisition and processing, in particular to an audio acquisition and processing system and method for online live broadcast recording based on big data.
Background
With the continuous development of science and technology, online live broadcast recording has become another way for people to communicate. Online live broadcast distinguishes between a recording end and a listening end: a live recording not only conveys the expression and actions of the recorded person in the video, but also transmits the recorded person's voice synchronously, giving the listening end the combined enjoyment of hearing and vision.
However, when sound is recorded at the recording end, sounds other than the recorded person's voice are also captured; some of these are meant to be transmitted to the audience, while others are interference that cannot be completely avoided. When such "noise" produced at the recording end reaches the listener, it degrades the listening experience; the invention therefore addresses the problem of reducing the proportion of audio other than the recorded person's voice.
Disclosure of Invention
The invention aims to provide a big-data-based audio acquisition and processing system and method for online live broadcast recording, so as to solve the problems raised in the background art.
In order to solve the above technical problems, the invention provides the following technical scheme: a big-data-based audio acquisition and processing method for online live broadcast recording, comprising the following specific steps:
Step S100: acquire video data from the online live broadcast recording end, the video data comprising image data and audio data; divide the recording end's audio data and image data continuously and equally, in time order, into one-to-one corresponding segments; the recording end's audio data comprises a recorded subject sound and a recorded background sound, where the recorded subject sound is the audio of the content spoken by the recorded person and the recorded background sound is the audio corresponding to non-recorded content;
Step S200: record the image data corresponding to the recorded subject sound as a first image set and the image data corresponding to the recorded background sound as a second image set; analyze the behavior of the recorded person in the first and second image sets of the recording end, and take the intersection of the first image set and the second image set as the set to be analyzed;
Step S300: judge the set to be analyzed, the first image set and the second image set both being non-empty; when the set to be analyzed is empty (0), analyze the condition set under which the recorded background sound in the second image set needs correction; when the set to be analyzed is not empty, record it as the target set and extract the corresponding images in the target set as target images, the audio data corresponding to a target image being the target audio, which comprises a target subject sound and a target background sound;
Step S400: record the image information acquired at the listening end at the generation times of the images in the target set, the image information being listener images captured by the listening end's camera device; analyze the relation between the image information acquired at the listening end and the target audio, and mark a first corrected audio;
Step S500: based on the first corrected audio marked in step S400, adjust the target background sound in the first corrected audio. When background sound and subject sound are present simultaneously, they interfere with the listening end's perception of the audio, give a poor listening experience, make the subject sound recorded at the recording end less clear, and reduce recording efficiency.
Further, analyzing the behavior of the recorded person in the image data of the recording end comprises the following steps:
Step S210: mark the positions of the head, eyes and elbow of the recorded person in the first image set and the second image set; establish a rectangular coordinate system with the center point of the image data as the origin; record the mean angle R1 formed, over the course of its changes, by the line segment from the head position (with the nose as the fixed point) to the origin in the first image set, and the corresponding mean angle R2 in the second image set; calculate the difference between R1 and R2, and when the difference is greater than or equal to a preset difference threshold, record the head of the recorded person as a first serial mark;
Step S220: when the difference is smaller than the preset difference threshold, acquire the eye expansion ratios E = {E1a, E1b, E2a, E2b}, where an eye expansion ratio is the ratio of the exposed eyeball area to the whole eye area, the whole eye area being the rectangular region below the eyebrow and above the eyelid; using the formula:
[Formula: the dynamic eye index e is computed from E1a, E1b, E2a and E2b; the equation is published only as an image (BDA0003710396260000021) in the original document.]
calculate the dynamic eye index e of the recorded person, where E1a is the mean eyeball expansion ratio when the recorded person's head position change angle in the first image set is less than or equal to R1, E1b is the mean when that angle is greater than R1, E2a is the mean when the head position change angle in the second image set is less than or equal to R2, and E2b is the mean when that angle is greater than R2;
the dynamic eye index is calculated in order to analyze the dynamic trend of the recorded person's eye behavior under the same tendency of head-angle change in the images corresponding to different audios; "the same tendency" means that when the recorded person's head position change angle in the first image set is below the mean angle, the value in the second image set is likewise below its mean; the dynamic trend of eye behavior influenced by head-angle change within the same set is analyzed, and the differences in the recorded person's eye behavior across scenes are analyzed comprehensively;
Step S230: compare the recorded person's dynamic eye index e with a preset dynamic index threshold e0; when e >= e0, record the eyes of the recorded person as a second serial mark, since an index above the threshold means the recorded person's eye behavior differs dynamically across scenes; when e < e0, acquire the dwell time h1k of the recorded person's elbow in the k-th quadrant in the first image set and the dwell time h2k in the k-th quadrant in the second image set; arrange the corresponding quadrants of the first image set in descending order of dwell time into a set K1 and those of the second image set into a set K2; judge whether the first quadrants of K1 and K2 are the same, and when they differ, mark the leading quadrant region corresponding to the second image set as a third serial mark.
By analyzing the image data of the recording end, the differences in the recorded person's behavior between the images corresponding to the subject sound and those corresponding to the background sound are analyzed case by case. The image sets divided according to the audio give rise to different scenarios, and different scenarios yield different serial marks. The head of the recorded person is judged first because the head is easy to calibrate in the video image, so the head difference is analyzed simply and quickly; the eyes are analyzed further when the head shows no difference, covering the case where the recorded person's head position is unchanged but the images corresponding to the audio differ because of dynamic eye changes; if the dynamic eye changes cannot effectively distinguish the recorded person's behavior between the images corresponding to subject sound and background sound, the elbow is analyzed further. Through this triple verification, the behavioral differences of the recorded person between the images corresponding to subject sound and background sound can be marked effectively.
Further, when the set to be analyzed is empty, analyzing the condition set under which the recorded background sound in the second image set needs correction comprises the following steps:
acquire the i-th image p2i in the second image set and the image p1i in the first image set corresponding to the adjacent preceding time period; substitute the set formed by images p1i and p2i into the procedure of steps S210 to S230 for analysis, and judge the resulting target serial mark, which is any one of {first serial mark, second serial mark, third serial mark};
when the target serial mark is the first serial mark, record the angle threshold of image p1i as [R(p1i)min, R(p1i)max], this angle threshold being correction condition one, where R(p1i)min is the minimum and R(p1i)max the maximum angle formed, during the change, by the line segment from the head position (nose as fixed point) to the origin in image p1i;
when the target serial mark is the second serial mark, establish the relation pairs {first serial mark → second serial mark} in image p1i, a relation pair being the eye-opening proportion corresponding to each head position; record the relation-pair threshold {first serial mark min → second serial mark max} of image p1i as correction condition two, where "first serial mark min → second serial mark max" denotes all combinations of the minimum head angle (nose-to-origin segment during the change) with the maximum corresponding eye-opening proportion;
when the target serial mark is the third serial mark, acquire the quadrant K0 corresponding to the first image set before the generation time of the image corresponding to the third serial mark; acquire the quadrant set {K1a, K2a, K3a}, i.e. the quadrants other than the third serial mark arranged in descending order of dwell time, and the corresponding quadrants {K1b, K2b, K3b} in the first image set before the times corresponding to the images of {K1a, K2a, K3a}; construct the quadrant dynamic paths Q(KA → KB) and order them by ascending quadrant difference, with the path corresponding to the third serial mark always first; this priority ordering is correction condition three. A long dwell time indicates a large proportion of background sound and a large influence on the audio recording, so that case is examined first; the quadrant differences are sorted ascending because, when different audios correspond to adjacent image data, a small quadrant change is hard to monitor, whereas a large quadrant change implies a large movement by the recorded person that would likely already have produced a difference in the previous two analysis layers; hence small quadrant differences are examined with priority;
the condition set for correcting the recorded background sound is {correction condition one, correction condition two, correction condition three}; when the recording end's image data is detected to satisfy any correction condition, the volume of the background sound is reduced in the state corresponding to the {first serial mark, second serial mark, third serial mark}.
Further, analyzing the relation between the image information acquired at the listening end and the target audio, and marking the first corrected audio, comprises the following steps:
Step S410: acquire the image information of the listening end corresponding to the target audio and establish a facial extension change curve from the listener images; extract the target audio corresponding to curve points whose rate of change is greater than or equal to a rate-of-change threshold, and record it as first target audio;
Step S420: record the target audio corresponding to curve points whose rate of change is below the threshold as second target audio; compare the similarity between the target image corresponding to the first target audio and the target image corresponding to the second target audio; if the similarity is below a similarity threshold, mark the first target audio as first corrected audio; if the similarity is greater than or equal to the threshold, do not mark it. When background sound and subject sound occur, the listener's facial expression at the listening end reflects how well the recording end's audio is received: if the recording end produces noise or harsh sound, the listener's face shows a corresponding reaction, from which it can be estimated whether the recording end's background and subject sounds affect the audio at the listening end. The similarity is analyzed because, if the images at rates of facial-extension change above the threshold differ clearly from those below it, the audio at the above-threshold moments is affecting the listening end; if the two sets of images show no clear difference, the above-threshold facial extension is probably due to the listener's own behavior, and the recording end's audio data need not be adjusted.
Further, adjusting the target background sound in the first corrected audio marked in step S400 comprises the following steps:
Step S510: acquire the first corrected audio, and mix and synthesize the target subject sound and the target background sound in it; the mixing is performed at a mixing ratio of target subject sound : target background sound = s0 : g0, with s0 > g0;
Step S520: acquire the mixing ratio at which the listening end receives the second target audio, and calculate the mixing-ratio threshold interval G = [s1/g1, s2/g2] corresponding to the second target audio, where s1/g1 is the minimum and s2/g2 the maximum ratio of target subject sound to target background sound in the second target audio;
Step S530: adjust s0/g0 so that s0/g0 ∈ [s1/g1, s2/g2].
The big-data-based audio acquisition and processing system for online live broadcast recording comprises a video data acquisition module, a recording-end data analysis module, a to-be-analyzed set judgment module, a first corrected audio acquisition module and a background sound adjustment module;
the video data acquisition module acquires video data from the online live broadcast recording end, the video data comprising image data and audio data; the recording end's audio data and image data are divided continuously and equally, in time order, into one-to-one corresponding segments; the recording end's audio data comprises a recorded subject sound and a recorded background sound, the recorded subject sound being the audio of the content spoken by the recorded person and the recorded background sound being the audio corresponding to non-recorded content;
the recording-end data analysis module analyzes the image data and audio data of the recording end;
the to-be-analyzed set judgment module judges whether the set to be analyzed acquired by the recording-end data analysis module is empty;
the first corrected audio acquisition module, when the set to be analyzed is not empty, analyzes the image information acquired at the listening end and marks a first corrected audio;
the background sound adjustment module adjusts the proportion of the background sound to the subject sound in the first corrected audio.
Further, the recording end data analysis module comprises a first image collection acquisition unit, a second image collection acquisition unit and a serial connection mark analysis unit;
the first image set acquisition unit acquires image data corresponding to the recording main body voice and records the image data as a first image set;
the second image set acquisition unit acquires image data corresponding to the recorded background sound and records the image data as a second image set;
the serial mark analysis unit analyzes the positional relations of the recorded person's head, eyes and elbow, establishing a progressive, layer-by-layer analysis method from the head to the eyes to the elbow.
Further, the to-be-analyzed set judgment module comprises a data substitution unit, a scene analysis unit and a condition set acquisition unit;
the data substituting unit substitutes and analyzes the actual data in the acquired collection to be analyzed based on the analysis method of the serial connection mark analysis unit;
the scene analysis unit analyzes different scenes based on the result of the data substitution unit;
the condition set acquisition unit forms a condition set based on the correction conditions obtained by the scene analysis unit, and reduces the volume of the background sound when detecting that the image data of the recording end meets any correction condition.
Further, the first correction audio acquisition module comprises a face extension degree change curve establishing unit, a curve point marking unit and a similarity comparison unit;
the face extension degree change curve establishing unit is used for acquiring image information of a listening end corresponding to the target audio and establishing a face extension degree change curve of the image of the listener;
the curve point marking unit is used for marking the target audio corresponding to the curve point which is greater than or equal to the change rate threshold value on the face extension degree change curve and recording the target audio as a first target audio; marking the target audio corresponding to the curve point with the change rate smaller than the change rate threshold value in the face extension degree change curve, and recording the target audio as a second target audio;
the similarity comparison unit is used for comparing the similarity of a target image corresponding to the first target audio and a target image corresponding to the second target audio, and if the similarity is smaller than a similarity threshold value, the first target audio is marked as a first correction audio; and if the similarity is greater than or equal to the similarity threshold, not marking.
Compared with the prior art, the invention has the following beneficial effects: the invention combines image data and audio data in the analysis of the recording end and, based on big data, analyzes the recorder's behavior with a method that covers three different parts of the recording-end image, performing layer-by-layer progressive analysis that is both accurate and fast; meanwhile, the actual data under different conditions are substituted into the behavior analysis method to obtain the corresponding serial marks, and the corresponding audio is preprocessed according to image data that carries a serial mark, so that the audio proportion of the background sound is adjusted in time whenever the background sound gives the listener a poor impression; this satisfies the listener's experience, improves the recorder's recording efficiency, and makes the subject sound clearer.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic structural diagram of the big-data-based audio acquisition and processing system for online live broadcast recording according to the present invention;
FIG. 2 is a frame diagram of the audio acquisition and processing hardware used in an embodiment of the system and method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to figs. 1-2, the present invention provides a technical solution: a big-data-based audio acquisition and processing method for online live broadcast recording, comprising the following specific steps:
Step S100: acquire video data from the online live broadcast recording end, the video data comprising image data and audio data; divide the recording end's audio data and image data continuously and equally, in time order, into one-to-one corresponding segments; the recording end's audio data comprises a recorded subject sound and a recorded background sound, where the recorded subject sound is the audio of the content spoken by the recorded person and the recorded background sound is the audio corresponding to non-recorded content;
Step S200: record the image data corresponding to the recorded subject sound as a first image set and the image data corresponding to the recorded background sound as a second image set, and analyze the behavior of the recorded person in the two sets. Two situations are possible. In the first, only the subject sound is present while the recorded person speaks, and background sound appears when the subject sound stops: for example, when a teacher lecturing online pauses and wipes the board, the friction sound of wiping the board is background sound. In the second, subject sound and background sound exist at the same time: for example, the teacher writes on the board while lecturing, so the friction sound of the writing is background sound, or loudspeaker sound from the road outside the window acts as background sound alongside the subject sound. Take the intersection of the first image set and the second image set as the set to be analyzed;
Step S300: judge the set to be analyzed, the first image set and the second image set both being non-empty; when the set to be analyzed is empty (0), analyze the condition set under which the recorded background sound in the second image set needs correction; when the set to be analyzed is not empty, record it as the target set and extract the corresponding images in the target set as target images, the audio data corresponding to a target image being the target audio, which comprises a target subject sound and a target background sound;
Step S400: record the image information acquired at the listening end at the generation times of the images in the target set, the image information being listener images captured by the listening end's camera device; analyze the relation between the image information acquired at the listening end and the target audio, and mark a first corrected audio;
Step S500: based on the first corrected audio marked in step S400, adjust the target background sound in the first corrected audio. When background sound and subject sound are present simultaneously, they interfere with the listening end's perception of the audio, give a poor listening experience, make the subject sound recorded at the recording end less clear, and reduce recording efficiency.
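As a concrete illustration of the segmentation in step S100, the following minimal sketch (not part of the original disclosure; function and variable names are illustrative) divides an audio sample stream and a video frame list into equal, time-aligned, one-to-one corresponding segments:

```python
import numpy as np

def split_into_segments(audio, sample_rate, frames, fps, seg_seconds=1.0):
    """Split audio samples and video frames into equal, time-aligned
    segments (step S100): audio segment i corresponds one-to-one with
    image segment i."""
    samples_per_seg = int(sample_rate * seg_seconds)
    frames_per_seg = int(fps * seg_seconds)
    n = min(len(audio) // samples_per_seg, len(frames) // frames_per_seg)
    segments = []
    for i in range(n):
        a = audio[i * samples_per_seg:(i + 1) * samples_per_seg]
        f = frames[i * frames_per_seg:(i + 1) * frames_per_seg]
        segments.append((a, f))
    return segments

# Example: 10 s of 48 kHz audio alongside 25 fps frames.
audio = np.zeros(48000 * 10)
frames = [f"frame_{i}" for i in range(250)]
print(len(split_into_segments(audio, 48000, frames, 25)))  # -> 10 segments
```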
Analyzing the behavior of the recorded person in the image data of the recording end comprises the following steps:
Step S210: mark the positions of the head, eyes and elbow of the recorded person in the first image set and the second image set; establish a rectangular coordinate system with the center point of the image data as the origin; record the mean angle R1 formed, over the course of its changes, by the line segment from the head position (with the nose as the fixed point) to the origin in the first image set, and the corresponding mean angle R2 in the second image set; calculate the difference between R1 and R2, and when the difference is greater than or equal to a preset difference threshold, record the head of the recorded person as a first serial mark;
Step S220: when the difference is smaller than the preset difference threshold, acquire the eye expansion ratios E = {E1a, E1b, E2a, E2b}, where an eye expansion ratio is the ratio of the exposed eyeball area to the whole eye area, the whole eye area being the rectangular region below the eyebrow and above the eyelid; using the formula:
[Formula: the dynamic eye index e is computed from E1a, E1b, E2a and E2b; the equation is published only as an image (BDA0003710396260000081) in the original document.]
calculate the dynamic eye index e of the recorded person, where E1a is the mean eyeball expansion ratio when the recorded person's head position change angle in the first image set is less than or equal to R1, E1b is the mean when that angle is greater than R1, E2a is the mean when the head position change angle in the second image set is less than or equal to R2, and E2b is the mean when that angle is greater than R2;
the dynamic eye index is calculated in order to analyze the dynamic trend of the recorded person's eye behavior under the same tendency of head-angle change in the images corresponding to different audios; "the same tendency" means that when the recorded person's head position change angle in the first image set is below the mean angle, the value in the second image set is likewise below its mean; the dynamic trend of eye behavior influenced by head-angle change within the same set is analyzed, and the differences in the recorded person's eye behavior across scenes are analyzed comprehensively;
Step S230: compare the recorded person's dynamic eye index e with a preset dynamic index threshold e0; when e >= e0, record the eyes of the recorded person as a second serial mark, since an index above the threshold means the recorded person's eye behavior differs dynamically across scenes; when e < e0, acquire the dwell time h1k of the recorded person's elbow in the k-th quadrant in the first image set and the dwell time h2k in the k-th quadrant in the second image set; arrange the corresponding quadrants of the first image set in descending order of dwell time into a set K1 and those of the second image set into a set K2; judge whether the first quadrants of K1 and K2 are the same, and when they differ, mark the leading quadrant region corresponding to the second image set as a third serial mark.
By analyzing the image data of the recording end, the differences in the recorded person's behavior between the images corresponding to the subject sound and those corresponding to the background sound are analyzed case by case. The image sets divided according to the audio give rise to different scenarios, and different scenarios yield different serial marks. The head of the recorded person is judged first because the head is easy to calibrate in the video image, so the head difference is analyzed simply and quickly; the eyes are analyzed further when the head shows no difference, covering the case where the recorded person's head position is unchanged but the images corresponding to the audio differ because of dynamic eye changes; if the dynamic eye changes cannot effectively distinguish the recorded person's behavior between the images corresponding to subject sound and background sound, the elbow is analyzed further. Through this triple verification, the behavioral differences of the recorded person between the images corresponding to subject sound and background sound can be marked effectively.
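The layered head → eyes → elbow decision described above can be sketched as follows. This is a minimal illustration, not from the patent: all names are illustrative, and the eye index e is assumed to be precomputed, since its exact formula is published only as an image.

```python
def serial_mark(head_angles_1, head_angles_2, eye_index, e0,
                dwell_1, dwell_2, angle_diff_threshold):
    """Progressive head -> eyes -> elbow analysis (steps S210-S230).
    head_angles_*: per-frame head angles (nose-to-origin segment) in the
    first/second image set; eye_index: the precomputed dynamic index e;
    dwell_*: dict mapping quadrant -> elbow dwell time."""
    r1 = sum(head_angles_1) / len(head_angles_1)   # mean angle R1
    r2 = sum(head_angles_2) / len(head_angles_2)   # mean angle R2
    if abs(r1 - r2) >= angle_diff_threshold:
        return "first serial mark (head)"
    if eye_index >= e0:
        return "second serial mark (eyes)"
    top_1 = max(dwell_1, key=dwell_1.get)   # longest-dwell quadrant, set 1
    top_2 = max(dwell_2, key=dwell_2.get)   # longest-dwell quadrant, set 2
    if top_1 != top_2:
        return "third serial mark (elbow quadrant)"
    return "no mark"

# Dwell times taken from the worked example given further below.
print(serial_mark([10, 12], [11, 13], eye_index=0.2, e0=0.5,
                  dwell_1={1: 15, 2: 3, 3: 1, 4: 2},
                  dwell_2={1: 1, 2: 1, 3: 6, 4: 5},
                  angle_diff_threshold=5))
# -> third serial mark (elbow quadrant)
```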
When the set to be analyzed is empty, analyzing the condition set under which the recorded background sound in the second image set needs correction comprises the following steps:
acquire the i-th image p2i in the second image set and the image p1i in the first image set corresponding to the adjacent preceding time period; substitute the set formed by images p1i and p2i into the procedure of steps S210 to S230 for analysis, and judge the resulting target serial mark, which is any one of {first serial mark, second serial mark, third serial mark};
when the target serial mark is the first serial mark, record the angle threshold of image p1i as [R(p1i)min, R(p1i)max], this angle threshold being correction condition one, where R(p1i)min is the minimum and R(p1i)max the maximum angle formed, during the change, by the line segment from the head position (nose as fixed point) to the origin in image p1i;
when the target serial mark is the second serial mark, establish the relation pairs {first serial mark → second serial mark} in image p1i, a relation pair being the eye-opening proportion corresponding to each head position; record the relation-pair threshold {first serial mark min → second serial mark max} of image p1i as correction condition two, where "first serial mark min → second serial mark max" denotes all combinations of the minimum head angle (nose-to-origin segment during the change) with the maximum corresponding eye-opening proportion;
when the target serial mark is the third serial mark, acquire the quadrant K0 corresponding to the first image set before the generation time of the image corresponding to the third serial mark; acquire the quadrant set {K1a, K2a, K3a}, i.e. the quadrants other than the third serial mark arranged in descending order of dwell time, and the corresponding quadrants {K1b, K2b, K3b} in the first image set before the times corresponding to the images of {K1a, K2a, K3a}; construct the quadrant dynamic paths Q(KA → KB) and order them by ascending quadrant difference, with the path corresponding to the third serial mark always first; this priority ordering is correction condition three. A long dwell time indicates a large proportion of background sound and a large influence on the audio recording, so that case is examined first; the quadrant differences are sorted ascending because, when different audios correspond to adjacent image data, a small quadrant change is hard to monitor, whereas a large quadrant change implies a large movement by the recorded person that would likely already have produced a difference in the previous two analysis layers; hence small quadrant differences are examined with priority;
For example, suppose the elbow dwell durations (with timestamps in parentheses) are:
in the first image set: {quadrant 1: 15 min (12:11), quadrant 2: 3 min (12:02), quadrant 3: 1 min (12:31), quadrant 4: 2 min (12:28)};
in the second image set: {quadrant 1: 1 min (12:27), quadrant 2: 1 min (12:30), quadrant 3: 6 min (12:05), quadrant 4: 5 min (12:32)}.
The third serial mark is then quadrant 3, and the quadrant K0 corresponding to the first image set before the generation time of the image corresponding to the third serial mark is quadrant 2, so the quadrant dynamic path corresponding to the third serial mark is {quadrant 3 → K0: quadrant 2}.
The quadrant set arranged in descending order of dwell time, excluding the third serial mark, is {K1a (quadrant 4), K2a (quadrant 2) = K3a (quadrant 1)}; the corresponding quadrants in the first image set before the respective image generation times are {K1b (quadrant 3), K2b (quadrant 4), K3b (quadrant 1)}.
The corresponding differences are: {|K1a (quadrant 4) − K1b (quadrant 3)| = 1, |K2a (quadrant 2) − K2b (quadrant 4)| = 2, |K3a (quadrant 1) − K3b (quadrant 1)| = 0};
so the priority is {third serial mark → K0, then K3a → K3b, then K1a → K1b, then K2a → K2b}.
The condition set for correcting the recorded background sound is {correction condition one, correction condition two, correction condition three}; when the recording end's image data is detected to satisfy any correction condition, the volume of the background sound is reduced in the state corresponding to the {first serial mark, second serial mark, third serial mark}.
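The priority ordering of quadrant dynamic paths in correction condition three can be reproduced with a short sketch (illustrative names; the data are taken from the worked example above):

```python
def path_priority(third_mark_path, pairs):
    """Order quadrant dynamic paths Q(KA -> KB) (correction condition
    three): the path for the third serial mark always comes first; the
    rest sort by ascending absolute quadrant difference.
    pairs: list of (ka, kb) quadrant numbers."""
    rest = sorted(pairs, key=lambda p: abs(p[0] - p[1]))
    return [third_mark_path] + rest

# From the example: {K1a=4 -> K1b=3}, {K2a=2 -> K2b=4}, {K3a=1 -> K3b=1}.
print(path_priority(("quadrant 3", "K0: quadrant 2"),
                    [(4, 3), (2, 4), (1, 1)]))
# -> third-mark path first, then (1, 1) diff 0, (4, 3) diff 1, (2, 4) diff 2
```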
Analyzing the relation between the image information acquired at the listening end and the target audio, and marking the first corrected audio, comprises the following steps:
Step S410: acquire the image information of the listening end corresponding to the target audio and establish a facial extension change curve from the listener images; extract the target audio corresponding to curve points whose rate of change is greater than or equal to a rate-of-change threshold, and record it as first target audio;
Step S420: record the target audio corresponding to curve points whose rate of change is below the threshold as second target audio; compare the similarity between the target image corresponding to the first target audio and the target image corresponding to the second target audio; if the similarity is below a similarity threshold, mark the first target audio as first corrected audio; if the similarity is greater than or equal to the threshold, do not mark it. When background sound and subject sound occur, the listener's facial expression at the listening end reflects how well the recording end's audio is received: if the recording end produces noise or harsh sound, the listener's face shows a corresponding reaction, from which it can be estimated whether the recording end's background and subject sounds affect the audio at the listening end. The similarity is analyzed because, if the images at rates of facial-extension change above the threshold differ clearly from those below it, the audio at the above-threshold moments is affecting the listening end; if the two sets of images show no clear difference, the above-threshold facial extension is probably due to the listener's own behavior, and the recording end's audio data need not be adjusted.
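A minimal sketch of steps S410-S420 follows. It assumes the facial extension curve is sampled at known timestamps and that an image-similarity callback is supplied, since the patent fixes neither the curve construction nor the similarity metric:

```python
import numpy as np

def mark_first_corrected(extension_curve, timestamps, rate_threshold,
                         image_similarity, sim_threshold):
    """Steps S410-S420: flag target-audio instants where the listener's
    facial-extension curve changes quickly, then confirm via image
    similarity. image_similarity(t_fast, t_slow) -> value in [0, 1] is
    an assumed callback comparing recording-end target images."""
    rates = np.abs(np.gradient(extension_curve, timestamps))
    fast = [t for t, r in zip(timestamps, rates) if r >= rate_threshold]
    slow = [t for t, r in zip(timestamps, rates) if r < rate_threshold]
    first_corrected = []
    for tf in fast:
        # Mark only if the recording-end images differ clearly between
        # fast- and slow-change instants (similarity below threshold).
        if all(image_similarity(tf, ts) < sim_threshold for ts in slow):
            first_corrected.append(tf)
    return first_corrected

curve = np.array([0.50, 0.51, 0.80, 0.52])
ts = np.array([0.0, 1.0, 2.0, 3.0])
print(mark_first_corrected(curve, ts, 0.2, lambda a, b: 0.3, 0.6))
# -> [3.0]: the instant whose change rate exceeds 0.2, confirmed by
#    low similarity (0.3 < 0.6) against every slow-change instant
```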
Adjusting the target background sound in the first corrected audio marked in step S400 comprises the following steps:
Step S510: acquire the first corrected audio, and mix and synthesize the target subject sound and the target background sound in it; the mixing is performed at a mixing ratio of target subject sound : target background sound = s0 : g0, with s0 > g0;
Step S520: acquire the mixing ratio at which the listening end receives the second target audio, and calculate the mixing-ratio threshold interval G = [s1/g1, s2/g2] corresponding to the second target audio, where s1/g1 is the minimum and s2/g2 the maximum ratio of target subject sound to target background sound in the second target audio;
Step S530: adjust s0/g0 so that s0/g0 ∈ [s1/g1, s2/g2].
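Steps S510-S530 amount to clamping the mixing ratio into the interval learned from the second target audio. The sketch below makes the added assumption that the subject level s0 is held fixed while the background level g0 is rescaled; the patent only requires that s0/g0 land in the interval:

```python
def adjust_mix(s0, g0, s1, g1, s2, g2):
    """Steps S510-S530: clamp the subject/background mixing ratio s0/g0
    into the acceptable band [s1/g1, s2/g2] taken from the second
    target audio. Keeps s0 fixed and rescales g0 (one possible
    realization of the adjustment)."""
    low, high = s1 / g1, s2 / g2
    ratio = s0 / g0
    target = min(max(ratio, low), high)   # clamp ratio into [low, high]
    return s0, s0 / target                # new (s0, g0)

s0, g0 = adjust_mix(s0=3.0, g0=2.0, s1=2.0, g1=1.0, s2=4.0, g2=1.0)
print(s0, g0, s0 / g0)  # 3.0 1.5 2.0 -> ratio now inside [2.0, 4.0]
```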
The system comprises a video data acquisition module, a recording-end data analysis module, a to-be-analyzed set judgment module, a first corrected audio acquisition module and a background sound adjustment module;
the video data acquisition module acquires video data from the online live broadcast recording end, the video data comprising image data and audio data; the recording end's audio data and image data are divided continuously and equally, in time order, into one-to-one corresponding segments; the recording end's audio data comprises a recorded subject sound and a recorded background sound, the recorded subject sound being the audio of the content spoken by the recorded person and the recorded background sound being the audio corresponding to non-recorded content;
the recording-end data analysis module analyzes the image data and audio data of the recording end;
the to-be-analyzed set judgment module judges whether the set to be analyzed acquired by the recording-end data analysis module is empty;
the first corrected audio acquisition module, when the set to be analyzed is not empty, analyzes the image information acquired at the listening end and marks a first corrected audio;
the background sound adjustment module adjusts the proportion of the background sound to the subject sound in the first corrected audio.
Embodiment as shown in fig. 2: the video data acquisition module of the system uses an audio acquisition and processing unit with the following composition: an analog-to-digital/digital-to-analog converter (ADC/DAC), a large-scale gate-array integrated circuit (FPGA), an integrated touch-display module, an OTG chip, a Bluetooth chip, FLASH memory, potentiometers, keys and other controls;
the system provides 4 microphone input interfaces that accept 4 simultaneous speakers or singers, 2 groups of stereo input interfaces for electronic keyboards or other instrument players, 1 digital OTG audio input/output interface matched to a smartphone terminal, 1 high-speed USB digital audio input/output interface, 5 headphone output interfaces, 1 monitoring audio output interface, wireless Bluetooth audio, and so on. The large-scale gate-array integrated circuit forms an 8 x 8 audio matrix that freely assigns signal routes to each input/output interface and performs processing such as audio signal gain, volume, effects, noise and equalization;
the acquisition system uses a large touch display screen with a proprietary human-machine interface: a large number of physical operation keys can be simulated on the touch screen, which effectively reduces the number of desktop keys of the whole audio acquisition and processing system for online live broadcast recording, reduces its volume and miniaturizes the product. The large touch screen visualizes the functions and states of the system's audio channels, so that a user can observe and understand them easily and vividly, effectively shortening the time needed to learn the system;
the acquisition system adopts adjustable RGB three-primary-color LEDs driven by PWM pulse adjustment, so that the color and brightness of each physical key on the system desktop can be set. Users can customize the brightness and color of each key according to personal taste and ambient lighting, which makes the product more pleasant and attractive to use and, during long sessions, offers some protection for the user's eyes. The keys of traditional products can only be switched on or off, with no brightness or color setting;
the acquisition system provides a digital audio OTG interface and transmits the acquired and processed audio signals to a smartphone in digital form, achieving lossless, high-fidelity audio input and output. The system supports the main smartphone systems on the market, such as Huawei HarmonyOS, Android and Apple iOS; connected by an OTG data cable to the smartphone's Type-C or Lightning interface, it can transmit lossless high-fidelity digital audio of up to 24-bit/48 kHz in both directions, and can also charge the connected smartphone during use, realizing charging while transmitting. Traditional products are mostly connected to a smartphone's headphone interface or an external headphone adapter through a 4-pole audio cable to transmit analog audio signals, which yields a poor signal-to-noise ratio, narrow frequency response, small dynamic range and susceptibility to interference. The system thus solves the problems of the traditional analog connection, in which the transmitted audio signal has a limited dynamic range, high noise and easy interference that degrade its quality.
The recording end data analysis module comprises a first image set acquisition unit, a second image set acquisition unit and a serial mark analysis unit;
the first image set acquisition unit acquires image data corresponding to the recording main body voice and records the image data as a first image set;
the second image set acquisition unit acquires image data corresponding to the recorded background sound and records the image data as a second image set;
the tandem mark analysis unit is used for analyzing the position relation of the head, the eyes and the elbow of the recorded person and establishing a progressive analysis method from the head, the eyes to the elbow through layer-by-layer analysis.
The to-be-analyzed set judgment module comprises a data substitution unit, a scene analysis unit and a condition set acquisition unit;
the data substituting unit substitutes and analyzes the actual data in the acquired collection to be analyzed based on the analysis method of the serial connection mark analysis unit;
the scene analysis unit analyzes different scenes based on the result of the data substitution unit;
the condition set acquisition unit forms a condition set based on the correction conditions obtained by the scene analysis unit, and reduces the volume of the background sound when detecting that the image data of the recording end meets any correction condition.
The first correction audio acquisition module comprises a face extension degree change curve establishing unit, a curve point marking unit and a similarity comparison unit;
the face extension degree change curve establishing unit is used for acquiring image information of a listening end corresponding to the target audio and establishing a face extension degree change curve of the image of the listener;
the curve point marking unit is used for marking the target audio corresponding to the curve point which is greater than or equal to the change rate threshold value on the face extension degree change curve and recording the target audio as a first target audio; marking the target audio corresponding to the curve point with the change rate smaller than the change rate threshold value in the face extension degree change curve, and recording the target audio as a second target audio;
the similarity comparison unit is used for comparing the similarity of a target image corresponding to the first target audio and a target image corresponding to the second target audio, and if the similarity is smaller than a similarity threshold value, the first target audio is marked as a first correction audio; and if the similarity is greater than or equal to the similarity threshold, not marking.
Embodiment in fig. 2: the background sound adjustment module of the system can also perform multi-track recording and synthesis, i.e., record the audio signals of all input channels into storage for the mixing-ratio adjustment. Clicking the recording key on the system desktop starts the recording function with one touch; the function is independent, so recording does not affect the system's other functions. The system uses the 'POLYWAV' multi-track recording technique to process audio signals of up to 16 channels independently, forming one recording file containing 16 tracks that is written to a storage device. The file can restore 14 independent audio channels in a computer DAW-format recording program, each of which can be processed individually in later editing. Traditional products can only mix the multi-channel audio signals down to a 2-channel stereo format before recording them to storage, so the audio signal of each channel cannot be processed independently in later audio editing.
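As an illustration of the idea of keeping tracks independent in one file (the actual 'POLYWAV' layout is not specified in the text, and 16-bit depth is assumed here for simplicity), a sketch that writes 16 mono tracks into a single multichannel WAV:

```python
import wave
import numpy as np

def write_multitrack(path, tracks, sample_rate=48000):
    """Write n mono tracks into one multichannel WAV file so that each
    channel stays independent for later per-track editing.
    tracks: (n_channels, n_samples) int16 array."""
    n_channels, n_samples = tracks.shape
    interleaved = tracks.T.reshape(-1)   # sample-interleave the channels
    with wave.open(path, "wb") as f:
        f.setnchannels(n_channels)
        f.setsampwidth(2)                # 16-bit samples (assumption)
        f.setframerate(sample_rate)
        f.writeframes(interleaved.astype("<i2").tobytes())

tracks = (np.random.randn(16, 4800) * 1000).astype(np.int16)
write_multitrack("session.wav", tracks)  # 0.1 s, 16 tracks, one file
```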
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. The audio acquisition and processing method for the online live broadcast recording based on the big data is characterized by comprising the following specific steps of:
step S100: acquiring video data of an online live broadcast recording end, wherein the video data comprises image data and audio data; continuously and equally dividing the audio data and the image data of the recording end into one-to-one correspondence according to the time sequence; the audio data of the recording end comprises a recording main body sound and a recording background sound, the recording main body sound is an audio frequency for recording the content of a recorder, and the recording background sound is an audio frequency corresponding to the non-recorded content;
step S200: recording image data corresponding to the recorded main voice as a first image set, and recording image data corresponding to the recorded background voice as a second image set; analyzing the behavior of a person who records in the first image set and the second image set of the recording end, and taking the intersection of the first image set and the second image set as a set to be analyzed;
step S300: judging the set to be analyzed, wherein the first image set and the second image set are not empty sets, and when the set to be analyzed is 0, analyzing a condition set of recording background sound needing to be corrected in the second image set; when the set to be analyzed is not 0, recording as a target set, extracting a corresponding image in the target set as a target image, wherein audio data corresponding to the target image is a target audio, and the target audio comprises a target subject sound and a target background sound;
step S400: recording the image information acquired by the listening end at the generation time of each image in the target set, the image information being the listener image captured by the listening end's camera device; analyzing the relation between the image information acquired by the listening end and the target audio, and marking a first corrected audio;
step S500: adjusting the target background sound in the first corrected audio based on the mark applied in step S400.
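For orientation, the following sketch mirrors the flow of steps S100-S300 of this claim: the recording-end stream is cut into equal, time-aligned (image, audio) segments, each segment is tested for subject sound and background sound, and the intersection of the two image sets decides the branch taken in step S300. All names, the dataclass layout and the classifier stubs are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    index: int    # position in the time sequence; pairs image and audio one-to-one
    image: bytes  # image data of this equal-length slice
    audio: bytes  # audio data of the same slice

def split_stream(images, audios):
    """Step S100: pair equal image and audio slices in time order."""
    return [Segment(i, img, aud) for i, (img, aud) in enumerate(zip(images, audios))]

def build_sets(segments, has_subject, has_background):
    """Step S200: first set = images with subject sound, second set = images
    with background sound; a slice may contain both, so the sets can overlap."""
    first = {s.index for s in segments if has_subject(s)}
    second = {s.index for s in segments if has_background(s)}
    return first, second

def step_s300(first, second):
    """Step S300: branch on whether the set to be analyzed is empty."""
    to_analyze = first & second
    if not to_analyze:
        return ("correction-conditions", None)   # claim 3 path
    return ("target-set", to_analyze)            # steps S400-S500 path
```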
2. The big-data-based audio acquisition and processing method for online live broadcast recording according to claim 1, wherein analyzing the recorder's behavior in the image data of the recording end comprises the following steps:
step S210: marking the positions of the recorder's head, eyes and elbow in the first image set and the second image set; establishing a rectangular coordinate system with the centre point of the image data as the origin; taking the nose as the fixed point, recording the mean angle R1 formed during the change process by the line segment from the recorder's head position to the origin in the first image set, and the corresponding mean angle R2 in the second image set; calculating the difference between R1 and R2, and marking the recorder's head as the first serial mark when the difference is greater than or equal to a preset difference threshold;
step S220: when the difference is smaller than the preset difference threshold, acquiring the eye expansion ratios E = {E1a, E1b, E2a, E2b}, an eye expansion ratio being the ratio of the exposed eyeball area to the whole eye area, the whole eye area being the rectangular region below the eyebrows and above the eyelids; using the formula:
(formula for the eye dynamic index e, provided only as image FDA0003710396250000021 in the original filing)
calculating the dynamic index e of the recorder's eyes, where E1a is the mean eye expansion ratio when the recorder's head-position change angle in the first image set is less than or equal to R1, E1b the mean when that angle is greater than R1, E2a the mean when the head-position change angle in the second image set is less than or equal to R2, and E2b the mean when that angle is greater than R2;
step S230: comparing the dynamic index e of the recorder's eyes with a preset dynamic index threshold e0; when e ≥ e0, marking the recorder's eyes as the second serial mark; when e < e0, acquiring the dwell time h1k of the recorder's elbow in the k-th quadrant in the first image set and the dwell time h2k in the k-th quadrant in the second image set; arranging the quadrants of the first image set into a set K1 in descending order of dwell time and the quadrants of the second image set into a set K2 in descending order of dwell time; judging whether the first quadrants of K1 and K2 are the same, and marking the first-quadrant regions of the second image set that differ as the third serial mark.
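A minimal sketch of two measurements from steps S210 and S230 of this claim: the mean head angle relative to the image-centre origin, and the ranking of elbow dwell times by quadrant. The landmark format, the sampling interval and the helper names are assumptions; the eye-index formula of step S220 is left out because the filing supplies it only as an image.

```python
import math
from collections import defaultdict

def mean_head_angle(nose_points):
    """Mean angle, in degrees, of the line segment from the nose landmark to
    the origin (image centre), as in step S210. nose_points: [(x, y), ...]."""
    angles = [math.degrees(math.atan2(y, x)) for x, y in nose_points]
    return sum(angles) / len(angles)

def is_first_serial_mark(nose_pts_first, nose_pts_second, diff_threshold):
    """First serial mark: |R1 - R2| >= preset difference threshold."""
    r1 = mean_head_angle(nose_pts_first)
    r2 = mean_head_angle(nose_pts_second)
    return abs(r1 - r2) >= diff_threshold

def quadrant(point):
    """Quadrant 1-4 of a point in the centred rectangular coordinate system."""
    x, y = point
    if x >= 0:
        return 1 if y >= 0 else 4
    return 2 if y >= 0 else 3

def dwell_ranking(elbow_points, dt=1.0):
    """Step S230: quadrants ranked by elbow dwell time, longest first.
    dt is the assumed sampling interval per observed point."""
    dwell = defaultdict(float)
    for p in elbow_points:
        dwell[quadrant(p)] += dt
    return sorted(dwell, key=dwell.get, reverse=True)

def third_serial_mark(elbow_first, elbow_second):
    """Third serial mark when the top-ranked quadrants of K1 and K2 differ;
    returns the differing quadrant of the second set, else None."""
    k1, k2 = dwell_ranking(elbow_first), dwell_ranking(elbow_second)
    return k2[0] if k1[0] != k2[0] else None
```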
3. The big-data-based audio acquisition and processing method for online live broadcast recording according to claim 1, wherein, when the set to be analyzed is empty, analyzing the condition set under which the recording background sound in the second image set needs correction comprises the following steps:
acquiring the i-th image p2i in the second image set and the corresponding image p1i in the first image set from the immediately preceding time period; substituting the set formed by image p1i and image p2i into the analysis of steps S210 to S230 and judging the resulting target serial mark, the target serial mark being any one of {first serial mark, second serial mark, third serial mark};
when the target serial mark is the first serial mark, recording the angle threshold of image p1i as [R(p1i)min, R(p1i)max], this threshold being correction condition one, where R(p1i)min is the minimum and R(p1i)max the maximum angle formed during the change process, with the nose as the fixed point, by the line segment from the head position to the origin in image p1i;
when the target serial mark is the second serial mark, establishing in image p1i the relation pair {first serial mark → second serial mark}, the relation pair being the eye expansion ratio corresponding to each head position, and recording the relation-pair threshold {first serial mark min → second serial mark max} of image p1i as correction condition two, where first serial mark min → second serial mark max denotes all combinations of the minimum angle formed during the change process by the line segment from the nose-fixed head position to the origin with the maximum eye expansion ratio corresponding to that head position;
when the target serial mark is the third serial mark, acquiring the quadrant K0 of the first image set before the generation time of the image corresponding to the third serial mark; acquiring the quadrant set {K1a, K2a, K3a} other than the third serial mark, arranged in descending order of dwell time, and the corresponding quadrants {K1b, K2b, K3b} of the first image set before the times corresponding to the images of {K1a, K2a, K3a}; constructing quadrant dynamic paths Q(KA → KB) and sorting them by priority from the smallest quadrant difference to the largest, the quadrant dynamic path corresponding to the third serial mark always ranking first; this priority is correction condition three;
the condition set for correcting the recording background sound is {correction condition one, correction condition two, correction condition three}; when the image data of the recording end is detected to satisfy any correction condition, the background sound volume is reduced in the state corresponding to the {first serial mark, second serial mark, third serial mark}.
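Purely as an illustration of how the condition set of this claim might be applied at run time, the sketch below treats each correction condition as a predicate over recording-end frames and lowers the background-sound gain whenever any predicate holds. The frame attribute, predicate signatures and attenuation factor are all assumptions.

```python
def angle_condition(frame, angle_min, angle_max):
    """Correction condition one: head angle within [R(p1i)min, R(p1i)max].
    `frame.head_angle` is an assumed attribute of the recording-end frame."""
    return angle_min <= frame.head_angle <= angle_max

def apply_corrections(frames, condition_set, base_gain=1.0, attenuation=0.5):
    """Return a per-frame background gain: reduced while any correction
    condition in {one, two, three} is satisfied, unchanged otherwise."""
    return [base_gain * attenuation if any(cond(f) for cond in condition_set)
            else base_gain
            for f in frames]
```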
4. The big-data-based audio acquisition and processing method for online live broadcast recording according to claim 1, wherein analyzing the relation between the image information acquired by the listening end and the target audio and marking the first corrected audio comprises the following steps:
step S410: acquiring the image information of the listening end corresponding to the target audio and establishing the face extension degree change curve of the listener image; extracting the target audio corresponding to the curve points whose change rate is greater than or equal to a change-rate threshold and recording it as first target audio;
step S420: recording the target audio corresponding to the curve points whose change rate is smaller than the change-rate threshold as second target audio; comparing the similarity of the target image corresponding to the first target audio with the target image corresponding to the second target audio; if the similarity is smaller than a similarity threshold, marking the first target audio as the first corrected audio; if the similarity is greater than or equal to the similarity threshold, applying no mark.
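The marking logic of steps S410-S420 can be pictured as follows: a discrete change rate is taken along the face extension degree change curve, segments at steep points become first target audio and the rest second target audio, and a first-target segment is kept as first corrected audio only if its image is sufficiently dissimilar to every second-target image. The difference quotient, the list alignment and the caller-supplied similarity stub are assumptions.

```python
def change_rates(curve):
    """Discrete change rate along the face extension degree change curve."""
    return [abs(b - a) for a, b in zip(curve, curve[1:])]

def mark_first_corrected(curve, audios, images, rate_thr, sim_thr, similarity):
    """Steps S410-S420. audios[i] and images[i] are assumed aligned with the
    i-th curve interval; similarity(img_a, img_b) -> float is a stub."""
    rates = change_rates(curve)
    first_idx = [i for i, r in enumerate(rates) if r >= rate_thr]   # first target audio
    second_idx = [i for i, r in enumerate(rates) if r < rate_thr]   # second target audio
    corrected = []
    for i in first_idx:
        # Compare against the most similar second-target image; mark only
        # when even that best match stays below the similarity threshold.
        best = max((similarity(images[i], images[j]) for j in second_idx), default=0.0)
        if best < sim_thr:
            corrected.append(audios[i])
    return corrected
```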
5. The big-data-based audio acquisition and processing method for online live broadcast recording according to claim 1, wherein adjusting the target background sound in the first corrected audio marked in step S400 comprises the following steps:
step S510: acquiring the first corrected audio and mixing the target subject sound and the target background sound in it, the mixing being performed at a mixing proportion of target subject sound : target background sound = s0 : g0, with s0 > g0;
step S520: acquiring the mixing proportion at which the listening end receives the second target audio and calculating the mixing-proportion threshold interval [s1/g1, s2/g2] corresponding to the second target audio, where s1/g1 is the minimum and s2/g2 the maximum of the ratio of target subject sound to target background sound in the second target audio;
step S530: adjusting s0/g0 so that s0/g0 ∈ [s1/g1, s2/g2].
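Step S530 reduces to clamping the ratio s0/g0 into the interval [s1/g1, s2/g2] observed for the second target audio. The one-function sketch below assumes the subject-sound level s0 stays fixed and only the background gain g0 moves; the claim itself constrains only the ratio.

```python
def adjust_background_gain(s0, g0, s1, g1, s2, g2):
    """Step S530: move g0 so that s0/g0 lies within [s1/g1, s2/g2]."""
    lo, hi = s1 / g1, s2 / g2
    ratio = min(max(s0 / g0, lo), hi)  # clamp the current mix ratio
    return s0 / ratio                  # adjusted background gain g0'

# Example: s0=0.8, g0=0.1 gives ratio 8.0; with [s1/g1, s2/g2] = [2.0, 5.0]
# the ratio clamps to 5.0 and the background gain rises to about 0.16.
print(adjust_background_gain(0.8, 0.1, 2.0, 1.0, 5.0, 1.0))  # ≈ 0.16
```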
6. A big-data-based audio acquisition and processing system for online live broadcast recording, applying the method of any one of claims 1-5, characterized by comprising a video data acquisition module, a recording-end data analysis module, a to-be-analyzed-set judgment module, a first-corrected-audio acquisition module and a background sound adjustment module;
the video data acquisition module is used for acquiring video data of the online live broadcast recording end, the video data comprising image data and audio data; continuously dividing the audio data and the image data of the recording end into equal, one-to-one corresponding segments in time order; the audio data of the recording end comprises a recording subject sound and a recording background sound, the recording subject sound being the audio of the recorder's recorded content and the recording background sound being the audio corresponding to non-recorded content;
the recording-end data analysis module is used for analyzing the image data and audio data of the recording end;
the to-be-analyzed-set judgment module is used for judging whether the set to be analyzed acquired by the recording-end data analysis module is empty;
the first-corrected-audio acquisition module, when the set to be analyzed is non-empty, analyzes the image information acquired by the listening end and marks the first corrected audio;
the background sound adjustment module is used for adjusting the proportion of the background sound to the subject sound in the first corrected audio.
7. The big-data-based audio acquisition and processing system for online live broadcast recording according to claim 6, wherein the recording-end data analysis module comprises a first image set acquisition unit, a second image set acquisition unit and a serial mark analysis unit;
the first image set acquisition unit acquires the image data corresponding to the recording subject sound and records it as the first image set;
the second image set acquisition unit acquires the image data corresponding to the recording background sound and records it as the second image set;
the serial mark analysis unit is used for analyzing the positional relations of the recorder's head, eyes and elbow, establishing a progressive, layer-by-layer analysis from the head through the eyes to the elbow.
8. The big-data-based audio acquisition and processing system for online live broadcast recording according to claim 7, wherein the to-be-analyzed-set judgment module comprises a data substitution unit, a scene analysis unit and a condition set acquisition unit;
the data substitution unit is used for substituting the actual data of the acquired set to be analyzed into the analysis method of the serial mark analysis unit;
the scene analysis unit analyzes different scenes based on the result of the data substitution unit;
the condition set acquisition unit forms the condition set from the correction conditions obtained by the scene analysis unit and reduces the background sound volume when the image data of the recording end is detected to satisfy any correction condition.
9. The big-data-based audio acquisition and processing system for online live broadcast recording according to claim 8, wherein the first-corrected-audio acquisition module comprises a face extension degree change curve establishing unit, a curve point marking unit and a similarity comparison unit;
the face extension degree change curve establishing unit is used for acquiring the image information of the listening end corresponding to the target audio and establishing the face extension degree change curve of the listener image;
the curve point marking unit is used for recording the target audio corresponding to the curve points of the face extension degree change curve whose change rate is greater than or equal to the change-rate threshold as first target audio, and the target audio corresponding to the curve points whose change rate is smaller than the change-rate threshold as second target audio;
the similarity comparison unit is used for comparing the similarity of the target image corresponding to the first target audio with the target image corresponding to the second target audio; if the similarity is smaller than the similarity threshold, the first target audio is marked as the first corrected audio; if the similarity is greater than or equal to the similarity threshold, no mark is applied.
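The claims do not fix a similarity metric for the similarity comparison unit. One hedged possibility, sketched below, is a normalized histogram intersection over grayscale pixel values, where 1 means identical intensity distributions; the metric choice and all names are assumptions, not the patent's method.

```python
def histogram(gray, bins=32):
    """Normalized intensity histogram of a grayscale image (values 0-255)."""
    counts = [0] * bins
    for v in gray:
        counts[min(v * bins // 256, bins - 1)] += 1
    total = len(gray) or 1
    return [c / total for c in counts]

def histogram_intersection(img_a, img_b):
    """Similarity in [0, 1]: 1 means identical intensity distributions."""
    ha, hb = histogram(img_a), histogram(img_b)
    return sum(min(a, b) for a, b in zip(ha, hb))

def mark_if_dissimilar(first_audio, img_first, img_second, sim_thr):
    """Mark first target audio as first corrected audio when similarity < threshold."""
    if histogram_intersection(img_first, img_second) < sim_thr:
        return first_audio  # marked as first corrected audio
    return None             # similarity >= threshold: not marked
```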
CN202210724426.6A 2022-06-23 2022-06-23 Big data based audio acquisition and processing system and method for online live broadcast recording Active CN115119007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210724426.6A CN115119007B (en) 2022-06-23 2022-06-23 Big data based audio acquisition and processing system and method for online live broadcast recording

Publications (2)

Publication Number Publication Date
CN115119007A true CN115119007A (en) 2022-09-27
CN115119007B CN115119007B (en) 2023-03-03

Family

ID=83327949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210724426.6A Active CN115119007B (en) 2022-06-23 2022-06-23 Big data based audio acquisition and processing system and method for online live broadcast recording

Country Status (1)

Country Link
CN (1) CN115119007B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116264620A (en) * 2023-04-21 2023-06-16 深圳市声菲特科技技术有限公司 Live broadcast recorded audio data acquisition and processing method and related device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5625463A (en) * 1994-07-20 1997-04-29 Ha-Ngoc; Tuan Video recorder with background audio
KR20070070481A (en) * 2005-12-29 2007-07-04 엠텍비젼 주식회사 Method and apparatus for providing background sound in mobile phone
CN101193381A (en) * 2006-12-01 2008-06-04 中兴通讯股份有限公司 A mobile terminal and method with voice pre-processing function
CN104133652A (en) * 2014-06-10 2014-11-05 腾讯科技(深圳)有限公司 Audio playing control method and terminal
CN105872253A (en) * 2016-05-31 2016-08-17 腾讯科技(深圳)有限公司 Live broadcast sound processing method and mobile terminal
CN109166589A (en) * 2018-08-13 2019-01-08 深圳市腾讯网络信息技术有限公司 Using sound suppressing method, device, medium and equipment
CN114416015A (en) * 2022-01-07 2022-04-29 北京小米移动软件有限公司 Audio adjusting method and device, electronic equipment and readable storage medium


Also Published As

Publication number Publication date
CN115119007B (en) 2023-03-03

Similar Documents

Publication Publication Date Title
US8291326B2 (en) Information-processing apparatus, information-processing methods, recording mediums, and programs
JP3521900B2 (en) Virtual speaker amplifier
CN102111601B (en) Content-based adaptive multimedia processing system and method
CN106023983A (en) Multi-user voice interaction method and device based on virtual reality scene
CN115119007B (en) Big data based audio acquisition and processing system and method for online live broadcast recording
CN109040641B (en) Video data synthesis method and device
CN107506171A (en) Audio-frequence player device and its effect adjusting method
CN111508531B (en) Audio processing method and device
Letowski Development of technical listening skills: Timbre solfeggio
CN109598991A (en) A kind of pronunciation of English tutoring system, device and method
WO2015090182A1 (en) Multi-information synchronization code learning device and method
CN103474082A (en) Multi-microphone vocal accompaniment marking system and method thereof
CN109782850A (en) Support the full interactive intelligence intelligent education machine of multiple network access
CN111818441A (en) Sound effect realization method and device, storage medium and electronic equipment
JPH05101608A (en) Audio editing device
CN108269460B (en) Electronic screen reading method and system and terminal equipment
CN108810436A (en) A kind of video recording method and system based on the He Zou of full-automatic musical instrument
CN107507467A (en) A kind of Multimedia EFL teaching system
CN111445742B (en) Vocal music teaching system based on distance education system
CN113965771A (en) VR live broadcast user interactive experience system
CN103489464A (en) Effect control device, effect control method, and program
JP2003079000A (en) Presence control system for video acoustic device
CN112130662A (en) Interactive network lesson system based on AR technology
TWM600921U (en) Learning trajectory analysis system
CN112489636A (en) Intelligent voice broadcast assistant selection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Audio acquisition and processing system and method of online live recording based on Big data

Effective date of registration: 20230601

Granted publication date: 20230303

Pledgee: Agricultural Bank of China Limited Enping City sub branch

Pledgor: XINYINGKE ELECTROACOUSTIC TECHNOLOGY Co.,Ltd.

Registration number: Y2023980042545