CN116072123A - Broadcast information playing method and device, readable storage medium and electronic equipment - Google Patents

Broadcast information playing method and device, readable storage medium and electronic equipment

Info

Publication number
CN116072123A
Authority
CN
China
Prior art keywords
voice information
personnel
voice
feature
broadcasting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310202075.7A
Other languages
Chinese (zh)
Other versions
CN116072123B (en)
Inventor
邱晓健
连峰
邱正峰
崔韧
吴鼎元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Hang Tian Guang Xin Technology Co ltd
Original Assignee
Nanchang Hang Tian Guang Xin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Hang Tian Guang Xin Technology Co ltd filed Critical Nanchang Hang Tian Guang Xin Technology Co ltd
Priority to CN202310202075.7A priority Critical patent/CN116072123B/en
Publication of CN116072123A publication Critical patent/CN116072123A/en
Application granted granted Critical
Publication of CN116072123B publication Critical patent/CN116072123B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G10L17/04 - Training, enrolment or model building
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04H - BROADCAST COMMUNICATION
    • H04H20/00 - Arrangements for broadcast or for distribution combined with broadcast
    • H04H20/86 - Arrangements characterised by the broadcast information itself
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a broadcast information playing method and apparatus, a readable storage medium and an electronic device. The broadcast information playing method comprises the following steps: acquiring voice information collected by a microphone, and extracting acoustic features from the voice information; inputting the acoustic features into a voiceprint recognition model to identify the current voice person; judging, according to the recognition result, whether the current voice person is a person in a preset list; if so, sending the voice information to a broadcast terminal for playing; if not, extracting the content of the voice information and analyzing it to judge whether the content meets the broadcasting requirement; and, when the broadcasting requirement is met, sending the voice information to the broadcast terminal for playing.

Description

Broadcast information playing method and device, readable storage medium and electronic equipment
Technical Field
The present invention relates to the field of broadcasting devices, and in particular, to a broadcasting information playing method and apparatus, a readable storage medium, and an electronic device.
Background
Broadcasting systems are widely used in many fields. Places such as campuses, hospitals, parks and shopping malls are equipped with broadcasting systems, which are mainly used for music playing, emergency notification, news broadcasting, paging and the like. A broadcast terminal, such as a loudspeaker box, is the terminal device of a network broadcasting system and is connected to an upper computer (such as a server) through a switch via a wireless communication link.
An existing broadcasting system generally comprises a control platform, and at least one microphone and at least one broadcast terminal connected to the control platform. Any person can use such a system to broadcast information, so its use cannot be effectively controlled; this leads to abuse of the broadcasting system and makes it easy for improper information to be spread.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a broadcast information playing method, apparatus, readable storage medium and electronic device, aiming at the problem that the use of the broadcast system in the prior art cannot be effectively controlled.
The invention discloses a broadcast information playing method, which comprises the following steps:
acquiring voice information acquired by a microphone, and extracting acoustic features in the voice information;
inputting the acoustic features into a voiceprint recognition model to identify the current voice personnel;
judging whether the current voice personnel are personnel in a preset list according to the identification result;
when the current voice personnel are personnel in a preset list, sending the voice information to a broadcasting terminal for playing;
when the current voice personnel are not personnel in a preset list, extracting the content of the voice information, and analyzing to judge whether the content of the voice information meets the broadcasting requirement;
and when the content of the voice information meets the broadcasting requirement, sending the voice information to a broadcasting terminal for broadcasting.
Further, in the broadcast information playing method, the step of extracting acoustic features in the voice information includes:
extracting MEL spectrum cepstrum features and Bottleneck features in the voice information;
calculating the weight coefficient of each dimension characteristic component of the MEL spectrum cepstrum feature, and carrying out weighted calculation on the MEL spectrum cepstrum feature according to the weight coefficient of each dimension characteristic component;
and carrying out feature fusion on the MEL spectrum cepstrum feature and the Bottleneck feature after weighted calculation to obtain acoustic features in the voice information.
Further, in the broadcast information playing method, the step of calculating the weight coefficient of each dimension feature component of the MEL spectrum cepstrum feature includes:
calculating contribution degrees of each dimension characteristic component of the MEL spectrum cepstrum characteristic to the speaker identity recognition rate respectively;
carrying out standardization processing on the contribution degree of each dimension characteristic component to the speaker identity recognition rate by adopting a min-max standardization method;
and determining the weight coefficient of each dimension characteristic component according to the contribution degree after the normalization processing.
Further, in the broadcast information playing method, the step of extracting the Bottleneck feature in the voice information includes:
pre-emphasis, framing and windowing are carried out on the voice information;
converting the processed voice information through FFT, and obtaining a corresponding frequency spectrum after taking an absolute value or a square value;
inputting the corresponding frequency spectrum into a Mel filter bank, and obtaining the Mel frequency spectrum output by the Mel filter bank;
taking logarithm of the MEL spectrum to obtain FBanks characteristics;
inputting the FBanks characteristics into a DNN model, and extracting node excitation values of a Bottleneck layer in the DNN model to obtain Bottleneck characteristics.
Further, in the broadcast information playing method, the step of extracting the content of the voice information and analyzing to determine whether the content of the voice information meets the broadcast requirement includes:
identifying the content in the voice information through a voice identification algorithm, and matching with a sensitive word database;
judging whether the voice information contains sensitive words or not according to the matching result;
if not, determining that the voice information meets the broadcasting requirement.
The invention also discloses a broadcasting information playing device, which comprises:
the feature extraction module is used for acquiring voice information acquired by the microphone and extracting acoustic features in the voice information;
the identity recognition module is used for inputting the acoustic characteristics into a voiceprint recognition model so as to recognize the identity of the current voice personnel;
the first judging module is used for judging whether the current voice personnel are personnel in a preset list or not according to the identification result;
the first sending module is used for sending the voice information to a broadcasting terminal for playing when the current voice personnel are personnel in a preset list;
the second judging module is used for extracting the content of the voice information when the current voice personnel are not personnel in a preset list and analyzing the content to judge whether the content of the voice information meets the broadcasting requirement or not;
and the second sending module is used for sending the voice information to a broadcasting terminal for playing when the content of the voice information meets the broadcasting requirement.
Further, in the broadcast information playing device, the feature extraction module is specifically configured to:
extracting MEL spectrum cepstrum features and Bottleneck features in the voice information;
calculating the weight coefficient of each dimension characteristic component of the MEL spectrum cepstrum feature, and carrying out weighted calculation on the MEL spectrum cepstrum feature according to the weight coefficient of each dimension characteristic component;
and carrying out feature fusion on the MEL spectrum cepstrum feature and the Bottleneck feature after weighted calculation to obtain acoustic features in the voice information.
Further, in the broadcast information playing device, the step of calculating the weight coefficient of each dimension feature component of the MEL spectrum cepstrum feature includes:
calculating contribution degrees of each dimension characteristic component of the MEL spectrum cepstrum characteristic to the speaker identity recognition rate respectively;
carrying out standardization processing on the contribution degree of each dimension characteristic component to the speaker identity recognition rate by adopting a min-max standardization method;
and determining the weight coefficient of each dimension characteristic component according to the contribution degree after the normalization processing.
The invention also discloses a readable storage medium, on which a computer program is stored, which when executed by a processor, implements the broadcast information playing method of any one of the above.
The invention also discloses an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the broadcast information playing method of any one of the above is realized when the processor executes the computer program.
According to the invention, acoustic features are extracted from the voice information collected by the microphone, the identity of the current voice person is recognized with a voiceprint recognition model, and whether the current voice person is a person in the preset list is judged according to the recognition result. If so, the voice information of the current voice person is played; if not, the content of the voice information is analyzed to judge whether it meets the broadcasting requirement, and the voice information is played only when it does. By recognizing the identity of the current voice person and checking the voice content, the invention standardizes the use of the broadcasting system.
Drawings
Fig. 1 is a flowchart of a broadcast information playing method in an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating steps of acoustic feature extraction in an embodiment of the present invention;
fig. 3 is a block diagram of a broadcast information playing device according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Embodiments of the present invention will be apparent with reference to the following description and attached drawings. In the description and drawings, particular implementations of embodiments of the invention are disclosed in detail as being indicative of some of the ways in which the principles of embodiments of the invention may be employed, but it is understood that the scope of the embodiments of the invention is not limited correspondingly. On the contrary, the embodiments of the invention include all alternatives, modifications and equivalents as may be included within the spirit and scope of the appended claims.
Referring to fig. 1, the broadcast information playing method in the embodiment of the invention includes steps S11 to S15.
Step S11, voice information collected by a microphone is obtained, and acoustic features in the voice information are extracted.
Typically a broadcast system will be provided with one or more microphones and at least one broadcast terminal. The user can broadcast voice information through one of the microphones.
Because every person's voice is different, different speakers can be identified from their voice characteristics, which makes it possible to manage the users of the broadcasting system and prevent abuse.
After the voice information of the current voice person collected by the microphone is obtained, acoustic features are extracted from it. The acoustic feature is a voiceprint feature used to identify the current voice person; in one embodiment of the invention, it may be, for example, a fusion of a Mel-frequency cepstral coefficient (MFCC, referred to herein as the MEL spectrum cepstrum feature) and a Bottleneck feature. The MEL spectrum cepstrum feature consists of the coefficients that form the Mel-frequency cepstrum; because it emphasizes the auditory perception of the human ear, it reflects the surface-level speech characteristics of different speakers well and offers good discriminability.
The Bottleneck feature can be extracted with a deep neural network (DNN) in which one hidden layer has a small number of nodes; this hidden layer is the Bottleneck layer. The activation values of the Bottleneck-layer nodes form the Bottleneck feature, which carries strongly discriminative information.
The acoustic feature is obtained by fusing the MEL spectrum cepstrum feature with the Bottleneck feature. The fused acoustic feature inherits the advantages of both, strengthening the individuality of a speaker's voice characteristics and improving recognition performance.
Specifically, as shown in fig. 2, in one implementation of the present invention, the step of extracting the acoustic feature in the voice information includes:
step S111, extracting MEL spectrum cepstrum features and Bottleneck features in the voice information;
step S112, calculating weight coefficients of each dimension characteristic component of the MEL spectrum cepstrum feature, and carrying out weighted calculation on the MEL spectrum cepstrum feature according to the weight coefficients of each dimension characteristic component;
and step S113, carrying out feature fusion on the MEL frequency spectrum cepstrum feature and the Bottleneck feature after the weighted calculation to obtain acoustic features in the voice information.
Specifically, the step of extracting MEL spectrum cepstrum features in the voice information includes:
pre-emphasis, framing and windowing are carried out on the voice information;
converting the processed voice information through FFT, and obtaining a corresponding frequency spectrum after taking an absolute value or a square value;
inputting the corresponding frequency spectrum into a Mel filter bank, and obtaining the Mel frequency spectrum output by the Mel filter bank;
and carrying out cepstrum analysis on the MEL spectrum to obtain MEL spectrum cepstrum characteristics.
Pre-emphasis, framing and windowing of the voice information reduce the interference of noise, enhance the signal-to-noise ratio of the voice signal and improve accuracy. Each processed frame is then transformed by FFT (fast Fourier transform), and the absolute value or squared magnitude is taken to obtain the corresponding energy spectrum. The spectrum is fed into a Mel filter bank, which converts its physical frequency scale into the Mel scale, i.e., converts the linear spectrum into a Mel spectrum that reflects the auditory characteristics of the human ear.
Cepstral analysis of the Mel spectrum mainly consists of taking the logarithm and applying an inverse transform, which is usually implemented with a discrete cosine transform (DCT). Cepstral analysis of the Mel spectrum yields the Mel-frequency cepstral coefficients, which constitute the MEL spectrum cepstrum feature.
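As a concrete illustration of these steps only, the following minimal sketch extracts MFCC-style features with pre-emphasis, framing, Hamming windowing, FFT, a Mel filter bank, a logarithm and a DCT. It is an assumption for illustration, not code from the patent; the frame length, hop size, FFT size and filter counts are arbitrary example values.

# Minimal MFCC sketch: pre-emphasis, framing/windowing, FFT power spectrum,
# Mel filter bank, log, DCT. Parameters are illustrative assumptions.
import numpy as np
from scipy.fftpack import dct
import librosa

def extract_mfcc(y, sr, frame_len=400, hop=160, n_fft=512, n_mels=26, n_mfcc=13):
    # Pre-emphasis boosts high frequencies and improves the usable SNR
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])
    # Framing and Hamming windowing (assumes len(y) >= frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hamming(frame_len)
    # FFT and power spectrum (square of the magnitude)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2            # (n_frames, n_fft//2 + 1)
    # Mel filter bank maps the linear spectrum onto the Mel scale
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    mel_spec = power @ mel_fb.T                                # (n_frames, n_mels)
    # Logarithm of the Mel spectrum; this quantity is also the FBanks feature
    log_mel = np.log(mel_spec + 1e-10)
    # DCT (cepstral analysis) yields the MFCC features
    return dct(log_mel, type=2, axis=1, norm='ortho')[:, :n_mfcc]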
Specifically, the step of extracting the Bottleneck feature in the voice information includes:
pre-emphasis, framing and windowing are carried out on the voice information;
converting the processed voice information through FFT, and obtaining a corresponding frequency spectrum after taking an absolute value or a square value;
inputting the corresponding frequency spectrum into a Mel filter bank, and obtaining the Mel frequency spectrum output by the Mel filter bank;
taking logarithm of the MEL spectrum to obtain FBanks characteristics;
inputting the FBanks characteristics into a DNN model, and extracting node excitation values of a Bottleneck layer in the DNN model to obtain Bottleneck characteristics.
The Bottleneck feature is the activation value of the Bottleneck-layer nodes in the DNN. In this embodiment the DNN is used as a feature extractor: its input is the FBanks feature, its training target is the speaker identity, and the Bottleneck feature is taken from the intermediate Bottleneck layer. The FBanks feature itself is obtained by taking the logarithm of the Mel spectrum, i.e., by computing the log energy of each Mel filter output, and corresponds to the quantity computed just before the DCT step of MFCC extraction.
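For illustration only, the sketch below shows one way such a bottleneck extractor could look: a speaker-classification network with a narrow hidden layer whose activations are kept as features. The use of PyTorch, the layer sizes and the 32-dimensional bottleneck are assumptions, not details taken from the patent.

# Sketch of a DNN bottleneck feature extractor trained for speaker classification.
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    def __init__(self, n_fbanks=26, n_speakers=100, bottleneck_dim=32):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(n_fbanks, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        self.bottleneck = nn.Linear(512, bottleneck_dim)      # narrow hidden layer
        self.classifier = nn.Sequential(
            nn.ReLU(), nn.Linear(bottleneck_dim, n_speakers)  # speaker-identity output
        )

    def forward(self, x):
        z = self.bottleneck(self.front(x))    # bottleneck activations
        return self.classifier(z), z

# After training on (FBanks, speaker-id) pairs, only the bottleneck
# activations are kept as the Bottleneck features:
model = BottleneckDNN()
fbanks = torch.randn(200, 26)                 # 200 frames of 26-dim FBanks (dummy data)
_, bottleneck_feats = model(fbanks)           # shape (200, 32)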
The MEL spectrum cepstrum feature comprises multiple feature components (dimensions), and each dimension differs in its ability to discriminate between speakers. In some embodiments of the invention, the MEL spectrum cepstrum feature is therefore weighted according to the weight coefficient of each of its dimension feature components, which improves the characterization capacity and the discriminability of the feature.
Specifically, in one implementation of the present invention, the weight coefficient of each dimension feature component may be calculated according to the following formula:
[formula not reproduced in this text]
where r_p denotes the weight coefficient of the p-th dimension feature component and N denotes the total dimension of the MEL spectrum cepstrum feature.
It can be appreciated that, in another implementation of the present invention, the weight coefficient of each dimension feature component may also be determined according to the contribution of that dimension to the speaker recognition rate, so as to highlight the features that contribute most to recognition and improve the overall recognition rate. In a specific implementation, the contribution of each dimension feature component to the speaker recognition rate can be calculated with an increase/decrease component method according to the following formula:
[formula not reproduced in this text]
where the recognition-rate term in the formula is the recognition rate obtained with the MEL spectrum cepstrum feature components from dimension i to dimension j, and N is the total dimension of the MEL spectrum cepstrum feature. R(i) is the average contribution of the i-th dimension feature component to the recognition rate: a positive R(i) means that adding the feature component improves the recognition rate, and a negative R(i) means that adding it reduces the recognition rate.
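The patent's exact formula is given in the original filing. Purely as an illustration of the general increase/decrease idea, the sketch below estimates each dimension's contribution by comparing the recognition rate with and without that dimension; the evaluate() callback and the simple difference are assumptions, not the patent's formula.

# Hedged sketch: per-dimension contribution estimated by removing one MFCC
# dimension at a time and measuring the drop (or gain) in recognition rate.
import numpy as np

def contribution_per_dimension(evaluate, n_dims):
    """evaluate(dims) -> recognition rate obtained using only the listed MFCC dimensions."""
    full = list(range(n_dims))
    base = evaluate(full)                      # recognition rate with all dimensions
    R = np.zeros(n_dims)
    for i in range(n_dims):
        reduced = [d for d in full if d != i]
        R[i] = base - evaluate(reduced)        # > 0: dimension i helps; < 0: it hurts
    return R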
After the contribution of each dimension feature component to the recognition rate is obtained, the contributions are normalized, for example with the min-max normalization method. Specifically, the feature component with the largest contribution is assigned a weight of 1 and the feature component with the smallest contribution a weight of 0.5; the weight coefficients of the remaining dimension feature components of the MEL spectrum cepstrum feature are then obtained by min-max normalization, so that every weight coefficient lies within [0.5, 1].
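A minimal sketch of this min-max mapping, assuming the contributions are already available as a vector; the [0.5, 1] range follows the description above, while the function name and the equal-contribution fallback are assumptions.

# Map contributions to weight coefficients in [0.5, 1] via min-max normalization.
import numpy as np

def contributions_to_weights(R, lo=0.5, hi=1.0):
    R = np.asarray(R, dtype=float)
    span = R.max() - R.min()
    if span == 0:                    # all dimensions contribute equally
        return np.full_like(R, hi)
    return lo + (hi - lo) * (R - R.min()) / span

weights = contributions_to_weights([0.8, -0.1, 0.3])   # -> [1.0, 0.5, ~0.72]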
Furthermore, the calculated weight coefficients of the dimension feature components can be fitted with a Fourier series, so that the transition between adjacent weight coefficients is smoother.
The weighted MEL spectrum cepstrum feature is then fused with the Bottleneck feature, i.e., the two features are spliced (concatenated) along the vector dimension, yielding an acoustic feature that carries more feature information.
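A minimal sketch of the weighted fusion, assuming frame-aligned MFCC and Bottleneck matrices; the shapes and variable names are illustrative assumptions.

# Scale each MFCC dimension by its weight and concatenate with the Bottleneck
# features along the feature (vector) dimension, frame by frame.
import numpy as np

def fuse_features(mfcc, weights, bottleneck):
    # mfcc: (n_frames, n_mfcc), weights: (n_mfcc,), bottleneck: (n_frames, d_bn)
    weighted_mfcc = mfcc * np.asarray(weights)[None, :]
    return np.concatenate([weighted_mfcc, bottleneck], axis=1)   # (n_frames, n_mfcc + d_bn)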
Step S12, inputting the acoustic features into a voiceprint recognition model to identify the current voice personnel, and judging whether the current voice personnel are the personnel in a preset list according to the recognition result.
And step S13, when the current voice personnel are personnel in a preset list, the voice information is sent to a broadcasting terminal for playing.
The obtained acoustic features are input into the voiceprint recognition model and matched against the voiceprint feature data of different people, and the identity information of the current voice person is output. The voiceprint recognition model is, for example, a UBM/i-vector model which, after being trained on a data set in advance, can accurately recognize the identity of a voice person.
The preset personnel list records the identity information of a number of people, generally those who are allowed to use the broadcasting system. The identity information of the current voice person output by the voiceprint recognition model is compared with the preset personnel list to determine whether the current voice person is on it. If so, use of the broadcasting system is allowed, i.e., the voice information of the current voice person is sent to the broadcast terminal for playing.
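The patent names a UBM/i-vector model; as a hedged illustration of the whitelist check only, the sketch below scores a fixed-length voiceprint vector (e.g. an i-vector or embedding, not computed here) against enrolled voiceprints with cosine similarity and accepts the best match only if it is on the preset list. The threshold value and the cosine scoring are assumptions, not requirements of the patent.

# Whitelist check: identify the best-matching enrolled speaker, then verify
# that the identified person appears on the preset list.
import numpy as np

def identify_speaker(voiceprint, enrolled, threshold=0.7):
    """enrolled: dict mapping person-id -> enrolled voiceprint vector."""
    best_id, best_score = None, -1.0
    for person_id, ref in enrolled.items():
        score = np.dot(voiceprint, ref) / (np.linalg.norm(voiceprint) * np.linalg.norm(ref))
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id if best_score >= threshold else None

def is_authorized(voiceprint, enrolled, preset_list):
    person = identify_speaker(voiceprint, enrolled)
    return person is not None and person in preset_list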
And S14, when the current voice personnel are not personnel in a preset list, extracting the content of the voice information, and analyzing to judge whether the content of the voice information meets the broadcasting requirement.
And step S15, when the content of the voice information meets the broadcasting requirement, the voice information is sent to a broadcasting terminal for playing.
If the current voice person is not a person in the preset personnel list, content recognition is performed on his or her voice information, and the content is analyzed to determine whether it meets the broadcasting requirement.
In a specific implementation, the content of the voice information is recognized with a speech recognition algorithm and matched against a sensitive-word database to judge whether it contains sensitive words. If it does, the voice information is determined not to meet the playing requirement: it is not played, and an early warning can be raised. If it does not, the content is judged to meet the requirement and is sent to the broadcast terminal for playing, as sketched below.
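The following illustration combines the content check with the overall dispatch decision. It is a sketch under stated assumptions: transcribe(), send_to_terminal() and raise_alert() are placeholder callbacks, and the authorized flag is the result of a whitelist check such as the is_authorized helper sketched above; none of these interfaces is defined by the patent.

# Content check and dispatch: whitelisted speakers are played directly;
# everyone else is played only when no sensitive word is found.
def meets_broadcast_requirement(audio, transcribe, sensitive_words):
    text = transcribe(audio)                      # ASR placeholder callback
    return not any(word in text for word in sensitive_words)

def handle_voice(audio, authorized, transcribe, sensitive_words,
                 send_to_terminal, raise_alert):
    if authorized:
        send_to_terminal(audio)                   # speaker on the preset list: play directly
    elif meets_broadcast_requirement(audio, transcribe, sensitive_words):
        send_to_terminal(audio)                   # content passes the sensitive-word check
    else:
        raise_alert(audio)                        # sensitive content: do not play, warn instead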
In this embodiment, acoustic features are extracted from the voice information collected by the microphone, the identity of the current voice person is recognized with a voiceprint recognition model, and whether the current voice person is a person in the preset list is judged according to the recognition result. If so, the voice information is played; if not, its content is analyzed to judge whether it meets the broadcasting requirement, and it is played only when it does. By recognizing the identity of the current voice person and checking the voice content, this embodiment standardizes the use of the broadcasting system.
Referring to fig. 3, a broadcast information playing device according to an embodiment of the invention includes:
the feature extraction module 31 is configured to obtain voice information collected by the microphone, and extract acoustic features in the voice information;
the identity recognition module 32 is configured to input the acoustic features into a voiceprint recognition model to identify a current voice person;
a first judging module 33, configured to judge whether the current voice person is a person in a preset list according to the recognition result;
a first sending module 34, configured to send the voice information to a broadcast terminal for playing when the current voice person is a person in a preset list;
a second judging module 35, configured to extract the content of the voice information and analyze the content to determine whether the content of the voice information meets the broadcasting requirement when the current voice person is not a person in the preset list;
and the second sending module 36 is configured to send the voice information to a broadcast terminal for playing when the content of the voice information meets the broadcasting requirement.
Further, in the broadcast information playing device, the feature extraction module 31 is specifically configured to:
extracting MEL spectrum cepstrum features and Bottleneck features in the voice information;
calculating the weight coefficient of each dimension characteristic component of the MEL spectrum cepstrum feature, and carrying out weighted calculation on the MEL spectrum cepstrum feature according to the weight coefficient of each dimension characteristic component;
and carrying out feature fusion on the MEL spectrum cepstrum feature and the Bottleneck feature after weighted calculation to obtain acoustic features in the voice information.
Further, in the broadcast information playing device, the step of calculating the weight coefficient of each dimension feature component of the MEL spectrum cepstrum feature includes:
calculating contribution degrees of each dimension characteristic component of the MEL spectrum cepstrum characteristic to the speaker identity recognition rate respectively;
carrying out standardization processing on the contribution degree of each dimension characteristic component to the speaker identity recognition rate by adopting a min-max standardization method;
and determining the weight coefficient of each dimension characteristic component according to the contribution degree after the normalization processing.
The broadcast information playing device provided by the embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiment; for brevity, where the device embodiment is not described in detail, reference may be made to the corresponding content of the foregoing method embodiment.
In another aspect, referring to fig. 4, an electronic device according to an embodiment of the present invention includes a processor 10, a memory 20, and a computer program 30 stored in the memory and capable of running on the processor, where the broadcast information playing method described above is implemented when the processor 10 executes the computer program 30.
The electronic device may be, but is not limited to, a personal computer, a server, or another computer device. The processor 10 may in some embodiments be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip, and is used to execute program code or process data stored in the memory 20.
The memory 20 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk or an optical disk. The memory 20 may in some embodiments be an internal storage unit of the electronic device, such as a hard disk of the electronic device. In other embodiments the memory 20 may instead be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, or the like. Further, the memory 20 may include both an internal storage unit and an external storage device of the electronic device. The memory 20 may be used not only to store application software installed in the electronic device and various types of data, but also to temporarily store data that has been output or is to be output.
Optionally, the electronic device may further comprise a user interface, which may include a display (Display), an input unit such as a keyboard (Keyboard), a network interface and a communication bus; optionally, the user interface may also include a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display may also be referred to as a display screen or display unit and is used to display information processed in the electronic device and to present a visual user interface. The network interface may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface), and is typically used to establish a communication connection between the device and other electronic devices. The communication bus is used to realize connection and communication between these components.
It should be noted that the structure shown in fig. 4 does not constitute a limitation of the electronic device, and in other embodiments the electronic device may comprise fewer or more components than shown, or may combine certain components, or may have a different arrangement of components.
The present invention also proposes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a broadcast information playback method as described above.
Those of skill in the art will appreciate that the logic and/or steps represented in the flow diagrams or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus or device and execute them). For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber device, and a portable Compact Disc Read-Only Memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be captured electronically, for instance by optical scanning of the paper or other medium, and then compiled, interpreted or otherwise processed in a suitable manner if necessary before being stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, the steps may be implemented using any one, or a combination, of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

1. A broadcast information playing method, comprising:
acquiring voice information acquired by a microphone, and extracting acoustic features in the voice information;
inputting the acoustic features into a voiceprint recognition model to identify the current voice personnel;
judging whether the current voice personnel are personnel in a preset list according to the identification result;
when the current voice personnel are personnel in a preset list, sending the voice information to a broadcasting terminal for playing;
when the current voice personnel are not personnel in a preset list, extracting the content of the voice information, and analyzing to judge whether the content of the voice information meets the broadcasting requirement;
and when the content of the voice information meets the broadcasting requirement, sending the voice information to a broadcasting terminal for broadcasting.
2. The broadcast information playing method of claim 1, wherein the step of extracting acoustic features in the voice information comprises:
extracting MEL spectrum cepstrum features and Bottleneck features in the voice information;
calculating the weight coefficient of each dimension characteristic component of the MEL spectrum cepstrum feature, and carrying out weighted calculation on the MEL spectrum cepstrum feature according to the weight coefficient of each dimension characteristic component;
and carrying out feature fusion on the MEL spectrum cepstrum feature and the Bottleneck feature after weighted calculation to obtain acoustic features in the voice information.
3. The broadcast information playback method of claim 2, wherein the step of calculating weight coefficients for each dimensional feature component of the MEL spectral cepstrum feature comprises:
calculating contribution degrees of each dimension characteristic component of the MEL spectrum cepstrum characteristic to the speaker identity recognition rate respectively;
carrying out standardization processing on the contribution degree of each dimension characteristic component to the speaker identity recognition rate by adopting a min-max standardization method;
and determining the weight coefficient of each dimension characteristic component according to the contribution degree after the normalization processing.
4. The broadcast information playing method of claim 2, wherein the step of extracting the Bottleneck feature in the voice information includes:
pre-emphasis, framing and windowing are carried out on the voice information;
converting the processed voice information through FFT, and obtaining a corresponding frequency spectrum after taking an absolute value or a square value;
inputting the corresponding frequency spectrum into a Mel filter bank, and obtaining the Mel frequency spectrum output by the Mel filter bank;
taking logarithm of the MEL spectrum to obtain FBanks characteristics;
inputting the FBanks characteristics into a DNN model, and extracting node excitation values of a Bottleneck layer in the DNN model to obtain Bottleneck characteristics.
5. The broadcast information playing method of claim 1, wherein the step of extracting the contents of the voice information and analyzing to determine whether the contents of the voice information meet broadcasting requirements comprises:
identifying the content in the voice information through a voice identification algorithm, and matching with a sensitive word database;
judging whether the voice information contains sensitive words or not according to the matching result;
if not, determining that the voice information meets the broadcasting requirement.
6. A broadcast information playback apparatus, comprising:
the feature extraction module is used for acquiring voice information acquired by the microphone and extracting acoustic features in the voice information;
the identity recognition module is used for inputting the acoustic characteristics into a voiceprint recognition model so as to recognize the identity of the current voice personnel;
the first judging module is used for judging whether the current voice personnel are personnel in a preset list or not according to the identification result;
the first sending module is used for sending the voice information to a broadcasting terminal for playing when the current voice personnel are personnel in a preset list;
the second judging module is used for extracting the content of the voice information when the current voice personnel are not personnel in a preset list and analyzing the content to judge whether the content of the voice information meets the broadcasting requirement or not;
and the second sending module is used for sending the voice information to a broadcasting terminal for playing when the content of the voice information meets the broadcasting requirement.
7. The broadcast information playback apparatus of claim 6, wherein the feature extraction module is specifically configured to:
extracting MEL spectrum cepstrum features and Bottleneck features in the voice information;
calculating the weight coefficient of each dimension characteristic component of the MEL spectrum cepstrum feature, and carrying out weighted calculation on the MEL spectrum cepstrum feature according to the weight coefficient of each dimension characteristic component;
and carrying out feature fusion on the MEL spectrum cepstrum feature and the Bottleneck feature after weighted calculation to obtain acoustic features in the voice information.
8. The broadcast information playback apparatus of claim 7, wherein the step of calculating a weight coefficient for each dimensional feature component of the MEL spectral cepstrum feature comprises:
calculating contribution degrees of each dimension characteristic component of the MEL spectrum cepstrum characteristic to the speaker identity recognition rate respectively;
carrying out standardization processing on the contribution degree of each dimension characteristic component to the speaker identity recognition rate by adopting a min-max standardization method;
and determining the weight coefficient of each dimension characteristic component according to the contribution degree after the normalization processing.
9. A readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the broadcast information playing method according to any one of claims 1 to 5.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the broadcast information playing method according to any one of claims 1 to 5 when executing the computer program.
CN202310202075.7A 2023-03-06 2023-03-06 Broadcast information playing method and device, readable storage medium and electronic equipment Active CN116072123B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310202075.7A CN116072123B (en) 2023-03-06 2023-03-06 Broadcast information playing method and device, readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310202075.7A CN116072123B (en) 2023-03-06 2023-03-06 Broadcast information playing method and device, readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN116072123A true CN116072123A (en) 2023-05-05
CN116072123B CN116072123B (en) 2023-06-23

Family

ID=86174992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310202075.7A Active CN116072123B (en) 2023-03-06 2023-03-06 Broadcast information playing method and device, readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116072123B (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103458412A (en) * 2012-06-04 2013-12-18 百度在线网络技术(北京)有限公司 System and method for preventing phone fraud, mobile terminal and cloud terminal analysis server
CN106101819A (en) * 2016-06-21 2016-11-09 武汉斗鱼网络科技有限公司 A kind of live video sensitive content filter method based on speech recognition and device
CN106856541A (en) * 2016-11-30 2017-06-16 努比亚技术有限公司 A kind of terminal and method for secret protection
US20200005773A1 (en) * 2017-11-28 2020-01-02 International Business Machines Corporation Filtering data in an audio stream
CN108447490A (en) * 2018-02-12 2018-08-24 阿里巴巴集团控股有限公司 The method and device of Application on Voiceprint Recognition based on Memorability bottleneck characteristic
CN109039509A (en) * 2018-07-16 2018-12-18 广州辉群智能科技有限公司 A kind of method and broadcasting equipment of voice control broadcasting equipment
CN110827792A (en) * 2019-11-15 2020-02-21 广州视源电子科技股份有限公司 Voice broadcasting method and device
US20220198468A1 (en) * 2020-02-18 2022-06-23 Tencent Technology (Shenzhen) Company Limited Speech information communication management method and apparatus, storage medium, and device
CN111768789A (en) * 2020-08-03 2020-10-13 上海依图信息技术有限公司 Electronic equipment and method, device and medium for determining identity of voice sender thereof
US20220059077A1 (en) * 2020-08-19 2022-02-24 Sorenson Ip Holdings, Llc Training speech recognition systems using word sequences
US11165911B1 (en) * 2020-08-26 2021-11-02 Stereo App Limited Complex computing network for improving establishment and broadcasting of audio communication among mobile computing devices and for improving speaker-listener engagement using audio conversation control
CN113766256A (en) * 2021-02-09 2021-12-07 北京沃东天骏信息技术有限公司 Live broadcast wind control method and device
CN113315994A (en) * 2021-04-23 2021-08-27 北京达佳互联信息技术有限公司 Live broadcast data processing method and device, electronic equipment and storage medium
CN115346532A (en) * 2021-05-11 2022-11-15 中国移动通信集团有限公司 Optimization method of voiceprint recognition system, terminal device and storage medium
CN114613368A (en) * 2022-03-08 2022-06-10 广州国音智能科技有限公司 Cloud server, identity authentication method and system based on multiple devices
CN115240652A (en) * 2022-06-02 2022-10-25 福建新大陆通信科技股份有限公司 Emergency broadcast sensitive word recognition method
CN115719595A (en) * 2022-10-12 2023-02-28 厦门快商通科技股份有限公司 Voiceprint recognition method and system for purifying network environment during live broadcast

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Michał Raczyński: "Speech processing algorithm for isolated words recognition", 2018 IEEE *
Ding Senhua; Liu Chunjiang; Zhang Naiguang; Ma Yan: "Design of an emergency broadcasting system based on digital television" (一种基于数字电视的应急广播系统设计), Video Engineering (电视技术), no. 01 *

Also Published As

Publication number Publication date
CN116072123B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
WO2021208287A1 (en) Voice activity detection method and apparatus for emotion recognition, electronic device, and storage medium
WO2018166187A1 (en) Server, identity verification method and system, and a computer-readable storage medium
WO2018149077A1 (en) Voiceprint recognition method, device, storage medium, and background server
CN103475490B (en) A kind of auth method and device
EP1210711B1 (en) Sound source classification
JP2021527840A (en) Voiceprint identification methods, model training methods, servers, and computer programs
US20150112682A1 (en) Method for verifying the identity of a speaker and related computer readable medium and computer
US20070299671A1 (en) Method and apparatus for analysing sound- converting sound into information
JP2014502375A (en) Passphrase modeling device and method for speaker verification, and speaker verification system
CN109036382A (en) A kind of audio feature extraction methods based on KL divergence
WO2021151310A1 (en) Voice call noise cancellation method, apparatus, electronic device, and storage medium
CN112328994A (en) Voiceprint data processing method and device, electronic equipment and storage medium
CN113330511B (en) Voice recognition method, voice recognition device, storage medium and electronic equipment
US9947323B2 (en) Synthetic oversampling to enhance speaker identification or verification
CN110473552A (en) Speech recognition authentication method and system
CN109272991A (en) Method, apparatus, equipment and the computer readable storage medium of interactive voice
EP4233047A1 (en) Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium
CN111179940A (en) Voice recognition method and device and computing equipment
JP4717872B2 (en) Speaker information acquisition system and method using voice feature information of speaker
CN109545226A (en) A kind of audio recognition method, equipment and computer readable storage medium
Goh et al. Robust computer voice recognition using improved MFCC algorithm
CN111145761B (en) Model training method, voiceprint confirmation method, system, device and medium
CN116072123B (en) Broadcast information playing method and device, readable storage medium and electronic equipment
CN111429919A (en) Anti-sound crosstalk method based on conference recording system, electronic device and storage medium
JP6996627B2 (en) Information processing equipment, control methods, and programs

Legal Events

Code - Description
PB01 - Publication
SE01 - Entry into force of request for substantive examination
GR01 - Patent grant