WO2020233068A1 - Conference audio control method, system, device and computer readable storage medium - Google Patents


Info

Publication number
WO2020233068A1
Authority
WO
WIPO (PCT)
Prior art keywords
conference, audio, keywords, preset, word
Prior art date
Application number
PCT/CN2019/121711
Other languages
French (fr)
Chinese (zh)
Inventor
齐燕
Original Assignee
深圳壹账通智能科技有限公司
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2020233068A1

Classifications

    • G06F40/279 Handling natural language data; natural language analysis; recognition of textual entities
    • G06V40/161 Recognition of human faces in image or video data; detection, localisation, normalisation
    • G06V40/172 Recognition of human faces in image or video data; classification, e.g. identification
    • G10L15/26 Speech recognition; speech to text systems
    • G10L25/78 Speech or voice analysis; detection of presence or absence of voice signals
    • H04L12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. for computer conferences such as chat rooms

Abstract

The present application provides a conference audio control method, system, and device based on voice detection technology, as well as a computer-readable storage medium. The method includes: receiving conference audio, performing voice detection on the conference audio, and determining whether the conference audio contains user voice; if the conference audio contains user voice, extracting the user voice from the conference audio and converting the user voice into text data; and comparing the text data against preset conference keywords, and determining whether to output the conference audio according to the matching result of the text data and the conference keywords. The present application can automatically mute users who are not speaking, reduce manual operations, and improve efficiency.

Description

Conference audio control method, system, device, and computer-readable storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 21, 2019, with application number 201910432253.9 and invention title "Conference audio control method, system, device, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the technical field of conference audio control, and in particular to a conference audio control method, system, device, and computer-readable storage medium.
Background
In current multi-party conference systems with multiple connected parties, whether each participant's audio is open usually has to be controlled manually. This requires a conference initiator to constantly watch for whoever is speaking and open that party's microphone. Such operation demands extensive manual control, offers a low degree of automation, and makes meetings inefficient.
Summary
The main purpose of this application is to provide a conference audio control method, system, device, and computer-readable storage medium, aiming to solve the technical problem that existing conference audio control systems have a low degree of intelligence.
To achieve the above objective, this application provides a conference audio control method, which includes the following steps:
receiving conference audio, performing voice detection on the conference audio, and determining whether the conference audio contains user voice;
if the conference audio contains user voice, extracting the user voice from the conference audio and converting the user voice into text data;
comparing the text data against preset conference keywords, and determining whether to output the conference audio according to the matching result of the text data and the conference keywords;
wherein the step of performing voice detection on the conference audio and determining whether the conference audio contains user voice includes:
extracting audio frames from the conference audio and obtaining the signal energy of the audio frames;
outputting a user mute prompt, collecting the background noise while no user voice is present, and obtaining the background noise energy;
calculating the preset energy threshold from the background noise energy using a preset threshold formula: E_rnew = (1 - p) * E_rold + p * E_silence, where E_rnew is the new threshold, E_rold is the old threshold, E_silence is the background noise energy, and p is a weighting value satisfying 0 < p < 1;
comparing the signal energy of the audio frame with the preset energy threshold; and
if the signal energy of the audio frame is greater than the preset energy threshold, determining that the audio frame is a speech frame.
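The per-frame decision in the steps above can be sketched as follows. This is a minimal illustration assuming a simple sum-of-squares energy measure; the function and variable names are illustrative, not taken from the application.

```python
def frame_energy(frame):
    """Sum-of-squares energy of one audio frame (a list of samples)."""
    return sum(s * s for s in frame)

def is_speech_frame(frame, energy_threshold):
    """A frame whose signal energy exceeds the preset threshold is a speech frame."""
    return frame_energy(frame) > energy_threshold

# A quiet "background noise" frame versus a louder "speech" frame
noise_frame = [0.01, -0.02, 0.01, 0.0]
speech_frame = [0.5, -0.6, 0.4, -0.5]
print(is_speech_frame(noise_frame, 0.01))   # False
print(is_speech_frame(speech_frame, 0.01))  # True
```

In practice the threshold would come from the background-noise calibration described in the same steps.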
In addition, to achieve the above objective, this application also provides a conference audio control system, which includes:
a voice detection module, which receives conference audio, performs voice detection on the conference audio, and determines whether the conference audio contains user voice;
a text conversion module, which, if the conference audio contains user voice, extracts the user voice from the conference audio and converts the user voice into text data; and
a matching output module, which compares the text data against preset conference keywords and determines whether to output the conference audio according to the matching result of the text data and the conference keywords.
The voice detection module is further configured to extract audio frames from the conference audio and obtain the signal energy of the audio frames; compare the signal energy of each audio frame with the preset energy threshold; and, if the signal energy of an audio frame is greater than the preset energy threshold, determine that the audio frame is a speech frame.
The voice detection module is further configured to output a user mute prompt, collect the background noise while no user voice is present, and obtain the background noise energy; and to calculate the preset energy threshold from the background noise energy using a preset threshold formula: E_rnew = (1 - p) * E_rold + p * E_silence, where E_rnew is the new threshold, E_rold is the old threshold, E_silence is the background noise energy, and p is a weighting value satisfying 0 < p < 1.
In addition, to achieve the above objective, this application also provides a conference audio control device, which includes a processor, a memory, and computer-readable instructions stored on the memory and executable by the processor, where the computer-readable instructions, when executed by the processor, implement the steps of the conference audio control method described above.
In addition, to achieve the above objective, this application also provides a computer-readable storage medium on which computer-readable instructions are stored, where the computer-readable instructions, when executed by a processor, implement the steps of the conference audio control method described above.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.
Brief description of the drawings
FIG. 1 is a schematic structural diagram of the conference audio control device in the hardware operating environment involved in the embodiments of this application;
FIG. 2 is a schematic flowchart of an embodiment of the conference audio control method of this application;
FIG. 3 is a schematic diagram of the functional modules of an embodiment of the conference audio control system of this application.
The realization of the objectives, functional characteristics, and advantages of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
Please refer to FIG. 1, which is a schematic diagram of the hardware structure of the conference audio control device provided by this application.
The conference audio control device may be a PC, or a smart phone, tablet computer, portable computer, desktop computer, or similar device. Conference members participate in the conference through the conference audio control device, on which an audio/video capture apparatus may be installed, or to which external audio/video capture equipment may be connected. The conference audio control device may also be equipped with a display apparatus and an audio output apparatus for displaying the conference video and outputting the conference audio. Optionally, the conference audio control device may instead be a server device that connects conference terminals distributed at different addresses, receives the conference audio sent by the conference terminals, and outputs to the conference terminals the conference audio found suitable for output after analysis.
The conference audio control device may include components such as a processor 101 and a memory 201. In the conference audio control device, the processor 101 is connected to the memory 201, on which computer-readable instructions are stored; the processor 101 can call the computer-readable instructions stored in the memory 201 and implement the steps of the embodiments of the conference audio control method described below.
Those skilled in the art can understand that the structure of the conference audio control device shown in FIG. 1 does not constitute a limitation on the device, which may include more or fewer components than shown, a combination of certain components, or a different arrangement of components.
Based on the above structure, the following embodiments of the conference audio control method of this application are proposed.
This application provides a conference audio control method.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of the conference audio control method of this application.
In this embodiment, the conference audio control method includes the following steps.
Step S10: receive conference audio, perform voice detection on the conference audio, and determine whether the conference audio contains user voice.
As noted above, the conference audio control device may be a conference terminal device, i.e., the terminal device a conference member uses to participate in the conference. For example, when a conference member joins a corporate department meeting through a smart phone, that smart phone is the conference audio control device. The conference audio control device may also be a server device, i.e., a device that processes conference data remotely, where processing conference data may refer to transmitting the conference audio from one conference member to the terminal devices of the other members. For example, server device H is connected to conference members A, B, and C, who participate in the conference through three different conference terminal devices a, b, and c; device a transmits member A's audio to server device H, which then transmits it to conference terminal devices b and c.
In the explanation of the embodiments of the conference audio control method of this application, a conference terminal device is used as the conference audio control device by way of example, and hereinafter the conference audio control device may be referred to simply as the device.
In one embodiment, the conference audio refers to conference audio collected locally: an audio capture apparatus (recording apparatus) on the device, or external audio capture equipment connected to it, collects the audio signal of the space it is in and transmits it to the device, i.e., the device receives the local conference audio. For example, conference member A participates in the conference through device a; recording equipment L connected to device a collects the audio signal of the space where member A is located and transmits it to device a; the audio signal collected by recording equipment L is the conference audio in this embodiment. In this embodiment, before the transmittable conference audio is output directly, or indirectly through the server, to the terminals of the other conference members, the conference audio is analyzed and processed locally (analysis and processing refers to operations such as voice detection and text keyword detection), rather than outputting the collected conference audio over the network directly or indirectly to the other members' terminals. This avoids unnecessary network transmission of audio that does not need to be output to the other members, saves network bandwidth, increases the conference data transmission rate, and thus improves the real-time performance of conference data transmission.
In another embodiment, the conference audio refers to conference audio of other conference members transmitted remotely by the server to this device. For example, server device H is connected to conference members A, B, and C, who participate in the conference through three different devices a, b, and c; device a transmits member A's audio to server device H, which then transmits it to devices b and c; the audio of member A received by devices b and c is the conference audio in this embodiment. After the device receives the conference audio of other members remotely transmitted by the server, it performs voice detection, text keyword detection, and other processing and judgment operations on the received audio, and then determines whether or not to output it.
Performing voice detection on the conference audio means detecting whether user voice is present in it, which can be analyzed on the basis of differences in audio signal energy. The signal-to-noise ratio in a conference scene is usually high, so the audio energy corresponding to voice is high while that of background noise is low; by analyzing the energy distribution of the conference audio, it can be detected whether voice is present and how voice and noise are distributed. If the conference audio does not contain user voice, no subsequent operation is performed on it and it is not output.
Step S20: if the conference audio contains user voice, extract the user voice from the conference audio and convert the user voice into text data.
Since the background noise may also contain other people's voices, or the conference audio may contain speech unrelated to the content of the conference, this embodiment also filters noise by text content in order to obtain transmission audio with less noise and a better conference experience.
A speech-to-text operation may be performed on conference audio of a preset length to determine whether the speech is related to the conference; if not, it is likely background noise or some other sound that does not need to be transmitted, and the corresponding conference audio may not be transmitted. Optionally, the user voice segments in the conference audio may first be extracted by analyzing changes in the energy of the audio signal: a voice energy threshold corresponding to speech can be obtained, the audio signal energy at each moment is compared with this threshold, the audio segments whose signal energy is greater than or equal to the threshold are identified, and those segments are taken as the user voice segments. Next, the user voice segments are converted into text to obtain the corresponding text data. Finally, the text data corresponding to the user voice segments is compared with the preset conference keywords to determine whether the voice segments are related to the conference.
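The segment-extraction step just described, keeping the maximal runs whose energy is at or above the voice energy threshold, can be sketched as follows. This is a simplified illustration over a sequence of per-moment energy values; all names are illustrative.

```python
def extract_voice_segments(energies, voice_threshold):
    """Return (start, end) index pairs (end exclusive) of maximal runs
    whose audio signal energy is >= the voice energy threshold."""
    segments, start = [], None
    for i, e in enumerate(energies):
        if e >= voice_threshold and start is None:
            start = i                      # a voice segment begins here
        elif e < voice_threshold and start is not None:
            segments.append((start, i))    # the segment just ended
            start = None
    if start is not None:                  # a segment running to the end
        segments.append((start, len(energies)))
    return segments

# Per-moment energies with two bursts above the threshold of 1.0
print(extract_voice_segments([0.1, 2.0, 3.0, 0.2, 0.1, 4.0], 1.0))  # [(1, 3), (5, 6)]
```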
Converting a user voice segment into text data includes: dividing the segment into speech frames and extracting the acoustic feature of each frame, where the acoustic features may be MFCC (Mel-Frequency Cepstral Coefficients) features; inputting the acoustic features of each speech frame into an acoustic model, which outputs phonemes, where the acoustic model may be a hidden Markov model, a deep learning model, or a hybrid of the two; and combining the phonemes output by the acoustic model into text words, i.e., the text data corresponding to the user voice segment.
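The conversion pipeline just described (frames, acoustic features, acoustic model, phonemes, words) can be outlined as below. This is only a toy illustration: the mean-absolute-amplitude "feature" and the lookup-table "model" stand in for the MFCC features and the HMM or deep-learning acoustic model named in the text, and all names are hypothetical.

```python
def frame_signal(signal, frame_len):
    """Divide a voice segment into fixed-length speech frames."""
    return [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]

def toy_feature(frame):
    """Stand-in for MFCC extraction: mean absolute amplitude, rounded."""
    return round(sum(abs(s) for s in frame) / len(frame), 1)

def speech_to_text(signal, frame_len, acoustic_model, lexicon):
    """Map per-frame features to phonemes, then the phoneme sequence to a word."""
    phonemes = tuple(acoustic_model[toy_feature(f)]
                     for f in frame_signal(signal, frame_len))
    return lexicon.get(phonemes, "")

acoustic_model = {0.1: "h", 0.5: "i"}   # feature -> phoneme (toy lookup table)
lexicon = {("h", "i"): "hi"}            # phoneme sequence -> word
signal = [0.1, -0.1, 0.5, -0.5]
print(speech_to_text(signal, 2, acoustic_model, lexicon))  # hi
```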
Step S30: compare the text data against the preset conference keywords, and determine whether to output the conference audio according to the matching result of the text data and the conference keywords.
The text data is compared and matched against the preset conference keywords to determine whether the user voice segment is related to the conference, and hence whether it is necessary to output the conference audio.
The preset conference keywords may be pre-stored at a preset address locally or on the server. A keyword library may be preset, storing keyword sets corresponding to conferences on different topics; a conference member can select one or more target conference topics, which in turn determine the corresponding conference keywords. Optionally, conference keywords may also be entered or designated by a conference member with special permissions. In each conference, after the conference keywords are obtained for the first time, they are cached for quick retrieval and use in the subsequent audio control steps of that conference.
Since the text data consists of multiple words, it can be segmented into text words, and each text word is checked against the preset conference keywords for identity or similarity of meaning; if a text word is identical or similar in meaning to a preset conference keyword, that text word is successfully matched.
In one embodiment, as long as the text data contains any text word that successfully matches a preset conference keyword, the text data as a whole matches, i.e., the user voice segment is related to the conference and it is necessary to output the conference audio. In another embodiment, the text data matches only when the proportion of its words that match the conference keywords exceeds a preset value. For example, with a preset value of 1/50, suppose segmenting the text data yields 25 text words, 5 of which successfully match the preset conference keywords; the matching proportion is then 5/25 = 1/5 > 1/50, so the text data successfully matches the preset conference keywords.
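The proportion-based rule in the example above might be sketched as follows. Only exact word matching is shown; the similar-meaning check mentioned in the text is omitted, and the names are illustrative.

```python
def matches_conference(text_words, conference_keywords, preset_value=1 / 50):
    """Match when the share of text words hitting a preset conference
    keyword is greater than the preset value."""
    hits = sum(1 for w in text_words if w in conference_keywords)
    return hits / len(text_words) > preset_value

# The worked example: 5 matching words out of 25 -> 1/5 > 1/50
text_words = ["budget"] * 5 + ["the"] * 20
print(matches_conference(text_words, {"budget"}))  # True
```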
The text data is matched against the conference keywords, and the matching result determines whether the voice content in the conference audio is related to the conference: if related, the conference audio is output; if not, it is not output. In one embodiment, the device receives the local conference audio and, after the voice detection and text conversion steps of this embodiment, determines that the conference audio can be output; output here means transmitting the conference audio, directly or indirectly, to the terminals of the other conference members. In another embodiment, the conference audio is that of other conference members transmitted remotely by the server to this device; after it reaches the device and passes the voice detection and text conversion steps of this embodiment, the conference audio is determined to be outputtable, and output here means playing the conference audio on the local conference terminal.
In this embodiment, by receiving the conference audio, performing voice detection on it, and determining whether it contains user voice, noise that contains no user voice is prevented from being output, users who are not speaking can be muted automatically, background noise is removed, manual operations are reduced, and conference efficiency is improved. If the conference audio contains user voice, the user voice is extracted and converted into text data; the text data is compared against the preset conference keywords, and whether to output the conference audio is determined by the matching result, so that conference audio unrelated to the conference can be filtered out according to the voice content, reducing noise interference and wasted network bandwidth.
Further, based on the above embodiment, in a second embodiment of this application, the step in step S10 of performing voice detection on the conference audio and determining whether the conference audio contains user voice includes:
Step S11: extract audio frames from the conference audio and obtain the signal energy of the audio frames.
The conference audio may be divided into audio frames according to a preset sampling time of 2.5 ms to 60 ms, meaning that the amount of data in one 2.5 ms to 60 ms unit is taken as one audio frame. A piece of conference audio may be divided into multiple audio frames, and the subsequent energy comparison is performed on individual audio frames. The audio frames may be extracted from the conference audio sequentially in time order.
As for the signal energy of an audio frame, the sound energy at a location can be expressed as the average energy flowing through a unit area of the medium there per unit time, given by (P * w^2 * u * A^2) / 2, where P is the medium density, w the sound frequency, A the amplitude, and u the wave speed.
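The stated energy expression can be evaluated directly; a small sketch, with parameter values that are illustrative only:

```python
def sound_energy(density, frequency, wave_speed, amplitude):
    """Average energy flux per the formula (P * w^2 * u * A^2) / 2,
    with P the medium density, w the sound frequency, u the wave speed,
    and A the amplitude."""
    return density * frequency ** 2 * wave_speed * amplitude ** 2 / 2

print(sound_energy(2.0, 3.0, 4.0, 5.0))  # 900.0
```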
步骤S12,将所述音频帧的信号能量与预置的能量阈值进行大小比较;Step S12, comparing the signal energy of the audio frame with a preset energy threshold;
步骤S13,若所述音频帧的信号能量大于预置的能量阈值,则判定所述音频帧为语音帧。Step S13: If the signal energy of the audio frame is greater than a preset energy threshold, it is determined that the audio frame is a speech frame.
预置的能量阈值,指预先经实验确定的阈值,也可以是经验值,大于 该预置的能量阈值,则对应音频帧能量较高,该音频帧为语音帧,小于该预置的能量阈值,则对应音频帧能量较低,该音频帧为非语音帧。The preset energy threshold refers to the threshold determined by experiments in advance, or it can be an empirical value. If the energy threshold is greater than the preset energy threshold, the corresponding audio frame has a higher energy, and the audio frame is a speech frame, which is less than the preset energy threshold , The corresponding audio frame has lower energy, and the audio frame is a non-speech frame.
将音频帧的信号能量与预置的能量阈值进行大小比较,并根据大小比较结果分别对从会议音频中提取的所有音频帧进行语音帧与非语音帧的判定。The signal energy of the audio frame is compared with the preset energy threshold, and the speech frame and non-speech frame are judged respectively on all audio frames extracted from the conference audio according to the size comparison result.
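The frame-extraction and energy-comparison logic of steps S11–S13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes mono PCM samples at 16 kHz and uses the mean squared amplitude of a frame as its signal energy, a common practical stand-in for the physical energy formula above.

```python
def split_frames(samples, sample_rate=16000, frame_ms=20):
    """Split audio samples into frames of frame_ms milliseconds (2.5-60 ms range)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def frame_energy(frame):
    """Mean squared amplitude of one frame (proxy for signal energy)."""
    return sum(s * s for s in frame) / len(frame)

def classify_frames(samples, energy_threshold, sample_rate=16000, frame_ms=20):
    """Label each frame as a speech frame (True) or non-speech frame (False)."""
    return [frame_energy(f) > energy_threshold
            for f in split_frames(samples, sample_rate, frame_ms)]
```

Only frames flagged `True` would proceed to the speech-to-text stage described later.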
Optionally, before step S12, the method includes:
Step S14: output a user mute prompt, collect the background noise in a no-user-speech state, and obtain the background noise energy.
Before the conference starts, or at its beginning, the background noise energy can be collected from the conference audio in the no-user-speech state, and the corresponding preset energy threshold can then be calculated.
The user mute prompt is a prompt reminding conference members to remain silent and not speak; it may be output in voice or text form. Optionally, the mute prompt may specify how long to remain silent, such as "Please remain silent for 5 seconds", and a countdown may be output to remind conference members. Optionally, the mute prompt may be maintained until the background noise in the no-user-speech state has been collected. The no-user-speech state is the period, after the mute prompt is output, during which users are expected to remain silent. Optionally, to prevent user speech from being included in the background noise because a conference member failed to stay silent after the prompt, the audio captured in this state may be collected and subjected to voice detection; if speech is present, the user mute prompt is output again, and the background noise and its energy are re-collected.
Step S15: calculate the preset energy threshold based on the background noise energy and a preset threshold formula, the threshold formula being E_rnew = (1-p)·E_rold + p·E_silence, where E_rnew is the new threshold, E_rold is the old threshold, E_silence is the background noise energy, and p is a weighting value satisfying 0 < p < 1.
After the background noise energy is obtained, the preset energy threshold can be calculated from the background noise energy and the preset threshold formula. The threshold formula is stored at a preset address and only needs to be fetched from there when the threshold must be calculated; the calculated threshold may likewise be stored at a fixed address, from which it can be read directly whenever a speech determination is required, enabling fast voice detection.
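The threshold update of step S15 can be sketched directly from the formula. A minimal sketch, assuming the returned value simply replaces the stored (old) threshold each time background noise is re-collected:

```python
def update_energy_threshold(old_threshold, silence_energy, p=0.2):
    """E_rnew = (1 - p) * E_rold + p * E_silence, with weighting value 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("weighting value p must satisfy 0 < p < 1")
    return (1 - p) * old_threshold + p * silence_energy
```

A small p keeps the threshold stable across re-calibrations; a larger p lets it track changes in the room's noise floor more quickly. The default p=0.2 here is illustrative; the patent only requires 0 < p < 1.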
In this embodiment, audio frames are extracted from the conference audio and their signal energy is obtained; the signal energy of each audio frame is compared with the preset energy threshold; and if the signal energy of an audio frame is greater than the preset energy threshold, the audio frame is determined to be a speech frame. Meanwhile, the preset energy threshold is calculated from the background noise energy in the no-user-speech state using the preset threshold formula. This makes it possible to determine whether an audio frame is a speech frame, and thereby to decide whether to perform the subsequent speech-to-text and output operations.
Further, based on the foregoing embodiments, in the third embodiment of the present application, before step S30 the method includes:
Step S31: obtain pre-stored conference materials, obtain a target text set based on the conference materials, perform word segmentation on the target texts in the target text set, and obtain the segmented target words.
Conference materials are the graphic, text, audio, and video materials related to the conference. They may be uploaded by conference members and stored at a preset materials address, or corresponding conference materials may be pre-stored for different conference topics.
Obtaining a target text set based on the conference materials means performing image-to-text and audio-to-text conversion on the graphic and audio/video materials in the conference materials to obtain their corresponding texts, which serve as the target text set for keyword extraction. All target texts in the set are then segmented, and the words obtained after segmentation are taken as the target words. Before the audio materials in the conference materials are converted into text data, they may be subjected to "noise reduction" processing; after meaningless filler words are removed from the text data, the text data is segmented.
Step S32: obtain the word features of the target words, and calculate a weight value for each target word based on its word features, where the word features include at least part of speech, word position, and word frequency.
Word features are extracted for each target word; the features include at least part of speech, word position, and word frequency. When extracting the part-of-speech feature of a target word, the word is compared with the words in different part-of-speech lexicons to determine which lexicon it belongs to; the part of speech corresponding to that lexicon is the part of speech of the target word. When extracting the word-position feature, the position of the target word within its text is obtained, which may be the title, first paragraph, last paragraph, first sentence, last sentence, and so on. When extracting the word-frequency feature, the total number of occurrences of the target word in the target text set and the total number of occurrences in its own text are counted.
Different parts of speech, word positions, and word frequencies correspond to different sub-weight values, which can be assigned in advance. Specifically, for part of speech, corresponding sub-weight values can be preset for each category: for example, nouns and verbs may have a sub-weight of 0.8, adjectives/adverbs 0.5, and other parts of speech 0.
For word position, a coefficient must be preset for each position, identifying how strongly words in different positions reflect the topic. A word appearing in the title reflects the topic better than one appearing elsewhere in the article (such as the beginning of a paragraph, the body, or the end of a paragraph), and a word at the beginning of a paragraph reflects the topic better than one at the end of a paragraph; words in the body carry the smallest weight. For example, if the title is assigned a coefficient of 0.8, the beginning of a paragraph 0.6, the end of a paragraph 0.5, and the body 0.2, then for a given word its position sub-weight value Y is:
Y = x1×0.8 + x2×0.6 + x3×0.5 + x4×0.2
where x1 is the number of times the word appears in the title, x2 the number of times it appears at the beginning of a paragraph, x3 the number of times it appears at the end of a paragraph, and x4 the number of times it appears in the body.
For word frequency, the sub-weight value of a word can be calculated as M = f/(1+f), where f is the word's frequency in an article. With this formula, the sub-weight rises gradually as the frequency increases, and as the frequency grows the formula converges toward 1: the more often a word appears, the more likely it is to be a keyword, but the growth in likelihood is not linear. When the frequency is very high the value essentially stabilizes, which matches the reality of language better than a linear formula.
After the sub-weight values corresponding to part of speech, word position, and word frequency are calculated, they can be summed to obtain the weight value of the target word.
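The weighting scheme of step S32 can be sketched using the example sub-weights given above (noun/verb 0.8, adjective/adverb 0.5; title 0.8, paragraph start 0.6, paragraph end 0.5, body 0.2; frequency sub-weight M = f/(1+f)). The part-of-speech tags and position counts below are illustrative inputs assumed for the sketch, not values fixed by the patent.

```python
POS_SUBWEIGHT = {"noun": 0.8, "verb": 0.8, "adjective": 0.5, "adverb": 0.5}
POSITION_COEF = {"title": 0.8, "para_start": 0.6, "para_end": 0.5, "body": 0.2}

def position_subweight(counts):
    """Y = x1*0.8 + x2*0.6 + x3*0.5 + x4*0.2 over per-position occurrence counts."""
    return sum(POSITION_COEF[pos] * n for pos, n in counts.items())

def frequency_subweight(f):
    """M = f / (1 + f): grows with frequency and converges toward 1."""
    return f / (1 + f)

def word_weight(pos_tag, position_counts, freq):
    """Weight value = sum of part-of-speech, position, and frequency sub-weights."""
    return (POS_SUBWEIGHT.get(pos_tag, 0.0)
            + position_subweight(position_counts)
            + frequency_subweight(freq))

def conference_keywords(words, threshold):
    """Keep target words whose weight exceeds the preset threshold (step S33)."""
    return [w for w, feats in words.items() if word_weight(*feats) > threshold]
```

For instance, a noun appearing once in the title and three times in the body with frequency 4 scores 0.8 + (0.8 + 0.6) + 4/5 = 3.0.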
Step S33: use the target words whose weight value is greater than a preset threshold as the preset conference keywords.
All target words whose weight value is greater than the preset threshold are taken as the preset conference keywords. A weight value above the preset threshold indicates that the corresponding target word is relatively important in the conference materials and can serve as a conference keyword. The preset threshold may be an empirical value.
In this embodiment, the pre-stored conference materials are segmented, word features are extracted from the segmented target words, and the weight value of each target word is calculated based on its word features, where the features include at least part of speech, word position, and word frequency; the target words whose weight value exceeds the preset threshold are used as the preset conference keywords. Conference keywords can thus be generated automatically from the conference materials. Compared with having conference members enter keywords manually, this embodiment yields more objective and comprehensive conference keywords, making the subsequent determination of whether user speech in the conference audio is related to the conference more accurate.
Further, based on the foregoing embodiments, in the fourth embodiment of the present application, the step in step S30 of comparing and matching the text data with the preset conference keywords includes:
Step S34: perform word segmentation on the text data to obtain the segmented utterance keywords.
After the text data is segmented, the segmented words are obtained. All of them may be taken as utterance keywords; alternatively, they may be classified by part of speech, with the nouns, gerunds, and verbs among them taken as the utterance keywords.
Step S35: compare the utterance keywords with the preset conference keywords, and determine whether the utterance keywords contain a conference keyword.
There may be multiple utterance keywords and multiple preset conference keywords. Each utterance keyword is compared with all of the conference keywords to determine whether it is identical to at least one conference keyword or identical/similar to one in meaning. In this embodiment, "containing" a conference keyword means being identical to a conference keyword or identical/similar to it in meaning.
Specifically, it is first determined whether an utterance keyword is identical to at least one conference keyword. If so, it can be determined that the utterance keywords contain a conference keyword. If it differs from all conference keywords, it is further determined whether the utterance keyword is identical/similar in meaning to at least one conference keyword; if so, the utterance keywords contain a conference keyword, and if its meaning differs from that of all conference keywords, it can be determined that the utterance keywords do not contain a conference keyword.
A corpus may be created in advance, storing words identical or similar in meaning to the conference keywords. When determining whether an utterance keyword is identical/similar in meaning to at least one conference keyword, the related words that are identical/similar in meaning to the conference keywords are obtained from the corpus, and the utterance keyword is compared with these related words; if the utterance keyword is identical to at least one related word, it can be determined that the utterance keyword is identical/similar in meaning to at least one conference keyword.
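The two-stage comparison of step S35 — an exact match first, then a meaning match via the pre-built corpus — can be sketched as follows. The synonym corpus here is a hypothetical in-memory dict assumed for illustration; a real system would query whatever lexical resource the corpus is stored in.

```python
def contains_conference_keyword(utterance_keywords, conference_keywords,
                                synonym_corpus):
    """True if any utterance keyword equals a conference keyword, or equals one
    of the related (same/similar meaning) words stored for a conference keyword."""
    kw_set = set(conference_keywords)
    for word in utterance_keywords:      # stage 1: identical keyword
        if word in kw_set:
            return True
    for word in utterance_keywords:      # stage 2: same/similar meaning
        for kw in conference_keywords:
            if word in synonym_corpus.get(kw, ()):
                return True
    return False
```

A `True` result corresponds to a successful match in step S36, and the conference audio would be output.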
Step S36: if the utterance keywords contain a conference keyword, the text data is successfully matched with the conference keywords.
If the utterance keywords contain a conference keyword, the text data matches the conference keywords successfully, and the conference audio can be output. Conversely, if the utterance keywords contain no conference keyword, the match fails, indicating that the user speech in the conference audio may be unrelated to the conference content, and the conference audio need not be output.
In this embodiment, as long as the utterance keywords contain a conference keyword, the text data matches the conference keywords successfully. This avoids missing important user speech in the conference audio because the matching requirement is set too high.
Further, based on the foregoing embodiments, in the fifth embodiment of the present application, the step in step S30 of determining whether to output the conference audio according to the matching result of the text data and the conference keywords includes:
Step S370: if the text data is successfully matched with the conference keywords, obtain a conference image.
After the text data is successfully matched with the conference keywords, whether to output the conference audio can be further determined based on image analysis. The conference image in this embodiment is the conference image at the source of the conference audio, that is, an image of the space where the conference members producing the audio are located. For example, if the conference audio is local audio captured by a local sound-capture device, the conference image is a local image; if the conference audio is audio from a remote space transmitted via the server network, the conference image is an image of the corresponding remote space. As another example, if the conference audio originates from conference member A, the conference image is an image of the space where member A is located.
Step S371: detect the face in the conference image, extract the lip features of the detected face, and determine from the lip features whether the face exhibits speaking features.
Face recognition is performed on the conference image to obtain the faces in it. A conference image may contain multiple faces, in which case lip-feature extraction and a speaking-feature determination are performed for each face; if at least one face exhibits speaking features, it can be determined that a face in the conference image exhibits speaking features. Based on the positional characteristics of facial features, image recognition can be performed directly on the face to locate the lips. The lip features can be input into a preset speech-judgment model, which determines from the lip features whether the face exhibits speaking features. The speech-judgment model can be trained using lip images labeled as speaking and non-speaking mouth shapes as positive and negative examples, respectively; once the optimal model parameters are obtained, the model containing those parameters is used to judge speaking based on lip features.
Step S372: if the face exhibits speaking features, determine that the conference audio is to be output.
If a face exhibits speaking features, a conference member in the space corresponding to the conference audio is speaking, so it can be determined that the conference audio contains a member's speech, and the conference audio needs to be output. If no face exhibits speaking features, no conference member in that space is speaking, so the conference audio should contain no member speech; any user speech present in the conference audio is likely noise, and it is determined that the conference audio is not output.
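The patent leaves the speech-judgment model unspecified beyond its training on speaking/non-speaking lip images. As a simplified stand-in for such a model, a mouth-opening ratio computed from lip landmarks is sometimes used as a speaking cue; the sketch below assumes lip landmarks are already available from a face detector, and the opening threshold of 0.3 is a hypothetical value, not one taken from the patent.

```python
def mouth_aspect_ratio(top_lip, bottom_lip, left_corner, right_corner):
    """Vertical mouth opening divided by mouth width, from (x, y) lip landmarks."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return dist(top_lip, bottom_lip) / dist(left_corner, right_corner)

def should_output_audio(faces, open_threshold=0.3):
    """Output the conference audio if at least one detected face appears to be
    speaking (steps S371-S372). `faces` holds one lip-landmark tuple per face."""
    return any(mouth_aspect_ratio(*f) > open_threshold for f in faces)
```

A trained classifier, as the patent describes, would replace the fixed ratio threshold with learned parameters, but the decision rule — output the audio when any face is judged to be speaking — stays the same.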
In this embodiment, image recognition is performed on the conference image corresponding to the conference audio, the lip features of the faces in the image are extracted, and whether each face exhibits speaking features — that is, whether the person is speaking — is determined from the lip features. If at least one face in the conference image exhibits speaking features, the conference audio can be output. In this way, whether the conference audio should be output is determined jointly from image features and audio features, yielding more accurate conference-audio screening results.
Optionally, after the step of detecting the face in the conference image in step S371, the method includes:
Step S373: perform frontal/profile recognition on the detected face.
A discrimination model for frontal/profile recognition can be preset and trained on face images labeled as frontal or profile until a model containing the optimal parameters is obtained. The detected face image can then be input into this discrimination model, which outputs the frontal/profile recognition result.
Step S374: if the face is frontal, perform the step of extracting the lip features of the detected face.
If the face is frontal, the conference member is facing the conference screen and participating attentively; moreover, in the frontal state the complete lips of the face can be detected. Therefore, to screen the conference audio that needs to be output with further accuracy, the step of extracting the lip features of the detected face can be performed to judge whether the person is speaking, that is, steps S371–S372 are executed.
Step S375: if the face is a profile, determine that the conference audio is not output.
If the face is a profile, the conference member may need to hold a private discussion with other members, so it is determined that the conference audio is not output. This enhances the flexibility of conference-audio screening and also offers good practicality for remote-conference scenarios.
In addition, the present application further provides a conference audio control system corresponding to the steps of the foregoing conference audio control method.
Referring to Fig. 3, Fig. 3 is a schematic diagram of the functional modules of the first embodiment of the conference audio control system of the present application.
In this embodiment, the conference audio control system of the present application includes:
a voice detection module 10, configured to receive conference audio, perform voice detection on the conference audio, and determine whether the conference audio contains user speech;
a text conversion module 20, configured to, if the conference audio contains user speech, extract the user speech from the conference audio and convert the user speech into text data; and
a matching output module 30, configured to compare and match the text data with preset conference keywords, and determine whether to output the conference audio according to the matching result of the text data and the conference keywords.
Further, the voice detection module 10 is also configured to extract audio frames from the conference audio and obtain the signal energy of the audio frames; compare the signal energy of each audio frame with the preset energy threshold; and, if the signal energy of an audio frame is greater than the preset energy threshold, determine that the audio frame is a speech frame.
Further, the voice detection module 10 is also configured to output a user mute prompt, collect the background noise in the no-user-speech state, and obtain the background noise energy; and calculate the preset energy threshold based on the background noise energy and the preset threshold formula, the threshold formula being E_rnew = (1-p)·E_rold + p·E_silence, where E_rnew is the new threshold, E_rold is the old threshold, E_silence is the background noise energy, and p is a weighting value satisfying 0 < p < 1.
Further, the conference audio control system further includes:
a conference keyword determination module, configured to obtain pre-stored conference materials, obtain a target text set based on the conference materials, perform word segmentation on the target texts in the target text set, and obtain the segmented target words; obtain the word features of the target words and calculate the weight value of each target word based on its word features, where the word features include at least part of speech, word position, and word frequency; and use the target words whose weight value is greater than the preset threshold as the preset conference keywords.
Further, the matching output module 30 is also configured to perform word segmentation on the text data to obtain the segmented utterance keywords; compare the utterance keywords with the preset conference keywords and determine whether the utterance keywords contain a conference keyword; and, if the utterance keywords contain a conference keyword, determine that the text data is successfully matched with the conference keywords.
Further, the matching output module 30 is also configured to obtain a conference image if the text data is successfully matched with the conference keywords; detect the face in the conference image, extract the lip features of the detected face, and determine from the lip features whether the face exhibits speaking features; and, if the face exhibits speaking features, determine that the conference audio is to be output.
Further, the matching output module 30 is also configured to perform frontal/profile recognition on the detected face; if the face is frontal, perform the step of extracting the lip features of the detected face; and, if the face is a profile, determine that the conference audio is not output.
The present application further proposes a computer-readable storage medium, which may be a non-volatile readable storage medium storing computer-readable instructions. The computer-readable storage medium may be the memory 201 in the conference audio control device of Fig. 1, or at least one of a ROM (Read-Only Memory)/RAM (Random Access Memory), a magnetic disk, and an optical disc. The computer-readable storage medium includes a number of instructions for causing a device with a processor (which may be a mobile phone, a computer, a server, a network device, or the conference audio control device in the embodiments of the present application, etc.) to execute the methods of the embodiments of the present application.
It should be noted that, herein, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, or device that comprises the element.
The serial numbers of the foregoing embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
Through the description of the foregoing embodiments, those skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation.
The above are only optional embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application thereof in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A conference audio control method, wherein the conference audio control method comprises the following steps:
    receiving conference audio, performing voice detection on the conference audio, and determining whether the conference audio contains user speech;
    if the conference audio contains user speech, extracting the user speech from the conference audio and converting the user speech into text data; and
    comparing the text data against preset conference keywords, and determining whether to output the conference audio according to the result of matching the text data against the conference keywords;
    wherein the step of performing voice detection on the conference audio and determining whether the conference audio contains user speech comprises:
    extracting audio frames from the conference audio and obtaining the signal energy of the audio frames;
    outputting a mute prompt to the user, capturing the background noise while no user speech is present, and obtaining the background noise energy;
    calculating the preset energy threshold from the background noise energy using a preset threshold formula, the threshold formula being E_rnew = (1-p)·E_rold + p·E_silence, where E_rnew is the updated threshold, E_rold is the previous threshold, E_silence is the background noise energy, and p is a weighting factor satisfying 0 < p < 1;
    comparing the signal energy of the audio frames against the preset energy threshold; and
    if the signal energy of an audio frame is greater than the preset energy threshold, determining that the audio frame is a speech frame.
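The energy-based voice detection recited in claim 1 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the frame energy is taken as the mean squared PCM sample amplitude, and E_silence is assumed to be averaged over a few frames captured during the mute prompt.

```python
def frame_energy(frame):
    """Mean squared amplitude of one audio frame (a list of PCM samples)."""
    return sum(s * s for s in frame) / len(frame)

def update_threshold(e_old, e_silence, p=0.2):
    """Threshold formula from claim 1: E_rnew = (1-p)*E_rold + p*E_silence."""
    assert 0 < p < 1  # the claim requires the weighting factor to lie in (0, 1)
    return (1 - p) * e_old + p * e_silence

def is_speech_frame(frame, threshold):
    """A frame is classified as a speech frame when its energy exceeds the threshold."""
    return frame_energy(frame) > threshold

# Calibration: background noise captured while the user is muted.
noise_frames = [[10, -12, 8, -9], [11, -10, 9, -8]]
e_silence = sum(frame_energy(f) for f in noise_frames) / len(noise_frames)
threshold = update_threshold(e_old=e_silence, e_silence=e_silence)

print(is_speech_frame([400, -380, 410, -395], threshold))  # loud frame -> True
print(is_speech_frame([9, -11, 10, -8], threshold))        # noise-level frame -> False
```

The weighted update lets the threshold track slowly drifting background noise rather than being fixed at calibration time; p controls how quickly the old threshold is forgotten.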
  2. The conference audio control method of claim 1, wherein before the step of comparing the text data against the preset conference keywords, the method comprises:
    obtaining pre-stored conference materials, deriving a target text set from the conference materials, and performing word segmentation on the target texts in the target text set to obtain segmented target words;
    obtaining word features of the target words and calculating a weight value for each target word based on its word features, the word features including at least part of speech, word position, and word frequency; and
    using the target words whose weight values exceed a preset threshold as the preset conference keywords.
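The keyword-selection step of claim 2 can be sketched as below. The claim only requires that part of speech, word position, and word frequency all contribute to the weight; the specific coefficients (`POS_WEIGHT`, the title boost, the 0.05 cutoff) and the pre-segmented input format are illustrative assumptions.

```python
from collections import Counter

POS_WEIGHT = {"noun": 1.0, "verb": 0.6, "other": 0.2}  # assumed coefficients

def keyword_weights(tagged_words, title_words):
    """tagged_words: list of (word, pos) pairs from the segmented conference materials."""
    freq = Counter(w for w, _ in tagged_words)
    total = len(tagged_words)
    weights = {}
    for word, pos in tagged_words:
        tf = freq[word] / total                           # word frequency
        pos_w = POS_WEIGHT.get(pos, POS_WEIGHT["other"])  # part of speech
        loc_w = 2.0 if word in title_words else 1.0       # word position (title boost)
        weights[word] = tf * pos_w * loc_w
    return weights

def preset_keywords(tagged_words, title_words, threshold=0.05):
    """Keep the target words whose weight exceeds the preset threshold."""
    weights = keyword_weights(tagged_words, title_words)
    return {w for w, v in weights.items() if v > threshold}

docs = [("budget", "noun"), ("review", "noun"), ("the", "other"),
        ("budget", "noun"), ("discuss", "verb"), ("budget", "noun")]
print(preset_keywords(docs, title_words={"budget"}))
```

In this toy run the stop-word-like "the" scores below the cutoff and is dropped, while the content words survive as conference keywords; a production system would obtain the (word, pos) pairs from a segmenter such as jieba.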
  3. The conference audio control method of claim 1, wherein the step of comparing the text data against the preset conference keywords comprises:
    performing word segmentation on the text data to obtain segmented utterance keywords;
    comparing the utterance keywords against the preset conference keywords to determine whether the utterance keywords include any of the conference keywords; and
    if the utterance keywords include a conference keyword, determining that the text data matches the conference keywords.
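The matching step of claim 3 reduces to a set-intersection test once the text is segmented. A minimal sketch, using whitespace/regex tokenization as a stand-in for a real segmenter:

```python
import re

def segment(text):
    """Illustrative word segmentation; a real system would use a Chinese
    segmenter such as jieba, but simple tokenization suffices here."""
    return [w.lower() for w in re.findall(r"\w+", text)]

def matches_keywords(text_data, conference_keywords):
    """Claim 3: the text matches when its segmented utterance keywords
    include at least one preset conference keyword."""
    utterance_keywords = set(segment(text_data))
    return not utterance_keywords.isdisjoint(conference_keywords)

keywords = {"budget", "roadmap"}
print(matches_keywords("Let's move on to the budget review", keywords))  # True
print(matches_keywords("Unrelated small talk", keywords))                # False
```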
  4. The conference audio control method of claim 1, wherein the step of determining whether to output the conference audio according to the result of matching the text data against the conference keywords comprises:
    if the text data matches the conference keywords, acquiring a conference image;
    detecting a face in the conference image, extracting lip features of the detected face, and determining from the lip features whether the face exhibits speaking activity; and
    if the face exhibits speaking activity, determining that the conference audio is to be output.
  5. The conference audio control method of claim 4, wherein after the step of detecting a face in the conference image, the method comprises:
    performing frontal/profile recognition on the detected face;
    if the face is frontal, executing the step of extracting the lip features of the detected face; and
    if the face is in profile, determining that the conference audio is not to be output.
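The decision logic of claims 4 and 5 can be sketched as the following gate. The perception steps (face detection, frontal/profile classification, lip-feature measurement) are stubbed out as inputs — they would come from a computer-vision model — and the `lip_motion` score and its threshold are assumptions, since the claims do not fix a concrete lip-feature criterion.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FaceObservation:
    is_frontal: bool   # result of frontal/profile recognition (claim 5)
    lip_motion: float  # assumed lip-feature score, e.g. mouth-opening variance

LIP_MOTION_THRESHOLD = 0.3  # assumed cutoff for "exhibits speaking activity"

def should_output_audio(text_matched: bool, face: Optional[FaceObservation]) -> bool:
    if not text_matched:      # claim 4: keyword match is a precondition
        return False
    if face is None:          # no face detected in the conference image
        return False
    if not face.is_frontal:   # claim 5: a profile face suppresses output
        return False
    # claim 4: lip features must indicate speaking activity
    return face.lip_motion > LIP_MOTION_THRESHOLD

print(should_output_audio(True, FaceObservation(True, 0.8)))   # True
print(should_output_audio(True, FaceObservation(False, 0.8)))  # False
```

Ordering the checks this way means the image pipeline only runs after a keyword match, mirroring the "if the text data matches ... acquiring a conference image" structure of claim 4.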
  6. A conference audio control system, wherein the conference audio control system comprises:
    a voice detection module, configured to receive conference audio, perform voice detection on the conference audio, and determine whether the conference audio contains user speech;
    a text conversion module, configured to, if the conference audio contains user speech, extract the user speech from the conference audio and convert the user speech into text data; and
    a matching output module, configured to compare the text data against preset conference keywords and determine whether to output the conference audio according to the result of matching the text data against the conference keywords;
    the voice detection module being further configured to extract audio frames from the conference audio and obtain the signal energy of the audio frames, compare the signal energy of the audio frames against a preset energy threshold, and, if the signal energy of an audio frame is greater than the preset energy threshold, determine that the audio frame is a speech frame;
    the voice detection module being further configured to output a mute prompt to the user, capture the background noise while no user speech is present, obtain the background noise energy, and calculate the preset energy threshold from the background noise energy using a preset threshold formula, the threshold formula being E_rnew = (1-p)·E_rold + p·E_silence, where E_rnew is the updated threshold, E_rold is the previous threshold, E_silence is the background noise energy, and p is a weighting factor satisfying 0 < p < 1.
  7. The conference audio control system of claim 6, wherein the conference audio control system further comprises:
    a conference keyword determination module, configured to obtain pre-stored conference materials, derive a target text set from the conference materials, perform word segmentation on the target texts in the target text set to obtain segmented target words, obtain word features of the target words, and calculate a weight value for each target word based on its word features, the word features including at least part of speech, word position, and word frequency, and to use the target words whose weight values exceed a preset threshold as the preset conference keywords.
  8. The conference audio control system of claim 6, wherein the matching output module is further configured to perform word segmentation on the text data to obtain segmented utterance keywords, compare the utterance keywords against the preset conference keywords to determine whether the utterance keywords include any of the conference keywords, and, if the utterance keywords include a conference keyword, determine that the text data matches the conference keywords.
  9. The conference audio control system of claim 6, wherein the matching output module is further configured to acquire a conference image if the text data matches the conference keywords, detect a face in the conference image, extract lip features of the detected face, determine from the lip features whether the face exhibits speaking activity, and, if the face exhibits speaking activity, determine that the conference audio is to be output.
  10. The conference audio control system of claim 9, wherein the matching output module is further configured to perform frontal/profile recognition on the detected face, execute the step of extracting the lip features of the detected face if the face is frontal, and determine that the conference audio is not to be output if the face is in profile.
  11. A conference audio control device, wherein the conference audio control device comprises a processor, a memory, and computer-readable instructions stored on the memory and executable by the processor, the computer-readable instructions, when executed by the processor, implementing the following steps:
    receiving conference audio, performing voice detection on the conference audio, and determining whether the conference audio contains user speech;
    if the conference audio contains user speech, extracting the user speech from the conference audio and converting the user speech into text data; and
    comparing the text data against preset conference keywords, and determining whether to output the conference audio according to the result of matching the text data against the conference keywords;
    wherein the step of performing voice detection on the conference audio and determining whether the conference audio contains user speech comprises:
    extracting audio frames from the conference audio and obtaining the signal energy of the audio frames;
    outputting a mute prompt to the user, capturing the background noise while no user speech is present, and obtaining the background noise energy;
    calculating the preset energy threshold from the background noise energy using a preset threshold formula, the threshold formula being E_rnew = (1-p)·E_rold + p·E_silence, where E_rnew is the updated threshold, E_rold is the previous threshold, E_silence is the background noise energy, and p is a weighting factor satisfying 0 < p < 1;
    comparing the signal energy of the audio frames against the preset energy threshold; and
    if the signal energy of an audio frame is greater than the preset energy threshold, determining that the audio frame is a speech frame.
  12. The conference audio control device of claim 11, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    obtaining pre-stored conference materials, deriving a target text set from the conference materials, and performing word segmentation on the target texts in the target text set to obtain segmented target words;
    obtaining word features of the target words and calculating a weight value for each target word based on its word features, the word features including at least part of speech, word position, and word frequency; and
    using the target words whose weight values exceed a preset threshold as the preset conference keywords.
  13. The conference audio control device of claim 11, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    performing word segmentation on the text data to obtain segmented utterance keywords;
    comparing the utterance keywords against the preset conference keywords to determine whether the utterance keywords include any of the conference keywords; and
    if the utterance keywords include a conference keyword, determining that the text data matches the conference keywords.
  14. The conference audio control device of claim 11, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    if the text data matches the conference keywords, acquiring a conference image;
    detecting a face in the conference image, extracting lip features of the detected face, and determining from the lip features whether the face exhibits speaking activity; and
    if the face exhibits speaking activity, determining that the conference audio is to be output.
  15. The conference audio control device of claim 14, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    performing frontal/profile recognition on the detected face;
    if the face is frontal, executing the step of extracting the lip features of the detected face; and
    if the face is in profile, determining that the conference audio is not to be output.
  16. A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, the computer-readable instructions, when executed by a processor, implementing the following steps:
    receiving conference audio, performing voice detection on the conference audio, and determining whether the conference audio contains user speech;
    if the conference audio contains user speech, extracting the user speech from the conference audio and converting the user speech into text data; and
    comparing the text data against preset conference keywords, and determining whether to output the conference audio according to the result of matching the text data against the conference keywords;
    wherein the step of performing voice detection on the conference audio and determining whether the conference audio contains user speech comprises:
    extracting audio frames from the conference audio and obtaining the signal energy of the audio frames;
    outputting a mute prompt to the user, capturing the background noise while no user speech is present, and obtaining the background noise energy;
    calculating the preset energy threshold from the background noise energy using a preset threshold formula, the threshold formula being E_rnew = (1-p)·E_rold + p·E_silence, where E_rnew is the updated threshold, E_rold is the previous threshold, E_silence is the background noise energy, and p is a weighting factor satisfying 0 < p < 1;
    comparing the signal energy of the audio frames against the preset energy threshold; and
    if the signal energy of an audio frame is greater than the preset energy threshold, determining that the audio frame is a speech frame.
  17. The computer-readable storage medium of claim 16, wherein the computer-readable instructions, when executed by a processor, further implement the following steps:
    obtaining pre-stored conference materials, deriving a target text set from the conference materials, and performing word segmentation on the target texts in the target text set to obtain segmented target words;
    obtaining word features of the target words and calculating a weight value for each target word based on its word features, the word features including at least part of speech, word position, and word frequency; and
    using the target words whose weight values exceed a preset threshold as the preset conference keywords.
  18. The computer-readable storage medium of claim 16, wherein the computer-readable instructions, when executed by a processor, further implement the following steps:
    performing word segmentation on the text data to obtain segmented utterance keywords;
    comparing the utterance keywords against the preset conference keywords to determine whether the utterance keywords include any of the conference keywords; and
    if the utterance keywords include a conference keyword, determining that the text data matches the conference keywords.
  19. The computer-readable storage medium of claim 16, wherein the computer-readable instructions, when executed by a processor, further implement the following steps:
    if the text data matches the conference keywords, acquiring a conference image;
    detecting a face in the conference image, extracting lip features of the detected face, and determining from the lip features whether the face exhibits speaking activity; and
    if the face exhibits speaking activity, determining that the conference audio is to be output.
  20. The computer-readable storage medium of claim 17, wherein the computer-readable instructions, when executed by a processor, further implement the following steps:
    performing frontal/profile recognition on the detected face;
    if the face is frontal, executing the step of extracting the lip features of the detected face; and
    if the face is in profile, determining that the conference audio is not to be output.
PCT/CN2019/121711 2019-05-21 2019-11-28 Conference audio control method, system, device and computer readable storage medium WO2020233068A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910432253.9A CN110300001B (en) 2019-05-21 2019-05-21 Conference audio control method, system, device and computer readable storage medium
CN201910432253.9 2019-05-21

Publications (1)

Publication Number Publication Date
WO2020233068A1 true WO2020233068A1 (en) 2020-11-26

Family

ID=68027129

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121711 WO2020233068A1 (en) 2019-05-21 2019-11-28 Conference audio control method, system, device and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN110300001B (en)
WO (1) WO2020233068A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112969000A (en) * 2021-02-25 2021-06-15 北京百度网讯科技有限公司 Control method and device of network conference, electronic equipment and storage medium
CN113051426A (en) * 2021-03-18 2021-06-29 深圳市声扬科技有限公司 Audio information classification method and device, electronic equipment and storage medium
CN113746822A (en) * 2021-08-25 2021-12-03 安徽创变信息科技有限公司 Teleconference management method and system
US11444795B1 (en) 2021-02-25 2022-09-13 At&T Intellectual Property I, L.P. Intelligent meeting assistant
CN115828907A (en) * 2023-02-16 2023-03-21 南昌航天广信科技有限责任公司 Intelligent conference management method, system, readable storage medium and computer equipment
CN116246633A (en) * 2023-05-12 2023-06-09 深圳市宏辉智通科技有限公司 Wireless intelligent Internet of things conference system

Families Citing this family (14)

Publication number Priority date Publication date Assignee Title
CN110300001B (en) * 2019-05-21 2022-03-15 深圳壹账通智能科技有限公司 Conference audio control method, system, device and computer readable storage medium
CN111314788A (en) * 2020-03-13 2020-06-19 广州华多网络科技有限公司 Voice password returning method and presenting method, device and equipment for voice gift
CN111510662B (en) * 2020-04-27 2021-06-22 深圳米唐科技有限公司 Network call microphone state prompting method and system based on audio and video analysis
CN111556279A (en) * 2020-05-22 2020-08-18 腾讯科技(深圳)有限公司 Monitoring method and communication method of instant session
CN111754990A (en) * 2020-06-24 2020-10-09 杨文龙 Voice chat cooperative processing method and device
CN111756939B (en) * 2020-06-28 2022-05-31 联想(北京)有限公司 Online voice control method and device and computer equipment
CN111753769A (en) * 2020-06-29 2020-10-09 歌尔科技有限公司 Terminal audio acquisition control method, electronic equipment and readable storage medium
CN111833876A (en) * 2020-07-14 2020-10-27 科大讯飞股份有限公司 Conference speech control method, system, electronic device and storage medium
CN112601045A (en) * 2020-12-10 2021-04-02 广州虎牙科技有限公司 Speaking control method, device, equipment and storage medium for video conference
CN112687272B (en) * 2020-12-18 2023-03-21 北京金山云网络技术有限公司 Conference summary recording method and device and electronic equipment
CN112687273B (en) * 2020-12-26 2024-04-16 科大讯飞股份有限公司 Voice transcription method and device
CN112765335B (en) * 2021-01-27 2024-03-08 上海三菱电梯有限公司 Voice call system
CN113505597A (en) * 2021-07-27 2021-10-15 随锐科技集团股份有限公司 Method, device and storage medium for extracting keywords in video conference
CN116110373B (en) * 2023-04-12 2023-06-09 深圳市声菲特科技技术有限公司 Voice data acquisition method and related device of intelligent conference system

Citations (6)

Publication number Priority date Publication date Assignee Title
US20070106514A1 (en) * 2005-11-08 2007-05-10 Oh Seung S Method of generating a prosodic model for adjusting speech style and apparatus and method of synthesizing conversational speech using the same
CN105405439A (en) * 2015-11-04 2016-03-16 科大讯飞股份有限公司 Voice playing method and device
CN105512348A (en) * 2016-01-28 2016-04-20 北京旷视科技有限公司 Method and device for processing videos and related audios and retrieving method and device
CN106531172A (en) * 2016-11-23 2017-03-22 湖北大学 Speaker voice playback identification method and system based on environmental noise change detection
CN107993665A (en) * 2017-12-14 2018-05-04 科大讯飞股份有限公司 Spokesman role determines method, intelligent meeting method and system in multi-conference scene
CN110300001A (en) * 2019-05-21 2019-10-01 深圳壹账通智能科技有限公司 Conference audio control method, system, equipment and computer readable storage medium

Family Cites Families (17)

Publication number Priority date Publication date Assignee Title
JP5094804B2 (en) * 2009-08-31 2012-12-12 シャープ株式会社 Conference relay device and computer program
US9601117B1 (en) * 2011-11-30 2017-03-21 West Corporation Method and apparatus of processing user data of a multi-speaker conference call
CN103581608B (en) * 2012-07-20 2019-02-01 Polycom 通讯技术(北京)有限公司 Spokesman's detection system, spokesman's detection method and audio/video conferencingasystem figureu
CN103137137B (en) * 2013-02-27 2015-07-01 华南理工大学 Eloquent speaker finding method in conference audio
US9595271B2 (en) * 2013-06-27 2017-03-14 Getgo, Inc. Computer system employing speech recognition for detection of non-speech audio
EP2999203A1 (en) * 2014-09-22 2016-03-23 Alcatel Lucent Conferencing system
CN105162611B (en) * 2015-10-21 2019-03-15 方图智能(深圳)科技集团股份有限公司 A kind of digital conference system and management control method
WO2017124293A1 (en) * 2016-01-19 2017-07-27 王晓光 Conference discussion method and system for video conference
CN107170452A (en) * 2017-04-27 2017-09-15 广东小天才科技有限公司 The Adding Way and device of a kind of electronic meeting
CN107276777B (en) * 2017-07-27 2020-05-29 苏州科达科技股份有限公司 Audio processing method and device of conference system
CN107679506A (en) * 2017-10-12 2018-02-09 Tcl通力电子(惠州)有限公司 Awakening method, intelligent artifact and the computer-readable recording medium of intelligent artifact
CN109036381A (en) * 2018-08-08 2018-12-18 平安科技(深圳)有限公司 Method of speech processing and device, computer installation and readable storage medium storing program for executing
CN108986826A (en) * 2018-08-14 2018-12-11 中国平安人寿保险股份有限公司 Automatically generate method, electronic device and the readable storage medium storing program for executing of minutes
CN109388701A (en) * 2018-08-17 2019-02-26 深圳壹账通智能科技有限公司 Minutes generation method, device, equipment and computer storage medium
CN109145853A (en) * 2018-08-31 2019-01-04 百度在线网络技术(北京)有限公司 The method and apparatus of noise for identification
CN109274922A (en) * 2018-11-19 2019-01-25 国网山东省电力公司信息通信公司 A kind of Video Conference Controlling System based on speech recognition
CN109547729A (en) * 2018-11-27 2019-03-29 平安科技(深圳)有限公司 A kind of call voice access video-meeting method and device

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN112969000A (en) * 2021-02-25 2021-06-15 北京百度网讯科技有限公司 Control method and device of network conference, electronic equipment and storage medium
US11444795B1 (en) 2021-02-25 2022-09-13 At&T Intellectual Property I, L.P. Intelligent meeting assistant
CN113051426A (en) * 2021-03-18 2021-06-29 深圳市声扬科技有限公司 Audio information classification method and device, electronic equipment and storage medium
CN113746822A (en) * 2021-08-25 2021-12-03 安徽创变信息科技有限公司 Teleconference management method and system
CN113746822B (en) * 2021-08-25 2023-07-21 广州市昇博电子科技有限公司 Remote conference management method and system
CN115828907A (en) * 2023-02-16 2023-03-21 南昌航天广信科技有限责任公司 Intelligent conference management method, system, readable storage medium and computer equipment
CN115828907B (en) * 2023-02-16 2023-04-25 南昌航天广信科技有限责任公司 Intelligent conference management method, system, readable storage medium and computer device
CN116246633A (en) * 2023-05-12 2023-06-09 深圳市宏辉智通科技有限公司 Wireless intelligent Internet of things conference system
CN116246633B (en) * 2023-05-12 2023-07-21 深圳市宏辉智通科技有限公司 Wireless intelligent Internet of things conference system

Also Published As

Publication number Publication date
CN110300001B (en) 2022-03-15
CN110300001A (en) 2019-10-01

Similar Documents

Publication Publication Date Title
WO2020233068A1 (en) Conference audio control method, system, device and computer readable storage medium
CN110049270B (en) Multi-person conference voice transcription method, device, system, equipment and storage medium
US10552118B2 (en) Context based identification of non-relevant verbal communications
US9672829B2 (en) Extracting and displaying key points of a video conference
CN107910014B (en) Echo cancellation test method, device and test equipment
WO2020232865A1 (en) Meeting role-based speech synthesis method, apparatus, computer device, and storage medium
US9293133B2 (en) Improving voice communication over a network
JP4838351B2 (en) Keyword extractor
US8826210B2 (en) Visualization interface of continuous waveform multi-speaker identification
CN108346425B (en) Voice activity detection method and device and voice recognition method and device
CN110517689B (en) Voice data processing method, device and storage medium
US9390725B2 (en) Systems and methods for noise reduction using speech recognition and speech synthesis
US20070285505A1 (en) Method and apparatus for video conferencing having dynamic layout based on keyword detection
CN112102850B (en) Emotion recognition processing method and device, medium and electronic equipment
WO2019242414A1 (en) Voice processing method and apparatus, storage medium, and electronic device
US10366173B2 (en) Device and method of simultaneous interpretation based on real-time extraction of interpretation unit
WO2023040523A1 (en) Audio signal processing method and apparatus, electronic device, and storage medium
JP7255032B2 (en) voice recognition
CN111415128A (en) Method, system, apparatus, device and medium for controlling conference
CN109616116B (en) Communication system and communication method thereof
CN113345423B (en) Voice endpoint detection method, device, electronic equipment and storage medium
KR102378895B1 (en) Method for learning wake-word for speech recognition, and computer program recorded on record-medium for executing method therefor
US20230223033A1 (en) Method of Noise Reduction for Intelligent Network Communication
Kannan et al. Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficient
CN116052650A (en) Voice recognition method, device, storage medium and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19929616

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 02-03-2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19929616

Country of ref document: EP

Kind code of ref document: A1