CN111508499A - Conference management method based on voice recognition - Google Patents
- Publication number
- CN111508499A CN111508499A CN202010283746.3A CN202010283746A CN111508499A CN 111508499 A CN111508499 A CN 111508499A CN 202010283746 A CN202010283746 A CN 202010283746A CN 111508499 A CN111508499 A CN 111508499A
- Authority
- CN
- China
- Prior art keywords
- conference
- voice
- target
- microphone
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/109—Time management, e.g. calendars, reminders, meetings or time accounting
- G06Q10/1093—Calendar-based scheduling for persons or groups
- G06Q10/1095—Meeting or appointment
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/562—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities where the conference facilities are distributed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/563—User guidance or feature selection
- H04M3/566—User guidance or feature selection relating to a participants right to speak
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/08—Mouthpieces; Microphones; Attachments therefor
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a conference management method based on voice recognition, which comprises the steps of: obtaining a voice signal of the conference host; performing voice recognition on the voice signal to obtain text data; obtaining the names of participants appearing in the text data according to the text data and a preset participant list database, yielding the target participant names; determining the microphone number corresponding to each target participant name according to a preset microphone distribution database, yielding the target microphone numbers; and controlling the microphones corresponding to the target microphone numbers to switch on. The method requires no manual management by the conference host at any point, improves management efficiency, and reduces power consumption. The conference management method therefore improves the effect of conference management through automatic control.
Description
Technical Field
The invention relates to a conference management method based on voice recognition.
Background
During a meeting in a conference room, the people present comprise a conference host and the participants. The host presides over the meeting and, as it proceeds, selects speakers from among the participants; a microphone is placed in front of each participant. Conventionally, all microphones remain switched on so that any participant can speak through the microphone in front of them. However, usually only one participant speaks at a time while the other microphones sit idle, so keeping every microphone on causes unnecessary power consumption. The conference host is therefore required to manage the conference process manually, but manual management by the host yields a poor result.
Disclosure of Invention
The invention aims to provide a conference management method based on voice recognition, to solve the problem that manual management of the entire conference process by the conference host yields a poor management effect.
In order to solve the problems, the invention adopts the following technical scheme:
a conference management method based on voice recognition comprises the following steps:
acquiring a voice signal of a conference host;
carrying out voice recognition on the voice signal to obtain corresponding text data;
acquiring names of participants appearing in the text data according to the text data and a preset participant list database to obtain names of target participants; wherein the attendee list database comprises at least two attendee names;
determining a microphone number corresponding to the name of the target participant according to the name of the target participant and a preset microphone distribution database to obtain a target microphone number; the microphone distribution database comprises at least two groups of data, wherein each group of data comprises names of participants and microphone numbers corresponding to the names of the participants;
and controlling to open the microphone corresponding to the target microphone number according to the target microphone number.
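The claimed steps above can be sketched as a minimal control flow. This is an illustration only: the attendee names, microphone numbers, and in-memory dictionaries standing in for the preset databases are assumptions, not part of the claim.

```python
# Illustrative stand-ins for the preset participant list database and
# microphone distribution database described in the claims.
ATTENDEE_NAMES = ["Zhang San", "Li Si"]
MIC_ASSIGNMENT = {"Zhang San": 3, "Li Si": 7}

def handle_host_utterance(text):
    """Return the microphone numbers to switch on for one host utterance.

    `text` is assumed to be the output of the voice-recognition step;
    recognition itself is out of scope for this sketch.
    """
    # Step 3: find attendee names that appear in the recognized text.
    targets = [name for name in ATTENDEE_NAMES if name in text]
    # Step 4: map each target name to its microphone number.
    return [MIC_ASSIGNMENT[name] for name in targets]

print(handle_host_utterance("Next, Zhang San will report"))  # [3]
```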
Optionally, the acquiring a voice signal of a conference host includes:
acquiring a voice signal of a conference host and the acquisition time of the voice signal;
generating a conference recording blank template according to the acquisition time, wherein the conference recording blank template comprises a conference project filling area, a speaker name filling area and a conference recording text filling area;
determining a target conference item corresponding to the acquisition time according to the acquisition time and a preset conference flow, filling the target conference item into the conference item filling area, and updating the conference record blank template;
correspondingly, after obtaining the names of the participants appearing in the text data according to the text data and a preset participant list database and obtaining the names of the target participants, the conference management method further comprises:
filling the names of the target participants into the speaker name filling area, and updating the conference record blank template;
and displaying the updated blank template of the conference record to the conference host.
Optionally, after controlling to turn on a microphone corresponding to the target microphone number according to the target microphone number, the conference management method further includes:
determining a target position coordinate corresponding to the target microphone number according to the target microphone number and a preset position coordinate database; wherein the position coordinate database comprises at least two microphone numbers and position coordinates corresponding to the microphone numbers;
and outputting a control instruction to a camera according to the target position coordinate, wherein the control instruction is used for indicating the camera to carry out image acquisition towards the target position coordinate.
Optionally, the performing speech recognition on the speech signal to obtain corresponding text data includes:
generating a voice waveform image of the voice signal in a preset voice coordinate system;
based on a voice activity detection algorithm, dividing the voice oscillogram to obtain at least two effective voice sections;
extracting a voice characteristic curve corresponding to each effective voice section through a voice characteristic recognition algorithm;
extracting a standard characteristic curve associated with each candidate character from a preset corpus;
drawing the standard characteristic curve and the voice characteristic curve on a preset characteristic coordinate, and calculating the difference area of an intersection region between the standard characteristic curve and the voice characteristic curve;
if the difference area of any candidate character is smaller than a preset area difference threshold, identifying the candidate character as character information contained in a corresponding effective speech section;
and sequentially combining the text information based on the order of the effective voice segments in the voice waveform to generate the text data.
The invention has the beneficial effects that: voice recognition is performed on the voice signal of the conference host to obtain corresponding text data, and the participant names contained in the text data are determined according to the preset participant list database; these are the participants required to speak. Each participant name corresponds to one microphone with a specific number, so the microphone number corresponding to each target participant name is determined from the preset microphone distribution database, yielding the target microphone number, and finally the microphone corresponding to that number, i.e. the microphone in front of the participant who needs to speak, is switched on. The method performs control and management according to the collected voice signal of the conference host, so manual management by the host throughout the conference is unnecessary and management efficiency is improved. Only the microphone of the participant who is to speak is switched on while the others remain off, which reduces power consumption. Moreover, the situation in which all participants speak simultaneously because every microphone is on is avoided, which prevents the disruption to the conference, the extended meeting time, and the communication confusion that such a situation would cause. The conference management method therefore improves the effect of conference management through automatic control.
Drawings
In order to more clearly illustrate the technical solution of the embodiment of the present invention, the drawings needed to be used in the embodiment will be briefly described as follows:
fig. 1 is a flow chart of a conference management method based on voice recognition.
Detailed Description
The embodiment provides a conference management method based on voice recognition. The method is suitable for a conference room, and its execution subject can be a computer device or a server device in the conference room; this embodiment takes a computer device as an example. The conference room has a dedicated conference host seat (or a certain position is designated as such), the conference host sits in that seat, and the seat is provided with a microphone. At least two participant seats are provided in the conference room, and each participant sits in the corresponding seat. In this embodiment each participant's seat is fixed and does not change, i.e. there is a one-to-one correspondence between seats and participants. Each participant seat is provided with a microphone through which that participant speaks. The microphone of the host seat and the microphones of the participant seats are all communicatively connected with the computer device. Of course, a loudspeaker, a large screen and the like can also be arranged in the conference room; this is conventional and is not described in detail. In this embodiment, the participants do not include the conference host.
As shown in fig. 1, the conference management method includes the steps of:
acquiring a voice signal of a conference host:
when a conference host speaks, a microphone in a seat of the conference host collects voice signals of the conference host and outputs the collected voice signals to computer equipment, and the computer equipment acquires the voice signals of the conference host.
In this embodiment, in order to record the whole conference by the conference host, as for the step of acquiring the voice signal of the conference host, a specific implementation process is given as follows:
(1) and acquiring the voice signal of the conference host and the acquisition time of the voice signal. When the voice signal of the conference host is acquired, the acquisition time of the voice signal, namely the sending time of the voice signal sent by the conference host, is also acquired.
(2) And generating a conference recording blank template according to the acquisition time, wherein the conference recording blank template comprises a conference item filling area, a speaker name filling area and a conference recording text filling area. After the acquisition time is obtained, the computer device generates a conference recording blank template according to the acquisition time, wherein the conference recording blank template is an initial conference recording template, and related data information obtained subsequently can be filled into the conference recording template so as to record a conference. The conference recording blank template comprises a conference item filling area, a speaker name filling area and a conference recording text filling area. The conference item filling area is used for filling conference items, the speaker name filling area is used for filling names of speakers, and the conference record text filling area is used for filling a conference record text. Table 1 shows a specific template structure of a blank template of a conference record, where an area a is a conference item filling area, an area B is a speaker name filling area, and an area C is a conference record text filling area.
TABLE 1: conference record blank template — area A: conference item filling area; area B: speaker name filling area; area C: conference record text filling area.
(3) And determining a target meeting item corresponding to the acquisition time according to the acquisition time and a preset meeting flow, filling the target meeting item into a meeting item filling area, and updating a meeting record blank template. A conference flow is preset, and includes at least two conference time periods and a conference process (i.e., a conference item) corresponding to each conference time period, for example: the conference items of 9:00-10:00 are the general manager to speak, the conference items of 10:00-11:00 are the department manager to speak, and the conference items of 11:00-12:00 are the staff representatives to speak. Then, according to the acquisition time and the preset conference flow, the target conference item corresponding to the acquisition time can be determined, for example: if the acquisition time is 9:35, the target conference item corresponding to the acquisition time can be determined as the general manager speaking by combining the preset conference flow. Then, the target meeting item is filled into the meeting item filling area (i.e., area a of table 1), and the meeting record blank template is updated.
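Step (3) above can be sketched as a lookup of the acquisition time against the preset conference flow. The schedule entries below reproduce the document's example; the function name and data layout are illustrative assumptions.

```python
from datetime import time

# Preset conference flow: (start, end, conference item), per the example
# schedule given in the description.
SCHEDULE = [
    (time(9, 0),  time(10, 0), "General manager speech"),
    (time(10, 0), time(11, 0), "Department manager speech"),
    (time(11, 0), time(12, 0), "Staff representative speech"),
]

def target_conference_item(acquired):
    """Return the conference item whose time period contains `acquired`."""
    for start, end, item in SCHEDULE:
        if start <= acquired < end:
            return item
    return None  # acquisition time falls outside the conference flow

print(target_conference_item(time(9, 35)))  # General manager speech
```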
Carrying out voice recognition on the voice signal to obtain corresponding text data:
and the computer equipment performs voice recognition on the acquired voice signal to obtain corresponding text data. The speech recognition is performed on the speech signal to obtain the text data, which belongs to the conventional technical means, and this embodiment provides a specific implementation process, and of course, in addition to the specific implementation process, other existing implementation processes may also be adopted in the present application. The specific implementation process steps given in this embodiment include:
(1) Generate a voice waveform of the voice signal in a preset voice coordinate system. The ordinate of the voice coordinate system can be the voice amplitude and the abscissa the acquisition time, so that a time-domain voice waveform is generated. In addition, before the waveform is generated, the voice signal can be filtered to remove environmental noise, and the noise-filtered signal can be smoothed so that invalid noise frequency bands are filtered out.
(2) Based on a voice activity detection algorithm, divide the voice waveform to obtain at least two effective voice segments. An effective voice segment is a segment containing speech content; correspondingly, an ineffective voice segment contains no speech content. A voice start amplitude and a voice end amplitude may be set, where the start amplitude is greater than the end amplitude, i.e. the requirement for starting an effective segment is stricter than that for ending it. At the start of an utterance the host's volume and pitch are usually higher, so the corresponding amplitude is high; during the utterance, some characters are pronounced weakly or softly and should not be recognized as the end of speech, so the end amplitude must be lowered appropriately to avoid misidentification. Effective voice recognition is therefore performed on the waveform according to the start and end amplitudes, dividing it into at least two effective segments, where the amplitude at a segment's start time is greater than or equal to the start amplitude and the amplitude at its end time is less than or equal to the end amplitude. It should be appreciated that other implementations of effective-segment division may also be used.
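The hysteresis rule of step (2) — a segment opens at the higher start amplitude and closes only at the lower end amplitude — can be sketched over a sampled amplitude envelope. The threshold values and the envelope below are illustrative assumptions.

```python
def split_active_segments(amplitudes, start_thresh=0.5, end_thresh=0.2):
    """Split an amplitude envelope into effective voice segments.

    A segment opens when the amplitude reaches start_thresh and closes
    only when it falls to end_thresh or below, so soft syllables in the
    middle of an utterance do not end the segment prematurely.
    Returns (start_index, end_index) pairs.
    """
    segments, start = [], None
    for i, a in enumerate(amplitudes):
        if start is None and a >= start_thresh:
            start = i                      # segment begins
        elif start is not None and a <= end_thresh:
            segments.append((start, i))    # segment ends
            start = None
    if start is not None:                  # signal ended mid-segment
        segments.append((start, len(amplitudes)))
    return segments

env = [0.0, 0.6, 0.4, 0.3, 0.1, 0.0, 0.7, 0.5, 0.2]
print(split_active_segments(env))  # [(1, 4), (6, 8)]
```

Note that the dip to 0.4 and 0.3 inside the first segment does not close it, mirroring the document's point about weakly pronounced characters.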
(3) And extracting the voice characteristic curve corresponding to each effective voice section through a voice characteristic recognition algorithm. In this embodiment, the voice feature recognition algorithm may be a fourier algorithm, and the effective voice segments are converted from the time domain curve to the frequency domain waveform to obtain the voice feature curves corresponding to the effective voice segments. In addition, if the frequency domain waveform obtained by conversion is a discrete waveform, the discrete waveform can be linearly fitted by a linear fitting method, and a corresponding voice characteristic curve is output.
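Step (3) names a Fourier transform as one possible voice feature recognition algorithm. A minimal sketch follows, assuming a fixed sample rate and a plain magnitude spectrum as the "feature curve"; the document does not fix these parameters.

```python
import numpy as np

def feature_curve(segment, sample_rate=16000):
    """Convert a time-domain segment into a frequency-domain feature curve
    (magnitude spectrum) via a real FFT. Parameters are assumptions."""
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    return freqs, spectrum

# Verify on a synthetic 440 Hz tone: the spectrum should peak near 440 Hz.
t = np.arange(0, 0.1, 1 / 16000)
seg = np.sin(2 * np.pi * 440 * t)
freqs, spec = feature_curve(seg)
print(round(freqs[int(np.argmax(spec))]))  # 440
```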
(4) And extracting the standard characteristic curve associated with each candidate character from a preset corpus. A corpus is preset, wherein the corpus contains all candidate characters which can be identified, and each candidate character corresponds to an associated standard characteristic curve. The standard characteristic curve can be obtained by converting a speech signal of a standard pronunciation of at least one language.
(5) Draw the standard characteristic curve and the voice characteristic curve on a preset characteristic coordinate system, and calculate the difference area of the intersection region between them. In this embodiment the two curves are drawn on the same coordinate system so that their difference can be compared quickly; the degree of difference is determined mainly by the size of the area enclosed between the two curves (i.e. the difference area of the intersection region). The larger this area, the greater the difference between the curves and the higher the probability that the effective voice segment does not contain the candidate character; conversely, the smaller the area, the smaller the difference and the higher the probability that the segment does contain the candidate character. Furthermore, to improve recognition accuracy, the voice characteristic curve is normalized: the waveform of the effective voice segment is divided into character segments according to its peak changes, each character segment containing at least one peak so that each corresponds to one character. Each character segment is normalized in the time domain, i.e. its duration is scaled to a preset standard duration and its amplitude is scaled proportionally to a preset maximum amplitude, and the normalized character segment is then converted to obtain its corresponding voice characteristic curve.
(6) And if the difference area of any candidate character is smaller than the preset area difference threshold, identifying the candidate character as the character information contained in the corresponding effective speech section. If the difference area of the intersection area between the standard characteristic curve and the voice characteristic curve of any candidate character is smaller than the difference threshold value, the candidate character can be identified in the speaking content of the effective voice section, the order of each identified candidate character is determined according to the occurrence position of each identified candidate character in the effective voice section, and the candidate characters are combined based on the order to obtain character information. The standard characteristic curve of each candidate character is compared with the voice characteristic curve, so that the character information contained in the effective voice section is identified, and the accuracy of generating the character information is improved.
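Steps (5)–(6) can be sketched by approximating the area between two sampled curves and accepting the closest candidate under the preset area threshold. The corpus contents, threshold value, and sampling step are illustrative assumptions.

```python
def difference_area(curve_a, curve_b, dx=1.0):
    """Approximate the area between two curves sampled at step dx."""
    return sum(abs(a - b) for a, b in zip(curve_a, curve_b)) * dx

def recognize_segment(segment_curve, standard_curves, area_threshold=2.0):
    """Return the candidate character whose standard curve is closest to
    the segment's feature curve, provided the difference area is under
    the preset threshold; otherwise return None."""
    best_char, best_area = None, area_threshold
    for char, std_curve in standard_curves.items():
        area = difference_area(segment_curve, std_curve)
        if area < best_area:
            best_char, best_area = char, area
    return best_char

# Toy corpus of two candidate characters with 3-point feature curves.
corpus = {"a": [0.0, 1.0, 0.0], "b": [1.0, 0.0, 1.0]}
print(recognize_segment([0.1, 0.9, 0.1], corpus))  # a
```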
(7) And sequentially combining the character information based on the sequence of each effective voice section in the voice oscillogram to generate character data. Specifically, punctuation marks used for connecting two character information can be determined according to the association degree between the last character of the last effective speech section and the first character of the next effective speech section and the interval duration between the two speech sections, and the character information is generated by identifying each character information and the punctuation marks used for connecting, so that the readability of the character information is improved. In this embodiment, the voice signal is divided into a plurality of voice segments, so that the data volume of voice recognition at each time can be reduced, and the accuracy and the calculation amount of the voice recognition are considered at the same time.
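The punctuation rule of step (7) can be sketched in its simplest form, using only the interval duration between segments (the document also mentions the association degree between adjacent characters, which is omitted here). The pause threshold is an illustrative assumption.

```python
def join_segments(segment_texts, gaps, pause_threshold=0.6):
    """Join per-segment text into one string, choosing punctuation from
    the silent gap (in seconds) between consecutive segments: long
    pauses become full stops, short ones commas."""
    parts = [segment_texts[0]]
    for text, gap in zip(segment_texts[1:], gaps):
        parts.append("." if gap >= pause_threshold else ",")
        parts.append(" " + text)
    return "".join(parts) + "."

print(join_segments(["hello everyone", "welcome"], [0.3]))
# hello everyone, welcome.
```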
Acquiring names of participants appearing in the text data according to the text data and a preset participant list database to obtain names of target participants; wherein the attendee list database comprises at least two attendee names:
in this embodiment, the computer device is preset with a participant list database, which includes at least two names of participants. The names of the participants in the participant list database are input before the conference, and as a specific implementation mode, the names of the participants are input into computer equipment by a conference host before the conference starts to form the participant list database.
According to the text data and the participant list database, names of participants appearing in the text data are obtained, and the following two specific implementation processes are provided in the embodiment: comparing each participant name in the participant list database with the text data respectively to determine whether each participant name exists in the text data; and secondly, inputting the text data into a participant list database to obtain names of participants appearing in the text data. It should be understood that the present embodiment is not limited to the implementation process described above.
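The first implementation above — comparing every name in the participant list database against the text data — is a plain substring scan. The names below are illustrative assumptions.

```python
def target_attendees(text, attendee_names):
    """Return every name from the participant list database that appears
    in the recognized text (implementation 1 from the description)."""
    return [name for name in attendee_names if name in text]

roster = ["Zhang San", "Li Si", "Wang Wu"]
print(target_attendees("Please ask Zhang San to speak", roster))  # ['Zhang San']
```

A real system would likely need fuzzy matching, since voice recognition may transcribe a name with homophone characters; the document does not address this.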
The names of participants appearing in the text data are taken as the names of the target participants.
When the conference host needs the participant to speak, the name of the participant is reflected in the spoken voice signal, and then the name of the participant appearing in the text data is the name of the participant needing to speak. Such as: the text data is 'xxxxx zhang san say', the participant list database contains zhang san of names of participants, and then zhang san of names of participants appearing in the text data is the names of the participants needing to say.
It should be noted that typically only one participant is required to speak at a time, so normally only one participant name appears in the voice signal emitted by the conference host, and the text data therefore contains only one participant name. As a special case, if several participants (for example two) need to speak at the same time, multiple participant names may appear in the host's voice signal. Such simultaneous speaking is very rare, however, and the single-speaker case is treated here as the normal situation.
To further update the conference record template and output it to the conference host for record-keeping, the conference management method further comprises, after obtaining the target participant names: filling the obtained target participant names into the speaker name filling area (i.e. area B in Table 1) and updating the conference record template; then displaying the updated template to the conference host. Correspondingly, a display screen or touch screen is arranged at the conference host's seat to show the updated template, making it convenient for the host to keep the conference record; specifically, the host manually fills the conference record text filling area (i.e. area C in Table 1) via a keyboard.
Determining a microphone number corresponding to the name of the target participant according to the name of the target participant and a preset microphone distribution database to obtain a target microphone number; wherein the microphone distribution database comprises at least two groups of data, each group of data comprises names of participants and microphone numbers corresponding to the names of the participants:
in this embodiment, if there are N participating persons, the microphone distribution database includes N groups of data, and each participating person corresponds to each group of data one to one. For any set of data, the set of data includes a participant name and a microphone number corresponding to the participant name. The microphone number corresponding to the participant name refers to: the number of the microphone in front of the participant.
And inputting the name of the target participant into a microphone distribution database, so that the microphone number corresponding to the target participant can be obtained, wherein the microphone number is the target microphone number.
Controlling to open the microphone corresponding to the target microphone number according to the target microphone number:
The computer device controls the microphone corresponding to the obtained target microphone number to switch on, i.e. it switches on the microphone in front of the target participant (switching on may mean powering the microphone on). The target participant can then speak through the microphone in front of them.
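The lookup-and-switch-on step can be sketched as follows. The microphone database contents are illustrative, and `power_on` is a hypothetical callback standing in for whatever bus or relay actually energizes a microphone; the document does not specify the hardware interface.

```python
# Illustrative microphone distribution database: attendee name -> mic number.
MIC_DB = {"Zhang San": 3, "Li Si": 7, "Wang Wu": 12}

def open_target_microphones(target_names, power_on):
    """Map target participant names to microphone numbers and switch on
    only those microphones; all other microphones stay off."""
    opened = []
    for name in target_names:
        if name in MIC_DB:
            number = MIC_DB[name]
            power_on(number)   # hypothetical hardware call
            opened.append(number)
    return opened

log = []
print(open_target_microphones(["Zhang San"], log.append))  # [3]
```

Passing a list of several names covers the rare multi-speaker case described below without changing the code.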
It should be noted that when the special case described above occurs, i.e. two or more participants are required to speak at the same time, multiple participant names appear in the text data and multiple target microphone numbers are ultimately determined. The microphones corresponding to these target microphone numbers, i.e. the microphones in front of the target participants, are then switched on, and those participants can speak at the same time.
In this embodiment, in order to obtain the image of the target participant when the target participant speaks, so that the large screen in the conference room or the display screen of the conference host can display the image of the target participant in real time, the conference management method further includes the following steps:
determining a target position coordinate corresponding to the target microphone number according to the target microphone number and a preset position coordinate database; wherein the position coordinate database includes at least two microphone numbers, and position coordinates corresponding to the respective microphone numbers:
A position coordinate database is also preset in the computer device, and it comprises at least two microphone numbers together with the position coordinate corresponding to each microphone number. In this embodiment, the position coordinate database includes N microphone numbers (N being the number of participants). A position coordinate is the coordinate, within the conference room, of the position of the microphone corresponding to the microphone number; it can equally be understood as the coordinate of the participant's seat. A two-dimensional coordinate system is established on the floor of the conference room, with a chosen point on the floor as the origin. In this embodiment, the floor of the conference room is rectangular: one corner of the rectangle is taken as the origin of coordinates, and the two sides meeting at that corner are taken as the X-axis and the Y-axis, so that each participant's microphone position (i.e., the participant's seat) lies within the coordinate system and can be converted into coordinates in it.
The target microphone number is then input into the position coordinate database to obtain the corresponding position coordinate, which is the target position coordinate.
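As a minimal sketch, assuming the position coordinate database can be represented as a mapping from microphone numbers to (X, Y) floor coordinates (the numbers below are invented):

```python
# Hypothetical position coordinate database: microphone number -> (x, y)
# coordinate in the floor coordinate system whose origin is one corner of
# the rectangular conference-room floor.
POSITION_COORDINATES = {
    1: (1.5, 2.0),   # microphone 1 sits 1.5 m along X, 2.0 m along Y
    2: (3.0, 2.0),
    3: (4.5, 2.0),
}


def target_position(mic_number: int) -> tuple[float, float]:
    """Look up the target position coordinate for a target microphone number."""
    return POSITION_COORDINATES[mic_number]


print(target_position(2))  # (3.0, 2.0)
```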
Outputting a control instruction to a camera according to the target position coordinate, wherein the control instruction is used for indicating the camera to perform image acquisition towards the target position coordinate:
The camera is fixed at a certain position in the conference room, and its shooting angle is variable: a driving motor rotates the camera up, down, left, and right. The computer device is electrically connected to the driving motor so as to control it.
The position coordinates correspond one-to-one to shooting angles of the camera: different position coordinates correspond to different shooting angles. The computer device stores the correspondence between each position coordinate and its shooting angle. The computer device therefore determines the shooting angle corresponding to the obtained target position coordinate and outputs a control instruction to the camera accordingly. The control instruction drives the camera so that, after the movement, its shooting angle equals the determined one and the camera performs image acquisition toward the target position coordinate. The acquired images can be output to the large screen in the conference room or to the display screen of the conference host.
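The correspondence between position coordinates and shooting angles is preset in the embodiment; one hypothetical way to realize it is to place the camera at a known point of the same floor coordinate system and compute the pan angle with `atan2`. The camera position, instruction format, and helper names below are assumptions, not part of the patent:

```python
import math

# Assumed camera mount point, expressed in the same floor coordinate system
# as the position coordinate database (here: the coordinate origin).
CAMERA_POSITION = (0.0, 0.0)


def pan_angle_degrees(target: tuple[float, float]) -> float:
    """Bearing from the camera to the target seat, in degrees."""
    dx = target[0] - CAMERA_POSITION[0]
    dy = target[1] - CAMERA_POSITION[1]
    return math.degrees(math.atan2(dy, dx))


def control_instruction(target: tuple[float, float]) -> dict:
    """Build a control instruction telling the camera to face the target."""
    return {"command": "aim", "pan_deg": round(pan_angle_degrees(target), 1)}


print(control_instruction((3.0, 3.0)))  # {'command': 'aim', 'pan_deg': 45.0}
```

In practice the instruction would be translated into drive-motor steps; a tilt angle could be derived the same way once the camera's mounting height is known.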
The above embodiments merely illustrate the technical solution of the present invention in one specific implementation; any equivalent substitution, modification, or partial replacement made without departing from the spirit and scope of the present invention shall be covered by the claims of the present invention.
Claims (4)
1. A conference management method based on voice recognition is characterized by comprising the following steps:
acquiring a voice signal of a conference host;
carrying out voice recognition on the voice signal to obtain corresponding text data;
acquiring names of participants appearing in the text data according to the text data and a preset participant list database to obtain names of target participants; wherein the participant list database comprises at least two participant names;
determining a microphone number corresponding to the name of the target participant according to the name of the target participant and a preset microphone distribution database to obtain a target microphone number; the microphone distribution database comprises at least two groups of data, wherein each group of data comprises names of participants and microphone numbers corresponding to the names of the participants;
and controlling, according to the target microphone number, the microphone corresponding to the target microphone number to be turned on.
2. The conference management method based on voice recognition according to claim 1, wherein acquiring the voice signal of the conference host comprises:
acquiring a voice signal of a conference host and the acquisition time of the voice signal;
generating a conference recording blank template according to the acquisition time, wherein the conference recording blank template comprises a conference project filling area, a speaker name filling area and a conference recording text filling area;
determining a target meeting item corresponding to the acquisition time according to the acquisition time and a preset meeting process, filling the target meeting item into a meeting item filling area, and updating the meeting record blank template;
correspondingly, after obtaining the names of the participants appearing in the text data according to the text data and a preset participant list database and obtaining the names of the target participants, the conference management method further comprises:
filling the names of the target participants into the speaker name filling area, and updating the conference record blank template;
and displaying the updated blank template of the conference record to the conference host.
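A minimal sketch of the record-template flow of claim 2, assuming the preset conference process can be represented as a list of (start hour, meeting item) pairs; all names and values are invented for illustration:

```python
from datetime import datetime

# Hypothetical preset conference process: (start hour, meeting item) pairs.
CONFERENCE_SCHEDULE = [
    (9, "Opening remarks"),
    (10, "Project review"),
    (14, "Q&A session"),
]


def blank_template(acquisition_time: datetime) -> dict:
    """Generate a conference record blank template from the acquisition time.

    The target meeting item is the latest item whose time slot has started.
    """
    item = ""
    for start_hour, meeting_item in CONFERENCE_SCHEDULE:
        if acquisition_time.hour >= start_hour:
            item = meeting_item
    return {
        "meeting_item": item,   # conference item filling area
        "speaker_name": "",     # speaker name filling area (filled in later)
        "record_text": "",      # conference record text filling area
    }


template = blank_template(datetime(2020, 4, 13, 10, 30))
template["speaker_name"] = "Alice"  # hypothetical target participant name
print(template["meeting_item"])     # Project review
```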
3. The conference management method based on voice recognition according to claim 1, wherein after the microphone corresponding to the target microphone number is controlled to be turned on according to the target microphone number, the conference management method further comprises:
determining a target position coordinate corresponding to the target microphone number according to the target microphone number and a preset position coordinate database; wherein the position coordinate database comprises at least two microphone numbers and position coordinates corresponding to the microphone numbers;
and outputting a control instruction to a camera according to the target position coordinate, wherein the control instruction is used for indicating the camera to carry out image acquisition towards the target position coordinate.
4. The conference management method based on voice recognition according to claim 1, wherein performing voice recognition on the voice signal to obtain the corresponding text data comprises:
generating a voice waveform diagram of the voice signal in a preset voice coordinate system;
dividing the voice waveform diagram based on a voice activity detection algorithm to obtain at least two effective voice sections;
extracting a voice characteristic curve corresponding to each effective voice section through a voice characteristic recognition algorithm;
extracting a standard characteristic curve associated with each candidate character from a preset corpus;
drawing the standard characteristic curve and the voice characteristic curve in a preset characteristic coordinate system, and calculating the area of the difference region between the standard characteristic curve and the voice characteristic curve;
if the difference area of any candidate character is smaller than a preset area difference threshold, identifying the candidate character as character information contained in a corresponding effective speech section;
and sequentially combining the character information based on the order of the effective voice sections in the voice waveform diagram to generate the text data.
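The matching step of claim 4 can be sketched numerically: if each characteristic curve is sampled at uniform points, the difference area between a standard curve and the voice characteristic curve can be approximated by the summed absolute gap between samples. The corpus, curves, and threshold below are invented for illustration, not taken from the patent:

```python
# Hypothetical corpus: candidate character -> standard characteristic curve,
# sampled at uniform points along the characteristic coordinate axis.
CORPUS = {
    "A": [0.1, 0.4, 0.9, 0.4, 0.1],
    "B": [0.8, 0.6, 0.2, 0.6, 0.8],
}
AREA_DIFFERENCE_THRESHOLD = 0.5  # preset area difference threshold (assumed)


def difference_area(standard, voice, dx=1.0):
    """Approximate the area between two sampled curves (rectangle rule)."""
    return sum(abs(s - v) * dx for s, v in zip(standard, voice))


def recognize(voice_curve):
    """Return the candidate character whose difference area is under threshold."""
    for char, standard_curve in CORPUS.items():
        if difference_area(standard_curve, voice_curve) < AREA_DIFFERENCE_THRESHOLD:
            return char
    return None  # no candidate matched this effective voice section


print(recognize([0.1, 0.5, 0.8, 0.4, 0.1]))  # A
```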
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010283746.3A CN111508499A (en) | 2020-04-13 | 2020-04-13 | Conference management method based on voice recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111508499A | 2020-08-07 |
Family
ID=71863942
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010283746.3A (CN111508499A, withdrawn) | Conference management method based on voice recognition | 2020-04-13 | 2020-04-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111508499A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113542661A (en) * | 2021-09-09 | 2021-10-22 | 北京鼎天宏盛科技有限公司 | Video conference voice recognition method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20200807 |