WO2020218664A1 - Smart conference system based on 5G communication and conference support method using robotic processing automation - Google Patents

Smart conference system based on 5G communication and conference support method using robotic processing automation

Info

Publication number
WO2020218664A1
Authority
WO
WIPO (PCT)
Prior art keywords
participant
information
recognition module
speech
volume data
Prior art date
Application number
PCT/KR2019/005694
Other languages
French (fr)
Korean (ko)
Inventor
이봉규
이원상
Original Assignee
이봉규
(주)넵스
이원상
(주)넵스홈
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 이봉규, (주)넵스, 이원상, (주)넵스홈
Publication of WO2020218664A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Definitions

  • the present invention relates to a 5G communication-based smart conference system and a conference support method through robotic processing automation, and more specifically, to a system and method that enable conference participants to conduct meetings conveniently and efficiently through 5G-based IoT technology, AI solution technology, automatic minutes creation technology, meeting situation analysis technology, and the like.
  • the data editing user interface allows the content of speech, keywords, and the like to be edited.
  • the conventional video conferencing system has no support functions other than simply recording the meeting on video in real time or providing microphones to the participants for amplification and recording, so the meeting contents had to be recorded one by one, and participants had the trouble and hassle of searching for information relevant to the meeting contents themselves and using it in the meeting.
  • the present invention was devised to solve the above problems; its object is to provide a 5G communication-based smart conference system and a conference support method through robotic processing automation that not only record the contents of speech by accurately identifying the participants in a meeting, but also grasp each participant's speech emotion, speech volume, and speech speed and reflect them in the record, and on this basis allow each participant's major interests in the meeting and materials to be used in subsequent meetings to be prepared in advance.
  • a voice model DB for storing pattern data of texts;
  • a participant information DB for storing reference volume data of participants;
  • a voice recognition module for converting a received sound signal into pattern data and generating text information by searching the voice model DB for the text corresponding to the pattern data;
  • a volume recognition module that analyzes the volume of the sound signal, converts it into collected volume data, and checks the emotion category information of the collected volume data;
  • a participant recognition module for identifying a participant by searching the participant information DB for reference volume data corresponding to the collected volume data; and
  • a speech information recognition module for generating speech information by classifying the text information and emotion category information by participant.
  • the present invention described above not only records the contents of speech by accurately identifying the participants in a meeting, but also grasps each participant's speech emotion, speech volume, and speech speed and reflects them in the record; on this basis, each participant's major interests and materials to be used in follow-up meetings can be prepared in advance, and in particular, specific topics can be discovered from unstructured text for the major interests.
  • FIG. 1 is a perspective view schematically showing the interior of a conference room to which a conference system according to the present invention is applied,
  • FIG. 2 is a block diagram showing an embodiment of a conference system according to the present invention
  • FIG. 3 is a block diagram showing a configuration unit of a voice analysis device configured in a conference system according to the present invention
  • FIG. 4 is a flowchart sequentially showing a conference support method operating based on a conference system according to the present invention
  • FIG. 5 is a diagram showing an example of minutes created by the conference system according to the present invention.
  • FIG. 6 is a block diagram showing a constituent unit of the conference room management apparatus configured in the conference system according to the present invention
  • FIG. 7 is a block diagram showing another embodiment of a conference system according to the present invention.
  • FIG. 1 is a perspective view schematically showing the interior of a conference room to which a conference system according to the present invention is applied
  • FIG. 2 is a block diagram showing an embodiment of a conference system according to the present invention.
  • referring to FIGS. 1 and 2, in the conference system 10 of the present embodiment, various equipment is installed so that a plurality of participants can gather in one place and discuss together.
  • even if one or more of the participants cannot attend in person, additional equipment may be installed to support face-to-face discussion with the other participants through video.
  • the conference system 10 of the present embodiment includes: a voice analysis device 100 that collects the contents of a participant's speech and records them as text; a search device 200 that searches, through the Internet or Ethernet (hereinafter 'communication network'), for additional information on the participants or the meeting contents corresponding to the speech information checked by the voice analysis device 100; a display 300 that outputs the speech information and search information; and a control device 400 that controls the voice analysis device 100, the search device 200, and the display 300 so that they exchange data and interwork.
  • the conference system 10 may further include one or more selected from a translation device 500, which translates the speech information or search information into text commonly used by the participants and outputs it through the display 300, and a conference room management device 600, which adjusts the condition of the conference room so that the participants' meeting environment is optimized.
  • of course, the control device 400 likewise controls the translation device 500 and the conference room management device 600 so that they exchange data and interwork with the voice analysis device 100, the search device 200, and the display 300.
  • the voice analysis device 100 utilizes Speech-to-Text (STT) technology, which converts speech data into text data.
  • in addition, by communicating with other servers 20 and personal terminals 30 and 40 through the communication network, it is possible to search for necessary data or to communicate with participants or related persons.
  • since the conference system according to the present invention communicates based on 5G (5th generation mobile communication), additional information on the meeting contents can be searched and provided in real time.
  • FIG. 3 is a block diagram showing a configuration unit of a voice analysis apparatus configured in a conference system according to the present invention.
  • the voice analysis device 100 of the present embodiment includes: a voice model DB 110 for storing pattern data of texts; a participant information DB 120 for storing reference volume data of participants; a voice recognition module 130 for converting a received sound signal into pattern data and generating text information by searching the voice model DB 110 for the text corresponding to the pattern data; a volume recognition module 140 that analyzes the volume of the sound signal, converts it into collected volume data, and checks the emotion category information of the collected volume data; a participant recognition module 150 for identifying a participant by searching the participant information DB 120 for reference volume data corresponding to the collected volume data; and a speech information recognition module 160 for generating speech information by classifying the text information and emotion category information by participant.
  • the voice model DB 110 stores text pattern data.
  • the pattern data is a basic waveform of an acoustic signal of a text, and is reference data such as an acoustic model and a text model for general speech recognition.
  • speech recognition is an already known technology, and the pattern data of texts is continuously developed through deep learning, an algorithm that learns the per-text waveforms of standard human voices.
  • the participant information DB 120 stores the reference volume data of participants.
  • the reference volume data is standardized by collecting the voice volume of participants who have a history of participating in meetings or who may participate in the future.
  • in the present embodiment, a participant's volume is collected by having a separate volume collection means (not shown) prompt the participant to speak a designated text and receiving the sound signal of that text.
  • the volume collection means analyzes the received sound signal to grasp information such as waveform, frequency, amplitude, decibel, and wavelength, and digitizes the collected information to generate the participant's reference volume data.
  • as a result, the conference system according to the present invention stores the reference volume data of subscribed members and utilizes it to generate conference records.
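  • a minimal sketch of how such reference volume data might be digitized from an enrollment recording is shown below, assuming a mono PCM signal held in a NumPy array; the feature set and function names are illustrative rather than taken from the patent:

```python
import numpy as np

def make_volume_data(signal: np.ndarray, sample_rate: int) -> dict:
    """Digitize the volume-related properties of one utterance:
    amplitude, decibel level, dominant frequency, and wavelength."""
    amplitude = float(np.max(np.abs(signal)))          # peak amplitude
    rms = float(np.sqrt(np.mean(signal ** 2)))         # root-mean-square level
    decibel = 20 * np.log10(rms + 1e-12)               # dBFS, guarded against log(0)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    dominant_freq = float(freqs[np.argmax(spectrum)])  # strongest frequency component
    speed_of_sound = 343.0                             # m/s, for an indicative wavelength
    wavelength = speed_of_sound / dominant_freq if dominant_freq > 0 else float("inf")
    return {"amplitude": amplitude, "decibel": decibel,
            "frequency": dominant_freq, "wavelength": wavelength}

def make_reference(utterances, sample_rate):
    """Enrollment: average several scripted utterances into a participant's
    reference volume data."""
    feats = [make_volume_data(u, sample_rate) for u in utterances]
    return {k: float(np.mean([f[k] for f in feats])) for k in feats[0]}
```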
  • the voice recognition module 130 converts the received sound signal into pattern data, and searches the text corresponding to the pattern data in the voice model DB 110 to generate text information.
  • the microphone 101 is installed at a position where it can efficiently receive the participant's sound signal, usually on the table T facing the participant seated in the chair C.
  • as primary filtering, the voice recognition module 130 extracts, from among the plurality of sound signals received by the microphone 101, the sound signals that maintain the designated pattern waveforms, and filters out the other sound signals.
  • the designated pattern waveform is a waveform of text pattern data stored in the voice model DB 110.
  • text that the participant mainly repeats during the speech process may be designated as standard text, and the waveform of the standard text may be designated as a pattern waveform.
  • the sound signals extracted through the primary filtering are sound signals of texts spoken by each participant, and the voice recognition module 130 removes all sound signals corresponding to miscellaneous sounds.
  • the voice recognition module 130 then performs secondary filtering, keeping only the sound signals in a relatively high frequency range among the primarily filtered sound signals.
  • the sound signals extracted through the primary filtering may include a plurality of sound signals generated by two or more participants speaking simultaneously. However, since each participant speaks closest to the microphone 101 assigned to him or her, that participant's sound signal is received as the loudest among the plurality of sound signals. Accordingly, the voice recognition module 130 extracts only the sound signal with the largest frequency, that is, in the relatively high frequency range, and removes all other sound signals. As a result, the voice recognition module 130 can extract only the speaking participant's sound signal even when a plurality of participants speak simultaneously amid many noises.
  • the microphone 101 may be assigned to each participant as described above, or two or more participants may share and use one microphone 101. However, in both cases, one microphone 101 simultaneously receives various noises and acoustic signals of multiple participants, and the participant speaks near the microphone 101 for his or her speech.
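  • the two-stage selection described above might be sketched as follows; here each candidate source is assumed to arrive as a separate NumPy array, and the pattern-waveform check is reduced to a simple correlation test, which only stands in for the voice model DB lookup the patent describes:

```python
import numpy as np

def matches_pattern(signal, pattern_waveforms, threshold=0.6):
    """Primary filter: keep a signal only if it correlates with some
    designated pattern waveform from the voice model DB."""
    for p in pattern_waveforms:
        n = min(len(signal), len(p))
        c = np.corrcoef(signal[:n], p[:n])[0, 1]
        if not np.isnan(c) and abs(c) >= threshold:
            return True
    return False

def select_speaker_signal(signals, pattern_waveforms):
    """Secondary filter: among the pattern-matching signals, keep only the
    strongest one, i.e. the participant speaking closest to the microphone."""
    candidates = [s for s in signals if matches_pattern(s, pattern_waveforms)]
    if not candidates:
        return None
    return max(candidates, key=lambda s: float(np.sqrt(np.mean(s ** 2))))
```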
  • the voice recognition module 130 analyzes the sound signal extracted through secondary filtering to generate pattern data for the waveform.
  • the speech recognition module 130 searches the speech model DB 110, checks the text of the pattern data corresponding to the pattern data, and generates text information.
  • to this end, the voice recognition module 130 applies STT (Speech-to-Text) deep learning technology based on natural language processing (NLP).
  • the volume recognition module 140 analyzes the volume of the sound signal and converts it into collection volume data, and checks the emotional category information of the collection volume data.
  • since the acoustic signal is an analog wave, it includes information such as the waveform, frequency, amplitude, decibel, and wavelength of the wave. The volume recognition module 140 therefore checks the frequency of the sound signal used for filtering by the voice recognition module 130, and digitizes information such as the waveform, frequency, amplitude, decibel, and wavelength of the secondarily filtered sound signal to generate collected volume data.
  • the volume recognition module 140 compares the reference volume data of the participant identified by the participant recognition module 150 with the collection volume data, and checks emotion category information corresponding to the collection volume data.
  • the voice volume is measured as an indicator of a participant's initiative and confidence in the meeting, and is reflected as a weight in the document-term matrix described below.
  • the emotion category information is classified according to specified ratios relative to the corresponding participant's reference volume data: if the ratio of the collected volume data to the reference volume data falls within the range of the general mode, it means that the participant was in an ordinary emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the negative mode, it means that the participant was in a negative or excited emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the emotion mode, it means that the participant was in a positive or interested emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the helpless mode, it means that the participant had low concentration or was in a listless emotional state at the time of the speech.
  • the volume data may be composed of at least one selected from the decibel, frequency, and amplitude of the acoustic signal,
  • and the ratio for each mode may be set separately for decibel, frequency, and amplitude.
  • for example, if the decibel, frequency, and amplitude of the collected volume data exceed the specified ratio relative to those of the reference volume data, it can be determined that the participant's current emotion is in the negative mode.
  • conversely, if the decibel, frequency, and amplitude of the collected volume data fall below the specified ratio relative to those of the reference volume data, it can be determined that the participant's current emotion is in the helpless mode.
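  • a minimal sketch of this ratio-based classification follows; the mode boundaries (0.8, 1.2, 1.5) are invented for illustration, since the patent does not publish the actual ratio ranges:

```python
def classify_emotion(collected: dict, reference: dict) -> str:
    """Compare collected volume data against the participant's reference data
    and map the average ratio onto an emotion mode. The 0.8/1.2/1.5 cut-offs
    are assumptions for illustration only."""
    keys = ("decibel", "frequency", "amplitude")
    ratio = sum(collected[k] / reference[k] for k in keys) / len(keys)
    if ratio >= 1.5:
        return "negative"   # much louder/higher than usual: negative or excited
    if ratio >= 1.2:
        return "emotion"    # moderately elevated: positive, interested
    if ratio <= 0.8:
        return "helpless"   # well below usual: low concentration, listless
    return "general"        # within the ordinary range
```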
  • in addition to one or more selected from the decibel, frequency, and amplitude of the sound signal, the volume recognition module 140 of the present embodiment may check the emotion category information according to the reception length of a specific participant's sound signal per unit time, that is, the participant's amount and speed of speech. For example, even if the emotion category was first checked only through the decibel, frequency, and amplitude described above, if the reception length of the same participant's sound signal exceeds a certain amount, the participant's interest in the field of the spoken text is judged to be high; if it is below a certain amount, the participant's interest is judged to be low.
  • to this end, the volume recognition module 140 may further include a function of checking the signal length of the collected volume data, obtained by digitizing the corresponding sound signal, in order to check the participant's degree of interest.
  • the ratio for each mode of the emotion category described above may be standardized, without distinction between participants, through repeated deep learning.
  • alternatively, the ratio for each mode of the emotion category described above may be individualized by applying a deep learning technique to learn how each participant's volume data changes according to his or her emotional state.
  • that is, the ratio for each mode of the emotion category described above may be set by learning the ratio at which at least one selected from the decibel, frequency, and amplitude of the volume data changes according to the emotional state.
  • the participant recognition module 150 identifies the participant by searching the participant information DB 120 for reference volume data corresponding to the collected volume data. In more detail, it compares the collected volume data generated by the volume recognition module 140 with the reference volume data in the participant information DB 120, and identifies the participant whose reference volume data matches within an error range. In general, at the beginning of a meeting participants speak at their usual, everyday volume, so the participant recognition module 150 can identify the participant from the collected volume data. In special cases, however, a participant may start speaking from the beginning of the meeting in a stronger or lower voice than usual.
  • in this case, if the participant recognition module 150 cannot find reference volume data corresponding to the collected volume data in the participant information DB 120, it converts the collected volume data by the ratio of each emotion category mode and searches for the reference volume data again; the participant of the reference volume data found in this way is identified as the speaker of the collected volume data.
  • as another identification method, a plurality of sound wave recognition sensors may be arranged along the perimeter of the microphone 101. When the voice recognition module 130 receives sound signals, the sound wave recognition sensor that recognizes the sound wave at the highest frequency is checked, and the identification code of that sensor is linked to the sound signal. Accordingly, the participant recognition module 150 identifies subsequently received sound signals according to the identification code and recognizes them as sound signals of the previously identified participant.
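  • a sketch of the nearest-match identification with the mode-ratio fallback, reusing the dictionary-shaped volume data from the earlier sketch; the tolerance and ratio values are illustrative assumptions:

```python
def identify_participant(collected: dict, participant_db: dict,
                         mode_ratios=(1.0, 1.2, 1.5, 0.8), tolerance=0.1):
    """Find the participant whose reference volume data matches the collected
    data within an error range; if no one matches (the speaker is unusually
    loud or quiet), rescale the collected data by each emotion-mode ratio and
    retry. mode_ratios mirrors the general/emotion/negative/helpless modes."""
    def distance(a, b):
        keys = ("decibel", "frequency", "amplitude")
        return sum(abs(a[k] - b[k]) / abs(b[k]) for k in keys) / len(keys)

    for ratio in mode_ratios:
        scaled = {k: v / ratio for k, v in collected.items()}
        best = min(participant_db.items(),
                   key=lambda item: distance(scaled, item[1]),
                   default=None)
        if best and distance(scaled, best[1]) <= tolerance:
            return best[0]
    return None
```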
  • the speech information recognition module 160 generates speech information by classifying the text information and the emotional category information for each participant.
  • in more detail, the text information from the voice recognition module 130, the emotion category information from the volume recognition module 140, and the participant identified by the participant recognition module 150 are classified into sets, collected in chronological order from the start of the meeting to its end, and finally generated as one body of speech information.
  • the speech information recognition module 160 transmits the speech information to the printing module 180, and the printing module 180 prints the received speech information as text so that the participant can visually recognize the speech information.
  • the printed speech information is output (refer to FIG. 5) or stored through the display 300, a printer (not shown), or a storage medium (not shown).
  • in addition, the speech information recognition module 160 may transmit the text information and emotion category information classified by participant to the printing module 180 in real time, so that each set-unit phrase is printed through the display 300.
  • meanwhile, the speech information recognition module 160 analyzes the text information to identify key keywords and fields, checks the participant's emotion category information for the text information, and grasps the participant's degree of interest in those key keywords and fields. At this time, the participant's amount and speed of speech can be checked from the collected volume data and applied to grasping the degree of interest.
  • this technology is not limited to detecting topics of interest from text alone; by discovering a participant's specific topics on the basis of unstructured text together with the participant's speech emotion, speech volume, and speech speed, it allows the participant's main interests to be used in his or her work.
  • this text analysis technology reflects new weighting factors, such as speech emotion, speech volume, and speech speed, in the document-term matrix technique; the weights are not limited to the acoustic signal generated by the participant's speech, but can also be measured and collected through various IoT sensors installed inside the conference room.
  • to explain this in more detail, when the speech information recognition module 160 receives more than a certain amount of text information from the voice recognition module 130, it analyzes the text information using text mining techniques and the like to extract the key keywords.
  • the speech information recognition module 160 determines the field of the text information through the key keywords extracted in this way.
  • in addition, the speech information recognition module 160 checks the participant's emotion category information for the text information, and thereby determines the degree of interest in the text information.
  • the degree of interest is determined according to the emotional state at the time the text information was spoken. For example, the digitized speech of the conference participants is processed through natural language processing, and the frequency of speech per text is counted in a document-term matrix such as [Table 1].
  • the speech information recognition module 160 looks up the texts counted as in [Table 1] in a sentiment word dictionary and determines the participant's emotional state on this basis; that is, as the number of texts belonging to a given emotion category increases, the speech information recognition module 160 determines the corresponding participant's emotional state to be that of the category.
  • in addition, when a text of the same emotion category as the participant's current emotional state is identified, the speech information recognition module 160 adjusts the counts by weighting the text counting of that emotion category, thereby giving weight to the participant's emotional state. Therefore, even if texts of other emotion categories temporarily increase, the speech information recognition module 160 does not recognize the participant's emotional state as having changed suddenly.
  • if the participant is in the ordinary emotional state of the general mode when speaking the text information, it may be determined that the participant's interest in the field of the text information is not high. If the participant is in the emotional state of the negative mode when speaking the text information, it may be determined that the participant has a high interest in the field of the text information but holds an opposing position. If the participant is in the emotional state of the emotion mode when speaking the text information, it may be determined that the participant has a high interest in and a positive position on the field of the text information. And if the participant is in the emotional state of the helpless mode when speaking the text information, it may be determined that the participant has a low interest in the field of the text information and is indifferent to the overall topic of the meeting.
  • the degree of interest of the participant in the field may be determined by referring to the length of the sound signal of the collection volume data.
  • that is, the longer the sound signal of the collected volume data, which corresponds to the amount and speed of speech, the more precisely the speech information recognition module 160 can text-mine the spoken content and the more detailed its topic analysis in the relevant field can be.
  • to this end, the speech information recognition module 160 checks the speech speed by calculating the amount and duration of speech for each participant, and assigns a weight to each participant's amount of speech; the number of speech texts counted for each participant then increases or decreases by the assigned weight.
  • likewise, the speech information recognition module 160 adjusts each participant's per-text speech counts by giving a weight to the counting of each participant's speech texts according to the volume data.
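  • a small sketch of a per-participant document-term matrix with such weighting; the weighting formula (volume ratio scaled by speech time) is an assumption for illustration, not the patent's actual formula:

```python
from collections import Counter

def weighted_dtm(speeches):
    """speeches: list of (participant, tokens, volume_ratio, speech_seconds).
    Counts term frequencies per participant, scaled by a weight built from
    the speaker's volume ratio and amount of speech (assumed formula)."""
    dtm = {}
    for participant, tokens, volume_ratio, seconds in speeches:
        weight = volume_ratio * (1.0 + seconds / 60.0)  # louder and longer counts more
        row = dtm.setdefault(participant, Counter())
        for tok in tokens:
            row[tok] += weight
    return dtm

dtm = weighted_dtm([
    ("Lee", ["budget", "schedule", "budget"], 1.3, 40),
    ("Kim", ["design", "schedule"],           0.9, 15),
])
print(dtm["Lee"]["budget"])  # 2 * 1.3 * (1 + 40/60), roughly 4.33
```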
  • meanwhile, the speech information recognition module 160 analyzes, through deep learning, the combined results of the key keywords, fields, and degree of interest to infer related topics corresponding to the participant's interests and to search for the participant's next speech field. To explain this in more detail, the speech information recognition module 160 analyzes through deep learning not only the combined results of the key keywords, fields, and degrees of interest of the text information spoken by the participant during the current meeting, but also the results collected during previous meetings. In this way it learns which fields the participant is interested in and how the contents of his or her speech are changing, and through the learned contents it predicts the keywords, fields, and topics the participant will speak about in the future. In addition, the speech information recognition module 160 searches various kinds of big data on the predicted items through the search device 200 and recommends the results to the participant. The recommendation of big data by the speech information recognition module 160 may be performed during a meeting, or in daily life outside of meetings through various communication media such as the participant's e-mail or text messages.
  • meanwhile, the speech information recognition module 160 can grasp a clearer emotional state and degree of interest for each participant through the weighting described above. To explain this in more detail, the speech information recognition module 160 discovers topics with similar contents based on the document-term matrix data described above (a topic modeling technique). Topic modeling techniques include Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Indexing (PLSI), and Latent Dirichlet Allocation (LDA). For example, although document 1 and document 2 may have similar subjects, the types and frequencies of the words appearing in each document may differ. A simple keyword model alone therefore has limitations in calculating the similarity between documents or classifying subjects, and cannot recognize that document 1 and document 2 deal with the same or a similar subject.
  • in order to search for the topic corresponding to a participant's speech text, the speech information recognition module 160 finds α and β, the basic values (vector values) for each text, calculates θ for each document, and uses θ to calculate and classify the similarity between documents. That is, the speech information recognition module 160 sets a vector value for each text based on a word embedding technique and calculates the similarity between two words. To this end, the speech information recognition module 160 employs techniques such as Word2Vec, GloVe, and FastText. Accordingly, the speech information recognition module 160 finds the topic corresponding to the participant's speech content, and searches for and presents related documents corresponding to the topic. For reference, since a document does not belong only to a topic in one field but may belong to various fields, even the same document may be presented as a topic-related document in a field corresponding to the participant's document-term matrix data.
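  • a brief sketch of the LDA step using the gensim library (assuming gensim 4.x and toy tokenized speeches): gensim's alpha/eta play the role of the α and β above, and get_document_topics returns θ, the per-document topic mixture that can then be compared across documents; word-level similarity would come from separately trained Word2Vec, GloVe, or FastText embeddings:

```python
from gensim import corpora
from gensim.models import LdaModel

docs = [["budget", "schedule", "cost"],        # toy tokenized speeches
        ["design", "prototype", "schedule"],
        ["cost", "budget", "funding"]]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# alpha (and eta) correspond to the Dirichlet priors alpha and beta above.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               alpha="auto", passes=20, random_state=0)

# theta: the topic mixture of each document, used for similarity comparison.
theta = [lda.get_document_topics(bow, minimum_probability=0.0) for bow in corpus]
print(theta[0])  # e.g. [(0, 0.9...), (1, 0.0...)]
```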
  • the data processing module 170 relays data communication among the voice recognition module 130, the volume recognition module 140, the participant recognition module 150, the speech information recognition module 160, and the printing module 180, and controls the operation of the modules 130, 140, 150, 160, 180 according to a set process.
  • the data processing module 170 corresponds to a general central processing unit (CPU).
  • FIG. 4 is a flowchart sequentially showing a conference support method operating based on the conference system according to the present invention
  • FIG. 5 is a diagram showing an example of minutes created by the conference system according to the present invention.
  • the microphone 101 receives an acoustic signal generated by the speech of a conference participant.
  • the microphone 101 is installed on the table T of the conference room so that it is distributed for each participant.
  • the voice recognition module 130 checks the waveforms of the plurality of sound signals received by the microphone 101, and searches the voice model DB 110 for a pattern waveform corresponding to the confirmed waveform. As a result of the search, when an acoustic signal of a waveform corresponding to the pattern waveform is checked, the corresponding acoustic signal is extracted and other acoustic signals are filtered and removed.
  • upon completion of the filtering, the voice recognition module 130 extracts only the sound signal with the largest frequency, that is, in the relatively high frequency range among the plurality of sound signals, and removes all other sound signals.
  • the voice recognition module 130 analyzes the sound signal finally extracted in the sound signal filtering step (S20) to generate pattern data of its waveform. The voice recognition module 130 then searches the voice model DB 110 to check the text corresponding to the pattern data, thereby checking the text for each sound signal, and combines the confirmed texts to finally generate the text information.
  • the volume recognition module 140 analyzes the sound signal finally extracted in the sound signal filtering step (S20) and converts it into collected volume data.
  • that is, the volume recognition module 140 digitizes information such as the waveform, frequency, amplitude, decibel, and wavelength of the sound signal, and combines this information to generate the collected volume data.
  • the participant recognition module 150 searches the participant information DB 120 for reference volume data corresponding to the collected volume data, and recognizes the participant who spoke the corresponding sound signal.
  • the volume recognition module 140 compares the reference volume data of the participant identified in the participant recognition step S50 with the collection volume data, and checks emotion category information corresponding to the collection volume data.
  • the emotion category information is classified according to specified ratios relative to the corresponding participant's reference volume data: if the ratio of the collected volume data to the reference volume data falls within the range of the general mode, it means that the participant was in an ordinary emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the negative mode, it means that the participant was in a negative or excited emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the emotion mode, it means that the participant was in a positive or interested emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the helpless mode, it means that the participant had low concentration or was in a listless emotional state at the time of the speech.
  • the speech information recognition module 160 generates speech information by classifying the text information and the emotional category information for each participant.
  • in more detail, the text information from the voice recognition module 130, the emotion category information from the volume recognition module 140, and the participant identified by the participant recognition module 150 are classified into sets, collected in chronological order from the start of the meeting to its end, and finally generated as speech information.
  • the speech information recognition module 160 transmits the speech information to the printing module 180, and the printing module 180 prints the received speech information as text so that the participant can visually recognize the speech information.
  • the printed speech information is output or stored as shown in FIG. 5 through a display 300, a printer (not shown), or a storage medium (not shown).
  • in addition, the speech information recognition module 160 may transmit the text information and emotion category information classified by participant to the printing module 180 in real time, so that each set-unit phrase is printed through the display 300.
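  • pulling these steps together, one iteration of the method of FIG. 4 might be sketched as follows, reusing the functions from the earlier sketches; only S20 and S50 are step labels from the document, and the rest of the flow is inferred:

```python
def recognize_text(signal):
    """Placeholder for the voice recognition module 130 (STT not shown)."""
    return "<transcribed text>"

def support_meeting(mic_signals, pattern_waveforms, participant_db, minutes):
    """Receive and filter the sound signals (S20), recognize the text, collect
    volume data, identify the participant (S50), check the emotion category,
    and append one speech-information record to the minutes."""
    signal = select_speaker_signal(mic_signals, pattern_waveforms)  # filtering (S20)
    if signal is None:
        return
    text = recognize_text(signal)                                   # voice recognition (130)
    volume = make_volume_data(signal, sample_rate=16_000)           # volume recognition (140)
    who = identify_participant(volume, participant_db)              # participant recognition (S50)
    if who is None:
        return
    emotion = classify_emotion(volume, participant_db[who])         # emotion category (140)
    minutes.append({"participant": who, "text": text, "emotion": emotion})
```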
  • FIG. 6 is a block diagram showing a constituent unit of a conference room management apparatus configured in a conference system according to the present invention.
  • the conference room management device 600 includes: a camera 620 for photographing the conference room; an image analysis module 630 that compares and analyzes the images captured by the camera 620 to check whether the conference room has changed; a transparent all-glass smart glass 640 that expresses RGB colors; an environmental condition check module 650 that senses the indoor environment of the conference room; a chair sensor 660 installed in the chairs (C) provided in the conference room to recognize the posture of a seated participant; a vibrator 670 that applies vibration to the chair C; and a schedule management module 680 that manages the schedules of meetings and participants.
  • the image analysis module 630 stores the image captured by the camera 620 as a video file. In addition, by comparing the videos in chronological order, the participants' behavioral patterns and other states are identified, and various items placed on the table T are checked to confirm the presence or absence of items remaining after the meeting.
  • the smart glass 640 constitutes the wall surface dividing the conference room, becoming opaque during the meeting and transparent again when the meeting ends. For reference, if an item identified by the image analysis module 630 is confirmed to remain after the meeting is over, the smart glass 640 maintains its opaque state; the meeting participants are thereby alerted so that the meeting can be properly closed.
  • the environmental condition check module 650 detects environmental conditions such as the air quality, illumination, temperature, humidity, and fine dust of the conference room, so that devices such as a fan, an air purifier, and a humidifier can be controlled to improve the environment.
  • the chair sensor 660 is installed in the chair C to check whether a participant is seated, and scans the participant's posture to estimate the participant's current state. For example, a pressure gauge installed on the seating surface of the chair C determines whether the participant's seated posture is biased to one side; if the chair sensor 660 confirms a biased state, the corresponding participant is not in a correct posture, and this can be recognized and a warning given.
  • the vibrator 670 is installed on the chair C to apply vibration according to the state of the participant. Therefore, the seated participant can focus on the meeting without feeling helpless.
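  • a toy sketch of the chair-side logic: the pressure readings detect a biased posture for the warning, and the vibrator is triggered when the participant's emotion mode suggests listlessness; the threshold and record shapes are invented for illustration:

```python
def check_chair(pressures, emotion, bias_limit=0.25):
    """pressures: (left, right) seat pressure readings from the chair sensor 660.
    Returns (warning, vibrate) decisions for the chair C."""
    left, right = pressures
    total = left + right
    seated = total > 0
    biased = seated and abs(left - right) / total > bias_limit
    warning = biased                              # posture warning for a leaning participant
    vibrate = seated and emotion == "helpless"    # vibrator 670 nudges a listless participant
    return warning, vibrate

print(check_chair((30.0, 70.0), "helpless"))  # (True, True)
```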
  • the schedule management module 680 may manage a list of participants who will use the conference room, time, and other conference schedules, and output through the display 300.
  • FIG. 7 is a block diagram showing another embodiment of a conference system according to the present invention.
  • referring to FIG. 7, the conference system 10 further includes an interest information storage device 700 that stores member information and interest information on each member's fields and degrees of interest; the search device 200 retrieves from the interest information storage device 700 the member information of members having a high degree of interest in the field of interest of the participant identified by the speech information recognition module 160, and outputs it through the display 300.
  • the interest information storage device 700 stores the personal information of currently registered members as member information, together with each member's fields and degrees of interest confirmed in the course of meetings. It is therefore possible to store and manage which fields each member is interested in and to what degree.
  • the search device 200 checks the participant's field and degree of interest, and searches for and recommends other members who are interested in the same field. Therefore, participants can meet with the recommended members to hold meetings, and through this they can develop more advanced results in the field.
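  • a minimal sketch of the recommendation lookup against the interest information storage device 700, with an assumed record shape of member to {field: interest level}:

```python
def recommend_members(participant, field_of_interest, interest_db, min_level=0.7):
    """interest_db: member -> {field: interest level in [0, 1]} (assumed shape).
    Returns other members with a high degree of interest in the same field,
    strongest first."""
    return sorted(
        (m for m, fields in interest_db.items()
         if m != participant and fields.get(field_of_interest, 0.0) >= min_level),
        key=lambda m: interest_db[m][field_of_interest],
        reverse=True)

interest_db = {"Lee": {"5G": 0.9}, "Kim": {"5G": 0.8}, "Park": {"AI": 0.95}}
print(recommend_members("Lee", "5G", interest_db))  # ['Kim']
```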

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Biomedical Technology (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to a smart conference system based on 5G communication and a conference support method using robotic processing automation and, more particularly, to a smart conference system based on 5G communication and a conference support method using robotic processing automation, the system enabling conference participants to conveniently and efficiently hold a conference through 5G-based Internet of Things technology, AI solution technology, automatic minutes creation technology, conference status analysis technology, and the like, and comprising a voice analysis device comprising: a voice model DB for storing pattern data of texts; a participant information DB for storing reference voice volume data of participants; a voice recognition module for converting a received sound signal into pattern data, and generating text information by searching the voice model DB for a text corresponding to the pattern data; a voice volume recognition module, which analyzes a voice volume of the sound signal so as to convert the analyzed voice volume into collected voice volume data, and checks emotion category information of the collected voice volume data; a participant recognition module for identifying a participant by searching the participant information DB for reference voice volume data corresponding to the collected voice volume data; and a speech information recognition module for generating speech information by classifying the text information and the emotion category information by participant.

Description

Smart conference system based on 5G communication and conference support method through robotic processing automation
The present invention relates to a 5G communication-based smart conference system and a conference support method through robotic processing automation, and more specifically, to a system and method that enable conference participants to conduct meetings conveniently and efficiently through 5G-based IoT technology, AI solution technology, automatic minutes creation technology, meeting situation analysis technology, and the like.

With the recent spread of video conferencing systems, it has become possible to record conference contents as video and to edit the recorded contents to generate multimedia minutes including video, audio, and text.

In the process of generating minutes through a conventional video conference system, the video and audio from the start to the end of the meeting were recorded to collect the data necessary for generating the minutes, and the collected data was stored by type. The stored data was then collated by time, and the collated data was browsed through a data editing user interface, which allowed the content of speech, keywords, and the like to be edited. When the chronologically collated data was browsed through the data editing user interface, the user edited the data to create a keyword list, an event list, an importance list, and the like, thereby generating the minutes.

However, minutes generation through the conventional video conference system had limitations in recording what multiple participants said separately for each participant. Moreover, since the prior art merely records the content of remarks rather than the entire meeting situation, it could neither predict nor grasp what the overall atmosphere of the meeting was, what the participants' emotions and other main topics of interest were, or in what direction the main topics would flow. Therefore, with the conventional video conferencing system, a non-participant had no choice but to replay the recorded video to understand the contents of a meeting. Furthermore, the non-participant had the inconvenience of having to judge the main topics of interest of the follow-up meeting and prepare the related documents himself or herself.

In addition, the conventional video conferencing system has no support functions other than simply recording the meeting on video in real time or providing microphones to the participants for amplification and recording, so the meeting contents had to be recorded one by one, and participants had the trouble and hassle of searching for information relevant to the meeting contents themselves and using it in the meeting.

Accordingly, the present invention was devised to solve the above problems; its object is to provide a 5G communication-based smart conference system and a conference support method through robotic processing automation that not only record the contents of speech by accurately identifying the participants in a meeting, but also grasp each participant's speech emotion, speech volume, and speech speed and reflect them in the record, and on this basis allow each participant's major interests in the meeting and materials to be used in subsequent meetings to be prepared in advance.
In order to achieve the above object, the present invention is a conference system for robotic processing automation including a voice analysis device comprising:

a voice model DB for storing pattern data of texts;

a participant information DB for storing reference volume data of participants;

a voice recognition module for converting a received sound signal into pattern data and generating text information by searching the voice model DB for the text corresponding to the pattern data;

a volume recognition module that analyzes the volume of the sound signal, converts it into collected volume data, and checks the emotion category information of the collected volume data;

a participant recognition module for identifying a participant by searching the participant information DB for reference volume data corresponding to the collected volume data; and

a speech information recognition module for generating speech information by classifying the text information and emotion category information by participant.
The present invention described above not only records the contents of speech by accurately identifying the participants in a meeting, but also grasps each participant's speech emotion, speech volume, and speech speed and reflects them in the record; on this basis, each participant's major interests and materials to be used in follow-up meetings can be prepared in advance, and in particular, specific topics can be discovered from unstructured text for the major interests.
FIG. 1 is a perspective view schematically showing the interior of a conference room to which a conference system according to the present invention is applied;

FIG. 2 is a block diagram showing an embodiment of a conference system according to the present invention;

FIG. 3 is a block diagram showing the constituent units of a voice analysis device configured in a conference system according to the present invention;

FIG. 4 is a flowchart sequentially showing a conference support method operating based on a conference system according to the present invention;

FIG. 5 is a diagram showing an example of minutes created by a conference system according to the present invention;

FIG. 6 is a block diagram showing the constituent units of a conference room management device configured in a conference system according to the present invention; and

FIG. 7 is a block diagram showing another embodiment of a conference system according to the present invention.
The features and effects of the present invention described above will become apparent through the following detailed description in connection with the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can easily implement the technical idea of the present invention. Since the present invention may be variously modified and may take various forms, specific embodiments are illustrated in the drawings and described in detail in the text. However, this is not intended to limit the present invention to the specific disclosed forms, and it should be understood to include all changes, equivalents, and substitutes falling within the spirit and scope of the present invention. The terms used in the present application are only used to describe specific embodiments and are not intended to limit the present invention.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a perspective view schematically showing the interior of a conference room to which a conference system according to the present invention is applied, and FIG. 2 is a block diagram showing an embodiment of a conference system according to the present invention.

Referring to FIGS. 1 and 2, in the conference system 10 of the present embodiment, various equipment is installed so that a plurality of participants can gather in one place and discuss together. In addition, even if one or more of the participants cannot attend in person, further equipment may be installed to support face-to-face discussion with the other participants through video.

The conference system 10 of the present embodiment includes: a voice analysis device 100 that collects the contents of a participant's speech and records them as text; a search device 200 that searches, through the Internet or Ethernet (hereinafter 'communication network'), for additional information on the participants or the meeting contents corresponding to the speech information checked by the voice analysis device 100; a display 300 that outputs the speech information and search information; and a control device 400 that controls the voice analysis device 100, the search device 200, and the display 300 so that they exchange data and interwork. The conference system 10 may further include one or more selected from a translation device 500, which translates the speech information or search information into text commonly used by the participants and outputs it through the display 300, and a conference room management device 600, which adjusts the condition of the conference room so that the participants' meeting environment is optimized. Of course, the control device 400 likewise controls the translation device 500 and the conference room management device 600 so that they exchange data and interwork with the voice analysis device 100, the search device 200, and the display 300.

The voice analysis device 100 utilizes Speech-to-Text (STT) technology, which converts speech data into text data. In addition, by communicating with other servers 20 and personal terminals 30 and 40 through the communication network, it is possible to search for necessary data or to communicate with participants or related persons. In particular, since the conference system according to the present invention communicates based on 5G (5th generation mobile communication), additional information on the meeting contents can be searched and provided in real time.
Reference numerals '620', '640', 'C', and 'T', which have not yet been described, will be explained in detail together with the corresponding technology.
More detailed descriptions of the devices 100, 200, 300, 400, 500, and 600 are given below.
FIG. 3 is a block diagram showing the constituent units of the speech analysis device of the conference system according to the present invention.
Referring to FIGS. 1 to 3, the speech analysis device 100 of the present embodiment includes: a voice model DB 110 that stores pattern data of texts; a participant information DB 120 that stores reference volume data of participants; a voice recognition module 130 that converts a received acoustic signal into pattern data and searches the voice model DB 110 for the text corresponding to the pattern data to generate text information; a volume recognition module 140 that analyzes the volume of the acoustic signal, converts it into collected volume data, and determines the emotion category information of the collected volume data; a participant recognition module 150 that searches the participant information DB 120 for reference volume data corresponding to the collected volume data to identify the participant; and a speech information recognition module 160 that classifies the text information and the emotion category information by participant to generate speech information.
The voice model DB 110 stores pattern data of texts. More specifically, the pattern data is the basic waveform of the acoustic signal of a text and serves as reference data, such as the acoustic model and text model used for general speech recognition. Speech recognition itself is a known technology, and the pattern data of texts is continuously improved through deep learning, a learning algorithm over the per-text waveforms of standard human speech.
The participant information DB 120 stores the reference volume data of participants. More specifically, the reference volume data is obtained by collecting and standardizing the vocal characteristics of participants who have attended meetings or may attend in the future. In this embodiment, the participant's vocal data is collected by a separate volume collection means (not shown) that has the participant read out designated text and receives the acoustic signal of that text. The volume collection means analyzes the received acoustic signal to determine information such as waveform, frequency, amplitude, decibel level, and wavelength, converts the collected information into data, and generates the reference volume data of the participant. The conference system according to the present invention thus keeps the reference volume data of registered members and uses it to generate meeting records.
The voice recognition module 130 converts a received acoustic signal into pattern data and searches the voice model DB 110 for the text corresponding to the pattern data to generate text information. More specifically, a microphone 101 is installed in the conference room at a position where it can efficiently receive the participants' acoustic signals, typically on the table T that the participants face while seated in chairs C. From the many acoustic signals received by the microphone 101, the voice recognition module 130 performs primary filtering: it extracts the acoustic signals that maintain designated pattern waveforms and filters out the others. Here, the designated pattern waveforms are the waveforms of the text pattern data stored in the voice model DB 110. However, comparing the numerous acoustic signals received in real time one by one against the numerous pattern data entries stored in the voice model DB 110 can burden the processor. Therefore, to make the primary filtering more efficient, texts that participants frequently repeat while speaking may be designated as standard texts, and the waveforms of these standard texts may be designated as the pattern waveforms. The acoustic signals extracted by the primary filtering are those of the texts spoken by the individual participants, and the voice recognition module 130 removes all acoustic signals corresponding to noise.
The voice recognition module 130 then performs secondary filtering, keeping only the acoustic signal in the relatively high-frequency range among the primary-filtered signals. The signals extracted by the primary filtering may include multiple acoustic signals generated when two or more participants speak at the same time. Since a participant speaks closest to the microphone 101 assigned to him or her, that participant's acoustic signal is received loudest among the signals. The voice recognition module 130 therefore extracts only the strongest signal, i.e., the signal in the relatively high-frequency range, and removes all the others. As a result, the voice recognition module 130 can extract only the relevant participant's acoustic signal even when many participants speak simultaneously amid considerable noise. For reference, a microphone 101 may be assigned to each participant as described above, or two or more participants may share a single microphone 101. In both cases, one microphone 101 simultaneously receives various noises and the acoustic signals of multiple participants, and a participant speaks close to the microphone 101 when taking the floor.
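As an illustrative sketch only (the patent does not prescribe an implementation), the two-stage filtering described above can be expressed as follows in Python. The frame layout, the correlation threshold, and all function names are assumptions introduced here for illustration; the pattern templates stand in for the text pattern data of the voice model DB 110.

```python
import numpy as np

def primary_filter(frames, pattern_templates, threshold=0.6):
    """First filtering: keep frames whose waveform tracks a designated
    pattern waveform; drop everything else as noise.

    frames            -- list of 1-D np.ndarray audio frames from microphone 101
    pattern_templates -- list of 1-D np.ndarray pattern waveforms (voice model DB)
    threshold         -- minimum normalized correlation to count as speech
    """
    kept = []
    for frame in frames:
        f = (frame - frame.mean()) / (frame.std() + 1e-9)
        for template in pattern_templates:
            t = (template - template.mean()) / (template.std() + 1e-9)
            n = min(len(f), len(t))
            if float(np.dot(f[:n], t[:n]) / n) >= threshold:
                kept.append(frame)   # frame follows a pattern waveform
                break
    return kept

def secondary_filter(frames):
    """Second filtering: of the surviving frames, keep only the strongest
    one -- the participant speaking closest to the microphone."""
    if not frames:
        return None
    return max(frames, key=lambda f: float(np.sqrt(np.mean(f ** 2))))
```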
The voice recognition module 130 then analyzes the acoustic signal extracted by the secondary filtering and generates pattern data for its waveform. The voice recognition module 130 searches the voice model DB 110, identifies the text of the stored pattern data corresponding to the generated pattern data, and generates text information. The voice recognition module 130 applies Speech-to-Text (STT) deep learning technology based on natural language processing (NLP).
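The lookup of generated pattern data against the stored pattern data can be sketched as a nearest-template search. This is a deliberate simplification of the NLP-based STT deep learning the module actually applies, and the DB layout and the `recognize` name are hypothetical.

```python
import numpy as np

def recognize(pattern, voice_model_db):
    """Return the text whose stored pattern data is closest to `pattern`.

    voice_model_db -- dict mapping text -> reference pattern (1-D np.ndarray),
                      a stand-in for the voice model DB 110
    """
    best_text, best_dist = None, float("inf")
    for text, reference in voice_model_db.items():
        n = min(len(pattern), len(reference))
        dist = float(np.linalg.norm(pattern[:n] - reference[:n]))
        if dist < best_dist:
            best_text, best_dist = text, dist
    return best_text
```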
The volume recognition module 140 analyzes the volume of the acoustic signal, converts it into collected volume data, and determines the emotion category information of the collected volume data. More specifically, since an acoustic signal is an analog wave as described above, it carries information such as the wave's waveform, frequency, amplitude, decibel level, and wavelength. The volume recognition module 140 therefore checks the frequency of the acoustic signal used for filtering by the voice recognition module 130, digitizes information such as the waveform, frequency, amplitude, decibel level, and wavelength of the secondary-filtered acoustic signal, and generates the collected volume data.
Next, the volume recognition module 140 compares the reference volume data of the participant identified by the participant recognition module 150 with the collected volume data and determines the emotion category information corresponding to the collected volume data. Volume is measured as an indicator of a participant's initiative and confidence in the meeting, and is reflected as a weight in the document-term matrix described below. The emotion category information classifies the participant's reference volume data according to designated ratios: if the collected volume data falls within the normal-mode ratio range relative to the reference volume data, the participant was in an ordinary emotional state when speaking; if it falls within the negative-mode ratio range, the participant was in a negative or agitated state; if it falls within the emotional-mode ratio range, the participant was in a positive or interested state; and if it falls within the lethargic-mode ratio range, the participant had low concentration or was in a listless state.
In the present embodiment, the volume data comprises one or more selected from the decibel level, frequency, and amplitude of the acoustic signal, and the ratio for each mode may be set separately for decibel level, frequency, and amplitude. For reference, if the decibel level, frequency, and amplitude of the collected volume data exceed those of the reference volume data by at least the designated ratio, the participant's emotion can be judged to be in the negative mode; if they fall below the reference values by more than the designated ratio, the participant's emotion can be judged to be in the lethargic mode.
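A minimal sketch, under stated assumptions, of how the collected volume data could be digitized and classified against the per-mode ratio ranges. The 0.8/1.2 boundaries are hypothetical placeholders for the designated ratios, which the following paragraphs describe as learned values.

```python
import numpy as np

def volume_features(signal, sample_rate):
    """Digitize an acoustic signal into collected volume data."""
    amplitude = float(np.max(np.abs(signal)))
    rms = float(np.sqrt(np.mean(signal ** 2)))
    decibel = 20.0 * np.log10(rms + 1e-12)              # level in dB
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    frequency = float(freqs[int(np.argmax(spectrum))])  # dominant frequency
    return {"decibel": decibel, "frequency": frequency, "amplitude": amplitude}

def emotion_mode(collected, reference, low=0.8, high=1.2):
    """Classify the emotion category from the collected/reference ratio.
    `low` and `high` stand in for the designated per-mode ratio ranges."""
    ratio = float(np.mean([collected[k] / (reference[k] + 1e-12)
                           for k in ("frequency", "amplitude")]))
    if ratio > high:
        return "negative"    # louder/higher than usual: negative or agitated
    if ratio < low:
        return "lethargic"   # quieter/lower than usual: low concentration
    return "normal"          # within the ordinary range
```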
In addition to one or more of the decibel level, frequency, and amplitude of the acoustic signal, the volume recognition module 140 of this embodiment may determine the emotion category information from the received length of a specific participant's acoustic signal per unit time, that is, from the participant's amount and speed of speech. For example, even if the emotion category has first been determined from decibel level, frequency, and amplitude alone, the participant's interest in the field of the spoken text is judged to be high if the received length of that participant's acoustic signal exceeds a certain amount, and low if it falls below that amount.
Consequently, the volume recognition module 140 may further include a function of checking the acoustic-signal length of the collected volume data, obtained by digitizing the acoustic signal, in order to determine the participant's level of interest.
The per-mode ratios of the emotion categories described above may be standardized across participants through repeated deep-learning training. Alternatively, the per-mode ratios may be individualized by learning, through deep learning, how each participant's volume data changes with his or her emotional state. The per-mode ratios may also be set by separately learning the rate at which one or more of the decibel level, frequency, and amplitude of the volume data changes with the emotional state.
The participant recognition module 150 searches the participant information DB 120 for reference volume data corresponding to the collected volume data and identifies the participant. More specifically, it compares the collected volume data generated by the volume recognition module 140 with the reference volume data in the participant information DB 120 and identifies the participant whose reference volume data matches within an error range. In general, participants speak with their everyday voice at the beginning of a meeting, so the participant recognition module 150 can identify each participant from the collected volume data. In special cases, however, a participant may start speaking from the outset with a voice noticeably stronger or softer than usual. Therefore, if the participant recognition module 150 fails to find reference volume data corresponding to the collected volume data in the participant information DB 120, it converts the collected volume data using the per-mode ratios of the emotion categories, searches again for matching reference volume data, and identifies the participant of the retrieved reference volume data as the speaker of the collected volume data.
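A sketch of the identification logic described above, assuming the collected and reference volume data are already reduced to feature dictionaries; the tolerance and the fallback mode ratios are hypothetical values.

```python
def identify_participant(collected, participant_db, tolerance=0.15,
                         mode_ratios=(1.2, 0.8)):
    """Return the participant whose reference volume data matches `collected`.

    participant_db -- dict mapping participant -> reference feature dict
    tolerance      -- allowed relative error per feature
    mode_ratios    -- hypothetical negative/lethargic scaling factors used as
                      a fallback when no direct match exists
    """
    keys = ("frequency", "amplitude")

    def matches(col, ref):
        return all(abs(col[k] - ref[k]) / (abs(ref[k]) + 1e-12) <= tolerance
                   for k in keys)

    for name, ref in participant_db.items():        # ordinary-voice match
        if matches(collected, ref):
            return name
    for scale in mode_ratios:                        # rescale by mode ratio
        rescaled = {k: collected[k] / scale for k in keys}
        for name, ref in participant_db.items():
            if matches(rescaled, ref):
                return name
    return None
```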
Meanwhile, when a microphone 101 is assigned to each participant and the voice recognition module 130 links the identification code of the receiving microphone 101 to each received and filtered acoustic signal, the participant recognition module 150 identifies subsequently received acoustic signals by their identification codes and recognizes them as the acoustic signals of the previously identified participant.
However, when multiple participants share one microphone 101, a plurality of sound wave recognition sensors (not shown) are arranged around the circumference of the microphone 101; upon receiving an acoustic signal, the voice recognition module 130 determines which sound wave recognition sensor detected the highest-frequency sound wave and links that sensor's identification code to the acoustic signal. The participant recognition module 150 then identifies subsequently received acoustic signals by their identification codes and recognizes them as the acoustic signals of the previously identified participant.
The speech information recognition module 160 classifies the text information and the emotion category information by participant to generate speech information. More specifically, it gathers the text information from the voice recognition module 130, the emotion category information from the volume recognition module 140, and the participant identity from the participant recognition module 150 in chronological order from the start of the meeting to its end, and finally produces the speech information as a single set. The speech information recognition module 160 passes the speech information to the printing module 180, and the printing module 180 prints the received speech information as text so that the participants can perceive it visually. The printed speech information is output (see FIG. 5) or stored through the display 300, a printer (not shown), or a storage medium (not shown). The speech information recognition module 160 may also pass each set of text information and emotion category information, classified by participant, to the printing module 180 in real time so that the phrase for each set is output through the display 300.
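The per-utterance set can be pictured as a simple record type; the following is a sketch of one possible data structure for the chronological speech information, with all names hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SpeechRecord:
    """One set: text information, emotion category, and participant identity."""
    timestamp: float    # seconds from the start of the meeting
    participant: str    # from the participant recognition module 150
    text: str           # from the voice recognition module 130
    emotion: str        # from the volume recognition module 140

@dataclass
class Minutes:
    records: List[SpeechRecord] = field(default_factory=list)

    def add(self, record: SpeechRecord) -> None:
        self.records.append(record)
        self.records.sort(key=lambda r: r.timestamp)   # chronological order

    def render(self) -> str:
        """One printable line per set, as passed to the printing module 180."""
        return "\n".join(
            f"[{r.timestamp:7.1f}s] {r.participant} ({r.emotion}): {r.text}"
            for r in self.records)
```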
The speech information recognition module 160 also analyzes the text information to identify key keywords and their fields, checks the participant's emotion category information for that text information, and thereby determines the participant's level of interest in those keywords and fields. Here, the participant's amount and speed of speech can be read from the collected volume data and applied when assessing interest. This technique is not limited to detecting topics of interest from text alone; it also uncovers a participant's specific topics from unstructured signals such as speaking state, amount of speech, and speaking speed around the participant's main interests, so that the results can be put to practical use. This text analysis extends the Document Term Matrix technique with a new kind of weighting, namely speech emotion, amount of speech, and speaking speed; these weights are not limited to the acoustic signals generated by a participant's speech and can also be measured, collected, and used via the various IoT sensors installed in the conference room.
More specifically, the speech information recognition module 160 receives at least a certain amount of text information from the voice recognition module 130 and analyzes it with techniques such as text mining to extract key keywords. From the extracted keywords, the speech information recognition module 160 determines the field of the text information. It also checks the participant's emotion category information for the text information to assess the level of interest in it. The level of interest is determined by the emotional state at the time the text was spoken. As an example, the digitized utterances of the meeting participants are processed with natural language processing, and the frequency of each spoken term is counted in a document term matrix, as in [Table 1].
[Table 1]

Participant   | Computer | Power | Clock | Sales | Cost
------------- | -------- | ----- | ----- | ----- | ----
Participant A |    10    |   5   |   4   |   7   |   4
Participant B |     2    |   1   |   5   |   8   |   9
Participant C |     5    |   6   |   5   |   4   |   5
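The document-term matrix of [Table 1] can be reproduced directly from per-participant term lists; a minimal sketch, assuming the utterances have already been tokenized by the NLP stage.

```python
from collections import Counter

def document_term_matrix(transcripts):
    """transcripts -- dict mapping participant -> list of spoken terms.
    Returns (vocabulary, {participant: counts aligned to vocabulary})."""
    vocabulary = sorted({t for terms in transcripts.values() for t in terms})
    matrix = {}
    for participant, terms in transcripts.items():
        counts = Counter(terms)
        matrix[participant] = [counts[t] for t in vocabulary]
    return vocabulary, matrix

# Reproducing the first row of [Table 1]:
vocab, dtm = document_term_matrix({
    "Participant A": ["computer"] * 10 + ["power"] * 5 + ["clock"] * 4
                     + ["sales"] * 7 + ["cost"] * 4,
})
```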
The speech information recognition module 160 looks up the terms counted as in [Table 1] in a sentiment dictionary and determines the participant's emotional state on that basis. That is, the more terms that belong to the emotional category, the more strongly the speech information recognition module 160 determines the participant's emotional state to be 'emotional'.
Furthermore, when a term in the same emotion category as the participant's current emotional state is detected, the speech information recognition module 160 weights the counting of terms in that category and adjusts the term counts, which in turn weights the participant's emotional state. Therefore, even if terms of another emotion category temporarily increase, the speech information recognition module 160 does not treat the participant's emotional state as if it had changed abruptly.
Meanwhile, if the participant was in the normal-mode emotional state when speaking the text information, it can be judged that the participant's interest in the field of that text is not high. If the participant was in the negative mode, it can be judged that the participant is highly interested in the field but holds an opposing position. If the participant was in the emotional mode, it can be judged that the participant is highly interested and holds a positive position. And if the participant was in the lethargic mode, it can be judged that the participant has low interest in the field and is also indifferent to the overall subject of the meeting.
In addition, as described above, the participant's interest in a field can be judged by referring to the acoustic-signal length of the collected volume data. The longer the acoustic-signal length corresponding to the amount and speed of speech, the more precisely the speech information recognition module 160 can text-mine the collected volume data and the more specifically it can analyze topics in the relevant field.
The speech information recognition module 160 computes the speaking speed from each participant's amount of speech and speaking time, and assigns a weight to each participant's amount of speech. As a result, the term counts of each participant's speech increase or decrease by the assigned weight.
The speech information recognition module 160 also weights the counted number of each participant's spoken terms according to the volume data, adjusting the participant's per-term speech counts.
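The weighting of term counts by emotional state and speaking speed might look as follows; the boost and gain factors are hypothetical stand-ins for the learned weights.

```python
def weighted_counts(raw_counts, term_emotions, participant_state,
                    speech_rate, baseline_rate=1.0,
                    emotion_boost=1.5, rate_gain=0.2):
    """Adjust one participant's term counts by emotion and speaking speed.

    raw_counts        -- dict term -> count from the document-term matrix
    term_emotions     -- dict term -> emotion category (sentiment dictionary)
    participant_state -- the participant's current emotion category
    speech_rate       -- terms per second for this participant
    """
    rate_weight = 1.0 + rate_gain * (speech_rate - baseline_rate)
    adjusted = {}
    for term, count in raw_counts.items():
        weight = rate_weight
        if term_emotions.get(term) == participant_state:
            weight *= emotion_boost  # same category as current state counts more
        adjusted[term] = count * weight
    return adjusted
```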
The speech information recognition module 160 analyzes the combination of the key keywords, the fields, and the participant's level of interest through deep learning to infer related topics in the participant's fields of interest, and searches for the participant's next speech field. More specifically, the speech information recognition module 160 analyzes, via deep learning, not only the combination of key keywords, fields, and interest from the text the participant spoke during the meeting, but also the results collected from previous meetings, to grasp which fields the participant is interested in and how the content of his or her remarks is changing; from what has been learned, it predicts the keywords, fields, and subjects the participant will speak about in the future. The speech information recognition module 160 also searches various big data on the predicted items through the search device 200 and recommends the results to the participant. The recommendation of big data by the speech information recognition module 160 can be made during a meeting, or in everyday life outside meetings through communication media such as the participant's e-mail or text messages.
Through the weighting described above, the speech information recognition module 160 can also determine a clear emotional state and level of interest for each participant. More specifically, the speech information recognition module 160 discovers topics of similar content based on the document-term-matrix data described above (topic modeling). Topic modeling techniques include Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Indexing (PLSI), and Latent Dirichlet Allocation (LDA); the speech information recognition module 160 of this embodiment infers a participant's topics with the LDA algorithm. For example, even if documents 1 and 2 deal with similar subjects, the kinds of words appearing in each document and their frequencies of use may differ. A simple keyword model alone is therefore limited in computing the similarity between documents or classifying their subjects, and cannot recognize that documents 1 and 2 share the same or a similar subject.
To search for the topics corresponding to a participant's spoken text, the speech information recognition module 160 finds α and β, the per-text base values (vector values), computes Θ for each document, and uses Θ to calculate per-text similarities and to classify texts. That is, the speech information recognition module 160 sets a vector value per text based on a word embedding technique and computes the similarity between two words. For this, techniques such as Word2Vec, GloVe, and FastText are applied. The speech information recognition module 160 thus discovers the topic corresponding to a participant's remarks and retrieves and presents related literature on that topic. For reference, since a document may belong to various fields rather than to the topic of a single field, even the same document may be presented as topic-related literature for the field corresponding to the participant's document-term-matrix data.
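As an illustration of this step, scikit-learn's LatentDirichletAllocation can be fitted to the document-term matrix to obtain the per-document topic mixture Θ, and embedding vectors (e.g., from Word2Vec, GloVe, or FastText) can be compared by cosine similarity; this is a sketch under those assumptions, not the patent's exact pipeline.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# One row per document (or participant) of the document-term matrix.
dtm = np.array([[10, 5, 4, 7, 4],
                [ 2, 1, 5, 8, 9],
                [ 5, 6, 5, 4, 5]])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(dtm)   # per-document topic mixture (the Θ above)
beta = lda.components_           # per-topic term weights (related to β)

def cosine(u, v):
    """Similarity of two vectors, e.g., word embeddings or topic mixtures."""
    return float(np.dot(u, v) /
                 (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Documents with similar topic mixtures are treated as the same or a similar
# subject even when their surface keywords differ:
print(cosine(theta[0], theta[2]))
```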
The data processing module 170 relays data communication among the voice recognition module 130, the volume recognition module 140, the participant recognition module 150, the speech information recognition module 160, and the printing module 180, and controls the operation of these modules 130, 140, 150, 160, and 180 according to a set process. The data processing module 170 corresponds to a general central processing unit (CPU).
FIG. 4 is a flowchart sequentially showing a conference support method operating on the basis of the conference system according to the present invention, and FIG. 5 is a diagram showing an example of meeting minutes produced by the conference system according to the present invention.
The following is described with reference to FIGS. 1 to 5.
S10; Acoustic signal reception step
The microphone 101 receives the acoustic signals generated by the remarks of the meeting participants. In this embodiment, the microphones 101 are installed on the table T of the conference room so as to be allocated to the individual participants.
S20; Acoustic signal filtering step
The voice recognition module 130 checks the waveforms of the multiple acoustic signals received by the microphone 101 and searches the voice model DB 110 for pattern waveforms corresponding to the checked waveforms. When an acoustic signal whose waveform corresponds to a pattern waveform is found, that signal is extracted and the other acoustic signals are filtered out.
When this filtering is complete, the voice recognition module 130 extracts only the strongest of the acoustic signals, i.e., the signal in the relatively high-frequency range, and removes all the others.
S30; Text information generation step
The voice recognition module 130 analyzes the acoustic signal finally extracted in the acoustic signal filtering step S20 and generates pattern data for its waveform. The voice recognition module 130 then searches the voice model DB 110, identifies the text of the stored pattern data corresponding to the generated pattern data, and thereby determines the text for each acoustic signal. The voice recognition module 130 combines the identified texts to finally generate text information.
S40; Collected volume data conversion step
The volume recognition module 140 analyzes the acoustic signal finally extracted in the acoustic signal filtering step S20 and converts it into collected volume data.
Since the acoustic signal carries information such as the wave's waveform, frequency, amplitude, decibel level, and wavelength, the volume recognition module 140 combines this information to generate the collected volume data.
S50; Participant recognition step
The participant recognition module 150 searches the participant information DB 120 for reference volume data corresponding to the collected volume data and recognizes the participant who spoke the acoustic signal.
Since the reference volume data has already been described, its description is omitted here.
S60; Emotion category information confirmation step
The volume recognition module 140 compares the reference volume data of the participant identified in the participant recognition step S50 with the collected volume data and determines the emotion category information corresponding to the collected volume data.
The emotion category information classifies the participant's reference volume data according to designated ratios: if the collected volume data falls within the normal-mode ratio range relative to the reference volume data, the participant was in an ordinary emotional state when speaking; if it falls within the negative-mode ratio range, the participant was in a negative or agitated state; if it falls within the emotional-mode ratio range, the participant was in a positive or interested state; and if it falls within the lethargic-mode ratio range, the participant had low concentration or was in a listless state.
S70; Speech information generation step
The speech information recognition module 160 classifies the text information and the emotion category information by participant to generate the speech information. More specifically, it groups the text information from the voice recognition module 130, the emotion category information from the volume recognition module 140, and the participant identity from the participant recognition module 150 into a single set, gathers the sets in chronological order from the start of the meeting to its end, and finally generates the speech information. The speech information recognition module 160 passes the speech information to the printing module 180, and the printing module 180 prints the received speech information as text so that the participants can perceive it visually. The printed speech information is output or stored as shown in FIG. 5 through the display 300, a printer (not shown), or a storage medium (not shown). The speech information recognition module 160 may also pass each set of text information and emotion category information, classified by participant, to the printing module 180 in real time so that the phrase for each set is output through the display 300.
FIG. 6 is a block diagram showing the constituent units of the conference room management device of the conference system according to the present invention.
Referring to FIGS. 2 and 6, the conference room management device 600 includes: a camera 620 that films the conference room; an image analysis module 630 that compares and analyzes the footage from the camera 620 to determine whether the conference room has changed; smart glass 640 in the form of transparent light-emitting glass that renders RGB colors; an environmental condition checking module 650 that senses the indoor environment of the conference room; a chair sensor 660 installed in a chair C of the conference room to recognize the posture of a seated participant; a vibrator 670 that applies vibration to the chair C; and a schedule management module 680 that manages the meeting schedule and the participants' schedules.
Describing each component in more detail, the image analysis module 630 stores the footage shot by the camera 620 as a video file. It also compares the footage in chronological order to identify the participants' behavior patterns and other states, and checks the various items placed on the table T to determine whether any items remain after the meeting ends.
The smart glass 640 forms the wall surfaces that partition the conference room; it turns opaque when the meeting begins and turns transparent again when the meeting ends. For reference, if the image analysis module 630 confirms that items remain after the meeting, the smart glass 640 stays opaque, thereby alerting the participants so that the meeting is brought to a proper close.
The environmental condition checking module 650 senses environmental conditions of the conference room such as air quality, illuminance, temperature and humidity, and fine dust. It thus makes it possible to control devices such as a ventilation fan, an air purifier, and a humidifier to improve the environment.
The chair sensor 660 is installed in the chair C to check whether a participant is seated, and scans the participant's posture so that the participant's current state can be estimated. For example, a pressure gauge installed in the seating surface of the chair C determines whether the participant's seating is unevenly weighted. If the chair sensor 660 detects an uneven weighting, the participant is not in a correct posture, so the system can recognize this and issue a warning.
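The uneven-seating check can be sketched as a spread test over the pressure-gauge readings; the sensor layout and tolerance are assumptions.

```python
def seating_is_biased(pressures, max_spread=0.25):
    """pressures -- readings from gauges in the seating surface (e.g., 4 corners).
    Returns True when the load distribution indicates a poor posture."""
    total = sum(pressures)
    if total <= 0:
        return False                 # nobody seated, nothing to warn about
    shares = [p / total for p in pressures]
    return max(shares) - min(shares) > max_spread
```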
The vibrator 670 is installed in the chair C and applies vibration according to the participant's state, helping a seated participant shake off lethargy and concentrate on the meeting.
The schedule management module 680 manages the list of participants who will use the conference room, the times, and other meeting schedules, and can output them through the display 300.
FIG. 7 is a block diagram showing another embodiment of the conference system according to the present invention.
The conference system 10 according to the present invention further includes: an interest information storage device 700 that stores member information and, for each member, interest information on fields of interest and levels of interest; and the search device 200, which searches the interest information storage device for the information of members with a high level of interest in the participant's field of interest as determined by the speech information recognition module 160, and outputs it through the display 300.
The interest information storage device 700 stores the personal information of currently registered members as member information, together with each member's fields of interest and levels of interest confirmed in the course of meetings. It can thus store and manage what each member is interested in and how strong that interest is.
As described above, the speech information recognition module 160 can determine a participant's fields of interest and levels of interest from the participant's speech information, so the search device 200 checks these and searches for and recommends other members interested in the same field. The participant can then meet the recommended members and hold a meeting, through which more advanced results in the field can be developed.
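The member recommendation can be sketched as a filtered, ranked lookup over the interest information; the data layout and threshold are illustrative assumptions.

```python
def recommend_members(field, min_interest, interest_db, exclude=()):
    """interest_db -- dict member -> {field_of_interest: interest_level}.
    Returns members interested in `field` at or above `min_interest`,
    strongest interest first."""
    hits = [(member, interests.get(field, 0))
            for member, interests in interest_db.items()
            if member not in exclude
            and interests.get(field, 0) >= min_interest]
    return [member for member, _ in sorted(hits, key=lambda h: -h[1])]
```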
Although the detailed description above refers to preferred embodiments of the present invention, those skilled in the art, or those having ordinary knowledge in the relevant technical field, will understand that the present invention can be variously modified and altered without departing from the spirit and technical scope of the present invention set forth in the claims below.

Claims (9)

  1. A conference system for robotic processing automation, characterized by comprising a speech analysis device provided with:
    a voice model DB that stores pattern data of texts;
    a participant information DB that stores reference volume data of participants;
    a voice recognition module that converts a received acoustic signal into pattern data and searches the voice model DB for the text corresponding to the pattern data to generate text information;
    a volume recognition module that analyzes the volume of the acoustic signal, converts it into collected volume data, and determines emotion category information of the collected volume data;
    a participant recognition module that searches the participant information DB for reference volume data corresponding to the collected volume data to identify a participant; and
    a speech information recognition module that classifies the text information and the emotion category information by participant to generate speech information.
  2. The conference system for robotic processing automation according to claim 1, wherein the voice recognition module performs primary filtering so that acoustic signals maintaining designated pattern waveforms are detected among a plurality of received acoustic signals, and performs secondary filtering so that the acoustic signal in the relatively high-frequency range is detected among the primary-filtered acoustic signals and converted into pattern data.
  3. The conference system for robotic processing automation according to claim 2, wherein:
    the emotion category information classifies the reference volume data of the corresponding participant according to designated ratios; and
    the volume recognition module computes the secondary-filtered acoustic signal into collected volume data and checks it against the reference volume data of the corresponding participant.
  4. The conference system for robotic processing automation according to claim 3, wherein the reference volume data comprises one or more selected from the decibel level, frequency, and amplitude of the acoustic signal.
  5. The conference system for robotic processing automation according to claim 1, wherein the speech information recognition module analyzes the text information to identify key keywords and fields, and checks the emotion category information of the corresponding participant for the text information to determine the participant's level of interest in the key keywords and fields.
  6. The conference system for robotic processing automation according to claim 5, wherein the speech information recognition module analyzes the combination of the key keywords, the fields, and the participant's level of interest through deep learning to infer the participant's fields of interest, and searches the participant's speech fields.
  7. The conference system for robotic processing automation according to claim 6, further comprising:
    an interest information storage device that stores member information and interest information on each member's fields of interest and levels of interest; and
    a search device that searches the interest information storage device for the information of members whose level of interest in the participant's field of interest, as determined by the speech information recognition module, is relatively high, and outputs it through a display.
  8. The conference system for robotic processing automation according to claim 1, wherein the speech information recognition module checks the emotion category information to which the texts of the text information belong, and counts the texts by emotion category to determine the emotional state of the corresponding participant.
  9. The conference system for robotic processing automation according to claim 8, wherein the speech information recognition module weights the counted number of texts spoken by the participant according to one or more selected from the participant's emotional state, the per-participant speaking speed determined by the volume recognition module, and the collected volume data.
PCT/KR2019/005694 2019-04-25 2019-05-13 Smart conference system based on 5g communication and conference support method using robotic processing automation WO2020218664A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2019-0048308 2019-04-25
KR1020190048308A KR102061291B1 (en) 2019-04-25 2019-04-25 Smart conferencing system based on 5g communication and conference surpporting method by robotic and automatic processing

Publications (1)

Publication Number Publication Date
WO2020218664A1 true WO2020218664A1 (en) 2020-10-29

Family

ID=69051517

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/005694 WO2020218664A1 (en) 2019-04-25 2019-05-13 Smart conference system based on 5g communication and conference support method using robotic processing automation

Country Status (2)

Country Link
KR (1) KR102061291B1 (en)
WO (1) WO2020218664A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102193654B1 (en) * 2020-01-20 2020-12-21 권경애 Service providing system and method for record reflecting consulting situation
KR102293903B1 (en) * 2020-10-26 2021-08-24 박영규 Method of providing extended information of books using printed optical codes and computer program therefor

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100406307B1 (en) * 2001-08-09 2003-11-19 삼성전자주식회사 Voice recognition method and system based on voice registration method and system
KR20030038069A (en) * 2001-11-08 2003-05-16 엘지전자 주식회사 noise elimination system of the telephone handset and controlling method therefore
JP2005277462A (en) * 2004-03-22 2005-10-06 Fujitsu Ltd Conference support system, proceeding forming method, and computer program
KR20060094343A (en) * 2005-02-24 2006-08-29 에스케이 텔레콤주식회사 Service system and method of emotion expressing using animation for video call and mobile communication terminal therefor
KR101818980B1 (en) * 2016-12-12 2018-01-16 주식회사 소리자바 Multi-speaker speech recognition correction system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112839195A (en) * 2020-12-30 2021-05-25 深圳市皓丽智能科技有限公司 Method and device for consulting meeting record, computer equipment and storage medium
CN112839195B (en) * 2020-12-30 2023-10-10 深圳市皓丽智能科技有限公司 Conference record consulting method and device, computer equipment and storage medium
CN113691382A (en) * 2021-08-25 2021-11-23 平安国际智慧城市科技股份有限公司 Conference recording method, conference recording device, computer equipment and medium
CN117174091A (en) * 2023-09-07 2023-12-05 河南声之美电子科技有限公司 Intelligent meeting record generation system and device based on role recognition

Also Published As

Publication number Publication date
KR102061291B1 (en) 2019-12-31

Similar Documents

Publication Publication Date Title
WO2020218664A1 (en) Smart conference system based on 5g communication and conference support method using robotic processing automation
CN108346034B (en) Intelligent conference management method and system
EP0495622A2 (en) Indexing of data sets
US5787414A (en) Data retrieval system using secondary information of primary data to be retrieved as retrieval key
US6687671B2 (en) Method and apparatus for automatic collection and summarization of meeting information
WO2018128238A1 (en) Virtual consultation system and method using display device
JP3895892B2 (en) Multimedia information collection management device and storage medium storing program
JP2019095552A (en) Voice analysis system, voice analysis device, and voice analysis program
CN111899740A (en) Voice recognition system crowdsourcing test case generation method based on test requirements
Reidsma et al. Exploiting''Subjective''Annotations
JP4469867B2 (en) Apparatus, method and program for managing communication status
CN114666454A (en) Intelligent conference system
WO2024090713A1 (en) User psychology management system through empathic psychology-based chatbot service
WO2020122291A1 (en) Apparatus and method for automating artificial intelligence-based apartment house management work instructions
Pesarin et al. Conversation analysis at work: detection of conflict in competitive discussions through semi-automatic turn-organization analysis
WO2020213785A1 (en) System for automatically generating text-based sentences on basis of deep learning to achieve improvement related to infinity of utterance patterns
CN116939150B (en) Multimedia platform monitoring system and method based on machine vision
JP3234083B2 (en) Search device
WO2023146030A1 (en) Device, method, and program for interaction based on artificial intelligence in which emotion, concentration degree, and conversation are integrated
Aarts et al. A real-time speech-music discriminator
WO2013147374A1 (en) Method for analyzing video streams using multi-channel analysis
Ronzhin et al. A software system for the audiovisual monitoring of an intelligent meeting room in support of scientific and education activities
US20230066829A1 (en) Server device, conference assistance system, and conference assistance method
JP3622711B2 (en) Video content viewer information providing system and method, viewer information providing apparatus, program, and program recording medium
JP2007199866A (en) Meeting recording system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19926532

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19926532

Country of ref document: EP

Kind code of ref document: A1