WO2020218664A1 - Smart conference system based on 5G communication and conference support method using robotic processing automation - Google Patents

Smart conference system based on 5G communication and conference support method using robotic processing automation

Info

Publication number
WO2020218664A1
Authority
WO
WIPO (PCT)
Prior art keywords
participant
information
recognition module
speech
volume data
Prior art date
Application number
PCT/KR2019/005694
Other languages
French (fr)
Korean (ko)
Inventor
이봉규
이원상
Original Assignee
이봉규
(주)넵스
이원상
(주)넵스홈
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 이봉규, (주)넵스, 이원상, (주)넵스홈
Publication of WO2020218664A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Definitions

  • the present invention relates to a 5G communication-based smart conference system and a conference support method through robotic processing automation, and more specifically, to a system and method that enable conference participants to conduct meetings conveniently and efficiently through 5G-based IoT technology, AI solution technology, automatic minutes creation technology, meeting situation analysis technology, and the like.
  • the data editing user interface allows the content of speech, keywords, and the like to be edited.
  • the conventional video conferencing system has no support functions other than simply recording the meeting on video in real time or providing microphones to the participants for amplification and recording, so the meeting contents had to be recorded one by one, and participants had the trouble and hassle of searching for information relevant to the meeting contents themselves and using it in the meeting.
  • the present invention was devised to solve the above problems; its object is to provide a 5G communication-based smart conference system and a conference support method through robotic processing automation that not only record the contents of speech by accurately identifying the participants in a meeting, but also grasp each participant's speech emotion, speech volume, and speech speed and reflect them in the record, and on this basis allow each participant's major interests in the meeting and materials to be used in subsequent meetings to be prepared in advance.
  • a voice model DB for storing pattern data of texts;
  • a participant information DB for storing reference volume data of participants;
  • a voice recognition module for converting a received sound signal into pattern data and generating text information by searching the voice model DB for the text corresponding to the pattern data;
  • a volume recognition module that analyzes the volume of the sound signal, converts it into collected volume data, and checks the emotion category information of the collected volume data;
  • a participant recognition module for identifying a participant by searching the participant information DB for reference volume data corresponding to the collected volume data; and
  • a speech information recognition module for generating speech information by classifying the text information and emotion category information by participant.
  • the present invention described above not only records the contents of speech by accurately identifying the participants in a meeting, but also grasps each participant's speech emotion, speech volume, and speech speed and reflects them in the record; on this basis, each participant's major interests and materials to be used in follow-up meetings can be prepared in advance, and in particular, specific topics can be discovered from unstructured text for the major interests.
  • FIG. 1 is a perspective view schematically showing the interior of a conference room to which a conference system according to the present invention is applied,
  • FIG. 2 is a block diagram showing an embodiment of a conference system according to the present invention
  • FIG. 3 is a block diagram showing a configuration unit of a voice analysis device configured in a conference system according to the present invention
  • FIG. 4 is a flowchart sequentially showing a conference support method operating based on a conference system according to the present invention
  • FIG. 5 is a diagram showing an example of minutes created by the conference system according to the present invention.
  • FIG. 6 is a block diagram showing a constituent unit of the conference room management apparatus configured in the conference system according to the present invention
  • FIG. 7 is a block diagram showing another embodiment of a conference system according to the present invention.
  • FIG. 1 is a perspective view schematically showing the interior of a conference room to which a conference system according to the present invention is applied
  • FIG. 2 is a block diagram showing an embodiment of a conference system according to the present invention.
  • referring to FIGS. 1 and 2, in the conference system 10 of the present embodiment, various equipment is installed so that a plurality of participants can gather in one place and discuss together.
  • even if one or more of the participants cannot attend in person, additional equipment may be installed to support face-to-face discussion with the other participants through video.
  • the conference system 10 of the present embodiment includes: a voice analysis device 100 that collects the contents of a participant's speech and records them as text; a search device 200 that searches, through the Internet or Ethernet (hereinafter 'communication network'), for additional information on the participants or the meeting contents corresponding to the speech information checked by the voice analysis device 100; a display 300 that outputs the speech information and search information; and a control device 400 that controls the voice analysis device 100, the search device 200, and the display 300 so that they exchange data and interwork.
  • the conference system 10 may further include one or more selected from a translation device 500, which translates the speech information or search information into text commonly used by the participants and outputs it through the display 300, and a conference room management device 600, which adjusts the condition of the conference room so that the participants' meeting environment is optimized.
  • of course, the control device 400 likewise controls the translation device 500 and the conference room management device 600 so that they exchange data and interwork with the voice analysis device 100, the search device 200, and the display 300.
  • the voice analysis device 100 utilizes Speech-to-Text (STT) technology, which converts speech data into text data.
  • in addition, by communicating with other servers 20 and personal terminals 30 and 40 through the communication network, it is possible to search for necessary data or to communicate with participants or related persons.
  • since the conference system according to the present invention communicates based on 5G (5th generation mobile communication), additional information on the meeting contents can be searched and provided in real time.
  • FIG. 3 is a block diagram showing a configuration unit of a voice analysis apparatus configured in a conference system according to the present invention.
  • the voice analysis device 100 of the present embodiment includes: a voice model DB 110 for storing pattern data of texts; a participant information DB 120 for storing reference volume data of participants; a voice recognition module 130 for converting a received sound signal into pattern data and generating text information by searching the voice model DB 110 for the text corresponding to the pattern data; a volume recognition module 140 that analyzes the volume of the sound signal, converts it into collected volume data, and checks the emotion category information of the collected volume data; a participant recognition module 150 for identifying a participant by searching the participant information DB 120 for reference volume data corresponding to the collected volume data; and a speech information recognition module 160 for generating speech information by classifying the text information and emotion category information by participant.
  • the voice model DB 110 stores text pattern data.
  • the pattern data is a basic waveform of an acoustic signal of a text, and is reference data such as an acoustic model and a text model for general speech recognition.
  • speech recognition is an already known technology, and the pattern data of texts is continuously developed through deep learning, an algorithm that learns the per-text waveforms of standard human voices.
  • the participant information DB 120 stores the reference volume data of participants.
  • the reference volume data is standardized by collecting the voice volume of participants who have a history of participating in meetings or who may participate in the future.
  • in the present embodiment, a participant's volume is collected by having a separate volume collection means (not shown) prompt the participant to speak a designated text and receiving the sound signal of that text.
  • the volume collection means analyzes the received sound signal to grasp information such as waveform, frequency, amplitude, decibel, and wavelength, and digitizes the collected information to generate the participant's reference volume data.
  • as a result, the conference system according to the present invention stores the reference volume data of subscribed members and utilizes it to generate conference records.
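  • a minimal sketch of how such reference volume data might be digitized from an enrollment recording is shown below, assuming a mono PCM signal held in a NumPy array; the feature set and function names are illustrative rather than taken from the patent:

```python
import numpy as np

def make_volume_data(signal: np.ndarray, sample_rate: int) -> dict:
    """Digitize the volume-related properties of one utterance:
    amplitude, decibel level, dominant frequency, and wavelength."""
    amplitude = float(np.max(np.abs(signal)))          # peak amplitude
    rms = float(np.sqrt(np.mean(signal ** 2)))         # root-mean-square level
    decibel = 20 * np.log10(rms + 1e-12)               # dBFS, guarded against log(0)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    dominant_freq = float(freqs[np.argmax(spectrum)])  # strongest frequency component
    speed_of_sound = 343.0                             # m/s, for an indicative wavelength
    wavelength = speed_of_sound / dominant_freq if dominant_freq > 0 else float("inf")
    return {"amplitude": amplitude, "decibel": decibel,
            "frequency": dominant_freq, "wavelength": wavelength}

def make_reference(utterances, sample_rate):
    """Enrollment: average several scripted utterances into a participant's
    reference volume data."""
    feats = [make_volume_data(u, sample_rate) for u in utterances]
    return {k: float(np.mean([f[k] for f in feats])) for k in feats[0]}
```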
  • the voice recognition module 130 converts the received sound signal into pattern data, and searches the text corresponding to the pattern data in the voice model DB 110 to generate text information.
  • the microphone 101 is installed at a position where it can efficiently receive the participant's sound signal, usually on the table T facing the participant seated in the chair C.
  • as primary filtering, the voice recognition module 130 extracts, from among the plurality of sound signals received by the microphone 101, the sound signals that maintain the designated pattern waveforms, and filters out the other sound signals.
  • the designated pattern waveform is a waveform of text pattern data stored in the voice model DB 110.
  • text that the participant mainly repeats during the speech process may be designated as standard text, and the waveform of the standard text may be designated as a pattern waveform.
  • the sound signals extracted through the primary filtering are sound signals of texts spoken by each participant, and the voice recognition module 130 removes all sound signals corresponding to miscellaneous sounds.
  • the voice recognition module 130 then performs secondary filtering, keeping only the sound signals in a relatively high frequency range among the primarily filtered sound signals.
  • the sound signals extracted through the primary filtering may include a plurality of sound signals generated by two or more participants speaking simultaneously. However, since each participant speaks closest to the microphone 101 assigned to him or her, that participant's sound signal is received as the loudest among the plurality of sound signals. Accordingly, the voice recognition module 130 extracts only the sound signal with the largest frequency, that is, in the relatively high frequency range, and removes all other sound signals. As a result, the voice recognition module 130 can extract only the speaking participant's sound signal even when a plurality of participants speak simultaneously amid many noises.
  • the microphone 101 may be assigned to each participant as described above, or two or more participants may share and use one microphone 101. However, in both cases, one microphone 101 simultaneously receives various noises and acoustic signals of multiple participants, and the participant speaks near the microphone 101 for his or her speech.
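  • the two-stage selection described above might be sketched as follows; here each candidate source is assumed to arrive as a separate NumPy array, and the pattern-waveform check is reduced to a simple correlation test, which only stands in for the voice model DB lookup the patent describes:

```python
import numpy as np

def matches_pattern(signal, pattern_waveforms, threshold=0.6):
    """Primary filter: keep a signal only if it correlates with some
    designated pattern waveform from the voice model DB."""
    for p in pattern_waveforms:
        n = min(len(signal), len(p))
        c = np.corrcoef(signal[:n], p[:n])[0, 1]
        if not np.isnan(c) and abs(c) >= threshold:
            return True
    return False

def select_speaker_signal(signals, pattern_waveforms):
    """Secondary filter: among the pattern-matching signals, keep only the
    strongest one, i.e. the participant speaking closest to the microphone."""
    candidates = [s for s in signals if matches_pattern(s, pattern_waveforms)]
    if not candidates:
        return None
    return max(candidates, key=lambda s: float(np.sqrt(np.mean(s ** 2))))
```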
  • the voice recognition module 130 analyzes the sound signal extracted through secondary filtering to generate pattern data for the waveform.
  • the speech recognition module 130 searches the speech model DB 110, checks the text of the pattern data corresponding to the pattern data, and generates text information.
  • to this end, the voice recognition module 130 applies STT (Speech-to-Text) deep learning technology based on natural language processing (NLP).
  • the volume recognition module 140 analyzes the volume of the sound signal and converts it into collection volume data, and checks the emotional category information of the collection volume data.
  • since the acoustic signal is an analog wave, it includes information such as the waveform, frequency, amplitude, decibel, and wavelength of the wave. The volume recognition module 140 therefore checks the frequency of the sound signal used for filtering by the voice recognition module 130, and digitizes information such as the waveform, frequency, amplitude, decibel, and wavelength of the secondarily filtered sound signal to generate collected volume data.
  • the volume recognition module 140 compares the reference volume data of the participant identified by the participant recognition module 150 with the collection volume data, and checks emotion category information corresponding to the collection volume data.
  • the voice volume is measured as an indicator of a participant's initiative and confidence in the meeting, and is reflected as a weight in the document-term matrix described below.
  • the emotion category information is classified according to specified ratios relative to the corresponding participant's reference volume data: if the ratio of the collected volume data to the reference volume data falls within the range of the general mode, it means that the participant was in an ordinary emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the negative mode, it means that the participant was in a negative or excited emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the emotion mode, it means that the participant was in a positive or interested emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the helpless mode, it means that the participant had low concentration or was in a listless emotional state at the time of the speech.
  • the volume data may be composed of at least one selected from the decibel, frequency, and amplitude of the acoustic signal,
  • and the ratio for each mode may be set separately for decibel, frequency, and amplitude.
  • for example, if the decibel, frequency, and amplitude of the collected volume data exceed the specified ratio relative to those of the reference volume data, it can be determined that the participant's current emotion is in the negative mode.
  • conversely, if the decibel, frequency, and amplitude of the collected volume data fall below the specified ratio relative to those of the reference volume data, it can be determined that the participant's current emotion is in the helpless mode.
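  • a minimal sketch of this ratio-based classification follows; the mode boundaries (0.8, 1.2, 1.5) are invented for illustration, since the patent does not publish the actual ratio ranges:

```python
def classify_emotion(collected: dict, reference: dict) -> str:
    """Compare collected volume data against the participant's reference data
    and map the average ratio onto an emotion mode. The 0.8/1.2/1.5 cut-offs
    are assumptions for illustration only."""
    keys = ("decibel", "frequency", "amplitude")
    ratio = sum(collected[k] / reference[k] for k in keys) / len(keys)
    if ratio >= 1.5:
        return "negative"   # much louder/higher than usual: negative or excited
    if ratio >= 1.2:
        return "emotion"    # moderately elevated: positive, interested
    if ratio <= 0.8:
        return "helpless"   # well below usual: low concentration, listless
    return "general"        # within the ordinary range
```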
  • in addition to one or more selected from the decibel, frequency, and amplitude of the sound signal, the volume recognition module 140 of the present embodiment may check the emotion category information according to the reception length of a specific participant's sound signal per unit time, that is, the participant's amount and speed of speech. For example, even if the emotion category was first checked only through the decibel, frequency, and amplitude described above, if the reception length of the same participant's sound signal exceeds a certain amount, the participant's interest in the field of the spoken text is judged to be high; if it is below a certain amount, the participant's interest is judged to be low.
  • to this end, the volume recognition module 140 may further include a function of checking the signal length of the collected volume data, obtained by digitizing the corresponding sound signal, in order to check the participant's degree of interest.
  • the ratio for each mode of the emotion category described above may be standardized, without distinction between participants, through repeated deep learning.
  • alternatively, the ratio for each mode of the emotion category described above may be individualized by applying a deep learning technique to learn how each participant's volume data changes according to his or her emotional state.
  • that is, the ratio for each mode of the emotion category described above may be set by learning the ratio at which at least one selected from the decibel, frequency, and amplitude of the volume data changes according to the emotional state.
  • the participant recognition module 150 identifies the participant by searching the participant information DB 120 for reference volume data corresponding to the collected volume data. In more detail, it compares the collected volume data generated by the volume recognition module 140 with the reference volume data in the participant information DB 120, and identifies the participant whose reference volume data matches within an error range. In general, at the beginning of a meeting participants speak at their usual, everyday volume, so the participant recognition module 150 can identify the participant from the collected volume data. In special cases, however, a participant may start speaking from the beginning of the meeting in a stronger or lower voice than usual.
  • in this case, if the participant recognition module 150 cannot find reference volume data corresponding to the collected volume data in the participant information DB 120, it converts the collected volume data by the ratio of each emotion category mode and searches for the reference volume data again; the participant of the reference volume data found in this way is identified as the speaker of the collected volume data.
  • as another identification method, a plurality of sound wave recognition sensors may be arranged along the perimeter of the microphone 101. When the voice recognition module 130 receives sound signals, the sound wave recognition sensor that recognizes the sound wave at the highest frequency is checked, and the identification code of that sensor is linked to the sound signal. Accordingly, the participant recognition module 150 identifies subsequently received sound signals according to the identification code and recognizes them as sound signals of the previously identified participant.
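  • a sketch of the nearest-match identification with the mode-ratio fallback, reusing the dictionary-shaped volume data from the earlier sketch; the tolerance and ratio values are illustrative assumptions:

```python
def identify_participant(collected: dict, participant_db: dict,
                         mode_ratios=(1.0, 1.2, 1.5, 0.8), tolerance=0.1):
    """Find the participant whose reference volume data matches the collected
    data within an error range; if no one matches (the speaker is unusually
    loud or quiet), rescale the collected data by each emotion-mode ratio and
    retry. mode_ratios mirrors the general/emotion/negative/helpless modes."""
    def distance(a, b):
        keys = ("decibel", "frequency", "amplitude")
        return sum(abs(a[k] - b[k]) / abs(b[k]) for k in keys) / len(keys)

    for ratio in mode_ratios:
        scaled = {k: v / ratio for k, v in collected.items()}
        best = min(participant_db.items(),
                   key=lambda item: distance(scaled, item[1]),
                   default=None)
        if best and distance(scaled, best[1]) <= tolerance:
            return best[0]
    return None
```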
  • the speech information recognition module 160 generates speech information by classifying the text information and the emotional category information for each participant.
  • in more detail, the text information from the voice recognition module 130, the emotion category information from the volume recognition module 140, and the participant identified by the participant recognition module 150 are classified into sets, collected in chronological order from the start of the meeting to its end, and finally generated as one body of speech information.
  • the speech information recognition module 160 transmits the speech information to the printing module 180, and the printing module 180 prints the received speech information as text so that the participant can visually recognize the speech information.
  • the printed speech information is output (refer to FIG. 5) or stored through the display 300, a printer (not shown), or a storage medium (not shown).
  • in addition, the speech information recognition module 160 may transmit the text information and emotion category information classified by participant to the printing module 180 in real time, so that each set-unit phrase is printed through the display 300.
  • meanwhile, the speech information recognition module 160 analyzes the text information to identify key keywords and fields, checks the participant's emotion category information for the text information, and grasps the participant's degree of interest in those key keywords and fields. At this time, the participant's amount and speed of speech can be checked from the collected volume data and applied to grasping the degree of interest.
  • this technology is not limited to detecting topics of interest from text alone; by discovering a participant's specific topics on the basis of unstructured text together with the participant's speech emotion, speech volume, and speech speed, it allows the participant's main interests to be used in his or her work.
  • this text analysis technology reflects new weighting factors, such as speech emotion, speech volume, and speech speed, in the document-term matrix technique; the weights are not limited to the acoustic signal generated by the participant's speech, but can also be measured and collected through various IoT sensors installed inside the conference room.
  • to explain this in more detail, when the speech information recognition module 160 receives more than a certain amount of text information from the voice recognition module 130, it analyzes the text information using text mining techniques and the like to extract the key keywords.
  • the speech information recognition module 160 determines the field of the text information through the key keywords extracted in this way.
  • in addition, the speech information recognition module 160 checks the participant's emotion category information for the text information, and thereby determines the degree of interest in the text information.
  • the degree of interest is determined according to the emotional state at the time the text information was spoken. For example, the digitized speech of the conference participants is processed through natural language processing, and the frequency of speech per text is counted in a document-term matrix such as [Table 1].
  • the speech information recognition module 160 looks up the texts counted as in [Table 1] in a sentiment word dictionary and determines the participant's emotional state on this basis; that is, as the number of texts belonging to a given emotion category increases, the speech information recognition module 160 determines the corresponding participant's emotional state to be that of the category.
  • in addition, when a text of the same emotion category as the participant's current emotional state is identified, the speech information recognition module 160 adjusts the counts by weighting the text counting of that emotion category, thereby giving weight to the participant's emotional state. Therefore, even if texts of other emotion categories temporarily increase, the speech information recognition module 160 does not recognize the participant's emotional state as having changed suddenly.
  • if the participant is in the ordinary emotional state of the general mode when speaking the text information, it may be determined that the participant's interest in the field of the text information is not high. If the participant is in the emotional state of the negative mode when speaking the text information, it may be determined that the participant has a high interest in the field of the text information but holds an opposing position. If the participant is in the emotional state of the emotion mode when speaking the text information, it may be determined that the participant has a high interest in and a positive position on the field of the text information. And if the participant is in the emotional state of the helpless mode when speaking the text information, it may be determined that the participant has a low interest in the field of the text information and is indifferent to the overall topic of the meeting.
  • the degree of interest of the participant in the field may be determined by referring to the length of the sound signal of the collection volume data.
  • that is, the longer the sound signal of the collected volume data, which corresponds to the amount and speed of speech, the more precisely the speech information recognition module 160 can text-mine the spoken content and the more detailed its topic analysis in the relevant field can be.
  • to this end, the speech information recognition module 160 checks the speech speed by calculating the amount and duration of speech for each participant, and assigns a weight to each participant's amount of speech; the number of speech texts counted for each participant then increases or decreases by the assigned weight.
  • likewise, the speech information recognition module 160 adjusts each participant's per-text speech counts by giving a weight to the counting of each participant's speech texts according to the volume data.
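  • a small sketch of a per-participant document-term matrix with such weighting; the weighting formula (volume ratio scaled by speech time) is an assumption for illustration, not the patent's actual formula:

```python
from collections import Counter

def weighted_dtm(speeches):
    """speeches: list of (participant, tokens, volume_ratio, speech_seconds).
    Counts term frequencies per participant, scaled by a weight built from
    the speaker's volume ratio and amount of speech (assumed formula)."""
    dtm = {}
    for participant, tokens, volume_ratio, seconds in speeches:
        weight = volume_ratio * (1.0 + seconds / 60.0)  # louder and longer counts more
        row = dtm.setdefault(participant, Counter())
        for tok in tokens:
            row[tok] += weight
    return dtm

dtm = weighted_dtm([
    ("Lee", ["budget", "schedule", "budget"], 1.3, 40),
    ("Kim", ["design", "schedule"],           0.9, 15),
])
print(dtm["Lee"]["budget"])  # 2 * 1.3 * (1 + 40/60), roughly 4.33
```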
  • meanwhile, the speech information recognition module 160 analyzes, through deep learning, the combined results of the key keywords, fields, and degree of interest to infer related topics corresponding to the participant's interests and to search for the participant's next speech field. To explain this in more detail, the speech information recognition module 160 analyzes through deep learning not only the combined results of the key keywords, fields, and degrees of interest of the text information spoken by the participant during the current meeting, but also the results collected during previous meetings. In this way it learns which fields the participant is interested in and how the contents of his or her speech are changing, and through the learned contents it predicts the keywords, fields, and topics the participant will speak about in the future. In addition, the speech information recognition module 160 searches various kinds of big data on the predicted items through the search device 200 and recommends the results to the participant. The recommendation of big data by the speech information recognition module 160 may be performed during a meeting, or in daily life outside of meetings through various communication media such as the participant's e-mail or text messages.
  • meanwhile, the speech information recognition module 160 can grasp a clearer emotional state and degree of interest for each participant through the weighting described above. To explain this in more detail, the speech information recognition module 160 discovers topics with similar contents based on the document-term matrix data described above (a topic modeling technique). Topic modeling techniques include Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Indexing (PLSI), and Latent Dirichlet Allocation (LDA). For example, although document 1 and document 2 may have similar subjects, the types and frequencies of the words appearing in each document may differ. A simple keyword model alone therefore has limitations in calculating the similarity between documents or classifying subjects, and cannot recognize that document 1 and document 2 deal with the same or a similar subject.
  • in order to search for the topic corresponding to a participant's speech text, the speech information recognition module 160 finds α and β, the basic values (vector values) for each text, calculates θ for each document, and uses θ to calculate and classify the similarity between documents. That is, the speech information recognition module 160 sets a vector value for each text based on a word embedding technique and calculates the similarity between two words. To this end, the speech information recognition module 160 employs techniques such as Word2Vec, GloVe, and FastText. Accordingly, the speech information recognition module 160 finds the topic corresponding to the participant's speech content, and searches for and presents related documents corresponding to the topic. For reference, since a document does not belong only to a topic in one field but may belong to various fields, even the same document may be presented as a topic-related document in a field corresponding to the participant's document-term matrix data.
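  • a brief sketch of the LDA step using the gensim library (assuming gensim 4.x and toy tokenized speeches): gensim's alpha/eta play the role of the α and β above, and get_document_topics returns θ, the per-document topic mixture that can then be compared across documents; word-level similarity would come from separately trained Word2Vec, GloVe, or FastText embeddings:

```python
from gensim import corpora
from gensim.models import LdaModel

docs = [["budget", "schedule", "cost"],        # toy tokenized speeches
        ["design", "prototype", "schedule"],
        ["cost", "budget", "funding"]]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# alpha (and eta) correspond to the Dirichlet priors alpha and beta above.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               alpha="auto", passes=20, random_state=0)

# theta: the topic mixture of each document, used for similarity comparison.
theta = [lda.get_document_topics(bow, minimum_probability=0.0) for bow in corpus]
print(theta[0])  # e.g. [(0, 0.9...), (1, 0.0...)]
```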
  • the data processing module 170 relays data communication among the voice recognition module 130, the volume recognition module 140, the participant recognition module 150, the speech information recognition module 160, and the printing module 180, and controls the operation of the modules 130, 140, 150, 160, 180 according to a set process.
  • the data processing module 170 corresponds to a general central processing unit (CPU).
  • FIG. 4 is a flowchart sequentially showing a conference support method operating based on the conference system according to the present invention
  • FIG. 5 is a diagram showing an example of minutes created by the conference system according to the present invention.
  • the microphone 101 receives an acoustic signal generated by the speech of a conference participant.
  • the microphone 101 is installed on the table T of the conference room so that it is distributed for each participant.
  • the voice recognition module 130 checks the waveforms of the plurality of sound signals received by the microphone 101, and searches the voice model DB 110 for a pattern waveform corresponding to the confirmed waveform. As a result of the search, when an acoustic signal of a waveform corresponding to the pattern waveform is checked, the corresponding acoustic signal is extracted and other acoustic signals are filtered and removed.
  • upon completion of the filtering, the voice recognition module 130 extracts only the sound signal with the largest frequency, that is, in the relatively high frequency range among the plurality of sound signals, and removes all other sound signals.
  • the voice recognition module 130 analyzes the sound signal finally extracted in the sound signal filtering step (S20) to generate pattern data of its waveform. The voice recognition module 130 then searches the voice model DB 110 to check the text corresponding to the pattern data, thereby checking the text for each sound signal, and combines the confirmed texts to finally generate the text information.
  • the volume recognition module 140 analyzes the sound signal finally extracted in the sound signal filtering step (S20) and converts it into collected volume data.
  • that is, the volume recognition module 140 digitizes information such as the waveform, frequency, amplitude, decibel, and wavelength of the sound signal, and combines this information to generate the collected volume data.
  • the participant recognition module 150 searches the participant information DB 120 for reference volume data corresponding to the collected volume data, and recognizes the participant who spoke the corresponding sound signal.
  • the volume recognition module 140 compares the reference volume data of the participant identified in the participant recognition step S50 with the collection volume data, and checks emotion category information corresponding to the collection volume data.
  • the emotion category information is classified according to specified ratios relative to the corresponding participant's reference volume data: if the ratio of the collected volume data to the reference volume data falls within the range of the general mode, it means that the participant was in an ordinary emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the negative mode, it means that the participant was in a negative or excited emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the emotion mode, it means that the participant was in a positive or interested emotional state at the time of the speech.
  • if the ratio of the collected volume data to the reference volume data falls within the range of the helpless mode, it means that the participant had low concentration or was in a listless emotional state at the time of the speech.
  • the speech information recognition module 160 generates speech information by classifying the text information and the emotional category information for each participant.
  • in more detail, the text information from the voice recognition module 130, the emotion category information from the volume recognition module 140, and the participant identified by the participant recognition module 150 are classified into sets, collected in chronological order from the start of the meeting to its end, and finally generated as speech information.
  • the speech information recognition module 160 transmits the speech information to the printing module 180, and the printing module 180 prints the received speech information as text so that the participant can visually recognize the speech information.
  • the printed speech information is output or stored as shown in FIG. 5 through a display 300, a printer (not shown), or a storage medium (not shown).
  • in addition, the speech information recognition module 160 may transmit the text information and emotion category information classified by participant to the printing module 180 in real time, so that each set-unit phrase is printed through the display 300.
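  • pulling these steps together, one iteration of the method of FIG. 4 might be sketched as follows, reusing the functions from the earlier sketches; only S20 and S50 are step labels from the document, and the rest of the flow is inferred:

```python
def recognize_text(signal):
    """Placeholder for the voice recognition module 130 (STT not shown)."""
    return "<transcribed text>"

def support_meeting(mic_signals, pattern_waveforms, participant_db, minutes):
    """Receive and filter the sound signals (S20), recognize the text, collect
    volume data, identify the participant (S50), check the emotion category,
    and append one speech-information record to the minutes."""
    signal = select_speaker_signal(mic_signals, pattern_waveforms)  # filtering (S20)
    if signal is None:
        return
    text = recognize_text(signal)                                   # voice recognition (130)
    volume = make_volume_data(signal, sample_rate=16_000)           # volume recognition (140)
    who = identify_participant(volume, participant_db)              # participant recognition (S50)
    if who is None:
        return
    emotion = classify_emotion(volume, participant_db[who])         # emotion category (140)
    minutes.append({"participant": who, "text": text, "emotion": emotion})
```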
  • FIG. 6 is a block diagram showing a constituent unit of a conference room management apparatus configured in a conference system according to the present invention.
  • the conference room management device 600 includes: a camera 620 for photographing the conference room; an image analysis module 630 that compares and analyzes the images captured by the camera 620 to check whether the conference room has changed; a transparent all-glass smart glass 640 that expresses RGB colors; an environmental condition check module 650 that senses the indoor environment of the conference room; a chair sensor 660 installed in the chairs (C) provided in the conference room to recognize the posture of a seated participant; a vibrator 670 that applies vibration to the chair C; and a schedule management module 680 that manages the schedules of meetings and participants.
  • the image analysis module 630 stores the image captured by the camera 620 as a video file. In addition, by comparing the videos in chronological order, the participants' behavioral patterns and other states are identified, and various items placed on the table T are checked to confirm the presence or absence of items remaining after the meeting.
  • the smart glass 640 constitutes the wall surface dividing the conference room, becoming opaque during the meeting and transparent again when the meeting ends. For reference, if an item identified by the image analysis module 630 is confirmed to remain after the meeting is over, the smart glass 640 maintains its opaque state; the meeting participants are thereby alerted so that the meeting can be properly closed.
  • the environmental condition check module 650 detects environmental conditions such as the air quality, illumination, temperature, humidity, and fine dust of the conference room, so that devices such as a fan, an air purifier, and a humidifier can be controlled to improve the environment.
  • the chair sensor 660 is installed in the chair C to check whether a participant is seated, and scans the participant's posture to estimate the participant's current state. For example, a pressure gauge installed on the seating surface of the chair C determines whether the participant's seated posture is biased to one side; if the chair sensor 660 confirms a biased state, the corresponding participant is not in a correct posture, and this can be recognized and a warning given.
  • the vibrator 670 is installed on the chair C to apply vibration according to the state of the participant. Therefore, the seated participant can focus on the meeting without feeling helpless.
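  • a toy sketch of the chair-side logic: the pressure readings detect a biased posture for the warning, and the vibrator is triggered when the participant's emotion mode suggests listlessness; the threshold and record shapes are invented for illustration:

```python
def check_chair(pressures, emotion, bias_limit=0.25):
    """pressures: (left, right) seat pressure readings from the chair sensor 660.
    Returns (warning, vibrate) decisions for the chair C."""
    left, right = pressures
    total = left + right
    seated = total > 0
    biased = seated and abs(left - right) / total > bias_limit
    warning = biased                              # posture warning for a leaning participant
    vibrate = seated and emotion == "helpless"    # vibrator 670 nudges a listless participant
    return warning, vibrate

print(check_chair((30.0, 70.0), "helpless"))  # (True, True)
```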
  • the schedule management module 680 may manage a list of participants who will use the conference room, time, and other conference schedules, and output through the display 300.
  • FIG. 7 is a block diagram showing another embodiment of a conference system according to the present invention.
  • referring to FIG. 7, the conference system 10 further includes an interest information storage device 700 that stores member information and interest information on each member's fields and degrees of interest; the search device 200 retrieves from the interest information storage device 700 the member information of members having a high degree of interest in the field of interest of the participant identified by the speech information recognition module 160, and outputs it through the display 300.
  • the interest information storage device 700 stores the personal information of currently registered members as member information, together with each member's fields and degrees of interest confirmed in the course of meetings. It is therefore possible to store and manage which fields each member is interested in and to what degree.
  • the search device 200 checks the participant's field and degree of interest, and searches for and recommends other members who are interested in the same field. Therefore, participants can meet with the recommended members to hold meetings, and through this they can develop more advanced results in the field.
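  • a minimal sketch of the recommendation lookup against the interest information storage device 700, with an assumed record shape of member to {field: interest level}:

```python
def recommend_members(participant, field_of_interest, interest_db, min_level=0.7):
    """interest_db: member -> {field: interest level in [0, 1]} (assumed shape).
    Returns other members with a high degree of interest in the same field,
    strongest first."""
    return sorted(
        (m for m, fields in interest_db.items()
         if m != participant and fields.get(field_of_interest, 0.0) >= min_level),
        key=lambda m: interest_db[m][field_of_interest],
        reverse=True)

interest_db = {"Lee": {"5G": 0.9}, "Kim": {"5G": 0.8}, "Park": {"AI": 0.95}}
print(recommend_members("Lee", "5G", interest_db))  # ['Kim']
```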

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Biomedical Technology (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to a smart conference system based on 5G communication and a conference support method using robotic processing automation and, more particularly, to a smart conference system based on 5G communication and a conference support method using robotic processing automation, the system enabling conference participants to conveniently and efficiently hold a conference through 5G-based Internet of Things technology, AI solution technology, automatic minutes creation technology, conference status analysis technology, and the like, and comprising a voice analysis device comprising: a voice model DB for storing pattern data of texts; a participant information DB for storing reference voice volume data of participants; a voice recognition module for converting a received sound signal into pattern data, and generating text information by searching the voice model DB for a text corresponding to the pattern data; a voice volume recognition module, which analyzes a voice volume of the sound signal so as to convert the analyzed voice volume into collected voice volume data, and checks emotion category information of the collected voice volume data; a participant recognition module for identifying a participant by searching the participant information DB for reference voice volume data corresponding to the collected voice volume data; and a speech information recognition module for generating speech information by classifying the text information and the emotion category information by participant.

Description

Smart conference system based on 5G communication and conference support method through robotic processing automation
The present invention relates to a 5G communication-based smart conference system and a conference support method through robotic processing automation, and more specifically, to a system and method that enable conference participants to conduct meetings conveniently and efficiently through 5G-based IoT technology, AI solution technology, automatic minutes creation technology, meeting situation analysis technology, and the like.

With the recent spread of video conferencing systems, it has become possible to record conference contents as video and to edit the recorded contents to generate multimedia minutes including video, audio, and text.

In the process of generating minutes through a conventional video conference system, the video and audio from the start to the end of the meeting were recorded to collect the data necessary for generating the minutes, and the collected data was stored by type. The stored data was then collated by time, and the collated data was browsed through a data editing user interface, which allowed the content of speech, keywords, and the like to be edited. When the chronologically collated data was browsed through the data editing user interface, the user edited the data to create a keyword list, an event list, an importance list, and the like, thereby generating the minutes.

However, minutes generation through the conventional video conference system had limitations in recording what multiple participants said separately for each participant. Moreover, since the prior art merely records the content of remarks rather than the entire meeting situation, it could neither predict nor grasp what the overall atmosphere of the meeting was, what the participants' emotions and other main topics of interest were, or in what direction the main topics would flow. Therefore, with the conventional video conferencing system, a non-participant had no choice but to replay the recorded video to understand the contents of a meeting. Furthermore, the non-participant had the inconvenience of having to judge the main topics of interest of the follow-up meeting and prepare the related documents himself or herself.

In addition, the conventional video conferencing system has no support functions other than simply recording the meeting on video in real time or providing microphones to the participants for amplification and recording, so the meeting contents had to be recorded one by one, and participants had the trouble and hassle of searching for information relevant to the meeting contents themselves and using it in the meeting.

Accordingly, the present invention was devised to solve the above problems; its object is to provide a 5G communication-based smart conference system and a conference support method through robotic processing automation that not only record the contents of speech by accurately identifying the participants in a meeting, but also grasp each participant's speech emotion, speech volume, and speech speed and reflect them in the record, and on this basis allow each participant's major interests in the meeting and materials to be used in subsequent meetings to be prepared in advance.
In order to achieve the above object, the present invention is a conference system for robotic processing automation including a voice analysis device comprising:

a voice model DB for storing pattern data of texts;

a participant information DB for storing reference volume data of participants;

a voice recognition module for converting a received sound signal into pattern data and generating text information by searching the voice model DB for the text corresponding to the pattern data;

a volume recognition module that analyzes the volume of the sound signal, converts it into collected volume data, and checks the emotion category information of the collected volume data;

a participant recognition module for identifying a participant by searching the participant information DB for reference volume data corresponding to the collected volume data; and

a speech information recognition module for generating speech information by classifying the text information and emotion category information by participant.
The present invention described above not only records the contents of speech by accurately identifying the participants in a meeting, but also grasps each participant's speech emotion, speech volume, and speech speed and reflects them in the record; on this basis, each participant's major interests and materials to be used in follow-up meetings can be prepared in advance, and in particular, specific topics can be discovered from unstructured text for the major interests.
FIG. 1 is a perspective view schematically showing the interior of a conference room to which a conference system according to the present invention is applied;

FIG. 2 is a block diagram showing an embodiment of a conference system according to the present invention;

FIG. 3 is a block diagram showing the constituent units of a voice analysis device configured in a conference system according to the present invention;

FIG. 4 is a flowchart sequentially showing a conference support method operating based on a conference system according to the present invention;

FIG. 5 is a diagram showing an example of minutes created by a conference system according to the present invention;

FIG. 6 is a block diagram showing the constituent units of a conference room management device configured in a conference system according to the present invention; and

FIG. 7 is a block diagram showing another embodiment of a conference system according to the present invention.
The features and effects of the present invention described above will become apparent through the following detailed description in connection with the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can easily implement the technical idea of the present invention. Since the present invention may be variously modified and may take various forms, specific embodiments are illustrated in the drawings and described in detail in the text. However, this is not intended to limit the present invention to the specific disclosed forms, and it should be understood to include all changes, equivalents, and substitutes falling within the spirit and scope of the present invention. The terms used in the present application are only used to describe specific embodiments and are not intended to limit the present invention.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a perspective view schematically showing the interior of a conference room to which a conference system according to the present invention is applied, and FIG. 2 is a block diagram showing an embodiment of a conference system according to the present invention.

Referring to FIGS. 1 and 2, in the conference system 10 of the present embodiment, various equipment is installed so that a plurality of participants can gather in one place and discuss together. In addition, even if one or more of the participants cannot attend in person, further equipment may be installed to support face-to-face discussion with the other participants through video.

The conference system 10 of the present embodiment includes: a voice analysis device 100 that collects the contents of a participant's speech and records them as text; a search device 200 that searches, through the Internet or Ethernet (hereinafter 'communication network'), for additional information on the participants or the meeting contents corresponding to the speech information checked by the voice analysis device 100; a display 300 that outputs the speech information and search information; and a control device 400 that controls the voice analysis device 100, the search device 200, and the display 300 so that they exchange data and interwork. The conference system 10 may further include one or more selected from a translation device 500, which translates the speech information or search information into text commonly used by the participants and outputs it through the display 300, and a conference room management device 600, which adjusts the condition of the conference room so that the participants' meeting environment is optimized. Of course, the control device 400 likewise controls the translation device 500 and the conference room management device 600 so that they exchange data and interwork with the voice analysis device 100, the search device 200, and the display 300.

The voice analysis device 100 utilizes Speech-to-Text (STT) technology, which converts speech data into text data. In addition, by communicating with other servers 20 and personal terminals 30 and 40 through the communication network, it is possible to search for necessary data or to communicate with participants or related persons. In particular, since the conference system according to the present invention communicates based on 5G (5th generation mobile communication), additional information on the meeting contents can be searched and provided in real time.
Reference numerals '620', '640', 'C', and 'T', which have not yet been described, will be explained in detail together with the corresponding technology.
More detailed descriptions of the devices 100, 200, 300, 400, 500, and 600 are given below.
FIG. 3 is a block diagram showing the constituent units of the speech analysis device of the conference system according to the present invention.
Referring to FIGS. 1 to 3, the speech analysis device 100 of the present embodiment includes: a voice model DB 110 that stores pattern data of texts; a participant information DB 120 that stores reference volume data of participants; a voice recognition module 130 that converts a received acoustic signal into pattern data and searches the voice model DB 110 for the text corresponding to the pattern data to generate text information; a volume recognition module 140 that analyzes the volume of the acoustic signal, converts it into collected volume data, and determines the emotion category information of the collected volume data; a participant recognition module 150 that searches the participant information DB 120 for reference volume data corresponding to the collected volume data to identify the participant; and a speech information recognition module 160 that classifies the text information and the emotion category information by participant to generate speech information.
The voice model DB 110 stores pattern data of texts. More specifically, the pattern data is the basic waveform of the acoustic signal of a text and serves as reference data, such as the acoustic model and text model used for general speech recognition. Speech recognition itself is a known technology, and the pattern data of texts is continuously improved through deep learning, a learning algorithm over the per-text waveforms of standard human speech.
The participant information DB 120 stores the reference volume data of participants. More specifically, the reference volume data is obtained by collecting and standardizing the vocal characteristics of participants who have attended meetings or may attend in the future. In this embodiment, the participant's vocal data is collected by a separate volume collection means (not shown) that has the participant read out designated text and receives the acoustic signal of that text. The volume collection means analyzes the received acoustic signal to determine information such as waveform, frequency, amplitude, decibel level, and wavelength, converts the collected information into data, and generates the reference volume data of the participant. The conference system according to the present invention thus keeps the reference volume data of registered members and uses it to generate meeting records.
The voice recognition module 130 converts a received acoustic signal into pattern data and searches the voice model DB 110 for the text corresponding to the pattern data to generate text information. More specifically, a microphone 101 is installed in the conference room at a position where it can efficiently receive the participants' acoustic signals, typically on the table T that the participants face while seated in chairs C. From the many acoustic signals received by the microphone 101, the voice recognition module 130 performs primary filtering: it extracts the acoustic signals that maintain designated pattern waveforms and filters out the others. Here, the designated pattern waveforms are the waveforms of the text pattern data stored in the voice model DB 110. However, comparing the numerous acoustic signals received in real time one by one against the numerous pattern data entries stored in the voice model DB 110 can burden the processor. Therefore, to make the primary filtering more efficient, texts that participants frequently repeat while speaking may be designated as standard texts, and the waveforms of these standard texts may be designated as the pattern waveforms. The acoustic signals extracted by the primary filtering are those of the texts spoken by the individual participants, and the voice recognition module 130 removes all acoustic signals corresponding to noise.
The voice recognition module 130 then performs secondary filtering, keeping only the acoustic signal in the relatively high-frequency range among the primary-filtered signals. The signals extracted by the primary filtering may include multiple acoustic signals generated when two or more participants speak at the same time. Since a participant speaks closest to the microphone 101 assigned to him or her, that participant's acoustic signal is received loudest among the signals. The voice recognition module 130 therefore extracts only the strongest signal, i.e., the signal in the relatively high-frequency range, and removes all the others. As a result, the voice recognition module 130 can extract only the relevant participant's acoustic signal even when many participants speak simultaneously amid considerable noise. For reference, a microphone 101 may be assigned to each participant as described above, or two or more participants may share a single microphone 101. In both cases, one microphone 101 simultaneously receives various noises and the acoustic signals of multiple participants, and a participant speaks close to the microphone 101 when taking the floor.
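As an illustrative sketch only (the patent does not prescribe an implementation), the two-stage filtering described above can be expressed as follows in Python. The frame layout, the correlation threshold, and all function names are assumptions introduced here for illustration; the pattern templates stand in for the text pattern data of the voice model DB 110.

```python
import numpy as np

def primary_filter(frames, pattern_templates, threshold=0.6):
    """First filtering: keep frames whose waveform tracks a designated
    pattern waveform; drop everything else as noise.

    frames            -- list of 1-D np.ndarray audio frames from microphone 101
    pattern_templates -- list of 1-D np.ndarray pattern waveforms (voice model DB)
    threshold         -- minimum normalized correlation to count as speech
    """
    kept = []
    for frame in frames:
        f = (frame - frame.mean()) / (frame.std() + 1e-9)
        for template in pattern_templates:
            t = (template - template.mean()) / (template.std() + 1e-9)
            n = min(len(f), len(t))
            if float(np.dot(f[:n], t[:n]) / n) >= threshold:
                kept.append(frame)   # frame follows a pattern waveform
                break
    return kept

def secondary_filter(frames):
    """Second filtering: of the surviving frames, keep only the strongest
    one -- the participant speaking closest to the microphone."""
    if not frames:
        return None
    return max(frames, key=lambda f: float(np.sqrt(np.mean(f ** 2))))
```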
The voice recognition module 130 then analyzes the acoustic signal extracted by the secondary filtering and generates pattern data for its waveform. The voice recognition module 130 searches the voice model DB 110, identifies the text of the stored pattern data corresponding to the generated pattern data, and generates text information. The voice recognition module 130 applies Speech-to-Text (STT) deep learning technology based on natural language processing (NLP).
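The lookup of generated pattern data against the stored pattern data can be sketched as a nearest-template search. This is a deliberate simplification of the NLP-based STT deep learning the module actually applies, and the DB layout and the `recognize` name are hypothetical.

```python
import numpy as np

def recognize(pattern, voice_model_db):
    """Return the text whose stored pattern data is closest to `pattern`.

    voice_model_db -- dict mapping text -> reference pattern (1-D np.ndarray),
                      a stand-in for the voice model DB 110
    """
    best_text, best_dist = None, float("inf")
    for text, reference in voice_model_db.items():
        n = min(len(pattern), len(reference))
        dist = float(np.linalg.norm(pattern[:n] - reference[:n]))
        if dist < best_dist:
            best_text, best_dist = text, dist
    return best_text
```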
The volume recognition module 140 analyzes the volume of the acoustic signal, converts it into collected volume data, and determines the emotion category information of the collected volume data. More specifically, since an acoustic signal is an analog wave as described above, it carries information such as the wave's waveform, frequency, amplitude, decibel level, and wavelength. The volume recognition module 140 therefore checks the frequency of the acoustic signal used for filtering by the voice recognition module 130, digitizes information such as the waveform, frequency, amplitude, decibel level, and wavelength of the secondary-filtered acoustic signal, and generates the collected volume data.
Next, the volume recognition module 140 compares the reference volume data of the participant identified by the participant recognition module 150 with the collected volume data and determines the emotion category information corresponding to the collected volume data. Volume is measured as an indicator of a participant's initiative and confidence in the meeting, and is reflected as a weight in the document-term matrix described below. The emotion category information classifies the participant's reference volume data according to designated ratios: if the collected volume data falls within the normal-mode ratio range relative to the reference volume data, the participant was in an ordinary emotional state when speaking; if it falls within the negative-mode ratio range, the participant was in a negative or agitated state; if it falls within the emotional-mode ratio range, the participant was in a positive or interested state; and if it falls within the lethargic-mode ratio range, the participant had low concentration or was in a listless state.
In the present embodiment, the volume data comprises one or more selected from the decibel level, frequency, and amplitude of the acoustic signal, and the ratio for each mode may be set separately for decibel level, frequency, and amplitude. For reference, if the decibel level, frequency, and amplitude of the collected volume data exceed those of the reference volume data by at least the designated ratio, the participant's emotion can be judged to be in the negative mode; if they fall below the reference values by more than the designated ratio, the participant's emotion can be judged to be in the lethargic mode.
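A minimal sketch, under stated assumptions, of how the collected volume data could be digitized and classified against the per-mode ratio ranges. The 0.8/1.2 boundaries are hypothetical placeholders for the designated ratios, which the following paragraphs describe as learned values.

```python
import numpy as np

def volume_features(signal, sample_rate):
    """Digitize an acoustic signal into collected volume data."""
    amplitude = float(np.max(np.abs(signal)))
    rms = float(np.sqrt(np.mean(signal ** 2)))
    decibel = 20.0 * np.log10(rms + 1e-12)              # level in dB
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    frequency = float(freqs[int(np.argmax(spectrum))])  # dominant frequency
    return {"decibel": decibel, "frequency": frequency, "amplitude": amplitude}

def emotion_mode(collected, reference, low=0.8, high=1.2):
    """Classify the emotion category from the collected/reference ratio.
    `low` and `high` stand in for the designated per-mode ratio ranges."""
    ratio = float(np.mean([collected[k] / (reference[k] + 1e-12)
                           for k in ("frequency", "amplitude")]))
    if ratio > high:
        return "negative"    # louder/higher than usual: negative or agitated
    if ratio < low:
        return "lethargic"   # quieter/lower than usual: low concentration
    return "normal"          # within the ordinary range
```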
In addition to one or more of the decibel level, frequency, and amplitude of the acoustic signal, the volume recognition module 140 of this embodiment may determine the emotion category information from the received length of a specific participant's acoustic signal per unit time, that is, from the participant's amount and speed of speech. For example, even if the emotion category has first been determined from decibel level, frequency, and amplitude alone, the participant's interest in the field of the spoken text is judged to be high if the received length of that participant's acoustic signal exceeds a certain amount, and low if it falls below that amount.
Consequently, the volume recognition module 140 may further include a function of checking the acoustic-signal length of the collected volume data, obtained by digitizing the acoustic signal, in order to determine the participant's level of interest.
The per-mode ratios of the emotion categories described above may be standardized across participants through repeated deep-learning training. Alternatively, the per-mode ratios may be individualized by learning, through deep learning, how each participant's volume data changes with his or her emotional state. The per-mode ratios may also be set by separately learning the rate at which one or more of the decibel level, frequency, and amplitude of the volume data changes with the emotional state.
The participant recognition module 150 searches the participant information DB 120 for reference volume data corresponding to the collected volume data and identifies the participant. More specifically, it compares the collected volume data generated by the volume recognition module 140 with the reference volume data in the participant information DB 120 and identifies the participant whose reference volume data matches within an error range. In general, participants speak with their everyday voice at the beginning of a meeting, so the participant recognition module 150 can identify each participant from the collected volume data. In special cases, however, a participant may start speaking from the outset with a voice noticeably stronger or softer than usual. Therefore, if the participant recognition module 150 fails to find reference volume data corresponding to the collected volume data in the participant information DB 120, it converts the collected volume data using the per-mode ratios of the emotion categories, searches again for matching reference volume data, and identifies the participant of the retrieved reference volume data as the speaker of the collected volume data.
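A sketch of the identification logic described above, assuming the collected and reference volume data are already reduced to feature dictionaries; the tolerance and the fallback mode ratios are hypothetical values.

```python
def identify_participant(collected, participant_db, tolerance=0.15,
                         mode_ratios=(1.2, 0.8)):
    """Return the participant whose reference volume data matches `collected`.

    participant_db -- dict mapping participant -> reference feature dict
    tolerance      -- allowed relative error per feature
    mode_ratios    -- hypothetical negative/lethargic scaling factors used as
                      a fallback when no direct match exists
    """
    keys = ("frequency", "amplitude")

    def matches(col, ref):
        return all(abs(col[k] - ref[k]) / (abs(ref[k]) + 1e-12) <= tolerance
                   for k in keys)

    for name, ref in participant_db.items():        # ordinary-voice match
        if matches(collected, ref):
            return name
    for scale in mode_ratios:                        # rescale by mode ratio
        rescaled = {k: collected[k] / scale for k in keys}
        for name, ref in participant_db.items():
            if matches(rescaled, ref):
                return name
    return None
```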
Meanwhile, when a microphone 101 is assigned to each participant and the voice recognition module 130 links the identification code of the receiving microphone 101 to each received and filtered acoustic signal, the participant recognition module 150 identifies subsequently received acoustic signals by their identification codes and recognizes them as the acoustic signals of the previously identified participant.
However, when multiple participants share one microphone 101, a plurality of sound wave recognition sensors (not shown) are arranged around the circumference of the microphone 101; upon receiving an acoustic signal, the voice recognition module 130 determines which sound wave recognition sensor detected the highest-frequency sound wave and links that sensor's identification code to the acoustic signal. The participant recognition module 150 then identifies subsequently received acoustic signals by their identification codes and recognizes them as the acoustic signals of the previously identified participant.
The speech information recognition module 160 classifies the text information and the emotion category information by participant to generate speech information. More specifically, it gathers the text information from the voice recognition module 130, the emotion category information from the volume recognition module 140, and the participant identity from the participant recognition module 150 in chronological order from the start of the meeting to its end, and finally produces the speech information as a single set. The speech information recognition module 160 passes the speech information to the printing module 180, and the printing module 180 prints the received speech information as text so that the participants can perceive it visually. The printed speech information is output (see FIG. 5) or stored through the display 300, a printer (not shown), or a storage medium (not shown). The speech information recognition module 160 may also pass each set of text information and emotion category information, classified by participant, to the printing module 180 in real time so that the phrase for each set is output through the display 300.
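The per-utterance set can be pictured as a simple record type; the following is a sketch of one possible data structure for the chronological speech information, with all names hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SpeechRecord:
    """One set: text information, emotion category, and participant identity."""
    timestamp: float    # seconds from the start of the meeting
    participant: str    # from the participant recognition module 150
    text: str           # from the voice recognition module 130
    emotion: str        # from the volume recognition module 140

@dataclass
class Minutes:
    records: List[SpeechRecord] = field(default_factory=list)

    def add(self, record: SpeechRecord) -> None:
        self.records.append(record)
        self.records.sort(key=lambda r: r.timestamp)   # chronological order

    def render(self) -> str:
        """One printable line per set, as passed to the printing module 180."""
        return "\n".join(
            f"[{r.timestamp:7.1f}s] {r.participant} ({r.emotion}): {r.text}"
            for r in self.records)
```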
The speech information recognition module 160 also analyzes the text information to identify key keywords and their fields, checks the participant's emotion category information for that text information, and thereby determines the participant's level of interest in those keywords and fields. Here, the participant's amount and speed of speech can be read from the collected volume data and applied when assessing interest. This technique is not limited to detecting topics of interest from text alone; it also uncovers a participant's specific topics from unstructured signals such as speaking state, amount of speech, and speaking speed around the participant's main interests, so that the results can be put to practical use. This text analysis extends the Document Term Matrix technique with a new kind of weighting, namely speech emotion, amount of speech, and speaking speed; these weights are not limited to the acoustic signals generated by a participant's speech and can also be measured, collected, and used via the various IoT sensors installed in the conference room.
More specifically, the speech information recognition module 160 receives at least a certain amount of text information from the voice recognition module 130 and analyzes it with techniques such as text mining to extract key keywords. From the extracted keywords, the speech information recognition module 160 determines the field of the text information. It also checks the participant's emotion category information for the text information to assess the level of interest in it. The level of interest is determined by the emotional state at the time the text was spoken. As an example, the digitized utterances of the meeting participants are processed with natural language processing, and the frequency of each spoken term is counted in a document term matrix, as in [Table 1].
[Table 1]

Participant   | Computer | Power | Clock | Sales | Cost
------------- | -------- | ----- | ----- | ----- | ----
Participant A |    10    |   5   |   4   |   7   |   4
Participant B |     2    |   1   |   5   |   8   |   9
Participant C |     5    |   6   |   5   |   4   |   5
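The document-term matrix of [Table 1] can be reproduced directly from per-participant term lists; a minimal sketch, assuming the utterances have already been tokenized by the NLP stage.

```python
from collections import Counter

def document_term_matrix(transcripts):
    """transcripts -- dict mapping participant -> list of spoken terms.
    Returns (vocabulary, {participant: counts aligned to vocabulary})."""
    vocabulary = sorted({t for terms in transcripts.values() for t in terms})
    matrix = {}
    for participant, terms in transcripts.items():
        counts = Counter(terms)
        matrix[participant] = [counts[t] for t in vocabulary]
    return vocabulary, matrix

# Reproducing the first row of [Table 1]:
vocab, dtm = document_term_matrix({
    "Participant A": ["computer"] * 10 + ["power"] * 5 + ["clock"] * 4
                     + ["sales"] * 7 + ["cost"] * 4,
})
```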
The speech information recognition module 160 looks up the terms counted as in [Table 1] in a sentiment dictionary and determines the participant's emotional state on that basis. That is, the more terms that belong to the emotional category, the more strongly the speech information recognition module 160 determines the participant's emotional state to be 'emotional'.
Furthermore, when a term in the same emotion category as the participant's current emotional state is detected, the speech information recognition module 160 weights the counting of terms in that category and adjusts the term counts, which in turn weights the participant's emotional state. Therefore, even if terms of another emotion category temporarily increase, the speech information recognition module 160 does not treat the participant's emotional state as if it had changed abruptly.
Meanwhile, if the participant was in the normal-mode emotional state when speaking the text information, it can be judged that the participant's interest in the field of that text is not high. If the participant was in the negative mode, it can be judged that the participant is highly interested in the field but holds an opposing position. If the participant was in the emotional mode, it can be judged that the participant is highly interested and holds a positive position. And if the participant was in the lethargic mode, it can be judged that the participant has low interest in the field and is also indifferent to the overall subject of the meeting.
In addition, as described above, the participant's interest in a field can be judged by referring to the acoustic-signal length of the collected volume data. The longer the acoustic-signal length corresponding to the amount and speed of speech, the more precisely the speech information recognition module 160 can text-mine the collected volume data and the more specifically it can analyze topics in the relevant field.
The speech information recognition module 160 computes the speaking speed from each participant's amount of speech and speaking time, and assigns a weight to each participant's amount of speech. As a result, the term counts of each participant's speech increase or decrease by the assigned weight.
The speech information recognition module 160 also weights the counted number of each participant's spoken terms according to the volume data, adjusting the participant's per-term speech counts.
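The weighting of term counts by emotional state and speaking speed might look as follows; the boost and gain factors are hypothetical stand-ins for the learned weights.

```python
def weighted_counts(raw_counts, term_emotions, participant_state,
                    speech_rate, baseline_rate=1.0,
                    emotion_boost=1.5, rate_gain=0.2):
    """Adjust one participant's term counts by emotion and speaking speed.

    raw_counts        -- dict term -> count from the document-term matrix
    term_emotions     -- dict term -> emotion category (sentiment dictionary)
    participant_state -- the participant's current emotion category
    speech_rate       -- terms per second for this participant
    """
    rate_weight = 1.0 + rate_gain * (speech_rate - baseline_rate)
    adjusted = {}
    for term, count in raw_counts.items():
        weight = rate_weight
        if term_emotions.get(term) == participant_state:
            weight *= emotion_boost  # same category as current state counts more
        adjusted[term] = count * weight
    return adjusted
```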
The speech information recognition module 160 analyzes the combination of the key keywords, the fields, and the participant's level of interest through deep learning to infer related topics in the participant's fields of interest, and searches for the participant's next speech field. More specifically, the speech information recognition module 160 analyzes, via deep learning, not only the combination of key keywords, fields, and interest from the text the participant spoke during the meeting, but also the results collected from previous meetings, to grasp which fields the participant is interested in and how the content of his or her remarks is changing; from what has been learned, it predicts the keywords, fields, and subjects the participant will speak about in the future. The speech information recognition module 160 also searches various big data on the predicted items through the search device 200 and recommends the results to the participant. The recommendation of big data by the speech information recognition module 160 can be made during a meeting, or in everyday life outside meetings through communication media such as the participant's e-mail or text messages.
Through the weighting described above, the speech information recognition module 160 can also determine a clear emotional state and level of interest for each participant. More specifically, the speech information recognition module 160 discovers topics of similar content based on the document-term-matrix data described above (topic modeling). Topic modeling techniques include Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Indexing (PLSI), and Latent Dirichlet Allocation (LDA); the speech information recognition module 160 of this embodiment infers a participant's topics with the LDA algorithm. For example, even if documents 1 and 2 deal with similar subjects, the kinds of words appearing in each document and their frequencies of use may differ. A simple keyword model alone is therefore limited in computing the similarity between documents or classifying their subjects, and cannot recognize that documents 1 and 2 share the same or a similar subject.
To search for the topics corresponding to a participant's spoken text, the speech information recognition module 160 finds α and β, the per-text base values (vector values), computes Θ for each document, and uses Θ to calculate per-text similarities and to classify texts. That is, the speech information recognition module 160 sets a vector value per text based on a word embedding technique and computes the similarity between two words. For this, techniques such as Word2Vec, GloVe, and FastText are applied. The speech information recognition module 160 thus discovers the topic corresponding to a participant's remarks and retrieves and presents related literature on that topic. For reference, since a document may belong to various fields rather than to the topic of a single field, even the same document may be presented as topic-related literature for the field corresponding to the participant's document-term-matrix data.
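As an illustration of this step, scikit-learn's LatentDirichletAllocation can be fitted to the document-term matrix to obtain the per-document topic mixture Θ, and embedding vectors (e.g., from Word2Vec, GloVe, or FastText) can be compared by cosine similarity; this is a sketch under those assumptions, not the patent's exact pipeline.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# One row per document (or participant) of the document-term matrix.
dtm = np.array([[10, 5, 4, 7, 4],
                [ 2, 1, 5, 8, 9],
                [ 5, 6, 5, 4, 5]])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(dtm)   # per-document topic mixture (the Θ above)
beta = lda.components_           # per-topic term weights (related to β)

def cosine(u, v):
    """Similarity of two vectors, e.g., word embeddings or topic mixtures."""
    return float(np.dot(u, v) /
                 (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Documents with similar topic mixtures are treated as the same or a similar
# subject even when their surface keywords differ:
print(cosine(theta[0], theta[2]))
```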
The data processing module 170 relays data communication among the voice recognition module 130, the volume recognition module 140, the participant recognition module 150, the speech information recognition module 160, and the printing module 180, and controls the operation of these modules 130, 140, 150, 160, and 180 according to a set process. The data processing module 170 corresponds to a general central processing unit (CPU).
FIG. 4 is a flowchart sequentially showing a conference support method operating on the basis of the conference system according to the present invention, and FIG. 5 is a diagram showing an example of meeting minutes produced by the conference system according to the present invention.
The following is described with reference to FIGS. 1 to 5.
S10; Acoustic signal reception step
The microphone 101 receives the acoustic signals generated by the remarks of the meeting participants. In this embodiment, the microphones 101 are installed on the table T of the conference room so as to be allocated to the individual participants.
S20; Acoustic signal filtering step
The voice recognition module 130 checks the waveforms of the multiple acoustic signals received by the microphone 101 and searches the voice model DB 110 for pattern waveforms corresponding to the checked waveforms. When an acoustic signal whose waveform corresponds to a pattern waveform is found, that signal is extracted and the other acoustic signals are filtered out.
When this filtering is complete, the voice recognition module 130 extracts only the strongest of the acoustic signals, i.e., the signal in the relatively high-frequency range, and removes all the others.
S30; Text information generation step
The voice recognition module 130 analyzes the acoustic signal finally extracted in the acoustic signal filtering step S20 and generates pattern data for its waveform. The voice recognition module 130 then searches the voice model DB 110, identifies the text of the stored pattern data corresponding to the generated pattern data, and thereby determines the text for each acoustic signal. The voice recognition module 130 combines the identified texts to finally generate text information.
S40; Collected volume data conversion step
The volume recognition module 140 analyzes the acoustic signal finally extracted in the acoustic signal filtering step S20 and converts it into collected volume data.
Since the acoustic signal carries information such as the wave's waveform, frequency, amplitude, decibel level, and wavelength, the volume recognition module 140 combines this information to generate the collected volume data.
S50; Participant recognition step
The participant recognition module 150 searches the participant information DB 120 for reference volume data corresponding to the collected volume data and recognizes the participant who spoke the acoustic signal.
Since the reference volume data has already been described, its description is omitted here.
S60; Emotion category information confirmation step
The volume recognition module 140 compares the reference volume data of the participant identified in the participant recognition step S50 with the collected volume data and determines the emotion category information corresponding to the collected volume data.
The emotion category information classifies the participant's reference volume data according to designated ratios: if the collected volume data falls within the normal-mode ratio range relative to the reference volume data, the participant was in an ordinary emotional state when speaking; if it falls within the negative-mode ratio range, the participant was in a negative or agitated state; if it falls within the emotional-mode ratio range, the participant was in a positive or interested state; and if it falls within the lethargic-mode ratio range, the participant had low concentration or was in a listless state.
S70; Speech information generation step
The speech information recognition module 160 classifies the text information and the emotion category information by participant to generate the speech information. More specifically, it groups the text information from the voice recognition module 130, the emotion category information from the volume recognition module 140, and the participant identity from the participant recognition module 150 into a single set, gathers the sets in chronological order from the start of the meeting to its end, and finally generates the speech information. The speech information recognition module 160 passes the speech information to the printing module 180, and the printing module 180 prints the received speech information as text so that the participants can perceive it visually. The printed speech information is output or stored as shown in FIG. 5 through the display 300, a printer (not shown), or a storage medium (not shown). The speech information recognition module 160 may also pass each set of text information and emotion category information, classified by participant, to the printing module 180 in real time so that the phrase for each set is output through the display 300.
FIG. 6 is a block diagram showing the constituent units of the conference room management device of the conference system according to the present invention.
Referring to FIGS. 2 and 6, the conference room management device 600 includes: a camera 620 that films the conference room; an image analysis module 630 that compares and analyzes the footage from the camera 620 to determine whether the conference room has changed; smart glass 640 in the form of transparent light-emitting glass that renders RGB colors; an environmental condition checking module 650 that senses the indoor environment of the conference room; a chair sensor 660 installed in a chair C of the conference room to recognize the posture of a seated participant; a vibrator 670 that applies vibration to the chair C; and a schedule management module 680 that manages the meeting schedule and the participants' schedules.
Describing each component in more detail, the image analysis module 630 stores the footage shot by the camera 620 as a video file. It also compares the footage in chronological order to identify the participants' behavior patterns and other states, and checks the various items placed on the table T to determine whether any items remain after the meeting ends.
The smart glass 640 forms the wall surfaces that partition the conference room; it turns opaque when the meeting begins and turns transparent again when the meeting ends. For reference, if the image analysis module 630 confirms that items remain after the meeting, the smart glass 640 stays opaque, thereby alerting the participants so that the meeting is brought to a proper close.
The environmental condition checking module 650 senses environmental conditions of the conference room such as air quality, illuminance, temperature and humidity, and fine dust. It thus makes it possible to control devices such as a ventilation fan, an air purifier, and a humidifier to improve the environment.
The chair sensor 660 is installed in the chair C to check whether a participant is seated, and scans the participant's posture so that the participant's current state can be estimated. For example, a pressure gauge installed in the seating surface of the chair C determines whether the participant's seating is unevenly weighted. If the chair sensor 660 detects an uneven weighting, the participant is not in a correct posture, so the system can recognize this and issue a warning.
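The uneven-seating check can be sketched as a spread test over the pressure-gauge readings; the sensor layout and tolerance are assumptions.

```python
def seating_is_biased(pressures, max_spread=0.25):
    """pressures -- readings from gauges in the seating surface (e.g., 4 corners).
    Returns True when the load distribution indicates a poor posture."""
    total = sum(pressures)
    if total <= 0:
        return False                 # nobody seated, nothing to warn about
    shares = [p / total for p in pressures]
    return max(shares) - min(shares) > max_spread
```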
The vibrator 670 is installed in the chair C and applies vibration according to the participant's state, helping a seated participant shake off lethargy and concentrate on the meeting.
The schedule management module 680 manages the list of participants who will use the conference room, the times, and other meeting schedules, and can output them through the display 300.
FIG. 7 is a block diagram showing another embodiment of the conference system according to the present invention.
The conference system 10 according to the present invention further includes: an interest information storage device 700 that stores member information and, for each member, interest information on fields of interest and levels of interest; and the search device 200, which searches the interest information storage device for the information of members with a high level of interest in the participant's field of interest as determined by the speech information recognition module 160, and outputs it through the display 300.
The interest information storage device 700 stores the personal information of currently registered members as member information, together with each member's fields of interest and levels of interest confirmed in the course of meetings. It can thus store and manage what each member is interested in and how strong that interest is.
As described above, the speech information recognition module 160 can determine a participant's fields of interest and levels of interest from the participant's speech information, so the search device 200 checks these and searches for and recommends other members interested in the same field. The participant can then meet the recommended members and hold a meeting, through which more advanced results in the field can be developed.
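The member recommendation can be sketched as a filtered, ranked lookup over the interest information; the data layout and threshold are illustrative assumptions.

```python
def recommend_members(field, min_interest, interest_db, exclude=()):
    """interest_db -- dict member -> {field_of_interest: interest_level}.
    Returns members interested in `field` at or above `min_interest`,
    strongest interest first."""
    hits = [(member, interests.get(field, 0))
            for member, interests in interest_db.items()
            if member not in exclude
            and interests.get(field, 0) >= min_interest]
    return [member for member, _ in sorted(hits, key=lambda h: -h[1])]
```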
Although the detailed description above refers to preferred embodiments of the present invention, those skilled in the art, or those having ordinary knowledge in the relevant technical field, will understand that the present invention can be variously modified and altered without departing from the spirit and technical scope of the present invention set forth in the claims below.

Claims (9)

  1. A conference system for robotic processing automation, characterized by comprising a speech analysis device provided with:
    a voice model DB that stores pattern data of texts;
    a participant information DB that stores reference volume data of participants;
    a voice recognition module that converts a received acoustic signal into pattern data and searches the voice model DB for the text corresponding to the pattern data to generate text information;
    a volume recognition module that analyzes the volume of the acoustic signal, converts it into collected volume data, and determines emotion category information of the collected volume data;
    a participant recognition module that searches the participant information DB for reference volume data corresponding to the collected volume data to identify a participant; and
    a speech information recognition module that classifies the text information and the emotion category information by participant to generate speech information.
  2. The conference system for robotic processing automation according to claim 1, wherein the voice recognition module performs primary filtering so that acoustic signals maintaining designated pattern waveforms are detected among a plurality of received acoustic signals, and performs secondary filtering so that the acoustic signal in the relatively high-frequency range is detected among the primary-filtered acoustic signals and converted into pattern data.
  3. The conference system for robotic processing automation according to claim 2, wherein:
    the emotion category information classifies the reference volume data of the corresponding participant according to designated ratios; and
    the volume recognition module computes the secondary-filtered acoustic signal into collected volume data and checks it against the reference volume data of the corresponding participant.
  4. The conference system for robotic processing automation according to claim 3, wherein the reference volume data comprises one or more selected from the decibel level, frequency, and amplitude of the acoustic signal.
  5. The conference system for robotic processing automation according to claim 1, wherein the speech information recognition module analyzes the text information to identify key keywords and fields, and checks the emotion category information of the corresponding participant for the text information to determine the participant's level of interest in the key keywords and fields.
  6. The conference system for robotic processing automation according to claim 5, wherein the speech information recognition module analyzes the combination of the key keywords, the fields, and the participant's level of interest through deep learning to infer the participant's fields of interest, and searches the participant's speech fields.
  7. The conference system for robotic processing automation according to claim 6, further comprising:
    an interest information storage device that stores member information and interest information on each member's fields of interest and levels of interest; and
    a search device that searches the interest information storage device for the information of members whose level of interest in the participant's field of interest, as determined by the speech information recognition module, is relatively high, and outputs it through a display.
  8. The conference system for robotic processing automation according to claim 1, wherein the speech information recognition module checks the emotion category information to which the texts of the text information belong, and counts the texts by emotion category to determine the emotional state of the corresponding participant.
  9. The conference system for robotic processing automation according to claim 8, wherein the speech information recognition module weights the counted number of texts spoken by the participant according to one or more selected from the participant's emotional state, the per-participant speaking speed determined by the volume recognition module, and the collected volume data.
PCT/KR2019/005694 2019-04-25 2019-05-13 Smart conference system based on 5g communication and conference support method using robotic processing automation WO2020218664A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2019-0048308 2019-04-25
KR1020190048308A KR102061291B1 (en) 2019-04-25 2019-04-25 Smart conferencing system based on 5g communication and conference surpporting method by robotic and automatic processing

Publications (1)

Publication Number Publication Date
WO2020218664A1 true WO2020218664A1 (en) 2020-10-29

Family

ID=69051517

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/005694 WO2020218664A1 (en) 2019-04-25 2019-05-13 Smart conference system based on 5g communication and conference support method using robotic processing automation

Country Status (2)

Country Link
KR (1) KR102061291B1 (en)
WO (1) WO2020218664A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102193654B1 (en) * 2020-01-20 2020-12-21 권경애 Service providing system and method for record reflecting consulting situation
KR102293903B1 (en) * 2020-10-26 2021-08-24 박영규 Method of providing extended information of books using printed optical codes and computer program therefor

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100406307B1 (en) * 2001-08-09 2003-11-19 삼성전자주식회사 Voice recognition method and system based on voice registration method and system
KR20030038069A (en) * 2001-11-08 2003-05-16 엘지전자 주식회사 noise elimination system of the telephone handset and controlling method therefore
JP2005277462A (en) * 2004-03-22 2005-10-06 Fujitsu Ltd Conference support system, proceeding forming method, and computer program
KR20060094343A (en) * 2005-02-24 2006-08-29 에스케이 텔레콤주식회사 Service system and method of emotion expressing using animation for video call and mobile communication terminal therefor
KR101818980B1 (en) * 2016-12-12 2018-01-16 주식회사 소리자바 Multi-speaker speech recognition correction system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112839195A (en) * 2020-12-30 2021-05-25 深圳市皓丽智能科技有限公司 Method and device for consulting meeting record, computer equipment and storage medium
CN112839195B (en) * 2020-12-30 2023-10-10 深圳市皓丽智能科技有限公司 Conference record consulting method and device, computer equipment and storage medium
CN113691382A (en) * 2021-08-25 2021-11-23 平安国际智慧城市科技股份有限公司 Conference recording method, conference recording device, computer equipment and medium
CN117174091A (en) * 2023-09-07 2023-12-05 河南声之美电子科技有限公司 Intelligent meeting record generation system and device based on role recognition

Also Published As

Publication number Publication date
KR102061291B1 (en) 2019-12-31

Similar Documents

Publication Publication Date Title
WO2020218664A1 (en) Smart conference system based on 5g communication and conference support method using robotic processing automation
CN108346034B (en) Intelligent conference management method and system
EP0495622A2 (en) Indexing of data sets
US5787414A (en) Data retrieval system using secondary information of primary data to be retrieved as retrieval key
US6687671B2 (en) Method and apparatus for automatic collection and summarization of meeting information
WO2018128238A1 (en) Virtual consultation system and method using display device
JP3895892B2 (en) Multimedia information collection management device and storage medium storing program
JP2019095552A (en) Voice analysis system, voice analysis device, and voice analysis program
CN111899740A (en) Voice recognition system crowdsourcing test case generation method based on test requirements
Reidsma et al. Exploiting''Subjective''Annotations
JP4469867B2 (en) Apparatus, method and program for managing communication status
CN114666454A (en) Intelligent conference system
WO2024090713A1 (en) User psychology management system through empathic psychology-based chatbot service
WO2020122291A1 (en) Apparatus and method for automating artificial intelligence-based apartment house management work instructions
Pesarin et al. Conversation analysis at work: detection of conflict in competitive discussions through semi-automatic turn-organization analysis
WO2020213785A1 (en) System for automatically generating text-based sentences on basis of deep learning to achieve improvement related to infinity of utterance patterns
CN116939150B (en) Multimedia platform monitoring system and method based on machine vision
JP3234083B2 (en) Search device
WO2023146030A1 (en) Device, method, and program for interaction based on artificial intelligence in which emotion, concentration degree, and conversation are integrated
Aarts et al. A real-time speech-music discriminator
WO2013147374A1 (en) Method for analyzing video streams using multi-channel analysis
Ronzhin et al. A software system for the audiovisual monitoring of an intelligent meeting room in support of scientific and education activities
US20230066829A1 (en) Server device, conference assistance system, and conference assistance method
JP3622711B2 (en) Video content viewer information providing system and method, viewer information providing apparatus, program, and program recording medium
JP2007199866A (en) Meeting recording system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19926532

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19926532

Country of ref document: EP

Kind code of ref document: A1