KR20220067180A

KR20220067180A - System for Voice recognition based automatic AI meeting record for multi-party video conference and method thereof

Info

Publication number: KR20220067180A
Application number: KR1020200153603A
Authority: KR
Inventors: 김종복
Original assignee: 유엔젤주식회사
Priority date: 2020-11-17
Filing date: 2020-11-17
Publication date: 2022-05-24

Abstract

The present invention relates to a system and a method for automatically writing an AI meeting record based on voice recognition during a multi-party video conference. The system according to one embodiment of the present invention includes: a signal processing unit performing call processing via at least one first user terminal and a voice call network and performing call processing via at least one second user terminal and a data network; a media processing unit decoding and mixing data transmitted from the first user terminal and the second user terminal, encoding the mixed data with a codec supported by the first user terminal and the second user terminal, and transmitting the data; an AI processing unit performing voice recognition processing and meeting record writing with respect to voice data transmitted from the first user terminal and the second user terminal; and a service processing unit sharing a meeting record subtitle generated by the AI processing unit with the second user terminal in real time.

Description

System for Voice recognition based automatic AI meeting record for multi-party video conference and method thereof

The present invention relates to a system and method for automatically creating AI meeting minutes based on voice recognition during a multi-party video conference.

Since the COVID-19 outbreak, non-face-to-face multi-party video conferencing has been increasing rapidly across industries and in daily life. However, most existing video conferencing solutions were developed around the basic functions of corporate video conferencing, such as multi-party video calls, chat, and document sharing. The AI meeting minutes functions that do use voice recognition AI technology produce, after the meeting ends, minutes in the form of a text file based on simple speech-to-text conversion, which lowers work efficiency.

In addition, existing AI meeting minutes products provide real-time captioning but not automatic minutes creation, so after a meeting users have had the inconvenience of writing the minutes manually from a transcript in text file form.

Therefore, the technical problem to be solved by the present invention is to provide a system and method for automatically creating AI meeting minutes based on voice recognition during a multi-party video conference.

To solve the above technical problem, a system for automatically creating AI meeting minutes based on voice recognition during a multi-party video conference according to an embodiment of the present invention includes: a signal processing unit that performs call processing with one or more first user terminals through a voice communication network and performs call processing with one or more second user terminals through a data network; a media processing unit that decodes and mixes data transmitted from the first user terminal and the second user terminal, encodes the mixed data with a codec supported by the first user terminal and the second user terminal, and delivers it; an AI processing unit that performs voice recognition processing and meeting minutes creation on voice data transmitted from the first user terminal and the second user terminal; and a service processing unit that shares the meeting minutes captions generated by the AI processing unit with the second user terminal in real time.

The AI processing unit may generate a summary of the meeting minutes by automatically analyzing the conversation contents of the meeting minutes.

The media processing unit may record voice data and video data transmitted from the first user terminal and the second user terminal.

The service processing unit may provide, at the request of a user terminal, voice data and video data corresponding to specific content selected from the meeting minutes or the summary of the meeting minutes.

The media processing unit may convert, in real time, data processed by the AMR (Adaptive Multi-Rate) codec and transmitted over the voice communication network, and data processed by the Opus codec and transmitted over the data network, into 8K or 16K linear PCM and deliver it to the AI processing unit.

For a Silence Insertion Descriptor (SID) or a lost packet, the media processing unit may copy a silence file without decoding; when an RTP packet is lost, it may copy and use the previous packet; and when RTP packets arrive out of order, the decoder may correct the order and output the voice in sequence.

The one or more first user terminals may be a landline telephone terminal or a wireless telephone terminal.

The one or more second user terminals may be a smartphone, a tablet PC, a desktop PC, or a notebook computer.

The one or more second user terminals may include a voice recognition engine and may provide a user voice recognition analysis result.

To solve the above technical problem, a method for automatically creating AI meeting minutes based on voice recognition during a multi-party video conference according to an embodiment of the present invention includes: performing, in a signal processing unit, call processing with one or more first user terminals through a voice communication network and call processing with one or more second user terminals through a data network; decoding and mixing, in a media processing unit, data transmitted from the first user terminal and the second user terminal, encoding the mixed data with a codec supported by the first user terminal and the second user terminal, and delivering it; performing, in an AI processing unit, voice recognition processing and meeting minutes creation on voice data transmitted from the first user terminal and the second user terminal; and sharing, by a service processing unit, the meeting minutes captions generated by the AI processing unit with the second user terminal in real time.

According to the present invention, a system and method for automatically creating AI meeting minutes based on voice recognition during a multi-party video conference can be provided. In particular, because both the data network and the voice communication network can be used, users located in areas with poor data network performance can also participate in the meeting. In addition, as the amount of information digitized through AI recognition increases, the market for AI data analysis technology can expand, and hearing-impaired users can also attend meetings smoothly by using STT captioning, text input, and TTS functions.

FIG. 1 is a view for explaining a system for automatically creating meeting minutes according to an embodiment of the present invention.
FIG. 2 is a detailed configuration diagram of the automatic meeting minutes creation system according to the present invention.
FIG. 3 is a diagram illustrating a video conference and real-time caption screen according to the present invention.
FIG. 4 is a diagram illustrating a voice playback screen for content selected from the meeting minutes according to the present invention.
FIG. 5 is a view for explaining the automatic meeting minutes summary function according to the present invention.

Embodiments of the present invention will now be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily implement them.

FIG. 1 is a block diagram of a system for automatically creating AI meeting minutes based on voice recognition during a multi-party video conference according to an embodiment of the present invention.

Referring to FIG. 1, the system 100 for automatically creating AI meeting minutes based on voice recognition during a multi-party video conference according to the present invention (hereinafter referred to as the 'automatic meeting minutes creation system') may be connected through a communication network 10 to one or more first user terminals 200 and/or one or more second user terminals 300 to exchange various kinds of information and data.

The communication network 10 may include a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), the Internet, 2G, 3G, 4G, 5G, and LTE mobile communication networks, Bluetooth, Wi-Fi, WiBro, a satellite communication network, and the like. Any communication method, wired or wireless, may be used.

The first user terminal 200 is a communication terminal used by a user of the automatic meeting minutes creation system 100, and may be a landline telephone terminal or a wireless telephone terminal. The first user terminal 200 may be connected to the automatic meeting minutes creation system 100 through the voice communication network to exchange voice data.

The second user terminal 300 is a communication terminal used by a user of the automatic meeting minutes creation system 100, and may be a terminal with memory and a microprocessor that gives it computing capability, such as a desktop PC, a smartphone, a tablet PC (Personal Computer), or a notebook computer. The second user terminal 300 may be connected to the automatic meeting minutes creation system 100 through the data network to exchange voice data and video data.

Users of the first user terminal 200 and the second user terminal 300 may participate in the same meeting at the same time through the automatic meeting minutes creation system 100. The user of the second user terminal 300 may receive the meeting minutes captions provided by the automatic meeting minutes creation system 100 in real time and, after the meeting ends, may review the meeting minutes and the voice or video recorded during the meeting.

Meanwhile, the user of the first user terminal 200 may likewise review the meeting minutes and the voice or video recorded during the meeting, during or after the call, by using a device separate from the first user terminal 200, such as a desktop PC, a notebook computer, or a smartphone.

FIG. 2 is a detailed configuration diagram of the automatic meeting minutes creation system according to the present invention.

Referring to FIG. 2, the automatic meeting minutes creation system 100 according to the present invention may include a signal processing unit 110, a media processing unit 120, an AI processing unit 130, and a service processing unit 140.

The signal processing unit 110 may perform call processing with one or more first user terminals 200 through the voice communication network and call processing with one or more second user terminals 300 through the data network. Specifically, the signal processing unit 110 may perform user authentication, session management, SIP call processing, media processing, WebRTC interworking, and convergence management of the data network and the voice communication network.

The media processing unit 120 may decode and mix data transmitted from the first user terminal 200 and the second user terminal 300, encode the mixed data with a codec supported by the first user terminal 200 and the second user terminal 300, and deliver it. Specifically, the media processing unit 120 may perform voice/video recording, QoS management, media mixing, real-time media conversion, RTP processing, and the like.
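The mixing step of the decode, mix, and encode path can be illustrated with a minimal sketch. It assumes that each terminal's audio has already been decoded to 16-bit linear PCM at a common sampling rate; the function name `mix_frames` and the saturating-sum approach are illustrative assumptions and are not taken from the specification.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: Sequence[Sequence[int]]) -> List[int]:
    """Mix equal-length 16-bit PCM frames by summing samples and clipping.

    Hypothetical helper: the specification does not state how mixing is done.
    """
    if not frames:
        return []
    length = len(frames[0])
    if any(len(f) != length for f in frames):
        raise ValueError("all frames must have the same number of samples")
    mixed = []
    for i in range(length):
        total = sum(f[i] for f in frames)
        # Saturate instead of wrapping so loud passages do not overflow int16.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

A practical conference mixer would typically also exclude each listener's own audio from the mix sent back to that listener; the sketch omits that step for brevity.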

The media processing unit 120 may convert, in real time, data processed by the AMR (Adaptive Multi-Rate) codec and transmitted over the voice communication network, and data processed by the Opus codec and transmitted over the data network, into 8K or 16K linear PCM that the AI processing unit 130 can recognize, and deliver it to the AI processing unit 130. It may also inversely convert data delivered from the AI processing unit 130.
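As a rough illustration of the final stage of this conversion, the sketch below takes samples that a codec decoder has already produced and packs them as 16-bit linear PCM. It reads "8K or 16K" as 8 kHz and 16 kHz sampling rates, which is an assumption, and it does not decode AMR or Opus itself. The helper names `float_to_pcm16` and `halve_rate` are hypothetical.

```python
import struct
from typing import List, Sequence


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Quantize samples in [-1.0, 1.0] to 16-bit little-endian linear PCM."""
    ints = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # guard against decoder overshoot
        ints.append(int(round(clipped * 32767)))
    return struct.pack("<%dh" % len(ints), *ints)


def halve_rate(samples: Sequence[float]) -> List[float]:
    """Crude 2:1 downsampling (for example 16 kHz to 8 kHz) by averaging pairs.

    A production resampler would use a proper low-pass filter; averaging
    adjacent samples is only a stand-in for this sketch.
    """
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]
```

Delivering one fixed linear PCM format to the recognizer means the AI processing unit does not need to know which network or codec a participant used.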

To optimize transcoding and AI recognition efficiency, the media processing unit 120 may copy a silence file without decoding in the case of a Silence Insertion Descriptor (SID) or a lost packet. When an RTP packet is lost, the media processing unit 120 may copy and use the previous packet. When RTP packets arrive out of order, the media processing unit 120 may have the decoder correct the order and output the voice in sequence.
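The three rules above can be sketched as a single pass over a batch of received packets. This is a simplified illustration under stated assumptions: packets are reordered by RTP sequence number, a SID packet yields a silence frame without calling the decoder, and a missing sequence number repeats the previous output frame (falling back to silence at the start of the stream). The `RtpPacket` type, the `conceal_and_order` function, the 320-byte silence frame, and the pluggable `decode` callable are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

SILENCE = bytes(320)  # assumed frame: 20 ms of 8 kHz, 16-bit PCM silence


@dataclass(frozen=True)
class RtpPacket:
    seq: int            # RTP sequence number (wraparound ignored here)
    payload: bytes
    is_sid: bool = False


def conceal_and_order(packets: Iterable[RtpPacket],
                      decode: Callable[[bytes], bytes]) -> List[bytes]:
    """Return PCM frames in sequence order, filling SID and lost packets."""
    by_seq: Dict[int, RtpPacket] = {p.seq: p for p in packets}
    if not by_seq:
        return []
    out: List[bytes] = []
    previous: Optional[bytes] = None
    for seq in range(min(by_seq), max(by_seq) + 1):
        pkt = by_seq.get(seq)
        if pkt is None:
            # Lost packet: reuse the previous frame, or silence if none yet.
            frame = previous if previous is not None else SILENCE
        elif pkt.is_sid:
            # SID: skip the decoder and emit the pre-built silence frame.
            frame = SILENCE
        else:
            frame = decode(pkt.payload)
        out.append(frame)
        previous = frame
    return out
```

Skipping the decoder for SID and lost packets avoids spending transcoding work on frames that carry no speech, which matches the stated aim of optimizing transcoding and recognition efficiency.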

The media processing unit 120 may record voice data and video data transmitted from the first user terminal 200 and the second user terminal 300.

The AI processing unit 130 may perform voice recognition processing and meeting minutes creation on voice data transmitted from the first user terminal 200 and the second user terminal 300. Specifically, the AI processing unit 130 may use artificial intelligence to perform conversation analysis (language analysis, named entity extraction, and the like), automatic summarization of meeting minutes (summary object extraction, structuring of related information, and the like), search, and real-time speech-to-text (STT) analysis.

The AI processing unit 130 may also use edge cloud technology to make use of user voice recognition analysis results provided by the second user terminal 300. To this end, the second user terminal 300 may include a lightweight voice recognition engine.

The AI processing unit 130 may generate a summary of the meeting minutes by automatically analyzing the conversation contents of the meeting minutes.

The service processing unit 140 provides a service that shares the meeting minutes captions generated by the AI processing unit 130 with the second user terminal 300 in real time.

FIG. 3 is a diagram illustrating a video conference and real-time caption screen according to the present invention.

The user of the second user terminal 300 may attend the meeting while viewing a video conference and real-time caption screen such as the one illustrated in FIG. 3.

FIG. 4 is a diagram illustrating a voice playback screen for content selected from the meeting minutes according to the present invention, and FIG. 5 is a diagram for explaining the automatic meeting minutes summary function according to the present invention.

On screens such as those illustrated in FIGS. 4 and 5, the service processing unit 140 may provide, at the request of the second user terminal 300, voice data and video data corresponding to specific content selected from the meeting minutes or the summary of the meeting minutes so that they are output. In addition, when a user who attended the meeting through the first user terminal 200 connects using a desktop PC, a notebook computer, a smartphone, or the like, reviews the meeting minutes or the summary, and requests voice data and video data corresponding to specific content, the service processing unit 140 may provide them as well.
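One way to serve such a request is to store, with each entry of the minutes, the time offsets of the utterance within the recording, and to look up those offsets when an entry is selected. The specification does not describe how the mapping is kept, so the `MinutesEntry` fields, the `playback_range` function, and the padding value below are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MinutesEntry:
    entry_id: int
    speaker: str
    text: str
    start_s: float  # offset into the recording where the utterance starts
    end_s: float    # offset where the utterance ends


def playback_range(entries: List[MinutesEntry], entry_id: int,
                   padding_s: float = 0.5) -> Optional[Tuple[float, float]]:
    """Return the (start, end) offsets to play for the selected entry.

    A little padding is added on both sides so playback does not clip the
    first or last word; the start is clamped to the recording's beginning.
    """
    for e in entries:
        if e.entry_id == entry_id:
            return (max(0.0, e.start_s - padding_s), e.end_s + padding_s)
    return None
```

For example, selecting an entry that spans 2.5 s to 4.0 s would yield the range 2.0 s to 4.5 s, which a player could then seek to in the recorded voice or video.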

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described as being used, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored in one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of the system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Claims

1. A system for automatically creating AI meeting minutes based on voice recognition during a multi-party video conference, the system comprising:
a signal processing unit that performs call processing with one or more first user terminals through a voice communication network and performs call processing with one or more second user terminals through a data network;
a media processing unit that decodes and mixes data transmitted from the first user terminal and the second user terminal, encodes the mixed data with a codec supported by the first user terminal and the second user terminal, and delivers it;
an AI processing unit that performs voice recognition processing and meeting minutes creation on voice data transmitted from the first user terminal and the second user terminal; and
a service processing unit that shares the meeting minutes captions generated by the AI processing unit with the second user terminal in real time.

2. The system of claim 1, wherein the AI processing unit automatically analyzes the conversation contents of the meeting minutes and generates a summary of the meeting minutes.

3. The system of claim 2, wherein
the media processing unit records voice data and video data transmitted from the first user terminal and the second user terminal, and
the service processing unit provides, at the request of a user terminal, voice data and video data corresponding to specific content selected from the meeting minutes or the summary of the meeting minutes.

4. The system of claim 3, wherein the media processing unit converts, in real time, data processed by the AMR (Adaptive Multi-Rate) codec and transmitted over the voice communication network, and data processed by the Opus codec and transmitted over the data network, into 8K or 16K linear PCM and delivers it to the AI processing unit.

5. The system of claim 3, wherein the media processing unit
copies a silence file without decoding in the case of a Silence Insertion Descriptor (SID) or a lost packet,
copies and uses the previous packet when an RTP packet is lost, and
when RTP packets arrive out of order, has the decoder correct the order and output the voice in sequence.

6. The system of claim 1, wherein
the one or more first user terminals are a landline telephone terminal or a wireless telephone terminal, and
the one or more second user terminals are a smartphone, a tablet PC, a desktop PC, or a notebook computer.

7. The system of claim 1, wherein the one or more second user terminals include a voice recognition engine and provide a user voice recognition analysis result.

8. A method for automatically creating AI meeting minutes based on voice recognition during a multi-party video conference, the method comprising:
performing, in a signal processing unit, call processing with one or more first user terminals through a voice communication network and call processing with one or more second user terminals through a data network;
decoding and mixing, in a media processing unit, data transmitted from the first user terminal and the second user terminal, encoding the mixed data with a codec supported by the first user terminal and the second user terminal, and delivering it;
performing, in an AI processing unit, voice recognition processing and meeting minutes creation on voice data transmitted from the first user terminal and the second user terminal; and
sharing, by a service processing unit, the meeting minutes captions generated by the AI processing unit with the second user terminal in real time.

9. The method of claim 8, further comprising generating, in the AI processing unit, a summary of the meeting minutes by automatically analyzing the conversation contents of the meeting minutes.

10. The method of claim 9, further comprising:
recording, in the media processing unit, voice data and video data transmitted from the first user terminal and the second user terminal; and
providing, in the service processing unit, at the request of a user terminal, voice data and video data corresponding to specific content selected from the meeting minutes or the summary of the meeting minutes.

11. The method of claim 10, further comprising converting, in the media processing unit and in real time, data processed by the AMR (Adaptive Multi-Rate) codec and transmitted over the voice communication network, and data processed by the Opus codec and transmitted over the data network, into 8K or 16K linear PCM and delivering it to the AI processing unit.

12. The method of claim 11, wherein
the media processing unit copies a silence file without decoding in the case of a Silence Insertion Descriptor (SID) or a lost packet, copies and uses the previous packet when an RTP packet is lost, and, when RTP packets arrive out of order, has the decoder correct the order and output the voice in sequence,
the one or more first user terminals are a landline telephone terminal or a wireless telephone terminal, and the one or more second user terminals are a smartphone, a tablet PC, a desktop PC, or a notebook computer, and
the one or more second user terminals include a voice recognition engine and provide a user voice recognition analysis result.

13. A computer-readable recording medium on which a program for executing, on a computer, the method of any one of claims 7 to 12 is recorded.