KR20230098077A

KR20230098077A - Method, apparatus and system for signal processing for AI neural network operation

Info

Publication number: KR20230098077A
Application number: KR1020220183660A
Authority: KR
Inventors: 조경환; 박영진; 정계영
Original assignee: 한국전기연구원
Priority date: 2021-12-24
Filing date: 2022-12-23
Publication date: 2023-07-03

Abstract

The present invention relates to a signal processing method, apparatus, and system for artificial intelligence neural network operations, and more specifically to a low-power, low-complexity, high-speed signal processing method, apparatus, and system for providing artificial intelligence functions in low-power devices such as edge computing devices, IoT devices, and hearing aids.
The present invention discloses a signal processing apparatus comprising: one or more digital signal processing units that perform digital signal processing; and an artificial intelligence processing engine that performs neural network operations in conjunction with the digital signal processing units, wherein the artificial intelligence processing engine includes a binary neural network processing unit that processes binary data, an integer neural network processing unit that processes integer data, and a core unit that controls the binary neural network processing unit and the integer neural network processing unit.

Description

Signal processing method, apparatus and system for artificial intelligence neural network operation {Method, apparatus and system for signal processing for AI neural network operation}

The present invention relates to a signal processing method, apparatus, and system for artificial intelligence neural network operations, and more specifically to a low-power, low-complexity, high-speed signal processing method, apparatus, and system for providing artificial intelligence functions in low-power devices such as edge computing devices, IoT devices, and hearing aids.

Recently, a wide range of terminal devices, such as wearable devices for acquiring biosignals, and services built on them, such as health management services, have been growing rapidly.

In this regard, wearable devices have moved beyond medical use, and products positioned as wellness products for the general public have been released and are widely used.

More specifically, Apple's smartwatch, the Apple Watch, and its wireless earphones, AirPods, have won customers on the strength of brand recognition and excellent usability and continue to accumulate health management data, so they are expected to remain highly competitive in this field for a considerable time.

However, health management services using such wearable devices are currently built on cloud-computing-based healthcare platforms.

More specifically, as shown in FIG. 1, platforms have recently been attempted that use Internet of Things (IoT) devices to collect various lifelog data and transmit the collected data to the cloud in order to provide data-driven healthcare services. Furthermore, various services are being attempted based on the vast amount of information generated by wearable devices and various types of sensors.

In this regard, fog or edge computing technology has recently been used to address the latency and personal-information security issues of such cloud-platform-based healthcare services.

That is, conventional cloud computing services raise issues such as round-trip delay, network congestion, and security, which limit their applicability to mobile healthcare.

In addition, for applications that require immediate real-time feedback, such as remote monitoring or remote status checks, communication between the user and a remote cloud server can cause serious problems such as high round-trip delay and network congestion.

In response, as shown in FIG. 2, attempts have recently been made to extend existing cloud computing by applying new, highly scalable fog computing or edge computing platforms, which can resolve the above problems, to mobile healthcare.

More specifically, the fog computing topology was first introduced by Cisco and is also referred to as edge computing. Fog computing extends computing power and data storage capacity to the edge of the network, close to Internet of Things (IoT) devices. Because applications and services are then located at the network edge, real-time data analysis and response can be performed immediately.

However, fog (edge) computing requires that the wearable terminal devices used by users be equipped with a classifier for performing artificial intelligence (AI) functions.

Because a wearable terminal device typically runs on battery power and has limited computing capability, the classifier embedded in it must minimize power consumption. It is also important to design the classifier for low computational complexity so that the chip can be made small, and in particular to optimize memory usage so that the chip as a whole is optimized.

Republic of Korea Patent Publication No. 10-2017-0045099 (2017.04.26)

The present invention was devised to solve the above problems of the prior art, and its object is to provide a signal processing method, apparatus, and system for artificial intelligence neural network operations that can be applied to low-power devices such as wearable healthcare terminals to provide artificial intelligence functions.

Other detailed objects of the present invention will be clearly identified and understood by experts and researchers in this technical field from the specific details described below.

A signal processing apparatus according to an embodiment of the present invention comprises: one or more digital signal processing units that perform digital signal processing; and an artificial intelligence processing engine that performs neural network operations in conjunction with the digital signal processing units, wherein the artificial intelligence processing engine includes a binary neural network processing unit that processes binary data, an integer neural network processing unit that processes integer data, and a core unit that controls the binary neural network processing unit and the integer neural network processing unit.

In this case, the binary neural network processing unit and the integer neural network processing unit may perform signal processing on the binary data or the integer data based on a convolutional neural network (CNN).

In addition, the binary neural network processing unit may process multiplication using the bitwise exclusive NOR (XNOR) operation.

In addition, the integer neural network processing unit may perform signal processing on the integer data produced through quantization.

In addition, the core unit may perform signal processing on data that the binary neural network processing unit and the integer neural network processing unit do not support.

In addition, the artificial intelligence processing engine may include a dedicated artificial intelligence processing engine memory used exclusively for performing the neural network operations.

In addition, the binary neural network processing unit may include: one or more neural networks; one or more scalers that perform batch normalization on the 1-1 data output from the one or more neural networks; an accumulator that performs an accumulation operation on the 1-2 data output from the one or more scalers; and a pooling unit that generates output data.

In this case, when the binary neural network processing unit operates in an iterative operation mode in which an operation is divided into a plurality of steps, the accumulator may store the result of an intermediate step so that the result of a subsequent step can be added to it.

In addition, the one or more scalers and the pooling unit may be activated in the last of the plurality of steps to produce the final operation result.

In addition, the integer neural network processing unit may include: one or more neural networks; one or more scalers that perform batch normalization on the 2-1 data output from the one or more neural networks; an accumulator that performs an accumulation operation on the 2-2 data output from the one or more scalers; and a pooling unit that generates output data.

In this case, when the integer neural network processing unit operates in an iterative operation mode in which an operation is divided into a plurality of steps, the accumulator may store the result of an intermediate step so that the result of a subsequent step can be added to it.

Accordingly, the low-complexity, high-speed signal processing method, apparatus, and system for artificial intelligence neural network operations according to an embodiment of the present invention can provide a high-performance artificial intelligence classifier that runs at low power by using a structure optimized for binary neural networks.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention pertains from the contents of this specification.

The accompanying drawings, which are included as part of the detailed description to aid understanding of the present invention, provide embodiments of the present invention and, together with the detailed description, explain its technical idea.
FIG. 1 is a diagram illustrating the configuration of a healthcare system based on Internet of Things (IoT) devices and cloud computing according to the prior art.
FIG. 2 is a diagram illustrating the hierarchical structure and operating method of fog computing according to the prior art.
FIG. 3 is a block diagram of a signal processing apparatus 100 according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the configuration of the signal processing apparatus 100 according to an embodiment of the present invention.
FIG. 5 is a configuration diagram of the artificial intelligence processing engine 110 of the signal processing apparatus 100 according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a specific operation process in the signal processing apparatus 100 according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating signal processing simulation results for the signal processing apparatus 100 according to an embodiment of the present invention.
FIGS. 8 to 13 are diagrams explaining specific operations of the signal processing apparatus 100 according to an embodiment of the present invention.
FIG. 14 is a diagram illustrating a specific configuration of a signal processing apparatus 200 according to another embodiment of the present invention.

It should be noted that the technical terms used in the present invention are used only to describe specific embodiments and are not intended to limit the scope of the present invention. Unless specifically defined otherwise in the present invention, the technical terms used herein should be interpreted as they are generally understood by those of ordinary skill in the art to which the present invention pertains, and should not be interpreted in an excessively broad or excessively narrow sense. If a technical term used in the present invention is an incorrect term that fails to express the idea of the present invention accurately, it should be replaced by, and understood as, a technical term that those skilled in the art can correctly understand. General terms used in the present invention should be interpreted according to their dictionary definitions or according to the surrounding context, and should not be interpreted in an excessively narrow sense.

In addition, singular expressions used in the present invention include plural expressions unless the context clearly indicates otherwise. In the present invention, terms such as "comprises" or "includes" should not be construed as necessarily including all of the components or steps described in the invention; some of those components or steps may be omitted, and additional components or steps may be included.

In addition, terms that include ordinal numbers, such as "first" and "second," may be used in the present invention to describe components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be called a second component, and similarly a second component may be called a first component.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The same or similar components are given the same reference numerals regardless of the drawing, and redundant descriptions of them are omitted.

In addition, where a detailed description of related known technology is judged likely to obscure the gist of the present invention, that description is omitted. It should also be noted that the accompanying drawings are intended only to make the technical idea of the present invention easy to understand, and the technical idea of the present invention should not be construed as limited by them.

Hereinafter, a signal processing method, apparatus, and system for artificial intelligence neural network operations according to an embodiment of the present invention are described with reference to the drawings.

First, as shown in FIG. 3, the signal processing apparatus 100 for artificial intelligence neural network operations according to an embodiment of the present invention may include one or more digital signal processing units 120 that perform digital signal processing and an artificial intelligence processing engine 110 that performs neural network operations in conjunction with the digital signal processing units 120.

In this case, the artificial intelligence processing engine 110 may include a binary neural network processing unit 112a that processes binary data, an integer neural network processing unit 112b that processes integer data, and a core unit 111 that controls the binary neural network processing unit 112a and the integer neural network processing unit 112b.

In addition, as shown in FIG. 3, the artificial intelligence processing engine 110 may include a dedicated artificial intelligence processing engine memory used exclusively for performing the neural network operations.

Here, the binary neural network processing unit 112a and the integer neural network processing unit 112b may perform signal processing on the binary data or the integer data based on a convolutional neural network (CNN).

In addition, the binary neural network processing unit 112a may process multiplication using the bitwise exclusive NOR (XNOR) operation.
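
The XNOR-based multiplication can be illustrated with a minimal sketch. It assumes the encoding commonly used in binary networks, in which bit 1 stands for +1 and bit 0 for -1; the source does not state the encoding, and the function names are hypothetical.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_dot(a_bits, w_bits, n):
    """Dot product of two n-element +1/-1 vectors stored as bit words.

    XNOR yields 1 where the signs agree (product +1) and 0 where they
    differ (product -1), so dot = matches - mismatches = 2 * matches - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    return 2 * matches - n


def reference_dot(a, w):
    """Plain multiply-accumulate, used only to check the bitwise version."""
    return sum(x * y for x, y in zip(a, w))
```

Under this encoding, each multiply in the dot product collapses to a single bit operation, and the sum becomes a population count.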

In addition, the integer neural network processing unit 112b may perform signal processing on the integer data produced through quantization.
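
The source says only that the integer data is produced through quantization and does not specify a scheme. As one possible illustration, the sketch below uses symmetric uniform quantization to signed integers; the 8-bit width, the scale selection, and the function names are all assumptions.

```python
def quantize(values, num_bits=8):
    """Symmetric uniform quantization of floats to signed integers.

    Returns (ints, scale) such that value is approximately int * scale.
    """
    qmax = (1 << (num_bits - 1)) - 1              # 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    ints = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return ints, scale


def dequantize(ints, scale):
    """Map quantized integers back to approximate real values."""
    return [q * scale for q in ints]
```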

In addition, the core unit 111 may perform signal processing on data that the binary neural network processing unit 112a and the integer neural network processing unit 112b do not support.

In addition, the binary neural network processing unit 112a may include one or more neural networks 1122a, one or more scalers 1123a that perform batch normalization on the 1-1 data output from the one or more neural networks 1122a, an accumulator 1124a that performs an accumulation operation on the 1-2 data output from the one or more scalers 1123a, and a pooling unit 1125a that generates output data.

In this case, when the binary neural network processing unit 112a operates in an iterative operation mode in which an operation is divided into a plurality of steps, the accumulator 1124a may store the result of an intermediate step so that the result of a subsequent step can be added to it.

In addition, in the binary neural network processing unit 112a, the one or more scalers 1123a and the pooling unit 1125a may be activated in the last of the plurality of steps to produce the final operation result.
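
The iterative operation mode can be sketched as follows. This is an illustration under stated assumptions, not the disclosed circuit: each step is assumed to deliver a list of partial sums (for example, from a subset of input channels), the scaler is modeled as an affine scale and shift with hypothetical parameters `gamma` and `beta`, and the pooling unit is modeled as non-overlapping max pooling.

```python
def iterative_layer(partial_sums_per_step, gamma, beta, pool=2):
    """Accumulate per-step partial sums, then scale and pool once at the end.

    partial_sums_per_step: list of steps, each a list of partial outputs
    (one value per output position) produced by the network block.
    """
    acc = [0] * len(partial_sums_per_step[0])     # accumulator contents
    last = len(partial_sums_per_step) - 1
    for step, partial in enumerate(partial_sums_per_step):
        # Every step: add this step's result to the stored intermediate result.
        acc = [a + p for a, p in zip(acc, partial)]
        if step == last:
            # Final step only: the scaler and the pooling unit are enabled.
            scaled = [gamma * a + beta for a in acc]
            return [max(scaled[i:i + pool]) for i in range(0, len(scaled), pool)]
```

Keeping the scaler and pooling idle until the last step means the intermediate steps touch only the accumulator, which is the behavior the text describes.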

In addition, the integer neural network processing unit 112b may include one or more neural networks 1122b, one or more scalers 1123b that perform batch normalization on the 2-1 data output from the one or more neural networks 1122b, an accumulator 1124b that performs an accumulation operation on the 2-2 data output from the one or more scalers 1123b, and a pooling unit 1125b that generates output data.

In this case, when the integer neural network processing unit 112b operates in an iterative operation mode in which an operation is divided into a plurality of steps, the accumulator 1124b may store the result of an intermediate step so that the result of a subsequent step can be added to it.

In addition, in the integer neural network processing unit 112b, the one or more scalers 1123b and the pooling unit 1125b may be activated in the last of the plurality of steps to produce the final operation result.

Accordingly, the signal processing method, apparatus, and system for artificial intelligence neural network operations according to an embodiment of the present invention can use multiple integrated memories in parallel to enable high-bandwidth data processing, and can apply a design structure optimized for convolutional neural networks (CNNs) and the like (for example, a convolution module design and a pipeline structure).

In addition, the signal processing method, apparatus, and system according to an embodiment of the present invention can reduce computational complexity by using a binary neural network and a quantized integer neural network. Furthermore, multiplication in the binary neural network can be processed as a bitwise exclusive NOR (XNOR) operation, which dramatically increases the number of operations that can be processed while reducing power consumption and the size of the network parameters (weights).

In this regard, in the signal processing method, apparatus, and system according to an embodiment of the present invention, the artificial intelligence processing engine (AI Engine) may be configured as a separate block inside the digital signal processing (DSP) module.

More specifically, the architecture of the artificial intelligence processing engine (AI Engine) is as illustrated in FIG. 4.

As shown in FIG. 4, the signal processing apparatus 100, such as a wearable device, according to an embodiment of the present invention may include an artificial intelligence processing engine (AI Engine) 110. The AI Engine 110 may include a core unit 111 capable of performing digital signal processing (DSP) and the like, a neural network processing unit 112 that may be composed of a convolutional neural network (CNN) accelerator or the like, and a dedicated artificial intelligence processing engine memory 113.

In this case, the neural network processing unit 112 may include a binary neural network processing unit 112a and an integer neural network processing unit 112b.

More specifically, in the signal processing apparatus 100, such as a wearable device, according to an embodiment of the present invention, as shown in FIG. 4, the AI Engine 110 may be placed inside the DSP top-level structure (DSP Top) that performs general signal processing (for example, audio signal processing), so that AI functions can be performed on signal data with little overhead.

In this case, the dedicated artificial intelligence processing engine memory (AIE Memory) 113 inside the AI Engine 110 is reserved for the AI Engine so that it can deliver maximum performance, and its bandwidth can be made much larger than that of general memory (for example, the shared memory has a data width of 32 bits, whereas the AIE Memory has a data width of 256 bits).
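
The two data widths quoted above imply an 8x difference in bits moved per access. As a rough illustration, the sketch below counts the accesses needed to read the weights of one binary layer. The layer shape (3 x 3 kernels, 64 input and 64 output channels) is hypothetical, and the comparison assumes one access per cycle at the same clock rate, which the source does not state.

```python
def accesses_needed(total_bits, width_bits):
    """Number of memory accesses to move total_bits at width_bits per access (ceiling)."""
    return -(-total_bits // width_bits)


# Hypothetical binary layer: 3x3 kernels, 64 input and 64 output channels, 1 bit per weight.
weight_bits = 3 * 3 * 64 * 64

shared_accesses = accesses_needed(weight_bits, 32)    # shared memory, 32-bit width
aie_accesses = accesses_needed(weight_bits, 256)      # AIE memory, 256-bit width
print(shared_accesses, aie_accesses, shared_accesses // aie_accesses)
```

With 1-bit weights, a single 256-bit access also delivers 256 binary operands at once, which fits naturally with the XNOR-based processing in the binary unit.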

In this case, the basic convolutional neural network functions are performed by the neural network processing unit 112, while inter-module control and data processing functions not provided by the neural network processing unit 112 are performed by the core unit 111, which has a dedicated digital signal processing (DSP) core.

More specifically, in the signal processing apparatus 100, such as a wearable device, according to an embodiment of the present invention, the core unit 111 may include a dedicated DSP core so that it can control the AI Engine 110 or perform additional data processing. The core unit 111 may also be configured to allow a design optimized for binarized neural networks and to minimize computational complexity.

또한, 본 발명의 일 실시예에 따른 웨어러블 기기 등 신호 처리 장치(100)에서는, 입/출력단 및 필요 부분에 대하여 정수 연산이 가능하도록 구성할 수 있으며, 보다 구체적으로, 도 5에서 볼 수 있는 바와 같이, 크게 이진 합성 신경망 가속기(Binary CNN Accelerator)를 구비하는 이진 신경망 처리부(112a)와 정수 합성 신경망 가속기(Integer CNN Accelerator)(112b)를 구비하는 정수 신경망 처리부(112b)를 포함하여 구성될 수 있다.In addition, in the signal processing device 100 such as a wearable device according to an embodiment of the present invention, it can be configured to enable integer calculation for input/output terminals and necessary parts, and more specifically, as shown in FIG. Similarly, it can be configured to include a binary neural network processing unit 112a having a binary neural network accelerator and an integer neural network processing unit 112b having an integer neural network accelerator 112b. .

Here, whether a general integer neural network is configured or a binary neural network is used, the input and output stages of the overall neural network (which use integer data) may be processed using the integer neural network processing unit 112b having the integer CNN accelerator.

Here, in the case of a binary convolutional neural network (CNN), the computation can be carried out using the XNOR (exclusive NOR) operation, which is a bitwise operation, so that complexity can be significantly reduced.
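The reduction in complexity can be illustrated with a minimal sketch that is not taken from the specification. It assumes the encoding commonly used for binarized networks, in which a bit value of 1 stands for +1 and a bit value of 0 stands for -1, so that a dot product reduces to an XNOR followed by a bit count. The function names `binarize` and `xnor_dot` and the sign threshold at zero are illustrative assumptions.

```python
def binarize(values):
    """Pack a list of numbers into an int: bit i is 1 if values[i] >= 0 (assumed encoding)."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits, w_bits, n):
    """Dot product of two n-element {-1, +1} vectors stored as packed bits."""
    mask = (1 << n) - 1
    # XNOR marks the positions where both signs agree; count those positions.
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    # Each match contributes +1 and each mismatch contributes -1.
    return 2 * matches - n


def signs(values):
    """Reference {-1, +1} representation used only to check the result."""
    return [1 if v >= 0 else -1 for v in values]


activations = [0.5, -1.2, 3.0, -0.1]
weights = [1.0, 1.0, -1.0, -1.0]

reference = sum(a * w for a, w in zip(signs(activations), signs(weights)))
result = xnor_dot(binarize(activations), binarize(weights), len(activations))
```

Under these assumptions, the multiply-accumulate over n channels is replaced by one XNOR and one population count over an n-bit word.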

In this regard, FIG. 6 illustrates a case in which the configuration according to the present invention is applied to an example quantized neural network that classifies 16 x 16 spectrum features.

Here, as a simple example of classifying a noise environment using features of a sound signal, the signal processing device 100 according to an embodiment of the present invention uses as its input a mel spectrogram (MelSpectrogram) arranged as a 16 x 16 two-dimensional (2D) feature.

Accordingly, as shown in FIG. 6, in the first stage an integer 3 x 3 2D convolution is performed, followed by binarization; thereafter, a 3 x 3 2D binary convolution is performed twice, and classification is then performed through a fully connected layer and SoftMax.

FIG. 7 illustrates signal processing simulation results for the signal processing device 100, such as a wearable device, according to an embodiment of the present invention.

More specifically, FIG. 7 illustrates the simulation results corresponding to the portion marked (A) in FIG. 6.

In FIG. 7, 1) the input data and kernel coefficients are copied to the memory 113 dedicated to the artificial intelligence processing engine (AIE), 2) the kernel coefficients are loaded into the artificial intelligence processing engine (AIE), and 3) a 2D binary convolution is performed [(W, H, InCh, OutCh) = (16, 16, 32, 32), approximately 262 kOps]. It can be seen that the total time required is 361 ns (= 726 MOps/sec at a 5.12 MHz clock (CLK)).
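The quoted figures can be cross-checked with simple arithmetic. The sketch below only recomputes values from the stated layer dimensions, throughput and clock frequency; the variable names are illustrative. With these figures, 262,144 operations at 726 MOps/sec take about 361 microseconds, so the quoted duration matches the quoted throughput only if it is read in microseconds. That reading is an inference from the arithmetic, not something the specification states.

```python
# Layer dimensions quoted for the FIG. 7 simulation.
W, H, IN_CH, OUT_CH = 16, 16, 32, 32
ops = W * H * IN_CH * OUT_CH             # operation count as defined in the text

THROUGHPUT_OPS_PER_S = 726e6             # quoted throughput
CLOCK_HZ = 5.12e6                        # quoted clock frequency

# Time needed for `ops` operations at the quoted throughput, in microseconds.
elapsed_us = ops / THROUGHPUT_OPS_PER_S * 1e6

# Operations completed per clock cycle implied by throughput and clock.
ops_per_cycle = THROUGHPUT_OPS_PER_S / CLOCK_HZ

print(ops, round(elapsed_us), round(ops_per_cycle, 1))
```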

As such, the signal processing device 100 according to an embodiment of the present invention can provide a low-complexity artificial intelligence classifier, and a signal processing method using that classifier, applicable to wearable healthcare devices and the like that apply artificial intelligence technology to data such as personal biometric information and situational awareness in daily life in order to deliver personalized healthcare services in real time.

Accordingly, through artificial intelligence classifier technology applicable to wearable healthcare devices and the like, new wearable healthcare devices capable of real-time personalized healthcare, and healthcare services to which artificial intelligence edge computing technology is applied, the present invention can provide users with better personalized health management services and, further, enables the development of new artificial intelligence products.

In addition, FIG. 8 illustrates the format of binary input/output data in the signal processing device 100 according to an embodiment of the present invention.

More specifically, FIG. 8(a) illustrates a case in which the lower 32 bits of a total of 256 bits are valid and a bit is allocated to each channel; this format is useful when both the input and the output are in binary format.

More specifically, as in the case on the right in FIG. 8(a), this format is useful when binarization appears repeatedly in a sequential arrangement and high precision is therefore not required.

In addition, FIG. 8(b) illustrates a case in which valid binary values are present at 8-bit or 16-bit intervals within the total of 256 bits; this format is useful when the output results need to be preserved.

More specifically, this format is useful when precision must be maintained, as in the portion indicated by the solid red line in the right-hand case of FIG. 8(a); all of the data is stored, and binarization is performed at the moment the data is loaded for use.
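The two binary formats can be sketched as follows. This is an illustration under stated assumptions rather than the layout defined by FIG. 8: channel c is assumed to occupy bit c in format (a), each channel is assumed to occupy one two's-complement field of 8 or 16 bits in format (b), and binarization on load is assumed to take the sign of that field. The names `pack_dense`, `pack_fields` and `load_and_binarize` are hypothetical.

```python
WORD_BITS = 256  # total word width stated for FIG. 8


def pack_dense(channel_bits):
    """Format (a) sketch: one bit per channel, stored in the low bits of the word."""
    word = 0
    for c, b in enumerate(channel_bits):
        word |= (b & 1) << c
    return word


def pack_fields(values, width):
    """Format (b) sketch: one signed `width`-bit field per channel (assumed layout)."""
    assert len(values) * width <= WORD_BITS
    mask = (1 << width) - 1
    word = 0
    for c, v in enumerate(values):
        word |= (v & mask) << (c * width)   # two's-complement truncation
    return word


def load_and_binarize(word, width, channels):
    """Read each field and binarize it while loading: 1 if the value is non-negative."""
    bits = []
    for c in range(channels):
        field = (word >> (c * width)) & ((1 << width) - 1)
        sign = field >> (width - 1)         # most significant bit of the field
        bits.append(1 - sign)
    return bits


values = [5, -3, 0, -128]                   # full-precision results kept in format (b)
word = pack_fields(values, 8)
bits = load_and_binarize(word, 8, len(values))
dense = pack_dense(bits)                    # same information in format (a)
```

In this sketch format (b) keeps the full 8-bit results available for later use, while format (a) holds only the single bit per channel that a following binary layer consumes.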

Furthermore, FIG. 9 illustrates the format of integer input/output data in the signal processing device 100 according to an embodiment of the present invention.

As shown in FIG. 9, integer input/output data may be organized in units of 8 bits or 16 bits.

Here, the number of valid bits may vary depending on the number of 2D CNN modules that can operate in parallel.

As a more specific example, when there are four 2D CNN modules, only input channels CH0 to CH3 can be processed, so only 8 * 4 = 32 bits constitute valid input.

In addition, FIG. 10 illustrates the data format in the iteration operation mode, in which, when data of a size that the neural network processing unit 112 cannot process at once is input, the input data is divided and processed in a plurality of steps.

More specifically, in the signal processing device 100 according to an embodiment of the present invention, when more channels must be processed than the neural network processing unit 112 can handle at once (a limit determined by the number of CNN modules and the input data format), the processing must be split into several passes. The accumulated intermediate result is therefore stored, and in the next pass the stored value is loaded again and accumulation continues.

For example, as shown in FIG. 10, when the input has 64 channels, exceeding the 32 channels that can be processed at once, the input is processed sequentially in two passes.

Here, in the first step the accumulated result for CH0 to CH31 is stored; then, when CH32 to CH63 are processed in the second step, the previously accumulated result for CH0 to CH31 is loaded and accumulated together with the new result to produce the output value.
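The two-pass example can be sketched in software as follows. This is a simplified model, assuming a plain channel-wise weighted sum in place of the convolution; the function name `accumulate_in_passes` and the test data are illustrative and do not come from the specification.

```python
MAX_CHANNELS_PER_PASS = 32  # channels handled at once in the FIG. 10 example


def accumulate_in_passes(inputs, weights, max_ch=MAX_CHANNELS_PER_PASS):
    """Return (output value, number of passes) for a channel-wise weighted sum."""
    stored = 0   # stands in for the intermediate result kept between passes
    passes = 0
    for start in range(0, len(inputs), max_ch):
        stop = min(start + max_ch, len(inputs))
        partial = sum(inputs[c] * weights[c] for c in range(start, stop))
        # Load the stored value, add the new partial result, store it again.
        stored = stored + partial
        passes += 1
    return stored, passes


# 64 input channels, so two passes of 32 channels each are needed.
inputs = [(c % 5) - 2 for c in range(64)]
weights = [1 if c % 3 else -1 for c in range(64)]

output, passes = accumulate_in_passes(inputs, weights)
single_pass = sum(i * w for i, w in zip(inputs, weights))
```

The result after the last pass equals what a single pass over all 64 channels would produce, which is why only the running accumulation needs to be stored between passes.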

Here, because the number of input/output bits is limited, the amount of computation that can be processed at once may be constrained, in terms of the number of output channels, by the number of bits required for accumulation.

For example, when accumulation requires 32 bits, outputs for at most 8 channels can be processed at once (e.g., 256 bits (total output width) / 32 bits (bits per channel) = 8 channels).
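The two capacity limits described for the integer format and for the iteration mode can be written as small helper functions. This is a sketch only: the helper names are hypothetical, and applying the 256-bit width to the integer input word is an assumption carried over from the width stated for the binary format and the output.

```python
WORD_BITS = 256  # total input/output width used in the examples


def valid_input_bits(num_cnn_modules, bits_per_channel, word_bits=WORD_BITS):
    """Valid input bits when each parallel 2D CNN module consumes one channel."""
    return min(num_cnn_modules * bits_per_channel, word_bits)


def max_output_channels(accumulator_bits, word_bits=WORD_BITS):
    """Output channels that fit in one word when each needs `accumulator_bits`."""
    return word_bits // accumulator_bits


print(valid_input_bits(4, 8))      # four modules with 8-bit channels
print(max_output_channels(32))     # 32-bit accumulation per output channel
```

With four modules and 8-bit channels the first helper gives the 32 valid bits of the earlier example, and with 32-bit accumulation the second gives the 8-channel limit stated above.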

In addition, FIG. 11 illustrates the operation of the binary neural network processing unit 112a in the non-iteration operation mode, in which the input data is processed at once.

Referring to the processing path marked (B) in FIG. 11, when both the input and the output have 32 channels or fewer and the CNN modules are consecutive so that the full results need not be stored, the binarized results may be stored instead, which allows the binary convolutional neural network (BCNN) module to deliver its maximum performance. Because processing completes in a single step, the output value can be produced by driving, as needed, the scaler 1123a, which performs batch normalization, or the pooling unit 1125a (the accumulator 1124a, which performs accumulation, need not be driven).

By contrast, FIGS. 12A to 12C and FIGS. 13A to 13C illustrate the operation of the binary neural network processing unit 112a and the integer neural network processing unit 112b in the iteration operation mode, in which, when data of a size that the neural network processing unit 112 cannot process at once is input, the input data is divided and processed in a plurality of steps.

As a more specific example, in the signal processing device 100 according to an embodiment of the present invention, intermediate results may need to be stored when the input must be processed in several passes, for example because it exceeds 32 channels (e.g., (b) of FIG. 12A), or when processing in a single pass is difficult because of the data processing path (e.g., (c) of FIG. 12A).

First, referring to the processing paths marked (C1) in FIG. 12A and (D1) in FIG. 13A, in the first iteration step the operation is performed and, at the final store, binarization is not performed so that as many bits as possible are stored; the maximum number of channels that can be processed may be limited by the number of bits stored.

Next, referring to the processing paths marked (C2) in FIG. 12B and (D2) in FIG. 13B, in the intermediate iteration steps (from the second step up to the step before the last), the intermediate value stored in the previous iteration step is loaded, the accumulators (Accumulator/Adder) 1124a, 1124b perform accumulation, and the result is stored again.

In addition, referring to the processing paths marked (C3) in FIG. 12C and (D3) in FIG. 13C, in the last iteration step the scalers 1123a, 1123b, which perform batch normalization, and the pooling units (Max Pool 2D) 1125a, 1125b may be run as needed, and binarization may be performed at storage time to match the format of the next layer.
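The behavior of the first, intermediate and last iteration steps, together with the non-iteration mode, can be summarized as a small control table. The sketch below is an interpretation of the description and not the actual control logic: the function `stage_config` and its flag names are hypothetical, and treating the accumulator as active whenever a stored value is reloaded is an assumption.

```python
def stage_config(step, total_steps):
    """Describe which operations one step of a layer may use.

    `step` is 0-based; total_steps == 1 models the non-iteration mode.
    """
    first = step == 0
    last = step == total_steps - 1
    return {
        "load_previous": not first,          # reload the stored intermediate value
        "accumulate": not first,             # accumulator adds it to the new result
        "scaler_allowed": last,              # batch normalization, if needed
        "pooling_allowed": last,             # pooling, if needed
        "binarize_on_store_allowed": last,   # otherwise keep as many bits as possible
    }


# Three-step iteration: first, intermediate and last step.
configs = [stage_config(s, 3) for s in range(3)]

# Non-iteration mode: a single step that is both first and last.
single = stage_config(0, 1)
```

In the single-step case the accumulator flags are off while scaling, pooling and binarization remain available, which matches the description of the non-iteration path; in the multi-step case those final operations are deferred to the last step.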

In addition, a computer program according to another aspect of the present invention is a computer program stored in a computer-readable medium in order to cause a computer to execute each step of the signal processing device, method and system described above. The computer program may be a program containing machine code produced by a compiler, or a program containing high-level language code that can be executed on a computer using an interpreter or the like. The computer is not limited to a personal computer (PC) or a notebook computer, and includes any information processing device that has a central processing unit (CPU) and can execute a computer program, such as a server, smartphone, tablet PC, PDA or mobile phone.

In addition, the computer-readable medium may store a computer-executable program persistently, or may store it temporarily for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Accordingly, the above detailed description should not be construed as limiting in any respect and should be considered illustrative. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention are included in the scope of the present invention.

In addition, FIG. 14 illustrates a device 200 to which the method proposed by the present invention can be applied.

Referring to FIG. 14, the device 200 may be configured to implement a signal processing process according to the method proposed by the present invention. As an example, the device 200 may be a wearable device, such as a hearing aid, that processes voice signals.

In addition, for example, the device 200 to which the proposed method can be applied may include network devices such as repeaters, hubs, bridges, switches, routers and gateways; computer devices such as desktop computers and workstations; mobile terminals such as smartphones; portable devices such as laptop computers; home appliances such as digital TVs; and means of transportation such as automobiles. As another example, the device 200 to which the present invention can be applied may be included as part of an ASIC (Application Specific Integrated Circuit) implemented in the form of an SoC (System on Chip).

The memory 20 may be operatively connected to the processor 10 and may store programs and/or instructions for the processing and control performed by the processor 10. It may also store the data and information used in the present invention, the control information needed to process that data and information, and temporary data generated during such processing. The memory 20 may be implemented as a storage device such as ROM (Read Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, SRAM (Static RAM), an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

The processor 10 may be operatively connected to the memory 20 and/or the network interface device 30 and controls the operation of each module in the device 200. In particular, the processor 10 may perform various control functions for carrying out the method proposed by the present invention. The processor 10 may also be referred to as a controller, a microcontroller, a microprocessor, a microcomputer or the like. The proposed method may be implemented in hardware, firmware, software or a combination thereof. When the present invention is implemented in hardware, an ASIC (application specific integrated circuit), DSP (digital signal processor), DSPD (digital signal processing device), PLD (programmable logic device), FPGA (field programmable gate array) or the like configured to carry out the present invention may be provided in the processor 10. When the proposed method is implemented in firmware or software, the firmware or software may include instructions relating to modules, procedures or functions that perform the functions or operations needed to implement the proposed method. The instructions may be stored in the memory 20, or in a computer-readable recording medium (not shown) separate from the memory 20, and may be configured so that the device 200 implements the proposed method when they are executed by the processor 10.

In addition, the device 200 may include a network interface device 30. The network interface device 30 is operatively connected to the processor 10, and the processor 10 may control the network interface device 30 to transmit or receive wireless/wired signals carrying information and/or data, signals, messages and the like over a wireless/wired network. The network interface device 30 supports various communication standards such as the IEEE 802 family, 3GPP LTE(-A) and 3GPP 5G, and may transmit and receive control information and/or data signals in accordance with those standards. The network interface device 30 may be implemented outside the device 200 as needed.

The embodiments and drawings described in this specification are merely illustrative and do not limit the scope of the present invention in any way. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device they may be replaced by, or supplemented with, various other functional, physical or circuit connections. Furthermore, unless a component is specifically described with terms such as "essential" or "important", it may not be a component that is strictly necessary for the application of the present invention.

In the specification of the present invention (particularly in the claims), the term "the" and similar referring terms may cover both the singular and the plural. In addition, where a range is described in the present invention, it includes inventions to which the individual values within that range are applied (unless stated otherwise), and this is equivalent to reciting each individual value constituting the range in the detailed description of the invention. The steps presented in the method invention are not intended to be bound to the order in which they are listed; unless the nature of a process requires a particular step to come first, the order may be changed as appropriate. The use of any examples or exemplary terms (for example, "etc.") in the present invention is simply for describing the present invention in detail, and the scope of the present invention is not limited by those examples or exemplary terms unless limited by the claims. In addition, those skilled in the art will understand that various modifications, combinations and changes may be made according to design conditions and factors within the scope of the appended claims or their equivalents.

10: processor
20: memory
30: network interface device
100: signal processing device
110: artificial intelligence processing engine
111: core unit
112: neural network processing unit
112a: binary neural network processing unit
112b: integer neural network processing unit
113: dedicated artificial intelligence processing engine memory
200: device
1121a: activation input unit
1122a: neural network
1123a: scaler
1124a: accumulator
1125a: pooling unit
1126a: activation output unit
1121b: activation input unit
1122b: neural network
1123b: scaler
1124b: accumulator
1125b: pooling unit
1126b: activation output unit

Claims

A signal processing device comprising:
one or more digital signal processing units that perform digital signal processing; and
an artificial intelligence processing engine that performs neural network operations in conjunction with the digital signal processing units,
wherein the artificial intelligence processing engine comprises:
a binary neural network processing unit that processes binary data;
an integer neural network processing unit that processes integer data; and
a core unit that controls the binary neural network processing unit and the integer neural network processing unit.

The signal processing device of claim 1,
wherein the binary neural network processing unit and the integer neural network processing unit
perform signal processing on the binary data or the integer data based on a convolutional neural network (CNN).

The signal processing device of claim 1,
wherein the binary neural network processing unit
processes multiplication operations using the bitwise exclusive NOR (XNOR) operation.

The signal processing device of claim 1,
wherein the integer neural network processing unit
performs signal processing on the integer data produced through quantization.

The signal processing device of claim 1,
wherein the core unit
performs signal processing on data not supported by the binary neural network processing unit and the integer neural network processing unit.

The signal processing device of claim 1,
wherein the artificial intelligence processing engine
is provided with a dedicated artificial intelligence processing engine memory used exclusively for performing the neural network operations.

The signal processing device of claim 1,
wherein the binary neural network processing unit comprises:
one or more neural networks;
one or more scalers that perform batch normalization on data 1-1 output from the one or more neural networks;
an accumulator that performs an accumulation operation on data 1-2 output from the one or more scalers; and
a pooling unit that generates output data.

The signal processing device of claim 7,
wherein, when the binary neural network processing unit operates in an iteration operation mode in which an operation is divided into a plurality of steps,
the accumulator stores the operation result of an intermediate step among the plurality of steps and adds to it the operation result of a subsequent step.

The signal processing device of claim 8,
wherein the one or more scalers and the pooling unit
are activated in the last step among the plurality of steps to produce the final operation result.

The signal processing device of claim 1,
wherein the integer neural network processing unit comprises:
one or more neural networks;
one or more scalers that perform batch normalization on data 2-1 output from the one or more neural networks;
an accumulator that performs an accumulation operation on data 2-2 output from the one or more scalers; and
a pooling unit that generates output data.

The signal processing device of claim 10,
wherein, when the integer neural network processing unit operates in an iteration operation mode in which an operation is divided into a plurality of steps,
the accumulator stores the operation result of an intermediate step among the plurality of steps and adds to it the operation result of a subsequent step.

The signal processing device of claim 11,
wherein the one or more scalers and the pooling unit
are activated in the last step among the plurality of steps to produce the final operation result.