KR20210147862A

KR20210147862A - Method and apparatus for training retrosynthesis prediction model

Info

Publication number: KR20210147862A
Application number: KR1020210020694A
Authority: KR
Inventors: 송유영; 서승우; 신진우; 양은호; 황성주
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사); Korea Advanced Institute of Science and Technology (한국과학기술원)
Priority date: 2020-05-29
Filing date: 2021-02-16
Publication date: 2021-12-07

Abstract

A method of training a retrosynthesis prediction model comprises: determining, based on first graph information of a product, the information to be attended to in first string information of the product, and encoding the first string information based on the determination result; determining the information to be attended to in the first graph information and in second graph information of a reactant, and decoding second string information of the reactant based on the determination result; and training the retrosynthesis prediction model based on a result of decoding the second string information.

Description

Method and apparatus for training retrosynthesis prediction model

The present disclosure relates to a method and apparatus for training a retrosynthesis prediction model.

A neural network is a computational architecture that models the biological brain. As neural network technology has advanced, various types of electronic systems have come to use neural networks to analyze input data and extract valid information.

A technology capable of accurately predicting reactants from a product using a neural network is required.

An object of the present disclosure is to provide a method and apparatus for training a retrosynthesis prediction model. The technical problems addressed by the present embodiments are not limited to the above, and further technical problems may be inferred from the following embodiments.

According to one aspect, a method of training a retrosynthesis prediction model includes: determining first attention information in first string information of a product based on first graph information of the product, and encoding the first string information based on the determination result; determining second attention information from the first graph information and second graph information of a reactant, and decoding second string information of the reactant based on the determination result; and training the retrosynthesis prediction model based on a result of decoding the second string information.

According to another aspect, a computer-readable recording medium has recorded thereon one or more programs including instructions for executing the above-described method.

According to another aspect, an apparatus for predicting a reaction product using the retrosynthesis prediction model includes a memory storing at least one program and a processor executing the at least one program. The processor determines first attention information in first string information of a product based on first graph information of the product and encodes the first string information based on the determination result; determines second attention information from the first graph information and second graph information of a reactant and decodes second string information of the reactant based on the determination result; and trains the retrosynthesis prediction model based on a result of decoding the second string information.

FIG. 1 is a block diagram illustrating a hardware configuration of a neural network device according to an embodiment.
FIG. 2 is a diagram for explaining string information according to an embodiment.
FIG. 3 is a diagram for explaining graph information according to an embodiment.
FIG. 4 is a diagram for explaining a method, performed by a processor, of training a retrosynthesis prediction model according to an embodiment.
FIG. 5 is a diagram for explaining an encoding method according to an embodiment.
FIG. 6 is a diagram for explaining a method of generating a first mask matrix according to an embodiment.
FIG. 7 is a diagram referenced to describe the method of generating the first mask matrix according to an embodiment.
FIG. 8 is a diagram for explaining an effect of the first mask matrix according to an embodiment.
FIG. 9 is a diagram for explaining a decoding method according to an embodiment.
FIG. 10 is a diagram for explaining a method of generating a second mask matrix according to an embodiment.
FIG. 11 is a diagram referenced to describe the method of generating the second mask matrix according to an embodiment.
FIG. 12 is a flowchart illustrating a method of operating a retrosynthesis prediction model according to an embodiment.
FIG. 13 is a flowchart illustrating an encoding method according to an embodiment.
FIG. 14 is a flowchart illustrating a decoding method according to an embodiment.
FIG. 15 is a flowchart illustrating a method of training a retrosynthesis prediction model according to an embodiment.

The phrases "in some embodiments" and "in one embodiment" appearing in various places in this specification do not necessarily all refer to the same embodiment.

Some embodiments of the present disclosure may be represented by functional block configurations and various processing steps. Some or all of these functional blocks may be implemented by any number of hardware and/or software components that perform specific functions. For example, the functional blocks of the present disclosure may be implemented by one or more microprocessors or by circuit configurations for a given function. The functional blocks may also be implemented in various programming or scripting languages, or as algorithms running on one or more processors. In addition, the present disclosure may employ conventional techniques for electronic configuration, signal processing, and/or data processing. Terms such as “mechanism”, “element”, “means”, and “configuration” are used broadly and are not limited to mechanical or physical components.

In addition, the connecting lines or connecting members between the components shown in the drawings merely illustrate functional connections and/or physical or circuit connections. In an actual device, connections between components may be implemented by various replaceable or additional functional, physical, or circuit connections.

Regarding the terms used in this specification, a "structure", as data used in a neural network system, may refer to the atom-level structure of a material. The structure may be a structural formula based on the bonds between atoms.

FIG. 1 is a block diagram illustrating a hardware configuration of a neural network device according to an embodiment.

The neural network device 100 may be implemented as various types of devices, such as a personal computer (PC), a server device, a mobile device, or an embedded device. Specific examples include a smartphone, tablet device, augmented reality (AR) device, Internet of Things (IoT) device, autonomous vehicle, robot, or medical device that performs speech recognition, image recognition, image classification, and the like using a neural network, but the device is not limited thereto. Furthermore, the neural network device 100 may correspond to a dedicated hardware (HW) accelerator mounted in such a device, such as a neural processing unit (NPU), a Tensor Processing Unit (TPU), or a Neural Engine, which is a dedicated module for running a neural network, but is not limited thereto.

Referring to FIG. 1, the neural network device 100 includes a processor 110, a memory 120, and a user interface 130. Only the components related to the present embodiments are shown in the neural network device 100 of FIG. 1. Accordingly, it is apparent to those skilled in the art that the neural network device 100 may further include other general-purpose components in addition to those shown in FIG. 1.

The processor 110 controls the overall functions for operating the neural network device 100. For example, the processor 110 controls the neural network device 100 as a whole by executing programs stored in the memory 120 of the neural network device 100. The processor 110 may be implemented as a central processing unit (CPU), a graphics processing unit (GPU), an application processor (AP), or the like included in the neural network device 100, but is not limited thereto.

The memory 120 is hardware that stores various data processed in the neural network device 100. For example, the memory 120 may store data already processed and data to be processed by the neural network device 100. The memory 120 may also store applications, drivers, and the like to be run by the neural network device 100. The memory 120 may include random access memory (RAM) such as dynamic random access memory (DRAM) or static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), CD-ROM, Blu-ray or other optical disc storage, a hard disk drive (HDD), a solid state drive (SSD), or flash memory.

The processor 110 may train a retrosynthesis prediction model using a test data set and, based on the trained model, predict at least one reactant combination from a target product. Retrosynthesis prediction may refer to predicting a combination of reactant molecules for synthesizing a target product by exploring the inverse reaction pathway of the target product.

The retrosynthesis prediction model may be implemented using a transformer model. Because the neural network device 100 uses a transformer model, data can be processed in parallel and computations can be performed quickly.

The processor 110 may train the transformer model using the test data set.

The test data set may include test products and, for each test product, a corresponding combination of test reactants. For example, one of the test products may be benzene, and the test reactant combination corresponding to benzene may be furan and ethylene.

According to an embodiment, the test data set may further include the predicted yields of the test products under given experimental conditions and the reaction method of each test reactant combination.

Experimental conditions may refer to the various conditions set for carrying out an experiment that produces a product from reactants. For example, the experimental conditions may include at least one of a catalyst, a base, a solvent, a reagent, a temperature, and a reaction time.

A reaction method may refer to the chemical reaction method used to produce a product from a reactant combination.

To train the retrosynthesis prediction model, the processor 110 may receive test products and the test reactant combination corresponding to each test product.

The processor 110 may receive the chemical structure of a test product in both a one-dimensional string format and a two-dimensional graph format. Likewise, the processor 110 may receive the chemical structure of a test reactant in both a one-dimensional string format and a two-dimensional graph format.

The processor 110 may encode the first string information of the test product into vector form.

The processor 110 may determine, based on the first graph information of the test product, which information in the first string information should be attended to, and encode the first string information based on the determination result. The information to be attended to in the first string information may be referred to as first attention information. The first attention information may be information on the neighbor atoms adjacent to the atom being encoded. In other words, the processor 110 obtains information on the neighbor atoms of the atom being encoded from the first graph information of the test product, and encodes that atom while attending more strongly to the relationships between it and its neighbor atoms.
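One plausible way to turn the first graph information into a restriction on encoder attention is a boolean mask that lets each atom attend only to itself and its bonded neighbor atoms. The sketch below is an illustrative assumption, not the mask construction of FIGS. 6 and 7: it assumes one sequence position per atom, ignores non-atom tokens, and `neighbor_mask` is a hypothetical helper name.

```python
import numpy as np


def neighbor_mask(num_atoms, bonds):
    """Hypothetical first mask: True where atom i may attend to atom j.

    Assumes each sequence position is one atom (graph node) and that
    `bonds` lists undirected edges (i, j) taken from the product graph.
    Each atom may attend to itself and to its bonded neighbor atoms.
    """
    mask = np.eye(num_atoms, dtype=bool)  # every atom attends to itself
    for i, j in bonds:
        mask[i, j] = True  # bonds are undirected, so set both directions
        mask[j, i] = True
    return mask


# Benzene: six ring atoms with bonds 0-1, 1-2, ..., 5-0.
benzene_bonds = [(k, (k + 1) % 6) for k in range(6)]
m = neighbor_mask(6, benzene_bonds)
print(m.sum(axis=1))  # → [3 3 3 3 3 3] (self plus two ring neighbors)
```

Attention scores at positions where the mask is False would then be suppressed before the softmax, so each atom's encoding is driven by its chemically adjacent atoms.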

Because the processor 110 identifies, within the entire string information, the information related to the input information (e.g., a token) and encodes the input information by attending to that information, the information loss (vanishing) that occurs when all of the string information is encoded into a fixed-length vector can be reduced. In addition, because the processor 110 encodes each atom while attending more strongly to its relationships with chemically closely related neighbor atoms, the retrosynthesis prediction model can be trained efficiently and quickly, and its accuracy can be significantly increased.

The processor 110 may output the encoded first string information as a first output sequence.

The processor 110 may determine the information to be attended to in the first graph information of the test product and the second graph information of the test reactant, and decode the second string information of the reactant based on the determination result. The information to be attended to in the first graph information and the second graph information may be referred to as second attention information. The relationship between the test product and the test reactant can be represented by a cross-attention matrix, and an ideal cross-attention matrix captures the atom-mapping information, which represents the correspondence between the atoms in the product and the atoms in the reactant. Accordingly, the information to be attended to in the first and second graph information for training the retrosynthesis prediction model may be the atom-mapping information. In other words, the processor 110 may obtain the atom-mapping information from the first graph information and the second graph information, and decode each atom to be decoded while attending more strongly to the atom-mapping information.
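To make "a cross-attention matrix that captures atom-mapping information" concrete, the sketch below builds a hypothetical target matrix whose rows are reactant atoms (decoder positions) and whose columns are product atoms (encoder positions), with a 1 wherever two atoms correspond. The furan + ethylene → benzene mapping was worked out by hand for illustration only. Atom indices follow the atom order in the SMILES strings `c1cocc1.C=C` and `c1ccccc1`. The furan oxygen has no counterpart in benzene, so it gets an all-zero row. The helper name `mapping_target` is an assumption.

```python
import numpy as np


def mapping_target(num_reactant_atoms, num_product_atoms, atom_map):
    """Hypothetical ideal cross-attention matrix built from an atom mapping.

    atom_map: {reactant_atom_index: product_atom_index}.
    Rows of unmapped reactant atoms stay all-zero.
    """
    target = np.zeros((num_reactant_atoms, num_product_atoms))
    for r_idx, p_idx in atom_map.items():
        target[r_idx, p_idx] = 1.0
    return target


# Reactant atoms: 0-4 are furan (c, c, o, c, c); 5-6 are ethylene (C, C).
# The furan alpha carbons (1, 3) bond to the ethylene carbons, and the
# oxygen (2) is lost, giving the benzene ring 1-0-4-3-6-5.
FURAN_ETHYLENE_TO_BENZENE = {1: 0, 0: 1, 4: 2, 3: 3, 6: 4, 5: 5}
target = mapping_target(7, 6, FURAN_ETHYLENE_TO_BENZENE)
```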

When predicting output information (e.g., a token), the processor 110 identifies the highly relevant parts (e.g., tokens) of the input string information and decodes by attending to them, which reduces the information loss (vanishing) that occurs as the input string grows longer.

In addition, because the processor 110 decodes each atom while attending more strongly to the atom-mapping information during training, the retrosynthesis prediction model can be trained efficiently and quickly, and its accuracy can be significantly increased.

The processor 110 may train the retrosynthesis prediction model based on the result of decoding the second string information. Training the retrosynthesis prediction model may mean training the hidden states of the encoder and the decoder so that the cross-attention matrix follows the atom-mapping information. It may also mean training the information on the edges, which represent relationships between nodes, in the graph information of each test product and test reactant. It may also mean improving the accuracy of the prediction results of the neural network device 100. According to an embodiment, training the retrosynthesis prediction model may also mean learning atom-mapping information that represents the relationship between the atoms in a product and the atoms in a reactant.
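The text does not specify a loss function. The following is therefore only a hypothetical illustration of training "so that the cross-attention matrix follows the atom-mapping information": an auxiliary term that rewards placing attention mass on the mapped product atom. The name `alignment_loss`, the weight `lam`, and the choice of a negative log-likelihood are all assumptions.

```python
import numpy as np


def alignment_loss(cross_attn, target, eps=1e-9):
    """Hypothetical auxiliary loss pulling cross-attention toward the atom mapping.

    cross_attn: (R, P) row-stochastic attention from reactant (decoder)
                positions to product (encoder) positions.
    target:     (R, P) one-hot atom-mapping matrix; all-zero rows mark
                unmapped atoms and are ignored.
    Returns the mean negative log of the attention placed on the mapped atom.
    """
    mapped = target.sum(axis=1) > 0
    if not mapped.any():
        return 0.0
    probs = (cross_attn * target).sum(axis=1)[mapped]
    return float(-np.log(probs + eps).mean())


# Hypothetical combined objective (lam is an assumed weighting hyperparameter):
# total_loss = token_cross_entropy + lam * alignment_loss(cross_attn, target)
```

With attention that exactly matches the mapping the term is close to zero. Uniform attention over two candidates gives about log 2.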

The processor 110 may predict at least one candidate reactant combination from an input product based on the trained retrosynthesis prediction model. Since the retrosynthesis prediction model is implemented using the transformer model, the processor 110 may use the transformer model to predict candidate reactant combinations from the input product.

The user interface 130 may refer to an input means for receiving feedback on experimental results. For example, the user interface 130 may include a keypad, a dome switch, a touch pad (contact capacitive, pressure resistive, infrared sensing, surface ultrasonic conduction, integral tension measurement, piezoelectric, etc.), a jog wheel, a jog switch, and the like, but is not limited thereto. The processor 110 may update the transformer model based on the experimental feedback it receives.

Hereinafter, a method of training the retrosynthesis prediction model is described in detail. The methods described below may be performed by the processor 110, the memory 120, and the user interface 130 of the neural network device 100. For convenience of description, "product" hereinafter refers to a test product, and "reactant" refers to a test reactant.

FIG. 2 is a diagram for explaining string information according to an embodiment.

Referring to FIG. 2, the chemical structures of products and reactants may be input to the neural network device 100 as one-dimensional strings. A structure may be a structural formula based on the bonds between atoms. In an embodiment, the processor 110 may receive the chemical structures of products and reactants in the Simplified Molecular-Input Line-Entry System (SMILES) format.

SMILES notation is not unique: the same molecule can be written differently depending on the choice of the central atom or the starting point of the sequence. A standardized (canonicalization) algorithm may therefore be used. The SMILES notation of the present disclosure separates atom tokens (e.g., B, C, N, O) from non-atom tokens. Non-atom tokens may include bonds between atoms (e.g., -, =, #), parentheses, and the ring-closure numbers of cyclic structures. Tokens are separated by whitespace.
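A minimal sketch of SMILES tokenization that separates atom tokens from non-atom tokens and joins them with whitespace. The regular expression is modeled on one widely used with SMILES-based transformer models. It is an assumption, not the tokenizer of this disclosure, and `tokenize` and `split_atoms` are hypothetical helper names.

```python
import re

# Bracket atoms, two-letter halogens, organic-subset atoms (aromatic in
# lowercase), then bonds, branches, and ring-closure digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
ATOM_TOKEN = re.compile(r"^(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\*)$")


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def split_atoms(tokens):
    """Return (atom tokens, non-atom tokens), each in original order."""
    atoms = [t for t in tokens if ATOM_TOKEN.match(t)]
    others = [t for t in tokens if not ATOM_TOKEN.match(t)]
    return atoms, others


print(" ".join(tokenize("c1cocc1")))  # → c 1 c o c c 1
```

Two-letter atoms such as `Cl` and `Br` are matched before their one-letter prefixes, so `ClCBr` yields `Cl`, `C`, `Br` rather than splitting the halogens.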

For example, when furan 210 and ethylene 220 combine to produce benzene 230, the SMILES strings of furan 210, ethylene 220, and benzene 230 may be written as c1cocc1, C=C, and c1ccccc1, respectively.

Although FIG. 2 shows only SMILES notation, according to an embodiment the processor 110 may also receive the chemical structures of products and reactants in the SMARTS (SMILES Arbitrary Target Specification) format or the InChI (International Chemical Identifier) format.

FIG. 3 is a diagram for explaining graph information according to an embodiment.

Referring to FIG. 3, the chemical structures of products and reactants may be input to the neural network device 100 in the form of two-dimensional graphs.

Graph information may include nodes and edges. A node may include information about an atom of a product or reactant, and an edge may include information about the connection between atoms.
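The node and edge view can be sketched with a small data structure. This is an illustrative assumption, and the class and field names are hypothetical. Nodes store atom symbols, and edges store undirected bonds with a bond label. The example is the six-node, six-edge benzene graph described for FIG. 3.

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Hypothetical molecular graph: atoms as nodes, bonds as edges."""

    nodes: list[str]  # atom symbol for each node index
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (i, j, bond label)

    def neighbors(self, i: int) -> list[int]:
        """Indices of the nodes bonded to node i (edges are undirected)."""
        out = []
        for a, b, _ in self.edges:
            if a == i:
                out.append(b)
            elif b == i:
                out.append(a)
        return sorted(out)


# Benzene: nodes n1..n6 (indices 0-5) and ring edges e1..e6.
benzene = MolGraph(
    nodes=["c"] * 6,
    edges=[(k, (k + 1) % 6, "aromatic") for k in range(6)],
)
```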

The transformer model encodes and decodes string information, but it can also be reinterpreted as a graph neural network. In an embodiment, the tokens of the source sequence and the target sequence input to the transformer model may correspond to nodes, and the attention of the transformer model may correspond to edges. For example, the graph information 310 of benzene 230 may include first to sixth nodes n1 to n6, as well as first to sixth edges e1 to e6 indicating the connections between the nodes. The values of the edges are initially unknown but can be learned by training the transformer model.

FIG. 4 is a diagram for explaining a method, performed by a processor, of training a retrosynthesis prediction model according to an embodiment.

Referring to FIG. 4, the processor 110 trains a retrosynthesis prediction model using a test data set and predicts at least one reactant combination from a target product based on the trained model. To this end, the processor 110 may include an encoder embedding unit 410, a first position encoding unit 420, an encoder unit 430, a first mask matrix generation unit 440, a decoder embedding unit 450, a second position encoding unit 460, a decoder unit 470, and a second mask matrix generation unit 480. Although the units 410 to 480 are each shown in FIG. 4 as a unit included in the processor 110, each of them may instead refer to a layer included in the transformer model of the processor 110. In addition, a residual connection sub-layer and a normalization sub-layer may be applied individually to every sub-layer.

The encoder embedding unit 410 may embed the input data character by character. Here, the input data may be the first string information 491 of the product. In other words, the encoder embedding unit 410 may map the first string information 491 to vectors of a preset dimension. For example, the preset dimension may be 512, but is not limited thereto.
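The embedding step amounts to a table lookup: each character gets one learnable row of a `len(vocab) × 512` matrix. The sketch below assumes a tiny hypothetical vocabulary covering the furan, ethylene, and benzene examples, with random initialization. In a real model the table would be learned during training.

```python
import numpy as np

D_MODEL = 512  # example dimension given in the text

# Hypothetical character-level vocabulary for the furan / ethylene / benzene example.
VOCAB = {tok: idx for idx, tok in enumerate(["<pad>", "c", "o", "C", "=", "1"])}

rng = np.random.default_rng(0)
# One row per vocabulary entry; randomly initialized here, learned in training.
embedding_table = rng.normal(0.0, D_MODEL ** -0.5, size=(len(VOCAB), D_MODEL))


def embed(tokens):
    """Map a token sequence to a (len(tokens), D_MODEL) array by row lookup."""
    ids = [VOCAB[t] for t in tokens]
    return embedding_table[ids]


x = embed(list("c1cocc1"))  # furan, one row per character
```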

The first position encoding unit 420 may perform positional encoding to identify the position of each character in the first string information 491. For example, the first position encoding unit 420 may position-encode the first string information 491 using sinusoids of different frequencies. The first position encoding unit 420 may combine the position information with the embedded first string information 491 and provide the result to the encoder unit 430.
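The text says only "sinusoids of different frequencies". Assuming the standard sinusoidal formulation of the original Transformer, position `pos` and dimension pair `2i` receive sin(pos / 10000^(2i/d_model)) and cos(pos / 10000^(2i/d_model)). A sketch, assuming an even `d_model`:

```python
import numpy as np


def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings, shape (seq_len, d_model); d_model even."""
    pos = np.arange(seq_len)[:, None]       # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]   # even dimension indices 2i
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)  # even dimensions use sine
    pe[:, 1::2] = np.cos(angle)  # odd dimensions use cosine
    return pe


pe = positional_encoding(7, 512)  # e.g. the 7 characters of "c1cocc1"
# The encoder input would then be embedded_tokens + pe (element-wise sum).
```

Each dimension pair oscillates at a different frequency, so every position gets a distinct pattern.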

The encoder unit 430 may include an encoder self-attention unit 431 and an encoder feed-forward unit 432. Although the encoder unit 430 is shown as a single unit in FIG. 4, it may be a stack of N encoders. Likewise, although the encoder self-attention unit 431 and the encoder feed-forward unit 432 are each shown as a unit in FIG. 4, they may refer to the respective sub-layers of an encoder layer.

When the first string information 491 is encoded, the encoder self-attention unit 431 may specify the information in the first string information 491 that should receive self-attention. To this end, the encoder self-attention unit 431 may generate a self-attention score matrix. Each entry of this matrix is a score indicating the degree of relevance between two tokens of the first string information 491. Using the softmax function, the encoder self-attention unit 431 may then generate a self-attention matrix that expresses the element values of the score matrix as probabilities.

Based on the self-attention matrix, the encoder unit 430 may specify, in the first string information 491, the elements to be self-attended to when encoding the input string. The encoder self-attention unit 431 may have multiple heads and may therefore generate a plurality of self-attention matrices.

The encoder self-attention unit 431 need not, by itself, restrict the space in which the relevance between tokens of the first string information 491 is expressed. However, that space may be restricted using a graph structure, which can be applied to both self-attention and cross-attention as described below.

To make learning more efficient, fast, and accurate, the processor 110 may use the first graph information 441 to decide which information in the self-attention score matrix and/or the self-attention matrix should be attended to when encoding the first string information 491. This information may be referred to as first attention information. To this end, the first mask matrix generator 440 may generate a first mask matrix 442 that masks unnecessary elements in the self-attention score matrix and/or the self-attention matrix.

The first mask matrix generator 440 may generate the first mask matrix 442 based on the first graph information 441 of the product and a preset reference distance.

The first mask matrix generator 440 may generate a first pre-mask matrix whose rows and columns correspond to the tokens of the first string information 491. Using the preset reference distance, it may also identify which nodes in the first graph information 441 are neighbors of one another, and it may generate the first mask matrix 442 from those neighbor nodes. For example, the first mask matrix generator 440 may fill the first pre-mask matrix with information on the nodes needed to compute the self-attention matrix. It assigns '1' to the neighbor nodes of a reference node and '0' to the remaining nodes. The mask is built from the neighbors of the reference node so that the retrosynthesis prediction model is trained to attend more to nodes that are chemically closely related to the reference node.
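The neighbor test can be sketched as a breadth-first search over the molecular graph. The sketch rests on three assumptions:

- Hop count stands in for the graph distance.
- A node counts as a neighbor when its distance from the reference node is greater than 0 and at most the reference distance. The text does not settle whether nearer nodes also count when the reference distance is above 1.
- `include_self` covers the embodiment that also marks the reference node.

Names are hypothetical.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: number of bonds (hops) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def first_mask_matrix(adjacency, reference_distance, include_self=False):
    """Row i is the mask for reference node i.

    Assumption: node j is a neighbour when 0 < hops(i, j) <= reference_distance.
    include_self=True also assigns 1 to the reference node itself.
    """
    n = len(adjacency)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j, d in hop_distances(adjacency, i).items():
            if 0 < d <= reference_distance or (include_self and j == i):
                mask[i][j] = 1
    return mask


# Six nodes n1..n6 (indices 0..5), assumed to form a ring so n1 borders n2 and n6.
ring = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]
print(first_mask_matrix(ring, 1)[0])  # [0, 1, 0, 0, 0, 1]
```

With a reference distance of 1, the row for n1 marks only n2 and n6. This matches the example of FIG. 6 if the six nodes form a ring, which is assumed here for illustration.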

The first mask matrix generator 440 may provide the first mask matrix 442 to the encoder self-attention unit 431.

Based on the first mask matrix 442, the encoder self-attention unit 431 may determine which elements of the self-attention matrix should be attended to when encoding the first string information 491. It may then output a masked self-attention matrix according to that determination.

The masked self-attention matrix may be provided to the encoder feed-forward unit 432.

The encoder feed-forward unit 432 may include a feed-forward neural network, which transforms the input sequence and outputs it. The transformed sequence may be provided to the decoder unit 470.
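The text does not specify the feed-forward network further. A common choice, assumed in this sketch, is the position-wise two-layer network of the original Transformer, ReLU(xW1 + b1)W2 + b2, applied independently to every position. Weight shapes are kept tiny for illustration, and the function name is hypothetical.

```python
def feed_forward(x, w1, b1, w2, b2):
    """Position-wise FFN: ReLU(x W1 + b1) W2 + b2, applied to each row of x."""
    out = []
    for row in x:
        # Hidden layer with ReLU activation.
        hidden = [max(0.0, sum(xi * w1[i][j] for i, xi in enumerate(row)) + b1[j])
                  for j in range(len(b1))]
        # Output layer (linear).
        out.append([sum(h * w2[j][k] for j, h in enumerate(hidden)) + b2[k]
                    for k in range(len(b2))])
    return out
```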

The decoder embedding unit 450 may embed input data character by character. Here, the input data may be the second string information 492 of the reactants. In other words, the decoder embedding unit 450 may map the second string information 492, which is in string form, to vectors of a preset dimension. For example, the preset dimension may be 512, but is not limited thereto.

The second position encoding unit 460 may perform positional encoding to identify the position of each character in the second string information 492. For example, the second position encoding unit 460 may positionally encode the second string information 492 using sinusoids of different frequencies. The second position encoding unit 460 may combine this position information with the embedded second string information 492 and provide the result to the decoder unit 470.

The decoder unit 470 may include a decoder self-attention unit 471, a decoder cross-attention unit 472, and a decoder feed-forward unit 473. Like the encoder unit 430, the decoder unit 470 is shown as a single unit but may be a stack of N decoders. Likewise, although the decoder self-attention unit 471, the decoder cross-attention unit 472, and the decoder feed-forward unit 473 are each shown as a unit, they may refer to the respective sub-layers of a decoder layer.

The decoder self-attention unit 471 may operate similarly to the encoder self-attention unit 431. That is, when decoding the second string information 492, the decoder self-attention unit 471 may specify the information in the second string information 492 that should receive self-attention, and may generate a self-attention matrix from it. The decoder self-attention unit 471 may also have multiple heads and generate a plurality of self-attention matrices. To tell the two apart, the self-attention matrix generated by the encoder self-attention unit 431 may be called the first self-attention matrix, and the one generated by the decoder self-attention unit 471 may be called the second self-attention matrix.

The decoder self-attention unit 471 differs from the encoder self-attention unit 431 in whether the graph mask is applied. The decoder self-attention unit 471 does not mask its self-attention matrix with the first mask matrix 442 described for the encoder self-attention unit 431. However, it may use a mask that keeps each output position from drawing on information from subsequent output positions.
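The sketch below renders this decoder mask as the standard look-ahead (causal) mask, which is an assumption about what the text intends. Position i may attend only to positions j ≤ i, and the entries marked 0 would be set to -∞ before the softmax. The function name is hypothetical.

```python
def look_ahead_mask(length):
    """1 where output position i may attend to position j (j <= i), 0 for later positions."""
    return [[1 if j <= i else 0 for j in range(length)] for i in range(length)]
```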

When decoding the second string information 492, the decoder cross-attention unit 472 may specify the information in the first string information 491 that should receive cross-attention. To this end, the decoder cross-attention unit 472 may generate a cross-attention score matrix indicating the degree of relevance between the second string information 492 and the first string information 491. Each entry of this matrix is a score indicating the relevance between a token of the first string information 491 and a token of the second string information 492. Using the softmax function, the decoder cross-attention unit 472 may then generate a cross-attention matrix that expresses the element values of the score matrix as probabilities. Based on the cross-attention matrix, the decoder unit 470 may specify, in the first string information 491, the elements to be cross-attended to when decoding the input string.
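The cross-attention matrix can be sketched as follows, assuming standard scaled dot-product attention. Queries come from the reactant (decoder) tokens and keys from the product (encoder) tokens. The matrix therefore has one row per reactant token and one column per product token, and each row is a probability distribution. Names and toy vectors are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_matrix(decoder_queries, encoder_keys):
    """Rows follow reactant (decoder) tokens; columns follow product (encoder) tokens."""
    d_k = len(encoder_keys[0])
    scores = [[sum(q * k for q, k in zip(q_vec, k_vec)) / math.sqrt(d_k)
               for k_vec in encoder_keys]
              for q_vec in decoder_queries]
    return [softmax(row) for row in scores]


# Two reactant-token queries attending over three product-token keys.
attn = cross_attention_matrix([[1.0, 0.0], [0.0, 1.0]],
                              [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```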

To make learning more efficient, fast, and accurate, the processor 110 may use the first graph information 441 and the second graph information 481 to decide which information in the cross-attention matrix should be attended to when computing the attention loss. This information may be referred to as second attention information. To this end, the second mask matrix generator 480 may generate a second mask matrix 482 that masks unnecessary elements of the cross-attention matrix. The first mask matrix 442 and the second mask matrix 482 may serve different roles. Unlike the first mask matrix 442, the second mask matrix 482 may be used to mask unnecessary elements when computing the attention loss of the retrosynthesis prediction model.

From the first graph information 441 and the second graph information 481, the second mask matrix generator 480 may obtain atom-mapping information. This information indicates which atoms of the product correspond to which atoms of the reactants, and the second mask matrix generator 480 may generate the second mask matrix 482 from it.

The second mask matrix generator 480 may generate a second pre-mask matrix whose rows correspond to the second string information 492 and whose columns correspond to the first string information 491. It may also obtain atom-mapping information from the first graph information 441 and the second graph information 481. Using the maximum common substructure (MCS) technique to determine the similarity between the first graph information 441 and the second graph information 481 leads to an NP (non-deterministic polynomial time)-hard problem. Accordingly, the atom-mapping information of the present disclosure may be obtained with a flexible maximum common substructure (FMCS) technique. FMCS does not require an exact atom mapping to determine that similarity and uses only information on specific atom pairs. The FMCS technique may be the FMCS algorithm implemented in RDKit. Because the second mask matrix generator 480 relies on an atom mapping built from specific pairs rather than an exact atom mapping, the computing cost may be reduced.

The second mask matrix generator 480 may generate the second mask matrix 482 based on the obtained atom-mapping information. For example, it may fill the second pre-mask matrix with information on the nodes needed to compute the attention loss. It assigns '1' to pairs of nodes that correspond to each other and '0' to the remaining nodes.
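The sketch below builds the second mask matrix from a set of matched atom pairs, which it takes as given. In practice the pairs might come from an MCS search, for instance RDKit's `rdFMCS.FindMCS` followed by substructure matching, but that step is not shown. The token-to-atom lists and the `atom_map` dictionary are hypothetical input formats chosen for illustration.

```python
def second_mask_matrix(reactant_token_atoms, product_token_atoms, atom_map):
    """Rows: reactant tokens; columns: product tokens.

    reactant_token_atoms / product_token_atoms: atom index per token, or None for
    non-atom tokens. atom_map: reactant atom index -> product atom index for the
    matched pairs (hypothetical format, e.g. derived from an MCS match).
    """
    mask = [[0] * len(product_token_atoms) for _ in reactant_token_atoms]
    for r, r_atom in enumerate(reactant_token_atoms):
        if r_atom is None or r_atom not in atom_map:
            continue  # unmatched or non-atom reactant tokens stay 0
        target = atom_map[r_atom]
        for p, p_atom in enumerate(product_token_atoms):
            if p_atom == target:
                mask[r][p] = 1  # corresponding atoms get 1
    return mask


example = second_mask_matrix([0, None, 1], [0, 1, None], {0: 1, 1: 0})
```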

The second mask matrix generator 480 may provide the second mask matrix 482 to the decoder cross-attention unit 472.

Based on the second mask matrix 482, the decoder cross-attention unit 472 may determine which elements of the cross-attention matrix should be attended to when computing the attention loss of the retrosynthesis prediction model. It may then output a masked cross-attention matrix according to that determination. The masked cross-attention matrix may be provided to the decoder feed-forward unit 473.

The decoder feed-forward unit 473 may operate similarly to the encoder feed-forward unit 432. That is, the decoder feed-forward unit 473 includes a feed-forward neural network that transforms the input sequence, and it may output the transformed sequence as the output reactant information 493.

The processor 110 may obtain the attention loss of the retrosynthesis prediction model from the masked cross-attention matrix. The processor 110 may also obtain the cross-entropy loss of the model from the output reactant information 493.

The processor 110 may train the retrosynthesis prediction model based on the attention loss and the cross-entropy loss. For example, the processor 110 may train the model so as to reduce the sum of the attention loss and the cross-entropy loss.
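The combined objective is sketched below, assuming the cross-entropy is averaged over the target reactant tokens. This passage does not give the exact form of the attention loss, so the sketch takes it as a number and simply adds the two losses as the text describes. Function names are hypothetical.

```python
import math


def cross_entropy(predicted_probs, target_ids):
    """Mean negative log-likelihood of the target reactant tokens (assumed averaging)."""
    return -sum(math.log(p[t]) for p, t in zip(predicted_probs, target_ids)) / len(target_ids)


def total_loss(predicted_probs, target_ids, attention_loss):
    """Sum of the cross-entropy loss and a given attention loss value."""
    return cross_entropy(predicted_probs, target_ids) + attention_loss
```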

FIG. 5 is a diagram for explaining an encoding method according to an embodiment. FIG. 6 is a diagram for explaining a method of generating a first mask matrix according to an embodiment. FIG. 7 is a diagram referred to in explaining the method of generating the first mask matrix according to an embodiment. FIG. 8 is a diagram for explaining the effect of the first mask matrix according to an embodiment.

Referring to FIG. 5, the encoder self-attention unit 431 may obtain query, key, and value information from the first string information 491. It may convert an input vector sequence of a first dimension into query, key, and value vectors of a second dimension smaller than the first. The input vector sequence may be the vector sequence of the first string information 491 produced by the encoder embedding unit 410 and the first position encoding unit 420. The dimension is converted by multiplying each vector by a weight matrix, which may be updated through training. For example, the first dimension may be 512 and the second dimension may be 64, but the dimensions are not limited thereto.
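The projection step can be sketched in plain Python, assuming one weight matrix per role (query, key, value) and the example dimensions of 512 and 64. Random weights stand in for trained parameters, and the names are hypothetical.

```python
import random


def random_matrix(rows, cols, rng):
    """Small random weights; in training these would be learned parameters."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain-Python product of an (n x m) matrix and an (m x p) matrix."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]


def project_qkv(x, d_model=512, d_k=64, seed=0):
    """Project each d_model-dim input vector to d_k-dim query, key and value vectors."""
    rng = random.Random(seed)
    w_q, w_k, w_v = (random_matrix(d_model, d_k, rng) for _ in range(3))
    return matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
```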

The encoder self-attention unit 431 may generate a self-attention score matrix 511 from the queries and keys. For each query, it may determine the relevance to every key and record the results in the self-attention score matrix 511. The matrix therefore contains scores indicating the relevance between queries and keys. In an embodiment, the self-attention score matrix 511 may be derived by the scaled dot-product attention operation of Equation 1, S = QK^T / √d_k.

In Equation 1, S is the self-attention score matrix 511, Q denotes the query vectors, K denotes the key vectors, T denotes transposition, and d_k is the dimension of the key vectors.
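Here is Equation 1 as a plain-Python sketch: each score is the dot product of a query and a key, divided by the square root of the key dimension. The function name is hypothetical.

```python
import math


def score_matrix(queries, keys):
    """S[i][j] = (q_i . k_j) / sqrt(d_k), i.e. S = Q K^T / sqrt(d_k)."""
    d_k = len(keys[0])
    return [[sum(q * k for q, k in zip(q_vec, k_vec)) / math.sqrt(d_k) for k_vec in keys]
            for q_vec in queries]
```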

The encoder self-attention unit 431 may receive the first mask matrix 442 from the first mask matrix generator 440.

The first mask matrix generator 440 may generate the first mask matrix 442, which masks unnecessary elements in the self-attention score matrix 511.

The aim is to train the retrosynthesis prediction model to attend to neighbor nodes that are chemically closely related to the node being encoded. To that end, the first mask matrix generator 440 may use the preset reference distance to identify the neighbor nodes of the encoding target node in the first graph information 441, and may generate the first mask matrix 442 from the result. Here, the distance may be the geodesic distance on the graph. The distance may also be expressed in terms of hop neighbors on the graph.

The first mask matrix generator 440 may set one of the nodes in the first graph information 441 as a reference node. Neighbor nodes located at the reference distance from the reference node may be represented by "1", and the remaining nodes by "0". In some embodiments, the first mask matrix generator 440 may represent by "1" the reference node itself as well as the neighbor nodes located at the reference distance from it, and the remaining nodes by "0".

FIG. 6 illustrates how the first mask matrix generator 440 generates the first mask matrix 442 when the reference node is n1 and the reference distance is 1.

Referring to FIG. 6, the first mask matrix generator 440 may set one of the nodes n1 to n6 in the first graph information 441, here n1, as the reference node. It may then identify the neighbor nodes n2 and n6, which are located at the reference distance from the reference node n1. The first mask matrix generator 440 may record, in the first pre-mask matrix, information on the reference node n1 and its neighbor nodes n2 and n6. In an embodiment, it assigns "1" to the neighbor nodes n2 and n6 of the reference node n1 and "0" to the remaining nodes. FIG. 6 shows "1" assigned only to the neighbor nodes n2 and n6. In some embodiments, however, the first mask matrix generator 440 may assign "1" to the reference node n1 as well as to its neighbor nodes n2 and n6, and "0" to the remaining nodes.

Besides atom tokens, however, the first string information 491 also contains non-atom tokens. These include bonds between atoms (e.g., -, =, #), parentheses, whitespace, and the digits that denote cyclic structures. As a result, the tokens of the first string information 491 do not correspond one-to-one to the nodes of the first graph information 441. The meaning of these non-atom tokens becomes clear only from the full context of the string, so they may need information from a wider range. The first mask matrix generator 440 may therefore leave the nodes corresponding to non-atom tokens unmasked, that is, assign "1" to them. Non-atom tokens then exchange attention with all other tokens regardless of the graph structure, which may improve the accuracy of the retrosynthesis prediction model.
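The sketch below lifts an atom-level mask to the token sequence while leaving non-atom tokens unmasked. It assumes each token is already labelled with its atom index, or `None` for a bond symbol, parenthesis, ring-closure digit, and so on. That labelling format and the function name are hypothetical.

```python
def token_level_mask(token_atoms, atom_mask):
    """Lift an atom-by-atom mask to the token sequence.

    token_atoms[t] is the atom index of token t, or None for a non-atom token.
    Rows and columns of non-atom tokens are left unmasked (all 1).
    """
    n = len(token_atoms)
    mask = [[1] * n for _ in range(n)]
    for i, a in enumerate(token_atoms):
        for j, b in enumerate(token_atoms):
            if a is not None and b is not None:
                mask[i][j] = atom_mask[a][b]  # atom-atom pairs follow the graph mask
    return mask


# Tokens of "C=O": atom 0, a bond token, atom 1; atom mask for reference distance 1.
print(token_level_mask([0, None, 1], [[0, 1], [1, 0]]))  # [[0, 1, 1], [1, 1, 1], [1, 1, 0]]
```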

The first mask matrix generator 440 may then change the reference node and identify the neighbor nodes of the new reference node. It may generate the first mask matrix 442 based on these results.

When the distance matrix storing the distance information is D = (d_ij), the elements of the first mask matrix 442 may be determined by Equation 2.

In Equation 2, m_ij is the value of an element of the first mask matrix 442, and i and j denote atom tokens. d_h is the reference distance set for the h-th head, and the first mask matrix generator 440 may set a different reference distance for each head. In other words, the first mask matrix generator 440 may generate a plurality of first mask matrices with different reference distances and provide one to each head of the encoder self-attention unit 431.
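Per-head masks can be sketched by computing all-pairs hop distances once and thresholding them with a different reference distance d_h for each head. The thresholding rule used here, m_ij = 1 when 0 < d_ij ≤ d_h, is an assumption about the form of Equation 2. Function names are hypothetical.

```python
import math
from collections import deque


def distance_matrix(adjacency):
    """All-pairs hop distances (one BFS per node); unreachable pairs stay at inf."""
    n = len(adjacency)
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[s][v] == math.inf:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def head_masks(dist, reference_distances):
    """One mask per head h; assumed rule: m_ij = 1 when 0 < d_ij <= d_h."""
    return [[[1 if 0 < d <= d_h else 0 for d in row] for row in dist]
            for d_h in reference_distances]


# Path graph 0-1-2-3 with heads using reference distances 1, 2 and 3.
masks = head_masks(distance_matrix([[1], [0, 2], [1, 3], [2]]), [1, 2, 3])
```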

FIG. 7 shows examples of mask matrices for different reference distances: a first mask matrix 711 with a reference distance of 1, a first mask matrix 712 with a reference distance of 2, and a first mask matrix 713 with a reference distance of 3.

Referring to FIG. 7, the first mask matrix generator 440 may set the reference distance of the mask matrix 711 provided to the first head to 1, that of the mask matrix 712 provided to the second head to 2, and that of the mask matrix 713 provided to the third head to 3. Setting different reference distances for the first mask matrices provided to the heads may strengthen the training of the retrosynthesis prediction model.

Referring back to FIG. 5, the first mask matrix generator 440 may provide the first mask matrix 442 to the encoder self-attention unit 431.

The encoder self-attention unit 431 may mask the self-attention score matrix 511 based on the first mask matrix 442.

When the self-attention score matrix 511 is S = (s_ij) and the first mask matrix 442 is M = (m_ij), the masked self-attention score matrix 511 may be expressed by Equation 3.

As Equation 3 shows, when an element of the first mask matrix 442 is 1 (m_ij = 1), the corresponding element of the self-attention score matrix 511 is output unchanged. When it is 0 (m_ij = 0), "-∞" is output.
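Here is Equation 3 as described above, sketched in plain Python with `-math.inf` standing in for -∞. The function name is hypothetical.

```python
import math


def apply_mask(scores, mask):
    """Keep s_ij where m_ij == 1; replace it with -inf where m_ij == 0."""
    return [[s if m == 1 else -math.inf for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(scores, mask)]
```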

Using the softmax function, the encoder self-attention unit 431 may compute the attention distribution of the masked self-attention score matrix 511. It then generates attention values by taking the weighted sum of the values according to that distribution. The softmax function expresses the scores of the self-attention score matrix 511 as probabilities, and the attention values may be expressed as a self-attention matrix. Because the self-attention matrix contains the context information of the sequence, it may also be called a context vector. In other words, the encoder self-attention unit 431 may output the masked self-attention matrix according to Equation 4, which applies the softmax function to the masked score matrix and multiplies the result by the value vectors.
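The sketch below computes the masked attention output described above: a softmax over the masked scores, then a weighted sum of the value vectors. Because exp(-∞) is 0, masked positions receive zero weight. The sketch assumes every row has at least one unmasked entry and raises an error otherwise. The function name is hypothetical.

```python
import math


def masked_attention(scores, mask, values):
    """Row-wise softmax over masked scores, then a weighted sum of value vectors."""
    outputs = []
    for s_row, m_row in zip(scores, mask):
        masked = [s if m == 1 else -math.inf for s, m in zip(s_row, m_row)]
        top = max(masked)
        if top == -math.inf:
            raise ValueError("every position in this row is masked")
        exps = [math.exp(s - top) for s in masked]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs
```

In the usage below, the third score is the largest but is masked, so the output is simply the average of the first two values.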

Because the encoder self-attention unit 431 outputs a self-attention matrix from which unnecessary elements have been removed, the efficiency, speed, and accuracy of training may be improved.

FIG. 8 shows a graph 811 illustrating the computation performed by the processor 110 when encoder self-attention is carried out without the first mask matrix 442, and a graph 813 illustrating the computation when encoder self-attention is carried out with the first mask matrix 442 applied.

Referring to FIG. 8, the processor 110 removes elements that are unnecessary for training the retrosynthesis prediction model and concentrates on the relationships between chemically related neighboring nodes, so the efficiency, speed, and accuracy of training may be increased without introducing additional parameters.

FIG. 9 is a diagram illustrating a decoding method according to an embodiment, FIG. 10 is a diagram illustrating a method of generating a second mask matrix according to an embodiment, and FIG. 11 is a diagram referenced to describe a method of generating a second mask matrix according to an embodiment.

Referring to FIG. 9, the decoder cross-attention unit 471 may obtain key and value information from the first string information 491 provided by the encoder unit 430, and may obtain query information from the second string information 492. The decoder cross-attention unit 471 may obtain the queries, keys, and values in a manner similar to the encoder self-attention unit 431, that is, by multiplying the input vectors by weight matrices.

The decoder cross-attention unit 471 may generate a cross-attention score matrix based on the queries and keys. For each query, it may determine the degree of relevance to every key and record the result in the cross-attention score matrix, which therefore holds scores indicating how strongly each query relates to each key. In an embodiment, the cross-attention score matrix may be derived by a scaled dot-product attention operation, as in Equation 1 above.

The decoder cross-attention unit 471 may compute an attention distribution from the cross-attention score matrix using a softmax function, and may generate attention values by taking the weighted sum of the values using that distribution. The attention values may be expressed as a cross-attention matrix 911.
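Cross-attention differs from self-attention only in where the queries, keys, and values come from. The sketch below assumes single-head attention, list-of-lists matrices, and hypothetical weight matrices w_q, w_k, w_v; it takes queries from the reactant-side inputs and keys and values from the encoder output, so the resulting matrix has one row per reactant token and one column per product token.

```python
import math


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(reactant_x, encoder_out, w_q, w_k, w_v):
    """Queries from the reactant side, keys/values from the encoded product."""
    q = matmul(reactant_x, w_q)      # (n_reactant_tokens, d_k)
    k = matmul(encoder_out, w_k)     # (n_product_tokens, d_k)
    v = matmul(encoder_out, w_v)     # (n_product_tokens, d_v)
    d_k = len(k[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
              for qi in q]           # scaled dot-product scores
    a_cross = [softmax(row) for row in scores]  # one probability row per reactant token
    return a_cross, matmul(a_cross, v)


identity = [[1.0, 0.0], [0.0, 1.0]]
reactant_x = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
encoder_out = [[2.0, 0.0], [0.0, 2.0]]
a_cross, context = cross_attention(reactant_x, encoder_out, identity, identity, identity)
```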

The second mask matrix generator 480 may generate a second mask matrix 482 for masking unnecessary elements in the cross-attention matrix 911.

Because a reaction does not completely decompose molecules to produce an entirely new product, the product molecule and the reactant molecules generally share common structure, so the atoms of the product can be mapped to atoms of the reactants. In addition, because the cross-attention matrix 911 reflects the relationships between product tokens and reactant tokens, an ideal cross-attention matrix 911 captures this atom-mapping information. Accordingly, so that the retrosynthesis prediction model learns to attend to the atom mapping, the second mask matrix generator 480 may generate the second mask matrix 482 based on the atom-mapping information between the product and the reactants.

The second mask matrix generator 480 may obtain the atom-mapping information from the first graph information 441 and the second graph information 481 using a Flexible Maximum Common Substructure (FMCS) technique. For example, the FMCS technique may be, but is not limited to, the FMCS algorithm implemented in RDKit.

The second mask matrix generator 480 may set one of the nodes included in the first graph information 441 as a reference node. Among the nodes included in the second graph information 481, it may then represent the node corresponding to the reference node as "1" and the remaining nodes as "0".

FIGS. 10 and 11 illustrate how the second mask matrix generator 480 generates the second mask matrix 482 when the product is benzene and the reactants are furan and ethylene.

Referring to FIGS. 10 and 11, the second mask matrix generator 480 may set one of the nodes n1 to n6 included in the first graph information 441 as the reference node n1. To distinguish the reference node n1 set by the second mask matrix generator 480 from the reference node set by the first mask matrix generator 440, the node set by the first mask matrix generator 440 may be called the first reference node, and the node set by the second mask matrix generator 480 may be called the second reference node.

The second mask matrix generator 480 may determine, among the nodes na to ng of the second graph information 481, the node na corresponding to the reference node n1, and may write this correspondence into a second pre-mask matrix. In an embodiment, the second mask matrix generator 480 may assign "1" to the node na corresponding to the reference node n1 and "0" to the remaining nodes.

In an embodiment, as shown in FIG. 10, the second mask matrix generator 480 may leave nodes corresponding to non-atom tokens unmasked, that is, assign "1" to them. In another embodiment, as shown in FIG. 11, the second mask matrix generator 480 may mask the nodes corresponding to non-atom tokens, that is, assign "0" to them, so that attention is focused on the correspondence between atoms.

The second mask matrix generator 480 may then change the reference node and determine the node corresponding to the new reference node, repeating this for the other nodes, and may generate the second mask matrix 482 based on the results.
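The construction of the second mask matrix in FIGS. 10 and 11 can be sketched as follows for benzene (product) and furan plus ethylene (reactants). The token-to-node lists assume atoms are numbered in the order they appear in each SMILES string, and the atom map is a hand-written, illustrative stand-in for an FMCS result, not actual RDKit output. The hypothetical flag mask_non_atom switches between the FIG. 10 embodiment (non-atom tokens left unmasked) and the FIG. 11 embodiment (non-atom tokens masked). Rows are reactant tokens and columns are product tokens, as in the second pre-mask matrix.

```python
def build_cross_mask(reactant_nodes, product_nodes, atom_map, mask_non_atom=False):
    """Second mask matrix: rows are reactant tokens (i), columns product tokens (j).

    reactant_nodes[i] / product_nodes[j] hold the graph node index of each token,
    or None for a non-atom token (bond symbol, parenthesis, ring digit, '.').
    atom_map maps a reactant node index to the product node it corresponds to.
    """
    fill = 0 if mask_non_atom else 1  # FIG. 11 masks non-atom tokens; FIG. 10 keeps them
    mask = []
    for r_node in reactant_nodes:
        row = []
        for p_node in product_nodes:
            if r_node is None or p_node is None:
                row.append(fill)
            else:
                row.append(1 if atom_map.get(r_node) == p_node else 0)
        mask.append(row)
    return mask


# Product: benzene "c1ccccc1"; reactants: furan + ethylene "c1ccoc1.C=C".
product_nodes = [0, None, 1, 2, 3, 4, 5, None]                  # c 1 c c c c c 1
reactant_nodes = [0, None, 1, 2, 3, 4, None, None, 5, None, 6]  # c 1 c c o c 1 . C = C
# Illustrative atom map: the four furan carbons and both ethylene carbons
# close the six-membered ring; the furan oxygen (node 3) is left unmapped.
atom_map = {2: 0, 1: 1, 0: 2, 4: 3, 5: 4, 6: 5}

mask_fig10 = build_cross_mask(reactant_nodes, product_nodes, atom_map)
mask_fig11 = build_cross_mask(reactant_nodes, product_nodes, atom_map, mask_non_atom=True)
```

Each mapped reactant atom row ends up with a single 1 among the product atom columns, while the unmapped oxygen row has none, which mirrors the idea that only reliable atom pairs guide attention.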

When the reactant is R and the product is P, the elements of the second mask matrix 482 may be determined by Equation 5 below.

In Equation 5, i' denotes the index of the node in the second graph information 481 that corresponds to the i-th token of the second string information 492, and j' denotes the index of the node in the first graph information 441 that corresponds to the j-th token of the first string information 491.
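Using the indices i' and j' just defined, Equation 5 can plausibly be written as below for atom tokens; entries whose row or column is a non-atom token are set to 1 or 0 depending on the embodiment of FIG. 10 or FIG. 11. The phrasing "R_{i'} is mapped to P_{j'}" (the two atoms are paired by the atom mapping) is an assumed reading, not notation from the source.

```latex
m_{ij} =
\begin{cases}
  1, & \text{if atom } R_{i'} \text{ is mapped to atom } P_{j'} \\
  0, & \text{otherwise}
\end{cases}
```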

Referring again to FIG. 9, the second mask matrix generator 480 may provide the second mask matrix 482 to the decoder cross-attention unit 471.

The decoder cross-attention unit 471 may mask the cross-attention matrix 911 based on the second mask matrix 482.

The role of masking in the decoder cross-attention unit 471 may differ from its role in the encoder self-attention unit 431. This is not only because the atom mapping is imperfect, but also because, in cross-attention, the auto-regressive nature of the decoder unit 470 means it generates incomplete string information (for example, a partial SMILES string), so the decoder unit 470 cannot look up atom-mapping information while generating a sequence at inference time. Accordingly, rather than enforcing attention with a hard mask, the decoder cross-attention unit 471 guides attention only toward the specific entries of the incomplete atom-mapping information that are known (that is, those with m_ij = 1), so that the cross-attention matrix 911 gradually learns the complete atom mapping.

The decoder cross-attention unit 471 may output the masked cross-attention matrix 911.

The processor 110 may determine, from the masked cross-attention matrix 911, the elements to attend to when calculating an attention loss, and may obtain the attention loss of the cross-attention matrix 911 based on those elements. The attention loss may denote the error between the second mask matrix 482 and the cross-attention matrix 911. In an embodiment, the attention loss may be determined by Equation 6 below.

In Equation 6, L_attn denotes the attention loss, M_cross denotes the second mask matrix 482, A_cross denotes the cross-attention matrix 911, and ⊙ denotes the Hadamard product.
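The exact form of Equation 6 is not given in this text. One form consistent with the description (an error between M_cross and A_cross, with a Hadamard product restricting it to the mapped positions) is L_attn = Σ_ij [M_cross ⊙ (M_cross − A_cross)²]_ij, sketched below as an assumption. Because positions with m_ij = 0 contribute nothing, this loss pulls attention toward mapped atom pairs without forbidding attention elsewhere, which matches the soft guidance described for the decoder cross-attention unit 471.

```python
def attention_loss(m_cross, a_cross):
    """Assumed Equation 6: sum over i, j of [M_cross ⊙ (M_cross - A_cross)^2]_ij.

    Only entries with m_ij == 1 contribute, so attention is pulled toward the
    mapped atom pairs while unmapped positions are left unconstrained.
    """
    return sum(m * (m - a) ** 2
               for m_row, a_row in zip(m_cross, a_cross)
               for m, a in zip(m_row, a_row))


m_cross = [[1, 0], [0, 1]]
a_cross = [[1.0, 0.0], [0.5, 0.5]]
loss = attention_loss(m_cross, a_cross)  # only entry (1, 1) misses: (1 - 0.5)^2 = 0.25
```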

The processor 110 may obtain a cross-entropy loss from the second output sequence. In an embodiment, the processor 110 may obtain the cross-entropy loss by comparing the second output sequence with the second string information 492. The method of obtaining the cross-entropy loss is not limited to any particular method.

The processor 110 may calculate the total loss of the retrosynthesis prediction model based on the attention loss and the cross-entropy loss, using Equation 7 below.

In Equation 7, L_total denotes the total loss of the retrosynthesis prediction model, L_attn denotes the attention loss, and the weighting coefficient in Equation 7 is an adjustable parameter that balances the attention loss against the cross-entropy loss within the total loss. For example, the parameter may be set to 1, but is not limited thereto.
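Equation 7 is likewise not reproduced here; a weighted sum such as L_total = L_ce + α·L_attn matches the description, where α stands for the balancing parameter (default 1). The sketch below assumes token-level cross-entropy averaged over the target reactant tokens and takes the attention loss as a precomputed number; both choices are illustrative, since the source leaves the cross-entropy method open.

```python
import math


def token_cross_entropy(pred_probs, target_ids):
    """Mean negative log-likelihood of the target reactant tokens.

    pred_probs[t] is the predicted distribution over the vocabulary at step t.
    """
    nll = -sum(math.log(dist[tok]) for dist, tok in zip(pred_probs, target_ids))
    return nll / len(target_ids)


def total_loss(pred_probs, target_ids, l_attn, alpha=1.0):
    """Assumed Equation 7: L_total = L_ce + alpha * L_attn."""
    return token_cross_entropy(pred_probs, target_ids) + alpha * l_attn


pred_probs = [[0.5, 0.5], [1.0, 0.0]]  # two decoding steps, vocabulary of size 2
target_ids = [0, 0]
l_total = total_loss(pred_probs, target_ids, l_attn=0.25)
```

Setting alpha to 0 recovers plain cross-entropy training, so the masking effect of the first mask matrix still reaches the loss through the model output even without the attention term.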

Because the first mask matrix 442 contributes to the cross-entropy loss through the output of the retrosynthesis prediction model, the cross-entropy loss may reflect the masking effect of the first mask matrix 442.

The processor 110 may train the retrosynthesis prediction model so as to reduce its total loss.

Because the processor 110 removes unnecessary elements and attends to specific elements when calculating the losses of the retrosynthesis prediction model, the efficiency, speed, and accuracy of training may be improved.

FIG. 12 is a flowchart illustrating a method of operating a retrosynthesis prediction model according to an embodiment.

Referring to FIG. 12, in step S1210, the processor 110 may determine first attention information in the first string information 491 of the product based on the first graph information 441 of the product, and may encode the first string information 491 based on the determination result.

Because neighboring atoms are chemically closely related, the first attention information may be information on the atoms adjacent to the atom being encoded, which allows the retrosynthesis prediction model to be trained efficiently.

Because neighboring atoms are determined by the distance between nodes in the first graph information 441, the processor 110 may determine the information to attend to in the first string information 491 based on the first graph information 441 and a preset reference distance. Here, the distance may be the geodesic distance on the graph, or the hop-neighbor distance on the graph.

The processor 110 may determine mutually neighboring nodes in the first graph information 441 and encode the tokens of the first string information 491 based on the determination result.

The processor 110 may output the encoded first string information 491 as a first output sequence.

In step S1220, the processor 110 may determine second attention information from the first graph information 441 and the second graph information 481 of the reactant, and may decode the second string information 492 of the reactant based on the determination result.

The relationship between the product and the reactant may be represented by a cross-attention matrix, and an ideal cross-attention matrix 911 captures atom-mapping information, which represents the relationship between the atoms of the product and the atoms of the reactant. Therefore, for efficient training of the retrosynthesis prediction model, the information to attend to in the first graph information 441 and the second graph information 481 may be the atom-mapping information. The atom-mapping information may be established using the Flexible Maximum Common Substructure (FMCS) technique, which uses only specific 'product atom - reactant atom' pairs.

The processor 110 may determine specific pairs of mutually corresponding nodes in the first graph information 441 and the second graph information 481, and may decode the tokens of the second string information 492 based on the determination result.

The processor 110 may output the decoded second string information 492 as a second output sequence.

In step S1230, the processor 110 may train the retrosynthesis prediction model based on the decoding result of the second string information 492.

Based on the atom-mapping information, the processor 110 may calculate the attention loss of the cross-attention matrix 911, which indicates the degree of relevance between the tokens of the second string information 492 and the tokens of the first string information 491. The processor 110 may also calculate the cross-entropy loss of the retrosynthesis prediction model based on the second output sequence, and may calculate the total loss of the model by summing the attention loss and the cross-entropy loss.

The processor 110 may train the retrosynthesis prediction model based on the total loss. For example, the processor 110 may train the model until its total loss falls below a preset reference loss, but the training is not limited thereto.

FIG. 13 is a flowchart illustrating an encoding method according to an embodiment.

Referring to FIG. 13, in step S1310, the processor 110 may receive the first string information 491 and the first graph information 441.

The first string information 491 may be input to the processor 110 in SMILES format. The first string information 491 in SMILES format may include atom tokens (for example, B, C, N, O) and non-atom tokens such as bonds between atoms (for example, -, =, #), parentheses, whitespace, and the ring-closure digits that mark cyclic structures.
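Splitting a SMILES string into atom and non-atom tokens can be sketched with a regular expression. The pattern below is the one widely used in SMILES-to-SMILES Transformer work (for example, the Molecular Transformer); it is an assumption here, since the source does not state which tokenizer it uses, and it does not emit whitespace tokens. The hypothetical helper token_node_indices assumes graph nodes are indexed in the order atoms appear in the string, which is how RDKit's SMILES parser numbers atoms for typical inputs.

```python
import re

# Regex commonly used for SMILES tokenization in Transformer-based reaction models.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def is_atom_token(token):
    """Bracket atoms such as [NH4+] and element symbols such as C, c, Br are atoms."""
    return token.startswith("[") or token.isalpha()


def token_node_indices(tokens):
    """Graph node index per token (atoms numbered in order of appearance), None otherwise."""
    indices, next_node = [], 0
    for tok in tokens:
        if is_atom_token(tok):
            indices.append(next_node)
            next_node += 1
        else:
            indices.append(None)
    return indices


tokens = tokenize("c1ccoc1.C=C")  # furan + ethylene
nodes = token_node_indices(tokens)
```

The None entries are exactly the tokens with no counterpart in the molecular graph, which is why the token sequence and the node set do not correspond one-to-one.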

The first graph information 441 may be input to the processor 110 in the form of a two-dimensional graph. The graph information may include nodes and edges: each node may contain information about an atom of the product, and each edge may contain information about how the atoms are connected.

In step S1320, the processor 110 may generate a self-attention score matrix 511 indicating the degree of relevance between the tokens included in the first string information 491.

To generate the self-attention score matrix 511, the processor 110 may obtain query, key, and value information from the first string information 491.

For each query, the processor 110 may determine the degree of relevance to every key and record the result in the self-attention score matrix 511, which includes scores indicating how strongly each query relates to each key. In an embodiment, the self-attention score matrix 511 may be derived by a scaled dot-product attention operation on the queries and keys.

In step S1330, the processor 110 may apply a mask to the self-attention score matrix 511 based on the first graph information 441.

To mask unnecessary elements in the self-attention score matrix 511, the processor 110 may generate the first mask matrix 442 based on the first graph information 441 and a preset reference distance.

The processor 110 may generate a first pre-mask matrix whose rows and columns are the tokens included in the first string information 491.

The processor 110 may set one of the nodes included in the first graph information 441 as a reference node, and may determine the neighboring nodes located at the preset reference distance from the reference node. Here, the distance may be the geodesic distance on the graph, or the hop-neighbor distance on the graph. The reference distance can be adjusted through configuration.

The processor 110 may write information about the reference node and its neighboring nodes into the first pre-mask matrix. In one embodiment, the processor 110 may assign "1" only to the neighboring nodes of the reference node and "0" to the remaining nodes. In another embodiment, the processor 110 may assign "1" to both the reference node and its neighboring nodes, and "0" to the remaining nodes.

Meanwhile, in addition to atom tokens, the first string information 491 includes non-atom tokens such as bonds between atoms (for example, -, =, #), parentheses, whitespace, and ring-closure digits, so the tokens of the first string information 491 do not correspond one-to-one with the nodes of the first graph information 441. Because the meaning of these non-atom tokens becomes clear only from the overall context of the string, they may need information from a wider range. Therefore, the processor 110 may leave nodes corresponding to non-atom tokens unmasked, that is, assign "1" to them.

To strengthen the training of the retrosynthesis prediction model, the processor 110 may set a different reference distance for each attention head of the encoder unit 430. For example, the processor 110 may set the reference distance of a first head to a first distance, and set a second distance, different from the first distance, for a second head different from the first head.
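The mask construction of step S1330 (reference node, neighbors within the reference distance, non-atom tokens left unmasked, and a different reference distance per head) can be sketched as below. Assumptions: the graph is an adjacency list indexed by node, hop distance is computed with breadth-first search (which equals the geodesic distance on an unweighted molecular graph), "neighbor" is read as "within max_hop hops", and include_self selects between the two embodiments of whether the reference node itself is assigned 1. All names are hypothetical.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def build_self_mask(token_nodes, adjacency, max_hop=1, include_self=True):
    """First mask matrix over product tokens (rows and columns): 1 = attend, 0 = masked.

    token_nodes[i] is the graph node of token i, or None for a non-atom token.
    include_self chooses whether the reference node itself is assigned 1.
    """
    lowest = 0 if include_self else 1
    n = len(token_nodes)
    mask = [[1] * n for _ in range(n)]  # non-atom rows/columns stay 1 (unmasked)
    for i, ni in enumerate(token_nodes):
        if ni is None:
            continue
        dist = hop_distances(adjacency, ni)  # ni acts as the reference node
        for j, nj in enumerate(token_nodes):
            if nj is not None:
                d = dist.get(nj)
                mask[i][j] = 1 if d is not None and lowest <= d <= max_hop else 0
    return mask


# Benzene "c1ccccc1": six ring atoms, tokens c 1 c c c c c 1.
benzene_tokens = [0, None, 1, 2, 3, 4, 5, None]
benzene_adj = {k: [(k - 1) % 6, (k + 1) % 6] for k in range(6)}

# A different reference distance per attention head (here 1, 2 and 3 hops).
head_masks = [build_self_mask(benzene_tokens, benzene_adj, max_hop=h) for h in (1, 2, 3)]
```

Giving heads different reach lets some heads stay local while others see the whole ring, at no cost in parameters; each mask would then feed the Equation 3 masking of its own head.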

Based on the first mask matrix 442, the processor 110 may determine which elements of the self-attention score matrix 511 to attend to when encoding the first string information 491, and may output the masked self-attention score matrix 511 based on the determination result.

The processor 110 may identify the elements of the first mask matrix 442 that have a value of "1" and determine the elements of the self-attention score matrix 511 that correspond to them (that is, have the same coordinates). The processor 110 may leave the values of those corresponding elements unchanged and change the remaining elements to "-∞". In this way, the processor 110 may determine which elements of the self-attention score matrix 511 to attend to.

In step S1340, the processor 110 may generate, based on the masked self-attention score matrix 511, a self-attention matrix that represents the degree of relevance between the tokens included in the first string information 491 as probabilities.

The processor 110 may compute the attention distribution of the masked self-attention score matrix 511 using the softmax function, and generate attention values by taking the weighted sum of the values using that distribution. The attention values may be expressed as a self-attention matrix. The softmax function turns the scores of the self-attention score matrix 511 into probabilities.

In step S1350, the processor 110 may output the encoded first output sequence based on the self-attention matrix.

FIG. 14 is a flowchart illustrating a decoding method according to an embodiment.

Referring to FIG. 14, in step S1410, the processor 110 may receive the second string information 492 and the second graph information 481.

The second string information 492 may be input to the processor 110 in SMILES format. The second string information 492 in SMILES format may include atom tokens (for example, B, C, N, O) and non-atom tokens such as bonds between atoms (for example, -, =, #), parentheses, whitespace, and the ring-closure digits that mark cyclic structures.

The second graph information 481 may be input to the processor 110 in the form of a two-dimensional graph. The graph information may include nodes and edges: each node may contain information about an atom of the reactant, and each edge may contain information about how the atoms are connected.

In step S1420, the processor 110 may generate a cross-attention matrix 911 that represents, as probabilities, the degree of relevance between the tokens included in the first string information 491 and the tokens included in the second string information 492.

To generate the cross-attention matrix 911, the processor 110 may obtain key and value information from the first string information 491 and query information from the second string information 492.

For each query, the processor 110 may determine the degree of relevance to every key and record the result in a cross-attention score matrix, which includes scores indicating how strongly each query relates to each key. In an embodiment, the cross-attention score matrix may be derived by a scaled dot-product attention operation on the queries and keys.

The processor 110 may compute an attention distribution from the cross-attention score matrix using the softmax function, and produce attention values by taking the weighted sum of the values using that distribution. The attention values may be expressed as the cross-attention matrix 911.
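The three steps above (query/key/value projection, scaled dot-product scores, row-wise softmax followed by a weighted sum of values) can be sketched in NumPy. This is a minimal single-head sketch, assuming unbatched token embeddings and projection matrices `w_q`, `w_k`, `w_v`. These names are hypothetical, and a real model would typically use multi-head attention inside a Transformer decoder.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the maximum along the axis for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(reactant_h, product_h, w_q, w_k, w_v):
    """Single-head cross-attention.

    reactant_h: (n_r, d) embeddings of the second string (reactant) tokens.
    product_h:  (n_p, d) embeddings of the first string (product) tokens.
    Returns the (n_r, n_p) attention probabilities and the (n_r, d_v) attention output.
    """
    q = reactant_h @ w_q                       # queries from the reactant tokens
    k = product_h @ w_k                        # keys from the product tokens
    v = product_h @ w_v                        # values from the product tokens
    scores = q @ k.T / np.sqrt(k.shape[-1])    # cross-attention score matrix
    attn = softmax(scores, axis=-1)            # each row is a distribution over product tokens
    return attn, attn @ v                      # weighted sum of values
```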

In operation S1430, the processor 110 may apply a mask to the cross-attention matrix 911 based on atom-mapping information indicating the relationship between atoms in the product and atoms in the reactant.

The processor 110 may obtain the atom-mapping information, which indicates the relationship between atoms in the product and atoms in the reactant, based on the first graph information 441 and the second graph information 481.

The processor 110 may obtain the atom-mapping information using a Flexible Maximum Common Substructure (FMCS) technique. For example, the FMCS technique may be, but is not limited to, the FMCS algorithm implemented in RDKit.

Based on the atom-mapping information, the processor 110 may determine whether each atom in the product corresponds to each atom in the reactant, and generate a second mask matrix 482 based on the result.

The processor 110 may generate a second pre-mask matrix whose rows correspond to the second string information 492 and whose columns correspond to the first string information 491.

The processor 110 may set one of the nodes in the first graph information 441 as a reference node, and determine which node in the second graph information 481 corresponds to that reference node.

The processor 110 may record information about the node corresponding to the reference node in the second pre-mask matrix. In one embodiment, the processor 110 may assign "1" to the node corresponding to the reference node and "0" to the remaining nodes.

In one embodiment, the processor 110 may leave the nodes corresponding to non-atom tokens unmasked, that is, assign "1" to them. In another embodiment, the processor 110 may mask the nodes corresponding to non-atom tokens, that is, assign "0" to them, so that attention focuses on correspondences between atoms.
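The construction of the second mask matrix described above (reactant tokens as rows, product tokens as columns, "1" where a product atom maps to a reactant atom, and a switch for non-atom tokens) might look like the following sketch. Several details are assumptions not stated in the disclosure. `atom_map` is taken as a dictionary from the k-th product atom to the m-th reactant atom, counting atom tokens only in string order. A non-atom row is treated as entirely unmasked or entirely masked. The default `is_atom` test (`str.isalpha`) is a simplification. `build_second_mask` is a hypothetical name.

```python
import numpy as np


def build_second_mask(reactant_tokens, product_tokens, atom_map,
                      unmask_non_atoms=True, is_atom=str.isalpha):
    """Second mask matrix: rows = reactant tokens, columns = product tokens."""
    # Start from the all-zero pre-mask matrix.
    mask = np.zeros((len(reactant_tokens), len(product_tokens)), dtype=np.float32)
    # Positions of atom tokens within each token list.
    r_atom_pos = [i for i, t in enumerate(reactant_tokens) if is_atom(t)]
    p_atom_pos = [j for j, t in enumerate(product_tokens) if is_atom(t)]
    for p_atom, r_atom in atom_map.items():
        # Each product atom acts as a reference node; its mapped reactant atom gets "1".
        mask[r_atom_pos[r_atom], p_atom_pos[p_atom]] = 1.0
    if unmask_non_atoms:
        # First embodiment: rows of non-atom reactant tokens are left unmasked ("1").
        for i, t in enumerate(reactant_tokens):
            if not is_atom(t):
                mask[i, :] = 1.0
    # Otherwise (second embodiment) non-atom rows stay "0".
    return mask
```

As an example, consider ethanol (`CCO`) as the product and acetaldehyde (`CC=O`) as the reactant, with the atoms mapped in order. The row for the `=` token is all ones in the first embodiment and all zeros in the second.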

Based on the second mask matrix 482, the processor 110 may determine which elements of the cross-attention matrix 911 are to be attended to when calculating the attention loss of the retrosynthesis prediction model, and output the masked cross-attention matrix 911 based on that determination.

The processor 110 may identify the elements of the second mask matrix 482 that have the value "1". The processor 110 may then determine that the elements of the cross-attention matrix 911 at the same coordinates are the elements to be attended to when calculating the cross-attention loss of the retrosynthesis prediction model.
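Selecting the cross-attention entries that share coordinates with the "1" entries of the mask is a boolean-indexing operation. The sketch below shows it in NumPy. `attended_elements` is a hypothetical helper, and zeroing the unselected entries is one possible way to represent the "masked cross-attention matrix".

```python
import numpy as np


def attended_elements(cross_attn, mask):
    """Return the coordinates and values of the cross-attention entries where the mask is 1,
    plus a copy of the matrix with all other entries zeroed."""
    selected = mask == 1
    coords = np.argwhere(selected)                   # (row, col) pairs in row-major order
    values = cross_attn[selected]                    # matching entries, same row-major order
    masked = np.where(selected, cross_attn, 0.0)     # masked cross-attention matrix
    return coords, values, masked
```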

In operation S1440, the processor 110 may output a decoded second output sequence based on the masked cross-attention matrix 911.

FIG. 15 is a flowchart illustrating a method of training a retrosynthesis prediction model according to an embodiment.

Referring to FIG. 15, in operation S1510, the processor 110 may obtain the attention loss of the retrosynthesis prediction model from the masked cross-attention matrix 911. The attention loss may refer to the error between the second mask matrix 482 and the cross-attention matrix 911.
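The disclosure defines the attention loss only as an error between the second mask matrix and the cross-attention matrix, evaluated on the elements selected by the mask. The sketch below adopts one possible reading, which is an assumption: the mean squared error over the entries where the mask is "1". Other error measures would fit the description equally well, and `attention_loss` is a hypothetical name.

```python
import numpy as np


def attention_loss(cross_attn, mask):
    """Mean squared error between the mask and the attention probabilities,
    taken only over the entries where the mask is 1 (one possible reading)."""
    selected = mask == 1
    if not selected.any():
        return 0.0
    # At each selected entry the target is the mask value 1.
    return float(np.mean((mask[selected] - cross_attn[selected]) ** 2))
```

With the attention rows `[0.7, 0.2, 0.1]` and `[0.3, 0.3, 0.4]` and mask entries at (0, 0) and (1, 2), the selected errors are 0.09 and 0.36. Their mean is 0.225.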

In operation S1520, the processor 110 may obtain a cross-entropy loss from the second output sequence. In one embodiment, the processor 110 may obtain the cross-entropy loss by comparing the second output sequence with the second string information 492. The method of obtaining the cross-entropy loss is not limited to any particular method.
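Because the disclosure leaves the cross-entropy computation open, the following is only one common choice. It is token-level cross-entropy between the decoder's scores for each position and the reference reactant token ids (the second string information), with optional padding ignored. The array shapes and the `pad_id` parameter are assumptions made for illustration.

```python
import numpy as np


def token_cross_entropy(logits, target_ids, pad_id=None):
    """Mean negative log-likelihood of the reference tokens under softmax(logits).

    logits:     (T, V) decoder scores for T output positions over a vocabulary of size V.
    target_ids: (T,) integer ids of the reference reactant tokens.
    """
    logits = np.asarray(logits, dtype=np.float64)
    target_ids = np.asarray(target_ids)
    # Numerically stable log-softmax (log-sum-exp trick).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(target_ids)), target_ids]
    if pad_id is not None:
        nll = nll[target_ids != pad_id]   # ignore padding positions
    return float(nll.mean())
```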

In operation S1530, the processor 110 may train the retrosynthesis prediction model based on the attention loss and the cross-entropy loss.

The processor 110 may calculate the total loss of the retrosynthesis prediction model by summing the attention loss and the cross-entropy loss.

The processor 110 may train the retrosynthesis prediction model so as to reduce its total loss. For example, the processor 110 may train the model until the total loss falls below a preset reference loss, but the training is not limited thereto.
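The summed loss and the stopping rule above can be sketched framework-independently. This sketch is a simplification. `step_fn` is a hypothetical stand-in for one real optimisation step that returns the current attention and cross-entropy losses. `attn_weight` is a hypothetical weighting knob: with its default of 1.0 the total is the plain sum described above. `max_steps` is an added safeguard that the disclosure does not mention.

```python
def total_loss(attn_loss, ce_loss, attn_weight=1.0):
    # Plain sum of the two losses when attn_weight == 1.0 (attn_weight is hypothetical).
    return ce_loss + attn_weight * attn_loss


def train_until(step_fn, reference_loss, max_steps=10_000):
    """Run step_fn() until the total loss drops below reference_loss.

    step_fn performs one optimisation step and returns (attn_loss, ce_loss).
    Returns (steps_taken, last_total_loss).
    """
    loss = float("inf")
    for step in range(1, max_steps + 1):
        loss = total_loss(*step_fn())
        if loss < reference_loss:
            return step, loss
    return max_steps, loss
```

In a real setup, `step_fn` would compute both losses on a batch, back-propagate their sum, and update the model parameters.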

Template-based retrosynthesis prediction requires the domain knowledge of experienced chemists. In contrast, the neural network apparatus 100 of the present disclosure performs retrosynthesis prediction without templates, which improves time and cost efficiency. In addition, the neural network apparatus 100 may predict reactant combinations for a product beyond the coverage of the templates.

Retrosynthesis prediction that uses only the string information of a compound is less accurate, and prediction that uses only the graph information of a compound relies too heavily on atom-mapping information. The neural network apparatus of the present disclosure instead exploits the duality between string information and graph information, so the speed, accuracy, and efficiency of the retrosynthesis prediction model may be improved.

Meanwhile, the above-described embodiments may be written as programs executable on a computer, and may be implemented on a general-purpose digital computer that runs the programs from a computer-readable recording medium. The data structures used in the above-described embodiments may also be recorded on a computer-readable recording medium by various means. Such recording media include magnetic storage media (e.g., ROM, floppy disks, and hard disks) and optically readable media (e.g., CD-ROMs and DVDs).

Those of ordinary skill in the art to which the present embodiments pertain will understand that the embodiments may be implemented in modified forms without departing from the essential characteristics of the above description. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the rights is defined by the claims rather than by the above description, and all differences within the scope of their equivalents should be construed as being included in the present embodiments.

100: neural network device
110: processor
430: encoder unit
440: first mask matrix generator
470: decoder unit
480: second mask matrix generator

Claims

A method of training a retrosynthesis prediction model, the method comprising:
determining first attention information from first string information of a product based on first graph information of the product, and encoding the first string information based on a result of the determining;
determining second attention information from the first graph information and second graph information of a reactant, and decoding second string information of the reactant based on a result of the determining; and
training the retrosynthesis prediction model based on a result of decoding the second string information.

The method of claim 1, wherein
the encoding of the first string information comprises:
receiving the first string information and the first graph information;
generating a self-attention score matrix indicating a degree of relevance between tokens included in the first string information;
applying a mask to the self-attention score matrix based on the first graph information;
generating, based on the masked self-attention score matrix, a self-attention matrix that expresses as probabilities the degree of attention of each of the tokens included in the first string information; and
outputting an encoded first output sequence based on the self-attention matrix.

The method of claim 2, wherein
the generating of the self-attention matrix comprises:
obtaining a query, a key, and a value from the first string information; and
generating the self-attention matrix based on the query, the key, and the value.

The method of claim 2, wherein
the applying of the mask comprises:
generating a first mask matrix based on the first graph information and a preset reference distance; and
determining, based on the first mask matrix, elements of the self-attention score matrix to be attended to when encoding the first string information, and outputting a masked self-attention score matrix based on a result of the determining.

The method of claim 4, wherein
the generating of the first mask matrix comprises:
setting one of the nodes included in the first graph information as a reference node; and
expressing as “1” the reference node and the adjacent nodes located up to the reference distance from the reference node, and expressing the remaining nodes as “0”.

The method of claim 1, wherein
the decoding of the second string information comprises:
receiving the second string information and the second graph information;
generating a cross-attention matrix that expresses as probabilities a degree of relevance between tokens included in the first string information and tokens included in the second string information;
applying a mask to the cross-attention matrix based on atom-mapping information indicating a relationship between atoms included in the product and atoms included in the reactant; and
outputting a decoded second output sequence based on the masked cross-attention matrix.

The method of claim 6, wherein
the generating of the cross-attention matrix comprises:
obtaining a key and a value from the first string information;
obtaining a query from the second string information; and
generating the cross-attention matrix based on the query, the key, and the value.

The method of claim 6, wherein
the applying of the mask comprises:
obtaining the atom-mapping information based on the first graph information and the second graph information;
determining, based on the atom-mapping information, whether each of the atoms included in the product corresponds to each of the atoms included in the reactant, and generating a second mask matrix based on a result of the determining; and
determining, based on the second mask matrix, elements of the cross-attention matrix to be attended to when calculating an attention loss of the retrosynthesis prediction model, and outputting a masked cross-attention matrix based on a result of the determining.

The method of claim 8, wherein
the generating of the second mask matrix comprises:
setting one of the nodes included in the first graph information as a reference node; and
expressing as “1” the node, among the nodes included in the second graph information, that corresponds to the reference node, and expressing the remaining nodes as “0”.

The method of claim 8, wherein
the training of the retrosynthesis prediction model comprises:
obtaining the attention loss of the retrosynthesis prediction model from the masked cross-attention matrix;
obtaining a cross-entropy loss of the retrosynthesis prediction model from the second output sequence; and
training the retrosynthesis prediction model based on the attention loss and the cross-entropy loss.

The method of claim 10,
wherein the attention loss is adjustable by a parameter.

The method of claim 1, wherein
the first string information and the second string information
are in Simplified Molecular-Input Line-Entry System (SMILES) format.

The method of claim 1, wherein
the first graph information and the second graph information each include
at least one node and at least one edge,
the node includes information about atoms of the product or the reactant, and
the edge includes information about the connection relationships of the atoms.

A computer-readable recording medium having recorded thereon a program for executing the method of claim 1 on a computer.

An apparatus for predicting a reactant using a retrosynthesis prediction model, the apparatus comprising:
a memory storing at least one program; and
a processor configured to execute the at least one program,
wherein the processor is configured to:
determine first attention information from first string information of a product based on first graph information of the product, and encode the first string information based on a result of the determining;
determine second attention information from the first graph information and second graph information of a reactant, and decode second string information of the reactant based on a result of the determining; and
train the retrosynthesis prediction model based on a result of decoding the second string information.