KR20200072585A

KR20200072585A - Method for predicting the HAZARD and RISK of target chemicals BASED ON AI

Info

Publication number: KR20200072585A
Application number: KR1020180152485A
Authority: KR
Inventors: 이율희; 윤영선; 이혜정
Original assignee: 이율희; 윤영선
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-06-23

Abstract

According to an embodiment of the present invention, a method for predicting harmfulness and risk of a target substance based on artificial intelligence, which is performed by a physical property prediction device for predicting physical properties of a target substance, comprises: a pattern analysis step of analyzing a character string pattern for chemical properties or a structure of the target substance; a prediction modeling step of providing a physical property analysis and a harmfulness and risk prediction model based on artificial intelligence by using data about physical properties and properties of at least one chemical substance as learning data, and calculating a prediction value according to similarity for the analyzed character string pattern based on a toxicity prediction model; and an information providing step of comparing the calculated prediction value with an experimental value of an experimental substance having chemical properties similar to those of the target substance so as to predict physical properties, harmfulness and risk of the target substance and determine whether the target substance meets the chemical substance regulatory standards, and matching the determined physical properties, harmfulness and risk with preset chemical substance regulatory standards to provide physical properties and toxicity determination result including a matching result.

Description

Method for predicting the hazard and risk of target chemicals based on artificial intelligence

The present invention relates to a method for predicting the hazard and risk of a target substance based on artificial intelligence, in which a plurality of AI-based analysis models are combined to predict the physical properties of target substances and their potential to cause harm to the human body and the environment.

In general, when a new chemical substance is developed, the developer performs post-development tests to determine whether the substance complies with the Act on Registration and Evaluation of Chemical Substances (hereinafter, "the Registration and Evaluation Act") and with the regulations on the specific methods of chemical risk assessment. When the developer submits documents on the test results, the relevant government department grades the new substance as passed, restricted, or prohibited under the guidelines of the Registration and Evaluation Act.

The main evaluation areas under the Registration and Evaluation Act are physical and chemical properties, human hazard, and environmental hazard. Human hazard assessment applies to biotechnology sectors such as pharmaceuticals and medicine, while environmental hazard assessment applies to wastewater and industrial waste from chemical and petrochemical, semiconductor and electronics, and energy companies.

A developer can predict physical properties, including approximate toxicity, from the chemical formula when designing a new substance, but the accuracy is not high. For complex, composite chemical substances it is difficult to determine the properties from formula information alone, so most developers rely on post-development testing.

Post-development testing has several problems. The tests themselves are costly, the experimenter is exposed to toxic substances so that safety cannot be guaranteed, and if the new substance is found not to satisfy the Registration and Evaluation Act, the raw material and labor costs invested in developing it become sunk costs.

As concern about the harm chemicals cause to the human body and the environment grows, the Registration and Evaluation Act is being tightened. The development of new substances is also increasing, so the costs that manufacturers and auditing bodies incur in applying the Act are rising.

In chemical development, cheminformatics, which predicts physical properties by analyzing big data, is emerging. Leading companies overseas have developed property predictors, and some domestic pharmaceutical companies have adopted them, but their use is limited to pharmaceutical applications such as mutagenicity or carcinogenicity prediction.

Recently, growing concern over the humidifier disinfectant and sanitary pad incidents and over environmental pollution from industrial waste has strengthened the case for mandatory hazard testing of chemical substances. As a result, the criteria for registering new chemical substances under the Registration and Evaluation Act have become stricter, the evaluation criteria items have expanded considerably, and the procedure has become more complicated.

For a new substance, or a substance being registered under the Act for the first time, the hazard and risk confirmation at registration can cause the registration to fail, which can lead to cost and time penalties in the development process.

Therefore, developers of new substances need a predictive analysis technology that lets them check the environmental impact before designing the chemical structure of a new substance and apply the results quickly to the production process so that they do not fall behind trends.

To solve the above problems, an object of the present invention is to provide, according to an embodiment, a method for predicting the hazard and risk of a target substance based on artificial intelligence, which can predict the target substance's potential to cause toxicity by combining a plurality of analysis models capable of analyzing and predicting the physical properties and toxicity of new substances.

However, the technical problem to be achieved by the present embodiment is not limited to the one described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, a deep-learning-based method for predicting the hazard and risk of a target substance according to an embodiment of the present invention, performed by a property prediction device that predicts the physical properties of the target substance, includes: a pattern analysis step of analyzing a character string pattern for the chemical properties or structure of the target substance; a predictive modeling step of providing a deep-learning-based toxicity prediction model using data on the physical properties and characteristics of at least one chemical substance as learning data, and calculating a predicted value, based on the toxicity prediction model, according to similarity with respect to the analyzed character string pattern and molecular structure characteristics (geometric three-dimensional form, electronic characteristics, and the like); and an information providing step of determining the toxicity of the target substance by comparing the calculated predicted value with an experimental value of an experimental substance having chemical properties similar to those of the target substance, matching the determined toxicity against preset chemical substance regulatory standards, and providing a physical property and toxicity determination result that includes the matching result.

In the predictive modeling step, at least one analysis model is combined to cluster chemical properties that fall within a preset similarity range, and classification and regression analysis is performed by determining, from the clustered chemical properties, their contribution to the character string pattern of the target substance.

Here, the at least one analysis model includes at least one of Scikit-learn, Keras, TensorFlow, PyTorch, k-NN (k-nearest neighbors), decision tree, random forest (RF), gradient boosting, ensemble, SVM (support vector machine), naive Bayes, regression, k-means, PCA (principal component analysis), clustering, grid search, cross validation, ANN, CNN, RNN, autoencoder, GAN, DNN, QNN, and ANOVA (analysis of variance).

Meanwhile, in the pattern analysis step, the similarity between character string patterns and molecular formula structures is analyzed using SMILES (simplified molecular-input line-entry system) codes, which convert the chemical structure of the target substance into a one-dimensional character string for comparison, and various other naming notations that express chemical structural formulas.

In the information providing step, the median lethal concentration (LC50) or median effective concentration (EC50) for test organisms is collected, the hazard and risk of the target substance are numerically predicted on the basis of the collected LC50 or EC50 and other physicochemical properties, and it is determined whether the target substance complies with chemical substance regulations.

In the information providing step, the correlation between the chemical structure and the hazard of the target substance is analyzed through static analysis, and the following are provided as the toxicity determination result: a dose-response curve and predicted values for the relationship between the chemical properties or structure of the target substance and its environmental hazard and risk, and whether the substance conforms to the preset chemical substance regulatory standards. The preset chemical substance regulatory standards consist of the evaluation criteria items for assessing the target substance as a permitted substance under the Act on Registration and Evaluation of Chemical Substances.

In the information providing step, a chemical structure redesigned to reduce the toxicity of the target substance is provided in conjunction with the predicted value.

According to the above means, the molecular formula of a new substance is converted into a character string so that pattern analysis can be performed quickly and easily, and the character string analysis is performed with a plurality of deep learning algorithms, which secures the accuracy of pattern analysis, the correlation between chemical structure and hazard and risk, and the diversity of patterns.

Because the present invention identifies the relationship between the chemical structural formula of a target substance and its toxicity, the toxic properties of the target substance can be predicted before actual analysis. This secures safety from an environmental protection standpoint when developing new substances and raises research efficiency by saving time and cost. It can also be determined whether the target substance meets environmental hazard standards.

In addition, the present invention modularizes the pattern analysis process and the predictive modeling process, and these modules can improve the application and understanding of chemical experiments. Existing chemical substances can be used as learning data, and the toxicity determination result for a new substance can be used as consulting material when registering under the Registration and Evaluation Act.

FIG. 1 is a diagram illustrating the configuration of a property prediction device for implementing a deep-learning-based method for predicting the hazard of a target substance according to an embodiment of the present invention.
FIG. 2 is a diagram showing the detailed configuration of the processor of FIG. 1.
FIG. 3 is an operation flowchart showing a deep-learning-based method for predicting the hazard of a target substance according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram illustrating the chemical structure of a target substance input in the pattern analysis step according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram illustrating the state of pattern analysis for a target substance in the pattern analysis step and the predictive modeling step according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the process of visualizing and outputting the toxicity determination result in the information providing step according to an embodiment of the present invention.
FIGS. 7 and 8 are diagrams illustrating a convolutional-neural-network-based toxicity prediction model applied to the AI-based method for predicting the hazard and risk of a target substance.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts of the drawings unrelated to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. When a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood that the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof is not excluded in advance.

The following embodiments are detailed descriptions intended to aid understanding of the present invention and do not limit its scope. Accordingly, inventions of the same scope that perform the same function as the present invention also fall within the scope of the present invention.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the configuration of a property prediction device for implementing an AI-based method for predicting the hazard of a target substance according to an embodiment of the present invention.

Referring to FIG. 1, the property prediction device 100 includes a communication module 110, a memory 120, a processor 130, and a database 140.

In detail, the communication module 110 works with the communication network 300 to provide the communication interface needed to deliver transmitted and received signals to the property prediction device 100 in the form of packet data. The communication module 110 may be a device that includes the hardware and software needed to transmit and receive signals, such as control signals or data signals, over a wired or wireless connection with other network devices.

The memory 120 stores a program for performing the AI-based method for predicting the hazard of a target substance. It also temporarily or permanently stores data processed by the processor 130. The memory 120 may include volatile storage media or non-volatile storage media, but the scope of the present invention is not limited thereto.

The processor 130 controls the overall process of providing the AI-based method for predicting the hazard of a target substance. Each step performed by the processor 130 is described below with reference to FIGS. 2 and 3.

The processor 130 may include any kind of device capable of processing data. Here, a "processor" may mean a data processing device embedded in hardware that has circuits physically structured to perform functions expressed as code or instructions contained in a program. Examples of such a hardware-embedded data processing device include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto.

The database 140 stores the data accumulated while the AI-based method for predicting the hazard of a target substance is performed. For example, the database 140 may store learning data, the chemical properties and structure of the target substance, pattern analysis results, and toxicity determination results.

FIG. 2 is a diagram showing the detailed configuration of the processor of FIG. 1.

Referring to FIG. 2, the processor 130 includes, but is not limited to, a pattern analysis module 131, a predictive modeling module 132, and an information providing module 133.

The pattern analysis module 131 analyzes a character string pattern for the chemical properties or structure of the target substance.

The predictive modeling module 132 provides an AI-based toxicity prediction model and, based on this model, calculates a predicted value indicating whether the target substance is toxic.

The information providing module 133 determines the toxicity of the target substance, matches the determined toxicity against preset chemical substance regulatory standards, and visualizes the physical property and toxicity determination result, including the matching result, for the user.

In this way, the processor 130 trains the pattern analysis module 131, the predictive modeling module 132, and the information providing module 133 individually by reinforcement learning, so that performance can be improved module by module.

FIG. 3 is an operation flowchart showing an AI-based method for predicting the hazard of a target substance according to an embodiment of the present invention, FIG. 4 is an exemplary diagram illustrating the chemical structure of a target substance input in the pattern analysis step, FIG. 5 is an exemplary diagram illustrating the state of pattern analysis for a target substance in the pattern analysis step and the predictive modeling step, and FIG. 6 is a diagram illustrating the process of visualizing and outputting the toxicity determination result in the information providing step.

Referring to FIGS. 3 to 6, in the AI-based method for predicting the hazard of a target substance, when the chemical formula of the target substance is input as a SMILES (simplified molecular-input line-entry system) code (S110), the character string pattern and molecular structure characteristics (geometric three-dimensional form, electronic characteristics, and the like) relating to the chemical properties or structure of the target substance are analyzed (S120). Besides SMILES codes, other nomenclatures and molecular structure diagrams may also be input.

One way to distinguish chemical substances is to convert the three-dimensional structure of a substance into a one-dimensional character string and compare the strings. The most widely used notations for this conversion are SMILES and InChI (International Chemical Identifier). Character strings generated this way have the advantage of reducing database size compared with storing chemical substances as three-dimensional structures.

In particular, as shown in FIG. 4, SMILES writes the atoms in a molecule linearly to express their arrangement and bonding. A structure written in SMILES therefore allows even long, complex molecular formulas to be analyzed quickly and precisely, is easy to read, and is well suited to representing simple three-dimensional structures.
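The comparison of linear SMILES strings described above can be illustrated with a minimal Python sketch. It is not the analysis disclosed in the specification: the tokenizer regular expression, the choice of token bigrams as the "character string pattern", the Tanimoto (Jaccard) score, and all function names are assumptions made for illustration.

```python
import re

# Assumed token grammar: bracket atoms, two-letter halogens, organic-subset atoms,
# aromatic atoms, two-digit ring closures, single digits, bonds, and branches.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[=#\-+\\/()@.:~*$]"
)

def tokenize(smiles):
    """Split a SMILES string into atom, bond, branch, and ring-closure tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in SMILES: {smiles!r}")
    return tokens

def ngrams(tokens, n=2):
    """Return the set of contiguous token n-grams used here as the string pattern."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def tanimoto(a, b, n=2):
    """Tanimoto (Jaccard) similarity between the n-gram sets of two SMILES strings."""
    ga, gb = ngrams(tokenize(a), n), ngrams(tokenize(b), n)
    if not ga and not gb:
        return 1.0  # two single-token strings carry no n-grams to distinguish them
    return len(ga & gb) / len(ga | gb)

if __name__ == "__main__":
    print(tokenize("ClCCl"))       # two-letter atoms stay together as one token
    print(tanimoto("CCO", "CCN"))  # ethanol vs. ethylamine
```

Tokenizing before forming n-grams keeps multi-character atoms such as `Cl` intact, which a plain character-level comparison would split.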

As shown in FIG. 5, to analyze character string patterns the processor may use algorithms such as RNN, CNN, GRU, and GAN to perform chemical structure pattern analysis of the named chemical formula, feature extraction from the molecular formula structure, text processing, improvement of the accuracy of chemical structure pattern analysis, design of new chemical structural formulas, and the like.

Meanwhile, the processor 130 provides a deep-learning-based toxicity prediction model using, as learning data, data on the physical properties, physical or chemical characteristics, and biological information of at least one chemical substance, and calculates a predicted value, based on the toxicity prediction model, according to similarity with respect to the analyzed character string pattern (S130). Here, the toxicity prediction model can analyze the physical properties of the target substance and predict its hazard and risk.
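One simple way to compute a predicted value "according to similarity" is a similarity-weighted nearest-neighbor estimate, sketched below. This is a stand-in for the deep-learning model named in the specification, not a reproduction of it. The character-bigram similarity, the function names, and the training values (placeholders, not measured data) are all assumptions.

```python
def bigrams(s):
    """Set of adjacent character pairs in a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

def similarity(a, b):
    """Jaccard similarity of character bigram sets (an assumed similarity measure)."""
    ga, gb = bigrams(a), bigrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0

def predict_log_lc50(target, training, k=3):
    """Similarity-weighted mean over the k training substances most like the target."""
    scored = sorted(
        ((similarity(target, smi), y) for smi, y in training), reverse=True
    )[:k]
    total = sum(w for w, _ in scored)
    if total == 0:
        raise ValueError("no training substance resembles the target")
    return sum(w * y for w, y in scored) / total

# Hypothetical learning data: (SMILES, log10 LC50 in mg/L).
# The numbers are illustrative placeholders, NOT experimental values.
TRAINING = [("CCO", 4.0), ("CCN", 2.0), ("CC(=O)O", 3.0), ("c1ccccc1O", 1.0)]

if __name__ == "__main__":
    print(predict_log_lc50("CCCO", TRAINING, k=2))
```

Weighting by similarity means that training substances sharing more of the target's string pattern pull the prediction more strongly toward their own values.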

The predictive modeling step (S130) combines at least one analysis model from among Scikit-learn, Keras, TensorFlow, PyTorch, k-NN (k-nearest neighbors), decision tree, random forest (RF), gradient boosting, ensemble, SVM (support vector machine), naive Bayes, regression, k-means, PCA (principal component analysis), clustering, grid search, cross validation, ANN, CNN, RNN, autoencoder, GAN, DNN, QNN, and ANOVA (analysis of variance) to cluster chemical properties that fall within a preset similarity range, and performs classification and regression analysis by determining, from the clustered chemical properties, their contribution to the character string pattern of the target substance.
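Grouping substances that fall within a preset similarity range can be sketched with a basic leader-clustering pass. The specification does not say which clustering procedure is used, so the algorithm, the character-bigram similarity, the 0.5 threshold, and the function names below are assumptions chosen only to make the idea concrete.

```python
def bigrams(s):
    """Set of adjacent character pairs in a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

def similarity(a, b):
    """Jaccard similarity of character bigram sets (an assumed similarity measure)."""
    ga, gb = bigrams(a), bigrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0

def cluster_by_similarity(smiles_list, threshold=0.5):
    """Leader clustering: each string joins the first cluster whose leader
    (its first member) is at least `threshold` similar; otherwise it starts
    a new cluster."""
    clusters = []
    for smi in smiles_list:
        for cluster in clusters:
            if similarity(smi, cluster[0]) >= threshold:
                cluster.append(smi)
                break
        else:
            clusters.append([smi])  # no existing leader was similar enough
    return clusters

if __name__ == "__main__":
    for group in cluster_by_similarity(["CCO", "CCCO", "c1ccccc1", "c1ccccc1O"]):
        print(group)
```

Leader clustering depends on input order, so a production system would more likely use one of the library methods listed above, such as k-means on numeric descriptors.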

Keras는 대상 물질 내부 패턴간의 상호 작용 관계를 예측 분석하고, Random Forest(RF)은 신물질의 환경 유해 가능성을 계산하며, SVM (Suppoert vector machine)은 분자 구성 원소에 따른 유해성과 위해성의 기여도를 분석하고, Naive Bayes는 분자 구조 패턴에 따라 조건부 환경 유해성과 위해성의 수치가 화학물질 평가 및 관리 기준에 부합되지 않는 확률을 계산하며, Regression는 물질의 양과 독성간의 관계 및 독성 미발현 최대량을 제시하고, K-means는 변수간의 연관성을 분석한다. Keras predicts and analyzes the interactions between patterns inside the target substance, Random Forest (RF) calculates the environmental hazards of new substances, and SVM (Suppoert vector machine) analyzes the contributions of hazards and risks according to molecular constituents. , Naive Bayes calculates the probability that the values of conditional environmental hazards and risks do not meet the criteria for chemical evaluation and management according to the molecular structure pattern, and Regression presents the relationship between the amount and toxicity of substances and the maximum amount of non-toxicity, K -means analyzes associations between variables.

따라서, 예측 모델링 단계(130)는 Keras를 이용한 예측 분석을 메인 분석으로 하면서, 나머지 분석 모델을 통해 유사한 특성끼리 군집화하고, 패턴에 대한 영향력을 판별하여 회귀 분석을 진행한다. 화학 물질의 경우, 분자의 특성, 즉 분자 구성 원소, molecular reactive area, bond의 종류, steric hindrance(입체 장해), 결합 순서, electric density(전자 밀도) 등 물리 또는 화학적 특성으로 해당 물질의 특성을 예측할 수 있기 때문에, 다수의 분석 모델을 조합하여 대상 물질에 대한 패턴 기여도에 따라 독성에 영향을 주는 원인을 분석할 수 있다. Therefore, in the predictive modeling step 130, while predictive analysis using Keras is the main analysis, similar characteristics are clustered through the remaining analysis models, and the influence on the pattern is determined to perform regression analysis. In the case of chemical substances, the properties of the substance can be predicted based on the physical or chemical characteristics such as molecular constituent elements, molecular reactive area, type of bond, steric hindrance, bonding order, and electric density. Because it can be combined, multiple analytical models can be combined to analyze causes affecting toxicity according to the pattern contribution to the target substance.

Meanwhile, the predictive modeling step (S130) takes the median lethal concentration (LC50) or the median effect concentration (ECx) as the target variable, outputs the value with the smallest residual given the training data so far, and improves the accuracy of the predicted value by learning from the comparison between predicted and experimental values.
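
The residual-minimizing fit described above can be illustrated with a minimal sketch. It assumes a single numeric descriptor per substance and an ordinary least-squares line, neither of which is specified in the source; the data values and the names `descriptor`, `log_lc50_exp`, `fit_least_squares`, and `residuals` are hypothetical.

```python
# Hypothetical toy data: one made-up descriptor value per substance and a
# made-up experimental log10(LC50). These are NOT real measurements.
descriptor = [1.0, 2.0, 3.0, 4.0]
log_lc50_exp = [2.1, 1.4, 0.9, 0.1]


def fit_least_squares(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y, slope, intercept):
    """Experimental value minus predicted value for each substance."""
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


slope, intercept = fit_least_squares(descriptor, log_lc50_exp)
res = residuals(descriptor, log_lc50_exp, slope, intercept)
sse = sum(r * r for r in res)  # quantity the fit minimizes
```

Comparing `res` against a tolerance is one simple way to flag substances whose predicted value disagrees with the experimental value and that may need further training data.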

The processor determines the toxicity of the target substance by comparing the calculated predicted value with the experimental value of an experimental substance whose chemical characteristics are similar to those of the target substance (S140), matches the determined toxicity against preset chemical substance regulatory standards, and visualizes the toxicity determination result, including the matching result, for the user (S150). The chemical substance regulatory standards may be items drawn from, for example, the Act on Registration and Evaluation of Chemical Substances and the Regulation on Specific Methods for Chemical Risk Assessment.

The information providing step (S150) collects the median lethal concentration (LC50) or median effect concentration (ECx) of the test subject and determines whether the target substance is toxic based on the collected value.
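
A threshold-based toxicity determination of this kind might look like the sketch below. The cut-offs are an assumption borrowed from the GHS acute aquatic hazard categories (LC50 in mg/L); the source does not state which thresholds the method applies, and the function name `acute_aquatic_category` is hypothetical.

```python
def acute_aquatic_category(lc50_mg_per_l):
    """Map an LC50 (mg/L) to an assumed GHS-style acute aquatic category.

    Assumed cut-offs: <= 1 mg/L, <= 10 mg/L, <= 100 mg/L.
    """
    if lc50_mg_per_l <= 0:
        raise ValueError("LC50 must be positive")
    if lc50_mg_per_l <= 1.0:
        return "Acute 1"
    if lc50_mg_per_l <= 10.0:
        return "Acute 2"
    if lc50_mg_per_l <= 100.0:
        return "Acute 3"
    return "Not classified"


def is_toxic(lc50_mg_per_l):
    """Treat any assigned category as a positive toxicity finding."""
    return acute_aquatic_category(lc50_mg_per_l) != "Not classified"
```

A lower LC50 means less substance is needed to kill half the test population, so the most severe category sits at the lowest concentration band.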

In addition, the information providing step (S150) analyzes the correlation between the chemical structure of the target substance and its toxicity through static analysis. As shown in FIG. 6, it provides as the toxicity determination result a dose-response curve and predicted values for the relationship between the chemical properties or structure of the target substance and environmental hazard, together with a probability value, obtained by matching against the preset chemical substance regulatory standards (the Act on Registration and Evaluation of Chemical Substances, 화평법), that the target substance cannot be registered as a permitted substance.
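
For readers unfamiliar with dose-response curves, the sketch below uses a log-logistic (Hill) curve. This functional form is an assumption: the source does not say which curve underlies FIG. 6. The names `hill_response` and `effect_concentration` and the parameter `hill_slope` are hypothetical.

```python
def hill_response(dose, ec50, hill_slope=1.0):
    """Fraction of the maximum effect (0..1) at a given dose.

    Assumed log-logistic form: 1 / (1 + (EC50 / dose) ** hill_slope).
    """
    if dose < 0 or ec50 <= 0 or hill_slope <= 0:
        raise ValueError("dose must be >= 0; ec50 and hill_slope must be > 0")
    if dose == 0:
        return 0.0
    return 1.0 / (1.0 + (ec50 / dose) ** hill_slope)


def effect_concentration(fraction, ec50, hill_slope=1.0):
    """Dose producing the given effect fraction (ECx with x = 100 * fraction)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return ec50 * (fraction / (1.0 - fraction)) ** (1.0 / hill_slope)


# Tabulate a curve for an assumed EC50 of 10 (arbitrary units).
curve = [(d, hill_response(d, ec50=10.0, hill_slope=2.0)) for d in (1, 5, 10, 20, 100)]
```

By construction the response at `dose == ec50` is one half, which is what makes EC50 the "median" effect concentration.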

Meanwhile, the information providing step (S150) may redesign, in conjunction with the predicted value, a chemical structure that reduces the toxicity of the target substance and provide it together with the toxicity determination result. The information providing step (S150) may also include in the determination result an expert opinion, received through an expert terminal, that predicts environmental hazard and risk using the physicochemical properties, structure, and chemical structural formula of the target substance.

In this way, the present invention makes it possible to assess, under the Act on Registration and Evaluation of Chemical Substances, whether the target substance can be registered and to prepare in advance for the items that will be evaluated at registration, and to redesign the chemical structural formula so as to reduce the toxicity of the substance in conjunction with the predicted value.

Meanwhile, steps S110 to S150 of FIG. 3 may be divided into additional steps or combined into fewer steps, depending on the implementation of the present invention.

FIGS. 7 and 8 illustrate a convolutional neural network (CNN)-based toxicity prediction model applied to the method for predicting the hazard and risk of a target substance based on artificial intelligence.

In general, a CNN consists of one or more convolutional layers with ordinary artificial neural network layers stacked on top, and additionally makes use of weights and pooling layers. This structure allows a CNN to take full advantage of input data with a two-dimensional structure.
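
The convolution and pooling operations mentioned above can be sketched in plain Python as follows. This is an illustration only, assuming a "valid" (no padding) convolution and non-overlapping max pooling; the kernel values, input grid, and function names are hypothetical, and a real model would use a deep learning library.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over a 2D grid without padding (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Hypothetical 5x5 input and 2x2 summing kernel.
image = [[5 * r + c for c in range(5)] for r in range(5)]
kernel = [[1, 1], [1, 1]]
feature_map = conv2d_valid(image, kernel)  # 4x4
pooled = max_pool2d(feature_map)           # 2x2
```

Pooling halves each spatial dimension here, which is how stacked convolution and pooling layers reduce a two-dimensional input to a compact feature summary.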

As shown in FIGS. 7 and 8, the toxicity prediction model is composed of multiple convolutional layers. As the number of convolutional layers grows, nonlinearity increases, so more useful features can be extracted.

More specifically, in addition to CNNs, recurrent neural network (RNN) techniques such as LSTM can be used to relate patterns in the chemical structural formula to the properties of the substance. With numerous chemical names and physical, chemical, and biological properties as variables, the correlations are identified and the features extracted. Once the pattern of the chemical formula has been analyzed, the extracted pattern features are incorporated into the weights, and element-wise operations in the neural network layers allow this information to yield more accurate predictions for the input.
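
Before a sequence model such as an LSTM can consume a structure string, the string has to be turned into numbers. The sketch below assumes SMILES strings as input and a naive character-level vocabulary; the source does not describe its encoding, and the helper names are hypothetical. Note that character-level splitting breaks two-letter atoms such as `Cl` into separate tokens, which a production tokenizer would handle.

```python
def build_vocab(strings):
    """Assign each distinct character an index; 0 is reserved for padding."""
    chars = sorted({ch for s in strings for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(s, vocab, max_len):
    """Convert a string to a fixed-length list of indices, zero-padded."""
    ids = [vocab[ch] for ch in s[:max_len]]
    return ids + [0] * (max_len - len(ids))


def one_hot(ids, vocab_size):
    """Expand indices to one-hot rows of width vocab_size + 1 (incl. padding)."""
    return [[1 if k == i else 0 for k in range(vocab_size + 1)] for i in ids]


# Ethanol and benzene as example SMILES strings.
smiles = ["CCO", "c1ccccc1"]
vocab = build_vocab(smiles)
ethanol_ids = encode("CCO", vocab, max_len=8)
benzene_ids = encode("c1ccccc1", vocab, max_len=8)
ethanol_one_hot = one_hot(ethanol_ids, len(vocab))
```

The resulting fixed-length index sequences (or their one-hot rows) are the kind of input a recurrent or convolutional layer would read position by position.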

A dense layer connects every input to every output, and each connection carries a weight. Loss is the loss function used to evaluate the current set of weights.
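
A dense layer and a loss function can be written out directly, as in the sketch below. Mean squared error is an assumed choice of loss, since the source does not name one, and the weights, biases, and inputs are hypothetical values chosen for easy arithmetic.

```python
def dense(inputs, weights, biases):
    """Fully connected layer: weights[j][i] links input i to output j."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def mse_loss(predicted, target):
    """Mean squared error between predicted and target vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Hypothetical two-input, two-output layer.
inputs = [1.0, 2.0]
weights = [[0.5, -1.0], [2.0, 0.0]]
biases = [0.0, 1.0]
outputs = dense(inputs, weights, biases)
loss = mse_loss(outputs, target=[0.0, 3.0])
```

Training adjusts `weights` and `biases` to lower `loss`, which is the sense in which the loss function "evaluates the current set of weights".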

This toxicity prediction model normalizes the data so that values are on roughly the same scale, and adds a batch normalization layer before a given hidden layer so that the input is adjusted and the new value is then passed to the activation function.
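
The normalize-then-activate ordering described above can be sketched for a single feature as follows. ReLU is an assumed activation function, the batch values are hypothetical, and the sketch uses training-time batch statistics only (no running averages), which a library implementation would also maintain.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


def relu(x):
    """Assumed activation function: pass positives, clamp negatives to zero."""
    return max(0.0, x)


# Hypothetical pre-activation values for one feature over a batch of four.
pre_activation = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(pre_activation)
activated = [relu(v) for v in normalized]
```

After normalization the feature has mean close to zero and variance close to one regardless of its original scale, so the activation function sees inputs in a consistent range.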

In this way, a toxicity prediction model based on machine learning or deep learning can analyze the physical properties of the target substance and predict its hazard and risk through classification, regression analysis, prediction, and derivation of model results, and can visualize the physical property analysis results, the toxicity determination results, and the like.

The embodiments of the present invention described above may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. Such a recording medium includes computer-readable media, which may be any available media accessible by a computer, including volatile and nonvolatile media and removable and non-removable media. Computer-readable media also include computer storage media, which encompass volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is illustrative, and a person of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: physical property prediction device
110: communication module 120: memory
130: processor 140: database

Claims

A method for predicting the hazard and risk of a target substance based on artificial intelligence, performed by a physical property prediction device that predicts physical properties of the target substance, the method comprising:
a pattern analysis step of analyzing a character string pattern for a chemical property or structure of the target substance;
a predictive modeling step of providing an artificial intelligence-based prediction model using data on the physical properties and characteristics of at least one chemical substance as training data, and calculating a predicted value based on a toxicity prediction model according to similarity to the analyzed character string pattern and molecular structure characteristics; and
an information providing step of determining the toxicity of the target substance by comparing the calculated predicted value with an experimental value of an experimental substance having chemical properties similar to those of the target substance, matching the determined toxicity against preset chemical substance regulatory standards, and providing a physical property or toxicity determination result including the matching result.

The method of claim 1,
wherein the predictive modeling step
combines at least one analysis model to cluster chemical properties within a preset similarity range, determines from the clustered chemical properties their contribution to the character string pattern of the target substance, and performs classification and regression analysis.

The method of claim 2,
wherein the at least one analysis model includes: Scikit-learn, Keras, Tensorflow, PyTorch, k-Nearest Neighbors (k-NN), decision tree, Random Forest (RF), gradient boosting, ensemble, SVM (support vector machine), Naive Bayes, Regression, K-means, PCA (principal component analysis), clustering, Grid Search, Cross Validation, ANN, CNN, RNN, Autoencoder, GAN, DNN, QNN.

The method of claim 1,
wherein the pattern analysis step
analyzes the similarity between character string patterns and molecular structures using any one of SMILES (simplified molecular-input line-entry system) code, which converts the chemical structure of the target substance into a one-dimensional character array for comparison, InChI, and IUPAC.

The method of claim 1,
wherein the information providing step
collects the median lethal concentration (LC50) or median effect concentration (EC50) of a test subject, numerically predicts the toxicity, hazard, and risk of the target substance based on the collected LC50 or EC50 and on physicochemical properties, and determines whether the chemical substance regulatory standards are met.

The method of claim 1,
wherein the information providing step
analyzes the correlation between the chemical structure of the target substance and hazard and risk through static analysis, and provides as a result a dose-response curve and predicted values for the relationship between the chemical structure of the target substance and environmental hazard and risk, together with the conformity of the result to the preset chemical substance regulatory standards.

The method of claim 1,
wherein the preset chemical substance regulatory standards consist of evaluation criteria for evaluating the target substance as a permitted substance under the Act on Registration and Evaluation of Chemical Substances.

The method of claim 1,
wherein the information providing step redesigns, in conjunction with the predicted value, a chemical structure that reduces the toxicity of the target substance and provides the redesigned structure.