CN115206450B - Synthetic route recommendation method and terminal - Google Patents

Info

Publication number
CN115206450B
Authority
CN
China
Prior art keywords
synthetic route
chemical reaction
user
reaction
credibility
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211119273.9A
Other languages
Chinese (zh)
Other versions
CN115206450A (en
Inventor
杨柳青
王薇
王中健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yaorongyun Digital Technology Chengdu Co ltd
Original Assignee
Yaorongyun Digital Technology Chengdu Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yaorongyun Digital Technology Chengdu Co ltd
Priority to CN202211119273.9A
Publication of CN115206450A
Application granted
Publication of CN115206450B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/10 Analysis or design of chemical reactions, syntheses or processes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Analytical Chemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Data Mining & Analysis (AREA)

Abstract

The invention discloses a synthetic route recommendation method and a terminal, belonging to the technical field of chemical informatics. The method comprises: establishing a chemical reaction database in which the document types include journal papers and patents; determining the credibility of the chemical reaction in each document; and giving a synthetic route recommendation result according to the credibility of the chemical reaction, or according to both the demand parameters of the user and the credibility of the chemical reaction. The demand parameters are any one or more of yield, availability of raw materials, severity of reaction conditions, danger of raw materials, price of raw materials, number of steps in the synthetic route, and reaction time. By introducing a document credibility confirmation mechanism, the invention captures the authority and confidence of each document, so that the synthetic route recommendation result is more reliable; by also taking the actual requirements of users into account, it can recommend synthetic routes that fit the needs of different users, which improves recommendation accuracy.

Description

Synthetic route recommendation method and terminal
Technical Field
The invention relates to the technical field of chemical informatics, in particular to a synthetic route recommendation method and a terminal.
Background
When an organic synthesis reaction database is constructed, query results need to be ranked. Because the data come mainly from patents and papers, and the content of these two kinds of source is difficult to evaluate on a common scale, the prior art generally ranks results by date or by the hit-rate score of the search content. This ranking does not reflect the authority or credibility of a synthesis reaction. However, users search for organic synthesis reactions in order to find highly reliable reactions that can guide their own experiments. It is therefore necessary to design a mechanism for evaluating the credibility of organic synthesis reaction data.
A synthetic route is a scheme describing how a target compound is produced from starting materials through one or more chemical reactions. In the prior art, most evaluation of the relevant indexes is carried out on single-step reactions; evaluation of whole synthetic routes is not addressed, and synthetic routes are not recommended according to the different requirements of users.
In addition, the prior art discloses organic synthesis route design based on chemical reaction databases: a compound raw material database, a reaction fingerprint database, a group compatibility analysis database and a compound intermediate database are established, and a target compound entered by the user is matched against the chemical reaction products in these databases to retrieve results. With such design methods, a synthetic route cannot be recommended according to the specific requirements of the user, and the reliability of synthetic routes taken from patents or journal literature is unclear, so recommendation accuracy is low.
Disclosure of Invention
The invention aims to overcome the problems in the prior art and provides a synthetic route recommendation method and a terminal.
The purpose of the invention is realized by the following technical scheme: a synthetic route recommendation method comprising the steps of:
establishing a chemical reaction database, wherein the document types in the database comprise journal papers and patents;
determining the credibility of the chemical reaction in each document;
giving a synthetic route recommendation result according to the credibility of the chemical reaction, or giving the synthetic route recommendation result according to the demand parameters of the user and the credibility of the chemical reaction; the demand parameters are any one or more of yield, availability of raw materials, severity of reaction conditions, danger of raw materials, price of raw materials, number of steps in the synthetic route, and reaction time.
In one example, the determining the confidence level of the chemical reaction in each document comprises the sub-steps of:
scoring, by experts, the reproducibility of the chemical reaction in the document, and determining the credibility of the chemical reaction according to the score; and/or
carrying out a chemical synthesis experiment according to the chemical reaction recorded in the document, and determining the credibility of the chemical reaction according to the experimental result; and/or
evaluating the chemical reaction according to the indexes of the different document types, and determining the credibility of the chemical reaction according to the evaluation result;
wherein the patent indexes are any one or more of legal information, family information, citation information, number of claims, number of specification pages, number of embodiments, applicant type and inventor information; and the journal paper indexes are any one or more of journal impact factor, citation count, number of cited papers of the institution, institution type and author information.
In one example, the determining the confidence level of the chemical reaction in each document comprises the sub-steps of:
quantizing the indexes of the various document types in the database, wherein the patent indexes are any one or more of legal information, family information, citation information, number of claims, number of specification pages, number of embodiments, applicant type and inventor information; and the journal paper indexes are any one or more of journal impact factor, citation count, number of cited papers of the institution, institution type and author information;
scoring the reproducibility of chemical reactions in the literature according to experts to obtain a first data set comprising different scores; and/or performing a chemical synthesis experiment according to a chemical reaction described in the literature, resulting in a second data set comprising different experimental results;
and training the model according to the first data set and/or the second data set, and performing classification learning on documents with unknown credibility and quantized indexes by adopting the trained model so as to obtain the credibility of all documents.
In one example, the step of giving the recommended synthetic route according to the demand parameters of the user and the credibility of the chemical reaction further comprises the following sub-steps:
carrying out indexing treatment on the demand parameters;
defining the weight coefficient of each demand parameter according to the sensitivity of different user types to different demand parameters to obtain default evaluation formulas of different types of users;
and giving a synthetic route recommendation result based on a default evaluation formula according to target products input by different types of users.
In one example, the method further comprises optimizing a default evaluation formula, comprising the sub-steps of:
and optimizing the weight parameters in the default evaluation formula according to the behavior parameters of the user, wherein the behavior parameters comprise the scoring result of the user on a certain reaction route, the staying time of the user on a certain reaction route page, the attention of the user on the certain reaction route and the retrieval times of the user on the certain reaction route.
In an example, after the optimizing the weight parameter in the default evaluation formula according to the behavior parameter of the user, the method further includes:
and judging whether the default evaluation formula is converged, and if so, obtaining a standard evaluation formula.
In an example, the indexing processing on the demand parameter specifically includes:
and carrying out forward processing and normalization processing on each demand parameter.
In one example, giving the synthetic route recommendation result based on the default evaluation formula further comprises:
obtaining a synthetic route by reverse derivation (retrosynthesis) from the target product input by the user;
calculating the recommendation index of each chemical reaction step in the synthetic route based on the default evaluation formula, to obtain the single-step recommendation index $S_k$;
calculating the recommendation index $S$ of the synthetic route from the single-step recommendation indexes $S_k$, thereby obtaining the synthetic route recommendation result.
In one example, the weight coefficients of the default rating formula are customized by a user.
It should be further noted that the technical features corresponding to the above examples can be combined with each other or replaced to form a new technical solution.
The invention also includes a storage medium having stored thereon computer instructions which, when run, perform the steps of the synthetic route recommendation method described in any one or more of the examples above.
The invention also includes a terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, the processor executing the computer instructions to perform the steps of the synthetic route recommendation method described in any one or more of the examples above.
Compared with the prior art, the invention has the beneficial effects that:
1. in one example, the database comprises multiple types of documents, which is beneficial for a user to obtain more comprehensive synthetic route recommendation results; a document credibility confirmation mechanism is introduced, so that the authority and the confidence of the document can be known, and the synthetic route recommendation result is more real and reliable; and finally, combining the actual requirements of the users, the method can recommend the synthetic route recommendation result fitting the requirements of the users for different users, and greatly improves the recommendation accuracy.
2. In one example, a mechanism for scoring the reproducibility of the reaction by experts and/or a chemical synthesis experiment mechanism and/or a mechanism for evaluating the chemical reaction based on literature indexes are introduced, so that the confidence of the chemical reaction in the literature can be obtained from multiple levels; by combining the three mechanisms, the accuracy and reliability of the document credibility evaluation can be further improved.
3. In one example, a model is trained by establishing a data set, and the credibility of the document is classified and predicted based on the model, so that the workload can be reduced, the document credibility confirmation efficiency can be improved, and the prediction accuracy is high.
4. In one example, different default evaluation formulas are designed for different types of users, so that the synthetic route can be accurately recommended for the different types of users to ensure the recommendation accuracy.
5. In one example, the default evaluation formula is optimized according to the behavior parameters of different users, the differentiated requirements of each user are considered, a synthetic route more suitable for the requirements of each user is conveniently recommended for each user, and the recommendation reliability is guaranteed.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention.
FIG. 1 is a flow chart of an exemplary method of the present invention;
FIG. 2 is a flow chart of a preferred exemplary method of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that directions or positional relationships indicated by "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like are based on the drawings, are used only for convenience and simplification of description, and do not indicate or imply that the device or element referred to must have a specific orientation or be configured and operated in a specific orientation; they should therefore not be construed as limiting the present invention. Furthermore, ordinal words (e.g., "first and second," "first through fourth," etc.) are used only to distinguish between objects; they do not limit the order and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly stated or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intervening medium; or an internal communication between two elements. The specific meanings of these terms in the present invention can be understood by those skilled in the art according to the specific case.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In one example, as shown in fig. 1, a synthetic route recommendation method specifically includes the following steps:
S1: establishing a chemical reaction database, wherein the document types in the database include journal papers and patents. The document types may also include books, newspapers, scientific reports and the like; journal papers include, but are not limited to, journal articles, academic papers, conference papers and the like; patents include invention patents and utility model patents, preferably invention patents.
S2: determining the credibility of the chemical reaction in each document; wherein the chemical reaction is a single-step chemical reaction.
S3: giving a synthetic route recommendation result according to the credibility of the chemical reaction, or according to the demand parameters of the user and the credibility of the chemical reaction; preferably, the recommendation result is given according to the demand parameters of the user and the credibility r of the chemical reaction. The demand parameters are any one or more of yield y, availability a of raw materials, severity c of reaction conditions, danger d of raw materials, price p of raw materials, number of steps s of the synthetic route, and reaction time t. More specifically: different types of users differ in their sensitivity to the yield y; for the availability a of raw materials, the synthetic route must finally converge to commercially available starting materials; the severity c of reaction conditions generally covers indexes such as temperature and pressure, and user sensitivity to it varies; the danger d of raw materials is evaluated along dimensions such as flammability, explosiveness and environmental friendliness, and user sensitivity to it varies; user sensitivity likewise varies for the price p of raw materials, the number of steps s of the synthetic route, and the reaction time t.
Specifically, in step S2, the credibility of the chemical reaction in each document is determined, that is, the authority and confidence of the chemical reaction in each document are evaluated, so as to determine whether the user can be guided to complete the corresponding experiment.
Specifically, in step S3, the demand parameters of the user are introduced to optimize the synthetic route recommendation result so that it better meets the user's actual needs. Take raw material price as an example of a demand parameter: for company employees, cost is a factor that must be weighed heavily, whereas for scientific researchers the priority of raw material price can be much lower. Combining the user's actual demand parameters therefore allows synthetic routes that meet each user's needs to be recommended to different users, which improves recommendation accuracy. Further, as an optional mode, the demand parameters can be selected or entered by the user. Alternatively, users can be classified into types according to basic information such as occupation and employer, and demand parameters can be assigned automatically by user type; the priorities of the different demand parameters can even be defined per type, for example, for scientific researchers the priority of raw material price is lower than the priority of the number of synthetic route steps.
In another example, steps S1, S2 may be transposed, or both steps may be performed simultaneously.
In one example, determining the trustworthiness of a chemical reaction in each document includes the sub-steps of:
scoring, by experts, the reproducibility of the chemical reaction in the document, and determining the credibility of the chemical reaction according to the score; and/or
carrying out a chemical synthesis experiment according to the chemical reaction recorded in the document, and determining the credibility of the chemical reaction according to the experimental result; and/or
evaluating the chemical reaction according to the indexes of the different document types, and determining the credibility of the chemical reaction according to the evaluation result. The patent indexes are any one or more of legal information (validity ps), family information (family number nsf), citation information (including the cited number nc1), number of claims cr, number of specification pages nsp, number of specification figures nf, number of embodiments ne, applicant type it1, and inventor information; preferably all patent indexes are used, with different weight factors assigned to different indexes. The journal paper indexes are any one or more of journal impact factor if, citation count nc2, number of cited papers of the institution ihc, institution type it2, and author information; preferably all journal paper indexes are used, with different weight factors assigned to different indexes.
In this example, the three ways of determining the reliability of the chemical reaction in each document are preferably combined to perform comprehensive evaluation on the chemical reaction in each document, for example, calculating a mean value or a weighted mean value of the three ways, so as to further improve the accuracy and reliability of the evaluation of the reliability of the document.
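The combination described above can be sketched in Python as a plain or weighted mean. This is a minimal illustration only: the function name `combine_credibility`, the convention that a missing score is `None`, and the sample weights are assumptions, not part of the patent.

```python
from typing import Optional, Sequence


def combine_credibility(
    scores: Sequence[Optional[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine credibility scores from the three mechanisms (expert score,
    experiment result, document-index evaluation) into one value.

    A score of None means that mechanism was not applied to the document
    (an assumed convention). With weights=None a plain mean is used.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    # Keep only the mechanisms that actually produced a score.
    pairs = [(s, w) for s, w in zip(scores, weights) if s is not None]
    total_weight = sum(w for _, w in pairs)
    if not pairs or total_weight <= 0:
        raise ValueError("at least one score with positive weight is required")
    return sum(s * w for s, w in pairs) / total_weight
```

For instance, `combine_credibility([8.0, 6.0, 7.0], weights=[2.0, 1.0, 1.0])` weights the expert score twice as heavily as the other two mechanisms.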
The three methods for determining the credibility of the chemical reaction in each document can be applied to all documents or to only part of them. To improve efficiency, they are applied to part of the documents to obtain a sample data set, and the credibility of the chemical reactions in all documents is then predicted from that sample data set. Specifically, in one example, determining the credibility of the chemical reaction in each document includes the following sub-steps:
S21: quantizing the indexes of the various document types in the database;
S22: scoring, by experts, the reproducibility of chemical reactions in the documents to obtain a first data set comprising the different scores; and/or performing chemical synthesis experiments according to the chemical reactions described in the documents to obtain a second data set comprising the different experimental results;
S23: training a model on the first data set and/or the second data set, and using the trained model to classify documents whose credibility is unknown but whose indexes have been quantized, so as to obtain the credibility of all documents.
More specifically, the first data set comprises the quantized indexes of each document and the reproducibility score of the chemical reaction in each document; the score reflects the probability that the reaction can be reproduced, and that probability in turn reflects the confidence of the chemical reaction in the corresponding document. The second data set comprises the quantized indexes of each document and the result of the chemical synthesis experiment for each document, scored from 0 to 10 according to the experimental result, where 10 means the reaction was fully reproduced and 0 means it could not be reproduced at all; a higher score corresponds to higher reproducibility.
Taking training on the second data set as an example, a certain percentage of documents is randomly drawn from the database, and chemical synthesis experiments are performed under the reaction conditions recorded in those documents to obtain the second data set. The second data set is divided into a training set and a test set in a ratio of 8:2. The model is preferably an existing machine learning model, although other data mining algorithms can also be used. The machine learning model learns the data features in the training set (the quantized indexes of each document) and predicts the reproducibility probability of the chemical reaction in the corresponding document (classification learning). Training is iterated and the hyper-parameters of the model are adjusted until the prediction accuracy reaches 95% or more, at which point training is complete. Predicting the credibility of chemical reactions in documents with the trained model greatly reduces the workload and improves the efficiency of confirming document credibility, while maintaining high prediction accuracy.
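The 8:2 split and the 95% stopping criterion described above can be sketched as follows. The helper names (`split_dataset`, `accuracy`, `training_complete`) and the fixed random seed are illustrative assumptions; the patent does not name a specific model, so none is shown here.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(records: Sequence, train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List, List]:
    """Shuffle the records and split them into training and test sets (8:2 by default)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability (assumption)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted: Sequence, actual: Sequence) -> float:
    """Fraction of test documents whose predicted class equals the known class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equally long")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def training_complete(predicted: Sequence, actual: Sequence,
                      target: float = 0.95) -> bool:
    """Stopping criterion from the text: accuracy on the test set is 95% or more."""
    return accuracy(predicted, actual) >= target
```

In a training loop, one would fit the chosen classifier on the training set, predict on the test set, and keep adjusting hyper-parameters until `training_complete` returns `True`.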
When the model is trained on both the first data set and the second data set, if the same randomly selected document appears in both data sets, the average of its two scores is taken to represent the reproducibility of the chemical reaction in that document.
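A minimal sketch of this averaging rule, assuming each data set is held as a dictionary from a document identifier to a score on the same 0 to 10 scale (the identifiers and the function name are hypothetical):

```python
from typing import Dict


def merge_reproducibility_scores(
    expert_scores: Dict[str, float],
    experiment_scores: Dict[str, float],
) -> Dict[str, float]:
    """Merge the two data sets; a document present in both gets the average score."""
    merged = {}
    for doc_id in expert_scores.keys() | experiment_scores.keys():
        values = [
            scores[doc_id]
            for scores in (expert_scores, experiment_scores)
            if doc_id in scores
        ]
        merged[doc_id] = sum(values) / len(values)
    return merged
```

Documents that appear in only one data set keep their single score unchanged.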
In another example, the execution order of steps S21, S22 may be reversed, or both steps may be executed simultaneously.
In one example, the step of giving the recommended synthetic route according to the demand parameters of the user and the credibility of the chemical reaction further comprises the following sub-steps:
S31: indexing the demand parameters; indexing means converting the demand parameters into numerical values so that they can be used in the subsequent synthetic route recommendation calculation.
S32: defining the weight coefficient of each demand parameter according to the sensitivity of different user types to the different demand parameters, to obtain default evaluation formulas for the different types of users. Sensitivity expresses the priority that a type of user gives to each demand parameter; the higher the priority, the larger the weight coefficient. Different users query synthetic routes in different application scenarios and differ in their sensitivity to the same index, so default evaluation formulas with different weights need to be set:
$$S_i = \sum_{j=1}^{m} W_{ij} P_j$$
where $S$ denotes the final recommendation index; $W$ denotes the default weight; $i$ ranges over $[1, n]$, where $n$ is the number of user types; $j$ ranges over $[1, m]$, where $m$ is the number of normalized demand parameters; and $P$ denotes the individual indexes, comprising the demand parameters of the user and the credibility r of the chemical reaction.
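Reading the definitions above as a weighted sum of the normalized indexes with one weight set per user type, the default evaluation formula can be sketched as below. The user-type names, the index names, and every weight value are hypothetical and chosen only for illustration.

```python
from typing import Dict, Mapping

# Hypothetical default weights per user type (each row sums to 1).
DEFAULT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "company_employee": {"yield": 0.2, "price": 0.4, "steps": 0.1, "credibility": 0.3},
    "researcher": {"yield": 0.3, "price": 0.1, "steps": 0.2, "credibility": 0.4},
}


def recommendation_index(
    user_type: str,
    indexes: Mapping[str, float],
    weights: Mapping[str, Mapping[str, float]] = DEFAULT_WEIGHTS,
) -> float:
    """Weighted sum of normalized indexes P_j using the weights W_ij of one user type."""
    type_weights = weights[user_type]
    missing = set(type_weights) - set(indexes)
    if missing:
        raise KeyError(f"missing indexes: {sorted(missing)}")
    return sum(type_weights[name] * indexes[name] for name in type_weights)
```

Under these sample weights the same reaction scores differently for the two user types, which is the point of keeping a separate default formula per type.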
S33: and giving a synthetic route recommendation result based on a default evaluation formula according to target products input by different types of users.
In one example, the method further comprises the step of optimizing the default evaluation formula, and the method comprises the following sub-steps:
optimizing the weight parameters $W$ in the default evaluation formula according to the behavior parameters of the user, wherein the behavior parameters include the user's score for a given reaction route, the time the user stays on the page of a given reaction route, the user's degree of attention to a given reaction route, the number of times the user retrieves a given reaction route, and the items the user ignores when screening reaction routes. This is equivalent to inferring each user's demand parameter preferences from that user's behavior parameters, obtaining a personalized label for each user, and formulating from that label an evaluation formula that better fits the user's actual needs. The differentiated requirements of each user are thus taken into account, so that a synthetic route better suited to each user can be recommended and recommendation reliability is ensured.
In an example, after the optimization processing of the weight parameter in the default evaluation formula according to the behavior parameter of the user, the method further includes:
judging whether the default evaluation formula has converged, and if so, obtaining a standard evaluation formula. The default evaluation formula is iterated continuously; when the difference between the results of two adjacent iterations is smaller than a convergence coefficient L, the default evaluation formula is considered to have converged, and a standard evaluation formula based on the user type is obtained.
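The iterate-until-converged loop can be sketched as follows. The text does not specify how behavior parameters are turned into weight updates, so the update rule here (blending the current weights toward a preference signal and renormalizing) is purely an assumption, as are the function names, the `rate` parameter, and the default value of the convergence coefficient.

```python
from typing import Dict


def update_weights(weights: Dict[str, float], preference: Dict[str, float],
                   rate: float = 0.5) -> Dict[str, float]:
    """One iteration (assumed rule): move weights toward the preference signal
    derived from user behavior, then renormalize so they sum to 1."""
    blended = {
        name: (1.0 - rate) * w + rate * preference.get(name, w)
        for name, w in weights.items()
    }
    total = sum(blended.values())
    return {name: value / total for name, value in blended.items()}


def optimize_until_converged(weights: Dict[str, float],
                             preference: Dict[str, float],
                             convergence_coefficient: float = 1e-4,
                             rate: float = 0.5,
                             max_iterations: int = 1000) -> Dict[str, float]:
    """Iterate until the largest weight change between two adjacent
    iterations is smaller than the convergence coefficient L."""
    for _ in range(max_iterations):
        updated = update_weights(weights, preference, rate)
        change = max(abs(updated[name] - weights[name]) for name in weights)
        weights = updated
        if change < convergence_coefficient:
            return weights
    raise RuntimeError("weights did not converge")
```

The returned weights play the role of the standard evaluation formula; in practice the preference signal would be estimated from scores, dwell time, attention, and retrieval counts.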
In one example, the indexing of the demand parameters specifically includes:
performing forward processing and normalization on each demand parameter.
Specifically, for a forward index $x_i$, such as yield or raw material availability, the forward conversion value $y_i$ is the index itself, i.e. $y_i = x_i$. For reverse indexes, such as reaction time and the number of synthetic route steps, the forward conversion formula is:
$$y_i = 1/x_i$$
Further, when the index $\{x_i\}$ is a set of interval-type index values whose optimal interval is $[a, b]$, where $a$ is the lower limit of the interval and $b$ is the upper limit, the forward conversion is formulated as follows:
$$M = \max\{a - \min\{x_i\},\ \max\{x_i\} - b\}$$
[Formula image: forward conversion of the interval-type index]
where $M$ is the maximum distance between the values in $\{x_i\}$ and the optimal interval $[a, b]$.
Further, the normalization formula in this example is:
[Formula image: normalization formula]
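The forward processing and normalization described above can be sketched as follows. The rules $y_i = x_i$, $y_i = 1/x_i$ and the definition of $M$ come from the text. The piecewise rule for interval-type indexes and the min-max normalization are assumptions: the corresponding formulas are only available as images, so a commonly used form is substituted here for illustration. All function names are hypothetical.

```python
from typing import List


def forward_benefit(xs: List[float]) -> List[float]:
    """Forward indexes (e.g. yield) are kept unchanged: y_i = x_i."""
    return list(xs)


def forward_cost(xs: List[float]) -> List[float]:
    """Reverse indexes (e.g. reaction time) use the reciprocal: y_i = 1 / x_i."""
    return [1.0 / x for x in xs]


def forward_interval(xs: List[float], a: float, b: float) -> List[float]:
    """Interval-type index with optimal interval [a, b].

    M follows the text. The piecewise rule below is an ASSUMED common form:
    values inside [a, b] score 1, values outside lose distance / M.
    """
    M = max(a - min(xs), max(xs) - b)
    if M <= 0:  # every value already lies inside the optimal interval
        return [1.0] * len(xs)
    out = []
    for x in xs:
        if x < a:
            out.append(1 - (a - x) / M)
        elif x > b:
            out.append(1 - (x - b) / M)
        else:
            out.append(1.0)
    return out


def min_max_normalize(ys: List[float]) -> List[float]:
    """ASSUMED normalization: rescale the forward values to [0, 1]."""
    lo, hi = min(ys), max(ys)
    if hi == lo:
        return [1.0] * len(ys)
    return [(y - lo) / (hi - lo) for y in ys]


# Hypothetical data: reaction times in hours and temperatures in degrees
# Celsius with an assumed optimal interval of [20, 30].
times = min_max_normalize(forward_cost([2.0, 4.0, 8.0]))
temps = forward_interval([10.0, 25.0, 40.0, 70.0], a=20.0, b=30.0)
```

With this data the shortest reaction time receives the highest normalized value, and temperatures inside the assumed optimal interval receive the value 1.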
In one example, giving the synthetic route recommendation result based on the default evaluation formula further comprises:
S331: deriving a synthetic route by reverse derivation from the target product input by the user, the synthetic route terminating at raw materials that can be purchased;
S332: calculating a recommendation index for each chemical reaction step in the synthetic route based on the default evaluation formula to obtain the single-step reaction recommendation indexes $S_k$, where $k$ ranges over $[1, q]$ and $q$ is the number of steps in the synthetic route;
S333: calculating the recommendation index $S$ of the synthetic route from the single-step reaction recommendation indexes $S_k$, thereby obtaining the synthetic route recommendation result. The recommendation index $S$ is calculated as:
[Formula image: recommendation index $S$ as a function of the single-step indexes $S_k$]
Preferably, the recommendation indexes of all synthetic routes are calculated and sorted in descending order to obtain a synthetic route recommendation result based on the user type. Further, the user may actively intervene in the ranking of the synthetic routes; for example, the user may choose to ignore a certain demand parameter in order to filter out more accurate recommendation results.
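Steps S331 to S333 can be sketched as follows, assuming the synthetic routes have already been derived and their parameters indexed. Two points are assumptions rather than the formulas of the invention, which are only available as images: the single-step index $S_k$ is taken to be a weighted sum of the indexed parameters, and the route index $S$ is taken to be the arithmetic mean of the $S_k$. The names `step_index`, `route_index` and `rank_routes` and the sample data are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# One reaction step: indexed (forward-processed, normalized) parameter values.
Step = Dict[str, float]


def step_index(step: Step, weights: Dict[str, float]) -> float:
    """Single-step recommendation index S_k as an ASSUMED weighted sum."""
    return sum(weights[name] * step[name] for name in weights)


def route_index(steps: List[Step], weights: Dict[str, float]) -> float:
    """Route index S aggregated over k = 1..q; the mean is an ASSUMPTION."""
    return mean([step_index(s, weights) for s in steps])


def rank_routes(routes: Dict[str, List[Step]],
                weights: Dict[str, float]) -> List[str]:
    """Return route names sorted by S in descending order."""
    return sorted(routes,
                  key=lambda name: route_index(routes[name], weights),
                  reverse=True)


# Hypothetical weights and routes: y = yield, r = data credibility.
weights = {"y": 0.5, "r": 0.5}
routes = {
    "A": [{"y": 0.8, "r": 0.9}, {"y": 0.6, "r": 0.7}],  # two-step route
    "B": [{"y": 0.9, "r": 0.4}],                        # one-step route
}
ranking = rank_routes(routes, weights)
```

Ignoring a demand parameter, as mentioned above, would correspond to removing its key from `weights` before ranking.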
In one example, the weight coefficients of the default evaluation formula can be customized by the user, which eliminates errors in the definition of the user type to the greatest extent, generates a personalized recommendation result, and forms a new evaluation formula:
[Formula image: evaluation formula with user-defined weights $U$]
where $U$ is the percentage weight set by the user.
Combining the above examples, a preferred embodiment of the present invention is shown in Fig. 2 and comprises the following steps:
S1': establishing a chemical reaction database, wherein the document types in the database comprise journal papers and invention patents;
S2': quantizing the indexes of each type of document in the database;
S3': performing chemical synthesis experiments according to the chemical reactions described in the literature to obtain a second data set comprising different experimental results;
S4': training a model on the second data set, and using the trained model to classify documents whose credibility is unknown but whose indexes have been quantized, thereby obtaining the credibility of all documents;
S5': indexing the demand parameters;
S6': defining the weight coefficient of each demand parameter according to the sensitivity of different user types to different demand parameters, to obtain default evaluation formulas for different types of users;
S7': optimizing the weight parameters $W$ in the default evaluation formula according to the behavior parameters of the user;
S8': deriving a synthetic route by reverse derivation from the target product input by the user;
S9': calculating a recommendation index for each chemical reaction step in the synthetic route based on the default evaluation formula to obtain the single-step reaction recommendation indexes $S_k$;
S10': calculating the recommendation index $S$ of the synthetic route from the single-step reaction recommendation indexes $S_k$, thereby obtaining a synthetic route recommendation result sorted by recommendation index.
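Steps S2' to S4' can be sketched as follows. The text does not name the model used for classification learning, so a nearest-centroid classifier is substituted purely for illustration. The two-component index vectors, the class labels and the function names are hypothetical; in practice the vectors would hold the quantized document indexes (for example journal impact factor or citation information) and the labels would come from the experimental results of the second data set.

```python
from math import dist
from typing import Dict, List, Sequence


def train_centroids(samples: List[Sequence[float]],
                    labels: List[str]) -> Dict[str, List[float]]:
    """Average the quantized index vectors of each experimentally verified class."""
    grouped: Dict[str, List[Sequence[float]]] = {}
    for vec, label in zip(samples, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def classify(vec: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(vec, centroids[label]))


# Hypothetical second data set: quantized indexes of documents whose
# reactions were repeated in the laboratory, with the observed outcome.
verified = [(0.9, 0.8), (0.8, 0.7), (0.2, 0.1), (0.3, 0.2)]
outcomes = ["reproduced", "reproduced", "not reproduced", "not reproduced"]
model = train_centroids(verified, outcomes)

# A document of unknown credibility with already quantized indexes.
predicted = classify((0.7, 0.9), model)
```

Any classifier that maps quantized indexes to a credibility class could take the place of the nearest-centroid rule here.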
To further illustrate the technical idea of the present invention, it will now be illustrated by the following four examples:
Example 1: An employee of a company searches the database with m-hydroxybenzoic acid as the target product and obtains 10 records, i.e., 10 recommended synthetic routes. The data credibility value r of each record is detailed in Table 1:
Table 1. Retrieval results obtained with m-hydroxybenzoic acid as the target product
[Table 1 image]
The prior art can only sort by a single factor, for example by time, or can sort data from journal papers by impact factor, but cannot rank papers and patents within the same system. As can be seen from Table 1, the present invention can rank by integrating multidimensional conditions. When the employee searches the database with m-hydroxybenzoic acid as the target product, the 10 retrieved records are ranked as follows: 9 → 1 → 3 → 7 → 6 → 2 → 5 → 8 → 10 → 4. This ranking reflects the data credibility: when reproducing the reaction, the user can refer first to the reaction conditions of record No. 9, which is likely to be reproduced successfully, saving the user time and research and development cost.
Example 2: an enterprise employee queries a synthetic route of pomalidomide in a database, aiming at carrying out process optimization, wherein a retrieval result obtained by taking pomalidomide as a target product is as follows:
Search result 1:
[Reaction scheme image: search result 1]
Here, reflux denotes heating under reflux, and r.t. denotes room temperature.
Search result 2:
[Reaction scheme image: search result 2]
Search result 3:
[Reaction scheme image: search result 3]
Search result 4:
[Reaction scheme image: search result 4]
Search result 5:
[Reaction scheme image: search result 5]
Search result 6:
[Reaction scheme image: search result 6]
Search result 7:
[Reaction scheme image: search result 7]
The structural formula of compound cmm14471 is:
[Structure image: compound cmm14471]
The yield y, raw material availability a, reaction condition severity c, raw material risk d, raw material price p, number of synthetic route steps s, reaction time t and data credibility r of the 7 search results in Example 2 were indexed; the results are shown in Table 2:
Table 2. Data parameter indexing results
[Table 2 image]
According to Table 2 and the 7 search results, the technical solution of the invention can rank by integrating multidimensional conditions. When the company employee searches the database with pomalidomide as the target product, the 7 retrieved chemical synthesis routes are sorted in descending order of the recommendation index S, giving the ranking: 3 → 4 → 6 → 2 → 1 → 5 → 7. This ranking is based on the user's "process optimization" label in the system: the system computes the recommendation index S with the evaluation formula associated with that label. The ranking therefore takes into account the various factors the user considers in process optimization and has considerable reference value for the user.
Example 3: On the basis of Example 2, the user searches again for synthetic routes of pomalidomide using user-defined evaluation index weights. The user retrieves the same 7 search results shown in Example 2 and redefines the index weights as follows: $U_y = 0.3$, $U_a = 0.1$, $U_c = 0.15$, $U_d = 0.1$, $U_p = 0.1$, $U_s = 0.1$, $U_t = 0.1$, $U_r = 0.05$. The resulting recommendation indexes are shown in Table 3:
Table 3. Recommendation index calculation results based on the modified weight values
[Table 3 image]
Sorting the results in Table 3 by the recommendation index S gives: 4 → 3 → 6 → 2 → 5 → 1 → 7. Compared with the ranking obtained in Example 2, the recommendation result changes once the user customizes the index weights, so that it better matches the user's requirements.
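The effect of user-defined weights in Example 3 can be sketched as follows. The weight values are those given in Example 3. Everything else is an assumption made for illustration: the recommendation index is taken to be a weighted sum of the indexed parameters, the two routes `route_x` and `route_y` and their indexed values are invented, and the equal-weight baseline merely stands in for a default evaluation formula whose actual weights are not stated here.

```python
from typing import Dict

# User-defined weights from Example 3 (y, a, c, d, p, s, t, r).
USER_WEIGHTS = {"y": 0.30, "a": 0.10, "c": 0.15, "d": 0.10,
                "p": 0.10, "s": 0.10, "t": 0.10, "r": 0.05}

# Hypothetical baseline: equal weights for all eight indexes.
EQUAL_WEIGHTS = {k: 1.0 / len(USER_WEIGHTS) for k in USER_WEIGHTS}


def validate_weights(weights: Dict[str, float], tol: float = 1e-9) -> None:
    """Check that percentage weights are non-negative and sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > tol:
        raise ValueError("weights must sum to 1")


def score(indexed: Dict[str, float], weights: Dict[str, float]) -> float:
    """ASSUMED weighted-sum recommendation index for one route."""
    return sum(weights[k] * indexed[k] for k in weights)


# Hypothetical indexed values: route X has a high yield, route Y is
# moderately good on every other index.
route_x = {k: 0.5 for k in USER_WEIGHTS}
route_x["y"] = 0.9
route_y = {k: 0.7 for k in USER_WEIGHTS}
route_y["y"] = 0.4

validate_weights(USER_WEIGHTS)
prefers_x_with_user_weights = score(route_x, USER_WEIGHTS) > score(route_y, USER_WEIGHTS)
prefers_x_with_equal_weights = score(route_x, EQUAL_WEIGHTS) > score(route_y, EQUAL_WEIGHTS)
```

Because the user weights emphasize yield, the high-yield route moves ahead of the other route, which is the kind of reordering Example 3 reports.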
Example 4: On the basis of Example 2, the user obtains the data ranking under the standard evaluation mode and then chooses to ignore some of the evaluation indexes. When the recommendation index S is calculated, the indexes ignored by the user are discarded and a new recommendation index S is computed. In this example, the user's main purpose in synthesizing the compound is to obtain it and test its performance; the user is not concerned with whether the scheme is suitable for mass production. The user therefore chooses to ignore raw material risk, raw material price and yield, which are excluded from the evaluation calculation. The resulting recommendation indexes are shown in Table 4:
Table 4. Recommendation index calculation results with some evaluation indexes ignored
[Table 4 image]
Sorting the results in Table 4 gives: 4 → 3 → 1 → 6 → 2 → 7 → 5. It can be seen that the recommendation result differs from the ranking results in Table 3.
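Discarding ignored indexes, as in Example 4, can be sketched as follows. The text states only that ignored indexes are excluded from the calculation; renormalizing the remaining weights so that they again sum to 1 is an assumption (it rescales every route's index by the same factor and so does not change the ranking). The equal starting weights and the function name `drop_ignored` are hypothetical.

```python
from typing import Dict, Iterable


def drop_ignored(weights: Dict[str, float],
                 ignored: Iterable[str]) -> Dict[str, float]:
    """Remove the ignored indexes and renormalize the remaining weights."""
    skip = set(ignored)
    kept = {k: w for k, w in weights.items() if k not in skip}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("at least one index with non-zero weight must remain")
    return {k: w / total for k, w in kept.items()}


# Hypothetical equal weights over the eight indexes of Example 2:
# yield y, availability a, condition severity c, risk d, price p,
# number of steps s, reaction time t, data credibility r.
standard = {k: 0.125 for k in ("y", "a", "c", "d", "p", "s", "t", "r")}

# Example 4 ignores raw material risk (d), raw material price (p) and yield (y).
reduced = drop_ignored(standard, ignored=("d", "p", "y"))
```

The recommendation index is then recomputed with `reduced` in place of the original weights.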
The present application further includes a storage medium, which shares the same inventive concept as the synthetic route recommendation method of any one or any combination of the above examples, and on which computer instructions are stored; when executed, the computer instructions perform the steps of the above synthetic route recommendation method.
Based on such understanding, the technical solution of the present embodiment, or parts of it, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The present application further includes a terminal, which shares the same inventive concept as the synthetic route recommendation method of any one or any combination of the above examples, and which includes a memory and a processor, the memory having stored thereon computer instructions executable on the processor, the processor performing the steps of the above synthetic route recommendation method when executing the computer instructions. The processor may be a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement the present invention.
In one example, a terminal, i.e., an electronic device, is represented in the form of a general purpose computing device, and components of the electronic device may include, but are not limited to: the at least one processing unit (processor), the at least one memory unit, and a bus connecting various system components including the memory unit and the processing unit.
Wherein the storage unit stores program code executable by the processing unit to cause the processing unit to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary method" of the present specification. For example, the processing unit may execute one of the synthetic route recommendation methods described above.
The memory unit may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) and/or a cache memory unit, and may further include a read only memory unit (ROM).
The storage unit may also include a program/utility having a set (at least one) of program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The bus may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface. Also, the electronic device may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via a network adapter. The network adapter communicates with other modules of the electronic device over the bus. It should be appreciated that other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, to name a few.
Through the above description, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the exemplary embodiment may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method of the exemplary embodiment of the present application.
The above detailed description serves to describe the invention in detail and should not be construed as limiting the invention to the specific embodiments described; it will be apparent to those skilled in the art that various modifications and substitutions can be made without departing from the spirit of the invention.

Claims (8)

1. A synthetic route recommendation method, characterized by: the method comprises the following steps:
establishing a chemical reaction database, wherein the document types in the database comprise journal papers and patents;
determining the credibility of the chemical reaction in each document;
giving a synthetic route recommendation result according to the credibility of the chemical reaction, or giving the synthetic route recommendation result according to the requirement parameters of the user and the credibility of the chemical reaction; the demand parameters are any one or more of yield, raw material availability, reaction condition rigor, raw material danger, raw material price, synthesis route step number and reaction time;
the determination of the credibility of the chemical reaction in each document comprises the following substeps:
having experts score the reproducibility of the chemical reaction in the literature, and determining the credibility of the chemical reaction according to the level of the score; and/or,
carrying out a chemical synthesis experiment according to the chemical reaction recorded in the literature, and determining the credibility of the chemical reaction according to the experimental result; and/or,
evaluating the chemical reaction according to indexes of different types of documents, and determining the credibility of the chemical reaction according to the evaluation result;
determining the trustworthiness of a chemical reaction in each document comprises the following sub-steps:
the method comprises the steps of quantifying indexes of various types of documents in a database, wherein the indexes of patents are any one or more of legal information, family information, citation information, the number of claims, the number of pages of a specification, the number of embodiments, the type of an applicant and inventor information; the index of the journal paper is any one or more of journal influence factors, quoted times, unit quoted paper quantity, unit type and author information;
scoring the reproducibility of chemical reactions in the literature according to experts to obtain a first data set comprising different scores; and/or performing a chemical synthesis experiment according to a chemical reaction described in the literature, resulting in a second data set comprising different experimental results;
and training the model according to the first data set and/or the second data set, and performing classification learning on documents with unknown credibility and quantized indexes by using the trained model to further obtain the credibility of all documents.
2. The synthetic route recommendation method according to claim 1, wherein: the step of giving the recommendation result of the synthetic route according to the demand parameters of the user and the credibility of the chemical reaction further comprises the following substeps:
carrying out indexing treatment on the demand parameters;
defining the weight coefficient of each demand parameter according to the sensitivity of different user types to different demand parameters to obtain default evaluation formulas of different types of users;
and giving a synthetic route recommendation result based on a default evaluation formula according to target products input by different types of users.
3. The synthetic route recommendation method according to claim 2, wherein: the method further comprises the step of optimizing the default evaluation formula, and the method comprises the following sub-steps:
and optimizing the weight parameters in the default evaluation formula according to the behavior parameters of the user, wherein the behavior parameters comprise a rating result of the user on a certain reaction route, the stay time of the user on a certain reaction route page, the attention of the user on the certain reaction route and the retrieval times of the user on the certain reaction route.
4. A synthetic route recommendation method according to claim 3, wherein: the optimizing the weight parameter in the default evaluation formula according to the behavior parameter of the user further comprises:
and judging whether the default evaluation formula is converged, and if so, obtaining a standard evaluation formula.
5. The synthetic route recommendation method according to claim 2, wherein: the indexing process of the demand parameters specifically includes:
and carrying out forward processing and normalization processing on each demand parameter.
6. The synthetic route recommendation method according to claim 2, wherein: giving the synthetic route recommendation result based on the default evaluation formula further comprises:
according to a target product input by a user, carrying out reverse derivation to obtain a synthetic route;
calculating a recommended index of chemical reaction in each step in the synthetic route based on a default evaluation formula to obtain a single-step reaction recommended index Sk;
and calculating the recommendation index S of the synthetic route according to the single-step reaction recommendation index Sk to further obtain a synthetic route recommendation result.
7. The synthetic route recommendation method according to claim 2, wherein: and the weight coefficient of the default evaluation formula is self-defined by a user.
8. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, characterized in that: the processor, when executing the computer instructions, performs the steps of the synthetic route recommendation method of any one of claims 1-7.
CN202211119273.9A 2022-09-15 2022-09-15 Synthetic route recommendation method and terminal Active CN115206450B (en)


Publications (2)

Publication Number Publication Date
CN115206450A (en) 2022-10-18
CN115206450B (en) 2022-12-06


