CN116705176A - Analysis method, system and equipment for synthesis difficulty of gene synthesis sequence - Google Patents

Info

Publication number
CN116705176A
Authority
CN
China
Prior art keywords
difficulty
sequence
analysis
module
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310763933.5A
Other languages
Chinese (zh)
Inventor
陆荣
杨鹏
杨祥华
孙健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Junjin Gene Technology Co ltd
Original Assignee
Suzhou Junjin Gene Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Junjin Gene Technology Co ltd
Priority to CN202310763933.5A
Publication of CN116705176A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biotechnology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An analysis system for the synthesis difficulty of a gene synthesis sequence comprises a foreground page management unit and a background page analysis unit. The foreground page management unit comprises an input data judgment module and a sequence difficulty analysis result output module, and the background page analysis unit comprises a synthesis difficulty analysis model module. The input data judgment module judges whether the data input by a user on a page meets the conditions for analysis; if so, all input data are placed into an array, the array is serialized into a JSON string array, and the JSON string array is sent to the synthesis difficulty analysis model module through jQuery's ajax method. The synthesis difficulty analysis model module analyzes the synthesis difficulty of the gene synthesis sequence, and the sequence difficulty analysis result output module outputs the sequence difficulty analysis result produced by the synthesis difficulty analysis model module.

Description

Analysis method, system and equipment for synthesis difficulty of gene synthesis sequence
Technical Field
The invention relates to the field of computers, in particular to a method, a system and equipment for analyzing the synthesis difficulty of a gene synthesis sequence.
Background
With the continuous progress of the technology in the field of bioscience, the gene synthesis technology is continuously developed, so that the existing genes in the nature can be synthesized, and the genes which do not exist in the nature can also be synthesized. In the future, gene synthesis will play a greater role in the life sciences, artificial life, and biomedical fields.
Currently, in order to obtain large amounts of synthetic genes more rapidly, industrialized gene synthesis methods have been created to meet the growing demands of research institutes and enterprises. For gene synthesis enterprises, the orders of different clients vary widely and the sequences differ in difficulty, so the production period of a sequence cannot be predicted, other orders cannot be scheduled reasonably, and gene synthesis efficiency falls.
Judging sequence difficulty manually, as is done at present, is time-consuming and labor-intensive, and the difficulty analysis may contain errors, which lowers both working efficiency and production efficiency.
Disclosure of Invention
One purpose of the invention is to provide a method, a system and a device for analyzing the synthesis difficulty of a gene synthesis sequence, which analyze the synthesis difficulty of the sequence so that the synthesis period can be judged accurately, helping a gene synthesis company plan its work as a whole and improve production efficiency.
Another purpose of the invention is to provide a method, a system and a device for analyzing the synthesis difficulty of a gene synthesis sequence, which reduce the workload: after configuring reasonable difficulty analysis parameters once, a user can obtain all details of an input sequence within a few seconds, namely the difficulty level, difficulty score, sequence length, GC ratio, GC fluctuation, overall repetition coverage length, Poly repetition coverage length, forward repetition coverage length, reverse repetition coverage length, overall repetition coverage ratio, Poly repetition coverage ratio, forward repetition coverage ratio, and reverse repetition coverage ratio.
Another purpose of the invention is to provide a method, a system and a device for analyzing the synthesis difficulty of a gene synthesis sequence, which standardize the sequence difficulty analysis service by implementing it as a program, so that the service becomes more standardized, simpler, and more intelligent and information-based.
In order to achieve at least one object of the present invention, the present invention provides a system for analyzing the synthesis difficulty of a gene synthesis sequence. The system includes a foreground page management unit and a background page analysis unit, where the foreground page management unit includes an input data judgment module and a sequence difficulty analysis result output module, and the background page analysis unit includes a synthesis difficulty analysis model module. The input data judgment module is configured to judge whether data input by a user on a page meets the conditions for analysis and, if so, to place all input data into an array, serialize the array into a JSON string array, and send the JSON string array to the synthesis difficulty analysis model module through jQuery's ajax method. The synthesis difficulty analysis model module performs the analysis of the synthesis difficulty of the gene synthesis sequence, and the sequence difficulty analysis result output module is configured to output the sequence difficulty analysis result of the synthesis difficulty analysis model module.
In some embodiments, the foreground page management unit further includes a reset module and a parameter description management module. The reset module refreshes the page in response to a user's reset request, and the parameter description management module, in response to a user's request for a description of the principle of the difficulty analysis tool, displays that description on the foreground page.
In some embodiments, the foreground page management unit further includes a save parameter management module configured as follows: if no difficulty analysis parameters are stored in the database, default values are displayed for the parameters on the page; if stored difficulty analysis parameters exist in the database, they are displayed directly in the page through the view engine.
In some embodiments, the synthesis difficulty analysis model module of the background analysis unit includes a deserialization module for deserializing the JSON string array into a generic string collection.
In some embodiments, the synthesis difficulty analysis model module of the background analysis unit includes a sequence repeated area set acquisition module and a repeated area length calculation module. The sequence repeated area set acquisition module passes the sequence and the parameters related to repeat analysis to the interface method for sequence repeat analysis to acquire the repeated area sets of the sequence; the repeated area length calculation module obtains the overall repeated area set, the Poly repeated area set, the forward repeated area set and the reverse repeated area set of the sequence, and from these sets calculates the length of the overall repeated area, the length of the Poly repeated area, the length of the forward repeated area and the length of the reverse repeated area on the sequence.
In some embodiments, the synthesis difficulty analysis model module of the background analysis unit includes a sequence difficulty score calculation module that calculates a sequence difficulty score and obtains the difficulty level of the sequence from the calculated score.
According to another aspect of the present invention, there is also provided a method for analyzing difficulty in synthesizing a gene synthesis sequence, the method comprising the steps of:
in response to a system user's operation of configuring the difficulty analysis parameters, converting the parameters into JSON objects, serializing them, and storing them in a database;
acquiring a gene sequence and operation parameters input on a page;
establishing a gene synthesis sequence synthesis difficulty analysis model, executing preset analysis logic, and obtaining a difficulty score and a difficulty level according to a difficulty analysis formula; and
obtaining and outputting a sequence difficulty analysis result.
In some embodiments, the method for analyzing the synthesis difficulty of the gene synthesis sequence comprises the following sequence difficulty analysis steps: acquiring the sequence and parameters input on the page of the foreground terminal; judging in JavaScript whether the sequence and the parameters meet the qualification conditions; if the input sequence and parameter data do not meet the conditions, prompting error information on the page; if the input sequence and parameter data meet the conditions, serializing the parameter set into a JSON string array; transmitting the JSON string array and the JSON string of the sequence to the background terminal through jQuery's ajax method; the background terminal calling an interface to obtain the various repeated areas and the GC content of the sequence; obtaining a difficulty score and a difficulty level according to a difficulty analysis formula; obtaining all results of the difficulty analysis and putting them into a distribution view; and returning the distribution view to the callback method of ajax and displaying it on the page of the foreground terminal; wherein sequence difficulty score = sequence length × sequence length weight + A × weight of A + GC fluctuation × GC fluctuation weight + overall repetition coverage length × overall repetition coverage length weight + Poly repetition coverage length × Poly repetition coverage length weight + forward repetition coverage length × forward repetition coverage length weight + reverse repetition coverage length × reverse repetition coverage length weight + overall repetition coverage ratio × overall repetition coverage ratio weight + Poly repetition coverage ratio × Poly repetition coverage ratio weight + forward repetition coverage ratio × forward repetition coverage ratio weight + reverse repetition coverage ratio × reverse repetition coverage ratio weight.
According to another aspect of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for analyzing the difficulty of synthesizing a gene synthesis sequence.
According to another aspect of the present invention, there is also provided an analysis apparatus for difficulty in synthesizing a gene synthesis sequence, comprising:
a memory for storing a software application; and
a processor for executing the software application, wherein the programs of the software application correspondingly execute the steps of the method for analyzing the synthesis difficulty of the gene synthesis sequence.
Drawings
FIG. 1 is a flow chart showing the steps of a method for analyzing the difficulty of synthesizing a gene synthesis sequence according to an embodiment of the present invention.
FIG. 2 is a flow chart showing the steps of the method for analyzing the difficulty of synthesizing the gene synthesis sequence according to the above embodiment of the present invention.
FIG. 3 is a flow chart showing the steps of the method for analyzing the difficulty of synthesizing the gene synthesis sequence according to the above embodiment of the present invention.
FIG. 4 is a difficulty analysis interface diagram of a difficulty analysis system for synthesizing a gene synthesis sequence according to one embodiment of the invention.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art. The basic principles of the invention defined in the following description may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
It will be understood that the terms "a" and "an" should be interpreted as referring to "at least one" or "one or more," i.e., in one embodiment, the number of elements may be one, while in another embodiment, the number of elements may be plural, and the term "a" should not be interpreted as limiting the number.
The present invention relates to computer programs. FIG. 1 is a flow chart of the method for analyzing the synthesis difficulty of a gene synthesis sequence according to the present invention. To solve the problems addressed by the invention, a computer program written according to this flow is executed, and objects external or internal to the computer are controlled or processed on the basis of the program's processing flow. Using a computer system, the method analyzes the synthesis difficulty of a gene synthesis sequence so that the synthesis period can be judged accurately, which helps a gene synthesis company plan its work as a whole and improves production efficiency.
Specifically, the analysis method for the synthesis difficulty of the gene synthesis sequence comprises the following steps:
S100: configuring difficulty analysis parameters and storing them in a database;
S200: acquiring a gene sequence and operation parameters input on a page;
S300: establishing a gene synthesis sequence synthesis difficulty analysis model, executing preset analysis logic, and obtaining a difficulty score and a difficulty level according to a difficulty analysis formula;
S400: obtaining and outputting a sequence difficulty analysis result.
In a specific embodiment, step S100 consists of, in response to the system user's operation of configuring the difficulty analysis parameters, converting the parameters into JSON objects, serializing them, and storing them in a database.
More specifically, the method for analyzing the synthesis difficulty of the gene synthesis sequence comprises the following steps for saving parameters:
acquiring the parameter data input on the page of the foreground terminal;
judging in JavaScript whether all the input data meet the qualification conditions, wherein the sequence must be non-empty and contain only the characters ATGCatgc, and each parameter must be a non-negative number;
if the input parameter data do not meet the conditions, prompting error information on the page;
if the input parameter data meet the conditions, serializing the parameter set into a JSON string array, wherein all the input data are put into one array and the array is serialized into the JSON string array;
transmitting the JSON string array to the background terminal through jQuery's ajax method;
the background terminal constructing a difficulty analysis parameter object;
assigning the JSON string array to the ParamValue field of the difficulty analysis parameter object; and
saving the object to, or updating it in, the database.
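The validation and serialization steps above can be sketched as follows. This is a minimal illustration in Python rather than the JavaScript used on the page; the function names `validate_inputs` and `serialize_params` and the error messages are hypothetical, and only the rules themselves (non-empty sequence limited to ATGCatgc, parameters that are non-negative numbers, all values serialized as one JSON string array) come from the description.

```python
import json
import math
import re

# Only the eight characters named in the description are accepted.
SEQUENCE_PATTERN = re.compile(r"[ATGCatgc]+")


def validate_inputs(sequence, params):
    """Return a list of error messages; an empty list means the input is acceptable."""
    errors = []
    if not sequence or not SEQUENCE_PATTERN.fullmatch(sequence):
        errors.append("sequence must be non-empty and contain only ATGCatgc")
    for index, raw in enumerate(params):
        try:
            value = float(raw)
        except (TypeError, ValueError):
            value = math.nan  # treated as "not a number" below
        if not math.isfinite(value):
            errors.append(f"parameter {index} is not a number")
        elif value < 0:
            errors.append(f"parameter {index} is negative")
    return errors


def serialize_params(params):
    """Put every input value into one array and serialize it as a JSON string array."""
    return json.dumps([str(p) for p in params])
```

If `validate_inputs` returns any messages, the page would show them instead of sending the request; otherwise the string returned by `serialize_params` is what would be posted to the background terminal.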
More specifically, the method for analyzing the synthesis difficulty of the gene synthesis sequence comprises the following sequence difficulty analysis steps:
acquiring the sequence and parameters input on the page of the foreground terminal;
judging in JavaScript whether the sequence and the parameters meet the qualification conditions;
if the input sequence and parameter data do not meet the conditions, prompting error information on the page;
if the input sequence and parameter data meet the conditions, serializing the parameter set into a JSON string array;
transmitting the JSON string array and the JSON string of the sequence to the background terminal through jQuery's ajax method;
the background terminal calling an interface to obtain the repeated regions and the GC content of the sequence;
obtaining a difficulty score and a difficulty level according to a difficulty analysis formula;
obtaining all results of the difficulty analysis and putting them into a distribution view; and
returning the distribution view to the ajax callback method and displaying it on the page of the foreground terminal.
More specifically, the analysis method of the synthetic difficulty of the gene synthesis sequence comprises the following steps:
S310: deserializing the JSON string array into a generic string collection, wherein the collection holds all parameters of the difficulty analysis;
S320: passing the sequence and the parameters related to repeat analysis to the interface method for sequence repeat analysis and obtaining the various repeated area sets of the sequence; if the repeat analysis interface throws an exception, returning to the page and popping up a prompt box showing the error information;
S330: acquiring the overall repeated area set, the Poly repeated area set, the forward repeated area set and the reverse repeated area set of the sequence, and from these sets calculating the length of the overall repeated area, the length of the Poly repeated area, the length of the forward repeated area and the length of the reverse repeated area on the sequence;
S340: calculating the GC fluctuation, wherein GC fluctuation = maximum GC ratio - minimum GC ratio; for example, if the GC analysis length in the parameters is 40, the GC ratio of each subsequence of length 40 in the sequence is analyzed, and the maximum and the minimum of these GC ratios are obtained;
S350: acquiring the GC ratio and the sequence length of the sequence, and defining a value A, wherein if GC ratio > high GC limit, A = GC ratio - high GC limit; if GC ratio < low GC limit, A = low GC limit - GC ratio; if low GC limit <= GC ratio <= high GC limit, A = 0;
wherein sequence difficulty score = sequence length × sequence length weight + A × weight of A + GC fluctuation × GC fluctuation weight + overall repetition coverage length × overall repetition coverage length weight + Poly repetition coverage length × Poly repetition coverage length weight + forward repetition coverage length × forward repetition coverage length weight + reverse repetition coverage length × reverse repetition coverage length weight + overall repetition coverage ratio × overall repetition coverage ratio weight + Poly repetition coverage ratio × Poly repetition coverage ratio weight + forward repetition coverage ratio × forward repetition coverage ratio weight + reverse repetition coverage ratio × reverse repetition coverage ratio weight;
S360: determining which difficulty level range the difficulty score falls in, and thereby obtaining the difficulty level of the sequence;
S410: obtaining the results of the sequence difficulty analysis: difficulty level, difficulty score, sequence length, GC ratio, GC fluctuation, overall repetition coverage length, Poly repetition coverage length, forward repetition coverage length, reverse repetition coverage length, overall repetition coverage ratio, Poly repetition coverage ratio, forward repetition coverage ratio, and reverse repetition coverage ratio;
S420: constructing the results of the sequence difficulty analysis into an object, putting the object into a distribution view for display, returning the distribution view to the callback function of the calling ajax method, and then displaying the distribution view on the page of the foreground terminal.
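Step S360 maps the difficulty score onto a difficulty level by range. The description does not state the ranges or the level names, so the boundaries and labels below are purely hypothetical placeholders; the sketch only shows one way such a range lookup could be written.

```python
from bisect import bisect_right

# Hypothetical boundaries and labels; the description does not give them.
# A score below 20 is "easy", 20 up to (not including) 50 is "medium", and so on.
LEVEL_BOUNDS = [20.0, 50.0, 80.0]
LEVEL_NAMES = ["easy", "medium", "hard", "very hard"]


def difficulty_level(score):
    """Return the name of the level whose range contains the score."""
    return LEVEL_NAMES[bisect_right(LEVEL_BOUNDS, score)]
```

Keeping the boundaries in a sorted list means the ranges could be loaded from the saved difficulty analysis parameters without changing the lookup itself.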
Through the above steps, the workload is reduced: a user only needs to configure reasonable difficulty analysis parameters once, and can then obtain all details of an input sequence within a few seconds, namely the difficulty level, difficulty score, sequence length, GC ratio, GC fluctuation, overall repetition coverage length, Poly repetition coverage length, forward repetition coverage length, reverse repetition coverage length, overall repetition coverage ratio, Poly repetition coverage ratio, forward repetition coverage ratio, and reverse repetition coverage ratio.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided in the form of a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
Those skilled in the art will appreciate that the methods of the present invention may be implemented in hardware, software, or a combination of hardware and software. The invention may be implemented in a centralized fashion in at least one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods is suited. The combination of hardware and software may be a general-purpose computer system with a computer program installed thereon, and the computer system may be controlled to operate according to the method by installing and executing the program.
The present invention can be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein. The computer program product is embodied in one or more computer-readable storage media having computer-readable program code embodied therein. According to another aspect of the invention there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is capable of performing the steps of the method of the invention. Computer storage media is the medium in computer memory that stores some discrete physical quantity. Computer storage media includes, but is not limited to, semiconductors, disk storage, magnetic cores, drums, tapes, laser disks, and the like. It will be appreciated by those skilled in the art that the computer storage media is not limited to the foregoing examples, which are provided by way of example only and are not limiting of the invention.
According to another aspect of the present invention, there is also provided an apparatus for analyzing difficulty in synthesizing a gene synthesis sequence, the apparatus comprising: the system comprises a software application, a memory for storing the software application, and a processor for executing the software application. The respective programs of the software application program are capable of correspondingly executing the steps in the analysis method of the synthetic difficulty of the gene synthesis sequence of the present invention.
Corresponding to the embodiment of the method, according to another aspect of the invention, there is also provided a system for analyzing the difficulty of synthesizing the gene synthesis sequence, which is the application of the method for analyzing the difficulty of synthesizing the gene synthesis sequence in the improvement of computer programs.
In a specific embodiment, the analysis system for the synthesis difficulty of the gene synthesis sequence is built with ASP.NET MVC in DB First (database-first) mode, the database is SQL Server, and the page styles mostly use Bootstrap. Interaction between the foreground and the background relies on jQuery's ajax method, and data is displayed on the foreground pages in two ways: a distribution view (a partial view in ASP.NET MVC terms) and the view engine. The ajax call together with the distribution view is used to show the difficulty analysis result, and the view engine is used to display the difficulty analysis parameters already saved in the database directly on the page of the foreground terminal.
Specifically, the analysis system for the synthesis difficulty of the gene synthesis sequence comprises a foreground page management unit and a background analysis unit.
The background analysis unit comprises a difficulty analysis type object judgment module, an attribute giving module, a database updating module and a synthesis difficulty analysis model module.
The foreground side page management unit comprises a reset module, a parameter description management module, a storage parameter management module, an input data judgment module and a sequence difficulty analysis result output module.
The reset module refreshes the page in response to a user's reset request. In response to a user's request for a description of the principle of the difficulty analysis tool, the parameter description management module displays that description on the page. In a specific embodiment, the parameter description management module provides two states, hidden and displayed: a hidden Div containing the description is placed in the page of the foreground terminal, and the parameter description button switches this Div between the hidden and displayed states, so that when a user needs to understand the principle of the difficulty analysis tool, the corresponding information is shown on the page of the foreground terminal.
The save parameter management module is configured as follows: if no difficulty analysis parameters are stored in the database, default values are displayed for the parameters on the page; if stored difficulty analysis parameters exist in the database, they are displayed directly in the page through the view engine.
More specifically, the save parameter management module includes a parameter judgment module and a parameter array serialization module.
In response to the user's operation of saving parameters on the page, the parameter judgment module acquires the parameter array in JavaScript, traverses it, and checks whether any parameter is empty, is not a number, or is less than 0. If every parameter is non-empty, numeric, and not less than 0, the parameter verification passes. If the verification fails, the page pops up a prompt box showing the error information.
If the verification passes, the parameter array serialization module serializes the parameter array into a JSON string array and transmits it, through jQuery's ajax method, to the difficulty analysis type object judgment module of the background analysis unit. The difficulty analysis type object judgment module judges whether an object whose type is difficulty analysis exists in the parameter table of the database. If it does, the attribute giving module assigns the JSON string array to the ParamValue attribute of that object, and the database updating module updates the object in the database. If it does not, the attribute giving module constructs an object whose type is difficulty analysis and assigns the JSON string array to the ParamValue attribute of the object, and the database updating module saves the object to the database.
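The update-or-create logic described above can be sketched as follows. The embodiment uses SQL Server behind ASP.NET MVC; this illustration substitutes Python's built-in `sqlite3` with an in-memory database so that it is self-contained. The table name `parameter`, the column `type`, and the function name are assumptions; only the `ParamValue` attribute and the "difficulty analysis" type come from the description.

```python
import json
import sqlite3


def save_difficulty_params(conn, params):
    """Update the 'difficulty analysis' row if it exists, otherwise create it."""
    param_value = json.dumps([str(p) for p in params])
    row = conn.execute(
        "SELECT id FROM parameter WHERE type = ?", ("difficulty analysis",)
    ).fetchone()
    if row is not None:
        # An object of this type already exists: overwrite its ParamValue.
        conn.execute(
            "UPDATE parameter SET ParamValue = ? WHERE id = ?", (param_value, row[0])
        )
    else:
        # No such object yet: construct one and save it.
        conn.execute(
            "INSERT INTO parameter (type, ParamValue) VALUES (?, ?)",
            ("difficulty analysis", param_value),
        )
    conn.commit()


# Stand-in for the real database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parameter (id INTEGER PRIMARY KEY, type TEXT, ParamValue TEXT)"
)
save_difficulty_params(conn, [40, 0.6])  # first call inserts the row
save_difficulty_params(conn, [50, 0.7])  # second call updates the same row
```

After both calls the table still holds a single difficulty analysis row, which matches the description's behavior of keeping one saved parameter set per type.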
More specifically, in a specific embodiment, the input data judgment module is configured to judge whether the data input by the user on the page meets the conditions for analysis. In particular, the input data judgment module is configured to: acquire the data input by the user and judge in JavaScript whether the data meet the qualification conditions, wherein the sequence must be non-empty and contain only the characters ATGCatgc, and each parameter must be a non-negative number; if the input data do not meet the conditions, prompt error information on the page; if the input data meet the conditions, serialize the parameter set into a JSON string array, wherein all the input data are put into one array and the array is serialized into the JSON string array; and send the JSON string array to the synthesis difficulty analysis model module of the background analysis unit through jQuery's ajax method.
The synthesis difficulty analysis model module of the background analysis unit comprises a deserialization module, a sequence repeated area set acquisition module, a repeated area length calculation module, a fluctuation calculation module and a sequence difficulty score calculation module.
Specifically, the deserialization module is configured to deserialize the JSON string array into a generic string collection. The collection includes the sequence name, the sequence and the difficulty analysis parameters.
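The deserialization step can be sketched as follows, again in Python for illustration. The description says the resulting collection contains the sequence name, the sequence and the difficulty analysis parameters but does not give their order, so the layout used here (name first, sequence second, parameters after) is an assumption, as is the function name.

```python
import json


def deserialize_request(payload):
    """Turn the JSON string array sent by the page into a list of strings."""
    values = json.loads(payload)
    if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
        raise ValueError("expected a JSON array of strings")
    return values


# Assumed layout: sequence name, then the sequence, then the analysis parameters.
name, sequence, *params = deserialize_request('["gene-1", "ATGCGC", "40", "0.6"]')
```

Rejecting anything that is not an array of strings keeps malformed requests from reaching the later analysis steps.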
The sequence repeated region set acquisition module passes the sequence and the parameters related to repeat analysis to the interface method for sequence repeat analysis to obtain the set of repeated regions of the sequence. If the repeat analysis interface throws an exception, control returns to the page and a prompt box pops up showing the error message.
The repeated area length calculation module obtains a whole repeated area set, a Poly repeated area set, a forward repeated area set and an inverse repeated area set of the sequence, and calculates the length of the whole repeated area, the length of the Poly repeated area, the length of the forward repeated area and the length of the inverse repeated area on the sequence according to the sets.
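One way to turn a set of repeated areas into a covered length is sketched below. The description does not say how overlapping areas are handled; merging them so that no position is counted twice is an assumption, as are the function name and the (start, end) representation of an area.

```python
def coverage_length(regions):
    """Count sequence positions covered by at least one region.

    Each region is a (start, end) pair with an inclusive start and an
    exclusive end (an assumed representation). Overlapping or adjacent
    regions are merged before counting so no position is counted twice.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(regions):
        if current_end is None or start > current_end:
            # Close the previous merged block and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Region overlaps or touches the current block: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered
```

Under the same assumption, a coverage ratio would be the covered length divided by the sequence length.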
The fluctuation calculation module calculates the GC fluctuation, where GC fluctuation = maximum GC ratio - minimum GC ratio. For example, if the GC analysis length in the parameters is 40, the GC ratio of each subsequence of length 40 in the sequence is analyzed, and the maximum GC ratio and the minimum GC ratio are taken from those values.
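The GC fluctuation calculation can be sketched as a sliding window over the sequence. The step size of 1 and the handling of sequences shorter than the window are assumptions not stated in the description; the function names are hypothetical.

```python
def gc_ratio(subsequence):
    """Fraction of G and C bases in a non-empty sequence."""
    upper = subsequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def gc_fluctuation(sequence, window):
    """Maximum minus minimum GC ratio over all windows of the given length.

    Windows are taken at every position (step 1); a sequence no longer than
    the window is treated as a single window. Both choices are assumptions.
    """
    if not sequence or window <= 0:
        raise ValueError("sequence must be non-empty and window positive")
    if len(sequence) <= window:
        return 0.0
    ratios = [gc_ratio(sequence[i:i + window])
              for i in range(len(sequence) - window + 1)]
    return max(ratios) - min(ratios)
```

With a GC analysis length of 40, `gc_fluctuation(sequence, 40)` corresponds to the example in the description.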
The sequence difficulty score calculating module calculates a sequence difficulty score and obtains the difficulty level of the sequence from the calculated score, wherein sequence difficulty score = sequence length × length weight + a × GC limit weight + GC fluctuation × GC fluctuation weight + overall repeat coverage length × overall repeat coverage length weight + Poly repeat coverage length × Poly repeat coverage length weight + forward repeat coverage length × forward repeat coverage length weight + reverse repeat coverage length × reverse repeat coverage length weight + overall repeat coverage ratio × overall repeat coverage ratio weight + Poly repeat coverage ratio × Poly repeat coverage ratio weight + forward repeat coverage ratio × forward repeat coverage ratio weight + reverse repeat coverage ratio × reverse repeat coverage ratio weight. In a specific embodiment, the GC ratio and the sequence length of the sequence are obtained, and a value a is defined as follows: if GC ratio > high GC limit, a = GC ratio - high GC limit; if GC ratio < low GC limit, a = low GC limit - GC ratio; if low GC limit <= GC ratio <= high GC limit, a = 0. The GC limit weight is the high GC limit weight when the GC ratio exceeds the high GC limit and the low GC limit weight when the GC ratio falls below the low GC limit.
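The score calculation can be sketched as follows. The sketch assumes that the score is a weighted sum in which each metric is multiplied by the weight of the same name in the parameter table, and that the value a is multiplied by the high GC limit weight or the low GC limit weight depending on which limit is violated. Both points are an interpretation of the formula, and all function and key names are hypothetical.

```python
def gc_excess(gc_ratio, low_limit, high_limit):
    """The value a: how far the GC ratio lies outside [low_limit, high_limit]."""
    if gc_ratio > high_limit:
        return gc_ratio - high_limit
    if gc_ratio < low_limit:
        return low_limit - gc_ratio
    return 0.0


def difficulty_score(metrics, weights, gc_ratio, low_limit, high_limit,
                     low_gc_weight, high_gc_weight):
    """Weighted sum of the metrics plus the weighted GC-limit excess.

    metrics and weights are dicts keyed by the same metric names, for
    example "sequence_length" or "gc_fluctuation" (assumed names).
    """
    a = gc_excess(gc_ratio, low_limit, high_limit)
    # Assumption: the weight applied to a depends on which limit is exceeded.
    gc_weight = high_gc_weight if gc_ratio > high_limit else low_gc_weight
    return a * gc_weight + sum(metrics[k] * weights[k] for k in metrics)
```

Keeping the metrics and weights in dictionaries with matching keys makes it straightforward to add the eight repeat coverage terms without changing the function.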
The sequence difficulty analysis result output module outputs the sequence difficulty analysis result. It comprises a sequence difficulty analysis result acquisition module and a partial view processing module. The sequence difficulty analysis result acquisition module collects the results produced by the synthesis difficulty analysis model module, namely the difficulty level, difficulty score, sequence length, GC ratio, GC fluctuation, overall repeat coverage length, Poly repeat coverage length, forward repeat coverage length, reverse repeat coverage length, overall repeat coverage ratio, Poly repeat coverage ratio, forward repeat coverage ratio and reverse repeat coverage ratio. The sequence difficulty analysis result acquisition module builds these results into an object and sends it to the partial view processing module, which returns the partial view to the callback function of the calling ajax method; the partial view is then displayed on the page.
In a specific embodiment, a displayed difficulty analysis interface diagram is shown in FIG. 4.
In a specific embodiment, the parameter tables in the database are as follows:
In a specific embodiment, the parameters in the difficulty analysis interface diagram and their data types are as follows:
Parameter | Data type
Minimum Poly repeat | int
Minimum forward repeat | int
Minimum reverse repeat | int
GC analysis length | int
Minimum GC ratio | int
Maximum GC ratio | int
Remove forward repeats with interval >= 400 bp | int
Remove forward repeats with repeat/interval <= 4% | int
Remove reverse repeats with interval >= 400 bp | int
Remove reverse repeats with repeat/interval <= 4% | int
Length weight | double
Low GC limit weight | double
High GC limit weight | double
GC fluctuation weight | double
Overall repeat coverage length weight | double
Poly repeat coverage length weight | double
Forward repeat coverage length weight | double
Reverse repeat coverage length weight | double
Overall repeat coverage ratio weight | double
Poly repeat coverage ratio weight | double
Forward repeat coverage ratio weight | double
Reverse repeat coverage ratio weight | double
Low difficulty level (< 100) | int
Medium difficulty level (100-200) | int
High difficulty level (> 200) | int
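The three difficulty levels listed above can be derived from a score with a simple threshold check. The sketch below uses the thresholds 100 and 200 from the list as defaults; treating both boundary values as medium follows the "100-200" range, and the function and argument names are hypothetical.

```python
def difficulty_level(score, low_max=100, high_min=200):
    """Map a difficulty score to "low", "medium" or "high".

    Scores below low_max are low, scores above high_min are high, and
    everything in between (both boundaries included) is medium.
    """
    if score < low_max:
        return "low"
    if score > high_min:
        return "high"
    return "medium"
```

Because the thresholds are arguments, a user-configured parameter set could supply different boundaries without changing the function.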
In a specific embodiment, the tag elements of the page use the Bootstrap style, which keeps the page layout tidy and speeds up development. An MVC + view engine pattern is adopted. MVC divides the system into a page layer, a business logic layer and a data interaction layer; if an error occurs, the layer in which it occurred can be located quickly and then corrected. Before view engines, data was displayed in a page in a fairly uniform way, mostly by setting the values of page elements through JS; with a view engine, page element values no longer need to be set through JS, and background code can be written directly in the foreground page. The page uses an ajax + partial view approach: ajax is a client request method, and a partial view is a fragment of the data view processed by the background that is displayed in the page through the ajax callback method. The partial view is displayed without refreshing the page, which makes human-computer interaction friendlier. Interaction between the background and the database is implemented with EF, so a developer only needs to call EF's built-in methods to query, delete or modify data, which greatly speeds up development.
With the analysis system for the synthesis difficulty of a gene synthesis sequence, the workload is reduced: a user only needs to configure reasonable difficulty analysis parameters once, and can then obtain all details of an input sequence within a few seconds: difficulty level, difficulty score, sequence length, GC ratio, GC fluctuation, overall repeat coverage length, Poly repeat coverage length, forward repeat coverage length, reverse repeat coverage length, overall repeat coverage ratio, Poly repeat coverage ratio, forward repeat coverage ratio and reverse repeat coverage ratio.
With the analysis system for the synthesis difficulty of a gene synthesis sequence disclosed by the invention, the service can be standardized through informatization: the sequence difficulty analysis service is implemented as a program, which makes the service more standardized and simpler, and more intelligent and information-based.
It will be appreciated by persons skilled in the art that the present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, systems and computer program products according to the invention. Each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart and/or block diagram block or blocks.
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are by way of example only and are not limiting. The objects of the present invention have been fully and effectively achieved. The functional and structural principles of the present invention have been shown and described in the examples, and the embodiments of the invention may be modified or practiced in any way that does not depart from such principles.

Claims (10)

1. An analysis system for the synthesis difficulty of a gene synthesis sequence, characterized by comprising a front-end page management unit and a back-end analysis unit, wherein the front-end page management unit comprises an input data judgment module and a sequence difficulty analysis result output module, and the back-end analysis unit comprises a synthesis difficulty analysis model module; the input data judgment module is used for judging whether data input by a user on a page meets the conditions for analysis and, if so, placing all the input data into an array, serializing the array into a JSON string array, and sending the JSON string array to the synthesis difficulty analysis model module through the ajax method of jQuery; the synthesis difficulty analysis model module performs the analysis of the synthesis difficulty of the gene synthesis sequence; and the sequence difficulty analysis result output module is used for outputting the sequence difficulty analysis result of the synthesis difficulty analysis model module.
2. The analysis system for the synthesis difficulty of a gene synthesis sequence according to claim 1, wherein the foreground side page management unit further comprises a reset module and a parameter specification management module, the reset module refreshes the page in response to a reset operation by the user, and the parameter specification management module displays an explanation of the principle of the difficulty analysis tool on a page of the foreground side in response to a user operation requesting that explanation.
3. The analysis system of difficulty in synthesizing a genetic synthesis sequence according to claim 1, wherein the foreground side page management unit further comprises a save parameter management module configured to: if the difficulty analysis parameters are not stored in the database, displaying default values on the parameters on the page; if the stored difficulty analysis parameters exist in the database, the parameters are directly displayed in the page through the view engine.
4. The analysis system for the synthesis difficulty of a gene synthesis sequence according to any one of claims 1 to 3, wherein the synthesis difficulty analysis model module of the background analysis unit comprises a deserialization module for deserializing a JSON string array into a generic collection of strings.
5. The analysis system for the synthetic difficulty of the gene synthesis sequence according to any one of claims 1 to 3, wherein the synthetic difficulty analysis model module of the background analysis unit comprises a sequence repetition region set acquisition module and a repetition region length calculation module, wherein the sequence repetition region set acquisition module brings parameters related to sequence and repetition analysis into an interface method of sequence repetition analysis to acquire a repetition region set of the sequence; the repeated area length calculation module obtains a whole repeated area set, a Poly repeated area set, a forward repeated area set and an inverse repeated area set of the sequence, and calculates the length of the whole repeated area, the length of the Poly repeated area, the length of the forward repeated area and the length of the inverse repeated area on the sequence according to the sets.
6. The analysis system for the synthetic difficulty of the gene synthesis sequence according to any one of claims 1 to 3, wherein the synthetic difficulty analysis model module of the backend analysis unit comprises a sequence difficulty score calculation module which calculates a sequence difficulty score and acquires a difficulty level of a sequence according to the calculated sequence difficulty score.
7. The analysis method of the synthesis difficulty of the gene synthesis sequence is characterized by comprising the following steps of:
responding to an operation by which a system user configures the difficulty analysis parameters, converting the parameters into JSON objects, and storing the JSON objects in a database in serialized form;
acquiring a gene sequence and an operation parameter input by a page;
establishing a gene synthesis sequence synthesis difficulty analysis model, executing preset analysis logic, and obtaining a difficulty score and a difficulty grade according to a difficulty analysis formula; and
obtaining and outputting a sequence difficulty analysis result.
8. The method for analyzing the synthesis difficulty of a gene synthesis sequence according to claim 7, wherein the method comprises the steps of: acquiring the sequence and parameters input on a page of a foreground terminal; judging in JavaScript whether the sequence and the parameters meet the conditions for valid input; if the input sequence and parameter data do not meet the conditions, feeding an error message back to the page; if the input sequence and parameter data meet the conditions, serializing the parameter set into a JSON string array; sending the JSON string array and the sequence as a JSON string to a background terminal through the ajax method of jQuery; calling, at the background terminal, an interface to obtain the various repeated areas and the GC content of the sequence; obtaining a difficulty score and a difficulty level according to a difficulty analysis formula; obtaining all results of the difficulty analysis and putting them into a partial view; and returning the partial view to the ajax callback method and displaying it on the page of the foreground terminal; wherein sequence difficulty score = sequence length × length weight + a × GC limit weight + GC fluctuation × GC fluctuation weight + overall repeat coverage length × overall repeat coverage length weight + Poly repeat coverage length × Poly repeat coverage length weight + forward repeat coverage length × forward repeat coverage length weight + reverse repeat coverage length × reverse repeat coverage length weight + overall repeat coverage ratio × overall repeat coverage ratio weight + Poly repeat coverage ratio × Poly repeat coverage ratio weight + forward repeat coverage ratio × forward repeat coverage ratio weight + reverse repeat coverage ratio × reverse repeat coverage ratio weight.
9. A computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, performs the steps of the method for analyzing the difficulty of synthesizing a gene synthesis sequence according to claim 7 or 8.
10. An analysis apparatus for difficulty in synthesizing a gene synthesis sequence, comprising:
a memory for storing a software application,
a processor for executing the software application program, each program of the software application program correspondingly executing the steps of the analysis method for the difficulty of synthesizing a gene synthesis sequence according to claim 7 or 8.
CN202310763933.5A 2023-06-27 2023-06-27 Analysis method, system and equipment for synthesis difficulty of gene synthesis sequence Pending CN116705176A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310763933.5A CN116705176A (en) 2023-06-27 2023-06-27 Analysis method, system and equipment for synthesis difficulty of gene synthesis sequence

Publications (1)

Publication Number Publication Date
CN116705176A true CN116705176A (en) 2023-09-05

Family

ID=87833837

Country Status (1)

Country Link
CN (1) CN116705176A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination