Automatically add kinetic and thermodynamic data
If you want to perform COBRA-k's thermodynamic and/or enzyme kinetic ("thermokinetic") analyses, you need appropriate data. When loading an existing model (see previous chapter) or creating a new Model instance from scratch (see second to last chapter), such data is often missing or at least not directly included in the model instance. To collect such data, COBRA-k's model_instantiation submodule provides two major ways: a fully automated one and one where you add data manually.
Automatic way
The automatic way of adding thermokinetic data to a newly loaded SBML file uses the function get_cobrak_model_with_kinetic_data_from_sbml_model_alone. The SBML model needs to use the following identifiers for reactions and metabolites:
- for the reactions: An EC number annotation using the annotation key "ec-code"
- for the metabolites: BiGG IDs as identifiers
Furthermore, some extra non-optional settings have to be provided, of which the scientific (Latin-Greek) organism name is especially important.
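Before starting the automatic procedure, it can help to check whether all reactions carry an "ec-code" annotation. The following is only a minimal sketch and not part of COBRA-k: the function name find_reactions_without_ec_code is hypothetical, and it assumes reaction objects with an `id` attribute and an `annotation` dict, as provided by cobrapy's Reaction class. The reaction IDs below are illustrative stand-ins.

```python
from types import SimpleNamespace


def find_reactions_without_ec_code(reactions):
    """Return the IDs of all reactions lacking an "ec-code" annotation entry."""
    return [r.id for r in reactions if not r.annotation.get("ec-code")]


# Stand-ins for cobrapy Reaction objects (illustrative IDs only)
reactions = [
    SimpleNamespace(id="PGI", annotation={"ec-code": "5.3.1.9"}),
    SimpleNamespace(id="EX_glc__D_e", annotation={}),
]
missing = find_reactions_without_ec_code(reactions)
print(missing)
```

With a real model, the `reactions` attribute of a model loaded through cobrapy could be passed instead of the stand-in list.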
In addition, the following databases have to be downloaded manually beforehand:
- The BRENDA .json.tar.gz from https://www.brenda-enzymes.org/download.php
- The BiGG metabolites txt from http://bigg.ucsd.edu/data_access
- taxdmp.zip from https://ftp.ncbi.nih.gov/pub/taxonomy/
SABIO-RK and UniProt data are downloaded automatically into the folders passed to the function (see below). Keep in mind that this download may take several tens of minutes! Once a database is downloaded, it is cached, and no new download is triggered.
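A quick pre-flight check for the manually downloaded files could look like the sketch below. The file names for the BRENDA and BiGG downloads are assumptions based on how these databases usually name their files; the names COBRA-k actually expects may differ. The helper missing_resource_files is hypothetical.

```python
import tempfile
from pathlib import Path


def missing_resource_files(folder, brenda_version):
    """List the expected database files that are not present in `folder`."""
    expected = [
        f"brenda_{brenda_version}.json.tar.gz",  # assumed BRENDA file name
        "bigg_models_metabolites.txt",  # assumed BiGG file name
        "taxdmp.zip",  # NCBI Taxonomy dump
    ]
    base = Path(folder)
    return [name for name in expected if not (base / name).is_file()]


# Demonstration with a temporary folder containing only taxdmp.zip
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "taxdmp.zip").touch()
    missing = missing_resource_files(tmp, "2023_1")
print(missing)
```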
Using all this information, the automatic procedure collects the following data and adds it to the Model:
- \(k_{cat}\), \(K_M\) and \(K_I\) data: From SABIO-RK and BRENDA
- Molecular enzyme weights: From UniProt
- Taxonomic distances (used to prefer enzyme kinetic data from taxonomically nearer organisms): From NCBI Taxonomy
- \(Δ_r G^{'°}\): Using the eQuilibrator API
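To illustrate the idea behind taxonomic distances, the sketch below picks the kinetic entry whose source organism is taxonomically nearest to the model organism. It is an illustration only: how COBRA-k actually computes the distance is not stated here, the lineages are simplified, and the \(k_{cat}\) values are made up.

```python
def taxonomic_distance(lineage_a, lineage_b):
    """Number of steps from organism A up to its deepest taxon shared with B.

    Both lineages are ordered from the root down to the species.
    """
    shared = 0
    for taxon_a, taxon_b in zip(lineage_a, lineage_b):
        if taxon_a != taxon_b:
            break
        shared += 1
    return len(lineage_a) - shared


# Simplified lineages (root -> species)
LINEAGES = {
    "Escherichia coli": ["Bacteria", "Pseudomonadota", "Gammaproteobacteria",
                         "Enterobacterales", "Enterobacteriaceae",
                         "Escherichia", "Escherichia coli"],
    "Salmonella enterica": ["Bacteria", "Pseudomonadota", "Gammaproteobacteria",
                            "Enterobacterales", "Enterobacteriaceae",
                            "Salmonella", "Salmonella enterica"],
    "Bacillus subtilis": ["Bacteria", "Bacillota", "Bacilli", "Bacillales",
                          "Bacillaceae", "Bacillus", "Bacillus subtilis"],
}

# Made-up k_cat entries as (source organism, value)
kcat_entries = [("Bacillus subtilis", 30.0), ("Salmonella enterica", 12.0)]


def nearest_entry(base_species, entries):
    """Return the entry whose organism is taxonomically closest to base_species."""
    base = LINEAGES[base_species]
    return min(entries, key=lambda e: taxonomic_distance(base, LINEAGES[e[0]]))


best = nearest_entry("Escherichia coli", kcat_entries)
print(best)
```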
Here's a usage example:
from cobrak.model_instantiation import get_cobrak_model_with_kinetic_data_from_sbml_model_alone
from cobrak.dataclasses import ParameterRange

cobrak_model = get_cobrak_model_with_kinetic_data_from_sbml_model_alone(
    sbml_model_path="/path/to/sbml.xml",
    path_to_external_resources="/path/where/the/manually/downloaded/datafiles/are",
    folder_of_sabio_database="/path/where/the/sabiork/database/shall/be/downloaded",
    brenda_version="$CURRENT_BRENDA_VERSION",  # E.g., 2023_1
    prefer_brenda=True,  # Whether BRENDA k_cat, K_M, ... values shall be preferred when SABIO-RK data is also available
    base_species="$MODEL_SPECIES",  # E.g., Escherichia coli
    max_prot_pool=0.5,
    conc_ranges={
        # E.g., for all metabolites without a given identifier,
        # we can use the key "DEFAULT":
        "DEFAULT": ParameterRange(1e-6, 0.02),
        # (...)
    },
    inner_to_outer_compartments=["INNERMOST_COMPARTMENT", "NEXT_TO_INNERMOST_", ],  # E.g., ["c", "p", "e"], used for dG0 calculation
    phs={"c": 7.0, },  # dict[str, float], pH of each compartment in the model, used for dG0 calculation
    pmgs={"c": 2.5, },  # dict[str, float], pMg of each compartment in the model, used for dG0 calculation
    ionic_strenghts={"c": 250, },  # dict[str, float], ionic strength in mM of each compartment in the model, used for dG0 calculation
    potential_differences={("c", "p"): 0.15, },  # dict[tuple[str, str], float], potential difference from first to second given compartment in mV, used for dG0 calculation
    kinetic_ignored_enzymes=["IDS", "OF", "IGNORED", "ENZYMES", ],  # IDs of enzymes for which no kinetic data shall be collected
    custom_kms_and_kcats={},  # Can be dict[str, EnzymeReactionData | None] if you want to overwrite some K_M or k_cat values
    kinetic_ignored_metabolites=["IDS", "OF", "IGNORED", "METABOLITES", ],  # IDs of metabolites for which no enzyme kinetic value (e.g., K_M) shall be collected
    do_model_fullsplit=True,  # Explained below
    do_delete_enzymatically_suboptimal_reactions=True,  # Explained below
    ignore_dG0_uncertainty=True,  # Whether eQuilibrator-calculated dG0 uncertainties shall simply be set to 0
    enzyme_conc_ranges={},  # Is dict[str, ParameterRange | None]
    dG0_exclusion_prefixes=[],  # Prefixes (first parts of IDs) for which no dG0 shall be set, a common one would be "EX_"; is list[str]
    dG0_exclusion_inner_parts=[],  # Infixes (inner parts of IDs) for which no dG0 shall be set; is list[str]
    extra_flux_constraints=[],  # Is list[ExtraFluxConstraint]
    extra_conc_ratios=[],  # Is list[ExtraConcRatios]
    data_cache_folder="/path/to/folder/for/uniprot/cache",
    R=$GAS_CONSTANT,  # Placeholder for a float; default is STANDARD_R
    T=$TEMPERATURE,  # Placeholder for a float; default is STANDARD_T
)
Two of the arguments have the following non-obvious meanings:
- do_model_fullsplit: Each reaction is split i) into forward and reverse directions and ii) for each enzyme (complex) catalyzing it. E.g., a reversible reaction R1: A → B catalyzed by the enzyme \(E_1\) and the enzyme complex \(E_{2,sub1} \space and \space E_{2,sub2}\) is split into the four reactions R1_ENZ_E1_FWD: A → B, R1_ENZ_E2SUB1_AND_E2SUB2_FWD: A → B, R1_ENZ_E1_REV: B → A and R1_ENZ_E2SUB1_AND_E2SUB2_REV: B → A. This fullsplit is necessary in order to perform thermodynamic and enzymatic calculations later on.
- do_delete_enzymatically_suboptimal_reactions: Akin to the enzyme constraint method sMOMENT [!], all (fullsplit) variants of a reaction which do not have the lowest \(MW/k_{cat}\) ratio (i.e., which have higher enzyme costs per flux) are deleted. Keep in mind that, while this can drastically reduce a model's size, this also means that the \(K_M\), \(K_I\) etc. values of the deleted reaction variants are not considered.
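The two options can be illustrated with a small standalone sketch. It does not use COBRA-k itself: the helper names are hypothetical, the ID pattern is inferred from the example IDs above, and the molecular weights and \(k_{cat}\) values are made up (in arbitrary but consistent units). The enzyme cost per unit flux is taken as \(MW/k_{cat}\), and the variant with the lowest cost is the one that would be kept.

```python
def fullsplit_ids(reac_id, enzyme_complexes):
    """Build the IDs of all fullsplit variants of a reversible reaction.

    `enzyme_complexes` is a list of subunit-ID lists, one list per
    enzyme (complex) that can catalyze the reaction.
    """
    return [
        f"{reac_id}_ENZ_{'_AND_'.join(subunits)}_{direction}"
        for direction in ("FWD", "REV")
        for subunits in enzyme_complexes
    ]


def cheapest_variant(variant_data):
    """Return the variant ID with the lowest enzyme cost MW / k_cat.

    `variant_data` maps a variant ID to a (molecular weight, k_cat) tuple.
    """
    return min(variant_data, key=lambda v: variant_data[v][0] / variant_data[v][1])


ids = fullsplit_ids("R1", [["E1"], ["E2SUB1", "E2SUB2"]])
print(ids)

# Made-up (MW, k_cat) values for the two forward variants
forward_variants = {
    "R1_ENZ_E1_FWD": (50.0, 100.0),  # cost 0.5
    "R1_ENZ_E2SUB1_AND_E2SUB2_FWD": (120.0, 400.0),  # cost 0.3
}
kept = cheapest_variant(forward_variants)
print(kept)
```

In this made-up case, the enzyme complex variant is kept because it needs less enzyme mass per unit flux, even though the complex itself is heavier.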
Manual and semi-automatic way
If you want to automatically create only selected kinds of data, look up COBRA-k's submodules equilibrator_functionality (for \(Δ_r G^{'°}\)), uniprot_functionality (for molecular enzyme weights), sabio_rk_functionality (for enzyme kinetic data from SABIO-RK), brenda_functionality (for enzyme kinetic data from BRENDA) and ncbi_taxonomy_functionality (for taxonomy distance data). Their functions are also described in this documentation's "API reference".
Finally, if you already have some data and want to add it while generating a Model from an SBML file, you can use get_cobrak_model_from_sbml_and_thermokinetic_data as follows:
from cobrak.model_instantiation import get_cobrak_model_from_sbml_and_thermokinetic_data
from cobrak.dataclasses import EnzymeReactionData, ParameterRange

cobrak_model = get_cobrak_model_from_sbml_and_thermokinetic_data(
    sbml_path="path/to/sbml.xml",
    extra_flux_constraints=[],  # Is list[ExtraFluxConstraint]
    dG0s={  # Is dict[str, float], unit is kJ/mol
        "$REAC_ID": dG0_of_reaction,
        # (...)
    },
    dG0_uncertainties={  # Is dict[str, float], unit is kJ/mol
        "$REAC_ID": dG0_uncertainty_of_reaction,
        # (...)
    },
    conc_ranges={  # Is dict[str, ParameterRange], these are the concentrations in M
        "$MET_ID": ParameterRange(minimum=min_conc_of_met, maximum=max_conc_of_met),
        # (...)
    },
    extra_conc_ratios=[],  # Is list[ExtraConcRatios]
    enzyme_molecular_weights={  # Is dict[str, float], MW in g/mmol
        "$ENZYME_ID": molecular_weight,
        # (...)
    },
    enzyme_reaction_data={},  # Is dict[str, EnzymeReactionData | None]
    max_prot_pool=0.5,  # In g/gDW
    kinetic_ignored_metabolites=["h2_c", "h2_p", ],  # Is list[str]
    enzyme_conc_ranges={  # Is dict[str, ParameterRange | None]
        "$ENZYME_ID": ParameterRange(minimum=min_enzyme_conc, maximum=max_enzyme_conc),
        # (...)
    },
    do_model_fullsplit=False,  # Explained above
    do_delete_enzymatically_suboptimal_reactions=True,  # Explained above
    # R and T are optional floats; they default to STANDARD_R (gas constant) and STANDARD_T (temperature)
)
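To see how the dG0s and conc_ranges inputs interact, the following standalone sketch computes the lowest \(Δ_r G^{'}\) a reaction can reach within given concentration ranges, using \(Δ_r G^{'} = Δ_r G^{'°} + R T \ln Q\). It is not COBRA-k code: the function name is hypothetical, all stoichiometric coefficients are assumed to be 1, and the values of R (in kJ/(K·mol)) and T (in K) are commonly used standard values assumed here rather than read from COBRA-k.

```python
from math import log

R = 8.314e-3  # Gas constant in kJ/(K*mol); assumed value
T = 298.15  # Temperature in K; assumed value


def min_reaction_dg(dg0, substrate_ranges, product_ranges):
    """Lowest reachable reaction Gibbs energy in kJ/mol.

    Each range is a (minimum, maximum) concentration tuple in M. The lowest
    value results from maximal substrate and minimal product concentrations.
    """
    ln_q_min = sum(log(lo) for lo, _ in product_ranges) - sum(
        log(hi) for _, hi in substrate_ranges
    )
    return dg0 + R * T * ln_q_min


default_range = (1e-6, 0.02)  # Same bounds as the "DEFAULT" range used earlier

# A reaction A -> B with a moderately positive dG0 can still run forward ...
feasible = min_reaction_dg(5.0, [default_range], [default_range])
# ... whereas a strongly positive dG0 cannot be overcome within these ranges
infeasible = min_reaction_dg(30.0, [default_range], [default_range])
print(feasible < 0, infeasible < 0)
```

A reaction can only carry forward flux if this minimal value is negative, which is why both realistic concentration ranges and \(Δ_r G^{'°}\) values matter for thermodynamic analyses.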