D47crunch
Standardization and analytical error propagation of Δ47 and Δ48 clumped-isotope measurements
Process and standardize carbonate and/or CO2 clumped-isotope analyses, from low-level data out of a dual-inlet mass spectrometer to final, “absolute” Δ47, Δ48 and Δ49 values with fully propagated analytical error estimates ([Daëron, 2021](https://doi.org/10.1029/2020GC009592)).

The **tutorial** section takes you through a series of simple steps to import/process data and print out the results. The **how-to** section provides instructions applicable to various specific tasks.
1. Tutorial
1.1 Installation
The easy option is to use `pip`; open a shell terminal and simply type:

```sh
python -m pip install D47crunch
```
For those wishing to experiment with the bleeding-edge development version, this can be done through the following steps:

- Download the `dev` branch source code here and rename it to `D47crunch.py`.
- Do any of the following:
  - copy `D47crunch.py` to somewhere in your Python path
  - copy `D47crunch.py` to a working directory (`import D47crunch` will only work if called within that directory)
  - copy `D47crunch.py` to any other location (e.g., `/foo/bar`) and then use the following code snippet in your own code to import `D47crunch`:

```py
import sys
sys.path.append('/foo/bar')
import D47crunch
```
Documentation for the development version can be downloaded here (save the HTML file and open it locally).
1.2 Usage
Start by creating a file named `rawdata.csv` with the following contents:

```
UID, Sample, d45, d46, d47, d48, d49
A01, ETH-1, 5.79502, 11.62767, 16.89351, 24.56708, 0.79486
A02, MYSAMPLE-1, 6.21907, 11.49107, 17.27749, 24.58270, 1.56318
A03, ETH-2, -6.05868, -4.81718, -11.63506, -10.32578, 0.61352
A04, MYSAMPLE-2, -3.86184, 4.94184, 0.60612, 10.52732, 0.57118
A05, ETH-3, 5.54365, 12.05228, 17.40555, 25.96919, 0.74608
A06, ETH-2, -6.06706, -4.87710, -11.69927, -10.64421, 1.61234
A07, ETH-1, 5.78821, 11.55910, 16.80191, 24.56423, 1.47963
A08, MYSAMPLE-2, -3.87692, 4.86889, 0.52185, 10.40390, 1.07032
```
Then instantiate a `D47data` object, which will store and process this data:

```py
import D47crunch
mydata = D47crunch.D47data()
```
For now, this object is empty:

```py
>>> print(mydata)
[]
```
To load the analyses saved in `rawdata.csv` into our `D47data` object and process the data:

```py
mydata.read('rawdata.csv')

# compute δ13C, δ18O of working gas:
mydata.wg()

# compute δ13C, δ18O, raw Δ47 values for each analysis:
mydata.crunch()

# compute absolute Δ47 values for each analysis
# as well as average Δ47 values for each sample:
mydata.standardize()
```
We can now print a summary of the data processing:

```py
>>> mydata.summary(verbose = True, save_to_file = False)

[summary]
––––––––––––––––––––––––––––––– –––––––––
N samples (anchors + unknowns)  5 (3 + 2)
N analyses (anchors + unknowns) 8 (5 + 3)
Repeatability of δ13C_VPDB        4.2 ppm
Repeatability of δ18O_VSMOW      47.5 ppm
Repeatability of Δ47 (anchors)   13.4 ppm
Repeatability of Δ47 (unknowns)   2.5 ppm
Repeatability of Δ47 (all)        9.6 ppm
Model degrees of freedom                3
Student's 95% t-factor               3.18
Standardization method             pooled
––––––––––––––––––––––––––––––– –––––––––
```
This tells us that our data set contains 5 different samples: 3 anchors (ETH-1, ETH-2, ETH-3) and 2 unknowns (MYSAMPLE-1, MYSAMPLE-2). The total number of analyses is 8, with 5 anchor analyses and 3 unknown analyses. We get an estimate of the analytical repeatability (i.e. the overall, pooled standard deviation) for δ13C, δ18O and Δ47, as well as the number of degrees of freedom (here, 3) that these estimated standard deviations are based on, along with the corresponding Student's t-factor (here, 3.18) for 95 % confidence limits. Finally, the summary indicates that we used a “pooled” standardization approach (see [Daëron, 2021]).
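These quantities are also stored on the `D47data` object after standardization, which is convenient for scripting. Below is a minimal sketch, assuming the `D4xdata` attributes `Nf` (model degrees of freedom) and `repeatability` (a dictionary of pooled repeatabilities); the exact dictionary keys (e.g., `'r_D47'`) are an assumption to be checked against the API documentation:

```py
# model degrees of freedom (3 in this example):
print(mydata.Nf)

# pooled repeatabilities; key names such as 'r_D47' are assumed here:
print(mydata.repeatability)
```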
To see the actual results:

```py
>>> mydata.table_of_samples(verbose = True, save_to_file = False)

[table_of_samples]
–––––––––– – ––––––––– –––––––––– –––––– –––––– –––––––– –––––– ––––––––
Sample     N d13C_VPDB d18O_VSMOW    D47     SE   95% CL     SD p_Levene
–––––––––– – ––––––––– –––––––––– –––––– –––––– –––––––– –––––– ––––––––
ETH-1      2      2.01      37.01 0.2052                 0.0131
ETH-2      2    -10.17      19.88 0.2085                 0.0026
ETH-3      1      1.73      37.49 0.6132
MYSAMPLE-1 1      2.48      36.90 0.2996 0.0091 ± 0.0291
MYSAMPLE-2 2     -8.17      30.05 0.6600 0.0115 ± 0.0366 0.0025
–––––––––– – ––––––––– –––––––––– –––––– –––––– –––––––– –––––– ––––––––
```
This table lists, for each sample, the number of analytical replicates, average δ13C and δ18O values (for the analyte CO2, not for the carbonate itself), the average Δ47 value, and the SD of Δ47 for all replicates of this sample. For unknown samples, the SE and 95 % confidence limits for mean Δ47 are also listed. These 95 % CL take into account the number of degrees of freedom of the regression model, so that in large datasets the 95 % CL will tend to 1.96 times the SE, but in this case the applicable t-factor is much larger.
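As a quick check of the values above, a 95 % CL is just the corresponding SE multiplied by the applicable t-factor (small differences may remain because the displayed SE values are themselves rounded):

```py
from scipy.stats import t as tstudent

# 97.5 % quantile of Student's t distribution with 3 degrees of freedom:
t95 = tstudent.ppf(1 - 0.05/2, 3) # ~3.18

# MYSAMPLE-2: SE of 0.0115 yields a 95 % CL of ~0.0366
print(round(t95 * 0.0115, 4))
```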
We can also generate a table of all analyses in the data set (again, note that `d18O_VSMOW` is the composition of the CO2 analyte):
```py
>>> mydata.table_of_analyses(verbose = True, save_to_file = False)

[table_of_analyses]
––– ––––––––– –––––––––– ––––––––––– –––––––––––– ––––––––– ––––––––– –––––––––– –––––––––– –––––––– –––––––––– –––––––––– ––––––––– ––––––––– –––––––––– ––––––––
UID Session   Sample     d13Cwg_VPDB d18Owg_VSMOW       d45       d46        d47        d48      d49  d13C_VPDB d18O_VSMOW    D47raw    D48raw     D49raw      D47
––– ––––––––– –––––––––– ––––––––––– –––––––––––– ––––––––– ––––––––– –––––––––– –––––––––– –––––––– –––––––––– –––––––––– ––––––––– ––––––––– –––––––––– ––––––––
A01 mySession ETH-1           -3.807       24.921  5.795020 11.627670  16.893510  24.567080 0.794860   2.014086  37.041843 -0.574686  1.149684 -27.690250 0.214454
A02 mySession MYSAMPLE-1      -3.807       24.921  6.219070 11.491070  17.277490  24.582700 1.563180   2.476827  36.898281 -0.499264  1.435380 -27.122614 0.299589
A03 mySession ETH-2           -3.807       24.921 -6.058680 -4.817180 -11.635060 -10.325780 0.613520 -10.166796  19.907706 -0.685979 -0.721617  16.716901 0.206693
A04 mySession MYSAMPLE-2      -3.807       24.921 -3.861840  4.941840   0.606120  10.527320 0.571180  -8.159927  30.087230 -0.248531  0.613099  -4.979413 0.658270
A05 mySession ETH-3           -3.807       24.921  5.543650 12.052280  17.405550  25.969190 0.746080   1.727029  37.485567 -0.226150  1.678699 -28.280301 0.613200
A06 mySession ETH-2           -3.807       24.921 -6.067060 -4.877100 -11.699270 -10.644210 1.612340 -10.173599  19.845192 -0.683054 -0.922832  17.861363 0.210328
A07 mySession ETH-1           -3.807       24.921  5.788210 11.559100  16.801910  24.564230 1.479630   2.009281  36.970298 -0.591129  1.282632 -26.888335 0.195926
A08 mySession MYSAMPLE-2      -3.807       24.921 -3.876920  4.868890   0.521850  10.403900 1.070320  -8.173486  30.011134 -0.245768  0.636159  -4.324964 0.661803
––– ––––––––– –––––––––– ––––––––––– –––––––––––– ––––––––– ––––––––– –––––––––– –––––––––– –––––––– –––––––––– –––––––––– ––––––––– ––––––––– –––––––––– ––––––––
```
2. How-to
2.1 Simulate a virtual data set to play with
It is sometimes convenient to quickly build a virtual data set of analyses, for instance to assess the final analytical precision achievable for a given combination of anchor and unknown analyses (see also Fig. 6 of Daëron, 2021).
This can be achieved with `virtual_data()`. The example below creates a dataset with four sessions, each of which comprises three analyses of anchor ETH-1, three of ETH-2, three of ETH-3, and three analyses each of two unknown samples named `FOO` and `BAR` with arbitrarily defined isotopic compositions. Analytical repeatabilities for Δ47 and Δ48 are also specified arbitrarily. See the `virtual_data()` documentation for additional configuration parameters.
```py
from D47crunch import virtual_data, D47data

args = dict(
    samples = [
        dict(Sample = 'ETH-1', N = 3),
        dict(Sample = 'ETH-2', N = 3),
        dict(Sample = 'ETH-3', N = 3),
        dict(Sample = 'FOO', N = 3,
            d13C_VPDB = -5., d18O_VPDB = -10.,
            D47 = 0.3, D48 = 0.15),
        dict(Sample = 'BAR', N = 3,
            d13C_VPDB = -15., d18O_VPDB = -2.,
            D47 = 0.6, D48 = 0.2),
        ], rD47 = 0.010, rD48 = 0.030)

session1 = virtual_data(session = 'Session_01', **args, seed = 123)
session2 = virtual_data(session = 'Session_02', **args, seed = 1234)
session3 = virtual_data(session = 'Session_03', **args, seed = 12345)
session4 = virtual_data(session = 'Session_04', **args, seed = 123456)

D = D47data(session1 + session2 + session3 + session4)
D.crunch()
D.standardize()

D.table_of_sessions(verbose = True, save_to_file = False)
D.table_of_samples(verbose = True, save_to_file = False)
D.table_of_analyses(verbose = True, save_to_file = False)
```
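The tables can also be retrieved for further processing instead of being printed: setting the `output` parameter of these methods (documented in the API section) to `'raw'` returns the table as a list of lists of strings. A minimal sketch, useful for pulling out the final 95 % CL of each unknown to assess achievable precision (the column lookup below is illustrative, not part of the API):

```py
# retrieve table_of_samples() as a list of lists of strings:
out = D.table_of_samples(save_to_file = False, print_out = False, output = 'raw')

# map each row to its header fields and print the 95 % CL of the unknowns:
header = out[0]
for row in out[1:]:
    if row[0] in ('FOO', 'BAR'):
        print(row[0], dict(zip(header, row))['95% CL'])
```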
2.2 Control data quality
`D47crunch` offers several tools to visualize processed data. The examples below use the same virtual data set, generated with:
```py
from D47crunch import *
from random import shuffle

# generate virtual data:
args = dict(
    samples = [
        dict(Sample = 'ETH-1', N = 8),
        dict(Sample = 'ETH-2', N = 8),
        dict(Sample = 'ETH-3', N = 8),
        dict(Sample = 'FOO', N = 4,
            d13C_VPDB = -5., d18O_VPDB = -10.,
            D47 = 0.3, D48 = 0.15),
        dict(Sample = 'BAR', N = 4,
            d13C_VPDB = -15., d18O_VPDB = -15.,
            D47 = 0.5, D48 = 0.2),
        ])

sessions = [
    virtual_data(session = f'Session_{k+1:02.0f}', seed = 123456+k, **args)
    for k in range(10)]

# shuffle the data:
data = [r for s in sessions for r in s]
shuffle(data)
data = sorted(data, key = lambda r: r['Session'])

# create D47data instance:
data47 = D47data(data)

# process D47data instance:
data47.crunch()
data47.standardize()
```
2.2.1 Plotting the distribution of analyses through time

```py
data47.plot_distribution_of_analyses(filename = 'time_distribution.pdf')
```
The plot above shows the succession of analyses as if they were all distributed at regular time intervals. See `D4xdata.plot_distribution_of_analyses()` for how to plot analyses as a function of “true” time (based on the `TimeTag` for each analysis).
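If each analysis carries a `TimeTag` (a number such as a timestamp), the same method can plot against true time instead. A minimal sketch, assuming a `vs_time` option as described in the `D4xdata.plot_distribution_of_analyses()` documentation:

```py
# assign an arbitrary, made-up TimeTag to each analysis
# (in a real data set this would come from the raw data):
for k, r in enumerate(data47):
    r['TimeTag'] = 1200 * k # e.g., one analysis every 20 minutes

data47.plot_distribution_of_analyses(
    filename = 'time_distribution.pdf',
    vs_time = True, # assumed keyword; check the method's documentation
    )
```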
2.2.2 Generating session plots

```py
data47.plot_sessions()
```
Below is one of the resulting session plots. Each cross marker is an analysis. Anchors are in red and unknowns in blue. Short horizontal lines show the nominal Δ47 value for anchors (in red) or the average Δ47 value for unknowns (in blue; overall average for all sessions). Curved grey contours correspond to Δ47 standardization errors in this session.
2.2.3 Plotting Δ47 or Δ48 residuals

```py
data47.plot_residuals(filename = 'residuals.pdf', kde = True)
```
Again, note that this plot only shows the succession of analyses as if they were all distributed at regular time intervals.
2.2.4 Checking δ13C and δ18O dispersion

```py
mydata = D47data(virtual_data(
    session = 'mysession',
    samples = [
        dict(Sample = 'ETH-1', N = 4),
        dict(Sample = 'ETH-2', N = 4),
        dict(Sample = 'ETH-3', N = 4),
        dict(Sample = 'MYSAMPLE', N = 8, D47 = 0.6, D48 = 0.1, d13C_VPDB = -4.0, d18O_VPDB = -12.0),
        ], seed = 123))

mydata.refresh()
mydata.wg()
mydata.crunch()
mydata.plot_bulk_compositions()
```
`D4xdata.plot_bulk_compositions()` produces a series of plots, one for each sample, and an additional plot with all samples together. For example, here is the plot for sample `MYSAMPLE`:
2.3 Use a different set of anchors, change anchor nominal values, and/or change oxygen-17 correction parameters
Nominal values for various carbonate standards are defined in four places:
- `D4xdata.Nominal_d13C_VPDB`
- `D4xdata.Nominal_d18O_VPDB`
- `D47data.Nominal_D4x` (also accessible through `D47data.Nominal_D47`)
- `D48data.Nominal_D4x` (also accessible through `D48data.Nominal_D48`)
17O correction parameters are defined by:
- `D4xdata.R13_VPDB`
- `D4xdata.R18_VSMOW`
- `D4xdata.R18_VPDB`
- `D4xdata.LAMBDA_17`
- `D4xdata.R17_VSMOW`
- `D4xdata.R17_VPDB`
When creating a new instance of `D47data` or `D48data`, the current values of these variables are copied as properties of the new object. Applying custom values for, e.g., `R17_VSMOW` and `Nominal_D47` can thus be done in several ways:

Option 1: by redefining `D4xdata.R17_VSMOW` and `D47data.Nominal_D47` _before_ creating a `D47data` object:
```py
from D47crunch import D4xdata, D47data

# redefine R17_VSMOW:
D4xdata.R17_VSMOW = 0.00037 # new value

# redefine R17_VPDB for consistency:
D4xdata.R17_VPDB = D4xdata.R17_VSMOW * (D4xdata.R18_VPDB/D4xdata.R18_VSMOW) ** D4xdata.LAMBDA_17

# edit Nominal_D47 to only include ETH-1/2/3:
D47data.Nominal_D4x = {
    a: D47data.Nominal_D4x[a]
    for a in ['ETH-1', 'ETH-2', 'ETH-3']
    }

# redefine ETH-3:
D47data.Nominal_D4x['ETH-3'] = 0.600

# only now create D47data object:
mydata = D47data()

# check the results:
print(mydata.R17_VSMOW, mydata.R17_VPDB)
print(mydata.Nominal_D47)
# NB: mydata.Nominal_D47 is just an alias for mydata.Nominal_D4x

# should print out:
# 0.00037 0.00037599710894149464
# {'ETH-1': 0.2052, 'ETH-2': 0.2085, 'ETH-3': 0.6}
```
Option 2: by redefining `R17_VSMOW` and `Nominal_D47` _after_ creating a `D47data` object:
```py
from D47crunch import D47data

# first create D47data object:
mydata = D47data()

# redefine R17_VSMOW:
mydata.R17_VSMOW = 0.00037 # new value

# redefine R17_VPDB for consistency:
mydata.R17_VPDB = mydata.R17_VSMOW * (mydata.R18_VPDB/mydata.R18_VSMOW) ** mydata.LAMBDA_17

# edit Nominal_D47 to only include ETH-1/2/3:
mydata.Nominal_D47 = {
    a: mydata.Nominal_D47[a]
    for a in ['ETH-1', 'ETH-2', 'ETH-3']
    }

# redefine ETH-3:
mydata.Nominal_D47['ETH-3'] = 0.600

# check the results:
print(mydata.R17_VSMOW, mydata.R17_VPDB)
print(mydata.Nominal_D47)

# should print out:
# 0.00037 0.00037599710894149464
# {'ETH-1': 0.2052, 'ETH-2': 0.2085, 'ETH-3': 0.6}
```
The two options above are equivalent, but the latter provides a simple way to compare different data processing choices:
```py
from D47crunch import D47data

# create two D47data objects:
foo = D47data()
bar = D47data()

# modify foo in various ways:
foo.LAMBDA_17 = 0.52
foo.R17_VSMOW = 0.00037 # new value
foo.R17_VPDB = foo.R17_VSMOW * (foo.R18_VPDB/foo.R18_VSMOW) ** foo.LAMBDA_17
foo.Nominal_D47 = {
    'ETH-1': foo.Nominal_D47['ETH-1'],
    'ETH-2': foo.Nominal_D47['ETH-2'],
    'IAEA-C2': foo.Nominal_D47['IAEA-C2'],
    'INLAB_REF_MATERIAL': 0.666,
    }

# now import the same raw data into foo and bar:
foo.read('rawdata.csv')
foo.wg()          # compute δ13C, δ18O of working gas
foo.crunch()      # compute all δ13C, δ18O and raw Δ47 values
foo.standardize() # compute absolute Δ47 values

bar.read('rawdata.csv')
bar.wg()          # compute δ13C, δ18O of working gas
bar.crunch()      # compute all δ13C, δ18O and raw Δ47 values
bar.standardize() # compute absolute Δ47 values

# and compare the final results:
foo.table_of_samples(verbose = True, save_to_file = False)
bar.table_of_samples(verbose = True, save_to_file = False)
```
2.4 Process paired Δ47 and Δ48 values
Purely in terms of data processing, it is not obvious why Δ47 and Δ48 data should not be handled separately. For now, `D47crunch` uses two independent classes, `D47data` and `D48data`, which crunch numbers and deal with standardization in very similar ways. The following example demonstrates how to print out combined outputs for `D47data` and `D48data`.
```py
from D47crunch import *

# generate virtual data:
args = dict(
    samples = [
        dict(Sample = 'ETH-1', N = 3),
        dict(Sample = 'ETH-2', N = 3),
        dict(Sample = 'ETH-3', N = 3),
        dict(Sample = 'FOO', N = 3,
            d13C_VPDB = -5., d18O_VPDB = -10.,
            D47 = 0.3, D48 = 0.15),
        ], rD47 = 0.010, rD48 = 0.030)

session1 = virtual_data(session = 'Session_01', **args)
session2 = virtual_data(session = 'Session_02', **args)

# create D47data instance:
data47 = D47data(session1 + session2)

# process D47data instance:
data47.crunch()
data47.standardize()

# create D48data instance:
data48 = D48data(data47) # alternatively: data48 = D48data(session1 + session2)

# process D48data instance:
data48.crunch()
data48.standardize()

# output combined results:
table_of_sessions(data47, data48)
table_of_samples(data47, data48)
table_of_analyses(data47, data48)
```
Expected output:

```
–––––––––– –– –– ––––––––––– –––––––––––– –––––– –––––– –––––– ––––––––––––– ––––––––––––––– –––––––––––––– –––––– ––––––––––––– ––––––––––––––– ––––––––––––––
Session Na Nu d13Cwg_VPDB d18Owg_VSMOW r_d13C r_d18O r_D47 a_47 ± SE 1e3 x b_47 ± SE c_47 ± SE r_D48 a_48 ± SE 1e3 x b_48 ± SE c_48 ± SE
–––––––––– –– –– ––––––––––– –––––––––––– –––––– –––––– –––––– ––––––––––––– ––––––––––––––– –––––––––––––– –––––– ––––––––––––– ––––––––––––––– ––––––––––––––
Session_01 9 3 -4.000 26.000 0.0000 0.0000 0.0098 1.021 ± 0.019 -0.398 ± 0.260 -0.903 ± 0.006 0.0486 0.540 ± 0.151 1.235 ± 0.607 -0.390 ± 0.025
Session_02 9 3 -4.000 26.000 0.0000 0.0000 0.0090 1.015 ± 0.019 0.376 ± 0.260 -0.905 ± 0.006 0.0186 1.350 ± 0.156 -0.871 ± 0.608 -0.504 ± 0.027
–––––––––– –– –– ––––––––––– –––––––––––– –––––– –––––– –––––– ––––––––––––– ––––––––––––––– –––––––––––––– –––––– ––––––––––––– ––––––––––––––– ––––––––––––––
–––––– – ––––––––– –––––––––– –––––– –––––– –––––––– –––––– –––––––– –––––– –––––– –––––––– –––––– ––––––––
Sample N d13C_VPDB d18O_VSMOW D47 SE 95% CL SD p_Levene D48 SE 95% CL SD p_Levene
–––––– – ––––––––– –––––––––– –––––– –––––– –––––––– –––––– –––––––– –––––– –––––– –––––––– –––––– ––––––––
ETH-1 6 2.02 37.02 0.2052 0.0078 0.1380 0.0223
ETH-2 6 -10.17 19.88 0.2085 0.0036 0.1380 0.0482
ETH-3 6 1.71 37.45 0.6132 0.0080 0.2700 0.0176
FOO 6 -5.00 28.91 0.3026 0.0044 ± 0.0093 0.0121 0.164 0.1397 0.0121 ± 0.0255 0.0267 0.127
–––––– – ––––––––– –––––––––– –––––– –––––– –––––––– –––––– –––––––– –––––– –––––– –––––––– –––––– ––––––––
––– –––––––––– –––––– ––––––––––– –––––––––––– ––––––––– ––––––––– –––––––––– –––––––––– –––––––––– –––––––––– –––––––––– ––––––––– ––––––––– ––––––––– –––––––– ––––––––
UID Session Sample d13Cwg_VPDB d18Owg_VSMOW d45 d46 d47 d48 d49 d13C_VPDB d18O_VSMOW D47raw D48raw D49raw D47 D48
––– –––––––––– –––––– ––––––––––– –––––––––––– ––––––––– ––––––––– –––––––––– –––––––––– –––––––––– –––––––––– –––––––––– ––––––––– ––––––––– ––––––––– –––––––– ––––––––
1 Session_01 ETH-1 -4.000 26.000 6.018962 10.747026 16.120787 21.286237 27.780042 2.020000 37.024281 -0.708176 -0.316435 -0.000013 0.197297 0.087763
2 Session_01 ETH-1 -4.000 26.000 6.018962 10.747026 16.132240 21.307795 27.780042 2.020000 37.024281 -0.696913 -0.295333 -0.000013 0.208328 0.126791
3 Session_01 ETH-1 -4.000 26.000 6.018962 10.747026 16.132438 21.313884 27.780042 2.020000 37.024281 -0.696718 -0.289374 -0.000013 0.208519 0.137813
4 Session_01 ETH-2 -4.000 26.000 -5.995859 -5.976076 -12.700300 -12.210735 -18.023381 -10.170000 19.875825 -0.683938 -0.297902 -0.000002 0.209785 0.198705
5 Session_01 ETH-2 -4.000 26.000 -5.995859 -5.976076 -12.707421 -12.270781 -18.023381 -10.170000 19.875825 -0.691145 -0.358673 -0.000002 0.202726 0.086308
6 Session_01 ETH-2 -4.000 26.000 -5.995859 -5.976076 -12.700061 -12.278310 -18.023381 -10.170000 19.875825 -0.683696 -0.366292 -0.000002 0.210022 0.072215
7 Session_01 ETH-3 -4.000 26.000 5.742374 11.161270 16.684379 22.225827 28.306614 1.710000 37.450394 -0.273094 -0.216392 -0.000014 0.623472 0.270873
8 Session_01 ETH-3 -4.000 26.000 5.742374 11.161270 16.660163 22.233729 28.306614 1.710000 37.450394 -0.296906 -0.208664 -0.000014 0.600150 0.285167
9 Session_01 ETH-3 -4.000 26.000 5.742374 11.161270 16.675191 22.215632 28.306614 1.710000 37.450394 -0.282128 -0.226363 -0.000014 0.614623 0.252432
10 Session_01 FOO -4.000 26.000 -0.840413 2.828738 1.328380 5.374933 4.665655 -5.000000 28.907344 -0.582131 -0.288924 -0.000006 0.314928 0.175105
11 Session_01 FOO -4.000 26.000 -0.840413 2.828738 1.302220 5.384454 4.665655 -5.000000 28.907344 -0.608241 -0.279457 -0.000006 0.289356 0.192614
12 Session_01 FOO -4.000 26.000 -0.840413 2.828738 1.322530 5.372841 4.665655 -5.000000 28.907344 -0.587970 -0.291004 -0.000006 0.309209 0.171257
13 Session_02 ETH-1 -4.000 26.000 6.018962 10.747026 16.140853 21.267202 27.780042 2.020000 37.024281 -0.688442 -0.335067 -0.000013 0.207730 0.138730
14 Session_02 ETH-1 -4.000 26.000 6.018962 10.747026 16.127087 21.256983 27.780042 2.020000 37.024281 -0.701980 -0.345071 -0.000013 0.194396 0.131311
15 Session_02 ETH-1 -4.000 26.000 6.018962 10.747026 16.148253 21.287779 27.780042 2.020000 37.024281 -0.681165 -0.314926 -0.000013 0.214898 0.153668
16 Session_02 ETH-2 -4.000 26.000 -5.995859 -5.976076 -12.715859 -12.204791 -18.023381 -10.170000 19.875825 -0.699685 -0.291887 -0.000002 0.207349 0.149128
17 Session_02 ETH-2 -4.000 26.000 -5.995859 -5.976076 -12.709763 -12.188685 -18.023381 -10.170000 19.875825 -0.693516 -0.275587 -0.000002 0.213426 0.161217
18 Session_02 ETH-2 -4.000 26.000 -5.995859 -5.976076 -12.715427 -12.253049 -18.023381 -10.170000 19.875825 -0.699249 -0.340727 -0.000002 0.207780 0.112907
19 Session_02 ETH-3 -4.000 26.000 5.742374 11.161270 16.685994 22.249463 28.306614 1.710000 37.450394 -0.271506 -0.193275 -0.000014 0.618328 0.244431
20 Session_02 ETH-3 -4.000 26.000 5.742374 11.161270 16.681351 22.298166 28.306614 1.710000 37.450394 -0.276071 -0.145641 -0.000014 0.613831 0.279758
21 Session_02 ETH-3 -4.000 26.000 5.742374 11.161270 16.676169 22.306848 28.306614 1.710000 37.450394 -0.281167 -0.137150 -0.000014 0.608813 0.286056
22 Session_02 FOO -4.000 26.000 -0.840413 2.828738 1.324359 5.339497 4.665655 -5.000000 28.907344 -0.586144 -0.324160 -0.000006 0.314015 0.136535
23 Session_02 FOO -4.000 26.000 -0.840413 2.828738 1.297658 5.325854 4.665655 -5.000000 28.907344 -0.612794 -0.337727 -0.000006 0.287767 0.126473
24 Session_02 FOO -4.000 26.000 -0.840413 2.828738 1.310185 5.339898 4.665655 -5.000000 28.907344 -0.600291 -0.323761 -0.000006 0.300082 0.136830
––– –––––––––– –––––– ––––––––––– –––––––––––– ––––––––– ––––––––– –––––––––– –––––––––– –––––––––– –––––––––– –––––––––– ––––––––– ––––––––– ––––––––– –––––––– ––––––––
```
3. Command-Line Interface (CLI)
Instead of writing Python code, you may directly use the CLI to process raw Δ47 and Δ48 data using reasonable defaults. The simplest way is to call:

```sh
D47crunch rawdata.csv
```
This will create a directory named `output` and populate it by calling the following methods:

- `D47data.wg()`
- `D47data.crunch()`
- `D47data.standardize()`
- `D47data.summary()`
- `D47data.table_of_samples()`
- `D47data.table_of_sessions()`
- `D47data.plot_sessions()`
- `D47data.plot_residuals()`
- `D47data.table_of_analyses()`
- `D47data.plot_distribution_of_analyses()`
- `D47data.plot_bulk_compositions()`
- `D47data.save_D47_correl()`
You may specify a custom set of anchors instead of the default ones using the `--anchors` or `-a` option:

```sh
D47crunch -a anchors.csv rawdata.csv
```
In this case, the `anchors.csv` file (you may use any other file name) must have the following format:

```
Sample, d13C_VPDB, d18O_VPDB, D47
ETH-1, 2.02, -2.19, 0.2052
ETH-2, -10.17, -18.69, 0.2085
ETH-3, 1.71, -1.78, 0.6132
ETH-4, , , 0.4511
```
The samples with non-empty `d13C_VPDB`, `d18O_VPDB`, and `D47` values are used to standardize δ13C, δ18O, and Δ47 values, respectively.
You may also provide a list of analyses and/or samples to exclude from the input. This is done with the `--exclude` or `-e` option:

```sh
D47crunch -e badbatch.csv rawdata.csv
```
In this case, the `badbatch.csv` file (again, you may use a different file name) must have the following format:

```
UID, Sample
A03
A09
B06
, MYBADSAMPLE-1
, MYBADSAMPLE-2
```
This will exclude (ignore) analyses with the UIDs `A03`, `A09`, and `B06`, and those of samples `MYBADSAMPLE-1` and `MYBADSAMPLE-2`. It is possible to have an exclude file with only the `UID` column, or only the `Sample` column, or both, in any order.
The `--output-dir` or `-o` option may be used to specify a custom directory name for the output. For example, in Unix-like shells the following command will create a time-stamped output directory (note the month field `%m`; `%M` would insert minutes instead):

```sh
D47crunch -o `date "+%Y-%m-%d-%Hh%M"` rawdata.csv
```
To process Δ48 as well as Δ47, just add the `--D48` option.
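These options can be combined. For example, the following call (a hypothetical combination of the flags documented above) processes both Δ47 and Δ48 with custom anchors, exclusions, and output directory:

```sh
D47crunch --D48 -a anchors.csv -e badbatch.csv -o myoutput rawdata.csv
```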
4. API Documentation
1''' 2Standardization and analytical error propagation of Δ47 and Δ48 clumped-isotope measurements 3 4Process and standardize carbonate and/or CO2 clumped-isotope analyses, 5from low-level data out of a dual-inlet mass spectrometer to final, “absolute” 6Δ47, Δ48 and Δ49 values with fully propagated analytical error estimates 7([Daëron, 2021](https://doi.org/10.1029/2020GC009592)). 8 9The **tutorial** section takes you through a series of simple steps to import/process data and print out the results. 10The **how-to** section provides instructions applicable to various specific tasks. 11 12.. include:: ../../docpages/tutorial.md 13.. include:: ../../docpages/howto.md 14.. include:: ../../docpages/cli.md 15 16# 4. API Documentation 17''' 18 19__docformat__ = "restructuredtext" 20__author__ = 'Mathieu Daëron' 21__contact__ = 'daeron@lsce.ipsl.fr' 22__copyright__ = 'Copyright (c) 2024 Mathieu Daëron' 23__license__ = 'Modified BSD License - https://opensource.org/licenses/BSD-3-Clause' 24__date__ = '2024-09-10' 25__version__ = '2.4.1' 26 27import os 28import numpy as np 29import typer 30from typing_extensions import Annotated 31from statistics import stdev 32from scipy.stats import t as tstudent 33from scipy.stats import levene 34from scipy.interpolate import interp1d 35from numpy import linalg 36from lmfit import Minimizer, Parameters, report_fit 37from matplotlib import pyplot as ppl 38from datetime import datetime as dt 39from functools import wraps 40from colorsys import hls_to_rgb 41from matplotlib import rcParams 42 43typer.rich_utils.STYLE_HELPTEXT = '' 44 45rcParams['font.family'] = 'sans-serif' 46rcParams['font.sans-serif'] = 'Helvetica' 47rcParams['font.size'] = 10 48rcParams['mathtext.fontset'] = 'custom' 49rcParams['mathtext.rm'] = 'sans' 50rcParams['mathtext.bf'] = 'sans:bold' 51rcParams['mathtext.it'] = 'sans:italic' 52rcParams['mathtext.cal'] = 'sans:italic' 53rcParams['mathtext.default'] = 'rm' 54rcParams['xtick.major.size'] = 4 55rcParams['xtick.major.width'] = 1 56rcParams['ytick.major.size'] = 4 57rcParams['ytick.major.width'] = 1 58rcParams['axes.grid'] = False 59rcParams['axes.linewidth'] = 1 60rcParams['grid.linewidth'] = .75 61rcParams['grid.linestyle'] = '-' 62rcParams['grid.alpha'] = .15 63rcParams['savefig.dpi'] = 150 64 65Petersen_etal_CO2eqD47 = np.array([[-12, 1.147113572], [-11, 1.139961218], [-10, 1.132872856], [-9, 1.125847677], [-8, 1.118884889], [-7, 1.111983708], [-6, 1.105143366], [-5, 1.098363105], [-4, 1.091642182], [-3, 1.084979862], [-2, 1.078375423], [-1, 1.071828156], [0, 1.065337360], [1, 1.058902349], [2, 1.052522443], [3, 1.046196976], [4, 1.039925291], [5, 1.033706741], [6, 1.027540690], [7, 1.021426510], [8, 1.015363585], [9, 1.009351306], [10, 1.003389075], [11, 0.997476303], [12, 0.991612409], [13, 0.985796821], [14, 0.980028975], [15, 0.974308318], [16, 0.968634304], [17, 0.963006392], [18, 0.957424055], [19, 0.951886769], [20, 0.946394020], [21, 0.940945302], [22, 0.935540114], [23, 0.930177964], [24, 0.924858369], [25, 0.919580851], [26, 0.914344938], [27, 0.909150167], [28, 0.903996080], [29, 0.898882228], [30, 0.893808167], [31, 0.888773459], [32, 0.883777672], [33, 0.878820382], [34, 0.873901170], [35, 0.869019623], [36, 0.864175334], [37, 0.859367901], [38, 0.854596929], [39, 0.849862028], [40, 0.845162813], [41, 0.840498905], [42, 0.835869931], [43, 0.831275522], [44, 0.826715314], [45, 0.822188950], [46, 0.817696075], [47, 0.813236341], [48, 0.808809404], [49, 0.804414926], [50, 0.800052572], [51, 0.795722012], [52, 0.791422922], [53, 
0.787154979], [54, 0.782917869], [55, 0.778711277], [56, 0.774534898], [57, 0.770388426], [58, 0.766271562], [59, 0.762184010], [60, 0.758125479], [61, 0.754095680], [62, 0.750094329], [63, 0.746121147], [64, 0.742175856], [65, 0.738258184], [66, 0.734367860], [67, 0.730504620], [68, 0.726668201], [69, 0.722858343], [70, 0.719074792], [71, 0.715317295], [72, 0.711585602], [73, 0.707879469], [74, 0.704198652], [75, 0.700542912], [76, 0.696912012], [77, 0.693305719], [78, 0.689723802], [79, 0.686166034], [80, 0.682632189], [81, 0.679122047], [82, 0.675635387], [83, 0.672171994], [84, 0.668731654], [85, 0.665314156], [86, 0.661919291], [87, 0.658546854], [88, 0.655196641], [89, 0.651868451], [90, 0.648562087], [91, 0.645277352], [92, 0.642014054], [93, 0.638771999], [94, 0.635551001], [95, 0.632350872], [96, 0.629171428], [97, 0.626012487], [98, 0.622873870], [99, 0.619755397], [100, 0.616656895], [102, 0.610519107], [104, 0.604459143], [106, 0.598475670], [108, 0.592567388], [110, 0.586733026], [112, 0.580971342], [114, 0.575281125], [116, 0.569661187], [118, 0.564110371], [120, 0.558627545], [122, 0.553211600], [124, 0.547861454], [126, 0.542576048], [128, 0.537354347], [130, 0.532195337], [132, 0.527098028], [134, 0.522061450], [136, 0.517084654], [138, 0.512166711], [140, 0.507306712], [142, 0.502503768], [144, 0.497757006], [146, 0.493065573], [148, 0.488428634], [150, 0.483845370], [152, 0.479314980], [154, 0.474836677], [156, 0.470409692], [158, 0.466033271], [160, 0.461706674], [162, 0.457429176], [164, 0.453200067], [166, 0.449018650], [168, 0.444884242], [170, 0.440796174], [172, 0.436753787], [174, 0.432756438], [176, 0.428803494], [178, 0.424894334], [180, 0.421028350], [182, 0.417204944], [184, 0.413423530], [186, 0.409683531], [188, 0.405984383], [190, 0.402325531], [192, 0.398706429], [194, 0.395126543], [196, 0.391585347], [198, 0.388082324], [200, 0.384616967], [202, 0.381188778], [204, 0.377797268], [206, 0.374441954], [208, 0.371122364], [210, 0.367838033], [212, 0.364588505], [214, 0.361373329], [216, 0.358192065], [218, 0.355044277], [220, 0.351929540], [222, 0.348847432], [224, 0.345797540], [226, 0.342779460], [228, 0.339792789], [230, 0.336837136], [232, 0.333912113], [234, 0.331017339], [236, 0.328152439], [238, 0.325317046], [240, 0.322510795], [242, 0.319733329], [244, 0.316984297], [246, 0.314263352], [248, 0.311570153], [250, 0.308904364], [252, 0.306265654], [254, 0.303653699], [256, 0.301068176], [258, 0.298508771], [260, 0.295975171], [262, 0.293467070], [264, 0.290984167], [266, 0.288526163], [268, 0.286092765], [270, 0.283683684], [272, 0.281298636], [274, 0.278937339], [276, 0.276599517], [278, 0.274284898], [280, 0.271993211], [282, 0.269724193], [284, 0.267477582], [286, 0.265253121], [288, 0.263050554], [290, 0.260869633], [292, 0.258710110], [294, 0.256571741], [296, 0.254454286], [298, 0.252357508], [300, 0.250281174], [302, 0.248225053], [304, 0.246188917], [306, 0.244172542], [308, 0.242175707], [310, 0.240198194], [312, 0.238239786], [314, 0.236300272], [316, 0.234379441], [318, 0.232477087], [320, 0.230593005], [322, 0.228726993], [324, 0.226878853], [326, 0.225048388], [328, 0.223235405], [330, 0.221439711], [332, 0.219661118], [334, 0.217899439], [336, 0.216154491], [338, 0.214426091], [340, 0.212714060], [342, 0.211018220], [344, 0.209338398], [346, 0.207674420], [348, 0.206026115], [350, 0.204393315], [355, 0.200378063], [360, 0.196456139], [365, 0.192625077], [370, 0.188882487], [375, 0.185226048], [380, 0.181653511], [385, 0.178162694], [390, 
0.174751478], [395, 0.171417807], [400, 0.168159686], [405, 0.164975177], [410, 0.161862398], [415, 0.158819521], [420, 0.155844772], [425, 0.152936426], [430, 0.150092806], [435, 0.147312286], [440, 0.144593281], [445, 0.141934254], [450, 0.139333710], [455, 0.136790195], [460, 0.134302294], [465, 0.131868634], [470, 0.129487876], [475, 0.127158722], [480, 0.124879906], [485, 0.122650197], [490, 0.120468398], [495, 0.118333345], [500, 0.116243903], [505, 0.114198970], [510, 0.112197471], [515, 0.110238362], [520, 0.108320625], [525, 0.106443271], [530, 0.104605335], [535, 0.102805877], [540, 0.101043985], [545, 0.099318768], [550, 0.097629359], [555, 0.095974915], [560, 0.094354612], [565, 0.092767650], [570, 0.091213248], [575, 0.089690648], [580, 0.088199108], [585, 0.086737906], [590, 0.085306341], [595, 0.083903726], [600, 0.082529395], [605, 0.081182697], [610, 0.079862998], [615, 0.078569680], [620, 0.077302141], [625, 0.076059794], [630, 0.074842066], [635, 0.073648400], [640, 0.072478251], [645, 0.071331090], [650, 0.070206399], [655, 0.069103674], [660, 0.068022424], [665, 0.066962168], [670, 0.065922439], [675, 0.064902780], [680, 0.063902748], [685, 0.062921909], [690, 0.061959837], [695, 0.061016122], [700, 0.060090360], [705, 0.059182157], [710, 0.058291131], [715, 0.057416907], [720, 0.056559120], [725, 0.055717414], [730, 0.054891440], [735, 0.054080860], [740, 0.053285343], [745, 0.052504565], [750, 0.051738210], [755, 0.050985971], [760, 0.050247546], [765, 0.049522643], [770, 0.048810974], [775, 0.048112260], [780, 0.047426227], [785, 0.046752609], [790, 0.046091145], [795, 0.045441581], [800, 0.044803668], [805, 0.044177164], [810, 0.043561831], [815, 0.042957438], [820, 0.042363759], [825, 0.041780573], [830, 0.041207664], [835, 0.040644822], [840, 0.040091839], [845, 0.039548516], [850, 0.039014654], [855, 0.038490063], [860, 0.037974554], [865, 0.037467944], [870, 0.036970054], [875, 0.036480707], [880, 0.035999734], [885, 0.035526965], [890, 0.035062238], [895, 0.034605393], [900, 0.034156272], [905, 0.033714724], [910, 0.033280598], [915, 0.032853749], [920, 0.032434032], [925, 0.032021309], [930, 0.031615443], [935, 0.031216300], [940, 0.030823749], [945, 0.030437663], [950, 0.030057915], [955, 0.029684385], [960, 0.029316951], [965, 0.028955498], [970, 0.028599910], [975, 0.028250075], [980, 0.027905884], [985, 0.027567229], [990, 0.027234006], [995, 0.026906112], [1000, 0.026583445], [1005, 0.026265908], [1010, 0.025953405], [1015, 0.025645841], [1020, 0.025343124], [1025, 0.025045163], [1030, 0.024751871], [1035, 0.024463160], [1040, 0.024178947], [1045, 0.023899147], [1050, 0.023623680], [1055, 0.023352467], [1060, 0.023085429], [1065, 0.022822491], [1070, 0.022563577], [1075, 0.022308615], [1080, 0.022057533], [1085, 0.021810260], [1090, 0.021566729], [1095, 0.021326872], [1100, 0.021090622]]) 66_fCO2eqD47_Petersen = interp1d(Petersen_etal_CO2eqD47[:,0], Petersen_etal_CO2eqD47[:,1]) 67def fCO2eqD47_Petersen(T): 68 ''' 69 CO2 equilibrium Δ47 value as a function of T (in degrees C) 70 according to [Petersen et al. (2019)](https://doi.org/10.1029/2018GC008127). 
71 72 ''' 73 return float(_fCO2eqD47_Petersen(T)) 74 75 76Wang_etal_CO2eqD47 = np.array([[-83., 1.8954], [-73., 1.7530], [-63., 1.6261], [-53., 1.5126], [-43., 1.4104], [-33., 1.3182], [-23., 1.2345], [-13., 1.1584], [-3., 1.0888], [7., 1.0251], [17., 0.9665], [27., 0.9125], [37., 0.8626], [47., 0.8164], [57., 0.7734], [67., 0.7334], [87., 0.6612], [97., 0.6286], [107., 0.5980], [117., 0.5693], [127., 0.5423], [137., 0.5169], [147., 0.4930], [157., 0.4704], [167., 0.4491], [177., 0.4289], [187., 0.4098], [197., 0.3918], [207., 0.3747], [217., 0.3585], [227., 0.3431], [237., 0.3285], [247., 0.3147], [257., 0.3015], [267., 0.2890], [277., 0.2771], [287., 0.2657], [297., 0.2550], [307., 0.2447], [317., 0.2349], [327., 0.2256], [337., 0.2167], [347., 0.2083], [357., 0.2002], [367., 0.1925], [377., 0.1851], [387., 0.1781], [397., 0.1714], [407., 0.1650], [417., 0.1589], [427., 0.1530], [437., 0.1474], [447., 0.1421], [457., 0.1370], [467., 0.1321], [477., 0.1274], [487., 0.1229], [497., 0.1186], [507., 0.1145], [517., 0.1105], [527., 0.1068], [537., 0.1031], [547., 0.0997], [557., 0.0963], [567., 0.0931], [577., 0.0901], [587., 0.0871], [597., 0.0843], [607., 0.0816], [617., 0.0790], [627., 0.0765], [637., 0.0741], [647., 0.0718], [657., 0.0695], [667., 0.0674], [677., 0.0654], [687., 0.0634], [697., 0.0615], [707., 0.0597], [717., 0.0579], [727., 0.0562], [737., 0.0546], [747., 0.0530], [757., 0.0515], [767., 0.0500], [777., 0.0486], [787., 0.0472], [797., 0.0459], [807., 0.0447], [817., 0.0435], [827., 0.0423], [837., 0.0411], [847., 0.0400], [857., 0.0390], [867., 0.0380], [877., 0.0370], [887., 0.0360], [897., 0.0351], [907., 0.0342], [917., 0.0333], [927., 0.0325], [937., 0.0317], [947., 0.0309], [957., 0.0302], [967., 0.0294], [977., 0.0287], [987., 0.0281], [997., 0.0274], [1007., 0.0268], [1017., 0.0261], [1027., 0.0255], [1037., 0.0249], [1047., 0.0244], [1057., 0.0238], [1067., 0.0233], [1077., 0.0228], [1087., 0.0223], [1097., 0.0218]]) 77_fCO2eqD47_Wang = interp1d(Wang_etal_CO2eqD47[:,0] - 0.15, Wang_etal_CO2eqD47[:,1]) 78def fCO2eqD47_Wang(T): 79 ''' 80 CO2 equilibrium Δ47 value as a function of `T` (in degrees C) 81 according to [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039) 82 (supplementary data of [Dennis et al., 2011](https://doi.org/10.1016/j.gca.2011.09.025)). 83 ''' 84 return float(_fCO2eqD47_Wang(T)) 85 86 87def correlated_sum(X, C, w = None): 88 ''' 89 Compute covariance-aware linear combinations 90 91 **Parameters** 92 93 + `X`: list or 1-D array of values to sum 94 + `C`: covariance matrix for the elements of `X` 95 + `w`: list or 1-D array of weights to apply to the elements of `X` 96 (all equal to 1 by default) 97 98 Return the sum (and its SE) of the elements of `X`, with optional weights equal 99 to the elements of `w`, accounting for covariances between the elements of `X`. 
100 ''' 101 if w is None: 102 w = [1 for x in X] 103 return np.dot(w,X), (np.dot(w,np.dot(C,w)))**.5 104 105 106def make_csv(x, hsep = ',', vsep = '\n'): 107 ''' 108 Formats a list of lists of strings as a CSV 109 110 **Parameters** 111 112 + `x`: the list of lists of strings to format 113 + `hsep`: the field separator (`,` by default) 114 + `vsep`: the line-ending convention to use (`\\n` by default) 115 116 **Example** 117 118 ```py 119 print(make_csv([['a', 'b', 'c'], ['d', 'e', 'f']])) 120 ``` 121 122 outputs: 123 124 ```py 125 a,b,c 126 d,e,f 127 ``` 128 ''' 129 return vsep.join([hsep.join(l) for l in x]) 130 131 132def pf(txt): 133 ''' 134 Modify string `txt` to follow `lmfit.Parameter()` naming rules. 135 ''' 136 return txt.replace('-','_').replace('.','_').replace(' ','_') 137 138 139def smart_type(x): 140 ''' 141 Tries to convert string `x` to a float if it includes a decimal point, or 142 to an integer if it does not. If both attempts fail, return the original 143 string unchanged. 144 ''' 145 try: 146 y = float(x) 147 except ValueError: 148 return x 149 if '.' not in x: 150 return int(y) 151 return y 152 153 154def pretty_table(x, header = 1, hsep = ' ', vsep = '–', align = '<'): 155 ''' 156 Reads a list of lists of strings and outputs an ascii table 157 158 **Parameters** 159 160 + `x`: a list of lists of strings 161 + `header`: the number of lines to treat as header lines 162 + `hsep`: the horizontal separator between columns 163 + `vsep`: the character to use as vertical separator 164 + `align`: string of left (`<`) or right (`>`) alignment characters. 165 166 **Example** 167 168 ```py 169 x = [['A', 'B', 'C'], ['1', '1.9999', 'foo'], ['10', 'x', 'bar']] 170 print(pretty_table(x)) 171 ``` 172 yields: 173 ``` 174 -- ------ --- 175 A B C 176 -- ------ --- 177 1 1.9999 foo 178 10 x bar 179 -- ------ --- 180 ``` 181 182 ''' 183 txt = [] 184 widths = [np.max([len(e) for e in c]) for c in zip(*x)] 185 186 if len(widths) > len(align): 187 align += '>' * (len(widths)-len(align)) 188 sepline = hsep.join([vsep*w for w in widths]) 189 txt += [sepline] 190 for k,l in enumerate(x): 191 if k and k == header: 192 txt += [sepline] 193 txt += [hsep.join([f'{e:{a}{w}}' for e, w, a in zip(l, widths, align)])] 194 txt += [sepline] 195 txt += [''] 196 return '\n'.join(txt) 197 198 199def transpose_table(x): 200 ''' 201 Transpose a list if lists 202 203 **Parameters** 204 205 + `x`: a list of lists 206 207 **Example** 208 209 ```py 210 x = [[1, 2], [3, 4]] 211 print(transpose_table(x)) # yields: [[1, 3], [2, 4]] 212 ``` 213 ''' 214 return [[e for e in c] for c in zip(*x)] 215 216 217def w_avg(X, sX) : 218 ''' 219 Compute variance-weighted average 220 221 Returns the value and SE of the weighted average of the elements of `X`, 222 with relative weights equal to their inverse variances (`1/sX**2`). 
223 224 **Parameters** 225 226 + `X`: array-like of elements to average 227 + `sX`: array-like of the corresponding SE values 228 229 **Tip** 230 231 If `X` and `sX` are initially arranged as a list of `(x, sx)` doublets, 232 they may be rearranged using `zip()`: 233 234 ```python 235 foo = [(0, 1), (1, 0.5), (2, 0.5)] 236 print(w_avg(*zip(*foo))) # yields: (1.3333333333333333, 0.3333333333333333) 237 ``` 238 ''' 239 X = [ x for x in X ] 240 sX = [ sx for sx in sX ] 241 W = [ sx**-2 for sx in sX ] 242 W = [ w/sum(W) for w in W ] 243 Xavg = sum([ w*x for w,x in zip(W,X) ]) 244 sXavg = sum([ w**2*sx**2 for w,sx in zip(W,sX) ])**.5 245 return Xavg, sXavg 246 247 248def read_csv(filename, sep = ''): 249 ''' 250 Read contents of `filename` in csv format and return a list of dictionaries. 251 252 In the csv string, spaces before and after field separators (`','` by default) 253 are optional. 254 255 **Parameters** 256 257 + `filename`: the csv file to read 258 + `sep`: csv separator delimiting the fields. By default, use `,`, `;`, or `\t`, 259 whichever appers most often in the contents of `filename`. 260 ''' 261 with open(filename) as fid: 262 txt = fid.read() 263 264 if sep == '': 265 sep = sorted(',;\t', key = lambda x: - txt.count(x))[0] 266 txt = [[x.strip() for x in l.split(sep)] for l in txt.splitlines() if l.strip()] 267 return [{k: smart_type(v) for k,v in zip(txt[0], l) if v} for l in txt[1:]] 268 269 270def simulate_single_analysis( 271 sample = 'MYSAMPLE', 272 d13Cwg_VPDB = -4., d18Owg_VSMOW = 26., 273 d13C_VPDB = None, d18O_VPDB = None, 274 D47 = None, D48 = None, D49 = 0., D17O = 0., 275 a47 = 1., b47 = 0., c47 = -0.9, 276 a48 = 1., b48 = 0., c48 = -0.45, 277 Nominal_D47 = None, 278 Nominal_D48 = None, 279 Nominal_d13C_VPDB = None, 280 Nominal_d18O_VPDB = None, 281 ALPHA_18O_ACID_REACTION = None, 282 R13_VPDB = None, 283 R17_VSMOW = None, 284 R18_VSMOW = None, 285 LAMBDA_17 = None, 286 R18_VPDB = None, 287 ): 288 ''' 289 Compute working-gas delta values for a single analysis, assuming a stochastic working 290 gas and a “perfect” measurement (i.e. raw Δ values are identical to absolute values). 291 292 **Parameters** 293 294 + `sample`: sample name 295 + `d13Cwg_VPDB`, `d18Owg_VSMOW`: bulk composition of the working gas 296 (respectively –4 and +26 ‰ by default) 297 + `d13C_VPDB`, `d18O_VPDB`: bulk composition of the carbonate sample 298 + `D47`, `D48`, `D49`, `D17O`: clumped-isotope and oxygen-17 anomalies 299 of the carbonate sample 300 + `Nominal_D47`, `Nominal_D48`: where to lookup Δ47 and 301 Δ48 values if `D47` or `D48` are not specified 302 + `Nominal_d13C_VPDB`, `Nominal_d18O_VPDB`: where to lookup δ13C and 303 δ18O values if `d13C_VPDB` or `d18O_VPDB` are not specified 304 + `ALPHA_18O_ACID_REACTION`: 18O/16O acid fractionation factor 305 + `R13_VPDB`, `R17_VSMOW`, `R18_VSMOW`, `LAMBDA_17`, `R18_VPDB`: oxygen-17 306 correction parameters (by default equal to the `D4xdata` default values) 307 308 Returns a dictionary with fields 309 `['Sample', 'D17O', 'd13Cwg_VPDB', 'd18Owg_VSMOW', 'd45', 'd46', 'd47', 'd48', 'd49']`. 
310 ''' 311 312 if Nominal_d13C_VPDB is None: 313 Nominal_d13C_VPDB = D4xdata().Nominal_d13C_VPDB 314 315 if Nominal_d18O_VPDB is None: 316 Nominal_d18O_VPDB = D4xdata().Nominal_d18O_VPDB 317 318 if ALPHA_18O_ACID_REACTION is None: 319 ALPHA_18O_ACID_REACTION = D4xdata().ALPHA_18O_ACID_REACTION 320 321 if R13_VPDB is None: 322 R13_VPDB = D4xdata().R13_VPDB 323 324 if R17_VSMOW is None: 325 R17_VSMOW = D4xdata().R17_VSMOW 326 327 if R18_VSMOW is None: 328 R18_VSMOW = D4xdata().R18_VSMOW 329 330 if LAMBDA_17 is None: 331 LAMBDA_17 = D4xdata().LAMBDA_17 332 333 if R18_VPDB is None: 334 R18_VPDB = D4xdata().R18_VPDB 335 336 R17_VPDB = R17_VSMOW * (R18_VPDB / R18_VSMOW) ** LAMBDA_17 337 338 if Nominal_D47 is None: 339 Nominal_D47 = D47data().Nominal_D47 340 341 if Nominal_D48 is None: 342 Nominal_D48 = D48data().Nominal_D48 343 344 if d13C_VPDB is None: 345 if sample in Nominal_d13C_VPDB: 346 d13C_VPDB = Nominal_d13C_VPDB[sample] 347 else: 348 raise KeyError(f"Sample {sample} is missing d13C_VDP value, and it is not defined in Nominal_d13C_VDP.") 349 350 if d18O_VPDB is None: 351 if sample in Nominal_d18O_VPDB: 352 d18O_VPDB = Nominal_d18O_VPDB[sample] 353 else: 354 raise KeyError(f"Sample {sample} is missing d18O_VPDB value, and it is not defined in Nominal_d18O_VPDB.") 355 356 if D47 is None: 357 if sample in Nominal_D47: 358 D47 = Nominal_D47[sample] 359 else: 360 raise KeyError(f"Sample {sample} is missing D47 value, and it is not defined in Nominal_D47.") 361 362 if D48 is None: 363 if sample in Nominal_D48: 364 D48 = Nominal_D48[sample] 365 else: 366 raise KeyError(f"Sample {sample} is missing D48 value, and it is not defined in Nominal_D48.") 367 368 X = D4xdata() 369 X.R13_VPDB = R13_VPDB 370 X.R17_VSMOW = R17_VSMOW 371 X.R18_VSMOW = R18_VSMOW 372 X.LAMBDA_17 = LAMBDA_17 373 X.R18_VPDB = R18_VPDB 374 X.R17_VPDB = R17_VSMOW * (R18_VPDB / R18_VSMOW)**LAMBDA_17 375 376 R45wg, R46wg, R47wg, R48wg, R49wg = X.compute_isobar_ratios( 377 R13 = R13_VPDB * (1 + d13Cwg_VPDB/1000), 378 R18 = R18_VSMOW * (1 + d18Owg_VSMOW/1000), 379 ) 380 R45, R46, R47, R48, R49 = X.compute_isobar_ratios( 381 R13 = R13_VPDB * (1 + d13C_VPDB/1000), 382 R18 = R18_VPDB * (1 + d18O_VPDB/1000) * ALPHA_18O_ACID_REACTION, 383 D17O=D17O, D47=D47, D48=D48, D49=D49, 384 ) 385 R45stoch, R46stoch, R47stoch, R48stoch, R49stoch = X.compute_isobar_ratios( 386 R13 = R13_VPDB * (1 + d13C_VPDB/1000), 387 R18 = R18_VPDB * (1 + d18O_VPDB/1000) * ALPHA_18O_ACID_REACTION, 388 D17O=D17O, 389 ) 390 391 d45 = 1000 * (R45/R45wg - 1) 392 d46 = 1000 * (R46/R46wg - 1) 393 d47 = 1000 * (R47/R47wg - 1) 394 d48 = 1000 * (R48/R48wg - 1) 395 d49 = 1000 * (R49/R49wg - 1) 396 397 for k in range(3): # dumb iteration to adjust for small changes in d47 398 R47raw = (1 + (a47 * D47 + b47 * d47 + c47)/1000) * R47stoch 399 R48raw = (1 + (a48 * D48 + b48 * d48 + c48)/1000) * R48stoch 400 d47 = 1000 * (R47raw/R47wg - 1) 401 d48 = 1000 * (R48raw/R48wg - 1) 402 403 return dict( 404 Sample = sample, 405 D17O = D17O, 406 d13Cwg_VPDB = d13Cwg_VPDB, 407 d18Owg_VSMOW = d18Owg_VSMOW, 408 d45 = d45, 409 d46 = d46, 410 d47 = d47, 411 d48 = d48, 412 d49 = d49, 413 ) 414 415 416def virtual_data( 417 samples = [], 418 a47 = 1., b47 = 0., c47 = -0.9, 419 a48 = 1., b48 = 0., c48 = -0.45, 420 rd45 = 0.020, rd46 = 0.060, 421 rD47 = 0.015, rD48 = 0.045, 422 d13Cwg_VPDB = None, d18Owg_VSMOW = None, 423 session = None, 424 Nominal_D47 = None, Nominal_D48 = None, 425 Nominal_d13C_VPDB = None, Nominal_d18O_VPDB = None, 426 ALPHA_18O_ACID_REACTION = None, 427 R13_VPDB = None, 428 
R17_VSMOW = None, 429 R18_VSMOW = None, 430 LAMBDA_17 = None, 431 R18_VPDB = None, 432 seed = 0, 433 shuffle = True, 434 ): 435 ''' 436 Return list with simulated analyses from a single session. 437 438 **Parameters** 439 440 + `samples`: a list of entries; each entry is a dictionary with the following fields: 441 * `Sample`: the name of the sample 442 * `d13C_VPDB`, `d18O_VPDB`: bulk composition of the carbonate sample 443 * `D47`, `D48`, `D49`, `D17O` (all optional): clumped-isotope and oxygen-17 anomalies of the carbonate sample 444 * `N`: how many analyses to generate for this sample 445 + `a47`: scrambling factor for Δ47 446 + `b47`: compositional nonlinearity for Δ47 447 + `c47`: working gas offset for Δ47 448 + `a48`: scrambling factor for Δ48 449 + `b48`: compositional nonlinearity for Δ48 450 + `c48`: working gas offset for Δ48 451 + `rd45`: analytical repeatability of δ45 452 + `rd46`: analytical repeatability of δ46 453 + `rD47`: analytical repeatability of Δ47 454 + `rD48`: analytical repeatability of Δ48 455 + `d13Cwg_VPDB`, `d18Owg_VSMOW`: bulk composition of the working gas 456 (by default equal to the `simulate_single_analysis` default values) 457 + `session`: name of the session (no name by default) 458 + `Nominal_D47`, `Nominal_D48`: where to lookup Δ47 and Δ48 values 459 if `D47` or `D48` are not specified (by default equal to the `simulate_single_analysis` defaults) 460 + `Nominal_d13C_VPDB`, `Nominal_d18O_VPDB`: where to lookup δ13C and 461 δ18O values if `d13C_VPDB` or `d18O_VPDB` are not specified 462 (by default equal to the `simulate_single_analysis` defaults) 463 + `ALPHA_18O_ACID_REACTION`: 18O/16O acid fractionation factor 464 (by default equal to the `simulate_single_analysis` defaults) 465 + `R13_VPDB`, `R17_VSMOW`, `R18_VSMOW`, `LAMBDA_17`, `R18_VPDB`: oxygen-17 466 correction parameters (by default equal to the `simulate_single_analysis` default) 467 + `seed`: explicitly set to a non-zero value to achieve random but repeatable simulations 468 + `shuffle`: randomly reorder the sequence of analyses 469 470 471 Here is an example of using this method to generate an arbitrary combination of 472 anchors and unknowns for a bunch of sessions: 473 474 ```py 475 .. include:: ../../code_examples/virtual_data/example.py 476 ``` 477 478 This should output something like: 479 480 ``` 481 .. 
include:: ../../code_examples/virtual_data/output.txt 482 ``` 483 ''' 484 485 kwargs = locals().copy() 486 487 from numpy import random as nprandom 488 if seed: 489 rng = nprandom.default_rng(seed) 490 else: 491 rng = nprandom.default_rng() 492 493 N = sum([s['N'] for s in samples]) 494 errors45 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors 495 errors45 *= rd45 / stdev(errors45) # scale errors to rd45 496 errors46 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors 497 errors46 *= rd46 / stdev(errors46) # scale errors to rd46 498 errors47 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors 499 errors47 *= rD47 / stdev(errors47) # scale errors to rD47 500 errors48 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors 501 errors48 *= rD48 / stdev(errors48) # scale errors to rD48 502 503 k = 0 504 out = [] 505 for s in samples: 506 kw = {} 507 kw['sample'] = s['Sample'] 508 kw = { 509 **kw, 510 **{var: kwargs[var] 511 for var in [ 512 'd13Cwg_VPDB', 'd18Owg_VSMOW', 'ALPHA_18O_ACID_REACTION', 513 'Nominal_D47', 'Nominal_D48', 'Nominal_d13C_VPDB', 'Nominal_d18O_VPDB', 514 'R13_VPDB', 'R17_VSMOW', 'R18_VSMOW', 'LAMBDA_17', 'R18_VPDB', 515 'a47', 'b47', 'c47', 'a48', 'b48', 'c48', 516 ] 517 if kwargs[var] is not None}, 518 **{var: s[var] 519 for var in ['d13C_VPDB', 'd18O_VPDB', 'D47', 'D48', 'D49', 'D17O'] 520 if var in s}, 521 } 522 523 sN = s['N'] 524 while sN: 525 out.append(simulate_single_analysis(**kw)) 526 out[-1]['d45'] += errors45[k] 527 out[-1]['d46'] += errors46[k] 528 out[-1]['d47'] += (errors45[k] + errors46[k] + errors47[k]) * a47 529 out[-1]['d48'] += (2*errors46[k] + errors48[k]) * a48 530 sN -= 1 531 k += 1 532 533 if session is not None: 534 for r in out: 535 r['Session'] = session 536 537 if shuffle: 538 nprandom.shuffle(out) 539 540 return out 541 542def table_of_samples( 543 data47 = None, 544 data48 = None, 545 dir = 'output', 546 filename = None, 547 save_to_file = True, 548 print_out = True, 549 output = None, 550 ): 551 ''' 552 Print out, save to disk and/or return a combined table of samples 553 for a pair of `D47data` and `D48data` objects. 
554 555 **Parameters** 556 557 + `data47`: `D47data` instance 558 + `data48`: `D48data` instance 559 + `dir`: the directory in which to save the table 560 + `filename`: the name to the csv file to write to 561 + `save_to_file`: whether to save the table to disk 562 + `print_out`: whether to print out the table 563 + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`); 564 if set to `'raw'`: return a list of list of strings 565 (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`) 566 ''' 567 if data47 is None: 568 if data48 is None: 569 raise TypeError("Arguments must include at least one D47data() or D48data() instance.") 570 else: 571 return data48.table_of_samples( 572 dir = dir, 573 filename = filename, 574 save_to_file = save_to_file, 575 print_out = print_out, 576 output = output 577 ) 578 else: 579 if data48 is None: 580 return data47.table_of_samples( 581 dir = dir, 582 filename = filename, 583 save_to_file = save_to_file, 584 print_out = print_out, 585 output = output 586 ) 587 else: 588 out47 = data47.table_of_samples(save_to_file = False, print_out = False, output = 'raw') 589 out48 = data48.table_of_samples(save_to_file = False, print_out = False, output = 'raw') 590 out = transpose_table(transpose_table(out47) + transpose_table(out48)[4:]) 591 592 if save_to_file: 593 if not os.path.exists(dir): 594 os.makedirs(dir) 595 if filename is None: 596 filename = f'D47D48_samples.csv' 597 with open(f'{dir}/{filename}', 'w') as fid: 598 fid.write(make_csv(out)) 599 if print_out: 600 print('\n'+pretty_table(out)) 601 if output == 'raw': 602 return out 603 elif output == 'pretty': 604 return pretty_table(out) 605 606 607def table_of_sessions( 608 data47 = None, 609 data48 = None, 610 dir = 'output', 611 filename = None, 612 save_to_file = True, 613 print_out = True, 614 output = None, 615 ): 616 ''' 617 Print out, save to disk and/or return a combined table of sessions 618 for a pair of `D47data` and `D48data` objects. 
619 ***Only applicable if the sessions in `data47` and those in `data48` 620 consist of the exact same sets of analyses.*** 621 622 **Parameters** 623 624 + `data47`: `D47data` instance 625 + `data48`: `D48data` instance 626 + `dir`: the directory in which to save the table 627 + `filename`: the name to the csv file to write to 628 + `save_to_file`: whether to save the table to disk 629 + `print_out`: whether to print out the table 630 + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`); 631 if set to `'raw'`: return a list of list of strings 632 (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`) 633 ''' 634 if data47 is None: 635 if data48 is None: 636 raise TypeError("Arguments must include at least one D47data() or D48data() instance.") 637 else: 638 return data48.table_of_sessions( 639 dir = dir, 640 filename = filename, 641 save_to_file = save_to_file, 642 print_out = print_out, 643 output = output 644 ) 645 else: 646 if data48 is None: 647 return data47.table_of_sessions( 648 dir = dir, 649 filename = filename, 650 save_to_file = save_to_file, 651 print_out = print_out, 652 output = output 653 ) 654 else: 655 out47 = data47.table_of_sessions(save_to_file = False, print_out = False, output = 'raw') 656 out48 = data48.table_of_sessions(save_to_file = False, print_out = False, output = 'raw') 657 for k,x in enumerate(out47[0]): 658 if k>7: 659 out47[0][k] = out47[0][k].replace('a', 'a_47').replace('b', 'b_47').replace('c', 'c_47') 660 out48[0][k] = out48[0][k].replace('a', 'a_48').replace('b', 'b_48').replace('c', 'c_48') 661 out = transpose_table(transpose_table(out47) + transpose_table(out48)[7:]) 662 663 if save_to_file: 664 if not os.path.exists(dir): 665 os.makedirs(dir) 666 if filename is None: 667 filename = f'D47D48_sessions.csv' 668 with open(f'{dir}/{filename}', 'w') as fid: 669 fid.write(make_csv(out)) 670 if print_out: 671 print('\n'+pretty_table(out)) 672 if output == 'raw': 673 return out 674 elif output == 'pretty': 675 return pretty_table(out) 676 677 678def table_of_analyses( 679 data47 = None, 680 data48 = None, 681 dir = 'output', 682 filename = None, 683 save_to_file = True, 684 print_out = True, 685 output = None, 686 ): 687 ''' 688 Print out, save to disk and/or return a combined table of analyses 689 for a pair of `D47data` and `D48data` objects. 690 691 If the sessions in `data47` and those in `data48` do not consist of 692 the exact same sets of analyses, the table will have two columns 693 `Session_47` and `Session_48` instead of a single `Session` column. 
694 695 **Parameters** 696 697 + `data47`: `D47data` instance 698 + `data48`: `D48data` instance 699 + `dir`: the directory in which to save the table 700 + `filename`: the name to the csv file to write to 701 + `save_to_file`: whether to save the table to disk 702 + `print_out`: whether to print out the table 703 + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`); 704 if set to `'raw'`: return a list of list of strings 705 (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`) 706 ''' 707 if data47 is None: 708 if data48 is None: 709 raise TypeError("Arguments must include at least one D47data() or D48data() instance.") 710 else: 711 return data48.table_of_analyses( 712 dir = dir, 713 filename = filename, 714 save_to_file = save_to_file, 715 print_out = print_out, 716 output = output 717 ) 718 else: 719 if data48 is None: 720 return data47.table_of_analyses( 721 dir = dir, 722 filename = filename, 723 save_to_file = save_to_file, 724 print_out = print_out, 725 output = output 726 ) 727 else: 728 out47 = data47.table_of_analyses(save_to_file = False, print_out = False, output = 'raw') 729 out48 = data48.table_of_analyses(save_to_file = False, print_out = False, output = 'raw') 730 731 if [l[1] for l in out47[1:]] == [l[1] for l in out48[1:]]: # if sessions are identical 732 out = transpose_table(transpose_table(out47) + transpose_table(out48)[-1:]) 733 else: 734 out47[0][1] = 'Session_47' 735 out48[0][1] = 'Session_48' 736 out47 = transpose_table(out47) 737 out48 = transpose_table(out48) 738 out = transpose_table(out47[:2] + out48[1:2] + out47[2:] + out48[-1:]) 739 740 if save_to_file: 741 if not os.path.exists(dir): 742 os.makedirs(dir) 743 if filename is None: 744 filename = f'D47D48_sessions.csv' 745 with open(f'{dir}/{filename}', 'w') as fid: 746 fid.write(make_csv(out)) 747 if print_out: 748 print('\n'+pretty_table(out)) 749 if output == 'raw': 750 return out 751 elif output == 'pretty': 752 return pretty_table(out) 753 754 755def _fullcovar(minresult, epsilon = 0.01, named = False): 756 ''' 757 Construct full covariance matrix in the case of constrained parameters 758 ''' 759 760 import asteval 761 762 def f(values): 763 interp = asteval.Interpreter() 764 for n,v in zip(minresult.var_names, values): 765 interp(f'{n} = {v}') 766 for q in minresult.params: 767 if minresult.params[q].expr: 768 interp(f'{q} = {minresult.params[q].expr}') 769 return np.array([interp.symtable[q] for q in minresult.params]) 770 771 # construct Jacobian 772 J = np.zeros((minresult.nvarys, len(minresult.params))) 773 X = np.array([minresult.params[p].value for p in minresult.var_names]) 774 sX = np.array([minresult.params[p].stderr for p in minresult.var_names]) 775 776 for j in range(minresult.nvarys): 777 x1 = [_ for _ in X] 778 x1[j] += epsilon * sX[j] 779 x2 = [_ for _ in X] 780 x2[j] -= epsilon * sX[j] 781 J[j,:] = (f(x1) - f(x2)) / (2 * epsilon * sX[j]) 782 783 _names = [q for q in minresult.params] 784 _covar = J.T @ minresult.covar @ J 785 _se = np.diag(_covar)**.5 786 _correl = _covar.copy() 787 for k,s in enumerate(_se): 788 if s: 789 _correl[k,:] /= s 790 _correl[:,k] /= s 791 792 if named: 793 _covar = {i: {j:_covar[i,j] for j in minresult.params} for i in minresult.params} 794 _se = {i: _se[i] for i in minresult.params} 795 _correl = {i: {j:_correl[i,j] for j in minresult.params} for i in minresult.params} 796 797 return _names, _covar, _se, _correl 798 799 800class D4xdata(list): 801 ''' 802 Store and process data for a large set of Δ47 and/or Δ48 803 analyses, 
    analyses, usually comprising more than one analytical session.
    '''

    ### 17O CORRECTION PARAMETERS
    R13_VPDB = 0.01118  # (Chang & Li, 1990)
    '''
    Absolute (13C/12C) ratio of VPDB.
    By default equal to 0.01118 ([Chang & Li, 1990](http://www.cnki.com.cn/Article/CJFDTotal-JXTW199004006.htm))
    '''

    R18_VSMOW = 0.0020052  # (Baertschi, 1976)
    '''
    Absolute (18O/16O) ratio of VSMOW.
    By default equal to 0.0020052 ([Baertschi, 1976](https://doi.org/10.1016/0012-821X(76)90115-1))
    '''

    LAMBDA_17 = 0.528  # (Barkan & Luz, 2005)
    '''
    Mass-dependent exponent for triple oxygen isotopes.
    By default equal to 0.528 ([Barkan & Luz, 2005](https://doi.org/10.1002/rcm.2250))
    '''

    R17_VSMOW = 0.00038475  # (Assonov & Brenninkmeijer, 2003, rescaled to R13_VPDB)
    '''
    Absolute (17O/16O) ratio of VSMOW.
    By default equal to 0.00038475
    ([Assonov & Brenninkmeijer, 2003](https://dx.doi.org/10.1002/rcm.1011),
    rescaled to `R13_VPDB`)
    '''

    R18_VPDB = R18_VSMOW * 1.03092
    '''
    Absolute (18O/16O) ratio of VPDB.
    By definition equal to `R18_VSMOW * 1.03092`.
    '''

    R17_VPDB = R17_VSMOW * 1.03092 ** LAMBDA_17
    '''
    Absolute (17O/16O) ratio of VPDB.
    By definition equal to `R17_VSMOW * 1.03092 ** LAMBDA_17`.
    '''

    LEVENE_REF_SAMPLE = 'ETH-3'
    '''
    After the Δ4x standardization step, each sample is tested to
    assess whether the Δ4x variance within all analyses for that
    sample differs significantly from that observed for a given reference
    sample (using [Levene's test](https://en.wikipedia.org/wiki/Levene%27s_test),
    which yields a p-value corresponding to the null hypothesis that the
    underlying variances are equal).

    `LEVENE_REF_SAMPLE` (by default equal to `'ETH-3'`) specifies which
    sample should be used as a reference for this test.
    '''

    ALPHA_18O_ACID_REACTION = round(np.exp(3.59 / (90 + 273.15) - 1.79e-3), 6)  # (Kim et al., 2007, calcite)
    '''
    Specifies the 18O/16O fractionation factor generally applicable
    to acid reactions in the dataset. Currently used by `D4xdata.wg()`,
    `D4xdata.standardize_d13C`, and `D4xdata.standardize_d18O`.

    By default equal to 1.008129 (calcite reacted at 90 °C,
    [Kim et al., 2007](https://dx.doi.org/10.1016/j.chemgeo.2007.08.005)).
    '''

    Nominal_d13C_VPDB = {
        'ETH-1': 2.02,
        'ETH-2': -10.17,
        'ETH-3': 1.71,
        }  # (Bernasconi et al., 2018)
    '''
    Nominal δ13C_VPDB values assigned to carbonate standards, used by
    `D4xdata.standardize_d13C()`.

    By default equal to `{'ETH-1': 2.02, 'ETH-2': -10.17, 'ETH-3': 1.71}` after
    [Bernasconi et al. (2018)](https://doi.org/10.1029/2017GC007385).
    '''

    Nominal_d18O_VPDB = {
        'ETH-1': -2.19,
        'ETH-2': -18.69,
        'ETH-3': -1.78,
        }  # (Bernasconi et al., 2018)
    '''
    Nominal δ18O_VPDB values assigned to carbonate standards, used by
    `D4xdata.standardize_d18O()`.

    By default equal to `{'ETH-1': -2.19, 'ETH-2': -18.69, 'ETH-3': -1.78}` after
    [Bernasconi et al. (2018)](https://doi.org/10.1029/2017GC007385).
    '''

    d13C_STANDARDIZATION_METHOD = '2pt'
    '''
    Method by which to standardize δ13C values:

    + `'none'`: do not apply any δ13C standardization.
    + `'1pt'`: within each session, offset all initial δ13C values so as to
    minimize the difference between final δ13C_VPDB values and
    `Nominal_d13C_VPDB` (averaged over all analyses for which `Nominal_d13C_VPDB` is defined).
    + `'2pt'`: within each session, apply an affine transformation to all δ13C
    values so as to minimize the difference between final δ13C_VPDB
    values and `Nominal_d13C_VPDB` (averaged over all analyses for which `Nominal_d13C_VPDB`
    is defined).
    '''

    d18O_STANDARDIZATION_METHOD = '2pt'
    '''
    Method by which to standardize δ18O values:

    + `'none'`: do not apply any δ18O standardization.
    + `'1pt'`: within each session, offset all initial δ18O values so as to
    minimize the difference between final δ18O_VPDB values and
    `Nominal_d18O_VPDB` (averaged over all analyses for which `Nominal_d18O_VPDB` is defined).
    + `'2pt'`: within each session, apply an affine transformation to all δ18O
    values so as to minimize the difference between final δ18O_VPDB
    values and `Nominal_d18O_VPDB` (averaged over all analyses for which `Nominal_d18O_VPDB`
    is defined).
    '''

    def __init__(self, l = [], mass = '47', logfile = '', session = 'mySession', verbose = False):
        '''
        **Parameters**

        + `l`: a list of dictionaries, with each dictionary including at least the keys
        `Sample`, `d45`, `d46`, and `d47` or `d48`.
        + `mass`: `'47'` or `'48'`
        + `logfile`: if specified, write detailed logs to this file path when calling `D4xdata` methods.
        + `session`: define session name for analyses without a `Session` key
        + `verbose`: if `True`, print out detailed logs when calling `D4xdata` methods.

        Returns a `D4xdata` object derived from `list`.
        '''
        self._4x = mass
        self.verbose = verbose
        self.prefix = 'D4xdata'
        self.logfile = logfile
        list.__init__(self, l)
        self.Nf = None
        self.repeatability = {}
        self.refresh(session = session)


    def make_verbal(oldfun):
        '''
        Decorator: allow temporarily changing `self.prefix` and overriding `self.verbose`.
        '''
        @wraps(oldfun)
        def newfun(*args, verbose = '', **kwargs):
            myself = args[0]
            oldprefix = myself.prefix
            myself.prefix = oldfun.__name__
            if verbose != '':
                oldverbose = myself.verbose
                myself.verbose = verbose
            out = oldfun(*args, **kwargs)
            myself.prefix = oldprefix
            if verbose != '':
                myself.verbose = oldverbose
            return out
        return newfun


    def msg(self, txt):
        '''
        Log a message to `self.logfile`, and print it out if `verbose = True`
        '''
        self.log(txt)
        if self.verbose:
            print(f'{f"[{self.prefix}]":<16} {txt}')


    def vmsg(self, txt):
        '''
        Log a message to `self.logfile` and print it out
        '''
        self.log(txt)
        print(txt)


    def log(self, *txts):
        '''
        Log a message to `self.logfile`
        '''
        if self.logfile:
            with open(self.logfile, 'a') as fid:
                for txt in txts:
                    fid.write(f'\n{dt.now().strftime("%Y-%m-%d %H:%M:%S")} {f"[{self.prefix}]":<16} {txt}')


    def refresh(self, session = 'mySession'):
        '''
        Update `self.sessions`, `self.samples`, `self.anchors`, and `self.unknowns`.
        '''
        self.fill_in_missing_info(session = session)
        self.refresh_sessions()
        self.refresh_samples()


    def refresh_sessions(self):
        '''
        Update `self.sessions` and set `scrambling_drift`, `slope_drift`, and `wg_drift`
        to `False` for all sessions.
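
        To allow drift in one or more of the standardization parameters, set the
        corresponding flags manually before calling `D4xdata.standardize()`, e.g.
        (sketch, with a hypothetical session name):

        ```py
        mydata.sessions['Session01']['wg_drift'] = True
        ```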
        '''
        self.sessions = {
            s: {'data': [r for r in self if r['Session'] == s]}
            for s in sorted({r['Session'] for r in self})
            }
        for s in self.sessions:
            self.sessions[s]['scrambling_drift'] = False
            self.sessions[s]['slope_drift'] = False
            self.sessions[s]['wg_drift'] = False
            self.sessions[s]['d13C_standardization_method'] = self.d13C_STANDARDIZATION_METHOD
            self.sessions[s]['d18O_standardization_method'] = self.d18O_STANDARDIZATION_METHOD


    def refresh_samples(self):
        '''
        Define `self.samples`, `self.anchors`, and `self.unknowns`.
        '''
        self.samples = {
            s: {'data': [r for r in self if r['Sample'] == s]}
            for s in sorted({r['Sample'] for r in self})
            }
        self.anchors = {s: self.samples[s] for s in self.samples if s in self.Nominal_D4x}
        self.unknowns = {s: self.samples[s] for s in self.samples if s not in self.Nominal_D4x}


    def read(self, filename, sep = '', session = ''):
        '''
        Read a file in csv format to load data into a `D4xdata` object.

        In the csv file, spaces before and after field separators (`','` by default)
        are optional. Each line corresponds to a single analysis.

        The required fields are:

        + `UID`: a unique identifier
        + `Session`: an identifier for the analytical session
        + `Sample`: a sample identifier
        + `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values

        Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to
        VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Any of the
        working-gas deltas `d47`, `d48` and `d49` not provided are set to NaN by default.

        **Parameters**

        + `filename`: the path of the file to read
        + `sep`: csv separator delimiting the fields
        + `session`: set `Session` field to this string for all analyses
        '''
        with open(filename) as fid:
            self.input(fid.read(), sep = sep, session = session)


    def input(self, txt, sep = '', session = ''):
        '''
        Read `txt` string in csv format to load analysis data into a `D4xdata` object.

        In the csv string, spaces before and after field separators (`','` by default)
        are optional. Each line corresponds to a single analysis.

        The required fields are:

        + `UID`: a unique identifier
        + `Session`: an identifier for the analytical session
        + `Sample`: a sample identifier
        + `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values

        Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to
        VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Any of the
        working-gas deltas `d47`, `d48` and `d49` not provided are set to NaN by default.

        **Parameters**

        + `txt`: the csv string to read
        + `sep`: csv separator delimiting the fields. By default, use `,`, `;`, or `\t`,
        whichever appears most often in `txt`.
        + `session`: set `Session` field to this string for all analyses
        '''
        if sep == '':
            sep = sorted(',;\t', key = lambda x: - txt.count(x))[0]
        txt = [[x.strip() for x in l.split(sep)] for l in txt.splitlines() if l.strip()]
        data = [{k: v if k in ['UID', 'Session', 'Sample'] else smart_type(v) for k, v in zip(txt[0], l) if v != ''} for l in txt[1:]]

        if session != '':
            for r in data:
                r['Session'] = session

        self += data
        self.refresh()


    @make_verbal
    def wg(self, samples = None, a18_acid = None):
        '''
        Compute bulk composition of the working gas for each session based on
        the carbonate standards defined in both `self.Nominal_d13C_VPDB` and
        `self.Nominal_d18O_VPDB`.
        '''

        self.msg('Computing WG composition:')

        if a18_acid is None:
            a18_acid = self.ALPHA_18O_ACID_REACTION
        if samples is None:
            samples = [s for s in self.Nominal_d13C_VPDB if s in self.Nominal_d18O_VPDB]

        assert a18_acid, 'Acid fractionation factor should not be zero.'

        samples = [s for s in samples if s in self.Nominal_d13C_VPDB and s in self.Nominal_d18O_VPDB]
        R45R46_standards = {}
        for sample in samples:
            d13C_vpdb = self.Nominal_d13C_VPDB[sample]
            d18O_vpdb = self.Nominal_d18O_VPDB[sample]
            R13_s = self.R13_VPDB * (1 + d13C_vpdb / 1000)
            R17_s = self.R17_VPDB * ((1 + d18O_vpdb / 1000) * a18_acid) ** self.LAMBDA_17
            R18_s = self.R18_VPDB * (1 + d18O_vpdb / 1000) * a18_acid

            C12_s = 1 / (1 + R13_s)
            C13_s = R13_s / (1 + R13_s)
            C16_s = 1 / (1 + R17_s + R18_s)
            C17_s = R17_s / (1 + R17_s + R18_s)
            C18_s = R18_s / (1 + R17_s + R18_s)

            C626_s = C12_s * C16_s ** 2
            C627_s = 2 * C12_s * C16_s * C17_s
            C628_s = 2 * C12_s * C16_s * C18_s
            C636_s = C13_s * C16_s ** 2
            C637_s = 2 * C13_s * C16_s * C17_s
            C727_s = C12_s * C17_s ** 2

            R45_s = (C627_s + C636_s) / C626_s
            R46_s = (C628_s + C637_s + C727_s) / C626_s
            R45R46_standards[sample] = (R45_s, R46_s)

        for s in self.sessions:
            db = [r for r in self.sessions[s]['data'] if r['Sample'] in samples]
            assert db, f'No sample from {samples} found in session "{s}".'
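            # Infer the WG composition from these anchor analyses: since, by definition,
            # d45 = 1000 * (R45_analyte / R45_wg - 1), the known R45 of each standard
            # plotted against its measured d45 defines a line whose value at d45 = 0
            # equals R45_wg. The code below takes the intercept of a linear fit when
            # d45 = 0 is reasonably bracketed by the data, and otherwise averages
            # R45 / (1 + d45/1000) over all anchor analyses. Same logic for R46/d46.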
#            dbsamples = sorted({r['Sample'] for r in db})

            X = [r['d45'] for r in db]
            Y = [R45R46_standards[r['Sample']][0] for r in db]
            x1, x2 = np.min(X), np.max(X)

            if x1 < x2:
                wgcoord = x1/(x1-x2)
            else:
                wgcoord = 999

            if wgcoord < -.5 or wgcoord > 1.5:
                # unreasonable to extrapolate to d45 = 0
                R45_wg = np.mean([y/(1+x/1000) for x, y in zip(X, Y)])
            else:
                # d45 = 0 is reasonably well bracketed
                R45_wg = np.polyfit(X, Y, 1)[1]

            X = [r['d46'] for r in db]
            Y = [R45R46_standards[r['Sample']][1] for r in db]
            x1, x2 = np.min(X), np.max(X)

            if x1 < x2:
                wgcoord = x1/(x1-x2)
            else:
                wgcoord = 999

            if wgcoord < -.5 or wgcoord > 1.5:
                # unreasonable to extrapolate to d46 = 0
                R46_wg = np.mean([y/(1+x/1000) for x, y in zip(X, Y)])
            else:
                # d46 = 0 is reasonably well bracketed
                R46_wg = np.polyfit(X, Y, 1)[1]

            d13Cwg_VPDB, d18Owg_VSMOW = self.compute_bulk_delta(R45_wg, R46_wg)

            self.msg(f'Session {s} WG: δ13C_VPDB = {d13Cwg_VPDB:.3f} δ18O_VSMOW = {d18Owg_VSMOW:.3f}')

            self.sessions[s]['d13Cwg_VPDB'] = d13Cwg_VPDB
            self.sessions[s]['d18Owg_VSMOW'] = d18Owg_VSMOW
            for r in self.sessions[s]['data']:
                r['d13Cwg_VPDB'] = d13Cwg_VPDB
                r['d18Owg_VSMOW'] = d18Owg_VSMOW


    def compute_bulk_delta(self, R45, R46, D17O = 0):
        '''
        Compute δ13C_VPDB and δ18O_VSMOW,
        by solving the generalized form of equation (17) from
        [Brand et al. (2010)](https://doi.org/10.1351/PAC-REP-09-01-05),
        assuming that δ18O_VSMOW is not too big (0 ± 50 ‰) and
        solving the corresponding second-order Taylor polynomial.
        (Appendix A of [Daëron et al., 2016](https://doi.org/10.1016/j.chemgeo.2016.08.014))
        '''

        K = np.exp(D17O / 1000) * self.R17_VSMOW * self.R18_VSMOW ** -self.LAMBDA_17

        A = -3 * K ** 2 * self.R18_VSMOW ** (2 * self.LAMBDA_17)
        B = 2 * K * R45 * self.R18_VSMOW ** self.LAMBDA_17
        C = 2 * self.R18_VSMOW
        D = -R46

        aa = A * self.LAMBDA_17 * (2 * self.LAMBDA_17 - 1) + B * self.LAMBDA_17 * (self.LAMBDA_17 - 1) / 2
        bb = 2 * A * self.LAMBDA_17 + B * self.LAMBDA_17 + C
        cc = A + B + C + D

        d18O_VSMOW = 1000 * (-bb + (bb ** 2 - 4 * aa * cc) ** .5) / (2 * aa)

        R18 = (1 + d18O_VSMOW / 1000) * self.R18_VSMOW
        R17 = K * R18 ** self.LAMBDA_17
        R13 = R45 - 2 * R17

        d13C_VPDB = 1000 * (R13 / self.R13_VPDB - 1)

        return d13C_VPDB, d18O_VSMOW


    @make_verbal
    def crunch(self, verbose = ''):
        '''
        Compute bulk composition and raw clumped isotope anomalies for all analyses.
        '''
        for r in self:
            self.compute_bulk_and_clumping_deltas(r)
        self.standardize_d13C()
        self.standardize_d18O()
        self.msg(f"Crunched {len(self)} analyses.")


    def fill_in_missing_info(self, session = 'mySession'):
        '''
        Fill in optional fields with default values
        '''
        for i, r in enumerate(self):
            if 'D17O' not in r:
                r['D17O'] = 0.
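            # default UID is the analysis' 1-based position in the dataset;
            # default Session is the `session` argument ('mySession' unless specified)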
            if 'UID' not in r:
                r['UID'] = f'{i+1}'
            if 'Session' not in r:
                r['Session'] = session
            for k in ['d47', 'd48', 'd49']:
                if k not in r:
                    r[k] = np.nan


    def standardize_d13C(self):
        '''
        Perform δ13C standardization within each session `s` according to
        `self.sessions[s]['d13C_standardization_method']`, which is defined by default
        by `D47data.refresh_sessions()` as equal to `self.d13C_STANDARDIZATION_METHOD`, but
        may be redefined arbitrarily at a later stage.
        '''
        for s in self.sessions:
            if self.sessions[s]['d13C_standardization_method'] in ['1pt', '2pt']:
                XY = [(r['d13C_VPDB'], self.Nominal_d13C_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d13C_VPDB]
                X, Y = zip(*XY)
                if self.sessions[s]['d13C_standardization_method'] == '1pt':
                    offset = np.mean(Y) - np.mean(X)
                    for r in self.sessions[s]['data']:
                        r['d13C_VPDB'] += offset
                elif self.sessions[s]['d13C_standardization_method'] == '2pt':
                    a, b = np.polyfit(X, Y, 1)
                    for r in self.sessions[s]['data']:
                        r['d13C_VPDB'] = a * r['d13C_VPDB'] + b

    def standardize_d18O(self):
        '''
        Perform δ18O standardization within each session `s` according to
        `self.ALPHA_18O_ACID_REACTION` and `self.sessions[s]['d18O_standardization_method']`,
        which is defined by default by `D47data.refresh_sessions()` as equal to
        `self.d18O_STANDARDIZATION_METHOD`, but may be redefined arbitrarily at a later stage.
        '''
        for s in self.sessions:
            if self.sessions[s]['d18O_standardization_method'] in ['1pt', '2pt']:
                XY = [(r['d18O_VSMOW'], self.Nominal_d18O_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d18O_VPDB]
                X, Y = zip(*XY)
                Y = [(1000+y) * self.R18_VPDB * self.ALPHA_18O_ACID_REACTION / self.R18_VSMOW - 1000 for y in Y]
                if self.sessions[s]['d18O_standardization_method'] == '1pt':
                    offset = np.mean(Y) - np.mean(X)
                    for r in self.sessions[s]['data']:
                        r['d18O_VSMOW'] += offset
                elif self.sessions[s]['d18O_standardization_method'] == '2pt':
                    a, b = np.polyfit(X, Y, 1)
                    for r in self.sessions[s]['data']:
                        r['d18O_VSMOW'] = a * r['d18O_VSMOW'] + b


    def compute_bulk_and_clumping_deltas(self, r):
        '''
        Compute δ13C_VPDB, δ18O_VSMOW, and raw Δ47, Δ48, Δ49 values for a single analysis `r`.
        '''

        # Compute working gas R13, R18, and isobar ratios
        R13_wg = self.R13_VPDB * (1 + r['d13Cwg_VPDB'] / 1000)
        R18_wg = self.R18_VSMOW * (1 + r['d18Owg_VSMOW'] / 1000)
        R45_wg, R46_wg, R47_wg, R48_wg, R49_wg = self.compute_isobar_ratios(R13_wg, R18_wg)

        # Compute analyte isobar ratios
        R45 = (1 + r['d45'] / 1000) * R45_wg
        R46 = (1 + r['d46'] / 1000) * R46_wg
        R47 = (1 + r['d47'] / 1000) * R47_wg
        R48 = (1 + r['d48'] / 1000) * R48_wg
        R49 = (1 + r['d49'] / 1000) * R49_wg

        r['d13C_VPDB'], r['d18O_VSMOW'] = self.compute_bulk_delta(R45, R46, D17O = r['D17O'])
        R13 = (1 + r['d13C_VPDB'] / 1000) * self.R13_VPDB
        R18 = (1 + r['d18O_VSMOW'] / 1000) * self.R18_VSMOW

        # Compute stochastic isobar ratios of the analyte
        R45stoch, R46stoch, R47stoch, R48stoch, R49stoch = self.compute_isobar_ratios(
            R13, R18, D17O = r['D17O']
            )

        # Check that R45/R45stoch and R46/R46stoch are indistinguishable from 1,
        # and raise a warning if the corresponding anomalies exceed 0.05 ppm.
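        # (Because d13C_VPDB and d18O_VSMOW were themselves computed from R45 and R46
        # just above, the reconstructed stochastic ratios should reproduce the measured
        # ones to within numerical precision; larger deviations would point to an
        # internal inconsistency in the 17O-correction parameters.)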
        if (R45 / R45stoch - 1) > 5e-8:
            self.vmsg(f'This is unexpected: R45/R45stoch - 1 = {1e6 * (R45 / R45stoch - 1):.3f} ppm')
        if (R46 / R46stoch - 1) > 5e-8:
            self.vmsg(f'This is unexpected: R46/R46stoch - 1 = {1e6 * (R46 / R46stoch - 1):.3f} ppm')

        # Compute raw clumped isotope anomalies
        r['D47raw'] = 1000 * (R47 / R47stoch - 1)
        r['D48raw'] = 1000 * (R48 / R48stoch - 1)
        r['D49raw'] = 1000 * (R49 / R49stoch - 1)


    def compute_isobar_ratios(self, R13, R18, D17O = 0, D47 = 0, D48 = 0, D49 = 0):
        '''
        Compute isobar ratios for a sample with isotopic ratios `R13` and `R18`,
        optionally accounting for non-zero values of Δ17O (`D17O`) and clumped isotope
        anomalies (`D47`, `D48`, `D49`), all expressed in permil.
        '''

        # Compute R17
        R17 = self.R17_VSMOW * np.exp(D17O / 1000) * (R18 / self.R18_VSMOW) ** self.LAMBDA_17

        # Compute isotope concentrations
        C12 = (1 + R13) ** -1
        C13 = C12 * R13
        C16 = (1 + R17 + R18) ** -1
        C17 = C16 * R17
        C18 = C16 * R18

        # Compute stochastic isotopologue concentrations
        C626 = C16 * C12 * C16
        C627 = C16 * C12 * C17 * 2
        C628 = C16 * C12 * C18 * 2
        C636 = C16 * C13 * C16
        C637 = C16 * C13 * C17 * 2
        C638 = C16 * C13 * C18 * 2
        C727 = C17 * C12 * C17
        C728 = C17 * C12 * C18 * 2
        C737 = C17 * C13 * C17
        C738 = C17 * C13 * C18 * 2
        C828 = C18 * C12 * C18
        C838 = C18 * C13 * C18

        # Compute stochastic isobar ratios
        R45 = (C636 + C627) / C626
        R46 = (C628 + C637 + C727) / C626
        R47 = (C638 + C728 + C737) / C626
        R48 = (C738 + C828) / C626
        R49 = C838 / C626

        # Account for clumped isotope anomalies relative to the stochastic ratios
        R47 *= 1 + D47 / 1000
        R48 *= 1 + D48 / 1000
        R49 *= 1 + D49 / 1000

        # Return isobar ratios
        return R45, R46, R47, R48, R49


    def split_samples(self, samples_to_split = 'all', grouping = 'by_session'):
        '''
        Split unknown samples by UID (treat all analyses as different samples)
        or by session (treat analyses of a given sample in different sessions as
        different samples).

        **Parameters**

        + `samples_to_split`: a list of samples to split, e.g., `['IAEA-C1', 'IAEA-C2']`
        + `grouping`: `by_uid` | `by_session`
        '''
        if samples_to_split == 'all':
            samples_to_split = [s for s in self.unknowns]
        gkeys = {'by_uid': 'UID', 'by_session': 'Session'}
        self.grouping = grouping.lower()
        if self.grouping in gkeys:
            gkey = gkeys[self.grouping]
            for r in self:
                if r['Sample'] in samples_to_split:
                    r['Sample_original'] = r['Sample']
                    r['Sample'] = f"{r['Sample']}__{r[gkey]}"
                elif r['Sample'] in self.unknowns:
                    r['Sample_original'] = r['Sample']
        self.refresh_samples()


    def unsplit_samples(self, tables = False):
        '''
        Reverse the effects of `D47data.split_samples()`.

        This should only be used after `D4xdata.standardize()` with `method='pooled'`.

        After `D4xdata.standardize()` with `method='indep_sessions'`, one should
        probably use `D4xdata.combine_samples()` instead to reverse the effects of
        `D47data.split_samples()` with `grouping='by_uid'`, or `w_avg()` to reverse the
        effects of `D47data.split_samples()` with `grouping='by_session'` (because in
        that case session-averaged Δ4x values are statistically independent).
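
        A minimal sketch of the intended workflow, assuming `mydata` is a `D47data`
        instance whose analyses have already been read and crunched:

        ```py
        mydata.split_samples(grouping = 'by_session')
        mydata.standardize(method = 'pooled')
        mydata.unsplit_samples()
        ```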
        '''
        unknowns_old = sorted({s for s in self.unknowns})
        CM_old = self.standardization.covar[:,:]
        VD_old = self.standardization.params.valuesdict().copy()
        vars_old = self.standardization.var_names

        unknowns_new = sorted({r['Sample_original'] for r in self if 'Sample_original' in r})

        Ns = len(vars_old) - len(unknowns_old)
        vars_new = vars_old[:Ns] + [f'D{self._4x}_{pf(u)}' for u in unknowns_new]
        VD_new = {k: VD_old[k] for k in vars_old[:Ns]}

        W = np.zeros((len(vars_new), len(vars_old)))
        W[:Ns,:Ns] = np.eye(Ns)
        for u in unknowns_new:
            splits = sorted({r['Sample'] for r in self if 'Sample_original' in r and r['Sample_original'] == u})
            if self.grouping == 'by_session':
                weights = [self.samples[s][f'SE_D{self._4x}']**-2 for s in splits]
            elif self.grouping == 'by_uid':
                weights = [1 for s in splits]
            sw = sum(weights)
            weights = [w/sw for w in weights]
            W[vars_new.index(f'D{self._4x}_{pf(u)}'), [vars_old.index(f'D{self._4x}_{pf(s)}') for s in splits]] = weights[:]

        CM_new = W @ CM_old @ W.T
        V = W @ np.array([[VD_old[k]] for k in vars_old])
        VD_new = {k: v[0] for k, v in zip(vars_new, V)}

        self.standardization.covar = CM_new
        self.standardization.params.valuesdict = lambda: VD_new
        self.standardization.var_names = vars_new

        for r in self:
            if r['Sample'] in self.unknowns:
                r['Sample_split'] = r['Sample']
                r['Sample'] = r['Sample_original']

        self.refresh_samples()
        self.consolidate_samples()
        self.repeatabilities()

        if tables:
            self.table_of_analyses()
            self.table_of_samples()

    def assign_timestamps(self):
        '''
        Assign a time field `t` of type `float` to each analysis.

        If `TimeTag` is one of the data fields, `t` is equal within a given session
        to `TimeTag` minus the mean value of `TimeTag` for that session.
        Otherwise, `TimeTag` is by default equal to the index of each analysis
        in the dataset and `t` is defined as above.
        '''
        for session in self.sessions:
            sdata = self.sessions[session]['data']
            try:
                t0 = np.mean([r['TimeTag'] for r in sdata])
                for r in sdata:
                    r['t'] = r['TimeTag'] - t0
            except KeyError:
                t0 = (len(sdata)-1)/2
                for t, r in enumerate(sdata):
                    r['t'] = t - t0


    def report(self):
        '''
        Print a report on the standardization fit.
        Only applicable after `D4xdata.standardize(method='pooled')`.
        '''
        report_fit(self.standardization)


    def combine_samples(self, sample_groups):
        '''
        Combine analyses of different samples to compute weighted average Δ4x
        and new error (co)variances corresponding to the groups defined by the `sample_groups`
        dictionary.

        Caution: samples are weighted by number of replicate analyses, which is a
        reasonable default behavior but is not always optimal (e.g., in the case of strongly
        correlated analytical errors for one or more samples).

        Returns a tuple of:

        + the list of group names
        + an array of the corresponding Δ4x values
        + the corresponding (co)variance matrix

        **Parameters**

        + `sample_groups`: a dictionary of the form:
        ```py
        {'group1': ['sample_1', 'sample_2'],
         'group2': ['sample_3', 'sample_4', 'sample_5']}
        ```
        '''

        samples = [s for k in sorted(sample_groups.keys()) for s in sorted(sample_groups[k])]
        groups = sorted(sample_groups.keys())
        group_total_weights = {k: sum([self.samples[s]['N'] for s in sample_groups[k]]) for k in groups}
        D4x_old = np.array([[self.samples[x][f'D{self._4x}']] for x in samples])
        CM_old = np.array([[self.sample_D4x_covar(x, y) for x in samples] for y in samples])
        W = np.array([
            [self.samples[i]['N']/group_total_weights[j] if i in sample_groups[j] else 0 for i in samples]
            for j in groups])
        D4x_new = W @ D4x_old
        CM_new = W @ CM_old @ W.T

        return groups, D4x_new[:,0], CM_new


    @make_verbal
    def standardize(self,
        method = 'pooled',
        weighted_sessions = [],
        consolidate = True,
        consolidate_tables = False,
        consolidate_plots = False,
        constraints = {},
        ):
        '''
        Compute absolute Δ4x values for all replicate analyses and for sample averages.
        If the `method` argument is set to `'pooled'`, the standardization processes all sessions
        in a single step, assuming that all samples (anchors and unknowns alike) are homogeneous,
        i.e. that their true Δ4x value does not change between sessions
        ([Daëron, 2021](https://doi.org/10.1029/2020GC009592)). If the `method` argument is set to
        `'indep_sessions'`, the standardization processes each session independently, based only
        on anchor analyses.
        '''

        self.standardization_method = method
        self.assign_timestamps()

        if method == 'pooled':
            if weighted_sessions:
                for session_group in weighted_sessions:
                    if self._4x == '47':
                        X = D47data([r for r in self if r['Session'] in session_group])
                    elif self._4x == '48':
                        X = D48data([r for r in self if r['Session'] in session_group])
                    X.Nominal_D4x = self.Nominal_D4x.copy()
                    X.refresh()
                    result = X.standardize(method = 'pooled', weighted_sessions = [], consolidate = False)
                    w = np.sqrt(result.redchi)
                    self.msg(f'Session group {session_group} RMSWD = {w:.4f}')
                    for r in X:
                        r[f'wD{self._4x}raw'] *= w
            else:
                self.msg(f'All D{self._4x}raw weights set to 1 ‰')
                for r in self:
                    r[f'wD{self._4x}raw'] = 1.

            params = Parameters()
            for k, session in enumerate(self.sessions):
                self.msg(f"Session {session}: scrambling_drift is {self.sessions[session]['scrambling_drift']}.")
                self.msg(f"Session {session}: slope_drift is {self.sessions[session]['slope_drift']}.")
                self.msg(f"Session {session}: wg_drift is {self.sessions[session]['wg_drift']}.")
                s = pf(session)
                params.add(f'a_{s}', value = 0.9)
                params.add(f'b_{s}', value = 0.)
                params.add(f'c_{s}', value = -0.9)
                params.add(f'a2_{s}', value = 0.,
#                    vary = self.sessions[session]['scrambling_drift'],
                    )
                params.add(f'b2_{s}', value = 0.,
#                    vary = self.sessions[session]['slope_drift'],
                    )
                params.add(f'c2_{s}', value = 0.,
#                    vary = self.sessions[session]['wg_drift'],
                    )
                if not self.sessions[session]['scrambling_drift']:
                    params[f'a2_{s}'].expr = '0'
                if not self.sessions[session]['slope_drift']:
                    params[f'b2_{s}'].expr = '0'
                if not self.sessions[session]['wg_drift']:
                    params[f'c2_{s}'].expr = '0'

            for sample in self.unknowns:
                params.add(f'D{self._4x}_{pf(sample)}', value = 0.5)

            for k in constraints:
                params[k].expr = constraints[k]

            def residuals(p):
                R = []
                for r in self:
                    session = pf(r['Session'])
                    sample = pf(r['Sample'])
                    if r['Sample'] in self.Nominal_D4x:
                        R += [ (
                            r[f'D{self._4x}raw'] - (
                                p[f'a_{session}'] * self.Nominal_D4x[r['Sample']]
                                + p[f'b_{session}'] * r[f'd{self._4x}']
                                + p[f'c_{session}']
                                + r['t'] * (
                                    p[f'a2_{session}'] * self.Nominal_D4x[r['Sample']]
                                    + p[f'b2_{session}'] * r[f'd{self._4x}']
                                    + p[f'c2_{session}']
                                    )
                                )
                            ) / r[f'wD{self._4x}raw'] ]
                    else:
                        R += [ (
                            r[f'D{self._4x}raw'] - (
                                p[f'a_{session}'] * p[f'D{self._4x}_{sample}']
                                + p[f'b_{session}'] * r[f'd{self._4x}']
                                + p[f'c_{session}']
                                + r['t'] * (
                                    p[f'a2_{session}'] * p[f'D{self._4x}_{sample}']
                                    + p[f'b2_{session}'] * r[f'd{self._4x}']
                                    + p[f'c2_{session}']
                                    )
                                )
                            ) / r[f'wD{self._4x}raw'] ]
                return R

            M = Minimizer(residuals, params)
            result = M.least_squares()
            self.Nf = result.nfree
            self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)
            new_names, new_covar, new_se = _fullcovar(result)[:3]
            result.var_names = new_names
            result.covar = new_covar

            for r in self:
                s = pf(r["Session"])
                a = result.params.valuesdict()[f'a_{s}']
                b = result.params.valuesdict()[f'b_{s}']
                c = result.params.valuesdict()[f'c_{s}']
                a2 = result.params.valuesdict()[f'a2_{s}']
                b2 = result.params.valuesdict()[f'b2_{s}']
                c2 = result.params.valuesdict()[f'c2_{s}']
                r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])

            self.standardization = result

            for session in self.sessions:
                self.sessions[session]['Np'] = 3
                for k in ['scrambling', 'slope', 'wg']:
                    if self.sessions[session][f'{k}_drift']:
                        self.sessions[session]['Np'] += 1

            if consolidate:
                self.consolidate(tables = consolidate_tables, plots = consolidate_plots)
            return result


        elif method == 'indep_sessions':

            if weighted_sessions:
                for session_group in weighted_sessions:
                    X = D4xdata([r for r in self if r['Session'] in session_group], mass = self._4x)
                    X.Nominal_D4x = self.Nominal_D4x.copy()
                    X.refresh()
                    # This is only done to assign r['wD47raw'] to each r in X:
                    X.standardize(method = method, weighted_sessions = [], consolidate = False)
                    self.msg(f'D{self._4x}raw weights set to {1000*X[0][f"wD{self._4x}raw"]:.1f} ppm for sessions in {session_group}')
            else:
                self.msg('All weights set to 1 ‰')
                for r in self:
                    r[f'wD{self._4x}raw'] = 1

            for session in self.sessions:
                s = self.sessions[session]
                p_names = ['a', 'b', 'c', 'a2', 'b2', 'c2']
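                # Each session is standardized independently here: the model
                #     D4xraw = a*D4x + b*d4x + c + t*(a2*D4x + b2*d4x + c2)
                # is fit by weighted least squares to the anchor analyses of this
                # session only, with the drift terms a2/b2/c2 excluded from the fit
                # (and set to zero) unless the corresponding *_drift flags are True.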
                p_active = [True, True, True, s['scrambling_drift'], s['slope_drift'], s['wg_drift']]
                s['Np'] = sum(p_active)
                sdata = s['data']

                A = np.array([
                    [
                        self.Nominal_D4x[r['Sample']] / r[f'wD{self._4x}raw'],
                        r[f'd{self._4x}'] / r[f'wD{self._4x}raw'],
                        1 / r[f'wD{self._4x}raw'],
                        self.Nominal_D4x[r['Sample']] * r['t'] / r[f'wD{self._4x}raw'],
                        r[f'd{self._4x}'] * r['t'] / r[f'wD{self._4x}raw'],
                        r['t'] / r[f'wD{self._4x}raw']
                        ]
                    for r in sdata if r['Sample'] in self.anchors
                    ])[:,p_active] # only keep columns for the active parameters
                Y = np.array([[r[f'D{self._4x}raw'] / r[f'wD{self._4x}raw']] for r in sdata if r['Sample'] in self.anchors])
                s['Na'] = Y.size
                CM = linalg.inv(A.T @ A)
                bf = (CM @ A.T @ Y).T[0,:]
                k = 0
                for n, a in zip(p_names, p_active):
                    if a:
                        s[n] = bf[k]
#                        self.msg(f'{n} = {bf[k]}')
                        k += 1
                    else:
                        s[n] = 0.
#                        self.msg(f'{n} = 0.0')

                for r in sdata:
                    a, b, c, a2, b2, c2 = s['a'], s['b'], s['c'], s['a2'], s['b2'], s['c2']
                    r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])
                    r[f'wD{self._4x}'] = r[f'wD{self._4x}raw'] / (a + a2 * r['t'])

                s['CM'] = np.zeros((6,6))
                i = 0
                k_active = [j for j, a in enumerate(p_active) if a]
                for j, a in enumerate(p_active):
                    if a:
                        s['CM'][j,k_active] = CM[i,:]
                        i += 1

            if not weighted_sessions:
                w = self.rmswd()['rmswd']
                for r in self:
                    r[f'wD{self._4x}'] *= w
                    r[f'wD{self._4x}raw'] *= w
                for session in self.sessions:
                    self.sessions[session]['CM'] *= w**2

            for session in self.sessions:
                s = self.sessions[session]
                s['SE_a'] = s['CM'][0,0]**.5
                s['SE_b'] = s['CM'][1,1]**.5
                s['SE_c'] = s['CM'][2,2]**.5
                s['SE_a2'] = s['CM'][3,3]**.5
                s['SE_b2'] = s['CM'][4,4]**.5
                s['SE_c2'] = s['CM'][5,5]**.5

            if not weighted_sessions:
                self.Nf = len(self) - len(self.unknowns) - np.sum([self.sessions[s]['Np'] for s in self.sessions])
            else:
                self.Nf = 0
                for sg in weighted_sessions:
                    self.Nf += self.rmswd(sessions = sg)['Nf']

            self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)

            avgD4x = {
                sample: np.mean([r[f'D{self._4x}'] for r in self if r['Sample'] == sample])
                for sample in self.samples
                }
            chi2 = np.sum([(r[f'D{self._4x}'] - avgD4x[r['Sample']])**2 for r in self])
            rD4x = (chi2/self.Nf)**.5
            self.repeatability[f'sigma_{self._4x}'] = rD4x

            if consolidate:
                self.consolidate(tables = consolidate_tables, plots = consolidate_plots)


    def standardization_error(self, session, d4x, D4x, t = 0):
        '''
        Compute standardization error for a given session and
        (δ4x, Δ4x) composition.
        '''
        a = self.sessions[session]['a']
        b = self.sessions[session]['b']
        c = self.sessions[session]['c']
        a2 = self.sessions[session]['a2']
        b2 = self.sessions[session]['b2']
        c2 = self.sessions[session]['c2']
        CM = self.sessions[session]['CM']

        x, y = D4x, d4x
        z = a * x + b * y + c + a2 * x * t + b2 * y * t + c2 * t
#        x = (z - b*y - b2*y*t - c - c2*t) / (a+a2*t)
        dxdy = -(b+b2*t) / (a+a2*t)
        dxdz = 1. / (a+a2*t)
        dxda = -x / (a+a2*t)
        dxdb = -y / (a+a2*t)
        dxdc = -1. / (a+a2*t)
        dxda2 = -x * t / (a+a2*t)
        dxdb2 = -y * t / (a+a2*t)
        dxdc2 = -t / (a+a2*t)
        V = np.array([dxda, dxdb, dxdc, dxda2, dxdb2, dxdc2])
        sx = (V @ CM @ V.T) ** .5
        return sx


    @make_verbal
    def summary(self,
        dir = 'output',
        filename = None,
        save_to_file = True,
        print_out = True,
        ):
        '''
        Print out and/or save to disk a summary of the standardization results.

        **Parameters**

        + `dir`: the directory in which to save the table
        + `filename`: the name of the csv file to write to
        + `save_to_file`: whether to save the table to disk
        + `print_out`: whether to print out the table
        '''

        out = []
        out += [['N samples (anchors + unknowns)', f"{len(self.samples)} ({len(self.anchors)} + {len(self.unknowns)})"]]
        out += [['N analyses (anchors + unknowns)', f"{len(self)} ({len([r for r in self if r['Sample'] in self.anchors])} + {len([r for r in self if r['Sample'] in self.unknowns])})"]]
        out += [['Repeatability of δ13C_VPDB', f"{1000 * self.repeatability['r_d13C_VPDB']:.1f} ppm"]]
        out += [['Repeatability of δ18O_VSMOW', f"{1000 * self.repeatability['r_d18O_VSMOW']:.1f} ppm"]]
        out += [[f'Repeatability of Δ{self._4x} (anchors)', f"{1000 * self.repeatability[f'r_D{self._4x}a']:.1f} ppm"]]
        out += [[f'Repeatability of Δ{self._4x} (unknowns)', f"{1000 * self.repeatability[f'r_D{self._4x}u']:.1f} ppm"]]
        out += [[f'Repeatability of Δ{self._4x} (all)', f"{1000 * self.repeatability[f'r_D{self._4x}']:.1f} ppm"]]
        out += [['Model degrees of freedom', f"{self.Nf}"]]
        out += [['Student\'s 95% t-factor', f"{self.t95:.2f}"]]
        out += [['Standardization method', self.standardization_method]]

        if save_to_file:
            if not os.path.exists(dir):
                os.makedirs(dir)
            if filename is None:
                filename = f'D{self._4x}_summary.csv'
            with open(f'{dir}/{filename}', 'w') as fid:
                fid.write(make_csv(out))
        if print_out:
            self.msg('\n' + pretty_table(out, header = 0))


    @make_verbal
    def table_of_sessions(self,
        dir = 'output',
        filename = None,
        save_to_file = True,
        print_out = True,
        output = None,
        ):
        '''
        Print out and/or save to disk a table of sessions.

        **Parameters**

        + `dir`: the directory in which to save the table
        + `filename`: the name of the csv file to write to
        + `save_to_file`: whether to save the table to disk
        + `print_out`: whether to print out the table
        + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
            if set to `'raw'`: return a list of list of strings
            (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
        '''
        include_a2 = any([self.sessions[session]['scrambling_drift'] for session in self.sessions])
        include_b2 = any([self.sessions[session]['slope_drift'] for session in self.sessions])
        include_c2 = any([self.sessions[session]['wg_drift'] for session in self.sessions])

        out = [['Session','Na','Nu','d13Cwg_VPDB','d18Owg_VSMOW','r_d13C','r_d18O',f'r_D{self._4x}','a ± SE','1e3 x b ± SE','c ± SE']]
        if include_a2:
            out[-1] += ['a2 ± SE']
        if include_b2:
            out[-1] += ['b2 ± SE']
        if include_c2:
            out[-1] += ['c2 ± SE']
        for session in self.sessions:
            out += [[
                session,
                f"{self.sessions[session]['Na']}",
                f"{self.sessions[session]['Nu']}",
                f"{self.sessions[session]['d13Cwg_VPDB']:.3f}",
                f"{self.sessions[session]['d18Owg_VSMOW']:.3f}",
                f"{self.sessions[session]['r_d13C_VPDB']:.4f}",
                f"{self.sessions[session]['r_d18O_VSMOW']:.4f}",
                f"{self.sessions[session][f'r_D{self._4x}']:.4f}",
                f"{self.sessions[session]['a']:.3f} ± {self.sessions[session]['SE_a']:.3f}",
                f"{1e3*self.sessions[session]['b']:.3f} ± {1e3*self.sessions[session]['SE_b']:.3f}",
                f"{self.sessions[session]['c']:.3f} ± {self.sessions[session]['SE_c']:.3f}",
                ]]
            if include_a2:
                if self.sessions[session]['scrambling_drift']:
                    out[-1] += [f"{self.sessions[session]['a2']:.1e} ± {self.sessions[session]['SE_a2']:.1e}"]
                else:
                    out[-1] += ['']
            if include_b2:
                if self.sessions[session]['slope_drift']:
                    out[-1] += [f"{self.sessions[session]['b2']:.1e} ± {self.sessions[session]['SE_b2']:.1e}"]
                else:
                    out[-1] += ['']
            if include_c2:
                if self.sessions[session]['wg_drift']:
                    out[-1] += [f"{self.sessions[session]['c2']:.1e} ± {self.sessions[session]['SE_c2']:.1e}"]
                else:
                    out[-1] += ['']

        if save_to_file:
            if not os.path.exists(dir):
                os.makedirs(dir)
            if filename is None:
                filename = f'D{self._4x}_sessions.csv'
            with open(f'{dir}/{filename}', 'w') as fid:
                fid.write(make_csv(out))
        if print_out:
            self.msg('\n' + pretty_table(out))
        if output == 'raw':
            return out
        elif output == 'pretty':
            return pretty_table(out)


    @make_verbal
    def table_of_analyses(
        self,
        dir = 'output',
        filename = None,
        save_to_file = True,
        print_out = True,
        output = None,
        ):
        '''
        Print out and/or save to disk a table of analyses.

        **Parameters**

        + `dir`: the directory in which to save the table
        + `filename`: the name of the csv file to write to
        + `save_to_file`: whether to save the table to disk
        + `print_out`: whether to print out the table
        + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
            if set to `'raw'`: return a list of list of strings
            (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
        '''

        out = [['UID','Session','Sample']]
        extra_fields = [f for f in [('SampleMass','.2f'),('ColdFingerPressure','.1f'),('AcidReactionYield','.3f')] if f[0] in {k for r in self for k in r}]
        for f in extra_fields:
            out[-1] += [f[0]]
        out[-1] += ['d13Cwg_VPDB','d18Owg_VSMOW','d45','d46','d47','d48','d49','d13C_VPDB','d18O_VSMOW','D47raw','D48raw','D49raw',f'D{self._4x}']
        for r in self:
            out += [[f"{r['UID']}",f"{r['Session']}",f"{r['Sample']}"]]
            for f in extra_fields:
                out[-1] += [f"{r[f[0]]:{f[1]}}"]
            out[-1] += [
                f"{r['d13Cwg_VPDB']:.3f}",
                f"{r['d18Owg_VSMOW']:.3f}",
                f"{r['d45']:.6f}",
                f"{r['d46']:.6f}",
                f"{r['d47']:.6f}",
                f"{r['d48']:.6f}",
                f"{r['d49']:.6f}",
                f"{r['d13C_VPDB']:.6f}",
                f"{r['d18O_VSMOW']:.6f}",
                f"{r['D47raw']:.6f}",
                f"{r['D48raw']:.6f}",
                f"{r['D49raw']:.6f}",
                f"{r[f'D{self._4x}']:.6f}"
                ]
        if save_to_file:
            if not os.path.exists(dir):
                os.makedirs(dir)
            if filename is None:
                filename = f'D{self._4x}_analyses.csv'
            with open(f'{dir}/{filename}', 'w') as fid:
                fid.write(make_csv(out))
        if print_out:
            self.msg('\n' + pretty_table(out))
        return out

    @make_verbal
    def covar_table(
        self,
        correl = False,
        dir = 'output',
        filename = None,
        save_to_file = True,
        print_out = True,
        output = None,
        ):
        '''
        Print out, save to disk and/or return the variance-covariance matrix of D4x
        for all unknown samples.

        **Parameters**

        + `correl`: if `True`, report correlations instead of (co)variances
        + `dir`: the directory in which to save the csv
        + `filename`: the name of the csv file to write to
        + `save_to_file`: whether to save the csv
        + `print_out`: whether to print out the matrix
        + `output`: if set to `'pretty'`: return a pretty text matrix (see `pretty_table()`);
            if set to `'raw'`: return a list of list of strings
            (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
        '''
        samples = sorted([u for u in self.unknowns])
        out = [[''] + samples]
        for s1 in samples:
            out.append([s1])
            for s2 in samples:
                if correl:
                    out[-1].append(f'{self.sample_D4x_correl(s1, s2):.6f}')
                else:
                    out[-1].append(f'{self.sample_D4x_covar(s1, s2):.8e}')

        if save_to_file:
            if not os.path.exists(dir):
                os.makedirs(dir)
            if filename is None:
                if correl:
                    filename = f'D{self._4x}_correl.csv'
                else:
                    filename = f'D{self._4x}_covar.csv'
            with open(f'{dir}/{filename}', 'w') as fid:
                fid.write(make_csv(out))
        if print_out:
            self.msg('\n' + pretty_table(out))
        if output == 'raw':
            return out
        elif output == 'pretty':
            return pretty_table(out)

    @make_verbal
    def table_of_samples(
        self,
        dir = 'output',
        filename = None,
        save_to_file = True,
        print_out = True,
        output = None,
        ):
        '''
        Print out, save to disk and/or return a table of samples.

        **Parameters**

        + `dir`: the directory in which to save the csv
        + `filename`: the name of the csv file to write to
        + `save_to_file`: whether to save the csv
        + `print_out`: whether to print out the table
        + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
            if set to `'raw'`: return a list of list of strings
            (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
        '''

        out = [['Sample','N','d13C_VPDB','d18O_VSMOW',f'D{self._4x}','SE','95% CL','SD','p_Levene']]
        for sample in self.anchors:
            out += [[
                f"{sample}",
                f"{self.samples[sample]['N']}",
                f"{self.samples[sample]['d13C_VPDB']:.2f}",
                f"{self.samples[sample]['d18O_VSMOW']:.2f}",
                f"{self.samples[sample][f'D{self._4x}']:.4f}",'','',
                f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '', ''
                ]]
        for sample in self.unknowns:
            out += [[
                f"{sample}",
                f"{self.samples[sample]['N']}",
                f"{self.samples[sample]['d13C_VPDB']:.2f}",
                f"{self.samples[sample]['d18O_VSMOW']:.2f}",
                f"{self.samples[sample][f'D{self._4x}']:.4f}",
                f"{self.samples[sample][f'SE_D{self._4x}']:.4f}",
                f"± {self.samples[sample][f'SE_D{self._4x}'] * self.t95:.4f}",
                f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '',
                f"{self.samples[sample]['p_Levene']:.3f}" if self.samples[sample]['N'] > 2 else ''
                ]]
        if save_to_file:
            if not os.path.exists(dir):
                os.makedirs(dir)
            if filename is None:
                filename = f'D{self._4x}_samples.csv'
            with open(f'{dir}/{filename}', 'w') as fid:
                fid.write(make_csv(out))
        if print_out:
            self.msg('\n' + pretty_table(out))
        if output == 'raw':
            return out
        elif output == 'pretty':
            return pretty_table(out)


    def plot_sessions(self, dir = 'output', figsize = (8,8), filetype = 'pdf', dpi = 100):
        '''
        Generate session plots and save them to disk.

        **Parameters**

        + `dir`: the directory in which to save the plots
        + `figsize`: the width and height (in inches) of each plot
        + `filetype`: 'pdf' or 'png'
        + `dpi`: resolution for PNG output
        '''
        if not os.path.exists(dir):
            os.makedirs(dir)

        for session in self.sessions:
            sp = self.plot_single_session(session, xylimits = 'constant')
            ppl.savefig(f'{dir}/D{self._4x}_plot_{session}.{filetype}', **({'dpi': dpi} if filetype.lower() == 'png' else {}))
            ppl.close(sp.fig)


    @make_verbal
    def consolidate_samples(self):
        '''
        Compile various statistics for each sample.

        For each anchor sample:

        + `D47` or `D48`: the nominal Δ4x value for this anchor, specified by `self.Nominal_D4x`
        + `SE_D47` or `SE_D48`: set to zero by definition

        For each unknown sample:

        + `D47` or `D48`: the standardized Δ4x value for this unknown
        + `SE_D47` or `SE_D48`: the standard error of Δ4x for this unknown

        For each anchor and unknown:

        + `N`: the total number of analyses of this sample
        + `SD_D47` or `SD_D48`: the “sample” (in the statistical sense) standard deviation for this sample
        + `d13C_VPDB`: the average δ13C_VPDB value for this sample
        + `d18O_VSMOW`: the average δ18O_VSMOW value for this sample (as CO2)
        + `p_Levene`: the p-value from a [Levene test](https://en.wikipedia.org/wiki/Levene%27s_test) of equal
        variance, indicating whether the Δ4x repeatability of this sample differs significantly from
        that observed for the reference sample specified by `self.LEVENE_REF_SAMPLE`.
        '''
        D4x_ref_pop = [r[f'D{self._4x}'] for r in self.samples[self.LEVENE_REF_SAMPLE]['data']]
        for sample in self.samples:
            self.samples[sample]['N'] = len(self.samples[sample]['data'])
            if self.samples[sample]['N'] > 1:
                self.samples[sample][f'SD_D{self._4x}'] = stdev([r[f'D{self._4x}'] for r in self.samples[sample]['data']])

            self.samples[sample]['d13C_VPDB'] = np.mean([r['d13C_VPDB'] for r in self.samples[sample]['data']])
            self.samples[sample]['d18O_VSMOW'] = np.mean([r['d18O_VSMOW'] for r in self.samples[sample]['data']])

            D4x_pop = [r[f'D{self._4x}'] for r in self.samples[sample]['data']]
            if len(D4x_pop) > 2:
                self.samples[sample]['p_Levene'] = levene(D4x_ref_pop, D4x_pop, center = 'median')[1]

        if self.standardization_method == 'pooled':
            for sample in self.anchors:
                self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample]
                self.samples[sample][f'SE_D{self._4x}'] = 0.
            for sample in self.unknowns:
                self.samples[sample][f'D{self._4x}'] = self.standardization.params.valuesdict()[f'D{self._4x}_{pf(sample)}']
                try:
                    self.samples[sample][f'SE_D{self._4x}'] = self.sample_D4x_covar(sample)**.5
                except ValueError:
                    # when `sample` is constrained by self.standardize(constraints = {...}),
                    # it is no longer listed in self.standardization.var_names.
                    # Temporary fix: define SE as zero for now
                    self.samples[sample][f'SE_D{self._4x}'] = 0.

        elif self.standardization_method == 'indep_sessions':
            for sample in self.anchors:
                self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample]
                self.samples[sample][f'SE_D{self._4x}'] = 0.
            for sample in self.unknowns:
                self.msg(f'Consolidating sample {sample}')
                self.unknowns[sample][f'session_D{self._4x}'] = {}
                session_avg = []
                for session in self.sessions:
                    sdata = [r for r in self.sessions[session]['data'] if r['Sample'] == sample]
                    if sdata:
                        self.msg(f'{sample} found in session {session}')
                        avg_D4x = np.mean([r[f'D{self._4x}'] for r in sdata])
                        avg_d4x = np.mean([r[f'd{self._4x}'] for r in sdata])
                        # !! TODO: sigma_s below does not account for temporal changes in standardization error
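                        # sigma_s: standardization error evaluated at the session-average
                        # composition; sigma_u: analytical error of the session average,
                        # i.e. the raw-Δ4x weight rescaled by the scrambling factor `a`
                        # and by the square root of the number of replicates.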
                        sigma_s = self.standardization_error(session, avg_d4x, avg_D4x)
                        sigma_u = sdata[0][f'wD{self._4x}raw'] / self.sessions[session]['a'] / len(sdata)**.5
                        session_avg.append([avg_D4x, (sigma_u**2 + sigma_s**2)**.5])
                        self.unknowns[sample][f'session_D{self._4x}'][session] = session_avg[-1]
                self.samples[sample][f'D{self._4x}'], self.samples[sample][f'SE_D{self._4x}'] = w_avg(*zip(*session_avg))
                weights = {s: self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 for s in self.unknowns[sample][f'session_D{self._4x}']}
                wsum = sum([weights[s] for s in weights])
                for s in weights:
                    self.unknowns[sample][f'session_D{self._4x}'][s] += [self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 / wsum]

        for r in self:
            r[f'D{self._4x}_residual'] = r[f'D{self._4x}'] - self.samples[r['Sample']][f'D{self._4x}']


    def consolidate_sessions(self):
        '''
        Compute various statistics for each session.

        + `Na`: Number of anchor analyses in the session
        + `Nu`: Number of unknown analyses in the session
        + `r_d13C_VPDB`: δ13C_VPDB repeatability of analyses within the session
        + `r_d18O_VSMOW`: δ18O_VSMOW repeatability of analyses within the session
        + `r_D47` or `r_D48`: Δ4x repeatability of analyses within the session
        + `a`: scrambling factor
        + `b`: compositional slope
        + `c`: WG offset
        + `SE_a`: Model standard error of `a`
        + `SE_b`: Model standard error of `b`
        + `SE_c`: Model standard error of `c`
        + `scrambling_drift` (boolean): whether to allow a temporal drift in the scrambling factor (`a`)
        + `slope_drift` (boolean): whether to allow a temporal drift in the compositional slope (`b`)
        + `wg_drift` (boolean): whether to allow a temporal drift in the WG offset (`c`)
        + `a2`: scrambling factor drift
        + `b2`: compositional slope drift
        + `c2`: WG offset drift
        + `Np`: Number of standardization parameters to fit
        + `CM`: model covariance matrix for (`a`, `b`, `c`, `a2`, `b2`, `c2`)
        + `d13Cwg_VPDB`: δ13C_VPDB of WG
        + `d18Owg_VSMOW`: δ18O_VSMOW of WG
        '''
        for session in self.sessions:
            if 'd13Cwg_VPDB' not in self.sessions[session]:
                self.sessions[session]['d13Cwg_VPDB'] = self.sessions[session]['data'][0]['d13Cwg_VPDB']
            if 'd18Owg_VSMOW' not in self.sessions[session]:
                self.sessions[session]['d18Owg_VSMOW'] = self.sessions[session]['data'][0]['d18Owg_VSMOW']
            self.sessions[session]['Na'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.anchors])
            self.sessions[session]['Nu'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns])

            self.msg(f'Computing repeatabilities for session {session}')
            self.sessions[session]['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors', sessions = [session])
            self.sessions[session]['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors', sessions = [session])
            self.sessions[session][f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', sessions = [session])

        if self.standardization_method == 'pooled':
            for session in self.sessions:

                # different (better?) computation of D4x repeatability for each session:
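                # (RMS of the Δ4x residuals relative to each sample's consolidated
                # value, with no degrees-of-freedom correction, unlike `compute_r()` above)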
                sqresiduals = [(r[f'D{self._4x}'] - self.samples[r['Sample']][f'D{self._4x}'])**2 for r in self.sessions[session]['data']]
                self.sessions[session][f'r_D{self._4x}'] = np.mean(sqresiduals)**.5

                self.sessions[session]['a'] = self.standardization.params.valuesdict()[f'a_{pf(session)}']
                i = self.standardization.var_names.index(f'a_{pf(session)}')
                self.sessions[session]['SE_a'] = self.standardization.covar[i,i]**.5

                self.sessions[session]['b'] = self.standardization.params.valuesdict()[f'b_{pf(session)}']
                i = self.standardization.var_names.index(f'b_{pf(session)}')
                self.sessions[session]['SE_b'] = self.standardization.covar[i,i]**.5

                self.sessions[session]['c'] = self.standardization.params.valuesdict()[f'c_{pf(session)}']
                i = self.standardization.var_names.index(f'c_{pf(session)}')
                self.sessions[session]['SE_c'] = self.standardization.covar[i,i]**.5

                self.sessions[session]['a2'] = self.standardization.params.valuesdict()[f'a2_{pf(session)}']
                if self.sessions[session]['scrambling_drift']:
                    i = self.standardization.var_names.index(f'a2_{pf(session)}')
                    self.sessions[session]['SE_a2'] = self.standardization.covar[i,i]**.5
                else:
                    self.sessions[session]['SE_a2'] = 0.

                self.sessions[session]['b2'] = self.standardization.params.valuesdict()[f'b2_{pf(session)}']
                if self.sessions[session]['slope_drift']:
                    i = self.standardization.var_names.index(f'b2_{pf(session)}')
                    self.sessions[session]['SE_b2'] = self.standardization.covar[i,i]**.5
                else:
                    self.sessions[session]['SE_b2'] = 0.

                self.sessions[session]['c2'] = self.standardization.params.valuesdict()[f'c2_{pf(session)}']
                if self.sessions[session]['wg_drift']:
                    i = self.standardization.var_names.index(f'c2_{pf(session)}')
                    self.sessions[session]['SE_c2'] = self.standardization.covar[i,i]**.5
                else:
                    self.sessions[session]['SE_c2'] = 0.
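                # Reassemble a full 6x6 covariance matrix for (a, b, c, a2, b2, c2):
                # rows/columns of drift parameters that were not fitted are left at
                # zero, and each missing index lookup raises a ValueError that is
                # caught and ignored below.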

                i = self.standardization.var_names.index(f'a_{pf(session)}')
                j = self.standardization.var_names.index(f'b_{pf(session)}')
                k = self.standardization.var_names.index(f'c_{pf(session)}')
                CM = np.zeros((6,6))
                CM[:3,:3] = self.standardization.covar[[i,j,k],:][:,[i,j,k]]
                try:
                    i2 = self.standardization.var_names.index(f'a2_{pf(session)}')
                    CM[3,[0,1,2,3]] = self.standardization.covar[i2,[i,j,k,i2]]
                    CM[[0,1,2,3],3] = self.standardization.covar[[i,j,k,i2],i2]
                    try:
                        j2 = self.standardization.var_names.index(f'b2_{pf(session)}')
                        CM[3,4] = self.standardization.covar[i2,j2]
                        CM[4,3] = self.standardization.covar[j2,i2]
                    except ValueError:
                        pass
                    try:
                        k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
                        CM[3,5] = self.standardization.covar[i2,k2]
                        CM[5,3] = self.standardization.covar[k2,i2]
                    except ValueError:
                        pass
                except ValueError:
                    pass
                try:
                    j2 = self.standardization.var_names.index(f'b2_{pf(session)}')
                    CM[4,[0,1,2,4]] = self.standardization.covar[j2,[i,j,k,j2]]
                    CM[[0,1,2,4],4] = self.standardization.covar[[i,j,k,j2],j2]
                    try:
                        k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
                        CM[4,5] = self.standardization.covar[j2,k2]
                        CM[5,4] = self.standardization.covar[k2,j2]
                    except ValueError:
                        pass
                except ValueError:
                    pass
                try:
                    k2 = self.standardization.var_names.index(f'c2_{pf(session)}')
                    CM[5,[0,1,2,5]] = self.standardization.covar[k2,[i,j,k,k2]]
                    CM[[0,1,2,5],5] = self.standardization.covar[[i,j,k,k2],k2]
                except ValueError:
                    pass

                self.sessions[session]['CM'] = CM

        elif self.standardization_method == 'indep_sessions':
            pass # Not implemented yet


    @make_verbal
    def repeatabilities(self):
        '''
        Compute analytical repeatabilities for δ13C_VPDB, δ18O_VSMOW, Δ4x
        (for all samples, for anchors, and for unknowns).
        '''
        self.msg('Computing repeatabilities for all sessions')

        self.repeatability['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors')
        self.repeatability['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors')
        self.repeatability[f'r_D{self._4x}a'] = self.compute_r(f'D{self._4x}', samples = 'anchors')
        self.repeatability[f'r_D{self._4x}u'] = self.compute_r(f'D{self._4x}', samples = 'unknowns')
        self.repeatability[f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', samples = 'all samples')


    @make_verbal
    def consolidate(self, tables = True, plots = True):
        '''
        Collect information about samples, sessions and repeatabilities.
        '''
        self.consolidate_samples()
        self.consolidate_sessions()
        self.repeatabilities()

        if tables:
            self.summary()
            self.table_of_sessions()
            self.table_of_analyses()
            self.table_of_samples()

        if plots:
            self.plot_sessions()


    @make_verbal
    def rmswd(self,
        samples = 'all samples',
        sessions = 'all sessions',
        ):
        '''
        Compute the χ2, the root mean squared weighted deviation
        (i.e. the square root of the reduced χ2), and the corresponding
        degrees of freedom of the Δ4x values for samples in `samples`
        and sessions in `sessions`.

        Only used in `D4xdata.standardize()` with `method='indep_sessions'`.
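
        Concretely, for each sample with more than one analysis, deviations are
        taken relative to the weighted sample average X̄, so that

            rmswd = ( Σ [(Δ4x - X̄)/wΔ4x]² / Nf )**0.5, with Nf = Σ (N - 1).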
````py
    @make_verbal
    def rmswd(self,
        samples = 'all samples',
        sessions = 'all sessions',
        ):
        '''
        Compute the χ2, the root mean squared weighted deviation
        (i.e. the square root of the reduced χ2), and the corresponding
        degrees of freedom of the Δ4x values for samples in `samples`
        and sessions in `sessions`.

        Only used in `D4xdata.standardize()` with `method='indep_sessions'`.
        '''
        if samples == 'all samples':
            mysamples = [k for k in self.samples]
        elif samples == 'anchors':
            mysamples = [k for k in self.anchors]
        elif samples == 'unknowns':
            mysamples = [k for k in self.unknowns]
        else:
            mysamples = samples

        if sessions == 'all sessions':
            sessions = [k for k in self.sessions]

        chisq, Nf = 0, 0
        for sample in mysamples:
            G = [r for r in self if r['Sample'] == sample and r['Session'] in sessions]
            if len(G) > 1:
                X, sX = w_avg([r[f'D{self._4x}'] for r in G], [r[f'wD{self._4x}'] for r in G])
                Nf += (len(G) - 1)
                chisq += np.sum([((r[f'D{self._4x}'] - X) / r[f'wD{self._4x}'])**2 for r in G])
        r = (chisq / Nf)**.5 if Nf > 0 else 0
        self.msg(f'RMSWD of r["D{self._4x}"] is {r:.6f} for {samples}.')
        return {'rmswd': r, 'chisq': chisq, 'Nf': Nf}


    @make_verbal
    def compute_r(self, key, samples = 'all samples', sessions = 'all sessions'):
        '''
        Compute the repeatability of `[r[key] for r in self]`
        '''
        if samples == 'all samples':
            mysamples = [k for k in self.samples]
        elif samples == 'anchors':
            mysamples = [k for k in self.anchors]
        elif samples == 'unknowns':
            mysamples = [k for k in self.unknowns]
        else:
            mysamples = samples

        if sessions == 'all sessions':
            sessions = [k for k in self.sessions]

        if key in ['D47', 'D48']:
            # Full disclosure: the definition of Nf is tricky/debatable
            G = [r for r in self if r['Sample'] in mysamples and r['Session'] in sessions]
            chisq = (np.array([r[f'{key}_residual'] for r in G])**2).sum()
            Nf = len(G)
            # one degree of freedom lost per unknown sample:
            Nf -= len([s for s in mysamples if s in self.unknowns])
            # and, per session, as many as the smaller of
            # (number of standardization parameters, number of anchors):
            for session in sessions:
                Np = len([
                    _ for _ in self.standardization.params
                    if (
                        self.standardization.params[_].expr is not None
                        and (
                            (_[0] in 'abc' and _[1] == '_' and _[2:] == pf(session))
                            or (_[0] in 'abc' and _[1:3] == '2_' and _[3:] == pf(session))
                        )
                    )
                ])
                Na = len({
                    r['Sample'] for r in self.sessions[session]['data']
                    if r['Sample'] in self.anchors and r['Sample'] in mysamples
                })
                Nf -= min(Np, Na)
            r = (chisq / Nf)**.5 if Nf > 0 else 0

        else: # if key not in ['D47', 'D48']
            chisq, Nf = 0, 0
            for sample in mysamples:
                X = [r[key] for r in self if r['Sample'] == sample and r['Session'] in sessions]
                if len(X) > 1:
                    Nf += len(X) - 1
                    chisq += np.sum([(x - np.mean(X))**2 for x in X])
            r = (chisq / Nf)**.5 if Nf > 0 else 0

        self.msg(f'Repeatability of r["{key}"] is {1000*r:.1f} ppm for {samples}.')
        return r
````
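The non-Δ4x branch above is simply a pooled standard deviation over replicate analyses. As a minimal standalone sketch of that computation (sample names and values below are hypothetical, not taken from the library):

```py
import numpy as np

# replicate measurements grouped by sample (hypothetical values, in permil):
replicates = {
    'SAMPLE-A': [0.612, 0.608, 0.615],
    'SAMPLE-B': [0.301, 0.307],
}

chisq = sum(((np.array(X) - np.mean(X))**2).sum() for X in replicates.values())
Nf = sum(len(X) - 1 for X in replicates.values())  # one dof lost per sample mean
r = (chisq / Nf)**0.5
print(f'Pooled repeatability: {1000*r:.1f} ppm')
```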
````py
    def sample_average(self, samples, weights = 'equal', normalize = True):
        '''
        Weighted average Δ4x value of a group of samples, accounting for covariance.

        Returns the weighted average Δ4x value and associated SE
        of a group of samples. Weights are equal by default. If `normalize` is
        true, `weights` will be rescaled so that their sum equals 1.

        **Examples**

        ```python
        self.sample_average(['X','Y'], [1, 2])
        ```

        returns the value and SE of [Δ4x(X) + 2 Δ4x(Y)]/3,
        where Δ4x(X) and Δ4x(Y) are the average Δ4x
        values of samples X and Y, respectively.

        ```python
        self.sample_average(['X','Y'], [1, -1], normalize = False)
        ```

        returns the value and SE of the difference Δ4x(X) - Δ4x(Y).
        '''
        if weights == 'equal':
            weights = [1/len(samples)] * len(samples)

        if normalize:
            s = sum(weights)
            if s:
                weights = [w/s for w in weights]

        try:
            C = np.array([[self.sample_D4x_covar(x, y) for x in samples] for y in samples])
            X = [self.samples[sample][f'D{self._4x}'] for sample in samples]
            return correlated_sum(X, C, weights)
        except ValueError:
            return (0., 0.)


    def sample_D4x_covar(self, sample1, sample2 = None):
        '''
        Covariance between Δ4x values of samples

        Returns the error covariance between the average Δ4x values of two
        samples. If only `sample1` is specified, or if `sample1 == sample2`,
        returns the Δ4x variance for that sample.
        '''
        if sample2 is None:
            sample2 = sample1
        if self.standardization_method == 'pooled':
            i = self.standardization.var_names.index(f'D{self._4x}_{pf(sample1)}')
            j = self.standardization.var_names.index(f'D{self._4x}_{pf(sample2)}')
            return self.standardization.covar[i, j]
        elif self.standardization_method == 'indep_sessions':
            if sample1 == sample2:
                return self.samples[sample1][f'SE_D{self._4x}']**2
            else:
                c = 0
                for session in self.sessions:
                    sdata1 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample1]
                    sdata2 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample2]
                    if sdata1 and sdata2:
                        a = self.sessions[session]['a']
                        # !! TODO: CM below does not account for temporal changes in standardization parameters
                        CM = self.sessions[session]['CM'][:3,:3]
                        avg_D4x_1 = np.mean([r[f'D{self._4x}'] for r in sdata1])
                        avg_d4x_1 = np.mean([r[f'd{self._4x}'] for r in sdata1])
                        avg_D4x_2 = np.mean([r[f'D{self._4x}'] for r in sdata2])
                        avg_d4x_2 = np.mean([r[f'd{self._4x}'] for r in sdata2])
                        c += (
                            self.unknowns[sample1][f'session_D{self._4x}'][session][2]
                            * self.unknowns[sample2][f'session_D{self._4x}'][session][2]
                            * np.array([[avg_D4x_1, avg_d4x_1, 1]])
                            @ CM
                            @ np.array([[avg_D4x_2, avg_d4x_2, 1]]).T
                        ) / a**2
                return float(c)
````
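The SE returned by `sample_average()` follows from standard linear error propagation: for weights `w` and covariance matrix `C`, the variance of the weighted sum is `wᵀ C w`. A minimal sketch of that computation (this is not the library's own `correlated_sum()` implementation, which is defined elsewhere in the source; the covariances below are hypothetical):

```py
import numpy as np

def correlated_sum_sketch(X, C, w):
    '''Weighted sum of X (weights w) and its SE, given covariance matrix C.'''
    X, C, w = np.asarray(X), np.asarray(C), np.asarray(w)
    return float(w @ X), float(w @ C @ w)**0.5  # var = w.T @ C @ w

# e.g., the difference between two correlated values:
X = [0.6132, 0.3018]
C = [[1.0e-5, 4.0e-6], [4.0e-6, 9.0e-6]]
print(correlated_sum_sketch(X, C, [1, -1]))
```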
````py
    def sample_D4x_correl(self, sample1, sample2 = None):
        '''
        Correlation between Δ4x errors of samples

        Returns the error correlation between the average Δ4x values of two samples.
        '''
        if sample2 is None or sample2 == sample1:
            return 1.
        return (
            self.sample_D4x_covar(sample1, sample2)
            / self.unknowns[sample1][f'SE_D{self._4x}']
            / self.unknowns[sample2][f'SE_D{self._4x}']
        )


    def plot_single_session(self,
        session,
        kw_plot_anchors = dict(ls='None', marker='x', mec=(.75, 0, 0), mew = .75, ms = 4),
        kw_plot_unknowns = dict(ls='None', marker='x', mec=(0, 0, .75), mew = .75, ms = 4),
        kw_plot_anchor_avg = dict(ls='-', marker='None', color=(.75, 0, 0), lw = .75),
        kw_plot_unknown_avg = dict(ls='-', marker='None', color=(0, 0, .75), lw = .75),
        kw_contour_error = dict(colors = [[0, 0, 0]], alpha = .5, linewidths = 0.75),
        xylimits = 'free', # | 'constant'
        x_label = None,
        y_label = None,
        error_contour_interval = 'auto',
        fig = 'new',
        ):
        '''
        Generate plot for a single session
        '''
        if x_label is None:
            x_label = f'δ$_{{{self._4x}}}$ (‰)'
        if y_label is None:
            y_label = f'Δ$_{{{self._4x}}}$ (‰)'

        out = _SessionPlot()
        anchors = [a for a in self.anchors if [r for r in self.sessions[session]['data'] if r['Sample'] == a]]
        unknowns = [u for u in self.unknowns if [r for r in self.sessions[session]['data'] if r['Sample'] == u]]
        anchors_d = [r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors]
        anchors_D = [r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors]
        unknowns_d = [r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns]
        unknowns_D = [r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns]
        anchor_avg = (np.array([ np.array([
            np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1,
            np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1
            ]) for sample in anchors]).T,
            np.array([ np.array([0, 0]) + self.Nominal_D4x[sample] for sample in anchors]).T)
        unknown_avg = (np.array([ np.array([
            np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1,
            np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1
            ]) for sample in unknowns]).T,
            np.array([ np.array([0, 0]) + self.unknowns[sample][f'D{self._4x}'] for sample in unknowns]).T)

        if fig == 'new':
            out.fig = ppl.figure(figsize = (6,6))
            ppl.subplots_adjust(.1,.1,.9,.9)

        out.anchor_analyses, = ppl.plot(
            anchors_d,
            anchors_D,
            **kw_plot_anchors)
        out.unknown_analyses, = ppl.plot(
            unknowns_d,
            unknowns_D,
            **kw_plot_unknowns)
        out.anchor_avg = ppl.plot(
            *anchor_avg,
            **kw_plot_anchor_avg)
        out.unknown_avg = ppl.plot(
            *unknown_avg,
            **kw_plot_unknown_avg)
        if xylimits == 'constant':
            x = [r[f'd{self._4x}'] for r in self]
            y = [r[f'D{self._4x}'] for r in self]
            x1, x2, y1, y2 = np.min(x), np.max(x), np.min(y), np.max(y)
            w, h = x2-x1, y2-y1
            x1 -= w/20
            x2 += w/20
            y1 -= h/20
            y2 += h/20
            ppl.axis([x1, x2, y1, y2])
        elif xylimits == 'free':
            x1, x2, y1, y2 = ppl.axis()
        else:
            x1, x2, y1, y2 = ppl.axis(xylimits)

        if error_contour_interval != 'none':
            xi, yi = np.linspace(x1, x2), np.linspace(y1, y2)
            XI, YI = np.meshgrid(xi, yi)
            SI = np.array([[self.standardization_error(session, x, y) for x in xi] for y in yi])
            if error_contour_interval == 'auto':
                rng = np.max(SI) - np.min(SI)
                if rng <= 0.01:
                    cinterval = 0.001
                elif rng <= 0.03:
                    cinterval = 0.004
                elif rng <= 0.1:
                    cinterval = 0.01
                elif rng <= 0.3:
                    cinterval = 0.03
                elif rng <= 1.:
                    cinterval = 0.1
                else:
                    cinterval = 0.5
            else:
                cinterval = error_contour_interval

            cval = np.arange(np.ceil(SI.min() / .001) * .001, np.ceil(SI.max() / .001 + 1) * .001, cinterval)
            out.contour = ppl.contour(XI, YI, SI, cval, **kw_contour_error)
            out.clabel = ppl.clabel(out.contour)
            contour = (XI, YI, SI, cval, cinterval)

        if fig is None:
            return {
                'anchors': anchors,
                'unknowns': unknowns,
                'anchors_d': anchors_d,
                'anchors_D': anchors_D,
                'unknowns_d': unknowns_d,
                'unknowns_D': unknowns_D,
                'anchor_avg': anchor_avg,
                'unknown_avg': unknown_avg,
                'contour': contour,
            }

        ppl.xlabel(x_label)
        ppl.ylabel(y_label)
        ppl.title(session, weight = 'bold')
        ppl.grid(alpha = .2)
        out.ax = ppl.gca()

        return out
````
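A usage sketch (assuming `mydata` is a standardized `D47data` object and `'Session_01'` is one of its session names):

```py
out = mydata.plot_single_session('Session_01')
out.fig.savefig('Session_01.pdf')  # out.fig exists because fig = 'new' by default
```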
````py
    def plot_residuals(
        self,
        kde = False,
        hist = False,
        binwidth = 2/3,
        dir = 'output',
        filename = None,
        highlight = [],
        colors = None,
        figsize = None,
        dpi = 100,
        yspan = None,
        ):
        '''
        Plot residuals of each analysis as a function of time (actually, as a function of
        the order of analyses in the `D4xdata` object)

        + `kde`: whether to add a kernel density estimate of residuals
        + `hist`: whether to add a histogram of residuals (incompatible with `kde`)
        + `binwidth`: the width of histogram bins, expressed as a fraction of the Δ4x repeatability
        + `dir`: the directory in which to save the plot
        + `filename`: the file name to save the plot to (by default, return the figure without saving it)
        + `highlight`: a list of samples to highlight
        + `colors`: a dict of `{<sample>: <color>}` for all samples
        + `figsize`: (width, height) of figure
        + `dpi`: resolution for PNG output
        + `yspan`: factor controlling the range of y values shown in plot
          (by default: `yspan = 1.5 if kde else 1.0`)
        '''

        from matplotlib import ticker

        if yspan is None:
            if kde:
                yspan = 1.5
            else:
                yspan = 1.0

        # Layout
        fig = ppl.figure(figsize = (8,4) if figsize is None else figsize)
        if hist or kde:
            ppl.subplots_adjust(left = .08, bottom = .05, right = .98, top = .8, wspace = -0.72)
            ax1, ax2 = ppl.subplot(121), ppl.subplot(1,15,15)
        else:
            ppl.subplots_adjust(.08,.05,.78,.8)
            ax1 = ppl.subplot(111)

        # Colors
        N = len(self.anchors)
        if colors is None:
            if len(highlight) > 0:
                Nh = len(highlight)
                if Nh == 1:
                    colors = {highlight[0]: (0,0,0)}
                elif Nh == 3:
                    colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0)])}
                elif Nh == 4:
                    colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])}
                else:
                    colors = {a: hls_to_rgb(k/Nh, .4, 1) for k,a in enumerate(highlight)}
            else:
                if N == 3:
                    colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0)])}
                elif N == 4:
                    colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])}
                else:
                    colors = {a: hls_to_rgb(k/N, .4, 1) for k,a in enumerate(self.anchors)}

        ppl.sca(ax1)

        ppl.axhline(0, color = 'k', alpha = .25, lw = 0.75)

        ax1.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: f'${x:+.0f}$' if x else '$0$'))

        session = self[0]['Session']
        x1 = 0
        x_sessions = {}
        one_or_more_singlets = False
        one_or_more_multiplets = False
        multiplets = set()
        for k,r in enumerate(self):
            if r['Session'] != session:
                x2 = k-1
                x_sessions[session] = (x1+x2)/2
                ppl.axvline(k - 0.5, color = 'k', lw = .5)
                session = r['Session']
                x1 = k
            singlet = len(self.samples[r['Sample']]['data']) == 1
            if not singlet:
                multiplets.add(r['Sample'])
            if r['Sample'] in self.unknowns:
                if singlet:
                    one_or_more_singlets = True
                else:
                    one_or_more_multiplets = True
            kw = dict(
                marker = 'x' if singlet else '+',
                ms = 4 if singlet else 5,
                ls = 'None',
                mec = colors[r['Sample']] if r['Sample'] in colors else (0,0,0),
                mew = 1,
                alpha = 0.2 if singlet else 1,
                )
            if highlight and r['Sample'] not in highlight:
                kw['alpha'] = 0.2
            ppl.plot(k, 1e3 * r[f'D{self._4x}_residual'], **kw)
        x2 = k
        x_sessions[session] = (x1+x2)/2

        ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000, self.repeatability[f'r_D{self._4x}']*1000, color = 'k', alpha = .05, lw = 1)
        ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000*self.t95, self.repeatability[f'r_D{self._4x}']*1000*self.t95, color = 'k', alpha = .05, lw = 1)
        if not (hist or kde):
            ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000, f" SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm", size = 9, alpha = 1, va = 'center')
            ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000*self.t95, f" 95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm", size = 9, alpha = 1, va = 'center')

        xmin, xmax, ymin, ymax = ppl.axis()
        if yspan != 1:
            ymin, ymax = (ymin + ymax)/2 - yspan * (ymax - ymin)/2, (ymin + ymax)/2 + yspan * (ymax - ymin)/2
        for s in x_sessions:
            ppl.text(
                x_sessions[s],
                ymax + 1,
                s,
                va = 'bottom',
                **(
                    dict(ha = 'center')
                    if len(self.sessions[s]['data']) > (0.15 * len(self))
                    else dict(ha = 'left', rotation = 45)
                    )
                )

        if hist or kde:
            ppl.sca(ax2)

        for s in colors:
            kw['marker'] = '+'
            kw['ms'] = 5
            kw['mec'] = colors[s]
            kw['label'] = s
            kw['alpha'] = 1
            ppl.plot([], [], **kw)

        kw['mec'] = (0,0,0)

        if one_or_more_singlets:
            kw['marker'] = 'x'
            kw['ms'] = 4
            kw['alpha'] = .2
            kw['label'] = 'other (N$\\,$=$\\,$1)' if one_or_more_multiplets else 'other'
            ppl.plot([], [], **kw)

        if one_or_more_multiplets:
            kw['marker'] = '+'
            kw['ms'] = 4
            kw['alpha'] = 1
            kw['label'] = 'other (N$\\,$>$\\,$1)' if one_or_more_singlets else 'other'
            ppl.plot([], [], **kw)

        if hist or kde:
            leg = ppl.legend(loc = 'upper right', bbox_to_anchor = (1, 1), bbox_transform = fig.transFigure, borderaxespad = 1.5, fontsize = 9)
        else:
            leg = ppl.legend(loc = 'lower right', bbox_to_anchor = (1, 0), bbox_transform = fig.transFigure, borderaxespad = 1.5)
        leg.set_zorder(-1000)

        ppl.sca(ax1)

        ppl.ylabel(f'Δ$_{{{self._4x}}}$ residuals (ppm)')
        ppl.xticks([])
        ppl.axis([-1, len(self), None, None])

        if hist or kde:
            ppl.sca(ax2)
            X = 1e3 * np.array([r[f'D{self._4x}_residual'] for r in self if r['Sample'] in multiplets or r['Sample'] in self.anchors])

            if kde:
                from scipy.stats import gaussian_kde
                yi = np.linspace(ymin, ymax, 201)
                xi = gaussian_kde(X).evaluate(yi)
                ppl.fill_betweenx(yi, xi, xi*0, fc = (0,0,0,.15), lw = 1, ec = (.75,.75,.75,1))
            elif hist:
                ppl.hist(
                    X,
                    orientation = 'horizontal',
                    histtype = 'stepfilled',
                    ec = [.4]*3,
                    fc = [.25]*3,
                    alpha = .25,
                    bins = np.linspace(-9e3*self.repeatability[f'r_D{self._4x}'], 9e3*self.repeatability[f'r_D{self._4x}'], int(18/binwidth+1)),
                    )
            ppl.text(0, 0,
                f" SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm\n 95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm",
                size = 7.5,
                alpha = 1,
                va = 'center',
                ha = 'left',
                )

            ppl.axis([0, None, ymin, ymax])
            ppl.xticks([])
            ppl.yticks([])
            ax2.spines['right'].set_visible(False)
            ax2.spines['top'].set_visible(False)
            ax2.spines['bottom'].set_visible(False)

        ax1.axis([None, None, ymin, ymax])

        if not os.path.exists(dir):
            os.makedirs(dir)
        if filename is None:
            return fig
        elif filename == '':
            filename = f'D{self._4x}_residuals.pdf'
        ppl.savefig(f'{dir}/{filename}', dpi = dpi)
        ppl.close(fig)


    def simulate(self, *args, **kwargs):
        '''
        Legacy function with warning message pointing to `virtual_data()`
        '''
        raise DeprecationWarning('D4xdata.simulate is deprecated and has been replaced by virtual_data()')
````
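For example, to save a residual plot with a kernel density estimate in the side panel (again assuming a standardized `mydata` object):

```py
mydata.plot_residuals(kde = True, filename = '')  # filename = '' selects the default file name
```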
````py
    def plot_distribution_of_analyses(
        self,
        dir = 'output',
        filename = None,
        vs_time = False,
        figsize = (6,4),
        subplots_adjust = (0.02, 0.13, 0.85, 0.8),
        output = None,
        dpi = 100,
        ):
        '''
        Plot temporal distribution of all analyses in the data set.

        **Parameters**

        + `dir`: the directory in which to save the plot
        + `vs_time`: if `True`, plot as a function of `TimeTag` rather than sequentially.
        + `figsize`: (width, height) of figure
        + `dpi`: resolution for PNG output
        '''

        asamples = [s for s in self.anchors]
        usamples = [s for s in self.unknowns]
        if output is None or output == 'fig':
            fig = ppl.figure(figsize = figsize)
            ppl.subplots_adjust(*subplots_adjust)
        Xmin = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self)])
        Xmax = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self)])
        Xmax += (Xmax-Xmin)/40
        Xmin -= (Xmax-Xmin)/41
        for k, s in enumerate(asamples + usamples):
            if vs_time:
                X = [r['TimeTag'] for r in self if r['Sample'] == s]
            else:
                X = [x for x,r in enumerate(self) if r['Sample'] == s]
            Y = [-k for x in X]
            ppl.plot(X, Y, 'o', mec = None, mew = 0, mfc = 'b' if s in usamples else 'r', ms = 3, alpha = .75)
            ppl.axhline(-k, color = 'b' if s in usamples else 'r', lw = .5, alpha = .25)
            ppl.text(Xmax, -k, f' {s}', va = 'center', ha = 'left', size = 7, color = 'b' if s in usamples else 'r')
        ppl.axis([Xmin, Xmax, -k-1, 1])
        ppl.xlabel('\ntime')
        ppl.gca().annotate('',
            xy = (0.6, -0.02),
            xycoords = 'axes fraction',
            xytext = (.4, -0.02),
            arrowprops = dict(arrowstyle = "->", color = 'k'),
            )

        x2 = -1
        for session in self.sessions:
            x1 = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session])
            if vs_time:
                ppl.axvline(x1, color = 'k', lw = .75)
            if x2 > -1:
                if not vs_time:
                    ppl.axvline((x1+x2)/2, color = 'k', lw = .75, alpha = .5)
            x2 = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session])
            if vs_time:
                ppl.axvline(x2, color = 'k', lw = .75)
                ppl.axvspan(x1, x2, color = 'k', zorder = -100, alpha = .15)
            ppl.text((x1+x2)/2, 1, f' {session}', ha = 'left', va = 'bottom', rotation = 45, size = 8)

        ppl.xticks([])
        ppl.yticks([])

        if output is None:
            if not os.path.exists(dir):
                os.makedirs(dir)
            if filename is None:
                filename = f'D{self._4x}_distribution_of_analyses.pdf'
            ppl.savefig(f'{dir}/{filename}', dpi = dpi)
            ppl.close(fig)
        elif output == 'ax':
            return ppl.gca()
        elif output == 'fig':
            return fig


    def plot_bulk_compositions(
        self,
        samples = None,
        dir = 'output/bulk_compositions',
        figsize = (6,6),
        subplots_adjust = (0.15, 0.12, 0.95, 0.92),
        show = False,
        sample_color = (0,.5,1),
        analysis_color = (.7,.7,.7),
        labeldist = 0.3,
        radius = 0.05,
        ):
        '''
        Plot δ13C_VPDB vs δ18O_VSMOW (of CO2) for all analyses.

        By default, creates a directory `./output/bulk_compositions` where plots for
        each sample are saved. Another plot named `__all__.pdf` shows all analyses together.

        **Parameters**

        + `samples`: Only these samples are processed (by default: all samples).
        + `dir`: where to save the plots
        + `figsize`: (width, height) of figure
        + `subplots_adjust`: passed to `subplots_adjust()`
        + `show`: whether to call `matplotlib.pyplot.show()` on the plot with all samples,
          allowing for interactive visualization/exploration in (δ13C, δ18O) space.
        + `sample_color`: color used for sample markers/labels
        + `analysis_color`: color used for replicate (analysis) markers/labels
        + `labeldist`: distance (in inches) from replicate markers to replicate labels
        + `radius`: radius of the dashed circle providing scale. No circle if `radius = 0`.
        '''

        from matplotlib.patches import Ellipse

        if samples is None:
            samples = [_ for _ in self.samples]

        saved = {}

        for s in samples:

            fig = ppl.figure(figsize = figsize)
            fig.subplots_adjust(*subplots_adjust)
            ax = ppl.subplot(111)
            ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)')
            ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)')
            ppl.title(s)

            XY = np.array([[_['d18O_VSMOW'], _['d13C_VPDB']] for _ in self.samples[s]['data']])
            UID = [_['UID'] for _ in self.samples[s]['data']]
            XY0 = XY.mean(0)

            for xy in XY:
                ppl.plot([xy[0], XY0[0]], [xy[1], XY0[1]], '-', lw = 1, color = analysis_color)

            ppl.plot(*XY.T, 'wo', mew = 1, mec = analysis_color)
            ppl.plot(*XY0, 'wo', mew = 2, mec = sample_color)
            ppl.text(*XY0, f' {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold')
            saved[s] = [XY, XY0]

            x1, x2, y1, y2 = ppl.axis()
            x0, dx = (x1+x2)/2, (x2-x1)/2
            y0, dy = (y1+y2)/2, (y2-y1)/2
            dx, dy = [max(max(dx, dy), radius)]*2

            ppl.axis([
                x0 - 1.2*dx,
                x0 + 1.2*dx,
                y0 - 1.2*dy,
                y0 + 1.2*dy,
                ])

            XY0_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(XY0))

            for xy, uid in zip(XY, UID):

                xy_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(xy))
                vector_in_display_space = xy_in_display_space - XY0_in_display_space

                if (vector_in_display_space**2).sum() > 0:
                    unit_vector_in_display_space = vector_in_display_space / ((vector_in_display_space**2).sum())**0.5
                    label_vector_in_display_space = vector_in_display_space + unit_vector_in_display_space * labeldist
                    label_xy_in_display_space = XY0_in_display_space + label_vector_in_display_space
                    label_xy_in_data_space = ax.transData.inverted().transform(fig.dpi_scale_trans.transform(label_xy_in_display_space))
                    ppl.text(*label_xy_in_data_space, uid, va = 'center', ha = 'center', color = analysis_color)
                else:
                    ppl.text(*xy, f'{uid} ', va = 'center', ha = 'right', color = analysis_color)

            if radius:
                ax.add_artist(Ellipse(
                    xy = XY0,
                    width = radius*2,
                    height = radius*2,
                    ls = (0, (2,2)),
                    lw = .7,
                    ec = analysis_color,
                    fc = 'None',
                    ))
                ppl.text(
                    XY0[0],
                    XY0[1]-radius,
                    f'\n± {radius*1e3:.0f} ppm',
                    color = analysis_color,
                    va = 'top',
                    ha = 'center',
                    linespacing = 0.4,
                    size = 8,
                    )

            if not os.path.exists(dir):
                os.makedirs(dir)
            fig.savefig(f'{dir}/{s}.pdf')
            ppl.close(fig)

        fig = ppl.figure(figsize = figsize)
        fig.subplots_adjust(*subplots_adjust)
        ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)')
        ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)')

        for s in saved:
            for xy in saved[s][0]:
                ppl.plot([xy[0], saved[s][1][0]], [xy[1], saved[s][1][1]], '-', lw = 1, color = analysis_color)
            ppl.plot(*saved[s][0].T, 'wo', mew = 1, mec = analysis_color)
            ppl.plot(*saved[s][1], 'wo', mew = 1.5, mec = sample_color)
            ppl.text(*saved[s][1], f' {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold')

        x1, x2, y1, y2 = ppl.axis()
        ppl.axis([
            x1 - (x2-x1)/10,
            x2 + (x2-x1)/10,
            y1 - (y2-y1)/10,
            y2 + (y2-y1)/10,
            ])

        if not os.path.exists(dir):
            os.makedirs(dir)
        fig.savefig(f'{dir}/__all__.pdf')
        if show:
            ppl.show()
        ppl.close(fig)
````
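A usage sketch for these two plotting methods (assuming a standardized `mydata` object):

```py
mydata.plot_distribution_of_analyses(vs_time = False)  # sequential x axis
mydata.plot_bulk_compositions()  # one plot per sample, plus __all__.pdf
```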
````py
    def _save_D4x_correl(
        self,
        samples = None,
        dir = 'output',
        filename = None,
        D4x_precision = 4,
        correl_precision = 4,
        ):
        '''
        Save D4x values along with their SE and correlation matrix.

        **Parameters**

        + `samples`: Only these samples are output (by default: all unknown samples).
        + `dir`: the directory in which to save the file (by default: `output`)
        + `filename`: the name of the csv file to write to (by default: `D4x_correl.csv`)
        + `D4x_precision`: the precision to use when writing `D4x` and `D4x_SE` values (by default: 4)
        + `correl_precision`: the precision to use when writing correlation factor values (by default: 4)
        '''
        if samples is None:
            samples = sorted([s for s in self.unknowns])

        out = [['Sample']] + [[s] for s in samples]
        out[0] += [f'D{self._4x}', f'D{self._4x}_SE', f'D{self._4x}_correl']
        for k,s in enumerate(samples):
            out[k+1] += [f'{self.samples[s][f"D{self._4x}"]:.{D4x_precision}f}', f'{self.samples[s][f"SE_D{self._4x}"]:.{D4x_precision}f}']
            for s2 in samples:
                out[k+1] += [f'{self.sample_D4x_correl(s,s2):.{correl_precision}f}']

        if not os.path.exists(dir):
            os.makedirs(dir)
        if filename is None:
            filename = f'D{self._4x}_correl.csv'
        with open(f'{dir}/{filename}', 'w') as fid:
            fid.write(make_csv(out))


class D47data(D4xdata):
    '''
    Store and process data for a large set of Δ47 analyses,
    usually comprising more than one analytical session.
    '''

    Nominal_D4x = {
        'ETH-1': 0.2052,
        'ETH-2': 0.2085,
        'ETH-3': 0.6132,
        'ETH-4': 0.4511,
        'IAEA-C1': 0.3018,
        'IAEA-C2': 0.6409,
        'MERCK': 0.5135,
        } # I-CDES (Bernasconi et al., 2021)
    '''
    Nominal Δ47 values assigned to the Δ47 anchor samples, used by
    `D47data.standardize()` to normalize unknown samples to an absolute Δ47
    reference frame.

    By default equal to (after [Bernasconi et al. (2021)](https://doi.org/10.1029/2020GC009588)):

    ```py
    {
        'ETH-1'  : 0.2052,
        'ETH-2'  : 0.2085,
        'ETH-3'  : 0.6132,
        'ETH-4'  : 0.4511,
        'IAEA-C1': 0.3018,
        'IAEA-C2': 0.6409,
        'MERCK'  : 0.5135,
    }
    ```
    '''


    @property
    def Nominal_D47(self):
        return self.Nominal_D4x


    @Nominal_D47.setter
    def Nominal_D47(self, new):
        self.Nominal_D4x = dict(**new)
        self.refresh()


    def __init__(self, l = [], **kwargs):
        '''
        **Parameters:** same as `D4xdata.__init__()`
        '''
        D4xdata.__init__(self, l = l, mass = '47', **kwargs)


    def D47fromTeq(self, fCo2eqD47 = 'petersen', priority = 'new'):
        '''
        Find all samples for which `Teq` is specified, compute the equilibrium Δ47
        value for that temperature, and treat these samples as additional anchors.

        **Parameters**

        + `fCo2eqD47`: Which CO2 equilibrium law to use
          (`petersen`: [Petersen et al. (2019)](https://doi.org/10.1029/2018GC008127);
          `wang`: [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)).
        + `priority`: if `replace`: forget old anchors and only use the new ones;
          if `new`: keep pre-existing anchors but update them in case of conflict
          between old and new Δ47 values;
          if `old`: keep pre-existing anchors but preserve their original Δ47
          values in case of conflict.
        '''
        f = {
            'petersen': fCO2eqD47_Petersen,
            'wang': fCO2eqD47_Wang,
            }[fCo2eqD47]
        foo = {}
        for r in self:
            if 'Teq' in r:
                if r['Sample'] in foo:
                    assert foo[r['Sample']] == f(r['Teq']), f'Different values of `Teq` provided for sample `{r["Sample"]}`.'
                else:
                    foo[r['Sample']] = f(r['Teq'])
            else:
                assert r['Sample'] not in foo, f'`Teq` is inconsistently specified for sample `{r["Sample"]}`.'

        if priority == 'replace':
            self.Nominal_D47 = {}
        for s in foo:
            if priority != 'old' or s not in self.Nominal_D47:
                self.Nominal_D47[s] = foo[s]


    def save_D47_correl(self, *args, **kwargs):
        return self._save_D4x_correl(*args, **kwargs)

    save_D47_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D47')
````
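For instance, to standardize using only the three ETH anchors (the values below are the I-CDES defaults listed above):

```py
mydata = D47data()
mydata.Nominal_D47 = {
    'ETH-1': 0.2052,
    'ETH-2': 0.2085,
    'ETH-3': 0.6132,
}
```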
````py
class D48data(D4xdata):
    '''
    Store and process data for a large set of Δ48 analyses,
    usually comprising more than one analytical session.
    '''

    Nominal_D4x = {
        'ETH-1': 0.138,
        'ETH-2': 0.138,
        'ETH-3': 0.270,
        'ETH-4': 0.223,
        'GU-1': -0.419,
        } # (Fiebig et al., 2019, 2021)
    '''
    Nominal Δ48 values assigned to the Δ48 anchor samples, used by
    `D48data.standardize()` to normalize unknown samples to an absolute Δ48
    reference frame.

    By default equal to (after [Fiebig et al. (2019)](https://doi.org/10.1016/j.chemgeo.2019.05.019),
    [Fiebig et al. (2021)](https://doi.org/10.1016/j.gca.2021.07.012)):

    ```py
    {
        'ETH-1': 0.138,
        'ETH-2': 0.138,
        'ETH-3': 0.270,
        'ETH-4': 0.223,
        'GU-1' : -0.419,
    }
    ```
    '''


    @property
    def Nominal_D48(self):
        return self.Nominal_D4x


    @Nominal_D48.setter
    def Nominal_D48(self, new):
        self.Nominal_D4x = dict(**new)
        self.refresh()


    def __init__(self, l = [], **kwargs):
        '''
        **Parameters:** same as `D4xdata.__init__()`
        '''
        D4xdata.__init__(self, l = l, mass = '48', **kwargs)


    def save_D48_correl(self, *args, **kwargs):
        return self._save_D4x_correl(*args, **kwargs)

    save_D48_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D48')


class D49data(D4xdata):
    '''
    Store and process data for a large set of Δ49 analyses,
    usually comprising more than one analytical session.
    '''

    Nominal_D4x = {"1000C": 0.0, "25C": 2.228} # (Wang et al., 2004)
    '''
    Nominal Δ49 values assigned to the Δ49 anchor samples, used by
    `D49data.standardize()` to normalize unknown samples to an absolute Δ49
    reference frame.

    By default equal to (after [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)):

    ```py
    {
        "1000C": 0.0,
        "25C": 2.228,
    }
    ```
    '''

    @property
    def Nominal_D49(self):
        return self.Nominal_D4x

    @Nominal_D49.setter
    def Nominal_D49(self, new):
        self.Nominal_D4x = dict(**new)
        self.refresh()

    def __init__(self, l = [], **kwargs):
        '''
        **Parameters:** same as `D4xdata.__init__()`
        '''
        D4xdata.__init__(self, l = l, mass = '49', **kwargs)

    def save_D49_correl(self, *args, **kwargs):
        return self._save_D4x_correl(*args, **kwargs)

    save_D49_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D49')
````
````py
class _SessionPlot():
    '''
    Simple placeholder class
    '''
    def __init__(self):
        pass


_app = typer.Typer(
    add_completion = False,
    context_settings = {'help_option_names': ['-h', '--help']},
    rich_markup_mode = 'rich',
    )

@_app.command()
def _cli(
    rawdata: Annotated[str, typer.Argument(help = "Specify the path of a rawdata input file")],
    exclude: Annotated[str, typer.Option('--exclude', '-e', help = 'The path of a file specifying UIDs and/or Samples to exclude')] = 'none',
    anchors: Annotated[str, typer.Option('--anchors', '-a', help = 'The path of a file specifying custom anchors')] = 'none',
    output_dir: Annotated[str, typer.Option('--output-dir', '-o', help = 'Specify the output directory')] = 'output',
    run_D48: Annotated[bool, typer.Option('--D48', help = 'Also standardize D48')] = False,
    ):
    """
    Process raw D47 data and return standardized results.

    See [b]https://mdaeron.github.io/D47crunch/#3-command-line-interface-cli[/b] for more details.

    Reads raw data from an input file, optionally excluding some samples and/or analyses, then standardizes
    the data based either on the default [b]d13C_VPDB[/b], [b]d18O_VPDB[/b], [b]D47[/b], and [b]D48[/b] anchors or on different
    user-specified anchors. A new directory (named `output` by default) is created to store the results and
    the following sequence is applied:

    * [b]D47data.wg()[/b]
    * [b]D47data.crunch()[/b]
    * [b]D47data.standardize()[/b]
    * [b]D47data.summary()[/b]
    * [b]D47data.table_of_samples()[/b]
    * [b]D47data.table_of_sessions()[/b]
    * [b]D47data.plot_sessions()[/b]
    * [b]D47data.plot_residuals()[/b]
    * [b]D47data.table_of_analyses()[/b]
    * [b]D47data.plot_distribution_of_analyses()[/b]
    * [b]D47data.plot_bulk_compositions()[/b]
    * [b]D47data.save_D47_correl()[/b]

    Optionally, also apply similar methods for [b]D48[/b].

    [b]Example CSV file for --anchors option:[/b]
    [i]
    Sample, d13C_VPDB, d18O_VPDB, D47, D48
    ETH-1, 2.02, -2.19, 0.2052, 0.138
    ETH-2, -10.17, -18.69, 0.2085, 0.138
    ETH-3, 1.71, -1.78, 0.6132, 0.270
    ETH-4, , , 0.4511, 0.223
    [/i]
    Except for [i]Sample[/i], none of the columns above are mandatory.

    [b]Example CSV file for --exclude option:[/b]
    [i]
    Sample, UID
    FOO-1,
    BAR-2,
    , A04
    , A17
    , A88
    [/i]
    This will exclude all analyses of samples [i]FOO-1[/i] and [i]BAR-2[/i],
    and the analyses with UIDs [i]A04[/i], [i]A17[/i], and [i]A88[/i].
    Neither column is mandatory.
    """

    data = D47data()
    data.read(rawdata)

    if exclude != 'none':
        exclude = read_csv(exclude)
        exclude_uid = {r['UID'] for r in exclude if 'UID' in r}
        exclude_sample = {r['Sample'] for r in exclude if 'Sample' in r}
    else:
        exclude_uid = []
        exclude_sample = []

    data = D47data([r for r in data if r['UID'] not in exclude_uid and r['Sample'] not in exclude_sample])

    if anchors != 'none':
        anchors = read_csv(anchors)
        if len([_ for _ in anchors if 'd13C_VPDB' in _]):
            data.Nominal_d13C_VPDB = {
                _['Sample']: _['d13C_VPDB']
                for _ in anchors
                if 'd13C_VPDB' in _
                }
        if len([_ for _ in anchors if 'd18O_VPDB' in _]):
            data.Nominal_d18O_VPDB = {
                _['Sample']: _['d18O_VPDB']
                for _ in anchors
                if 'd18O_VPDB' in _
                }
        if len([_ for _ in anchors if 'D47' in _]):
            data.Nominal_D4x = {
                _['Sample']: _['D47']
                for _ in anchors
                if 'D47' in _
                }

    data.refresh()
    data.wg()
    data.crunch()
    data.standardize()
    data.summary(dir = output_dir)
    data.plot_residuals(dir = output_dir, filename = 'D47_residuals.pdf', kde = True)
    data.plot_bulk_compositions(dir = output_dir + '/bulk_compositions')
    data.plot_sessions(dir = output_dir)
    data.save_D47_correl(dir = output_dir)

    if not run_D48:
        data.table_of_samples(dir = output_dir)
        data.table_of_analyses(dir = output_dir)
        data.table_of_sessions(dir = output_dir)

    if run_D48:
        data2 = D48data()
        data2.read(rawdata)

        data2 = D48data([r for r in data2 if r['UID'] not in exclude_uid and r['Sample'] not in exclude_sample])

        if anchors != 'none':
            if len([_ for _ in anchors if 'd13C_VPDB' in _]):
                data2.Nominal_d13C_VPDB = {
                    _['Sample']: _['d13C_VPDB']
                    for _ in anchors
                    if 'd13C_VPDB' in _
                    }
            if len([_ for _ in anchors if 'd18O_VPDB' in _]):
                data2.Nominal_d18O_VPDB = {
                    _['Sample']: _['d18O_VPDB']
                    for _ in anchors
                    if 'd18O_VPDB' in _
                    }
            if len([_ for _ in anchors if 'D48' in _]):
                data2.Nominal_D4x = {
                    _['Sample']: _['D48']
                    for _ in anchors
                    if 'D48' in _
                    }

        data2.refresh()
        data2.wg()
        data2.crunch()
        data2.standardize()
        data2.summary(dir = output_dir)
        data2.plot_sessions(dir = output_dir)
        data2.plot_residuals(dir = output_dir, filename = 'D48_residuals.pdf', kde = True)
        data2.plot_distribution_of_analyses(dir = output_dir)
        data2.save_D48_correl(dir = output_dir)

        table_of_analyses(data, data2, dir = output_dir)
        table_of_samples(data, data2, dir = output_dir)
        table_of_sessions(data, data2, dir = output_dir)


def __cli():
    _app()
````
```py
def fCO2eqD47_Petersen(T):
    '''
    CO2 equilibrium Δ47 value as a function of T (in degrees C)
    according to [Petersen et al. (2019)](https://doi.org/10.1029/2018GC008127).
    '''
    return float(_fCO2eqD47_Petersen(T))
```
```py
def fCO2eqD47_Wang(T):
    '''
    CO2 equilibrium Δ47 value as a function of `T` (in degrees C)
    according to [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)
    (supplementary data of [Dennis et al., 2011](https://doi.org/10.1016/j.gca.2011.09.025)).
    '''
    return float(_fCO2eqD47_Wang(T))
```
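A quick sketch of how these laws are used (temperatures in degrees C):

```py
eq25 = fCO2eqD47_Petersen(25)    # equilibrium Δ47 of CO2 at 25 °C
eq1000 = fCO2eqD47_Wang(1000)    # same, according to Wang et al. (2004)

# Within a D47data object whose analyses carry a 'Teq' field, these laws
# are applied through D47data.D47fromTeq(), e.g.:
# mydata.D47fromTeq(fCo2eqD47 = 'petersen', priority = 'new')
```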
````py
def make_csv(x, hsep = ',', vsep = '\n'):
    '''
    Formats a list of lists of strings as a CSV

    **Parameters**

    + `x`: the list of lists of strings to format
    + `hsep`: the field separator (`,` by default)
    + `vsep`: the line-ending convention to use (`\\n` by default)

    **Example**

    ```py
    print(make_csv([['a', 'b', 'c'], ['d', 'e', 'f']]))
    ```

    outputs:

    ```
    a,b,c
    d,e,f
    ```
    '''
    return vsep.join([hsep.join(l) for l in x])
````
```py
def pf(txt):
    '''
    Modify string `txt` to follow `lmfit.Parameter()` naming rules.
    '''
    return txt.replace('-','_').replace('.','_').replace(' ','_')
```
```py
def smart_type(x):
    '''
    Tries to convert string `x` to an integer if it has no decimal point,
    or to a float if it does. If the conversion fails, returns the original
    string unchanged.
    '''
    try:
        y = float(x)
    except ValueError:
        return x
    if '.' not in x:
        return int(y)
    return y
```
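For instance:

```py
>>> smart_type('5.79502')
5.79502
>>> smart_type('42')
42
>>> smart_type('ETH-1')
'ETH-1'
```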
````py
def pretty_table(x, header = 1, hsep = ' ', vsep = '–', align = '<'):
    '''
    Reads a list of lists of strings and outputs a plain-text table

    **Parameters**

    + `x`: a list of lists of strings
    + `header`: the number of lines to treat as header lines
    + `hsep`: the horizontal separator between columns
    + `vsep`: the character to use as vertical separator
    + `align`: string of left (`<`) or right (`>`) alignment characters.

    **Example**

    ```py
    x = [['A', 'B', 'C'], ['1', '1.9999', 'foo'], ['10', 'x', 'bar']]
    print(pretty_table(x))
    ```
    yields:
    ```
    –– –––––– –––
    A       B   C
    –– –––––– –––
    1  1.9999 foo
    10      x bar
    –– –––––– –––
    ```
    '''
    txt = []
    widths = [np.max([len(e) for e in c]) for c in zip(*x)]

    if len(widths) > len(align):
        align += '>' * (len(widths)-len(align))
    sepline = hsep.join([vsep*w for w in widths])
    txt += [sepline]
    for k,l in enumerate(x):
        if k and k == header:
            txt += [sepline]
        txt += [hsep.join([f'{e:{a}{w}}' for e, w, a in zip(l, widths, align)])]
    txt += [sepline]
    txt += ['']
    return '\n'.join(txt)
````
````py
def transpose_table(x):
    '''
    Transpose a list of lists

    **Parameters**

    + `x`: a list of lists

    **Example**

    ```py
    x = [[1, 2], [3, 4]]
    print(transpose_table(x)) # yields: [[1, 3], [2, 4]]
    ```
    '''
    return [[e for e in c] for c in zip(*x)]
````
````py
def w_avg(X, sX):
    '''
    Compute variance-weighted average

    Returns the value and SE of the weighted average of the elements of `X`,
    with relative weights equal to their inverse variances (`1/sX**2`).

    **Parameters**

    + `X`: array-like of elements to average
    + `sX`: array-like of the corresponding SE values

    **Tip**

    If `X` and `sX` are initially arranged as a list of `(x, sx)` doublets,
    they may be rearranged using `zip()`:

    ```python
    foo = [(0, 1), (1, 0.5), (2, 0.5)]
    print(w_avg(*zip(*foo))) # yields: (1.3333333333333333, 0.3333333333333333)
    ```
    '''
    X = [x for x in X]
    sX = [sx for sx in sX]
    W = [sx**-2 for sx in sX]
    W = [w/sum(W) for w in W]
    Xavg = sum([w*x for w,x in zip(W,X)])
    sXavg = sum([w**2*sx**2 for w,sx in zip(W,sX)])**.5
    return Xavg, sXavg
````
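In closed form, with inverse-variance weights $w_i = 1/s_i^2$, this computes:

$$\bar{X} = \frac{\sum_i X_i / s_i^2}{\sum_i 1/s_i^2}, \qquad \mathrm{SE}(\bar{X}) = \left(\sum_i 1/s_i^2\right)^{-1/2}$$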
```py
def read_csv(filename, sep = ''):
    '''
    Read contents of `filename` in csv format and return a list of dictionaries.

    In the csv string, spaces before and after field separators (`','` by default)
    are optional.

    **Parameters**

    + `filename`: the csv file to read
    + `sep`: csv separator delimiting the fields. By default, use `,`, `;`, or `\\t`,
      whichever appears most often in the contents of `filename`.
    '''
    with open(filename) as fid:
        txt = fid.read()

    if sep == '':
        sep = sorted(',;\t', key = lambda x: - txt.count(x))[0]
    txt = [[x.strip() for x in l.split(sep)] for l in txt.splitlines() if l.strip()]
    return [{k: smart_type(v) for k,v in zip(txt[0], l) if v} for l in txt[1:]]
```
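For instance, given a hypothetical file `mydata.csv` containing:

```
UID, Sample, d45
A01, FOO-1, 5.795
```

`read_csv('mydata.csv')` returns `[{'UID': 'A01', 'Sample': 'FOO-1', 'd45': 5.795}]`, with numeric fields converted by `smart_type()`.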
```py
def simulate_single_analysis(
    sample = 'MYSAMPLE',
    d13Cwg_VPDB = -4., d18Owg_VSMOW = 26.,
    d13C_VPDB = None, d18O_VPDB = None,
    D47 = None, D48 = None, D49 = 0., D17O = 0.,
    a47 = 1., b47 = 0., c47 = -0.9,
    a48 = 1., b48 = 0., c48 = -0.45,
    Nominal_D47 = None,
    Nominal_D48 = None,
    Nominal_d13C_VPDB = None,
    Nominal_d18O_VPDB = None,
    ALPHA_18O_ACID_REACTION = None,
    R13_VPDB = None,
    R17_VSMOW = None,
    R18_VSMOW = None,
    LAMBDA_17 = None,
    R18_VPDB = None,
    ):
    '''
    Compute working-gas delta values for a single analysis, assuming a stochastic working
    gas and a “perfect” measurement (i.e. raw Δ values are identical to absolute values).

    **Parameters**

    + `sample`: sample name
    + `d13Cwg_VPDB`, `d18Owg_VSMOW`: bulk composition of the working gas
      (respectively –4 and +26 ‰ by default)
    + `d13C_VPDB`, `d18O_VPDB`: bulk composition of the carbonate sample
    + `D47`, `D48`, `D49`, `D17O`: clumped-isotope and oxygen-17 anomalies
      of the carbonate sample
    + `Nominal_D47`, `Nominal_D48`: where to look up Δ47 and
      Δ48 values if `D47` or `D48` are not specified
    + `Nominal_d13C_VPDB`, `Nominal_d18O_VPDB`: where to look up δ13C and
      δ18O values if `d13C_VPDB` or `d18O_VPDB` are not specified
    + `ALPHA_18O_ACID_REACTION`: 18O/16O acid fractionation factor
    + `R13_VPDB`, `R17_VSMOW`, `R18_VSMOW`, `LAMBDA_17`, `R18_VPDB`: oxygen-17
      correction parameters (by default equal to the `D4xdata` default values)

    Returns a dictionary with fields
    `['Sample', 'D17O', 'd13Cwg_VPDB', 'd18Owg_VSMOW', 'd45', 'd46', 'd47', 'd48', 'd49']`.
    '''

    if Nominal_d13C_VPDB is None:
        Nominal_d13C_VPDB = D4xdata().Nominal_d13C_VPDB

    if Nominal_d18O_VPDB is None:
        Nominal_d18O_VPDB = D4xdata().Nominal_d18O_VPDB

    if ALPHA_18O_ACID_REACTION is None:
        ALPHA_18O_ACID_REACTION = D4xdata().ALPHA_18O_ACID_REACTION

    if R13_VPDB is None:
        R13_VPDB = D4xdata().R13_VPDB

    if R17_VSMOW is None:
        R17_VSMOW = D4xdata().R17_VSMOW

    if R18_VSMOW is None:
        R18_VSMOW = D4xdata().R18_VSMOW

    if LAMBDA_17 is None:
        LAMBDA_17 = D4xdata().LAMBDA_17

    if R18_VPDB is None:
        R18_VPDB = D4xdata().R18_VPDB

    R17_VPDB = R17_VSMOW * (R18_VPDB / R18_VSMOW) ** LAMBDA_17

    if Nominal_D47 is None:
        Nominal_D47 = D47data().Nominal_D47

    if Nominal_D48 is None:
        Nominal_D48 = D48data().Nominal_D48

    if d13C_VPDB is None:
        if sample in Nominal_d13C_VPDB:
            d13C_VPDB = Nominal_d13C_VPDB[sample]
        else:
            raise KeyError(f"Sample {sample} is missing d13C_VPDB value, and it is not defined in Nominal_d13C_VPDB.")

    if d18O_VPDB is None:
        if sample in Nominal_d18O_VPDB:
            d18O_VPDB = Nominal_d18O_VPDB[sample]
        else:
            raise KeyError(f"Sample {sample} is missing d18O_VPDB value, and it is not defined in Nominal_d18O_VPDB.")

    if D47 is None:
        if sample in Nominal_D47:
            D47 = Nominal_D47[sample]
        else:
            raise KeyError(f"Sample {sample} is missing D47 value, and it is not defined in Nominal_D47.")

    if D48 is None:
        if sample in Nominal_D48:
            D48 = Nominal_D48[sample]
        else:
            raise KeyError(f"Sample {sample} is missing D48 value, and it is not defined in Nominal_D48.")

    X = D4xdata()
    X.R13_VPDB = R13_VPDB
    X.R17_VSMOW = R17_VSMOW
    X.R18_VSMOW = R18_VSMOW
    X.LAMBDA_17 = LAMBDA_17
    X.R18_VPDB = R18_VPDB
    X.R17_VPDB = R17_VSMOW * (R18_VPDB / R18_VSMOW)**LAMBDA_17

    R45wg, R46wg, R47wg, R48wg, R49wg = X.compute_isobar_ratios(
        R13 = R13_VPDB * (1 + d13Cwg_VPDB/1000),
        R18 = R18_VSMOW * (1 + d18Owg_VSMOW/1000),
        )
    R45, R46, R47, R48, R49 = X.compute_isobar_ratios(
        R13 = R13_VPDB * (1 + d13C_VPDB/1000),
        R18 = R18_VPDB * (1 + d18O_VPDB/1000) * ALPHA_18O_ACID_REACTION,
        D17O = D17O, D47 = D47, D48 = D48, D49 = D49,
        )
    R45stoch, R46stoch, R47stoch, R48stoch, R49stoch = X.compute_isobar_ratios(
        R13 = R13_VPDB * (1 + d13C_VPDB/1000),
        R18 = R18_VPDB * (1 + d18O_VPDB/1000) * ALPHA_18O_ACID_REACTION,
        D17O = D17O,
        )

    d45 = 1000 * (R45/R45wg - 1)
    d46 = 1000 * (R46/R46wg - 1)
    d47 = 1000 * (R47/R47wg - 1)
    d48 = 1000 * (R48/R48wg - 1)
    d49 = 1000 * (R49/R49wg - 1)

    for k in range(3): # dumb iteration to adjust for small changes in d47
        R47raw = (1 + (a47 * D47 + b47 * d47 + c47)/1000) * R47stoch
        R48raw = (1 + (a48 * D48 + b48 * d48 + c48)/1000) * R48stoch
        d47 = 1000 * (R47raw/R47wg - 1)
        d48 = 1000 * (R48raw/R48wg - 1)

    return dict(
        Sample = sample,
        D17O = D17O,
        d13Cwg_VPDB = d13Cwg_VPDB,
        d18Owg_VSMOW = d18Owg_VSMOW,
        d45 = d45,
        d46 = d46,
        d47 = d47,
        d48 = d48,
        d49 = d49,
        )
```
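A quick sketch (ETH-3 is one of the default anchors, so its nominal δ13C, δ18O and Δ47/Δ48 values are looked up automatically):

```py
a = simulate_single_analysis(sample = 'ETH-3')
print(a['d45'], a['d47'])  # synthetic raw deltas for a 'perfect' ETH-3 analysis
```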
````py
def virtual_data(
    samples = [],
    a47 = 1., b47 = 0., c47 = -0.9,
    a48 = 1., b48 = 0., c48 = -0.45,
    rd45 = 0.020, rd46 = 0.060,
    rD47 = 0.015, rD48 = 0.045,
    d13Cwg_VPDB = None, d18Owg_VSMOW = None,
    session = None,
    Nominal_D47 = None, Nominal_D48 = None,
    Nominal_d13C_VPDB = None, Nominal_d18O_VPDB = None,
    ALPHA_18O_ACID_REACTION = None,
    R13_VPDB = None,
    R17_VSMOW = None,
    R18_VSMOW = None,
    LAMBDA_17 = None,
    R18_VPDB = None,
    seed = 0,
    shuffle = True,
    ):
    '''
    Return a list of simulated analyses from a single session.

    **Parameters**

    + `samples`: a list of entries; each entry is a dictionary with the following fields:
        * `Sample`: the name of the sample
        * `d13C_VPDB`, `d18O_VPDB`: bulk composition of the carbonate sample
        * `D47`, `D48`, `D49`, `D17O` (all optional): clumped-isotope and oxygen-17 anomalies of the carbonate sample
        * `N`: how many analyses to generate for this sample
    + `a47`: scrambling factor for Δ47
    + `b47`: compositional nonlinearity for Δ47
    + `c47`: working gas offset for Δ47
    + `a48`: scrambling factor for Δ48
    + `b48`: compositional nonlinearity for Δ48
    + `c48`: working gas offset for Δ48
    + `rd45`: analytical repeatability of δ45
    + `rd46`: analytical repeatability of δ46
    + `rD47`: analytical repeatability of Δ47
    + `rD48`: analytical repeatability of Δ48
    + `d13Cwg_VPDB`, `d18Owg_VSMOW`: bulk composition of the working gas
      (by default equal to the `simulate_single_analysis` default values)
    + `session`: name of the session (no name by default)
    + `Nominal_D47`, `Nominal_D48`: where to look up Δ47 and Δ48 values
      if `D47` or `D48` are not specified (by default equal to the `simulate_single_analysis` defaults)
    + `Nominal_d13C_VPDB`, `Nominal_d18O_VPDB`: where to look up δ13C and
      δ18O values if `d13C_VPDB` or `d18O_VPDB` are not specified
      (by default equal to the `simulate_single_analysis` defaults)
    + `ALPHA_18O_ACID_REACTION`: 18O/16O acid fractionation factor
      (by default equal to the `simulate_single_analysis` defaults)
    + `R13_VPDB`, `R17_VSMOW`, `R18_VSMOW`, `LAMBDA_17`, `R18_VPDB`: oxygen-17
      correction parameters (by default equal to the `simulate_single_analysis` default)
    + `seed`: explicitly set to a non-zero value to achieve random but repeatable simulations
    + `shuffle`: randomly reorder the sequence of analyses

    Here is an example of using this function to generate an arbitrary combination of
    anchors and unknowns for a bunch of sessions:

    ```py
    .. include:: ../../code_examples/virtual_data/example.py
    ```

    This should output something like:

    ```
    .. include:: ../../code_examples/virtual_data/output.txt
    ```
    '''

    kwargs = locals().copy()

    from numpy import random as nprandom
    if seed:
        rng = nprandom.default_rng(seed)
    else:
        rng = nprandom.default_rng()

    N = sum([s['N'] for s in samples])
    errors45 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
    errors45 *= rd45 / stdev(errors45) # scale errors to rd45
    errors46 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
    errors46 *= rd46 / stdev(errors46) # scale errors to rd46
    errors47 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
    errors47 *= rD47 / stdev(errors47) # scale errors to rD47
    errors48 = rng.normal(loc = 0, scale = 1, size = N) # generate random measurement errors
    errors48 *= rD48 / stdev(errors48) # scale errors to rD48

    k = 0
    out = []
    for s in samples:
        kw = {}
        kw['sample'] = s['Sample']
        kw = {
            **kw,
            **{var: kwargs[var]
                for var in [
                    'd13Cwg_VPDB', 'd18Owg_VSMOW', 'ALPHA_18O_ACID_REACTION',
                    'Nominal_D47', 'Nominal_D48', 'Nominal_d13C_VPDB', 'Nominal_d18O_VPDB',
                    'R13_VPDB', 'R17_VSMOW', 'R18_VSMOW', 'LAMBDA_17', 'R18_VPDB',
                    'a47', 'b47', 'c47', 'a48', 'b48', 'c48',
                    ]
                if kwargs[var] is not None},
            **{var: s[var]
                for var in ['d13C_VPDB', 'd18O_VPDB', 'D47', 'D48', 'D49', 'D17O']
                if var in s},
            }

        sN = s['N']
        while sN:
            out.append(simulate_single_analysis(**kw))
            out[-1]['d45'] += errors45[k]
            out[-1]['d46'] += errors46[k]
            out[-1]['d47'] += (errors45[k] + errors46[k] + errors47[k]) * a47
            out[-1]['d48'] += (2*errors46[k] + errors48[k]) * a48
            sN -= 1
            k += 1

    if session is not None:
        for r in out:
            r['Session'] = session

    if shuffle:
        nprandom.shuffle(out)

    return out
````
Here is an example of using this function to generate an arbitrary combination of anchors and unknowns for a bunch of sessions:
```py
from D47crunch import virtual_data, D47data

args = dict(
    samples = [
        dict(Sample = 'ETH-1', N = 3),
        dict(Sample = 'ETH-2', N = 3),
        dict(Sample = 'ETH-3', N = 3),
        dict(Sample = 'FOO', N = 3,
            d13C_VPDB = -5., d18O_VPDB = -10.,
            D47 = 0.3, D48 = 0.15),
        dict(Sample = 'BAR', N = 3,
            d13C_VPDB = -15., d18O_VPDB = -2.,
            D47 = 0.6, D48 = 0.2),
        ], rD47 = 0.010, rD48 = 0.030)

session1 = virtual_data(session = 'Session_01', **args, seed = 123)
session2 = virtual_data(session = 'Session_02', **args, seed = 1234)
session3 = virtual_data(session = 'Session_03', **args, seed = 12345)
session4 = virtual_data(session = 'Session_04', **args, seed = 123456)

D = D47data(session1 + session2 + session3 + session4)

D.crunch()
D.standardize()

D.table_of_sessions(verbose = True, save_to_file = False)
D.table_of_samples(verbose = True, save_to_file = False)
D.table_of_analyses(verbose = True, save_to_file = False)
```
This should output something like:
```
[table_of_sessions]
–––––––––– –– –– ––––––––––– –––––––––––– –––––– –––––– –––––– ––––––––––––– ––––––––––––– ––––––––––––––
Session Na Nu d13Cwg_VPDB d18Owg_VSMOW r_d13C r_d18O r_D47 a ± SE 1e3 x b ± SE c ± SE
–––––––––– –– –– ––––––––––– –––––––––––– –––––– –––––– –––––– ––––––––––––– ––––––––––––– ––––––––––––––
Session_01 9 6 -4.000 26.000 0.0205 0.0633 0.0075 1.015 ± 0.015 0.427 ± 0.232 -0.909 ± 0.006
Session_02 9 6 -4.000 26.000 0.0210 0.0882 0.0082 0.990 ± 0.015 0.484 ± 0.232 -0.905 ± 0.006
Session_03 9 6 -4.000 26.000 0.0186 0.0505 0.0091 0.997 ± 0.015 0.167 ± 0.233 -0.901 ± 0.006
Session_04 9 6 -4.000 26.000 0.0192 0.0467 0.0070 1.017 ± 0.015 0.229 ± 0.232 -0.910 ± 0.006
–––––––––– –– –– ––––––––––– –––––––––––– –––––– –––––– –––––– ––––––––––––– ––––––––––––– ––––––––––––––
[table_of_samples]
–––––– –– ––––––––– –––––––––– –––––– –––––– –––––––– –––––– ––––––––
Sample N d13C_VPDB d18O_VSMOW D47 SE 95% CL SD p_Levene
–––––– –– ––––––––– –––––––––– –––––– –––––– –––––––– –––––– ––––––––
ETH-1 12 2.02 37.01 0.2052 0.0083
ETH-2 12 -10.17 19.88 0.2085 0.0090
ETH-3 12 1.71 37.46 0.6132 0.0083
BAR 12 -15.02 37.22 0.6057 0.0042 ± 0.0085 0.0088 0.753
FOO 12 -5.00 28.89 0.3024 0.0031 ± 0.0062 0.0070 0.497
–––––– –– ––––––––– –––––––––– –––––– –––––– –––––––– –––––– ––––––––
[table_of_analyses]
––– –––––––––– –––––– ––––––––––– –––––––––––– ––––––––– ––––––––– –––––––––– –––––––––– –––––––––– –––––––––– –––––––––– ––––––––– ––––––––– ––––––––– ––––––––
UID Session Sample d13Cwg_VPDB d18Owg_VSMOW d45 d46 d47 d48 d49 d13C_VPDB d18O_VSMOW D47raw D48raw D49raw D47
––– –––––––––– –––––– ––––––––––– –––––––––––– ––––––––– ––––––––– –––––––––– –––––––––– –––––––––– –––––––––– –––––––––– ––––––––– ––––––––– ––––––––– ––––––––
1 Session_01 ETH-3 -4.000 26.000 5.755174 11.255104 16.792797 22.451660 28.306614 1.723596 37.497816 -0.270825 -0.181089 -0.195908 0.621458
2 Session_01 BAR -4.000 26.000 -9.959983 10.926995 0.053806 21.724901 10.707292 -15.041279 37.199026 -0.300066 -0.243252 -0.029371 0.599675
3 Session_01 ETH-3 -4.000 26.000 5.734896 11.229855 16.740410 22.402091 28.306614 1.702875 37.472070 -0.276998 -0.179635 -0.125368 0.615396
4 Session_01 FOO -4.000 26.000 -0.838118 2.819853 1.310384 5.326005 4.665655 -5.004629 28.895933 -0.593755 -0.319861 0.014956 0.309692
5 Session_01 FOO -4.000 26.000 -0.848028 2.874679 1.346196 5.439150 4.665655 -5.017230 28.951964 -0.601502 -0.316664 -0.081898 0.302042
6 Session_01 ETH-1 -4.000 26.000 6.010276 10.840276 16.207960 21.475150 27.780042 2.011176 37.073454 -0.704188 -0.315986 -0.172089 0.194589
7 Session_01 BAR -4.000 26.000 -9.920507 10.903408 0.065076 21.704075 10.707292 -14.998270 37.174839 -0.307018 -0.216978 -0.026076 0.592818
8 Session_01 ETH-2 -4.000 26.000 -5.982229 -6.110437 -12.827036 -12.492272 -18.023381 -10.166188 19.784916 -0.693555 -0.312598 0.251040 0.217274
9 Session_01 ETH-2 -4.000 26.000 -5.991278 -5.995054 -12.741562 -12.184075 -18.023381 -10.180122 19.902809 -0.711697 -0.232746 0.032602 0.199357
10 Session_01 BAR -4.000 26.000 -9.915975 10.968470 0.153453 21.749385 10.707292 -14.995822 37.241294 -0.286638 -0.301325 -0.157376 0.612868
11 Session_01 ETH-3 -4.000 26.000 5.727341 11.211663 16.713472 22.364770 28.306614 1.695479 37.453503 -0.278056 -0.180158 -0.082015 0.614365
12 Session_01 ETH-2 -4.000 26.000 -5.974124 -5.955517 -12.668784 -12.208184 -18.023381 -10.163274 19.943159 -0.694902 -0.336672 -0.063946 0.215880
13 Session_01 ETH-1 -4.000 26.000 6.049381 10.706856 16.135579 21.196941 27.780042 2.057827 36.937067 -0.685751 -0.324384 0.045870 0.212791
14 Session_01 FOO -4.000 26.000 -0.876454 2.906764 1.341194 5.490264 4.665655 -5.048760 28.984806 -0.608593 -0.329808 -0.114437 0.295055
15 Session_01 ETH-1 -4.000 26.000 5.995601 10.755323 16.116087 21.285428 27.780042 1.998631 36.986704 -0.696924 -0.333640 0.008600 0.201787
16 Session_02 ETH-2 -4.000 26.000 -5.982371 -6.036210 -12.762399 -12.309944 -18.023381 -10.175178 19.819614 -0.701348 -0.277354 0.104418 0.212021
17 Session_02 BAR -4.000 26.000 -9.963888 10.865863 -0.023549 21.615868 10.707292 -15.053743 37.174715 -0.313906 -0.229031 0.093637 0.597041
18 Session_02 ETH-3 -4.000 26.000 5.719281 11.207303 16.681693 22.370886 28.306614 1.691780 37.488633 -0.296801 -0.165556 -0.065004 0.606143
19 Session_02 FOO -4.000 26.000 -0.848415 2.849823 1.308081 5.427767 4.665655 -5.018107 28.927036 -0.614791 -0.278426 -0.032784 0.292547
20 Session_02 ETH-1 -4.000 26.000 5.993918 10.617469 15.991900 21.070358 27.780042 2.006934 36.882679 -0.683329 -0.271476 0.278458 0.216152
21 Session_02 BAR -4.000 26.000 -9.957566 10.903888 0.031785 21.739434 10.707292 -15.048386 37.213724 -0.302139 -0.183327 0.012926 0.608897
22 Session_02 ETH-3 -4.000 26.000 5.716356 11.091821 16.582487 22.123857 28.306614 1.692901 37.370126 -0.279100 -0.178789 0.162540 0.624067
23 Session_02 ETH-2 -4.000 26.000 -5.950370 -5.959974 -12.650784 -12.197864 -18.023381 -10.143809 19.897777 -0.696916 -0.317263 -0.080604 0.216441
24 Session_02 ETH-3 -4.000 26.000 5.757137 11.232751 16.744567 22.398244 28.306614 1.731295 37.514660 -0.298533 -0.189123 -0.154557 0.604363
25 Session_02 FOO -4.000 26.000 -0.819742 2.826793 1.317044 5.330616 4.665655 -4.986618 28.903335 -0.612871 -0.329113 -0.018244 0.294481
26 Session_02 ETH-1 -4.000 26.000 6.019963 10.773112 16.163825 21.331060 27.780042 2.029040 37.042346 -0.692234 -0.324161 -0.051788 0.207075
27 Session_02 BAR -4.000 26.000 -9.936020 10.862339 0.024660 21.563307 10.707292 -15.023836 37.171034 -0.291333 -0.273498 0.070452 0.619812
28 Session_02 ETH-2 -4.000 26.000 -5.993476 -5.944866 -12.696865 -12.149754 -18.023381 -10.190430 19.913381 -0.713779 -0.298963 -0.064251 0.199436
29 Session_02 FOO -4.000 26.000 -0.835046 2.870518 1.355370 5.487896 4.665655 -5.004585 28.948243 -0.601666 -0.259900 -0.087592 0.305777
30 Session_02 ETH-1 -4.000 26.000 6.030532 10.851030 16.245571 21.457100 27.780042 2.037466 37.122284 -0.698413 -0.354920 -0.214443 0.200795
31 Session_03 BAR -4.000 26.000 -9.952115 11.034508 0.169809 21.885915 10.707292 -15.002819 37.370451 -0.296804 -0.298351 -0.246731 0.606414
32 Session_03 BAR -4.000 26.000 -9.957114 10.898997 0.044946 21.602296 10.707292 -15.003175 37.230716 -0.284699 -0.307849 0.021944 0.618578
33 Session_03 FOO -4.000 26.000 -0.823857 2.761300 1.258060 5.239992 4.665655 -4.973383 28.817444 -0.603327 -0.288652 0.114488 0.298751
34 Session_03 ETH-3 -4.000 26.000 5.753467 11.206589 16.719131 22.373244 28.306614 1.723960 37.511190 -0.294350 -0.161838 -0.099835 0.606103
35 Session_03 ETH-1 -4.000 26.000 6.040566 10.786620 16.205283 21.374963 27.780042 2.045244 37.077432 -0.685706 -0.307909 -0.099869 0.213609
36 Session_03 ETH-1 -4.000 26.000 5.994622 10.743980 16.116098 21.243734 27.780042 1.997857 37.033567 -0.684883 -0.352014 0.031692 0.214449
37 Session_03 BAR -4.000 26.000 -9.928709 10.989665 0.148059 21.852677 10.707292 -14.976237 37.324152 -0.299358 -0.242185 -0.184835 0.603855
38 Session_03 ETH-2 -4.000 26.000 -6.000290 -5.947172 -12.697463 -12.164602 -18.023381 -10.167221 19.848953 -0.705037 -0.309350 -0.052386 0.199061
39 Session_03 ETH-3 -4.000 26.000 5.748546 11.079879 16.580826 22.120063 28.306614 1.723364 37.380534 -0.302133 -0.158882 0.151641 0.598318
40 Session_03 FOO -4.000 26.000 -0.873798 2.820799 1.272165 5.370745 4.665655 -5.028782 28.878917 -0.596008 -0.277258 0.051165 0.306090
41 Session_03 ETH-3 -4.000 26.000 5.718991 11.146227 16.640814 22.243185 28.306614 1.689442 37.449023 -0.277332 -0.169668 0.053997 0.623187
42 Session_03 ETH-1 -4.000 26.000 6.004078 10.683951 16.045192 21.214355 27.780042 2.010134 36.971642 -0.705956 -0.262026 0.138399 0.193323
43 Session_03 FOO -4.000 26.000 -0.800284 2.851299 1.376828 5.379547 4.665655 -4.951581 28.910199 -0.597293 -0.329315 -0.087015 0.304784
44 Session_03 ETH-2 -4.000 26.000 -5.997147 -5.905858 -12.655382 -12.081612 -18.023381 -10.165400 19.891551 -0.706536 -0.308464 -0.137414 0.197550
45 Session_03 ETH-2 -4.000 26.000 -6.008525 -5.909707 -12.647727 -12.075913 -18.023381 -10.177379 19.887608 -0.683183 -0.294956 -0.117608 0.220975
46 Session_04 ETH-1 -4.000 26.000 6.023822 10.730714 16.121184 21.235757 27.780042 2.012958 36.989833 -0.696908 -0.333582 0.026555 0.205610
47 Session_04 ETH-3 -4.000 26.000 5.739420 11.128582 16.641344 22.166106 28.306614 1.695046 37.399884 -0.280608 -0.210162 0.066645 0.614665
48 Session_04 BAR -4.000 26.000 -9.951025 10.951923 0.089386 21.738926 10.707292 -15.031949 37.254709 -0.298065 -0.278834 -0.087463 0.601230
49 Session_04 FOO -4.000 26.000 -0.848192 2.777763 1.251297 5.280272 4.665655 -5.023358 28.822585 -0.601094 -0.281419 0.108186 0.303128
50 Session_04 BAR -4.000 26.000 -9.931741 10.819830 -0.023748 21.529372 10.707292 -15.006533 37.118743 -0.302866 -0.222623 0.148462 0.596536
51 Session_04 FOO -4.000 26.000 -0.853969 2.805035 1.267571 5.353907 4.665655 -5.030523 28.850660 -0.605611 -0.262571 0.060903 0.298685
52 Session_04 ETH-3 -4.000 26.000 5.751908 11.207110 16.726741 22.380392 28.306614 1.705481 37.480657 -0.285776 -0.155878 -0.099197 0.609567
53 Session_04 ETH-3 -4.000 26.000 5.798016 11.254135 16.832228 22.432473 28.306614 1.752928 37.528936 -0.275047 -0.197935 -0.239408 0.620088
54 Session_04 FOO -4.000 26.000 -0.791191 2.708220 1.256167 5.145784 4.665655 -4.960004 28.750896 -0.586913 -0.276505 0.183674 0.317065
55 Session_04 ETH-2 -4.000 26.000 -5.966627 -5.893789 -12.597717 -12.120719 -18.023381 -10.161842 19.911776 -0.691757 -0.372308 -0.193986 0.217132
56 Session_04 ETH-1 -4.000 26.000 6.017312 10.735930 16.123043 21.270597 27.780042 2.005824 36.995214 -0.693479 -0.309795 0.023309 0.208980
57 Session_04 ETH-2 -4.000 26.000 -5.986501 -5.915157 -12.656583 -12.060382 -18.023381 -10.182247 19.889836 -0.709603 -0.268277 -0.130450 0.199604
58 Session_04 ETH-2 -4.000 26.000 -5.973623 -5.975018 -12.694278 -12.194472 -18.023381 -10.166297 19.828211 -0.701951 -0.283570 -0.025935 0.207135
59 Session_04 ETH-1 -4.000 26.000 6.029937 10.766997 16.151273 21.345479 27.780042 2.018148 37.027152 -0.708855 -0.297953 -0.050465 0.193862
60 Session_04 BAR -4.000 26.000 -9.926078 10.884823 0.060864 21.650722 10.707292 -15.002880 37.185606 -0.287358 -0.232425 0.016044 0.611760
––– –––––––––– –––––– ––––––––––– –––––––––––– ––––––––– ––––––––– –––––––––– –––––––––– –––––––––– –––––––––– –––––––––– ––––––––– ––––––––– ––––––––– ––––––––
```
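Note that `seed` only controls the generation of the analytical errors, whereas `shuffle` relies on NumPy's global random state (see the final lines of the source above). A minimal sketch, assuming one disables shuffling, of how two calls can be made strictly reproducible:

```py
from D47crunch import virtual_data

# identical seeds and no shuffling yield identical simulated analyses:
a = virtual_data(samples = [dict(Sample = 'ETH-1', N = 2)], seed = 42, shuffle = False)
b = virtual_data(samples = [dict(Sample = 'ETH-1', N = 2)], seed = 42, shuffle = False)
assert a == b
```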
````py
def table_of_samples(
    data47 = None,
    data48 = None,
    dir = 'output',
    filename = None,
    save_to_file = True,
    print_out = True,
    output = None,
    ):
    '''
    Print out, save to disk and/or return a combined table of samples
    for a pair of `D47data` and `D48data` objects.

    **Parameters**

    + `data47`: `D47data` instance
    + `data48`: `D48data` instance
    + `dir`: the directory in which to save the table
    + `filename`: the name of the csv file to write to
    + `save_to_file`: whether to save the table to disk
    + `print_out`: whether to print out the table
    + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
      if set to `'raw'`: return a list of list of strings
      (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
    '''
    if data47 is None:
        if data48 is None:
            raise TypeError("Arguments must include at least one D47data() or D48data() instance.")
        else:
            return data48.table_of_samples(
                dir = dir,
                filename = filename,
                save_to_file = save_to_file,
                print_out = print_out,
                output = output
                )
    else:
        if data48 is None:
            return data47.table_of_samples(
                dir = dir,
                filename = filename,
                save_to_file = save_to_file,
                print_out = print_out,
                output = output
                )
        else:
            out47 = data47.table_of_samples(save_to_file = False, print_out = False, output = 'raw')
            out48 = data48.table_of_samples(save_to_file = False, print_out = False, output = 'raw')
            out = transpose_table(transpose_table(out47) + transpose_table(out48)[4:])

            if save_to_file:
                if not os.path.exists(dir):
                    os.makedirs(dir)
                if filename is None:
                    filename = f'D47D48_samples.csv'
                with open(f'{dir}/{filename}', 'w') as fid:
                    fid.write(make_csv(out))
            if print_out:
                print('\n'+pretty_table(out))
            if output == 'raw':
                return out
            elif output == 'pretty':
                return pretty_table(out)
````
````py
def table_of_sessions(
    data47 = None,
    data48 = None,
    dir = 'output',
    filename = None,
    save_to_file = True,
    print_out = True,
    output = None,
    ):
    '''
    Print out, save to disk and/or return a combined table of sessions
    for a pair of `D47data` and `D48data` objects.
    ***Only applicable if the sessions in `data47` and those in `data48`
    consist of the exact same sets of analyses.***

    **Parameters**

    + `data47`: `D47data` instance
    + `data48`: `D48data` instance
    + `dir`: the directory in which to save the table
    + `filename`: the name of the csv file to write to
    + `save_to_file`: whether to save the table to disk
    + `print_out`: whether to print out the table
    + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
      if set to `'raw'`: return a list of list of strings
      (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
    '''
    if data47 is None:
        if data48 is None:
            raise TypeError("Arguments must include at least one D47data() or D48data() instance.")
        else:
            return data48.table_of_sessions(
                dir = dir,
                filename = filename,
                save_to_file = save_to_file,
                print_out = print_out,
                output = output
                )
    else:
        if data48 is None:
            return data47.table_of_sessions(
                dir = dir,
                filename = filename,
                save_to_file = save_to_file,
                print_out = print_out,
                output = output
                )
        else:
            out47 = data47.table_of_sessions(save_to_file = False, print_out = False, output = 'raw')
            out48 = data48.table_of_sessions(save_to_file = False, print_out = False, output = 'raw')
            for k,x in enumerate(out47[0]):
                if k>7:
                    out47[0][k] = out47[0][k].replace('a', 'a_47').replace('b', 'b_47').replace('c', 'c_47')
                    out48[0][k] = out48[0][k].replace('a', 'a_48').replace('b', 'b_48').replace('c', 'c_48')
            out = transpose_table(transpose_table(out47) + transpose_table(out48)[7:])

            if save_to_file:
                if not os.path.exists(dir):
                    os.makedirs(dir)
                if filename is None:
                    filename = f'D47D48_sessions.csv'
                with open(f'{dir}/{filename}', 'w') as fid:
                    fid.write(make_csv(out))
            if print_out:
                print('\n'+pretty_table(out))
            if output == 'raw':
                return out
            elif output == 'pretty':
                return pretty_table(out)
````
````py
def table_of_analyses(
    data47 = None,
    data48 = None,
    dir = 'output',
    filename = None,
    save_to_file = True,
    print_out = True,
    output = None,
    ):
    '''
    Print out, save to disk and/or return a combined table of analyses
    for a pair of `D47data` and `D48data` objects.

    If the sessions in `data47` and those in `data48` do not consist of
    the exact same sets of analyses, the table will have two columns
    `Session_47` and `Session_48` instead of a single `Session` column.

    **Parameters**

    + `data47`: `D47data` instance
    + `data48`: `D48data` instance
    + `dir`: the directory in which to save the table
    + `filename`: the name of the csv file to write to
    + `save_to_file`: whether to save the table to disk
    + `print_out`: whether to print out the table
    + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
      if set to `'raw'`: return a list of list of strings
      (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
    '''
    if data47 is None:
        if data48 is None:
            raise TypeError("Arguments must include at least one D47data() or D48data() instance.")
        else:
            return data48.table_of_analyses(
                dir = dir,
                filename = filename,
                save_to_file = save_to_file,
                print_out = print_out,
                output = output
                )
    else:
        if data48 is None:
            return data47.table_of_analyses(
                dir = dir,
                filename = filename,
                save_to_file = save_to_file,
                print_out = print_out,
                output = output
                )
        else:
            out47 = data47.table_of_analyses(save_to_file = False, print_out = False, output = 'raw')
            out48 = data48.table_of_analyses(save_to_file = False, print_out = False, output = 'raw')

            if [l[1] for l in out47[1:]] == [l[1] for l in out48[1:]]: # if sessions are identical
                out = transpose_table(transpose_table(out47) + transpose_table(out48)[-1:])
            else:
                out47[0][1] = 'Session_47'
                out48[0][1] = 'Session_48'
                out47 = transpose_table(out47)
                out48 = transpose_table(out48)
                out = transpose_table(out47[:2] + out48[1:2] + out47[2:] + out48[-1:])

            if save_to_file:
                if not os.path.exists(dir):
                    os.makedirs(dir)
                if filename is None:
                    filename = f'D47D48_analyses.csv'
                with open(f'{dir}/{filename}', 'w') as fid:
                    fid.write(make_csv(out))
            if print_out:
                print('\n'+pretty_table(out))
            if output == 'raw':
                return out
            elif output == 'pretty':
                return pretty_table(out)
````
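These three functions share the same calling convention. A minimal sketch combining Δ47 and Δ48 results, assuming `session1` through `session4` are the lists returned by `virtual_data()` in the example above (which include Δ48 values for all samples):

```py
from D47crunch import D47data, D48data, table_of_samples, table_of_sessions, table_of_analyses

# process the same simulated analyses for both D47 and D48:
data47 = D47data(session1 + session2 + session3 + session4)
data47.crunch()
data47.standardize()

data48 = D48data(session1 + session2 + session3 + session4)
data48.crunch()
data48.standardize()

# combined tables; the session table renames the standardization
# parameters to a_47/b_47/c_47 and a_48/b_48/c_48:
table_of_sessions(data47, data48)
table_of_samples(data47, data48)
table_of_analyses(data47, data48)
```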
````py
class D4xdata(list):
    '''
    Store and process data for a large set of Δ47 and/or Δ48
    analyses, usually comprising more than one analytical session.
    '''

    ### 17O CORRECTION PARAMETERS
    R13_VPDB = 0.01118 # (Chang & Li, 1990)
    '''
    Absolute (13C/12C) ratio of VPDB.
    By default equal to 0.01118 ([Chang & Li, 1990](http://www.cnki.com.cn/Article/CJFDTotal-JXTW199004006.htm))
    '''

    R18_VSMOW = 0.0020052 # (Baertschi, 1976)
    '''
    Absolute (18O/16O) ratio of VSMOW.
    By default equal to 0.0020052 ([Baertschi, 1976](https://doi.org/10.1016/0012-821X(76)90115-1))
    '''

    LAMBDA_17 = 0.528 # (Barkan & Luz, 2005)
    '''
    Mass-dependent exponent for triple oxygen isotopes.
    By default equal to 0.528 ([Barkan & Luz, 2005](https://doi.org/10.1002/rcm.2250))
    '''

    R17_VSMOW = 0.00038475 # (Assonov & Brenninkmeijer, 2003, rescaled to R13_VPDB)
    '''
    Absolute (17O/16O) ratio of VSMOW.
    By default equal to 0.00038475
    ([Assonov & Brenninkmeijer, 2003](https://dx.doi.org/10.1002/rcm.1011),
    rescaled to `R13_VPDB`)
    '''

    R18_VPDB = R18_VSMOW * 1.03092
    '''
    Absolute (18O/16O) ratio of VPDB.
    By definition equal to `R18_VSMOW * 1.03092`.
    '''

    R17_VPDB = R17_VSMOW * 1.03092 ** LAMBDA_17
    '''
    Absolute (17O/16O) ratio of VPDB.
    By definition equal to `R17_VSMOW * 1.03092 ** LAMBDA_17`.
    '''

    LEVENE_REF_SAMPLE = 'ETH-3'
    '''
    After the Δ4x standardization step, each sample is tested to
    assess whether the Δ4x variance within all analyses for that
    sample differs significantly from that observed for a given reference
    sample (using [Levene's test](https://en.wikipedia.org/wiki/Levene%27s_test),
    which yields a p-value corresponding to the null hypothesis that the
    underlying variances are equal).

    `LEVENE_REF_SAMPLE` (by default equal to `'ETH-3'`) specifies which
    sample should be used as a reference for this test.
    '''

    ALPHA_18O_ACID_REACTION = round(np.exp(3.59 / (90 + 273.15) - 1.79e-3), 6) # (Kim et al., 2007, calcite)
    '''
    Specifies the 18O/16O fractionation factor generally applicable
    to acid reactions in the dataset. Currently used by `D4xdata.wg()`,
    `D4xdata.standardize_d13C`, and `D4xdata.standardize_d18O`.

    By default equal to 1.008129 (calcite reacted at 90 °C,
    [Kim et al., 2007](https://dx.doi.org/10.1016/j.chemgeo.2007.08.005)).
    '''

    Nominal_d13C_VPDB = {
        'ETH-1': 2.02,
        'ETH-2': -10.17,
        'ETH-3': 1.71,
        } # (Bernasconi et al., 2018)
    '''
    Nominal δ13C_VPDB values assigned to carbonate standards, used by
    `D4xdata.standardize_d13C()`.

    By default equal to `{'ETH-1': 2.02, 'ETH-2': -10.17, 'ETH-3': 1.71}` after
    [Bernasconi et al. (2018)](https://doi.org/10.1029/2017GC007385).
    '''

    Nominal_d18O_VPDB = {
        'ETH-1': -2.19,
        'ETH-2': -18.69,
        'ETH-3': -1.78,
        } # (Bernasconi et al., 2018)
    '''
    Nominal δ18O_VPDB values assigned to carbonate standards, used by
    `D4xdata.standardize_d18O()`.

    By default equal to `{'ETH-1': -2.19, 'ETH-2': -18.69, 'ETH-3': -1.78}` after
    [Bernasconi et al. (2018)](https://doi.org/10.1029/2017GC007385).
    '''

    d13C_STANDARDIZATION_METHOD = '2pt'
    '''
    Method by which to standardize δ13C values:

    + `none`: do not apply any δ13C standardization.
    + `'1pt'`: within each session, offset all initial δ13C values so as to
    minimize the difference between final δ13C_VPDB values and
    `Nominal_d13C_VPDB` (averaged over all analyses for which `Nominal_d13C_VPDB` is defined).
    + `'2pt'`: within each session, apply an affine transformation to all δ13C
    values so as to minimize the difference between final δ13C_VPDB
    values and `Nominal_d13C_VPDB` (averaged over all analyses for which `Nominal_d13C_VPDB`
    is defined).
    '''

    d18O_STANDARDIZATION_METHOD = '2pt'
    '''
    Method by which to standardize δ18O values:

    + `none`: do not apply any δ18O standardization.
    + `'1pt'`: within each session, offset all initial δ18O values so as to
    minimize the difference between final δ18O_VPDB values and
    `Nominal_d18O_VPDB` (averaged over all analyses for which `Nominal_d18O_VPDB` is defined).
    + `'2pt'`: within each session, apply an affine transformation to all δ18O
    values so as to minimize the difference between final δ18O_VPDB
    values and `Nominal_d18O_VPDB` (averaged over all analyses for which `Nominal_d18O_VPDB`
    is defined).
    '''
````
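These class-level settings may be overridden on a per-instance basis before any data is read or processed. A minimal sketch (the substituted values below are purely illustrative):

```py
import D47crunch

mydata = D47crunch.D47data()
# use ETH-1 instead of ETH-3 as the reference sample for Levene's test:
mydata.LEVENE_REF_SAMPLE = 'ETH-1'
# skip the per-session d13C standardization step entirely:
mydata.d13C_STANDARDIZATION_METHOD = 'none'
```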
````py
    def __init__(self, l = [], mass = '47', logfile = '', session = 'mySession', verbose = False):
        '''
        **Parameters**

        + `l`: a list of dictionaries, with each dictionary including at least the keys
        `Sample`, `d45`, `d46`, and `d47` or `d48`.
        + `mass`: `'47'` or `'48'`
        + `logfile`: if specified, write detailed logs to this file path when calling `D4xdata` methods.
        + `session`: define session name for analyses without a `Session` key
        + `verbose`: if `True`, print out detailed logs when calling `D4xdata` methods.

        Returns a `D4xdata` object derived from `list`.
        '''
        self._4x = mass
        self.verbose = verbose
        self.prefix = 'D4xdata'
        self.logfile = logfile
        list.__init__(self, l)
        self.Nf = None
        self.repeatability = {}
        self.refresh(session = session)


    def make_verbal(oldfun):
        '''
        Decorator: allow temporarily changing `self.prefix` and overriding `self.verbose`.
        '''
        @wraps(oldfun)
        def newfun(*args, verbose = '', **kwargs):
            myself = args[0]
            oldprefix = myself.prefix
            myself.prefix = oldfun.__name__
            if verbose != '':
                oldverbose = myself.verbose
                myself.verbose = verbose
            out = oldfun(*args, **kwargs)
            myself.prefix = oldprefix
            if verbose != '':
                myself.verbose = oldverbose
            return out
        return newfun


    def msg(self, txt):
        '''
        Log a message to `self.logfile`, and print it out if `verbose = True`
        '''
        self.log(txt)
        if self.verbose:
            print(f'{f"[{self.prefix}]":<16} {txt}')


    def vmsg(self, txt):
        '''
        Log a message to `self.logfile` and print it out
        '''
        self.log(txt)
        print(txt)


    def log(self, *txts):
        '''
        Log a message to `self.logfile`
        '''
        if self.logfile:
            with open(self.logfile, 'a') as fid:
                for txt in txts:
                    fid.write(f'\n{dt.now().strftime("%Y-%m-%d %H:%M:%S")} {f"[{self.prefix}]":<16} {txt}')


    def refresh(self, session = 'mySession'):
        '''
        Update `self.sessions`, `self.samples`, `self.anchors`, and `self.unknowns`.
        '''
        self.fill_in_missing_info(session = session)
        self.refresh_sessions()
        self.refresh_samples()


    def refresh_sessions(self):
        '''
        Update `self.sessions` and set `scrambling_drift`, `slope_drift`, and `wg_drift`
        to `False` for all sessions.
        '''
        self.sessions = {
            s: {'data': [r for r in self if r['Session'] == s]}
            for s in sorted({r['Session'] for r in self})
            }
        for s in self.sessions:
            self.sessions[s]['scrambling_drift'] = False
            self.sessions[s]['slope_drift'] = False
            self.sessions[s]['wg_drift'] = False
            self.sessions[s]['d13C_standardization_method'] = self.d13C_STANDARDIZATION_METHOD
            self.sessions[s]['d18O_standardization_method'] = self.d18O_STANDARDIZATION_METHOD


    def refresh_samples(self):
        '''
        Define `self.samples`, `self.anchors`, and `self.unknowns`.
        '''
        self.samples = {
            s: {'data': [r for r in self if r['Sample'] == s]}
            for s in sorted({r['Sample'] for r in self})
            }
        self.anchors = {s: self.samples[s] for s in self.samples if s in self.Nominal_D4x}
        self.unknowns = {s: self.samples[s] for s in self.samples if s not in self.Nominal_D4x}


    def read(self, filename, sep = '', session = ''):
        '''
        Read file in csv format to load data into a `D47data` object.

        In the csv file, spaces before and after field separators (`','` by default)
        are optional. Each line corresponds to a single analysis.

        The required fields are:

        + `UID`: a unique identifier
        + `Session`: an identifier for the analytical session
        + `Sample`: a sample identifier
        + `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values

        Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to
        VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Working-gas deltas `d47`, `d48`
        and `d49` are optional, and set to NaN by default.

        **Parameters**

        + `filename`: the path of the file to read
        + `sep`: csv separator delimiting the fields
        + `session`: set `Session` field to this string for all analyses
        '''
        with open(filename) as fid:
            self.input(fid.read(), sep = sep, session = session)


    def input(self, txt, sep = '', session = ''):
        '''
        Read `txt` string in csv format to load analysis data into a `D47data` object.

        In the csv string, spaces before and after field separators (`','` by default)
        are optional. Each line corresponds to a single analysis.

        The required fields are:

        + `UID`: a unique identifier
        + `Session`: an identifier for the analytical session
        + `Sample`: a sample identifier
        + `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values

        Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to
        VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Working-gas deltas `d47`, `d48`
        and `d49` are optional, and set to NaN by default.

        **Parameters**

        + `txt`: the csv string to read
        + `sep`: csv separator delimiting the fields. By default, use `,`, `;`, or `\t`,
        whichever appears most often in `txt`.
        + `session`: set `Session` field to this string for all analyses
        '''
        if sep == '':
            sep = sorted(',;\t', key = lambda x: - txt.count(x))[0]
        txt = [[x.strip() for x in l.split(sep)] for l in txt.splitlines() if l.strip()]
        data = [{k: v if k in ['UID', 'Session', 'Sample'] else smart_type(v) for k,v in zip(txt[0], l) if v != ''} for l in txt[1:]]

        if session != '':
            for r in data:
                r['Session'] = session

        self += data
        self.refresh()
````
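`D4xdata.input()` makes it possible to load analyses without an intermediate file. A minimal sketch, with made-up delta values, that also uses the `session` argument to fill in the missing `Session` column:

```py
import D47crunch

mydata = D47crunch.D47data()
# the Session column is omitted here, so we assign one explicitly:
mydata.input('''UID, Sample, d45, d46, d47
X01, ETH-1, 5.80, 11.63, 16.90
X02, ETH-2, -6.06, -4.82, -11.64''', session = 'Session_01')
```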
````py
    @make_verbal
    def wg(self, samples = None, a18_acid = None):
        '''
        Compute bulk composition of the working gas for each session based on
        the carbonate standards defined in both `self.Nominal_d13C_VPDB` and
        `self.Nominal_d18O_VPDB`.
        '''

        self.msg('Computing WG composition:')

        if a18_acid is None:
            a18_acid = self.ALPHA_18O_ACID_REACTION
        if samples is None:
            samples = [s for s in self.Nominal_d13C_VPDB if s in self.Nominal_d18O_VPDB]

        assert a18_acid, f'Acid fractionation factor should not be zero.'

        samples = [s for s in samples if s in self.Nominal_d13C_VPDB and s in self.Nominal_d18O_VPDB]
        R45R46_standards = {}
        for sample in samples:
            d13C_vpdb = self.Nominal_d13C_VPDB[sample]
            d18O_vpdb = self.Nominal_d18O_VPDB[sample]
            R13_s = self.R13_VPDB * (1 + d13C_vpdb / 1000)
            R17_s = self.R17_VPDB * ((1 + d18O_vpdb / 1000) * a18_acid) ** self.LAMBDA_17
            R18_s = self.R18_VPDB * (1 + d18O_vpdb / 1000) * a18_acid

            C12_s = 1 / (1 + R13_s)
            C13_s = R13_s / (1 + R13_s)
            C16_s = 1 / (1 + R17_s + R18_s)
            C17_s = R17_s / (1 + R17_s + R18_s)
            C18_s = R18_s / (1 + R17_s + R18_s)

            C626_s = C12_s * C16_s ** 2
            C627_s = 2 * C12_s * C16_s * C17_s
            C628_s = 2 * C12_s * C16_s * C18_s
            C636_s = C13_s * C16_s ** 2
            C637_s = 2 * C13_s * C16_s * C17_s
            C727_s = C12_s * C17_s ** 2

            R45_s = (C627_s + C636_s) / C626_s
            R46_s = (C628_s + C637_s + C727_s) / C626_s
            R45R46_standards[sample] = (R45_s, R46_s)

        for s in self.sessions:
            db = [r for r in self.sessions[s]['data'] if r['Sample'] in samples]
            assert db, f'No sample from {samples} found in session "{s}".'
#           dbsamples = sorted({r['Sample'] for r in db})

            X = [r['d45'] for r in db]
            Y = [R45R46_standards[r['Sample']][0] for r in db]
            x1, x2 = np.min(X), np.max(X)

            if x1 < x2:
                wgcoord = x1/(x1-x2)
            else:
                wgcoord = 999

            if wgcoord < -.5 or wgcoord > 1.5:
                # unreasonable to extrapolate to d45 = 0
                R45_wg = np.mean([y/(1+x/1000) for x,y in zip(X,Y)])
            else:
                # d45 = 0 is reasonably well bracketed
                R45_wg = np.polyfit(X, Y, 1)[1]

            X = [r['d46'] for r in db]
            Y = [R45R46_standards[r['Sample']][1] for r in db]
            x1, x2 = np.min(X), np.max(X)

            if x1 < x2:
                wgcoord = x1/(x1-x2)
            else:
                wgcoord = 999

            if wgcoord < -.5 or wgcoord > 1.5:
                # unreasonable to extrapolate to d46 = 0
                R46_wg = np.mean([y/(1+x/1000) for x,y in zip(X,Y)])
            else:
                # d46 = 0 is reasonably well bracketed
                R46_wg = np.polyfit(X, Y, 1)[1]

            d13Cwg_VPDB, d18Owg_VSMOW = self.compute_bulk_delta(R45_wg, R46_wg)

            self.msg(f'Session {s} WG: δ13C_VPDB = {d13Cwg_VPDB:.3f} δ18O_VSMOW = {d18Owg_VSMOW:.3f}')

            self.sessions[s]['d13Cwg_VPDB'] = d13Cwg_VPDB
            self.sessions[s]['d18Owg_VSMOW'] = d18Owg_VSMOW
            for r in self.sessions[s]['data']:
                r['d13Cwg_VPDB'] = d13Cwg_VPDB
                r['d18Owg_VSMOW'] = d18Owg_VSMOW


    def compute_bulk_delta(self, R45, R46, D17O = 0):
        '''
        Compute δ13C_VPDB and δ18O_VSMOW,
        by solving the generalized form of equation (17) from
        [Brand et al. (2010)](https://doi.org/10.1351/PAC-REP-09-01-05),
        assuming that δ18O_VSMOW is not too big (0 ± 50 ‰) and
        solving the corresponding second-order Taylor polynomial.
        (Appendix A of [Daëron et al., 2016](https://doi.org/10.1016/j.chemgeo.2016.08.014))
        '''

        K = np.exp(D17O / 1000) * self.R17_VSMOW * self.R18_VSMOW ** -self.LAMBDA_17

        A = -3 * K ** 2 * self.R18_VSMOW ** (2 * self.LAMBDA_17)
        B = 2 * K * R45 * self.R18_VSMOW ** self.LAMBDA_17
        C = 2 * self.R18_VSMOW
        D = -R46

        aa = A * self.LAMBDA_17 * (2 * self.LAMBDA_17 - 1) + B * self.LAMBDA_17 * (self.LAMBDA_17 - 1) / 2
        bb = 2 * A * self.LAMBDA_17 + B * self.LAMBDA_17 + C
        cc = A + B + C + D

        d18O_VSMOW = 1000 * (-bb + (bb ** 2 - 4 * aa * cc) ** .5) / (2 * aa)

        R18 = (1 + d18O_VSMOW / 1000) * self.R18_VSMOW
        R17 = K * R18 ** self.LAMBDA_17
        R13 = R45 - 2 * R17

        d13C_VPDB = 1000 * (R13 / self.R13_VPDB - 1)

        return d13C_VPDB, d18O_VSMOW


    @make_verbal
    def crunch(self, verbose = ''):
        '''
        Compute bulk composition and raw clumped isotope anomalies for all analyses.
        '''
        for r in self:
            self.compute_bulk_and_clumping_deltas(r)
        self.standardize_d13C()
        self.standardize_d18O()
        self.msg(f"Crunched {len(self)} analyses.")
    def fill_in_missing_info(self, session = 'mySession'):
        '''
        Fill in optional fields with default values
        '''
        for i,r in enumerate(self):
            if 'D17O' not in r:
                r['D17O'] = 0.
            if 'UID' not in r:
                r['UID'] = f'{i+1}'
            if 'Session' not in r:
                r['Session'] = session
            for k in ['d47', 'd48', 'd49']:
                if k not in r:
                    r[k] = np.nan


    def standardize_d13C(self):
        '''
        Perform δ13C standardization within each session `s` according to
        `self.sessions[s]['d13C_standardization_method']`, which is defined by default
        by `D47data.refresh_sessions()` as equal to `self.d13C_STANDARDIZATION_METHOD`, but
        may be redefined arbitrarily at a later stage.
        '''
        for s in self.sessions:
            if self.sessions[s]['d13C_standardization_method'] in ['1pt', '2pt']:
                XY = [(r['d13C_VPDB'], self.Nominal_d13C_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d13C_VPDB]
                X,Y = zip(*XY)
                if self.sessions[s]['d13C_standardization_method'] == '1pt':
                    offset = np.mean(Y) - np.mean(X)
                    for r in self.sessions[s]['data']:
                        r['d13C_VPDB'] += offset
                elif self.sessions[s]['d13C_standardization_method'] == '2pt':
                    a,b = np.polyfit(X,Y,1)
                    for r in self.sessions[s]['data']:
                        r['d13C_VPDB'] = a * r['d13C_VPDB'] + b

    def standardize_d18O(self):
        '''
        Perform δ18O standardization within each session `s` according to
        `self.ALPHA_18O_ACID_REACTION` and `self.sessions[s]['d18O_standardization_method']`,
        which is defined by default by `D47data.refresh_sessions()` as equal to
        `self.d18O_STANDARDIZATION_METHOD`, but may be redefined arbitrarily at a later stage.
        '''
        for s in self.sessions:
            if self.sessions[s]['d18O_standardization_method'] in ['1pt', '2pt']:
                XY = [(r['d18O_VSMOW'], self.Nominal_d18O_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d18O_VPDB]
                X,Y = zip(*XY)
                Y = [(1000+y) * self.R18_VPDB * self.ALPHA_18O_ACID_REACTION / self.R18_VSMOW - 1000 for y in Y]
                if self.sessions[s]['d18O_standardization_method'] == '1pt':
                    offset = np.mean(Y) - np.mean(X)
                    for r in self.sessions[s]['data']:
                        r['d18O_VSMOW'] += offset
                elif self.sessions[s]['d18O_standardization_method'] == '2pt':
                    a,b = np.polyfit(X,Y,1)
                    for r in self.sessions[s]['data']:
                        r['d18O_VSMOW'] = a * r['d18O_VSMOW'] + b


    def compute_bulk_and_clumping_deltas(self, r):
        '''
        Compute δ13C_VPDB, δ18O_VSMOW, and raw Δ47, Δ48, Δ49 values for a single analysis `r`.
        '''

        # Compute working gas R13, R18, and isobar ratios
        R13_wg = self.R13_VPDB * (1 + r['d13Cwg_VPDB'] / 1000)
        R18_wg = self.R18_VSMOW * (1 + r['d18Owg_VSMOW'] / 1000)
        R45_wg, R46_wg, R47_wg, R48_wg, R49_wg = self.compute_isobar_ratios(R13_wg, R18_wg)

        # Compute analyte isobar ratios
        R45 = (1 + r['d45'] / 1000) * R45_wg
        R46 = (1 + r['d46'] / 1000) * R46_wg
        R47 = (1 + r['d47'] / 1000) * R47_wg
        R48 = (1 + r['d48'] / 1000) * R48_wg
        R49 = (1 + r['d49'] / 1000) * R49_wg

        r['d13C_VPDB'], r['d18O_VSMOW'] = self.compute_bulk_delta(R45, R46, D17O = r['D17O'])
        R13 = (1 + r['d13C_VPDB'] / 1000) * self.R13_VPDB
        R18 = (1 + r['d18O_VSMOW'] / 1000) * self.R18_VSMOW

        # Compute stochastic isobar ratios of the analyte
        R45stoch, R46stoch, R47stoch, R48stoch, R49stoch = self.compute_isobar_ratios(
            R13, R18, D17O = r['D17O']
            )

        # Check that R45/R45stoch and R46/R46stoch are indistinguishable from 1,
        # and raise a warning if the corresponding anomalies exceed 0.05 ppm.
        if (R45 / R45stoch - 1) > 5e-8:
            self.vmsg(f'This is unexpected: R45/R45stoch - 1 = {1e6 * (R45 / R45stoch - 1):.3f} ppm')
        if (R46 / R46stoch - 1) > 5e-8:
            self.vmsg(f'This is unexpected: R46/R46stoch - 1 = {1e6 * (R46 / R46stoch - 1):.3f} ppm')

        # Compute raw clumped isotope anomalies
        r['D47raw'] = 1000 * (R47 / R47stoch - 1)
        r['D48raw'] = 1000 * (R48 / R48stoch - 1)
        r['D49raw'] = 1000 * (R49 / R49stoch - 1)


    def compute_isobar_ratios(self, R13, R18, D17O=0, D47=0, D48=0, D49=0):
        '''
        Compute isobar ratios for a sample with isotopic ratios `R13` and `R18`,
        optionally accounting for non-zero values of Δ17O (`D17O`) and clumped isotope
        anomalies (`D47`, `D48`, `D49`), all expressed in permil.
        '''

        # Compute R17
        R17 = self.R17_VSMOW * np.exp(D17O / 1000) * (R18 / self.R18_VSMOW) ** self.LAMBDA_17

        # Compute isotope concentrations
        C12 = (1 + R13) ** -1
        C13 = C12 * R13
        C16 = (1 + R17 + R18) ** -1
        C17 = C16 * R17
        C18 = C16 * R18

        # Compute stochastic isotopologue concentrations
        C626 = C16 * C12 * C16
        C627 = C16 * C12 * C17 * 2
        C628 = C16 * C12 * C18 * 2
        C636 = C16 * C13 * C16
        C637 = C16 * C13 * C17 * 2
        C638 = C16 * C13 * C18 * 2
        C727 = C17 * C12 * C17
        C728 = C17 * C12 * C18 * 2
        C737 = C17 * C13 * C17
        C738 = C17 * C13 * C18 * 2
        C828 = C18 * C12 * C18
        C838 = C18 * C13 * C18

        # Compute stochastic isobar ratios
        R45 = (C636 + C627) / C626
        R46 = (C628 + C637 + C727) / C626
        R47 = (C638 + C728 + C737) / C626
        R48 = (C738 + C828) / C626
        R49 = C838 / C626

        # Account for clumped isotope anomalies (deviations from the stochastic distribution)
        R47 *= 1 + D47 / 1000
        R48 *= 1 + D48 / 1000
        R49 *= 1 + D49 / 1000

        # Return isobar ratios
        return R45, R46, R47, R48, R49


    def split_samples(self, samples_to_split = 'all', grouping = 'by_session'):
        '''
        Split unknown samples by UID (treat all analyses as different samples)
        or by session (treat analyses of a given sample in different sessions as
        different samples).

        **Parameters**

        + `samples_to_split`: a list of samples to split, e.g., `['IAEA-C1', 'IAEA-C2']`
        + `grouping`: `by_uid` | `by_session`
        '''
        if samples_to_split == 'all':
            samples_to_split = [s for s in self.unknowns]
        gkeys = {'by_uid':'UID', 'by_session':'Session'}
        self.grouping = grouping.lower()
        if self.grouping in gkeys:
            gkey = gkeys[self.grouping]
        for r in self:
            if r['Sample'] in samples_to_split:
                r['Sample_original'] = r['Sample']
                r['Sample'] = f"{r['Sample']}__{r[gkey]}"
            elif r['Sample'] in self.unknowns:
                r['Sample_original'] = r['Sample']
        self.refresh_samples()


    def unsplit_samples(self, tables = False):
        '''
        Reverse the effects of `D47data.split_samples()`.

        This should only be used after `D4xdata.standardize()` with `method='pooled'`.

        After `D4xdata.standardize()` with `method='indep_sessions'`, one should
        probably use `D4xdata.combine_samples()` instead to reverse the effects of
        `D47data.split_samples()` with `grouping='by_uid'`, or `w_avg()` to reverse the
        effects of `D47data.split_samples()` with `grouping='by_sessions'` (because in
        that case session-averaged Δ4x values are statistically independent).
        '''
        unknowns_old = sorted({s for s in self.unknowns})
        CM_old = self.standardization.covar[:,:]
        VD_old = self.standardization.params.valuesdict().copy()
        vars_old = self.standardization.var_names

        unknowns_new = sorted({r['Sample_original'] for r in self if 'Sample_original' in r})

        Ns = len(vars_old) - len(unknowns_old)
        vars_new = vars_old[:Ns] + [f'D{self._4x}_{pf(u)}' for u in unknowns_new]
        VD_new = {k: VD_old[k] for k in vars_old[:Ns]}

        W = np.zeros((len(vars_new), len(vars_old)))
        W[:Ns,:Ns] = np.eye(Ns)
        for u in unknowns_new:
            splits = sorted({r['Sample'] for r in self if 'Sample_original' in r and r['Sample_original'] == u})
            if self.grouping == 'by_session':
                weights = [self.samples[s][f'SE_D{self._4x}']**-2 for s in splits]
            elif self.grouping == 'by_uid':
                weights = [1 for s in splits]
            sw = sum(weights)
            weights = [w/sw for w in weights]
            W[vars_new.index(f'D{self._4x}_{pf(u)}'),[vars_old.index(f'D{self._4x}_{pf(s)}') for s in splits]] = weights[:]

        CM_new = W @ CM_old @ W.T
        V = W @ np.array([[VD_old[k]] for k in vars_old])
        VD_new = {k:v[0] for k,v in zip(vars_new, V)}

        self.standardization.covar = CM_new
        self.standardization.params.valuesdict = lambda : VD_new
        self.standardization.var_names = vars_new

        for r in self:
            if r['Sample'] in self.unknowns:
                r['Sample_split'] = r['Sample']
                r['Sample'] = r['Sample_original']

        self.refresh_samples()
        self.consolidate_samples()
        self.repeatabilities()

        if tables:
            self.table_of_analyses()
            self.table_of_samples()
````
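A minimal sketch of the intended split/unsplit workflow, assuming `mydata` is an already-crunched `D47data` instance and `'UNKNOWN-1'` is a hypothetical unknown sample whose homogeneity across sessions one wishes to test:

```py
# fit one D47 value per (sample, session) pair:
mydata.split_samples(samples_to_split = ['UNKNOWN-1'], grouping = 'by_session')
mydata.standardize(method = 'pooled')
mydata.table_of_samples()   # inspect the session-by-session values

# then recombine into session-averaged values:
mydata.unsplit_samples()
```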
````py
    def assign_timestamps(self):
        '''
        Assign a time field `t` of type `float` to each analysis.

        If `TimeTag` is one of the data fields, `t` is equal within a given session
        to `TimeTag` minus the mean value of `TimeTag` for that session.
        Otherwise, `TimeTag` is by default equal to the index of each analysis
        in the dataset and `t` is defined as above.
        '''
        for session in self.sessions:
            sdata = self.sessions[session]['data']
            try:
                t0 = np.mean([r['TimeTag'] for r in sdata])
                for r in sdata:
                    r['t'] = r['TimeTag'] - t0
            except KeyError:
                t0 = (len(sdata)-1)/2
                for t,r in enumerate(sdata):
                    r['t'] = t - t0


    def report(self):
        '''
        Prints a report on the standardization fit.
        Only applicable after `D4xdata.standardize(method='pooled')`.
        '''
        report_fit(self.standardization)


    def combine_samples(self, sample_groups):
        '''
        Combine analyses of different samples to compute weighted average Δ4x
        and new error (co)variances corresponding to the groups defined by the `sample_groups`
        dictionary.

        Caution: samples are weighted by number of replicate analyses, which is a
        reasonable default behavior but is not always optimal (e.g., in the case of strongly
        correlated analytical errors for one or more samples).

        Returns a tuple of:

        + the list of group names
        + an array of the corresponding Δ4x values
        + the corresponding (co)variance matrix

        **Parameters**

        + `sample_groups`: a dictionary of the form:
        ```py
        {'group1': ['sample_1', 'sample_2'],
         'group2': ['sample_3', 'sample_4', 'sample_5']}
        ```
        '''

        samples = [s for k in sorted(sample_groups.keys()) for s in sorted(sample_groups[k])]
        groups = sorted(sample_groups.keys())
        group_total_weights = {k: sum([self.samples[s]['N'] for s in sample_groups[k]]) for k in groups}
        D4x_old = np.array([[self.samples[x][f'D{self._4x}']] for x in samples])
        CM_old = np.array([[self.sample_D4x_covar(x,y) for x in samples] for y in samples])
        W = np.array([
            [self.samples[i]['N']/group_total_weights[j] if i in sample_groups[j] else 0 for i in samples]
            for j in groups])
        D4x_new = W @ D4x_old
        CM_new = W @ CM_old @ W.T

        return groups, D4x_new[:,0], CM_new
````
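A sketch of how the returned tuple may be unpacked, assuming `mydata` is an already-standardized `D47data` instance; the sample and group names here are hypothetical:

```py
groups, D47_group, CM_group = mydata.combine_samples({
    'outcrop_A': ['sample_1', 'sample_2'],
    'outcrop_B': ['sample_3', 'sample_4', 'sample_5'],
    })
for g, D47, var in zip(groups, D47_group, CM_group.diagonal()):
    print(f'{g}: D47 = {D47:.4f} ± {var**0.5:.4f} (1SE)')
```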
````py
    @make_verbal
    def standardize(self,
        method = 'pooled',
        weighted_sessions = [],
        consolidate = True,
        consolidate_tables = False,
        consolidate_plots = False,
        constraints = {},
        ):
        '''
        Compute absolute Δ4x values for all replicate analyses and for sample averages.
        If `method` argument is set to `'pooled'`, the standardization processes all sessions
        in a single step, assuming that all samples (anchors and unknowns alike) are homogeneous,
        i.e. that their true Δ4x value does not change between sessions
        ([Daëron, 2021](https://doi.org/10.1029/2020GC009592)). If `method` argument is set to
        `'indep_sessions'`, the standardization processes each session independently, based only
        on anchor analyses.
        '''

        self.standardization_method = method
        self.assign_timestamps()

        if method == 'pooled':
            if weighted_sessions:
                for session_group in weighted_sessions:
                    if self._4x == '47':
                        X = D47data([r for r in self if r['Session'] in session_group])
                    elif self._4x == '48':
                        X = D48data([r for r in self if r['Session'] in session_group])
                    X.Nominal_D4x = self.Nominal_D4x.copy()
                    X.refresh()
                    result = X.standardize(method = 'pooled', weighted_sessions = [], consolidate = False)
                    w = np.sqrt(result.redchi)
                    self.msg(f'Session group {session_group} MRSWD = {w:.4f}')
                    for r in X:
                        r[f'wD{self._4x}raw'] *= w
            else:
                self.msg(f'All D{self._4x}raw weights set to 1 ‰')
                for r in self:
                    r[f'wD{self._4x}raw'] = 1.

            params = Parameters()
            for k,session in enumerate(self.sessions):
                self.msg(f"Session {session}: scrambling_drift is {self.sessions[session]['scrambling_drift']}.")
                self.msg(f"Session {session}: slope_drift is {self.sessions[session]['slope_drift']}.")
                self.msg(f"Session {session}: wg_drift is {self.sessions[session]['wg_drift']}.")
                s = pf(session)
                params.add(f'a_{s}', value = 0.9)
                params.add(f'b_{s}', value = 0.)
                params.add(f'c_{s}', value = -0.9)
                params.add(f'a2_{s}', value = 0.,
#                   vary = self.sessions[session]['scrambling_drift'],
                    )
                params.add(f'b2_{s}', value = 0.,
#                   vary = self.sessions[session]['slope_drift'],
                    )
                params.add(f'c2_{s}', value = 0.,
#                   vary = self.sessions[session]['wg_drift'],
                    )
                if not self.sessions[session]['scrambling_drift']:
                    params[f'a2_{s}'].expr = '0'
                if not self.sessions[session]['slope_drift']:
                    params[f'b2_{s}'].expr = '0'
                if not self.sessions[session]['wg_drift']:
                    params[f'c2_{s}'].expr = '0'

            for sample in self.unknowns:
                params.add(f'D{self._4x}_{pf(sample)}', value = 0.5)

            for k in constraints:
                params[k].expr = constraints[k]

            def residuals(p):
                R = []
                for r in self:
                    session = pf(r['Session'])
                    sample = pf(r['Sample'])
                    if r['Sample'] in self.Nominal_D4x:
                        R += [ (
                            r[f'D{self._4x}raw'] - (
                                p[f'a_{session}'] * self.Nominal_D4x[r['Sample']]
                                + p[f'b_{session}'] * r[f'd{self._4x}']
                                + p[f'c_{session}']
                                + r['t'] * (
                                    p[f'a2_{session}'] * self.Nominal_D4x[r['Sample']]
                                    + p[f'b2_{session}'] * r[f'd{self._4x}']
                                    + p[f'c2_{session}']
                                    )
                                )
                            ) / r[f'wD{self._4x}raw'] ]
                    else:
                        R += [ (
                            r[f'D{self._4x}raw'] - (
                                p[f'a_{session}'] * p[f'D{self._4x}_{sample}']
                                + p[f'b_{session}'] * r[f'd{self._4x}']
                                + p[f'c_{session}']
                                + r['t'] * (
                                    p[f'a2_{session}'] * p[f'D{self._4x}_{sample}']
                                    + p[f'b2_{session}'] * r[f'd{self._4x}']
                                    + p[f'c2_{session}']
                                    )
                                )
                            ) / r[f'wD{self._4x}raw'] ]
                return R

            M = Minimizer(residuals, params)
            result = M.least_squares()
            self.Nf = result.nfree
            self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)
            new_names, new_covar, new_se = _fullcovar(result)[:3]
            result.var_names = new_names
            result.covar = new_covar

            for r in self:
                s = pf(r["Session"])
                a = result.params.valuesdict()[f'a_{s}']
                b = result.params.valuesdict()[f'b_{s}']
                c = result.params.valuesdict()[f'c_{s}']
                a2 = result.params.valuesdict()[f'a2_{s}']
                b2 = result.params.valuesdict()[f'b2_{s}']
                c2 = result.params.valuesdict()[f'c2_{s}']
                r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])

            self.standardization = result

            for session in self.sessions:
                self.sessions[session]['Np'] = 3
                for k in ['scrambling', 'slope', 'wg']:
                    if self.sessions[session][f'{k}_drift']:
                        self.sessions[session]['Np'] += 1

            if consolidate:
                self.consolidate(tables = consolidate_tables, plots = consolidate_plots)
            return result


        elif method == 'indep_sessions':

            if weighted_sessions:
                for session_group in weighted_sessions:
                    X = D4xdata([r for r in self if r['Session'] in session_group], mass = self._4x)
                    X.Nominal_D4x = self.Nominal_D4x.copy()
                    X.refresh()
                    # This is only done to assign r['wD47raw'] for r in X:
                    X.standardize(method = method, weighted_sessions = [], consolidate = False)
                    self.msg(f'D{self._4x}raw weights set to {1000*X[0][f"wD{self._4x}raw"]:.1f} ppm for sessions in {session_group}')
            else:
                self.msg('All weights set to 1 ‰')
                for r in self:
                    r[f'wD{self._4x}raw'] = 1

            for session in self.sessions:
                s = self.sessions[session]
                p_names = ['a', 'b', 'c', 'a2', 'b2', 'c2']
                p_active = [True, True, True, s['scrambling_drift'], s['slope_drift'], s['wg_drift']]
                s['Np'] = sum(p_active)
                sdata = s['data']

                A = np.array([
                    [
                        self.Nominal_D4x[r['Sample']] / r[f'wD{self._4x}raw'],
                        r[f'd{self._4x}'] / r[f'wD{self._4x}raw'],
                        1 / r[f'wD{self._4x}raw'],
                        self.Nominal_D4x[r['Sample']] * r['t'] / r[f'wD{self._4x}raw'],
                        r[f'd{self._4x}'] * r['t'] / r[f'wD{self._4x}raw'],
                        r['t'] / r[f'wD{self._4x}raw']
                        ]
                    for r in sdata if r['Sample'] in self.anchors
                    ])[:,p_active] # only keep columns for the active parameters
                Y = np.array([[r[f'D{self._4x}raw'] / r[f'wD{self._4x}raw']] for r in sdata if r['Sample'] in self.anchors])
                s['Na'] = Y.size
                CM = linalg.inv(A.T @ A)
                bf = (CM @ A.T @ Y).T[0,:]
                k = 0
                for n,a in zip(p_names, p_active):
                    if a:
                        s[n] = bf[k]
#                       self.msg(f'{n} = {bf[k]}')
                        k += 1
                    else:
                        s[n] = 0.
#                       self.msg(f'{n} = 0.0')

                for r in sdata:
                    a, b, c, a2, b2, c2 = s['a'], s['b'], s['c'], s['a2'], s['b2'], s['c2']
                    r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])
                    r[f'wD{self._4x}'] = r[f'wD{self._4x}raw'] / (a + a2 * r['t'])

                s['CM'] = np.zeros((6,6))
                i = 0
                k_active = [j for j,a in enumerate(p_active) if a]
                for j,a in enumerate(p_active):
                    if a:
                        s['CM'][j,k_active] = CM[i,:]
                        i += 1

            if not weighted_sessions:
                w = self.rmswd()['rmswd']
                for r in self:
                    r[f'wD{self._4x}'] *= w
                    r[f'wD{self._4x}raw'] *= w
                for session in self.sessions:
                    self.sessions[session]['CM'] *= w**2

            for session in self.sessions:
                s = self.sessions[session]
                s['SE_a'] = s['CM'][0,0]**.5
                s['SE_b'] = s['CM'][1,1]**.5
                s['SE_c'] = s['CM'][2,2]**.5
                s['SE_a2'] = s['CM'][3,3]**.5
                s['SE_b2'] = s['CM'][4,4]**.5
                s['SE_c2'] = s['CM'][5,5]**.5

            if not weighted_sessions:
                self.Nf = len(self) - len(self.unknowns) - np.sum([self.sessions[s]['Np'] for s in self.sessions])
            else:
                self.Nf = 0
                for sg in weighted_sessions:
                    self.Nf += self.rmswd(sessions = sg)['Nf']

            self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)

            avgD4x = {
                sample: np.mean([r[f'D{self._4x}'] for r in self if r['Sample'] == sample])
                for sample in self.samples
                }
            chi2 = np.sum([(r[f'D{self._4x}'] - avgD4x[r['Sample']])**2 for r in self])
            rD4x = (chi2/self.Nf)**.5
            self.repeatability[f'sigma_{self._4x}'] = rD4x

            if consolidate:
                self.consolidate(tables = consolidate_tables, plots = consolidate_plots)
````
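As a sketch (continuing with a hypothetical, already-crunched `mydata` instance and a hypothetical session name), switching standardization methods or enabling a per-session drift term only requires toggling the corresponding flags before the fit:

```py
# session-by-session standardization, based on anchor analyses only:
mydata.standardize(method = 'indep_sessions')

# pooled standardization allowing the WG offset of one session to drift with time:
mydata.sessions['Session_01']['wg_drift'] = True
mydata.standardize(method = 'pooled')
```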
````py
    def standardization_error(self, session, d4x, D4x, t = 0):
        '''
        Compute standardization error for a given session and
        (δ47, Δ47) composition.
        '''
        a = self.sessions[session]['a']
        b = self.sessions[session]['b']
        c = self.sessions[session]['c']
        a2 = self.sessions[session]['a2']
        b2 = self.sessions[session]['b2']
        c2 = self.sessions[session]['c2']
        CM = self.sessions[session]['CM']

        x, y = D4x, d4x
        z = a * x + b * y + c + a2 * x * t + b2 * y * t + c2 * t
#       x = (z - b*y - b2*y*t - c - c2*t) / (a+a2*t)
        dxdy = -(b+b2*t) / (a+a2*t)
        dxdz = 1. / (a+a2*t)
        dxda = -x / (a+a2*t)
        dxdb = -y / (a+a2*t)
        dxdc = -1. / (a+a2*t)
        dxda2 = -x * t / (a+a2*t)
        dxdb2 = -y * t / (a+a2*t)
        dxdc2 = -t / (a+a2*t)
        V = np.array([dxda, dxdb, dxdc, dxda2, dxdb2, dxdc2])
        sx = (V @ CM @ V.T) ** .5
        return sx


    @make_verbal
    def summary(self,
        dir = 'output',
        filename = None,
        save_to_file = True,
        print_out = True,
        ):
        '''
        Print out and/or save to disk a summary of the standardization results.

        **Parameters**

        + `dir`: the directory in which to save the table
        + `filename`: the name of the csv file to write to
        + `save_to_file`: whether to save the table to disk
        + `print_out`: whether to print out the table
        '''

        out = []
        out += [['N samples (anchors + unknowns)', f"{len(self.samples)} ({len(self.anchors)} + {len(self.unknowns)})"]]
        out += [['N analyses (anchors + unknowns)', f"{len(self)} ({len([r for r in self if r['Sample'] in self.anchors])} + {len([r for r in self if r['Sample'] in self.unknowns])})"]]
        out += [['Repeatability of δ13C_VPDB', f"{1000 * self.repeatability['r_d13C_VPDB']:.1f} ppm"]]
        out += [['Repeatability of δ18O_VSMOW', f"{1000 * self.repeatability['r_d18O_VSMOW']:.1f} ppm"]]
        out += [[f'Repeatability of Δ{self._4x} (anchors)', f"{1000 * self.repeatability[f'r_D{self._4x}a']:.1f} ppm"]]
        out += [[f'Repeatability of Δ{self._4x} (unknowns)', f"{1000 * self.repeatability[f'r_D{self._4x}u']:.1f} ppm"]]
        out += [[f'Repeatability of Δ{self._4x} (all)', f"{1000 * self.repeatability[f'r_D{self._4x}']:.1f} ppm"]]
        out += [['Model degrees of freedom', f"{self.Nf}"]]
        out += [['Student\'s 95% t-factor', f"{self.t95:.2f}"]]
        out += [['Standardization method', self.standardization_method]]

        if save_to_file:
            if not os.path.exists(dir):
                os.makedirs(dir)
            if filename is None:
                filename = f'D{self._4x}_summary.csv'
            with open(f'{dir}/{filename}', 'w') as fid:
                fid.write(make_csv(out))
        if print_out:
            self.msg('\n' + pretty_table(out, header = 0))


    @make_verbal
    def table_of_sessions(self,
        dir = 'output',
        filename = None,
        save_to_file = True,
        print_out = True,
        output = None,
        ):
        '''
        Print out and/or save to disk a table of sessions.
1841 1842 **Parameters** 1843 1844 + `dir`: the directory in which to save the table 1845 + `filename`: the name to the csv file to write to 1846 + `save_to_file`: whether to save the table to disk 1847 + `print_out`: whether to print out the table 1848 + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`); 1849 if set to `'raw'`: return a list of list of strings 1850 (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`) 1851 ''' 1852 include_a2 = any([self.sessions[session]['scrambling_drift'] for session in self.sessions]) 1853 include_b2 = any([self.sessions[session]['slope_drift'] for session in self.sessions]) 1854 include_c2 = any([self.sessions[session]['wg_drift'] for session in self.sessions]) 1855 1856 out = [['Session','Na','Nu','d13Cwg_VPDB','d18Owg_VSMOW','r_d13C','r_d18O',f'r_D{self._4x}','a ± SE','1e3 x b ± SE','c ± SE']] 1857 if include_a2: 1858 out[-1] += ['a2 ± SE'] 1859 if include_b2: 1860 out[-1] += ['b2 ± SE'] 1861 if include_c2: 1862 out[-1] += ['c2 ± SE'] 1863 for session in self.sessions: 1864 out += [[ 1865 session, 1866 f"{self.sessions[session]['Na']}", 1867 f"{self.sessions[session]['Nu']}", 1868 f"{self.sessions[session]['d13Cwg_VPDB']:.3f}", 1869 f"{self.sessions[session]['d18Owg_VSMOW']:.3f}", 1870 f"{self.sessions[session]['r_d13C_VPDB']:.4f}", 1871 f"{self.sessions[session]['r_d18O_VSMOW']:.4f}", 1872 f"{self.sessions[session][f'r_D{self._4x}']:.4f}", 1873 f"{self.sessions[session]['a']:.3f} ± {self.sessions[session]['SE_a']:.3f}", 1874 f"{1e3*self.sessions[session]['b']:.3f} ± {1e3*self.sessions[session]['SE_b']:.3f}", 1875 f"{self.sessions[session]['c']:.3f} ± {self.sessions[session]['SE_c']:.3f}", 1876 ]] 1877 if include_a2: 1878 if self.sessions[session]['scrambling_drift']: 1879 out[-1] += [f"{self.sessions[session]['a2']:.1e} ± {self.sessions[session]['SE_a2']:.1e}"] 1880 else: 1881 out[-1] += [''] 1882 if include_b2: 1883 if self.sessions[session]['slope_drift']: 1884 out[-1] += [f"{self.sessions[session]['b2']:.1e} ± {self.sessions[session]['SE_b2']:.1e}"] 1885 else: 1886 out[-1] += [''] 1887 if include_c2: 1888 if self.sessions[session]['wg_drift']: 1889 out[-1] += [f"{self.sessions[session]['c2']:.1e} ± {self.sessions[session]['SE_c2']:.1e}"] 1890 else: 1891 out[-1] += [''] 1892 1893 if save_to_file: 1894 if not os.path.exists(dir): 1895 os.makedirs(dir) 1896 if filename is None: 1897 filename = f'D{self._4x}_sessions.csv' 1898 with open(f'{dir}/{filename}', 'w') as fid: 1899 fid.write(make_csv(out)) 1900 if print_out: 1901 self.msg('\n' + pretty_table(out)) 1902 if output == 'raw': 1903 return out 1904 elif output == 'pretty': 1905 return pretty_table(out) 1906 1907 1908 @make_verbal 1909 def table_of_analyses( 1910 self, 1911 dir = 'output', 1912 filename = None, 1913 save_to_file = True, 1914 print_out = True, 1915 output = None, 1916 ): 1917 ''' 1918 Print out an/or save to disk a table of analyses. 
1919 1920 **Parameters** 1921 1922 + `dir`: the directory in which to save the table 1923 + `filename`: the name to the csv file to write to 1924 + `save_to_file`: whether to save the table to disk 1925 + `print_out`: whether to print out the table 1926 + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`); 1927 if set to `'raw'`: return a list of list of strings 1928 (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`) 1929 ''' 1930 1931 out = [['UID','Session','Sample']] 1932 extra_fields = [f for f in [('SampleMass','.2f'),('ColdFingerPressure','.1f'),('AcidReactionYield','.3f')] if f[0] in {k for r in self for k in r}] 1933 for f in extra_fields: 1934 out[-1] += [f[0]] 1935 out[-1] += ['d13Cwg_VPDB','d18Owg_VSMOW','d45','d46','d47','d48','d49','d13C_VPDB','d18O_VSMOW','D47raw','D48raw','D49raw',f'D{self._4x}'] 1936 for r in self: 1937 out += [[f"{r['UID']}",f"{r['Session']}",f"{r['Sample']}"]] 1938 for f in extra_fields: 1939 out[-1] += [f"{r[f[0]]:{f[1]}}"] 1940 out[-1] += [ 1941 f"{r['d13Cwg_VPDB']:.3f}", 1942 f"{r['d18Owg_VSMOW']:.3f}", 1943 f"{r['d45']:.6f}", 1944 f"{r['d46']:.6f}", 1945 f"{r['d47']:.6f}", 1946 f"{r['d48']:.6f}", 1947 f"{r['d49']:.6f}", 1948 f"{r['d13C_VPDB']:.6f}", 1949 f"{r['d18O_VSMOW']:.6f}", 1950 f"{r['D47raw']:.6f}", 1951 f"{r['D48raw']:.6f}", 1952 f"{r['D49raw']:.6f}", 1953 f"{r[f'D{self._4x}']:.6f}" 1954 ] 1955 if save_to_file: 1956 if not os.path.exists(dir): 1957 os.makedirs(dir) 1958 if filename is None: 1959 filename = f'D{self._4x}_analyses.csv' 1960 with open(f'{dir}/{filename}', 'w') as fid: 1961 fid.write(make_csv(out)) 1962 if print_out: 1963 self.msg('\n' + pretty_table(out)) 1964 return out 1965 1966 @make_verbal 1967 def covar_table( 1968 self, 1969 correl = False, 1970 dir = 'output', 1971 filename = None, 1972 save_to_file = True, 1973 print_out = True, 1974 output = None, 1975 ): 1976 ''' 1977 Print out, save to disk and/or return the variance-covariance matrix of D4x 1978 for all unknown samples. 1979 1980 **Parameters** 1981 1982 + `dir`: the directory in which to save the csv 1983 + `filename`: the name of the csv file to write to 1984 + `save_to_file`: whether to save the csv 1985 + `print_out`: whether to print out the matrix 1986 + `output`: if set to `'pretty'`: return a pretty text matrix (see `pretty_table()`); 1987 if set to `'raw'`: return a list of list of strings 1988 (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`) 1989 ''' 1990 samples = sorted([u for u in self.unknowns]) 1991 out = [[''] + samples] 1992 for s1 in samples: 1993 out.append([s1]) 1994 for s2 in samples: 1995 if correl: 1996 out[-1].append(f'{self.sample_D4x_correl(s1, s2):.6f}') 1997 else: 1998 out[-1].append(f'{self.sample_D4x_covar(s1, s2):.8e}') 1999 2000 if save_to_file: 2001 if not os.path.exists(dir): 2002 os.makedirs(dir) 2003 if filename is None: 2004 if correl: 2005 filename = f'D{self._4x}_correl.csv' 2006 else: 2007 filename = f'D{self._4x}_covar.csv' 2008 with open(f'{dir}/{filename}', 'w') as fid: 2009 fid.write(make_csv(out)) 2010 if print_out: 2011 self.msg('\n'+pretty_table(out)) 2012 if output == 'raw': 2013 return out 2014 elif output == 'pretty': 2015 return pretty_table(out) 2016 2017 @make_verbal 2018 def table_of_samples( 2019 self, 2020 dir = 'output', 2021 filename = None, 2022 save_to_file = True, 2023 print_out = True, 2024 output = None, 2025 ): 2026 ''' 2027 Print out, save to disk and/or return a table of samples. 
2028 2029 **Parameters** 2030 2031 + `dir`: the directory in which to save the csv 2032 + `filename`: the name of the csv file to write to 2033 + `save_to_file`: whether to save the csv 2034 + `print_out`: whether to print out the table 2035 + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`); 2036 if set to `'raw'`: return a list of list of strings 2037 (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`) 2038 ''' 2039 2040 out = [['Sample','N','d13C_VPDB','d18O_VSMOW',f'D{self._4x}','SE','95% CL','SD','p_Levene']] 2041 for sample in self.anchors: 2042 out += [[ 2043 f"{sample}", 2044 f"{self.samples[sample]['N']}", 2045 f"{self.samples[sample]['d13C_VPDB']:.2f}", 2046 f"{self.samples[sample]['d18O_VSMOW']:.2f}", 2047 f"{self.samples[sample][f'D{self._4x}']:.4f}",'','', 2048 f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '', '' 2049 ]] 2050 for sample in self.unknowns: 2051 out += [[ 2052 f"{sample}", 2053 f"{self.samples[sample]['N']}", 2054 f"{self.samples[sample]['d13C_VPDB']:.2f}", 2055 f"{self.samples[sample]['d18O_VSMOW']:.2f}", 2056 f"{self.samples[sample][f'D{self._4x}']:.4f}", 2057 f"{self.samples[sample][f'SE_D{self._4x}']:.4f}", 2058 f"± {self.samples[sample][f'SE_D{self._4x}'] * self.t95:.4f}", 2059 f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '', 2060 f"{self.samples[sample]['p_Levene']:.3f}" if self.samples[sample]['N'] > 2 else '' 2061 ]] 2062 if save_to_file: 2063 if not os.path.exists(dir): 2064 os.makedirs(dir) 2065 if filename is None: 2066 filename = f'D{self._4x}_samples.csv' 2067 with open(f'{dir}/{filename}', 'w') as fid: 2068 fid.write(make_csv(out)) 2069 if print_out: 2070 self.msg('\n'+pretty_table(out)) 2071 if output == 'raw': 2072 return out 2073 elif output == 'pretty': 2074 return pretty_table(out) 2075 2076 2077 def plot_sessions(self, dir = 'output', figsize = (8,8), filetype = 'pdf', dpi = 100): 2078 ''' 2079 Generate session plots and save them to disk. 2080 2081 **Parameters** 2082 2083 + `dir`: the directory in which to save the plots 2084 + `figsize`: the width and height (in inches) of each plot 2085 + `filetype`: 'pdf' or 'png' 2086 + `dpi`: resolution for PNG output 2087 ''' 2088 if not os.path.exists(dir): 2089 os.makedirs(dir) 2090 2091 for session in self.sessions: 2092 sp = self.plot_single_session(session, xylimits = 'constant') 2093 ppl.savefig(f'{dir}/D{self._4x}_plot_{session}.{filetype}', **({'dpi': dpi} if filetype.lower() == 'png' else {})) 2094 ppl.close(sp.fig) 2095 2096 2097 2098 @make_verbal 2099 def consolidate_samples(self): 2100 ''' 2101 Compile various statistics for each sample. 
2102 2103 For each anchor sample: 2104 2105 + `D47` or `D48`: the nominal Δ4x value for this anchor, specified by `self.Nominal_D4x` 2106 + `SE_D47` or `SE_D48`: set to zero by definition 2107 2108 For each unknown sample: 2109 2110 + `D47` or `D48`: the standardized Δ4x value for this unknown 2111 + `SE_D47` or `SE_D48`: the standard error of Δ4x for this unknown 2112 2113 For each anchor and unknown: 2114 2115 + `N`: the total number of analyses of this sample 2116 + `SD_D47` or `SD_D48`: the “sample” (in the statistical sense) standard deviation for this sample 2117 + `d13C_VPDB`: the average δ13C_VPDB value for this sample 2118 + `d18O_VSMOW`: the average δ18O_VSMOW value for this sample (as CO2) 2119 + `p_Levene`: the p-value from a [Levene test](https://en.wikipedia.org/wiki/Levene%27s_test) of equal 2120 variance, indicating whether the Δ4x repeatability of this sample differs significantly from 2121 that observed for the reference sample specified by `self.LEVENE_REF_SAMPLE`. 2122 ''' 2123 D4x_ref_pop = [r[f'D{self._4x}'] for r in self.samples[self.LEVENE_REF_SAMPLE]['data']] 2124 for sample in self.samples: 2125 self.samples[sample]['N'] = len(self.samples[sample]['data']) 2126 if self.samples[sample]['N'] > 1: 2127 self.samples[sample][f'SD_D{self._4x}'] = stdev([r[f'D{self._4x}'] for r in self.samples[sample]['data']]) 2128 2129 self.samples[sample]['d13C_VPDB'] = np.mean([r['d13C_VPDB'] for r in self.samples[sample]['data']]) 2130 self.samples[sample]['d18O_VSMOW'] = np.mean([r['d18O_VSMOW'] for r in self.samples[sample]['data']]) 2131 2132 D4x_pop = [r[f'D{self._4x}'] for r in self.samples[sample]['data']] 2133 if len(D4x_pop) > 2: 2134 self.samples[sample]['p_Levene'] = levene(D4x_ref_pop, D4x_pop, center = 'median')[1] 2135 2136 if self.standardization_method == 'pooled': 2137 for sample in self.anchors: 2138 self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample] 2139 self.samples[sample][f'SE_D{self._4x}'] = 0. 2140 for sample in self.unknowns: 2141 self.samples[sample][f'D{self._4x}'] = self.standardization.params.valuesdict()[f'D{self._4x}_{pf(sample)}'] 2142 try: 2143 self.samples[sample][f'SE_D{self._4x}'] = self.sample_D4x_covar(sample)**.5 2144 except ValueError: 2145 # when `sample` is constrained by self.standardize(constraints = {...}), 2146 # it is no longer listed in self.standardization.var_names. 2147 # Temporary fix: define SE as zero for now 2148 self.samples[sample][f'SE_D{self._4x}'] = 0. 2149 2150 elif self.standardization_method == 'indep_sessions': 2151 for sample in self.anchors: 2152 self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample] 2153 self.samples[sample][f'SE_D{self._4x}'] = 0. 2154 for sample in self.unknowns: 2155 self.msg(f'Consolidating sample {sample}') 2156 self.unknowns[sample][f'session_D{self._4x}'] = {} 2157 session_avg = [] 2158 for session in self.sessions: 2159 sdata = [r for r in self.sessions[session]['data'] if r['Sample'] == sample] 2160 if sdata: 2161 self.msg(f'{sample} found in session {session}') 2162 avg_D4x = np.mean([r[f'D{self._4x}'] for r in sdata]) 2163 avg_d4x = np.mean([r[f'd{self._4x}'] for r in sdata]) 2164 # !!
TODO: sigma_s below does not account for temporal changes in standardization error 2165 sigma_s = self.standardization_error(session, avg_d4x, avg_D4x) 2166 sigma_u = sdata[0][f'wD{self._4x}raw'] / self.sessions[session]['a'] / len(sdata)**.5 2167 session_avg.append([avg_D4x, (sigma_u**2 + sigma_s**2)**.5]) 2168 self.unknowns[sample][f'session_D{self._4x}'][session] = session_avg[-1] 2169 self.samples[sample][f'D{self._4x}'], self.samples[sample][f'SE_D{self._4x}'] = w_avg(*zip(*session_avg)) 2170 weights = {s: self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 for s in self.unknowns[sample][f'session_D{self._4x}']} 2171 wsum = sum([weights[s] for s in weights]) 2172 for s in weights: 2173 self.unknowns[sample][f'session_D{self._4x}'][s] += [self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 / wsum] 2174 2175 for r in self: 2176 r[f'D{self._4x}_residual'] = r[f'D{self._4x}'] - self.samples[r['Sample']][f'D{self._4x}'] 2177 2178 2179 2180 def consolidate_sessions(self): 2181 ''' 2182 Compute various statistics for each session. 2183 2184 + `Na`: Number of anchor analyses in the session 2185 + `Nu`: Number of unknown analyses in the session 2186 + `r_d13C_VPDB`: δ13C_VPDB repeatability of analyses within the session 2187 + `r_d18O_VSMOW`: δ18O_VSMOW repeatability of analyses within the session 2188 + `r_D47` or `r_D48`: Δ4x repeatability of analyses within the session 2189 + `a`: scrambling factor 2190 + `b`: compositional slope 2191 + `c`: WG offset 2192 + `SE_a`: Model standard error of `a` 2193 + `SE_b`: Model standard error of `b` 2194 + `SE_c`: Model standard error of `c` 2195 + `scrambling_drift` (boolean): whether to allow a temporal drift in the scrambling factor (`a`) 2196 + `slope_drift` (boolean): whether to allow a temporal drift in the compositional slope (`b`) 2197 + `wg_drift` (boolean): whether to allow a temporal drift in the WG offset (`c`) 2198 + `a2`: scrambling factor drift 2199 + `b2`: compositional slope drift 2200 + `c2`: WG offset drift 2201 + `Np`: Number of standardization parameters to fit 2202 + `CM`: model covariance matrix for (`a`, `b`, `c`, `a2`, `b2`, `c2`) 2203 + `d13Cwg_VPDB`: δ13C_VPDB of WG 2204 + `d18Owg_VSMOW`: δ18O_VSMOW of WG 2205 ''' 2206 for session in self.sessions: 2207 if 'd13Cwg_VPDB' not in self.sessions[session]: 2208 self.sessions[session]['d13Cwg_VPDB'] = self.sessions[session]['data'][0]['d13Cwg_VPDB'] 2209 if 'd18Owg_VSMOW' not in self.sessions[session]: 2210 self.sessions[session]['d18Owg_VSMOW'] = self.sessions[session]['data'][0]['d18Owg_VSMOW'] 2211 self.sessions[session]['Na'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.anchors]) 2212 self.sessions[session]['Nu'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns]) 2213 2214 self.msg(f'Computing repeatabilities for session {session}') 2215 self.sessions[session]['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors', sessions = [session]) 2216 self.sessions[session]['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors', sessions = [session]) 2217 self.sessions[session][f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', sessions = [session]) 2218 2219 if self.standardization_method == 'pooled': 2220 for session in self.sessions: 2221 2222 # different (better?)
computation of D4x repeatability for each session: 2223 sqresiduals = [(r[f'D{self._4x}'] - self.samples[r['Sample']][f'D{self._4x}'])**2 for r in self.sessions[session]['data']] 2224 self.sessions[session][f'r_D{self._4x}'] = np.mean(sqresiduals)**.5 2225 2226 self.sessions[session]['a'] = self.standardization.params.valuesdict()[f'a_{pf(session)}'] 2227 i = self.standardization.var_names.index(f'a_{pf(session)}') 2228 self.sessions[session]['SE_a'] = self.standardization.covar[i,i]**.5 2229 2230 self.sessions[session]['b'] = self.standardization.params.valuesdict()[f'b_{pf(session)}'] 2231 i = self.standardization.var_names.index(f'b_{pf(session)}') 2232 self.sessions[session]['SE_b'] = self.standardization.covar[i,i]**.5 2233 2234 self.sessions[session]['c'] = self.standardization.params.valuesdict()[f'c_{pf(session)}'] 2235 i = self.standardization.var_names.index(f'c_{pf(session)}') 2236 self.sessions[session]['SE_c'] = self.standardization.covar[i,i]**.5 2237 2238 self.sessions[session]['a2'] = self.standardization.params.valuesdict()[f'a2_{pf(session)}'] 2239 if self.sessions[session]['scrambling_drift']: 2240 i = self.standardization.var_names.index(f'a2_{pf(session)}') 2241 self.sessions[session]['SE_a2'] = self.standardization.covar[i,i]**.5 2242 else: 2243 self.sessions[session]['SE_a2'] = 0. 2244 2245 self.sessions[session]['b2'] = self.standardization.params.valuesdict()[f'b2_{pf(session)}'] 2246 if self.sessions[session]['slope_drift']: 2247 i = self.standardization.var_names.index(f'b2_{pf(session)}') 2248 self.sessions[session]['SE_b2'] = self.standardization.covar[i,i]**.5 2249 else: 2250 self.sessions[session]['SE_b2'] = 0. 2251 2252 self.sessions[session]['c2'] = self.standardization.params.valuesdict()[f'c2_{pf(session)}'] 2253 if self.sessions[session]['wg_drift']: 2254 i = self.standardization.var_names.index(f'c2_{pf(session)}') 2255 self.sessions[session]['SE_c2'] = self.standardization.covar[i,i]**.5 2256 else: 2257 self.sessions[session]['SE_c2'] = 0. 
2258 2259 i = self.standardization.var_names.index(f'a_{pf(session)}') 2260 j = self.standardization.var_names.index(f'b_{pf(session)}') 2261 k = self.standardization.var_names.index(f'c_{pf(session)}') 2262 CM = np.zeros((6,6)) 2263 CM[:3,:3] = self.standardization.covar[[i,j,k],:][:,[i,j,k]] 2264 try: 2265 i2 = self.standardization.var_names.index(f'a2_{pf(session)}') 2266 CM[3,[0,1,2,3]] = self.standardization.covar[i2,[i,j,k,i2]] 2267 CM[[0,1,2,3],3] = self.standardization.covar[[i,j,k,i2],i2] 2268 try: 2269 j2 = self.standardization.var_names.index(f'b2_{pf(session)}') 2270 CM[3,4] = self.standardization.covar[i2,j2] 2271 CM[4,3] = self.standardization.covar[j2,i2] 2272 except ValueError: 2273 pass 2274 try: 2275 k2 = self.standardization.var_names.index(f'c2_{pf(session)}') 2276 CM[3,5] = self.standardization.covar[i2,k2] 2277 CM[5,3] = self.standardization.covar[k2,i2] 2278 except ValueError: 2279 pass 2280 except ValueError: 2281 pass 2282 try: 2283 j2 = self.standardization.var_names.index(f'b2_{pf(session)}') 2284 CM[4,[0,1,2,4]] = self.standardization.covar[j2,[i,j,k,j2]] 2285 CM[[0,1,2,4],4] = self.standardization.covar[[i,j,k,j2],j2] 2286 try: 2287 k2 = self.standardization.var_names.index(f'c2_{pf(session)}') 2288 CM[4,5] = self.standardization.covar[j2,k2] 2289 CM[5,4] = self.standardization.covar[k2,j2] 2290 except ValueError: 2291 pass 2292 except ValueError: 2293 pass 2294 try: 2295 k2 = self.standardization.var_names.index(f'c2_{pf(session)}') 2296 CM[5,[0,1,2,5]] = self.standardization.covar[k2,[i,j,k,k2]] 2297 CM[[0,1,2,5],5] = self.standardization.covar[[i,j,k,k2],k2] 2298 except ValueError: 2299 pass 2300 2301 self.sessions[session]['CM'] = CM 2302 2303 elif self.standardization_method == 'indep_sessions': 2304 pass # Not implemented yet 2305 2306 2307 @make_verbal 2308 def repeatabilities(self): 2309 ''' 2310 Compute analytical repeatabilities for δ13C_VPDB, δ18O_VSMOW, Δ4x 2311 (for all samples, for anchors, and for unknowns). 2312 ''' 2313 self.msg('Computing reproducibilities for all sessions') 2314 2315 self.repeatability['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors') 2316 self.repeatability['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors') 2317 self.repeatability[f'r_D{self._4x}a'] = self.compute_r(f'D{self._4x}', samples = 'anchors') 2318 self.repeatability[f'r_D{self._4x}u'] = self.compute_r(f'D{self._4x}', samples = 'unknowns') 2319 self.repeatability[f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', samples = 'all samples') 2320 2321 2322 @make_verbal 2323 def consolidate(self, tables = True, plots = True): 2324 ''' 2325 Collect information about samples, sessions and repeatabilities. 2326 ''' 2327 self.consolidate_samples() 2328 self.consolidate_sessions() 2329 self.repeatabilities() 2330 2331 if tables: 2332 self.summary() 2333 self.table_of_sessions() 2334 self.table_of_analyses() 2335 self.table_of_samples() 2336 2337 if plots: 2338 self.plot_sessions() 2339 2340 2341 @make_verbal 2342 def rmswd(self, 2343 samples = 'all samples', 2344 sessions = 'all sessions', 2345 ): 2346 ''' 2347 Compute the χ2, root mean squared weighted deviation 2348 (i.e. reduced χ2), and corresponding degrees of freedom of the 2349 Δ4x values for samples in `samples` and sessions in `sessions`. 2350 2351 Only used in `D4xdata.standardize()` with `method='indep_sessions'`. 
2352 ''' 2353 if samples == 'all samples': 2354 mysamples = [k for k in self.samples] 2355 elif samples == 'anchors': 2356 mysamples = [k for k in self.anchors] 2357 elif samples == 'unknowns': 2358 mysamples = [k for k in self.unknowns] 2359 else: 2360 mysamples = samples 2361 2362 if sessions == 'all sessions': 2363 sessions = [k for k in self.sessions] 2364 2365 chisq, Nf = 0, 0 2366 for sample in mysamples : 2367 G = [ r for r in self if r['Sample'] == sample and r['Session'] in sessions ] 2368 if len(G) > 1 : 2369 X, sX = w_avg([r[f'D{self._4x}'] for r in G], [r[f'wD{self._4x}'] for r in G]) 2370 Nf += (len(G) - 1) 2371 chisq += np.sum([ ((r[f'D{self._4x}']-X)/r[f'wD{self._4x}'])**2 for r in G]) 2372 r = (chisq / Nf)**.5 if Nf > 0 else 0 2373 self.msg(f'RMSWD of r["D{self._4x}"] is {r:.6f} for {samples}.') 2374 return {'rmswd': r, 'chisq': chisq, 'Nf': Nf} 2375 2376 2377 @make_verbal 2378 def compute_r(self, key, samples = 'all samples', sessions = 'all sessions'): 2379 ''' 2380 Compute the repeatability of `[r[key] for r in self]` 2381 ''' 2382 2383 if samples == 'all samples': 2384 mysamples = [k for k in self.samples] 2385 elif samples == 'anchors': 2386 mysamples = [k for k in self.anchors] 2387 elif samples == 'unknowns': 2388 mysamples = [k for k in self.unknowns] 2389 else: 2390 mysamples = samples 2391 2392 if sessions == 'all sessions': 2393 sessions = [k for k in self.sessions] 2394 2395 if key in ['D47', 'D48']: 2396 # Full disclosure: the definition of Nf is tricky/debatable 2397 G = [r for r in self if r['Sample'] in mysamples and r['Session'] in sessions] 2398 chisq = (np.array([r[f'{key}_residual'] for r in G])**2).sum() 2399 Nf = len(G) 2400# print(f'len(G) = {Nf}') 2401 Nf -= len([s for s in mysamples if s in self.unknowns]) 2402# print(f'{len([s for s in mysamples if s in self.unknowns])} unknown samples to consider') 2403 for session in sessions: 2404 Np = len([ 2405 _ for _ in self.standardization.params 2406 if ( 2407 self.standardization.params[_].expr is not None 2408 and ( 2409 (_[0] in 'abc' and _[1] == '_' and _[2:] == pf(session)) 2410 or (_[0] in 'abc' and _[1:3] == '2_' and _[3:] == pf(session)) 2411 ) 2412 ) 2413 ]) 2414# print(f'session {session}: {Np} parameters to consider') 2415 Na = len({ 2416 r['Sample'] for r in self.sessions[session]['data'] 2417 if r['Sample'] in self.anchors and r['Sample'] in mysamples 2418 }) 2419# print(f'session {session}: {Na} different anchors in that session') 2420 Nf -= min(Np, Na) 2421# print(f'Nf = {Nf}') 2422 2423# for sample in mysamples : 2424# X = [ r[key] for r in self if r['Sample'] == sample and r['Session'] in sessions ] 2425# if len(X) > 1 : 2426# chisq += np.sum([ (x-self.samples[sample][key])**2 for x in X ]) 2427# if sample in self.unknowns: 2428# Nf += len(X) - 1 2429# else: 2430# Nf += len(X) 2431# if samples in ['anchors', 'all samples']: 2432# Nf -= sum([self.sessions[s]['Np'] for s in sessions]) 2433 r = (chisq / Nf)**.5 if Nf > 0 else 0 2434 2435 else: # if key not in ['D47', 'D48'] 2436 chisq, Nf = 0, 0 2437 for sample in mysamples : 2438 X = [ r[key] for r in self if r['Sample'] == sample and r['Session'] in sessions ] 2439 if len(X) > 1 : 2440 Nf += len(X) - 1 2441 chisq += np.sum([ (x-np.mean(X))**2 for x in X ]) 2442 r = (chisq / Nf)**.5 if Nf > 0 else 0 2443 2444 self.msg(f'Repeatability of r["{key}"] is {1000*r:.1f} ppm for {samples}.') 2445 return r 2446 2447 def sample_average(self, samples, weights = 'equal', normalize = True): 2448 ''' 2449 Weighted average Δ4x value of a group of samples, 
accounting for covariance. 2450 2451 Returns the weighted average Δ4x value and associated SE 2452 of a group of samples. Weights are equal by default. If `normalize` is 2453 true, `weights` will be rescaled so that their sum equals 1. 2454 2455 **Examples** 2456 2457 ```python 2458 self.sample_average(['X','Y'], [1, 2]) 2459 ``` 2460 2461 returns the value and SE of [Δ4x(X) + 2 Δ4x(Y)]/3, 2462 where Δ4x(X) and Δ4x(Y) are the average Δ4x 2463 values of samples X and Y, respectively. 2464 2465 ```python 2466 self.sample_average(['X','Y'], [1, -1], normalize = False) 2467 ``` 2468 2469 returns the value and SE of the difference Δ4x(X) - Δ4x(Y). 2470 ''' 2471 if weights == 'equal': 2472 weights = [1/len(samples)] * len(samples) 2473 2474 if normalize: 2475 s = sum(weights) 2476 if s: 2477 weights = [w/s for w in weights] 2478 2479 try: 2480# indices = [self.standardization.var_names.index(f'D47_{pf(sample)}') for sample in samples] 2481# C = self.standardization.covar[indices,:][:,indices] 2482 C = np.array([[self.sample_D4x_covar(x, y) for x in samples] for y in samples]) 2483 X = [self.samples[sample][f'D{self._4x}'] for sample in samples] 2484 return correlated_sum(X, C, weights) 2485 except ValueError: 2486 return (0., 0.) 2487 2488 2489 def sample_D4x_covar(self, sample1, sample2 = None): 2490 ''' 2491 Covariance between Δ4x values of samples 2492 2493 Returns the error covariance between the average Δ4x values of two 2494 samples. If only `sample1` is specified, or if `sample1 == sample2`, 2495 returns the Δ4x variance for that sample. 2496 ''' 2497 if sample2 is None: 2498 sample2 = sample1 2499 if self.standardization_method == 'pooled': 2500 i = self.standardization.var_names.index(f'D{self._4x}_{pf(sample1)}') 2501 j = self.standardization.var_names.index(f'D{self._4x}_{pf(sample2)}') 2502 return self.standardization.covar[i, j] 2503 elif self.standardization_method == 'indep_sessions': 2504 if sample1 == sample2: 2505 return self.samples[sample1][f'SE_D{self._4x}']**2 2506 else: 2507 c = 0 2508 for session in self.sessions: 2509 sdata1 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample1] 2510 sdata2 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample2] 2511 if sdata1 and sdata2: 2512 a = self.sessions[session]['a'] 2513 # !! TODO: CM below does not account for temporal changes in standardization parameters 2514 CM = self.sessions[session]['CM'][:3,:3] 2515 avg_D4x_1 = np.mean([r[f'D{self._4x}'] for r in sdata1]) 2516 avg_d4x_1 = np.mean([r[f'd{self._4x}'] for r in sdata1]) 2517 avg_D4x_2 = np.mean([r[f'D{self._4x}'] for r in sdata2]) 2518 avg_d4x_2 = np.mean([r[f'd{self._4x}'] for r in sdata2]) 2519 c += ( 2520 self.unknowns[sample1][f'session_D{self._4x}'][session][2] 2521 * self.unknowns[sample2][f'session_D{self._4x}'][session][2] 2522 * np.array([[avg_D4x_1, avg_d4x_1, 1]]) 2523 @ CM 2524 @ np.array([[avg_D4x_2, avg_d4x_2, 1]]).T 2525 ) / a**2 2526 return float(c) 2527 2528 def sample_D4x_correl(self, sample1, sample2 = None): 2529 ''' 2530 Correlation between Δ4x errors of samples 2531 2532 Returns the error correlation between the average Δ4x values of two samples. 2533 ''' 2534 if sample2 is None or sample2 == sample1: 2535 return 1.
2536 return ( 2537 self.sample_D4x_covar(sample1, sample2) 2538 / self.unknowns[sample1][f'SE_D{self._4x}'] 2539 / self.unknowns[sample2][f'SE_D{self._4x}'] 2540 ) 2541 2542 def plot_single_session(self, 2543 session, 2544 kw_plot_anchors = dict(ls='None', marker='x', mec=(.75, 0, 0), mew = .75, ms = 4), 2545 kw_plot_unknowns = dict(ls='None', marker='x', mec=(0, 0, .75), mew = .75, ms = 4), 2546 kw_plot_anchor_avg = dict(ls='-', marker='None', color=(.75, 0, 0), lw = .75), 2547 kw_plot_unknown_avg = dict(ls='-', marker='None', color=(0, 0, .75), lw = .75), 2548 kw_contour_error = dict(colors = [[0, 0, 0]], alpha = .5, linewidths = 0.75), 2549 xylimits = 'free', # | 'constant' 2550 x_label = None, 2551 y_label = None, 2552 error_contour_interval = 'auto', 2553 fig = 'new', 2554 ): 2555 ''' 2556 Generate plot for a single session 2557 ''' 2558 if x_label is None: 2559 x_label = f'δ$_{{{self._4x}}}$ (‰)' 2560 if y_label is None: 2561 y_label = f'Δ$_{{{self._4x}}}$ (‰)' 2562 2563 out = _SessionPlot() 2564 anchors = [a for a in self.anchors if [r for r in self.sessions[session]['data'] if r['Sample'] == a]] 2565 unknowns = [u for u in self.unknowns if [r for r in self.sessions[session]['data'] if r['Sample'] == u]] 2566 anchors_d = [r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors] 2567 anchors_D = [r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors] 2568 unknowns_d = [r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns] 2569 unknowns_D = [r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns] 2570 anchor_avg = (np.array([ np.array([ 2571 np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1, 2572 np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1 2573 ]) for sample in anchors]).T, 2574 np.array([ np.array([0, 0]) + self.Nominal_D4x[sample] for sample in anchors]).T) 2575 unknown_avg = (np.array([ np.array([ 2576 np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1, 2577 np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1 2578 ]) for sample in unknowns]).T, 2579 np.array([ np.array([0, 0]) + self.unknowns[sample][f'D{self._4x}'] for sample in unknowns]).T) 2580 2581 2582 if fig == 'new': 2583 out.fig = ppl.figure(figsize = (6,6)) 2584 ppl.subplots_adjust(.1,.1,.9,.9) 2585 2586 out.anchor_analyses, = ppl.plot( 2587 anchors_d, 2588 anchors_D, 2589 **kw_plot_anchors) 2590 out.unknown_analyses, = ppl.plot( 2591 unknowns_d, 2592 unknowns_D, 2593 **kw_plot_unknowns) 2594 out.anchor_avg = ppl.plot( 2595 *anchor_avg, 2596 **kw_plot_anchor_avg) 2597 out.unknown_avg = ppl.plot( 2598 *unknown_avg, 2599 **kw_plot_unknown_avg) 2600 if xylimits == 'constant': 2601 x = [r[f'd{self._4x}'] for r in self] 2602 y = [r[f'D{self._4x}'] for r in self] 2603 x1, x2, y1, y2 = np.min(x), np.max(x), np.min(y), np.max(y) 2604 w, h = x2-x1, y2-y1 2605 x1 -= w/20 2606 x2 += w/20 2607 y1 -= h/20 2608 y2 += h/20 2609 ppl.axis([x1, x2, y1, y2]) 2610 elif xylimits == 'free': 2611 x1, x2, y1, y2 = ppl.axis() 2612 else: 2613 x1, x2, y1, y2 = ppl.axis(xylimits) 2614 2615 if error_contour_interval != 'none': 2616 xi, yi = np.linspace(x1, x2), np.linspace(y1, y2) 2617 XI,YI = np.meshgrid(xi, yi) 2618 SI = np.array([[self.standardization_error(session, x, y) for x in xi] for y in yi]) 2619 if 
error_contour_interval == 'auto': 2620 rng = np.max(SI) - np.min(SI) 2621 if rng <= 0.01: 2622 cinterval = 0.001 2623 elif rng <= 0.03: 2624 cinterval = 0.004 2625 elif rng <= 0.1: 2626 cinterval = 0.01 2627 elif rng <= 0.3: 2628 cinterval = 0.03 2629 elif rng <= 1.: 2630 cinterval = 0.1 2631 else: 2632 cinterval = 0.5 2633 else: 2634 cinterval = error_contour_interval 2635 2636 cval = np.arange(np.ceil(SI.min() / .001) * .001, np.ceil(SI.max() / .001 + 1) * .001, cinterval) 2637 out.contour = ppl.contour(XI, YI, SI, cval, **kw_contour_error) 2638 out.clabel = ppl.clabel(out.contour) 2639 contour = (XI, YI, SI, cval, cinterval) 2640 2641 if fig == None: 2642 return { 2643 'anchors':anchors, 2644 'unknowns':unknowns, 2645 'anchors_d':anchors_d, 2646 'anchors_D':anchors_D, 2647 'unknowns_d':unknowns_d, 2648 'unknowns_D':unknowns_D, 2649 'anchor_avg':anchor_avg, 2650 'unknown_avg':unknown_avg, 2651 'contour':contour, 2652 } 2653 2654 ppl.xlabel(x_label) 2655 ppl.ylabel(y_label) 2656 ppl.title(session, weight = 'bold') 2657 ppl.grid(alpha = .2) 2658 out.ax = ppl.gca() 2659 2660 return out 2661 2662 def plot_residuals( 2663 self, 2664 kde = False, 2665 hist = False, 2666 binwidth = 2/3, 2667 dir = 'output', 2668 filename = None, 2669 highlight = [], 2670 colors = None, 2671 figsize = None, 2672 dpi = 100, 2673 yspan = None, 2674 ): 2675 ''' 2676 Plot residuals of each analysis as a function of time (actually, as a function of 2677 the order of analyses in the `D4xdata` object) 2678 2679 + `kde`: whether to add a kernel density estimate of residuals 2680 + `hist`: whether to add a histogram of residuals (incompatible with `kde`) 2681 + `histbins`: specify bin edges for the histogram 2682 + `dir`: the directory in which to save the plot 2683 + `highlight`: a list of samples to highlight 2684 + `colors`: a dict of `{<sample>: <color>}` for all samples 2685 + `figsize`: (width, height) of figure 2686 + `dpi`: resolution for PNG output 2687 + `yspan`: factor controlling the range of y values shown in plot 2688 (by default: `yspan = 1.5 if kde else 1.0`) 2689 ''' 2690 2691 from matplotlib import ticker 2692 2693 if yspan is None: 2694 if kde: 2695 yspan = 1.5 2696 else: 2697 yspan = 1.0 2698 2699 # Layout 2700 fig = ppl.figure(figsize = (8,4) if figsize is None else figsize) 2701 if hist or kde: 2702 ppl.subplots_adjust(left = .08, bottom = .05, right = .98, top = .8, wspace = -0.72) 2703 ax1, ax2 = ppl.subplot(121), ppl.subplot(1,15,15) 2704 else: 2705 ppl.subplots_adjust(.08,.05,.78,.8) 2706 ax1 = ppl.subplot(111) 2707 2708 # Colors 2709 N = len(self.anchors) 2710 if colors is None: 2711 if len(highlight) > 0: 2712 Nh = len(highlight) 2713 if Nh == 1: 2714 colors = {highlight[0]: (0,0,0)} 2715 elif Nh == 3: 2716 colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0)])} 2717 elif Nh == 4: 2718 colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])} 2719 else: 2720 colors = {a: hls_to_rgb(k/Nh, .4, 1) for k,a in enumerate(highlight)} 2721 else: 2722 if N == 3: 2723 colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0)])} 2724 elif N == 4: 2725 colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])} 2726 else: 2727 colors = {a: hls_to_rgb(k/N, .4, 1) for k,a in enumerate(self.anchors)} 2728 2729 ppl.sca(ax1) 2730 2731 ppl.axhline(0, color = 'k', alpha = .25, lw = 0.75) 2732 2733 ax1.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: f'${x:+.0f}$' if x else '$0$')) 2734 2735 session = 
self[0]['Session'] 2736 x1 = 0 2737# ymax = np.max([1e3 * (r['D47'] - self.samples[r['Sample']]['D47']) for r in self]) 2738 x_sessions = {} 2739 one_or_more_singlets = False 2740 one_or_more_multiplets = False 2741 multiplets = set() 2742 for k,r in enumerate(self): 2743 if r['Session'] != session: 2744 x2 = k-1 2745 x_sessions[session] = (x1+x2)/2 2746 ppl.axvline(k - 0.5, color = 'k', lw = .5) 2747 session = r['Session'] 2748 x1 = k 2749 singlet = len(self.samples[r['Sample']]['data']) == 1 2750 if not singlet: 2751 multiplets.add(r['Sample']) 2752 if r['Sample'] in self.unknowns: 2753 if singlet: 2754 one_or_more_singlets = True 2755 else: 2756 one_or_more_multiplets = True 2757 kw = dict( 2758 marker = 'x' if singlet else '+', 2759 ms = 4 if singlet else 5, 2760 ls = 'None', 2761 mec = colors[r['Sample']] if r['Sample'] in colors else (0,0,0), 2762 mew = 1, 2763 alpha = 0.2 if singlet else 1, 2764 ) 2765 if highlight and r['Sample'] not in highlight: 2766 kw['alpha'] = 0.2 2767 ppl.plot(k, 1e3 * r[f'D{self._4x}_residual'], **kw) 2768 x2 = k 2769 x_sessions[session] = (x1+x2)/2 2770 2771 ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000, self.repeatability[f'r_D{self._4x}']*1000, color = 'k', alpha = .05, lw = 1) 2772 ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000*self.t95, self.repeatability[f'r_D{self._4x}']*1000*self.t95, color = 'k', alpha = .05, lw = 1) 2773 if not (hist or kde): 2774 ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000, f" SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm", size = 9, alpha = 1, va = 'center') 2775 ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000*self.t95, f" 95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm", size = 9, alpha = 1, va = 'center') 2776 2777 xmin, xmax, ymin, ymax = ppl.axis() 2778 if yspan != 1: 2779 ymin, ymax = (ymin + ymax)/2 - yspan * (ymax - ymin)/2, (ymin + ymax)/2 + yspan * (ymax - ymin)/2 2780 for s in x_sessions: 2781 ppl.text( 2782 x_sessions[s], 2783 ymax +1, 2784 s, 2785 va = 'bottom', 2786 **( 2787 dict(ha = 'center') 2788 if len(self.sessions[s]['data']) > (0.15 * len(self)) 2789 else dict(ha = 'left', rotation = 45) 2790 ) 2791 ) 2792 2793 if hist or kde: 2794 ppl.sca(ax2) 2795 2796 for s in colors: 2797 kw['marker'] = '+' 2798 kw['ms'] = 5 2799 kw['mec'] = colors[s] 2800 kw['label'] = s 2801 kw['alpha'] = 1 2802 ppl.plot([], [], **kw) 2803 2804 kw['mec'] = (0,0,0) 2805 2806 if one_or_more_singlets: 2807 kw['marker'] = 'x' 2808 kw['ms'] = 4 2809 kw['alpha'] = .2 2810 kw['label'] = 'other (N$\\,$=$\\,$1)' if one_or_more_multiplets else 'other' 2811 ppl.plot([], [], **kw) 2812 2813 if one_or_more_multiplets: 2814 kw['marker'] = '+' 2815 kw['ms'] = 4 2816 kw['alpha'] = 1 2817 kw['label'] = 'other (N$\\,$>$\\,$1)' if one_or_more_singlets else 'other' 2818 ppl.plot([], [], **kw) 2819 2820 if hist or kde: 2821 leg = ppl.legend(loc = 'upper right', bbox_to_anchor = (1, 1), bbox_transform=fig.transFigure, borderaxespad = 1.5, fontsize = 9) 2822 else: 2823 leg = ppl.legend(loc = 'lower right', bbox_to_anchor = (1, 0), bbox_transform=fig.transFigure, borderaxespad = 1.5) 2824 leg.set_zorder(-1000) 2825 2826 ppl.sca(ax1) 2827 2828 ppl.ylabel(f'Δ$_{{{self._4x}}}$ residuals (ppm)') 2829 ppl.xticks([]) 2830 ppl.axis([-1, len(self), None, None]) 2831 2832 if hist or kde: 2833 ppl.sca(ax2) 2834 X = 1e3 * np.array([r[f'D{self._4x}_residual'] for r in self if r['Sample'] in multiplets or r['Sample'] in self.anchors]) 2835 2836 if kde: 2837 from scipy.stats import 
gaussian_kde 2838 yi = np.linspace(ymin, ymax, 201) 2839 xi = gaussian_kde(X).evaluate(yi) 2840 ppl.fill_betweenx(yi, xi, xi*0, fc = (0,0,0,.15), lw = 1, ec = (.75,.75,.75,1)) 2841# ppl.plot(xi, yi, 'k-', lw = 1) 2842 elif hist: 2843 ppl.hist( 2844 X, 2845 orientation = 'horizontal', 2846 histtype = 'stepfilled', 2847 ec = [.4]*3, 2848 fc = [.25]*3, 2849 alpha = .25, 2850 bins = np.linspace(-9e3*self.repeatability[f'r_D{self._4x}'], 9e3*self.repeatability[f'r_D{self._4x}'], int(18/binwidth+1)), 2851 ) 2852 ppl.text(0, 0, 2853 f" SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm\n 95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm", 2854 size = 7.5, 2855 alpha = 1, 2856 va = 'center', 2857 ha = 'left', 2858 ) 2859 2860 ppl.axis([0, None, ymin, ymax]) 2861 ppl.xticks([]) 2862 ppl.yticks([]) 2863# ax2.spines['left'].set_visible(False) 2864 ax2.spines['right'].set_visible(False) 2865 ax2.spines['top'].set_visible(False) 2866 ax2.spines['bottom'].set_visible(False) 2867 2868 ax1.axis([None, None, ymin, ymax]) 2869 2870 if not os.path.exists(dir): 2871 os.makedirs(dir) 2872 if filename is None: 2873 return fig 2874 elif filename == '': 2875 filename = f'D{self._4x}_residuals.pdf' 2876 ppl.savefig(f'{dir}/{filename}', dpi = dpi) 2877 ppl.close(fig) 2878 2879 2880 def simulate(self, *args, **kwargs): 2881 ''' 2882 Legacy function with warning message pointing to `virtual_data()` 2883 ''' 2884 raise DeprecationWarning('D4xdata.simulate is deprecated and has been replaced by virtual_data()') 2885 2886 def plot_distribution_of_analyses( 2887 self, 2888 dir = 'output', 2889 filename = None, 2890 vs_time = False, 2891 figsize = (6,4), 2892 subplots_adjust = (0.02, 0.13, 0.85, 0.8), 2893 output = None, 2894 dpi = 100, 2895 ): 2896 ''' 2897 Plot temporal distribution of all analyses in the data set. 2898 2899 **Parameters** 2900 2901 + `dir`: the directory in which to save the plot 2902 + `vs_time`: if `True`, plot as a function of `TimeTag` rather than sequentially. 
2903 + `dpi`: resolution for PNG output 2904 + `figsize`: (width, height) of figure 2905 2906 ''' 2907 2908 asamples = [s for s in self.anchors] 2909 usamples = [s for s in self.unknowns] 2910 if output is None or output == 'fig': 2911 fig = ppl.figure(figsize = figsize) 2912 ppl.subplots_adjust(*subplots_adjust) 2913 Xmin = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self)]) 2914 Xmax = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self)]) 2915 Xmax += (Xmax-Xmin)/40 2916 Xmin -= (Xmax-Xmin)/41 2917 for k, s in enumerate(asamples + usamples): 2918 if vs_time: 2919 X = [r['TimeTag'] for r in self if r['Sample'] == s] 2920 else: 2921 X = [x for x,r in enumerate(self) if r['Sample'] == s] 2922 Y = [-k for x in X] 2923 ppl.plot(X, Y, 'o', mec = None, mew = 0, mfc = 'b' if s in usamples else 'r', ms = 3, alpha = .75) 2924 ppl.axhline(-k, color = 'b' if s in usamples else 'r', lw = .5, alpha = .25) 2925 ppl.text(Xmax, -k, f' {s}', va = 'center', ha = 'left', size = 7, color = 'b' if s in usamples else 'r') 2926 ppl.axis([Xmin, Xmax, -k-1, 1]) 2927 ppl.xlabel('\ntime') 2928 ppl.gca().annotate('', 2929 xy = (0.6, -0.02), 2930 xycoords = 'axes fraction', 2931 xytext = (.4, -0.02), 2932 arrowprops = dict(arrowstyle = "->", color = 'k'), 2933 ) 2934 2935 2936 x2 = -1 2937 for session in self.sessions: 2938 x1 = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session]) 2939 if vs_time: 2940 ppl.axvline(x1, color = 'k', lw = .75) 2941 if x2 > -1: 2942 if not vs_time: 2943 ppl.axvline((x1+x2)/2, color = 'k', lw = .75, alpha = .5) 2944 x2 = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session]) 2945# from xlrd import xldate_as_datetime 2946# print(session, xldate_as_datetime(x1, 0), xldate_as_datetime(x2, 0)) 2947 if vs_time: 2948 ppl.axvline(x2, color = 'k', lw = .75) 2949 ppl.axvspan(x1,x2,color = 'k', zorder = -100, alpha = .15) 2950 ppl.text((x1+x2)/2, 1, f' {session}', ha = 'left', va = 'bottom', rotation = 45, size = 8) 2951 2952 ppl.xticks([]) 2953 ppl.yticks([]) 2954 2955 if output is None: 2956 if not os.path.exists(dir): 2957 os.makedirs(dir) 2958 if filename == None: 2959 filename = f'D{self._4x}_distribution_of_analyses.pdf' 2960 ppl.savefig(f'{dir}/{filename}', dpi = dpi) 2961 ppl.close(fig) 2962 elif output == 'ax': 2963 return ppl.gca() 2964 elif output == 'fig': 2965 return fig 2966 2967 2968 def plot_bulk_compositions( 2969 self, 2970 samples = None, 2971 dir = 'output/bulk_compositions', 2972 figsize = (6,6), 2973 subplots_adjust = (0.15, 0.12, 0.95, 0.92), 2974 show = False, 2975 sample_color = (0,.5,1), 2976 analysis_color = (.7,.7,.7), 2977 labeldist = 0.3, 2978 radius = 0.05, 2979 ): 2980 ''' 2981 Plot δ13C_VPDB vs δ18O_VSMOW (of CO2) for all analyses. 2982 2983 By default, creates a directory `./output/bulk_compositions` where plots for 2984 each sample are saved. Another plot named `__all__.pdf` shows all analyses together. 2985 2986 2987 **Parameters** 2988 2989 + `samples`: Only these samples are processed (by default: all samples). 2990 + `dir`: where to save the plots 2991 + `figsize`: (width, height) of figure 2992 + `subplots_adjust`: passed to `subplots_adjust()` 2993 + `show`: whether to call `matplotlib.pyplot.show()` on the plot with all samples, 2994 allowing for interactive visualization/exploration in (δ13C, δ18O) space.
2995 + `sample_color`: color used for sample markers/labels 2996 + `analysis_color`: color used for replicate markers/labels 2997 + `labeldist`: distance (in inches) from replicate markers to replicate labels 2998 + `radius`: radius of the dashed circle providing scale. No circle if `radius = 0`. 2999 ''' 3000 3001 from matplotlib.patches import Ellipse 3002 3003 if samples is None: 3004 samples = [_ for _ in self.samples] 3005 3006 saved = {} 3007 3008 for s in samples: 3009 3010 fig = ppl.figure(figsize = figsize) 3011 fig.subplots_adjust(*subplots_adjust) 3012 ax = ppl.subplot(111) 3013 ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)') 3014 ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)') 3015 ppl.title(s) 3016 3017 3018 XY = np.array([[_['d18O_VSMOW'], _['d13C_VPDB']] for _ in self.samples[s]['data']]) 3019 UID = [_['UID'] for _ in self.samples[s]['data']] 3020 XY0 = XY.mean(0) 3021 3022 for xy in XY: 3023 ppl.plot([xy[0], XY0[0]], [xy[1], XY0[1]], '-', lw = 1, color = analysis_color) 3024 3025 ppl.plot(*XY.T, 'wo', mew = 1, mec = analysis_color) 3026 ppl.plot(*XY0, 'wo', mew = 2, mec = sample_color) 3027 ppl.text(*XY0, f' {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold') 3028 saved[s] = [XY, XY0] 3029 3030 x1, x2, y1, y2 = ppl.axis() 3031 x0, dx = (x1+x2)/2, (x2-x1)/2 3032 y0, dy = (y1+y2)/2, (y2-y1)/2 3033 dx, dy = [max(max(dx, dy), radius)]*2 3034 3035 ppl.axis([ 3036 x0 - 1.2*dx, 3037 x0 + 1.2*dx, 3038 y0 - 1.2*dy, 3039 y0 + 1.2*dy, 3040 ]) 3041 3042 XY0_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(XY0)) 3043 3044 for xy, uid in zip(XY, UID): 3045 3046 xy_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(xy)) 3047 vector_in_display_space = xy_in_display_space - XY0_in_display_space 3048 3049 if (vector_in_display_space**2).sum() > 0: 3050 3051 unit_vector_in_display_space = vector_in_display_space / ((vector_in_display_space**2).sum())**0.5 3052 label_vector_in_display_space = vector_in_display_space + unit_vector_in_display_space * labeldist 3053 label_xy_in_display_space = XY0_in_display_space + label_vector_in_display_space 3054 label_xy_in_data_space = ax.transData.inverted().transform(fig.dpi_scale_trans.transform(label_xy_in_display_space)) 3055 3056 ppl.text(*label_xy_in_data_space, uid, va = 'center', ha = 'center', color = analysis_color) 3057 3058 else: 3059 3060 ppl.text(*xy, f'{uid} ', va = 'center', ha = 'right', color = analysis_color) 3061 3062 if radius: 3063 ax.add_artist(Ellipse( 3064 xy = XY0, 3065 width = radius*2, 3066 height = radius*2, 3067 ls = (0, (2,2)), 3068 lw = .7, 3069 ec = analysis_color, 3070 fc = 'None', 3071 )) 3072 ppl.text( 3073 XY0[0], 3074 XY0[1]-radius, 3075 f'\n± {radius*1e3:.0f} ppm', 3076 color = analysis_color, 3077 va = 'top', 3078 ha = 'center', 3079 linespacing = 0.4, 3080 size = 8, 3081 ) 3082 3083 if not os.path.exists(dir): 3084 os.makedirs(dir) 3085 fig.savefig(f'{dir}/{s}.pdf') 3086 ppl.close(fig) 3087 3088 fig = ppl.figure(figsize = figsize) 3089 fig.subplots_adjust(*subplots_adjust) 3090 ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)') 3091 ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)') 3092 3093 for s in saved: 3094 for xy in saved[s][0]: 3095 ppl.plot([xy[0], saved[s][1][0]], [xy[1], saved[s][1][1]], '-', lw = 1, color = analysis_color) 3096 ppl.plot(*saved[s][0].T, 'wo', mew = 1, mec = analysis_color) 3097 ppl.plot(*saved[s][1], 'wo', mew = 1.5, mec = sample_color) 3098 ppl.text(*saved[s][1], f' {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold')
3099 3100 x1, x2, y1, y2 = ppl.axis() 3101 ppl.axis([ 3102 x1 - (x2-x1)/10, 3103 x2 + (x2-x1)/10, 3104 y1 - (y2-y1)/10, 3105 y2 + (y2-y1)/10, 3106 ]) 3107 3108 3109 if not os.path.exists(dir): 3110 os.makedirs(dir) 3111 fig.savefig(f'{dir}/__all__.pdf') 3112 if show: 3113 ppl.show() 3114 ppl.close(fig) 3115 3116 3117 def _save_D4x_correl( 3118 self, 3119 samples = None, 3120 dir = 'output', 3121 filename = None, 3122 D4x_precision = 4, 3123 correl_precision = 4, 3124 ): 3125 ''' 3126 Save D4x values along with their SE and correlation matrix. 3127 3128 **Parameters** 3129 3130 + `samples`: Only these samples are output (by default: all samples). 3131 + `dir`: the directory in which to save the file (by default: `output`) 3132 + `filename`: the name of the csv file to write to (by default: `D4x_correl.csv`) 3133 + `D4x_precision`: the precision to use when writing `D4x` and `D4x_SE` values (by default: 4) 3134 + `correl_precision`: the precision to use when writing correlation factor values (by default: 4) 3135 ''' 3136 if samples is None: 3137 samples = sorted([s for s in self.unknowns]) 3138 3139 out = [['Sample']] + [[s] for s in samples] 3140 out[0] += [f'D{self._4x}', f'D{self._4x}_SE', f'D{self._4x}_correl'] 3141 for k,s in enumerate(samples): 3142 out[k+1] += [f'{self.samples[s][f"D{self._4x}"]:.{D4x_precision}f}', f'{self.samples[s][f"SE_D{self._4x}"]:.{D4x_precision}f}'] 3143 for s2 in samples: 3144 out[k+1] += [f'{self.sample_D4x_correl(s,s2):.{correl_precision}f}'] 3145 3146 if not os.path.exists(dir): 3147 os.makedirs(dir) 3148 if filename is None: 3149 filename = f'D{self._4x}_correl.csv' 3150 with open(f'{dir}/{filename}', 'w') as fid: 3151 fid.write(make_csv(out))
Store and process data for a large set of Δ47 and/or Δ48 analyses, usually comprising more than one analytical session.
923 def __init__(self, l = [], mass = '47', logfile = '', session = 'mySession', verbose = False): 924 ''' 925 **Parameters** 926 927 + `l`: a list of dictionaries, with each dictionary including at least the keys 928 `Sample`, `d45`, `d46`, and `d47` or `d48`. 929 + `mass`: `'47'` or `'48'` 930 + `logfile`: if specified, write detailed logs to this file path when calling `D4xdata` methods. 931 + `session`: define session name for analyses without a `Session` key 932 + `verbose`: if `True`, print out detailed logs when calling `D4xdata` methods. 933 934 Returns a `D4xdata` object derived from `list`. 935 ''' 936 self._4x = mass 937 self.verbose = verbose 938 self.prefix = 'D4xdata' 939 self.logfile = logfile 940 list.__init__(self, l) 941 self.Nf = None 942 self.repeatability = {} 943 self.refresh(session = session)
**Parameters**

+ `l`: a list of dictionaries, with each dictionary including at least the keys `Sample`, `d45`, `d46`, and `d47` or `d48`.
+ `mass`: `'47'` or `'48'`
+ `logfile`: if specified, write detailed logs to this file path when calling `D4xdata` methods.
+ `session`: define session name for analyses without a `Session` key
+ `verbose`: if `True`, print out detailed logs when calling `D4xdata` methods.

Returns a `D4xdata` object derived from `list`.
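For illustration, the `l` parameter makes it possible to bypass csv files entirely and build a data set directly from dictionaries. A minimal sketch, assuming `D47data` forwards `session` and `verbose` to `D4xdata.__init__()` as documented above (the delta values are invented):

```python
import D47crunch

# Hypothetical working-gas delta values, for illustration only:
mydata = D47crunch.D47data([
    {'Sample': 'ETH-1', 'd45': 5.795, 'd46': 11.628, 'd47': 16.894},
    {'Sample': 'ETH-2', 'd45': -6.059, 'd46': -4.817, 'd47': -11.635},
], session = 'Session01', verbose = True)

print(len(mydata))  # a D4xdata object behaves like a list of analyses
```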
`R18_VSMOW`: Absolute (18O/16O) ratio of VSMOW. By default equal to 0.0020052 (Baertschi, 1976).

`LAMBDA_17`: Mass-dependent exponent for triple oxygen isotopes. By default equal to 0.528 (Barkan & Luz, 2005).

`R17_VSMOW`: Absolute (17O/16O) ratio of VSMOW. By default equal to 0.00038475 (Assonov & Brenninkmeijer, 2003, rescaled to `R13_VPDB`).

`R18_VPDB`: Absolute (18O/16O) ratio of VPDB. By definition equal to `R18_VSMOW * 1.03092`.

`R17_VPDB`: Absolute (17O/16O) ratio of VPDB. By definition equal to `R17_VSMOW * 1.03092 ** LAMBDA_17`.
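As a quick consistency check, the two VPDB ratios can be recomputed from the VSMOW ratios using the definitions above:

```python
# Recompute the VPDB ratios from their stated definitions
# (default values as documented above).
R18_VSMOW = 0.0020052
R17_VSMOW = 0.00038475
LAMBDA_17 = 0.528

R18_VPDB = R18_VSMOW * 1.03092               # ≈ 0.0020672
R17_VPDB = R17_VSMOW * 1.03092 ** LAMBDA_17  # ≈ 0.0003910

print(R18_VPDB, R17_VPDB)
```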
After the Δ4x standardization step, each sample is tested to assess whether the Δ4x variance within all analyses for that sample differs significantly from that observed for a given reference sample (using Levene's test, which yields a p-value corresponding to the null hypothesis that the underlying variances are equal).
`LEVENE_REF_SAMPLE` (by default equal to `'ETH-3'`) specifies which sample should be used as a reference for this test.
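The test itself is the one implemented in `scipy.stats.levene` (called with `center = 'median'`, as in `D4xdata.consolidate_samples()`). A standalone sketch with invented Δ47 values:

```python
from scipy.stats import levene

# Invented Δ47 values for the reference sample and a test sample:
D47_ref  = [0.258, 0.262, 0.255, 0.261, 0.259]
D47_test = [0.301, 0.322, 0.288, 0.315]

# p is the probability of observing this difference in scatter under
# the null hypothesis that the underlying variances are equal:
W, p = levene(D47_ref, D47_test, center = 'median')
print(f'p_Levene = {p:.3f}')
```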
`ALPHA_18O_ACID_REACTION`: Specifies the 18O/16O fractionation factor generally applicable to acid reactions in the data set. Currently used by `D4xdata.wg()`, `D4xdata.standardize_d13C()`, and `D4xdata.standardize_d18O()`. By default equal to 1.008129 (calcite reacted at 90 °C, Kim et al., 2007).
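For carbonates reacted at a different temperature, this attribute may simply be overridden before calling `D4xdata.wg()`. A sketch (the replacement value below is a placeholder, not a recommendation):

```python
import D47crunch

mydata = D47crunch.D47data()
# Placeholder value for illustration only; use the 18O/16O acid fractionation
# factor appropriate to your mineralogy and reaction temperature:
mydata.ALPHA_18O_ACID_REACTION = 1.00871
```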
`Nominal_d13C_VPDB`: Nominal δ13C_VPDB values assigned to carbonate standards, used by `D4xdata.standardize_d13C()`. By default equal to `{'ETH-1': 2.02, 'ETH-2': -10.17, 'ETH-3': 1.71}` after Bernasconi et al. (2018).
`Nominal_d18O_VPDB`: Nominal δ18O_VPDB values assigned to carbonate standards, used by `D4xdata.standardize_d18O()`. By default equal to `{'ETH-1': -2.19, 'ETH-2': -18.69, 'ETH-3': -1.78}` after Bernasconi et al. (2018).
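Both dictionaries may be redefined to reflect a lab's own set of standards before calling the standardization methods. A sketch with a hypothetical in-house standard `MYREF-1` (its nominal values are invented):

```python
import D47crunch

mydata = D47crunch.D47data()
# 'MYREF-1' and its nominal values are hypothetical:
mydata.Nominal_d13C_VPDB = {'ETH-1': 2.02, 'ETH-2': -10.17, 'MYREF-1': 1.23}
mydata.Nominal_d18O_VPDB = {'ETH-1': -2.19, 'ETH-2': -18.69, 'MYREF-1': -4.56}
```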
`d13C_STANDARDIZATION_METHOD`: Method by which to standardize δ13C values:

+ `none`: do not apply any δ13C standardization.
+ `'1pt'`: within each session, offset all initial δ13C values so as to minimize the difference between final δ13C_VPDB values and `Nominal_d13C_VPDB` (averaged over all analyses for which `Nominal_d13C_VPDB` is defined).
+ `'2pt'`: within each session, apply an affine transformation to all δ13C values so as to minimize the difference between final δ13C_VPDB values and `Nominal_d13C_VPDB` (averaged over all analyses for which `Nominal_d13C_VPDB` is defined). See the sketch below.
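In other words, `'2pt'` amounts to a least-squares affine fit between observed and nominal values within each session. A standalone sketch of the idea (not the library's internal code; values invented):

```python
import numpy as np

# Session-level δ13C values observed for standards, and their nominal values:
observed = np.array([2.11, -10.05, 1.80])
nominal  = np.array([2.02, -10.17, 1.71])

# Fit nominal ≈ slope * observed + offset in the least-squares sense,
# then apply the same affine transformation to every analysis:
slope, offset = np.polyfit(observed, nominal, 1)
corrected = slope * observed + offset
```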
`d18O_STANDARDIZATION_METHOD`: Method by which to standardize δ18O values:

+ `none`: do not apply any δ18O standardization.
+ `'1pt'`: within each session, offset all initial δ18O values so as to minimize the difference between final δ18O_VPDB values and `Nominal_d18O_VPDB` (averaged over all analyses for which `Nominal_d18O_VPDB` is defined).
+ `'2pt'`: within each session, apply an affine transformation to all δ18O values so as to minimize the difference between final δ18O_VPDB values and `Nominal_d18O_VPDB` (averaged over all analyses for which `Nominal_d18O_VPDB` is defined).
946 def make_verbal(oldfun): 947 ''' 948 Decorator: allow temporarily changing `self.prefix` and overriding `self.verbose`. 949 ''' 950 @wraps(oldfun) 951 def newfun(*args, verbose = '', **kwargs): 952 myself = args[0] 953 oldprefix = myself.prefix 954 myself.prefix = oldfun.__name__ 955 if verbose != '': 956 oldverbose = myself.verbose 957 myself.verbose = verbose 958 out = oldfun(*args, **kwargs) 959 myself.prefix = oldprefix 960 if verbose != '': 961 myself.verbose = oldverbose 962 return out 963 return newfun
Decorator: allow temporarily changing `self.prefix` and overriding `self.verbose`.
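In practice this means any decorated method accepts a transient `verbose` keyword argument, e.g.:

```python
# summary() is decorated with make_verbal, so console output can be forced
# on for a single call without changing mydata.verbose globally
# (mydata being a previously standardized D47data object):
mydata.summary(save_to_file = False, verbose = True)
```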
966 def msg(self, txt): 967 ''' 968 Log a message to `self.logfile`, and print it out if `verbose = True` 969 ''' 970 self.log(txt) 971 if self.verbose: 972 print(f'{f"[{self.prefix}]":<16} {txt}')
Log a message to `self.logfile`, and print it out if `verbose = True`.
975 def vmsg(self, txt): 976 ''' 977 Log a message to `self.logfile` and print it out 978 ''' 979 self.log(txt) 980 print(txt)
Log a message to `self.logfile` and print it out.
983 def log(self, *txts): 984 ''' 985 Log a message to `self.logfile` 986 ''' 987 if self.logfile: 988 with open(self.logfile, 'a') as fid: 989 for txt in txts: 990 fid.write(f'\n{dt.now().strftime("%Y-%m-%d %H:%M:%S")} {f"[{self.prefix}]":<16} {txt}')
Log a message to `self.logfile`.
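Setting `logfile` at instantiation is enough to capture all of these messages; a sketch:

```python
import D47crunch

# All messages passed to msg()/vmsg()/log() are appended to this file,
# each prefixed with a timestamp and the calling method's name:
mydata = D47crunch.D47data(logfile = 'D47crunch.log')
```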
993 def refresh(self, session = 'mySession'): 994 ''' 995 Update `self.sessions`, `self.samples`, `self.anchors`, and `self.unknowns`. 996 ''' 997 self.fill_in_missing_info(session = session) 998 self.refresh_sessions() 999 self.refresh_samples()
Update `self.sessions`, `self.samples`, `self.anchors`, and `self.unknowns`.
1002 def refresh_sessions(self): 1003 ''' 1004 Update `self.sessions` and set `scrambling_drift`, `slope_drift`, and `wg_drift` 1005 to `False` for all sessions. 1006 ''' 1007 self.sessions = { 1008 s: {'data': [r for r in self if r['Session'] == s]} 1009 for s in sorted({r['Session'] for r in self}) 1010 } 1011 for s in self.sessions: 1012 self.sessions[s]['scrambling_drift'] = False 1013 self.sessions[s]['slope_drift'] = False 1014 self.sessions[s]['wg_drift'] = False 1015 self.sessions[s]['d13C_standardization_method'] = self.d13C_STANDARDIZATION_METHOD 1016 self.sessions[s]['d18O_standardization_method'] = self.d18O_STANDARDIZATION_METHOD
Update `self.sessions` and set `scrambling_drift`, `slope_drift`, and `wg_drift` to `False` for all sessions.
1019 def refresh_samples(self): 1020 ''' 1021 Define `self.samples`, `self.anchors`, and `self.unknowns`. 1022 ''' 1023 self.samples = { 1024 s: {'data': [r for r in self if r['Sample'] == s]} 1025 for s in sorted({r['Sample'] for r in self}) 1026 } 1027 self.anchors = {s: self.samples[s] for s in self.samples if s in self.Nominal_D4x} 1028 self.unknowns = {s: self.samples[s] for s in self.samples if s not in self.Nominal_D4x}
Define `self.samples`, `self.anchors`, and `self.unknowns`.
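Anchors are thus simply the samples whose names appear in `self.Nominal_D4x`; all other samples are treated as unknowns. For example (sample names hypothetical):

```python
# After loading analyses, samples are partitioned by whether their
# name appears in Nominal_D4x:
print(sorted(mydata.anchors))   # e.g., ['ETH-1', 'ETH-2', 'ETH-3']
print(sorted(mydata.unknowns))  # e.g., ['FOO-1', 'FOO-2']
```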
1031 def read(self, filename, sep = '', session = ''): 1032 ''' 1033 Read file in csv format to load data into a `D47data` object. 1034 1035 In the csv file, spaces before and after field separators (`','` by default) 1036 are optional. Each line corresponds to a single analysis. 1037 1038 The required fields are: 1039 1040 + `UID`: a unique identifier 1041 + `Session`: an identifier for the analytical session 1042 + `Sample`: a sample identifier 1043 + `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values 1044 1045 Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to 1046 VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Working-gas deltas `d47`, `d48` 1047 and `d49` are optional, and set to NaN by default. 1048 1049 **Parameters** 1050 1051 + `filename`: the path of the file to read 1052 + `sep`: csv separator delimiting the fields 1053 + `session`: set `Session` field to this string for all analyses 1054 ''' 1055 with open(filename) as fid: 1056 self.input(fid.read(), sep = sep, session = session)
Read file in csv format to load data into a D47data
object.
In the csv file, spaces before and after field separators (','
by default)
are optional. Each line corresponds to a single analysis.
The required fields are:
UID
: a unique identifierSession
: an identifier for the analytical sessionSample
: a sample identifierd45
,d46
, and at least one ofd47
ord48
: the working-gas delta values
Independently known oxygen-17 anomalies may be provided as D17O
(in ‰ relative to
VSMOW, λ = self.LAMBDA_17
), and are otherwise assumed to be zero. Working-gas deltas d47
, d48
and d49
are optional, and set to NaN by default.
Parameters
fileneme
: the path of the file to readsep
: csv separator delimiting the fieldssession
: setSession
field to this string for all analyses
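For instance, a minimal sketch (the file name, separator and session label below are illustrative):

```py
import D47crunch

mydata = D47crunch.D47data()

# read a semicolon-delimited file, assigning all analyses
# to a single session named 'Session01':
mydata.read('rawdata.csv', sep = ';', session = 'Session01')
```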
```py
def input(self, txt, sep = '', session = ''):
    '''
    Read `txt` string in csv format to load analysis data into a `D47data` object.

    In the csv string, spaces before and after field separators (`','` by default)
    are optional. Each line corresponds to a single analysis.

    The required fields are:

    + `UID`: a unique identifier
    + `Session`: an identifier for the analytical session
    + `Sample`: a sample identifier
    + `d45`, `d46`, and at least one of `d47` or `d48`: the working-gas delta values

    Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to
    VSMOW, λ = `self.LAMBDA_17`), and are otherwise assumed to be zero. Working-gas deltas
    `d47`, `d48` and `d49` are optional, and set to NaN by default.

    **Parameters**

    + `txt`: the csv string to read
    + `sep`: csv separator delimiting the fields. By default, use `,`, `;`, or `\t`,
    whichever appears most often in `txt`.
    + `session`: set `Session` field to this string for all analyses
    '''
    if sep == '':
        sep = sorted(',;\t', key = lambda x: - txt.count(x))[0]
    txt = [[x.strip() for x in l.split(sep)] for l in txt.splitlines() if l.strip()]
    data = [{k: v if k in ['UID', 'Session', 'Sample'] else smart_type(v) for k,v in zip(txt[0], l) if v != ''} for l in txt[1:]]

    if session != '':
        for r in data:
            r['Session'] = session

    self += data
    self.refresh()
```
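Continuing from the previous example, the same data may also be loaded from an inline string; a minimal sketch (delta values shortened from the tutorial data, separator detected automatically):

```py
mydata = D47crunch.D47data()
mydata.input('''UID, Session, Sample, d45, d46, d47
A01, S1, ETH-1, 5.795, 11.628, 16.894
A02, S1, ETH-2, -6.059, -4.817, -11.635''')
```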
```py
@make_verbal
def wg(self, samples = None, a18_acid = None):
    '''
    Compute bulk composition of the working gas for each session based on
    the carbonate standards defined in both `self.Nominal_d13C_VPDB` and
    `self.Nominal_d18O_VPDB`.
    '''

    self.msg('Computing WG composition:')

    if a18_acid is None:
        a18_acid = self.ALPHA_18O_ACID_REACTION
    if samples is None:
        samples = [s for s in self.Nominal_d13C_VPDB if s in self.Nominal_d18O_VPDB]

    assert a18_acid, 'Acid fractionation factor should not be zero.'

    samples = [s for s in samples if s in self.Nominal_d13C_VPDB and s in self.Nominal_d18O_VPDB]
    R45R46_standards = {}
    for sample in samples:
        d13C_vpdb = self.Nominal_d13C_VPDB[sample]
        d18O_vpdb = self.Nominal_d18O_VPDB[sample]
        R13_s = self.R13_VPDB * (1 + d13C_vpdb / 1000)
        R17_s = self.R17_VPDB * ((1 + d18O_vpdb / 1000) * a18_acid) ** self.LAMBDA_17
        R18_s = self.R18_VPDB * (1 + d18O_vpdb / 1000) * a18_acid

        C12_s = 1 / (1 + R13_s)
        C13_s = R13_s / (1 + R13_s)
        C16_s = 1 / (1 + R17_s + R18_s)
        C17_s = R17_s / (1 + R17_s + R18_s)
        C18_s = R18_s / (1 + R17_s + R18_s)

        C626_s = C12_s * C16_s ** 2
        C627_s = 2 * C12_s * C16_s * C17_s
        C628_s = 2 * C12_s * C16_s * C18_s
        C636_s = C13_s * C16_s ** 2
        C637_s = 2 * C13_s * C16_s * C17_s
        C727_s = C12_s * C17_s ** 2

        R45_s = (C627_s + C636_s) / C626_s
        R46_s = (C628_s + C637_s + C727_s) / C626_s
        R45R46_standards[sample] = (R45_s, R46_s)

    for s in self.sessions:
        db = [r for r in self.sessions[s]['data'] if r['Sample'] in samples]
        assert db, f'No sample from {samples} found in session "{s}".'

        X = [r['d45'] for r in db]
        Y = [R45R46_standards[r['Sample']][0] for r in db]
        x1, x2 = np.min(X), np.max(X)

        if x1 < x2:
            wgcoord = x1/(x1-x2)
        else:
            wgcoord = 999

        if wgcoord < -.5 or wgcoord > 1.5:
            # unreasonable to extrapolate to d45 = 0
            R45_wg = np.mean([y/(1+x/1000) for x,y in zip(X,Y)])
        else:
            # d45 = 0 is reasonably well bracketed
            R45_wg = np.polyfit(X, Y, 1)[1]

        X = [r['d46'] for r in db]
        Y = [R45R46_standards[r['Sample']][1] for r in db]
        x1, x2 = np.min(X), np.max(X)

        if x1 < x2:
            wgcoord = x1/(x1-x2)
        else:
            wgcoord = 999

        if wgcoord < -.5 or wgcoord > 1.5:
            # unreasonable to extrapolate to d46 = 0
            R46_wg = np.mean([y/(1+x/1000) for x,y in zip(X,Y)])
        else:
            # d46 = 0 is reasonably well bracketed
            R46_wg = np.polyfit(X, Y, 1)[1]

        d13Cwg_VPDB, d18Owg_VSMOW = self.compute_bulk_delta(R45_wg, R46_wg)

        self.msg(f'Session {s} WG: δ13C_VPDB = {d13Cwg_VPDB:.3f} δ18O_VSMOW = {d18Owg_VSMOW:.3f}')

        self.sessions[s]['d13Cwg_VPDB'] = d13Cwg_VPDB
        self.sessions[s]['d18Owg_VSMOW'] = d18Owg_VSMOW
        for r in self.sessions[s]['data']:
            r['d13Cwg_VPDB'] = d13Cwg_VPDB
            r['d18Owg_VSMOW'] = d18Owg_VSMOW
```
```py
def compute_bulk_delta(self, R45, R46, D17O = 0):
    '''
    Compute δ13C_VPDB and δ18O_VSMOW,
    by solving the generalized form of equation (17) from
    [Brand et al. (2010)](https://doi.org/10.1351/PAC-REP-09-01-05),
    assuming that δ18O_VSMOW is not too big (0 ± 50 ‰) and
    solving the corresponding second-order Taylor polynomial.
    (Appendix A of [Daëron et al., 2016](https://doi.org/10.1016/j.chemgeo.2016.08.014))
    '''

    K = np.exp(D17O / 1000) * self.R17_VSMOW * self.R18_VSMOW ** -self.LAMBDA_17

    A = -3 * K ** 2 * self.R18_VSMOW ** (2 * self.LAMBDA_17)
    B = 2 * K * R45 * self.R18_VSMOW ** self.LAMBDA_17
    C = 2 * self.R18_VSMOW
    D = -R46

    aa = A * self.LAMBDA_17 * (2 * self.LAMBDA_17 - 1) + B * self.LAMBDA_17 * (self.LAMBDA_17 - 1) / 2
    bb = 2 * A * self.LAMBDA_17 + B * self.LAMBDA_17 + C
    cc = A + B + C + D

    d18O_VSMOW = 1000 * (-bb + (bb ** 2 - 4 * aa * cc) ** .5) / (2 * aa)

    R18 = (1 + d18O_VSMOW / 1000) * self.R18_VSMOW
    R17 = K * R18 ** self.LAMBDA_17
    R13 = R45 - 2 * R17

    d13C_VPDB = 1000 * (R13 / self.R13_VPDB - 1)

    return d13C_VPDB, d18O_VSMOW
```
```py
@make_verbal
def crunch(self, verbose = ''):
    '''
    Compute bulk composition and raw clumped isotope anomalies for all analyses.
    '''
    for r in self:
        self.compute_bulk_and_clumping_deltas(r)
    self.standardize_d13C()
    self.standardize_d18O()
    self.msg(f"Crunched {len(self)} analyses.")
```
```py
def fill_in_missing_info(self, session = 'mySession'):
    '''
    Fill in optional fields with default values
    '''
    for i,r in enumerate(self):
        if 'D17O' not in r:
            r['D17O'] = 0.
        if 'UID' not in r:
            r['UID'] = f'{i+1}'
        if 'Session' not in r:
            r['Session'] = session
        for k in ['d47', 'd48', 'd49']:
            if k not in r:
                r[k] = np.nan
```
```py
def standardize_d13C(self):
    '''
    Perform δ13C standardization within each session `s` according to
    `self.sessions[s]['d13C_standardization_method']`, which is defined by default
    by `D47data.refresh_sessions()` as equal to `self.d13C_STANDARDIZATION_METHOD`, but
    may be redefined arbitrarily at a later stage.
    '''
    for s in self.sessions:
        if self.sessions[s]['d13C_standardization_method'] in ['1pt', '2pt']:
            XY = [(r['d13C_VPDB'], self.Nominal_d13C_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d13C_VPDB]
            X,Y = zip(*XY)
            if self.sessions[s]['d13C_standardization_method'] == '1pt':
                offset = np.mean(Y) - np.mean(X)
                for r in self.sessions[s]['data']:
                    r['d13C_VPDB'] += offset
            elif self.sessions[s]['d13C_standardization_method'] == '2pt':
                a,b = np.polyfit(X,Y,1)
                for r in self.sessions[s]['data']:
                    r['d13C_VPDB'] = a * r['d13C_VPDB'] + b
```
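For instance, a minimal sketch (session name illustrative) switching one session to a two-point δ13C correction before crunching:

```py
mydata.read('rawdata.csv', session = 'Session01')
mydata.wg()

# override the default method for this session only:
mydata.sessions['Session01']['d13C_standardization_method'] = '2pt'
mydata.crunch()
```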
```py
def standardize_d18O(self):
    '''
    Perform δ18O standardization within each session `s` according to
    `self.ALPHA_18O_ACID_REACTION` and `self.sessions[s]['d18O_standardization_method']`,
    which is defined by default by `D47data.refresh_sessions()` as equal to
    `self.d18O_STANDARDIZATION_METHOD`, but may be redefined arbitrarily at a later stage.
    '''
    for s in self.sessions:
        if self.sessions[s]['d18O_standardization_method'] in ['1pt', '2pt']:
            XY = [(r['d18O_VSMOW'], self.Nominal_d18O_VPDB[r['Sample']]) for r in self.sessions[s]['data'] if r['Sample'] in self.Nominal_d18O_VPDB]
            X,Y = zip(*XY)
            Y = [(1000+y) * self.R18_VPDB * self.ALPHA_18O_ACID_REACTION / self.R18_VSMOW - 1000 for y in Y]
            if self.sessions[s]['d18O_standardization_method'] == '1pt':
                offset = np.mean(Y) - np.mean(X)
                for r in self.sessions[s]['data']:
                    r['d18O_VSMOW'] += offset
            elif self.sessions[s]['d18O_standardization_method'] == '2pt':
                a,b = np.polyfit(X,Y,1)
                for r in self.sessions[s]['data']:
                    r['d18O_VSMOW'] = a * r['d18O_VSMOW'] + b
```
```py
def compute_bulk_and_clumping_deltas(self, r):
    '''
    Compute δ13C_VPDB, δ18O_VSMOW, and raw Δ47, Δ48, Δ49 values for a single analysis `r`.
    '''

    # Compute working gas R13, R18, and isobar ratios
    R13_wg = self.R13_VPDB * (1 + r['d13Cwg_VPDB'] / 1000)
    R18_wg = self.R18_VSMOW * (1 + r['d18Owg_VSMOW'] / 1000)
    R45_wg, R46_wg, R47_wg, R48_wg, R49_wg = self.compute_isobar_ratios(R13_wg, R18_wg)

    # Compute analyte isobar ratios
    R45 = (1 + r['d45'] / 1000) * R45_wg
    R46 = (1 + r['d46'] / 1000) * R46_wg
    R47 = (1 + r['d47'] / 1000) * R47_wg
    R48 = (1 + r['d48'] / 1000) * R48_wg
    R49 = (1 + r['d49'] / 1000) * R49_wg

    r['d13C_VPDB'], r['d18O_VSMOW'] = self.compute_bulk_delta(R45, R46, D17O = r['D17O'])
    R13 = (1 + r['d13C_VPDB'] / 1000) * self.R13_VPDB
    R18 = (1 + r['d18O_VSMOW'] / 1000) * self.R18_VSMOW

    # Compute stochastic isobar ratios of the analyte
    R45stoch, R46stoch, R47stoch, R48stoch, R49stoch = self.compute_isobar_ratios(
        R13, R18, D17O = r['D17O']
        )

    # Check that R45/R45stoch and R46/R46stoch are indistinguishable from 1,
    # and raise a warning if the corresponding anomalies exceed 0.05 ppm (5e-8).
    if (R45 / R45stoch - 1) > 5e-8:
        self.vmsg(f'This is unexpected: R45/R45stoch - 1 = {1e6 * (R45 / R45stoch - 1):.3f} ppm')
    if (R46 / R46stoch - 1) > 5e-8:
        self.vmsg(f'This is unexpected: R46/R46stoch - 1 = {1e6 * (R46 / R46stoch - 1):.3f} ppm')

    # Compute raw clumped isotope anomalies
    r['D47raw'] = 1000 * (R47 / R47stoch - 1)
    r['D48raw'] = 1000 * (R48 / R48stoch - 1)
    r['D49raw'] = 1000 * (R49 / R49stoch - 1)
```
```py
def compute_isobar_ratios(self, R13, R18, D17O=0, D47=0, D48=0, D49=0):
    '''
    Compute isobar ratios for a sample with isotopic ratios `R13` and `R18`,
    optionally accounting for non-zero values of Δ17O (`D17O`) and clumped isotope
    anomalies (`D47`, `D48`, `D49`), all expressed in permil.
    '''

    # Compute R17
    R17 = self.R17_VSMOW * np.exp(D17O / 1000) * (R18 / self.R18_VSMOW) ** self.LAMBDA_17

    # Compute isotope concentrations
    C12 = (1 + R13) ** -1
    C13 = C12 * R13
    C16 = (1 + R17 + R18) ** -1
    C17 = C16 * R17
    C18 = C16 * R18

    # Compute stochastic isotopologue concentrations
    C626 = C16 * C12 * C16
    C627 = C16 * C12 * C17 * 2
    C628 = C16 * C12 * C18 * 2
    C636 = C16 * C13 * C16
    C637 = C16 * C13 * C17 * 2
    C638 = C16 * C13 * C18 * 2
    C727 = C17 * C12 * C17
    C728 = C17 * C12 * C18 * 2
    C737 = C17 * C13 * C17
    C738 = C17 * C13 * C18 * 2
    C828 = C18 * C12 * C18
    C838 = C18 * C13 * C18

    # Compute stochastic isobar ratios
    R45 = (C636 + C627) / C626
    R46 = (C628 + C637 + C727) / C626
    R47 = (C638 + C728 + C737) / C626
    R48 = (C738 + C828) / C626
    R49 = C838 / C626

    # Account for stochastic anomalies
    R47 *= 1 + D47 / 1000
    R48 *= 1 + D48 / 1000
    R49 *= 1 + D49 / 1000

    # Return isobar ratios
    return R45, R46, R47, R48, R49
```
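The two methods above are mutually consistent: isobar ratios built by `compute_isobar_ratios()` can be fed back to `compute_bulk_delta()` to recover the bulk composition (to within the Taylor approximation). A minimal sketch (the `R13` and `R18` values are illustrative, close to VPDB and VSMOW):

```py
mydata = D47crunch.D47data()

R13, R18 = 0.01118, 0.0020052
R45, R46, R47, R48, R49 = mydata.compute_isobar_ratios(R13, R18)

# recover δ13C_VPDB and δ18O_VSMOW from (R45, R46):
d13C, d18O = mydata.compute_bulk_delta(R45, R46)
```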
```py
def split_samples(self, samples_to_split = 'all', grouping = 'by_session'):
    '''
    Split unknown samples by UID (treat all analyses as different samples)
    or by session (treat analyses of a given sample in different sessions as
    different samples).

    **Parameters**

    + `samples_to_split`: a list of samples to split, e.g., `['IAEA-C1', 'IAEA-C2']`
    + `grouping`: `by_uid` | `by_session`
    '''
    if samples_to_split == 'all':
        samples_to_split = [s for s in self.unknowns]
    gkeys = {'by_uid':'UID', 'by_session':'Session'}
    self.grouping = grouping.lower()
    if self.grouping in gkeys:
        gkey = gkeys[self.grouping]
        for r in self:
            if r['Sample'] in samples_to_split:
                r['Sample_original'] = r['Sample']
                r['Sample'] = f"{r['Sample']}__{r[gkey]}"
            elif r['Sample'] in self.unknowns:
                r['Sample_original'] = r['Sample']
        self.refresh_samples()
```
```py
def unsplit_samples(self, tables = False):
    '''
    Reverse the effects of `D47data.split_samples()`.

    This should only be used after `D4xdata.standardize()` with `method='pooled'`.

    After `D4xdata.standardize()` with `method='indep_sessions'`, one should
    probably use `D4xdata.combine_samples()` instead to reverse the effects of
    `D47data.split_samples()` with `grouping='by_uid'`, or `w_avg()` to reverse the
    effects of `D47data.split_samples()` with `grouping='by_session'` (because in
    that case session-averaged Δ4x values are statistically independent).
    '''
    unknowns_old = sorted({s for s in self.unknowns})
    CM_old = self.standardization.covar[:,:]
    VD_old = self.standardization.params.valuesdict().copy()
    vars_old = self.standardization.var_names

    unknowns_new = sorted({r['Sample_original'] for r in self if 'Sample_original' in r})

    Ns = len(vars_old) - len(unknowns_old)
    vars_new = vars_old[:Ns] + [f'D{self._4x}_{pf(u)}' for u in unknowns_new]
    VD_new = {k: VD_old[k] for k in vars_old[:Ns]}

    W = np.zeros((len(vars_new), len(vars_old)))
    W[:Ns,:Ns] = np.eye(Ns)
    for u in unknowns_new:
        splits = sorted({r['Sample'] for r in self if 'Sample_original' in r and r['Sample_original'] == u})
        if self.grouping == 'by_session':
            weights = [self.samples[s][f'SE_D{self._4x}']**-2 for s in splits]
        elif self.grouping == 'by_uid':
            weights = [1 for s in splits]
        sw = sum(weights)
        weights = [w/sw for w in weights]
        W[vars_new.index(f'D{self._4x}_{pf(u)}'),[vars_old.index(f'D{self._4x}_{pf(s)}') for s in splits]] = weights[:]

    CM_new = W @ CM_old @ W.T
    V = W @ np.array([[VD_old[k]] for k in vars_old])
    VD_new = {k:v[0] for k,v in zip(vars_new, V)}

    self.standardization.covar = CM_new
    self.standardization.params.valuesdict = lambda : VD_new
    self.standardization.var_names = vars_new

    for r in self:
        if r['Sample'] in self.unknowns:
            r['Sample_split'] = r['Sample']
            r['Sample'] = r['Sample_original']

    self.refresh_samples()
    self.consolidate_samples()
    self.repeatabilities()

    if tables:
        self.table_of_analyses()
        self.table_of_samples()
```
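A minimal workflow sketch (sample name from the tutorial) checking the between-session consistency of one unknown, then restoring the original grouping:

```py
# treat each session's analyses of MYSAMPLE-1 as a separate sample:
mydata.split_samples(['MYSAMPLE-1'], grouping = 'by_session')
mydata.standardize()

# inspect the per-session values in the output tables, then merge back:
mydata.unsplit_samples()
```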
```py
def assign_timestamps(self):
    '''
    Assign a time field `t` of type `float` to each analysis.

    If `TimeTag` is one of the data fields, `t` is equal within a given session
    to `TimeTag` minus the mean value of `TimeTag` for that session.
    Otherwise, `TimeTag` is by default equal to the index of each analysis
    in the dataset and `t` is defined as above.
    '''
    for session in self.sessions:
        sdata = self.sessions[session]['data']
        try:
            t0 = np.mean([r['TimeTag'] for r in sdata])
            for r in sdata:
                r['t'] = r['TimeTag'] - t0
        except KeyError:
            t0 = (len(sdata)-1)/2
            for t,r in enumerate(sdata):
                r['t'] = t - t0
```
```py
def report(self):
    '''
    Prints a report on the standardization fit.
    Only applicable after `D4xdata.standardize(method='pooled')`.
    '''
    report_fit(self.standardization)
```
````py
def combine_samples(self, sample_groups):
    '''
    Combine analyses of different samples to compute weighted average Δ4x
    and new error (co)variances corresponding to the groups defined by the `sample_groups`
    dictionary.

    Caution: samples are weighted by number of replicate analyses, which is a
    reasonable default behavior but is not always optimal (e.g., in the case of strongly
    correlated analytical errors for one or more samples).

    Returns a tuple of:

    + the list of group names
    + an array of the corresponding Δ4x values
    + the corresponding (co)variance matrix

    **Parameters**

    + `sample_groups`: a dictionary of the form:
    ```py
    {'group1': ['sample_1', 'sample_2'],
     'group2': ['sample_3', 'sample_4', 'sample_5']}
    ```
    '''

    samples = [s for k in sorted(sample_groups.keys()) for s in sorted(sample_groups[k])]
    groups = sorted(sample_groups.keys())
    group_total_weights = {k: sum([self.samples[s]['N'] for s in sample_groups[k]]) for k in groups}
    D4x_old = np.array([[self.samples[x][f'D{self._4x}']] for x in samples])
    CM_old = np.array([[self.sample_D4x_covar(x,y) for x in samples] for y in samples])
    W = np.array([
        [self.samples[i]['N']/group_total_weights[j] if i in sample_groups[j] else 0 for i in samples]
        for j in groups])
    D4x_new = W @ D4x_old
    CM_new = W @ CM_old @ W.T

    return groups, D4x_new[:,0], CM_new
````
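A minimal sketch (group name illustrative) pooling the two tutorial unknowns after standardization:

```py
groups, D47_new, CM_new = mydata.combine_samples(
    {'MYSAMPLES': ['MYSAMPLE-1', 'MYSAMPLE-2']}
    )
print(groups[0], f'{D47_new[0]:.4f} ± {CM_new[0,0]**.5:.4f}')
```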
```py
@make_verbal
def standardize(self,
    method = 'pooled',
    weighted_sessions = [],
    consolidate = True,
    consolidate_tables = False,
    consolidate_plots = False,
    constraints = {},
    ):
    '''
    Compute absolute Δ4x values for all replicate analyses and for sample averages.
    If `method` argument is set to `'pooled'`, the standardization processes all sessions
    in a single step, assuming that all samples (anchors and unknowns alike) are homogeneous,
    i.e. that their true Δ4x value does not change between sessions
    ([Daëron, 2021](https://doi.org/10.1029/2020GC009592)). If `method` argument is set to
    `'indep_sessions'`, the standardization processes each session independently, based only
    on anchor analyses.
    '''

    self.standardization_method = method
    self.assign_timestamps()

    if method == 'pooled':
        if weighted_sessions:
            for session_group in weighted_sessions:
                if self._4x == '47':
                    X = D47data([r for r in self if r['Session'] in session_group])
                elif self._4x == '48':
                    X = D48data([r for r in self if r['Session'] in session_group])
                X.Nominal_D4x = self.Nominal_D4x.copy()
                X.refresh()
                result = X.standardize(method = 'pooled', weighted_sessions = [], consolidate = False)
                w = np.sqrt(result.redchi)
                self.msg(f'Session group {session_group} MRSWD = {w:.4f}')
                for r in X:
                    r[f'wD{self._4x}raw'] *= w
        else:
            self.msg(f'All D{self._4x}raw weights set to 1 ‰')
            for r in self:
                r[f'wD{self._4x}raw'] = 1.

        params = Parameters()
        for k,session in enumerate(self.sessions):
            self.msg(f"Session {session}: scrambling_drift is {self.sessions[session]['scrambling_drift']}.")
            self.msg(f"Session {session}: slope_drift is {self.sessions[session]['slope_drift']}.")
            self.msg(f"Session {session}: wg_drift is {self.sessions[session]['wg_drift']}.")
            s = pf(session)
            params.add(f'a_{s}', value = 0.9)
            params.add(f'b_{s}', value = 0.)
            params.add(f'c_{s}', value = -0.9)
            params.add(f'a2_{s}', value = 0.)
            params.add(f'b2_{s}', value = 0.)
            params.add(f'c2_{s}', value = 0.)
            if not self.sessions[session]['scrambling_drift']:
                params[f'a2_{s}'].expr = '0'
            if not self.sessions[session]['slope_drift']:
                params[f'b2_{s}'].expr = '0'
            if not self.sessions[session]['wg_drift']:
                params[f'c2_{s}'].expr = '0'

        for sample in self.unknowns:
            params.add(f'D{self._4x}_{pf(sample)}', value = 0.5)

        for k in constraints:
            params[k].expr = constraints[k]

        def residuals(p):
            R = []
            for r in self:
                session = pf(r['Session'])
                sample = pf(r['Sample'])
                if r['Sample'] in self.Nominal_D4x:
                    R += [ (
                        r[f'D{self._4x}raw'] - (
                            p[f'a_{session}'] * self.Nominal_D4x[r['Sample']]
                            + p[f'b_{session}'] * r[f'd{self._4x}']
                            + p[f'c_{session}']
                            + r['t'] * (
                                p[f'a2_{session}'] * self.Nominal_D4x[r['Sample']]
                                + p[f'b2_{session}'] * r[f'd{self._4x}']
                                + p[f'c2_{session}']
                                )
                            )
                        ) / r[f'wD{self._4x}raw'] ]
                else:
                    R += [ (
                        r[f'D{self._4x}raw'] - (
                            p[f'a_{session}'] * p[f'D{self._4x}_{sample}']
                            + p[f'b_{session}'] * r[f'd{self._4x}']
                            + p[f'c_{session}']
                            + r['t'] * (
                                p[f'a2_{session}'] * p[f'D{self._4x}_{sample}']
                                + p[f'b2_{session}'] * r[f'd{self._4x}']
                                + p[f'c2_{session}']
                                )
                            )
                        ) / r[f'wD{self._4x}raw'] ]
            return R

        M = Minimizer(residuals, params)
        result = M.least_squares()
        self.Nf = result.nfree
        self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)
        new_names, new_covar, new_se = _fullcovar(result)[:3]
        result.var_names = new_names
        result.covar = new_covar

        for r in self:
            s = pf(r["Session"])
            a = result.params.valuesdict()[f'a_{s}']
            b = result.params.valuesdict()[f'b_{s}']
            c = result.params.valuesdict()[f'c_{s}']
            a2 = result.params.valuesdict()[f'a2_{s}']
            b2 = result.params.valuesdict()[f'b2_{s}']
            c2 = result.params.valuesdict()[f'c2_{s}']
            r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])

        self.standardization = result

        for session in self.sessions:
            self.sessions[session]['Np'] = 3
            for k in ['scrambling', 'slope', 'wg']:
                if self.sessions[session][f'{k}_drift']:
                    self.sessions[session]['Np'] += 1

        if consolidate:
            self.consolidate(tables = consolidate_tables, plots = consolidate_plots)
        return result

    elif method == 'indep_sessions':

        if weighted_sessions:
            for session_group in weighted_sessions:
                X = D4xdata([r for r in self if r['Session'] in session_group], mass = self._4x)
                X.Nominal_D4x = self.Nominal_D4x.copy()
                X.refresh()
                # This is only done to assign r['wD47raw'] for r in X:
                X.standardize(method = method, weighted_sessions = [], consolidate = False)
                self.msg(f'D{self._4x}raw weights set to {1000*X[0][f"wD{self._4x}raw"]:.1f} ppm for sessions in {session_group}')
        else:
            self.msg('All weights set to 1 ‰')
            for r in self:
                r[f'wD{self._4x}raw'] = 1

        for session in self.sessions:
            s = self.sessions[session]
            p_names = ['a', 'b', 'c', 'a2', 'b2', 'c2']
            p_active = [True, True, True, s['scrambling_drift'], s['slope_drift'], s['wg_drift']]
            s['Np'] = sum(p_active)
            sdata = s['data']

            A = np.array([
                [
                    self.Nominal_D4x[r['Sample']] / r[f'wD{self._4x}raw'],
                    r[f'd{self._4x}'] / r[f'wD{self._4x}raw'],
                    1 / r[f'wD{self._4x}raw'],
                    self.Nominal_D4x[r['Sample']] * r['t'] / r[f'wD{self._4x}raw'],
                    r[f'd{self._4x}'] * r['t'] / r[f'wD{self._4x}raw'],
                    r['t'] / r[f'wD{self._4x}raw']
                    ]
                for r in sdata if r['Sample'] in self.anchors
                ])[:,p_active] # only keep columns for the active parameters
            Y = np.array([[r[f'D{self._4x}raw'] / r[f'wD{self._4x}raw']] for r in sdata if r['Sample'] in self.anchors])
            s['Na'] = Y.size
            CM = linalg.inv(A.T @ A)
            bf = (CM @ A.T @ Y).T[0,:]
            k = 0
            for n,a in zip(p_names, p_active):
                if a:
                    s[n] = bf[k]
                    k += 1
                else:
                    s[n] = 0.

            for r in sdata:
                a, b, c, a2, b2, c2 = s['a'], s['b'], s['c'], s['a2'], s['b2'], s['c2']
                r[f'D{self._4x}'] = (r[f'D{self._4x}raw'] - c - b * r[f'd{self._4x}'] - c2 * r['t'] - b2 * r['t'] * r[f'd{self._4x}']) / (a + a2 * r['t'])
                r[f'wD{self._4x}'] = r[f'wD{self._4x}raw'] / (a + a2 * r['t'])

            s['CM'] = np.zeros((6,6))
            i = 0
            k_active = [j for j,a in enumerate(p_active) if a]
            for j,a in enumerate(p_active):
                if a:
                    s['CM'][j,k_active] = CM[i,:]
                    i += 1

        if not weighted_sessions:
            w = self.rmswd()['rmswd']
            for r in self:
                r[f'wD{self._4x}'] *= w
                r[f'wD{self._4x}raw'] *= w
            for session in self.sessions:
                self.sessions[session]['CM'] *= w**2

        for session in self.sessions:
            s = self.sessions[session]
            s['SE_a'] = s['CM'][0,0]**.5
            s['SE_b'] = s['CM'][1,1]**.5
            s['SE_c'] = s['CM'][2,2]**.5
            s['SE_a2'] = s['CM'][3,3]**.5
            s['SE_b2'] = s['CM'][4,4]**.5
            s['SE_c2'] = s['CM'][5,5]**.5

        if not weighted_sessions:
            self.Nf = len(self) - len(self.unknowns) - np.sum([self.sessions[s]['Np'] for s in self.sessions])
        else:
            self.Nf = 0
            for sg in weighted_sessions:
                self.Nf += self.rmswd(sessions = sg)['Nf']

        self.t95 = tstudent.ppf(1 - 0.05/2, self.Nf)

        avgD4x = {
            sample: np.mean([r[f'D{self._4x}'] for r in self if r['Sample'] == sample])
            for sample in self.samples
            }
        chi2 = np.sum([(r[f'D{self._4x}'] - avgD4x[r['Sample']])**2 for r in self])
        rD4x = (chi2/self.Nf)**.5
        self.repeatability[f'sigma_{self._4x}'] = rD4x

        if consolidate:
            self.consolidate(tables = consolidate_tables, plots = consolidate_plots)
```
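A minimal sketch of both approaches (session name illustrative; drift flags must be set before the call):

```py
# pooled regression over all sessions, allowing the WG offset
# of one session to drift with time:
mydata.sessions['Session01']['wg_drift'] = True
mydata.standardize(method = 'pooled')

# or: standardize each session independently, using only its anchors:
mydata.standardize(method = 'indep_sessions')
```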
```py
def standardization_error(self, session, d4x, D4x, t = 0):
    '''
    Compute standardization error for a given session and
    (δ47, Δ47) composition.
    '''
    a = self.sessions[session]['a']
    b = self.sessions[session]['b']
    c = self.sessions[session]['c']
    a2 = self.sessions[session]['a2']
    b2 = self.sessions[session]['b2']
    c2 = self.sessions[session]['c2']
    CM = self.sessions[session]['CM']

    x, y = D4x, d4x
    z = a * x + b * y + c + a2 * x * t + b2 * y * t + c2 * t
#   x = (z - b*y - b2*y*t - c - c2*t) / (a+a2*t)
    dxdy = -(b+b2*t) / (a+a2*t)
    dxdz = 1. / (a+a2*t)
    dxda = -x / (a+a2*t)
    dxdb = -y / (a+a2*t)
    dxdc = -1. / (a+a2*t)
    dxda2 = -x * t / (a+a2*t)
    dxdb2 = -y * t / (a+a2*t)
    dxdc2 = -t / (a+a2*t)
    V = np.array([dxda, dxdb, dxdc, dxda2, dxdb2, dxdc2])
    sx = (V @ CM @ V.T) ** .5
    return sx
```
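A minimal sketch (session name and composition illustrative; requires `self.sessions[session]['CM']`, which is defined after standardization):

```py
se = mydata.standardization_error('Session01', d4x = 20.0, D4x = 0.6)
print(f'Standardization error: {1000 * se:.1f} ppm')
```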
```py
@make_verbal
def summary(self,
    dir = 'output',
    filename = None,
    save_to_file = True,
    print_out = True,
    ):
    '''
    Print out and/or save to disk a summary of the standardization results.

    **Parameters**

    + `dir`: the directory in which to save the table
    + `filename`: the name of the csv file to write to
    + `save_to_file`: whether to save the table to disk
    + `print_out`: whether to print out the table
    '''

    out = []
    out += [['N samples (anchors + unknowns)', f"{len(self.samples)} ({len(self.anchors)} + {len(self.unknowns)})"]]
    out += [['N analyses (anchors + unknowns)', f"{len(self)} ({len([r for r in self if r['Sample'] in self.anchors])} + {len([r for r in self if r['Sample'] in self.unknowns])})"]]
    out += [['Repeatability of δ13C_VPDB', f"{1000 * self.repeatability['r_d13C_VPDB']:.1f} ppm"]]
    out += [['Repeatability of δ18O_VSMOW', f"{1000 * self.repeatability['r_d18O_VSMOW']:.1f} ppm"]]
    out += [[f'Repeatability of Δ{self._4x} (anchors)', f"{1000 * self.repeatability[f'r_D{self._4x}a']:.1f} ppm"]]
    out += [[f'Repeatability of Δ{self._4x} (unknowns)', f"{1000 * self.repeatability[f'r_D{self._4x}u']:.1f} ppm"]]
    out += [[f'Repeatability of Δ{self._4x} (all)', f"{1000 * self.repeatability[f'r_D{self._4x}']:.1f} ppm"]]
    out += [['Model degrees of freedom', f"{self.Nf}"]]
    out += [['Student\'s 95% t-factor', f"{self.t95:.2f}"]]
    out += [['Standardization method', self.standardization_method]]

    if save_to_file:
        if not os.path.exists(dir):
            os.makedirs(dir)
        if filename is None:
            filename = f'D{self._4x}_summary.csv'
        with open(f'{dir}/{filename}', 'w') as fid:
            fid.write(make_csv(out))
    if print_out:
        self.msg('\n' + pretty_table(out, header = 0))
```
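A minimal sketch (paths illustrative), after standardization, saving the summary to a custom location without printing it:

```py
mydata.summary(dir = 'results', filename = 'my_summary.csv', print_out = False)
```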
```py
@make_verbal
def table_of_sessions(self,
    dir = 'output',
    filename = None,
    save_to_file = True,
    print_out = True,
    output = None,
    ):
    '''
    Print out and/or save to disk a table of sessions.

    **Parameters**

    + `dir`: the directory in which to save the table
    + `filename`: the name of the csv file to write to
    + `save_to_file`: whether to save the table to disk
    + `print_out`: whether to print out the table
    + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
    if set to `'raw'`: return a list of lists of strings
    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
    '''
    include_a2 = any([self.sessions[session]['scrambling_drift'] for session in self.sessions])
    include_b2 = any([self.sessions[session]['slope_drift'] for session in self.sessions])
    include_c2 = any([self.sessions[session]['wg_drift'] for session in self.sessions])

    out = [['Session','Na','Nu','d13Cwg_VPDB','d18Owg_VSMOW','r_d13C','r_d18O',f'r_D{self._4x}','a ± SE','1e3 x b ± SE','c ± SE']]
    if include_a2:
        out[-1] += ['a2 ± SE']
    if include_b2:
        out[-1] += ['b2 ± SE']
    if include_c2:
        out[-1] += ['c2 ± SE']
    for session in self.sessions:
        out += [[
            session,
            f"{self.sessions[session]['Na']}",
            f"{self.sessions[session]['Nu']}",
            f"{self.sessions[session]['d13Cwg_VPDB']:.3f}",
            f"{self.sessions[session]['d18Owg_VSMOW']:.3f}",
            f"{self.sessions[session]['r_d13C_VPDB']:.4f}",
            f"{self.sessions[session]['r_d18O_VSMOW']:.4f}",
            f"{self.sessions[session][f'r_D{self._4x}']:.4f}",
            f"{self.sessions[session]['a']:.3f} ± {self.sessions[session]['SE_a']:.3f}",
            f"{1e3*self.sessions[session]['b']:.3f} ± {1e3*self.sessions[session]['SE_b']:.3f}",
            f"{self.sessions[session]['c']:.3f} ± {self.sessions[session]['SE_c']:.3f}",
            ]]
        if include_a2:
            if self.sessions[session]['scrambling_drift']:
                out[-1] += [f"{self.sessions[session]['a2']:.1e} ± {self.sessions[session]['SE_a2']:.1e}"]
            else:
                out[-1] += ['']
        if include_b2:
            if self.sessions[session]['slope_drift']:
                out[-1] += [f"{self.sessions[session]['b2']:.1e} ± {self.sessions[session]['SE_b2']:.1e}"]
            else:
                out[-1] += ['']
        if include_c2:
            if self.sessions[session]['wg_drift']:
                out[-1] += [f"{self.sessions[session]['c2']:.1e} ± {self.sessions[session]['SE_c2']:.1e}"]
            else:
                out[-1] += ['']

    if save_to_file:
        if not os.path.exists(dir):
            os.makedirs(dir)
        if filename is None:
            filename = f'D{self._4x}_sessions.csv'
        with open(f'{dir}/{filename}', 'w') as fid:
            fid.write(make_csv(out))
    if print_out:
        self.msg('\n' + pretty_table(out))
    if output == 'raw':
        return out
    elif output == 'pretty':
        return pretty_table(out)
```
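A minimal sketch capturing the table as a list of rows instead of writing it to disk:

```py
rows = mydata.table_of_sessions(save_to_file = False, print_out = False, output = 'raw')
for row in rows:
    print(row[:3])  # Session, Na, Nu
```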
```py
@make_verbal
def table_of_analyses(
    self,
    dir = 'output',
    filename = None,
    save_to_file = True,
    print_out = True,
    output = None,
    ):
    '''
    Print out and/or save to disk a table of analyses.

    **Parameters**

    + `dir`: the directory in which to save the table
    + `filename`: the name of the csv file to write to
    + `save_to_file`: whether to save the table to disk
    + `print_out`: whether to print out the table
    + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
    if set to `'raw'`: return a list of lists of strings
    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
    '''

    out = [['UID','Session','Sample']]
    extra_fields = [f for f in [('SampleMass','.2f'),('ColdFingerPressure','.1f'),('AcidReactionYield','.3f')] if f[0] in {k for r in self for k in r}]
    for f in extra_fields:
        out[-1] += [f[0]]
    out[-1] += ['d13Cwg_VPDB','d18Owg_VSMOW','d45','d46','d47','d48','d49','d13C_VPDB','d18O_VSMOW','D47raw','D48raw','D49raw',f'D{self._4x}']
    for r in self:
        out += [[f"{r['UID']}",f"{r['Session']}",f"{r['Sample']}"]]
        for f in extra_fields:
            out[-1] += [f"{r[f[0]]:{f[1]}}"]
        out[-1] += [
            f"{r['d13Cwg_VPDB']:.3f}",
            f"{r['d18Owg_VSMOW']:.3f}",
            f"{r['d45']:.6f}",
            f"{r['d46']:.6f}",
            f"{r['d47']:.6f}",
            f"{r['d48']:.6f}",
            f"{r['d49']:.6f}",
            f"{r['d13C_VPDB']:.6f}",
            f"{r['d18O_VSMOW']:.6f}",
            f"{r['D47raw']:.6f}",
            f"{r['D48raw']:.6f}",
            f"{r['D49raw']:.6f}",
            f"{r[f'D{self._4x}']:.6f}"
            ]
    if save_to_file:
        if not os.path.exists(dir):
            os.makedirs(dir)
        if filename is None:
            filename = f'D{self._4x}_analyses.csv'
        with open(f'{dir}/{filename}', 'w') as fid:
            fid.write(make_csv(out))
    if print_out:
        self.msg('\n' + pretty_table(out))
    return out
```
```py
@make_verbal
def covar_table(
    self,
    correl = False,
    dir = 'output',
    filename = None,
    save_to_file = True,
    print_out = True,
    output = None,
    ):
    '''
    Print out, save to disk and/or return the variance-covariance matrix of D4x
    for all unknown samples.

    **Parameters**

    + `correl`: if `True`, tabulate the correlation matrix instead of the (co)variances
    + `dir`: the directory in which to save the csv
    + `filename`: the name of the csv file to write to
    + `save_to_file`: whether to save the csv
    + `print_out`: whether to print out the matrix
    + `output`: if set to `'pretty'`: return a pretty text matrix (see `pretty_table()`);
    if set to `'raw'`: return a list of lists of strings
    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
    '''
    samples = sorted([u for u in self.unknowns])
    out = [[''] + samples]
    for s1 in samples:
        out.append([s1])
        for s2 in samples:
            if correl:
                out[-1].append(f'{self.sample_D4x_correl(s1, s2):.6f}')
            else:
                out[-1].append(f'{self.sample_D4x_covar(s1, s2):.8e}')

    if save_to_file:
        if not os.path.exists(dir):
            os.makedirs(dir)
        if filename is None:
            if correl:
                filename = f'D{self._4x}_correl.csv'
            else:
                filename = f'D{self._4x}_covar.csv'
        with open(f'{dir}/{filename}', 'w') as fid:
            fid.write(make_csv(out))
    if print_out:
        self.msg('\n'+pretty_table(out))
    if output == 'raw':
        return out
    elif output == 'pretty':
        return pretty_table(out)
```
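A minimal sketch printing the correlation matrix of the unknowns without saving it:

```py
mydata.covar_table(correl = True, save_to_file = False)
```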
```py
@make_verbal
def table_of_samples(
    self,
    dir = 'output',
    filename = None,
    save_to_file = True,
    print_out = True,
    output = None,
    ):
    '''
    Print out, save to disk and/or return a table of samples.

    **Parameters**

    + `dir`: the directory in which to save the csv
    + `filename`: the name of the csv file to write to
    + `save_to_file`: whether to save the csv
    + `print_out`: whether to print out the table
    + `output`: if set to `'pretty'`: return a pretty text table (see `pretty_table()`);
    if set to `'raw'`: return a list of lists of strings
    (e.g., `[['header1', 'header2'], ['0.1', '0.2']]`)
    '''

    out = [['Sample','N','d13C_VPDB','d18O_VSMOW',f'D{self._4x}','SE','95% CL','SD','p_Levene']]
    for sample in self.anchors:
        out += [[
            f"{sample}",
            f"{self.samples[sample]['N']}",
            f"{self.samples[sample]['d13C_VPDB']:.2f}",
            f"{self.samples[sample]['d18O_VSMOW']:.2f}",
            f"{self.samples[sample][f'D{self._4x}']:.4f}",'','',
            f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '', ''
            ]]
    for sample in self.unknowns:
        out += [[
            f"{sample}",
            f"{self.samples[sample]['N']}",
            f"{self.samples[sample]['d13C_VPDB']:.2f}",
            f"{self.samples[sample]['d18O_VSMOW']:.2f}",
            f"{self.samples[sample][f'D{self._4x}']:.4f}",
            f"{self.samples[sample][f'SE_D{self._4x}']:.4f}",
            f"± {self.samples[sample][f'SE_D{self._4x}'] * self.t95:.4f}",
            f"{self.samples[sample][f'SD_D{self._4x}']:.4f}" if self.samples[sample]['N'] > 1 else '',
            f"{self.samples[sample]['p_Levene']:.3f}" if self.samples[sample]['N'] > 2 else ''
            ]]
    if save_to_file:
        if not os.path.exists(dir):
            os.makedirs(dir)
        if filename is None:
            filename = f'D{self._4x}_samples.csv'
        with open(f'{dir}/{filename}', 'w') as fid:
            fid.write(make_csv(out))
    if print_out:
        self.msg('\n'+pretty_table(out))
    if output == 'raw':
        return out
    elif output == 'pretty':
        return pretty_table(out)
```
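A minimal sketch returning the final sample table as text, e.g. to embed in a report:

```py
txt = mydata.table_of_samples(save_to_file = False, print_out = False, output = 'pretty')
print(txt)
```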
```py
def plot_sessions(self, dir = 'output', figsize = (8,8), filetype = 'pdf', dpi = 100):
    '''
    Generate session plots and save them to disk.

    **Parameters**

    + `dir`: the directory in which to save the plots
    + `figsize`: the width and height (in inches) of each plot
    + `filetype`: 'pdf' or 'png'
    + `dpi`: resolution for PNG output
    '''
    if not os.path.exists(dir):
        os.makedirs(dir)

    for session in self.sessions:
        sp = self.plot_single_session(session, xylimits = 'constant')
        ppl.savefig(f'{dir}/D{self._4x}_plot_{session}.{filetype}', **({'dpi': dpi} if filetype.lower() == 'png' else {}))
        ppl.close(sp.fig)
```
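A minimal sketch (directory illustrative) writing high-resolution PNG plots:

```py
mydata.plot_sessions(dir = 'plots', filetype = 'png', dpi = 300)
```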
```py
@make_verbal
def consolidate_samples(self):
    '''
    Compile various statistics for each sample.

    For each anchor sample:

    + `D47` or `D48`: the nominal Δ4x value for this anchor, specified by `self.Nominal_D4x`
    + `SE_D47` or `SE_D48`: set to zero by definition

    For each unknown sample:

    + `D47` or `D48`: the standardized Δ4x value for this unknown
    + `SE_D47` or `SE_D48`: the standard error of Δ4x for this unknown

    For each anchor and unknown:

    + `N`: the total number of analyses of this sample
    + `SD_D47` or `SD_D48`: the “sample” (in the statistical sense) standard deviation for this sample
    + `d13C_VPDB`: the average δ13C_VPDB value for this sample
    + `d18O_VSMOW`: the average δ18O_VSMOW value for this sample (as CO2)
    + `p_Levene`: the p-value from a [Levene test](https://en.wikipedia.org/wiki/Levene%27s_test) of equal
    variance, indicating whether the Δ4x repeatability of this sample differs significantly from
    that observed for the reference sample specified by `self.LEVENE_REF_SAMPLE`.
    '''
    D4x_ref_pop = [r[f'D{self._4x}'] for r in self.samples[self.LEVENE_REF_SAMPLE]['data']]
    for sample in self.samples:
        self.samples[sample]['N'] = len(self.samples[sample]['data'])
        if self.samples[sample]['N'] > 1:
            self.samples[sample][f'SD_D{self._4x}'] = stdev([r[f'D{self._4x}'] for r in self.samples[sample]['data']])

        self.samples[sample]['d13C_VPDB'] = np.mean([r['d13C_VPDB'] for r in self.samples[sample]['data']])
        self.samples[sample]['d18O_VSMOW'] = np.mean([r['d18O_VSMOW'] for r in self.samples[sample]['data']])

        D4x_pop = [r[f'D{self._4x}'] for r in self.samples[sample]['data']]
        if len(D4x_pop) > 2:
            self.samples[sample]['p_Levene'] = levene(D4x_ref_pop, D4x_pop, center = 'median')[1]

    if self.standardization_method == 'pooled':
        for sample in self.anchors:
            self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample]
            self.samples[sample][f'SE_D{self._4x}'] = 0.
        for sample in self.unknowns:
            self.samples[sample][f'D{self._4x}'] = self.standardization.params.valuesdict()[f'D{self._4x}_{pf(sample)}']
            try:
                self.samples[sample][f'SE_D{self._4x}'] = self.sample_D4x_covar(sample)**.5
            except ValueError:
                # when `sample` is constrained by self.standardize(constraints = {...}),
                # it is no longer listed in self.standardization.var_names.
                # Temporary fix: define SE as zero for now
                self.samples[sample][f'SE_D{self._4x}'] = 0.

    elif self.standardization_method == 'indep_sessions':
        for sample in self.anchors:
            self.samples[sample][f'D{self._4x}'] = self.Nominal_D4x[sample]
            self.samples[sample][f'SE_D{self._4x}'] = 0.
        for sample in self.unknowns:
            self.msg(f'Consolidating sample {sample}')
            self.unknowns[sample][f'session_D{self._4x}'] = {}
            session_avg = []
            for session in self.sessions:
                sdata = [r for r in self.sessions[session]['data'] if r['Sample'] == sample]
                if sdata:
                    self.msg(f'{sample} found in session {session}')
                    avg_D4x = np.mean([r[f'D{self._4x}'] for r in sdata])
                    avg_d4x = np.mean([r[f'd{self._4x}'] for r in sdata])
                    # !! TODO: sigma_s below does not account for temporal changes in standardization error
                    sigma_s = self.standardization_error(session, avg_d4x, avg_D4x)
                    sigma_u = sdata[0][f'wD{self._4x}raw'] / self.sessions[session]['a'] / len(sdata)**.5
                    session_avg.append([avg_D4x, (sigma_u**2 + sigma_s**2)**.5])
                    self.unknowns[sample][f'session_D{self._4x}'][session] = session_avg[-1]
            self.samples[sample][f'D{self._4x}'], self.samples[sample][f'SE_D{self._4x}'] = w_avg(*zip(*session_avg))
            weights = {s: self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 for s in self.unknowns[sample][f'session_D{self._4x}']}
            wsum = sum([weights[s] for s in weights])
            for s in weights:
                self.unknowns[sample][f'session_D{self._4x}'][s] += [self.unknowns[sample][f'session_D{self._4x}'][s][1]**-2 / wsum]

    for r in self:
        r[f'D{self._4x}_residual'] = r[f'D{self._4x}'] - self.samples[r['Sample']][f'D{self._4x}']
```
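Once `standardize()` has run (it calls this method through `consolidate()`), the compiled statistics can be read directly from `self.samples`; a minimal sketch:

```py
for sample, info in mydata.unknowns.items():
    print(sample, info['N'], f"{info['D47']:.4f} ± {info['SE_D47']:.4f}")
```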
2180 def consolidate_sessions(self): 2181 ''' 2182 Compute various statistics for each session. 2183 2184 + `Na`: Number of anchor analyses in the session 2185 + `Nu`: Number of unknown analyses in the session 2186 + `r_d13C_VPDB`: δ13C_VPDB repeatability of analyses within the session 2187 + `r_d18O_VSMOW`: δ18O_VSMOW repeatability of analyses within the session 2188 + `r_D47` or `r_D48`: Δ4x repeatability of analyses within the session 2189 + `a`: scrambling factor 2190 + `b`: compositional slope 2191 + `c`: WG offset 2192 + `SE_a`: Model stadard erorr of `a` 2193 + `SE_b`: Model stadard erorr of `b` 2194 + `SE_c`: Model stadard erorr of `c` 2195 + `scrambling_drift` (boolean): whether to allow a temporal drift in the scrambling factor (`a`) 2196 + `slope_drift` (boolean): whether to allow a temporal drift in the compositional slope (`b`) 2197 + `wg_drift` (boolean): whether to allow a temporal drift in the WG offset (`c`) 2198 + `a2`: scrambling factor drift 2199 + `b2`: compositional slope drift 2200 + `c2`: WG offset drift 2201 + `Np`: Number of standardization parameters to fit 2202 + `CM`: model covariance matrix for (`a`, `b`, `c`, `a2`, `b2`, `c2`) 2203 + `d13Cwg_VPDB`: δ13C_VPDB of WG 2204 + `d18Owg_VSMOW`: δ18O_VSMOW of WG 2205 ''' 2206 for session in self.sessions: 2207 if 'd13Cwg_VPDB' not in self.sessions[session]: 2208 self.sessions[session]['d13Cwg_VPDB'] = self.sessions[session]['data'][0]['d13Cwg_VPDB'] 2209 if 'd18Owg_VSMOW' not in self.sessions[session]: 2210 self.sessions[session]['d18Owg_VSMOW'] = self.sessions[session]['data'][0]['d18Owg_VSMOW'] 2211 self.sessions[session]['Na'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.anchors]) 2212 self.sessions[session]['Nu'] = len([r for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns]) 2213 2214 self.msg(f'Computing repeatabilities for session {session}') 2215 self.sessions[session]['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors', sessions = [session]) 2216 self.sessions[session]['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors', sessions = [session]) 2217 self.sessions[session][f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', sessions = [session]) 2218 2219 if self.standardization_method == 'pooled': 2220 for session in self.sessions: 2221 2222 # different (better?) 
computation of D4x repeatability for each session: 2223 sqresiduals = [(r[f'D{self._4x}'] - self.samples[r['Sample']][f'D{self._4x}'])**2 for r in self.sessions[session]['data']] 2224 self.sessions[session][f'r_D{self._4x}'] = np.mean(sqresiduals)**.5 2225 2226 self.sessions[session]['a'] = self.standardization.params.valuesdict()[f'a_{pf(session)}'] 2227 i = self.standardization.var_names.index(f'a_{pf(session)}') 2228 self.sessions[session]['SE_a'] = self.standardization.covar[i,i]**.5 2229 2230 self.sessions[session]['b'] = self.standardization.params.valuesdict()[f'b_{pf(session)}'] 2231 i = self.standardization.var_names.index(f'b_{pf(session)}') 2232 self.sessions[session]['SE_b'] = self.standardization.covar[i,i]**.5 2233 2234 self.sessions[session]['c'] = self.standardization.params.valuesdict()[f'c_{pf(session)}'] 2235 i = self.standardization.var_names.index(f'c_{pf(session)}') 2236 self.sessions[session]['SE_c'] = self.standardization.covar[i,i]**.5 2237 2238 self.sessions[session]['a2'] = self.standardization.params.valuesdict()[f'a2_{pf(session)}'] 2239 if self.sessions[session]['scrambling_drift']: 2240 i = self.standardization.var_names.index(f'a2_{pf(session)}') 2241 self.sessions[session]['SE_a2'] = self.standardization.covar[i,i]**.5 2242 else: 2243 self.sessions[session]['SE_a2'] = 0. 2244 2245 self.sessions[session]['b2'] = self.standardization.params.valuesdict()[f'b2_{pf(session)}'] 2246 if self.sessions[session]['slope_drift']: 2247 i = self.standardization.var_names.index(f'b2_{pf(session)}') 2248 self.sessions[session]['SE_b2'] = self.standardization.covar[i,i]**.5 2249 else: 2250 self.sessions[session]['SE_b2'] = 0. 2251 2252 self.sessions[session]['c2'] = self.standardization.params.valuesdict()[f'c2_{pf(session)}'] 2253 if self.sessions[session]['wg_drift']: 2254 i = self.standardization.var_names.index(f'c2_{pf(session)}') 2255 self.sessions[session]['SE_c2'] = self.standardization.covar[i,i]**.5 2256 else: 2257 self.sessions[session]['SE_c2'] = 0. 
2258 2259 i = self.standardization.var_names.index(f'a_{pf(session)}') 2260 j = self.standardization.var_names.index(f'b_{pf(session)}') 2261 k = self.standardization.var_names.index(f'c_{pf(session)}') 2262 CM = np.zeros((6,6)) 2263 CM[:3,:3] = self.standardization.covar[[i,j,k],:][:,[i,j,k]] 2264 try: 2265 i2 = self.standardization.var_names.index(f'a2_{pf(session)}') 2266 CM[3,[0,1,2,3]] = self.standardization.covar[i2,[i,j,k,i2]] 2267 CM[[0,1,2,3],3] = self.standardization.covar[[i,j,k,i2],i2] 2268 try: 2269 j2 = self.standardization.var_names.index(f'b2_{pf(session)}') 2270 CM[3,4] = self.standardization.covar[i2,j2] 2271 CM[4,3] = self.standardization.covar[j2,i2] 2272 except ValueError: 2273 pass 2274 try: 2275 k2 = self.standardization.var_names.index(f'c2_{pf(session)}') 2276 CM[3,5] = self.standardization.covar[i2,k2] 2277 CM[5,3] = self.standardization.covar[k2,i2] 2278 except ValueError: 2279 pass 2280 except ValueError: 2281 pass 2282 try: 2283 j2 = self.standardization.var_names.index(f'b2_{pf(session)}') 2284 CM[4,[0,1,2,4]] = self.standardization.covar[j2,[i,j,k,j2]] 2285 CM[[0,1,2,4],4] = self.standardization.covar[[i,j,k,j2],j2] 2286 try: 2287 k2 = self.standardization.var_names.index(f'c2_{pf(session)}') 2288 CM[4,5] = self.standardization.covar[j2,k2] 2289 CM[5,4] = self.standardization.covar[k2,j2] 2290 except ValueError: 2291 pass 2292 except ValueError: 2293 pass 2294 try: 2295 k2 = self.standardization.var_names.index(f'c2_{pf(session)}') 2296 CM[5,[0,1,2,5]] = self.standardization.covar[k2,[i,j,k,k2]] 2297 CM[[0,1,2,5],5] = self.standardization.covar[[i,j,k,k2],k2] 2298 except ValueError: 2299 pass 2300 2301 self.sessions[session]['CM'] = CM 2302 2303 elif self.standardization_method == 'indep_sessions': 2304 pass # Not implemented yet
Compute various statistics for each session.

+ `Na`: Number of anchor analyses in the session
+ `Nu`: Number of unknown analyses in the session
+ `r_d13C_VPDB`: δ13C_VPDB repeatability of analyses within the session
+ `r_d18O_VSMOW`: δ18O_VSMOW repeatability of analyses within the session
+ `r_D47` or `r_D48`: Δ4x repeatability of analyses within the session
+ `a`: scrambling factor
+ `b`: compositional slope
+ `c`: WG offset
+ `SE_a`: Model standard error of `a`
+ `SE_b`: Model standard error of `b`
+ `SE_c`: Model standard error of `c`
+ `scrambling_drift` (boolean): whether to allow a temporal drift in the scrambling factor (`a`)
+ `slope_drift` (boolean): whether to allow a temporal drift in the compositional slope (`b`)
+ `wg_drift` (boolean): whether to allow a temporal drift in the WG offset (`c`)
+ `a2`: scrambling factor drift
+ `b2`: compositional slope drift
+ `c2`: WG offset drift
+ `Np`: Number of standardization parameters to fit
+ `CM`: model covariance matrix for (`a`, `b`, `c`, `a2`, `b2`, `c2`)
+ `d13Cwg_VPDB`: δ13C_VPDB of WG
+ `d18Owg_VSMOW`: δ18O_VSMOW of WG

These statistics can then be read back from the `sessions` dictionary, as shown in the sketch below.
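A minimal sketch, assuming `mydata` is a `D47data` object that has already been standardized with the default pooled method, as in the tutorial:

```py
for session in mydata.sessions:
    s = mydata.sessions[session]
    # standardization parameters and their model standard errors:
    print(f"{session}: a = {s['a']:.4f} ± {s['SE_a']:.4f}, b = {s['b']:.2e}, c = {s['c']:.4f}")
    # full 6x6 covariance matrix of (a, b, c, a2, b2, c2):
    print(s['CM'])
```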
```py
@make_verbal
def repeatabilities(self):
    '''
    Compute analytical repeatabilities for δ13C_VPDB, δ18O_VSMOW, Δ4x
    (for all samples, for anchors, and for unknowns).
    '''
    self.msg('Computing reproducibilities for all sessions')

    self.repeatability['r_d13C_VPDB'] = self.compute_r('d13C_VPDB', samples = 'anchors')
    self.repeatability['r_d18O_VSMOW'] = self.compute_r('d18O_VSMOW', samples = 'anchors')
    self.repeatability[f'r_D{self._4x}a'] = self.compute_r(f'D{self._4x}', samples = 'anchors')
    self.repeatability[f'r_D{self._4x}u'] = self.compute_r(f'D{self._4x}', samples = 'unknowns')
    self.repeatability[f'r_D{self._4x}'] = self.compute_r(f'D{self._4x}', samples = 'all samples')
```
Compute analytical repeatabilities for δ13C_VPDB, δ18O_VSMOW and Δ4x (for all samples, for anchors, and for unknowns).
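The computed values end up in the `repeatability` dictionary, under the keys used in the code above. A short sketch, again assuming a standardized `D47data` object named `mydata`:

```py
mydata.repeatabilities()
print(mydata.repeatability['r_d13C_VPDB'])   # bulk δ13C repeatability (anchors)
print(mydata.repeatability['r_d18O_VSMOW'])  # bulk δ18O repeatability (anchors)
print(mydata.repeatability['r_D47'])         # Δ47 repeatability, all samples
print(mydata.repeatability['r_D47a'])        # Δ47 repeatability, anchors only
print(mydata.repeatability['r_D47u'])        # Δ47 repeatability, unknowns only
```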
```py
@make_verbal
def consolidate(self, tables = True, plots = True):
    '''
    Collect information about samples, sessions and repeatabilities.
    '''
    self.consolidate_samples()
    self.consolidate_sessions()
    self.repeatabilities()

    if tables:
        self.summary()
        self.table_of_sessions()
        self.table_of_analyses()
        self.table_of_samples()

    if plots:
        self.plot_sessions()
```
Collect information about samples, sessions and repeatabilities.
```py
@make_verbal
def rmswd(self,
    samples = 'all samples',
    sessions = 'all sessions',
    ):
    '''
    Compute the χ2, root mean squared weighted deviation
    (i.e. the square root of the reduced χ2), and corresponding degrees of freedom of the
    Δ4x values for samples in `samples` and sessions in `sessions`.

    Only used in `D4xdata.standardize()` with `method='indep_sessions'`.
    '''
    if samples == 'all samples':
        mysamples = [k for k in self.samples]
    elif samples == 'anchors':
        mysamples = [k for k in self.anchors]
    elif samples == 'unknowns':
        mysamples = [k for k in self.unknowns]
    else:
        mysamples = samples

    if sessions == 'all sessions':
        sessions = [k for k in self.sessions]

    chisq, Nf = 0, 0
    for sample in mysamples :
        G = [ r for r in self if r['Sample'] == sample and r['Session'] in sessions ]
        if len(G) > 1 :
            X, sX = w_avg([r[f'D{self._4x}'] for r in G], [r[f'wD{self._4x}'] for r in G])
            Nf += (len(G) - 1)
            chisq += np.sum([ ((r[f'D{self._4x}']-X)/r[f'wD{self._4x}'])**2 for r in G])
    r = (chisq / Nf)**.5 if Nf > 0 else 0
    self.msg(f'RMSWD of r["D{self._4x}"] is {r:.6f} for {samples}.')
    return {'rmswd': r, 'chisq': chisq, 'Nf': Nf}
```
Compute the χ2, root mean squared weighted deviation (i.e. the square root of the reduced χ2), and corresponding degrees of freedom of the Δ4x values for samples in `samples` and sessions in `sessions`.

Only used in `D4xdata.standardize()` with `method='indep_sessions'`.
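Because the per-analysis weights (`wD47`) used above are only assigned by the `indep_sessions` standardization, calling `rmswd()` directly is only meaningful after standardizing with that method. A hedged sketch:

```py
mydata.standardize(method = 'indep_sessions')
stats = mydata.rmswd(samples = 'anchors')
print(stats['rmswd'], stats['chisq'], stats['Nf'])
```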
```py
@make_verbal
def compute_r(self, key, samples = 'all samples', sessions = 'all sessions'):
    '''
    Compute the repeatability of `[r[key] for r in self]`
    '''

    if samples == 'all samples':
        mysamples = [k for k in self.samples]
    elif samples == 'anchors':
        mysamples = [k for k in self.anchors]
    elif samples == 'unknowns':
        mysamples = [k for k in self.unknowns]
    else:
        mysamples = samples

    if sessions == 'all sessions':
        sessions = [k for k in self.sessions]

    if key in ['D47', 'D48']:
        # Full disclosure: the definition of Nf is tricky/debatable
        G = [r for r in self if r['Sample'] in mysamples and r['Session'] in sessions]
        chisq = (np.array([r[f'{key}_residual'] for r in G])**2).sum()
        Nf = len(G)
#       print(f'len(G) = {Nf}')
        Nf -= len([s for s in mysamples if s in self.unknowns])
#       print(f'{len([s for s in mysamples if s in self.unknowns])} unknown samples to consider')
        for session in sessions:
            Np = len([
                _ for _ in self.standardization.params
                if (
                    self.standardization.params[_].expr is not None
                    and (
                        (_[0] in 'abc' and _[1] == '_' and _[2:] == pf(session))
                        or (_[0] in 'abc' and _[1:3] == '2_' and _[3:] == pf(session))
                    )
                )
            ])
#           print(f'session {session}: {Np} parameters to consider')
            Na = len({
                r['Sample'] for r in self.sessions[session]['data']
                if r['Sample'] in self.anchors and r['Sample'] in mysamples
            })
#           print(f'session {session}: {Na} different anchors in that session')
            Nf -= min(Np, Na)
#       print(f'Nf = {Nf}')

#       for sample in mysamples :
#           X = [ r[key] for r in self if r['Sample'] == sample and r['Session'] in sessions ]
#           if len(X) > 1 :
#               chisq += np.sum([ (x-self.samples[sample][key])**2 for x in X ])
#               if sample in self.unknowns:
#                   Nf += len(X) - 1
#               else:
#                   Nf += len(X)
#       if samples in ['anchors', 'all samples']:
#           Nf -= sum([self.sessions[s]['Np'] for s in sessions])
        r = (chisq / Nf)**.5 if Nf > 0 else 0

    else: # if key not in ['D47', 'D48']
        chisq, Nf = 0, 0
        for sample in mysamples :
            X = [ r[key] for r in self if r['Sample'] == sample and r['Session'] in sessions ]
            if len(X) > 1 :
                Nf += len(X) - 1
                chisq += np.sum([ (x-np.mean(X))**2 for x in X ])
        r = (chisq / Nf)**.5 if Nf > 0 else 0

    self.msg(f'Repeatability of r["{key}"] is {1000*r:.1f} ppm for {samples}.')
    return r
```
Compute the repeatability of `[r[key] for r in self]`.
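`compute_r()` is what `repeatabilities()` calls under the hood, but it may also be used directly, e.g. to get the repeatability of a single variable over a subset of samples. A sketch, assuming a standardized `mydata` object:

```py
# Δ47 repeatability of the unknowns only:
r47u = mydata.compute_r('D47', samples = 'unknowns')
# δ13C repeatability over all samples and sessions:
r13 = mydata.compute_r('d13C_VPDB')
print(f'{1000 * r47u:.1f} ppm, {1000 * r13:.1f} ppm')
```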
````py
def sample_average(self, samples, weights = 'equal', normalize = True):
    '''
    Weighted average Δ4x value of a group of samples, accounting for covariance.

    Returns the weighted average Δ4x value and associated SE
    of a group of samples. Weights are equal by default. If `normalize` is
    true, `weights` will be rescaled so that their sum equals 1.

    **Examples**

    ```python
    self.sample_average(['X','Y'], [1, 2])
    ```

    returns the value and SE of [Δ4x(X) + 2 Δ4x(Y)]/3,
    where Δ4x(X) and Δ4x(Y) are the average Δ4x
    values of samples X and Y, respectively.

    ```python
    self.sample_average(['X','Y'], [1, -1], normalize = False)
    ```

    returns the value and SE of the difference Δ4x(X) - Δ4x(Y).
    '''
    if weights == 'equal':
        weights = [1/len(samples)] * len(samples)

    if normalize:
        s = sum(weights)
        if s:
            weights = [w/s for w in weights]

    try:
#       indices = [self.standardization.var_names.index(f'D47_{pf(sample)}') for sample in samples]
#       C = self.standardization.covar[indices,:][:,indices]
        C = np.array([[self.sample_D4x_covar(x, y) for x in samples] for y in samples])
        X = [self.samples[sample][f'D{self._4x}'] for sample in samples]
        return correlated_sum(X, C, weights)
    except ValueError:
        return (0., 0.)
````
Weighted average Δ4x value of a group of samples, accounting for covariance.

Returns the weighted average Δ4x value and associated SE of a group of samples. Weights are equal by default. If `normalize` is true, `weights` will be rescaled so that their sum equals 1.

**Examples**

```py
self.sample_average(['X','Y'], [1, 2])
```

returns the value and SE of [Δ4x(X) + 2 Δ4x(Y)]/3, where Δ4x(X) and Δ4x(Y) are the average Δ4x values of samples X and Y, respectively.

```py
self.sample_average(['X','Y'], [1, -1], normalize = False)
```

returns the value and SE of the difference Δ4x(X) - Δ4x(Y).
```py
def sample_D4x_covar(self, sample1, sample2 = None):
    '''
    Covariance between Δ4x values of samples

    Returns the error covariance between the average Δ4x values of two
    samples. If only `sample1` is specified, or if `sample1 == sample2`,
    returns the Δ4x variance for that sample.
    '''
    if sample2 is None:
        sample2 = sample1
    if self.standardization_method == 'pooled':
        i = self.standardization.var_names.index(f'D{self._4x}_{pf(sample1)}')
        j = self.standardization.var_names.index(f'D{self._4x}_{pf(sample2)}')
        return self.standardization.covar[i, j]
    elif self.standardization_method == 'indep_sessions':
        if sample1 == sample2:
            return self.samples[sample1][f'SE_D{self._4x}']**2
        else:
            c = 0
            for session in self.sessions:
                sdata1 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample1]
                sdata2 = [r for r in self.sessions[session]['data'] if r['Sample'] == sample2]
                if sdata1 and sdata2:
                    a = self.sessions[session]['a']
                    # !! TODO: CM below does not account for temporal changes in standardization parameters
                    CM = self.sessions[session]['CM'][:3,:3]
                    avg_D4x_1 = np.mean([r[f'D{self._4x}'] for r in sdata1])
                    avg_d4x_1 = np.mean([r[f'd{self._4x}'] for r in sdata1])
                    avg_D4x_2 = np.mean([r[f'D{self._4x}'] for r in sdata2])
                    avg_d4x_2 = np.mean([r[f'd{self._4x}'] for r in sdata2])
                    c += (
                        self.unknowns[sample1][f'session_D{self._4x}'][session][2]
                        * self.unknowns[sample2][f'session_D{self._4x}'][session][2]
                        * np.array([[avg_D4x_1, avg_d4x_1, 1]])
                        @ CM
                        @ np.array([[avg_D4x_2, avg_d4x_2, 1]]).T
                        ) / a**2
            return float(c)
```
Covariance between Δ4x values of samples.

Returns the error covariance between the average Δ4x values of two samples. If only `sample1` is specified, or if `sample1 == sample2`, returns the Δ4x variance for that sample.
```py
def sample_D4x_correl(self, sample1, sample2 = None):
    '''
    Correlation between Δ4x errors of samples

    Returns the error correlation between the average Δ4x values of two samples.
    '''
    if sample2 is None or sample2 == sample1:
        return 1.
    return (
        self.sample_D4x_covar(sample1, sample2)
        / self.unknowns[sample1][f'SE_D{self._4x}']
        / self.unknowns[sample2][f'SE_D{self._4x}']
        )
```
Correlation between Δ4x errors of samples
Returns the error correlation between the average Δ4x values of two samples.
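A minimal sketch combining the two methods above; the sample names `FOO` and `BAR` stand in for hypothetical unknowns:

```py
# variance (squared SE) of the mean Δ47 value of one sample:
var_foo = mydata.sample_D4x_covar('FOO')
# error covariance and error correlation between two samples:
cov = mydata.sample_D4x_covar('FOO', 'BAR')
rho = mydata.sample_D4x_correl('FOO', 'BAR')
print(var_foo**.5, cov, rho)
```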
```py
def plot_single_session(self,
    session,
    kw_plot_anchors = dict(ls='None', marker='x', mec=(.75, 0, 0), mew = .75, ms = 4),
    kw_plot_unknowns = dict(ls='None', marker='x', mec=(0, 0, .75), mew = .75, ms = 4),
    kw_plot_anchor_avg = dict(ls='-', marker='None', color=(.75, 0, 0), lw = .75),
    kw_plot_unknown_avg = dict(ls='-', marker='None', color=(0, 0, .75), lw = .75),
    kw_contour_error = dict(colors = [[0, 0, 0]], alpha = .5, linewidths = 0.75),
    xylimits = 'free', # | 'constant'
    x_label = None,
    y_label = None,
    error_contour_interval = 'auto',
    fig = 'new',
    ):
    '''
    Generate plot for a single session
    '''
    if x_label is None:
        x_label = f'δ$_{{{self._4x}}}$ (‰)'
    if y_label is None:
        y_label = f'Δ$_{{{self._4x}}}$ (‰)'

    out = _SessionPlot()
    anchors = [a for a in self.anchors if [r for r in self.sessions[session]['data'] if r['Sample'] == a]]
    unknowns = [u for u in self.unknowns if [r for r in self.sessions[session]['data'] if r['Sample'] == u]]
    anchors_d = [r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors]
    anchors_D = [r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.anchors]
    unknowns_d = [r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns]
    unknowns_D = [r[f'D{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] in self.unknowns]
    anchor_avg = (np.array([ np.array([
        np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1,
        np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1
        ]) for sample in anchors]).T,
        np.array([ np.array([0, 0]) + self.Nominal_D4x[sample] for sample in anchors]).T)
    unknown_avg = (np.array([ np.array([
        np.min([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) - 1,
        np.max([r[f'd{self._4x}'] for r in self.sessions[session]['data'] if r['Sample'] == sample]) + 1
        ]) for sample in unknowns]).T,
        np.array([ np.array([0, 0]) + self.unknowns[sample][f'D{self._4x}'] for sample in unknowns]).T)

    if fig == 'new':
        out.fig = ppl.figure(figsize = (6,6))
        ppl.subplots_adjust(.1,.1,.9,.9)

    out.anchor_analyses, = ppl.plot(
        anchors_d,
        anchors_D,
        **kw_plot_anchors)
    out.unknown_analyses, = ppl.plot(
        unknowns_d,
        unknowns_D,
        **kw_plot_unknowns)
    out.anchor_avg = ppl.plot(
        *anchor_avg,
        **kw_plot_anchor_avg)
    out.unknown_avg = ppl.plot(
        *unknown_avg,
        **kw_plot_unknown_avg)
    if xylimits == 'constant':
        x = [r[f'd{self._4x}'] for r in self]
        y = [r[f'D{self._4x}'] for r in self]
        x1, x2, y1, y2 = np.min(x), np.max(x), np.min(y), np.max(y)
        w, h = x2-x1, y2-y1
        x1 -= w/20
        x2 += w/20
        y1 -= h/20
        y2 += h/20
        ppl.axis([x1, x2, y1, y2])
    elif xylimits == 'free':
        x1, x2, y1, y2 = ppl.axis()
    else:
        x1, x2, y1, y2 = ppl.axis(xylimits)

    if error_contour_interval != 'none':
        xi, yi = np.linspace(x1, x2), np.linspace(y1, y2)
        XI,YI = np.meshgrid(xi, yi)
        SI = np.array([[self.standardization_error(session, x, y) for x in xi] for y in yi])
        if error_contour_interval == 'auto':
            rng = np.max(SI) - np.min(SI)
            if rng <= 0.01:
                cinterval = 0.001
            elif rng <= 0.03:
                cinterval = 0.004
            elif rng <= 0.1:
                cinterval = 0.01
            elif rng <= 0.3:
                cinterval = 0.03
            elif rng <= 1.:
                cinterval = 0.1
            else:
                cinterval = 0.5
        else:
            cinterval = error_contour_interval

        cval = np.arange(np.ceil(SI.min() / .001) * .001, np.ceil(SI.max() / .001 + 1) * .001, cinterval)
        out.contour = ppl.contour(XI, YI, SI, cval, **kw_contour_error)
        out.clabel = ppl.clabel(out.contour)
        contour = (XI, YI, SI, cval, cinterval)

    if fig == None:
        return {
            'anchors':anchors,
            'unknowns':unknowns,
            'anchors_d':anchors_d,
            'anchors_D':anchors_D,
            'unknowns_d':unknowns_d,
            'unknowns_D':unknowns_D,
            'anchor_avg':anchor_avg,
            'unknown_avg':unknown_avg,
            'contour':contour,
            }

    ppl.xlabel(x_label)
    ppl.ylabel(y_label)
    ppl.title(session, weight = 'bold')
    ppl.grid(alpha = .2)
    out.ax = ppl.gca()

    return out
```
Generate plot for a single session
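A short usage sketch, saving one plot per session; `plot_single_session()` returns a `_SessionPlot` object whose `fig` attribute holds the matplotlib figure (when `fig = 'new'`, the default):

```py
from matplotlib import pyplot as ppl

for session in mydata.sessions:
    out = mydata.plot_single_session(session)
    out.fig.savefig(f'{session}.pdf')
    ppl.close(out.fig)
```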
```py
def plot_residuals(
    self,
    kde = False,
    hist = False,
    binwidth = 2/3,
    dir = 'output',
    filename = None,
    highlight = [],
    colors = None,
    figsize = None,
    dpi = 100,
    yspan = None,
    ):
    '''
    Plot residuals of each analysis as a function of time (actually, as a function of
    the order of analyses in the `D4xdata` object)

    + `kde`: whether to add a kernel density estimate of residuals
    + `hist`: whether to add a histogram of residuals (incompatible with `kde`)
    + `binwidth`: width of the histogram bins, in units of the Δ4x repeatability (SD)
    + `dir`: the directory in which to save the plot
    + `filename`: name of the file to save to (if `None`, return the figure without saving it)
    + `highlight`: a list of samples to highlight
    + `colors`: a dict of `{<sample>: <color>}` for all samples
    + `figsize`: (width, height) of figure
    + `dpi`: resolution for PNG output
    + `yspan`: factor controlling the range of y values shown in plot
      (by default: `yspan = 1.5 if kde else 1.0`)
    '''

    from matplotlib import ticker

    if yspan is None:
        if kde:
            yspan = 1.5
        else:
            yspan = 1.0

    # Layout
    fig = ppl.figure(figsize = (8,4) if figsize is None else figsize)
    if hist or kde:
        ppl.subplots_adjust(left = .08, bottom = .05, right = .98, top = .8, wspace = -0.72)
        ax1, ax2 = ppl.subplot(121), ppl.subplot(1,15,15)
    else:
        ppl.subplots_adjust(.08,.05,.78,.8)
        ax1 = ppl.subplot(111)

    # Colors
    N = len(self.anchors)
    if colors is None:
        if len(highlight) > 0:
            Nh = len(highlight)
            if Nh == 1:
                colors = {highlight[0]: (0,0,0)}
            elif Nh == 3:
                colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0)])}
            elif Nh == 4:
                colors = {a: c for a,c in zip(highlight, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])}
            else:
                colors = {a: hls_to_rgb(k/Nh, .4, 1) for k,a in enumerate(highlight)}
        else:
            if N == 3:
                colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0)])}
            elif N == 4:
                colors = {a: c for a,c in zip(self.anchors, [(0,0,1), (1,0,0), (0,2/3,0), (.75,0,.75)])}
            else:
                colors = {a: hls_to_rgb(k/N, .4, 1) for k,a in enumerate(self.anchors)}

    ppl.sca(ax1)

    ppl.axhline(0, color = 'k', alpha = .25, lw = 0.75)

    ax1.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: f'${x:+.0f}$' if x else '$0$'))

    session = self[0]['Session']
    x1 = 0
#   ymax = np.max([1e3 * (r['D47'] - self.samples[r['Sample']]['D47']) for r in self])
    x_sessions = {}
    one_or_more_singlets = False
    one_or_more_multiplets = False
    multiplets = set()
    for k,r in enumerate(self):
        if r['Session'] != session:
            x2 = k-1
            x_sessions[session] = (x1+x2)/2
            ppl.axvline(k - 0.5, color = 'k', lw = .5)
            session = r['Session']
            x1 = k
        singlet = len(self.samples[r['Sample']]['data']) == 1
        if not singlet:
            multiplets.add(r['Sample'])
        if r['Sample'] in self.unknowns:
            if singlet:
                one_or_more_singlets = True
            else:
                one_or_more_multiplets = True
        kw = dict(
            marker = 'x' if singlet else '+',
            ms = 4 if singlet else 5,
            ls = 'None',
            mec = colors[r['Sample']] if r['Sample'] in colors else (0,0,0),
            mew = 1,
            alpha = 0.2 if singlet else 1,
            )
        if highlight and r['Sample'] not in highlight:
            kw['alpha'] = 0.2
        ppl.plot(k, 1e3 * r[f'D{self._4x}_residual'], **kw)
    x2 = k
    x_sessions[session] = (x1+x2)/2

    ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000, self.repeatability[f'r_D{self._4x}']*1000, color = 'k', alpha = .05, lw = 1)
    ppl.axhspan(-self.repeatability[f'r_D{self._4x}']*1000*self.t95, self.repeatability[f'r_D{self._4x}']*1000*self.t95, color = 'k', alpha = .05, lw = 1)
    if not (hist or kde):
        ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000, f" SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm", size = 9, alpha = 1, va = 'center')
        ppl.text(len(self), self.repeatability[f'r_D{self._4x}']*1000*self.t95, f" 95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm", size = 9, alpha = 1, va = 'center')

    xmin, xmax, ymin, ymax = ppl.axis()
    if yspan != 1:
        ymin, ymax = (ymin + ymax)/2 - yspan * (ymax - ymin)/2, (ymin + ymax)/2 + yspan * (ymax - ymin)/2
    for s in x_sessions:
        ppl.text(
            x_sessions[s],
            ymax +1,
            s,
            va = 'bottom',
            **(
                dict(ha = 'center')
                if len(self.sessions[s]['data']) > (0.15 * len(self))
                else dict(ha = 'left', rotation = 45)
                )
            )

    if hist or kde:
        ppl.sca(ax2)

    for s in colors:
        kw['marker'] = '+'
        kw['ms'] = 5
        kw['mec'] = colors[s]
        kw['label'] = s
        kw['alpha'] = 1
        ppl.plot([], [], **kw)

    kw['mec'] = (0,0,0)

    if one_or_more_singlets:
        kw['marker'] = 'x'
        kw['ms'] = 4
        kw['alpha'] = .2
        kw['label'] = 'other (N$\\,$=$\\,$1)' if one_or_more_multiplets else 'other'
        ppl.plot([], [], **kw)

    if one_or_more_multiplets:
        kw['marker'] = '+'
        kw['ms'] = 4
        kw['alpha'] = 1
        kw['label'] = 'other (N$\\,$>$\\,$1)' if one_or_more_singlets else 'other'
        ppl.plot([], [], **kw)

    if hist or kde:
        leg = ppl.legend(loc = 'upper right', bbox_to_anchor = (1, 1), bbox_transform=fig.transFigure, borderaxespad = 1.5, fontsize = 9)
    else:
        leg = ppl.legend(loc = 'lower right', bbox_to_anchor = (1, 0), bbox_transform=fig.transFigure, borderaxespad = 1.5)
    leg.set_zorder(-1000)

    ppl.sca(ax1)

    ppl.ylabel(f'Δ$_{{{self._4x}}}$ residuals (ppm)')
    ppl.xticks([])
    ppl.axis([-1, len(self), None, None])

    if hist or kde:
        ppl.sca(ax2)
        X = 1e3 * np.array([r[f'D{self._4x}_residual'] for r in self if r['Sample'] in multiplets or r['Sample'] in self.anchors])

        if kde:
            from scipy.stats import gaussian_kde
            yi = np.linspace(ymin, ymax, 201)
            xi = gaussian_kde(X).evaluate(yi)
            ppl.fill_betweenx(yi, xi, xi*0, fc = (0,0,0,.15), lw = 1, ec = (.75,.75,.75,1))
#           ppl.plot(xi, yi, 'k-', lw = 1)
        elif hist:
            ppl.hist(
                X,
                orientation = 'horizontal',
                histtype = 'stepfilled',
                ec = [.4]*3,
                fc = [.25]*3,
                alpha = .25,
                bins = np.linspace(-9e3*self.repeatability[f'r_D{self._4x}'], 9e3*self.repeatability[f'r_D{self._4x}'], int(18/binwidth+1)),
                )
        ppl.text(0, 0,
            f" SD = {self.repeatability[f'r_D{self._4x}']*1000:.1f} ppm\n 95% CL = ± {self.repeatability[f'r_D{self._4x}']*1000*self.t95:.1f} ppm",
            size = 7.5,
            alpha = 1,
            va = 'center',
            ha = 'left',
            )

        ppl.axis([0, None, ymin, ymax])
        ppl.xticks([])
        ppl.yticks([])
#       ax2.spines['left'].set_visible(False)
        ax2.spines['right'].set_visible(False)
        ax2.spines['top'].set_visible(False)
        ax2.spines['bottom'].set_visible(False)

    ax1.axis([None, None, ymin, ymax])

    if not os.path.exists(dir):
        os.makedirs(dir)
    if filename is None:
        return fig
    elif filename == '':
        filename = f'D{self._4x}_residuals.pdf'
    ppl.savefig(f'{dir}/{filename}', dpi = dpi)
    ppl.close(fig)
```
Plot residuals of each analysis as a function of time (actually, as a function of the order of analyses in the `D4xdata` object).

+ `kde`: whether to add a kernel density estimate of residuals
+ `hist`: whether to add a histogram of residuals (incompatible with `kde`)
+ `binwidth`: width of the histogram bins, in units of the Δ4x repeatability (SD)
+ `dir`: the directory in which to save the plot
+ `filename`: name of the file to save to (by default: `D4x_residuals.pdf`; if `None`, return the figure without saving it)
+ `highlight`: a list of samples to highlight
+ `colors`: a dict of `{<sample>: <color>}` for all samples
+ `figsize`: (width, height) of figure
+ `dpi`: resolution for PNG output
+ `yspan`: factor controlling the range of y values shown in plot (by default: `yspan = 1.5 if kde else 1.0`)

See the usage sketch below.
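For example (an empty `filename` falls back to the default name, while the default `filename = None` returns the figure without saving it):

```py
# save 'output/D47_residuals.pdf', with a kernel density estimate of the residuals:
mydata.plot_residuals(kde = True, filename = '')

# get the figure back instead, e.g. for further customization:
fig = mydata.plot_residuals(hist = True)
```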
```py
def simulate(self, *args, **kwargs):
    '''
    Legacy function with warning message pointing to `virtual_data()`
    '''
    raise DeprecationWarning('D4xdata.simulate is deprecated and has been replaced by virtual_data()')
```
Legacy function with warning message pointing to `virtual_data()`.
```py
def plot_distribution_of_analyses(
    self,
    dir = 'output',
    filename = None,
    vs_time = False,
    figsize = (6,4),
    subplots_adjust = (0.02, 0.13, 0.85, 0.8),
    output = None,
    dpi = 100,
    ):
    '''
    Plot temporal distribution of all analyses in the data set.

    **Parameters**

    + `dir`: the directory in which to save the plot
    + `vs_time`: if `True`, plot as a function of `TimeTag` rather than sequentially.
    + `figsize`: (width, height) of figure
    + `dpi`: resolution for PNG output
    '''

    asamples = [s for s in self.anchors]
    usamples = [s for s in self.unknowns]
    if output is None or output == 'fig':
        fig = ppl.figure(figsize = figsize)
        ppl.subplots_adjust(*subplots_adjust)
    Xmin = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self)])
    Xmax = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self)])
    Xmax += (Xmax-Xmin)/40
    Xmin -= (Xmax-Xmin)/41
    for k, s in enumerate(asamples + usamples):
        if vs_time:
            X = [r['TimeTag'] for r in self if r['Sample'] == s]
        else:
            X = [x for x,r in enumerate(self) if r['Sample'] == s]
        Y = [-k for x in X]
        ppl.plot(X, Y, 'o', mec = None, mew = 0, mfc = 'b' if s in usamples else 'r', ms = 3, alpha = .75)
        ppl.axhline(-k, color = 'b' if s in usamples else 'r', lw = .5, alpha = .25)
        ppl.text(Xmax, -k, f' {s}', va = 'center', ha = 'left', size = 7, color = 'b' if s in usamples else 'r')
    ppl.axis([Xmin, Xmax, -k-1, 1])
    ppl.xlabel('\ntime')
    ppl.gca().annotate('',
        xy = (0.6, -0.02),
        xycoords = 'axes fraction',
        xytext = (.4, -0.02),
        arrowprops = dict(arrowstyle = "->", color = 'k'),
        )

    x2 = -1
    for session in self.sessions:
        x1 = min([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session])
        if vs_time:
            ppl.axvline(x1, color = 'k', lw = .75)
        if x2 > -1:
            if not vs_time:
                ppl.axvline((x1+x2)/2, color = 'k', lw = .75, alpha = .5)
        x2 = max([r['TimeTag'] if vs_time else j for j,r in enumerate(self) if r['Session'] == session])
#       from xlrd import xldate_as_datetime
#       print(session, xldate_as_datetime(x1, 0), xldate_as_datetime(x2, 0))
        if vs_time:
            ppl.axvline(x2, color = 'k', lw = .75)
            ppl.axvspan(x1,x2,color = 'k', zorder = -100, alpha = .15)
        ppl.text((x1+x2)/2, 1, f' {session}', ha = 'left', va = 'bottom', rotation = 45, size = 8)

    ppl.xticks([])
    ppl.yticks([])

    if output is None:
        if not os.path.exists(dir):
            os.makedirs(dir)
        if filename == None:
            filename = f'D{self._4x}_distribution_of_analyses.pdf'
        ppl.savefig(f'{dir}/{filename}', dpi = dpi)
        ppl.close(fig)
    elif output == 'ax':
        return ppl.gca()
    elif output == 'fig':
        return fig
```
Plot temporal distribution of all analyses in the data set.

**Parameters**

+ `dir`: the directory in which to save the plot
+ `vs_time`: if `True`, plot as a function of `TimeTag` rather than sequentially
+ `figsize`: (width, height) of figure
+ `dpi`: resolution for PNG output
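A minimal sketch; note that `vs_time = True` requires each analysis to carry a `TimeTag` field (cf. `assign_timestamps()`):

```py
# saves 'output/D47_distribution_of_analyses.pdf' by default:
mydata.plot_distribution_of_analyses()

# plot against TimeTag values instead of analysis order:
mydata.plot_distribution_of_analyses(vs_time = True, figsize = (8, 4))
```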
```py
def plot_bulk_compositions(
    self,
    samples = None,
    dir = 'output/bulk_compositions',
    figsize = (6,6),
    subplots_adjust = (0.15, 0.12, 0.95, 0.92),
    show = False,
    sample_color = (0,.5,1),
    analysis_color = (.7,.7,.7),
    labeldist = 0.3,
    radius = 0.05,
    ):
    '''
    Plot δ13C_VPDB vs δ18O_VSMOW (of CO2) for all analyses.

    By default, creates a directory `./output/bulk_compositions` where plots for
    each sample are saved. Another plot named `__all__.pdf` shows all analyses together.

    **Parameters**

    + `samples`: Only these samples are processed (by default: all samples).
    + `dir`: where to save the plots
    + `figsize`: (width, height) of figure
    + `subplots_adjust`: passed to `subplots_adjust()`
    + `show`: whether to call `matplotlib.pyplot.show()` on the plot with all samples,
      allowing for interactive visualization/exploration in (δ13C, δ18O) space.
    + `sample_color`: color used for sample markers/labels
    + `analysis_color`: color used for replicate markers/labels
    + `labeldist`: distance (in inches) from replicate markers to replicate labels
    + `radius`: radius of the dashed circle providing scale. No circle if `radius = 0`.
    '''

    from matplotlib.patches import Ellipse

    if samples is None:
        samples = [_ for _ in self.samples]

    saved = {}

    for s in samples:

        fig = ppl.figure(figsize = figsize)
        fig.subplots_adjust(*subplots_adjust)
        ax = ppl.subplot(111)
        ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)')
        ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)')
        ppl.title(s)

        XY = np.array([[_['d18O_VSMOW'], _['d13C_VPDB']] for _ in self.samples[s]['data']])
        UID = [_['UID'] for _ in self.samples[s]['data']]
        XY0 = XY.mean(0)

        for xy in XY:
            ppl.plot([xy[0], XY0[0]], [xy[1], XY0[1]], '-', lw = 1, color = analysis_color)

        ppl.plot(*XY.T, 'wo', mew = 1, mec = analysis_color)
        ppl.plot(*XY0, 'wo', mew = 2, mec = sample_color)
        ppl.text(*XY0, f' {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold')
        saved[s] = [XY, XY0]

        x1, x2, y1, y2 = ppl.axis()
        x0, dx = (x1+x2)/2, (x2-x1)/2
        y0, dy = (y1+y2)/2, (y2-y1)/2
        dx, dy = [max(max(dx, dy), radius)]*2

        ppl.axis([
            x0 - 1.2*dx,
            x0 + 1.2*dx,
            y0 - 1.2*dy,
            y0 + 1.2*dy,
            ])

        XY0_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(XY0))

        for xy, uid in zip(XY, UID):

            xy_in_display_space = fig.dpi_scale_trans.inverted().transform(ax.transData.transform(xy))
            vector_in_display_space = xy_in_display_space - XY0_in_display_space

            if (vector_in_display_space**2).sum() > 0:

                unit_vector_in_display_space = vector_in_display_space / ((vector_in_display_space**2).sum())**0.5
                label_vector_in_display_space = vector_in_display_space + unit_vector_in_display_space * labeldist
                label_xy_in_display_space = XY0_in_display_space + label_vector_in_display_space
                label_xy_in_data_space = ax.transData.inverted().transform(fig.dpi_scale_trans.transform(label_xy_in_display_space))

                ppl.text(*label_xy_in_data_space, uid, va = 'center', ha = 'center', color = analysis_color)

            else:

                ppl.text(*xy, f'{uid} ', va = 'center', ha = 'right', color = analysis_color)

        if radius:
            ax.add_artist(Ellipse(
                xy = XY0,
                width = radius*2,
                height = radius*2,
                ls = (0, (2,2)),
                lw = .7,
                ec = analysis_color,
                fc = 'None',
                ))
            ppl.text(
                XY0[0],
                XY0[1]-radius,
                f'\n± {radius*1e3:.0f} ppm',
                color = analysis_color,
                va = 'top',
                ha = 'center',
                linespacing = 0.4,
                size = 8,
                )

        if not os.path.exists(dir):
            os.makedirs(dir)
        fig.savefig(f'{dir}/{s}.pdf')
        ppl.close(fig)

    fig = ppl.figure(figsize = figsize)
    fig.subplots_adjust(*subplots_adjust)
    ppl.xlabel('$δ^{18}O_{VSMOW}$ of $CO_2$ (‰)')
    ppl.ylabel('$δ^{13}C_{VPDB}$ (‰)')

    for s in saved:
        for xy in saved[s][0]:
            ppl.plot([xy[0], saved[s][1][0]], [xy[1], saved[s][1][1]], '-', lw = 1, color = analysis_color)
        ppl.plot(*saved[s][0].T, 'wo', mew = 1, mec = analysis_color)
        ppl.plot(*saved[s][1], 'wo', mew = 1.5, mec = sample_color)
        ppl.text(*saved[s][1], f' {s}', va = 'center', ha = 'left', color = sample_color, weight = 'bold')

    x1, x2, y1, y2 = ppl.axis()
    ppl.axis([
        x1 - (x2-x1)/10,
        x2 + (x2-x1)/10,
        y1 - (y2-y1)/10,
        y2 + (y2-y1)/10,
        ])

    if not os.path.exists(dir):
        os.makedirs(dir)
    fig.savefig(f'{dir}/__all__.pdf')
    if show:
        ppl.show()
    ppl.close(fig)
```
Plot δ13C_VPDB vs δ18O_VSMOW (of CO2) for all analyses.

By default, creates a directory `./output/bulk_compositions` where plots for each sample are saved. Another plot named `__all__.pdf` shows all analyses together.

**Parameters**

+ `samples`: Only these samples are processed (by default: all samples).
+ `dir`: where to save the plots
+ `figsize`: (width, height) of figure
+ `subplots_adjust`: passed to `subplots_adjust()`
+ `show`: whether to call `matplotlib.pyplot.show()` on the plot with all samples, allowing for interactive visualization/exploration in (δ13C, δ18O) space
+ `sample_color`: color used for sample markers/labels
+ `analysis_color`: color used for replicate markers/labels
+ `labeldist`: distance (in inches) from replicate markers to replicate labels
+ `radius`: radius of the dashed circle providing scale; no circle if `radius = 0`

A usage sketch follows below.
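For example (the sample names in the second call are hypothetical):

```py
# one PDF per sample plus '__all__.pdf', saved to ./output/bulk_compositions:
mydata.plot_bulk_compositions()

# restrict the plots to two samples and explore the summary plot interactively:
mydata.plot_bulk_compositions(samples = ['FOO', 'BAR'], show = True)
```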
````py
class D47data(D4xdata):
    '''
    Store and process data for a large set of Δ47 analyses,
    usually comprising more than one analytical session.
    '''

    Nominal_D4x = {
        'ETH-1': 0.2052,
        'ETH-2': 0.2085,
        'ETH-3': 0.6132,
        'ETH-4': 0.4511,
        'IAEA-C1': 0.3018,
        'IAEA-C2': 0.6409,
        'MERCK': 0.5135,
        } # I-CDES (Bernasconi et al., 2021)
    '''
    Nominal Δ47 values assigned to the Δ47 anchor samples, used by
    `D47data.standardize()` to normalize unknown samples to an absolute Δ47
    reference frame.

    By default equal to (after [Bernasconi et al. (2021)](https://doi.org/10.1029/2020GC009588)):
    ```py
    {
        'ETH-1'   : 0.2052,
        'ETH-2'   : 0.2085,
        'ETH-3'   : 0.6132,
        'ETH-4'   : 0.4511,
        'IAEA-C1' : 0.3018,
        'IAEA-C2' : 0.6409,
        'MERCK'   : 0.5135,
    }
    ```
    '''


    @property
    def Nominal_D47(self):
        return self.Nominal_D4x


    @Nominal_D47.setter
    def Nominal_D47(self, new):
        self.Nominal_D4x = dict(**new)
        self.refresh()


    def __init__(self, l = [], **kwargs):
        '''
        **Parameters:** same as `D4xdata.__init__()`
        '''
        D4xdata.__init__(self, l = l, mass = '47', **kwargs)


    def D47fromTeq(self, fCo2eqD47 = 'petersen', priority = 'new'):
        '''
        Find all samples for which `Teq` is specified, compute the equilibrium Δ47
        value for that temperature, and treat these samples as additional anchors.

        **Parameters**

        + `fCo2eqD47`: Which CO2 equilibrium law to use
        (`petersen`: [Petersen et al. (2019)](https://doi.org/10.1029/2018GC008127);
        `wang`: [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)).
        + `priority`: if `replace`: forget old anchors and only use the new ones;
        if `new`: keep pre-existing anchors but update them in case of conflict
        between old and new Δ47 values;
        if `old`: keep pre-existing anchors but preserve their original Δ47
        values in case of conflict.
        '''
        f = {
            'petersen': fCO2eqD47_Petersen,
            'wang': fCO2eqD47_Wang,
            }[fCo2eqD47]
        foo = {}
        for r in self:
            if 'Teq' in r:
                if r['Sample'] in foo:
                    assert foo[r['Sample']] == f(r['Teq']), f'Different values of `Teq` provided for sample `{r["Sample"]}`.'
                else:
                    foo[r['Sample']] = f(r['Teq'])
            else:
                assert r['Sample'] not in foo, f'`Teq` is inconsistently specified for sample `{r["Sample"]}`.'

        if priority == 'replace':
            self.Nominal_D47 = {}
        for s in foo:
            if priority != 'old' or s not in self.Nominal_D47:
                self.Nominal_D47[s] = foo[s]

    def save_D47_correl(self, *args, **kwargs):
        return self._save_D4x_correl(*args, **kwargs)

    save_D47_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D47')
````
Store and process data for a large set of Δ47 analyses, usually comprising more than one analytical session.
**Parameters:** same as `D4xdata.__init__()`
Nominal Δ47 values assigned to the Δ47 anchor samples, used by `D47data.standardize()` to normalize unknown samples to an absolute Δ47 reference frame.

By default equal to (after [Bernasconi et al. (2021)](https://doi.org/10.1029/2020GC009588)):

```py
{
    'ETH-1'   : 0.2052,
    'ETH-2'   : 0.2085,
    'ETH-3'   : 0.6132,
    'ETH-4'   : 0.4511,
    'IAEA-C1' : 0.3018,
    'IAEA-C2' : 0.6409,
    'MERCK'   : 0.5135,
}
```
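The anchor set may be redefined through the `Nominal_D47` property before standardizing, e.g. to restrict it to the three ETH anchors; the sketch below simply reuses the default values listed above:

```py
import D47crunch

mydata = D47crunch.D47data()
mydata.Nominal_D47 = {
    'ETH-1': 0.2052,
    'ETH-2': 0.2085,
    'ETH-3': 0.6132,
}
# ...then read(), wg(), crunch() and standardize() as usual.
```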
Find all samples for which `Teq` is specified, compute the equilibrium Δ47 value for that temperature, and treat these samples as additional anchors.

**Parameters**

+ `fCo2eqD47`: which CO2 equilibrium law to use (`petersen`: [Petersen et al. (2019)](https://doi.org/10.1029/2018GC008127); `wang`: [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039))
+ `priority`: if `replace`, forget old anchors and only use the new ones; if `new`, keep pre-existing anchors but update them in case of conflict between old and new Δ47 values; if `old`, keep pre-existing anchors but preserve their original Δ47 values in case of conflict

See the sketch after this list.
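A hedged sketch, assuming the data set includes analyses of a CO2 sample equilibrated at a known temperature (the sample name `EQ-25` and the 25 °C value are hypothetical):

```py
for r in mydata:
    if r['Sample'] == 'EQ-25':  # hypothetical equilibrated-gas sample
        r['Teq'] = 25.          # its equilibration temperature

# treat this sample as an additional anchor, using the Petersen et al. (2019) law:
mydata.D47fromTeq(fCo2eqD47 = 'petersen', priority = 'new')
mydata.standardize()
```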
Save D47 values along with their SE and correlation matrix.

**Parameters**

+ `samples`: Only these samples are output (by default: all samples).
+ `dir`: the directory in which to save the file (by default: `output`)
+ `filename`: the name of the csv file to write to (by default: `D47_correl.csv`)
+ `D47_precision`: the precision to use when writing `D47` and `D47_SE` values (by default: 4)
+ `correl_precision`: the precision to use when writing correlation factor values (by default: 4)

A usage sketch follows below.
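For instance, keeping the documented defaults explicit:

```py
# write standardized Δ47 values, their SE and correlation matrix to a csv file:
mydata.save_D47_correl(dir = 'output', filename = 'D47_correl.csv')
```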
````py
class D48data(D4xdata):
    '''
    Store and process data for a large set of Δ48 analyses,
    usually comprising more than one analytical session.
    '''

    Nominal_D4x = {
        'ETH-1': 0.138,
        'ETH-2': 0.138,
        'ETH-3': 0.270,
        'ETH-4': 0.223,
        'GU-1': -0.419,
        } # (Fiebig et al., 2019, 2021)
    '''
    Nominal Δ48 values assigned to the Δ48 anchor samples, used by
    `D48data.standardize()` to normalize unknown samples to an absolute Δ48
    reference frame.

    By default equal to (after [Fiebig et al. (2019)](https://doi.org/10.1016/j.chemgeo.2019.05.019),
    [Fiebig et al. (2021)](https://doi.org/10.1016/j.gca.2021.07.012)):

    ```py
    {
        'ETH-1' : 0.138,
        'ETH-2' : 0.138,
        'ETH-3' : 0.270,
        'ETH-4' : 0.223,
        'GU-1'  : -0.419,
    }
    ```
    '''


    @property
    def Nominal_D48(self):
        return self.Nominal_D4x


    @Nominal_D48.setter
    def Nominal_D48(self, new):
        self.Nominal_D4x = dict(**new)
        self.refresh()


    def __init__(self, l = [], **kwargs):
        '''
        **Parameters:** same as `D4xdata.__init__()`
        '''
        D4xdata.__init__(self, l = l, mass = '48', **kwargs)

    def save_D48_correl(self, *args, **kwargs):
        return self._save_D4x_correl(*args, **kwargs)

    save_D48_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D48')
````
Store and process data for a large set of Δ48 analyses, usually comprising more than one analytical session.
**Parameters:** same as `D4xdata.__init__()`
Nominal Δ48 values assigned to the Δ48 anchor samples, used by `D48data.standardize()` to normalize unknown samples to an absolute Δ48 reference frame.

By default equal to (after [Fiebig et al. (2019)](https://doi.org/10.1016/j.chemgeo.2019.05.019), [Fiebig et al. (2021)](https://doi.org/10.1016/j.gca.2021.07.012)):

```py
{
    'ETH-1' : 0.138,
    'ETH-2' : 0.138,
    'ETH-3' : 0.270,
    'ETH-4' : 0.223,
    'GU-1'  : -0.419,
}
```
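Since `D48data` inherits the full `D4xdata` workflow, Δ48 processing mirrors the Δ47 case. A minimal sketch, assuming a raw-data file (`rawdata48.csv`, hypothetical) whose analyses include some of the Δ48 anchors listed above:

```py
import D47crunch

mydata48 = D47crunch.D48data()
mydata48.read('rawdata48.csv')
mydata48.wg()
mydata48.crunch()
mydata48.standardize()
mydata48.summary()
```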
Save D48 values along with their SE and correlation matrix.

**Parameters**

+ `samples`: Only these samples are output (by default: all samples).
+ `dir`: the directory in which to save the file (by default: `output`)
+ `filename`: the name of the csv file to write to (by default: `D48_correl.csv`)
+ `D48_precision`: the precision to use when writing `D48` and `D48_SE` values (by default: 4)
+ `correl_precision`: the precision to use when writing correlation factor values (by default: 4)
````py
class D49data(D4xdata):
    '''
    Store and process data for a large set of Δ49 analyses,
    usually comprising more than one analytical session.
    '''

    Nominal_D4x = {"1000C": 0.0, "25C": 2.228} # Wang 2004
    '''
    Nominal Δ49 values assigned to the Δ49 anchor samples, used by
    `D49data.standardize()` to normalize unknown samples to an absolute Δ49
    reference frame.

    By default equal to (after [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)):

    ```py
    {
        "1000C": 0.0,
        "25C": 2.228
    }
    ```
    '''

    @property
    def Nominal_D49(self):
        return self.Nominal_D4x

    @Nominal_D49.setter
    def Nominal_D49(self, new):
        self.Nominal_D4x = dict(**new)
        self.refresh()

    def __init__(self, l=[], **kwargs):
        '''
        **Parameters:** same as `D4xdata.__init__()`
        '''
        D4xdata.__init__(self, l=l, mass='49', **kwargs)

    def save_D49_correl(self, *args, **kwargs):
        return self._save_D4x_correl(*args, **kwargs)

    save_D49_correl.__doc__ = D4xdata._save_D4x_correl.__doc__.replace('D4x', 'D49')
````
Store and process data for a large set of Δ49 analyses, usually comprising more than one analytical session.
**Parameters:** same as `D4xdata.__init__()`
Nominal Δ49 values assigned to the Δ49 anchor samples, used by `D49data.standardize()` to normalize unknown samples to an absolute Δ49 reference frame.

By default equal to (after [Wang et al. (2004)](https://doi.org/10.1016/j.gca.2004.05.039)):

```py
{
    "1000C": 0.0,
    "25C": 2.228
}
```
Save D49 values along with their SE and correlation matrix.

**Parameters**

+ `samples`: Only these samples are output (by default: all samples).
+ `dir`: the directory in which to save the file (by default: `output`)
+ `filename`: the name of the csv file to write to (by default: `D49_correl.csv`)
+ `D49_precision`: the precision to use when writing `D49` and `D49_SE` values (by default: 4)
+ `correl_precision`: the precision to use when writing correlation factor values (by default: 4)